Public Repository

Last pushed: 15 days ago
Short Description
Docker image for the Spark Notebook
Full Description

This image packages the Spark Notebook, the easiest way to start hacking on Spark within minutes :-D.

The Spark Notebook was created by @noootsab and is officially supported by the @DataFellas company (website).

WARN: the combination you'd like might not exist yet; if so, come here.

Running it simply starts the underlying Play application, which listens on port 9000.

Hence, using this image boils down to:

  • docker pull andypetrella/spark-notebook:<tag>
  • docker run -p 9000:9000 -p 4040-4045:4040-4045 andypetrella/spark-notebook:<tag>
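
For example, a full session with one of the published tags (the 0.6.1 tag described below; swap in whichever combination you need) could look like:

docker pull andypetrella/spark-notebook:0.6.1-scala-2.10.4-spark-1.5.0-hadoop-2.4.0
docker run --name spark-notebook -p 9000:9000 -p 4040-4045:4040-4045 andypetrella/spark-notebook:0.6.1-scala-2.10.4-spark-1.5.0-hadoop-2.4.0

The --name flag is optional and purely illustrative; it just gives the container a memorable name so you can docker stop spark-notebook later. Ports 4040-4045 expose the Spark UIs of the first few running Spark contexts (Spark starts at 4040 and increments for each additional context).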

The <tag> is actually a composition of:

  • the notebook version
  • the Scala version
  • the Spark version
  • the Hadoop version
  • whether Spark is built with Hive support
  • whether Spark is built with Parquet support

Hence, the tag 0.6.1-scala-2.10.4-spark-1.5.0-hadoop-2.4.0 stands for:

  • notebook version 0.6.1
  • Scala version 2.10.4
  • Spark version 1.5.0
  • Hadoop version 2.4.0

And the tag 0.6.1-scala-2.10.4-spark-1.5.0-hadoop-2.4.0-with-hive-with-parquet adds:

  • Hive support
  • Parquet support
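
As a sketch, pulling and running that Hive/Parquet flavor would look like this (assuming the tag is published):

docker pull andypetrella/spark-notebook:0.6.1-scala-2.10.4-spark-1.5.0-hadoop-2.4.0-with-hive-with-parquet
docker run -p 9000:9000 -p 4040-4045:4040-4045 andypetrella/spark-notebook:0.6.1-scala-2.10.4-spark-1.5.0-hadoop-2.4.0-with-hive-with-parquet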

Check the tags page to see whether your version is published.
If your combination doesn't exist, please go build it on the generator page.

Downloading the image can take some time (as usual); the container itself, however, starts within seconds.

To start hacking on some Spark, just browse to http://localhost:9000.
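
If the page doesn't come up right away, a quick sanity check that the Play server is answering (a plain HTTP probe, nothing specific to this image) is:

curl -sI http://localhost:9000 | head -n 1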

For more information about the Notebook's features, refer to the README.md.

You can also poke us on Twitter: @SparkNotebook, @DataFellas, or even @noootsab.

Docker Pull Command
docker pull andypetrella/spark-notebook
Owner
andypetrella

Comments (5)
yiyang186
2 years ago

Can you provide the Dockerfile over HTTP? I always get "connection timed out" when I use "docker pull", which doesn't seem to support resuming an interrupted transfer.

georg45df34
2 years ago

Hi,
I tried to docker pull using:

docker pull andypetrella/spark-notebook:master-scala-2.11.6-spark-1.5.1-hadoop-2.7.1
docker run -p 9000:9000 andypetrella/spark-notebook:master-scala-2.11.6-spark-1.5.1-hadoop-2.7.1
Pulling repository docker.io/andypetrella/spark-notebook
Tag master-scala-2.11.6-spark-1.5.1-hadoop-2.7.1 not found in repository docker.io/andypetrella/spark-notebook
bash-3.2$ docker run -p 9000:9000 andypetrella/spark-notebook:master-scala-2.11.6-spark-1.5.1-hadoop-2.7.1
Unable to find image 'andypetrella/spark-notebook:master-scala-2.11.6-spark-1.5.1-hadoop-2.7.1' locally
Pulling repository docker.io/andypetrella/spark-notebook
Tag master-scala-2.11.6-spark-1.5.1-hadoop-2.7.1 not found in repository docker.io/andypetrella/spark-notebook
but apparently this is not possible.
What is wrong?

alpinegizmo
2 years ago

I found a solution to this problem at https://github.com/andypetrella/spark-notebook/issues/327.

anandrajj
2 years ago

It could be because port 9000 is already in use on the host system. Try using a different host port, something like 10001:9000.
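
For example (same image, only the host-side port changes):

docker run -p 10001:9000 -p 4040-4045:4040-4045 andypetrella/spark-notebook:<tag>

and then browse to http://localhost:10001 instead.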

rahulcdocker
2 years ago

I downloaded the image with the tag:
0.6.1-scala-2.10.4-spark-1.5.0-hadoop-2.6.0-cdh5.4.4-with-hive-with-parquet

After everything was successfully downloaded, I tried

docker run -p 9000:9000 -p 4040-4045:4040-4045 andypetrella/spark-notebook:0.6.1-scala-2.10.4-spark-1.5.0-hadoop-2.6.0-cdh5.4.4-with-hive-with-parquet

Below is the log output that was generated:

Play server process ID is 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/docker/lib/org.slf4j.slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/docker/lib/ch.qos.logback.logback-classic-1.1.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (net.sf.ehcache.config.ConfigurationFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using default LocalFilesystemCluster for integration testing
Exception in thread "main" java.lang.NoSuchMethodError: tachyon.util.NetworkUtils.getPort(Lorg/apache/thrift/transport/TServerSocket;)I
at tachyon.master.TachyonMaster.<init>(TachyonMaster.java:142)
at tachyon.master.LocalTachyonMaster.<init>(LocalTachyonMaster.java:109)
at tachyon.master.LocalTachyonMaster.create(LocalTachyonMaster.java:162)
at tachyon.master.LocalTachyonCluster.start(LocalTachyonCluster.java:189)
at tachyon.master.LocalTachyonCluster.start(LocalTachyonCluster.java:158)
at notebook.share.Tachyon$.start$lzycompute(Tachyon.scala:60)
at notebook.share.Tachyon$.start(Tachyon.scala:59)
at Global$.onStart(Global.scala:11)
at play.api.GlobalPlugin.onStart(GlobalSettings.scala:220)
at play.api.Play$$anonfun$start$1$$anonfun$apply$mcV$sp$1.apply(Play.scala:91)
at play.api.Play$$anonfun$start$1$$anonfun$apply$mcV$sp$1.apply(Play.scala:91)
at scala.collection.immutable.List.foreach(List.scala:318)
at play.api.Play$$anonfun$start$1.apply$mcV$sp(Play.scala:91)
at play.api.Play$$anonfun$start$1.apply(Play.scala:91)
at play.api.Play$$anonfun$start$1.apply(Play.scala:91)
at play.utils.Threads$.withContextClassLoader(Threads.scala:21)
at play.api.Play$.start(Play.scala:90)
at play.core.StaticApplication.<init>(ApplicationProvider.scala:55)
at play.core.server.NettyServer$.createServer(NettyServer.scala:244)
at play.core.server.NettyServer$$anonfun$main$3.apply(NettyServer.scala:280)
at play.core.server.NettyServer$$anonfun$main$3.apply(NettyServer.scala:275)
at scala.Option.map(Option.scala:145)
at play.core.server.NettyServer$.main(NettyServer.scala:275)
at play.core.server.NettyServer.main(NettyServer.scala)

Any help would be appreciated.
Thanks