Short Description
Apache Zeppelin with a preconfigured (one node) Apache Spark 2.0
Full Description

Run :

docker run -d -p <ZEPPELIN_HOST_PORT>:8080 \
-p <SPARK_UI_WORKER_HOST_PORT>:4040 \
-p <SPARK_UI_MASTER_HOST_PORT>:8084 \
-v <ZEPPELIN_NOTEBOOK_HOST_PATH>:/home/udl_spark \
-e "SPARK_WORKER_CORES=3" \
-e "SPARK_WORKER_MEMORY=1G" \
--name=spark_zeppelin_standalone \
zenika/spark_zeppelin_standalone:latest

Options :

  • <SPARK_UI_MASTER_HOST_PORT> : Host port for the Spark master web UI. Replace it with the desired host port
  • <SPARK_UI_WORKER_HOST_PORT> : Host port for the Spark worker web UI. Replace it with the desired host port
  • <ZEPPELIN_HOST_PORT> : Host port for the Zeppelin web UI. Replace it with the desired host port
  • <ZEPPELIN_NOTEBOOK_HOST_PATH> : Host path where Zeppelin notebooks are stored. For the first run, specify a path that does not exist yet. See "Zeppelin notebooks shared volume management" for more information
  • SPARK_WORKER_CORES and SPARK_WORKER_MEMORY : Cores and memory available to the Spark worker. Change them to the desired settings if necessary
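
As an illustration, the placeholders can be filled in as shown in this minimal sketch. The port numbers and the notebook path are arbitrary example values (assumptions, not defaults of the image), and `run` is a hypothetical wrapper that only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Example values only (assumptions): pick any free host ports and a host path.
ZEPPELIN_HOST_PORT=8080
SPARK_UI_WORKER_HOST_PORT=4040
SPARK_UI_MASTER_HOST_PORT=8084
ZEPPELIN_NOTEBOOK_HOST_PATH="$HOME/zeppelin_notebooks"

# Hypothetical helper: prints the command by default, executes it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Same command as in the "Run" section, with the placeholders substituted.
run docker run -d \
  -p "${ZEPPELIN_HOST_PORT}:8080" \
  -p "${SPARK_UI_WORKER_HOST_PORT}:4040" \
  -p "${SPARK_UI_MASTER_HOST_PORT}:8084" \
  -v "${ZEPPELIN_NOTEBOOK_HOST_PATH}:/home/udl_spark" \
  -e "SPARK_WORKER_CORES=3" \
  -e "SPARK_WORKER_MEMORY=1G" \
  --name=spark_zeppelin_standalone \
  zenika/spark_zeppelin_standalone:latest
```

Check the printed command first, then run the script again with DRY_RUN=0 to actually start the container.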

Connect to Zeppelin :

Spark and Zeppelin start automatically with the container.
Open localhost:<ZEPPELIN_HOST_PORT> in your browser.
Spark is already configured, so you can use an existing notebook right away or create a new one containing Spark code.
Notebook changes are not lost when the container stops: notebooks are persisted in the <ZEPPELIN_NOTEBOOK_HOST_PATH> host path and automatically reloaded when the container starts.
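
Zeppelin may need a little time before its web UI answers. The following sketch polls the UI until it responds; it assumes `curl` is installed on the host, the port is an example value, and both helper functions are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
ZEPPELIN_HOST_PORT=8080   # example value (assumption)

# Hypothetical helper: builds the URL of the Zeppelin web UI on the local host.
zeppelin_url() {
  echo "http://localhost:$1"
}

# Hypothetical helper: polls the URL every 2 seconds until it answers
# successfully or the attempts (default 30) run out.
wait_for_zeppelin() {
  url=$1
  attempts=${2:-30}
  while [ "$attempts" -gt 0 ]; do
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 2
  done
  return 1
}

# Usage:
#   wait_for_zeppelin "$(zeppelin_url "$ZEPPELIN_HOST_PORT")" && echo "Zeppelin is up"
```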

What's inside the container ?

  • Spark 2.0 pre-built for Hadoop 2.6 and later (package downloaded from the Spark website, unmodified)
  • Apache Zeppelin built from the October 2016 GitHub master branch (0.7-SNAPSHOT). To reduce the image size, only the Spark, md, Angular and sh interpreters are included (the Cassandra, Flink, Hive, Ignite, Kylin, Lens, Phoenix, PostgreSQL and Tajo interpreters were removed)

Zeppelin notebooks shared volume management

  • Inside the container, Zeppelin is configured to store notebooks in /home/udl_spark
  • While the container is running, /home/udl_spark is mounted as a shared volume (<ZEPPELIN_NOTEBOOK_HOST_PATH> on the host)
  • When the container runs for the first time, /home/udl_spark must be empty (the best way is to give a non-existing path in <ZEPPELIN_NOTEBOOK_HOST_PATH>, so that Docker creates it)
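
The first-run requirement can be checked on the host before starting the container. This is a minimal sketch: `notebook_path_ok` is a hypothetical helper, and the path is an example value (an assumption). It accepts a path that does not exist yet or an existing empty directory.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the path is safe for a first run, i.e. it
# does not exist yet or is an existing directory with no entries.
notebook_path_ok() {
  path=$1
  if [ ! -e "$path" ]; then
    return 0
  fi
  if [ -d "$path" ] && [ -z "$(ls -A "$path")" ]; then
    return 0
  fi
  return 1
}

# Example value (assumption); replace with your own host path.
ZEPPELIN_NOTEBOOK_HOST_PATH="$HOME/zeppelin_notebooks"

if notebook_path_ok "$ZEPPELIN_NOTEBOOK_HOST_PATH"; then
  echo "OK for a first run: $ZEPPELIN_NOTEBOOK_HOST_PATH"
else
  echo "Not empty, choose another path for a first run: $ZEPPELIN_NOTEBOOK_HOST_PATH"
fi
```

On later runs the check is not needed, since the directory is expected to contain the notebooks saved earlier.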
Owner
zenika