Zeppelin on Docker
This project helps you try out Spark with Python, R, and Scala through Zeppelin, a web-based notebook GUI.
For demonstrations, please refer to the official Zeppelin site.
- Spark: 2.0
- Zeppelin: 0.7.1
- Python: 2.7
- R: 3.3.1
- Scala: 2.11
Zeppelin and Spark will be running inside the Docker container. To access the web GUI and import data, you have to specify the port forwardings and volume attachments in the Docker command.
Check Available Ports
Find two available ports on your host machine through which Zeppelin and the Spark-UI can be accessed from the outside world.
On a Linux console, use the command sudo netstat -nlp to list the ports currently being listened on, for example:
    root@ubuntu:~# sudo netstat -nlp
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1186/sshd
    tcp6       0      0 :::22                   :::*                    LISTEN      1186/sshd
    Active UNIX domain sockets (only servers)
This means port 22 is occupied by the process named sshd, so we should choose integers (1-65535) other than 22 for Zeppelin and Spark.
Choose the Binding Ports on Host
Continuing with the previous example, choose two integers between 1 and 65535 other than 22, for example 32770 and 32771.
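As an alternative to reading the netstat output by hand, the check can be sketched in Python. This is a minimal sketch, not part of the project: the function names is_port_free and find_free_ports are hypothetical, and the starting port 32770 is only the example value used above. The result is a snapshot, so another process could still take a port before Docker binds it.

```python
import socket


def is_port_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Port already in use, or not permitted for this user.
            return False
    return True


def find_free_ports(count=2, start=32770, end=65535):
    """Collect the first `count` free ports in the range [start, end]."""
    free = []
    for port in range(start, end + 1):
        if is_port_free(port):
            free.append(port)
            if len(free) == count:
                return free
    raise RuntimeError("not enough free ports in the requested range")


if __name__ == "__main__":
    # Prints two candidate host ports for Zeppelin and the Spark-UI.
    print(find_free_ports())
```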
Specify the Volume Attachment
To keep your notebooks and data persistent, choose a directory path on your host machine for storage.
Note that the directory need not exist beforehand; Docker creates it for you if it does not exist.
Docker Run Command
Following the above examples, we use the following command to run Zeppelin in a Docker container:
    root@ubuntu:~# sudo docker run -itd \
    > -p 32771:8080 \
    > -p 32770:4040 \
    > -v /usr/zeppelin_dir:/workspace \
    > robinlin/zeppelin \
    > /bin/bash
Or as a one-line command:
    sudo docker run -itd -p 32771:8080 -p 32770:4040 -v /usr/zeppelin_dir:/workspace robinlin/zeppelin /bin/bash
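To make it clearer which part of the command comes from which choice, the command can be assembled from the two host ports and the host directory. The following is only an illustrative sketch: build_docker_run is a hypothetical helper name, and the container-side values (8080, 4040, /workspace) are taken from the command in this README. It only builds the argument list and does not start a container.

```python
import shlex


def build_docker_run(zeppelin_port, spark_ui_port, host_dir,
                     image="robinlin/zeppelin"):
    """Return the `docker run` argument list used in this README."""
    for port in (zeppelin_port, spark_ui_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    return [
        "sudo", "docker", "run", "-itd",
        "-p", f"{zeppelin_port}:8080",   # host port -> Zeppelin web GUI
        "-p", f"{spark_ui_port}:4040",   # host port -> Spark-UI
        "-v", f"{host_dir}:/workspace",  # host directory -> container volume
        image,
        "/bin/bash",
    ]


if __name__ == "__main__":
    # Example values from this README: ports 32771 and 32770, /usr/zeppelin_dir.
    print(shlex.join(build_docker_run(32771, 32770, "/usr/zeppelin_dir")))
```

The list form can be passed to subprocess.run as-is if you prefer to launch the container from Python; shlex.join (Python 3.8+) is used here only to print it as a shell command.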
Access Zeppelin from Browser
In the above example, the Zeppelin service is bound to port 32771, while port 32770 is for the Spark-UI.
Open your browser and type into the address bar, for example, http://hostmachine.example.tw:32771; you will see the Zeppelin welcome page.
Note: If your host is a virtual machine in a cloud such as GCE or AWS EC2, make sure your firewall rules allow TCP traffic on the specified ports, 32771 and 32770 in this example.
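If the welcome page does not load, a quick TCP check from your own machine can help tell a firewall or port problem apart from a browser problem. This is a rough sketch under assumptions: can_connect is a hypothetical helper, and a successful connection only shows that something accepts TCP on that port, not that Zeppelin itself is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For the example in this README, calling can_connect("hostmachine.example.tw", 32771) would test the Zeppelin port; a False result from outside the host, combined with a True result on the host itself, would point toward the firewall rules.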
Import and Load Data
As of this version of Zeppelin (0.6.1), uploading and downloading data through the web GUI is not supported. You can only place your data in the attached volume from the host.
The following steps show how to load and save your data, using the volume attached in the Docker run command above.
- Data Import: Move your data to /usr/zeppelin_dir on the host, e.g. cp user_data.csv /usr/zeppelin_dir
- Load Data: In your Zeppelin notebook, use the path /workspace/user_data.csv to read the file, e.g. user_data = sc.textFile('/workspace/user_data.csv')
- Save Data: In your Zeppelin notebook, save your data to a path in the attached volume, e.g. user_data2.save('/workspace/user_data2.csv'); you can then find the file user_data2.csv in the attached directory on your host.
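The steps above rely on one rule: a file under the host directory appears under /workspace inside the container with the same relative path. A small sketch of that mapping follows, assuming the volume from the Docker command in this README; to_container_path is a hypothetical helper, not something provided by Zeppelin or Docker.

```python
from pathlib import PurePosixPath


def to_container_path(host_path, host_dir="/usr/zeppelin_dir",
                      container_dir="/workspace"):
    """Translate a path under the host directory to its path in the container."""
    # relative_to raises ValueError if host_path is outside host_dir,
    # i.e. the file would not be visible inside the container.
    relative = PurePosixPath(host_path).relative_to(host_dir)
    return str(PurePosixPath(container_dir) / relative)


if __name__ == "__main__":
    # The path to use in the notebook for a file copied to the host directory.
    print(to_container_path("/usr/zeppelin_dir/user_data.csv"))
```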
Update to Latest
sudo docker pull robinlin/zeppelin
For more examples, please refer to my GitHub.