A cloudera/quickstart image configured to use PySpark with IPython and Jupyter.
The custom Docker image mvervuurt/cloudera-quickstart-miniconda is created from the official cloudera/quickstart image. You can find the Dockerfile on my GitHub repo. It builds on the cloudera/quickstart image and installs Miniconda. Miniconda is then used to install useful Python packages (numpy, pandas, etc.). Afterwards it sets the environment variables needed to use PySpark with IPython and Jupyter.
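As a rough sketch of that last step, the two variables below are the ones the pyspark launcher reads to choose the driver's Python front end. The values are assumptions for illustration (the port is chosen to match the 8889 used in the run command); the actual Dockerfile may set them differently.

```shell
# Assumed values, not copied from the Dockerfile.
# PYSPARK_DRIVER_PYTHON names the executable the pyspark launcher starts as the driver.
export PYSPARK_DRIVER_PYTHON=jupyter
# Options handed to that executable: serve a notebook on port 8889 without
# opening a browser, listening on all interfaces so the host can reach it.
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --ip=0.0.0.0 --port=8889"
```

With variables like these exported, starting pyspark opens a Jupyter notebook server instead of the plain Python shell.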
Once you have successfully pulled the Docker image, you can create a Docker container with the following command:
docker run --hostname=quickstart.cloudera --privileged=true --name=cloudera -t -i -d -p 8889:8889 -v /src-changeme/notebooks:/media/notebooks mvervuurt/cloudera-quickstart-miniconda:tag# /usr/bin/docker-quickstart

Remember to replace the path src-changeme with the path to your notebooks directory, and to set the proper image tag in place of tag# after looking it up on Docker Hub.
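One way to avoid editing the command by hand is to fill the two placeholders from shell variables and print the result for review before running it. This is only a sketch: NOTEBOOK_DIR, IMAGE_TAG, and the docker_run_cmd helper are hypothetical names, and the default values are stand-ins, not real settings of the image.

```shell
# Hypothetical defaults: point NOTEBOOK_DIR at your own notebooks directory
# and set IMAGE_TAG to a tag that actually exists on Docker Hub.
NOTEBOOK_DIR="${NOTEBOOK_DIR:-$HOME/notebooks}"
IMAGE_TAG="${IMAGE_TAG:-latest}"

# Print the full docker run command with the placeholders filled in.
docker_run_cmd() {
  printf '%s ' docker run --hostname=quickstart.cloudera --privileged=true \
    --name=cloudera -t -i -d -p 8889:8889 \
    -v "$NOTEBOOK_DIR:/media/notebooks" \
    "mvervuurt/cloudera-quickstart-miniconda:$IMAGE_TAG" \
    /usr/bin/docker-quickstart
  printf '\n'
}

docker_run_cmd   # shows the command; nothing is started yet
```

After checking the printed line, you can paste it into the terminal to create the container. A notebooks path that contains spaces would need quoting when pasted.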
Afterwards you can run the following Docker command:
docker exec -ti cloudera /bin/bash

Once you have a Bash terminal inside your Docker container, you can start PySpark from there.
You can now point your browser at the Jupyter notebook, enabled with IPython and PySpark, using the following link: http://container-ip:8889. Remember to replace container-ip with the IP address of your container or docker-machine.
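If you are unsure which address to use, the sketch below builds the link from whatever address you pass in. The notebook_url helper is a hypothetical name, and the commented lookups assume the container is named cloudera as in the run command; "default" is only an example docker-machine name.

```shell
# Build the notebook link for a given address (port 8889 as published above).
notebook_url() {
  printf 'http://%s:8889\n' "$1"
}

# Possible ways to obtain the address (each needs a running Docker setup):
#   notebook_url "$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' cloudera)"
#   notebook_url "$(docker-machine ip default)"

# On a local Docker host the published port is usually reachable via localhost.
notebook_url localhost   # prints http://localhost:8889
```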