
Last pushed: a year ago
Short Description
Spark Notebook Demo container
Full Description

Demo ready container

It bundles tools like Spark, Cassandra, Kafka and, of course, the spark-notebook.

Problems? Check the troubleshooting section at the bottom of this page, add a comment, or contact me by email.

Usage

Spark 2.0.0 preview

You can now use the latest Spark version, 2.0.0-preview at the time of writing, painlessly, without constraints, and within minutes.

To do so, pull the image, run it, and start the services, as described below.

Pull

docker pull andypetrella/spark-notebook-demo:master-2.0.0-preview

Run

docker run --rm -it --net=host -m 8g andypetrella/spark-notebook-demo:master-2.0.0-preview bash

Start

The following three scripts set up all the tools (including Cassandra and Kafka) and configure them with predefined content (such as keyspaces or topics).

source var.sh
source start.sh
source create.sh

Access

Open port 9000 on your Docker VM's IP (or on localhost on a Linux box) in a browser, then create a new notebook in the interface to start using Spark 2.0.0.

On Linux: http://localhost:9000; on Mac: http://<VM-IP>:9000
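
As a quick check before opening the browser, the sketch below builds the notebook URL and probes it with curl. The `notebook_url` and `wait_for_notebook` helpers are hypothetical names introduced here for illustration; they are not shipped with the image, and the defaults (localhost, port 9000) simply mirror the text above.

```shell
# Hypothetical helpers, not part of the image.

# Build the spark-notebook URL for a host (default: localhost)
# and a port (default: 9000).
notebook_url() {
  local host="${1:-localhost}"
  local port="${2:-9000}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# Poll the URL until it answers or the attempts run out.
# Usage: wait_for_notebook URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_notebook() {
  local url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      echo "spark-notebook is reachable at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "spark-notebook did not answer at $url" >&2
  return 1
}
```

On Linux, `wait_for_notebook "$(notebook_url)"` should be enough. On Mac/Win with Docker Machine, something like `wait_for_notebook "$(notebook_url "$(docker-machine ip default)")"` may work, assuming your machine is named `default`.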

TROUBLESHOOTING

Resources (VM)

You're in a Docker container running many things, so make sure your VM (on Mac/Win) has enough memory and CPUs allocated to it.
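
One way to see what the Docker daemon actually has available is `docker info`, which reports the CPU count and total memory. The sketch below compares those values against minimums. The helper names are hypothetical, the 8 GiB default mirrors the `-m 8g` in the run command, and the 2-CPU default is only an assumption for illustration.

```shell
# Hypothetical helper: succeed when the CPU count and memory (in bytes)
# meet the minimums. Defaults: 2 CPUs (an assumption), 8 GiB (from -m 8g).
# Usage: enough_resources CPUS MEM_BYTES [MIN_CPUS] [MIN_GIB]
enough_resources() {
  local cpus="$1"
  local mem_bytes="$2"
  local min_cpus="${3:-2}"
  local min_gib="${4:-8}"
  local min_bytes=$((min_gib * 1024 * 1024 * 1024))
  [ "$cpus" -ge "$min_cpus" ] && [ "$mem_bytes" -ge "$min_bytes" ]
}

# Ask the Docker daemon for its CPU count and total memory, then compare.
check_docker_resources() {
  local info cpus mem_bytes
  info="$(docker info --format '{{.NCPU}} {{.MemTotal}}')" || return 1
  cpus="${info%% *}"       # first field: number of CPUs
  mem_bytes="${info##* }"  # second field: total memory in bytes
  if enough_resources "$cpus" "$mem_bytes" "$@"; then
    echo "OK: $cpus CPUs, $mem_bytes bytes of memory"
  else
    echo "Too small: $cpus CPUs, $mem_bytes bytes of memory" >&2
    return 1
  fi
}
```

Run `check_docker_resources` on the host before starting the container. The reported memory may be slightly below the nominal VM allocation, so you may want to pass a lower threshold, for example `check_docker_resources 2 7`.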

IP VM

On Mac/Win, you'll need to use the VM's IP instead of localhost to access the services (like the spark-notebook on port 9000).

Out of resources

Even if you allocated enough resources to your VM, opening too many notebooks may cause performance issues, since each notebook starts a new JVM (with a Spark driver in it). So make sure to shut down unused notebooks (see the running tab in the main UI).

Owner
andypetrella