Public Repository

Last pushed: 8 months ago
Short Description
JupyterHub image integrated with Spark, Python and R
Full Description

JupyterHub image integrated with Spark

Steps to set it up

1 - To create the container, run:

docker run -d -p 8000:8000 --name jupyterhub hselvaggi/jupyterhub_spark

This lets you access Jupyter at localhost:8000. There is a default user, guest / guest, so you can skip step 2 unless you want to set up more users.
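
To confirm that the hub is answering before you try to log in, a small check like the one below can help. This is only a sketch and not part of the image: hub_is_reachable is a hypothetical helper name, and the default URL assumes the port mapping from the docker run command above.

```python
import urllib.error
import urllib.request


def hub_is_reachable(url="http://localhost:8000", timeout=5.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    # This is a local check, so bypass any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: nothing is listening yet.
        return False


if __name__ == "__main__":
    print("hub reachable:", hub_is_reachable())
```

The container may need a few seconds after docker run before the hub starts listening, so a False result right after startup is not necessarily an error.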

2 - To create a user who can log in to Jupyter, run the following two commands:

docker exec -it jupyterhub /bin/bash
adduser <username to login in jupyter>

After filling in all the data requested by adduser, you can exit the container by running

exit

3 - Some work is still in progress. In the meantime, you need to run the following lines to properly instantiate the Spark context.

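from pyspark import SparkContext, SparkConf

# local[*] runs Spark inside the container using all available cores
conf = SparkConf().setAppName('Your application name').setMaster('local[*]')
# getOrCreate reuses an already running context instead of failing on a second call
sc = SparkContext.getOrCreate(conf)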
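
Once sc exists, running a tiny job is a quick way to confirm that the context works. The following is a minimal sketch and not part of the image: count_even_numbers is a hypothetical helper name, and it assumes sc is the SparkContext created by the lines above.

```python
def count_even_numbers(sc, upper):
    """Count the even integers in [0, upper) with an RDD on the given context."""
    # Distribute the numbers as an RDD, keep the even ones, then count them.
    numbers = sc.parallelize(range(upper))
    return numbers.filter(lambda n: n % 2 == 0).count()
```

In a notebook cell, count_even_numbers(sc, 1000) should return 500 if the context is working.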

Python 2, Python 3 and R kernels are available.

Notes: The latest version removes the need to set the SPARK_HOME path in Python's os.environ. The template for new users has been fixed, so creating new users requires no configuration from the administrator.
Support for reading data from a Cassandra database has been added.

Owner
hselvaggi