Public Repository

Last pushed: 3 years ago
Short Description
Base Docker image for the Spark course delivered by Data Fellas
Full Description

Welcome to the Spark 4 Devs course by Data Fellas.

Here you can pull the image that will be used throughout the three days!

Pull the docker image

docker pull datafellas/spark-training

Run it

Run the image

docker run -it --rm --name spark-training -p 9000:9000 -p 4040-4060:4040-4060 --volume "$SSH_AUTH_SOCK":/ssh-agent --env SSH_AUTH_SOCK=/ssh-agent datafellas/spark-training /bin/bash

Note: we map every port from 4040 to 4060, so about 20 open notebooks should be enough.

Warning: the 4040-4060 range mapping might not work with some Docker versions. In that case, map each port individually, as is done for 9000.
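
If the range mapping is rejected, the individual flags do not have to be typed by hand. The following is a minimal sketch, not part of the original image or course material: port_flags is a hypothetical helper name, and it only prints the flags so they can be reviewed before use.

```shell
# Hypothetical helper: print one "-p PORT:PORT" flag per port in [first, last].
# Intended for Docker versions that do not accept a range such as 4040-4060.
port_flags() {
  first=$1
  last=$2
  flags=""
  port=$first
  while [ "$port" -le "$last" ]; do
    flags="$flags -p $port:$port"
    port=$((port + 1))
  done
  # Drop the leading space before printing.
  echo "${flags# }"
}

# Print the flags for the same range as the run command above.
port_flags 4040 4060
```

The printed flags can replace -p 4040-4060:4040-4060 in the run command, for example by writing $(port_flags 4040 4060) in its place.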

Starting the training

When in bash, you need to launch



Run Image: SSH-AGENT

This is required to privately access the latest git changes using the git:// protocol, which will use SSH. Two things are needed:


  1. Your SSH key must be known by the training git repo.
  2. Your host must add this key to an ssh-agent.
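
The second point can be checked on the host before starting the container, since the run command mounts whatever $SSH_AUTH_SOCK points to. The sketch below is an assumption about how such a check might look, not something shipped with the training: agent_status is a hypothetical function name, and it only inspects the environment variable and the socket file.

```shell
# Hypothetical pre-flight check: report whether an ssh-agent socket is visible.
agent_status() {
  if [ -z "${SSH_AUTH_SOCK:-}" ]; then
    # No agent has been started in this shell, or its variables were not exported.
    echo "no-agent"
  elif [ ! -S "$SSH_AUTH_SOCK" ]; then
    # The variable is set, but it does not point to an existing socket.
    echo "stale-socket"
  else
    echo "ok"
  fi
}

agent_status
```

Any result other than ok suggests that the --volume mapping in the run command would not give the container a usable agent.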

Some problems may occur with ssh-agent. For instance, on EC2 you may see:

Could not open a connection to your authentication agent.

The fix, taken from a Stack Overflow answer:

You'll need to execute eval $(ssh-agent), then exec ssh-agent bash, and finally ssh-add ~/.ssh/<your-key>.
