Public Repository

Last pushed: 9 months ago
Short Description
An image I've made for an introductory workshop on Apache Spark
Full Description

Starts a Jupyter notebook with Python 3.6.1 and Spark 2.1.1. If the MODE environment variable is set to CLUSTER, it also starts a standalone Spark worker (slave) that connects to the master specified in the SPARK_MASTER_URI environment variable.

Environment variables:

  • NOTEBOOK_PORT: the port the notebook will bind to (defaults to 8888)
  • HOST: the public address of the container (defaults to
  • SPARK_MASTER_URI: the URI of the Spark Master (defaults to local[*])
  • SPARK_DRIVER_UI_PORT: the port the UI of the driver will bind to (defaults to 4040)
  • SPARK_DRIVER_PORT: the port the driver will bind to (defaults to 4039)
  • SPARK_BLOCK_MANAGER_PORT: the port the block manager will bind to (defaults to 4038)
  • SPARK_WORKER_PORT: the port the worker hosted in the container will bind to (defaults to 4037)
  • SPARK_WORKER_WEBUI_PORT: the port the worker UI in the container will bind to (defaults to 8081)
  • USER: optionally injects the host's user name as context (defaults to ws-user)
  • MODE: either LOCAL to run a local Spark instance, or CLUSTER to start a worker and connect to an external master (defaults to LOCAL)
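
As a rough illustration of how the variables above fit together, the following Python sketch assembles a `docker run` argument list from the documented defaults. It is only a sketch under stated assumptions: the image name `example/spark-workshop`, the helper `docker_run_args`, and the choice of which ports to publish are hypothetical and not taken from this repository. HOST is left out of the defaults because its default value is not given in the list above.

```python
import shlex

# Defaults copied from the list of environment variables above.
# HOST is intentionally absent: its default is not stated there.
DEFAULTS = {
    "NOTEBOOK_PORT": "8888",
    "SPARK_MASTER_URI": "local[*]",
    "SPARK_DRIVER_UI_PORT": "4040",
    "SPARK_DRIVER_PORT": "4039",
    "SPARK_BLOCK_MANAGER_PORT": "4038",
    "SPARK_WORKER_PORT": "4037",
    "SPARK_WORKER_WEBUI_PORT": "8081",
    "USER": "ws-user",
    "MODE": "LOCAL",
}

# Assumption: the notebook and driver UI are always published; the
# remaining Spark ports are only published in CLUSTER mode.
LOCAL_PORT_VARS = ["NOTEBOOK_PORT", "SPARK_DRIVER_UI_PORT"]
CLUSTER_PORT_VARS = [
    "SPARK_DRIVER_PORT",
    "SPARK_BLOCK_MANAGER_PORT",
    "SPARK_WORKER_PORT",
    "SPARK_WORKER_WEBUI_PORT",
]


def docker_run_args(image, overrides=None):
    """Build a `docker run` argument list (hypothetical helper)."""
    env = {**DEFAULTS, **(overrides or {})}
    if env["MODE"] not in ("LOCAL", "CLUSTER"):
        raise ValueError("MODE must be LOCAL or CLUSTER")

    args = ["docker", "run", "--rm"]

    # Pass every variable explicitly so the effective configuration is visible.
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]

    # Publish each port on the same host port number.
    port_vars = list(LOCAL_PORT_VARS)
    if env["MODE"] == "CLUSTER":
        port_vars += CLUSTER_PORT_VARS
    for name in port_vars:
        args += ["-p", f"{env[name]}:{env[name]}"]

    args.append(image)
    return args


if __name__ == "__main__":
    # Hypothetical image name and master address, for illustration only.
    cluster = docker_run_args(
        "example/spark-workshop",
        {"MODE": "CLUSTER", "SPARK_MASTER_URI": "spark://master.example:7077"},
    )
    print(shlex.join(cluster))
```

In CLUSTER mode you would likely also pass HOST in the overrides so the master can reach the worker; that is an assumption based on the variable's description, not something the list above confirms.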