Apache Spark on Rancher
An Apache Spark container image. The image is meant for creating a standalone cluster with multiple workers on Rancher, using the Managed network.
This image contains a script named start-spark (included in the PATH). This script is used to initialize the master and the workers.
The custom commands require an HDFS user to be set. The user's name is read from the HDFS_USER environment variable, and the user is automatically created by the commands.
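As a minimal sketch of how the variable could be supplied, the fragment below sets HDFS_USER through the environment section of a Docker Compose service. The user name sparkuser is a placeholder chosen for illustration and does not come from the image.

```yaml
master:
  image: niger/sparkrancher
  command: start-spark master
  hostname: master
  environment:
    # Assumed value: replace "sparkuser" with the HDFS user you want created.
    HDFS_USER: sparkuser
```

The same environment entry would presumably be needed on each worker service, since the workers are started through start-spark as well.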
Starting a master
To start a master, run the following command:
start-spark master
Starting a worker
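To start a worker, run the following command:
start-spark worker [HOSTNAME]
Here HOSTNAME is the hostname of the master that the worker connects to. The Docker Compose example below passes master.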
Commands such as worker from previous versions of the image are maintained for compatibility but should not be used.
Creating a Cluster with Docker Compose
Add a stack on Rancher using the following Docker Compose file:
master:
  image: niger/sparkrancher
  command: start-spark master
  hostname: master
  ports:
    - "6066:6066"
    - "7070:7070"
    - "8080:8080"
    - "50070:50070"
worker:
  image: niger/sparkrancher
  command: start-spark worker master
  ports:
    - "8081:8081"
  environment:
    SPARK_WORKER_CORES: 1
    SPARK_WORKER_MEMORY: 2g
  links:
    - master
The image has a volume mounted at /opt/hdfs. To preserve state between restarts, mount a volume at this location. This should be done for both the master and the workers.
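One possible way to do this in Docker Compose is a host bind mount, sketched below. The host paths under /data are hypothetical and should be replaced with directories that exist on your hosts. Ports and environment settings are omitted for brevity.

```yaml
master:
  image: niger/sparkrancher
  command: start-spark master
  hostname: master
  volumes:
    # Hypothetical host directory mapped to the image's HDFS volume.
    - /data/spark-master/hdfs:/opt/hdfs
worker:
  image: niger/sparkrancher
  command: start-spark worker master
  volumes:
    # Hypothetical host directory; use a separate one from the master's.
    - /data/spark-worker/hdfs:/opt/hdfs
  links:
    - master
```

Using separate host directories for the master and the workers keeps their HDFS data from overwriting each other when containers land on the same host.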