Apache Spark

An Apache Spark container image, intended for creating a standalone cluster with multiple workers.

Custom commands

This image contains a script named start-spark (available on the PATH). This script is used to initialize the master and the workers.

HDFS user

The custom commands require an HDFS user to be set. The user's name is read from the HDFS_USER environment variable, and the user is created automatically by the commands.
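As a sketch, assuming the image is run directly with docker run (the user name hdfs here is an arbitrary example, not a required value):

```shell
# The start-spark commands read HDFS_USER and create that user
# inside the container before starting any Spark process.
docker run -d -e HDFS_USER=hdfs singularities/spark start-spark master
```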

Starting a master

To start a master, run the following command:

start-spark master
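For example, a master could be started directly with Docker as follows (the container name, hostname, HDFS_USER value, and published ports are illustrative choices, not requirements of the image):

```shell
# Start a standalone master; 8080 is the standard Spark master
# web UI port and 7077 the standard master RPC port.
docker run -d --name spark-master --hostname master \
  -e HDFS_USER=hdfs \
  -p 8080:8080 -p 7077:7077 \
  singularities/spark start-spark master
```

The web UI should then be reachable on the host at port 8080.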

Starting a worker

To start a worker, run the following command:

start-spark worker [MASTER]
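For example, a worker started with plain Docker can be pointed at a master by hostname. This sketch assumes a master container named spark-master is already running (all names and the HDFS_USER value are illustrative):

```shell
# Link the worker container to the master container so that the
# hostname "master" resolves, then register the worker with it.
docker run -d --name spark-worker-1 \
  --link spark-master:master \
  -e HDFS_USER=hdfs \
  singularities/spark start-spark worker master
```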

Deprecated commands

The master and worker commands from previous versions of the image are kept for backwards compatibility but should no longer be used.

Creating a Cluster with Docker Compose

The easiest way to create a standalone cluster with this image is by using Docker Compose. The following snippet can be used as a docker-compose.yml for a simple cluster:

version: "2"

services:
  master:
    image: singularities/spark
    command: start-spark master
    hostname: master
    ports:
      - "6066:6066"
      - "7070:7070"
      - "8080:8080"
      - "50070:50070"
  worker:
    image: singularities/spark
    command: start-spark worker master
    environment:
      SPARK_WORKER_CORES: 1
      SPARK_WORKER_MEMORY: 2g
    links:
      - master
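With the snippet saved as docker-compose.yml, the cluster can then be brought up in the background:

```shell
# Start the master and one worker in detached mode.
docker-compose up -d

# Inspect the master's log to confirm the worker registered.
docker-compose logs master
```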

Persistence

The image has a volume mounted at /opt/hdfs. To preserve HDFS state between restarts, mount a volume at this location. This should be done for both the master and the workers.
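For example, a named volume (spark-hdfs-master here is an arbitrary name) can be mounted when starting a master directly:

```shell
# Mount a named volume at /opt/hdfs so HDFS data survives container
# restarts; Docker creates the volume if it does not exist yet.
docker run -d -e HDFS_USER=hdfs \
  -v spark-hdfs-master:/opt/hdfs \
  singularities/spark start-spark master
```

With Docker Compose, the equivalent is a volumes entry mapping a named volume to /opt/hdfs on each service.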

Scaling

If you wish to increase the number of workers, scale the worker service with the scale command as follows:

docker-compose scale worker=2

The workers will automatically register themselves with the master.
