Spark

Spark-Base

Baseline with:

  • Spark build: spark-1.2.1-bin-hadoop2.4
  • Scala: 2.11.4
  • SBT: 0.13.6

Spark-Master

Optional parameters to configure High Availability with ZooKeeper:

  • SPARK_DEPLOY_ZOOKEPER_URL=<zookeeper:port>,<zookeeper:port>
  • SPARK_DEPLOY_RECOVERYMODE=${SPARK_DEPLOY_RECOVERYMODE:-"ZOOKEEPER"}
  • SPARK_DEPLOY_ZOOKEPER_DIR=${SPARK_DEPLOY_ZOOKEPER_DIR:-"spark"}

Command-line to create a cluster:

  • docker run raisonata/spark:master

Command-line to join an existing cluster:

  • docker run raisonata/spark:master <master_ip>

Command-line to join an existing cluster (ZooKeeper):

  • docker run -e SPARK_DEPLOY_ZOOKEPER_URL=<zookeeper_ip:port> raisonata/spark:master <master_ip>
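
For reference, a minimal HA sketch combining the commands above, assuming a ZooKeeper ensemble is already reachable at zk1:2181 and zk2:2181 (hypothetical addresses) and that <master_ip> is the address of the first master:

  # Start the first master with ZooKeeper-based recovery (addresses are examples)
  docker run -d -e SPARK_DEPLOY_ZOOKEPER_URL=zk1:2181,zk2:2181 raisonata/spark:master

  # Start a standby master pointing at the same ensemble and at the first master
  docker run -d -e SPARK_DEPLOY_ZOOKEPER_URL=zk1:2181,zk2:2181 raisonata/spark:master <master_ip>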

Spark-Worker

Command-line to add a worker (specifying <master_ip> is mandatory):

  • docker run raisonata/spark:worker <master_ip>
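
To cap the resources a worker offers, the SPARK_WORKER_* variables documented below can be injected at run time; a minimal sketch, assuming the entrypoint honours variables passed with -e (values are illustrative):

  # Limit the worker to 2 GB of memory for Spark applications
  docker run -d -e SPARK_WORKER_MEMORY=2g raisonata/spark:worker <master_ip>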

Spark-Shell

  • Cassandra-Connector: 1.2.0
  • Cassandra-tools: 2.1.3
  • Python: 2.7
  • Packages: libev4 libev-dev python python-support python-pip python-dev build-essential
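
Assuming the shell image follows the same tag convention as the other images (raisonata/spark:shell is hypothetical here) and takes the master address as its argument, an interactive session would look like:

  # Interactive shell against an existing master (tag and argument are assumptions)
  docker run -it raisonata/spark:shell <master_ip>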

Spark Variables

  • SPARK_MASTER_IP: Bind the master to a specific IP address, for example a public one.
  • SPARK_MASTER_PORT: Start the master on a different port (default: 7077).
  • SPARK_MASTER_WEBUI_PORT: Port for the master web UI (default: 8080).
  • SPARK_MASTER_OPTS: Configuration properties that apply only to the master, in the form "-Dx=y" (default: none). See the Spark standalone documentation for the possible options.
  • SPARK_LOCAL_DIRS: Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories on different disks.
  • SPARK_WORKER_CORES: Total number of cores to allow Spark applications to use on the machine (default: all available cores).
  • SPARK_WORKER_MEMORY: Total amount of memory to allow Spark applications to use on the machine, e.g. 1000m, 2g (default: total memory minus 1 GB); note that each application's individual memory is configured using its spark.executor.memory property.
  • SPARK_WORKER_PORT: Start the Spark worker on a specific port (default: random).
  • SPARK_WORKER_WEBUI_PORT: Port for the worker web UI (default: 8081).
  • SPARK_WORKER_INSTANCES: Number of worker instances to run on each machine (default: 1). You can make this more than 1 if you have very large machines and would like multiple Spark worker processes. If you do set this, make sure to also set SPARK_WORKER_CORES explicitly to limit the cores per worker, or else each worker will try to use all the cores.
  • SPARK_WORKER_DIR: Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work).
  • SPARK_WORKER_OPTS: Configuration properties that apply only to the worker, in the form "-Dx=y" (default: none). See the Spark standalone documentation for the possible options.
  • SPARK_DAEMON_MEMORY: Memory to allocate to the Spark master and worker daemons themselves (default: 512m).
  • SPARK_DAEMON_JAVA_OPTS: JVM options for the Spark master and worker daemons themselves, in the form "-Dx=y" (default: none).
  • SPARK_PUBLIC_DNS: The public DNS name of the Spark master and workers (default: none).
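
Any of these variables can be overridden when the container is started; a short sketch with illustrative values:

  # Run the master web UI on a different port and give the daemon more memory
  docker run -d -e SPARK_MASTER_WEBUI_PORT=8090 -e SPARK_DAEMON_MEMORY=1g raisonata/spark:master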

Configuration: Environment Variables Initialized by the Image (Defaults)

Global variables

  • export SPARK_PUBLIC_DNS=${SPARK_MASTER_IP:-$IP}
  • export SPARK_STORE_DIR=${SPARK_STORE_DIR:-'/store'}
  • export SPARK_LOCAL_DIRS=${SPARK_LOCAL_DIRS:-"${SPARK_STORE_DIR}/scratch"}
  • export SPARK_TMP_DIR=${SPARK_TMP_DIR:-"${SPARK_STORE_DIR}/tmp"}
  • export SPARK_PID_DIR=${SPARK_PID_DIR:-"${SPARK_STORE_DIR}/pids"}
  • export SPARK_LOG_DIR=${SPARK_LOG_DIR:-"${SPARK_STORE_DIR}/logs"}
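
Since scratch, temp, pid, and log directories all live under SPARK_STORE_DIR (/store by default), persisting them only requires mounting a host directory there; a minimal sketch, the host path /data/spark being hypothetical:

  # Keep scratch space and logs on the host (host path is an example)
  docker run -d -v /data/spark:/store raisonata/spark:master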

Master variables

  • export SPARK_MASTER_IP=${SPARK_MASTER_IP:?'Please define the \$SPARK_MASTER_IP'}
  • export SPARK_MASTER_PORT=${SPARK_MASTER_PORT:-7077}
  • export SPARK_MASTER_WEBUI_PORT=${SPARK_MASTER_WEBUI_PORT:-8080}
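
To reach the master and its web UI from outside the Docker host, the default ports above can be published; a sketch using those defaults:

  # Publish the master port (7077) and the master web UI (8080)
  docker run -d -p 7077:7077 -p 8080:8080 raisonata/spark:master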

Worker variables

  • export SPARK_WORKER_PORT=${SPARK_WORKER_PORT:-8888}
  • export SPARK_WORKER_WEBUI_PORT=${SPARK_WORKER_WEBUI_PORT:-8081}
  • export SPARK_WORKER_INSTANCES=${SPARK_WORKER_INSTANCES:-1}
  • export SPARK_WORKER_DIR=${SPARK_WORKER_DIR:-"${SPARK_STORE_DIR}/work"}
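
As noted in the variable list above, when running more than one worker instance per container, SPARK_WORKER_CORES should be set explicitly; a sketch with illustrative values:

  # Two worker instances, each limited to 2 cores
  docker run -d -e SPARK_WORKER_INSTANCES=2 -e SPARK_WORKER_CORES=2 raisonata/spark:worker <master_ip>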

How to run:

  • docker run -d --name "spark01.comboio01" -h "spark01.comboio01.raisonata.fr" raisonata/spark:master
  • docker run -d --name "spark02.comboio01" -h "spark02.comboio01.raisonata.fr" raisonata/spark:worker <ip_master>
  • docker run -d --name "spark03.comboio01" -h "spark03.comboio01.raisonata.fr" raisonata/spark:master <ip_master>
  • docker run -d --name "spark04.comboio01" -h "spark04.comboio01.raisonata.fr" raisonata/spark:worker <ip_master>
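
Put together, a cluster bring-up script might look like the sketch below; the docker inspect format string applies to the default bridge network, and the container and host names are taken from the examples above:

  # Start the first master, then discover its IP to pass to the other nodes
  docker run -d --name "spark01.comboio01" -h "spark01.comboio01.raisonata.fr" raisonata/spark:master
  MASTER_IP=$(docker inspect -f '{{ .NetworkSettings.IPAddress }}' spark01.comboio01)

  # Standby master and two workers joining the cluster
  docker run -d --name "spark03.comboio01" -h "spark03.comboio01.raisonata.fr" raisonata/spark:master "${MASTER_IP}"
  docker run -d --name "spark02.comboio01" -h "spark02.comboio01.raisonata.fr" raisonata/spark:worker "${MASTER_IP}"
  docker run -d --name "spark04.comboio01" -h "spark04.comboio01.raisonata.fr" raisonata/spark:worker "${MASTER_IP}"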