Last pushed: 10 months ago
Short Description
The image contains Spark 2.0 and all the dependencies needed to use MLlib, Spark's machine learning library.
Full Description

SPARK STANDALONE CLUSTER OUT-OF-THE-BOX


1. OBJECTIVE

This project launches a scalable Spark 2.0 Standalone cluster based on Docker containers.

2. REQUIREMENTS

To run this project you need Docker (version 1.10.0 or later recommended) and Docker Compose.

3. INSTRUCTIONS

  • First you'll need to build the docker image or pull it from the public repository:

docker build -t athosgvag/sprk:latest .
or
docker pull athosgvag/sprk

  • Then you'll deploy the cluster:

nohup docker-compose -p sprk up &

  • And start running your Spark jobs:

docker exec sprk01 spark-submit pathToYourScript yourArgs

  • Example job:

docker exec sprk01 spark-submit /code/python/exercise.py /data/input spark://138.19.0.2:7077

  • You may also run Spark commands interactively from Spark's python shell:

docker exec -it sprk01 pyspark --master spark://138.19.0.2:7077

  • When you're done with your jobs, just clean everything up:

./remove-cluster.sh
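
The example job above passes an input path and the master URL as arguments. As a rough sketch of what such a script could look like (this is a hypothetical line_count.py, not the project's actual exercise.py, and the application name is an assumption), the script below reads both arguments, connects to the master, and counts the lines of the input:

```python
import sys


def parse_args(argv):
    """Return (input_path, master_url), or None if the argument count is wrong."""
    if len(argv) != 3:
        return None
    return argv[1], argv[2]


def main(argv):
    args = parse_args(argv)
    if args is None:
        print("usage: line_count.py <input_path> <master_url>")
        return
    input_path, master_url = args

    # Imported lazily so the argument handling can be exercised without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(master_url)       # e.g. spark://138.19.0.2:7077
        .appName("line-count")    # hypothetical application name
        .getOrCreate()
    )
    try:
        # ./data is mounted at /data in every container, so all nodes see the same path.
        line_count = spark.sparkContext.textFile(input_path).count()
        print("lines:", line_count)
    finally:
        spark.stop()


if __name__ == "__main__":
    main(sys.argv)
```

Saved under ./code (for instance as ./code/python/line_count.py, a path chosen only for illustration), it would be submitted the same way as the example job, with /data/input and spark://138.19.0.2:7077 as its two arguments.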

4. TIPS

  • Put your scripts in ./code and your data in ./data so that they're accessible from inside the containers. Any output data you write to these directories will persist even after the cluster is removed.

  • The Spark master URL that your scripts should use to start SparkContext is spark://138.19.0.2:7077

  • Access http://138.19.0.2:8080 from your browser to check on your jobs' and worker nodes' status.

  • To scale out, copy the block named SLAVE NODE 02 in your docker-compose.yml file and change both the service name and the container_name from sprk02 to sprk03, sprk04, and so on. Example:

...
#SLAVE NODE 03
 sprk03:
  image: athosgvag/sprk
  container_name: sprk03
  volumes:
  - ./code:/code
  - ./data:/data
  command: bash -c "/code/setup/config-side-node.sh -m=138.19.0.2 -t=1200"
  networks:
  - net
  depends_on:
  - sprk01
...

  • WARNING: this cluster has a time-to-live of 20 minutes. You can change this with the -t argument in docker-compose.yml. The default value of 1200 corresponds to 20 minutes, so the value is given in seconds:

...
command: bash -c "/code/setup/config-main-node.sh -m=138.19.0.2 -t=<time_to_live_in_seconds>"
...
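
Copying worker blocks by hand gets tedious beyond a few nodes. The following is a minimal sketch of a helper that renders the block shown above for any node number. The function and template names are hypothetical, and it assumes the -t value is in seconds, which is what the 20-minute default of 1200 suggests:

```python
WORKER_TEMPLATE = """\
#SLAVE NODE {n:02d}
 sprk{n:02d}:
  image: athosgvag/sprk
  container_name: sprk{n:02d}
  volumes:
  - ./code:/code
  - ./data:/data
  command: bash -c "/code/setup/config-side-node.sh -m={master_ip} -t={ttl}"
  networks:
  - net
  depends_on:
  - sprk01
"""


def worker_block(n, master_ip="138.19.0.2", ttl_minutes=20):
    """Render the docker-compose block for worker number n (2 or higher)."""
    if n < 2:
        raise ValueError("sprk01 is the master; worker numbers start at 2")
    # Assumption: -t is expressed in seconds (1200 == 20 minutes).
    return WORKER_TEMPLATE.format(n=n, master_ip=master_ip, ttl=ttl_minutes * 60)


if __name__ == "__main__":
    # Print blocks for two extra workers, ready to paste into docker-compose.yml.
    for n in (3, 4):
        print(worker_block(n))
```

The output is meant to be pasted under the existing services in docker-compose.yml. If you change the time-to-live for the workers, use the same value in the main node's command.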
Owner
athosgvag