gelog/spark

By gelog

Updated over 9 years ago

WORK IN PROGRESS. NOT PRODUCTION READY

Apache Spark

Supported tags and respective Dockerfile links

What is Spark?

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

https://spark.apache.org/docs/latest/

What is Docker?

Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. As a result, IT can ship faster and run the same app, unchanged, on laptops, data center VMs, and any cloud.

https://www.docker.com/whatisdocker/

What is a Docker Image?

Docker images are the basis of containers. Images are read-only, while containers are writeable. Only the containers can be executed by the operating system.

https://docs.docker.com/terms/image/

Dependencies

Base Docker image

| Branch | Base Image | Description |
| --- | --- | --- |
| master | gelog/java:openjdk7 | Spark pre-built for Hadoop |
| spark-for-hadoop | gelog/java:openjdk7 | Spark pre-built for Hadoop (dev branch) |
| spark-from-source | scala:2.10.4 | Spark built from source |

Note: the spark-from-source image currently takes quite a while to build and has a virtual size of 2.3 GB.

The recommended branch for general use is master.

How to use this image?

Spark Master

docker run -d -h spark-master --name spark-master gelog/spark:1.1.0-bin-hadoop2.3 \
  spark-class org.apache.spark.deploy.master.Master

Spark Worker

docker run -d -h spark-worker-01 --name spark-worker-01 --link spark-master:spark-master \
  gelog/spark:1.1.0-bin-hadoop2.3 spark-class org.apache.spark.deploy.worker.Worker \
  spark://spark-master:7077
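
Starting more than one worker means repeating the worker command with a different hostname and container name each time. The following is a minimal sketch of that idea, not part of this image: the helper names `worker_cmd` and `start_workers` and the `DRY_RUN` variable are hypothetical, and the sketch assumes the master container is already running under the name spark-master as shown above.

```shell
#!/bin/sh
# Hypothetical helper functions (not shipped with the image).
# Image tag and master URL are copied from the commands above.
IMAGE="gelog/spark:1.1.0-bin-hadoop2.3"
MASTER_URL="spark://spark-master:7077"

# Print the docker command for worker number $1.
# Names follow the spark-worker-NN pattern used above.
worker_cmd() {
  name=$(printf 'spark-worker-%02d' "$1")
  echo "docker run -d -h $name --name $name --link spark-master:spark-master $IMAGE spark-class org.apache.spark.deploy.worker.Worker $MASTER_URL"
}

# Start $1 workers. With DRY_RUN=1 the commands are only printed.
start_workers() {
  i=1
  while [ "$i" -le "$1" ]; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      worker_cmd "$i"
    else
      eval "$(worker_cmd "$i")"
    fi
    i=$((i + 1))
  done
}
```

Calling `DRY_RUN=1 start_workers 3` prints the three commands without touching Docker, which makes it easy to check the names before running `start_workers 3` for real.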

Docker Pull Command

docker pull gelog/spark