Last pushed: 2 years ago
Short Description
WORK IN PROGRESS. NOT PRODUCTION READY
Full Description

Apache Spark

Supported tags and respective Dockerfile links

What is Spark?

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

https://spark.apache.org/docs/latest/

What is Docker?

Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. As a result, IT can ship faster and run the same app, unchanged, on laptops, data center VMs, and any cloud.

https://www.docker.com/whatisdocker/

What is a Docker Image?

Docker images are the basis of containers. An image is a read-only template, while a container is a writable instance created from it. Only containers can be executed by the operating system.

https://docs.docker.com/terms/image/

Dependencies

Base Docker image

Branch             Base Image           Description
master             gelog/java:openjdk7  Spark pre-built for Hadoop
spark-for-hadoop   gelog/java:openjdk7  Spark pre-built for Hadoop (dev branch)
spark-from-source  scala:2.10.4         Spark built from source

Note: the spark-from-source image currently takes quite a while to build and produces an image with a virtual size of 2.3 GB.

The recommended branch for general use is master.

How to use this image?

Spark Master

docker run -d -h spark-master --name spark-master gelog/spark:1.1.0-bin-hadoop2.3 \
  spark-class org.apache.spark.deploy.master.Master
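
To reach the master from the host (for example, its web UI), one option is to publish its ports. The sketch below only prints a variant of the command above with -p flags added; it does not start anything. It assumes the image keeps Spark's standalone defaults (7077 for the master, 8080 for its web UI). master_cmd is a hypothetical helper name and is not part of this image.

```shell
#!/bin/sh
# Hypothetical helper: print a `docker run` command for the master that also
# publishes the master port and the web UI port to the host.
# Assumption: the image keeps Spark's standalone defaults (7077 and 8080).
IMAGE="gelog/spark:1.1.0-bin-hadoop2.3"

master_cmd() {
  # $1 = host port mapped to 7077, $2 = host port mapped to 8080
  printf 'docker run -d -h spark-master --name spark-master -p %s:7077 -p %s:8080 %s spark-class org.apache.spark.deploy.master.Master\n' \
    "${1:-7077}" "${2:-8080}" "$IMAGE"
}

# Print the command with the default host ports; pipe the output to `sh` to run it.
master_cmd
```

Passing two arguments, as in master_cmd 17077 18080, picks different host ports if the defaults are already in use.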

Spark Worker

docker run -d -h spark-worker-01 --name spark-worker-01 --link spark-master:spark-master \
  gelog/spark:1.1.0-bin-hadoop2.3 spark-class org.apache.spark.deploy.worker.Worker \
  spark://spark-master:7077
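
Additional workers can be started the same way under different names. The following is a minimal sketch that prints one worker command per index rather than running it. The zero-padded naming scheme (spark-worker-01, spark-worker-02, ...) is an assumption extrapolated from the example above, and worker_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the `docker run` command for worker number $1.
# The image tag and the spark-master link mirror the worker command above;
# the zero-padded naming scheme is an assumption.
IMAGE="gelog/spark:1.1.0-bin-hadoop2.3"

worker_cmd() {
  name=$(printf 'spark-worker-%02d' "$1")
  printf 'docker run -d -h %s --name %s --link spark-master:spark-master %s spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077\n' \
    "$name" "$name" "$IMAGE"
}

# Print the commands for three workers; pipe the output to `sh` to run them.
for i in 1 2 3; do
  worker_cmd "$i"
done
```

Printing the commands first makes it easy to review them before anything is started. Each worker links to the same spark-master container, so the master must already be running.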
Owner
gelog