Deploy Spark on Kubernetes

  • Spark 2.0.1
  • Hadoop 2.7.3


kubectl create -f kubernetes/namespace-spark.yaml
kubectl config set-context spark --namespace=spark-cluster --cluster=${CLUSTER_NAME} --user=${USER_NAME}
kubectl config use-context spark
kubectl create -f kubernetes/spark-kubernetes.yaml

(Plagiarized again! Fixme.) Once the master is running, you can use the
cluster proxy to connect to the Spark WebUI:

kubectl proxy --port=8001

The UI will then be available at http://localhost:8001/api/v1/proxy/namespaces/spark-cluster/services/spark-webui/
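
The proxy URL follows a fixed pattern built from the namespace and the service name. The sketch below only reproduces the path layout shown here; the helper name `proxy_url` and its parameters are illustrative and not part of this project. Newer Kubernetes releases may expose services under a different proxy path, so treat it as a starting point rather than a reference.

```python
def proxy_url(namespace, service, port=8001, host="localhost"):
    """Build the kubectl-proxy URL for a service, using the path layout from this README."""
    # The defaults match `kubectl proxy --port=8001` running on the local machine.
    return "http://{}:{}/api/v1/proxy/namespaces/{}/services/{}/".format(
        host, port, namespace, service
    )


print(proxy_url("spark-cluster", "spark-webui"))
```

If you deploy into another namespace or start the proxy on another port, pass those values instead of the defaults.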


As a quick check that things are working, run the "Hello, World" of text processing: a word count.

kubectl exec -it spark-worker-deployment-772324441-ihfk7 -- pyspark
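
The pod name in the command is specific to one deployment, so yours will differ. As a rough sketch, the helper below picks a worker pod from the lines printed by `kubectl get pods -o name`. The function name `pick_worker_pod`, the `spark-worker` prefix default, and the master pod name in the sample are assumptions made for illustration.

```python
def pick_worker_pod(name_lines, prefix="spark-worker"):
    """Return the first pod name that starts with `prefix`, or None if there is none."""
    for line in name_lines:
        # `kubectl get pods -o name` prints entries such as "pod/<name>";
        # keep only the part after the last slash.
        name = line.strip().rsplit("/", 1)[-1]
        if name.startswith(prefix):
            return name
    return None


# Sample listing; the master pod name is made up for this example.
sample = [
    "pod/spark-master-deployment-1234567890-abcde",
    "pod/spark-worker-deployment-772324441-ihfk7",
]
print(pick_worker_pod(sample))
```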

When the PySpark prompt appears, run the following Python code:

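import os

# SPARK_HOME points at the Spark installation inside the container.
spark_home = os.environ.get('SPARK_HOME', None)
# `sc` is the SparkContext that the PySpark shell creates on startup.
text_file = sc.textFile(spark_home + "/")

# Split lines into words, pair each word with 1, then sum the counts per word.
word_counts = text_file \
    .flatMap(lambda line: line.split()) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
print(word_counts.collect())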
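
To see what the three transformations do without a cluster, here is a plain-Python sketch of the same word count. It is not Spark code: the function `word_count` and the two sample lines are made up for illustration, and each step runs locally on an ordinary list.

```python
from collections import defaultdict


def word_count(lines):
    """Count words in an iterable of strings, mirroring flatMap / map / reduceByKey."""
    # flatMap: split every line into words and flatten into one list.
    words = [word for line in lines for word in line.split()]
    # map: pair each word with an initial count of 1.
    pairs = [(word, 1) for word in words]
    # reduceByKey: add up the counts that share the same word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


print(sorted(word_count(["to be or", "not to be"]).items()))
# prints [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```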


The Docker image is available on Docker Hub. To pull the image, run:

docker pull ramhiser/spark:2.0.1

The image was built with the following:

cd docker
docker build -t ramhiser/spark:2.0.1 .
docker push ramhiser/spark:2.0.1

Plagiarized from the k8s images README. I'll update soon. Sorry about that.

  • spark-master - Runs a Spark master in Standalone mode and exposes a port for
    Spark and a port for the WebUI.
  • spark-worker - Runs a Spark worker in Standalone mode and connects to the
    Spark master via DNS name spark-master.
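
Because workers reach the master through the DNS name spark-master, a client inside the cluster can address it with a standalone master URL. The sketch below assumes Spark's default standalone master port, 7077. This README does not state the port, so check the service definition before relying on it. The helper name `master_url` is illustrative.

```python
def master_url(host="spark-master", port=7077):
    """Build a Spark standalone master URL of the form spark://HOST:PORT."""
    # 7077 is Spark's default standalone master port (an assumption for this setup).
    return "spark://{}:{}".format(host, port)


print(master_url())
```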



The spark-kubernetes project is licensed under the MIT License and is freely
available for commercial and non-commercial usage. Please consult the licensing
terms in the LICENSE file for more details.

NOTE: this project is loosely based on the projects below, which are licensed
under the Apache License, Version 2.0:

Please consult the licensing terms for more details.
