Short Description
Base container for spark-master and spark-worker images.
Full Description

drpaulbrewer/spark-roasted-elephant is a personal build of Apache Spark (1.3.1 as of April 2015) that I use as a base to specialize further into master and worker images.
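
For example, specializing this base into a master image can take only a few lines. This is a hypothetical sketch, not the actual spark-master build; it assumes the standalone master is launched in the foreground via bin/spark-class from the Spark build below:

# hypothetical sketch of a derived master image -- not the actual spark-master Dockerfile
FROM drpaulbrewer/spark-roasted-elephant
USER spark
WORKDIR /spark/spark-1.3.1
# run the standalone master in the foreground so docker can supervise it
CMD ["./bin/spark-class", "org.apache.spark.deploy.master.Master"]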

It is called spark-roasted-elephant because Hadoop is not included. The name is a pun on the O'Reilly book series on Hadoop, which features an elephant on the cover.

This is not an official build and there is NO WARRANTY for this code. ALL USE IS AT YOUR OWN RISK. If it works for you, great, but don't expect it to always work or to always have the same options compiled in.

Build Prerequisites: Before building this Dockerfile, download an Apache Spark source tarball from the Apache Spark main site at https://spark.apache.org and place it alongside the Dockerfile. To build a newer release, change the ADD spark-1.3.1.tgz line and the other 1.3.1 references to the new version number, as sketched below.
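
A minimal sketch of that download step, assuming the 1.3.1 source tarball from the Apache archive (the archive URL and the hypothetical 1.4.0 target version are my assumptions, not part of this repository):

# download the Spark source tarball next to the Dockerfile
wget https://archive.apache.org/dist/spark/spark-1.3.1/spark-1.3.1.tgz
# for a hypothetical newer release, rewrite the version references in the Dockerfile
sed -i 's/1\.3\.1/1.4.0/g' Dockerfile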

Dockerfile (June 2015 -- subject to change without further notice)

# Copyright 2015 Paul Brewer http://eaftc.com
# License: MIT
# this Dockerfile builds a non-Hadoop version of Spark for standalone experimentation
# thanks to the article at http://mbonaci.github.io/mbo-spark/ for tips
FROM ubuntu:15.04
MAINTAINER drpaulbrewer@eaftc.com
RUN adduser --disabled-password --home /spark spark
WORKDIR /spark
ADD spark-1.3.1.tgz /spark/ 
WORKDIR /spark/spark-1.3.1
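# point apt at the Georgia Tech mirror (www.gtlib.gatech.edu) for faster package downloads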
RUN sed -e 's/archive.ubuntu.com/www.gtlib.gatech.edu\/pub/' /etc/apt/sources.list > /tmp/sources.list && mv /tmp/sources.list /etc/apt/sources.list
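# install the JDK, Scala, and Python numeric libraries, then build Spark with Hive
# support and hand /spark over to the spark user; note that Java 8 ignores
# -XX:MaxPermSize (PermGen was removed in JDK 8), so that MAVEN_OPTS entry is harmless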
RUN apt-get update && apt-get --yes upgrade \
    && apt-get --yes install sed nano curl wget openjdk-8-jdk scala sqlite3 python-numpy python-scipy \
    && echo "SPARKDIR=/spark/spark-1.3.1" >>/etc/environment \
    && echo "JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" >>/etc/environment \
    && env \
    && export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" \
    && ./build/mvn -Phive -Phive-thriftserver -DskipTests clean package \
    && chown -R spark:spark /spark
# EXPOSE 2222 4040 6066 7077 7777 8080 8081 
# In practice this EXPOSE list was insufficient because Spark opens random ports.
# Instead I expose all ports at run time with --expose 1-65535 and keep the cluster behind a hardware firewall/router.
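
Example build and run commands (a sketch; the image tag and the interactive shell entry point are my choices, not fixed by this repo -- the --expose range is the one described above):

# build from the directory containing the Dockerfile and spark-1.3.1.tgz
docker build -t drpaulbrewer/spark-roasted-elephant .
# run with all ports exposed, per the note above; keep this behind a firewall
docker run --rm -it --expose 1-65535 drpaulbrewer/spark-roasted-elephant /bin/bash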

Docker Pull Command

docker pull drpaulbrewer/spark-roasted-elephant