drpaulbrewer/spark-worker-nfs is a personal build of Spark (1.3.1 as of April 2015) with a startup script that runs the container as a Spark worker and mounts an NFSv4 share as the shared file system.
Note that running parallel analyses through NFSv4 creates a bottleneck, because every worker reads from the same server. It is better to copy the data files to each worker's own file system before running anything.
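As a rough illustration of copying data to each worker ahead of time, here is a minimal sketch. The host names (worker1, worker2, worker3), the destination path /local/data/, and the use of rsync over ssh are all assumptions for illustration and are not part of this image. By default the script only prints the commands it would run.

```shell
#!/bin/bash
# Hypothetical worker host names and paths; replace with your own.
workers="worker1 worker2 worker3"
src=/Z/data/
dest=/local/data/

# Set DRY_RUN=0 to actually copy; by default the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

copy_to_workers() {
  local host
  for host in $workers; do
    if [ "$DRY_RUN" = "1" ]; then
      # Show what would be executed for this worker.
      echo "rsync -a $src $host:$dest"
    else
      # Copy the data directory to the worker's local file system.
      rsync -a "$src" "$host:$dest"
    fi
  done
}

copy_to_workers
```

Running it with DRY_RUN=0 performs the copies. Jobs would then read from the local path on each worker instead of the NFS mount.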
The version, build options, settings, etc. are inherited from the base image
drpaulbrewer/spark-roasted-elephant and are subject to change.
This is not an official public build and there is NO WARRANTY for this code. USE IT AT YOUR OWN RISK. If it works for you, great. But don't expect it to always work, or to always have the same options compiled in.
Dockerfile (subject to change)
    FROM drpaulbrewer/spark-roasted-elephant:latest
    MAINTAINER email@example.com
    ADD my-spark-worker.sh /spark/
    RUN apt-get install --yes nfs-common
    CMD /spark/my-spark-worker.sh
File included in the container: /spark/my-spark-worker.sh
    #!/bin/bash -e
    cd /spark/spark-1.3.1
    sleep 10
    # dont use ./sbin/start-slave.sh it wont take numeric URL
    mkdir -p /Z/data
    mount -o ro -t nfs4 $nfsdata /Z/data
    su -c "cd /spark/spark-1.3.1 && ./bin/spark-class org.apache.spark.deploy.worker.Worker --memory $mem $master" spark
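The worker script reads three environment variables: $nfsdata (the NFS export to mount read-only at /Z/data), $mem (the worker memory limit), and $master (the Spark master URL). A minimal sketch of starting a worker might look like the following. The address, export path, and memory size are hypothetical values, and the --privileged flag is an assumption: mounting NFS from inside a container generally needs extra privileges, but the exact flags required may differ on your setup. The function only prints the command so it can be reviewed first.

```shell
#!/bin/bash
# Hypothetical values; replace with your own NFS export, memory size, and master URL.
nfsdata="192.168.1.10:/data"
mem="4G"
master="spark://192.168.1.10:7077"

# Print the docker run command rather than executing it.
# --privileged is an assumption: the in-container mount needs elevated privileges.
worker_cmd() {
  echo docker run -d --privileged \
    -e "nfsdata=$nfsdata" -e "mem=$mem" -e "master=$master" \
    drpaulbrewer/spark-worker-nfs
}

worker_cmd
```

To actually launch the worker, pipe the printed command to a shell, for example `worker_cmd | sh`.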