drpaulbrewer/spark-worker is a personal build of Apache Spark (1.3.1 as of April 2015) with scripting to run as a worker.
The version, build options, JDK, settings, etc. are taken from drpaulbrewer/spark-roasted-elephant and are subject to change.
This is not an official public build, and there is NO WARRANTY for this code. All use is at your own risk. If it works for you, great. But don't expect it to always work, or to always have the same options compiled in.
Here's an example of how to start a worker. In this example the worker's preset IP address is 192.168.1.11 and the master is assumed to be at 192.168.1.10. You'll need to edit the IP addresses for your network, and all machines used must be able to contact each other. Docker hostnames and networking seem more confusing than helpful at this stage.
```bash
#!/bin/bash
# Start the worker container and capture its ID for pipework.
SPARK=$(docker run --name="spark1" --expose=1-65535 \
  --env SPARKDIR=/spark/spark-1.3.1 \
  --env mem=10G \
  --env master=spark://192.168.1.10:7077 \
  --env SPARK_LOCAL_IP=192.168.1.11 \
  -v /data:/data -v /tmp:/tmp \
  -d drpaulbrewer/spark-worker:latest)
# Attach the preset address to the container on eth0
# (pipework syntax: <host-interface> <container> <ip>/<netmask>[@gateway]).
sudo pipework eth0 $SPARK 192.168.1.11/24
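If the worker doesn't show up in the master's web UI (Spark's default master UI port is 8080), two quick checks are reachability of the master's port and the worker's own log. A minimal sketch, assuming the IPs above and the container name spark1:

```bash
#!/bin/bash
# Check that the master's service port is reachable from the worker host.
nc -z 192.168.1.10 7077 && echo "master port reachable" || echo "cannot reach master"
# Look at the worker's recent log output for registration messages.
docker logs spark1 | tail -n 20
```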
For a worker on a wireless LAN, pipework wasn't useful, even when run against the proper interface. For wireless, run the container on the host's network stack directly. In that case, SPARK_LOCAL_IP can usually be omitted.
```bash
#!/bin/bash
# Refresh cached sudo credentials, then run the worker on the host's network stack.
sudo -v
docker run --net="host" --expose=1-65535 \
  --env SPARKDIR=/spark/spark-1.3.1 \
  --env mem=10G \
  --env master=spark://192.168.1.10:7077 \
  -v /data:/data -v /tmp:/tmp \
  -d drpaulbrewer/spark-worker:latest
```
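To confirm the worker registered with the master, you can scrape the master's web UI (Spark's default is port 8080). The page layout varies by Spark version, so treat this as a rough check rather than a stable interface:

```bash
#!/bin/bash
# Crude check: list "worker" mentions on the master's status page.
curl -s http://192.168.1.10:8080 | grep -io 'worker[^<]*' | head
```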
To shut down and clean up, you may want to create a shell script similar to the following on the host (not in the container), changing the container names as appropriate:
```bash
#!/bin/bash
docker kill master spark1
docker rm master spark1
```
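As an aside, docker rm -f kills a running container and removes it in one step, so the script above can be collapsed to a one-liner if you prefer:

```bash
#!/bin/bash
docker rm -f master spark1
```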