Last pushed: 2 years ago
Short Description
Learn Spark by doing hands-on labs (Spark Fundamentals I course) on your laptop or in the cloud
Full Description

Apache Spark Image for the Spark Fundamentals I course

Use this Docker image to create environments for the hands-on labs of the Spark Fundamentals I course at www.bigdatauniversity.com. You can use it to create a Spark environment on your own laptop or desktop, or on one of the supported public clouds.

This Docker image contains a pre-deployed IBM STC Spark with Hadoop.

Set up a Docker environment on your laptop

How to use this image

Kitematic (GUI)

  1. Start Kitematic from the Docker folder

  2. Type bigdatauniversity in the search box to filter the Docker Hub catalog down to the images provided by Big Data University

  3. Click the Create button on the spark image to create a Docker container from this image

Docker Quickstart Terminal (CLI)

For Mac
-- "Applications -> Docker -> Docker Quickstart Terminal"
For Windows
-- "Start -> Program -> Docker -> Docker Quickstart Terminal"

Then run the steps below in this terminal.

1) Pull (download) this Docker image
Run this command in your terminal window:

docker pull bigdatauniversity/spark
  • Note: it may take a while to pull this image over the internet

2) Start a Docker container (interactively or as a daemon)

  • Interactive
docker run -it --hostname bigdatauniversitySpark --name bdu_spark -P -p 8080:8080 -p 8081:8081 bigdatauniversity/spark:latest /etc/bootstrap.sh -bash
  • Daemon
docker run -d --hostname bigdatauniversitySpark --name bdu_spark -P -p 8080:8080 -p 8081:8081 bigdatauniversity/spark:latest /etc/bootstrap.sh -d
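To confirm that the container from step 2 is up, one option is to ask Docker for its state. The sketch below wraps docker inspect in a small helper; the function name is_container_running is hypothetical and is not shipped with the image.

```shell
# Hypothetical helper (not part of the image): succeeds if the named
# container is currently running.
is_container_running() {
  # docker inspect prints "true" or "false" for .State.Running; any error
  # (for example, no such container) is treated as "not running".
  [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Usage: is_container_running bdu_spark && echo "bdu_spark is running"
```

The -p 8080:8080 and -p 8081:8081 options publish those container ports on the Docker host. 8080 and 8081 are the default web UI ports of a Spark standalone master and worker, so, assuming the image starts them, the UIs should be reachable on the Docker host at those ports.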

3) Start Spark

  • To start the Scala Spark shell:
spark-shell
  • To start the Python Spark shell:
pyspark

4) Notes

  • All hands-on lab files are located in:
/opt/ibm/labfiles
  • How to restart and attach to the container

If you exit the container, you can restart it and attach to it again later by running:

docker start bdu_spark
docker attach bdu_spark
  • Start a new command in a running container
docker exec -it bdu_spark <command>
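As one concrete use of docker exec, the sketch below lists the lab files from the host without attaching to the container. The path /opt/ibm/labfiles and the container name bdu_spark come from the steps above; the function name list_labfiles is hypothetical.

```shell
# Hypothetical helper: list the hands-on lab files inside a running container.
# Defaults to the container name bdu_spark used in step 2.
list_labfiles() {
  container="${1:-bdu_spark}"
  # No -it here: a plain directory listing needs neither a TTY nor stdin.
  docker exec "$container" ls /opt/ibm/labfiles
}

# Usage: list_labfiles            (uses bdu_spark)
#        list_labfiles my_spark   (any other running container name)
```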

Supported tags

  • latest
  • 1.4.0
  • 1.3.1

Each supported tag corresponds to a version of Spark.
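To work with a specific Spark version rather than latest, append the tag to the image name when pulling. A minimal sketch, where the function name pull_spark_image is hypothetical:

```shell
# Hypothetical helper: pull the course image at a given tag.
# Falls back to "latest" when no tag is given.
pull_spark_image() {
  tag="${1:-latest}"
  docker pull "bigdatauniversity/spark:${tag}"
}

# Usage: pull_spark_image 1.4.0
```

To start a container from that version, replace :latest with the same tag in the docker run commands from step 2.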

Supported Docker versions

  • This image is officially supported on Docker version 1.6.0.
  • Support for older versions (down to 1.0) is provided on a best-effort basis.
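To see which Docker version is installed locally, docker --version prints a line of the form "Docker version X.Y.Z, build <hash>". The sketch below extracts just the version number so it can be compared with the versions listed above; the function name docker_client_version is hypothetical.

```shell
# Hypothetical helper: print the version number of the local Docker client.
docker_client_version() {
  # Take the third whitespace-separated field of the version line
  # and drop the trailing comma.
  docker --version | awk '{print $3}' | tr -d ','
}

# Usage: docker_client_version
```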

Community Support

Like this image? Give us a star at the top of this page!

Owner
bigdatauniversity