Container 1: Spark master (Spark 2.1)
Container 2: Spark worker
Container 3: spark-notebook
Container 4: revealjs/D3
This docker-compose file assembles a Spark data-munging environment that can be run from a laptop. The notebook "10_munge" connects to the Spark master, which distributes the workload to the Spark worker. Additional workers can be added. Data is processed using DataFrames and Spark SQL.
The revealjs web server (grunt) in the D3 container can be used to visualize results. This part is incomplete: a forthcoming container release will include an example of how to generate a JSON result set to feed into D3.
The data is real-time (well, 15-minute latency) global news:
CSV files are transferred to the worker container and imported into a DataFrame. "10_munge" contains the Scala commands to do this for one file. More to come.
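As a rough illustration of that import step, the sketch below reads one delimited file into a DataFrame from Scala. It is not the code from "10_munge": the object name, the read options and any file path passed in are assumptions, and the master URL is inferred from the hostname and port 7077 in the compose file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CsvImportSketch {

  // Build (or reuse) a session. "spark://master:7077" is inferred from the
  // compose file's hostname and published port; pass "local[*]" to try the
  // code without the cluster.
  def session(masterUrl: String = "spark://master:7077"): SparkSession =
    SparkSession.builder()
      .appName("csv-import-sketch")
      .master(masterUrl)
      .getOrCreate()

  // Read one delimited text file into a DataFrame.
  // `sep` and `header` are assumptions about the file layout; adjust them
  // to match the real data.
  def loadCsv(spark: SparkSession,
              path: String,
              sep: String = ",",
              header: Boolean = true): DataFrame =
    spark.read
      .option("sep", sep)
      .option("header", header.toString)
      .option("inferSchema", "true") // costs an extra pass over the file
      .csv(path)
}
```

In a notebook cell this might be called as `CsvImportSketch.loadCsv(CsvImportSketch.session(), "/data/example.csv")`, where the path is hypothetical and has to be readable at that location by the Spark processes that touch it.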
The D3 visualization (not yet completed) will demonstrate how to use Spark SQL to present a complex crosstab of tabular data as a force-directed graph of news item topics, ranked by tone (very negative to very positive) across selected news organizations (BBC, Fox, NBC, NYT, etc.). It will depict the trend in tone for various actors (WHITEHOUSE, SYRIA, GOOGLE, etc.) as news items are published by each organization.
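One way such a crosstab could be produced with Spark SQL is sketched below. The column names `source`, `actor` and `tone`, the view name `news` and the object name are all hypothetical, since the schema is not described here. The last helper turns a small result into a JSON array string of the kind a D3 script could load.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ToneCrosstabSketch {

  // Average tone and item count per (actor, source) pair, via Spark SQL.
  // Assumes `items` has columns source: String, actor: String, tone: Double.
  def toneByActorAndSource(spark: SparkSession, items: DataFrame): DataFrame = {
    items.createOrReplaceTempView("news")
    spark.sql(
      """SELECT actor, source, AVG(tone) AS avg_tone, COUNT(*) AS item_count
        |FROM news
        |GROUP BY actor, source
        |ORDER BY actor, source""".stripMargin)
  }

  // The same averages as a wide crosstab: one row per actor,
  // one column per news organization.
  def crosstab(items: DataFrame): DataFrame =
    items.groupBy("actor").pivot("source").avg("tone")

  // Serialize a result to a JSON array string for D3.
  // collect() brings every row to the driver, so use it on aggregates only.
  def toJsonArray(df: DataFrame): String =
    df.toJSON.collect().mkString("[", ",", "]")
}
```

The long form from `toneByActorAndSource` maps naturally onto graph links (actor to organization, weighted by tone), while the wide form from `crosstab` is easier to read as a table. Which one suits the final visualization is left open here.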
Docker Compose:

    version: "2"
    services:
      master:
        volumes:
          - .:/data
        image: singularities/spark
        command: start-spark master
        hostname: master
        ports:
          - "6066:6066"
          - "7077:7077"
          - "7070:7070"
          - "8080:8080"
          - "50070:50070"
      worker:
        volumes:
          - .:/data
        image: singularities/spark
        command: start-spark worker master
        hostname: worker_1
        environment:
          SPARK_WORKER_CORES: 2
          SPARK_WORKER_MEMORY: 3g
        links:
          - master
      notebook:
        image: markteehan/spark-notebook-gdelt:8
        hostname: notebook
        depends_on:
          - master
        ports:
          - "9000:9000"
          - "9001:9001"
          - "4040:4040"
          - "4041:4041"
          - "4042:4042"
          - "4043:4043"
          - "4044:4044"
          - "4045:4045"
        links:
          - master
          - worker
      d3:
        volumes:
          - .:/data
          - .:/revealjs/talks
        image: gamsd/revealjs
        hostname: d3
        links:
          - master
          - worker
          - notebook
        ports:
          - "8000:8000"