Last pushed: 6 months ago
Short Description
This is a wrapper image for the application https://github.com/GRpro/spark-popular-words-web
Full Description

The application requires the environment variables INPUT_URI and OUTPUT_URI to be set.
INPUT_URI is a file containing web page addresses, one address per line.
OUTPUT_URI is a directory that does not exist yet; Spark will create it and write the results there.
These variables can point to paths on the local file system or on HDFS.
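
The two requirements above (the input file must exist, the output directory must not) can be checked on the host before the container is started. The following is only a minimal sketch: `is_local` and `check_uris` are hypothetical helper names that are not part of the image, and any value containing a scheme such as `hdfs://` is assumed to be remote and is skipped rather than verified.

```shell
# Hypothetical helpers, not shipped with the image.

# Return 0 if the argument looks like a plain local path (no "scheme://" part).
is_local() {
  case "$1" in
    *://*) return 1 ;;  # e.g. hdfs://...; assumed remote, not checked here
    *) return 0 ;;
  esac
}

# check_uris INPUT OUTPUT
# Fails if a local INPUT is not an existing file,
# or if a local OUTPUT already exists.
check_uris() {
  input="$1"
  output="$2"
  if is_local "$input" && [ ! -f "$input" ]; then
    echo "INPUT_URI $input is not an existing file" >&2
    return 1
  fi
  if is_local "$output" && [ -e "$output" ]; then
    echo "OUTPUT_URI $output already exists" >&2
    return 1
  fi
  return 0
}
```

The check must be given host-side paths (for example `check_uris data/input.txt data/output`), not the paths as seen inside the container.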

In the example below, the directory containing the input file is mounted into the container from the local file system.

mkdir data
echo 'https://en.wikipedia.org/wiki/Meaning_of_life
https://en.wikipedia.org/wiki/Wisdom' >> data/input.txt

docker run -v "$PWD/data":/opt -e INPUT_URI='/opt/input.txt' -e OUTPUT_URI='/opt/output' -it spark-popular-words-web:latest

The results are written to the data/output directory.
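
To look at the results from the shell, something like the sketch below may help. It assumes the application writes its output the way Spark jobs commonly do, as `part-*` files plus a `_SUCCESS` marker in the output directory; the source does not confirm this layout or the format of the result lines. `show_results` is a hypothetical helper name.

```shell
# Hypothetical helper: print the first lines of a Spark output directory.
# Assumes the common Spark layout: part-* files and a _SUCCESS marker.
show_results() {
  dir="$1"
  if [ ! -f "$dir/_SUCCESS" ]; then
    echo "No _SUCCESS marker in $dir; the job may have failed or still be running." >&2
    return 1
  fi
  # Concatenate all partitions and show at most 20 lines.
  cat "$dir"/part-* | head -n 20
}

# Inspect the directory used in the example above; ignore failure if it is absent.
show_results data/output || true
```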

Owner
grpro