The application requires the INPUT_URI and OUTPUT_URI environment variables to be set.
INPUT_URI points to a file containing web page addresses, one address per line.
OUTPUT_URI is a directory that must not already exist; Spark writes the results there.
These variables can point to HDFS as well as to a local file system path.
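A minimal sketch of preparing a local input file, assuming a `data` directory next to where the container is started. The three addresses are placeholders chosen for illustration, not part of the project:

```shell
# Create the local directory that will later be mounted into the container.
mkdir -p data

# Write one web page address per line (placeholder URLs, replace with your own).
printf '%s\n' \
  'https://example.com/' \
  'https://example.org/' \
  'https://example.net/' > data/input.txt
```

Each address sits on its own line, which is the layout INPUT_URI expects.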
The example below mounts a local directory containing the input file into the container:
docker run -v "$PWD/data":/opt -e INPUT_URI='/opt/input.txt' -e OUTPUT_URI='/opt/output' -it spark-popular-words-web:latest
The results are written to the data/output directory.
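Spark usually writes a directory of `part-*` files plus a `_SUCCESS` marker rather than a single file. The helper below is a hedged sketch for printing such output; `show_results` is a hypothetical name, and the exact file layout and format depend on how the application writes its results:

```shell
# Hypothetical helper: print Spark output from a local directory.
# Assumes the job leaves a _SUCCESS marker and part-* files there.
show_results() {
  dir="$1"
  if [ -f "$dir/_SUCCESS" ]; then
    # Concatenate all partition files to standard output.
    cat "$dir"/part-*
  else
    echo "No completed Spark output found in $dir" >&2
    return 1
  fi
}
```

Call it as `show_results data/output` once the container has finished. Because OUTPUT_URI must not already exist, remove `data/output` before running the container again.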