Last pushed: 2 years ago
Cloudera Hadoop Docker container with word count example
WordCount example application

Prerequisites:

Install Docker:

Docker Installation

Download the modified WordCount (it counts the word "cloudera" twice):

WordCount.java

Pull the Docker image:

docker pull karthicks/cdh5-docker

Execution:

Run the image in the background and mount WordCount.java:

If needed, replace '/root' below with the folder that contains WordCount.java.

docker run -i -t -v /root/WordCount.java:/WordCount.java --name cdh_wc7 -d -p 9020:9020 -p 60070:60070 -p 60010:60010 -p 60020:60020 -p 60075:60075 -p 9030:9030 -p 9031:9031 -p 9032:9032 -p 9033:9033 -p 9088:9088 -p 9040:9040 -p 9042:9042 -p 10021:10021 -p 19889:19889 -p 11001:11001 karthicks/cdh5-docker:latest
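The run command above publishes each port on the same host port. As a minimal sketch, the -p flags can be generated from a list so the command is easier to read and edit. The port list is copied from the command above; the helper name port_flags is hypothetical, and the script only prints the command so it can be reviewed before running.

```shell
# Ports copied from the docker run command above.
ports="9020 60070 60010 60020 60075 9030 9031 9032 9033 9088 9040 9042 10021 19889 11001"

# Hypothetical helper: print "-p PORT:PORT" for each argument, space-separated.
port_flags() {
  flags=""
  for p in "$@"; do
    flags="$flags -p $p:$p"
  done
  # Strip the leading space before printing.
  printf '%s\n' "${flags# }"
}

# Print the full command instead of executing it.
# $ports is left unquoted on purpose so it splits into one argument per port.
echo docker run -i -t -v /root/WordCount.java:/WordCount.java \
  --name cdh_wc7 -d $(port_flags $ports) karthicks/cdh5-docker:latest
```

Removing the leading echo runs the command instead of printing it.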

Open a bash shell in the container (replace CONTAINERID with the ID printed by the command above):

docker exec -i -t CONTAINERID bash -l
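If the ID printed by the run command has scrolled away, it can be looked up by the container name. A small sketch, assuming the container was started with --name cdh_wc7 as above; the function name container_id is hypothetical. Note that the name filter of docker ps matches substrings and only lists running containers.

```shell
# Hypothetical helper: print the ID(s) of running containers whose name
# contains the given string.
container_id() {
  docker ps -q --filter "name=$1"
}

# Usage (not executed here):
#   docker exec -i -t "$(container_id cdh_wc7)" bash -l
```

docker exec also accepts the container name directly, so `docker exec -i -t cdh_wc7 bash -l` should work as well.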

Run the WordCount application:

su hdfs

hadoop fs -rm -f -r /user/cloudera/wordcount/output

hadoop fs -chown hdfs /tmp/hadoop-yarn/staging/hdfs/.staging

hadoop jar /word_count.jar org.myorg.WordCount /user/cloudera/wordcount/input /user/cloudera/wordcount/output

Verify output:

hadoop fs -cat /user/cloudera/wordcount/output/*
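To sanity-check the result, the expected counts can be approximated locally on a small sample. This is only a sketch: it assumes words are split on whitespace and that "cloudera" is matched exactly and case-sensitively, which may differ from what the modified WordCount.java actually does. The function name local_wordcount and the sample text are hypothetical.

```shell
# Hypothetical helper: count whitespace-separated words read from stdin.
# Every word counts once, except "cloudera", which counts twice.
# Output is "word<TAB>count", sorted by word.
local_wordcount() {
  tr -s '[:space:]' '\n' | awk '
    NF { counts[$0] += ($0 == "cloudera") ? 2 : 1 }
    END { for (w in counts) printf "%s\t%d\n", w, counts[w] }
  ' | LC_ALL=C sort
}

# Sample input (made up for illustration).
printf 'hadoop is fun\ncloudera hadoop\n' | local_wordcount
```

For the sample input this prints cloudera with a count of 2 even though it appears once, which is the double counting the modified example is meant to show.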

If needed, recompile the WordCount class:

mkdir -p build

javac -cp /usr/lib/hadoop/*:/usr/lib/hadoop-mapreduce/* WordCount.java -d build -Xlint

If needed, recreate the JAR:

jar -cvf word_count.jar -C build/ .
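The recompile and repackage steps can be chained so that a failed compile does not produce a stale JAR. A minimal sketch, assuming it is run inside the container from the directory that holds WordCount.java (likely /, given the mount and the /word_count.jar path used earlier); the function name rebuild_jar is hypothetical. The classpath is quoted so that javac, not the shell, expands the wildcards.

```shell
# Hypothetical helper: rebuild word_count.jar from WordCount.java,
# stopping at the first step that fails.
rebuild_jar() {
  mkdir -p build &&
  javac -cp '/usr/lib/hadoop/*:/usr/lib/hadoop-mapreduce/*' WordCount.java -d build -Xlint &&
  jar -cvf word_count.jar -C build/ .
}

# Usage (not executed here):
#   rebuild_jar
```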


This example uses the CDH-5 Docker image contributed by chalimartines.

WordCount Example using Spark here

Owner
karthicks