Short Description
Hadoop installed on Ubuntu
Full Description

GitHub Link:
https://github.com/fidato13/docker/blob/master/hadoop-ubuntu/Dockerfile

If the DataNode process does not start on all slaves and you see errors like 'Host key verification failed', try to SSH from the master to each slave (e.g. ssh slave1) and accept the host keys, as in the sketch below.
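
A minimal sketch (run inside the 'master' container; it assumes passwordless SSH between the containers is already set up by the image, which start-dfs.sh needs anyway):

    # answer 'yes' to each host-key prompt
    ssh slave1 exit
    ssh slave2 exit
    ssh slave3 exit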

Please follow the steps below:
-> Execute docker pull to fetch the latest image:
docker pull fidato/hadoop-ubuntu

-> Clone the git repo below:
git clone https://github.com/fidato13/docker.git

-> Navigate to the 'hadoop-ubuntu' folder in the cloned repo, and then into its 'compose-file' folder.

-> Execute the command below to spin up the cluster:
docker-compose up -d

-> The above command will start 4 containers (1 master and 3 slaves); you can confirm this with the check below.
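
A quick confirmation sketch (the container names should match those used in the docker exec commands below):

    docker ps --format 'table {{.Names}}\t{{.Status}}'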

-> Open a new terminal and execute the command below. It opens a bash shell inside the running container named 'master':
docker exec -it master /bin/bash

-> Open 3 new terminals and execute the commands below, one per terminal. They open bash shells inside the running containers named 'slave1', 'slave2' and 'slave3':

    docker exec -it slave1 /bin/bash
    docker exec -it slave2 /bin/bash
    docker exec -it slave3 /bin/bash

-> Go back to the terminal connected to the 'master' container and execute the command below:
hdfs namenode -format

This command formats the NameNode.

-> From the master terminal, execute the command below:
start-dfs.sh
Please note that this command will prompt you several times with SSH host-key questions. Type 'yes' each time and hit enter, then wait for the command to complete.

-> Execute the command below in the master terminal:
jps
This command will give you output like the following:

root@248925665d8c:~# jps
    221 NameNode
    416 SecondaryNameNode
This means that the NameNode and SecondaryNameNode have started successfully.

-> Now go to the individual slave-node terminals and execute the command below in each:

jps
It will produce output like this:
root@3f730f4cf111:~# jps
103 DataNode

This means that the DataNode has started successfully on the slaves.
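
Optionally, back in the master terminal, you can also check that all three DataNodes have registered with the NameNode (a quick sketch):

    hdfs dfsadmin -report | grep 'Live datanodes'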

-> Now switch back to the master terminal and execute the command below:
start-yarn.sh
Please note that this command may prompt you several times. Type 'yes' each time and hit enter, then wait for the command to complete.

-> Now execute the command below again in the master terminal:
jps
You will see a new process listed, named 'ResourceManager'.

-> Now go to the individual slave-node terminals and execute the command below in each:
jps

-> You will see a new process listed, named 'NodeManager'.
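
Optionally, from the master terminal you can also confirm that the three NodeManagers have registered with the ResourceManager (a sketch):

    yarn node -list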

-> Now the setup is complete. Open the addresses below in the web browser on your laptop:

     http://localhost:50070/ -- the NameNode web UI
     http://localhost:8088/ -- the YARN ResourceManager UI (applications)
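
If the pages do not load, you can first check from your laptop that the ports respond (this assumes the compose file publishes ports 50070 and 8088 on the host, as the browser step above implies):

    curl -sI http://localhost:50070/ | head -n 1
    curl -sI http://localhost:8088/ | head -n 1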

-> Now let's create a directory in HDFS by executing the command below:
hdfs dfs -mkdir -p /user/trn/input

-> Execute the command below to verify that the directory was created in HDFS:
hdfs dfs -ls -R /

-> Execute the commands below to download a sample file and upload it into HDFS:

wget http://www.gutenberg.org/files/4300/4300-0.txt
mv 4300-0.txt 4300.txt
hdfs dfs -put 4300.txt /user/trn/input
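
To confirm that the file landed in HDFS:

    hdfs dfs -ls /user/trn/input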

-> Execute the command below to download the word-count jar:
wget https://github.com/fidato13/docker/raw/master/hadoop-ubuntu/word_count/wc.jar

-> Execute the command below:
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
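
Setting HADOOP_CLASSPATH to tools.jar is what allows you to compile the example yourself. If you would rather build wc.jar from source than download it, here is a sketch following the MapReduce tutorial linked in the references (it assumes you have saved the tutorial's WordCount.java in the current directory):

    hadoop com.sun.tools.javac.Main WordCount.java
    jar cf wc.jar WordCount*.class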

-> Let's run the sample word-count jar to verify that our setup is correct:
hadoop jar wc.jar WordCount /user/trn/input /user/trn/output

Wait for it to finish.

-> Execute the command below to inspect the output:
hdfs dfs -cat /user/trn/output/part-r-00000
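
If you want to re-run the job, remove the output directory first; MapReduce will refuse to write to an existing output path:

    hdfs dfs -rm -r /user/trn/output
    hadoop jar wc.jar WordCount /user/trn/input /user/trn/output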

That's it :)

References:
https://hadoop.apache.org/docs/r3.0.0-alpha2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
http://pingax.com/install-apache-hadoop-ubuntu-cluster-setup/
http://pingax.com/install-hadoop2-6-0-on-ubuntu/
