If the DataNode process has not started on all slaves and you are getting errors like 'Host key verification failed', first try to ssh manually from the master to each slave (e.g. ssh slave1) so the host keys get accepted.
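As an alternative to ssh-ing to each slave by hand, the host keys can be collected up front. This is only a sketch, assuming the slaves resolve as slave1, slave2 and slave3 from inside the master container:

```shell
# Record each slave's host key so later non-interactive ssh runs do not
# stop at the "yes/no" prompt behind 'Host key verification failed'.
# Assumes hostnames slave1..slave3 (as in this tutorial's compose file).
mkdir -p ~/.ssh
for h in slave1 slave2 slave3; do
  if ssh-keyscan -H "$h" >> ~/.ssh/known_hosts 2>/dev/null; then
    echo "recorded host key for $h"
  else
    echo "skipped $h (unreachable)"
  fi
done
```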
Please follow the below steps:
-> Execute Docker Pull to fetch the latest image:
docker pull fidato/hadoop-ubuntu
-> Clone the below git repo:
-> Navigate to the project/folder 'hadoop-ubuntu' and then to 'compose-file' folder.
-> Execute the below command to spin up the cluster:
docker-compose up -d
-> The above command will start 4 containers (1 master and 3 slaves).
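To confirm that all four containers actually came up, a quick check like the below can be run (a sketch, assuming the container names master, slave1, slave2 and slave3 used throughout this tutorial):

```shell
# 'docker ps --format' prints one running container name per line;
# grep -x matches a whole line, so partial names don't give false positives.
for c in master slave1 slave2 slave3; do
  if docker ps --format '{{.Names}}' 2>/dev/null | grep -qx "$c"; then
    echo "$c: running"
  else
    echo "$c: not found"
  fi
done
```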
-> Open a new terminal and execute the below command. It will open a bash shell inside the running container named 'master':
docker exec -it master /bin/bash
-> Open 3 new terminals and execute the below commands, one per terminal. Each will open a bash shell inside the running containers named 'slave1', 'slave2' and 'slave3':
docker exec -it slave1 /bin/bash
docker exec -it slave2 /bin/bash
docker exec -it slave3 /bin/bash
-> Go back to the terminal connected to the 'master' container and execute the below command to format the NameNode:
hdfs namenode -format
-> From the master terminal, execute the below command to start HDFS:
start-dfs.sh
Please note that this command may prompt you several times. Type 'yes' at every prompt and hit enter, then wait for the command to complete.
-> Execute the below command in the master terminal:
jps
This command will give you an output like below:
root@248925665d8c:~# jps
221 NameNode
416 SecondaryNameNode
This means that the NameNode and SecondaryNameNode have been started successfully.
-> Now go to the individual slave node terminals and execute the below command on each:
jps
This will result in the below output:
root@3f730f4cf111:~# jps
103 DataNode
This means that the DataNode has been started successfully on the slaves.
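Instead of reading the jps listing by eye, the check can be scripted; a small sketch (jps ships with the JDK and lists the local Java processes):

```shell
# Grep the jps listing for the DataNode entry; prints a clear yes/no.
if jps 2>/dev/null | grep -q DataNode; then
  echo "DataNode is running"
else
  echo "DataNode is NOT running"
fi
```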
-> Now switch back to the master terminal and execute the below command to start YARN:
start-yarn.sh
Please note that this command may prompt you several times. Type 'yes' at every prompt and hit enter, then wait for the command to complete.
-> Now execute the below command again in the master terminal:
jps
You will see a new process listed, named 'ResourceManager'.
-> Now go to the individual slave node terminals and execute the below command on each:
jps
You will see a new process listed, named 'NodeManager'.
-> Now the setup is complete. Open the below addresses in your laptop's web browser:
http://localhost:50070/ -- NameNode web UI
http://localhost:8088/ -- YARN applications UI
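The two UIs can also be checked from the command line; a sketch, assuming the ports 50070 and 8088 are published to the laptop as above:

```shell
# curl -sf fails silently on connection errors or HTTP errors,
# so each port gets a clear reachable / not-reachable line.
for port in 50070 8088; do
  if curl -sf -o /dev/null "http://localhost:$port/" 2>/dev/null; then
    echo "port $port: UI reachable"
  else
    echo "port $port: not reachable yet"
  fi
done
```

If a port reports not reachable, give the daemons a few seconds and re-run the check before digging into logs.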
-> Now let's create a directory in hdfs by executing the below command:
hdfs dfs -mkdir -p /user/trn/input
-> Execute the below command to verify that the directory was created on hdfs:
hdfs dfs -ls -R /
-> Execute the below commands to download a sample file and upload it into hdfs:
$ wget http://www.gutenberg.org/files/4300/4300-0.txt
$ mv 4300-0.txt 4300.txt
$ hdfs dfs -put 4300.txt /user/trn/input
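Before running the MapReduce job, the same word count can be approximated locally with plain coreutils as a sanity check; a sketch, shown on a tiny inline sample rather than the downloaded book:

```shell
# Split words onto their own lines, then count duplicates and sort by
# frequency -- the same shape of result the WordCount job produces.
printf 'hello world\nhello hadoop\n' \
  | tr -s ' ' '\n' \
  | sort | uniq -c | sort -rn
```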
-> Execute the below command to download the word count jar:
-> Execute the below command:
-> Let's run the sample word count jar to verify that our setup is correct:
hadoop jar wc.jar WordCount /user/trn/input /user/trn/output
Wait for its execution to finish.
-> Execute the below command to analyse the output:
hdfs dfs -cat /user/trn/output/part-r-00000
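The lines in part-r-00000 are tab-separated "word<TAB>count" pairs, so sorting numerically on the second field surfaces the most frequent words; a sketch on an inline sample rather than the HDFS output file:

```shell
# -k2,2 sorts on the count field only; -rn puts the biggest counts first.
printf 'the\t42\nulysses\t7\nand\t99\n' | sort -k2,2 -rn
```

Against the real output, the same sort can be piped after the hdfs dfs -cat command above.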
That's it :)