This image provides the master node of a multi-node Hadoop cluster running in Docker on Ubuntu 14.04. It was built with Docker on an Ubuntu 14.04 host, using div4/hadoop as the base image. The master node runs the NameNode, Secondary NameNode, NodeManager and ResourceManager.
The cluster can be set up on a single host using Docker's --link option, or across multiple hosts using weave or any other clustering tool. Weave is easy to use and works well for setting up a cluster. For weave installation and commands:
To start the master node:
Pull this image and run the container:
sudo docker run -it --name hadoop_master -P -p 50070:50070 -p 50090:50090 div4/hadoop_master /bin/bash
The dedicated Hadoop user group is hadoop_group and the user account is hduser1. Switch to that user, start the SSH service and test the connection to localhost:
su - hduser1
sudo service ssh start
ssh localhost
Update the /etc/hosts file with the IPs assigned to the Docker containers by weave or whichever clustering tool you use. In the downloaded containers, the master has been assigned the IP 10.0.0.5/24 and the slave the IP 10.0.0.7/24 in the cluster subnetwork. Make this edit on both the master and slave nodes before proceeding.
Add the following lines to the file:
10.0.0.5 master
10.0.0.7 slave1
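Because the same entries have to be added on every node, it can help to make the edit repeatable. The following is a minimal sketch, not part of the image: add_host_entry is a hypothetical helper that appends an entry only if the hostname is not already mapped, and the optional third argument (defaulting to /etc/hosts) is an assumption added so it can be tried on a scratch file first.

```shell
# add_host_entry IP NAME [FILE]
# Hypothetical helper: append "IP NAME" to FILE (default /etc/hosts)
# unless NAME already appears as a hostname on a non-comment line.
add_host_entry() {
    ip="$1"
    name="$2"
    file="${3:-/etc/hosts}"
    # awk exits 0 if NAME is found in any field after the address column.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Usage (as root, on both master and slave):
#   add_host_entry 10.0.0.5 master
#   add_host_entry 10.0.0.7 slave1
```

Running the helper twice leaves a single entry per hostname, so it is safe to re-run after a container restart.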
Allow the master node to connect to the slave node. To do this, run:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser2@slave1
Check the connection from the master account to the slave account:
ssh master
ssh hduser2@slave1
exit
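With more than one slave, the manual check can be scripted. The sketch below assumes key-based login has already been set up with ssh-copy-id; check_ssh and check_all_ssh are hypothetical helper names, and the BatchMode and ConnectTimeout options make ssh fail quickly instead of prompting for a password.

```shell
# check_ssh TARGET
# Hypothetical helper: succeeds only if TARGET accepts key-based login.
check_ssh() {
    # BatchMode=yes disables password prompts, so a missing key is an error.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# check_all_ssh TARGET...
# Reports OK or FAIL per target and returns non-zero if any target failed.
check_all_ssh() {
    failed=0
    for target in "$@"; do
        if check_ssh "$target"; then
            echo "OK   $target"
        else
            echo "FAIL $target"
            failed=1
        fi
    done
    return "$failed"
}

# Usage on the master, as hduser1:
#   check_all_ssh hduser2@slave1
```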
Start the cluster:
cd /usr/local/hadoop
sbin/start-all.sh
Check which daemons are running:
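One common way to do this is with jps, the JVM process lister that ships with the JDK (assumed here to be on the PATH). As a rough sketch, the hypothetical check_daemons helper below reads jps output on standard input and reports which of the expected daemons are present; the daemon names are the class names jps normally prints for Hadoop.

```shell
# check_daemons NAME...
# Hypothetical helper: reads "<pid> <name>" lines (jps output) on stdin
# and reports whether each expected daemon NAME is listed.
check_daemons() {
    listing=$(cat)
    missing=0
    for daemon in "$@"; do
        # Compare the second column exactly against the daemon name.
        if printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "running: $daemon"
        else
            echo "missing: $daemon"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the master:
#   jps | check_daemons NameNode SecondaryNameNode ResourceManager NodeManager
```

The function returns non-zero when any expected daemon is absent, which makes it usable in a startup script.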
To stop Hadoop, run:
The Hadoop web interface for the NameNode daemon is accessible at http://localhost:50070/.
If you are running Hadoop on a cloud server, replace localhost with the IP address of the server.
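The host substitution can be written down as a small helper. This is only an illustrative sketch: namenode_url is a hypothetical function, port 50070 is taken from the run command above, and 203.0.113.10 is a placeholder address, not a real server.

```shell
# namenode_url [HOST]
# Hypothetical helper: prints the NameNode web UI address for HOST
# (default: localhost), using the 50070 port published by the container.
namenode_url() {
    printf 'http://%s:50070/\n' "${1:-localhost}"
}

# Usage:
#   namenode_url                 # prints http://localhost:50070/
#   namenode_url 203.0.113.10    # prints http://203.0.113.10:50070/
# To probe the UI from a shell, assuming curl is installed:
#   curl -s -o /dev/null -w '%{http_code}\n' "$(namenode_url)"
```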
If a password is required, use: H4doop
This multi-host configuration can be changed through the Hadoop configuration files. To run more than one slave, start the required number of slave containers and edit the Hadoop configuration files accordingly. These links may be helpful: