Public Repository

Last pushed: 2 years ago
Short Description
Slave node of a Multi-node Hadoop Cluster
Full Description

This is an image of a slave node of a multi-node Hadoop cluster, running in Docker on Ubuntu 14.04. The image was created with Docker on an Ubuntu 14.04 host, using div4/hadoop as the base image. The slave node runs the DataNode and NodeManager daemons.

The cluster can be set up on a single host using the docker link option, or across multiple hosts using Weave or another container networking tool. Weave is a simple, user-friendly way to set up the cluster. For Weave installation and commands, see:

https://github.com/weaveworks/weave

http://xmodulo.com/networking-between-docker-containers.html
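As a rough sketch of the Weave approach (commands follow the Weave 1.x CLI; the container names and 10.0.0.x addresses match the description below, so adjust them to your setup):

```shell
# Install and launch Weave on the host (assumes Docker is already running).
sudo curl -L git.io/weave -o /usr/local/bin/weave
sudo chmod +x /usr/local/bin/weave
weave launch

# Point the docker client at the Weave proxy, then start the containers
# with their cluster IPs (older Weave releases used `weave run 10.0.0.5/24 ...`).
eval $(weave env)
docker run -dit --name hadoop_master -e WEAVE_CIDR=10.0.0.5/24 div4/hadoop_master /bin/bash
docker run -dit --name hadoop_slave  -e WEAVE_CIDR=10.0.0.7/24 div4/hadoop_slave  /bin/bash
```

With more than one physical host, run `weave launch <other-host-ip>` on each host so the peers form one network before starting the containers.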

To start the slave node:

Pull this image and run the container:

sudo docker run -it --name hadoop_slave -P -p 50070:50070 -p 50090:50090 div4/hadoop_slave /bin/bash

The dedicated Hadoop group is hadoop_group and the user account is hduser1.

su - hduser1
sudo service ssh start
ssh localhost

Update the /etc/hosts file with the IPs assigned to the Docker containers by Weave or your clustering tool. In the downloaded containers, the master has been assigned the IP 10.0.0.5/24 and the slave the IP 10.0.0.7/24 in the cluster's subnetwork.

vi /etc/hosts

and add the following entries:

10.0.0.5 master
10.0.0.7 slave1
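A minimal non-interactive equivalent (writing to a scratch copy here so it can be tried safely; inside the container, point HOSTS at /etc/hosts and run as root):

```shell
# Append the cluster entries; HOSTS is a scratch file for illustration.
HOSTS=./hosts.cluster        # inside the container: HOSTS=/etc/hosts
printf '%s\n' '10.0.0.5 master' '10.0.0.7 slave1' >> "$HOSTS"
grep -E 'master|slave1' "$HOSTS"   # confirm both entries were written
```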

Follow the remaining steps on the master node. Once the cluster is started, check the running daemons with:

jps
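On the slave, jps should list the two daemons mentioned above (DataNode and NodeManager) plus Jps itself; the PIDs below are illustrative:

```shell
jps
# 1234 DataNode
# 1301 NodeManager
# 1450 Jps
```

If either daemon is missing, check the logs under Hadoop's logs directory on the slave.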

The Hadoop web interface for the NameNode daemon is accessible at http://localhost:50070/.
If you are running Hadoop on a cloud server, replace localhost with the IP of the server running hadoop_master.

If a password is required, it is H4doop.

This multi-host configuration can be adjusted through the Hadoop configuration files. To run more than one slave, start the required number of slave containers and edit the Hadoop config files accordingly. These links may be helpful:

http://doctuts.readthedocs.org/en/latest/hadoop.html
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
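For example, adding a second slave might look like the following (the container name, the slave2 hostname, and the slaves-file path are assumptions based on a typical Hadoop 2.x layout; adjust to this image's actual paths):

```shell
# Start an additional slave container (extra slaves need no port mapping;
# the web UI ports are only published for the first container).
sudo docker run -dit --name hadoop_slave2 div4/hadoop_slave /bin/bash

# Give it an IP on the cluster network and an /etc/hosts entry
# (e.g. 10.0.0.8 slave2) on every node, then register it on the master:
echo 'slave2' >> "$HADOOP_HOME/etc/hadoop/slaves"   # path assumes Hadoop 2.x
```

After editing the slaves file, restart the cluster from the master so the new node is picked up.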

Owner
div4

Comments (5)
div4
2 years ago

Hi dealbitte,

When starting the master and slaves on the same machine, we cannot map the same host port to multiple containers, so port mapping doesn't work.

Yes, I think that's the best way when starting multiple containers for a cluster on the same host.

Thanks for bringing this up.

dealbitte
2 years ago

Hi div4,

A follow up question. You've mentioned "The cluster can be set up on a single host using docker link command".

As I am starting both master and slave on the same machine, I cannot use the port mapping instructions. So I started the slave with:

docker run -it --name hadoop_slave --link hadoop_master:hadoop_master div4/hadoop_slave /bin/bash

Please let me know if this is the right way to start it.

Thanks

dealbitte
2 years ago

Thanks for the reply. You are right. The issue was that the DOCKER chain had been deleted from iptables. After restarting the Docker daemon, I could start the containers.

div4
2 years ago

Hi dealbitte,
I am not sure what is causing the problem, but random IP addresses should not be an issue: -P is Docker's own way to map ports from the container to the host. It takes the (random) container IP into account and redoes the mapping each time a new container is run from the image.

The problem is probably that the iptables DOCKER chain did not initialize correctly. This can happen if port 50070 or 50090 is already allotted to another service, or due to an iptables issue. For the latter, see https://github.com/docker/docker/issues/10218.

You can try whether this works:
sudo docker run -it -P -p 50070:50070 -p 50090:50090 ubuntu:14.04 /bin/bash

dealbitte
2 years ago

Hi

Thanks for uploading the containers to the hub. I am not able to start either the master or the slave container. I think the reason is the IP address configuration (172.17.0.76 and 172.17.0.77), as the containers get random IP addresses when they start. Starting either the master or the slave results in a similar error; console output below.

  1. Starting the master
    $ docker run -it --name hadoop_master -P -p 50070:50070 -p 50090:50090 div4/hadoop_master /bin/bash
    Error response from daemon: Cannot start container 297bc5d907d17bc5f789a7bd36ed4a3bc9c75e1ca03df161968bf7b5d0f4ef8b: iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 50090 -j DNAT --to-destination 172.17.0.76:50090 ! -i docker0: iptables: No chain/target/match by that name.
    (exit status 1)
  2. Starting the slave
    $ docker run -it --name hadoop_slave -P -p 50070:50070 -p 50090:50090 div4/hadoop_slave /bin/bash
    Error response from daemon: Cannot start container 753a0b8ad792134e668ceed5d2d8beca8a4f5596c14592a5d0f9c5b9800ee6b6: iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 50090 -j DNAT --to-destination 172.17.0.77:50090 ! -i docker0: iptables: No chain/target/match by that name.
    (exit status 1)

Thanks