Last pushed: 2 years ago

Docker image for Hadoop

Why did I build this image?

I need a Hadoop image that:

  • Is built to run with Java 8
  • Allows customizing the Hadoop configuration at runtime
  • Can be used to run either a Hadoop server or a Hadoop client

There are already some Docker images for Hadoop, but I could not find one with the features I need, so I decided to build this image.

Which features does this Docker image provide?

  • Hadoop installed in pseudo-distributed mode
  • Options customizable at runtime (via environment variables):

    • Namenode hostname. Default is localhost
    • Default is yarn
    • Default is 512m
    • Default is 512m
    • yarn.resourcemanager.hostname. Default is
    • yarn.nodemanager.delete.debug-delay-sec. Default is 600
    • yarn.scheduler.minimum-allocation-mb. Default is 32m
    • yarn.scheduler.maximum-allocation-mb. Default is 1024
    • yarn.nodemanager.resource.memory-mb. Default is 2048m
    • yarn.nodemanager.vmem-check-enabled. Default is false
  • Built following best practices for reducing image size (about 900 MB right now, which is much smaller than some other Hadoop Docker images)
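
As a minimal sketch of how a runtime option could be overridden, the function below passes an environment variable to docker run with -e. It assumes that HD_NAMENODE_HOST (the only variable name that appears in this README, in the Compose example) is the variable behind the Namenode hostname option; the function name start_hadoop and its argument are hypothetical.

```shell
# Sketch: start the server container with a custom NameNode hostname.
# Assumption: HD_NAMENODE_HOST controls the "Namenode hostname" option.
# start_hadoop is a hypothetical helper, not part of the image.
start_hadoop() {
  # The first argument is the NameNode hostname; fall back to the
  # documented default (localhost) when none is given.
  docker run -d --name hadoop \
    -e HD_NAMENODE_HOST="${1:-localhost}" \
    binhnv/docker-hadoop
}
```

Calling start_hadoop with no argument should behave like the plain docker run command, since localhost is the documented default.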

How to use

Server and client in one container

# Start a Hadoop server in a container named hadoop
docker run -d --name hadoop binhnv/docker-hadoop
# Log in to the Hadoop container
docker exec -it hadoop bash
# Create a directory
/usr/local/hadoop/bin/hadoop fs -mkdir /test
# List directory
/usr/local/hadoop/bin/hadoop fs -ls /
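
The same file system commands can probably also be run from the host without opening an interactive shell, because docker exec accepts a command to run inside the container. The wrapper below is a sketch under that assumption; the name hdfs_in_container and the HADOOP_CONTAINER variable are hypothetical, while the container name and Hadoop path are taken from the commands above.

```shell
# Sketch: run "hadoop fs" inside the running container from the host.
# hdfs_in_container and HADOOP_CONTAINER are hypothetical names.
hdfs_in_container() {
  # Defaults to the container name used above (hadoop).
  docker exec "${HADOOP_CONTAINER:-hadoop}" \
    /usr/local/hadoop/bin/hadoop fs "$@"
}

# Usage (requires the container to be running):
#   hdfs_in_container -mkdir /test
#   hdfs_in_container -ls /
```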

Server and client in separate containers

Create a Dockerfile for the Hadoop client like this:

FROM binhnv/docker-hadoop

CMD dhcmd config && tail -f /dev/null

Create a Docker Compose configuration file like this:

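version: "2"

services:
  # Hadoop server; the service name matches HD_NAMENODE_HOST below
  hadoops:
    image: binhnv/docker-hadoop
  # Hadoop client built from the Dockerfile above
  # (the service name hadoopc is an assumption)
  hadoopc:
    build: .
    environment:
      HD_NAMENODE_HOST: "hadoops"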

Bring up the stack:

docker-compose up -d --build

Now you can log in to the client container and run Hadoop commands there.
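
One way to log in is docker-compose exec, which runs a command in a running service container. The sketch below assumes the image ships bash (as the docker exec example earlier suggests); client_shell is a hypothetical helper, and the service name must be whatever the client is called in your Compose file.

```shell
# Sketch: open a shell in the client service of the Compose stack.
# client_shell is a hypothetical helper; pass your client service name.
client_shell() {
  # Abort with a usage hint if no service name is given.
  docker-compose exec "${1:?usage: client_shell SERVICE}" bash
}

# Usage, run from the directory that holds the Compose file:
#   docker-compose ps          # list the services in the stack
#   client_shell SERVICE       # then, inside the container:
#   /usr/local/hadoop/bin/hadoop fs -ls /
```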
