
= JGroups and Docker
:author: Bela Ban belaban@yahoo.com

#:backend: deckjs

#:deckjs_transition: fade
:navigation:
:deckjs_theme: web-2.0
:deckjs_transition: fade
:goto:
:menu:
:toc:
:status:

== Overview
This document describes how to use JGroups in Docker containers to form clusters locally,
on Amazon Web Services (AWS) and on the Google Cloud Platform (GCP).

The advantage of running Docker images in the cloud, rather than cloud-specific images, is that the Docker
images are the same across different clouds, whereas cloud images are always specific to a given cloud.

The predefined Docker image we'll use is https://hub.docker.com/r/belaban/jgroups. It contains a number
of interactive <<demos>>, i.e. demos which require a TTY.

The difference between running locally and running in a cloud lies in the JGroups configuration and the Docker
startup options. Clouds generally do not support IP multicasting, so JGroups applications have to resort to
TCP rather than UDP as transport. In addition, PING cannot be used and has to be replaced with a different discovery
protocol, e.g. S3_PING or GOOGLE_PING.

=== Bridged and host network

Docker supports none, bridge and host networks out of the box (https://docs.docker.com/engine/userguide/networking).
Which type of network is used is defined by the --network option, e.g. --network=host.

Network none has no communication to the outside world: the container only gets a loopback interface, so it is only
useful for processes running inside the same container. The bind address used is usually 127.0.0.1 (loopback) here.

==== Bridge network
Network bridge is the default and creates a virtual interface in the container that's not available on the host. To
be able to communicate with Docker containers on other hosts, the host's IP address has to be used, e.g. by setting
external_addr=<host's address>, or by switching to --network=host (see below).

For instance, after starting an AWS EC2 instance, ifconfig executed on the host shows the following interfaces:

[ec2-user@ip-172-31-14-130 ~]$ ifconfig
docker0   Link encap:Ethernet  HWaddr 02:42:33:C7:8E:F1
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:33ff:fec7:8ef1/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:22 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1342 (1.3 KiB)  TX bytes:1492 (1.4 KiB)

eth0      Link encap:Ethernet  HWaddr 0E:C8:A1:0F:39:E6
          inet addr:172.31.14.130  Bcast:172.31.15.255  Mask:255.255.240.0
          inet6 addr: fe80::cc8:a1ff:fe0f:39e6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:105050 errors:0 dropped:0 overruns:0 frame:0
          TX packets:37021 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:117948327 (112.4 MiB)  TX bytes:6929186 (6.6 MiB)
...

The docker0 interface is created by Docker to implement the bridged network. This is the interface that will be used
by the Docker container. Note that eth0 is the host's interface, with a routable 172.31.14.130 IP address.

When a container is started with bridged networking (the default):

  docker run -it -p 7800:7800 --rm belaban/jgroups

ifconfig executed in the container shows:

bash-4.3$ ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02
          inet addr:172.17.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe11:2%32571/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:508 (508.0 B)  TX bytes:258 (258.0 B)
...

The 172.17.0.2 address is assigned by Docker from the docker0 interface.

The issue with these Docker-private addresses is that they cannot be used to talk between AWS instances, as they are not
routed. For instances to talk to each other, the host's eth0 has to be used (from the 172.31.0.0 subnet).

To do this, JGroups has an attribute external_addr in the transport. In a configuration, the following transport
snippet would enable JGroups applications in network-bridged Docker containers to communicate:

[source,xml]
----
<TCP external_addr="${ext-addr:172.31.14.130}"
     bind_addr="match-interface:eth0"
     ...
/>
----

This means that JGroups will bind to eth0 (172.17.0.2), but advertise its address as 172.31.14.130, so
other instances can talk to the instance.

How to find out the host's address is platform-dependent;
on AWS, curl http://169.254.169.254/latest/meta-data/local-ipv4 returns the IP address of the EC2 instance.

An alternative is to grab the IP address before starting the Docker container and pass it to the container as an
environment variable (e.g. -e EXT_ADDR=1.2.3.4). The container's entrypoint could then start JGroups by passing
system property -Dext-addr=1.2.3.4, which will set external_addr in TCP.
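The entrypoint side of this idea could look roughly like the sketch below. It is not part of the belaban/jgroups image: the function name ext_addr_flag is made up for illustration, and the environment variable is spelled EXT_ADDR (with an underscore) because shell variable names cannot contain hyphens.

```shell
#!/bin/sh
# Hypothetical entrypoint helper (not shipped with the image): turns the
# EXT_ADDR environment variable into the ext-addr system property that the
# TCP snippet above reads via ${ext-addr:...}.
ext_addr_flag() {
  # Print nothing when EXT_ADDR is unset or empty, so that the default
  # given in the XML configuration applies.
  if [ -n "${EXT_ADDR:-}" ]; then
    printf '%s\n' "-Dext-addr=${EXT_ADDR}"
  fi
}
```

On an EC2 host, the address could be looked up and handed to the container like this:

  docker run -it --rm -p 7800:7800 -e EXT_ADDR=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4) belaban/jgroups

Inside the container, the output of ext_addr_flag would then be added to the java command line that starts the demo. How the demo scripts accept extra JVM options is not covered here, so that part is left as an assumption.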

==== Host network
A container using the host network is started as follows:

  docker run --network=host -it -p 7800:7800 --rm belaban/jgroups

This means that the started container has access to the same interfaces as present on the host. This can be confirmed
by executing ifconfig inside the container:

bash-4.3$ ifconfig
docker0   Link encap:Ethernet  HWaddr 02:42:33:C7:8E:F1
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:33ff:fec7:8ef1%32692/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:30 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1878 (1.8 KiB)  TX bytes:1492 (1.4 KiB)

eth0      Link encap:Ethernet  HWaddr 0E:C8:A1:0F:39:E6
          inet addr:172.31.14.130  Bcast:172.31.15.255  Mask:255.255.240.0
          inet6 addr: fe80::cc8:a1ff:fe0f:39e6%32692/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:192724 errors:0 dropped:0 overruns:0 frame:0
          TX packets:128453 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:138976219 (132.5 MiB)  TX bytes:27369702 (26.1 MiB)
...

As can be seen, the interfaces are the same as the host's interfaces. This means that the JGroups configuration doesn't
require an external_addr attribute and can simply define bind_addr:

[source,xml]
----
<TCP
     bind_addr="match-interface:eth0"
     ...
/>
----

This will bind the transport's sockets to 172.31.14.130.

==== Bridged versus host networking
It is simpler to use host networks as they don't require NATing between addresses (no external_addr). However, in
some cases host networking may not be available, for example when containers are launched automatically via
AWS Elastic Beanstalk.

== Running containers locally
The JGroups demos can be run as multiple containers forming a cluster on the same local box (in bridged or host network
mode), or across boxes in the local network (in host network).

When containers are run locally, the JGroups configuration uses IP multicasting,
as shown in ./conf/udp.xml (abridged):

[source,xml]
----
<config>
    <UDP bind_addr="match-interface:eth0,match-interface:en0,site_local,loopback" />
    <PING />
    <MERGE3 max_interval="30000" min_interval="10000"/>
    <FD_SOCK/>
    <FD_ALL timeout="10000" interval="3000"/>
    <pbcast.NAKACK2/>
    <UNICAST3 />
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="8m"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                view_bundling="true"/>
    <UFC max_credits="2M" min_threshold="0.4"/>
    <MFC max_credits="2M" min_threshold="0.4"/>
    <FRAG2 frag_size="60K"  />
</config>
----

This configuration uses IP multicasting (UDP as transport) and multicast discovery (PING as discovery protocol).
It binds to eth0 if found; if not, it tries to bind to en0 (Macs), then tries to find a site-local IP address, and
falls back to loopback (127.0.0.1) if none of the preceding options matched.

Run multiple containers as follows:

docker run -it --rm --network=host belaban/jgroups

In the container, there's a readme which describes how to run the demos, e.g. Chat. Also see <<demos>> below for details
on the demos. To run the Chat demo, run

  chat.sh -props udp.xml -name A

udp.xml is the default, so -props can be omitted.

This should create a cluster of Chat nodes, even across hosts due to --network=host.

If IP multicasting is not supported, one can always fall back to TCP, but this means copying tcp.xml (for example)
from the JGroups distribution into the Docker container and modifying it (e.g. change the discovery protocol). The
JGroups manual at http://www.jgroups.org/manual4/index.html shows how to do this.

== Amazon Web Services (AWS)

This section shows how to run Docker containers with the JGroups demo on AWS. Unlike when running locally, we have
to use a security policy to define which traffic (TCP/UDP/ICMP) may be sent and received on which ports.

NOTE: Alternative ways of running JGroups in AWS include (1) creating/customizing an EC2 image and then running a host
with the image directly (without docker), or (2) using https://aws.amazon.com/ecs[EC2 Container Service (ECS)],
which runs docker images on a number of EC2 instances. +
The reason (1) is not discussed here is laziness :-) I did not want to go through the rather long turnaround times
of iteratively creating and customizing an AMI. Instead, creating and customizing a docker image and running/testing it
locally made for much faster turnaround times! +
Running on ECS (option 2) is similar to what's described here, except that there's no need to spin up EC2 instances
manually, as this is done by ECS when defining a cluster.

Since IP multicasting is not supported, we have to switch to TCP as transport and also change the discovery
protocol (see <<AWS_Discovery>> below).

Finally, if we use S3 for discovery, the EC2 instance needs an IAM role that permits creating S3 buckets and
reading and writing the objects in them. These topics are discussed below.

The Amazon image used for running the demo contains a bare bones Linux plus the Docker software: ami-92659c84.
Search for amazon-ecs-optimized to find it. Of course, any other AMI that contains Docker can be used.

First, an EC2 instance is created with this AMI, then we need to ssh into it and start the Docker container.

=== Security policy
The security policy used for the demo permits all traffic from any protocol on any port and from/to any address. This
is fine for a demo, but of course not recommended for production.

Alternatively, using _no_ security group is the same as above.

The selected security policy needs to be associated with the EC2 instance when it is started.

=== IAM security role to access S3
If S3 (e.g. S3_PING or NATIVE_S3_PING) is used for discovery in the JGroups configuration, then the EC2 instance
needs to be started with a role that permits access to S3 buckets, specifically creation of buckets, reading objects
from buckets and writing objects to buckets (deletion is currently not used).

The role that I used for the demo was AmazonS3FullAccess which (as its name suggests) has all permissions regarding
S3 buckets.

[[AWS_Discovery]]
=== Discovery

The task of discovery is for a new cluster node to find the coordinator of the cluster to join, and send it a join
request. Whereas PING sends a multicast and everyone responds with the coordinator's address and information about
themselves, this cannot be done in a cloud where IP multicasting usually isn't supported.

Therefore, we have to resort to other ways of running discovery. They're discussed in the following sections.

[[NATIVE_S3_PING]]
==== NATIVE_S3_PING

NATIVE_S3_PING is a separate protocol developed at https://github.com/jgroups-extras/native-s3-ping. It uses the
Amazon S3 Java SDK to access buckets in S3 which store information about cluster members.

The advantage over link:http://www.jgroups.org/manual4/index.html#_s3_ping[S3_PING] is that no credentials
(AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY) need to be passed to the EC2 instance on startup; instead, the
credentials of the user who started the EC2 instance are used.

There's a configuration ./conf/aws.xml which includes this protocol:

[source,xml]
----
<config>
    <TCP external_addr="${JGROUPS_EXTERNAL_ADDR:match-interface:eth0}"
         bind_addr="site_local,match-interface:eth0"
         bind_port="${TCP_PORT:7800}" />
    <!--
    Uses an S3 bucket to discover members in the cluster.

      - If "mybucket" doesn't exist, it will be created (requires permissions)
    -->
    <org.jgroups.aws.s3.NATIVE_S3_PING
        region_name="${S3_REGION:us-east-1}"
        bucket_name="${S3_BUCKET:mybucket}"
     />
    <MERGE3 max_interval="30000" min_interval="10000"/>
    <FD_SOCK external_addr="${JGROUPS_EXTERNAL_ADDR}"
             start_port="${FD_SOCK_PORT:9000}"/>
    <FD_ALL timeout="10000" interval="3000"/>
    <pbcast.NAKACK2/>
    <UNICAST3/>
    <pbcast.STABLE desired_avg_gossip="50000"
                   max_bytes="8m"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                view_bundling="true"/>
    <UFC max_credits="2M" min_threshold="0.4"/>
    <MFC max_credits="2M" min_threshold="0.4"/>
    <FRAG2 frag_size="60K"  />
</config>
----

Attributes bind_addr and external_addr were discussed above. Note that the latter is not required if the Docker
container is started with --network=host.

The attributes used by NATIVE_S3_PING are region_name and bucket_name. The latter defines the bucket that will
be used to store information about the members of this cluster. All objects created in this bucket are prefixed by
the cluster name ("draw" in the example), e.g. mybucket/draw-126532-A.list. The former defines the region in which
the bucket is located.

NOTE: If a bucket doesn't exist, a new one will be created. Since bucket names share a global name space, using a name
that is already taken by a different user will result in an exception. It is therefore recommended to create a bucket
up-front and use it as bucket_name.

Both region and bucket can be overridden by passing system properties S3_REGION and S3_BUCKET to the JGroups
demo. Similar to passing ext-addr (see above), the Docker container could be started with two environment variables
for region and bucket name, and a script could then read them from the environment and pass them as system properties
to the demo.
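A minimal sketch of such a script follows. The helper name props_from_env is hypothetical; the variable names S3_REGION and S3_BUCKET are taken from the placeholders in aws.xml above. Variables that are unset are skipped, so the defaults in the configuration still apply.

```shell
#!/bin/sh
# Hypothetical helper: for every environment variable name given as an
# argument, emit a -DNAME=value JVM flag if the variable is set and non-empty.
props_from_env() {
  flags=""
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    # Only call this with trusted, literal variable names.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      flags="${flags:+$flags }-D${name}=${value}"
    fi
  done
  printf '%s\n' "$flags"
}
```

The container would be started with, for example, -e S3_REGION=eu-west-1 -e S3_BUCKET=my-demo-bucket (both values are placeholders), and the script would add the output of props_from_env S3_REGION S3_BUCKET to the java command line.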

==== S3_PING
S3_PING is discussed in the JGroups manual at http://www.jgroups.org/manual4/index.html#_s3_ping.

==== TCPGOSSIP and GossipRouter
link:http://www.jgroups.org/manual4/index.html#TCPGOSSIP_Prot[TCPGOSSIP] is a discovery protocol which stores
information about cluster nodes in one or more GossipRouters, which are separate processes acting as lookup services.
The configuration of TCPGOSSIP needs to include the addresses of the GossipRouters, e.g.:

[source,xml]
----
<TCP .../>

<TCPGOSSIP
    initial_hosts="GR1[12001],GR2[12001]"
/>
----

This means that there are GossipRouters running on GR1 and GR2 at port 12001, and TCPGOSSIP will register
each member with both processes.

The GossipRouter processes can be started inside Docker containers, too, e.g. using the image available at
https://hub.docker.com/r/jboss/jgroups-gossip.

==== TCPPING
link:http://www.jgroups.org/manual4/index.html#TCPPING_Prot[TCPPING] lists all cluster nodes in a static list, e.g.

[source,xml]
----
<TCP bind_port="7800" .../>

<TCPPING
    initial_hosts="HostA[7800],HostB[7800],HostC[7800],..."
/>
----

The problem here is that the IP addresses of HostA, HostB, etc. have to be known before starting any of the EC2
instances. This requires either fixed addresses, or addresses mapped to a DNS service, e.g. Amazon Route 53.

An alternative is to use AWS Elastic IP addresses, where the address of each cluster node is fixed and all of the
potential members of the cluster are known ahead of starting the cluster.

Another alternative is to create a Virtual Private Cloud (VPC), e.g. 172.45.0.0, and assign addresses from
block 172.45.0.1 - 172.45.0.100. This means that TCPPING.initial_hosts needs to list all 100 addresses of this
block.
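Writing out 100 entries by hand is error-prone, so the list could be generated instead. The following is a minimal sketch: the function name initial_hosts is made up for illustration, and port 7800 is taken from the bind_port in the TCPPING example above.

```shell
#!/bin/sh
# Hypothetical helper: print a TCPPING initial_hosts value for a contiguous
# address range.
# Usage: initial_hosts <first three octets> <first host> <last host> <port>
initial_hosts() {
  prefix=$1
  i=$2
  last=$3
  port=$4
  hosts=""
  while [ "$i" -le "$last" ]; do
    # Append "address[port]", comma-separated after the first entry.
    hosts="${hosts:+$hosts,}${prefix}.${i}[${port}]"
    i=$((i + 1))
  done
  printf '%s\n' "$hosts"
}
```

Calling initial_hosts 172.45.0 1 100 7800 prints the value for the full block; the result can be pasted into the initial_hosts attribute.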

==== Elastic File System (EFS) and FILE_PING

Another way to run discovery on AWS is https://aws.amazon.com/efs/[Elastic File System (EFS)]. EFS is a distributed file
system, accessible by all cluster nodes. Any creation or modification of a file by a cluster member is visible to
every other member.

http://www.jgroups.org/manual4/index.html#FILE_PING[FILE_PING] can therefore be used for discovery; the location
attribute needs to point to a directory mounted by EFS. Reads and writes are performed by EFS.

==== JDBC_PING / RDS

http://www.jgroups.org/manual4/index.html#_jdbc_ping[JDBC_PING] uses a table in a database for discovery. In
conjunction with https://aws.amazon.com/rds[RDS], JDBC_PING can be set up to store information in an RDS table.

=== Running the Docker containers on EC2
After spinning up an EC2 instance, we're now ready to run the Docker container with image belaban/jgroups.

The JGroups configuration in the Docker image is ./conf/aws.xml, listed in <<NATIVE_S3_PING>>. It requires
ports 7800 (used by TCP) and 9000 (used by FD_SOCK) to be published; therefore the command to start the container is:

docker run -it --rm --network=host -p 7800:7800 -p 9000:9000 belaban/jgroups

  • -it: starts an interactive shell (TTY) so we can interact with (e.g.) the Chat demo
  • --rm: removes the container when done
  • --network=host: picks a host network. If omitted, a bridge network would be picked by default
  • -p 7800:7800 -p 9000:9000: publishes container ports 7800 and 9000 on the same port numbers of the host
  • belaban/jgroups: the Docker image hosted on Docker Hub (hub.docker.com)

NOTE: When using a host network, publishing the ports is not necessary, therefore the -p 7800:7800 -p 9000:9000
option can be omitted.
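The relationship between network mode and port publishing can be summarized in a small helper. This is only a sketch; the function name network_opts is hypothetical, and the ports are the ones required by aws.xml.

```shell
#!/bin/sh
# Hypothetical helper: print the network-related docker run options for a
# given network mode.
network_opts() {
  case "$1" in
    host)
      # Host network: the container shares the host's interfaces,
      # so no ports need to be published.
      printf '%s\n' "--network=host"
      ;;
    bridge)
      # Bridge network (the default): publish the TCP and FD_SOCK ports.
      printf '%s\n' "--network=bridge -p 7800:7800 -p 9000:9000"
      ;;
    *)
      printf 'unknown network mode: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

It would be used as follows (unquoted, so that the options are split into separate arguments):

  docker run -it --rm $(network_opts host) belaban/jgroups

Note that in bridge mode the external address also has to be set, as described in the section on bridge networks.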

When the Docker container has been started, the entrypoint (/bin/bash) shows a readme which explains how to start the
demos. For example, to run Chat, execute the following command:

chat.sh -props aws.xml -name A -b bucket-name

  • -props aws.xml: this instructs Chat to use aws.xml (described above) as configuration
  • -name A: gives each cluster node a unique name. If omitted, a random name will be picked
  • -b bucket-name: the name of the S3 bucket. Will be created if it doesn't exist. Note that an exception will be
    thrown if that name is already in use by a different project.

== Google Cloud Platform (GCP)

There is a separate project https://github.com/jgroups-extras/jgroups-google[jgroups-google] which uses Google
Cloud Storage for discovery, and provides GOOGLE_PING2 as discovery protocol.

It is meant to be used with Google Compute Engine nodes that are started manually. To
run JGroups in Google Container Engine (GKE), which uses Kubernetes under the covers, we recommend using
KUBE_PING instead.

Refer to http://belaban.blogspot.ch/2017/05/running-infinispan-cluster-with.html for step-by-step instructions on
how to run a JGroups cluster on GKE.

[[demos]]
== Demos

To run the image directly, execute

  docker run -it --rm --network=host belaban/jgroups

or

  docker run -p 7800:7800 -p 9000:9000 -it --rm belaban/jgroups

To build the image, run

  docker build .

The demos are described below. The idea is to run each demo app in its own container
on the same host; the containers will then form a cluster.

=== Chat
To run it:

  chat [-props config] [-name name] [-b bucketname]

For example:

  chat -props ./udp.xml -name A

Run the Chat application in multiple containers on the same host and they will form a cluster.
Typing a message into one Chat will send it to all other chats.

=== Distributed locks

Distributed locks are implementations of java.util.concurrent.locks.Lock and provide locks that can be
accessed from all nodes in a cluster.

A typical use case is to lock a resource so that only one thread on one node in the cluster can access it at a time.
Should a node crash while holding a lock, the lock is released immediately.

For more details, see the section on distributed locks at http://www.jgroups.org/manual/index.html#LockService.

To run the lock demo, type:

  lock [-name name]

Typing help into the shell shows a few commands:

[jgroups@b21d0fa6c79d ~]$ lock -name A

-------------------------------------------------------------------
GMS: address=A, cluster=lock-cluster, physical address=172.17.0.178:52519
-------------------------------------------------------------------
: help

LockServiceDemo [-props properties] [-name name]
Valid commands:
    lock (<lock name>)+
    unlock (<lock name> | "ALL")+
    trylock (<lock name>)+ [<timeout>]

Example:
    lock lock lock2 lock3
    unlock all
    trylock bela michelle 300
:

If you start instances A and B, you can try out the following:

  1. A: lock printer
  2. B: lock printer // will block
  3. A: unlock printer // now B will get the lock

Or a lock holder can be killed:

  1. A: lock printer
  2. B: lock printer
  3. Kill A. B will now get the lock on "printer"

=== Distributed counters

Distributed counters are counters which can be atomically incremented,
decremented, compare-and-set etc. across a cluster.

To run the demo:

  count [-name name]

Run multiple instances in different containers. The demo uses a
counter named "mycounter" and there's a command prompt which shows the
commands to be executed.

Questions can be asked on the users or dev mailing lists:
https://sourceforge.net/p/javagroups/mailman.

Enjoy !

Bela Ban
