A base container with confd.
This is the base container the rest of my containers are based on. While in normal circumstances, I would adopt the "use the official images" approach, I'm building a CoreOS cluster and I want to have some very specific behaviours and characteristics for my containers.
Some of these characteristics are described in more detail below, but in summary:
- Small image size (based on Alpine Linux)
- Supervised processes (but very few of them)
- Stateless, etcd-driven configuration via confd
Eventually, I'd like to transition to Rkt instead of Docker, but for now the learning curve is too steep and there appear to be some extra infrastructure needs.
The way I'm using this is:
- Create images based on this one. I have several (see further down).
- Build a CoreOS cluster on your cloud infrastructure of choice. In my case I'm using DigitalOcean; their service has been stellar for the last year that I've been using them, and they are pretty low cost.
- Instantiate units in the CoreOS cluster with "docker run". See CoreOS docs for guidance on how to do that. The important bit is to make sure that the required environment variables are included. For example: -e ETCD_SRV_RECORD=_etcd-server-ssl._tcp.coreos.example.com
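As a sketch, a unit on a CoreOS host might start a container like this (the image name, prefix and certificate path below are illustrative placeholders, not values from this repo):

```shell
docker run -d \
  -e ETCD_SRV_RECORD=_etcd-server-ssl._tcp.coreos.example.com \
  -e ETCD_PREFIX=/services/my-app \
  -v /etc/ssl/etcd:/etc/ssl/etcd:ro \
  myuser/docker-nginx-lb
```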
Images using this one
Some images that I'm using in my environment:
- docker-nginx-lb: Provides an etcd-aware load balancer that other containers can instruct where to forward traffic based on etcd configuration values.
- docker-mariadb: A MariaDB database server. Replication support is enabled and handled via etcd.
- docker-phabricator: A php-fpm server with Phabricator installed, etcd-aware (etcd configures the Phabricator disk-based config). I use Phabricator for most of my work and personal stuff. It is so good that I forgive it for being written in PHP.
Some more images I plan to build in the near future:
- docker-nodejs: A nodeJS web server that deploys from an etcd-configured git repository, and auto-redeploys when the git repo "deploy" branch is updated.
- docker-mail-backend: A Dovecot-based mail store server, etcd-configured with dsync replication set up.
- docker-smtp-server: an SMTP server (goes hand-in-hand with the mail store server)
- docker-webmail: a webmail server (goes hand-in-hand with...well, you get the idea)
- docker-ldap-server: an LDAP server, etcd-configured with replication support.
- docker-buildmaster: a buildmaster server. Not sure what CI tool I want to use for this yet. Maybe several containers for different tools, why limit myself to only one?? ;)
- docker-buildslave: a build slave instance - etcd tells it where to find the master.
- docker-log-server: a logstash server or something. Haven't really thought it through yet.
This section describes some of the characteristics this image is designed to provide.
Based on Alpine Linux, the image is small - REALLY small.
The total size of the base image is around 25MB - most of this is confd, which includes support for some backends we don't need (we only require etcd). With some build customisation, the image could probably be reduced to under 20MB. That's mostly unnecessary, but it's something I'll look at doing anyway: confd runs in every container, so the less memory and disk it takes, the better.
Why does small matter? Especially in a CoreOS environment, there may be many nodes in many places, and fast network connections between nodes, and between nodes and the registry, are not guaranteed. The smaller the image, the faster new container units deploy.
And of course, every image takes space, and I'd rather not pay my cloud providers any more than I need to, even if it's only a small percentage of total cost.
The image uses the s6 process supervisor. While I subscribe to the notion of "one process/thing per container" that Docker advocates, in my architecture it makes sense to use something extremely lightweight like s6 to make sure that I only crash/restart my containers when I want to.
s6 gives me a wonderful amount of control. It lets me define a service runner for confd to make sure it is always running, and images built on top of this one can define a service runner for their own "thing" (e.g. MariaDB or Nginx). Features like 's6-svwait' allow the "thing" to wait until confd is up - really up: s6 can notify when a service is "really ready", so even if confd has started but is not yet connected, other processes will wait.
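A minimal sketch of what a dependent service's run script could look like (the service directory path and the nginx example are illustrative; the real layout in a derived image may differ):

```shell
#!/bin/sh
# Hypothetical s6 run script for a service layered on this image.
# s6-svwait -U blocks until confd's readiness notification fires,
# i.e. confd is "really up", not merely started.
s6-svwait -U /var/run/s6/services/confd
exec nginx -g "daemon off;"
```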
As with Alpine, it's REALLY small and low overhead. The entire, very feature-rich application suite is around 800KB! Each supervisor instance measures its memory usage in bytes, and generally does not use heap memory unless absolutely needed. Really the only downside with s6 is that the documentation is awful - but mostly because it is not well-oriented to beginners; once you understand how s6 works, it generally makes sense.
The image is designed to be as stateless as possible. This is an absolute necessity for building a useful CoreOS cluster environment.
Stateless behaviour is achieved as follows:
- The container requires environment variables that tell it where to find the CoreOS etcd service - ETCD_SRV_DOMAIN or ETCD_SRV_RECORD.
- Authentication is optional, but can be provided (ETCD_USERNAME, ETCD_PASSWORD)
- If the ETCD_PREFIX variable is provided, it tells the container where to find the etcd key-value pairs for that specific container.
- Kelsey Hightower's confd binary handles connection to etcd and generation of configuration files.
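A minimal sketch of the fail-fast check an entrypoint could perform for those variables (the default value below is purely for demonstration, not something the image sets):

```shell
#!/bin/sh
# Hypothetical entrypoint fragment: refuse to start without etcd discovery info.
# The default assignment here only exists to make the example self-contained.
ETCD_SRV_RECORD="${ETCD_SRV_RECORD:-_etcd-server-ssl._tcp.coreos.example.com}"

if [ -z "$ETCD_SRV_DOMAIN" ] && [ -z "$ETCD_SRV_RECORD" ]; then
  echo "error: ETCD_SRV_DOMAIN or ETCD_SRV_RECORD must be set" >&2
  exit 1
fi

# Prefer the explicit SRV record if both are present.
record="${ETCD_SRV_RECORD:-$ETCD_SRV_DOMAIN}"
echo "etcd discovery via: $record"
```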
With this approach, the entire configuration of the container can be provided via etcd - minimising the need for local configuration, configuration via a multitude of environment variables, or - even worse, shudder - a volume mount to the host for a specific config directory.
The beauty of confd is that it is smart enough to detect configuration changes in etcd - when a config value changes, it validates that the result is a valid configuration, then automatically re-generates the config file and reloads the affected process(es).
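As a sketch, a confd template resource in a derived image could wire up that check-and-reload cycle like this (the keys and commands are hypothetical examples for an nginx container, not files from this repo):

```toml
# /etc/confd/conf.d/nginx.toml - hypothetical confd template resource
[template]
src = "nginx.conf.tmpl"
dest = "/etc/nginx/nginx.conf"
keys = ["/nginx"]
# Validate the rendered file before swapping it in, then reload nginx.
check_cmd = "nginx -t -c {{.src}}"
reload_cmd = "nginx -s reload"
```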
There are also some other ideas in relation to confd/etcd - for example, if the CoreOS cluster has a load balancer container, application containers could register their application data to the load balancer's etcd tree, helping to configure the load balancer to forward the right URLs and Virtual Hosts to the application container.
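Registration like that could be as simple as the application container writing its details into the load balancer's etcd tree on startup (the key layout and address below are hypothetical):

```shell
# Hypothetical: an app container announces its upstream to the LB's etcd tree.
etcdctl set /services/lb/vhosts/www.example.com/upstream 10.0.0.12:8080
```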
The only extra state currently required is a volume mount for the SSL certificates and key used to provide TLS authentication to etcd. Ideally, this need can be eliminated in future. So far my only idea for this is to remove client certificate authentication in favor of SSL with Basic Auth. Not sure this is the best solution, but so far I haven't come up with a better way.
Improvements to be made
This area is to some extent a dumping ground for my own ideas on improving the base image.
- I prefer to avoid HTTP, but I should add support for it as a scheme for confd
- A logging strategy is needed - preferably something end-to-end that can take logs from s6-supervised processes and deliver them to a central log store, whether that be rsyslog, logstash, or something else entirely.
- Proper container reporting and cleanup via s6/.s6-svscan/crash|finish would be useful.
- etcd and confd are awesome and provide for the configuration side of things, but some kind of support for diagnostics and central exception handling would be useful (a modern version of SNMP, or maybe SNMP itself is the way to do it?).
- SSL certificates (and, worse, keys) are handled via a volume mount. While I'm aware that a lot of people use this approach, it makes me vomit in my mouth a little every time I think about it. A better strategy is needed, but the ideas are thin on the ground at the moment, other than somehow switching to basic auth instead of TLS client auth.
- Confd is currently a binary in the git repo for this image. I don't like doing that, but it was a matter of expedience - in the longer term, I'd like to get my own build environment for Alpine running and automatically create APK packages for new confd releases. This will also give me an opportunity to look into things like extracting out the unnecessary backends and see if I can get confd smaller than it already is.