# docker-postgresql-s3-backup

docker-postgresql-s3-backup is a Docker image for performing full backups of a PostgreSQL database and archiving those backups in S3.

It does not currently support incremental backups.

## How it works

This image is built atop Alpine Linux.

The backup strategy is borrowed from an earlier strategy that was managed with Puppet. The goal has been to port that strategy to Docker with as little change as possible. Although the implementation may not be as minimal as it could be, the author values the strategy's ease of use.

The strategy employs the Ruby backup gem; see that project's documentation for details.

The Ruby backup gem allows backup jobs to be configured using a concise and expressive DSL. The image comes with Ruby and the required gems pre-installed and is pre-configured to back up a PostgreSQL database, tar and gzip the backup, and transfer the backup to S3. Connection details for both the PostgreSQL database and S3 are sourced from environment variables.
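
For illustration, a backup model written in the gem's DSL looks roughly like the following. This is a sketch, not the exact model shipped in the image: the trigger name is made up, and the exact way the image wires environment variables into the model is an assumption.

```
# Sketch of a backup gem model; names are illustrative, not the
# image's actual configuration.
Model.new(:postgresql_s3, 'PostgreSQL backup archived to S3') do
  database PostgreSQL do |db|
    db.name     = ENV['DB_NAME']
    db.username = ENV['DB_USERNAME']
    db.password = ENV['DB_PASSWORD']
    db.host     = ENV['DB_HOST']
    db.port     = ENV['DB_PORT']
  end

  # The packaged backup is tarred by the gem and gzipped here.
  compress_with Gzip

  store_with S3 do |s3|
    s3.access_key_id     = ENV['S3_ACCESS_KEY']
    s3.secret_access_key = ENV['S3_SECRET_KEY']
    s3.region            = ENV['S3_REGION']
    s3.bucket            = ENV['S3_BUCKET']
    s3.path              = ENV['S3_PATH']
    s3.keep              = ENV['S3_KEEP'].to_i  # drives reaping of old generations
  end
end
```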

## Usage

### Basic

Minimal usage takes the following form:

```
docker run --rm \
  -e DB_NAME=<db name> \
  -e DB_USERNAME=<db username> \
  -e DB_PASSWORD=<db password> \
  -e DB_HOST=<db host or ip address> \
  -e DB_PORT=<db port; usually 5432> \
  -e S3_ACCESS_KEY=<aws access key> \
  -e S3_SECRET_KEY=<aws secret access key> \
  -e S3_REGION=<region; e.g. us-east-1> \
  -e S3_BUCKET=<bucket name> \
  -e S3_PATH=<path within bucket to store backups> \
  -e S3_KEEP=<number of generations to keep> \
  krancour/postgresql-s3-backup:<version>
```

All environment variables passed to the container in the example above are required. The meaning of each should be obvious, with the exception of S3_KEEP, which indicates how many generations of backups to retain in S3. Each time the container runs, the latest n generations of backups are retained, where n is the value of S3_KEEP, and all older generations are permanently deleted. For example, with S3_KEEP=7 and a nightly schedule, roughly one week of backups is kept.

Note that this feature will properly reap older backups only if the advanced usage instructions below are observed.

### Advanced

In order for reaping of older backups to work properly, the backup gem must be able to retain and access a small amount of state. The container stores this state in Docker data volumes. Since those data volumes are destroyed along with the container that created them, the basic usage documented above is not sufficient for anything more than development or testing.

There are two options for how to manage this.

The first option is to create a container once and simply restart that same container each time a backup is to be performed.

For example, to create the container:

```
docker create --name <container name> \
  -e DB_NAME=<db name> \
  -e DB_USERNAME=<db username> \
  -e DB_PASSWORD=<db password> \
  -e DB_HOST=<db host or ip address> \
  -e DB_PORT=<db port; usually 5432> \
  -e S3_ACCESS_KEY=<aws access key> \
  -e S3_SECRET_KEY=<aws secret access key> \
  -e S3_REGION=<region; e.g. us-east-1> \
  -e S3_BUCKET=<bucket name> \
  -e S3_PATH=<path within bucket to store backups> \
  -e S3_KEEP=<number of generations to keep> \
  krancour/postgresql-s3-backup:<version>
```

To run the containerized backup:

```
docker start -i <container name>
```

Alternatively, you can bind mount local directories into the container to ensure state is retained on the local host between executions of the containerized backup process. This can be accomplished like so:

```
docker run --rm \
  -v /Users/<username>/Backup/data:/root/Backup/data \
  -v /Users/<username>/Backup/log:/root/Backup/log \
  -e DB_NAME=<db name> \
  -e DB_USERNAME=<db username> \
  -e DB_PASSWORD=<db password> \
  -e DB_HOST=<db host or ip address> \
  -e DB_PORT=<db port; usually 5432> \
  -e S3_ACCESS_KEY=<aws access key> \
  -e S3_SECRET_KEY=<aws secret access key> \
  -e S3_REGION=<region; e.g. us-east-1> \
  -e S3_BUCKET=<bucket name> \
  -e S3_PATH=<path within bucket to store backups> \
  -e S3_KEEP=<number of generations to keep> \
  krancour/postgresql-s3-backup:<version>
```

### Scheduling backups

The nuances of different distributed scheduling mechanisms are certain to affect the approach one takes to scheduling backups.

The following works for CoreOS / Fleet:

`<db name>-backup.service`:

```
[Unit]
Description=Backup for <db name>
Requires=docker.service

[Service]
Type=simple
Environment="IMAGE=krancour/postgresql-s3-backup:<version>" "CONTAINER=<db name>-backup"
ExecStartPre=/bin/sh -c "docker history $IMAGE >/dev/null 2>&1 || docker pull $IMAGE"
ExecStartPre=/bin/sh -c "docker inspect $CONTAINER >/dev/null 2>&1 && docker rm -f $CONTAINER || true"
ExecStart=/bin/sh -c "docker run --name $CONTAINER --rm \
  -v /var/lib/Backup/data:/root/Backup/data \
  -v /var/lib/Backup/log:/root/Backup/log \
  -e DB_NAME=<db name> \
  -e DB_USERNAME=<db username> \
  -e DB_PASSWORD=<db password> \
  -e DB_HOST=<db host or ip address> \
  -e DB_PORT=<db port; usually 5432> \
  -e S3_ACCESS_KEY=<aws access key> \
  -e S3_SECRET_KEY=<aws secret access key> \
  -e S3_REGION=<region; e.g. us-east-1> \
  -e S3_BUCKET=<bucket name> \
  -e S3_PATH=<path within bucket to store backups> \
  -e S3_KEEP=<number of generations to keep> \
  $IMAGE"

[Install]
WantedBy=multi-user.target
```

`<db name>-backup.timer`:

```
[Unit]
Description=Backup for <db name>
Requires=docker.service

[Timer]
OnCalendar=*:00:00

[Install]
WantedBy=multi-user.target

[X-Fleet]
X-ConditionMachineOf=<db name>-backup.service
```

The OnCalendar expression above will execute the backup service every hour on the hour.
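
Other schedules use standard systemd calendar syntax; the expressions below are examples, not part of this image:

```
[Timer]
# Daily at 03:00:
OnCalendar=*-*-* 03:00:00

# Or weekly, Mondays at 04:30:
# OnCalendar=Mon *-*-* 04:30:00
```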

If scheduling backups for multiple databases, it is highly recommended not to back up multiple databases at once; staggering the timers, as sketched below, is one simple way to avoid this.
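
For example, offsetting the OnCalendar expressions keeps the backup windows from overlapping:

```
# In <db one>-backup.timer:
[Timer]
OnCalendar=*:00:00

# In <db two>-backup.timer:
[Timer]
OnCalendar=*:30:00
```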

From the terminal:

```
$ fleetctl load <db name>-backup.service
$ fleetctl load <db name>-backup.timer
$ fleetctl start <db name>-backup.timer
```
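
To verify that the units were scheduled and to inspect output from the most recent run:

```
$ fleetctl list-units
$ fleetctl journal <db name>-backup.service
```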

#### Your mileage may vary

If using a distributed init system other than Fleet (Mesos, for instance), the instructions above will not be especially useful. Pull requests adding instructions for other schedulers are welcome.

Lastly, if the node on which a backup job has been scheduled is removed from the cluster, whether gracefully or abruptly, and the job is rescheduled to another node, state will be lost, and subsequent executions of the backup will fail to reap the older archived backups created by the old node. If this is a serious problem, a distributed file system such as Ceph may prove useful for sharing backup state throughout the cluster, although that approach introduces complications of its own.
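
For illustration only: assuming a shared file system were mounted at an identical path on every node (/mnt/shared below is a hypothetical mount point, not something this image provides or configures), the volume mounts in the service unit could point at it, so that whichever node runs the job sees the same state:

```
  -v /mnt/shared/<db name>/Backup/data:/root/Backup/data \
  -v /mnt/shared/<db name>/Backup/log:/root/Backup/log \
```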
