Docker image for GA4GH reference server with example data
Federated databases use local resources to host data instead of submitting it to centralized data servers (reference).
This approach facilitates sharing genomic data while restricting access to sensitive information, such as data about individuals.
It also avoids legal issues associated with data protection.
However, hosting data locally makes it harder to integrate data from different databases, because each provider may implement its own non-standard interface for publishing data.
The Global Alliance for Genomics and Health (GA4GH) was formed to help accelerate the potential of genomic medicine to advance human health.
The GA4GH Data Working Group developed data model schemas and application programming interfaces (APIs) for standardized genomic data exchange.
The Docker image of the GA4GH reference implementation server (ga4gh-server) runs an Apache 2 web server exposing TCP port 80.
This Docker image for the GA4GH reference server with example data extends the ga4gh-server image by adding an example database (from 1000 Genomes Project data).
The GA4GH database is created using the ga4gh_repo tool provided by the parent image; see the Official Documentation.
docker container run --rm --name ga4gh-example -d -p 80:80 welliton/ga4gh-example:0.3.6
Explaining the command line:
docker container run is the base command (same as
--rm tells Docker to remove the container (not the image) after it is stopped.
--name ga4gh-example gives a name to the container (optional).
Giving the container a name simplifies management (inspecting and stopping the container).
-d runs the container in the background.
Remove this parameter to see the messages from the Apache 2 daemon.
-p 80:80 publishes TCP port 80 of the container on port 80 of the host.
welliton/ga4gh-example:0.3.6 is the name of the image.
This image does not have
It provides a tag for each release of the GA4GH reference server.
The image version (0.3.6) is based on release version of the
The GA4GH server landing page will be available at http://localhost:80.
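Once the landing page responds, the API itself can be queried. The sketch below is not taken from this image's documentation: it assumes the server exposes the GA4GH `POST /datasets/search` endpoint directly under the base URL, and that `curl` is installed on the host. The function name `search_datasets` is hypothetical. If the server is mounted under a sub-path, pass that path as part of the base URL.

```shell
#!/bin/sh
# search_datasets [BASE_URL]
# Sends an empty search request to the datasets endpoint and prints the reply.
# ASSUMPTION: the API is served directly under BASE_URL (default http://localhost:80).
search_datasets() {
    base_url=${1:-http://localhost:80}
    # ${base_url%/} drops one trailing slash so the final URL has no "//"
    curl -s \
        --header 'Content-Type: application/json' \
        --data '{}' \
        "${base_url%/}/datasets/search"
}
```

For example, `search_datasets http://localhost:80` should print a JSON document listing the datasets in the example database, if the assumption about the path holds.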
To stop the container, run docker container stop ga4gh-example.
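Because -d returns as soon as the container is created, the web server may not be accepting connections yet when the run command finishes. The following is a minimal sketch of a readiness check, assuming `curl` is installed on the host. The function name `wait_for_server` and the default of 30 attempts are illustrative choices, not part of the image.

```shell
#!/bin/sh
# wait_for_server URL [ATTEMPTS]
# Polls URL once per second until it answers with a successful HTTP status,
# or gives up after ATTEMPTS tries (default 30).
wait_for_server() {
    url=$1
    attempts=${2:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f: treat HTTP errors as failures, -s: silent, -o /dev/null: discard body
        if curl -fs -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

A typical use after starting the container would be `wait_for_server http://localhost:80 && echo "GA4GH server is up"`.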
Sharing genomic database among containers
Sharing the /data directory allows other containers to access the same database without duplicating data.
For example, run the
ga4gh-server image (does not contain data) connected at
Another example is running a GA4GH Beacon server connected to the same database.
For more information see Docker image for GA4GH Beacon server.
docker container run --rm --name ga4gh-example -v ga4gh-example-data:/data:ro -d -p 80:80 welliton/ga4gh-example:0.3.6
Explaining the command line:
-v ga4gh-example-data:/data:ro instructs Docker to mount the named volume ga4gh-example-data at the /data directory as read-only (ro).
Mounting the directory as read-only avoids concurrency problems.
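To confirm that the volume really is mounted read-only, the container's mount list can be inspected. This sketch uses `docker container inspect` with a Go template over the `.Mounts` entries (their `Destination` and `RW` fields). The helper name `data_mount_mode` is hypothetical.

```shell
#!/bin/sh
# data_mount_mode CONTAINER
# Prints "ro" or "rw" for the mount at /data in CONTAINER.
data_mount_mode() {
    # The template prints the RW flag (true/false) of the mount whose
    # destination is /data, and nothing if there is no such mount.
    rw=$(docker container inspect \
        --format '{{range .Mounts}}{{if eq .Destination "/data"}}{{.RW}}{{end}}{{end}}' \
        "$1") || return 1
    case $rw in
        false) echo ro ;;
        true)  echo rw ;;
        *)     echo "no /data mount found in $1" >&2; return 1 ;;
    esac
}
```

Running `data_mount_mode ga4gh-example` against the container started above is expected to report the read-only mode.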
Run a second container.
docker container run --rm --name ga4gh-server -v ga4gh-example-data:/data -d -p 81:80 welliton/ga4gh-server:0.3.6
The second container does not depend on the first one.
Instead, it mounts the same ga4gh-example-data volume.
Because Docker volumes are independent of containers, we can stop (and remove) the first container.
The second container will keep working.
It is even possible to remove the image without losing data in the volume (docker image rm welliton/ga4gh-example:0.3.6).
The volume will still be available.
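One way to check that the volume survived is to ask Docker for it by name. This is a small sketch relying on the fact that `docker volume inspect` exits with a non-zero status when the volume does not exist. The helper name `volume_exists` is hypothetical.

```shell
#!/bin/sh
# volume_exists NAME
# Succeeds (exit status 0) if a Docker volume called NAME exists.
volume_exists() {
    # Output is discarded; only the exit status matters here
    docker volume inspect "$1" >/dev/null 2>&1
}
```

After stopping the first container and removing its image, `volume_exists ga4gh-example-data && echo "volume is still available"` should still print the message.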
To remove a volume, all containers connected to it must be stopped and removed first (in this example, docker stop ga4gh-server is enough because the container was started with --rm).
Then run docker volume rm ga4gh-example-data to remove the volume (and all its data).
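The order of these steps can be enforced with a small guard. The sketch below lists every container (running or stopped) that still mounts the volume, using the `volume` filter of `docker container ls`, and only calls `docker volume rm` when none is left. The function name `remove_volume` is hypothetical.

```shell
#!/bin/sh
# remove_volume NAME
# Removes the Docker volume NAME, refusing if any container still mounts it.
remove_volume() {
    volume=$1
    # --all includes stopped containers, which also block volume removal
    users=$(docker container ls --all --quiet --filter "volume=$volume") || return 1
    if [ -n "$users" ]; then
        echo "volume $volume is still used by container(s): $users" >&2
        return 1
    fi
    docker volume rm "$volume"
}
```

With both containers stopped, `remove_volume ga4gh-example-data` deletes the volume and all its data; otherwise it prints the IDs of the containers that still need to be stopped.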
For more information about Docker Volumes see Manage data in containers.
Building this image
git clone firstname.lastname@example.org:labbcb/docker-ga4gh-example.git
cd docker-ga4gh-example/
docker build -t welliton/ga4gh-example:0.3.6 .