Short Description
Data staging for Open PHACTS Identity Mapping Service
Full Description

Open PHACTS Docker images

The Open PHACTS Discovery Platform can be
installed as a series of Docker containers.

Overview

A Docker container is a kind of sandboxed Linux environment, which typically
runs a single server instance, e.g. MySQL. Each container has its own virtual
filesystem, which is realized from Docker images downloaded from the
central Docker Hub Registry.

The Open PHACTS Docker images
provide the different services that form the Open PHACTS platform.
This page describes how these Docker containers can be installed
and started using Docker Compose.

The Open PHACTS containers will download and use
the latest Open PHACTS data release, and provide the
Virtuoso SPARQL endpoint, the Open PHACTS REST API and the
Explorer web interface.

External services: The following components of the Open PHACTS platform
are not yet included in this release.

  • Chemical Resolution Service APIs (e.g. SMILEStoCSID and Similarity search)
  • Text to Concept search calls

You can modify docker-compose.yml to enable usage of the public APIs for these;
see the External services section below.

Requirements

Roughly minimal hardware requirements:

  • ~ 150 GB of disk space
  • ~ 10 GB of RAM
  • ~ 4 CPU cores

Recommended hardware:

  • ~ 250 GB of SSD disk
  • ~ 128 GB of RAM
  • ~ 8 CPU cores

Prerequisites:

  • Recent x64 Linux distribution (e.g. Ubuntu 14.04 LTS, CentOS 7)
  • Docker 1.7.1 or later
  • Docker Compose 1.5.2 or later
  • Fast Internet connection (during build of data containers)

Note that you have to make this
disk space available to Docker.
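
The version prerequisites above can be checked from a script. The following is a minimal sketch and not part of the Open PHACTS tooling: the function names version_ge and extract_version are made up for illustration, and the comparison assumes a sort that supports -V (GNU coreutils).

```shell
# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# extract_version: read a line such as "Docker version 1.9.1, build a34a1d5"
# on stdin and print the first dotted version number found in it.
extract_version() {
    sed -n 's/[^0-9]*\([0-9][0-9]*\(\.[0-9][0-9]*\)*\).*/\1/p' | head -n 1
}
```

For example, version_ge "$(docker --version | extract_version)" 1.7.1 succeeds only when the installed Docker meets the minimum listed above; the same pattern works for docker-compose --version and 1.5.2.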

Docker installation

These Docker images have been tested on:

  • CentOS 6.7 (with kernel 3.18.21-17.el6 - yum install centos-release-xen ; yum update)
  • Ubuntu 14.04 LTS

These images have not been tested with
Docker virtualization on non-Linux platforms (OS X, Windows)
or behind a firewall.

See the
Docker installation
guide for details for your Linux distribution. Here's the short-hand
installation for Ubuntu 14.04:

sudo -i
apt-get -y dist-upgrade
wget -qO- https://get.docker.com/ | sh

To test the installation, try:

sudo docker run hello-world

You will additionally need to install
Docker Compose. The exact version
used below might be out of date, see the install guide for details.

sudo -i
curl -L https://github.com/docker/compose/releases/download/1.6.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

To test the installation, try:

sudo docker-compose --version

Hint: If you add your username to the docker group, as suggested by the
Docker install, and log out and in again, you can run the remaining docker
and docker-compose commands without using sudo. Note that this would
effectively be giving that user privileged root access to the host machine
without password verification.

Disk space for Docker

You will need about 150 GB of disk space for the Open PHACTS Docker containers
and data. Check on the docker host:

sudo df -h /var/lib/docker/
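
The same check can be done in a script. This is a minimal sketch under stated assumptions: the function names avail_gb and enough_space are hypothetical, and the 150 GB threshold is simply the figure quoted above.

```shell
# avail_gb PATH: print the free space, in whole GB, of the filesystem
# that holds PATH. Uses the POSIX output format of df (-P) in 1 KB blocks.
avail_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# enough_space AVAILABLE_GB REQUIRED_GB: succeed when enough space is free.
enough_space() {
    [ "$1" -ge "$2" ]
}
```

A possible use is enough_space "$(avail_gb /var/lib/docker)" 150 || echo "Not enough disk space for Docker".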

If you do not have enough space on the right partition, you might want
to edit the volumes sections in docker-compose.override.yml to use alternative
folders for the most disk-hungry containers. Note that you still need
about 15 GB of disk space in /var/lib/docker for the
downloaded Docker images.

Another simpler option is to do the equivalent of:

sudo service docker stop
sudo mv /var/lib/docker /bigdisk/
sudo ln -s /bigdisk/docker /var/lib/
sudo service docker start

If you are using a virtual machine to run Docker (e.g. on Windows and OS X)
ensure you have allocated enough disk space (and memory) to the virtual machine's
file system.

It is not recommended to use an externally mounted volume
(e.g. USB, NFS or network share) for the Docker disk space.

Retrieving Open PHACTS Docker images

Download the ops-docker repository from the master branch:

curl -L https://github.com/openphacts/ops-docker/archive/master.tar.gz | tar xzv
cd ops-docker-master

You can also use the above to upgrade the ops-docker download, but this would
overwrite any changes you have made to docker-compose.yml. Therefore
you should put your edits in docker-compose.override.yml instead.

Now make sure you are in the equivalent of the
ops-docker-master folder and run:

sudo docker-compose pull

This will download the latest version of the
Docker images listed in docker-compose.yml.

Building data containers

The Open PHACTS Docker containers use separate
Data Volume Containers
to contain the Open PHACTS datasets.

On installation you need to run these local
data containers and their data staging counterparts,
virtuosostaging and
mysqlstaging, once. They will download
the Open PHACTS 1.5 data.

Make sure you have sufficient disk space available for Docker.

The command below will download about 20 GB and might take some
time to download and stage
(~ 2h depending on network and disk speed).

sudo docker-compose up --no-recreate -d mysqlstaging virtuosostaging

To follow the progress, use:

sudo docker-compose ps
sudo docker-compose logs mysqlstaging
sudo docker-compose logs virtuosostaging

Note that docker-compose logs may not terminate even if its container does;
use Ctrl-C to cancel log listing.
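
Instead of watching the logs, a script can poll until a staging container has stopped. This is a minimal sketch: the function names and the DOCKER variable are illustrative, and the container name docker_mysqlstaging_1 used in the note below is taken from the sample output and depends on the folder you run docker-compose from.

```shell
# DOCKER can be overridden, e.g. DOCKER="sudo docker".
DOCKER=${DOCKER:-docker}

# is_running NAME: succeed while the named container is still running.
is_running() {
    [ "$($DOCKER inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# wait_for_exit NAME [INTERVAL]: poll every INTERVAL seconds (default 60)
# until the container has stopped, then print its exit code.
wait_for_exit() {
    while is_running "$1"; do
        sleep "${2:-60}"
    done
    $DOCKER inspect -f '{{.State.ExitCode}}' "$1"
}
```

For example, DOCKER="sudo docker" followed by wait_for_exit docker_mysqlstaging_1 should print 0 once staging has finished successfully.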

Expected output from mysqlstaging:

mysqlstaging_1 | Preparing to stage ims
mysqlstaging_1 | Waiting for mySQL
mysqlstaging_1 | mySQL staging
mysqlstaging_1 | -rw-r--r-- 1 root root 1.2G Jul 22 15:43 /tmp/staging.sql
mysqlstaging_1 |  out: 8737.224ms at  937.6kB/s ( 937.6kB/s avg)    8.0MB
mysqlstaging_1 |  out: 1009.888ms at    7.9MB/s (   1.6MB/s avg)   16.0MB
..
mysqlstaging_1 |  out: 677.916ms at   11.8MB/s (   8.6MB/s avg)    1.1GB
mysqlstaging_1 |  out: 761.496ms at   10.5MB/s (   8.6MB/s avg)    1.1GB
(long wait)
mysqlstaging_1 | mySQL staging finished
docker_mysqlstaging_1 exited with code 0

Expected output from virtuosostaging:

virtuosostaging_1 | Downloading checksums from http://data.openphacts.org/dev/1.5/virtuoso/
virtuosostaging_1 | 2015-07-22 15:43:29 URL:http://data.openphacts.org/dev/1.5/virtuoso/ [1634/1634] -> "index.html" [1]
virtuosostaging_1 | 2015-07-22 15:43:29 URL:http://data.openphacts.org/dev/1.5/virtuoso/?C=N;O=D [1634/1634] -> "index.html?C=N;O=D" [1]
..
virtuosostaging_1 | Downloaded: 29 files, 1.5M in 0.1s (10.2 MB/s)
virtuosostaging_1 | Downloading Virtuoso backup set to /download
virtuosostaging_1 | Initializing download: http://data.openphacts.org/dev/1.5/virtuoso/ghard-dump-20150415.tar
virtuosostaging_1 | File size: 21715220480 bytes
virtuosostaging_1 | Opening output file ghard-dump-20150415.tar
virtuosostaging_1 | Starting download
(long wait)
virtuosostaging_1 | ghard-dump-20150415/bak_325.bp
virtuosostaging_1 | Data download complete
virtuosostaging_1 | Loading bak_ -- 677 files
(..)
virtuosostaging_1 | 08:46:24 OpenLink Virtuoso Universal Server
virtuosostaging_1 | 08:46:24 Version 07.20.3212-pthreads for Linux as of Jun  3 2015
virtuosostaging_1 | 08:46:24 uses parts of OpenSSL, PCRE, Html Tidy
virtuosostaging_1 | 08:46:24 Begin to restore with file prefix bak_
virtuosostaging_1 | 08:46:24 --> Backup file # 1 [0x3F02-0x74-0x8A]
virtuosostaging_1 | 08:46:25 --> Backup file # 2 [0x3F02-0x74-0x8A]
(..)
virtuosostaging_1 | 09:13:35 --> Backup file # 675 [0x3F02-0x74-0x8A]
virtuosostaging_1 | 09:13:36 --> Backup file # 676 [0x3F02-0x74-0x8A]
virtuosostaging_1 | 09:13:37 End of restoring from backup, 6751701 pages
virtuosostaging_1 | 09:13:37 Server exiting
virtuosostaging_1 | Loading completed
docker_virtuosostaging_1 exited with code 0

You may want to inspect the download progress:

stain@heater:~/ops-platform-setup/docker$ docker exec docker_virtuosostaging_1 du -hs /download
5.5G    /download
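
To turn that figure into a rough percentage, compare it with the file size reported in the virtuosostaging log. This is a sketch only; percent_done is a hypothetical helper, and the result is meaningful only while the download is in progress, since unpacking the archive adds further files.

```shell
# percent_done USED_KB TOTAL_BYTES: print the whole-number percentage of
# TOTAL_BYTES that USED_KB (in 1 KB blocks, as printed by du -sk) represents.
percent_done() {
    echo $(( $1 * 1024 * 100 / $2 ))
}
```

For example, with used_kb=$(sudo docker exec docker_virtuosostaging_1 du -sk /download | cut -f1), calling percent_done "$used_kb" 21715220480 uses the "File size" value from the sample log above.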

Staging is finished when both mysqlstaging and
virtuosostaging have exited. Note that the mysql container
will remain up. Check with:

sudo docker-compose ps

Configuring Open PHACTS platform

Edit the docker-compose.yml file for your host-specific settings.
This is a Docker Compose configuration file.

You can modify volumes to use an explicit folder for the data containers,
e.g. to use a faster/bigger disk partition. See comments in-line in
docker-compose.yml.

You may want to change the exposed ports from 300* to different ports,
or avoid exposing them at all. The only requirement here is that the exposed
port for api must correspond to the port in API_URL, and that the ports
are not already in use on the host server.

Unless you are going to access the platform on localhost exclusively,
you must change the API_URL variable for the
explorer2 container. This URL must use the fully qualified hostname,
as it will be accessed in the browser. The port should remain
as 3002 unless you have changed the exposed port for api.

Important: Do not include the trailing / of the API_URL.

For example:

    environment:
      - API_URL=http://server13.example.com:3002
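
If the value is generated by a script, the trailing slash can be stripped before it is written to the configuration. A minimal sketch, with normalize_api_url as a hypothetical helper name:

```shell
# normalize_api_url URL: print URL with a single trailing slash removed,
# leaving URLs that have no trailing slash unchanged.
normalize_api_url() {
    printf '%s\n' "${1%/}"
}
```

For example, normalize_api_url http://server13.example.com:3002/ prints http://server13.example.com:3002.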

TODO: Make a wrapping webserver that provides a common port 80 for api,
sparql and explorer.

External services

The APIs for the Chemical Resolution Service
and ConceptWiki are not currently
available as Docker images. The default configuration for the
Open PHACTS Docker platform is to not access these APIs.

The API calls below rely on these services and would therefore not
normally be functional in this Docker installation:

  • /structure?inchi={inchi}
  • /structure?inchi_key={inchi_key}
  • /structure?smiles={smiles}
  • /structure/similarity?searchOptions.Molecule={searchOptions.Molecule}
  • /structure/substructure?searchOptions.Molecule={searchOptions.Molecule}
  • /structure/exact?searchOptions.Molecule={searchOptions.Molecule}
  • /search/freetext?q={q}
  • /search/byTag?q={q}&uuid={uuid}
  • /getConceptDescription?uuid={uuid}

To enable usage of the public APIs as a fallback for these calls,
modify docker-compose.yml to uncomment these lines (keep the indentation):

  environment:
    - CRS=https://ops.rsc.org/api/v1/
    - CONCEPTWIKI=http://www.conceptwiki.org/web-ws/concept

Usage of the public Open PHACTS API is covered by the
Terms of Use and
Privacy Policy.

Running the Open PHACTS platform

Assuming the previous loading has completed, you can now start
the rest of the Open PHACTS platform:

sudo docker-compose up --no-recreate -d

You can follow the progress by looking at the logs (press Ctrl-C to stop watching):

sudo docker-compose logs

The Open PHACTS platform should be started when you see the equivalent of these
from each container:

api_1 | [Tue Jun 16 16:49:14.309976 2015] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.10 (Debian) PHP/5.6.10 configured -- resuming normal operations
mysql_1           | 2015-06-16 16:48:47 1 [Note] mysqld: ready for connections.
explorer2_1 | [2015-06-16 16:49:35] INFO  WEBrick::HTTPServer#start: pid=1 port=3000
ims_1 | 16-Jun-2015 16:49:06.641 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 5568 ms

The virtuoso container usually takes the longest to start up.

Once started, this should expose the following services (replace localhost with your
server's hostname):

Note: using the text search in Explorer will use the
remote Text-to-Concept service from conceptwiki.org.

Stopping the Open PHACTS platform

To check the status of the Open PHACTS platform, use:

sudo docker-compose ps

To stop the platform, use:

sudo docker-compose stop

Removing the Open PHACTS platform

sudo docker-compose stop
sudo docker-compose rm -v

To recover the additional disk space used by the Docker images, provided you don't have any other Docker images you want to keep:

sudo docker images -q | xargs sudo docker rmi

Sometimes you might first need to remove all old containers, which releases their images so that the command above can remove them:

sudo docker ps -aq | xargs sudo docker rm -v

Upgrading the Open PHACTS platform

Unless a new data release needs to be loaded, you do not need to repeat
the staging. To upgrade the software within the Docker images
(e.g. newer MySQL or OPS Platform API), do:

sudo docker-compose pull

Then rebuild the containers to use the newer images:

sudo docker-compose up -d

If you need to restart staging from scratch, first remove the data volumes:

sudo docker-compose rm -v mysqldata virtuosodata

Then follow the procedure "Building data containers" above.
