Docker build for the PCIC Data Portal (pdp).
The PCIC Data Portal contains the frontend code required for the PCIC Data Portal as well as WSGI callables to deploy the entire application within a WSGI container.
The following guide assumes an Ubuntu/Debian-based system.
The pdp requires that pip and tox be installed.
sudo apt-get install python-pip python-dev build-essential
sudo pip install tox ## or pip install tox --user
Some of the required python libraries have system-level dependencies.
sudo apt-get install libhdf5-dev libnetcdf-dev libgdal-dev
GDAL also doesn't properly expose its own header paths when the Python package is installed, so set them explicitly:
export CPLUS_INCLUDE_PATH=/usr/include/gdal
export C_INCLUDE_PATH=/usr/include/gdal
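Instead of hard-coding the path, you can read the include directory from gdal-config, which libgdal-dev installs. The sketch below assumes that the first -I flag printed by `gdal-config --cflags` is GDAL's header directory. If gdal-config is not on the PATH, it falls back to /usr/include/gdal.

```shell
# Ask gdal-config (installed with libgdal-dev) for the header directory.
# Assumption: the first -I flag in --cflags is GDAL's include directory.
if command -v gdal-config >/dev/null 2>&1; then
  gdal_include="$(gdal-config --cflags | sed 's/^-I//; s/ .*//')"
else
  # Fall back to the Debian/Ubuntu default used above.
  gdal_include=/usr/include/gdal
fi

export CPLUS_INCLUDE_PATH="$gdal_include"
export C_INCLUDE_PATH="$gdal_include"
echo "GDAL headers: $gdal_include"
```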
With the prerequisites installed, creating a development environment should be as simple as:
git clone https://github.com/pacificclimate/pdp
cd pdp
tox -e devenv
This can take 5-30 minutes, since tox does not use system packages and must build every package that requires compilation.
It is best practice to maintain a consistent virtual environment for production.
git clone https://github.com/pacificclimate/pdp
cd pdp
virtualenv pyenv
The pdp will run in any WSGI container. This guide uses gunicorn.
pyenv/bin/pip install -i https://pypi.pacificclimate.org/simple/ -r requirements.txt -r data_format_requirements.txt -r test_requirements.txt -r deploy_requirements.txt
Install the package and build the docs. Building the docs requires the package to be installed first, and the package must be installed again after the docs are built.
pyenv/bin/python setup.py install
pyenv/bin/python setup.py build_sphinx
pyenv/bin/python setup.py install
Configuration of the PDP is accomplished through a set of environment variables. A sample environment file is stored in pdp/config.env. This environment file can be sourced before you run the pdp, included in a Docker deployment, or used in any other flexible way. To load it into the current shell and export its variables:
source pdp/config.env
export $(grep -v '^#' pdp/config.env | cut -d= -f1)
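The two commands are needed because `source` only sets the variables in the current shell. The export line then marks each name from the file for export, so child processes such as gunicorn inherit them. A POSIX alternative is `set -a`, which exports every variable assigned while it is active. The sketch below demonstrates this on a throwaway file; PDP_EXAMPLE_ROOT is a hypothetical name, not one of the pdp's real variables.

```shell
# Self-contained demo: a throwaway env file with a hypothetical variable name.
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
# Lines starting with '#' are shell comments and are ignored when sourced.
PDP_EXAMPLE_ROOT=http://localhost:8000/portal
EOF

# While `set -a` is active, every variable assigned is also exported.
set -a
. "$env_file"
set +a

# A child process (as gunicorn would be) can now see the variable.
sh -c 'echo "child sees: $PDP_EXAMPLE_ROOT"'
rm -f "$env_file"
```

For the real file, the equivalent would be `set -a; . pdp/config.env; set +a`.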
Root location where the data portal will be exposed. This location will need to be proxied to whatever port the server is running on.
Root location of the backend data server, probably <app_root>/data. If you are running in production, this location will need to be proxied to whatever port the data server is running on. When running a development server, this is redirected internally.
Raster metadata database URL of the form dialect[+driver]://username:password@host:port/database. The password must either be supplied in the URL or be available in the user's ~/.pgpass.
PCDS database URL of the form dialect[+driver]://username:password@host:port/database. The password must either be supplied in the URL or be available in the user's ~/.pgpass.
Determines whether JavaScript bundling/minification is used.
PCDS Geoserver URL
Raster portal ncWMS URL
Tileserver URLs (space separated list) for base maps
Enable or disable Google Analytics reporting
Google Analytics ID
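The two database URLs can leave out the password if it is stored in a PostgreSQL password file instead. The sketch below assumes PostgreSQL databases reached through a libpq-based driver such as psycopg2. All connection details in it are made up. To avoid touching your real ~/.pgpass, it writes the entry to a temporary file named by PGPASSFILE, the libpq variable that overrides the default location. In practice you would append the line to ~/.pgpass.

```shell
# Hypothetical connection details; substitute your own.
db_host=db.example.org
db_port=5432
db_name=pcic_meta
db_user=pdp_reader
db_pass='s3cret'   # a ':' or '\' in a real password must be escaped with '\'

# libpq reads ~/.pgpass by default, or the file named by PGPASSFILE.
# Each line has the form hostname:port:database:username:password.
export PGPASSFILE="$(mktemp)"
printf '%s:%s:%s:%s:%s\n' "$db_host" "$db_port" "$db_name" "$db_user" "$db_pass" > "$PGPASSFILE"

# libpq ignores the file unless only its owner can read it.
chmod 0600 "$PGPASSFILE"

# The database URL can now omit the password.
dsn="postgresql://${db_user}@${db_host}:${db_port}/${db_name}"
echo "$dsn"
```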
When correctly configured, all the tests should now pass.
pyenv/bin/py.test -vv --tb=short tests
Provided you installed everything with tox, you should be able to run a development server with:
devenv/bin/python scripts/rast_serve -p <port> [-t]
A production install should be run in a production ready WSGI container with proper process monitoring. We use gunicorn as the WSGI container, Supervisord for process monitoring, and Apache as a reverse proxy.
In production, the frontend and backend are run in separate WSGI containers. This is because the frontend serves short, non-blocking requests, whereas the backend serves fewer, long-running requests that block a worker process.
Running in gunicorn can be tested with a command similar to the following:
pyenv/bin/gunicorn -b <host>:<port1> pdp.wsgi:frontend
pyenv/bin/gunicorn -b <host>:<port2> pdp.wsgi:backend
Note: this is only an example process monitoring setup. Details can and will differ depending on your particular deployment strategy.
Generate a Supervisord config file with:
pyenv/bin/echo_supervisord_conf > /install/location/supervisord.conf
In order to run Supervisord, the config file must have a [supervisord] section. Here's a sample section:
[supervisord]
logfile=/install/location/etc/<supervisord_logfile> ; (main log file;default $CWD/supervisord.log)
loglevel=info ; (log level;default info; others: debug,warn,trace)
nodaemon=true ; (start in foreground if true; useful for debugging)
Supervisorctl is a command line utility that lets you see the status and output of processes and start, stop and restart them. The following sections set up supervisorctl using a unix socket file, but it is also possible to monitor processes through a web interface if you wish.
[unix_http_server]
file = /tmp/supervisord.sock
[supervisorctl]
serverurl = unix:///tmp/supervisord.sock
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
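Front end config (in a [program:<name>] section):
command=/install/location/pyenv/bin/gunicorn -b <host>:<port> --access-logfile=<access_logfile> --error-logfile=<error_logfile> pdp.wsgi:frontend
Back end config (in a separate [program:<name>] section). It uses 10 gevent (asynchronous) workers and a 3600-second worker timeout (-t), which suits the long-running backend requests described above:
command=/install/location/pyenv/bin/gunicorn -b <host>:<port> --workers 10 --worker-class gevent -t 3600 --access-logfile=<access_logfile> --error-logfile=<error_logfile> pdp.wsgi:backend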
To make starting and stopping both processes easier, add a group to supervisord.conf.
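A group section could look like the sketch below. The group name pdp and the program names pdp_frontend and pdp_backend are hypothetical. They must match the names in your own [program:<name>] sections. The config path defaults to ./supervisord.conf unless SUPERVISORD_CONF is set, which is also an assumption of this sketch.

```shell
# Path to the config generated by echo_supervisord_conf; override as needed.
conf="${SUPERVISORD_CONF:-./supervisord.conf}"

# Append a group that bundles the two programs. The program names are
# hypothetical and must match your own [program:<name>] section names.
cat >> "$conf" <<'EOF'

[group:pdp]
programs=pdp_frontend,pdp_backend
EOF
```

Once the group is loaded, `supervisorctl -c path/to/supervisord.conf restart pdp:*` would restart both processes together.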
Once the config file has been set up, start the processes with the following command:
pyenv/bin/supervisord -c path/to/supervisord.conf
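After invoking Supervisord, use supervisorctl to monitor and update the running processes. Pass it the same config file (for example, pyenv/bin/supervisorctl -c path/to/supervisord.conf status) so it can find the socket.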
When upgrading, it's easiest to simply copy the existing config and update the paths/version number.
IMPORTANT: When adding a new version, make sure to set autostart and autorestart to false for the old version.
Using supervisorctl, you should then be able to reread the new config, update the old version's config (so it stops and picks up autostart/autorestart=false), and update the new version.
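The "set the old version's autostart and autorestart to false" step can be scripted. The sketch below is a hypothetical helper, not part of the pdp. It assumes GNU sed (for -i and the \| alternation), assumes both keys already appear in the old [program:<name>] section, and assumes the program name contains no '/'. It demonstrates the helper on a throwaway config with made-up program names.

```shell
# Flip autostart/autorestart to false inside one [program:<name>] section.
# Missing keys are not added; only existing lines are rewritten.
disable_autostart() {
  conf=$1
  program=$2
  # The range runs from the section header to the next line starting with '['.
  sed -i "/^\[program:${program}\]/,/^\[/ s/^\(autostart\|autorestart\)=.*/\1=false/" "$conf"
}

# Demo on a throwaway config with hypothetical program names.
demo_conf="$(mktemp)"
cat > "$demo_conf" <<'EOF'
[program:pdp_frontend-old]
command=/install/location/pyenv/bin/gunicorn pdp.wsgi:frontend
autostart=true
autorestart=true

[program:pdp_frontend-new]
command=/install/location/new/pyenv/bin/gunicorn pdp.wsgi:frontend
autostart=true
autorestart=true
EOF

disable_autostart "$demo_conf" pdp_frontend-old
cat "$demo_conf"
```

After editing the real config this way, `supervisorctl reread` followed by `supervisorctl update` applies the change.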
If there are any errors, they can be found in the supervisord_logfile. Errors starting gunicorn can be found in the error_logfile.
docker pull pcic/pdp