WAF Synchronization Container
Catalog Harvesting

Python modules to synchronize third-party metadata sources with a central
metadata repository (WAF).


This project supplements the IOOS Registry by providing the ability to harvest
from WAFs, store the documents in a central WAF, and update the MongoDB database
that the registry uses. You'll need a running Redis service to run this
project. MongoDB is the means by which this project's workers communicate
job status to the main registry project.

This project supports only Python 2.7 due to the CKAN dependency.

Under a python virtual environment or the system python::

    # Install the project dependencies
    pip install -r requirements_ext.txt
    pip install -r requirements.txt
    pip install gunicorn

    # Install this project
    pip install -e .

These commands install the project dependencies and install the project into
either the current Python virtual environment or the system Python path.

Configuring the project

Several environment variables drive the project's configuration:

  • OUTPUT_DIR: The directory the harvested contents are written to.
  • MONGO_URL: The connection string to the MongoDB database. Example: mongodb://localhost:27017/registry
  • REDIS_URL: The connection string to the Redis key-store. Example: redis://localhost:6379/0
  • STALE_EXPIRATION_DAYS: The number of days to keep a dataset which has not been updated before it will be removed by the cleaning job.
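
As an illustration of how these variables might be consumed, here is a minimal sketch that reads them from the environment. The names ``HarvestConfig``, ``load_config`` and ``stale_cutoff`` are hypothetical and not part of this project, and the fallback values for ``OUTPUT_DIR`` and ``STALE_EXPIRATION_DAYS`` are assumptions chosen for the example. The two URL fallbacks simply reuse the example values listed above. How the cleaning job actually decides that a dataset is stale may differ.

```python
import os
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical container for the settings; the project may organize them differently.
HarvestConfig = namedtuple(
    "HarvestConfig",
    ["output_dir", "mongo_url", "redis_url", "stale_expiration_days"],
)


def load_config(environ=None):
    """Read the harvesting settings from a mapping (os.environ by default)."""
    env = os.environ if environ is None else environ
    return HarvestConfig(
        # Assumed fallback, not a documented project default.
        output_dir=env.get("OUTPUT_DIR", "/tmp/waf"),
        # These two fallbacks reuse the example URLs from the list above.
        mongo_url=env.get("MONGO_URL", "mongodb://localhost:27017/registry"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        # Assumed fallback of 30 days, not a documented project default.
        stale_expiration_days=int(env.get("STALE_EXPIRATION_DAYS", "30")),
    )


def stale_cutoff(config, now=None):
    """Return the datetime before which an un-updated dataset would count as stale."""
    now = datetime.utcnow() if now is None else now
    return now - timedelta(days=config.stale_expiration_days)
```

Passing a plain dictionary instead of ``os.environ`` keeps the sketch easy to try out without touching the real environment.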

There are several email configuration options that mimic the Flask-Mail project's configuration:

  • MAIL_SERVER: Defaults to localhost. The SMTP server to connect to.
  • MAIL_PORT: Defaults to 25. The port the SMTP server is listening on.
  • MAIL_USE_TLS: Whether the SMTP client should connect using TLS.
  • MAIL_USE_SSL: Whether the SMTP client should connect using SSL.
  • MAIL_USERNAME: The username for authenticated connections.
  • MAIL_PASSWORD: The password for authenticated connections.
  • MAIL_DEFAULT_SENDER: The e-mail address the notifications should be sent from. Defaults to
  • MAIL_MAX_EMAILS: The maximum number of emails to send before reconnecting.
  • MAIL_DEBUG: Turns on debugging for SMTP.
  • MAIL_ASCII_ATTACHMENTS: If true, filenames will be converted to an ASCII equivalent.
  • MAIL_SUPPRESS_SEND: If true, emails won't actually be sent.
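
The mail options could be collected from the environment along the following lines. This is only a sketch: ``env_flag`` and ``load_mail_settings`` are hypothetical helpers, and the set of strings treated as "true" is an assumption, since the accepted boolean spellings are not documented here. The ``localhost`` and ``25`` defaults come from the list above.

```python
import os

# Assumed spellings for a "true" flag; the project may accept a different set.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(env, name, default=False):
    """Interpret an environment variable as a boolean flag."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


def load_mail_settings(environ=None):
    """Collect a subset of the MAIL_* options into a dictionary."""
    env = os.environ if environ is None else environ
    return {
        "MAIL_SERVER": env.get("MAIL_SERVER", "localhost"),
        "MAIL_PORT": int(env.get("MAIL_PORT", "25")),
        "MAIL_USE_TLS": env_flag(env, "MAIL_USE_TLS"),
        "MAIL_USE_SSL": env_flag(env, "MAIL_USE_SSL"),
        "MAIL_USERNAME": env.get("MAIL_USERNAME"),
        "MAIL_PASSWORD": env.get("MAIL_PASSWORD"),
        "MAIL_SUPPRESS_SEND": env_flag(env, "MAIL_SUPPRESS_SEND"),
    }
```

Setting ``MAIL_SUPPRESS_SEND`` to a true value is a convenient way to exercise the notification path in development without delivering any mail.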


To run the project::

    gunicorn -b localhost:3000 catalog_harvesting.api:app --workers 4

To manually execute a one-time harvest::

    catalog-harvest -s <MongoDB URL> -d <WAF Directory> -v
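
If the one-time harvest needs to be triggered from a script, the command above could be wrapped roughly as follows. The helper names ``build_harvest_command`` and ``run_harvest`` are hypothetical. The sketch uses only the ``-s``, ``-d`` and ``-v`` flags shown above, and treating ``-v`` as an optional verbosity switch is an assumption.

```python
import subprocess


def build_harvest_command(mongo_url, waf_dir, verbose=False):
    """Build the argument list for a one-time catalog-harvest run."""
    cmd = ["catalog-harvest", "-s", mongo_url, "-d", waf_dir]
    if verbose:
        # Assumed to be an optional verbosity flag.
        cmd.append("-v")
    return cmd


def run_harvest(mongo_url, waf_dir, verbose=False):
    """Run the harvest; raises CalledProcessError on a non-zero exit status."""
    return subprocess.check_call(build_harvest_command(mongo_url, waf_dir, verbose))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the WAF directory path contains spaces.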

To run a worker process::




Building the project's Docker image is pretty simple::

    docker build -t ioos/catalog-harvesting .

The project is also built automatically by Docker Hub whenever a change is made
to the master branch, usually through pull requests.
