Augur is a Python package to forecast flu evolution. It will:
- import public sequence data
- build a phylogenetic tree from this data
- estimate clade fitnesses
- estimate clade frequencies
- project frequencies forward
It is intended to be run in an always-on fashion, recomputing predictions daily and pushing predictions to a (static) website.
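The final step in the list above, projecting frequencies forward, can be illustrated with a minimal sketch. It assumes a simple exponential-growth model in which each clade's frequency is multiplied by `exp(fitness * dt)` and the result is renormalized. The function name `project_frequencies` and this particular model are assumptions made for illustration, not necessarily what Augur implements.

```python
import math

def project_frequencies(frequencies, fitnesses, dt=1.0):
    """Project clade frequencies forward by dt under exponential growth.

    Hypothetical helper: each frequency is scaled by exp(fitness * dt),
    then all values are renormalized so they sum to one.
    """
    grown = [x * math.exp(f * dt) for x, f in zip(frequencies, fitnesses)]
    total = sum(grown)
    return [g / total for g in grown]

# Two clades at equal frequency; the second doubles relative to the first.
projected = project_frequencies([0.5, 0.5], [0.0, math.log(2)], dt=1.0)
```

In this toy case the fitter clade moves from one half to two thirds of the population after one time unit.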
pip install -r requirements.txt
docker pull trvrb/augur
docker run -ti -e "GISAID_USER=$GISAID_USER" -e "GISAID_PASS=$GISAID_PASS" -e "S3_KEY=$S3_KEY" -e "S3_SECRET=$S3_SECRET" -e "S3_BUCKET=$S3_BUCKET" --privileged trvrb/augur /bin/bash
Before starting Python scripts, you'll need to run:
supervisord -c supervisord.conf
You will need a GISAID account and an Amazon S3 account. The scripts assume the following environment variables are set:
GISAID_USER: GISAID user name
GISAID_PASS: GISAID password
S3_KEY: Amazon S3 key
S3_SECRET: Amazon S3 secret
S3_BUCKET: Amazon S3 bucket
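Before a long run it can help to confirm that all five variables are present. The following is a minimal sketch of such a check; the helper `missing_env_vars` is hypothetical and not part of Augur.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ["GISAID_USER", "GISAID_PASS", "S3_KEY", "S3_SECRET", "S3_BUCKET"]

def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper; pass a dict to check something other than
    the real process environment.
    """
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]

if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```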
Keep viruses with full HA1 sequences, fully specified dates, and cell passage, and keep only one sequence per strain name.
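The filtering rules can be sketched as follows. This is an illustration under stated assumptions, not Augur's own filter: each virus is assumed to be a dict with hypothetical keys `strain`, `date` (formatted `YYYY-MM-DD`), `passage`, and `seq`; a sequence of at least 987 bases stands in as a crude proxy for "full HA1"; cell passage is assumed to be recorded as the literal string `"cell"`; and the first sequence seen for a strain name is the one kept.

```python
from datetime import datetime

HA1_LENGTH = 987  # number of HA1 bases, as stated in this README

def has_full_date(date_string):
    """True only if the date has year, month, and day (assumed YYYY-MM-DD)."""
    try:
        datetime.strptime(date_string, "%Y-%m-%d")
        return True
    except (ValueError, TypeError):
        return False

def filter_viruses(viruses):
    """Apply the four rules; record keys are hypothetical."""
    kept = []
    seen_strains = set()
    for virus in viruses:
        if len(virus.get("seq", "")) < HA1_LENGTH:
            continue  # crude proxy for an incomplete HA1 sequence
        if not has_full_date(virus.get("date")):
            continue  # date is missing or only partially specified
        if virus.get("passage") != "cell":
            continue  # assumed encoding of cell passage
        strain = virus.get("strain")
        if strain in seen_strains:
            continue  # keep only the first sequence per strain name
        seen_strains.add(strain)
        kept.append(virus)
    return kept
```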
Align sequences with muscle and strip to just the 987 bases of HA1. This should take ~1.5 hours for ~15k sequences.
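One way to strip an alignment down to HA1 is to keep only the columns where a reference HA1 sequence has a base. The sketch below assumes the alignment has already been produced (it does not call muscle), that it is held as a dict mapping names to equal-length aligned strings, and that one entry is an ungapped-length-987 HA1 reference. The function name `strip_to_reference` and this approach are assumptions for illustration.

```python
def strip_to_reference(alignment, reference_name):
    """Keep only alignment columns where the reference is not a gap.

    Hypothetical helper: `alignment` maps sequence names to aligned
    strings of equal length; gaps are written as '-'.
    """
    reference = alignment[reference_name]
    keep = [i for i, base in enumerate(reference) if base != "-"]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }

# Toy alignment: the third column is an insertion relative to the reference.
toy = {"ref": "AT-GC", "virus1": "ATTG-"}
stripped = strip_to_reference(toy, "ref")
```

With a real HA1 reference, every stripped sequence would have exactly 987 columns.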
Estimate clade frequencies using SMC particle filtering.
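To illustrate the general idea of SMC particle filtering, here is a minimal bootstrap particle filter for a single clade. The model is an assumption chosen for brevity, not Augur's actual one: the logit of the clade frequency follows a Gaussian random walk between time bins, and the number of sampled viruses in the clade is binomial given that frequency. All names and parameter values are hypothetical.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def binomial_log_likelihood(k, n, p):
    """Log-likelihood of k successes in n trials, up to a constant in p."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def particle_filter(counts, n_particles=500, step_sd=0.3, seed=0):
    """Estimate a clade's frequency in each time bin.

    counts: list of (clade_count, total_count) pairs, one per time bin.
    Returns the weighted mean frequency across particles for each bin.
    """
    rng = random.Random(seed)
    # Particles hold the logit frequency, drawn from a broad prior.
    particles = [rng.gauss(0.0, 2.0) for _ in range(n_particles)]
    estimates = []
    for k, n in counts:
        # Propagate: random-walk step on the logit scale.
        particles = [x + rng.gauss(0.0, step_sd) for x in particles]
        # Weight: binomial likelihood of the observed counts.
        log_w = [binomial_log_likelihood(k, n, logistic(x)) for x in particles]
        max_log_w = max(log_w)
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(sum(w * logistic(x) for w, x in zip(weights, particles)))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

# A clade rising from 1 of 20 sampled viruses to 18 of 20.
estimates = particle_filter([(1, 20), (5, 20), (10, 20), (18, 20)])
```

The random-walk step size controls how quickly the estimated frequency is allowed to change between bins; a smaller `step_sd` gives smoother trajectories.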