Public Repository

Last pushed: 2 years ago
Short Description
Container that aligns a single-end or paired-end set of sequences using bwa and samtools
Full Description

This container is a proof of concept showing how to write a containerized application. It extends dmlond/bwa_samtools_base with a script that uses bwa and samtools to produce a BAM alignment file. The script aligns a Sanger-formatted (FASTQ) raw sequence file, or a pair of paired-end files, in /home/bwa_user/data against a reference FASTA file in /home/bwa_user/bwa_indexed that has been indexed with bwa and samtools. The container is best used together with a dmlond/bwa_reference_volume container. The dmlond/bwa_reference container can populate that volume with any publicly available reference FASTA file, along with its bwa and samtools indexes.
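The image's own alignment script is not reproduced on this page. As a rough, hypothetical illustration of the kind of bwa/samtools pipeline described above, the following sketch prints (rather than runs) the classic `bwa aln` plus `samse`/`sampe` steps for single-end or paired-end input. The script inside the image may differ, for example by sorting or indexing the BAM, and `align_commands` is a made-up name.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the image's actual script: print (without running)
# a classic bwa aln + samse/sampe pipeline that ends in a BAM file.
# Usage: align_commands REF_FASTA OUT_BAM READS_1 [READS_2]
align_commands() {
  ref=$1 out=$2 r1=$3 r2=${4:-}
  # Find suffix-array coordinates for the first (or only) read file.
  echo "bwa aln $ref $r1 > $r1.sai"
  if [ -n "$r2" ]; then
    # Paired-end: align the second mate file too, then pair them with sampe.
    echo "bwa aln $ref $r2 > $r2.sai"
    echo "bwa sampe $ref $r1.sai $r2.sai $r1 $r2 > $out.sam"
  else
    # Single-end: convert the single .sai file to SAM with samse.
    echo "bwa samse $ref $r1.sai $r1 > $out.sam"
  fi
  # Convert SAM to BAM, then print alignment summary statistics.
  echo "samtools view -bS $out.sam > $out"
  echo "samtools flagstat $out"
}
```

Calling `align_commands ref.fasta out.bam reads_1.fastq reads_2.fastq` lists the paired-end steps; leaving off the last argument lists the single-end ones.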

The following steps show how to align the paired-end sequences in dmlond/bwa_plasmodium_data against the P. falciparum reference genome.

$ sudo docker run --name bwa_references dmlond/bwa_reference_volume
$ sudo docker run -ti --volumes-from bwa_references dmlond/bwa_reference   -i pf3D7_v2.1.5 ftp://ftp.sanger.ac.uk/pub/project/pathogens/Plasmodium/falciparum/3D7/3D7.version2.1.5/Pf3D7_v2.1.5.fasta -z
$ sudo docker run --name plasmodium_data dmlond/bwa_plasmodium_data
$ ID=`sudo docker run -d --volumes-from bwa_references --volumes-from plasmodium_data dmlond/bwa_aligner  -s ERR022523_1.fastq.gz -b pf3D7_v2.1.5 -R Pf3D7_v2.1.5.fasta.gz -p ERR022523_2.fastq.gz -o ERR022523_1_2.bam`
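The walkthrough only shows a paired-end run. As a hedged sketch, the aligner invocation can be wrapped so that `-p` is passed only when a second read file is given. This assumes `-p` is optional for single-end data, which is not confirmed here; the container's requirements list (printed when it runs with no arguments) is the place to check. `start_aligner` and the `DOCKER` variable are hypothetical conveniences, not part of the image.

```shell
#!/bin/sh
# Hypothetical helper: start dmlond/bwa_aligner detached and print its ID.
# ASSUMPTION: -p can be omitted for single-end data (not confirmed here).
# Usage: start_aligner BUILD REF_FASTA OUT_BAM READS_1 [READS_2]
DOCKER=${DOCKER:-"sudo docker"}  # set DOCKER=docker if sudo is not needed

start_aligner() {
  build=$1 ref=$2 out=$3 r1=$4 r2=${5:-}
  if [ -n "$r2" ]; then
    # Paired-end: pass the second mate file with -p, as in the example.
    $DOCKER run -d --volumes-from bwa_references --volumes-from plasmodium_data \
      dmlond/bwa_aligner -s "$r1" -b "$build" -R "$ref" -p "$r2" -o "$out"
  else
    # Single-end: the same call without -p.
    $DOCKER run -d --volumes-from bwa_references --volumes-from plasmodium_data \
      dmlond/bwa_aligner -s "$r1" -b "$build" -R "$ref" -o "$out"
  fi
}
```

For example, `ID=$(start_aligner pf3D7_v2.1.5 Pf3D7_v2.1.5.fasta.gz ERR022523_1_2.bam ERR022523_1.fastq.gz ERR022523_2.fastq.gz)` reproduces the paired-end run.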

This will run for a few minutes. You can check the status of the job with

$ sudo docker inspect $ID

When 'Running' in the 'State' section of the output is false, the job has finished. If 'ExitCode' is 0, it finished successfully; otherwise it finished with an error. Either way, you can view the job's STDOUT and STDERR with

$ sudo docker logs $ID
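Instead of re-running `docker inspect` until 'Running' turns false, `docker wait` blocks until a container stops and prints its exit code. A minimal sketch, assuming the `ID` variable from the steps above; the `wait_for_aligner` function and the `DOCKER` variable are hypothetical conveniences.

```shell
#!/bin/sh
# Block until the aligner container stops, then report its exit status.
# DOCKER defaults to "sudo docker" as in the steps above (hypothetical knob).
DOCKER=${DOCKER:-"sudo docker"}

wait_for_aligner() {
  id=$1
  # `docker wait` blocks until the container exits and prints its exit code.
  code=$($DOCKER wait "$id") || return 1
  if [ "$code" -eq 0 ]; then
    echo "alignment finished successfully"
  else
    echo "alignment failed with exit code $code; check: docker logs $id" >&2
  fi
  # Propagate the container's exit code to the caller.
  return "$code"
}
```

Usage: `wait_for_aligner "$ID" && sudo docker logs "$ID"`. To check without blocking, `sudo docker inspect --format '{{.State.Running}} {{.State.ExitCode}}' $ID` prints just those two fields.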

Once the job has finished with an ExitCode of 0, the docker logs should include the output of samtools flagstat on the resulting BAM file. You can then copy the data to your host machine with

$ sudo docker run --rm --volumes-from plasmodium_data -v /home/${USER}:/archive dmlond/bwa_samtools_base cp /home/bwa_user/data/ERR022523_1_2.bam /archive/
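Before working with the copied file, a quick host-side sanity check can confirm that it at least looks like BAM. BAM files are BGZF-compressed, which gzip can read, and the decompressed data begins with the magic bytes `BAM\1`. This hypothetical `looks_like_bam` function checks only that header, not the alignments themselves; samtools remains the tool for real validation.

```shell
#!/bin/sh
# Quick check that a file looks like BAM: BGZF is gzip-compatible, and the
# decompressed stream starts with the 4 magic bytes "BAM\1" (hex 42 41 4d 01).
# This inspects only the header magic, not the alignment records.
looks_like_bam() {
  # Decompress, keep the first 4 bytes, and render them as bare hex digits.
  magic=$(gzip -dc "$1" 2>/dev/null | head -c 4 | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "42414d01" ]
}
```

Pass it the path of the copied file under /home/$USER, e.g. `looks_like_bam /home/$USER/some_file.bam && echo "looks like BAM"`.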

Running the container without any volumes or arguments prints a list of its requirements.

This example is based upon the Plasmodium falciparum reference genome.
http://www.nature.com/nature/journal/v419/n6906/abs/nature01097.html

Owner
dmlond
