Public Repository

Last pushed: a year ago
Short Description
A thin wrapper around the bcbio/bcbio container
Full Description

A thin wrapper around the bcbio/bcbio container that supports cancer or germline variant calling, optionally combined with structural variant calling

This script runs a cancer and/or structural variant calling pipeline,
or a germline and/or structural variant calling pipeline,
using the publicly available bcbio/bcbio Docker container.
See the bcbio-nextgen documentation for more information on bcbio:
http://bcbio-nextgen.readthedocs.io/en/latest/index.html

The current version runs as user ubuntu and, by default, writes all output files with owner ubuntu and group ubuntu.
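Because the container writes files as ubuntu:ubuntu, the host shows them owned by whatever numeric UID and GID the container's ubuntu user has, which may not match your own account. A minimal sketch for checking this after a run, assuming a Linux host with GNU coreutils `stat`; `check_owner` is a hypothetical helper, and the `chown` it suggests needs sudo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether a directory is owned by the current
# host user. Assumes GNU coreutils stat (Linux); BSD/macOS stat flags differ.
# Only the top-level directory is checked, not every file inside it.
check_owner() {
  local dir="$1"
  local me owner
  me="$(id -u):$(id -g)"
  owner="$(stat -c '%u:%g' "$dir")" || return 2
  if [ "$owner" != "$me" ]; then
    echo "$dir is owned by $owner, not $me; to take ownership run:"
    echo "  sudo chown -R $me $dir"
    return 1
  fi
  echo "$dir is owned by the current user ($me)"
}
```

For example, `check_owner final` checks the default output directory (<cwd>/final) after a run.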

It downloads the reference data if the data directory provided to it is empty.
It installs the GATK tools from the file given with the -g (--GATK_file) option on the command line.
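The reference-data behavior described here and under '-d' and '-f' in the Inputs list can be summarized as a small decision function. This is a hypothetical illustration that only prints the choice; it is not the wrapper's code. It assumes that an empty '-d' directory is filled by the download and that '-d' takes precedence if both options are given; the page states neither.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the documented reference-data rules:
#   -d given  -> use that directory (assumed: download into it if empty)
#   -f given  -> untar the file into <cwd>/data
#   neither   -> download into <cwd>/data
# It only prints the decision; it is not the wrapper's actual code.
reference_data_plan() {
  local data_dir="$1" data_file="$2"
  if [ -n "$data_dir" ]; then
    # ls -A prints nothing for an empty or missing directory.
    if [ -n "$(ls -A "$data_dir" 2>/dev/null)" ]; then
      echo "use existing reference data in $data_dir"
    else
      echo "download reference data into $data_dir"
    fi
  elif [ -n "$data_file" ]; then
    echo "untar $data_file into $PWD/data"
  else
    echo "download reference data into $PWD/data"
  fi
}
```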

-v <path to GATK bz2>:<path to GATK bz2> is required so that the container can access the GATK file on the host.
-v <path to bcbio data>:<path to bcbio data> is required if a data directory (-d) or data tar file (-f) is specified.
-v <path to input files>:<path to input files> is required for the container to access the input files.
-v <base path to working directory and output directory>:<base path to working directory and output directory> is used by the container to write intermediate and result files to the host. The intermediate and result files can be on the order of tens of gigabytes.
-w <base path to working directory>/work is required to set the container's working directory to the user-specified working directory on the host.
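Every -v option above mounts a host path at the same path inside the container. A sketch that builds those arguments in a bash array (bash 4+ for `mapfile`) and prints a dry-run command instead of calling Docker; `same_path_mounts` and `MOUNTS` are hypothetical names, and the paths are taken from the examples below.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit "-v" and "PATH:PATH" for each host path, one
# token per line, so each path is mounted at the same location in the container.
same_path_mounts() {
  local p
  for p in "$@"; do
    # "-v SRC:DST" must stay one token; a space after the colon breaks it.
    printf '%s\n' -v "$p:$p"
  done
}

# Collect the arguments into an array.
mapfile -t MOUNTS < <(same_path_mounts \
  /mnt/GATK/GenomeAnalysisTK-3.6.tar.bz2 \
  /mnt/cancer-dream-syn3 \
  "$(pwd)")

# Dry run: print the shell-quoted command rather than executing it.
printf '%q ' docker run -it "${MOUNTS[@]}" -w "$(pwd)" ucscbcbiotool:latest UCSCbcbioTool.py
echo
```

Keeping the host and container paths identical means the file paths passed to the wrapper are valid both on the host and inside the container.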

Inputs:
'-t', '--tumor_sample_files': The tumor FASTQ or BAM files. Required for cancer variant calling.
'-n', '--normal_sample_files': The normal FASTQ or BAM files. Required for cancer variant calling.
'-s', '--sample_files': The FASTQ or BAM files for the germline variant calling pipeline. Required for germline variant calling.
'-c', '--num_cores': The total number of cores bcbio may use. See:
http://bcbio-nextgen.readthedocs.io/en/latest/contents/parallel.html
'-g', '--GATK_file': The path and file name of the GATK tools archive. If provided, the GATK tools are installed in the container for use in the workflow. GATK is required for the currently supported workflows.
'-W', '--workflow': The name of the workflow to run. The examples use cancer-variant-calling, germline-variant-calling, and structural-variant-calling. If not provided, only cancer variant calling is run. Structural variant calling can be run together with cancer or germline variant calling by giving the switch twice, for example '-W cancer-variant-calling -W structural-variant-calling'.
'-b', '--bed_file': The path and file name of the BED file. Required.
'-d', '--data_dir': The path to the bcbio genome reference data. If neither '-d' nor '-f' is used, the data is downloaded to <cwd>/data.
'-f', '--data_file': The path to a tar file of the bcbio genome reference data. The file is untarred in <cwd>/data. If neither '-d' nor '-f' is used, the data is downloaded to <cwd>/data.
'-o', '--output_dir': The path of the output directory. If not specified, output is written to <cwd>/final.
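The repeated switches above ('-t', '-n', '-s', '-W') are easiest to assemble with bash arrays. A sketch for the tumor/normal example further down, reusing its paths; the variable names are illustrative, `nproc` (GNU coreutils) is an assumed way to fill in '-c', and the script only prints the argument list.

```bash
#!/usr/bin/env bash
# Sketch: build UCSCbcbioTool.py arguments for a tumor/normal run.
# Each file and each workflow gets its own repeated switch.
# Paths come from the cancer example on this page; names are illustrative.
IN=/mnt/cancer-dream-syn3/input
GATK=/mnt/GATK/GenomeAnalysisTK-3.6.tar.bz2
TUMOR=("$IN/synthetic_challenge_set3_tumor_NGv3_1.fq.gz"
       "$IN/synthetic_challenge_set3_tumor_NGv3_2.fq.gz")
NORMAL=("$IN/synthetic_challenge_set3_normal_NGv3_1.fq.gz"
        "$IN/synthetic_challenge_set3_normal_NGv3_2.fq.gz")
WORKFLOWS=(cancer-variant-calling structural-variant-calling)

ARGS=()
for f in "${TUMOR[@]}";     do ARGS+=(-t "$f"); done
for f in "${NORMAL[@]}";    do ARGS+=(-n "$f"); done
for w in "${WORKFLOWS[@]}"; do ARGS+=(-W "$w"); done
# nproc reports the number of cores available to the current process.
ARGS+=(-c "$(nproc)" -g "$GATK" -b "$IN/NGv3.bed")

# Print the shell-quoted argument list.
printf '%q ' UCSCbcbioTool.py "${ARGS[@]}"
echo
```

The resulting array can be appended after the image name in a docker run command as "${ARGS[@]}".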

Example command line execution:
Germline small and structural variant calling using a previously downloaded bcbio reference data set with the '-d' parameter:
docker run -it \
-v /mnt/NA12878-exome-eval/input/:/mnt/NA12878-exome-eval/input/ \
-v /mnt/UCSCbcbioTooltestdir/data_latest:/mnt/UCSCbcbioTooltestdir/data_latest \
-v /mnt/GATK/GenomeAnalysisTK-3.6.tar.bz2:/mnt/GATK/GenomeAnalysisTK-3.6.tar.bz2 \
-v /mnt/cancer-dream-syn3/:/mnt/cancer-dream-syn3 \
-v "$(pwd)":"$(pwd)" \
-w "$(pwd)" \
ucscbcbiotool:latest UCSCbcbioTool.py \
-s /mnt/NA12878-exome-eval/input/NA12878-NGv3-LAB1360-A_1.fastq.gz \
-s /mnt/NA12878-exome-eval/input/NA12878-NGv3-LAB1360-A_2.fastq.gz \
-g /mnt/GATK/GenomeAnalysisTK-3.6.tar.bz2 \
-W germline-variant-calling \
-W structural-variant-calling \
-b /mnt/cancer-dream-syn3/input/NGv3.bed \
-d /mnt/UCSCbcbioTooltestdir/data_latest \
| tee outgermlinestructural
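A long run that fails on a mistyped path wastes time, so it can help to confirm that every host path exists before calling docker run. A minimal sketch; `check_inputs` is a hypothetical helper, and the paths in the commented usage line are taken from the germline example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each missing path to stderr and return
# non-zero if any path does not exist.
check_inputs() {
  local missing=0 p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "missing: $p" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage (adjust the paths to your host):
# check_inputs /mnt/NA12878-exome-eval/input/NA12878-NGv3-LAB1360-A_1.fastq.gz \
#              /mnt/NA12878-exome-eval/input/NA12878-NGv3-LAB1360-A_2.fastq.gz \
#              /mnt/cancer-dream-syn3/input/NGv3.bed || exit 1
```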

Cancer small and structural variant calling using a tar file of bcbio reference data with the '-f' parameter:
docker run -it \
-v /mnt/UCSCbcbioTooltestdir/bcbiotarreddata:/mnt/UCSCbcbioTooltestdir/bcbiotarreddata \
-v /mnt/GATK/GenomeAnalysisTK-3.6.tar.bz2:/mnt/GATK/GenomeAnalysisTK-3.6.tar.bz2 \
-v /mnt/cancer-dream-syn3/:/mnt/cancer-dream-syn3 \
-v "$(pwd)":"$(pwd)" \
-w "$(pwd)" \
ucscbcbiotool:latest UCSCbcbioTool.py \
-n /mnt/cancer-dream-syn3/input/synthetic_challenge_set3_normal_NGv3_1.fq.gz \
-n /mnt/cancer-dream-syn3/input/synthetic_challenge_set3_normal_NGv3_2.fq.gz \
-t /mnt/cancer-dream-syn3/input/synthetic_challenge_set3_tumor_NGv3_1.fq.gz \
-t /mnt/cancer-dream-syn3/input/synthetic_challenge_set3_tumor_NGv3_2.fq.gz \
-c 16 \
-g /mnt/GATK/GenomeAnalysisTK-3.6.tar.bz2 \
-W cancer-variant-calling \
-W structural-variant-calling \
-b /mnt/cancer-dream-syn3/input/NGv3.bed \
-f /mnt/UCSCbcbioTooltestdir/bcbiotarreddata/bcbio_data.tar \
| tee outalignertest
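Since intermediate and result files can be on the order of tens of gigabytes, a free-space check on the directory mounted with `-v $(pwd):$(pwd)` may be worth running before either example. A sketch using POSIX `df -Pk`; `free_gb` and `require_free_gb` are hypothetical helpers, and the 100 GiB default threshold is an arbitrary assumption rather than a bcbio requirement.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the free space (whole GiB) on the filesystem
# holding the given path. df -Pk reports 1024-byte blocks in POSIX format;
# line 2 is the data row and column 4 is "Available".
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Return non-zero if fewer than NEED GiB (default 100, an assumption) are free.
require_free_gb() {
  local dir="$1" need="${2:-100}" have
  have="$(free_gb "$dir")"
  [ -n "$have" ] || return 2
  if [ "$have" -lt "$need" ]; then
    echo "only ${have} GiB free in $dir; wanted at least ${need} GiB" >&2
    return 1
  fi
}

# Usage: require_free_gb "$(pwd)" 100 || exit 1
```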

Owner
bigdog