Short Description
Appreci8 performs variant calling in NGS data with high sensitivity and high PPV.
Full Description

What is appreci8?

Appreci8 is a variant calling pipeline for detecting single nucleotide variants (SNVs) and short indels (up to ~30 bp) in next-generation sequencing (NGS) data. By integrating and filtering the output of eight individual variant calling tools on the basis of an artifact- and a polymorphism score, appreci8 succeeds in calling variants with high sensitivity and positive predictive value even at variant allele frequencies of 1%.

Prerequisites

This image only supports 64-bit operating systems (Windows or Linux) with a Docker installation.

In order to use this image you have to download and unzip the appreci8 folder. There are two possibilities:

This directory has to be mounted into the appreci8 docker container as described in "How to use this image?".

The data you wish to analyze has to be prepared in the following way (compare the Example folder contained in the appreci8 folder):

  • SampleNames.txt: The names of the samples you wish to analyze (without file extension, one name per line)
  • vcf_header.txt: Standard vcf file header (available in the appreci8 folder)
  • Folder alignment: Containing the bam and bai files of the samples you wish to analyze (format: sample1.bam, sample1.bai etc.)
  • Folder snpEff_ann:
    • Hotspots.txt: A list containing known hotspot mutations, covering Gene, Mutation (change on amino acid level, one-letter-code), Min_VAF (minimum allelic frequency at which you expect these mutations); an empty list can be passed, containing the header and three NA's (available in the appreci8 folder)
    • transcripts.txt: A list containing the genes and the corresponding Ensembl transcript-IDs to be analyzed (without header; gene and transcript-ID separated by a tab, e.g. NRAS<tab>ENST00000369535; for an example see file in the Example folder)
  • Folder targetRegions:
    • targetRegions.bed: Bed file containing the target regions to be analyzed (no header, no information except for chr, start, end; 1 instead of chr1 etc.; for an example see file in the Example folder)
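
Before starting a run, it can help to confirm that the data folder matches the layout listed above. The following is a minimal sketch under that assumption; check_appreci8_data is a hypothetical helper and not part of appreci8, and it only tests that the files exist, not that their contents are valid.

```shell
# Sketch: verify that a data directory follows the layout listed above.
# check_appreci8_data is a hypothetical helper, not shipped with appreci8.
check_appreci8_data() {
    data_dir=$1
    status=0

    # Files that must exist regardless of which samples are analyzed.
    for f in SampleNames.txt vcf_header.txt \
             snpEff_ann/Hotspots.txt snpEff_ann/transcripts.txt \
             targetRegions/targetRegions.bed; do
        if [ ! -f "$data_dir/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done

    # Every sample name needs a matching .bam and .bai file in alignment/.
    if [ -f "$data_dir/SampleNames.txt" ]; then
        while IFS= read -r sample || [ -n "$sample" ]; do
            [ -z "$sample" ] && continue
            for ext in bam bai; do
                if [ ! -f "$data_dir/alignment/$sample.$ext" ]; then
                    echo "missing: alignment/$sample.$ext"
                    status=1
                fi
            done
        done < "$data_dir/SampleNames.txt"
    fi

    return "$status"
}
```

Calling check_appreci8_data /path/to/data prints one line per missing file and returns a non-zero status if anything is missing.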

How to use this image?

To start the application with the default settings simply run

$ docker run -v /path/to/appreci8:/appreci8 -v /path/to/data:/data -e LOCAL_USER_ID=`id -u $USER` wwuimi/appreci8

Important: If you do not specify the LOCAL_USER_ID, the default user id -9001 is used, so the output files might not be accessible to your user if you are not root and have only limited file system access. The provided command passes the UID of the user account that you are using on your local system. You can also specify the UID directly by providing a numeric value, but this should not be necessary in normal environments.

If you encounter Java heap space problems, use the following command to adjust the heap size, for example to expand it to 4 gigabytes:

$ docker run -v /path/to/appreci8:/appreci8 -v /path/to/data:/data -e "JAVA_TOOL_OPTIONS=-Xmx4G -Xms4G" -e LOCAL_USER_ID=`id -u $USER` wwuimi/appreci8

If you need to change the default options appreci8 uses for its calculations, set the following environment variables:

  • minimum number of reads with the alternate allele: MIN_ALT (default: 20)
  • minimum depth: MIN_DP (default: 50)
  • minimum variant allele frequency: MIN_VAF (default: 0.01; do not choose values below 0.01)
  • minimum mean base quality for reads with the alternate allele: MIN_BQ_ALT (default: 15)
  • maximum difference for (mean base quality reference) - (mean base quality alternative): MAX_BQ_DIFF (default: 7)
  • maximum number of samples that are allowed to feature the same variant without penalizing: MAX_SAMPLES (default: 3; if your data set contains more than 3 replicates of the same sample, it is recommended to increase this value)
  • bed file containing primer locations is provided: PRIMER (default: "FALSE")

The new command with custom settings could look like this:

$ docker run -v /path/to/appreci8:/appreci8 -v /path/to/data:/data -e "MIN_ALT=XX" -e "MIN_DP=XX" -e LOCAL_USER_ID=`id -u $USER` wwuimi/appreci8
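
To keep custom thresholds in one place, the command can be assembled by a small wrapper. This is only a sketch: appreci8_cmd is a hypothetical function that prints the command instead of running it, covers just three of the options, and uses `id -u` for the current user. The paths are the same placeholders as in the commands above.

```shell
# Sketch: assemble the docker command from a few custom thresholds.
# appreci8_cmd is a hypothetical wrapper; it only prints the command
# so that it can be inspected before it is run.
appreci8_cmd() {
    min_alt=${1:-20}     # MIN_ALT, default taken from the option list
    min_dp=${2:-50}      # MIN_DP, default taken from the option list
    min_vaf=${3:-0.01}   # MIN_VAF, default taken from the option list

    # Refuse VAF thresholds below 0.01, as advised in the option list.
    if ! awk -v v="$min_vaf" 'BEGIN { exit !(v + 0 >= 0.01) }'; then
        echo "MIN_VAF must not be below 0.01" >&2
        return 1
    fi

    printf '%s\n' "docker run -v /path/to/appreci8:/appreci8 -v /path/to/data:/data -e MIN_ALT=$min_alt -e MIN_DP=$min_dp -e MIN_VAF=$min_vaf -e LOCAL_USER_ID=$(id -u) wwuimi/appreci8"
}
```

For example, appreci8_cmd 10 100 0.05 prints a command with MIN_ALT=10, MIN_DP=100 and MIN_VAF=0.05, which can be run once the placeholder paths have been replaced.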

If you need a specific version of appreci8, check the tag section and decide which container you need.
Then simply run

$ docker run -v /path/to/appreci8:/appreci8 -v /path/to/data:/data -e LOCAL_USER_ID=`id -u $USER` wwuimi/appreci8:tag

Tag versions

  • latest: Basic appreci8 version (Ensembl used for annotation; defined list of transcript IDs)
  • noENST: Alternative appreci8 version (Ensembl used for annotation; consider all available transcript IDs)
  • speedup: Considerable speed-up of the original appreci8 version (Ensembl used for annotation; consider all available transcript IDs)
  • gatk4: Experimental appreci8 version using GATK 4.0.4.0 instead of GATK 3.3.0. The influence of the software update on variant calling results still has to be evaluated in detail.

License

The main application, as well as the tag versions noENST, latest and speedup (all using GATK 3.3.0), is free to academic researchers for non-commercial purposes (see the important licensing information regarding GATK 3.3.0 provided below).
Tag version "gatk4" (using GATK 4.0.4.0) is open-source under a BSD 3-clause "New" or "Revised" license.

  • Important licensing information regarding GATK: The GATK 3.3.0 is licensed by the Broad Institute and is made available for free to academic users for non-commercial use only pursuant to the licensing terms below, and to other authorized licensees pursuant to the terms of their respective licenses, in each case for use within this pipeline only. The full text of the academic license for non-commercial use of GATK is available at https://www.broadinstitute.org/gatk/about/license.html. For commercial licensing information, please email softwarelicensing@broadinstitute.org. For more information about GATK 3.3.0, please visit the GATK website at https://www.broadinstitute.org.

  • GATK documentation resources and support: General GATK documentation can be found on the GATK website at http://www.broadinstitute.org/gatk/guide/. Users of this pipeline are welcome to ask GATK-related questions and report problems that are not specific to this pipeline in the GATK forum at http://gatkforums.broadinstitute.org/gatk.

User Feedback

We are continuously working on improving our variant calling pipeline and are thus updating this image. If you have any questions about the pipeline, a feature request or a bug report, please leave a comment or contact us directly.

Remarks

  • The pipeline was developed for targeted NGS data. Although WES data has already been analyzed successfully with appreci8, the analysis might take considerably longer. In that case, a VAF threshold higher than 0.01 is strongly recommended.
  • Concentrating on the analysis of coding mutations, appreci8 excludes calls that, according to annotation by SnpEff, are a 5_prime_UTR_variant, a 3_prime_UTR_variant, a downstream_gene_variant, an upstream_gene_variant, an intron_variant, an intergenic_variant, an intragenic_variant, a synonymous_variant or involved in protein_protein_contact.
  • The pipeline works with alignments to GRCh37.
  • Always use 1 instead of chr1, 2 instead of chr2 etc.
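
If an existing bed file uses chr1, chr2 etc., it can be converted to the expected naming before the analysis. The following is a minimal sketch; strip_chr_prefix is a hypothetical helper that assumes a tab-separated bed file. It also drops header lines and any columns beyond chr, start and end, in line with the targetRegions.bed format described under Prerequisites.

```shell
# Sketch: rewrite a bed file so that chromosomes read 1, 2, ...
# instead of chr1, chr2, ...
# strip_chr_prefix is a hypothetical helper, not part of appreci8.
strip_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^(track|browser|#)/ { next }   # drop header and comment lines
         NF < 3 { next }                 # skip empty or incomplete lines
         { sub(/^chr/, "", $1); print $1, $2, $3 }' "$1"
}
```

A call such as strip_chr_prefix original.bed > targetRegions/targetRegions.bed writes the converted regions; original.bed is a placeholder name.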

Using appreci8 without Docker

Future work

  • A version using RefSeq instead of Ensembl for annotation will shortly be uploaded.
  • A version performing annotation and analysis on the basis of all available RefSeq transcript-IDs for a called variant will shortly be uploaded.
  • A version analyzing matched samples (i.e. tumor and germline samples) is currently under development.

Change log

Date        Changes
30.05.2018  Experimental version of appreci8, using GATK 4.0.4.0 instead of GATK 3.3.0
13.04.2018  Speed-up version of appreci8 (all available Ensembl transcript_IDs for a called variant)
15.08.2017  Updated version of appreci8.
15.08.2017  Tag version performing annotation and analysis on the basis of all available Ensembl transcript-IDs for a called variant available (noENST)
19.06.2017  First version of appreci8.
Owner
wwuimi