Short Description
Appreci8 performs variant calling in NGS data with high sensitivity and high PPV.
Full Description

What is appreci8?

Appreci8 is a variant calling pipeline for detecting single nucleotide variants (SNVs) and short indels (up to ~30 bp) in next-generation sequencing (NGS) data. It integrates the output of eight individual variant calling tools and filters it on the basis of an artifact score and a polymorphism score. In this way, appreci8 calls variants with high sensitivity and high positive predictive value (PPV), even at variant allele frequencies (VAFs) as low as 1%.

Prerequisites

To use this image, download and unzip the appreci8 folder from the following URL: https://uni-muenster.sciebo.de/index.php/s/CV7e0HaR7Z4bzFy
This directory has to be mounted into the appreci8 Docker container as described in "How to use this image?".

Prepare the data you wish to analyze as follows (compare the Example folder contained in the appreci8 folder):

  • SampleNames.txt: The names of the samples you wish to analyze (without file extension, one name per line)
  • vcf_header.txt: Standard VCF file header (available in the appreci8 folder)
  • Folder alignment: Containing the BAM and BAI files of the samples you wish to analyze (format: sample1.bam, sample1.bai etc.)
  • Folder snpEff_ann:
    • Hotspots.txt: A list of known hotspot mutations with the columns Gene, Mutation (change on amino acid level, one-letter code) and Min_VAF (minimum allelic frequency at which you expect these mutations); an empty list containing only the header and three NA entries can be passed (available in the appreci8 folder)
    • transcripts.txt: A list of the genes and the corresponding Ensembl transcript IDs to be analyzed (no header; gene and transcript ID separated by a tab, e.g. NRAS<TAB>ENST00000369535; for an example see the file in the Example folder)
  • Folder targetRegions:
    • targetRegions.bed: BED file containing the target regions to be analyzed (no header; only the columns chr, start, end; 1 instead of chr1 etc.; for an example see the file in the Example folder)
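
The layout above can be checked before starting the container. The following is a minimal sketch, not part of appreci8: the function name check_appreci8_data is hypothetical, and it assumes that all listed files and folders sit directly under the data directory that is later mounted as /data.

```shell
#!/bin/sh
# Sketch: check that a data directory matches the layout listed above.
# Assumption: all items sit directly under the directory passed as $1.
# check_appreci8_data is a hypothetical helper, not shipped with appreci8.

check_appreci8_data() {
    data_dir=$1
    missing=0

    # Fixed files named in the list above.
    for path in \
        SampleNames.txt \
        vcf_header.txt \
        snpEff_ann/Hotspots.txt \
        snpEff_ann/transcripts.txt \
        targetRegions/targetRegions.bed
    do
        if [ ! -f "$data_dir/$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done

    # One BAM and one BAI file per sample name (sample1.bam, sample1.bai).
    if [ -f "$data_dir/SampleNames.txt" ]; then
        while IFS= read -r sample || [ -n "$sample" ]; do
            if [ -z "$sample" ]; then
                continue
            fi
            for ext in bam bai; do
                if [ ! -f "$data_dir/alignment/$sample.$ext" ]; then
                    echo "missing: alignment/$sample.$ext"
                    missing=1
                fi
            done
        done < "$data_dir/SampleNames.txt"
    fi

    # Exit status 0 if everything was found, 1 otherwise.
    return "$missing"
}
```

Calling check_appreci8_data /path/to/data prints one line per missing file and returns a non-zero status if anything is missing. The check only covers file names, not file contents.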

How to use this image?

To start the application with the default settings, simply run

$ docker run -v /path/to/appreci8:/appreci8 -v /path/to/data:/data wwuimi/appreci8

To change the default options that appreci8 uses for its calculations, set the following environment variables:

  • minimum number of reads with the alternate allele: MIN_ALT (default: 20)
  • minimum depth: MIN_DP (default: 50)
  • minimum variant allele frequency: MIN_VAF (default: 0.01; do not choose values below 0.01)
  • minimum mean base quality for reads with the alternate allele: MIN_BQ_ALT (default: 15)
  • maximum difference between the mean base quality of reads with the reference allele and the mean base quality of reads with the alternate allele: MAX_BQ_DIFF (default: 7)
  • maximum number of samples that are allowed to feature the same variant without penalizing: MAX_SAMPLES (default: 3; if your data set contains more than 3 replicates of the same sample, it is recommended to increase this value)
  • whether a BED file containing primer locations is provided: PRIMER (default: "FALSE")
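
To get a feeling for how the three minimum thresholds relate to each other, the following sketch evaluates a single call against MIN_ALT, MIN_DP and MIN_VAF. It assumes that a call has to satisfy all three minimums at once and that the VAF is the number of alternate reads divided by the depth; this is an illustration only and not a description of the filtering that appreci8 performs internally. The function name passes_minimums is hypothetical.

```shell
#!/bin/sh
# Sketch: evaluate one call against MIN_ALT, MIN_DP and MIN_VAF.
# Assumptions (not confirmed by appreci8): all three minimums must hold,
# and VAF = alternate reads / depth. Unset variables fall back to the
# documented defaults (20, 50, 0.01).

passes_minimums() {
    alt_reads=$1
    depth=$2
    awk -v alt="$alt_reads" -v dp="$depth" \
        -v min_alt="${MIN_ALT:-20}" \
        -v min_dp="${MIN_DP:-50}" \
        -v min_vaf="${MIN_VAF:-0.01}" \
        'BEGIN {
            ok = (dp > 0 && alt >= min_alt && dp >= min_dp && alt / dp >= min_vaf)
            exit !ok
        }'
}
```

Under these assumptions and with the default values, passes_minimums 20 2000 succeeds (20 alternate reads, VAF 0.01), whereas passes_minimums 20 2500 fails (VAF 0.008) and passes_minimums 19 50 fails (fewer than 20 alternate reads). In other words, with MIN_ALT at 20, a variant at a VAF of 1% needs a depth of at least 2000 reads.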

With custom settings, the command could look like this (replace XX with the values you want to use):

$ docker run -v /path/to/appreci8:/appreci8 -v /path/to/data:/data -e "MIN_ALT=XX" -e "MIN_DP=XX" wwuimi/appreci8
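
If the options change often, the command can be assembled by a small wrapper. The following is a minimal sketch under stated assumptions: the function name appreci8_cmd is hypothetical, the values 10 and 100 are purely illustrative, and the function only prints the command instead of running it. It accepts only the option names listed above and rejects MIN_VAF values below 0.01.

```shell
#!/bin/sh
# Sketch: print the docker run command for a set of custom options.
# appreci8_cmd is a hypothetical helper; it prints the command (dry run)
# and does not call docker itself. Paths containing spaces are not handled.

appreci8_cmd() {
    appreci8_dir=$1
    data_dir=$2
    shift 2

    cmd="docker run -v $appreci8_dir:/appreci8 -v $data_dir:/data"
    for setting in "$@"; do
        case $setting in
            MIN_VAF=*)
                # The option list advises against values below 0.01.
                if ! awk -v v="${setting#MIN_VAF=}" 'BEGIN { exit !(v + 0 >= 0.01) }'; then
                    echo "MIN_VAF must not be below 0.01" >&2
                    return 1
                fi
                ;;
            MIN_ALT=*|MIN_DP=*|MIN_BQ_ALT=*|MAX_BQ_DIFF=*|MAX_SAMPLES=*|PRIMER=*)
                ;;
            *)
                echo "unknown option: $setting" >&2
                return 1
                ;;
        esac
        cmd="$cmd -e \"$setting\""
    done
    echo "$cmd wwuimi/appreci8"
}

# Example with illustrative values for MIN_ALT and MIN_DP.
appreci8_cmd /path/to/appreci8 /path/to/data MIN_ALT=10 MIN_DP=100
```

The printed line can be reviewed and then pasted into the terminal. Printing instead of executing keeps the sketch usable on machines without Docker.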

If you need a specific version of appreci8, see the section "Tag versions" and choose the tag you need.
Then simply run

$ docker run -v /path/to/appreci8:/appreci8 -v /path/to/data:/data wwuimi/appreci8:tag

Tag versions

  • latest: Basic appreci8 version (Ensembl used for annotation; defined list of transcript IDs)
  • noENST: Alternative appreci8 version (Ensembl used for annotation; consider all available transcript IDs)

License

The software contained in this image is licensed under the LGPLv3.

User Feedback

We are continuously improving our variant calling pipeline and therefore updating this image. If you have a question about the pipeline, a feature request or a bug report, please leave a comment or contact us directly.

Remarks

  • The pipeline was developed for targeted NGS data. Although WES data has already been analyzed successfully with appreci8, the analysis might take considerably longer. A VAF threshold higher than 0.01 is strongly recommended.
  • Because appreci8 concentrates on the analysis of coding mutations, it excludes calls that SnpEff annotates as a 5_prime_UTR_variant, a 3_prime_UTR_variant, a downstream_gene_variant, an upstream_gene_variant, an intron_variant, an intergenic_variant, an intragenic_variant, a synonymous_variant or as involved in protein_protein_contact.
  • The pipeline works with alignments to GRCh37.
  • Always use 1 instead of chr1, 2 instead of chr2 etc.
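
For a BED file that still uses chr-prefixed names, the naming rule above can be applied with a short filter. This is a minimal sketch; the function name strip_chr_prefix is hypothetical. It covers the BED file only, so the chromosome names in the BAM files have to match independently.

```shell
#!/bin/sh
# Sketch: rewrite a BED file so that it uses 1 instead of chr1 and keeps
# only the chr, start and end columns. Reads stdin, writes stdout.
# strip_chr_prefix is a hypothetical helper, not shipped with appreci8.

strip_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         # Drop header-like lines, since the target file must have no header.
         /^(track|browser|#)/ { next }
         # Remove a leading "chr" and print the first three columns only.
         { sub(/^chr/, "", $1); print $1, $2, $3 }'
}
```

A possible call is strip_chr_prefix < input.bed > targetRegions.bed, where input.bed is a placeholder name. Note that the mitochondrial chromosome is named chrM in UCSC-style files but MT in Ensembl-style GRCh37 references, so removing the prefix alone does not convert that name.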

Future work

  • A version using RefSeq instead of Ensembl for annotation will be uploaded shortly.
  • A version performing annotation and analysis on the basis of all available RefSeq transcript IDs for a called variant will be uploaded shortly.
  • A version analyzing matched samples (i.e. tumor and germline samples) is currently under development.

Change log

Date        Changes
15.08.2017  Updated version of appreci8.
15.08.2017  Tag version performing annotation and analysis on the basis of all available Ensembl transcript-IDs for a called variant available (noENST)
19.06.2017  First version of appreci8.
Owner
wwuimi
