BBCAnalyzer is an application for visualizing base counts in NGS data.

What is BBCAnalyzer?

Deriving valid variant calling results from raw next-generation sequencing data is a particularly challenging task, especially with respect to clinical diagnostics and personalized medicine. However, when using classic variant calling software, the user usually obtains nothing more than a list of variants that pass the corresponding caller's internal filters. Any expected mutations (e.g. hotspot mutations) that have not been called by the software need to be investigated manually.

BBCAnalyzer (Bases By CIGAR Analyzer) provides a novel visual approach to facilitate this step of time-consuming, manual inspection of common mutation sites. BBCAnalyzer is able to visualize base counts at predefined positions or regions in any sequence alignment data that are available as BAM files. Thereby, the tool provides a straightforward solution for evaluating any list of expected mutations like hotspot mutations, or even whole regions of interest. In addition to an ordinary textual report, BBCAnalyzer reports highly customizable plots. Information on the counted number of bases, the reference bases, known mutations or polymorphisms, called mutations and base qualities is summarized in a single plot. By uniting this information in a graphical way, the user may easily decide on a variant being present or not - completely independent of any internal filters or frequency thresholds.

The Docker container provides BBCAnalyzer as a local web application. Additionally, BBCAnalyzer is available as an R package at http://bioconductor.org.

Prerequisites

To analyze data with BBCAnalyzer you need:

  • Aligned sequencing data (.bam- and .bai-files)
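
Before starting an analysis it can help to confirm that every sample has both files. The following is a minimal sketch, not part of BBCAnalyzer: the function name is hypothetical, and it assumes the files are named `<sample>.bam` and `<sample>.bai` and sit in the same folder.

```python
from pathlib import Path


def find_missing_alignment_files(folder, samples):
    """Return the expected .bam/.bai paths that do not exist in `folder`.

    Assumption: files are named <sample>.bam and <sample>.bai.
    """
    folder = Path(folder)
    missing = []
    for sample in samples:
        for extension in (".bam", ".bai"):
            # Build the name by concatenation so sample names containing
            # dots are not truncated.
            candidate = folder / f"{sample}{extension}"
            if not candidate.is_file():
                missing.append(candidate)
    return missing
```

An empty result means every listed sample has an alignment file and an index; otherwise the returned paths show which files still need to be created or copied.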

How to use this image?

To run the latest version of BBCAnalyzer, simply run:

$ docker run -p 3838:3838 -v /tmp/output/:/srv/shiny-server/output -t wwuimi/bbcanalyzer
You can test it by visiting http://localhost:3838 or http://host-ip:3838.
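
For scripted setups, the command above can be assembled programmatically. This is a sketch only: the helper name and the `extra_mounts` parameter are hypothetical, and the container-side input path in the usage line is an assumption, since this page documents only the output mount.

```python
import shlex


def build_run_command(output_dir="/tmp/output/", host_port=3838, extra_mounts=None):
    """Assemble the `docker run` argument list for the BBCAnalyzer image."""
    command = [
        "docker", "run",
        # -p publishes container port 3838 on the chosen host port.
        "-p", f"{host_port}:3838",
        # -v bind-mounts the host output folder into the container.
        "-v", f"{output_dir}:/srv/shiny-server/output",
    ]
    # Optional additional bind mounts, given as {host_path: container_path}.
    for host_path, container_path in (extra_mounts or {}).items():
        command += ["-v", f"{host_path}:{container_path}"]
    # -t allocates a pseudo-terminal; the image name comes last.
    command += ["-t", "wwuimi/bbcanalyzer"]
    return command


# Hypothetical extra mount: the container-side path is an assumption.
command = build_run_command(extra_mounts={"/data/bam": "/srv/shiny-server/input"})
print(shlex.join(command))
```

Called without arguments, the helper reproduces the documented command. The resulting list can be passed to `subprocess.run` as is.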

A Shiny interface will open, offering two analysis modes: 1) Analyze Bases (left panel) or 2) Plot Only (middle panel). Each mode has its own set of input options.

1. Analyze Bases

  • Samples to analyze: Names of the samples to analyze (one entry per line, no file name extensions).
  • Define folder containing bam- and bai-files: Input folder containing the alignment data of all samples to analyze. For each sample defined in “Samples to analyze” there must be a bam file and a bai file with matching names in the defined folder. Important: Windows users have to define the folder using double backslashes (e.g. C:\\home\\BBCAnalyzer local\\Scripts\\).
  • Target regions to analyze: Single positions (chromosome and position) and longer regions (chromosome, start, end) are both supported.
  • Define folder containing vcf files (optional): Input folder containing vcf files for all samples to analyze. For each sample defined in “Samples to analyze” there must be a vcf file with a matching name in the defined folder.
  • Define output folder: Folder where output files shall be saved.
  • Define tabix file containing known variants (optional): Tabix file (format: file.vcf.gz.tbi) containing known variants. For the correct functioning of this option the file “file.vcf.gz” is equally necessary (same folder as “file.vcf.gz.tbi”; does not have to be defined anywhere).
  • Select reference genome for analysis: Select one of the available reference genomes for analysis. If a genome has not yet been installed, it is automatically downloaded and installed (this process can take a few minutes). Default: BSgenome.Hsapiens.UCSC.hg19.
  • Mapping quality threshold: A PHRED-scaled value to be used as a mapping quality threshold. All reads with a mapping quality below this threshold are excluded from analysis. Every base in an excluded read gets marked in the output. Default: 60.
  • Base quality threshold: A PHRED-scaled value (+33) to be used as a base quality threshold. All bases with a base quality below this threshold are excluded from analysis. Every excluded base gets marked in the output. The number of excluded bases per position gets counted and reported. Default: 50.
  • Frequency threshold for variant reporting: A frequency to be used as a threshold for variants to be reported. Default: 0.01.
  • Lower- and upper mean quality bound for color-coding: The lower- and upper bound for the mean quality that shall be color-coded in the plots. All bases with a mean quality below the lower bound are colored with the lightest color defined for the corresponding base. All bases with a mean quality above the upper bound are colored with the darkest color defined for the corresponding base. If the bases shall not be color-coded according to their mean quality, the definable range has to be zero. Default: 58-63.
  • Select levels at which marks shall be drawn: Levels (relative number of reads) at which horizontal lines shall be drawn in the plots.
  • Plot number of reads: Relative or absolute number of reads can be plotted. Default: Relative.
  • Create one plot per: One plot per sample or one plot per position can be created. Default: Sample.
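
The interaction of the three thresholds can be illustrated with a small sketch. It is not BBCAnalyzer's actual implementation. It assumes that “(+33)” means the base quality threshold includes the ASCII offset, so that the default of 50 corresponds to a Phred score of 17. It also assumes that frequencies are computed over the bases that pass both quality filters. All function names are hypothetical.

```python
from collections import Counter


def count_bases(observations, mq_threshold=60, bq_threshold=50):
    """Count bases at one position, skipping low-quality reads and bases.

    `observations` holds (base, quality_char, mapping_quality) tuples.
    Returns (counts, number_of_bases_excluded_for_base_quality).
    """
    counts = Counter()
    excluded = 0
    for base, quality_char, mapping_quality in observations:
        if mapping_quality < mq_threshold:
            continue  # the whole read is excluded
        # Assumption: the threshold is compared with the offset-33 value.
        if ord(quality_char) < bq_threshold:
            excluded += 1
            continue
        counts[base] += 1
    return counts, excluded


def variants_above_frequency(counts, reference_base, frequency_threshold=0.01):
    """Return non-reference bases whose frequency reaches the threshold."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        base: n / total
        for base, n in counts.items()
        if base != reference_base and n / total >= frequency_threshold
    }
```

With the defaults, a read with mapping quality 20 is skipped entirely, and a base with quality character `#` (ASCII 35) is counted as excluded rather than contributing to any base count.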

2. Plot Only

  • Samples to analyze: Names of the samples to analyze (one entry per line, no file name extensions).
  • Consider vcf file information (only possible if evaluated in complete analysis): If vcf file information was available for every sample analyzed with BBCAnalyzer and it was considered in the previously performed complete analysis, it can also be considered for “Create plots only”. Default: No.
  • Define output folder: Folder where previously created output files are stored (sampleX.frequency.txt and sampleX.calling.txt) and new plots shall be saved.
  • Define tabix file containing known variants (optional): Tabix file (format: file.vcf.gz.tbi) containing known variants. For the correct functioning of this option the file “file.vcf.gz” is equally necessary (same folder as “file.vcf.gz.tbi”; does not have to be defined anywhere).
  • Lower- and upper mean quality bound for color-coding: The lower- and upper bound for the mean quality that shall be color-coded in the plots. All bases with a mean quality below the lower bound are colored with the lightest color defined for the corresponding base. All bases with a mean quality above the upper bound are colored with the darkest color defined for the corresponding base. If the bases shall not be color-coded according to their mean quality, the definable range has to be zero. Default: 58-63.
  • Select levels at which marks shall be drawn: Levels (relative number of reads) at which horizontal lines shall be drawn in the plots.
  • Plot number of reads: Relative or absolute number of reads can be plotted. Default: Relative.
  • Create one plot per: One plot per sample or one plot per position can be created. Default: Sample.
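
The color-coding bounds can be read as a clamped scale. The sketch below is an illustration only: the linear interpolation between the two bounds and the handling of a zero-width range are assumptions, and the function name is hypothetical.

```python
def quality_shade(mean_quality, lower=58.0, upper=63.0):
    """Map a mean base quality to a shade from 0.0 (lightest) to 1.0 (darkest)."""
    if upper <= lower:
        # Zero-width range: no quality color-coding (assumed: one fixed shade).
        return 1.0
    # Assumption: shades change linearly between the two bounds.
    position = (mean_quality - lower) / (upper - lower)
    # Values outside the bounds are clamped to the lightest or darkest shade.
    return min(1.0, max(0.0, position))
```

With the default bounds of 58-63, any mean quality below 58 maps to the lightest shade and any value above 63 to the darkest.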

License

The software contained in this image is licensed under the LGPLv3.

User Feedback

We are continuously working on improving our variant calling pipeline and, as a result, updating this image. If you have any questions about the pipeline, a feature request or a bug report, please leave a comment or contact us directly.

Remarks

For detailed information on the BBCAnalyzer - including its performance on a set of well-characterized NGS data - check out our publication: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1549-4
