Short Description
Freebayes image for CAW
Full Description


Sarek is a complete open-source pipeline to detect germline or somatic variants from WGS data, developed at the National Genomics Infrastructure at SciLifeLab Stockholm and the National Bioinformatics Infrastructure Sweden at SciLifeLab.

The pipeline uses Nextflow, a domain-specific language for building bioinformatics workflows, and Singularity, a container technology designed for high-performance computing.

This pipeline is primarily used on a cluster on the Swedish UPPMAX systems.
However, the pipeline should be able to run on any system that supports Nextflow.
The pipeline comes with some configuration for different systems.
See the documentation for more information.

Sarek is based on GATK best practices to align, realign and recalibrate short-read data (which is done in parallel for tumor/normal pair samples).
After these preprocessing steps, several variant callers scan the resulting BAM files:
GATK HaplotypeCaller and Strelka are used to find germline SNVs and small indels (also used on tumor samples).
MuTect1, MuTect2, Freebayes and Strelka are used to find somatic SNVs and small indels.
For structural variants (germline and somatic) we use Manta.
Furthermore, we apply ASCAT to estimate sample heterogeneity, ploidy and CNVs.
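
The caller assignments described above can be summarised as a small lookup table. This is only an illustrative sketch in Python that restates the prose; the names CALLERS and callers_for and the category labels are assumptions made for this example and are not part of Sarek's code or configuration. ASCAT is left out because the text describes what it estimates rather than a germline or somatic category.

```python
# Illustrative only: restates the caller list from the description.
# CALLERS, callers_for and the labels are hypothetical names, not Sarek's.
CALLERS = {
    "HaplotypeCaller": {("germline", "snv_indel")},
    "Strelka": {("germline", "snv_indel"), ("somatic", "snv_indel")},
    "MuTect1": {("somatic", "snv_indel")},
    "MuTect2": {("somatic", "snv_indel")},
    "Freebayes": {("somatic", "snv_indel")},
    "Manta": {("germline", "structural"), ("somatic", "structural")},
}


def callers_for(origin, kind):
    """Return the callers listed for a given origin and variant kind, sorted by name."""
    return sorted(name for name, roles in CALLERS.items() if (origin, kind) in roles)


if __name__ == "__main__":
    print(callers_for("somatic", "snv_indel"))
    print(callers_for("germline", "structural"))
```

Keeping the table as data rather than branching logic makes it straightforward to add another caller by adding one entry.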

The pipeline is prepared to process normal samples or tumor/normal pairs (optionally with several relapse samples).
It can begin the analysis from raw FASTQ files, from the realignment step, or directly with any subset of variant callers using recalibrated BAM files.
At the end of the analysis, the resulting VCF files and the results from each caller are retained.
snpEff and/or VEP can then be used to annotate them.
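
The three starting points described above can be pictured as entry positions in an ordered list of stages. The following Python sketch is only an illustration under that assumption: the stage names, the ENTRY_POINTS keys and the stages_to_run function are hypothetical and do not correspond to Sarek's actual command-line options.

```python
# Illustrative only: stage and entry names are hypothetical, not Sarek options.
STAGES = ["align", "realign", "recalibrate", "call_variants"]

# Which stage each kind of starting input enters the workflow at.
ENTRY_POINTS = {
    "fastq": "align",                     # raw FASTQ files: run everything
    "realign": "realign",                 # start from the realignment step
    "recalibrated_bam": "call_variants",  # recalibrated BAMs: go straight to calling
}


def stages_to_run(entry, annotate=False):
    """Return the ordered stages that remain for a given starting point."""
    if entry not in ENTRY_POINTS:
        raise ValueError(f"unknown entry point: {entry!r}")
    first = STAGES.index(ENTRY_POINTS[entry])
    plan = STAGES[first:]
    if annotate:
        # Annotation (snpEff and/or VEP in the description) is optional and runs last.
        plan = plan + ["annotate"]
    return plan


if __name__ == "__main__":
    print(stages_to_run("fastq"))
    print(stages_to_run("recalibrated_bam", annotate=True))
```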

The workflow can accommodate additional variant calling software or CNV callers.

Besides variant calls, the workflow provides quality controls presented by MultiQC.

The containers directory contains the build rules for the containers used by all Sarek processes.

This pipeline is listed on Elixir - Tools and Data Services Registry.


The Sarek pipeline comes with documentation about the pipeline, found in the doc/ directory:

  1. Installation documentation
  2. Installation documentation specific for rackham
  3. Installation documentation specific for bianca
  4. Tests documentation
  5. Reference files documentation
  6. Configuration and profiles documentation
  7. Intervals documentation
  8. Running the pipeline
  9. Examples
  10. TSV file documentation
  11. Processes documentation
  12. Documentation about containers
  13. Documentation about building
  14. More information about ASCAT
  15. Folder structure

Contributions & Support

