Support-Vector Structural-Variant Genotyper

https://github.com/dantaki/SV2

Support Vector Structural Variation Genotyper

A genotyper for the rest of us

Preprint

bioRxiv : doi

SV2 (support-vector structural-variant genotyper) is a machine learning algorithm for genotyping deletions and duplications from paired-end whole genome sequencing data. SV2 can rapidly integrate variant calls from multiple SV discovery algorithms into a unified callset with high genotyping accuracy and detection of de novo mutations.

User Guide

click here

Tutorial

Getting Started

1: Download Source Files

wget http://downloads.sourceforge.net/project/sv2/sv2-1.2.zip # sv2-1.2.tar.gz also available
unzip sv2-1.2.zip

Source Files

2: Configure Environment

Run configure.pl # define install location and paths to FASTA assemblies

cd sv2-1.2/
perl configure.pl # follow the instructions

Alternatively, manually configure (without Perl5)

3: Compile and Install from Source

python setup.py install # ignore numpy warnings

Options

sv2 --help

Flag            Description
-i / -in        Tab-delimited input [ID, BAM path, VCF path, Gender]
-r / -cnv       SV to genotype. BED or VCF
-c / -cpu       Parallelize sample-wise. 1 per CPU
-g / -genome    Reference genome build [hg19, hg38]. Default: hg19
-pcrfree        GC content normalization for PCR-free libraries
-s / -seed      Random seed for genome shuffling in preprocessing. Default: 42
-o / -out       Output name
-pre            Preprocessing output directory. Skips preprocessing
-feats          Feature output directory. Skips feature extraction
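
The flags above can be assembled programmatically when SV2 is driven from a pipeline. The following is a minimal sketch in standalone Python 3, not part of SV2 itself; the helper name build_sv2_command and the file names samples.txt and variants.bed are assumptions made for illustration. It only uses flags listed in the table.

```python
def build_sv2_command(sample_table, variants, genome="hg19", cpus=1,
                      out_name=None, pcr_free=False, seed=None):
    """Return an argv list for sv2 built from the flags in the options table.

    This helper is hypothetical; it only arranges documented flags.
    """
    if genome not in ("hg19", "hg38"):
        raise ValueError("genome must be 'hg19' or 'hg38'")
    argv = ["sv2", "-i", sample_table, "-r", variants,
            "-g", genome, "-c", str(cpus)]
    if pcr_free:
        argv.append("-pcrfree")       # GC normalization for PCR-free libraries
    if seed is not None:
        argv += ["-s", str(seed)]     # seed for genome shuffling
    if out_name:
        argv += ["-o", out_name]      # output name
    return argv


# Example file names are placeholders, not files shipped with SV2.
cmd = build_sv2_command("samples.txt", "variants.bed",
                        genome="hg38", cpus=2, out_name="trio")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.check_call(cmd) once SV2 is installed and on the PATH.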

Input

Sample information < -i >

Tab-delimited file containing sample information. Gender can also be encoded as 1 for M and 2 for F

ID        BAM PATH            VCF PATH                    Gender [M/F]
NA12878   /bam/NA12878.bam    /vcf/NA12878_SNVs.vcf.gz    F
HG00096   /bam/HG00096.bam    /vcf/HG00096_SNVs.vcf.gz    M
  • BAM format
    • Supplementary alignment tags (SA) are required for split-read analysis
  • VCF format
    • Allele Depth (AD) is required
    • bgzip and tabix indexed VCF

Refer to the User Guide for more details.
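
A quick check of the sample information file before launching a run can catch malformed rows early. The sketch below is standalone Python 3 and is not part of SV2; the function name parse_sample_table is hypothetical, and it assumes the file has no header line and exactly four tab-delimited columns per row. It does not check that the BAM and VCF files exist or are indexed.

```python
import csv
import io

# Gender may be given as M/F or as 1 (M) / 2 (F), as described above.
GENDER_CODES = {"M": "M", "F": "F", "1": "M", "2": "F"}


def parse_sample_table(handle):
    """Parse rows of ID, BAM path, VCF path, gender from a tab-delimited file."""
    samples = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not field.strip() for field in row):
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError("line %d: expected 4 columns, found %d"
                             % (line_no, len(row)))
        sample_id, bam, vcf, gender = (field.strip() for field in row)
        if gender not in GENDER_CODES:
            raise ValueError("line %d: gender must be M, F, 1 or 2" % line_no)
        samples.append({"id": sample_id, "bam": bam, "vcf": vcf,
                        "gender": GENDER_CODES[gender]})
    return samples


# The two example rows from the table above, with the second gender coded as 1.
text = ("NA12878\t/bam/NA12878.bam\t/vcf/NA12878_SNVs.vcf.gz\tF\n"
        "HG00096\t/bam/HG00096.bam\t/vcf/HG00096_SNVs.vcf.gz\t1\n")
samples = parse_sample_table(io.StringIO(text))
print(samples[1]["id"], samples[1]["gender"])
```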

Variants to genotype < -r >

  • BED format
    • Tab-delimited: first four columns
      • Chromosome
      • Start
      • End
      • Type: DEL | DUP
  • VCF format
    • SVTYPE= DEL | DUP
    • Must have END=

Refer to the User Guide for more details.
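
The BED requirements above can be checked with a few lines of code. This is a minimal sketch in standalone Python 3, not SV2's own parser; the function name parse_bed_line and the example coordinates are made up for illustration, and the check that End is greater than Start is an assumption rather than a documented SV2 rule.

```python
def parse_bed_line(line):
    """Return (chrom, start, end, sv_type) from the first four BED columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least 4 tab-delimited columns")
    chrom, start, end, sv_type = fields[:4]
    start, end = int(start), int(end)   # raises ValueError if not integers
    if sv_type not in ("DEL", "DUP"):
        raise ValueError("type must be DEL or DUP, found %r" % sv_type)
    if end <= start:
        # Assumption: a usable interval has End greater than Start.
        raise ValueError("End must be greater than Start")
    return chrom, start, end, sv_type


# Hypothetical interval, for illustration only.
record = parse_bed_line("chr1\t1000\t5000\tDEL\n")
print(record)
```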

Output

Output is generated in the current working directory.

sv2_preprocessing/ contains preprocessing output. sv2_features/ contains feature extraction output.

sv2_genotypes/ contains output in tab-delimited BED format and VCF format.

The output VCF includes gene annotations and other useful statistics.

For more detail on SV2 output, please refer to the User Guide.

Performance

Performance of de novo mutations

Please refer to the preprint for performance details.

Usage

  • SV2 is designed for human whole genome short-read sequencing libraries. Given deletion and duplication positions, SV2 returns a VCF with predicted copy number genotypes.
  • Whole genome alignments from the 1000 Genomes Project were used for training. Validated genotypes were obtained from the phase 3 integrated structural variation call set (DOI:10.1038/nature15394; PMID: 26432246).
  • Features for genotyping include coverage, discordant paired-ends, split-reads, and heterozygous allele depth ratio.
  • SV2 operates with a bi-allelic model with a copy number range of 0-4.
  • Output is in VCF format.
    • Median Phred-adjusted ALT likelihoods are reported in the QUAL column
    • SV2 standard filters are reported in the FILTER column
    • SV2 stringent filters for de novo discovery are located in the INFO column as DENOVO_FILTER=
    • Positions are annotated based on their overlap to genes, RepeatMasker, segmental duplications, 1000 Genomes phase 3 CNV, and more
  • SVs with estimated autosome copy number >10 cannot be genotyped.
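
To make the QUAL column easier to interpret, the sketch below shows the conventional Phred transform applied to a probability and a median taken across samples. It is an illustration only: this document does not spell out SV2's exact formula, so the transform Q = -10 * log10(1 - p), the function name phred, and the per-sample probabilities are all assumptions.

```python
import math
import statistics


def phred(p_alt):
    """Conventional Phred scaling of the chance that an ALT call is wrong.

    Assumes p_alt is the probability of a non-reference genotype; this is
    the standard transform, not necessarily SV2's exact definition.
    """
    if not 0.0 <= p_alt <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    p_error = 1.0 - p_alt
    if p_error == 0.0:
        return float("inf")
    return -10.0 * math.log10(p_error)


# Hypothetical ALT probabilities for three samples carrying the same SV.
per_sample = [0.9, 0.99, 0.999]
median_qual = statistics.median(phred(p) for p in per_sample)
print(round(median_qual, 2))
```

Under this convention a score of 10 corresponds to a 10% chance of error, 20 to 1%, and 30 to 0.1%.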

Requirements

SV2 requires Python 2.7.

SV2 has been tested on Linux and macOS with bioconda.

Source Files

Source Forge

GitHub

Please do not use git clone on this repository

Credits

Author:

  • Danny Antaki
    • dantaki@ucsd.edu

Acknowledgements:

Citing SV2

For citing SV2 please refer to the preprint: bioRxiv : doi

History

SV2 version 1.1 used in Brandler, Antaki, Gujral, et al. bioRxiv 2017: DOI

gtCNV version 0.1 used in Brandler, Antaki, Gujral, et al. AJHG 2016: DOI PMID: 27018473

License

MIT License

Copyright (c) 2017 Danny Antaki

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Contact

dantaki@ucsd.edu
