Platypus: A Haplotype-Based Variant Caller For Next Generation Sequence Data
Reference
Andy Rimmer, Hang Phan, Iain Mathieson, Zamin Iqbal, Stephen R. F. Twigg, WGS500 Consortium, Andrew O. M. Wilkie, Gil McVean, Gerton Lunter. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nature Genetics (2014). doi:10.1038/ng.3036
Platypus is a tool designed for efficient and accurate variant detection in high-throughput sequencing data. By using local realignment of reads and local assembly, it achieves both high sensitivity and high specificity. Platypus can detect SNPs, MNPs, short indels, replacements and (using the assembly option) deletions up to several kb. It has been extensively tested on whole-genome, exon-capture, and targeted capture data; it has been run on very large datasets as part of the Thousand Genomes and WGS500 projects, and it is being used in clinical sequencing trials in the Mainstreaming Cancer Genetics programme. Platypus has been thoroughly tested on data mapped with Stampy and BWA. It has not been tested with other mappers, but it should behave well. Platypus has been used to detect variants in human, mouse, rat and chimpanzee samples, amongst others, and it should perform well on data from any diploid organism. It has also been used to find somatic mutations in cancer, and mosaic mutations in human exome data.
Capabilities
Platypus reads data from BAM files and outputs a single VCF file containing a list of identified variants, along with genotype calls and likelihoods for all samples. It can identify SNPs, MNPs and short indels (less than one read length) and, using local assembly, larger variants (deletions of up to several kb and insertions of perhaps 200bp). Platypus can process large amounts of BAM data very efficiently, and can handle samples spread across multiple BAM files. Duplicate read marking, local realignment, and variant identification and filtering are performed on the fly using a single command. Platypus will run on any input data in BAM format, but has only been properly tested on Illumina data.
Dependencies
Platypus is written in Python, Cython and C. It requires only Python (>=2.6) and a C compiler to build; these are standard on most Linux and Mac OS distributions, and Platypus should build and run without problems for most people.
Building Platypus
To build Platypus, simply unpack the tarball and run the buildPlatypus.sh script provided:
tar -xvzf Platypus_x.x.x.tgz
cd Platypus_x.x.x
./buildPlatypus.sh
This will take a minute or so, and generate quite a lot of warnings. If the build is successful, you will see a message, 'Finished building Platypus'. Platypus is then ready for variant-calling.
Running Platypus
Platypus can be run from the command line, using Python. It needs one or more BAM input files and a FASTA reference file. The BAM file(s) must be indexed using Samtools or an equivalent program, and the FASTA file must also be indexed using 'samtools faidx' or equivalent.
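Because missing index files are an easy mistake to make, a small pre-flight check can be useful before starting a run. The sketch below is not part of Platypus. The function name missing_indexes is hypothetical, and the check assumes the usual Samtools naming conventions: a BAM index stored as input.bam.bai or input.bai, and a FASTA index stored as ref.fa.fai.

```python
import os


def missing_indexes(bam_files, ref_file):
    """Return the input files for which no index file was found."""
    missing = []
    for bam in bam_files:
        # Assumption: the BAM index is named input.bam.bai or input.bai.
        candidates = [bam + ".bai", os.path.splitext(bam)[0] + ".bai"]
        if not any(os.path.exists(path) for path in candidates):
            missing.append(bam)
    # Assumption: the FASTA index sits next to the reference as ref.fa.fai.
    if not os.path.exists(ref_file + ".fai"):
        missing.append(ref_file)
    return missing
```

An empty list means every input has an index in one of the expected locations; any file that is returned needs indexing before Platypus is run.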
The simplest way to run Platypus is as follows:
python Platypus.py callVariants --bamFiles=input.bam --refFile=ref.fa --output=VariantCalls.vcf
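When Platypus is launched from a script rather than typed by hand, the same command can be assembled as an argument list. This is only a sketch: build_command is a hypothetical helper, and joining several BAM files with commas in a single --bamFiles value is an assumption that should be checked against the Platypus help output for your version.

```python
def build_command(bam_files, ref_file, output_vcf):
    """Assemble the callVariants command line as an argument list."""
    return [
        "python", "Platypus.py", "callVariants",
        # Assumption: several BAM files are passed as one comma-separated value.
        "--bamFiles=" + ",".join(bam_files),
        "--refFile=" + ref_file,
        "--output=" + output_vcf,
    ]


if __name__ == "__main__":
    # Prints the same command as the simplest invocation shown above.
    print(" ".join(build_command(["input.bam"], "ref.fa", "VariantCalls.vcf")))
```

The returned list can be handed to subprocess.call from the Python standard library, which avoids shell quoting problems with unusual file names.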
The output will be a single VCF file containing all the variants that Platypus identified, and a 'log.txt' file, containing log information. The last line in the log file, and on the command-line output, should be 'Finished variant calling'. This means that the calling has completed without major errors. It is a good idea to also check the log output for warnings or errors.
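The log check described above can be automated. The following is a minimal sketch, not part of Platypus: check_log is a hypothetical helper, and it assumes that problem lines in the log contain the word "warning" or "error" in some letter case, which may not match the actual log format exactly.

```python
def check_log(log_path, success_message="Finished variant calling"):
    """Return (finished, problems) for a Platypus log file."""
    with open(log_path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    # Calling completed if the last non-empty line carries the success message.
    finished = bool(lines) and success_message in lines[-1]
    # Assumption: problem lines mention "warning" or "error" in some letter case.
    problems = [line for line in lines
                if "warning" in line.lower() or "error" in line.lower()]
    return finished, problems
```

Calling check_log("log.txt") after a run gives a quick pass/fail flag plus the lines worth reading by hand. The success message is matched as a substring so that any prefix on the log line, such as a timestamp, does not cause a false failure.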
Contact: Bug reports, comments, and feature requests (positive feedback also greatly appreciated) can be sent to firstname.lastname@example.org