Short Description
ConsensusDriver is a system for consensus-based cancer driver prediction
Full Description

ConsensusDriver: a docker-based system for the consensus prediction of cancer drivers


Introduction

ConsensusDriver provides users with a framework to easily run a wide range of cancer driver prediction methods on omics datasets and to integrate their results into consensus predictions that have higher sensitivity and precision (Bertrand et al., 2017). It uses Docker to significantly reduce the effort of installing and using different software packages, and it enables analysis on a personal computer for those who are not adept at using servers and Linux systems. ConsensusDriver combines diverse driver prediction paradigms, including popular methods such as MutsigCV, OncodriveFM, DriverNET, OncoIMPACT, fathmm and CHASM.


Installing ConsensusDriver on Windows and Mac systems using Kitematic

  • Download the software 'Kitematic'
  • Search for the container 'consensusdriver' within 'Kitematic', and click 'Create' to create the container
  • Start 'DOCKER CLI' (click on the bottom left corner button)

Downloading and installing additional modules for ConsensusDriver

ConsensusDriver requires additional modules that are not included in the docker image because of space constraints and software licensing issues. The modules can be downloaded using the following command:

docker exec -it consensusdriver /install_module.pl --database

Description of options:
--database: Downloads the ConsensusDriver database (required).
--annovar <URL to annovar.tar.gz>: Installs the software 'annovar', which is used to annotate the single nucleotide variant calls provided to ConsensusDriver. As annovar cannot be redistributed, you will have to register and obtain your own licence using the annovar registration form. Once the form has been submitted, you will receive an email with a URL to the latest version of annovar.
--fathmm: Installs the fathmm database. This database is not installed by default because it requires downloading 4 GB of data and installing an SQL database, which takes 3 to 4 hours and needs 30 GB of free disk space.
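
For scripted setups, the installation calls can be assembled from Python. The following is a minimal sketch and not part of ConsensusDriver: the helper names `install_commands` and `run_all` are hypothetical, and the container is assumed to be named `consensusdriver` as in the command above. It issues one `install_module.pl` call per module, because the text does not say whether the options can be combined in a single call.

```python
import subprocess

CONTAINER = "consensusdriver"  # container name used in the commands of this document


def install_commands(annovar_url=None, fathmm=False):
    """Return one 'docker exec' argument list per module to install.

    Hypothetical helper: only the flags --database, --annovar and --fathmm
    come from the ConsensusDriver documentation.
    """
    # "-it" mirrors the documented command and needs an interactive terminal
    base = ["docker", "exec", "-it", CONTAINER, "/install_module.pl"]
    commands = [base + ["--database"]]  # the database is required
    if annovar_url:
        # URL received by email after registering for annovar
        commands.append(base + ["--annovar", annovar_url])
    if fathmm:
        # large download; see the time and disk space estimates above
        commands.append(base + ["--fathmm"])
    return commands


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(install_commands())` would install only the required database; passing `annovar_url=...` or `fathmm=True` adds the optional modules.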


Performing a test run with ConsensusDriver

  • Install the module required for the test run
    docker exec -it consensusdriver /install_module.pl --test
    
  • Copy the test input file into the docker container
    docker cp <path_to_input_file> consensusdriver:/SAMPLE
    
  • Run ConsensusDriver on the test input file
    docker exec -it consensusdriver /ConsensusDriver.pl --file test.annot --annotation --cancer GBM
    
  • Copy the result file test.annot.csv into a directory outside the docker container
    docker cp consensusdriver:/test.annot.csv <output_directory>
    
  • The result file test.annot.csv contains the following information:
    • Hugo_Symbol: Gene name
    • Rank: Rank of the predicted driver gene
    • TCGA_SNV_Frequency: mutation frequency observed for this gene in the specified cancer type in the corresponding TCGA cohort
    • Cancer_Gene_Census: "CGC" if the gene is present in the manually curated list of cancer driver genes of the cancer gene census, "-" otherwise
    • Chromosome: chromosome name
    • Start_position: starting position of the mutation
    • Reference_Allele: the reference allele at Start_position
    • Alternative_Allele: the allele observed at Start_position in the patient
    • Variant_Classification: the type of variant (missense, nonsense, indel)
    • TranscriptID: the knownGene transcript ID
    • CDS_Coordinate: the coordinate of the mutation in the transcript
    • AAChange: the amino acid change caused by the mutation
    • methodName_Rank: rank of the predicted gene according to the method methodName
    • methodName_Score/methodName_Pvalue: score/p-value of the predicted gene according to the method methodName
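
As an illustration of how the result file might be consumed, here is a minimal Python sketch that lists the best-ranked genes and flags those present in the Cancer Gene Census. It assumes the file is comma-delimited with a header row carrying the column names listed above, and that `Rank` holds integers; the delimiter is an assumption based on the `.csv` extension, and `top_genes` is a hypothetical helper.

```python
import csv


def top_genes(path, n=10, delimiter=","):
    """Return (rank, gene, in_cgc) tuples for the n best-ranked distinct genes.

    Assumes a header row with the columns Hugo_Symbol, Rank and
    Cancer_Gene_Census; the delimiter is an assumption.
    """
    best = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            gene = row["Hugo_Symbol"]
            rank = int(row["Rank"])
            in_cgc = row["Cancer_Gene_Census"] == "CGC"  # "-" otherwise
            # a gene may appear on several rows (one per mutation); keep its best rank
            if gene not in best or rank < best[gene][0]:
                best[gene] = (rank, gene, in_cgc)
    return sorted(best.values())[:n]
```

For example, `top_genes("test.annot.csv", n=5)` would return the five best-ranked genes of the test run, provided the file layout matches the assumptions above.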

Running ConsensusDriver on user data

CHASM and data privacy

Because a local installation of CHASM requires a large amount of disk space, ConsensusDriver runs CHASM through the CRAVAT web server. If you use CHASM, your data will therefore be uploaded to the CRAVAT web server. To keep your data on your own computer, disable CHASM with the argument --no_chasm.

Cancer types supported for patient-specific prediction

For patient-specific predictions, ConsensusDriver uses precomputed databases to optimize analysis for the following 15 cancer types. The cancer type is specified with the --cancer option, using the following TCGA acronyms:

  • BLCA: Bladder Urothelial Carcinoma
  • BRCA: Breast Invasive Carcinoma
  • COAD: Colon Adenocarcinoma
  • GBM: Glioblastoma Multiforme
  • KIRC: Kidney Renal Clear Cell Carcinoma
  • KIRP: Kidney Renal Papillary Cell Carcinoma
  • LIHC: Liver Hepatocellular Carcinoma
  • LUAD: Lung Adenocarcinoma
  • LUSC: Lung Squamous Cell Carcinoma
  • OV: Ovarian Serous Cystadenocarcinoma
  • PAAD: Pancreatic Adenocarcinoma
  • PRAD: Prostate Adenocarcinoma
  • READ: Rectum Adenocarcinoma
  • STAD: Stomach Adenocarcinoma
  • THCA: Thyroid Carcinoma
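
When ConsensusDriver is called from a script, it can help to check the acronym before starting a run. The sketch below encodes the list above as a Python dictionary; `check_cancer_type` is a hypothetical helper, and normalizing the input to upper case is a convenience of this sketch, not documented behavior of ConsensusDriver.

```python
# TCGA acronyms supported for patient-specific prediction (from the list above)
SUPPORTED_CANCER_TYPES = {
    "BLCA": "Bladder Urothelial Carcinoma",
    "BRCA": "Breast Invasive Carcinoma",
    "COAD": "Colon Adenocarcinoma",
    "GBM": "Glioblastoma Multiforme",
    "KIRC": "Kidney Renal Clear Cell Carcinoma",
    "KIRP": "Kidney Renal Papillary Cell Carcinoma",
    "LIHC": "Liver Hepatocellular Carcinoma",
    "LUAD": "Lung Adenocarcinoma",
    "LUSC": "Lung Squamous Cell Carcinoma",
    "OV": "Ovarian Serous Cystadenocarcinoma",
    "PAAD": "Pancreatic Adenocarcinoma",
    "PRAD": "Prostate Adenocarcinoma",
    "READ": "Rectum Adenocarcinoma",
    "STAD": "Stomach Adenocarcinoma",
    "THCA": "Thyroid Carcinoma",
}


def check_cancer_type(acronym):
    """Return the upper-cased acronym, or raise ValueError if it is not supported."""
    code = acronym.strip().upper()
    if code not in SUPPORTED_CANCER_TYPES:
        supported = ", ".join(sorted(SUPPORTED_CANCER_TYPES))
        raise ValueError(f"unsupported cancer type {acronym!r}; expected one of: {supported}")
    return code
```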

Mutation file formats

In this early version, ConsensusDriver only supports SNV coordinates based on hg19 (additional human genome assemblies will be added in the next version).
ConsensusDriver accepts several formats for the input mutation data file I:

  • vcf format: a file containing raw mutation calls obtained from exome or whole genome data. Analyzing such input requires the software annovar (see the section on installing additional modules).
    docker exec -it consensusdriver /ConsensusDriver.pl --file I --vcf --cancer GBM
    
  • annovar output format: tab-delimited annovar output obtained using the knownGene database.
    docker exec -it consensusdriver /ConsensusDriver.pl --file I --annovar --cancer GBM
    
  • coordinates format: a human-readable file format containing the minimum information required to run the annotation software annovar:
    • Chromosome: chromosome name
    • Start_position: starting position of the mutation
    • End_position: end position of the mutation
    • Reference_Allele: the reference allele at Start_position
    • Alternative_Allele: the allele observed at Start_position in the patient
      docker exec -it consensusdriver /ConsensusDriver.pl --file I --coordinate --cancer GBM
      
  • custom annotation format: if you have already performed a preliminary annotation with your own pipeline and want to use it as input to ConsensusDriver, you may use the following format:
    • Chromosome: chromosome name
    • Start_position: starting position of the mutation
    • Hugo_Symbol: the gene name of the mutated gene
    • Reference_Allele: the reference allele at Start_position
    • Alternative_Allele: the allele observed at Start_position in the patient
    • TranscriptID: the CCDS transcript ID
    • CDS_Coordinate: the coordinate of the mutation in the transcript
    • AAChange: the amino acid change caused by the mutation
    • Variant_Classification: the type of variant (missense, nonsense, neutral, indel)
      docker exec -it consensusdriver /ConsensusDriver.pl --file I --annot --cancer GBM
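
The four invocations above differ only in the format flag, so a script that processes several files can build them from one function. This is a minimal Python sketch under stated assumptions: `consensusdriver_command` and the keys of `FORMAT_FLAGS` are hypothetical names, the flags themselves are copied from the commands above, and appending `--no_chasm` after the other arguments is assumed to be accepted by ConsensusDriver.pl.

```python
# input format name (hypothetical keys) -> ConsensusDriver.pl flag from the commands above
FORMAT_FLAGS = {
    "vcf": "--vcf",
    "annovar": "--annovar",
    "coordinates": "--coordinate",
    "annotation": "--annot",
}


def consensusdriver_command(input_file, input_format, cancer,
                            use_chasm=True, container="consensusdriver"):
    """Return the 'docker exec' argument list for one ConsensusDriver run."""
    if input_format not in FORMAT_FLAGS:
        known = ", ".join(sorted(FORMAT_FLAGS))
        raise ValueError(f"unknown input format {input_format!r}; expected one of: {known}")
    cmd = ["docker", "exec", "-it", container, "/ConsensusDriver.pl",
           "--file", input_file, FORMAT_FLAGS[input_format], "--cancer", cancer]
    if not use_chasm:
        # keeps the data local by skipping the CRAVAT web server
        cmd.append("--no_chasm")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from an interactive terminal; for instance, `consensusdriver_command("I", "vcf", "GBM", use_chasm=False)` builds the vcf run with CHASM disabled.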
      

Contact

ConsensusDriver is a work in progress and comments and feedback are very welcome. Please write to Denis Bertrand (bertrandd@gis.a-star.edu.sg) and Niranjan Nagarajan (nagarajann@gis.a-star.edu.sg).

Owner
csb5gis
