Full documentation on Read the Docs.
A set of tools for viral metagenomics.
virmet is called with a command subcommand syntax:
virmet fetch --viral n, for example, downloads the viral
database. Other available subcommands so far are
update      update viral/bacterial database
wolfpack    analyze a MiSeq run
covplot     plot coverage for a specific organism
A short help is obtained with
virmet subcommand -h.
The simplest example
[user@host ~]$ virmet wolfpack --run path_to_run_directory
... some time later ...
[user@host ~]$ cat virmet_output_name_of_the_run/sample_name/orgs_list.csv
organism                          reads
Torque teno virus 3               140
Torque teno virus                 101
BeAn 58058 virus                  14
Caulobacter phage Ccr29           5
Human immunodeficiency virus 1    3
Moraxella phage Mcat16            2
Torque teno virus 15              1
...
setuptools or Docker
VirMet contains programs to download and index the genome sequences,
Running a virus scan
The scan (virmet wolfpack) can be run on a single file or on a directory. It will
try to guess from the naming scheme whether the directory is a MiSeq output
directory (i.e. one with a Data/Intensities/BaseCalls/ structure) and analyze all
fastq files in there. The extension must be .fastq.gz. It then runs a filtering
step based on quality, length and entropy (in short: reads with a lot of
repeats are discarded), followed by a decontamination step where reads
of human/bacterial/bovine/fungal origin are discarded. Finally, the remaining
reads are blasted against the viral database. The list of organisms with the
count of reads is written to a file named orgs_list.csv in the output directory
(virmet_output_...). For example, if we have a directory named
exp_01 with files
exp_01/AR-1_S1_L001_R1_001.fastq.gz
exp_01/AR-2_S2_L001_R1_001.fastq.gz
exp_01/AR-3_S3_L001_R1_001.fastq.gz
exp_01/AR-4_S4_L001_R1_001.fastq.gz
we could run
[user@host test_virmet]$ virmet wolfpack --dir exp_01
and, after some time, find the results in
virmet_output_exp01. Many files are
present, the most important ones being orgs_list.tsv and stats.tsv. The
first lists the viral organisms found, with a count of the reads that could be
matched to each of them.
[user@host test_virmet]$ cat virmet_output_test_dir_150123/3-1-65_S5/orgs_list.tsv
organism                          reads
Human adenovirus 7                126
Human poliovirus 1 strain Sabin   45
Human poliovirus 1 Mahoney        29
Human adenovirus 3+11p            19
Human adenovirus 16               1
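For downstream work it can help to load such a table programmatically. The following Python sketch is not part of VirMet. It assumes the file has one header line followed by one organism per line, with the read count as the last whitespace-separated field (organism names themselves contain spaces). The function name parse_orgs_list is hypothetical, and the sample text reproduces the listing above.

```python
import io

# Sample content copied from the listing above; the tab separator is an
# assumption based on the .tsv extension.
SAMPLE = (
    "organism\treads\n"
    "Human adenovirus 7\t126\n"
    "Human poliovirus 1 strain Sabin\t45\n"
    "Human poliovirus 1 Mahoney\t29\n"
    "Human adenovirus 3+11p\t19\n"
    "Human adenovirus 16\t1\n"
)


def parse_orgs_list(handle):
    """Return a list of (organism, reads) tuples from an orgs_list-style table."""
    rows = []
    next(handle, None)  # skip the "organism reads" header line
    for line in handle:
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace only: names may contain spaces.
        name, count = line.rsplit(None, 1)
        rows.append((name, int(count)))
    return rows


rows = parse_orgs_list(io.StringIO(SAMPLE))
total = sum(count for _, count in rows)

# Print each organism with its share of the matched viral reads.
for name, count in rows:
    print(f"{name}\t{count}\t{count / total:.1%}")
```

To read a real result file, the same function can be given an open file handle, for example `open(path)` with the path to an orgs_list file, instead of the in-memory sample.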
The second file is a summary of all reads analyzed for this sample: it reports how
many passed each step of the pipeline and how many matched each database.
[user@host test_virmet]$ cat virmet_output_exp01/AR-1_S1/stats.tsv
raw_reads               6250
trimmed_too_short       462
low_entropy             1905
low_quality             0
passing_filter          3883
matching_humanGRCh38    3463
matching_bact1          0
matching_bact2          0
matching_bact3          0
matching_fungi1         0
matching_bt_ref         0
reads_to_blast          420
viral_reads             257
undetermined_reads      163
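The counts in this example add up step by step: 6250 raw reads minus the 462 + 1905 + 0 filtered reads leaves 3883 passing the filter; removing the 3463 reads matching the human genome leaves 420 reads to blast, which split into 257 viral and 163 undetermined reads. The Python sketch below checks this bookkeeping. It is not part of VirMet, the function name parse_stats is hypothetical, and the relationships between the rows are inferred from the example numbers rather than from a documented guarantee.

```python
# Sample content copied from the stats.tsv listing above.
SAMPLE = """\
raw_reads\t6250
trimmed_too_short\t462
low_entropy\t1905
low_quality\t0
passing_filter\t3883
matching_humanGRCh38\t3463
matching_bact1\t0
matching_bact2\t0
matching_bact3\t0
matching_fungi1\t0
matching_bt_ref\t0
reads_to_blast\t420
viral_reads\t257
undetermined_reads\t163
"""


def parse_stats(text):
    """Return a dict mapping each category name to its read count."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        stats[fields[0]] = int(fields[1])
    return stats


stats = parse_stats(SAMPLE)

# Reads removed by the quality/length/entropy filter.
filtered_out = (
    stats["trimmed_too_short"] + stats["low_entropy"] + stats["low_quality"]
)
# Reads removed by decontamination (all matching_* rows).
decontaminated = sum(v for k, v in stats.items() if k.startswith("matching_"))

# Assumed bookkeeping, inferred from the example numbers.
assert stats["raw_reads"] - filtered_out == stats["passing_filter"]
assert stats["passing_filter"] - decontaminated == stats["reads_to_blast"]
assert stats["viral_reads"] + stats["undetermined_reads"] == stats["reads_to_blast"]

viral_fraction = stats["viral_reads"] / stats["raw_reads"]
print(f"viral fraction of raw reads: {viral_fraction:.2%}")
```

A check like this can flag a truncated or partially written stats file before its numbers are used in a report.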
Updating the database
More and more sequences are uploaded to the NCBI database every month. The figure
shows the number of viral sequences with "complete genome" in the title
that are submitted to NCBI every month (code).
VirMet provides a simple way to update the viral database.