Public | Automated Build

Last pushed: 7 months ago
Short Description
CarrierSeq Analysis
Full Description



bioRxiv doi:

CarrierSeq is a sequence analysis workflow for low-input nanopore sequencing which utlizes a genomic carrier.

Contributors: Angel Mojarro (@amojarro) and Christopher E. Carr (@CarrCE).


Long-read nanopore sequencing technology is of particular significance for taxonomic identification at or below the species level. For many environmental samples, the total extractable DNA is far below the current input requirements of nanopore sequencing, preventing “sample to sequence” metagenomics from low-biomass or recalcitrant samples.


Here we address this problem by employing carrier sequencing, a method to sequence low-input DNA by preparing the target DNA with a genomic carrier to achieve ideal library preparation and sequencing stoichiometry without amplification. We then use CarrierSeq, a sequence analysis workflow to identify the low-input target reads from the genomic carrier


CarrierSeq implements bwa-mem (Li, 2013) to first map all reads to the genomic carrier then extracts unmapped reads by using samtools (Li et al., 2009) and seqtk (Li, 2012). Thereafter, the user can define a quality score threshold and CarrierSeq proceeds to discard low-complexity reads with fqtrim (Pertea, 2015). This set of unmapped and filtered reads are labeled “reads of interest” and should theoretically comprise target reads and likely contamination. However, reads of interest may also include “high-quality noise reads” (HQNRs), defined as reads that satisfy quality score and complexity filters yet do not match to any database and disproportionately originate from specific channels. By treating reads as a Poisson arrival process, CarrierSeq models the expected reads of interest channel distribution and rejects data from channels exceeding a reads/channels threshold (xcrit). Reads of interest are then sorted in 08_target_reads (reads/channel ≤ xcrit) or 07_hqnrs (reads/channel > xcrit).


The CarrierSeq scripts requires the following packages to be installed on your local machine.

Biopython -</br>
SciPy -</br>
bwa -</br>
seqtk -</br>
samtools -</br>
fqtrim -</br>

Alternatively, use Docker and the Docker script.

Using Docker and Dockerhub

  1. Download & install Docker -
  2. Start docker
  3. run docker pull mojarro/carrierseq:latest

That's it!

Using CarrierSeq

Note: You may need to first make the script executable with:

chmod +x path/to/

Reads to be analyzed must be compiled into a single fastq file and the carrier reference genome must be in fasta format.

Run CarrierSeq with:

./ -i <input.fastq> -r <reference.fasta> -q <q_score> -p <p_value> -o <output_directory> -t <bwa_threads>

or with Docker...

./ -i <input.fastq> -r <reference.fasta> -q <q_score> -p <p_value> -o <output_directory> -t <bwa_threads>

-i -r and -o are mandatory flags, CarrierSeq will use the default values if -q -p or -t are not defined:

bwa_threads = 1 
q_score = 9
p_value = 0.0001 or 0.05/512 active channels

CarrierSeq Output

CarrierSeq will generate the following folders and files within your working directory:

# All reads mapped to the carrier reference genome.

# Unmapped reads to the carrier.


# Reads equal to or greater than a given q-score threshold (default = 9).

# Discarded reads below the given q-score threshold

# Reads with less than 50% of its length detected as low complexity

# Discarded reads with over than 50% of its length detected as low complexity               

# Reads of Interest - should theoretically consist of target reads and contamination,
# but may also include "high-quality noise reads" HQNRs which originate from specific channels.

# By treating reads as a Poisson arrival process, CarrierSeq models the expected reads-of-interest 
# channel distribution and rejects data from channels exceeding a reads/channels threshold (xcrit)

# Likely HQNRs (reads/channel > xcrit) 

# Likely Target Reads (reads/channel ≤ xcrit)
Docker Pull Command
Source Repository