ncbi/sra-human-scrubber

By ncbi

Updated about 1 year ago

An SRA tool to identify and mask (or remove) unintended human reads from NGS fastq files.

ncbi::sra-human-scrubber

Description

The human read removal tool (HRRT) is based on the SRA Taxonomy Analysis Tool. It takes a fastq file as input and produces a fastq.clean file as output, in which all reads identified as potentially of human origin are masked with 'N'. Source files and the CHANGELOG are available in the sra-human-scrubber GitHub repository.

Usage

For the container to find your local file, pass arguments to the docker run command that mount your local working directory for both reading and writing.
Specifically:
-v $PWD:$PWD:rw mounts the current working directory (the standard Unix variable $PWD) inside the container at the same path as outside, with read/write permission. This is needed because the tool both reads the input fastq file and writes the output fastq.clean file (as well as an intermediate file that is removed).
-w $PWD sets the working directory inside the container to the current working directory.
-it runs the container interactively and allocates a tty for that interactive session.
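
Because only the current working directory is mounted, an input file outside it is not visible to the container. The following is a minimal sketch of a pre-flight check for that; the helper name is_under_pwd is hypothetical and not part of the image, and the check looks only at the file's directory, not at whether the file exists.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the image): succeed only if the
# directory holding the given file is the current directory or below it,
# since $PWD is the only path mounted into the container.
# Assumes the current directory is not the filesystem root.
is_under_pwd() {
    # Resolve the file's directory to an absolute physical path.
    file_dir=$(cd "$(dirname "$1")" 2>/dev/null && pwd -P) || return 1
    here=$(pwd -P)
    case "$file_dir/" in
        "$here"/*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_under_pwd MyFastqFile.fastq; then
    echo "MyFastqFile.fastq is reachable through the -v \$PWD:\$PWD:rw mount"
else
    echo "move or copy the file under $PWD before running the container" >&2
fi
```

If the check fails, either change into the directory that holds the file or mount that directory instead.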

Invoke the test

Here the command is simply given the file argument test:
docker run -it -v $PWD:$PWD:rw -w $PWD ncbi/sra-human-scrubber:latest /opt/scrubber/scripts/scrub.sh test

2022-09-06 21:18:26	aligns_to version 0.707
2022-09-06 21:18:26	hardware threads: 8, omp threads: 8
2022-09-06 21:18:26	loading time (sec) 0
2022-09-06 21:18:26	/tmp/tmp.AtEXSJWJDw/temp.fasta
2022-09-06 21:18:26	FastaReader
2022-09-06 21:18:26	100% processed
2022-09-06 21:18:26	total spot count: 2
2022-09-06 21:18:26	total read count: 2
2022-09-06 21:18:26	total time (sec) 0
1  spot(s) masked or removed.

test succeeded

Mask human reads from fastq file

Here the command is given the path to your local fastq file as its argument:
docker run -it -v $PWD:$PWD:rw -w $PWD ncbi/sra-human-scrubber:latest /opt/scrubber/scripts/scrub.sh path-to-fastq-file/filename.fastq

Example:
docker run -it -v $PWD:$PWD:rw -w $PWD ncbi/sra-human-scrubber:latest /opt/scrubber/scripts/scrub.sh MyFastqFile.fastq

2022-09-06 21:35:04	aligns_to version 0.707
2022-09-06 21:35:04	hardware threads: 8, omp threads: 8
2022-09-06 21:35:04	loading time (sec) 0
2022-09-06 21:35:04	/tmp/tmp.Ccqruccyoq/temp.fasta
2022-09-06 21:35:04	FastaReader
2022-09-06 21:35:04	0% processed
2022-09-06 21:35:06	100% processed
2022-09-06 21:35:06	total spot count: 216859
2022-09-06 21:35:06	total read count: 216859
2022-09-06 21:35:06	total time (sec) 2
129  spot(s) masked or removed.
$ ls -l
-rw-r--r-- 1 78656910 Sep  6 21:34 MyFastqFile.fastq
-rw-r--r-- 1 78656910 Sep  6 21:35 MyFastqFile.fastq.clean
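
The identical file sizes in the listing above are consistent with masking, which replaces bases with 'N' without changing read length. As a rough way to inspect the result, the sketch below counts reads whose sequence line consists only of 'N'. It assumes a plain four-line-per-record fastq file; the function name count_masked is hypothetical. The count can be compared against the "spot(s) masked or removed" line, keeping in mind that reads which were already all 'N' in the input are counted too.

```shell
#!/bin/sh
# Hypothetical helper: count reads in a 4-line-per-record fastq file whose
# sequence line (the 2nd line of each record) is made up entirely of 'N'.
# Usage: count_masked MyFastqFile.fastq.clean
count_masked() {
    awk 'NR % 4 == 2 && /^N+$/ { n++ } END { print n + 0 }' "$1"
}
```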

Note that by default the application scales to use all available threads (see option -p below for setting the number of threads).
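
On a shared machine it may be preferable to leave some threads free. The sketch below picks half of the online hardware threads (at least 1) and prints a command line that passes that value with -p. The helper name half_threads and the "half" policy are illustrative assumptions, not a recommendation from the tool; the command is only printed, not run.

```shell
#!/bin/sh
# Hypothetical helper: print half of the given thread count (minimum 1).
# With no argument, use the number of online processors reported by getconf.
half_threads() {
    total=${1:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}
    n=$(( total / 2 ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Print (do not run) the scrubber command with an explicit thread count.
echo "docker run -it -v $PWD:$PWD:rw -w $PWD ncbi/sra-human-scrubber:latest /opt/scrubber/scripts/scrub.sh -p $(half_threads) MyFastqFile.fastq"
```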

Other useful options

docker run -it -v $PWD:$PWD:rw -w $PWD ncbi/sra-human-scrubber:latest /opt/scrubber/scripts/scrub.sh -h

Usage: scrub.sh [OPTIONS] [file.fastq] 
OPTIONS:
	-i <input_file>; Input Fastq File.
	-o <output_file>; Save cleaned sequence reads to file, or set to - for stdout.
		NOTE: When stdin is used, output is stdout by default.
	-p <number> Number of threads to use.
	-d <database_path>; Specify a database other than default to use.
	-x ; Remove spots instead of default 'N' replacement.
		NOTE: Now by default sequence length of identified spots replaced with 'N'.
	-r ; Save identified spots to <input_file>.spots_removed.
	-u <user_named_file>; Save identified spots to <user_named_file>.
		NOTE: Required with -r if output is stdout, otherwise optional.
	-t ; Run test.
	-s ; Input is (collated) interleaved paired-end(read) file AND you wish both reads masked or removed.
	-h ; Display this message.
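
The options above can be combined on one command line. The sketch below only prints candidate commands so they can be reviewed before running; scrub_dry_run is a hypothetical helper, the file names are placeholders, and the flag combinations are inferred from the help text rather than verified against the tool.

```shell
#!/bin/sh
# Hypothetical helper: print the docker command that would run scrub.sh
# with the given arguments, without executing it.
scrub_dry_run() {
    echo "docker run -it -v $PWD:$PWD:rw -w $PWD ncbi/sra-human-scrubber:latest /opt/scrubber/scripts/scrub.sh $*"
}

# Remove identified spots instead of masking them (-x), and save the
# identified spots to MyFastqFile.fastq.spots_removed (-r).
scrub_dry_run -x -r -i MyFastqFile.fastq -o MyFastqFile.scrubbed.fastq

# Treat the input as an interleaved paired-end file so that both reads of
# an identified pair are masked (-s).
scrub_dry_run -s -i Interleaved.fastq -o Interleaved.clean.fastq
```

To run a printed command, copy it into the shell from the directory that contains the input file.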

Docker Pull Command

docker pull ncbi/sra-human-scrubber