ncbi/sra-human-scrubber
An SRA tool to identify and mask (or remove) unintended human reads from NGS fastq files.
The human read removal tool (HRRT) is based on the SRA Taxonomy Analysis Tool. It takes a fastq file as input and produces a fastq.clean file as output, in which all reads identified as potentially of human origin are masked with 'N'. Source files and the CHANGELOG are available in the sra-human-scrubber GitHub repository.
For the container to find your local file, the docker run command is given arguments that mount your local working directory for both reading and writing.
Specifically:
-v $PWD:$PWD:rw
mounts the current working directory (standard Unix variable $PWD) inside the container at the same path as outside the container, with read/write permission. This is needed because the tool both reads the input fastq file and writes the output fastq.clean file (as well as an intermediate file that is removed).
-w $PWD
sets the current working directory as the working directory inside the container.
-it
runs the container interactively and allocates a tty interface for that interactive session.
Invoke the test
Here the command is simply given the file argument test
docker run -it -v $PWD:$PWD:rw -w $PWD ncbi/sra-human-scrubber:latest /opt/scrubber/scripts/scrub.sh test
2022-09-06 21:18:26 aligns_to version 0.707
2022-09-06 21:18:26 hardware threads: 8, omp threads: 8
2022-09-06 21:18:26 loading time (sec) 0
2022-09-06 21:18:26 /tmp/tmp.AtEXSJWJDw/temp.fasta
2022-09-06 21:18:26 FastaReader
2022-09-06 21:18:26 100% processed
2022-09-06 21:18:26 total spot count: 2
2022-09-06 21:18:26 total read count: 2
2022-09-06 21:18:26 total time (sec) 0
1 spot(s) masked or removed.
test succeeded
Mask human reads from fastq file
Here the command is given the path to your local fastq file as argument
docker run -it -v $PWD:$PWD:rw -w $PWD ncbi/sra-human-scrubber:latest /opt/scrubber/scripts/scrub.sh path-to-fastq-file/filename.fastq
Example:
docker run -it -v $PWD:$PWD:rw -w $PWD ncbi/sra-human-scrubber:latest /opt/scrubber/scripts/scrub.sh MyFastqFile.fastq
2022-09-06 21:35:04 aligns_to version 0.707
2022-09-06 21:35:04 hardware threads: 8, omp threads: 8
2022-09-06 21:35:04 loading time (sec) 0
2022-09-06 21:35:04 /tmp/tmp.Ccqruccyoq/temp.fasta
2022-09-06 21:35:04 FastaReader
2022-09-06 21:35:04 0% processed
2022-09-06 21:35:06 100% processed
2022-09-06 21:35:06 total spot count: 216859
2022-09-06 21:35:06 total read count: 216859
2022-09-06 21:35:06 total time (sec) 2
129 spot(s) masked or removed.
$ ls -l
-rw-r--r-- 1 78656910 Sep 6 21:34 MyFastqFile.fastq
-rw-r--r-- 1 78656910 Sep 6 21:35 MyFastqFile.fastq.clean
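Because masking replaces the whole sequence of an identified read with 'N', one rough way to inspect a .clean file is to count the reads whose sequence line consists only of 'N'. The sketch below is not part of the tool: count_masked_reads is a hypothetical helper name, and it assumes an unwrapped fastq with exactly four lines per record. Reads that were already all 'N' before scrubbing are counted as well, so the number may not match the "spot(s) masked or removed" line exactly.

```shell
# Hypothetical helper (not shipped with the scrubber): count records whose
# sequence line (2nd line of each 4-line fastq record) is made up only of 'N'.
count_masked_reads() {
  awk 'NR % 4 == 2 && /^N+$/ { n++ } END { print n + 0 }' "$1"
}

# Tiny made-up two-record file, used only to demonstrate the helper.
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nNNNN\n+\nIIII\n' > "$demo"
count_masked_reads "$demo"
rm -f "$demo"
```

On a real run the helper would be pointed at the output file, for example count_masked_reads MyFastqFile.fastq.clean.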
Note that by default the application scales to use all available threads (see option -p below for setting the number of threads).
Other useful options
docker run -it -v $PWD:$PWD:rw -w $PWD ncbi/sra-human-scrubber:latest /opt/scrubber/scripts/scrub.sh -h
Usage: scrub.sh [OPTIONS] [file.fastq]
OPTIONS:
-i <input_file>; Input Fastq File.
-o <output_file>; Save cleaned sequence reads to file, or set to - for stdout.
NOTE: When stdin is used, output is stdout by default.
-p <number> Number of threads to use.
-d <database_path>; Specify a database other than default to use.
-x ; Remove spots instead of default 'N' replacement.
NOTE: Now by default sequence length of identified spots replaced with 'N'.
-r ; Save identified spots to <input_file>.spots_removed.
-u <user_named_file>; Save identified spots to <user_named_file>.
NOTE: Required with -r if output is stdout, otherwise optional.
-t ; Run test.
-s ; Input is (collated) interleaved paired-end(read) file AND you wish both reads masked or removed.
-h ; Display this message.
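Since every invocation repeats the same docker run prefix, the options above can be easier to try through a small wrapper. The following is a minimal sketch, not something provided by the image: the function name scrub and the DRY_RUN variable are assumptions made for illustration, while the docker arguments and the scrub.sh options are the ones shown in this document. With DRY_RUN=1 the assembled command is only printed (unquoted, for display), so it can be checked before anything runs.

```shell
# Hypothetical wrapper (name and DRY_RUN switch are illustrative assumptions).
# It prepends the docker run prefix used throughout this document to any
# scrub.sh options passed in.
scrub() {
  set -- docker run -it -v "$PWD:$PWD:rw" -w "$PWD" \
    ncbi/sra-human-scrubber:latest /opt/scrubber/scripts/scrub.sh "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command for inspection instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Limit the run to 4 threads (-p).
DRY_RUN=1 scrub -p 4 MyFastqFile.fastq

# Remove identified spots instead of masking them (-x) and save them
# to <input_file>.spots_removed (-r).
DRY_RUN=1 scrub -x -r MyFastqFile.fastq
```

Dropping DRY_RUN=1 runs the same command for real, which requires docker and the input file in the current working directory.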
docker pull ncbi/sra-human-scrubber