Last pushed: 2 years ago
Short Description
elPrep: a high-performance tool for preparing sequence alignment/map files in sequencing pipelines.
Full Description

Description

Docker image for the elPrep tool.

Examples

Run elPrep in memory

To run elPrep in memory, call docker run elprep file_in file_out ..., where elprep stands for the image name (caherzee/elprep in the example below).

By default, elPrep writes error logs to the ~/logs/ folder. When elPrep runs in the Docker container, these logs are not accessible on the host unless you explicitly map a host path to the log folder with the -v option of the docker command. For example, "-v `pwd`/logs:/root/logs" tells docker to write the error logs to the logs folder inside the current working directory on the host, that is, the directory where the docker command is executed.

Example:

Assume there is a file /home/caherzee/data/data.bam to be processed by elPrep.

docker run --rm -v `pwd`/logs:/root/logs -v /home/caherzee/data:/data/ caherzee/elprep data/data.bam data/data.out.bam --filter-unmapped-reads --replace-reference-sequences data/ucsc.hg19.dict --replace-read-group "ID:group1 LB:lib1 PL:illumina PU:unit1 SM:sample1" --mark-duplicates --sorting-order coordinate --nr-of-threads 72

The "-v `pwd`/logs:/root/logs" option makes sure that elPrep writes error logs to the logs folder in the current working directory on the host. The path on the host can be freely chosen, but the path of the error logs in the container must always be /root/logs.
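As a minimal sketch of the log mount described above, the following shell snippet prepares the host log folder and prints the matching -v option. The helper name log_mount_option and the variable HOST_LOGS are hypothetical; only the container path /root/logs comes from this description. Creating the folder up front is a choice made for this sketch, not something elPrep is known to require.

```shell
#!/bin/sh
# Hypothetical helper: print the volume mapping that connects a host
# folder to the container's fixed log folder /root/logs.
log_mount_option() {
    host_logs="$1"
    # Create the host folder first (a choice made in this sketch).
    mkdir -p "$host_logs" || return 1
    printf '%s:/root/logs\n' "$host_logs"
}

# HOST_LOGS is an illustrative name; any host path can be chosen.
HOST_LOGS="$(pwd)/logs"
printf '%s %s\n' -v "$(log_mount_option "$HOST_LOGS")"
```

The printed option can be placed in the docker run command in the same position as in the example above.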

The "-v /home/caherzee/data:/data/" option makes the folder /home/caherzee/data on the host accessible in the container as the folder /data/. Always be explicit about paths in the docker command: list the full path when referring to files.
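To illustrate how a host path maps to a container path under such a mount, here is a small sketch. The helper host_to_container is hypothetical and not part of elPrep or Docker; it only performs the string translation implied by the -v option.

```shell
#!/bin/sh
# Hypothetical helper: translate a host path below a mounted folder
# into the path that the container sees.
host_to_container() {
    host_dir="${1%/}"        # mounted host folder, trailing slash removed
    container_dir="${2%/}"   # mount point in the container, trailing slash removed
    host_path="$3"           # full host path of a file below host_dir
    case "$host_path" in
        "$host_dir"/*)
            # Replace the host folder prefix with the container mount point.
            printf '%s%s\n' "$container_dir" "${host_path#"$host_dir"}"
            ;;
        *)
            echo "error: $host_path is not below $host_dir" >&2
            return 1
            ;;
    esac
}

host_to_container /home/caherzee/data /data/ /home/caherzee/data/data.bam
# prints /data/data.bam
```

The commands in this description use the relative form data/data.bam. That resolves to the same file only if the container's working directory is /, which is an assumption here; the Dockerfile in the repository is the place to check.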

The remaining parameters passed to the docker call are regular elPrep parameters, as documented on the elPrep homepage [https://github.com/ExaScience/elprep].

Run elPrep with the split/merge tools

To run elPrep with the split/merge tools, add sfm before the file arguments: docker run elprep sfm file_in file_out ...

Example:

docker run --rm -v `pwd`/logs:/root/logs -v /home/caherzee/data:/data/ caherzee/elprep sfm data/data.bam data/data.out.bam --filter-unmapped-reads --replace-reference-sequences data/ucsc.hg19.dict --replace-read-group "ID:group1 LB:lib1 PL:illumina PU:unit1 SM:sample1" --mark-duplicates --sorting-order coordinate --nr-of-threads 72 --intermediate-files-output-type bam

Run elPrep with GNU parallel

To run elPrep with GNU parallel, add sfm-gnupar before the file arguments: docker run elprep sfm-gnupar file_in file_out ...

Example:

docker run --rm -v `pwd`/logs:/root/logs -v /home/caherzee/data:/data/ caherzee/elprep sfm-gnupar data/data.bam data/data.out.bam --filter-unmapped-reads --replace-reference-sequences data/ucsc.hg19.dict --replace-read-group "ID:group1 LB:lib1 PL:illumina PU:unit1 SM:sample1" --mark-duplicates --sorting-order coordinate --nr-of-threads 18 --nr-of-jobs 4 --intermediate-files-output-type bam

Further information

Learn more about the elPrep project by visiting our GitHub repository [https://github.com/ExaScience/elprep].

Our GitHub repository includes the Dockerfile, so you can build the elPrep image yourself.

Owner
caherzee