pegi3s/bedtools

By i3S

Updated 3 days ago

Bedtools (http://bedtools.readthedocs.io/en/latest/index.html) docker image.

This image belongs to a larger project called Bioinformatics Docker Images Project (http://pegi3s.github.io/dockerfiles)

(Please note that the original software licenses still apply)

This image provides the Bedtools suite, a fast and flexible toolset for genome arithmetic. Bedtools can intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF.

Running the command docker run --rm -v /your/data/dir:/data pegi3s/bedtools bedtools -h lists the tools included in this suite, namely:

  • annotateBed: annotate coverage of features from multiple files.
  • bamToBed: convert BAM alignments to BED (& other) formats.
  • bamToFastq: convert BAM records to FASTQ records.
  • bed12ToBed6: breaks BED12 intervals into discrete BED6 intervals.
  • bedToBam: convert intervals to BAM records.
  • bedToIgv: create an IGV snapshot batch script.
  • bedpeToBam: convert BEDPE intervals to BAM records.
  • bedtools: print help menu.
  • closestBed: find the closest, potentially non-overlapping interval.
  • clusterBed: cluster (but don’t merge) overlapping/nearby intervals.
  • complementBed: extract intervals not represented by an interval file.
  • coverageBed: compute the coverage over defined intervals.
  • expandCols: replicate lines based on lists of values in columns.
  • fastaFromBed: use intervals to extract sequences from a FASTA file.
  • flankBed: create new intervals from the flanks of existing intervals.
  • genomeCoverageBed: compute the coverage over an entire genome.
  • getOverlap: computes the amount of overlap from two intervals.
  • groupBy: group by common cols. & summarize oth. cols. (~ SQL “groupBy”)
  • intersectBed: find overlapping intervals in various ways.
  • linksBed: create a HTML page of links to UCSC locations.
  • mapBed: apply a function to a column for each overlapping interval.
  • maskFastaFromBed: use intervals to mask sequences from a FASTA file.
  • mergeBed: combine overlapping/nearby intervals into a single interval.
  • multiBamCov: counts coverage from multiple BAMs at specific intervals.
  • multiIntersectBed: identifies common intervals among multiple interval files.
  • nucBed: profile the nucleotide content of intervals in a FASTA file.
  • pairToBed: find pairs that overlap intervals in various ways.
  • pairToPair: find pairs that overlap other pairs in various ways.
  • randomBed: generate random intervals in a genome.
  • shiftBed: adjust the position of intervals.
  • shuffleBed: randomly redistribute intervals in a genome.
  • slopBed: adjust the size of intervals.
  • sortBed: order the intervals in a file.
  • subtractBed: remove intervals based on overlaps b/w two files.
  • tagBam: tag BAM alignments based on overlaps with interval files.
  • unionBedGraphs: combines coverage intervals from multiple BEDGRAPH files.
  • windowBed: find overlapping intervals within a window around an interval.
  • windowMaker: make interval “windows” across a genome.

Note: The tool names shown in the help list may not match the names of the corresponding executables. To check, start the container and list the executables:

docker run -it pegi3s/bedtools
ls /opt/bedtools2/bin/

If the names differ, use the names of the executables.

To see the help for a particular application, run: docker run --rm -v /your/data/dir:/data pegi3s/bedtools <bedtools-application-name> (e.g. docker run --rm -v /your/data/dir:/data pegi3s/bedtools fastaFromBed)

Using the Bedtools image in Linux

To run an application, you should adapt and run the following command: docker run --rm -v /your/data/dir:/data pegi3s/bedtools <bedtools-application-name> -fi <input FASTA> -bed <BED/GFF/VCF> -fo /data/stdout

In this command, replace:

  • /your/data/dir with the directory that contains the input files you want to analyze. This directory is mounted as /data inside the container, so file paths passed to the application should start with /data/.
  • <bedtools-application-name> with the name of the Bedtools application you want to use.
  • <input FASTA> with the name of your input FASTA file.
  • <BED/GFF/VCF> with the name of your input BED/GFF/VCF file.
  • stdout with the name of your output file.

For instance, to use the fastaFromBed application, you should run: docker run --rm -v /your/data/dir:/data pegi3s/bedtools fastaFromBed -fi /data/input_fasta -bed /data/input_gff -fo /data/stdout
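The substitutions described above can be sketched as a small shell helper that only prints the resulting command, so it can be reviewed before it is run. This is a minimal sketch, not part of the image: the function name build_fastafrombed_cmd and the file names genome.fa, exons.gff and exons.fa are hypothetical, and the helper assumes that the directory and file names contain no spaces.

```shell
# Hypothetical helper: print the docker command that runs fastaFromBed.
# Arguments: host data directory, FASTA file name, BED/GFF/VCF file name,
# output file name. File names are relative to the data directory, which
# is mounted at /data inside the container.
build_fastafrombed_cmd() {
    data_dir=$1
    fasta=$2
    intervals=$3
    output=$4
    printf 'docker run --rm -v %s:/data pegi3s/bedtools fastaFromBed -fi /data/%s -bed /data/%s -fo /data/%s\n' \
        "$data_dir" "$fasta" "$intervals" "$output"
}

# Print the command for review; nothing is executed by this line.
build_fastafrombed_cmd /your/data/dir genome.fa exons.gff exons.fa
```

Once the printed command looks right, it can be copied into the terminal and run as is.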

Note: To work with a simpler BED/GFF/VCF file, you may want to filter it first, for example by keeping only the exon lines:

grep -P "\texon\t" input_gff
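The grep command above needs a grep build that supports the -P option. As an alternative sketch, the same filter can be written with awk by testing the feature column (column 3 of a GFF file) directly. The file names sample.gff and exons.gff and the records in them are made up for illustration.

```shell
# Build a tiny, made-up GFF file (tab-separated) to illustrate the filter.
printf 'chr1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=gene1\n' > sample.gff
printf 'chr1\tsrc\texon\t1\t200\t.\t+\t.\tParent=gene1\n' >> sample.gff
printf 'chr1\tsrc\texon\t500\t1000\t.\t+\t.\tParent=gene1\n' >> sample.gff

# Keep only the lines whose feature column (column 3) is exactly "exon"
# and write them to a new file that can be passed to -bed.
awk -F '\t' '$3 == "exon"' sample.gff > exons.gff
```

Matching on the column rather than on the whole line avoids keeping records that merely contain the word exon in another field.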

Using the Bedtools image in Windows

Please note that the data must be on the same drive as the Docker Toolbox installation (usually C:) and in a folder with write permissions (e.g. C:/Users/User_name/).

As in the Linux case, to run an application, you should adapt and run the following command: docker run --rm -v "/c/Users/User_name/dir/":/data pegi3s/bedtools <bedtools-application-name> -fi <input FASTA> -bed <BED/GFF/VCF> -fo /data/stdout
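The -v option above uses the form /c/Users/User_name/dir/ rather than a Windows path with a drive letter and backslashes. As a rough sketch, assuming the command is typed in a POSIX-like shell on Windows, the conversion can be expressed as follows. The function name to_docker_path is hypothetical, and it only handles paths that start with a drive letter followed by a colon.

```shell
# Hypothetical helper: convert a Windows path such as C:\Users\User_name\dir
# into the /c/Users/User_name/dir form used with the -v option.
to_docker_path() {
    # Drive letter, lowercased (first character of the path).
    drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
    # Everything after the colon, with backslashes turned into slashes.
    rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
    printf '/%s%s\n' "$drive" "$rest"
}

to_docker_path 'C:\Users\User_name\dir'
```

The single quotes around the Windows path keep the shell from interpreting the backslashes.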

Docker Pull Command

docker pull pegi3s/bedtools