Segment the genome using epigenetic data.
README

Footprints! Genome segmentation by histone modifications and other epigenetic data sources.

What is this repository for?

How do I get set up?

To run the Docker image, put your bed files into a single directory together with a manifest.txt (see the example below), create an empty output directory in a separate location (not inside your bed directory!), then run:

docker pull dennishazelett/footprint:latest

docker run -e ENV_CELLTYPE=ALL -v <full/path/to/beds>:/home/rstudio/data/bed -v <full/local/output/path>:/home/rstudio/data/RSEGMENTATIONS dennishazelett/footprint:latest
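
The setup rules above (full paths for both mounts, output directory kept outside the bed directory) can be checked before launching the container. The following is a minimal sketch, not part of footprint itself: build_docker_command is a hypothetical helper name, and the only ENV_CELLTYPE value documented here is ALL, so any other value is an assumption.

```python
from pathlib import Path

IMAGE = "dennishazelett/footprint:latest"


def build_docker_command(bed_dir, out_dir, celltype="ALL"):
    """Return the `docker run` argument list for the given host directories.

    Hypothetical helper: it only assembles the command shown in this README.
    """
    # Docker volume mounts need absolute host paths.
    bed = Path(bed_dir).resolve()
    out = Path(out_dir).resolve()
    # The output directory must not be the bed directory or live inside it.
    if out == bed or bed in out.parents:
        raise ValueError(f"output directory {out} is inside bed directory {bed}")
    return [
        "docker", "run",
        "-e", f"ENV_CELLTYPE={celltype}",
        "-v", f"{bed}:/home/rstudio/data/bed",
        "-v", f"{out}:/home/rstudio/data/RSEGMENTATIONS",
        IMAGE,
    ]
```

The returned list can be passed to subprocess.run(..., check=True) once both directories exist.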

Alternatively, source the script inside R or run it in batch mode. For either option you must edit the setwd() call at the top of the script so that R knows where to find your bed files. You must also create an output directory called RSEGMENTATIONS alongside the bed directory, in the same parent folder.

  • Configuration

A tab-delimited manifest.txt file should be placed in the bed directory. It looks like this (columns are separated by tabs in the actual file):

SAMPLE MARK SRC BUILD FILE
K562 CRHMM ENC hg19 ENCFF0120HJ.bed.gz
K562 K4M1 USC hg19 ENCFF001VCQ.bed.gz
K562 K4M3 USC hg19 ENCFF001VCR.bed.gz
K562 K4M3 ENC hg19 ENCFF001XGT.bed.gz
K562 K4M3 ENC hg19 ENCFF001XGU.bed.gz
K562 K27AC ENC hg19 ENCFF001SZE.bed.gz
K562 DHS ENC hg19 ENCFF001WNN.bed.gz
K562 K9AC USC hg19 ENCFF001VCS.bed.gz
K562 K4M2 ENC hg19 ENCFF001SZI.bed.gz
K562 ATF1 ENC hg19 ENCFF002CVM.bed.gz
K562 TAL1 ENC hg19 ENCFF002CYH.bed.gz
K562 CEBPB ENC hg19 ENCFF002CVV.bed.gz
GM12878 DHS ENC hg19 ENCFF001WFT.bed.gz
GM12878 DHS ENC hg19 ENCFF001WFU.bed.gz
GM12878 K4M3 ENC hg19 ENCFF001WYG.bed.gz
GM12878 K4M3 ENC hg19 ENCFF001WYI.bed.gz
GM12878 K4M3 ENC hg19 ENCFF001WYJ.bed.gz
GM12878 K4M3 ENC hg19 ENCFF001WYK.bed.gz
GM12878 K4M1 ENC hg19 ENCFF001SUE.bed.gz
GM12878 K27AC ENC hg19 ENCFF001SUG.bed.gz
GM12878 K4M3 ENC hg19 ENCFF001SUF.bed.gz
GM12878 K9AC ENC hg19 ENCFF001SUO.bed.gz
ALL TSS DJH hg19 promoters.bed

Some important notes about manifest.txt:

If the SAMPLE column of your manifest includes any cell type that lacks H3K4me3 data, YOU MUST INCLUDE A BED FILE DEFINING TSS/PROMOTER REGIONS or the script will fail (see the last entry in the table above for an example). For my own purposes I define promoters as -1 kb to +100 bp relative to the RefSeq TSS. Relax: footprint will not annotate every promoter region defined in the promoter bed file; it annotates only those promoters that have evidence of activity from other epigenetic marks (e.g. H3K27ac or H3K4me1).
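
The rule above can be checked before a run. This is a rough sketch under two assumptions that this README does not state explicitly: that K4M3 in the MARK column denotes H3K4me3, and that a row with MARK set to TSS is what supplies the promoter bed file. The function name samples_needing_tss is hypothetical, and the example manifest is trimmed to two rows from the table above for illustration.

```python
import csv
import io


def samples_needing_tss(manifest_text):
    """Return samples that lack a K4M3 row when the manifest has no TSS row."""
    rows = list(csv.DictReader(io.StringIO(manifest_text), delimiter="\t"))
    # Assumption: a MARK of "TSS" identifies the promoter bed file.
    if any(row["MARK"] == "TSS" for row in rows):
        return []
    marks_by_sample = {}
    for row in rows:
        marks_by_sample.setdefault(row["SAMPLE"], set()).add(row["MARK"])
    # Assumption: "K4M3" is the manifest code for H3K4me3.
    return sorted(
        sample
        for sample, marks in marks_by_sample.items()
        if sample != "ALL" and "K4M3" not in marks
    )


# Trimmed manifest: GM12878 has only DHS here, so it needs a TSS row.
MANIFEST = "\n".join([
    "SAMPLE\tMARK\tSRC\tBUILD\tFILE",
    "K562\tK4M3\tUSC\thg19\tENCFF001VCR.bed.gz",
    "GM12878\tDHS\tENC\thg19\tENCFF001WFT.bed.gz",
])

if __name__ == "__main__":
    print(samples_needing_tss(MANIFEST))
```

An empty result means every sample either has a K4M3 row or is covered by a TSS row.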

Segmentation Codes and Colors

The default segmentation colors and browser track label key are described in this table with their chromHMM analogs. footprint will call states regardless of how little data you have; it will find an appropriate annotation based on available evidence. Hence, more generic states have no chromHMM analog. The colors were picked to match chromHMM tracks from ENCODE in the UCSC genome browser.

FUNCTION                  LABEL  RGB COLOR    CHROMHMM(1) ANALOG
ACTIVE REGION             ACTR   194,214,154  NA
ACTIVE ENHANCER           EAR    255,200,0    Enh, EnhG
ACTIVE ENHANCER CORE      EARC   255,69,0     Enh, EnhG
POISED ENHANCER           EPR    204,153,255  NA
POISED ENHANCER CORE      EPRC   153,51,255   NA
ACTIVE PROMOTER           PAR    255,153,153  TssA
ACTIVE PROMOTER CORE      PARC   255,0,0      TssA
POISED PROMOTER           PPR    204,153,255  NA
POISED PROMOTER CORE      PPRC   153,51,255   NA
ENHANCER                  ER     255,231,144  NA
ENHANCER CORE             ERC    255,145,104  NA
PROMOTER                  PR     255,198,198  NA
PROMOTER CORE             PRC    255,153,153  TSS
PUTATIVE REGULATORY SITE  RPS    255,255,253  Enh
SILENCED CHROM REGION     SCR    128,128,128  ReprPC
HETEROCHROMATIN           HET    138,145,208  Het
TRANSCRIBED REGION        TRS    0,128,0      Tx

(1) Fifteen-state model used for the analysis of REMC (Roadmap Epigenomics Mapping Consortium) data.
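
For downstream scripts it can help to have the label-to-color key from the table above in machine-readable form. The sketch below copies the labels and RGB values from that table and formats one segment as a BED9 line whose ninth field (itemRgb) carries the color, which is how the UCSC browser colors individual features. Whether footprint writes its own output in exactly this layout is an assumption, and bed9_line is a hypothetical helper name.

```python
# Label -> RGB color, copied from the segmentation table above.
SEGMENT_COLORS = {
    "ACTR": "194,214,154",
    "EAR": "255,200,0",
    "EARC": "255,69,0",
    "EPR": "204,153,255",
    "EPRC": "153,51,255",
    "PAR": "255,153,153",
    "PARC": "255,0,0",
    "PPR": "204,153,255",
    "PPRC": "153,51,255",
    "ER": "255,231,144",
    "ERC": "255,145,104",
    "PR": "255,198,198",
    "PRC": "255,153,153",
    "RPS": "255,255,253",
    "SCR": "128,128,128",
    "HET": "138,145,208",
    "TRS": "0,128,0",
}


def bed9_line(chrom, start, end, label):
    """Format one segment as a tab-separated BED9 line colored by its label."""
    rgb = SEGMENT_COLORS[label]  # raises KeyError for an unknown label
    fields = [
        chrom, str(start), str(end),  # chrom, chromStart, chromEnd
        label,                        # name
        "0", ".",                     # score, strand (unstranded)
        str(start), str(end),         # thickStart, thickEnd
        rgb,                          # itemRgb
    ]
    return "\t".join(fields)


if __name__ == "__main__":
    # Coordinates here are arbitrary illustration values.
    print(bed9_line("chr1", 1000, 2000, "PARC"))
```

The browser honors the itemRgb field only when the track line sets itemRgb="On".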

Who do I talk to?

  • Dennis Hazelett (dennis.hazelett-at-csmc.edu)