Wellington - Footprint and Bootstrap
The Purpose of the Algorithm
Wellington is an algorithm for detecting protein footprints in DNase-seq data. The method exploits the reduced activity of DNase I in regions occupied by chromatin or proteins, which produces a characteristic, strand-specific cut bias pattern around a protein-bound region. Wellington applies a statistical procedure to identify such sites. The algorithm comes in two modes: the footprint mode identifies footprints in a single sample, while the bootstrap mode identifies differential footprints between two samples, i.e. it returns two lists of footprints, each containing the footprints that show up in one sample but not the other.
Algorithm Availability outside iPlant
The Wellington apps are built around the Python pyDNase package described in the original Wellington article. The package can be installed with pip and used for local analyses.
The programs take BAM files of DNase-seq data as input, along with a BED file of hypersensitive regions to test for footprints. Such BED files can be obtained with other software, such as MACS.
The output includes a number of visualisations, discussed in depth in a subsequent section. The main output consists of BED files of footprints, placed in the main results directory.
If you want to try Wellington and get a feel for its inputs and outputs, test data is available at
iplantcollaborative/example_data/cyverseuk/wellington-bootstrap_testdata under Community Data. The parameters can be left at default values.
Input in Detail
Peak Region BED
The regions in the data to test for footprint presence. These are identified from the data by running it through other software prior to the analysis; an example of a program capable of this is MACS. In the case of the bootstrap app, two BED files can be provided, for example one per sample, each identified by such a program.
DNAse-Seq BAM Data
The DNase-seq experiment data to analyse with Wellington. A .bai index file is generated automatically by the script. In the case of the bootstrap app, two BAM files have to be provided; the first file is referred to as treatment1 and the second as treatment2 in the output.
The size of the regions flanking the potential footprint on either side to scan for read bias, provided as from;to;by. The notation is that of a Python range, so the actual "to" value is not included - for example, the default setting leads to the only evaluated shoulder size being 35. Only present in the footprint app; in the bootstrap app the shoulder size is set to 35 internally.
The size of the potential footprint regions to evaluate, provided as from;to;by. The notation is that of a Python range, so the actual "to" value is not included.
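The from;to;by notation used for both size parameters can be illustrated with a short Python sketch. The helper name parse_size_range and the example values are hypothetical, chosen only to match the behaviour described above; this is not the apps' actual parsing code.

```python
def parse_size_range(spec):
    """Expand a 'from;to;by' string into the list of sizes it denotes."""
    start, stop, step = (int(part) for part in spec.split(";"))
    # As with Python's range(), the "to" value itself is excluded.
    return list(range(start, stop, step))


# Hypothetical inputs, used for illustration only.
shoulder_sizes = parse_size_range("35;36;1")
footprint_sizes = parse_size_range("11;26;2")

print(shoulder_sizes)   # [35]
print(footprint_sizes)  # [11, 13, 15, 17, 19, 21, 23, 25]
```

To include the "to" value itself in the scan, it has to be raised by at least one step.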
Once footprint identification is complete, an FDR procedure is applied so that the final reported footprints are corrected for multiple testing. This is the maximum FDR a footprint may have and still be included in the final FDR-corrected output.
The number of iterations of the FDR procedure.
The raw log10 p-value identified by Wellington has to be less than this value for the footprint to be included in the FDR procedure.
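The effect of the raw log10 p-value cutoff can be sketched as a simple filter. The candidate footprints, the tuple layout and the cutoff of -20 below are all made up for illustration, and the function is a hypothetical stand-in rather than the script's own code.

```python
def passes_cutoff(log10_pvalue, cutoff):
    """Return True if the raw log10 p-value is strictly below the cutoff."""
    return log10_pvalue < cutoff


# Hypothetical candidates: (chromosome, start, end, raw log10 p-value).
candidates = [
    ("chr1", 100, 120, -35.2),
    ("chr1", 400, 415, -8.1),
    ("chr2", 50, 75, -20.0),
]
cutoff = -20  # hypothetical value, not necessarily the app default

kept = [fp for fp in candidates if passes_cutoff(fp[3], cutoff)]
print(kept)
```

Because the comparison is strict, a candidate scoring exactly at the cutoff (the chr2 entry here) is not passed on.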
Perform Bonferroni Correction
If checked, the script will perform a Bonferroni correction for multiple testing instead of its normal operation. Only available in the footprint app.
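As a reminder of what a Bonferroni correction does on the log10 scale used by Wellington, here is a minimal generic sketch. It shows the textbook correction (significance level divided by the number of tests); the function name and the numbers are assumptions, and the script may apply the correction differently.

```python
import math


def bonferroni_log10_threshold(alpha, n_tests):
    """Log10 of the Bonferroni-adjusted significance level alpha / n_tests."""
    # log10(alpha / n) == log10(alpha) - log10(n)
    return math.log10(alpha) - math.log10(n_tests)


# Hypothetical example: a 0.05 significance level across one million tests.
threshold = bonferroni_log10_threshold(0.05, 1_000_000)
print(round(threshold, 3))  # -7.301
```

The more positions are tested, the lower (more stringent) the log10 p-value a footprint needs in order to pass.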
Use 1-D Wellington
If checked, the script will run a 1-D version of the algorithm that ignores strand orientation information of the cuts. The method is inferior to regular Wellington and was largely created to serve as one of the performance comparisons in the original publication. Not recommended. Only available in the footprint app.
Don't Merge Overlapping Footprints
If checked, overlapping footprints in the results will not be combined. Only available in the footprint app.
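To show what combining overlapping footprints means in practice, the following is a generic interval-merging sketch. It assumes half-open BED-style coordinates and simply takes the union of overlapping intervals; the rule the script actually uses to combine footprints may differ, and the function name is hypothetical.

```python
def merge_overlapping(footprints):
    """Merge overlapping (chrom, start, end) intervals on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(footprints):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical footprints, for illustration only.
example = [
    ("chr1", 110, 130),
    ("chr1", 100, 120),
    ("chr1", 200, 210),
    ("chr2", 105, 125),
]
print(merge_overlapping(example))
# [('chr1', 100, 130), ('chr1', 200, 210), ('chr2', 105, 125)]
```

With the option checked, the equivalent of the input list would be reported, so the same stretch of DNA can appear in more than one footprint.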
Output in Detail
The main output of the script, BED files detailing the location of the footprints. In the case of the footprint app, these will be the footprints in the analysed data, while in the case of the bootstrap app, there will be one file per sample provided with footprints characteristic for that sample listed within.
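For downstream work, the footprint BED files can be read with a few lines of Python. The sketch below relies only on the first three standard BED columns (chromosome, start, end) and uses an in-memory example in place of a real output file; the function name and the example records are hypothetical.

```python
import io


def read_bed(handle):
    """Read (chrom, start, end) tuples from an open BED file handle."""
    footprints = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        footprints.append((fields[0], int(fields[1]), int(fields[2])))
    return footprints


# Hypothetical content standing in for one of the output BED files.
example_bed = io.StringIO("chr1\t100\t120\nchr2\t50\t75\n")
footprints = read_bed(example_bed)
print(len(footprints), "footprints read")
```

For a real file, the io.StringIO object would be replaced with a handle from open() on the BED file of interest.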
A Wiggle track showing the change in raw log10 p-value across the tested regions. Only generated by the footprint app.
A visualisation of the average cut patterns around all of the identified footprint regions for a sample. In case of the bootstrap app, there will be one in each of the visualisation folders prefixed with treatment1/treatment2.
A CSV file that makes it easy to create a heatmap showing DNase activity around each individual footprint. In case of the bootstrap app, there will be one in each of the visualisation folders prefixed with treatment1/treatment2. For details on how to generate the heatmap in Java TreeView, consult the "Visualising footprints as heatmaps" section of the tutorial at http://pythonhosted.org/pyDNase/tutorial.html
Wiggle tracks featuring DNase cuts on the forward and reverse strand respectively. In case of the bootstrap app, there will be one in each of the visualisation folders prefixed with treatment1/treatment2.
p value cutoffs/
A folder containing lists of footprints filtered with increasing raw log10 p-value stringency, but without FDR correction. The p-value threshold applied is given in each file name. Only generated by the footprint app.