rnacentral/r2dt

By rnacentral

Updated 18 days ago

R2DT is a framework for predicting and visualising RNA secondary structure using templates


R2DT

Visualise RNA 2D structure in standard layouts

The R2DT software (RNA 2D Templates) automatically generates RNA secondary structure diagrams in standard layouts using a template library representing a wide range of RNAs:

[Figure: R2DT method overview]

R2DT is used by RNAcentral to visualise >14 million RNA secondary structures. See method overview for details or read the R2DT paper in Nature Communications.

Examples

The following example visualisations show LSU, SSU, and 5S rRNAs, four tRNAs, two RNase P RNAs, a snoRNA, a MoCo riboswitch, and U4 snRNA.

[Figure: R2DT examples]

Getting started

R2DT can be used in a number of ways:

Installation


  • Download the R2DT image from Docker Hub and run it with Docker or Singularity.

    Docker

    docker pull rnacentral/r2dt
    docker run --entrypoint r2dt.py rnacentral/r2dt draw --help
    

    Singularity

    singularity build r2dt docker://rnacentral/r2dt    
    singularity exec r2dt r2dt.py draw --help
    
  • Development installation:

    # Get the code
    git clone https://github.com/RNAcentral/R2DT.git
    cd R2DT
    
    # Build and tag a Docker image
    docker build -t rnacentral/r2dt .
    # Start the development container (the cli service)
    docker-compose run cli
    

    The current directory is mounted inside the container so that all code and data changes are instantly reflected in the container.

  • Bare metal installation: if running R2DT in containers is not possible, follow the instructions in the Dockerfile.

Initial setup
  1. Download a precomputed data library (190.1 MB, last updated Jan 7, 2021) and uncompress it.

  2. Enter an interactive Docker terminal session:

docker run -it -v <path_to_cms>:/rna/r2dt/data/cms -v `pwd`:/rna/r2dt/temp rnacentral/r2dt
  • -it - start an interactive session
  • -v <path_to_cms>:/rna/r2dt/data/cms - mount the precomputed data library folder <path_to_cms> as /rna/r2dt/data/cms inside the container. Note that <path_to_cms> must be a full (absolute) path.
  • -v `pwd`:/rna/r2dt/temp - make the current working directory available inside the container as /rna/r2dt/temp
    

Any file placed in /rna/r2dt/temp within the container will be available on the host machine after the Docker container exits.
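For scripted, non-interactive runs, the two docker commands above can be combined: the bind mounts from the interactive example plus the `--entrypoint r2dt.py` override from the `draw --help` example. The Python sketch below only assembles and runs that argument list. The helper names, the `--rm` cleanup flag and the assumption that `r2dt.py draw` accepts absolute paths under /rna/r2dt/temp are illustrative choices, not part of R2DT.

```python
import subprocess
from pathlib import Path
from typing import List

# Container paths from the interactive docker run example above.
CMS_MOUNT = "/rna/r2dt/data/cms"
TEMP_MOUNT = "/rna/r2dt/temp"


def build_r2dt_docker_command(cms_dir: str, work_dir: str,
                              fasta_name: str, output_name: str) -> List[str]:
    """Assemble a one-shot `docker run` call to `r2dt.py draw` (hypothetical helper).

    cms_dir:  host folder holding the uncompressed precomputed data library.
    work_dir: host folder holding the input FASTA; output is written back
              into it through the /rna/r2dt/temp mount.
    """
    # Bind-mount paths must be absolute (see the note on <path_to_cms>).
    cms_dir = str(Path(cms_dir).resolve())
    work_dir = str(Path(work_dir).resolve())
    return [
        "docker", "run", "--rm",  # --rm removes the container when it exits
        "-v", f"{cms_dir}:{CMS_MOUNT}",
        "-v", f"{work_dir}:{TEMP_MOUNT}",
        "--entrypoint", "r2dt.py",  # same override as the `draw --help` example
        "rnacentral/r2dt",
        # Assumption: r2dt.py draw accepts absolute paths inside the container.
        "draw", f"{TEMP_MOUNT}/{fasta_name}", f"{TEMP_MOUNT}/{output_name}",
    ]


def run_r2dt(cms_dir: str, work_dir: str, fasta_name: str, output_name: str) -> None:
    """Run the command; raises CalledProcessError if the container fails."""
    cmd = build_r2dt_docker_command(cms_dir, work_dir, fasta_name, output_name)
    subprocess.run(cmd, check=True)
```

For example, `run_r2dt("/data/cms", ".", "examples.fasta", "examples")` would write the results to ./examples on the host.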

Usage

Automatic template selection

Specify an input file in FASTA format containing one or more RNA sequences, and the folder where the output files will be written (the folder is created if it does not exist):

r2dt.py draw <input.fasta> <output_folder>

For example:

r2dt.py draw examples/examples.fasta temp/examples

R2DT will automatically select the best matching template and visualise the secondary structures.
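If the sequences come from another pipeline, a small Python sketch like the one below can write them to a FASTA file for `r2dt.py draw`. `write_fasta` is a hypothetical helper and the 60-character line width is an arbitrary choice; keeping identifiers free of spaces is a sensible precaution, since many FASTA tools only use the first word of the header.

```python
from typing import Dict


def write_fasta(records: Dict[str, str], path: str, width: int = 60) -> None:
    """Write {sequence_id: sequence} pairs to `path` in FASTA format,
    wrapping sequence lines at `width` characters."""
    with open(path, "w") as handle:
        for seq_id, sequence in records.items():
            sequence = "".join(sequence.split())  # drop stray whitespace/newlines
            handle.write(f">{seq_id}\n")
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")
```

Usage: `write_fasta({"my_rna": "GGGAAACCC"}, "temp/input.fasta")`, then `r2dt.py draw temp/input.fasta temp/output`.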

Specifying template category

If the RNA type of the input sequences is known in advance, you can bypass the classification step and get faster results by choosing the template source directly:

  • CRW templates (5S and SSU rRNA)

    r2dt.py crw draw examples/crw-examples.fasta temp/crw-examples
    
  • RiboVision LSU and SSU rRNA templates

    r2dt.py ribovision draw_lsu examples/lsu-examples.fasta temp/lsu-examples
    r2dt.py ribovision draw_ssu examples/ribovision-ssu-examples.fasta temp/ssu-examples
    
  • Rfam families

    r2dt.py rfam draw RF00162 examples/RF00162.example.fasta temp/rfam-example
    
  • RNase P

    r2dt.py rnasep draw examples/rnasep.fasta temp/rnasep-example
    
  • tRNAs (using GtRNAdb templates)

    # For tRNAs, provide the domain and isotype if known; otherwise tRNAscan-SE classifies the sequences
    r2dt.py gtrnadb draw examples/gtrnadb.E_Thr.fasta temp/gtrnadb
    r2dt.py gtrnadb draw examples/gtrnadb.E_Thr.fasta temp/gtrnadb --domain E --isotype Thr
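The category-specific invocations above differ only in their subcommands and, for Rfam and GtRNAdb, a few extra arguments. A hypothetical Python helper that assembles them from one category name might look like this; the category keys are made up for illustration, while the subcommands and flags are copied from the examples above.

```python
from typing import List, Optional

# Category keys are hypothetical; subcommands are copied from the examples above.
SUBCOMMANDS = {
    "auto": ["draw"],
    "crw": ["crw", "draw"],
    "ribovision_lsu": ["ribovision", "draw_lsu"],
    "ribovision_ssu": ["ribovision", "draw_ssu"],
    "rfam": ["rfam", "draw"],
    "rnasep": ["rnasep", "draw"],
    "gtrnadb": ["gtrnadb", "draw"],
}


def r2dt_args(category: str, fasta: str, output: str,
              rfam_acc: Optional[str] = None,
              domain: Optional[str] = None,
              isotype: Optional[str] = None) -> List[str]:
    """Build an r2dt.py argument list for one template category."""
    if category not in SUBCOMMANDS:
        raise ValueError(f"unknown category: {category}")
    args = ["r2dt.py", *SUBCOMMANDS[category]]
    if category == "rfam":
        # Rfam runs take the family accession before the input file.
        if not rfam_acc:
            raise ValueError("rfam requires an accession such as RF00162")
        args.append(rfam_acc)
    args += [fasta, output]
    if category == "gtrnadb":
        # Optional hints; the examples above pass them when they are known.
        if domain:
            args += ["--domain", domain]
        if isotype:
            args += ["--isotype", isotype]
    return args
```

For instance, `r2dt_args("gtrnadb", "examples/gtrnadb.E_Thr.fasta", "temp/gtrnadb", domain="E", isotype="Thr")` reproduces the last tRNA command above.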
    
Manual template selection

It is possible to select a specific template and skip the classification step altogether.

  1. Get a list of all available templates and copy the template id:
    r2dt.py list-models
    

In addition, all models are listed in the file models.json.

  2. Specify the template (for example, RNAseP_a_P_furiosus_JB):

    r2dt.py draw --force_template <template_id> <input_fasta> <output_folder>
    

    For example:

    r2dt.py draw --force_template RNAseP_a_P_furiosus_JB examples/force/URS0001BC2932_272844.fasta temp/example
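When looking for a template id, it can help to filter the `list-models` output. The sketch below assumes that output, saved as text, lists one template id per line; that format is an assumption, so check it against your version or use models.json instead. `find_templates` and `force_template_command` are hypothetical helpers.

```python
from typing import List


def find_templates(listing: str, query: str) -> List[str]:
    """Return template ids containing `query` (case-insensitive).

    Assumes `listing` holds one template id per line, e.g. the saved
    output of `r2dt.py list-models`; adjust if the format differs.
    """
    needle = query.lower()
    ids = (line.strip() for line in listing.splitlines())
    return [tid for tid in ids if tid and needle in tid.lower()]


def force_template_command(template_id: str, fasta: str, output: str) -> List[str]:
    """Argument list for drawing `fasta` with one fixed template."""
    return ["r2dt.py", "draw", "--force_template", template_id, fasta, output]
```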
    
Other useful commands
  • Run all tests

    python3 -m unittest
    
  • Run a single test

    python3 -m unittest tests.tests.TestRibovisionLSU
    
  • Classify example sequences using Ribotyper

    perl /rna/ribovore/ribotyper.pl -i data/cms/crw/modelinfo.txt -f examples/pdb.fasta temp/ribotyper-test
    
  • Generate covariance models and modelinfo files

    python3 utils/generate_cm_library.py
    r2dt.py generatemodelinfo <path to covariance models>
    
  • Precompute template library locally (may take up to several hours):

    r2dt.py setup
    
  • Run R2DT with Singularity

    singularity exec --bind <path_to_cms>:/rna/r2dt/data/cms r2dt r2dt.py draw sequence.fasta output
    

Output files

r2dt.py draw produces a folder called results with the following subfolders:

  • svg: RNA secondary structure diagrams in SVG format
  • fasta: input sequences and their secondary structure in dot-bracket notation
  • tsv: a file metadata.tsv listing sequence ids, matching templates, and template sources
  • thumbnail: secondary structure diagrams displayed as outlines in SVG format
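To summarise which templates were used, metadata.tsv can be read with Python's standard csv module. This sketch assumes the file is tab-separated with no header row and that its first three columns are the sequence id, the matching template and the template source, in the order listed above; verify this against a real output file before relying on it.

```python
import csv
from collections import Counter
from typing import Dict, List


def read_metadata(path: str) -> List[Dict[str, str]]:
    """Parse metadata.tsv (from the tsv subfolder) into a list of dicts."""
    rows = []
    with open(path, newline="") as handle:
        for fields in csv.reader(handle, delimiter="\t"):
            if not fields:
                continue  # skip blank lines
            # Assumed column order: sequence id, template, template source.
            seq_id, template, source = fields[:3]
            rows.append({"id": seq_id, "template": template, "source": source})
    return rows


def count_by_source(rows: List[Dict[str, str]]) -> Counter:
    """Count how many sequences were drawn with each template source."""
    return Counter(row["source"] for row in rows)
```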

How to add new templates

If you would like to add a new template or replace an existing one, please open a GitHub issue that includes:

  • A FASTA file with a reference sequence and secondary structure - see example
  • A Traveler XML file - see example
  • A description of the new template and any relevant background information

One can create a new template locally using the generate_cm_library.py script with the FASTA and XML files described above. It is also possible to generate a new template using a special version of the XRNA software, XRNA-GT.

GitHub currently does not support attaching files with .fasta or .bpseq extensions, so please attach the files as .txt.

We will review the template and reply on GitHub as soon as possible.
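Before opening an issue, it can be worth checking the reference FASTA locally. Assuming the file holds a header line, the sequence and a dot-bracket structure of the same length (check the linked example for the exact layout), the sketch below verifies that the lengths match and that every bracket is paired, and returns the base pairs. It treats ()[]{}<> as independent bracket types so that pseudoknotted pairs written with [] are accepted; these helpers are illustrative and not part of R2DT.

```python
from typing import List, Tuple

OPENING = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSING = {close: open_ for open_, close in OPENING.items()}


def dot_bracket_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return sorted 1-based base pairs (i, j) from dot-bracket notation.

    Each bracket type has its own stack; any other character is unpaired.
    """
    stacks = {open_: [] for open_ in OPENING}
    pairs = []
    for pos, char in enumerate(structure, start=1):
        if char in OPENING:
            stacks[char].append(pos)
        elif char in CLOSING:
            stack = stacks[CLOSING[char]]
            if not stack:
                raise ValueError(f"unmatched '{char}' at position {pos}")
            pairs.append((stack.pop(), pos))
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched '{open_}' at position {stack[-1]}")
    return sorted(pairs)


def check_template_record(sequence: str, structure: str) -> List[Tuple[int, int]]:
    """Validate one sequence/structure pair and return its base pairs."""
    if len(sequence) != len(structure):
        raise ValueError(f"sequence length {len(sequence)} != "
                         f"structure length {len(structure)}")
    return dot_bracket_pairs(structure)
```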

Method overview

The R2DT pipeline includes the following steps:

  1. Generate a library of covariance models with Infernal using bpseq files from CRW, RiboVision, or another source. For best results, remove pseudoknots from the secondary structures using RemovePseudoknots from the RNAstructure package.
  2. Select the best-matching covariance model for each input sequence using Ribovore or tRNAscan-SE 2.0.
  3. Fold each input sequence into a secondary structure compatible with the template using the top-scoring covariance model.
  4. Generate secondary structure diagrams using Traveler and the secondary structure layouts.

See the R2DT paper for more details.
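Step 1 starts from bpseq files and recommends removing pseudoknots first. The sketch below converts a bpseq file (one `<index> <base> <partner>` line per nucleotide, with 0 meaning unpaired) to dot-bracket notation and raises an error when it finds crossing, i.e. pseudoknotted, pairs. That is a quick way to see whether a structure needs cleanup; it only detects pseudoknots and does not remove them the way RemovePseudoknots does. The function names are hypothetical.

```python
from typing import List, Tuple


def parse_bpseq(text: str) -> Tuple[str, List[int]]:
    """Return (sequence, partners); partners[i] is the 1-based partner of
    position i + 1, or 0 if unpaired. Lines that are not
    '<index> <base> <partner>' (headers, comments) are skipped."""
    bases: List[str] = []
    partners: List[int] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or not (fields[0].isdigit() and fields[2].isdigit()):
            continue
        if int(fields[0]) != len(bases) + 1:
            raise ValueError(f"positions out of order at line: {line!r}")
        bases.append(fields[1])
        partners.append(int(fields[2]))
    return "".join(bases), partners


def partners_to_dot_bracket(partners: List[int]) -> str:
    """Convert a partner list to dot-bracket; raise on pseudoknots."""
    n = len(partners)
    chars = ["."] * n
    stack: List[int] = []  # opening positions still waiting for their partner
    for i, j in enumerate(partners, start=1):
        if j == 0:
            continue
        if j == i or not 1 <= j <= n or partners[j - 1] != i:
            raise ValueError(f"invalid or asymmetric pair {i}-{j}")
        if j > i:
            stack.append(i)
            chars[i - 1] = "("
        elif stack and stack[-1] == j:
            stack.pop()  # closes the most recent open pair: properly nested
            chars[i - 1] = ")"
        else:
            raise ValueError(f"pair {j}-{i} crosses another pair (pseudoknot)")
    return "".join(chars)
```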

Contributors

We welcome additional contributions. Please raise an issue or submit a pull request.

Acknowledgements
