A Docker execution environment for Companion.


A portable, scalable eukaryotic genome annotation pipeline implemented in Nextflow.



This software is a comprehensive computational pipeline for the annotation of eukaryotic genomes (for example, those of protozoan parasites). It performs the following tasks:

  • Fast generation of pseudomolecules from scaffolds by ordering and orientating against a reference
  • Accurate transfer of highly conserved gene models from the reference
  • De novo gene finding as a complement to the gene transfer
  • Non-coding RNA detection (tRNA, rRNA, sn(o)RNA, ...)
  • Pseudogene detection
  • Functional annotation (GO, products, ...)
    • transferring reference annotations to the target genome
    • inferring GO terms and products from Pfam pHMM matches
  • Consistent gene ID assignment
  • Preparation of validated GFF3, GAF and EMBL output files for jump-starting manual curation and quick turnaround time to submission

It supports parallelized execution on a single machine as well as on large cluster platforms (LSF, SGE, ...).


Companion has the following dependencies:

  • Java 8 or later
  • Nextflow
  • Docker (if using the Docker image to satisfy dependencies)

To check whether Java is installed, and which version, use the command java -version. Note that Java 8 reports its version
as 1.8 (for example 1.8.0_292), while Java 9 and later report the major version directly (9, 11.0.2, etc.).
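
The version check can also be scripted. The following is a minimal sketch, not part of Companion: java_major is a hypothetical helper that reduces a version string to its major number, accepting both a legacy 1.x style string and a plain major-version string.

```shell
#!/usr/bin/env bash
# java_major is a hypothetical helper, not part of Companion or Java.
# It prints the major version for strings such as "1.8.0_292" or "11.0.2".
java_major() {
    local v="$1"
    v="${v#1.}"          # drop a leading legacy "1." prefix, if present
    v="${v%%[._-]*}"     # keep only the leading number
    echo "$v"
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr; the version is the first quoted string.
    raw="$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')"
    if [ "$(java_major "$raw")" -ge 8 ] 2>/dev/null; then
        echo "Java $raw is recent enough"
    else
        echo "Java 8 or later is required (found: ${raw:-unknown})"
    fi
else
    echo "java not found in PATH"
fi
```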

If you need to install Java, on an Ubuntu system run:

apt-get install default-jre

For other Linux systems, please consult your distribution documentation.

To install Nextflow, run:

curl -fsSL | bash

This will create an executable called 'nextflow', which should be moved to a suitable directory, for example:

mv nextflow /usr/local/bin/

Use the command which nextflow to check that it is found in your path.


Docker is required if you intend to use the Docker image, as recommended below, to satisfy the dependencies.

To install Docker, see the installation guide for
Debian or

Users running Companion with Docker will need to be added to the docker group (Unix users can belong to one or more groups, which determine
whether they can perform certain actions; adding a user to the docker group allows them to execute docker commands). To add user <username> to
the docker group, run:

usermod -aG docker <username>

Some Linux systems may not have usermod installed, as there are different programs that can be used to change user settings;
please consult your Linux distribution documentation if necessary.
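
To confirm that the group change has taken effect, the current user's groups can be inspected with id. This is a small sketch rather than an official check; has_group is a hypothetical helper introduced only for this example. Note that a new group membership normally applies only to login sessions started after the change.

```shell
#!/usr/bin/env bash
# has_group is a hypothetical helper: it succeeds if the space-separated
# group list in $1 contains exactly the group name given in $2.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# id -nG prints the names of all groups the current user belongs to.
if has_group "$(id -nG)" docker; then
    echo "Current user is in the docker group"
else
    echo "Current user is not in the docker group (log out and back in after usermod)"
fi
```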


There are a number of ways to install Companion. Details for an installation using Docker are described below. If you encounter an issue when installing Companion please contact your local system administrator. If you encounter a bug please log it here or email us at

The easiest way to use the pipeline is to use the prepared Docker image which contains all external dependencies.

docker pull sangerpathogens/companion


Local copy of Companion

To create a local copy of Companion, you can download this repo from GitHub (if you are familiar with GitHub, you may
of course prefer to clone or fork it).

curl -L -o  # or click the green button on the GitHub web page
mv companion-master my-companion-project # renaming it to something meaningful to you is a good idea

Now you can run Companion. There is an example dataset and parameterization included in the distribution, so
to get started just run:

nextflow run my-companion-project -profile docker

The argument -profile docker instructs nextflow to run the sangerpathogens/companion docker image for the dependencies.

Have a look at the nextflow.config file to see the definition of the docker profile, and how the docker image is specified.
You will also find file names, paths, parameters, etc. that you can edit to perform your own runs. The following warrant
a special mention:

inseq The input FASTA file (${baseDir}/example-data/L_donovani.1.fasta in the example parameter file included with the distribution)

ref_dir The directory containing reference genomes (${baseDir}/example-data/references in the example file)

dist_dir The directory that will contain the newly created output files (${baseDir}/example-data-output in the example file)

run_snap We recommend SNAP is disabled, as it has not provided useful results in this pipeline (false in the example file)
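
As an alternative to editing the config file, Nextflow lets any pipeline parameter be overridden on the command line with a double-dash option (--name value sets params.name). The sketch below assumes that Companion reads inseq, ref_dir and dist_dir as ordinary Nextflow parameters; all paths shown are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical paths; replace them with your own files and directories.
INSEQ="$HOME/my-data/my_genome.fasta"
REF_DIR="$HOME/my-data/references"
DIST_DIR="$HOME/my-data/output"

# Single-dash options (-profile) belong to Nextflow itself;
# double-dash options (--inseq, ...) override pipeline parameters.
cmd=(nextflow run my-companion-project -profile docker
     --inseq "$INSEQ" --ref_dir "$REF_DIR" --dist_dir "$DIST_DIR")

# Print the command for review; it is executed only when RUN=1 is set.
printf '%q ' "${cmd[@]}"; echo
if [ "${RUN:-0}" = "1" ]; then
    "${cmd[@]}"
fi
```

Printing the command first makes it easy to check the paths before starting a long run.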

Running Companion direct from a repository

If you run nextflow with the name of a github repository, it will pull the contents of the repository and run with those.
This command will do the same as the "local copy" example above:

nextflow run sanger-pathogens/companion -profile docker

It is best to use this with some caution. After the command above has been
run, nextflow will have stored a local copy of the repository in .nextflow/assets/sanger-pathogens, and if you run
the command again it will use that local copy instead of pulling a fresh copy from the repository. You can
edit the files in your local copy, and nextflow will work from your (now different) version of sanger-pathogens/companion.

If you are familiar with repositories, and the workflow appropriate to using them, this can be a very convenient way of
working; otherwise it can become quite confusing, and you may find it easier to work with a simple local copy.
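
To see whether a cached copy exists and whether it has been edited, the copy can be inspected with git. This is a sketch under the assumption that Nextflow keeps its assets under $NXF_HOME/assets (with NXF_HOME defaulting to $HOME/.nextflow); asset_dir is a hypothetical helper used only here.

```shell
#!/usr/bin/env bash
# asset_dir is a hypothetical helper: it prints where Nextflow is assumed
# to cache the repository named in $1 (for example owner/project).
asset_dir() {
    echo "${NXF_HOME:-$HOME/.nextflow}/assets/$1"
}

dir="$(asset_dir sanger-pathogens/companion)"
if [ -d "$dir/.git" ]; then
    # Show uncommitted local edits and the commit currently in use.
    git -C "$dir" status --short
    git -C "$dir" log -1 --oneline
else
    echo "No cached copy found at $dir"
fi

# Related Nextflow commands:
#   nextflow pull sanger-pathogens/companion   # update the cached copy
#   nextflow drop sanger-pathogens/companion   # delete the cached copy
```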

Preparing reference annotations

The reference annotations used in the pipeline need to be pre-processed before they can be used. Only a few pre-generated
reference sets for various parasite species/families are included in the distribution as examples.

To add a reference organism, you will need:

  • a descriptive name of the organism
  • a short abbreviation for the organism
  • the genome sequence in a single FASTA file
  • a structural gene annotation in GFF3 format (see below for details)
  • functional GO annotation in GAF 1.0 format, on the gene level
  • a pattern matching chromosome headers, describing how to extract chromosome numbers from them
  • an AUGUSTUS model, trained on reference genes

Insert these file names, etc., where <placeholders> appear in the steps below:

  1. Create a new data directory (i.e. the equivalent of the example-data directory included in the distribution)
  2. Edit nextflow.config (and any config files that are referenced) and change parameters such as
    inseq and ref_dir to your new data directory.
  3. Copy the new reference genome (FASTA) into <new_data_dir>/genomes
  4. Copy GFF3 and GAF files into <new_data_dir>/genomes
  5. Copy Augustus model files into data/augustus/species/<species_name>/
  6. Create new directory <new_data_dir>/references/<short_name>/
  7. Add new section to amber-test-data/references/references-in.json, using the
    short name (same as the directory name in the previous step); in this section add
    the names/paths of the files copied (above), a descriptive name, and
    a pattern for matching chromosomes in the FASTA files (in this example, <short_name>_<n>, where <n> is any integer).
    "<short_name>" : {   "gff"                : "../genomes/<gff3_filename>.gff3",
                      "genome"             : "../genomes/<ref_genome_name>.fasta",
                      "gaf"                : "../genomes/<ref_annot_filename>.gaf",
                      "name"               : "<Descriptive Name of Reference Genome>",
                      "augustus_model"     : "../../data/augustus/species/<species_name>/",
                      "chromosome_pattern" : "<short_name>_(%d+)"
                     }
  8. Finally, change directory to <new_data_dir>/references (you must execute the following command in this directory)
    and run ../../bin/update_references.lua. This writes the file <new_data_dir>/references/references.json.
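
The directory-creating steps above (1, 5 and 6, plus the genomes directory used in steps 3 and 4) can be sketched as a small shell function. This is only an illustration: prepare_reference_dirs is a hypothetical helper, the names in the usage comments are placeholders, and the new data directory is assumed to sit directly inside the Companion project directory.

```shell
#!/usr/bin/env bash
# prepare_reference_dirs is a hypothetical helper, not part of Companion.
# Arguments: project root, new data directory name, short name, species name.
prepare_reference_dirs() {
    local project_root="$1" data_dir="$2" short_name="$3" species_name="$4"
    mkdir -p \
        "$project_root/$data_dir/genomes" \
        "$project_root/$data_dir/references/$short_name" \
        "$project_root/data/augustus/species/$species_name"
}

# Usage (all names are placeholders):
#   prepare_reference_dirs my-companion-project my-data Abc abc_species
#   cp my_genome.fasta my_annotation.gff3 my_annotation.gaf \
#      my-companion-project/my-data/genomes/
#   (cd my-companion-project/my-data/references && ../../bin/update_references.lua)
```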

You can now run Companion, and the new reference will be included.

Further documentation on preparing reference data can be found in the GitHub wiki.


Companion is free software, licensed under ISC.


Please report any issues to the issues page or email


If you use this software please cite:
Companion: a web server for annotation and analysis of parasite genomes.
Steinbiss S, Silva-Franco F, Brunk B, Foth B, Hertz-Fowler C et al.
Nucleic Acids Research, 44:W29-W34, 2016.
DOI: 10.1093/nar/gkw292
