streptomyces/cblaster

By streptomyces

Updated over 2 years ago

cblaster is a replacement for MultiGeneBlast.

cblaster

Downloading the image from Docker Hub.

First pull the image from Docker Hub.

docker pull streptomyces/cblaster

It does not matter which directory you are in when you run the above command, because Docker stores pulled images in its own managed location rather than in your working directory.

For all of the following, you have to take care of managing the directories and locations of files.

Make a new directory named cblaster. I am assuming that you have made this in your home directory so that the following will put you in cblaster.

cd
cd cblaster

In cblaster make two sub-directories, gbk and out.

mkdir gbk
mkdir out
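
The directory setup above can also be done in one step. The sketch below assumes a hypothetical WORKDIR variable (my own name, not part of cblaster or Docker) that defaults to cblaster in your home directory. mkdir -p creates any missing parent directories and does not complain if the directories already exist.

```shell
# WORKDIR is a hypothetical variable used only for this illustration.
# It defaults to a directory named cblaster in your home directory.
WORKDIR="${WORKDIR:-$HOME/cblaster}"

# Create the working directory and both sub-directories in one command.
mkdir -p "$WORKDIR/gbk" "$WORKDIR/out"

# Move into the working directory, ready for starting the container.
cd "$WORKDIR"
```

Running this a second time is harmless, so it can be kept in a small setup script.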

Starting a container

What you pull from Docker Hub are known as images. When you run an image what you get is known as an instance or container. You can have several instances running from the same image.

After making sure that you are in the cblaster directory, run the following. Note that it is one command broken over three lines. It is the \ (backslashes) at the ends of lines which indicate that the command continues on the next line.

docker run --interactive --tty --rm \
--volume $PWD:/home/work \
streptomyces/cblaster:latest

The --volume $PWD:/home/work option tells Docker that your current directory (cblaster) should be made visible inside the running container as /home/work. The idea is that you put the GenBank files you wish to scan in the cblaster/gbk directory, and they will be visible on the container side in /home/work/gbk.
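
To make the path mapping concrete, here is a small sketch of how a host path translates to the path seen inside the container. The function name to_container_path and the file name genome1.gbk are hypothetical and exist only for this illustration; the only thing taken from the command above is that the host directory is mounted at /home/work.

```shell
# Hypothetical helper: given the host directory that was mounted with
# --volume and a file below it, print the path seen inside the container.
to_container_path() {
    host_root=$1    # directory on the left-hand side of --volume
    host_path=$2    # a file or directory somewhere below host_root
    # Strip the host_root prefix and prepend the container mount point.
    printf '/home/work/%s\n' "${host_path#"$host_root"/}"
}

# genome1.gbk is a made-up file name used only as an example.
to_container_path "$HOME/cblaster" "$HOME/cblaster/gbk/genome1.gbk"
```

The mapping works in both directions: anything the container writes to /home/work/out appears on the host in cblaster/out.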

All of the stuff below is done inside the running container.

Testing

I have bundled a couple of GenBank files and a query faa file for the purpose of testing. They are in the directory /home/testdata/. Below we copy those files to our work directory.

cd /home/work
cp /home/testdata/*.gbk gbk/
cp /home/testdata/cybact.faa ./
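
After copying, it can be worth confirming that the GenBank files actually arrived in gbk/ before going further. The following is a minimal sketch; count_gbk is a hypothetical helper name, and it only counts files ending in .gbk directly inside the given directory.

```shell
# Hypothetical helper: count the *.gbk files directly inside a directory.
count_gbk() {
    dir=$1
    # A missing directory simply counts as zero files.
    if [ ! -d "$dir" ]; then
        echo 0
        return 0
    fi
    # tr removes the padding that some versions of wc add to the count.
    find "$dir" -maxdepth 1 -type f -name '*.gbk' | wc -l | tr -d ' '
}

echo "GenBank files in gbk/: $(count_gbk gbk)"
```

A count of zero at this point usually means the copy was run from the wrong directory.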

The following has to be done every time you start a new Docker container for cblaster. These settings are only really needed when using the databases at NCBI (remotely), but cblaster insists on having them. I think you can get away with just the first one (the email one) if you do not have an API key from NCBI. However, it is not very difficult to get an API key from NCBI if you have an NCBI account. It might be worth the effort if you wish to run your searches on the NCBI databases.

cblaster config --email andrew.truman@jic.ac.uk
cblaster config --api_key <your NCBI API Key>

# For example (this is not my real API key):
cblaster config --email govind.chandra@jic.ac.uk
cblaster config --api_key 50e8d3b3dbaec1609eeb3848620eeb49f7b9

Make a database for cblaster to search.

cblaster makedb --name testdb gbk/*.gbk

The above will result in the following three files being written to the current directory.

  1. testdb.dmnd
  2. testdb.fasta.gz
  3. testdb.sqlite3
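
A quick way to check that all three files were written is sketched below. The helper name missing_db_files is hypothetical; the three file extensions are the ones listed above. It prints the name of each expected file that is absent and prints nothing when all are present.

```shell
# Hypothetical helper: list which of the three expected database files
# are missing for a given database name (for example, testdb).
missing_db_files() {
    name=$1
    for ext in dmnd fasta.gz sqlite3; do
        # Print the file name only when the file does not exist.
        [ -e "$name.$ext" ] || printf '%s\n' "$name.$ext"
    done
}

missing_db_files testdb
```

No output means the database is complete and ready to be searched.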

Next, search the local database with the query file.

cblaster search --mode local --query_file cybact.faa \
--session_file out/cybact.json \
--plot out/cybact.html \
--binary out/cybact.binary \
--database testdb.dmnd \
--output out/cybact.txt

cblaster extract out/cybact.json -q CybH \
--extract_sequences

Alternatively, you can search the nr database at NCBI. In that case you do not need to specify a local database (it defaults to nr), and the --mode option is set to remote.

cblaster search --mode remote --query_file cybact.faa \
--session_file out/remote.json \
--plot out/remote.html \
--binary out/remote.binary \
--entrez_query '"Cyanobacteria"[Organism]' \
--database nr \
--output out/remote.txt

The above can take quite a long time. In my single test run it took over 11 minutes.

To extract hit proteins for a particular query protein.

cblaster extract out/remote.json -q CybH \
--output CybH_proteins.faa --extract_sequences
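
To see how many hit proteins were extracted, you can count the header lines in the output file. This sketch assumes that the file is in FASTA format, where every record starts with a > character; count_fasta_records is a hypothetical helper name.

```shell
# Hypothetical helper: count the records in a FASTA file by counting
# the lines that begin with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Only report a count if the extract step has produced the file.
if [ -f CybH_proteins.faa ]; then
    echo "Extracted proteins: $(count_fasta_records CybH_proteins.faa)"
fi
```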

If you have a RID from a previously run search at NCBI then you can fetch the results from NCBI rather than running the search again.

cblaster search --query_file cybact.faa \
--session_file out/remote.json \
--plot out/remote.html \
--binary out/remote.binary \
--entrez_query '"Cyanobacteria"[Organism]' \
--output out/remote.txt \
--rid NZJSZ36D01R 

This paragraph applies to Linux and macOS only. Whatever you do inside a Docker container runs as root. This means that the output files written are owned by root. This makes it a pain to delete them outside of the container if you do not have sudo rights. I often start a container just to delete the files from within it rather than deal with the permissions outside the container.
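
One possible alternative is Docker's --user option, which runs the container process under a given user and group id, so that files written to the mounted directory are owned by you. I have not tested this with this image, and it is an assumption that cblaster works without root here; for example, cblaster config may need a writable home directory inside the container. The sketch below only prints the command so that you can inspect it before running it. cblaster_run_cmd is a hypothetical helper name.

```shell
# Hypothetical helper: print (not run) a docker run command that starts
# the container under the current user's numeric user and group ids.
cblaster_run_cmd() {
    printf 'docker run --interactive --tty --rm --user %s:%s --volume %s:/home/work streptomyces/cblaster:latest\n' \
        "$(id -u)" "$(id -g)" "$PWD"
}

cblaster_run_cmd
```

If the printed command does not work with this image, starting a container as root just to delete the files remains the fallback.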

Notes for Govind, everybody else to ignore!

Built on lin9.

Making the image

cds
cd docker/cblaster
docker build -f Dockerfile -t "streptomyces/cblaster" .
docker build --no-cache -f Dockerfile -t "streptomyces/cblaster" .
### Edits inside the container ###
# $cid is the container id inside which changes are made.
docker commit -a "govind.chandra@jic.ac.uk" -m "Some comments." \
$cid streptomyces/cblaster:latest
docker push streptomyces/cblaster:latest
