streptomyces/cblaster
cblaster is a replacement for multigeneblast
First, pull the image from Docker Hub.
docker pull streptomyces/cblaster
It does not matter which directory you run the above command from, because Docker keeps all pulled images in its own managed storage.
For everything that follows, however, you have to manage the directories and file locations yourself.
Make a new directory named cblaster. I am assuming that you have made
this in your home directory so that the following will put you in
cblaster.
cd
cd cblaster
In cblaster, make two sub-directories, gbk and out.
mkdir gbk
mkdir out
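The directory setup above can also be done in one step. This is a minimal sketch, assuming the working directory lives at $HOME/cblaster. The CBLASTER_DIR variable is introduced here only for illustration; neither cblaster nor Docker reads it.

```shell
# Hypothetical variable: where the working directory lives (default: ~/cblaster).
CBLASTER_DIR="${CBLASTER_DIR:-$HOME/cblaster}"

# -p creates missing parent directories and does not fail if they already exist.
mkdir -p "$CBLASTER_DIR/gbk" "$CBLASTER_DIR/out"

# Move into the working directory, as the later commands expect.
cd "$CBLASTER_DIR"
```

Because of -p, the snippet can be rerun safely when the directories are already there.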
What you pull from Docker Hub are known as images. When you run an
image, what you get is known as an instance or container. You can have
several instances running from the same image.
After making sure that you are in the cblaster directory, run the
following. Note that it is one command broken over three lines. The \
(backslash) at the end of a line indicates that the command continues
on the next line.
docker run --interactive --tty --rm \
--volume $PWD:/home/work \
streptomyces/cblaster:latest
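To see the line-continuation rule on its own, without Docker involved, here is a small sketch using echo. The variable name joined is only for illustration.

```shell
# A backslash must be the very last character on the line (no trailing spaces);
# the shell then joins the next line onto the current one.
joined=$(echo one \
  two \
  three)

# The shell saw a single command: echo one two three
echo "$joined"
```

If a space follows the backslash, the shell no longer treats the line as continued, which is a common cause of "command not found" errors when copying multi-line commands.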
The --volume $PWD:/home/work option tells Docker that your current
directory (cblaster) should be made visible inside the running
container as /home/work. The idea is that you put the GenBank files
you wish to scan in the cblaster/gbk directory and they will be
visible on the container side in /home/work/gbk.
All of the stuff below is done inside the running container.
I have bundled a couple of GenBank files and a query faa file for the
purpose of testing. They are in the directory /home/testdata/.
Below we copy those files to our work directory.
cd /home/work
cp /home/testdata/*.gbk gbk/
cp /home/testdata/cybact.faa ./
The following has to be done every time you start a new Docker container for
cblaster. These settings are only really needed when using the databases at
NCBI (remotely) but cblaster insists on having them. I think you can
get away with just the first one (the email one) if you do not have an
API key from NCBI. However, it is not very difficult to get an API key
from NCBI if you have an NCBI account. It might be worth the effort if
you wish to run your searches on the NCBI databases.
cblaster config --email andrew.truman@jic.ac.uk
cblaster config --api_key <your NCBI API Key>
cblaster config --email govind.chandra@jic.ac.uk
cblaster config --api_key 50e8d3b3dbaec1609eeb3848620eeb49f7b9
# Not my real API key.
Make a database for cblaster to search.
cblaster makedb --name testdb gbk/*.gbk
The above writes three files to the current directory, among them testdb.dmnd, the DIAMOND database used by the local search below.
cblaster search --mode local --query_file cybact.faa \
--session_file out/cybact.json \
--plot out/cybact.html \
--binary out/cybact.binary \
--database testdb.dmnd \
--output out/cybact.txt
cblaster extract out/cybact.json -q CybH \
--extract_sequences
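Before starting the local search above, it can help to confirm that the query file, the database and the output directory are all in place. The following is a minimal sketch; check_search_inputs is a hypothetical helper written for this note and is not part of cblaster. The file names are the ones used in the example above.

```shell
# Hypothetical helper: report any missing input before starting a local search.
# File names follow the example above; pass your own as arguments.
check_search_inputs() {
    missing=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Typical use inside the container, from /home/work:
if check_search_inputs cybact.faa testdb.dmnd out; then
    echo "inputs present, ready to run cblaster search"
else
    echo "fix the missing inputs before running cblaster search" >&2
fi
```

The helper only tests that each path exists; it does not check that the files are valid FASTA or DIAMOND files.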
One could also search the nr database at NCBI. In that case there is no
need to specify a local database (it defaults to nr) and the --mode
option is set to remote.
cblaster search --mode remote --query_file cybact.faa \
--session_file out/remote.json \
--plot out/remote.html \
--binary out/remote.binary \
--entrez_query '"Cyanobacteria"[Organism]' \
--database nr \
--output out/remote.txt
The above can take quite a long time. In my single test run it took over 11 minutes.
To extract the hit proteins for a particular query protein, run the following.
cblaster extract out/remote.json -q CybH \
--output CybH_proteins.faa --extract_sequences
If you have a RID from a previously run search at NCBI then you can
fetch the results from NCBI rather than running the search again.
cblaster search --query_file cybact.faa \
--session_file out/remote.json \
--plot out/remote.html \
--binary out/remote.binary \
--entrez_query '"Cyanobacteria"[Organism]' \
--output out/remote.txt \
--rid NZJSZ36D01R
This paragraph applies to Linux and macOS only. Whatever you do inside
a Docker container runs as root. This means that the output files
written are owned by root, which makes it a pain to delete them outside
of the container if you do not have sudo rights. I often start a
container just to delete the files from within it rather than deal
with the permissions outside the container.
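To see which output files ended up owned by root, something like the following can be run on the host from the cblaster directory. This is a minimal sketch; the function name list_root_owned is made up for illustration, and it only lists files, it does not delete anything.

```shell
# Hypothetical helper: list everything under a directory (default: out)
# that is owned by root (user id 0).
list_root_owned() {
    find "${1:-out}" -user 0 -print
}

# Only look if the directory exists, so this is safe to run anywhere.
if [ -d out ]; then
    list_root_owned out
fi
```

Docker's run command also has a --user option for running the container as a different user. Whether cblaster in this image works under a non-root user is untested here, so treat that as something to try rather than a known fix.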
Built on lin9.
Making the image
cds
cd docker/cblaster
docker build -f Dockerfile -t "streptomyces/cblaster" .
docker build --no-cache -f Dockerfile -t "streptomyces/cblaster" .
### Edits inside the container ###
# $cid is the container id inside which changes are made.
docker commit -a "govind.chandra@jic.ac.uk" -m "Some comments." \
$cid streptomyces/cblaster:latest
docker push streptomyces/cblaster:latest
docker pull streptomyces/cblaster