pegi3s/entrez-direct
Entrez Direct (https://www.ncbi.nlm.nih.gov/books/NBK179288/) docker image.
317
This image allows the usage of the Entrez Direct utilities, including:
Navigation functions support exploration within the Entrez databases:
esearch
performs a new Entrez search using terms in indexed fields.elink
looks up neighbors (within a database) or links (between databases).efilter
filters or restricts the results of a previous query.Records can be retrieved in specified formats or as document summaries:
efetch
downloads records or reports in a designated format.Desired fields from XML results can be extracted without writing a program:
xtract
converts EDirect XML output into a table of data values.Several additional functions are also provided:
einfo
obtains information on indexed fields in an Entrez database.epost
uploads unique identifiers (UIDs) or sequence accession numbers.nquire
sends a URL request to a web page or CGI service.To run an application, you should adapt and run the following command: docker run --rm -v /your/data/dir:/data pegi3s/entrez-direct <entrez-direct-command> <options>
In this command, you should replace:
/your/data/dir
to point to the directory that contains the input files you want to analyze.<entrez-direct-command>
to the name of the Entrez Direct command you want to use.<options>
with the specific options of the Entrez Direct command. These options will include the input/output files if required, which should be referenced under /data/
.To download the FASTA sequences from the accession numbers XM_016057909.1 and XM_030992799.1, the epost
and efetch
commands can be used. The steps are:
accessions.txt
with these accessions.epost
using the accessions file as input and write the result to epost.result
.efetch
using the epost.result
as input and write result to sequences.fastap
.echo -e "XM_016057909.1\nXM_030992799.1" > accessions.txt
docker run --rm -v /your/data/dir:/data pegi3s/entrez-direct epost -db nuccore -format uid -input /data/accessions.txt > epost.result
docker run --rm -v /your/data/dir:/data pegi3s/entrez-direct bash -c "efetch -format fasta < /data/epost.result > /data/sequences.fasta"
Alternatively, the output of the epost
command can be piped to the efetch
command:
docker run --rm -v /your/data/dir:/data pegi3s/entrez-direct bash -c "echo -e \"XM_016057909.1\nXM_030992799.1\" | epost -db nuccore -format uid | efetch -format fasta > /data/sequences.fasta"
To retrieve the taxonomy information for a for a given species, the esearch
and efetch
commands can be used. For instance, the following commands downloads the taxonomy of Homo sapiens into a taxonomy.xml
file:
docker run --rm -v /your/data/dir:/data pegi3s/entrez-direct bash -c "esearch -db taxonomy -query "Homo+sapiens" | efetch -db taxonomy -format xml > /data/taxonomy.xml"
docker pull pegi3s/entrez-direct