Check the Wiki for
ZetaHunter is a command line script designed to assign user-supplied
small subunit ribosomal RNA (SSU rRNA) gene sequences to OTUs defined
by a reference sequence database.
By default, ZetaHunter uses a curated database of full-length,
non-chimeric, Zetaproteobacteria SSU rRNA gene sequences derived from
arb SILVA (release 123) and Zetaproteobacteria genomes from JGI's
Integrated Microbial Genomes (IMG). OTU definitions are the same as
those suggested by McAllister et al. (2011) at 97% identity, with
novel OTUs discovered since that publication named ZetaOTU29 and
higher (curated OTUs only). Input files aligned by the arb SILVA SINA web
aligner are masked using the same 1282 bp mask used in McAllister et
al. (2011) to obtain reproducible OTU calls through closed reference
OTU binning. User sequences that represent novel Zetaproteobacteria
OTUs are de novo binned into NewZetaOTUs, numbered by abundance.
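As a rough illustration of what masking and a 97% identity cutoff mean, here is a minimal Python sketch. It is not ZetaHunter's implementation. The function names, the "0"/"1" mask encoding, and the choice to skip gapped columns when computing identity are all assumptions made for the example.

```python
def apply_mask(aligned_seq: str, mask: str) -> str:
    """Keep only the alignment columns where the mask has a '1'.

    Assumption: the mask is a string of '0'/'1' characters with the same
    length as the aligned sequence.
    """
    if len(aligned_seq) != len(mask):
        raise ValueError("sequence and mask must have the same length")
    return "".join(base for base, keep in zip(aligned_seq, mask) if keep == "1")


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: '-' and '.' mark gaps and gapped columns are ignored.
    """
    if len(a) != len(b):
        raise ValueError("masked sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-." or y in "-.":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


# Toy 7-column alignment and mask (the real mask keeps 1282 positions).
mask = "0111101"
query = apply_mask("AACG-TA", mask)
reference = apply_mask("TACGTTA", mask)

# A query would join the reference's OTU if identity is at least 97%.
same_otu = percent_identity(query, reference) >= 97.0
```

Because every input is reduced to the same columns before comparison, the same sequence is compared over the same positions from run to run, which is what makes the OTU calls reproducible.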
OTU network analysis is a simple way to visualize the connectivity of
OTUs within a sample or environment type. ZetaHunter will output edge
and node tab-delimited files for import into Cytoscape. The node file
contains the abundance information for each node. The edge file lists
OTUs that are found within the same sample (node1, node2, sample), thus
allowing for visualization. Note: Samples with only one ZetaOTU will contain
a self-referential edge. Otherwise, only non-self connections are shown.
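The edge rules above can be sketched in Python as follows. This is an illustration only, not ZetaHunter's code. The function name `otu_edges` and the input dictionary are hypothetical; the (node1, node2, sample) column order follows the description above.

```python
from itertools import combinations


def otu_edges(sample_to_otus):
    """Return (node1, node2, sample) rows for an OTU network edge file."""
    rows = []
    for sample, otus in sorted(sample_to_otus.items()):
        unique = sorted(set(otus))
        if len(unique) == 1:
            # A sample with a single OTU gets one self-referential edge.
            rows.append((unique[0], unique[0], sample))
        else:
            # Otherwise list every non-self pair once.
            # A sample with no OTUs produces no rows at all.
            rows.extend((a, b, sample) for a, b in combinations(unique, 2))
    return rows


# Hypothetical input: the OTUs observed in each sample.
example = {
    "S1": ["ZetaOTU1", "ZetaOTU2", "ZetaOTU9"],
    "S2": ["ZetaOTU2"],
    "S3": [],
}

for row in otu_edges(example):
    print("\t".join(row))
```

Each printed line is one tab-delimited edge, which is the shape Cytoscape expects when importing a network from a table.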
ZetaHunter also supports user-provided curated OTU databases for
sequence OTU binning of any SINA-aligned SSU rRNA sequences.
- Stable SSU rRNA gene OTU binning to a curated database
- Supports import of multiple files for easy comparison of NewZetaOTUs across samples
- Database and sequence mask management options
- Multi-threaded processing
- Chimera checking
- Flags for sequences not related to the curated database (i.e. not Zetaproteobacteria)
- Cytoscape-compatible output file for OTU network analysis
Running ZetaHunter with Docker
Note: If you have Windows, running ZetaHunter with Docker is the only supported option.
After installing Docker, open the Launchpad and click the
Quickstart Terminal icon.
In the terminal window that opens, enter the following command to download the latest ZetaHunter Docker image to your computer:
$ docker pull mooreryan/zetahunter
Note: If you already have the ZetaHunter Docker image, this is only necessary to ensure you have the latest version of ZetaHunter.
Download the run_zeta_hunter Perl script, and change the permissions to executable.
$ \curl "https://raw.githubusercontent.com/mooreryan/ZetaHunter/master/bin/run_zeta_hunter" > ~/Downloads/run_zeta_hunter
$ chmod 755 ~/Downloads/run_zeta_hunter
Move run_zeta_hunter to somewhere on your path.
$ sudo mv ~/Downloads/run_zeta_hunter /usr/local/bin
Try it out!
$ which run_zeta_hunter
should spit out
/usr/local/bin/run_zeta_hunter
and
$ run_zeta_hunter -h
will display the help banner.
Zetaproteobacteria database curation
McAllister, S. M., R. E. Davis, J. M. McBeth, B. M. Tebo, D. Emerson, and C. L. Moyer. 2011. Biodiversity and emerging biogeography of the neutrophilic iron-oxidizing Zetaproteobacteria. Appl. Environ. Microbiol. 77:5445–5457. doi:10.1128/AEM.00533-11
ZetaHunter uses lots of other software internally. Please cite the following:
Quast, C., E. Pruesse, P. Yilmaz, J. Gerken, T. Schweer, P. Yarza, J. Peplies, and F. O. Glöckner. 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucl. Acids Res. 41(D1): D590-D596.
Pruesse, E., J. Peplies, and F. O. Glöckner. 2012. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 28:1823–1829.
Schloss, P. D., et al. 2009. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75:7537–7541.
Kopylova, E., L. Noé, and H. Touzet. 2012. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics. doi:10.1093/bioinformatics/bts611
Edgar, R. C., B. J. Haas, J. C. Clemente, C. Quince, and R. Knight. 2011. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics. doi:10.1093/bioinformatics/btr381
silva.gold.align.gz is from
NOTE: This file will be temporarily unzipped (requires 247 MB of
hard drive space) if chimera checking is turned on.
Lines beginning with # are considered comments.
Headers are split on " " (space) characters; the first field is
taken to be the sequence ID and must be unique.
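To check an input file against this rule before running ZetaHunter, something like the following Python sketch could be used. It assumes FASTA-style input in which header lines start with ">". The function name `sequence_ids` is hypothetical and is not part of ZetaHunter.

```python
def sequence_ids(fasta_lines):
    """Return sequence IDs in file order, raising on a duplicate ID.

    The ID is the part of the header before the first space.
    Assumption: header lines start with '>' (FASTA format).
    """
    ids = []
    seen = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence data, not a header
        seq_id = line[1:].rstrip("\n").split(" ")[0]
        if seq_id in seen:
            raise ValueError(f"duplicate sequence ID: {seq_id}")
        seen.add(seq_id)
        ids.append(seq_id)
    return ids


# Hypothetical records: only the text before the first space is the ID.
records = [
    ">seq1 sample=A\n",
    "ACGT\n",
    ">seq2 sample=A\n",
    "ACGT\n",
]
ids = sequence_ids(records)
```

Two headers such as ">seq1 sample=A" and ">seq1 sample=B" would both yield the ID "seq1", so they count as duplicates even though the full header lines differ.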
The entropy file needs to be rebuilt each time
Versions & Bug Fixes
- 0.0.7: Add threading to sortmerna
- 0.0.8: Update
- 0.0.9: Add command line args to log
- 0.0.10: Fix bug in run_zeta_hunter where all args are expanded, not just filenames
- 0.0.11: Remove --force. Docker runs as root, so best to just take it out for now.
- 0.0.12: Fix Dockerfile.
- 0.0.13: Fix outdir not writable bug.
- 0.0.14: Remove self connections from Cytoscape unless the sample has only one OTU
- 0.0.15: Fix an error when a sample has no OTUs (this could happen if all of its sequences were flagged as not being Zetas and removed). Writing the OTU network edges file failed because the biom file has a column of zeros for that sample. The biom file remains unchanged, but ZH no longer tries to write any records for samples with an entire column of zeros in the biom file.