Hunt them Zetas!
Check the Wiki for
more info!


ZetaHunter is a command-line script designed to assign user-supplied
small subunit ribosomal RNA (SSU rRNA) gene sequences to OTUs defined
by a reference sequence database.

By default, ZetaHunter uses a curated database of full-length,
non-chimeric, Zetaproteobacteria SSU rRNA gene sequences derived from
arb SILVA (release 128) and Zetaproteobacteria genomes from JGI's
Integrated Microbial Genomes (IMG). OTU definitions are the same as
those suggested by McAllister et al. (2011) at 97% identity, with
novel OTUs discovered since that publication named ZetaOTU29 and
higher (curated OTUs only). Input files aligned with the arb SILVA SINA web
aligner are masked using the same 1282 bp mask used in McAllister et
al. (2011) to obtain reproducible OTU calls through closed-reference
OTU binning. User sequences that represent novel Zetaproteobacteria
OTUs are de novo binned into NewZetaOTUs, numbered by abundance.
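
As a rough illustration of "numbered by abundance", the sketch below names a set of de novo bins so that the largest bin gets the lowest number. The bin labels and sizes are made up, and starting at 1 and breaking ties by label are assumptions, not necessarily what ZetaHunter does.

```ruby
# Hypothetical de novo bins: arbitrary bin label => number of sequences.
bin_sizes = { "binX" => 3, "binY" => 12, "binZ" => 7 }

# Sort by descending size (ties broken by label, an assumption),
# then number the bins starting from 1 (also an assumption).
new_otu_names = bin_sizes
  .sort_by { |label, size| [-size, label] }
  .each_with_index
  .map { |(label, _size), i| [label, "NewZetaOTU#{i + 1}"] }
  .to_h

new_otu_names.each { |label, name| puts "#{label}\t#{name}" }
```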

OTU network analysis is a simple way to visualize the connectivity of
OTUs within a sample or environment type. ZetaHunter outputs
tab-delimited edge and node files for import into Cytoscape. The node file
contains the abundance information for each node. The edge file lists
pairs of OTUs found within the same sample (node1, node2, sample), which
is what allows the network to be visualized. Note: Samples with only one
ZetaOTU will contain a self-referential edge. Otherwise, only non-self
connections are shown.
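
To make the edge file layout concrete, here is a minimal sketch that builds (node1, node2, sample) rows from a table of which OTUs occur in which sample. The sample data and the `edge_rows` helper are hypothetical. Listing every unordered pair of OTUs per sample is an assumption about how the edges are enumerated.

```ruby
# Hypothetical input: sample name => OTUs observed in that sample.
samples = {
  "sampleA" => ["ZetaOTU1", "ZetaOTU2", "ZetaOTU9"],
  "sampleB" => ["ZetaOTU2"],
}

# Build [node1, node2, sample] rows (hypothetical helper).
def edge_rows(samples)
  samples.flat_map do |sample, otus|
    otus = otus.uniq.sort
    if otus.length == 1
      # A lone OTU gets a self-referential edge so the sample still shows up.
      [[otus.first, otus.first, sample]]
    else
      # Otherwise list each unordered pair of distinct OTUs once.
      otus.combination(2).map { |a, b| [a, b, sample] }
    end
  end
end

rows = edge_rows(samples)
rows.each { |row| puts row.join("\t") }
```

Here sampleA contributes three non-self edges, and sampleB, which has a single ZetaOTU, contributes one self-referential edge.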

ZetaHunter also supports user-provided curated OTU databases for
sequence OTU binning of any SINA-aligned SSU rRNA sequences.


  1. Stable SSU rRNA gene OTU binning to a curated database
  2. Supports import of multiple files for easy comparison of NewZetaOTUs across samples
  3. Database and sequence mask management options
  4. Multi-threaded processing
  5. Chimera checking
  6. Flags for sequences not related to the curated database (i.e., not Zetaproteobacteria)
  7. Cytoscape-compatible output files for OTU network analysis

Running ZetaHunter with Docker

If you don't have Docker, follow the instructions to install it here:

Note: On Windows, running ZetaHunter with Docker is the only
supported option.


After installing Docker, open the Launchpad and click the Docker

perl script, and change the permissions to executable. In this case,
it will be placed in the following directory ~/software/ZetaHunter.

$ mkdir -p ~/software/ZetaHunter
$ \curl "" > ~/software/ZetaHunter/run_zeta_hunter
$ chmod 755 ~/software/ZetaHunter/run_zeta_hunter

You can create a symbolic link to somewhere on your path so that you
can use the run_zeta_hunter command from any folder. Assuming that
you have /usr/local/bin on your path, you can use this command.

$ sudo ln -s $HOME/software/ZetaHunter/run_zeta_hunter /usr/local/bin

If you don't want to use a symbolic link, you can also move the program to your path directly.

$ sudo mv ~/software/ZetaHunter/run_zeta_hunter /usr/local/bin

Try it out! Running this command

$ run_zeta_hunter -h

will display the help banner.



Zetaproteobacteria database curation

Please cite

McAllister, S. M., R. E. Davis, J. M. McBeth, B. M. Tebo, D. Emerson, and C. L. Moyer. 2011. Biodiversity and emerging biogeography of the neutrophilic iron-oxidizing Zetaproteobacteria. Appl. Environ. Microbiol. 77:5445–5457. doi:10.1128/AEM.00533-11


External programs

ZetaHunter uses lots of other software internally. Please cite the



SILVA

Please cite

Quast, C., E. Pruesse, P. Yilmaz, J. Gerken, T. Schweer, P. Yarza, J. Peplies, and F. O. Glöckner. 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucl. Acids Res. 41(D1): D590-D596.

SINA Web-Aligner


Please cite

Pruesse, E., J. Peplies, and F. O. Glöckner. 2012. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 28:1823–1829.



mothur

Please cite

Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.



SortMeRNA

Please cite

Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.



UCHIME

Please cite

Edgar, R. C., B. J. Haas, J. C. Clemente, C. Quince, and R. Knight. 2011. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics, doi: 10.1093/bioinformatics/btr381


See Gemfile

Assets is from

NOTE: This file will be temporarily unzipped (requires 247 MB of
hard drive space) if chimera checking is turned on.

OTU Metadata

Lines beginning with # are considered comments.
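
A small sketch of the comment rule, assuming it is applied line by line. The tab-separated columns below are invented for illustration and do not describe the real metadata layout.

```ruby
# Hypothetical metadata text; the real column layout is not shown here.
metadata = <<~TEXT
  # this line is a comment
  seq1\tZetaOTU1
  # another comment
  seq2\tZetaOTU2
TEXT

# Keep only the lines that do not begin with "#".
data_lines = metadata.each_line.map(&:chomp).reject { |line| line.start_with?("#") }

data_lines.each { |line| puts line }
```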

Other info

Gap positions

base.match /[^ACTGUN]/i
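
Reading the expression above as the test applied to each base, any character other than A, C, T, G, U, or N (in either case) counts as a gap. That includes "-" and ".", and also IUPAC ambiguity codes such as R or Y. The sketch below applies it to a made-up aligned sequence. The `gap?` helper and the `GAP` constant are hypothetical names.

```ruby
# The same pattern as above, stored in a constant (hypothetical name).
GAP = /[^ACTGUN]/i

# True when a single character is treated as a gap (hypothetical helper).
def gap?(base)
  !base.match(GAP).nil?
end

aligned = "AC-g.NnuT" # made-up aligned sequence

# Zero-based indices of the positions that count as gaps.
gap_positions = (0...aligned.length).select { |i| gap?(aligned[i]) }

puts gap_positions.inspect
```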

Sequence headers

Headers are split on " " (space) characters. The first field is taken
as the sequence ID, and it must be unique.
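
The sketch below shows how a sequence ID could be derived from a header under this rule, and how duplicate IDs could be spotted before a run. The headers and the `sequence_id` helper are hypothetical. Stripping a leading ">" is an assumption about the header form.

```ruby
# Take the first space-separated field as the ID (hypothetical helper).
# A leading ">" is removed first, assuming FASTA-style headers.
def sequence_id(header)
  header.sub(/\A>/, "").split(" ").first
end

headers = [
  ">seq1 Mariprofundus ferrooxydans",
  ">seq2 uncultured bacterium",
  ">seq1 duplicate entry",
]

ids = headers.map { |h| sequence_id(h) }

# IDs that occur more than once would violate the uniqueness requirement.
duplicates = ids.select { |id| ids.count(id) > 1 }.uniq

puts "duplicate IDs: #{duplicates.join(', ')}" unless duplicates.empty?
```

Because only the first field is used, two headers that differ only after the first space share the same ID.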


The entropy file needs to be rebuilt each time db_seqs.fa is

