CNIDARIA: fast, reference-free phylogenomic clustering

Motivation: Identification of biological specimens is a major requirement
for a range of applications. Reference-free methods analyse unprocessed
sequencing data without relying on prior knowledge, but they do not scale
to arbitrarily large genomes and arbitrarily large phylogenetic distances.

Results: We present Cnidaria, a practical tool for clustering genomic and
transcriptomic data with no limitation on genome size or phylogenetic
distance. We simultaneously clustered 169 genomic and transcriptomic
datasets from 4 kingdoms, achieving 100% accuracy at the supra-species
level and 78% accuracy at the species level.

Availability and Implementation: Cnidaria is written in C++ and Python and
is available at



