Short Description
Support for parsing wnd-charm, avro, hdf5, and more.
Full Description



Pydoop-features is a suite of tools for extracting features from image
data. It uses Bio-Formats to read image data, Avro for
(de)serialization and WND-CHARM for feature calculation.


The fastest way to get a working installation is to pull the
Docker image:

docker pull imagedata/pyfeatures

Java-Python interoperability is achieved via Avro. The input dataset
can be in any format supported by Bio-Formats. For instance, download
a sample dataset containing MF-2CH-Z-T.tif and unpack it under /tmp.
The first step is to serialize this data to Avro:

docker run -u ${UID} --rm -v /tmp:/tmp imagedata/pyfeatures \
  serialize /tmp/MF-2CH-Z-T.tif -o /tmp/

You should get one Avro container file per image series in the input
dataset. In this case, the first one is /tmp/MF-2CH-Z-T_0.avro.

To compute features for the first Avro container:

docker run -u ${UID} --rm -v /tmp:/tmp imagedata/pyfeatures \
  calc /tmp/MF-2CH-Z-T_0.avro -o /tmp/

You might want to get a cup of coffee: feature calculation takes time.
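
Serialization can produce several containers, one per image series, so the
calc step may need to be repeated. The following is a minimal Python sketch,
not part of pyfeatures itself, that builds the same docker command as above
for each container. The helper names (pending_containers, calc_command,
calc_all) and the default glob pattern are assumptions made for illustration,
and os.getuid() stands in for ${UID} on Unix-like systems.

```python
import glob
import os
import subprocess

IMAGE = "imagedata/pyfeatures"


def pending_containers(paths):
    """Keep serialized containers, skipping *_features.avro outputs."""
    return sorted(p for p in paths if not p.endswith("_features.avro"))


def calc_command(avro_path, out_dir="/tmp/"):
    """Build the `docker run ... calc` argument list for one container."""
    return [
        "docker", "run", "-u", str(os.getuid()), "--rm",
        "-v", "/tmp:/tmp", IMAGE,
        "calc", avro_path, "-o", out_dir,
    ]


def calc_all(pattern="/tmp/MF-2CH-Z-T_*.avro"):
    """Run feature calculation on every container matching `pattern`."""
    for path in pending_containers(glob.glob(pattern)):
        # check_call raises CalledProcessError if docker exits non-zero
        subprocess.check_call(calc_command(path))
```

Calling calc_all() would then process the containers one after another;
nothing is executed until that call is made.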

When the calc command finishes, you should have the file
/tmp/MF-2CH-Z-T_0_features.avro, which can be read from either Java
or Python. For instance:

>>> from avro.datafile import DataFileReader
>>> from avro.io import DatumReader
>>> # Avro container files are binary, so open in "rb" mode
>>> with open("/tmp/MF-2CH-Z-T_0_features.avro", "rb") as f:
...     reader = DataFileReader(f, DatumReader())
...     records = list(reader)
>>> len(records)
>>> r = records[0]
>>> r['haralick_textures']
[0.0015474594757607179, 0.00029323128834782644, ...]
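
The example above shows only the haralick_textures field; the other fields
of a record are not listed here. A small sketch for exploring them, assuming
records is the list of dicts loaded above (describe_record and feature_column
are hypothetical helper names, not part of pyfeatures):

```python
def describe_record(record):
    """Map each field name to a short description of its value."""
    summary = {}
    for name, value in record.items():
        if isinstance(value, (list, tuple)):
            # Feature vectors are decoded by Avro as Python lists
            summary[name] = "sequence of length %d" % len(value)
        else:
            summary[name] = type(value).__name__
    return summary


def feature_column(records, name):
    """Collect one named field from every record, in file order."""
    return [r[name] for r in records]
```

describe_record(records[0]) gives an overview of the available fields, and
feature_column(records, 'haralick_textures') collects that vector from every
record.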