image-match is a simple (now Python 3!) package for finding approximate image matches from a
corpus. It is similar, for instance, to pHash, but
includes a database backend that easily scales to billions of images and
supports sustained high rates of image insertion: up to 10,000 images/s on our

PLEASE NOTE: This algorithm is intended to find near-duplicate images -- think copyright
violation detection. It is NOT intended to find images that are conceptually similar.
For more explanation, see this issue or

Based on the paper "An image signature for any kind of image" by Wong et al.
There is an existing reference implementation, which may be more suited to your needs.
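
To give a rough feel for what "nearly duplicate" means in practice, here is a minimal sketch of comparing two image signatures with a normalized distance. This is a toy illustration, not image-match's own code: the function name `normalized_distance`, the formula ||a - b|| / (||a|| + ||b||), and the short made-up signature vectors (small integers in the range -2..2) are all assumptions chosen for the example. Real signatures are much longer and are computed from the image itself.

```python
import math


def normalized_distance(a, b):
    """Return ||a - b|| / (||a|| + ||b||) for two equal-length vectors.

    Hypothetical helper for illustration only. The result is 0.0 for
    identical vectors and approaches 1.0 for very different ones.
    """
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    norm_diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    norm_sum = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    if norm_sum == 0:
        return 0.0  # both vectors are all zeros: treat them as identical
    return norm_diff / norm_sum


# Made-up signatures (assumption: not produced by any real image).
original = [2, -1, 0, 1, -2, 0, 1, 1]
near_copy = [2, -1, 0, 1, -2, 0, 1, 0]      # differs in a single position
unrelated = [-2, 1, 1, -1, 2, 0, -1, -2]    # differs almost everywhere

print(normalized_distance(original, near_copy))
print(normalized_distance(original, unrelated))
```

The near copy scores close to 0 and the unrelated vector scores close to 1. A small distance only says the two images are almost the same picture; it says nothing about whether they show conceptually similar things.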
Once you're up and running, read these two (short) sections of the documentation to get a feel
for what image-match is capable of: