.. epigraph::

  Truth cannot penetrate a closed mind. If all places in the universe are in
  the Aleph, then all stars, all lamps, all sources of light are in it, too.

  -- The Aleph, Jorge Luis Borges

Aleph is a tool for indexing large amounts of both unstructured (PDF, Word,
HTML) and structured (CSV, XLS, SQL) data for easy browsing and search. It is
built with investigative reporting as a primary use case. Aleph allows
cross-referencing mentions of well-known entities (such as people and
companies) against watchlists, e.g. from prior research or public datasets.

Here are some key features:

  • Web-based UI for search across large document and data sets.
  • Watchlist editor for making custom sets of entities to be tracked.
  • Equal support for structured (i.e. tabular) and unstructured (i.e. textual)
    data.
  • Importers include a local filesystem traverser, web crawlers and a SQL query
    importer.
  • Document entity tagger (regular expressions-based, and optionally using NLP).
  • Support for OCR, unpacking Zip/RAR/Tarballs, language and encoding detection.
  • Entity watchlist importers for OpenNames and Investigative Dashboard.
  • OAuth authorization and access control on a per-source and per-watchlist
    basis.
  • Excel export for search result sets.
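
To make the regular-expression tagging and watchlist cross-referencing listed
above more concrete, here is a minimal sketch in Python. It is not Aleph's
implementation: the ``WATCHLIST`` structure and the ``compile_watchlist`` and
``tag_document`` helpers are hypothetical names chosen for illustration, and
the sketch assumes each watchlist entity is simply a name with a list of
aliases.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical watchlist (an assumption, not Aleph's data model):
# canonical entity name -> aliases to look for in document text.
WATCHLIST: Dict[str, List[str]] = {
    "Jane Doe": ["Jane Doe", "J. Doe"],
    "Acme Holdings": ["Acme Holdings", "Acme Holdings Ltd"],
}


def compile_watchlist(watchlist: Dict[str, List[str]]) -> Dict[str, Pattern[str]]:
    """Build one case-insensitive pattern per entity from its aliases."""
    patterns = {}
    for entity, aliases in watchlist.items():
        # Try longer aliases first so "Acme Holdings Ltd" is preferred
        # over its prefix "Acme Holdings".
        ordered = sorted(aliases, key=len, reverse=True)
        alternation = "|".join(re.escape(alias) for alias in ordered)
        # The lookarounds keep matches from starting or ending mid-word.
        patterns[entity] = re.compile(
            r"(?<!\w)(?:%s)(?!\w)" % alternation, re.IGNORECASE
        )
    return patterns


def tag_document(text: str, patterns: Dict[str, Pattern[str]]) -> Dict[str, int]:
    """Return each matched entity with its number of mentions in the text."""
    counts = {}
    for entity, pattern in patterns.items():
        mentions = len(pattern.findall(text))
        if mentions:
            counts[entity] = mentions
    return counts


if __name__ == "__main__":
    sample = "Acme Holdings Ltd paid J. Doe; later Jane Doe left ACME HOLDINGS."
    print(tag_document(sample, compile_watchlist(WATCHLIST)))
```

Compiling one pattern per entity keeps the mapping from a match back to its
watchlist entry trivial. A very large watchlist would likely need a different
matching strategy.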


The documentation for Aleph is available online. Its source files live in the
docs folder; feel free to edit them and send pull requests for improvements.

To build the documentation, first install the dependencies, then run
``make docs``::

  (host)$ docker-compose run app bash
  (app) $ pip install -r requirements-docs.txt
  (app) $ make docs

To browse the documentation locally, run::

  (host)$ make docs-web


Aleph is used by multiple organisations, including Code for Africa, OCCRP and
OpenOil. For coordination, use the aleph-search mailing list.

If you find any errors or issues using Aleph, please file an issue on GitHub
or contact the mailing list.
