Input: a PDF or image file. Output: the full text of that file.
We will use OpenCV, Tesseract, and other ML APIs as needed. This bot should be able to get text out of images. The specific goal is to extract text from machine-printed documents.
We are not yet sure what the output should look like. Plain full text, like the Fulltextbot gives us? A PDF document with the OCR text appearing in place over the scan? Something else?
docker pull menome/ocrbot