
Last pushed: 9 months ago
Short Description
Converts a PDF file into a text file while keeping the layout of the original PDF.
Full Description

PDFLayoutTextStripper as a Docker container command-line utility

Converts a PDF file into a text file while keeping the layout of the original PDF. This is useful for extracting content from a table or a form in a PDF file. PDFLayoutTextStripper is a subclass of the PDFTextStripper class (from the Apache PDFBox library).

  • Use cases
  • How to use

Use cases

Data extraction from a table in a PDF file

Data extraction from a form in a PDF file

How to use

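# Option 1: build the image yourself (run from the directory that contains the Dockerfile)
docker build -t pdf-layout-text-stripper .
# Mount the current directory at /app so the container can read sample.pdf
docker run -v "$(pwd)":/app pdf-layout-text-stripper "sample.pdf"

# Option 2: use the prebuilt image from Docker Hub
docker run -v "$(pwd)":/app madnight/pdf-layout-text-stripper "sample.pdf"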
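
To convert several files at once, the single-file command can be wrapped in a loop. The following is a minimal sketch, not part of the image: `convert_all` is a hypothetical helper name, and it assumes the container prints the extracted text to standard output, which this page does not confirm. If the utility writes its result elsewhere, drop the redirection.

```shell
#!/bin/sh
# Sketch: convert every PDF in a directory with the prebuilt image.
# Assumption: the container prints the extracted text to stdout.
# convert_all is a hypothetical helper name, not something the image provides.
convert_all() {
  # Resolve an absolute path, since the bind mount needs one.
  dir=$(cd "${1:-.}" && pwd) || return 1
  for pdf in "$dir"/*.pdf; do
    # Skip the unexpanded pattern when the directory holds no PDFs.
    [ -e "$pdf" ] || continue
    name=$(basename "$pdf")
    # Mount the directory at /app and pass the file name, as in the
    # single-file commands; --rm removes the container after it exits.
    docker run --rm -v "$dir":/app madnight/pdf-layout-text-stripper "$name" \
      > "${pdf%.pdf}.txt"
  done
}
```

Calling `convert_all ./invoices` (the directory name is only an example) would then leave a `.txt` file next to each PDF in that directory.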
Owner
madnight