apache/tika

By The Apache Software Foundation

Updated 23 days ago

Container images for Apache Tika Server (https://github.com/apache/tika-docker)

Apache Tika Server

This repo contains convenience Docker images published by the Apache Tika Dev team for Apache Tika Server.

The images are built using the Dockerfiles in the tika-docker repository. Each image provides a functional Apache Tika Server instance on the latest Ubuntu, running the appropriate version's server on port 9998 using Java 8 (until version 1.20), Java 11 (1.21 and 1.24.1), Java 14 (until 1.27/2.0.0), and Java 16 for newer versions.

There is a minimal version, which contains only Apache Tika and its core dependencies, and a full version, which also includes dependencies for the GDAL and Tesseract OCR parsers. To balance functionality against image size, the full image currently installs the language packs for the following languages:

  • English
  • French
  • German
  • Italian
  • Spanish

To install more languages, update the apt-get command in the Dockerfile to include the package for the language you require, or include your own custom packs using an ADD command.
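As a rough illustration of which packages to add, the sketch below turns language codes into package names. It assumes the usual Ubuntu naming scheme for Tesseract language packs, tesseract-ocr-<code>; the helper name tesseract_packages is hypothetical, and the codes nld (Dutch) and por (Portuguese) are only examples.

```shell
# Hypothetical helper: map Tesseract language codes to Ubuntu package names.
# Assumes the tesseract-ocr-<code> naming scheme used by Ubuntu packages.
tesseract_packages() {
  pkgs=""
  for code in "$@"; do
    # Append each package name, separated by a single space
    pkgs="${pkgs:+$pkgs }tesseract-ocr-$code"
  done
  printf '%s\n' "$pkgs"
}

# Print the package names to append to the apt-get install command
tesseract_packages nld por
```

The printed names can then be appended to the apt-get install line in the Dockerfile before rebuilding the image.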

Usage

You can pull down the version you would like using:

docker pull apache/tika:<version>

Then to run the container, execute the following command:

docker run -d -p 127.0.0.1:9998:9998 apache/tika:<version>

Here <version> is the Apache Tika Server version, e.g. 2.5.0 for the minimal image or 2.5.0-full for the full image.
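Once the container is running, the server can be exercised over HTTP. The following is a minimal sketch, assuming curl is installed and the server is reachable through the localhost binding used in the run command. The function names and the TIKA_URL variable are chosen for this illustration and are not part of the image. The sketch uses the Tika Server /version and /tika endpoints.

```shell
# Base URL of the running server; TIKA_URL is a name chosen for this sketch.
TIKA_URL="${TIKA_URL:-http://127.0.0.1:9998}"

# Print the version string reported by the server.
tika_version() {
  curl -sS "$TIKA_URL/version"
}

# Upload a file with HTTP PUT and print its extracted plain text.
tika_text() {
  curl -sS -T "$1" -H "Accept: text/plain" "$TIKA_URL/tika"
}
```

With these defined, tika_version prints the server version, and tika_text report.pdf (a hypothetical file name) prints the text extracted from that document.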

Note

The examples above bind the server to localhost. We recommend this because Docker alters iptables and may expose your tika-server to the internet. If you are confident that your tika-server is on an isolated network, you can simply run:

docker run -d -p 9998:9998 apache/tika:<version>
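To confirm which form of binding a running container has, the output of docker port can be inspected. The sketch below classifies one line of that output, which has the form "9998/tcp -> 127.0.0.1:9998". The helper name is_loopback_binding is hypothetical, and the sample lines are written out by hand rather than captured from a real container.

```shell
# Succeeds when a line of `docker port` output shows a loopback-only binding.
# is_loopback_binding is a name chosen for this sketch.
is_loopback_binding() {
  case "$1" in
    *"-> 127."*) return 0 ;;    # IPv4 loopback, e.g. 127.0.0.1:9998
    *"-> [::1]:"*) return 0 ;;  # IPv6 loopback
    *) return 1 ;;              # anything else, e.g. 0.0.0.0:9998
  esac
}

# Hand-written sample lines in the format printed by `docker port`
is_loopback_binding "9998/tcp -> 127.0.0.1:9998" && echo "loopback only"
is_loopback_binding "9998/tcp -> 0.0.0.0:9998" || echo "reachable from other hosts"
```

In practice, each line printed by docker port for the container can be passed to the helper in the same way.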

CHANGES

For a full list of changes since 2.5.0.1, please visit CHANGES.md.

Building

To build the image from scratch, simply invoke:

docker build -t 'apache/tika' github.com/apache/tika-docker

You can then run the image with the following command, using the name you gave to the -t option of the build command:

docker run -d -p 127.0.0.1:9998:9998 apache/tika

More Information

For more information on Apache Tika Server, go to the Apache Tika Server documentation.

For more information on Apache Tika, go to the official Apache Tika project website.

For more information on the Apache Software Foundation, go to the Apache Software Foundation website.

Authors

Apache Tika Dev Team (dev@tika.apache.org)

Contributors

There have been a range of contributors on GitHub and via suggestions, including:

Disclaimer

It is worth noting that whilst these Docker images download the binary JARs published by the Apache Tika Team on the Apache Software Foundation distribution sites, only the source release of an Apache Software Foundation project is an official release artefact. See Release Distribution Policy for more details.
