jasonacox/docman

By jasonacox

Updated 5 months ago

TinyLLM Web Based RAG Document Manager

With the Document Manager, we explore uploading documents to a vector database for use in retrieval-augmented generation (RAG), allowing a Chatbot to produce answers grounded in knowledge that we provide.

Document Manager (Weaviate)

The Document Manager lets you manage the collections and documents in the Weaviate vector database. It provides an easy way to upload and ingest content from files or URLs, and it performs simple chunking if requested. The simple UI lets you navigate through the collections and documents.
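The chunking step mentioned above can be illustrated with a minimal sketch. This is not DocMan's actual implementation, which this page does not describe. The function name simple_chunk is hypothetical, and the sketch counts characters rather than bytes, which differs for non-ASCII text.

```python
from typing import List


def simple_chunk(text: str, max_chunk_size: int = 1024) -> List[str]:
    """Split text into chunks of at most max_chunk_size characters.

    Hypothetical helper: breaks on whitespace where possible and
    hard-splits any single word that exceeds the limit on its own.
    """
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")

    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A word longer than the limit cannot fit in any chunk, so cut it up.
        while len(word) > max_chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chunk_size])
            word = word[max_chunk_size:]

        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chunk_size:
            current = candidate
        else:
            # Adding this word would overflow, so close the current chunk.
            chunks.append(current)
            current = word

    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be stored as a separate entry in the vector database, so a smaller limit yields more, shorter entries.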

Environment Variables
  • MAX_CHUNK_SIZE: Maximum size of a chunk in bytes (default 1024)
  • UPLOAD_FOLDER: Folder where uploaded files are stored (default uploads)
  • HOST: Weaviate host (default localhost)
  • COLLECTIONS: Comma-separated list of collections allowed (default all)
  • PORT: Port for the web server (default 8000)
  • COLLECTIONS_ADMIN: Allow users to create and delete collections (default True)
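As a rough illustration of how an application might read the variables listed above, here is a minimal sketch. The Settings class and load_settings function are hypothetical names, not part of DocMan. Treating "all" as "no restriction" and accepting "true", "1", or "yes" as truthy are assumptions. Note that the docker run example on this page sets WEAVIATE_-prefixed names, which this list does not document.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class Settings:
    max_chunk_size: int
    upload_folder: str
    host: str
    collections: Optional[List[str]]  # None means every collection is allowed
    port: int
    collections_admin: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, using the listed defaults."""
    env = os.environ if env is None else env

    raw_collections = env.get("COLLECTIONS", "all").strip()
    if raw_collections.lower() == "all" or not raw_collections:
        collections = None  # assumption: "all" disables the allow-list
    else:
        collections = [c.strip() for c in raw_collections.split(",") if c.strip()]

    admin_flag = env.get("COLLECTIONS_ADMIN", "True").strip().lower()

    return Settings(
        max_chunk_size=int(env.get("MAX_CHUNK_SIZE", "1024")),
        upload_folder=env.get("UPLOAD_FOLDER", "uploads"),
        host=env.get("HOST", "localhost"),
        collections=collections,
        port=int(env.get("PORT", "8000")),
        collections_admin=admin_flag in ("true", "1", "yes"),
    )
```

Passing a plain dictionary instead of os.environ, for example load_settings({"COLLECTIONS": "records,blog"}), makes the parsing easy to check without touching the real environment.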
Docker Setup

The Document Manager uses a vector database to store the uploaded content. Set up the Weaviate vector database using Docker Compose and the included docker-compose.yml file.

# Setup and run Weaviate vector database on port 8080

docker compose up -d
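Before starting the Document Manager, it can help to confirm that Weaviate is accepting requests. The sketch below polls Weaviate's readiness endpoint, /v1/.well-known/ready, using only the Python standard library. The function names are hypothetical, and the host and port assume the compose setup above exposes Weaviate on localhost:8080.

```python
import urllib.request


def weaviate_ready_url(host: str = "localhost", port: int = 8080) -> str:
    """Return the URL of Weaviate's readiness probe."""
    return f"http://{host}:{port}/v1/.well-known/ready"


def is_weaviate_ready(host: str = "localhost", port: int = 8080,
                      timeout: float = 2.0) -> bool:
    """Return True if the readiness probe answers with HTTP 200."""
    try:
        with urllib.request.urlopen(weaviate_ready_url(host, port),
                                    timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    print("Weaviate ready:", is_weaviate_ready())
```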

To run the Document Manager, use the following command, adjusting the values as needed. Once running, the Document Manager will be available at http://localhost:5001

docker run \
    -d \
    -p 5001:5001 \
    -e PORT="5001" \
    -e WEAVIATE_HOST="localhost" \
    -e WEAVIATE_GRPC_HOST="localhost" \
    -e WEAVIATE_PORT="8080" \
    -e WEAVIATE_GRPC_PORT="50051" \
    -e MAX_CHUNK_SIZE="1024" \
    -e UPLOAD_FOLDER="uploads" \
    -e COLLECTIONS_ADMIN="true" \
    --name docman \
    --restart unless-stopped \
    jasonacox/docman

Note - You can restrict the available collections by setting the COLLECTIONS environment variable to a comma-separated list of collection names.

Usage

You can now create collections (libraries of content) and upload files and URLs to be stored into the vector database for the Chatbot to reference.

The Chatbot can use this information when you send it the /rag prompt command:

# Usage: /rag {library} {opt:number} {prompt}

# Examples:
/rag records How much did we donate to charity in 2022?
/rag blog 5 List some facts about solar energy.
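To make the command format concrete, here is a minimal sketch of how a message in the form /rag {library} {opt:number} {prompt} could be parsed. It is not the Chatbot's actual parser. The names RagCommand and parse_rag_command are hypothetical, and the meaning of the optional number (assumed here to be a result count) is not stated on this page.

```python
from typing import NamedTuple, Optional


class RagCommand(NamedTuple):
    library: str
    number: Optional[int]  # optional numeric argument, None when omitted
    prompt: str


def parse_rag_command(message: str) -> Optional[RagCommand]:
    """Parse '/rag {library} {opt:number} {prompt}', or return None."""
    parts = message.strip().split(maxsplit=2)
    if len(parts) < 3 or parts[0] != "/rag":
        return None

    library, rest = parts[1], parts[2]
    first, _, remainder = rest.partition(" ")

    # Treat a leading integer as the optional number only when a prompt follows.
    if first.isdecimal() and remainder.strip():
        return RagCommand(library, int(first), remainder.strip())
    return RagCommand(library, None, rest)
```

With the two examples above, the first yields library "records" with no number, and the second yields library "blog" with number 5.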

Docker Pull Command

docker pull jasonacox/docman