ai/chat-demo-backend

FastAPI-based backend service for AI-driven text and chat generation with an Ollama model server.

Text Generation API

This Docker image provides a FastAPI-based backend service for AI-driven text and chat generation, built on python:3.12-slim. It integrates with an Ollama model server to support real-time streaming responses, text generation, and chat interactions.

This image is a component of the full AI Chat Application Demo (ai/chat-demo). More information about how to run the whole demo can be found on the ai/chat-demo image page.

Features

  • Text Generation: Generate AI responses from user-provided prompts.
  • Chat Streaming: Supports streaming chat responses with customizable models and parameters.
  • Model Management: List available models from the Ollama server.

Prerequisites

  • Docker: Ensure Docker is installed on your system.
  • Environment Variable:
    • MODEL_HOST: The URL for the Ollama model server (default is http://ollama:11434).

Quick Start

  1. Pull the Backend Image

    docker pull ai/chat-demo-backend:latest
    
  2. Run the Model Server

    Ensure the Ollama model server is running before starting the backend container. If you haven’t set it up yet, you can run it with:

    docker run -e MODEL=mistral:latest -p 11434:11434 ai/chat-demo-model:latest
    

    Note: You can replace mistral:latest with any other compatible model name.

  3. Run the Backend Container

    After the model server is running, start the backend container:

    docker run -e MODEL_HOST=http://ollama:11434 -p 8000:8000 ai/chat-demo-backend:latest
    

    This command starts the backend API, which will be accessible at http://localhost:8000. Note that for the backend to reach the model server by the ollama hostname, both containers typically need to share a user-defined Docker network; a minimal sketch follows this list.

  4. Access the API

    You can now use the backend API endpoints for text generation, chat interaction, and model listing.
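
A minimal network setup, for reference: on Docker's default bridge network, containers cannot resolve each other by name, so the backend's default MODEL_HOST of http://ollama:11434 only works if the model server is reachable as ollama. The sketch below assumes a user-defined network named chat-demo and container names ollama and chat-backend, all of which are arbitrary choices, not names required by the images:

    # Create a shared network (name is arbitrary)
    docker network create chat-demo

    # Run the model server on that network; its container name "ollama"
    # makes it resolvable as http://ollama:11434 from other containers
    docker run -d --name ollama --network chat-demo \
      -e MODEL=mistral:latest -p 11434:11434 ai/chat-demo-model:latest

    # Run the backend on the same network, pointing at the model server
    docker run -d --name chat-backend --network chat-demo \
      -e MODEL_HOST=http://ollama:11434 -p 8000:8000 ai/chat-demo-backend:latest

    # Verify the backend is up
    curl http://localhost:8000/health

If everything is running, the health check should return {"status": "healthy"}, as described under Endpoints below.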

Endpoints

1. Health Check
  • Endpoint: /health
  • Method: GET
  • Description: Checks if the API is running and returns {"status": "healthy"} if successful.
2. Text Generation
  • Endpoint: /api/v1/generate
  • Method: POST
  • Description: Generates text based on a given prompt.
  • Request Body:
    • prompt (string): The input text for generation.
    • max_tokens (int, default=500): Maximum number of tokens for the response.
    • temperature (float, default=0.7): Controls creativity of the response.
  • Response:
    • text (string): The generated response text.
  • Example request body (see also the curl examples after this list):
    {
      "prompt": "What is a Dockerfile?",
      "max_tokens": 100,
      "temperature": 0.7
    }
    
3. Chat Interaction
  • Endpoint: /api/v1/chat
  • Method: POST
  • Description: Provides chat-based responses.
  • Request Body:
    • messages (list of ChatMessage): Array of message objects, each with role (e.g., "user") and content.
    • model (string, default="mistral"): Model to use.
    • temperature (float, default=0.7): Response creativity.
  • Response:
    • message (ChatMessage): The assistant’s response.
    • created_at (string): Timestamp of response creation.
  • Example request body (see also the curl examples after this list):
    {
      "messages": [
        {"role": "user", "content": "How do I write a Python function?"}
      ],
      "model": "mistral",
      "temperature": 0.7
    }
    
4. Model Listing
  • Endpoint: /api/v1/models
  • Method: GET
  • Description: Lists available models on the Ollama server.
  • Response:
    • models (array of ModelInfo): Array containing details about each model.
5. Streaming Chat
  • Endpoint: /api/v1/chat/stream
  • Method: POST
  • Description: Streams chat responses in real time, particularly suited for coding and technical interactions.
  • Usage: Especially useful for interactive applications that require immediate feedback; see the streaming example after this list.
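
Example requests: with the backend published on port 8000 as in the Quick Start, the generation and chat endpoints can be exercised with curl using the request bodies shown above:

    # Text generation
    curl -X POST http://localhost:8000/api/v1/generate \
      -H "Content-Type: application/json" \
      -d '{"prompt": "What is a Dockerfile?", "max_tokens": 100, "temperature": 0.7}'

    # Chat interaction
    curl -X POST http://localhost:8000/api/v1/chat \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "How do I write a Python function?"}], "model": "mistral", "temperature": 0.7}'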
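
Model listing and streaming chat can be called the same way. The sketch below assumes the streaming endpoint accepts the same request body as /api/v1/chat; the exact wire format of the streamed chunks is not documented here, so inspect the raw output:

    # List available models
    curl http://localhost:8000/api/v1/models

    # Streaming chat; -N disables curl's output buffering so chunks print as they arrive
    curl -N -X POST http://localhost:8000/api/v1/chat/stream \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "Explain Docker layers."}], "model": "mistral", "temperature": 0.7}'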

Environment Variables

  • MODEL_HOST: URL of the model server (default is http://ollama:11434).

Logging and Debugging

  • The backend logs requests and responses to assist in debugging and performance monitoring.
  • Logs are printed to the console with a default level of INFO.
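
Because logs go to the console, they can be followed with standard Docker tooling. For example, assuming the container was started with --name chat-backend as in the network sketch above:

    docker logs -f chat-backend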

Docker Pull Command

docker pull ai/chat-demo-backend