ai/chat-demo-model

Runtime environment for AI models deployed with Ollama, based on ollama/ollama:0.4.0-rc8

Chat Demo Ollama Model Runtime

This Docker image is based on ollama/ollama:0.4.0-rc8 and serves as a runtime environment for AI models deployed with Ollama. The image supports real-time streaming chat responses and is pre-configured with tools for API health checks and model management.

This image is a component of the full AI Chat Application Demo, ai/chat-demo. More information about how to run the whole demo can be found on the ai/chat-demo image page.

Features

  • Automatic Model Loading: Checks whether the specified model is available locally and pulls it if not (see the sketch after this list).
  • Real-Time Streaming: Designed to work with real-time, streaming responses for interactive applications.
  • Code Assistance: Optimized for technical and coding-related interactions, responding to requests in a concise and example-driven manner.
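
As a rough illustration of the automatic model loading, the sketch below performs the equivalent check through Ollama's HTTP API: it lists the locally stored models via /api/tags and triggers a download via /api/pull when the requested model is missing. This is not the image's actual startup script; the host, port, and model name are assumptions matching the defaults used elsewhere on this page.

    import httpx

    OLLAMA_HOST = "http://localhost:11434"  # assumption: Ollama's default port, as published by this image
    MODEL = "mistral:latest"                # assumption: the same value passed via the MODEL environment variable

    def ensure_model_available() -> None:
        """Pull MODEL through the Ollama API if it is not already present locally."""
        # /api/tags lists the models currently stored on the server.
        tags = httpx.get(f"{OLLAMA_HOST}/api/tags").json()
        local_models = {m["name"] for m in tags.get("models", [])}

        if MODEL in local_models:
            print(f"{MODEL} is already available")
            return

        # /api/pull downloads the model; streaming the response yields progress updates line by line.
        with httpx.stream("POST", f"{OLLAMA_HOST}/api/pull", json={"name": MODEL}, timeout=None) as response:
            for line in response.iter_lines():
                print(line)  # each line is a JSON status message from Ollama

    if __name__ == "__main__":
        ensure_model_available()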

Requirements

  • Docker: Ensure Docker is installed on your system.
  • Disk Space: Allocate at least 4GB of disk space to store models.
  • Environment Variables:
    • MODEL: Specify the model to load, e.g., mistral:latest.

Quick Start

  1. Run the Container

    Use MODEL to specify the AI model (default is mistral:latest):

    docker run -e MODEL=mistral:latest -p 11434:11434 ai/chat-demo-model:latest
    

    The container automatically pulls and loads the specified model if it’s not already available.

  2. Health Check and Model Management

    The container includes a health check that waits for the Ollama server to start and verifies that the specified model is present. If the model isn’t found, it is pulled automatically.
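
If you want to confirm readiness from the host before sending requests (similar to what the built-in health check waits for), a minimal polling loop might look like the sketch below. It assumes the container was started with the docker run command above, so the server is reachable on localhost:11434.

    import time

    import httpx

    OLLAMA_HOST = "http://localhost:11434"  # assumption: the port published by the docker run command above

    def wait_for_ollama(timeout: float = 60.0) -> None:
        """Poll the Ollama server until it responds, or give up after `timeout` seconds."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                # The root endpoint replies with "Ollama is running" once the server is up.
                if httpx.get(OLLAMA_HOST).status_code == 200:
                    print("Ollama server is ready")
                    return
            except httpx.TransportError:
                pass  # the server is not accepting connections yet
            time.sleep(1)
        raise TimeoutError("Ollama server did not become ready in time")

    wait_for_ollama()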

API Usage

The image is designed to integrate with applications that communicate with Ollama’s API for real-time AI interactions.

Example Endpoint
  • Endpoint: /chat/stream

  • Description: Streams chat responses in real time. Each message is concise and focused on coding and technical topics.

    Example Request (from a FastAPI backend, using httpx)
    import asyncio

    import httpx

    async def chat_stream():
        model_host = "http://localhost:11434"
        request_data = {
            "model": "mistral",
            "messages": [{"role": "user", "content": "How can I create a Dockerfile for a FastAPI app?"}],
            "stream": True,
            "options": {"temperature": 0.7},
        }

        # Disable the default read timeout so long generations are not cut off mid-stream.
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", f"{model_host}/api/chat", json=request_data) as response:
                async for line in response.aiter_lines():
                    print(line)  # each line is one JSON chunk of the streaming response

    asyncio.run(chat_stream())
    

    Note: The Ollama model is optimized for code and technical responses, with concise answers and practical examples.
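
    For context, the sketch below shows one way a FastAPI backend could expose the /chat/stream endpoint named above: it forwards the incoming messages to Ollama's /api/chat and re-streams the reply. This is an assumption about how such an application might be wired, not the demo's actual source; OLLAMA_HOST and MODEL_NAME are placeholder values.

    import json

    import httpx
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    OLLAMA_HOST = "http://localhost:11434"  # assumption: Ollama reachable on its default port
    MODEL_NAME = "mistral"                  # assumption: matches the MODEL environment variable

    @app.post("/chat/stream")
    async def chat_stream(payload: dict):
        """Forward the user's messages to Ollama and re-stream the reply as plain text."""
        request_data = {
            "model": MODEL_NAME,
            "messages": payload.get("messages", []),
            "stream": True,
        }

        async def token_stream():
            # Keep the HTTP client open for the lifetime of the stream.
            async with httpx.AsyncClient(timeout=None) as client:
                async with client.stream("POST", f"{OLLAMA_HOST}/api/chat", json=request_data) as response:
                    async for line in response.aiter_lines():
                        if not line:
                            continue
                        chunk = json.loads(line)  # each streamed line is a complete JSON object
                        content = chunk.get("message", {}).get("content", "")
                        if content:
                            yield content
                        if chunk.get("done"):
                            break

        return StreamingResponse(token_stream(), media_type="text/plain")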

Environment Variables

  • MODEL: Model name to use (default is mistral:latest), e.g., MODEL=llama3:latest.

Troubleshooting

  • Model Not Found: Ensure the MODEL environment variable is set correctly. The startup script will pull the model if it's unavailable.
  • API Connection: Verify the container is running and accessible on http://localhost:11434.

Docker Pull Command

docker pull ai/chat-demo-model