ai/chat-demo-backend
FastAPI-based backend service for AI-driven text and chat generation with Ollama model server.
This Docker image provides a FastAPI-based backend service for AI-driven text and chat generation, built on python:3.12-slim. It integrates with an Ollama model server to support real-time streaming responses, text generation, and chat interactions.
This image is a component of the full AI Chat Application Demo (ai/chat-demo). More information about how to run the whole demo can be found on the ai/chat-demo image.
The backend is configured with the MODEL_HOST environment variable: the URL of the Ollama model server (default is http://ollama:11434).
Pull the Backend Image
docker pull ai/chat-demo-backend:latest
Run the Model Server
Ensure the Ollama model server is running before starting the backend container. If you haven’t set it up yet, you can run it with:
docker run -e MODEL=mistral:latest -p 11434:11434 ai/chat-demo-model:latest
Note: You can replace mistral:latest with any other compatible model name.
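If you want to confirm the model server is up before starting the backend, you can query the Ollama API it exposes (this assumes the container publishes the standard Ollama endpoints on localhost:11434, as in the command above); /api/tags lists the locally available models:
curl http://localhost:11434/api/tags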
Run the Backend Container
After the model server is running, start the backend container:
docker run -e MODEL_HOST=http://ollama:11434 -p 8000:8000 ai/chat-demo-backend:latest
This command starts the backend API, which will be accessible at http://localhost:8000.
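Note that the default MODEL_HOST value uses the hostname ollama, which only resolves if both containers share a Docker network and the model container is named ollama. A minimal sketch of such a setup (the network name chat-demo is an arbitrary choice, not something defined by this image):
docker network create chat-demo
docker run -d --name ollama --network chat-demo -e MODEL=mistral:latest -p 11434:11434 ai/chat-demo-model:latest
docker run -d --network chat-demo -e MODEL_HOST=http://ollama:11434 -p 8000:8000 ai/chat-demo-backend:latest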
Access the API
You can now use the backend API endpoints for text generation, chat interaction, and model listing.
/health (GET)
Returns {"status": "healthy"} if successful.
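For example, assuming the backend is published on localhost:8000 as shown above:
curl http://localhost:8000/health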
/api/v1/generate (POST)
Request parameters:
- prompt (string): The input text for generation.
- max_tokens (int, default=500): Maximum number of tokens for the response.
- temperature (float, default=0.7): Controls creativity of the response.
Response:
- text (string): The generated response text.
Example request body:
{
  "prompt": "What is a Dockerfile?",
  "max_tokens": 100,
  "temperature": 0.7
}
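For example, sending the request body above with curl (again assuming the backend is published on localhost:8000):
curl -X POST http://localhost:8000/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is a Dockerfile?", "max_tokens": 100, "temperature": 0.7}'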
/api/v1/chat (POST)
Request parameters:
- messages (list of ChatMessage): Array of message objects, each with role (e.g., "user") and content.
- model (string, default="mistral"): Model to use.
- temperature (float, default=0.7): Response creativity.
Response:
- message (ChatMessage): The assistant's response.
- created_at (string): Timestamp of response creation.
Example request body:
{
  "messages": [
    {"role": "user", "content": "How do I write a Python function?"}
  ],
  "model": "mistral",
  "temperature": 0.7
}
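For example, posting the request body above with curl (assuming localhost:8000):
curl -X POST http://localhost:8000/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "How do I write a Python function?"}], "model": "mistral", "temperature": 0.7}'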
/api/v1/models (GET)
Response:
- models (array of ModelInfo): Array containing details about each model.

/api/v1/chat/stream (POST)
Streaming variant of /api/v1/chat; returns the response incrementally as it is generated.
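For example, to list the available models and to try the streaming endpoint (the streaming example assumes /api/v1/chat/stream accepts the same request body as /api/v1/chat, which this page does not spell out; -N tells curl not to buffer the streamed output):
curl http://localhost:8000/api/v1/models
curl -N -X POST http://localhost:8000/api/v1/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "How do I write a Python function?"}], "model": "mistral", "temperature": 0.7}'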