ai/meta-llama
The Llama model Docker image provides Meta's Llama 3.1 model packaged with vLLM for efficient deployment on NVIDIA GPUs (CUDA 12.6). Designed for advanced natural language processing (NLP) applications, the image handles complex language queries and offers a range of model sizes to suit different performance and resource needs. Use cases include interactive chatbots, summarization tools, and automated content generation.
To get started, use the following command to launch the Llama model:
docker run -it --rm --gpus=all -p 8000:8000 --name vllm ai/meta-llama:3.1-8B-Instruct-cuda-12.6 --cpu-offload-gb 5 --max-model-len 30576
This starts the model server with GPU support, offloading up to 5 GB of model weights to CPU memory (--cpu-offload-gb 5) and capping the context length at 30,576 tokens (--max-model-len 30576). You can test the model's NLP capabilities with an OpenAI-compatible completions request:
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llm",
    "prompt": "Enter your prompt here",
    "max_tokens": 10,
    "temperature": 0.5
  }'
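For the interactive chatbot use case, vLLM's OpenAI-compatible server also serves a chat route at /v1/chat/completions. A minimal sketch, assuming the served model name is "llm" as in the completions example above and that the image's server has the chat endpoint enabled (vLLM enables it for instruct models with a chat template, which Llama 3.1 Instruct provides):

# Chat-style request against the OpenAI-compatible /v1/chat/completions route
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llm",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize this paragraph in one sentence: ..."}
    ],
    "max_tokens": 50,
    "temperature": 0.5
  }'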
3.1-8B-Instruct, 3.1-8B-Instruct-cuda-12.6: Optimized for general-purpose NLP applications; balances performance with resource use.

The Llama models developed by Meta are distributed under the Meta Llama Community License, which grants users a non-exclusive, worldwide, non-transferable, and royalty-free limited license to use, reproduce, distribute, and modify the Llama Materials. For comprehensive details, refer to the full license text available in Meta's official GitHub repository.
docker pull ai/meta-llama
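If the default settings exceed your GPU's memory, the two vLLM flags shown earlier are the main knobs: raising --cpu-offload-gb moves more model weights into system RAM, and lowering --max-model-len reduces the per-request memory the KV cache must accommodate. A sketch with illustrative values only, not tested recommendations; adjust them to your hardware:

# Example: more aggressive CPU offload and a shorter context window
docker run -it --rm --gpus=all -p 8000:8000 --name vllm \
  ai/meta-llama:3.1-8B-Instruct-cuda-12.6 \
  --cpu-offload-gb 10 --max-model-len 8192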