Official ElevenLabs Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech and audio processing APIs.
24 Tools
Version 4.43 or later needs to be installed to add the server automatically
About
Attribute | Details |
---|---|
Docker Image | mcp/elevenlabs |
Author | elevenlabs |
Repository | https://github.com/elevenlabs/elevenlabs-mcp |
Docker Image built by | Docker Inc. |
Docker Scout Health Score | |
Verify Signature | COSIGN_REPOSITORY=mcp/signatures cosign verify mcp/elevenlabs --key https://raw.githubusercontent.com/docker/keyring/refs/heads/main/public/mcp/latest.pub |
Licence | MIT License |
Tools provided by this Server | Short Description |
---|---|
add_knowledge_base_to_agent | Add a knowledge base to ElevenLabs workspace. |
check_subscription | Check the current subscription status. |
compose_music | Convert a prompt to music and save the output audio file to a given directory. |
create_agent | Create a conversational AI agent with custom configuration. |
create_composition_plan | Create a composition plan for music generation. |
create_voice_from_preview | Add a generated voice to the voice library. |
get_agent | Get details about a specific conversational AI agent |
get_conversation | Gets conversation with transcript. |
get_voice | Get details of a specific voice |
isolate_audio | Isolate audio from a file. |
list_agents | List all available conversational AI agents |
list_conversations | Lists agent conversations. |
list_models | List all available models |
list_phone_numbers | List all phone numbers associated with the ElevenLabs account |
make_outbound_call | Make an outbound call using an ElevenLabs agent. |
play_audio | Play an audio file. |
search_voice_library | Search for a voice across the entire ElevenLabs voice library. |
search_voices | Search for existing voices, that is, voices that have already been added to the user's ElevenLabs voice library. |
speech_to_speech | Transform audio from one voice to another using provided audio files. |
speech_to_text | Transcribe speech from an audio file. |
text_to_sound_effects | Convert a text description of a sound effect to a sound effect with a given duration. |
text_to_speech | Convert text to speech with a given voice. |
text_to_voice | Create voice previews from a text prompt. |
voice_clone | Create an instant voice clone of a voice using provided audio files. |
add_knowledge_base_to_agent
Add a knowledge base to ElevenLabs workspace. Allowed types are epub, pdf, docx, txt, html.
⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
Parameters | Type | Description |
---|---|---|
agent_id | string | ID of the agent to add the knowledge base to. |
knowledge_base_name | string | Name of the knowledge base. |
input_file_path | string optional | Path to the file to add to the knowledge base. |
text | string optional | Text to add to the knowledge base. |
url | string optional | URL of the knowledge base. |
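The allowed file types listed above can be checked locally before the tool is called. The following is a minimal sketch, not part of the server: the helper name `is_allowed_knowledge_base_file` is hypothetical, and it assumes the file type can be inferred from the file extension.

```python
from pathlib import Path

# Allowed document types, taken from the tool description above.
ALLOWED_SUFFIXES = {".epub", ".pdf", ".docx", ".txt", ".html"}


def is_allowed_knowledge_base_file(input_file_path: str) -> bool:
    """Hypothetical helper: True if the extension matches an allowed type.

    Assumption: the type is inferred from the file extension only.
    """
    return Path(input_file_path).suffix.lower() in ALLOWED_SUFFIXES
```

A client could run this check on input_file_path and skip the API call, and its possible cost, when the file type is not supported.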
check_subscription
Check the current subscription status. Could be used to measure the usage of the API.
compose_music
Convert a prompt to music and save the output audio file to a given directory. The directory is optional; if it is not provided, the output file is saved to $HOME/Desktop.
Parameters | Type | Description |
---|---|---|
composition_plan | string optional | Composition plan to use for the music. Must provide either prompt or composition_plan. |
music_length_ms | string optional | Length of the generated music in milliseconds. Cannot be used if composition_plan is provided. |
output_directory | string optional | Directory to save the output audio file |
prompt | string optional | Prompt to convert to music. Must provide either prompt or composition_plan. |
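The two constraints in the table above can be enforced on the client side before the tool is called. This is a minimal sketch under stated assumptions: `build_compose_music_args` is a hypothetical helper, and because the description does not say whether prompt and composition_plan may be combined, the sketch only requires that at least one of them is present.

```python
from typing import Optional


def build_compose_music_args(
    prompt: Optional[str] = None,
    composition_plan: Optional[str] = None,
    music_length_ms: Optional[str] = None,
    output_directory: Optional[str] = None,
) -> dict:
    """Hypothetical helper: assemble compose_music arguments.

    Applies the two documented rules and drops parameters that were not set.
    """
    if prompt is None and composition_plan is None:
        raise ValueError("provide either prompt or composition_plan")
    if composition_plan is not None and music_length_ms is not None:
        raise ValueError("music_length_ms cannot be used with composition_plan")
    candidates = {
        "prompt": prompt,
        "composition_plan": composition_plan,
        "music_length_ms": music_length_ms,
        "output_directory": output_directory,
    }
    return {name: value for name, value in candidates.items() if value is not None}
```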
create_agent
Create a conversational AI agent with custom configuration.
⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
Parameters | Type | Description |
---|---|---|
first_message | string | First message the agent will say, e.g. "Hi, how can I help you today?" |
name | string | Name of the agent |
system_prompt | string | System prompt for the agent |
asr_quality | string optional | Quality of the ASR: high or low. |
language | string optional | ISO 639-1 language code for the agent |
llm | string optional | LLM to use for the agent |
max_duration_seconds | integer optional | Maximum duration of a conversation in seconds. Defaults to 600 seconds (10 minutes). |
max_tokens | string optional | Maximum number of tokens to generate. |
model_id | string optional | ID of the ElevenLabs model to use for the agent. |
optimize_streaming_latency | integer optional | Optimize streaming latency. Range is 0 to 4. |
record_voice | boolean optional | Whether to record the agent's voice. |
retention_days | integer optional | Number of days to retain the agent's data. |
similarity_boost | number optional | Similarity boost for the agent. Range is 0 to 1. |
stability | number optional | Stability for the agent. Range is 0 to 1. |
temperature | number optional | Temperature for the agent. The lower the temperature, the more deterministic the agent's responses will be. Range is 0 to 1. |
turn_timeout | integer optional | Timeout for the agent to respond in seconds. Defaults to 7 seconds. |
voice_id | string optional | ID of the voice to use for the agent |
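Several create_agent parameters have documented ranges. The sketch below shows one way a client might check them before making a call that may incur costs. The helper `check_agent_ranges` is hypothetical, and it assumes the ranges are inclusive, which the table does not state.

```python
from typing import Dict, List, Tuple

# Ranges as documented in the parameter table (assumed to be inclusive).
RANGES: Dict[str, Tuple[float, float]] = {
    "optimize_streaming_latency": (0, 4),
    "similarity_boost": (0, 1),
    "stability": (0, 1),
    "temperature": (0, 1),
}


def check_agent_ranges(args: dict) -> List[str]:
    """Hypothetical helper: list range problems found in create_agent arguments."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = args.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside {low} to {high}")
    if args.get("asr_quality") not in (None, "high", "low"):
        problems.append("asr_quality must be high or low")
    return problems
```

An empty list means no documented range was violated; it does not guarantee that the server accepts the request.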
create_composition_plan
Create a composition plan for music generation. Usage of this endpoint does not cost any credits but is subject to rate limiting depending on your tier. Composition plans can be used when generating music with the compose_music tool.
Parameters | Type | Description |
---|---|---|
prompt | string | Prompt to create a composition plan for |
music_length_ms | string optional | The length of the composition plan to generate in milliseconds. Must be between 10000ms and 300000ms. Optional - if not provided, the model will choose a length based on the prompt. |
source_composition_plan | string optional | An optional composition plan to use as a source for the new composition plan |
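Because music_length_ms is typed as a string but must represent a value between 10000 and 300000 milliseconds, a client may want to validate it first. This is a minimal sketch with a hypothetical helper name, assuming the bounds are inclusive.

```python
def validate_plan_length(music_length_ms: str) -> int:
    """Hypothetical helper: parse music_length_ms and check the documented bounds."""
    value = int(music_length_ms)  # raises ValueError for non-numeric input
    if not 10_000 <= value <= 300_000:
        raise ValueError("music_length_ms must be between 10000 and 300000")
    return value
```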
create_voice_from_preview
Add a generated voice to the voice library. Uses the voice ID from the text_to_voice tool.
⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
Parameters | Type | Description |
---|---|---|
generated_voice_id | string | |
voice_description | string | |
voice_name | string |
get_agent
Get details about a specific conversational AI agent
Parameters | Type | Description |
---|---|---|
agent_id | string |
get_conversation
Gets conversation with transcript. Returns: conversation details and full transcript. Use when: analyzing completed agent conversations.
Parameters | Type | Description |
---|---|---|
conversation_id | string | The unique identifier of the conversation to retrieve. You can get the IDs from the list_conversations tool. |
get_voice
Get details of a specific voice
Parameters | Type | Description |
---|---|---|
voice_id | string |
isolate_audio
Isolate audio from a file. Saves output file to directory (default: $HOME/Desktop).
⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
Parameters | Type | Description |
---|---|---|
input_file_path | string | |
output_directory | string optional |
list_agents
List all available conversational AI agents
list_conversations
Lists agent conversations. Returns: conversation list with metadata. Use when: asked about conversation history.
Parameters | Type | Description |
---|---|---|
agent_id | string optional | |
call_start_after_unix | string optional | |
call_start_before_unix | string optional | |
cursor | string optional | |
max_length | integer optional | |
page_size | integer optional |
list_models
List all available models
list_phone_numbers
List all phone numbers associated with the ElevenLabs account
make_outbound_call
Make an outbound call using an ElevenLabs agent. Automatically detects provider type (Twilio or SIP trunk) and uses the appropriate API.
⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
Parameters | Type | Description |
---|---|---|
agent_id | string | The ID of the agent that will handle the call |
agent_phone_number_id | string | The ID of the phone number to use for the call |
to_number | string | The phone number to call (E.164 format: +1xxxxxxxxxx) |
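Since to_number must be in E.164 format, it can be worth checking the number before placing a call. The sketch below uses a commonly used pattern for E.164 (a plus sign, a non-zero leading digit, and at most 15 digits in total). The helper name `is_e164` is hypothetical, and the check only covers the shape of the number, not whether it is reachable.

```python
import re

# E.164 shape: "+", then 2 to 15 digits, the first of which is not 0.
E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$")


def is_e164(number: str) -> bool:
    """Hypothetical helper: True if the string has the shape of an E.164 number."""
    return E164_PATTERN.match(number) is not None
```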
play_audio
Play an audio file. Supports WAV and MP3 formats.
Parameters | Type | Description |
---|---|---|
input_file_path | string |
search_voice_library
Search for a voice across the entire ElevenLabs voice library.
Parameters | Type | Description |
---|---|---|
page | integer optional | Page number to return (0-indexed) |
page_size | integer optional | Number of voices to return per page (1-100) |
search | string optional | Search term to filter voices by |
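The page and page_size parameters suggest a simple paging loop. The following sketch only generates the argument sets for successive pages, respecting the 0-indexed page number and the 1 to 100 page size limit. `voice_library_pages` is a hypothetical helper, and how the results are returned is not covered here.

```python
from typing import Dict, Iterator


def voice_library_pages(
    search: str, page_size: int = 20, max_pages: int = 3
) -> Iterator[Dict[str, object]]:
    """Hypothetical helper: yield search_voice_library arguments page by page."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    for page in range(max_pages):  # pages are 0-indexed
        yield {"search": search, "page": page, "page_size": page_size}
```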
search_voices
Search for existing voices, that is, voices that have already been added to the user's ElevenLabs voice library. Searches in name, description, labels and category.
Parameters | Type | Description |
---|---|---|
search | string optional | Search term to filter voices by. Searches in name, description, labels and category. |
sort | string optional | Which field to sort by. created_at_unix might not be available for older voices. |
sort_direction | string optional | Sort order, either ascending or descending. |
speech_to_speech
Transform audio from one voice to another using provided audio files. Saves output file to directory (default: $HOME/Desktop).
⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
Parameters | Type | Description |
---|---|---|
input_file_path | string | |
output_directory | string optional | |
voice_name | string optional |
speech_to_text
Transcribe speech from an audio file. When save_transcript_to_file=True: Saves output file to directory (default: $HOME/Desktop). When return_transcript_to_client_directly=True, always returns text directly regardless of output mode.
⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
Parameters | Type | Description |
---|---|---|
input_file_path | string | |
diarize | boolean optional | Whether to diarize the audio file. If True, the transcription is annotated with which speaker is currently speaking. |
language_code | string optional | ISO 639-3 language code for transcription. If not provided, the language will be detected automatically. |
output_directory | string optional | Directory where files should be saved (only used when saving files). |
return_transcript_to_client_directly | boolean optional | Whether to return the transcript to the client directly. |
save_transcript_to_file | boolean optional | Whether to save the transcript to a file. |
text_to_sound_effects
Convert a text description of a sound effect to a sound effect with a given duration. Saves output file to directory (default: $HOME/Desktop).
Duration must be between 0.5 and 5 seconds.
⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
Parameters | Type | Description |
---|---|---|
text | string | Text description of the sound effect |
duration_seconds | number optional | Duration of the sound effect in seconds |
loop | boolean optional | Whether to loop the sound effect. Defaults to False. |
output_directory | string optional | Directory where files should be saved (only used when saving files). |
output_format | string optional |
text_to_speech
Convert text to speech with a given voice. Saves output file to directory (default: $HOME/Desktop).
Only one of voice_id or voice_name can be provided. If neither is provided, the default voice will be used.
⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
Parameters | Type | Description |
---|---|---|
text | string | |
language | string optional | ISO 639-1 language code for the voice. |
model_id | string optional | |
output_directory | string optional | |
output_format | string optional | |
similarity_boost | number optional | |
speed | number optional | |
stability | number optional | |
style | number optional | |
use_speaker_boost | boolean optional | |
voice_id | string optional | |
voice_name | string optional |
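As an illustration of how an MCP client addresses this tool, the sketch below builds a JSON-RPC `tools/call` request for text_to_speech and enforces the rule that only one of voice_id or voice_name is set. The helper name `text_to_speech_request` and the sample values are hypothetical; transport and response handling are left out.

```python
import json
from typing import Optional


def text_to_speech_request(
    text: str,
    voice_id: Optional[str] = None,
    voice_name: Optional[str] = None,
    request_id: int = 1,
) -> str:
    """Hypothetical helper: serialize an MCP tools/call request for text_to_speech."""
    if voice_id is not None and voice_name is not None:
        raise ValueError("only one of voice_id or voice_name can be provided")
    arguments = {"text": text}
    if voice_id is not None:
        arguments["voice_id"] = voice_id
    if voice_name is not None:
        arguments["voice_name"] = voice_name
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "text_to_speech", "arguments": arguments},
    }
    return json.dumps(request)
```

When neither voice parameter is passed, the request carries only the text, and the server falls back to the default voice as described above.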
text_to_voice
Create voice previews from a text prompt. Creates three previews with slight variations. Saves output file to directory (default: $HOME/Desktop).
If no text is provided, the tool will auto-generate text.
Voice preview files are saved as: voice_design_(generated_voice_id)_(timestamp).mp3
Example file name: voice_design_Ya2J5uIa5Pq14DNPsbC1_20250403_164949.mp3
⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
Parameters | Type | Description |
---|---|---|
voice_description | string | |
output_directory | string optional | |
text | string optional |
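The preview file name carries the generated voice ID that create_voice_from_preview expects. The sketch below extracts it from a file name. The helper `parse_preview_filename` is hypothetical, and the timestamp layout (YYYYMMDD_HHMMSS) is an assumption inferred from the single example file name above.

```python
import re
from datetime import datetime
from typing import Tuple

# Pattern: voice_design_(generated_voice_id)_(timestamp).mp3
# Assumption: the timestamp is YYYYMMDD_HHMMSS, as in the example file name.
PREVIEW_NAME = re.compile(
    r"^voice_design_(?P<voice_id>.+)_(?P<stamp>\d{8}_\d{6})\.mp3$"
)


def parse_preview_filename(filename: str) -> Tuple[str, datetime]:
    """Hypothetical helper: return (generated_voice_id, creation time)."""
    match = PREVIEW_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a voice preview file name: {filename}")
    created = datetime.strptime(match.group("stamp"), "%Y%m%d_%H%M%S")
    return match.group("voice_id"), created


voice_id, created = parse_preview_filename(
    "voice_design_Ya2J5uIa5Pq14DNPsbC1_20250403_164949.mp3"
)
print(voice_id)  # Ya2J5uIa5Pq14DNPsbC1
```

The extracted ID can then be passed as generated_voice_id to create_voice_from_preview.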
voice_clone
Create an instant voice clone of a voice using provided audio files.
⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
Parameters | Type | Description |
---|---|---|
files | array | |
name | string | |
description | string optional |
Manual installation
You can install the MCP server using the following configuration:
{
  "mcpServers": {
    "elevenlabs": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "ELEVENLABS_API_KEY",
        "-v",
        "/local-directory:/local-directory",
        "mcp/elevenlabs"
      ],
      "env": {
        "ELEVENLABS_API_KEY": "<ELEVENLABS_API_KEY>"
      }
    }
  }
}
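The mcpServers configuration in this section can also be generated programmatically, which helps when the mounted directory or the API key differs per machine. This is a minimal sketch: `build_config` is a hypothetical helper that mirrors the JSON structure shown here, and where the resulting file must be placed depends on the MCP client in use.

```python
import json


def build_config(api_key: str, local_directory: str) -> dict:
    """Hypothetical helper: build the mcpServers entry for the ElevenLabs server."""
    return {
        "mcpServers": {
            "elevenlabs": {
                "command": "docker",
                "args": [
                    "run",
                    "-i",
                    "--rm",
                    "-e",
                    "ELEVENLABS_API_KEY",
                    "-v",
                    # Mount the same path inside the container so file
                    # paths passed to the tools resolve on both sides.
                    f"{local_directory}:{local_directory}",
                    "mcp/elevenlabs",
                ],
                "env": {"ELEVENLABS_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values; replace them with a real key and directory.
    print(json.dumps(build_config("<ELEVENLABS_API_KEY>", "/local-directory"), indent=2))
```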