Official Supadata MCP Server - Adds powerful video & web scraping to Cursor, Claude and any other LLM clients.
Docker Desktop version 4.43 or later is required to add the server automatically.
Attribute | Details |
---|---|
Docker Image | ghcr.io/supadata-ai/mcp |
Author | supadata-ai |
Repository | https://github.com/supadata-ai/mcp |
Dockerfile | https://github.com/supadata-ai/mcp/blob/main/Dockerfile |
Docker Image built by | supadata-ai |
Docker Scout Health Score | Not available |
Verify Signature | Not available |
Licence | MIT License |
Tools provided by this Server | Short Description |
---|---|
supadata_check_crawl_status | Check the status and retrieve results of a crawl job created with supadata_crawl. |
supadata_check_transcript_status | Check the status and retrieve results of a transcript job created with supadata_transcript. |
supadata_crawl | Create a crawl job to extract content from all pages on a website using Supadata's crawling API. |
supadata_map | Crawl a whole website and get all URLs on it using Supadata's mapping API. |
supadata_scrape | Extract content from any web page to Markdown format using Supadata's powerful scraping API. |
supadata_transcript | Extract transcript from supported video platforms (YouTube, TikTok, Instagram, Twitter) or file URLs using Supadata's transcript API. |
supadata_check_crawl_status
Check the status and retrieve results of a crawl job created with supadata_crawl.
Purpose: Monitor crawl job progress and retrieve completed results. Workflow: Use the job ID returned from supadata_crawl to check status and get results.
Usage Example:
{
  "name": "supadata_check_crawl_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
Returns: The current job status and, once the job has completed, the crawl results.
Tip: Poll this endpoint periodically until status is 'completed' or 'failed'.
Parameters | Type | Description |
---|---|---|
id | string | Crawl job ID returned from supadata_crawl |
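The polling workflow above can be sketched in Python. `call_tool` is a hypothetical stand-in for whatever MCP client invocation your environment provides; it is assumed to take a tool name plus an arguments dict and return the decoded result as a dict with a `status` field:

```python
import time

def poll_crawl_status(call_tool, job_id, interval=5.0, max_attempts=60):
    """Poll supadata_check_crawl_status until the job completes or fails.

    `call_tool` is a hypothetical MCP client helper, assumed to return
    the tool result as a dict containing a "status" key.
    """
    for _ in range(max_attempts):
        result = call_tool("supadata_check_crawl_status", {"id": job_id})
        # Stop polling on a terminal status; keep waiting otherwise.
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"Crawl job {job_id} did not finish in time")
```

The same loop works for transcript jobs by swapping in `supadata_check_transcript_status`.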
supadata_check_transcript_status
Check the status and retrieve results of a transcript job created with supadata_transcript.
Purpose: Monitor transcript job progress and retrieve completed results. Workflow: Use the job ID returned from supadata_transcript to check status and get results.
Usage Example:
{
  "name": "supadata_check_transcript_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
Returns: The current job status and, once the job has completed, the transcript results.
Tip: Poll this endpoint periodically until status is 'completed' or 'failed'.
Parameters | Type | Description |
---|---|---|
id | string | Transcript job ID returned from supadata_transcript |
supadata_crawl
Create a crawl job to extract content from all pages on a website using Supadata's crawling API.
Purpose: Crawl a whole website and get content of all pages on it. Best for: Extracting content from multiple related pages when you need comprehensive coverage. Workflow: 1) Create crawl job → 2) Receive job ID → 3) Check job status and retrieve results
Usage Example:
{
  "name": "supadata_crawl",
  "arguments": {
    "url": "https://example.com",
    "limit": 100
  }
}
Returns: Job ID for status checking. Use supadata_check_crawl_status to check progress.
Job Status: Possible statuses are 'scraping', 'completed', 'failed', or 'cancelled'.
Important: Respect robots.txt and website terms of service when crawling web content.
Parameters | Type | Description |
---|---|---|
url | string | URL of the webpage to crawl |
limit | number optional | Maximum number of pages to crawl (1-5000, default: 100) |
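Steps 1 and 2 of the workflow above can be sketched as follows. `call_tool` is a hypothetical MCP client helper, and the `jobId` field name in the response is an assumption for illustration, not a documented guarantee:

```python
def start_crawl(call_tool, url, limit=100):
    """Create a crawl job and return its job ID.

    `call_tool` is a hypothetical MCP client helper; the "jobId"
    response field name is an illustrative assumption.
    """
    response = call_tool("supadata_crawl", {"url": url, "limit": limit})
    return response["jobId"]
```

The returned ID is then passed to supadata_check_crawl_status (step 3).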
supadata_map
Crawl a whole website and get all URLs on it using Supadata's mapping API.
Purpose: Extract all links found on a website for content discovery and sitemap creation. Best for: Website content discovery, SEO analysis, content aggregation, automated web scraping and indexing. Use cases: Creating a sitemap, running a crawler to fetch content from all pages of a website.
Usage Example:
{
  "name": "supadata_map",
  "arguments": {
    "url": "https://example.com"
  }
}
Returns: Array of URLs found on the website.
Parameters | Type | Description |
---|---|---|
url | string | URL of the website to map |
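One of the use cases above, fetching content from every page of a site, can be sketched by combining supadata_map with supadata_scrape. `call_tool` is a hypothetical MCP client helper, and the response field names (`urls`, `content`) are assumptions for illustration; check your server's actual output shape:

```python
def scrape_all_pages(call_tool, site_url, max_pages=10):
    """Map a site, then scrape each discovered URL to Markdown.

    Assumes supadata_map returns {"urls": [...]} and supadata_scrape
    returns {"content": "..."}; adjust to the real response shape.
    """
    mapped = call_tool("supadata_map", {"url": site_url})
    pages = {}
    for url in mapped["urls"][:max_pages]:
        result = call_tool("supadata_scrape", {"url": url})
        pages[url] = result["content"]
    return pages
```

Capping `max_pages` keeps an exploratory run from scraping a large site in full.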
supadata_scrape
Extract content from any web page to Markdown format using Supadata's powerful scraping API.
Purpose: Single page content extraction with automatic formatting to Markdown. Best for: When you know exactly which page contains the information you need.
Usage Example:
{
  "name": "supadata_scrape",
  "arguments": {
    "url": "https://example.com",
    "noLinks": false,
    "lang": "en"
  }
}
Returns: The page content converted to Markdown.
Parameters | Type | Description |
---|---|---|
url | string | Web page URL to scrape |
lang | string optional | Preferred language for the scraped content (ISO 639-1 code) |
noLinks | boolean optional | When true, removes markdown links from the content |
supadata_transcript
Extract transcript from supported video platforms (YouTube, TikTok, Instagram, Twitter) or file URLs using Supadata's transcript API.
Purpose: Get transcripts from video content across multiple platforms. Best for: Video content analysis, subtitle extraction, content indexing.
Usage Example:
{
  "name": "supadata_transcript",
  "arguments": {
    "url": "https://youtube.com/watch?v=example",
    "lang": "en",
    "text": false,
    "mode": "auto"
  }
}
Returns: The transcript content, or a job ID for asynchronous processing (retrieve results with supadata_check_transcript_status).
Supported Platforms: YouTube, TikTok, Instagram, Twitter, and file URLs
Parameters | Type | Description |
---|---|---|
url | string | Video or file URL to get transcript from (YouTube, TikTok, Instagram, Twitter, file) |
chunkSize | number optional | Maximum characters per transcript chunk |
lang | string optional | Preferred language code (ISO 639-1) |
mode | string optional | Transcript generation mode (e.g. 'auto', as in the usage example above) |
text | boolean optional | Return plain text instead of formatted output |
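Because transcript requests may resolve immediately or hand back a job for later polling, a client needs to handle both paths. The sketch below assumes a hypothetical `call_tool` MCP client helper, and assumes an asynchronous response carries a `jobId` field while status responses carry a `status` of 'completed' or 'failed'; these field names are illustrative assumptions:

```python
import time

def get_transcript(call_tool, video_url, lang="en", interval=5.0, max_attempts=60):
    """Fetch a transcript, handling both immediate and asynchronous responses.

    `call_tool` is a hypothetical MCP client helper; the "jobId" and
    "status" field names are illustrative assumptions.
    """
    result = call_tool("supadata_transcript", {"url": video_url, "lang": lang})
    job_id = result.get("jobId")
    if not job_id:
        # Transcript was returned directly in the first response.
        return result
    # Otherwise poll the transcript job until it reaches a terminal status.
    for _ in range(max_attempts):
        status = call_tool("supadata_check_transcript_status", {"id": job_id})
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"Transcript job {job_id} did not finish in time")
```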
Manual installation
To install the server manually, add the following to your MCP client configuration, replacing YOUR-API-KEY with your Supadata API key:
{
  "mcpServers": {
    "supadata": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "SUPADATA_API_KEY",
        "ghcr.io/supadata-ai/mcp"
      ],
      "env": {
        "SUPADATA_API_KEY": "YOUR-API-KEY"
      }
    }
  }
}