Official Supadata MCP Server - Adds powerful video & web scraping to Cursor, Claude and any other LLM clients.
Docker Desktop version 4.43 or later is required to add the server automatically.
Attribute | Details |
---|---|
Docker Image | ghcr.io/supadata-ai/mcp |
Author | supadata-ai |
Repository | https://github.com/supadata-ai/mcp |
Dockerfile | https://github.com/supadata-ai/mcp/blob/main/Dockerfile |
Docker Image built by | supadata-ai |
Docker Scout Health Score | Not available |
Verify Signature | Not available |
Licence | MIT License |
Tools provided by this Server | Short Description |
---|---|
supadata_check_crawl_status | Check the status and retrieve results of a crawl job created with supadata_crawl. |
supadata_check_transcript_status | Check the status and retrieve results of a transcript job created with supadata_transcript. |
supadata_crawl | Create a crawl job to extract content from all pages on a website using Supadata's crawling API. |
supadata_map | Crawl a whole website and get all URLs on it using Supadata's mapping API. |
supadata_scrape | Extract content from any web page to Markdown format using Supadata's powerful scraping API. |
supadata_transcript | Extract transcript from supported video platforms (YouTube, TikTok, Instagram, Twitter) or file URLs using Supadata's transcript API. |
supadata_check_crawl_status
Check the status and retrieve results of a crawl job created with supadata_crawl.
Purpose: Monitor crawl job progress and retrieve completed results. Workflow: Use the job ID returned from supadata_crawl to check status and get results.
Usage Example:
{
  "name": "supadata_check_crawl_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
Returns: The current job status and, once the job has completed, the crawl results.
Tip: Poll this endpoint periodically until status is 'completed' or 'failed'.
Parameters | Type | Description |
---|---|---|
id | string | Crawl job ID returned from supadata_crawl |
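The polling workflow above can be sketched in Python. `call_tool` is a hypothetical stand-in for whatever MCP client invocation your environment provides; it is assumed to take a tool name plus an arguments dict and return the decoded result as a dict with a `status` field:

```python
import time

def poll_crawl_status(call_tool, job_id, interval=5.0, max_attempts=60):
    """Poll supadata_check_crawl_status until the job completes or fails.

    `call_tool` is a hypothetical MCP client helper, assumed to return
    the tool result as a dict containing a "status" key.
    """
    for _ in range(max_attempts):
        result = call_tool("supadata_check_crawl_status", {"id": job_id})
        # Stop polling on a terminal status; keep waiting otherwise.
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"Crawl job {job_id} did not finish in time")
```

The same loop works for transcript jobs by swapping in `supadata_check_transcript_status`.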
supadata_check_transcript_status
Check the status and retrieve results of a transcript job created with supadata_transcript.
Purpose: Monitor transcript job progress and retrieve completed results. Workflow: Use the job ID returned from supadata_transcript to check status and get results.
Usage Example:
{
  "name": "supadata_check_transcript_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
Returns: The current job status and, once the job has completed, the transcript results.
Tip: Poll this endpoint periodically until status is 'completed' or 'failed'.
Parameters | Type | Description |
---|---|---|
id | string | Transcript job ID returned from supadata_transcript |
supadata_crawl
Create a crawl job to extract content from all pages on a website using Supadata's crawling API.
Purpose: Crawl a whole website and get content of all pages on it. Best for: Extracting content from multiple related pages when you need comprehensive coverage. Workflow: 1) Create crawl job → 2) Receive job ID → 3) Check job status and retrieve results
Usage Example:
{
  "name": "supadata_crawl",
  "arguments": {
    "url": "https://example.com",
    "limit": 100
  }
}
Returns: Job ID for status checking. Use supadata_check_crawl_status to check progress.
Job Status: Possible statuses are 'scraping', 'completed', 'failed', or 'cancelled'.
Important: Respect robots.txt and website terms of service when crawling web content.
Parameters | Type | Description |
---|---|---|
url | string | URL of the webpage to crawl |
limit | number optional | Maximum number of pages to crawl (1-5000, default: 100) |
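Steps 1 and 2 of the workflow above can be sketched as follows. `call_tool` is a hypothetical MCP client helper, and the `jobId` field name in the response is an assumption for illustration, not a documented guarantee:

```python
def start_crawl(call_tool, url, limit=100):
    """Create a crawl job and return its job ID.

    `call_tool` is a hypothetical MCP client helper; the "jobId"
    response field name is an illustrative assumption.
    """
    response = call_tool("supadata_crawl", {"url": url, "limit": limit})
    return response["jobId"]
```

The returned ID is then passed to supadata_check_crawl_status (step 3).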
supadata_map
Crawl a whole website and get all URLs on it using Supadata's mapping API.
Purpose: Extract all links found on a website for content discovery and sitemap creation. Best for: Website content discovery, SEO analysis, content aggregation, automated web scraping and indexing. Use cases: Creating a sitemap, running a crawler to fetch content from all pages of a website.
Usage Example:
{
  "name": "supadata_map",
  "arguments": {
    "url": "https://example.com"
  }
}
Returns: Array of URLs found on the website.
Parameters | Type | Description |
---|---|---|
url | string | URL of the website to map |
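One of the use cases above, fetching content from every page of a site, can be sketched by combining supadata_map with supadata_scrape. `call_tool` is a hypothetical MCP client helper, and the response field names (`urls`, `content`) are assumptions for illustration; check your server's actual output shape:

```python
def scrape_all_pages(call_tool, site_url, max_pages=10):
    """Map a site, then scrape each discovered URL to Markdown.

    Assumes supadata_map returns {"urls": [...]} and supadata_scrape
    returns {"content": "..."}; adjust to the real response shape.
    """
    mapped = call_tool("supadata_map", {"url": site_url})
    pages = {}
    for url in mapped["urls"][:max_pages]:
        result = call_tool("supadata_scrape", {"url": url})
        pages[url] = result["content"]
    return pages
```

Capping `max_pages` keeps an exploratory run from scraping a large site in full.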
supadata_scrape
Extract content from any web page to Markdown format using Supadata's powerful scraping API.
Purpose: Single page content extraction with automatic formatting to Markdown. Best for: When you know exactly which page contains the information you need.
Usage Example:
{
  "name": "supadata_scrape",
  "arguments": {
    "url": "https://example.com",
    "noLinks": false,
    "lang": "en"
  }
}
Returns: The page content converted to Markdown.
Parameters | Type | Description |
---|---|---|
url | string | Web page URL to scrape |
lang | string optional | Preferred language for the scraped content (ISO 639-1 code) |
noLinks | boolean optional | When true, removes markdown links from the content |
supadata_transcript
Extract transcript from supported video platforms (YouTube, TikTok, Instagram, Twitter) or file URLs using Supadata's transcript API.
Purpose: Get transcripts from video content across multiple platforms. Best for: Video content analysis, subtitle extraction, content indexing.
Usage Example:
{
  "name": "supadata_transcript",
  "arguments": {
    "url": "https://youtube.com/watch?v=example",
    "lang": "en",
    "text": false,
    "mode": "auto"
  }
}
Returns: The transcript content, or a job ID for asynchronous processing (retrieve results with supadata_check_transcript_status).
Supported Platforms: YouTube, TikTok, Instagram, Twitter, and file URLs
Parameters | Type | Description |
---|---|---|
url | string | Video or file URL to get transcript from (YouTube, TikTok, Instagram, Twitter, file) |
chunkSize | number optional | Maximum characters per transcript chunk |
lang | string optional | Preferred language code (ISO 639-1) |
mode | string optional | Transcript generation mode (e.g. 'auto', as in the usage example above) |
text | boolean optional | Return plain text instead of formatted output |
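Because transcript requests may resolve immediately or hand back a job for later polling, a client needs to handle both paths. The sketch below assumes a hypothetical `call_tool` MCP client helper, and assumes an asynchronous response carries a `jobId` field while status responses carry a `status` of 'completed' or 'failed'; these field names are illustrative assumptions:

```python
import time

def get_transcript(call_tool, video_url, lang="en", interval=5.0, max_attempts=60):
    """Fetch a transcript, handling both immediate and asynchronous responses.

    `call_tool` is a hypothetical MCP client helper; the "jobId" and
    "status" field names are illustrative assumptions.
    """
    result = call_tool("supadata_transcript", {"url": video_url, "lang": lang})
    job_id = result.get("jobId")
    if not job_id:
        # Transcript was returned directly in the first response.
        return result
    # Otherwise poll the transcript job until it reaches a terminal status.
    for _ in range(max_attempts):
        status = call_tool("supadata_check_transcript_status", {"id": job_id})
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"Transcript job {job_id} did not finish in time")
```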
Manual installation
To install the server manually, add the following to your MCP client configuration, replacing YOUR-API-KEY with your Supadata API key:
{
  "mcpServers": {
    "supadata": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "SUPADATA_API_KEY",
        "ghcr.io/supadata-ai/mcp"
      ],
      "env": {
        "SUPADATA_API_KEY": "YOUR-API-KEY"
      }
    }
  }
}