Supadata MCP Server

Official Supadata MCP Server - adds powerful video & web scraping to Cursor, Claude, and any other LLM client.

Provides 6 tools and requires a secret (a Supadata API key, SUPADATA_API_KEY).

Characteristics

Attribute|Details
-|-
Docker Image|ghcr.io/supadata-ai/mcp
Author|supadata-ai
Repository|https://github.com/supadata-ai/mcp
Dockerfile|https://github.com/supadata-ai/mcp/blob/main/Dockerfile
Docker Image built by|supadata-ai
Docker Scout Health Score|Not available
Verify Signature|Not available
Licence|MIT License

Available Tools (6)

Tools provided by this Server|Short Description
-|-
supadata_check_crawl_status|Check the status and retrieve results of a crawl job created with supadata_crawl.
supadata_check_transcript_status|Check the status and retrieve results of a transcript job created with supadata_transcript.
supadata_crawl|Create a crawl job to extract content from all pages on a website using Supadata's crawling API.
supadata_map|Crawl a whole website and get all URLs on it using Supadata's mapping API.
supadata_scrape|Extract content from any web page to Markdown format using Supadata's powerful scraping API.
supadata_transcript|Extract transcript from supported video platforms (YouTube, TikTok, Instagram, Twitter) or file URLs using Supadata's transcript API.

Tool Details

Tool: supadata_check_crawl_status

Check the status and retrieve results of a crawl job created with supadata_crawl.

Purpose: Monitor crawl job progress and retrieve completed results. Workflow: Use the job ID returned from supadata_crawl to check status and get results.

Usage Example:

{
  "name": "supadata_check_crawl_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}

Returns:

  • Job status: 'scraping', 'completed', 'failed', or 'cancelled'
  • For completed jobs: URL, Markdown content, page title, and description for each crawled page
  • Progress information and any error details if applicable

Tip: Poll this endpoint periodically until status is 'completed' or 'failed'.

Parameters|Type|Description
-|-|-
id|string|Crawl job ID returned from supadata_crawl
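
Below is a minimal polling sketch in Python. It assumes a call_tool(name, arguments) helper supplied by whatever MCP client you are using, and that the tool result arrives as a dict with a status field; those names are assumptions for illustration, not part of the documented schema.

import time
from typing import Any, Callable, Dict

# Hypothetical hook into your MCP client: sends {"name": ..., "arguments": ...}
# to the Supadata server and returns the decoded result as a dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_crawl(call_tool: ToolCaller, job_id: str,
                   interval_s: float = 5.0, timeout_s: float = 600.0) -> Dict[str, Any]:
    """Poll supadata_check_crawl_status until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = call_tool("supadata_check_crawl_status", {"id": job_id})
        if result.get("status") in ("completed", "failed", "cancelled"):
            return result          # completed jobs carry the crawled pages
        time.sleep(interval_s)     # still 'scraping'; wait before the next check
    raise TimeoutError(f"Crawl job {job_id} did not finish within {timeout_s}s")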

Tool: supadata_check_transcript_status

Check the status and retrieve results of a transcript job created with supadata_transcript.

Purpose: Monitor transcript job progress and retrieve completed results. Workflow: Use the job ID returned from supadata_transcript to check status and get results.

Usage Example:

{
  "name": "supadata_check_transcript_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}

Returns:

  • Job status: 'queued', 'active', 'completed', 'failed'
  • For completed jobs: Full transcript content
  • Error details if job failed

Tip: Poll this endpoint periodically until status is 'completed' or 'failed'.

Parameters|Type|Description
-|-|-
id|string|Transcript job ID returned from supadata_transcript
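
A similar polling sketch for transcript jobs, again assuming a hypothetical call_tool(name, arguments) helper and a dict-shaped result with a status field:

import time
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]  # hypothetical MCP client hook


def wait_for_transcript(call_tool: ToolCaller, job_id: str,
                        interval_s: float = 3.0, timeout_s: float = 300.0) -> Dict[str, Any]:
    """Poll supadata_check_transcript_status until 'completed' or 'failed'."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = call_tool("supadata_check_transcript_status", {"id": job_id})
        if result.get("status") in ("completed", "failed"):
            return result          # completed jobs include the full transcript
        time.sleep(interval_s)     # 'queued' or 'active': keep waiting
    raise TimeoutError(f"Transcript job {job_id} did not finish within {timeout_s}s")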

Tool: supadata_crawl

Create a crawl job to extract content from all pages on a website using Supadata's crawling API.

Purpose: Crawl a whole website and get the content of all its pages. Best for: Extracting content from multiple related pages when you need comprehensive coverage. Workflow: 1) Create crawl job → 2) Receive job ID → 3) Check job status and retrieve results.

Usage Example:

{
  "name": "supadata_crawl",
  "arguments": {
    "url": "https://example.com",
    "limit": 100
  }
}

Returns: A job ID for status checking. Use supadata_check_crawl_status to check progress. Possible job statuses are 'scraping', 'completed', 'failed', or 'cancelled'.

Important: Respect robots.txt and website terms of service when crawling web content.

Parameters|Type|Description
-|-|-
url|string|URL of the webpage to crawl
limit|number (optional)|Maximum number of pages to crawl (1-5000, default: 100)
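
The following sketch covers steps 1 and 2 of the workflow: start the crawl and capture the job ID. It reuses the hypothetical call_tool helper; the name of the job ID field in the response is an assumption, so adjust it to whatever your client actually returns.

from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]  # hypothetical MCP client hook


def start_crawl(call_tool: ToolCaller, url: str, limit: int = 100) -> str:
    """Create a crawl job and return its job ID for later status checks."""
    result = call_tool("supadata_crawl", {"url": url, "limit": limit})
    # The job ID field name is an assumption; inspect the actual response and adjust.
    return result.get("jobId") or result.get("id", "")

Pass the returned ID to supadata_check_crawl_status (for example via the wait_for_crawl sketch above) to complete step 3.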

Tool: supadata_map

Crawl a whole website and get all URLs on it using Supadata's mapping API.

Purpose: Extract all links found on a website for content discovery and sitemap creation. Best for: Website content discovery, SEO analysis, content aggregation, automated web scraping and indexing. Use cases: Creating a sitemap, running a crawler to fetch content from all pages of a website.

Usage Example:

{
  "name": "supadata_map",
  "arguments": {
    "url": "https://example.com"
  }
}

Returns: Array of URLs found on the website.

Parameters|Type|Description
-|-|-
url|string|URL of the website to map
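
One common pattern is to feed the discovered URLs straight into supadata_scrape to aggregate a site's content as Markdown. The sketch below assumes the same hypothetical call_tool helper, that the map result exposes its URL array under a "urls" key, and that the scrape result exposes Markdown under "content"; all three names are assumptions.

from typing import Any, Callable, Dict, List

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]  # hypothetical MCP client hook


def map_and_scrape(call_tool: ToolCaller, site_url: str, max_pages: int = 10) -> Dict[str, str]:
    """Discover a site's URLs with supadata_map, then scrape a few of them to Markdown."""
    mapped = call_tool("supadata_map", {"url": site_url})
    urls: List[str] = mapped.get("urls", [])       # field name assumed; the tool returns an array of URLs
    pages: Dict[str, str] = {}
    for url in urls[:max_pages]:                   # cap how many pages get scraped
        page = call_tool("supadata_scrape", {"url": url})
        pages[url] = page.get("content", "")       # Markdown body; field name assumed
    return pages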

Tool: supadata_scrape

Extract content from any web page to Markdown format using Supadata's powerful scraping API.

Purpose: Single page content extraction with automatic formatting to Markdown. Best for: When you know exactly which page contains the information you need.

Usage Example:

{
  "name": "supadata_scrape",
  "arguments": {
    "url": "https://example.com",
    "noLinks": false,
    "lang": "en"
  }
}

Returns:

  • URL of the scraped page
  • Extracted content in Markdown format
  • Page name and description
  • Character count
  • List of URLs found on the page

Parameters|Type|Description
-|-|-
url|string|Web page URL to scrape
lang|string (optional)|Preferred language for the scraped content (ISO 639-1 code)
noLinks|boolean (optional)|When true, removes markdown links from the content
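
For single-page extraction, a thin wrapper that writes the returned Markdown to disk can be handy. As above, call_tool and the "content" field name are assumptions, not part of the documented schema.

from pathlib import Path
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]  # hypothetical MCP client hook


def scrape_to_file(call_tool: ToolCaller, url: str, out_path: str,
                   lang: str = "en", no_links: bool = True) -> str:
    """Scrape one page to Markdown and save it to a local file."""
    result = call_tool("supadata_scrape", {"url": url, "lang": lang, "noLinks": no_links})
    markdown = result.get("content", "")           # Markdown body; field name assumed
    Path(out_path).write_text(markdown, encoding="utf-8")
    return markdown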

Tool: supadata_transcript

Extract transcript from supported video platforms (YouTube, TikTok, Instagram, Twitter) or file URLs using Supadata's transcript API.

Purpose: Get transcripts from video content across multiple platforms. Best for: Video content analysis, subtitle extraction, content indexing.

Usage Example:

{
  "name": "supadata_transcript",
  "arguments": {
    "url": "https://youtube.com/watch?v=example",
    "lang": "en",
    "text": false,
    "mode": "auto"
  }
}

Returns:

  • Either immediate transcript content
  • Or job ID for asynchronous processing (use supadata_check_transcript_status)

Supported Platforms: YouTube, TikTok, Instagram, Twitter, and file URLs

Parameters|Type|Description
-|-|-
url|string|Video or file URL to get transcript from (YouTube, TikTok, Instagram, Twitter, file)
chunkSize|number (optional)|Maximum characters per transcript chunk
lang|string (optional)|Preferred language code (ISO 639-1)
mode|string (optional)|Transcript generation mode
text|boolean (optional)|Return plain text instead of formatted output
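
Because the tool can return either the transcript directly or a job ID for asynchronous processing, callers should handle both paths. The sketch below assumes the hypothetical call_tool helper and that an asynchronous response carries a "jobId" field; both are assumptions for illustration.

import time
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]  # hypothetical MCP client hook


def get_transcript(call_tool: ToolCaller, video_url: str, lang: str = "en") -> Dict[str, Any]:
    """Request a transcript; if the server returns a job ID, poll until the job is done."""
    result = call_tool("supadata_transcript",
                       {"url": video_url, "lang": lang, "text": False, "mode": "auto"})
    job_id = result.get("jobId")                   # field name assumed; absent when the
    if not job_id:                                 # transcript comes back immediately
        return result
    while True:                                    # asynchronous path: poll the job status
        status = call_tool("supadata_check_transcript_status", {"id": job_id})
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(3)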

Use this MCP Server

{
  "mcpServers": {
    "supadata": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "SUPADATA_API_KEY",
        "ghcr.io/supadata-ai/mcp"
      ],
      "env": {
        "SUPADATA_API_KEY": "YOUR-API-KEY"
      }
    }
  }
}
