MCP

scraper-farm ships an MCP endpoint over plain HTTP JSON-RPC 2.0:

POST /mcp — per-user. Tools: scrape, analyse, map, get_execution, list_executions. Use this from an LLM client to drive scrapes directly. (POST /mcp/scrape is a deprecated alias from the key-in-header era and keeps working.)

The endpoint accepts a single JSON-RPC request per POST (no SSE, no long-lived connection). Auth is OAuth (sign in with your dashboard account — no key handling) or an sf_… API key for headless use.
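Because each POST carries exactly one JSON-RPC request, a client needs nothing beyond an HTTP library. The following is a minimal sketch, assuming bearer-key auth and the standard MCP method name `tools/list`. The helper names `build_request` and `post_mcp` are hypothetical, and whether the server expects an `initialize` request first is not covered here.

```python
import json
import urllib.request

MCP_URL = "https://api.scrapesilo.com/mcp"


def build_request(method, params=None, request_id=1):
    """Build one JSON-RPC 2.0 request object (one per POST)."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    return body


def post_mcp(body, api_key, url=MCP_URL):
    """Send a single JSON-RPC request and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            # Assumption: the same sf_... bearer header used in .mcp.json.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A call such as `post_mcp(build_request("tools/list"), os.environ["SCRAPER_FARM_API_KEY"])` should then list the available tools, provided the key is valid.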

Claude Code (OAuth — recommended)

claude mcp add --transport http scraper-farm https://api.scrapesilo.com/mcp

Then inside Claude Code run /mcp, pick scraper-farm, and choose Authenticate. Your browser opens a sign-in with the same account you use for the dashboard; once approved, the tools connect — no API key to create, export, or rotate. Any MCP client that implements the spec’s OAuth flow (Cursor, Goose, MCP Inspector, …) discovers the same flow automatically from the endpoint’s metadata.

Scrapes run against your account’s default API key, so usage shows up in the dashboard as usual.

API key (CI / headless)

For environments where a browser sign-in isn’t possible, bearer-token auth keeps working. Add to .mcp.json in your repo (or ~/.claude/.mcp.json for global):

{
  "mcpServers": {
    "scraper-farm": {
      "type": "http",
      "url": "https://api.scrapesilo.com/mcp",
      "headers": {
        "Authorization": "Bearer ${SCRAPER_FARM_API_KEY}"
      }
    }
  }
}

Export SCRAPER_FARM_API_KEY=sf_… in your shell (create the key under Settings → API keys).

Cursor / other clients

Cursor’s ~/.cursor/mcp.json accepts the same type: 'http' shape. So does Goose. Both the OAuth flow and the sf_… bearer work.

For clients that only support stdio MCP, run a local proxy that forwards stdio to the HTTP endpoint — mcp-remote works well and handles the OAuth flow itself (it opens the browser for you):

npx mcp-remote https://api.scrapesilo.com/mcp

To skip OAuth, pass --header "Authorization: Bearer $SCRAPER_FARM_API_KEY" instead.

Tools

`scrape`

Same surface as POST /scrape. Args: { url, engine, actions?, query?, sessionId?, force?, options?, useProxy?, myProxyUrl?, myProxyConfig? } (url may be an array). Returns an array of { url, data, tookMs, executionId, batchId?, error?, generatedActions?, iterations? }.

AI mode: set engine: "ai" and pass a short, field-shaped query, then pin the returned generatedActions for reruns.
Deterministic: set engine: "html" or "browser" and pass actions (see Actions DSL).
Prefer analyse for interpretation and long-form page reading; do not stuff multi-part document Q&A into an AI-mode scrape.
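The two scrape modes can be sketched as request builders. This assumes the standard MCP `tools/call` envelope (`name` plus `arguments`). The helper names, the example URL, and the example query are invented for illustration. Treating "pinning" as passing generatedActions back as actions, and using "html" for the rerun, are assumptions rather than documented behaviour.

```python
def tools_call(name, arguments, request_id=1):
    """Wrap tool arguments in an MCP tools/call JSON-RPC request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def scrape_args(url, engine, query=None, actions=None):
    """Build scrape arguments, leaving out optional fields that were not set."""
    args = {"url": url, "engine": engine}
    if query is not None:
        args["query"] = query
    if actions is not None:
        args["actions"] = actions
    return args


# AI mode: a short, field-shaped query (example values are invented).
ai_request = tools_call(
    "scrape",
    scrape_args("https://example.com/item/1", "ai", query="title, price"),
)


def rerun_args(url, first_result, engine="html"):
    """Reuse generatedActions from an AI-mode result in a deterministic rerun.

    Assumption: generatedActions can be passed back unchanged as actions.
    """
    return scrape_args(url, engine, actions=first_result["generatedActions"])
```

The actions value is passed through untouched, so the sketch does not depend on the Actions DSL itself.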

`analyse`

Document analysis over page markdown (not a CSS plan). Use for outlines, summaries, yes/no flags, compliance-style reading. Args: url and query, plus the same proxy options as scrape. Returns { url, data: { result, evidence, confidence }, engine: "analyse", … }. There are no generatedActions to pin.
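Once a response has been decoded, the documented analyse fields can be unpacked as in the sketch below. It assumes the payload is already a Python dict in the shape shown above. How that payload is wrapped inside the MCP tool result is not specified here, the field types are unknown (hence `Any`), and `Analysis` and `read_analysis` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Analysis:
    url: str
    result: Any      # type not documented here
    evidence: Any    # type not documented here
    confidence: Any  # type not documented here


def read_analysis(payload):
    """Pull the documented fields out of one decoded analyse payload."""
    if payload.get("engine") != "analyse":
        raise ValueError("not an analyse result")
    data = payload["data"]
    return Analysis(
        url=payload["url"],
        result=data["result"],
        evidence=data["evidence"],
        confidence=data["confidence"],
    )
```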

`map`

Same surface as POST /map. Args: { url, include?, exclude?, allowSubdomains?, depth?, limit?, ignoreSitemap? } → { tookMs, sitemaps: [{ domain, url, links }] }.
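A map result can be flattened into a per-domain link list with a few lines. This sketch assumes the response has already been decoded into a dict and that each `links` entry is a list of URL strings. `links_by_domain` is a hypothetical helper name.

```python
def links_by_domain(map_result):
    """Group the links from a decoded map result by domain, dropping duplicates."""
    grouped = {}
    for sitemap in map_result["sitemaps"]:
        seen = grouped.setdefault(sitemap["domain"], [])
        for link in sitemap["links"]:
            if link not in seen:  # keep first occurrence, preserve order
                seen.append(link)
    return grouped
```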

`get_execution`

Args: { id: string } → ExecutionRow. Equivalent to GET /executions/:id.

`list_executions`

Args: { status?, batchId?, limit? } → ExecutionRow[]. Equivalent to GET /executions.
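Since scrape results may carry a batchId and list_executions accepts one, the two tools can be chained. The sketch below builds one list_executions argument object per distinct batchId. It assumes the scrape results are already decoded into dicts, and `batch_lookup_args` is a hypothetical helper name.

```python
def batch_lookup_args(scrape_results, limit=None):
    """Build one list_executions argument object per distinct batchId."""
    batch_ids = []
    for row in scrape_results:
        batch_id = row.get("batchId")  # optional in scrape results
        if batch_id is not None and batch_id not in batch_ids:
            batch_ids.append(batch_id)

    lookups = []
    for batch_id in batch_ids:
        args = {"batchId": batch_id}
        if limit is not None:
            args["limit"] = limit
        lookups.append(args)
    return lookups
```

Each returned dict is meant to be sent as the arguments of a list_executions tool call.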