Browser scraping API

Turn any page into clean, structured data.

Selectors when you can, AI when you can’t — one endpoint, every site. Run it on real, self-hosted Chromium and get JSON back inline.

1,000 free credits / month · no card required

Real self-hosted Chromium · MCP-native · Typed action DSL + AI mode · Stripe billing · Credits that scale

Why ScrapeSilo

Built for the page you actually have to scrape.

A real browser, self-hosted

Scrapes run on actual Chromium through your own browser node — JavaScript, SPAs and anti-bot pages render the way a user sees them. No emulated DOM, no flaky headless tricks.

Selectors when you can, AI when you can’t

Hand-write a typed action DSL for pages you know, or pass a plain-English query and let the model emit a validated plan. Same endpoint, same schema, one mental model.

AI plans that pay for themselves

The first AI run generates an extraction plan; every run after reuses it — keyed per URL, healed when a page drifts. You pay the model once, then scrape at selector speed.

One endpoint, five engines

Pick your trade-off. Pay only for what it costs.

Every engine accepts the same action schema. Credits are charged per URL and weighted by what each run actually costs us.

html 1cr

Static HTML over plain fetch — the cheap floor.

ai-html 3cr

Free-form query resolved on the static DOM.

browser 5cr

Real Chromium for JS, SPAs & anti-bot pages.

ai 5cr

English query, auto-routed to a live browser.

ai-browser 10cr

AI planning on a fully rendered page.

map 1cr

Discover a site’s URLs via sitemap & robots.
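
As a rough illustration of the per-URL weighting, the sketch below totals the credits for a mixed workload. The weights are copied from the engine list above (map is left out because the page does not say what its 1cr is charged per). The helper name `estimate_credits` and the `(engine, url_count)` job format are made up for this example and are not part of the ScrapeSilo API.

```python
# Per-URL credit weights, copied from the engine list above.
ENGINE_CREDITS = {
    "html": 1,
    "ai-html": 3,
    "browser": 5,
    "ai": 5,
    "ai-browser": 10,
}


def estimate_credits(jobs):
    """Sum the credits for (engine, url_count) pairs. Hypothetical helper."""
    total = 0
    for engine, url_count in jobs:
        if engine not in ENGINE_CREDITS:
            raise ValueError(f"unknown engine: {engine}")
        if url_count < 0:
            raise ValueError("url_count must not be negative")
        total += ENGINE_CREDITS[engine] * url_count
    return total


# 40 static pages plus 10 rendered pages: 40*1 + 10*5
print(estimate_credits([("html", 40), ("browser", 10)]))  # 90
```

Running the numbers before a large job makes it easier to see when a cheaper engine is worth trying first.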

The hard parts

The stuff that breaks scrapers — handled.

Real pages fight back. ScrapeSilo absorbs the failure modes so your extraction code stays boring.

JavaScript & SPAs

The browser engine drives real Chromium, so client-rendered pages return the DOM a user actually sees.

Anti-bot defenses

A stealth browser plus a proxy hop means most bot checks see a genuine session, not a headless tell.

Proxies & geo-targeting

Route through a residential pool, target a country with a two-letter code, or bring your own proxy.

Per-URL resilience

Send up to 50 URLs per call; each one succeeds or fails on its own, so a single bad page never sinks the batch.

Authenticated pages

Capture a logged-in session once, then replay its cookies and storage on every later scrape.

Structured output

Extract text, attributes, tables, clean Markdown, JSON paths or screenshots — typed, not scraped-and-prayed.
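
The 50-URL limit under "Per-URL resilience" means longer URL lists have to be split client-side. Here is a minimal sketch of that split. The helper name `batches` is illustrative, and because this page does not show how several URLs are passed in one request body, the example stops at producing the slices.

```python
from typing import Iterator, List

MAX_URLS_PER_CALL = 50  # limit stated under "Per-URL resilience"


def batches(urls: List[str], size: int = MAX_URLS_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


urls = [f"https://example.com/p/{i}" for i in range(120)]
print([len(batch) for batch in batches(urls)])  # [50, 50, 20]
```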

Looks like this

From request to clean JSON, inline.

Deterministic CSS/XPath. Cheapest, fastest, exact.

request

curl
curl -X POST https://api.scrapesilo.com/scrape \
  -H "Authorization: Bearer sf_live_…" \
  -H "Content-Type: application/json" \
  -d '{
  "url": "https://example.com",
  "engine": "html",
  "actions": {
    "title": "css=h1@text",
    "body": "css=p@text"
  }
}'

python
import requests

resp = requests.post(
    "https://api.scrapesilo.com/scrape",
    headers={"Authorization": "Bearer sf_live_…"},
    json={
        "url": "https://example.com",
        "engine": "html",
        "actions": {
            "title": "css=h1@text",
            "body": "css=p@text"
        }
    },
)
print(resp.json())

javascript
const res = await fetch("https://api.scrapesilo.com/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer sf_live_…",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "url": "https://example.com",
    "engine": "html",
    "actions": {
      "title": "css=h1@text",
      "body": "css=p@text"
    }
  }),
});

const data = await res.json();

typescript
type ScrapeResponse = { url: string; data: unknown; tookMs: number };

const res = await fetch("https://api.scrapesilo.com/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer sf_live_…",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "url": "https://example.com",
    "engine": "html",
    "actions": {
      "title": "css=h1@text",
      "body": "css=p@text"
    }
  }),
});

const { data } = (await res.json()) as ScrapeResponse;

response
{
  "url": "https://example.com",
  "data": {
    "title": "Example Domain",
    "body": "This domain is for use in examples…"
  },
  "tookMs": 312
}
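
The action strings in these examples all appear to follow a `kind=selector@extract` shape, such as `css=h1@text`. Checking that shape locally before sending a request can catch typos early. The parser below is only inferred from the examples on this page; the real DSL may accept forms it rejects, and the names `Action` and `parse_action` are hypothetical.

```python
from typing import NamedTuple


class Action(NamedTuple):
    kind: str      # e.g. "css"
    selector: str  # e.g. ".titleline > a"
    extract: str   # e.g. "text", "href", "markdown"


def parse_action(spec: str) -> Action:
    """Split 'kind=selector@extract' (a shape inferred from this page's examples)."""
    # The first "=" ends the kind, so "=" inside the selector is left alone.
    kind, eq, rest = spec.partition("=")
    # The last "@" starts the extractor, so "@" inside the selector is left alone.
    selector, at, extract = rest.rpartition("@")
    if not (eq and at and kind and selector and extract):
        raise ValueError(f"unrecognized action: {spec!r}")
    return Action(kind, selector, extract)


print(parse_action("css=h1@text"))
# Action(kind='css', selector='h1', extract='text')
```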

Describe it in English. We generate the plan and cache it.

request

curl
curl -X POST https://api.scrapesilo.com/scrape \
  -H "Authorization: Bearer sf_live_…" \
  -H "Content-Type: application/json" \
  -d '{
  "url": "https://news.ycombinator.com",
  "engine": "ai",
  "query": "top 10 stories: title, link, points, author"
}'

python
import requests

resp = requests.post(
    "https://api.scrapesilo.com/scrape",
    headers={"Authorization": "Bearer sf_live_…"},
    json={
        "url": "https://news.ycombinator.com",
        "engine": "ai",
        "query": "top 10 stories: title, link, points, author"
    },
)
print(resp.json())

javascript
const res = await fetch("https://api.scrapesilo.com/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer sf_live_…",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "url": "https://news.ycombinator.com",
    "engine": "ai",
    "query": "top 10 stories: title, link, points, author"
  }),
});

const data = await res.json();

typescript
type ScrapeResponse = { url: string; data: unknown; tookMs: number };

const res = await fetch("https://api.scrapesilo.com/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer sf_live_…",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "url": "https://news.ycombinator.com",
    "engine": "ai",
    "query": "top 10 stories: title, link, points, author"
  }),
});

const { data } = (await res.json()) as ScrapeResponse;

response
{
  "url": "https://news.ycombinator.com",
  "data": [ /* 10 story objects */ ],
  "tookMs": 2847,
  "generatedActions": {
    "title":  "css=.titleline > a@text",
    "link":   "css=.titleline > a@href",
    "points": "css=.score@text",
    "author": "css=.hnuser@text"
  },
  "aiIterations": 1
}
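
Because the AI response returns the plan it generated, that plan can in principle be lifted into a deterministic selector request. The sketch below builds such a request body from a response shaped like the one above. It assumes that `generatedActions` can be sent back unchanged as `actions`, and that the html engine is sufficient for the page; neither is confirmed here, and whether the same map yields a list of items in selector mode is not shown. The helper name `to_selector_request` is hypothetical.

```python
def to_selector_request(ai_response: dict, engine: str = "html") -> dict:
    """Build a selector-mode request body from an AI-mode response.

    Assumption: `generatedActions` is accepted as-is in the `actions` field.
    """
    actions = ai_response.get("generatedActions")
    if not actions:
        raise ValueError("response has no generatedActions to reuse")
    return {
        "url": ai_response["url"],
        "engine": engine,
        "actions": dict(actions),  # copy so the caller's response is not mutated
    }


ai_response = {
    "url": "https://news.ycombinator.com",
    "generatedActions": {
        "title": "css=.titleline > a@text",
        "link": "css=.titleline > a@href",
    },
}
print(to_selector_request(ai_response))
```

Storing the returned body alongside your own code gives you a selector-based request to fall back on, independent of the service-side plan cache.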

Real Chromium for JS-heavy, anti-bot, SPA pages.

request

curl
curl -X POST https://api.scrapesilo.com/scrape \
  -H "Authorization: Bearer sf_live_…" \
  -H "Content-Type: application/json" \
  -d '{
  "url": "https://example-spa.com",
  "engine": "browser",
  "actions": {
    "title": "css=h1@text",
    "content": "css=main@markdown"
  },
  "options": {
    "waitFor": "networkidle",
    "blockAds": true
  }
}'

python
import requests

resp = requests.post(
    "https://api.scrapesilo.com/scrape",
    headers={"Authorization": "Bearer sf_live_…"},
    json={
        "url": "https://example-spa.com",
        "engine": "browser",
        "actions": {
            "title": "css=h1@text",
            "content": "css=main@markdown"
        },
        "options": {
            "waitFor": "networkidle",
            "blockAds": True
        }
    },
)
print(resp.json())

javascript
const res = await fetch("https://api.scrapesilo.com/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer sf_live_…",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "url": "https://example-spa.com",
    "engine": "browser",
    "actions": {
      "title": "css=h1@text",
      "content": "css=main@markdown"
    },
    "options": {
      "waitFor": "networkidle",
      "blockAds": true
    }
  }),
});

const data = await res.json();

typescript
type ScrapeResponse = { url: string; data: unknown; tookMs: number };

const res = await fetch("https://api.scrapesilo.com/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer sf_live_…",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "url": "https://example-spa.com",
    "engine": "browser",
    "actions": {
      "title": "css=h1@text",
      "content": "css=main@markdown"
    },
    "options": {
      "waitFor": "networkidle",
      "blockAds": true
    }
  }),
});

const { data } = (await res.json()) as ScrapeResponse;

response
{
  "url": "https://example-spa.com",
  "data": {
    "title": "Pricing — Acme",
    "content": "# Pricing\n\nSimple, usage-based…"
  },
  "tookMs": 1840,
  "antibot": "passed"
}
Built for

What teams run through ScrapeSilo.

One endpoint, a lot of jobs — every one of these is the same /scrape call with a different plan.

LLM & RAG ingestion

Turn any page into clean Markdown or typed JSON your model can embed — no boilerplate, no broken tags.

Price & catalog monitoring

Track competitor prices, stock and listings across thousands of product URLs in one batched call.

SERP & market research

Pull search results, rankings and SERP features at scale to feed dashboards, models and reports.

Lead enrichment

Resolve company and contact pages into structured firmographics for your CRM or GTM stack.

Content migration

Map a whole site, then extract every page into structured records in one coordinated pass.

Agent web access

Give Claude or any MCP client live, key-scoped read access to the web — no glue code.

A whole console, not just an endpoint

Everything around the scrape.

Playground

Fire a one-off scrape with full engine and option control, right from the browser.

Live recorder

Watch a browser session stream in real time as it runs.

Execution history

Every run recorded with request, result and per-URL errors.

Trace replay

Recorded sessions saved as Playwright traces you can scrub through.

MCP-native

Call scrape, get_execution and list_executions straight from Claude or any MCP client.

Proxy + geo

Built-in proxy pool with country targeting, or bring your own.

Concurrency ceilings

Per-plan limits with clean backpressure — never a silent stampede.

Markdown & attributes

Extract text, attributes, HTML or clean Markdown — many elements at once.
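
A client can stay under its plan's concurrency ceiling by gating its own requests. The sketch below does that with an `asyncio.Semaphore`. The limits are the ones listed in the pricing section; `run_limited` and `fake_scrape` are illustrative names, and `fake_scrape` stands in for a real HTTP call so the example runs offline.

```python
import asyncio

# Concurrent scrapes per plan, as listed in the pricing section.
PLAN_CONCURRENCY = {"free": 2, "starter": 5, "scale": 40}


async def run_limited(urls, scrape_one, limit):
    """Run scrape_one(url) for every URL with at most `limit` in flight."""
    gate = asyncio.Semaphore(limit)

    async def guarded(url):
        async with gate:
            return await scrape_one(url)

    # return_exceptions=True keeps one failing URL from cancelling the rest.
    return await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)


async def fake_scrape(url):
    """Stand-in for a real request; replace with your own HTTP client call."""
    await asyncio.sleep(0.01)
    return {"url": url}


results = asyncio.run(
    run_limited(["a", "b", "c"], fake_scrape, PLAN_CONCURRENCY["free"])
)
print(len(results))  # 3
```

Results come back in the same order as the input URLs, with any raised exception returned in place of that URL's result.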

  • ~200ms typical HTML response
  • 2–5s full Chromium render
  • 50 URLs per request
  • 5 engines, one API

MCP-native

Scrape straight from Claude.

Point any MCP client at your account and the model can scrape pages, list runs and inspect results with your key — no glue code.

See the MCP docs
POST /mcp/scrape
Authorization: Bearer sf_live_…

tools:
  • scrape
  • list_executions
  • get_execution
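
For readers wiring this up by hand, MCP tool invocations are JSON-RPC 2.0 messages with the method `tools/call`. The sketch below builds such a message for the `scrape` tool. The argument names mirror the /scrape request body and are an assumption; the tool's actual input schema is whatever the server reports via `tools/list`. In practice an MCP client library handles this framing, along with the initialization handshake, for you.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, unique within this process


def mcp_tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Assumed argument names, copied from the /scrape examples on this page.
request = mcp_tool_call("scrape", {"url": "https://example.com", "engine": "html"})
print(json.dumps(request, indent=2))
```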
Pricing

Start free. Pay by the credit, not the seat.

Credits reset monthly. Per-credit cost drops as you scale — and AI plan caching means you’re mostly paying selector prices anyway.

Free

Kick the tyres.

$0

  • 1,000 credits / month
  • 2 concurrent scrapes
  • All engines + AI mode
  • MCP access
Start free

Starter

For steady pipelines.

$49/mo

  • 25,000 credits / month
  • 5 concurrent scrapes
  • AI plan caching
  • Execution history
Choose Starter

Scale

High-volume extraction.

$499/mo

  • 750,000 credits / month
  • 40 concurrent scrapes
  • Lowest per-credit cost
  • Priority throughput
Choose Scale

Per-URL credit cost: html 1 · ai-html 3 · browser 5 · ai 5 · ai-browser 10
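
To make "per-credit cost drops as you scale" concrete, the sketch below turns the plan prices and credit weights on this page into a cost per 1,000 URLs. It assumes the whole monthly allowance is used, so real effective costs will be higher if credits go unspent. The function name `cost_per_1k_urls` is illustrative.

```python
# (monthly price in USD, monthly credits), copied from the paid plans above.
PLANS = {"Starter": (49, 25_000), "Scale": (499, 750_000)}

# Per-URL credit weights, copied from the line above.
ENGINE_CREDITS = {"html": 1, "ai-html": 3, "browser": 5, "ai": 5, "ai-browser": 10}


def cost_per_1k_urls(plan: str, engine: str) -> float:
    """USD for 1,000 URLs on `engine`, assuming the full allowance is used."""
    price, credits = PLANS[plan]
    return round(price / credits * ENGINE_CREDITS[engine] * 1000, 2)


for plan in PLANS:
    print(plan, {engine: cost_per_1k_urls(plan, engine) for engine in ENGINE_CREDITS})
```

On these figures, 1,000 html URLs work out to about $1.96 on Starter and about $0.67 on Scale.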

FAQ

The fine print, up front.

What’s a credit?

One unit of scrape cost, charged per URL and weighted by engine: html 1, ai-html 3, browser 5, ai 5, ai-browser 10. A multi-URL request costs the sum of its per-URL charges. Credits reset monthly and don’t roll over.

Which engine should I use?

Start with html: it’s the cheapest and works for static pages. Reach for browser when a page needs JavaScript or trips an anti-bot check. Use ai or ai-html when you’d rather describe the data than write selectors.
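
One way to follow that advice in code is to try the cheap engine first and escalate only when it returns nothing. The sketch below does this with an injected `send` function so it runs without network access. The emptiness check, the helper names, and the assumption that an unrendered page yields empty fields rather than an error are all illustrative. Each attempt may be charged, so a fallback can cost the sum of both engines.

```python
from typing import Callable, Sequence


def has_data(result: dict) -> bool:
    """True when at least one extracted field is non-empty."""
    data = result.get("data")
    if isinstance(data, dict):
        return any(value not in (None, "", [], {}) for value in data.values())
    return bool(data)


def scrape_with_fallback(
    body: dict,
    send: Callable[[dict], dict],
    engines: Sequence[str] = ("html", "browser"),
) -> dict:
    """Try each engine in order; return the first result that has data."""
    if not engines:
        raise ValueError("at least one engine is required")
    result: dict = {}
    for engine in engines:
        result = send({**body, "engine": engine})
        if has_data(result):
            return {**result, "engine": engine}
    # Nothing produced data; hand back the last attempt for inspection.
    return {**result, "engine": engines[-1]}


def fake_send(body: dict) -> dict:
    """Stand-in for the HTTP call: pretends the <h1> only exists after JS runs."""
    title = "Pricing" if body["engine"] == "browser" else ""
    return {"url": body["url"], "data": {"title": title}}


result = scrape_with_fallback(
    {"url": "https://example-spa.com", "actions": {"title": "css=h1@text"}},
    fake_send,
)
print(result["engine"])  # browser
```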

Does it handle anti-bot pages?

The browser engine drives real Chromium through a self-hosted node with a proxy hop and ad/tracker blocking, so most bot checks see a genuine browser.

Can I call it from an LLM?

Yes. Every account exposes an MCP endpoint — POST /mcp/scrape — so Claude Code and other MCP clients can scrape, list and inspect executions with your API key.

Can I cancel anytime?

Yes. Billing is handled by Stripe; manage or cancel from the billing portal. The Free plan is never charged.

Do I have to write selectors?

No. Pass a plain-English query in AI mode and we generate a validated action plan, return it in the response, and cache it — so you can lift it into deterministic selectors whenever you want.

Point it at a URL. Get JSON back.

Spin up in minutes with 1,000 free credits — no card, every engine, full console.