
POST /scrape

The main endpoint. Validates the request, runs the scrape on the html or browser engine (driving a real Chromium browser when needed), records it to executions, and returns the result once it completes.

POST /scrape
Authorization: Bearer <token>
Content-Type: application/json

Body (any field marked optional may be omitted):

{
  url: string | string[]; // one http(s) URL, or an array (see Multiple URLs below)
  engine: 'html' | 'browser' | 'ai' | 'ai-html' | 'ai-browser';
  // Exactly one of `actions` (deterministic engines) or `query` (AI engines)
  actions?: BrowserActions; // see /docs/actions-dsl
  query?: string; // max 2000 chars
  sessionId?: string; // replay a recorded auth session; see /docs/engines#authenticated-sessions
  force?: boolean; // bypass any cached AI plan for this URL+query
  options?: {
    waitFor?: 'load' | 'domcontentloaded' | 'networkidle' | 'commit' | number | string; // lifecycle, ms, or CSS selector
    timeoutMs?: number; // capped at 120000
    resolution?: 'desktop' | 'mobile' | { width: number; height: number }; // browser only
    headless?: boolean; // browser only, default true
    blockAds?: boolean; // browser only, default true
  };
  // Proxy (all optional). With useProxy=true and no BYO fields, uses the built-in
  // residential pool. See /docs/engines#proxies.
  useProxy?: boolean | string; // true = built-in pool; "us" (ISO 3166-1 alpha-2) geo-targets it
  myProxyUrl?: string; // BYO: "http://user:pass@host:port"
  myProxyConfig?: { // BYO structured form
    server: string;
    username?: string;
    password?: string;
  };
}

The server rejects a request whose engine is an AI engine but has no query, or whose engine is deterministic but has no actions. Sending both actions and query in the same request returns 400.
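The rule above can be checked on the client before spending a round trip. The following is a minimal sketch, not the server's actual validator: the function name `checkActionsOrQuery` and the `AI_ENGINES` set are illustrative, and it assumes that `ai`, `ai-html`, and `ai-browser` are the AI engines while `html` and `browser` are the deterministic ones.

```typescript
// Illustrative client-side check; names are hypothetical, not part of the API.
type Engine = 'html' | 'browser' | 'ai' | 'ai-html' | 'ai-browser';

interface ScrapeBodyLike {
  engine: Engine;
  actions?: unknown; // BrowserActions, left opaque in this sketch
  query?: string;
}

// Assumption: these three engines are the ones that take `query`.
const AI_ENGINES: ReadonlySet<Engine> = new Set<Engine>(['ai', 'ai-html', 'ai-browser']);

// Returns a list of problems; an empty list means the body satisfies the rule.
function checkActionsOrQuery(body: ScrapeBodyLike): string[] {
  const problems: string[] = [];
  const hasActions = body.actions !== undefined;
  const hasQuery = body.query !== undefined;
  const isAi = AI_ENGINES.has(body.engine);

  if (hasActions && hasQuery) {
    problems.push('send either actions or query, not both');
  }
  if (isAi && !hasQuery) {
    problems.push(`engine "${body.engine}" requires query`);
  }
  if (!isAi && !hasActions) {
    problems.push(`engine "${body.engine}" requires actions`);
  }
  if (typeof body.query === 'string' && body.query.length > 2000) {
    problems.push('query exceeds 2000 characters');
  }
  return problems;
}
```

A caller would typically refuse to send the request when the returned list is non-empty. The server remains the source of truth, so a 400 can still come back for reasons this check does not cover.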

Multiple URLs

url may be an array. The same actions/query run against each URL, fanned out through a bounded per-engine pool, and you get one result item per URL: a bad URL fails on its own without sinking the batch. Each URL is charged separately. Per-request caps: html 50, browser/ai/ai-html/ai-browser 10. See Engines → Multiple URLs.
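When a URL list is longer than the per-request cap, it has to be split across several requests. The sketch below shows one way to do that on the client; `URL_CAPS` mirrors the caps listed above, and `batchUrls` is a hypothetical helper name, not something the API provides.

```typescript
// Per-request URL caps, copied from the limits stated above.
const URL_CAPS = {
  html: 50,
  browser: 10,
  ai: 10,
  'ai-html': 10,
  'ai-browser': 10,
} as const;

type CappedEngine = keyof typeof URL_CAPS;

// Splits a URL list into batches that each fit within the engine's cap.
// Order is preserved; the last batch may be shorter than the cap.
function batchUrls(urls: string[], engine: CappedEngine): string[][] {
  const cap = URL_CAPS[engine];
  const batches: string[][] = [];
  for (let i = 0; i < urls.length; i += cap) {
    batches.push(urls.slice(i, i + cap));
  }
  return batches;
}
```

Each batch would then be sent as the `url` array of its own request. Since every URL is charged separately, batching changes the number of requests but not the cost.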

/scrape blocks until the run finishes, then returns the result directly — there is no async/polling mode:

{
  url: string; // final URL, after redirects
  data: Record<string, unknown> | null;
  tookMs: number;
  generatedActions?: BrowserActions; // AI engines only: the plan the model emitted
  iterations?: number; // AI engines only: plan validation/refine passes
}

Every call also creates an execution row (full request, result, error, timestamps) — inspect it later via the Executions API. The response itself does not include the execution id.
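The request headers and response shape described above can be wrapped in a small typed client. This is a sketch under stated assumptions: `buildScrapeRequest`, `scrape`, and `FetchLike` are illustrative names, `BrowserActions` is treated as opaque, and the fetch implementation is passed in so the example does not depend on any particular runtime's typings.

```typescript
// Response shape copied from the reference above; BrowserActions is left opaque.
interface ScrapeResult {
  url: string;
  data: Record<string, unknown> | null;
  tookMs: number;
  generatedActions?: unknown;
  iterations?: number;
}

interface RequestInitLike {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal structural type for a fetch-compatible function (hypothetical name).
type FetchLike = (
  url: string,
  init: RequestInitLike,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the URL and init object for POST /scrape. Pure, so it is easy to inspect.
function buildScrapeRequest(
  baseUrl: string,
  token: string,
  body: object,
): { url: string; init: RequestInitLike } {
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/scrape`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request and waits for the run to finish; there is nothing to poll.
async function scrape(
  fetchImpl: FetchLike,
  baseUrl: string,
  token: string,
  body: object,
): Promise<ScrapeResult> {
  const { url, init } = buildScrapeRequest(baseUrl, token, body);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`scrape failed with HTTP ${res.status}`);
  }
  return (await res.json()) as ScrapeResult;
}
```

In a runtime with a global `fetch`, that function can be passed as `fetchImpl`. Because the call blocks for the whole run, any client-side timeout should be longer than the `timeoutMs` sent in the request.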

| Status | Body | Cause |
| --- | --- | --- |
| 400 | { "error": "invalid scrape request", "issues": [...] } | The body failed validation. issues[] lists field paths. |
| 401 | { "error": "invalid api key" } | Bearer didn’t resolve. |
| 402 | { "error": "credit quota exceeded", "creditsLimit", "creditsUsed", "creditsRemaining" } | Request would exceed your monthly credit pool. See Pricing & quotas. |
| 429 | { "error": "concurrency limit exceeded", "concurrencyLimit" } | Too many scrapes in flight. Retry-After: 5 header set. |
| 5xx | { "error": "<message>" } | Engine or browser-node failure. The execution row still exists with the error captured. |
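The status codes above map naturally onto a small decision function on the client. The sketch below is one possible mapping, not prescribed behavior: `classifyScrapeError` and the `ScrapeErrorAction` variants are hypothetical names, and only 429 is treated as retryable because that is the only status documented with a Retry-After header.

```typescript
// Hypothetical action type describing what a caller might do next.
type ScrapeErrorAction =
  | { kind: 'retry'; delayMs: number } // 429: wait, then resend
  | { kind: 'fix-request' } // 400: inspect issues[] and correct the body
  | { kind: 'check-credentials' } // 401: token did not resolve
  | { kind: 'out-of-credits' } // 402: monthly credit pool exhausted
  | { kind: 'server-error' } // 5xx: engine or browser-node failure
  | { kind: 'unknown' };

// Maps an HTTP status and the raw Retry-After header value to an action.
function classifyScrapeError(status: number, retryAfter: string | null): ScrapeErrorAction {
  if (status === 429) {
    // Retry-After is read as a number of seconds; anything unparsable
    // (missing, empty, or an HTTP date) falls back to the documented 5 s.
    const seconds = Number(retryAfter);
    const wait = Number.isFinite(seconds) && seconds > 0 ? seconds : 5;
    return { kind: 'retry', delayMs: wait * 1000 };
  }
  if (status === 400) return { kind: 'fix-request' };
  if (status === 401) return { kind: 'check-credentials' };
  if (status === 402) return { kind: 'out-of-credits' };
  if (status >= 500 && status <= 599) return { kind: 'server-error' };
  return { kind: 'unknown' };
}
```

Whether a 5xx is worth retrying depends on the failure, which is why this sketch leaves that decision to the caller. The execution row recorded for the failed call holds the captured error.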
curl -sS -X POST https://api.scrapesilo.com/scrape \
  -H "Authorization: Bearer sf_…" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "engine": "html",
    "actions": { "title": "css=h1@text" }
  }'

{
  "url": "https://example.com",
  "data": { "title": "Example Domain" },
  "tookMs": 287
}
curl -sS -X POST https://api.scrapesilo.com/scrape \
  -H "Authorization: Bearer sf_…" -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "engine": "ai",
    "query": "top 10 stories: title, link, points, author"
  }'

{
  "url": "",
  "data": { "stories": [ ] },
  "tookMs": 4120,
  "generatedActions": { "stories": { } },
  "iterations": 1
}

Save generatedActions and POST it back as actions with a deterministic engine on the next run to avoid the AI cost.
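Turning a saved AI response into a deterministic request can be sketched as follows. `toReplayBody` is a hypothetical helper, and the choice between `html` and `browser` is left to the caller because the response does not say which engine the plan was built for.

```typescript
type DeterministicEngine = 'html' | 'browser';

interface ReplayBody {
  url: string | string[];
  engine: DeterministicEngine;
  actions: unknown; // the saved plan, passed through unchanged
}

// Builds a deterministic request body from the plan returned by an AI run.
// Throws when the response carries no plan, e.g. it came from a non-AI engine.
function toReplayBody(
  url: string | string[],
  aiResponse: { generatedActions?: unknown },
  engine: DeterministicEngine,
): ReplayBody {
  if (aiResponse.generatedActions === undefined) {
    throw new Error('response has no generatedActions to replay');
  }
  return { url, engine, actions: aiResponse.generatedActions };
}
```

The resulting body contains `actions` and no `query`, so it satisfies the exactly-one-of rule for deterministic engines. If the page layout changes and the saved plan stops matching, rerunning the original AI request would produce a fresh plan.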