POST /scrape
The main endpoint. Validates the request, runs the scrape on the html or browser engine (driving a real Chromium browser when needed), records it to executions, and returns the result once it completes.
Request
Section titled “Request”POST /scrapeAuthorization: Bearer <token>Content-Type: application/jsonBody (any field marked optional may be omitted):
{ url: string | string[]; // one http(s) URL, or an array (see Multiple URLs below) engine: 'html' | 'browser' | 'ai' | 'ai-html' | 'ai-browser';
// Exactly one of `actions` (deterministic engines) or `query` (AI engines) actions?: BrowserActions; // see /docs/actions-dsl query?: string; // max 2000 chars
sessionId?: string; // replay a recorded auth session — see /docs/engines#authenticated-sessions force?: boolean; // bypass any cached AI plan for this URL+query options?: { waitFor?: 'load' | 'domcontentloaded' | 'networkidle' | 'commit' | number | string; // lifecycle, ms, or CSS selector timeoutMs?: number; // capped at 120000 resolution?: 'desktop' | 'mobile' | { width: number; height: number }; // browser only headless?: boolean; // browser only, default true blockAds?: boolean; // browser only, default true };
// Proxy (all optional). With useProxy=true and no BYO fields, uses the built-in // residential pool. See /docs/engines#proxies. useProxy?: boolean | string; // true = built-in pool; "us" (ISO 3166-1 alpha-2) geo-targets it myProxyUrl?: string; // BYO: "http://user:pass@host:port" myProxyConfig?: { // BYO structured form server: string; username?: string; password?: string; };}The server rejects requests where engine is AI but query is missing, or where engine is deterministic but actions is missing. Both presented together → 400.
Multiple URLs
Section titled “Multiple URLs”url may be an array. The same actions/query run against each URL, fanned out through a bounded per-engine pool, and you get one result item per URL — a bad URL fails on its own without sinking the batch. Each URL is charged separately. Per-request caps: html 50, browser/ai/ai-html/ai-browser 10. See Engines → Multiple URLs.
Response
Section titled “Response”/scrape blocks until the run finishes, then returns the result directly — there is no async/polling mode:
{ url: string; // final URL, after redirects data: Record<string, unknown> | null; tookMs: number; generatedActions?: BrowserActions; // AI engines only — the plan the model emitted iterations?: number; // AI engines only — plan validation/refine passes}Every call also creates an execution row (full request, result, error, timestamps) — inspect it later via the Executions API. The response itself does not include the execution id.
Errors
Section titled “Errors”| Status | Body | Cause |
|---|---|---|
400 | { "error": "invalid scrape request", "issues": [...] } | The body failed validation. issues[] lists field paths. |
401 | { "error": "invalid api key" } | Bearer didn’t resolve. |
402 | { "error": "credit quota exceeded", "creditsLimit", "creditsUsed", "creditsRemaining" } | Request would exceed your monthly credit pool. See Pricing & quotas. |
429 | { "error": "concurrency limit exceeded", "concurrencyLimit" } | Too many scrapes in flight. Retry-After: 5 header set. |
5xx | { "error": "<message>" } | Engine or browser-node failure. The execution row still exists with the error captured. |
Example: deterministic
Section titled “Example: deterministic”curl -sS -X POST https://api.scrapesilo.com/scrape \ -H "Authorization: Bearer sf_…" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com", "engine": "html", "actions": { "title": "css=h1@text" } }'{ "url": "https://example.com", "data": { "title": "Example Domain" }, "tookMs": 287}Example: AI
Section titled “Example: AI”curl -sS -X POST https://api.scrapesilo.com/scrape \ -H "Authorization: Bearer sf_…" -H "Content-Type: application/json" \ -d '{ "url": "https://news.ycombinator.com", "engine": "ai", "query": "top 10 stories: title, link, points, author" }'{ "url": "…", "data": { "stories": [ … ] }, "tookMs": 4120, "generatedActions": { "stories": { … } }, "iterations": 1}Save generatedActions, POST it back as actions next time, drop the AI cost.