POST /scrape

The main endpoint. Validates the request, runs the scrape on the html or browser engine (driving a real Chromium browser when needed), records it to executions, and returns the result once it completes.

Request

POST /scrape
Authorization: Bearer <token>
Content-Type: application/json

Body (any field marked optional may be omitted):

{
  url: string | string[];   // one http(s) URL, or an array (see Multiple URLs below)
  engine: 'html' | 'browser' | 'ai' | 'ai-html' | 'ai-browser';

  // Exactly one of `actions` (deterministic engines) or `query` (AI engines)
  actions?: BrowserActions;   // see /docs/actions-dsl
  query?:   string;           // max 2000 chars

  sessionId?: string;         // replay a recorded auth session — see /docs/engines#authenticated-sessions
  force?:     boolean;        // bypass any cached AI plan for this URL+query
  options?: {
    waitFor?:    'load' | 'domcontentloaded' | 'networkidle' | 'commit' | number | string; // lifecycle, ms, or CSS selector
    timeoutMs?:  number;      // capped at 120000
    resolution?: 'desktop' | 'mobile' | { width: number; height: number }; // browser only
    headless?:   boolean;     // browser only, default true
    blockAds?:   boolean;     // browser only, default true
  };

  // Proxy (all optional). With useProxy=true and no BYO fields, uses the built-in
  // residential pool. See /docs/engines#proxies.
  useProxy?:     boolean | string;  // true = built-in pool; "us" (ISO 3166-1 alpha-2) geo-targets it
  myProxyUrl?:   string;      // BYO: "http://user:pass@host:port"
  myProxyConfig?: {           // BYO structured form
    server:   string;
    username?: string;
    password?: string;
  };
}

The server rejects requests where the engine is an AI engine but query is missing, or where the engine is deterministic but actions is missing. Sending both actions and query in the same request returns 400.
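The rule above can be checked client-side before sending a request. The following is a minimal sketch, not the server's actual validator: the names `ScrapeInput`, `AI_ENGINES`, and `checkActionsOrQuery` are hypothetical, and `actions` is typed loosely because the BrowserActions shape is documented elsewhere.

```typescript
// Illustrative pre-flight check for the actions/query rule.
// All names below are assumptions made for this sketch.
type Engine = 'html' | 'browser' | 'ai' | 'ai-html' | 'ai-browser';

interface ScrapeInput {
  engine: Engine;
  actions?: unknown; // BrowserActions in the real schema
  query?: string;
}

const AI_ENGINES: ReadonlySet<Engine> = new Set<Engine>(['ai', 'ai-html', 'ai-browser']);

// Returns a list of problems; an empty list means the pair is acceptable.
function checkActionsOrQuery(input: ScrapeInput): string[] {
  const hasActions = input.actions !== undefined;
  const hasQuery = input.query !== undefined;

  if (hasActions && hasQuery) {
    return ['send either actions or query, not both'];
  }
  if (AI_ENGINES.has(input.engine)) {
    if (!hasQuery) return ['AI engines require query'];
    // The schema caps query at 2000 characters.
    if ((input.query ?? '').length > 2000) return ['query exceeds 2000 characters'];
    return [];
  }
  return hasActions ? [] : ['deterministic engines require actions'];
}
```

Running this before the POST saves a round trip for the most common validation mistakes; the server's 400 response and its `issues[]` list remain the authority.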

Multiple URLs

url may be an array. The same actions/query run against each URL, fanned out through a bounded per-engine pool, and you get one result item per URL — a bad URL fails on its own without sinking the batch. Each URL is charged separately. Per-request caps: html 50, browser/ai/ai-html/ai-browser 10. See Engines → Multiple URLs.
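If a URL list is longer than the per-request cap, one option is to split it into several requests on the client. This is a sketch under that assumption; `URL_CAPS` and `splitIntoBatches` are hypothetical names, and the cap values are copied from the paragraph above.

```typescript
// Illustrative helper: split a URL list into request-sized batches.
// The caps mirror the documented per-request limits; names are assumptions.
type Engine = 'html' | 'browser' | 'ai' | 'ai-html' | 'ai-browser';

const URL_CAPS: Record<Engine, number> = {
  html: 50,
  browser: 10,
  ai: 10,
  'ai-html': 10,
  'ai-browser': 10,
};

function splitIntoBatches(urls: string[], engine: Engine): string[][] {
  const cap = URL_CAPS[engine];
  const batches: string[][] = [];
  // Walk the list in steps of `cap`; the last batch may be shorter.
  for (let i = 0; i < urls.length; i += cap) {
    batches.push(urls.slice(i, i + cap));
  }
  return batches;
}
```

Each batch would then become the `url` array of its own POST /scrape call. Every URL is still charged separately, so batching changes the number of requests, not the cost.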

Response

/scrape blocks until the run finishes, then returns the result directly — there is no async/polling mode:

{
  url:  string;                        // final URL, after redirects
  data: Record<string, unknown> | null;
  tookMs: number;
  generatedActions?: BrowserActions;   // AI engines only — the plan the model emitted
  iterations?:       number;           // AI engines only — plan validation/refine passes
}

Every call also creates an execution row (full request, result, error, timestamps) — inspect it later via the Executions API. The response itself does not include the execution id.

Errors

Status	Body	Cause
`400`	`{ "error": "invalid scrape request", "issues": [...] }`	The body failed validation. `issues[]` lists field paths.
`401`	`{ "error": "invalid api key" }`	The bearer token did not resolve to a valid API key.
`402`	`{ "error": "credit quota exceeded", "creditsLimit", "creditsUsed", "creditsRemaining" }`	Request would exceed your monthly credit pool. See Pricing & quotas.
`429`	`{ "error": "concurrency limit exceeded", "concurrencyLimit" }`	Too many scrapes in flight. `Retry-After: 5` header set.
`5xx`	`{ "error": "<message>" }`	Engine or browser-node failure. The execution row still exists with the error captured.
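A client might map these statuses to a next step along the following lines. This is only a sketch: `Decision` and `decide` are hypothetical names, the 5-second fallback reuses the documented `Retry-After: 5` value, and treating 5xx as a hard failure rather than retrying is an assumption, since the table does not say whether such failures are transient.

```typescript
// Illustrative mapping from HTTP status to a client-side decision.
// Names and the retry policy are assumptions made for this sketch.
type Decision =
  | { kind: 'ok' }
  | { kind: 'retry'; afterMs: number }
  | { kind: 'fail'; reason: string };

function decide(status: number, retryAfterHeader: string | null): Decision {
  if (status >= 200 && status < 300) return { kind: 'ok' };

  if (status === 429) {
    // Retry-After is read as a number of seconds; anything else falls back to 5 s.
    const raw = retryAfterHeader === null ? '' : retryAfterHeader.trim();
    const seconds = Number(raw);
    const usable = raw !== '' && Number.isFinite(seconds) && seconds >= 0;
    return { kind: 'retry', afterMs: usable ? seconds * 1000 : 5000 };
  }

  if (status === 400) return { kind: 'fail', reason: 'invalid request body; see issues[]' };
  if (status === 401) return { kind: 'fail', reason: 'invalid api key' };
  if (status === 402) return { kind: 'fail', reason: 'credit quota exceeded' };
  if (status >= 500) return { kind: 'fail', reason: 'engine or browser-node failure; inspect the execution row' };
  return { kind: 'fail', reason: `unexpected status ${status}` };
}
```

Keeping the decision separate from the HTTP call makes the policy easy to test and to adjust, for example by adding a bounded retry for 5xx if that turns out to be safe for your workload.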

Example: deterministic

curl -sS -X POST https://api.scrapesilo.com/scrape \
  -H "Authorization: Bearer sf_…" \
  -H "Content-Type: application/json" \
  -d '{
        "url": "https://example.com",
        "engine": "html",
        "actions": { "title": "h1@text" }
      }'

{
  "url": "https://example.com",
  "data": { "title": "Example Domain" },
  "tookMs": 287
}

Example: AI

curl -sS -X POST https://api.scrapesilo.com/scrape \
  -H "Authorization: Bearer sf_…" -H "Content-Type: application/json" \
  -d '{
        "url": "https://news.ycombinator.com",
        "engine": "ai",
        "query": "top 10 stories: title, link, points, author"
      }'

{
  "url": "…",
  "data": { "stories": [ … ] },
  "tookMs": 4120,
  "generatedActions": { "stories": { … } },
  "iterations": 1
}

Save generatedActions and POST it back as actions with a deterministic engine (html or browser) next time to avoid the AI cost.
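The reuse step could look like the sketch below. `AiResult`, `DeterministicRequest`, and `toDeterministicRequest` are hypothetical names, and `generatedActions` is typed loosely because the BrowserActions shape is documented elsewhere. Which deterministic engine suits a given page is left to the caller, since this page does not say how the ai engine picks one.

```typescript
// Illustrative helper: turn an AI response into a deterministic follow-up request.
// All names are assumptions made for this sketch.
interface AiResult {
  url: string;
  generatedActions?: Record<string, unknown>; // present on AI engine responses
}

interface DeterministicRequest {
  url: string;
  engine: 'html' | 'browser';
  actions: Record<string, unknown>;
}

function toDeterministicRequest(
  url: string,
  result: AiResult,
  engine: 'html' | 'browser',
): DeterministicRequest {
  if (!result.generatedActions) {
    throw new Error('response has no generatedActions to reuse');
  }
  // The saved plan is sent back unchanged as `actions`; `query` is omitted.
  return { url, engine, actions: result.generatedActions };
}
```

The returned object can be serialized with `JSON.stringify` and sent as the body of the next POST /scrape call.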