Engines

The engine field on every scrape request decides how the page is fetched and how the DSL is executed.

At a glance

Engine	Fetch	JS execution	Latency	Cost	Use when
`html`	`fetch()`	No	~100–400 ms	1 credit	Static pages, SSR’d HTML, RSS, sitemaps, anything that renders without JS.
`browser`	Headless Chromium	Yes	~2–5 s	5 credits	SPAs, anti-bot defenses, login flows, anything that needs JS or interaction.
`ai` / `ai-html` / `ai-browser`	Same as `html` or `browser`	(decided by suffix)	+1–5 s for plan	3 / 5 / 10 credits	You don’t want to hand-write selectors.

Cost is per URL — see Pricing & quotas for the full table and how multi-URL requests are charged.

`html` — plain fetch

{
  "url": "https://example.com",
  "engine": "html",
  "actions": { "title": "h1@text" }
}

html fetches the page’s raw HTML and runs the action tree over the static DOM. No JS execution, no cookies, no waiting.

The html engine supports only two fn actions: wait and evaluate (evaluate runs synchronously here, with no async/Promise race). All other fn actions (goto, click, fill, selectOption, 2fa) are explicitly rejected; switch to browser if you need them.
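The rule above can be checked client-side before sending a request. The following is a minimal sketch, not part of the service: the helper names are hypothetical, and it assumes that a "fn" key only ever appears on action nodes, including nodes nested under "output".

```python
# fn names the text says the html engine accepts; all others are rejected.
HTML_ALLOWED_FNS = {"wait", "evaluate"}


def unsupported_html_fns(actions):
    """Return the sorted fn names in an action tree that html would reject."""
    found = set()

    def walk(node):
        if isinstance(node, dict):
            fn = node.get("fn")
            if isinstance(fn, str) and fn not in HTML_ALLOWED_FNS:
                found.add(fn)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(actions)
    return sorted(found)


def pick_engine(actions):
    """Use html when every fn is supported there, otherwise browser."""
    return "browser" if unsupported_html_fns(actions) else "html"
```

Running the check locally avoids spending a request on an action tree the html engine would refuse anyway.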

`browser` — real Chromium

{
  "url": "https://example.com",
  "engine": "browser",
  "actions": {
    "search": { "fn": "fill", "selector": "#q", "args": "playwright" },
    "submit": { "fn": "click", "selector": "button[type=submit]" },
    "results": {
      "selector": ".result",
      "many": true,
      "output": { "title": "h3@text", "link": "a@href" }
    }
  }
}

Full browser runtime: navigation, click/fill/select, evaluate (with async/timeout race), 2FA via otpauth, network throttling, and so on. Pages run in a real Chromium browser with stealth and ad-blocking on by default.

options.waitFor controls when the navigation is considered “done”. It accepts a lifecycle keyword ('networkidle' is the safest default for modern SPAs; 'domcontentloaded' is fastest), a number of milliseconds, or a CSS selector to wait for. options.timeoutMs caps the per-page run; the maximum is 120 s.
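As an illustration of the three waitFor forms, here is a small sketch that builds the options object and refuses timeouts above 120 s. The helper name and the 30 s default are assumptions made for the example, and whether the server clamps or rejects an oversized timeoutMs is not stated in the text.

```python
MAX_TIMEOUT_MS = 120_000  # 120 s, the per-page ceiling named in the text


def build_options(wait_for="networkidle", timeout_ms=30_000):
    """Build options; wait_for is a keyword, a ms count, or a CSS selector."""
    if isinstance(wait_for, bool) or not isinstance(wait_for, (str, int)):
        raise TypeError("wait_for must be a string or a number of milliseconds")
    if isinstance(wait_for, int) and wait_for < 0:
        raise ValueError("wait_for in ms must be non-negative")
    if not 0 < timeout_ms <= MAX_TIMEOUT_MS:
        raise ValueError(f"timeout_ms must be in 1..{MAX_TIMEOUT_MS}")
    return {"waitFor": wait_for, "timeoutMs": timeout_ms}


# Wait for the first result node, and allow the page up to 60 s.
request = {
    "url": "https://example.com",
    "engine": "browser",
    "options": build_options(wait_for=".result", timeout_ms=60_000),
    "actions": {"title": "h1@text"},
}
```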

Browser-only options

These options fields are honoured by browser / ai-browser and ignored by the html engines:

Field	Default	What it does
`resolution`	`desktop` (1280×800)	Viewport: `"desktop"`, `"mobile"` (390×844 + a mobile UA), or a custom `{ width, height }`.
`headless`	`true`	`false` renders into a real display — better fingerprint stealth, higher CPU/RAM.
`blockAds`	`true`	`false` stops refusing known ad/tracker hosts.
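Because these fields are silently ignored outside the browser engines, a quick local check can flag them before a request goes out. This is a sketch only; the function name is hypothetical and the field list is taken from the table above.

```python
BROWSER_ONLY_OPTIONS = {"resolution", "headless", "blockAds"}
BROWSER_ENGINES = {"browser", "ai-browser"}


def ignored_options(engine, options):
    """List the option fields that the given engine would ignore."""
    if engine in BROWSER_ENGINES:
        return []
    return sorted(BROWSER_ONLY_OPTIONS & set(options))


mobile_options = {
    "resolution": "mobile",  # 390x844 plus a mobile UA, per the table
    "headless": True,
    "blockAds": False,       # stop refusing known ad/tracker hosts
}
```

For example, `ignored_options("html", mobile_options)` lists all three fields, while the same call with `"browser"` returns an empty list.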

Authenticated sessions

Capture a logged-in browser session once via the dashboard’s live recorder, then replay its cookies + per-origin localStorage on later scrapes by passing the captured id as top-level sessionId. Browser / ai-browser restore the full session; html / ai-html send only the matching cookies (they can’t reach localStorage). List your saved sessions with GET /sessions.
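A request that replays a saved session might be assembled as in the sketch below. The helper names and the id "sess_123" are placeholders for illustration, not real values.

```python
# Engines that restore cookies and per-origin localStorage, per the text.
FULL_RESTORE_ENGINES = {"browser", "ai-browser"}


def with_session(request, session_id):
    """Return a copy of the request with the top-level sessionId set."""
    return {**request, "sessionId": session_id}


def restores_local_storage(engine):
    """True when the engine replays localStorage as well as cookies."""
    return engine in FULL_RESTORE_ENGINES


base = {"url": "https://example.com", "engine": "browser",
        "actions": {"title": "h1@text"}}
authed = with_session(base, "sess_123")  # placeholder session id
```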

`ai`, `ai-html`, `ai-browser` — generated plan

Pass a natural-language query instead of actions. Keep the query short and field-shaped: a list of values to extract. For tasks that do not reduce to a pinnable CSS plan, such as page interpretation, reading long legal text, or yes/no reasoning, use the separate analyse API/tool rather than the scrape AI engines.

{
  "url": "https://news.ycombinator.com",
  "engine": "ai",
  "query": "top 10 stories: title, link, points, author"
}

The engine fetches the page and digests a DOM skeleton. An OpenRouter model (z-ai/glm-5.2 by default in the server registry) then builds a real actions plan, validates it against the HTML, and may refine failed fields. The plan is returned as generatedActions, and the number of refinement passes as iterations. Healthy plans are cached per (url, query, engine); set force: true to re-explore.

Variants:

ai — alias for ai-html. Cheapest. Use this first.
ai-html — explicit. Same as ai.
ai-browser — fetches in Chromium so the planner sees the post-JS DOM. Does not enable the screenshot extractor (use deterministic browser for screenshots).

The response includes the generated plan:

{
  "url": "…",
  "data": { "stories": [ … ] },
  "generatedActions": { "stories": { "selector": "…", "many": true, "output": { … } } },
  "iterations": 1
}

Persist generatedActions and POST it back as actions on subsequent runs of the same URL to skip the AI cost entirely.
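The persist-and-replay step could look like the sketch below. It assumes that a plan replays on the matching deterministic engine (ai and ai-html on html, ai-browser on browser); the text implies this but does not state it, and the function names are hypothetical.

```python
import json

# Assumption: AI plans replay on the deterministic engine with the same fetch path.
REPLAY_ENGINE = {"ai": "html", "ai-html": "html", "ai-browser": "browser"}


def replay_request(url, ai_engine, response):
    """Build a deterministic request from an AI response's generatedActions."""
    plan = response.get("generatedActions")
    if not plan:
        raise ValueError("response has no generatedActions to replay")
    return {"url": url, "engine": REPLAY_ENGINE[ai_engine], "actions": plan}


def dump_plan(request):
    """Serialize the replay request so it can be stored and reused later."""
    return json.dumps(request, indent=2, sort_keys=True)
```

Storing the whole replay request, rather than only the plan, keeps the URL and engine next to the selectors they were validated against.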

Picking

1. Try html first. Cheapest (1 credit), fastest.
2. Move to browser when html returns null/empty fields or the page needs interaction.
3. Move to ai-html when writing selectors by hand isn’t worth the time.
4. Move to ai-browser when (3) returns nothing useful because the page is JS-rendered.
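The mechanical parts of this ladder can be sketched in a few lines. The function names are hypothetical, "empty" is taken to mean a missing result or a top-level field that is null, an empty string, an empty list, or an empty object, and the move to ai-html stays a human judgment call that the code does not try to make.

```python
def has_empty_fields(data):
    """True when data is missing or any top-level field came back empty."""
    if not data:
        return True
    return any(value in (None, "", [], {}) for value in data.values())


def next_engine(engine, data, needs_interaction=False):
    """Suggest the next engine to try, or None to stay on the current one."""
    if engine == "html" and (needs_interaction or has_empty_fields(data)):
        return "browser"
    if engine == "ai-html" and has_empty_fields(data):
        return "ai-browser"
    return None
```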

Multiple URLs per request

url accepts either a single URL or an array. The same actions/query run against every URL; the engine fans them out through a bounded pool and returns one result item per URL, so one bad URL fails on its own without sinking the batch. Each URL is charged independently (cost = per-engine credit × URL count).
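The charging rule is simple enough to estimate up front. The sketch below covers only the html and browser rates from the table at the top of this page; rates for the AI engines should be read from Pricing & quotas and passed in explicitly. The function name is hypothetical.

```python
# Per-URL credits for the deterministic engines, from the "At a glance" table.
CREDITS_PER_URL = {"html": 1, "browser": 5}


def estimate_credits(engine, url, credits_per_url=None):
    """Estimate cost as per-engine credit times the number of URLs."""
    urls = [url] if isinstance(url, str) else list(url)
    rate = CREDITS_PER_URL[engine] if credits_per_url is None else credits_per_url
    return rate * len(urls)


pages = [f"https://example.com/page/{n}" for n in range(1, 11)]
html_cost = estimate_credits("html", pages)        # 1 credit x 10 URLs
browser_cost = estimate_credits("browser", pages)  # 5 credits x 10 URLs
```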

Per-request URL caps depend on the engine — the cheap fetch path allows more than the ones that open a Chromium session per URL:

Engine	Max URLs	Run concurrency
`html`	50	10
`browser`	10	3
`ai` / `ai-html` / `ai-browser`	10	2

{
  "url": ["https://example.com/a", "https://example.com/b"],
  "engine": "html",
  "actions": { "title": "h1@text" }
}
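When a URL list is longer than the engine's cap, it has to be split across several requests. A minimal sketch of that split, using the caps from the table above (the function name is hypothetical):

```python
# Max URLs per request, from the table above.
MAX_URLS = {"html": 50, "browser": 10, "ai": 10, "ai-html": 10, "ai-browser": 10}


def batch_requests(urls, engine, body):
    """Yield request bodies, each holding at most the engine's URL cap."""
    cap = MAX_URLS[engine]
    for start in range(0, len(urls), cap):
        yield {**body, "engine": engine, "url": urls[start:start + cap]}


urls = [f"https://example.com/item/{n}" for n in range(120)]
batches = list(batch_requests(urls, "html", {"actions": {"title": "h1@text"}}))
```

With 120 URLs on html this produces three requests of 50, 50, and 20 URLs; the same list on browser would need twelve.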

Proxies

Any scrape — deterministic or AI, html or browser — accepts these optional top-level fields:

Field	What it does
`useProxy: true`	Route the request through the built-in residential pool, when no BYO fields are set.
`useProxy: "us"`	Same, but an ISO 3166-1 alpha-2 country code geo-targets the built-in pool. Ignored when BYO is set.
`myProxyUrl`	BYO proxy as a single URL with embedded creds, e.g. `http://user:pass@host:port`.
`myProxyConfig`	BYO proxy as `{ server, username?, password? }`.

The proxy applies to both engines transparently. When an AI engine caches a plan for replay, the proxy fields are carried through, so deterministic reruns hit the same proxy.

{
  "url": "https://example.com",
  "engine": "html",
  "actions": { "title": "h1@text" },
  "useProxy": "de"
}
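The precedence between these fields, as described in the table, can be sketched as a small resolver. The function name and the returned labels ("byo", "built-in", "direct") are hypothetical, and the text does not say which BYO field wins when both are set, so the sketch treats either one as BYO.

```python
def proxy_mode(request):
    """Describe which proxy a request would use, following the table's rules."""
    # BYO fields take precedence; useProxy is ignored when either is set.
    if request.get("myProxyUrl") or request.get("myProxyConfig"):
        return "byo"
    use_proxy = request.get("useProxy")
    if use_proxy is True:
        return "built-in"
    if isinstance(use_proxy, str) and use_proxy:
        # A country code geo-targets the built-in pool.
        return f"built-in:{use_proxy.lower()}"
    return "direct"
```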