
Engines

The engine field on every scrape request decides how the page is fetched and how the DSL is executed.

| Engine | Fetch | JS execution | Latency | Cost | Use when |
| --- | --- | --- | --- | --- | --- |
| html | fetch() | No | ~100–400 ms | 1 credit | Static pages, SSR’d HTML, RSS, sitemaps, anything that renders without JS. |
| browser | Headless Chromium | Yes | ~2–5 s | 5 credits | SPAs, anti-bot defenses, login flows, anything that needs JS or interaction. |
| ai / ai-html / ai-browser | Same as html or browser | (decided by suffix) | +1–5 s for plan | 3 / 5 / 10 credits | You don’t want to hand-write selectors. |

Cost is per URL — see Pricing & quotas for the full table and how multi-URL requests are charged.

{
  "url": "https://example.com",
  "engine": "html",
  "actions": { "title": "css=h1@text" }
}

html fetches the page’s raw HTML and runs the action tree over the static DOM. No JS execution, no cookies, no waiting.

The html engine supports only two fn actions: wait and evaluate (synchronous, no async/Promise race). All others (goto, click, fill, selectOption, 2fa) are explicitly rejected; switch to browser if you need them.
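As a rough illustration of the restriction above, the following sketch scans an action tree for fn actions the html engine would reject, before the request is sent. It is a hypothetical client-side helper, not part of the service: the name unsupported_html_fns is invented here, and it assumes fn actions are objects carrying an "fn" key, as in the request examples on this page.

```python
# Hypothetical client-side check; the service performs its own validation.
HTML_ALLOWED_FNS = {"wait", "evaluate"}  # the two fn actions html supports


def unsupported_html_fns(actions):
    """Return the sorted fn names in an action tree that html would reject.

    Assumes fn actions are dicts with an "fn" key and that nested
    extractions live inside dict values (for example under "output").
    """
    found = set()

    def walk(node):
        if isinstance(node, dict):
            fn = node.get("fn")
            if isinstance(fn, str) and fn not in HTML_ALLOWED_FNS:
                found.add(fn)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(actions)
    return sorted(found)
```

An empty list means the tree uses nothing the html engine rejects; anything else is a hint to send the request with engine "browser" instead.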

{
  "url": "https://example.com",
  "engine": "browser",
  "actions": {
    "search": { "fn": "fill", "selector": "css=#q", "args": "playwright" },
    "submit": { "fn": "click", "selector": "css=button[type=submit]" },
    "results": {
      "selector": "css=.result",
      "many": true,
      "output": { "title": "css=h3@text", "link": "css=a@href" }
    }
  }
}

Full browser runtime: navigation, click/fill/select, evaluate (with async/timeout race), 2FA via otpauth, network throttling, and so on. Pages run in a real Chromium browser with stealth and ad-blocking on by default.

options.waitFor controls when the navigation is considered “done”. It accepts a lifecycle keyword ('networkidle' is the safest default for modern SPAs; 'domcontentloaded' is fastest), a number of milliseconds, or a CSS selector to wait for. options.timeoutMs caps each page's run time; the maximum is 120 s.

These options fields are honoured by browser / ai-browser and ignored by the html engines:

| Field | Default | What it does |
| --- | --- | --- |
| resolution | desktop (1280×800) | Viewport: "desktop", "mobile" (390×844 + a mobile UA), or a custom { width, height }. |
| headless | true | false renders into a real display: better fingerprint stealth, higher CPU/RAM. |
| blockAds | true | false stops refusing known ad/tracker hosts. |
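The option fields described above could be assembled and sanity-checked on the client with something like the sketch below. The helper name browser_options is hypothetical, and the checks only mirror what this page states (the 120 s ceiling and the three resolution forms); the service remains the authority on what it accepts.

```python
MAX_TIMEOUT_MS = 120_000  # the 120 s ceiling stated for options.timeoutMs


def browser_options(wait_for, timeout_ms=None, resolution="desktop",
                    headless=True, block_ads=True):
    """Build an options object for browser / ai-browser requests.

    wait_for is passed through untouched: a lifecycle keyword, a number
    of milliseconds, or a CSS selector. timeoutMs is omitted when not given.
    """
    if isinstance(resolution, dict):
        if set(resolution) != {"width", "height"}:
            raise ValueError("custom resolution needs exactly width and height")
    elif resolution not in ("desktop", "mobile"):
        raise ValueError("resolution must be 'desktop', 'mobile', or a dict")

    options = {
        "waitFor": wait_for,
        "resolution": resolution,
        "headless": headless,
        "blockAds": block_ads,
    }
    if timeout_ms is not None:
        if not 0 < timeout_ms <= MAX_TIMEOUT_MS:
            raise ValueError(f"timeoutMs must be between 1 and {MAX_TIMEOUT_MS}")
        options["timeoutMs"] = timeout_ms
    return options
```

For example, browser_options("networkidle", timeout_ms=60_000, resolution="mobile") yields an object that can be placed under "options" in a browser request body.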

Capture a logged-in browser session once via the dashboard’s live recorder, then replay its cookies + per-origin localStorage on later scrapes by passing the captured id as top-level sessionId. Browser / ai-browser restore the full session; html / ai-html send only the matching cookies (they can’t reach localStorage). List your saved sessions with GET /sessions.

ai, ai-html, ai-browser — generated plan


Pass a natural-language query instead of actions:

{
  "url": "https://news.ycombinator.com",
  "engine": "ai",
  "query": "top 10 stories: title, link, points, author"
}

The chosen engine fetches the page (HTML by default, or Browser if the suffix says so) and an AI model analyzes it to produce an action plan — the same shape you’d hand-write, returned as generatedActions. The plan is checked against the page and refined if fields come back empty; the number of passes is returned as iterations. Plans are cached per URL + query, so a healthy result is reused and only regenerated when it goes stale — set force: true to bypass the cache and re-explore.

Variants:

  • ai — alias for ai-html. Cheapest. Use this first.
  • ai-html — explicit. Same as ai.
  • ai-browser — fetches in a real browser so the model sees the post-JS DOM. Use this when the page needs JS to render its content.

The response includes the generated plan:

{
  "url": "",
  "data": { "stories": [ ] },
  "generatedActions": { "stories": { "selector": "", "many": true, "output": { } } },
  "iterations": 1
}

Persist generatedActions and POST it back as actions on subsequent runs of the same URL to skip the AI cost entirely.
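One way to picture that replay step is a small helper that turns a stored AI response into a deterministic request body. This is only a sketch: replay_request is a hypothetical name, and the engine mapping (ai-browser plans replayed with browser, everything else with html) is an assumption based on which DOM the plan was generated against, not documented behaviour.

```python
def replay_request(url, ai_response, ai_engine="ai"):
    """Build a deterministic request body from a stored AI response.

    ai_response is the parsed JSON of an earlier ai / ai-html / ai-browser
    scrape; its generatedActions becomes the actions of the new request.
    """
    plan = ai_response.get("generatedActions")
    if not plan:
        raise ValueError("response has no generatedActions to replay")
    # Assumption: a plan built on the post-JS DOM needs the browser engine.
    engine = "browser" if ai_engine == "ai-browser" else "html"
    return {"url": url, "engine": engine, "actions": plan}
```

The returned dict has the same shape as the hand-written requests shown earlier, so it can be serialised and POSTed as-is.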

  1. Try html first. Cheapest (1 credit), fastest.
  2. Move to browser when html returns null/empty fields or the page needs interaction.
  3. Move to ai-html when writing selectors by hand isn’t worth the time.
  4. Move to ai-browser when (3) returns nothing useful because the page is JS-rendered.
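Steps 2 and 4 of the list above can be sketched as a retry rule: if fields come back null or empty, move to the browser-backed variant of the same approach. The helper names are hypothetical, and the definition of "empty" (None, empty string, empty list or object at the top level of data) is an assumption made for illustration.

```python
# Escalation targets taken from steps 2 and 4 of the list above.
ESCALATION = {"html": "browser", "ai": "ai-browser", "ai-html": "ai-browser"}


def empty_fields(data):
    """Names of top-level fields that came back None, "", [] or {}."""
    return [key for key, value in data.items()
            if value is None or value == "" or value == [] or value == {}]


def next_engine(engine, data):
    """Return the engine to retry with, or None to keep the current result.

    Browser-backed engines have no further step, so they also return None.
    """
    needs_retry = not data or bool(empty_fields(data))
    if not needs_retry:
        return None
    return ESCALATION.get(engine)
```

The step from hand-written selectors to ai-html (step 3) is a judgment about your own time, so it is deliberately left out of the rule.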

url accepts either a single URL or an array. The same actions/query run against every URL; the engine fans them out through a bounded pool and returns one result item per URL, so one bad URL fails on its own without sinking the batch. Each URL is charged independently (cost = per-engine credit × URL count).
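The charging rule in the last sentence can be checked with a few lines of arithmetic. The sketch below uses the per-URL rates this page gives for html and browser; the AI engines are left out on purpose, since their rates are best taken from Pricing & quotas. estimate_credits is a hypothetical helper name.

```python
# Per-URL credits for the deterministic engines, as listed on this page.
CREDITS_PER_URL = {"html": 1, "browser": 5}


def estimate_credits(engine, url):
    """Estimate the charge for one request: per-engine credit x URL count.

    url may be a single URL string or a list of URLs, like the url field.
    """
    urls = [url] if isinstance(url, str) else list(url)
    if engine not in CREDITS_PER_URL:
        raise ValueError(f"no rate known here for engine {engine!r}")
    return CREDITS_PER_URL[engine] * len(urls)
```

For instance, two URLs on html come to 2 credits, while the same two URLs on browser come to 10.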

Per-request URL caps depend on the engine — the cheap fetch path allows more than the ones that open a Chromium session per URL:

| Engine | Max URLs | Run concurrency |
| --- | --- | --- |
| html | 50 | 10 |
| browser | 10 | 3 |
| ai / ai-html / ai-browser | 10 | 2 |

{
  "url": ["https://example.com/a", "https://example.com/b"],
  "engine": "html",
  "actions": { "title": "css=h1@text" }
}
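When a URL list is longer than the per-request cap, it has to be split across several requests. A minimal sketch of that splitting, using the caps listed above, might look like this; batch_requests is a hypothetical helper, and the extra fields (actions or query) are simply copied into each body.

```python
# Per-request URL caps, as listed in the table above.
MAX_URLS = {"html": 50, "browser": 10, "ai": 10, "ai-html": 10, "ai-browser": 10}


def batch_requests(urls, engine, **fields):
    """Split urls into request bodies that each respect the engine's cap.

    fields carries the rest of the body, e.g. actions={...} or query="...".
    """
    cap = MAX_URLS[engine]
    return [
        {"url": urls[start:start + cap], "engine": engine, **fields}
        for start in range(0, len(urls), cap)
    ]
```

With 120 URLs on html this produces three bodies of 50, 50, and 20 URLs; each body is then charged and fanned out on its own.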

Any scrape — deterministic or AI, html or browser — accepts these optional top-level fields:

| Field | What it does |
| --- | --- |
| useProxy: true | Route the request through the built-in residential pool, when no BYO fields are set. |
| useProxy: "us" | Same, but an ISO 3166-1 alpha-2 country code geo-targets the built-in pool. Ignored when BYO is set. |
| myProxyUrl | BYO proxy as a single URL with embedded creds, e.g. http://user:pass@host:port. |
| myProxyConfig | BYO proxy as { server, username?, password? }. |

The proxy applies to both engines transparently. When an AI engine caches a plan for replay, the proxy fields are carried through, so deterministic reruns hit the same proxy.

{
  "url": "https://example.com",
  "engine": "html",
  "actions": { "title": "css=h1@text" },
  "useProxy": "de"
}
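The precedence between the proxy fields (BYO settings win over useProxy) can be sketched as a small resolver that reports which proxy a request body would end up using. This is an illustration only: effective_proxy is a hypothetical name, and the order in which myProxyUrl and myProxyConfig are checked when both are present is an assumption, since this page does not say which one wins.

```python
def effective_proxy(request):
    """Describe which proxy a request body would use.

    Returns (kind, detail): kind is "byo", "built-in", or "none";
    detail is the BYO server, the country code, or None.
    """
    # BYO fields take priority; useProxy is ignored when either is set.
    if request.get("myProxyUrl"):
        return ("byo", request["myProxyUrl"])
    if request.get("myProxyConfig"):  # assumed to be checked second
        return ("byo", request["myProxyConfig"]["server"])

    use_proxy = request.get("useProxy")
    if use_proxy is True:
        return ("built-in", None)
    if isinstance(use_proxy, str):
        return ("built-in", use_proxy)  # country code for geo-targeting
    return ("none", None)
```

For the request above, the resolver reports the built-in pool geo-targeted to "de".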