Engines
The engine field on every scrape request decides how the page is fetched and how the DSL is executed.
At a glance
Section titled “At a glance”| Engine | Fetch | JS execution | Latency | Cost | Use when |
|---|---|---|---|---|---|
html | fetch() | No | ~100–400 ms | 1 credit | Static pages, SSR’d HTML, RSS, sitemaps, anything that renders without JS. |
browser | Headless Chromium | Yes | ~2–5 s | 5 credits | SPAs, anti-bot defenses, login flows, anything that needs JS or interaction. |
ai / ai-html / ai-browser | Same as html or browser | (decided by suffix) | +1–5 s for plan | 3 / 5 / 10 credits | You don’t want to hand-write selectors. |
Cost is per URL — see Pricing & quotas for the full table and how multi-URL requests are charged.
html — plain fetch
Section titled “html — plain fetch”{ "url": "https://example.com", "engine": "html", "actions": { "title": "css=h1@text" }}html fetches the page’s raw HTML and runs the action tree over the static DOM. No JS execution, no cookies, no waiting.
Supports fn actions wait and evaluate (synchronous, no async/Promise race). All others (goto, click, fill, selectOption, 2fa) are explicitly rejected — switch to browser.
browser — real Chromium
Section titled “browser — real Chromium”{ "url": "https://example.com", "engine": "browser", "actions": { "search": { "fn": "fill", "selector": "css=#q", "args": "playwright" }, "submit": { "fn": "click", "selector": "css=button[type=submit]" }, "results": { "selector": "css=.result", "many": true, "output": { "title": "css=h3@text", "link": "css=a@href" } } }}Full browser runtime: navigation, click/fill/select, evaluate (with async/timeout race), 2FA via otpauth, network throttling, and so on. Pages run in a real Chromium browser with stealth and ad-blocking on by default.
options.waitFor controls when the navigation is considered “done” — a lifecycle keyword ('networkidle' is the safest default for modern SPAs; 'domcontentloaded' is fastest), a number of ms, or a CSS selector to wait for. options.timeoutMs caps the per-page run at up to 120 s.
Browser-only options
Section titled “Browser-only options”These options fields are honoured by browser / ai-browser and ignored by the html engines:
| Field | Default | What it does |
|---|---|---|
resolution | desktop (1280×800) | Viewport: "desktop", "mobile" (390×844 + a mobile UA), or a custom { width, height }. |
headless | true | false renders into a real display — better fingerprint stealth, higher CPU/RAM. |
blockAds | true | false stops refusing known ad/tracker hosts. |
Authenticated sessions
Section titled “Authenticated sessions”Capture a logged-in browser session once via the dashboard’s live recorder, then replay its cookies + per-origin localStorage on later scrapes by passing the captured id as top-level sessionId. Browser / ai-browser restore the full session; html / ai-html send only the matching cookies (they can’t reach localStorage). List your saved sessions with GET /sessions.
ai, ai-html, ai-browser — generated plan
Section titled “ai, ai-html, ai-browser — generated plan”Pass a natural-language query instead of actions:
{ "url": "https://news.ycombinator.com", "engine": "ai", "query": "top 10 stories: title, link, points, author"}The chosen engine fetches the page (HTML by default, or Browser if the suffix says so) and an AI model analyzes it to produce an action plan — the same shape you’d hand-write, returned as generatedActions. The plan is checked against the page and refined if fields come back empty; the number of passes is returned as iterations. Plans are cached per URL + query, so a healthy result is reused and only regenerated when it goes stale — set force: true to bypass the cache and re-explore.
Variants:
ai— alias forai-html. Cheapest. Use this first.ai-html— explicit. Same asai.ai-browser— fetches in a real browser so the model sees the post-JS DOM. Use this when the page needs JS to render its content.
The response includes the generated plan:
{ "url": "…", "data": { "stories": [ … ] }, "generatedActions": { "stories": { "selector": "…", "many": true, "output": { … } } }, "iterations": 1}Persist generatedActions and POST it back as actions on subsequent runs of the same URL to skip the AI cost entirely.
Picking
Section titled “Picking”- Try
htmlfirst. Cheapest (1 credit), fastest. - Move to
browserwhenhtmlreturns null/empty fields or the page needs interaction. - Move to
ai-htmlwhen writing selectors by hand isn’t worth the time. - Move to
ai-browserwhen (3) returns nothing useful because the page is JS-rendered.
Multiple URLs per request
Section titled “Multiple URLs per request”url accepts either a single URL or an array. The same actions/query run against every URL; the engine fans them out through a bounded pool and returns one result item per URL, so one bad URL fails on its own without sinking the batch. Each URL is charged independently (cost = per-engine credit × URL count).
Per-request URL caps depend on the engine — the cheap fetch path allows more than the ones that open a Chromium session per URL:
| Engine | Max URLs | Run concurrency |
|---|---|---|
html | 50 | 10 |
browser | 10 | 3 |
ai / ai-html / ai-browser | 10 | 2 |
{ "url": ["https://example.com/a", "https://example.com/b"], "engine": "html", "actions": { "title": "css=h1@text" }}Proxies
Section titled “Proxies”Any scrape — deterministic or AI, html or browser — accepts these optional top-level fields:
| Field | What it does |
|---|---|
useProxy: true | Route the request through the built-in residential pool, when no BYO fields are set. |
useProxy: "us" | Same, but an ISO 3166-1 alpha-2 country code geo-targets the built-in pool. Ignored when BYO is set. |
myProxyUrl | BYO proxy as a single URL with embedded creds, e.g. http://user:pass@host:port. |
myProxyConfig | BYO proxy as { server, username?, password? }. |
The proxy applies to both engines transparently. When an AI engine caches a plan for replay, the proxy fields are carried through, so deterministic reruns hit the same proxy.
{ "url": "https://example.com", "engine": "html", "actions": { "title": "css=h1@text" }, "useProxy": "de"}