Actions DSL

The actions field is a tree. Leaves extract values; branches recurse into matched elements; fn nodes are imperative steps (browser engine).

CSS is the default selector syntax. Prefer bare selectors (h1, a.cta) — do not prefix with css= (accepted but redundant). Use xpath= for XPath. jQuery extensions (:contains(), :eq(), …) are not supported and throw at runtime.

Selector syntax

h1                           → text of the first <h1> (default extractor: text)
a.primary@href               → absolutized href
xpath=//*[@id='x']@html      → outer HTML
"div[data-x='@']"@text       → quoting lets you put '@' inside the selector

Full string form: [scheme=]<selector>@<extractor>.

Scheme — omit (css default) or xpath= / text=.
Selector — standard CSS or XPath. Wrap in quotes if it contains @.
Extractor — what to pull (see table). Defaults to text.

Extractors

Extractor	Returns	Notes
`text` / `textOriginal`	`string`	Visible text (`text` trims).
`html` / `innerHtml`	`string`	Outer / inner HTML.
`md` / `fitMd`	`string`	Element as Markdown; `fitMd` strips boilerplate harder.
`value`	`string \| null`	Input value. Browser engine.
`tagName`	`string`	e.g. `"div"`.
`cssSelector`	`string`	A CSS path for the matched element.
`json` / `json:<path>`	`unknown`	Parse text as JSON; optional path drill. See JSON paths.
`attributes`	`Record<string, string>`	All DOM attributes.
`href` / `hrefOriginal`	`string`	Absolutized (`Original` = raw). Same pattern for `src` / `poster` / `srcset`.
`table` / `tableJson` / `tableArray`	rows	`options.headers = false` → 2D array.
`screenshot`	`string`	Deterministic `browser` only — public PNG URL by default (`returnType: 'dataUrl'` for base64).
anything else	`string \| null`	Named DOM attribute via `getAttribute`.

context / ocr / htmlTree are not supported in v2.

Branches: `many`, `output`

{
  "stories": {
    "selector": ".athing",
    "many": true,
    "output": {
      "title": ".titleline > a@text",
      "link": ".titleline > a@href",
      "rank": ".rank@text"
    }
  }
}

many: true — apply once per match → array.
many: false (default) — first match only.
Nested branches are relative to the matched element.

Flat vs nested `output` (easy to get wrong)

Form	Example	How the string is read
Flat string	`{ "selector": "a", "many": true, "output": "href" }`	`"href"` = extractor name on each match
Nested self	`{ "selector": "a", "many": true, "output": { "url": "@href", "label": "@text" } }`	Leading `@` = extractor on this match
Nested child	`{ "selector": "li", "many": true, "output": { "link": "a@href" } }`	Look up descendant `a` (first match only)
Nested list	`{ "output": { "tags": { "selector": "a.tag", "many": true, "output": "text" } } }`	Array of values inside each row
Nested bug (scalar)	`{ "output": { "tags": "a.tag@text" } }`	First tag only — not an array
Nested bug (null)	`{ "output": { "url": "href", "label": "text" } }`	CSS for `<href>`/`<text>` → null

String nested leaves never return arrays. For list fields inside a row (tags, chips, multi-images), use object form + many: true.

Always model multi-field rows as one nested many: true action. Never parallel arrays (titles[] + authors[]) — they desync when a selector misses a row.

JSON paths

Form	Returns	Example
Dot / bracket	scalar	`json:items.0.name`
JSONPath (`*`, `..`, or `$…`)	array of matches	`json:items[*].name`

JSON-row projection

When selector ends in @json:<path> and output is an object, each row is projected. Leaf strings are row-relative JSON paths (no @ — @ means DOM extractors and is rejected here).

{
  "images": {
    "selector": "body@json:products[*].images[*]",
    "output": { "pos": "position", "src": "src" }
  }
}

Function actions (`fn`)

Browser engine (subset on html: wait, sync evaluate only).

{
  "login": { "fn": "fill", "selector": "input[name=email]", "args": "ada@example.com" },
  "submit": { "fn": "click", "selector": "button[type=submit]" },
  "wait": { "fn": "wait", "args": 1500 }
}

`fn`	Notes
`wait` / `goto` / `back` / `forward` / `click` / `fill` / `selectOption` / `2fa` / `evaluate`	See engines doc for support matrix

Keys run top-to-bottom; fn nodes sequence side effects.

Options

Per-action options:

Field	Meaning
`timeout`	Per-action ms (caps apply).
`filter`	Regex; value must match. On flat `many`, drops non-matches.
`match`	Regex; return first capture group (or full match).
`maxChars`	Cap string length after match/filter (+head / −tail).
`headers`	`tableJson`: `false` → 2D array.
`excludeTags` / `includeTags`	`md` / `fitMd` / html strip lists.
`returnType` / `fullPage`	Screenshot only.

Top-level request options (sibling of actions): waitFor, timeoutMs, resolution, headless, blockAds — see Engines.

End-to-end example

{
  "url": "https://example.com/blog",
  "engine": "browser",
  "options": { "waitFor": "networkidle", "timeoutMs": 30000 },
  "actions": {
    "posts": {
      "selector": "article",
      "many": true,
      "output": {
        "title": "h2@text",
        "url": "h2 a@href",
        "summary": { "selector": "p.summary", "options": { "match": "Summary:\\s*(.+)" } }
      }
    }
  }
}

Prefer AI mode (engine: "ai" + query) to generate plans, then pin generatedActions for production.