Actions DSL
The actions field is a tree. Leaves are selectors; branches are sub-trees that recurse into matched elements; fn nodes are imperative steps (navigate, click, fill, etc.).
Selector syntax
Section titled “Selector syntax”Every selector is <scheme>=<value>@<attribute>.
css=h1@text → text of the first <h1>xpath=//*[@id='x']@html → outer HTML of the elementcss=img@src → absolutized src URL of the first <img>"css=div[data-x='@']"@text → quoting lets you put '@' inside the selector- Scheme —
css(default if omitted) orxpath. - Selector — any valid CSS selector or XPath expression. Wrap in quotes if it contains
@. - Attribute — what to extract. Defaults to the raw DOM attribute if it’s not one of the special families below.
Attribute families
Section titled “Attribute families”| Family | Returns | Notes |
|---|---|---|
text / textOriginal | string | Visible text content. |
html / innerHtml | string | Inner HTML of the element. |
value | string | null | Input value (<input value>). Browser engine only. |
tagName | string | E.g. "div", "a". |
cssSelector | string | A unique CSS selector for the matched element. |
json / json:<path> | unknown | Parses the element’s text content as JSON. Optional <path> drills in. See JSON paths. |
attributes | Record<string, string> | All DOM attributes as a map. |
href / hrefOriginal | string | href absolutized against the page URL (hrefOriginal is the raw value). |
src / srcOriginal | string | src absolutized; Original is the raw value. Mirrors poster / posterOriginal. |
srcset / srcsetOriginal | { url, size }[] | Parsed srcset; Original keeps the raw URLs. |
table / tableJson / tableArray | Record<string,string>[] or string[][] | Pick array form via options.headers = false. |
md / fitMd | string | Element rendered as Markdown. fitMd strips boilerplate aggressively. |
screenshot | string | Browser engine only; PNG. Public URL by default, data URL via returnType: 'dataUrl'. See options.fullPage. |
| anything else | string | null | Returns the named DOM attribute via getAttribute(...). |
context / contextAlpha / ocr / htmlTree are not yet supported in v2.
Branches: many, output
Section titled “Branches: many, output”{ "stories": { "selector": "css=.athing", "many": true, "output": { "title": "css=.titleline > a@text", "link": "css=.titleline > a@href", "rank": "css=.rank@text" } }}many: true— applyoutputonce per matched element; returns an array.many: false(default) — apply to the first match; returns one object.outputcan be a string (treated as<attribute>on the matched element) or a nested branch.
Nested branches operate relative to the matched element, not the document root — useful for scoping repeated structures.
JSON paths
Section titled “JSON paths”The json:<path> extractor parses an element’s text content as JSON, then drills in. Two syntaxes are supported:
| Form | Returns | Example |
|---|---|---|
| Dot / bracket | the scalar at that path | json:items.0.name, json:items[0].name, json:items["k"] |
| JSONPath | array of all matches (null if zero) | json:items[*].name, json:items.*.name, json:$..price |
A path is treated as JSONPath when it contains *, .., or starts with $. Sloppy forms like items.[*].name are normalized.
{ "first": "script#__NEXT_DATA__@json:items[0].name", // → "a" "names": "script#__NEXT_DATA__@json:items[*].name", // → ["a", "b"] "prices": "body@json:$..price" // recursive descent}JSON-row projection
Section titled “JSON-row projection”When selector ends in @json:<path> and output is a sub-action object, the parent path is fanned out into rows and each row is projected through output. Leaf strings are JSON paths relative to the row (no @ — those mean DOM extractors and are rejected here).
{ "images": { "selector": "body@json:products[*].images[*]", "output": { "pos": "position", "src": "src" } }}// → [ { "pos": 1, "src": "a.jpg" }, { "pos": 2, "src": "b.jpg" }, … ]- Wildcard parent (
products[*],images.*.foo,$..items) → array of projected rows. - Scalar parent (
products[0]) → single projected object. - Sub-action
outputmay nest groups; each nested string is still a row-relative JSON path. - Leaf paths support the same dot / bracket / JSONPath syntax as the top-level extractor (e.g.
output: { allTags: "tags[*].k" }).
Use this instead of parallel many: true extractions when the source is JSON — it avoids the desync risk of zipping parallel arrays.
Function actions (fn)
Section titled “Function actions (fn)”{ "login": { "fn": "fill", "selector": "css=input[name=email]", "args": "ada@example.com" }, "submit": { "fn": "click", "selector": "css=button[type=submit]" }, "wait": { "fn": "wait", "args": 1500 }}fn | Args | Engines |
|---|---|---|
wait | number ms, or omit + selector to wait for it | both |
goto | URL | browser |
back / forward | — | browser |
click | — (uses selector) | browser |
fill | string value (uses selector) | browser |
selectOption | string or string[] | browser |
2fa | { kind: 'otpauth', secret: '…' } | browser |
evaluate (default if no other arm matches) | inline JS string in args | both — synchronous in html, async race in browser |
fn nodes don’t produce output keys directly; they sequence side effects. Mix them with extract leaves freely — keys are executed top-to-bottom.
Options
Section titled “Options”Per-action options (set on a leaf or branch):
| Field | Meaning |
|---|---|
timeout | Per-action timeout in ms. Hard cap: 30 s for extracts, 60 s for fn. |
filter | Regex string. The extracted value must match (.test(value)); else the key is null. |
match | Regex string with a capture group. Returns the first capture group instead of the raw value. |
headers | tableJson only. false → 2D array; true (default) → array of objects keyed by header text. |
excludeTags / includeTags | md / fitMd only. Override the default allow/deny tag lists. |
returnType | screenshot only. 'url' (default) or 'dataUrl'. |
fullPage | screenshot only. Full-page screenshot if true. |
Top-level options (sibling to actions):
| Field | Meaning |
|---|---|
waitFor | Browser engine. A lifecycle (load / domcontentloaded / networkidle / commit), a number of ms, or a CSS selector to wait for. |
timeoutMs | Whole-run timeout, capped at 120 s. |
End-to-end example
Section titled “End-to-end example”{ "url": "https://example.com/blog", "engine": "browser", "options": { "waitFor": "networkidle", "timeoutMs": 30000 }, "actions": { "accept": { "fn": "click", "selector": "css=button:has-text('Accept')" }, "posts": { "selector": "css=article", "many": true, "output": { "title": "css=h2@text", "url": "css=h2 a@href", "summary": { "selector": "css=p.summary", "options": { "match": "(?<=Summary:\\s).+" } } } } }}