Browser Extraction
Browser connectors extract structured data from web pages using headless Chromium.
Node types
| Operation | Input | Output | Use case |
|---|---|---|---|
| browser.extract | Single URL | Record or Table | Scrape details from one page. |
| browser.list | URL with repeated items | Table | Collect links or items from a listing. |
browser.extract
Navigates to a URL, runs the extraction steps, and returns structured data.

    scrape-profile:
      type: service
      op: browser.extract
      params:
        url: "https://example.com/profile/{{ user_id }}"
        steps:
          - action: extract
            selector: h1.profile-name
            field: name
          - action: extract
            selector: span.title
            field: title
      inputs:
        request: { type: Record, from: ref(lookup.user) }
      outputs:
        profile:
          type: Record
          schema:
            name: { type: string }
            title: { type: string }

When the input is a Table, extraction runs once per row.
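
The per-row case can be sketched by feeding the same node a Table instead of a Record. This is an illustration under assumptions: the node name scrape-profiles, the upstream reference lookup.users, and the Table output type are hypothetical, and the sketch presumes that each row's user_id fills the URL template.

```yaml
# Hypothetical variant of scrape-profile with a Table input.
scrape-profiles:
  type: service
  op: browser.extract
  params:
    # Assumed: rendered once per input row, using that row's user_id.
    url: "https://example.com/profile/{{ user_id }}"
    steps:
      - action: extract
        selector: h1.profile-name
        field: name
      - action: extract
        selector: span.title
        field: title
  inputs:
    # lookup.users is an assumed upstream node that yields a Table.
    request: { type: Table, from: ref(lookup.users) }
  outputs:
    # Assumed: one output row per input row.
    profiles: { type: Table }
```

With three input rows, a run like this would presumably load three profile pages and return a three-row Table.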
browser.list
Extracts repeated items from a listing page. Each item becomes a row.

    list-products:
      type: source
      op: browser.list
      params:
        url: https://shop.example.com/catalog
        item_selector: div.product-card
        steps:
          - action: extract
            selector: h3.product-name
            field: name
          - action: extract
            selector: span.price
            field: price
          - action: extract
            selector: a.product-link
            attribute: href
            field: url
      outputs:
        products: { type: Table }

Step actions
| Action | Description |
|---|---|
| navigate | Load the page. Implicit first step. Set wait_for: networkidle for single-page apps (SPAs). |
| click | Click an element. Set wait_after (ms) to wait for dynamic content. |
| extract | Pull text or an attribute from an element into a named field. |

    steps:
      - action: navigate
        wait_for: networkidle
      - action: click
        selector: button.load-more
        wait_after: 1000
      - action: extract
        selector: h1.title
        field: title
      - action: extract
        selector: a.main-link
        attribute: href
        field: link_url

Semantic selectors
CSS selectors break when the page structure changes. Semantic selectors instead describe an element by its visual role, which makes them more stable across markup changes.

    - action: extract
      semantic: "the price displayed near the buy button"
      field: price

At creation time, the code agent resolves each semantic description to a concrete CSS selector. At execution time, the concrete selector runs.
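
As a sketch of how the two selector styles might sit side by side, the step list below uses a CSS selector where the markup is stable and a semantic description where it is not. It assumes that both styles can be mixed in one steps list, which the text above does not confirm; the selector h1.product-title is hypothetical.

```yaml
steps:
  - action: navigate
    wait_for: networkidle
  # Stable markup: a plain CSS selector (hypothetical) is enough.
  - action: extract
    selector: h1.product-title
    field: name
  # Volatile markup: describe the element by its visual role instead.
  - action: extract
    semantic: "the price displayed near the buy button"
    field: price
```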
Config reference
| Field | Required | Default | Description |
|---|---|---|---|
| url | Yes | — | Target URL. Supports {{ }} templates. |
| item_selector | browser.list only | — | CSS selector for repeated items. |
| steps | Yes | — | Ordered extraction steps. |
| steps[].action | Yes | — | One of navigate, click, or extract. |
| steps[].selector | Per action | — | CSS selector for the target element. |
| steps[].semantic | No | — | Natural-language element description. |
| steps[].field | extract only | — | Output field name. |
| steps[].attribute | No | Inner text | HTML attribute to extract. When omitted, the element's inner text is extracted. |
| steps[].wait_for | No (navigate only) | load | One of load, domcontentloaded, or networkidle. |
| steps[].wait_after | No (click only) | 0 | Milliseconds to wait after the click. |
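
To make the defaults in the table concrete, the fragment below spells out each one explicitly. It is a sketch only: the selectors button.show-specs and span.sku are hypothetical, and the comments restate the table rather than describe verified behavior.

```yaml
steps:
  - action: navigate
    wait_for: load        # the documented default, so this line can be omitted
  - action: click
    selector: button.show-specs
    wait_after: 0         # the documented default: no pause after the click
  - action: extract
    selector: span.sku    # no attribute set, so the inner text is extracted
    field: sku
```

Removing the two commented default lines should leave the behavior unchanged, assuming the defaults apply as listed.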