Browser Extraction

Browser connectors extract structured data from web pages using headless Chromium.

Node types

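Operation	Input	Output	Use case
`browser.extract`	One URL per input Record or Table row	Record, or Table when the input is a Table	Scrape details from one page per input.
`browser.list`	One listing-page URL with repeated items	Table (one row per item)	Collect links or items from a listing.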

browser.extract

Navigates to a URL, runs the extraction steps in order, and returns the extracted fields as structured data.

```yaml
scrape-profile:
  type: service
  op: browser.extract
  params:
    url: "https://example.com/profile/{{ user_id }}"
    steps:
      - action: extract
        selector: h1.profile-name
        field: name
      - action: extract
        selector: span.title
        field: title
  inputs:
    request: { type: Record, from: ref(lookup.user) }
  outputs:
    profile:
      type: Record
      schema:
        name: { type: string }
        title: { type: string }
```
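To make the `extract` step semantics concrete, here is a minimal Python sketch of what the steps in `scrape-profile` compute, run against a static HTML string instead of a live Chromium page. It is not the connector's implementation. It assumes well-formed markup, supports only `tag.class` selectors with an exact class match, uses the first matching element, and returns `None` for a missing element; all of these are simplifications chosen for illustration, and `to_path` and `run_extract_steps` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def to_path(selector: str) -> str:
    """Translate a `tag.class` CSS selector into an ElementTree path.

    Simplification: the class attribute must equal the class exactly,
    so elements carrying several classes are not matched.
    """
    tag, _, cls = selector.partition(".")
    return f".//{tag}[@class='{cls}']" if cls else f".//{tag}"


def run_extract_steps(html: str, steps: list[dict]) -> dict:
    """Apply `extract` steps to one page and return a single record."""
    root = ET.fromstring(html.strip())
    record = {}
    for step in steps:
        if step["action"] != "extract":
            continue  # navigate/click need a live browser; skipped in this sketch
        element = root.find(to_path(step["selector"]))  # first match only
        if element is None:
            record[step["field"]] = None  # assumption: missing element -> null
        elif "attribute" in step:
            record[step["field"]] = element.get(step["attribute"])
        else:
            # Inner text: concatenate every text node under the element.
            record[step["field"]] = "".join(element.itertext()).strip()
    return record


page = """
<html><body>
  <h1 class="profile-name">Ada Lovelace</h1>
  <span class="title">Analyst</span>
</body></html>
"""
steps = [
    {"action": "extract", "selector": "h1.profile-name", "field": "name"},
    {"action": "extract", "selector": "span.title", "field": "title"},
]
print(run_extract_steps(page, steps))
# {'name': 'Ada Lovelace', 'title': 'Analyst'}
```

Each step writes exactly one key, so the record's fields line up with the `schema` declared under `outputs.profile`.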

If the input is a Table instead of a Record, the URL template is rendered and the steps run once per row, and the output is a Table.
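The per-row behaviour can be pictured as rendering the `{{ }}` URL template against each input row and collecting one output row per input row. The sketch below assumes that template variables name top-level fields of the input row and that an unknown variable is an error. `render_url`, `fan_out`, and the stubbed `fake_extract` are hypothetical stand-ins, not the connector's API.

```python
import re
from typing import Callable

# Matches {{ name }} with optional spaces inside the braces.
TEMPLATE_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_url(template: str, row: dict) -> str:
    """Replace each {{ name }} with row[name]; a missing field raises KeyError."""
    return TEMPLATE_VAR.sub(lambda m: str(row[m.group(1)]), template)


def fan_out(template: str, rows: list[dict],
            extract: Callable[[str], dict]) -> list[dict]:
    """Run one extraction per input row; the result is a Table (list of rows)."""
    return [extract(render_url(template, row)) for row in rows]


template = "https://example.com/profile/{{ user_id }}"
users = [{"user_id": 17}, {"user_id": 42}]


# Hypothetical stand-in for the browser extraction: echo the URL it received.
def fake_extract(url: str) -> dict:
    return {"source_url": url}


for record in fan_out(template, users, fake_extract):
    print(record["source_url"])
# https://example.com/profile/17
# https://example.com/profile/42
```

Keeping rendering separate from extraction makes the fan-out easy to reason about: one rendered URL, one page load, one output row.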

browser.list

Extracts repeated items from a listing page. Each element matching `item_selector` becomes one row, and the `extract` steps are evaluated within that element.

```yaml
list-products:
  type: source
  op: browser.list
  params:
    url: https://shop.example.com/catalog
    item_selector: div.product-card
    steps:
      - action: extract
        selector: h3.product-name
        field: name
      - action: extract
        selector: span.price
        field: price
      - action: extract
        selector: a.product-link
        attribute: href
        field: url
  outputs:
    products: { type: Table }
```
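The row-per-item behaviour of `list-products` can be sketched in Python against a static, well-formed HTML snippet. As with any sketch here, it is an illustration under assumptions: only `tag.class` selectors with an exact class match are supported, each step uses the first match inside the current item, and a missing element yields `None`. `to_path` and `list_items` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def to_path(selector: str) -> str:
    """Translate a `tag.class` selector into an ElementTree path (exact class match)."""
    tag, _, cls = selector.partition(".")
    return f".//{tag}[@class='{cls}']" if cls else f".//{tag}"


def list_items(html: str, item_selector: str, steps: list[dict]) -> list[dict]:
    """Return one row per element matching item_selector.

    Each extract step is searched for *inside* the current item, so
    h3.product-name refers to the heading of that card only.
    """
    root = ET.fromstring(html.strip())
    rows = []
    for item in root.findall(to_path(item_selector)):
        row = {}
        for step in steps:
            if step["action"] != "extract":
                continue  # only extract steps are modelled in this sketch
            el = item.find(to_path(step["selector"]))
            if el is None:
                row[step["field"]] = None  # assumption: missing element -> null
            elif "attribute" in step:
                row[step["field"]] = el.get(step["attribute"])
            else:
                row[step["field"]] = "".join(el.itertext()).strip()
        rows.append(row)
    return rows


catalog = """
<div>
  <div class="product-card">
    <h3 class="product-name">Kettle</h3>
    <span class="price">$25</span>
    <a class="product-link" href="/p/kettle">View</a>
  </div>
  <div class="product-card">
    <h3 class="product-name">Toaster</h3>
    <span class="price">$40</span>
    <a class="product-link" href="/p/toaster">View</a>
  </div>
</div>
"""
steps = [
    {"action": "extract", "selector": "h3.product-name", "field": "name"},
    {"action": "extract", "selector": "span.price", "field": "price"},
    {"action": "extract", "selector": "a.product-link",
     "attribute": "href", "field": "url"},
]
for row in list_items(catalog, "div.product-card", steps):
    print(row)
# {'name': 'Kettle', 'price': '$25', 'url': '/p/kettle'}
# {'name': 'Toaster', 'price': '$40', 'url': '/p/toaster'}
```

Scoping each search to its item is what keeps a name from one card from being paired with the price of another.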

Step actions

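Action	Description
`navigate`	Load the page at `url`. Runs implicitly as the first step; declare it explicitly to set `wait_for` (for example, `networkidle` for single-page apps).
`click`	Click the element matching `selector`. Set `wait_after` (milliseconds) to give dynamically loaded content time to appear.
`extract`	Read the element's inner text, or the HTML attribute named in `attribute`, into the output field named by `field`.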

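For example, these steps wait for network activity to settle, click a load-more button and wait one second, then extract a heading and a link URL:

```yaml
steps:
  - action: navigate
    wait_for: networkidle
  - action: click
    selector: button.load-more
    wait_after: 1000
  - action: extract
    selector: h1.title
    field: title
  - action: extract
    selector: a.main-link
    attribute: href
    field: link_url
```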
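The implicit `navigate` step and the defaults from the config reference can be expressed as a small normalization pass that runs before any step executes. This is a sketch under assumptions: a `navigate` is prepended only when the first step is not already one, and invalid steps fail fast with `ValueError`. `normalize_steps` is a hypothetical helper, not part of the connector.

```python
WAIT_FOR_VALUES = {"load", "domcontentloaded", "networkidle"}


def normalize_steps(steps: list[dict]) -> list[dict]:
    """Return a copy of steps with the implicit navigate and defaults filled in."""
    out = [dict(step) for step in steps]  # shallow copies; the input is untouched
    if not out or out[0]["action"] != "navigate":
        out.insert(0, {"action": "navigate"})  # implicit first step
    for step in out:
        action = step["action"]
        if action == "navigate":
            step.setdefault("wait_for", "load")
            if step["wait_for"] not in WAIT_FOR_VALUES:
                raise ValueError(f"unknown wait_for: {step['wait_for']!r}")
        elif action == "click":
            step.setdefault("wait_after", 0)  # milliseconds
        elif action == "extract":
            if "field" not in step:
                raise ValueError("extract step needs a 'field'")
        else:
            raise ValueError(f"unknown action: {action!r}")
        if action in ("click", "extract") and not ("selector" in step or "semantic" in step):
            raise ValueError(f"{action} step needs 'selector' or 'semantic'")
    return out


steps = normalize_steps([
    {"action": "extract", "selector": "h1.title", "field": "title"},
])
print(steps[0])  # {'action': 'navigate', 'wait_for': 'load'}
```

Filling defaults up front means the executor can read `wait_for` and `wait_after` directly instead of repeating the fallback logic for every step.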

Semantic selectors

CSS selectors break when a page's markup changes. Semantic selectors instead describe an element by its visual role on the page, so the intent of a step does not depend on that markup.

```yaml
- action: extract
  semantic: "the price displayed near the buy button"
  field: price
```

When the connector is created, the code agent resolves each semantic description to a concrete CSS selector. At execution time, that resolved selector runs; the natural-language description itself is not evaluated.
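The split between creation time and execution time can be sketched as a compile pass: every `semantic` step is rewritten once into a step with a concrete `selector`, so the executor only ever sees CSS selectors. The resolver below is a hypothetical stand-in for the code agent (a fixed lookup table), and the selector it returns is made up; this sketch also drops the description after resolving it, which is a choice of the sketch rather than documented behaviour.

```python
from typing import Callable


def compile_steps(steps: list[dict], resolve: Callable[[str], str]) -> list[dict]:
    """Creation time: replace each `semantic` description with a CSS selector."""
    compiled = []
    for step in steps:
        step = dict(step)  # copy so the authored config is not mutated
        if "semantic" in step:
            step["selector"] = resolve(step.pop("semantic"))
        compiled.append(step)
    return compiled


# Hypothetical resolver: a fixed mapping standing in for the code agent.
AGENT_ANSWERS = {"the price displayed near the buy button": "div.buy-box span.price"}

authored = [{"action": "extract",
             "semantic": "the price displayed near the buy button",
             "field": "price"}]
compiled = compile_steps(authored, AGENT_ANSWERS.__getitem__)
print(compiled)
# [{'action': 'extract', 'field': 'price', 'selector': 'div.buy-box span.price'}]
```

Because resolution happens once, the cost and any non-determinism of interpreting the description stay out of every execution.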

Config reference

Field	Required	Default	Description
`url`	Yes	—	Target URL. Supports `{{ }}` templates that reference input fields.
`item_selector`	For `browser.list`	—	CSS selector matching each repeated item. Used only by `browser.list`.
`steps`	Yes	—	Ordered extraction steps.
`steps[].action`	Yes	—	`navigate`, `click`, or `extract`.
`steps[].selector`	For `click` and `extract`, unless `semantic` is set	—	CSS selector for the target element.
`steps[].semantic`	No	—	Natural-language description of the element, resolved to a CSS selector when the connector is created.
`steps[].field`	For `extract`	—	Output field name.
`steps[].attribute`	No	Inner text	HTML attribute to extract instead of the inner text. `extract` only.
`steps[].wait_for`	No	`load`	`navigate` only. One of `load`, `domcontentloaded`, `networkidle`.
`steps[].wait_after`	No	`0`	`click` only. Milliseconds to wait after the click.
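The config reference can also be read as a set of checks. The following sketch validates a `params` mapping against the table above, assuming `params` arrives as a plain dict (for example, after the YAML is parsed) and `op` is passed separately. It treats `wait_for` and `wait_after` as valid only on the actions the table lists them for. `validate_params` is a hypothetical helper that collects error messages rather than raising.

```python
ACTIONS = {"navigate", "click", "extract"}
WAIT_FOR_VALUES = {"load", "domcontentloaded", "networkidle"}


def validate_params(op: str, params: dict) -> list[str]:
    """Check a browser connector's params against the config reference.

    Returns a list of problems; an empty list means the params look valid.
    """
    errors = []
    if "url" not in params:
        errors.append("url is required")
    if op == "browser.list" and "item_selector" not in params:
        errors.append("item_selector is required for browser.list")
    steps = params.get("steps")
    if not isinstance(steps, list):
        errors.append("steps is required and must be a list")
        return errors
    for i, step in enumerate(steps):
        action = step.get("action")
        if action not in ACTIONS:
            errors.append(f"steps[{i}].action must be one of {sorted(ACTIONS)}")
            continue
        if action in ("click", "extract") and not ("selector" in step or "semantic" in step):
            errors.append(f"steps[{i}] needs a selector or semantic description")
        if action == "extract" and "field" not in step:
            errors.append(f"steps[{i}].field is required for extract")
        if "wait_for" in step and (action != "navigate"
                                   or step["wait_for"] not in WAIT_FOR_VALUES):
            errors.append(f"steps[{i}].wait_for is only valid on navigate, "
                          f"with one of {sorted(WAIT_FOR_VALUES)}")
        wait_after = step.get("wait_after", 0)
        if "wait_after" in step and (action != "click"
                                     or not isinstance(wait_after, int)
                                     or wait_after < 0):
            errors.append(f"steps[{i}].wait_after must be a non-negative integer on click")
    return errors


print(validate_params("browser.list", {
    "url": "https://shop.example.com/catalog",
    "steps": [{"action": "extract", "selector": "span.price"}],
}))
# ['item_selector is required for browser.list', 'steps[0].field is required for extract']
```

Collecting every problem in one pass, instead of stopping at the first, lets an author fix a config in a single round trip.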