
Browser Extraction

Browser connectors extract structured data from web pages using headless Chromium.

| Operation | Input | Output | Use case |
|---|---|---|---|
| `browser.extract` | Single URL | Record or Table | Scrape details from one page. |
| `browser.list` | URL with repeated items | Table | Collect links or items from a listing. |

`browser.extract` navigates to a URL, runs the extraction steps, and returns structured data.

```yaml
scrape-profile:
  type: service
  op: browser.extract
  params:
    url: "https://example.com/profile/{{ user_id }}"
    steps:
      - action: extract
        selector: h1.profile-name
        field: name
      - action: extract
        selector: span.title
        field: title
  inputs:
    request: { type: Record, from: ref(lookup.user) }
  outputs:
    profile:
      type: Record
      schema:
        name: { type: string }
        title: { type: string }
```

When the input is a Table, extraction runs once per row, so the output is a Table rather than a single Record.
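The `{{ user_id }}` placeholder and the per-row behavior work together. For each input row, the URL template is filled from that row's fields and the page is extracted once. The Python below is a minimal sketch of that loop, not the connector's implementation. It makes three assumptions: a Table is a list of dicts, placeholders are plain `{{ field }}` lookups, and `extract_page` (hypothetical) stands in for loading the page in headless Chromium and running the steps.

```python
import re
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Matches "{{ name }}", allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_url(template: str, row: Row) -> str:
    """Replace each {{ name }} with str(row[name]); KeyError if a field is missing.

    Values are inserted as-is (no URL encoding), an assumption of this sketch.
    """
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


def extract_per_row(
    url_template: str,
    table: List[Row],
    extract_page: Callable[[str], Row],
) -> List[Row]:
    """Run one extraction per input row; the results form the output table."""
    return [extract_page(render_url(url_template, row)) for row in table]


# Usage with a fake extractor in place of a real browser.
fake_pages = {
    "https://example.com/profile/1": {"name": "Ada", "title": "Engineer"},
    "https://example.com/profile/2": {"name": "Lin", "title": "Designer"},
}
rows = extract_per_row(
    "https://example.com/profile/{{ user_id }}",
    [{"user_id": 1}, {"user_id": 2}],
    fake_pages.__getitem__,
)
print(rows)
# → [{'name': 'Ada', 'title': 'Engineer'}, {'name': 'Lin', 'title': 'Designer'}]
```

A single Record input is the one-row case. The loop runs sequentially for clarity. The page does not say whether the connector processes rows in parallel.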

`browser.list` extracts repeated items from a listing page. Each element matching `item_selector` becomes a row.

```yaml
list-products:
  type: source
  op: browser.list
  params:
    url: https://shop.example.com/catalog
    item_selector: div.product-card
    steps:
      - action: extract
        selector: h3.product-name
        field: name
      - action: extract
        selector: span.price
        field: price
      - action: extract
        selector: a.product-link
        attribute: href
        field: url
  outputs:
    products: { type: Table }
```
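Each matched item yields one row, and every `extract` step fills one column. The sketch below models this with a toy page: each item is a dict mapping a CSS selector to the element found inside it, in place of a real DOM. It assumes two things: step selectors are evaluated within the item, as the `div.product-card` example suggests, and a missing element yields `None`. `list_items` and `extract_field` are hypothetical names.

```python
from typing import Any, Dict, List, Optional

# Toy stand-in for a DOM: each matched item maps a CSS selector to the
# element found inside it, as {"text": ..., "attrs": {...}}.
Element = Dict[str, Any]
Item = Dict[str, Element]
Step = Dict[str, str]


def extract_field(item: Item, step: Step) -> Optional[str]:
    element = item.get(step["selector"])
    if element is None:
        return None  # assumption: a missing element yields an empty cell
    attribute = step.get("attribute")
    if attribute is None:
        return element["text"]  # default: the element's inner text
    return element["attrs"].get(attribute)


def list_items(items: List[Item], steps: List[Step]) -> List[Dict[str, Optional[str]]]:
    """One output row per element matched by item_selector."""
    rows = []
    for item in items:
        row = {}
        for step in steps:
            if step["action"] == "extract":
                row[step["field"]] = extract_field(item, step)
        rows.append(row)
    return rows


cards = [
    {
        "h3.product-name": {"text": "Mug", "attrs": {}},
        "span.price": {"text": "$12", "attrs": {}},
        "a.product-link": {"text": "View", "attrs": {"href": "/p/mug"}},
    },
]
steps = [
    {"action": "extract", "selector": "h3.product-name", "field": "name"},
    {"action": "extract", "selector": "span.price", "field": "price"},
    {"action": "extract", "selector": "a.product-link", "attribute": "href", "field": "url"},
]
print(list_items(cards, steps))
# → [{'name': 'Mug', 'price': '$12', 'url': '/p/mug'}]
```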
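| Action | Description |
|---|---|
| `navigate` | Load the page. Runs implicitly as the first step; list it explicitly to set `wait_for`, for example `networkidle` for single-page apps (SPAs). |
| `click` | Click an element. Set `wait_after` (milliseconds) to give dynamic content time to load. |
| `extract` | Pull text or an attribute from an element into a named field. |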
For example:

```yaml
steps:
  - action: navigate
    wait_for: networkidle
  - action: click
    selector: button.load-more
    wait_after: 1000
  - action: extract
    selector: h1.title
    field: title
  - action: extract
    selector: a.main-link
    attribute: href
    field: link_url
```
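Steps run top to bottom. The Python below is a minimal sketch of that loop. It uses a hypothetical `Page` interface (`goto`, `click`, `text_of`, `attr_of`) in place of a real headless-browser API. The sketch makes several assumptions:

- When the list does not start with `navigate`, it prepends one with the default `wait_for: load`.
- It converts `wait_after` from milliseconds before sleeping.
- It reads inner text unless `attribute` is set.

```python
import time
from typing import Any, Dict, List, Protocol, Tuple

Step = Dict[str, Any]


class Page(Protocol):
    """Hypothetical page interface; a real browser library would sit behind it."""

    def goto(self, url: str, wait_for: str) -> None: ...
    def click(self, selector: str) -> None: ...
    def text_of(self, selector: str) -> str: ...
    def attr_of(self, selector: str, attribute: str) -> str: ...


def with_implicit_navigate(steps: List[Step]) -> List[Step]:
    """Prepend navigate (wait_for: load) unless the list already starts with it."""
    if steps and steps[0]["action"] == "navigate":
        return steps
    return [{"action": "navigate", "wait_for": "load"}] + steps


def run_steps(page: Page, url: str, steps: List[Step]) -> Dict[str, str]:
    record: Dict[str, str] = {}
    for step in with_implicit_navigate(steps):
        action = step["action"]
        if action == "navigate":
            page.goto(url, step.get("wait_for", "load"))
        elif action == "click":
            page.click(step["selector"])
            time.sleep(step.get("wait_after", 0) / 1000)  # wait_after is in ms
        elif action == "extract":
            if "attribute" in step:
                record[step["field"]] = page.attr_of(step["selector"], step["attribute"])
            else:
                record[step["field"]] = page.text_of(step["selector"])  # inner text
        else:
            raise ValueError(f"unknown action: {action!r}")
    return record


class FakePage:
    """Records calls and returns canned values, for trying run_steps without a browser."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, ...]] = []

    def goto(self, url: str, wait_for: str) -> None:
        self.calls.append(("goto", url, wait_for))

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))

    def text_of(self, selector: str) -> str:
        return "Welcome"

    def attr_of(self, selector: str, attribute: str) -> str:
        return "/start"


page = FakePage()
result = run_steps(page, "https://example.com", [
    {"action": "click", "selector": "button.load-more", "wait_after": 50},
    {"action": "extract", "selector": "h1.title", "field": "title"},
    {"action": "extract", "selector": "a.main-link", "attribute": "href", "field": "link_url"},
])
print(result)      # → {'title': 'Welcome', 'link_url': '/start'}
print(page.calls)  # → [('goto', 'https://example.com', 'load'), ('click', 'button.load-more')]
```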

CSS selectors break when the page structure changes. Semantic selectors instead describe an element in natural language, by its visual role, which keeps them stable across layout changes.

```yaml
- action: extract
  semantic: "the price displayed near the buy button"
  field: price
```

At creation time, the code agent resolves each semantic description to a concrete CSS selector. At execution time, only that concrete selector runs.
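This two-phase design means resolution happens once, before any run. The Python below is a minimal sketch of the creation-time pass. `resolve` is a hypothetical stand-in for the code agent. The resolved selector `div.buy-box span.price` is invented for illustration, and keeping the `semantic` text alongside the new `selector` is an assumption.

```python
from typing import Callable, Dict, List

Step = Dict[str, str]


def resolve_semantic_steps(steps: List[Step], resolve: Callable[[str], str]) -> List[Step]:
    """Creation time: give every semantic-only step a concrete CSS selector.

    The input steps are not mutated. After this pass, execution only needs
    step["selector"] and never calls the resolver.
    """
    resolved: List[Step] = []
    for step in steps:
        if "semantic" in step and "selector" not in step:
            step = {**step, "selector": resolve(step["semantic"])}
        resolved.append(step)
    return resolved


# Hypothetical resolver output; a real agent would inspect the page.
agent_answers = {"the price displayed near the buy button": "div.buy-box span.price"}
steps = [{"action": "extract", "semantic": "the price displayed near the buy button", "field": "price"}]
resolved = resolve_semantic_steps(steps, agent_answers.__getitem__)
print(resolved[0]["selector"])  # → div.buy-box span.price
```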

| Field | Required | Default | Description |
|---|---|---|---|
| `url` | Yes | | Target URL. Supports `{{ }}` templates. |
| `item_selector` | `browser.list` only | | CSS selector for repeated items. |
| `steps` | Yes | | Ordered extraction steps. |
| `steps[].action` | Yes | | `navigate`, `click`, or `extract`. |
| `steps[].selector` | Per action | | CSS selector for the target element. |
| `steps[].semantic` | No | | Natural-language element description. |
| `steps[].field` | `extract` only | | Output field name. |
| `steps[].attribute` | No | Inner text | HTML attribute to extract. |
| `steps[].wait_for` | No (`navigate` only) | `load` | `load`, `domcontentloaded`, or `networkidle`. |
| `steps[].wait_after` | No (`click` only) | `0` | Milliseconds to wait after the click. |
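The table can be read as a set of validation rules. The Python below is a minimal sketch of those checks, with two assumptions beyond the table. First, `click` and `extract` steps are taken to need a target, given as `selector` or as a not-yet-resolved `semantic`; this is one reading of "Per action". Second, `validate_params` is a hypothetical function, not part of the connector.

```python
from typing import Any, Dict, List

ACTIONS = {"navigate", "click", "extract"}
WAIT_FOR = {"load", "domcontentloaded", "networkidle"}


def validate_params(op: str, params: Dict[str, Any]) -> List[str]:
    """Check params against the field table; return a list of problems."""
    errors: List[str] = []
    if not params.get("url"):
        errors.append("url is required")
    if op == "browser.list" and not params.get("item_selector"):
        errors.append("item_selector is required for browser.list")
    steps = params.get("steps")
    if not steps:
        errors.append("steps is required")
        return errors
    for i, step in enumerate(steps):
        action = step.get("action")
        if action not in ACTIONS:
            errors.append(f"steps[{i}].action must be one of {sorted(ACTIONS)}")
            continue
        # Assumption: click and extract need a target element.
        if action in {"click", "extract"} and not (step.get("selector") or step.get("semantic")):
            errors.append(f"steps[{i}] needs selector or semantic")
        if action == "extract" and not step.get("field"):
            errors.append(f"steps[{i}].field is required for extract")
        # wait_for defaults to "load" when omitted.
        if action == "navigate" and step.get("wait_for", "load") not in WAIT_FOR:
            errors.append(f"steps[{i}].wait_for must be one of {sorted(WAIT_FOR)}")
    return errors


print(validate_params("browser.list", {
    "url": "https://shop.example.com/catalog",
    "steps": [{"action": "extract", "selector": "span.price"}],
}))
# → ['item_selector is required for browser.list', 'steps[0].field is required for extract']
```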