Pipeline Spec

This is the structured reference for `pipeline.rf.yaml`. AI agents parse this page to generate valid pipeline definitions. Every field is documented with its type, constraints, and purpose.

```yaml
# pipeline.rf.yaml — complete schema
# ─────────────────────────────────────────────
name: lead-scoring          # string, required. Human-readable pipeline name.
                            # Lowercase, hyphens, no spaces.
version: 1                  # integer, required. Increment on breaking changes.
description: >              # string, optional. What this pipeline does.
  Score inbound leads by engagement metrics
  and push qualified leads to CRM.
nodes:                      # object, required. Map of node ID → node definition.
  # ... (see Node schema below)
edges:                      # string[], required. List of edge strings.
  # ... (see Edge schema below)
```
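The required/optional annotations above can be turned into a loader-side check. The sketch below assumes the document has already been parsed into a Python dict; the function name, error strings, and the slug regex for `name` are illustrative, not part of the spec:

```python
import re

# The four required top-level fields, per the schema comments above.
REQUIRED_TOP_LEVEL = ("name", "version", "nodes", "edges")

def check_top_level(doc: dict) -> list:
    """Return a list of problems found at the top level of a parsed pipeline doc."""
    problems = []
    for field in REQUIRED_TOP_LEVEL:
        if field not in doc:
            problems.append(f"missing required field: {field}")
    # version: integer, required
    if "version" in doc and not isinstance(doc["version"], int):
        problems.append("version must be an integer")
    # name: lowercase, hyphens, no spaces (regex is an assumed reading of that rule)
    if "name" in doc and not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", str(doc["name"])):
        problems.append("name must be lowercase with hyphens, no spaces")
    return problems
```

An empty return value means the top level passed; each string describes one violation.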

Each key under `nodes` is the node ID. Node IDs are human-readable slugs: lowercase, hyphens, no spaces, and no dots in the ID itself.

```yaml
nodes:
  read-leads:                   # string (node ID), required. Unique within the pipeline.
    type: file.source           # string, required. Node type identifier.
                                # Format: category.operation
                                # Categories: file, data, http, google, browser,
                                #             cli, value, router, custom
    # ── Type-specific config fields ──────────────────────────
    # These vary by node type. Examples below.
    path: data/leads.csv        # string. File path (file.source, file.write).
    format: csv                 # string. File format: csv, json, ndjson.
    query: "SELECT * FROM i"    # string. SQL query (data.sql).
    expression: "score > 80"    # string. Filter expression (data.filter).
    url: "https://api.ex/v1"    # string. URL (http.request).
    method: GET                 # string. HTTP method (http.request).
    value: "hello"              # any. Literal value (value.literal).
    valueType: string           # string. Value type (value.literal).
    # ── Common optional fields ───────────────────────────────
    spec: nodes/x/spec.yaml     # string, optional. Path to external node-spec.yaml
                                # for custom nodes.
    parallel:                   # object, optional. Parallel execution config.
      over: input               # string. Input port to split.
      chunks: auto              # integer | "auto". Number of parallel chunks.
      merge: output             # string. Output port to concatenate.
    csvOptions:                 # object, optional. CSV parsing options.
      delimiter: ","            # string. Field delimiter.
      hasHeader: true           # boolean. First row is header.
      quote: "\""               # string. Quote character.
```
| Type | Category | Config fields |
|------|----------|---------------|
| `file.source` | File I/O | `path`, `format`, `csvOptions` |
| `file.write` | File I/O | `path`, `format` |
| `data.sql` | Transform | `query` |
| `data.filter` | Transform | `expression` |
| `data.map` | Transform | `expression` |
| `data.sort` | Transform | `field`, `order` |
| `data.limit` | Transform | `count` |
| `data.dedup` | Transform | `fields` |
| `data.join` | Transform | `on`, `type` |
| `data.group` | Transform | `by`, `aggregations` |
| `http.request` | Connector | `url`, `method`, `headers`, `body` |
| `google.sheets` | Connector | `spreadsheetId`, `range`, `credentials` |
| `browser.extract` | Connector | `url`, `selector`, `waitFor` |
| `cli.run` | CLI | `command`, `env`, `sandbox` |
| `value.literal` | Value | `value`, `valueType` |
| `router` | Control | `input`, `routes` |
| `custom` | Custom | `spec` (path to node-spec.yaml) |
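One way to use this table programmatically is a type → config-key map for flagging unrecognized keys. Treating the table as the complete set of recognized keys per type (plus the common optional fields) is an assumption of this sketch, and `unknown_config_keys` is a hypothetical helper name:

```python
# Config fields per node type, mirroring the table above.
NODE_CONFIG_FIELDS = {
    "file.source": {"path", "format", "csvOptions"},
    "file.write": {"path", "format"},
    "data.sql": {"query"},
    "data.filter": {"expression"},
    "data.map": {"expression"},
    "data.sort": {"field", "order"},
    "data.limit": {"count"},
    "data.dedup": {"fields"},
    "data.join": {"on", "type"},
    "data.group": {"by", "aggregations"},
    "http.request": {"url", "method", "headers", "body"},
    "google.sheets": {"spreadsheetId", "range", "credentials"},
    "browser.extract": {"url", "selector", "waitFor"},
    "cli.run": {"command", "env", "sandbox"},
    "value.literal": {"value", "valueType"},
    "router": {"input", "routes"},
    "custom": {"spec"},
}

# Fields every node may carry regardless of type (see "Common optional fields").
COMMON_FIELDS = {"type", "spec", "parallel", "csvOptions"}

def unknown_config_keys(node: dict) -> set:
    """Return keys on a node that neither its type nor the common fields declare."""
    allowed = NODE_CONFIG_FIELDS.get(node.get("type"), set()) | COMMON_FIELDS
    return set(node) - allowed
```

An empty result means every key on the node is accounted for; anything returned is a candidate typo or misplaced field.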

Edges are strings in the format `sourceNode.port -> targetNode.port`.

```yaml
edges:
  # ── Basic edge ─────────────────────────────────────────────
  - "read-leads.data -> filter.input"
  # string, required format: "nodeId.portName -> nodeId.portName"
  # The parser splits on " -> " (space-arrow-space).
  # Port name is after the last dot in each side.
  # ── Multiple edges ─────────────────────────────────────────
  - "filter.output -> score.input"
  - "score.output -> write.records"
  # ── Indexed ports (for multi-input nodes) ──────────────────
  - "source-a.output -> merge.inputs[0]"
  - "source-b.output -> merge.inputs[1]"
  # Bracket notation for indexed input ports.
  # ── Fan-out (one output to multiple inputs) ────────────────
  - "read.data -> branch-a.input"
  - "read.data -> branch-b.input"
  # Same output port can connect to multiple input ports.
```
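The splitting rules above (split on `" -> "`, port name after the last dot on each side) can be sketched as a small parser. `parse_edge` is a hypothetical helper name, not part of the reference implementation:

```python
def parse_edge(edge: str) -> tuple:
    """Split an edge string into (source node, source port, target node, target port).

    Splits on " -> " first, then takes the port name after the last dot on
    each side. Because node IDs contain no dots, rsplit is unambiguous, and
    indexed ports like inputs[0] survive intact as the port name.
    """
    source, target = edge.split(" -> ")
    src_node, src_port = source.rsplit(".", 1)
    dst_node, dst_port = target.rsplit(".", 1)
    return src_node, src_port, dst_node, dst_port
```

For example, `parse_edge("source-a.output -> merge.inputs[0]")` keeps the bracketed index as part of the port name rather than treating it as a separate field.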

The parser enforces these rules at load time:

  1. Every node must have a unique ID.
  2. Every node must have a type field.
  3. Edge source and target node IDs must exist in nodes.
  4. Edge port names must match declared ports on the node type.
  5. Connected ports must have compatible types (Value/Record/Table/Stream).
  6. Connected ports must have compatible schemas (see Type System).
  7. The graph must be a DAG — no cycles.
  8. name and version are required at top level.
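Rule 7, the DAG constraint, is commonly enforced with a depth-first search for back edges. A sketch, assuming rules 1–3 already hold so every edge endpoint is a known node ID (the function name and three-color scheme are illustrative, not the reference implementation):

```python
from collections import defaultdict

def has_cycle(node_ids, edges) -> bool:
    """Return True if the directed graph (node IDs plus (src, dst) pairs) has a cycle.

    Assumes every endpoint in `edges` appears in `node_ids` (rule 3).
    """
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on current path / finished
    color = {n: WHITE for n in node_ids}

    def visit(n):
        color[n] = GRAY
        for m in adj[n]:
            if color[m] == GRAY:          # back edge onto the current path: cycle
                return True
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in node_ids)
```

A linear pipeline passes; any edge that points back to an ancestor on the current DFS path fails the check.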
A minimal valid pipeline:

```yaml
name: minimal
version: 1
nodes:
  greeting:
    type: value.literal
    valueType: string
    value: "hello world"
edges: []
```
A complete example:

```yaml
name: lead-scoring
version: 1
description: Read leads, filter active, score by engagement, export top tier
nodes:
  read-leads:
    type: file.source
    path: data/leads.csv
    format: csv
    csvOptions:
      delimiter: ","
      hasHeader: true
  filter-active:
    type: data.filter
    expression: "status = 'active' AND email IS NOT NULL"
  score:
    type: data.sql
    query: |
      SELECT *,
             (clicks * 0.3 + opens * 0.5 + replies * 0.2) AS score
      FROM input
      ORDER BY score DESC
  top-tier:
    type: data.filter
    expression: "score >= 80"
  write-results:
    type: file.write
    path: output/qualified-leads.ndjson
    format: ndjson
edges:
  - "read-leads.data -> filter-active.input"
  - "filter-active.output -> score.input"
  - "score.output -> top-tier.input"
  - "top-tier.output -> write-results.records"
```
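As a sanity check, rule 3 (edge endpoints must name declared nodes) can be verified against this example using only the splitting rules from the edge section. This standalone sketch hard-codes the node IDs and edges from the YAML above:

```python
# Node IDs and edges transcribed from the lead-scoring example above.
nodes = {"read-leads", "filter-active", "score", "top-tier", "write-results"}
edges = [
    "read-leads.data -> filter-active.input",
    "filter-active.output -> score.input",
    "score.output -> top-tier.input",
    "top-tier.output -> write-results.records",
]

# Rule 3: every endpoint's node ID (the part before the last dot) must exist.
for edge in edges:
    source, target = edge.split(" -> ")
    assert source.rsplit(".", 1)[0] in nodes, f"unknown source node in {edge!r}"
    assert target.rsplit(".", 1)[0] in nodes, f"unknown target node in {edge!r}"
```

The loop passes silently for this example; a typo in any node ID would trip the corresponding assertion.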