Node Spec Format

The node.yaml specification for programmatic node creation.

Every custom node has a node.yaml file. It defines the node’s contract: what data it accepts, what it produces, and what parameters it requires. The runtime enforces this contract at both validation and execution time.

# node.yaml: complete schema
# ─────────────────────────────────────────────
id: enrich-leads          # string, required. Unique node identifier.
                          # Lowercase, hyphens only. Must match the
                          # node ID in flow.yaml.
type: custom              # string, required. Node type.
                          # custom | deterministic | source | service
description: >            # string, optional. What this node does.
  Enrich lead records with company data
  from the Clearbit API.
inputs:                   # object, required. Map of port name → port spec.
  # ... (see Port schema below)
outputs:                  # object, required. Map of port name → port spec.
  # ... (see Port schema below)
params:                   # object, optional. Map of param name → param spec.
  # ... (see Param schema below)
sandbox:                  # object, optional. Sandbox overrides.
  network: false          # boolean. Allow network access. Default: false.
  timeout: 30000          # integer. Max execution time in ms. Default: 30000.

Required fields:

| Field | Type | Constraints |
|---|---|---|
| id | string | Lowercase, hyphens only. Must match the node ID in flow.yaml. |
| type | string | One of: custom, deterministic, source, service. |
| inputs | object | At least one port (unless type is source). |
| outputs | object | At least one port. |

Optional fields:

| Field | Type | Default | Description |
|---|---|---|---|
| description | string | "" | Human-readable description. |
| params | object | {} | Configuration parameters. |
| sandbox | object | See defaults | Sandbox overrides. |

Node types:

| Type | Description | Inputs | Network |
|---|---|---|---|
| deterministic | Pure data transform. Same input always produces same output. | Required | No |
| source | Data producer. No input ports. Reads from external systems. | None | Typically yes |
| service | Calls an external API or service. | Required | Yes |
| custom | General-purpose. Use when other types don’t fit. | Required | Configurable |

The deterministic type is for SQL transforms, filters, maps, and any logic where the output is a pure function of the input. Deterministic nodes have no side effects and no network access.

id: score-leads
type: deterministic
inputs:
  leads:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      clicks: { type: integer }
outputs:
  scored:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      score: { type: number }
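
As an illustration of what "pure function of the input" means for the score-leads spec, here is a minimal Python sketch. The function name `score_leads`, its calling convention, and the scoring rule are all assumptions made for this example; this page does not describe how the runtime hands rows to main.py.

```python
from typing import Iterable

Row = dict[str, object]


def score_leads(leads: Iterable[Row]) -> list[Row]:
    """Map each input row to one output row using only that row's fields."""
    scored = []
    for lead in leads:
        scored.append({
            "name": lead["name"],
            "email": lead["email"],
            # Hypothetical scoring rule: one point per click, capped at 100.
            "score": float(min(int(lead["clicks"]), 100)),
        })
    return scored


rows = [{"name": "Ada", "email": "ada@example.com", "clicks": 7}]
# No clock, randomness, network, or shared state is involved, so calling
# the function twice on the same rows gives the same result.
assert score_leads(rows) == score_leads(rows)
```

The output rows carry exactly the fields declared on the scored port (name, email, score), which is what output validation would check.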

The source type is for nodes that produce data from external systems. Source nodes have no input ports and typically require network access.

id: read-sheets
type: source
inputs: {}
outputs:
  data:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      status: { type: string }
params:
  spreadsheet_id:
    type: string
    required: true
  range:
    type: string
    default: "Sheet1!A:Z"
sandbox:
  network: true

The service type is for nodes that call external APIs. Service nodes have both inputs and outputs and require network access.

id: enrich-clearbit
type: service
inputs:
  leads:
    type: Table
    schema:
      email: { type: string }
outputs:
  enriched:
    type: Table
    schema:
      email: { type: string }
      company: { type: string }
      industry: { type: string }
params:
  api_key:
    type: string
    required: true
    secret: true
sandbox:
  network: true
  timeout: 60000

The custom type is general-purpose. Use it for nodes that don’t fit the other categories. Its sandbox defaults to maximum restriction.

id: generate-report
type: custom
inputs:
  data:
    type: Table
    schema:
      name: { type: string }
      score: { type: number }
outputs:
  report:
    type: Record
    schema:
      total: { type: integer }
      average_score: { type: number }
      top_name: { type: string }

Ports are typed connection points. Each input port declares the data type it accepts. Each output port declares the data type it produces.

inputs:
  leads:                    # string (port name). Lowercase, hyphens.
    type: Table             # string, required. One of: Value, Record, Table, Stream.
    schema:                 # object, required for Record, Table, and Stream.
      name:                 # field name
        type: string        # string, required. Field type:
                            # string, number, integer, boolean, null,
                            # array, object
        required: true      # boolean, optional. Default: true.
        description: ""     # string, optional. Field description.
      email:
        type: string
      score:
        type: number
        required: false     # Optional field; may be absent in input data.
    description: >          # string, optional. Port description.
      Leads table with contact info and scores.
outputs:
  enriched:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      score: { type: number }
      company: { type: string }
      industry: { type: string }
      employee_count: { type: integer }

| Type | Schema required | Description |
|---|---|---|
| Value | No | Single scalar: string, number, boolean, null. |
| Record | Yes | Single JSON object with named fields. |
| Table | Yes | Ordered collection of records (rows). |
| Stream | Yes | Unbounded sequence of records. |

Value ports carry a single typed scalar and do not have a schema. Record, Table, and Stream ports require a schema defining their fields.
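
To make the field rules concrete, the following Python sketch checks one record against a port schema: each field's `type` must match, and a field may be absent only when it declares `required: false`. The helper name `validate_record` and the type mapping are assumptions for illustration, not the runtime's actual validator. In particular, rejecting booleans for `integer` and `number`, and ignoring undeclared extra fields, are choices made for this sketch.

```python
def _is_integer(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Assumed mapping from the spec's field type names to Python checks.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": _is_integer,
    "number": _is_number,
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def validate_record(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for name, spec in schema.items():
        if name not in record:
            # Fields are required unless the schema says otherwise.
            if spec.get("required", True):
                errors.append(f"missing required field: {name}")
            continue
        if not TYPE_CHECKS[spec["type"]](record[name]):
            errors.append(f"field {name}: expected {spec['type']}")
    return errors


leads_schema = {
    "name": {"type": "string"},
    "email": {"type": "string"},
    "score": {"type": "number", "required": False},
}
print(validate_record({"name": "Ada", "email": "ada@example.com"}, leads_schema))
```

For a Table or Stream port, the same check would apply to every row.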

| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| type | string | yes | | string, number, integer, boolean, null, array, object |
| required | boolean | no | true | Whether the field must be present. |
| description | string | no | "" | Human-readable field description. |

Parameters configure the node’s behavior. They are set in flow.yaml and passed to the node at execution time.

params:
  api_key:                  # string (param name). Lowercase, underscores.
    type: string            # string, required. Param type:
                            # string, number, integer, boolean
    required: true          # boolean, optional. Default: false.
    description: >          # string, optional.
      Clearbit API key for enrichment lookups.
    secret: true            # boolean, optional. Stored encrypted. Default: false.
  batch_size:
    type: integer
    required: false
    default: 100            # any, optional. Default value if not provided.
    description: >
      Number of records to process per API call.
  include_funding:
    type: boolean
    required: false
    default: false
    description: >
      Include funding data in enrichment results.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| type | string | yes | | string, number, integer, boolean |
| required | boolean | no | false | Whether the param must be provided. |
| default | any | no | | Default value when param is not set. |
| description | string | no | "" | Human-readable description. |
| secret | boolean | no | false | Store encrypted, inject as env var. |
| enum | array | no | | Allowed values. Validation rejects others. |
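
The interaction of `required`, `default`, and `enum` can be sketched in a few lines of Python. The function `resolve_params` is a hypothetical helper, not part of the runtime, and the `region` param in the usage lines is made up to show `enum`. The sketch assumes a required param must be supplied even if it also declares a default, and it does not cover type checking or secret handling.

```python
def resolve_params(spec: dict, provided: dict) -> dict:
    """Merge values from flow.yaml (provided) with the node's param spec."""
    resolved = {}
    for name, param in spec.items():
        if name in provided:
            value = provided[name]
        elif param.get("required", False):
            # required: true means the flow must supply the value.
            raise ValueError(f"missing required param: {name}")
        elif "default" in param:
            value = param["default"]
        else:
            # Optional param with no default: leave it unset.
            continue
        if "enum" in param and value not in param["enum"]:
            raise ValueError(f"param {name}: {value!r} is not one of {param['enum']}")
        resolved[name] = value
    return resolved


spec = {
    "api_key": {"type": "string", "required": True, "secret": True},
    "batch_size": {"type": "integer", "default": 100},
    "region": {"type": "string", "enum": ["us", "eu"]},  # hypothetical param
}
print(resolve_params(spec, {"api_key": "example-key", "region": "eu"}))
```

Checking `required` before `default` keeps the sketch in line with the rule that params marked `required: true` must be provided in flow.yaml.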

Each node directory contains an implementation file. The runtime determines which file to execute based on what exists:

| File | Language | Execution |
|---|---|---|
| main.py | Python | Executed via Python runtime |
| main.js | JavaScript | Executed via Node.js runtime |
| main.sql | SQL | Executed via DuckDB |
| run.sh | Shell | Executed via nix-shell + bubblewrap |

The runtime looks for these files in order: main.sql, main.py, main.js, run.sh. The first match wins.
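
The first-match lookup described above can be sketched as follows. The function name `find_implementation` is hypothetical; only the file names and their order come from this page.

```python
from pathlib import Path
from typing import Optional

# Lookup order as documented: the first file that exists wins.
LOOKUP_ORDER = ("main.sql", "main.py", "main.js", "run.sh")


def find_implementation(node_dir: Path) -> Optional[Path]:
    """Return the implementation file for a node directory, or None."""
    for filename in LOOKUP_ORDER:
        candidate = node_dir / filename
        if candidate.is_file():
            return candidate
    return None
```

Under this order, a directory that contains both main.sql and main.py resolves to main.sql, so a leftover file earlier in the list can shadow the one you meant to run.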

  1. id must be lowercase with hyphens only. Must match the node ID in flow.yaml.
  2. type must be one of: custom, deterministic, source, service.
  3. inputs is required (can be empty {} for source type only).
  4. outputs is required and must have at least one port.
  5. Port type must be one of: Value, Record, Table, Stream.
  6. Ports of type Record, Table, or Stream must have a schema.
  7. Schema field type must be one of: string, number, integer, boolean, null, array, object.
  8. Params marked required: true must be provided in flow.yaml.
  9. Params with enum reject values not in the list.
  10. sandbox.timeout must be a positive integer (milliseconds).
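
Several of these rules can be checked from node.yaml alone. The Python sketch below covers rules 1 through 7 and rule 10 against an already-parsed spec (a plain dict). Rules 8 and 9, and the "must match flow.yaml" half of rule 1, need flow.yaml and are left out. The function names and error messages are assumptions for illustration. The sketch also assumes that "lowercase with hyphens" excludes digits, which this page does not state either way.

```python
import re

NODE_TYPES = {"custom", "deterministic", "source", "service"}
PORT_TYPES = {"Value", "Record", "Table", "Stream"}
FIELD_TYPES = {"string", "number", "integer", "boolean", "null", "array", "object"}
# Assumption: lowercase letters separated by single hyphens, no digits.
ID_PATTERN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def _check_port(name: str, port: dict) -> list[str]:
    errors = []
    port_type = port.get("type")
    if not isinstance(port_type, str) or port_type not in PORT_TYPES:
        return [f"port {name}: invalid type {port_type!r}"]
    schema = port.get("schema")
    if port_type != "Value" and not schema:
        errors.append(f"port {name}: {port_type} ports require a schema")
    for field, field_spec in (schema or {}).items():
        if field_spec.get("type") not in FIELD_TYPES:
            errors.append(f"port {name}, field {field}: invalid field type")
    return errors


def validate_spec(spec: dict) -> list[str]:
    """Return rule violations for a parsed node.yaml; empty means valid."""
    errors = []
    node_id = spec.get("id")
    if not isinstance(node_id, str) or not ID_PATTERN.fullmatch(node_id):
        errors.append("id must be lowercase with hyphens only")
    node_type = spec.get("type")
    if not isinstance(node_type, str) or node_type not in NODE_TYPES:
        errors.append("type must be one of: custom, deterministic, source, service")
    inputs = spec.get("inputs")
    if not isinstance(inputs, dict):
        errors.append("inputs is required")
    elif not inputs and node_type != "source":
        errors.append("inputs may be empty only for source nodes")
    outputs = spec.get("outputs")
    if not isinstance(outputs, dict) or not outputs:
        errors.append("outputs must have at least one port")
    for section in (inputs, outputs):
        if isinstance(section, dict):
            for port_name, port in section.items():
                errors.extend(_check_port(port_name, port))
    timeout = (spec.get("sandbox") or {}).get("timeout")
    if timeout is not None:
        is_int = isinstance(timeout, int) and not isinstance(timeout, bool)
        if not is_int or timeout <= 0:
            errors.append("sandbox.timeout must be a positive integer")
    return errors
```

Collecting every violation in a list, rather than raising on the first one, lets a caller report all problems in a spec at once.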

Custom nodes live in the nodes/ directory of the pipeline workspace:

nodes/
  enrich-leads/
    node.yaml                 # this file: the contract
    main.py                   # implementation (generated or hand-written)
    schemas/
      leads.schema.json       # input schema (auto-generated from spec)
      enriched.schema.json    # output schema (auto-generated from spec)
    artifacts/
      enriched.ndjson         # output data (written at execution time)
      errors.ndjson           # error output (written at execution time)

The runtime reads node.yaml, validates inputs against the declared schemas, executes the implementation file (main.py, main.sql, main.js, or run.sh), and validates outputs before passing them downstream.
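
The schemas/ files are described as auto-generated from the spec, but their exact format is not shown on this page. As a rough sketch, and assuming they are JSON Schema documents describing a single record (one row of a Table, or one line of an .ndjson artifact), a port schema could be translated like this. The function name `port_to_json_schema` is hypothetical.

```python
import json


def port_to_json_schema(port: dict) -> dict:
    """Translate a port's field schema into a JSON Schema for one record."""
    fields = port.get("schema", {})
    return {
        "type": "object",
        # The spec's field type names coincide with JSON Schema type names.
        "properties": {name: {"type": spec["type"]} for name, spec in fields.items()},
        # Fields are required unless they declare required: false.
        "required": [
            name for name, spec in fields.items() if spec.get("required", True)
        ],
    }


leads_port = {
    "type": "Table",
    "schema": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "score": {"type": "number", "required": False},
    },
}
print(json.dumps(port_to_json_schema(leads_port), indent=2))
```

For the leads port this yields an object schema whose `required` list contains name and email but not score.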

A node that reads leads from a Table, enriches them via an external API, and outputs an enriched Table:

id: enrich-leads
type: service
description: >
  Look up company data for each lead using the Clearbit API.
  Adds company name, industry, and employee count.
inputs:
  leads:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      score: { type: number, required: false }
    description: Raw leads with at least name and email.
outputs:
  enriched:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      score: { type: number, required: false }
      company: { type: string }
      industry: { type: string }
      employee_count: { type: integer }
    description: Leads enriched with Clearbit company data.
  errors:
    type: Table
    schema:
      email: { type: string }
      error: { type: string }
    description: Leads that failed enrichment lookup.
params:
  api_key:
    type: string
    required: true
    secret: true
    description: Clearbit API key.
  batch_size:
    type: integer
    required: false
    default: 50
    description: Records per API batch.
  timeout_ms:
    type: integer
    required: false
    default: 5000
    description: Per-request timeout in milliseconds.
sandbox:
  network: true
  timeout: 60000
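
One way the body of such a node might split its work between the enriched and errors ports is sketched below. Everything here is illustrative: the function `enrich`, the `lookup_batch` callable (a stand-in for the real API client, which this page does not describe), and the "no match" error text are all assumptions. The sketch makes no network calls and does not show how the runtime passes inputs or collects outputs.

```python
from typing import Callable


def enrich(
    leads: list[dict],
    lookup_batch: Callable[[list[str]], dict],
    batch_size: int = 50,
) -> tuple[list[dict], list[dict]]:
    """Return (enriched rows, error rows) for the two output ports."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    enriched, errors = [], []
    for start in range(0, len(leads), batch_size):
        batch = leads[start:start + batch_size]
        # lookup_batch is assumed to map email -> company fields and to
        # omit emails it could not resolve.
        found = lookup_batch([lead["email"] for lead in batch])
        for lead in batch:
            company = found.get(lead["email"])
            if company is None:
                errors.append({"email": lead["email"], "error": "no match"})
            else:
                enriched.append({**lead, **company})
    return enriched, errors


def fake_lookup(emails: list[str]) -> dict:
    # Stand-in for an API client; resolves only example.com addresses.
    return {
        email: {"company": "Example Co", "industry": "Software", "employee_count": 12}
        for email in emails
        if email.endswith("@example.com")
    }


rows = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Bob", "email": "bob@unknown.test"},
]
enriched_rows, error_rows = enrich(rows, fake_lookup, batch_size=1)
print(enriched_rows, error_rows)
```

Each error row has exactly the email and error fields declared on the errors port, so a lead that fails lookup ends up in that table instead of being dropped.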