Node Spec Format
The node.yaml specification for programmatic node creation.
Every custom node has a node.yaml file. It defines the node’s contract: what data it accepts, what it produces, and what parameters it requires. The runtime enforces this contract at both validation and execution time.
Top-level schema
```yaml
# node.yaml: complete schema
# ─────────────────────────────────────────────

id: enrich-leads            # string, required. Unique node identifier.
                            # Lowercase, hyphens only. Must match the
                            # node ID in flow.yaml.

type: custom                # string, required. Node type.
                            # custom | deterministic | source | service

description: >              # string, optional. What this node does.
  Enrich lead records with company data from the Clearbit API.

inputs:                     # object, required. Map of port name → port spec.
  # ... (see Port schema below)

outputs:                    # object, required. Map of port name → port spec.
  # ... (see Port schema below)

params:                     # object, optional. Map of param name → param spec.
  # ... (see Param schema below)

sandbox:                    # object, optional. Sandbox overrides.
  network: false            # boolean. Allow network access. Default: false.
  timeout: 30000            # integer. Max execution time in ms. Default: 30000.
```

Required fields
| Field | Type | Constraints |
|---|---|---|
| id | string | Lowercase, hyphens only. Must match the node ID in flow.yaml. |
| type | string | One of: custom, deterministic, source, service. |
| inputs | object | At least one port (unless type is source). |
| outputs | object | At least one port. |
Optional fields
| Field | Type | Default | Description |
|---|---|---|---|
| description | string | "" | Human-readable description. |
| params | object | {} | Configuration parameters. |
| sandbox | object | See defaults | Sandbox overrides. |
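
The sandbox defaults shown in the top-level schema (network: false, timeout: 30000) could be combined with a node's overrides roughly as follows. This is a minimal sketch, not the runtime's actual code: the helper name `resolve_sandbox` and the decision to reject unknown keys are assumptions made for illustration.

```python
# Documented defaults: no network access, 30-second timeout.
SANDBOX_DEFAULTS = {"network": False, "timeout": 30000}


def resolve_sandbox(overrides=None):
    """Merge a node's sandbox overrides onto the documented defaults.

    Hypothetical helper; rejecting unknown keys is an assumption.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(SANDBOX_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown sandbox keys: {sorted(unknown)}")

    merged = {**SANDBOX_DEFAULTS, **overrides}

    timeout = merged["timeout"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout <= 0:
        raise ValueError("sandbox.timeout must be a positive integer (milliseconds)")
    if not isinstance(merged["network"], bool):
        raise ValueError("sandbox.network must be a boolean")
    return merged
```

Calling `resolve_sandbox({"network": True, "timeout": 60000})` returns the overrides unchanged, while `resolve_sandbox()` returns the defaults.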
Node types
| Type | Description | Inputs | Network |
|---|---|---|---|
| deterministic | Pure data transform. Same input always produces same output. | Required | No |
| source | Data producer. No input ports. Reads from external systems. | None | Typically yes |
| service | Calls an external API or service. | Required | Yes |
| custom | General-purpose. Use when other types don’t fit. | Required | Configurable |
deterministic
For SQL transforms, filters, maps, and any logic where the output is a pure function of the input. No side effects, no network access.

```yaml
id: score-leads
type: deterministic
inputs:
  leads:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      clicks: { type: integer }
outputs:
  scored:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      score: { type: number }
```

source
For nodes that produce data from external systems. No input ports. Typically requires network access.

```yaml
id: read-sheets
type: source
inputs: {}
outputs:
  data:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      status: { type: string }
params:
  spreadsheet_id:
    type: string
    required: true
  range:
    type: string
    default: "Sheet1!A:Z"
sandbox:
  network: true
```

service
For nodes that call external APIs. Has inputs and outputs. Requires network access.

```yaml
id: enrich-clearbit
type: service
inputs:
  leads:
    type: Table
    schema:
      email: { type: string }
outputs:
  enriched:
    type: Table
    schema:
      email: { type: string }
      company: { type: string }
      industry: { type: string }
params:
  api_key:
    type: string
    required: true
    secret: true
sandbox:
  network: true
  timeout: 60000
```

custom
General-purpose type. Use for nodes that don’t fit the other categories. Sandbox defaults to maximum restriction.

```yaml
id: generate-report
type: custom
inputs:
  data:
    type: Table
    schema:
      name: { type: string }
      score: { type: number }
outputs:
  report:
    type: Record
    schema:
      total: { type: integer }
      average_score: { type: number }
      top_name: { type: string }
```

Port schema
Ports are typed connection points. Each input port declares the data type it accepts. Each output port declares the data type it produces.

```yaml
inputs:
  leads:                      # string (port name). Lowercase, hyphens.
    type: Table               # string, required. One of: Value, Record, Table, Stream.

    schema:                   # object, required for Record, Table, and Stream.
      name:                   # field name
        type: string          # string, required. Field type:
                              #   string, number, integer, boolean, null,
                              #   array, object
        required: true        # boolean, optional. Default: true.
        description: ""       # string, optional. Field description.

      email:
        type: string

      score:
        type: number
        required: false       # Optional field; may be absent in input data.

    description: >            # string, optional. Port description.
      Leads table with contact info and scores.

outputs:
  enriched:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      score: { type: number }
      company: { type: string }
      industry: { type: string }
      employee_count: { type: integer }
```

Port types
| Type | Schema required | Description |
|---|---|---|
| Value | No | Single scalar: string, number, boolean, null. |
| Record | Yes | Single JSON object with named fields. |
| Table | Yes | Ordered collection of records (rows). |
| Stream | Yes | Unbounded sequence of records. |
Value ports carry a single typed scalar and do not have a schema. Record, Table, and Stream ports require a schema defining their fields.
Schema field properties
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| type | string | yes | n/a | string, number, integer, boolean, null, array, object |
| required | boolean | no | true | Whether the field must be present. |
| description | string | no | "" | Human-readable field description. |
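
To make the field properties concrete, here is a minimal sketch of checking one record against a port schema. It is an illustration, not the runtime's validator. The names `TYPE_CHECKS` and `check_record` are hypothetical, and three behaviours are assumptions: integers are accepted where number is declared, extra fields not named in the schema are ignored, and a field that is present must match its declared type.

```python
def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


# Maps each documented field type name to a Python check.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": _is_int,
    "number": lambda v: _is_int(v) or isinstance(v, float),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def check_record(record, schema):
    """Return a list of error strings for one record; empty means valid."""
    errors = []
    for name, field in schema.items():
        if name not in record:
            # `required` defaults to true when the field spec omits it.
            if field.get("required", True):
                errors.append(f"missing required field: {name}")
            continue
        if not TYPE_CHECKS[field["type"]](record[name]):
            errors.append(f"field {name}: expected {field['type']}")
    return errors


schema = {
    "name": {"type": "string"},
    "email": {"type": "string"},
    "score": {"type": "number", "required": False},
}
print(check_record({"name": "Ada", "email": "ada@example.com"}, schema))
print(check_record({"name": "Ada", "score": "high"}, schema))
```

The first call prints an empty list because score is optional. The second reports the missing email and the wrongly typed score.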
Param schema
Parameters configure the node’s behavior. They are set in flow.yaml and passed to the node at execution time.

```yaml
params:
  api_key:                    # string (param name). Lowercase, underscores.
    type: string              # string, required. Param type:
                              #   string, number, integer, boolean
    required: true            # boolean, optional. Default: false.
    description: >            # string, optional.
      Clearbit API key for enrichment lookups.
    secret: true              # boolean, optional. Stored encrypted. Default: false.

  batch_size:
    type: integer
    required: false
    default: 100              # any, optional. Default value if not provided.
    description: >
      Number of records to process per API call.

  include_funding:
    type: boolean
    required: false
    default: false
    description: >
      Include funding data in enrichment results.
```

Param fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| type | string | yes | n/a | string, number, integer, boolean |
| required | boolean | no | false | Whether the param must be provided. |
| default | any | no | n/a | Default value when param is not set. |
| description | string | no | "" | Human-readable description. |
| secret | boolean | no | false | Store encrypted, inject as env var. |
| enum | array | no | n/a | Allowed values. Validation rejects others. |
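
The interaction between required, default, and enum can be sketched as below. This is a hedged illustration under stated assumptions, not the runtime's implementation. `resolve_params` is a hypothetical helper, `param_specs` stands for the parsed params map from node.yaml, and `provided` stands for the values set in flow.yaml. Values in `provided` that have no matching spec are ignored here, which is also an assumption.

```python
PARAM_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
}


def resolve_params(param_specs, provided):
    """Apply defaults, then check required, type, and enum for each param."""
    resolved = {}
    for name, spec in param_specs.items():
        if name in provided:
            value = provided[name]
        elif spec.get("required", False):  # `required` defaults to false
            raise ValueError(f"missing required param: {name}")
        elif "default" in spec:
            value = spec["default"]
        else:
            continue  # optional with no default: leave unset

        # bool is a subclass of int, so a boolean must not pass as a number.
        is_bool = isinstance(value, bool)
        wants_bool = spec["type"] == "boolean"
        if is_bool != wants_bool or not isinstance(value, PARAM_TYPES[spec["type"]]):
            raise TypeError(f"param {name}: expected {spec['type']}")

        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"param {name}: {value!r} is not an allowed value")

        resolved[name] = value
    return resolved
```

With the three params from the example above, `resolve_params(specs, {"api_key": "k"})` would fill in batch_size and include_funding from their defaults and fail only if api_key were missing.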
Implementation files
Each node directory contains an implementation file. The runtime determines which file to execute based on what exists:
| File | Language | Execution |
|---|---|---|
main.py | Python | Executed via Python runtime |
main.js | JavaScript | Executed via Node.js runtime |
main.sql | SQL | Executed via DuckDB |
run.sh | Shell | Executed via nix-shell + bubblewrap |
The runtime looks for these files in order: main.sql, main.py, main.js, run.sh. The first match wins.
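
The lookup order can be expressed as a short function. This is a sketch of the stated rule only. `find_implementation` is a hypothetical name, and returning `None` when no file exists is an assumption about the failure case.

```python
from pathlib import Path

# Lookup order as documented; the first existing file wins.
LOOKUP_ORDER = ("main.sql", "main.py", "main.js", "run.sh")


def find_implementation(node_dir):
    """Return the path of the first implementation file found, else None."""
    for filename in LOOKUP_ORDER:
        candidate = Path(node_dir) / filename
        if candidate.is_file():
            return candidate
    return None
```

A directory that contains both main.py and run.sh resolves to main.py. Adding main.sql to the same directory changes the result to main.sql.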
Validation rules
- id must be lowercase with hyphens only. Must match the node ID in flow.yaml.
- type must be one of: custom, deterministic, source, service.
- inputs is required (can be empty {} for source type only).
- outputs is required and must have at least one port.
- Port type must be one of: Value, Record, Table, Stream.
- Ports of type Record, Table, or Stream must have a schema.
- Schema field type must be one of: string, number, integer, boolean, null, array, object.
- Params marked required: true must be provided in flow.yaml.
- Params with enum reject values not in the list.
- sandbox.timeout must be a positive integer (milliseconds).
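
Most of these rules can be checked against a parsed node.yaml on its own. The sketch below assumes the file has already been loaded into a Python dict. `validate_spec` and `ID_PATTERN` are hypothetical names, and the regular expression is one reading of "lowercase with hyphens only" (whether digits are allowed is not stated). The rules that depend on flow.yaml (matching node ID, required params, enum values) are left out.

```python
import re

NODE_TYPES = {"custom", "deterministic", "source", "service"}
PORT_TYPES = {"Value", "Record", "Table", "Stream"}
SCHEMA_PORT_TYPES = {"Record", "Table", "Stream"}
FIELD_TYPES = {"string", "number", "integer", "boolean", "null", "array", "object"}
# Assumption: lowercase letters separated by single hyphens.
ID_PATTERN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def validate_spec(spec):
    """Return a list of rule violations for a parsed node.yaml mapping."""
    errors = []
    if not ID_PATTERN.fullmatch(str(spec.get("id", ""))):
        errors.append("id must be lowercase with hyphens only")

    node_type = spec.get("type")
    if node_type not in NODE_TYPES:
        errors.append("type must be one of: custom, deterministic, source, service")

    inputs = spec.get("inputs")
    if not isinstance(inputs, dict):
        errors.append("inputs is required")
    elif not inputs and node_type != "source":
        errors.append("inputs may be empty only for source nodes")

    outputs = spec.get("outputs")
    if not isinstance(outputs, dict) or not outputs:
        errors.append("outputs is required and must have at least one port")

    for direction in ("inputs", "outputs"):
        ports = spec.get(direction)
        for name, port in (ports if isinstance(ports, dict) else {}).items():
            where = f"{direction}.{name}"
            port_type = port.get("type")
            schema = port.get("schema")
            if port_type not in PORT_TYPES:
                errors.append(f"{where}: invalid port type")
            if port_type in SCHEMA_PORT_TYPES and not schema:
                errors.append(f"{where}: schema is required")
            for field, field_spec in (schema or {}).items():
                if field_spec.get("type") not in FIELD_TYPES:
                    errors.append(f"{where}.{field}: invalid field type")

    timeout = (spec.get("sandbox") or {}).get("timeout", 30000)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout <= 0:
        errors.append("sandbox.timeout must be a positive integer")
    return errors
```

Returning a list instead of raising on the first failure lets a caller report every violation in one pass.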
File layout
Custom nodes live in the nodes/ directory of the pipeline workspace:

```
nodes/
  enrich-leads/
    node.yaml                  # this file: the contract
    main.py                    # implementation (generated or hand-written)
    schemas/
      leads.schema.json        # input schema (auto-generated from spec)
      enriched.schema.json     # output schema (auto-generated from spec)
    artifacts/
      enriched.ndjson          # output data (written at execution time)
      errors.ndjson            # error output (written at execution time)
```

The runtime reads node.yaml, validates inputs against the declared schemas, executes the implementation file (main.py, main.sql, main.js, or run.sh), and validates outputs before passing them downstream.
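
The files under schemas/ are described as auto-generated from the spec, but their exact contents are not documented here. As one plausible illustration, a port spec could be turned into a JSON Schema document as sketched below. The field type names used in node.yaml coincide with JSON Schema's primitive type names, so they carry over directly. The function name `port_to_json_schema` and the mapping of Table and Stream to an array of objects are assumptions.

```python
def port_to_json_schema(port):
    """Build a JSON Schema dict for one port spec parsed from node.yaml.

    Assumed mapping: Record -> object; Table and Stream -> array of objects.
    Value ports carry no schema in node.yaml, so they are not handled here.
    """
    fields = port["schema"]
    row = {
        "type": "object",
        "properties": {},
        # `required` defaults to true when a field spec omits it.
        "required": [n for n, f in fields.items() if f.get("required", True)],
    }
    for name, field in fields.items():
        prop = {"type": field["type"]}
        if field.get("description"):
            prop["description"] = field["description"]
        row["properties"][name] = prop

    if port["type"] == "Record":
        return row
    if port["type"] in ("Table", "Stream"):
        return {"type": "array", "items": row}
    raise ValueError(f"no schema for port type {port['type']!r}")
```

Serializing the result with `json.dumps` from the standard library would give a file in the spirit of leads.schema.json, though the real generated format may differ.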
Complete example
A node that reads leads from a Table, enriches them via an external API, and outputs an enriched Table:

```yaml
id: enrich-leads
type: service
description: >
  Look up company data for each lead using the Clearbit API.
  Adds company name, industry, and employee count.

inputs:
  leads:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      score: { type: number, required: false }
    description: Raw leads with at least name and email.

outputs:
  enriched:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      score: { type: number, required: false }
      company: { type: string }
      industry: { type: string }
      employee_count: { type: integer }
    description: Leads enriched with Clearbit company data.

  errors:
    type: Table
    schema:
      email: { type: string }
      error: { type: string }
    description: Leads that failed enrichment lookup.

params:
  api_key:
    type: string
    required: true
    secret: true
    description: Clearbit API key.

  batch_size:
    type: integer
    required: false
    default: 50
    description: Records per API batch.

  timeout_ms:
    type: integer
    required: false
    default: 5000
    description: Per-request timeout in milliseconds.

sandbox:
  network: true
  timeout: 60000
```