Schemas
Schemas are the contract system. Every port declares a schema. The runtime validates data against schemas at execution boundaries. Mismatches are caught before data flows.
NDJSON format
Tables are stored as NDJSON: one JSON object per line, with no wrapping array and no trailing commas.

    {"email":"a@example.com","name":"Alice","score":92}
    {"email":"b@example.com","name":"Bob","score":45}
    {"email":"c@example.com","name":"Carol","score":78}

Rules:
- Each line is a complete, valid JSON object.
- Lines are separated by \n.
- No blank lines between records.
- Field order does not matter.
- All records in a file share the same schema.
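The rules above can be sketched as a small reader. This is a minimal illustration, not the runtime's actual parser: the function name parse_ndjson is hypothetical, and tolerating a single trailing newline after the last record is an assumption. It does not check the shared-schema rule, which needs the companion schema.

```python
import json


def parse_ndjson(text: str) -> list:
    """Parse NDJSON text into a list of dicts, rejecting blank lines
    and lines that are not JSON objects."""
    records = []
    if not text:
        return records
    # Assumption: one trailing newline after the last record is tolerated.
    body = text[:-1] if text.endswith("\n") else text
    for lineno, line in enumerate(body.split("\n"), start=1):
        if not line.strip():
            raise ValueError(f"line {lineno}: blank lines are not allowed")
        value = json.loads(line)  # raises json.JSONDecodeError on invalid JSON
        if not isinstance(value, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        records.append(value)
    return records


sample = (
    '{"email":"a@example.com","name":"Alice","score":92}\n'
    '{"email":"b@example.com","name":"Bob","score":45}\n'
    '{"email":"c@example.com","name":"Carol","score":78}\n'
)
rows = parse_ndjson(sample)
print(len(rows))  # 3
```

Because json.JSONDecodeError is a subclass of ValueError, callers of this sketch can catch ValueError for every rule violation.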
Companion .schema.json
Every NDJSON data file has a companion schema file. For output.ndjson, the schema is output.schema.json.

    {
      "email": { "type": "string", "required": true },
      "name": { "type": "string" },
      "score": { "type": "number" },
      "tier": { "type": "string", "enum": ["high", "medium", "low"] }
    }

The schema mirrors the FieldSpec structure:

    {
      "tags": { "type": "list", "items": { "type": "string" } },
      "address": {
        "type": "record",
        "schema": {
          "city": { "type": "string" },
          "zip": { "type": "string" }
        }
      }
    }

Field modifiers:
| Modifier | Type | Default | Effect |
|---|---|---|---|
| required | boolean | true | Missing field = validation error |
| nullable | boolean | false | null values allowed |
| default | any | (none) | Used when field is absent |
| enum | array | (none) | Restricts to listed values |
| description | string | (none) | Human-readable documentation |
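To make the modifiers concrete, here is a minimal sketch of checking one record against a schema. The function validate_record and the error strings are hypothetical, not the runtime's API. The defaults for required and nullable are taken from the table above, and skipping an absent field that declares a default is an assumption about how the two modifiers interact.

```python
# Type predicates for the field types that appear in this section.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "list": lambda v: isinstance(v, list),
    "record": lambda v: isinstance(v, dict),
}


def validate_record(record: dict, schema: dict) -> list:
    """Return a list of error strings; an empty list means the record is valid."""
    errors = []
    for name, spec in schema.items():
        if name not in record:
            if "default" in spec:
                continue  # assumption: the default stands in for the absent field
            if spec.get("required", True):  # default per the modifier table
                errors.append(f"{name}: missing required field")
            continue
        value = record[name]
        if value is None:
            if not spec.get("nullable", False):  # default per the modifier table
                errors.append(f"{name}: null not allowed")
            continue
        if not TYPE_CHECKS[spec["type"]](value):
            errors.append(f"{name}: expected {spec['type']}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: {value!r} not in enum")
    return errors


schema = {
    "email": {"type": "string", "required": True},
    "name": {"type": "string"},
    "score": {"type": "number"},
    "tier": {"type": "string", "enum": ["high", "medium", "low"]},
}
good = {"email": "a@example.com", "name": "Alice", "score": 92, "tier": "high"}
bad = {"name": "Bob", "score": "45", "tier": "vip"}
print(validate_record(good, schema))  # []
print(validate_record(bad, schema))
```

The sketch checks top-level fields only; nested list items and record schemas would need a recursive call.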
Metadata files
Each node output directory contains a _meta.json file with execution metadata:

    {
      "nodeId": "filter-active",
      "type": "data.filter",
      "startedAt": "2025-01-15T10:00:00.000Z",
      "completedAt": "2025-01-15T10:00:00.150Z",
      "durationMs": 150,
      "outputs": {
        "output": { "rowCount": 1247, "bytes": 89340 }
      }
    }

Schema propagation
Schemas flow through the graph via edges. The type checker resolves schemas at validation time.
Custom nodes: schemas come from node-spec.yaml. The spec explicitly declares every field on every port.
Data ops: output schemas are inferred from the operation config and the upstream input schema.
- data.filter: output schema = input schema (rows removed, fields unchanged).
- data.sort, data.limit, data.dedup: output schema = input schema.
- data.map: output schema = input schema + mapped fields.
- data.sql: output schema inferred from the SQL query columns.
- data.join: output schema = merged fields from left and right inputs.
- data.partition: both matching and not_matching = input schema.
file.source: for CSV, the schema is inferred from the first rows of the input file at parse time; for NDJSON, it is read from the companion schema file.
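The inference rules for the simpler data ops can be sketched as a lookup from operation type to output schemas. This is an illustration under stated assumptions: the function infer_output_schemas is hypothetical, the single output port is assumed to be named output, and the data.map config shape (a fields mapping of new field names to field specs) is invented for the example. data.sql and data.join are left out because their rules depend on query parsing and on how field-name collisions are resolved, which this page does not specify.

```python
from typing import Optional


def infer_output_schemas(op: str, input_schema: dict,
                         config: Optional[dict] = None) -> dict:
    """Return {port_name: schema} for a subset of the data ops."""
    config = config or {}
    if op in {"data.filter", "data.sort", "data.limit", "data.dedup"}:
        # Rows may change, fields do not.
        return {"output": dict(input_schema)}
    if op == "data.map":
        # Hypothetical config shape: {"fields": {name: field_spec}}.
        return {"output": {**input_schema, **config.get("fields", {})}}
    if op == "data.partition":
        # Both output ports carry the unchanged input schema.
        return {
            "matching": dict(input_schema),
            "not_matching": dict(input_schema),
        }
    raise NotImplementedError(f"no inference rule sketched for {op}")


base = {"email": {"type": "string"}, "score": {"type": "number"}}
mapped = infer_output_schemas(
    "data.map", base, {"fields": {"tier": {"type": "string"}}}
)
print(sorted(mapped["output"]))  # ['email', 'score', 'tier']
```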
Edge validation
When the type checker validates an edge, it compares the source port’s schema against the destination port’s expected schema.
Valid connection
The source output has all the fields the destination requires:

    # Source node outputs:
    output:
      type: table
      schema:
        email: { type: string }
        name: { type: string }
        score: { type: number }
        region: { type: string }  # extra field -- OK

    # Destination node expects:
    input:
      type: table
      schema:
        email: { type: string, required: true }
        score: { type: number }

Result: compatible. The name and region fields are passed through but ignored by the destination if not declared.
Invalid: missing required field

    # Source outputs:
    output:
      type: table
      schema:
        name: { type: string }

    # Destination expects:
    input:
      type: table
      schema:
        email: { type: string, required: true }  # not in source
        name: { type: string }

Result: error MISSING_REQUIRED. The source is missing the required field email.
Invalid: type mismatch

    # Source outputs:
    output:
      type: table
      schema:
        score: { type: string }  # string

    # Destination expects:
    input:
      type: table
      schema:
        score: { type: number }  # number

Result: error TYPE_MISMATCH. The field score is string in the source but number in the destination.
Warning: enum superset

    # Source outputs:
    output:
      type: table
      schema:
        tier: { type: string, enum: [high, medium, low, unknown] }

    # Destination expects:
    input:
      type: table
      schema:
        tier: { type: string, enum: [high, medium, low] }

Result: warning ENUM_SUPERSET. The source may produce unknown, which the destination does not expect. This is not a blocking error.
Error codes
| Code | Severity | Meaning |
|---|---|---|
| MISSING_REQUIRED | error | Destination requires a field the source lacks |
| TYPE_MISMATCH | error | Port type or field type incompatible |
| MISSING_PORT | error | Edge references a port that does not exist |
| UNKNOWN_NODE | error | Edge references a node not in the graph |
| ENUM_SUPERSET | warning | Source enum has values the destination lacks |
| EXTRA_FIELDS | warning | Source has fields the destination ignores |
| NULLABLE_MISMATCH | warning | Source nullable, destination not |
Errors block execution. Warnings are reported but do not prevent a run.
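The field-level checks in this section can be sketched as a single comparison function. This is an illustration, not the type checker's implementation: check_edge and blocks_execution are hypothetical names, findings are reported as (code, severity, field) tuples for convenience, and required is treated as defaulting to true per the field modifier table. MISSING_PORT and UNKNOWN_NODE are graph-level checks and are not covered.

```python
def check_edge(source: dict, dest: dict) -> list:
    """Compare a source port schema with a destination port schema.

    Returns a list of (code, severity, field_name) tuples.
    """
    findings = []
    for name, want in dest.items():
        have = source.get(name)
        if have is None:
            if want.get("required", True):  # default per the modifier table
                findings.append(("MISSING_REQUIRED", "error", name))
            continue
        if have["type"] != want["type"]:
            findings.append(("TYPE_MISMATCH", "error", name))
            continue
        if "enum" in have and "enum" in want:
            if set(have["enum"]) - set(want["enum"]):
                findings.append(("ENUM_SUPERSET", "warning", name))
        if have.get("nullable", False) and not want.get("nullable", False):
            findings.append(("NULLABLE_MISMATCH", "warning", name))
    for name in source:
        if name not in dest:
            findings.append(("EXTRA_FIELDS", "warning", name))
    return findings


def blocks_execution(findings: list) -> bool:
    """Errors block a run; warnings do not."""
    return any(severity == "error" for _, severity, _ in findings)


source = {"tier": {"type": "string", "enum": ["high", "medium", "low", "unknown"]}}
dest = {"tier": {"type": "string", "enum": ["high", "medium", "low"]}}
findings = check_edge(source, dest)
print(findings)  # [('ENUM_SUPERSET', 'warning', 'tier')]
print(blocks_execution(findings))  # False
```

Under these assumptions, the valid connection example yields only EXTRA_FIELDS warnings for name and region, so the run is not blocked.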