Schemas

Schemas are the contract system. Every port declares a schema. The runtime validates data against schemas at execution boundaries. Mismatches are caught before data flows.

Tables are stored as NDJSON — one JSON object per line. No wrapping array. No trailing comma.

{"email":"a@example.com","name":"Alice","score":92}
{"email":"b@example.com","name":"Bob","score":45}
{"email":"c@example.com","name":"Carol","score":78}

Rules:

  • Each line is a complete, valid JSON object.
  • Lines are separated by \n.
  • No blank lines between records.
  • Field order does not matter.
  • All records in a file share the same schema.
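The rules above can be enforced with a few lines of code. The following is a minimal sketch, not the runtime's actual parser; the function name `parse_ndjson` is hypothetical, and tolerating a single trailing newline after the last record is an assumption.

```python
import json

def parse_ndjson(text: str) -> list[dict]:
    """Parse NDJSON text, rejecting blank lines and non-object values."""
    if text == "":
        return []
    records = []
    # Assumption: one trailing "\n" after the last record is acceptable.
    lines = text.removesuffix("\n").split("\n")
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            raise ValueError(f"line {lineno}: blank lines are not allowed")
        value = json.loads(line)  # raises json.JSONDecodeError on invalid JSON
        if not isinstance(value, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        records.append(value)
    return records

# Field order differs between the two records, which the rules allow.
sample = (
    '{"email":"a@example.com","name":"Alice","score":92}\n'
    '{"score":45,"name":"Bob","email":"b@example.com"}\n'
)
rows = parse_ndjson(sample)
print(len(rows), rows[1]["name"])
```

The check that all records share one schema is left out here because it needs the schema file described next.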

Every NDJSON data file has a companion schema file. For output.ndjson, the schema is output.schema.json.

{
  "email": {
    "type": "string",
    "required": true
  },
  "name": {
    "type": "string"
  },
  "score": {
    "type": "number"
  },
  "tier": {
    "type": "string",
    "enum": ["high", "medium", "low"]
  }
}
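The naming convention (output.ndjson paired with output.schema.json) can be derived mechanically. A small sketch, assuming the schema file always sits next to the data file; `companion_schema_path` is a hypothetical helper, not part of the runtime.

```python
from pathlib import Path

def companion_schema_path(data_path: str) -> Path:
    """Map <name>.ndjson to <name>.schema.json in the same directory."""
    path = Path(data_path)
    if path.suffix != ".ndjson":
        raise ValueError(f"not an NDJSON file: {data_path}")
    return path.with_name(path.stem + ".schema.json")

print(companion_schema_path("output.ndjson"))
```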

The schema mirrors the FieldSpec structure, so list and record fields nest their own item or field specs:

{
  "tags": {
    "type": "list",
    "items": {
      "type": "string"
    }
  },
  "address": {
    "type": "record",
    "schema": {
      "city": { "type": "string" },
      "zip": { "type": "string" }
    }
  }
}

Field modifiers:

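| Modifier    | Type    | Default | Effect                           |
|-------------|---------|---------|----------------------------------|
| required    | boolean | true    | Missing field = validation error |
| nullable    | boolean | false   | null values allowed              |
| default     | any     | -       | Used when field is absent        |
| enum        | array   | -       | Restricts to listed values       |
| description | string  | -       | Human-readable documentation     |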
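To make the modifiers concrete, here is a rough sketch of how a single record could be checked against a schema. It is an illustration under stated assumptions, not the runtime's validator: `validate_record` and `type_ok` are hypothetical names, only the type names shown in this document (string, number, list, record) are handled, and the order of checks (default before required, null before type) is a guess. The demo schema sets `required` explicitly on every field so the outcome does not depend on the default.

```python
from typing import Any

REQUIRED_DEFAULT = True   # default for "required", as listed in the table
NULLABLE_DEFAULT = False  # default for "nullable", as listed in the table

def type_ok(value: Any, spec: dict) -> bool:
    """Check a non-null value against the type names used in this document."""
    kind = spec["type"]
    if kind == "string":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "list":
        return isinstance(value, list) and all(type_ok(v, spec["items"]) for v in value)
    if kind == "record":
        return isinstance(value, dict) and not validate_record(value, spec["schema"])[1]
    return False  # unknown type name

def validate_record(record: dict, schema: dict) -> tuple[dict, list[str]]:
    """Return (record with defaults applied, list of error messages)."""
    filled = dict(record)
    errors: list[str] = []
    for name, spec in schema.items():
        if name not in record:
            if "default" in spec:
                filled[name] = spec["default"]  # default fills an absent field
            elif spec.get("required", REQUIRED_DEFAULT):
                errors.append(f"{name}: missing required field")
            continue
        value = record[name]
        if value is None:
            if not spec.get("nullable", NULLABLE_DEFAULT):
                errors.append(f"{name}: null not allowed")
            continue
        if not type_ok(value, spec):
            errors.append(f"{name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: {value!r} not in enum")
    return filled, errors

schema = {
    "email": {"type": "string", "required": True},
    "score": {"type": "number", "required": False, "nullable": True},
    "tier": {"type": "string", "required": False, "default": "low",
             "enum": ["high", "medium", "low"]},
    "tags": {"type": "list", "items": {"type": "string"}, "required": False},
}
filled, errors = validate_record({"email": "a@example.com", "score": None}, schema)
print(filled["tier"], errors)
```

In this sketch the absent `tier` field is filled with its default, the null `score` passes because the field is nullable, and the optional `tags` field is simply skipped.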

Each node output directory contains a _meta.json with execution metadata:

{
  "nodeId": "filter-active",
  "type": "data.filter",
  "startedAt": "2025-01-15T10:00:00.000Z",
  "completedAt": "2025-01-15T10:00:00.150Z",
  "durationMs": 150,
  "outputs": {
    "output": {
      "rowCount": 1247,
      "bytes": 89340
    }
  }
}
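A consumer of _meta.json might summarise a run like this. The sketch assumes that durationMs is the difference between completedAt and startedAt, which holds for the example above but is not stated explicitly; `summarize` and `parse_ts` are hypothetical helpers.

```python
from datetime import datetime, timedelta

def parse_ts(ts: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def summarize(meta: dict) -> dict:
    """Cross-check the duration and total up rows and bytes over all outputs."""
    elapsed = parse_ts(meta["completedAt"]) - parse_ts(meta["startedAt"])
    elapsed_ms = elapsed // timedelta(milliseconds=1)  # whole milliseconds
    outputs = meta["outputs"].values()
    return {
        "nodeId": meta["nodeId"],
        "elapsedMs": elapsed_ms,
        "durationMatches": elapsed_ms == meta["durationMs"],
        "totalRows": sum(o["rowCount"] for o in outputs),
        "totalBytes": sum(o["bytes"] for o in outputs),
    }

meta = {
    "nodeId": "filter-active",
    "type": "data.filter",
    "startedAt": "2025-01-15T10:00:00.000Z",
    "completedAt": "2025-01-15T10:00:00.150Z",
    "durationMs": 150,
    "outputs": {"output": {"rowCount": 1247, "bytes": 89340}},
}
summary = summarize(meta)
print(summary)
```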

Schemas flow through the graph via edges. The type checker resolves schemas at validation time.

Custom nodes: schemas come from node-spec.yaml. The spec explicitly declares every field on every port.

Data ops: output schemas are inferred from the operation config and the upstream input schema.

  • data.filter: output schema = input schema (rows removed, fields unchanged).
  • data.sort, data.limit, data.dedup: output schema = input schema.
  • data.map: output schema = input schema + mapped fields.
  • data.sql: output schema inferred from the SQL query columns.
  • data.join: output schema = merged fields from left and right inputs.
  • data.partition: both matching and not_matching = input schema.
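The inference rules in this list can be sketched as a single function. This is an illustration only: the port names input, left, right and output, the `fields` config key for data.map, and the choice to let the left side win on a name clash in data.join are all assumptions. data.sql is left out because it needs a SQL parser.

```python
from typing import Optional

PASS_THROUGH = {"data.filter", "data.sort", "data.limit", "data.dedup"}

def infer_output_schemas(op: str, inputs: dict, config: Optional[dict] = None) -> dict:
    """Return {output_port: schema} for a data op, given {input_port: schema}."""
    config = config or {}
    if op in PASS_THROUGH:
        # Rows may be removed or reordered; fields are unchanged.
        return {"output": dict(inputs["input"])}
    if op == "data.map":
        # "fields" is a hypothetical config key: new field name -> field spec.
        return {"output": {**inputs["input"], **config.get("fields", {})}}
    if op == "data.join":
        merged = dict(inputs["left"])
        for name, spec in inputs["right"].items():
            merged.setdefault(name, spec)  # assumption: left wins on a clash
        return {"output": merged}
    if op == "data.partition":
        schema = inputs["input"]
        return {"matching": dict(schema), "not_matching": dict(schema)}
    raise NotImplementedError(f"no inference rule sketched for {op}")

users = {"email": {"type": "string"}, "score": {"type": "number"}}
mapped = infer_output_schemas(
    "data.map", {"input": users}, {"fields": {"tier": {"type": "string"}}}
)
print(sorted(mapped["output"]))
```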

file.source: for CSV, the schema is inferred from the first rows of the input file at parse time; for NDJSON, it is read from the companion schema file.

When the type checker validates an edge, it compares the source port’s schema against the destination port’s expected schema.

Source output has all fields the destination requires:

# Source node outputs:
output:
  type: table
  schema:
    email: { type: string }
    name: { type: string }
    score: { type: number }
    region: { type: string } # extra field -- OK
# Destination node expects:
input:
  type: table
  schema:
    email: { type: string, required: true }
    score: { type: number }

Result: compatible. The name and region fields are passed through but ignored by the destination if not declared.

Destination requires a field the source does not provide:

# Source outputs:
output:
  type: table
  schema:
    name: { type: string }
# Destination expects:
input:
  type: table
  schema:
    email: { type: string, required: true } # not in source
    name: { type: string }

Result: error MISSING_REQUIRED — source is missing required field email.

Source and destination declare different types for the same field:

# Source outputs:
output:
  type: table
  schema:
    score: { type: string } # string
# Destination expects:
input:
  type: table
  schema:
    score: { type: number } # number

Result: error TYPE_MISMATCH — field score is string in source but number in destination.

Source enum contains a value the destination enum does not list:

# Source outputs:
output:
  type: table
  schema:
    tier: { type: string, enum: [high, medium, low, unknown] }
# Destination expects:
input:
  type: table
  schema:
    tier: { type: string, enum: [high, medium, low] }

Result: warning ENUM_SUPERSET — source may produce unknown which destination does not expect. Not a blocking error.

| Code              | Severity | Meaning                                    |
|-------------------|----------|--------------------------------------------|
| MISSING_REQUIRED  | error    | Destination requires a field source lacks  |
| TYPE_MISMATCH     | error    | Port type or field type incompatible       |
| MISSING_PORT      | error    | Edge references a port that does not exist |
| UNKNOWN_NODE      | error    | Edge references a node not in the graph    |
| ENUM_SUPERSET     | warning  | Source enum has values destination lacks   |
| EXTRA_FIELDS      | warning  | Source has fields destination ignores      |
| NULLABLE_MISMATCH | warning  | Source nullable, destination not           |

Errors block execution. Warnings are reported but do not prevent a run.
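The field-level checks can be sketched as follows. This is a simplified illustration, not the type checker itself: `check_edge` is a hypothetical name, it compares field schemas only (port types, MISSING_PORT and UNKNOWN_NODE need the graph and are left out), and it treats a destination field as required only when `required: true` is written out, as in the examples above.

```python
def check_edge(source: dict, dest: dict) -> list[tuple[str, str, str]]:
    """Compare two field schemas; return (code, severity, field) findings."""
    findings = []
    for name, want in dest.items():
        have = source.get(name)
        if have is None:
            # Assumption: only an explicit required: true makes a field required.
            if want.get("required", False):
                findings.append(("MISSING_REQUIRED", "error", name))
            continue
        if have["type"] != want["type"]:
            findings.append(("TYPE_MISMATCH", "error", name))
            continue
        # Only flagged when the source itself declares an enum with extra values.
        if "enum" in want and set(have.get("enum", [])) - set(want["enum"]):
            findings.append(("ENUM_SUPERSET", "warning", name))
        if have.get("nullable", False) and not want.get("nullable", False):
            findings.append(("NULLABLE_MISMATCH", "warning", name))
    for name in source:
        if name not in dest:
            findings.append(("EXTRA_FIELDS", "warning", name))
    return findings

def blocks_execution(findings: list[tuple[str, str, str]]) -> bool:
    """Errors block a run; warnings do not."""
    return any(severity == "error" for _, severity, _ in findings)

source = {"tier": {"type": "string", "enum": ["high", "medium", "low", "unknown"]}}
dest = {"tier": {"type": "string", "enum": ["high", "medium", "low"]}}
result = check_edge(source, dest)
print(result, blocks_execution(result))
```

For the enum example this yields one ENUM_SUPERSET warning and does not block execution.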