
Schemas

Schemas define the shape of data at each port. Every port declares a schema. The runtime validates data against schemas at execution boundaries. Mismatches are caught before data flows.

Custom nodes declare schemas in their node-spec.yaml:

nodes/
  score-calculator/
    node-spec.yaml          # declares input/output schemas
    schemas/
      input.schema.json     # generated from spec
      output.schema.json    # generated from spec
    main.js                 # implementation

At execution time, data files and their companion schemas are written to the node’s output directory:

nodes/
  score-calculator/
    output/
      scored.ndjson         # output data
      scored.schema.json    # output schema

A schema is a JSON object where keys are field names and values describe the field type and constraints:

{
  "email": {
    "type": "string",
    "required": true
  },
  "score": {
    "type": "number"
  },
  "tier": {
    "type": "string",
    "enum": ["high", "medium", "low"]
  }
}
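To make these semantics concrete, here is a minimal sketch of the kind of check the runtime performs against such a schema. `validate_record` is a hypothetical helper written for illustration, not part of Radhflow's API, and it covers only type, required, and enum:

```python
def validate_record(record, schema):
    """Check one record against a field-keyed schema dict (illustrative sketch)."""
    type_map = {"string": str, "number": (int, float)}
    errors = []
    for field, spec in schema.items():
        if field not in record:
            if spec.get("required", True):  # required defaults to true
                errors.append(f"missing required field: {field}")
            continue
        value = record[field]
        expected = type_map.get(spec["type"])
        if expected is not None and not isinstance(value, expected):
            errors.append(f"{field}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field}: value not in enum")
    return errors

schema = {
    "email": {"type": "string", "required": True},
    "score": {"type": "number"},
    "tier": {"type": "string", "enum": ["high", "medium", "low"]},
}
```

Under this sketch, a record missing email fails, and so does one missing tier, since required defaults to true.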

In node-spec.yaml, the same schema is declared in YAML:

inputs:
  records:
    type: table
    schema:
      email:
        type: string
        required: true
      score:
        type: number
      tier:
        type: string
        enum: [high, medium, low]

Validation happens at two points:

Before execution (static). The type checker compares schemas across edges. It verifies that source port schemas satisfy destination port requirements — required fields exist, types match, enum constraints hold. This happens during rf validate and at the start of rf run.

At edge boundaries (runtime). When a node finishes executing, the runtime validates its output data against the declared output schema. When a node starts, its input data is validated against the declared input schema.

Validation errors block execution. Warnings (like enum supersets or extra fields) are logged but do not prevent a run.

Not all schemas need to be declared manually. Radhflow infers schemas in several cases:

file.source nodes. For CSV files, the schema is inferred from the header row and first N data rows. For NDJSON files, the companion .schema.json is used directly.
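Header-plus-sample inference for CSV can be sketched as follows. `infer_csv_schema` is a hypothetical name, and the real inference covers more types than just string and number:

```python
import csv
import io

def infer_csv_schema(text, sample_rows=100):
    """Infer a field -> type schema from a CSV header plus sample data rows (sketch)."""
    reader = csv.DictReader(io.StringIO(text))
    numeric = {name: True for name in reader.fieldnames}
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for name, value in row.items():
            try:
                float(value)  # any value that does not parse demotes the field
            except ValueError:
                numeric[name] = False
    return {name: {"type": "number" if numeric[name] else "string"}
            for name in numeric}
```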

Data operations. Output schemas are computed from the operation config and the upstream input schema:

Operation        Schema rule
data.filter      Output = input (rows removed, fields unchanged)
data.sort        Output = input (rows reordered)
data.limit       Output = input (rows capped)
data.dedup       Output = input (duplicates removed)
data.map         Output = input fields + mapped fields (or mapped only)
data.sql         Output inferred from SQL query columns
data.join        Output = merged fields from left and right inputs
data.partition   Both matching and not_matching = input schema
data.group       Output = group-by fields + aggregation result columns
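For instance, the data.join rule amounts to merging the two input field dicts. A sketch, assuming right-side fields win on a name collision (the actual collision policy is not specified here):

```python
def join_output_schema(left, right):
    """Compute a data.join output schema by merging left and right input schemas."""
    merged = dict(left)   # start from all left-input fields
    merged.update(right)  # add right-input fields (right wins on collision; assumption)
    return merged
```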

Custom nodes. Schemas must be declared explicitly in node-spec.yaml. No inference happens.

Feature         Syntax                         Description
Type            "type": "string"               Field type (see Data Types)
Required        "required": true               Field must be present (default: true)
Nullable        "nullable": true               Allows null values
Default         "default": 0                   Value used when field is absent
Enum            "enum": ["a", "b"]             Restricts to listed values
Description     "description": "User email"    Human-readable documentation
List items      "items": {"type": "string"}    Type of elements in a list field
Nested record   "schema": {"city": {...}}      Fields within a record field

Mark a field as optional with required: false; optional fields may be absent from records.

schema:
  email:
    type: string
    required: true
  phone:
    type: string
    required: false
  nickname:
    type: string
    required: false
    default: "Anonymous"

When phone is absent, it is simply missing from the output. When nickname is absent, the default value "Anonymous" is used.
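The absent-field behavior can be sketched as a small pre-processing step. `apply_defaults` is a hypothetical helper, not Radhflow's API:

```python
def apply_defaults(record, schema):
    """Fill defaults for absent fields that declare one; leave everything else alone."""
    out = dict(record)
    for field, spec in schema.items():
        if field not in out and "default" in spec:
            out[field] = spec["default"]
    return out

schema = {
    "email": {"type": "string", "required": True},
    "phone": {"type": "string", "required": False},
    "nickname": {"type": "string", "required": False, "default": "Anonymous"},
}
```

A record that omits both optional fields gains a nickname but still has no phone key.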

Use the list type with an items declaration:

schema:
  tags:
    type: list
    items:
      type: string
  scores:
    type: list
    items:
      type: number

Use the record type with a nested schema:

schema:
  address:
    type: record
    schema:
      street:
        type: string
      city:
        type: string
      zip:
        type: string
        required: true
      country:
        type: string
        enum: [US, DE, GB, FR]
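Validating a record field naturally recurses into the nested schema. A sketch covering just presence, enum, and nesting (`validate_nested` is a hypothetical name):

```python
def validate_nested(value, schema):
    """Recursively check a record value against a nested schema (sketch)."""
    errors = []
    for name, spec in schema.items():
        if name not in value:
            if spec.get("required", True):  # required defaults to true
                errors.append(f"missing required field: {name}")
            continue
        if "enum" in spec and value[name] not in spec["enum"]:
            errors.append(f"{name}: value not in enum")
        if spec.get("type") == "record":
            errors.extend(validate_nested(value[name], spec["schema"]))
    return errors

address_schema = {
    "street": {"type": "string"},
    "city": {"type": "string"},
    "zip": {"type": "string", "required": True},
    "country": {"type": "string", "enum": ["US", "DE", "GB", "FR"]},
}
```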

Allow null values with nullable: true:

schema:
  score:
    type: number
    nullable: true
  last_login:
    type: timestamp
    nullable: true

This is distinct from required: false. A required-but-nullable field must be present, but its value can be null.
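The two flags can be sketched as separate checks: required governs presence, nullable governs the value null. `check_field` is a hypothetical helper for illustration:

```python
def check_field(record, name, spec):
    """Return an error string, or None if the field passes (sketch)."""
    if name not in record:
        if spec.get("required", True):  # required defaults to true
            return f"{name}: required field is absent"
        return None
    if record[name] is None and not spec.get("nullable", False):
        return f"{name}: null not allowed"
    return None
```

Under this sketch, a required-but-nullable score must appear in every record, but a null value for it is accepted.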

When the type checker validates an edge, it compares the source port’s schema against the destination port’s expected schema.

Compatible. Source has all fields the destination requires:

# Source outputs: email, name, score, region
# Destination expects: email (required), score
# Result: compatible. name and region are passed through but ignored.

Missing required field:

# Source outputs: name
# Destination expects: email (required), name
# Result: error MISSING_REQUIRED — source is missing required field email.

Type mismatch:

# Source outputs: score (string)
# Destination expects: score (number)
# Result: error TYPE_MISMATCH — field score is string in source but number in destination.
Code                Severity   Meaning
MISSING_REQUIRED    error      Destination requires a field source lacks
TYPE_MISMATCH       error      Port type or field type incompatible
MISSING_PORT        error      Edge references a port that does not exist
UNKNOWN_NODE        error      Edge references a node not in the graph
ENUM_SUPERSET       warning    Source enum has values destination lacks
EXTRA_FIELDS        warning    Source has fields destination ignores
NULLABLE_MISMATCH   warning    Source nullable, destination not
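A toy version of the edge check, emitting a subset of the codes above. `check_edge` is hypothetical; the real checker also covers ports, nodes, enums, and nullability:

```python
def check_edge(source, dest):
    """Compare a source port schema against a destination port schema (sketch).

    Returns (code, field) findings; severity follows the table above.
    """
    findings = []
    for name, spec in dest.items():
        if name not in source:
            if spec.get("required", True):
                findings.append(("MISSING_REQUIRED", name))
        elif source[name]["type"] != spec["type"]:
            findings.append(("TYPE_MISMATCH", name))
    for name in source:
        if name not in dest:
            findings.append(("EXTRA_FIELDS", name))  # warning, not an error
    return findings
```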

Errors block execution. Warnings are reported but do not prevent a run.