
Schemas

Schemas define the shape of data at each port. Every port declares a schema. The runtime validates data against schemas at execution boundaries. Mismatches are caught before data flows.

Custom nodes declare schemas in their node-spec.yaml:

nodes/
  score-calculator/
    node-spec.yaml          # declares input/output schemas
    schemas/
      input.schema.json     # generated from spec
      output.schema.json    # generated from spec
    main.js                 # implementation

At execution time, data files and their companion schemas are written to the node’s output directory:

nodes/
  score-calculator/
    output/
      scored.ndjson         # output data
      scored.schema.json    # output schema

A schema is a JSON object where keys are field names and values describe the field type and constraints:

{
  "email": {
    "type": "string",
    "required": true
  },
  "score": {
    "type": "number"
  },
  "tier": {
    "type": "string",
    "enum": ["high", "medium", "low"]
  }
}
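To make these semantics concrete, here is a minimal sketch of the kind of check the runtime performs against such a schema. `validate_record` is a hypothetical helper written for illustration, not part of Radhflow's API, and it covers only type, required, and enum:

```python
def validate_record(record, schema):
    """Check one record against a field-keyed schema dict (illustrative sketch)."""
    type_map = {"string": str, "number": (int, float)}
    errors = []
    for field, spec in schema.items():
        if field not in record:
            if spec.get("required", True):  # required defaults to true
                errors.append(f"missing required field: {field}")
            continue
        value = record[field]
        expected = type_map.get(spec["type"])
        if expected is not None and not isinstance(value, expected):
            errors.append(f"{field}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field}: value not in enum")
    return errors

schema = {
    "email": {"type": "string", "required": True},
    "score": {"type": "number"},
    "tier": {"type": "string", "enum": ["high", "medium", "low"]},
}
```

Under this sketch, a record missing email fails, and so does one missing tier, since required defaults to true.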

In node-spec.yaml, the same schema is declared in YAML:

inputs:
  records:
    type: table
    schema:
      email:
        type: string
        required: true
      score:
        type: number
      tier:
        type: string
        enum: [high, medium, low]

Validation happens at two points:

Before execution (static). The type checker compares schemas across edges. It verifies that source port schemas satisfy destination port requirements — required fields exist, types match, enum constraints hold. This happens during rf validate and at the start of rf run.

At edge boundaries (runtime). When a node finishes executing, the runtime validates its output data against the declared output schema. When a node starts, its input data is validated against the declared input schema.

Validation errors block execution. Warnings (like enum supersets or extra fields) are logged but do not prevent a run.

Not all schemas need to be declared manually. Radhflow infers schemas in several cases:

file.source nodes. For CSV files, the schema is inferred from the header row and first N data rows. For NDJSON files, the companion .schema.json is used directly.
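Header-plus-sample inference for CSV can be sketched as follows. `infer_csv_schema` is a hypothetical name, and the real inference covers more types than just string and number:

```python
import csv
import io

def infer_csv_schema(text, sample_rows=100):
    """Infer a field -> type schema from a CSV header plus sample data rows (sketch)."""
    reader = csv.DictReader(io.StringIO(text))
    numeric = {name: True for name in reader.fieldnames}
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for name, value in row.items():
            try:
                float(value)  # any value that does not parse demotes the field
            except ValueError:
                numeric[name] = False
    return {name: {"type": "number" if numeric[name] else "string"}
            for name in numeric}
```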

Data operations. Output schemas are computed from the operation config and the upstream input schema:

Operation        Schema rule
data.filter      Output = input (rows removed, fields unchanged)
data.sort        Output = input (rows reordered)
data.limit       Output = input (rows capped)
data.dedup       Output = input (duplicates removed)
data.map         Output = input fields + mapped fields (or mapped only)
data.sql         Output inferred from SQL query columns
data.join        Output = merged fields from left and right inputs
data.partition   Both matching and not_matching = input schema
data.group       Output = group-by fields + aggregation result columns
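For instance, the data.join rule amounts to merging the two input field dicts. A sketch, assuming right-side fields win on a name collision (the actual collision policy is not specified here):

```python
def join_output_schema(left, right):
    """Compute a data.join output schema by merging left and right input schemas."""
    merged = dict(left)   # start from all left-input fields
    merged.update(right)  # add right-input fields (right wins on collision; assumption)
    return merged
```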

Custom nodes. Schemas must be declared explicitly in node-spec.yaml. No inference happens.

Feature         Syntax                         Description
Type            "type": "string"               Field type (see Data Types)
Required        "required": true               Field must be present (default: true)
Nullable        "nullable": true               Allows null values
Default         "default": 0                   Value used when field is absent
Enum            "enum": ["a", "b"]             Restricts to listed values
Description     "description": "User email"    Human-readable documentation
List items      "items": {"type": "string"}    Type of elements in a list field
Nested record   "schema": {"city": {...}}      Fields within a record field

Mark a field as optional with required: false; optional fields may be absent from records.

schema:
  email:
    type: string
    required: true
  phone:
    type: string
    required: false
  nickname:
    type: string
    required: false
    default: "Anonymous"

When phone is absent, it is simply missing from the output. When nickname is absent, the default value "Anonymous" is used.
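The absent-field behavior can be sketched as a small pre-processing step. `apply_defaults` is a hypothetical helper, not Radhflow's API:

```python
def apply_defaults(record, schema):
    """Fill defaults for absent fields that declare one; leave everything else alone."""
    out = dict(record)
    for field, spec in schema.items():
        if field not in out and "default" in spec:
            out[field] = spec["default"]
    return out

schema = {
    "email": {"type": "string", "required": True},
    "phone": {"type": "string", "required": False},
    "nickname": {"type": "string", "required": False, "default": "Anonymous"},
}
```

A record that omits both optional fields gains a nickname but still has no phone key.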

Use the list type with an items declaration:

schema:
  tags:
    type: list
    items:
      type: string
  scores:
    type: list
    items:
      type: number

Use the record type with a nested schema:

schema:
  address:
    type: record
    schema:
      street:
        type: string
      city:
        type: string
      zip:
        type: string
        required: true
      country:
        type: string
        enum: [US, DE, GB, FR]
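Validating a record field naturally recurses into the nested schema. A sketch covering just presence, enum, and nesting (`validate_nested` is a hypothetical name):

```python
def validate_nested(value, schema):
    """Recursively check a record value against a nested schema (sketch)."""
    errors = []
    for name, spec in schema.items():
        if name not in value:
            if spec.get("required", True):  # required defaults to true
                errors.append(f"missing required field: {name}")
            continue
        if "enum" in spec and value[name] not in spec["enum"]:
            errors.append(f"{name}: value not in enum")
        if spec.get("type") == "record":
            errors.extend(validate_nested(value[name], spec["schema"]))
    return errors

address_schema = {
    "street": {"type": "string"},
    "city": {"type": "string"},
    "zip": {"type": "string", "required": True},
    "country": {"type": "string", "enum": ["US", "DE", "GB", "FR"]},
}
```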

Allow null values with nullable: true:

schema:
  score:
    type: number
    nullable: true
  last_login:
    type: timestamp
    nullable: true

This is distinct from required: false. A required-but-nullable field must be present, but its value can be null.
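The two flags can be sketched as separate checks: required governs presence, nullable governs the value null. `check_field` is a hypothetical helper for illustration:

```python
def check_field(record, name, spec):
    """Return an error string, or None if the field passes (sketch)."""
    if name not in record:
        if spec.get("required", True):  # required defaults to true
            return f"{name}: required field is absent"
        return None
    if record[name] is None and not spec.get("nullable", False):
        return f"{name}: null not allowed"
    return None
```

Under this sketch, a required-but-nullable score must appear in every record, but a null value for it is accepted.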

When the type checker validates an edge, it compares the source port’s schema against the destination port’s expected schema.

Compatible. Source has all fields the destination requires:

# Source outputs: email, name, score, region
# Destination expects: email (required), score
# Result: compatible. name and region are passed through but ignored.

Missing required field:

# Source outputs: name
# Destination expects: email (required), name
# Result: error MISSING_REQUIRED — source is missing required field email.

Type mismatch:

# Source outputs: score (string)
# Destination expects: score (number)
# Result: error TYPE_MISMATCH — field score is string in source but number in destination.
Code                Severity   Meaning
MISSING_REQUIRED    error      Destination requires a field source lacks
TYPE_MISMATCH       error      Port type or field type incompatible
MISSING_PORT        error      Edge references a port that does not exist
UNKNOWN_NODE        error      Edge references a node not in the graph
ENUM_SUPERSET       warning    Source enum has values destination lacks
EXTRA_FIELDS        warning    Source has fields destination ignores
NULLABLE_MISMATCH   warning    Source nullable, destination not
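A toy version of the edge check, emitting a subset of the codes above. `check_edge` is hypothetical; the real checker also covers ports, nodes, enums, and nullability:

```python
def check_edge(source, dest):
    """Compare a source port schema against a destination port schema (sketch).

    Returns (code, field) findings; severity follows the table above.
    """
    findings = []
    for name, spec in dest.items():
        if name not in source:
            if spec.get("required", True):
                findings.append(("MISSING_REQUIRED", name))
        elif source[name]["type"] != spec["type"]:
            findings.append(("TYPE_MISMATCH", name))
    for name in source:
        if name not in dest:
            findings.append(("EXTRA_FIELDS", name))  # warning, not an error
    return findings
```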

Errors block execution. Warnings are reported but do not prevent a run.