Type System

Radhflow has four data types. Every port on every node uses one of them. Schemas are validated at construction time — before any code runs.

| Type | What it is | Example |
| --- | --- | --- |
| Value | A single scalar: string, number, boolean, null. | An API key. A threshold. A file path. |
| Record | A single JSON object with named fields. | One user profile. One config block. |
| Table | An ordered collection of records (rows). | A CSV import. A query result. A report. |
| Stream | An unbounded sequence of records. | A webhook feed. A log tail. A message queue. |

Value and Record are singular. Table and Stream are plural. Table is bounded (all rows in memory or on disk). Stream is unbounded (processed incrementally).

Tables are the most common type. Most pipelines read a Table, transform it through SQL or data ops, and write a Table.

Every Table and Record port has a schema. Schemas define fields and their types. They live as .schema.json files alongside the data.

{
  "fields": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "score": { "type": "number" },
    "active": { "type": "boolean" }
  }
}

Supported field types: string, number, integer, boolean, null, array, object. Nested objects and arrays use standard JSON Schema structure.
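To make the field-type rules concrete, here is a minimal Python sketch of record validation against the schema format above. `validate_record` and `JSON_TYPES` are illustrative names, not part of Radhflow's API.

```python
# Map JSON Schema type names to the Python types they accept.
JSON_TYPES = {
    "string": str,
    "number": (int, float),   # in JSON Schema, an integer is a valid number
    "integer": int,
    "boolean": bool,
    "null": type(None),
    "array": list,
    "object": dict,
}

def validate_record(record: dict, schema: dict) -> list:
    """Return a list of validation errors; an empty list means the record is valid."""
    errors = []
    for name, spec in schema["fields"].items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python; don't let True pass as integer/number.
        if isinstance(value, bool) and spec["type"] != "boolean":
            errors.append(f"{name}: expected {spec['type']}, got boolean")
        elif not isinstance(value, JSON_TYPES[spec["type"]]):
            errors.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
    return errors

schema = {"fields": {"name": {"type": "string"}, "score": {"type": "number"}}}
print(validate_record({"name": "Alice", "score": 0.9}, schema))  # []
print(validate_record({"name": "Alice"}, schema))                # ['missing field: score']
```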

When an edge connects an output port to an input port, the runtime checks schema compatibility. The rules are lenient by design — pipelines should not break because a source added a column.

| Scenario | Result |
| --- | --- |
| Extra fields in source | Allowed. Downstream sees a superset. |
| Missing required field | Error at validation time. |
| integer output to number input | Allowed. Integer is a subset of number. |
| number output to integer input | Error. Potential data loss. |
| Enum subset (source has fewer values) | Allowed. |
| Enum superset (source has extra values) | Error. Downstream cannot handle unknown values. |
| Field type mismatch (string to number) | Error at validation time. |

The principle: a producer can give more than the consumer expects, but never less.
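The rules in the table above can be sketched in a few lines of Python. `compatible`, `field_compatible`, and `WIDENS` are hypothetical names for illustration; Radhflow's actual checker may differ.

```python
# The only type widening the rules allow: an integer output may feed a number input.
WIDENS = {("integer", "number")}

def field_compatible(out_spec: dict, in_spec: dict) -> bool:
    if out_spec["type"] != in_spec["type"] and \
            (out_spec["type"], in_spec["type"]) not in WIDENS:
        return False
    # Enum rule: the producer's values must be a subset of what the consumer accepts.
    if "enum" in in_spec:
        return "enum" in out_spec and set(out_spec["enum"]) <= set(in_spec["enum"])
    return True

def compatible(out_schema: dict, in_schema: dict) -> list:
    """A producer can give more than the consumer expects, but never less."""
    errors = []
    for name, in_spec in in_schema["fields"].items():
        out_spec = out_schema["fields"].get(name)
        if out_spec is None:
            errors.append(f"missing required field: {name}")
        elif not field_compatible(out_spec, in_spec):
            errors.append(f"incompatible types for {name}: "
                          f"{out_spec['type']} -> {in_spec['type']}")
    return errors  # extra producer fields are simply ignored

out_s = {"fields": {"name": {"type": "string"},
                    "clicks": {"type": "integer"},
                    "extra": {"type": "string"}}}
in_s = {"fields": {"name": {"type": "string"}, "clicks": {"type": "number"}}}
print(compatible(out_s, in_s))  # [] — extra field and integer -> number both allowed
```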

Nodes declare schemas on their ports in node-spec.yaml:

id: score-leads
type: deterministic
inputs:
  leads:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      clicks: { type: integer }
      opens: { type: integer }
outputs:
  scored:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      score: { type: number }

When edges connect ports, the graph parser checks compatibility between the output schema of the source and the input schema of the target. This happens at construction time — before execution.

Tables flow between nodes as NDJSON — one JSON object per line. Each Table has a companion .schema.json file.

nodes/read-leads/artifacts/
  leads.ndjson         # data
  leads.schema.json    # schema

The NDJSON file:

{"name":"Alice","email":"alice@example.com","clicks":42,"opens":18}
{"name":"Bob","email":"bob@example.com","clicks":7,"opens":3}
{"name":"Carol","email":"carol@example.com","clicks":91,"opens":55}

NDJSON is human-readable, diffable in Git, streamable, and supported by every language. No proprietary format. No binary serialization.
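Because each record sits on its own line, any language can stream NDJSON with its standard JSON library. A Python round-trip sketch using only the standard library:

```python
import io
import json

rows = [
    {"name": "Alice", "email": "alice@example.com", "clicks": 42, "opens": 18},
    {"name": "Bob", "email": "bob@example.com", "clicks": 7, "opens": 3},
]

# Write: one compact JSON object per line (an in-memory buffer stands in for a file).
buf = io.StringIO()
for row in rows:
    buf.write(json.dumps(row, separators=(",", ":")) + "\n")

# Read: parse line by line — no need to hold the whole Table in memory.
buf.seek(0)
parsed = [json.loads(line) for line in buf if line.strip()]
assert parsed == rows
```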

SQL nodes execute against DuckDB. The runtime loads input Tables as DuckDB tables, runs the SQL query, and produces an output Table.

score:
  type: data.sql
  query: |
    SELECT *,
           (clicks * 0.3 + opens * 0.5) AS score
    FROM input
    ORDER BY score DESC

DuckDB reads NDJSON natively. No import step. The query runs in-process with columnar execution — fast even on large Tables.
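The scoring query above is plain SQL, so it can be tried outside the runtime. This sketch uses Python's built-in sqlite3 as a stand-in engine (DuckDB is Radhflow's actual engine and is not assumed to be installed here); the table name `input` matches the query in the node spec.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE input (name TEXT, email TEXT, clicks INTEGER, opens INTEGER)")
con.executemany("INSERT INTO input VALUES (?, ?, ?, ?)", [
    ("Alice", "alice@example.com", 42, 18),
    ("Bob", "bob@example.com", 7, 3),
    ("Carol", "carol@example.com", 91, 55),
])

# Same query as the score node: weight clicks and opens, highest score first.
rows = con.execute("""
    SELECT *, (clicks * 0.3 + opens * 0.5) AS score
    FROM input
    ORDER BY score DESC
""").fetchall()

for name, _email, _clicks, _opens, score in rows:
    print(name, round(score, 1))
```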

SQL nodes run inside the DuckDB sandbox. They have no filesystem access, no network access, no ability to execute external commands. The only thing a SQL node can do is query data.

node-spec.yaml              node-spec.yaml
┌──────────┐                ┌──────────┐
│ output:  │                │ input:   │
│ Table    │──────edge─────▶│ Table    │
│ schema A │                │ schema B │
└──────────┘                └──────────┘
      │                           │
      └───────┐           ┌───────┘
              ▼           ▼
           compatibility check
           (A must satisfy B)

If schema A is not compatible with schema B, the graph parser rejects the pipeline before any node executes. You see the error immediately — not after processing half your data.
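A minimal sketch of that construction-time check, walking every edge before any node executes. The node and edge shapes (`nodes`, `edges`, `check_edge`) are illustrative, not Radhflow's internal model; the missing `opens` field makes this pipeline invalid.

```python
nodes = {
    "read-leads": {"outputs": {"leads": {"fields": {
        "name": {"type": "string"}, "clicks": {"type": "integer"}}}}},
    "score-leads": {"inputs": {"leads": {"fields": {
        "name": {"type": "string"}, "clicks": {"type": "number"},
        "opens": {"type": "integer"}}}}},
}
edges = [("read-leads", "leads", "score-leads", "leads")]

WIDENS = {("integer", "number")}  # the only allowed type widening

def check_edge(out_schema: dict, in_schema: dict) -> list:
    """Every field the consumer requires must be produced with a compatible type."""
    errors = []
    for name, in_spec in in_schema["fields"].items():
        out_spec = out_schema["fields"].get(name)
        if out_spec is None:
            errors.append(f"missing required field: {name}")
        elif out_spec["type"] != in_spec["type"] and \
                (out_spec["type"], in_spec["type"]) not in WIDENS:
            errors.append(f"type mismatch on {name}")
    return errors

# Reject the whole pipeline at construction time if any edge fails.
for src, out_port, dst, in_port in edges:
    errs = check_edge(nodes[src]["outputs"][out_port], nodes[dst]["inputs"][in_port])
    if errs:
        print(f"rejected edge {src}.{out_port} -> {dst}.{in_port}: {errs}")
```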