
Type System

Four types. That’s it. Everything in Radhflow is a Value, Record, Table, or Stream.

Most data pipeline tools give you a dozen primitives, half of which overlap. Radhflow reduces to four because that’s all you actually need:

  • Simplicity. Four types fit in your head. You don’t need to look up whether you want a DataFrame, Dataset, or DataStream.
  • Composability. Every node port uses one of these four types. Any output port can connect to any input port of the same type with a compatible schema.
  • Serialization. All four types map directly to JSON. No custom binary formats, no language-specific serialization. NDJSON files are the interchange format.
┌────────────────────────────────────────────┐
│ Singular                                   │
│                                            │
│   Value    "hello"  42  true  null         │
│   Record   { name: "Alice", score: 91 }    │
│                                            │
├────────────────────────────────────────────┤
│ Plural                                     │
│                                            │
│   Table    [ {row1}, {row2}, ... {rowN} ]  │
│   Stream   {row1}↓ {row2}↓ {row3}↓ ...     │
│                                            │
└────────────────────────────────────────────┘

Value

A single scalar: string, number, boolean, or null.

"hello world"
42
true
null

Use Value for configuration inputs, thresholds, file paths, API keys, and single-result outputs. Value ports have no schema — the type is the scalar itself.

Record

A single JSON object with named, typed fields.

{ "name": "Alice", "email": "alice@example.com", "score": 91 }

Use Record for configuration blocks, single-row lookups, and API responses. Record ports require a schema defining their fields.

Table

An ordered collection of Records. Bounded: all rows exist on disk or in memory. Stored as NDJSON (one JSON object per line).

{"name":"Alice","email":"alice@example.com","score":91}
{"name":"Bob","email":"bob@example.com","score":47}
{"name":"Carol","email":"carol@example.com","score":83}

Tables are the most common type. Most pipelines read a Table, transform it through SQL or data ops, and write a Table. Table ports require a schema.

Stream

Like a Table, but rows arrive incrementally. Unbounded: you process rows as they arrive without loading everything into memory.

Use Stream for webhook feeds, log tails, message queues, and any source where you don’t know the total row count up front. Stream ports require a schema. The runtime handles backpressure — if a downstream node is slower than the producer, the runtime buffers or pauses the source.
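
One way to picture the incremental model: treat the stream as an iterator of Records and handle each one as soon as its line arrives. A minimal sketch in Python (the stream_records helper and the stdin source are illustrative, not part of Radhflow):

import json
import sys

def stream_records(lines):
    # Yield one Record (a dict) per NDJSON-framed line, as soon as it arrives.
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Consume an unbounded source without ever holding the full stream in memory.
for record in stream_records(sys.stdin):
    print(record)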

When an edge connects an output port to an input port, the runtime checks type and schema compatibility. The rules follow a principle: a producer can give more than the consumer expects, but never less.

Scenario                                  Result
Extra fields in source                    Allowed. Downstream sees a superset.
Missing required field                    Error at validation time.
Integer output to number input            Allowed. Integer is a subset of number.
Number output to integer input            Error. Potential data loss.
Enum subset (source has fewer values)     Allowed.
Enum superset (source has extra values)   Error. Downstream cannot handle unknown values.
Field type mismatch (string to number)    Error at validation time.
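
As a rough sketch of how those rules could be applied (not the actual Radhflow checker), the comparison walks the consumer's declared fields and tests each against the producer's: extra producer fields are ignored, integer narrows safely to number, and anything missing or mismatched is an error. Enum rules are omitted for brevity.

# Illustrative only: a simplified edge check following the table above.
def compatibility_errors(producer: dict, consumer: dict) -> list[str]:
    errors = []
    for name, want in consumer["fields"].items():
        have = producer["fields"].get(name)
        if have is None:
            errors.append(f"missing required field: {name}")
        elif have["type"] == want["type"]:
            continue  # exact match
        elif have["type"] == "integer" and want["type"] == "number":
            continue  # integer is a subset of number
        else:
            errors.append(f"{name}: {have['type']} does not satisfy {want['type']}")
    return errors  # extra producer fields never appear here: they are allowed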

Schema enforcement: compile-time vs runtime


Schema checking happens twice:

  1. At compile time (when you run rf validate or rf run): the graph parser reads every node.yaml, resolves edges, and checks that connected ports have compatible schemas. If schema A is not compatible with schema B, the pipeline is rejected before any node executes.

  2. At runtime (during execution): the executor validates actual data against declared schemas before passing it to the next node. This catches data-level issues that static analysis can’t — like a field that’s declared number but contains "N/A" in the source data.
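
For the runtime side, here is a minimal sketch of what a data-level check looks like, assuming the .schema.json layout shown later on this page (the validate_row name is illustrative, not Radhflow's API):

import json

# Which Python types a JSON value may have for each declared field type.
ALLOWED = {
    "string": (str,),
    "number": (int, float),
    "integer": (int,),
    "boolean": (bool,),
    "null": (type(None),),
}

def validate_row(row: dict, schema: dict) -> None:
    for name, spec in schema["fields"].items():
        value = row.get(name)
        ok = isinstance(value, ALLOWED[spec["type"]])
        if isinstance(value, bool) and spec["type"] != "boolean":
            ok = False  # bool is an int subclass in Python; keep it out of numbers
        if not ok:
            raise TypeError(f"field {name!r}: expected {spec['type']}, got {value!r}")

row = json.loads('{"name": "Bob", "score": "N/A"}')
schema = {"fields": {"name": {"type": "string"}, "score": {"type": "number"}}}
try:
    validate_row(row, schema)
except TypeError as exc:
    print(exc)  # field 'score': expected number, got 'N/A'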

 node.yaml                    node.yaml
┌──────────┐                  ┌──────────┐
│ output:  │                  │ input:   │
│  Table   │─────edge────────▶│  Table   │
│ schema A │                  │ schema B │
└──────────┘                  └──────────┘
     │                             │
     └─────────┐         ┌─────────┘
               ▼         ▼
           compatibility check
           (A must satisfy B)

If A does not satisfy B, the graph parser rejects the pipeline before any node executes: you see the error immediately, not after processing half your data.

Every Table and Record port has a schema. Schemas define fields and their types. They live as .schema.json files alongside the data.

{
  "fields": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "score": { "type": "number" },
    "active": { "type": "boolean" }
  }
}

Supported field types: string, number, integer, boolean, null, array, object. Nested objects and arrays use standard JSON Schema structure.

Nodes declare schemas on their ports in node.yaml:

id: score-leads
type: deterministic
inputs:
  leads:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      clicks: { type: integer }
      opens: { type: integer }
outputs:
  scored:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      score: { type: number }

Tables flow between nodes as NDJSON — Newline Delimited JSON. One JSON object per line. Each Table has a companion .schema.json file.

nodes/read-leads/artifacts/
  leads.ndjson         # data
  leads.schema.json    # schema

The NDJSON file:

{"name":"Alice","email":"alice@example.com","clicks":42,"opens":18}
{"name":"Bob","email":"bob@example.com","clicks":7,"opens":3}
{"name":"Carol","email":"carol@example.com","clicks":91,"opens":55}

Why NDJSON and not CSV or JSON arrays?

  • Not CSV because CSV has no type information, ambiguous quoting, no nested objects, and encoding chaos.
  • Not JSON arrays because a JSON array requires loading the entire file into memory to parse. NDJSON is streamable — you can process one line at a time.
  • NDJSON is human-readable, diffable in Git, streamable, and supported by every language. DuckDB reads it natively with no import step.
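
The streamability point is easy to demonstrate: with NDJSON each line is a complete row, so memory use stays constant no matter how large the file is. A short sketch (file names are illustrative):

import json

# NDJSON: one complete JSON object per line, usable as soon as it is read.
with open("leads.ndjson") as f:
    for line in f:
        row = json.loads(line)
        print(row["email"])

# A JSON array, by contrast, must be parsed in full before any row is available:
# rows = json.load(open("leads.json"))  # entire file in memory at once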

Every NDJSON file has a companion .schema.json that lives in the same directory. The schema file describes the fields and types:

{
  "fields": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "clicks": { "type": "integer" },
    "opens": { "type": "integer" }
  }
}

Schema files are auto-generated from node.yaml port declarations. You don’t write them by hand. The executor creates them when a node produces output, and validates against them when a downstream node consumes input.
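
As a sketch of that derivation (not the actual generator), the fields block of a .schema.json follows directly from a port's schema declaration in node.yaml:

import json
import yaml  # PyYAML

# Illustrative only: derive the schema file for the 'scored' output port.
node = yaml.safe_load(open("node.yaml"))
port_schema = node["outputs"]["scored"]["schema"]

schema = {"fields": {name: dict(spec) for name, spec in port_schema.items()}}

with open("scored.schema.json", "w") as f:
    json.dump(schema, f, indent=2)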

SQL nodes execute against DuckDB. The runtime loads input Tables as DuckDB tables, runs the SQL query, and produces an output Table.

score:
  type: data.sql
  query: |
    SELECT *,
           (clicks * 0.3 + opens * 0.5) AS score
    FROM input
    ORDER BY score DESC

DuckDB reads NDJSON natively. No import step. The query runs in-process with columnar execution — fast even on large Tables.
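
A rough sketch of that loop in Python, assuming the leads.ndjson artifact from earlier (read_ndjson_auto is DuckDB's reader for newline-delimited JSON; the output path is illustrative):

import json
import duckdb

# Load the input Table straight from NDJSON and run the node's query in-process.
rel = duckdb.sql("""
    SELECT *,
           (clicks * 0.3 + opens * 0.5) AS score
    FROM read_ndjson_auto('leads.ndjson')
    ORDER BY score DESC
""")

# Write the output Table back out as NDJSON, one JSON object per line.
cols = rel.columns
with open("scored.ndjson", "w") as out:
    for values in rel.fetchall():
        out.write(json.dumps(dict(zip(cols, values))) + "\n")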