# Type System
Four types. That’s it. Everything in Radhflow is a Value, Record, Table, or Stream.
## Why four types

Most data pipeline tools give you a dozen primitives, half of which overlap. Radhflow reduces to four because that’s all you actually need:
- Simplicity. Four types fit in your head. You don’t need to look up whether you want a DataFrame, Dataset, or DataStream.
- Composability. Every node port uses one of these four types. Any output port can connect to any input port of the same type with a compatible schema.
- Serialization. All four types map directly to JSON. No custom binary formats, no language-specific serialization. NDJSON files are the interchange format.
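That serialization story is easy to see in miniature. A minimal Python sketch (the variable names below are illustrative, not part of any Radhflow API):

```python
import json

# Illustrative sketch; the names below are not part of Radhflow's API.
value = 42                                      # Value: a bare scalar
record = {"name": "Alice", "score": 91}         # Record: one JSON object
table = [record, {"name": "Bob", "score": 47}]  # Table: a list of Records

# All of them serialize with plain json.dumps, no custom formats.
assert all(json.dumps(x) for x in (value, record, table))

# On disk, a Table is NDJSON: one JSON object per line.
ndjson = "\n".join(json.dumps(row) for row in table)
print(ndjson)
```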
## The four types

```text
┌───────────────────────────────────────────┐
│ Singular                                  │
│                                           │
│  Value    "hello"  42  true  null         │
│  Record   { name: "Alice", score: 91 }    │
│                                           │
├───────────────────────────────────────────┤
│ Plural                                    │
│                                           │
│  Table    [ {row1}, {row2}, ... {rowN} ]  │
│  Stream   {row1}↓ {row2}↓ {row3}↓ ...     │
│                                           │
└───────────────────────────────────────────┘
```

### Value

A single scalar: string, number, boolean, or null.

```json
"hello world"
42
true
null
```

Use Value for configuration inputs, thresholds, file paths, API keys, and single-result outputs. Value ports have no schema — the type is the scalar itself.
### Record

A single JSON object with named, typed fields.

```json
{
  "name": "Alice",
  "email": "alice@example.com",
  "score": 91
}
```

Use Record for configuration blocks, single-row lookups, and API responses. Record ports require a schema defining their fields.
### Table

An ordered collection of Records. Bounded — all rows exist on disk or in memory. Stored as NDJSON (one JSON object per line).

```ndjson
{"name":"Alice","email":"alice@example.com","score":91}
{"name":"Bob","email":"bob@example.com","score":47}
{"name":"Carol","email":"carol@example.com","score":83}
```

Tables are the most common type. Most pipelines read a Table, transform it through SQL or data ops, and write a Table. Table ports require a schema.
### Stream

Like Table, but rows arrive incrementally. Unbounded — you process rows as they arrive without loading everything into memory.

Use Stream for webhook feeds, log tails, message queues, and any source where you don’t know the total row count up front. Stream ports require a schema. The runtime handles backpressure — if a downstream node is slower than the producer, the runtime buffers or pauses the source.
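The buffer-or-pause behavior can be sketched with a bounded queue. This is only an illustration of the idea, not Radhflow's runtime:

```python
import json
import queue
import threading

# Sketch of backpressure: a bounded queue makes the producer block
# (i.e. pause) whenever the consumer falls behind. Radhflow's runtime
# does something like this internally; this code is illustrative only.
buf = queue.Queue(maxsize=100)
DONE = object()  # sentinel marking end of stream

def produce(lines):
    for line in lines:
        buf.put(json.loads(line))  # blocks if the buffer is full
    buf.put(DONE)

def consume():
    out = []
    while (row := buf.get()) is not DONE:
        out.append(row)
    return out

lines = ['{"event":"click","n":1}', '{"event":"open","n":2}']
t = threading.Thread(target=produce, args=(lines,))
t.start()
rows = consume()
t.join()
print(rows)
```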
## Subtyping rules

When an edge connects an output port to an input port, the runtime checks type and schema compatibility. The rules follow a principle: a producer can give more than the consumer expects, but never less.
| Scenario | Result |
|---|---|
| Extra fields in source | Allowed. Downstream sees a superset. |
| Missing required field | Error at validation time. |
| `integer` output to `number` input | Allowed. Integer is a subset of number. |
| `number` output to `integer` input | Error. Potential data loss. |
| Enum subset (source has fewer values) | Allowed. |
| Enum superset (source has extra values) | Error. Downstream cannot handle unknown values. |
| Field type mismatch (string to number) | Error at validation time. |
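These rules are mechanical enough to sketch in code. The checker below is a hypothetical illustration of the table above, not Radhflow's actual validator; the schema dicts mirror the `.schema.json` shape:

```python
# Hypothetical sketch of the compatibility rules; illustrative only.
def field_compatible(src: dict, dst: dict) -> bool:
    # integer output may feed a number input, but not the reverse
    if src["type"] != dst["type"]:
        return src["type"] == "integer" and dst["type"] == "number"
    # the source enum must be a subset of what the consumer accepts
    if "enum" in dst:
        return "enum" in src and set(src["enum"]) <= set(dst["enum"])
    return True

def schema_compatible(src: dict, dst: dict) -> bool:
    """Producer may offer MORE fields than the consumer needs, never fewer."""
    for name, spec in dst["fields"].items():
        if name not in src["fields"]:
            return False  # missing required field
        if not field_compatible(src["fields"][name], spec):
            return False
    return True  # extra source fields are simply ignored downstream

a = {"fields": {"name": {"type": "string"}, "clicks": {"type": "integer"}}}
b = {"fields": {"clicks": {"type": "number"}}}
print(schema_compatible(a, b))  # integer widens to number; extra field ok
```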
## Schema enforcement: compile-time vs runtime

Schema checking happens twice:

1. At compile time (when you run `rf validate` or `rf run`): the graph parser reads every `node.yaml`, resolves edges, and checks that connected ports have compatible schemas. If schema A is not compatible with schema B, the pipeline is rejected before any node executes.
2. At runtime (during execution): the executor validates actual data against declared schemas before passing it to the next node. This catches data-level issues that static analysis can’t — like a field that’s declared `number` but contains `"N/A"` in the source data.
```text
 node.yaml                     node.yaml
┌──────────┐                  ┌──────────┐
│ output:  │                  │ input:   │
│  Table   │──────edge───────▶│  Table   │
│  schema A│                  │  schema B│
└──────────┘                  └──────────┘
      │                            │
      └─────────┐       ┌──────────┘
                ▼       ▼
         compatibility check
         (A must satisfy B)
```

If schema A is not compatible with schema B, you see the error immediately — not after processing half your data.
## Schemas

Every Table and Record port has a schema. Schemas define fields and their types. They live as `.schema.json` files alongside the data.

```json
{
  "fields": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "score": { "type": "number" },
    "active": { "type": "boolean" }
  }
}
```

Supported field types: `string`, `number`, `integer`, `boolean`, `null`, `array`, `object`. Nested objects and arrays use standard JSON Schema structure.
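For example, a schema with an array field and a nested object might look like this (a sketch assuming the standard JSON Schema keywords `items` and `properties`, which this page does not spell out):

```json
{
  "fields": {
    "name": { "type": "string" },
    "tags": { "type": "array", "items": { "type": "string" } },
    "address": {
      "type": "object",
      "properties": {
        "city": { "type": "string" },
        "zip": { "type": "string" }
      }
    }
  }
}
```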
Nodes declare schemas on their ports in `node.yaml`:

```yaml
id: score-leads
type: deterministic
inputs:
  leads:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      clicks: { type: integer }
      opens: { type: integer }
outputs:
  scored:
    type: Table
    schema:
      name: { type: string }
      email: { type: string }
      score: { type: number }
```

## NDJSON format deep dive

Tables flow between nodes as NDJSON — Newline Delimited JSON. One JSON object per line. Each Table has a companion `.schema.json` file.
```text
nodes/read-leads/artifacts/
  leads.ndjson         # data
  leads.schema.json    # schema
```

The NDJSON file:

```ndjson
{"name":"Alice","email":"alice@example.com","clicks":42,"opens":18}
{"name":"Bob","email":"bob@example.com","clicks":7,"opens":3}
{"name":"Carol","email":"carol@example.com","clicks":91,"opens":55}
```

Why NDJSON and not CSV or JSON arrays?
- Not CSV because CSV has no type information, ambiguous quoting, no nested objects, and encoding chaos.
- Not JSON arrays because a JSON array requires loading the entire file into memory to parse. NDJSON is streamable — you can process one line at a time.
- NDJSON is human-readable, diffable in Git, streamable, and supported by every language. DuckDB reads it natively with no import step.
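The streaming point is the practical win. A short Python sketch of line-at-a-time parsing, using `io.StringIO` to stand in for an open file handle:

```python
import io
import json

# NDJSON parses one line at a time; memory stays flat regardless of
# row count. A JSON array would force parsing the whole file at once.
f = io.StringIO(
    '{"name":"Alice","clicks":42}\n'
    '{"name":"Bob","clicks":7}\n'
)

def rows(handle):
    for line in handle:
        if line.strip():          # tolerate a trailing newline
            yield json.loads(line)

total = sum(r["clicks"] for r in rows(f))
print(total)  # 49
```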
## `.schema.json` companion files

Every NDJSON file has a companion `.schema.json` that lives in the same directory. The schema file describes the fields and types:

```json
{
  "fields": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "clicks": { "type": "integer" },
    "opens": { "type": "integer" }
  }
}
```

Schema files are auto-generated from `node.yaml` port declarations. You don’t write them by hand. The executor creates them when a node produces output, and validates against them when a downstream node consumes input.
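As a rough illustration of what the consume-side check does (hypothetical code, not the executor's implementation), a row validator over a `.schema.json` might look like:

```python
import json

# Illustrative sketch of runtime validation; not Radhflow's validator.
# Note: bool is an int subclass in Python, so a real validator would
# special-case booleans before the number/integer checks.
JSON_TYPES = {
    "string": str, "boolean": bool, "null": type(None),
    "number": (int, float), "integer": int,
}

def validate_row(row: dict, schema: dict) -> list[str]:
    errors = []
    for name, spec in schema["fields"].items():
        if name not in row:
            errors.append(f"missing field: {name}")
        elif not isinstance(row[name], JSON_TYPES[spec["type"]]):
            errors.append(f"{name}: expected {spec['type']}, got {row[name]!r}")
    return errors

schema = json.loads('{"fields": {"name": {"type": "string"}, "score": {"type": "number"}}}')
# The "N/A"-in-a-number-field case from the enforcement section:
print(validate_row({"name": "Alice", "score": "N/A"}, schema))
```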
## DuckDB for query execution

SQL nodes execute against DuckDB. The runtime loads input Tables as DuckDB tables, runs the SQL query, and produces an output Table.

```yaml
score:
  type: data.sql
  query: |
    SELECT
      *,
      (clicks * 0.3 + opens * 0.5) AS score
    FROM input
    ORDER BY score DESC
```

DuckDB reads NDJSON natively. No import step. The query runs in-process with columnar execution — fast even on large Tables.
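To make the arithmetic concrete, here is the same scoring query re-done in plain Python over the leads rows. DuckDB executes it in-process with columnar speed; this sketch is only for illustration:

```python
import json

# Plain-Python equivalent of:
#   SELECT *, (clicks * 0.3 + opens * 0.5) AS score
#   FROM input ORDER BY score DESC
ndjson = (
    '{"name":"Alice","clicks":42,"opens":18}\n'
    '{"name":"Bob","clicks":7,"opens":3}\n'
)
rows = [json.loads(line) for line in ndjson.splitlines()]
scored = sorted(
    ({**r, "score": r["clicks"] * 0.3 + r["opens"] * 0.5} for r in rows),
    key=lambda r: r["score"],
    reverse=True,  # ORDER BY score DESC
)
print(scored[0]["name"])  # Alice
```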