Key Concepts

A pipeline is a directed acyclic graph (DAG) of nodes. Data flows from sources through transforms to outputs. The entire pipeline is defined in gain.yaml and versioned in Git.

```yaml
# gain.yaml — a pipeline with three nodes
nodes:
  fetch-data:
    type: source
    # ...
  transform:
    type: deterministic
    # ...
  export:
    type: deterministic
    # ...
```

Pipelines execute deterministically. No LLM runs at execution time.
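
Because the pipeline is a DAG, the executor can derive a valid run order from the dependency edges alone. A minimal sketch of that idea, using Python's standard-library `graphlib` (the node names match the example above; the `deps` mapping is illustrative, not Radhflow's internal representation):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: node id -> set of node ids it reads from.
deps = {
    "fetch-data": set(),
    "transform": {"fetch-data"},
    "export": {"transform"},
}

# static_order() yields nodes so that every dependency runs before its dependents.
order = list(TopologicalSorter(deps).static_order())
print(order)  # fetch-data first, export last
```

`TopologicalSorter` also raises `CycleError` on a cyclic graph, which is the same guarantee a DAG-based pipeline needs before execution starts.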

A node is a unit of work. It has typed input ports, typed output ports, and an operation that transforms inputs into outputs. Every node has a human-readable slug as its ID.

```yaml
score-leads:
  type: deterministic
  op: sql.query
  params:
    query: "SELECT *, score * weight AS final FROM leads"
  inputs:
    leads: { type: Table, from: ref(read-csv.leads) }
  outputs:
    scored: { type: Table }
```

Node types include source (data ingestion), deterministic (transforms), llm (AI-generated at creation time), conditional (branching), and service (external APIs).

Ports are the typed connection points on a node. Each input port declares the data type it accepts. Each output port declares the data type it produces. Types are checked before execution.

```yaml
inputs:
  orders: { type: Table, from: ref(fetch-orders.orders) }
  threshold: { type: Value, from: ref(config.min-amount) }
outputs:
  filtered: { type: Table }
```

A port mismatch — connecting a Value output to a Table input — is caught before any code runs.
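
The pre-execution check amounts to comparing each input port's declared type against the declared type of the output port it references. A small sketch of that comparison (the `declared_outputs` table and `check_edge` helper are hypothetical names for illustration):

```python
# Hypothetical registry: (node id, output port) -> declared type.
declared_outputs = {
    ("fetch-orders", "orders"): "Table",
    ("config", "min-amount"): "Value",
}

def check_edge(src_node, src_port, expected_type):
    """Fail fast if the referenced output's type differs from the input's type."""
    actual = declared_outputs[(src_node, src_port)]
    if actual != expected_type:
        raise TypeError(f"{src_node}.{src_port} is {actual}, expected {expected_type}")

check_edge("fetch-orders", "orders", "Table")   # passes
# check_edge("config", "min-amount", "Table")   # would raise before any node runs
```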

Edges are the connections between ports. They are declared implicitly through `from` references on input ports. An edge carries data from one node's output to another node's input.

```yaml
# This input declaration creates an edge:
# clean-data.customers → enrich.customers
enrich:
  inputs:
    customers: { type: Table, from: ref(clean-data.customers) }
```

Edges enforce type contracts. The output type of the source port must match the input type of the target port.

Radhflow has four primitive data types. Every port uses one of them.

| Type | Description | Example |
| --- | --- | --- |
| Value | A single scalar — string, number, boolean. | An API key, a threshold, a file path. |
| Record | A single JSON object with named fields. | One user profile, one config block. |
| Table | An ordered collection of records (rows). | A CSV import, a query result, a report. |
| Stream | An unbounded sequence of records. | A webhook feed, a log tail, a queue. |

Tables are the most common type. They flow between nodes as NDJSON.
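
A round trip makes the format concrete: a Table serializes to NDJSON by writing one JSON object per line, and parsing those lines back recovers the same rows. A minimal sketch (the row data is illustrative):

```python
import json

rows = [{"name": "Alice", "score": 92}, {"name": "Bob", "score": 45}]

# Serialize a Table as NDJSON: one JSON object per line.
ndjson = "\n".join(json.dumps(r) for r in rows)

# Parsing line by line recovers the original rows.
parsed = [json.loads(line) for line in ndjson.splitlines()]
assert parsed == rows
```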

Every Table and Record port has a schema. Schemas define the fields and their types. They are stored as .schema.json files alongside NDJSON data.

```
nodes/read-leads/
  schemas/
    leads.schema.json    # field definitions
  artifacts/
    leads.ndjson         # the data
```

A schema file:

```json
{
  "fields": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "score": { "type": "number" }
  }
}
```
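
Checking a record against this schema is straightforward: map each schema type to the corresponding JSON value type and verify every declared field. A sketch under that assumption (the `validate` helper and the type mapping are illustrative, not Radhflow's validator):

```python
# Map schema type names to the Python types json.loads produces.
py_types = {"string": str, "number": (int, float), "boolean": bool}

schema = {
    "fields": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "score": {"type": "number"},
    }
}

def validate(record, schema):
    """Return True if every declared field is present with the declared type."""
    for field, spec in schema["fields"].items():
        if not isinstance(record.get(field), py_types[spec["type"]]):
            return False
    return True

print(validate({"name": "Alice", "email": "alice@example.com", "score": 92}, schema))  # True
```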

NDJSON is one JSON object per line. Human-readable. Diffable in Git. Universal across languages.

```
{"name":"Alice","email":"alice@example.com","score":92}
{"name":"Bob","email":"bob@example.com","score":45}
```

`ref()` is how you wire nodes together. It references an output port on another node using the pattern `ref(node-id.port-name)`.

```yaml
transform:
  inputs:
    raw: { type: Table, from: ref(fetch-data.rows) }
    config: { type: Record, from: ref(load-config.settings) }
```

`ref(fetch-data.rows)` means: take the `rows` output from the `fetch-data` node and feed it into this input. The graph executor resolves these references, validates types, and determines execution order automatically.
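
Resolving a reference starts with splitting the expression into its node id and port name. A sketch of that parsing step (the regex and `parse_ref` helper are illustrative; Radhflow's actual grammar may differ, e.g. around allowed characters):

```python
import re

# Assumes ids and port names use word characters and hyphens.
REF = re.compile(r"ref\(([\w-]+)\.([\w-]+)\)")

def parse_ref(expr):
    """Split 'ref(node-id.port-name)' into ('node-id', 'port-name')."""
    m = REF.fullmatch(expr)
    if not m:
        raise ValueError(f"not a ref expression: {expr}")
    return m.group(1), m.group(2)

print(parse_ref("ref(fetch-data.rows)"))  # ('fetch-data', 'rows')
```

From these pairs the executor can build the edge list, run the type checks, and topologically order the nodes.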