Key Concepts

A pipeline is a directed acyclic graph (DAG) of nodes. Data flows from sources through transforms to outputs. The entire pipeline is defined in gain.yaml and versioned in Git.

```yaml
# gain.yaml — a pipeline with three nodes
nodes:
  fetch-data:
    type: source
    # ...
  transform:
    type: deterministic
    # ...
  export:
    type: deterministic
    # ...
```

Pipelines execute deterministically. No LLM runs at execution time.
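
Because the pipeline is a DAG, the executor can derive a valid run order from the dependency edges alone. A minimal sketch of that idea, using Python's standard-library `graphlib` (the node names match the example above; the `deps` mapping is illustrative, not Radhflow's internal representation):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: node id -> set of node ids it reads from.
deps = {
    "fetch-data": set(),
    "transform": {"fetch-data"},
    "export": {"transform"},
}

# static_order() yields nodes so that every dependency runs before its dependents.
order = list(TopologicalSorter(deps).static_order())
print(order)  # fetch-data first, export last
```

`TopologicalSorter` also raises `CycleError` on a cyclic graph, which is the same guarantee a DAG-based pipeline needs before execution starts.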

A node is a unit of work. It has typed input ports, typed output ports, and an operation that transforms inputs into outputs. Every node has a human-readable slug as its ID.

```yaml
score-leads:
  type: deterministic
  op: sql.query
  params:
    query: "SELECT *, score * weight AS final FROM leads"
  inputs:
    leads: { type: Table, from: ref(read-csv.leads) }
  outputs:
    scored: { type: Table }
```

Node types include source (data ingestion), deterministic (transforms), llm (AI-generated at creation time), conditional (branching), and service (external APIs).

Ports are the typed connection points on a node. Each input port declares the data type it accepts. Each output port declares the data type it produces. Types are checked before execution.

```yaml
inputs:
  orders: { type: Table, from: ref(fetch-orders.orders) }
  threshold: { type: Value, from: ref(config.min-amount) }
outputs:
  filtered: { type: Table }
```

A port mismatch — connecting a Value output to a Table input — is caught before any code runs.
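
The pre-execution check amounts to comparing each input port's declared type against the declared type of the output port it references. A small sketch of that comparison (the `declared_outputs` table and `check_edge` helper are hypothetical names for illustration):

```python
# Hypothetical registry: (node id, output port) -> declared type.
declared_outputs = {
    ("fetch-orders", "orders"): "Table",
    ("config", "min-amount"): "Value",
}

def check_edge(src_node, src_port, expected_type):
    """Fail fast if the referenced output's type differs from the input's type."""
    actual = declared_outputs[(src_node, src_port)]
    if actual != expected_type:
        raise TypeError(f"{src_node}.{src_port} is {actual}, expected {expected_type}")

check_edge("fetch-orders", "orders", "Table")   # passes
# check_edge("config", "min-amount", "Table")   # would raise before any node runs
```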

Edges are the connections between ports. They are declared implicitly through `from` references on input ports. An edge carries data from one node's output to another node's input.

```yaml
# This input declaration creates an edge:
# clean-data.customers → enrich.customers
enrich:
  inputs:
    customers: { type: Table, from: ref(clean-data.customers) }
```

Edges enforce type contracts. The output type of the source port must match the input type of the target port.

Radhflow has four primitive data types. Every port uses one of them.

| Type | Description | Example |
| --- | --- | --- |
| Value | A single scalar — string, number, boolean. | An API key, a threshold, a file path. |
| Record | A single JSON object with named fields. | One user profile, one config block. |
| Table | An ordered collection of records (rows). | A CSV import, a query result, a report. |
| Stream | An unbounded sequence of records. | A webhook feed, a log tail, a queue. |

Tables are the most common type. They flow between nodes as NDJSON.
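
A round trip makes the format concrete: a Table serializes to NDJSON by writing one JSON object per line, and parsing those lines back recovers the same rows. A minimal sketch (the row data is illustrative):

```python
import json

rows = [{"name": "Alice", "score": 92}, {"name": "Bob", "score": 45}]

# Serialize a Table as NDJSON: one JSON object per line.
ndjson = "\n".join(json.dumps(r) for r in rows)

# Parsing line by line recovers the original rows.
parsed = [json.loads(line) for line in ndjson.splitlines()]
assert parsed == rows
```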

Every Table and Record port has a schema. Schemas define the fields and their types. They are stored as .schema.json files alongside NDJSON data.

```
nodes/read-leads/
  schemas/
    leads.schema.json    # field definitions
  artifacts/
    leads.ndjson         # the data
```

A schema file:

```json
{
  "fields": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "score": { "type": "number" }
  }
}
```
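
Checking a record against this schema is straightforward: map each schema type to the corresponding JSON value type and verify every declared field. A sketch under that assumption (the `validate` helper and the type mapping are illustrative, not Radhflow's validator):

```python
# Map schema type names to the Python types json.loads produces.
py_types = {"string": str, "number": (int, float), "boolean": bool}

schema = {
    "fields": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "score": {"type": "number"},
    }
}

def validate(record, schema):
    """Return True if every declared field is present with the declared type."""
    for field, spec in schema["fields"].items():
        if not isinstance(record.get(field), py_types[spec["type"]]):
            return False
    return True

print(validate({"name": "Alice", "email": "alice@example.com", "score": 92}, schema))  # True
```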

NDJSON is one JSON object per line. Human-readable. Diffable in Git. Universal across languages.

```
{"name":"Alice","email":"alice@example.com","score":92}
{"name":"Bob","email":"bob@example.com","score":45}
```

`ref()` is how you wire nodes together. It references an output port on another node using the pattern `ref(node-id.port-name)`.

```yaml
transform:
  inputs:
    raw: { type: Table, from: ref(fetch-data.rows) }
    config: { type: Record, from: ref(load-config.settings) }
```

`ref(fetch-data.rows)` means: take the `rows` output from the `fetch-data` node and feed it into this input. The graph executor resolves these references, validates types, and determines execution order automatically.
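
Resolving a reference starts with splitting the expression into its node id and port name. A sketch of that parsing step (the regex and `parse_ref` helper are illustrative; Radhflow's actual grammar may differ, e.g. around allowed characters):

```python
import re

# Assumes ids and port names use word characters and hyphens.
REF = re.compile(r"ref\(([\w-]+)\.([\w-]+)\)")

def parse_ref(expr):
    """Split 'ref(node-id.port-name)' into ('node-id', 'port-name')."""
    m = REF.fullmatch(expr)
    if not m:
        raise ValueError(f"not a ref expression: {expr}")
    return m.group(1), m.group(2)

print(parse_ref("ref(fetch-data.rows)"))  # ('fetch-data', 'rows')
```

From these pairs the executor can build the edge list, run the type checks, and topologically order the nodes.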