data.join
Combine two tables on a shared key.
Input: Table (left port) + Table (right port) | Output: Table
Minimal example
Section titled “Minimal example”Join customers with their orders:
nodes: orders-with-customers: type: data.join config: join_type: left left_key: customer_id right_key: id
edges: - "load-orders.output -> orders-with-customers.left" - "load-customers.output -> orders-with-customers.right"Config reference
Section titled “Config reference”| Field | Type | Required | Description |
|---|---|---|---|
join_type | string | no | Join type: inner, left, right, full (default: inner) |
left_key | string | yes | Column name on the left table to join on |
right_key | string | yes | Column name on the right table to join on |
right_prefix | string | no | Prefix for right-side columns to avoid name collisions |
Output schema
Section titled “Output schema”The output contains all columns from the left table plus all columns from the right table (excluding the right join key, which would duplicate the left key). When both tables share a column name, use right_prefix to disambiguate.
nodes: merged: type: data.join config: join_type: inner left_key: id right_key: order_id right_prefix: "order_"With right_prefix: "order_", right-side columns status and total become order_status and order_total in the output.
Join types
Section titled “Join types”Inner join
Section titled “Inner join”Returns only rows with matching keys on both sides.
nodes: matched-orders: type: data.join config: join_type: inner left_key: id right_key: customer_id
edges: - "customers.output -> matched-orders.left" - "orders.output -> matched-orders.right"Left join
Section titled “Left join”Returns all rows from the left table. Right-side columns are null when no match exists.
nodes: all-customers: type: data.join config: join_type: left left_key: id right_key: customer_idRight join
Section titled “Right join”Returns all rows from the right table. Left-side columns are null when no match exists.
nodes: all-orders: type: data.join config: join_type: right left_key: id right_key: customer_idFull join
Section titled “Full join”Returns all rows from both tables. Columns are null on the side where no match exists.
nodes: full-merge: type: data.join config: join_type: full left_key: email right_key: emailProgressive examples
Section titled “Progressive examples”Customer + orders enrichment
Section titled “Customer + orders enrichment”name: enrich-ordersversion: 1
nodes: load-orders: type: file.source path: orders.csv format: csv
load-products: type: file.source path: products.csv format: csv
orders-enriched: type: data.join config: join_type: left left_key: product_id right_key: id right_prefix: "product_"
summary: type: data.map config: include_original: true fields: line_total: "{{ quantity * product_price }}"
edges: - "load-orders.data -> orders-enriched.left" - "load-products.data -> orders-enriched.right" - "orders-enriched.output -> summary.input"This pipeline joins orders with product data, then computes a line total using the product price from the right table.
Enrichment join (adding metadata)
Section titled “Enrichment join (adding metadata)”nodes: enrich-users: type: data.join config: join_type: left left_key: country_code right_key: code right_prefix: "country_"
edges: - "users.output -> enrich-users.left" - "country-lookup.output -> enrich-users.right"Left join ensures all users are kept, even if the country code lookup fails.
Edge cases
Section titled “Edge cases”Duplicate keys. When the right table has multiple rows matching a single left key, the output contains one row per match (a one-to-many join). This multiplies the left-side rows.
NULL keys. NULL keys never match. Rows with NULL join keys are excluded from inner joins and appear with null counterpart columns in outer joins.
Many-to-many. When both sides have duplicate keys, the result is a cross product of matching rows. A left table with 3 rows matching key “A” and a right table with 2 rows matching key “A” produces 6 output rows. This is usually unintended — dedup one side first.
Column name collisions. If both tables have a column named status and you do not set right_prefix, the right-side status overwrites the left-side one. Always set right_prefix when column names may overlap.