Skip to content

data.join

Combine two tables on a shared key.

Input: Table (left port) + Table (right port) | Output: Table

Join customers with their orders:

nodes:
orders-with-customers:
type: data.join
config:
join_type: left
left_key: customer_id
right_key: id
edges:
- "load-orders.output -> orders-with-customers.left"
- "load-customers.output -> orders-with-customers.right"
FieldTypeRequiredDescription
join_typestringnoJoin type: inner, left, right, full (default: inner)
left_keystringyesColumn name on the left table to join on
right_keystringyesColumn name on the right table to join on
right_prefixstringnoPrefix for right-side columns to avoid name collisions

The output contains all columns from the left table plus all columns from the right table (excluding the right join key, which would duplicate the left key). When both tables share a column name, use right_prefix to disambiguate.

nodes:
merged:
type: data.join
config:
join_type: inner
left_key: id
right_key: order_id
right_prefix: "order_"

With right_prefix: "order_", right-side columns status and total become order_status and order_total in the output.

Returns only rows with matching keys on both sides.

nodes:
matched-orders:
type: data.join
config:
join_type: inner
left_key: id
right_key: customer_id
edges:
- "customers.output -> matched-orders.left"
- "orders.output -> matched-orders.right"

Returns all rows from the left table. Right-side columns are null when no match exists.

nodes:
all-customers:
type: data.join
config:
join_type: left
left_key: id
right_key: customer_id

Returns all rows from the right table. Left-side columns are null when no match exists.

nodes:
all-orders:
type: data.join
config:
join_type: right
left_key: id
right_key: customer_id

Returns all rows from both tables. Columns are null on the side where no match exists.

nodes:
full-merge:
type: data.join
config:
join_type: full
left_key: email
right_key: email
name: enrich-orders
version: 1
nodes:
load-orders:
type: file.source
path: orders.csv
format: csv
load-products:
type: file.source
path: products.csv
format: csv
orders-enriched:
type: data.join
config:
join_type: left
left_key: product_id
right_key: id
right_prefix: "product_"
summary:
type: data.map
config:
include_original: true
fields:
line_total: "{{ quantity * product_price }}"
edges:
- "load-orders.data -> orders-enriched.left"
- "load-products.data -> orders-enriched.right"
- "orders-enriched.output -> summary.input"

This pipeline joins orders with product data, then computes a line total using the product price from the right table.

nodes:
enrich-users:
type: data.join
config:
join_type: left
left_key: country_code
right_key: code
right_prefix: "country_"
edges:
- "users.output -> enrich-users.left"
- "country-lookup.output -> enrich-users.right"

Left join ensures all users are kept, even if the country code lookup fails.

Duplicate keys. When the right table has multiple rows matching a single left key, the output contains one row per match (a one-to-many join). This multiplies the left-side rows.

NULL keys. NULL keys never match. Rows with NULL join keys are excluded from inner joins and appear with null counterpart columns in outer joins.

Many-to-many. When both sides have duplicate keys, the result is a cross product of matching rows. A left table with 3 rows matching key “A” and a right table with 2 rows matching key “A” produces 6 output rows. This is usually unintended — dedup one side first.

Column name collisions. If both tables have a column named status and you do not set right_prefix, the right-side status overwrites the left-side one. Always set right_prefix when column names may overlap.