data.join

Combine two tables on a shared key.

Input: Table (left port) + Table (right port) | Output: Table

Minimal example

Join customers with their orders:

nodes:
  orders-with-customers:
    type: data.join
    config:
      join_type: left
      left_key: customer_id
      right_key: id

edges:
  - "load-orders.output -> orders-with-customers.left"
  - "load-customers.output -> orders-with-customers.right"

Config reference

Field	Type	Required	Description
`join_type`	string	no	Join type: `inner`, `left`, `right`, `full` (default: `inner`)
`left_key`	string	yes	Column name on the left table to join on
`right_key`	string	yes	Column name on the right table to join on
`right_prefix`	string	no	Prefix for right-side columns to avoid name collisions

Output schema

The output contains all columns from the left table plus all columns from the right table (excluding the right join key, which would duplicate the left key). When both tables share a column name, use right_prefix to disambiguate.

nodes:
  merged:
    type: data.join
    config:
      join_type: inner
      left_key: id
      right_key: order_id
      right_prefix: "order_"

With right_prefix: "order_", right-side columns status and total become order_status and order_total in the output.

Join types

Inner join

Returns only rows with matching keys on both sides.

nodes:
  matched-orders:
    type: data.join
    config:
      join_type: inner
      left_key: id
      right_key: customer_id

edges:
  - "customers.output -> matched-orders.left"
  - "orders.output -> matched-orders.right"

Left join

Returns all rows from the left table. Right-side columns are null when no match exists.

nodes:
  all-customers:
    type: data.join
    config:
      join_type: left
      left_key: id
      right_key: customer_id

Right join

Returns all rows from the right table. Left-side columns are null when no match exists.

nodes:
  all-orders:
    type: data.join
    config:
      join_type: right
      left_key: id
      right_key: customer_id

Full join

Returns all rows from both tables. Columns are null on the side where no match exists.

nodes:
  full-merge:
    type: data.join
    config:
      join_type: full
      left_key: email
      right_key: email

Progressive examples

Customer + orders enrichment

name: enrich-orders
version: 1

nodes:
  load-orders:
    type: file.source
    path: orders.csv
    format: csv

  load-products:
    type: file.source
    path: products.csv
    format: csv

  orders-enriched:
    type: data.join
    config:
      join_type: left
      left_key: product_id
      right_key: id
      right_prefix: "product_"

  summary:
    type: data.map
    config:
      include_original: true
      fields:
        line_total: "{{ quantity * product_price }}"

edges:
  - "load-orders.data -> orders-enriched.left"
  - "load-products.data -> orders-enriched.right"
  - "orders-enriched.output -> summary.input"

This pipeline joins orders with product data, then computes a line total using the product price from the right table.

Enrichment join (adding metadata)

nodes:
  enrich-users:
    type: data.join
    config:
      join_type: left
      left_key: country_code
      right_key: code
      right_prefix: "country_"

edges:
  - "users.output -> enrich-users.left"
  - "country-lookup.output -> enrich-users.right"

Left join ensures all users are kept, even if the country code lookup fails.

Edge cases

Duplicate keys. When the right table has multiple rows matching a single left key, the output contains one row per match (a one-to-many join). This multiplies the left-side rows.

NULL keys. NULL keys never match. Rows with NULL join keys are excluded from inner joins and appear with null counterpart columns in outer joins.

Many-to-many. When both sides have duplicate keys, the result is a cross product of matching rows. A left table with 3 rows matching key “A” and a right table with 2 rows matching key “A” produces 6 output rows. This is usually unintended — dedup one side first.

Column name collisions. If both tables have a column named status and you do not set right_prefix, the right-side status overwrites the left-side one. Always set right_prefix when column names may overlap.