data.group

Group rows and compute aggregates.

Input: Table | Output: Table (group columns + aggregation columns)

Minimal example

Revenue by region:

nodes:
  revenue-by-region:
    type: data.group
    config:
      by: [region]
      aggregations:
        total_revenue:
          op: sum
          field: amount

Config reference

Field	Type	Required	Description
`by`	array	yes	Fields to group on
`aggregations`	map	yes	Output column name to aggregation definition

Each aggregation definition:

Field	Type	Required	Description
`op`	string	yes	Aggregation function
`field`	string	varies	Input field to aggregate (required for all except `count` with `*`)
`limit`	number	no	Max items for `collect`
`separator`	string	no	Delimiter for `join` (default: `", "`)

Supported aggregates

Function	Description	Output type
`count`	Count of non-null values (or all rows with `*`)	number
`sum`	Sum of numeric values	number
`avg`	Arithmetic mean	number
`min`	Minimum value	same as input
`max`	Maximum value	same as input
`first`	First value in group	same as input
`last`	Last value in group	same as input
`collect`	Collect values into a JSON array	list
`count_unique`	Count of distinct values	number
`join`	Concatenate string values with separator	string

Progressive examples

Revenue by category

nodes:
  revenue-summary:
    type: data.group
    config:
      by: [category]
      aggregations:
        total_revenue:
          op: sum
          field: amount
        avg_order:
          op: avg
          field: amount
        order_count:
          op: count
          field: id

Count by status

nodes:
  status-breakdown:
    type: data.group
    config:
      by: [status]
      aggregations:
        count:
          op: count
          field: "*"

Multi-field grouping

nodes:
  region-product:
    type: data.group
    config:
      by: [region, product_type]
      aggregations:
        units_sold:
          op: sum
          field: quantity
        unique_customers:
          op: count_unique
          field: customer_id

Collect and join

nodes:
  tags-per-author:
    type: data.group
    config:
      by: [author]
      aggregations:
        all_tags:
          op: collect
          field: tag
          limit: 10
        tag_list:
          op: join
          field: tag
          separator: " | "

collect gathers values into a JSON array. limit caps the array size. join concatenates values into a single string with the specified separator.

First and last values

nodes:
  session-summary:
    type: data.group
    config:
      by: [session_id]
      aggregations:
        first_event:
          op: first
          field: event_type
        last_event:
          op: last
          field: event_type
        event_count:
          op: count
          field: "*"

Edge cases

Empty groups. If the input table is empty, the output is also empty — no group rows are produced.

NULL in group key. Rows with NULL in a group-by field are grouped together into a single group where the group key is NULL.

NULL in aggregation field. count skips NULLs (use field: "*" to count all rows including NULLs). sum and avg ignore NULL values. min and max ignore NULLs.

Single-row groups. first and last return the same value when the group has one row.

Pipeline example

name: monthly-sales-report
version: 1

nodes:
  load-sales:
    type: file.source
    path: sales.csv
    format: csv

  monthly-summary:
    type: data.group
    config:
      by: [month, region]
      aggregations:
        revenue:
          op: sum
          field: amount
        deals:
          op: count
          field: id
        top_deal:
          op: max
          field: amount

  ranked:
    type: data.sort
    config:
      by:
        - field: revenue
          direction: desc

edges:
  - "load-sales.data -> monthly-summary.input"
  - "monthly-summary.output -> ranked.input"