Skip to content

data.group

Group rows and compute aggregates.

Input: Table | Output: Table (group columns + aggregation columns)

Revenue by region:

nodes:
revenue-by-region:
type: data.group
config:
by: [region]
aggregations:
total_revenue:
op: sum
field: amount
FieldTypeRequiredDescription
byarrayyesFields to group on
aggregationsmapyesOutput column name to aggregation definition

Each aggregation definition:

FieldTypeRequiredDescription
opstringyesAggregation function
fieldstringvariesInput field to aggregate (required for all except count with *)
limitnumbernoMax items for collect
separatorstringnoDelimiter for join (default: ", ")
FunctionDescriptionOutput type
countCount of non-null values (or all rows with *)number
sumSum of numeric valuesnumber
avgArithmetic meannumber
minMinimum valuesame as input
maxMaximum valuesame as input
firstFirst value in groupsame as input
lastLast value in groupsame as input
collectCollect values into a JSON arraylist
count_uniqueCount of distinct valuesnumber
joinConcatenate string values with separatorstring
nodes:
revenue-summary:
type: data.group
config:
by: [category]
aggregations:
total_revenue:
op: sum
field: amount
avg_order:
op: avg
field: amount
order_count:
op: count
field: id
nodes:
status-breakdown:
type: data.group
config:
by: [status]
aggregations:
count:
op: count
field: "*"
nodes:
region-product:
type: data.group
config:
by: [region, product_type]
aggregations:
units_sold:
op: sum
field: quantity
unique_customers:
op: count_unique
field: customer_id
nodes:
tags-per-author:
type: data.group
config:
by: [author]
aggregations:
all_tags:
op: collect
field: tag
limit: 10
tag_list:
op: join
field: tag
separator: " | "

collect gathers values into a JSON array. limit caps the array size. join concatenates values into a single string with the specified separator.

nodes:
session-summary:
type: data.group
config:
by: [session_id]
aggregations:
first_event:
op: first
field: event_type
last_event:
op: last
field: event_type
event_count:
op: count
field: "*"

Empty groups. If the input table is empty, the output is also empty — no group rows are produced.

NULL in group key. Rows with NULL in a group-by field are grouped together into a single group where the group key is NULL.

NULL in aggregation field. count skips NULLs (use field: "*" to count all rows including NULLs). sum and avg ignore NULL values. min and max ignore NULLs.

Single-row groups. first and last return the same value when the group has one row.

name: monthly-sales-report
version: 1
nodes:
load-sales:
type: file.source
path: sales.csv
format: csv
monthly-summary:
type: data.group
config:
by: [month, region]
aggregations:
revenue:
op: sum
field: amount
deals:
op: count
field: id
top_deal:
op: max
field: amount
ranked:
type: data.sort
config:
by:
- field: revenue
direction: desc
edges:
- "load-sales.data -> monthly-summary.input"
- "monthly-summary.output -> ranked.input"