
data.group

data.group partitions rows by one or more fields, then computes aggregation functions over each group. The output contains one row per unique combination of group-by values, plus the aggregation result columns.

Input: Table

Output: Table (group columns + aggregation columns)

    nodes:
      orders-by-status:
        type: data.group
        config:
          by: [status]
          aggregations:
            order_count:
              op: count
              field: id
            total_revenue:
              op: sum
              field: amount

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| by | array | yes | Fields to group on |
| aggregations | map | yes | Output column name to aggregation definition |
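
To make the grouping behavior concrete, the following is a minimal Python sketch of what the orders-by-status config above describes. It is not the node's implementation: the table is assumed to be a list of dicts, the sample rows are hypothetical, and the helper names `group_rows` and `orders_by_status` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical input rows; in a real pipeline these come from an upstream node.
orders = [
    {"id": 1, "status": "paid", "amount": 40.0},
    {"id": 2, "status": "paid", "amount": 60.0},
    {"id": 3, "status": "refunded", "amount": 15.0},
]

def group_rows(rows, by):
    """Partition rows by the tuple of values found in the `by` fields."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in by)
        groups[key].append(row)
    return groups

def orders_by_status(rows):
    """Mimic the config: one output row per status, with a count and a sum."""
    out = []
    for (status,), members in group_rows(rows, ["status"]).items():
        out.append({
            "status": status,
            # count: number of non-null values of `id`
            "order_count": sum(1 for r in members if r["id"] is not None),
            # sum: total of `amount`; skipping nulls here is an assumption
            "total_revenue": sum(r["amount"] for r in members if r["amount"] is not None),
        })
    return out

result = orders_by_status(orders)
```

Each output row carries the group column (`status`) followed by one column per key under `aggregations`, which matches the output shape described at the top of this page.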

Each aggregation definition has:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| op | string | yes | Aggregation function |
| field | string | varies | Input field to aggregate (required for all functions except count with *) |
| limit | number | no | Max items for collect |
| separator | string | no | Delimiter for join (default: ", ") |

The op field accepts the following functions:

| Function | Description | Output type |
| --- | --- | --- |
| count | Count of non-null values (or all rows with *) | number |
| sum | Sum of numeric values | number |
| avg | Arithmetic mean | number |
| min | Minimum value | same as input |
| max | Maximum value | same as input |
| first | First value in group | same as input |
| last | Last value in group | same as input |
| collect | Collect values into a list | list |
| count_unique | Count of distinct values | number |
| join | Concatenate string values with separator | string |
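
As a rough illustration of the scalar functions in the table, here is a Python sketch that applies each one to the rows of a single group. It rests on assumptions the table does not settle: nulls are skipped by sum, avg, min, max, and count_unique, while first and last take the value as-is from the first and last row. The names `AGGREGATIONS`, `non_null`, and `aggregate` are hypothetical.

```python
def non_null(values):
    """Drop null (None) entries before aggregating."""
    return [v for v in values if v is not None]

# One callable per function name; each receives the field's values for one group.
AGGREGATIONS = {
    "count": lambda vs: len(non_null(vs)),
    "sum": lambda vs: sum(non_null(vs)),
    "avg": lambda vs: sum(non_null(vs)) / len(non_null(vs)),  # fails if every value is null
    "min": lambda vs: min(non_null(vs)),
    "max": lambda vs: max(non_null(vs)),
    "first": lambda vs: vs[0],   # assumption: nulls are not skipped
    "last": lambda vs: vs[-1],   # assumption: nulls are not skipped
    "count_unique": lambda vs: len(set(non_null(vs))),
}

def aggregate(rows, op, field):
    """Apply one aggregation to the rows of a single group."""
    if op == "count" and field == "*":
        return len(rows)  # count with "*" counts every row, nulls included
    values = [row.get(field) for row in rows]
    return AGGREGATIONS[op](values)

# Hypothetical rows belonging to one group.
group = [
    {"amount": 10, "customer": "a"},
    {"amount": None, "customer": "b"},
    {"amount": 30, "customer": "a"},
]

non_null_count = aggregate(group, "count", "amount")
row_count = aggregate(group, "count", "*")
```

The difference between `count` on a field and `count` with `*` only shows up when the field contains nulls, as in the second row of the sample group.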
Sum, average, and count per category:

    nodes:
      revenue-summary:
        type: data.group
        config:
          by: [category]
          aggregations:
            total_revenue:
              op: sum
              field: amount
            avg_order:
              op: avg
              field: amount
            order_count:
              op: count
              field: id

Count all rows in each group with "*":

    nodes:
      status-breakdown:
        type: data.group
        config:
          by: [status]
          aggregations:
            count:
              op: count
              field: "*"

Group by more than one field:

    nodes:
      region-product:
        type: data.group
        config:
          by: [region, product_type]
          aggregations:
            units_sold:
              op: sum
              field: quantity
            unique_customers:
              op: count_unique
              field: customer_id

Collect and join values per group:

    nodes:
      tags-per-author:
        type: data.group
        config:
          by: [author]
          aggregations:
            all_tags:
              op: collect
              field: tag
              limit: 10
            tag_list:
              op: join
              field: tag
              separator: " | "

collect gathers values into a JSON array. limit caps the array size. join concatenates values into a single string with the specified separator.
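
The two list-oriented functions can be sketched in Python as follows. This is an illustration under assumptions, not the node's code: the text above only says that limit caps the array size, so keeping the first items in group order is a guess, and the function names simply mirror the op names.

```python
def collect(values, limit=None):
    """Gather values into a list, keeping at most `limit` items when a limit is set."""
    items = list(values)
    # Assumption: the cap keeps the earliest values in group order.
    return items if limit is None else items[:limit]

def join(values, separator=", "):
    """Concatenate values into one string; ", " is the documented default separator."""
    return separator.join(str(v) for v in values)

# Hypothetical tag values for one author.
tags = ["yaml", "etl", "sql"]

all_tags = collect(tags, limit=2)
tag_list = join(tags, separator=" | ")
```

With the sample values, `all_tags` holds the first two tags and `tag_list` is a single string with the tags separated by " | ".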

    nodes:
      load-sales:
        type: file.csv
        config:
          path: sales.csv
      monthly-summary:
        type: data.group
        config:
          by: [month, region]
          aggregations:
            revenue:
              op: sum
              field: amount
            deals:
              op: count
              field: id
            top_deal:
              op: max
              field: amount
      ranked:
        type: data.sort
        config:
          by:
            - field: revenue
              direction: desc
    edges:
      - load-sales.output -> monthly-summary.input
      - monthly-summary.output -> ranked.input
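
To show how the group and sort steps of this pipeline compose, here is a small Python sketch of the same flow. The rows stand in for the contents of sales.csv and are entirely hypothetical, as are the function names; the sketch only mirrors the semantics described on this page and does not read any file.

```python
from collections import defaultdict

# Hypothetical rows standing in for what load-sales would produce.
sales = [
    {"id": 1, "month": "2024-01", "region": "EU", "amount": 100.0},
    {"id": 2, "month": "2024-01", "region": "EU", "amount": 250.0},
    {"id": 3, "month": "2024-01", "region": "US", "amount": 400.0},
    {"id": 4, "month": "2024-02", "region": "EU", "amount": 50.0},
]

def monthly_summary(rows):
    """Group by (month, region), then compute revenue, deals, and top_deal."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["month"], row["region"])].append(row)
    return [
        {
            "month": month,
            "region": region,
            "revenue": sum(r["amount"] for r in members),            # op: sum
            "deals": sum(1 for r in members if r["id"] is not None),  # op: count
            "top_deal": max(r["amount"] for r in members),            # op: max
        }
        for (month, region), members in groups.items()
    ]

def ranked(rows):
    """Sort by revenue, highest first, as the data.sort node is configured to do."""
    return sorted(rows, key=lambda r: r["revenue"], reverse=True)

result = ranked(monthly_summary(sales))
```

The grouped table has one row per (month, region) pair, so sorting it by `revenue` ranks those pairs rather than individual sales.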