data.group
Group rows and compute aggregates.
Input: Table | Output: Table (group columns + aggregation columns)
Minimal example
Revenue by region:

```yaml
nodes:
  revenue-by-region:
    type: data.group
    config:
      by: [region]
      aggregations:
        total_revenue:
          op: sum
          field: amount
```

Config reference

| Field | Type | Required | Description |
|---|---|---|---|
| `by` | array | yes | Fields to group on |
| `aggregations` | map | yes | Output column name to aggregation definition |
Each aggregation definition:
| Field | Type | Required | Description |
|---|---|---|---|
| `op` | string | yes | Aggregation function |
| `field` | string | varies | Input field to aggregate (required for all functions except `count` with `"*"`, which counts all rows) |
| `limit` | number | no | Maximum number of items for `collect` |
| `separator` | string | no | Delimiter for `join` (default: `", "`) |
Supported aggregates
| Function | Description | Output type |
|---|---|---|
| `count` | Count of non-null values (or all rows with `*`) | number |
| `sum` | Sum of numeric values | number |
| `avg` | Arithmetic mean | number |
| `min` | Minimum value | same as input |
| `max` | Maximum value | same as input |
| `first` | First value in group | same as input |
| `last` | Last value in group | same as input |
| `collect` | Collect values into a JSON array | list |
| `count_unique` | Count of distinct values | number |
| `join` | Concatenate string values with separator | string |
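
To make the output shape concrete, here is a minimal plain-Python sketch of what grouping by `region` and applying `sum` and `avg` produces. It only illustrates the semantics described in the table and is not the node's implementation; the `group` helper and the sample rows are hypothetical, and the output row order shown (first appearance of each key) is an assumption.

```python
from collections import defaultdict

# Hypothetical sample rows; column names mirror the examples on this page.
rows = [
    {"region": "east", "amount": 100},
    {"region": "west", "amount": 40},
    {"region": "east", "amount": 50},
]

def group(rows, by, aggregations):
    """Illustrative stand-in for data.group: one output row per distinct key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[k] for k in by)].append(row)

    # Only three of the documented functions, to keep the sketch short.
    ops = {
        "sum": lambda values: sum(values),
        "avg": lambda values: sum(values) / len(values),
        "count": lambda values: len(values),
    }

    out = []
    for key, members in buckets.items():
        result = dict(zip(by, key))  # group columns come first
        for name, spec in aggregations.items():
            values = [m[spec["field"]] for m in members]
            result[name] = ops[spec["op"]](values)  # then aggregation columns
        out.append(result)
    return out

summary = group(rows, ["region"], {
    "total_revenue": {"op": "sum", "field": "amount"},
    "avg_order": {"op": "avg", "field": "amount"},
})
```

Each output row carries the group columns followed by one column per entry in `aggregations`, matching the output shape stated at the top of this page.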
Progressive examples
Revenue by category

```yaml
nodes:
  revenue-summary:
    type: data.group
    config:
      by: [category]
      aggregations:
        total_revenue:
          op: sum
          field: amount
        avg_order:
          op: avg
          field: amount
        order_count:
          op: count
          field: id
```

Count by status
```yaml
nodes:
  status-breakdown:
    type: data.group
    config:
      by: [status]
      aggregations:
        count:
          op: count
          field: "*"
```

Multi-field grouping
```yaml
nodes:
  region-product:
    type: data.group
    config:
      by: [region, product_type]
      aggregations:
        units_sold:
          op: sum
          field: quantity
        unique_customers:
          op: count_unique
          field: customer_id
```

Collect and join
```yaml
nodes:
  tags-per-author:
    type: data.group
    config:
      by: [author]
      aggregations:
        all_tags:
          op: collect
          field: tag
          limit: 10
        tag_list:
          op: join
          field: tag
          separator: " | "
```

`collect` gathers values into a JSON array. `limit` caps the array size. `join` concatenates values into a single string with the specified separator.
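
The difference between the two functions can be sketched in plain Python. This is an illustration only: the `collect` and `join` helpers below are hypothetical, and the sketch assumes that `limit` keeps the first values in input order, which this page does not state.

```python
# Hypothetical tag values for a single author group, in input order.
tags = ["python", "yaml", "etl", "sql"]

def collect(values, limit=None):
    """Gather values into a list; `limit` caps its length.

    Assumption: the cap keeps the first `limit` values in input order.
    """
    values = list(values)
    return values if limit is None else values[:limit]

def join(values, separator=", "):
    """Concatenate values into one string; ", " is the documented default."""
    return separator.join(str(v) for v in values)

all_tags = collect(tags, limit=3)       # a list, capped at three items
tag_list = join(tags, separator=" | ")  # a single string
```

Here `all_tags` stays a list that downstream nodes can iterate over, while `tag_list` is a flat string suited to display or export.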
First and last values
```yaml
nodes:
  session-summary:
    type: data.group
    config:
      by: [session_id]
      aggregations:
        first_event:
          op: first
          field: event_type
        last_event:
          op: last
          field: event_type
        event_count:
          op: count
          field: "*"
```

Edge cases
Empty groups. If the input table is empty, the output is also empty: no group rows are produced.
NULL in group key. Rows with NULL in a group-by field are grouped together into a single group where the group key is NULL.
NULL in aggregation field. `count` skips NULLs (use `field: "*"` to count all rows, including those with NULLs). `sum`, `avg`, `min`, and `max` ignore NULL values.
Single-row groups. `first` and `last` return the same value when the group has one row.
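
The NULL rules above can be checked against a small plain-Python sketch, with `None` standing in for NULL. The sample rows and the `non_null` helper are hypothetical, and the sketch mirrors only the behavior stated on this page; what an aggregate returns for a group whose values are all NULL is not specified here, so the sample data avoids that case.

```python
from collections import defaultdict

# Hypothetical rows; None stands in for NULL.
rows = [
    {"status": "open", "amount": 10},
    {"status": "open", "amount": None},
    {"status": None, "amount": 5},
    {"status": None, "amount": 7},
]

# None is a valid dictionary key, so NULL keys fall into one shared group.
groups = defaultdict(list)
for row in rows:
    groups[row["status"]].append(row["amount"])

def non_null(values):
    """Drop NULLs before aggregating."""
    return [v for v in values if v is not None]

summary = {
    key: {
        "count_amount": len(non_null(values)),  # count with field: amount skips NULLs
        "count_all": len(values),               # count with field: "*" counts every row
        "sum": sum(non_null(values)),           # sum ignores NULLs
        "max": max(non_null(values)),           # max ignores NULLs
    }
    for key, values in groups.items()
}
```

For the `open` group, `count_amount` is 1 while `count_all` is 2, which is the difference between counting a field and counting `"*"`.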
Pipeline example
```yaml
name: monthly-sales-report
version: 1

nodes:
  load-sales:
    type: file.source
    path: sales.csv
    format: csv

  monthly-summary:
    type: data.group
    config:
      by: [month, region]
      aggregations:
        revenue:
          op: sum
          field: amount
        deals:
          op: count
          field: id
        top_deal:
          op: max
          field: amount

  ranked:
    type: data.sort
    config:
      by:
        - field: revenue
          direction: desc

edges:
  - "load-sales.data -> monthly-summary.input"
  - "monthly-summary.output -> ranked.input"
```