# sort / limit / dedup
Sort rows, cap the count, remove duplicates — in one step or chained together.
Three operations for shaping result sets. Each does one thing. They chain together cleanly.
## data.sort

Orders rows by one or more fields.
Input: Table | Output: Table (same schema, reordered)
### Config reference

| Field | Type | Required | Description |
|---|---|---|---|
| by | array | yes | Sort fields, each with `field` and optional `direction` |
| null_handling | string | no | Where nulls appear: `first` or `last` (default: `last`) |
Each entry in `by`:

| Field | Type | Default | Description |
|---|---|---|---|
| field | string | (required) | Column to sort on |
| direction | string | asc | Sort direction: `asc` or `desc` |
### Examples

```yaml
nodes:
  by-date:
    type: data.sort
    config:
      by:
        - field: created_at
          direction: desc
```

```yaml
# Multi-field sort with nulls first
nodes:
  ranked:
    type: data.sort
    config:
      null_handling: first
      by:
        - field: priority
          direction: asc
        - field: created_at
          direction: desc
```

### Edge cases
**Stable sort.** Rows with equal sort keys retain their original relative order.

**NULL ordering.** By default, NULLs sort last. Set `null_handling: first` to put them at the top.
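A quick illustration of both behaviors, using hypothetical rows written as comments (the node name `by-priority` is made up):

```yaml
# Hypothetical input (priority, name): (2, a), (NULL, b), (2, c), (1, d)
# With the defaults (direction: asc, null_handling: last) the output is:
#   (1, d), (2, a), (2, c), (NULL, b)
# Row a still precedes row c: equal keys keep their input order.
nodes:
  by-priority:
    type: data.sort
    config:
      by:
        - field: priority
```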
## data.limit

Takes the first N rows, optionally skipping an offset.
Input: Table | Output: Table (same schema, at most N rows)
### Config reference

| Field | Type | Required | Description |
|---|---|---|---|
| count | number | yes | Maximum rows to return |
| offset | number | no | Rows to skip before taking (default: 0) |
### Examples

```yaml
nodes:
  top-10:
    type: data.limit
    config:
      count: 10
```

```yaml
# Pagination: skip 20, take 10
nodes:
  page-3:
    type: data.limit
    config:
      count: 10
      offset: 20
```

### Edge cases
**Fewer rows than count.** If the input has fewer rows than `count`, all rows are returned.

**Offset beyond data.** If `offset` exceeds the row count, the output is empty.
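For page-based pagination, the offset follows from the page number: `offset = (page - 1) * count` for 1-indexed pages. A sketch (the node name `page-5` is made up):

```yaml
# Page 5 at 25 rows per page skips the first 100 rows:
nodes:
  page-5:
    type: data.limit
    config:
      count: 25
      offset: 100   # (5 - 1) * 25
```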
## data.dedup

Removes duplicate rows based on key fields. Keeps either the first or last occurrence in input order.
Input: Table | Output: Table (same schema, duplicates removed)
### Config reference

| Field | Type | Required | Description |
|---|---|---|---|
| on | array | yes | Fields to deduplicate on |
| keep | string | no | Which duplicate to keep: `first` or `last` (default: `first`) |
### Examples

```yaml
nodes:
  unique-emails:
    type: data.dedup
    config:
      on: [email]
```

```yaml
# Keep the most recent entry per user
nodes:
  latest-per-user:
    type: data.dedup
    config:
      on: [user_id]
      keep: last
```

### Edge cases
**Multi-field dedup.** When `on` lists multiple fields, rows are considered duplicates only if all listed fields match, as sketched below.

**NULL keys.** Two rows with NULL in a dedup key field are considered duplicates of each other.
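A sketch of the multi-field case, assuming hypothetical `user_id` and `day` columns: each user keeps at most one row per day, and rows collapse only when both values match:

```yaml
nodes:
  one-per-user-per-day:
    type: data.dedup
    config:
      on: [user_id, day]   # duplicates only when BOTH fields match
      keep: last           # keep the latest occurrence in input order
```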
## Pipeline: sort + dedup + limit

These three operations compose into a common pattern: sort to establish order, deduplicate on a key (keeping the desired occurrence based on that order), then cap the result size.
```yaml
name: recent-unique-signups
version: 1

nodes:
  load-signups:
    type: file.source
    path: signups.csv
    format: csv

  by-date:
    type: data.sort
    config:
      by:
        - field: signed_up_at
          direction: desc

  one-per-email:
    type: data.dedup
    config:
      on: [email]
      keep: first

  top-100:
    type: data.limit
    config:
      count: 100

edges:
  - "load-signups.data -> by-date.input"
  - "by-date.output -> one-per-email.input"
  - "one-per-email.output -> top-100.input"
```

This pipeline loads signups, sorts newest first, keeps only the most recent signup per email address, and returns the top 100.
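The `desc` sort plus `keep: first` pairing is deliberate. Assuming `data.dedup` preserves the input order of surviving rows, an ascending sort with `keep: last` would retain the same row per email but leave the output oldest-first, so the trailing limit would return the 100 oldest unique signups instead of the 100 newest. A sketch of that variant for contrast (drop-in replacements for the two middle nodes above):

```yaml
# Same row survives per email, but the output is oldest-first,
# which changes what the downstream data.limit returns.
by-date:
  type: data.sort
  config:
    by:
      - field: signed_up_at
        direction: asc

one-per-email:
  type: data.dedup
  config:
    on: [email]
    keep: last
```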