# sort / limit / dedup
Three operations for shaping result sets. They do one thing each and chain together cleanly.
## data.sort

Orders rows by one or more fields.
- **Input:** Table
- **Output:** Table (same schema, reordered)

### Config reference

| Field | Type | Required | Description |
|---|---|---|---|
| `by` | array | yes | Sort fields, each with `field` and optional `direction` |
| `null_handling` | string | no | Where nulls appear: `first` or `last` (default: `last`) |
Each entry in `by` has:

| Field | Type | Default | Description |
|---|---|---|---|
| `field` | string | (required) | Column to sort on |
| `direction` | string | `asc` | Sort direction: `asc` or `desc` |
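The plain-Python model below shows how `by`, `direction` and `null_handling` interact. It is a sketch, not the engine's implementation, and it rests on several assumptions:

- rows are dicts;
- a missing field counts as null;
- the first entry in `by` is the primary key;
- ties keep their input order (a stable sort);
- `null_handling` places nulls the same way regardless of `direction`.

The function name `sort_rows` is hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def sort_rows(rows: list[Row], by: list[dict[str, str]], null_handling: str = "last") -> list[Row]:
    """Hypothetical model of data.sort semantics (not the engine's code)."""
    result = list(rows)  # never mutate the input table
    # Sort by the least significant key first. Each pass is stable, so the
    # final pass (the first `by` entry) becomes the primary ordering.
    for spec in reversed(by):
        field = spec["field"]
        descending = spec.get("direction", "asc") == "desc"
        # Assumption: nulls are split out so their position depends only on
        # null_handling, not on direction. Both groups keep their current order.
        present = [r for r in result if r.get(field) is not None]
        nulls = [r for r in result if r.get(field) is None]
        present.sort(key=lambda r: r[field], reverse=descending)  # reverse=True stays stable
        result = nulls + present if null_handling == "first" else present + nulls
    return result


rows = [
    {"id": 1, "priority": 2, "created_at": "2024-01-01"},
    {"id": 2, "priority": 1, "created_at": "2024-01-05"},
    {"id": 3, "priority": None, "created_at": "2024-01-03"},
    {"id": 4, "priority": 1, "created_at": "2024-01-09"},
]
by = [{"field": "priority", "direction": "asc"}, {"field": "created_at", "direction": "desc"}]
print([r["id"] for r in sort_rows(rows, by, null_handling="first")])  # [3, 4, 2, 1]
```

Rows 4 and 2 share priority 1. The secondary key (`created_at`, descending) then puts the newer row first.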
### Examples

```yaml
nodes:
  by-date:
    type: data.sort
    config:
      by:
        - field: created_at
          direction: desc
```

```yaml
# Multi-field sort with nulls first
nodes:
  ranked:
    type: data.sort
    config:
      null_handling: first
      by:
        - field: priority
          direction: asc
        - field: created_at
          direction: desc
```

## data.limit

Takes the first N rows, optionally skipping an offset.
- **Input:** Table
- **Output:** Table (same schema, at most N rows)

### Config reference

| Field | Type | Required | Description |
|---|---|---|---|
| `count` | number | yes | Maximum rows to return |
| `offset` | number | no | Rows to skip before taking (default: `0`) |
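In list terms, `count` and `offset` behave like a slice. The minimal sketch below assumes rows are dicts in a Python list. It treats negative values as errors, which is an assumption: this page does not say how the real node handles them.

The helpers `limit_rows` and `page_offset` are hypothetical. `page_offset` shows the arithmetic behind a 1-based page number.

```python
from typing import Any

Row = dict[str, Any]


def limit_rows(rows: list[Row], count: int, offset: int = 0) -> list[Row]:
    """Hypothetical model of data.limit: skip `offset` rows, then take up to `count`."""
    # Assumption: negative values are invalid. Python slicing would otherwise
    # count from the end of the list, which is not what a limit means.
    if count < 0 or offset < 0:
        raise ValueError("count and offset must be non-negative")
    # Slicing past the end is safe: a short table just yields fewer rows.
    return rows[offset:offset + count]


def page_offset(page: int, page_size: int) -> int:
    """Hypothetical helper: offset for a 1-based page number."""
    return (page - 1) * page_size


rows = [{"n": i} for i in range(25)]
print(page_offset(3, 10))  # 20, as in the page-3 example
print([r["n"] for r in limit_rows(rows, 10, offset=20)])  # [20, 21, 22, 23, 24]
```

The last page returns only five rows, which matches "at most N rows" in the output description.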
### Examples

```yaml
nodes:
  top-10:
    type: data.limit
    config:
      count: 10
```

```yaml
# Pagination: skip 20, take 10
nodes:
  page-3:
    type: data.limit
    config:
      count: 10
      offset: 20
```

## data.dedup

Removes duplicate rows based on key fields. Keeps either the first or last occurrence in input order.
- **Input:** Table
- **Output:** Table (same schema, duplicates removed)

### Config reference

| Field | Type | Required | Description |
|---|---|---|---|
| `on` | array | yes | Fields to deduplicate on; rows that match on all of them are duplicates |
| `keep` | string | no | Which duplicate to keep: `first` or `last` (default: `first`) |
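The sketch below models `on` and `keep` in plain Python. It is not the engine's code, and it makes these assumptions:

- rows are dicts and key values are hashable scalars;
- a missing field reads as null, and nulls compare equal to each other;
- surviving rows stay at their original positions, so output order follows input order.

The function name `dedup_rows` is hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def dedup_rows(rows: list[Row], on: list[str], keep: str = "first") -> list[Row]:
    """Hypothetical model of data.dedup (not the engine's code)."""

    def key(row: Row) -> tuple[Any, ...]:
        # Rows are duplicates when every field in `on` matches.
        return tuple(row.get(f) for f in on)

    if keep == "first":
        seen: set[tuple[Any, ...]] = set()
        out = []
        for row in rows:
            k = key(row)
            if k not in seen:
                seen.add(k)
                out.append(row)
        return out
    if keep == "last":
        # Later indices overwrite earlier ones, leaving each key's last occurrence...
        last_index = {key(row): i for i, row in enumerate(rows)}
        kept = set(last_index.values())
        # ...which are then emitted at their original input positions.
        return [row for i, row in enumerate(rows) if i in kept]
    raise ValueError(f"keep must be 'first' or 'last', got {keep!r}")


events = [
    {"user_id": "a", "v": 1},
    {"user_id": "b", "v": 2},
    {"user_id": "a", "v": 3},
]
print([r["v"] for r in dedup_rows(events, ["user_id"])])  # [1, 2]
print([r["v"] for r in dedup_rows(events, ["user_id"], keep="last")])  # [2, 3]
```

Because dedup works on input order only, `keep: last` returns the *most recent* row per key only if the input is already in chronological order. Otherwise, sort first.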
### Examples

```yaml
nodes:
  unique-emails:
    type: data.dedup
    config:
      on: [email]
```

```yaml
# Keep the most recent entry per user (assumes input is in chronological order)
nodes:
  latest-per-user:
    type: data.dedup
    config:
      on: [user_id]
      keep: last
```

## Pipeline: sort, dedup, limit

These three operations compose into a common pattern: sort to establish order, deduplicate on a key (keeping the desired occurrence based on that order), then cap the result size.
```yaml
nodes:
  load-signups:
    type: file.csv
    config:
      path: signups.csv

  by-date:
    type: data.sort
    config:
      by:
        - field: signed_up_at
          direction: desc

  one-per-email:
    type: data.dedup
    config:
      on: [email]
      keep: first

  top-100:
    type: data.limit
    config:
      count: 100

edges:
  - load-signups.output -> by-date.input
  - by-date.output -> one-per-email.input
  - one-per-email.output -> top-100.input
```

This pipeline works in three steps:

1. It loads the signups and sorts them newest first.
2. It keeps only the most recent signup per email address.
3. It returns the 100 most recent of those.
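The plain-Python version below makes the step order concrete. It is a sketch with made-up data:

- The CSV contents, the `pipeline` function and its `limit` parameter are illustrative, not part of the tool.
- `csv.DictReader` reads every value as a string.
- Timestamps are therefore compared as ISO 8601 strings. This orders them chronologically only when all of them share one format and time zone.

```python
import csv
import io

# Made-up stand-in for signups.csv (hypothetical data).
SIGNUPS_CSV = """email,signed_up_at
ann@example.com,2024-03-01T09:00:00
bob@example.com,2024-03-02T10:30:00
ann@example.com,2024-03-05T08:15:00
cat@example.com,2024-03-04T12:00:00
"""


def pipeline(text: str, limit: int = 100) -> list[dict[str, str]]:
    # load-signups: parse CSV rows into dicts.
    rows = list(csv.DictReader(io.StringIO(text)))
    # by-date: newest first (same-format ISO 8601 strings sort chronologically).
    rows.sort(key=lambda r: r["signed_up_at"], reverse=True)
    # one-per-email: the first occurrence per email is now the most recent one.
    seen: set[str] = set()
    unique = []
    for r in rows:
        if r["email"] not in seen:
            seen.add(r["email"])
            unique.append(r)
    # top-100: cap the result size.
    return unique[:limit]


for r in pipeline(SIGNUPS_CSV):
    print(r["email"], r["signed_up_at"])
```

Swapping the first two steps would change the result. Dedup would keep whichever signup for an email appears first in the file, which is not necessarily the newest. Sorting first is what makes `keep: first` mean "most recent".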