SQL Transforms
Write raw SQL when built-in ops are not enough.
transform.sql gives you the full DuckDB SQL dialect — window functions, CTEs, JSON extraction, regex, date arithmetic — while staying connected to the pipeline graph through ref() references.
Input: One or more Tables (via ref()) | Output: Table
Minimal example
Section titled “Minimal example”nodes: top-customers: type: transform.sql config: query: | SELECT customer_id, sum(amount) AS total_spend, count(*) AS order_count FROM ref('load-orders') GROUP BY customer_id HAVING sum(amount) > 1000 ORDER BY total_spend DESCConfig reference
Section titled “Config reference”| Field | Type | Required | Description |
|---|---|---|---|
query | string | yes | Full DuckDB SQL query using ref() to reference upstream nodes |
The ref() function
Section titled “The ref() function”ref('node_id') references the output of an upstream node. Radhflow resolves it to the node’s NDJSON output file at execution time. Edges are created automatically from ref() calls — you do not need to declare them manually.
-- Single sourceSELECT * FROM ref('clean-data') WHERE status = 'active'
-- Multiple sourcesSELECT a.*, b.category_nameFROM ref('orders') aJOIN ref('categories') b ON a.category_id = b.idNode IDs in ref() must match existing node IDs in the pipeline. The type checker validates these references at parse time.
DuckDB SQL dialect notes
Section titled “DuckDB SQL dialect notes”DuckDB supports most PostgreSQL syntax plus extensions for analytics. A few things to know:
- String literals use single quotes:
'active' - Identifiers with special characters use double quotes:
"my-column" ILIKEfor case-insensitive LIKESTRFTIMEfor date formattingLIST()aggregate for collecting values into arraysUNNEST()for expanding arrays into rowsSTRUCTfor nested objects- Full window function support:
ROW_NUMBER(),RANK(),LAG(),LEAD(), etc.
Progressive examples
Section titled “Progressive examples”Window functions
Section titled “Window functions”nodes: ranked-sales: type: transform.sql config: query: | SELECT *, ROW_NUMBER() OVER ( PARTITION BY region ORDER BY revenue DESC ) AS rank_in_region, revenue / SUM(revenue) OVER ( PARTITION BY region ) AS pct_of_region FROM ref('monthly-sales')Common Table Expressions (CTEs)
Section titled “Common Table Expressions (CTEs)”nodes: cohort-analysis: type: transform.sql config: query: | WITH first_purchase AS ( SELECT customer_id, MIN(order_date) AS cohort_date FROM ref('orders') GROUP BY customer_id ), labeled AS ( SELECT o.*, fp.cohort_date, DATE_DIFF('month', fp.cohort_date, o.order_date) AS months_since FROM ref('orders') o JOIN first_purchase fp ON o.customer_id = fp.customer_id ) SELECT cohort_date, months_since, COUNT(DISTINCT customer_id) AS customers, SUM(amount) AS revenue FROM labeled GROUP BY cohort_date, months_since ORDER BY cohort_date, months_sinceJSON extraction
Section titled “JSON extraction”nodes: parse-metadata: type: transform.sql config: query: | SELECT id, metadata->>'$.name' AS name, CAST(metadata->>'$.score' AS DOUBLE) AS score, json_array_length(metadata->'$.tags') AS tag_count FROM ref('raw-events')Regex and string functions
Section titled “Regex and string functions”nodes: extract-domains: type: transform.sql config: query: | SELECT email, regexp_extract(email, '@(.+)$', 1) AS domain, CASE WHEN email LIKE '%@gmail.com' THEN 'personal' WHEN email LIKE '%@company.com' THEN 'internal' ELSE 'other' END AS email_type FROM ref('contacts')Date functions
Section titled “Date functions”nodes: daily-metrics: type: transform.sql config: query: | SELECT DATE_TRUNC('day', created_at) AS day, COUNT(*) AS events, COUNT(DISTINCT user_id) AS unique_users, DATE_DIFF('hour', MIN(created_at), MAX(created_at)) AS active_hours FROM ref('event-log') GROUP BY DATE_TRUNC('day', created_at) ORDER BY dayMulti-source join
Section titled “Multi-source join”nodes: full-report: type: transform.sql config: query: | SELECT o.id AS order_id, c.name AS customer_name, p.name AS product_name, o.quantity, o.quantity * p.unit_price AS line_total FROM ref('orders') o LEFT JOIN ref('customers') c ON o.customer_id = c.id LEFT JOIN ref('products') p ON o.product_id = p.id WHERE o.status != 'cancelled' ORDER BY o.created_at DESCWhen to use SQL transforms
Section titled “When to use SQL transforms”| Scenario | Recommended approach |
|---|---|
| Filter rows by a condition | data.filter |
| Sort and take top N | data.sort + data.limit |
| Join two tables | data.join |
| Group and aggregate | data.group |
| Window ranking within groups | transform.sql |
| CTE-based multi-step logic | transform.sql |
| Multi-table join with conditions | transform.sql |
| Regex extraction + conditional logic | transform.sql |
Anything with HAVING, QUALIFY, or LATERAL | transform.sql |