data.filter

Keep rows that match a condition. Drop the rest.

Input: Table | Output: Table (same schema, same or fewer rows)

Filter customers by country:

nodes:
  german-customers:
    type: data.filter
    config:
      conditions:
        all:
          - field: country
            op: equals
            value: DE

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| conditions | object | yes | Condition group (all or any) containing filter rules |

Conditions are wrapped in all (AND) or any (OR). Groups nest arbitrarily.

# AND: all conditions must match
conditions:
  all:
    - field: age
      op: gte
      value: 18
    - field: country
      op: equals
      value: DE

# OR: at least one condition must match
conditions:
  any:
    - field: role
      op: equals
      value: admin
    - field: role
      op: equals
      value: editor

# Nested: (status = active) AND (role = admin OR role = editor)
conditions:
  all:
    - field: status
      op: equals
      value: active
    - any:
        - field: role
          op: equals
          value: admin
        - field: role
          op: equals
          value: editor
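
The nesting rule can be read as a small recursive evaluation. The following Python sketch is not the engine's implementation; the `matches` function and the dict layout are assumptions that mirror the YAML above, and only the `equals` operator is modelled.

```python
from typing import Any

Row = dict[str, Any]


def matches(node: dict[str, Any], row: Row) -> bool:
    """Evaluate a condition group or a single rule against one row."""
    if "all" in node:  # AND: every child must match
        return all(matches(child, row) for child in node["all"])
    if "any" in node:  # OR: at least one child must match
        return any(matches(child, row) for child in node["any"])
    # Leaf rule. Only `equals` is modelled to keep the sketch short.
    if node["op"] != "equals":
        raise ValueError(f"unsupported op in this sketch: {node['op']}")
    return row.get(node["field"]) == node["value"]


# (status = active) AND (role = admin OR role = editor)
conditions = {
    "all": [
        {"field": "status", "op": "equals", "value": "active"},
        {"any": [
            {"field": "role", "op": "equals", "value": "admin"},
            {"field": "role", "op": "equals", "value": "editor"},
        ]},
    ]
}

rows = [
    {"status": "active", "role": "admin"},
    {"status": "active", "role": "viewer"},
    {"status": "inactive", "role": "editor"},
]
kept = [row for row in rows if matches(conditions, row)]
print(kept)  # [{'status': 'active', 'role': 'admin'}]
```

Because a group is itself a valid child of another group, the same function handles any nesting depth.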

| Operator | Description | Value type |
|----------|-------------|------------|
| equals | Exact match | any |
| not_equals | Not equal | any |
| gt | Greater than | number, string |
| gte | Greater than or equal | number, string |
| lt | Less than | number, string |
| lte | Less than or equal | number, string |
| contains | Substring match | string |
| not_contains | No substring match | string |
| starts_with | Prefix match | string |
| ends_with | Suffix match | string |
| in | Value in list | array |
| not_in | Value not in list | array |
| is_null | Field is null | (none) |
| is_not_null | Field is not null | (none) |
| matches | Regex match | string (regex pattern) |
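
As a rough mental model, each operator behaves like a two-argument predicate over the field value and the configured value. The Python lookup table below is a hypothetical sketch, not the engine's code: `OPS` and `check` are illustrative names, NULL handling for comparison operators is not modelled, and treating `matches` as an unanchored regex search is an assumption.

```python
import operator
import re
from typing import Any, Callable

# Hypothetical lookup: operator name -> predicate(field_value, config_value).
OPS: dict[str, Callable[[Any, Any], bool]] = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "contains": lambda v, s: s in v,
    "not_contains": lambda v, s: s not in v,
    "starts_with": lambda v, s: v.startswith(s),
    "ends_with": lambda v, s: v.endswith(s),
    "in": lambda v, items: v in items,
    "not_in": lambda v, items: v not in items,
    "is_null": lambda v, _: v is None,
    "is_not_null": lambda v, _: v is not None,
    # Assumption: unanchored search; the real engine may anchor differently.
    "matches": lambda v, pattern: re.search(pattern, v) is not None,
}


def check(value: Any, op: str, expected: Any = None) -> bool:
    """Apply one operator to a field value."""
    return OPS[op](value, expected)


print(check(75, "gte", 50))                                # True
print(check("a@company.com", "contains", "@company.com"))  # True
print(check("DE", "in", ["DE", "AT", "CH"]))               # True
print(check("12345", "matches", "^[0-9]{5}$"))             # True
print(check(None, "is_null"))                              # True
```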

# Exact match on a single field
conditions:
  all:
    - field: status
      op: equals
      value: active

# Numeric range: 50 <= score < 100
conditions:
  all:
    - field: score
      op: gte
      value: 50
    - field: score
      op: lt
      value: 100

# Several rules combined with AND
conditions:
  all:
    - field: email
      op: is_not_null
    - field: score
      op: gte
      value: 50
    - field: source
      op: not_in
      value: [spam, test]

# Keep rows where phone is not null
conditions:
  all:
    - field: phone
      op: is_not_null

# Keep rows where phone IS null
conditions:
  all:
    - field: phone
      op: is_null

# Regex: five-digit zip code
conditions:
  all:
    - field: zip_code
      op: matches
      value: "^[0-9]{5}$"

# Membership in a list
conditions:
  all:
    - field: country
      op: in
      value: [DE, AT, CH]

# Substring match
conditions:
  all:
    - field: email
      op: contains
      value: "@company.com"

Filter values can come from upstream nodes instead of static config. This makes thresholds dynamic.

nodes:
  threshold:
    type: value.literal
    config:
      value: 0.8
      type: number
  high-scores:
    type: data.filter
    config:
      conditions:
        all:
          - field: score
            op: gte
            value: "{{threshold}}"
edges:
  - "threshold.value -> high-scores.threshold"
  - "scores-table.output -> high-scores.input"

Change the upstream value and the filter adapts.
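
One way to picture the substitution: before filtering, a `{{name}}` placeholder is swapped for the value arriving on the input of the same name. The Python sketch below is an assumption about that behaviour, not the engine's documented algorithm. `resolve` and `PLACEHOLDER` are hypothetical names, and the sketch only handles values that consist entirely of one placeholder with a word-character name.

```python
import re
from typing import Any

# Matches a value that is exactly one placeholder, e.g. "{{threshold}}".
PLACEHOLDER = re.compile(r"^\{\{\s*(\w+)\s*\}\}$")


def resolve(value: Any, inputs: dict[str, Any]) -> Any:
    """Replace a whole-string "{{name}}" placeholder with the named input value."""
    if isinstance(value, str):
        found = PLACEHOLDER.match(value)
        if found:
            return inputs[found.group(1)]  # KeyError if the input is not wired
    return value  # static values pass through unchanged


rule = {"field": "score", "op": "gte", "value": "{{threshold}}"}
# Delivered by the edge "threshold.value -> high-scores.threshold".
inputs = {"threshold": 0.8}

resolved = {**rule, "value": resolve(rule["value"], inputs)}
scores = [{"score": 0.95}, {"score": 0.42}, {"score": 0.8}]
kept = [row for row in scores if row["score"] >= resolved["value"]]

print(resolved["value"])  # 0.8
print(kept)               # [{'score': 0.95}, {'score': 0.8}]
```

Returning the raw input value rather than its string form keeps the threshold a number, so `gte` compares numerically.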

Empty result. If no rows match, the output is an empty NDJSON file (zero lines) with the same schema as the input.

All rows filtered. Same as empty result — valid output, zero rows. Downstream nodes receive an empty table.

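NULL comparisons. Comparisons against NULL follow SQL semantics: comparing NULL with anything never evaluates to true, so a rule such as equals or gte does not match a row whose field is NULL. Use the is_null and is_not_null operators for null checks.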
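
A practical consequence is that a filter and its apparent opposite can both drop the same row. The Python sketch below illustrates this with a hypothetical `compare` helper. It assumes that not_equals and the ordering operators treat NULL the same way as equals, as they would in SQL.

```python
from typing import Any


def compare(value: Any, op: str, expected: Any) -> bool:
    """Comparison in which a NULL (None) field value never matches."""
    if value is None:
        return False  # NULL compared with anything is not true
    if op == "equals":
        return value == expected
    if op == "not_equals":
        return value != expected
    if op == "lt":
        return value < expected
    if op == "gte":
        return value >= expected
    raise ValueError(f"unsupported op in this sketch: {op}")


rows = [
    {"id": 1, "score": 80},
    {"id": 2, "score": None},
    {"id": 3, "score": 20},
]

high = [r["id"] for r in rows if compare(r["score"], "gte", 50)]
low = [r["id"] for r in rows if compare(r["score"], "lt", 50)]
missing = [r["id"] for r in rows if r["score"] is None]  # what is_null selects

print(high)     # [1]
print(low)      # [3]
print(missing)  # [2]
```

Row 2 appears in neither `high` nor `low`. To keep such rows, add an is_null rule inside an any group alongside the comparison.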

name: qualified-leads
version: 1
nodes:
  raw-leads:
    type: file.source
    path: leads.csv
    format: csv
  qualified:
    type: data.filter
    config:
      conditions:
        all:
          - field: email
            op: is_not_null
          - field: score
            op: gte
            value: 50
          - field: source
            op: not_in
            value: [spam, test]
edges:
  - "raw-leads.data -> qualified.input"