data.filter

data.filter keeps rows that match a set of conditions. Rows that fail the conditions are dropped. The output schema is identical to the input — same fields, fewer rows.

Input: Table
Output: Table (same schema)

nodes:
  active-users:
    type: data.filter
    config:
      conditions:
        all:
          - field: status
            op: equals
            value: active

Conditions are wrapped in all (AND) or any (OR). Groups nest arbitrarily.

# AND: all conditions must match
conditions:
  all:
    - field: age
      op: gte
      value: 18
    - field: country
      op: equals
      value: DE

# OR: at least one condition must match
conditions:
  any:
    - field: role
      op: equals
      value: admin
    - field: role
      op: equals
      value: editor

# Nested: (status = active) AND (role = admin OR role = editor)
conditions:
  all:
    - field: status
      op: equals
      value: active
    - any:
        - field: role
          op: equals
          value: admin
        - field: role
          op: equals
          value: editor
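
The nesting rule above can be read as a small recursive evaluation. The following Python sketch is only an illustration of that reading, not the engine's actual implementation: the `matches` function, the `LEAF_OPS` table, and the sample rows are hypothetical, and only `equals` and `gte` are modelled.

```python
from typing import Any, Callable

# Hypothetical leaf operators; only two are modelled in this sketch.
LEAF_OPS: dict[str, Callable[[Any, Any], bool]] = {
    "equals": lambda actual, expected: actual == expected,
    "gte": lambda actual, expected: actual is not None and actual >= expected,
}

def matches(row: dict, cond: dict) -> bool:
    """Return True if `row` satisfies `cond` (a leaf or an all/any group)."""
    if "all" in cond:  # AND group: every child must match
        return all(matches(row, child) for child in cond["all"])
    if "any" in cond:  # OR group: at least one child must match
        return any(matches(row, child) for child in cond["any"])
    # Leaf condition: look up the operator and compare the field value.
    return LEAF_OPS[cond["op"]](row.get(cond["field"]), cond.get("value"))

# (status = active) AND (role = admin OR role = editor)
nested = {"all": [
    {"field": "status", "op": "equals", "value": "active"},
    {"any": [
        {"field": "role", "op": "equals", "value": "admin"},
        {"field": "role", "op": "equals", "value": "editor"},
    ]},
]}

rows = [
    {"status": "active", "role": "admin"},
    {"status": "active", "role": "viewer"},
    {"status": "inactive", "role": "editor"},
]
kept = [row for row in rows if matches(row, nested)]
```

Only the first row survives: the second fails the inner `any` group and the third fails the `status` check. In Python an empty `all` group is true and an empty `any` group is false; whether data.filter treats empty groups the same way is not stated here.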

| Operator | Description | Value type |
|---|---|---|
| equals | Exact match | any |
| not_equals | Not equal | any |
| gt | Greater than | number, string |
| gte | Greater than or equal | number, string |
| lt | Less than | number, string |
| lte | Less than or equal | number, string |
| contains | Substring match | string |
| not_contains | No substring match | string |
| starts_with | Prefix match | string |
| ends_with | Suffix match | string |
| in | Value in list | array |
| not_in | Value not in list | array |
| is_null | Field is null | (none) |
| is_not_null | Field is not null | (none) |
| matches | Regex match | string (regex pattern) |

# Numeric comparison
- field: score
  op: gte
  value: 0.75

# List membership
- field: country
  op: in
  value: [DE, AT, CH]

# Substring search
- field: email
  op: contains
  value: "@company.com"

# Prefix match
- field: sku
  op: starts_with
  value: "PROD-"

# Null check (no value needed)
- field: phone
  op: is_not_null

# Regex match
- field: zip_code
  op: matches
  value: "^[0-9]{5}$"
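
To make the operator semantics concrete, here is a rough Python model of the operator list. It is a sketch under assumptions rather than the real evaluator: the `OPERATORS` table and `check` helper are hypothetical names, null values are assumed to fail comparison and string operators, and `matches` is assumed to behave like an unanchored regex search. None of these details are confirmed by this page.

```python
import re
from typing import Any, Callable

def _is_str(x: Any) -> bool:
    return isinstance(x, str)

# Operator name -> predicate(actual, expected). Null handling and regex
# anchoring are assumptions of this sketch, not documented behavior.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "equals": lambda a, v: a == v,
    "not_equals": lambda a, v: a != v,
    "gt": lambda a, v: a is not None and a > v,
    "gte": lambda a, v: a is not None and a >= v,
    "lt": lambda a, v: a is not None and a < v,
    "lte": lambda a, v: a is not None and a <= v,
    "contains": lambda a, v: _is_str(a) and v in a,
    "not_contains": lambda a, v: _is_str(a) and v not in a,
    "starts_with": lambda a, v: _is_str(a) and a.startswith(v),
    "ends_with": lambda a, v: _is_str(a) and a.endswith(v),
    "in": lambda a, v: a in v,
    "not_in": lambda a, v: a not in v,
    "is_null": lambda a, v: a is None,          # value is ignored
    "is_not_null": lambda a, v: a is not None,  # value is ignored
    "matches": lambda a, v: _is_str(a) and re.search(v, a) is not None,
}

def check(row: dict, cond: dict) -> bool:
    """Evaluate one leaf condition against one row."""
    actual = row.get(cond["field"])
    return OPERATORS[cond["op"]](actual, cond.get("value"))

# Hypothetical row used to exercise the example conditions.
row = {"score": 0.8, "country": "AT", "email": "kim@company.com",
       "sku": "PROD-17", "phone": None, "zip_code": "10115"}

passes_score = check(row, {"field": "score", "op": "gte", "value": 0.75})
has_phone = check(row, {"field": "phone", "op": "is_not_null"})
```

With this row, `passes_score` is true and `has_phone` is false, because `phone` is null.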

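Filter values can come from upstream nodes instead of static config. When a field is marked as a promoted port, its value is read from an incoming edge at runtime. In the example below, the placeholder {{threshold}} names the port that the edge threshold.value -> high-scores.threshold feeds.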

nodes:
  threshold:
    type: value.literal
    config:
      value: 0.8
      type: number
  high-scores:
    type: data.filter
    config:
      conditions:
        all:
          - field: score
            op: gte
            value: "{{threshold}}"
edges:
  - threshold.value -> high-scores.threshold
  # scores-table is not defined in this snippet; it stands for any
  # upstream node that outputs a Table with a score field.
  - scores-table.output -> high-scores.input

This makes the filter threshold dynamic — change the upstream value and the filter adapts.
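
One way to picture this is as a substitution step that runs before the filter is evaluated. The Python sketch below is purely illustrative and assumes a `{{name}}` placeholder is replaced by the value arriving on the port of the same name. The `resolve` and `high_scores` functions and the sample rows are hypothetical, and the engine's actual resolution mechanism is not described on this page.

```python
import re
from typing import Any

# Matches a value that consists solely of a {{port_name}} placeholder.
PLACEHOLDER = re.compile(r"^\{\{\s*(\w+)\s*\}\}$")

def resolve(cond: dict, ports: dict[str, Any]) -> dict:
    """Return a copy of `cond` with placeholders replaced by port values."""
    for key in ("all", "any"):
        if key in cond:  # recurse into nested groups
            return {key: [resolve(child, ports) for child in cond[key]]}
    value = cond.get("value")
    if isinstance(value, str):
        found = PLACEHOLDER.match(value)
        if found:
            return {**cond, "value": ports[found.group(1)]}
    return dict(cond)

template = {"all": [{"field": "score", "op": "gte", "value": "{{threshold}}"}]}
rows = [{"score": 0.9}, {"score": 0.6}]

def high_scores(threshold: float) -> list[dict]:
    """Resolve the template for one upstream value, then apply the filter."""
    leaf = resolve(template, {"threshold": threshold})["all"][0]
    return [row for row in rows if row["score"] >= leaf["value"]]

strict = high_scores(0.8)   # keeps only the 0.9 row
relaxed = high_scores(0.5)  # keeps both rows
```

The filter definition itself never changes; only the value supplied on the port does.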

nodes:
  raw-leads:
    type: file.csv
    config:
      path: leads.csv
  qualified:
    type: data.filter
    config:
      conditions:
        all:
          - field: email
            op: is_not_null
          - field: score
            op: gte
            value: 50
          - field: source
            op: not_in
            value: [spam, test]
edges:
  - raw-leads.output -> qualified.input
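
As a rough check on what this pipeline keeps, the same three conditions can be written as a plain Python predicate. The sample leads below are invented for illustration (the contents of leads.csv are not shown here), and the sketch assumes `score` has already been parsed as a number.

```python
# Hypothetical rows standing in for the contents of leads.csv.
leads = [
    {"email": "a@example.com", "score": 72, "source": "webinar"},
    {"email": None,            "score": 90, "source": "webinar"},
    {"email": "b@example.com", "score": 30, "source": "ads"},
    {"email": "c@example.com", "score": 65, "source": "spam"},
]

def is_qualified(lead: dict) -> bool:
    """Mirror of the `all` group: every condition must hold."""
    return (
        lead["email"] is not None                    # email is_not_null
        and lead["score"] >= 50                      # score gte 50
        and lead["source"] not in ["spam", "test"]   # source not_in [spam, test]
    )

qualified = [lead for lead in leads if is_qualified(lead)]
```

Only the first lead passes all three conditions. The surviving row keeps every field it had on input, which matches the rule that the output schema equals the input schema.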