data.filter

Keep rows that match a condition. Drop the rest.

Input: Table | Output: Table (same schema, same or fewer rows)

Minimal example

Filter customers by country:

nodes:
  german-customers:
    type: data.filter
    config:
      conditions:
        all:
          - field: country
            op: equals
            value: DE

Config reference

Field	Type	Required	Description
`conditions`	object	yes	Condition group (`all` or `any`) containing filter rules

Condition groups

Conditions are wrapped in all (AND) or any (OR). Groups nest arbitrarily.

# AND — all conditions must match
conditions:
  all:
    - field: age
      op: gte
      value: 18
    - field: country
      op: equals
      value: DE

# OR — at least one condition must match
conditions:
  any:
    - field: role
      op: equals
      value: admin
    - field: role
      op: equals
      value: editor

# Nested — (status = active) AND (role = admin OR role = editor)
conditions:
  all:
    - field: status
      op: equals
      value: active
    - any:
        - field: role
          op: equals
          value: admin
        - field: role
          op: equals
          value: editor
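The nesting rules work as a recursive evaluation. An all group matches when every child matches, and an any group matches when at least one child does. The Python sketch below models this over rows represented as dicts and assumes a NULL field never matches. It is an illustration, not the node's implementation. evaluate and keep_rows are hypothetical names, and only the equals operator is modeled.

```python
from typing import Any

Row = dict[str, Any]
Condition = dict[str, Any]


def evaluate(condition: Condition, row: Row) -> bool:
    """Evaluate a group or a single rule against one row (hypothetical model)."""
    if "all" in condition:
        # AND: every child (rule or nested group) must match
        return all(evaluate(child, row) for child in condition["all"])
    if "any" in condition:
        # OR: at least one child must match
        return any(evaluate(child, row) for child in condition["any"])
    # Leaf rule; only `equals` is modeled in this sketch
    if condition["op"] != "equals":
        raise ValueError(f"operator not modeled in this sketch: {condition['op']}")
    actual = row.get(condition["field"])
    # Assumption: a comparison against NULL (None) never matches
    return actual is not None and actual == condition["value"]


def keep_rows(conditions: Condition, rows: list[Row]) -> list[Row]:
    return [row for row in rows if evaluate(conditions, row)]


# (status = active) AND (role = admin OR role = editor)
nested = {
    "all": [
        {"field": "status", "op": "equals", "value": "active"},
        {"any": [
            {"field": "role", "op": "equals", "value": "admin"},
            {"field": "role", "op": "equals", "value": "editor"},
        ]},
    ]
}

rows = [
    {"status": "active", "role": "admin"},
    {"status": "active", "role": "viewer"},
    {"status": "inactive", "role": "editor"},
]
print(keep_rows(nested, rows))  # [{'status': 'active', 'role': 'admin'}]
```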

Operators

Operator	Description	Value type
`equals`	Exact match	any
`not_equals`	Not equal	any
`gt`	Greater than	number, string
`gte`	Greater than or equal	number, string
`lt`	Less than	number, string
`lte`	Less than or equal	number, string
`contains`	Substring match	string
`not_contains`	No substring match	string
`starts_with`	Prefix match	string
`ends_with`	Suffix match	string
`in`	Value in list	array
`not_in`	Value not in list	array
`is_null`	Field is null	(none)
`is_not_null`	Field is not null	(none)
`matches`	Regex match	string (regex pattern)
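The operator table maps onto a dispatch table that links each operator name to a predicate. The Python sketch below is a hypothetical model, not the node's code, and OPERATORS and check are illustrative names. The table leaves several behaviors unspecified, so the sketch assumes them: gt, gte, lt, and lte compare strings lexicographically, the substring and prefix/suffix operators are case-sensitive, and matches searches anywhere in the value (re.search).

```python
import operator
import re
from typing import Any, Callable

# Hypothetical model: each operator is a predicate over (field value, configured value).
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "gt": operator.gt,   # assumption: strings compare lexicographically
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "contains": lambda actual, expected: expected in actual,  # assumption: case-sensitive
    "not_contains": lambda actual, expected: expected not in actual,
    "starts_with": lambda actual, expected: actual.startswith(expected),
    "ends_with": lambda actual, expected: actual.endswith(expected),
    "in": lambda actual, expected: actual in expected,
    "not_in": lambda actual, expected: actual not in expected,
    # assumption: the pattern may match anywhere in the value
    "matches": lambda actual, expected: re.search(expected, actual) is not None,
}


def check(actual: Any, op: str, expected: Any = None) -> bool:
    """Return True if a single rule matches one field value."""
    # The null operators take no value.
    if op == "is_null":
        return actual is None
    if op == "is_not_null":
        return actual is not None
    # Every other comparison against NULL is never true.
    if actual is None:
        return False
    return OPERATORS[op](actual, expected)


print(check("DE", "in", ["DE", "AT", "CH"]))                  # True
print(check("ops@company.com", "contains", "@company.com"))  # True
print(check(None, "not_equals", "admin"))                    # False
```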

Progressive examples

Simple equality

conditions:
  all:
    - field: status
      op: equals
      value: active

Range comparison

conditions:
  all:
    - field: score
      op: gte
      value: 50
    - field: score
      op: lt
      value: 100
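The two rules describe a half-open range, so 50 is kept and 100 is not. Here is a quick Python check of the boundaries, using in_score_range as a hypothetical helper:

```python
def in_score_range(score: float) -> bool:
    # Hypothetical helper mirroring: score gte 50 AND score lt 100
    return 50 <= score < 100


print([s for s in [49, 50, 99.5, 100] if in_score_range(s)])  # [50, 99.5]
```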

Multiple conditions

conditions:
  all:
    - field: email
      op: is_not_null
    - field: score
      op: gte
      value: 50
    - field: source
      op: not_in
      value: [spam, test]

NULL handling

# Keep rows where phone is not null
conditions:
  all:
    - field: phone
      op: is_not_null

# Keep rows where phone IS null
conditions:
  all:
    - field: phone
      op: is_null

Regex match

conditions:
  all:
    - field: zip_code
      op: matches
      value: "^[0-9]{5}$"
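The pattern is anchored with ^ and $. This page does not say whether matches looks for the pattern anywhere in the value or requires the whole value to match. With both anchors in place, the result is the same either way. The Python sketch below shows the difference using the standard re module: re.search matches anywhere, and re.fullmatch requires the whole string. matches_anywhere and matches_whole are illustrative helpers.

```python
import re


def matches_anywhere(pattern: str, value: str) -> bool:
    # Illustrative: pattern may match any substring
    return re.search(pattern, value) is not None


def matches_whole(pattern: str, value: str) -> bool:
    # Illustrative: pattern must match the entire value
    return re.fullmatch(pattern, value) is not None


anchored = r"^[0-9]{5}$"
unanchored = r"[0-9]{5}"

for value in ["10115", "10115-1234", "ABC12345"]:
    print(
        value,
        matches_anywhere(anchored, value),
        matches_whole(anchored, value),
        matches_anywhere(unanchored, value),
    )
# 10115 True True True
# 10115-1234 False False True
# ABC12345 False False True
```

Without the anchors, any value that contains five consecutive digits would pass under search-style matching.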

List membership

conditions:
  all:
    - field: country
      op: in
      value: [DE, AT, CH]

Substring search

conditions:
  all:
    - field: email
      op: contains
      value: "@company.com"

Promoted ports

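A condition's value can come from an upstream node instead of static config, which makes thresholds dynamic. In this example, the {{threshold}} placeholder is filled from the filter's threshold port, which receives the value of the threshold node. The table itself arrives on the input port from scores-table, an upstream node not shown here.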

nodes:
  threshold:
    type: value.literal
    config:
      value: 0.8
      type: number

  high-scores:
    type: data.filter
    config:
      conditions:
        all:
          - field: score
            op: gte
            value: "{{threshold}}"

edges:
  - "threshold.value -> high-scores.threshold"
  - "scores-table.output -> high-scores.input"

To change the threshold, change the upstream value. The filter's config stays the same.
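Here is a minimal sketch of the substitution step in Python. It walks the config and replaces any string that is exactly a {{name}} placeholder with the value received on the port of that name. resolve_placeholders and the ports dict are hypothetical. The sketch assumes a whole-value placeholder keeps the upstream type, so 0.8 stays a number instead of becoming the string "0.8". The node's actual resolution rules are not documented here.

```python
import re
from typing import Any

# Hypothetical: a string that is exactly "{{port-name}}"
PLACEHOLDER = re.compile(r"^\{\{\s*([\w-]+)\s*\}\}$")


def resolve_placeholders(node: Any, ports: dict[str, Any]) -> Any:
    """Replace "{{name}}" strings with the value received on port `name` (hypothetical model)."""
    if isinstance(node, dict):
        return {key: resolve_placeholders(value, ports) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_placeholders(item, ports) for item in node]
    if isinstance(node, str):
        found = PLACEHOLDER.match(node)
        if found:
            # Assumption: a whole-value placeholder keeps the upstream type
            return ports[found.group(1)]
    return node


conditions = {"all": [{"field": "score", "op": "gte", "value": "{{threshold}}"}]}
resolved = resolve_placeholders(conditions, {"threshold": 0.8})
print(resolved)  # {'all': [{'field': 'score', 'op': 'gte', 'value': 0.8}]}
```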

Edge cases

Empty result (all rows filtered out). If no rows match, the output is still valid: an empty NDJSON file (zero lines) with the same schema as the input. Downstream nodes receive an empty table.
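NDJSON holds one JSON object per line, so a table with zero rows serializes to a file with zero lines. Here is a minimal Python sketch of that serialization, with to_ndjson as a hypothetical helper. How the node records the schema alongside the file is not covered here.

```python
import json


def to_ndjson(rows: list[dict]) -> str:
    # Hypothetical helper: one JSON object per line, each line ending in a newline
    return "".join(json.dumps(row) + "\n" for row in rows)


print(repr(to_ndjson([{"id": 1, "country": "DE"}])))  # '{"id": 1, "country": "DE"}\n'
print(repr(to_ndjson([])))  # ''
```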

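NULL comparisons. Comparisons against NULL follow SQL semantics. A comparison involving NULL (including NULL = NULL) evaluates to UNKNOWN rather than true, so the row is dropped. This also applies to not_equals and not_in: neither keeps a row whose field is NULL. Use the is_null and is_not_null operators for null checks.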
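The Python sketch below models SQL's three-valued logic to show why not_equals drops NULL rows, with None standing for UNKNOWN. The helper names are hypothetical. The sketch also demonstrates a workaround that follows from these semantics: to keep NULL rows, pair not_equals with is_null inside an any group.

```python
from typing import Optional

# SQL-style three-valued logic: True, False, or None standing for UNKNOWN.
TriBool = Optional[bool]


def sql_not_equals(actual: Optional[str], expected: str) -> TriBool:
    if actual is None:
        return None  # NULL <> 'admin' is UNKNOWN, not TRUE
    return actual != expected


def sql_or(left: TriBool, right: TriBool) -> TriBool:
    if left is True or right is True:
        return True
    if left is None or right is None:
        return None
    return False


def keep(result: TriBool) -> bool:
    # A filter keeps a row only when its condition is TRUE; UNKNOWN drops it.
    return result is True


roles = ["admin", "editor", None]

# role not_equals admin
print([r for r in roles if keep(sql_not_equals(r, "admin"))])  # ['editor']

# any: [role not_equals admin, role is_null]
print([r for r in roles if keep(sql_or(sql_not_equals(r, "admin"), r is None))])  # ['editor', None]
```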

Pipeline example

name: qualified-leads
version: 1

nodes:
  raw-leads:
    type: file.source
    config:
      path: leads.csv
      format: csv

  qualified:
    type: data.filter
    config:
      conditions:
        all:
          - field: email
            op: is_not_null
          - field: score
            op: gte
            value: 50
          - field: source
            op: not_in
            value: [spam, test]

edges:
  - "raw-leads.data -> qualified.input"
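The qualified-leads pipeline can be traced end to end in plain Python. This is an illustration, not how the pipeline runs. The CSV content is made-up sample data standing in for leads.csv. The sketch also assumes that empty CSV cells are read as NULL and that score is parsed as a number. This page specifies neither behavior, and if score stayed a string, gte 50 would not be a numeric comparison.

```python
import csv
import io
import json

# Made-up sample data standing in for leads.csv (illustration only).
LEADS_CSV = """email,score,source
ana@example.com,72,web
,90,web
bo@example.com,35,referral
cy@example.com,88,spam
"""


def load_rows(text: str) -> list[dict]:
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Assumption: empty cells become NULL (None) and score is parsed as a number.
        row = {key: (value if value != "" else None) for key, value in record.items()}
        if row["score"] is not None:
            row["score"] = float(row["score"])
        rows.append(row)
    return rows


def is_qualified(row: dict) -> bool:
    # all: email is_not_null, score gte 50, source not_in [spam, test]
    # Comparisons against NULL never match, mirroring the SQL semantics above.
    return (
        row["email"] is not None
        and row["score"] is not None
        and row["score"] >= 50
        and row["source"] is not None
        and row["source"] not in ["spam", "test"]
    )


qualified = [row for row in load_rows(LEADS_CSV) if is_qualified(row)]
for row in qualified:
    print(json.dumps(row))
# {"email": "ana@example.com", "score": 72.0, "source": "web"}
```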