- All six target load modes are now real (with
cdc_mirrorreserved for the streaming runtime in Phase 6). - Pre/post hooks fire around the load.
- Dataset-level validators run as parameterized SQL probes.
- Row-level rules evaluate in memory, with optional DLQ quarantine.
Target load modes
| Mode | When to use | What runs |
|---|---|---|
append | Insert new rows alongside existing data | Explicit-column projection from the staging rows into the target table (column-aware projection that nulls out empty strings) |
truncate | Replace all rows on every run | Truncates the target table and reloads from staging — wrapped in a single Postgres transaction so a failed reload rolls back the truncate. Aborts before any destructive statement runs when upstream produced 0 rows (see Empty-source guard below). |
upsert | Idempotent merge by primary-like key | INSERT … ON CONFLICT (key) DO UPDATE SET nonkey = EXCLUDED.nonkey (requires upsertKey; throws if every projected column is in upsertKey since there’s nothing to update) |
blue_green | Atomic swap with zero downtime | CREATE <table>_new (LIKE <table> INCLUDING ALL), then the same column-aware projection used by append/truncate/upsert inserts into <table>_new, then a 3-step rename triple swaps it in and drops <table>_old. Aborts before the load when upstream produced 0 rows. |
incremental_watermark | Append-only with high-watermark filter | Reads the persisted high-water from pipeline_watermarks, INSERT-and-watermark-upsert wrapped in a single transaction so a failed load rolls back the watermark advance (requires watermarkColumn) |
cdc_mirror | Streaming change-data-capture mirror | Gated behind FLOW_STREAMING_ENABLED=true. When the env is off, throws “streaming disabled”. When on but no Debezium connector is configured, throws “Debezium connector not configured” with a docs link. |
Empty-source guard
WhenloadMode is truncate or blue_green and the upstream produced
0 rows, the runtime aborts the load before issuing the destructive
statement. The run fails with:
failOnEmptySource: false on the target node.
A side-effect: when an onMissAction: alert source comes up empty,
the source executor also emits a warning alert into the
System Health feed so operators see it in
the topbar widget without having to open the runs page.
Mode-switch validation
Picking a load mode in the UI surfaces inline errors before save when:upsertis chosen without anupsertKey, or with keys not in the destination schema, or with keys covering every column.incremental_watermarkis chosen without awatermarkColumn, or the column doesn’t exist in the destination schema.cdc_mirroris chosen without acdcSource.truncateis chosen on a “full-refresh-shaped” pipeline (exactly one source + one target). Soft warning recommendsblue_green; you can keeptruncateif intentional.
parsePipelineIR so the JSON IR
tab can’t bypass them.
Blue-green details
Blue-green loads the new dataset into a sibling<table>_new, runs the
post-hook (e.g., a stats-collection command or a materialized-view
refresh), then issues three sequential alter table … rename
statements:
<table>_new fails, <table>_new is dropped and the
upstream error rethrows so the existing target is untouched.
Incremental-watermark details
The high-water mark lives in thepipeline_watermarks table keyed by
(tenantId, pipelineId, targetSchema, targetTable). Each run:
- Reads the persisted
watermarkValue(falls back toSELECT MAX(<watermarkColumn>) FROM <target>on the first run after an upgrade so we don’t replay history). - Computes the new MAX from the staging table.
- Wraps the
INSERT … WHERE col > $1and the watermarkupsertin a single Postgres transaction so a failed INSERT rolls back the watermark advance.
0001-01-01 (timestamp) or 0 (integer/bigint) so the first
non-empty run loads everything.
Row-count anomaly detection (optional)
AddexpectedRowCountDelta to a target node to receive a warning
alert when today’s load is wildly outside the recent average:
windowRuns successful runs (default 7),
computes the running mean of rowsProcessed, and emits a warning
alert into the System Health feed if today’s load is outside the band.
The run still completes — anomaly detection never fails the run on its
own. Skipped when there are fewer than 3 prior successful runs.
Hooks
The IRpreHook and postHook slots accept four hook kinds. The
pre-hook runs before the load; the post-hook runs only on
successful load.
SQL hook safety
Hook SQL is admin-grade. The runtime forbids any statement starting with a data-modifying verb (drop, delete, update, truncate, insert,
merge, copy, grant, revoke, alter). Allowed verbs include
analyze, refresh, cluster, vacuum, set, explain, etc.
Embedded ; and SQL comments are also rejected.
Webhook safety
Webhook URLs flow throughlib/security/url-validator.validateAndResolve
— the same SSRF guard the rest of the platform uses. Private IP ranges
and DNS rebinding attempts are blocked before the fetch fires.
Dataset validators
{ ok, message?, observed? } result. The validate executor aggregates
them and applies each check’s onFail:
onFail | Behavior |
|---|---|
abort | Throws; the run fails |
warn | Pushes a string into the warnings array; downstream still runs |
Row-level rules
upstream.meta.rows). Each rule’s onFail:
onFail | Behavior |
|---|---|
abort | Throws on first violation |
warn | Logs; row continues downstream |
skip | Drops the row; recorded in failed[] for optional DLQ writes |
fieldType.expected supports: string, integer, float, boolean,
date, timestamp, json, uuid.
DLQ (quarantine)
skip- or warn-mode rule is
written to the configured DLQ table. The table is created idempotently
on first failed-row write:
inspectFlowError tool reads this table; the future “Errors”
UI will render the same data row by row.
Honest residuals
- Streaming CDC runtime —
cdc_mirroris gated byFLOW_STREAMING_ENABLED. Self-hosters with a real Debezium connector enable streaming and complete the wiring; playground keeps it off. custom_functionhook implementation — requires the plugin SDK.- Cross-load-mode transactions across hooks —
truncate,incremental_watermark,append, andupsertnow run their destructive + INSERT statements insideprisma.$transaction. Pre/post hooks still run in their own sessions, so a hook failing after a successful load won’t roll back the load. - Backup before destructive load — opt-in
backupBeforeLoadflag (lands in PR 5) creates a pre-load snapshot table for paranoid users; not yet shipped.
Related
- Pipeline IR — the typed schema target/validate executors operate on.
- File Ingestion — Phase 3 source-side companion.
- MCP Flow Server —
inspectFlowErrorreads the DLQ table.