- All six target load modes are now real (with
cdc_mirrorreserved for the streaming runtime in Phase 6). - Pre/post hooks fire around the load.
- Dataset-level validators run as parameterized SQL probes.
- Row-level rules evaluate in memory, with optional DLQ quarantine.
Target load modes
| Mode | When to use | What runs |
|---|---|---|
append | Insert new rows alongside existing data | INSERT INTO target SELECT * FROM source |
truncate | Replace all rows on every run | TRUNCATE then INSERT |
upsert | Idempotent merge by primary-like key | INSERT … ON CONFLICT (...) DO UPDATE (requires upsertKey) |
blue_green | Atomic swap with zero downtime | Load to <table>_new, then 3-step rename triple, drop <table>_old |
incremental_watermark | Append-only with high-watermark filter | SELECT MAX(<watermarkColumn>) then INSERT … WHERE … > $1 (requires watermarkColumn) |
cdc_mirror | Streaming change-data-capture mirror | Phase 6 stub — requires Flink/Debezium runtime; current runtime throws a clear scope-defer error |
Blue-green details
Blue-green loads the new dataset into a sibling<table>_new, runs the
post-hook (e.g., ANALYZE or REFRESH MATERIALIZED VIEW), then issues
three sequential ALTER TABLE ... RENAME statements:
<table>_new fails, <table>_new is dropped and the
upstream error rethrows so the existing target is untouched.
Incremental-watermark details
SELECT MAX(<watermarkColumn>) AS max FROM <target> produces the high
water; INSERT INTO <target> SELECT * FROM <source> WHERE <watermarkColumn> > $1
appends only newer rows. When the target is empty, the watermark
defaults to '0001-01-01' so the first run loads everything.
Hooks
The IRpreHook and postHook slots accept four hook kinds. The
pre-hook runs before the load; the post-hook runs only on
successful load.
SQL hook safety
Hook SQL is admin-grade. The runtime forbids any statement starting with a data-modifying verb (DROP, DELETE, UPDATE, TRUNCATE, INSERT,
MERGE, COPY, GRANT, REVOKE, ALTER). Allowed verbs include
ANALYZE, REFRESH, CLUSTER, VACUUM, SET, EXPLAIN, etc.
Embedded ; and SQL comments are also rejected.
Webhook safety
Webhook URLs flow throughlib/security/url-validator.validateAndResolve
— the same SSRF guard the rest of the platform uses. Private IP ranges
and DNS rebinding attempts are blocked before the fetch fires.
Dataset validators
{ ok, message?, observed? } result. The validate executor aggregates
them and applies each check’s onFail:
onFail | Behavior |
|---|---|
abort | Throws; the run fails |
warn | Pushes a string into the warnings array; downstream still runs |
Row-level rules
upstream.meta.rows). Each rule’s onFail:
onFail | Behavior |
|---|---|
abort | Throws on first violation |
warn | Logs; row continues downstream |
skip | Drops the row; recorded in failed[] for optional DLQ writes |
fieldType.expected supports: string, integer, float, boolean,
date, timestamp, json, uuid.
DLQ (quarantine)
skip- or warn-mode rule is
written to the configured DLQ table. The table is created idempotently
on first failed-row write:
inspectFlowError tool reads this table; the future “Errors”
UI will render the same data row by row.
Out of scope
- Real source materialization — the existing tests stub Prisma; production runs against a live Postgres still require the source executor to actually
CREATE + INSERTrows, which is a separate cross-cutting concern. - Streaming CDC runtime —
cdc_mirrorkeeps its scope-defer error until Phase 6. custom_functionhook implementation — requires the Phase 5 plugin SDK.- Cross-session transaction coordination — pre/load/post hooks each run in their own session today; coordinated transactions land in Phase 6.
Related
- Pipeline IR — the typed schema target/validate executors operate on.
- File Ingestion — Phase 3 source-side companion.
- MCP Flow Server —
inspectFlowErrorreads the DLQ table.