KaireonAI Flow is the data + decisioning fabric for the platform. It comprises a typed Pipeline IR, an in-process batch interpreter, an AI authoring layer, an MCP server for external agents, atomic file ingestion, six target load modes, hook execution, dataset and row-level validators, YAML connectors with an AI generator, row-level lineage, and configurable UI pages.

Phase summary

| Phase | What’s live |
| --- | --- |
| 1 | Pipeline IR (8 node kinds), `parsePipelineIR`, versioned IR repo, batch interpreter, 8 executors, legacy adapter, IR-native `POST /pipelines` + run dispatch |
| 2a | AI Pipeline Mode — validate-then-regenerate Coordinator (≤3 retries), tenant-configured provider, RFC 6902 IrDiffView, 100/h rate limit |
| 2b | MCP flow-server — 11 tools + 3 cross-cutting IR endpoints (`GET`/`POST /pipelines/:id/ir`, `GET …/versions`); `isMcpReadOnly()` write-gated |
| 3 | File ingestion — date-template patterns with IANA time zones, 4 ordering modes, wait policy + `onMissAction`, atomic stage→archive→failure, csv/json/jsonl |
| 4 | Load modes — append/truncate/upsert + `blue_green` (rename triple) + `incremental_watermark`; `cdc_mirror` is a Phase 6 stub. Hooks (`sql`/`refresh`/`webhook`/`custom_function`). 5 dataset validators + row-level rules + DLQ auto-create |
| 5 | YAML connectors — 4 auth × 4 pagination × 9 categories. HTTP runtime with template substitution, SSRF guard, and rate limiting. Plugin SDK. `POST /api/v1/connectors/yaml` |
| 5b | AI connector generator — paste a docs URL or hint → draft YAML via the tenant’s configured provider |
| 6 | Observability core — `_kaireon_lineage` JSONB on every target write, run-metrics summarizer |
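The two-phase check that `parsePipelineIR` performs — a shape pass followed by structural passes for reference integrity and acyclicity — can be sketched as below. The node/edge shape and the function name are illustrative assumptions, not the actual IR schema:

```typescript
// Illustrative sketch only: the real IR schema and parsePipelineIR differ.
// Pass 1 (shape) is assumed done; here we sketch the structural passes:
// every edge must reference a declared node, and the graph must be acyclic.
interface IrNode { id: string; kind: string }
interface IrEdge { from: string; to: string }
interface PipelineIr { nodes: IrNode[]; edges: IrEdge[] }

function parsePipelineIrSketch(ir: PipelineIr): string[] {
  const errors: string[] = [];
  const ids = new Set(ir.nodes.map((n) => n.id));

  // Reference integrity: edges must point at declared nodes.
  for (const e of ir.edges) {
    if (!ids.has(e.from) || !ids.has(e.to)) {
      errors.push(`edge ${e.from}->${e.to} references an unknown node`);
    }
  }

  // Acyclicity via Kahn's algorithm (topological sort).
  const indegree = new Map<string, number>(ir.nodes.map((n) => [n.id, 0]));
  for (const e of ir.edges) indegree.set(e.to, (indegree.get(e.to) ?? 0) + 1);
  const queue = [...indegree].filter(([, d]) => d === 0).map(([id]) => id);
  let visited = 0;
  while (queue.length) {
    const id = queue.shift()!;
    visited++;
    for (const e of ir.edges.filter((edge) => edge.from === id)) {
      const d = (indegree.get(e.to) ?? 0) - 1;
      indegree.set(e.to, d);
      if (d === 0) queue.push(e.to);
    }
  }
  if (visited < ir.nodes.length) errors.push("pipeline graph contains a cycle");
  return errors;
}
```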

UI surfaces (configure everything from the sidebar)

| Page | Path |
| --- | --- |
| Flow Pipelines | `/data/flow-pipelines` — IR list + JSON editor + version history + Run-now |
| YAML Connectors | `/data/yaml-connectors` — YAML editor + AI generator + Register |
| Flow Runs | `/data/flow-runs` — run history with status + row counts |
| AI Pipeline Mode | right-side AI panel on `/data/pipelines/*` routes |

Validation contract (defense in depth)

Every IR write goes through three checks server-side:
  1. HTTP body validation — Zod on each route
  2. parsePipelineIR — Phase 1 two-phase validator (Zod + structural acyclic + ref-integrity)
  3. AuditLog — every AI proposal, IR save, and connector registration is recorded
Plus runtime safety:
  • All outbound webhooks + AI generator docs fetch + YAML HTTP runtime use validateAndResolve SSRF guard
  • All SQL identifiers go through IDENT regex + safeIdent
  • Hook SQL: forbidden-leading-verb check (DROP/DELETE/UPDATE/TRUNCATE/INSERT/MERGE/COPY/GRANT/REVOKE/ALTER)
  • Transform SQL: sanitizeExpression whitelist
  • All MCP write tools respect the isMcpReadOnly() gate; setting MCP_ALLOW_WRITES=true unlocks them
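The hook-SQL guard above can be sketched as a check on the statement’s leading verb. The function name and comment-stripping details are illustrative assumptions; only the verb list comes from the docs:

```typescript
// Illustrative sketch of the forbidden-leading-verb check for hook SQL.
// The real guard in KaireonAI Flow may normalize input differently.
const FORBIDDEN_LEADING_VERBS = [
  "DROP", "DELETE", "UPDATE", "TRUNCATE", "INSERT",
  "MERGE", "COPY", "GRANT", "REVOKE", "ALTER",
];

function assertHookSqlAllowed(sql: string): void {
  // Strip leading whitespace and SQL comments before inspecting the verb.
  let s = sql.trimStart();
  const comment = /^(--[^\n]*\n|\/\*[\s\S]*?\*\/)\s*/;
  while (comment.test(s)) s = s.replace(comment, "");
  const firstWord = s.split(/[\s(;]/, 1)[0]?.toUpperCase() ?? "";
  if (FORBIDDEN_LEADING_VERBS.includes(firstWord)) {
    throw new Error(`hook SQL may not start with ${firstWord}`);
  }
}
```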

Lineage on every row

Every Phase 4 target write augments rows with a `_kaireon_lineage` JSONB column:

```json
{
  "runId": "<uuid>",
  "pipelineId": "<id>",
  "sourceNodeId": "<upstream-node>"
}
```
Idempotent `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` DDL runs before every load. The future "Errors UI" and the MCP `inspectFlowError` tool both query against this column.
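A minimal sketch of how a loader might prepare that column and stamp each row. The helper names and the simplified identifier quoting are assumptions for illustration; the real code routes identifiers through the IDENT regex + `safeIdent`:

```typescript
// Illustrative sketch: build the idempotent DDL and the per-row lineage
// payload matching the JSON shape documented above.
interface LineageStamp {
  runId: string;
  pipelineId: string;
  sourceNodeId: string;
}

// Runs before every load; safe to repeat thanks to IF NOT EXISTS.
function lineageDdl(table: string): string {
  return `ALTER TABLE "${table}" ADD COLUMN IF NOT EXISTS _kaireon_lineage JSONB`;
}

// Attach the lineage stamp to a row without mutating the original.
function stampRow(
  row: Record<string, unknown>,
  lineage: LineageStamp,
): Record<string, unknown> {
  return { ...row, _kaireon_lineage: lineage };
}
```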

Honest residuals

Each of these fails with a clear runtime error that names the phase where it will land, rather than a silent stub:
| Residual | Gate |
| --- | --- |
| Real source materialization (rows actually populating PG temp tables) | Documented in Phase 4 spec; tests stub Prisma |
| `cdc_mirror` runtime | Throws "cdc_mirror requires streaming runtime — Phase 6 (Flink CDC / Debezium integration)" |
| Cloud connector executors (S3/GCS/Azure/SFTP/FTP/HTTP-pull) | Source executor throws "Phase 1-3 supports only local_fs source" |
| Parquet/Avro/ORC/TSV/XML formats | Throws "format X not yet supported (deferred to a later phase)" |
| `custom_function` hook | Throws "custom_function hook requires the plugin SDK — Phase 5" |
| Cron + EventBridge scheduling | Inline-on-run today; documented as an ops phase |
| Marketplace UI publishing | UX phase |
| Filesystem auto-discovery of plugins | Plugins register explicitly; auto-discovery is a follow-up |
| Bulk migration of 25 HTTP-shaped connectors to YAML | Each gets its own follow-up PR |
When the missing infrastructure lands, no UI or schema work is needed — the configurable surface is already there.

Test + typecheck health

  • 260 / 260 lib/flow tests passing
  • Typecheck clean across lib/flow, components/flow, flow API routes
  • ~50 atomic commits across the 8 phases + UI + audit-closure passes
  • Six Mintlify pages cover every user-visible capability

Reading order

  1. Pipeline IR — the typed contract
  2. AI Pipeline Authoring — NL → IR
  3. MCP Flow Server — external-agent surface
  4. File Ingestion — source-side semantics
  5. Loading Modes & Validation — target-side semantics
  6. YAML Connectors — declarative HTTP/REST connectors