# Documentation Index
Fetch the complete documentation index at: https://docs.kaireonai.com/llms.txt
Use this file to discover all available pages before exploring further.
KaireonAI Flow is the data + decisioning fabric for the platform — a typed
Pipeline IR, an in-process batch interpreter, an AI authoring layer, an
MCP server for external agents, atomic file ingestion, six target load
modes, hook execution, dataset + row-level validators, YAML connectors
with an AI generator, row-level lineage, and configurable UI pages.
## Phase summary
| Phase | What’s live |
|---|---|
| 1 | Pipeline IR (8 node kinds), parsePipelineIR, versioned IR repo, batch interpreter, 8 executors, legacy adapter, IR-native POST /pipelines + run dispatch |
| 2a | AI Pipeline Mode — validate-then-regenerate Coordinator (≤3 retries), tenant-configured provider, RFC 6902 IrDiffView, 100/h rate limit |
| 2b | MCP flow-server — 130 tools (120 primitives + 10 playbooks) + cross-cutting IR endpoints; isMcpReadOnly() write-gated |
| 3 | File ingestion — date-template patterns w/ IANA tz, 4 ordering modes, wait policy + onMissAction (alert now emits to System Health), atomic stage→archive→failure |
| 4 | Load modes — append / truncate / upsert / blue_green / incremental_watermark all live. cdc_mirror gated by FLOW_STREAMING_ENABLED. Hooks, 5 dataset validators + row-level rules + DLQ auto-create. Empty-source guard, transactional truncate, persisted watermark, upsert all-keys block, mode-switch validation, optional row-count anomaly detection, optional backupBeforeLoad |
| 5 | YAML connectors — 4 auth × 4 pagination × 9 categories. HTTP runtime with template substitution + SSRF + rate-limit. Plugin SDK. POST /api/v1/connectors/yaml |
| 5b | AI connector generator — paste docs URL or hint → draft YAML via tenant’s configured provider |
| 6.1 | All 8 source formats live (csv / json / jsonl / parquet / avro / orc / tsv / xml) |
| 6.2 | Cloud-store source executors live for S3 / GCS / Azure Blob / SFTP / HTTP-pull (plus local_fs); FTP documented as deprecated and unreachable from runtime |
| 6.3 | Flow editor 2-pane UX (Visual / JSON IR / SQL Preview / Lineage / Schedule + per-kind node config forms) |
| 6.4 | Streaming runtime feature-flag (FLOW_STREAMING_ENABLED) + checkpoint scaffold |
| 6.5 | Scheduler tick endpoint + per-tenant scheduler page |
| 6.6 | k6 baseline + 9 net-new Mintlify hardening pages |
| System Health (2026-05-02) | Operational alerts widget (top of every page) — DB-backed feed, 5-severity taxonomy, per-user read state, 30s polling, retention purge, 3 first consumers wired (run failures, source-no-files, row-count anomaly) |
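As a hedged illustration of the `incremental_watermark` load mode summarized in the Phase 4 row, the core idea is: keep only rows newer than a persisted high-water mark, then advance the mark after a successful load. The names below (`WatermarkStore`, `loadIncremental`) are illustrative, not the real API, and the store is synchronous only to keep the sketch short.

```typescript
// Sketch of incremental_watermark semantics; not the actual lib/flow code.
interface WatermarkStore {
  get(pipelineId: string): string | null;
  set(pipelineId: string, value: string): void;
}

type Row = Record<string, unknown>;

function loadIncremental(
  pipelineId: string,
  watermarkColumn: string,
  rows: Row[],
  store: WatermarkStore,
): Row[] {
  const last = store.get(pipelineId);
  // Keep only rows strictly newer than the persisted watermark.
  const fresh = last === null
    ? rows
    : rows.filter((r) => String(r[watermarkColumn]) > last);
  if (fresh.length > 0) {
    // Persist the new high-water mark only once there are rows to load.
    const max = fresh
      .map((r) => String(r[watermarkColumn]))
      .reduce((a, b) => (a > b ? a : b));
    store.set(pipelineId, max);
  }
  return fresh;
}
```

The empty-source guard mentioned in the same row corresponds to the `fresh.length > 0` branch: an empty batch must never clobber the persisted watermark.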
## UI pages

| Page | Path |
|---|---|
| Pipelines | /data/flow-pipelines — IR list + JSON editor + version history + Run-now |
| YAML Connectors | /data/yaml-connectors — YAML editor + AI generator + Register |
| Pipeline Runs | /data/flow-runs — run history with status + row counts (sidebar entry: “Pipeline Runs”) |
| AI Pipeline Mode | (right-side AI panel on /data/flow-pipelines/* routes) |
## Validation contract (defense in depth)
Every IR write goes through three checks server-side:
- HTTP body validation — Zod on each route
- parsePipelineIR — Phase 1 two-phase validator (Zod, then structural acyclicity + ref-integrity checks)
- AuditLog — every AI proposal + IR save + connector register
Plus runtime safety:
- All outbound webhooks, AI-generator docs fetches, and the YAML HTTP runtime use the validateAndResolve SSRF guard
- All SQL identifiers go through a strict identifier-regex check plus the safe-identifier helper
- Hook SQL: forbidden-leading-verb check (DROP/DELETE/UPDATE/TRUNCATE/INSERT/MERGE/COPY/GRANT/REVOKE/ALTER)
- Transform SQL: sanitizeExpression allowlist
- All MCP write tools respect the read-only safety gate; setting MCP_ALLOW_WRITES=true unlocks writes
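The hook-SQL forbidden-leading-verb check above can be sketched as follows. This is a minimal illustration of the idea, not the real helper: the actual name, comment handling, and verb list may differ, though the verbs here are the ten the document lists.

```typescript
// Sketch of a forbidden-leading-verb guard for hook SQL.
const FORBIDDEN_LEADING_VERBS = [
  "DROP", "DELETE", "UPDATE", "TRUNCATE", "INSERT",
  "MERGE", "COPY", "GRANT", "REVOKE", "ALTER",
];

function assertHookSqlAllowed(sql: string): void {
  // Skip leading whitespace and `--` line comments, then inspect the first keyword.
  const stripped = sql.replace(/^(?:\s|--[^\n]*\n)+/, "");
  const firstWord = (stripped.match(/^[A-Za-z_]+/)?.[0] ?? "").toUpperCase();
  if (FORBIDDEN_LEADING_VERBS.includes(firstWord)) {
    throw new Error(`hook SQL may not start with ${firstWord}`);
  }
}
```

A leading-verb check like this is deliberately coarse; it is one layer of defense in depth, not a full SQL parser.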
## Lineage on every row
Every Phase 4 target write augments rows with a `_kaireon_lineage` JSONB column:
```json
{
  "runId": "<uuid>",
  "pipelineId": "<id>",
  "sourceNodeId": "<upstream-node>"
}
```
An idempotent `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` runs before every load. The future “Errors UI” and the MCP inspectFlowError tool both query against this column.
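A minimal sketch of this augmentation, using the column name and JSONB shape from the document; the function names (`withLineage`, `lineageColumnDdl`) are illustrative, and the real code would also run the table name through the safe-identifier helper before interpolating it.

```typescript
// Shape of the lineage stamp written into _kaireon_lineage.
interface LineageStamp {
  runId: string;
  pipelineId: string;
  sourceNodeId: string;
}

// Attach the lineage stamp to a row before the target write.
function withLineage<T extends Record<string, unknown>>(
  row: T,
  stamp: LineageStamp,
): T & { _kaireon_lineage: LineageStamp } {
  return { ...row, _kaireon_lineage: stamp };
}

// Idempotent column add, run before every load (assumes Postgres).
// NOTE: `table` must already be validated as a safe identifier.
function lineageColumnDdl(table: string): string {
  return `ALTER TABLE ${table} ADD COLUMN IF NOT EXISTS _kaireon_lineage JSONB`;
}
```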
## Honest residuals
Each remaining gap has a clear runtime error or env gate, never a
silent stub:
| Residual | Gate |
|---|---|
| Streaming consumer kinds (kafka, kinesis, pulsar) | Gated by FLOW_STREAMING_ENABLED. Off → runtime returns “streaming disabled” |
| cdc_mirror runtime | Same gate; on → throws “Debezium connector not configured” until the self-hoster wires the broker |
| archive.connectorId runtime sink | IR field accepted; executor still parses destination URLs; cross-cutting wire-up pending |
| custom_function hook | Throws “custom_function hook requires the plugin SDK” |
| Marketplace UI publishing | UX phase |
| Filesystem auto-discovery of plugins | Plugins register explicitly; auto-discovery is a follow-up |
| Bulk migration of 25 HTTP-shaped connectors to YAML | Each gets its own follow-up PR |
When a residual lifts, no UI or schema work is needed — the
configurable surface is already there.
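The env-gate pattern behind the streaming residuals can be sketched as below. `FLOW_STREAMING_ENABLED` is the documented flag; the guard's name and exact error text are illustrative paraphrases of the behavior described in the table.

```typescript
// Sketch of a "clear runtime error, never a silent stub" feature gate.
type Env = Record<string, string | undefined>;

function requireStreamingEnabled(env: Env): void {
  if (env.FLOW_STREAMING_ENABLED !== "true") {
    // Fail loudly and explain how to lift the gate.
    throw new Error("streaming disabled: set FLOW_STREAMING_ENABLED=true");
  }
}
```

Because the gate throws rather than no-ops, a self-hoster who enables the flag without the rest of the wiring (e.g. a Debezium broker for cdc_mirror) still gets an explicit error instead of silently dropped data.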
## Test + typecheck health
- All lib/flow test suites passing (json-bigint, build-insert-select, incremental-watermark, ir-to-react-flow, plus the existing 260+ pre-Phase-6 tests)
- `npx tsc --noEmit` clean across lib/flow, components/flow-editor, and the flow API routes
- Eight Mintlify pages now cover every user-visible capability: pipeline-ir, ai-pipeline-authoring, mcp-flow-server, file-ingestion, loading-modes-validation, system-health, flow-editor-ui, flow-lineage
## Reading order
- Pipeline IR — the typed contract
- AI Pipeline Authoring — NL → IR
- MCP Flow Server — external-agent surface
- File Ingestion — source-side semantics
- Loading Modes & Validation — target-side semantics
- YAML Connectors — declarative HTTP/REST connectors