
Documentation Index

Fetch the complete documentation index at: https://docs.kaireonai.com/llms.txt

Use this file to discover all available pages before exploring further.

KaireonAI Flow is the data + decisioning fabric for the platform — a typed Pipeline IR, an in-process batch interpreter, an AI authoring layer, an MCP server for external agents, atomic file ingestion, six target load modes, hook execution, dataset + row-level validators, YAML connectors with an AI generator, row-level lineage, and configurable UI pages.

Phase summary

| Phase | What's live |
| --- | --- |
| 1 | Pipeline IR (8 node kinds), parsePipelineIR, versioned IR repo, batch interpreter, 8 executors, legacy adapter, IR-native POST /pipelines + run dispatch |
| 2a | AI Pipeline Mode — validate-then-regenerate Coordinator (≤3 retries), tenant-configured provider, RFC 6902 IrDiffView, 100/h rate limit |
| 2b | MCP flow-server — 130 tools (120 primitives + 10 playbooks) + cross-cutting IR endpoints; isMcpReadOnly() write-gated |
| 3 | File ingestion — date-template patterns w/ IANA tz, 4 ordering modes, wait policy + onMissAction (alert now emits to System Health), atomic stage→archive→failure |
| 4 | Load modes — append / truncate / upsert / blue_green / incremental_watermark all live. cdc_mirror gated by FLOW_STREAMING_ENABLED. Hooks, 5 dataset validators + row-level rules + DLQ auto-create. Empty-source guard, transactional truncate, persisted watermark, upsert all-keys block, mode-switch validation, optional row-count anomaly detection, optional backupBeforeLoad |
| 5 | YAML connectors — 4 auth × 4 pagination × 9 categories. HTTP runtime with template substitution + SSRF + rate-limit. Plugin SDK. POST /api/v1/connectors/yaml |
| 5b | AI connector generator — paste docs URL or hint → draft YAML via tenant’s configured provider |
| 6.1 | All 8 source formats live (csv / json / jsonl / parquet / avro / orc / tsv / xml) |
| 6.2 | Cloud-store source executors live for S3 / GCS / Azure Blob / SFTP / HTTP-pull (plus local_fs); FTP documented as deprecated and unreachable from runtime |
| 6.3 | Flow editor 2-pane UX (Visual / JSON IR / SQL Preview / Lineage / Schedule + per-kind node config forms) |
| 6.4 | Streaming runtime feature-flag (FLOW_STREAMING_ENABLED) + checkpoint scaffold |
| 6.5 | Scheduler tick endpoint + per-tenant scheduler page |
| 6.6 | k6 baseline + 9 net-new Mintlify hardening pages |
| System Health (2026-05-02) | Operational alerts widget (top of every page) — DB-backed feed, 5-severity taxonomy, per-user read state, 30s polling, retention purge, 3 first consumers wired (run failures, source-no-files, row-count anomaly) |

UI surfaces (configure everything from the sidebar)

| Page | Path |
| --- | --- |
| Pipelines | /data/flow-pipelines — IR list + JSON editor + version history + Run-now |
| YAML Connectors | /data/yaml-connectors — YAML editor + AI generator + Register |
| Pipeline Runs | /data/flow-runs — run history with status + row counts (sidebar entry: “Pipeline Runs”) |
| AI Pipeline Mode | (right-side AI panel on /data/flow-pipelines/* routes) |

Validation contract (defense in depth)

Every IR write goes through three checks server-side:
  1. HTTP body validation — Zod on each route
  2. parsePipelineIR — Phase 1 two-phase validator (Zod + structural acyclic + ref-integrity)
  3. AuditLog — every AI proposal + IR save + connector register
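
For orientation, here is a minimal sketch of how those three checks could chain on a save route. Only parsePipelineIR and the audit log are names used above; the route shape, schema, repository, and every signature are assumptions for illustration, not the shipped code.

```typescript
// Sketch of the three-check write path. Only parsePipelineIR and the audit log
// are documented names; all shapes and signatures here are assumed.
import { z } from "zod";

// Hypothetical stand-ins for the real helpers (shapes assumed).
declare function parsePipelineIR(raw: unknown): { id: string; nodes: unknown[] };
declare const irRepo: { saveVersion(tenantId: string, ir: object): Promise<{ version: number }> };
declare const auditLog: { record(entry: Record<string, unknown>): Promise<void> };

// Check 1: HTTP body validation with Zod on the route.
const SavePipelineBody = z.object({
  tenantId: z.string(),
  ir: z.unknown(), // raw IR document; structural validation happens in check 2
});

export async function handleSavePipeline(req: Request): Promise<Response> {
  const body = SavePipelineBody.parse(await req.json());

  // Check 2: parsePipelineIR, the two-phase validator (schema, then acyclicity + ref integrity).
  const ir = parsePipelineIR(body.ir); // throws on any structural violation

  const saved = await irRepo.saveVersion(body.tenantId, ir);

  // Check 3: audit-log entry for the IR save.
  await auditLog.record({ action: "ir.save", tenantId: body.tenantId, version: saved.version });

  return Response.json(saved, { status: 201 });
}
```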
Plus runtime safety:
  • All outbound webhooks + AI generator docs fetch + YAML HTTP runtime use validateAndResolve SSRF guard
  • All SQL identifiers go through a strict identifier-regex check plus the safe-identifier helper
  • Hook SQL: forbidden-leading-verb check (DROP/DELETE/UPDATE/TRUNCATE/INSERT/MERGE/COPY/GRANT/REVOKE/ALTER)
  • Transform SQL: sanitizeExpression allowlist
  • All MCP write tools respect the read-only safety gate; MCP_ALLOW_WRITES=true unlocks
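
The hook-SQL verb check and the safe-identifier helper can be pictured roughly as below; the regexes, messages, and function names are assumptions, not the shipped implementations.

```typescript
// Illustrative guards; the exact regexes and names in the codebase may differ.

// Hook SQL: reject statements whose leading verb is on the forbidden list.
const FORBIDDEN_LEADING_VERB =
  /^\s*(DROP|DELETE|UPDATE|TRUNCATE|INSERT|MERGE|COPY|GRANT|REVOKE|ALTER)\b/i;

export function assertHookSqlAllowed(sql: string): void {
  if (FORBIDDEN_LEADING_VERB.test(sql)) {
    throw new Error("Hook SQL rejected: statement begins with a forbidden verb");
  }
}

// SQL identifiers: strict character allowlist, then quoting by the safe-identifier helper.
const SAFE_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

export function safeIdentifier(name: string): string {
  if (!SAFE_IDENTIFIER.test(name)) {
    throw new Error(`Unsafe SQL identifier: ${name}`);
  }
  return `"${name}"`; // quote only after validation
}
```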

Lineage on every row

Every Phase 4 target write augments rows with a _kaireon_lineage JSONB column:
```json
{
  "runId": "<uuid>",
  "pipelineId": "<id>",
  "sourceNodeId": "<upstream-node>"
}
```
An idempotent ALTER TABLE ... ADD COLUMN IF NOT EXISTS runs before every load. The future “Errors UI” + the MCP inspectFlowError tool both query against this column.
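
A rough sketch of the column add and the kind of per-run query an errors view could issue; only the _kaireon_lineage column name and its JSON shape come from above, while the pg wiring and helper names are assumed.

```typescript
// Sketch only: pg wiring and function names are illustrative; the column name
// _kaireon_lineage and the stamp shape are the documented pieces.
import { Client } from "pg";

// Shape written into _kaireon_lineage for every loaded row.
interface LineageStamp {
  runId: string;
  pipelineId: string;
  sourceNodeId: string;
}

// Runs before every load; safe to repeat thanks to IF NOT EXISTS.
async function ensureLineageColumn(db: Client, table: string): Promise<void> {
  // table is assumed to have already passed the safe-identifier check
  await db.query(`ALTER TABLE "${table}" ADD COLUMN IF NOT EXISTS _kaireon_lineage JSONB`);
}

// The kind of query an errors UI or inspection tool could run against the column.
async function rowsForRun(db: Client, table: string, runId: string): Promise<unknown[]> {
  const { rows } = await db.query(
    `SELECT * FROM "${table}" WHERE _kaireon_lineage->>'runId' = $1`,
    [runId]
  );
  return rows;
}
```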

Honest residuals

Each remaining gap has a clear runtime error or env gate, never a silent stub:
| Residual | Gate |
| --- | --- |
| Streaming consumer kinds (kafka, kinesis, pulsar) | Gated by FLOW_STREAMING_ENABLED. Off → runtime returns “streaming disabled” (see the gate sketch below) |
| cdc_mirror runtime | Same gate; on → throws “Debezium connector not configured” until the self-hoster wires the broker |
| archive.connectorId runtime sink | IR field accepted, executor still parses destination URLs; cross-cutting wire-up pending |
| custom_function hook | Throws “custom_function hook requires the plugin SDK” |
| Marketplace UI publishing | UX phase |
| Filesystem auto-discovery of plugins | Plugins register explicitly; auto-discovery is a follow-up |
| Bulk migration of 25 HTTP-shaped connectors to YAML | Each is its own follow-up PR |
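
The streaming and cdc_mirror residuals share a plain env-flag gate; a minimal sketch, assuming a hypothetical helper name and error wording:

```typescript
// Illustrative env-flag gate; only the flag name FLOW_STREAMING_ENABLED and the
// "streaming disabled" behaviour come from the residuals table above.
export function assertStreamingEnabled(): void {
  if (process.env.FLOW_STREAMING_ENABLED !== "true") {
    throw new Error(
      "streaming disabled: set FLOW_STREAMING_ENABLED=true to run streaming consumers and cdc_mirror"
    );
  }
}
```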
When a residual lifts, no UI or schema work is needed — the configurable surface is already there.

Test + typecheck health

Reading order

  1. Pipeline IR — the typed contract
  2. AI Pipeline Authoring — NL → IR
  3. MCP Flow Server — external-agent surface
  4. File Ingestion — source-side semantics
  5. Loading Modes & Validation — target-side semantics
  6. YAML Connectors — declarative HTTP/REST connectors