
The Flow editor is the single surface for designing, inspecting, and operating pipelines. It lives at /data/flow-pipelines/[id]/edit. The legacy single-pane pipeline-flow-editor (PipelineNode/Edge model) was deleted on 2026-04-28 — every pipeline is now IR-native (irVersion: "1.0").

Layout

Pane   | Width            | Contents
Center | flex (~72%)      | Tabbed editor: Visual / JSON IR / SQL Preview / Lineage / Schedule
Right  | ~26% (resizable) | AI Panel (Pipeline mode) ↔ Node Config (auto-swaps when you click a node on the canvas)
A top bar shows the pipeline name, status pill, IR version pill, Publish button, and a Run-now button. A bottom strip shows the last run, next scheduled fire, DLQ count, and quick links. Run history lives on the dedicated Pipeline Runs page reached from the sidebar — the editor’s previous “Recent runs” left pane was redundant and was removed on 2026-05-02.

Tab reference

Visual

React Flow canvas with kind-coloured nodes (source / transform / validate / target / branch / join / archive / enrich) in a deterministic vertical waterfall layout. Each node card shows a one-line config summary inline (source: path · pattern · format; target: schema · loadMode; transform: op types; validate: rule count). Click a node and the right pane swaps to Node Config (read-only JSON view).

Above the canvas is an Add Node toolbar: + Transform / + Validate / + Enrich / + Branch / + Archive, plus + Target (whose different behaviour is described at the end of the node-config section below). Clicking one of the first five inserts a default-configured node into the IR between the last upstream and your target, repoints the target to consume the new node, bumps the IR version, and selects the new node so you can edit it. Defaults intentionally use placeholder values (e.g. a transform with a rename: from_field → to_field op) so you can tell at a glance which fields to customize. Source and Join are not in the toolbar: every starter pipeline already has a source, multi-source pipelines are advanced, and Join requires picking two upstream node refs. Add those via the JSON IR tab or the AI panel.
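For concreteness, a minimal sketch of what + Transform might write into the IR. Every field name here is an illustrative assumption; the authoritative schema is whatever parsePipelineIR accepts:

```ts
// Hypothetical IR fragment inserted by "+ Transform"; field names are
// assumptions for illustration, not the documented schema.
const newNode = {
  id: "transform_2",
  kind: "transform",
  input: "source_1", // the last upstream node
  ops: [{ op: "rename", from: "from_field", to: "to_field" }], // placeholder default
};
// kaireon then repoints the target to consume the new node
// (target.input = "transform_2"), bumps the IR version, and selects the node.
```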

JSON IR

Read/write IR editor with format-on-blur. Server-side validation goes through parsePipelineIR — invalid IR is rejected with the structured errors array rendered verbatim. Save creates a new IR version (POST /api/v1/pipelines/:id/ir) and bumps the version pill.
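A minimal sketch of that save round-trip. The endpoint comes from this page; the request body ({ ir }) and error payload ({ errors }) shapes are assumptions, not a documented contract:

```ts
// Hedged sketch of saving IR; payload shapes are assumed.
async function saveIr(pipelineId: string, ir: unknown): Promise<void> {
  const res = await fetch(`/api/v1/pipelines/${pipelineId}/ir`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ir }),
  });
  if (!res.ok) {
    // parsePipelineIR rejections arrive as a structured errors array,
    // which the editor renders verbatim.
    const { errors } = (await res.json()) as { errors: { message: string }[] };
    throw new Error(errors.map((e) => e.message).join("\n"));
  }
  // Success creates a new IR version and bumps the version pill.
}
```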

SQL Preview

Renders the exact INSERT … SELECT each target node would execute, using the same builders the runtime uses (buildLoadSql + buildSelectExpr). blue_green, incremental_watermark, and cdc_mirror targets render an explicit “preview not available — runtime load-mode helper handles it” note rather than fabricated SQL. GET /api/v1/pipelines/:id/sql-preview powers this tab.
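A hedged sketch of consuming that endpoint; the response shape (per-target SQL text or an explanatory note) is an assumption inferred from the behaviour described above:

```ts
// Illustrative only: the response shape is assumed, not documented.
async function fetchSqlPreview(pipelineId: string) {
  const res = await fetch(`/api/v1/pipelines/${pipelineId}/sql-preview`);
  return (await res.json()) as Record<string, { sql?: string; note?: string }>;
}
// append / truncate / upsert targets carry the exact INSERT … SELECT text;
// blue_green, incremental_watermark, and cdc_mirror carry the
// "preview not available" note instead of fabricated SQL.
```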

Lineage

See Flow Lineage.

Schedule

See Flow Schedule.

Run history

Run history is a dedicated page at Sidebar → Data → Pipeline Runs (/data/flow-runs); the editor’s in-pane “Recent runs” preview was removed on 2026-05-02 once the standalone page landed (see Layout above). Per-node metric drawers (rowsIn / rowsOut / rowsLoaded / warnings) live on the runs page.

Node config — what each form does

The right pane swaps to a per-kind form when you click a node:
  • Source — Path / file mask / format / wait policy. If the parent connector already supplies the bucket (S3 / GCS / Azure), the form hides the bucket input and only asks for the prefix.
  • Transform — Add op rows in order; each op has its own subform (cast type, formula, rename pair, etc.). A collapsible Sample row preview at the bottom takes one JSON row and shows the before / after diff for each op (added / removed / changed fields highlighted). Complex ops (aggregate, lookup_join, vector_embed, geo_resolve, sentiment_score, language_detect) preview as “not previewable client-side — run the pipeline” because they need runtime context.
  • Validate — Row-level rules (notNull, regex, range, fieldType, maxLength), optional dataset-level row-count check, optional quarantine table for failed rows. The same Sample row preview at the bottom shows pass/fail per rule. (Illustrative source / transform / validate node sketches follow this list.)
  • Enrich — Provider picker (llm_tag / geocode / ml_score) with a per-provider config block. The output-field input checks the destination DataSchema in real time: if the column doesn’t exist, it shows a red “missing column” badge with an inline + Add as <dataType> button that POSTs to /api/v1/schemas/fields with a sensible default type (string for llm_tag, float for geocode lat/lon and ml_score).
  • Branch — when predicate + then route per case, plus a default route. The route inputs are dropdowns populated from the current IR’s other nodes (excluding the branch itself and source nodes). Stale references (a then value that no longer matches any node id) render with a red border so they’re easy to spot and fix.
  • Archive — Connector picker (Select limited to S3 / GCS / Azure / SFTP / local_fs) + destination path with date-template tokens. The form shows folder-creation semantics inline:
    • Cloud blob stores create the object-key path on PUT, so no pre-existing folder is needed.
    • SFTP and local_fs require the parent directory to exist already; date-token sub-folders (e.g. {YYYY-MM-DD}/) are auto-created beneath it.
  A Test connection button reuses POST /api/v1/connectors/test to verify credentials before the first run. Optional connectorId lands in the IR alongside the destination — backwards compatible with archive nodes that only set destination. (Enrich / branch / archive sketches follow this list.)
  • Target — Substantial form; an illustrative node sketch follows this list. It includes:
    • A Reads from select listing every other node in the IR — so multi-target IRs (one source fanning out to multiple destinations) are editable in the visual surface.
    • Schema name (schema.table, defaults to public.).
    • Load mode (append / truncate / upsert / blue_green / incremental_watermark / cdc_mirror).
    • Full-refresh hard warning — picking truncate on a one-source-one-target shape surfaces a recommend-blue_green banner with a one-click switch.
    • failOnEmptySource Switch (default on) for destructive modes — aborts before TRUNCATE / blue_green when upstream is empty.
    • backupBeforeLoad Switch for destructive modes — opt-in pre-load snapshot to <table>_backup_<runId> with retain-N pruning.
    • Mode-specific config blocks: upsertKey, watermarkColumn, cdcSource.
    • Inbound-FK probe when blue_green is picked — fires GET /api/v1/schemas/:id/incoming-fks and surfaces an amber warning listing referencing tables that would block the swap’s final DROP.
    • Inline mode-switch validation — empty upsertKey, missing watermark column, all-keys upsert, etc. all surface as inline errors that block save.
The visual canvas also pins archive nodes to the bottom of the layout regardless of where their input lives in the DAG — they’re post-load side effects, not inline transforms. The + Target toolbar button behaves differently from the other five: clicking it inserts a new sibling target consuming the same upstream as the existing target, without re-pointing anything. Pick a destination schema in the right pane.
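For concreteness, hedged sketches of the node shapes these forms edit. Every field name below is an illustrative assumption drawn from the form labels above; the authoritative schema is whatever parsePipelineIR accepts. First, source, transform, and validate:

```ts
// Illustrative only: field names are assumptions, not the documented IR schema.
const sourceNode = {
  id: "source_1",
  kind: "source",
  path: "incoming/orders/", // prefix only, when the parent connector supplies the bucket
  pattern: "*.csv",
  format: "csv",
};

const transformNode = {
  id: "transform_1",
  kind: "transform",
  input: "source_1",
  ops: [
    { op: "cast", field: "amount", to: "float" },
    { op: "rename", from: "cust_id", to: "customer_id" },
  ], // ops run in order; the Sample row preview diffs each one
};

const validateNode = {
  id: "validate_1",
  kind: "validate",
  input: "transform_1",
  rules: [
    { rule: "notNull", field: "customer_id" },
    { rule: "range", field: "amount", min: 0 },
  ],
  quarantineTable: "public.orders_quarantine", // optional sink for failed rows
};
```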
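Enrich, branch, and archive in the same hedged style. Provider names, predicate syntax, and the connectorId / destination pairing come from the form descriptions above; everything else is assumed:

```ts
// Illustrative only: shapes assumed from the form labels above.
const enrichNode = {
  id: "enrich_1",
  kind: "enrich",
  input: "validate_1",
  provider: "llm_tag",      // llm_tag | geocode | ml_score
  outputField: "topic_tag", // checked live against the destination DataSchema
};

const branchNode = {
  id: "branch_1",
  kind: "branch",
  input: "enrich_1",
  cases: [{ when: "region = 'EU'", then: "target_eu" }], // then must name an existing node id
  default: "target_rest",
};

const archiveNode = {
  id: "archive_1",
  kind: "archive",
  input: "target_rest",
  connectorId: "conn_s3_archive",                // optional; runtime wiring still pending (see residuals)
  destination: "s3://raw-archive/{YYYY-MM-DD}/", // date-template tokens; sub-folders auto-created
};
```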
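Finally, a target node. The property names mirror the form fields (loadMode, failOnEmptySource, backupBeforeLoad, and the mode-specific keys), but again the exact IR schema is an assumption:

```ts
// Illustrative only: a blue_green target with both safety switches on.
const targetNode = {
  id: "target_1",
  kind: "target",
  input: "validate_1",     // "Reads from": any other node in the IR
  schema: "public.orders", // schema.table; the form defaults to public.
  loadMode: "blue_green",  // append | truncate | upsert | blue_green | incremental_watermark | cdc_mirror
  failOnEmptySource: true, // default on for destructive modes: abort when upstream is empty
  backupBeforeLoad: true,  // opt-in snapshot to <table>_backup_<runId> with retain-N pruning
};
// Picking blue_green also fires GET /api/v1/schemas/:id/incoming-fks and warns
// about referencing tables that would block the swap's final DROP.
```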

Honest residuals

Documented gaps, deferred to follow-up plans:
  • Drag-to-create-edge / right-click “Delete” on the visual canvas.
  • Pane collapse buttons (waiting for react-resizable-panels v4 typings to settle).
  • Run-over-run trend chart (deferred to the hardening pass).
  • Visual DAG highlight when clicking a lineage row (Wave 4 polish).
  • Selective replay-from-DLQ runtime (replayDlqOnly flag).
  • Sample row preview’s expression op shows the formula text rather than evaluating client-side — the server-side formula engine is the source of truth and replicating it in the browser would risk drift.
  • Archive runtime sink does not yet read archive.connectorId — the IR field is reserved and accepted, but the executor still parses the destination URL until the cross-cutting wiring lands.
  • The 1500-line node-config-panel.tsx is partially extracted (constants + field-input primitives moved to components/flow-editor/forms/); the per-kind subforms still live in the panel. Full per-form file split is a follow-up refactor.
These aren’t broken — they aren’t built yet, and the UI calls them out by name where they would otherwise appear.