This walkthrough takes you from zero to a running IR-native pipeline with lineage, error inspection, and a saved schedule. Flow IR is on by default — tenant_settings.flowIrEnabled defaults to true. The flag remains available under /settings as an explicit kill-switch if you ever need to disable the pipeline feature wholesale.
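For orientation, a minimal sketch of the flag’s shape, assuming tenant settings are exposed as JSON (the surrounding structure is illustrative):

{ "tenant_settings": { "flowIrEnabled": true } }

Flip it to false under /settings and the pipeline feature is disabled tenant-wide.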

1. Create a pipeline (UI)

Open Data → Pipelines and click + New Pipeline. The dialog asks for three things:
  • Name — anything human-readable, e.g. orders-ingest.
  • Connector — pick from your existing connectors. Create one under Data → Connectors first if needed.
  • Schema — pick the destination data schema. Create one under Data → Schemas first if needed.
Click Create & Open Editor. The platform builds a starter IR (one source node mapped from your connector’s type, one append-mode target node pointing at your schema) and redirects you straight into the visual editor at /data/flow-pipelines/<id>/edit. You never paste JSON.

Or: create a pipeline (API)

For automation or CI, the same starter IR can be POSTed:
curl -X POST https://playground.kaireonai.com/api/v1/pipelines \
  -H "x-api-key: $API_KEY" \
  -H "x-tenant-id: $TENANT_ID" \
  -H "content-type: application/json" \
  -d '{
    "name": "orders-ingest",
    "connectorId": "<connector-id>",
    "schemaId": "<schema-id>",
    "irVersion": "1.0",
    "ir": {
      "kind": "pipeline",
      "version": "1.0",
      "id": "orders-ingest",
      "metadata": { "name": "orders-ingest" },
      "nodes": [
        { "id": "src", "kind": "source", "connector": "local_fs",
          "config": { "path": "/data/orders/", "pattern": { "type": "glob", "value": "*.csv" },
            "ordering": "lexicographic",
            "waitPolicy": { "maxRetries": 3, "intervalMinutes": 5, "onMissAction": "skip" },
            "atomicity": { "stagingFolder": ".processing/", "successFolder": ".archive/", "failureFolder": ".failed/" },
            "format": "csv" } },
        { "id": "tgt", "kind": "target", "input": "src",
          "schema": "public.ds_orders", "loadMode": "append" }
      ],
      "errorHandling": { "dlq": { "enabled": false }, "retry": { "maxAttempts": 1, "backoff": "exponential" } }
    }
  }'
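The response shape isn’t spelled out in this walkthrough; a plausible sketch, assuming the API echoes the created resource (field names beyond what the walkthrough mentions are illustrative):

{ "id": "<pipeline-id>", "name": "orders-ingest", "status": "draft", "irVersion": 1 }

The returned id is the one used in the editor URL, /data/flow-pipelines/<id>/edit, and in any follow-up API calls.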

2. Tour the editor

After Create & Open Editor, you land on /data/flow-pipelines/<id>/edit. The shell has three resizable panes plus a top bar and a bottom strip:
| Pane | What it shows |
| --- | --- |
| Top bar | ← Pipelines link, pipeline name, status pill (draft until first successful run), IR version pill (v1, v2, …), and Run now |
| Left pane | Recent runs preview (5 latest); a See all runs → link jumps to the Runs tab |
| Center pane | Tabs: Visual / JSON IR / SQL Preview / Lineage / Runs / Errors / Schedule |
| Right pane | Node Config — click any node on the Visual canvas to inspect its config; if no node is selected, a tip points you at the docked AI panel |
| Bottom strip | Last run timestamp + status, next-fire summary, DLQ count, link to docs |
Click the src node on the Visual canvas — the right pane shows the full source-node IR (path, pattern, ordering, format, atomicity, waitPolicy). Click × in the top-right of the right pane to deselect.

3. Add transforms + validation

Two ways to add nodes:

a. Visual canvas Add-Node toolbar (one click)

Above the canvas there’s a row of buttons: **+ Transform / + Validate / + Enrich / + Branch / + Archive**. Click one — kaireon inserts a default node between the last upstream node and your target, repoints the target to consume the new node, and bumps the IR version. The new node is auto-selected so you can immediately edit its config in the right pane or the JSON IR tab. Defaults are intentionally placeholder (e.g. a transform with a single rename: from_field → to_field op) so you know exactly what to fix; see the sketch below.
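Concretely, clicking + Transform on the starter pipeline would produce something like this; the generated node id (t1) is an assumption, but the placeholder rename op is the documented default:

{ "id": "t1", "kind": "transform", "input": "src",
  "ops": [ { "type": "rename", "from": "from_field", "to": "to_field" } ] }

and tgt.input is repointed from "src" to "t1".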

b. JSON IR tab (full control)

i. Add a cast / rename / hash transform

Insert a transform node between source and target, then point tgt.input at the transform:
{ "id": "norm", "kind": "transform", "input": "src",
  "ops": [
    { "type": "cast",   "field": "amount",  "to": "decimal" },
    { "type": "rename", "from": "cust_id",  "to": "customer_id" },
    { "type": "hash",   "field": "ssn",     "algorithm": "sha256" }
  ] }
The IR schema accepts 16 transform types; the visual editor’s toolbar exposes the 11 most-used: cast, expression, rename, drop, filter, add_field, hash, mask_pii, map_values, split, merge. The remaining types (summarize, vector_embed, geo_resolve, sentiment_score, language_detect) are authored via the JSON IR tab or AI Pipeline Mode. See Transforms for the full reference.
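As an example of a JSON-only type, a vector_embed op might be authored like this; only type and field follow the op shape shown above, while model and outputField are hypothetical parameters, so check the Transforms reference for the real ones:

{ "id": "emb", "kind": "transform", "input": "norm",
  "ops": [ { "type": "vector_embed", "field": "description", "model": "<model-id>", "outputField": "description_vec" } ] }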

ii. Add row-level + dataset-level validation

{ "id": "v", "kind": "validate", "input": "norm",
  "datasetLevel": { "rowCount": { "min": 1, "onFail": "abort" } },
  "rowLevel": [
    { "id": "amt-positive", "field": "amount", "rule": "gt:0", "onFail": "quarantine" }
  ],
  "quarantine": { "table": "dlq.orders" } }

iii. Repoint the target + save

Change tgt.input to "v", click Save in the JSON IR tab. parsePipelineIR runs server-side; if the IR is invalid, the structured errors render verbatim on the page. On success the IR version pill bumps (v1 → v2). Or skip the JSON entirely and use the docked AI panel: “Add a sha256 hash on ssn and quarantine rows where amount ≤ 0” produces the same change as a proposal you can accept with one click.
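After this save, the nodes array wires src → norm → v → tgt end to end; condensed, with configs elided:

"nodes": [
  { "id": "src",  "kind": "source",    "connector": "local_fs", "config": { … } },
  { "id": "norm", "kind": "transform", "input": "src",  "ops": [ … ] },
  { "id": "v",    "kind": "validate",  "input": "norm", … },
  { "id": "tgt",  "kind": "target",    "input": "v",    "schema": "public.ds_orders", "loadMode": "append" }
]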

4. Run the pipeline

Hit Run now in the top bar. The status badge stays draft until the first run succeeds — then it flips to active.
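If you are scripting runs rather than clicking, the API presumably exposes a trigger as well; the route below is a hypothetical sketch modeled on the create call, not a documented endpoint:

# hypothetical endpoint; confirm against the API reference
curl -X POST https://playground.kaireonai.com/api/v1/pipelines/<pipeline-id>/run \
  -H "x-api-key: $API_KEY" \
  -H "x-tenant-id: $TENANT_ID"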

5. Inspect

  • Runs tab → click your run row → drawer shows per-node metrics (rowsIn/rowsOut/rowsLoaded)
  • Lineage tab → pick public.ds_orders → see your loaded rows with their _kaireon_lineage envelopes (sketched after this list); click a row to walk back through the IR
  • SQL Preview tab → see the exact INSERT … SELECT the runtime built from your column schema (also sketched below)
  • Errors tab → if any rows failed validation, they’re here with the rule id that caught them; Fix with AI drafts an IR change
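The _kaireon_lineage envelope itself isn’t specified in this walkthrough; a sketch of what a loaded row might carry, with all envelope field names and sample values illustrative:

{ "customer_id": "C-1042", "amount": 129.50,
  "_kaireon_lineage": { "pipeline": "orders-ingest", "runId": "<run-id>", "nodePath": ["src", "norm", "v", "tgt"] } }

Likewise, the SQL Preview for this pipeline would be an INSERT … SELECT into public.ds_orders; the staging relation name here is a placeholder:

INSERT INTO public.ds_orders (customer_id, amount, ssn)
SELECT customer_id, amount, ssn
FROM <staging-relation>;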

6. Schedule it

Switch to the Schedule tab. Pick cron, click “Daily at 09:00”, choose your timezone, watch the “Next 5 fires” preview, hit Save schedule. The IR gets a schedule field; the in-process scheduler (running every 60 seconds since server startup) picks it up automatically — no external cron, no extra service to deploy. See Flow Scheduler for tuning knobs and how to disable in-process scheduling if you bring your own orchestrator. Click Tick scheduler now in the same tab to fire any due pipelines immediately, useful right after saving a new schedule.
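The saved schedule lands in the IR as a field; a sketch of the shape, assuming a cron-style structure (key names are illustrative), where 0 9 * * * is the “Daily at 09:00” preset:

"schedule": { "type": "cron", "expression": "0 9 * * *", "timezone": "<your-timezone>" }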

Where to go next