Phase 6.6 closes the operational gaps identified across Phases 6.0–6.5. Each item is a real correctness or observability fix, not a stylistic sweep.

1. Advisory PG lock on scheduler tick

Problem: /api/v1/cron/flow-scheduler-tick was lock-free. An external orchestrator that double-fires the endpoint (e.g. a Vercel Cron retry after a transient timeout) would dispatch every due pipeline twice.

Fix: Wrap the tick in pg_try_advisory_lock(0x666c6f77); the key is the ASCII bytes of "flow". If another session already holds the lock, the route returns 200 with skipped: true, so the orchestrator doesn't treat the skipped tick as an error.
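```typescript
import { NextResponse } from "next/server";

// Lock key: the ASCII bytes of "flow" read as a 32-bit integer.
const ADVISORY_KEY = 0x666c6f77;

// Inside the tick route handler (`prisma` is the app's PrismaClient instance).
// pg_try_advisory_lock never blocks: it returns false if the key is already held.
const lockRows = await prisma.$queryRawUnsafe<{ locked: boolean }[]>(
  `SELECT pg_try_advisory_lock($1::bigint) AS locked`,
  ADVISORY_KEY,
);
if (!lockRows[0]?.locked) {
  // A concurrent tick owns the lock: answer 200 so the orchestrator doesn't treat it as an error.
  return NextResponse.json({ skipped: true /* , ... */ }, { status: 200 });
}
try { /* sweep + dispatch */ } finally {
  // Release even if the sweep throws.
  await prisma.$queryRawUnsafe(`SELECT pg_advisory_unlock($1::bigint)`, ADVISORY_KEY);
}
```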
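An advisory lock is not tied to a single application process. It is shared across the whole database: every session connected to the same PostgreSQL database contends for the same key. As a result, multi-replica deployments that share the database are covered too.

There is one caveat. A session-level advisory lock belongs to the connection that acquired it, and pg_advisory_unlock only releases it when called from that same connection. Behind a connection pool (or PgBouncer in transaction mode), the lock and unlock calls must therefore be pinned to a single connection.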
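The skip-on-contention pattern can be exercised without a database. The TypeScript sketch below is a minimal illustration, not the project's code. It swaps PostgreSQL for a hypothetical in-memory TryLock, and runTick, InMemoryTryLock and TickResult are all illustrative names. It also shows that 0x666c6f77 is simply "flow" read as a big-endian 32-bit integer.

```typescript
// Hypothetical abstraction over pg_try_advisory_lock / pg_advisory_unlock.
interface TryLock {
  tryAcquire(key: number): Promise<boolean>;
  release(key: number): Promise<void>;
}

// In-memory stand-in for the PostgreSQL lock (illustration only).
class InMemoryTryLock implements TryLock {
  private readonly held = new Set<number>();

  async tryAcquire(key: number): Promise<boolean> {
    if (this.held.has(key)) return false; // contended: never waits
    this.held.add(key);
    return true;
  }

  async release(key: number): Promise<void> {
    this.held.delete(key);
  }
}

// "flow" packed big-endian: 0x66 'f', 0x6c 'l', 0x6f 'o', 0x77 'w' => 0x666c6f77.
const ADVISORY_KEY = "flow"
  .split("")
  .reduce((acc, ch) => acc * 256 + ch.charCodeAt(0), 0);

interface TickResult {
  skipped: boolean;
  dispatched: number;
}

// Run one scheduler tick, or report skipped if another tick holds the lock.
async function runTick(
  lock: TryLock,
  dispatchDue: () => Promise<number>, // returns how many pipelines were dispatched
): Promise<TickResult> {
  if (!(await lock.tryAcquire(ADVISORY_KEY))) {
    return { skipped: true, dispatched: 0 };
  }
  try {
    return { skipped: false, dispatched: await dispatchDue() };
  } finally {
    await lock.release(ADVISORY_KEY); // always release, even if dispatch throws
  }
}
```

In this sketch, two concurrent runTick calls against the same lock produce one result with skipped: false and one with skipped: true. That mirrors what a double-fired cron request sees. Once the first tick finishes, the lock is free again for the next one.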

2. Per-tenant scheduler dashboard at /data/scheduler

A read-only page (no write controls) that lists every IR-native pipeline with an ir.schedule:
| Column | Source |
|---|---|
| Pipeline name | pipelines.name |
| Schedule | ir.schedule (cron / interval / rrule) |
| Last run | pipelines.lastRunAt + lastRunStatus pill |
| Lag | Minutes between the most recent expected fire and the actual lastRunAt; green if under 5 min, amber otherwise |
| Next 5 fires | Computed via the next-fire-times helper |
API surface: GET /api/v1/scheduler-status, tenant-scoped via requireTenant.
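For interval schedules, the Lag and Next 5 fires columns reduce to simple date arithmetic. The sketch below covers only that case; cron and rrule schedules need a proper parser and are out of scope here.

It rests on several assumptions, none taken from the actual next-fire-times helper:

- the function names are hypothetical;
- an interval schedule is modeled as firing at anchor + k × interval;
- lag is taken as the absolute gap between the expected fire and lastRunAt.

```typescript
// Hypothetical helpers for interval schedules only.
// Assumption: the schedule fires at anchor + k * intervalMs for k = 0, 1, 2, ...
const MINUTE_MS = 60_000;
const LAG_AMBER_THRESHOLD_MIN = 5; // green below 5 min, amber otherwise

type LagColor = "green" | "amber";

// Most recent expected fire at or before `now`, or null if the schedule hasn't started.
function mostRecentExpectedFire(anchor: Date, intervalMs: number, now: Date): Date | null {
  const elapsed = now.getTime() - anchor.getTime();
  if (elapsed < 0) return null;
  const k = Math.floor(elapsed / intervalMs);
  return new Date(anchor.getTime() + k * intervalMs);
}

// The next `count` fire times strictly after `now`.
function nextFires(anchor: Date, intervalMs: number, now: Date, count = 5): Date[] {
  const last = mostRecentExpectedFire(anchor, intervalMs, now);
  const first = last === null ? anchor.getTime() : last.getTime() + intervalMs;
  return Array.from({ length: count }, (_, i) => new Date(first + i * intervalMs));
}

// Assumption: lag is the absolute gap, in minutes, between expected and actual run.
function lagMinutes(expectedFire: Date, lastRunAt: Date): number {
  return Math.abs(lastRunAt.getTime() - expectedFire.getTime()) / MINUTE_MS;
}

function lagColor(lagMin: number): LagColor {
  return lagMin < LAG_AMBER_THRESHOLD_MIN ? "green" : "amber";
}
```

Consider a 15-minute schedule anchored at midnight, viewed at 01:07. The most recent expected fire is 01:00, and the next five fires are 01:15 through 02:15. A run recorded at 01:02 shows green, while one recorded at 01:09 shows amber.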

3. k6 baseline perf script at k6/flow/baseline.js

Run with:
```bash
k6 run k6/flow/baseline.js \
  --env BASE_URL=http://localhost:3000 \
  --env API_KEY=$KEY \
  --env TENANT_ID=$TENANT \
  --env PIPELINE_ID=$PIPELINE
```
SLO thresholds enforced in the script:
| Endpoint | p95 |
|---|---|
| GET /api/v1/pipelines?limit=20 | < 400 ms |
| GET /api/v1/pipelines/:id/ir | < 400 ms |
| GET /api/v1/pipelines/:id/sql-preview | < 800 ms (information_schema lookup) |
| GET /api/v1/lineage | < 800 ms (hits the actual target table) |
Global threshold: http_req_failed: rate<0.01 (under 1% errors).
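The SLO table maps onto k6's thresholds option. In k6, a per-endpoint p95 bound is expressed as a threshold on a tagged sub-metric of http_req_duration.

The TypeScript sketch below builds that object from the table. It assumes each request carries a hypothetical endpoint tag. The tag values and the buildThresholds and SLOS names are illustrative and not taken from k6/flow/baseline.js.

```typescript
// SLOs from the table above; `tag` is a hypothetical request tag value,
// and `path` is kept only so the mapping stays readable.
interface Slo {
  tag: string;
  path: string;
  p95Ms: number;
}

const SLOS: Slo[] = [
  { tag: "pipelines_list", path: "/api/v1/pipelines?limit=20", p95Ms: 400 },
  { tag: "pipeline_ir", path: "/api/v1/pipelines/:id/ir", p95Ms: 400 },
  { tag: "sql_preview", path: "/api/v1/pipelines/:id/sql-preview", p95Ms: 800 },
  { tag: "lineage", path: "/api/v1/lineage", p95Ms: 800 },
];

// Build k6 `options.thresholds`: one p95 bound per tagged endpoint
// plus the global error-rate bound (rate < 0.01, i.e. under 1% errors).
function buildThresholds(slos: Slo[], maxErrorRate = 0.01): Record<string, string[]> {
  const thresholds: Record<string, string[]> = {
    http_req_failed: [`rate<${maxErrorRate}`],
  };
  for (const slo of slos) {
    thresholds[`http_req_duration{endpoint:${slo.tag}}`] = [`p(95)<${slo.p95Ms}`];
  }
  return thresholds;
}
```

In a k6 script, the result would be exported as options = { thresholds: buildThresholds(SLOS) }. Each request then passes its matching tag, for example http.get(url, { tags: { endpoint: "lineage" } }), so that k6 evaluates the per-endpoint sub-metric rather than one pooled p95.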

Honest limits deferred beyond Phase 6.6

| Item | Why deferred |
|---|---|
| Per-partition lag metrics for streaming consumers | Requires a real broker; gated by FLOW_STREAMING_ENABLED |
| trigger.file_arrival.deadline enforcement | Requires a source-watcher service, which is not yet built |
| Run-over-run trend chart on the per-node metrics drawer | Frontend chart component not yet picked |
| Selective replay-from-DLQ runtime (replayDlqOnly flag) | Source executor needs row-key filter capability |
| Visual DAG highlight on lineage row click | React Flow custom edge highlighting; polish plan |
| Drag-add-edge / right-click delete on Visual canvas | Polish plan |
| Editable per-node config form in right pane | Polish plan |
| DLQ surfacing for non-validate failures (.failed/ folder rows) | Source executor needs to emit a failure-folder index |