Phase 6.6 closes the operational gaps identified across Phases 6.0–6.5.
Each item is a real correctness or observability fix, not a stylistic
sweep.
1. Advisory PG lock on scheduler tick
Problem: /api/v1/cron/flow-scheduler-tick was lock-free. An external
orchestrator that double-fires the endpoint (e.g. a Vercel Cron retry on
a transient timeout) would dispatch every due pipeline twice.
Fix: Wrap the tick in pg_try_advisory_lock(0x666c6f77). When the
lock is already held, the route returns 200 with skipped: true so the
orchestrator doesn’t treat it as an error.
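```ts
import { NextResponse } from "next/server";
import { prisma } from "@/lib/prisma"; // assumed location of the shared Prisma client

const ADVISORY_KEY = 0x666c6f77; // the ASCII bytes of "flow"

// Inside the tick route handler:
const lockRows = await prisma.$queryRawUnsafe<{ locked: boolean }[]>(
  `SELECT pg_try_advisory_lock($1::bigint) AS locked`,
  ADVISORY_KEY,
);
if (!lockRows[0]?.locked) {
  // Another tick holds the lock: answer 200 so the orchestrator does not treat it as a failure.
  return NextResponse.json({ skipped: true /* , ... */ }, { status: 200 });
}
try {
  /* sweep + dispatch */
} finally {
  // Release the lock even if the sweep throws.
  await prisma.$queryRawUnsafe(`SELECT pg_advisory_unlock($1::bigint)`, ADVISORY_KEY);
}
```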
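The advisory lock lives in PostgreSQL rather than in the application process: it is visible to every session connected to the same database, so multi-replica deployments that share that database are protected as well.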
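One caveat worth checking: pg_try_advisory_lock takes a session-level lock, so the lock belongs to whichever pooled connection ran the query. If the client pool sends the later pg_advisory_unlock to a different connection, the unlock returns false and the lock stays held until the original connection closes, which would make later ticks skip. A minimal sketch of an alternative, assuming Prisma interactive transactions, uses pg_try_advisory_xact_lock so the lock is released at commit or rollback on the same connection. The RawQueryClient type, withTickLock, and sweepAndDispatch are hypothetical names; this is not the shipped route.

```typescript
// Structural slice of a Prisma client / interactive-transaction client used here
// (assumption: Prisma's `tx` argument exposes $queryRawUnsafe with this shape).
interface RawQueryClient {
  $queryRawUnsafe<T = unknown>(query: string, ...values: unknown[]): Promise<T>;
}

type TickOutcome<R> = { skipped: true } | { skipped: false; result: R };

// Runs `work` only if this transaction acquires the advisory lock for `key`.
// pg_try_advisory_xact_lock is released automatically at COMMIT or ROLLBACK,
// so there is no separate unlock call that could run on another pooled connection.
async function withTickLock<R>(
  tx: RawQueryClient,
  key: number,
  work: () => Promise<R>,
): Promise<TickOutcome<R>> {
  const rows = await tx.$queryRawUnsafe<{ locked: boolean }[]>(
    `SELECT pg_try_advisory_xact_lock($1::bigint) AS locked`,
    key,
  );
  if (!rows[0]?.locked) return { skipped: true };
  return { skipped: false, result: await work() };
}

// Hypothetical usage inside the tick route (sweepAndDispatch is illustrative):
// const outcome = await prisma.$transaction(
//   (tx) => withTickLock(tx, ADVISORY_KEY, sweepAndDispatch),
//   { timeout: 60_000 }, // interactive transactions default to a 5 s timeout
// );
```

The trade-off is that the whole sweep runs inside one transaction, which pins a connection for its duration and needs a transaction timeout longer than the dispatch takes.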
2. Per-tenant scheduler dashboard at /data/scheduler
A read-only page (no write controls) listing every IR-native pipeline
that defines an ir.schedule:
| Column | Source |
|---|---|
| Pipeline name | pipelines.name |
| Schedule | ir.schedule (cron / interval / rrule) |
| Last run | pipelines.lastRunAt + lastRunStatus pill |
| Lag | Minutes between the most recent expected fire and the actual lastRunAt; green if under 5 min, amber otherwise |
| Next 5 fires | Computed via the next-fire-times helper |
API surface: GET /api/v1/scheduler-status. Tenant-scoped via requireTenant.
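The Lag and Next 5 fires columns reduce to simple time arithmetic. A minimal sketch, assuming an interval schedule with a fixed anchor time (cron and rrule schedules would need a real parser, and the next-fire-times helper mentioned above is not reproduced here); IntervalSchedule, lastExpectedFire, nextFires, and lagPill are hypothetical names.

```typescript
// Hypothetical interval-only schedule shape; cron and rrule would need a parser.
interface IntervalSchedule {
  startAt: Date; // first expected fire (assumed anchor)
  everyMinutes: number; // fixed interval between fires
}

const MS_PER_MIN = 60_000;
const LAG_GREEN_MINUTES = 5; // "green if under 5 min" from the table

// Most recent expected fire at or before `now`, or null if the schedule has not started.
function lastExpectedFire(s: IntervalSchedule, now: Date): Date | null {
  const elapsed = now.getTime() - s.startAt.getTime();
  if (elapsed < 0) return null;
  const step = s.everyMinutes * MS_PER_MIN;
  return new Date(s.startAt.getTime() + Math.floor(elapsed / step) * step);
}

// The next `count` expected fires strictly after `now`.
function nextFires(s: IntervalSchedule, now: Date, count = 5): Date[] {
  const step = s.everyMinutes * MS_PER_MIN;
  const last = lastExpectedFire(s, now);
  let t = last ? last.getTime() + step : s.startAt.getTime();
  const fires: Date[] = [];
  for (let i = 0; i < count; i++, t += step) fires.push(new Date(t));
  return fires;
}

// Minutes between the expected fire and the actual run, plus the pill color.
function lagPill(
  expected: Date,
  lastRunAt: Date,
): { minutes: number; color: "green" | "amber" } {
  const minutes = Math.abs(lastRunAt.getTime() - expected.getTime()) / MS_PER_MIN;
  return { minutes, color: minutes < LAG_GREEN_MINUTES ? "green" : "amber" };
}
```

For a 15-minute schedule anchored at midnight UTC, at 01:07 the most recent expected fire is 01:00 and the next five fires are 01:15 through 02:15; a run recorded at 01:02 shows a 2-minute lag (green).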
3. k6 baseline perf script at k6/flow/baseline.js
Run with:
```bash
k6 run k6/flow/baseline.js \
  --env BASE_URL=http://localhost:3000 \
  --env API_KEY=$KEY \
  --env TENANT_ID=$TENANT \
  --env PIPELINE_ID=$PIPELINE
```
SLO thresholds enforced in the script:
| Endpoint | p95 threshold |
|---|---|
| GET /api/v1/pipelines?limit=20 | <400ms |
| GET /api/v1/pipelines/:id/ir | <400ms |
| GET /api/v1/pipelines/:id/sql-preview | <800ms (info_schema lookup) |
| GET /api/v1/lineage | <800ms (hits actual target table) |
Global threshold: http_req_failed: rate<0.01 (under 1% errors).
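The SLO table maps naturally onto k6 thresholds on tagged sub-metrics. A minimal sketch of an `options` object, assuming each request in the script carries a `name` tag; the tag values (pipelines_list, pipeline_ir, sql_preview, lineage) are hypothetical, and the shipped baseline.js may organize this differently. The type annotations are the only TypeScript-specific part; strip them if your k6 version does not accept TypeScript.

```typescript
// Per-endpoint p95 budgets (ms) from the SLO table. Tag names are hypothetical;
// each request would carry a matching tag, e.g. http.get(url, { tags: { name: "lineage" } }).
const P95_BUDGET_MS: Record<string, number> = {
  pipelines_list: 400, // GET /api/v1/pipelines?limit=20
  pipeline_ir: 400, // GET /api/v1/pipelines/:id/ir
  sql_preview: 800, // GET /api/v1/pipelines/:id/sql-preview
  lineage: 800, // GET /api/v1/lineage
};

// Global error budget: under 1% failed requests.
const thresholds: Record<string, string[]> = {
  http_req_failed: ["rate<0.01"],
};

// One tagged sub-metric threshold per endpoint, e.g. "http_req_duration{name:lineage}".
for (const [name, ms] of Object.entries(P95_BUDGET_MS)) {
  thresholds[`http_req_duration{name:${name}}`] = [`p(95)<${ms}`];
}

// k6 reads thresholds from the exported `options` object and exits non-zero if any fails.
export const options = { thresholds };
```

Keeping the budgets in one map means the table above and the script stay easy to compare when a budget changes.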
Honest limits deferred beyond Phase 6.6
| Item | Why deferred |
|---|---|
| Per-partition lag metrics for streaming consumers | Requires a real broker; gated by FLOW_STREAMING_ENABLED |
| trigger.file_arrival.deadline enforcement | Requires a source-watcher service, which is not yet built |
| Run-over-run trend chart on the per-node metrics drawer | Frontend chart component not yet chosen |
| Selective replay-from-DLQ runtime (replayDlqOnly flag) | Source executor needs a row-key filter capability |
| Visual DAG highlight on lineage row click | React Flow custom edge highlighting; polish plan |
| Drag-add-edge / right-click delete on Visual canvas | Polish plan |
| Editable per-node config form in right pane | Polish plan |
| DLQ surfacing for non-validate failures (.failed/ folder rows) | Source executor needs to emit a failure-folder index |