Topology
Each job is a Kubernetes CronJob that curls the API service at
/api/v1/cron/<name>. Splitting this from the API replicas means a
backed-up API pool cannot prevent the cron from firing, and a slow
handler does not consume an API replica’s RPS budget for the next tick
(the CronJob’s concurrencyPolicy: Forbid ensures this).
Configured jobs
Seven jobs are defined in helm/values.yaml:
| Name | Default schedule | Endpoint |
|---|---|---|
| cleanup | 30 4 * * * | /api/v1/cron/cleanup (DSAR/retention purge) |
| engagementHealthRecompute | 15 3 * * * | /api/v1/cron/engagement-health-recompute |
| gitopsDriftCheck | 0 5 * * * | /api/v1/cron/gitops-drift-check |
| approvalsExpire | */15 * * * * | /api/v1/cron/approvals-expire |
| flowSchedulerTick | * * * * * | /api/v1/cron/flow-scheduler-tick |
| scheduledRetrains | 0 2 * * * | /api/v1/cron/scheduled-retrains |
| outboxReaper | */2 * * * * | /api/v1/cron/outbox-reaper (resets stuck outbox rows) |
Disable an individual job by setting cron.schedules.<name>.enabled
to false. Disable the whole tier with cron.enabled: false (the
existing in-process schedulers continue to work; you’d lose the
external trigger redundancy).
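For example, per-job disablement and a staging schedule override might look like this in helm/values.yaml (the documented keys are cron.enabled, cron.schedules.<name>.enabled, and cron.schedules.<name>.schedule; the surrounding nesting is a sketch):

```yaml
cron:
  enabled: true                 # set false to remove the whole trigger tier
  schedules:
    gitopsDriftCheck:
      enabled: false            # stop just this CronJob from rendering
    flowSchedulerTick:
      schedule: "*/5 * * * *"   # non-prod override: every 5 min instead of every minute
```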
Auth + secret handling
CRON_SECRET is read from the existing api-secrets Secret via a
secretKeyRef — it is never rendered into the values file or the
container args. The cron container builds the Authorization: Bearer $CRON_SECRET header inside its shell so the secret never appears in
ps output.
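A minimal sketch of that in-shell header construction (the fallback value and the commented endpoint are placeholders; the real command lives in the chart’s CronJob template):

```shell
#!/bin/sh
# CRON_SECRET arrives as an env var via secretKeyRef, so the expanded
# header only ever exists inside this shell process, never in the
# container args that `ps` can see.
CRON_SECRET="${CRON_SECRET:-example-secret}"   # placeholder fallback for illustration
AUTH_HEADER="Authorization: Bearer ${CRON_SECRET}"
echo "${AUTH_HEADER}"
# A real trigger then runs something like:
#   curl -fsS -X POST -H "${AUTH_HEADER}" "http://api-service/api/v1/cron/cleanup"
```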
The cron container runs as non-root, drops all capabilities, and is
restricted to a bare curlimages/curl image — no node/python runtimes
inside the trigger surface.
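That hardening corresponds to a container spec along these lines (a sketch, not the chart’s literal template; the image tag and Secret key name are assumptions):

```yaml
containers:
  - name: trigger
    image: curlimages/curl:8.7.1       # pinned tag is illustrative
    securityContext:
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
    env:
      - name: CRON_SECRET
        valueFrom:
          secretKeyRef:
            name: api-secrets          # the existing Secret named above
            key: CRON_SECRET           # assumed key name
```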
Operational checklist
- Watch kube-system events for BackoffLimitExceeded on any cron job — repeated failures usually mean the corresponding API handler is broken or the API service DNS is wrong.
- Set tighter activeDeadlineSeconds (default 600s) for jobs whose handlers should never run that long.
- For staged rollouts, override cron.schedules.<name>.schedule per environment so non-prod runs less frequently.
- Cleanup history retention: successfulJobsHistoryLimit: 3, failedJobsHistoryLimit: 5 — bump if you need more for forensics.
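Tuning those knobs might look like this (key paths are illustrative; check helm/values.yaml for where the chart actually surfaces these CronJob spec fields):

```yaml
cron:
  activeDeadlineSeconds: 120        # kill any trigger pod that outlives its handler's budget
  successfulJobsHistoryLimit: 3     # chart default
  failedJobsHistoryLimit: 10        # bumped from the default 5 for forensics
```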
Operator-wired cron routes
Three /api/v1/cron/* handlers ship in the API image but are not
wired into helm/values.yaml’s cron.schedules block. They exist
because their cadence is policy-driven (retention windows, SIEM batch
intervals, export checkpoints) rather than chart-driven, so the chart
leaves the schedule choice to the operator.
All three accept the same Authorization: Bearer $CRON_SECRET shape
as the wired routes (platform/src/app/api/v1/cron/dsar-purge/route.ts:43-52,
siem-ship/route.ts:18-23, export-interactions/route.ts:28-38).
Each fails closed when CRON_SECRET is unset.
To run any of them, copy the wired-cron CronJob template from
helm/templates/cronjobs.yaml, swap the path: value, and pick a
schedule appropriate for your retention or SIEM-batch policy. No
handler-side change is needed — the only thing the chart contributes
to a wired job is the schedule + the curl trigger.
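A sketch of what that copied CronJob could look like for dsar-purge (the metadata name, service DNS, and Secret key are placeholders; the schedule, concurrencyPolicy: Forbid, and curl-trigger pattern follow the documented template):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cron-dsar-purge
spec:
  schedule: "0 3 * * *"              # operator-chosen: daily, off-hours
  concurrencyPolicy: Forbid          # same no-overlap guarantee as the wired jobs
  jobTemplate:
    spec:
      activeDeadlineSeconds: 600
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trigger
              image: curlimages/curl
              env:
                - name: CRON_SECRET
                  valueFrom:
                    secretKeyRef:
                      name: api-secrets
                      key: CRON_SECRET   # assumed key name
              command: ["/bin/sh", "-c"]
              args:
                - >
                  curl -fsS -X POST
                  -H "Authorization: Bearer ${CRON_SECRET}"
                  "http://api-service/api/v1/cron/dsar-purge"
```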
/api/v1/cron/dsar-purge
Source: platform/src/app/api/v1/cron/dsar-purge/route.ts. Iterates
every tenant, reads each tenant’s RetentionConfig rows (legalHold: false only), collapses to the strictest (smallest) retentionDays
across data classes, and deletes rows older than that cutoff from
DecisionTrace, InteractionHistory, and AiAttachment. Attachment
storage blobs are deleted best-effort via getAttachmentStore() — when
the local-fs file or S3 object is already gone the route logs +
continues. Each tenant produces an audit_log row with the per-class
purge counts. The response shape is { ok, tenantsScanned, totalBlobsAttempted, perTenant: [{ tenantId, retentionDays, decisionTraces, interactionHistory, aiAttachments, blobsDeleted, errors[] }] }.
- Schedule: operator-supplied; no default ships in the chart.
- Recommended cadence: once per day, off-hours. The handler scales linearly with row count so a tenant with multi-million-row purge backlogs benefits from running daily rather than weekly.
- Pre-flight: make sure every tenant that should be purged has a RetentionConfig row with legalHold: false and retentionDays > 0. Tenants without a config are scanned but skipped (retentionDays = 0 short-circuits the per-tenant loop).
- See Retention for the per-class RetentionConfig schema and the legal-hold semantics.
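The “strictest (smallest) retentionDays wins” collapse is simple to illustrate (the per-class values here are hypothetical for one tenant):

```shell
# Hypothetical retentionDays from one tenant's RetentionConfig rows
# (legalHold: false rows only, mirroring the handler's filter).
RETENTION_DAYS="90 30 365"
# The handler keeps the smallest window across data classes:
STRICTEST=$(printf '%s\n' ${RETENTION_DAYS} | sort -n | head -n 1)
echo "purge cutoff: rows older than ${STRICTEST} days"
```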
/api/v1/cron/siem-ship
Source: platform/src/app/api/v1/cron/siem-ship/route.ts. Reads the
last 5 minutes of AuditLog rows (up to 500) and ships them to the
SIEM backend selected by SIEM_BACKEND (splunk | datadog |
elastic per platform/src/lib/audit/sink.ts:23 and the
getSiemConfigFromEnv() validator at audit/sink.ts:49-65). When
SIEM_BACKEND is unset or holds an unknown value the route returns
{ ok: true, skipped: "..." } and is a structured no-op. Response
shape: { ok, backend, shipped, errors, windowSeconds: 300, rowsScanned }.
- Schedule: operator-supplied; no default ships in the chart.
- Recommended cadence: every 5 minutes — matches the hard-coded 5-minute window the handler reads. Slower ticks risk dropping the tail of the 500-row batch limit; faster ticks ship duplicate rows.
- Required env vars (read by getSiemConfigFromEnv() at audit/sink.ts:49-65):
  - SIEM_BACKEND — one of splunk, datadog, elastic.
  - SIEM_ENDPOINT — destination URL for the chosen backend.
  - SIEM_API_KEY — bearer/HEC token; backend-specific.
  - SIEM_INDEX — target index (Splunk / Elastic).
  - SIEM_SOURCETYPE — Splunk source-type tag.
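A Splunk-flavored example of those variables as container env (the endpoint, index, sourcetype, and Secret name are placeholders):

```yaml
env:
  - name: SIEM_BACKEND
    value: "splunk"
  - name: SIEM_ENDPOINT
    value: "https://hec.splunk.example.com:8088/services/collector"
  - name: SIEM_API_KEY
    valueFrom:
      secretKeyRef:
        name: siem-secrets      # hypothetical Secret holding the HEC token
        key: hec-token
  - name: SIEM_INDEX
    value: "platform_audit"
  - name: SIEM_SOURCETYPE
    value: "audit_log"
```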
/api/v1/cron/export-interactions
Source: platform/src/app/api/v1/cron/export-interactions/route.ts.
Hive-partitioned NDJSON export of interaction_history. For each
active tenant, reads export_checkpoints.lastExportAt, queries up to
10 000 newer interactionHistory rows (the BATCH_SIZE constant at
export-interactions/route.ts:24), writes them to
exports/{tenantId}/interaction_history/year=YYYY/month=MM/day=DD/batch-{ts}.json,
and advances the checkpoint. First run starts at epoch (1970-01-01)
and exports all existing rows. Response shape: { ok, totalExported, tenants: [{ tenantId, tenantName, exported, filePath, error? }] }.
- Schedule: operator-supplied; no default ships in the chart.
- Recommended cadence: hourly. The 10 000-row batch limit means a tenant generating > 240k interactions per day needs more than one tick per hour to keep up.
- Honest limit: the handler writes to the API pod’s local filesystem (fs.writeFile) — the file path is path.resolve(process.cwd(), relativePath). Production deployments must replace this with an S3 / GCS upload before flipping the schedule on; otherwise the export files are lost when the API pod restarts. The handler comment at export-interactions/route.ts:12 flags this as Production note: Replace fs writes with S3 uploads via @aws-sdk/client-s3.
- The route requires prisma.exportCheckpoint rows to exist or be creatable for every tenant; the first run auto-creates a checkpoint per tenant.
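The Hive-style partition layout is easy to reproduce; a sketch assuming UTC dates and an epoch-seconds batch timestamp (the tenant id and timestamp are placeholders, and the real handler may format the timestamp differently):

```shell
TENANT_ID="acme"
TS=1735689600   # hypothetical batch timestamp: 2025-01-01T00:00:00Z
# GNU date; on BSD/macOS use `date -u -r "$TS" ...` instead.
DAY_PATH=$(date -u -d "@${TS}" +"year=%Y/month=%m/day=%d")
echo "exports/${TENANT_ID}/interaction_history/${DAY_PATH}/batch-${TS}.json"
```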
Legacy /api/cron/tick
Source: platform/src/app/api/cron/tick/route.ts. The original
once-per-minute tick that runs evaluateAllAlertRules (Phase 02) and
the report-schedule loop (Phase 03) for every tenant. Superseded by the
/api/v1/cron/* tier for Kubernetes deployments — the wired-cron jobs
above split the same work across narrower handlers with concurrencyPolicy: Forbid and isolated retries.
This route stays in the codebase only for the EventBridge pilot path
where a single AWS scheduled rule fires POST /api/cron/tick against
the API service. See EventBridge Setup
for that runbook. New Helm deployments should leave the legacy tick
unscheduled and rely on the cron.schedules block instead.
Auth on the legacy route is wider — it accepts any of x-cron-token,
x-cron-secret, or Authorization: Bearer ..., and reads
CRON_TOKEN || CRON_SECRET (api/cron/tick/route.ts:62-72). The
v1 tier accepts only Authorization: Bearer $CRON_SECRET.
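The legacy fallback read can be mimicked in shell (values are placeholders; the route does this in TypeScript as CRON_TOKEN || CRON_SECRET):

```shell
# Legacy route: CRON_TOKEN wins when set, else CRON_SECRET.
# ${VAR:-fallback} also falls through on an empty string, matching JS ||.
CRON_SECRET="secret-from-api-secrets"
ACCEPTED="${CRON_TOKEN:-$CRON_SECRET}"
echo "legacy tick accepts: ${ACCEPTED}"
# The v1 tier ignores CRON_TOKEN entirely and checks only the Bearer header.
```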