## Topology

Each schedule runs as its own Kubernetes CronJob whose trigger container curls `/api/v1/cron/<name>` on the API service. Splitting this tier from the API replicas means a backed-up API pool cannot prevent the cron from firing, and a slow handler does not consume an API replica's RPS budget at the next tick (the CronJob's `concurrencyPolicy: Forbid` ensures a slow run is never joined by the next tick's run).
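A minimal sketch of one rendered CronJob, assuming an in-cluster Service named `api` and a POST trigger (the name and URL are illustrative; the chart's real template lives in `helm/templates/cronjobs.yaml`):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cron-cleanup                # illustrative name
spec:
  schedule: "30 4 * * *"
  concurrencyPolicy: Forbid         # a slow run delays, never overlaps, the next tick
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trigger
              image: curlimages/curl
              command: ["/bin/sh", "-c"]
              # The header is assembled inside the shell, so the secret never
              # appears in the rendered manifest or in `ps` output.
              args:
                - >-
                  curl -fsS -X POST
                  -H "Authorization: Bearer $CRON_SECRET"
                  http://api/api/v1/cron/cleanup
              env:
                - name: CRON_SECRET   # injected via secretKeyRef; see "Auth + secret handling"
                  valueFrom:
                    secretKeyRef:
                      name: api-secrets
                      key: CRON_SECRET
```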
## Configured jobs

Seven jobs ship in `helm/values.yaml`:
| Name | Default schedule | Endpoint |
|---|---|---|
| `cleanup` | `30 4 * * *` | `/api/v1/cron/cleanup` (DSAR/retention purge) |
| `engagementHealthRecompute` | `15 3 * * *` | `/api/v1/cron/engagement-health-recompute` |
| `gitopsDriftCheck` | `0 5 * * *` | `/api/v1/cron/gitops-drift-check` |
| `approvalsExpire` | `*/15 * * * *` | `/api/v1/cron/approvals-expire` |
| `flowSchedulerTick` | `* * * * *` | `/api/v1/cron/flow-scheduler-tick` |
| `scheduledRetrains` | `0 2 * * *` | `/api/v1/cron/scheduled-retrains` |
| `outboxReaper` | `*/2 * * * *` | `/api/v1/cron/outbox-reaper` (resets stuck outbox rows) |
Disable an individual job by setting `cron.schedules.<name>.enabled` to `false`. Disable the whole tier with `cron.enabled: false` (the existing in-process schedulers continue to work; you'd only lose the external trigger redundancy).
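For example, a values sketch using the documented knobs (`gitopsDriftCheck` stands in for any of the seven wired jobs):

```yaml
# helm/values.yaml overrides (sketch)
cron:
  enabled: true             # set false to remove the whole external trigger tier
  schedules:
    gitopsDriftCheck:
      enabled: false        # turn off a single job; the rest stay wired
```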
## Auth + secret handling

`CRON_SECRET` is read from the existing `api-secrets` Secret via a `secretKeyRef`; it is never rendered into the values file or the container args. The cron container builds the `Authorization: Bearer $CRON_SECRET` header inside its shell, so the secret never appears in `ps` output.
The cron container runs as non-root, drops all capabilities, and is restricted to a bare `curlimages/curl` image: no Node or Python runtime inside the trigger surface.
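A sketch of the matching pod-security fragment; `runAsNonRoot` and the capability drop restate the documented settings, while `allowPrivilegeEscalation` is an assumption:

```yaml
containers:
  - name: trigger
    image: curlimages/curl             # bare curl image, no extra runtimes
    securityContext:
      runAsNonRoot: true               # "runs as non-root"
      allowPrivilegeEscalation: false  # assumed hardening, not documented above
      capabilities:
        drop: ["ALL"]                  # "drops all capabilities"
```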
## Operational checklist

- Watch the cluster's `kube-system` events for backoff-limit-exceeded errors on any cron job; repeated failures usually mean the corresponding API handler is broken or the API service DNS is wrong.
- Set a tighter `activeDeadlineSeconds` (default 600s) for jobs whose handlers should never run that long.
- For staged rollouts, override `cron.schedules.<name>.schedule` per environment so non-prod runs less frequently (see the sketch after this list).
- History retention defaults to `successfulJobsHistoryLimit: 3` and `failedJobsHistoryLimit: 5`; bump them if you need more runs for forensics.
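A per-environment override sketch. Only `cron.schedules.<name>.schedule` is confirmed above; treating `activeDeadlineSeconds` and the history limits as sibling values keys is an assumption about the chart layout:

```yaml
# values-staging.yaml (sketch)
cron:
  schedules:
    scheduledRetrains:
      schedule: "0 2 * * 0"           # weekly in staging instead of nightly
      activeDeadlineSeconds: 120      # assumed per-schedule key: fail fast in non-prod
  successfulJobsHistoryLimit: 3       # assumed placement; defaults from above
  failedJobsHistoryLimit: 5
```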
## Operator-wired cron routes

Three `/api/v1/cron/*` handlers ship in the API image but are not wired into `helm/values.yaml`'s `cron.schedules` block. They exist because their cadence is policy-driven (retention windows, SIEM batch intervals, export checkpoints) rather than chart-driven, so the chart leaves the schedule choice to the operator.
All three accept the same `Authorization: Bearer $CRON_SECRET` shape as the wired routes. Each fails closed when `CRON_SECRET` is unset.
To run any of them, copy the wired-cron CronJob template from `helm/templates/cronjobs.yaml`, swap the `path:` value, and pick a schedule appropriate for your retention or SIEM-batch policy. No handler-side change is needed; the only thing the chart contributes to a wired job is the schedule plus the curl trigger.
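A sketch of the fields the operator edits, assuming everything else is copied verbatim from the wired-cron template (the name and schedule are illustrative):

```yaml
# Operator-added CronJob: the deltas against the copied template (sketch).
metadata:
  name: cron-dsar-purge        # illustrative, operator-managed name
spec:
  schedule: "0 3 * * *"        # policy-driven cadence: here, daily off-hours
  # ...remainder of the copied template unchanged, except its path: value,
  # which becomes /api/v1/cron/dsar-purge (or siem-ship / export-interactions).
```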
### `/api/v1/cron/dsar-purge`

Iterates every tenant, reads each tenant's per-class retention rows (skipping any row marked legal-hold), collapses to the strictest (smallest) `retentionDays` across data classes, and deletes rows older than that cutoff from the decision-trace, interaction-history, and AI-attachment tables. Attachment storage blobs are deleted best-effort through the configured attachment store; when the local-filesystem file or S3 object is already gone, the route logs and continues. Each tenant produces an `audit_log` row with the per-class purge counts. Response shape: `{ ok, tenantsScanned, totalBlobsAttempted, perTenant: [{ tenantId, retentionDays, decisionTraces, interactionHistory, aiAttachments, blobsDeleted, errors[] }] }`.
- Schedule: operator-supplied; no default ships in the chart.
- Recommended cadence: once per day, off-hours. The handler scales linearly with row count, so a tenant with a multi-million-row purge backlog benefits from running daily rather than weekly.
- Pre-flight: make sure every tenant that should be purged has a retention-policy row that is not on legal hold and that has a positive `retentionDays`. Tenants without a config are scanned but skipped (a zero-day retention short-circuits the per-tenant loop).
- See Retention for the per-class retention-policy schema and the legal-hold semantics.
### `/api/v1/cron/siem-ship`

Reads the last 5 minutes of audit-log rows (up to 500) and ships them to the SIEM backend selected by `SIEM_BACKEND` (`splunk`, `datadog`, or `elastic`); the SIEM configuration is parsed and validated at startup from the environment variables listed below. When `SIEM_BACKEND` is unset or holds an unknown value, the route returns `{ ok: true, skipped: "..." }` and is a structured no-op. Response shape: `{ ok, backend, shipped, errors, windowSeconds: 300, rowsScanned }`.
- Schedule: operator-supplied; no default ships in the chart.
- Recommended cadence: every 5 minutes, matching the hard-coded 5-minute window the handler reads. Slower ticks risk dropping the tail of the 500-row batch limit; faster ticks ship duplicate rows.
- Required environment variables (validated at startup):
  - `SIEM_BACKEND`: one of `splunk`, `datadog`, `elastic`.
  - `SIEM_ENDPOINT`: destination URL for the chosen backend.
  - `SIEM_API_KEY`: bearer/HEC token; backend-specific.
  - `SIEM_INDEX`: target index (Splunk / Elastic).
  - `SIEM_SOURCETYPE`: Splunk source-type tag.
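A deployment-env sketch for the Splunk backend; the endpoint, index, and sourcetype values are illustrative, and sourcing `SIEM_API_KEY` from the `api-secrets` Secret is an assumption:

```yaml
env:
  - name: SIEM_BACKEND
    value: splunk
  - name: SIEM_ENDPOINT
    value: https://splunk.example.com:8088/services/collector   # illustrative
  - name: SIEM_INDEX
    value: audit                 # illustrative target index
  - name: SIEM_SOURCETYPE
    value: _json                 # illustrative Splunk source-type tag
  - name: SIEM_API_KEY           # HEC token; keep it out of values files
    valueFrom:
      secretKeyRef:
        name: api-secrets        # assumed reuse of the existing Secret
        key: SIEM_API_KEY
```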
### `/api/v1/cron/export-interactions`

Hive-partitioned NDJSON export of the interaction-history table. For each active tenant, reads the export checkpoint's last-export-at timestamp, queries up to 10 000 newer interaction-history rows (the batch size is a fixed constant in the handler), writes them to `exports/{tenantId}/interaction_history/year=YYYY/month=MM/day=DD/batch-{ts}.json`, and advances the checkpoint. The first run starts at the Unix epoch (1970-01-01) and exports all existing rows. Response shape: `{ ok, totalExported, tenants: [{ tenantId, tenantName, exported, filePath, error? }] }`.
- Schedule: operator-supplied; no default ships in the chart.
- Recommended cadence: hourly. The 10 000-row batch limit means a tenant generating more than 240k interactions per day needs more than one tick per hour to keep up.
- Honest limit: the handler writes to the API pod's local filesystem, rooted at the process working directory. Production deployments must replace this with an S3 or GCS upload before flipping the schedule on; otherwise the export files are lost when the API pod restarts. A code comment at the top of the handler flags this explicitly as a production note recommending S3 uploads.
- The route requires `prisma.exportCheckpoint` rows to exist or be creatable for every tenant; the first run auto-creates a checkpoint per tenant.
## Legacy `/api/cron/tick`

The original once-per-minute tick that evaluates every active alert rule (Phase 02) and runs the report-schedule loop (Phase 03) for every tenant. Superseded by the `/api/v1/cron/*` tier for Kubernetes deployments; the wired-cron jobs above split the same work across narrower handlers with `concurrencyPolicy: Forbid` and isolated retries.
This route stays in the codebase only for the EventBridge pilot path, where a single AWS scheduled rule fires `POST /api/cron/tick` against the API service. See EventBridge Setup for that runbook. New Helm deployments should leave the legacy tick unscheduled and rely on the `cron.schedules` block instead.
Auth on the legacy route is wider: it accepts any of `x-cron-token`, `x-cron-secret`, or `Authorization: Bearer ...`, and treats either `CRON_TOKEN` or `CRON_SECRET` as a valid shared secret. The v1 tier accepts only `Authorization: Bearer $CRON_SECRET`.