Topology

CronJob (curlimages/curl)  →  curl POST  →  API Service (in-cluster)
                                              └─ /api/v1/cron/<name> handler
The CronJob does not run business logic. It is a schedule-aware HTTP trigger; the work itself runs inside the API tier handler at /api/v1/cron/<name>. Running the trigger outside the API replicas means a backed-up API pool cannot stop the schedule from firing. Because the CronJob sets concurrencyPolicy: Forbid, a slow handler also never has a second request from the next tick stacked on top of it: Kubernetes skips a new run while the previous one is still active.
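The trigger side is deliberately thin. The real job is a shell curl call, not Node code, so what follows is only a rough TypeScript model of what that call does. It builds the POST for a named route with the bearer header and treats any non-2xx response as a failed run, assuming the container exits non-zero on HTTP errors (for example via curl's --fail flag) so the Job's retry accounting applies. The base URL `http://api:3000`, the function names, and the injected `fetchImpl` are hypothetical.

```typescript
// Hypothetical TypeScript model of the CronJob's curl trigger.
// The real chart runs curl in a shell; every name here is illustrative.

interface CronRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string> };
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

// Build the POST for /api/v1/cron/<name>. The secret is passed in; in the
// cluster it arrives as an env var populated from a secretKeyRef.
function buildCronRequest(baseUrl: string, name: string, secret: string): CronRequest {
  if (!/^[a-z0-9-]+$/.test(name)) {
    throw new Error(`invalid cron route name: ${name}`);
  }
  if (secret.length === 0) {
    throw new Error("CRON_SECRET is empty; refusing to send an unauthenticated trigger");
  }
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/cron/${name}`,
    init: { method: "POST", headers: { Authorization: `Bearer ${secret}` } },
  };
}

// Fire the trigger. A non-2xx status becomes a thrown error, mirroring
// `curl --fail` exiting non-zero so the Job counts the attempt as failed.
async function triggerCron(
  fetchImpl: FetchLike,
  baseUrl: string,
  name: string,
  secret: string,
): Promise<number> {
  const req = buildCronRequest(baseUrl, name, secret);
  const res = await fetchImpl(req.url, req.init);
  if (!res.ok) {
    throw new Error(`cron ${name} failed with HTTP ${res.status}`);
  }
  return res.status;
}
```

On Node 18 or later, `globalThis.fetch` could be passed as `fetchImpl`; injecting it keeps the model testable without a cluster.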

Configured jobs

Seven jobs in helm/values.yaml:
Name                        Default schedule   Endpoint
cleanup                     30 4 * * *         /api/v1/cron/cleanup (DSAR/retention purge)
engagementHealthRecompute   15 3 * * *         /api/v1/cron/engagement-health-recompute
gitopsDriftCheck            0 5 * * *          /api/v1/cron/gitops-drift-check
approvalsExpire             */15 * * * *       /api/v1/cron/approvals-expire
flowSchedulerTick           * * * * *          /api/v1/cron/flow-scheduler-tick
scheduledRetrains           0 2 * * *          /api/v1/cron/scheduled-retrains
outboxReaper                */2 * * * *        /api/v1/cron/outbox-reaper (resets stuck outbox rows)
Disable any individual job by setting cron.schedules.<name>.enabled to false. Disable the whole tier with cron.enabled: false; the existing in-process schedulers continue to work, but you lose the external-trigger redundancy.
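The two switches compose as a simple AND: a job renders only when the tier is on and its own entry is on. The following is a hypothetical TypeScript model of that gating, not the chart's template logic; it models only the enabled and schedule keys named on this page.

```typescript
// Hypothetical model of how cron.enabled and cron.schedules.<name>.enabled
// combine; only the keys named on this page are modeled.
interface ScheduleEntry {
  enabled: boolean;
  schedule: string; // 5-field cron expression, e.g. "*/2 * * * *"
}

interface CronValues {
  enabled: boolean;                         // cron.enabled
  schedules: Record<string, ScheduleEntry>; // cron.schedules.<name>
}

// Names of the CronJobs that would be rendered, sorted for stable output.
function renderedJobs(cron: CronValues): string[] {
  // Whole tier off: no external triggers (in-process schedulers keep running).
  if (!cron.enabled) return [];
  return Object.keys(cron.schedules)
    .filter((name) => cron.schedules[name].enabled)
    .sort();
}
```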

Auth + secret handling

CRON_SECRET is read from the existing api-secrets Secret via a secretKeyRef — it is never rendered into the values file or the container args. The cron container builds the Authorization: Bearer $CRON_SECRET header inside its shell so the secret never appears in ps output. The cron container runs as non-root, drops all capabilities, and is restricted to a bare curlimages/curl image — no node/python runtimes inside the trigger surface.

Operational checklist

  • Watch Job events in the namespace the chart is installed into for BackoffLimitExceeded on any cron job. Repeated failures usually mean the corresponding API handler is broken or the API service DNS name is wrong.
  • Set a tighter activeDeadlineSeconds (default 600s) on jobs whose handlers should never run that long.
  • For staged rollouts, override cron.schedules.<name>.schedule per environment so non-prod runs less frequently.
  • Job history retention defaults to successfulJobsHistoryLimit: 3 and failedJobsHistoryLimit: 5; raise these if you need more history for forensics.
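The activeDeadlineSeconds and concurrencyPolicy: Forbid settings interact: if a run can last longer than the gap between ticks, the ticks that come due while it is still active are skipped. The following is a rough TypeScript check for that case. It is a hypothetical helper, not the Kubernetes cron parser: it supports only the `*`, `*/n`, and single-number minute and hour forms used by the default schedules and rejects anything in the day-of-month, month, or day-of-week fields.

```typescript
// Hypothetical helper: minimal matcher for "*", "*/n", and "N" in the minute
// and hour fields only (both ranges start at 0, so "*/n" means value % n === 0).
function fieldMatches(field: string, value: number): boolean {
  if (field === "*") return true;
  const step = /^\*\/(\d+)$/.exec(field);
  if (step) {
    const n = Number(step[1]);
    if (n <= 0) throw new Error(`invalid step: ${field}`);
    return value % n === 0;
  }
  if (/^\d+$/.test(field)) return value === Number(field);
  throw new Error(`unsupported cron field: ${field}`);
}

// Firing times for one day, as minutes after midnight.
function dailyFirings(expr: string): number[] {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error(`expected 5 fields: ${expr}`);
  const [minute, hour, dom, month, dow] = fields;
  if (dom !== "*" || month !== "*" || dow !== "*") {
    throw new Error("this sketch only handles schedules that repeat daily");
  }
  const out: number[] = [];
  for (let h = 0; h < 24; h++) {
    for (let m = 0; m < 60; m++) {
      if (fieldMatches(hour, h) && fieldMatches(minute, m)) out.push(h * 60 + m);
    }
  }
  return out;
}

// Smallest gap between consecutive firings in seconds, wrapping past midnight.
function minIntervalSeconds(expr: string): number {
  const f = dailyFirings(expr);
  if (f.length === 0) throw new Error(`schedule never fires: ${expr}`);
  let min = f[0] + 1440 - f[f.length - 1]; // last firing today -> first tomorrow
  for (let i = 1; i < f.length; i++) min = Math.min(min, f[i] - f[i - 1]);
  return min * 60;
}

// Worst case under Forbid: a run that lasts until its deadline swallows every
// tick that comes due strictly before that deadline.
function maxSkippedTicks(expr: string, activeDeadlineSeconds: number): number {
  const interval = minIntervalSeconds(expr);
  return Math.max(0, Math.ceil(activeDeadlineSeconds / interval) - 1);
}
```

By this estimate, flowSchedulerTick with the 600s default could lose up to nine consecutive ticks to a single hung run (`maxSkippedTicks("* * * * *", 600)` returns 9), while the once-daily jobs lose none.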

Operator-wired cron routes

Three /api/v1/cron/* handlers ship in the API image but are not wired into helm/values.yaml’s cron.schedules block. They are left unwired because their cadence is policy-driven (retention windows, SIEM batch intervals, export checkpoints) rather than chart-driven, so the chart leaves the schedule choice to the operator. All three accept the same Authorization: Bearer $CRON_SECRET shape as the wired routes (platform/src/app/api/v1/cron/dsar-purge/route.ts:43-52, siem-ship/route.ts:18-23, export-interactions/route.ts:28-38), and each fails closed when CRON_SECRET is unset. To run any of them, copy the wired-cron CronJob template from helm/templates/cronjobs.yaml, swap the path: value, and pick a schedule appropriate for your retention or SIEM-batch policy. No handler-side change is needed: the chart contributes only the schedule and the curl trigger to a wired job.
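The fail-closed bearer check these routes share can be modeled in a few lines. This is a hypothetical TypeScript illustration of the pattern, not the code in the cited route.ts files. It authorizes nothing when the secret is unset or empty, requires the exact Authorization: Bearer <secret> shape, and compares without short-circuiting. In Node, crypto.timingSafeEqual is the usual choice; a hand-rolled loop keeps the sketch dependency-free.

```typescript
// Hypothetical fail-closed bearer check modeled on the behaviour described
// above; not the code in the cited route.ts files.

// Compare two strings without short-circuiting on the first differing
// character. Length still leaks; Node's crypto.timingSafeEqual has the same
// equal-length requirement.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Fail closed: with no configured secret, nothing is authorized.
function isCronAuthorized(
  authorizationHeader: string | null,
  cronSecret: string | undefined,
): boolean {
  if (!cronSecret) return false;
  const prefix = "Bearer ";
  if (!authorizationHeader || authorizationHeader.indexOf(prefix) !== 0) return false;
  return constantTimeEqual(authorizationHeader.slice(prefix.length), cronSecret);
}
```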

/api/v1/cron/dsar-purge

Source: platform/src/app/api/v1/cron/dsar-purge/route.ts. Iterates every tenant, reads each tenant’s RetentionConfig rows (legalHold: false only), collapses them to the strictest (smallest) retentionDays across data classes, and deletes rows older than that cutoff from DecisionTrace, InteractionHistory, and AiAttachment. Attachment storage blobs are deleted best-effort via getAttachmentStore(); when the local-fs file or S3 object is already gone, the route logs and continues. Each tenant produces an audit_log row with the per-class purge counts. The response shape is { ok, tenantsScanned, totalBlobsAttempted, perTenant: [{ tenantId, retentionDays, decisionTraces, interactionHistory, aiAttachments, blobsDeleted, errors[] }] }.
  • Schedule: operator-supplied; no default ships in the chart.
  • Recommended cadence: once per day, off-hours. The handler’s work scales linearly with row count, so a tenant with a multi-million-row purge backlog benefits from running daily rather than weekly.
  • Pre-flight: make sure every tenant that should be purged has a RetentionConfig row with legalHold: false and retentionDays > 0. Tenants without a config are scanned but skipped (retentionDays = 0 short-circuits the per-tenant loop).
  • See Retention for the per-class RetentionConfig schema and the legal-hold semantics.
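The per-tenant cutoff rule reduces to a minimum over the rows not under legal hold. Here is a minimal TypeScript sketch that assumes a simplified row shape with only the fields named above plus a hypothetical dataClass label (the real schema is documented under Retention). It treats rows with retentionDays <= 0 as absent, so a tenant with no usable row yields no cutoff.

```typescript
// Simplified, hypothetical view of a RetentionConfig row.
interface RetentionRow {
  dataClass: string; // illustrative label only
  retentionDays: number;
  legalHold: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Strictest (smallest) positive retentionDays among rows not under legal hold.
// 0 means "nothing to purge", matching the short-circuit described above.
function strictestRetentionDays(rows: RetentionRow[]): number {
  let best = 0;
  for (const row of rows) {
    if (row.legalHold || row.retentionDays <= 0) continue;
    if (best === 0 || row.retentionDays < best) best = row.retentionDays;
  }
  return best;
}

// Rows created before this instant would be deleted; null means skip the tenant.
function purgeCutoff(rows: RetentionRow[], now: Date): Date | null {
  const days = strictestRetentionDays(rows);
  return days === 0 ? null : new Date(now.getTime() - days * DAY_MS);
}
```

Note that a held row is ignored rather than exempting the tenant: in this model, a 7-day class under legal hold does not tighten the cutoff for the other classes.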

/api/v1/cron/siem-ship

Source: platform/src/app/api/v1/cron/siem-ship/route.ts. Reads the last 5 minutes of AuditLog rows (up to 500) and ships them to the SIEM backend selected by SIEM_BACKEND (splunk | datadog | elastic per platform/src/lib/audit/sink.ts:23 and the getSiemConfigFromEnv() validator at audit/sink.ts:49-65). When SIEM_BACKEND is unset or holds an unknown value the route returns { ok: true, skipped: "..." } and is a structured no-op. Response shape: { ok, backend, shipped, errors, windowSeconds: 300, rowsScanned }.
  • Schedule: operator-supplied; no default ships in the chart.
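  • Recommended cadence: every 5 minutes, matching the hard-coded 5-minute window the handler reads. Slower ticks leave gaps: audit rows written between the end of one window and the start of the next are never shipped. Faster ticks overlap the previous window and ship duplicate rows. Separately, any window holding more than 500 rows loses the rows beyond that limit, whatever the cadence.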
  • Required env vars (read by getSiemConfigFromEnv() at audit/sink.ts:49-65):
    • SIEM_BACKEND — one of splunk, datadog, elastic.
    • SIEM_ENDPOINT — destination URL for the chosen backend.
    • SIEM_API_KEY — bearer/HEC token; backend-specific.
    • SIEM_INDEX — target index (Splunk / Elastic).
    • SIEM_SOURCETYPE — Splunk source-type tag.
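The cadence advice follows from comparing each tick's fixed 300-second look-back with the time since the previous tick. The following is a hypothetical TypeScript model; it assumes each window is [arrival - 300 s, arrival), with arrival measured when the request reaches the handler. It totals the uncovered and re-covered seconds for a list of arrival times. Because a CronJob has to start a pod before curl runs, real arrivals can drift a few seconds from the schedule even at an exact five-minute cadence.

```typescript
// Hypothetical model: each tick ships audit rows whose timestamp falls in
// [arrival - windowSec, arrival). The 500-row cap is not modeled.
interface Coverage {
  gapSeconds: number;     // seconds no window covered: rows never shipped
  overlapSeconds: number; // seconds re-covered from the previous window: rows shipped again
}

function windowCoverage(arrivalsSec: number[], windowSec: number = 300): Coverage {
  const t = arrivalsSec.slice().sort((a, b) => a - b);
  let gapSeconds = 0;
  let overlapSeconds = 0;
  for (let i = 1; i < t.length; i++) {
    const prevEnd = t[i - 1];           // previous window ended at its arrival
    const nextStart = t[i] - windowSec; // this window reaches back windowSec
    if (nextStart > prevEnd) {
      gapSeconds += nextStart - prevEnd;
    } else {
      overlapSeconds += prevEnd - nextStart;
    }
  }
  return { gapSeconds, overlapSeconds };
}
```

For example, arrivals at 0, 300, 603, and 900 seconds produce a 3-second gap followed by a 3-second overlap, even though the nominal cadence is exactly five minutes.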

/api/v1/cron/export-interactions

Source: platform/src/app/api/v1/cron/export-interactions/route.ts. Hive-partitioned NDJSON export of interaction_history. For each active tenant, the handler reads export_checkpoints.lastExportAt, queries up to 10 000 newer interactionHistory rows (the BATCH_SIZE constant at export-interactions/route.ts:24), writes them to exports/{tenantId}/interaction_history/year=YYYY/month=MM/day=DD/batch-{ts}.json, and advances the checkpoint. The first run starts from the epoch (1970-01-01), so the initial backfill works through all existing rows, up to 10 000 per tick. Response shape: { ok, totalExported, tenants: [{ tenantId, tenantName, exported, filePath, error? }] }.
  • Schedule: operator-supplied; no default ships in the chart.
  • Recommended cadence: hourly. At one tick per hour, the 10 000-row batch limit caps throughput at 240 000 rows per day (24 × 10 000), so a tenant generating more than 240k interactions per day needs more frequent ticks to keep up.
  • Honest limit: the handler writes to the API pod’s local filesystem (fs.writeFile), resolving the file path with path.resolve(process.cwd(), relativePath). Production deployments must replace this with an S3 / GCS upload before flipping the schedule on; otherwise the export files are lost when the API pod restarts. The handler comment at export-interactions/route.ts:12 flags this: "Production note: Replace fs writes with S3 uploads via @aws-sdk/client-s3."
  • The route requires prisma.exportCheckpoint rows to exist or be creatable for every tenant; the first run auto-creates a checkpoint per tenant.
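One export step for one tenant can be modeled as a pure function. It selects the rows after the checkpoint, caps them at the batch size, serializes them as NDJSON, derives the Hive-partitioned key, and moves the checkpoint. This TypeScript model is hypothetical and leaves out the file write itself. It assumes rows carry a createdAt timestamp, that the partition date and the {ts} suffix come from the run time in UTC (epoch milliseconds), and that the checkpoint advances to the newest exported createdAt. The route's actual choices may differ.

```typescript
// Hypothetical model of one export-interactions step for a single tenant.
interface InteractionRow {
  id: string;
  createdAt: Date; // assumed ordering/checkpoint column
  [key: string]: unknown;
}

interface ExportStep {
  filePath: string | null; // null when there is nothing new to export
  body: string;            // NDJSON: one JSON object per line
  exported: number;
  nextCheckpoint: Date;
}

const BATCH_SIZE = 10_000; // mirrors the documented batch limit

function pad2(n: number): string {
  return n < 10 ? `0${n}` : String(n);
}

function exportStep(
  tenantId: string,
  rows: InteractionRow[],
  lastExportAt: Date, // first run: new Date(0), i.e. 1970-01-01
  now: Date,
): ExportStep {
  const batch = rows
    .filter((r) => r.createdAt.getTime() > lastExportAt.getTime())
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime())
    .slice(0, BATCH_SIZE);
  if (batch.length === 0) {
    return { filePath: null, body: "", exported: 0, nextCheckpoint: lastExportAt };
  }
  const filePath =
    `exports/${tenantId}/interaction_history/` +
    `year=${now.getUTCFullYear()}/month=${pad2(now.getUTCMonth() + 1)}/day=${pad2(now.getUTCDate())}/` +
    `batch-${now.getTime()}.json`;
  const body = batch.map((r) => JSON.stringify(r)).join("\n") + "\n";
  return {
    filePath,
    body,
    exported: batch.length,
    nextCheckpoint: batch[batch.length - 1].createdAt,
  };
}
```

A timestamp-only checkpoint with a strict `>` has one edge case: rows that share the boundary createdAt but fall past the batch cut would be skipped on the next tick, so a real implementation needs a tiebreaker such as the row id.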

Legacy /api/cron/tick

Source: platform/src/app/api/cron/tick/route.ts. The original once-per-minute tick that runs evaluateAllAlertRules (Phase 02) and the report-schedule loop (Phase 03) for every tenant. Superseded by the /api/v1/cron/* tier for Kubernetes deployments — the wired-cron jobs above split the same work across narrower handlers with concurrencyPolicy: Forbid and isolated retries. This route stays in the codebase only for the EventBridge pilot path where a single AWS scheduled rule fires POST /api/cron/tick against the API service. See EventBridge Setup for that runbook. New Helm deployments should leave the legacy tick unscheduled and rely on the cron.schedules block instead. Auth on the legacy route is wider — it accepts any of x-cron-token, x-cron-secret, or Authorization: Bearer ..., and reads CRON_TOKEN || CRON_SECRET (api/cron/tick/route.ts:62-72). The v1 tier accepts only Authorization: Bearer $CRON_SECRET.
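The difference in accepted credentials is easiest to see side by side. The following is a hypothetical TypeScript sketch of the two rules as this page describes them. The header names and the CRON_TOKEN || CRON_SECRET precedence come from the text above. The function names, the lower-cased header-map shape, and the plain string comparison are assumptions.

```typescript
// Hypothetical side-by-side of the credential rules described for the two tiers.
// Header names are lower-cased, since HTTP header lookups are case-insensitive.
type HeaderMap = Record<string, string | undefined>;

function bearerToken(headers: HeaderMap): string | undefined {
  const auth = headers["authorization"];
  return auth && auth.indexOf("Bearer ") === 0 ? auth.slice("Bearer ".length) : undefined;
}

// Legacy /api/cron/tick: any of three headers may carry the secret,
// and the expected value is CRON_TOKEN || CRON_SECRET.
function legacyTickAuthorized(headers: HeaderMap, env: HeaderMap): boolean {
  const expected = env["CRON_TOKEN"] || env["CRON_SECRET"];
  if (!expected) return false;
  const candidates = [headers["x-cron-token"], headers["x-cron-secret"], bearerToken(headers)];
  return candidates.some((v) => v !== undefined && v === expected);
}

// v1 tier: only Authorization: Bearer $CRON_SECRET.
function v1Authorized(headers: HeaderMap, env: HeaderMap): boolean {
  const expected = env["CRON_SECRET"];
  return !!expected && bearerToken(headers) === expected;
}
```

When both variables are set, the legacy route expects CRON_TOKEN, while a v1 route only ever checks CRON_SECRET. A deployment that rotates one variable but not the other can therefore see the two tiers disagree.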