## Topology

Each schedule runs as its own Kubernetes CronJob whose trigger container curls `/api/v1/cron/<name>` on the API service. Splitting this tier from the API replicas means a backed-up API pool cannot prevent the cron from firing, and a slow handler does not consume an API replica's RPS budget at the next tick (the CronJob's `concurrencyPolicy: Forbid` ensures a slow run is never joined by the next tick's run).
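A minimal sketch of one rendered CronJob, assuming an in-cluster Service named `api` and a POST trigger (the name and URL are illustrative; the chart's real template lives in `helm/templates/cronjobs.yaml`):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cron-cleanup                # illustrative name
spec:
  schedule: "30 4 * * *"
  concurrencyPolicy: Forbid         # a slow run delays, never overlaps, the next tick
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trigger
              image: curlimages/curl
              command: ["/bin/sh", "-c"]
              # The header is assembled inside the shell, so the secret never
              # appears in the rendered manifest or in `ps` output.
              args:
                - >-
                  curl -fsS -X POST
                  -H "Authorization: Bearer $CRON_SECRET"
                  http://api/api/v1/cron/cleanup
              env:
                - name: CRON_SECRET   # injected via secretKeyRef; see "Auth + secret handling"
                  valueFrom:
                    secretKeyRef:
                      name: api-secrets
                      key: CRON_SECRET
```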
## Configured jobs

Seven jobs ship in `helm/values.yaml`:
| Name | Default schedule | Endpoint |
|---|---|---|
| `cleanup` | `30 4 * * *` | `/api/v1/cron/cleanup` (DSAR/retention purge) |
| `engagementHealthRecompute` | `15 3 * * *` | `/api/v1/cron/engagement-health-recompute` |
| `gitopsDriftCheck` | `0 5 * * *` | `/api/v1/cron/gitops-drift-check` |
| `approvalsExpire` | `*/15 * * * *` | `/api/v1/cron/approvals-expire` |
| `flowSchedulerTick` | `* * * * *` | `/api/v1/cron/flow-scheduler-tick` |
| `scheduledRetrains` | `0 2 * * *` | `/api/v1/cron/scheduled-retrains` |
| `outboxReaper` | `*/2 * * * *` | `/api/v1/cron/outbox-reaper` (resets stuck outbox rows) |
Disable an individual job by setting `cron.schedules.<name>.enabled` to `false`. Disable the whole tier with `cron.enabled: false` (the existing in-process schedulers continue to work; you'd only lose the external trigger redundancy).
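For example, a values sketch using the documented knobs (`gitopsDriftCheck` stands in for any of the seven wired jobs):

```yaml
# helm/values.yaml overrides (sketch)
cron:
  enabled: true             # set false to remove the whole external trigger tier
  schedules:
    gitopsDriftCheck:
      enabled: false        # turn off a single job; the rest stay wired
```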
## Auth + secret handling

`CRON_SECRET` is read from the existing `api-secrets` Secret via a `secretKeyRef`; it is never rendered into the values file or the container args. The cron container builds the `Authorization: Bearer $CRON_SECRET` header inside its shell, so the secret never appears in `ps` output.
The cron container runs as non-root, drops all capabilities, and is restricted to a bare `curlimages/curl` image: no Node or Python runtime inside the trigger surface.
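A sketch of the matching pod-security fragment; `runAsNonRoot` and the capability drop restate the documented settings, while `allowPrivilegeEscalation` is an assumption:

```yaml
containers:
  - name: trigger
    image: curlimages/curl             # bare curl image, no extra runtimes
    securityContext:
      runAsNonRoot: true               # "runs as non-root"
      allowPrivilegeEscalation: false  # assumed hardening, not documented above
      capabilities:
        drop: ["ALL"]                  # "drops all capabilities"
```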
## Operational checklist

- Watch the cluster's `kube-system` events for backoff-limit-exceeded errors on any cron job; repeated failures usually mean the corresponding API handler is broken or the API service DNS is wrong.
- Set a tighter `activeDeadlineSeconds` (default 600s) for jobs whose handlers should never run that long.
- For staged rollouts, override `cron.schedules.<name>.schedule` per environment so non-prod runs less frequently (see the sketch after this list).
- History retention defaults to `successfulJobsHistoryLimit: 3` and `failedJobsHistoryLimit: 5`; bump them if you need more runs for forensics.
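A per-environment override sketch. Only `cron.schedules.<name>.schedule` is confirmed above; treating `activeDeadlineSeconds` and the history limits as sibling values keys is an assumption about the chart layout:

```yaml
# values-staging.yaml (sketch)
cron:
  schedules:
    scheduledRetrains:
      schedule: "0 2 * * 0"           # weekly in staging instead of nightly
      activeDeadlineSeconds: 120      # assumed per-schedule key: fail fast in non-prod
  successfulJobsHistoryLimit: 3       # assumed placement; defaults from above
  failedJobsHistoryLimit: 5
```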
## Operator-wired cron routes

Three `/api/v1/cron/*` handlers ship in the API image but are not wired into `helm/values.yaml`'s `cron.schedules` block. They exist because their cadence is policy-driven (retention windows, SIEM batch intervals, export checkpoints) rather than chart-driven, so the chart leaves the schedule choice to the operator.
All three accept the same `Authorization: Bearer $CRON_SECRET` shape as the wired routes. Each fails closed when `CRON_SECRET` is unset.
To run any of them, copy the wired-cron CronJob template from `helm/templates/cronjobs.yaml`, swap the `path:` value, and pick a schedule appropriate for your retention or SIEM-batch policy. No handler-side change is needed; the only thing the chart contributes to a wired job is the schedule plus the curl trigger.
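A sketch of the fields the operator edits, assuming everything else is copied verbatim from the wired-cron template (the name and schedule are illustrative):

```yaml
# Operator-added CronJob: the deltas against the copied template (sketch).
metadata:
  name: cron-dsar-purge        # illustrative, operator-managed name
spec:
  schedule: "0 3 * * *"        # policy-driven cadence: here, daily off-hours
  # ...remainder of the copied template unchanged, except its path: value,
  # which becomes /api/v1/cron/dsar-purge (or siem-ship / export-interactions).
```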
### `/api/v1/cron/dsar-purge`

Iterates every tenant, reads each tenant's per-class retention rows (skipping any row marked legal-hold), collapses to the strictest (smallest) `retentionDays` across data classes, and deletes rows older than that cutoff from the decision-trace, interaction-history, and AI-attachment tables. Attachment storage blobs are deleted best-effort through the configured attachment store; when the local-filesystem file or S3 object is already gone, the route logs and continues. Each tenant produces an `audit_log` row with the per-class purge counts. Response shape: `{ ok, tenantsScanned, totalBlobsAttempted, perTenant: [{ tenantId, retentionDays, decisionTraces, interactionHistory, aiAttachments, blobsDeleted, errors[] }] }`.
- Schedule: operator-supplied; no default ships in the chart.
- Recommended cadence: once per day, off-hours. The handler scales linearly with row count, so a tenant with a multi-million-row purge backlog benefits from running daily rather than weekly.
- Pre-flight: make sure every tenant that should be purged has a retention-policy row that is not on legal hold and that has a positive `retentionDays`. Tenants without a config are scanned but skipped (a zero-day retention short-circuits the per-tenant loop).
- See Retention for the per-class retention-policy schema and the legal-hold semantics.
### `/api/v1/cron/siem-ship`

Reads the last 5 minutes of audit-log rows (up to 500) and ships them to the SIEM backend selected by `SIEM_BACKEND` (`splunk`, `datadog`, or `elastic`); the SIEM configuration is parsed and validated at startup from the environment variables listed below. When `SIEM_BACKEND` is unset or holds an unknown value, the route returns `{ ok: true, skipped: "..." }` and is a structured no-op. Response shape: `{ ok, backend, shipped, errors, windowSeconds: 300, rowsScanned }`.
- Schedule: operator-supplied; no default ships in the chart.
- Recommended cadence: every 5 minutes, matching the hard-coded 5-minute window the handler reads. Slower ticks risk dropping the tail of the 500-row batch limit; faster ticks ship duplicate rows.
- Required environment variables (validated at startup):
  - `SIEM_BACKEND`: one of `splunk`, `datadog`, `elastic`.
  - `SIEM_ENDPOINT`: destination URL for the chosen backend.
  - `SIEM_API_KEY`: bearer/HEC token; backend-specific.
  - `SIEM_INDEX`: target index (Splunk / Elastic).
  - `SIEM_SOURCETYPE`: Splunk source-type tag.
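A deployment-env sketch for the Splunk backend; the endpoint, index, and sourcetype values are illustrative, and sourcing `SIEM_API_KEY` from the `api-secrets` Secret is an assumption:

```yaml
env:
  - name: SIEM_BACKEND
    value: splunk
  - name: SIEM_ENDPOINT
    value: https://splunk.example.com:8088/services/collector   # illustrative
  - name: SIEM_INDEX
    value: audit                 # illustrative target index
  - name: SIEM_SOURCETYPE
    value: _json                 # illustrative Splunk source-type tag
  - name: SIEM_API_KEY           # HEC token; keep it out of values files
    valueFrom:
      secretKeyRef:
        name: api-secrets        # assumed reuse of the existing Secret
        key: SIEM_API_KEY
```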
### `/api/v1/cron/export-interactions`

Hive-partitioned NDJSON export of the interaction-history table. For each active tenant, reads the export checkpoint's last-export-at timestamp, queries up to 10 000 newer interaction-history rows (the batch size is a fixed constant in the handler), writes them to `exports/{tenantId}/interaction_history/year=YYYY/month=MM/day=DD/batch-{ts}.json`, and advances the checkpoint. The first run starts at the Unix epoch (1970-01-01) and exports all existing rows. Response shape: `{ ok, totalExported, tenants: [{ tenantId, tenantName, exported, filePath, error? }] }`.
- Schedule: operator-supplied; no default ships in the chart.
- Recommended cadence: hourly. The 10 000-row batch limit means a tenant generating more than 240k interactions per day needs more than one tick per hour to keep up.
- Honest limit: the handler writes to the API pod's local filesystem, rooted at the process working directory. Production deployments must replace this with an S3 or GCS upload before flipping the schedule on; otherwise the export files are lost when the API pod restarts. A code comment at the top of the handler flags this explicitly as a production note recommending S3 uploads.
- The route requires `prisma.exportCheckpoint` rows to exist or be creatable for every tenant; the first run auto-creates a checkpoint per tenant.
## Legacy `/api/cron/tick`

The original once-per-minute tick that evaluates every active alert rule (Phase 02) and runs the report-schedule loop (Phase 03) for every tenant. Superseded by the `/api/v1/cron/*` tier for Kubernetes deployments; the wired-cron jobs above split the same work across narrower handlers with `concurrencyPolicy: Forbid` and isolated retries.
This route stays in the codebase only for the EventBridge pilot path, where a single AWS scheduled rule fires `POST /api/cron/tick` against the API service. See EventBridge Setup for that runbook. New Helm deployments should leave the legacy tick unscheduled and rely on the `cron.schedules` block instead.
Auth on the legacy route is wider: it accepts any of `x-cron-token`, `x-cron-secret`, or `Authorization: Bearer ...`, and treats either `CRON_TOKEN` or `CRON_SECRET` as a valid shared secret. The v1 tier accepts only `Authorization: Bearer $CRON_SECRET`.