Decision Sentinel - KaireonAI

Overview

The Decision Sentinel is a background watcher that answers a question dashboards can’t: is the decisioning stream silently going wrong right now? It runs every 30 minutes via GET /api/v1/cron/ai-sentinel (gated by CRON_SECRET) and, for each tenant, evaluates two metrics over the last 60 minutes against the previous 60-minute window.
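The window arithmetic for a single run, plus the secret check on the cron endpoint, can be sketched in TypeScript. This is a minimal illustration, not KaireonAI's code: the function names are hypothetical, and the `Authorization: Bearer <CRON_SECRET>` header format is an assumed convention (common for scheduled HTTP jobs), not something this page specifies.

```typescript
// Hypothetical sketch: window arithmetic for one Sentinel run plus a
// Bearer-style CRON_SECRET check. The header format and all names here are
// assumptions for illustration, not KaireonAI's code.

const WINDOW_MS = 60 * 60 * 1000; // 60-minute evaluation window

interface TimeWindow {
  start: Date; // inclusive
  end: Date; // exclusive
}

// For a run at `now`: current = [now - 60m, now), previous = [now - 120m, now - 60m).
function sentinelWindows(now: Date): { current: TimeWindow; previous: TimeWindow } {
  const t = now.getTime();
  return {
    current: { start: new Date(t - WINDOW_MS), end: new Date(t) },
    previous: { start: new Date(t - 2 * WINDOW_MS), end: new Date(t - WINDOW_MS) },
  };
}

// Assumed convention: the scheduler sends "Authorization: Bearer <CRON_SECRET>".
// The loop compares every character so timing does not reveal the
// position of the first mismatch (length is still checked up front).
function isAuthorizedCronRequest(authHeader: string | null, cronSecret: string): boolean {
  if (!cronSecret || authHeader === null) return false;
  const expected = `Bearer ${cronSecret}`;
  if (authHeader.length !== expected.length) return false;
  let diff = 0;
  for (let i = 0; i < expected.length; i++) {
    diff |= authHeader.charCodeAt(i) ^ expected.charCodeAt(i);
  }
  return diff === 0;
}
```

Because runs are 30 minutes apart and windows are 60 minutes long, consecutive runs overlap: a run at 12:30 evaluates 11:30 to 12:30, so the 11:30 to 12:00 half was already in the current window of the 12:00 run.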

Metrics

Both metrics are computed from decision_traces and are also registered as standard alert-rule metrics, so you can build your own alert rules on them in Settings > Alerts:

Metric	Definition	Warn	Hard breach
`suppression_rate`	Qualified candidates removed by the suppression + contact-policy stages: `(Σ afterQualification − Σ afterContactPolicy) / Σ afterQualification`	≥ 80%	≥ 95%
`empty_candidate_rate`	Decisions that returned zero offers: traces with `finalCount = 0` / total traces	≥ 30%	≥ 50%
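The two formulas in the table translate directly into code. This TypeScript sketch assumes a simplified trace record carrying only the three fields the formulas name (`afterQualification`, `afterContactPolicy`, `finalCount`); the real `decision_traces` schema is not shown on this page, and returning `null` for an empty denominator is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch of the two Sentinel metrics. The DecisionTrace shape
// mirrors the field names in the table but is an assumption, not the real
// decision_traces schema.

interface DecisionTrace {
  afterQualification: number; // candidates left after qualification
  afterContactPolicy: number; // candidates left after suppression + contact policy
  finalCount: number; // offers actually returned
}

type Level = "ok" | "warn" | "hard";

// (Σ afterQualification − Σ afterContactPolicy) / Σ afterQualification
function suppressionRate(traces: DecisionTrace[]): number | null {
  let qualified = 0;
  let survived = 0;
  for (const t of traces) {
    qualified += t.afterQualification;
    survived += t.afterContactPolicy;
  }
  return qualified === 0 ? null : (qualified - survived) / qualified;
}

// traces with finalCount = 0 / total traces
function emptyCandidateRate(traces: DecisionTrace[]): number | null {
  if (traces.length === 0) return null;
  const empty = traces.filter((t) => t.finalCount === 0).length;
  return empty / traces.length;
}

// Maps a rate onto the warn / hard-breach thresholds from the table.
function classify(rate: number | null, warn: number, hard: number): Level {
  if (rate === null) return "ok";
  if (rate >= hard) return "hard";
  if (rate >= warn) return "warn";
  return "ok";
}

const SUPPRESSION = { warn: 0.8, hard: 0.95 };
const EMPTY_CANDIDATE = { warn: 0.3, hard: 0.5 };
```

For example, four traces with (afterQualification, afterContactPolicy, finalCount) of (10, 1, 1), (10, 0, 0), (0, 0, 0) and (20, 3, 2) give a suppression rate of (40 − 4) / 40 = 90% (warn) and an empty-candidate rate of 2 / 4 = 50% (hard breach). Summing before dividing weights the suppression rate by candidate volume, whereas the empty-candidate rate counts every decision equally.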

New tenants get default alert rules for both metrics (suppression ≥ 80%, empty-candidate ≥ 30%) alongside the existing 5xx and degraded-scoring defaults. The Sentinel only trusts a rate once it has seen at least 20 traces across the two windows; tenants below that volume are skipped rather than alerted on noise.
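The traffic gate and the new-tenant defaults reduce to a few constants and one comparison. In this sketch `MIN_TRACES`, the `AlertRule` shape, and the function names are hypothetical; only the numbers (20 traces, 80%, 30%) come from this page, and counting the 20 traces over both windows combined is an assumed reading.

```typescript
// Hypothetical sketch: the minimum-traffic gate and the default Sentinel
// alert rules. MIN_TRACES, AlertRule, and the function names are
// illustrative; only the numbers come from the documentation.

const MIN_TRACES = 20; // assumed: current + previous window combined

type SentinelMetric = "suppression_rate" | "empty_candidate_rate";

interface AlertRule {
  metric: SentinelMetric;
  threshold: number; // fraction of 1; the rule fires when value >= threshold
}

// Rates from fewer traces than this are treated as noise and skipped.
function hasEnoughTraffic(currentCount: number, previousCount: number): boolean {
  return currentCount + previousCount >= MIN_TRACES;
}

// Seeded for a new tenant next to the 5xx and degraded-scoring defaults
// (those two are not modeled here).
const DEFAULT_SENTINEL_RULES: AlertRule[] = [
  { metric: "suppression_rate", threshold: 0.8 },
  { metric: "empty_candidate_rate", threshold: 0.3 },
];

function ruleFires(rule: AlertRule, value: number): boolean {
  return value >= rule.threshold;
}
```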

What happens on a breach

Warn breach: a System Health alert (source `sentinel`) is written and appears on AI > Insights and in the notification surfaces. Alerts are deduplicated within the window.
Hard breach: the alert severity is critical, which also routes it to any configured side channels. If, and only if, the tenant has opted in via Settings > AI Configuration > AI Autonomy > “Sentinel may auto-pause active flows” (`aiAutopilot.sentinelAutoPause: true`), the Sentinel also pauses all active decision flows, using the same optimistic-locking pattern as the fairness recheck and recording a full audit trail (`auto_pause` / `sentinel_breach`).

Without the opt-in, the Sentinel never mutates anything.
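The breach path combines three behaviors: deduplicated alerts, critical severity on a hard breach, and an opt-in auto-pause guarded by optimistic locking. This TypeScript sketch models them against an in-memory store. The `Flow`/`Alert` shapes, the version-counter form of optimistic locking, and reading "deduplicated within the window" as "no repeat alert for the same metric and severity within 60 minutes" are all assumptions, not KaireonAI's implementation.

```typescript
// Hypothetical sketch of breach handling. The Flow/Alert/TenantSettings
// shapes, the in-memory store, and the time-based dedup rule are
// assumptions made for illustration.

const DEDUP_WINDOW_MS = 60 * 60 * 1000; // assumed: same 60-minute window

type Severity = "warning" | "critical";

interface Flow { id: string; status: "active" | "paused"; version: number }
interface Alert { metric: string; severity: Severity; source: "sentinel"; createdAt: number }
interface AuditEntry { flowId: string; action: "auto_pause"; reason: "sentinel_breach" }
interface TenantSettings { aiAutopilot: { sentinelAutoPause: boolean } }

class InMemoryStore {
  flows = new Map<string, Flow>();
  alerts: Alert[] = [];
  audit: AuditEntry[] = [];

  // Optimistic lock: pause only if the flow is still active at the version
  // we read; a concurrent edit bumps the version and makes this a no-op.
  pauseIfVersion(id: string, expectedVersion: number): boolean {
    const flow = this.flows.get(id);
    if (!flow || flow.status !== "active" || flow.version !== expectedVersion) return false;
    this.flows.set(id, { ...flow, status: "paused", version: flow.version + 1 });
    return true;
  }
}

function handleBreach(
  store: InMemoryStore,
  settings: TenantSettings,
  metric: string,
  level: "warn" | "hard",
  nowMs: number,
): void {
  const severity: Severity = level === "hard" ? "critical" : "warning";

  // Deduplicate: skip if the same metric/severity already alerted recently.
  const duplicate = store.alerts.some(
    (a) => a.metric === metric && a.severity === severity && nowMs - a.createdAt < DEDUP_WINDOW_MS,
  );
  if (!duplicate) {
    store.alerts.push({ metric, severity, source: "sentinel", createdAt: nowMs });
  }

  // Without a hard breach AND the opt-in, nothing is mutated.
  if (level !== "hard" || !settings.aiAutopilot.sentinelAutoPause) return;

  for (const flow of Array.from(store.flows.values())) {
    if (flow.status !== "active") continue;
    if (store.pauseIfVersion(flow.id, flow.version)) {
      store.audit.push({ flowId: flow.id, action: "auto_pause", reason: "sentinel_breach" });
    }
    // false means another writer changed the flow first; leave it alone.
  }
}
```

In this sketch a flow whose version changed between the read and the write is skipped rather than retried, so the Sentinel never overwrites a concurrent human edit.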

Why these two metrics

Both failure modes are config-induced silence: a mis-scoped contact policy, an aggressive frequency cap, a broken qualification rule, or an expired offer schedule doesn’t throw errors; it just quietly stops decisions from going out. Latency and 5xx alerts never notice. The Sentinel watches the funnel itself.
