Overview

The Decision Sentinel is a background watcher that answers a question dashboards can’t: is the decisioning stream silently going wrong right now? It runs every 30 minutes (GET /api/v1/cron/ai-sentinel, CRON_SECRET-gated) and evaluates two metrics per tenant over the last 60 minutes versus the previous 60-minute window.
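
To make the windowing concrete, here is a minimal sketch of how the two comparison windows could be derived from the run time. The function name `comparison_windows` and the (start, end) tuple representation are assumptions for illustration, not the Sentinel's actual code; only the 60-minute window length comes from the description above.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=60)  # window length stated in the overview


def comparison_windows(now: datetime) -> dict:
    """Return the current and previous 60-minute windows as (start, end) pairs.

    Hypothetical helper: the real boundary handling (inclusive or exclusive
    ends, clock source) is not documented and is assumed here.
    """
    current = (now - WINDOW, now)
    previous = (now - 2 * WINDOW, now - WINDOW)
    return {"current": current, "previous": previous}


run_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
windows = comparison_windows(run_time)
```

Because the job runs every 30 minutes over 60-minute windows, consecutive runs evaluate overlapping data: each run shares half of its current window with the run before it.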

Metrics

Both metrics are computed from decision_traces and are also registered as standard alert-rule metrics, so you can build your own alert rules on them in Settings > Alerts:

| Metric | Definition | Warn | Hard breach |
| --- | --- | --- | --- |
| suppression_rate | Qualified candidates removed by the suppression + contact-policy stages: (Σ afterQualification − Σ afterContactPolicy) / Σ afterQualification | ≥ 80% | ≥ 95% |
| empty_candidate_rate | Decisions that returned zero offers: traces with finalCount = 0 / total traces | ≥ 30% | ≥ 50% |

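
The two definitions can be sketched in a few lines. This is an illustration under assumptions, not the production query: each trace is modelled as a plain dict carrying the afterQualification, afterContactPolicy, and finalCount fields named in the definitions, and returning `None` when the denominator is zero is a choice made for this sketch.

```python
from typing import Optional


def suppression_rate(traces: list) -> Optional[float]:
    """(sum afterQualification - sum afterContactPolicy) / sum afterQualification."""
    qualified = sum(t["afterQualification"] for t in traces)
    if qualified == 0:
        return None  # assumption: the rate is undefined when nothing qualified
    remaining = sum(t["afterContactPolicy"] for t in traces)
    return (qualified - remaining) / qualified


def empty_candidate_rate(traces: list) -> Optional[float]:
    """Traces with finalCount == 0 divided by total traces."""
    if not traces:
        return None  # assumption: the rate is undefined with no traces
    empty = sum(1 for t in traces if t["finalCount"] == 0)
    return empty / len(traces)


# Hypothetical sample data, not real decision_traces rows.
sample = [
    {"afterQualification": 10, "afterContactPolicy": 2, "finalCount": 2},
    {"afterQualification": 10, "afterContactPolicy": 0, "finalCount": 0},
    {"afterQualification": 5, "afterContactPolicy": 3, "finalCount": 3},
    {"afterQualification": 0, "afterContactPolicy": 0, "finalCount": 0},
]
sample_suppression = suppression_rate(sample)   # (25 - 5) / 25
sample_empty = empty_candidate_rate(sample)     # 2 of 4 traces
```

Note that suppression_rate is a ratio of sums across traces, not an average of per-trace ratios, so high-volume decisions weigh more heavily.
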
New tenants get default alert rules for both metrics at the warn thresholds (suppression ≥ 80%, empty-candidate ≥ 30%), alongside the existing 5xx and degraded-scoring defaults. The Sentinel requires at least 20 traces across the two windows before it trusts a rate, so low-traffic tenants are skipped rather than alerted on noise.
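
A rough sketch of the thresholds and the minimum-traffic gate follows. The names `classify`, `THRESHOLDS`, and `MIN_TRACES` are hypothetical. The sketch classifies a single rate and does not model how the current window is compared against the previous one, since that logic is not described here.

```python
from typing import Optional

MIN_TRACES = 20  # minimum trace count across the two windows, as stated above

# Warn and hard-breach thresholds from the metrics table, as fractions.
THRESHOLDS = {
    "suppression_rate": {"warn": 0.80, "hard": 0.95},
    "empty_candidate_rate": {"warn": 0.30, "hard": 0.50},
}


def classify(metric: str, rate: float, trace_count: int) -> Optional[str]:
    """Return "hard", "warn", or None (no breach, or too little traffic)."""
    if trace_count < MIN_TRACES:
        return None  # low-traffic tenant: skipped rather than alerted on noise
    limits = THRESHOLDS[metric]
    if rate >= limits["hard"]:
        return "hard"
    if rate >= limits["warn"]:
        return "warn"
    return None
```

For example, `classify("suppression_rate", 0.99, 19)` returns `None` even though the rate is above the hard threshold, because 19 traces is below the minimum sample size.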

What happens on a breach

  • Warn breach — a System Health alert (source sentinel) is written and appears on AI > Insights and the notification surfaces. Alerts are deduplicated within the window.
  • Hard breach — the alert severity is critical (which also routes to configured side channels). If — and only if — the tenant has opted in via Settings > AI Configuration > AI Autonomy > “Sentinel may auto-pause active flows” (aiAutopilot.sentinelAutoPause: true), the Sentinel pauses all active decision flows using the same optimistic-locking pattern as the fairness recheck, with a full audit trail (auto_pause / sentinel_breach).
Without the opt-in, the Sentinel never mutates anything.
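
The breach rules above reduce to a small decision function. The sketch below is an assumption-laden illustration: `BreachPlan` and `plan_for_breach` are hypothetical names, tenant settings are modelled as a nested dict mirroring the aiAutopilot.sentinelAutoPause path, and the "warning" severity label is a placeholder because only the critical severity is named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreachPlan:
    alert_severity: str  # "critical" is from the text; "warning" is a placeholder
    pause_flows: bool    # True only for a hard breach with the opt-in enabled


def plan_for_breach(level: str, tenant_settings: dict) -> BreachPlan:
    """Decide what a breach triggers; the Sentinel mutates nothing without opt-in."""
    # Require an explicit True so a missing or malformed flag never enables pausing.
    opted_in = tenant_settings.get("aiAutopilot", {}).get("sentinelAutoPause") is True
    if level == "hard":
        return BreachPlan(alert_severity="critical", pause_flows=opted_in)
    if level == "warn":
        return BreachPlan(alert_severity="warning", pause_flows=False)
    raise ValueError(f"unknown breach level: {level!r}")


opted_in_tenant = {"aiAutopilot": {"sentinelAutoPause": True}}
default_tenant = {}
```

With these inputs, `plan_for_breach("hard", default_tenant)` yields a critical alert with `pause_flows=False`, while the same call for `opted_in_tenant` sets `pause_flows=True`. The actual pause (optimistic locking, audit trail) is outside the scope of this sketch.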

Why these two metrics

Both failure modes are config-induced silence: a mis-scoped contact policy, an aggressive frequency cap, a broken qualification rule, or an expired offer schedule doesn’t throw errors — it just quietly stops decisions from going out. Latency and 5xx alerting never notices. The Sentinel watches the funnel itself.