Automatic firing requires the cron to be wired. Alert rules are evaluated only when `POST /api/cron/tick` is invoked. During pilot or initial deployment the cron is not wired to AWS EventBridge by default, so rules are defined but dormant: they won’t fire on their own.

To run an evaluation on demand (for development, manual triggering, or smoke-testing a newly created rule), hit `/api/cron/tick` yourself; see Triggering evaluation manually below. To enable automatic every-minute evaluation, follow EventBridge Setup when you’re ready.

- A caller (EventBridge, a manual curl, or any other scheduler) hits `/api/cron/tick` with the shared `CRON_TOKEN`.
- The tick iterates tenants and calls `evaluateAllAlertRules` per tenant.
- The evaluator computes the observed metric value over `windowMinutes`, compares it against `threshold` using `operator`, and, if triggered, fans out a notification to every destination listed in `channels` (as long as the rule is outside its `cooldownMinutes`).
- `lastFiredAt` is updated and an audit log entry is written.

Supported metrics
| Metric | Unit | Window semantics |
|---|---|---|
| `acceptance_rate` | rate (0–1) | positives / impressions over the window |
| `ctr` | rate (0–1) | clicks (click + convert) / impressions |
| `revenue` | currency units | sum of `outcome.conversionValue` |
| `selection_frequency` | rate (0–1) | decisions with at least one selected offer / total decision traces |
| `latency_p99` | milliseconds | 99th percentile of `DecisionTrace.totalLatencyMs` |
| `degraded_scoring_rate` | rate (0–1) | traces with `degradedScoring=true` / total traces |
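
To make the window semantics concrete, here is a minimal sketch of two of the metrics above, computed over traces that have already been filtered to the window. The trace records are modeled as plain dictionaries, and the nearest-rank percentile method is an assumption; the document does not say which percentile definition the evaluator uses.

```python
from typing import Optional, Sequence


def latency_p99(traces: Sequence[dict]) -> Optional[int]:
    """Nearest-rank 99th percentile of totalLatencyMs (assumed method)."""
    latencies = sorted(t["totalLatencyMs"] for t in traces)
    if not latencies:
        return None  # no traces in the window, nothing to observe
    # Nearest-rank position ceil(0.99 * n), computed with integers
    # to avoid floating-point rounding.
    rank = (99 * len(latencies) + 99) // 100
    return latencies[rank - 1]


def degraded_scoring_rate(traces: Sequence[dict]) -> Optional[float]:
    """Share of traces in the window with degradedScoring set to true."""
    if not traces:
        return None
    degraded = sum(1 for t in traces if t.get("degradedScoring"))
    return degraded / len(traces)
```

Returning `None` for an empty window is a choice made for this sketch; how the real evaluator treats a window with no data is not stated here.
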
Operators
| Operator | Meaning |
|---|---|
| `gt` | observed > threshold |
| `lt` | observed < threshold |
| `gte` | observed ≥ threshold |
| `lte` | observed ≤ threshold |
| `eq` | observed = threshold |
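
The operator table maps directly onto a lookup of comparison functions. The sketch below is illustrative only: `COMPARATORS` and `is_triggered` are hypothetical names, not the evaluator's actual code.

```python
import operator as op

# Operator names from the table above, mapped to Python comparisons.
COMPARATORS = {
    "gt": op.gt,
    "lt": op.lt,
    "gte": op.ge,
    "lte": op.le,
    "eq": op.eq,
}


def is_triggered(observed: float, operator_name: str, threshold: float) -> bool:
    """Return True when `observed <operator> threshold` holds."""
    try:
        compare = COMPARATORS[operator_name]
    except KeyError:
        raise ValueError(f"unknown operator: {operator_name!r}") from None
    return compare(observed, threshold)
```

Note that `eq` is an exact comparison in this sketch, so it is a poor fit for rate metrics that are rarely exactly equal to a threshold.
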
Severity
When a rule fires, the evaluator derives severity from the ratio `|observed − threshold| / |threshold|` (falling back to baseline when `threshold = 0`):

- ≥ 1.0 → critical
- ≥ 0.5 → warning
- else → info

Severity determines how the alert is styled in each destination (`themeColor` in Teams, severity emoji in Slack, colored band in ops email).
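
The severity bands can be sketched as a small function. The document does not define what the baseline is when `threshold = 0`, so the `baseline` parameter and its default of 1.0 below are assumptions made purely for illustration, as is the function name.

```python
def derive_severity(observed: float, threshold: float, baseline: float = 1.0) -> str:
    """Map the relative deviation from the threshold to a severity label."""
    # Assumption: when threshold is 0, divide by `baseline` instead.
    denominator = abs(threshold) if threshold != 0 else abs(baseline)
    if denominator == 0:
        raise ValueError("baseline must be non-zero when threshold is 0")
    ratio = abs(observed - threshold) / denominator
    if ratio >= 1.0:
        return "critical"
    if ratio >= 0.5:
        return "warning"
    return "info"
```

For example, with a threshold of 500 ms, an observed p99 of 750 ms gives a ratio of 0.5 (warning) and 1000 ms gives a ratio of 1.0 (critical).
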
Cooldown
Every rule has a `cooldownMinutes` knob. After a rule fires, subsequent
evaluations that would otherwise trigger are recorded with
`status = "cooldown"` and no notification is dispatched until
`lastFiredAt + cooldownMinutes` has passed.
This prevents paging storms when a metric bounces across the threshold.
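
A minimal sketch of the cooldown check, assuming `lastFiredAt` is a timezone-aware timestamp that is empty for a rule that has never fired. The function name is hypothetical, and treating the exact boundary instant as outside the cooldown is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def in_cooldown(last_fired_at: Optional[datetime],
                cooldown_minutes: int,
                now: datetime) -> bool:
    """True while `now` is still before lastFiredAt + cooldownMinutes."""
    if last_fired_at is None:
        return False  # never fired, so there is nothing to cool down from
    return now < last_fired_at + timedelta(minutes=cooldown_minutes)


# Usage: a rule that fired at 12:00 UTC with a 15-minute cooldown.
fired = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
suppressed = in_cooldown(fired, 15, fired + timedelta(minutes=5))
```
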
Configure a rule
Open Settings → Alert Rules and click New Rule. Fields:

- Name: free text; included in notification titles.
- Metric: one of the supported metrics above.
- Operator / Threshold: comparison to evaluate.
- Window (minutes): observation window.
- Cooldown (minutes): minimum gap between consecutive fires.
- Destinations: multi-select of Notification Destinations; every selected destination receives the alert on fire.
- Enabled: toggle to pause evaluation without deleting the rule.
Example rule payloads
A rule that pages when p99 decision latency crosses 500 ms over a 10-minute window uses the `latency_p99` metric with a `threshold` of 500 and `windowMinutes` of 10.

Each entry in `channels` is a NotificationProvider UUID. The legacy
`{type, target}` shape remains supported for backward compatibility;
prefer provider IDs for new rules.
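
As a rough illustration, the latency rule described above could be assembled and serialized as follows. The `name`, `metric`, and `enabled` keys are inferred from the form fields and are assumptions about the payload shape; the cooldown value and the UUID are placeholders, not real values.

```python
import json

# Hypothetical payload for a p99 latency rule; see the assumptions above.
rule = {
    "name": "p99 latency above 500 ms",
    "metric": "latency_p99",
    "operator": "gt",
    "threshold": 500,        # milliseconds
    "windowMinutes": 10,
    "cooldownMinutes": 30,   # placeholder value
    "channels": [
        "00000000-0000-0000-0000-000000000000",  # placeholder NotificationProvider UUID
    ],
    "enabled": True,
}

print(json.dumps(rule, indent=2))
```
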
Rule lifecycle
- `status = "ok"`: last evaluation did not trigger.
- `status = "fired"`: last evaluation triggered and at least one destination accepted the dispatch.
- `status = "cooldown"`: last evaluation triggered but the rule is still within cooldown.
- `status = "delivery_failed"`: last evaluation triggered but every destination returned a failure.
- `status = "unsupported_metric"`: metric name is not recognized (the rule never fires until fixed).
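
The status values above can be read as the outcome of a single evaluation. The sketch below shows one way to derive them; `resolve_status` and its parameters are hypothetical, and the order of the checks is an assumption rather than the evaluator's documented behavior.

```python
from typing import Sequence

SUPPORTED_METRICS = {
    "acceptance_rate", "ctr", "revenue",
    "selection_frequency", "latency_p99", "degraded_scoring_rate",
}


def resolve_status(metric: str,
                   triggered: bool,
                   in_cooldown: bool,
                   delivery_results: Sequence[bool]) -> str:
    """Derive the rule status from one evaluation.

    `delivery_results` holds one boolean per destination, True when the
    destination accepted the dispatch.
    """
    if metric not in SUPPORTED_METRICS:
        return "unsupported_metric"
    if not triggered:
        return "ok"
    if in_cooldown:
        return "cooldown"  # triggered, but no notification is dispatched
    if any(delivery_results):
        return "fired"     # at least one destination accepted
    # Every destination failed. A rule with no destinations also lands
    # here in this sketch; the real behavior for that case is not stated.
    return "delivery_failed"
```
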
A rule that is never evaluated stays at whatever `status` value it last had (or the default). Dormant rules don’t transition states on their own; only a tick evaluation can move them.

Triggering evaluation manually
When the cron is not wired to EventBridge (e.g., during pilot, local development, or to smoke-test a newly created rule), you can invoke the evaluator directly by sending `POST /api/cron/tick` with the shared `CRON_TOKEN`.

Manual invocations still respect `cooldownMinutes`, so two back-to-back calls will not double-fire.
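
A manual tick could be scripted along these lines. How the token is transmitted is not stated in this section, so the `Authorization: Bearer` header below is an assumption, as are the function name and the example base URL; check your deployment for the actual header.

```python
import urllib.request


def build_tick_request(base_url: str, cron_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the tick endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/cron/tick",
        method="POST",
        # Assumed header; the real endpoint may expect a different one.
        headers={"Authorization": f"Bearer {cron_token}"},
    )


# Usage sketch: send it with urllib.request.urlopen(request) when ready.
request = build_tick_request("https://example.test/", "my-cron-token")
```
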
Response: