Alert Rules

Alert rules monitor platform metrics against thresholds across a rolling time window. When a rule triggers, the platform fans out a notification to every destination the rule references.

Automatic firing requires the cron to be wired. Alert rules are evaluated only when POST /api/cron/tick is invoked. During pilot / initial deployment the cron is not wired to AWS EventBridge by default, so rules are defined but dormant: they won’t fire on their own.

To run an evaluation on demand (for development, manual triggering, or smoke-testing a newly created rule), hit /api/cron/tick yourself; see Triggering evaluation manually below. To enable automatic every-minute evaluation, follow EventBridge Setup when you’re ready.

Every rule ships through the same pipeline:

A caller (EventBridge, a manual curl, or any other scheduler) hits /api/cron/tick with the shared CRON_TOKEN.
The tick iterates tenants and calls evaluateAllAlertRules per tenant.
The evaluator computes the observed metric value over windowMinutes, compares against threshold using operator, and — if triggered — fans out a notification to every destination listed in channels (as long as the rule is outside its cooldownMinutes).
lastFiredAt is updated and an audit log entry is written.
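
The tick loop in the first two steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the platform's implementation: run_tick, the evaluate_tenant callable (standing in for evaluateAllAlertRules), and the policy of continuing past a failing tenant are all hypothetical. The summary keys mirror the response shown under Triggering evaluation manually.

```python
import hmac
from typing import Callable, Iterable


def run_tick(
    presented_token: str,
    expected_token: str,
    tenants: Iterable[str],
    evaluate_tenant: Callable[[str], list[str]],
) -> dict:
    """Hypothetical tick handler: check the token, then evaluate each tenant."""
    # Constant-time comparison of the shared CRON_TOKEN.
    if not hmac.compare_digest(presented_token.encode(), expected_token.encode()):
        raise PermissionError("invalid cron token")

    summary = {"tenantsProcessed": 0, "rulesEvaluated": 0, "rulesFired": 0, "errors": []}
    for tenant in tenants:
        try:
            # Stand-in for evaluateAllAlertRules: returns one status string per rule.
            statuses = evaluate_tenant(tenant)
        except Exception as exc:
            # Assumption: one failing tenant is recorded and does not stop the tick.
            summary["errors"].append(f"{tenant}: {exc}")
            continue
        summary["tenantsProcessed"] += 1
        summary["rulesEvaluated"] += len(statuses)
        summary["rulesFired"] += statuses.count("fired")
    return summary
```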

Supported metrics

Metric	Unit	Window semantics
`acceptance_rate`	rate (0–1)	positives / impressions over the window
`ctr`	rate (0–1)	clicks (click + convert) / impressions
`revenue`	currency units	sum of `outcome.conversionValue`
`selection_frequency`	rate (0–1)	decisions with at least one selected offer / total decision traces
`latency_p99`	milliseconds	99th percentile of `DecisionTrace.totalLatencyMs`
`degraded_scoring_rate`	rate (0–1)	traces with `degradedScoring=true` / total traces

Each observation is computed against two windows: the current window (ending now) and the baseline window (same width, immediately preceding). The baseline feeds severity derivation, where it stands in for the threshold when threshold = 0; a bigger breach yields a higher severity tier.
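
As a small sketch of how the two windows relate, the helper below derives both from a single timestamp. The function name and the half-open (start, end) representation are assumptions made for illustration; the text only states that the baseline has the same width and immediately precedes the current window.

```python
from datetime import datetime, timedelta, timezone


def observation_windows(now: datetime, window_minutes: int):
    """Return ((current_start, current_end), (baseline_start, baseline_end))."""
    width = timedelta(minutes=window_minutes)
    current = (now - width, now)                # window ending now
    baseline = (now - 2 * width, now - width)   # same width, immediately preceding
    return current, baseline
```

With windowMinutes = 10 and now = 12:30, the current window covers 12:20 to 12:30 and the baseline covers 12:10 to 12:20.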

Operators

Operator	Meaning
`gt`	observed > threshold
`lt`	observed < threshold
`gte`	observed ≥ threshold
`lte`	observed ≤ threshold
`eq`	observed = threshold
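
The operator table maps directly onto ordinary comparisons. A minimal sketch, with is_triggered as a hypothetical helper name:

```python
import operator

# Operator keys as listed in the table above.
OPERATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
    "eq": operator.eq,
}


def is_triggered(observed: float, op: str, threshold: float) -> bool:
    """Compare the observed value against the threshold using the named operator."""
    try:
        compare = OPERATORS[op]
    except KeyError:
        raise ValueError(f"unknown operator: {op!r}") from None
    return compare(observed, threshold)
```

Note that eq here is an exact comparison. Whether the evaluator applies a tolerance for floating-point metrics is not stated, so treat eq on rate metrics with care.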

Severity

When a rule fires, the evaluator derives severity from the ratio |observed − threshold| / |threshold| (falling back to baseline when threshold = 0):

≥ 1.0 → critical
≥ 0.5 → warning
else → info

Severity is passed to the notification payload so adapters render the right visual treatment (e.g., themeColor in Teams, severity emoji in Slack, colored band in ops email).
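
The tier thresholds above can be sketched as a small function. Two points are assumptions, since the text does not spell them out: the baseline is taken to replace only the denominator when threshold = 0, and a zero denominator (threshold and baseline both 0) is mapped to the lowest tier. The name derive_severity is hypothetical.

```python
def derive_severity(observed: float, threshold: float, baseline: float) -> str:
    """Map the relative breach size to a severity tier."""
    # Assumption: the baseline replaces only the denominator when threshold is 0.
    denominator = abs(threshold) if threshold != 0 else abs(baseline)
    if denominator == 0:
        # Assumption: with nothing to scale against, report the lowest tier.
        return "info"
    ratio = abs(observed - threshold) / denominator
    if ratio >= 1.0:
        return "critical"
    if ratio >= 0.5:
        return "warning"
    return "info"
```

For example, an observed p99 of 800 ms against a 500 ms threshold gives 300 / 500 = 0.6, which lands in the warning tier.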

Cooldown

Every rule has a cooldownMinutes knob. After a rule fires, subsequent evaluations that would otherwise trigger are recorded with status = "cooldown" and no notification is dispatched until lastFiredAt + cooldownMinutes has passed. This prevents paging storms when a metric bounces across the threshold.
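
A minimal sketch of the cooldown check, assuming timestamps are timezone-aware datetimes and that a rule that has never fired has no lastFiredAt. The helper name and the strict comparison at the exact boundary are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def in_cooldown(last_fired_at: Optional[datetime], cooldown_minutes: int, now: datetime) -> bool:
    """True while now is still before lastFiredAt + cooldownMinutes."""
    if last_fired_at is None:
        return False  # never fired, so nothing to cool down from
    return now < last_fired_at + timedelta(minutes=cooldown_minutes)
```

With cooldownMinutes = 30 and a fire at 12:00, a triggering evaluation at 12:10 is recorded as cooldown, while one at 12:30 or later may fire again.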

Configure a rule

Open Settings → Alert Rules and click New Rule. Fields:

Name — free text; included in notification titles.
Metric — one of the supported metrics above.
Operator / Threshold — comparison to evaluate.
Window (minutes) — observation window.
Cooldown (minutes) — minimum gap between consecutive fires.
Destinations — multi-select of Notification Destinations; every selected destination receives the alert on fire.
Enabled — toggle to pause evaluation without deleting the rule.

Rules must reference at least one destination. If you delete a destination, rules pointing at it will log delivery_failed on their next fire.

Creating a rule in the UI does not cause it to start firing on its own. Until /api/cron/tick is invoked (manually or via EventBridge), the rule sits idle. See Triggering evaluation manually and EventBridge Setup.

Example rule payloads

A rule that pages when p99 decision latency exceeds 500 ms over a 10-minute window:

{
  "name": "p99 latency spike",
  "metric": "latency_p99",
  "operator": "gt",
  "threshold": 500,
  "windowMinutes": 10,
  "cooldownMinutes": 30,
  "channels": ["11111111-2222-3333-4444-555555555555"],
  "enabled": true
}

A rule that alerts when acceptance rate drops below 5% over 5 minutes:

{
  "name": "Acceptance rate drop",
  "metric": "acceptance_rate",
  "operator": "lt",
  "threshold": 0.05,
  "windowMinutes": 5,
  "cooldownMinutes": 60,
  "channels": [
    "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    { "type": "webhook", "target": "https://legacy.example.com/hook" }
  ]
}

Each string in channels is a NotificationProvider UUID. The legacy {type, target} shape remains supported for backward compatibility; prefer provider IDs for new rules.
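
To illustrate the two accepted shapes, the sketch below separates provider UUIDs from legacy entries in a channels array. The helper name split_channels and the strictness of the checks are assumptions; the platform's own validation may differ.

```python
import uuid


def split_channels(channels: list) -> tuple[list[str], list[dict]]:
    """Separate NotificationProvider UUID strings from legacy {type, target} entries."""
    provider_ids: list[str] = []
    legacy: list[dict] = []
    for entry in channels:
        if isinstance(entry, str):
            uuid.UUID(entry)  # raises ValueError if the string is not a UUID
            provider_ids.append(entry)
        elif isinstance(entry, dict) and entry.keys() >= {"type", "target"}:
            legacy.append(entry)
        else:
            raise ValueError(f"unrecognized channel entry: {entry!r}")
    return provider_ids, legacy
```

Applied to the second payload above, this yields one provider ID and one legacy webhook entry.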

Rule lifecycle

status = "ok" — last evaluation did not trigger.
status = "fired" — last evaluation triggered and at least one destination accepted the dispatch.
status = "cooldown" — last evaluation triggered but the rule is still within cooldown.
status = "delivery_failed" — last evaluation triggered but every destination returned a failure.
status = "unsupported_metric" — metric name is not recognized (the rule never fires until fixed).

A rule that is never evaluated stays at whatever status value it last had (or the default). Dormant rules don’t transition states on their own — only a tick evaluation can move them.
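
The status values above can be read as the outcome of a short decision sequence. The sketch below is one plausible ordering, not the evaluator's actual code: resolve_status and its parameters are hypothetical, and delivery_results is assumed to hold one boolean per destination (True when the destination accepted the dispatch).

```python
def resolve_status(
    metric_supported: bool,
    triggered: bool,
    cooling_down: bool,
    delivery_results: list[bool],
) -> str:
    """Pick the status a single evaluation would record."""
    if not metric_supported:
        return "unsupported_metric"
    if not triggered:
        return "ok"
    if cooling_down:
        return "cooldown"
    # At least one accepted dispatch counts as fired.
    if any(delivery_results):
        return "fired"
    # Every destination failed (also the result for an empty list).
    return "delivery_failed"
```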

Triggering evaluation manually

When the cron is not wired to EventBridge (e.g., during pilot, local development, or to smoke-test a newly created rule), you can invoke the evaluator directly:

curl -X POST https://your-deployment/api/cron/tick \
  -H "x-cron-token: $CRON_TOKEN"

This runs a single evaluation pass across every enabled rule for every tenant. Each invocation is a separate pass, but rules still respect cooldownMinutes, so two back-to-back calls will not double-fire. Response:

{
  "ok": true,
  "tenantsProcessed": 3,
  "rulesEvaluated": 12,
  "rulesFired": 2,
  "errors": [],
  "durationMs": 187,
  "timestamp": "2026-04-17T12:34:56.789Z"
}
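
For scripted smoke tests, the same call can be made from Python's standard library. This is a sketch equivalent to the curl command above; the function names are hypothetical and the base URL is a placeholder for your deployment.

```python
import json
import urllib.request


def build_tick_request(base_url: str, cron_token: str) -> urllib.request.Request:
    """Build the POST /api/cron/tick request with the shared token header."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/cron/tick",
        headers={"x-cron-token": cron_token},
        method="POST",
    )


def trigger_tick(base_url: str, cron_token: str, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON summary."""
    request = build_tick_request(base_url, cron_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling trigger_tick("https://your-deployment", token) returns the summary as a dict, so a script can check fields such as rulesFired or errors. A non-2xx response raises urllib.error.HTTPError.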

Wire automatic evaluation

When you’re ready for rules to evaluate on a cadence without manual invocation, follow EventBridge Setup. That page is marked optional on purpose — automatic firing is a pilot graduation step, not a prerequisite.
