The widget
Top-right of every page, an Activity icon (the EKG-line glyph from
Lucide) shows the current operational state:
| State | Icon | Behavior |
|---|---|---|
| All clean | muted grey, no badge | hover: “System Health — all clear” |
| Has warnings | amber, count badge | click → drawer |
| Has error | red, count badge | click → drawer |
| Has critical | red, pulsing, count badge | click → drawer |
The full, filterable table lives at /system-health.
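The state table above can be read as a small precedence rule: critical beats error, error beats warning, and info/success alone leave the icon grey. Below is a minimal TypeScript sketch of that rule. The `widgetState` function, the `WidgetState` shape, and the choice to badge the combined count of unread warning, error, and critical alerts are all assumptions for illustration; the doc does not say what the badge counts.

```typescript
type Severity = "info" | "success" | "warning" | "error" | "critical";

// Hypothetical view model for the header icon.
interface WidgetState {
  tone: "grey" | "amber" | "red";
  badge: number | null; // null = no badge
  pulsing: boolean;
  opensDrawer: boolean;
}

// Derive the icon state from per-severity counts of unread alerts.
// Assumption: the badge shows warning + error + critical combined.
function widgetState(unread: Partial<Record<Severity, number>>): WidgetState {
  const critical = unread.critical ?? 0;
  const error = unread.error ?? 0;
  const warning = unread.warning ?? 0;
  const count = critical + error + warning;

  if (critical > 0) return { tone: "red", badge: count, pulsing: true, opensDrawer: true };
  if (error > 0) return { tone: "red", badge: count, pulsing: false, opensDrawer: true };
  if (warning > 0) return { tone: "amber", badge: count, pulsing: false, opensDrawer: true };
  // info/success alone never escalate the icon: muted grey, hover tooltip only.
  return { tone: "grey", badge: null, pulsing: false, opensDrawer: false };
}
```

Ordering the checks from most to least severe keeps the precedence explicit, so adding a severity later means inserting one line in the right place.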
Severity taxonomy
| Severity | Color | Auto-purge | External channel (Slack/Teams/email) |
|---|---|---|---|
| info | foreground | 90 days | no |
| success | green | 90 days | no |
| warning | amber | 90 days | no |
| error | red | 90 days | yes (when tenant has a provider configured) |
| critical | red, pulsing | not auto-purged (pinned: true semantics) | yes |
The 90-day window is the default for the tenant's RetentionConfig with
dataClass: "system_health"; set a custom value via the standard retention API.
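The taxonomy is essentially a lookup table, so one way to keep code and docs in sync is to encode it as data. This is a minimal sketch, assuming a hypothetical `SEVERITY_POLICY` constant and `effectiveRetentionDays` helper; `tenantOverrideDays` stands in for whatever value the retention API stores, and the rule that an override never un-pins critical alerts is an assumption drawn from the "not auto-purged" row.

```typescript
type Severity = "info" | "success" | "warning" | "error" | "critical";

interface SeverityPolicy {
  retentionDays: number | null; // null = never auto-purged (pinned semantics)
  externalChannel: boolean;     // eligible for Slack/Teams/email when a provider exists
}

// Mirrors the severity taxonomy table row for row.
const SEVERITY_POLICY: Record<Severity, SeverityPolicy> = {
  info: { retentionDays: 90, externalChannel: false },
  success: { retentionDays: 90, externalChannel: false },
  warning: { retentionDays: 90, externalChannel: false },
  error: { retentionDays: 90, externalChannel: true },
  critical: { retentionDays: null, externalChannel: true },
};

// Hypothetical helper: a tenant override replaces the 90-day default,
// but pinned (critical) alerts stay exempt from purging either way.
function effectiveRetentionDays(severity: Severity, tenantOverrideDays?: number): number | null {
  const base = SEVERITY_POLICY[severity].retentionDays;
  if (base === null) return null;
  return tenantOverrideDays ?? base;
}
```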
Server emitter
Any module records an alert via recordHealthAlert. Only error and
critical alerts are also forwarded to the external channel (Slack/Teams/email).
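The real recordHealthAlert signature is not shown here, so the sketch below is only an illustration of the emitter's responsibilities under stated assumptions: persist the alert, treat critical as pinned, and hand error/critical alerts to the external side-channel. `recordHealthAlertSketch`, `HealthAlertInput`, the in-memory `store`, and the injected `dispatch` callback are all hypothetical names.

```typescript
type Severity = "info" | "success" | "warning" | "error" | "critical";

// Hypothetical input shape; the real payload is not documented here.
interface HealthAlertInput {
  source: string;          // e.g. "flow"
  severity: Severity;
  title: string;
  userId?: string | null;  // null = tenant-wide alert
}

interface StoredAlert extends HealthAlertInput {
  id: number;
  tenantId: string;
  pinned: boolean;
  createdAt: Date;
}

// Per the taxonomy, only error and critical reach Slack/Teams/email.
function forwardsExternally(severity: Severity): boolean {
  return severity === "error" || severity === "critical";
}

// In-memory stand-in for the alerts table.
const store: StoredAlert[] = [];
let nextId = 1;

// Hypothetical emitter: persist first, then call `dispatch` (standing in
// for the external side-channel) only when the severity qualifies.
function recordHealthAlertSketch(
  tenantId: string,
  input: HealthAlertInput,
  dispatch: (alert: StoredAlert) => void,
): StoredAlert {
  const alert: StoredAlert = {
    ...input,
    userId: input.userId ?? null,
    id: nextId++,
    tenantId,
    pinned: input.severity === "critical", // critical is never auto-purged
    createdAt: new Date(),
  };
  store.push(alert);
  if (forwardsExternally(alert.severity)) dispatch(alert);
  return alert;
}
```

Persisting before dispatching means a failing or missing provider never loses the in-app record; the drawer still shows the alert.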
Read API
| Verb | Path | Notes |
|---|---|---|
| GET | /api/v1/system-health | Cursor-paginated feed. Query params: ?cursor=&limit=&unreadOnly=&since=&severity=&source=. Tenant-scoped, per-user read state. |
| PATCH | /api/v1/system-health/:id/read | Mark one alert read for the current user. Idempotent. |
| POST | /api/v1/system-health/read-all | Bulk-mark every visible alert read for the current user. An optional body such as { source: "flow" } scopes it to one source. |
| DELETE | /api/v1/system-health/:id | Dismiss for the tenant. Pinned alerts and tenant-wide alerts (userId = null) can only be dismissed by admins; non-admins mark them read instead. |
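A client for the feed mostly needs two things: build the query string from the documented params and follow the cursor until it runs out. This is a minimal sketch, assuming a response envelope of `{ items, nextCursor }` (the doc confirms cursor pagination but not the field names) and an ISO-8601 `since`. `buildFeedUrl`, `fetchAllAlerts`, and the injected `fetchPage` function are hypothetical.

```typescript
type Severity = "info" | "success" | "warning" | "error" | "critical";

// Query params documented for GET /api/v1/system-health.
interface FeedQuery {
  cursor?: string;
  limit?: number;
  unreadOnly?: boolean;
  since?: string; // assumed: ISO-8601 timestamp
  severity?: Severity;
  source?: string;
}

// Assumed response envelope.
interface FeedPage<T> {
  items: T[];
  nextCursor: string | null;
}

const FEED_PARAMS = ["cursor", "limit", "unreadOnly", "since", "severity", "source"] as const;

// Build the feed URL, skipping params that are not set.
function buildFeedUrl(q: FeedQuery): string {
  const parts: string[] = [];
  for (const key of FEED_PARAMS) {
    const value = q[key];
    if (value === undefined) continue;
    parts.push(`${key}=${encodeURIComponent(String(value))}`);
  }
  const base = "/api/v1/system-health";
  return parts.length > 0 ? `${base}?${parts.join("&")}` : base;
}

// Follow nextCursor until the server reports no more pages. `fetchPage` is
// injected so the loop works with fetch() or with a test double.
async function fetchAllAlerts<T>(
  query: Omit<FeedQuery, "cursor">,
  fetchPage: (url: string) => Promise<FeedPage<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page: FeedPage<T> = await fetchPage(buildFeedUrl({ ...query, cursor }));
    all.push(...page.items);
    cursor = page.nextCursor ?? undefined;
  } while (cursor !== undefined);
  return all;
}
```

In a browser, `fetchPage` could be `(url) => fetch(url).then((r) => r.json())`; keeping it injectable makes the pagination loop testable without a server.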
First consumers
| Source | When | Severity |
|---|---|---|
| flow | Pipeline run fails for any reason | error |
| flow | A source node with onMissAction: alert matches 0 files | warning |
| flow | A target's expectedRowCountDelta band is tripped | warning |
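The three flow triggers map cleanly onto a pure function from a pipeline event to an optional alert. This is a sketch under assumptions: the `FlowEvent` union, `flowAlertFor`, and the alert titles are hypothetical, and the expectedRowCountDelta band is modelled as an inclusive `[min, max]` range of allowed deltas, which the doc does not specify. Only the `"alert"` value of onMissAction is documented, so other values are treated as opaque strings.

```typescript
// Hypothetical event shapes emitted by the pipeline runner.
type FlowEvent =
  | { kind: "run_failed"; pipelineId: string; reason: string }
  | { kind: "source_miss"; pipelineId: string; nodeId: string; onMissAction: string; matchedFiles: number }
  | { kind: "row_count_delta"; pipelineId: string; targetId: string; delta: number; band: [number, number] };

interface FlowAlert {
  source: "flow";
  severity: "warning" | "error";
  title: string;
}

// Returns the alert to record, or null when the event should stay silent.
function flowAlertFor(e: FlowEvent): FlowAlert | null {
  if (e.kind === "run_failed") {
    return { source: "flow", severity: "error", title: `Pipeline ${e.pipelineId} failed: ${e.reason}` };
  }
  if (e.kind === "source_miss") {
    // Only nodes configured with onMissAction "alert" that matched nothing.
    if (e.onMissAction !== "alert" || e.matchedFiles > 0) return null;
    return { source: "flow", severity: "warning", title: `Source ${e.nodeId} matched 0 files` };
  }
  // Remaining case: row_count_delta. Assumed inclusive band of allowed deltas.
  const [min, max] = e.band;
  if (e.delta >= min && e.delta <= max) return null;
  return {
    source: "flow",
    severity: "warning",
    title: `Target ${e.targetId} row-count delta ${e.delta} outside [${min}, ${max}]`,
  };
}
```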
Retention purge
GET /api/v1/cron/system-health-purge deletes alerts past their
tenant’s RetentionConfig.dataClass="system_health" window. Pinned
alerts are skipped. Auth: CRON_SECRET Bearer token (matches the rest
of the cron tier).
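The purge job reduces to two checks: authenticate the caller, then select non-pinned alerts older than the tenant's window. A minimal sketch, assuming the caller sends `Authorization: Bearer <CRON_SECRET>` and that `retentionDays` has already been resolved from the tenant's RetentionConfig; `isAuthorizedCron`, `selectExpired`, and `PurgeCandidate` are hypothetical names.

```typescript
interface PurgeCandidate {
  id: number;
  createdAt: Date;
  pinned: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed header format: "Bearer <CRON_SECRET>". An empty secret never authorizes.
function isAuthorizedCron(authorizationHeader: string | null, cronSecret: string): boolean {
  return cronSecret.length > 0 && authorizationHeader === `Bearer ${cronSecret}`;
}

// Ids to delete: strictly older than the retention window, and never pinned.
function selectExpired(alerts: PurgeCandidate[], retentionDays: number, now: Date): number[] {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  return alerts.filter((a) => !a.pinned && a.createdAt.getTime() < cutoff).map((a) => a.id);
}
```

The plain string comparison keeps the sketch short; a production handler on Node might prefer a constant-time comparison such as `crypto.timingSafeEqual`.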
Honest residuals
- External side-channel is currently a no-op when the tenant
hasn’t installed a Slack/Teams/email provider. Wiring is duck-typed
through lib/notifications/provider#dispatchExternal so providers can land later without churning the emitter.
- No SSE / WebSocket push. 30-second polling is the v1 pattern; real-time push lands in a follow-up if latency becomes a problem.
- No bulk dismiss UI on /system-health. Per-row mark-read / dismiss only.
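Duck typing here means the dispatcher checks for the shape of a provider at runtime instead of requiring a registered class, so "no provider" simply falls through to a no-op. The sketch below illustrates that idea only; the real dispatchExternal signature is not documented here, and `ExternalProvider`, its `send` method, `isExternalProvider`, and `dispatchIfConfigured` are all hypothetical.

```typescript
// Hypothetical alert shape for this sketch.
interface ExternalAlert {
  severity: "error" | "critical";
  title: string;
}

// Hypothetical provider contract: anything exposing a `send` function qualifies.
interface ExternalProvider {
  send(alert: ExternalAlert): Promise<void>;
}

// Runtime shape check: only verifies that `send` is a function.
function isExternalProvider(candidate: unknown): candidate is ExternalProvider {
  return (
    typeof candidate === "object" &&
    candidate !== null &&
    typeof (candidate as { send?: unknown }).send === "function"
  );
}

// Resolves true when a provider handled the alert, false on the no-op path
// (tenant has no Slack/Teams/email provider installed).
async function dispatchIfConfigured(provider: unknown, alert: ExternalAlert): Promise<boolean> {
  if (!isExternalProvider(provider)) return false;
  await provider.send(alert);
  return true;
}
```

Because the emitter only ever calls the dispatcher, adding a Slack, Teams, or email provider later means supplying an object with the right method; the emitter code does not change.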