System Health

The widget

System Health is the operational alerts feed for a tenant. It is separate from the bell, which is reserved for product-update content. System Health surfaces things that need an operator’s attention right now: pipeline failures, configuration tripwires, threshold trips, license limits, and approvals waiting. In the top-right corner of every page, an Activity icon (the EKG-line glyph from Lucide) shows the current operational state:

State	Icon	Behavior
All clean	muted grey, no badge	hover: “System Health — all clear”
Has warnings	amber, count badge	click → drawer
Has error	red, count badge	click → drawer
Has critical	red, pulsing, count badge	click → drawer

Clicking the icon opens a 320px drawer with the 20 most recent alerts. “View all” routes to /system-health for the full filterable table.
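The state table above can be read as a reduction over the current alerts: the icon takes the highest severity present. The sketch below illustrates that reading. The names `deriveWidgetView`, `WidgetView`, and `IconState` are hypothetical, and it assumes that `info` and `success` alerts leave the icon in the "all clean" state and do not count toward the badge, which this page does not spell out.

```typescript
// Hypothetical types; the real widget's internals are not shown on this page.
type Severity = "info" | "success" | "warning" | "error" | "critical";
type IconState = "clean" | "warning" | "error" | "critical";

interface WidgetView {
  state: IconState;
  badgeCount: number; // 0 means no badge is rendered
  pulsing: boolean;
}

const ICON_RANK: Record<IconState, number> = { clean: 0, warning: 1, error: 2, critical: 3 };

// Assumption: only warning, error, and critical change the icon.
function toIconState(severity: Severity): IconState {
  return severity === "warning" || severity === "error" || severity === "critical"
    ? severity
    : "clean";
}

// Picks the highest-ranked state among the alerts and counts the ones that affect the icon.
function deriveWidgetView(alerts: Severity[]): WidgetView {
  let state: IconState = "clean";
  let badgeCount = 0;
  for (const severity of alerts) {
    const candidate = toIconState(severity);
    if (candidate === "clean") continue;
    badgeCount += 1;
    if (ICON_RANK[candidate] > ICON_RANK[state]) state = candidate;
  }
  return { state, badgeCount, pulsing: state === "critical" };
}
```

With this reading, a mix of one `info`, one `warning`, and one `error` alert yields a red icon with a badge of 2 and no pulse.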

Severity taxonomy

Severity	Color	Auto-purge	External channel (Slack/Teams/email)
`info`	foreground	90 days	no
`success`	green	90 days	no
`warning`	amber	90 days	no
`error`	red	90 days	yes (when tenant has a provider configured)
`critical`	red, pulses	not auto-purged (`pinned: true` semantics)	yes

The 90-day default lives in RetentionConfig, keyed per tenant by dataClass: "system_health". Set a custom value via the standard retention API.
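One way to picture the taxonomy is as a lookup table from severity to policy. This is only an illustrative sketch: `SEVERITY_POLICY` and `shouldDispatchExternally` are hypothetical names, not part of the documented API, and the code simply restates the table above.

```typescript
type Severity = "info" | "success" | "warning" | "error" | "critical";

interface SeverityPolicy {
  color: string;
  autoPurge: boolean; // false = kept until dismissed (pinned semantics)
  external: boolean;  // eligible for Slack/Teams/email routing
}

// Restates the severity table; values are copied from it, not from the codebase.
const SEVERITY_POLICY: Record<Severity, SeverityPolicy> = {
  info:     { color: "foreground", autoPurge: true,  external: false },
  success:  { color: "green",      autoPurge: true,  external: false },
  warning:  { color: "amber",      autoPurge: true,  external: false },
  error:    { color: "red",        autoPurge: true,  external: true },
  critical: { color: "red",        autoPurge: false, external: true },
};

// External routing needs both an eligible severity and a configured provider.
function shouldDispatchExternally(severity: Severity, tenantHasProvider: boolean): boolean {
  return SEVERITY_POLICY[severity].external && tenantHasProvider;
}
```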

Server emitter

Any module can record an alert via recordHealthAlert:

```typescript
import { recordHealthAlert } from "@/lib/system-health/emit";

await recordHealthAlert({
  tenantId,
  userId: null,                 // null = fan-out to every user; set for a specific operator
  severity: "warning",
  source: "flow",               // "flow" | "decisioning" | "approvals" | "license" | "system" | …
  title: "Pipeline source had no files",
  message: "Source node \"src\" matched 0 files for pattern \"*.csv\".",
  link: "/data/flow-runs",      // optional in-app route; validated against an allow-list
  metadata: { pipelineId, runId, retries },
  pinned: false,                // true = exclude from auto-purge (compliance-class)
});
```

recordHealthAlert is best-effort: failures are logged but never thrown, because alert recording must never break the calling code path. Side-channel routing to external notification providers fires only for error and critical.
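The best-effort contract can be pictured as a wrapper that catches every failure from the underlying write, logs it, and resolves anyway. The sketch below is a generic illustration of that pattern, not the emitter's actual source. `makeBestEffort`, `AlertInput`, and the boolean return value are assumptions made for the example.

```typescript
// Hypothetical minimal payload; the real emitter accepts more fields.
type AlertInput = { tenantId: string; severity: string; title: string };

// Wraps an async writer so that any failure is logged and swallowed rather than rethrown.
// Returns true when the write succeeded and false when it was swallowed.
function makeBestEffort<T>(
  write: (input: T) => Promise<void>,
  logError: (message: string, error: unknown) => void,
): (input: T) => Promise<boolean> {
  return async (input: T): Promise<boolean> => {
    try {
      await write(input);
      return true;
    } catch (error) {
      logError("system-health: failed to record alert", error);
      return false;
    }
  };
}
```

A caller can then `await` the wrapped function inside a pipeline step without a surrounding try/catch, since a storage outage only produces a log line.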

Read API

Verb	Path	Notes
`GET`	`/api/v1/system-health`	Cursor-paginated feed. Query params: `?cursor=&limit=&unreadOnly=&since=&severity=&source=`. Tenant-scoped, per-user read state.
`PATCH`	`/api/v1/system-health/:id/read`	Mark one alert read for the current user. Idempotent.
`POST`	`/api/v1/system-health/read-all`	Bulk-mark every visible alert read for the current user. Optional `{ source: "flow" }` to scope.
`DELETE`	`/api/v1/system-health/:id`	Dismiss for the tenant. Dismissing pinned alerts or tenant-wide alerts (`userId = null`) is admin-only; non-admins must mark them read instead.
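The dismiss rules in the `DELETE` row can be summarized as a small permission check. This is a sketch under assumptions: `canDismiss`, `AlertRef`, and `Actor` are hypothetical names, and the final rule (a non-admin may dismiss only an unpinned alert addressed to them) is an inference that this page does not state explicitly.

```typescript
interface AlertRef {
  userId: string | null; // null = tenant-wide alert
  pinned: boolean;
}

interface Actor {
  userId: string;
  isAdmin: boolean;
}

function canDismiss(alert: AlertRef, actor: Actor): boolean {
  if (actor.isAdmin) return true;          // admins may dismiss anything in their tenant
  if (alert.pinned) return false;          // pinned alerts are admin-only
  if (alert.userId === null) return false; // tenant-wide alerts are admin-only
  // Assumption: otherwise the alert must be addressed to the acting user.
  return alert.userId === actor.userId;
}
```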

Polling: the topbar widget fetches every 30s while the tab is focused; polling pauses while the tab is in the background.
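A polling client needs two pieces: a URL built from the feed's query parameters and a gate that skips fetches while the tab is in the background. The helpers below sketch both. `buildFeedUrl`, `FeedQuery`, and `shouldFetch` are hypothetical names, the parameter value types are assumptions (the page lists the parameter names only), and how the widget detects focus is not documented here.

```typescript
// Parameter names come from the Read API table; the value types are assumed.
interface FeedQuery {
  cursor?: string;
  limit?: number;
  unreadOnly?: boolean;
  since?: string;
  severity?: string;
  source?: string;
}

// Fixed order keeps the generated URL stable across calls.
const FEED_PARAMS = ["cursor", "limit", "unreadOnly", "since", "severity", "source"] as const;

function buildFeedUrl(query: FeedQuery): string {
  const parts: string[] = [];
  for (const key of FEED_PARAMS) {
    const value = query[key];
    if (value === undefined || value === "") continue; // omit unset filters
    parts.push(`${key}=${encodeURIComponent(String(value))}`);
  }
  const base = "/api/v1/system-health";
  return parts.length > 0 ? `${base}?${parts.join("&")}` : base;
}

const POLL_INTERVAL_MS = 30_000;

// Gate for a polling tick: fetch only when the tab is focused and the interval has elapsed.
function shouldFetch(tabFocused: boolean, lastFetchAt: number, now: number): boolean {
  return tabFocused && now - lastFetchAt >= POLL_INTERVAL_MS;
}
```

Keeping the gate a pure function of its inputs makes the pause-on-background behavior easy to test without a browser.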

First consumers

Source	When	Severity
`flow`	Pipeline run fails for any reason	`error`
`flow`	Source node’s `onMissAction: alert` matches 0 files	`warning`
`flow`	Target’s `expectedRowCountDelta` band tripped	`warning`
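The three flow rows above amount to a mapping from pipeline events to alert payloads. The sketch below shows one possible shape for that mapping. `FlowEvent`, `FlowAlert`, and `alertForFlowEvent` are hypothetical, as are the titles and messages other than "Pipeline source had no files", which is taken from the emitter example. Only the source and severity values come from the table.

```typescript
// Hypothetical event shapes; the real flow engine's events are not documented here.
type FlowEvent =
  | { kind: "run_failed"; reason: string }
  | { kind: "source_miss"; node: string; pattern: string }
  | { kind: "row_count_band_tripped"; target: string };

interface FlowAlert {
  source: "flow";
  severity: "error" | "warning";
  title: string;
  message: string;
}

function alertForFlowEvent(event: FlowEvent): FlowAlert {
  switch (event.kind) {
    case "run_failed":
      return { source: "flow", severity: "error", title: "Pipeline run failed", message: event.reason };
    case "source_miss":
      return {
        source: "flow",
        severity: "warning",
        title: "Pipeline source had no files",
        message: `Source node "${event.node}" matched 0 files for pattern "${event.pattern}".`,
      };
    case "row_count_band_tripped":
      return {
        source: "flow",
        severity: "warning",
        title: "Row count outside expected band",
        message: `Target "${event.target}" tripped its expectedRowCountDelta band.`,
      };
  }
}
```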

More consumers (approvals waiting, license-tier soft limits, model retraining done, decision flow errors) will be wired in as their respective features land.

Retention purge

GET /api/v1/cron/system-health-purge deletes alerts past their tenant’s RetentionConfig.dataClass="system_health" window. Pinned alerts are skipped. Auth: CRON_SECRET Bearer token (matches the rest of the cron tier).
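The purge rule reduces to a per-alert predicate: skip pinned alerts, and delete the rest once they are older than the tenant's retention window. The sketch below assumes the window is measured in days from a `createdAt` timestamp. `isPurgeable` and `StoredAlert` are hypothetical names, and the real job may compute its cutoff differently.

```typescript
// Hypothetical stored shape; only the fields the predicate needs.
interface StoredAlert {
  createdAt: Date;
  pinned: boolean;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the alert is unpinned and strictly older than the retention window.
function isPurgeable(alert: StoredAlert, retentionDays: number, now: Date): boolean {
  if (alert.pinned) return false; // pinned alerts are always skipped
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  return alert.createdAt.getTime() < cutoff;
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the predicate deterministic for tests.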

Honest residuals

External side-channel is currently a no-op when the tenant hasn’t installed a Slack/Teams/email provider. Wiring is duck-typed through lib/notifications/provider#dispatchExternal so providers can land later without churning the emitter.
No SSE / WebSocket push. 30-second polling is the v1 pattern; real-time push lands in a follow-up if latency becomes a problem.
No bulk dismiss UI on /system-health. Per-row mark-read / dismiss only.
