MFA enforcement

Why this exists
How the flow works
Step-up TTL
What’s bypassed
Kill switch (incident only)
Operator runbook
What’s in scope (W2.2 baseline)
Out of scope (later)
What’s already shipped
How it’s wired

Why this exists

MFA enforcement protects against credential-leak risk on admin accounts. Even if an attacker steals a session cookie or API password, they cannot mutate decisioning configuration (offers, contact policies, decision flows, MCP playbooks, models, approvals, etc.) without the second factor. The enforcement runs in edge middleware for low latency — typically <5ms additional overhead per request — and applies to every state-changing HTTP method (POST / PUT / PATCH / DELETE) on /api/* routes.

How the flow works

1. Admin signs in with email + password (or Google OAuth).
2. If the user has MFA enabled, NextAuth issues a JWT with mfaPending = true and mfaVerifiedAt = null.
3. Admin attempts a state-changing API call. Middleware sees mfaPending && !fresh(mfaVerifiedAt) and returns 403 MFA_REQUIRED with this body:

{
  "error": {
    "code": "MFA_REQUIRED",
    "message": "Admin accounts require MFA verification within the last 15 minutes for write operations.",
    "status": 403,
    "hint": "POST /api/v1/auth/mfa with { action: 'verify', token: '<TOTP>' }, then refresh the session so the verification timestamp is written to the JWT."
  }
}

4. Client calls POST /api/v1/auth/mfa with { action: "verify", token: "<6-digit TOTP>" }. Returns 200 { verified: true } on success or 400 on bad token.
5. Client calls the NextAuth session-update hook with the fresh verification timestamp; the auth callback handles the update trigger and writes the timestamp into the JWT.
6. Subsequent admin writes pass for 15 minutes, after which the user must re-verify.
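
The gate applied to a state-changing call can be sketched as a pure function. This is a minimal illustration, not the shipped middleware: the `MfaClaims` shape, the `checkStepUp` name, the strict `<` comparison at the 15-minute boundary, and the inline freshness test are all assumptions. Only the claim names, the 15-minute window, and the error body come from this page.

```typescript
// Hypothetical claim shape; only mfaPending and mfaVerifiedAt are named in the docs.
interface MfaClaims {
  mfaPending: boolean;
  mfaVerifiedAt: string | null; // ISO timestamp, or null before the first verify
}

interface MfaRequiredBody {
  error: { code: "MFA_REQUIRED"; message: string; status: 403; hint: string };
}

// 15 minutes in milliseconds (illustrative name).
const STEP_UP_TTL_MS = 15 * 60 * 1000;

// Returns null when the write may proceed, or the 403 body when step-up is needed.
function checkStepUp(claims: MfaClaims, nowMs: number): MfaRequiredBody | null {
  if (!claims.mfaPending) return null; // MFA not enabled for this user

  const verifiedMs =
    claims.mfaVerifiedAt === null ? NaN : Date.parse(claims.mfaVerifiedAt);
  // Assumption: a verification exactly 15 minutes old is treated as stale.
  const fresh = !Number.isNaN(verifiedMs) && nowMs - verifiedMs < STEP_UP_TTL_MS;
  if (fresh) return null;

  return {
    error: {
      code: "MFA_REQUIRED",
      message:
        "Admin accounts require MFA verification within the last 15 minutes for write operations.",
      status: 403,
      hint:
        "POST /api/v1/auth/mfa with { action: 'verify', token: '<TOTP>' }, then refresh the session so the verification timestamp is written to the JWT.",
    },
  };
}
```

A caller would pass the decoded JWT claims and the current time, for example `checkStepUp(claims, Date.now())`, and return the body with status 403 whenever the result is not null.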

Step-up TTL

The default is 15 minutes. The TTL is intentionally short to limit the blast radius if a session is hijacked. The window slides: every successful verify resets the 15-minute timer. The TTL is hardcoded in the edge middleware as a constant equal to 15 minutes in milliseconds (900,000). To change it, modify the constant and redeploy. There is no per-tenant configuration today.
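
For a client that wants to prompt for re-verification before a write fails, the remaining window can be derived from the verification timestamp. The following is a sketch under assumptions: `STEP_UP_TTL_MS` and `msUntilReverify` are illustrative names, not identifiers taken from the middleware.

```typescript
// 15 minutes in milliseconds = 900,000. The constant name is illustrative.
const STEP_UP_TTL_MS = 15 * 60 * 1000;

// Milliseconds left before the next write would need a fresh verify.
// Returns 0 when there is no usable verification on record or the window has lapsed.
function msUntilReverify(mfaVerifiedAt: string | null, nowMs: number): number {
  if (mfaVerifiedAt === null) return 0;
  const verifiedMs = Date.parse(mfaVerifiedAt);
  if (Number.isNaN(verifiedMs)) return 0;
  return Math.max(0, verifiedMs + STEP_UP_TTL_MS - nowMs);
}

// A verify at 12:00:00 is good until 12:15:00. A second verify at 12:10:00
// moves the deadline to 12:25:00, because each verify restarts the timer.
const noon = Date.parse("2024-01-01T12:00:00.000Z");
const tenPast = noon + 10 * 60 * 1000;
const afterFirst = msUntilReverify("2024-01-01T12:00:00.000Z", tenPast); // 300000 (5 min left)
const afterSecond = msUntilReverify("2024-01-01T12:10:00.000Z", tenPast); // 900000 (15 min left)
```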

What’s bypassed

GET / HEAD / OPTIONS requests — no enforcement (read-only)
Non-admin users — no enforcement (the gate is admin-only)
Users with MFA not enabled — no enforcement (mfaPending is false)
/api/v1/auth/mfa itself — must be reachable to do the verify
/api/auth/* (NextAuth handlers) — needed for sign-in flow
API-key-authenticated server-to-server calls — these don’t carry a JWT and are gated separately by API-key scope
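
The exemptions above, combined with the method and route scope, amount to a short chain of early returns. The sketch below is one way to express them. The `GateInput` shape and the `isSubjectToMfaGate` name are hypothetical, and the real middleware reads these facts from the request and the JWT rather than from a pre-built object.

```typescript
// Hypothetical summary of what the gate needs to know about a request.
interface GateInput {
  method: string;
  pathname: string;
  hasJwt: boolean;     // false for API-key-authenticated server-to-server calls
  isAdmin: boolean;    // assumption: the role is resolved before the gate runs
  mfaPending: boolean; // false when the user has not enabled MFA
}

const WRITE_METHODS = new Set(["POST", "PUT", "PATCH", "DELETE"]);

// True when the request must carry a fresh MFA verification.
function isSubjectToMfaGate(req: GateInput): boolean {
  if (!WRITE_METHODS.has(req.method.toUpperCase())) return false; // GET / HEAD / OPTIONS
  if (!req.pathname.startsWith("/api/")) return false;            // only /api/* is covered
  if (req.pathname === "/api/v1/auth/mfa") return false;          // the verify endpoint itself
  if (req.pathname.startsWith("/api/auth/")) return false;        // NextAuth handlers
  if (!req.hasJwt) return false;                                  // gated by API-key scope instead
  if (!req.isAdmin) return false;                                 // the gate is admin-only
  if (!req.mfaPending) return false;                              // MFA not enabled
  return true;
}
```

Ordering the checks from cheapest to most specific keeps the common read-only path to a single set lookup.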

Kill switch (incident only)

Set MFA_ENFORCEMENT_DISABLED=true in the deployment environment to bypass enforcement for all requests. This is only for incident recovery, for example when the TOTP server’s clock is drifting and legitimate codes are being rejected. While it is set, requests that would have been blocked pass through, and logging is unchanged. To re-enable enforcement, set the env var back to false (or unset it) and redeploy.
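
A minimal sketch of how such a switch might be read follows. The function names are hypothetical, and the exact-match comparison against the string "true" is an assumption: this page does not say whether other spellings such as "1" or "TRUE" are honored.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: only the exact string "true" disables enforcement.
function isEnforcementDisabled(env: Env): boolean {
  return env["MFA_ENFORCEMENT_DISABLED"] === "true";
}

// wouldBlock is the gate's normal verdict; the kill switch can only relax it.
function shouldBlock(env: Env, wouldBlock: boolean): boolean {
  return wouldBlock && !isEnforcementDisabled(env);
}
```

In a Node-style runtime the caller would pass the process environment as `env`. Because the value is read from the deployment environment, a change only takes effect after a redeploy.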

Operator runbook

Symptom	Likely cause	Action
All of an admin’s writes return 403 MFA_REQUIRED	Admin has MFA enabled but has never completed verify	Have the admin call `POST /api/v1/auth/mfa` with `{ action: "verify", token: "<TOTP>" }`, then update the session
Admin verified successfully but the next write returns 403	The JWT never received the timestamp because the session-update hook was not called after verify	Check the client: the NextAuth session-update hook must be invoked with the fresh `mfaVerifiedAt` ISO timestamp after the verify response
403 returned even though the admin verified < 15 min ago	Clock drift between the auth server and the middleware host	Check NTP on both hosts, or temporarily disable enforcement via the kill switch
Non-admin users receive 403 MFA_REQUIRED	Bug: non-admins should not be gated	File an issue immediately. Set the kill switch as a workaround
Production incident: admins are locked out	Kill switch is not set and access is needed immediately	Set `MFA_ENFORCEMENT_DISABLED=true` in the environment and redeploy. Re-enable enforcement once admins can verify

What’s in scope (W2.2 baseline)

✅ Edge middleware enforcement on all state-changing requests to /api/*
✅ mfaVerifiedAt JWT timestamp + 15-minute sliding window
✅ MFA_ENFORCEMENT_DISABLED env kill switch
✅ Source-level regression tests covering the middleware path
✅ MFA verify endpoint at POST /api/v1/auth/mfa

Out of scope (later)

Per-tenant TTL configuration
Audit log entry on every verify (verifies are currently logged only at info level via the logger)
IP-based step-up (require fresh verify when source IP changes mid-session)

What’s already shipped

✅ WebAuthn / Passkey support. Registration is a two-step ceremony: POST /api/v1/auth/webauthn/register/begin (server issues a challenge), then POST /api/v1/auth/webauthn/register/finish (client posts the attestation). Verification follows the same pattern: POST /api/v1/auth/webauthn/verify/begin, then POST /api/v1/auth/webauthn/verify/finish. Passkey verification sets the same mfaVerifiedAt JWT claim as TOTP, so once a passkey is registered the middleware treats both factors equivalently.
✅ TOTP + backup codes
✅ Step-up verify endpoint at POST /api/v1/auth/mfa

How it’s wired

The edge middleware is the single enforcement point for state-changing requests on /api/*.
The auth-layer JWT callback writes the verification timestamp into the session whenever the session-update hook fires with trigger="update".
The verify endpoint at POST /api/v1/auth/mfa performs TOTP validation, accepts backup codes, and is rate-limited.
A regression test covers the middleware enforcement path end-to-end.
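
The JWT callback's part in this wiring can be sketched as follows. The types are local stand-ins for the NextAuth callback arguments so the example runs on its own; `jwtCallback`, `TokenClaims`, and the validation rule are assumptions for illustration, not the project's actual code.

```typescript
// Local stand-ins for the NextAuth jwt-callback arguments (illustrative only).
interface TokenClaims {
  mfaPending?: boolean;
  mfaVerifiedAt?: string | null;
  [claim: string]: unknown;
}

interface JwtCallbackArgs {
  token: TokenClaims;
  trigger?: "signIn" | "signUp" | "update";
  session?: { mfaVerifiedAt?: unknown }; // data passed by the client's session update
}

function jwtCallback({ token, trigger, session }: JwtCallbackArgs): TokenClaims {
  // Only a session update may change the verification timestamp.
  if (trigger !== "update") return token;

  const candidate = session?.mfaVerifiedAt;
  // Assumption: accept only a string that parses as a date; ignore anything else.
  if (typeof candidate === "string" && !Number.isNaN(Date.parse(candidate))) {
    return { ...token, mfaVerifiedAt: candidate };
  }
  return token;
}
```

The sketch accepts whatever timestamp the client sends. How the real callback confirms that a verify actually took place is not described on this page.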
