Why this exists
MFA enforcement protects against credential-leak risk on admin accounts. Even if an attacker steals a session cookie or API password, they cannot mutate decisioning configuration (offers, contact policies, decision flows, MCP playbooks, models, approvals, etc.) without the second factor. The enforcement runs in edge middleware for low latency (typically <5ms additional overhead per request) and applies to every state-changing HTTP method (POST / PUT / PATCH / DELETE) on /api/* routes.
How the step-up proof works (server-issued HMAC cookie)
The freshness proof is a server-issued HMAC-SHA256 signed cookie (kaireon_stepup),
not a client-asserted timestamp. Previous versions trusted a mfaVerifiedAt timestamp
pushed through NextAuth session.update() — any client could mint its own freshness
without completing a real challenge. That path is no longer trusted.
The new flow:
- Admin signs in with email + password (or Google OAuth).
- If the user has MFA enabled, the JWT carries mfaPending = true.
- Admin attempts a state-changing API call. Middleware validates the kaireon_stepup cookie. If the cookie is absent, expired, tampered, or signed for a different user, middleware returns 403 MFA_REQUIRED.
- Client calls POST /api/v1/auth/mfa with { action: "verify", token: "<6-digit TOTP>" } (or completes a WebAuthn verify-finish ceremony). On success the server mints a kaireon_stepup cookie:
  - Format: base64url(JSON{sub, iat}).<hex HMAC-SHA256>, signed with NEXTAUTH_SECRET
  - The cookie is httpOnly, sameSite=strict, secure in production
  - TTL: 15 minutes (STEP_UP_TTL_MS = 15 * 60 * 1000)
- Subsequent admin writes pass as long as the cookie is fresh. No session.update() call is needed; the cookie is the sole freshness proof.
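The token format described in the flow above can be sketched in a few lines of Node-side TypeScript. This is an illustration under stated assumptions, not the code in src/lib/auth/step-up.ts: the function names mintToken and verifyToken, their signatures, and the choice of milliseconds for iat are all hypothetical, and the secret is passed in explicitly rather than read from NEXTAUTH_SECRET.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const STEP_UP_TTL_MS = 15 * 60 * 1000;

// Assumption: iat is stored in milliseconds since the epoch.
interface StepUpPayload {
  sub: string;
  iat: number;
}

// Hex-encoded HMAC-SHA256 over the base64url body.
function signBody(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Hypothetical helper: builds base64url(JSON{sub, iat}).<hex HMAC-SHA256>.
export function mintToken(sub: string, secret: string, now: number = Date.now()): string {
  const payload: StepUpPayload = { sub, iat: now };
  const body = Buffer.from(JSON.stringify(payload), "utf8").toString("base64url");
  return `${body}.${signBody(body, secret)}`;
}

// Hypothetical helper: returns true only for an untampered, unexpired token
// that was issued to expectedSub.
export function verifyToken(
  token: string,
  expectedSub: string,
  secret: string,
  now: number = Date.now(),
): boolean {
  const dot = token.indexOf(".");
  if (dot <= 0) return false;
  const body = token.slice(0, dot);

  // Compare signatures in constant time; timingSafeEqual throws on
  // length mismatch, so check lengths first.
  const given = Buffer.from(token.slice(dot + 1), "utf8");
  const expected = Buffer.from(signBody(body, secret), "utf8");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    return false;
  }

  let payload: Partial<StepUpPayload> | null;
  try {
    payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  } catch {
    return false;
  }
  if (!payload || payload.sub !== expectedSub || typeof payload.iat !== "number") {
    return false;
  }

  // Reject tokens from the future as well as tokens older than the TTL.
  const age = now - payload.iat;
  return age >= 0 && age <= STEP_UP_TTL_MS;
}
```

Because the signature covers the whole body, a client cannot change sub or iat without invalidating the HMAC, which is the property the old mfaVerifiedAt path lacked.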
Step-up TTL
Default is 15 minutes (hardcoded in src/lib/auth/step-up-edge.ts as
STEP_UP_TTL_MS = 15 * 60 * 1000). The TTL is intentionally short to limit blast
radius if a session is hijacked. Every successful verify resets the timer.
There is no per-tenant configuration today; changing the TTL requires a code change
and redeploy.
What’s bypassed
- GET / HEAD / OPTIONS requests: no enforcement (read-only)
- Non-admin users: no enforcement (the gate is admin-only)
- Users with MFA not enabled: no enforcement (mfaPending is false)
- /api/v1/auth/mfa itself: must be reachable to do the verify
- /api/auth/* (NextAuth handlers): needed for sign-in flow
- API-key-authenticated server-to-server calls: these don't carry a JWT and are gated separately by API-key scope
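The bypass rules above amount to a single predicate that the middleware could evaluate before it looks at the cookie. The sketch below is a hypothetical restatement of that list: the GateInput shape and the requiresStepUp name are invented for illustration, and the real middleware may derive these fields differently.

```typescript
// Hypothetical summary of what the middleware knows about a request.
interface GateInput {
  method: string;
  pathname: string;
  isAdmin: boolean;
  mfaPending: boolean; // JWT claim; false when the user has no MFA enabled
  apiKeyAuth: boolean; // true for API-key server-to-server calls (no JWT)
}

const STATE_CHANGING_METHODS = new Set(["POST", "PUT", "PATCH", "DELETE"]);

// Returns true when a request must present a valid step-up cookie.
function requiresStepUp(req: GateInput): boolean {
  // Read-only methods (GET / HEAD / OPTIONS) are never gated.
  if (!STATE_CHANGING_METHODS.has(req.method.toUpperCase())) return false;
  // Enforcement only covers /api/* routes.
  if (!req.pathname.startsWith("/api/")) return false;
  // The verify endpoint and the NextAuth handlers must stay reachable.
  if (req.pathname === "/api/v1/auth/mfa") return false;
  if (req.pathname.startsWith("/api/auth/")) return false;
  // API-key calls are gated separately by API-key scope.
  if (req.apiKeyAuth) return false;
  // The gate is admin-only and applies only when MFA is enabled.
  if (!req.isAdmin) return false;
  if (!req.mfaPending) return false;
  return true;
}
```

Keeping every exemption in one function makes the "non-admins are gated" failure in the runbook easier to rule out, since there is only one place the admin check can be wrong.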
Kill switch (incident only)
Set MFA_ENFORCEMENT_DISABLED=true in the deployment environment to
bypass enforcement for all requests. This is only for incident
recovery — for example, if the TOTP server’s clock is drifting and
legitimate codes are being rejected.
When set, requests that would have been blocked are passed through with
no logging change. Set the env var back to false (or unset) and redeploy
to re-enable.
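A minimal sketch of how such a kill switch might be read, assuming the flag is compared against the exact string "true" (the document does not say whether other values such as "1" are accepted). The isEnforcementDisabled name and the env parameter are hypothetical.

```typescript
// Hypothetical helper: enforcement is bypassed only when the variable is
// exactly "true". Unset, "false", and any other value keep enforcement on.
function isEnforcementDisabled(env: Record<string, string | undefined>): boolean {
  return env.MFA_ENFORCEMENT_DISABLED === "true";
}
```

Treating every value other than "true" as "enforcement on" means a typo in the deployment environment fails closed rather than silently disabling MFA.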
Operator runbook
| Symptom | Likely cause | Action |
|---|---|---|
| All admin writes returning 403 MFA_REQUIRED | Admin has MFA enabled but never completed verify | Have admin call POST /api/v1/auth/mfa with action: "verify" — the server sets the step-up cookie on success |
| Admin verified successfully but next write returns 403 | The kaireon_stepup cookie was not set or was blocked (e.g. cross-site context) | Verify the response from POST /api/v1/auth/mfa — it should set an httpOnly cookie. Check browser devtools under Application → Cookies. |
| 403 returns even though admin verified within the last 15 minutes | Clock drift between auth server and middleware host | Check NTP on both hosts. Or temporarily disable via kill switch |
| All non-admin users returning 403 | Bug — non-admins shouldn’t be gated | File issue immediately. Set kill switch as workaround |
| Production-grade incident: locked out of admin | Kill switch not yet set, need access NOW | Set MFA_ENFORCEMENT_DISABLED=true in env, redeploy. Re-enable once admin can verify |
What’s in scope
- ✅ Edge middleware enforcement on all state-changing requests to /api/*
- ✅ Server-issued HMAC-SHA256 kaireon_stepup cookie (minted by TOTP/WebAuthn verify)
- ✅ 15-minute sliding-window TTL validated in the Edge runtime via Web Crypto
- ✅ MFA_ENFORCEMENT_DISABLED env kill switch
- ✅ Source-level regression tests covering the middleware path
- ✅ MFA verify endpoint at POST /api/v1/auth/mfa
- ✅ WebAuthn verify-finish also mints the step-up cookie on success
Out of scope (later)
- Per-tenant TTL configuration
- Audit log entry on every verify (currently logged at info via logger)
- IP-based step-up (require fresh verify when source IP changes mid-session)
What’s already shipped
- ✅ WebAuthn / Passkey support: registration is a two-step ceremony at POST /api/v1/auth/webauthn/register/begin (server issues a challenge) and POST /api/v1/auth/webauthn/register/finish (client posts the attestation). Verification follows the same pattern: POST /api/v1/auth/webauthn/verify/begin then POST /api/v1/auth/webauthn/verify/finish. On success, verify-finish mints the same kaireon_stepup cookie as TOTP, so once a passkey is registered the middleware treats both factors equivalently.
- ✅ TOTP + backup codes
- ✅ Step-up verify endpoint at POST /api/v1/auth/mfa
How it’s wired
- The edge middleware (src/middleware.ts) is the single enforcement point for state-changing requests on /api/*. It imports verifyStepUpTokenAsync from src/lib/auth/step-up-edge.ts (Web Crypto only; no node:crypto in the Edge bundle).
- POST /api/v1/auth/mfa (src/app/api/v1/auth/mfa/route.ts) calls mintStepUpToken from src/lib/auth/step-up.ts on a valid TOTP/backup-code verify and sets the kaireon_stepup cookie on the response. No session.update() is called or required.
- The WebAuthn verify-finish route likewise calls mintStepUpToken on a valid assertion.
- A regression test covers the middleware enforcement path end-to-end.
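To make the "Web Crypto only" constraint concrete, here is a rough sketch of how an Edge-compatible verifier could check the cookie value without node:crypto. It is not the actual verifyStepUpTokenAsync, whose signature is not shown here: verifyStepUpSketch, mintForDemo, and the helper names are hypothetical, iat is assumed to be in milliseconds, and mintForDemo exists only so the example can round-trip a token.

```typescript
const STEP_UP_TTL_MS = 15 * 60 * 1000;
const encoder = new TextEncoder();

// Import the shared secret as an HMAC-SHA256 key for sign and verify.
async function importHmacKey(secret: string) {
  return crypto.subtle.importKey(
    "raw",
    encoder.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign", "verify"],
  );
}

// Decode a hex string to bytes; returns null for malformed input.
function hexToBytes(hex: string) {
  if (hex.length === 0 || hex.length % 2 !== 0 || !/^[0-9a-f]+$/i.test(hex)) return null;
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

function base64UrlEncode(text: string): string {
  const bytes = encoder.encode(text);
  let bin = "";
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function base64UrlDecode(b64url: string): string {
  const b64 = b64url.replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  const bin = atob(padded);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  return new TextDecoder().decode(bytes);
}

// Demo-only mint so the sketch can be exercised end to end.
async function mintForDemo(sub: string, secret: string, now: number): Promise<string> {
  const body = base64UrlEncode(JSON.stringify({ sub, iat: now }));
  const key = await importHmacKey(secret);
  const mac = new Uint8Array(await crypto.subtle.sign("HMAC", key, encoder.encode(body)));
  let hex = "";
  for (let i = 0; i < mac.length; i++) hex += ("0" + mac[i].toString(16)).slice(-2);
  return `${body}.${hex}`;
}

// Hypothetical Edge-side check: signature first, then subject and age.
async function verifyStepUpSketch(
  token: string,
  expectedSub: string,
  secret: string,
  now: number = Date.now(),
): Promise<boolean> {
  const dot = token.indexOf(".");
  if (dot <= 0) return false;
  const body = token.slice(0, dot);
  const sig = hexToBytes(token.slice(dot + 1));
  if (sig === null) return false;

  // subtle.verify recomputes the HMAC and compares it to the given signature.
  const key = await importHmacKey(secret);
  const valid = await crypto.subtle.verify("HMAC", key, sig, encoder.encode(body));
  if (!valid) return false;

  try {
    const payload = JSON.parse(base64UrlDecode(body));
    if (!payload || payload.sub !== expectedSub || typeof payload.iat !== "number") {
      return false;
    }
    const age = now - payload.iat;
    return age >= 0 && age <= STEP_UP_TTL_MS;
  } catch {
    return false;
  }
}
```

The verifier is async because every crypto.subtle call returns a promise, which is presumably why the Edge-side export carries the Async suffix.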