
Why this exists

MFA enforcement protects admin accounts against credential leaks. Even if an attacker steals a session cookie or API password, they cannot mutate decisioning configuration (offers, contact policies, decision flows, MCP playbooks, models, approvals, etc.) without the second factor. The enforcement runs in edge middleware for low latency, typically adding less than 5 ms of overhead per request, and applies to every state-changing HTTP method (POST / PUT / PATCH / DELETE) on /api/* routes.

The freshness proof is a server-issued, HMAC-SHA256-signed cookie (kaireon_stepup), not a client-asserted timestamp. Previous versions trusted an mfaVerifiedAt timestamp pushed through NextAuth session.update(), which meant any client could mint its own freshness without completing a real challenge. That path is no longer trusted. The new flow:
  1. Admin signs in with email + password (or Google OAuth).
  2. If the user has MFA enabled, the JWT carries mfaPending = true.
  3. Admin attempts a state-changing API call. Middleware validates the kaireon_stepup cookie. If the cookie is absent, expired, tampered, or signed for a different user, middleware returns 403 MFA_REQUIRED:
    {
      "error": {
        "code": "MFA_REQUIRED",
        "message": "Admin accounts require MFA verification within the last 15 minutes for write operations.",
        "status": 403,
        "hint": "POST /api/v1/auth/mfa with { action: 'verify', token: '<TOTP>' } (or complete a WebAuthn verify). The server sets a signed step-up cookie on success — no client-side session.update is needed."
      }
    }
    
  4. Client calls POST /api/v1/auth/mfa with { action: "verify", token: "<6-digit TOTP>" } (or completes a WebAuthn verify-finish ceremony). On success the server mints a kaireon_stepup cookie:
    • Format: base64url(JSON{sub, iat}).<hex HMAC-SHA256> — signed with NEXTAUTH_SECRET
    • The cookie is httpOnly, sameSite=strict, secure in production
    • TTL: 15 minutes (STEP_UP_TTL_MS = 15 * 60 * 1000)
  5. Subsequent admin writes pass as long as the cookie is fresh. No session.update() call is needed — the cookie is the sole freshness proof.
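
From the client's side, steps 3 to 5 amount to "try the write, verify on MFA_REQUIRED, retry once". The sketch below illustrates that loop under stated assumptions: the helper names (send, writeWithStepUp, promptForTotp) are hypothetical, the request body is assumed to be JSON, and any 2xx from the verify endpoint is assumed to mean success. Only the endpoint path, the MFA_REQUIRED code, and the verify body shape come from the flow above.

```typescript
// Hypothetical client helper: retry one admin write after a step-up verify.
// Names and the 2xx-means-success rule are assumptions for illustration.

interface MinimalResponse {
  status: number;
  json(): Promise<unknown>;
}

interface RequestInitLike {
  method: string;
  headers?: Record<string, string>;
  body?: string;
  credentials?: "same-origin" | "include";
}

// Structural subset of fetch, so a fake can be injected in tests.
type FetchLike = (url: string, init: RequestInitLike) => Promise<MinimalResponse>;

interface ApiResult {
  status: number;
  body: unknown;
}

async function send(
  fetchImpl: FetchLike,
  url: string,
  method: string,
  payload: unknown,
): Promise<ApiResult> {
  const res = await fetchImpl(url, {
    method,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
    credentials: "same-origin", // the step-up cookie travels with same-origin requests
  });
  // Tolerate empty or non-JSON bodies.
  const body = await res.json().catch(() => null);
  return { status: res.status, body };
}

function isMfaRequired(result: ApiResult): boolean {
  if (result.status !== 403) return false;
  const body = result.body as { error?: { code?: unknown } } | null;
  return body?.error?.code === "MFA_REQUIRED";
}

async function writeWithStepUp(
  fetchImpl: FetchLike,
  url: string,
  method: string,
  payload: unknown,
  promptForTotp: () => Promise<string>,
): Promise<ApiResult> {
  const first = await send(fetchImpl, url, method, payload);
  if (!isMfaRequired(first)) return first;

  // Ask the user for a code, then let the server set the signed cookie.
  const token = await promptForTotp();
  const verify = await send(fetchImpl, "/api/v1/auth/mfa", "POST", {
    action: "verify",
    token,
  });
  if (verify.status < 200 || verify.status >= 300) {
    throw new Error(`MFA verify failed with status ${verify.status}`);
  }

  // Retry exactly once; no session.update() call is involved.
  return send(fetchImpl, url, method, payload);
}
```

The helper never reads or writes the cookie itself, which matches the httpOnly design: the browser attaches kaireon_stepup to the retry automatically.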

Step-up TTL

Default is 15 minutes (hardcoded in src/lib/auth/step-up-edge.ts as STEP_UP_TTL_MS = 15 * 60 * 1000). The TTL is intentionally short to limit blast radius if a session is hijacked. Every successful verify resets the timer. There is no per-tenant configuration today; changing the TTL requires a code change and redeploy.

What’s bypassed

  • GET / HEAD / OPTIONS requests — no enforcement (read-only)
  • Non-admin users — no enforcement (the gate is admin-only)
  • Users with MFA not enabled — no enforcement (mfaPending is false)
  • /api/v1/auth/mfa itself — must be reachable to do the verify
  • /api/auth/* (NextAuth handlers) — needed for sign-in flow
  • API-key-authenticated server-to-server calls — these don’t carry a JWT and are gated separately by API-key scope
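
The exemptions above can be read as one predicate over the incoming request. The following is a minimal sketch, not the actual middleware: the GateInput shape and the requiresStepUp name are hypothetical, and how the real code derives the admin role, MFA status, or API-key authentication is not covered by this page. Only the exemptions listed above are modeled.

```typescript
// Hypothetical predicate mirroring the documented bypass list.
// Field names are assumptions; the real middleware derives them from the JWT.

interface GateInput {
  method: string;
  pathname: string;
  isAdmin: boolean;
  mfaEnabled: boolean;
  authenticatedByApiKey: boolean;
}

const STATE_CHANGING_METHODS = new Set(["POST", "PUT", "PATCH", "DELETE"]);

function requiresStepUp(input: GateInput): boolean {
  // Enforcement only covers /api/* routes.
  if (!input.pathname.startsWith("/api/")) return false;
  // GET / HEAD / OPTIONS are read-only and never gated.
  if (!STATE_CHANGING_METHODS.has(input.method.toUpperCase())) return false;
  // The verify endpoint must stay reachable to obtain the cookie.
  if (input.pathname === "/api/v1/auth/mfa") return false;
  // NextAuth handlers are needed for the sign-in flow.
  if (input.pathname.startsWith("/api/auth/")) return false;
  // Server-to-server calls are gated separately by API-key scope.
  if (input.authenticatedByApiKey) return false;
  // The gate is admin-only and applies only when MFA is enabled.
  return input.isAdmin && input.mfaEnabled;
}
```

Keeping the decision in a pure function like this makes each exemption easy to cover with a unit test, independent of cookie validation.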

Kill switch (incident only)

Set MFA_ENFORCEMENT_DISABLED=true in the deployment environment to bypass enforcement for all requests. This is only for incident recovery — for example, if the TOTP server’s clock is drifting and legitimate codes are being rejected. When set, requests that would have been blocked are passed through with no logging change. Set the env var back to false (or unset) and redeploy to re-enable.
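
As a rough illustration of where the kill switch sits in the decision, the sketch below checks the env var before anything else. It assumes that only the exact string "true" disables enforcement, which fits the "set back to false (or unset)" guidance but is not confirmed here; the function names are hypothetical.

```typescript
// Hypothetical sketch: the kill switch short-circuits the step-up decision.
// Assumption: only the exact value "true" disables enforcement.

type Decision = "allow" | "block";

function isEnforcementDisabled(env: Record<string, string | undefined>): boolean {
  return env["MFA_ENFORCEMENT_DISABLED"] === "true";
}

function decide(
  env: Record<string, string | undefined>,
  needsStepUp: boolean,
  hasFreshStepUp: boolean,
): Decision {
  // Incident mode: everything passes, including requests that would be blocked.
  if (isEnforcementDisabled(env)) return "allow";
  // Exempt requests (reads, non-admins, and so on) pass untouched.
  if (!needsStepUp) return "allow";
  return hasFreshStepUp ? "allow" : "block";
}
```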

Operator runbook

| Symptom | Likely cause | Action |
| --- | --- | --- |
| All admin writes returning 403 MFA_REQUIRED | Admin has MFA enabled but never completed verify | Have the admin call POST /api/v1/auth/mfa with action: "verify"; the server sets the step-up cookie on success. |
| Admin verified successfully but next write returns 403 | The kaireon_stepup cookie was not set or was blocked (e.g. cross-site context) | Verify the response from POST /api/v1/auth/mfa; it should set an httpOnly cookie. Check browser devtools under Application → Cookies. |
| 403 returns even though admin verified within the last 15 minutes | Clock drift between auth server and middleware host | Check NTP on both hosts, or temporarily disable enforcement via the kill switch. |
| All non-admin users returning 403 | Bug: non-admins shouldn't be gated | File an issue immediately. Set the kill switch as a workaround. |
| Production-grade incident: locked out of admin | Kill switch not yet set, need access NOW | Set MFA_ENFORCEMENT_DISABLED=true in env and redeploy. Re-enable once the admin can verify. |

What’s in scope

  • ✅ Edge middleware enforcement on all state-changing requests to /api/*
  • ✅ Server-issued HMAC-SHA256 kaireon_stepup cookie (minted by TOTP/WebAuthn verify)
  • ✅ 15-minute sliding-window TTL validated in the Edge runtime via Web Crypto
  • ✅ MFA_ENFORCEMENT_DISABLED env kill switch
  • ✅ Source-level regression tests covering the middleware path
  • ✅ MFA verify endpoint at POST /api/v1/auth/mfa
  • ✅ WebAuthn verify-finish also mints the step-up cookie on success

Out of scope (later)

  • Per-tenant TTL configuration
  • Audit log entry on every verify (currently logged at info via logger)
  • IP-based step-up (require fresh verify when source IP changes mid-session)

What’s already shipped

  • ✅ WebAuthn / Passkey support. Registration is a two-step ceremony: POST /api/v1/auth/webauthn/register/begin (server issues a challenge), then POST /api/v1/auth/webauthn/register/finish (client posts the attestation). Verification follows the same pattern: POST /api/v1/auth/webauthn/verify/begin, then POST /api/v1/auth/webauthn/verify/finish. A successful verify-finish mints the same kaireon_stepup cookie as a TOTP verify, so once a passkey is registered the middleware treats both factors equivalently.
  • ✅ TOTP + backup codes
  • ✅ Step-up verify endpoint at POST /api/v1/auth/mfa

How it’s wired

  • The edge middleware (src/middleware.ts) is the single enforcement point for state-changing requests on /api/*. It imports verifyStepUpTokenAsync from src/lib/auth/step-up-edge.ts (Web Crypto only — no node:crypto in the Edge bundle).
  • POST /api/v1/auth/mfa (src/app/api/v1/auth/mfa/route.ts) calls mintStepUpToken from src/lib/auth/step-up.ts on a valid TOTP/backup-code verify and sets the kaireon_stepup cookie on the response. No session.update() is called or required.
  • The WebAuthn verify-finish route likewise calls mintStepUpToken on a valid assertion.
  • A regression test covers the middleware enforcement path end-to-end.
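
To make the mint and verify halves concrete, here is a minimal sketch of the documented token format, base64url(JSON{sub, iat}).<hex HMAC-SHA256>. It is not the code in step-up.ts or step-up-edge.ts. It uses node:crypto for brevity, whereas the edge verifier is described above as Web Crypto only. The function names are hypothetical, iat is assumed to be in milliseconds, and the HMAC is assumed to cover the base64url payload string.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// TTL value taken from the docs; everything else here is illustrative.
const STEP_UP_TTL_MS = 15 * 60 * 1000;

// Assumption: the MAC is computed over the base64url payload string.
function signPayload(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Assumption: iat is a millisecond timestamp.
function mintTokenSketch(sub: string, secret: string, now: number = Date.now()): string {
  const payload = Buffer.from(JSON.stringify({ sub, iat: now })).toString("base64url");
  return `${payload}.${signPayload(payload, secret)}`;
}

function verifyTokenSketch(
  token: string | undefined,
  expectedSub: string,
  secret: string,
  now: number = Date.now(),
): boolean {
  if (!token) return false; // absent cookie
  const dot = token.indexOf(".");
  if (dot <= 0) return false; // malformed
  const payload = token.slice(0, dot);
  const mac = token.slice(dot + 1);

  // Constant-time comparison of the hex MACs (tampered or wrong secret).
  const expected = Buffer.from(signPayload(payload, secret), "utf8");
  const actual = Buffer.from(mac, "utf8");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return false;

  let claims: unknown;
  try {
    claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  } catch {
    return false;
  }
  if (typeof claims !== "object" || claims === null) return false;
  const { sub, iat } = claims as { sub?: unknown; iat?: unknown };

  // Signed for a different user, or missing timestamp.
  if (sub !== expectedSub || typeof iat !== "number") return false;

  // Expired, or issued in the future relative to this host's clock.
  const age = now - iat;
  return age >= 0 && age <= STEP_UP_TTL_MS;
}
```

In this sketch a token whose iat lies ahead of the verifier's clock is rejected, which is one way the clock-drift symptom in the runbook could arise.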