
Startup Issues

These errors typically occur when starting the development server or deploying for the first time.
Cause: KaireonAI uses Prisma 7, which moved the datasource URL out of schema.prisma and into prisma.config.ts.
Fix: Open prisma/schema.prisma and ensure the datasource block contains only the provider, with no url line:
datasource db {
  provider = "postgresql"
}
The connection URL is defined in prisma.config.ts and reads from the DATABASE_URL environment variable.
Cause: The Prisma client has not been generated yet. The generated client lives in platform/generated/prisma/ and is gitignored, so it must be created locally.
Fix:
npx prisma generate
This runs automatically on npm install via the postinstall script, but you may need to run it manually after schema changes.
Cause: The application cannot connect to PostgreSQL.
Fix (checklist):
  1. Verify DATABASE_URL is set in your .env file
  2. Ensure PostgreSQL is running (pg_isready or psql to test)
  3. Confirm the hostname, port, database name, and credentials in the URL are correct
  4. If using Docker, make sure the container is up and the port is mapped
# Test connectivity directly
psql "$DATABASE_URL" -c "SELECT 1"
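Before reaching for psql, it can help to check that the URL itself is well formed. The following is a minimal sketch of items 1 and 3 in the checklist above. The helper name checkDatabaseUrl and its report shape are hypothetical and not part of KaireonAI. A missing username is flagged even though some local setups legitimately omit it.

```typescript
// Hypothetical helper (not part of KaireonAI): reports which parts of a
// PostgreSQL connection URL look missing or malformed.
type DbUrlReport = { ok: boolean; problems: string[] };

function checkDatabaseUrl(raw: string | undefined): DbUrlReport {
  if (!raw) {
    return { ok: false, problems: ["DATABASE_URL is not set"] };
  }
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return { ok: false, problems: ["DATABASE_URL is not a valid URL"] };
  }
  const problems: string[] = [];
  if (parsed.protocol !== "postgresql:" && parsed.protocol !== "postgres:") {
    problems.push(`unexpected scheme "${parsed.protocol}"`);
  }
  if (!parsed.hostname) problems.push("missing hostname");
  if (!parsed.username) problems.push("missing username");
  // pathname is "/dbname"; an empty path or a bare "/" means no database was given.
  if (parsed.pathname.length <= 1) problems.push("missing database name");
  return { ok: problems.length === 0, problems };
}
```

In an application you would typically call it as checkDatabaseUrl(process.env.DATABASE_URL) during startup and log the problems array. It only validates the shape of the URL, so a clean report does not guarantee the server is reachable.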
Cause: The authentication system requires a secret key for signing session tokens.
Fix: Add NEXTAUTH_SECRET to your .env file:
# Generate a random secret
openssl rand -base64 32
NEXTAUTH_SECRET=your-generated-secret-here
Cause: Another process is already listening on port 3000.
Fix: Either kill the existing process or start on a different port:
# Find and kill the process on port 3000
lsof -ti:3000 | xargs kill -9

# Or start on a different port
PORT=3001 npm run dev

API Errors

All KaireonAI API routes return standard HTTP status codes. Here is a reference for the most common error responses.
| Status | Meaning | Common Cause | Fix |
| --- | --- | --- | --- |
| 401 | Unauthorized | Missing X-Tenant-Id header or invalid/expired session | Include the X-Tenant-Id header in every API request. Re-authenticate if session expired. |
| 403 | Forbidden | User role lacks permission (admin/editor/viewer) | Check the user’s role. Write operations require admin or editor. |
| 404 | Not Found | Resource does not exist, or belongs to a different tenant | Verify the resource ID and that the X-Tenant-Id matches the owning tenant. |
| 409 | Conflict | Duplicate key violation, or rowVersion mismatch (optimistic locking) | Re-fetch the resource to get the latest rowVersion, then retry the update. |
| 429 | Too Many Requests | Rate limit exceeded | Wait for the duration in the Retry-After header before retrying. See Rate Limiting below. |
| 500 | Internal Server Error | Unhandled exception on the server | Check server logs. If reproducible, file a bug with the request payload. |
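A client can encode the status reference above as a small decision function. This is a sketch only: the ApiAction names and the nextAction helper are hypothetical, and the 1-second default wait when Retry-After is missing or non-numeric is an assumption rather than documented KaireonAI behavior.

```typescript
// Hypothetical client-side helper; action names are illustrative only.
type ApiAction =
  | { kind: "ok" }
  | { kind: "fix-auth" }            // 401: add X-Tenant-Id or re-authenticate
  | { kind: "check-role" }          // 403: role lacks permission
  | { kind: "verify-resource" }     // 404: wrong ID or wrong tenant
  | { kind: "refetch-and-retry" }   // 409: get the latest rowVersion first
  | { kind: "wait-and-retry"; delayMs: number } // 429
  | { kind: "report-bug" };         // 500 and other unexpected errors

function nextAction(statusCode: number, retryAfterHeader: string | null): ApiAction {
  switch (statusCode) {
    case 401:
      return { kind: "fix-auth" };
    case 403:
      return { kind: "check-role" };
    case 404:
      return { kind: "verify-resource" };
    case 409:
      return { kind: "refetch-and-retry" };
    case 429: {
      // Retry-After is read as a number of seconds.
      const seconds = Number(retryAfterHeader);
      // Assumption: fall back to 1 second if the header is absent or not numeric.
      const delayMs = Number.isFinite(seconds) && seconds > 0 ? seconds * 1000 : 1000;
      return { kind: "wait-and-retry", delayMs };
    }
    default:
      return statusCode < 400 ? { kind: "ok" } : { kind: "report-bug" };
  }
}
```

With fetch, the second argument would be response.headers.get("Retry-After"). Returning a tagged action rather than retrying inside the helper keeps the policy testable and leaves the caller in control of how many attempts to make.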

Rate Limiting

KaireonAI uses a sliding-window rate limiter. When you receive a 429 response, the Retry-After header tells you how many seconds to wait. The X-RateLimit-Remaining header shows how many requests remain in the current window.
In multi-node deployments, rate limiting uses Redis sorted sets for global accuracy. If Redis is unavailable, the limiter falls back to per-process in-memory tracking — limits may be less precise across nodes.
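To make the per-process behavior concrete, here is an illustrative in-memory sliding-window limiter. It is not KaireonAI's implementation: the class name, the key format, and the limit and window values are all assumptions. It does show why limits drift across nodes in fallback mode, since each process holds its own map of timestamps.

```typescript
// Illustrative sliding-window (timestamp log) limiter. Not KaireonAI's code.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns whether the request is allowed, how many requests remain in the
  // window, and a Retry-After value in whole seconds when it is blocked.
  check(key: string, nowMs: number): { allowed: boolean; remaining: number; retryAfterSec: number } {
    const cutoff = nowMs - this.windowMs;
    // Keep only timestamps that are still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      // The oldest hit leaves the window at oldest + windowMs.
      const oldest = recent[0] ?? nowMs;
      const retryAfterSec = Math.ceil((oldest + this.windowMs - nowMs) / 1000);
      return { allowed: false, remaining: 0, retryAfterSec };
    }
    recent.push(nowMs);
    this.hits.set(key, recent);
    return { allowed: true, remaining: this.limit - recent.length, retryAfterSec: 0 };
  }
}
```

With two application nodes each running a limiter like this, a tenant could in the worst case make up to twice the configured number of requests per window, which is the reduced precision described above.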

Decision Flow Issues

No Offers Returned

If the Recommend API returns an empty list, work through the checklist below.
Each item is a filter in the decision pipeline. An offer must pass all of them to appear in the response.
  1. Offers are active — Check that the offer’s status is active (not draft or paused)
  2. Schedule window — Verify startDate and endDate encompass the current date
  3. Budget remaining — Check the offer has not exhausted its decision or cost budget
  4. Flow inventory — The offer must be assigned to the Decision Flow being evaluated
  5. Qualification rules passing — Review the qualification rules attached to the offer; test with the exact customer attributes being sent
  6. Contact policies not suppressing — Check that contact policy limits (frequency caps, channel fatigue) are not filtering out the offer for this customer
  7. Channel/placement match — The request’s channel and placement must match what the flow is configured for
Enable Decision Traces in tenant settings to get a step-by-step breakdown of why each offer was included or filtered. See Debugging with Decision Traces below.
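The checklist above behaves like a chain of filters in which the first failing stage removes the offer. The sketch below models four of the seven checks (status, schedule, budget, flow inventory) to show the idea. The OfferCandidate fields and stage names are hypothetical and are not KaireonAI's actual schema.

```typescript
// Hypothetical offer shape; field names are assumptions for illustration.
type OfferCandidate = {
  id: string;
  status: "active" | "draft" | "paused";
  startDate: string; // ISO 8601 date
  endDate: string;   // ISO 8601 date
  budgetRemaining: number;
  flowIds: string[];
};

type PipelineFilter = { stage: string; passes: (offer: OfferCandidate) => boolean };

// Builds the filters in checklist order for one Decision Flow and one point in time.
function buildFilters(flowId: string, now: Date): PipelineFilter[] {
  const nowMs = now.getTime();
  return [
    { stage: "status", passes: (o) => o.status === "active" },
    { stage: "schedule", passes: (o) => Date.parse(o.startDate) <= nowMs && nowMs <= Date.parse(o.endDate) },
    { stage: "budget", passes: (o) => o.budgetRemaining > 0 },
    { stage: "flow_inventory", passes: (o) => o.flowIds.includes(flowId) },
  ];
}

// Returns the first stage that rejects the offer, or null if it passes every stage.
function firstFailingStage(offer: OfferCandidate, filters: PipelineFilter[]): string | null {
  for (const filter of filters) {
    if (!filter.passes(offer)) return filter.stage;
  }
  return null;
}
```

Because evaluation stops at the first failure, fixing one problem (for example, reactivating a paused offer) can expose the next one, so it is worth re-checking after each change.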

Scoring Returns 0

  • Model configured? Verify a scoring model is assigned to the Decision Flow and that it has valid weights
  • Circuit breaker tripped? If the model’s error rate exceeded the threshold, the circuit breaker opens and scoring returns a fallback value. Check the Operations Dashboard for circuit breaker status
  • Model health — Check the model health dashboard for drift or degraded performance
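The circuit breaker behavior can be pictured with a small error-rate breaker. This is a simplified sketch under stated assumptions: the class name, the rolling window of recent calls, and the threshold value are hypothetical, and the cool-down that a production breaker uses to close again is deliberately omitted.

```typescript
// Simplified error-rate circuit breaker. Illustrative only; a production
// breaker would also close again after a cool-down period.
class ScoringBreaker {
  private outcomes: boolean[] = []; // true means the call failed
  private readonly windowSize: number;
  private readonly maxErrorRate: number;

  constructor(windowSize: number, maxErrorRate: number) {
    this.windowSize = windowSize;
    this.maxErrorRate = maxErrorRate;
  }

  // Open once a full window of calls has been observed and the failure
  // ratio in that window exceeds the threshold.
  isOpen(): boolean {
    if (this.outcomes.length < this.windowSize) return false;
    const failures = this.outcomes.filter((failed) => failed).length;
    return failures / this.outcomes.length > this.maxErrorRate;
  }

  // Calls the model unless the breaker is open; returns the fallback when
  // the breaker is open or the model throws.
  score(model: () => number, fallback: number): number {
    if (this.isOpen()) return fallback;
    try {
      const value = model();
      this.record(false);
      return value;
    } catch {
      this.record(true);
      return fallback;
    }
  }

  private record(failed: boolean): void {
    this.outcomes.push(failed);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
  }
}
```

If the fallback value is 0, an open breaker makes every offer score 0 even though the model itself may have recovered, which is why the breaker status is worth checking before investigating the model.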

Wrong Offers Returned

  • Flow routing — Ensure the channel and placement in the Recommend request match the intended Decision Flow
  • Qualification rule scopes — Rules scoped to the wrong category or sub-category can inadvertently include/exclude offers
  • Arbitration weights — If offers are returned but in an unexpected order, review the arbitration profile weights (revenue, margin, propensity, engagement)
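To see how arbitration weights can change the order, the sketch below ranks offers by a weighted sum of the four components named above. The weighted-sum formula, the 0 to 1 normalization, and the type names are assumptions for illustration; the actual arbitration formula may differ.

```typescript
// Assumed model: final score is a weighted sum of four normalized components.
type ArbitrationWeights = { revenue: number; margin: number; propensity: number; engagement: number };
// Component scores for one offer, assumed to be normalized to the 0..1 range.
type ScoredOffer = { id: string; revenue: number; margin: number; propensity: number; engagement: number };

function arbitrationScore(offer: ScoredOffer, weights: ArbitrationWeights): number {
  return (
    offer.revenue * weights.revenue +
    offer.margin * weights.margin +
    offer.propensity * weights.propensity +
    offer.engagement * weights.engagement
  );
}

// Returns offer IDs ordered from highest to lowest weighted score.
function rankOffers(offers: ScoredOffer[], weights: ArbitrationWeights): string[] {
  return [...offers]
    .sort((a, b) => arbitrationScore(b, weights) - arbitrationScore(a, weights))
    .map((offer) => offer.id);
}
```

Under this model, a high-revenue offer with low propensity ranks first when the revenue weight dominates and drops behind a high-propensity offer when the weights are swapped, so an unexpected order is often a weighting question rather than a bug.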

Health Checks

KaireonAI exposes two health endpoints for monitoring and orchestration.

GET /api/health

Returns the overall system health including database connectivity, Redis status, and circuit breaker states.
curl -s http://localhost:3000/api/health | jq
Response:
{
  "status": "ok",
  "database": "ok",
  "uptime": 3621.45,
  "timestamp": "2026-03-16T14:30:00.000Z"
}
| Status | HTTP Code | Meaning |
| --- | --- | --- |
| ok | 200 | All systems healthy |
| degraded (200) | 200 | Database is up, but Redis is down or a circuit breaker is open |
| degraded (503) | 503 | Database is unreachable |
A degraded status with HTTP 200 means the platform can still serve requests with reduced functionality (e.g., no caching, fallback scoring). A 503 means the database is down and the platform cannot process decisions.
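A monitoring script has to look at both the HTTP code and the status field to tell these cases apart. The following sketch shows one way to do that. The verdict names and the interpretHealth helper are hypothetical, and treating unknown combinations as down is a conservative assumption.

```typescript
// Hypothetical verdicts for a monitoring script.
type HealthVerdict = "healthy" | "degraded-serving" | "down";

// Combines the HTTP code and the "status" field of the /api/health body.
function interpretHealth(httpCode: number, bodyStatus: string): HealthVerdict {
  if (httpCode === 200 && bodyStatus === "ok") return "healthy";
  // HTTP 200 with "degraded": still serving, with reduced functionality.
  if (httpCode === 200 && bodyStatus === "degraded") return "degraded-serving";
  // HTTP 503, or any combination not listed in the table, is treated as down.
  return "down";
}
```

A reasonable alerting policy would page on down and raise a lower-priority warning on degraded-serving, since the platform can still process decisions in that state.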

GET /api/ready

A readiness probe suitable for Kubernetes or load balancer health checks. Returns 200 only when both database and cache are connected.
curl -s http://localhost:3000/api/ready | jq
Response:
{
  "status": "ready",
  "checks": {
    "database": "connected",
    "cache": "connected"
  },
  "timestamp": "2026-03-16T14:30:00.000Z"
}
| Status | HTTP Code | Meaning |
| --- | --- | --- |
| ready | 200 | All dependencies healthy |
| degraded | 503 | Database is disconnected |
Use /api/health for monitoring dashboards and alerting. Use /api/ready for Kubernetes readiness probes and load balancer target health checks.

Debugging with Decision Traces

Decision traces provide a forensic log of every step in the decision pipeline — which offers were considered, which filters removed them, and the final scoring/ranking.

Enabling Traces

  1. Go to Settings > Tenant Settings
  2. Enable Decision Tracing
  3. Set a sample rate (e.g., 0.1 for 10% of requests, or 1.0 for all requests during debugging)
Setting the sample rate to 1.0 in production significantly increases database writes and storage usage. Use a low sample rate (0.01 to 0.1) for production monitoring, and 1.0 only during active debugging.
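A sample rate is usually applied as an independent random draw per request, and the sketch below assumes that model. The function names are hypothetical, and KaireonAI may sample differently (for example, deterministically by request ID). The second helper gives a rough estimate of trace volume for a given rate.

```typescript
// Assumed model: each request is traced independently with probability sampleRate.
// The random source is injectable so the behavior can be tested deterministically.
function shouldTrace(sampleRate: number, random: () => number = Math.random): boolean {
  // Clamp to the valid range so a misconfigured rate cannot misbehave.
  const rate = Math.min(1, Math.max(0, sampleRate));
  // random() returns a value in [0, 1), so rate 1 always traces and rate 0 never does.
  return random() < rate;
}

// Rough estimate of how many traces a given request volume will produce.
function expectedTraces(requestCount: number, sampleRate: number): number {
  const rate = Math.min(1, Math.max(0, sampleRate));
  return requestCount * rate;
}
```

For example, at one million requests per day, a rate of 0.25 would produce roughly 250,000 traces, while a rate of 1.0 would write one trace for every request.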

Reading a Trace

Each trace contains:
  • Request context — channel, placement, customer ID, attributes sent
  • Candidate set — all offers that entered the pipeline
  • Filter stages — which offers were removed at each stage (qualification, contact policy, budget, schedule) and why
  • Scoring — the raw and weighted scores for each surviving offer
  • Final ranking — the ordered list returned to the caller

Finding Why an Offer Was Filtered

  1. Go to Studio > Decision Flows and open the flow
  2. Click Recent Traces to see the latest decision trace results
  3. Search by customer ID or request ID
  4. Expand the trace and look at each filter stage — the filtered offer will show the stage name and reason (e.g., qualification_rule: min_balance >= 1000 failed)
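If traces are exported for offline analysis, the same lookup can be scripted. The sketch below assumes a hypothetical trace shape loosely modeled on the fields listed under Reading a Trace. The real trace format is not documented here, so the type and field names are illustrative only.

```typescript
// Hypothetical trace shape; the real format may differ.
type FilteredOffer = { offerId: string; reason: string };
type TraceStage = { stage: string; removed: FilteredOffer[] };
type DecisionTrace = { requestId: string; customerId: string; stages: TraceStage[] };

// Returns "stage: reason" for the stage that removed the offer, or null if
// no filter stage removed it.
function whyFiltered(trace: DecisionTrace, offerId: string): string | null {
  for (const stage of trace.stages) {
    const hit = stage.removed.find((entry) => entry.offerId === offerId);
    if (hit) return `${stage.stage}: ${hit.reason}`;
  }
  return null;
}
```

A null result means the offer either survived every filter stage or was never in the candidate set, so the next place to look is the scoring and ranking portion of the trace.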

Database Issues

Schema Push Fails

npx prisma db push
If this fails:
  • Connection error — Verify DATABASE_URL in .env (see Startup Issues)
  • Schema conflict — If you changed a column type on an existing table with data, Prisma may refuse. Use npx prisma db push --accept-data-loss only if you are okay losing data in that column
  • Permission denied — Ensure the database user has DDL privileges (CREATE TABLE, ALTER TABLE)

Migration Status Check

npx prisma migrate status
This shows pending and applied migrations. If migrations are out of sync, you may need to run:
npx prisma migrate deploy    # Apply pending migrations (production)
npx prisma migrate dev       # Create + apply migrations (development)

Connection Pool Exhaustion

Symptoms: requests hang or time out, or you see P2024: Timed out fetching a new connection from the connection pool errors.
The default pool is configured with a maximum of 50 connections and a 2-second connection timeout.
Fix:
  • Check for long-running queries or uncommitted transactions
  • Increase the pool size via the PG_POOL_MAX environment variable if your database supports more connections
  • Ensure the application is using the Prisma singleton (not creating new clients per request)
PG_POOL_MAX=100
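The singleton advice above usually means caching one client on globalThis so that repeated imports and development hot reloads reuse it. Here is a minimal sketch of that pattern. FakeDbClient is a stand-in for PrismaClient so the example runs without a database, and the dbClient property and getDbClient function are hypothetical names.

```typescript
// Stand-in for PrismaClient so this sketch runs without a database.
// It counts constructions to make accidental extra clients visible.
class FakeDbClient {
  static created = 0;
  constructor() {
    FakeDbClient.created += 1;
  }
}

// Store the client on globalThis so every caller shares one instance.
const globalStore = globalThis as unknown as { dbClient?: FakeDbClient };

function getDbClient(): FakeDbClient {
  if (!globalStore.dbClient) {
    globalStore.dbClient = new FakeDbClient();
  }
  return globalStore.dbClient;
}
```

Each client instance typically maintains its own connection pool, so constructing one per request multiplies the number of open connections, and raising PG_POOL_MAX will not compensate for that.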

Redis Issues

Rate Limiter Not Working as Expected

If rate limits are inconsistent across nodes, Redis may not be connected. The rate limiter falls back to in-memory mode when Redis is unavailable, which means each process tracks limits independently. Check Redis connectivity:
curl -s http://localhost:3000/api/ready | jq '.checks.cache'
If the response shows "unavailable" or "disconnected", verify your Redis connection configuration.

Stale Cache Data

If you see outdated data after making changes:
  • Clear the Redis cache by flushing the relevant keys or restarting Redis (a restart only clears data if Redis persistence is disabled)
  • Check that cache invalidation is wired up correctly for the entity you changed
  • As a workaround, restart the application to clear in-memory caches

Redis Connection Errors

Common causes:
  • Redis not running — Start your Redis instance
  • Wrong host/port — Verify REDIS_URL in your .env
  • Max connections exceeded — Check Redis maxclients setting
  • Network/firewall — Ensure the application can reach the Redis host
KaireonAI is designed for graceful degradation. If Redis is unavailable, the platform continues to operate — caching and distributed rate limiting fall back to in-memory alternatives. However, performance and accuracy of rate limits may be reduced.