Troubleshooting - KaireonAI

Where to Find Logs

Before debugging any issue, know where to look.

Local Development

When running npm run dev, all server logs appear in your terminal’s stdout/stderr. Next.js prints compilation errors, API route logs, and unhandled exceptions directly to the console.

Docker

# Follow logs in real time
docker logs -f kaireon-api

# Show last 200 lines
docker logs --tail 200 kaireon-api

# Filter for errors
docker logs kaireon-api 2>&1 | grep -i error

Production (Structured Logging)

In production, KaireonAI outputs structured JSON logs via Winston. Control the verbosity with the LOG_LEVEL environment variable:

# Options: error, warn, info, http, verbose, debug, silly
LOG_LEVEL=info   # default
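
The level names above are Winston's default npm levels, where a lower number means higher priority and a message is written only if its priority is at least that of the configured level. The sketch below illustrates that rule; `isEmitted` is a hypothetical helper for illustration, not part of Winston or KaireonAI.

```typescript
// Winston's default npm levels: a lower number means higher priority.
const NPM_LEVELS = {
  error: 0,
  warn: 1,
  info: 2,
  http: 3,
  verbose: 4,
  debug: 5,
  silly: 6,
} as const;

type LogLevel = keyof typeof NPM_LEVELS;

// A message is emitted when its priority is at or above the configured level.
function isEmitted(messageLevel: LogLevel, configuredLevel: LogLevel): boolean {
  return NPM_LEVELS[messageLevel] <= NPM_LEVELS[configuredLevel];
}

console.log(isEmitted("debug", "info")); // false: debug is below info
console.log(isEmitted("warn", "info")); // true: warn is above info
```

With the default of `info`, this means `http`, `verbose`, `debug`, and `silly` messages are dropped.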

Example log line:

{
  "level": "info",
  "message": "Recommend API completed",
  "recommendationId": "abc-123-def",
  "durationMs": 142,
  "offersReturned": 3,
  "timestamp": "2026-03-30T10:15:22.000Z"
}

Correlating Logs with API Responses

Every API error response includes a recommendationId field. Use it to find the matching server-side log entry:

{
  "error": "Internal Server Error",
  "recommendationId": "abc-123-def"
}

# Search Docker logs by recommendationId
docker logs kaireon-api 2>&1 | grep "abc-123-def"

Every API response also includes an x-request-id header. You can use either the response body recommendationId or the header value to correlate client requests with server logs.
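
If you have exported logs to a file, the same correlation can be done programmatically. The following is a minimal sketch that assumes one JSON object per line; `LogEntry` and `findLogEntries` are hypothetical names, and only the fields shown in the example log line above are typed.

```typescript
// Hypothetical shape of one structured log line; only fields used here are typed.
interface LogEntry {
  level: string;
  message: string;
  recommendationId?: string;
  timestamp?: string;
}

// Parse newline-delimited JSON logs and keep entries matching the given ID.
// Lines that are not valid JSON (for example, stack traces) are skipped.
function findLogEntries(rawLogs: string, recommendationId: string): LogEntry[] {
  const matches: LogEntry[] = [];
  for (const line of rawLogs.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed = JSON.parse(trimmed) as LogEntry;
      const isObject = parsed !== null && typeof parsed === "object";
      if (isObject && parsed.recommendationId === recommendationId) {
        matches.push(parsed);
      }
    } catch {
      // Not JSON; ignore this line.
    }
  }
  return matches;
}
```

Compared with `grep`, this keeps each entry as a parsed object, so you can sort by `timestamp` or filter by `level` afterwards.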

Startup Issues

These errors typically occur when starting the development server or deploying for the first time.

The datasource property 'url' is no longer supported in schema files

Cause: KaireonAI uses Prisma 7, which moved the datasource URL out of schema.prisma and into prisma.config.ts.

Fix: Open prisma/schema.prisma and ensure the datasource block contains only the provider, with no url line:

datasource db {
  provider = "postgresql"
}

The connection URL is defined in prisma.config.ts and reads from the DATABASE_URL environment variable.

Cannot find module '@generated/prisma'

Cause: The Prisma client has not been generated yet. The generated client lives in platform/generated/prisma/ and is gitignored, so it must be created locally.

Fix:

npx prisma generate

This runs automatically on npm install via the postinstall script, but you may need to run it manually after schema changes.

P1001: Can't reach database server

Cause: The application cannot connect to PostgreSQL.

Fix checklist:

Verify DATABASE_URL is set in your .env file
Ensure PostgreSQL is running (pg_isready or psql to test)
Confirm the hostname, port, database name, and credentials in the URL are correct
If using Docker, make sure the container is up and the port is mapped

# Test connectivity directly
psql "$DATABASE_URL" -c "SELECT 1"
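
To check the hostname, port, database name, and user without reading the raw string, you can split the URL with the standard URL parser. This is a sketch under the assumption that DATABASE_URL is a conventional `postgresql://user:password@host:port/database` URL; `describeDatabaseUrl` and the example URL are made up for illustration.

```typescript
// Parts of a PostgreSQL connection URL that are worth checking by eye.
interface DbUrlParts {
  host: string;
  port: string;
  database: string;
  user: string;
}

// Split a connection URL into its parts using the standard URL parser.
// Throws a TypeError if the string is not a valid URL at all.
function describeDatabaseUrl(databaseUrl: string): DbUrlParts {
  const url = new URL(databaseUrl);
  return {
    host: url.hostname,
    port: url.port || "5432", // PostgreSQL's default port when none is given
    database: url.pathname.replace(/^\//, ""),
    user: decodeURIComponent(url.username),
  };
}

// Example with a made-up URL; the password is deliberately not returned.
console.log(describeDatabaseUrl("postgresql://kaireon:secret@localhost:5432/kaireon_dev"));
```

If the printed host or port is not what you expect, the problem is in the URL rather than in PostgreSQL itself.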

NEXTAUTH_SECRET is not set

Cause: The authentication system requires a secret key for signing session tokens.

Fix: Add NEXTAUTH_SECRET to your .env file:

# Generate a random secret
openssl rand -base64 32

NEXTAUTH_SECRET=your-generated-secret-here

Port 3000 already in use

Cause: Another process is already listening on port 3000.

Fix: Either kill the existing process or start on a different port:

# Find and kill the process on port 3000
lsof -ti:3000 | xargs kill -9

# Or start on a different port
PORT=3001 npm run dev

AI Chat not working — 'LLM provider not configured'

Cause: The AI assistant requires an LLM provider to be configured before it can respond.

Fix: Go to Settings > AI Configuration and configure your preferred provider:

Google Gemini — Free tier available, good starting point
OpenAI — GPT-4o and GPT-4o-mini
Anthropic — Claude models
Ollama — Self-hosted open-source models (no API key needed)

Enter your API key, select a model, and click Save. The AI chat will work immediately after configuration.

Docker build fails with SIGKILL (exit code 137)

Cause: The Docker build process ran out of memory. Next.js builds and Prisma generation are memory-intensive.

Fix: Increase Docker Desktop memory allocation to at least 8 GB:

Open Docker Desktop > Settings > Resources
Set Memory to 8.00 GB (or higher)
Click Apply & Restart
Retry the build:

docker build -t kaireon-api .

If you are on a CI server, ensure the build runner has at least 8 GB of RAM available.

CSRF errors on API calls (403 Forbidden)

Cause: KaireonAI enforces CSRF protection on state-changing requests. Requests from API clients (not the browser UI) must include a specific header.

Fix: Add the X-Requested-With header to all POST, PUT, PATCH, and DELETE requests:

curl -X POST http://localhost:3000/api/v1/offers \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -H "X-Tenant-Id: your-tenant-id" \
  -d '{"name": "Summer Sale", ...}'

This header signals that the request is intentional and not a cross-site forgery attempt.
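
For a scripted client, it can help to build these headers in one place. The sketch below mirrors the headers from the curl example; `buildApiHeaders` and `needsCsrfHeader` are hypothetical helper names, not part of any KaireonAI SDK.

```typescript
// Build the headers shown in the curl example for a given tenant.
function buildApiHeaders(tenantId: string): Record<string, string> {
  return {
    "Content-Type": "application/json",
    "X-Requested-With": "XMLHttpRequest", // satisfies the CSRF check
    "X-Tenant-Id": tenantId,
  };
}

// HTTP methods that this page says require the X-Requested-With header.
const STATE_CHANGING_METHODS = ["POST", "PUT", "PATCH", "DELETE"];

function needsCsrfHeader(method: string): boolean {
  return STATE_CHANGING_METHODS.indexOf(method.toUpperCase()) !== -1;
}

console.log(buildApiHeaders("your-tenant-id"));
```

The returned object can be passed as the `headers` option of `fetch` or any other HTTP client.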

Starbucks sample dataset load fails

Cause: The dataset loader creates multiple tables and inserts several thousand rows. Failures are typically caused by database connectivity issues or insufficient disk space.

Fix checklist:

Verify the database connection is working: psql "$DATABASE_URL" -c "SELECT 1"
Check available disk space on the database server (the dataset requires ~50 MB)
Ensure the database user has CREATE TABLE and INSERT privileges
If the load partially completed, try again — the loader uses upserts and is safe to re-run

# Check disk space (Linux/macOS)
df -h

Recommend API returns 0 decisions

See the No Offers Returned checklist under Decision Flow Issues.

API Errors

All KaireonAI API routes return standard HTTP status codes. Here is a reference for the most common error responses.

Status	Meaning	Common Cause	Fix
401	Unauthorized	Missing `X-Tenant-Id` header or invalid/expired session	Include the `X-Tenant-Id` header in every API request. Re-authenticate if session expired.
403	Forbidden	User role lacks permission (admin/editor/viewer)	Check the user’s role. Write operations require `admin` or `editor`.
404	Not Found	Resource does not exist, or belongs to a different tenant	Verify the resource ID and that the `X-Tenant-Id` matches the owning tenant.
409	Conflict	Duplicate key violation, or `rowVersion` mismatch (optimistic locking)	Re-fetch the resource to get the latest `rowVersion`, then retry the update.
429	Too Many Requests	Rate limit exceeded	Wait for the duration in the `Retry-After` header before retrying. See Rate Limiting below.
500	Internal Server Error	Unhandled exception on the server	Check server logs. If reproducible, file a bug with the request payload.
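
The 409 fix (re-fetch, then retry) can be sketched as a small loop. This example uses an in-memory stand-in for the server so it runs without a network; `FakeOfferStore`, `renameWithRetry`, and the assumption that `rowVersion` increments by one on each update are illustrative, not taken from the KaireonAI API.

```typescript
// Hypothetical resource shape; only rowVersion comes from this page.
interface VersionedOffer {
  id: string;
  name: string;
  rowVersion: number;
}

// In-memory stand-in for the server, so the sketch runs without a network.
class FakeOfferStore {
  private current: VersionedOffer = { id: "offer-1", name: "Summer Sale", rowVersion: 1 };

  fetchOffer(): VersionedOffer {
    return { ...this.current };
  }

  // Returns 409 when the caller's rowVersion is stale, 200 otherwise.
  updateOffer(name: string, rowVersion: number): number {
    if (rowVersion !== this.current.rowVersion) return 409;
    this.current = { ...this.current, name, rowVersion: rowVersion + 1 };
    return 200;
  }
}

// Re-fetch and retry on 409, up to maxAttempts times.
function renameWithRetry(
  store: FakeOfferStore,
  knownVersion: number,
  name: string,
  maxAttempts = 3
): number {
  let version = knownVersion;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = store.updateOffer(name, version);
    if (status !== 409) return status;
    version = store.fetchOffer().rowVersion; // pick up the latest version
  }
  return 409;
}
```

In a real client, the retry should also re-apply your change on top of the re-fetched resource, since another writer may have modified other fields in the meantime.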

Rate Limiting

KaireonAI uses a sliding-window rate limiter. When you receive a 429 response, the Retry-After header tells you how many seconds to wait. The X-RateLimit-Remaining header shows how many requests remain in the current window.
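
A client can turn the Retry-After value into a wait time as follows. This is a minimal sketch: `retryDelayMs` is a hypothetical helper, and the one-second fallback for a missing or malformed header is an assumption, not documented behavior.

```typescript
// Decide how long to wait before retrying, based on the status code and
// the Retry-After header value in seconds.
// Returns null when the response is not a 429 and no wait is needed.
function retryDelayMs(
  status: number,
  retryAfter: string | null,
  fallbackSeconds = 1
): number | null {
  if (status !== 429) return null;
  const hasValue = retryAfter !== null && retryAfter.trim() !== "";
  const seconds = hasValue ? Number(retryAfter) : NaN;
  // Fall back when the header is missing, non-numeric, or negative.
  const safeSeconds = isFinite(seconds) && seconds >= 0 ? seconds : fallbackSeconds;
  return safeSeconds * 1000;
}

console.log(retryDelayMs(429, "30")); // 30000
console.log(retryDelayMs(200, null)); // null
```

With `fetch`, the header value is available as `response.headers.get("Retry-After")`.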

In multi-node deployments, rate limiting uses Redis sorted sets for global accuracy. If Redis is unavailable, the limiter falls back to per-process in-memory tracking — limits may be less precise across nodes.
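
To see why per-process tracking is less precise, consider a minimal in-memory sliding-window limiter. This is an illustrative sketch, not KaireonAI's implementation; the class name and its methods are hypothetical. Each process running its own copy would allow up to `limit` requests, so N processes could together admit up to N times the intended limit.

```typescript
// Minimal in-memory sliding-window limiter for a single process.
// Timestamps are passed in explicitly so the behavior is easy to test.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private hits: { [key: string]: number[] } = {};

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Timestamps for this key that still fall inside the trailing window.
  private recentHits(key: string, nowMs: number): number[] {
    const windowStart = nowMs - this.windowMs;
    return (this.hits[key] || []).filter((t) => t > windowStart);
  }

  // Records the request and returns true if it is allowed, or returns
  // false without recording it if the window is already full.
  tryAcquire(key: string, nowMs: number): boolean {
    const recent = this.recentHits(key, nowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits[key] = recent;
    return allowed;
  }

  // Requests still available in the current window, similar in spirit
  // to what X-RateLimit-Remaining reports.
  remaining(key: string, nowMs: number): number {
    return Math.max(0, this.limit - this.recentHits(key, nowMs).length);
  }
}
```

A Redis-backed limiter keeps the same timestamps in a shared sorted set, which is what makes the count global across nodes.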

Decision Flow Issues

No Offers Returned

If the Recommend API returns an empty list, work through this checklist:

Each item below is a filter in the decision pipeline. Offers must pass all of them to appear in the response.

Offers are active — Check that the offer’s status is active (not draft or paused)
Schedule window — Verify startDate and endDate encompass the current date
Budget remaining — Check the offer has not exhausted its decision or cost budget
Flow inventory — The offer must be assigned to the Decision Flow being evaluated
Qualification rules passing — Review the qualification rules attached to the offer; test with the exact customer attributes being sent
Contact policies not suppressing — Check that contact policy limits (frequency caps, channel fatigue) are not filtering out the offer for this customer
Channel/placement match — The request’s channel and placement must match what the flow is configured for

Enable Decision Traces in tenant settings to get a step-by-step breakdown of why each offer was included or filtered. See Debugging with Decision Traces below.
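
Conceptually, the checklist behaves like a chain of filters in which the first failing check removes the offer. The sketch below models three of the seven checks; the `CandidateOffer` fields, stage names, and `firstFailingStage` are hypothetical and do not reflect the actual pipeline code or data model.

```typescript
// Hypothetical, simplified offer shape; field names are illustrative only.
interface CandidateOffer {
  id: string;
  status: "active" | "draft" | "paused";
  startDate: string; // ISO 8601
  endDate: string; // ISO 8601
  budgetRemaining: number;
}

interface PipelineStage {
  name: string;
  passes: (offer: CandidateOffer, now: Date) => boolean;
}

// Three of the checks from the list above, in order.
const STAGES: PipelineStage[] = [
  { name: "status", passes: (o) => o.status === "active" },
  {
    name: "schedule",
    passes: (o, now) =>
      new Date(o.startDate).getTime() <= now.getTime() &&
      now.getTime() <= new Date(o.endDate).getTime(),
  },
  { name: "budget", passes: (o) => o.budgetRemaining > 0 },
];

// Returns the name of the first stage that rejects the offer,
// or null if the offer survives every stage.
function firstFailingStage(offer: CandidateOffer, now: Date): string | null {
  for (const stage of STAGES) {
    if (!stage.passes(offer, now)) return stage.name;
  }
  return null;
}
```

Working through the checklist in order has the same effect: stop at the first item that fails, fix it, and test again.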

Scoring Returns 0

Model configured? Verify a scoring model is assigned to the Decision Flow and that it has valid weights
Circuit breaker tripped? If the model’s error rate exceeded the threshold, the circuit breaker opens and scoring returns a fallback value. Check the Operations Dashboard for circuit breaker status
Model health — Check the model health dashboard for drift or degraded performance
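
The circuit breaker behavior explains why scores can be stuck at a fallback value even after the model recovers. The sketch below is a simplified illustration: it opens after a run of consecutive failures rather than on an error-rate threshold, omits the half-open state real breakers usually have, and uses hypothetical names and values throughout.

```typescript
// Minimal circuit breaker: opens after a run of consecutive failures and
// returns a fallback score while open.
class ScoringBreaker {
  private consecutiveFailures = 0;
  private readonly failureThreshold: number;
  private readonly fallbackScore: number;

  constructor(failureThreshold: number, fallbackScore: number) {
    this.failureThreshold = failureThreshold;
    this.fallbackScore = fallbackScore;
  }

  isOpen(): boolean {
    return this.consecutiveFailures >= this.failureThreshold;
  }

  // Runs the scorer unless the breaker is open. A failure counts toward
  // opening the breaker; a success resets the failure count.
  score(scorer: () => number): number {
    if (this.isOpen()) return this.fallbackScore;
    try {
      const value = scorer();
      this.consecutiveFailures = 0;
      return value;
    } catch {
      this.consecutiveFailures += 1;
      return this.fallbackScore;
    }
  }

  // Manually close the breaker, for example after the model is fixed.
  reset(): void {
    this.consecutiveFailures = 0;
  }
}
```

Note that while the breaker is open the scorer is never called, so a healthy model still yields the fallback until the breaker closes.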

Wrong Offers Returned

Flow routing — Ensure the channel and placement in the Recommend request match the intended Decision Flow
Qualification rule scopes — Rules scoped to the wrong category or sub-category can inadvertently include/exclude offers
Optimization weights — If offers are returned but in an unexpected order, review the portfolio optimization profile weights (revenue, margin, propensity, engagement)
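
To see how weights can change the order, here is a minimal sketch that ranks offers by a weighted sum of the four objectives named above. The weighted-sum formula, the 0 to 1 normalization, and all type and function names are assumptions for illustration; the platform's actual optimization may combine objectives differently.

```typescript
// The four objective names come from the list above; the rest is illustrative.
interface ObjectiveValues {
  revenue: number;
  margin: number;
  propensity: number;
  engagement: number;
}

interface ScoredOffer {
  id: string;
  values: ObjectiveValues; // assumed to be normalized to the 0..1 range
}

function weightedScore(values: ObjectiveValues, weights: ObjectiveValues): number {
  return (
    values.revenue * weights.revenue +
    values.margin * weights.margin +
    values.propensity * weights.propensity +
    values.engagement * weights.engagement
  );
}

// Highest weighted score first; the input array is left untouched.
function rankOffers(offers: ScoredOffer[], weights: ObjectiveValues): string[] {
  return offers
    .slice()
    .sort((a, b) => weightedScore(b.values, weights) - weightedScore(a.values, weights))
    .map((o) => o.id);
}
```

The same two offers can swap places when the profile shifts weight from revenue to propensity, which is usually the explanation for an unexpected order.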

Health Checks

KaireonAI exposes two health endpoints for monitoring and orchestration.

GET /api/health

Returns the overall system health including database connectivity, Redis status, and circuit breaker states.

curl -s http://localhost:3000/api/health | jq

Response:

{
  "status": "ok",
  "database": "ok",
  "uptime": 3621.45,
  "timestamp": "2026-03-16T14:30:00.000Z"
}

Status	HTTP Code	Meaning
`ok`	200	All systems healthy
`degraded`	200	Database is up, but Redis is down or a circuit breaker is open
`degraded`	503	Database is unreachable

A degraded status with HTTP 200 means the platform can still serve requests with reduced functionality (e.g., no caching, fallback scoring). A 503 means the database is down and the platform cannot process decisions.
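
A monitoring script needs both the HTTP code and the body status to tell these cases apart. The sketch below encodes the table above; `interpretHealth` and the verdict labels are hypothetical names chosen for this example.

```typescript
type HealthVerdict = "healthy" | "reduced-functionality" | "unavailable";

// Map the HTTP code and body status from /api/health to a verdict,
// following the status table above.
function interpretHealth(httpCode: number, bodyStatus: string): HealthVerdict {
  if (httpCode === 200 && bodyStatus === "ok") return "healthy";
  if (httpCode === 200 && bodyStatus === "degraded") return "reduced-functionality";
  return "unavailable"; // 503, or any combination the table does not describe
}

console.log(interpretHealth(200, "degraded")); // reduced-functionality
```

Treating unknown combinations as unavailable is a conservative choice for alerting; relax it if it produces false alarms in your setup.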

GET /api/ready

A readiness probe suitable for Kubernetes or load balancer health checks. Returns 200 only when both database and cache are connected.

curl -s http://localhost:3000/api/ready | jq

Response:

{
  "status": "ready",
  "checks": {
    "database": "connected",
    "cache": "connected"
  },
  "timestamp": "2026-03-16T14:30:00.000Z"
}

Status	HTTP Code	Meaning
`ready`	200	All dependencies healthy
`degraded`	503	Database is disconnected

Use /api/health for monitoring dashboards and alerting. Use /api/ready for Kubernetes readiness probes and load balancer target health checks.

Debugging with Decision Traces

Decision traces provide a forensic log of every step in the decision pipeline — which offers were considered, which filters removed them, and the final scoring/ranking.

Enabling Traces

Go to Settings > Tenant Settings
Enable Decision Tracing
Set a sample rate (e.g., 0.1 for 10% of requests, or 1.0 for all requests during debugging)

Setting the sample rate to 1.0 in production will significantly increase database writes and storage usage. Use a low sample rate (0.01 - 0.1) for production monitoring, and 1.0 only during active debugging.
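
The effect of the sample rate on write volume is easy to estimate. The sketch below assumes each request is sampled independently at random, which this page does not confirm; `shouldTrace` and `expectedTracesPerDay` are hypothetical helpers.

```typescript
// Decide whether to trace one request. `random` is injectable so the
// function can be tested deterministically; it defaults to Math.random.
function shouldTrace(sampleRate: number, random: () => number = Math.random): boolean {
  if (sampleRate <= 0) return false;
  if (sampleRate >= 1) return true;
  return random() < sampleRate;
}

// Rough number of traces written per day at a given request volume.
// The sample rate is clamped to the 0..1 range.
function expectedTracesPerDay(requestsPerDay: number, sampleRate: number): number {
  return Math.round(requestsPerDay * Math.min(1, Math.max(0, sampleRate)));
}

console.log(expectedTracesPerDay(1000000, 0.01)); // 10000
console.log(expectedTracesPerDay(1000000, 1.0)); // 1000000
```

At one million requests per day, moving from 0.01 to 1.0 multiplies trace writes by 100, which is why the full rate is best kept to short debugging sessions.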

Reading a Trace

Each trace contains:

Request context — channel, placement, customer ID, attributes sent
Candidate set — all offers that entered the pipeline
Filter stages — which offers were removed at each stage (qualification, contact policy, budget, schedule) and why
Scoring — the raw and weighted scores for each surviving offer
Final ranking — the ordered list returned to the caller

Finding Why an Offer Was Filtered

Go to Studio > Decision Flows and open the flow
Click Recent Traces to see the latest decision trace results
Search by customer ID or request ID
Expand the trace and look at each filter stage — the filtered offer will show the stage name and reason (e.g., qualification_rule: min_balance >= 1000 failed)

Database Issues

Schema Push Fails

npx prisma db push

If this fails:

Connection error — Verify DATABASE_URL in .env (see Startup Issues)
Schema conflict — If you changed a column type on an existing table with data, Prisma may refuse. Use npx prisma db push --accept-data-loss only if you are okay losing data in that column
Permission denied — Ensure the database user has DDL privileges (CREATE TABLE, ALTER TABLE)

Migration Status Check

npx prisma migrate status

This shows pending and applied migrations. If migrations are out of sync, you may need to run:

npx prisma migrate deploy    # Apply pending migrations (production)
npx prisma migrate dev       # Create + apply migrations (development)

Connection Pool Exhaustion

Symptoms: Requests hang or time out, and logs show P2024: Timed out fetching a new connection from the connection pool.

The default pool is configured with a maximum of 50 connections and a 2-second connection timeout.

Fix:

Check for long-running queries or uncommitted transactions
Increase the pool size via the PG_POOL_MAX environment variable if your database supports more connections
Ensure the application is using the Prisma singleton (not creating new clients per request)

PG_POOL_MAX=100
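
The singleton point matters because every client instance owns its own pool, so creating a client per request multiplies the number of open connections. The sketch below shows the common pattern of caching one instance on `globalThis`; `FakeDbClient` is a stand-in for the real Prisma client so the example runs on its own, and whether KaireonAI uses exactly this pattern is an assumption.

```typescript
// Stand-in for a database client; each real instance would own its own pool.
class FakeDbClient {
  static instancesCreated = 0;

  constructor() {
    FakeDbClient.instancesCreated += 1;
  }
}

// Cache the client on globalThis so hot reloads and repeated imports
// reuse one instance instead of opening a new pool each time.
const globalCache = globalThis as unknown as { __dbClient?: FakeDbClient };

function getDbClient(): FakeDbClient {
  if (!globalCache.__dbClient) {
    globalCache.__dbClient = new FakeDbClient();
  }
  return globalCache.__dbClient;
}

// Both calls return the same instance, so only one pool would exist.
console.log(getDbClient() === getDbClient()); // true
```

If you find `new PrismaClient()` inside a request handler, moving it behind an accessor like this is usually the fix.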

Redis Issues

Rate Limiter Not Working as Expected

If rate limits are inconsistent across nodes, Redis may not be connected. The rate limiter falls back to in-memory mode when Redis is unavailable, which means each process tracks limits independently. Check Redis connectivity:

curl -s http://localhost:3000/api/ready | jq '.checks.cache'

If the response shows "unavailable" or "disconnected", verify your Redis connection configuration.

Stale Cache Data

If you see outdated data after making changes:

Clear the Redis cache by restarting Redis or flushing the relevant keys
Check that cache invalidation is wired up correctly for the entity you changed
As a workaround, restart the application to clear in-memory caches

Redis Connection Errors

Common causes:

Redis not running — Start your Redis instance
Wrong host/port — Verify REDIS_URL in your .env
Max connections exceeded — Check Redis maxclients setting
Network/firewall — Ensure the application can reach the Redis host

KaireonAI is designed for graceful degradation. If Redis is unavailable, the platform continues to operate — caching and distributed rate limiting fall back to in-memory alternatives. However, performance and accuracy of rate limits may be reduced.

Debug Mode

When standard logs are not enough, enable debug mode for verbose output across all subsystems.

Verbose Logging

Set LOG_LEVEL=debug to see detailed internal operations — query execution, cache hits/misses, scoring calculations, and middleware processing:

LOG_LEVEL=debug

# Docker
docker run -e LOG_LEVEL=debug kaireon-api

# Local development
LOG_LEVEL=debug npm run dev

Debug logging produces a large volume of output. Do not leave it enabled in production — it will fill your log storage quickly and may impact performance.

Decision Traces

For deep visibility into why the Recommend API returned specific results, enable decision tracing:

Go to Settings > General > Decision Tracing
Toggle tracing on and set the sample rate to 1.0 (100%) for debugging
Make a Recommend API call
Go to Studio > Decision Flows > Recent Traces to inspect the step-by-step pipeline execution

See Debugging with Decision Traces for details on reading trace output.

API Request IDs

Every API response includes an x-request-id header. Use this to correlate a specific client request with its server-side processing:

curl -v http://localhost:3000/api/v1/recommend \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -H "X-Tenant-Id: your-tenant-id" \
  -d '{"customerId": "C001", "channel": "web"}' 2>&1 | grep -i x-request-id

# Output: < x-request-id: req_a1b2c3d4e5

Search your logs for this ID to find every log line related to that request.

Getting Help

If the troubleshooting steps above do not resolve your issue:

GitHub Issues

Search existing issues or file a new one. Include your error message, recommendationId, and steps to reproduce.

Documentation

Full platform documentation including API reference, deployment guides, and tutorials.

When filing an issue, include:

Error message — the full error text, not a summary
recommendationId / x-request-id — from the API response
Steps to reproduce — what you did before the error occurred
Environment — Docker or local dev, OS, Node.js version, PostgreSQL version
Relevant logs — the surrounding log lines (with LOG_LEVEL=debug if possible)

Related

Operations Dashboard

Monitor pipeline metrics, DLQ, and circuit breakers

Decision Traces

Forensic tracing of decision pipeline execution

Architecture Overview

System architecture and scaling guidance
