KaireonAI uses Redis as its caching layer. All cache reads go through the
getCache().getOrFetch() helper, which transparently handles cache misses by
querying PostgreSQL and storing the result with a configurable TTL.
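The cache-aside flow behind `getOrFetch()` can be sketched as follows. This is an illustrative stand-in, not the production helper: an in-memory `Map` replaces Redis, and an async loader callback replaces the PostgreSQL query. The class name `SketchCache` and the `demo` function are hypothetical.

```typescript
// Illustrative cache-aside sketch: Map stands in for Redis, the
// fetch callback stands in for the PostgreSQL query.
type Entry<T> = { value: T; expiresAt: number };

class SketchCache {
  private store = new Map<string, Entry<unknown>>();

  async getOrFetch<T>(key: string, ttlMs: number, fetch: () => Promise<T>): Promise<T> {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value as T; // cache hit
    const value = await fetch();                                   // miss: query the source of truth
    this.store.set(key, { value, expiresAt: Date.now() + ttlMs }); // store with TTL
    return value;
  }
}

async function demo(): Promise<number> {
  const cache = new SketchCache();
  let dbCalls = 0;
  const load = async () => { dbCalls++; return 42; };
  await cache.getOrFetch("offer:1", 60_000, load); // miss -> loader runs
  await cache.getOrFetch("offer:1", 60_000, load); // hit -> loader skipped
  return dbCalls;
}
```

The key property this pattern gives you is that concurrent readers of a warm key never touch the database until the TTL expires.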
Entity caches (offers, rules, policies) use a TTL-based expiration strategy.
After updating an offer or policy via the CRUD API, changes propagate within
the TTL window (up to 5 minutes for offers/policies, 30 seconds for guardrails).

For immediate invalidation, restart the API process or reduce the TTL via
environment configuration.
- `ZREMRANGEBYSCORE` — remove entries outside the current window
- `ZADD` — add the current request with its timestamp as score
- `ZCARD` — count entries remaining in the window
- `EXPIRE` — set key TTL to window duration + 1 second for cleanup
All four operations execute in a single Redis pipeline for atomicity. A 500ms
timeout protects against Redis latency — if Redis does not respond in time,
the limiter falls back to an in-memory sliding window.
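The in-memory fallback can be sketched with a plain timestamp list per key, where the filter, push, and count steps correspond to `ZREMRANGEBYSCORE`, `ZADD`, and `ZCARD` (`EXPIRE` has no analogue; garbage collection handles cleanup). The class name and shape here are illustrative, not the production implementation:

```typescript
// In-memory sliding window: one timestamp array per key.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // key -> request timestamps (ms)

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const kept = (this.hits.get(key) ?? []).filter((t) => t > cutoff); // ~ZREMRANGEBYSCORE
    kept.push(now);                                                    // ~ZADD
    this.hits.set(key, kept);
    return kept.length <= this.limit;                                  // ~ZCARD vs limit
  }
}

const limiter = new SlidingWindowLimiter(2, 1000);
const t = 0;
// two requests fit in the window; the third is denied
const results = [
  limiter.allow("tenant:a", t),
  limiter.allow("tenant:a", t + 10),
  limiter.allow("tenant:a", t + 20),
];
```

Unlike the Redis version, this state is per-process, which is why effective limits multiply by pod count during a Redis outage (see the horizontal-scaling notes below).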
For DDoS protection, layer edge-level rate limiting in front of the API:
nginx limit_req_zone, AWS WAF rate rules, or Cloudflare rate limiting rules.
The application-level limiter handles tenant-scoped business logic limits;
the edge layer handles volumetric protection.
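As one hedged example of an edge-level limit, an nginx zone keyed on client address might look like the following. The zone name, rate, paths, and upstream are placeholders, not values from this project:

```nginx
# Illustrative only: tune rate/burst to your traffic profile.
limit_req_zone $binary_remote_addr zone=kaireon_edge:10m rate=100r/s;

server {
    location /api/ {
        limit_req zone=kaireon_edge burst=50 nodelay;
        proxy_pass http://kaireon_upstream;
    }
}
```

Keyed on `$binary_remote_addr`, this limits per client IP, which complements (rather than replaces) the tenant-scoped application limiter.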
On a model error, the failure counter increments via recordModelFailure().
When failures reach the threshold (5), the circuit opens and sets a
cooldown expiry at now + 60 seconds.
While open, all requests for that model skip the model call entirely and
receive the fallback score (0.5 * weight * fitMultiplier).
After the cooldown expires, the next request acts as a probe — if it
succeeds, recordModelSuccess() resets the counter (circuit closes).
If it fails, the circuit re-opens for another cooldown period.
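The breaker logic above can be sketched as a small per-model state machine. The threshold (5) and cooldown (60 s) come from the text, and the method names mirror `recordModelFailure()` / `recordModelSuccess()`, but the class itself is illustrative:

```typescript
// Per-model circuit breaker: open after N failures, probe after cooldown.
class ModelCircuitBreaker {
  private failures = 0;
  private openUntil = 0; // epoch ms; 0 = closed

  constructor(private threshold = 5, private cooldownMs = 60_000) {}

  recordModelFailure(now = Date.now()): void {
    this.failures++;
    if (this.failures >= this.threshold) {
      this.openUntil = now + this.cooldownMs; // open: skip model, serve fallback score
    }
  }

  recordModelSuccess(): void {
    this.failures = 0;   // probe succeeded: close the circuit
    this.openUntil = 0;
  }

  shouldSkipModel(now = Date.now()): boolean {
    return now < this.openUntil; // once the cooldown passes, the next call is the probe
  }
}
```

Note that a request arriving after the cooldown is not specially marked: it simply reaches the model again, and its success or failure decides whether the circuit closes or re-opens.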
The response includes degradedScoring: true when any model was bypassed.
The `scoringModelFailureTotal` Prometheus counter tracks failures by model key.
Monitor this metric to detect model health issues before they impact all decisions.
The `decisionFlowExecutionLatency` histogram tracks end-to-end pipeline latency.
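A hedged example of an alerting rule on the failure counter is shown below. The group name, thresholds, and the `model` label are assumptions for illustration (the text only says failures are tracked "by model key"):

```yaml
# Illustrative Prometheus alert; tune expr/for to your SLOs.
groups:
  - name: kaireon-scoring
    rules:
      - alert: ScoringModelFailures
        expr: rate(scoringModelFailureTotal[5m]) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Model {{ $labels.model }} is failing; fallback scores in use"
```

Alerting on the rate rather than the raw counter surfaces sustained failures without firing on a single transient error.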
Mandatory offer daily caps use Redis INCR for race-free atomic counting:
- `INCR kaireon:cap:mandatory:{customerId}:{YYYY-MM-DD}` — single atomic
  read-and-increment
- On first increment (`current === 1`), set `EXPIRE` to the end of the current UTC day
- If `current > cap`, the offer is blocked
- Default cap: 5 mandatory offers per customer per day (configurable via the
  `MAX_MANDATORY_OFFERS_PER_DAY` env var)
When Redis is unavailable, falls back to a Prisma-based count query against
interactionSummary records. This fallback has a small race window under
concurrent requests but is acceptable for resilience.The cap check fails closed — if both Redis and the database count fail,
mandatory offers are blocked entirely (safety-first design).
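The `INCR` + `EXPIRE` cap check can be sketched as below. A tiny in-memory stub stands in for Redis; the key format and default cap (5) come from the text, while `CounterStore`, `MemoryStore`, and `allowMandatoryOffer` are hypothetical names:

```typescript
// Daily cap sketch: atomic increment, TTL to end of UTC day on first hit.
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<void>;
}

class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();
  async incr(key: string): Promise<number> {
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return n;
  }
  async expire(_key: string, _seconds: number): Promise<void> {
    /* no-op in the stub; real Redis expires the key */
  }
}

async function allowMandatoryOffer(store: CounterStore, customerId: string, cap = 5): Promise<boolean> {
  const day = new Date().toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const key = `kaireon:cap:mandatory:${customerId}:${day}`;
  const current = await store.incr(key); // atomic read-and-increment
  if (current === 1) {
    const now = new Date();
    const endOfDay = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
    await store.expire(key, Math.ceil((endOfDay - now.getTime()) / 1000)); // key dies at UTC midnight
  }
  return current <= cap; // over the cap -> offer blocked
}
```

Because the increment happens before the comparison, two concurrent requests can never both observe `current === cap` and slip past the limit — which is exactly the race the Prisma-count fallback cannot rule out.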
The database connection is configured in prisma.config.ts via the DATABASE_URL
environment variable. Pool sizing is controlled through connection string parameters:
```shell
# Example with pool sizing parameters
DATABASE_URL="postgresql://user:pass@host:5432/kaireon?connection_limit=20&pool_timeout=10"
```
Rule of thumb: Total connections across all replicas should not exceed your
database’s max_connections minus a buffer for admin/monitoring connections.
The KaireonAI API is stateless — all shared state lives in Redis and
PostgreSQL. This means you can scale API instances horizontally with no
coordination overhead.
- **No session affinity required:** Any API pod can handle any request.
  Rate limit state and caching are in Redis; all persistent state is in PostgreSQL.
- **In-memory circuit breakers are per-process:** Each pod tracks its own model
  failure counts. This is intentional — a model failure on one pod does not
  cascade to others, and each pod independently probes recovery.
- **In-memory rate limit fallback is per-process:** When Redis is down, each pod
  maintains its own rate limit counters. Effective limits become
  `configured_limit × num_pods` during Redis outages.
- **Scale API pods independently from worker pods:** Decision API pods handle
  synchronous request/response. Data pipeline worker pods handle asynchronous
  ETL. Size each tier based on its workload.
Streaming mode is a placeholder and not yet implemented. The execution
mode field accepts streaming for forward-compatibility, but the platform
does not currently spawn a long-lived consumer process. Kafka, Confluent,
and (when shipped) Amazon Kinesis connectors run as batch polling —
each pipeline run opens a consumer, reads up to maxMessages records,
commits offsets, and closes. Schedule those pipelines on a cron cadence
that matches your freshness target until a persistent worker is available.
The config shape below is reserved for the future streaming runtime — today
it has no effect beyond being persisted on the pipeline record: