
Request Lifecycle

Every Recommend API request passes through a series of stages. Understanding where caching and rate limiting apply helps you tune for your workload.

Caching Strategy

KaireonAI uses Redis as its caching layer. All cache reads go through the getCache().getOrFetch() helper, which transparently handles cache misses by querying PostgreSQL and storing the result with a configurable TTL.
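The read-through pattern behind getOrFetch() can be sketched as follows. This is a minimal in-memory illustration, not the actual implementation: the Map stands in for Redis, and the fetcher callback stands in for the PostgreSQL query.

```typescript
// Illustrative read-through cache in the shape of getOrFetch().
type Entry = { value: unknown; expiresAt: number };

class ReadThroughCache {
  private store = new Map<string, Entry>();

  async getOrFetch<T>(key: string, ttlSeconds: number, fetcher: () => Promise<T>): Promise<T> {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > Date.now()) {
      return hit.value as T; // cache hit: no database round trip
    }
    const value = await fetcher(); // cache miss: query the source of truth
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
    return value;
  }
}
```

A caller would look like `cache.getOrFetch("t:" + tenantId + ":offers:active", 300, loadActiveOffers)`: the first call populates the cache, subsequent calls within the TTL skip the database entirely.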

What gets cached

| Data | Cache Key Pattern | Default TTL | Notes |
| --- | --- | --- | --- |
| Active offers | t:{tenantId}:offers:active | 300s (5 min) | Includes creatives, categories, placements |
| Qualification rules | t:{tenantId}:policies:eligibility | 300s (5 min) | Active rules ordered by priority |
| Contact policies | t:{tenantId}:policies:contactPolicy | 300s (5 min) | Active policies ordered by priority |
| Guardrail rules | t:{tenantId}:guardrails:active | 30s | Shorter TTL for faster policy iteration |
| Enrichment data | enrich:{tenantId}:{customerId}:{schemaId} | Configurable per source (default 60s) | Per-customer, per-schema-source |
| Rate limit counters | ratelimit:{route}:{tenantId}:{identifier} | Window duration + 1s | Redis sorted sets |
| Cap counters | kaireon:cap:{key} | End of current UTC day | Auto-expire at midnight UTC |

Tuning TTLs

Enrichment sources support per-source TTL configuration in the Decision Flow’s enrichment stage:
{
  "enrichment": {
    "sources": [
      {
        "schemaId": "customer_profile",
        "fields": ["loan_amount", "credit_score"],
        "cacheTtlSeconds": 300,
        "prefix": "customer"
      },
      {
        "schemaId": "real_time_signals",
        "fields": ["last_page_viewed"],
        "cacheTtlSeconds": 10,
        "prefix": "signals"
      }
    ]
  }
}
Set cacheTtlSeconds lower for volatile data (real-time signals, session context) and higher for stable data (customer demographics, account details).

Cache Invalidation

Entity caches (offers, rules, policies) use a TTL-based expiration strategy. After updating an offer or policy via the CRUD API, changes propagate within the TTL window (up to 5 minutes for offers/policies, 30 seconds for guardrails). For immediate invalidation, restart the API process or reduce the TTL via environment configuration.

Rate Limiting

The Recommend API enforces per-tenant, per-endpoint rate limiting using a Redis sorted-set sliding window algorithm.

Algorithm

  1. ZREMRANGEBYSCORE — remove entries outside the current window
  2. ZADD — add the current request with its timestamp as score
  3. ZCARD — count entries remaining in the window
  4. EXPIRE — set key TTL to window duration + 1 second for cleanup
All four operations execute in a single Redis pipeline for atomicity. A 500ms timeout protects against Redis latency — if Redis does not respond in time, the limiter falls back to an in-memory sliding window.
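The four commands above can be illustrated with an in-memory equivalent of the same sliding-window logic (roughly the shape an in-memory fallback takes). This is a sketch; the class and method names are illustrative, not from the actual codebase.

```typescript
// In-memory equivalent of the Redis sorted-set sliding window.
// Each key holds an array of request timestamps (the sorted-set members).
class SlidingWindowLimiter {
  private windows = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // ZREMRANGEBYSCORE: drop timestamps outside the current window
    const recent = (this.windows.get(key) ?? []).filter((t) => t > cutoff);
    // ZADD: record this request with its timestamp
    recent.push(now);
    this.windows.set(key, recent);
    // ZCARD: count entries; the request is allowed if within the limit
    return recent.length <= this.limit;
    // (EXPIRE has no in-memory analogue; stale keys are trimmed on access)
  }
}
```

Because old entries are pruned by timestamp on every check, the window slides continuously rather than resetting at fixed boundaries, which avoids the burst-at-window-edge problem of fixed-window counters.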

Tier Configuration

Rate limits are tenant-scoped with three built-in tiers:
| Tier | Requests per Minute |
| --- | --- |
| free | 100 |
| standard | 1,000 |
| enterprise | 10,000 |
Configure per-tenant tiers via environment variables:
# Global default tier
RATE_LIMIT_TIER=standard

# Per-tenant override (tenant ID uppercased, hyphens to underscores)
RATE_LIMIT_TIER_ACME_CORP=enterprise
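The override lookup described above (tenant ID uppercased, hyphens replaced with underscores) can be sketched as follows; the helper names are illustrative, not the actual implementation.

```typescript
// Derive the per-tenant override key per the stated convention.
function tierEnvKey(tenantId: string): string {
  return `RATE_LIMIT_TIER_${tenantId.toUpperCase().replace(/-/g, "_")}`;
}

// Resolve a tenant's tier: per-tenant override first, then the global default.
function resolveTier(tenantId: string, env: Record<string, string | undefined>): string {
  return env[tierEnvKey(tenantId)] ?? env.RATE_LIMIT_TIER ?? "standard";
}
```

With the example above, `resolveTier("acme-corp", process.env)` reads RATE_LIMIT_TIER_ACME_CORP and yields "enterprise", while tenants without an override fall back to RATE_LIMIT_TIER.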

Response Headers

When a request is rate-limited, the API returns HTTP 429 with:
| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests remaining (0 when limited) |
| Retry-After | Seconds until the client should retry |

Fail-Open vs Fail-Closed

The rate limiter supports two failure modes when Redis is unavailable:
  • Fail-open (default): Falls back to in-memory rate limiting. Use for standard API endpoints where availability matters more than strict enforcement.
  • Fail-closed: Returns 429 when Redis is down. The Recommend API uses failOpen: false to prevent abuse when rate limit state is unavailable.
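The two modes can be sketched as a wrapper around the Redis check; the function names and exact fallback shape here are illustrative.

```typescript
// On a Redis error, either fall back to a local check (fail-open)
// or reject the request outright (fail-closed).
async function checkLimit(
  redisCheck: () => Promise<boolean>,
  localCheck: () => boolean,
  failOpen: boolean,
): Promise<boolean> {
  try {
    return await redisCheck();
  } catch {
    // Redis unavailable: behavior depends on the configured mode
    return failOpen ? localCheck() : false; // false → caller returns HTTP 429
  }
}
```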

Edge-Layer Rate Limiting

For DDoS protection, layer edge-level rate limiting in front of the API: nginx limit_req_zone, AWS WAF rate rules, or Cloudflare rate limiting rules. The application-level limiter handles tenant-scoped business logic limits; the edge layer handles volumetric protection.

Circuit Breakers

The Decision Flow engine includes a per-model circuit breaker to prevent cascading failures when a scoring model is unhealthy.

Parameters

| Parameter | Value | Source |
| --- | --- | --- |
| Failure threshold | 5 consecutive failures | MODEL_CB_THRESHOLD |
| Cooldown period | 60 seconds | MODEL_CB_COOLDOWN_MS |
| Fallback score | 0.5 (configurable) | SCORING_FALLBACK_SCORE env var |
| Fallback method | priority_weighted scoring | Uses offer priority and creative weight |

Behavior

  1. Each scoring model is tracked by its modelKey.
  2. On a model error, the failure counter increments via recordModelFailure().
  3. When failures reach the threshold (5), the circuit opens and sets a cooldown expiry at now + 60 seconds.
  4. While open, all requests for that model skip the model call entirely and receive the fallback score (0.5 * weight * fitMultiplier).
  5. After the cooldown expires, the next request acts as a probe — if it succeeds, recordModelSuccess() resets the counter (circuit closes). If it fails, the circuit re-opens for another cooldown period.
  6. The response includes degradedScoring: true when any model was bypassed.
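A minimal sketch of this state machine, using the recordModelFailure()/recordModelSuccess() names from the steps above (the internals are illustrative, not the actual implementation):

```typescript
// Per-model circuit breaker following the steps above.
class ModelCircuitBreaker {
  private failures = new Map<string, number>();
  private openUntil = new Map<string, number>();

  constructor(private threshold = 5, private cooldownMs = 60_000) {}

  isOpen(modelKey: string, now = Date.now()): boolean {
    return (this.openUntil.get(modelKey) ?? 0) > now;
  }

  recordModelFailure(modelKey: string, now = Date.now()): void {
    const count = (this.failures.get(modelKey) ?? 0) + 1;
    this.failures.set(modelKey, count);
    if (count >= this.threshold) {
      this.openUntil.set(modelKey, now + this.cooldownMs); // open the circuit
    }
  }

  recordModelSuccess(modelKey: string): void {
    this.failures.delete(modelKey); // probe succeeded: close the circuit
    this.openUntil.delete(modelKey);
  }
}
```

Note the probe behavior falls out naturally: once the cooldown expires, isOpen() returns false and the next request reaches the model; a failure re-opens the circuit (the counter is already at the threshold), while a success resets everything.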

Monitoring

The scoringModelFailureTotal Prometheus counter tracks failures by model key. Monitor this metric to detect model health issues before they impact all decisions. The decisionFlowExecutionLatency histogram tracks end-to-end pipeline latency.

Atomic Cap Checking

Mandatory offer daily caps use Redis INCR for race-free atomic counting:
  1. INCR kaireon:cap:mandatory:{customerId}:{YYYY-MM-DD} — single atomic read-and-increment
  2. On first increment (current === 1), set EXPIRE to end of current UTC day
  3. If current > cap, the offer is blocked
  4. Default cap: 5 mandatory offers per customer per day (configurable via MAX_MANDATORY_OFFERS_PER_DAY env var)
When Redis is unavailable, the check falls back to a Prisma-based count query against interactionSummary records. This fallback has a small race window under concurrent requests, which is an acceptable trade-off for resilience. The cap check fails closed: if both Redis and the database count fail, mandatory offers are blocked entirely (safety-first design).
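The Redis path can be sketched in-memory as follows; the Map stands in for Redis INCR, and only the happy path is shown (no EXPIRE, no database fallback):

```typescript
// Sketch of the atomic daily cap check. The key shape and default cap
// follow the text above; the class and method names are illustrative.
class DailyCapChecker {
  private counters = new Map<string, number>();

  constructor(private cap = 5) {} // MAX_MANDATORY_OFFERS_PER_DAY default

  // Returns true if the offer is allowed under the cap.
  checkAndIncrement(customerId: string, date: string): boolean {
    const key = `kaireon:cap:mandatory:${customerId}:${date}`;
    const current = (this.counters.get(key) ?? 0) + 1; // INCR
    this.counters.set(key, current);
    // (in Redis, EXPIRE to end-of-day would be set when current === 1)
    return current <= this.cap;
  }
}
```

Because INCR both reads and writes in one atomic step, two concurrent requests can never both observe the same pre-increment value, which is what closes the race the database fallback still has.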

Connection Pooling

KaireonAI uses Prisma 7 with the @prisma/adapter-pg driver adapter. Connection pooling is handled by the underlying pg Pool.

Configuration

The database connection is configured in prisma.config.ts via the DATABASE_URL environment variable. Pool sizing is controlled through connection string parameters:
# Example with pool sizing parameters
DATABASE_URL="postgresql://user:pass@host:5432/kaireon?connection_limit=20&pool_timeout=10"

Sizing Guidance

| Deployment Size | Suggested Pool Size | Notes |
| --- | --- | --- |
| Single instance | 5-10 | Default pg Pool settings are sufficient |
| 2-5 replicas | 10-15 per replica | Total connections = replicas × pool size |
| 10+ replicas | 5-10 per replica | Use PgBouncer or RDS Proxy to multiplex |
Rule of thumb: Total connections across all replicas should not exceed your database’s max_connections minus a buffer for admin/monitoring connections.
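As a worked example of that rule of thumb, a small helper (illustrative, not part of the product):

```typescript
// Check that total app connections leave headroom under max_connections.
function poolSizingOk(
  replicas: number,
  poolPerReplica: number,
  maxConnections: number,
  adminBuffer = 10, // connections reserved for admin/monitoring
): boolean {
  return replicas * poolPerReplica + adminBuffer <= maxConnections;
}
```

For example, 4 replicas × 15 connections = 60, plus a 10-connection buffer, fits under the common PostgreSQL default of max_connections = 100; 10 replicas × 10 connections would not.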

Horizontal Scaling

The KaireonAI API is stateless — all shared state lives in Redis and PostgreSQL. This means you can scale API instances horizontally with no coordination overhead.

Architecture

                    ┌─────────────┐
                    │   Load      │
                    │  Balancer   │
                    └──────┬──────┘

              ┌────────────┼────────────┐
              │            │            │
        ┌─────┴─────┐ ┌───┴─────┐ ┌───┴─────┐
        │  API Pod   │ │ API Pod │ │ API Pod │
        │  (Node.js) │ │         │ │         │
        └─────┬──────┘ └───┬─────┘ └───┬─────┘
              │            │            │
              └────────────┼────────────┘

              ┌────────────┼────────────┐
              │                         │
        ┌─────┴─────┐           ┌──────┴──────┐
        │   Redis   │           │ PostgreSQL  │
        │ (shared)  │           │  (shared)   │
        └───────────┘           └─────────────┘

Key Properties

  • No session affinity required: Any API pod can handle any request. Rate limit state and caching are in Redis; all persistent state is in PostgreSQL.
  • In-memory circuit breakers are per-process: Each pod tracks its own model failure counts. This is intentional — a model failure on one pod does not cascade to others, and each pod independently probes recovery.
  • In-memory rate limit fallback is per-process: When Redis is down, each pod maintains its own rate limit counters. Effective limits become configured_limit x num_pods during Redis outages.
  • Scale API pods independently from worker pods: Decision API pods handle synchronous request/response. Data pipeline worker pods handle asynchronous ETL. Size each tier based on its workload.

Batch vs Streaming Pipelines

Data pipelines support two execution modes, configured per-pipeline in the executionConfig JSON field:

Batch Mode

For scheduled or on-demand data loads:
{
  "mode": "batch",
  "batchSize": 1000,
  "parallelism": 4,
  "partitioning": {
    "strategy": "hash",
    "key": "customer_id",
    "partitions": 8
  }
}
| Parameter | Description | Default |
| --- | --- | --- |
| batchSize | Records per processing chunk | 1000 |
| parallelism | Concurrent processing threads | 1 |
| partitioning.strategy | How to split data (hash, range, round_robin) | None |
| partitioning.key | Field to partition on | |
| partitioning.partitions | Number of partitions | |
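For the hash strategy, here is a sketch of how records could be routed to partitions (the actual hash function is not specified in this document; this one is illustrative):

```typescript
// Illustrative hash partitioner: records with the same key
// always land in the same partition.
function hashPartition(key: string, partitions: number): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0; // simple 32-bit rolling hash
  }
  return h % partitions;
}
```

With `"key": "customer_id"` and 8 partitions as in the config above, each record would be routed via something like `hashPartition(record.customer_id, 8)`, so all of a customer's records are processed by the same worker.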

Streaming Mode

For real-time event processing from Kafka, Kinesis, or Confluent:
{
  "mode": "streaming",
  "batchSize": 100,
  "parallelism": 2,
  "checkpointIntervalMs": 30000
}
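checkpointIntervalMs bounds how often stream progress is persisted. A sketch of the semantics (illustrative; the real worker consumes from Kafka, Kinesis, or Confluent rather than an in-memory array):

```typescript
// Consume events in micro-batches and persist the last processed offset
// at most once per interval, so a restart resumes from the last
// checkpoint rather than the beginning of the stream.
function processStream(
  events: { offset: number }[],
  batchSize: number,
  checkpointIntervalMs: number,
  saveCheckpoint: (offset: number) => void,
  nowFn: () => number = Date.now,
): void {
  let lastCheckpoint = nowFn();
  for (let i = 0; i < events.length; i += batchSize) {
    const batch = events.slice(i, i + batchSize);
    // ... apply transforms to the batch here ...
    const now = nowFn();
    if (now - lastCheckpoint >= checkpointIntervalMs) {
      saveCheckpoint(batch[batch.length - 1].offset); // durable progress marker
      lastCheckpoint = now;
    }
  }
}
```

A shorter interval narrows the window of events replayed after a crash at the cost of more checkpoint writes; the 30s default in the example above trades a bounded replay window for low checkpoint overhead.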

K8s Worker Pod Configuration

Pipeline execution runs on dedicated worker pods, separate from the API tier. Configure resource limits based on pipeline complexity:
  • CPU-bound transforms (expression evaluation, hashing): Scale parallelism up to available CPU cores.
  • I/O-bound transforms (external lookups, PII masking): Higher parallelism with moderate CPU allocation.
  • Memory-bound transforms (large batch joins): Increase pod memory limits and reduce batchSize.

Production Tuning Checklist

Under 1K decisions/day

  • Single API instance with default settings
  • In-memory rate limiting and caching are sufficient
  • Default connection pool (5-10 connections)
  • No Redis required (in-memory fallbacks handle the load)

1K — 100K decisions/day

  • Redis required for rate limiting, enrichment caching, and atomic cap checks
  • Tune enrichment TTLs: increase to 300s+ for stable customer data
  • Increase connection pool to 15-20 per instance
  • Enable decision tracing with a sample rate (e.g., 10%) rather than 100%
  • Monitor decisionLatencyMs and scoringModelFailureTotal metrics

100K+ decisions/day

  • Multiple API replicas behind a load balancer (3+ pods recommended)
  • Dedicated Redis instance (ElastiCache or equivalent) with sufficient memory for rate limit sorted sets + enrichment cache
  • PostgreSQL read replicas for offer/policy reads; primary for writes only
  • Use PgBouncer or RDS Proxy to multiplex database connections
  • Set MAX_ACTIVE_OFFERS env var to limit the offer scan set (default: 5,000)
  • Lower guardrail TTL if policy changes need sub-30s propagation
  • Configure edge-layer rate limiting (AWS WAF, Cloudflare) for DDoS protection
  • Enable budget pacing for high-volume offers to spread delivery across the day
  • Set SCORING_FALLBACK_SCORE to a value appropriate for your scoring distribution

Environment Variables Reference

| Variable | Description | Default |
| --- | --- | --- |
| REDIS_URL | Redis connection string | redis://localhost:6379 |
| DATABASE_URL | PostgreSQL connection string | — (required) |
| MAX_ACTIVE_OFFERS | Max offers loaded per decision | 5000 |
| MAX_MANDATORY_OFFERS_PER_DAY | Daily mandatory cap per customer | 5 |
| SCORING_FALLBACK_SCORE | Score when model is unavailable | 0.5 |
| RATE_LIMIT_TIER | Global rate limit tier | standard |