Audience: SREs, platform operators, DevOps engineers
Last updated: 2026-02-23
Infrastructure: EKS (Kubernetes), RDS PostgreSQL, ElastiCache Redis, PgBouncer
Table of Contents
- Scaling the API Layer
- Scaling Workers
- Scaling the Database
- Scaling Redis
- Scaling PgBouncer
- Capacity Planning
1. Scaling the API Layer
Current Architecture
ALB (Application Load Balancer)
|
v
KaireonAI API Deployment (N replicas, HPA-managed)
|
+---> PgBouncer ---> PostgreSQL
+---> Redis
Horizontal Pod Autoscaler (HPA)
Default HPA configuration (from Helm chart):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: kaireon-api-hpa
namespace: kaireon
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: kaireon-api
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
The default Helm chart provides CPU-based HPA only. For production environments with high throughput, add memory and custom metric scaling via a Helm values override:
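For example, a minimal values override adding a memory target (a sketch; the autoscaling.metrics key is an assumption about the chart's values schema, so check the chart's values.yaml first):
# values-override.yaml (key names assumed; adjust to your chart)
autoscaling:
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
# Apply (release and chart names are placeholders):
helm upgrade kaireon kaireon/kaireon -n kaireon -f values-override.yaml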
HPA Tuning
Adjusting target CPU utilization:
# Check current HPA status
kubectl get hpa kaireon-api-hpa -n kaireon
# View HPA events and scaling decisions
kubectl describe hpa kaireon-api-hpa -n kaireon
# Patch CPU target (lower = more aggressive scaling)
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
'{"spec":{"metrics":[{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":50}}}]}}'
Adjusting replica bounds:
# Increase max replicas for expected traffic spike
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
'{"spec":{"maxReplicas":30}}'
# Increase minimum replicas for guaranteed capacity
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
'{"spec":{"minReplicas":5}}'
Adjusting scale-up/down behavior:
# Faster scale-up (for traffic spikes)
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
'{"spec":{"behavior":{"scaleUp":{"stabilizationWindowSeconds":30,"policies":[{"type":"Pods","value":6,"periodSeconds":30}]}}}}'
# Slower scale-down (prevent flapping)
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
'{"spec":{"behavior":{"scaleDown":{"stabilizationWindowSeconds":600}}}}'
Manual Scaling
For planned events or emergencies:
# Scale to a specific number of replicas
kubectl scale deploy/kaireon-api -n kaireon --replicas=15
# Note: HPA will override manual scaling. To hold a fixed count,
# set minReplicas = maxReplicas temporarily:
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
'{"spec":{"minReplicas":15,"maxReplicas":15}}'
# Revert after the event
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
'{"spec":{"minReplicas":3,"maxReplicas":20}}'
Pod Resource Tuning
resources:
requests:
cpu: "500m"
memory: "512Mi"
limits:
cpu: "2000m"
memory: "2Gi"
Adjusting pod resources:
# Increase memory for OOM-prone pods
kubectl set resources deploy/kaireon-api -n kaireon \
--requests=memory=1Gi \
--limits=memory=4Gi
# Increase CPU for CPU-bound workloads
kubectl set resources deploy/kaireon-api -n kaireon \
--requests=cpu=1000m \
--limits=cpu=4000m
Pre-scaling for Planned Events
# 1. Scale up 30 minutes before the event
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
'{"spec":{"minReplicas":15}}'
# 2. Verify pods are ready
kubectl get pods -n kaireon -l app=kaireon-api | grep Running | wc -l
# 3. After the event, reduce minimum
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
'{"spec":{"minReplicas":3}}'
2. Scaling Workers
Current Architecture
Redis Queue (decisions, pipelines, batch)
|
v
KaireonAI Worker Deployment (M replicas, KEDA-managed)
KEDA Autoscaler
Current KEDA configuration:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kaireon-worker-scaler
namespace: kaireon
spec:
scaleTargetRef:
name: kaireon-worker
minReplicaCount: 2
maxReplicaCount: 30
pollingInterval: 15
cooldownPeriod: 300
triggers:
- type: redis
metadata:
address: kaireon-redis.kaireon.svc.cluster.local:6379
listName: queue:decisions
listLength: "50"
enableTLS: "false"
databaseIndex: "0"
- type: redis
metadata:
address: kaireon-redis.kaireon.svc.cluster.local:6379
listName: queue:pipelines
listLength: "20"
enableTLS: "false"
databaseIndex: "0"
KEDA Tuning
Adjusting trigger thresholds:
# Edit the ScaledObject
kubectl edit scaledobject kaireon-worker-scaler -n kaireon
# Key parameters:
# - listLength: queue depth per replica (lower = more aggressive)
# - pollingInterval: how often KEDA checks (seconds)
# - cooldownPeriod: wait before scaling down (seconds)
Recommended settings by scenario:
| Scenario | listLength | pollingInterval | cooldownPeriod | maxReplicas |
|---|---|---|---|---|
| Normal | 50 | 15 | 300 | 30 |
| High throughput | 20 | 10 | 180 | 50 |
| Cost sensitive | 100 | 30 | 600 | 15 |
| Batch processing | 10 | 10 | 60 | 50 |
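For example, applying the high-throughput profile with a merge patch (a sketch; a merge patch replaces the entire triggers list, so every trigger must be restated):
kubectl patch scaledobject kaireon-worker-scaler -n kaireon --type merge -p '{
  "spec": {
    "pollingInterval": 10,
    "cooldownPeriod": 180,
    "maxReplicaCount": 50,
    "triggers": [
      {"type": "redis", "metadata": {"address": "kaireon-redis.kaireon.svc.cluster.local:6379",
       "listName": "queue:decisions", "listLength": "20", "enableTLS": "false", "databaseIndex": "0"}},
      {"type": "redis", "metadata": {"address": "kaireon-redis.kaireon.svc.cluster.local:6379",
       "listName": "queue:pipelines", "listLength": "20", "enableTLS": "false", "databaseIndex": "0"}}
    ]
  }
}'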
Manual Worker Scaling
# Temporarily override KEDA (pause the scaler)
kubectl annotate scaledobject kaireon-worker-scaler -n kaireon \
autoscaling.keda.sh/paused-replicas="20"
# Scale directly (note: with paused-replicas set, KEDA already holds the
# deployment at 20, so this step is usually redundant)
kubectl scale deploy/kaireon-worker -n kaireon --replicas=20
# Resume KEDA by removing the annotation (the trailing "-" deletes it)
kubectl annotate scaledobject kaireon-worker-scaler -n kaireon \
autoscaling.keda.sh/paused-replicas-
Worker Resource Tuning
Workers are CPU-intensive during scoring and memory-intensive during pipeline execution.
resources:
requests:
cpu: "1000m"
memory: "1Gi"
limits:
cpu: "4000m"
memory: "4Gi"
Queue-Specific Worker Pools
For isolating workloads, deploy separate worker pools per queue:
# Decision workers (latency-sensitive)
kubectl scale deploy/kaireon-worker-decisions -n kaireon --replicas=10
# Pipeline workers (throughput-focused)
kubectl scale deploy/kaireon-worker-pipelines -n kaireon --replicas=5
# Batch workers (cost-optimized, use spot instances)
kubectl scale deploy/kaireon-worker-batch -n kaireon --replicas=3
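Each pool needs its own ScaledObject watching only its queue. A sketch for the batch pool (queue:batch follows the queue naming pattern above but is an assumption, as are the thresholds):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kaireon-worker-batch-scaler
  namespace: kaireon
spec:
  scaleTargetRef:
    name: kaireon-worker-batch
  minReplicaCount: 0           # batch work tolerates scale-to-zero
  maxReplicaCount: 50
  cooldownPeriod: 60
  triggers:
    - type: redis
      metadata:
        address: kaireon-redis.kaireon.svc.cluster.local:6379
        listName: queue:batch  # assumed queue name
        listLength: "10"
        enableTLS: "false"
        databaseIndex: "0"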
3. Scaling the Database
Vertical Scaling (RDS)
Instance type progression:
| Tier | Instance Type | vCPUs | Memory | Max Connections | Use Case |
|---|---|---|---|---|---|
| Dev | db.t3.medium | 2 | 4 GB | 100 | Development |
| Small | db.r6g.large | 2 | 16 GB | 200 | Low traffic |
| Medium | db.r6g.xlarge | 4 | 32 GB | 400 | Standard |
| Large | db.r6g.2xlarge | 8 | 64 GB | 800 | High traffic |
| XLarge | db.r6g.4xlarge | 16 | 128 GB | 1600 | Peak traffic |
Scaling up:
# Check current instance type
aws rds describe-db-instances \
--db-instance-identifier kaireon-prod \
--query 'DBInstances[0].DBInstanceClass'
# Scale up (--apply-immediately applies the change now; expect a brief outage, see note below)
aws rds modify-db-instance \
--db-instance-identifier kaireon-prod \
--db-instance-class db.r6g.2xlarge \
--apply-immediately
# Monitor the modification
aws rds describe-db-instances \
--db-instance-identifier kaireon-prod \
--query 'DBInstances[0].[DBInstanceStatus,PendingModifiedValues]'
Important: Scaling up causes a brief outage (typically 1-3 minutes). Schedule during maintenance windows if possible. With Multi-AZ, failover minimizes downtime.
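To confirm Multi-AZ is enabled before scaling:
aws rds describe-db-instances \
  --db-instance-identifier kaireon-prod \
  --query 'DBInstances[0].MultiAZ'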
Read Replicas
Use read replicas to offload read-heavy queries (dashboards, analytics, reporting).
Creating a read replica:
aws rds create-db-instance-read-replica \
--db-instance-identifier kaireon-prod-read-1 \
--source-db-instance-identifier kaireon-prod \
--db-instance-class db.r6g.xlarge \
--availability-zone us-east-1b
# Wait for it to become available
aws rds wait db-instance-available --db-instance-identifier kaireon-prod-read-1
Application configuration for read replicas:
# Set environment variables for the API
kubectl set env deploy/kaireon-api -n kaireon \
DATABASE_READ_URL="postgresql://$DB_USER:$DB_PASSWORD@kaireon-prod-read-1.xxxxxxxx.us-east-1.rds.amazonaws.com:5432/kaireon"
Monitoring replica lag (note: date -u -v-1H is BSD/macOS syntax; on GNU/Linux use date -u -d '1 hour ago'):
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name ReplicaLag \
--dimensions Name=DBInstanceIdentifier,Value=kaireon-prod-read-1 \
--start-time $(date -u -v-1H +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 60 \
--statistics Average
Promoting a read replica (for failover or splitting):
aws rds promote-read-replica \
--db-instance-identifier kaireon-prod-read-1
Storage Scaling
# Check current storage
aws rds describe-db-instances \
--db-instance-identifier kaireon-prod \
--query 'DBInstances[0].[AllocatedStorage,StorageType,Iops]'
# Increase storage (online, no downtime for gp3)
aws rds modify-db-instance \
--db-instance-identifier kaireon-prod \
--allocated-storage 500 \
--apply-immediately
# Enable storage autoscaling
aws rds modify-db-instance \
--db-instance-identifier kaireon-prod \
--max-allocated-storage 1000 \
--apply-immediately
Connection Limit Scaling
When scaling the database vertically, adjust max_connections accordingly:
# Check current max_connections
aws rds describe-db-parameters \
--db-parameter-group-name kaireon-prod-params \
--query "Parameters[?ParameterName=='max_connections']"
# RDS default formula: LEAST({DBInstanceClassMemory/9531392}, 5000)
# Override if needed:
aws rds modify-db-parameter-group \
--db-parameter-group-name kaireon-prod-params \
--parameters "ParameterName=max_connections,ParameterValue=400,ApplyMethod=pending-reboot"
4. Scaling Redis
Vertical Scaling (ElastiCache)
Instance type progression:
| Tier | Instance Type | Memory | Network | Use Case |
|---|
| Dev | cache.t3.small | 1.37 GB | Low-Moderate | Development |
| Small | cache.r6g.large | 13.07 GB | Up to 10 Gbps | Standard |
| Medium | cache.r6g.xlarge | 26.32 GB | Up to 10 Gbps | High cache |
| Large | cache.r6g.2xlarge | 52.82 GB | Up to 10 Gbps | Large dataset |
Scaling up (single node, non-clustered):
aws elasticache modify-replication-group \
--replication-group-id kaireon-redis \
--cache-node-type cache.r6g.xlarge \
--apply-immediately
Cluster Mode
Use cluster mode when the dataset exceeds single-node memory or the workload needs more throughput than a single primary can serve.
Enabling cluster mode:
# Create a new cluster-mode-enabled replication group
aws elasticache create-replication-group \
--replication-group-id kaireon-redis-cluster \
--replication-group-description "KaireonAI Redis Cluster" \
--cache-node-type cache.r6g.large \
--num-node-groups 3 \
--replicas-per-node-group 1 \
--automatic-failover-enabled \
--multi-az-enabled \
--cache-parameter-group-name default.redis7.cluster.on
Resharding (adding shards):
aws elasticache modify-replication-group-shard-configuration \
--replication-group-id kaireon-redis-cluster \
--node-group-count 6 \
--apply-immediately
Application configuration for cluster mode:
kubectl set env deploy/kaireon-api -n kaireon \
REDIS_CLUSTER_MODE=true \
REDIS_URL="redis://kaireon-redis-cluster.xxxxxxxx.clustercfg.use1.cache.amazonaws.com:6379"
Redis Memory Management
These commands target the in-cluster kaireon-redis deployment. On managed ElastiCache nodes, CONFIG SET is not permitted at runtime, so make the equivalent change through a parameter group (see the example below).
# Check memory usage
kubectl exec -it deploy/kaireon-redis -n kaireon -- redis-cli INFO memory
# Set maxmemory
kubectl exec -it deploy/kaireon-redis -n kaireon -- redis-cli CONFIG SET maxmemory 12gb
# Set eviction policy
kubectl exec -it deploy/kaireon-redis -n kaireon -- redis-cli CONFIG SET maxmemory-policy volatile-lru
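The ElastiCache equivalent goes through a parameter group (the group name kaireon-redis-params is an assumption):
aws elasticache modify-cache-parameter-group \
  --cache-parameter-group-name kaireon-redis-params \
  --parameter-name-values "ParameterName=maxmemory-policy,ParameterValue=volatile-lru"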
Scaling Redis for Specific Use Cases
Decision caching (read-heavy):
- Use read replicas for read distribution.
- Set short TTLs (30-60s) to limit memory growth.
- Use volatile-lru eviction.
Queue processing (write-heavy):
- Scale vertically for more throughput.
- Monitor instantaneous_ops_per_sec.
- Consider cluster mode if >100K ops/sec.
Session storage:
- Separate from cache Redis.
- Use noeviction policy (sessions must not be evicted).
- Size for peak concurrent users.
5. Scaling PgBouncer
Pool Size Calculations
Total server connections needed:
= (API replicas * connections_per_pod) + (Worker replicas * connections_per_worker)
+ monitoring + replication + admin overhead
Example:
API: 10 replicas * 5 connections = 50
Workers: 10 replicas * 3 connections = 30
Monitoring: 5
Replication: 5
Admin: 10
Total: 100
PgBouncer settings:
max_db_connections = 100 (per PgBouncer instance)
default_pool_size = 50 (per database)
max_client_conn = 500 (per PgBouncer instance)
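A quick shell helper for the same arithmetic (a sketch; substitute your own replica counts):
# Estimate required server connections from replica counts
api_replicas=10;    conn_per_pod=5
worker_replicas=10; conn_per_worker=3
overhead=20         # monitoring (5) + replication (5) + admin (10)
echo $(( api_replicas * conn_per_pod + worker_replicas * conn_per_worker + overhead ))
# -> 100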
Scaling PgBouncer Horizontally
Deploy multiple PgBouncer instances behind a Kubernetes Service:
# Scale PgBouncer replicas
kubectl scale deploy/kaireon-pgbouncer -n kaireon --replicas=3
# Verify all replicas are healthy
kubectl get pods -n kaireon -l app=kaireon-pgbouncer
Adjusting pool size when scaling API/workers:
| API Replicas | Worker Replicas | PgBouncer Instances | default_pool_size | max_db_connections |
|---|---|---|---|---|
| 3 | 2 | 1 | 25 | 80 |
| 10 | 10 | 2 | 25 | 100 |
| 20 | 20 | 3 | 30 | 120 |
| 30 | 30 | 4 | 25 | 100 |
Important: Total max_db_connections across all PgBouncer instances must not exceed PostgreSQL max_connections.
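To sanity-check that budget (a sketch; DATABASE_ADMIN_URL is a placeholder for an admin connection string):
# PostgreSQL ceiling
psql "$DATABASE_ADMIN_URL" -tAc 'SHOW max_connections;'
# Number of PgBouncer replicas; multiply by max_db_connections from the ConfigMap
kubectl get deploy/kaireon-pgbouncer -n kaireon -o jsonpath='{.spec.replicas}'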
Applying Pool Changes
# Edit ConfigMap
kubectl edit configmap kaireon-pgbouncer-config -n kaireon
# Hot-reload (no restart needed for pool size changes)
for pod in $(kubectl get pods -n kaireon -l app=kaireon-pgbouncer -o name); do
kubectl exec -n kaireon $pod -- psql -p 6432 pgbouncer -c "RELOAD;"
done
# Verify new settings
kubectl exec -it deploy/kaireon-pgbouncer -n kaireon -- psql -p 6432 pgbouncer -c "SHOW CONFIG;" | grep pool
6. Capacity Planning
Metrics to Track
| Metric | Current | Warning Threshold | Scaling Action |
|---|---|---|---|
| API CPU utilization | - | >60% sustained | Increase HPA maxReplicas |
| API memory utilization | - | >75% sustained | Increase pod memory limit |
| Worker queue depth | - | >50 sustained | Lower KEDA listLength |
| DB CPU utilization | - | >70% sustained | Scale up instance type |
| DB connections used | - | >70% of max | Scale PgBouncer or DB |
| DB storage used | - | >70% of allocated | Increase storage |
| Redis memory used | - | >70% of maxmemory | Scale up instance type |
| Redis ops/sec | - | >80% of benchmark | Enable cluster mode |
Monthly Capacity Review Checklist
- Review 30-day trends for all metrics above.
- Project growth for the next 90 days based on customer onboarding pipeline.
- Identify any component within 30 days of hitting a threshold.
- Plan scaling actions with cost estimates.
- Update this document with new current values.
Cost-Aware Scaling
- Use Spot instances for batch workers (up to 70% savings).
- Use Reserved Instances for baseline API and database capacity.
- Use Graviton (arm64) instances for 20% better price-performance.
- Scale down non-production environments during off-hours:
# Scale down staging at 8 PM
kubectl scale deploy -n kaireon-staging --all --replicas=0
# Scale up at 8 AM (--replicas=1 restores one replica each; re-apply larger per-deployment counts where needed)
kubectl scale deploy -n kaireon-staging --all --replicas=1
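To automate the off-hours schedule, a CronJob sketch (the staging-scaler ServiceAccount and its RBAC are assumptions; it needs permission to scale deployments in the namespace):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-scale-down
  namespace: kaireon-staging
spec:
  schedule: "0 20 * * 1-5"                   # 8 PM, weekdays (controller time zone)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: staging-scaler # assumed SA with scale permissions
          restartPolicy: OnFailure
          containers:
            - name: scale
              image: bitnami/kubectl:1.29    # any image with kubectl works
              args: ["scale", "deploy", "--all", "--replicas=0", "-n", "kaireon-staging"]
A mirror CronJob scheduled for 8 AM with --replicas=1 handles the morning scale-up.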
Load Testing Before Scaling
Before major scaling changes, validate with load tests:
# Install k6
brew install k6
# Run a decision endpoint load test
k6 run --vus 100 --duration 5m scripts/load-test-decisions.js
# Run with ramping to find the breaking point
k6 run --stage '1m:50,3m:200,1m:50' scripts/load-test-decisions.js
Key metrics to capture during load tests:
- P50, P95, P99 latency at each VU level.
- Error rate at each VU level.
- Database connection count and query time.
- Redis memory and ops/sec.
- Pod CPU and memory utilization.