Audience: SREs, platform operators, DevOps engineers
Last updated: 2026-02-23
Infrastructure: EKS (Kubernetes), RDS PostgreSQL, ElastiCache Redis, PgBouncer
Table of Contents
- Scaling the API Layer
- Scaling Workers
- Scaling the Database
- Scaling Redis
- Scaling PgBouncer
- Capacity Planning
1. Scaling the API Layer
Current Architecture
Horizontal Pod Autoscaler (HPA)
Default HPA configuration (from Helm chart):
HPA Tuning
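For reference while tuning, a representative `autoscaling/v2` manifest of the kind the Helm chart renders; the names and values below are illustrative, not the chart's actual defaults:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api                       # illustrative; use the chart's release name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60    # the target CPU utilization being tuned
```

Tuning usually means adjusting `averageUtilization`, `minReplicas`, and `maxReplicas` in the chart values rather than editing the rendered object.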
Adjusting target CPU utilization:
Manual Scaling
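When the HPA needs to be overridden, replicas can be pinned directly; a sketch, assuming the deployment and HPA are both named `api` in namespace `prod`:

```shell
# Raise the HPA floor so it cannot scale back down below the pinned count
kubectl patch hpa api -n prod --patch '{"spec":{"minReplicas":15}}'

# Or scale the deployment directly (the HPA may revert this within its sync period)
kubectl scale deployment api -n prod --replicas=15
```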
For planned events or emergencies:
Pod Resource Tuning
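Resource tuning happens on the container spec; an illustrative request/limit block (values are assumptions, not measured baselines):

```yaml
# Nested under the API container in the pod template
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi
```

Requests drive HPA utilization math and scheduling; raising limits without raising requests changes neither.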
Pre-scaling for Planned Events
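One way to pre-scale is to raise the HPA floor ahead of the event and restore it afterwards (HPA name and namespace are assumptions):

```shell
# Before the event: guarantee capacity
kubectl patch hpa api -n prod --patch '{"spec":{"minReplicas":10}}'

# After the event: restore the default floor
kubectl patch hpa api -n prod --patch '{"spec":{"minReplicas":3}}'
```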
2. Scaling Workers
Current Architecture
KEDA Autoscaler
Current KEDA configuration:
KEDA Tuning
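A ScaledObject with a Redis list trigger exposes the fields tuned in this section; the object name, Redis endpoint, and queue key below are assumptions:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker
spec:
  scaleTargetRef:
    name: worker                     # worker Deployment name (assumed)
  pollingInterval: 15                # seconds between queue checks
  cooldownPeriod: 300                # seconds before scaling to minReplicaCount
  minReplicaCount: 2
  maxReplicaCount: 30
  triggers:
  - type: redis
    metadata:
      address: redis.internal:6379   # assumed endpoint
      listName: jobs                 # assumed queue key
      listLength: "50"               # target list length per replica
```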
Adjusting trigger thresholds:
| Scenario | listLength | pollingInterval (s) | cooldownPeriod (s) | maxReplicas |
|---|---|---|---|---|
| Normal | 50 | 15 | 300 | 30 |
| High throughput | 20 | 10 | 180 | 50 |
| Cost sensitive | 100 | 30 | 600 | 15 |
| Batch processing | 10 | 10 | 60 | 50 |
Manual Worker Scaling
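KEDA supports pinning replicas with the `autoscaling.keda.sh/paused-replicas` annotation (KEDA 2.7+), which is safer than fighting the autoscaler with `kubectl scale`; the ScaledObject name and namespace are assumptions:

```shell
# Pause autoscaling and hold the workers at a fixed replica count
kubectl annotate scaledobject worker -n prod autoscaling.keda.sh/paused-replicas="20"

# Remove the annotation to resume autoscaling
kubectl annotate scaledobject worker -n prod autoscaling.keda.sh/paused-replicas-
```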
Worker Resource Tuning
Workers are CPU-intensive during scoring and memory-intensive during pipeline execution.
Queue-Specific Worker Pools
For isolating workloads, deploy separate worker pools per queue:
3. Scaling the Database
Vertical Scaling (RDS)
Instance type progression:
| Tier | Instance Type | vCPUs | Memory | Max Connections | Use Case |
|---|---|---|---|---|---|
| Dev | db.t3.medium | 2 | 4 GB | 100 | Development |
| Small | db.r6g.large | 2 | 16 GB | 200 | Low traffic |
| Medium | db.r6g.xlarge | 4 | 32 GB | 400 | Standard |
| Large | db.r6g.2xlarge | 8 | 64 GB | 800 | High traffic |
| XLarge | db.r6g.4xlarge | 16 | 128 GB | 1600 | Peak traffic |
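Moving between tiers is a single `aws rds modify-db-instance` call; the instance identifier below is an assumption, and `--apply-immediately` triggers a failover on Multi-AZ instances (or brief downtime on single-AZ):

```shell
aws rds modify-db-instance \
  --db-instance-identifier prod-postgres \
  --db-instance-class db.r6g.2xlarge \
  --apply-immediately
```

Without `--apply-immediately`, the change waits for the next maintenance window.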
Read Replicas
Use read replicas to offload read-heavy queries (dashboards, analytics, reporting). Creating a read replica:
Storage Scaling
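Allocated storage can be grown in place (never shrunk); setting `--max-allocated-storage` also enables RDS storage autoscaling up to that ceiling. The identifier and sizes here are assumptions:

```shell
aws rds modify-db-instance \
  --db-instance-identifier prod-postgres \
  --allocated-storage 500 \
  --max-allocated-storage 1000 \
  --apply-immediately
```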
Connection Limit Scaling
When scaling the database vertically, adjust `max_connections` accordingly:
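`max_connections` is a static parameter, so it is changed through the instance's parameter group and takes effect at the next reboot; the group name is an assumption:

```shell
aws rds modify-db-parameter-group \
  --db-parameter-group-name prod-postgres-params \
  --parameters "ParameterName=max_connections,ParameterValue=800,ApplyMethod=pending-reboot"
```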
4. Scaling Redis
Vertical Scaling (ElastiCache)
Instance type progression:
| Tier | Instance Type | Memory | Network | Use Case |
|---|---|---|---|---|
| Dev | cache.t3.small | 1.37 GB | Low-Moderate | Development |
| Small | cache.r6g.large | 13.07 GB | Up to 10 Gbps | Standard |
| Medium | cache.r6g.xlarge | 26.32 GB | Up to 10 Gbps | High cache |
| Large | cache.r6g.2xlarge | 52.82 GB | Up to 10 Gbps | Large dataset |
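Node type changes go through `modify-replication-group`; the replication group id below is an assumption. Recent Redis engine versions scale vertically online, but plan for elevated latency during the resize:

```shell
aws elasticache modify-replication-group \
  --replication-group-id prod-redis \
  --cache-node-type cache.r6g.xlarge \
  --apply-immediately
```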
Cluster Mode
Use cluster mode for datasets that exceed single-node memory or workloads that need higher throughput. Enabling cluster mode:
Redis Memory Management
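To watch memory pressure and evictions, the standard counters are in `INFO`; the endpoint and parameter group names are assumptions. Note that ElastiCache blocks `CONFIG SET`, so eviction policy changes must go through a parameter group:

```shell
# Memory pressure and fragmentation
redis-cli -h prod-redis.internal INFO memory | grep -E 'used_memory_human|maxmemory_human|mem_fragmentation_ratio'

# A rising evicted_keys counter indicates maxmemory pressure
redis-cli -h prod-redis.internal INFO stats | grep evicted_keys

# Change eviction policy via the parameter group, not CONFIG SET
aws elasticache modify-cache-parameter-group \
  --cache-parameter-group-name prod-redis-params \
  --parameter-name-values "ParameterName=maxmemory-policy,ParameterValue=volatile-lru"
```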
Scaling Redis for Specific Use Cases
Decision caching (read-heavy):
- Use read replicas for read distribution.
- Set short TTLs (30-60s) to limit memory growth.
- Use `volatile-lru` eviction.

High-throughput workloads:
- Scale vertically for more throughput.
- Monitor `instantaneous_ops_per_sec`.
- Consider cluster mode if >100K ops/sec.

Session storage:
- Separate from cache Redis.
- Use the `noeviction` policy (sessions must not be evicted).
- Size for peak concurrent users.
5. Scaling PgBouncer
Pool Size Calculations
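A back-of-the-envelope sizing pass splits the PostgreSQL connection budget evenly across PgBouncer instances, leaving headroom for superuser and maintenance sessions. This is a simplification (real `default_pool_size` applies per user/database pair), and the reserved-connection count is an assumption:

```shell
# Assumptions: db.r6g.2xlarge tier from the table above, 20 reserved connections
pg_max_connections=800
reserved_superuser=20
pgbouncer_instances=4

# Connection budget available to PgBouncer
budget=$((pg_max_connections - reserved_superuser))

# Per-instance cap on server connections (max_db_connections)
max_db_connections=$((budget / pgbouncer_instances))
echo "max_db_connections per instance: ${max_db_connections}"
```

The sum of `max_db_connections` across instances must stay under the budget, which is why adding PgBouncer instances usually means shrinking the per-instance cap.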
Scaling PgBouncer Horizontally
Deploy multiple PgBouncer instances behind a Kubernetes Service:
| API Replicas | Worker Replicas | PgBouncer Instances | default_pool_size | max_db_connections |
|---|---|---|---|---|
| 3 | 2 | 1 | 25 | 80 |
| 10 | 10 | 2 | 25 | 100 |
| 20 | 20 | 3 | 30 | 120 |
| 30 | 30 | 4 | 25 | 100 |
`max_db_connections` across all PgBouncer instances must not exceed PostgreSQL `max_connections`.
Applying Pool Changes
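Pool settings in `pgbouncer.ini` take effect after a `RELOAD` issued on the PgBouncer admin console (host, port, and credentials below are assumptions); when the config is mounted from a ConfigMap, a rollout restart works as well:

```shell
# Reload config without dropping client connections
psql -h pgbouncer.internal -p 6432 -U admin pgbouncer -c 'RELOAD;'

# Alternative on Kubernetes after updating the ConfigMap
kubectl rollout restart deployment pgbouncer -n prod
```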
6. Capacity Planning
Metrics to Track
| Metric | Current | Warning Threshold | Scaling Action |
|---|---|---|---|
| API CPU utilization | - | >60% sustained | Increase HPA maxReplicas |
| API memory utilization | - | >75% sustained | Increase pod memory limit |
| Worker queue depth | - | >50 sustained | Lower KEDA listLength |
| DB CPU utilization | - | >70% sustained | Scale up instance type |
| DB connections used | - | >70% of max | Scale PgBouncer or DB |
| DB storage used | - | >70% of allocated | Increase storage |
| Redis memory used | - | >70% of maxmemory | Scale up instance type |
| Redis ops/sec | - | >80% of benchmark | Enable cluster mode |
Monthly Capacity Review Checklist
- Review 30-day trends for all metrics above.
- Project growth for the next 90 days based on customer onboarding pipeline.
- Identify any component within 30 days of hitting a threshold.
- Plan scaling actions with cost estimates.
- Update this document with new current values.
Cost-Aware Scaling
- Use Spot instances for batch workers (up to 70% savings).
- Use Reserved Instances for baseline API and database capacity.
- Use Graviton (arm64) instances for 20% better price-performance.
- Scale down non-production environments during off-hours:
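An off-hours scale-down can be as simple as a pair of scheduled jobs; the namespace and the flat baseline used here are assumptions:

```shell
# Evening job: scale every deployment in staging to zero
kubectl scale deployment --all --replicas=0 -n staging

# Morning job: restore a flat baseline (crude; per-deployment counts are better)
kubectl scale deployment --all --replicas=2 -n staging
```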
Load Testing Before Scaling
Before major scaling changes, validate with load tests:
- P50, P95, P99 latency at each VU level.
- Error rate at each VU level.
- Database connection count and query time.
- Redis memory and ops/sec.
- Pod CPU and memory utilization.
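The VU levels above can be stepped through with k6 while the database, Redis, and pod metrics are watched; the script path and VU levels here are assumptions:

```shell
# Run the same scenario at increasing VU levels (loadtest.js is assumed to exist)
k6 run --vus 50  --duration 5m loadtest.js
k6 run --vus 100 --duration 5m loadtest.js
k6 run --vus 200 --duration 5m loadtest.js
```

Ramping stages and latency thresholds can also be declared inside the script's `options` block instead of on the command line.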