Audience: SREs, platform operators, DevOps engineers
Last updated: 2026-02-23
Infrastructure: EKS (Kubernetes), RDS PostgreSQL, ElastiCache Redis, PgBouncer

Table of Contents

  1. Scaling the API Layer
  2. Scaling Workers
  3. Scaling the Database
  4. Scaling Redis
  5. Scaling PgBouncer
  6. Capacity Planning

1. Scaling the API Layer

Current Architecture

ALB (Application Load Balancer)
    |
    v
KaireonAI API Deployment (N replicas, HPA-managed)
    |
    +---> PgBouncer ---> PostgreSQL
    +---> Redis

Horizontal Pod Autoscaler (HPA)

Default HPA configuration (from Helm chart):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kaireon-api-hpa
  namespace: kaireon
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kaireon-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
The default Helm chart provides CPU-based HPA only. For high-throughput production environments, add memory-based and custom-metric scaling via a Helm values override.
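A sketch of such a values override, assuming the chart exposes an `autoscaling` block (the key layout here is hypothetical; adjust to match the chart's actual values schema):

```yaml
# values-production.yaml (hypothetical values layout for the kaireon chart)
autoscaling:
  minReplicas: 5
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
```

Apply with something like `helm upgrade kaireon ./chart -n kaireon -f values-production.yaml`, then confirm the rendered HPA with `kubectl get hpa kaireon-api-hpa -n kaireon -o yaml`.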

HPA Tuning

Adjusting target CPU utilization:
# Check current HPA status
kubectl get hpa kaireon-api-hpa -n kaireon

# View HPA events and scaling decisions
kubectl describe hpa kaireon-api-hpa -n kaireon

# Patch CPU target (lower = more aggressive scaling)
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
  '{"spec":{"metrics":[{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":50}}}]}}'
Adjusting replica bounds:
# Increase max replicas for expected traffic spike
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
  '{"spec":{"maxReplicas":30}}'

# Increase minimum replicas for guaranteed capacity
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
  '{"spec":{"minReplicas":5}}'
Adjusting scale-up/down behavior:
# Faster scale-up (for traffic spikes)
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
  '{"spec":{"behavior":{"scaleUp":{"stabilizationWindowSeconds":30,"policies":[{"type":"Pods","value":6,"periodSeconds":30}]}}}}'

# Slower scale-down (prevent flapping)
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
  '{"spec":{"behavior":{"scaleDown":{"stabilizationWindowSeconds":600}}}}'

Manual Scaling

For planned events or emergencies:
# Scale to a specific number of replicas
kubectl scale deploy/kaireon-api -n kaireon --replicas=15

# Note: HPA will override manual scaling. To hold a fixed count,
# set minReplicas = maxReplicas temporarily:
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
  '{"spec":{"minReplicas":15,"maxReplicas":15}}'

# Revert after the event
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
  '{"spec":{"minReplicas":3,"maxReplicas":20}}'

Pod Resource Tuning

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "2000m"
    memory: "2Gi"
Adjusting pod resources:
# Increase memory for OOM-prone pods
kubectl set resources deploy/kaireon-api -n kaireon \
  --requests=memory=1Gi \
  --limits=memory=4Gi

# Increase CPU for CPU-bound workloads
kubectl set resources deploy/kaireon-api -n kaireon \
  --requests=cpu=1000m \
  --limits=cpu=4000m

Pre-scaling for Planned Events

# 1. Scale up 30 minutes before the event
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
  '{"spec":{"minReplicas":15}}'

# 2. Verify pods are ready
kubectl get pods -n kaireon -l app=kaireon-api | grep Running | wc -l

# 3. After the event, reduce minimum
kubectl patch hpa kaireon-api-hpa -n kaireon --type merge -p \
  '{"spec":{"minReplicas":3}}'

2. Scaling Workers

Current Architecture

Redis Queue (decisions, pipelines, batch)
    |
    v
KaireonAI Worker Deployment (M replicas, KEDA-managed)

KEDA Autoscaler

Current KEDA configuration:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kaireon-worker-scaler
  namespace: kaireon
spec:
  scaleTargetRef:
    name: kaireon-worker
  minReplicaCount: 2
  maxReplicaCount: 30
  pollingInterval: 15
  cooldownPeriod: 300
  triggers:
    - type: redis
      metadata:
        address: kaireon-redis.kaireon.svc.cluster.local:6379
        listName: queue:decisions
        listLength: "50"
        enableTLS: "false"
        databaseIndex: "0"
    - type: redis
      metadata:
        address: kaireon-redis.kaireon.svc.cluster.local:6379
        listName: queue:pipelines
        listLength: "20"
        enableTLS: "false"
        databaseIndex: "0"

KEDA Tuning

Adjusting trigger thresholds:
# Edit the ScaledObject
kubectl edit scaledobject kaireon-worker-scaler -n kaireon

# Key parameters:
# - listLength: queue depth per replica (lower = more aggressive)
# - pollingInterval: how often KEDA checks (seconds)
# - cooldownPeriod: wait before scaling down (seconds)
Recommended settings by scenario:
Scenario            listLength   pollingInterval   cooldownPeriod   maxReplicas
Normal              50           15                300              30
High throughput     20           10                180              50
Cost sensitive      100          30                600              15
Batch processing    10           10                60               50
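As a sketch, the "High throughput" profile maps onto the ScaledObject spec like this. Note that `listLength` is a per-trigger setting, so apply it to every queue trigger; only the changed fields and one trigger are shown:

```yaml
# Excerpt of the ScaledObject after applying the "High throughput" profile
spec:
  maxReplicaCount: 50
  pollingInterval: 10
  cooldownPeriod: 180
  triggers:
    - type: redis
      metadata:
        address: kaireon-redis.kaireon.svc.cluster.local:6379
        listName: queue:decisions
        listLength: "20"      # lowered from 50
        enableTLS: "false"
        databaseIndex: "0"
```

If you apply this with a merge patch rather than `kubectl edit`, remember that a merge patch replaces the whole `triggers` array, so include every trigger in the patch body.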

Manual Worker Scaling

# Temporarily override KEDA (pause the scaler)
kubectl annotate scaledobject kaireon-worker-scaler -n kaireon \
  autoscaling.keda.sh/paused-replicas="20"

# Scale directly
kubectl scale deploy/kaireon-worker -n kaireon --replicas=20

# Resume KEDA by removing the annotation (the trailing "-" deletes it)
kubectl annotate scaledobject kaireon-worker-scaler -n kaireon \
  autoscaling.keda.sh/paused-replicas-

Worker Resource Tuning

Workers are CPU-intensive during scoring and memory-intensive during pipeline execution.
resources:
  requests:
    cpu: "1000m"
    memory: "1Gi"
  limits:
    cpu: "4000m"
    memory: "4Gi"

Queue-Specific Worker Pools

For isolating workloads, deploy separate worker pools per queue:
# Decision workers (latency-sensitive)
kubectl scale deploy/kaireon-worker-decisions -n kaireon --replicas=10

# Pipeline workers (throughput-focused)
kubectl scale deploy/kaireon-worker-pipelines -n kaireon --replicas=5

# Batch workers (cost-optimized, use spot instances)
kubectl scale deploy/kaireon-worker-batch -n kaireon --replicas=3

3. Scaling the Database

Vertical Scaling (RDS)

Instance type progression:
Tier     Instance Type     vCPUs   Memory    Max Connections   Use Case
Dev      db.t3.medium      2       4 GB      100               Development
Small    db.r6g.large      2       16 GB     200               Low traffic
Medium   db.r6g.xlarge     4       32 GB     400               Standard
Large    db.r6g.2xlarge    8       64 GB     800               High traffic
XLarge   db.r6g.4xlarge    16      128 GB    1600              Peak traffic
Scaling up:
# Check current instance type
aws rds describe-db-instances \
  --db-instance-identifier kaireon-prod \
  --query 'DBInstances[0].DBInstanceClass'

# Scale up (--apply-immediately applies the change now and causes a brief
# outage; omit the flag to defer to the next maintenance window)
aws rds modify-db-instance \
  --db-instance-identifier kaireon-prod \
  --db-instance-class db.r6g.2xlarge \
  --apply-immediately

# Monitor the modification
aws rds describe-db-instances \
  --db-instance-identifier kaireon-prod \
  --query 'DBInstances[0].[DBInstanceStatus,PendingModifiedValues]'
Important: Scaling up causes a brief outage (typically 1-3 minutes). Schedule during maintenance windows if possible. With Multi-AZ, failover minimizes downtime.

Read Replicas

Use read replicas to offload read-heavy queries (dashboards, analytics, reporting). Creating a read replica:
aws rds create-db-instance-read-replica \
  --db-instance-identifier kaireon-prod-read-1 \
  --source-db-instance-identifier kaireon-prod \
  --db-instance-class db.r6g.xlarge \
  --availability-zone us-east-1b

# Wait for it to become available
aws rds wait db-instance-available --db-instance-identifier kaireon-prod-read-1
Application configuration for read replicas:
# Set environment variables for the API
kubectl set env deploy/kaireon-api -n kaireon \
  DATABASE_READ_URL="postgresql://$DB_USER:$DB_PASSWORD@kaireon-prod-read-1.xxxxxxxx.us-east-1.rds.amazonaws.com:5432/kaireon"
Monitoring replica lag:
# Note: -v-1H is BSD/macOS date syntax; on GNU/Linux use: date -u -d '-1 hour' +%Y-%m-%dT%H:%M:%S
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name ReplicaLag \
  --dimensions Name=DBInstanceIdentifier,Value=kaireon-prod-read-1 \
  --start-time $(date -u -v-1H +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 60 \
  --statistics Average
Promoting a read replica (for failover or splitting):
aws rds promote-read-replica \
  --db-instance-identifier kaireon-prod-read-1

Storage Scaling

# Check current storage
aws rds describe-db-instances \
  --db-instance-identifier kaireon-prod \
  --query 'DBInstances[0].[AllocatedStorage,StorageType,Iops]'

# Increase storage (online, no downtime for gp3)
aws rds modify-db-instance \
  --db-instance-identifier kaireon-prod \
  --allocated-storage 500 \
  --apply-immediately

# Enable storage autoscaling
aws rds modify-db-instance \
  --db-instance-identifier kaireon-prod \
  --max-allocated-storage 1000 \
  --apply-immediately

Connection Limit Scaling

When scaling the database vertically, adjust max_connections accordingly:
# Check current max_connections
aws rds describe-db-parameters \
  --db-parameter-group-name kaireon-prod-params \
  --query "Parameters[?ParameterName=='max_connections']"

# RDS default formula: LEAST({DBInstanceClassMemory/9531392}, 5000)
# Override if needed:
aws rds modify-db-parameter-group \
  --db-parameter-group-name kaireon-prod-params \
  --parameters "ParameterName=max_connections,ParameterValue=400,ApplyMethod=pending-reboot"
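To sanity-check before overriding, the RDS default formula can be evaluated by hand. A sketch in shell, approximating DBInstanceClassMemory as total instance memory in bytes (the per-tier Max Connections values in the table above are conservative operational targets, not RDS defaults, which is why they differ):

```shell
# Approximate RDS default max_connections for a db.r6g.large (16 GiB)
mem_bytes=$((16 * 1024 * 1024 * 1024))
default_conns=$((mem_bytes / 9531392))
# Cap at 5000 per the LEAST(...) formula
[ "$default_conns" -gt 5000 ] && default_conns=5000
echo "$default_conns"   # prints 1802
```

In practice DBInstanceClassMemory is slightly less than total instance memory (RDS reserves some for the OS), so the real default will be a bit lower than this estimate.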

4. Scaling Redis

Vertical Scaling (ElastiCache)

Instance type progression:
Tier     Instance Type       Memory     Network          Use Case
Dev      cache.t3.small      1.37 GB    Low-Moderate     Development
Small    cache.r6g.large     13.07 GB   Up to 10 Gbps    Standard
Medium   cache.r6g.xlarge    26.32 GB   Up to 10 Gbps    High cache
Large    cache.r6g.2xlarge   52.82 GB   Up to 10 Gbps    Large dataset
Scaling up (single node, non-clustered):
aws elasticache modify-replication-group \
  --replication-group-id kaireon-redis \
  --cache-node-type cache.r6g.xlarge \
  --apply-immediately

Cluster Mode

Use cluster mode for datasets that exceed single-node memory or workloads that require higher throughput. Enabling cluster mode:
# Create a new cluster-mode-enabled replication group
aws elasticache create-replication-group \
  --replication-group-id kaireon-redis-cluster \
  --replication-group-description "KaireonAI Redis Cluster" \
  --cache-node-type cache.r6g.large \
  --num-node-groups 3 \
  --replicas-per-node-group 1 \
  --automatic-failover-enabled \
  --multi-az-enabled \
  --cache-parameter-group-name default.redis7.cluster.on
Resharding (adding shards):
aws elasticache modify-replication-group-shard-configuration \
  --replication-group-id kaireon-redis-cluster \
  --node-group-count 6 \
  --apply-immediately
Application configuration for cluster mode:
kubectl set env deploy/kaireon-api -n kaireon \
  REDIS_CLUSTER_MODE=true \
  REDIS_URL="redis://kaireon-redis-cluster.xxxxxxxx.clustercfg.use1.cache.amazonaws.com:6379"

Redis Memory Management

# Check memory usage
kubectl exec -it deploy/kaireon-redis -- redis-cli INFO memory

# Set maxmemory
kubectl exec -it deploy/kaireon-redis -- redis-cli CONFIG SET maxmemory 12gb

# Set eviction policy
kubectl exec -it deploy/kaireon-redis -- redis-cli CONFIG SET maxmemory-policy volatile-lru

Scaling Redis for Specific Use Cases

Decision caching (read-heavy):
  • Use read replicas for read distribution.
  • Set short TTLs (30-60s) to limit memory growth.
  • Use volatile-lru eviction.
Queue processing (write-heavy):
  • Scale vertically for more throughput.
  • Monitor instantaneous_ops_per_sec.
  • Consider cluster mode if >100K ops/sec.
Session storage:
  • Separate from cache Redis.
  • Use noeviction policy (sessions must not be evicted).
  • Size for peak concurrent users.

5. Scaling PgBouncer

Pool Size Calculations

Total server connections needed:
  = (API replicas * connections_per_pod) + (Worker replicas * connections_per_worker)
  + monitoring + replication + admin overhead

Example:
  API: 10 replicas * 5 connections = 50
  Workers: 10 replicas * 3 connections = 30
  Monitoring: 5
  Replication: 5
  Admin: 10
  Total: 100

PgBouncer settings:
  max_db_connections = 100 (per PgBouncer instance)
  default_pool_size = 50 (per database)
  max_client_conn = 500 (per PgBouncer instance)
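The worked example above can be scripted so the totals stay in sync when replica counts change. A sketch in shell; the per-pod connection counts are the assumptions from the example, not measured values:

```shell
# Assumed per-pod connection counts from the worked example above
api_replicas=10; conns_per_api=5
worker_replicas=10; conns_per_worker=3
monitoring=5; replication=5; admin=10

app_conns=$(( api_replicas * conns_per_api + worker_replicas * conns_per_worker ))
total=$(( app_conns + monitoring + replication + admin ))
echo "server connections needed: $total"   # prints 100
```

Re-run this whenever you raise HPA or KEDA replica bounds, and compare the result against both max_db_connections and PostgreSQL's max_connections before scaling.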

Scaling PgBouncer Horizontally

Deploy multiple PgBouncer instances behind a Kubernetes Service:
# Scale PgBouncer replicas
kubectl scale deploy/kaireon-pgbouncer -n kaireon --replicas=3

# Verify all replicas are healthy
kubectl get pods -n kaireon -l app=kaireon-pgbouncer
Adjusting pool size when scaling API/workers:
API Replicas   Worker Replicas   PgBouncer Instances   default_pool_size   max_db_connections
3              2                 1                     25                  80
10             10                2                     25                  100
20             20                3                     30                  120
30             30                4                     25                  100
Important: Total max_db_connections across all PgBouncer instances must not exceed PostgreSQL max_connections.

Applying Pool Changes

# Edit ConfigMap
kubectl edit configmap kaireon-pgbouncer-config -n kaireon

# Hot-reload (no restart needed for pool size changes)
for pod in $(kubectl get pods -n kaireon -l app=kaireon-pgbouncer -o name); do
  kubectl exec -n kaireon $pod -- psql -p 6432 pgbouncer -c "RELOAD;"
done

# Verify new settings
kubectl exec -it deploy/kaireon-pgbouncer -n kaireon -- psql -p 6432 pgbouncer -c "SHOW CONFIG;" | grep pool

6. Capacity Planning

Metrics to Track

Metric                   Current   Warning Threshold    Scaling Action
API CPU utilization      -         >60% sustained       Increase HPA maxReplicas
API memory utilization   -         >75% sustained       Increase pod memory limit
Worker queue depth       -         >50 sustained        Lower KEDA listLength
DB CPU utilization       -         >70% sustained       Scale up instance type
DB connections used      -         >70% of max          Scale PgBouncer or DB
DB storage used          -         >70% of allocated    Increase storage
Redis memory used        -         >70% of maxmemory    Scale up instance type
Redis ops/sec            -         >80% of benchmark    Enable cluster mode
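When reviewing trends, a quick linear projection helps flag components approaching a threshold. A sketch with illustrative numbers (substitute your own 30-day figures):

```shell
# Days until DB storage crosses its 70% warning threshold,
# assuming linear growth (illustrative numbers, not measured values)
allocated_gb=500
used_gb=290
growth_gb_per_day=2
threshold_gb=$(( allocated_gb * 70 / 100 ))           # 70% of allocated
days_left=$(( (threshold_gb - used_gb) / growth_gb_per_day ))
echo "days until threshold: $days_left"   # prints 30
```

Anything under 30 days of headroom should trigger a scaling action in the monthly review below.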

Monthly Capacity Review Checklist

  1. Review 30-day trends for all metrics above.
  2. Project growth for the next 90 days based on customer onboarding pipeline.
  3. Identify any component within 30 days of hitting a threshold.
  4. Plan scaling actions with cost estimates.
  5. Update this document with new current values.

Cost-Aware Scaling

  • Use Spot instances for batch workers (up to 70% savings).
  • Use Reserved Instances for baseline API and database capacity.
  • Use Graviton (arm64) instances for 20% better price-performance.
  • Scale down non-production environments during off-hours:
    # Scale down staging at 8 PM
    kubectl scale deploy -n kaireon-staging --all --replicas=0
    
    # Scale up at 8 AM
    kubectl scale deploy -n kaireon-staging --all --replicas=1
    
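The off-hours schedule above can be automated with a CronJob. A sketch for the scale-down half; the `kaireon-scaler` ServiceAccount and its RBAC binding for `deployments/scale` are assumptions, and a matching CronJob with `--replicas=1` at 8 AM completes the pair:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-scale-down
  namespace: kaireon-staging
spec:
  schedule: "0 20 * * 1-5"   # 8 PM on weekdays (cluster time zone)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: kaireon-scaler   # assumed; needs RBAC for deployments/scale
          restartPolicy: OnFailure
          containers:
            - name: scale-down
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - scale
                - deploy
                - --all
                - --replicas=0
                - -n
                - kaireon-staging
```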

Load Testing Before Scaling

Before major scaling changes, validate with load tests:
# Install k6
brew install k6

# Run a decision endpoint load test
k6 run --vus 100 --duration 5m scripts/load-test-decisions.js

# Run with ramping to find the breaking point
k6 run --stage '1m:50,3m:200,1m:50' scripts/load-test-decisions.js
Key metrics to capture during load tests:
  • P50, P95, P99 latency at each VU level.
  • Error rate at each VU level.
  • Database connection count and query time.
  • Redis memory and ops/sec.
  • Pod CPU and memory utilization.