This document provides sizing guidance, cost estimates, and scaling thresholds for operating KaireonAI across three deployment tiers.

1. Tier Overview

| Tier | Nodes | RPS Capacity | Estimated Monthly Cost | Typical Use Case |
|---|---|---|---|---|
| Startup | 2 EKS (t3.large) | < 100 RPS | ~$150/mo | Proof of concept, small teams |
| Growth | 3-4 EKS nodes | 100-1,000 RPS | ~$400-600/mo | Production workloads, mid-market |
| Enterprise | 6+ EKS nodes | 1,000+ RPS | ~$2,000+/mo | High-volume, multi-tenant, regulated |

2. Startup Tier

Target: Teams evaluating KaireonAI or running low-volume production workloads with fewer than 100 requests per second.

2.1 Compute

| Component | Spec | Notes |
|---|---|---|
| EKS Nodes | 2x t3.large (2 vCPU, 8 GiB) | Managed node group |
| Next.js App | 2 replicas, 512Mi RAM, 250m CPU | Covers API + UI |
| Pipeline Workers | 1 replica, 512Mi RAM, 250m CPU | Batch processing |

2.2 Data Stores

| Component | Spec | Notes |
|---|---|---|
| PostgreSQL | In-cluster (Helm), 20 GiB EBS | Single instance, no replication |
| Redis | In-cluster (Helm), 1 GiB | Session cache, rate limiting |
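The in-cluster stores above are typically installed with Helm. A minimal values sketch for PostgreSQL, assuming the Bitnami chart layout (the file name and resource figures simply mirror the tables above; adjust for your chart version):

```yaml
# values-postgresql.yaml -- Startup tier (illustrative values)
architecture: standalone        # single instance, no replication
primary:
  persistence:
    size: 20Gi                  # matches the 20 GiB EBS volume above
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 1Gi
```

Installed with something like `helm install pg bitnami/postgresql -f values-postgresql.yaml`; an analogous values file applies to the Redis chart.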

2.3 Cost Breakdown

| Item | Monthly Cost |
|---|---|
| 2x t3.large on-demand | ~$120 |
| EBS (30 GiB gp3) | ~$8 |
| EKS control plane | $0 (free tier) or ~$73 |
| Data transfer | ~$5 |
| Total | ~$150-220 |

2.4 Limitations

  • No database failover. A PostgreSQL pod restart causes brief downtime.
  • Not suitable for workloads requiring high availability or disaster recovery.
  • Pipeline throughput is limited to a single worker.

3. Growth Tier

Target: Production deployments serving 100 to 1,000 requests per second with availability requirements.

3.1 Compute

| Component | Spec | Notes |
|---|---|---|
| EKS Nodes | 3-4x t3.xlarge (4 vCPU, 16 GiB) | Managed node group, multi-AZ |
| Next.js App | 3-4 replicas, 1Gi RAM, 500m CPU | HPA enabled, target 70% CPU |
| Pipeline Workers | 2 replicas, 1Gi RAM, 500m CPU | Parallel pipeline execution |
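The HPA configuration described above is a standard `autoscaling/v2` manifest. A sketch, assuming the application Deployment is named `kaireon-app` (the name and namespace are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kaireon-app
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kaireon-app
  minReplicas: 3
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```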

3.2 Data Stores

| Component | Spec | Notes |
|---|---|---|
| PostgreSQL | RDS db.t3.medium (2 vCPU, 4 GiB) | Automated backups, single-AZ |
| Redis | ElastiCache cache.t3.small (1.5 GiB) | Single node, snapshot backups |

3.3 Cost Breakdown

| Item | Monthly Cost |
|---|---|
| 3x t3.xlarge on-demand | ~$290 |
| EKS control plane | ~$73 |
| RDS db.t3.medium | ~$55 |
| ElastiCache cache.t3.small | ~$25 |
| EBS + storage | ~$15 |
| Data transfer | ~$20 |
| Total | ~$480-600 |

3.4 Key Improvements Over Startup

  • Managed database with automated backups and point-in-time recovery.
  • Horizontal Pod Autoscaler for the application tier.
  • Multi-AZ node placement for compute resilience.
  • Dedicated Redis for consistent cache performance.

4. Enterprise Tier

Target: High-volume production deployments exceeding 1,000 requests per second with strict availability, compliance, and multi-region requirements.

4.1 Compute

| Component | Spec | Notes |
|---|---|---|
| EKS Nodes | 6+ m6i.xlarge (4 vCPU, 16 GiB) | Multi-AZ, cluster autoscaler |
| Next.js App | 6+ replicas, 2Gi RAM, 1 CPU | HPA + PDB (minAvailable: 3) |
| Pipeline Workers | 3-4 replicas, 2Gi RAM, 1 CPU | Autoscaled on queue depth |
| Decision Cache | Dedicated Redis read replicas | Sub-millisecond cached decisions |

4.2 Data Stores

| Component | Spec | Notes |
|---|---|---|
| PostgreSQL | RDS db.r6g.large (2 vCPU, 16 GiB), Multi-AZ | Read replicas, IAM auth, encrypted |
| Redis | ElastiCache cluster (3 shards, 2 replicas) | Cluster mode, auto-failover |

4.3 Cost Breakdown

| Item | Monthly Cost |
|---|---|
| 6x m6i.xlarge on-demand | ~$690 |
| EKS control plane | ~$73 |
| RDS db.r6g.large Multi-AZ | ~$400 |
| RDS read replica | ~$200 |
| ElastiCache cluster (3 shards) | ~$450 |
| EBS + storage | ~$50 |
| Data transfer + NAT gateway | ~$100 |
| WAF + Shield Standard | ~$50 |
| Total | ~$2,000-2,500 |

4.4 Key Improvements Over Growth

  • Multi-AZ RDS with synchronous replication and automatic failover.
  • Read replicas to offload analytics and reporting queries.
  • ElastiCache cluster mode for horizontal cache scaling.
  • Pod Disruption Budgets ensure rolling updates never drop below minimum replicas.
  • Cluster Autoscaler adjusts node count based on pending pod demand.
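The Pod Disruption Budget mentioned above (minAvailable: 3) is a small standalone manifest. A sketch, assuming the app pods carry an `app: kaireon-app` label (name and label are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kaireon-app
spec:
  minAvailable: 3          # voluntary disruptions never drop below 3 pods
  selector:
    matchLabels:
      app: kaireon-app
```

With 6+ replicas, this lets rolling node drains proceed while guaranteeing at least three serving pods at all times.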

5. Component Sizing Guide

5.1 Next.js Application Pods

| Metric | Startup | Growth | Enterprise |
|---|---|---|---|
| Replicas | 2 | 3-4 (HPA) | 6+ (HPA) |
| CPU request/limit | 250m / 500m | 500m / 1 | 1 / 2 |
| Memory request/limit | 512Mi / 1Gi | 1Gi / 2Gi | 2Gi / 4Gi |
| HPA target | N/A | 70% CPU | 70% CPU |

5.2 Pipeline Workers

| Metric | Startup | Growth | Enterprise |
|---|---|---|---|
| Replicas | 1 | 2 | 3-4 (KEDA) |
| CPU request/limit | 250m / 500m | 500m / 1 | 1 / 2 |
| Memory request/limit | 512Mi / 1Gi | 1Gi / 2Gi | 2Gi / 4Gi |
| Scaling trigger | N/A | Manual | Queue depth |
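Queue-depth scaling for the Enterprise pipeline workers can be expressed as a KEDA ScaledObject. A sketch, assuming jobs are queued in a Redis list (the Deployment name, Redis address, and list name are all illustrative; swap in the scaler for your actual queue):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: pipeline-workers
spec:
  scaleTargetRef:
    name: pipeline-workers        # hypothetical worker Deployment
  minReplicaCount: 3
  maxReplicaCount: 4
  triggers:
    - type: redis
      metadata:
        address: redis-master.default.svc:6379   # illustrative address
        listName: pipeline:jobs                  # illustrative queue key
        listLength: "10"          # target jobs per replica
```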

5.3 PostgreSQL

| Metric | Startup | Growth | Enterprise |
|---|---|---|---|
| Instance | In-cluster pod | RDS db.t3.medium | RDS db.r6g.large Multi-AZ |
| Storage | 20 GiB gp3 | 50 GiB gp3 | 200 GiB io2 (3,000 IOPS) |
| Max connections | 100 | 200 | 500 |
| Backups | Manual | Automated (7 days) | Automated (30 days) + snapshots |
| Read replicas | 0 | 0 | 1-2 |

5.4 Redis

| Metric | Startup | Growth | Enterprise |
|---|---|---|---|
| Instance | In-cluster pod | cache.t3.small | Cluster mode (3 shards) |
| Memory | 1 GiB | 1.5 GiB | 3x 6.5 GiB (19.5 GiB total) |
| Persistence | None | Snapshot | AOF + snapshot |
| Failover | None | None | Automatic (Multi-AZ) |

6. Monitoring Thresholds and Scaling Triggers

Use the following thresholds to determine when to scale up or transition to the next tier.

6.1 Compute Scaling Triggers

| Metric | Warning Threshold | Action |
|---|---|---|
| Node CPU utilization (avg) | > 70% sustained | Add nodes or increase instance size |
| Node memory utilization (avg) | > 75% sustained | Add nodes or increase instance size |
| Pod CPU throttling | > 10% of periods | Increase CPU limits or add replicas |
| Pending pods (unschedulable) | > 0 for 5 min | Enable cluster autoscaler or add nodes |
| HPA at max replicas | Sustained 15 min | Increase maxReplicas or node capacity |
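If you run the Prometheus Operator, the compute triggers above can be encoded as alert rules. A sketch of two of them, assuming kube-state-metrics and node-exporter metrics are available (alert names and thresholds mirror the table; label values are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: compute-scaling-triggers
spec:
  groups:
    - name: compute
      rules:
        - alert: PendingPods
          # any pod stuck Pending for 5 minutes -> add capacity
          expr: sum(kube_pod_status_phase{phase="Pending"}) > 0
          for: 5m
          labels:
            severity: warning
        - alert: HighNodeCPU
          # average node CPU above 70%, sustained
          expr: 1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.70
          for: 15m
          labels:
            severity: warning
```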

6.2 Database Scaling Triggers

| Metric | Warning Threshold | Action |
|---|---|---|
| RDS CPU utilization | > 70% sustained | Upgrade instance class |
| RDS freeable memory | < 500 MiB | Upgrade instance class |
| RDS connection count | > 80% of max | Add read replicas or use PgBouncer |
| RDS read latency | > 10 ms avg | Add read replica for read-heavy queries |
| RDS free storage | < 20% | Enable autoscaling or increase volume |
| RDS IOPS utilization | > 80% of baseline | Upgrade to io2 or increase provisioned |
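The RDS triggers above map directly onto CloudWatch alarms. A CloudFormation sketch for the CPU threshold (the alarm name, DB identifier, and SNS topic are illustrative; the other rows follow the same pattern with different `MetricName` values):

```yaml
Resources:
  RdsCpuHighAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: kaireon-rds-cpu-high      # hypothetical name
      Namespace: AWS/RDS
      MetricName: CPUUtilization
      Dimensions:
        - Name: DBInstanceIdentifier
          Value: kaireon-prod              # hypothetical instance id
      Statistic: Average
      Period: 300
      EvaluationPeriods: 6                 # 30 minutes sustained
      Threshold: 70
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref OpsAlertTopic               # hypothetical SNS topic
```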

6.3 Cache Scaling Triggers

| Metric | Warning Threshold | Action |
|---|---|---|
| Redis CPU utilization | > 65% | Upgrade instance or add shards |
| Redis memory utilization | > 80% | Increase instance size or add shards |
| Redis evictions | > 0 sustained | Increase memory or review TTL policies |
| Redis cache hit rate | < 90% | Review cache strategy, increase memory |

7. When to Upgrade Tiers

Startup to Growth

Upgrade when any of the following conditions persist for more than one week:
  • Sustained RPS exceeds 80.
  • Database connection count regularly exceeds 80.
  • Application pod CPU consistently above 70%.
  • Downtime from in-cluster database restarts is unacceptable.
  • Business requires automated backups or point-in-time recovery.

Growth to Enterprise

Upgrade when any of the following conditions persist:
  • Sustained RPS exceeds 800.
  • Decision latency P99 approaches the 200ms SLO limit.
  • Compliance requirements mandate Multi-AZ database or encryption at rest.
  • Read replica is needed to offload analytics workloads.
  • Cache evictions occur despite proper TTL tuning.
  • Business requires 99.9% or higher availability with automatic failover.

8. Cost Optimization Tips

  1. Reserved Instances: Purchase 1-year reserved instances for predictable node types to save 30-40%.
  2. Spot Instances: Use spot instances for pipeline worker nodes (stateless, tolerant of interruption).
  3. Right-sizing: Review CloudWatch/Prometheus metrics monthly. Downsize over-provisioned instances.
  4. Storage tiering: Use gp3 for general workloads, io2 only when IOPS-bound.
  5. Data transfer: Keep services in the same AZ where possible. Use VPC endpoints for AWS services.
  6. Scheduled scaling: Scale down non-production environments outside business hours.
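Tip 6 can be implemented in-cluster with a CronJob that scales non-production Deployments to zero after hours. A sketch, assuming a `scaler` ServiceAccount with RBAC permission to patch Deployments in a `staging` namespace (all names and the schedule are illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-staging
  namespace: staging
spec:
  schedule: "0 19 * * 1-5"      # 19:00 on weekdays; pair with a morning scale-up job
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler   # needs RBAC to patch deployments/scale
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.29   # illustrative image tag
              command:
                - kubectl
                - scale
                - deploy
                - --all
                - --replicas=0
                - -n
                - staging
```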