1. Tier Overview
| Tier | Nodes | RPS Capacity | Estimated Monthly Cost | Typical Use Case |
|---|---|---|---|---|
| Startup | 2 EKS (t3.large) | < 100 RPS | ~$150/mo | Proof of concept, small teams |
| Growth | 3-4 EKS nodes | 100-1,000 RPS | ~$400-600/mo | Production workloads, mid-market |
| Enterprise | 6+ EKS nodes | 1,000+ RPS | ~$2,000+/mo | High-volume, multi-tenant, regulated |
2. Startup Tier
Target: Teams evaluating KaireonAI or running low-volume production workloads with fewer than 100 requests per second.
2.1 Compute
| Component | Spec | Notes |
|---|---|---|
| EKS Nodes | 2x t3.large (2 vCPU, 8 GiB) | Managed node group |
| Next.js App | 2 replicas, 512Mi RAM, 250m CPU | Covers API + UI |
| Pipeline Workers | 1 replica, 512Mi RAM, 250m CPU | Batch processing |
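The pod sizing above maps directly to Kubernetes resource requests and limits. A minimal sketch of the Next.js Deployment (the name `kaireon-app` and the image are illustrative placeholders, not actual chart values; limits follow the sizing guide in section 5.1):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kaireon-app            # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: kaireon-app
  template:
    metadata:
      labels:
        app: kaireon-app
    spec:
      containers:
        - name: app
          image: kaireon/app:latest   # placeholder image reference
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: 500m               # per section 5.1 (Startup column)
              memory: 1Gi
```

Setting requests at exactly the table values lets the scheduler pack both replicas onto the two t3.large nodes with headroom for the pipeline worker.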
2.2 Data Stores
| Component | Spec | Notes |
|---|---|---|
| PostgreSQL | In-cluster (Helm), 20 GiB EBS | Single instance, no replication |
| Redis | In-cluster (Helm), 1 GiB | Session cache, rate limiting |
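For the in-cluster stores, a Helm values sketch is shown below. The guide does not name specific charts; the keys here assume the Bitnami postgresql and redis charts and should be adapted to whichever charts you actually use:

```yaml
# values-postgresql.yaml -- sketch, Bitnami postgresql chart keys assumed
architecture: standalone        # single instance, no replication (section 2.4)
primary:
  persistence:
    size: 20Gi                  # matches the 20 GiB EBS spec above
    storageClass: gp3
---
# values-redis.yaml -- sketch, Bitnami redis chart keys assumed
architecture: standalone
master:
  persistence:
    enabled: false              # Startup tier: no persistence (section 5.4)
  resources:
    limits:
      memory: 1Gi               # 1 GiB cache per the spec above
```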
2.3 Cost Breakdown
| Item | Monthly Cost |
|---|---|
| 2x t3.large on-demand | ~$120 |
| EBS (30 GiB gp3) | ~$8 |
| EKS control plane | ~$73 |
| Data transfer | ~$5 |
| Total | ~$150-220 |
2.4 Limitations
- No database failover. A PostgreSQL pod restart causes brief downtime.
- Not suitable for workloads requiring high availability or disaster recovery.
- Pipeline throughput is limited to a single worker.
3. Growth Tier
Target: Production deployments serving 100 to 1,000 requests per second with availability requirements.
3.1 Compute
| Component | Spec | Notes |
|---|---|---|
| EKS Nodes | 3-4x t3.xlarge (4 vCPU, 16 GiB) | Managed node group, multi-AZ |
| Next.js App | 3-4 replicas, 1Gi RAM, 500m CPU | HPA enabled, target 70% CPU |
| Pipeline Workers | 2 replicas, 1Gi RAM, 500m CPU | Parallel pipeline execution |
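The HPA noted above (3-4 replicas, target 70% CPU) is a single `autoscaling/v2` manifest; the Deployment name is illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kaireon-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kaireon-app          # illustrative target Deployment
  minReplicas: 3
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when avg CPU exceeds 70% of requests
```

Note that utilization is measured against CPU requests, so the 500m request in 5.1 is the baseline the 70% target refers to.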
3.2 Data Stores
| Component | Spec | Notes |
|---|---|---|
| PostgreSQL | RDS db.t3.medium (2 vCPU, 4 GiB) | Automated backups, single-AZ |
| Redis | ElastiCache cache.t3.small (1.5 GiB) | Single node, snapshot backups |
3.3 Cost Breakdown
| Item | Monthly Cost |
|---|---|
| 3x t3.xlarge on-demand | ~$290 |
| EKS control plane | ~$73 |
| RDS db.t3.medium | ~$55 |
| ElastiCache cache.t3.small | ~$25 |
| EBS + storage | ~$15 |
| Data transfer | ~$20 |
| Total | ~$480-600 |
3.4 Key Improvements Over Startup
- Managed database with automated backups and point-in-time recovery.
- Horizontal Pod Autoscaler for the application tier.
- Multi-AZ node placement for compute resilience.
- Dedicated Redis for consistent cache performance.
4. Enterprise Tier
Target: High-volume production deployments exceeding 1,000 requests per second with strict availability, compliance, and multi-region requirements.
4.1 Compute
| Component | Spec | Notes |
|---|---|---|
| EKS Nodes | 6+ m6i.xlarge (4 vCPU, 16 GiB) | Multi-AZ, cluster autoscaler |
| Next.js App | 6+ replicas, 2Gi RAM, 1 CPU | HPA + PDB (minAvailable: 3) |
| Pipeline Workers | 3-4 replicas, 2Gi RAM, 1 CPU | Autoscaled on queue depth |
| Decision Cache | Dedicated Redis read replicas | Sub-millisecond cached decisions |
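The `minAvailable: 3` Pod Disruption Budget referenced above is a one-resource manifest; the label selector is illustrative and must match the app Deployment's labels:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kaireon-app-pdb
spec:
  minAvailable: 3          # node drains and other voluntary evictions keep >= 3 app pods
  selector:
    matchLabels:
      app: kaireon-app     # illustrative label
```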
4.2 Data Stores
| Component | Spec | Notes |
|---|---|---|
| PostgreSQL | RDS db.r6g.large (2 vCPU, 16 GiB) Multi-AZ | Read replicas, IAM auth, encrypted |
| Redis | ElastiCache cluster (3 shards, 2 replicas) | Cluster mode, auto-failover |
4.3 Cost Breakdown
| Item | Monthly Cost |
|---|---|
| 6x m6i.xlarge on-demand | ~$690 |
| EKS control plane | ~$73 |
| RDS db.r6g.large Multi-AZ | ~$400 |
| RDS read replica | ~$200 |
| ElastiCache cluster (3 shards) | ~$450 |
| EBS + storage | ~$50 |
| Data transfer + NAT gateway | ~$100 |
| WAF + Shield Standard | ~$50 |
| Total | ~$2,000-2,500 |
4.4 Key Improvements Over Growth
- Multi-AZ RDS with synchronous replication and automatic failover.
- Read replicas to offload analytics and reporting queries.
- ElastiCache cluster mode for horizontal cache scaling.
- Pod Disruption Budgets ensure voluntary disruptions (node drains during upgrades, for example) never drop below minimum replicas.
- Cluster Autoscaler adjusts node count based on pending pod demand.
5. Component Sizing Guide
5.1 Next.js Application Pods
| Metric | Startup | Growth | Enterprise |
|---|---|---|---|
| Replicas | 2 | 3-4 (HPA) | 6+ (HPA) |
| CPU request/limit | 250m / 500m | 500m / 1 | 1 / 2 |
| Memory request/limit | 512Mi / 1Gi | 1Gi / 2Gi | 2Gi / 4Gi |
| HPA target | N/A | 70% CPU | 70% CPU |
5.2 Pipeline Workers
| Metric | Startup | Growth | Enterprise |
|---|---|---|---|
| Replicas | 1 | 2 | 3-4 (KEDA) |
| CPU request/limit | 250m / 500m | 500m / 1 | 1 / 2 |
| Memory request/limit | 512Mi / 1Gi | 1Gi / 2Gi | 2Gi / 4Gi |
| Scaling trigger | N/A | Manual | Queue depth |
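The queue-depth trigger maps to a KEDA `ScaledObject`. The sketch below assumes the pipeline queue is a Redis list, which the guide does not specify; substitute the appropriate KEDA scaler (SQS, RabbitMQ, etc.) for your actual queue backend. All names and the address are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: pipeline-workers
spec:
  scaleTargetRef:
    name: pipeline-workers        # illustrative worker Deployment name
  minReplicaCount: 3
  maxReplicaCount: 4
  triggers:
    - type: redis
      metadata:
        address: redis-host:6379  # placeholder Redis address
        listName: pipeline-jobs   # hypothetical queue list name
        listLength: "50"          # scale out when backlog exceeds 50 jobs
```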
5.3 PostgreSQL
| Metric | Startup | Growth | Enterprise |
|---|---|---|---|
| Instance | In-cluster pod | RDS db.t3.medium | RDS db.r6g.large Multi-AZ |
| Storage | 20 GiB gp3 | 50 GiB gp3 | 200 GiB io2 (3000 IOPS) |
| Max connections | 100 | 200 | 500 |
| Backups | Manual | Automated (7 days) | Automated (30 days) + snapshots |
| Read replicas | 0 | 0 | 1-2 |
5.4 Redis
| Metric | Startup | Growth | Enterprise |
|---|---|---|---|
| Instance | In-cluster pod | cache.t3.small | Cluster mode (3 shards) |
| Memory | 1 GiB | 1.5 GiB | 3x 6.5 GiB (19.5 GiB total) |
| Persistence | None | Snapshot | AOF + snapshot |
| Failover | None | None | Automatic (Multi-AZ) |
6. Monitoring Thresholds and Scaling Triggers
Use the following thresholds to determine when to scale up or transition to the next tier.
6.1 Compute Scaling Triggers
| Metric | Warning Threshold | Action |
|---|---|---|
| Node CPU utilization (avg) | > 70% sustained | Add nodes or increase instance size |
| Node memory utilization (avg) | > 75% sustained | Add nodes or increase instance size |
| Pod CPU throttling | > 10% of periods | Increase CPU limits or add replicas |
| Pending pods (unschedulable) | > 0 for 5 min | Enable cluster autoscaler or add nodes |
| HPA at max replicas | Sustained 15 min | Increase maxReplicas or node capacity |
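With Prometheus (mentioned alongside CloudWatch in section 8), the first two triggers above can be encoded as alerting rules. Metric names assume standard node-exporter defaults:

```yaml
groups:
  - name: compute-scaling
    rules:
      - alert: NodeCPUHigh
        # percent busy = 100 - percent idle, averaged across nodes
        expr: |
          100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 70
        for: 15m
        annotations:
          summary: "Avg node CPU > 70% sustained -- add nodes or increase instance size"
      - alert: NodeMemoryHigh
        expr: |
          (1 - avg(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 75
        for: 15m
        annotations:
          summary: "Avg node memory > 75% sustained -- add nodes or increase instance size"
```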
6.2 Database Scaling Triggers
| Metric | Warning Threshold | Action |
|---|---|---|
| RDS CPU utilization | > 70% sustained | Upgrade instance class |
| RDS freeable memory | < 500 MiB | Upgrade instance class |
| RDS connection count | > 80% of max | Add read replicas or use PgBouncer |
| RDS read latency | > 10 ms avg | Add read replica for read-heavy queries |
| RDS free storage | < 20% | Enable autoscaling or increase volume |
| RDS IOPS utilization | > 80% of baseline | Upgrade to io2 or increase provisioned |
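When connection count is the bottleneck, PgBouncer in transaction-pooling mode multiplexes many client connections onto a small pool of RDS connections. A sketch of an in-cluster deployment's config, as a ConfigMap (endpoint, database name, and pool sizes are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pgbouncer-config
data:
  pgbouncer.ini: |
    [databases]
    kaireon = host=your-rds-endpoint.rds.amazonaws.com port=5432 dbname=kaireon

    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    pool_mode = transaction      ; release server conn at transaction end
    max_client_conn = 1000       ; clients PgBouncer will accept
    default_pool_size = 20       ; actual RDS connections per db/user pair
```

With `default_pool_size = 20`, even 1,000 application connections consume only 20 of the instance's `max_connections`, keeping the RDS connection count well under the 80% trigger.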
6.3 Cache Scaling Triggers
| Metric | Warning Threshold | Action |
|---|---|---|
| Redis CPU utilization | > 65% | Upgrade instance or add shards |
| Redis memory utilization | > 80% | Increase instance size or add shards |
| Redis evictions | > 0 sustained | Increase memory or review TTL policies |
| Redis cache hit rate | < 90% | Review cache strategy, increase memory |
7. When to Upgrade Tiers
Startup to Growth
Upgrade when any of the following conditions persist for more than one week:
- Sustained RPS exceeds 80.
- Database connection count regularly exceeds 80.
- Application pod CPU consistently above 70%.
- Downtime from in-cluster database restarts is unacceptable.
- Business requires automated backups or point-in-time recovery.
Growth to Enterprise
Upgrade when any of the following conditions persist:
- Sustained RPS exceeds 800.
- Decision latency P99 approaches the 200ms SLO limit.
- Compliance requirements mandate Multi-AZ database or encryption at rest.
- Read replica is needed to offload analytics workloads.
- Cache evictions occur despite proper TTL tuning.
- Business requires 99.9% or higher availability with automatic failover.
8. Cost Optimization Tips
- Reserved Instances: Purchase 1-year reserved instances for predictable node types to save 30-40%.
- Spot Instances: Use spot instances for pipeline worker nodes (stateless, tolerant of interruption).
- Right-sizing: Review CloudWatch/Prometheus metrics monthly. Downsize over-provisioned instances.
- Storage tiering: Use gp3 for general workloads, io2 only when IOPS-bound.
- Data transfer: Keep services in the same AZ where possible. Use VPC endpoints for AWS services.
- Scheduled scaling: Scale down non-production environments outside business hours.
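The spot-instance tip above usually means a dedicated spot node group, tainted so that only interruption-tolerant pods land on it. A sketch of the pod-side scheduling stanza for the pipeline-worker Deployment (the taint key and label are illustrative; match them to how your node group is actually tainted and labeled):

```yaml
# Pod spec fragment for the pipeline-worker Deployment
spec:
  nodeSelector:
    node-lifecycle: spot          # illustrative label on the spot node group
  tolerations:
    - key: node-lifecycle         # illustrative taint applied to spot nodes
      operator: Equal
      value: spot
      effect: NoSchedule
```

The inverse also matters: because stateful or latency-sensitive pods lack this toleration, the taint keeps them off interruptible capacity.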