This document provides sizing guidance, cost estimates, and scaling thresholds for operating KaireonAI across three deployment tiers.
1. Tier Overview
| Tier | Nodes | RPS Capacity | Estimated Monthly Cost | Typical Use Case |
|---|---|---|---|---|
| Startup | 2x t3.large | < 100 RPS | ~$150-220/mo | Proof of concept, small teams |
| Growth | 3-4x t3.xlarge | 100-1,000 RPS | ~$480-600/mo | Production workloads, mid-market |
| Enterprise | 6+ m6i.xlarge | 1,000+ RPS | ~$2,000-2,500/mo | High-volume, multi-tenant, regulated |
2. Startup Tier
Target: Teams evaluating KaireonAI or running low-volume production workloads with fewer than 100 requests per second.
2.1 Compute
| Component | Spec | Notes |
|---|---|---|
| EKS Nodes | 2x t3.large (2 vCPU, 8 GiB) | Managed node group |
| Next.js App | 2 replicas, 512Mi RAM, 250m CPU | Covers API + UI |
| Pipeline Workers | 1 replica, 512Mi RAM, 250m CPU | Batch processing |
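These pod specs map directly onto Kubernetes resource requests and limits. A minimal sketch for the application Deployment, using the Startup requests above and the limits from Section 5.1 (the name and image are placeholders, not KaireonAI's published manifest):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kaireonai-app              # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: kaireonai-app
  template:
    metadata:
      labels:
        app: kaireonai-app
    spec:
      containers:
        - name: app
          image: kaireonai/app:latest   # placeholder image
          resources:
            requests:
              cpu: 250m                 # Startup-tier request from the table above
              memory: 512Mi
            limits:
              cpu: 500m                 # limits per Section 5.1
              memory: 1Gi
```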
2.2 Data Stores
| Component | Spec | Notes |
|---|---|---|
| PostgreSQL | In-cluster (Helm), 20 GiB EBS | Single instance, no replication |
| Redis | In-cluster (Helm), 1 GiB | Session cache, rate limiting |
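Both stores can be installed with standard Helm charts. A values sketch for PostgreSQL, assuming the Bitnami chart layout (key names differ between charts and chart versions); the Redis chart follows the same pattern:

```yaml
# values.yaml for in-cluster PostgreSQL (Bitnami chart layout assumed)
architecture: standalone      # single instance, no replication
primary:
  persistence:
    size: 20Gi                # 20 GiB EBS-backed volume per the table above
auth:
  existingSecret: postgres-credentials   # placeholder secret name
```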
2.3 Cost Breakdown
| Item | Monthly Cost |
|---|---|
| 2x t3.large on-demand | ~$120 |
| EBS (30 GiB gp3) | ~$8 |
| EKS control plane | $0 (free tier) or ~$73 |
| Data transfer | ~$5 |
| Total | ~$150-220 |
2.4 Limitations
- No database failover. A PostgreSQL pod restart causes brief downtime.
- Not suitable for workloads requiring high availability or disaster recovery.
- Pipeline throughput is limited to a single worker.
3. Growth Tier
Target: Production deployments serving 100 to 1,000 requests per second with availability requirements.
3.1 Compute
| Component | Spec | Notes |
|---|---|---|
| EKS Nodes | 3-4x t3.xlarge (4 vCPU, 16 GiB) | Managed node group, multi-AZ |
| Next.js App | 3-4 replicas, 1Gi RAM, 500m CPU | HPA enabled, target 70% CPU |
| Pipeline Workers | 2 replicas, 1Gi RAM, 500m CPU | Parallel pipeline execution |
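The HPA noted in the table above is a standard autoscaling/v2 resource. A sketch, assuming the Deployment is named kaireonai-app (a placeholder):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kaireonai-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kaireonai-app        # placeholder Deployment name
  minReplicas: 3
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # target from the table above
```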
3.2 Data Stores
| Component | Spec | Notes |
|---|---|---|
| PostgreSQL | RDS db.t3.medium (2 vCPU, 4 GiB) | Automated backups, single-AZ |
| Redis | ElastiCache cache.t3.small (1.5 GiB) | Single node, snapshot backups |
3.3 Cost Breakdown
| Item | Monthly Cost |
|---|---|
| 3x t3.xlarge on-demand | ~$290 |
| EKS control plane | ~$73 |
| RDS db.t3.medium | ~$55 |
| ElastiCache cache.t3.small | ~$25 |
| EBS + storage | ~$15 |
| Data transfer | ~$20 |
| Total | ~$480-600 |
3.4 Key Improvements Over Startup
- Managed database with automated backups and point-in-time recovery.
- Horizontal Pod Autoscaler for the application tier.
- Multi-AZ node placement for compute resilience.
- Dedicated Redis for consistent cache performance.
4. Enterprise Tier
Target: High-volume production deployments exceeding 1,000 requests per second with strict availability, compliance, and multi-region requirements.
4.1 Compute
| Component | Spec | Notes |
|---|---|---|
| EKS Nodes | 6+ m6i.xlarge (4 vCPU, 16 GiB) | Multi-AZ, cluster autoscaler |
| Next.js App | 6+ replicas, 2Gi RAM, 1 CPU | HPA + PDB (minAvailable: 3) |
| Pipeline Workers | 3-4 replicas, 2Gi RAM, 1 CPU | Autoscaled on queue depth |
| Decision Cache | Dedicated Redis read replicas | Sub-millisecond cached decisions |
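The PodDisruptionBudget referenced above keeps at least three application pods running through voluntary disruptions. A sketch (the selector label is a placeholder):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kaireonai-app
spec:
  minAvailable: 3              # matches the table above
  selector:
    matchLabels:
      app: kaireonai-app       # placeholder label
```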
4.2 Data Stores
| Component | Spec | Notes |
|---|---|---|
| PostgreSQL | RDS db.r6g.large (2 vCPU, 16 GiB) Multi-AZ | Read replicas, IAM auth, encrypted |
| Redis | ElastiCache cluster (3 shards, 2 replicas) | Cluster mode, auto-failover |
4.3 Cost Breakdown
| Item | Monthly Cost |
|---|---|
| 6x m6i.xlarge on-demand | ~$690 |
| EKS control plane | ~$73 |
| RDS db.r6g.large Multi-AZ | ~$400 |
| RDS read replica | ~$200 |
| ElastiCache cluster (3 shards) | ~$450 |
| EBS + storage | ~$50 |
| Data transfer + NAT gateway | ~$100 |
| WAF + Shield Standard | ~$50 |
| Total | ~$2,000-2,500 |
4.4 Key Improvements Over Growth
- Multi-AZ RDS with synchronous replication and automatic failover.
- Read replicas to offload analytics and reporting queries.
- ElastiCache cluster mode for horizontal cache scaling.
- Pod Disruption Budgets keep the application above its minimum replica count during voluntary disruptions such as node drains and cluster upgrades.
- Cluster Autoscaler adjusts node count based on pending pod demand.
5. Component Sizing Guide
5.1 Next.js Application Pods
| Metric | Startup | Growth | Enterprise |
|---|---|---|---|
| Replicas | 2 | 3-4 (HPA) | 6+ (HPA) |
| CPU request/limit | 250m / 500m | 500m / 1 | 1 / 2 |
| Memory request/limit | 512Mi / 1Gi | 1Gi / 2Gi | 2Gi / 4Gi |
| HPA target | N/A | 70% CPU | 70% CPU |
5.2 Pipeline Workers
| Metric | Startup | Growth | Enterprise |
|---|---|---|---|
| Replicas | 1 | 2 | 3-4 (KEDA) |
| CPU request/limit | 250m / 500m | 500m / 1 | 1 / 2 |
| Memory request/limit | 512Mi / 1Gi | 1Gi / 2Gi | 2Gi / 4Gi |
| Scaling trigger | N/A | Manual | Queue depth |
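Queue-depth autoscaling at the Enterprise tier can be expressed as a KEDA ScaledObject. A sketch assuming the job queue is a Redis list; the Deployment name, Redis address, and list name are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: pipeline-workers
spec:
  scaleTargetRef:
    name: pipeline-workers       # placeholder Deployment name
  minReplicaCount: 3
  maxReplicaCount: 4
  triggers:
    - type: redis
      metadata:
        address: redis:6379      # placeholder Redis address
        listName: pipeline-jobs  # placeholder queue list
        listLength: "10"         # target backlog per replica before scaling out
```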
5.3 PostgreSQL
| Metric | Startup | Growth | Enterprise |
|---|---|---|---|
| Instance | In-cluster pod | RDS db.t3.medium | RDS db.r6g.large Multi-AZ |
| Storage | 20 GiB gp3 | 50 GiB gp3 | 200 GiB io2 (3000 IOPS) |
| Max connections | 100 | 200 | 500 |
| Backups | Manual | Automated (7 days) | Automated (30 days) + snapshots |
| Read replicas | 0 | 0 | 1-2 |
5.4 Redis
| Metric | Startup | Growth | Enterprise |
|---|---|---|---|
| Instance | In-cluster pod | cache.t3.small | Cluster mode (3 shards) |
| Memory | 1 GiB | 1.5 GiB | 3x 6.5 GiB (19.5 GiB total) |
| Persistence | None | Snapshot | AOF + snapshot |
| Failover | None | None | Automatic (Multi-AZ) |
6. Monitoring Thresholds and Scaling Triggers
Use the following thresholds to determine when to scale up or transition to the next tier.
6.1 Compute Scaling Triggers
| Metric | Warning Threshold | Action |
|---|---|---|
| Node CPU utilization (avg) | > 70% sustained | Add nodes or increase instance size |
| Node memory utilization (avg) | > 75% sustained | Add nodes or increase instance size |
| Pod CPU throttling | > 10% of periods | Increase CPU limits or add replicas |
| Pending pods (unschedulable) | > 0 for 5 min | Enable cluster autoscaler or add nodes |
| HPA at max replicas | Sustained 15 min | Increase maxReplicas or node capacity |
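These triggers are straightforward to encode as alerts. A sketch of the first one as a Prometheus Operator rule, assuming node-exporter metrics are being scraped (rule and alert names are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: compute-scaling-alerts
spec:
  groups:
    - name: compute-scaling
      rules:
        - alert: NodeCPUHigh
          # average busy fraction across all nodes, derived from idle-time counters
          expr: (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.70
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Average node CPU above 70%: add nodes or increase instance size"
```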
6.2 Database Scaling Triggers
| Metric | Warning Threshold | Action |
|---|---|---|
| RDS CPU utilization | > 70% sustained | Upgrade instance class |
| RDS freeable memory | < 500 MiB | Upgrade instance class |
| RDS connection count | > 80% of max | Add read replicas or use PgBouncer |
| RDS read latency | > 10 ms avg | Add read replica for read-heavy queries |
| RDS free storage | < 20% | Enable autoscaling or increase volume |
| RDS IOPS utilization | > 80% of baseline | Upgrade to io2 or increase provisioned |
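For the connection-count trigger, PgBouncer lets many application connections share a small server-side pool. A pgbouncer.ini sketch delivered as a ConfigMap; the RDS endpoint, database name, and pool sizes are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pgbouncer-config
data:
  pgbouncer.ini: |
    [databases]
    kaireonai = host=example.rds.amazonaws.com port=5432 dbname=kaireonai
    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    pool_mode = transaction      ; typical mode for web workloads
    max_client_conn = 1000       ; application-side connections
    default_pool_size = 20       ; server-side connections per database
```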
6.3 Cache Scaling Triggers
| Metric | Warning Threshold | Action |
|---|---|---|
| Redis CPU utilization | > 65% | Upgrade instance or add shards |
| Redis memory utilization | > 80% | Increase instance size or add shards |
| Redis evictions | > 0 sustained | Increase memory or review TTL policies |
| Redis cache hit rate | < 90% | Review cache strategy, increase memory |
7. When to Upgrade Tiers
Startup to Growth
Upgrade when any of the following conditions persist for more than one week:
- Sustained RPS exceeds 80.
- Database connection count regularly exceeds 80.
- Application pod CPU consistently above 70%.
- Downtime from in-cluster database restarts is unacceptable.
- Business requires automated backups or point-in-time recovery.
Growth to Enterprise
Upgrade when any of the following conditions persist:
- Sustained RPS exceeds 800.
- Decision latency P99 approaches the 200ms SLO limit.
- Compliance requirements mandate Multi-AZ database or encryption at rest.
- Read replica is needed to offload analytics workloads.
- Cache evictions occur despite proper TTL tuning.
- Business requires 99.9% or higher availability with automatic failover.
8. Cost Optimization Tips
- Reserved Instances: Purchase 1-year reserved instances for predictable node types to save 30-40%.
- Spot Instances: Use spot instances for pipeline worker nodes (stateless, tolerant of interruption).
- Right-sizing: Review CloudWatch/Prometheus metrics monthly. Downsize over-provisioned instances.
- Storage tiering: Use gp3 for general workloads, io2 only when IOPS-bound.
- Data transfer: Keep services in the same AZ where possible. Use VPC endpoints for AWS services.
- Scheduled scaling: Scale down non-production environments outside business hours (see the sketch below).
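A minimal way to implement scheduled scaling is a CronJob that patches replica counts. This sketch assumes a ServiceAccount with permission to scale deployments; all names are placeholders, and a matching job would scale back up in the morning:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-staging
spec:
  schedule: "0 19 * * 1-5"       # 19:00 on weekdays; pair with a scale-up job
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: deployment-scaler   # placeholder, needs scale permissions
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command: ["kubectl", "scale", "deployment/kaireonai-app", "--replicas=0"]
```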