Deployment Tiers
Choose a tier based on your traffic volume, latency requirements, and budget.
| Tier | Compute | Database | Cache | Monthly Cost | Decisions/sec | P99 Latency |
|---|---|---|---|---|---|---|
| Hobby | App Runner 0.25 vCPU / 0.5 GB | Supabase Free (500 MB) | Upstash Free (10K cmd/day) | ~$6-8 | 5-10 | ~800ms |
| Startup | App Runner 1 vCPU / 2 GB | Neon Pro or RDS t3.micro | Upstash Pro or ElastiCache t3.micro | ~$50-80 | 50-100 | ~200ms |
| Growth | ECS 2x t3.medium | RDS r6g.large (multi-AZ) | ElastiCache r6g.large | ~$300-500 | 500-1K | ~100ms |
| Enterprise | EKS 4+ nodes (c6g.xlarge) | Aurora r6g.xlarge (multi-AZ) | ElastiCache cluster (3 shards) | ~$1,500-3K+ | 5K-50K+ | ~50ms |
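As a rough rule of thumb, the tier choice can be keyed off sustained decision volume. The sketch below mirrors the Decisions/sec column; the function name and thresholds are illustrative guidance, not hard limits.

```typescript
// Rough tier picker keyed off the Decisions/sec column in the table above.
// Treat the thresholds as guidance, not hard capacity limits.
function suggestTier(sustainedDecisionsPerSec: number): string {
  if (sustainedDecisionsPerSec <= 10) return "Hobby";
  if (sustainedDecisionsPerSec <= 100) return "Startup";
  if (sustainedDecisionsPerSec <= 1_000) return "Growth";
  return "Enterprise";
}
```

Latency requirements can override this: a workload at 50 decisions/sec that needs ~100ms P99 still belongs on the Growth tier.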
Response Time Breakdown
What happens during a /api/v1/recommend call:
| Stage | Description | Typical Duration |
|---|---|---|
| Request parsing & validation | Zod schema validation | 1-2ms |
| Inventory lookup | Load active offers from DB | 5-15ms |
| Enrichment (if configured) | Query schema tables for customer data | 10-30ms |
| Qualification | Apply eligibility rules | 2-5ms |
| Contact policy check | Frequency cap, cooldown evaluation | 2-5ms |
| Scoring | Run scoring model (scorecard/Bayesian/etc.) | 5-20ms |
| Ranking | Sort and select top N | 1-3ms |
| Arbitration (if configured) | Multi-objective optimization | 3-10ms |
| Response serialization | Build JSON response | 1-2ms |
| Total P50 | | 30-90ms |
| Total P95 | | 80-200ms |
| Total P99 | | 120-400ms |
These timings are for a typical pipeline with 50-100 candidate offers.
Latency scales roughly linearly with candidate count.
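To compare a live deployment against this breakdown, each stage can be wrapped in a simple timer. The sketch below is illustrative; `timeStage` and the stage names are not part of the product's API.

```typescript
// Per-stage timing sketch: wrap each pipeline stage to record its duration,
// then compare the recorded numbers against the breakdown table above.
async function timeStage<T>(
  name: string,
  timings: Record<string, number>,
  fn: () => Promise<T>,
): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    timings[name] = performance.now() - start; // duration in ms
  }
}

// Usage (stage functions are placeholders):
//   const timings: Record<string, number> = {};
//   const offers = await timeStage("inventory", timings, () => loadOffers());
//   const scored = await timeStage("scoring", timings, () => score(offers));
```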
Scaling Levers
Horizontal scaling
Add more App Runner or ECS instances. Each instance handles approximately 100-500 req/s
depending on pipeline complexity. App Runner auto-scales based on concurrency — set the
MaxConcurrency parameter to control when new instances spin up.
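A back-of-the-envelope instance count can be derived from those per-instance figures. The function below is illustrative, and the 250 req/s figure in the example is an assumed benchmark result, not a guarantee.

```typescript
// Rough capacity estimate for horizontal scaling. Per-instance throughput
// falls in the 100-500 req/s range noted above; use a figure from your own
// load tests rather than the bounds of that range.
function instancesNeeded(
  targetReqPerSec: number,
  perInstanceReqPerSec: number,
): number {
  return Math.max(1, Math.ceil(targetReqPerSec / perInstanceReqPerSec));
}

// e.g. 1,200 req/s against instances benchmarked at 250 req/s:
// instancesNeeded(1200, 250) → 5
```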
Vertical scaling
A larger database instance reduces query time. This is most impactful when qualification
rules or enrichment queries are complex. Moving from
t3.micro to r6g.large can cut
DB-bound latency by 60-70%.
Redis caching
Enrichment caching reduces database load by 80%+. A cache hit resolves in ~1-2ms versus
~10-30ms for a cache miss. Set the
REDIS_URL environment variable to enable caching in
production. This is the single highest-impact optimization.
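The enrichment cache follows a standard cache-aside pattern. In the sketch below an in-memory Map stands in for the Redis client configured via REDIS_URL, and `fetchEnrichment` is a placeholder for the real database query; the 60s TTL is an illustrative choice.

```typescript
// Cache-aside sketch for enrichment lookups. A hit returns in ~1-2ms;
// a miss pays the ~10-30ms database query and populates the cache.
type Entry = { value: unknown; expiresAt: number };
const cache = new Map<string, Entry>();
const TTL_MS = 60_000; // assumed freshness window for enrichment rows

async function getEnrichment(
  customerId: string,
  fetchEnrichment: (id: string) => Promise<unknown>, // placeholder DB query
): Promise<unknown> {
  const hit = cache.get(customerId);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // fast path
  const value = await fetchEnrichment(customerId);         // slow path
  cache.set(customerId, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}
```

With Redis the structure is the same; only the Map operations become GET/SET calls with a TTL.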
Read replicas
Use read replicas for dashboard and analytics queries. Keep the primary instance dedicated
to decision writes and real-time reads. RDS and Aurora both support up to 15 read replicas.
Connection pooling
PgBouncer reduces PostgreSQL connection overhead and is essential at more than 50 concurrent
connections. The Helm chart includes a PgBouncer sidecar by default. For App Runner, use
Supabase’s built-in connection pooler or deploy PgBouncer separately.
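On the application side, a client pool sized well below the pooler's limits keeps total connection counts predictable. The option names below follow node-postgres's Pool; the values and connection string are illustrative starting points, not recommendations from this chart.

```typescript
// Illustrative client-side pool settings (shape follows node-postgres's
// Pool options; the connection string is a placeholder). With PgBouncer
// in front, keep per-instance `max` small so connections stay bounded
// even as instances scale out.
const poolConfig = {
  connectionString: "postgresql://app:secret@pgbouncer:6432/decisions", // placeholder
  max: 10,                        // connections per app instance
  idleTimeoutMillis: 30_000,      // release idle connections after 30s
  connectionTimeoutMillis: 5_000, // fail fast when the pool is exhausted
};
```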
Batch mode
For non-real-time use cases, use the
/api/v1/decide endpoint with batch customer lists.
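A large customer list can be chunked client-side before calling the endpoint. In this sketch, the `{ customers }ˇ` body shape and the 500-per-request chunk size are assumptions, not a documented contract.

```typescript
// Split a customer list into fixed-size batches for /api/v1/decide.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical batch driver: one POST per chunk, results concatenated.
async function decideBatch(
  baseUrl: string,
  customerIds: string[],
): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const ids of chunk(customerIds, 500)) {
    const res = await fetch(`${baseUrl}/api/v1/decide`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ customers: ids }), // assumed request shape
    });
    results.push(...((await res.json()) as unknown[]));
  }
  return results;
}
```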
Batch mode amortizes connection and parsing overhead across many decisions, achieving
higher throughput at the cost of individual response latency.
Cost Optimization Tips
Start small
Begin with the Hobby tier for evaluation. Scale up only when you have real traffic
that demands it.
Free database tiers
Supabase and Neon free tiers provide sufficient PostgreSQL capacity for development
and small production workloads.
Enable Redis early
Redis caching is the single biggest performance win. Enable it before scaling compute.
Pay-per-request pricing
App Runner charges per request-second, making it cost-effective for unpredictable
or bursty traffic patterns.
Graduate to containers
Move to ECS or EKS only when you need sustained throughput above 100 req/s.
Container orchestration adds operational overhead.
Spot instances for workers
Use Spot instances for worker pods handling non-latency-sensitive batch processing.
Spot pricing can reduce compute costs by 60-90%.