Deployment Tiers

Choose a tier based on your traffic volume, latency requirements, and budget.
| Tier | Compute | Database | Cache | Monthly Cost | Decisions/sec | P99 Latency |
|---|---|---|---|---|---|---|
| Hobby | App Runner 0.25 vCPU / 0.5 GB | Supabase Free (500 MB) | Upstash Free (10K cmd/day) | ~$6-8 | 5-10 | ~800ms |
| Startup | App Runner 1 vCPU / 2 GB | Neon Pro or RDS t3.micro | Upstash Pro or ElastiCache t3.micro | ~$50-80 | 50-100 | ~200ms |
| Growth | ECS 2x t3.medium | RDS r6g.large (multi-AZ) | ElastiCache r6g.large | ~$300-500 | 500-1K | ~100ms |
| Enterprise | EKS 4+ nodes (c6g.xlarge) | Aurora r6g.xlarge (multi-AZ) | ElastiCache cluster (3 shards) | ~$1,500-3K+ | 5K-50K+ | ~50ms |
The Hobby tier is perfect for evaluation and proof-of-concept work. You can scale up to Startup with minimal configuration changes when you are ready for production traffic.

Response Time Breakdown

What happens during a /api/v1/recommend call:
| Stage | Description | Typical Duration |
|---|---|---|
| Request parsing & validation | Zod schema validation | 1-2ms |
| Inventory lookup | Load active offers from DB | 5-15ms |
| Enrichment (if configured) | Query schema tables for customer data | 10-30ms |
| Qualification | Apply eligibility rules | 2-5ms |
| Contact policy check | Frequency cap, cooldown evaluation | 2-5ms |
| Scoring | Run scoring model (scorecard/Bayesian/etc.) | 5-20ms |
| Ranking | Sort and select top N | 1-3ms |
| Arbitration (if configured) | Multi-objective optimization | 3-10ms |
| Response serialization | Build JSON response | 1-2ms |
| Total P50 | | 30-90ms |
| Total P95 | | 80-200ms |
| Total P99 | | 120-400ms |
These timings are for a typical pipeline with 50-100 candidate offers. Latency scales roughly linearly with candidate count.
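
Per-stage timings like those in the table can be captured with a thin wrapper around each pipeline step. A minimal sketch, assuming nothing about the product's internals (`timeStage`, `demo`, and the mock stages are illustrative, not the real pipeline API):

```typescript
// Minimal per-stage timing instrumentation (illustrative, not the product API).
type StageTimings = Record<string, number>;

async function timeStage<T>(
  timings: StageTimings,
  name: string,
  fn: () => Promise<T> | T,
): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    timings[name] = performance.now() - start; // duration in milliseconds
  }
}

// Example: measure two mock stages of a recommend-style pipeline.
async function demo(): Promise<StageTimings> {
  const timings: StageTimings = {};
  const offers = await timeStage(timings, "inventory_lookup", () => [
    "offer-b",
    "offer-a",
  ]);
  await timeStage(timings, "ranking", () => offers.sort());
  return timings;
}
```

Summing the recorded stages against the table above is a quick way to spot which step dominates your P99.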

Scaling Levers

Horizontal scaling

Add more App Runner or ECS instances. Each instance handles approximately 100-500 req/s depending on pipeline complexity. App Runner auto-scales based on concurrency — set the MaxConcurrency parameter to control when new instances spin up.
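
As a rough capacity calculation (a sketch; the per-instance throughput figure comes from the 100-500 req/s range quoted above, and `instancesNeeded` is an illustrative helper, not a product API):

```typescript
// Rough instance-count estimate: divide target load by per-instance capacity
// and round up, keeping at least one instance.
function instancesNeeded(targetRps: number, perInstanceRps: number): number {
  if (perInstanceRps <= 0) throw new Error("perInstanceRps must be positive");
  return Math.max(1, Math.ceil(targetRps / perInstanceRps));
}

// e.g. 1,000 req/s with a complex pipeline handling ~200 req/s per instance
// needs 5 instances; leave headroom for traffic spikes on top of this.
```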

Vertical database scaling

A larger database instance reduces query time. This is most impactful when qualification rules or enrichment queries are complex. Moving from t3.micro to r6g.large can cut DB-bound latency by 60-70%.

Enrichment caching

Enrichment caching reduces database load by 80%+. A cache hit resolves in ~1-2ms versus ~10-30ms for a cache miss. Set the REDIS_URL environment variable to enable caching in production. This is the single highest-impact optimization.
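
The cache-aside pattern behind this looks roughly as follows. A minimal sketch: a `Map` with expiry timestamps stands in for the Redis client configured via REDIS_URL, and the names (`enrich`, `loadFromDb`, the 5-minute TTL) are illustrative assumptions:

```typescript
// Cache-aside enrichment lookup. A Map with expiry timestamps stands in for
// Redis here; in production the same logic runs against REDIS_URL.
type Loader = (customerId: string) => Promise<Record<string, unknown>>;

const TTL_MS = 5 * 60 * 1000; // assumed 5-minute TTL; tune to data freshness
const cache = new Map<
  string,
  { value: Record<string, unknown>; expires: number }
>();

async function enrich(customerId: string, loadFromDb: Loader) {
  const hit = cache.get(customerId);
  if (hit && hit.expires > Date.now()) return hit.value; // ~1-2ms hit path
  const value = await loadFromDb(customerId);            // ~10-30ms miss path
  cache.set(customerId, { value, expires: Date.now() + TTL_MS });
  return value;
}
```

The TTL trades freshness for hit rate: longer TTLs push the hit rate (and the 80%+ load reduction) up, at the cost of serving staler customer attributes.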

Read replicas

Use read replicas for dashboard and analytics queries. Keep the primary instance dedicated to decision writes and real-time reads. RDS and Aurora both support up to 15 read replicas.
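
Routing by query intent can be as simple as choosing a connection pool per call site. A sketch under stated assumptions: `Pool` is a stand-in interface, and READ_REPLICA_URL is an assumed environment variable name, not a documented setting:

```typescript
// Route queries by intent: writes and real-time decision reads go to the
// primary; dashboard/analytics reads go to a replica when one is configured.
interface Pool {
  url: string;
}

const primary: Pool = {
  url: process.env.DATABASE_URL ?? "postgres://primary.internal/decisions",
};
// Fall back to the primary when no replica is configured (assumed env var).
const replica: Pool = process.env.READ_REPLICA_URL
  ? { url: process.env.READ_REPLICA_URL }
  : primary;

function poolFor(intent: "write" | "realtime-read" | "analytics"): Pool {
  return intent === "analytics" ? replica : primary;
}
```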

Connection pooling

PgBouncer reduces PostgreSQL connection overhead and is essential at more than 50 concurrent connections. The Helm chart includes a PgBouncer sidecar by default. For App Runner, use Supabase’s built-in connection pooler or deploy PgBouncer separately.
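
A standalone PgBouncer deployment needs only a short config. A minimal sketch with illustrative values (host, database name, and pool sizes are assumptions to adjust for your workload):

```ini
; pgbouncer.ini - illustrative values, tune to your connection limits
[databases]
decisions = host=db.internal port=5432 dbname=decisions

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction   ; transaction pooling suits short decision queries
max_client_conn = 500     ; client connections PgBouncer will accept
default_pool_size = 20    ; actual PostgreSQL connections per database/user
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
```

Transaction pooling keeps the PostgreSQL connection count small even when hundreds of app instances connect, which is what makes the 50+ concurrent-connection threshold manageable.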

Batch processing

For non-real-time use cases, use the /api/v1/decide endpoint with batch customer lists. Batch mode amortizes connection and parsing overhead across many decisions, achieving higher throughput at the cost of individual response latency.
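
A batch client typically chunks a large customer list and submits one request per chunk. A minimal sketch, assuming a JSON payload shape and a 500-customer chunk size that are illustrative, not the documented request format:

```typescript
// Split a customer list into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Submit each batch to /api/v1/decide (payload shape is an assumption).
async function decideAll(baseUrl: string, customerIds: string[]): Promise<void> {
  for (const batch of chunk(customerIds, 500)) {
    await fetch(`${baseUrl}/api/v1/decide`, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ customers: batch }),
    });
  }
}
```

Larger chunks amortize more overhead per request but increase per-response latency and memory, which is usually an acceptable trade for offline jobs.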

Cost Optimization Tips

Start small

Begin with the Hobby tier for evaluation. Scale up only when you have real traffic that demands it.

Free database tiers

Supabase and Neon free tiers provide sufficient PostgreSQL capacity for development and small production workloads.

Enable Redis early

Redis caching is the single biggest performance win. Enable it before scaling compute.

Pay-per-request pricing

App Runner charges per request-second, making it cost-effective for unpredictable or bursty traffic patterns.

Graduate to containers

Move to ECS or EKS only when you need sustained throughput above 100 req/s. Container orchestration adds operational overhead.

Spot instances for workers

Use Spot instances for worker pods handling non-latency-sensitive batch processing. Spot pricing can reduce compute costs by 60-90%.
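
On EKS, pinning batch workers to Spot capacity can be done with a node selector, since managed node groups label Spot nodes automatically. A minimal sketch (deployment name, labels, and image are illustrative placeholders):

```yaml
# Pin batch worker pods to Spot capacity on EKS. The capacityType label is
# applied automatically by EKS managed node groups; names are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  replicas: 2
  selector:
    matchLabels: { app: batch-worker }
  template:
    metadata:
      labels: { app: batch-worker }
    spec:
      nodeSelector:
        eks.amazonaws.com/capacityType: SPOT
      containers:
        - name: worker
          image: example.com/batch-worker:latest  # placeholder image
```

Keep latency-sensitive API pods on On-Demand nodes; Spot interruptions are acceptable for retryable batch work but not for live decision traffic.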