Deployment Tiers
Choose a tier based on your traffic volume, latency requirements, and budget.
| Tier | Compute | Database | Cache | Monthly Cost | Decisions/sec | P99 Latency |
|---|---|---|---|---|---|---|
| Hobby | App Runner 0.25 vCPU / 0.5 GB | Supabase Free (500 MB) | Upstash Free (10K cmd/day) | ~$6-8 | 5-10 | ~800ms |
| Startup | App Runner 1 vCPU / 2 GB | Neon Pro or RDS t3.micro | Upstash Pro or ElastiCache t3.micro | ~$50-80 | 50-100 | ~200ms |
| Growth | ECS 2x t3.medium | RDS r6g.large (multi-AZ) | ElastiCache r6g.large | ~$300-500 | 500-1K | ~100ms |
| Enterprise | EKS 4+ nodes (c6g.xlarge) | Aurora r6g.xlarge (multi-AZ) | ElastiCache cluster (3 shards) | ~$1,500-3K+ | 5K-50K+ | ~50ms |
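As a rough rule of thumb, the tier choice can be keyed off sustained decision volume. The sketch below mirrors the Decisions/sec column; the function name and thresholds are illustrative guidance, not hard limits.

```typescript
// Rough tier picker keyed off the Decisions/sec column in the table above.
// Treat the thresholds as guidance, not hard capacity limits.
function suggestTier(sustainedDecisionsPerSec: number): string {
  if (sustainedDecisionsPerSec <= 10) return "Hobby";
  if (sustainedDecisionsPerSec <= 100) return "Startup";
  if (sustainedDecisionsPerSec <= 1_000) return "Growth";
  return "Enterprise";
}
```

Latency requirements can override this: a workload at 50 decisions/sec that needs ~100ms P99 still belongs on the Growth tier.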
Response Time Breakdown
What happens during a /api/v1/recommend call:
| Stage | Description | Typical Duration |
|---|---|---|
| Request parsing & validation | Zod schema validation | 1-2ms |
| Inventory lookup | Load active offers from DB | 5-15ms |
| Enrichment (if configured) | Query schema tables for customer data | 10-30ms |
| Qualification | Apply eligibility rules | 2-5ms |
| Contact policy check | Frequency cap, cooldown evaluation | 2-5ms |
| Scoring | Run scoring model (scorecard/Bayesian/etc.) | 5-20ms |
| Ranking | Sort and select top N | 1-3ms |
| Arbitration (if configured) | Multi-objective optimization | 3-10ms |
| Response serialization | Build JSON response | 1-2ms |
| Total P50 | | 30-90ms |
| Total P95 | | 80-200ms |
| Total P99 | | 120-400ms |
These timings are for a typical pipeline with 50-100 candidate offers.
Latency scales roughly linearly with candidate count.
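To compare a live deployment against this breakdown, each stage can be wrapped in a simple timer. The sketch below is illustrative; `timeStage` and the stage names are not part of the product's API.

```typescript
// Per-stage timing sketch: wrap each pipeline stage to record its duration,
// then compare the recorded numbers against the breakdown table above.
async function timeStage<T>(
  name: string,
  timings: Record<string, number>,
  fn: () => Promise<T>,
): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    timings[name] = performance.now() - start; // duration in ms
  }
}

// Usage (stage functions are placeholders):
//   const timings: Record<string, number> = {};
//   const offers = await timeStage("inventory", timings, () => loadOffers());
//   const scored = await timeStage("scoring", timings, () => score(offers));
```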
Scaling Levers
Horizontal scaling
Add more App Runner or ECS instances. Each instance handles approximately 100-500 req/s
depending on pipeline complexity. App Runner auto-scales based on concurrency — set the
MaxConcurrency parameter to control when new instances spin up.
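A back-of-the-envelope instance count can be derived from those per-instance figures. The function below is illustrative, and the 250 req/s figure in the example is an assumed benchmark result, not a guarantee.

```typescript
// Rough capacity estimate for horizontal scaling. Per-instance throughput
// falls in the 100-500 req/s range noted above; use a figure from your own
// load tests rather than the bounds of that range.
function instancesNeeded(
  targetReqPerSec: number,
  perInstanceReqPerSec: number,
): number {
  return Math.max(1, Math.ceil(targetReqPerSec / perInstanceReqPerSec));
}

// e.g. 1,200 req/s against instances benchmarked at 250 req/s:
// instancesNeeded(1200, 250) → 5
```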
Vertical scaling
A larger database instance reduces query time. This is most impactful when qualification
rules or enrichment queries are complex. Moving from
t3.micro to r6g.large can cut
DB-bound latency by 60-70%.
Redis caching
Enrichment caching reduces database load by 80%+. A cache hit resolves in ~1-2ms versus
~10-30ms for a cache miss. Set the
REDIS_URL environment variable to enable caching in
production. This is the single highest-impact optimization.
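The enrichment cache follows a standard cache-aside pattern. In the sketch below an in-memory Map stands in for the Redis client configured via REDIS_URL, and `fetchEnrichment` is a placeholder for the real database query; the 60s TTL is an illustrative choice.

```typescript
// Cache-aside sketch for enrichment lookups. A hit returns in ~1-2ms;
// a miss pays the ~10-30ms database query and populates the cache.
type Entry = { value: unknown; expiresAt: number };
const cache = new Map<string, Entry>();
const TTL_MS = 60_000; // assumed freshness window for enrichment rows

async function getEnrichment(
  customerId: string,
  fetchEnrichment: (id: string) => Promise<unknown>, // placeholder DB query
): Promise<unknown> {
  const hit = cache.get(customerId);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // fast path
  const value = await fetchEnrichment(customerId);         // slow path
  cache.set(customerId, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}
```

With Redis the structure is the same; only the Map operations become GET/SET calls with a TTL.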
Read replicas
Use read replicas for dashboard and analytics queries. Keep the primary instance dedicated
to decision writes and real-time reads. RDS and Aurora both support up to 15 read replicas.
Connection pooling
PgBouncer reduces PostgreSQL connection overhead and is essential at more than 50 concurrent
connections. The Helm chart includes a PgBouncer sidecar by default. For App Runner, use
Supabase’s built-in connection pooler or deploy PgBouncer separately.
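On the application side, a client pool sized well below the pooler's limits keeps total connection counts predictable. The option names below follow node-postgres's Pool; the values and connection string are illustrative starting points, not recommendations from this chart.

```typescript
// Illustrative client-side pool settings (shape follows node-postgres's
// Pool options; the connection string is a placeholder). With PgBouncer
// in front, keep per-instance `max` small so connections stay bounded
// even as instances scale out.
const poolConfig = {
  connectionString: "postgresql://app:secret@pgbouncer:6432/decisions", // placeholder
  max: 10,                        // connections per app instance
  idleTimeoutMillis: 30_000,      // release idle connections after 30s
  connectionTimeoutMillis: 5_000, // fail fast when the pool is exhausted
};
```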
Batch mode
For non-real-time use cases, use the
/api/v1/decide endpoint with batch customer lists.
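A large customer list can be chunked client-side before calling the endpoint. In this sketch, the `{ customers }ˇ` body shape and the 500-per-request chunk size are assumptions, not a documented contract.

```typescript
// Split a customer list into fixed-size batches for /api/v1/decide.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical batch driver: one POST per chunk, results concatenated.
async function decideBatch(
  baseUrl: string,
  customerIds: string[],
): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const ids of chunk(customerIds, 500)) {
    const res = await fetch(`${baseUrl}/api/v1/decide`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ customers: ids }), // assumed request shape
    });
    results.push(...((await res.json()) as unknown[]));
  }
  return results;
}
```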
Batch mode amortizes connection and parsing overhead across many decisions, achieving
higher throughput at the cost of individual response latency.
Cost Optimization Tips
Start small
Begin with the Hobby tier for evaluation. Scale up only when you have real traffic
that demands it.
Free database tiers
Supabase and Neon free tiers provide sufficient PostgreSQL capacity for development
and small production workloads.
Enable Redis early
Redis caching is the single biggest performance win. Enable it before scaling compute.
Pay-per-request pricing
App Runner charges per request-second, making it cost-effective for unpredictable
or bursty traffic patterns.
Graduate to containers
Move to ECS or EKS only when you need sustained throughput above 100 req/s.
Container orchestration adds operational overhead.
Spot instances for workers
Use Spot instances for worker pods handling non-latency-sensitive batch processing.
Spot pricing can reduce compute costs by 60-90%.