Overview
The chart packages everything needed for a production KaireonAI deployment:

| Component | Description |
|---|---|
| API Deployment | Next.js application with health checks and HPA |
| Worker Deployment | BullMQ background job processor for pipelines and model retraining |
| ML Worker | Optional Python/FastAPI service for AI-powered analysis |
| PostgreSQL | Internal StatefulSet or external managed database |
| Redis | Internal StatefulSet or external managed cache |
| PgBouncer | Connection pooling for PostgreSQL |
| Prometheus | Metrics collection with pre-configured scrape targets |
| Grafana | 6 auto-provisioned dashboards |
| Ingress | HTTPS with AWS ALB or nginx ingress controller |
| NetworkPolicies | Pod-to-pod and egress traffic restrictions |
| RBAC | ServiceAccounts, Roles, and RoleBindings |
Prerequisites
- Kubernetes 1.24+
- Helm 3.x
- kubectl configured for your cluster
- Container images pushed to a registry accessible from the cluster
Quick Start
Deployment Modes
KaireonAI supports three deployment modes depending on your environment and requirements.
Dev Mode — Single replicas, all in-cluster
Dev mode targets a single t3.medium node (~2 vCPU, 4 GiB).

| Service | Estimated 7-Day Cost |
|---|---|
| EKS Control Plane | $16.80 |
| EC2 Node (1x t3.medium) | $7.00 |
| ALB | $4.20 |
| EBS Storage (~10 GiB) | $1.00 |
| Total | ~$29 |
Dev mode is configured by values-minimal.yaml:
- API and worker: 1 replica each
- HPA and KEDA disabled
- Prometheus and Grafana disabled
- Reduced resource requests (256Mi memory, 250m CPU)
- Smaller PVC sizes (5 GiB database, 2 GiB Redis)
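The settings above could be expressed with the keys documented in the Values Reference. This is a sketch, not the actual contents of values-minimal.yaml; in particular, applying the reduced requests to both the API and the worker is an assumption.

```yaml
# Sketch of dev-mode overrides; the real values-minimal.yaml may differ.
api:
  replicas: 1
  hpa:
    enabled: false
  resources:
    requests:
      memory: "256Mi"
      cpu: "250m"
worker:
  replicas: 1
  keda:
    enabled: false
  resources:
    requests:            # assumption: same reduced requests as the API
      memory: "256Mi"
      cpu: "250m"
monitoring:
  prometheus:
    enabled: false
  grafana:
    enabled: false
database:
  internal:
    storage: 5Gi
redis:
  internal:
    storage: 2Gi
```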
App Mode — External database and Redis, production-ready
Full Mode — All components including monitoring stack
The default values.yaml enables full mode with 3 API replicas, 2 worker replicas, HPA, KEDA autoscaling, Prometheus, and Grafana.
Values Reference
All configurable values are defined in helm/values.yaml. The sections below document each configuration group.
Global
| Key | Type | Default | Description |
|---|---|---|---|
| namespace | string | kaireon | Kubernetes namespace for all resources |
API
api.* — API deployment configuration
| Key | Type | Default | Description |
|---|---|---|---|
| api.image.repository | string | ECR repo | Container image repository |
| api.image.tag | string | "082a1e2" | Image tag |
| api.image.pullPolicy | string | IfNotPresent | Image pull policy |
| api.replicas | int | 3 | Number of API replicas |
| api.resources.requests.memory | string | "512Mi" | Memory request |
| api.resources.requests.cpu | string | "500m" | CPU request |
| api.resources.limits.memory | string | "2Gi" | Memory limit |
| api.resources.limits.cpu | string | "2000m" | CPU limit |
| api.hpa.enabled | bool | true | Enable Horizontal Pod Autoscaler |
| api.hpa.minReplicas | int | 3 | Minimum replicas |
| api.hpa.maxReplicas | int | 20 | Maximum replicas |
| api.hpa.targetCPUUtilization | int | 70 | CPU target percentage for scaling |
| api.hpa.targetMemoryUtilization | int | 80 | Memory target percentage for scaling |
| api.env.NEXTAUTH_URL | string | "https://app.kaireon.com" | Public URL for NextAuth callbacks |
HPA scaling behavior:
- Scale up: stabilization window of 60s, add up to 4 pods per 60s
- Scale down: stabilization window of 300s, remove up to 10% of pods per 60s
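In Kubernetes terms, these two rules correspond to the behavior field of an autoscaling/v2 HorizontalPodAutoscaler. The fragment below is a sketch of that field only; how the chart's template actually renders it is not shown here.

```yaml
# Fragment of an autoscaling/v2 HorizontalPodAutoscaler spec (sketch).
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60
    policies:
      - type: Pods          # add at most 4 pods...
        value: 4
        periodSeconds: 60   # ...per 60-second period
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Percent       # remove at most 10% of current pods...
        value: 10
        periodSeconds: 60   # ...per 60-second period
```

The longer scale-down window means a brief dip in load does not immediately remove pods that were just added.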
Worker
worker.* — Worker deployment configuration
| Key | Type | Default | Description |
|---|---|---|---|
| worker.image.repository | string | ECR repo | Worker image repository |
| worker.image.tag | string | "082a1e2" | Image tag |
| worker.replicas | int | 2 | Number of worker replicas |
| worker.resources.requests.memory | string | "1Gi" | Memory request |
| worker.resources.requests.cpu | string | "1000m" | CPU request |
| worker.resources.limits.memory | string | "4Gi" | Memory limit |
| worker.resources.limits.cpu | string | "4000m" | CPU limit |
| worker.keda.enabled | bool | true | Enable KEDA-based autoscaling |
| worker.keda.minReplicas | int | 1 | Minimum worker replicas |
| worker.keda.maxReplicas | int | 10 | Maximum worker replicas |
| worker.keda.queueThreshold | string | "5" | Queue depth threshold for scaling |
ML Worker
mlWorker.* — ML Worker deployment configuration (optional)
| Key | Type | Default | Description |
|---|---|---|---|
| mlWorker.enabled | bool | false | Deploy the ML Worker |
| mlWorker.image.repository | string | ECR repo | ML Worker image repository |
| mlWorker.image.tag | string | "latest" | Image tag |
| mlWorker.replicas | int | 1 | Number of ML Worker replicas |
| mlWorker.resources.requests.memory | string | "1Gi" | Memory request |
| mlWorker.resources.requests.cpu | string | "500m" | CPU request |
| mlWorker.resources.limits.memory | string | "4Gi" | Memory limit |
| mlWorker.resources.limits.cpu | string | "2000m" | CPU limit |
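Because the ML Worker is off by default, turning it on is an explicit override. A minimal sketch follows; the registry path and tag are placeholders rather than real defaults.

```yaml
mlWorker:
  enabled: true
  image:
    repository: registry.example.com/kaireon/ml-worker  # hypothetical registry path
    tag: "1.0.0"   # hypothetical; pinning a tag avoids drifting with "latest"
  replicas: 1
```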
When mlWorker.enabled is true, the chart injects ML_WORKER_URL into the API pods.
Config
config.* — Application configuration
| Key | Type | Default | Description |
|---|---|---|---|
| config.LOG_LEVEL | string | "info" | Log level: debug, info, warn, error |
| config.WORKER_CONCURRENCY | string | "5" | Concurrent jobs per worker pod |
| config.NODE_ENV | string | "production" | Node.js environment |
| config.EVENT_PUBLISHER | string | "redis" | Event bus backend: redis, kafka, msk, eventbridge, kinesis |
| config.INTERACTION_STORE | string | "pg" | Interaction history store: pg, dynamodb, keyspaces, scylla |
| config.SEARCH_INDEX | string | "pg" | Search index backend: pg, opensearch |
Database
database.* — PostgreSQL configuration
Set database.mode to control how PostgreSQL is provisioned:
- internal: Deploys a PostgreSQL 16 StatefulSet inside the cluster
- external: Connects to a managed database (RDS, Cloud SQL, Supabase, etc.)
Internal mode:

| Key | Type | Default | Description |
|---|---|---|---|
| database.internal.image | string | postgres:16-alpine | PostgreSQL image |
| database.internal.storage | string | 10Gi | PVC storage size |
| database.internal.username | string | kaireon | Database user |
| database.internal.password | string | "" | Password (auto-generated 32-char random if empty) |
| database.internal.database | string | kaireon | Database name |
| database.internal.resources | object | 256Mi/250m - 1Gi/1000m | Resource requests/limits |

External mode:

| Key | Type | Default | Description |
|---|---|---|---|
| database.external.host | string | "" | Database hostname |
| database.external.port | int | 5432 | Database port |
| database.external.name | string | kaireon | Database name |
| database.external.username | string | "" | Database user |
| database.external.password | string | "" | Database password |
| database.external.sslMode | string | require | SSL mode: require, no-verify, disable |
| database.external.existingSecret | string | "" | Use existing K8s secret instead of password |
| database.external.secretKey | string | password | Key within the existing secret |
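As an illustration of external mode, the override below points the chart at a managed database and reads the password from an existing Secret instead of inlining it. The hostname, user, and Secret name are placeholders.

```yaml
database:
  mode: external
  external:
    host: db.example.internal     # hypothetical hostname
    port: 5432
    name: kaireon
    username: kaireon_app         # hypothetical user
    sslMode: require
    existingSecret: kaireon-db    # hypothetical Secret name
    secretKey: password           # key inside that Secret holding the password
```

Referencing existingSecret keeps the credential out of the values file, which matters if that file is committed to version control.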
Secrets
secrets.* — Application secrets
| Key | Type | Default | Description |
|---|---|---|---|
| secrets.provider | string | kubernetes | Secrets backend |
| secrets.NEXTAUTH_SECRET | string | "" | NextAuth session encryption key |
| secrets.JWT_SIGNING_SECRET | string | "" | JWT signing key |
| secrets.CONNECTOR_ENCRYPTION_KEY | string | "" | Encryption key for stored connector credentials |
Redis
redis.* — Redis configuration
Set redis.mode to control how Redis is provisioned:
- internal: Deploys a Redis 7 StatefulSet inside the cluster
- external: Connects to a managed Redis (ElastiCache, Upstash, etc.)
Internal mode:

| Key | Type | Default | Description |
|---|---|---|---|
| redis.internal.enabled | bool | true | Deploy Redis StatefulSet |
| redis.internal.image.repository | string | redis | Redis image |
| redis.internal.image.tag | string | 7-alpine | Redis image tag |
| redis.internal.storage | string | 10Gi | PVC storage size |
| redis.internal.maxmemory | string | "512mb" | Redis max memory |
| redis.internal.resources | object | 256Mi/250m - 1Gi/1000m | Resource requests/limits |

External mode:

| Key | Type | Default | Description |
|---|---|---|---|
| redis.external.host | string | "" | Redis hostname |
| redis.external.port | int | 6379 | Redis port |
| redis.external.tls | bool | true | Enable TLS |
| redis.external.password | string | "" | Redis password |
| redis.external.existingSecret | string | "" | Use existing K8s secret instead of password |
| redis.external.secretKey | string | password | Key within the existing secret |
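A comparable sketch for an external Redis is shown below. The hostname and Secret name are placeholders, and disabling redis.internal.enabled alongside redis.mode is an assumption, since how the two keys interact is not stated here.

```yaml
redis:
  mode: external
  internal:
    enabled: false                # assumption: turn off the in-cluster StatefulSet explicitly
  external:
    host: redis.example.internal  # hypothetical hostname
    port: 6379
    tls: true
    existingSecret: kaireon-redis # hypothetical Secret name
    secretKey: password
```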
PgBouncer
pgbouncer.* — Connection pooling configuration
| Key | Type | Default | Description |
|---|---|---|---|
| pgbouncer.enabled | bool | true | Deploy PgBouncer connection pooler |
| pgbouncer.image.repository | string | edoburu/pgbouncer | PgBouncer image |
| pgbouncer.image.tag | string | "1.22.0" | PgBouncer version |
| pgbouncer.poolMode | string | transaction | Pool mode: transaction, session, statement |
| pgbouncer.defaultPoolSize | int | 25 | Connections per user/database pair |
| pgbouncer.maxClientConn | int | 1000 | Max client connections |
| pgbouncer.maxDbConnections | int | 25 | Max server connections to PostgreSQL |
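Read together, the defaults mean that up to 1000 client connections are multiplexed onto at most 25 PostgreSQL connections. Written out as an override (the values simply repeat the defaults from the table), that looks roughly like this:

```yaml
pgbouncer:
  enabled: true
  poolMode: transaction
  defaultPoolSize: 25
  maxClientConn: 1000     # client-side connections PgBouncer accepts
  maxDbConnections: 25    # server-side connections opened to PostgreSQL
```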
Transaction pooling (transaction mode) is recommended for Next.js applications. It lets multiple clients share database connections between transactions, which significantly reduces the number of connections to PostgreSQL.
Ingress
ingress.* — Ingress and TLS configuration
| Key | Type | Default | Description |
|---|---|---|---|
| ingress.enabled | bool | true | Create Ingress resource |
| ingress.className | string | alb | Ingress class: alb (AWS) or nginx |
| ingress.host | string | app.kaireon.com | Hostname for the application |
AWS ALB (className: alb):

| Key | Type | Default | Description |
|---|---|---|---|
| ingress.aws.certificateArn | string | "" | ACM certificate ARN for HTTPS |
| ingress.aws.scheme | string | internet-facing | internet-facing or internal |
| ingress.aws.targetType | string | ip | ip (Fargate/CNI) or instance |
| ingress.aws.wafAclArn | string | "" | Optional WAF WebACL ARN |

nginx (className: nginx):

| Key | Type | Default | Description |
|---|---|---|---|
| ingress.tls.enabled | bool | true | Enable TLS |
| ingress.tls.secretName | string | kaireon-tls | TLS secret name |
| ingress.tls.clusterIssuer | string | letsencrypt-prod | cert-manager ClusterIssuer |
| ingress.annotations | object | {} | Additional Ingress annotations |
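For clusters that use the nginx ingress controller with cert-manager, an override might look like the sketch below. The hostname is a placeholder, and the annotation is a standard ingress-nginx one included only to show how ingress.annotations is used.

```yaml
ingress:
  enabled: true
  className: nginx
  host: kaireon.example.com          # hypothetical hostname
  tls:
    enabled: true
    secretName: kaireon-tls
    clusterIssuer: letsencrypt-prod  # must match a ClusterIssuer that exists in your cluster
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"  # illustrative value
```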
DNS
externalDns.* — Route53 via external-dns
| Key | Type | Default | Description |
|---|---|---|---|
| externalDns.enabled | bool | false | Auto-create Route53 records |
| externalDns.hostedZoneId | string | "" | Route53 hosted zone ID |
| externalDns.txtOwnerId | string | kaireon | TXT record owner ID |
Monitoring
monitoring.* — Prometheus and Grafana
| Key | Type | Default | Description |
|---|---|---|---|
| monitoring.prometheus.enabled | bool | true | Deploy Prometheus |
| monitoring.prometheus.image.tag | string | v2.51.0 | Prometheus version |
| monitoring.prometheus.retention | string | 7d | Metrics retention period |
| monitoring.grafana.enabled | bool | true | Deploy Grafana with auto-provisioned dashboards |
| monitoring.grafana.image.tag | string | "10.4.1" | Grafana version |
| monitoring.grafana.adminUser | string | admin | Grafana admin username |
| monitoring.grafana.adminPassword | string | "" | Grafana admin password |
Event Bus (Optional)
Set config.EVENT_PUBLISHER to activate an event bus backend. The default is redis, which requires no extra configuration.
kafka.* / msk.* / eventbridge.* / kinesis.* — Event bus backends
Kafka:

| Key | Type | Default | Description |
|---|---|---|---|
| kafka.enabled | bool | false | Enable Kafka |
| kafka.brokers | string | "" | Broker addresses (broker1:9092,broker2:9092) |
| kafka.clientId | string | "kaireon-platform" | Kafka client ID |
| kafka.tlsEnabled | bool | false | Enable TLS |
| kafka.saslMechanism | string | "" | SASL auth: none, plain, scram-sha-256, scram-sha-512 |

MSK:

| Key | Type | Default | Description |
|---|---|---|---|
| msk.enabled | bool | false | Enable MSK |
| msk.brokers | string | "" | MSK broker endpoints |
| msk.region | string | "eu-west-2" | AWS region |
| msk.authMode | string | "iam_role" | Auth: iam_role or sasl_scram |

EventBridge:

| Key | Type | Default | Description |
|---|---|---|---|
| eventbridge.enabled | bool | false | Enable EventBridge |
| eventbridge.region | string | "eu-west-2" | AWS region |
| eventbridge.busName | string | "kaireon-events" | Event bus name |
| eventbridge.authMode | string | "iam_role" | Auth: iam_role or access_key |

Kinesis:

| Key | Type | Default | Description |
|---|---|---|---|
| kinesis.enabled | bool | false | Enable Kinesis |
| kinesis.region | string | "eu-west-2" | AWS region |
| kinesis.streamName | string | "kaireon-events" | Stream name |
| kinesis.partitionKey | string | "tenantId" | Partition key |
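As an example of switching the event bus, the sketch below selects Kafka. It assumes that both the config.EVENT_PUBLISHER selector and the backend's own enabled flag must be set; the broker hostnames are the placeholder format from the table. SASL credential keys are not documented here, so none are shown.

```yaml
config:
  EVENT_PUBLISHER: kafka
kafka:
  enabled: true
  brokers: "broker1:9092,broker2:9092"  # placeholder hostnames
  clientId: "kaireon-platform"
  tlsEnabled: true
  saslMechanism: "scram-sha-512"
```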
Interaction Store (Optional)
Set config.INTERACTION_STORE to activate an alternative interaction history backend. The default is pg (PostgreSQL).
dynamodb.* / keyspaces.* / scylla.* — Interaction store backends
DynamoDB:

| Key | Type | Default | Description |
|---|---|---|---|
| dynamodb.enabled | bool | false | Enable DynamoDB |
| dynamodb.region | string | "eu-west-2" | AWS region |
| dynamodb.tableName | string | "kaireon-interactions" | Table name |
| dynamodb.authMode | string | "iam_role" | Auth: iam_role or access_key |

Keyspaces:

| Key | Type | Default | Description |
|---|---|---|---|
| keyspaces.enabled | bool | false | Enable Keyspaces |
| keyspaces.region | string | "eu-west-2" | AWS region |
| keyspaces.keyspace | string | "kaireon" | Keyspace name |

ScyllaDB:

| Key | Type | Default | Description |
|---|---|---|---|
| scylla.enabled | bool | false | Enable ScyllaDB |
| scylla.contactPoints | string | "" | Contact points (host1:9042,host2:9042) |
| scylla.localDatacenter | string | "datacenter1" | Local datacenter |
| scylla.keyspace | string | "kaireon" | Keyspace name |
| scylla.replicationFactor | int | 3 | Replication factor |
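A sketch of moving interaction history to DynamoDB is shown below, using the documented defaults. As with the event bus, setting both the selector and the enabled flag is an assumption about how the chart wires these together.

```yaml
config:
  INTERACTION_STORE: dynamodb
dynamodb:
  enabled: true
  region: "eu-west-2"
  tableName: "kaireon-interactions"
  authMode: "iam_role"   # relies on an IAM role being available to the pods
```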
Search Index (Optional)
Set config.SEARCH_INDEX to activate an alternative search backend. The default is pg (PostgreSQL tsvector).
opensearch.* — OpenSearch configuration
| Key | Type | Default | Description |
|---|---|---|---|
| opensearch.enabled | bool | false | Enable OpenSearch |
| opensearch.nodeUrl | string | "" | OpenSearch endpoint |
| opensearch.authMode | string | "basic" | Auth: basic or iam_role |
| opensearch.indexPrefix | string | "kaireon-" | Index name prefix |
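The equivalent override for OpenSearch might look like this. The endpoint is a placeholder, and credential keys for basic auth are not documented here, so none are shown.

```yaml
config:
  SEARCH_INDEX: opensearch
opensearch:
  enabled: true
  nodeUrl: "https://search.example.internal:9200"  # hypothetical endpoint
  authMode: "basic"
  indexPrefix: "kaireon-"
```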
Grafana Dashboards
The chart includes 6 pre-built Grafana dashboards that are auto-provisioned from helm/dashboards/. When monitoring.grafana.enabled=true, these dashboards are available immediately after deployment.
| Dashboard | File | Key Panels |
|---|---|---|
| API Overview | api-overview.json | Request rates, error rates, latency percentiles (p50/p95/p99), HTTP status breakdown |
| Decision Engine | decision-engine.json | Pipeline stage durations, candidate counts, scoring latency, cache hit rates |
| Decision Performance | decision-performance.json | Scoring model performance, qualification rates, conversion tracking, uplift metrics |
| Infrastructure | infrastructure.json | CPU and memory utilization, disk I/O, network throughput, pod restarts |
| Model Health | model-health.json | Model AUC tracking, drift detection, retraining triggers, prediction distributions |
| Worker Queues | worker-queues.json | Queue depth, processing rates, job durations, DLQ counts, retry rates |
The application exposes Prometheus metrics at /api/v1/metrics. The Prometheus deployment is pre-configured to scrape this endpoint. Key metrics include kaireon_decisions_total, kaireon_decision_latency_ms, kaireon_pipeline_executions_total, and kaireon_api_requests_total.
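If you run your own Prometheus instead of the bundled one, a scrape job for this endpoint could look like the sketch below. This is not the chart's actual configuration; the job name, Service name, and port are hypothetical.

```yaml
# Sketch of a Prometheus scrape job for the metrics endpoint.
scrape_configs:
  - job_name: kaireon-api            # hypothetical job name
    metrics_path: /api/v1/metrics    # endpoint named in this section
    scrape_interval: 30s             # illustrative
    static_configs:
      - targets:
          - kaireon-api.kaireon.svc.cluster.local:3000  # hypothetical Service name and port
```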