The KaireonAI Helm chart deploys the complete platform to any Kubernetes cluster — EKS, GKE, AKS, or self-managed. The chart includes the API server, worker, PostgreSQL, Redis, PgBouncer, Prometheus, Grafana, ingress, network policies, and RBAC resources.
Documentation Index
Fetch the complete documentation index at: https://docs.kaireonai.com/llms.txt
Use this file to discover all available pages before exploring further.
Overview
The chart packages everything needed for a production KaireonAI deployment:
| Component | Description |
|---|---|
| API Deployment | Next.js application with health checks and HPA |
| Worker Deployment | BullMQ background job processor for pipelines and model retraining |
| ML Worker | Optional Python/FastAPI service for AI-powered analysis |
| PostgreSQL | Internal StatefulSet or external managed database |
| Redis | Internal StatefulSet or external managed cache |
| PgBouncer | Connection pooling for PostgreSQL |
| Prometheus | Metrics collection with pre-configured scrape targets |
| Grafana | 6 auto-provisioned dashboards |
| Ingress | HTTPS with AWS ALB or nginx ingress controller |
| NetworkPolicies | Pod-to-pod and egress traffic restrictions |
| RBAC | ServiceAccounts, Roles, and RoleBindings |
Prerequisites
- Kubernetes 1.24+
- Helm 3.x
- kubectl configured for your cluster
- Container images pushed to a registry accessible from the cluster
Quick Start
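A minimal first install might look like the following sketch, assuming the chart lives at ./helm in the repository and the release is named kaireon (both assumptions, adjust to your checkout):

```shell
# Create the target namespace (the chart defaults to "kaireon")
kubectl create namespace kaireon

# Install or upgrade the release with the default (full-mode) values
helm upgrade --install kaireon ./helm \
  --namespace kaireon \
  --values helm/values.yaml

# Verify the rollout. The Deployment name depends on the chart's templates;
# "kaireon-api" is an assumption.
kubectl --namespace kaireon rollout status deployment/kaireon-api
```

Pass additional `-f` / `--values` flags to layer overlays such as values-minimal.yaml or values-large.yaml on top of the base file.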
Deployment Modes
KaireonAI supports three deployment modes depending on your environment and requirements.
Dev Mode — Single replicas, all in-cluster
Dev mode is sized to fit on a single t3.medium node (~2 vCPU, 4 GiB).
| Service | Estimated 7-Day Cost |
|---|---|
| EKS Control Plane | $16.80 |
| EC2 Node (1x t3.medium) | $7.00 |
| ALB | $4.20 |
| EBS Storage (~10 GiB) | $1.00 |
| Total | ~$29 |
Dev mode is driven by values-minimal.yaml:
- API and worker: 1 replica each
- HPA and KEDA disabled
- Prometheus and Grafana disabled
- Reduced resource requests (256Mi memory, 250m CPU)
- Smaller PVC sizes (5 GiB database, 2 GiB Redis)
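values-minimal.yaml ships with the chart; the fragment below only illustrates the overrides the bullets describe and is not a copy of the file itself:

```yaml
# Illustrative sketch of the dev-mode overrides (keys per the Values Reference)
api:
  replicas: 1
  hpa:
    enabled: false
  resources:
    requests:
      memory: "256Mi"
      cpu: "250m"
worker:
  replicas: 1
  keda:
    enabled: false
monitoring:
  prometheus:
    enabled: false
  grafana:
    enabled: false
database:
  internal:
    storage: 5Gi
redis:
  internal:
    storage: 2Gi
```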
App Mode — External database and Redis, production-ready
Full Mode — All components including monitoring stack
The default values.yaml enables full mode: 3 API replicas, 2 worker replicas, HPA, KEDA autoscaling, Prometheus, and Grafana.
Values Reference
All configurable values are defined in helm/values.yaml. The sections below document each configuration group.
Global
| Key | Type | Default | Description |
|---|---|---|---|
namespace | string | kaireon | Kubernetes namespace for all resources |
API
api.* — API deployment configuration
| Key | Type | Default | Description |
|---|---|---|---|
api.image.repository | string | ECR repo | Container image repository |
api.image.tag | string | "082a1e2" | Image tag |
api.image.pullPolicy | string | IfNotPresent | Image pull policy |
api.replicas | int | 3 | Number of API replicas |
api.resources.requests.memory | string | "512Mi" | Memory request |
api.resources.requests.cpu | string | "500m" | CPU request |
api.resources.limits.memory | string | "2Gi" | Memory limit |
api.resources.limits.cpu | string | "2000m" | CPU limit |
api.hpa.enabled | bool | true | Enable Horizontal Pod Autoscaler |
api.hpa.minReplicas | int | 3 | Minimum replicas |
api.hpa.maxReplicas | int | 20 | Maximum replicas |
api.hpa.targetCPUUtilization | int | 70 | CPU target percentage for scaling |
api.hpa.targetMemoryUtilization | int | 80 | Memory target percentage for scaling |
api.env.NEXTAUTH_URL | string | "https://app.kaireon.com" | Public URL for NextAuth callbacks |
- Scale up: stabilization window of 60s, add up to 4 pods per 60s
- Scale down: stabilization window of 300s, remove up to 10% of pods per 60s
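The scale-up and scale-down policies in the bullets above map onto the autoscaling/v2 behavior stanza roughly as follows. This is a sketch of what the chart's HPA template likely renders; the actual template output may differ:

```yaml
# autoscaling/v2 HPA behavior implied by the bullets above
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60
    policies:
      - type: Pods        # add at most 4 pods per 60s window
        value: 4
        periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Percent     # remove at most 10% of pods per 60s window
        value: 10
        periodSeconds: 60
```

The long scale-down window guards against flapping when CPU briefly dips below the 70% target.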
Worker
worker.* — Worker deployment configuration
| Key | Type | Default | Description |
|---|---|---|---|
worker.image.repository | string | ECR repo | Worker image repository |
worker.image.tag | string | "082a1e2" | Image tag |
worker.replicas | int | 2 | Number of worker replicas |
worker.resources.requests.memory | string | "1Gi" | Memory request |
worker.resources.requests.cpu | string | "1000m" | CPU request |
worker.resources.limits.memory | string | "4Gi" | Memory limit |
worker.resources.limits.cpu | string | "4000m" | CPU limit |
worker.keda.enabled | bool | true | Enable KEDA-based autoscaling |
worker.keda.minReplicas | int | 1 | Minimum worker replicas |
worker.keda.maxReplicas | int | 10 | Maximum worker replicas |
worker.keda.queueThreshold | string | "5" | Queue depth threshold for scaling |
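As an example of tuning worker autoscaling for a burstier queue, the keys are those documented above; the numbers are illustrative, not recommendations:

```yaml
worker:
  keda:
    enabled: true
    minReplicas: 2        # keep warm capacity for latency-sensitive jobs
    maxReplicas: 20       # allow deeper fan-out during backfills
    queueThreshold: "10"  # raise the queue-depth trigger for spiky workloads
```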
ML Worker
mlWorker.* — ML Worker deployment configuration (optional)
| Key | Type | Default | Description |
|---|---|---|---|
mlWorker.enabled | bool | false | Deploy the ML Worker |
mlWorker.image.repository | string | ECR repo | ML Worker image repository |
mlWorker.image.tag | string | "latest" | Image tag |
mlWorker.replicas | int | 1 | Number of ML Worker replicas |
mlWorker.resources.requests.memory | string | "1Gi" | Memory request |
mlWorker.resources.requests.cpu | string | "500m" | CPU request |
mlWorker.resources.limits.memory | string | "4Gi" | Memory limit |
mlWorker.resources.limits.cpu | string | "2000m" | CPU limit |
When enabled, the chart injects ML_WORKER_URL into the API pods.
Config
config.* — Application configuration
| Key | Type | Default | Description |
|---|---|---|---|
config.LOG_LEVEL | string | "info" | Log level: debug, info, warn, error |
config.WORKER_CONCURRENCY | string | "5" | Concurrent jobs per worker pod |
config.NODE_ENV | string | "production" | Node.js environment |
config.EVENT_PUBLISHER | string | "redis" | Event bus backend: redis, kafka, msk, eventbridge, kinesis |
config.INTERACTION_STORE | string | "pg" | Interaction history store: pg, dynamodb, keyspaces, scylla |
config.SEARCH_INDEX | string | "pg" | Search index backend: pg, opensearch |
Database
database.* — PostgreSQL configuration
Set database.mode to control how PostgreSQL is provisioned:
- internal — Deploys a PostgreSQL 16 StatefulSet inside the cluster
- external — Connects to a managed database (RDS, Cloud SQL, Supabase, etc.)
| Key | Type | Default | Description |
|---|---|---|---|
database.internal.image | string | postgres:16-alpine | PostgreSQL image |
database.internal.storage | string | 10Gi | PVC storage size |
database.internal.username | string | kaireon | Database user |
database.internal.password | string | "" | Password (auto-generated 32-char random if empty) |
database.internal.database | string | kaireon | Database name |
database.internal.resources | object | 256Mi/250m - 1Gi/1000m | Resource requests/limits |
| Key | Type | Default | Description |
|---|---|---|---|
database.external.host | string | "" | Database hostname |
database.external.port | int | 5432 | Database port |
database.external.name | string | kaireon | Database name |
database.external.username | string | "" | Database user |
database.external.password | string | "" | Database password |
database.external.sslMode | string | require | SSL mode: require, no-verify, disable |
database.external.existingSecret | string | "" | Use existing K8s secret instead of password |
database.external.secretKey | string | password | Key within the existing secret |
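Putting the external-mode keys together, a values fragment for a managed database might look like this. The hostname and secret name are placeholders; referencing an existing Kubernetes Secret avoids committing the password to a values file:

```yaml
database:
  mode: external
  external:
    host: mydb.example.eu-west-2.rds.amazonaws.com   # placeholder hostname
    port: 5432
    name: kaireon
    username: kaireon_app                            # placeholder user
    sslMode: require
    existingSecret: kaireon-db-credentials           # pre-created K8s Secret
    secretKey: password                              # key inside that Secret
```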
Secrets
secrets.* — Application secrets
| Key | Type | Default | Description |
|---|---|---|---|
secrets.provider | string | kubernetes | Secrets backend |
secrets.NEXTAUTH_SECRET | string | "" | NextAuth session encryption key |
secrets.JWT_SIGNING_SECRET | string | "" | JWT signing key |
secrets.CONNECTOR_ENCRYPTION_KEY | string | "" | Encryption key for stored connector credentials |
Redis
redis.* — Redis configuration
Set redis.mode to control how Redis is provisioned:
- internal — Deploys a Redis 7 StatefulSet inside the cluster
- external — Connects to a managed Redis (ElastiCache, Upstash, etc.)
| Key | Type | Default | Description |
|---|---|---|---|
redis.internal.enabled | bool | true | Deploy Redis StatefulSet |
redis.internal.image.repository | string | redis | Redis image |
redis.internal.image.tag | string | 7-alpine | Redis image tag |
redis.internal.storage | string | 10Gi | PVC storage size |
redis.internal.maxmemory | string | "512mb" | Redis max memory |
redis.internal.resources | object | 256Mi/250m - 1Gi/1000m | Resource requests/limits |
| Key | Type | Default | Description |
|---|---|---|---|
redis.external.host | string | "" | Redis hostname |
redis.external.port | int | 6379 | Redis port |
redis.external.tls | bool | true | Enable TLS |
redis.external.password | string | "" | Redis password |
redis.external.existingSecret | string | "" | Use existing K8s secret instead of password |
redis.external.secretKey | string | password | Key within the existing secret |
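An external-Redis fragment mirrors the database example. The endpoint and secret name below are placeholders:

```yaml
redis:
  mode: external
  external:
    host: my-cache.example.euw2.cache.amazonaws.com  # placeholder endpoint
    port: 6379
    tls: true                                        # managed Redis usually requires TLS
    existingSecret: kaireon-redis-credentials        # pre-created K8s Secret
    secretKey: password
```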
PgBouncer
pgbouncer.* — Connection pooling configuration
Defined in helm/values.yaml:209-225. PgBouncer fronts PostgreSQL so the API tier shares a small pool against the database while individual Next.js request handlers can each open their own client connection.
| Key | Type | Default | Description |
|---|---|---|---|
pgbouncer.enabled | bool | true | Deploy PgBouncer connection pooler |
pgbouncer.image.repository | string | edoburu/pgbouncer | PgBouncer image |
pgbouncer.image.tag | string | "1.22.0" | PgBouncer version |
pgbouncer.poolMode | string | transaction | Pool mode: transaction, session, statement |
pgbouncer.defaultPoolSize | int | 25 | Connections per user/database pair |
pgbouncer.maxClientConn | int | 1000 | Max client connections accepted from API/worker pods |
pgbouncer.maxDbConnections | int | 25 | Max server connections opened to PostgreSQL |
pgbouncer.resources.requests.memory | string | "64Mi" | Memory request |
pgbouncer.resources.requests.cpu | string | "100m" | CPU request |
pgbouncer.resources.limits.memory | string | "256Mi" | Memory limit |
pgbouncer.resources.limits.cpu | string | "500m" | CPU limit |
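With the defaults above, up to 1,000 client connections from API and worker pods are multiplexed over at most 25 server connections to PostgreSQL. A values fragment restating those defaults, to be adjusted for your workload:

```yaml
pgbouncer:
  enabled: true
  poolMode: transaction
  defaultPoolSize: 25    # server connections per user/database pair
  maxClientConn: 1000    # client connections PgBouncer will accept
  maxDbConnections: 25   # hard cap on connections opened to PostgreSQL
```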
The default (transaction mode) is recommended for Next.js applications. It allows multiple clients to share database connections between transactions, significantly reducing the number of connections to PostgreSQL.
Ingress
ingress.* — Ingress and TLS configuration
| Key | Type | Default | Description |
|---|---|---|---|
ingress.enabled | bool | true | Create Ingress resource |
ingress.className | string | alb | Ingress class: alb (AWS) or nginx |
ingress.host | string | app.kaireon.com | Hostname for the application |
AWS ALB options (className: alb):
| Key | Type | Default | Description |
|---|---|---|---|
ingress.aws.certificateArn | string | "" | ACM certificate ARN for HTTPS |
ingress.aws.scheme | string | internet-facing | internet-facing or internal |
ingress.aws.targetType | string | ip | ip (Fargate/CNI) or instance |
ingress.aws.wafAclArn | string | "" | Optional WAF WebACL ARN |
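A typical ALB configuration combines the keys above. The hostname and certificate ARN are placeholders:

```yaml
ingress:
  enabled: true
  className: alb
  host: app.example.com                # placeholder hostname
  aws:
    certificateArn: arn:aws:acm:eu-west-2:123456789012:certificate/placeholder
    scheme: internet-facing
    targetType: ip                     # use "instance" for classic node targets
```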
nginx options (className: nginx):
| Key | Type | Default | Description |
|---|---|---|---|
ingress.tls.enabled | bool | true | Enable TLS |
ingress.tls.secretName | string | kaireon-tls | TLS secret name |
ingress.tls.clusterIssuer | string | letsencrypt-prod | cert-manager ClusterIssuer |
ingress.annotations | object | {} | Additional Ingress annotations |
DNS
externalDns.* — Route53 via external-dns
Defined in helm/values.yaml:246-249. When enabled, the chart adds the annotations external-dns expects on the Ingress so Route53 records track the cluster automatically. The companion external-dns controller is not installed by this chart — operators run it once per cluster from the upstream Helm chart.
| Key | Type | Default | Description |
|---|---|---|---|
externalDns.enabled | bool | false | Auto-create Route53 records for the Ingress hostname |
externalDns.hostedZoneId | string | "" | Route53 hosted zone ID, e.g. Z0123456789ABCDEF |
externalDns.txtOwnerId | string | kaireon | TXT record owner — disambiguates ownership when several clusters share a zone |
Monitoring
monitoring.* — Prometheus and Grafana
| Key | Type | Default | Description |
|---|---|---|---|
monitoring.prometheus.enabled | bool | true | Deploy Prometheus |
monitoring.prometheus.image.tag | string | v2.51.0 | Prometheus version |
monitoring.prometheus.retention | string | 7d | Metrics retention period |
monitoring.grafana.enabled | bool | true | Deploy Grafana with auto-provisioned dashboards |
monitoring.grafana.image.tag | string | "10.4.1" | Grafana version |
monitoring.grafana.adminUser | string | admin | Grafana admin username |
monitoring.grafana.adminPassword | string | "" | Grafana admin password |
Event Bus (Optional)
Set config.EVENT_PUBLISHER to activate an event bus backend. The default is redis, which requires no extra configuration.
kafka.* / msk.* / eventbridge.* / kinesis.* — Event bus backends
| Key | Type | Default | Description |
|---|---|---|---|
kafka.enabled | bool | false | Enable Kafka |
kafka.brokers | string | "" | Broker addresses (broker1:9092,broker2:9092) |
kafka.clientId | string | "kaireon-platform" | Kafka client ID |
kafka.tlsEnabled | bool | false | Enable TLS |
kafka.saslMechanism | string | "" | SASL auth: none, plain, scram-sha-256, scram-sha-512 |
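Switching the event bus to Kafka touches two groups: the config selector and the kafka block. The broker addresses below are placeholders:

```yaml
config:
  EVENT_PUBLISHER: kafka
kafka:
  enabled: true
  brokers: "broker1:9092,broker2:9092"   # placeholder broker addresses
  clientId: "kaireon-platform"
  tlsEnabled: true
  saslMechanism: "scram-sha-512"
```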
| Key | Type | Default | Description |
|---|---|---|---|
msk.enabled | bool | false | Enable MSK |
msk.brokers | string | "" | MSK broker endpoints |
msk.region | string | "eu-west-2" | AWS region |
msk.authMode | string | "iam_role" | Auth: iam_role or sasl_scram |
| Key | Type | Default | Description |
|---|---|---|---|
eventbridge.enabled | bool | false | Enable EventBridge |
eventbridge.region | string | "eu-west-2" | AWS region |
eventbridge.busName | string | "kaireon-events" | Event bus name |
eventbridge.authMode | string | "iam_role" | Auth: iam_role or access_key |
| Key | Type | Default | Description |
|---|---|---|---|
kinesis.enabled | bool | false | Enable Kinesis |
kinesis.region | string | "eu-west-2" | AWS region |
kinesis.streamName | string | "kaireon-events" | Stream name |
kinesis.partitionKey | string | "tenantId" | Partition key |
Interaction Store (Optional)
Set config.INTERACTION_STORE to activate an alternative interaction history backend. The default is pg (PostgreSQL).
dynamodb.* / keyspaces.* / scylla.* — Interaction store backends
| Key | Type | Default | Description |
|---|---|---|---|
dynamodb.enabled | bool | false | Enable DynamoDB |
dynamodb.region | string | "eu-west-2" | AWS region |
dynamodb.tableName | string | "kaireon-interactions" | Table name |
dynamodb.authMode | string | "iam_role" | Auth: iam_role or access_key |
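For example, moving interaction history to DynamoDB pairs the selector with the dynamodb block. With authMode: iam_role, the pods' ServiceAccount is presumed to carry the necessary AWS permissions (on EKS this is typically granted via IRSA):

```yaml
config:
  INTERACTION_STORE: dynamodb
dynamodb:
  enabled: true
  region: "eu-west-2"
  tableName: "kaireon-interactions"
  authMode: "iam_role"   # ServiceAccount must be granted DynamoDB access
```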
| Key | Type | Default | Description |
|---|---|---|---|
keyspaces.enabled | bool | false | Enable Keyspaces |
keyspaces.region | string | "eu-west-2" | AWS region |
keyspaces.keyspace | string | "kaireon" | Keyspace name |
| Key | Type | Default | Description |
|---|---|---|---|
scylla.enabled | bool | false | Enable ScyllaDB |
scylla.contactPoints | string | "" | Contact points (host1:9042,host2:9042) |
scylla.localDatacenter | string | "datacenter1" | Local datacenter |
scylla.keyspace | string | "kaireon" | Keyspace name |
scylla.replicationFactor | int | 3 | Replication factor |
Search Index (Optional)
Set config.SEARCH_INDEX to activate an alternative search backend. The default is pg (PostgreSQL tsvector).
opensearch.* — OpenSearch configuration
| Key | Type | Default | Description |
|---|---|---|---|
opensearch.enabled | bool | false | Enable OpenSearch |
opensearch.nodeUrl | string | "" | OpenSearch endpoint |
opensearch.authMode | string | "basic" | Auth: basic or iam_role |
opensearch.indexPrefix | string | "kaireon-" | Index name prefix |
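A minimal OpenSearch fragment follows the same selector-plus-block pattern; the endpoint below is a placeholder:

```yaml
config:
  SEARCH_INDEX: opensearch
opensearch:
  enabled: true
  nodeUrl: "https://search-example.eu-west-2.es.amazonaws.com"  # placeholder
  authMode: "iam_role"   # or "basic" with credentials
  indexPrefix: "kaireon-"
```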
Large-Topology Aliases
The following keys appear only in the large-enterprise overlay at helm/values-large.yaml. The base chart (helm/values.yaml) does not define them — they are layered in via helm upgrade -f helm/values.yaml -f helm/values-large.yaml. Each is a high-level alias that the chart maps onto the per-backend sections above. See values-large.yaml — enterprise topology for the full overlay.
postgresExternalReplicas.* — Read-replica advisory count
Defined in helm/values-large.yaml:163. A single integer that records how many read replicas the operator has provisioned alongside the primary. Advisory only — the chart does not itself create RDS replicas; consumers (e.g. the read-only analytics path) read this number when deciding whether to fan out reads.
| Key | Type | Default | Description |
|---|---|---|---|
postgresExternalReplicas | int | 2 (large overlay) | Number of read replicas the chart caller has provisioned. Set to 0 when running primary-only. |
postgresExternalReplicas is meaningful only when database.mode: external (or, in the large overlay, postgres.mode: external). For internal-StatefulSet deployments, leave it unset.
eventbus.* — Event-bus backend selector
Defined in helm/values-large.yaml:173-175. A high-level alias that picks one of the per-backend blocks (kafka.*, msk.*, eventbridge.*, kinesis.*). The base chart instead drives this from config.EVENT_PUBLISHER (redis | kafka | msk | eventbridge | kinesis); the large overlay adds the alias for callers who prefer to read the choice as a single key. Either form is honoured — the operator should not set both unless the values agree.
| Key | Type | Default | Description |
|---|---|---|---|
eventbus.enabled | bool | true (large overlay) | Whether an external event bus is in use. When false, the chart falls back to config.EVENT_PUBLISHER: redis. |
eventbus.backend | string | kafka (large overlay) | One of kafka, kinesis, pulsar. Selecting a backend here still requires the matching per-backend block (e.g. kafka.brokers) to be filled in. |
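Since the alias does not replace the per-backend blocks, a large-overlay fragment carries both. The broker addresses below are placeholders:

```yaml
# Large-overlay fragment: the alias plus the per-backend block it still requires
eventbus:
  enabled: true
  backend: kafka
kafka:
  enabled: true
  brokers: "broker1:9092,broker2:9092"   # placeholder broker addresses
```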
interactionStore.* — Interaction-store backend selector
Defined in helm/values-large.yaml:177-178. The large-overlay alias for config.INTERACTION_STORE. Selects which write path receives the partitioned InteractionHistory rows. Each backend still requires its own per-backend block (dynamodb.*, scylla.*, keyspaces.*) to be filled in.
| Key | Type | Default | Description |
|---|---|---|---|
interactionStore.backend | string | dynamodb (large overlay) | One of pg, dynamodb, scylla, cassandra (mid-tier). The base chart drives the same choice from config.INTERACTION_STORE. |
Grafana Dashboards
The chart includes 6 pre-built Grafana dashboards that are auto-provisioned from helm/dashboards/. When monitoring.grafana.enabled=true, these dashboards are available immediately after deployment.
| Dashboard | File | Key Panels |
|---|---|---|
| API Overview | api-overview.json | Request rates, error rates, latency percentiles (p50/p95/p99), HTTP status breakdown |
| Decision Engine | decision-engine.json | Pipeline stage durations, candidate counts, scoring latency, cache hit rates |
| Decision Performance | decision-performance.json | Scoring model performance, qualification rates, conversion tracking, uplift metrics |
| Infrastructure | infrastructure.json | CPU and memory utilization, disk I/O, network throughput, pod restarts |
| Model Health | model-health.json | Model AUC tracking, drift detection, retraining triggers, prediction distributions |
| Worker Queues | worker-queues.json | Queue depth, processing rates, job durations, DLQ counts, retry rates |
The API exposes Prometheus metrics at /api/v1/metrics. The Prometheus deployment is pre-configured to scrape this endpoint. Key metrics include kaireon_decisions_total, kaireon_decision_latency_ms, kaireon_pipeline_executions_total, and kaireon_api_requests_total.