The KaireonAI Helm chart deploys the complete platform to any Kubernetes cluster — EKS, GKE, AKS, or self-managed. The chart includes the API server, worker, PostgreSQL, Redis, PgBouncer, Prometheus, Grafana, ingress, network policies, and RBAC resources.

Overview

The chart packages everything needed for a production KaireonAI deployment:
Component | Description
API Deployment | Next.js application with health checks and HPA
Worker Deployment | BullMQ background job processor for pipelines and model retraining
ML Worker | Optional Python/FastAPI service for AI-powered analysis
PostgreSQL | Internal StatefulSet or external managed database
Redis | Internal StatefulSet or external managed cache
PgBouncer | Connection pooling for PostgreSQL
Prometheus | Metrics collection with pre-configured scrape targets
Grafana | 6 auto-provisioned dashboards
Ingress | HTTPS with AWS ALB or nginx ingress controller
NetworkPolicies | Pod-to-pod and egress traffic restrictions
RBAC | ServiceAccounts, Roles, and RoleBindings

Prerequisites

  • Kubernetes 1.24+
  • Helm 3.x
  • kubectl configured for your cluster
  • Container images pushed to a registry accessible from the cluster

Quick Start

1. Add the chart

Clone the KaireonAI repository, which includes the Helm chart in helm/:
git clone https://github.com/kaireonai/platform.git
cd platform
2. Install with minimal values

For a quick test deployment with everything in-cluster:
helm install kaireon ./helm \
  -f helm/values-minimal.yaml \
  -n kaireon --create-namespace --wait
3. Verify the deployment

kubectl get pods -n kaireon
kubectl get svc -n kaireon
Alternatively, use the deploy script, which handles namespace creation and secret generation:
./scripts/deploy.sh
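To block until the pods report Ready instead of polling kubectl get pods, a small wrapper around kubectl wait can help. This is a minimal sketch: the function name wait_for_kaireon and the 300-second default timeout are illustrative assumptions, not part of the chart.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until every pod in the namespace is Ready.
# Defaults to the "kaireon" namespace used in the Quick Start.
wait_for_kaireon() {
  local namespace="${1:-kaireon}"
  local timeout="${2:-300s}"
  kubectl wait --for=condition=Ready pods --all \
    --namespace "$namespace" --timeout="$timeout"
}

# Usage:
#   wait_for_kaireon               # kaireon namespace, 300s timeout
#   wait_for_kaireon kaireon 10m   # explicit namespace and timeout
```

Pods that belong to completed Jobs never become Ready, so this check may time out if the release runs one-off Jobs; in that case, narrow the selection with a label selector.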

Deployment Modes

KaireonAI supports three deployment modes depending on your environment and requirements.
Minimal mode
Everything runs inside the cluster with minimal resources. It uses the internal PostgreSQL and Redis StatefulSets and requires no external dependencies.
Best for: Local development, CI testing, quick demos.
# Using the deploy script
./scripts/deploy.sh

# Or directly with Helm
helm install kaireon ./helm -f helm/values-minimal.yaml -n kaireon --create-namespace
Resource footprint: Fits on a single t3.medium node (~2 vCPU, 4 GiB).
Service | Estimated 7-Day Cost
EKS Control Plane | $16.80
EC2 Node (1x t3.medium) | $7.00
ALB | $4.20
EBS Storage (~10 GiB) | $1.00
Total | ~$29
Key overrides in values-minimal.yaml:
  • API and worker: 1 replica each
  • HPA and KEDA disabled
  • Prometheus and Grafana disabled
  • Reduced resource requests (256Mi memory, 250m CPU)
  • Smaller PVC sizes (5 GiB database, 2 GiB Redis)
App mode
Connects to managed services (RDS, ElastiCache, Cloud SQL, etc.) while running the application tier in Kubernetes. This is the recommended mode for production.
Best for: Staging and production environments.
./scripts/deploy.sh --mode=app --config=my-config.yaml
Requires a config file specifying external database and Redis endpoints. See the Example Configurations section below.
Full mode
Deploys everything, including Prometheus and Grafana with 6 auto-provisioned dashboards. It can use either an internal or an external database and Redis.
Best for: Production environments where you want the full observability stack deployed alongside the application.
helm upgrade --install kaireon ./helm -n kaireon --create-namespace --wait
The default values.yaml enables the full mode with 3 API replicas, 2 worker replicas, HPA, KEDA autoscaling, Prometheus, and Grafana.

Values Reference

All configurable values are defined in helm/values.yaml. The sections below document each configuration group.

Global

Key | Type | Default | Description
namespace | string | kaireon | Kubernetes namespace for all resources

API

Key | Type | Default | Description
api.image.repository | string | ECR repo | Container image repository
api.image.tag | string | "082a1e2" | Image tag
api.image.pullPolicy | string | IfNotPresent | Image pull policy
api.replicas | int | 3 | Number of API replicas
api.resources.requests.memory | string | "512Mi" | Memory request
api.resources.requests.cpu | string | "500m" | CPU request
api.resources.limits.memory | string | "2Gi" | Memory limit
api.resources.limits.cpu | string | "2000m" | CPU limit
api.hpa.enabled | bool | true | Enable Horizontal Pod Autoscaler
api.hpa.minReplicas | int | 3 | Minimum replicas
api.hpa.maxReplicas | int | 20 | Maximum replicas
api.hpa.targetCPUUtilization | int | 70 | CPU target percentage for scaling
api.hpa.targetMemoryUtilization | int | 80 | Memory target percentage for scaling
api.env.NEXTAUTH_URL | string | "https://app.kaireon.com" | Public URL for NextAuth callbacks
The HPA includes scale-up/down behavior policies:
  • Scale up: stabilization window of 60s, add up to 4 pods per 60s
  • Scale down: stabilization window of 300s, remove up to 10% of pods per 60s
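The scale-up policy above caps how quickly the API tier can grow. The arithmetic can be sketched as follows; the function name is hypothetical, and the calculation ignores the stabilization window and metric lag, so it gives a best case rather than a prediction.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of policy windows needed to grow from
# `current` to `target` replicas when at most `pods_per_window` pods
# may be added per window (ceiling division).
scale_up_windows() {
  local current="$1" target="$2" pods_per_window="$3"
  local needed=$(( target - current ))
  if [ "$needed" -le 0 ]; then
    echo 0
    return 0
  fi
  echo $(( (needed + pods_per_window - 1) / pods_per_window ))
}

# Chart defaults: minReplicas 3, maxReplicas 20, up to 4 pods per 60s window.
scale_up_windows 3 20 4   # prints 5
```

With the defaults, going from the minimum to the maximum replica count takes at least five 60-second windows, which is worth keeping in mind when sizing minReplicas for sudden traffic spikes.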

Worker

Key | Type | Default | Description
worker.image.repository | string | ECR repo | Worker image repository
worker.image.tag | string | "082a1e2" | Image tag
worker.replicas | int | 2 | Number of worker replicas
worker.resources.requests.memory | string | "1Gi" | Memory request
worker.resources.requests.cpu | string | "1000m" | CPU request
worker.resources.limits.memory | string | "4Gi" | Memory limit
worker.resources.limits.cpu | string | "4000m" | CPU limit
worker.keda.enabled | bool | true | Enable KEDA-based autoscaling
worker.keda.minReplicas | int | 1 | Minimum worker replicas
worker.keda.maxReplicas | int | 10 | Maximum worker replicas
worker.keda.queueThreshold | string | "5" | Queue depth threshold for scaling
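The relationship between queue depth and worker count can be approximated with integer arithmetic. This sketch assumes the chart's KEDA trigger treats worker.keda.queueThreshold as a target number of queued jobs per replica, which is how KEDA's queue-length scalers generally behave; the chart's actual trigger definition is not shown in this page, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate worker replica count for a queue depth.
# Computes ceil(depth / threshold), then clamps to [min, max].
# Defaults mirror the chart values: threshold 5, min 1, max 10.
desired_workers() {
  local depth="$1" threshold="${2:-5}" min="${3:-1}" max="${4:-10}"
  local want=$(( (depth + threshold - 1) / threshold ))
  if [ "$want" -lt "$min" ]; then
    want="$min"
  fi
  if [ "$want" -gt "$max" ]; then
    want="$max"
  fi
  echo "$want"
}

desired_workers 23     # prints 5
desired_workers 0      # prints 1 (clamped to minReplicas)
desired_workers 1000   # prints 10 (clamped to maxReplicas)
```

If the queue regularly sits above threshold times maxReplicas (50 jobs with the defaults), the workers are saturated and raising worker.keda.maxReplicas or config.WORKER_CONCURRENCY may be worth testing.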

ML Worker

Key | Type | Default | Description
mlWorker.enabled | bool | false | Deploy the ML Worker
mlWorker.image.repository | string | ECR repo | ML Worker image repository
mlWorker.image.tag | string | "latest" | Image tag
mlWorker.replicas | int | 1 | Number of ML Worker replicas
mlWorker.resources.requests.memory | string | "1Gi" | Memory request
mlWorker.resources.requests.cpu | string | "500m" | CPU request
mlWorker.resources.limits.memory | string | "4Gi" | Memory limit
mlWorker.resources.limits.cpu | string | "2000m" | CPU limit
When enabled, the chart automatically injects ML_WORKER_URL into the API pods.

Config

Key | Type | Default | Description
config.LOG_LEVEL | string | "info" | Log level: debug, info, warn, error
config.WORKER_CONCURRENCY | string | "5" | Concurrent jobs per worker pod
config.NODE_ENV | string | "production" | Node.js environment
config.EVENT_PUBLISHER | string | "redis" | Event bus backend: redis, kafka, msk, eventbridge, kinesis
config.INTERACTION_STORE | string | "pg" | Interaction history store: pg, dynamodb, keyspaces, scylla
config.SEARCH_INDEX | string | "pg" | Search index backend: pg, opensearch

Database

Set database.mode to control how PostgreSQL is provisioned:
  • internal — Deploys a PostgreSQL 16 StatefulSet inside the cluster
  • external — Connects to a managed database (RDS, Cloud SQL, Supabase, etc.)
Internal mode values:
Key | Type | Default | Description
database.internal.image | string | postgres:16-alpine | PostgreSQL image
database.internal.storage | string | 10Gi | PVC storage size
database.internal.username | string | kaireon | Database user
database.internal.password | string | "" | Password (auto-generated 32-char random if empty)
database.internal.database | string | kaireon | Database name
database.internal.resources | object | 256Mi/250m - 1Gi/1000m | Resource requests/limits
External mode values:
Key | Type | Default | Description
database.external.host | string | "" | Database hostname
database.external.port | int | 5432 | Database port
database.external.name | string | kaireon | Database name
database.external.username | string | "" | Database user
database.external.password | string | "" | Database password
database.external.sslMode | string | require | SSL mode: require, no-verify, disable
database.external.existingSecret | string | "" | Use existing K8s secret instead of password
database.external.secretKey | string | password | Key within the existing secret
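When database.external.existingSecret is set, the referenced Secret has to exist before the release is installed. One way to create it is sketched below; the helper name is hypothetical, the Secret name kaireon-db-creds is only an example, and the key password matches the documented default for database.external.secretKey.

```shell
#!/usr/bin/env bash
# Hypothetical helper: store the external database password in a Kubernetes
# Secret under the key the chart reads by default (secretKey: password).
# The password is read from stdin so it does not land in shell history.
create_db_secret() {
  local name="${1:-kaireon-db-creds}"
  local namespace="${2:-kaireon}"
  local password
  IFS= read -r password
  kubectl create secret generic "$name" \
    --namespace "$namespace" \
    --from-literal=password="$password"
}

# Usage (the namespace must already exist):
#   printf '%s\n' "$DB_PASSWORD" | create_db_secret
#   printf '%s\n' "$REDIS_PASSWORD" | create_db_secret kaireon-redis-creds
```

The same pattern should apply to redis.external.existingSecret, since it documents the same default key.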

Secrets

Key | Type | Default | Description
secrets.provider | string | kubernetes | Secrets backend
secrets.NEXTAUTH_SECRET | string | "" | NextAuth session encryption key
secrets.JWT_SIGNING_SECRET | string | "" | JWT signing key
secrets.CONNECTOR_ENCRYPTION_KEY | string | "" | Encryption key for stored connector credentials
Generate secrets with openssl rand -base64 32. Never commit plaintext secrets to version control. Use --set flags, sealed secrets, or an external secrets manager in production.
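As an illustration of the advice above, the three secrets can be generated once and written to a local values file that stays out of version control. This is a sketch under assumptions: the helper names and the file name secrets.values.yaml are hypothetical, and only the three keys from the table are written.

```shell
#!/usr/bin/env bash
# Generate one 32-byte, base64-encoded secret (44 characters).
gen_secret() {
  openssl rand -base64 32
}

# Hypothetical helper: write the chart's three secrets to a values file.
# Refuses to overwrite an existing file, because regenerating the keys
# would invalidate sessions and stored connector credentials.
write_secret_values() {
  local out="${1:-secrets.values.yaml}"
  if [ -e "$out" ]; then
    echo "refusing to overwrite $out" >&2
    return 1
  fi
  (
    umask 077   # file is readable by the current user only
    {
      echo "secrets:"
      echo "  NEXTAUTH_SECRET: \"$(gen_secret)\""
      echo "  JWT_SIGNING_SECRET: \"$(gen_secret)\""
      echo "  CONNECTOR_ENCRYPTION_KEY: \"$(gen_secret)\""
    } > "$out"
  )
}

# Usage:
#   write_secret_values
#   helm upgrade --install kaireon ./helm -n kaireon -f secrets.values.yaml
```

Add the generated file to .gitignore, and prefer sealed secrets or an external secrets manager once the deployment moves beyond testing.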

Redis

Set redis.mode to control how Redis is provisioned:
  • internal — Deploys a Redis 7 StatefulSet inside the cluster
  • external — Connects to a managed Redis (ElastiCache, Upstash, etc.)
Internal mode values:
Key | Type | Default | Description
redis.internal.enabled | bool | true | Deploy Redis StatefulSet
redis.internal.image.repository | string | redis | Redis image
redis.internal.image.tag | string | 7-alpine | Redis image tag
redis.internal.storage | string | 10Gi | PVC storage size
redis.internal.maxmemory | string | "512mb" | Redis max memory
redis.internal.resources | object | 256Mi/250m - 1Gi/1000m | Resource requests/limits
External mode values:
Key | Type | Default | Description
redis.external.host | string | "" | Redis hostname
redis.external.port | int | 6379 | Redis port
redis.external.tls | bool | true | Enable TLS
redis.external.password | string | "" | Redis password
redis.external.existingSecret | string | "" | Use existing K8s secret instead of password
redis.external.secretKey | string | password | Key within the existing secret

PgBouncer

Key | Type | Default | Description
pgbouncer.enabled | bool | true | Deploy PgBouncer connection pooler
pgbouncer.image.repository | string | edoburu/pgbouncer | PgBouncer image
pgbouncer.image.tag | string | "1.22.0" | PgBouncer version
pgbouncer.poolMode | string | transaction | Pool mode: transaction, session, statement
pgbouncer.defaultPoolSize | int | 25 | Connections per user/database pair
pgbouncer.maxClientConn | int | 1000 | Max client connections
pgbouncer.maxDbConnections | int | 25 | Max server connections to PostgreSQL
Transaction pooling (transaction mode) is recommended for Next.js applications. It allows multiple clients to share database connections between transactions, significantly reducing the number of connections to PostgreSQL.

Ingress

Key | Type | Default | Description
ingress.enabled | bool | true | Create Ingress resource
ingress.className | string | alb | Ingress class: alb (AWS) or nginx
ingress.host | string | app.kaireon.com | Hostname for the application
AWS ALB Ingress (when className: alb):
Key | Type | Default | Description
ingress.aws.certificateArn | string | "" | ACM certificate ARN for HTTPS
ingress.aws.scheme | string | internet-facing | internet-facing or internal
ingress.aws.targetType | string | ip | ip (Fargate/CNI) or instance
ingress.aws.wafAclArn | string | "" | Optional WAF WebACL ARN
Nginx Ingress (when className: nginx):
Key | Type | Default | Description
ingress.tls.enabled | bool | true | Enable TLS
ingress.tls.secretName | string | kaireon-tls | TLS secret name
ingress.tls.clusterIssuer | string | letsencrypt-prod | cert-manager ClusterIssuer
ingress.annotations | object | {} | Additional Ingress annotations

DNS

Key | Type | Default | Description
externalDns.enabled | bool | false | Auto-create Route53 records
externalDns.hostedZoneId | string | "" | Route53 hosted zone ID
externalDns.txtOwnerId | string | kaireon | TXT record owner ID

Monitoring

Key | Type | Default | Description
monitoring.prometheus.enabled | bool | true | Deploy Prometheus
monitoring.prometheus.image.tag | string | v2.51.0 | Prometheus version
monitoring.prometheus.retention | string | 7d | Metrics retention period
monitoring.grafana.enabled | bool | true | Deploy Grafana with auto-provisioned dashboards
monitoring.grafana.image.tag | string | "10.4.1" | Grafana version
monitoring.grafana.adminUser | string | admin | Grafana admin username
monitoring.grafana.adminPassword | string | "" | Grafana admin password
Set the Grafana admin password via --set or a sealed secret in production. Never leave it empty in a publicly accessible deployment.

Event Bus (Optional)

Set config.EVENT_PUBLISHER to activate an event bus backend. The default is redis, which requires no extra configuration.
Kafka:
Key | Type | Default | Description
kafka.enabled | bool | false | Enable Kafka
kafka.brokers | string | "" | Broker addresses (broker1:9092,broker2:9092)
kafka.clientId | string | "kaireon-platform" | Kafka client ID
kafka.tlsEnabled | bool | false | Enable TLS
kafka.saslMechanism | string | "" | SASL auth: none, plain, scram-sha-256, scram-sha-512
Amazon MSK:
Key | Type | Default | Description
msk.enabled | bool | false | Enable MSK
msk.brokers | string | "" | MSK broker endpoints
msk.region | string | "eu-west-2" | AWS region
msk.authMode | string | "iam_role" | Auth: iam_role or sasl_scram
Amazon EventBridge:
Key | Type | Default | Description
eventbridge.enabled | bool | false | Enable EventBridge
eventbridge.region | string | "eu-west-2" | AWS region
eventbridge.busName | string | "kaireon-events" | Event bus name
eventbridge.authMode | string | "iam_role" | Auth: iam_role or access_key
Amazon Kinesis:
Key | Type | Default | Description
kinesis.enabled | bool | false | Enable Kinesis
kinesis.region | string | "eu-west-2" | AWS region
kinesis.streamName | string | "kaireon-events" | Stream name
kinesis.partitionKey | string | "tenantId" | Partition key

Interaction Store (Optional)

Set config.INTERACTION_STORE to activate an alternative interaction history backend. The default is pg (PostgreSQL).
DynamoDB:
Key | Type | Default | Description
dynamodb.enabled | bool | false | Enable DynamoDB
dynamodb.region | string | "eu-west-2" | AWS region
dynamodb.tableName | string | "kaireon-interactions" | Table name
dynamodb.authMode | string | "iam_role" | Auth: iam_role or access_key
Amazon Keyspaces:
Key | Type | Default | Description
keyspaces.enabled | bool | false | Enable Keyspaces
keyspaces.region | string | "eu-west-2" | AWS region
keyspaces.keyspace | string | "kaireon" | Keyspace name
ScyllaDB:
Key | Type | Default | Description
scylla.enabled | bool | false | Enable ScyllaDB
scylla.contactPoints | string | "" | Contact points (host1:9042,host2:9042)
scylla.localDatacenter | string | "datacenter1" | Local datacenter
scylla.keyspace | string | "kaireon" | Keyspace name
scylla.replicationFactor | int | 3 | Replication factor

Search Index (Optional)

Set config.SEARCH_INDEX to activate an alternative search backend. The default is pg (PostgreSQL tsvector).
Key | Type | Default | Description
opensearch.enabled | bool | false | Enable OpenSearch
opensearch.nodeUrl | string | "" | OpenSearch endpoint
opensearch.authMode | string | "basic" | Auth: basic or iam_role
opensearch.indexPrefix | string | "kaireon-" | Index name prefix

Grafana Dashboards

The chart includes 6 pre-built Grafana dashboards that are auto-provisioned from helm/dashboards/. When monitoring.grafana.enabled=true, these dashboards are available immediately after deployment.
Dashboard | File | Key Panels
API Overview | api-overview.json | Request rates, error rates, latency percentiles (p50/p95/p99), HTTP status breakdown
Decision Engine | decision-engine.json | Pipeline stage durations, candidate counts, scoring latency, cache hit rates
Decision Performance | decision-performance.json | Scoring model performance, qualification rates, conversion tracking, uplift metrics
Infrastructure | infrastructure.json | CPU and memory utilization, disk I/O, network throughput, pod restarts
Model Health | model-health.json | Model AUC tracking, drift detection, retraining triggers, prediction distributions
Worker Queues | worker-queues.json | Queue depth, processing rates, job durations, DLQ counts, retry rates
KaireonAI exposes Prometheus metrics at /api/v1/metrics. The Prometheus deployment is pre-configured to scrape this endpoint. Key metrics include kaireon_decisions_total, kaireon_decision_latency_ms, kaireon_pipeline_executions_total, and kaireon_api_requests_total.
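A quick way to confirm the endpoint is serving metrics is to list the distinct kaireon_* metric names it returns. The helper below is a sketch: its name is hypothetical, the base URL is whatever address reaches the API (for example a local port-forward), and it assumes the endpoint is reachable without authentication.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list distinct kaireon_* metric names from the
# Prometheus text exposition served at /api/v1/metrics.
# $1 is the base URL of the API, without a trailing slash.
list_kaireon_metrics() {
  local base_url="$1"
  curl -fsS "${base_url}/api/v1/metrics" \
    | grep -o '^kaireon_[A-Za-z0-9_]*' \
    | sort -u
}

# Usage (the port is an assumption; use the one your port-forward exposes):
#   list_kaireon_metrics http://localhost:3000
```

Comment lines such as # HELP and # TYPE are skipped because the pattern is anchored at the start of the line, so only sample lines contribute names.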

Example Configurations

Dev / Testing (Minimal)

Uses internal PostgreSQL and Redis with minimal resources. No external dependencies needed.
values-minimal.yaml
api:
  replicas: 1
  resources:
    requests:
      memory: "256Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "1000m"
  hpa:
    enabled: false

worker:
  replicas: 1
  resources:
    requests:
      memory: "256Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "1000m"
  keda:
    enabled: false

config:
  WORKER_CONCURRENCY: "2"

database:
  internal:
    storage: 5Gi

redis:
  internal:
    storage: 2Gi
    maxmemory: "128mb"

monitoring:
  prometheus:
    enabled: false
  grafana:
    enabled: false

EKS with RDS + ElastiCache

Production deployment on AWS with managed database and cache services.
api:
  image:
    repository: <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/kaireon-api
    tag: "latest"
  replicas: 3
  env:
    NEXTAUTH_URL: "https://app.yourdomain.com"

ingress:
  enabled: true
  className: alb
  host: app.yourdomain.com
  aws:
    certificateArn: arn:aws:acm:us-east-1:123456789:certificate/abc-123

database:
  mode: external
  external:
    host: kaireon-db.abc123.us-east-1.rds.amazonaws.com
    port: 5432
    name: kaireon
    username: kaireon
    existingSecret: kaireon-db-creds

redis:
  mode: external
  external:
    host: kaireon-cache.abc123.0001.use1.cache.amazonaws.com
    port: 6379
    tls: true
    existingSecret: kaireon-redis-creds

GKE / Self-Managed with Nginx Ingress

Use nginx ingress controller with cert-manager for automatic TLS certificates.
values-gke.yaml
ingress:
  className: nginx
  host: app.yourdomain.com
  tls:
    enabled: true
    clusterIssuer: letsencrypt-prod

database:
  mode: external
  external:
    host: kaireon-db.us-central1.cloudsql.google.com
    username: kaireon
    existingSecret: kaireon-db-creds

redis:
  mode: external
  external:
    host: 10.0.0.5
    port: 6379
    tls: false

With Kafka and DynamoDB

Production deployment using Kafka for event streaming and DynamoDB for interaction history.
values-enterprise.yaml
config:
  EVENT_PUBLISHER: kafka
  INTERACTION_STORE: dynamodb

kafka:
  enabled: true
  brokers: "broker1:9092,broker2:9092"
  tlsEnabled: true
  saslMechanism: scram-sha-256
  saslUsername: kaireon-user
  saslPassword: ""  # Use --set kafka.saslPassword=...

dynamodb:
  enabled: true
  region: us-east-1
  tableName: kaireon-interactions
  authMode: iam_role
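Because saslPassword is left empty in the values file, it has to be supplied at install time. One possible approach, assuming the password is held in an environment variable named KAFKA_SASL_PASSWORD (a name chosen here for illustration, as is the helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply values-enterprise.yaml and supply the SASL
# password from the environment instead of the values file.
deploy_enterprise() {
  if [ -z "${KAFKA_SASL_PASSWORD:-}" ]; then
    echo "KAFKA_SASL_PASSWORD is not set" >&2
    return 1
  fi
  helm upgrade --install kaireon ./helm \
    --namespace kaireon --create-namespace \
    -f values-enterprise.yaml \
    --set-string kafka.saslPassword="$KAFKA_SASL_PASSWORD" \
    --wait
}

# Usage:
#   KAFKA_SASL_PASSWORD='...' deploy_enterprise
```

Helm treats commas in --set and --set-string values as separators, so a password containing a comma needs each comma escaped with a backslash, or it should be passed through a separate values file instead.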

Upgrading

To upgrade an existing deployment to a new version:
# Build and push new images, then upgrade
helm upgrade kaireon ./helm \
  -n kaireon \
  --set api.image.tag=$(git rev-parse --short HEAD) \
  --set worker.image.tag=$(git rev-parse --short HEAD) \
  --wait
Database migrations run automatically on API pod startup via Prisma. Before upgrading, verify the migration is backward-compatible. If a migration requires downtime, scale down the API deployment first.
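Before applying an upgrade, it can help to render it without touching the cluster. The sketch below wraps helm upgrade --dry-run; the function name is hypothetical, and --set-string is used instead of --set so that a short commit hash made up only of digits is kept as a string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render the upgrade for the current commit without
# applying it, so the manifests can be reviewed first.
preview_upgrade() {
  local tag
  tag="$(git rev-parse --short HEAD)" || return 1
  helm upgrade kaireon ./helm \
    --namespace kaireon \
    --set-string api.image.tag="$tag" \
    --set-string worker.image.tag="$tag" \
    --dry-run
}

# Usage:
#   preview_upgrade | less
```

A dry run only shows the rendered manifests. It does not execute or validate the Prisma migrations, so backward compatibility of the schema change still needs to be checked separately.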
To roll back a failed upgrade:
# View release history
helm history kaireon -n kaireon

# Roll back to previous revision
helm rollback kaireon <REVISION> -n kaireon

Next Steps

  • Kubernetes Deployment: Architecture overview, troubleshooting, and database options.
  • ML Worker: Configure the ML Worker for AI-powered analysis features.
  • Infrastructure Backends: Configure Kafka, DynamoDB, OpenSearch, and other backends.
  • Cloud Deployment: One-click deployment to AWS App Runner.