KaireonAI is a Next-Best-Action decisioning platform deployed on Kubernetes via Helm. This guide covers three installation modes ranging from a quick local demo to a fully managed production stack on AWS.
| Mode | Target | Infrastructure | Time |
| --- | --- | --- | --- |
| Dev/Demo | Local / CI | In-cluster PostgreSQL (StatefulSet); Redis optional | ~5 min |
| App-Only | BYO Infra | You provide RDS; Redis/Kafka/OpenSearch optional (configured via UI) | ~10 min |
| Full Stack | Production AWS | Terraform creates VPC, EKS, RDS, S3; optional ElastiCache, MSK; Helm installs | ~30 min |

1. Prerequisites

Required Tools

| Tool | Minimum Version | Purpose |
| --- | --- | --- |
| kubectl | 1.27+ | Kubernetes CLI |
| helm | 3.12+ | Chart installation |
| make | any | Orchestrates install targets |
| bash | 4.0+ | Install scripts |
| openssl | any | Secret generation |

Additional Tools by Mode

| Tool | Version | Required For |
| --- | --- | --- |
| terraform | 1.3+ | Full Stack mode only |
| aws-cli | 2.x | Full Stack mode and AWS Secrets Manager |
| docker | 24+ | Building custom images (optional) |

Cluster Requirements

  • Dev/Demo: Any Kubernetes cluster (Docker Desktop, kind, minikube, or a remote cluster). Minimum 2 CPU / 4 GB RAM available.
  • App-Only: An existing Kubernetes cluster with network access to your managed PostgreSQL database and, if you use one, your external cache.
  • Full Stack: An AWS account with permissions to create VPC, EKS, RDS, ElastiCache, S3, IAM, and Route 53 resources.

Verify Prerequisites

make preflight
This runs scripts/preflight.sh and validates that required tools are installed and cluster connectivity works.
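The contents of scripts/preflight.sh are not reproduced in this guide. As a rough illustration only, a tool-presence check of the kind described could look like the sketch below. The helper name check_tools is hypothetical and is not taken from the repository.

```shell
# Hypothetical helper: report which tools are missing from PATH.
# This is NOT the actual scripts/preflight.sh, only a sketch of the idea.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  # Non-zero exit status when at least one tool was not found.
  return "$missing"
}

# Usage (tool names from the Required Tools table):
# check_tools kubectl helm make bash openssl
```

A check like this only confirms that the binaries exist. It does not verify the minimum versions listed above or cluster connectivity, which make preflight is described as covering.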

2. Quick Start (Dev/Demo)

The dev mode deploys everything inside the cluster. Only PostgreSQL is required — Redis and other infrastructure are optional and can be configured later from the Settings > Integrations UI.
# Clone the repository
git clone https://github.com/OWNER/kaireonai.git
cd kaireonai

# Install in dev mode
make install-dev
This command:
  1. Creates the kaireon namespace.
  2. Deploys an in-cluster PostgreSQL 16 StatefulSet (10 Gi volume).
  3. Deploys the KaireonAI API (1 replica) and Worker (1 replica).
  4. Runs database schema sync automatically.
Note: Redis, Kafka, OpenSearch, and other infrastructure are not deployed by default. The platform runs entirely on PostgreSQL. To add caching, event streaming, or search indexing, configure them from Settings > Integrations after installation.

Access the Application

Port-forward the API service to your local machine:
kubectl port-forward -n kaireon svc/kaireon-api 3000:3000
Open http://localhost:3000 in your browser.

Load Demo Data (Optional)

make seed

Override the Namespace

make install-dev NAMESPACE=kaireon-demo

3. App-Only Install

Use this mode when you already have a managed PostgreSQL database (e.g., Amazon RDS, Cloud SQL). KaireonAI installs only the application components into your existing cluster. Redis and other infrastructure are optional — configure them from Settings > Integrations after installation.

Step 1 — Create a Configuration File

Create a file named kaireon-config.yaml (or any name you prefer):
# kaireon-config.yaml -- App-Only example

database:
  mode: external
  external:
    host: my-rds-instance.abc123.us-east-1.rds.amazonaws.com
    port: 5432
    name: kaireon
    username: kaireon_app
    password: ""                # Leave empty if using existingSecret
    sslMode: require
    existingSecret: kaireon-db   # K8s secret containing the password
    secretKey: password

# Redis is optional — configure via Settings > Integrations UI if needed.
# Uncomment below to provide Redis at install time:
# redis:
#   mode: external
#   external:
#     host: my-redis.abc123.0001.use1.cache.amazonaws.com
#     port: 6379
#     tls: true
#     password: ""
#     existingSecret: kaireon-redis
#     secretKey: password

secrets:
  provider: kubernetes
  NEXTAUTH_SECRET: ""           # Will be auto-generated if empty
  JWT_SIGNING_SECRET: ""
  CONNECTOR_ENCRYPTION_KEY: ""

ingress:
  enabled: true
  className: nginx
  host: app.example.com
  tls:
    enabled: true
    secretName: kaireon-tls
    clusterIssuer: letsencrypt-prod

api:
  replicas: 3
  env:
    NEXTAUTH_URL: "https://app.example.com"
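
If you prefer to set the values under secrets: yourself rather than leave them empty, openssl (listed in the prerequisites for secret generation) can produce random strings. This is a sketch under assumptions: the guide does not state what length or encoding each secret must have, so 32 random bytes in hex is a guess, and gen_secret is a hypothetical helper name.

```shell
# Hypothetical helper: print N random bytes as hex (default 32 bytes = 64 hex chars).
# ASSUMPTION: the required length/encoding of each KaireonAI secret is not documented here.
gen_secret() {
  bytes="${1:-32}"
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$bytes"
  else
    # Fallback when openssl is unavailable: read from the kernel RNG.
    od -An -tx1 -N "$bytes" /dev/urandom | tr -d ' \n'
    echo
  fi
}

# Usage: print lines to paste into the secrets: block of kaireon-config.yaml.
# echo "JWT_SIGNING_SECRET: \"$(gen_secret)\""
# echo "CONNECTOR_ENCRYPTION_KEY: \"$(gen_secret)\""
```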

Step 2 — Create Kubernetes Secrets (If Using existingSecret)

kubectl create namespace kaireon

kubectl create secret generic kaireon-db \
  -n kaireon \
  --from-literal=password='YOUR_DB_PASSWORD'

# Only needed if you configured Redis in your config file:
# kubectl create secret generic kaireon-redis \
#   -n kaireon \
#   --from-literal=password='YOUR_REDIS_PASSWORD'

Step 3 — Install

make install-app CONFIG=kaireon-config.yaml
This deploys the API, Worker, PgBouncer, and optionally Prometheus/Grafana into the target namespace. Database schema sync runs automatically.
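To confirm that the install has settled before moving on, you can wait for the workloads to finish rolling out. The sketch below wraps the standard kubectl rollout status command. The Deployment names kaireon-api and kaireon-worker are assumptions inferred from the pod names shown later in this guide, and wait_for_rollout is a hypothetical helper.

```shell
# Hypothetical helper: block until each named Deployment finishes rolling out.
# ASSUMPTION: Deployment names are inferred from pod names, not confirmed by the chart.
wait_for_rollout() {
  ns="$1"
  shift
  for deploy in "$@"; do
    kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=180s || return 1
  done
}

# Usage:
# wait_for_rollout kaireon kaireon-api kaireon-worker
```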

4. Full Stack Install

The full-stack mode provisions all AWS infrastructure with Terraform, then deploys the application with Helm.

Step 1 — Configure AWS Credentials

aws configure
# or export AWS_PROFILE=your-profile
Ensure the IAM principal has broad permissions (VPC, EKS, RDS, ElastiCache, S3, IAM, Route 53, CloudWatch). An AdministratorAccess policy works for initial setup; scope down for production.

Step 2 — Set Terraform Variables

cd terraform/environments/production
Edit variables.tf defaults or create a terraform.tfvars file:
# terraform.tfvars
aws_region        = "us-east-1"
environment       = "production"
project_name      = "kaireon"
domain            = "app.kaireon.com"      # Leave empty to skip DNS/TLS
rds_multi_az      = true
eks_node_max_size = 6
cloudwatch_enabled = true
cloudwatch_email  = "ops@example.com"

Step 3 — Initialize and Apply Terraform

terraform init
terraform plan -out=tfplan
terraform apply tfplan
Terraform creates:
  • A VPC with public and private subnets across availability zones.
  • An EKS cluster with a managed node group (t3.large, min 2 / max 6 nodes).
  • An RDS PostgreSQL instance in private subnets.
  • An ElastiCache Redis cluster in private subnets.
  • An S3 bucket for backups.
  • IAM roles with IRSA for pod-level AWS access.
  • Route 53 records and an ACM certificate (if domain is set).
  • CloudWatch alarms and dashboards (if enabled).

Step 4 — Configure kubectl

aws eks update-kubeconfig \
  --region us-east-1 \
  --name $(terraform output -raw cluster_name)

Step 5 — Install the Application

Return to the repository root and run:
cd /path/to/kaireonai
make install
The install script reads Terraform outputs and generates the Helm values automatically, setting database.mode: external, redis.mode: external, and secrets.provider: aws-secrets-manager.

5. Post-Install Verification

Check Deployment Status

make status
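Expected output (illustrative; this example reflects three API replicas with PgBouncer and monitoring enabled, and pod name suffixes and cluster IPs will differ):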
=== Pods ===
NAME                           READY   STATUS    RESTARTS   AGE
kaireon-api-7b8c9d6e5f-abc12   1/1     Running   0          2m
kaireon-api-7b8c9d6e5f-def34   1/1     Running   0          2m
kaireon-api-7b8c9d6e5f-ghi56   1/1     Running   0          2m
kaireon-worker-4a5b6c7d-jkl78  1/1     Running   0          2m
kaireon-pgbouncer-8e9f0a1b-x9  1/1     Running   0          2m
kaireon-prometheus-0           1/1     Running   0          2m
kaireon-grafana-5c6d7e8f-mn01  1/1     Running   0          2m

=== Services ===
NAME                TYPE        CLUSTER-IP      PORT(S)
kaireon-api         ClusterIP   10.100.45.12    3000/TCP
kaireon-pgbouncer   ClusterIP   10.100.45.13    5432/TCP

=== Health Check ===
{"status":"healthy","version":"1.0.0"}

Health Check Endpoints

| Endpoint | Purpose |
| --- | --- |
| GET /api/health | Liveness check (HTTP 200 if process is running) |
| GET /api/health?detail=true | Readiness check (verifies DB and Redis connectivity) |
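
The readiness endpoint can also be polled from a script, for example while waiting for a fresh install to come up. The sketch below is illustrative only: wait_for_health is a hypothetical helper, and the URL in the usage line assumes the port-forward from section 2 is running.

```shell
# Hypothetical helper: poll a URL until curl reports success (HTTP status below 400).
# Arguments: URL, max attempts (default 30), delay in seconds (default 2).
wait_for_health() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}

# Usage:
# wait_for_health "http://localhost:3000/api/health?detail=true"
```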

Access Grafana

# Port-forward Grafana
kubectl port-forward -n kaireon svc/kaireon-grafana 3001:3000

# Open http://localhost:3001
# Default credentials: admin / (value of monitoring.grafana.adminPassword or auto-generated)
Grafana ships with pre-configured dashboards for API latency, worker queue depth, database connections, and Redis memory.

Access Prometheus

kubectl port-forward -n kaireon svc/kaireon-prometheus 9090:9090

Tail Logs

make logs          # API logs
make logs-worker   # Worker logs

6. Configuration Reference

The Helm chart is configured through values.yaml or a custom config file passed via CONFIG=. Below is a complete reference of all options.

Global

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| namespace | string | kaireon | Kubernetes namespace for all resources |

API

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| api.image.repository | string | ghcr.io/OWNER/kaireon-api | Container image repository |
| api.image.tag | string | latest | Image tag |
| api.image.pullPolicy | string | IfNotPresent | Image pull policy |
| api.replicas | int | 3 | Number of API replicas |
| api.resources.requests.memory | string | 512Mi | Memory request |
| api.resources.requests.cpu | string | 500m | CPU request |
| api.resources.limits.memory | string | 2Gi | Memory limit |
| api.resources.limits.cpu | string | 2000m | CPU limit |
| api.hpa.enabled | bool | true | Enable Horizontal Pod Autoscaler |
| api.hpa.minReplicas | int | 3 | HPA minimum replicas |
| api.hpa.maxReplicas | int | 20 | HPA maximum replicas |
| api.hpa.targetCPUUtilization | int | 70 | Target CPU % for scaling |
| api.env.NEXTAUTH_URL | string | https://app.kaireon.com | Public URL for NextAuth callbacks |
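
To get a feel for how the HPA settings interact, the standard Kubernetes scaling rule is desired = ceil(currentReplicas × currentCPU / targetCPU), clamped to the configured minimum and maximum. The sketch below applies that rule with the chart defaults. It is an estimate only: it ignores the controller's tolerance band and stabilization window, and hpa_desired is a hypothetical helper, not part of the chart.

```shell
# Estimate the replica count the HPA would request.
# Arguments: current replicas, observed CPU %, target CPU %, min replicas, max replicas.
hpa_desired() {
  current="$1"; cpu="$2"; target="$3"; min="$4"; max="$5"
  # Integer ceiling of (current * cpu / target).
  desired=$(( (current * cpu + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

hpa_desired 3 140 70 3 20   # 3 replicas at 140% CPU with chart defaults: prints 6
hpa_desired 3 35 70 3 20    # low load is clamped to minReplicas: prints 3
```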

Worker

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| worker.image.repository | string | ghcr.io/OWNER/kaireon-worker | Container image repository |
| worker.image.tag | string | latest | Image tag |
| worker.replicas | int | 2 | Number of worker replicas |
| worker.resources.requests.memory | string | 1Gi | Memory request |
| worker.resources.requests.cpu | string | 1000m | CPU request |
| worker.resources.limits.memory | string | 4Gi | Memory limit |
| worker.resources.limits.cpu | string | 4000m | CPU limit |
| worker.keda.enabled | bool | true | Enable KEDA autoscaling |
| worker.keda.minReplicas | int | 1 | KEDA minimum replicas |
| worker.keda.maxReplicas | int | 10 | KEDA maximum replicas |
| worker.keda.queueThreshold | string | "5" | Queue length threshold for scaling |

Config

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| config.LOG_LEVEL | string | info | Application log level (debug, info, warn, error) |
| config.WORKER_CONCURRENCY | string | "5" | Max concurrent jobs per worker |
| config.NODE_ENV | string | production | Node environment |
| config.EVENT_PUBLISHER | string | none | Event bus backend (none, kafka, redpanda) |
| config.CACHE_STORE | string | none | Cache backend (none, redis, dragonfly) |
| config.INTERACTION_STORE | string | pg | Interaction storage backend |
| config.SEARCH_INDEX | string | pg | Search index backend |

Database

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| database.mode | string | internal | internal (StatefulSet) or external (managed RDS) |
| database.internal.image | string | postgres:16-alpine | PostgreSQL image for internal mode |
| database.internal.storage | string | 10Gi | PVC size |
| database.internal.resources.* | object | see values.yaml | Resource requests/limits |
| database.internal.username | string | kaireon | Database user |
| database.internal.password | string | "" | Password (auto-generated if empty) |
| database.internal.database | string | kaireon | Database name |
| database.external.host | string | "" | RDS/managed DB hostname |
| database.external.port | int | 5432 | Database port |
| database.external.name | string | kaireon | Database name |
| database.external.username | string | "" | Database user |
| database.external.password | string | "" | Password (use existingSecret instead for production) |
| database.external.sslMode | string | require | SSL mode (disable, require, verify-full) |
| database.external.existingSecret | string | "" | Name of a K8s Secret containing the password |
| database.external.secretKey | string | password | Key within the K8s Secret |

Redis

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| redis.mode | string | internal | internal (StatefulSet) or external (ElastiCache) |
| redis.internal.image.repository | string | redis | Redis image |
| redis.internal.image.tag | string | 7-alpine | Redis image tag |
| redis.internal.storage | string | 10Gi | PVC size |
| redis.internal.maxmemory | string | 512mb | Redis maxmemory setting |
| redis.external.host | string | "" | ElastiCache/managed Redis hostname |
| redis.external.port | int | 6379 | Redis port |
| redis.external.tls | bool | true | Enable TLS for external Redis |
| redis.external.password | string | "" | Password (use existingSecret for production) |
| redis.external.existingSecret | string | "" | K8s Secret name |
| redis.external.secretKey | string | password | Key within the K8s Secret |

Secrets

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| secrets.provider | string | kubernetes | kubernetes or aws-secrets-manager |
| secrets.NEXTAUTH_SECRET | string | "" | NextAuth session secret (auto-generated if empty) |
| secrets.JWT_SIGNING_SECRET | string | "" | JWT signing key |
| secrets.CONNECTOR_ENCRYPTION_KEY | string | "" | Encryption key for connector credentials |
| secrets.awsSecretsManager.secretName | string | kaireon/production | AWS Secrets Manager secret name |
| secrets.awsSecretsManager.roleArn | string | "" | IAM role ARN for IRSA |
| secrets.awsSecretsManager.region | string | us-east-1 | AWS region |

Ingress

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| ingress.enabled | bool | true | Create an Ingress resource |
| ingress.className | string | nginx | Ingress class (nginx, alb, traefik) |
| ingress.host | string | app.kaireon.com | Hostname for the application |
| ingress.tls.enabled | bool | true | Enable TLS termination |
| ingress.tls.secretName | string | kaireon-tls | TLS certificate secret name |
| ingress.tls.clusterIssuer | string | letsencrypt-prod | cert-manager ClusterIssuer |
| ingress.annotations | map | see values.yaml | Additional Ingress annotations (rate limits, timeouts) |

PgBouncer

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| pgbouncer.enabled | bool | true | Deploy PgBouncer connection pooler |
| pgbouncer.poolMode | string | transaction | Pool mode (session, transaction, statement) |
| pgbouncer.defaultPoolSize | int | 25 | Connections per user/database pair |
| pgbouncer.maxClientConn | int | 1000 | Maximum client connections |
| pgbouncer.maxDbConnections | int | 25 | Maximum server connections |

Observability

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| observability.metrics.provider | string | prometheus | prometheus, cloudwatch, datadog, newrelic |
| observability.metrics.cloudwatch.enabled | bool | false | Push metrics to CloudWatch |
| observability.metrics.cloudwatch.namespace | string | KaireonAI | CloudWatch metric namespace |
| observability.metrics.datadog.enabled | bool | false | Push metrics to Datadog |
| observability.metrics.datadog.apiKey | string | "" | Datadog API key |
| observability.metrics.newrelic.enabled | bool | false | Push metrics to New Relic |
| observability.metrics.newrelic.licenseKey | string | "" | New Relic license key |
| observability.tracing.provider | string | none | none, jaeger, xray, otel-collector |
| observability.tracing.endpoint | string | "" | Tracing collector endpoint |
| observability.logging.provider | string | stdout | stdout, cloudwatch-logs, datadog, elasticsearch |
| observability.logging.cloudwatch.logGroup | string | "" | CloudWatch log group |
| observability.logging.cloudwatch.region | string | "" | CloudWatch region |
| observability.alerting.provider | string | none | none, pagerduty, slack, opsgenie, sns |
| observability.alerting.pagerduty.integrationKey | string | "" | PagerDuty integration key |
| observability.alerting.slack.webhookUrl | string | "" | Slack incoming webhook URL |
| observability.alerting.slack.channel | string | "" | Slack channel |
| observability.alerting.sns.topicArn | string | "" | SNS topic ARN |

Monitoring (In-Cluster)

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| monitoring.prometheus.enabled | bool | true | Deploy in-cluster Prometheus |
| monitoring.prometheus.retention | string | 7d | Metrics retention period |
| monitoring.grafana.enabled | bool | true | Deploy in-cluster Grafana |
| monitoring.grafana.adminUser | string | admin | Grafana admin username |
| monitoring.grafana.adminPassword | string | "" | Grafana admin password (auto-generated if empty) |

7. Upgrading

Rolling Upgrade

To upgrade KaireonAI to a new version:
# Pull the latest chart / image tags
git pull origin main

# Upgrade with existing values
make upgrade
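This runs helm upgrade --reuse-values, which keeps the values from the previous release and performs a rolling update. The API deployment uses a RollingUpdate strategy by default, so old pods are replaced gradually and the service stays available during the upgrade.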

Upgrade with New Configuration

To change configuration during an upgrade, edit your config file and pass it:
helm upgrade kaireon ./helm/kaireon \
  -n kaireon \
  -f kaireon-config.yaml
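
Before applying a changed config file, it can help to record what the release currently uses so that a bad upgrade can be reverted. This is a minimal sketch built on standard Helm commands. The release name kaireon and namespace kaireon match the command above, while the helper names and the revision number in the usage lines are placeholders.

```shell
# Show the user-supplied values of a release and its recent revision history.
# Arguments: release name, namespace.
release_snapshot() {
  helm get values "$1" -n "$2" -o yaml
  helm history "$1" -n "$2" --max 5
}

# Roll a release back to an earlier revision and wait for it to become ready.
# Arguments: release name, namespace, revision number (from helm history).
release_rollback() {
  helm rollback "$1" "$3" -n "$2" --wait
}

# Usage:
# release_snapshot kaireon kaireon > before-upgrade.txt
# release_rollback kaireon kaireon 4
```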

Database Schema Sync

Schema changes are applied automatically during install and upgrade via prisma db push. This project uses Prisma 7’s push-based schema management (not migration files). To run manually:
make migrate
This executes scripts/migrate.sh inside the cluster, which runs npx prisma db push --skip-generate against the connected database.
Note: prisma db push compares the current schema.prisma against the live database and applies changes directly. There are no migration files to track.

Version Pinning

Pin the image tag in your config to control rollouts:
api:
  image:
    tag: "1.2.3"
worker:
  image:
    tag: "1.2.3"
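
To check that pinning took effect, you can list the images the running Deployments actually use and flag any still on a floating tag. The sketch below is an illustration: find_unpinned is a hypothetical helper that reads tab-separated name and image pairs, and the kubectl query in the usage lines assumes each Deployment's first container is the application image.

```shell
# Read "name<TAB>image" lines on stdin and print workloads on a floating tag.
find_unpinned() {
  tab=$(printf '\t')
  while IFS="$tab" read -r name image; do
    [ -n "$name" ] || continue
    case "$image" in
      *:latest) echo "$name uses $image" ;;   # explicit floating tag
      *:*) ;;                                 # pinned to a tag or digest
      *) echo "$name uses $image (no tag, defaults to latest)" ;;
    esac
  done
}

# Usage:
# kubectl get deploy -n kaireon \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[0].image}{"\n"}{end}' \
#   | find_unpinned
```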

8. Uninstalling

Remove the Application (Keep Infrastructure)

make uninstall
This runs helm uninstall and removes all KaireonAI pods, services, and config maps. Persistent volumes (database, Redis) are retained by default.

Remove Application and Namespace

make destroy
This removes the Helm release and deletes the entire namespace, including any persistent volume claims.

Destroy AWS Infrastructure (Full Stack Only)

After removing the application, destroy Terraform-managed resources:
cd terraform/environments/production
terraform destroy
Warning: This permanently deletes the VPC, EKS cluster, RDS database, ElastiCache cluster, and S3 bucket. Ensure you have backed up any data before proceeding.

Create a Backup Before Uninstalling

make backup
make uninstall
To restore later:
make restore FILE=path/to/backup.sql.gz
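
Since uninstalling is destructive, it is worth confirming that the backup file is readable before you rely on it. The sketch below is a basic sanity check only: verify_backup is a hypothetical helper, it assumes the backup is a gzip-compressed file as the .sql.gz name suggests, and it does not validate the SQL inside.

```shell
# Hypothetical helper: check that a backup file exists, is non-empty,
# and passes gzip's integrity test. Does NOT validate the SQL content.
verify_backup() {
  file="$1"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  if ! gzip -t "$file" 2>/dev/null; then
    echo "corrupt gzip: $file" >&2
    return 1
  fi
  echo "ok: $file"
}

# Usage:
# make backup
# verify_backup path/to/backup.sql.gz && make uninstall
```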

9. Troubleshooting FAQ

Pods stuck in Pending state

Symptom: kubectl get pods -n kaireon shows pods in Pending status.
Causes and fixes:
  1. Insufficient cluster resources. Check node capacity:
    kubectl describe nodes | grep -A 5 "Allocated resources"
    
    Add more nodes or reduce resource requests in your config.
  2. PVC not binding (dev mode). The cluster may lack a default StorageClass:
    kubectl get storageclass
    
    If empty, install a provisioner (e.g., hostpath-provisioner for local clusters) or set one as default:
    kubectl patch storageclass standard -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
    
  3. Node selector or taint mismatch. Verify there are no taints preventing scheduling:
    kubectl describe pod <pod-name> -n kaireon | grep -A 10 Events
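
The checks above can be narrowed to the affected pods in one pass. A minimal sketch using standard kubectl options follows; pending_report is a hypothetical helper name, and the 20-line limit on events is arbitrary.

```shell
# List only Pending pods in a namespace, then show the most recent events,
# which usually name the scheduling or volume-binding problem.
pending_report() {
  ns="$1"
  kubectl get pods -n "$ns" --field-selector=status.phase=Pending
  kubectl get events -n "$ns" --sort-by=.lastTimestamp | tail -n 20
}

# Usage:
# pending_report kaireon
```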
    

Database connection refused

Symptom: API pods crash with ECONNREFUSED or connection refused errors in logs.
Fixes:
  1. Internal mode: Confirm the PostgreSQL pod is running:
    kubectl get pods -n kaireon -l app=kaireon-postgres
    
    Check its logs:
    kubectl logs -n kaireon -l app=kaireon-postgres
    
  2. External mode: Verify the host is reachable from inside the cluster:
    kubectl run dbtest --rm -it --restart=Never --image=postgres:16-alpine -n kaireon -- \
      pg_isready -h <DB_HOST> -p 5432
    
  3. Security groups (AWS). Ensure the RDS security group allows inbound on port 5432 from the EKS node security group. In full-stack mode, Terraform configures this automatically.
  4. PgBouncer misconfiguration. If PgBouncer is enabled, verify it can reach the database:
    kubectl logs -n kaireon -l app=kaireon-pgbouncer
    

Redis connection errors

Symptom: Workers fail to start or events are not published.
Fixes:
  1. Internal mode: Confirm the Redis pod is running:
    kubectl get pods -n kaireon -l app=kaireon-redis
    
  2. External mode: Test connectivity:
    kubectl run redistest --rm -it --restart=Never --image=redis:7-alpine -n kaireon -- \
      redis-cli -h <REDIS_HOST> -p 6379 --tls PING
    
  3. TLS mismatch. If your external Redis does not use TLS, set redis.external.tls: false.
  4. Authentication. Ensure the password in your K8s secret matches the Redis AUTH password:
    kubectl get secret kaireon-redis -n kaireon -o jsonpath='{.data.password}' | base64 -d
    

Ingress not working / 404 errors

Symptom: The application is unreachable via the configured hostname.
Fixes:
  1. Ingress controller not installed. Verify an ingress controller is running:
    kubectl get pods -n ingress-nginx
    
    If missing, install one:
    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    helm install ingress-nginx ingress-nginx/ingress-nginx -n ingress-nginx --create-namespace
    
  2. Ingress class mismatch. Ensure ingress.className matches your controller:
    kubectl get ingressclass
    
  3. DNS not pointing to the load balancer. Get the external IP:
    kubectl get svc -n ingress-nginx
    
    Create a DNS A/CNAME record pointing your domain to that address.
  4. TLS certificate not ready. Check cert-manager:
    kubectl get certificate -n kaireon
    kubectl describe certificate kaireon-tls -n kaireon
    

Helm install fails with “namespace not found”

Fix: Create the namespace first:
kubectl create namespace kaireon
Or let the chart create it (the chart includes a namespace.yaml template that handles this).

Migrations fail during install

Symptom: Install completes but the app shows schema errors.
Fixes:
  1. Run migrations manually:
    make migrate
    
  2. Check migration logs:
    kubectl logs -n kaireon -l job-name=kaireon-migrate
    
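  3. Verify database connectivity and permissions. The database user that runs the schema sync must be able to create and alter tables and indexes; in PostgreSQL this means the CREATE privilege on the target schema and ownership of the tables it modifies.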

OOMKilled pods

Symptom: Pods restart with OOMKilled status.
Fix: Increase memory limits in your config:
api:
  resources:
    limits:
      memory: "4Gi"

worker:
  resources:
    limits:
      memory: "8Gi"
Then upgrade:
make upgrade
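
To see which pods were actually killed for memory before raising limits, you can query the last termination reason of each container. This is a sketch using a standard kubectl JSONPath query; oom_pods is a hypothetical helper name.

```shell
# Print the names of pods in a namespace whose last container
# termination reason was OOMKilled.
oom_pods() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
    | awk -F '\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Usage:
# oom_pods kaireon
```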

AWS Secrets Manager errors (Full Stack mode)

Symptom: Pods fail with “AccessDeniedException” when fetching secrets.
Fixes:
  1. Verify the IRSA role ARN is correct in your config:
    terraform output helm_values
    
  2. Confirm the service account is annotated:
    kubectl get sa -n kaireon kaireon-api -o yaml | grep eks.amazonaws.com
    
  3. Check the IAM trust policy allows the OIDC provider for your cluster.
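
For the trust policy check, one approach is to print the role's trust document next to the cluster's OIDC issuer and compare them by eye. The sketch below uses standard AWS CLI commands. The helper names, the example ARN, and the cluster name are placeholders, not values from this project.

```shell
# Extract the role name from an IAM role ARN (the text after the last "/").
role_name_from_arn() {
  echo "${1##*/}"
}

# Print the role's trust policy, then the cluster's OIDC issuer URL.
# Arguments: role ARN, EKS cluster name.
show_irsa_trust() {
  role_arn="$1"
  cluster="$2"
  aws iam get-role --role-name "$(role_name_from_arn "$role_arn")" \
    --query 'Role.AssumeRolePolicyDocument' --output json
  aws eks describe-cluster --name "$cluster" \
    --query 'cluster.identity.oidc.issuer' --output text
}

# Usage (placeholder ARN and cluster name):
# show_irsa_trust arn:aws:iam::123456789012:role/kaireon-irsa my-cluster
```

In a working IRSA setup, the Federated principal in the trust policy references the same issuer host that the second command prints (without the https:// prefix), and the sub condition names the service account, which for the check above would be system:serviceaccount:kaireon:kaireon-api.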

How to open a shell in the API pod

make shell
This drops you into /bin/sh inside the running API container, useful for debugging environment variables and network connectivity.

How to view all available Make targets

make help