KaireonAI is a Next-Best-Action decisioning platform deployed on Kubernetes via Helm. This guide covers three installation modes ranging from a quick local demo to a fully managed production stack on AWS.
| Mode | Target | Infrastructure | Time |
|---|---|---|---|
| Dev/Demo | Local / CI | In-cluster PostgreSQL (StatefulSet); Redis optional | ~5 min |
| App-Only | BYO Infra | You provide RDS; Redis/Kafka/OpenSearch optional (configured via UI) | ~10 min |
| Full Stack | Production AWS | Terraform creates VPC, EKS, RDS, S3; optional ElastiCache, MSK; Helm installs | ~30 min |
1. Prerequisites
| Tool | Minimum Version | Purpose |
|---|---|---|
| kubectl | 1.27+ | Kubernetes CLI |
| helm | 3.12+ | Chart installation |
| make | any | Orchestrates install targets |
| bash | 4.0+ | Install scripts |
| openssl | any | Secret generation |
| Tool | Version | Required For |
|---|---|---|
| terraform | 1.3+ | Full Stack mode only |
| aws-cli | 2.x | Full Stack mode and AWS Secrets Manager |
| docker | 24+ | Building custom images (optional) |
Cluster Requirements
- Dev/Demo: Any Kubernetes cluster (Docker Desktop, kind, minikube, or a remote cluster). Minimum 2 CPU / 4 GB RAM available.
- App-Only: An existing Kubernetes cluster with access to your managed database and cache.
- Full Stack: An AWS account with permissions to create VPC, EKS, RDS, ElastiCache, S3, IAM, and Route 53 resources.
Verify Prerequisites
The preflight check runs scripts/preflight.sh, which validates that the required tools are installed and that the cluster is reachable.
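For orientation, a check of this kind can be sketched as follows. This is not the content of scripts/preflight.sh: the helper names version_ge and check_tool are hypothetical, and the sketch assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a preflight check; not the project's scripts/preflight.sh.

# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on `sort -V` (version sort) to order the two version strings.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME: reports whether NAME is available on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Walk the required tools from the prerequisites table; keep going after a miss.
missing=0
for tool in kubectl helm make bash openssl; do
  check_tool "$tool" || missing=1
done
echo "preflight finished (missing=$missing)"
```

A version gate could then be written as `version_ge "$installed" 3.12 || echo "helm is too old"`, where `$installed` holds the version string reported by the tool.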
2. Quick Start (Dev/Demo)
The dev mode deploys everything inside the cluster. Only PostgreSQL is required — Redis and other infrastructure are optional and can be configured later from the Settings > Integrations UI.
# Clone the repository
git clone https://github.com/OWNER/kaireonai.git
cd kaireonai
# Install in dev mode
make install-dev
This command:
- Creates the kaireon namespace.
- Deploys an in-cluster PostgreSQL 16 StatefulSet (10 Gi volume).
- Deploys the KaireonAI API (1 replica) and Worker (1 replica).
- Runs database schema sync automatically.
Note: Redis, Kafka, OpenSearch, and other infrastructure are not deployed by default. The platform runs entirely on PostgreSQL. To add caching, event streaming, or search indexing, configure them from Settings > Integrations after installation.
Access the Application
Port-forward the API service to your local machine:
kubectl port-forward -n kaireon svc/kaireon-api 3000:3000
Open http://localhost:3000 in your browser.
Load Demo Data (Optional)
Override the Namespace
make install-dev NAMESPACE=kaireon-demo
3. App-Only Install
Use this mode when you already have a managed PostgreSQL database (e.g., Amazon RDS, Cloud SQL). KaireonAI installs only the application components into your existing cluster. Redis and other infrastructure are optional — configure them from Settings > Integrations after installation.
Step 1 — Create a Configuration File
Create a file named kaireon-config.yaml (or any name you prefer):
# kaireon-config.yaml -- App-Only example
database:
  mode: external
  external:
    host: my-rds-instance.abc123.us-east-1.rds.amazonaws.com
    port: 5432
    name: kaireon
    username: kaireon_app
    password: "" # Leave empty if using existingSecret
    sslMode: require
    existingSecret: kaireon-db # K8s secret containing the password
    secretKey: password
# Redis is optional; configure via Settings > Integrations UI if needed.
# Uncomment below to provide Redis at install time:
# redis:
#   mode: external
#   external:
#     host: my-redis.abc123.0001.use1.cache.amazonaws.com
#     port: 6379
#     tls: true
#     password: ""
#     existingSecret: kaireon-redis
#     secretKey: password
secrets:
  provider: kubernetes
  NEXTAUTH_SECRET: "" # Will be auto-generated if empty
  JWT_SIGNING_SECRET: ""
  CONNECTOR_ENCRYPTION_KEY: ""
ingress:
  enabled: true
  className: nginx
  host: app.example.com
  tls:
    enabled: true
    secretName: kaireon-tls
    clusterIssuer: letsencrypt-prod
api:
  replicas: 3
  env:
    NEXTAUTH_URL: "https://app.example.com"
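If you would rather set the three values under secrets explicitly than leave them empty, openssl (listed in the prerequisites for secret generation) can produce random strings. The sketch below is an assumption, not the chart's own generator: gen_secret is a hypothetical helper, and the choice of 32 random bytes in hex is illustrative. The chart may expect a specific length or encoding, particularly for CONNECTOR_ENCRYPTION_KEY, so check values.yaml before relying on it.

```shell
# gen_secret: print 32 random bytes as 64 hexadecimal characters.
# Uses openssl when present and falls back to /dev/urandom otherwise.
gen_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
    echo
  fi
}

NEXTAUTH_SECRET="$(gen_secret)"
JWT_SIGNING_SECRET="$(gen_secret)"
CONNECTOR_ENCRYPTION_KEY="$(gen_secret)"

# Print only the lengths so the values do not end up in terminal scrollback.
echo "NEXTAUTH_SECRET length: ${#NEXTAUTH_SECRET}"
echo "JWT_SIGNING_SECRET length: ${#JWT_SIGNING_SECRET}"
echo "CONNECTOR_ENCRYPTION_KEY length: ${#CONNECTOR_ENCRYPTION_KEY}"
```

The resulting values can be copied into the secrets section of kaireon-config.yaml.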
Step 2 — Create Kubernetes Secrets (If Using existingSecret)
kubectl create namespace kaireon
kubectl create secret generic kaireon-db \
  -n kaireon \
  --from-literal=password='YOUR_DB_PASSWORD'
# Only needed if you configured Redis in your config file:
# kubectl create secret generic kaireon-redis \
#   -n kaireon \
#   --from-literal=password='YOUR_REDIS_PASSWORD'
Step 3 — Install
make install-app CONFIG=kaireon-config.yaml
This deploys the API, Worker, PgBouncer, and optionally Prometheus/Grafana into the target namespace. Database schema sync runs automatically.
4. Full Stack Install
The full-stack mode provisions all AWS infrastructure with Terraform, then deploys the application with Helm.
aws configure
# or export AWS_PROFILE=your-profile
Ensure the IAM principal has broad permissions (VPC, EKS, RDS, ElastiCache, S3, IAM, Route 53, CloudWatch). An AdministratorAccess policy works for initial setup; scope down for production.
cd terraform/environments/production
Edit variables.tf defaults or create a terraform.tfvars file:
# terraform.tfvars
aws_region = "us-east-1"
environment = "production"
project_name = "kaireon"
domain = "app.kaireon.com" # Leave empty to skip DNS/TLS
rds_multi_az = true
eks_node_max_size = 6
cloudwatch_enabled = true
cloudwatch_email = "ops@example.com"
terraform init
terraform plan -out=tfplan
terraform apply tfplan
Terraform creates:
- A VPC with public and private subnets across availability zones.
- An EKS cluster with a managed node group (t3.large, min 2 / max 6 nodes).
- An RDS PostgreSQL instance in private subnets.
- An ElastiCache Redis cluster in private subnets.
- An S3 bucket for backups.
- IAM roles with IRSA for pod-level AWS access.
- Route 53 records and an ACM certificate (if domain is set).
- CloudWatch alarms and dashboards (if enabled).
aws eks update-kubeconfig \
  --region us-east-1 \
  --name $(terraform output -raw cluster_name)
Step 5 — Install the Application
Return to the repository root and run:
cd /path/to/kaireonai
make install
The install script reads Terraform outputs and generates the Helm values automatically, setting database.mode: external, redis.mode: external, and secrets.provider: aws-secrets-manager.
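The shape of the generated values can be sketched as below. This is an illustration rather than the install script itself: render_values is a hypothetical helper, the hostnames and role ARN are placeholders, and the inputs are passed as arguments because this guide does not list the Terraform output names the real script reads. The keys are the ones documented in the Configuration Reference.

```shell
# render_values DB_HOST REDIS_HOST ROLE_ARN
# Prints a Helm values fragment for an externally managed database and Redis,
# with secrets resolved through AWS Secrets Manager.
render_values() {
  cat <<EOF
database:
  mode: external
  external:
    host: $1
redis:
  mode: external
  external:
    host: $2
secrets:
  provider: aws-secrets-manager
  awsSecretsManager:
    roleArn: $3
EOF
}

# Example with placeholder values (not real resources).
render_values db.example.internal redis.example.internal \
  arn:aws:iam::123456789012:role/example
```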
5. Post-Install Verification
Check Deployment Status
Expected output:
=== Pods ===
NAME READY STATUS RESTARTS AGE
kaireon-api-7b8c9d6e5f-abc12 1/1 Running 0 2m
kaireon-api-7b8c9d6e5f-def34 1/1 Running 0 2m
kaireon-api-7b8c9d6e5f-ghi56 1/1 Running 0 2m
kaireon-worker-4a5b6c7d-jkl78 1/1 Running 0 2m
kaireon-pgbouncer-8e9f0a1b-x9 1/1 Running 0 2m
kaireon-prometheus-0 1/1 Running 0 2m
kaireon-grafana-5c6d7e8f-mn01 1/1 Running 0 2m
=== Services ===
NAME TYPE CLUSTER-IP PORT(S)
kaireon-api ClusterIP 10.100.45.12 3000/TCP
kaireon-pgbouncer ClusterIP 10.100.45.13 5432/TCP
=== Health Check ===
{"status":"healthy","version":"1.0.0"}
Health Check Endpoints
| Endpoint | Purpose |
|---|---|
| GET /api/health | Liveness check (HTTP 200 if process is running) |
| GET /api/health?detail=true | Readiness check (verifies DB and Redis connectivity) |
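With the port-forward from the Quick Start active, either endpoint can be probed with curl. The helper below is a sketch: is_healthy is a hypothetical name, and it assumes the response body has the shape shown in the expected output earlier in this section (a JSON object with a status field).

```shell
# is_healthy BODY: succeeds when the JSON body reports status "healthy".
is_healthy() {
  printf '%s' "$1" | grep -q '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Sample body copied from the expected output in this section.
sample='{"status":"healthy","version":"1.0.0"}'
if is_healthy "$sample"; then
  echo "healthy"
else
  echo "unhealthy"
fi

# Against a live instance (requires the port-forward on localhost:3000):
#   is_healthy "$(curl -fsS 'http://localhost:3000/api/health?detail=true')"
```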
Access Grafana
# Port-forward Grafana
kubectl port-forward -n kaireon svc/kaireon-grafana 3001:3000
# Open http://localhost:3001
# Default credentials: admin / (value of monitoring.grafana.adminPassword or auto-generated)
Grafana ships with pre-configured dashboards for API latency, worker queue depth, database connections, and Redis memory.
Access Prometheus
kubectl port-forward -n kaireon svc/kaireon-prometheus 9090:9090
Tail Logs
make logs # API logs
make logs-worker # Worker logs
Install provenance bundle signing (cosign) — required for production
KaireonAI ships every /api/v1/decisions/:id/provenance response with a
detached cosign signature in the X-Provenance-Signature header.
Until you install a signing key the header reads unsigned and
downstream verifiers will reject the bundle. The fail-soft is
intentional (a missing key never breaks the response) but is not a
production posture — install before going live.
There are exactly two supported install paths. Pick whichever matches
your topology:
- Cloud (AWS Secrets Manager): recommended for App Runner, ECS,
EKS, or any AWS-resident deployment. The operator runbook with the
exact commands lives at
tools/runbooks/cosign-key-rollout.md.
- Self-host (local key file): for VM, on-prem, Docker Compose, or
single-host installs.
End-to-end install steps and verification commands are documented in the
Provenance signing install guide.
Both paths set the same two env vars (COSIGN_KEY, COSIGN_PASSWORD)
on the runtime container — what differs is where those values live at
rest. The public verification key for KaireonAI’s hosted
playground.kaireonai.com instance is published at
kaireonai-docs/security/cosign.pub.
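Whichever path you choose, a quick way to confirm the result is to check for both variables from a shell inside the runtime container. The function below is a minimal sketch; cosign_env_status is a hypothetical name, and it only checks that the variables are set, not that the key is valid.

```shell
# cosign_env_status: reports whether both signing variables are present
# in the environment of the current process.
cosign_env_status() {
  for name in COSIGN_KEY COSIGN_PASSWORD; do
    if [ -z "$(printenv "$name")" ]; then
      echo "not set: $name"
      return 1
    fi
  done
  echo "signing variables present"
}

# Report the status without aborting the surrounding script.
cosign_env_status || true
```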
6. Configuration Reference
The Helm chart is configured through values.yaml or a custom config file passed via CONFIG=. Below is a complete reference of all options.
Global
| Key | Type | Default | Description |
|---|---|---|---|
| namespace | string | kaireon | Kubernetes namespace for all resources |
API
| Key | Type | Default | Description |
|---|---|---|---|
| api.image.repository | string | ghcr.io/OWNER/kaireon-api | Container image repository |
| api.image.tag | string | latest | Image tag |
| api.image.pullPolicy | string | IfNotPresent | Image pull policy |
| api.replicas | int | 3 | Number of API replicas |
| api.resources.requests.memory | string | 512Mi | Memory request |
| api.resources.requests.cpu | string | 500m | CPU request |
| api.resources.limits.memory | string | 2Gi | Memory limit |
| api.resources.limits.cpu | string | 2000m | CPU limit |
| api.hpa.enabled | bool | true | Enable Horizontal Pod Autoscaler |
| api.hpa.minReplicas | int | 3 | HPA minimum replicas |
| api.hpa.maxReplicas | int | 20 | HPA maximum replicas |
| api.hpa.targetCPUUtilization | int | 70 | Target CPU % for scaling |
| api.env.NEXTAUTH_URL | string | https://app.kaireon.com | Public URL for NextAuth callbacks |
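To see how the three HPA settings interact: the Kubernetes Horizontal Pod Autoscaler computes the desired replica count as ceil(currentReplicas × currentUtilization / targetUtilization) and then clamps the result to the configured minimum and maximum. The sketch below reproduces that arithmetic with the defaults from the table. desired_replicas is a hypothetical helper, and it ignores the autoscaler's tolerance band and stabilization windows.

```shell
# desired_replicas CURRENT CURRENT_CPU_PCT TARGET_CPU_PCT MIN MAX
# Prints the replica count the HPA formula would settle on.
desired_replicas() {
  local current=$1 cpu=$2 target=$3 min=$4 max=$5
  # Integer ceiling of current * cpu / target.
  local want=$(( (current * cpu + target - 1) / target ))
  if [ "$want" -lt "$min" ]; then want=$min; fi
  if [ "$want" -gt "$max" ]; then want=$max; fi
  echo "$want"
}

desired_replicas 3 140 70 3 20    # CPU at twice the target: prints 6
desired_replicas 3 35 70 3 20     # light load, clamped to the minimum: prints 3
desired_replicas 10 200 70 3 20   # heavy load, clamped to the maximum: prints 20
```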
Worker
| Key | Type | Default | Description |
|---|---|---|---|
| worker.image.repository | string | ghcr.io/OWNER/kaireon-worker | Container image repository |
| worker.image.tag | string | latest | Image tag |
| worker.replicas | int | 2 | Number of worker replicas |
| worker.resources.requests.memory | string | 1Gi | Memory request |
| worker.resources.requests.cpu | string | 1000m | CPU request |
| worker.resources.limits.memory | string | 4Gi | Memory limit |
| worker.resources.limits.cpu | string | 4000m | CPU limit |
| worker.keda.enabled | bool | true | Enable KEDA autoscaling |
| worker.keda.minReplicas | int | 1 | KEDA minimum replicas |
| worker.keda.maxReplicas | int | 10 | KEDA maximum replicas |
| worker.keda.queueThreshold | string | "5" | Queue length threshold for scaling |
Config
| Key | Type | Default | Description |
|---|---|---|---|
| config.LOG_LEVEL | string | info | Application log level (debug, info, warn, error) |
| config.WORKER_CONCURRENCY | string | "5" | Max concurrent jobs per worker |
| config.NODE_ENV | string | production | Node environment |
| config.EVENT_PUBLISHER | string | none | Event bus backend (none, kafka, redpanda) |
| config.CACHE_STORE | string | none | Cache backend (none, redis, dragonfly) |
| config.INTERACTION_STORE | string | pg | Interaction storage backend |
| config.SEARCH_INDEX | string | pg | Search index backend |
Database
| Key | Type | Default | Description |
|---|---|---|---|
| database.mode | string | internal | internal (StatefulSet) or external (managed RDS) |
| database.internal.image | string | postgres:16-alpine | PostgreSQL image for internal mode |
| database.internal.storage | string | 10Gi | PVC size |
| database.internal.resources.* | object | see values.yaml | Resource requests/limits |
| database.internal.username | string | kaireon | Database user |
| database.internal.password | string | "" | Password (auto-generated if empty) |
| database.internal.database | string | kaireon | Database name |
| database.external.host | string | "" | RDS/managed DB hostname |
| database.external.port | int | 5432 | Database port |
| database.external.name | string | kaireon | Database name |
| database.external.username | string | "" | Database user |
| database.external.password | string | "" | Password (use existingSecret instead for production) |
| database.external.sslMode | string | require | SSL mode (disable, require, verify-full) |
| database.external.existingSecret | string | "" | Name of a K8s Secret containing the password |
| database.external.secretKey | string | password | Key within the K8s Secret |
Redis
| Key | Type | Default | Description |
|---|---|---|---|
| redis.mode | string | internal | internal (StatefulSet) or external (ElastiCache) |
| redis.internal.image.repository | string | redis | Redis image |
| redis.internal.image.tag | string | 7-alpine | Redis image tag |
| redis.internal.storage | string | 10Gi | PVC size |
| redis.internal.maxmemory | string | 512mb | Redis maxmemory setting |
| redis.external.host | string | "" | ElastiCache/managed Redis hostname |
| redis.external.port | int | 6379 | Redis port |
| redis.external.tls | bool | true | Enable TLS for external Redis |
| redis.external.password | string | "" | Password (use existingSecret for production) |
| redis.external.existingSecret | string | "" | K8s Secret name |
| redis.external.secretKey | string | password | Key within the K8s Secret |
Secrets
| Key | Type | Default | Description |
|---|---|---|---|
| secrets.provider | string | kubernetes | kubernetes or aws-secrets-manager |
| secrets.NEXTAUTH_SECRET | string | "" | NextAuth session secret (auto-generated if empty) |
| secrets.JWT_SIGNING_SECRET | string | "" | JWT signing key |
| secrets.CONNECTOR_ENCRYPTION_KEY | string | "" | Encryption key for connector credentials |
| secrets.awsSecretsManager.secretName | string | kaireon/production | AWS Secrets Manager secret name |
| secrets.awsSecretsManager.roleArn | string | "" | IAM role ARN for IRSA |
| secrets.awsSecretsManager.region | string | us-east-1 | AWS region |
Ingress
| Key | Type | Default | Description |
|---|---|---|---|
| ingress.enabled | bool | true | Create an Ingress resource |
| ingress.className | string | nginx | Ingress class (nginx, alb, traefik) |
| ingress.host | string | app.kaireon.com | Hostname for the application |
| ingress.tls.enabled | bool | true | Enable TLS termination |
| ingress.tls.secretName | string | kaireon-tls | TLS certificate secret name |
| ingress.tls.clusterIssuer | string | letsencrypt-prod | cert-manager ClusterIssuer |
| ingress.annotations | map | see values.yaml | Additional Ingress annotations (rate limits, timeouts) |
PgBouncer
| Key | Type | Default | Description |
|---|---|---|---|
| pgbouncer.enabled | bool | true | Deploy PgBouncer connection pooler |
| pgbouncer.poolMode | string | transaction | Pool mode (session, transaction, statement) |
| pgbouncer.defaultPoolSize | int | 25 | Connections per user/database pair |
| pgbouncer.maxClientConn | int | 1000 | Maximum client connections |
| pgbouncer.maxDbConnections | int | 25 | Maximum server connections |
Observability
| Key | Type | Default | Description |
|---|---|---|---|
| observability.metrics.provider | string | prometheus | prometheus, cloudwatch, datadog, newrelic |
| observability.metrics.cloudwatch.enabled | bool | false | Push metrics to CloudWatch |
| observability.metrics.cloudwatch.namespace | string | KaireonAI | CloudWatch metric namespace |
| observability.metrics.datadog.enabled | bool | false | Push metrics to Datadog |
| observability.metrics.datadog.apiKey | string | "" | Datadog API key |
| observability.metrics.newrelic.enabled | bool | false | Push metrics to New Relic |
| observability.metrics.newrelic.licenseKey | string | "" | New Relic license key |
| observability.tracing.provider | string | none | none, jaeger, xray, otel-collector |
| observability.tracing.endpoint | string | "" | Tracing collector endpoint |
| observability.logging.provider | string | stdout | stdout, cloudwatch-logs, datadog, elasticsearch |
| observability.logging.cloudwatch.logGroup | string | "" | CloudWatch log group |
| observability.logging.cloudwatch.region | string | "" | CloudWatch region |
| observability.alerting.provider | string | none | none, pagerduty, slack, opsgenie, sns |
| observability.alerting.pagerduty.integrationKey | string | "" | PagerDuty integration key |
| observability.alerting.slack.webhookUrl | string | "" | Slack incoming webhook URL |
| observability.alerting.slack.channel | string | "" | Slack channel |
| observability.alerting.sns.topicArn | string | "" | SNS topic ARN |
Monitoring (In-Cluster)
| Key | Type | Default | Description |
|---|---|---|---|
| monitoring.prometheus.enabled | bool | true | Deploy in-cluster Prometheus |
| monitoring.prometheus.retention | string | 7d | Metrics retention period |
| monitoring.grafana.enabled | bool | true | Deploy in-cluster Grafana |
| monitoring.grafana.adminUser | string | admin | Grafana admin username |
| monitoring.grafana.adminPassword | string | "" | Grafana admin password (auto-generated if empty) |
7. Upgrading
Rolling Upgrade
To upgrade KaireonAI to a new version:
# Pull the latest chart / image tags
git pull origin main
# Upgrade with existing values
make upgrade
This runs helm upgrade --reuse-values, which performs a rolling update with zero downtime. The API deployment uses a RollingUpdate strategy by default.
Upgrade with New Configuration
To change configuration during an upgrade, edit your config file and pass it:
helm upgrade kaireon ./helm/kaireon \
  -n kaireon \
  -f kaireon-config.yaml
Database Schema Sync
Schema changes are applied automatically during install and upgrade via prisma db push. This project uses Prisma 7’s push-based schema management (not migration files). To run manually:
This executes scripts/migrate.sh inside the cluster, which runs npx prisma db push --skip-generate against the connected database.
Note: prisma db push compares the current schema.prisma against the live database and applies changes directly. There are no migration files to track.
Version Pinning
Pin the image tag in your config to control rollouts:
api:
  image:
    tag: "1.2.3"
worker:
  image:
    tag: "1.2.3"
8. Uninstalling
Remove the Application (Keep Infrastructure)
The make uninstall target runs helm uninstall and removes all KaireonAI pods, services, and config maps. Persistent volumes (database, Redis) are retained by default.
Remove Application and Namespace
This removes the Helm release and deletes the entire namespace, including any persistent volume claims.
Destroy AWS Infrastructure (Full Stack Only)
After removing the application, destroy Terraform-managed resources:
cd terraform/environments/production
terraform destroy
Warning: This permanently deletes the VPC, EKS cluster, RDS database, ElastiCache cluster, and S3 bucket. Ensure you have backed up any data before proceeding.
Create a Backup Before Uninstalling
make backup
make uninstall
To restore later:
make restore FILE=path/to/backup.sql.gz
9. Troubleshooting FAQ
Pods stuck in Pending state
Symptom: kubectl get pods -n kaireon shows pods in Pending status.
Causes and fixes:
- Insufficient cluster resources. Check node capacity:
  kubectl describe nodes | grep -A 5 "Allocated resources"
  Add more nodes or reduce resource requests in your config.
- PVC not binding (dev mode). The cluster may lack a default StorageClass. If no StorageClass exists, install a provisioner (e.g., hostpath-provisioner for local clusters); otherwise mark an existing class as the default:
  kubectl patch storageclass standard -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
- Node selector or taint mismatch. Verify there are no taints preventing scheduling:
  kubectl describe pod <pod-name> -n kaireon | grep -A 10 Events
Database connection refused
Symptom: API pods crash with ECONNREFUSED or connection refused errors in logs.
Fixes:
- Internal mode: Confirm the PostgreSQL pod is running:
  kubectl get pods -n kaireon -l app=kaireon-postgres
  Check its logs:
  kubectl logs -n kaireon -l app=kaireon-postgres
- External mode: Verify the host is reachable from inside the cluster:
  kubectl run dbtest --rm -it --image=postgres:16-alpine -n kaireon -- \
    pg_isready -h <DB_HOST> -p 5432
- Security groups (AWS). Ensure the RDS security group allows inbound on port 5432 from the EKS node security group. In full-stack mode, Terraform configures this automatically.
- PgBouncer misconfiguration. If PgBouncer is enabled, verify it can reach the database:
  kubectl logs -n kaireon -l app=kaireon-pgbouncer
Redis connection errors
Symptom: Workers fail to start or events are not published.
Fixes:
- Internal mode: Confirm the Redis pod is running:
  kubectl get pods -n kaireon -l app=kaireon-redis
- External mode: Test connectivity:
  kubectl run redistest --rm -it --image=redis:7-alpine -n kaireon -- \
    redis-cli -h <REDIS_HOST> -p 6379 --tls PING
- TLS mismatch. If your external Redis does not use TLS, set redis.external.tls: false.
- Authentication. Ensure the password in your K8s secret matches the Redis AUTH password:
  kubectl get secret kaireon-redis -n kaireon -o jsonpath='{.data.password}' | base64 -d
Ingress not working / 404 errors
Symptom: The application is unreachable via the configured hostname.
Fixes:
- Ingress controller not installed. Verify an ingress controller is running:
  kubectl get pods -n ingress-nginx
  If missing, install one:
  helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
  helm install ingress-nginx ingress-nginx/ingress-nginx -n ingress-nginx --create-namespace
- Ingress class mismatch. Ensure ingress.className matches the class your controller serves.
- DNS not pointing to the load balancer. Get the external IP:
  kubectl get svc -n ingress-nginx
  Create a DNS A/CNAME record pointing your domain to that address.
- TLS certificate not ready. Check cert-manager:
  kubectl get certificate -n kaireon
  kubectl describe certificate kaireon-tls -n kaireon
Helm install fails with “namespace not found”
Fix: Create the namespace first:
kubectl create namespace kaireon
Or let the chart create it (the chart includes a namespace.yaml template that handles this).
Migrations fail during install
Symptom: Install completes but the app shows schema errors.
Fixes:
- Run the schema sync manually (see Database Schema Sync).
- Check migration logs:
  kubectl logs -n kaireon -l job-name=kaireon-migrate
- Verify database connectivity and permissions. The migration user needs CREATE TABLE, ALTER TABLE, and CREATE INDEX privileges.
OOMKilled pods
Symptom: Pods restart with OOMKilled status.
Fix: Increase memory limits in your config:
api:
  resources:
    limits:
      memory: "4Gi"
worker:
  resources:
    limits:
      memory: "8Gi"
Then upgrade:
AWS Secrets Manager errors (Full Stack mode)
Symptom: Pods fail with “AccessDeniedException” when fetching secrets.
Fixes:
- Verify the IRSA role ARN is correct in your config:
  terraform output helm_values
- Confirm the service account is annotated:
  kubectl get sa -n kaireon kaireon-api -o yaml | grep eks.amazonaws.com
- Check the IAM trust policy allows the OIDC provider for your cluster.
How to open a shell in the API pod
This drops you into /bin/sh inside the running API container, useful for debugging environment variables and network connectivity.
How to view all available Make targets