| Mode | Target | Infrastructure | Time |
|---|---|---|---|
| Dev/Demo | Local / CI | In-cluster PostgreSQL (StatefulSet); Redis optional | ~5 min |
| App-Only | BYO Infra | You provide RDS; Redis/Kafka/OpenSearch optional (configured via UI) | ~10 min |
| Full Stack | Production AWS | Terraform creates VPC, EKS, RDS, S3; optional ElastiCache, MSK; Helm installs | ~30 min |
1. Prerequisites
Required Tools
| Tool | Minimum Version | Purpose |
|---|---|---|
| kubectl | 1.27+ | Kubernetes CLI |
| helm | 3.12+ | Chart installation |
| make | any | Orchestrates install targets |
| bash | 4.0+ | Install scripts |
| openssl | any | Secret generation |
Additional Tools by Mode
| Tool | Version | Required For |
|---|---|---|
| terraform | 1.3+ | Full Stack mode only |
| aws-cli | 2.x | Full Stack mode and AWS Secrets Manager |
| docker | 24+ | Building custom images (optional) |
Cluster Requirements
- Dev/Demo: Any Kubernetes cluster (Docker Desktop, kind, minikube, or a remote cluster). Minimum 2 CPU / 4 GB RAM available.
- App-Only: An existing Kubernetes cluster with network access to your managed PostgreSQL database (and cache, if you use one).
- Full Stack: An AWS account with permissions to create VPC, EKS, RDS, ElastiCache, S3, IAM, and Route 53 resources.
Verify Prerequisites
The preflight script, scripts/preflight.sh, validates that the required tools are installed and that the cluster is reachable.
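The version checks can be sketched in Python as follows. This is a minimal illustration of comparing a tool's reported version string against the minimums in the Required Tools table; it is not the contents of scripts/preflight.sh, and parse_version and meets_minimum are hypothetical names.

```python
import re

# Minimum versions from the Required Tools table; None means "any version".
MINIMUMS = {
    "kubectl": (1, 27),
    "helm": (3, 12),
    "make": None,
    "bash": (4, 0),
    "openssl": None,
}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted number, e.g. 'Client Version: v1.28.2' -> (1, 28, 2)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def _pad(version: tuple[int, ...], width: int) -> tuple[int, ...]:
    # Pad with zeros so that (4,) compares equal to (4, 0).
    return version + (0,) * (width - len(version))


def meets_minimum(tool: str, version_text: str) -> bool:
    """Return True if the reported version satisfies the table's minimum for `tool`."""
    minimum = MINIMUMS[tool]
    if minimum is None:
        return True
    found = parse_version(version_text)
    width = max(len(found), len(minimum))
    return _pad(found, width) >= _pad(minimum, width)


if __name__ == "__main__":
    print(meets_minimum("kubectl", "Client Version: v1.28.2"))             # True
    print(meets_minimum("helm", "v3.11.3+g3a31588"))                       # False
    print(meets_minimum("bash", "GNU bash, version 5.2.15(1)-release"))    # True
```

The same table could be extended with terraform, aws-cli, and docker for Full Stack checks.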
2. Quick Start (Dev/Demo)
Dev mode deploys everything inside the cluster. Only PostgreSQL is required; Redis and other infrastructure are optional and can be configured later from the Settings > Integrations UI. The dev install:
- Creates the kaireon namespace.
- Deploys an in-cluster PostgreSQL 16 StatefulSet (10 Gi volume).
- Deploys the KaireonAI API (1 replica) and Worker (1 replica).
- Runs database schema sync automatically.
Note: Redis, Kafka, OpenSearch, and other infrastructure are not deployed by default. The platform runs entirely on PostgreSQL. To add caching, event streaming, or search indexing, configure them from Settings > Integrations after installation.
Access the Application
Port-forward the API service to your local machine.
Load Demo Data (Optional)
Override the Namespace
3. App-Only Install
Use this mode when you already have a managed PostgreSQL database (e.g., Amazon RDS, Cloud SQL). KaireonAI installs only the application components into your existing cluster. Redis and other infrastructure are optional; configure them from Settings > Integrations after installation.
Step 1 — Create a Configuration File
Create a configuration file named kaireon-config.yaml (or any name you prefer) that sets database.mode: external and the database.external.* connection settings described in the Configuration Reference.
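As an illustration, the Python sketch below writes a minimal App-Only config using only keys from the Configuration Reference. The hostname, user, and Secret name are placeholders, app_only_values and write_config are hypothetical helpers, and the file is written as JSON because JSON is valid YAML, so Helm can read it as a values file.

```python
import json
from pathlib import Path


def app_only_values(db_host: str, db_user: str, secret_name: str) -> dict:
    """Minimal App-Only values: external PostgreSQL, password read from an existing Secret."""
    return {
        "database": {
            "mode": "external",
            "external": {
                "host": db_host,
                "port": 5432,
                "name": "kaireon",
                "username": db_user,
                "sslMode": "require",
                # Reference a Kubernetes Secret instead of putting the password in the file.
                "existingSecret": secret_name,
                "secretKey": "password",
            },
        },
    }


def write_config(values: dict, path: str = "kaireon-config.yaml") -> Path:
    """Write values as JSON; JSON is valid YAML, so Helm can read it as a values file."""
    target = Path(path)
    target.write_text(json.dumps(values, indent=2) + "\n")
    return target


if __name__ == "__main__":
    # Placeholder hostname and Secret name; substitute your own.
    write_config(app_only_values("mydb.example.internal", "kaireon", "kaireon-db"))
```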
Step 2 — Create Kubernetes Secrets (If Using existingSecret)
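If you set database.external.existingSecret, the named Secret must exist before you install. The Python sketch below builds such a Secret as a JSON manifest, which kubectl apply -f accepts. The Secret name kaireon-db is a placeholder, the key matches the default database.external.secretKey of password, and random_secret (a hypothetical helper) produces the same shape of value as openssl rand -base64 32, for settings such as NEXTAUTH_SECRET.

```python
import base64
import json
import secrets


def random_secret(num_bytes: int = 32) -> str:
    """Base64 of random bytes, the same shape as `openssl rand -base64 32` output."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def secret_manifest(name: str, namespace: str, values: dict) -> dict:
    """Build an Opaque Secret manifest; Kubernetes expects `data` values base64-encoded."""
    encoded = {key: base64.b64encode(value.encode("utf-8")).decode("ascii")
               for key, value in values.items()}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


if __name__ == "__main__":
    # "kaireon-db" is a placeholder; it must match database.external.existingSecret,
    # and the "password" key must match database.external.secretKey.
    manifest = secret_manifest("kaireon-db", "kaireon", {"password": "replace-me"})
    print(json.dumps(manifest, indent=2))  # save to a file, then: kubectl apply -f <file>
    print(random_secret())                 # e.g. a value for secrets.NEXTAUTH_SECRET
```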
Step 3 — Install
4. Full Stack Install
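Full-stack mode provisions all AWS infrastructure with Terraform, then deploys the application with Helm.
Step 1 — Configure AWS Credentials
Configure the AWS CLI with credentials for an account that can create the resources listed under Cluster Requirements. The AdministratorAccess policy works for initial setup; scope permissions down for production.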
Step 2 — Set Terraform Variables
Review the defaults in variables.tf, or override them in a terraform.tfvars file.
Step 3 — Initialize and Apply Terraform
Applying the configuration creates:
- A VPC with public and private subnets across availability zones.
- An EKS cluster with a managed node group (t3.large, min 2 / max 6 nodes).
- An RDS PostgreSQL instance in private subnets.
- An ElastiCache Redis cluster in private subnets.
- An S3 bucket for backups.
- IAM roles with IRSA for pod-level AWS access.
- Route 53 records and an ACM certificate (if domain is set).
- CloudWatch alarms and dashboards (if enabled).
Step 4 — Configure kubectl
Step 5 — Install the Application
Return to the repository root and run the full-stack install. It configures the chart with database.mode: external, redis.mode: external, and secrets.provider: aws-secrets-manager.
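Helm combines the chart's values.yaml with your overrides by merging maps key by key: later values win, and lists are replaced rather than appended. The Python sketch below is a simplified model of that merge (it ignores Helm's rule that a null override deletes a key), applied to the three full-stack settings above. DEFAULTS is a small subset of the Configuration Reference, not the real values.yaml, and deep_merge is a hypothetical helper.

```python
import copy
from typing import Any

# A subset of chart defaults from the Configuration Reference.
DEFAULTS = {
    "database": {"mode": "internal", "external": {"port": 5432, "sslMode": "require"}},
    "redis": {"mode": "internal", "external": {"port": 6379, "tls": True}},
    "secrets": {"provider": "kubernetes", "awsSecretsManager": {"region": "us-east-1"}},
}

# The settings the full-stack install applies.
FULL_STACK_OVERRIDES = {
    "database": {"mode": "external"},
    "redis": {"mode": "external"},
    "secrets": {"provider": "aws-secrets-manager"},
}


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of base with override applied: nested maps merge recursively,
    any other value (including a list) replaces the base value outright."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    print(deep_merge(DEFAULTS, FULL_STACK_OVERRIDES)["database"])
    # {'mode': 'external', 'external': {'port': 5432, 'sslMode': 'require'}}
```

Keys you do not override, such as database.external.port, keep their defaults.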
5. Post-Install Verification
Check Deployment Status
Health Check Endpoints
| Endpoint | Purpose |
|---|---|
| GET /api/health | Liveness check (HTTP 200 if the process is running) |
| GET /api/health?detail=true | Readiness check (verifies DB and Redis connectivity) |
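After port-forwarding, a script can wait for the readiness endpoint before running smoke tests. The Python sketch below polls GET /api/health?detail=true using only the standard library. It assumes the endpoint returns a non-200 status while the database or Redis is unreachable, and the base URL (http://localhost:3000 here) is a placeholder for whatever local port you forwarded. http_status and wait_until_ready are hypothetical helpers.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """GET `url` and return its HTTP status code, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return err.code
    except OSError:                         # connection refused, DNS failure, timeout
        return 0


def wait_until_ready(base_url: str, attempts: int = 30, delay: float = 2.0,
                     fetch: Callable[[str], int] = http_status) -> bool:
    """Poll the readiness endpoint until it answers 200 or the attempts run out."""
    url = base_url.rstrip("/") + "/api/health?detail=true"
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Assumes the API was port-forwarded to local port 3000; adjust to your setup.
    print("ready" if wait_until_ready("http://localhost:3000") else "not ready")
```

The fetch parameter makes the polling loop testable without a running cluster.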
Access Grafana
Access Prometheus
Tail Logs
6. Configuration Reference
The Helm chart is configured through values.yaml or a custom config file passed via CONFIG=. The tables below list the available options, grouped by top-level key.
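Each dotted key in these tables is a path into the nested YAML of values.yaml. For example, api.hpa.enabled is the enabled field inside hpa inside api, the same path syntax that helm --set uses. A small Python sketch of that mapping (set_dotted is a hypothetical helper):

```python
from typing import Any


def set_dotted(values: dict[str, Any], dotted_key: str, value: Any) -> dict[str, Any]:
    """Set values['a']['b']['c'] for dotted_key 'a.b.c', creating nested maps as needed."""
    *parents, leaf = dotted_key.split(".")
    node = values
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return values


if __name__ == "__main__":
    values: dict[str, Any] = {}
    set_dotted(values, "api.hpa.enabled", False)
    set_dotted(values, "api.hpa.maxReplicas", 20)
    print(values)  # {'api': {'hpa': {'enabled': False, 'maxReplicas': 20}}}
```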
Global
| Key | Type | Default | Description |
|---|---|---|---|
| namespace | string | kaireon | Kubernetes namespace for all resources |
API
| Key | Type | Default | Description |
|---|---|---|---|
| api.image.repository | string | ghcr.io/OWNER/kaireon-api | Container image repository |
| api.image.tag | string | latest | Image tag |
| api.image.pullPolicy | string | IfNotPresent | Image pull policy |
| api.replicas | int | 3 | Number of API replicas |
| api.resources.requests.memory | string | 512Mi | Memory request |
| api.resources.requests.cpu | string | 500m | CPU request |
| api.resources.limits.memory | string | 2Gi | Memory limit |
| api.resources.limits.cpu | string | 2000m | CPU limit |
| api.hpa.enabled | bool | true | Enable Horizontal Pod Autoscaler |
| api.hpa.minReplicas | int | 3 | HPA minimum replicas |
| api.hpa.maxReplicas | int | 20 | HPA maximum replicas |
| api.hpa.targetCPUUtilization | int | 70 | Target CPU % for scaling |
| api.env.NEXTAUTH_URL | string | https://app.kaireon.com | Public URL for NextAuth callbacks |
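The api.hpa.* settings feed the standard Kubernetes HPA calculation: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), skipped when that ratio is within the default 10% tolerance of 1.0, then clamped to the min/max bounds. The Python sketch below uses this chart's defaults; it is a simplification that ignores stabilization windows and pod readiness handling, and hpa_desired_replicas is a hypothetical name.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_cpu_pct: float,
                         target_cpu_pct: float = 70, min_replicas: int = 3,
                         max_replicas: int = 20, tolerance: float = 0.1) -> int:
    """Simplified HPA scaling decision for a CPU utilization target."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the HPA leaves the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to api.hpa.minReplicas / api.hpa.maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(hpa_desired_replicas(3, 140))   # 6: twice the target doubles the replicas
    print(hpa_desired_replicas(3, 35))    # 3: would be 2, but minReplicas is 3
    print(hpa_desired_replicas(10, 75))   # 10: within the 10% tolerance
    print(hpa_desired_replicas(15, 140))  # 20: capped at maxReplicas
```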
Worker
| Key | Type | Default | Description |
|---|---|---|---|
| worker.image.repository | string | ghcr.io/OWNER/kaireon-worker | Container image repository |
| worker.image.tag | string | latest | Image tag |
| worker.replicas | int | 2 | Number of worker replicas |
| worker.resources.requests.memory | string | 1Gi | Memory request |
| worker.resources.requests.cpu | string | 1000m | CPU request |
| worker.resources.limits.memory | string | 4Gi | Memory limit |
| worker.resources.limits.cpu | string | 4000m | CPU limit |
| worker.keda.enabled | bool | true | Enable KEDA autoscaling |
| worker.keda.minReplicas | int | 1 | KEDA minimum replicas |
| worker.keda.maxReplicas | int | 10 | KEDA maximum replicas |
| worker.keda.queueThreshold | string | "5" | Queue length threshold for scaling |
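KEDA passes the queue length to an HPA as an average-value target, so the worker count works out to roughly ceil(queueLength / queueThreshold), clamped to the KEDA bounds. The Python sketch below uses the defaults above. It assumes a single queue-length trigger, ignores HPA tolerance and cooldown behavior, and keda_desired_workers is a hypothetical name.

```python
import math


def keda_desired_workers(queue_length: int, queue_threshold: str = "5",
                         min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Roughly one worker per `queue_threshold` queued jobs, clamped to KEDA bounds."""
    # KEDA trigger metadata values are strings, which is likely why the
    # default is "5" rather than 5.
    per_worker = int(queue_threshold)
    desired = math.ceil(queue_length / per_worker)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(keda_desired_workers(0))    # 1: never below minReplicas
    print(keda_desired_workers(12))   # 3: ceil(12 / 5)
    print(keda_desired_workers(100))  # 10: capped at maxReplicas
```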
Config
| Key | Type | Default | Description |
|---|---|---|---|
| config.LOG_LEVEL | string | info | Application log level (debug, info, warn, error) |
| config.WORKER_CONCURRENCY | string | "5" | Max concurrent jobs per worker |
| config.NODE_ENV | string | production | Node environment |
| config.EVENT_PUBLISHER | string | none | Event bus backend (none, kafka, redpanda) |
| config.CACHE_STORE | string | none | Cache backend (none, redis, dragonfly) |
| config.INTERACTION_STORE | string | pg | Interaction storage backend |
| config.SEARCH_INDEX | string | pg | Search index backend |
Database
| Key | Type | Default | Description |
|---|---|---|---|
| database.mode | string | internal | internal (StatefulSet) or external (managed RDS) |
| database.internal.image | string | postgres:16-alpine | PostgreSQL image for internal mode |
| database.internal.storage | string | 10Gi | PVC size |
| database.internal.resources.* | object | see values.yaml | Resource requests/limits |
| database.internal.username | string | kaireon | Database user |
| database.internal.password | string | "" | Password (auto-generated if empty) |
| database.internal.database | string | kaireon | Database name |
| database.external.host | string | "" | RDS/managed DB hostname |
| database.external.port | int | 5432 | Database port |
| database.external.name | string | kaireon | Database name |
| database.external.username | string | "" | Database user |
| database.external.password | string | "" | Password (use existingSecret instead for production) |
| database.external.sslMode | string | require | SSL mode (disable, require, verify-full) |
| database.external.existingSecret | string | "" | Name of a K8s Secret containing the password |
| database.external.secretKey | string | password | Key within the K8s Secret |
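The database.external.* fields describe a standard PostgreSQL connection. Assuming the chart assembles them into a libpq-style connection URL (the form Prisma's DATABASE_URL uses), the Python sketch below shows how. database_url is a hypothetical helper; the important detail is that the username and password must be percent-encoded.

```python
from urllib.parse import quote


def database_url(host: str, username: str, password: str, name: str = "kaireon",
                 port: int = 5432, ssl_mode: str = "require") -> str:
    """Build a postgresql:// URL from database.external.* style settings."""
    # Percent-encode credentials so characters such as '@', ':' or '/' in a
    # password cannot be mistaken for URL delimiters.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"postgresql://{user}:{secret}@{host}:{port}/{name}?sslmode={ssl_mode}"


if __name__ == "__main__":
    print(database_url("db.example.internal", "kaireon", "p@ss/word"))
    # postgresql://kaireon:p%40ss%2Fword@db.example.internal:5432/kaireon?sslmode=require
```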
Redis
| Key | Type | Default | Description |
|---|---|---|---|
| redis.mode | string | internal | internal (StatefulSet) or external (ElastiCache) |
| redis.internal.image.repository | string | redis | Redis image |
| redis.internal.image.tag | string | 7-alpine | Redis image tag |
| redis.internal.storage | string | 10Gi | PVC size |
| redis.internal.maxmemory | string | 512mb | Redis maxmemory setting |
| redis.external.host | string | "" | ElastiCache/managed Redis hostname |
| redis.external.port | int | 6379 | Redis port |
| redis.external.tls | bool | true | Enable TLS for external Redis |
| redis.external.password | string | "" | Password (use existingSecret for production) |
| redis.external.existingSecret | string | "" | K8s Secret name |
| redis.external.secretKey | string | password | Key within the K8s Secret |
Secrets
| Key | Type | Default | Description |
|---|---|---|---|
| secrets.provider | string | kubernetes | kubernetes or aws-secrets-manager |
| secrets.NEXTAUTH_SECRET | string | "" | NextAuth session secret (auto-generated if empty) |
| secrets.JWT_SIGNING_SECRET | string | "" | JWT signing key |
| secrets.CONNECTOR_ENCRYPTION_KEY | string | "" | Encryption key for connector credentials |
| secrets.awsSecretsManager.secretName | string | kaireon/production | AWS Secrets Manager secret name |
| secrets.awsSecretsManager.roleArn | string | "" | IAM role ARN for IRSA |
| secrets.awsSecretsManager.region | string | us-east-1 | AWS region |
Ingress
| Key | Type | Default | Description |
|---|---|---|---|
| ingress.enabled | bool | true | Create an Ingress resource |
| ingress.className | string | nginx | Ingress class (nginx, alb, traefik) |
| ingress.host | string | app.kaireon.com | Hostname for the application |
| ingress.tls.enabled | bool | true | Enable TLS termination |
| ingress.tls.secretName | string | kaireon-tls | TLS certificate secret name |
| ingress.tls.clusterIssuer | string | letsencrypt-prod | cert-manager ClusterIssuer |
| ingress.annotations | map | see values.yaml | Additional Ingress annotations (rate limits, timeouts) |
PgBouncer
| Key | Type | Default | Description |
|---|---|---|---|
| pgbouncer.enabled | bool | true | Deploy PgBouncer connection pooler |
| pgbouncer.poolMode | string | transaction | Pool mode (session, transaction, statement) |
| pgbouncer.defaultPoolSize | int | 25 | Connections per user/database pair |
| pgbouncer.maxClientConn | int | 1000 | Maximum client connections |
| pgbouncer.maxDbConnections | int | 25 | Maximum server connections |
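In transaction mode, PgBouncer assigns a server connection to a client only for the duration of a transaction, so many application connections (up to maxClientConn) can share a small server pool (maxDbConnections). The Python sketch below checks that budget. pool_per_pod (connections each API or Worker pod opens) is an assumption to replace with your application's real pool size, 100 is PostgreSQL's stock max_connections default (RDS derives its own default from instance memory), and pgbouncer_budget is a hypothetical helper.

```python
def pgbouncer_budget(api_replicas: int, worker_replicas: int, pool_per_pod: int,
                     max_client_conn: int = 1000, max_db_connections: int = 25,
                     postgres_max_connections: int = 100) -> dict:
    """Compare client-side connection demand with PgBouncer and PostgreSQL limits."""
    # Every application pod may open up to pool_per_pod connections to PgBouncer.
    client_connections = (api_replicas + worker_replicas) * pool_per_pod
    return {
        "client_connections": client_connections,
        # PgBouncer accepts at most max_client_conn client connections...
        "fits_pgbouncer": client_connections <= max_client_conn,
        # ...but opens at most max_db_connections connections to PostgreSQL.
        "fits_postgres": max_db_connections <= postgres_max_connections,
    }


if __name__ == "__main__":
    # At the HPA and KEDA maximums (20 API, 10 Worker) with a hypothetical pool of 10:
    print(pgbouncer_budget(20, 10, 10))
    # {'client_connections': 300, 'fits_pgbouncer': True, 'fits_postgres': True}
```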
Observability
| Key | Type | Default | Description |
|---|---|---|---|
| observability.metrics.provider | string | prometheus | prometheus, cloudwatch, datadog, newrelic |
| observability.metrics.cloudwatch.enabled | bool | false | Push metrics to CloudWatch |
| observability.metrics.cloudwatch.namespace | string | KaireonAI | CloudWatch metric namespace |
| observability.metrics.datadog.enabled | bool | false | Push metrics to Datadog |
| observability.metrics.datadog.apiKey | string | "" | Datadog API key |
| observability.metrics.newrelic.enabled | bool | false | Push metrics to New Relic |
| observability.metrics.newrelic.licenseKey | string | "" | New Relic license key |
| observability.tracing.provider | string | none | none, jaeger, xray, otel-collector |
| observability.tracing.endpoint | string | "" | Tracing collector endpoint |
| observability.logging.provider | string | stdout | stdout, cloudwatch-logs, datadog, elasticsearch |
| observability.logging.cloudwatch.logGroup | string | "" | CloudWatch log group |
| observability.logging.cloudwatch.region | string | "" | CloudWatch region |
| observability.alerting.provider | string | none | none, pagerduty, slack, opsgenie, sns |
| observability.alerting.pagerduty.integrationKey | string | "" | PagerDuty integration key |
| observability.alerting.slack.webhookUrl | string | "" | Slack incoming webhook URL |
| observability.alerting.slack.channel | string | "" | Slack channel |
| observability.alerting.sns.topicArn | string | "" | SNS topic ARN |
Monitoring (In-Cluster)
| Key | Type | Default | Description |
|---|---|---|---|
| monitoring.prometheus.enabled | bool | true | Deploy in-cluster Prometheus |
| monitoring.prometheus.retention | string | 7d | Metrics retention period |
| monitoring.grafana.enabled | bool | true | Deploy in-cluster Grafana |
| monitoring.grafana.adminUser | string | admin | Grafana admin username |
| monitoring.grafana.adminPassword | string | "" | Grafana admin password (auto-generated if empty) |
7. Upgrading
Rolling Upgrade
To upgrade KaireonAI to a new version, run the upgrade step. It uses helm upgrade --reuse-values, which performs a rolling update with zero downtime. The API deployment uses a RollingUpdate strategy by default.
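The zero-downtime behavior follows from how Kubernetes resolves RollingUpdate percentages: maxSurge rounds up and maxUnavailable rounds down. Assuming the chart keeps the Kubernetes defaults of 25% for both (the values are not listed in the Configuration Reference), the Python sketch below shows that a 3-replica API deployment may add one extra pod during a rollout but never removes a serving pod early. rolling_update_bounds is a hypothetical name.

```python
def rolling_update_bounds(replicas: int, max_surge_pct: int = 25,
                          max_unavailable_pct: int = 25) -> tuple[int, int]:
    """Resolve percentage maxSurge/maxUnavailable into (extra pods, unavailable pods)."""
    surge = -(-replicas * max_surge_pct // 100)          # ceiling division: rounds up
    unavailable = replicas * max_unavailable_pct // 100  # floor division: rounds down
    if surge == 0 and unavailable == 0:
        # Kubernetes falls back to maxUnavailable = 1 when both resolve to zero.
        unavailable = 1
    return surge, unavailable


if __name__ == "__main__":
    print(rolling_update_bounds(3))   # (1, 0): up to 4 pods, never fewer than 3 available
    print(rolling_update_bounds(20))  # (5, 5)
```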
Upgrade with New Configuration
To change configuration during an upgrade, edit your config file and pass it via CONFIG= when you run the upgrade.
Database Schema Sync
Schema changes are applied automatically during install and upgrade via prisma db push. This project uses Prisma 7's push-based schema management (not migration files). To sync the schema manually, run scripts/migrate.sh inside the cluster; it executes npx prisma db push --skip-generate against the connected database.
Note: prisma db push compares the current schema.prisma against the live database and applies changes directly. There are no migration files to track.
Version Pinning
Pin the image tags in your config (api.image.tag and worker.image.tag, both latest by default) to control rollouts.
8. Uninstalling
Remove the Application (Keep Infrastructure)
Uninstalling runs helm uninstall, which removes all KaireonAI pods, services, and config maps. Persistent volumes (database, Redis) are retained by default.
Remove Application and Namespace
Destroy AWS Infrastructure (Full Stack Only)
After removing the application, destroy the Terraform-managed resources with terraform destroy.
Create a Backup Before Uninstalling
9. Troubleshooting FAQ
Pods stuck in Pending state
Symptom: kubectl get pods -n kaireon shows pods in Pending status.
Causes and fixes:
- Insufficient cluster resources. Check node capacity, then add more nodes or reduce resource requests in your config.
- PVC not binding (dev mode). The cluster may lack a default StorageClass. If none exists, install a provisioner (e.g., hostpath-provisioner for local clusters) or mark an existing StorageClass as default.
- Node selector or taint mismatch. Verify that no node taints are preventing scheduling.
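To judge whether workloads fit, sum replicas × requests and compare the total with node allocatable capacity. The Python sketch below parses common Kubernetes quantity suffixes (m for CPU millicores; Ki/Mi/Gi/Ti binary and k/M/G/T decimal for memory) and totals the default API and Worker requests from the Configuration Reference. It ignores PostgreSQL, PgBouncer, and monitoring pods, does not handle every quantity form Kubernetes accepts (exponent notation, for example), and its function names are hypothetical.

```python
import re

_CPU = re.compile(r"^(\d+(?:\.\d+)?)(m?)$")
_MEMORY = re.compile(r"^(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?$")
_MEMORY_FACTORS = {None: 1, "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
                   "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def cpu_millicores(quantity: str) -> int:
    """'500m' -> 500, '2' -> 2000, '0.5' -> 500."""
    match = _CPU.match(quantity)
    if match is None:
        raise ValueError(f"unsupported CPU quantity: {quantity!r}")
    number, milli = match.groups()
    return round(float(number) * (1 if milli else 1000))


def memory_bytes(quantity: str) -> int:
    """'512Mi' -> 536870912, '1Gi' -> 1073741824, '1G' -> 1000000000."""
    match = _MEMORY.match(quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.groups()
    return round(float(number) * _MEMORY_FACTORS[suffix])


def total_requests(workloads: list) -> tuple:
    """Sum (replicas, cpu, memory) request entries into (millicores, bytes)."""
    cpu = sum(replicas * cpu_millicores(c) for replicas, c, _ in workloads)
    memory = sum(replicas * memory_bytes(m) for replicas, _, m in workloads)
    return cpu, memory


if __name__ == "__main__":
    # Default requests from the API and Worker tables: 3 x API, 2 x Worker.
    cpu, memory = total_requests([(3, "500m", "512Mi"), (2, "1000m", "1Gi")])
    print(cpu, "millicores,", memory // 2**20, "MiB")  # 3500 millicores, 3584 MiB
```

With the production defaults, the CPU requests alone (3.5 cores) exceed the 2 CPU minimum quoted for Dev/Demo clusters.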
Database connection refused
Symptom: API pods crash with ECONNREFUSED or connection refused errors in logs.
Fixes:
- Internal mode: Confirm the PostgreSQL pod is running, then check its logs.
- External mode: Verify the host is reachable from inside the cluster.
- Security groups (AWS). Ensure the RDS security group allows inbound on port 5432 from the EKS node security group. In full-stack mode, Terraform configures this automatically.
- PgBouncer misconfiguration. If PgBouncer is enabled, verify it can reach the database.
Redis connection errors
Symptom: Workers fail to start or events are not published.
Fixes:
- Internal mode: Confirm the Redis pod is running.
- External mode: Test connectivity from inside the cluster.
- TLS mismatch. If your external Redis does not use TLS, set redis.external.tls: false.
- Authentication. Ensure the password in your K8s secret matches the Redis AUTH password.
Ingress not working / 404 errors
Symptom: The application is unreachable via the configured hostname.
Fixes:
- Ingress controller not installed. Verify that an ingress controller is running; if none is, install one.
- Ingress class mismatch. Ensure ingress.className matches your controller.
- DNS not pointing to the load balancer. Get the external address of the ingress load balancer, then create a DNS A/CNAME record pointing your domain to that address.
- TLS certificate not ready. Check the certificate status in cert-manager.
Helm install fails with “namespace not found”
Fix: Create the namespace first (for example, kubectl create namespace kaireon), then rerun the install. The chart's namespace.yaml template normally handles this.
Migrations fail during install
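Symptom: Install completes but the app shows schema errors.
Fixes:
- Run the schema sync manually (see Database Schema Sync).
- Check the migration logs.
- Verify database connectivity and permissions. The migration user must be able to create and alter tables and create indexes (in PostgreSQL, this means the CREATE privilege on the schema plus ownership of the tables it alters).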
OOMKilled pods
Symptom: Pods restart with OOMKilled status.
Fix: Increase memory limits in your config (api.resources.limits.memory or worker.resources.limits.memory).
AWS Secrets Manager errors (Full Stack mode)
Symptom: Pods fail with “AccessDeniedException” when fetching secrets.
Fixes:
- Verify the IRSA role ARN in your config (secrets.awsSecretsManager.roleArn) is correct.
- Confirm the service account carries the eks.amazonaws.com/role-arn annotation with that role ARN.
- Check that the IAM trust policy allows the OIDC provider for your cluster.
How to open a shell in the API pod
Use kubectl exec to open /bin/sh inside the running API container. This is useful for debugging environment variables and network connectivity.