KaireonAI includes a production-ready Helm chart for deploying to any Kubernetes cluster. This gives you full control over scaling, networking, monitoring, and security.

## Documentation Index
Fetch the complete documentation index at: https://docs.kaireonai.com/llms.txt
Use this file to discover all available pages before exploring further.
## Prerequisites
- Kubernetes cluster (1.24+)
- Helm 3.x installed
- `kubectl` configured for your cluster
- PostgreSQL database (self-managed, RDS, or CloudNativePG)
- Redis (self-managed, ElastiCache, or included via Helm)
## What’s Included
The Helm chart in `helm/` provides:
| Resource | Description |
|---|---|
| API Deployment | Main Next.js application with health checks and HPA |
| Worker Deployment | Background job processor for pipelines and model retraining |
| ML Worker Deployment | Python/FastAPI service for scikit-learn analysis (optional) |
| ConfigMaps | Application configuration (non-sensitive) |
| Secrets | Database URLs, API keys, encryption keys |
| Ingress | HTTPS ingress with TLS termination (ALB or nginx) |
| HPA | Horizontal Pod Autoscaler for API pods |
| PodDisruptionBudget | Ensures availability during node maintenance |
| NetworkPolicies | Restrict pod-to-pod and egress traffic |
| ServiceMonitor | Prometheus metrics scraping |
## Quick Install
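A minimal install might look like the following sketch. The release name, namespace, and value keys other than the chart path `helm/` are assumptions for illustration; check `helm/values.yaml` for the authoritative schema.

```shell
# Install the chart from the repository checkout (chart path helm/).
# Release name "kaireon", namespace "kaireon", and the value keys
# below are placeholders -- adjust for your environment.
helm install kaireon ./helm \
  --namespace kaireon \
  --create-namespace \
  --set secrets.databaseUrl="postgres://user:pass@db-host:5432/kaireon"
```

Verify the rollout with `kubectl -n kaireon get pods` once the release is installed.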
### With ML Worker
To include the ML Worker for AI-powered segmentation, policy analysis, and content intelligence, set `mlWorker.enabled=true`. With this flag set, the chart automatically injects `ML_WORKER_URL` into the API pods — no manual configuration needed.
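For example, the flag can be flipped on an existing release. The release name and chart path here are assumptions; `mlWorker.enabled` is the chart's documented switch.

```shell
# Enable the optional ML Worker on an existing release
# (release name "kaireon" and chart path ./helm are placeholders).
helm upgrade kaireon ./helm \
  --namespace kaireon \
  --reuse-values \
  --set mlWorker.enabled=true
```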
## Configuration
Key Helm values you can customize live in the chart's `values.yaml`; use `--set` (or a values file with `-f`) to override any value at install or upgrade time.
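A hypothetical override file might look like this. Apart from `mlWorker.enabled`, the key names below are assumptions for illustration only — consult `helm/values.yaml` for the real schema.

```yaml
# values.override.yaml -- illustrative overrides; key names other than
# mlWorker.enabled are assumptions, check helm/values.yaml.
api:
  replicas: 3
  resources:
    limits:
      memory: 1Gi
mlWorker:
  enabled: true
ingress:
  enabled: true
  host: kaireon.example.com
```

Apply it with `helm upgrade kaireon ./helm -f values.override.yaml`.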
## Architecture
The API communicates with the ML Worker over an internal ClusterIP service (`kaireon-ml-worker:8000`). The ML Worker reads directly from PostgreSQL for schema data and analysis inputs.
## Monitoring Stack
When `monitoring.prometheus.enabled=true`, the chart deploys the monitoring components described below.
### Prometheus Metrics
KaireonAI exposes metrics at `/api/v1/metrics`:
| Metric | Type | Description |
|---|---|---|
| `kaireon_decisions_total` | Counter | Total decisions made |
| `kaireon_decision_latency_ms` | Histogram | Decision latency distribution |
| `kaireon_pipeline_executions_total` | Counter | Pipeline run counts |
| `kaireon_api_requests_total` | Counter | API request counts by endpoint |
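You can spot-check the endpoint through a port-forward. The service name `kaireon-api` and port 3000 are assumptions for this sketch; the metrics path comes from the section above.

```shell
# Forward a local port to the API service and sample one metric
# (service name and port are placeholders for your deployment).
kubectl -n kaireon port-forward svc/kaireon-api 3000:3000 &
curl -s http://localhost:3000/api/v1/metrics | grep kaireon_decisions_total
```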
### Grafana Dashboards
Pre-built dashboards are included in `helm/dashboards/`:
- API Overview — Request rates, latency percentiles, error rates
- Decision Engine — Decision throughput, scoring latency, cache hit rates
- Infrastructure Health — CPU, memory, pod restarts, network
- Model Health — Model prediction distributions, feature drift
- Worker Queues — Queue depth, processing times, failure rates
## Database Options
### Self-Managed PostgreSQL
Deploy PostgreSQL inside the cluster. The chart includes an internal PostgreSQL StatefulSet by default.

### Amazon RDS (External)
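To point the chart at RDS instead of the in-cluster StatefulSet, you would disable the internal database and supply connection details. The key names below are assumptions for illustration — check `helm/values.yaml` for the real schema.

```yaml
# Illustrative values for an external RDS instance; key names are
# placeholders, consult helm/values.yaml for the actual schema.
postgresql:
  enabled: false          # skip the internal StatefulSet
externalDatabase:
  host: kaireon.abc123.us-east-1.rds.amazonaws.com
  port: 5432
  database: kaireon
  existingSecret: kaireon-db-credentials
```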
## Upgrading
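A typical upgrade flow is sketched below. The release name and deployment name are assumptions; `helm diff` requires the separately installed helm-diff plugin.

```shell
# Preview the rendered changes (requires the helm-diff plugin),
# then apply the upgrade while preserving existing values.
helm diff upgrade kaireon ./helm --reuse-values
helm upgrade kaireon ./helm --namespace kaireon --reuse-values
# Deployment name "kaireon-api" is a placeholder for your release.
kubectl -n kaireon rollout status deploy/kaireon-api
```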
## Rate Limiting & Circuit Breakers
- Rate limiting — KaireonAI protects API endpoints with a sliding-window rate limiter backed by Redis. You configure limits per endpoint via environment variables or platform settings.
- Circuit breakers — External integrations (connectors, webhooks) use circuit breaker patterns to prevent cascade failures. States cycle: closed → open → half-open.
## Troubleshooting
### Secret creation errors ('already exists')
Secrets that already exist are not overwritten or updated in place. To update secrets, delete and recreate them, then restart the pods to pick up the new values:
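The secret and deployment names below are placeholders for this sketch; substitute the names your release actually creates.

```shell
# Delete and recreate the secret (names are placeholders).
kubectl -n kaireon delete secret kaireon-secrets
kubectl -n kaireon create secret generic kaireon-secrets \
  --from-literal=DATABASE_URL="postgres://user:pass@db-host:5432/kaireon"
# Restart the pods so they pick up the new values.
kubectl -n kaireon rollout restart deploy/kaireon-api deploy/kaireon-worker
```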
### API pods OOMKilled or CrashLoopBackOff
The Next.js application requires at least 512Mi of memory. If pods are being OOMKilled, increase the memory limit. For the worker, allocate at least 1Gi. Check pod events for the specific reason:
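For example, limits can be raised at upgrade time and the termination reason inspected from pod state. The value keys and label selector are assumptions; check `helm/values.yaml` for the real key names.

```shell
# Raise memory limits (value keys are placeholders), then inspect
# the last container state for the OOMKilled reason.
helm upgrade kaireon ./helm --reuse-values \
  --set api.resources.limits.memory=1Gi \
  --set worker.resources.limits.memory=1Gi
kubectl -n kaireon describe pod -l app=kaireon-api | grep -A5 "Last State"
```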
### Health probe failures (readiness/liveness)
The API pods expose a health endpoint at `/api/v1/metrics`. Startup can take 15-30 seconds as the app validates environment variables and connects to PostgreSQL. If probes fail during startup, increase the `initialDelaySeconds`; check pod logs if probes continue to fail.
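A probe override might look like the following sketch. The key names and port are assumptions for illustration; the path comes from the health endpoint described above.

```yaml
# Illustrative probe overrides; key names and port are placeholders,
# check helm/values.yaml for the actual schema.
api:
  readinessProbe:
    httpGet:
      path: /api/v1/metrics
      port: 3000
    initialDelaySeconds: 30
  livenessProbe:
    httpGet:
      path: /api/v1/metrics
      port: 3000
    initialDelaySeconds: 45
```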
### ECR image pull errors ('ImagePullBackOff')
Ensure the node IAM role or service account has `ecr:GetAuthorizationToken` and `ecr:BatchGetImage` permissions. For EKS, verify that the OIDC provider is configured and the service account is annotated:
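An IRSA-annotated service account looks like the following; the account name, namespace, and role ARN are placeholders for your environment.

```yaml
# IRSA sketch: the eks.amazonaws.com/role-arn annotation binds the
# service account to an IAM role with ECR pull permissions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kaireon-api        # placeholder name
  namespace: kaireon
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/kaireon-ecr-pull
```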
### Cannot connect to PostgreSQL from pods
Verify the database is reachable from within the cluster. Common issues include missing VPC peering, security group rules, or incorrect hostnames. Test connectivity from a debug pod:
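For example, a one-off debug pod can run `pg_isready` against the database. The hostname below is a placeholder; substitute your actual database endpoint.

```shell
# Launch a throwaway pod from the official postgres image and test
# reachability of the database host (hostname is a placeholder).
kubectl -n kaireon run pg-debug --rm -it --image=postgres:16 --restart=Never -- \
  pg_isready -h mydb.example.internal -p 5432
```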
## Next Steps
- **ML Worker** — Configure the ML Worker for AI features.
- **Operations** — Configure Prometheus metrics and Grafana dashboards.
- **Scaling Guide** — Guidance for high-throughput deployments.
- **Cloud Deployment** — One-click deployment to AWS App Runner.