

Overview

The ML Worker is a standalone Python/FastAPI service that provides scikit-learn-based analysis and LightGBM training for KaireonAI’s AI features. It handles computationally intensive tasks — K-Means clustering for segmentation, logistic regression for policy analysis, TF-IDF for content analysis, and LightGBM training for the gradient_boosted model type — that exceed what LLM-based analysis can do accurately.
Training a gradient_boosted model requires the ML Worker; scoring does not. The trained tree ensemble is serialized to JSON and scored in-process in Node, so the /recommend hot path never calls Python. Only the Train button hits this service.
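To illustrate why scoring needs no Python round-trip, a serialized tree ensemble can be walked in a few lines of code. The JSON field names below (split_feature, threshold, left, right, leaf_value) are assumptions for the sketch; this page does not specify KaireonAI's actual serialization format, and the platform performs this step in Node rather than Python:

```python
import math

def score_tree(node: dict, features: dict) -> float:
    """Walk one serialized decision tree from root to leaf.
    The node schema here is hypothetical, for illustration only."""
    while "leaf_value" not in node:
        if features[node["split_feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["leaf_value"]

def score_ensemble(trees: list, features: dict) -> float:
    """Sum leaf values across trees and squash with a sigmoid,
    the usual binary-classification convention for gradient boosting."""
    raw = sum(score_tree(t, features) for t in trees)
    return 1.0 / (1.0 + math.exp(-raw))
```

Because this is plain tree traversal over JSON, it costs microseconds per row, which is what keeps the /recommend hot path free of Python calls.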

When to Use the ML Worker

| Scenario | Without ML Worker | With ML Worker |
| --- | --- | --- |
| Auto-Segmentation | LLM percentile-based grouping | K-Means on full dataset with silhouette scoring |
| Policy Recommender | Heuristic pattern recognition | Logistic regression and statistical analysis |
| Content Intelligence | CTR/CVR heuristics | TF-IDF + Random Forest feature importance |
| Dataset size | Works well under 5K rows | Required for 5K+ rows for accurate results |
The ML Worker is optional. All AI features work without it by falling back to LLM-based analysis. Add it when you need higher accuracy on large datasets.

Local Development

1. Set up environment

```shell
cd ml-worker
cp .env.example .env
```

Edit .env to set your local database URL:

```shell
DATABASE_URL=postgresql://user:password@localhost:5432/kaireon
```

This must point to the same PostgreSQL database the platform uses.

2. Install dependencies

```shell
pip install -r requirements.txt
```

3. Start the ML Worker

```shell
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000
```

4. Configure the platform

Add to your platform/.env:

```shell
ML_WORKER_URL=http://localhost:8000
```

Restart the Next.js dev server to pick up the change.

5. Verify the connection

```shell
curl http://localhost:8000/health
```

Expected response:

```json
{"status":"ok","capabilities":["policy_analysis","segmentation","content_analysis"]}
```

In the KaireonAI UI, navigate to AI > Insights; the ML Worker status badge should show “Connected”.
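If you prefer a scripted check over curl, the health response can be fetched and interpreted with the standard library alone. fetch_health and is_ready are illustrative helpers, not part of the platform:

```python
import json
import urllib.request

def fetch_health(base_url: str) -> dict:
    """GET /health from the ML Worker and parse the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.loads(resp.read())

def is_ready(health: dict, needed: str) -> bool:
    """True when the worker reports ok and advertises the capability we need."""
    return health.get("status") == "ok" and needed in health.get("capabilities", [])
```

For example, `is_ready(fetch_health("http://localhost:8000"), "segmentation")` should return True against the response shown above.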

Docker Setup

Standalone Docker

```shell
docker run -d \
  --name kaireon-ml-worker \
  -p 8000:8000 \
  -e DATABASE_URL=postgresql://user:pass@host:5432/kaireon \
  <YOUR_ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/kaireon-ml:latest
```

Docker Compose

The platform’s docker-compose.yml includes the ML Worker under the ml profile:
```shell
# Start everything including the ML Worker
docker compose --profile ml up -d

# Start the platform without the ML Worker
docker compose up -d
```
The ML Worker automatically connects to PostgreSQL through PgBouncer using the same DATABASE_URL as the platform.
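For reference, a compose service honoring the ml profile might look roughly like the fragment below. This is a hypothetical sketch; consult the platform's actual docker-compose.yml for the real service name, image, and settings:

```yaml
# Hypothetical fragment -- the real docker-compose.yml may differ.
services:
  ml-worker:
    image: kaireon-ml:latest
    profiles: ["ml"]          # only started with: docker compose --profile ml up -d
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: ${DATABASE_URL}   # same URL as the platform, via PgBouncer
```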

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| DATABASE_URL | Yes | (none) | PostgreSQL connection string (same database as the platform) |
| ML_WORKER_PORT | No | 8000 | Port to listen on |

On the platform side (not the ML Worker itself), configure these to tell the platform where to find the worker:

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| ML_WORKER_URL | No | (none) | Base URL of the ML Worker. Required for gradient_boosted training and ML-Worker-backed AI features. If unset, GBM training returns MLWorkerUnavailableError. |
| ML_WORKER_API_KEY | No | (none) | Optional shared secret. If set, the platform sends it as X-Api-Key on every ML Worker request. |
| ML_WORKER_TIMEOUT_MS | No | 300000 | Per-request timeout (ms). GBM training on 50K rows + 100 trees typically completes in under 5 seconds, but allow headroom for pathological inputs. |
The ML Worker connects to the same PostgreSQL database as the platform to read schema data directly. It does not need Redis or any other external dependencies.
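A minimal sketch of how the worker-side variables above could be resolved, mirroring the required/default semantics in the table. load_worker_config is an illustrative name, not the worker's actual code:

```python
import os

def load_worker_config(env=os.environ) -> dict:
    """Resolve ML Worker settings per the table above:
    DATABASE_URL is mandatory; ML_WORKER_PORT defaults to 8000."""
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL must be set for the ML Worker")
    return {"database_url": url, "port": int(env.get("ML_WORKER_PORT", "8000"))}
```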

Kubernetes (Helm)

The Helm chart includes ML Worker deployment as an optional component. Enable it with:
```shell
helm install kaireon ./helm \
  --namespace kaireon \
  --set mlWorker.enabled=true \
  --set mlWorker.image.repository=<YOUR_ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/kaireon-ml \
  --set mlWorker.image.tag=latest
```
When mlWorker.enabled=true, the chart automatically:
  • Creates a Deployment and Service for the ML Worker
  • Injects ML_WORKER_URL into the API pods so the platform auto-connects
  • Creates a ServiceAccount for the ML Worker pods
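The --set flags above can equivalently live in a values file. A hypothetical values-ml.yaml, assuming only the keys documented on this page:

```yaml
# Pass with: helm install kaireon ./helm --namespace kaireon -f values-ml.yaml
mlWorker:
  enabled: true
  image:
    repository: <YOUR_ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/kaireon-ml
    tag: latest
```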

Helm Values

| Value | Default | Description |
| --- | --- | --- |
| mlWorker.enabled | false | Enable ML Worker deployment |
| mlWorker.replicas | 1 | Number of replicas |
| mlWorker.image.repository | <YOUR_ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/kaireon-ml | Container image |
| mlWorker.image.tag | latest | Image tag |
| mlWorker.resources.requests.cpu | 500m | CPU request |
| mlWorker.resources.requests.memory | 1Gi | Memory request |
| mlWorker.resources.limits.cpu | 2000m | CPU limit |
| mlWorker.resources.limits.memory | 4Gi | Memory limit |
The ML Worker can be memory-intensive during clustering and model training. Allocate at least 2Gi of memory for production workloads with datasets over 100K rows.

Connecting from the Platform

There are two ways to connect the platform to the ML Worker.

1. Environment Variable

Set ML_WORKER_URL in the platform’s environment. The Helm chart does this automatically when mlWorker.enabled=true.

```shell
ML_WORKER_URL=http://localhost:8000
```

2. Settings UI (Runtime configuration)

  1. Navigate to Settings > Integrations in the KaireonAI UI
  2. Find the ML Worker section
  3. Enter the ML Worker URL
  4. Click Test Connection to verify
  5. Save the configuration
The Settings UI configuration takes precedence over the environment variable.
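The precedence rule can be expressed as a one-line resolver. resolve_ml_worker_url is an illustrative sketch, not the platform's actual implementation:

```python
def resolve_ml_worker_url(settings_url, env_url):
    """Settings UI value wins over the ML_WORKER_URL environment
    variable, per the precedence rule above; None means unconfigured."""
    return settings_url or env_url or None
```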

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Health check with capabilities list |
| /analyze/policies | POST | Submit policy analysis job |
| /analyze/segments | POST | Submit segmentation job |
| /analyze/content | POST | Submit content analysis job |
| /status/{job_id} | GET | Poll job status and results |
| /train/gbm | POST | Synchronous LightGBM training. Returns the serialized tree ensemble, metrics, and feature importances. Called by the platform when a gradient_boosted model is trained. |
Analysis endpoints are asynchronous — they return a jobId immediately and process in the background. /train/gbm is synchronous and returns the trained model JSON directly.
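A client of the asynchronous endpoints submits a job, then polls /status/{job_id}. The loop below sketches that flow with the HTTP call injected as a function so it can run offline; the "status" and terminal-state names are assumptions, since this page does not document the status payload:

```python
import time
from typing import Callable

def poll_job(get_status: Callable[[str], dict], job_id: str,
             interval_s: float = 1.0, max_attempts: int = 300) -> dict:
    """Poll until the job reaches a terminal state or attempts run out.
    `get_status` abstracts the GET /status/{job_id} call; the field
    names below are illustrative, not a documented schema."""
    for _ in range(max_attempts):
        status = get_status(job_id)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish in time")
```

In real use, `get_status` would wrap an HTTP GET against the worker; /train/gbm needs no such loop because it returns the trained model JSON directly.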

Platform-side health probe

The platform exposes GET /api/v1/ml-worker/health as a pass-through probe so the UI can warn users before attempting GBM training. Response:
```json
{
  "available": true,
  "ml_worker_url_configured": true
}
```
available is true only if ML_WORKER_URL is configured and the worker’s /health endpoint responds with status: ok.
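The probe's rule can be captured in a small function. probe_response is an illustrative sketch of the logic described above, not the platform's code:

```python
def probe_response(ml_worker_url, worker_health) -> dict:
    """`available` requires both a configured ML_WORKER_URL and a
    worker /health response with status: ok, as stated above."""
    configured = ml_worker_url is not None
    healthy = bool(worker_health) and worker_health.get("status") == "ok"
    return {"available": configured and healthy,
            "ml_worker_url_configured": configured}
```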

Large Dataset Warning Flow

When a dataset contains 5,000 or more rows, the KaireonAI UI shows a confirmation dialog before starting analysis. The dialog provides:
  • Accuracy — ML Worker algorithms (K-Means, logistic regression, TF-IDF) are more accurate than LLM pattern matching on large datasets
  • Cost estimate — Approximate token count and cost if the user proceeds with LLM analysis
  • Speed — ML Worker processes data locally in seconds vs. LLM round-trip latency
The user can choose Use ML Worker or Proceed with LLM. If the ML Worker is not connected, the dialog still appears but explains that LLM analysis will sample the data.
For datasets over 5,000 rows, the ML Worker is strongly recommended. LLM-based analysis samples data (up to 1,000 rows for segmentation) which reduces accuracy. The ML Worker processes the entire dataset.
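To make the sampling trade-off concrete, here is a sketch of capping a dataset at 1,000 rows for the LLM path. Uniform random sampling and the fixed seed are illustrative assumptions; this page only states the cap:

```python
import random

def sample_for_llm(rows: list, cap: int = 1000, seed: int = 0) -> list:
    """When the ML Worker is unavailable, LLM segmentation samples
    the data. The 1,000-row cap comes from the text above; uniform
    sampling and the fixed seed are assumptions for this sketch."""
    if len(rows) <= cap:
        return rows
    return random.Random(seed).sample(rows, cap)
```

The ML Worker side has no equivalent cap: it clusters the full dataset, which is why it wins on accuracy at this scale.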
For details on configuring analysis parameters, see AI Configuration.

Troubleshooting

ML Worker not connecting

The platform checks for the ML Worker at startup via the ML_WORKER_URL environment variable, or at runtime via Settings > Integrations. Verify:
  1. The ML Worker is running and responding: curl http://localhost:8000/health
  2. The URL is accessible from the platform (same network/cluster)
  3. The environment variable is set correctly: ML_WORKER_URL=http://localhost:8000
In Kubernetes, the Helm chart auto-injects this when mlWorker.enabled=true.
Python version errors

The ML Worker requires Python 3.11+. Verify your version:

```shell
python --version
```

If using a virtual environment, ensure it is activated before installing:

```shell
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

On macOS, you may need to install Python 3.11 explicitly via Homebrew: brew install python@3.11.
Import errors for sklearn

The Python package name is scikit-learn, not sklearn:

```shell
pip install scikit-learn
```

This is a common source of confusion. The requirements.txt uses the correct package name, so running pip install -r requirements.txt avoids this issue.
Database connection errors

The ML Worker connects directly to PostgreSQL to read schema data. Verify:
  • DATABASE_URL is set in the ML Worker environment and matches the platform database
  • PostgreSQL is reachable from the ML Worker host/container
  • Check logs for connection errors: docker logs kaireon-ml-worker
Out-of-memory errors

K-Means clustering and TF-IDF vectorization load the full dataset into memory. Allocate memory based on dataset size:
  • Under 100K rows: 1Gi is sufficient
  • 100K-500K rows: 2Gi recommended
  • Over 500K rows: 4Gi+ recommended
In Kubernetes, increase the memory limit in Helm values:

```yaml
mlWorker:
  resources:
    limits:
      memory: 4Gi
```
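The sizing guidance above can be encoded as a simple lookup, handy for capacity scripts. The thresholds come from this page; the function itself is illustrative:

```python
def recommended_memory_gib(row_count: int) -> int:
    """Map dataset size to the memory guidance documented above
    (under 100K rows: 1Gi; 100K-500K: 2Gi; over 500K: 4Gi+)."""
    if row_count < 100_000:
        return 1
    if row_count <= 500_000:
        return 2
    return 4
```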

Next Steps

Auto-Segmentation

Use the ML Worker for full-dataset clustering.

Smart Policy Recommender

Enhanced frequency analysis with the ML Worker.

Kubernetes Deployment

Deploy the full stack on Kubernetes.