
Overview

The ML Worker is a standalone Python/FastAPI service that provides scikit-learn-based analysis for KaireonAI’s AI features. It handles computationally intensive tasks — K-Means clustering for segmentation, logistic regression for policy analysis, and TF-IDF for content analysis — that exceed what LLM-based analysis can do accurately.

When to Use the ML Worker

| Scenario | Without ML Worker | With ML Worker |
| --- | --- | --- |
| Auto-Segmentation | LLM percentile-based grouping | K-Means on full dataset with silhouette scoring |
| Policy Recommender | Heuristic pattern recognition | Logistic regression and statistical analysis |
| Content Intelligence | CTR/CVR heuristics | TF-IDF + Random Forest feature importance |
| Dataset size | Works well under 5K rows | Required for accurate results at 5K+ rows |
The ML Worker is optional. All AI features work without it by falling back to LLM-based analysis. Add it when you need higher accuracy on large datasets.

Local Development

1. Set up environment

cd ml-worker
cp .env.example .env
Edit .env to set your local database URL:
DATABASE_URL=postgresql://user:password@localhost:5432/kaireon
This must match the same PostgreSQL database the platform uses.
2. Install dependencies

pip install -r requirements.txt
3. Start the ML Worker

python -m uvicorn app.main:app --host 0.0.0.0 --port 8000
4. Configure the platform

Add to your platform/.env:
ML_WORKER_URL=http://localhost:8000
Restart the Next.js dev server to pick up the change.
5. Verify the connection

curl http://localhost:8000/health
Expected response:
{"status":"ok","capabilities":["policy_analysis","segmentation","content_analysis"]}
In the KaireonAI UI, navigate to AI > Insights — the ML Worker status badge should show “Connected”.
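If you want to script this check rather than run curl by hand, a minimal Python sketch might look like the following. It assumes the /health payload shown above; the function names are illustrative, not part of the worker's API.

```python
import json
import urllib.request

# Capabilities advertised by a healthy worker, per the /health response above.
REQUIRED_CAPABILITIES = {"policy_analysis", "segmentation", "content_analysis"}

def check_health(payload: dict) -> bool:
    """Validate a /health response body against the expected shape."""
    return (
        payload.get("status") == "ok"
        and REQUIRED_CAPABILITIES.issubset(payload.get("capabilities", []))
    )

def fetch_health(url: str = "http://localhost:8000/health") -> dict:
    """Fetch and decode the worker's health endpoint."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```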

Docker Setup

Standalone Docker

docker run -d \
  --name kaireon-ml-worker \
  -p 8000:8000 \
  -e DATABASE_URL=postgresql://user:pass@host:5432/kaireon \
  422500312304.dkr.ecr.us-east-1.amazonaws.com/kaireon-ml:latest

Docker Compose

The platform’s docker-compose.yml includes the ML Worker under the ml profile:
# Start everything including the ML Worker
docker compose --profile ml up -d

# Start the platform without the ML Worker
docker compose up -d
The ML Worker automatically connects to PostgreSQL through PgBouncer using the same DATABASE_URL as the platform.

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| DATABASE_URL | Yes | (none) | PostgreSQL connection string (same database as the platform) |
| ML_WORKER_PORT | No | 8000 | Port to listen on |
The ML Worker connects to the same PostgreSQL database as the platform to read schema data directly. It does not need Redis or any other external dependencies.

Kubernetes (Helm)

The Helm chart includes ML Worker deployment as an optional component. Enable it with:
helm install kaireon ./helm \
  --namespace kaireon \
  --set mlWorker.enabled=true \
  --set mlWorker.image.repository=422500312304.dkr.ecr.us-east-1.amazonaws.com/kaireon-ml \
  --set mlWorker.image.tag=latest
When mlWorker.enabled=true, the chart automatically:
  • Creates a Deployment and Service for the ML Worker
  • Injects ML_WORKER_URL into the API pods so the platform auto-connects
  • Creates a ServiceAccount for the ML Worker pods

Helm Values

| Value | Default | Description |
| --- | --- | --- |
| mlWorker.enabled | false | Enable ML Worker deployment |
| mlWorker.replicas | 1 | Number of replicas |
| mlWorker.image.repository | 422500312304.dkr.ecr.us-east-1.amazonaws.com/kaireon-ml | Container image |
| mlWorker.image.tag | latest | Image tag |
| mlWorker.resources.requests.cpu | 500m | CPU request |
| mlWorker.resources.requests.memory | 1Gi | Memory request |
| mlWorker.resources.limits.cpu | 2000m | CPU limit |
| mlWorker.resources.limits.memory | 4Gi | Memory limit |
The ML Worker can be memory-intensive during clustering and model training. Allocate at least 2Gi of memory for production workloads with datasets over 100K rows.

Connecting from the Platform

There are two ways to connect the platform to the ML Worker.

1. Environment Variable

Set ML_WORKER_URL in the platform’s environment. The Helm chart does this automatically when mlWorker.enabled=true.
ML_WORKER_URL=http://localhost:8000

2. Settings UI (Runtime configuration)

  1. Navigate to Settings > Integrations in the KaireonAI UI
  2. Find the ML Worker section
  3. Enter the ML Worker URL
  4. Click Test Connection to verify
  5. Save the configuration
The Settings UI configuration takes precedence over the environment variable.

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Health check with capabilities list |
| /analyze/policies | POST | Submit policy analysis job |
| /analyze/segments | POST | Submit segmentation job |
| /analyze/content | POST | Submit content analysis job |
| /status/{job_id} | GET | Poll job status and results |
All analysis endpoints are asynchronous — they return a jobId immediately and process in the background.
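The submit-then-poll pattern can be sketched as follows. `fetch_status` is an injected callable (for example, one wrapping GET /status/{job_id}); the `state` field name and terminal values are assumptions, since the status response schema is not documented here.

```python
import time

def wait_for_job(job_id: str, fetch_status, poll_interval: float = 1.0,
                 timeout: float = 300.0) -> dict:
    """Poll a job until it reaches a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(job_id)  # e.g. GET /status/{job_id}
        if status.get("state") in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```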

Large Dataset Warning Flow

When a dataset contains 5,000 or more rows, the KaireonAI UI shows a confirmation dialog before starting analysis. The dialog provides:
  • Accuracy — ML Worker algorithms (K-Means, logistic regression, TF-IDF) are more accurate than LLM pattern matching on large datasets
  • Cost estimate — Approximate token count and cost if the user proceeds with LLM analysis
  • Speed — ML Worker processes data locally in seconds vs. LLM round-trip latency
The user can choose Use ML Worker or Proceed with LLM. If the ML Worker is not connected, the dialog still appears but explains that LLM analysis will sample the data.
For datasets over 5,000 rows, the ML Worker is strongly recommended. LLM-based analysis samples data (up to 1,000 rows for segmentation) which reduces accuracy. The ML Worker processes the entire dataset.
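The sampling behavior described above can be illustrated with a short sketch. The function name, seeding, and uniform sampling strategy are assumptions; the only facts taken from this page are the 1,000-row cap for LLM segmentation and that the ML Worker sees every row.

```python
import random

def rows_for_llm_segmentation(rows: list, cap: int = 1000, seed: int = 0) -> list:
    """Return the subset of rows an LLM-based segmentation pass would see.

    Datasets at or under the cap pass through unchanged; larger datasets are
    downsampled, which is why accuracy drops relative to the ML Worker.
    """
    if len(rows) <= cap:
        return rows
    return random.Random(seed).sample(rows, cap)
```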
For details on configuring analysis parameters, see AI Configuration.

Troubleshooting

| Issue | Solution |
| --- | --- |
| Worker not detected | Check the URL in Settings > Integrations or verify the ML_WORKER_URL env var. Ensure the worker is running and accessible. |
| Health check fails | Verify DATABASE_URL is correct and the worker can reach PostgreSQL. Check logs with docker logs kaireon-ml-worker. |
| Out of memory during analysis | Increase memory limits. For datasets over 500K rows, use at least 4Gi. |
| ModuleNotFoundError: sklearn | Run pip install scikit-learn; the package name differs from the import name. |

Next Steps