ML models that score offers for customers — from manual scorecards to neural collaborative filtering, with built-in experimentation.
The Algorithms module is the machine learning layer that predicts how likely a customer is to engage with each offer. Every scoring model takes a customer-offer pair and produces a propensity score between 0 and 1. That score becomes the P (Propensity) factor in the PRIE arbitration formula, which combines it with Relevance, Impact, and Emphasis to determine the final ranking.

KaireonAI ships with 9 scoring engines, ranging from a transparent scorecard you can configure in minutes (no training data needed) to gradient-boosted trees (LightGBM) and neural collaborative filtering that learns latent user-item embeddings from interaction data. You can start simple and upgrade later without changing your Decision Flows; the engine is a configuration detail, not a structural one.

The module also includes a full experimentation framework with champion/challenger testing, holdout groups, and uplift measurement, so you can measure real-world impact before rolling out changes.
Start with a Scorecard during initial setup. Once you have 100+ interactions, add a Bayesian model as a challenger. At 1,000+ interactions, test Logistic Regression via champion/challenger experiments to see if accuracy improves. At 5,000+ interactions — and with the ML Worker deployed — try Gradient Boosted for the best accuracy on non-linear patterns.
A weighted point system where you define rules that match customer or offer fields against conditions and award points. The raw total is normalized to 0–1 using sigmoid or linear normalization.

Example (Starbucks): Award 20 points if reward_tier = "gold", 15 points if visit_frequency >= 3, and 10 points if age >= 25.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| baseScore | number | 50 | Starting point total before rules fire |
| rules | array | [] | { field, operator, value, points, description } |
| normalization | enum | "sigmoid" | "sigmoid" or "linear" |
| maxScore | number | 100 | Upper bound of point range |
| minScore | number | 0 | Lower bound of point range |
Supported operators: eq, neq, gt, gte, lt, lte, in, not_in, contains, starts_with. Every rule evaluation is returned in the explanations array, so scoring is fully auditable.

Pros: Transparent, instant setup, no training data, easy to audit.
Cons: Cannot capture non-linear feature interactions, manual maintenance.
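The evaluation loop can be sketched in a few lines. This is an illustrative reimplementation, not the platform's code: only three operators are shown, and the exact sigmoid centering/scale used for normalization is an assumption (here: midpoint of the configured range, scale of one tenth of the range).

```typescript
type Rule = { field: string; operator: "eq" | "gte" | "gt"; value: unknown; points: number };

function scorecardScore(
  customer: Record<string, unknown>,
  rules: Rule[],
  baseScore = 50,
  minScore = 0,
  maxScore = 100,
): number {
  let total = baseScore;
  for (const r of rules) {
    const actual = customer[r.field];
    const matched =
      r.operator === "eq" ? actual === r.value :
      r.operator === "gte" ? Number(actual) >= Number(r.value) :
      Number(actual) > Number(r.value);
    if (matched) total += r.points; // each evaluation would also be logged to explanations
  }
  // Sigmoid normalization to 0-1, centered on the midpoint of the score range
  // (assumed shape; the platform's exact centering is not documented here).
  const mid = (minScore + maxScore) / 2;
  const scale = (maxScore - minScore) / 10;
  return 1 / (1 + Math.exp(-(total - mid) / scale));
}

// Starbucks example from above: gold tier, frequent visitor, age 30.
const score = scorecardScore(
  { reward_tier: "gold", visit_frequency: 4, age: 30 },
  [
    { field: "reward_tier", operator: "eq", value: "gold", points: 20 },
    { field: "visit_frequency", operator: "gte", value: 3, points: 15 },
    { field: "age", operator: "gte", value: 25, points: 10 },
  ],
);
```

With all three rules matched, the raw total is 95 and the sigmoid pushes the normalized score close to 1.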
An adaptive probability model that starts with a uniform prior and updates as it observes real outcomes. Each predictor contributes a log-likelihood ratio, and the posterior probability becomes the score. Laplace smoothing prevents zero-probability issues.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| laplaceSmoothingAlpha | number | 1 | Smoothing parameter |
| aucThreshold | number | 0.5 | Minimum AUC to consider model useful |
| maxPredictors | number | 20 | Cap on predictor count |
| binCount | number | 10 | Bins for continuous feature discretization |
| priorPositiveRate | number | 0.5 | Initial assumed conversion rate |
Key features:
Cold-start handling: Untrained model returns 0.5 for all customers (uniform prior). Score spread increases as data arrives.
Incremental learning: With autoLearn: true and learnMode: "per_outcome", each recorded outcome updates the model. A RETRAIN_EVERY_N threshold (default 100) controls full recomputation frequency.
Training enrichment: Training enriches customer attributes from schema tables (ds_* tables) so the model learns from real features like age, income, tenure — not empty context objects.
Predictor field names must match actual column names in your schema tables. If your ds_customers table has household_income, use that exact name — not a generic income. Mismatched names produce undifferentiated scores.
Pros: Learns from data, handles cold-start, interpretable per-field contributions.
Cons: Assumes feature independence, less accurate than tree-based models on complex data.
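The per-field log-likelihood mechanics look roughly like this. It is a simplified sketch: the count-table structure is assumed, and priorPositiveRate/binning are omitted for brevity. Note the cold-start property from above falls out naturally: with no counts and no predictors, the log-odds are zero and the score is 0.5.

```typescript
type Counts = { pos: number; neg: number };   // outcome counts
type Predictor = Map<string, Counts>;          // feature value -> counts

function bayesScore(
  features: Record<string, string>,
  predictors: Record<string, Predictor>,
  totals: Counts,
  alpha = 1,                                   // laplaceSmoothingAlpha
): number {
  // Prior log-odds from the observed positive rate (0 when untrained).
  let logOdds = Math.log((totals.pos + alpha) / (totals.neg + alpha));
  for (const [field, value] of Object.entries(features)) {
    const table = predictors[field];
    if (!table) continue;
    const c = table.get(value) ?? { pos: 0, neg: 0 };
    const k = table.size || 1;
    // Laplace-smoothed likelihood ratio P(value | pos) / P(value | neg).
    const pPos = (c.pos + alpha) / (totals.pos + alpha * k);
    const pNeg = (c.neg + alpha) / (totals.neg + alpha * k);
    logOdds += Math.log(pPos / pNeg);
  }
  // Posterior probability via sigmoid of the accumulated log-odds.
  return 1 / (1 + Math.exp(-logOdds));
}
```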
A linear classifier with sigmoid activation. Trained on customer features, offer features, and interaction history via SGD with L1 or L2 regularization. Fast, interpretable per-feature coefficients, and works well with 1,000+ samples.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| learningRate | number | 0.01 | Step size for SGD weight updates |
| maxIterations | number | 100 | Epochs over the training data |
| regularization | enum | "l2" | "none", "l1", or "l2" |
| regularizationStrength | number | 1.0 | Penalty strength (higher = simpler model) |
Pros: Fast training, fully interpretable coefficients, small model footprint, in-process scoring.
Cons: Can only capture linear relationships — combine with feature engineering for non-linear effects.
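Scoring is just a sigmoid over a learned weighted sum; the feature and weight names below are illustrative, not the model's actual coefficients:

```typescript
// Sigmoid of the weighted feature sum; unknown features contribute nothing.
function logisticScore(
  features: Record<string, number>,
  weights: Record<string, number>,
  bias = 0,
): number {
  let z = bias;
  for (const [name, x] of Object.entries(features)) {
    z += (weights[name] ?? 0) * x;
  }
  return 1 / (1 + Math.exp(-z));
}

// Illustrative call with made-up coefficients.
const p = logisticScore({ visit_frequency: 3, reward_points: 1.2 }, { visit_frequency: 0.4, reward_points: 0.2 });
```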
A LightGBM tree ensemble — the highest-accuracy engine in the platform. Training runs in the Python ML Worker using LightGBM; the trained ensemble is serialized as portable tree JSON and scored in-process in Node, so the /recommend hot path never calls the Python service.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| numLeaves | integer | 31 | Tree complexity; higher = more capacity and overfitting risk |
| maxDepth | integer | -1 | Depth cap; -1 lets LightGBM decide based on numLeaves |
| learningRate | number | 0.05 | Shrinkage per tree; lower values need more estimators |
| nEstimators | integer | 100 | Number of boosting trees |
| minChildSamples | integer | 20 | Minimum data points per leaf; higher = more regularization |
| regAlpha | number | 0 | L1 regularization on leaf weights |
| regLambda | number | 0 | L2 regularization on leaf weights |
| subsample | number | 1.0 | Row sampling ratio per tree |
| colsampleBytree | number | 1.0 | Feature sampling ratio per tree |
Architecture:
Training: Python ml-worker receives a compact JSON payload (feature_names, training_data, hyperparams) and fits a LightGBM booster. The booster is dumped as portable tree JSON with split_feature, threshold, default_left, and leaf_value at each node.
Scoring: The Node scorer walks every tree per record, summing leaf values into a raw margin, then applies sigmoid. Typical latency is 5–50µs for a 100-tree ensemble. Missing values are routed via LightGBM’s native default_left convention.
Zero hops in decision path: The /recommend API never calls the ML Worker. Training is the only phase that does.
Requirements:
5,000+ labeled interactions for meaningful accuracy (the engine will cold-start to 0.5 for everything until enough trees exist).
The ML Worker must be reachable at training time. See ML Worker Setup to deploy it. If ML_WORKER_URL is unset, GBM training returns MLWorkerUnavailableError.
Troubleshooting:
ML Worker unreachable: Verify ML_WORKER_URL points to a running worker and that curl $ML_WORKER_URL/health returns status: ok. The platform exposes GET /api/v1/ml-worker/health as a probe.
All scores are 0.5: The model has no trained trees yet. Run Train to kick off LightGBM training.
Low AUC on training: Increase nEstimators, decrease learningRate, or collect more labeled interactions.
Pros: Best accuracy on non-linear structured data, captures feature interactions automatically, handles missing values natively, in-process scoring.
Cons: Requires the ML Worker for training, needs 5,000+ interactions, per-tree path contributions are a lightweight SHAP-inspired approximation rather than true SHAP.
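The tree walk the Node scorer performs can be sketched as follows. Caveat: LightGBM's actual dump format stores children by index in flat arrays; the nested node shape here is a simplification that models the same routing logic (split_feature, threshold, default_left, leaf_value).

```typescript
// Simplified nested tree shape; real LightGBM JSON uses index-based children.
type TreeNode =
  | { leaf_value: number }
  | { split_feature: number; threshold: number; default_left: boolean; left: TreeNode; right: TreeNode };

function scoreTree(node: TreeNode, features: (number | null)[]): number {
  while (!("leaf_value" in node)) {
    const x = features[node.split_feature];
    // Missing values follow LightGBM's default_left convention.
    const goLeft = x == null ? node.default_left : x <= node.threshold;
    node = goLeft ? node.left : node.right;
  }
  return node.leaf_value;
}

function gbmScore(trees: TreeNode[], features: (number | null)[]): number {
  // Sum leaf values into a raw margin, then squash with sigmoid.
  const margin = trees.reduce((sum, t) => sum + scoreTree(t, features), 0);
  return 1 / (1 + Math.exp(-margin));
}
```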
A Thompson Sampling multi-armed bandit using Beta-distributed arms. Each offer is an arm with alpha (successes + 1) and beta (failures + 1). At scoring time, the engine draws from each arm’s Beta distribution — arms with higher expected reward win more often, but uncertain arms still get explored.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| priorAlpha | number | 1 | Initial alpha for all arms |
| priorBeta | number | 1 | Initial beta for all arms |
| minSamples | number | 10 | Minimum observations before exploitation |
Pros: Automatic explore/exploit balance, no feature engineering, Bayesian uncertainty.
Cons: Stochastic scores (different each request), does not use customer features directly.
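One way to sketch the draw-and-pick step: for integer alpha and beta (counts plus integer priors, as here), a Beta(a, b) sample is the a-th smallest of a + b - 1 uniforms, which avoids needing a gamma sampler. This is an illustrative implementation, not the platform's sampler.

```typescript
// Beta(a, b) draw for integer a, b >= 1 via the order-statistic identity:
// the a-th smallest of (a + b - 1) iid uniforms is Beta(a, b)-distributed.
function betaSample(a: number, b: number): number {
  const u = Array.from({ length: a + b - 1 }, () => Math.random()).sort((x, y) => x - y);
  return u[a - 1];
}

type Arm = { offerId: string; successes: number; failures: number };

// Thompson Sampling: draw from each arm's posterior and pick the max draw.
function thompsonPick(arms: Arm[], priorAlpha = 1, priorBeta = 1): Arm {
  let best = arms[0];
  let bestDraw = -1;
  for (const arm of arms) {
    const draw = betaSample(arm.successes + priorAlpha, arm.failures + priorBeta);
    if (draw > bestDraw) { bestDraw = draw; best = arm; }
  }
  return best;
}
```

Arms with strong evidence win almost every draw, while lightly sampled arms keep a realistic chance of being explored.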
A simpler bandit that exploits the best-known arm with probability 1 - epsilon and explores randomly with probability epsilon. Epsilon decays over time.
Unpulled arms receive an optimistic score of 1.0 to encourage initial exploration.

Pros: Simple, deterministic during exploitation, easy to tune.
Cons: Less sample-efficient than Thompson, no uncertainty modeling.
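The selection rule fits in a few lines (sketch; the decay schedule for epsilon is managed elsewhere and omitted here):

```typescript
type GreedyArm = { offerId: string; pulls: number; meanReward: number };

function epsilonGreedyPick(arms: GreedyArm[], epsilon: number): string {
  // Optimistic initialization: any unpulled arm is tried first (score 1.0).
  const unpulled = arms.find((a) => a.pulls === 0);
  if (unpulled) return unpulled.offerId;
  // Explore with probability epsilon...
  if (Math.random() < epsilon) {
    return arms[Math.floor(Math.random() * arms.length)].offerId;
  }
  // ...otherwise exploit the best-known arm.
  return arms.reduce((b, a) => (a.meanReward > b.meanReward ? a : b)).offerId;
}
```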
A two-tower embedding model with an MLP head. Customer and offer each get a learned embedding vector. At scoring time, embeddings are concatenated and passed through a hidden layer (ReLU) and output layer (sigmoid).

Architecture: user_embedding + item_embedding -> hidden (ReLU) -> output (sigmoid)
| Field | Description |
| --- | --- |
| userEmbeddings | Learned vector per customer |
| itemEmbeddings | Learned vector per offer |
| hiddenWeights / hiddenBias | Hidden layer parameters |
| outputWeights / outputBias | Output layer parameters |
| embeddingDim | Embedding vector size (default 8) |
| hiddenDim | Hidden layer size (default 16) |
Training uses mini-batch SGD with binary cross-entropy loss and Xavier/Glorot initialization.

Pros: Captures latent factors, handles sparse interaction matrices.
Cons: Requires significant interaction data, cold-start for new users/items (falls back to zero embedding).
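The forward pass through that architecture is straightforward. This sketch uses tiny dimensions for readability (embeddingDim and hiddenDim would default to 8 and 16); it also shows why the zero-embedding cold-start fallback yields a neutral score when the head's parameters are zero.

```typescript
// user_embedding + item_embedding -> hidden (ReLU) -> output (sigmoid)
function ncfForward(
  userEmb: number[],
  itemEmb: number[],
  hiddenWeights: number[][],   // [hiddenDim][2 * embeddingDim]
  hiddenBias: number[],
  outputWeights: number[],
  outputBias: number,
): number {
  const input = [...userEmb, ...itemEmb];  // concatenate the two towers
  const hidden = hiddenWeights.map((row, i) =>
    Math.max(0, row.reduce((s, w, j) => s + w * input[j], hiddenBias[i])),  // ReLU
  );
  const z = hidden.reduce((s, h, i) => s + outputWeights[i] * h, outputBias);
  return 1 / (1 + Math.exp(-z));           // sigmoid output
}
```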
External calls add 50–200ms of network latency per recommendation. For latency-critical use cases (under 50ms), use built-in models. Enable response caching to reduce repeated calls for the same customer.
When explain=true is passed to the Recommend API, each decision includes a modelExplanation object with engine-specific details. The structure of the details array varies by engine type:
| Engine | modelExplanation.details | confidence |
| --- | --- | --- |
| Scorecard | Per-rule breakdown: field, operator, expected vs actual, matched, points | |
Scorecard explanations are the most detailed — every rule evaluation is returned with matched/unmatched status and point contribution. This makes scorecards ideal for regulated industries that require a full audit trail of scoring decisions.
New offers have no interaction data, so ML scores are unreliable. Propensity smoothing blends the model score with a prior estimate until sufficient evidence accumulates:
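A common blend shape for this kind of smoothing is an evidence-weighted average; the formula below is an assumed sketch, and the evidence weight k is a hypothetical parameter, not a documented setting:

```typescript
// Shrink the model score toward the prior until `interactions` dominates
// the evidence weight k (hypothetical parameter; the platform's exact
// blend function may differ).
function smoothedPropensity(
  modelScore: number,
  prior: number,
  interactions: number,
  k = 50,
): number {
  return (interactions * modelScore + k * prior) / (interactions + k);
}
```

With zero interactions the score is exactly the prior; as evidence accumulates it converges to the raw model score.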
While propensity smoothing adjusts the score, the maturity ramp adjusts the exposure. New offers start at just 2% exposure and ramp linearly to 100% as interactions accumulate:
The ramp uses a deterministic hash of customerId + offerId + date so the same customer sees consistent results within a day.
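The ramp and hash bucketing can be sketched as below. The linear 2%-to-100% ramp follows the text; the hash algorithm, input separator, and bucket derivation are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Exposure ramps linearly from 2% to 100% as interactions approach the
// maturity threshold (modelMaturityThreshold, default 100).
function exposure(interactions: number, maturityThreshold = 100): number {
  return 0.02 + 0.98 * Math.min(interactions / maturityThreshold, 1);
}

// Deterministic bucket in [0, 1): the same customer/offer/day always maps
// to the same value, so results are consistent within a day.
function rampBucket(customerId: string, offerId: string, date: string): number {
  const digest = createHash("sha256").update(`${customerId}:${offerId}:${date}`).digest();
  return digest.readUInt32BE(0) / 0x100000000;
}

function shouldExpose(customerId: string, offerId: string, date: string, interactions: number): boolean {
  return rampBucket(customerId, offerId, date) < exposure(interactions);
}
```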
| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| modelMaturityThreshold | integer | 100 | Interactions required for full exposure |
Smoothing and the maturity ramp work together. Smoothing ensures a new offer’s score is reasonable; the ramp ensures it is not shown to everyone until the model has evidence. Together they provide fair but cautious treatment for new offers.
If the resolved model is unavailable or the circuit breaker is open, the engine falls back to pre-computed propensity scores, then to priority-based scoring as a last resort.
Every model carries an indexed registryStatus column that pins it to one of four lifecycle states:

| Status | Meaning |
| --- | --- |
| draft | training or untested; not reachable from /recommend |
| challenger | shadow-scoring only; measured against the active champion |
| champion | active in /recommend; at most one per (tenantId, registryFamily) |
| archived | retired; preserved for audit but never scored |
Promotions go through lib/ml/registry.ts#promoteModel, which enforces
legal transitions (draft → challenger → champion → archived → draft) and
the “one champion per family” invariant in a single transaction. Every
promotion writes an AuditLog row keyed by entityType=algorithm_model,
action=registry_promote, recoverable via DSAR or compliance review.
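The legal-transition rule reduces to a small lookup. This is an illustrative sketch of the invariant only; the real enforcement lives in lib/ml/registry.ts#promoteModel, which also checks the one-champion-per-family rule and writes the audit row inside a single transaction.

```typescript
type RegistryStatus = "draft" | "challenger" | "champion" | "archived";

// The lifecycle cycle: draft -> challenger -> champion -> archived -> draft.
const LEGAL_TRANSITIONS: Record<RegistryStatus, RegistryStatus[]> = {
  draft: ["challenger"],
  challenger: ["champion"],
  champion: ["archived"],
  archived: ["draft"],
};

function canPromote(from: RegistryStatus, to: RegistryStatus): boolean {
  return LEGAL_TRANSITIONS[from].includes(to);
}
```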
Self-hosters upgrading from a 2026-04-22-or-earlier deployment must
add the new columns and the DB-level CHECK constraint:
```shell
# 1. Add the columns + indexes (non-destructive)
npx prisma db push

# 2. Apply the CHECK constraint (idempotent)
psql "$DATABASE_URL" -f platform/prisma/manual-sql/01_registry_status_check.sql

# 3. Backfill values from legacy config.registry JSON
#    (call backfillRegistryColumns(prisma) once)
```
Existing rows continue to work without the backfill — the read path falls
back to config.registry JSON when the columns are null. Backfill is a
one-time cleanup, not a hard prerequisite.
Before launching, estimate required sample size and duration given your baseline conversion rate, minimum detectable effect, and daily traffic volume. Returns required sample size per variant (80% power, 95% confidence) and estimated duration in days.
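The standard two-proportion power calculation reproduces the shape of that estimate. This is a generic sketch, not the platform's internal math: the z constants correspond to 95% confidence (two-sided) and 80% power, and treating the minimum detectable effect as relative to baseline is an assumed convention.

```typescript
// Required sample size per variant for detecting a lift from p1 to p2
// at 95% confidence (two-sided) and 80% power, plus estimated duration.
function estimateExperiment(
  baselineRate: number,
  minDetectableEffect: number,   // relative, e.g. 0.2 = +20% over baseline (assumed convention)
  dailyTraffic: number,          // total traffic split across two variants
) {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + minDetectableEffect);
  const zAlpha = 1.96;           // 95% confidence, two-sided
  const zBeta = 0.8416;          // 80% power
  const pBar = (p1 + p2) / 2;
  const numerator =
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  const nPerVariant = Math.ceil(numerator ** 2 / (p2 - p1) ** 2);
  const days = Math.ceil((2 * nPerVariant) / dailyTraffic);
  return { nPerVariant, days };
}
```

For a 5% baseline, a 20% relative lift, and 1,000 visitors/day, this yields roughly eight thousand samples per variant and a multi-week run.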
Auto-promote is disabled by default. The system provides uplift magnitude, p-value, and confidence intervals, but the final promotion decision is left to the operator. Enable autoPromote: true only when you have guardrail checks and are comfortable with automated rollouts.
When enabled, auto-promotion triggers when: (1) experiment has run for at least promoteAfterDays, (2) challenger exceeds champion by at least promoteThreshold, and (3) result is statistically significant at 95% confidence.
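The three conditions combine into a simple conjunction. A sketch, with promoteAfterDays and promoteThreshold taken from the settings described above and the uplift treated as an absolute rate difference (an assumption):

```typescript
function shouldAutoPromote(args: {
  daysRunning: number;
  promoteAfterDays: number;
  challengerRate: number;   // challenger conversion rate
  championRate: number;     // champion conversion rate
  promoteThreshold: number; // minimum uplift (absolute, assumed)
  pValue: number;
}): boolean {
  return (
    args.daysRunning >= args.promoteAfterDays &&                           // (1) minimum runtime
    args.challengerRate - args.championRate >= args.promoteThreshold &&    // (2) minimum uplift
    args.pValue < 0.05                                                     // (3) significant at 95%
  );
}
```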
```
POST   /api/v1/algorithm-models              # Create a model
POST   /api/v1/algorithm-models/{id}/train   # Train from interaction data
POST   /api/v1/algorithm-models/{id}/score   # Score offers for a customer
DELETE /api/v1/algorithm-models?id={id}      # Delete permanently
```
| Engine | How it scores | Score |
| --- | --- | --- |
| Logistic Regression | Learned weights: visit_frequency=0.42, reward_tier=0.38, income=0.15. Sigmoid of weighted sum. | 0.754 |
| Gradient Boosted | 100-tree LightGBM ensemble; sum of leaf values routed through sigmoid; captures reward_tier × visit_frequency interaction. | 0.812 |
All four produce a 0–1 score, but arrive at it differently. The scorecard is transparent and manual. The Bayesian model adapts from data while remaining interpretable. Logistic regression adds learned linear weights. The gradient boosted ensemble captures non-linear interactions between features, at the cost of needing the ML Worker for training.