The Algorithms module is the machine learning layer that predicts how likely a customer is to engage with each offer. Every scoring model takes a customer-offer pair and produces a propensity score between 0 and 1. That score becomes the P (Propensity) factor in the PRIE arbitration formula, which combines it with Relevance, Impact, and Emphasis to determine the final ranking. KaireonAI ships with 9 scoring engines — from a transparent scorecard you can configure in minutes (no training data needed) to gradient-boosted trees (LightGBM) and neural collaborative filtering that learns latent user-item embeddings from interaction data. You can start simple and upgrade later without changing your Decision Flows; the engine is a configuration detail, not a structural one. The module also includes a full experimentation framework with champion/challenger testing, holdout groups, and uplift measurement so you can quantify real-world impact before rolling out changes.

When to Use Which Engine

| Engine | Data Required | Best For | Setup Time |
|---|---|---|---|
| Scorecard | None | Regulated industries, simple rules, full audit trail | Minutes |
| Bayesian | Some historical data | Cold-start, adaptive scoring, interpretable contributions | Minutes |
| Logistic Regression | 1,000+ interactions | Fast linear classifier with interpretable coefficients | Minutes |
| Gradient Boosted | 5,000+ interactions | Maximum accuracy on non-linear structured data (requires ML Worker) | Hours |
| Thompson Bandit | Outcome signals | Auto-discovering best offers, content selection | Minutes |
| Epsilon-Greedy | Outcome signals | Deterministic exploit-phase scoring, easy debugging | Minutes |
| Neural CF | Rich interaction data | Latent preference discovery, sparse matrices | Hours |
| Online Learner | Streaming outcomes | Real-time adaptation, fast-moving environments | Minutes |
| External Endpoint | External model | BYO model (SageMaker, Vertex AI, Azure ML) | Minutes |
Start with a Scorecard during initial setup. Once you have 100+ interactions, add a Bayesian model as a challenger. At 1,000+ interactions, test Logistic Regression via champion/challenger experiments to see if accuracy improves. At 5,000+ interactions — and with the ML Worker deployed — try Gradient Boosted for the best accuracy on non-linear patterns.

Scoring Engines

Scorecard

A weighted point system where you define rules that match customer or offer fields against conditions and award points. The raw total is normalized to 0–1 using sigmoid or linear normalization. Example (Starbucks): Award 20 points if reward_tier = "gold", 15 points if visit_frequency >= 3, 10 points if age >= 25.

| Field | Type | Default | Description |
|---|---|---|---|
| baseScore | number | 50 | Starting point total before rules fire |
| rules | array | [] | { field, operator, value, points, description } |
| normalization | enum | "sigmoid" | "sigmoid" or "linear" |
| maxScore | number | 100 | Upper bound of point range |
| minScore | number | 0 | Lower bound of point range |
Supported operators: eq, neq, gt, gte, lt, lte, in, not_in, contains, starts_with. Every rule evaluation is returned in the explanations array — fully auditable. Pros: Transparent, instant setup, no training data, easy to audit. Cons: Cannot capture non-linear feature interactions, manual maintenance.
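The mechanics can be sketched in a few lines. This is an illustrative TypeScript sketch, not the platform's implementation: the function name, the rule shape, and the exact sigmoid centering constants are assumptions.

```typescript
// Hypothetical scorecard evaluation: sum points from matched rules, then
// squash the total into 0..1 with a sigmoid (centering constants assumed).
type Rule = { field: string; operator: "eq" | "gte"; value: unknown; points: number };

function evaluateScorecard(
  customer: Record<string, unknown>,
  rules: Rule[],
  baseScore = 50,
  maxScore = 100
): number {
  let total = baseScore;
  for (const r of rules) {
    const actual = customer[r.field];
    const matched =
      r.operator === "eq" ? actual === r.value : Number(actual) >= Number(r.value);
    if (matched) total += r.points; // each evaluation would also feed the audit trail
  }
  // Sigmoid normalization centered on the midpoint of the point range
  const centered = (total - maxScore / 2) / (maxScore / 4);
  return 1 / (1 + Math.exp(-centered));
}

// Starbucks example from above: gold tier (+20), visit_frequency >= 3 (+15)
const score = evaluateScorecard(
  { reward_tier: "gold", visit_frequency: 4 },
  [
    { field: "reward_tier", operator: "eq", value: "gold", points: 20 },
    { field: "visit_frequency", operator: "gte", value: 3, points: 15 },
  ]
);
```

With the assumed constants, 85 raw points maps to roughly 0.80; the platform's own normalization may use different scaling.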

Bayesian (Naive Bayes)

An adaptive probability model that starts with a uniform prior and updates as it observes real outcomes. Each predictor contributes a log-likelihood ratio, and the posterior probability becomes the score. Laplace smoothing prevents zero-probability issues.
| Field | Type | Default | Description |
|---|---|---|---|
| laplaceSmoothingAlpha | number | 1 | Smoothing parameter |
| aucThreshold | number | 0.5 | Minimum AUC to consider model useful |
| maxPredictors | number | 20 | Cap on predictor count |
| binCount | number | 10 | Bins for continuous feature discretization |
| priorPositiveRate | number | 0.5 | Initial assumed conversion rate |
Key features:
  • Cold-start handling: Untrained model returns 0.5 for all customers (uniform prior). Score spread increases as data arrives.
  • Incremental learning: With autoLearn: true and learnMode: "per_outcome", each recorded outcome updates the model. A RETRAIN_EVERY_N threshold (default 100) controls full recomputation frequency.
  • Training enrichment: Training enriches customer attributes from schema tables (ds_* tables) so the model learns from real features like age, income, tenure — not empty context objects.
Predictor field names must match actual column names in your schema tables. If your ds_customers table has household_income, use that exact name — not a generic income. Mismatched names produce undifferentiated scores.
Pros: Learns from data, handles cold-start, interpretable per-field contributions. Cons: Assumes feature independence, less accurate than tree-based models on complex data.
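A toy version of the scoring logic, assuming standard Naive Bayes with Laplace smoothing; the function and field names here are illustrative, not the platform's API.

```typescript
// Toy Naive Bayes score: start from the prior odds, add a Laplace-smoothed
// log-likelihood ratio per active feature, convert back to a probability.
function naiveBayesScore(
  counts: Record<string, { pos: number; neg: number }>, // outcome counts per feature value
  activeFeatures: string[],
  totalPos: number,
  totalNeg: number,
  alpha = 1 // laplaceSmoothingAlpha
): number {
  let logOdds = Math.log(totalPos + alpha) - Math.log(totalNeg + alpha); // prior odds
  for (const f of activeFeatures) {
    const c = counts[f] ?? { pos: 0, neg: 0 };
    // Smoothed likelihood ratio contribution for this feature value
    logOdds += Math.log((c.pos + alpha) / (totalPos + 2 * alpha));
    logOdds -= Math.log((c.neg + alpha) / (totalNeg + 2 * alpha));
  }
  return 1 / (1 + Math.exp(-logOdds)); // posterior probability
}

// Untrained model: no counts, equal totals, so the score stays at the 0.5 prior
const coldStart = naiveBayesScore({}, [], 0, 0);
// A feature value seen mostly with positive outcomes pushes the score up
const trained = naiveBayesScore(
  { "reward_tier=gold": { pos: 40, neg: 10 } },
  ["reward_tier=gold"],
  50, 50
);
```

Note how the cold-start case returns exactly 0.5, matching the uniform-prior behavior described above.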

Logistic Regression

A linear classifier with sigmoid activation, trained on customer features, offer features, and interaction history via SGD with L1 or L2 regularization. It trains fast, produces interpretable per-feature coefficients, and works well with 1,000+ samples.
| Field | Type | Default | Description |
|---|---|---|---|
| learningRate | number | 0.01 | Step size for SGD weight updates |
| maxIterations | number | 100 | Epochs over the training data |
| regularization | enum | "l2" | "none", "l1", or "l2" |
| regularizationStrength | number | 1.0 | Penalty strength (higher = simpler model) |
Pros: Fast training, fully interpretable coefficients, small model footprint, in-process scoring. Cons: Can only capture linear relationships — combine with feature engineering for non-linear effects.

Gradient Boosted

A LightGBM tree ensemble — the highest-accuracy engine in the platform. Training runs in the Python ML Worker using LightGBM; the trained ensemble is serialized as portable tree JSON and scored in-process in Node, so the /recommend hot path never calls the Python service.
| Field | Type | Default | Description |
|---|---|---|---|
| numLeaves | integer | 31 | Tree complexity — higher = more capacity and overfitting risk |
| maxDepth | integer | -1 | Depth cap; -1 lets LightGBM decide based on numLeaves |
| learningRate | number | 0.05 | Shrinkage per tree; lower values need more estimators |
| nEstimators | integer | 100 | Number of boosting trees |
| minChildSamples | integer | 20 | Minimum data points per leaf — higher = more regularization |
| regAlpha | number | 0 | L1 regularization on leaf weights |
| regLambda | number | 0 | L2 regularization on leaf weights |
| subsample | number | 1.0 | Row sampling ratio per tree |
| colsampleBytree | number | 1.0 | Feature sampling ratio per tree |
Architecture:
  • Training: Python ml-worker receives a compact JSON payload (feature_names, training_data, hyperparams) and fits a LightGBM booster. The booster is dumped as portable tree JSON with split_feature, threshold, default_left, and leaf_value at each node.
  • Scoring: The Node scorer walks every tree per record, summing leaf values into a raw margin, then applies sigmoid. Typical latency is 5–50µs for a 100-tree ensemble. Missing values are routed via LightGBM’s native default_left convention.
  • Zero hops in decision path: The /recommend API never calls the ML Worker. Training is the only phase that does.
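The in-process scoring step can be sketched as a walk over the exported tree JSON. The split_feature, threshold, default_left, and leaf_value fields come from the description above; the left/right child keys and function names are assumptions about the serialized shape.

```typescript
// Sketch of in-process scoring over exported LightGBM trees (assumed node shape).
interface TreeNode {
  split_feature?: number;
  threshold?: number;
  default_left?: boolean;
  leaf_value?: number;
  left?: TreeNode;
  right?: TreeNode;
}

function scoreTrees(trees: TreeNode[], features: (number | null)[]): number {
  let margin = 0;
  for (const root of trees) {
    let node = root;
    while (node.leaf_value === undefined) {
      const x = features[node.split_feature!];
      // Missing values follow LightGBM's default_left routing
      const goLeft = x === null ? !!node.default_left : x <= node.threshold!;
      node = goLeft ? node.left! : node.right!;
    }
    margin += node.leaf_value; // sum leaf values into the raw margin
  }
  return 1 / (1 + Math.exp(-margin)); // sigmoid over the margin
}

const trees: TreeNode[] = [
  {
    split_feature: 0,
    threshold: 3,
    default_left: true,
    left: { leaf_value: -0.4 },
    right: { leaf_value: 0.9 },
  },
];
const p = scoreTrees(trees, [4]);        // 4 > 3 routes right, high leaf value
const pMissing = scoreTrees(trees, [null]); // missing value routes left (default_left)
```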
Requirements:
  • 5,000+ labeled interactions for meaningful accuracy (the engine will cold-start to 0.5 for everything until enough trees exist).
  • The ML Worker must be reachable at training time. See ML Worker Setup to deploy it. If ML_WORKER_URL is unset, GBM training returns MLWorkerUnavailableError.
Troubleshooting:
  • ML Worker unreachable: Verify ML_WORKER_URL points to a running worker and that curl $ML_WORKER_URL/health returns status: ok. The platform exposes GET /api/v1/ml-worker/health as a probe.
  • All scores are 0.5: The model has no trained trees yet. Run Train to kick off LightGBM training.
  • Low AUC on training: Increase nEstimators, decrease learningRate, or collect more labeled interactions.
Pros: Best accuracy on non-linear structured data, captures feature interactions automatically, handles missing values natively, in-process scoring. Cons: Requires the ML Worker for training, needs 5,000+ interactions, per-tree path contributions are a lightweight SHAP-inspired approximation rather than true SHAP.

Thompson Bandit

A Thompson Sampling multi-armed bandit using Beta-distributed arms. Each offer is an arm with alpha (successes + 1) and beta (failures + 1). At scoring time, the engine draws from each arm’s Beta distribution — arms with higher expected reward win more often, but uncertain arms still get explored.
| Field | Type | Default | Description |
|---|---|---|---|
| priorAlpha | number | 1 | Initial alpha for all arms |
| priorBeta | number | 1 | Initial beta for all arms |
| minSamples | number | 10 | Minimum observations before exploitation |
Pros: Automatic explore/exploit balance, no feature engineering, Bayesian uncertainty. Cons: Stochastic scores (different each request), does not use customer features directly.

Epsilon-Greedy

A simpler bandit that exploits the best-known arm with probability 1 - epsilon and explores randomly with probability epsilon. Epsilon decays over time.
| Field | Type | Default | Description |
|---|---|---|---|
| epsilon | number | 0.1 | Base exploration probability |
| decayRate | number | 0.01 | Decay: epsilon_t = epsilon / (1 + decayRate * totalPulls) |
| minEpsilon | number | 0.01 | Floor for exploration |
Unpulled arms receive an optimistic score of 1.0 to encourage initial exploration. Pros: Simple, deterministic during exploitation, easy to tune. Cons: Less sample-efficient than Thompson, no uncertainty modeling.
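The decay schedule from the table can be computed directly; only the function name is invented here.

```typescript
// epsilon_t = epsilon / (1 + decayRate * totalPulls), floored at minEpsilon
function effectiveEpsilon(
  epsilon: number,
  decayRate: number,
  minEpsilon: number,
  totalPulls: number
): number {
  return Math.max(minEpsilon, epsilon / (1 + decayRate * totalPulls));
}

const e0 = effectiveEpsilon(0.1, 0.01, 0.01, 0);     // no pulls yet: full base rate 0.1
const e1k = effectiveEpsilon(0.1, 0.01, 0.01, 1000); // 0.1 / 11, then floored to 0.01
```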

Neural Collaborative Filtering

A two-tower embedding model with an MLP head. Customer and offer each get a learned embedding vector. At scoring time, embeddings are concatenated and passed through a hidden layer (ReLU) and output layer (sigmoid). Architecture: user_embedding + item_embedding -> hidden (ReLU) -> output (sigmoid)
| Field | Description |
|---|---|
| userEmbeddings | Learned vector per customer |
| itemEmbeddings | Learned vector per offer |
| hiddenWeights / hiddenBias | Hidden layer parameters |
| outputWeights / outputBias | Output layer parameters |
| embeddingDim | Embedding vector size (default 8) |
| hiddenDim | Hidden layer size (default 16) |
Training uses mini-batch SGD with binary cross-entropy loss and Xavier/Glorot initialization. Pros: Captures latent factors, handles sparse interaction matrices. Cons: Requires significant interaction data, cold-start for new users/items (falls back to zero embedding).
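The forward pass described above (concatenate embeddings, hidden ReLU, sigmoid output) can be sketched with toy weights. Dimensions and weight values here are illustrative, not learned parameters.

```typescript
// Two-tower forward pass: concat(user_emb, item_emb) -> hidden (ReLU) -> sigmoid
const relu = (v: number[]) => v.map((x) => Math.max(0, x));
const sigmoid = (x: number) => 1 / (1 + Math.exp(-x));

function matVec(W: number[][], v: number[], b: number[]): number[] {
  return W.map((row, i) => row.reduce((s, w, j) => s + w * v[j], b[i]));
}

function ncfScore(
  userEmb: number[], itemEmb: number[],
  hiddenW: number[][], hiddenB: number[],
  outW: number[], outB: number
): number {
  const x = [...userEmb, ...itemEmb];          // concatenation
  const h = relu(matVec(hiddenW, x, hiddenB)); // hidden layer
  const z = h.reduce((s, hi, i) => s + hi * outW[i], outB);
  return sigmoid(z);                           // output layer
}

// Toy example with embeddingDim = 2, hiddenDim = 2
const hiddenW = [[1, 0, 1, 0], [0, 1, 0, 1]];
const p = ncfScore([0.5, -0.2], [0.3, 0.8], hiddenW, [0, 0], [1, 1], 0);
// New user/item falling back to zero embeddings lands on sigmoid(bias) = 0.5
const cold = ncfScore([0, 0], [0, 0], hiddenW, [0, 0], [1, 1], 0);
```

The cold variable illustrates the cold-start fallback noted above: zero embeddings collapse the score to the output bias.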

Online Learner

A streaming SGD logistic regression model that learns from one example at a time. Call learnOnline() after each outcome — no batch training required.
| Field | Description |
|---|---|
| weights | Feature weights (initialized to zero) |
| bias | Bias term |
| learningRate | Base learning rate (default 0.01) |
| decayRate | Learning rate decay (default 0.001) |
| step | Total updates applied |
Effective learning rate: lr_t = learningRate / (1 + decayRate * step). Pros: True real-time learning, no batch jobs, low memory. Cons: Linear model only, sensitive to learning rate, can be noisy.
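A minimal sketch of the streaming update, assuming a standard per-example logistic-regression gradient step with the decayed learning rate shown above; the class shape is illustrative, not the platform's learnOnline() implementation.

```typescript
// Streaming SGD logistic regression: one gradient step per observed outcome.
class OnlineLearner {
  weights: number[];
  bias = 0;
  step = 0;
  constructor(dim: number, private learningRate = 0.01, private decayRate = 0.001) {
    this.weights = new Array(dim).fill(0); // weights start at zero
  }
  predict(x: number[]): number {
    const z = x.reduce((s, xi, i) => s + xi * this.weights[i], this.bias);
    return 1 / (1 + Math.exp(-z));
  }
  learnOnline(x: number[], label: 0 | 1): void {
    const lr = this.learningRate / (1 + this.decayRate * this.step); // lr_t schedule
    const err = this.predict(x) - label; // gradient of cross-entropy w.r.t. the logit
    for (let i = 0; i < x.length; i++) this.weights[i] -= lr * err * x[i];
    this.bias -= lr * err;
    this.step += 1;
  }
}

const learner = new OnlineLearner(1);
const initial = learner.predict([1]); // 0.5 before any updates (zero weights)
for (let i = 0; i < 200; i++) {
  learner.learnOnline([1], 1); // stream of positive outcomes for this feature
}
```

After a couple hundred positive outcomes the prediction drifts well above the 0.5 starting point, with no batch retraining involved.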

External Model Endpoint

Call external HTTP prediction endpoints — SageMaker, Vertex AI, Azure ML, MLflow, BentoML, or any HTTP endpoint that returns scores.
| Setting | Options | Description |
|---|---|---|
| Scoring mode | batch (all candidates in one request) or single (one call per pair) | Batch is more efficient when supported |
| Auth type | api_key, bearer, aws_sigv4 | AWS SigV4 for SageMaker |
| Response mapping | Dot-path extraction (e.g., predictions[0].score) | Pull scores from nested response structures |
| Timeout | Default 200ms | Falls back to fallbackScore on timeout |
Example SageMaker config:
```json
{
  "endpointUrl": "https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/churn-model/invocations",
  "authType": "aws_sigv4",
  "authConfig": { "awsRegion": "us-east-1", "awsService": "sagemaker" },
  "scoringMode": "batch",
  "responseMapping": { "batchScoresPath": "scores", "fallbackScore": 0.5 },
  "timeoutMs": 200
}
```
External calls add 50–200ms of network latency per recommendation. For latency-critical use cases (under 50ms), use built-in models. Enable response caching to reduce repeated calls for the same customer.
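Dot-path response mapping like predictions[0].score can be implemented with a small extractor. The path grammar here (dots plus bracketed indexes) is an assumption based on the example above; the platform's actual parser may differ.

```typescript
// Pull a numeric score out of a nested response by dot path, with a fallback.
function extractByPath(obj: unknown, path: string, fallback: number): number {
  // "predictions[0].score" -> ["predictions", "0", "score"]
  const parts = path.replace(/\[(\d+)\]/g, ".$1").split(".");
  let cur: any = obj;
  for (const part of parts) {
    if (cur == null) return fallback; // missing segment: use fallbackScore
    cur = cur[part];
  }
  return typeof cur === "number" ? cur : fallback;
}

const response = { predictions: [{ score: 0.83 }] };
const s = extractByPath(response, "predictions[0].score", 0.5);       // extracts 0.83
const missing = extractByPath(response, "predictions[1].score", 0.5); // falls back to 0.5
```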

Engine Comparison

| | Scorecard | Bayesian | Logistic Regression | Gradient Boosted | Thompson | Epsilon-Greedy | Neural CF | Online Learner | External |
|---|---|---|---|---|---|---|---|---|---|
| Complexity | Low | Medium | Medium | High | Medium | Low | High | Low | Varies |
| Data needed | None | Historical | 1,000+ | 5,000+ | Outcomes | Outcomes | Rich interactions | Streaming | External |
| Interpretability | High | Medium | High | Medium | Low | Medium | Low | Medium | Varies |
| Accuracy | Good | Good | Good | Best | Good | Good | Best (latent) | Good | Varies |
| Learning mode | Manual | Incremental | Scheduled | Scheduled (ML Worker) | Per-outcome | Per-outcome | Batch SGD | Per-outcome | External |

Explanation Details by Engine

When explain=true is passed to the Recommend API, each decision includes a modelExplanation object with engine-specific details. The structure of the details array varies by engine type:
| Engine | modelExplanation.details | confidence |
|---|---|---|
| Scorecard | Per-rule breakdown: field, operator, expected vs actual, matched, points | No |
| Bayesian | Per-field log-odds contribution: field, contribution | Yes (0–1) |
| Logistic Regression | Per-feature weight contribution: field, contribution | No |
| Gradient Boosted | Per-feature path contribution (SHAP-inspired approximation): field, contribution | No |
| Thompson Bandit | Per-offer Thompson score | No |
| Epsilon-Greedy | Per-offer epsilon-greedy score | No |
| Neural CF | Not available (embedding-based) | No |
| Online Learner | Not available (incremental) | No |
| External Endpoint | Depends on external model | No |
Scorecard explanations are the most detailed — every rule evaluation is returned with matched/unmatched status and point contribution. This makes scorecards ideal for regulated industries that require a full audit trail of scoring decisions.

Cold Start and Propensity Smoothing

New offers have no interaction data, so ML scores are unreliable. Propensity smoothing blends the model score with a prior estimate until sufficient evidence accumulates:
smoothedScore = (modelScore * evidence + startingPropensity * weight) / (evidence + weight)
| Interactions | Model Score | Smoothed Score | What Happens |
|---|---|---|---|
| 0 | n/a | 0.700 | Pure prior (offer.priority / 100) |
| 10 | 0.45 | 0.629 | Prior still dominates |
| 25 | 0.45 | 0.575 | Equal blend (evidence = weight) |
| 100 | 0.45 | 0.500 | Model nearly converged |
| 500 | 0.45 | 0.462 | Effectively just model score |

(Example with priority = 70, propensitySmoothingWeight = 25)

| Setting | Type | Default | Description |
|---|---|---|---|
| propensitySmoothingWeight | integer | 25 | Higher = slower transition from prior to model |
Set higher (50–100) for extended exploration of new offers. Set lower (5–10) if you have fast feedback loops and trust the model quickly.
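The smoothing formula reproduces the example table exactly; only the function name is invented here.

```typescript
// smoothedScore = (modelScore * evidence + startingPropensity * weight) / (evidence + weight)
function smoothedScore(
  modelScore: number,
  evidence: number,           // interaction count for this offer
  startingPropensity: number, // prior, e.g. offer.priority / 100
  weight: number              // propensitySmoothingWeight
): number {
  return (modelScore * evidence + startingPropensity * weight) / (evidence + weight);
}

// priority = 70 (prior 0.7), weight = 25, model score 0.45, as in the table
const rows = [0, 10, 25, 100, 500].map((n) =>
  Number(smoothedScore(0.45, n, 0.7, 25).toFixed(3))
);
// rows reproduce the Smoothed Score column: 0.7, 0.629, 0.575, 0.5, 0.462
```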

Model Maturity Ramp

While propensity smoothing adjusts the score, the maturity ramp adjusts the exposure. New offers start at just 2% exposure and ramp linearly to 100% as interactions accumulate:
exposureProbability = max(0.02, min(1.0, interactions / maturityThreshold))
| Interactions | Exposure | Effect |
|---|---|---|
| 0 | 2% | Minimal exposure — model is blind |
| 10 | 10% | Growing confidence |
| 50 | 50% | Half of eligible customers |
| 100+ | 100% | Full exposure — enough data |
The ramp uses a deterministic hash of customerId + offerId + date so the same customer sees consistent results within a day.
| Setting | Type | Default | Description |
|---|---|---|---|
| modelMaturityThreshold | integer | 100 | Interactions required for full exposure |
Smoothing and the maturity ramp work together. Smoothing ensures a new offer’s score is reasonable; the ramp ensures it is not shown to everyone until the model has evidence. Together they provide fair but cautious treatment for new offers.
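The ramp formula is easy to verify against the exposure table; only the function name is invented here.

```typescript
// exposureProbability = max(0.02, min(1.0, interactions / maturityThreshold))
function exposureProbability(interactions: number, maturityThreshold = 100): number {
  return Math.max(0.02, Math.min(1.0, interactions / maturityThreshold));
}

const ramp = [0, 10, 50, 100, 250].map((n) => exposureProbability(n));
// 2% floor at zero interactions, linear ramp, capped at 100%
```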

Model Resolution Hierarchy

When a Score node executes, the engine resolves which model to use for each candidate via an override priority chain:
offer-level override -> category override -> channel override -> default model
First match wins. Override resolution happens independently per candidate — two offers in the same request can be scored by different models.
```json
{
  "defaultModel": "model_bayesian_v3",
  "overrides": [
    { "scope": "offer",    "key": "offer_premium_cc",  "modelKey": "model_premium_scorecard" },
    { "scope": "category", "key": "cat_loans",         "modelKey": "model_loan_gb" },
    { "scope": "channel",  "key": "chan_email",         "modelKey": "model_email_bayesian" }
  ]
}
```
If the resolved model is unavailable or the circuit breaker is open, the engine falls back to pre-computed propensity scores, then to priority-based scoring as a last resort.
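A sketch of first-match-wins resolution. The shapes mirror the config JSON above, but the function name, candidate fields, and lookup strategy are illustrative assumptions.

```typescript
// Resolve a model per candidate: offer override, then category, then channel,
// then the default model. First match wins.
interface Override { scope: "offer" | "category" | "channel"; key: string; modelKey: string }
interface Candidate { offerId: string; categoryId: string; channelId: string }

function resolveModel(c: Candidate, overrides: Override[], defaultModel: string): string {
  const chain: [Override["scope"], string][] = [
    ["offer", c.offerId],
    ["category", c.categoryId],
    ["channel", c.channelId],
  ];
  for (const [scope, key] of chain) {
    const hit = overrides.find((o) => o.scope === scope && o.key === key);
    if (hit) return hit.modelKey; // first match wins
  }
  return defaultModel;
}

const overrides: Override[] = [
  { scope: "offer", key: "offer_premium_cc", modelKey: "model_premium_scorecard" },
  { scope: "category", key: "cat_loans", modelKey: "model_loan_gb" },
];
// Offer-level override beats the category override for the same candidate
const m1 = resolveModel(
  { offerId: "offer_premium_cc", categoryId: "cat_loans", channelId: "chan_web" },
  overrides, "model_bayesian_v3"
);
// No override applies: fall through to the default model
const m2 = resolveModel(
  { offerId: "offer_basic", categoryId: "cat_savings", channelId: "chan_web" },
  overrides, "model_bayesian_v3"
);
```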

Champion/Challenger Testing

Run a new model against your production model using live traffic with deterministic customer assignment.

Model Registry

Every model carries an indexed registryStatus column that pins it to one of four lifecycle states:
| Status | Meaning |
|---|---|
| draft | training or untested; not reachable from /recommend |
| challenger | shadow-scoring only — measured against the active champion |
| champion | active in /recommend; at most one per (tenantId, registryFamily) |
| archived | retired; preserved for audit but never scored |
Promotions go through lib/ml/registry.ts#promoteModel, which enforces legal transitions (draft → challenger → champion → archived → draft) and the “one champion per family” invariant in a single transaction. Every promotion writes an AuditLog row keyed by entityType=algorithm_model, action=registry_promote, recoverable via DSAR or compliance review.
Self-hosters upgrading from a 2026-04-22-or-earlier deployment must add the new columns and the DB-level CHECK constraint:
```bash
# 1. Add the columns + indexes (non-destructive)
npx prisma db push

# 2. Apply the CHECK constraint (idempotent)
psql "$DATABASE_URL" -f platform/prisma/manual-sql/01_registry_status_check.sql

# 3. Backfill values from legacy config.registry JSON
#    (call backfillRegistryColumns(prisma) once)
```
Existing rows continue to work without the backfill — the read path falls back to config.registry JSON when the columns are null. Backfill is a one-time cleanup, not a hard prerequisite.

How It Works

  1. Configure the split — Set weights (e.g., 80/20 champion/challenger)
  2. Deterministic assignment — FNV-1a(customerId + ":cc") produces a stable hash. Same customer always gets the same model.
  3. Override bypass — When enabled, champion/challenger takes precedence over the normal override hierarchy.
```json
{
  "championChallenger": {
    "enabled": true,
    "champion": { "modelKey": "model_bayesian_v3", "weight": 80 },
    "challengers": [{ "modelKey": "model_gb_v1", "weight": 20 }]
  }
}
```
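Deterministic assignment can be sketched with a standard 32-bit FNV-1a hash; the bucket-to-variant mapping below is an illustrative assumption, not the platform's exact scheme.

```typescript
// Standard 32-bit FNV-1a over the string bytes (char codes here, for ASCII ids)
function fnv1a32(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return h;
}

// Map the stable hash into a 0-99 bucket and compare against the champion weight
function assignVariant(customerId: string, championWeight: number): "champion" | "challenger" {
  const bucket = fnv1a32(customerId + ":cc") % 100;
  return bucket < championWeight ? "champion" : "challenger";
}

// The same customer always lands in the same bucket, so assignment is sticky
const a = assignVariant("cust_42", 80);
const b = assignVariant("cust_42", 80);
```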

Example: After 14 Days

| Metric | Champion (Bayesian) | Challenger (Gradient Boosted) |
|---|---|---|
| Customers | 800 | 200 |
| Conversions | 96 | 32 |
| Conversion rate | 12.0% | 16.0% |
| Uplift | | +4pp (33.3% relative) |
| p-value | | 0.150 |
| Significant? | | No (need more data) |
The challenger shows promise but has not reached statistical significance at 95% confidence. The experiment needs more traffic.

Experiments

Experiments wrap champion/challenger testing with holdout groups, traffic management, and statistical analysis.

Creating an Experiment

```json
{
  "key": "q1-rewards-model-test",
  "name": "Q1 Rewards Model Test",
  "championModelId": "model_bayesian_v3",
  "trafficSplit": { "championPct": 80 },
  "challengers": [{ "modelId": "model_gb_v1", "trafficPct": 20 }],
  "autoPromote": false,
  "promoteThreshold": 0.02,
  "promoteAfterDays": 14
}
```
Traffic split must sum to exactly 100%. The API validates championPct + sum(challenger trafficPct) = 100 and rejects the request otherwise.

Uplift Calculation

KaireonAI uses a two-proportion z-test:
| Result Field | Description |
|---|---|
| treatmentConversionRate | Conversions / total in treatment group |
| holdoutConversionRate | Conversions / total in holdout group |
| uplift | Absolute difference (treatment - holdout) |
| relativeUplift | Percentage improvement over holdout |
| zScore | Test statistic |
| pValue | Two-tailed p-value |
| significant | true if p-value < alpha (default 0.05) |
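A textbook pooled two-proportion z-test illustrating the fields above. The platform's exact variance estimate and p-value computation are not documented here, so treat this as a sketch; the normal CDF uses the standard Abramowitz-Stegun approximation.

```typescript
// Pooled two-proportion z-test with a two-tailed p-value.
function twoProportionZTest(convT: number, nT: number, convH: number, nH: number) {
  const pT = convT / nT;
  const pH = convH / nH;
  const pooled = (convT + convH) / (nT + nH);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nT + 1 / nH));
  const z = (pT - pH) / se;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { uplift: pT - pH, relativeUplift: (pT - pH) / pH, z, pValue, significant: pValue < 0.05 };
}

// Standard normal CDF for x >= 0 (Abramowitz-Stegun polynomial approximation)
function normalCdf(x: number): number {
  const t = 1 / (1 + 0.2316419 * x);
  const d = 0.3989423 * Math.exp((-x * x) / 2);
  const p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return 1 - p;
}

// The 14-day example above: challenger 32/200 (16%) vs champion 96/800 (12%)
const result = twoProportionZTest(32, 200, 96, 800);
// +4pp uplift, but the p-value stays above 0.05, so significant is false
```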

Power Calculator

Before launching, estimate required sample size and duration given your baseline conversion rate, minimum detectable effect, and daily traffic volume. Returns required sample size per variant (80% power, 95% confidence) and estimated duration in days.
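The standard two-proportion sample-size approximation at 80% power and 95% confidence gives a feel for the calculation; the platform's exact formula may differ, and the traffic numbers below are hypothetical.

```typescript
// n per variant ≈ (z_alpha/2 + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / MDE^2
function requiredSamplePerVariant(baselineRate: number, mde: number): number {
  const zAlpha = 1.96;  // two-sided 95% confidence
  const zBeta = 0.8416; // 80% power
  const p1 = baselineRate;
  const p2 = baselineRate + mde;
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (mde * mde));
}

// Hypothetical inputs: 12% baseline, detect a 4pp lift, 1,000 eligible customers/day, 50/50 split
const n = requiredSamplePerVariant(0.12, 0.04);
const days = Math.ceil((2 * n) / 1000);
```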

Auto-Promotion

Auto-promote is disabled by default. The system provides uplift magnitude, p-value, and confidence intervals, but the final promotion decision is left to the operator. Enable autoPromote: true only when you have guardrail checks and are comfortable with automated rollouts.
When enabled, auto-promotion triggers when: (1) experiment has run for at least promoteAfterDays, (2) challenger exceeds champion by at least promoteThreshold, and (3) result is statistically significant at 95% confidence.

Model Lifecycle

  1. Create — Define model with key, name, engine type, and engine-specific config. Starts in draft status.
  2. Configure — Set up predictors (feature fields), target field, and engine settings.
  3. Train — Kick off training (requires 50+ interaction records for data-driven engines).
  4. Evaluate — Review accuracy, precision, recall, F1, AUC. Compare against previous versions.
  5. Promote — Set to active to make available for Decision Flows.
  6. Monitor — Track in Model Health Dashboard. Scheduled drift checks auto-enqueue retraining when performance degrades.

Auto-Learning Modes

| Mode | Engines | How It Works |
|---|---|---|
| Incremental | Bayesian | Outcomes buffered in Redis (50-event threshold), batch-updated without full retrain |
| Scheduled | Logistic Regression, Gradient Boosted | Cron schedule (e.g., 24h, 7d) triggers periodic retraining |
| Per-outcome | Thompson, Epsilon-Greedy, Online Learner | Model updates immediately after each outcome |

Auto-Upgrade Recommendations

| Current Engine | Upgrade To | Trigger |
|---|---|---|
| Scorecard | Bayesian | More than 5 rules AND more than 100 training samples |
| Bayesian | Logistic Regression | More than 500 samples AND AUC below 0.75 |
| Logistic Regression | Gradient Boosted | More than 5,000 samples AND AUC below 0.90 (requires ML Worker) |

Field Reference

Algorithm Model

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| key | string | Yes | – | Unique identifier (1–255 chars) |
| name | string | Yes | – | Human-readable name |
| description | string | No | "" | Optional description |
| modelType | enum | Yes | – | scorecard, bayesian, logistic_regression, gradient_boosted, thompson_bandit, epsilon_greedy, neural_cf, online_learner, external_endpoint |
| status | enum | No | "draft" | draft, training, active, paused, archived, error |
| config | object | No | {} | Engine-specific configuration |
| targetField | string | No | "" | Field being predicted |
| targetSchemaKey | string | No | "" | Schema key for target field |
| predictors | array | No | [] | { field, schemaKey, importance, bins, selected } |
| metrics | object | No | {} | Latest evaluation metrics |
| metricsHistory | array | No | [] | Version-over-version comparison |
| modelState | object | No | {} | Learned parameters |
| learningConfig | object | No | {} | Auto-learn settings |
| outcomeWeights | object | No | null | Outcome type weights for blended scoring |
| interactionFeatures | object | No | null | Interaction features to extract |
| evolutionConfig | object | No | null | Auto-upgrade thresholds |

Experiment

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| key | string | Yes | – | Unique identifier |
| name | string | Yes | – | Human-readable name |
| championModelId | string | No | null | Champion model ID |
| trafficSplit.championPct | number | No | 80 | Champion traffic percentage |
| challengers[].modelId | string | Yes | – | Challenger model ID |
| challengers[].trafficPct | number | No | 10 | Challenger traffic percentage |
| autoPromote | boolean | No | false | Auto-promote on win |
| promoteThreshold | number | No | 0.02 | Minimum uplift (2pp) |
| promoteAfterDays | number | No | 14 | Minimum experiment duration |

API Quick Reference

```
POST   /api/v1/algorithm-models            # Create a model
POST   /api/v1/algorithm-models/{id}/train # Train from interaction data
POST   /api/v1/algorithm-models/{id}/score # Score offers for a customer
DELETE /api/v1/algorithm-models?id={id}    # Delete permanently
```

Score Request Example

```json
{
  "customerAttributes": { "income": 75000, "reward_tier": "gold" },
  "offers": [
    { "id": "offer_bogo", "attributes": { "discount_pct": 50 } },
    { "id": "offer_stars", "attributes": { "multiplier": 3 } }
  ]
}
```
Returns per-offer scores sorted by propensity.
For full API reference with request/response schemas and error codes, see the Algorithm Models API Reference.

Worked Example: Three Engines Score the Same Offer

A Starbucks Rewards member (income: 75000, reward_tier: gold, visit_frequency: 4/week) is evaluated for the BOGO Frappuccino offer:
| Engine | How It Scores | Result |
|---|---|---|
| Scorecard | baseScore=50 + 20 (gold tier) + 15 (high frequency) = 85 points. Sigmoid normalization. | 0.818 |
| Bayesian | 500 positive, 300 negative outcomes. Log-likelihood ratios: reward_tier +0.31, visit_frequency +0.18. | 0.724 |
| Logistic Regression | Learned weights: visit_frequency=0.42, reward_tier=0.38, income=0.15. Sigmoid of weighted sum. | 0.754 |
| Gradient Boosted | 100-tree LightGBM ensemble; sum of leaf values routed through sigmoid; captures reward_tier × visit_frequency interaction. | 0.812 |

All four produce a 0–1 score, but arrive at it differently. The scorecard is transparent and manual. The Bayesian adapts from data while remaining interpretable. Logistic regression adds learned linear weights. The gradient boosted ensemble captures non-linear interactions between features — at the cost of needing the ML Worker for training.

Related

- Decision Flows — See how models plug into the Score stage.
- Composable Pipeline — The score node uses the same scoring resolver.
- Dashboards — Monitor model health, drift, and experiment results.