
How scoring works — end-to-end

This page is the synthesis of every model component in the platform. The pieces are documented individually (algorithms, maturity ramp, uplift, model lifecycle), but the question operators and evaluators ask most often is: “when a customer event lands, what actually happens between the request and the response?” That’s what’s covered here, in order, with every step linked to its deep-dive.

The 30-second mental model

A POST /api/v1/recommend runs the customer through eight stages. Each stage either narrows the candidate set, scores it, or persists state. The scoring stages (5 to 7) are where the models actually drive the decision; everything else is filtering or bookkeeping. The learning loop closes the cycle: every /respond updates the ModelAdaptation posteriors that the next /recommend reads at stage 6.

Stage-by-stage walkthrough

1. Load + enrich the customer

Source: platform/src/app/api/v1/recommend/route.ts → enrichCustomer. The route resolves the customer from customerId and runs the configured enrichment node of the decision flow. Enrichment pulls customer attributes from the Data module’s schema tables (declared via Schemas and loaded via Pipelines), plus any behavioral metrics that have rolling windows (impression count 30d, complaint rate 90d, etc.). Output: a customer payload with both static attributes and rolling metrics that downstream stages can reference. Optional flow knob: EnrichNodeConfig.excludeJoinIds[] (added in WS1) lets the flow author exclude expensive joins per decision flow when the data isn’t needed.
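To make the excludeJoinIds knob concrete, here is a minimal sketch of a join loop that honors it. JoinSpec, enrichWithJoins, and the synchronous load callback are hypothetical illustrations, not the route's actual enrichCustomer implementation, which reads from the Data module's schema tables.

```typescript
// Hypothetical shapes; the platform's real EnrichNodeConfig may carry more fields.
interface EnrichNodeConfig {
  excludeJoinIds?: string[];
}

interface JoinSpec {
  id: string;
  // Loads the columns this join contributes for one customer (synchronous for brevity).
  load: (customerId: string) => Record<string, unknown>;
}

function enrichWithJoins(
  customerId: string,
  base: Record<string, unknown>,
  joins: JoinSpec[],
  config: EnrichNodeConfig,
): Record<string, unknown> {
  const excluded = new Set(config.excludeJoinIds ?? []);
  const payload: Record<string, unknown> = { ...base };
  for (const join of joins) {
    // Skip joins this decision flow has opted out of, so their cost is never paid.
    if (excluded.has(join.id)) continue;
    Object.assign(payload, join.load(customerId));
  }
  return payload;
}
```

Because excluded joins are skipped before load is called, a flow that never reads rolling metrics never pays for computing them.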

2. Build the candidate offer set

For each request the engine starts with every active offer in the tenant (status = "active", not soft-deleted), then narrows. The narrowing happens via three optional filters before scoring:
  • channelId query param → only offers with creatives on the requested channel
  • placementId query param → only offers with creatives at the requested placement
  • mandatory offers (per business hierarchy) bypass downstream filtering and always rank
This is the input set that flows into qualification.
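A minimal sketch of this narrowing step follows. The Offer, Creative, and Candidate shapes, the deletedAt soft-delete field, and buildCandidateSet are hypothetical names chosen for illustration. The sketch also assumes that channel and placement narrowing still applies to mandatory offers, which only carry a flag exempting them from later stages.

```typescript
// Hypothetical shapes for illustration; the platform's real Offer model differs.
interface Creative {
  channelId: string;
  placementId: string;
}

interface Offer {
  id: string;
  status: string;         // only "active" offers enter the candidate set
  deletedAt: Date | null; // assumed soft-delete marker
  mandatory: boolean;     // mandatory per business hierarchy
  creatives: Creative[];
}

interface CandidateFilters {
  channelId?: string;   // from the channelId query param
  placementId?: string; // from the placementId query param
}

interface Candidate {
  offer: Offer;
  bypassDownstreamFilters: boolean; // mandatory offers skip later filtering and always rank
}

function buildCandidateSet(offers: Offer[], filters: CandidateFilters): Candidate[] {
  return offers
    .filter((o) => o.status === "active" && o.deletedAt === null)
    // Assumption: a single creative must satisfy both the channel and placement filters.
    .filter((o) =>
      o.creatives.some(
        (c) =>
          (filters.channelId === undefined || c.channelId === filters.channelId) &&
          (filters.placementId === undefined || c.placementId === filters.placementId),
      ),
    )
    .map((offer) => ({ offer, bypassDownstreamFilters: offer.mandatory }));
}
```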

3. Qualification rules

Source: lib/qualification-engine.ts. Each qualification rule is evaluated per (customer, offer) pair. There are 6 wired ruleType values (segment_required, attribute_condition, offer_attribute, propensity_threshold, recency_check, metric_condition), and rules can be scoped global, category, sub-category, channel, offer, creative, or placement. A candidate that fails ANY qualification rule is dropped. The qualification result is persisted to decision_traces.qualificationResults[] with {offerId, passed, reason, ruleId} so the Decision Provenance UI can answer “why did Customer X NOT get Offer Y” without leaving the row.
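The fail-on-any-rule semantics and the persisted result shape can be sketched as below. Only two of the six ruleTypes are modeled, rule scoping is left out, and QualificationRule, CustomerContext, and qualify are hypothetical names rather than the API of lib/qualification-engine.ts.

```typescript
// Hypothetical rule shapes covering two of the six ruleTypes; scoping is omitted.
type QualificationRule =
  | { id: string; ruleType: "segment_required"; segmentId: string }
  | { id: string; ruleType: "propensity_threshold"; minPropensity: number };

interface CustomerContext {
  segments: Set<string>;
  propensityByOffer: Map<string, number>;
}

// Mirrors the {offerId, passed, reason, ruleId} shape persisted to decision_traces.
interface QualificationResult {
  offerId: string;
  passed: boolean;
  reason: string;
  ruleId: string | null;
}

// Returns a failure reason, or null when the rule passes.
function evaluateRule(rule: QualificationRule, offerId: string, ctx: CustomerContext): string | null {
  if (rule.ruleType === "segment_required") {
    return ctx.segments.has(rule.segmentId) ? null : `not in segment ${rule.segmentId}`;
  }
  // Remaining case: propensity_threshold.
  const p = ctx.propensityByOffer.get(offerId) ?? 0;
  return p >= rule.minPropensity ? null : `propensity ${p} below ${rule.minPropensity}`;
}

function qualify(offerId: string, rules: QualificationRule[], ctx: CustomerContext): QualificationResult {
  for (const rule of rules) {
    const failure = evaluateRule(rule, offerId, ctx);
    // Failing ANY rule drops the candidate; the first failing rule is recorded.
    if (failure !== null) return { offerId, passed: false, reason: failure, ruleId: rule.id };
  }
  return { offerId, passed: true, reason: "all rules passed", ruleId: null };
}
```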

4. Contact policy (suppression)

Source: lib/contact-policy-engine.ts. Contact policies are the always-on layer (made implicit in WS T21, with an optional skipContactPolicy opt-out per flow). Fourteen wired ruleType values cover frequency caps, cooldowns, category-suppression windows, outcome-based suppression, do_not_contact (DNC, the only mechanism that suppresses across channels), and metric_condition rules. Policies are evaluated per candidate; the first blocking match suppresses it. Unknown ruleTypes fail closed (block + log) as a safety guarantee.
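Here is a minimal sketch of the first-blocking-match loop with the fail-closed default. frequency_cap is an assumed ruleType name (do_not_contact comes from the list above), and ContactPolicy, ContactHistory, and applyContactPolicies are hypothetical; the real engine's fourteen rule types and their parameters are not reproduced.

```typescript
// Hypothetical policy shape; ruleType is a plain string so unknown types can reach the fail-closed path.
interface ContactPolicy {
  id: string;
  ruleType: string;
  maxImpressions?: number;
}

interface ContactHistory {
  impressionsInWindow: number;
  doNotContact: boolean;
}

interface PolicyDecision {
  suppressed: boolean;
  policyId: string | null;
  reason: string;
}

type PolicyCheck = (p: ContactPolicy, h: ContactHistory) => boolean; // true = block

const checks: Partial<Record<string, PolicyCheck>> = {
  frequency_cap: (p, h) => h.impressionsInWindow >= (p.maxImpressions ?? Infinity), // assumed name
  do_not_contact: (_p, h) => h.doNotContact,
};

function applyContactPolicies(
  policies: ContactPolicy[],
  history: ContactHistory,
  log: (msg: string) => void = console.warn,
): PolicyDecision {
  for (const policy of policies) {
    const check = checks[policy.ruleType];
    if (check === undefined) {
      // Fail closed: an unrecognised ruleType blocks the candidate and is logged.
      log(`unknown contact-policy ruleType "${policy.ruleType}" on ${policy.id}; blocking`);
      return { suppressed: true, policyId: policy.id, reason: "unknown_rule_type" };
    }
    if (check(policy, history)) {
      // First blocking match wins; later policies are not evaluated.
      return { suppressed: true, policyId: policy.id, reason: policy.ruleType };
    }
  }
  return { suppressed: false, policyId: null, reason: "allowed" };
}
```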

5. Maturity ramp (Bayesian Confidence-Bound — BCB-MR)

This is the first place models drive a decision. Source: lib/ml/maturity.ts, called from lib/pipeline-runner.ts → applyMaturityRamp. The maturity ramp gates exposure for offers whose posterior is too wide to rank confidently; the detailed math is in /ai-ml/maturity-ramp. The short version: for each candidate offer, the engine computes the Wilson 95% credible-interval width for the offer-scope Bernoulli posterior (its positives + negatives aggregated across every algorithm model; the ramp reads only the offer scope, not a fallback hierarchy). If the width is ≤ tenant.settings.maturityWidthThreshold (default 0.20), the offer is mature and gets full exposure. Otherwise:
decayingFloor(n) = baseFloor / √(1 + n / decayHalfLife)
exposureProbability = max(decayingFloor(n), wilsonLower)
A deterministic hash (customerId, offerId, today) rolls against the exposure probability — if the roll exceeds it, the candidate is excluded from this customer’s decision today. Why this matters: cold-start offers get controlled exploration; mature offers run at full confidence; offers with strong early evidence aren’t punished by the floor decay. The posterior-width gate (vs. a fixed evidence-count threshold) lets low-volume offers mature once their CI is tight enough and keeps high-volume volatile offers in exploration when their CI stays wide.
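The stage 5 arithmetic can be sketched as follows, assuming n is the offer-scope evidence count (positives + negatives), z ≈ 1.96 for the 95% interval computed as a Wilson score interval, and FNV-1a as a stand-in for the platform's deterministic hash. RampSettings, exposureProbability, and isExposedToday are hypothetical names; this is not the code in lib/ml/maturity.ts.

```typescript
interface OfferEvidence {
  positives: number;
  negatives: number;
}

interface RampSettings {
  maturityWidthThreshold: number; // tenant default 0.20
  baseFloor: number;              // maturityRampColdStartFloor, default 0.50
  decayHalfLife: number;          // maturityFloorDecayHalfLife, default 10
}

const Z95 = 1.959964; // two-sided 95% normal quantile

// Wilson score interval for a Bernoulli rate; maximally wide with no evidence.
function wilsonInterval(pos: number, neg: number): { lower: number; upper: number } {
  const n = pos + neg;
  if (n === 0) return { lower: 0, upper: 1 };
  const p = pos / n;
  const z2 = Z95 * Z95;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (Z95 / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return { lower: Math.max(0, center - half), upper: Math.min(1, center + half) };
}

function exposureProbability(ev: OfferEvidence, s: RampSettings): number {
  const { lower, upper } = wilsonInterval(ev.positives, ev.negatives);
  if (upper - lower <= s.maturityWidthThreshold) return 1; // mature: full exposure
  const n = ev.positives + ev.negatives;
  const decayingFloor = s.baseFloor / Math.sqrt(1 + n / s.decayHalfLife);
  return Math.max(decayingFloor, lower);
}

// FNV-1a (32-bit) mapped to [0, 1): a stand-in for the platform's deterministic hash.
function hashToUnit(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h / 4294967296;
}

function isExposedToday(
  customerId: string,
  offerId: string,
  day: string,
  ev: OfferEvidence,
  s: RampSettings,
): boolean {
  const roll = hashToUnit(`${customerId}|${offerId}|${day}`);
  return roll <= exposureProbability(ev, s); // a roll above the probability excludes the candidate today
}
```

With the documented defaults, an offer with 10 positives and 30 negatives has a Wilson width of about 0.26, so it is not mature. Its floor has decayed to 0.5 / √5 ≈ 0.224, which beats its Wilson lower bound of about 0.14, so roughly 22% of customers would see it on a given day.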

6. Scoring — the model-heavy stage

Source: lib/pipeline-runner.ts (PRIE-U branch around line 2140) + lib/scoring/*.ts. Each candidate gets scored. The scoring method is configured per decision flow as one of priority_weighted, propensity, or formula (PRIE / PRIE-U). The decision flow can also reference a specific algorithm model via the Score node or rely on a default scorer.

6a. Picking which algorithm scores this candidate

The platform supports 9 configurable model types plus ONNX import — 10 scoreable types in total. Each of the 9 configurable types has its own page under /ai-ml/algorithms/*; imported ONNX models (onnx_imported) are covered separately at /ai-ml/onnx-byo:
Type | When it fits
--- | ---
scorecard | Rule-based weights, no training needed. Best for transparency.
bayesian | Naive Bayes with online updates: an industry-standard Bayesian classifier with per-feature posteriors.
logistic_regression | Calibrated probability output. Good baseline.
gradient_boosted | High-fidelity tabular model. Best raw AUC. Requires retraining.
thompson_bandit | Exploration-exploitation per offer; converges to the best arm.
epsilon_greedy | Simpler bandit, ε% exploration.
online_learner | Streaming SGD logistic regression.
neural_cf | Collaborative filtering, customer × offer embeddings.
external_endpoint | Delegate scoring to a third-party HTTP scorer.
onnx_imported | Bring-your-own ONNX model.
registryStatus controls which models are used live. Only the champion is the default scorer for its registry family; a shadow model scores silently for offline evaluation; a challenger participates in experiments. Detailed lifecycle: /ai-ml/model-lifecycle.

6b. The hierarchical propensity read

When the scoring method is propensity, the engine reads ModelAdaptation rows in this priority order:
offer  →  channel  →  category  →  direction  →  global  →  0.5 fallback
Each tier has its own evidence threshold before it’s trusted (offer ≥ 50, channel ≥ 15, category ≥ 20, direction ≥ 10, global ≥ 10). The first tier at or above its threshold wins; if offer-level evidence is sparse but present (1–49), the score is blended with the strongest available fallback (channel → direction → category → global) via Bayesian shrinkage. The formula (PRIE) method resolves its P component through a narrower chain (offer → category → global → model score → 0.5) because its channel and direction adaptations are consumed by the uplift τ term (stage 7) rather than the propensity factor. The propensitySource field on every scored candidate records WHICH tier fired (offer, offer+blend, channel, direction, category, global, fallback). It is persisted into decision_traces.scoringResults[i].propensitySource so operators can answer “why did this offer rank where it did?”. See /ai-ml/model-lifecycle#scope-hierarchy for full thresholds.
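A sketch of the tier walk and the sparse-offer blend follows. The shrinkage form (offer positives + k × fallback rate) / (offer evidence + k), with k = propensitySmoothingWeight, is an assumed reading of the Bayesian shrinkage described above, as is the 0.5 prior used when no fallback tier is trusted. Tier names, thresholds, and source labels come from this page; readPropensity and the row shape are hypothetical.

```typescript
type Tier = "offer" | "channel" | "category" | "direction" | "global";
type PropensitySource = Tier | "offer+blend" | "fallback";

interface Adaptation {
  positives: number;
  negatives: number;
}

// Evidence thresholds per tier, as documented above.
const MIN_EVIDENCE: Record<Tier, number> = { offer: 50, channel: 15, category: 20, direction: 10, global: 10 };
const PRIORITY: Tier[] = ["channel", "category", "direction", "global"];
const BLEND_ORDER: Tier[] = ["channel", "direction", "category", "global"];

const evidence = (a?: Adaptation): number => (a ? a.positives + a.negatives : 0);
const rate = (a: Adaptation): number => a.positives / (a.positives + a.negatives);

function readPropensity(
  rows: Partial<Record<Tier, Adaptation>>,
  smoothingWeight = 10, // propensitySmoothingWeight
): { score: number; source: PropensitySource } {
  const offer = rows.offer;
  const nOffer = evidence(offer);
  if (offer && nOffer >= MIN_EVIDENCE.offer) return { score: rate(offer), source: "offer" };

  const trusted = (t: Tier): boolean => evidence(rows[t]) >= MIN_EVIDENCE[t];
  if (offer && nOffer > 0) {
    // Sparse offer evidence: shrink toward the strongest trusted fallback (assumed 0.5 if none).
    const fb = BLEND_ORDER.find(trusted);
    const prior = fb ? rate(rows[fb]!) : 0.5;
    const score = (offer.positives + smoothingWeight * prior) / (nOffer + smoothingWeight);
    return { score, source: "offer+blend" };
  }
  for (const t of PRIORITY) {
    if (trusted(t)) return { score: rate(rows[t]!), source: t };
  }
  return { score: 0.5, source: "fallback" };
}
```

With 10 offer-level responses and k = 10, the blended score sits exactly halfway between the raw offer rate and the fallback rate; as offer evidence grows, the offer's own rate dominates.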
Cold-start ranking honors Offer.priority. When there is no adaptation data at any tier, every candidate lands on the 0.5 fallback. The rank node breaks these ties by descending Offer.priority, so a brand-new tenant with no interaction history still returns offers in a meaningful order (highest business priority first) rather than arbitrary insertion order. As interaction data accumulates, propensity scores separate and take over from the priority tie-break. The response also carries degradedScoring: true whenever candidates were scored on the flat fallback (or a formula method had no formula configured), so you can tell “flat-scored cold-start” from “real model separation.”
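The cold-start tie-break amounts to a two-key sort. In this sketch, rankCandidates and ScoredCandidate are hypothetical, and setting degradedScoring when any candidate came from the flat fallback (rather than all of them) is an assumption.

```typescript
interface ScoredCandidate {
  offerId: string;
  score: number;
  priority: number;         // Offer.priority
  propensitySource: string; // e.g. "offer", "channel", ..., "fallback"
}

function rankCandidates(candidates: ScoredCandidate[]): { ranked: ScoredCandidate[]; degradedScoring: boolean } {
  // Score descending; equal scores (e.g. everyone on the 0.5 fallback) fall back to Offer.priority.
  const ranked = candidates.slice().sort((a, b) => b.score - a.score || b.priority - a.priority);
  // Assumption: flag the response if any candidate was scored on the flat fallback.
  const degradedScoring = candidates.some((c) => c.propensitySource === "fallback");
  return { ranked, degradedScoring };
}
```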

6c. Bandits write per-offer state

For thompson_bandit and epsilon_greedy, every /respond updates the bandit posterior at the chosen scope:
  • Thompson stores Beta(α, β). Convert → α += 1; dismiss → β += 1.
  • ε-greedy stores (pulls, totalReward). Every respond increments pulls; positive outcomes also increment totalReward.
Both update incrementally per-respond — no batch retrain needed.
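The two incremental updates can be sketched like this. The Beta(1, 1) starting prior used in the check, the reward of 1 per conversion, and the function names are assumptions; a live Thompson scorer would sample from Beta(α, β) rather than read its mean.

```typescript
type Outcome = "convert" | "dismiss";

// Thompson sampling posterior over the conversion rate: Beta(alpha, beta).
interface ThompsonState {
  alpha: number;
  beta: number;
}

// ε-greedy running totals for one arm.
interface EpsilonState {
  pulls: number;
  totalReward: number;
}

function updateThompson(s: ThompsonState, outcome: Outcome): ThompsonState {
  // Convert adds a success (alpha); dismiss adds a failure (beta).
  return outcome === "convert" ? { alpha: s.alpha + 1, beta: s.beta } : { alpha: s.alpha, beta: s.beta + 1 };
}

function updateEpsilonGreedy(s: EpsilonState, outcome: Outcome): EpsilonState {
  // Every response counts as a pull; only positive outcomes add reward (assumed reward = 1).
  return { pulls: s.pulls + 1, totalReward: s.totalReward + (outcome === "convert" ? 1 : 0) };
}

// Point estimates derived from each state (posterior mean / empirical mean).
const thompsonMean = (s: ThompsonState): number => s.alpha / (s.alpha + s.beta);
const epsilonMean = (s: EpsilonState): number => (s.pulls === 0 ? 0 : s.totalReward / s.pulls);
```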

7. PRIE-U arbitration — the final ranking score

When the flow’s scoring method is formula, the final per-candidate score is a weighted geometric mean across five dimensions, with an optional CLV term folded into the impact exponent:
score = P^Wp × R^Wr × I^Wi × E^We × max(0.01, 0.5 + τ/2)^Wu
where each dimension comes from a different source:
Letter | Dimension | Source
--- | --- | ---
P | Propensity | Stage 6b’s hierarchical adaptation read
R | Relevance | computeRelevance(candidate, context): channel match, recency, segment fit
I | Impact | Composite of Offer.businessValue / margin / revenueValue
E | Emphasis | Offer.priority / 100 (manual business priority)
U | Uplift | CATE estimate τ = μ_T − μ_C, mapped to max(0.01, 0.5 + τ/2)
C | CLV | clvNorm = clvScore / 100 from the customer’s CLV row, applied as extra impact emphasis
Wp, Wr, Wi, We (which sum to 1) plus the two optional exponent terms Wu (uplift) and Wclv (CLV) come from the active RankingProfile (tenant.settings.defaultRankingProfileId, or specified on the flow) or the inline Score-node formula. The profile weight keys uplift and clv now map straight into upliftWeight / clvWeight (previously upliftWeight was documented but stripped by validation — it is now reachable). Default Wu = 0 and Wclv = 0 keep the legacy 4-factor PRIE bit-identical (back-compat). When Wu > 0, persuadable offers (τ positive) get a multiplicative boost and sleeping-dog offers (τ negative) get suppressed. When Wclv > 0, the per-offer impact factor gets an extra exponent of Wclv × clvNorm, so high-CLV customers get up to Wclv extra impact emphasis; a customer with no CLV row is left untouched. For the detailed CATE math, T-learner / X-learner derivations, and the four uplift segments (persuadable / sure_thing / lost_cause / sleeping_dog), see /ai-ml/uplift-modeling. The PRIE composition draws on the recommender-systems literature where propensity (likelihood of conversion), relevance (channel/context match), business impact, and editorial emphasis are four axes that any multi-objective ranker must combine. The uplift dimension U is what differentiates a causal ranker from a predictive one — see /ai-ml/uplift-modeling for the references.
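A direct transcription of the PRIE-U formula, including the CLV exponent on the impact factor, is sketched below. PrieFactors, PrieWeights, and prieUScore are hypothetical names, and the factors are assumed to be already normalised into (0, 1].

```typescript
interface PrieFactors {
  P: number; // propensity
  R: number; // relevance
  I: number; // impact
  E: number; // emphasis
  tau: number;      // CATE estimate μ_T − μ_C
  clvNorm?: number; // clvScore / 100; undefined when the customer has no CLV row
}

interface PrieWeights {
  Wp: number;
  Wr: number;
  Wi: number;
  We: number;   // Wp + Wr + Wi + We = 1
  Wu: number;   // default 0
  Wclv: number; // default 0
}

function prieUScore(f: PrieFactors, w: PrieWeights): number {
  const upliftFactor = Math.max(0.01, 0.5 + f.tau / 2);
  // CLV adds an extra exponent Wclv × clvNorm on impact; a missing CLV row leaves it untouched.
  const impactExponent = w.Wi + (f.clvNorm === undefined ? 0 : w.Wclv * f.clvNorm);
  return (
    Math.pow(f.P, w.Wp) *
    Math.pow(f.R, w.Wr) *
    Math.pow(f.I, impactExponent) *
    Math.pow(f.E, w.We) *
    Math.pow(upliftFactor, w.Wu)
  );
}
```

Setting Wu = 0 and Wclv = 0 makes the uplift term exactly 1 and the impact exponent plain Wi, which is why the legacy 4-factor score is reproduced unchanged.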

8. Allocation — Hungarian or greedy

For multi-placement decisions (the group node in a decision flow), the engine has to assign offers to placement slots. Two strategies:
  • Hungarian: globally optimal assignment that maximizes the total score across all (offer, placement) pairs subject to constraints (one offer per slot, no offer repeated, channel coupling). O(n³). Default for premium accounts.
  • Greedy: the highest-scoring available offer takes the first placement; subsequent placements get the next-best remaining offer. O(n log n). Used when the latency budget is tight.
After allocation, the channel atomic coupling pass applies: if any placement on a channel with couplingMode: "atomic" is empty (couldn’t find a viable offer), the engine empties the ENTIRE channel — so a half-rendered email never goes out. The flow’s couplingOverride lets you toggle this per-flow.
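The greedy strategy and the atomic coupling pass can be sketched together. Hungarian assignment is omitted for length; Placement, PairScore, and greedyAllocate are hypothetical, and this variant fills placements in the order given.

```typescript
interface Placement {
  id: string;
  channelId: string;
}

interface PairScore {
  offerId: string;
  placementId: string;
  score: number;
}

// Returns placementId -> offerId, or null when the slot stays empty.
function greedyAllocate(
  placements: Placement[],
  pairs: PairScore[],
  atomicChannelIds: Set<string>,
): Map<string, string | null> {
  const assignment = new Map<string, string | null>();
  const usedOffers = new Set<string>();
  // Sort pairs once by score; each placement, in order, takes its best offer not already used.
  const sorted = pairs.slice().sort((a, b) => b.score - a.score);
  for (const placement of placements) {
    const pick = sorted.find((p) => p.placementId === placement.id && !usedOffers.has(p.offerId));
    assignment.set(placement.id, pick ? pick.offerId : null);
    if (pick) usedOffers.add(pick.offerId);
  }
  // Atomic coupling: one empty slot on an atomic channel empties every slot on that channel.
  atomicChannelIds.forEach((channelId) => {
    const slots = placements.filter((p) => p.channelId === channelId);
    if (slots.some((p) => assignment.get(p.id) === null)) {
      slots.forEach((p) => assignment.set(p.id, null));
    }
  });
  return assignment;
}
```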

Bookkeeping — what gets persisted

Before the response returns, the engine writes:
  1. One recommendation-type interaction_history row per returned decision — via the new persistDecisionInteractions helper (Bug #248 fix). This is the audit join key the /respond route uses to bind a {customerId, rank} pair back to the (offerId, creativeId, channelId) that was actually shown.
  2. One impression-type interaction_history row per decision delivered on a channel where impressionMode != "explicit" — for channels we send (email, batch), the impression is auto-recorded. For client-rendered channels (web, mobile push), the impression isn’t recorded until the client calls /api/v1/impressions.
  3. One decision_trace row with the full forensic chain: qualification results, contact policy decisions, scoring results (with propensitySource, upliftTau, upliftMultiplier, and — when the CLV term is active — clvNorm / clvImpactExponent per candidate), selected offers, ranking weights used, experiment assignment if any, inputsHash, totalLatencyMs. Sampled per tenant.settings.decisionTraceSampleRate.

The learning loop — what /respond does to the next decision

The ModelAdaptation row updated at each scope is what the next /recommend reads in stage 6b. Because adaptations are tiered, a single respond improves scoring at every level the offer participates in:
  • (scope: "offer", scopeId: <offerId>) — directly improves this offer’s per-decision score
  • (scope: "category", scopeId: <categoryId>) — improves baseline for every offer in this category
  • (scope: "channel", scopeId: <channelId>) — improves baseline for every offer on this channel
  • (scope: "direction", scopeId: "inbound" | "outbound") — improves baseline for traffic with this intent
  • (scope: "global", scopeId: "") — improves baseline for everything
The Bug #248 attribution precondition guard prevents inflated learning: positive outcomes credited against an offer the customer was never actually shown (e.g. external attribution noise) get blocked from the adaptation upsert with status: "recorded_without_adaptation" plus an attribution_precondition_failed audit row. Model state stays protected. The model_matured telemetry event fires when an offer’s Wilson CI width crosses the maturity threshold downward for the first time — one event per (model × scope × scopeId) transition. See /ai-ml/maturity-ramp for how this gates exposure on subsequent /recommend calls.
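What one /respond does to the tiered rows, including the attribution guard and the first-crossing maturity event, can be sketched as follows. The in-memory Map keyed by scope:scopeId, the wasShown flag (standing in for the interaction_history lookup), the "adapted" status name, and applyRespond itself are hypothetical; the sketch also drops the model dimension of the (model × scope × scopeId) key.

```typescript
type Direction = "inbound" | "outbound";

interface Adaptation {
  positives: number;
  negatives: number;
  matured: boolean;
}

interface RespondEvent {
  offerId: string;
  categoryId: string;
  channelId: string;
  direction: Direction;
  positive: boolean;
}

// Hypothetical store keyed by `${scope}:${scopeId}`; the real rows live in ModelAdaptation.
type AdaptationStore = Map<string, Adaptation>;
type AuditFn = (event: string, detail: string) => void;

// Two-sided 95% Wilson interval width (1 when there is no evidence).
function wilsonWidth(pos: number, neg: number, z = 1.959964): number {
  const n = pos + neg;
  if (n === 0) return 1;
  const p = pos / n;
  const denom = 1 + (z * z) / n;
  return ((2 * z) / denom) * Math.sqrt((p * (1 - p)) / n + (z * z) / (4 * n * n));
}

function applyRespond(
  store: AdaptationStore,
  e: RespondEvent,
  wasShown: boolean, // e.g. a recommendation row exists for this customer and offer
  widthThreshold: number,
  audit: AuditFn,
): "adapted" | "recorded_without_adaptation" {
  // Attribution precondition: never credit a positive outcome to an offer that was not shown.
  if (e.positive && !wasShown) {
    audit("attribution_precondition_failed", e.offerId);
    return "recorded_without_adaptation";
  }
  const keys = [`offer:${e.offerId}`, `category:${e.categoryId}`, `channel:${e.channelId}`, `direction:${e.direction}`, "global:"];
  for (const key of keys) {
    const prev = store.get(key) ?? { positives: 0, negatives: 0, matured: false };
    const next: Adaptation = {
      positives: prev.positives + (e.positive ? 1 : 0),
      negatives: prev.negatives + (e.positive ? 0 : 1),
      matured: prev.matured,
    };
    // Emit model_matured only on the first downward crossing of the width threshold.
    if (!prev.matured && wilsonWidth(next.positives, next.negatives) <= widthThreshold) {
      next.matured = true;
      audit("model_matured", key);
    }
    store.set(key, next);
  }
  return "adapted";
}
```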

Worked example — the headline finding from the model-architecture round

A real live-test result from /api/v1/algorithm-models/.../uplift?method=t_learner&mode=fitted against the e2e tenant. The same offer (Auto Loan Refi) produces a different CATE in two different score-time contexts:
Customer context (channel × direction) | τ (CATE) | Segment | What the engine concludes
--- | --- | --- | ---
direct_mail × inbound | −0.073 | sleeping_dog | Showing this offer suppresses conversion. Hide it.
direct_mail × outbound | +0.0036 | uncertain | Neutral effect. Default ranking applies.
The marginal mode (which used pre-aggregated ModelAdaptation rates) collapsed both to τ = 0 — couldn’t distinguish them. The fitted mode (two separate logistic regressions on treated vs. control subsets of interaction_history, with per-row features for channel one-hot, direction one-hot, time-of-day sin/cos, day-of-week sin/cos) produces context-varying τ — which is exactly what the per-customer CATE literature (Künzel et al. PNAS 2019) calls the heterogeneous treatment effect. Plug Wu > 0 into the flow’s RankingProfile and the ranking now actively pushes the sleeping_dog DOWN on inbound while leaving it neutral on outbound. The same model, the same offer, two different decisions per channel-direction context. This is what makes Kaireon’s decisioning behave differently from a propensity-only system.
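The arithmetic behind the table is short enough to check by hand. Assuming Wu = 1 purely for illustration (the weight used in the live test is not stated here), the inbound uplift factor is 0.5 + (−0.073)/2 = 0.4635, about 7.3% below the neutral 0.5, while the outbound factor is 0.5018. The helper names are hypothetical.

```typescript
// Uplift multiplier from the PRIE-U formula: max(0.01, 0.5 + τ/2).
const upliftFactor = (tau: number): number => Math.max(0.01, 0.5 + tau / 2);

// Score ratio versus an otherwise identical offer with τ = 0, for a given Wu.
function relativeToNeutral(tau: number, Wu: number): number {
  return Math.pow(upliftFactor(tau) / upliftFactor(0), Wu);
}

const contexts = [
  { context: "direct_mail × inbound", tau: -0.073 },
  { context: "direct_mail × outbound", tau: 0.0036 },
];
for (const { context, tau } of contexts) {
  // Wu = 1 is an illustrative choice, not the live-test setting.
  console.log(context, upliftFactor(tau).toFixed(4), relativeToNeutral(tau, 1).toFixed(4));
}
```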

Configuration knobs — quick reference

Every knob the operator can turn that affects the scoring path:

Per-tenant (tenant.settings)

Setting | Default | Where it lands
--- | --- | ---
maturityRampMode | "bayesian_ci" | Stage 5: BCB-MR vs. legacy_count
maturityWidthThreshold | 0.20 | Stage 5 + telemetry D threshold
maturityRampColdStartFloor | 0.50 | Stage 5: baseFloor
maturityFloorDecayHalfLife | 10 | Stage 5: decayHalfLife in decayingFloor(n) = baseFloor / √(1 + n / decayHalfLife). Honored by the ramp runtime (applyMaturityRamp); higher = slower floor decay (offers stay in cold-start exposure longer). Bounded 1–1000.
modelMaturityThreshold | 100 | Stage 5: legacy_count mode only
upliftMethodDefault | "t_learner" | Stage 7: default for the /uplift endpoint
propensityScoreFloor | 0.05 | Stage 6: minimum propensity component
propensitySmoothingWeight | 10 | Stage 6b: Bayesian shrinkage strength
defaultRankingProfileId |  | Stage 7: Wp/Wr/Wi/We/Wu source
UI: /settings/models (new this round) exposes the maturity + uplift knobs; the rest live under /settings.

Per-RankingProfile (weights JSONB)

The profile’s JSONB keys are conversion (Wp, default 0.4), recency (Wr, default 0.2), margin (Wi, default 0.3), fairness (We, default 0.1), uplift (Wu, default 0), and clv (Wclv, default 0). The runtime maps them onto the PRIE factors (conversion → P, recency → R, margin → I, fairness → E) — see Scoring strategies. The inline Score-node formula uses the parallel field names propensityWeight / relevanceWeight / impactWeight / emphasisWeight / upliftWeight / clvWeight (P+R+I+E must sum to 1).
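A sketch of the key mapping follows, with the documented defaults filled in for missing keys and the sum-to-1 check applied to the four PRIE weights. Falling back to defaults per missing key and the 1e-9 tolerance are assumptions; toPrieWeights is a hypothetical name.

```typescript
// Keys stored in a RankingProfile's weights JSONB.
interface RankingProfileWeights {
  conversion?: number; // Wp
  recency?: number;    // Wr
  margin?: number;     // Wi
  fairness?: number;   // We
  uplift?: number;     // Wu
  clv?: number;        // Wclv
}

// Field names used by the inline Score-node formula.
interface ScoreFormulaWeights {
  propensityWeight: number;
  relevanceWeight: number;
  impactWeight: number;
  emphasisWeight: number;
  upliftWeight: number;
  clvWeight: number;
}

function toPrieWeights(w: RankingProfileWeights): ScoreFormulaWeights {
  const out: ScoreFormulaWeights = {
    propensityWeight: w.conversion ?? 0.4,
    relevanceWeight: w.recency ?? 0.2,
    impactWeight: w.margin ?? 0.3,
    emphasisWeight: w.fairness ?? 0.1,
    upliftWeight: w.uplift ?? 0,
    clvWeight: w.clv ?? 0,
  };
  // Only P+R+I+E must sum to 1; uplift and CLV are extra exponents outside that budget.
  const sum = out.propensityWeight + out.relevanceWeight + out.impactWeight + out.emphasisWeight;
  if (Math.abs(sum - 1) > 1e-9) {
    throw new Error(`P+R+I+E weights must sum to 1 (got ${sum})`);
  }
  return out;
}
```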

Per-DecisionFlow

rankingProfileId (which weights to use), scoringMethod (priority_weighted / propensity / formula), couplingOverride (channel atomic coupling), skipContactPolicy (rare — for synthetic flows that shouldn’t suppress).

Per-AlgorithmModel

status (operational), registryStatus (lifecycle), autoLearn (whether /respond updates state), learnMode, learnSchedule, outcomeWeights.


Kaireon’s decisioning capabilities at a glance

Capability | Implementation
--- | ---
Per-(channel × direction) posterior | Per-scope row with (scope, scopeId) as the unit of adaptation
Maturity gate | Wilson credible-interval width (principled: width ≤ 0.20 at 95% CI means mature)
Per-customer CATE | T-learner and X-learner, per-context uplift estimation
τ in ranking | PRIE-U composite: P^Wp × R^Wr × I^Wi × E^We × max(0.01, 0.5 + τ/2)^Wu
Historical backfill | Idempotent cron, replayable from decision_traces
Maturity telemetry | model_matured audit event with old/new state
Per-scope adaptation UI | Adaptations panel with Wilson CI bars per (scope, scopeId)

See /about for the platform overview, or jump back to Core concepts for the building-block view.