How scoring works — end-to-end

This page is the synthesis of every model component in the platform. The pieces are documented individually (algorithms, maturity ramp, uplift, model lifecycle), but the question operators and evaluators ask most often is: “when a customer event lands, what actually happens between the request and the response?” That’s what’s covered here, in order, with every step linked to its deep-dive.

The 30-second mental model

A POST /api/v1/recommend runs the customer through eight stages. Each stage either narrows the candidate set, scores it, or persists state. The scoring stages are where the models actually drive the decision — everything else is filtering or bookkeeping. The learning loop is the dashed arrow at the bottom: every /respond updates ModelAdaptation posteriors that the next /recommend reads at stage 6.

Stage-by-stage walkthrough

1. Load + enrich the customer

Source: platform/src/app/api/v1/recommend/route.ts → enrichCustomer

The route resolves the customer from customerId and runs the configured enrichment node of the decision flow. Enrichment pulls customer attributes from the Data module’s schema tables (declared via Schemas and loaded via Pipelines), plus any behavioral metrics that have rolling windows (30-day impression count, 90-day complaint rate, etc.).

Output: a customer payload with both static attributes and rolling metrics that downstream stages can reference.

Optional flow knob: EnrichNodeConfig.excludeJoinIds[] (added in WS1) lets the flow author exclude expensive joins per decision flow when the data isn’t needed.

2. Build the candidate offer set

For each request the engine starts with every active offer in the tenant (status = "active", not soft-deleted), then narrows. Two optional filters narrow the set before scoring, and one exemption applies to mandatory offers:

channelId query param → only offers with creatives on the requested channel
placementId query param → only offers with creatives at the requested placement
mandatory offers (per business hierarchy) bypass downstream filtering and always rank

This is the input set that flows into qualification.
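The narrowing above can be sketched as a chain of filters. This is a minimal illustration rather than the engine's code: the Offer and Creative shapes, the deletedAt soft-delete marker, and the function name buildCandidateSet are all assumptions, and the mandatory-offer exemption is not modeled.

```typescript
// Hypothetical shapes for illustration; the platform's real models may differ.
interface Creative { channelId: string; placementId: string }
interface Offer {
  id: string;
  status: "active" | "draft" | "archived"; // only "active" is named in the text
  deletedAt: string | null;                 // assumed soft-delete marker
  creatives: Creative[];
}
interface CandidateQuery { channelId?: string; placementId?: string }

function buildCandidateSet(offers: Offer[], query: CandidateQuery): Offer[] {
  return offers
    // Start from every active, non-deleted offer in the tenant.
    .filter((o) => o.status === "active" && o.deletedAt === null)
    // Each filter is a no-op when its query param is absent.
    .filter((o) => query.channelId === undefined ||
      o.creatives.some((c) => c.channelId === query.channelId))
    .filter((o) => query.placementId === undefined ||
      o.creatives.some((c) => c.placementId === query.placementId));
}
```

Calling buildCandidateSet(offers, {}) returns every active offer, which matches the case where neither query param is supplied.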

3. Qualification rules

Source: lib/qualification-engine.ts

Each qualification rule is evaluated per (customer, offer) pair. There are 6 wired ruleType values (segment_required, attribute_condition, offer_attribute, propensity_threshold, recency_check, metric_condition), and rules can be scoped global, category, sub-category, channel, offer, creative, or placement. A candidate that fails any qualification rule is dropped.

The qualification result is persisted to decision_traces.qualificationResults[] with {offerId, passed, reason, ruleId} so the Decision Provenance UI can answer “why did Customer X NOT get Offer Y” without leaving the row.
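A minimal sketch of the "fail any rule, drop the candidate" behavior follows. Only the {offerId, passed, reason, ruleId} record shape comes from the text; the QualificationRule interface, the qualify function, and the choice to record one result per offer are assumptions made for illustration.

```typescript
// Hypothetical rule shape: each rule carries a predicate over (customer, offer).
interface QualificationRule<C, O> {
  ruleId: string;
  reason: string; // human-readable failure reason
  test: (customer: C, offer: O) => boolean;
}
interface QualificationResult {
  offerId: string;
  passed: boolean;
  reason: string | null;
  ruleId: string | null;
}

function qualify<C, O extends { id: string }>(
  customer: C,
  offers: O[],
  rules: QualificationRule<C, O>[],
): { survivors: O[]; results: QualificationResult[] } {
  const survivors: O[] = [];
  const results: QualificationResult[] = [];
  for (const offer of offers) {
    // A candidate is dropped as soon as any rule fails.
    const failed = rules.find((r) => !r.test(customer, offer));
    if (failed) {
      results.push({ offerId: offer.id, passed: false, reason: failed.reason, ruleId: failed.ruleId });
    } else {
      survivors.push(offer);
      results.push({ offerId: offer.id, passed: true, reason: null, ruleId: null });
    }
  }
  return { survivors, results };
}
```

Keeping the failing ruleId on the result is what lets a provenance view explain a rejection without re-running the rules.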

4. Contact policy (suppression)

Source: lib/contact-policy-engine.ts

Contact policies are the always-on layer (made implicit in WS T21, with an optional skipContactPolicy opt-out per flow). 14 wired ruleType values cover frequency caps, cooldowns, category-suppression windows, outcome-based suppression, do_not_contact (DNC, the only mechanism that suppresses across channels), and metric_condition rules. Policies are evaluated per candidate; the first blocking match suppresses it. Unknown ruleTypes fail closed (block + log) as a safety guarantee.
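The first-match and fail-closed behavior can be sketched as below. This is an illustration under assumptions: the ContactPolicy and CandidateContext shapes, the handler registry, and the frequency_cap ruleType name are hypothetical; only do_not_contact and the block-plus-log rule come from the text.

```typescript
// Hypothetical shapes; "frequency_cap" is an assumed ruleType name.
interface ContactPolicy { id: string; ruleType: string; limit?: number }
interface CandidateContext { contactsInWindow: number; doNotContact: boolean }
interface Verdict { blocked: boolean; policyId: string | null; reason: string | null }
type PolicyHandler = (policy: ContactPolicy, ctx: CandidateContext) => boolean; // true = block

const policyHandlers = new Map<string, PolicyHandler>([
  ["frequency_cap", (p, ctx) => ctx.contactsInWindow >= (p.limit ?? Infinity)],
  ["do_not_contact", (_p, ctx) => ctx.doNotContact],
]);

function evaluateContactPolicies(
  policies: ContactPolicy[],
  ctx: CandidateContext,
  log: string[],
): Verdict {
  for (const policy of policies) {
    const handler = policyHandlers.get(policy.ruleType);
    if (handler === undefined) {
      // Fail closed: an unrecognized ruleType blocks the candidate and is logged.
      log.push(`unknown ruleType "${policy.ruleType}" on policy ${policy.id}`);
      return { blocked: true, policyId: policy.id, reason: "unknown_rule_type" };
    }
    if (handler(policy, ctx)) {
      // First blocking match wins; later policies are not evaluated.
      return { blocked: true, policyId: policy.id, reason: policy.ruleType };
    }
  }
  return { blocked: false, policyId: null, reason: null };
}
```

Failing closed means a misconfigured or newly added rule type can only over-suppress, never over-contact.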

5. Maturity ramp (Bayesian Confidence-Bound — BCB-MR)

This is the first place models drive a decision.

Source: lib/ml/maturity.ts, called from lib/pipeline-runner.ts → applyMaturityRamp

The maturity ramp gates exposure for offers whose posterior is too wide to rank confidently. Detailed math is in /ai-ml/maturity-ramp. The short version: for each candidate offer, the engine computes the Wilson 95% credible-interval width for the offer-scope Bernoulli posterior (its positives + negatives aggregated across every algorithm model; the ramp reads only the offer scope, not a fallback hierarchy). If the width is ≤ tenant.settings.maturityWidthThreshold (default 0.20), the offer is mature → full exposure. Otherwise:

decayingFloor(n) = baseFloor / √(1 + n / decayHalfLife)
exposureProbability = max(decayingFloor(n), wilsonLower)

A deterministic hash of (customerId, offerId, today) rolls against the exposure probability. If the roll exceeds it, the candidate is excluded from this customer’s decision today.

Why this matters: cold-start offers get controlled exploration; mature offers run at full confidence; offers with strong early evidence aren’t punished by the floor decay. The posterior-width gate (vs. a fixed evidence-count threshold) lets low-volume offers mature once their CI is tight enough and keeps high-volume volatile offers in exploration when their CI stays wide.
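The gate can be sketched end to end as follows. This is a hedged illustration, not the contents of lib/ml/maturity.ts: it assumes n is positives + negatives, uses the standard Wilson score interval at z = 1.96, and substitutes an FNV-1a hash because the platform's actual hash function is not specified here. All function names are hypothetical.

```typescript
interface WilsonInterval { lower: number; upper: number; width: number }

// Wilson score interval for a Bernoulli rate at z = 1.96 (about 95%).
function wilsonInterval(positives: number, negatives: number, z = 1.96): WilsonInterval {
  const n = positives + negatives;
  if (n === 0) return { lower: 0, upper: 1, width: 1 }; // no evidence: widest possible
  const pHat = positives / n;
  const denom = 1 + (z * z) / n;
  const center = (pHat + (z * z) / (2 * n)) / denom;
  const half = (z * Math.sqrt((pHat * (1 - pHat)) / n + (z * z) / (4 * n * n))) / denom;
  const lower = Math.max(0, center - half);
  const upper = Math.min(1, center + half);
  return { lower, upper, width: upper - lower };
}

// decayingFloor(n) = baseFloor / sqrt(1 + n / decayHalfLife), defaults from tenant.settings.
function decayingFloor(n: number, baseFloor = 0.5, decayHalfLife = 10): number {
  return baseFloor / Math.sqrt(1 + n / decayHalfLife);
}

function exposureProbability(positives: number, negatives: number, widthThreshold = 0.2): number {
  const ci = wilsonInterval(positives, negatives);
  if (ci.width <= widthThreshold) return 1; // mature: full exposure
  return Math.max(decayingFloor(positives + negatives), ci.lower);
}

// FNV-1a is a stand-in; any stable hash mapped into [0, 1) plays the same role.
function hashToUnit(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 4294967296; // unsigned 32-bit value scaled into [0, 1)
}

function isExposed(
  customerId: string, offerId: string, day: string,
  positives: number, negatives: number,
): boolean {
  const roll = hashToUnit(`${customerId}|${offerId}|${day}`);
  // The candidate is excluded when the roll exceeds the exposure probability.
  return roll <= exposureProbability(positives, negatives);
}
```

Because the roll depends only on (customerId, offerId, day), repeated requests on the same day give the same exposure decision for a given customer and offer.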

6. Scoring — the model-heavy stage

Source: lib/pipeline-runner.ts (PRIE-U branch around line 2140) + lib/scoring/*.ts

Each candidate gets scored. The scoring method is configured per decision flow as one of priority_weighted, propensity, or formula (PRIE / PRIE-U). The decision flow can also reference a specific algorithm model via the Score Node or rely on a default scorer.

6a. Picking which algorithm scores this candidate

The platform supports 9 configurable model types plus ONNX import — 10 scoreable types in total. Each of the 9 configurable types has its own page under /ai-ml/algorithms/*; imported ONNX models (onnx_imported) are covered separately at /ai-ml/onnx-byo:

Type	When it fits
`scorecard`	Rule-based weights, no training needed. Best for transparency.
`bayesian`	Naive Bayes with online updates — industry-standard Bayesian classifier with per-feature posteriors.
`logistic_regression`	Calibrated probability output. Good baseline.
`gradient_boosted`	High-fidelity tabular. Best raw AUC. Requires retraining.
`thompson_bandit`	Exploration-exploitation per offer; converges to best arm.
`epsilon_greedy`	Simpler bandit, ε% exploration.
`online_learner`	Streaming SGD logistic regression.
`neural_cf`	Collaborative filtering, customer × offer embeddings.
`external_endpoint`	Delegate scoring to a 3rd-party HTTP scorer.
`onnx_imported`	Bring-your-own ONNX model.

registryStatus controls which models are used live. Only champion is the default scorer for its registry family; shadow scores silently for offline evaluation; challenger participates in experiments. Detailed lifecycle: /ai-ml/model-lifecycle.

6b. The hierarchical propensity read

When the scoring method is propensity, the engine reads ModelAdaptation rows in this priority order:

offer  →  channel  →  category  →  direction  →  global  →  0.5 fallback

Each tier has its own evidence threshold before it’s trusted (offer ≥ 50, channel ≥ 15, category ≥ 20, direction ≥ 10, global ≥ 10). The first tier at or above its threshold wins; if offer-level evidence is sparse but present (1–49), the score is blended with the strongest available fallback (channel → direction → category → global) via Bayesian shrinkage.

The formula (PRIE) method resolves its P component through a narrower chain (offer → category → global → model score → 0.5) because its channel and direction adaptations are consumed by the uplift τ term (stage 7) rather than the propensity factor.

The propensitySource field on every scored candidate records which tier fired (offer, offer+blend, channel, direction, category, global, fallback). It is persisted into decision_traces.scoringResults[i].propensitySource so operators can answer “why did this offer rank where it did?”. See /ai-ml/model-lifecycle#scope-hierarchy for full thresholds.
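The tiered read for the propensity method can be sketched as below. Several parts are assumptions rather than documented behavior: the shrinkage formula (positives + k × prior) / (n + k) with k taken from propensitySmoothingWeight, reading "strongest available fallback" as the first trusted scope in the blend order, and falling back to a 0.5 prior when no fallback is trusted. Type and function names are hypothetical.

```typescript
type Scope = "offer" | "channel" | "category" | "direction" | "global";
interface Evidence { positives: number; negatives: number }
type Adaptations = Partial<Record<Scope, Evidence>>;
interface PropensityRead { propensity: number; propensitySource: string }

const THRESHOLDS: Record<Scope, number> = { offer: 50, channel: 15, category: 20, direction: 10, global: 10 };
const FALLBACK_READ_ORDER: Scope[] = ["channel", "category", "direction", "global"];
const BLEND_ORDER: Scope[] = ["channel", "direction", "category", "global"];

const evidenceCount = (e: Evidence): number => e.positives + e.negatives;
const observedRate = (e: Evidence): number => e.positives / evidenceCount(e);

// First scope in `order` whose evidence meets its threshold, if any.
function firstTrusted(adapt: Adaptations, order: Scope[]): { scope: Scope; rate: number } | undefined {
  for (const scope of order) {
    const e = adapt[scope];
    if (e !== undefined && evidenceCount(e) >= THRESHOLDS[scope]) {
      return { scope, rate: observedRate(e) };
    }
  }
  return undefined;
}

function readPropensity(adapt: Adaptations, smoothingWeight = 10): PropensityRead {
  const offer = adapt.offer;
  const n = offer ? evidenceCount(offer) : 0;
  if (offer && n >= THRESHOLDS.offer) {
    return { propensity: observedRate(offer), propensitySource: "offer" };
  }
  if (offer && n >= 1) {
    // Sparse offer evidence: shrink toward a trusted fallback (0.5 if none; an assumption).
    const prior = firstTrusted(adapt, BLEND_ORDER)?.rate ?? 0.5;
    return {
      propensity: (offer.positives + smoothingWeight * prior) / (n + smoothingWeight),
      propensitySource: "offer+blend",
    };
  }
  const tier = firstTrusted(adapt, FALLBACK_READ_ORDER);
  return tier
    ? { propensity: tier.rate, propensitySource: tier.scope }
    : { propensity: 0.5, propensitySource: "fallback" };
}
```

With 10 offer-level events (2 positive) and a trusted channel rate of 0.5, this sketch yields (2 + 10 × 0.5) / (10 + 10) = 0.35, pulled from the raw 0.2 toward the channel baseline.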

Cold-start ranking honors Offer.priority. When there is no adaptation data at any tier, every candidate lands on the 0.5 fallback. The rank node breaks these ties by descending Offer.priority, so a brand-new tenant with no interaction history still returns offers in a meaningful order (highest business priority first) rather than arbitrary insertion order. As interaction data accumulates, propensity scores separate and take over from the priority tie-break. The response also carries degradedScoring: true whenever candidates were scored on the flat fallback (or a formula method had no formula configured), so you can tell “flat-scored cold-start” from “real model separation.”
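The tie-break can be sketched as a two-key sort. The RankedCandidate shape and rankCandidates name are hypothetical, and the degradedScoring rule here covers only the flat-fallback case, not the missing-formula case.

```typescript
interface RankedCandidate {
  offerId: string;
  score: number;
  priority: number;        // Offer.priority
  propensitySource: string; // e.g. "offer", "channel", "fallback"
}

function rankCandidates(
  candidates: RankedCandidate[],
): { ranked: RankedCandidate[]; degradedScoring: boolean } {
  // Sort by score first; equal scores fall back to descending Offer.priority.
  const ranked = [...candidates].sort(
    (a, b) => b.score - a.score || b.priority - a.priority,
  );
  // Assumption: flag the response when any candidate was scored on the flat fallback.
  const degradedScoring = candidates.some((c) => c.propensitySource === "fallback");
  return { ranked, degradedScoring };
}
```

When every candidate sits at 0.5, the score comparison is always zero and the order is decided entirely by priority.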

6c. Bandits write per-offer state

For thompson_bandit and epsilon_greedy, every /respond updates the bandit posterior at the chosen scope:

Thompson stores Beta(α, β). Convert → α += 1; dismiss → β += 1.
ε-greedy stores (pulls, totalReward). Every respond increments pulls; positive outcomes also increment totalReward.

Both update incrementally per-respond — no batch retrain needed.
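The two update rules can be sketched as pure functions over per-scope state. The type names and the two-valued Outcome are assumptions for illustration, as is treating a positive outcome as a reward of exactly 1 (the outcomeWeights setting may change that in practice).

```typescript
interface ThompsonState { alpha: number; beta: number }
interface EpsilonGreedyState { pulls: number; totalReward: number }
type Outcome = "convert" | "dismiss";

// Thompson: convert increments alpha, dismiss increments beta.
function updateThompson(s: ThompsonState, outcome: Outcome): ThompsonState {
  return outcome === "convert"
    ? { alpha: s.alpha + 1, beta: s.beta }
    : { alpha: s.alpha, beta: s.beta + 1 };
}

// Epsilon-greedy: every respond counts as a pull; positives also add reward.
function updateEpsilonGreedy(s: EpsilonGreedyState, outcome: Outcome): EpsilonGreedyState {
  return {
    pulls: s.pulls + 1,
    totalReward: s.totalReward + (outcome === "convert" ? 1 : 0),
  };
}

// Point estimates each state implies for the arm's conversion rate.
const thompsonMean = (s: ThompsonState): number => s.alpha / (s.alpha + s.beta);
const greedyMean = (s: EpsilonGreedyState): number =>
  s.pulls === 0 ? 0 : s.totalReward / s.pulls;
```

Both updates are constant-time, which is why no batch retrain is needed between responds.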

7. PRIE-U arbitration — the final ranking score

When the flow’s scoring method is formula, the final per-candidate score is a weighted geometric mean across five dimensions, with an optional CLV term that adjusts the impact factor:

score = P^Wp × R^Wr × I^Wi × E^We × max(0.01, 0.5 + τ/2)^Wu

where each dimension comes from a different source:

Letter	Dimension	Source
P	Propensity	Stage 6b’s hierarchical adaptation read
R	Relevance	`computeRelevance(candidate, context)` — channel match, recency, segment fit
I	Impact	Composite of `Offer.businessValue / margin / revenueValue`
E	Emphasis	`Offer.priority / 100` — manual business priority
U	Uplift	CATE estimate `τ = μ_T − μ_C` mapped to `max(0.01, 0.5 + τ/2)`
C	CLV	`clvNorm = clvScore / 100` from the customer’s CLV row, applied as extra `impact` emphasis

Wp, Wr, Wi, We (which sum to 1) plus the two optional exponent terms Wu (uplift) and Wclv (CLV) come from the active RankingProfile (tenant.settings.defaultRankingProfileId, or specified on the flow) or the inline Score-node formula. The profile weight keys uplift and clv now map straight into upliftWeight / clvWeight (previously upliftWeight was documented but stripped by validation; it is now reachable). Default Wu = 0 and Wclv = 0 keep the legacy 4-factor PRIE bit-identical (back-compat).

When Wu > 0, persuadable offers (τ positive) rank higher and sleeping-dog offers (τ negative) rank lower than a neutral offer (τ = 0, whose uplift factor is 0.5^Wu). When Wclv > 0, the per-offer impact factor gets an extra exponent of Wclv × clvNorm, so high-CLV customers get up to Wclv extra impact emphasis; a customer with no CLV row is left untouched. For the detailed CATE math, T-learner / X-learner derivations, and the four uplift segments (persuadable / sure_thing / lost_cause / sleeping_dog), see /ai-ml/uplift-modeling.

The PRIE composition draws on the recommender-systems literature where propensity (likelihood of conversion), relevance (channel/context match), business impact, and editorial emphasis are four axes that any multi-objective ranker must combine. The uplift dimension U is what differentiates a causal ranker from a predictive one; see /ai-ml/uplift-modeling for the references.
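The formula above can be written out directly. This is a minimal sketch: the PrieFactors and PrieWeights shapes are hypothetical, the default weights are the RankingProfile defaults listed in the configuration reference, and the CLV term is deliberately left out.

```typescript
interface PrieFactors {
  propensity: number; // P, in (0, 1]
  relevance: number;  // R
  impact: number;     // I
  emphasis: number;   // E, Offer.priority / 100
  tau: number;        // CATE estimate
}
interface PrieWeights { wp: number; wr: number; wi: number; we: number; wu: number }

const DEFAULT_WEIGHTS: PrieWeights = { wp: 0.4, wr: 0.2, wi: 0.3, we: 0.1, wu: 0 };

// Maps tau onto a positive factor, floored at 0.01 so the product never collapses to zero.
function upliftFactor(tau: number): number {
  return Math.max(0.01, 0.5 + tau / 2);
}

// score = P^Wp * R^Wr * I^Wi * E^We * max(0.01, 0.5 + tau/2)^Wu
function prieScore(f: PrieFactors, w: PrieWeights = DEFAULT_WEIGHTS): number {
  return (
    Math.pow(f.propensity, w.wp) *
    Math.pow(f.relevance, w.wr) *
    Math.pow(f.impact, w.wi) *
    Math.pow(f.emphasis, w.we) *
    Math.pow(upliftFactor(f.tau), w.wu)
  );
}
```

With wu = 0 the last factor is exactly 1 for any τ, which is how the default profile stays identical to the 4-factor PRIE score.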

8. Allocation — Hungarian or greedy

For multi-placement decisions (the group node in a decision flow), the engine has to assign offers to placement slots. Two strategies:

Hungarian: globally optimal assignment that maximizes the total score across all (offer, placement) pairs subject to constraints (one offer per slot, no offer repeated, channel coupling). O(n³). Default for premium accounts.
Greedy: the highest-scoring available offer wins each placement in turn; subsequent placements get the next-best. O(n log n). Used when latency budget is tight.

After allocation, the channel atomic coupling pass applies: if any placement on a channel with couplingMode: "atomic" is empty (couldn’t find a viable offer), the engine empties the ENTIRE channel — so a half-rendered email never goes out. The flow’s couplingOverride lets you toggle this per-flow.
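A sketch of the greedy strategy followed by the atomic coupling pass is shown below. It assumes greedy means taking the highest-scoring unused eligible offer for each placement in order; the Placement and ScoredOffer shapes, the placementIds eligibility list, and both function names are hypothetical. The Hungarian strategy is not shown.

```typescript
interface Placement { id: string; channelId: string }
interface ScoredOffer { offerId: string; score: number; placementIds: string[] }
type Assignment = Map<string, string | null>; // placementId -> offerId or empty

function allocateGreedy(placements: Placement[], offers: ScoredOffer[]): Assignment {
  const ranked = [...offers].sort((a, b) => b.score - a.score);
  const used = new Set<string>();
  const assignment: Assignment = new Map();
  for (const p of placements) {
    // Take the best-scoring offer that is unused and eligible for this slot.
    const pick = ranked.find(
      (o) => !used.has(o.offerId) && o.placementIds.indexOf(p.id) >= 0,
    );
    if (pick) used.add(pick.offerId);
    assignment.set(p.id, pick ? pick.offerId : null);
  }
  return assignment;
}

function applyAtomicCoupling(
  placements: Placement[],
  assignment: Assignment,
  atomicChannels: Set<string>,
): Assignment {
  const result: Assignment = new Map(assignment);
  atomicChannels.forEach((channelId) => {
    const slots = placements.filter((p) => p.channelId === channelId);
    const hasEmptySlot = slots.some((p) => (result.get(p.id) ?? null) === null);
    // One empty slot on an atomic channel empties the whole channel.
    if (hasEmptySlot) slots.forEach((p) => result.set(p.id, null));
  });
  return result;
}
```

If offer A (score 0.9) fits both hero and footer while offer B (score 0.5) fits only hero, this greedy pass gives hero to A and leaves footer empty, so an atomic channel is emptied entirely. A globally optimal assignment would place B in hero and A in footer, which is the kind of case the Hungarian strategy handles.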

Bookkeeping — what gets persisted

Before the response returns, the engine writes:

One recommendation-type interaction_history row per returned decision — via the new persistDecisionInteractions helper (Bug #248 fix). This is the audit join key the /respond route uses to bind a {customerId, rank} pair back to the (offerId, creativeId, channelId) that was actually shown.
One impression-type interaction_history row per decision delivered on a channel where impressionMode != "explicit" — for channels we send (email, batch), the impression is auto-recorded. For client-rendered channels (web, mobile push), the impression isn’t recorded until the client calls /api/v1/impressions.
One decision_trace row with the full forensic chain: qualification results, contact policy decisions, scoring results (with propensitySource, upliftTau, upliftMultiplier, and — when the CLV term is active — clvNorm / clvImpactExponent per candidate), selected offers, ranking weights used, experiment assignment if any, inputsHash, totalLatencyMs. Sampled per tenant.settings.decisionTraceSampleRate.

The learning loop — what `/respond` does to the next decision

The ModelAdaptation row updated at each scope is what the next /recommend reads in stage 6b. Because adaptations are tiered, a single respond improves scoring at every level the offer participates in:

(scope: "offer", scopeId: <offerId>) — directly improves this offer’s per-decision score
(scope: "category", scopeId: <categoryId>) — improves baseline for every offer in this category
(scope: "channel", scopeId: <channelId>) — improves baseline for every offer on this channel
(scope: "direction", scopeId: "inbound" | "outbound") — improves baseline for traffic with this intent
(scope: "global", scopeId: "") — improves baseline for everything

The Bug #248 attribution precondition guard prevents inflated learning: positive outcomes credited against an offer the customer was never actually shown (e.g. external attribution noise) are blocked from the adaptation upsert with status: "recorded_without_adaptation" plus an attribution_precondition_failed audit row. Model state stays protected.

The model_matured telemetry event fires when an offer’s Wilson CI width crosses the maturity threshold downward for the first time, with one event per (model × scope × scopeId) transition. See /ai-ml/maturity-ramp for how this gates exposure on subsequent /recommend calls.
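The fan-out across scopes and the attribution guard can be sketched together. This is an illustration under assumptions: the RespondEvent shape, the in-memory Map standing in for ModelAdaptation rows, the wasShown flag, and the "recorded" status for the normal path are all hypothetical. Only the five scopes, the recorded_without_adaptation status, and the attribution_precondition_failed audit name come from the text.

```typescript
type AdaptationScope = "offer" | "category" | "channel" | "direction" | "global";
interface RespondEvent {
  offerId: string;
  categoryId: string;
  channelId: string;
  direction: "inbound" | "outbound";
  positive: boolean;
}
interface Counts { positives: number; negatives: number }

// Every scope a single respond touches, in the order listed above.
function scopesFor(ev: RespondEvent): Array<{ scope: AdaptationScope; scopeId: string }> {
  return [
    { scope: "offer", scopeId: ev.offerId },
    { scope: "category", scopeId: ev.categoryId },
    { scope: "channel", scopeId: ev.channelId },
    { scope: "direction", scopeId: ev.direction },
    { scope: "global", scopeId: "" },
  ];
}

function applyRespond(
  store: Map<string, Counts>,
  ev: RespondEvent,
  wasShown: boolean, // whether a recommendation row exists for this customer and offer
  audit: string[],
): string {
  if (ev.positive && !wasShown) {
    // Guard: never learn from a positive outcome on an offer that was not shown.
    audit.push("attribution_precondition_failed");
    return "recorded_without_adaptation";
  }
  for (const { scope, scopeId } of scopesFor(ev)) {
    const key = `${scope}:${scopeId}`;
    const row = store.get(key) ?? { positives: 0, negatives: 0 };
    if (ev.positive) row.positives += 1;
    else row.negatives += 1;
    store.set(key, row);
  }
  return "recorded"; // assumed name for the normal status
}
```

A single accepted respond therefore changes five rows, which is why one outcome can shift the baseline for sibling offers in the same category or channel.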

Worked example — the headline finding from the model-architecture round

A real live-test result from /api/v1/algorithm-models/.../uplift?method=t_learner&mode=fitted against the e2e tenant. The same offer (Auto Loan Refi) produces a different CATE in two different score-time contexts:

Customer context (channel × direction)	τ (CATE)	Segment	What the engine concludes
direct_mail × inbound	−0.073	sleeping_dog	Showing this offer suppresses conversion. Hide it.
direct_mail × outbound	+0.0036	uncertain	Neutral effect. Default ranking applies.

The marginal mode (which used pre-aggregated ModelAdaptation rates) collapsed both to τ = 0 and couldn’t distinguish them. The fitted mode (two separate logistic regressions on treated vs. control subsets of interaction_history, with per-row features for channel one-hot, direction one-hot, time-of-day sin/cos, day-of-week sin/cos) produces context-varying τ, which is exactly what the per-customer CATE literature (Künzel et al. PNAS 2019) calls the heterogeneous treatment effect.

Set Wu > 0 in the flow’s RankingProfile and the ranking now actively pushes the sleeping_dog down on inbound while leaving it neutral on outbound. The same model, the same offer, two different decisions per channel-direction context. This is what makes Kaireon’s decisioning behave differently from a propensity-only system.
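To make the effect concrete, the two τ values from the table can be run through the uplift mapping max(0.01, 0.5 + τ/2). The Wu value passed in is an assumed weight chosen by the caller, not a value taken from the test tenant, and the helper names are hypothetical.

```typescript
const upliftMultiplier = (tau: number): number => Math.max(0.01, 0.5 + tau / 2);

const inboundFactor = upliftMultiplier(-0.073);  // 0.5 - 0.0365 = 0.4635
const outboundFactor = upliftMultiplier(0.0036); // 0.5 + 0.0018 = 0.5018

// Ratio of the inbound score to the outbound score attributable to the U term alone,
// for a given Wu and with P, R, I, E held equal across the two contexts.
function relativeUpliftEffect(wu: number): number {
  return Math.pow(inboundFactor / outboundFactor, wu);
}
```

With an assumed Wu of 0.5, relativeUpliftEffect(0.5) is roughly 0.96, so the inbound score lands about 4% below the outbound one for an otherwise identical candidate. With Wu = 0 the ratio is exactly 1 and the two contexts rank the same.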

Configuration knobs — quick reference

Every knob the operator can turn that affects the scoring path:

Per-tenant (`tenant.settings`)

Setting	Default	Where it lands
`maturityRampMode`	`"bayesian_ci"`	Stage 5 — BCB-MR vs. legacy_count
`maturityWidthThreshold`	`0.20`	Stage 5 + telemetry D threshold
`maturityRampColdStartFloor`	`0.50`	Stage 5 — `baseFloor`
`maturityFloorDecayHalfLife`	`10`	Stage 5 — `decayHalfLife` in `decayingFloor(n) = baseFloor / √(1 + n / decayHalfLife)`. Honored by the ramp runtime (`applyMaturityRamp`); higher = slower floor decay (offers stay in cold-start exposure longer). Bounded 1–1000
`modelMaturityThreshold`	`100`	Stage 5 — legacy_count mode only
`upliftMethodDefault`	`"t_learner"`	Stage 7 — default for `/uplift` endpoint
`propensityScoreFloor`	`0.05`	Stage 6 — minimum propensity component
`propensitySmoothingWeight`	`10`	Stage 6b — Bayesian shrinkage strength
`defaultRankingProfileId`	—	Stage 7 — Wp/Wr/Wi/We/Wu source

UI: /settings/models (new this round) exposes the maturity + uplift knobs; the rest live under /settings.

Per-RankingProfile (`weights` JSONB)

The profile’s JSONB keys are conversion (Wp, default 0.4), recency (Wr, default 0.2), margin (Wi, default 0.3), fairness (We, default 0.1), uplift (Wu, default 0), and clv (Wclv, default 0). The runtime maps them onto the PRIE factors (conversion → P, recency → R, margin → I, fairness → E) — see Scoring strategies. The inline Score-node formula uses the parallel field names propensityWeight / relevanceWeight / impactWeight / emphasisWeight / upliftWeight / clvWeight (P+R+I+E must sum to 1).
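The key mapping can be sketched as a small translation function. The interfaces and the function name toFormulaWeights are hypothetical, filling missing keys from the listed defaults is an assumption, and throwing on a bad sum is one possible way to enforce the P+R+I+E constraint rather than the platform's documented behavior.

```typescript
// RankingProfile `weights` JSONB keys as listed above; all optional in this sketch.
interface ProfileWeights {
  conversion?: number;
  recency?: number;
  margin?: number;
  fairness?: number;
  uplift?: number;
  clv?: number;
}
// Parallel field names used by the inline Score-node formula.
interface FormulaWeights {
  propensityWeight: number;
  relevanceWeight: number;
  impactWeight: number;
  emphasisWeight: number;
  upliftWeight: number;
  clvWeight: number;
}

function toFormulaWeights(w: ProfileWeights = {}): FormulaWeights {
  const out: FormulaWeights = {
    propensityWeight: w.conversion ?? 0.4, // conversion -> P
    relevanceWeight: w.recency ?? 0.2,     // recency -> R
    impactWeight: w.margin ?? 0.3,         // margin -> I
    emphasisWeight: w.fairness ?? 0.1,     // fairness -> E
    upliftWeight: w.uplift ?? 0,
    clvWeight: w.clv ?? 0,
  };
  const coreSum =
    out.propensityWeight + out.relevanceWeight + out.impactWeight + out.emphasisWeight;
  // Compare with a tolerance: 0.4 + 0.2 + 0.3 + 0.1 is not exactly 1 in floating point.
  if (Math.abs(coreSum - 1) > 1e-9) {
    throw new Error(`P+R+I+E weights must sum to 1, got ${coreSum}`);
  }
  return out;
}
```

The uplift and clv weights sit outside the sum check because they are optional extra exponents rather than shares of the four core factors.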

Per-DecisionFlow

rankingProfileId (which weights to use), scoringMethod (priority_weighted / propensity / formula), couplingOverride (channel atomic coupling), skipContactPolicy (rare — for synthetic flows that shouldn’t suppress).

Per-AlgorithmModel

status (operational), registryStatus (lifecycle), autoLearn (whether /respond updates state), learnMode, learnSchedule, outcomeWeights.

Where every model interaction lives

Pre-existing dives:

Algorithms — all 10 types — start here for picking an algorithm
Algorithm selection guide — decision tree for which algorithm fits which use case
Per-algorithm tutorials — one page per algorithm with config + behavior
Adaptive learning — how adaptive vs. predictive models interact
Learning cadence — when each algorithm retrains
Model lifecycle — draft → shadow → challenger → champion → archived

This-round dives:

Maturity Ramp (BCB-MR) — stage 5 gating with Wilson CI math
Uplift Modeling (T/X-learner) — the U dimension in PRIE-U
Scope hierarchy — stage 6b read order

API references:

/api/v1/recommend — the request that triggers stages 1-8
/api/v1/respond — the feedback that drives the learning loop
/api/v1/algorithm-models/[id]/uplift — exposes per-offer τ for inspection
/api/v1/algorithm-models/[id]/adaptations — per-scope posterior inspection
Decision Traces — the audit trail Provenance UI reads

Kaireon’s decisioning capabilities at a glance

Capability	Implementation
Per-(channel × direction) posterior	Per-scope row with `(scope, scopeId)` as the unit of adaptation
Maturity gate	Wilson credible-interval width (principled — width ≤ 0.20 at 95% CI = mature)
Per-customer CATE	T-learner and X-learner, per-context uplift estimation
τ in ranking	PRIE-U composite: `P^Wp × R^Wr × I^Wi × E^We × max(0.01, 0.5 + τ/2)^Wu`
Historical backfill	Idempotent cron, replayable from `decision_traces`
Maturity telemetry	`model_matured` audit event with old/new state
Per-scope adaptation UI	Adaptations panel with Wilson CI bars per `(scope, scopeId)`

See /about for the platform overview, or jump back to Core concepts for the building-block view.
