

What this solves

Most A/B tests confuse “which variant won” with “did the engine actually help vs doing nothing?” A holdout group answers the second question: a known percentage of traffic gets zero personalization (or a fixed-rule fallback). Comparing engaged-rate across variant × (in-experiment vs holdout) gives you causal uplift, not just relative ranking.

Why this works

The platform has two complementary mechanisms:
  1. Champion / Challenger on the Score node: championChallenger.{champion, challengers[]} routes per-customer via a deterministic hash, so the same customer always lands in the same variant.
  2. Tenant-level holdout percentage: tenant.settings.holdoutPercentage (0-100) reserves that share of traffic for a “no NBA” fallback that returns offers sorted by priority weight only (the same path NBA-disabled tenants take).
Combine them and you get: champion-vs-challenger inside the experiment, control group outside, all variants persisted on every decision trace.
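The two mechanisms above can be sketched as one deterministic assignment step. This is an illustration only: the hash function, seed format, and the `assign` helper are assumptions, not the platform's actual implementation.

```typescript
import { createHash } from "crypto";

interface Variant { modelKey: string; weight: number; }

// Deterministic roll in [0, 100) from a seed string.
// Illustrative only; the platform's real hash is not documented here.
function roll(seed: string): number {
  return createHash("sha256").update(seed).digest().readUInt32BE(0) % 100;
}

// First the tenant-wide holdout gate, then the weight-proportional
// bucket keyed by customerId x experimentId.
function assign(
  customerId: string,
  experimentId: string,
  holdoutPercentage: number,
  variants: Variant[], // champion first, then challengers; weights sum to 100
): string {
  if (roll(`holdout:${customerId}`) < holdoutPercentage) return "holdout";
  const r = roll(`${customerId}:${experimentId}`);
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (r < cumulative) return v.modelKey;
  }
  return variants[variants.length - 1].modelKey; // guard for rounding
}
```

Because both rolls are pure functions of stable identifiers, the same customer always resolves to the same bucket across sessions, which is what makes the uplift comparison causal.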

Step 1 — Set the holdout percentage

curl -X PUT https://playground.kaireonai.com/api/v1/tenant-settings \
  -H "Content-Type: application/json" -H "X-Requested-With: XMLHttpRequest" \
  -d '{ "holdoutPercentage": 10 }'
This routes roughly 10% of customers into the priority-only fallback, selected by each customer’s deterministic-random roll. Verify with GET /api/v1/tenant-settings afterward.

Step 2 — Configure champion/challenger on the Score node

{
  "id": "score",
  "type": "score",
  "config": {
    "method": "formula",
    "championChallenger": {
      "enabled": true,
      "experimentId": "cards-q4-uplift",
      "champion":    { "modelKey": "scorecard-v2",   "weight": 50 },
      "challengers": [
        { "modelKey": "bayesian-v2",         "weight": 30 },
        { "modelKey": "gradient_boosted-v2", "weight": 20 }
      ]
    }
  }
}
The weights sum to 100. Each customer’s customerId × experimentId hash falls into one bucket; the routing is persistent across sessions for that customer until you change the configuration.
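Weight-proportional bucketing means the config above partitions the 0-99 roll into contiguous ranges: champion gets [0, 50), the first challenger [50, 80), the second [80, 100). A small sketch of that mapping (the `bucketRanges` helper is hypothetical; exact boundaries depend on the platform's hash):

```typescript
// Turn { weight } entries into contiguous [start, end) ranges over 0-100.
function bucketRanges(variants: { modelKey: string; weight: number }[]) {
  let start = 0;
  return variants.map((v) => {
    const range = { modelKey: v.modelKey, start, end: start + v.weight };
    start += v.weight;
    return range;
  });
}
```

Changing a weight shifts the range boundaries, so customers near a boundary can be reassigned; keep weights stable for the life of the experiment.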

Step 3 — Capture the variant on each decision

The recommend response includes:
{
  "experimentVariant": "challenger-bayesian-v2",
  "controlGroup": false,
  ...
}
controlGroup: true means this customer was in the holdout — the engine ran the priority-only fallback path. The decision_traces.experimentAssignment JSONB persists the variant for later analysis.
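If you log outcomes client-side, you can partition engagement by the returned fields. A minimal sketch, assuming only the two documented fields; the tally helpers and row shape are hypothetical:

```typescript
interface RecommendResponse {
  experimentVariant: string | null;
  controlGroup: boolean;
}

// Holdout customers get their own key so uplift can be computed against them.
function bucketKey(r: RecommendResponse): string {
  return r.controlGroup ? "holdout" : r.experimentVariant ?? "unassigned";
}

// Count decisions and engagements per bucket.
function tally(
  rows: Array<{ response: RecommendResponse; engaged: boolean }>,
): Map<string, { n: number; engaged: number }> {
  const out = new Map<string, { n: number; engaged: number }>();
  for (const { response, engaged } of rows) {
    const key = bucketKey(response);
    const cell = out.get(key) ?? { n: 0, engaged: 0 };
    cell.n += 1;
    if (engaged) cell.engaged += 1;
    out.set(key, cell);
  }
  return out;
}
```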

Step 4 — Measure uplift

The platform’s /api/v1/experiments/uplift endpoint computes z-tested uplift between in-experiment and holdout:
curl https://playground.kaireonai.com/api/v1/experiments/cards-q4-uplift/uplift \
  -H "X-Requested-With: XMLHttpRequest"
It returns the conversion rate for each variant and for the holdout, along with a confidence interval. The math lives in platform/src/lib/experimentation/uplift.ts.
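The core statistic can be reproduced offline with a standard two-proportion z-test. A sketch of that calculation; the actual implementation in uplift.ts may differ in details such as continuity correction:

```typescript
// Two-proportion z-test: a variant's conversions vs the holdout's.
function upliftZ(
  variantConv: number, variantN: number,
  holdoutConv: number, holdoutN: number,
): { uplift: number; z: number } {
  const p1 = variantConv / variantN;   // variant conversion rate
  const p2 = holdoutConv / holdoutN;   // holdout conversion rate
  const pooled = (variantConv + holdoutConv) / (variantN + holdoutN);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / variantN + 1 / holdoutN));
  return { uplift: p1 - p2, z: se === 0 ? 0 : (p1 - p2) / se };
}
```

A |z| above roughly 1.96 corresponds to significance at the 5% level for a two-sided test.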

Gotchas

  • Holdout is tenant-wide. Setting holdoutPercentage affects every flow for that tenant; if you need per-flow holdouts use the experiment.holdoutPercent field on the Experiment resource instead.
  • Variant assignment is persistent. The same customer always sees the same variant — even after the experiment ends, until you flip championChallenger.enabled to false.
  • autoPromote on the Experiment resource (when enabled) automatically promotes the winning challenger to champion after the experiment meets its success criteria. Combine with four-eyes approval for governance.

What the trace will show

customerId   | experimentVariant         | controlGroup | finalCount
cust-A-001   | champion-scorecard-v2     | false        | 3
cust-A-002   | challenger-bayesian-v2    | false        | 3
cust-A-003   | (none)                    | true         | 3      ← holdout, priority-only path

Proof reference

T11 (bulk respond) + T15 (scoring strategy resolution) + the experiment fixture in T1 of the proof bundle cover this end-to-end.