

What it does

The arbitration engine multiplies offer scores by a weight vector {propensity, relevance, impact, emphasis, diversity}. Operators historically set those weights by hand and rarely changed them. EXP3-IX online tuning instead treats a small set of candidate weight vectors as the arms of an adversarial multi-armed bandit and updates them after every recorded outcome. Reference: Neu (2015), Explore no more: Improved high-probability regret bounds for non-stochastic bandits. EXP3-IX is the implicit-exploration variant of EXP3: it bounds the variance of the importance-weighted loss estimator and achieves a high-probability regret bound of O(√(K·T·log K)), where K is the number of arms and T the round count.
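In pseudocode terms, a round looks like: sample an arm from a gamma-mixed softmax over the running log-weights, apply that arm's weight vector, then update the chosen arm from the recorded outcome. A minimal sketch, assuming rewards in [0, 1] and the gamma/eta/beta naming from the config below; the function names and types are illustrative, not the production API:

```typescript
// Illustrative EXP3-IX core (not the production code).
// State is one log-weight per arm. gamma mixes in uniform exploration,
// eta is the learning rate, beta is the implicit-exploration bias.

type Hyperparams = { gamma: number; eta: number; beta: number };

// Sampling distribution: (1 - gamma) * softmax(logWeights) + gamma / K.
function armProbabilities(logWeights: number[], gamma: number): number[] {
  const max = Math.max(...logWeights); // stabilize the exponentials
  const exps = logWeights.map((w) => Math.exp(w - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const K = logWeights.length;
  return exps.map((e) => (1 - gamma) * (e / sum) + gamma / K);
}

// Update the chosen arm from a reward in [0, 1]. EXP3-IX is stated in terms
// of losses, so reward r becomes loss 1 - r; dividing by p + beta (rather
// than p) is the implicit-exploration trick that bounds estimator variance.
function exp3IxUpdate(
  logWeights: number[],
  chosen: number,
  reward: number,
  { gamma, eta, beta }: Hyperparams,
): number[] {
  const p = armProbabilities(logWeights, gamma);
  const lossEstimate = (1 - reward) / (p[chosen] + beta);
  return logWeights.map((w, i) => (i === chosen ? w - eta * lossEstimate : w));
}
```

Sampling by probability rather than arg-max is what keeps the loss estimate well behaved; beta trades a small bias for bounded variance, which is where the high-probability guarantee comes from.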

Why use it

  • The optimal weight mix shifts with time of day, segment, channel, and campaign pressure. Chasing it manually means lost margin.
  • The bandit’s regret against the best fixed weight mix among the configured arms grows only as O(√(K·T·log K)), so per-round performance converges toward that arm’s.
  • All math is online and incremental. There is no retraining job.

Configuration

Two settings live under tenantSettings.aiAnalyzerSettings.arbitration:
{
  "arbitration": {
    "exp3IxEnabled": true,
    "banditConfig": {
      "arms": [
        { "id": "exploit", "weights": { "propensity": 1, "relevance": 0, "impact": 0, "emphasis": 0, "diversity": 0 } },
        { "id": "balanced", "weights": { "propensity": 0.5, "relevance": 0.3, "impact": 0.1, "emphasis": 0.05, "diversity": 0.05 } },
        { "id": "diverse", "weights": { "propensity": 0.4, "relevance": 0.2, "impact": 0.1, "emphasis": 0.1, "diversity": 0.2 } }
      ],
      "hyperparams": { "gamma": 0.1, "eta": 0.05, "beta": 0.01 }
    }
  }
}
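Given a sampled arm, applying its weights is a plain dot product over the five documented components. A sketch, where the Weights shape mirrors the config above; the function name, and the assumption that each offer carries one score per component, are illustrative:

```typescript
// Hypothetical application of an arm's weight vector to an offer's
// per-component scores (illustrative, not the production API).

type Weights = {
  propensity: number;
  relevance: number;
  impact: number;
  emphasis: number;
  diversity: number;
};

// The arbitrated score is the dot product of the offer's component scores
// with the currently sampled arm's weights.
function arbitratedScore(components: Weights, armWeights: Weights): number {
  return (Object.keys(armWeights) as (keyof Weights)[]).reduce(
    (sum, k) => sum + armWeights[k] * components[k],
    0,
  );
}
```

Under the "exploit" arm from the config, this reduces to the raw propensity score; the "balanced" and "diverse" arms blend in the other components.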
The bandit’s running log-weights are persisted at arbitration.banditState after every outcome update. That row IS the state — there is no separate table.

Honest limits

  • No auto-bootstrap. When banditConfig.arms is missing or empty, readBanditState() returns null and the wire is a structured no-op. Operators must explicitly configure arms.
  • Arm-index thread-through. The recommend route must persist banditArmIndex on the interaction’s response payload so the respond route knows which arm to credit. Today only batch pipelines that opt in to bandit sampling thread this through; the realtime recommend hot path is on the roadmap.
  • Reward derivation. V1 maps outcome classification to reward: positive → 1, negative/neutral → 0. More nuanced reward shaping (revenue-aware, time-discounted) is a follow-up.
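The V1 reward mapping in the last point is small enough to show inline. A sketch; the classification labels come from the text, the function and type names are assumptions:

```typescript
// V1 reward derivation: positive outcomes earn 1, everything else 0.
type OutcomeClassification = "positive" | "negative" | "neutral";

function rewardFromOutcome(classification: OutcomeClassification): 0 | 1 {
  return classification === "positive" ? 1 : 0;
}
```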

What gets logged

Per outcome that lands in respond/route.ts with the flag on:
INFO bandit.update {
  tenantId,
  armIndex,
  reward,   // 0 or 1 in V1
  changed   // true iff log-weight actually moved
}
When the helper returns null (no arms configured) or the response payload lacks banditArmIndex, the wire silently no-ops without emitting a log line. That is the contract: silence is the healthy “no work to do” path. Failures emit an ERROR bandit update failed line with the underlying message and never block the respond response.
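The silent-no-op contract can be summarized as a small decision function. A sketch; the function name and state shape are assumptions, only the two guard conditions are documented behavior:

```typescript
// Hypothetical guard logic for the bandit wire (illustrative names).
type WireAction =
  | { kind: "noop" } // silent: no log line is emitted
  | { kind: "update"; armIndex: number; reward: 0 | 1 };

function decideWireAction(
  state: { logWeights: number[] } | null, // null when banditConfig.arms is missing/empty
  banditArmIndex: number | undefined,     // from the interaction's response payload
  reward: 0 | 1,
): WireAction {
  if (state === null) return { kind: "noop" };               // no arms configured
  if (banditArmIndex === undefined) return { kind: "noop" }; // arm index not threaded through
  return { kind: "update", armIndex: banditArmIndex, reward };
}
```

Only the "update" branch produces the INFO bandit.update line shown above; both "noop" branches are intentionally invisible.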

Operational checklist

  • Configure 2–5 arm presets covering the operating envelope you care about. Three is a good starting point.
  • Choose gamma based on how much exploration you can tolerate. The default 0.1 reserves 10% of traffic for uniform exploration.
  • Watch the bandit.update rate vs the outcome.recorded rate: they should match when the flag is on. A divergence means arm-index threading is broken upstream.
  • To roll back, flip exp3IxEnabled to false. The wire stops on the next request. The persisted banditState is not deleted; on re-enable the bandit picks up where it left off.
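Per the rollback step above, disabling is a one-key change under the same tenantSettings.aiAnalyzerSettings.arbitration path; leaving banditState untouched is what makes re-enable resume seamlessly:

```json
{
  "arbitration": {
    "exp3IxEnabled": false
  }
}
```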

Cross-references

  • arbitration-budget-pacing.mdx — same flag namespace, runs in batch-executor.
  • arbitration-goal-seek.mdx — same flag namespace, runs in batch-executor.
  • arbitration-lagrangian.mdx — soft-constraint shadow-price adjustment that complements bandit weight tuning.