

What it does

The arbitration engine multiplies offer scores by a weight vector {propensity, relevance, impact, emphasis, diversity}. Operators historically set those weights by hand and rarely changed them. EXP3-IX online tuning instead treats a small set of candidate weight vectors as the arms of an adversarial multi-armed bandit and updates them after every recorded outcome. Reference: Neu (2015), Explore no more: Improved high-probability regret bounds for non-stochastic bandits. EXP3-IX is the implicit-exploration variant of EXP3: it bounds the variance of the importance-weighted loss estimator and achieves a high-probability regret bound of O(√(K·T·log K)), where K is the number of arms and T the round count.
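In pseudocode terms, a round looks like: sample an arm from a gamma-mixed softmax over the running log-weights, apply that arm's weight vector, then update the chosen arm from the recorded outcome. A minimal sketch, assuming rewards in [0, 1] and the gamma/eta/beta naming from the config below; the function names and types are illustrative, not the production API:

```typescript
// Illustrative EXP3-IX core (not the production code).
// State is one log-weight per arm. gamma mixes in uniform exploration,
// eta is the learning rate, beta is the implicit-exploration bias.

type Hyperparams = { gamma: number; eta: number; beta: number };

// Sampling distribution: (1 - gamma) * softmax(logWeights) + gamma / K.
function armProbabilities(logWeights: number[], gamma: number): number[] {
  const max = Math.max(...logWeights); // stabilize the exponentials
  const exps = logWeights.map((w) => Math.exp(w - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const K = logWeights.length;
  return exps.map((e) => (1 - gamma) * (e / sum) + gamma / K);
}

// Update the chosen arm from a reward in [0, 1]. EXP3-IX is stated in terms
// of losses, so reward r becomes loss 1 - r; dividing by p + beta (rather
// than p) is the implicit-exploration trick that bounds estimator variance.
function exp3IxUpdate(
  logWeights: number[],
  chosen: number,
  reward: number,
  { gamma, eta, beta }: Hyperparams,
): number[] {
  const p = armProbabilities(logWeights, gamma);
  const lossEstimate = (1 - reward) / (p[chosen] + beta);
  return logWeights.map((w, i) => (i === chosen ? w - eta * lossEstimate : w));
}
```

Sampling by probability rather than arg-max is what keeps the loss estimate well behaved; beta trades a small bias for bounded variance, which is where the high-probability guarantee comes from.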

Why use it

  • The optimal weight mix shifts with time of day, segment, channel, and campaign pressure. Chasing it manually means lost margin.
  • The bandit’s regret against the best fixed weight mix among the configured arms grows only as O(√(K·T·log K)), so per-round performance converges toward that arm’s.
  • All math is online and incremental. There is no retraining job.

Configuration

Two settings live under tenantSettings.aiAnalyzerSettings.arbitration:
{
  "arbitration": {
    "exp3IxEnabled": true,
    "banditConfig": {
      "arms": [
        { "id": "exploit", "weights": { "propensity": 1, "relevance": 0, "impact": 0, "emphasis": 0, "diversity": 0 } },
        { "id": "balanced", "weights": { "propensity": 0.5, "relevance": 0.3, "impact": 0.1, "emphasis": 0.05, "diversity": 0.05 } },
        { "id": "diverse", "weights": { "propensity": 0.4, "relevance": 0.2, "impact": 0.1, "emphasis": 0.1, "diversity": 0.2 } }
      ],
      "hyperparams": { "gamma": 0.1, "eta": 0.05, "beta": 0.01 }
    }
  }
}
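Given a sampled arm, applying its weights is a plain dot product over the five documented components. A sketch, where the Weights shape mirrors the config above; the function name, and the assumption that each offer carries one score per component, are illustrative:

```typescript
// Hypothetical application of an arm's weight vector to an offer's
// per-component scores (illustrative, not the production API).

type Weights = {
  propensity: number;
  relevance: number;
  impact: number;
  emphasis: number;
  diversity: number;
};

// The arbitrated score is the dot product of the offer's component scores
// with the currently sampled arm's weights.
function arbitratedScore(components: Weights, armWeights: Weights): number {
  return (Object.keys(armWeights) as (keyof Weights)[]).reduce(
    (sum, k) => sum + armWeights[k] * components[k],
    0,
  );
}
```

Under the "exploit" arm from the config, this reduces to the raw propensity score; the "balanced" and "diverse" arms blend in the other components.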
The bandit’s running log-weights are persisted at arbitration.banditState after every outcome update. That row IS the state — there is no separate table.

Honest limits

  • No auto-bootstrap. When banditConfig.arms is missing or empty, readBanditState() returns null and the wire is a structured no-op. Operators must explicitly configure arms.
  • Arm-index thread-through. The recommend route must persist banditArmIndex on the interaction’s response payload so the respond route knows which arm to credit. Today only batch pipelines that opt in to bandit sampling thread this through; the realtime recommend hot path is on the roadmap.
  • Reward derivation. V1 maps outcome classification to reward: positive → 1, negative/neutral → 0. More nuanced reward shaping (revenue-aware, time-discounted) is a follow-up.
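The V1 reward mapping in the last point is small enough to show inline. A sketch; the classification labels come from the text, the function and type names are assumptions:

```typescript
// V1 reward derivation: positive outcomes earn 1, everything else 0.
type OutcomeClassification = "positive" | "negative" | "neutral";

function rewardFromOutcome(classification: OutcomeClassification): 0 | 1 {
  return classification === "positive" ? 1 : 0;
}
```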

What gets logged

Per outcome that lands in respond/route.ts with the flag on:
INFO bandit.update {
  tenantId,
  armIndex,
  reward,   // 0 or 1 in V1
  changed   // true iff log-weight actually moved
}
When the helper returns null (no arms configured) or the response payload lacks banditArmIndex, the wire silently no-ops without emitting a log line. That is the contract: silence is the healthy “no work to do” path. Failures emit an ERROR bandit update failed line with the underlying message and never block the respond response.
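The silent-no-op contract can be summarized as a small decision function. A sketch; the function name and state shape are assumptions, only the two guard conditions are documented behavior:

```typescript
// Hypothetical guard logic for the bandit wire (illustrative names).
type WireAction =
  | { kind: "noop" } // silent: no log line is emitted
  | { kind: "update"; armIndex: number; reward: 0 | 1 };

function decideWireAction(
  state: { logWeights: number[] } | null, // null when banditConfig.arms is missing/empty
  banditArmIndex: number | undefined,     // from the interaction's response payload
  reward: 0 | 1,
): WireAction {
  if (state === null) return { kind: "noop" };               // no arms configured
  if (banditArmIndex === undefined) return { kind: "noop" }; // arm index not threaded through
  return { kind: "update", armIndex: banditArmIndex, reward };
}
```

Only the "update" branch produces the INFO bandit.update line shown above; both "noop" branches are intentionally invisible.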

Operational checklist

  • Configure 2–5 arm presets covering the operating envelope you care about. Three is a good starting point.
  • Choose gamma based on how much exploration you can tolerate. The default 0.1 reserves 10% of traffic for uniform exploration.
  • Watch the bandit.update rate vs the outcome.recorded rate: they should match when the flag is on. A divergence means arm-index threading is broken upstream.
  • To roll back, flip exp3IxEnabled to false. The wire stops on the next request. The persisted banditState is not deleted; on re-enable the bandit picks up where it left off.
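Per the rollback step above, disabling is a one-key change under the same tenantSettings.aiAnalyzerSettings.arbitration path; leaving banditState untouched is what makes re-enable resume seamlessly:

```json
{
  "arbitration": {
    "exp3IxEnabled": false
  }
}
```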

Cross-references

  • arbitration-budget-pacing.mdx — same flag namespace, runs in batch-executor.
  • arbitration-goal-seek.mdx — same flag namespace, runs in batch-executor.
  • arbitration-lagrangian.mdx — soft-constraint shadow-price adjustment that complements bandit weight tuning.