What it does
The arbitration engine multiplies offer scores by a weight vector {propensity, relevance, impact, emphasis, diversity}. Operators have historically set those weights by hand and rarely revisited them. EXP3-IX online tuning instead treats a set of candidate weight vectors (arm presets) as the arms of a multi-armed bandit, and it updates the bandit after every recorded outcome.
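To make the weight vector concrete, here is a minimal TypeScript sketch of how one arm's weights might combine with an offer's factor scores. It assumes a weighted sum, because the text above does not pin down the combination rule. The names WeightVector, FactorScores, weightedScore, and ARMS, and the preset values, are hypothetical.

```typescript
// Hypothetical shape of one arm: one weight per arbitration factor.
interface WeightVector {
  propensity: number;
  relevance: number;
  impact: number;
  emphasis: number;
  diversity: number;
}

// An offer's factor scores use the same five keys.
type FactorScores = WeightVector;

const FACTORS: (keyof WeightVector)[] = [
  "propensity",
  "relevance",
  "impact",
  "emphasis",
  "diversity",
];

// Assumed combination rule: a weighted sum of the factor scores.
function weightedScore(scores: FactorScores, weights: WeightVector): number {
  return FACTORS.reduce((sum, f) => sum + scores[f] * weights[f], 0);
}

// Illustrative arm presets; the values are made up.
const ARMS: WeightVector[] = [
  { propensity: 0.5, relevance: 0.2, impact: 0.2, emphasis: 0.05, diversity: 0.05 },
  { propensity: 0.3, relevance: 0.3, impact: 0.3, emphasis: 0.05, diversity: 0.05 },
  { propensity: 0.2, relevance: 0.2, impact: 0.4, emphasis: 0.1, diversity: 0.1 },
];
```

Under this reading, each bandit arm is one entry of ARMS. The bandit's job is to learn which preset to apply, not to tune individual weights.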
Reference: Neu (2015), "Explore no more: Improved high-probability regret bounds for non-stochastic bandits." EXP3-IX (EXP3 with implicit exploration) is a variant of EXP3 that biases its importance-weighted loss estimates to keep their variance under control. This lets it handle adversarial feedback with high-probability regret bounds of O(√(K·T·log K)), where K is the number of arms and T the number of rounds.
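The update itself is compact. Below is a minimal TypeScript sketch of EXP3-IX as stated in Neu (2015). It is not the product's implementation. The sketch samples an arm from the normalized weights, converts the 0/1 reward into a loss, and shrinks the played arm's weight using the implicit-exploration estimate loss / (p + γ). Exp3IxSketch, eta, and ixGamma are hypothetical names. Neu suggests γ = η/2, which the constructor uses as a default. In Neu's formulation there is no separate uniform-exploration term; exploration comes from the γ bias in the estimate.

```typescript
// Minimal EXP3-IX sketch after Neu (2015); names are hypothetical.
class Exp3IxSketch {
  private logWeights: number[];
  private readonly eta: number; // learning rate
  private readonly ixGamma: number; // implicit-exploration bias

  constructor(numArms: number, eta: number, ixGamma: number = eta / 2) {
    if (numArms < 1) throw new Error("need at least one arm");
    this.eta = eta;
    this.ixGamma = ixGamma;
    // Log-weight 0 for every arm means a uniform starting distribution.
    this.logWeights = new Array<number>(numArms).fill(0);
  }

  // p_i = w_i / sum_j w_j, computed from log-weights for numerical stability.
  probabilities(): number[] {
    const max = Math.max(...this.logWeights);
    const exps = this.logWeights.map((lw) => Math.exp(lw - max));
    const total = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / total);
  }

  // Sample an arm index from the current distribution.
  selectArm(rng: () => number = Math.random): number {
    const p = this.probabilities();
    let u = rng();
    for (let i = 0; i < p.length; i++) {
      u -= p[i];
      if (u < 0) return i;
    }
    return p.length - 1; // guard against floating-point round-off
  }

  // Credit the played arm. Reward in [0, 1] becomes loss = 1 - reward.
  // Assumes no other update ran since selectArm; a system with delayed
  // feedback would store p at selection time instead of recomputing it.
  update(armIndex: number, reward: number): void {
    const p = this.probabilities();
    const loss = 1 - reward;
    // IX estimator: dividing by p + gamma instead of p biases the estimate
    // down but bounds its variance.
    const lossEstimate = loss / (p[armIndex] + this.ixGamma);
    this.logWeights[armIndex] -= this.eta * lossEstimate;
  }
}
```

At recommend time, selectArm() would pick the arm whose index is carried forward (the role banditArmIndex plays in the text). At respond time, update() credits that arm. Keeping weights in log space stops them from underflowing after many losses.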
Why use it
- The optimal weight mix shifts with time of day, segment, channel, and campaign pressure. Chasing it by hand always lags, and the lag costs margin.
- The bandit's regret against the best fixed weight mix among the configured arms grows as O(√(K·T·log K)). Its average per-round regret therefore shrinks toward zero as outcomes accumulate.
- All math is online / incremental. No retraining job.
Configuration
Two settings live under tenantSettings.aiAnalyzerSettings.arbitration:
The bandit state is written to arbitration.banditState after every outcome update. That row is the state; there is no separate table.
Honest limits
- No auto-bootstrap. When banditConfig.arms is missing or empty, readBanditState() returns null and the wire is a structured no-op. Operators must configure arms explicitly.
- Arm-index thread-through. The recommend route must persist banditArmIndex on the interaction's response payload so that the respond route knows which arm to credit. Today only batch pipelines that opt in to bandit sampling thread this through. Threading it through the realtime recommend hot path is on the roadmap.
- Reward derivation. V1 maps the outcome classification to a reward: positive → 1, negative/neutral → 0. More nuanced reward shaping (revenue-aware, time-discounted) is a follow-up.
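The no-op contract and the V1 reward mapping are both small enough to sketch. This TypeScript sketch is hypothetical throughout. readBanditStateSketch and rewardFor stand in for the real readBanditState() and reward logic, and the BanditConfigSketch and BanditStateSketch shapes are placeholders for the real types. The rule for resuming persisted state is also an assumption.

```typescript
// Hypothetical shapes: only `arms` and the null/no-op contract come from the docs.
interface BanditConfigSketch {
  arms?: Record<string, number>[]; // each arm is a weight-vector preset
}

interface BanditStateSketch {
  logWeights: number[]; // one entry per configured arm
}

type OutcomeClass = "positive" | "negative" | "neutral";

// Mirrors the documented contract: missing or empty arms -> null, and the
// caller treats null as a structured no-op (no auto-bootstrap).
function readBanditStateSketch(
  config: BanditConfigSketch | undefined,
  persisted: BanditStateSketch | undefined,
): BanditStateSketch | null {
  const arms = config?.arms;
  if (!arms || arms.length === 0) return null;
  // Assumption: resume persisted state if it matches the arm count,
  // otherwise start from uniform weights (log-weight 0 for every arm).
  if (persisted && persisted.logWeights.length === arms.length) {
    return persisted;
  }
  return { logWeights: new Array<number>(arms.length).fill(0) };
}

// V1 reward derivation: positive -> 1, negative or neutral -> 0.
function rewardFor(outcome: OutcomeClass): number {
  return outcome === "positive" ? 1 : 0;
}
```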
What gets logged
Per outcome that lands in respond/route.ts with the flag on:
When the bandit state is null (no arms configured) or the response payload lacks banditArmIndex, the wire silently no-ops without emitting a log line. That is the contract: silence means a healthy "no work to do." Failures emit an ERROR bandit update failed line with the underlying message and never block the respond route's response.
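The logging contract above can be sketched as a guard around the update call. Everything here except banditArmIndex is hypothetical. That includes applyBanditOutcome, the injected update function and logger, and the exact wording of the error line; the text only says the line includes the underlying message.

```typescript
// Hypothetical names throughout except banditArmIndex.
interface ResponsePayloadSketch {
  banditArmIndex?: number;
}

interface WeightState {
  logWeights: number[];
}

interface Logger {
  error(message: string): void;
}

// Applies one outcome to the bandit. Returns true only when an update ran.
// It never throws, so a bandit failure cannot block the respond response.
function applyBanditOutcome(
  state: WeightState | null,
  payload: ResponsePayloadSketch,
  reward: number,
  update: (state: WeightState, arm: number, reward: number) => void,
  log: Logger,
): boolean {
  // No arms configured: silent no-op, no log line.
  if (state === null) return false;
  // No arm index threaded through from recommend: also a silent no-op.
  const arm = payload.banditArmIndex;
  if (arm === undefined) return false;
  try {
    update(state, arm, reward);
    return true;
  } catch (err) {
    // Failures are logged with the underlying message and swallowed.
    const message = err instanceof Error ? err.message : String(err);
    log.error(`bandit update failed: ${message}`);
    return false;
  }
}
```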
Operational checklist
- Configure 2–5 arm presets covering the operating envelope you care about. Three is a good starting point.
- Choose gamma based on how much exploration you can tolerate. The default 0.1 reserves 10% of traffic for uniform exploration.
- Watch the bandit.update rate against the outcome.recorded rate: they should match when the flag is on. A divergence means arm-index threading is broken upstream.
- To roll back, flip exp3IxEnabled to false. The wire stops on the next request. The persisted banditState is not deleted, so on re-enable the bandit picks up where it left off.
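The checklist reads gamma as a uniform exploration share. This sketch assumes the classic EXP3 mixing rule p_i = (1 − γ)·w_i/Σw + γ/K, because the text does not spell out the exact formula behind the gamma setting. Under that rule, γ = 0.1 leaves every arm at least 0.1/K of the traffic. The name mixedProbabilities is hypothetical.

```typescript
// Mixes the normalized bandit weights with a uniform share gamma:
// p_i = (1 - gamma) * w_i / sum(w) + gamma / K.
function mixedProbabilities(weights: number[], gamma: number): number[] {
  if (weights.length === 0) throw new Error("need at least one arm");
  if (gamma < 0 || gamma > 1) throw new Error("gamma must be in [0, 1]");
  const k = weights.length;
  const total = weights.reduce((a, b) => a + b, 0);
  if (!(total > 0)) throw new Error("weights must sum to a positive number");
  return weights.map((w) => (1 - gamma) * (w / total) + gamma / k);
}
```

With three arms and γ = 0.1, each arm keeps a floor of about 3.3% of traffic, and the uniform part adds up to 10% overall.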
Cross-references
- arbitration-budget-pacing.mdx: same flag namespace, runs in batch-executor.
- arbitration-goal-seek.mdx: same flag namespace, runs in batch-executor.
- arbitration-lagrangian.mdx: soft-constraint shadow-price adjustment that complements bandit weight tuning.