Documentation Index
Fetch the complete documentation index at: https://docs.kaireonai.com/llms.txt
Use this file to discover all available pages before exploring further.
modelType: "epsilon_greedy" — the simplest workable bandit. On most calls, exploit the arm with the highest observed mean reward. With probability ε, pick a random arm to explore. ε decays over time so exploration tapers off as evidence accumulates.
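The select/explore split described above can be sketched in a few lines. This is a minimal illustration, not the engine's actual code: the `Arm` shape and function names are assumptions chosen to mirror the `totalReward`/`pulls` fields mentioned later in this page.

```typescript
interface Arm {
  id: string;
  totalReward: number;
  pulls: number;
}

// Observed mean reward; optimistic 1.0 for arms that have never been pulled.
function meanReward(arm: Arm): number {
  return arm.pulls > 0 ? arm.totalReward / arm.pulls : 1.0;
}

// With probability epsilon pick a uniformly random arm (explore);
// otherwise pick the arm with the highest observed mean (exploit).
function selectArm(arms: Arm[], epsilon: number, rand: () => number = Math.random): Arm {
  if (rand() < epsilon) {
    return arms[Math.floor(rand() * arms.length)];
  }
  return arms.reduce((best, arm) => (meanReward(arm) > meanReward(best) ? arm : best));
}
```

With ε = 0 the selection is purely greedy, which is why disabling exploration makes verification deterministic.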
When to use
- You want a simple, debuggable bandit baseline before reaching for Thompson.
- You can tolerate the exploration noise — roughly one in every 1/ε calls is essentially random.
- You need an explicit exploration rate knob — operators can set ε directly, unlike Thompson where exploration is implicit in posterior variance.
Avoid it when context matters (neural_cf) or when sample efficiency matters (Thompson dominates ε-greedy asymptotically).
The math
Fixture config
With base ε₀ = 0.1, decayRate = 0.001, and t = 300 pulls: ε_t = 0.1 / (1 + 0.001 × 300) ≈ 0.077. So 7.7% of calls explore randomly; 92.3% exploit and pick cashback.
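The decay schedule implied by the fixture arithmetic is ε_t = ε₀ / (1 + decayRate × t). A quick check of the numbers above (the variable names here are illustrative):

```typescript
// Fixture values: base epsilon 0.1, decayRate 0.001, t = 300 pulls.
const epsilon0 = 0.1;
const decayRate = 0.001;
const t = 300;

// epsilon_t = epsilon0 / (1 + decayRate * t)
const epsilonT = epsilon0 / (1 + decayRate * t);
console.log(epsilonT.toFixed(3)); // prints "0.077"
```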
The proof script disables exploration (ε=0) for deterministic verification: cashback wins with score 0.650, travel 0.300, nofee 0.180 — exactly the means.
Training
Same as Thompson — updates happen via POST /api/v1/respond. The engine's auto-learn.ts increments totalReward and pulls per arm.
To bootstrap an unexplored arm: leave pulls = 0 and rely on optimistic initialization (score = 1.0). The first random exploration that picks it provides initial data.
Score interpretation
- score = totalReward_i / pulls_i for arms with pulls > 0.
- score = 1.0 for unpulled arms (optimistic init — gets explored at least once).
- score = random() for ALL arms during an explore call.
method: propensity only.
Pitfalls
- ε too high — at ε = 0.2, one in five calls is random. If your traffic is millions of impressions, that's hundreds of thousands of wasted opportunities. Default to 0.05–0.10.
- ε too low — at ε = 0.001, exploration barely happens and the model can't recover from an early bad estimate. If you suspect this, force-reset via POST /api/v1/algorithm-models/<id>/reset-offer and warm-start from a Thompson prior.
- decayRate too aggressive — decayRate = 0.01 halves ε_t after 100 pulls. Combine with a small base ε and you'll have effectively no exploration after a few thousand requests. Calibrate against expected total traffic.
- Random number quality — the engine uses Math.random(), which is not cryptographically random but is good enough for traffic-allocation noise. Don't repurpose this model for security decisions.
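To calibrate decayRate against expected total traffic, it helps to tabulate ε_t at the request volumes you anticipate. A small hypothetical helper using the decay formula from the fixture arithmetic:

```typescript
// epsilon_t = epsilon0 / (1 + decayRate * t)
function epsilonAt(epsilon0: number, decayRate: number, t: number): number {
  return epsilon0 / (1 + decayRate * t);
}

// Aggressive decayRate = 0.01: epsilon halves after 100 pulls and is
// effectively gone (under 0.1%) by 10k requests.
const aggressive = [100, 1_000, 10_000].map((t) => epsilonAt(0.1, 0.01, t));

// Gentler decayRate = 0.0001: still around 5% exploration at 10k requests.
const gentle = [100, 1_000, 10_000].map((t) => epsilonAt(0.1, 0.0001, t));
```

Pick a decayRate such that ε_t is still meaningfully above zero at the traffic volume where you expect arm means to stabilize.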
Cross-reference
- Algorithm Selection Guide.
- Thompson Sampling Bandit — usually a better choice once you understand the theory.