

modelType: "online_learner" — a perceptron / online logistic regression that updates its weight vector on every observed outcome. No batch retrain phase; the model is always current as of the most recent reward.

When to use

  • You can’t afford a retrain pipeline — limited engineering capacity, fast-changing data.
  • You want a model that adapts to drift — seasonal patterns, campaign changes, news-driven shifts in customer behavior.
  • Numeric features only — the engine version handles numeric inputs natively (categoricals need preprocessing).

Skip it when you need calibrated probabilities (online updates can leave the model uncalibrated between adjustments) or when features are highly non-linear (use gradient_boosted).

The math

# Inference (same as logistic_regression):
z      = bias + Σ_i (weights[fᵢ] × xᵢ)
score  = sigmoid(z)

# After observing outcome:
error  = observed_reward - score
weights[fᵢ] += learningRate × error × xᵢ
bias        += learningRate × error

Here fᵢ is the feature name and xᵢ its numeric value. The learning rate η controls how aggressively each new outcome moves the weights. Common values: 0.01 (slow, stable) to 0.1 (fast, noisy).
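A minimal TypeScript sketch of the same math, assuming a weights map keyed by feature name (as in the fixture below). The names scoreCandidate and onlineUpdate are illustrative, not the engine's actual API:

type Features = Record<string, number>;

interface ModelState {
  weights: Record<string, number>;
  bias: number;
  learningRate: number;
}

const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

// Inference: bias plus the weighted sum of feature values, through a sigmoid.
function scoreCandidate(state: ModelState, features: Features): number {
  let z = state.bias;
  for (const [name, value] of Object.entries(features)) {
    z += (state.weights[name] ?? 0) * value; // unseen features contribute 0
  }
  return sigmoid(z);
}

// Update: move each weight toward the observed outcome by η × error × xᵢ.
function onlineUpdate(state: ModelState, features: Features, observedReward: number): void {
  const error = observedReward - scoreCandidate(state, features);
  for (const [name, value] of Object.entries(features)) {
    state.weights[name] = (state.weights[name] ?? 0) + state.learningRate * error * value;
  }
  state.bias += state.learningRate * error;
}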

Fixture config

{
  "modelType": "online_learner",
  "modelState": {
    "weights": { "credit_score": 0.004, "income": 0.000015, "age": 0.005 },
    "bias": -3.2,
    "learningRate": 0.01
  }
}
The proof script verifies this scores 0.8108 for the standard test customer, comparable to the 0.93 from logistic_regression (different weights, same functional form). The lower score reflects the more conservative weights you'd expect from an online learner that hasn't seen as many outcomes as a batch-trained logistic regression.
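To see how a score falls out of those weights, here is the earlier sketch applied to a hypothetical customer. The feature values below are made up for illustration, so the result differs from the proof script's 0.8108:

// Fixture weights from above; the customer values are hypothetical.
const state: ModelState = {
  weights: { credit_score: 0.004, income: 0.000015, age: 0.005 },
  bias: -3.2,
  learningRate: 0.01,
};

const score = scoreCandidate(state, { credit_score: 750, income: 60000, age: 40 });
// z = -3.2 + 0.004×750 + 0.000015×60000 + 0.005×40 = -3.2 + 3.0 + 0.9 + 0.2 = 0.9
// sigmoid(0.9) ≈ 0.711
console.log(score.toFixed(3)); // "0.711"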

Training

Updates happen via POST /api/v1/respond; auto-learn.ts applies the gradient update for each incoming outcome. There is no separate train endpoint. To bootstrap, either seed weights = {} and bias = 0 (cold start, every candidate scores 0.5), or seed with weights from a batch-trained logistic_regression model; the online learner can then refine them as outcomes arrive.
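Both bootstrap paths amount to choosing the initial model state. A sketch using the types from the earlier example; the warm-start numbers are invented for illustration:

// Cold start: empty weights and zero bias, so z = 0 and every candidate scores sigmoid(0) = 0.5.
const coldStart: ModelState = { weights: {}, bias: 0, learningRate: 0.01 };

// Warm start: copy the weight vector from a batch-trained logistic_regression model
// (illustrative values), then let onlineUpdate refine the copy as outcomes arrive.
const batchTrained = { weights: { credit_score: 0.005, income: 0.00002, age: 0.004 }, bias: -3.0 };
const warmStart: ModelState = {
  weights: { ...batchTrained.weights },
  bias: batchTrained.bias,
  learningRate: 0.01,
};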

Score interpretation

Same as logistic_regression — calibrated probability in [0, 1], conditional on the weight vector being stable. During rapid drift, the score is more “current estimate” than “calibrated probability”.

Pitfalls

  • High learning rate → instability — at η = 0.5, a single bad outcome can flip the sign of a weight. Stick to 0.01–0.1.
  • Feature scaling — same issue as logistic_regression. Standardize features before they reach the model; otherwise large-valued features dominate both the score and the update.
  • No regularization — the engine’s online learner has no L2 by default; weights can drift unboundedly on rare features. If you see exploding magnitudes, add a decay step in auto-learn (a sketch follows this list).
  • Lost-update on concurrent writes — two parallel respond calls can race on the same weight vector. The engine serializes writes to ModelAdaptation per (modelId, scopeId); if you customize the update path, preserve that contract.
  • No held-out evaluation — there’s no train/test split; you’re updating against the same stream that’s being scored. Watch for self-confirming loops (the model believes “premium customers respond” → only shows offers to premium → only learns from premium → over-confident on premium). Mix in exploration via shadowModelKeys if you suspect this.
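For the regularization pitfall above, one possible decay step is sketched below; onlineUpdateWithDecay and the decay value are assumptions for illustration, not engine defaults:

// Shrink every weight slightly on each update (L2-style weight decay) so weights
// cannot drift unboundedly, then apply the usual gradient step.
function onlineUpdateWithDecay(
  state: ModelState,
  features: Features,
  observedReward: number,
  decay = 0.0001,
): void {
  const error = observedReward - scoreCandidate(state, features);
  for (const name of Object.keys(state.weights)) {
    state.weights[name] *= 1 - decay;
  }
  for (const [name, value] of Object.entries(features)) {
    state.weights[name] = (state.weights[name] ?? 0) + state.learningRate * error * value;
  }
  state.bias += state.learningRate * error;
}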

Cross-reference