Documentation Index
Fetch the complete documentation index at: https://docs.kaireonai.com/llms.txt
Use this file to discover all available pages before exploring further.
modelType: "logistic_regression" — a single-layer linear model: dot-product the customer’s feature vector with a learned weight vector, add a bias, push through a sigmoid. Probably the most-deployed classifier in production decisioning systems for a reason: cheap to train, cheap to score, easy to defend.
When to use
- You have ≥ 1k labeled outcomes and numeric features — the linear weighted-sum structure benefits from numeric inputs (categoricals need one-hot).
- You need calibrated probabilities for budget pacing or expected-value calculations — the sigmoid output is calibrated within the linear region.
- You’re comparing against a Bayesian baseline — logistic and Bayesian are the two “first-real-model” picks. Train both and A/B test them via shadowModelKeys[].
- If interactions between features drive the outcome, a linear model can’t capture them — use gradient_boosted instead.
The math
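In symbols, the weighted-sum-plus-sigmoid structure described in the overview is the standard logistic regression form:

```latex
p(\text{respond} \mid x) = \sigma(w \cdot x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

where $w$ is the learned weight vector, $x$ the customer’s feature vector, and $b$ the bias (intercept).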
Fixture config
The fixture config scores 0.930 for the standard test customer. Highest contribution: credit_score, 760 × 0.005 = 3.8 (raw); next, income, 95000 × 0.00002 = 1.9.
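A minimal scoring sketch. Only the credit_score (0.005) and income (0.00002) weights appear in the text above; the bias and any remaining fixture weights are not given, so the bias below is a made-up placeholder and the result will not reproduce the 0.930 fixture score.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def score(features: dict, weights: dict, bias: float) -> float:
    """Dot-product of features with weights, plus bias, through a sigmoid."""
    z = bias + sum(weights[name] * features[name] for name in features)
    return sigmoid(z)

weights = {"credit_score": 0.005, "income": 0.00002}
customer = {"credit_score": 760, "income": 95000}

# bias=-5.0 is illustrative only; z = -5.0 + 3.8 + 1.9 = 0.7
p = score(customer, weights, bias=-5.0)
```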
Training
POST /api/v1/algorithm-models/<id>/train runs SGD over the observed interactions. Weights converge to maximize log-likelihood. The training routine’s hyperparameters (learning rate, L2 strength, epoch count) live on model.config:
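A sketch of what such a hyperparameter block might look like. The field names here are assumptions mapped from the prose (learning rate, L2 strength, epoch count), not confirmed fields of model.config:

```python
# Hypothetical hyperparameter block -- field names are illustrative,
# not confirmed model.config keys.
training_config = {
    "learningRate": 0.01,   # SGD step size
    "l2Strength": 0.001,    # L2 regularization coefficient
    "epochs": 50,           # passes over the observed interactions
}
```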
One-hot encoding maps segment="Gold" to 1 if present, 0 if absent. Multi-valued categoricals (e.g. a segment field with four possible values) need 4 binary features.
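The one-hot expansion can be sketched as follows. The four level names are illustrative, not taken from the fixture:

```python
def one_hot(value: str, levels: list) -> dict:
    """Expand one categorical value into one binary feature per level."""
    return {f"segment={level}": (1 if value == level else 0) for level in levels}

# Hypothetical four-level segment -- level names are illustrative.
levels = ["Bronze", "Silver", "Gold", "Platinum"]
encoded = one_hot("Gold", levels)
# {'segment=Bronze': 0, 'segment=Silver': 0, 'segment=Gold': 1, 'segment=Platinum': 0}
```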
Score interpretation
- score ∈ [0, 1] — calibrated probability.
- explanations[] — per-feature contribution weight × value, sorted by absolute magnitude. Positive contributions push toward responding.
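The explanations field, as described, can be sketched like this (feature names and weights reuse the fixture numbers above; the age weight is a made-up example):

```python
def explanations(features: dict, weights: dict) -> list:
    """Per-feature contribution weight * value, sorted by absolute magnitude."""
    contribs = [(name, weights[name] * value) for name, value in features.items()]
    return sorted(contribs, key=lambda pair: abs(pair[1]), reverse=True)

top = explanations(
    {"credit_score": 760, "income": 95000, "age": 42},
    {"credit_score": 0.005, "income": 0.00002, "age": -0.01},  # age weight is hypothetical
)
```

The first entry is credit_score at 3.8, matching the fixture’s highest contribution; the negative age contribution pushes the score down.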
Pitfalls
- Categoricals treated as scalars — segment = 3 for Gold is nonsense (no ordinal relationship). Always one-hot expand.
- Unscaled features — credit_score (300–850) and income (0–500000) on the same model dominate age (18–80). Standardize to z-scores or min-max normalize before training; otherwise weights for small-magnitude features get pushed to zero by L2.
- Multicollinearity — heavily correlated features split the credit; explanations become misleading. Drop one of each correlated pair.
- Missing intercept — leaving bias at 0 forces every score through the origin. Always include the bias term.
- Class imbalance — if the positive rate is 1% and the loss is unweighted, the model learns to always predict “negative”. Use class weighting or downsample negatives in training.
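The z-score standardization recommended above can be sketched in a few lines (column values are illustrative credit scores):

```python
def zscore(values: list) -> list:
    """Standardize a feature column to zero mean, unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

# An illustrative credit_score column: raw range ~300-850 shrinks to ~[-1.2, 1.2],
# so it no longer dominates small-magnitude features like age.
scaled = zscore([300, 850, 575])
```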
Cross-reference
- Algorithm Selection Guide.
- Bayesian — natural baseline comparison.
- Gradient Boosted Trees — pick this instead when interactions matter.