modelType: "bayesian" — a Naive Bayes classifier with Laplace smoothing. Returns a calibrated posterior probability that a customer responds positively to an offer, given a set of binned predictors. The model is small, fast to train, and produces per-feature log-likelihood contributions that explain each score.

When to use

  • You have 1k–10k labeled outcomes — enough to estimate per-feature likelihoods, not enough to justify gradient_boosted.
  • You need a calibrated probability — the score IS a probability, not a relative ranking. Useful when downstream consumers (e.g. budget pacers) need to multiply by expected value.
  • Some predictors are categorical with moderate cardinality — Naive Bayes handles segment, tier, state natively; logistic_regression needs one-hot encoding.
  • Skip it when features are heavily correlated — the “naive” independence assumption hurts quality. Use logistic_regression or gradient_boosted instead.

The math

P(positive | x₁, x₂, ..., xₙ)  ∝  P(positive) × Π_i P(xᵢ | positive)
P(negative | x₁, x₂, ..., xₙ)  ∝  P(negative) × Π_i P(xᵢ | negative)

# Laplace smoothing (α = 1 default):
P(xᵢ = v | class) = (count(xᵢ = v, class) + α) / (count(class) + α × |unique values of xᵢ|)
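
For example, plugging in the credit_score counts from the fixture below — 90 positive outcomes in the top bin, 200 positive outcomes in total, 4 bins:

P(credit_score = bin_3 | positive) = (90 + 1) / (200 + 1 × 4) ≈ 0.446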

The engine works in log space to avoid underflow:

logOdds_positive = log(P(positive)) + Σ_i log(P(xᵢ | positive))
logOdds_negative = log(P(negative)) + Σ_i log(P(xᵢ | negative))

# Final score (probability of positive) — equivalent to expPos / (expPos + expNeg),
# where expPos = exp(logOdds_positive) and expNeg = exp(logOdds_negative):
score = sigmoid(logOdds_positive - logOdds_negative)

Numeric predictors are binned automatically — binEdges[field] defines bucket boundaries. Categorical predictors are used directly.
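
To make that concrete, here is a minimal TypeScript sketch of the scoring path. It is not the engine’s actual implementation: the ModelState shape simply mirrors the fixture below, the names (scoreBayesian, binValue, smoothedLikelihood) are hypothetical, and it assumes both count(class) and |unique values of xᵢ| can be read straight off the likelihood table.

// Illustrative sketch only — names and types are assumptions, not the engine's exports.
type ClassLabel = "positive" | "negative";

interface ModelState {
  priors: Record<ClassLabel, number>;                                       // raw outcome counts
  likelihoods: Record<string, Record<ClassLabel, Record<string, number>>>;  // field -> class -> bin/category -> count
  binEdges: Record<string, number[]>;                                       // numeric fields only
}

// Map a numeric value onto a bin label using the field's edges: [600, 700, 750] -> bin_0..bin_3.
function binValue(value: number, edges: number[]): string {
  let i = 0;
  while (i < edges.length && value >= edges[i]) i++;
  return `bin_${i}`;
}

// Smoothed likelihood: (count + α) / (classTotal + α × |unique values|), both read from the table.
function smoothedLikelihood(counts: Record<string, number>, key: string, alpha: number): number {
  const classTotal = Object.values(counts).reduce((a, b) => a + b, 0);
  const distinct = Object.keys(counts).length;
  return ((counts[key] ?? 0) + alpha) / (classTotal + alpha * distinct);
}

// Log-space scoring: sum per-feature log-likelihoods for each class, then squash the difference.
function scoreBayesian(
  state: ModelState,
  customer: Record<string, number | string>,
  alpha = 1,
): { score: number; explanations: { field: string; contribution: number }[] } {
  const totalOutcomes = state.priors.positive + state.priors.negative;
  let logPos = Math.log(state.priors.positive / totalOutcomes);
  let logNeg = Math.log(state.priors.negative / totalOutcomes);
  const explanations: { field: string; contribution: number }[] = [];

  for (const [field, perClass] of Object.entries(state.likelihoods)) {
    const raw = customer[field];
    if (raw === undefined) continue;
    const key = typeof raw === "number" ? binValue(raw, state.binEdges[field] ?? []) : raw;
    const logLikPos = Math.log(smoothedLikelihood(perClass.positive, key, alpha));
    const logLikNeg = Math.log(smoothedLikelihood(perClass.negative, key, alpha));
    logPos += logLikPos;
    logNeg += logLikNeg;
    explanations.push({ field, contribution: logLikPos - logLikNeg });
  }

  explanations.sort((a, b) => Math.abs(b.contribution) - Math.abs(a.contribution));
  const score = 1 / (1 + Math.exp(-(logPos - logNeg)));  // sigmoid(logOdds_positive - logOdds_negative)
  return { score, explanations };
}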

Fixture config

{
  "modelType": "bayesian",
  "config": { "laplaceSmoothingAlpha": 1 },
  "predictors": [
    { "field": "credit_score", "selected": true, "importance": 0.40 },
    { "field": "income",       "selected": true, "importance": 0.25 },
    { "field": "age",          "selected": true, "importance": 0.10 },
    { "field": "segment",      "selected": true, "importance": 0.25 }
  ],
  "modelState": {
    "priors": { "positive": 200, "negative": 800 },
    "likelihoods": {
      "credit_score": {
        "positive": { "bin_0": 5,  "bin_1": 25,  "bin_2": 80,  "bin_3": 90 },
        "negative": { "bin_0": 200,"bin_1": 350, "bin_2": 200, "bin_3": 50 }
      },
      "income": {
        "positive": { "bin_0": 10, "bin_1": 30,  "bin_2": 70,  "bin_3": 90 },
        "negative": { "bin_0": 250,"bin_1": 300, "bin_2": 180, "bin_3": 70 }
      },
      "segment": {
        "positive": { "Bronze": 10, "Silver": 40, "Gold": 80,  "Platinum": 70 },
        "negative": { "Bronze": 200,"Silver": 300,"Gold": 200, "Platinum": 100 }
      }
    },
    "binEdges": {
      "credit_score": [600, 700, 750],
      "income":       [40000, 70000, 100000]
    }
  }
}
The algorithm-coverage proof script verifies this fixture produces score = 0.812 for the standard test customer (credit_score=760, income=95000, age=38, segment=Gold), with credit_score contributing the most positive log-likelihood and segment + income reinforcing.
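
As a quick sanity check, the hypothetical scoreBayesian sketch from “The math” reproduces that number when run against this fixture (assuming the JSON above has been parsed into a fixture variable):

// Hypothetical usage of the sketch above — not the proof script itself.
const { score, explanations } = scoreBayesian(
  fixture.modelState as ModelState,
  { credit_score: 760, income: 95000, age: 38, segment: "Gold" },
  fixture.config.laplaceSmoothingAlpha,
);

console.log(score.toFixed(3));       // "0.812" — credit_score=760 falls in bin_3, income=95000 in bin_2
console.log(explanations[0].field);  // "credit_score" — the largest absolute contribution
// age is skipped: the fixture has no likelihoods or binEdges for it, so it only affects the score via the priors.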

Training

The engine’s train.ts increments priors[outcome] and likelihoods[field][outcome][bin] for every observed interaction. No SGD, no learning rate — counts go up, smoothed estimates fall out for free. To retrain: hit POST /api/v1/algorithm-models/<id>/train after a batch of new interactions. Or set the model’s auto-learn cron to incrementally update from each new respond call.
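
In sketch form, that incremental update is nothing more than count bumps. The trainOnInteraction name and argument shapes below are illustrative (reusing the hypothetical types from “The math”), not train.ts’s actual signature:

// Illustrative only — mirrors the counting behaviour described above, not train.ts itself.
function trainOnInteraction(
  state: ModelState,
  customer: Record<string, number | string>,
  outcome: ClassLabel,
): void {
  state.priors[outcome] += 1;  // class prior is just the running outcome count

  for (const [field, perClass] of Object.entries(state.likelihoods)) {
    const raw = customer[field];
    if (raw === undefined) continue;
    const key = typeof raw === "number" ? binValue(raw, state.binEdges[field] ?? []) : raw;
    perClass[outcome][key] = (perClass[outcome][key] ?? 0) + 1;  // smoothed estimates fall out at score time
  }
}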

Score interpretation

  • score ∈ [0, 1] — posterior probability of positive response.
  • confidence ∈ [0, 1] — heuristic based on sqrt(sample_count / 1000), sketched after this list. A confidence of 1.0 means we’ve seen 1k+ outcomes for this model.
  • explanations[] — sorted by absolute contribution. Each entry is {field, contribution: logLik(pos) - logLik(neg)}. Positive contributions push toward responding; negative ones away.
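
A hedged sketch of the confidence heuristic, assuming sample_count is the sum of the two prior counts and that the value is capped at 1 (the cap is implied by “1.0 means we’ve seen 1k+ outcomes”, not stated explicitly):

// Assumption: sample count = priors.positive + priors.negative; cap at 1.0.
function confidenceHeuristic(state: ModelState): number {
  const sampleCount = state.priors.positive + state.priors.negative;
  return Math.min(1, Math.sqrt(sampleCount / 1000));
}

// For the fixture above: sqrt((200 + 800) / 1000) = 1.0.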

Pitfalls

  • Strong predictor correlation — Naive Bayes double-counts correlated features (credit_score and income both signal affluence). Result: overconfident scores in the extremes. If you see scores clustering at 0.01 or 0.99, this is usually why.
  • Sparse bins — if a bin has 0 positive outcomes, α=1 smoothing still pulls its estimate to a tiny but non-zero rate. Acceptable as long as α is small relative to total samples.
  • Imbalanced priors — if conversion rate is 1% and priors aren’t reset before retrain, the prior overwhelms the per-feature likelihoods. Bayesian needs the priors to track actual class balance.
  • Categorical drift — if a new segment value appears that wasn’t in training data, it gets the smoothing-only count and basically falls back to the prior. Retrain when categories change.

Cross-reference

  • Algorithm Selection Guide — when to pick Bayesian over alternatives.
  • SHAP — Bayesian explanations are already per-feature log-likelihood contributions; no separate SHAP pass needed.