modelType: "gradient_boosted" — a sum-of-trees ensemble (gradient boosting machine). Each tree contributes a small additive margin; the final sigmoid converts the cumulative margin to a probability. Often abbreviated “AGB” in operator parlance (Adaptive Gradient Boosted). The accuracy ceiling on tabular data is hard to beat with anything that’s not also a tree ensemble.
When to use
- You have ≥ 10k labeled outcomes, especially with many feature columns.
- You suspect non-linear interactions — “income matters more for high-credit-score customers” — that logistic_regression can’t capture without manual feature engineering.
- You want SHAP-explainable predictions — TreeSHAP runs in polynomial time on tree ensembles and gives exact per-feature attributions; see the sketch after this list.
If none of these apply, start with logistic_regression or bayesian first.
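A minimal sketch of the TreeSHAP check mentioned in the last bullet, assuming a trained LightGBM booster and the shap package; the synthetic data and variable names are illustrative, not engine objects:

```python
import numpy as np
import lightgbm as lgb
import shap

# Tiny synthetic stand-in for offline training data; feature columns are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

booster = lgb.train(
    {"objective": "binary", "verbose": -1},
    lgb.Dataset(X, label=y),
    num_boost_round=50,
)

explainer = shap.TreeExplainer(booster)   # exact, polynomial-time on tree ensembles
shap_values = explainer.shap_values(X)    # per-feature attribution for every row
```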
The math
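In symbols, assuming the standard GBM formulation described above (per-tree margins summed, then a sigmoid):

```latex
\mathrm{rawMargin}(x) = \sum_{t=1}^{T} f_t(x),
\qquad
\mathrm{score}(x) = \sigma\bigl(\mathrm{rawMargin}(x)\bigr) = \frac{1}{1 + e^{-\mathrm{rawMargin}(x)}}
```

where f_t(x) is the leaf value (margin) that tree t assigns to input x and T is the number of trees.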
Fixture config
A minimal 2-tree ensemble (the proof script uses this exact fixture) yields score = 0.8176 with rawMargin = 1.50 for the standard test customer. The path-contribution explanation correctly identifies income as the dominant feature (it appears in both trees as a deep split).
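The fixture numbers are consistent with the sigmoid mapping above; a quick check, with the per-tree margins as illustrative placeholders that sum to the fixture's rawMargin:

```python
import math

# Illustrative per-tree margins summing to 1.50; the real split structure
# lives in the proof-script fixture, not here.
tree_margins = [0.9, 0.6]

raw_margin = sum(tree_margins)                # 1.50
score = 1.0 / (1.0 + math.exp(-raw_margin))   # sigmoid of the cumulative margin
print(round(score, 4))                        # 0.8176
```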
Training
The engine doesn’t train GBTs in-process — it’s too expensive for a request-time pipeline. Train offline (Python + LightGBM or XGBoost, or your favorite GBT library) and import the model JSON into modelState.trees. The fixture's tree shape matches LightGBM’s JSON export format.
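A sketch of the offline training and export step, assuming LightGBM; the synthetic data is a stand-in for your labeled outcomes, and the exact mapping from LightGBM's dump into modelState.trees is an assumption here:

```python
import json
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labeled outcomes; replace with your real extract.
X, y = make_classification(n_samples=20_000, n_features=12, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

booster = lgb.train(
    {"objective": "binary", "learning_rate": 0.05, "num_leaves": 31, "verbose": -1},
    lgb.Dataset(X_train, label=y_train),
    num_boost_round=500,
    valid_sets=[lgb.Dataset(X_valid, label=y_valid)],
    callbacks=[lgb.early_stopping(stopping_rounds=50)],  # guards against overfitting
)

# dump_model() returns LightGBM's JSON-style dict; "tree_info" holds the trees.
model_json = booster.dump_model()

# How these trees map into the engine's modelState.trees is assumed, not confirmed.
with open("model_state.json", "w") as f:
    json.dump({"trees": model_json["tree_info"]}, f)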
For production: run training on a schedule, store the latest model artifact, swap modelState via PUT /api/v1/algorithm-models/<id> after a hold-out evaluation.
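A sketch of the swap step, assuming the PUT endpoint above accepts a JSON body carrying the new modelState; the host and payload schema are assumptions:

```python
import json
import requests

MODEL_ID = "your-model-id"  # placeholder
URL = f"https://api.example.com/api/v1/algorithm-models/{MODEL_ID}"  # placeholder host

with open("model_state.json") as f:
    model_state = json.load(f)

# Only swap after the new artifact beats the current one on the hold-out set.
resp = requests.put(URL, json={"modelState": model_state}, timeout=30)
resp.raise_for_status()
```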
Score interpretation
- score ∈ [0, 1] — calibrated probability (well-calibrated when the ensemble has enough trees and isotonic post-calibration was applied during training).
- rawMargin — the pre-sigmoid log-odds. Operators reading the trace can see how many trees voted positively vs negatively.
- explanations[] — top contributing features along the chosen path through each tree. Per-feature contributions sum to the rawMargin.
- shapValues — full TreeSHAP attributions when computeShap: true. More expensive but exact. See SHAP.
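For orientation, a hypothetical response shaped like the fields above; the field names come from this page, but the values and the explanation-object layout are illustrative:

```python
# Hypothetical score response; values are made up, not a captured payload.
response = {
    "score": 0.8176,                 # calibrated probability
    "rawMargin": 1.50,               # pre-sigmoid log-odds
    "explanations": [                # path contributions, summing to rawMargin
        {"feature": "income", "contribution": 0.9},
        {"feature": "credit_score", "contribution": 0.6},
    ],
    "shapValues": None,              # populated only when computeShap: true
}
```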
Pitfalls
- Overfitting on small data — trees memorize. If you have < 1k outcomes, the test-set accuracy will be much worse than the training-set accuracy. Use early stopping during offline training.
- Categorical encoding — GBT libraries handle categoricals natively if you declare them, but the engine’s tree format assumes numeric inputs. Pre-encode categoricals as ordinal (and let the GBT pick split points) or one-hot; see the encoding sketch after this list.
- Drift over time — tree splits are brittle to feature distribution shifts. Retrain monthly at minimum, weekly if conversion rate or customer mix is moving.
- Calibration drift after retraining — without isotonic post-calibration, the raw GBT score is a margin, not a probability. Run isotonic regression on a held-out set to keep sigmoid(margin) calibrated; see the calibration sketch after this list.
- Large model JSON — a 500-tree ensemble can be 5–50 MB. Watch modelState size; the engine reads it on every score call. Consider downsampling trees or using leaf quantization.
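A minimal encoding sketch for the categorical pitfall above, assuming pandas; the column names are illustrative:

```python
import pandas as pd

# Illustrative raw frame; the engine's tree format expects numeric inputs only.
df = pd.DataFrame({
    "industry": ["retail", "saas", "retail"],
    "income": [40_000, 90_000, 55_000],
})

# Option 1: ordinal codes, one integer per category; the GBT picks split points.
df["industry_ordinal"] = df["industry"].astype("category").cat.codes

# Option 2: one-hot, one 0/1 column per category.
one_hot = pd.get_dummies(df["industry"], prefix="industry", dtype=int)
```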
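A sketch of the isotonic post-calibration step from the calibration-drift pitfall, assuming scikit-learn; the hold-out margins and outcomes are placeholders:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Placeholder hold-out data: raw GBT margins and observed 0/1 outcomes.
raw_margins = np.array([-2.1, -0.7, 0.3, 1.5, 2.4])
y_holdout = np.array([0, 0, 1, 1, 1])

uncalibrated = 1.0 / (1.0 + np.exp(-raw_margins))  # sigmoid(margin)

# Fit a monotone mapping from uncalibrated scores to observed outcome rates.
calibrator = IsotonicRegression(out_of_bounds="clip")
calibrator.fit(uncalibrated, y_holdout)

calibrated_scores = calibrator.predict(uncalibrated)
```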
Cross-reference
- Algorithm Selection Guide.
- SHAP — TreeSHAP is exact and fast for this algorithm.
- Logistic Regression — try this first; promote to GBT only if it materially beats logistic on hold-out.