What it does
The Pega ADM Adaptive Boosting trainer claims it learns near the decision boundary by oversampling marginal predictions. Kaireon's counterfactual trainer is the open-source equivalent (sketched after this list):

- Score every row of the labeled training set with the current `gradient_boosted` model.
- Identify marginal rows: predicted probability in `[0.5 - marginalBand .. 0.5 + marginalBand]` (default band 0.1).
- Generate `K` synthetic neighbors per marginal row by perturbing each numeric feature with Gaussian noise scaled to the observed feature standard deviation (defaults: 0.5σ, `K = 4`).
- Append the synthetics to the training set; the augmented set is what gets sent to the existing `trainGBMRemote()` pipeline.
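A minimal sketch of that loop. The function name `augmentCounterfactual` and the `Row` shape are assumptions for illustration; the band, noise scale, `K`, `maxSynthetic`, seed, and mulberry32 RNG are the defaults documented on this page.

```ts
// Sketch only: augmentCounterfactual and Row are illustrative names; the
// defaults (band 0.1, 0.5σ noise, K = 4, maxSynthetic 10_000, seed 7,
// mulberry32) are the ones documented above.
type FeatureValue = number | boolean | string;
type Row = { features: Record<string, FeatureValue>; label: number };

interface AugmentOptions {
  marginalBand?: number; // half-width of the marginal window around 0.5
  k?: number;            // synthetic neighbors per marginal row
  noiseScale?: number;   // noise σ as a fraction of the feature's observed σ
  maxSynthetic?: number; // hard cap on synthetic rows per pass
  seed?: number;         // PRNG seed for reproducibility
}

// mulberry32: tiny deterministic PRNG; the same seed yields the same stream.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

export function augmentCounterfactual(
  rows: Row[],
  score: (row: Row) => number, // current gradient_boosted model
  opts: AugmentOptions = {},
): { rows: Row[]; summary: { syntheticRowsAdded: number } } {
  const {
    marginalBand = 0.1,
    k = 4,
    noiseScale = 0.5,
    maxSynthetic = 10_000,
    seed = 7,
  } = opts;
  const rand = mulberry32(seed);

  // Observed σ per numeric feature; boolean/categorical features are skipped.
  const numericKeys = Object.keys(rows[0]?.features ?? {}).filter(
    (key) => typeof rows[0].features[key] === "number",
  );
  const sigma: Record<string, number> = {};
  for (const key of numericKeys) {
    const vals = rows.map((r) => r.features[key] as number);
    const mean = vals.reduce((s, v) => s + v, 0) / vals.length;
    sigma[key] = Math.sqrt(vals.reduce((s, v) => s + (v - mean) ** 2, 0) / vals.length);
  }

  // Box-Muller transform driven by the seeded PRNG.
  const gaussian = () =>
    Math.sqrt(-2 * Math.log(1 - rand())) * Math.cos(2 * Math.PI * rand());

  const synthetics: Row[] = [];
  for (const row of rows) {
    if (Math.abs(score(row) - 0.5) > marginalBand) continue; // not marginal
    for (let i = 0; i < k && synthetics.length < maxSynthetic; i++) {
      const features = { ...row.features };
      for (const key of numericKeys) {
        features[key] = (features[key] as number) + gaussian() * noiseScale * sigma[key];
      }
      synthetics.push({ features, label: row.label }); // label preserved
    }
  }
  return { rows: rows.concat(synthetics), summary: { syntheticRowsAdded: synthetics.length } };
}
```

Synthetic rows keep the parent row's label; only numeric features move, which is why the limits below call out numeric-only support.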
Honest limits
- Numeric only. Boolean / categorical features are carried over as-is on synthetic rows. Perturbation rules for them are undefined; silently perturbing them would corrupt training data.
- Deterministic. A mulberry32 RNG is seeded by `options.seed` (default 7), so two runs with the same options produce the same synthetic data (example below).
- Bounded budget. `maxSynthetic` defaults to 10_000. The function surfaces a `summary.syntheticRowsAdded` count so operators can audit how much data was added per training pass.
- Not a feature-store substitute. This augments the training set at call time; it does not modify any persisted dataset.
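For example, using the hypothetical `augmentCounterfactual` sketch above, determinism and the audit counter look like:

```ts
// Same options produce byte-identical synthetics; the summary is the audit hook.
const a = augmentCounterfactual(rows, score, { seed: 7 });
const b = augmentCounterfactual(rows, score, { seed: 7 });
console.assert(JSON.stringify(a.rows) === JSON.stringify(b.rows));
console.log(`synthetic rows added: ${a.summary.syntheticRowsAdded}`);
```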
Tenant-settings flag
Operators opt in per-tenant (the augmenter does not run by default; training time and cost rise proportionally to `K × marginalCount`):
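The exact settings keys aren't shown on this page; a purely illustrative shape, with every key name an assumption:

```ts
// Hypothetical per-tenant settings shape; the real key names may differ.
const tenantSettings = {
  counterfactualAugmentation: {
    enabled: false,      // opt-in: the augmenter is off by default
    marginalBand: 0.1,
    k: 4,
    noiseScale: 0.5,     // fraction of observed feature σ
    maxSynthetic: 10_000,
    seed: 7,
  },
};
```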
`lib/scoring/train.ts` consumes these settings via the existing `getImportSettings`-pattern reader. With the flag off, the trainer bypasses the augmenter entirely.
API surface
The augmenter is a pure TS function. There is no HTTP endpoint; it runs inline at training time. Callers use it like the sketch below:
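A representative call, assuming the `augmentCounterfactual` signature sketched under "What it does"; `labeledRows` and `scoreRow` are also illustrative names, while `trainGBMRemote()` is the existing pipeline named above.

```ts
// Illustrative call site; augmentCounterfactual and scoreRow are assumptions.
const { rows: augmented, summary } = augmentCounterfactual(
  labeledRows,
  scoreRow,                        // wraps the current gradient_boosted model
  { marginalBand: 0.1, k: 4, seed: 7 },
);
await trainGBMRemote(augmented);   // unchanged downstream pipeline
console.log(`added ${summary.syntheticRowsAdded} synthetic rows`);
```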
Honest comparison with Pega's claim
Pega documents Adaptive Boosting's marginal-emphasis behavior at the algorithm level; the implementation lives behind their proprietary ADM. Kaireon's augmenter takes the same approach but exposes it as an auditable function: every synthetic row is reproducible from the seed, the input set, and the scorer.

What we don't claim: Pega's full ADM pipeline also includes binned-Bayes predictor grouping, online incremental updates, and a learning-rate schedule that varies by row. Counterfactual augmentation is one of those mechanisms, not all of them. The composite ADM grade (§3.2 Adaptive learning) reflects this honestly.
Tests
`platform/src/lib/ml/__tests__/counterfactual-trainer.test.ts` holds 8 tests covering: empty-set passthrough, numeric-vs-skipped feature classification, marginal-row identification, no-marginal-row passthrough, the `maxSynthetic` cap, label preservation, boolean features left unchanged, and deterministic seeding.
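A sketch of the deterministic-seed case, assuming the `augmentCounterfactual` sketch above and a Vitest-style runner (both assumptions, not the actual test file):

```ts
import { describe, expect, it } from "vitest"; // runner choice is an assumption

describe("counterfactual trainer", () => {
  it("produces identical synthetics for the same seed", () => {
    const rows = [{ features: { age: 30, active: true }, label: 1 }];
    const score = () => 0.5; // every row counts as marginal
    const a = augmentCounterfactual(rows, score, { seed: 7, k: 2 });
    const b = augmentCounterfactual(rows, score, { seed: 7, k: 2 });
    expect(a.rows).toEqual(b.rows);               // deterministic seed
    expect(a.summary.syntheticRowsAdded).toBe(2); // K per marginal row
    expect(a.rows[1].features.active).toBe(true); // boolean feature unchanged
  });
});
```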