
Documentation Index

Fetch the complete documentation index at: https://docs.kaireonai.com/llms.txt

Use this file to discover all available pages before exploring further.

What it does

The counterfactual trainer is a pre-train hook that sharpens the decision boundary of the gradient_boosted model by augmenting the training set with synthetic rows near low-confidence predictions. It runs entirely in TypeScript before the existing remote-GBM training call, so the Python ml-worker stays unchanged. The augmenter produces an enriched training set plus a deterministic summary describing how many synthetic rows were added and which feature columns were perturbed.
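The exact return shape is not spelled out on this page; the following is a plausible sketch. Only summary.syntheticRowsAdded is documented below — the other field names (marginalRows, perturbedColumns, seed) and the type names are assumptions for illustration.

```typescript
// Hedged sketch of the augmenter's result types. Field names other than
// syntheticRowsAdded are hypothetical, not the library's actual API.
interface AugmentSummary {
  syntheticRowsAdded: number; // documented: lets operators audit added data
  marginalRows: number;       // hypothetical: rows that fell in the marginal band
  perturbedColumns: string[]; // hypothetical: numeric columns that were perturbed
  seed: number;               // hypothetical: echoed back for reproducibility
}

interface AugmentResult<Row> {
  augmented: Row[];           // original rows + synthetic neighbors
  summary: AugmentSummary;
}

// Example of the deterministic summary an operator might log:
const summary: AugmentSummary = {
  syntheticRowsAdded: 120,
  marginalRows: 30,
  perturbedColumns: ["score", "latencyMs"],
  seed: 7,
};
console.log("counterfactual augmentation:", summary);
```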

Honest limits

  • Numeric only. Boolean and categorical features are held as-is on synthetic rows. Perturbation rules for them are undefined, and silently perturbing them would corrupt training data.
  • Deterministic. A mulberry32 RNG is seeded by options.seed (default 7), so two runs with the same options produce the same synthetic data.
  • Bounded budget. maxSynthetic defaults to 10_000, and the function surfaces summary.syntheticRowsAdded so operators can audit how much data was added per training pass.
  • Not a feature-store substitute. This augments the training set at call time. It does not modify any persisted dataset.
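The "numeric only" rule amounts to a column classification pass: a column is perturbable only when every row holds a finite number in it. A minimal sketch, assuming rows are flat records — the helper name classifyColumns is illustrative, not the library's API:

```typescript
type FeatureValue = number | boolean | string;
type FeatureRow = Record<string, FeatureValue>;

// Illustrative sketch: columns where every row holds a finite number are
// perturbable; boolean/categorical columns are copied onto synthetic rows
// unchanged, never perturbed.
function classifyColumns(rows: FeatureRow[]): { perturbable: string[]; heldAsIs: string[] } {
  const perturbable: string[] = [];
  const heldAsIs: string[] = [];
  const columns = rows.length > 0 ? Object.keys(rows[0]) : [];
  for (const col of columns) {
    const allNumeric = rows.every(
      (r) => typeof r[col] === "number" && Number.isFinite(r[col] as number)
    );
    (allNumeric ? perturbable : heldAsIs).push(col);
  }
  return { perturbable, heldAsIs };
}

const { perturbable, heldAsIs } = classifyColumns([
  { latencyMs: 120, isRetry: false, region: "eu-west" },
  { latencyMs: 95, isRetry: true, region: "us-east" },
]);
// perturbable: ["latencyMs"]; heldAsIs: ["isRetry", "region"]
```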

Tenant-settings flag

Operators opt in per-tenant (the augmenter does not run by default, since training time and cost rise in proportion to K × marginalCount):
{
  "aiAnalyzerSettings": {
    "ml": {
      "counterfactualTrainingEnabled": true,    // default false
      "marginalBand": 0.1,                      // 0..0.5
      "syntheticPerRow": 4,                     // 1..16
      "maxSynthetic": 10000                     // hard cap
    }
  }
}
The wiring in lib/scoring/train.ts consumes these via the existing getImportSettings-pattern reader. With the flag off, the trainer bypasses the augmenter entirely.
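A hedged sketch of what that reader might look like, applying the documented defaults and ranges (flag default false, marginalBand 0..0.5, syntheticPerRow 1..16, maxSynthetic hard cap 10_000). The names MlSettings and readMlSettings are assumptions; the real reader follows the existing getImportSettings pattern, whose shape is not shown on this page:

```typescript
// Illustrative settings reader: apply defaults, clamp to documented ranges.
interface MlSettings {
  counterfactualTrainingEnabled: boolean; // default false
  marginalBand: number;                   // clamped to 0..0.5
  syntheticPerRow: number;                // clamped to 1..16
  maxSynthetic: number;                   // hard cap 10_000
}

const clamp = (v: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, v));

function readMlSettings(raw: Partial<MlSettings> | undefined): MlSettings {
  return {
    counterfactualTrainingEnabled: raw?.counterfactualTrainingEnabled ?? false,
    marginalBand: clamp(raw?.marginalBand ?? 0.1, 0, 0.5),
    syntheticPerRow: Math.round(clamp(raw?.syntheticPerRow ?? 4, 1, 16)),
    maxSynthetic: Math.min(raw?.maxSynthetic ?? 10_000, 10_000),
  };
}

// With no tenant override, the flag stays off and the augmenter is bypassed:
const settings = readMlSettings(undefined);
// settings.counterfactualTrainingEnabled === false
```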

API surface

The augmenter is a pure TS function. There is no HTTP endpoint — it runs inline at training time. Callers use it like:
import { augmentWithCounterfactuals } from "@/lib/ml/counterfactual-trainer";
import { trainGBMRemote } from "@/lib/ml-worker-client";
import { scoreGradientBoosted } from "@/lib/scoring/gradient-boosted";

const scorer = (features) => scoreGradientBoosted(currentModel, features).p;
const { augmented, summary } = augmentWithCounterfactuals({
  request,
  scorer,
  options: { marginalBand: 0.1, syntheticPerRow: 4, seed: 7 },
});
console.log("counterfactual augmentation:", summary);

const result = await trainGBMRemote(augmented);

Algorithm — what it does, what it doesn’t

What the augmenter does, step by step:
  1. Score every row of the labeled training set with the current gradient_boosted model via the supplied scorer.
  2. Identify marginal rows — predicted probability in [0.5 - marginalBand .. 0.5 + marginalBand] (default band 0.1).
  3. For each marginal row, generate K synthetic neighbors by perturbing each numeric feature with Gaussian noise scaled to the observed feature standard deviation (default 0.5σ, K=4).
  4. Append the synthetic rows to the training set and return the augmented set with a reproducibility summary.
What the augmenter does not do:
  • It does not perturb boolean or categorical features. Those are held as-is on synthetic rows.
  • It does not perform binned-Bayes predictor grouping or online incremental updates. The augmenter only widens the training set; the gradient-boosted trainer itself does the learning.
  • It does not vary the learning-rate schedule per row. The augmented set is fed to the remote GBM trainer with the same hyperparameters as any other training pass.
  • It does not modify any persisted dataset. Augmentation happens at call time only.
Every synthetic row is reproducible from the seed, the input set, and the scorer.
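The four steps can be sketched end to end. This is a minimal illustration, not the library's implementation: mulberry32 and the default knobs (band 0.1, K=4, 0.5σ noise, seed 7, 10_000 cap) come from this page, while the function and field names below are assumptions.

```typescript
type Row = { features: Record<string, number | boolean | string>; label: number };

// mulberry32: tiny deterministic PRNG — same seed, same synthetic rows.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Box-Muller: two uniform draws → one standard-normal draw.
function gaussian(rng: () => number): number {
  const u = 1 - rng(); // avoid log(0)
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rng());
}

function augmentSketch(
  rows: Row[],
  scorer: (features: Row["features"]) => number,
  opts = { marginalBand: 0.1, syntheticPerRow: 4, noiseScale: 0.5, maxSynthetic: 10_000, seed: 7 }
): { augmented: Row[]; syntheticRowsAdded: number } {
  const rng = mulberry32(opts.seed);

  // Per-column standard deviation over numeric features (step 3's noise scale).
  const numericCols = rows.length
    ? Object.keys(rows[0].features).filter((c) => rows.every((r) => typeof r.features[c] === "number"))
    : [];
  const std: Record<string, number> = {};
  for (const c of numericCols) {
    const vals = rows.map((r) => r.features[c] as number);
    const mean = vals.reduce((a, b) => a + b, 0) / vals.length;
    std[c] = Math.sqrt(vals.reduce((a, b) => a + (b - mean) ** 2, 0) / vals.length);
  }

  // Steps 1-2: score every row, keep those inside the marginal band.
  const marginal = rows.filter((r) => Math.abs(scorer(r.features) - 0.5) <= opts.marginalBand);

  // Step 3: K Gaussian neighbors per marginal row, numeric columns only,
  // bounded by maxSynthetic. Booleans/categoricals are copied unchanged.
  const synthetic: Row[] = [];
  for (const row of marginal) {
    for (let k = 0; k < opts.syntheticPerRow; k++) {
      if (synthetic.length >= opts.maxSynthetic) break;
      const features = { ...row.features };
      for (const c of numericCols) {
        features[c] = (features[c] as number) + gaussian(rng) * opts.noiseScale * std[c];
      }
      synthetic.push({ features, label: row.label }); // label preserved
    }
  }

  // Step 4: append and report.
  return { augmented: [...rows, ...synthetic], syntheticRowsAdded: synthetic.length };
}
```

Running this twice with the same inputs yields byte-identical synthetic rows, which is what makes summary-based audits meaningful.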

Tests

The counterfactual-trainer test suite ships 8 cases covering: empty-set passthrough, numeric-vs-skipped feature classification, marginal-row identification, no-marginal-row passthrough, maxSynthetic cap, label preservation, boolean-feature unchanged, deterministic seed.