

Datasets

Dataset         Customers  Offers  Domain
banking-cards   5,000      12      Credit-card propensity & cross-sell
telco-churn     8,000      6       Retention offer routing
retail-loyalty  10,000     20      Loyalty-tier upgrade & coupon
Each dataset ships customers.csv, offers.json, outcomes.csv, splits.json, and meta.json. All data is synthetic; there is no PII. Rows are generated by tools/qa/decisioning-bench/datasets/generate.ts with a hard-coded seed, so every checkout reproduces the same rows.
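The reproducibility guarantee rests on seeding the generator. A minimal sketch of the idea, using mulberry32 as an illustrative PRNG (generate.ts may well use a different one, and the row format here is invented):

```typescript
// Sketch: why a hard-coded seed yields reproducible rows.
// mulberry32 is an illustrative PRNG choice, not necessarily the one
// generate.ts uses.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two runs with the same seed emit identical "customer" rows.
function sampleCustomers(seed: number, n: number): string[] {
  const rand = mulberry32(seed);
  return Array.from({ length: n }, (_, i) => `cust-${i}-${Math.floor(rand() * 1e6)}`);
}
```

Because the stream is a pure function of the seed, regenerating the datasets on any checkout produces byte-identical CSVs.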

Submission contract

A submission is a Docker image that:
  1. Listens on :8080.
  2. Accepts POST /recommend with body { customerId, channelId?, attributes? } and returns { decisionTraceId, offers: [{ offerId, score, rank }] }.
  3. Accepts POST /respond with body { customerId, outcome, ... }.
  4. Sustains 100 RPS for 5 minutes (the bench harness ramps up to that rate).
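A minimal sketch of a response builder that satisfies the /recommend shape above. The scoring model is a placeholder, and using a UUID for decisionTraceId is an assumption (the contract only requires a trace identifier); wiring this into an HTTP server on :8080 is left out.

```typescript
import { randomUUID } from "node:crypto";

interface ScoredOffer { offerId: string; score: number; rank: number; }
interface RecommendResponse { decisionTraceId: string; offers: ScoredOffer[]; }

// Turn a map of candidate offer scores into the contract's response shape:
// offers sorted by descending score, with 1-based ranks.
function recommend(customerId: string, candidateScores: Record<string, number>): RecommendResponse {
  const offers = Object.entries(candidateScores)
    .sort(([, a], [, b]) => b - a) // highest score first
    .map(([offerId, score], i) => ({ offerId, score, rank: i + 1 }));
  return { decisionTraceId: randomUUID(), offers };
}
```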

Running the harness

cd tools/qa/decisioning-bench
docker run -d --name submission -p 8080:8080 my/submission:latest
node harness/run.mjs --dataset banking-cards --target http://localhost:8080
Results land at results/<dataset>/<utc-timestamp>.json plus a leaderboard CSV.
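Because the harness names result files by UTC timestamp, the newest run in results/<dataset>/ can be found with a plain lexicographic sort. A small sketch, assuming ISO-8601-style filenames (the exact filename format used by the harness is an assumption):

```typescript
// UTC ISO-8601-style timestamps sort lexicographically, so the newest
// results file is simply the maximum .json filename.
function latestResult(filenames: string[]): string | undefined {
  return filenames
    .filter((f) => f.endsWith(".json")) // skip the leaderboard CSV etc.
    .sort()
    .at(-1);
}
```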

Scoring rubric

Dimension                      Weight
Latency p99                    20%
AUC (rank-based)               25%
Fairness gap (worst DI ratio)  15%
Explanation quality            10%
Uplift over random             30%
The weighted composite is normalized to 0-100.
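Assuming each dimension has already been normalized to [0, 1] (the harness's per-dimension normalization is not specified here), the weighted composite can be sketched as:

```typescript
// Rubric weights in percent; they sum to 100, so a weighted sum of
// per-dimension scores in [0, 1] lands directly on the 0-100 scale.
const WEIGHTS = {
  latencyP99: 20,
  auc: 25,
  fairnessGap: 15,
  explanationQuality: 10,
  upliftOverRandom: 30,
} as const;

function composite(normalized: Record<keyof typeof WEIGHTS, number>): number {
  let total = 0;
  for (const [dim, w] of Object.entries(WEIGHTS)) {
    total += w * normalized[dim as keyof typeof WEIGHTS];
  }
  return total;
}
```

A perfect score on every dimension yields 100; an all-zero submission yields 0.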

Honest limits

  • V1 datasets are synthetic. Real-world distribution shift is not modeled — this is a capability check, not a market-fit signal.
  • Latency is measured from the harness on a single host. Multi-pod scale isn’t tested here; that’s k6’s job.
  • The repo currently lives in tools/qa/decisioning-bench/. Splitting to a dedicated kaireonai/decisioning-bench repo + GitHub-Pages leaderboard is operator-driven (needs repo creation + Pages config).