## What ships

| File | Purpose |
|---|---|
| `platform/perf/recommend-100rps.ts` | TS-first 100 RPS / 5 min sustained load test against `POST /api/v1/recommend`. No new dependency; uses `node:http`. |
| `platform/perf/recommend-5krps.ts` | Wrapper that re-invokes the 100rps script with `--rps 5000 --duration-sec 300`. Operator-pending: running 5K RPS reliably needs a multi-node k6 cloud cluster or a self-hosted load fleet. The TS script is the auditable spec; the operator publishes the baseline. |
| `platform/perf/recommend.js` | Original k6 scenario for 1K / 5K / 10K. Recommended way to actually drive load; the TS scripts are the source-of-truth scenario definition. |
| `platform/perf/baselines/<date>-recommend-<scenario>.json` | One file per published baseline. Lexical sort = chronological because of the date prefix. |
| `platform/perf/compare-baselines.mjs` | CI regression gate. Reads the latest two baselines; fails on p95 > +10% or error rate > +0.5 pp. |
| `tools/qa/decisioning-bench/harness/run.mjs` | Decisioning-quality bench harness; now supports `--concurrency N` for parallel customer scans. |
## Running the 100 RPS scenario
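A minimal sketch of a run. `--rps` and `--duration-sec` are the flags the 5K wrapper passes (see the table above); the `npx tsx` launcher and capturing the JSON report via stdout redirection are assumptions about how the script is invoked, not documented behaviour.

```shell
# Drive 100 RPS for 5 minutes and save the report as a dated baseline file.
OUT="platform/perf/baselines/$(date +%F)-recommend-100rps.json"
npx tsx platform/perf/recommend-100rps.ts --rps 100 --duration-sec 300 > "$OUT"
```

The `$(date +%F)` prefix (YYYY-MM-DD) is what keeps lexical sort equal to chronological sort for the gate.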
## Publishing a baseline
- Run the scenario above against a stable environment (no concurrent dev work; warm caches). Numbers from a cold-start dev server are not representative of production.
- Inspect the JSON. Sanity-check `latencyMs.p95` and `totals.errorRate` against your service's SLOs.
- Commit the baseline file under `platform/perf/baselines/`.
- Open a PR. The CI step `Perf-baseline regression gate (#24)` will compare your new baseline against the previous one and fail the build if either threshold is breached.
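For reference, a published baseline might look like the following sketch. Every value is an illustrative assumption; the only fields this page documents the gate reading are `latencyMs.p95` and `totals.errorRate` (assumed here to be a fraction, not a percentage).

```json
{
  "scenario": "recommend-100rps",
  "latencyMs": { "p50": 30, "p95": 62 },
  "totals": { "requests": 30000, "errorRate": 0.0007 }
}
```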
## CI gate semantics

`platform/perf/compare-baselines.mjs` reads the two most recent baseline files matching `*-recommend-100rps.json` (lexical sort → chronological because of the date prefix), parses them, and computes:
| Threshold | Default | Override flag |
|---|---|---|
| p95 latency growth | +10% | `--p95-pct-tolerance` |
| error rate growth | +0.5 percentage points | `--error-rate-pp-tolerance` |
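Both tolerances can be overridden per invocation. A sketch, assuming each flag takes its value as the next argument:

```shell
node platform/perf/compare-baselines.mjs \
  --p95-pct-tolerance 15 \
  --error-rate-pp-tolerance 1.0
```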
- **Zero baselines or directory missing.** Pass with a `[notice]` log. First-time setup: the gate is a no-op until the operator publishes.
- **One baseline.** Pass with a "no previous to compare" log.
- **Malformed JSON.** Fail. A silent regression cannot slip through bad data.
- **Missing `latencyMs.p95` or `totals.errorRate` field.** Fail.
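The pass/fail math reduces to two comparisons. A minimal sketch, not the actual `compare-baselines.mjs` code, assuming `totals.errorRate` is stored as a fraction so its delta must be converted to percentage points:

```javascript
// Returns true when the newer baseline breaches either default threshold.
function regressed(prev, curr, p95PctTol = 10, errPpTol = 0.5) {
  // Relative p95 growth, in percent.
  const p95GrowthPct =
    ((curr.latencyMs.p95 - prev.latencyMs.p95) / prev.latencyMs.p95) * 100;
  // Absolute error-rate growth, fraction converted to percentage points.
  const errGrowthPp = (curr.totals.errorRate - prev.totals.errorRate) * 100;
  return p95GrowthPct > p95PctTol || errGrowthPp > errPpTol;
}
```

Note the asymmetry: p95 is compared relatively (+10% of the previous value), while the error rate is compared absolutely (+0.5 pp), so a near-zero error rate does not make the gate hypersensitive.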
## Honest residual: 5K RPS is operator-pending

The user direction for #24 says explicitly: "5K RPS script ships but stays operator-pending (no AWS provisioning). If 5K RPS isn't actually run, the baseline JSON file is operator-pending and the 3.1 grade reflects that honestly." That is the case here. `recommend-5krps.ts` is the auditable load profile; running it reliably requires either:

- A multi-node k6 cloud cluster (preferred: k6 already has the worker-pool primitives the TS script lacks for this scale), or
- A self-hosted load fleet (one box per ~500 RPS budget).
Once the run happens, the operator publishes the result as `platform/perf/baselines/<date>-recommend-5krps.json` and either runs the gate manually with `--scenario recommend-5krps` or extends the CI step to invoke both scenarios.
## Decisioning-bench `--concurrency`
`tools/qa/decisioning-bench/harness/run.mjs` now accepts an optional `--concurrency N` flag (default `1` = sequential, matching prior behaviour). Regardless of concurrency, `predictions` is sorted by score before the AUC + fairness computation. The output JSON gains a top-level `concurrency: <N>` field so downstream comparisons don't conflate sequential vs parallel runs.
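A worker-pool sketch of what `--concurrency N` implies for the harness: keep up to N customer scans in flight, then sort the merged `predictions` by score so the output is independent of completion order. All names here (`scanAll`, `scanOne`, the `score` field) are illustrative assumptions, not the harness's actual API.

```javascript
// Scan every customer with at most `concurrency` scans in flight.
async function scanAll(customers, scanOne, concurrency = 1) {
  const predictions = [];
  let next = 0; // shared cursor; safe because JS workers only yield at `await`
  async function worker() {
    while (next < customers.length) {
      const i = next++;
      predictions.push(await scanOne(customers[i]));
    }
  }
  await Promise.all(Array.from({ length: concurrency }, worker));
  // Completion order is nondeterministic under concurrency; sorting by score
  // restores a deterministic order before AUC + fairness computation.
  predictions.sort((a, b) => b.score - a.score);
  return predictions;
}
```

With `concurrency = 1` a single worker drains the queue sequentially, which is why the default exactly reproduces the prior behaviour.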
## Roadmap

- Auto-generate a `<date>-recommend-100rps.json` in CI on push to `main` against a long-running staging environment, so the gate catches regressions on every PR. Today's manual-publish flow is the honest first cut.
- Extend the gate to compare medians, p99, and actual achieved RPS, not just p95 + error rate. The trade-off is signal vs noise: short-window 100 RPS samples can show large p99 swings on cold caches.