

What ships

  • platform/perf/recommend-100rps.ts: TS-first 100 RPS / 5 min sustained load test against POST /api/v1/recommend. No new dep — uses node:http.
  • platform/perf/recommend-5krps.ts: Wrapper that re-invokes the 100rps script with --rps 5000 --duration-sec 300. Operator-pending — running 5K RPS reliably needs a multi-node k6 cloud cluster or a self-hosted load fleet. The TS script is the auditable spec; the operator publishes the baseline.
  • platform/perf/recommend.js: Original k6 scenario for 1K / 5K / 10K RPS. Recommended way to actually drive load at those scales — the TS scripts are the source-of-truth scenario definition.
  • platform/perf/baselines/<date>-recommend-<scenario>.json: One file per published baseline. Lexical sort is chronological because of the date prefix.
  • platform/perf/compare-baselines.mjs: CI regression gate. Reads the latest two baselines and fails on p95 > +10 % or error_rate > +0.5 pp.
  • tools/qa/decisioning-bench/harness/run.mjs: Decisioning-quality bench harness — now supports --concurrency N for parallel customer scans.

Running the 100 RPS scenario

# Start the dev server in another terminal
cd platform && npm run dev

# In a fresh terminal, point the script at it
TENANT_ID=<your-tenant-uuid> API_KEY=<your-api-key> \
    npx tsx platform/perf/recommend-100rps.ts \
        --target http://localhost:3000

# Default flags: --rps 100 --duration-sec 300. Output JSON path:
#   platform/perf/baselines/<UTC-date>-recommend-100rps.json
The output is also printed to stdout so the operator can eyeball the percentiles before committing.
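
For orientation, the core of a fixed-rate node:http driver looks roughly like the sketch below. This is a minimal illustration, not the real recommend-100rps.ts: flag parsing, warm-up, full percentile math, and the baseline-file write are omitted, and the request payload is an assumption.

// Minimal sketch of a fixed-rate load driver over node:http.
// NOT the real recommend-100rps.ts; the payload shape is an assumption.
import http from "node:http";

const rps = 100;
const durationSec = 300;
const latencies: number[] = [];
let attempted = 0;
let errors = 0;

function fireOne(): void {
  attempted++;
  const started = process.hrtime.bigint();
  const req = http.request(
    "http://localhost:3000/api/v1/recommend",
    { method: "POST", headers: { "content-type": "application/json" } },
    (res) => {
      res.resume(); // drain the body so the socket is reusable
      res.on("end", () => {
        latencies.push(Number(process.hrtime.bigint() - started) / 1e6);
        if ((res.statusCode ?? 500) >= 400) errors++;
      });
    },
  );
  req.on("error", () => errors++);
  req.end(JSON.stringify({ tenantId: process.env.TENANT_ID })); // assumed payload
}

let ticks = 0;
const timer = setInterval(() => {
  for (let i = 0; i < rps; i++) fireOne(); // one batch per second = fixed RPS
  if (++ticks >= durationSec) {
    clearInterval(timer);
    setTimeout(() => { // grace period for in-flight requests
      latencies.sort((a, b) => a - b);
      const p95 = latencies[Math.floor(latencies.length * 0.95)];
      console.log(JSON.stringify({
        latencyMs: { p95 },
        totals: { errorRate: errors / attempted },
      }));
    }, 2000);
  }
}, 1000);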

Publishing a baseline

  1. Run the scenario above against a stable environment (no concurrent dev work; warm caches). Numbers from a cold-start dev server are not representative of production.
  2. Inspect the JSON. Sanity-check latencyMs.p95 and totals.errorRate against your service’s SLOs (see the illustrative shape after this list).
  3. Commit the baseline file under platform/perf/baselines/.
  4. Open a PR. The CI step Perf-baseline regression gate (#24) will compare your new baseline against the previous one and fail the build if either threshold is breached.
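
For reference, the gate only depends on two fields. The sketch below shows the minimal shape in TypeScript for clarity; anything beyond latencyMs.p95 and totals.errorRate is an assumption, and all numbers are invented.

// Minimal baseline shape the gate depends on. Everything beyond
// latencyMs.p95 and totals.errorRate is an assumption; numbers are invented.
interface Baseline {
  latencyMs: { p95: number };    // milliseconds
  totals: { errorRate: number }; // fraction in [0, 1], not a percentage
}

const example: Baseline = {
  latencyMs: { p95: 182.4 },
  totals: { errorRate: 0.002 }, // 0.2 %
};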

CI gate semantics

platform/perf/compare-baselines.mjs reads the two most recent baseline files matching *-recommend-100rps.json (lexical sort → chronological because of the date prefix), parses them, and computes:
p95DeltaPct  = ((current.latencyMs.p95 - previous.latencyMs.p95) / previous.latencyMs.p95) * 100
errDeltaPp   = (current.totals.errorRate - previous.totals.errorRate) * 100
The build fails when either delta exceeds the configured threshold:
  • p95 latency growth: +10 % by default; override with --p95-pct-tolerance
  • error rate growth: +0.5 percentage points by default; override with --error-rate-pp-tolerance
Edge cases:
  • Zero baselines or directory missing. Pass with a [notice] log. First-time setup — the gate is a no-op until the operator publishes a baseline.
  • One baseline. Pass with a “no previous to compare” log.
  • Malformed JSON. Fail. A silent regression cannot slip through bad data.
  • Missing latencyMs.p95 or totals.errorRate field. Fail.
The gate is read-only — CI never generates a baseline (a CI runner is not load-representative; using it would yield noisy thresholds). Baselines are produced by operators against a representative environment.
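
Put together, the gate's decision logic is roughly the following sketch, assuming the baseline shape above. The real compare-baselines.mjs is the authority; flag handling and exact log wording here are illustrative.

// Illustrative condensation of the gate's decision logic; not the real
// compare-baselines.mjs. Flag parsing and exact log wording are assumed.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const DIR = "platform/perf/baselines";
const P95_PCT_TOLERANCE = 10; // --p95-pct-tolerance default
const ERR_PP_TOLERANCE = 0.5; // --error-rate-pp-tolerance default

let files: string[] = [];
try {
  files = readdirSync(DIR).filter((f) => f.endsWith("-recommend-100rps.json")).sort();
} catch {
  console.log("[notice] baselines directory missing; gate is a no-op");
  process.exit(0);
}
if (files.length === 0) { console.log("[notice] zero baselines; gate is a no-op"); process.exit(0); }
if (files.length === 1) { console.log("no previous to compare"); process.exit(0); }

// Lexical sort is chronological thanks to the date prefix, so the last two
// entries are previous and current. Malformed JSON throws here, which fails
// the build: the gate fails closed on bad data.
const [previous, current] = files.slice(-2).map((f) => JSON.parse(readFileSync(join(DIR, f), "utf8")));
for (const v of [previous?.latencyMs?.p95, current?.latencyMs?.p95,
                 previous?.totals?.errorRate, current?.totals?.errorRate]) {
  if (typeof v !== "number") { console.error("missing required field; failing"); process.exit(1); }
}

const p95DeltaPct = ((current.latencyMs.p95 - previous.latencyMs.p95) / previous.latencyMs.p95) * 100;
const errDeltaPp = (current.totals.errorRate - previous.totals.errorRate) * 100;
if (p95DeltaPct > P95_PCT_TOLERANCE || errDeltaPp > ERR_PP_TOLERANCE) {
  console.error(`regression: p95 ${p95DeltaPct.toFixed(1)} %, error rate ${errDeltaPp.toFixed(2)} pp`);
  process.exit(1);
}
console.log("within tolerance");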

Honest residual: 5K RPS is operator-pending

The user direction for #24 says explicitly: “5K RPS script ships but stays operator-pending (no AWS provisioning). If 5K RPS isn’t actually run, the baseline JSON file is operator-pending and the 3.1 grade reflects that honestly.” That is the case here. recommend-5krps.ts is the auditable load profile; running it reliably requires either:
  • A multi-node k6 cloud cluster (preferred — k6 already has the worker-pool primitives the TS script lacks for this scale), or
  • A self-hosted load fleet (one box per ~500 RPS budget).
When the 5K baseline is published, drop it under platform/perf/baselines/<date>-recommend-5krps.json and either run the gate manually with --scenario recommend-5krps or extend the CI step to invoke both scenarios.
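
For reference, one plausible shape for the wrapper, assuming it simply shells out to the 100rps driver with overridden flags; the real recommend-5krps.ts may be structured differently, and the TARGET env var below is invented for illustration.

// Plausible sketch of recommend-5krps.ts as a thin wrapper; the real script
// may differ. Re-invokes the 100rps driver with the 5K flags listed above.
import { spawnSync } from "node:child_process";

const result = spawnSync(
  "npx",
  [
    "tsx", "platform/perf/recommend-100rps.ts",
    "--rps", "5000", "--duration-sec", "300",
    "--target", process.env.TARGET ?? "http://localhost:3000", // TARGET is an assumed env var
  ],
  { stdio: "inherit" },
);
process.exit(result.status ?? 1);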

Decisioning-bench --concurrency

tools/qa/decisioning-bench/harness/run.mjs now accepts an optional --concurrency N flag (default 1 = sequential, matching prior behaviour):
node tools/qa/decisioning-bench/harness/run.mjs \
    --dataset banking-cards \
    --target http://localhost:3000 \
    --concurrency 8
Internally this runs N async loops over a shared queue of holdout customers. Aggregation order remains deterministic — predictions is sorted by score before AUC + fairness computation. The output JSON gains a top-level concurrency: <N> field so downstream comparisons don’t conflate sequential vs parallel runs.
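
The pattern is roughly the sketch below; runWithConcurrency and scanCustomer are illustrative names, not the harness's real API.

// Illustrative worker-pool pattern: N async loops pull from one shared queue.
// runWithConcurrency and scanCustomer are invented names, not run.mjs internals.
async function runWithConcurrency<T, R>(
  items: T[],
  concurrency: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const queue = [...items];
  const results: R[] = [];
  const loops = Array.from({ length: concurrency }, async () => {
    // Completion order is nondeterministic, which is why the harness sorts
    // predictions by score before the AUC + fairness computation.
    for (let item = queue.shift(); item !== undefined; item = queue.shift()) {
      results.push(await worker(item));
    }
  });
  await Promise.all(loops);
  return results;
}

// Hypothetical usage over the holdout set:
// const predictions = await runWithConcurrency(holdoutCustomers, 8, scanCustomer);
// predictions.sort((a, b) => b.score - a.score);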

Roadmap

  • Auto-generate a <date>-recommend-100rps.json in CI on push to main against a long-running staging environment, so the gate catches regressions on every PR. Today’s manual-publish flow is the honest first cut.
  • Extend the gate to compare medians + p99 + actual achieved RPS, not just p95 + error rate. Trade-off is signal-vs-noise: short-window 100 RPS samples can show large p99 swings on cold caches.