What it does
tools/scripts/run-negotiation-eval.ts invokes
runDefaultNegotiationEval(seed=7) from
lib/negotiation/eval-harness.ts, prints a JSON summary, and writes
the full report to
platform/perf/baselines/<date>-negotiation-eval.json.
The script’s exit code is the regression signal:
| Exit | Condition |
|---|---|
| 0 | report.zeroViolationClearance === true — every row passed AND every violation code fired at least once. |
| 1 | Any row failed OR uncovered violation codes exist. |
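The mapping in the table can be sketched as follows. This is a minimal illustration, not the harness's actual code: only `zeroViolationClearance` is named in this page, so `EvalReportSketch`, `computeClearance`, `exitCodeFor`, and the violation code names in the usage note are hypothetical.

```typescript
// Hypothetical report shape: only zeroViolationClearance comes from this page.
interface EvalReportSketch {
  zeroViolationClearance: boolean;
}

// Assumed helper: clearance requires every row to pass AND every known
// violation code to have fired at least once during the run.
function computeClearance(
  rowsPassed: boolean[],
  firedCodes: Set<string>,
  allCodes: string[],
): boolean {
  const everyRowPassed = rowsPassed.every((passed) => passed);
  const everyCodeFired = allCodes.every((code) => firedCodes.has(code));
  return everyRowPassed && everyCodeFired;
}

// Exit 0 only on full clearance; any failed row or uncovered code yields 1.
function exitCodeFor(report: EvalReportSketch): 0 | 1 {
  return report.zeroViolationClearance ? 0 : 1;
}
```

For example, with the made-up codes `CODE_A` and `CODE_B`, a run where every row passed but only `CODE_A` fired would not clear, so the script would exit 1 even though no row failed.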
When it runs
.github/workflows/nightly-eval.yml schedules the script at
06:17 UTC daily. The cron offset (17 minutes past) deconflicts
with other cron-driven jobs in the deployment.
workflow_dispatch: is also enabled so operators can re-run on
demand.
What gets archived
Each run uploads platform/perf/baselines/*-negotiation-eval.json
as the workflow artifact negotiation-eval-<run_id> with 30-day
retention. Operators can compare today’s report against a 7-day-prior
artifact to spot drift even when the daily exit code stays green.
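One way to do that comparison is a small diff over two downloaded reports. The sketch below assumes the report JSON carries some top-level numeric fields; the report schema is not documented on this page, so `numericDrift` and the field names in the usage note are hypothetical.

```typescript
type JsonObject = Record<string, unknown>;

interface FieldDrift {
  field: string;
  before: number;
  after: number;
  delta: number;
}

// Lists every top-level numeric field whose value differs between the
// older and newer report. Non-numeric and nested fields are ignored.
function numericDrift(older: JsonObject, newer: JsonObject): FieldDrift[] {
  const drifts: FieldDrift[] = [];
  for (const [field, after] of Object.entries(newer)) {
    const before = older[field];
    if (typeof before === "number" && typeof after === "number" && before !== after) {
      drifts.push({ field, before, after, delta: after - before });
    }
  }
  return drifts;
}
```

Given two parsed reports with made-up fields, `numericDrift({ passedRows: 40, totalRows: 40 }, { passedRows: 38, totalRows: 40 })` returns a single entry for `passedRows` with a delta of -2, while `totalRows` is omitted because it did not change.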
Honest limits
- Synthetic-only dataset. Production negotiation transcripts are not fed into this harness — they would expose customer PII and require red-team review before use.
- The eval script does not POST results anywhere (Slack, Discord) yet. Adding a webhook integration is roadmap work; for now operators watch the GitHub Actions failure email.
- Default seed is 7. To investigate a flake, vary
NEGOTIATION_EVAL_SEED via workflow_dispatch inputs.
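The seed override might be resolved along these lines. This is a sketch that assumes the script reads `NEGOTIATION_EVAL_SEED` from the environment and falls back to 7; `resolveSeed` is a hypothetical helper, and the real script may parse or validate the value differently.

```typescript
const DEFAULT_SEED = 7;

// Returns the default seed when the variable is unset or blank, and
// rejects anything that is not an integer so a typo fails loudly.
function resolveSeed(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_SEED;
  }
  const parsed = Number(raw);
  if (!Number.isInteger(parsed)) {
    throw new Error(`NEGOTIATION_EVAL_SEED must be an integer, got "${raw}"`);
  }
  return parsed;
}

// In a Node script this would typically be called as:
//   const seed = resolveSeed(process.env.NEGOTIATION_EVAL_SEED);
```

Failing on a malformed value, rather than silently falling back to 7, keeps a flake investigation from accidentally re-running the default seed.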
Cross-references
- Negotiation guardrail engine: lib/negotiation/guardrails.ts.
- Apply-mode + multi-turn scaffolds (lib/negotiation/apply-mode.ts, lib/negotiation/multi-turn.ts) remain deferred until eval-harness clearance produces enough signal for a regulator review.