What it does

tools/scripts/run-negotiation-eval.ts invokes runDefaultNegotiationEval(seed=7) from lib/negotiation/eval-harness.ts, prints a JSON summary, and writes the full report to platform/perf/baselines/<date>-negotiation-eval.json. The script’s exit code is the regression signal:
Exit  Condition
0     report.zeroViolationClearance === true: every row passed AND every violation code fired at least once.
1     Any row failed OR uncovered violation codes exist.
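The mapping from report to exit code can be sketched as follows. This is a minimal illustration, not the script's actual source: only the zeroViolationClearance field is named in this page, while the `rows` and `uncoveredCodes` parameters and both function names are hypothetical.

```typescript
// Only `zeroViolationClearance` is documented; the rest of the report shape is unknown.
interface NegotiationEvalReport {
  zeroViolationClearance: boolean;
}

// Hypothetical helper: derives the clearance flag from the two documented conditions,
// i.e. every row passed AND no violation code was left uncovered.
function computeClearance(rows: { passed: boolean }[], uncoveredCodes: string[]): boolean {
  return rows.every((row) => row.passed) && uncoveredCodes.length === 0;
}

// Maps a report to the exit code: 0 on clearance, 1 otherwise.
function exitCodeFor(report: NegotiationEvalReport): 0 | 1 {
  return report.zeroViolationClearance === true ? 0 : 1;
}
```

A script entry point would then typically end with something like process.exit(exitCodeFor(report)), so that CI treats any non-clearance as a failed run.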

When it runs

.github/workflows/nightly-eval.yml schedules the script at 06:17 UTC daily. The cron offset (17 minutes past) deconflicts with other cron-driven jobs in the deployment. workflow_dispatch: is also enabled so operators can re-run on demand.
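A daily 06:17 UTC schedule corresponds to the five-field cron expression 17 6 * * *. To reason about when the next run is due, a small helper like the one below can be used. The function name is hypothetical and not part of the repository, and GitHub Actions may start scheduled runs later than the nominal time.

```typescript
// Hypothetical helper: returns the next nominal 06:17 UTC instant strictly after `now`.
function nextNightlyRun(now: Date): Date {
  const next = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), 6, 17, 0, 0),
  );
  // If today's 06:17 UTC slot has already passed (or is exactly now), roll to tomorrow.
  if (next.getTime() <= now.getTime()) {
    next.setUTCDate(next.getUTCDate() + 1);
  }
  return next;
}
```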

What gets archived

Each run uploads platform/perf/baselines/*-negotiation-eval.json as the workflow artifact negotiation-eval-<run_id> with 30-day retention. Operators can compare today’s report against a 7-day-prior artifact to spot drift even when the daily exit code stays green.
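One way to compare two downloaded artifacts is to diff their numeric summary fields. The sketch below assumes each report has been reduced to a flat map of numbers. The report schema is not documented here, so the field names in the usage comment are purely illustrative.

```typescript
// Assumed shape: a flat map of numeric summary fields extracted from a report.
type NumericSummary = Record<string, number>;

// Returns today-minus-prior deltas for every field present in both summaries
// whose value changed; unchanged or unmatched fields are omitted.
function diffSummaries(prior: NumericSummary, today: NumericSummary): Record<string, number> {
  const deltas: Record<string, number> = {};
  for (const [key, value] of Object.entries(today)) {
    const before = prior[key];
    if (before !== undefined && before !== value) {
      deltas[key] = value - before;
    }
  }
  return deltas;
}

// Illustrative usage with hypothetical field names:
// diffSummaries({ passed: 40, failed: 0 }, { passed: 38, failed: 2 })
```

An empty result means no shared numeric field moved between the two runs. A non-empty result on a green run is the kind of drift worth a closer look.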

Honest limits

  • Synthetic-only dataset. Production negotiation transcripts are not fed into this harness — they would expose customer PII and require red-team review before use.
  • The eval script does not POST results anywhere (Slack, Discord) yet. Adding a webhook integration is roadmap work; for now operators watch the GitHub Actions failure email.
  • Default seed is 7. To investigate a flake, vary NEGOTIATION_EVAL_SEED via workflow_dispatch inputs.
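Seed selection could look roughly like this. It is a sketch that assumes the script reads NEGOTIATION_EVAL_SEED from the environment and falls back to 7. The resolveSeed name and the validation behaviour are assumptions, not the script's confirmed logic.

```typescript
const DEFAULT_SEED = 7;

// Hypothetical helper: resolves the eval seed from an environment-like map.
// Falls back to the default when the variable is unset or blank, and rejects
// values that are not integers so a typo does not silently change the run.
function resolveSeed(env: Record<string, string | undefined>): number {
  const raw = env["NEGOTIATION_EVAL_SEED"];
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_SEED;
  }
  const parsed = Number(raw);
  if (!Number.isInteger(parsed)) {
    throw new Error(`NEGOTIATION_EVAL_SEED must be an integer, got "${raw}"`);
  }
  return parsed;
}

// In a Node script this would be called as resolveSeed(process.env).
```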

Cross-references

  • Negotiation guardrail engine: lib/negotiation/guardrails.ts.
  • Apply-mode + multi-turn scaffolds (lib/negotiation/apply-mode.ts, lib/negotiation/multi-turn.ts) remain deferred until eval-harness clearance produces enough signal for a regulator review.