Negotiation eval harness — nightly CI

What it does

tools/scripts/run-negotiation-eval.ts invokes runDefaultNegotiationEval(seed=7) from lib/negotiation/eval-harness.ts, prints a JSON summary, and writes the full report to platform/perf/baselines/<date>-negotiation-eval.json. The script’s exit code is the regression signal:

Exit	Condition
0	`report.zeroViolationClearance === true` — every row passed AND every violation code fired at least once.
1	Any row failed OR uncovered violation codes exist.
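The mapping from report to exit code can be sketched as follows. This is a minimal illustration, not the harness's actual code: only `zeroViolationClearance` is named on this page, so the `RowSketch` and `ReportSketch` shapes, the `uncoveredViolationCodes` field, and both helper functions are assumptions introduced for the example.

```typescript
// Hypothetical shapes: only `zeroViolationClearance` appears in the docs above.
interface RowSketch {
  id: string;
  passed: boolean;
}

interface ReportSketch {
  rows: RowSketch[];
  uncoveredViolationCodes: string[]; // assumed name for codes that never fired
  zeroViolationClearance: boolean;
}

// Clearance requires both conditions from the table: every row passed AND
// no violation code is left uncovered.
function buildReport(rows: RowSketch[], uncoveredViolationCodes: string[]): ReportSketch {
  const allRowsPassed = rows.every((row) => row.passed);
  const fullCoverage = uncoveredViolationCodes.length === 0;
  return {
    rows,
    uncoveredViolationCodes,
    zeroViolationClearance: allRowsPassed && fullCoverage,
  };
}

// 0 on clearance, 1 otherwise. A script would hand this value to process.exit.
function exitCodeFor(report: Pick<ReportSketch, "zeroViolationClearance">): 0 | 1 {
  return report.zeroViolationClearance === true ? 0 : 1;
}
```

Keeping the exit-code decision in a pure function makes the regression signal easy to unit test without spawning the script.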

When it runs

.github/workflows/nightly-eval.yml schedules the script at 06:17 UTC daily. The 17-minute offset past the hour avoids colliding with other cron-driven jobs in the deployment. The workflow_dispatch trigger is also enabled so operators can re-run the eval on demand.

What gets archived

Each run uploads platform/perf/baselines/*-negotiation-eval.json as the workflow artifact negotiation-eval-<run_id> with 30-day retention. Operators can compare today’s report against a 7-day-prior artifact to spot drift even when the daily exit code stays green.
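One way to compare two downloaded artifacts is sketched below. The report schema is not described on this page, so the sketch deliberately avoids assuming field names: it walks both parsed JSON documents and lists every numeric value that differs. The `numericLeaves` and `numericDrift` helpers and the `Drift` type are hypothetical names for this example.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Collect every numeric leaf, keyed by a dotted path such as "summary.count".
function numericLeaves(
  value: Json,
  prefix = "",
  out: Map<string, number> = new Map<string, number>(),
): Map<string, number> {
  if (typeof value === "number") {
    out.set(prefix, value);
  } else if (Array.isArray(value)) {
    value.forEach((item, i) => numericLeaves(item, `${prefix}[${i}]`, out));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      numericLeaves(child, prefix ? `${prefix}.${key}` : key, out);
    }
  }
  return out;
}

interface Drift {
  path: string;
  prior: number | undefined; // undefined when the path is new in today's report
  today: number | undefined; // undefined when the path disappeared
}

// List every numeric path whose value changed, appeared, or disappeared.
function numericDrift(prior: Json, today: Json): Drift[] {
  const before = numericLeaves(prior);
  const after = numericLeaves(today);
  const paths = Array.from(new Set(Array.from(before.keys()).concat(Array.from(after.keys()))));
  paths.sort();
  const drift: Drift[] = [];
  for (const path of paths) {
    if (before.get(path) !== after.get(path)) {
      drift.push({ path, prior: before.get(path), today: after.get(path) });
    }
  }
  return drift;
}
```

An operator would parse the two artifact files with `JSON.parse` and pass the results to `numericDrift`; an empty array means no numeric field moved between the two runs.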

Honest limits

Synthetic-only dataset. Production negotiation transcripts are not fed into this harness, because they would expose customer PII and would need red-team review before use.
The eval script does not POST results anywhere (Slack, Discord) yet. Adding a webhook integration is roadmap work; for now operators watch the GitHub Actions failure email.
Default seed is 7. To investigate a flake, vary NEGOTIATION_EVAL_SEED via workflow_dispatch inputs.
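The seed override mentioned above might be resolved along these lines. This is a sketch under assumptions: how the real script reads NEGOTIATION_EVAL_SEED is not shown on this page, and `resolveSeed` and `DEFAULT_SEED` are hypothetical names. The environment is passed in as a plain record so the function stays testable.

```typescript
const DEFAULT_SEED = 7; // matches the documented default

// Returns the seed from NEGOTIATION_EVAL_SEED, or the default when it is unset
// or blank. Rejects non-integer values so a typo cannot silently change the run.
function resolveSeed(env: Record<string, string | undefined>): number {
  const raw = env["NEGOTIATION_EVAL_SEED"];
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_SEED;
  }
  const seed = Number(raw);
  if (!Number.isInteger(seed)) {
    throw new Error(`NEGOTIATION_EVAL_SEED must be an integer, got "${raw}"`);
  }
  return seed;
}
```

In a Node.js script this would be called as `resolveSeed(process.env)`. Failing loudly on a malformed value is a design choice for the sketch: when chasing a flake, a run that quietly falls back to seed 7 would be misleading.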

Cross-references

Negotiation guardrail engine: lib/negotiation/guardrails.ts.
The apply-mode and multi-turn scaffolds (lib/negotiation/apply-mode.ts, lib/negotiation/multi-turn.ts) remain deferred until eval-harness clearance produces enough signal for a regulator review.
