Overview

KaireonAI exposes ~110 primitive MCP tools (one per entity + CRUD + intelligence). Agent playbooks are a higher-level layer on top: named, composable operations that chain 5-10 primitive tools into one structured invocation. Agents call a single playbook instead of orchestrating the chain themselves. Every playbook:
  • Validates its input with Zod.
  • Enforces a tenantId (from input or the MCP server’s configured tenant).
  • Is dry-run by default for any side-effecting operation — callers pass apply: true to commit.
  • Writes an entry to AuditLog with action: "playbook.<name>" when it commits.
  • Has per-(tenant, playbook) sliding-window rate limits.
  • Runs LLM work on-demand only — never on the /recommend hot path.
Playbooks are registered as first-class MCP tools under the playbook_* namespace.
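The shared guarantees above can be sketched as a single wrapper. This is a minimal, dependency-free sketch with hypothetical names (`runPlaybook`, `PlaybookCtx`); the real implementation validates with Zod and writes through `logAudit()`, while this sketch hand-rolls the check and appends to an in-memory log:

```typescript
// Sketch of the shared playbook wrapper (hypothetical names; the real code
// validates with Zod — the check is hand-rolled here to stay dependency-free).
interface BaseInput {
  tenantId?: string;
  apply?: boolean; // dry-run unless explicitly true
}

interface PlaybookCtx {
  defaultTenantId: string;
  auditLog: { tenantId: string; action: string }[];
}

function runPlaybook<I extends BaseInput, R>(
  name: string,
  input: I,
  ctx: PlaybookCtx,
  body: (input: I & { tenantId: string }, dryRun: boolean) => R,
): R {
  // Tenant comes from the input or falls back to the server's configured tenant.
  const tenantId = input.tenantId ?? ctx.defaultTenantId;
  if (!tenantId) throw new Error("tenantId is required");
  const dryRun = input.apply !== true; // side effects are opt-in
  const result = body({ ...input, tenantId }, dryRun);
  if (!dryRun) {
    // Only committed runs land in the audit trail.
    ctx.auditLog.push({ tenantId, action: `playbook.${name}` });
  }
  return result;
}
```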

The 10 playbooks

| Tool | Purpose | Dry-run default |
| --- | --- | --- |
| playbook_run_shadow_experiment | Create a shadow challenger + replay recent traces + return uplift + recommendation | yes |
| playbook_arbitrate_policy_conflict | Detect policy conflicts + propose priority resolution | yes |
| playbook_explain_decision_chain | Fetch a trace + generate regulator/agent/customer narratives | read-only |
| playbook_promote_challenger_if_winning | Run uplift z-test + promote winner if thresholds met | yes |
| playbook_rebuild_offer_qualification | Suggest tighter/looser qualification thresholds from recent outcomes | yes |
| playbook_audit_tenant_data_health | Score connectors + pipelines + schemas + DLQ + trace coverage | read-only |
| playbook_bootstrap_new_offer_campaign | Build Category + SubCategory + Offer + Creative + QualRule + Journey skeleton | yes |
| playbook_simulate_weight_change | Replay traces with proposed arbitration weights; winners/losers | read-only |
| playbook_explain_algorithm_upgrade | Auto-upgrade evaluator + LLM rationale + projected AUC gain | read-only |
| playbook_generate_dsar_export | Assemble GDPR Art. 15 bundle: traces + regulator narratives + subject data | always applies |

run_shadow_experiment

Input
{
  "championModelId": "abc-123",
  "challengerConfig": { "rules": [...] },
  "sampleSize": 200,
  "apply": false
}
Output
{
  "playbook": "run_shadow_experiment",
  "dryRun": true,
  "shadowModelId": null,
  "replay": { "tracesReplayed": 187, "offersCompared": 1244, "championWins": 612, "challengerWins": 604, "ties": 28 },
  "uplift": { "uplift": -0.006, "pValue": 0.61, "significant": false },
  "recommendation": "keep_champion"
}
Chain: load champion model → replay traces → re-score each offer with champion and challenger configs → compute uplift (two-proportion z-test) → recommend promote, keep_champion, or gather_more_data.
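The uplift step in that chain can be sketched as a two-proportion z-test over the win rates. Because both configs are re-scored on the same replayed offers, both arms share the same n; the function names and the erf-based normal-CDF approximation are illustrative, not the actual implementation:

```typescript
// Sketch of the uplift computation: a two-proportion z-test on
// champion vs challenger win rates over the same set of compared offers.
function twoProportionZTest(
  championWins: number,
  challengerWins: number,
  comparisons: number, // offers compared (same for both arms)
) {
  const p1 = championWins / comparisons;
  const p2 = challengerWins / comparisons;
  const pooled = (championWins + challengerWins) / (2 * comparisons);
  const se = Math.sqrt(pooled * (1 - pooled) * (2 / comparisons));
  const z = se === 0 ? 0 : (p2 - p1) / se;
  // Two-sided p-value from the standard normal CDF.
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { uplift: p2 - p1, pValue, significant: pValue < 0.05 };
}

function normalCdf(x: number): number {
  // Abramowitz–Stegun erf approximation (7.1.26); plenty for reporting.
  const t = 1 / (1 + (0.3275911 * Math.abs(x)) / Math.SQRT2);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
      0.284496736) * t + 0.254829592) * t;
  const erf = 1 - poly * Math.exp(-(x * x) / 2);
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}
```

With the example counts above (612 vs 604 wins over 1244 comparisons), the uplift comes out at -8/1244 ≈ -0.006 and is not significant, matching the `keep_champion` recommendation.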

arbitrate_policy_conflict

Runs the conflict detector across offers, qualification rules, contact policies, and experiments. For priority_tie conflicts, it proposes a priority bump on the alphabetically first rule. With apply: true, it commits the priority changes.

explain_decision_chain

Given a decisionTraceId, fetches the trace and calls generateNarrative() for each requested mode (regulator, agent, customer). Narratives are cached per-(tenant × trace × mode × model).
Input
{ "decisionTraceId": "trace-1", "modes": ["regulator", "customer"] }
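The cache behaviour can be sketched as a lookup keyed on exactly those four dimensions. `narrativeFor` and its `generate` callback are illustrative stand-ins; generateNarrative() is async in the real code, and the generator is synchronous here only to keep the sketch minimal:

```typescript
// Sketch of the narrative cache, keyed on (tenant × trace × mode × model),
// so repeat explain calls reuse an already-generated narrative.
type NarrativeMode = "regulator" | "agent" | "customer";

const narrativeCache = new Map<string, string>();

function narrativeFor(
  tenantId: string,
  traceId: string,
  mode: NarrativeMode,
  model: string,
  generate: () => string, // stands in for generateNarrative()
): string {
  const key = [tenantId, traceId, mode, model].join("::");
  let text = narrativeCache.get(key);
  if (text === undefined) {
    text = generate(); // LLM work happens on-demand only
    narrativeCache.set(key, text);
  }
  return text;
}
```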

promote_challenger_if_winning

Runs the uplift z-test on experiment.results (champion vs each challenger). Promotes the best challenger only if it clears:
  • samples >= minSamples (default 100) for champion AND challenger
  • significant === true at 95% confidence
  • uplift >= minUplift (default 0.02)
Before/after champion model IDs are logged to the audit trail.
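The promotion gate reduces to a conjunction of the three thresholds above. A minimal sketch, assuming the uplift stats come from the same z-test result shape (`shouldPromote` and the interfaces are hypothetical names):

```typescript
// Sketch of the promotion gate; all three conditions must hold.
interface ArmStats { samples: number; }
interface Uplift { uplift: number; significant: boolean; }

function shouldPromote(
  champion: ArmStats,
  challenger: ArmStats,
  uplift: Uplift,
  opts = { minSamples: 100, minUplift: 0.02 }, // defaults from the text
): boolean {
  return (
    champion.samples >= opts.minSamples &&
    challenger.samples >= opts.minSamples &&
    uplift.significant && // z-test at 95% confidence
    uplift.uplift >= opts.minUplift
  );
}
```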

rebuild_offer_qualification

For an offer, walks its assigned qualification rules and examines recent InteractionHistory to compute conversion rates. Suggests threshold deltas:
  • Low conversion → tighten threshold (+0.1)
  • High conversion → loosen (-0.1)
  • Middle band → no change
Confidence reported as low / medium / high based on sample size.
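The suggestion logic can be sketched as follows. The ±0.1 deltas and the low/medium/high confidence tiers mirror the text; the conversion-band edges and sample-size cutoffs are assumptions for illustration:

```typescript
// Sketch of the threshold suggestion. Band edges (2% / 20%) and the
// sample-size cutoffs for confidence are illustrative, not the real values.
function suggestThresholdDelta(
  conversions: number,
  samples: number,
  bands = { low: 0.02, high: 0.2 }, // assumed conversion-rate band edges
): { delta: number; confidence: "low" | "medium" | "high" } {
  const rate = samples === 0 ? 0 : conversions / samples;
  // Low conversion → tighten (+0.1); high → loosen (-0.1); middle → no change.
  const delta = rate < bands.low ? +0.1 : rate > bands.high ? -0.1 : 0;
  const confidence = samples >= 500 ? "high" : samples >= 100 ? "medium" : "low";
  return { delta, confidence };
}
```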

audit_tenant_data_health

Five probes, each scored 0-100:
| Probe | Pass criteria |
| --- | --- |
| connector_status | No connectors in error state |
| pipeline_freshness | All pipelines ran within pipelineFreshnessHours (default 24) |
| schema_validation | No empty schemas |
| dlq_depth | DLQ count == 0 |
| decision_trace_coverage | > 100 traces in last 24h |
Returns overallScore (average) and overallStatus (worst of pass/warn/fail).
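The roll-up is a plain average plus a worst-of status. A minimal sketch (the `Probe` shape and `rollUp` name are illustrative):

```typescript
// Sketch of the health roll-up: overallScore averages the probe scores,
// overallStatus takes the worst individual status (fail > warn > pass).
type Status = "pass" | "warn" | "fail";
interface Probe { name: string; score: number; status: Status; }

const severity: Record<Status, number> = { pass: 0, warn: 1, fail: 2 };

function rollUp(probes: Probe[]) {
  const overallScore =
    probes.reduce((sum, p) => sum + p.score, 0) / probes.length;
  const overallStatus = probes.reduce<Status>(
    (worst, p) => (severity[p.status] > severity[worst] ? p.status : worst),
    "pass",
  );
  return { overallScore, overallStatus };
}
```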

bootstrap_new_offer_campaign

Given a brief, produces a plan of entities to create. Reuses existing Category / SubCategory if they match by name. With apply: true, creates:
  • Offer (draft)
  • Creative (draft, bound to supplied channelId)
  • QualificationRule (draft, scope=offer, propensity_threshold >= 0.5)
  • Journey skeleton (optional; scaffolding only)
Returns yamlDiff — a YAML-ish rendering of the plan.

simulate_weight_change

Replays traces with proposed arbitration weights applied to each offer’s stored objectives. Reports per-offer impression delta (before vs after) and the top 20 movers. Read-only — never commits weight changes.
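The impression deltas come from replaying full arbitration, but the core re-weighting and mover ranking can be sketched as below. `score` and `topMovers` are illustrative names, and the sketch ranks by score delta rather than replayed impressions:

```typescript
// Sketch: re-score each offer's stored objectives under proposed weights
// and rank the biggest movers by absolute delta.
interface OfferObjectives { offerId: string; objectives: Record<string, number>; }

function score(o: OfferObjectives, weights: Record<string, number>): number {
  return Object.entries(o.objectives).reduce(
    (s, [k, v]) => s + (weights[k] ?? 0) * v,
    0,
  );
}

function topMovers(
  offers: OfferObjectives[],
  before: Record<string, number>, // current arbitration weights
  after: Record<string, number>,  // proposed weights
  limit = 20,
) {
  return offers
    .map((o) => ({ offerId: o.offerId, delta: score(o, after) - score(o, before) }))
    .sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta))
    .slice(0, limit);
}
```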

explain_algorithm_upgrade

Calls evaluateUpgrade() (deterministic) and optionally generates an LLM explanation of the recommendation. Projected AUC gain is a heuristic based on confidence tier and the current AUC gap.

generate_dsar_export

Assembles a GDPR Art. 15 data-subject access bundle:
  • All DecisionTrace rows for the customer in the last N months (default 12)
  • A regulator-mode narrative for each trace (capped at maxNarratives, default 25)
  • Qualification + contact-policy outcomes attached to each trace
  • Offers presented
  • The output of exportSubjectData() (interactions, summaries, deliveries, etc.)
Always writes an audit log entry — DSAR requests are non-discardable.

Architecture notes

  • Implementation lives in platform/src/lib/mcp/playbooks/.
  • Each playbook is one file; index.ts wires registerPlaybooks(ctx) into the MCP server.
  • Playbooks call into Prisma / @/lib services directly (in-process), bypassing HTTP transport for efficiency.
  • Audit logging reuses logAudit() — same store and hash chain as all other audit events.
  • Rate limits are a simple in-memory sliding window keyed on (tenant, playbook).
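The last point can be sketched as follows. The `allow` function, the limit of 10 calls per 60 s, and the injected clock are all illustrative; only the keying on (tenant, playbook) and the sliding window come from the text:

```typescript
// Minimal in-memory sliding-window limiter keyed on (tenant, playbook).
// Limit and window values are illustrative defaults.
const windows = new Map<string, number[]>();

function allow(
  tenantId: string,
  playbook: string,
  now: number, // ms timestamp, injected so the window is testable
  limit = 10,
  windowMs = 60_000,
): boolean {
  const key = `${tenantId}:${playbook}`;
  // Drop hits that have slid out of the window.
  const hits = (windows.get(key) ?? []).filter((t) => now - t < windowMs);
  if (hits.length >= limit) {
    windows.set(key, hits);
    return false;
  }
  hits.push(now);
  windows.set(key, hits);
  return true;
}
```

Being purely in-memory, limits reset on server restart and are not shared across instances, which is a reasonable trade-off for per-tenant playbook throttling.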