

The LLM tokens consumed by document extraction are billed to your configured AI provider account. KaireonAI does not subsidize, reimburse, or cap third-party LLM costs. The platform exposes per-job caps, per-tenant monthly quotas, and a dry-run mode so operators can estimate spend BEFORE running extraction. The resulting bill from Anthropic / OpenAI / etc. is the customer’s responsibility. Always dry-run first. A 200-page brand deck routed through vision fallback can cost 20–50 USD on Claude Sonnet vision pricing (2026 rates).

Overview

AI Document Import lets operators drop a PDF or PPTX (≤25 MB) into the existing AI chat panel. The platform:
  1. Validates magic bytes (PDF starts with %PDF-; PPTX is a ZIP with ppt/presentation.xml). The client Content-Type header is never trusted alone.
  2. Persists the bytes (local filesystem or S3, tenant-scoped) and a row in AiAttachment keyed (tenantId, sha256) for idempotency.
  3. On dry-run: parses the document and emits a per-page token estimate without spending vision tokens.
  4. On extract: runs the configured tenant LLM with a Zod-constrained schema that requires sourcePageNumber + sourceQuote per entity. Hallucinated citations are dropped before the proposal is rendered.
  5. Dedupes against existing entities per type (exact → fuzzy via pg_trgm → optional LLM semantic).
  6. The chat panel renders an inline EntityProposalView card. Operators pick create-new / merge-into-existing / skip per row.
  7. On apply: a single prisma.$transaction writes every selected row. Any failure rolls back the whole apply and highlights the failing row inline.
  8. On revert: deletes only the create-new rows in a fresh transaction. Merges are NOT auto-reverted — the audit log holds the field diff for manual undo.
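The magic-byte check in step 1 can be sketched as follows (a minimal illustration; `sniffKind` and its return values are our names, not the platform's API). Note that a ZIP signature alone only makes a file a PPTX candidate; the archive still has to be opened to confirm `ppt/presentation.xml` exists inside.

```typescript
// Illustrative sketch of step 1: validate magic bytes instead of trusting
// the client Content-Type header. Names here are ours, not the platform's.
type AttachmentKind = "pdf" | "pptx-candidate" | "rejected";

function sniffKind(bytes: Uint8Array): AttachmentKind {
  // PDF files start with the ASCII bytes "%PDF-".
  const pdfMagic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (pdfMagic.every((b, i) => bytes[i] === b)) return "pdf";

  // PPTX is a ZIP archive (local-file header "PK\x03\x04"); the container
  // must still be opened to confirm ppt/presentation.xml exists inside.
  const zipMagic = [0x50, 0x4b, 0x03, 0x04];
  if (zipMagic.every((b, i) => bytes[i] === b)) return "pptx-candidate";

  return "rejected";
}
```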

Cost guardrails

Configure under tenantSettings.aiAnalyzerSettings.import:
  • maxTokensPerJob (default 200_000): per-extract cap. Exceeding it triggers a 429 with {code:"JOB_CAP_EXCEEDED", capName, capValue, estimate, tenantSettingsKey}.
  • monthlyTokenBudget (default 2_000_000): per-tenant monthly cap tracked in AiImportTokenLedger, keyed (tenantId, YYYY-MM).
  • textMinChars (default 30): below this many extracted characters, the page falls back to vision.
  • visionImageRatio (default 0.40): above this image-area ratio, the page falls back to vision.
  • semanticDedupeEnabled (default false): optional third dedupe pass via the LLM. Per-entity-type only.
  • costNoticeAcknowledged (default false): set to true after the operator dismisses the upload-banner cost notice.
The dry-run itself does not call the extractor LLM and is token-free for the customer’s AI provider. It only parses the PDF/PPTX and applies the heuristic estimator. The real LLM call happens at /api/v1/ai/imports/extract and is what gets billed.
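A minimal sketch of how the two caps could gate an extract, assuming the refusal shape shown later on this page (the `checkCaps` helper and the `MONTHLY_CAP_EXCEEDED` code are our illustration; only `JOB_CAP_EXCEEDED` is documented here):

```typescript
// Illustrative cap gate, not the platform's source. The JOB_CAP_EXCEEDED
// payload shape matches the 429 example documented on this page; the
// MONTHLY_CAP_EXCEEDED code is our assumption for the monthly budget case.
interface CapRefusal {
  code: "JOB_CAP_EXCEEDED" | "MONTHLY_CAP_EXCEEDED";
  capName: string;
  capValue: number;
  estimate: number;
  tenantSettingsKey: string;
}

function checkCaps(
  estimate: number,
  monthlyUsed: number,
  settings = { maxTokensPerJob: 200_000, monthlyTokenBudget: 2_000_000 },
): CapRefusal | null {
  if (estimate > settings.maxTokensPerJob) {
    return {
      code: "JOB_CAP_EXCEEDED",
      capName: "maxTokensPerJob",
      capValue: settings.maxTokensPerJob,
      estimate,
      tenantSettingsKey: "aiAnalyzerSettings.import.maxTokensPerJob",
    };
  }
  if (monthlyUsed + estimate > settings.monthlyTokenBudget) {
    return {
      code: "MONTHLY_CAP_EXCEEDED",
      capName: "monthlyTokenBudget",
      capValue: settings.monthlyTokenBudget,
      estimate,
      tenantSettingsKey: "aiAnalyzerSettings.import.monthlyTokenBudget",
    };
  }
  return null; // both caps satisfied; the extract may proceed
}
```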

API surface

POST /api/v1/ai/chat/attachments — upload

Multipart form-data:
  • file (required): single file. application/pdf or application/vnd.openxmlformats-officedocument.presentationml.presentation. ≤25 MB.
  • conversationId (required): chat conversation id this attachment is scoped to.
Auth: tenant + role admin or editor (viewer is read-only). Returns 201 (or 200 when idempotent) with:
{
  "attachmentId": "uuid",
  "sha256": "hex",
  "sizeBytes": 8423129,
  "mimeType": "application/pdf",
  "pageCount": 0,
  "idempotent": false
}
pageCount stays 0 until the first dry-run or extract parses the file.
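The (tenantId, sha256) idempotency key from step 2 can be sketched like this (`attachmentKey` is our name for an illustrative helper): re-uploading identical bytes under the same tenant produces the same key, which is why the second upload returns 200 with "idempotent": true instead of creating a new row.

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch of the (tenantId, sha256) idempotency key; the
// platform stores these as separate AiAttachment columns, not one string.
function attachmentKey(tenantId: string, bytes: Uint8Array): string {
  const sha256 = createHash("sha256").update(bytes).digest("hex");
  return `${tenantId}:${sha256}`;
}
```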

POST /api/v1/ai/imports/dry-run

Body: { "attachmentId": "..." }. Returns:
{
  "attachmentId": "uuid",
  "pageCount": 24,
  "estimatedTokens": 343200,
  "perPageEstimate": [
    { "pageNumber": 1, "tokenEstimate": 1850, "willTriggerVision": false },
    { "pageNumber": 2, "tokenEstimate": 8200, "willTriggerVision": true, "reason": "imageRatio=0.71" }
  ],
  "currentMonthlyUsage": 184500,
  "monthlyRemaining": 1815500,
  "wouldExceedJobCap": true,
  "wouldExceedMonthlyCap": false,
  "costDisclaimer": "The LLM tokens consumed by document extraction are billed to your configured AI provider account. KaireonAI does not subsidize, reimburse, or cap third-party LLM costs. Always dry-run first."
}
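The willTriggerVision flags in the dry-run response follow the textMinChars and visionImageRatio guardrails. A sketch of that per-page decision, assuming those two thresholds are the only inputs (the helper and its field names are ours):

```typescript
// Our sketch of the per-page vision-fallback decision: a page goes through
// the vision path when its extracted text is too short or its image area
// dominates. Defaults mirror the guardrail table on this page.
interface PageStats {
  pageNumber: number;
  textChars: number;   // characters of extracted text on the page
  imageRatio: number;  // fraction of page area covered by images, 0..1
}

function willTriggerVision(
  page: PageStats,
  opts = { textMinChars: 30, visionImageRatio: 0.4 },
): boolean {
  return page.textChars < opts.textMinChars || page.imageRatio > opts.visionImageRatio;
}
```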

POST /api/v1/ai/imports/extract

Body: { "attachmentId": "..." }. On success returns 201 with proposals[], tokensUsed, droppedCitations, and disclaimer (the bolded customer-cost notice). On cap refusal returns 429:
{
  "error": {
    "code": "JOB_CAP_EXCEEDED",
    "details": {
      "capName": "maxTokensPerJob",
      "capValue": 200000,
      "estimate": 343200,
      "tenantSettingsKey": "aiAnalyzerSettings.import.maxTokensPerJob"
    }
  }
}

POST /api/v1/ai/imports/{attachmentId}/apply

Body:
{
  "conversationId": "conv-...",
  "verdicts": [
    { "proposalId": "...", "verdict": "create-new" },
    { "proposalId": "...", "verdict": "merge-into-existing" },
    { "proposalId": "...", "verdict": "skip" }
  ]
}
On success: 201 with applyId, createdEntityIds[], mergedEntityIds[], skippedProposalIds[]. On any row-level failure: 422 with { status: "rolled_back", failedProposalId, errorMessage } — the whole apply is rolled back; nothing is persisted.
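The success response groups proposals by verdict. A client-side sketch of that grouping (`partitionVerdicts` is our name; the real grouping happens server-side, inside the atomic apply transaction):

```typescript
// Our illustration of verdict grouping into the three buckets returned by
// a successful apply. The platform performs this inside prisma.$transaction.
type ApplyVerdict = "create-new" | "merge-into-existing" | "skip";

interface VerdictRow {
  proposalId: string;
  verdict: ApplyVerdict;
}

function partitionVerdicts(rows: VerdictRow[]): {
  created: string[];
  merged: string[];
  skipped: string[];
} {
  const created: string[] = [];
  const merged: string[] = [];
  const skipped: string[] = [];
  for (const r of rows) {
    if (r.verdict === "create-new") created.push(r.proposalId);
    else if (r.verdict === "merge-into-existing") merged.push(r.proposalId);
    else skipped.push(r.proposalId);
  }
  return { created, merged, skipped };
}
```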

POST /api/v1/ai/imports/{applyId}/revert

Deletes only the create-new entities. Merges are explicitly NOT undone (returned in skippedMerges). Created entities that have child references (e.g. a creative attached to an imported offer) are surfaced in orphanedRefusals so the operator can resolve manually.

Skip-digest endpoints

  • GET /api/v1/ai/imports/skip-digests — list dismissed proposals (so re-uploading the same deck doesn’t re-propose them).
  • POST /api/v1/ai/imports/skip-digests — manually add a digest.
  • DELETE /api/v1/ai/imports/skip-digests/{id} — clear a digest.
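The digest derivation is not specified on this page. One plausible sketch (entirely our assumption) hashes the entity type plus a normalized name, so the same skipped proposal from a re-uploaded deck maps to the same digest:

```typescript
import { createHash } from "node:crypto";

// HYPOTHETICAL digest scheme -- the page does not document how digests are
// derived. Normalizing whitespace and case before hashing makes a skipped
// proposal stable across re-uploads of the same deck.
function skipDigest(entityType: string, name: string): string {
  const normalized = name.trim().toLowerCase().replace(/\s+/g, " ");
  return createHash("sha256").update(`${entityType}:${normalized}`).digest("hex");
}
```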

What V1 does NOT do

Per spec §1, these are explicitly deferred to V2:
  • DOCX / Markdown / Excel attachments.
  • Multi-attachment per chat turn.
  • Auto-apply at any confidence (operator always picks per row).
  • Cross-entity-type semantic awareness (“this offer concept is your existing creative”).
  • Per-row revert button (only “Revert this import” — soft, atomic).
  • Re-running extraction with different settings on the same AiAttachment without re-uploading.

Storage

  • local (default): /var/kaireon/attachments/<tenantId>/<sha256>.<ext> (dir 0700, file 0600). Encryption: OS-level only.
  • s3: s3://${ATTACHMENT_S3_BUCKET}/<tenantId>/<sha256>.<ext>. Encryption: AES256 SSE on every put.
Set via STORAGE_BACKEND (local | s3). When s3, set ATTACHMENT_S3_BUCKET and reuse the existing AWS_REGION + AWS credential chain.
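Path resolution per the table above can be sketched as follows (`attachmentPath` is our name; the real storage layer also sets the listed file permissions and the SSE header):

```typescript
// Our sketch of resolving an attachment's storage location from the
// STORAGE_BACKEND setting, following the path shapes documented above.
function attachmentPath(
  backend: "local" | "s3",
  tenantId: string,
  sha256: string,
  ext: string,
  s3Bucket?: string,
): string {
  if (backend === "s3") {
    if (!s3Bucket) throw new Error("ATTACHMENT_S3_BUCKET is required when STORAGE_BACKEND=s3");
    return `s3://${s3Bucket}/${tenantId}/${sha256}.${ext}`;
  }
  return `/var/kaireon/attachments/${tenantId}/${sha256}.${ext}`;
}
```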

Retention

Attachments + parsed text references fall under the tenant’s existing RetentionConfig. Dry-run + extract token usage stays in AiImportTokenLedger indefinitely (operator-visible per-month rollups).

Audit trail

Every apply writes one AuditLog row per affected entity with action: "ai_import_apply" and details {applyId, mode}. Every revert writes one row per deleted entity with action: "ai_import_revert". Merges keep the field-diff in the audit log for manual undo.

Production deploy notes

The Phase 1 migration prisma/manual-sql/08_ai_import.sql creates the trgm GIN indexes inside a transaction. On large entity tables (millions of rows in Offer, Creative, Channel, Segment, QualificationRule), prefer to skip those CREATE INDEX statements in the migration and run them outside with CREATE INDEX CONCURRENTLY to avoid the table-level write lock. Local + small-tenant deployments are fine to apply the migration as-is (the trgm-index block is to_regclass-guarded so it’s safe to run before prisma db push materializes the entity tables in fresh environments).

Honest limits

  • PDF parser: the pdfjs-dist legacy build extracts text. Image-only pages (image ratio ≥ 0.40) are flagged for the vision fallback path; the vision-mode LLM call itself is wired in Phase 4, once the cost guard exists (otherwise it would be uncapped vision spend).
  • PPTX parser: walks ppt/slides/slide<N>.xml via JSZip + fast-xml-parser. Animations + embedded charts lose their data; chart axis labels go through vision when the ratio triggers.
  • Hallucinated citations: the post-extraction validator drops entities whose sourceQuote is not a substring of the cited page’s parsed text. Drops are logged and never reach the proposal card.
  • Concurrent operator edits: the apply transaction holds row locks; concurrent edits queue. Long merges can exceed the default 30s Prisma transaction timeout — split into multiple applies if you have hundreds of rows.
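The hallucinated-citation guard can be sketched as a verbatim-substring check against the cited page's parsed text (`keepCitation` is our name for an illustrative helper):

```typescript
// Our sketch of the post-extraction validator: an extracted entity survives
// only if its sourceQuote appears verbatim in the parsed text of the page
// it cites. Anything else is dropped before the proposal card renders.
interface Citation {
  sourcePageNumber: number;
  sourceQuote: string;
}

function keepCitation(c: Citation, pageText: Map<number, string>): boolean {
  const text = pageText.get(c.sourcePageNumber);
  return text !== undefined && text.includes(c.sourceQuote);
}
```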

Cost responsibility — final word

The platform never spends an LLM token on the customer’s behalf without an explicit operator action. Upload + dry-run are token-free for the customer’s AI provider. Extract is the only billable step, and it is gated by maxTokensPerJob + monthlyTokenBudget. Customers control their spend via tenant settings; KaireonAI does not.