

The LLM tokens consumed by document extraction are billed to your configured AI provider account. KaireonAI does not subsidize, reimburse, or cap third-party LLM costs. The platform exposes per-job caps, per-tenant monthly quotas, and a dry-run mode so operators can estimate spend BEFORE running extraction. The resulting bill from Anthropic / OpenAI / etc. is the customer’s responsibility. Always dry-run first. A 200-page brand deck routed through vision fallback can cost 20–50 USD on Claude Sonnet vision pricing (2026 rates).

Overview

AI Document Import lets operators drop a PDF or PPTX (≤25 MB) into the existing AI chat panel. The platform:
  1. Validates magic bytes (PDF starts with %PDF-; PPTX is a ZIP with ppt/presentation.xml). The client Content-Type header is never trusted alone.
  2. Persists the bytes (local filesystem or S3, tenant-scoped) and a row in AiAttachment keyed (tenantId, sha256) for idempotency.
  3. On dry-run: parses the document and emits a per-page token estimate without spending vision tokens.
  4. On extract: runs the configured tenant LLM with a Zod-constrained schema that requires sourcePageNumber + sourceQuote per entity. Hallucinated citations are dropped before the proposal is rendered.
  5. Dedupes against existing entities per type (exact → fuzzy via pg_trgm → optional LLM semantic).
  6. The chat panel renders an inline EntityProposalView card. Operators pick create-new / merge-into-existing / skip per row.
  7. On apply: a single prisma.$transaction writes every selected row. Any failure rolls back the whole apply and highlights the failing row inline.
  8. On revert: deletes only the create-new rows in a fresh transaction. Merges are NOT auto-reverted — the audit log holds the field diff for manual undo.
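The magic-byte check in step 1 can be sketched as follows (a minimal illustration; `sniffKind` and its return values are our names, not the platform's API). Note that a ZIP signature alone only makes a file a PPTX candidate; the archive still has to be opened to confirm `ppt/presentation.xml` exists inside.

```typescript
// Illustrative sketch of step 1: validate magic bytes instead of trusting
// the client Content-Type header. Names here are ours, not the platform's.
type AttachmentKind = "pdf" | "pptx-candidate" | "rejected";

function sniffKind(bytes: Uint8Array): AttachmentKind {
  // PDF files start with the ASCII bytes "%PDF-".
  const pdfMagic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (pdfMagic.every((b, i) => bytes[i] === b)) return "pdf";

  // PPTX is a ZIP archive (local-file header "PK\x03\x04"); the container
  // must still be opened to confirm ppt/presentation.xml exists inside.
  const zipMagic = [0x50, 0x4b, 0x03, 0x04];
  if (zipMagic.every((b, i) => bytes[i] === b)) return "pptx-candidate";

  return "rejected";
}
```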

Cost guardrails

Configure under tenantSettings.aiAnalyzerSettings.import:
  • maxTokensPerJob (default 200_000): per-extract cap. Exceeding it triggers a 429 with {code:"JOB_CAP_EXCEEDED", capName, capValue, estimate, tenantSettingsKey}.
  • monthlyTokenBudget (default 2_000_000): per-tenant monthly cap tracked in AiImportTokenLedger, keyed (tenantId, YYYY-MM).
  • textMinChars (default 30): below this many extracted characters, the page falls back to vision.
  • visionImageRatio (default 0.40): above this image-area ratio, the page falls back to vision.
  • semanticDedupeEnabled (default false): optional third dedupe pass via the LLM. Per-entity-type only.
  • costNoticeAcknowledged (default false): set to true after the operator dismisses the upload-banner cost notice.
The dry-run itself does not call the extractor LLM and is token-free for the customer’s AI provider. It only parses the PDF/PPTX and applies the heuristic estimator. The real LLM call happens at /api/v1/ai/imports/extract and is what gets billed.
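A minimal sketch of how the two caps could gate an extract, assuming the refusal shape shown later on this page (the `checkCaps` helper and the `MONTHLY_CAP_EXCEEDED` code are our illustration; only `JOB_CAP_EXCEEDED` is documented here):

```typescript
// Illustrative cap gate, not the platform's source. The JOB_CAP_EXCEEDED
// payload shape matches the 429 example documented on this page; the
// MONTHLY_CAP_EXCEEDED code is our assumption for the monthly budget case.
interface CapRefusal {
  code: "JOB_CAP_EXCEEDED" | "MONTHLY_CAP_EXCEEDED";
  capName: string;
  capValue: number;
  estimate: number;
  tenantSettingsKey: string;
}

function checkCaps(
  estimate: number,
  monthlyUsed: number,
  settings = { maxTokensPerJob: 200_000, monthlyTokenBudget: 2_000_000 },
): CapRefusal | null {
  if (estimate > settings.maxTokensPerJob) {
    return {
      code: "JOB_CAP_EXCEEDED",
      capName: "maxTokensPerJob",
      capValue: settings.maxTokensPerJob,
      estimate,
      tenantSettingsKey: "aiAnalyzerSettings.import.maxTokensPerJob",
    };
  }
  if (monthlyUsed + estimate > settings.monthlyTokenBudget) {
    return {
      code: "MONTHLY_CAP_EXCEEDED",
      capName: "monthlyTokenBudget",
      capValue: settings.monthlyTokenBudget,
      estimate,
      tenantSettingsKey: "aiAnalyzerSettings.import.monthlyTokenBudget",
    };
  }
  return null; // both caps satisfied; the extract may proceed
}
```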

API surface

POST /api/v1/ai/chat/attachments — upload

Multipart form-data:
  • file (required): single file. application/pdf or application/vnd.openxmlformats-officedocument.presentationml.presentation. ≤25 MB.
  • conversationId (required): chat conversation id this attachment is scoped to.
Auth: tenant + role admin or editor (viewer is read-only). Returns 201 (or 200 when idempotent) with:
{
  "attachmentId": "uuid",
  "sha256": "hex",
  "sizeBytes": 8423129,
  "mimeType": "application/pdf",
  "pageCount": 0,
  "idempotent": false
}
pageCount stays 0 until the first dry-run or extract parses the file.
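The (tenantId, sha256) idempotency key from step 2 can be sketched like this (`attachmentKey` is our name for an illustrative helper): re-uploading identical bytes under the same tenant produces the same key, which is why the second upload returns 200 with "idempotent": true instead of creating a new row.

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch of the (tenantId, sha256) idempotency key; the
// platform stores these as separate AiAttachment columns, not one string.
function attachmentKey(tenantId: string, bytes: Uint8Array): string {
  const sha256 = createHash("sha256").update(bytes).digest("hex");
  return `${tenantId}:${sha256}`;
}
```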

POST /api/v1/ai/imports/dry-run

Body: { "attachmentId": "..." }. Returns:
{
  "attachmentId": "uuid",
  "pageCount": 24,
  "estimatedTokens": 343200,
  "perPageEstimate": [
    { "pageNumber": 1, "tokenEstimate": 1850, "willTriggerVision": false },
    { "pageNumber": 2, "tokenEstimate": 8200, "willTriggerVision": true, "reason": "imageRatio=0.71" }
  ],
  "currentMonthlyUsage": 184500,
  "monthlyRemaining": 1815500,
  "wouldExceedJobCap": true,
  "wouldExceedMonthlyCap": false,
  "costDisclaimer": "The LLM tokens consumed by document extraction are billed to your configured AI provider account. KaireonAI does not subsidize, reimburse, or cap third-party LLM costs. Always dry-run first."
}
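The willTriggerVision flags in the dry-run response follow the textMinChars and visionImageRatio guardrails. A sketch of that per-page decision, assuming those two thresholds are the only inputs (the helper and its field names are ours):

```typescript
// Our sketch of the per-page vision-fallback decision: a page goes through
// the vision path when its extracted text is too short or its image area
// dominates. Defaults mirror the guardrail table on this page.
interface PageStats {
  pageNumber: number;
  textChars: number;   // characters of extracted text on the page
  imageRatio: number;  // fraction of page area covered by images, 0..1
}

function willTriggerVision(
  page: PageStats,
  opts = { textMinChars: 30, visionImageRatio: 0.4 },
): boolean {
  return page.textChars < opts.textMinChars || page.imageRatio > opts.visionImageRatio;
}
```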

POST /api/v1/ai/imports/extract

Body: { "attachmentId": "..." }. On success returns 201 with proposals[], tokensUsed, droppedCitations, and disclaimer (the bolded customer-cost notice). On cap refusal returns 429:
{
  "error": {
    "code": "JOB_CAP_EXCEEDED",
    "details": {
      "capName": "maxTokensPerJob",
      "capValue": 200000,
      "estimate": 343200,
      "tenantSettingsKey": "aiAnalyzerSettings.import.maxTokensPerJob"
    }
  }
}

POST /api/v1/ai/imports/{attachmentId}/apply

Body:
{
  "conversationId": "conv-...",
  "verdicts": [
    { "proposalId": "...", "verdict": "create-new" },
    { "proposalId": "...", "verdict": "merge-into-existing" },
    { "proposalId": "...", "verdict": "skip" }
  ]
}
On success: 201 with applyId, createdEntityIds[], mergedEntityIds[], skippedProposalIds[]. On any row-level failure: 422 with { status: "rolled_back", failedProposalId, errorMessage } — the whole apply is rolled back; nothing is persisted.
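The success response groups proposals by verdict. A client-side sketch of that grouping (`partitionVerdicts` is our name; the real grouping happens server-side, inside the atomic apply transaction):

```typescript
// Our illustration of verdict grouping into the three buckets returned by
// a successful apply. The platform performs this inside prisma.$transaction.
type ApplyVerdict = "create-new" | "merge-into-existing" | "skip";

interface VerdictRow {
  proposalId: string;
  verdict: ApplyVerdict;
}

function partitionVerdicts(rows: VerdictRow[]): {
  created: string[];
  merged: string[];
  skipped: string[];
} {
  const created: string[] = [];
  const merged: string[] = [];
  const skipped: string[] = [];
  for (const r of rows) {
    if (r.verdict === "create-new") created.push(r.proposalId);
    else if (r.verdict === "merge-into-existing") merged.push(r.proposalId);
    else skipped.push(r.proposalId);
  }
  return { created, merged, skipped };
}
```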

POST /api/v1/ai/imports/{applyId}/revert

Deletes only the create-new entities. Merges are explicitly NOT undone (returned in skippedMerges). Created entities that have child references (e.g. a creative attached to an imported offer) are surfaced in orphanedRefusals so the operator can resolve manually.

Skip-digest endpoints

  • GET /api/v1/ai/imports/skip-digests — list dismissed proposals (so re-uploading the same deck doesn’t re-propose them).
  • POST /api/v1/ai/imports/skip-digests — manually add a digest.
  • DELETE /api/v1/ai/imports/skip-digests/{id} — clear a digest.
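The digest derivation is not specified on this page. One plausible sketch (entirely our assumption) hashes the entity type plus a normalized name, so the same skipped proposal from a re-uploaded deck maps to the same digest:

```typescript
import { createHash } from "node:crypto";

// HYPOTHETICAL digest scheme -- the page does not document how digests are
// derived. Normalizing whitespace and case before hashing makes a skipped
// proposal stable across re-uploads of the same deck.
function skipDigest(entityType: string, name: string): string {
  const normalized = name.trim().toLowerCase().replace(/\s+/g, " ");
  return createHash("sha256").update(`${entityType}:${normalized}`).digest("hex");
}
```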

What V1 does NOT do

Per spec §1, these are explicitly deferred to V2:
  • DOCX / Markdown / Excel attachments.
  • Multi-attachment per chat turn.
  • Auto-apply at any confidence (operator always picks per row).
  • Cross-entity-type semantic awareness (“this offer concept is your existing creative”).
  • Per-row revert button (only “Revert this import” — soft, atomic).
  • Re-running extraction with different settings on the same AiAttachment without re-uploading.

Storage

  • local (default): /var/kaireon/attachments/<tenantId>/<sha256>.<ext> (dir 0700, file 0600). Encryption: OS-level only.
  • s3: s3://${ATTACHMENT_S3_BUCKET}/<tenantId>/<sha256>.<ext>. Encryption: AES256 SSE on every put.
Set via STORAGE_BACKEND (local | s3). When s3, set ATTACHMENT_S3_BUCKET and reuse the existing AWS_REGION + AWS credential chain.
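Path resolution per the table above can be sketched as follows (`attachmentPath` is our name; the real storage layer also sets the listed file permissions and the SSE header):

```typescript
// Our sketch of resolving an attachment's storage location from the
// STORAGE_BACKEND setting, following the path shapes documented above.
function attachmentPath(
  backend: "local" | "s3",
  tenantId: string,
  sha256: string,
  ext: string,
  s3Bucket?: string,
): string {
  if (backend === "s3") {
    if (!s3Bucket) throw new Error("ATTACHMENT_S3_BUCKET is required when STORAGE_BACKEND=s3");
    return `s3://${s3Bucket}/${tenantId}/${sha256}.${ext}`;
  }
  return `/var/kaireon/attachments/${tenantId}/${sha256}.${ext}`;
}
```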

Retention

Attachments + parsed text references fall under the tenant’s existing RetentionConfig. Dry-run + extract token usage stays in AiImportTokenLedger indefinitely (operator-visible per-month rollups).

Audit trail

Every apply writes one AuditLog row per affected entity with action: "ai_import_apply" and details {applyId, mode}. Every revert writes one row per deleted entity with action: "ai_import_revert". Merges keep the field-diff in the audit log for manual undo.

Production deploy notes

The Phase 1 migration prisma/manual-sql/08_ai_import.sql creates the trgm GIN indexes inside a transaction. On large entity tables (millions of rows in Offer, Creative, Channel, Segment, QualificationRule), prefer to skip those CREATE INDEX statements in the migration and run them outside with CREATE INDEX CONCURRENTLY to avoid the table-level write lock. Local + small-tenant deployments are fine to apply the migration as-is (the trgm-index block is to_regclass-guarded so it’s safe to run before prisma db push materializes the entity tables in fresh environments).

Honest limits

  • PDF parser: the pdfjs-dist legacy build extracts text. Image-only pages (image ratio ≥ 0.40) are flagged for the vision fallback path; the vision-mode LLM call itself is wired in Phase 4, once the cost guard exists (otherwise it would be uncapped vision spend).
  • PPTX parser: walks ppt/slides/slide<N>.xml via JSZip + fast-xml-parser. Animations + embedded charts lose their data; chart axis labels go through vision when the ratio triggers.
  • Hallucinated citations: the post-extraction validator drops entities whose sourceQuote is not a substring of the cited page’s parsed text. Drops are logged and never reach the proposal card.
  • Concurrent operator edits: the apply transaction holds row locks; concurrent edits queue. Long merges can exceed the default 30s Prisma transaction timeout — split into multiple applies if you have hundreds of rows.
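The hallucinated-citation guard can be sketched as a verbatim-substring check against the cited page's parsed text (`keepCitation` is our name for an illustrative helper):

```typescript
// Our sketch of the post-extraction validator: an extracted entity survives
// only if its sourceQuote appears verbatim in the parsed text of the page
// it cites. Anything else is dropped before the proposal card renders.
interface Citation {
  sourcePageNumber: number;
  sourceQuote: string;
}

function keepCitation(c: Citation, pageText: Map<number, string>): boolean {
  const text = pageText.get(c.sourcePageNumber);
  return text !== undefined && text.includes(c.sourceQuote);
}
```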

Cost responsibility — final word

The platform never spends an LLM token on the customer’s behalf without an explicit operator action. Upload + dry-run are token-free for the customer’s AI provider. Extract is the only billable step, and it is gated by maxTokensPerJob + monthlyTokenBudget. Customers control their spend via tenant settings; KaireonAI does not.