Documentation Index
Fetch the complete documentation index at: https://docs.kaireonai.com/llms.txt
Use this file to discover all available pages before exploring further.
The LLM tokens consumed by document extraction are billed to your configured AI provider account. KaireonAI does not subsidize, reimburse, or cap third-party LLM costs.
The platform exposes per-job caps, per-tenant monthly quotas, and a dry-run mode so operators can estimate spend BEFORE running extraction. The resulting bill from Anthropic / OpenAI / etc. is the customer’s responsibility. Always dry-run first. A 200-page brand deck routed through vision fallback can cost 50 USD on Claude Sonnet vision pricing (2026 rates).
Overview
AI Document Import lets operators drop a PDF or PPTX (≤25 MB) into the existing AI chat panel. The platform:

- Validates magic bytes (PDF starts with `%PDF-`; PPTX is a ZIP with `ppt/presentation.xml`). The client `Content-Type` header is never trusted alone.
- Persists the bytes (local filesystem or S3, tenant-scoped) and a row in `AiAttachment` keyed `(tenantId, sha256)` for idempotency.
- On dry-run: parses the document and emits a per-page token estimate without spending vision tokens.
- On extract: runs the configured tenant LLM with a Zod-constrained schema that requires `sourcePageNumber` + `sourceQuote` per entity. Hallucinated citations are dropped before the proposal is rendered.
- Dedupes against existing entities per type (exact → fuzzy via `pg_trgm` → optional LLM semantic).
- The chat panel renders an inline `EntityProposalView` card. Operators pick `create-new` / `merge-into-existing` / `skip` per row.
- On apply: a single `prisma.$transaction` writes every selected row. Any failure rolls back the whole apply and highlights the failing row inline.
- On revert: deletes only the `create-new` rows in a fresh transaction. Merges are NOT auto-reverted; the audit log holds the field diff for manual undo.
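The magic-byte check in the first bullet can be sketched as follows. This is an illustrative helper, not the platform's actual implementation; note that a ZIP signature alone only makes a file a PPTX *candidate*:

```typescript
// Sketch of the magic-byte validation described above (assumed helper).
// PDF files start with "%PDF-"; PPTX is an OOXML ZIP, so the bytes start
// with the ZIP local-file signature "PK\x03\x04".
type DetectedKind = "pdf" | "pptx-candidate" | "unknown";

function detectKind(bytes: Uint8Array): DetectedKind {
  const pdfMagic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  const zipMagic = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04"
  if (pdfMagic.every((b, i) => bytes[i] === b)) return "pdf";
  // A ZIP signature is not enough on its own: the real check must also
  // confirm ppt/presentation.xml exists inside the archive.
  if (zipMagic.every((b, i) => bytes[i] === b)) return "pptx-candidate";
  return "unknown";
}
```

Because the check reads the stored bytes, a spoofed `Content-Type` header on upload cannot route a non-PDF through the PDF parser.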
Cost guardrails
Configure under `tenantSettings.aiAnalyzerSettings.import`:

| Key | Default | Effect |
|---|---|---|
| `maxTokensPerJob` | `200_000` | Per-extract cap. Exceeding triggers a 429 with `{code:"JOB_CAP_EXCEEDED", capName, capValue, estimate, tenantSettingsKey}`. |
| `monthlyTokenBudget` | `2_000_000` | Per-tenant monthly cap tracked in `AiImportTokenLedger`, keyed `(tenantId, YYYY-MM)`. |
| `textMinChars` | `30` | Pages with fewer extracted characters fall back to vision. |
| `visionImageRatio` | `0.40` | Pages whose image-area ratio exceeds this fall back to vision. |
| `semanticDedupeEnabled` | `false` | Optional third dedupe pass via the LLM. Per entity type only. |
| `costNoticeAcknowledged` | `false` | Set to `true` after the operator dismisses the upload-banner cost notice. |
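The two token caps compose into a simple gate. A minimal sketch, assuming both refusals share the `JOB_CAP_EXCEEDED` shape and that `monthUsed` comes from the tenant's current `AiImportTokenLedger` row (both assumptions, not documented behavior):

```typescript
// Assumed cap-gate logic mirroring the table above; not the real implementation.
interface CapRefusal {
  code: "JOB_CAP_EXCEEDED";
  capName: "maxTokensPerJob" | "monthlyTokenBudget";
  capValue: number;
  estimate: number;
}

function checkCaps(
  estimate: number, // dry-run token estimate for this job
  monthUsed: number, // tokens already spent this (tenantId, YYYY-MM)
  caps: { maxTokensPerJob: number; monthlyTokenBudget: number },
): CapRefusal | null {
  if (estimate > caps.maxTokensPerJob) {
    return {
      code: "JOB_CAP_EXCEEDED",
      capName: "maxTokensPerJob",
      capValue: caps.maxTokensPerJob,
      estimate,
    };
  }
  if (monthUsed + estimate > caps.monthlyTokenBudget) {
    return {
      code: "JOB_CAP_EXCEEDED",
      capName: "monthlyTokenBudget",
      capValue: caps.monthlyTokenBudget,
      estimate,
    };
  }
  return null; // within both caps: the extract may proceed
}
```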
The dry-run itself does not call the extractor LLM and is token-free for the customer’s AI provider. It only parses the PDF/PPTX and applies the heuristic estimator. The real LLM call happens at `/api/v1/ai/imports/extract` and is what gets billed.

API surface
POST /api/v1/ai/chat/attachments — upload
Multipart form-data:
| Part | Required | Notes |
|---|---|---|
| `file` | yes | Single file. `application/pdf` or `application/vnd.openxmlformats-officedocument.presentationml.presentation`. ≤25 MB. |
| `conversationId` | yes | Chat conversation id this attachment is scoped to. |

Requires role `admin` or `editor` (`viewer` is read-only).
Returns 201 (or 200 on an idempotent re-upload of the same bytes) with the attachment record. `pageCount` stays `0` until the first dry-run or extract parses the file.
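The idempotent 200 works because the stored key is `(tenantId, sha256)` rather than the filename. A sketch of how that key could be derived (an assumed helper; the real lookup is a Prisma unique query on `AiAttachment`):

```typescript
import { createHash } from "node:crypto";

// Assumed helper illustrating the (tenantId, sha256) idempotency key from
// the Overview. Re-uploading identical bytes under the same tenant yields
// the same key, which is why the endpoint can answer 200 instead of 201.
function attachmentKey(tenantId: string, bytes: Uint8Array): string {
  const sha256 = createHash("sha256").update(bytes).digest("hex");
  return `${tenantId}:${sha256}`;
}
```

Renaming a file and re-uploading it therefore does not create a second `AiAttachment` row, but the same bytes uploaded by a different tenant do.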
POST /api/v1/ai/imports/dry-run
Body: `{ "attachmentId": "..." }`. Returns the per-page token estimate without spending any extractor tokens.
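The full response schema is not reproduced here; a plausible shape, where every field name beyond the documented per-page estimate and `pageCount` is an illustrative assumption:

```json
{
  "attachmentId": "...",
  "pageCount": 12,
  "pages": [
    { "page": 1, "mode": "text", "estimatedTokens": 800 },
    { "page": 2, "mode": "vision", "estimatedTokens": 1600 }
  ],
  "estimatedTokensTotal": 28400
}
```

The `mode` split matters for cost: vision pages dominate the estimate, which is exactly what the dry-run is meant to surface before extraction.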
POST /api/v1/ai/imports/extract
Body: `{ "attachmentId": "..." }`. On success returns 201 with `proposals[]`, `tokensUsed`, `droppedCitations`, and `disclaimer` (the bolded customer-cost notice). On cap refusal returns 429.
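An illustrative 429 body, using the field names from the cost-guardrails table (the concrete values and the exact `tenantSettingsKey` string are assumptions):

```json
{
  "code": "JOB_CAP_EXCEEDED",
  "capName": "maxTokensPerJob",
  "capValue": 200000,
  "estimate": 312400,
  "tenantSettingsKey": "aiAnalyzerSettings.import.maxTokensPerJob"
}
```

`tenantSettingsKey` tells the operator exactly which tenant setting to raise if the spend is intentional.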
POST /api/v1/ai/imports/{attachmentId}/apply
Body: the operator’s per-proposal decisions from the proposal card. Returns `applyId`, `createdEntityIds[]`, `mergedEntityIds[]`, `skippedProposalIds[]`. On any row-level failure: 422 with `{ status: "rolled_back", failedProposalId, errorMessage }`; the whole apply is rolled back and nothing is persisted.
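The request schema is not shown above; a plausible body, assuming one decision per proposal (the `decisions`, `proposalId`, and `targetEntityId` names are illustrative, while the three action values match the proposal card):

```json
{
  "decisions": [
    { "proposalId": "...", "action": "create-new" },
    { "proposalId": "...", "action": "merge-into-existing", "targetEntityId": "..." },
    { "proposalId": "...", "action": "skip" }
  ]
}
```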
POST /api/v1/ai/imports/{applyId}/revert
Deletes only the `create-new` entities. Merges are explicitly NOT undone (returned in `skippedMerges`). Created entities that have child references (e.g. a creative attached to an imported offer) are surfaced in `orphanedRefusals` so the operator can resolve them manually.
Skip-digest endpoints
- `GET /api/v1/ai/imports/skip-digests`: list dismissed proposals (so re-uploading the same deck doesn’t re-propose them).
- `POST /api/v1/ai/imports/skip-digests`: manually add a digest.
- `DELETE /api/v1/ai/imports/skip-digests/{id}`: clear a digest.
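For a dismissed proposal to survive a re-upload, its digest must be stable across extraction runs. The exact digest definition is not documented here; one plausible construction, hashing the tenant, entity type, and a normalized name (all assumptions):

```typescript
import { createHash } from "node:crypto";

// Assumed skip-digest construction; the platform's real inputs may differ.
// Normalizing whitespace and case keeps the digest stable when the LLM
// extracts the same entity with slightly different spacing.
function skipDigest(tenantId: string, entityType: string, name: string): string {
  const normalized = name.trim().toLowerCase().replace(/\s+/g, " ");
  return createHash("sha256")
    .update(`${tenantId}|${entityType}|${normalized}`)
    .digest("hex");
}
```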
What V1 does NOT do
Per spec §1, these are explicitly V2:

- DOCX / Markdown / Excel attachments.
- Multi-attachment per chat turn.
- Auto-apply at any confidence (the operator always picks per row).
- Cross-entity-type semantic awareness (“this offer concept is your existing creative”).
- Per-row revert (only “Revert this import”, which is soft and atomic).
- Re-running extraction with different settings on the same `AiAttachment` without re-uploading.
Storage
| Backend | Path shape | Encryption |
|---|---|---|
| `local` (default) | `/var/kaireon/attachments/<tenantId>/<sha256>.<ext>`, dir `0700`, file `0600` | OS-level only |
| `s3` | `s3://${ATTACHMENT_S3_BUCKET}/<tenantId>/<sha256>.<ext>` | AES256 SSE on every put |

Select the backend with `STORAGE_BACKEND` (`local` | `s3`). When `s3`, set `ATTACHMENT_S3_BUCKET` and reuse the existing `AWS_REGION` + AWS credential chain.
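The two path shapes share the same tenant-scoped `<tenantId>/<sha256>.<ext>` tail, so a small helper can cover both backends. A sketch (assumed helper; the extension here is taken from the validated file type, which is an assumption):

```typescript
// Assumed helper producing the path shapes from the table above.
function storageKey(
  backend: "local" | "s3",
  tenantId: string,
  sha256: string,
  ext: "pdf" | "pptx",
  s3Bucket?: string, // required when backend === "s3"
): string {
  const rel = `${tenantId}/${sha256}.${ext}`;
  return backend === "s3"
    ? `s3://${s3Bucket}/${rel}`
    : `/var/kaireon/attachments/${rel}`;
}
```

Keying files by content hash rather than upload filename means the storage layer inherits the same idempotency as the `AiAttachment` row.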
Retention
Attachments + parsed text references fall under the tenant’s existing `RetentionConfig`. Dry-run + extract token usage stays in `AiImportTokenLedger` indefinitely (operator-visible per-month rollups).
Audit trail
Every apply writes one `AuditLog` row per affected entity with `action: "ai_import_apply"` and details `{applyId, mode}`. Every revert writes one row per deleted entity with `action: "ai_import_revert"`. Merges keep the field diff in the audit log for manual undo.
Production deploy notes
The Phase 1 migration `prisma/manual-sql/08_ai_import.sql` creates the trgm GIN indexes inside a transaction. On large entity tables (millions of rows in `Offer`, `Creative`, `Channel`, `Segment`, `QualificationRule`), prefer to skip those `CREATE INDEX` statements in the migration and run them separately with `CREATE INDEX CONCURRENTLY` to avoid the table-level write lock. (`CREATE INDEX CONCURRENTLY` cannot run inside a transaction block, which is why it must happen outside the migration.)
Local + small-tenant deployments are fine to apply the migration as-is (the trgm-index block is `to_regclass`-guarded, so it is safe to run before `prisma db push` materializes the entity tables in fresh environments).
Honest limits
- PDF parser: `pdfjs-dist` legacy build extracts text. Image-only pages (image ratio ≥ 0.40) are flagged for the vision fallback path; the vision-mode LLM call is wired in Phase 4 once the cost guard exists (it would otherwise be uncapped vision spend).
- PPTX parser: walks `ppt/slides/slide<N>.xml` via JSZip + fast-xml-parser. Animations + embedded charts lose their data; chart axis labels go through vision when the ratio triggers.
- Hallucinated citations: the post-extraction validator drops entities whose `sourceQuote` is not a substring of the cited page’s parsed text. Drops are logged and never reach the proposal card.
- Concurrent operator edits: the apply transaction holds row locks; concurrent edits queue. Long merges can exceed the default 30 s Prisma transaction timeout; split into multiple applies if you have hundreds of rows.
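The citation filter above reduces to a substring check against the cited page. A minimal sketch (assumed helper names; the real validator also logs each drop):

```typescript
// Assumed shape of a proposed entity's citation fields, per the extract schema.
interface ProposedEntity {
  sourcePageNumber: number;
  sourceQuote: string;
}

// Keep only entities whose sourceQuote literally appears in the parsed text
// of the cited page. A missing page (hallucinated page number) also drops.
function filterHallucinated<T extends ProposedEntity>(
  entities: T[],
  pageText: Map<number, string>, // 1-based page number -> parsed text
): T[] {
  return entities.filter((e) =>
    (pageText.get(e.sourcePageNumber) ?? "").includes(e.sourceQuote),
  );
}
```

Because the check is a literal substring match, a quote that the LLM paraphrased even slightly is dropped; that strictness is the point of the guard.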
Cost responsibility — final word
The platform never spends an LLM token on the customer’s behalf without an explicit operator action. Upload + dry-run are token-free for the customer’s AI provider. Extract is the only billable step, and it is gated by `maxTokensPerJob` + `monthlyTokenBudget`. Customers control their spend via tenant settings; KaireonAI does not subsidize it.