

Status as of 2026-05-03 PM. The import endpoint, runtime, integrity check, audit trail, multi-input + out-of-band blob support, and scoring dispatch have all shipped. The pipeline runner branches on modelType === "onnx_imported" and awaits the async ONNX scoring path, failing soft to a score of 0.5 with a degraded-explanation marker when onnxruntime-node is not installed in the deployment image. Install the dep explicitly when a tenant enables ONNX BYO: npm install onnxruntime-node@^1.20.

Import endpoint

POST /api/v1/models/import — multipart form with these parts:
  • file (required): ONNX bytes (≤100 MB).
  • name (required): display name (the algorithm model name).
  • family (required): must be "onnx_imported" in V1.
  • featureNames (required): JSON-encoded string[] matching the model’s input shape.
  • featureDefaults (optional): JSON-encoded Record<string, number> for missing-input handling.
Auth: tenant + admin role. The bytes are persisted base64-encoded inside the model’s modelState.onnxBytesBase64 slot, along with a sha256 digest at bytesHashSha256 so the runner can verify integrity at scoring time.
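
For orientation, here is a minimal sketch of an import call from a Node/TypeScript admin script. The multipart part names mirror the table above; the base URL, Authorization header shape, file path, and example feature names are assumptions, not part of the documented contract.

```typescript
import { readFile } from "node:fs/promises";

// Sketch only: the part names follow the documented import contract; everything
// else (base URL, auth header, file path, feature names) is illustrative.
async function importOnnxModel(apiBase: string, adminToken: string): Promise<unknown> {
  const bytes = await readFile("./churn-model.onnx"); // hypothetical local model file

  const form = new FormData();
  form.append("file", new Blob([bytes]), "churn-model.onnx");          // ONNX bytes (≤100 MB)
  form.append("name", "Churn Risk v3");                                // display name
  form.append("family", "onnx_imported");                              // must be "onnx_imported" in V1
  form.append("featureNames", JSON.stringify(["tenure_days", "mrr", "ticket_count"]));
  form.append("featureDefaults", JSON.stringify({ ticket_count: 0 })); // optional

  const res = await fetch(`${apiBase}/api/v1/models/import`, {
    method: "POST",
    headers: { Authorization: `Bearer ${adminToken}` }, // tenant + admin role required
    body: form,
  });
  if (!res.ok) throw new Error(`import failed with status ${res.status}`);
  return res.json();
}
```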

Scoring

The ONNX scoring runner exposes a per-instance score function that loads the ONNX session lazily through Node’s dynamic-require shim. The runtime dep is not in the platform’s package.json by default — install it explicitly when a tenant enables ONNX BYO:
cd platform
npm install onnxruntime-node@^1.20
The scoring barrel exposes async wrappers for per-instance and batch ONNX scoring. The pipeline runner branches on modelType === "onnx_imported" at both per-candidate scoring sites (the formula PRIE path and the non-formula model fallback) and awaits the async path instead of the sync default scorer. When the runtime dep is missing, the ONNX scorer fails soft to a score of 0.5 and adds a degraded-explanation marker ({ engineType: "onnx_imported", degraded: true, reason: "onnx_runtime_missing" }) to the trace, so the operator can detect the missing-install state via audit-log analytics. A malformed-model-state error still bubbles up; that is a configuration bug and should surface in the import audit.
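
A condensed sketch of that branch-and-fail-soft behavior follows. The helper name scoreOnnxCandidate and the modelState field layout are illustrative assumptions; the 0.5 fallback, the degraded-explanation marker shape, and the scoring-time integrity check follow the documented behavior.

```typescript
import { createHash } from "node:crypto";

// Illustrative types; the platform's actual model-state shape may differ.
type OnnxModelState = { onnxBytesBase64: string; bytesHashSha256: string; featureNames: string[] };
type ExplanationMarker = { engineType: "onnx_imported"; degraded: boolean; reason?: string };

async function scoreOnnxCandidate(
  features: Record<string, number>,
  modelState: OnnxModelState,
): Promise<{ score: number; explanation: ExplanationMarker }> {
  let ort: any;
  try {
    // Lazy dynamic require: deployments without the opt-in dep still boot and serve /recommend.
    ort = require("onnxruntime-node");
  } catch {
    // Fail-soft path: neutral 0.5 score plus a detectable degraded-explanation marker.
    return {
      score: 0.5,
      explanation: { engineType: "onnx_imported", degraded: true, reason: "onnx_runtime_missing" },
    };
  }

  // Integrity check against the digest captured at import time; a mismatch is a
  // configuration bug, so it throws instead of failing soft.
  const bytes = Buffer.from(modelState.onnxBytesBase64, "base64");
  const digest = createHash("sha256").update(bytes).digest("hex");
  if (digest !== modelState.bytesHashSha256) {
    throw new Error("onnx model bytes failed integrity check");
  }

  const session = await ort.InferenceSession.create(bytes);
  const input = new ort.Tensor(
    "float32",
    Float32Array.from(modelState.featureNames.map((f) => features[f] ?? 0)),
    [1, modelState.featureNames.length],
  );
  const outputs = await session.run({ [session.inputNames[0]]: input });
  const first = outputs[session.outputNames[0]];
  return {
    score: Number(first.data[0]),
    explanation: { engineType: "onnx_imported", degraded: false },
  };
}
```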

Honest limits

  • Multi-input + out-of-band blob store supported (earlier releases were single-input only). Only float32, int64, and bool tensor dtypes are accepted; mixed-precision graphs and string tensors are rejected at load (see the dtype-check sketch after this list).
  • 100 MB cap per file. Larger payloads land in the out-of-band blob store automatically.
  • In-memory session cache; cache eviction relies on process restart. Keep V1 deployments to <50 ONNX models per tenant.
  • CPU-only inference. No GPU support.
  • The onnxruntime-node dep is opt-in. When absent, scoring degrades to 0.5 + degraded-explanation marker rather than throwing — the fail-soft path keeps /recommend responsive for tenants that don’t use ONNX BYO.
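
A minimal sketch of the dtype gate mentioned in the first bullet, assuming a hypothetical pre-parsed list of graph tensors; the real loader reads this information from the ONNX graph itself.

```typescript
// Illustrative dtype allow-list; the tensor descriptor shape is a stand-in
// for whatever the real loader extracts from the ONNX graph.
const ALLOWED_DTYPES = new Set(["float32", "int64", "bool"]);

function assertSupportedDtypes(tensors: Array<{ name: string; dtype: string }>): void {
  for (const t of tensors) {
    if (!ALLOWED_DTYPES.has(t.dtype)) {
      // Mixed-precision graphs and string tensors end up here and are rejected at load.
      throw new Error(`unsupported tensor dtype "${t.dtype}" on "${t.name}"`);
    }
  }
}
```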

Roadmap

  • Streaming-input models (multi-call session reuse) for transformer use cases.
  • GPU runtime via onnxruntime-node GPU build.

Audit trail

Every import writes an audit log row with action: "model_import_onnx", entityType: "algorithm_model", and changes: { family, bytesHashSha256, size, featureCount }. DSAR exports cite the import audit row and the model’s bytesHashSha256, and (when the runtime is installed at scoring time) trace each scoring decision back to the trace’s scoringResults[].modelType: "onnx_imported" entry.
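
For illustration, the shape of the audit row a single import produces; the literal values below are made up, and only the field names come from the documented schema.

```typescript
// Example ONNX-import audit row; values are illustrative.
const auditRow = {
  action: "model_import_onnx",
  entityType: "algorithm_model",
  changes: {
    family: "onnx_imported",
    bytesHashSha256: "<sha256 hex digest of the uploaded bytes>",
    size: 4_812_736, // uploaded file size in bytes
    featureCount: 3, // length of featureNames
  },
};
```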