## Overview

KaireonAI’s AI features (chat assistant, insights, content intelligence, rule builder) can run against any OpenAI-compatible LLM endpoint. This guide covers self-hosted options for environments where external API calls are not permitted.

## Quick Start with Ollama
Ollama is the fastest way to run a local LLM. It runs on Mac, Linux, and Windows.

### 1. Install Ollama
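On Linux, Ollama provides an official install script (macOS and Windows users can download the installer from ollama.com):

```shell
# Download and run the official Ollama install script (Linux)
curl -fsSL https://ollama.com/install.sh | sh
```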
### 2. Pull a Model
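For example, to pull the model used in this guide:

```shell
# Download the qwen2.5 7B model to the local Ollama model store
ollama pull qwen2.5:7b
```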
### 3. Start Ollama

Ollama serves its API at http://localhost:11434 by default.
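If Ollama is not already running as a background service, start it manually:

```shell
# Start the Ollama server (listens on localhost:11434 by default)
ollama serve
```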
### 4. Configure in KaireonAI

Navigate to Settings > AI Configuration and set:

| Setting | Value |
|---|---|
| Provider | ollama |
| Model | qwen2.5:7b (or your chosen model) |
| Base URL | http://localhost:11434 |
| API Key | (leave empty for Ollama) |
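Before saving, you can verify the endpoint is reachable with a quick request against Ollama's OpenAI-compatible API (the model name is the one pulled above):

```shell
# Smoke-test the local endpoint; a working setup returns a JSON chat completion
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2.5:7b",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'
```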
## Other Self-Hosted Options

### vLLM (GPU Server)
Best for production deployments with GPU instances.

- Provider: `openai` (vLLM is OpenAI-compatible)
- Base URL: `http://your-gpu-server:8000/v1`
- Model: `meta-llama/Llama-3.1-8B-Instruct`
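A minimal launch command, assuming vLLM is installed on the GPU server (tuning flags for your hardware are omitted):

```shell
# Serve Llama 3.1 8B Instruct via vLLM's OpenAI-compatible server on port 8000
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
```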
### LM Studio (Desktop)
Download from lmstudio.ai, load a model, and start the local server. Configure in KaireonAI:

- Provider: `lmstudio`
- Base URL: `http://localhost:1234/v1`
- Model: (auto-detected)
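To confirm the LM Studio server is running and see which model it has loaded, query its OpenAI-compatible models endpoint:

```shell
# Lists the model(s) currently served by LM Studio
curl http://localhost:1234/v1/models
```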
### HuggingFace Text Generation Inference

TGI exposes an OpenAI-compatible endpoint, so configure Provider `openai` with Base URL `http://localhost:8080/v1`.
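A sketch of launching TGI via Docker (the image tag and model ID here are illustrative; TGI listens on port 80 inside the container, mapped to 8080 on the host):

```shell
# Run TGI and expose it on localhost:8080
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Llama-3.1-8B-Instruct
```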
## Supported Providers
| Provider | Tool Calling | Streaming | Local | Cloud |
|---|---|---|---|---|
| Google (Gemini) | Yes | Yes | No | Yes |
| OpenAI (GPT) | Yes | Yes | No | Yes |
| Anthropic (Claude) | Yes | Yes | No | Yes |
| Ollama | Yes (qwen2.5, llama3.1) | Yes | Yes | No |
| LM Studio | Partial | Yes | Yes | No |
| vLLM | Yes | Yes | Yes | Yes |
| AWS Bedrock | Yes | Yes | No | Yes |
## Bring Your Own Key (BYOK)
On the KaireonAI Playground (playground.kaireonai.com), each registered user can configure their own LLM provider:
- Go to Settings > AI Configuration
- Select your preferred provider
- Enter your API key (encrypted at rest, never shared)
- Your key is scoped to your tenant only
API keys are encrypted using AES-256 before storage. They are never returned in API responses; only `****` masking is shown. Keys can be rotated at any time without affecting other tenants.

## Model Recommendations
| Use Case | Model | RAM Required | Notes |
|---|---|---|---|
| Dev/testing | qwen2.5:7b | 8GB | Fast, good tool calling |
| Demo | llama3.1 | 8GB | Good general quality |
| Production (self-hosted) | qwen2.5:14b | 16GB | Best quality/speed balance |
| Enterprise | llama3.1:70b | 64GB | Near-cloud quality |
| Cloud (no infra) | gemini-2.5-flash | N/A | Free tier: 20 req/min |