Overview

Auto-Segmentation analyzes your schema data to discover natural customer segments. Instead of manually defining segments based on assumptions, you select a schema and the AI identifies clusters of customers with similar characteristics.

Navigate to AI > Segments in the sidebar to view the segment insights dashboard, or trigger analysis programmatically via POST /api/v1/ai/analyze/segments with a schemaId in the request body.

The Segments dashboard page (/ai/segments) automatically runs a health check and extracts segment-related findings. It shows segment data health status and overall segmentation readiness, and refreshes every 5 minutes.
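As a rough illustration of the programmatic trigger, the sketch below builds the POST request using only the Python standard library. Only the path and the schemaId body field come from this page; the base URL and the schema ID are placeholders, and any authentication headers your deployment requires are not shown because this page does not specify them.

```python
import json
import urllib.request

# Assumption: placeholder host. Replace with your own deployment's base URL.
BASE_URL = "https://your-kaireon-host.example.com"


def build_segment_analysis_request(schema_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers segment analysis."""
    body = json.dumps({"schemaId": schema_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/ai/analyze/segments",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "schema_123" is a hypothetical schema ID used for illustration only.
request = build_segment_analysis_request("schema_123")
```

To actually send it, pass the request to urllib.request.urlopen after adding whatever credentials your environment expects.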

How It Works

  1. Select a schema
     Choose the schema table that contains the customer data you want to segment (e.g., customers, credit_profiles). The schema must have been created in the Data module and contain rows. The schemaId is required for segment analysis.
  2. Pick fields
     Select the fields to include in the segmentation analysis. Choose numeric and categorical fields that are likely to differentiate customer groups, for example age, income, total_spend, region, account_type.
     Fields are classified as either numeric (integer, float, decimal, number, bigint) or categorical (everything else). This classification determines which statistics are computed for the LLM prompt.
     Select 3 to 8 fields for the best results. Too few fields produce trivial segments; too many dilute the signal and slow analysis.
  3. Run discovery
     Click Discover Segments. The AI analyzes the field statistics and identifies clusters of customers with similar characteristics.
  4. Review segment cards
     The results appear as segment cards, each representing a discovered cluster. Review the cards to understand what makes each segment distinct.
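The numeric/categorical split described in the field-picking step can be sketched as follows. The list of numeric type names is taken from this page; the input shape (a mapping of field name to data type) and the case-insensitive comparison are assumptions made for the example.

```python
# Type names listed on this page as numeric; everything else is categorical.
NUMERIC_TYPES = {"integer", "float", "decimal", "number", "bigint"}


def classify_field(data_type: str) -> str:
    """Return "numeric" or "categorical" for a schema field's data type."""
    # Assumption: type names are compared case-insensitively.
    return "numeric" if data_type.strip().lower() in NUMERIC_TYPES else "categorical"


def classify_fields(fields: dict[str, str]) -> dict[str, list[str]]:
    """Group field names by classification, preserving input order."""
    groups: dict[str, list[str]] = {"numeric": [], "categorical": []}
    for name, data_type in fields.items():
        groups[classify_field(data_type)].append(name)
    return groups


# Hypothetical field selection for illustration.
selected = {
    "age": "integer",
    "income": "decimal",
    "region": "string",
    "account_type": "string",
    "total_spend": "float",
}
groups = classify_fields(selected)
```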

Dual-Tier Analysis

| Tier | When Used | Method |
|---|---|---|
| LLM | Default, or when ML Worker is unavailable | Builds a field statistics summary (min/max/avg for numeric, distinct/topValues for categorical) and uses generateObject() with the configured LLM to propose segment definitions |
| ML Worker | When connected and dataset exceeds 5,000 rows | Submits an async job to the Python ML Worker, which runs K-Means clustering on the full dataset with silhouette scoring to determine the optimal number of clusters |
The ML Worker tier is more accurate because it processes the entire dataset and uses statistical validation (silhouette score) to find the natural number of clusters rather than relying on pattern matching.
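To make the LLM tier's input concrete, here is a minimal sketch of a field statistics summary with min/max/avg for numeric fields and distinct/topValues for categorical fields. The key names follow the table above; the function names, the handling of missing values, and the number of top values kept are assumptions, not the product's actual implementation.

```python
from collections import Counter


def numeric_summary(values):
    """Summarize a numeric column as min, max, and average."""
    present = [v for v in values if v is not None]  # assumption: skip missing values
    if not present:
        return {"min": None, "max": None, "avg": None}
    return {"min": min(present), "max": max(present), "avg": sum(present) / len(present)}


def categorical_summary(values, top_n=3):
    """Summarize a categorical column as distinct count and most frequent values."""
    counts = Counter(v for v in values if v is not None)
    return {
        "distinct": len(counts),
        "topValues": [value for value, _ in counts.most_common(top_n)],
    }


# Hypothetical rows for illustration.
rows = [
    {"total_spend": 120.0, "region": "west"},
    {"total_spend": 480.0, "region": "east"},
    {"total_spend": 900.0, "region": "west"},
]
summary = {
    "total_spend": numeric_summary([r["total_spend"] for r in rows]),
    "region": categorical_summary([r["region"] for r in rows]),
}
```

A summary like this is far smaller than the raw rows, which is why the LLM tier can miss patterns that only show up in the full dataset.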

Heuristic Fallback

If the LLM call fails, the system falls back to a deterministic heuristic segmentation:
  1. Finds the numeric field with the widest range (max - min)
  2. Splits into three segments at the 33rd and 67th percentiles: Low, Mid, and High
  3. Each segment gets filter rules, characteristics, and suggested marketing use cases
If no numeric fields exist, a single “All Customers” segment is returned with a message suggesting you add numeric attributes.
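The fallback steps above can be sketched as follows. This is an illustration under stated assumptions: the nearest-rank percentile method, the choice of lt/gte at the boundaries, and the input shape (field name mapped to a list of values) are all guesses, and characteristics and suggested uses are omitted for brevity.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def heuristic_segments(numeric_columns):
    """Split customers into Low/Mid/High on the widest-range numeric field."""
    usable = {name: vals for name, vals in numeric_columns.items() if vals}
    if not usable:
        # No numeric fields: return a single catch-all segment.
        return [{"name": "All Customers", "filterRules": []}]

    # Step 1: pick the field with the widest range (max - min).
    field = max(usable, key=lambda name: max(usable[name]) - min(usable[name]))

    # Step 2: cut at the 33rd and 67th percentiles.
    ordered = sorted(usable[field])
    p33, p67 = percentile(ordered, 33), percentile(ordered, 67)

    # Step 3: attach machine-readable filter rules to each segment.
    return [
        {"name": "Low", "filterRules": [{"field": field, "operator": "lt", "value": p33}]},
        {"name": "Mid", "filterRules": [
            {"field": field, "operator": "gte", "value": p33},
            {"field": field, "operator": "lt", "value": p67},
        ]},
        {"name": "High", "filterRules": [{"field": field, "operator": "gte", "value": p67}]},
    ]


# Hypothetical data: total_spend has the wider range, so it is chosen.
segments = heuristic_segments({
    "age": [22, 35, 41, 58, 63, 29],
    "total_spend": [100, 250, 400, 550, 700, 850, 1000, 1150, 1300],
})
```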

Segment Card Format

Each segment card (a SegmentResult object) displays:
| Field | Type | Description |
|---|---|---|
| name | string | AI-generated descriptive name (e.g., “High-Value Urban Professionals”) |
| description | string | 1-2 sentence description of who belongs to this segment |
| size | number | Estimated number of customers in the segment |
| percentage | number | Percentage of total customers (0-100) |
| filterRules | array | Machine-readable filter rules with field, operator (eq, gt, lt, gte, lte, in, contains), and value |
| characteristics | array | Key distinguishing features, each with feature (field name), value (description), and importance (0-1 score) |
| suggestedUse | string | AI-generated recommendation for how to target this segment |
Example card:
High-Value Loyalists — 2,340 customers (18%)
  • Average spend: $1,250 (vs. $480 population avg)
  • Average tenure: 4.2 years (vs. 1.8 population avg)
  • Primary channel: Email (72%)
  • Suggested: Premium offers, loyalty rewards, lower contact frequency
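For readers consuming segment cards in code, the SegmentResult shape can be modeled roughly as below. The field names and the operator list come from the table above; the Python class layout, the validation, and every value in the example instance other than the name, size, and percentage are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

ALLOWED_OPERATORS = {"eq", "gt", "lt", "gte", "lte", "in", "contains"}


@dataclass
class FilterRule:
    field: str
    operator: str
    value: Any

    def __post_init__(self):
        # Reject operators that are not in the documented list.
        if self.operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {self.operator}")


@dataclass
class Characteristic:
    feature: str       # field name
    value: str         # human-readable description
    importance: float  # score between 0 and 1


@dataclass
class SegmentResult:
    name: str
    description: str
    size: int
    percentage: float
    filterRules: list[FilterRule]
    characteristics: list[Characteristic]
    suggestedUse: str


# Hypothetical instance loosely mirroring the example card.
card = SegmentResult(
    name="High-Value Loyalists",
    description="Long-tenured customers with above-average spend.",
    size=2340,
    percentage=18,
    filterRules=[FilterRule("total_spend", "gte", 1000)],
    characteristics=[Characteristic("total_spend", "Average spend $1,250", 0.9)],
    suggestedUse="Premium offers, loyalty rewards, lower contact frequency",
)
```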

Applying Segments

When you click Apply on a segment:
  1. A recommendation is created in the AI Insights Dashboard
  2. The recommendation includes the segment definition (filter rules)
  3. Applying from Insights creates a draft qualification rule that identifies customers matching the segment criteria
You can use these qualification rules in Decision Flows to target specific segments with tailored offers.
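To show what "customers matching the segment criteria" might mean in practice, the sketch below evaluates a list of filter rules against a customer record. It assumes that all rules must hold (logical AND), that a missing field fails the rule, and that contains means substring or membership; none of these semantics are confirmed by this page.

```python
import operator

# Map each documented operator name to a comparison function.
COMPARATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
    "in": lambda actual, expected: actual in expected,
    "contains": lambda actual, expected: expected in actual,
}


def matches_segment(customer, filter_rules):
    """Return True when the customer satisfies every filter rule."""
    for rule in filter_rules:
        actual = customer.get(rule["field"])
        if actual is None:
            return False  # assumption: a missing field never matches
        if not COMPARATORS[rule["operator"]](actual, rule["value"]):
            return False
    return True


# Hypothetical rules and customers for illustration.
rules = [
    {"field": "total_spend", "operator": "gte", "value": 1000},
    {"field": "region", "operator": "in", "value": ["west", "east"]},
]
customers = [
    {"id": 1, "total_spend": 1250, "region": "west"},
    {"id": 2, "total_spend": 480, "region": "west"},
    {"id": 3, "total_spend": 2000, "region": "north"},
]
matched_ids = [c["id"] for c in customers if matches_segment(c, rules)]
```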

Field Selection Tips

  • Numeric fields work best — Income, age, spend, tenure, and score fields produce clearer clusters
  • Limit categorical fields — High-cardinality categoricals (e.g., zip code) add noise. Prefer broad categories like region or account type
  • Exclude IDs — Do not include customer_id, email, or other unique identifiers as segmentation fields
  • Include behavioral data — If you have interaction summaries (total_clicks, avg_response_rate), include them for behaviorally meaningful segments
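One way to apply the tips on identifiers and high-cardinality categoricals is to check how many distinct values each categorical field has relative to its row count. The sketch below is a rough pre-screening aid, not a product feature; the 0.5 threshold and the function name are arbitrary assumptions.

```python
def screen_categorical_fields(columns, max_distinct_ratio=0.5):
    """Split categorical fields into (keep, review) by distinct-value ratio.

    Fields whose share of distinct values exceeds the threshold look like
    identifiers or high-cardinality categoricals and are flagged for review.
    """
    keep, review = [], []
    for name, values in columns.items():
        ratio = len(set(values)) / len(values) if values else 0.0
        (review if ratio > max_distinct_ratio else keep).append(name)
    return keep, review


# Hypothetical sample of categorical columns.
keep, review = screen_categorical_fields({
    "customer_id": ["c1", "c2", "c3", "c4", "c5", "c6"],
    "region": ["west", "east", "west", "west", "east", "west"],
    "zip_code": ["94107", "10001", "60601", "73301", "94107", "30301"],
})
```

Apply a check like this only to categorical fields: continuous numeric fields such as income naturally have many distinct values and should not be excluded on that basis.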

Advanced Parameters

Each segmentation run can be fine-tuned using the Advanced Parameters panel on the segmentation page. Expand the panel to adjust:
| Parameter | Min | Max | Default | Description |
|---|---|---|---|---|
| Min Clusters | 2 | 20 | 2 | Fewest groups to split customers into |
| Max Clusters | 2 | 50 | 8 | Most groups customers can be split into |
| Algorithm | | | kmeans | Clustering method: kmeans (centroid-based, fast) or dbscan (density-based, handles outliers) |
| Included Features | | | null (all) | Which attributes to consider. Null = all numeric features |
Per-run overrides do not change your saved tenant configuration. To change the organization-wide defaults, go to AI Configuration.
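The relationship between saved defaults and per-run overrides can be pictured as a non-mutating merge followed by range checks. In the sketch below, the limits and defaults come from the table above, while the key names (minClusters, maxClusters, algorithm, includedFeatures) and the rule that the minimum may not exceed the maximum are assumptions.

```python
# Defaults and limits from the Advanced Parameters table; key names are hypothetical.
TENANT_DEFAULTS = {
    "minClusters": 2,
    "maxClusters": 8,
    "algorithm": "kmeans",
    "includedFeatures": None,  # None means all numeric features
}
LIMITS = {"minClusters": (2, 20), "maxClusters": (2, 50)}
ALGORITHMS = {"kmeans", "dbscan"}


def resolve_run_params(overrides=None):
    """Merge per-run overrides over the defaults without mutating the defaults."""
    params = {**TENANT_DEFAULTS, **(overrides or {})}
    for key, (low, high) in LIMITS.items():
        if not low <= params[key] <= high:
            raise ValueError(f"{key} must be between {low} and {high}")
    if params["minClusters"] > params["maxClusters"]:
        raise ValueError("minClusters cannot exceed maxClusters")  # assumed rule
    if params["algorithm"] not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {params['algorithm']}")
    return params


run_params = resolve_run_params({"maxClusters": 12, "algorithm": "dbscan"})
```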

Large Dataset Warning

When the selected schema contains 5,000 or more rows, a confirmation dialog appears before analysis begins. The dialog shows:
  • Accuracy comparison — ML Worker uses K-Means with silhouette scoring on the full dataset vs. LLM analysis of field statistics
  • Estimated cost — Token count and approximate cost if proceeding with LLM
  • Speed comparison — ML Worker processes locally in seconds vs. LLM round-trip
You can choose Use ML Worker (recommended for large datasets) or Proceed with LLM (uses field statistics summary).
For datasets over 5,000 rows, the ML Worker is strongly recommended. LLM-based analysis works from summarized field statistics rather than raw data, which may miss important patterns. See ML Worker Setup for deployment instructions.

Next Steps

  • AI Configuration: Configure default segmentation parameters.
  • AI Insights Dashboard: View and apply segmentation recommendations.
  • Smart Policy Recommender: Optimize contact frequency policies.