Overview

Auto-Segmentation analyzes your schema data to discover natural customer segments. Instead of manually defining segments based on assumptions, you select a schema and a set of fields, and the AI identifies clusters of customers with similar characteristics. Navigate to AI > Auto-Segmentation in the sidebar.

How It Works

1. Select a schema. Choose the schema table that contains the customer data you want to segment (e.g., customers, credit_profiles). The schema must have been created in the Data module and contain rows.
2. Pick fields. Select the fields to include in the segmentation analysis. Choose numeric and categorical fields that are likely to differentiate customer groups — for example, age, income, total_spend, region, account_type. Select 3 to 8 fields for the best results: too few fields produce trivial segments; too many dilute the signal and slow analysis.
3. Run discovery. Click Discover Segments. The AI analyzes the data and identifies clusters of customers with similar characteristics.
4. Review segment cards. The results appear as segment cards, each representing a discovered cluster. Review the cards to understand what makes each segment distinct.

Dual-Tier Analysis

| Tier | When Used | Method |
| --- | --- | --- |
| LLM | Default, or when the ML Worker is unavailable | Samples up to 1,000 rows and uses the LLM to identify patterns and propose segment definitions |
| ML Worker | When connected and the dataset exceeds 5,000 rows | Runs K-Means clustering on the full dataset with silhouette scoring to determine the optimal number of clusters |
The ML Worker tier is more accurate because it processes the entire dataset and uses statistical validation (silhouette score) to find the natural number of clusters rather than guessing.
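
The silhouette-based selection the ML Worker performs can be illustrated with a toy, dependency-free sketch. The helper names (`kmeans_1d`, `silhouette`) and the 1-D spend data are illustrative only, not the product's implementation:

```python
def kmeans_1d(values, k, iters=20):
    """Toy 1-D K-Means with deterministic, spread-out initial centroids."""
    vs = sorted(values)
    centroids = [vs[i * (len(vs) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centroid, then recompute centroids.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels

def silhouette(values, labels):
    """Mean silhouette score: (b - a) / max(a, b) averaged over all points."""
    total = 0.0
    for i, v in enumerate(values):
        same = [abs(v - w) for j, (w, l) in enumerate(zip(values, labels))
                if l == labels[i] and j != i]
        a = sum(same) / len(same) if same else 0.0
        others = set(labels) - {labels[i]}
        # b = smallest mean distance to any other cluster
        b = min(sum(abs(v - w) for w, l in zip(values, labels) if l == c)
                / sum(1 for l in labels if l == c) for c in others)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(values)

# Two well-separated spend groups: silhouette should favor k=2.
spend = [100, 110, 120, 130, 900, 910, 920, 930]
scores = {k: silhouette(spend, kmeans_1d(spend, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)  # → 2 for this data
```

Trying each k in a range and keeping the silhouette-maximizing value is what "statistical validation to find the natural number of clusters" means in practice.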

Interpreting Segment Cards

Each segment card displays:
  • Segment name — An AI-generated descriptive name (e.g., “High-Value Loyalists”, “Price-Sensitive New Users”)
  • Size — Number of customers in the segment and percentage of total
  • Key characteristics — The defining attributes of the segment, showing how its averages compare to the population
  • Distinguishing features — Which fields most differentiate this segment from others
  • Suggested actions — AI-generated recommendations for how to target this segment
Example card:
High-Value Loyalists — 2,340 customers (18%)
  • Average spend: $1,250 (vs. $480 population avg)
  • Average tenure: 4.2 years (vs. 1.8 population avg)
  • Primary channel: Email (72%)
  • Suggested: Premium offers, loyalty rewards, lower contact frequency

Applying Segments

When you click Apply on a segment:
  1. A recommendation is created in the AI Insights dashboard
  2. The recommendation includes the segment definition (field conditions)
  3. Applying from Insights creates a draft qualification rule that identifies customers matching the segment criteria
You can use these qualification rules in decision flows to target specific segments with tailored offers.
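
As a sketch of how a segment definition's field conditions might translate into a matching rule — the `segment` dict shape and the `qualifies` helper here are hypothetical, since the product's actual rule format is internal:

```python
# Hypothetical segment definition: a list of field conditions, as a
# recommendation might encode them (illustrative shape, not the real schema).
segment = {
    "name": "High-Value Loyalists",
    "conditions": [
        {"field": "total_spend", "op": ">=", "value": 1000},
        {"field": "tenure_years", "op": ">=", "value": 3},
    ],
}

OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
}

def qualifies(customer, segment):
    """True when the customer satisfies every condition (AND semantics)."""
    return all(OPS[c["op"]](customer[c["field"]], c["value"])
               for c in segment["conditions"])

loyalist = {"total_spend": 1250, "tenure_years": 4.2}
newcomer = {"total_spend": 480, "tenure_years": 1.8}
```

A decision flow would evaluate such a rule per customer to route them to segment-specific offers.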

Field Selection Tips

  • Numeric fields work best — Income, age, spend, tenure, and score fields produce clearer clusters
  • Limit categorical fields — High-cardinality categoricals (e.g., zip code) add noise. Prefer broad categories like region or account type
  • Exclude IDs — Do not include customer_id, email, or other unique identifiers as segmentation fields
  • Include behavioral data — If you have interaction summaries (total_clicks, avg_response_rate), include them for behaviorally meaningful segments
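
The tips above can be expressed as a rough programmatic filter. The `pick_segmentation_fields` helper and its cardinality threshold are illustrative heuristics, not the product's selection logic:

```python
def pick_segmentation_fields(rows, max_cardinality=20):
    """Heuristic field filter: drop ID-like fields (every value unique)
    and high-cardinality categoricals; keep numerics and broad categories."""
    keep = []
    for field in rows[0].keys():
        values = [r[field] for r in rows]
        distinct = len(set(values))
        if distinct == len(rows):
            continue  # looks like a unique identifier -> exclude
        if isinstance(values[0], (int, float)):
            keep.append(field)  # numeric fields work best
        elif distinct <= max_cardinality:
            keep.append(field)  # broad categorical (e.g., region)
    return keep

rows = [
    {"customer_id": 1, "income": 52000, "region": "West", "email": "a@x.com"},
    {"customer_id": 2, "income": 61000, "region": "East", "email": "b@x.com"},
    {"customer_id": 3, "income": 52000, "region": "West", "email": "c@x.com"},
]
fields = pick_segmentation_fields(rows)  # income and region survive
```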

Advanced Parameters

Each segmentation run can be fine-tuned using the Advanced Parameters panel on the segmentation page. Expand the panel to adjust:
| Parameter | Default | Description |
| --- | --- | --- |
| Min Clusters | 2 | Fewest groups to split customers into |
| Max Clusters | 8 | Most groups customers can be split into |
| Algorithm | kmeans | Clustering method (kmeans or dbscan) |
| Included Features | All | Which attributes to consider (null = all) |
Per-run overrides do not change your saved tenant configuration. To change the organization-wide defaults, go to AI Configuration.
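
A per-run override might be assembled like this. The `build_run_config` helper and parameter keys are hypothetical, mirroring the table above rather than a documented API:

```python
# Tenant defaults mirroring the Advanced Parameters table (illustrative keys).
defaults = {
    "min_clusters": 2,
    "max_clusters": 8,
    "algorithm": "kmeans",
    "included_features": None,  # null = consider all attributes
}

def build_run_config(**overrides):
    """Merge per-run overrides onto defaults without mutating the defaults."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    return {**defaults, **overrides}

run = build_run_config(algorithm="dbscan",
                       included_features=["income", "total_spend"])
```

Note that `defaults` is left untouched, matching the rule that per-run overrides never change the saved tenant configuration.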

Large Dataset Warning

When the selected schema contains 5,000 or more rows, a confirmation dialog appears before analysis begins. The dialog shows:
  • Accuracy comparison — ML Worker uses K-Means with silhouette scoring on the full dataset vs. LLM sampling
  • Estimated cost — Token count and approximate cost if proceeding with LLM
  • Speed comparison — ML Worker processes locally in seconds vs. LLM round-trip
You can choose Use ML Worker (recommended for large datasets) or Proceed with LLM (uses sampled data).
For datasets over 5,000 rows, the ML Worker is strongly recommended. LLM-based analysis samples up to 1,000 rows, which may miss important patterns. See ML Worker Setup for deployment instructions.
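
The LLM tier's 1,000-row cap amounts to a uniform sample, which is why large datasets risk missing patterns. A minimal sketch (the `sample_rows` helper is illustrative; the product's actual sampling strategy isn't documented):

```python
import random

def sample_rows(rows, cap=1000, seed=0):
    """Return at most `cap` rows, sampled uniformly when the dataset
    exceeds the cap; small datasets pass through untouched."""
    if len(rows) <= cap:
        return list(rows)
    return random.Random(seed).sample(rows, cap)

sampled = sample_rows(list(range(6000)))  # a 6,000-row dataset gets capped
```

A small cluster (say, 2% of 6,000 rows) lands only ~20 times in such a sample, so the ML Worker's full-dataset pass is the safer choice at this scale.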

Next Steps