Overview
The Data module is the foundation of KaireonAI. It lets you connect to external data sources, define entity schemas that create real database tables, and build visual pipelines to transform and load data. Navigate to Data in the sidebar to access Connectors, Schemas, Pipelines, Sources, and Segments.
Connectors
Connectors define how KaireonAI reaches your external data. The supported connector types fall into six categories:
Cloud Storage
| Connector | Description | File Formats |
|---|---|---|
| Amazon S3 | AWS object storage with region selection | CSV, JSON, Parquet, Avro, ORC, TSV, XML |
| Google Cloud Storage | GCP object storage | CSV, JSON, Parquet |
| Azure Blob Storage | Azure object storage | CSV, JSON, Parquet |
| SFTP | Secure file transfer | CSV, JSON, Parquet |
Databases
| Connector | Description | Auth Methods |
|---|---|---|
| PostgreSQL | With CDC via logical replication | Username/password, SSL modes |
| MySQL / MariaDB | With optional SSL | Username/password |
| MongoDB | Atlas or self-hosted | Connection string |
Data Warehouses
| Connector | Description | Auth Methods |
|---|---|---|
| Snowflake | Full warehouse access | OAuth, Key-Pair Token |
| Databricks | Unity Catalog support | OAuth2, Personal Access Token |
| Google BigQuery | Multi-region and regional | Service account JSON |
| Amazon Redshift | Cluster and serverless | Username/password |
Streaming
| Connector | Description |
|---|---|
| Apache Kafka | Self-hosted with Schema Registry support |
| Confluent Cloud | Managed Kafka with Schema Registry |
| Amazon Kinesis | Real-time data streams (LATEST or TRIM_HORIZON) |
CRM / CDP
| Connector | Description |
|---|---|
| Salesforce | Objects sync with API versioning |
| HubSpot | Contacts, companies, deals |
| Segment | User profiles, events, audiences |
| Braze | Campaign performance, user engagement |
APIs & Files
| Connector | Description |
|---|---|
| REST API | GET/POST with pagination (offset, cursor, link) |
| Webhook | Inbound data with API key or bearer token auth |
| CSV File Upload | Direct upload with delimiter and header detection |
| Shopify | Orders, customers, products via Admin API |
| Stripe | Payments, customers, subscriptions, invoices |
| Mailchimp | Audiences, campaigns, engagement data |
Schemas
Schemas define your entity structure — customers, accounts, transactions, etc. When you create a schema, KaireonAI creates an actual PostgreSQL table with the fields you define.
Supported Field Types
| Type | PostgreSQL Mapping | Use Case |
|---|---|---|
| text | TEXT | Names, emails, free-form strings |
| integer | INTEGER | Counts, IDs, discrete values |
| bigint | BIGINT | Large numeric IDs |
| decimal | NUMERIC | Monetary values, precise decimals |
| float | DOUBLE PRECISION | Scores, percentages |
| boolean | BOOLEAN | Flags, toggles |
| date | DATE | Birth dates, start dates |
| timestamp | TIMESTAMPTZ | Event times, audit timestamps |
| json | JSONB | Nested/dynamic data |
| uuid | UUID | Unique identifiers |
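The field-type-to-PostgreSQL mapping above can be sketched as a small DDL generator. This is a hypothetical illustration of the mapping, not KaireonAI's actual implementation; the `customers` table and its fields are made up for the example:

```python
# Hypothetical sketch of the schema-to-PostgreSQL type mapping described above.
FIELD_TYPES = {
    "text": "TEXT", "integer": "INTEGER", "bigint": "BIGINT",
    "decimal": "NUMERIC", "float": "DOUBLE PRECISION", "boolean": "BOOLEAN",
    "date": "DATE", "timestamp": "TIMESTAMPTZ", "json": "JSONB", "uuid": "UUID",
}

def create_table_ddl(table, fields):
    """Render a CREATE TABLE statement from a {name: schema_type} mapping."""
    cols = ", ".join(f"{name} {FIELD_TYPES[ftype]}" for name, ftype in fields.items())
    return f"CREATE TABLE {table} ({cols});"

print(create_table_ddl("customers", {"id": "uuid", "email": "text", "ltv": "decimal"}))
# CREATE TABLE customers (id UUID, email TEXT, ltv NUMERIC);
```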
Schema References
You reference schemas throughout the platform:
- Enrichment stages in decision flows — load customer data at decision time
- Computed values — formulas that reference customer.* fields from schema tables
- Pipelines — as target destinations for ETL workflows
- Segments — define customer cohorts based on schema field conditions
Pipelines
Pipelines are visual ETL workflows built with a drag-and-drop flow editor. Each pipeline connects source nodes → transform nodes → target nodes.
Node Types
- Source nodes — Read from a connector (any of the types above)
- Transform nodes — Apply one of 14 data transformations (see below)
- Target nodes — Write to a schema table or external destination
Transform Types
KaireonAI provides 14 built-in transform types:
Rename Field
Rename columns to standardize naming conventions across data sources. Map source field names to target field names.
Cast Type
Convert data types between:
string, integer, bigint, float, numeric, boolean, date, timestamp, json, uuid.
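As a rough sketch of what a cast-type transform does, the following hypothetical Python snippet parses raw string values into a few of the target types listed above. The `CASTS` table and `cast_row` helper are illustrative assumptions, not KaireonAI API:

```python
# Hypothetical sketch of a cast-type transform: parse raw string values
# into target types (a few of the ten listed above, for illustration).
from datetime import date, datetime

CASTS = {
    "integer": int,
    "float": float,
    "boolean": lambda v: v.strip().lower() in ("true", "1", "yes"),
    "date": lambda v: date.fromisoformat(v),
    "timestamp": lambda v: datetime.fromisoformat(v),
}

def cast_row(row, types):
    """Cast each named field in `row` according to the `types` mapping."""
    return {k: CASTS[types[k]](v) if k in types else v for k, v in row.items()}

row = cast_row({"age": "42", "active": "true", "signup": "2024-01-15"},
               {"age": "integer", "active": "boolean", "signup": "date"})
# row == {"age": 42, "active": True, "signup": date(2024, 1, 15)}
```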
Expression
Compute new fields using SQL-like expressions with 50+ built-in functions across 5 categories:
| Category | Functions |
|---|---|
| String | UPPER, LOWER, TRIM, LTRIM, RTRIM, SUBSTRING, REPLACE, CONCAT, LENGTH, LEFT, RIGHT, SPLIT_PART |
| Numeric | ABS, CEIL, FLOOR, ROUND, MOD, POWER, SQRT, LOG, GREATEST |
| Date/Time | NOW, DATE_TRUNC, DATE_PART, AGE, DATE_ADD, DATE_SUB, TO_DATE, TO_TIMESTAMP, EXTRACT, DATE_DIFF |
| Type Cast | CAST, TO_CHAR, TO_NUMBER, TO_BOOLEAN, COALESCE, NULLIF, CASE WHEN |
| Conditional | CASE, COALESCE, NULLIF, IIF |
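To make the semantics concrete, here are hypothetical Python equivalents of two of the functions above, following their usual SQL behavior (SPLIT_PART is 1-indexed, as in PostgreSQL). These are illustrations, not KaireonAI's evaluator:

```python
# Hypothetical Python equivalents of two expression functions listed above.
def split_part(s, delimiter, n):
    """Return the n-th piece of s after splitting on delimiter (1-indexed)."""
    parts = s.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""

def coalesce(*values):
    """Return the first non-null argument, or None if all are null."""
    return next((v for v in values if v is not None), None)

print(split_part("us-east-1", "-", 2))   # east
print(coalesce(None, None, "fallback"))  # fallback
```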
Filter Rows
Keep only rows matching specified conditions. Supports operators:
=, !=, >, <, >=, <=, LIKE, IN, IS NULL, IS NOT NULL.
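A minimal sketch of row filtering with these operators, assuming rows are plain dicts (this operator table is an illustration, not the platform's implementation):

```python
# Hypothetical sketch of a Filter Rows transform over dict rows.
import operator

OPS = {
    "=": operator.eq, "!=": operator.ne, ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "IN": lambda a, b: a in b,
    "IS NULL": lambda a, _: a is None,
    "IS NOT NULL": lambda a, _: a is not None,
}

def filter_rows(rows, field, op, value=None):
    """Keep only rows where `row[field] <op> value` holds."""
    return [r for r in rows if OPS[op](r.get(field), value)]

rows = [{"score": 10}, {"score": None}, {"score": 75}]
non_null = filter_rows(rows, "score", "IS NOT NULL")
print(filter_rows(non_null, "score", ">", 50))  # [{'score': 75}]
```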
Drop Field
Remove unwanted columns from the data flow.
Add Field
Add a new column with a default value or computed expression.
Map Values
Replace values using a JSON lookup table. Useful for code-to-label mapping (e.g., "M" → "Male", "F" → "Female").
Split Field
Split a single field into multiple fields by a separator character (e.g., split full name into first/last).
Merge Fields
Concatenate multiple columns into a single field with an optional separator.
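The Split Field and Merge Fields transforms are mirror images; a hypothetical sketch of both (field names like `full_name` and `sort_key` are made up for the example):

```python
# Hypothetical sketches of the Split Field and Merge Fields transforms.
def split_field(row, field, separator, targets):
    """Split row[field] on separator into the named target fields."""
    parts = row[field].split(separator, len(targets) - 1)
    return {**row, **dict(zip(targets, parts))}

def merge_fields(row, fields, target, separator=" "):
    """Concatenate several columns into one field with a separator."""
    return {**row, target: separator.join(str(row[f]) for f in fields)}

row = split_field({"full_name": "Ada Lovelace"}, "full_name", " ",
                  ["first_name", "last_name"])
print(merge_fields(row, ["last_name", "first_name"], "sort_key", ", ")["sort_key"])
# Lovelace, Ada
```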
Deduplicate
Remove duplicate rows based on one or more key columns. Keeps the first occurrence.
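Keep-first deduplication on key columns can be sketched as follows (an illustration of the semantics described above, not the platform's code):

```python
# Hypothetical sketch of deduplication on key columns, keeping the first occurrence.
def deduplicate(rows, keys):
    seen, out = set(), []
    for row in rows:
        k = tuple(row.get(key) for key in keys)
        if k not in seen:       # first occurrence wins
            seen.add(k)
            out.append(row)
    return out

rows = [{"email": "a@x.com", "v": 1},
        {"email": "a@x.com", "v": 2},
        {"email": "b@x.com", "v": 3}]
print(deduplicate(rows, ["email"]))
# [{'email': 'a@x.com', 'v': 1}, {'email': 'b@x.com', 'v': 3}]
```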
Aggregate
GROUP BY one or more columns with aggregate functions:
SUM, COUNT, AVG, MIN, MAX.
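A minimal sketch of GROUP BY with SUM and COUNT over dict rows (field names `region` and `amount` are hypothetical):

```python
# Hypothetical sketch of an Aggregate transform: GROUP BY with SUM and COUNT.
from collections import defaultdict

def aggregate(rows, group_by, sum_field):
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        key = row[group_by]
        sums[key] += row[sum_field]
        counts[key] += 1
    return [{group_by: k, "sum": sums[k], "count": counts[k]} for k in sums]

rows = [{"region": "EU", "amount": 10.0},
        {"region": "EU", "amount": 5.0},
        {"region": "US", "amount": 7.0}]
print(aggregate(rows, "region", "amount"))
# [{'region': 'EU', 'sum': 15.0, 'count': 2}, {'region': 'US', 'sum': 7.0, 'count': 1}]
```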
Lookup / Join
LEFT JOIN with another schema table to enrich data. Specify the join key and which fields to pull from the lookup table.
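The LEFT JOIN semantics — unmatched rows kept, pulled fields left null — can be sketched with an index over the lookup table. The `orders`/`customers` data is made up for illustration:

```python
# Hypothetical sketch of the Lookup / Join transform: a LEFT JOIN against a
# lookup table, pulling selected fields; unmatched rows get None for them.
def lookup_join(rows, lookup_rows, key, pull_fields):
    index = {r[key]: r for r in lookup_rows}
    return [
        {**row, **{f: index.get(row.get(key), {}).get(f) for f in pull_fields}}
        for row in rows
    ]

orders = [{"customer_id": 1, "total": 99}, {"customer_id": 2, "total": 15}]
customers = [{"customer_id": 1, "tier": "gold"}]
print(lookup_join(orders, customers, "customer_id", ["tier"]))
# [{'customer_id': 1, 'total': 99, 'tier': 'gold'},
#  {'customer_id': 2, 'total': 15, 'tier': None}]
```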
Hash
Apply SHA-256 or SHA-512 hashing to fields for anonymization or deduplication keys.
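SHA-256/SHA-512 field hashing is straightforward with Python's standard-library `hashlib`; a sketch of the idea (the helper name and UTF-8 encoding choice are assumptions):

```python
# Sketch of SHA-256 field hashing for anonymization keys, using stdlib hashlib.
import hashlib

def hash_field(value, algorithm="sha256"):
    """Return the hex digest of the field value (sha256 or sha512)."""
    h = hashlib.new(algorithm)
    h.update(str(value).encode("utf-8"))
    return h.hexdigest()

digest = hash_field("alice@example.com")
print(len(digest))  # 64  (a SHA-256 hex digest is 64 characters)
```

Because the same input always yields the same digest, hashed fields remain usable as deduplication or join keys even after the raw value is discarded.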
Mask PII
Auto-detect and mask personally identifiable information, including SSNs, email addresses, phone numbers, and credit card numbers. Supports partial masking (e.g., ***-**-1234).
Execution Config
Pipelines support three execution modes:
| Mode | Description | Use Case |
|---|---|---|
| Batch | Process all records in configurable batch sizes | Full loads, daily syncs |
| Micro-Batch | Small frequent batches | Near real-time with controlled throughput |
| Streaming | Continuous record-by-record processing | Real-time event streams |
Configuration Options
| Setting | Default | Description |
|---|---|---|
| Batch Size | 10,000 | Records per batch (batch/micro-batch modes) |
| Parallelism | 1 | Concurrent workers (1–16) |
| Partitioning | None | Partition key for distributed processing |
| Error Handling | fail | skip (continue on error), fail (stop pipeline), dlq (route to dead-letter queue) |
| Scheduling | Manual | Cron expression for automatic runs |
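Batch mode with the error-handling options above can be sketched as slicing records into fixed-size batches. The `run_batches` helper is a hypothetical illustration of the semantics, not KaireonAI's executor, and the `dlq` path is elided:

```python
# Hypothetical sketch of batch-mode execution with skip/fail error handling.
def run_batches(records, process, batch_size=10_000, error_handling="fail"):
    """Process records in batches; 'skip' continues past failed batches."""
    results = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        try:
            results.append(process(batch))
        except Exception:
            if error_handling == "fail":
                raise          # stop the pipeline on the first bad batch
            # 'skip': drop the batch and continue (a real dlq would route it)
    return results

counts = run_batches(list(range(25)), len, batch_size=10)
print(counts)  # [10, 10, 5]
```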
Supported File Formats
When reading from file-based sources (S3, GCS, Azure Blob, SFTP, CSV Upload):
| Format | Extensions | Notes |
|---|---|---|
| CSV | .csv | Configurable delimiter, header detection |
| JSON | .json | Array of objects or nested |
| JSON Lines | .jsonl | One JSON object per line |
| Parquet | .parquet | Columnar, schema-preserving |
| Avro | .avro | Schema-embedded binary |
| ORC | .orc | Optimized row columnar |
| TSV | .tsv | Tab-separated values |
| XML | .xml | Structured markup |
Next Steps
Decisioning Studio
Set up offers, rules, and decision flows that use your data.
Core Concepts
Understand how data connects to decisioning and delivery.