
Overview

The Data module is the foundation of KaireonAI. It lets you connect to external data sources, define entity schemas that create real database tables, and build visual pipelines to transform and load data. Navigate to Data in the sidebar to access Connectors, Schemas, Pipelines, Sources, and Segments.

Connectors

Connectors define how KaireonAI reaches your external data. 24 connector types are supported across 6 categories:

Cloud Storage

| Connector | Description | File Formats |
|---|---|---|
| Amazon S3 | AWS object storage with region selection | CSV, JSON, Parquet, Avro, ORC, TSV, XML |
| Google Cloud Storage | GCP object storage | CSV, JSON, Parquet |
| Azure Blob Storage | Azure object storage | CSV, JSON, Parquet |
| SFTP | Secure file transfer | CSV, JSON, Parquet |

Databases

| Connector | Description | Auth Methods |
|---|---|---|
| PostgreSQL | With CDC via logical replication | Username/password, SSL modes |
| MySQL / MariaDB | With optional SSL | Username/password |
| MongoDB | Atlas or self-hosted | Connection string |

Data Warehouses

| Connector | Description | Auth Methods |
|---|---|---|
| Snowflake | Full warehouse access | OAuth, Key-Pair Token |
| Databricks | Unity Catalog support | OAuth2, Personal Access Token |
| Google BigQuery | Multi-region and regional | Service account JSON |
| Amazon Redshift | Cluster and serverless | Username/password |

Streaming

| Connector | Description |
|---|---|
| Apache Kafka | Self-hosted with Schema Registry support |
| Confluent Cloud | Managed Kafka with Schema Registry |
| Amazon Kinesis | Real-time data streams (LATEST or TRIM_HORIZON) |

CRM / CDP

| Connector | Description |
|---|---|
| Salesforce | Objects sync with API versioning |
| HubSpot | Contacts, companies, deals |
| Segment | User profiles, events, audiences |
| Braze | Campaign performance, user engagement |

APIs & Files

| Connector | Description |
|---|---|
| REST API | GET/POST with pagination (offset, cursor, link) |
| Webhook | Inbound data with API key or bearer token auth |
| CSV File Upload | Direct upload with delimiter and header detection |
| Shopify | Orders, customers, products via Admin API |
| Stripe | Payments, customers, subscriptions, invoices |
| Mailchimp | Audiences, campaigns, engagement data |

Each connector type has its own configuration form with connection fields, authentication options, and a Test Connection button to verify connectivity before saving. With connectors configured, you define schemas to structure the data they bring in.
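For intuition, offset pagination (one of the REST API connector's modes) can be sketched as follows. The `fetch_page` function and in-memory dataset are hypothetical stand-ins for a paged HTTP endpoint, not part of KaireonAI's API:

```python
# Sketch of offset-based pagination as a REST API connector would perform it.
# fetch_page() simulates an HTTP GET against a paged endpoint (?offset=&limit=).

DATA = list(range(23))  # stand-in for the remote dataset
PAGE_SIZE = 10

def fetch_page(offset, limit=PAGE_SIZE):
    """Return one page of records starting at `offset`."""
    return DATA[offset:offset + limit]

def fetch_all():
    records, offset = [], 0
    while True:
        page = fetch_page(offset)
        if not page:
            break  # an empty page signals the end of the data
        records.extend(page)
        offset += len(page)
    return records

print(len(fetch_all()))  # pages of 10, 10, and 3 records
```

Cursor and link pagination follow the same loop shape, but the next request is driven by a cursor token or `Link` header from the previous response rather than a numeric offset.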

Schemas

Schemas define your entity structure — customers, accounts, transactions, etc. When you create a schema, KaireonAI creates an actual PostgreSQL table with the fields you define.

Supported Field Types

| Type | PostgreSQL Mapping | Use Case |
|---|---|---|
| text | TEXT | Names, emails, free-form strings |
| integer | INTEGER | Counts, IDs, discrete values |
| bigint | BIGINT | Large numeric IDs |
| decimal | NUMERIC | Monetary values, precise decimals |
| float | DOUBLE PRECISION | Scores, percentages |
| boolean | BOOLEAN | Flags, toggles |
| date | DATE | Birth dates, start dates |
| timestamp | TIMESTAMPTZ | Event times, audit timestamps |
| json | JSONB | Nested/dynamic data |
| uuid | UUID | Unique identifiers |
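The type mapping above implies DDL like the following sketch. The function and the field-list shape are illustrative, not KaireonAI's internal schema format:

```python
# Sketch: build a CREATE TABLE statement from a schema definition, using the
# field-type -> PostgreSQL mapping documented above.

PG_TYPES = {
    "text": "TEXT", "integer": "INTEGER", "bigint": "BIGINT",
    "decimal": "NUMERIC", "float": "DOUBLE PRECISION", "boolean": "BOOLEAN",
    "date": "DATE", "timestamp": "TIMESTAMPTZ", "json": "JSONB", "uuid": "UUID",
}

def create_table_sql(table, fields):
    """fields is a list of (name, kaireon_type) pairs."""
    cols = ", ".join(f"{name} {PG_TYPES[ftype]}" for name, ftype in fields)
    return f"CREATE TABLE {table} ({cols});"

print(create_table_sql("customer_profile",
                       [("id", "uuid"), ("email", "text"), ("signup", "timestamp")]))
# CREATE TABLE customer_profile (id UUID, email TEXT, signup TIMESTAMPTZ);
```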

Schema References

You reference schemas throughout the platform:
  • Enrichment stages in decision flows — load customer data at decision time
  • Computed values — formulas that reference customer.* fields from schema tables
  • Pipelines — as target destinations for ETL workflows
  • Segments — define customer cohorts based on schema field conditions
Schema table names are auto-generated from the schema name. A schema called “Customer Profile” creates a PostgreSQL table named customer_profile.
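The name-to-table mapping can be sketched as a small normalization function. Only the "Customer Profile" → customer_profile example is documented; the exact normalization rules below (lowercasing, collapsing non-alphanumeric runs to underscores) are an assumption:

```python
import re

def table_name(schema_name):
    """Derive a PostgreSQL table name from a schema name (assumed rules:
    lowercase, non-alphanumeric runs collapsed to single underscores)."""
    return re.sub(r"[^a-z0-9]+", "_", schema_name.lower()).strip("_")

print(table_name("Customer Profile"))  # customer_profile
```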
Once your schemas are in place, you build pipelines to load data into them.

Pipelines

Pipelines are visual ETL workflows built with a drag-and-drop flow editor. Each pipeline connects source nodes → transform nodes → target nodes.

Node Types

  • Source nodes — Read from a connector (any of the connector types above)
  • Transform nodes — Apply one of 14 data transformations (see below)
  • Target nodes — Write to a schema table or external destination

Transform Types

KaireonAI provides 14 built-in transform types:

1. Rename columns to standardize naming conventions across data sources. Map source field names to target field names.
2. Convert data types between: string, integer, bigint, float, numeric, boolean, date, timestamp, json, uuid.
3. Compute new fields using SQL-like expressions with 50+ built-in functions across 5 categories:

   | Category | Functions |
   |---|---|
   | String | UPPER, LOWER, TRIM, LTRIM, RTRIM, SUBSTRING, REPLACE, CONCAT, LENGTH, LEFT, RIGHT, SPLIT_PART |
   | Numeric | ABS, CEIL, FLOOR, ROUND, MOD, POWER, SQRT, LOG, GREATEST |
   | Date/Time | NOW, DATE_TRUNC, DATE_PART, AGE, DATE_ADD, DATE_SUB, TO_DATE, TO_TIMESTAMP, EXTRACT, DATE_DIFF |
   | Type Cast | CAST, TO_CHAR, TO_NUMBER, TO_BOOLEAN, COALESCE, NULLIF, CASE WHEN, GREATEST |
   | Conditional | CASE, COALESCE, NULLIF, IIF |

4. Keep only rows matching specified conditions. Supports operators: =, !=, >, <, >=, <=, LIKE, IN, IS NULL, IS NOT NULL.
5. Remove unwanted columns from the data flow.
6. Add a new column with a default value or computed expression.
7. Replace values using a JSON lookup table. Useful for code-to-label mapping (e.g., "M" → "Male", "F" → "Female").
8. Split a single field into multiple fields by a separator character (e.g., split a full name into first/last).
9. Concatenate multiple columns into a single field with an optional separator.
10. Remove duplicate rows based on one or more key columns. Keeps the first occurrence.
11. GROUP BY one or more columns with aggregate functions: SUM, COUNT, AVG, MIN, MAX.
12. LEFT JOIN with another schema table to enrich data. Specify the join key and which fields to pull from the lookup table.
13. Apply SHA-256 or SHA-512 hashing to fields for anonymization or deduplication keys.
14. Auto-detect and mask personally identifiable information including SSNs, email addresses, phone numbers, and credit card numbers. Supports partial masking (e.g., ***-**-1234).
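The partial masking behavior can be sketched for SSNs. The regex and helper below are illustrative only; the real transform auto-detects several PII types, not just SSNs:

```python
import re

# Matches US SSNs in the form 123-45-6789.
SSN_RE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")

def mask_ssn(text):
    """Partially mask SSNs, keeping only the last four digits."""
    return SSN_RE.sub(r"***-**-\3", text)

print(mask_ssn("SSN on file: 123-45-1234"))  # SSN on file: ***-**-1234
```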

Execution Config

Pipelines support three execution modes:
| Mode | Description | Use Case |
|---|---|---|
| Batch | Process all records in configurable batch sizes | Full loads, daily syncs |
| Micro-Batch | Small frequent batches | Near real-time with controlled throughput |
| Streaming | Continuous record-by-record processing | Real-time event streams |
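How batch mode chunks records can be sketched in a few lines. The batch size here is deliberately tiny for demonstration; the function is illustrative, not KaireonAI's executor:

```python
def batches(records, batch_size):
    """Yield successive batches of at most `batch_size` records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# 25 records with batch_size=10 produce batches of 10, 10, and 5.
print([len(b) for b in batches(list(range(25)), 10)])
```

Micro-batch mode runs the same loop on small, frequent slices of new data, while streaming mode processes each record as it arrives.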

Configuration Options

| Setting | Default | Description |
|---|---|---|
| Batch Size | 10,000 | Records per batch (batch/micro-batch modes) |
| Parallelism | 1 | Concurrent workers (1–16) |
| Partitioning | None | Partition key for distributed processing |
| Error Handling | fail | skip (continue on error), fail (stop pipeline), dlq (route to dead-letter queue) |
| Scheduling | Manual | Cron expression for automatic runs |

Supported File Formats

When reading from file-based sources (S3, GCS, Azure Blob, SFTP, CSV Upload):
| Format | Extensions | Notes |
|---|---|---|
| CSV | .csv | Configurable delimiter, header detection |
| JSON | .json | Array of objects or nested |
| JSON Lines | .jsonl | One JSON object per line |
| Parquet | .parquet | Columnar, schema-preserving |
| Avro | .avro | Schema-embedded binary |
| ORC | .orc | Optimized row columnar |
| TSV | .tsv | Tab-separated values |
| XML | .xml | Structured markup |
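As a small illustration of the JSON Lines format, here is standard-library parsing of one object per line (the sample records are made up):

```python
import io
import json

# A .jsonl payload: one JSON object per line, as a file-based source reads it.
jsonl = io.StringIO('{"id": 1, "name": "Ada"}\n{"id": 2, "name": "Grace"}\n')

records = [json.loads(line) for line in jsonl if line.strip()]
print(records[1]["name"])  # Grace
```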

Next Steps

Decisioning Studio

Set up offers, rules, and decision flows that use your data.

Core Concepts

Understand how data connects to decisioning and delivery.