
Overview

The Data module is the foundation of KaireonAI. It lets you connect to external data sources, define entity schemas that create real database tables, and build visual pipelines to transform and load data. Navigate to Data in the sidebar to access Connectors, Schemas, Pipelines, Sources, and Segments.

Connectors

Connectors define how KaireonAI reaches your external data. 24 connector types are supported across 6 categories:

Cloud Storage

| Connector | Description | File Formats |
|---|---|---|
| Amazon S3 | AWS object storage with region selection | CSV, JSON, Parquet, Avro, ORC, TSV, XML |
| Google Cloud Storage | GCP object storage | CSV, JSON, Parquet |
| Azure Blob Storage | Azure object storage | CSV, JSON, Parquet |
| SFTP | Secure file transfer | CSV, JSON, Parquet |

Databases

| Connector | Description | Auth Methods |
|---|---|---|
| PostgreSQL | With CDC via logical replication | Username/password, SSL modes |
| MySQL / MariaDB | With optional SSL | Username/password |
| MongoDB | Atlas or self-hosted | Connection string |

Data Warehouses

| Connector | Description | Auth Methods |
|---|---|---|
| Snowflake | Full warehouse access | OAuth, Key-Pair Token |
| Databricks | Unity Catalog support | OAuth2, Personal Access Token |
| Google BigQuery | Multi-region and regional | Service account JSON |
| Amazon Redshift | Cluster and serverless | Username/password |

Streaming

| Connector | Description |
|---|---|
| Apache Kafka | Self-hosted with Schema Registry support |
| Confluent Cloud | Managed Kafka with Schema Registry |
| Amazon Kinesis | Real-time data streams (LATEST or TRIM_HORIZON) |

CRM / CDP

| Connector | Description |
|---|---|
| Salesforce | Objects sync with API versioning |
| HubSpot | Contacts, companies, deals |
| Segment | User profiles, events, audiences |
| Braze | Campaign performance, user engagement |

APIs & Files

| Connector | Description |
|---|---|
| REST API | GET/POST with pagination (offset, cursor, link) |
| Webhook | Inbound data with API key or bearer token auth |
| CSV File Upload | Direct upload with delimiter and header detection |
| Shopify | Orders, customers, products via Admin API |
| Stripe | Payments, customers, subscriptions, invoices |
| Mailchimp | Audiences, campaigns, engagement data |

Each connector type has its own configuration form with connection fields, authentication options, and a Test Connection button to verify connectivity before saving. With connectors configured, you define schemas to structure the data they bring in.
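For intuition, offset pagination (one of the REST API connector's modes) can be sketched as follows. The `fetch_page` function and in-memory dataset are hypothetical stand-ins for a paged HTTP endpoint, not part of KaireonAI's API:

```python
# Sketch of offset-based pagination as a REST API connector would perform it.
# fetch_page() simulates an HTTP GET against a paged endpoint (?offset=&limit=).

DATA = list(range(23))  # stand-in for the remote dataset
PAGE_SIZE = 10

def fetch_page(offset, limit=PAGE_SIZE):
    """Return one page of records starting at `offset`."""
    return DATA[offset:offset + limit]

def fetch_all():
    records, offset = [], 0
    while True:
        page = fetch_page(offset)
        if not page:
            break  # an empty page signals the end of the data
        records.extend(page)
        offset += len(page)
    return records

print(len(fetch_all()))  # pages of 10, 10, and 3 records
```

Cursor and link pagination follow the same loop shape, but the next request is driven by a cursor token or `Link` header from the previous response rather than a numeric offset.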

Schemas

Schemas define your entity structure — customers, accounts, transactions, etc. When you create a schema, KaireonAI creates an actual PostgreSQL table with the fields you define.

Supported Field Types

| Type | PostgreSQL Mapping | Use Case |
|---|---|---|
| text | TEXT | Names, emails, free-form strings |
| integer | INTEGER | Counts, IDs, discrete values |
| bigint | BIGINT | Large numeric IDs |
| decimal | NUMERIC | Monetary values, precise decimals |
| float | DOUBLE PRECISION | Scores, percentages |
| boolean | BOOLEAN | Flags, toggles |
| date | DATE | Birth dates, start dates |
| timestamp | TIMESTAMPTZ | Event times, audit timestamps |
| json | JSONB | Nested/dynamic data |
| uuid | UUID | Unique identifiers |
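The type mapping above implies DDL like the following sketch. The function and the field-list shape are illustrative, not KaireonAI's internal schema format:

```python
# Sketch: build a CREATE TABLE statement from a schema definition, using the
# field-type -> PostgreSQL mapping documented above.

PG_TYPES = {
    "text": "TEXT", "integer": "INTEGER", "bigint": "BIGINT",
    "decimal": "NUMERIC", "float": "DOUBLE PRECISION", "boolean": "BOOLEAN",
    "date": "DATE", "timestamp": "TIMESTAMPTZ", "json": "JSONB", "uuid": "UUID",
}

def create_table_sql(table, fields):
    """fields is a list of (name, kaireon_type) pairs."""
    cols = ", ".join(f"{name} {PG_TYPES[ftype]}" for name, ftype in fields)
    return f"CREATE TABLE {table} ({cols});"

print(create_table_sql("customer_profile",
                       [("id", "uuid"), ("email", "text"), ("signup", "timestamp")]))
# CREATE TABLE customer_profile (id UUID, email TEXT, signup TIMESTAMPTZ);
```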

Schema References

You reference schemas throughout the platform:
  • Enrichment stages in decision flows — load customer data at decision time
  • Computed values — formulas that reference customer.* fields from schema tables
  • Pipelines — as target destinations for ETL workflows
  • Segments — define customer cohorts based on schema field conditions
Schema table names are auto-generated from the schema name. A schema called “Customer Profile” creates a PostgreSQL table named customer_profile.
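The name-to-table mapping can be sketched as a small normalization function. Only the "Customer Profile" → customer_profile example is documented; the exact normalization rules below (lowercasing, collapsing non-alphanumeric runs to underscores) are an assumption:

```python
import re

def table_name(schema_name):
    """Derive a PostgreSQL table name from a schema name (assumed rules:
    lowercase, non-alphanumeric runs collapsed to single underscores)."""
    return re.sub(r"[^a-z0-9]+", "_", schema_name.lower()).strip("_")

print(table_name("Customer Profile"))  # customer_profile
```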
Once your schemas are in place, you build pipelines to load data into them.

Pipelines

Pipelines are visual ETL workflows built with a drag-and-drop flow editor. Each pipeline connects source nodes → transform nodes → target nodes.

Node Types

  • Source nodes — Read from a connector (any of the connector types above)
  • Transform nodes — Apply one of 14 data transformations (see below)
  • Target nodes — Write to a schema table or external destination

Transform Types

KaireonAI provides 14 built-in transform types:

1. Rename columns to standardize naming conventions across data sources. Map source field names to target field names.
2. Convert data types between: string, integer, bigint, float, numeric, boolean, date, timestamp, json, uuid.
3. Compute new fields using SQL-like expressions with 50+ built-in functions across 5 categories:

   | Category | Functions |
   |---|---|
   | String | UPPER, LOWER, TRIM, LTRIM, RTRIM, SUBSTRING, REPLACE, CONCAT, LENGTH, LEFT, RIGHT, SPLIT_PART |
   | Numeric | ABS, CEIL, FLOOR, ROUND, MOD, POWER, SQRT, LOG, GREATEST |
   | Date/Time | NOW, DATE_TRUNC, DATE_PART, AGE, DATE_ADD, DATE_SUB, TO_DATE, TO_TIMESTAMP, EXTRACT, DATE_DIFF |
   | Type Cast | CAST, TO_CHAR, TO_NUMBER, TO_BOOLEAN, COALESCE, NULLIF, CASE WHEN, GREATEST |
   | Conditional | CASE, COALESCE, NULLIF, IIF |

4. Keep only rows matching specified conditions. Supports operators: =, !=, >, <, >=, <=, LIKE, IN, IS NULL, IS NOT NULL.
5. Remove unwanted columns from the data flow.
6. Add a new column with a default value or computed expression.
7. Replace values using a JSON lookup table. Useful for code-to-label mapping (e.g., "M" → "Male", "F" → "Female").
8. Split a single field into multiple fields by a separator character (e.g., split a full name into first/last).
9. Concatenate multiple columns into a single field with an optional separator.
10. Remove duplicate rows based on one or more key columns. Keeps the first occurrence.
11. GROUP BY one or more columns with aggregate functions: SUM, COUNT, AVG, MIN, MAX.
12. LEFT JOIN with another schema table to enrich data. Specify the join key and which fields to pull from the lookup table.
13. Apply SHA-256 or SHA-512 hashing to fields for anonymization or deduplication keys.
14. Auto-detect and mask personally identifiable information including SSNs, email addresses, phone numbers, and credit card numbers. Supports partial masking (e.g., ***-**-1234).
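The partial masking behavior can be sketched for SSNs. The regex and helper below are illustrative only; the real transform auto-detects several PII types, not just SSNs:

```python
import re

# Matches US SSNs in the form 123-45-6789.
SSN_RE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")

def mask_ssn(text):
    """Partially mask SSNs, keeping only the last four digits."""
    return SSN_RE.sub(r"***-**-\3", text)

print(mask_ssn("SSN on file: 123-45-1234"))  # SSN on file: ***-**-1234
```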

Execution Config

Pipelines support three execution modes:
| Mode | Description | Use Case |
|---|---|---|
| Batch | Process all records in configurable batch sizes | Full loads, daily syncs |
| Micro-Batch | Small frequent batches | Near real-time with controlled throughput |
| Streaming | Continuous record-by-record processing | Real-time event streams |
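How batch mode chunks records can be sketched in a few lines. The batch size here is deliberately tiny for demonstration; the function is illustrative, not KaireonAI's executor:

```python
def batches(records, batch_size):
    """Yield successive batches of at most `batch_size` records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# 25 records with batch_size=10 produce batches of 10, 10, and 5.
print([len(b) for b in batches(list(range(25)), 10)])
```

Micro-batch mode runs the same loop on small, frequent slices of new data, while streaming mode processes each record as it arrives.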

Configuration Options

| Setting | Default | Description |
|---|---|---|
| Batch Size | 10,000 | Records per batch (batch/micro-batch modes) |
| Parallelism | 1 | Concurrent workers (1–16) |
| Partitioning | None | Partition key for distributed processing |
| Error Handling | fail | skip (continue on error), fail (stop pipeline), dlq (route to dead-letter queue) |
| Scheduling | Manual | Cron expression for automatic runs |

Supported File Formats

When reading from file-based sources (S3, GCS, Azure Blob, SFTP, CSV Upload):
| Format | Extensions | Notes |
|---|---|---|
| CSV | .csv | Configurable delimiter, header detection |
| JSON | .json | Array of objects or nested |
| JSON Lines | .jsonl | One JSON object per line |
| Parquet | .parquet | Columnar, schema-preserving |
| Avro | .avro | Schema-embedded binary |
| ORC | .orc | Optimized row columnar |
| TSV | .tsv | Tab-separated values |
| XML | .xml | Structured markup |
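As a small illustration of the JSON Lines format, here is standard-library parsing of one object per line (the sample records are made up):

```python
import io
import json

# A .jsonl payload: one JSON object per line, as a file-based source reads it.
jsonl = io.StringIO('{"id": 1, "name": "Ada"}\n{"id": 2, "name": "Grace"}\n')

records = [json.loads(line) for line in jsonl if line.strip()]
print(records[1]["name"])  # Grace
```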

Next Steps

Decisioning Studio

Set up offers, rules, and decision flows that use your data.

Core Concepts

Understand how data connects to decisioning and delivery.