Criteo Uplift Measurement Dataset Guide

The Criteo Uplift dataset is a minimal, experiment-focused pack designed to test the uplift z-test calculation engine. Unlike the Starbucks dataset (10 offers, 4 channels), Criteo has just 1 offer, 1 channel, and 1 creative. Its purpose is narrow: show an ad (treatment) or withhold it (control), then measure whether the ad caused a conversion.

The pack is based on the Criteo Uplift Prediction Dataset from Kaggle (Criteo Research). The original dataset contains 25M rows with 12 anonymous features; KaireonAI generates a synthetic 2000-row sample.

What Gets Created

Entity	Count	Description
Data Schemas + DDL Tables	2	`criteo_customers` (2000 rows), `criteo_experiments` (2000 rows)
Categories	1	Criteo: Digital Advertising
Channels	1	Criteo: Display (web/api)
Offers	1	Criteo: Ad Campaign (priority 90, $100k/month, $0.50/unit)
Creatives	1	Criteo: Ad Campaign — Display (banner)
Qualification Rules	0	None — all customers eligible
Algorithm Models	2	Treatment Group Model (5% prior), Control Group Model (3% prior)
Experiment	1	Treatment vs Control — 50/50 split
Customer Rows	2000	CRIT-000000 through CRIT-001999
Experiment Rows	2000	Treatment/control assignment + visit/conversion outcomes

Pure uplift measurement does not need qualification rules, contact policies, multiple channels, or customer segments. The entire point is binary: show ad or not, then measure causation.

Step-by-Step Walkthrough

Load the Criteo Uplift Dataset

curl -X POST "http://localhost:3000/api/v1/seed-dataset/criteo-uplift?force=true" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest"

Expected Response (201 Created)

{
  "success": true,
  "dataset": "criteo-uplift",
  "created": {
    "schemas": 2,
    "categories": 1,
    "subCategories": 1,
    "channels": 1,
    "offers": 1,
    "creatives": 1,
    "qualificationRules": 0,
    "contactPolicies": 0,
    "outcomeTypes": 5,
    "models": 2,
    "experiments": 1,
    "decisionFlows": 1,
    "segments": 0,
    "customerRows": 2000,
    "experimentRows": 2000,
    "interactionHistory": 500,
    "interactionSummaries": 500
  }
}

Understand the Treatment vs Control Setup

Each customer is randomly assigned to treatment or control:

Group	Share	Visit Rate	Conversion Rate	Exposure
Treatment (treatment_group = 1)	~50% (~1000)	8%	3.5%	Shown display ads
Control (treatment_group = 0)	~50% (~1000)	5%	2.0%	Not shown any ads
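As a quick sanity check on the seeded data, the rates above translate into expected outcome counts per group. The sketch below is only arithmetic on the documented rates; the `GroupRates` type and `expectedCounts` helper are hypothetical names introduced for illustration, and the group size of 1000 is the approximate 50/50 split of the 2000 customers.

```typescript
// Per-group rates as documented for the treatment and control groups.
// The size of 1000 per group is approximate (50/50 split of 2000 customers).
interface GroupRates {
  name: string;
  size: number;
  visitRate: number;
  conversionRate: number;
}

const groups: GroupRates[] = [
  { name: "treatment", size: 1000, visitRate: 0.08, conversionRate: 0.035 },
  { name: "control", size: 1000, visitRate: 0.05, conversionRate: 0.02 },
];

// Expected number of customers with each outcome (rate x group size),
// rounded to whole customers.
function expectedCounts(g: GroupRates): { visits: number; conversions: number } {
  return {
    visits: Math.round(g.size * g.visitRate),
    conversions: Math.round(g.size * g.conversionRate),
  };
}

for (const g of groups) {
  const { visits, conversions } = expectedCounts(g);
  console.log(`${g.name}: ~${visits} visits, ~${conversions} conversions`);
}
```

Because the seeded rows are synthetic, the realized counts should land near these expectations rather than match them exactly.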

Make a Binary Treatment Decision

curl -X POST "http://localhost:3000/api/v1/recommend" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{
  "customerId": "CRIT-000042",
  "channel": "web",
  "placement": "display_banner",
  "limit": 1,
  "context": {
    "source": "criteo-uplift-test"
  }
}'

Response — Treatment (ad served)

{
  "recommendations": [
    {
      "offerId": "uuid-...",
      "offerName": "Criteo: Ad Campaign",
      "creativeId": "uuid-...",
      "creativeName": "Criteo: Ad Campaign — Display",
      "score": 0.847,
      "rank": 1,
      "content": {
        "headline": "Criteo Display Ad",
        "body": "Targeted display advertisement for uplift measurement",
        "cta": "Learn More"
      }
    }
  ],
  "experimentGroup": "treatment",
  "modelUsed": "criteo-treatment-model"
}

Response — Control (no ad)

{
  "recommendations": [],
  "experimentGroup": "control",
  "modelUsed": "criteo-control-model"
}

The experiment engine uses deterministic hashing on the customer ID. With a 50/50 split, roughly half of CRIT-* customers receive the ad and the other half get an empty recommendations array. The experimentGroup field tells you which group.
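Deterministic assignment of this kind can be sketched as follows. This is a minimal illustration, not the engine's implementation: the hash function KaireonAI actually uses is not documented here, so FNV-1a is swapped in as a stand-in, and `fnv1a32` and `assignGroup` are hypothetical names.

```typescript
// FNV-1a 32-bit hash: a simple, well-known non-cryptographic hash.
// Assumption: a stand-in for whatever hash the real engine uses.
// charCodeAt matches the byte value for ASCII IDs such as "CRIT-000042".
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned 32-bit
}

type ExperimentGroup = "treatment" | "control";

// Map the hash onto 100 buckets; buckets below treatmentPercent get the ad.
function assignGroup(customerId: string, treatmentPercent = 50): ExperimentGroup {
  const bucket = fnv1a32(customerId) % 100;
  return bucket < treatmentPercent ? "treatment" : "control";
}

// The same ID always lands in the same group, so repeated calls are stable.
const ids = Array.from({ length: 2000 }, (_, i) => `CRIT-${String(i).padStart(6, "0")}`);
const treated = ids.filter((id) => assignGroup(id) === "treatment").length;
console.log(`treatment: ${treated}, control: ${ids.length - treated}`);
```

Hashing the ID instead of drawing a random number per request means no assignment table has to be stored, and a customer cannot drift between groups across sessions.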

Record Visit and Conversion Outcomes

Record a Visit (click)

curl -X POST "http://localhost:3000/api/v1/respond" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{
  "customerId": "CRIT-000042",
  "decisionId": "uuid-from-recommend-response",
  "offerId": "uuid-of-ad-campaign",
  "outcomeType": "click",
  "context": {
    "source": "site-visit-tracker",
    "visited": 1
  }
}'

Record a Conversion

curl -X POST "http://localhost:3000/api/v1/respond" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{
  "customerId": "CRIT-000042",
  "decisionId": "uuid-from-recommend-response",
  "offerId": "uuid-of-ad-campaign",
  "outcomeType": "convert",
  "conversionValue": 47.50,
  "context": {
    "source": "checkout-tracker",
    "converted": 1,
    "visited": 1
  }
}'

Train Both Models

# Get model IDs
TREATMENT_MODEL_ID=$(curl -s "http://localhost:3000/api/v1/algorithm-models" \
  -H "X-Requested-With: XMLHttpRequest" | jq -r '.[] | select(.key == "criteo-treatment-model") | .id')

CONTROL_MODEL_ID=$(curl -s "http://localhost:3000/api/v1/algorithm-models" \
  -H "X-Requested-With: XMLHttpRequest" | jq -r '.[] | select(.key == "criteo-control-model") | .id')

# Train treatment model
curl -X POST "http://localhost:3000/api/v1/algorithm-models/$TREATMENT_MODEL_ID/train" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{ "trainingConfig": { "lookbackDays": 90, "experimentGroup": "treatment" } }'

# Train control model
curl -X POST "http://localhost:3000/api/v1/algorithm-models/$CONTROL_MODEL_ID/train" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{ "trainingConfig": { "lookbackDays": 90, "experimentGroup": "control" } }'

Compute Uplift

EXPERIMENT_ID=$(curl -s "http://localhost:3000/api/v1/experiments" \
  -H "X-Requested-With: XMLHttpRequest" | jq -r '.[0].id')

curl -X POST "http://localhost:3000/api/v1/experiments/$EXPERIMENT_ID/uplift" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest"

Uplift Results

{
  "experimentId": "uuid-...",
  "experimentName": "Criteo: Treatment vs Control",
  "uplift": {
    "treatmentConversionRate": 0.035,
    "holdoutConversionRate": 0.020,
    "uplift": 0.015,
    "relativeUplift": 0.75,
    "zScore": 2.05,
    "pValue": 0.040,
    "significant": true,
    "treatmentSamples": 1000,
    "holdoutSamples": 1000
  }
}
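A client consuming this response may want to confirm that the derived fields agree with each other before acting on them. The sketch below is one way to do that; the `UpliftResult` interface mirrors the field names in the `uplift` object, while `checkUplift`, its tolerance, and the sample values are assumptions made for illustration.

```typescript
// Shape of the `uplift` object, using the field names from the response.
interface UpliftResult {
  treatmentConversionRate: number;
  holdoutConversionRate: number;
  uplift: number;
  relativeUplift: number;
  zScore: number;
  pValue: number;
  significant: boolean;
  treatmentSamples: number;
  holdoutSamples: number;
}

// Returns a list of inconsistencies; an empty list means the derived
// fields agree with the rates they are computed from.
function checkUplift(r: UpliftResult, tol = 1e-3, alpha = 0.05): string[] {
  const problems: string[] = [];
  const expectedUplift = r.treatmentConversionRate - r.holdoutConversionRate;
  if (Math.abs(r.uplift - expectedUplift) > tol) {
    problems.push("uplift does not equal treatment rate minus holdout rate");
  }
  if (r.holdoutConversionRate > 0) {
    const expectedRelative = expectedUplift / r.holdoutConversionRate;
    if (Math.abs(r.relativeUplift - expectedRelative) > tol) {
      problems.push("relativeUplift does not equal uplift / holdout rate");
    }
  }
  if (r.significant !== r.pValue < alpha) {
    problems.push("significant flag disagrees with pValue < alpha");
  }
  return problems;
}

// Illustrative values only, not captured from a live server.
const sample: UpliftResult = {
  treatmentConversionRate: 0.035,
  holdoutConversionRate: 0.02,
  uplift: 0.015,
  relativeUplift: 0.75,
  zScore: 2.05,
  pValue: 0.04,
  significant: true,
  treatmentSamples: 1000,
  holdoutSamples: 1000,
};
console.log(checkUplift(sample)); // empty array when everything agrees
```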

The Uplift Formulas

KaireonAI computes uplift using a two-proportion z-test:

Formula	Expression
Absolute Uplift	`uplift = pT - pH`
Relative Uplift	`relativeUplift = (pT - pH) / pH`
Pooled Proportion	`pooled = (treatmentConversions + holdoutConversions) / (nT + nH)`
Standard Error	`SE = sqrt(pooled x (1 - pooled) x (1/nT + 1/nH))`
z-Score	`z = (pT - pH) / SE`
Significance	`p-value = 2 x (1 - Phi(|z|))`, significant if p < 0.05
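The formulas above can be sketched in a few lines of TypeScript. This is a hedged illustration of the two-proportion z-test, not the contents of `src/lib/experimentation/uplift.ts`: the function name `twoProportionZTest`, its signature, and the use of the Abramowitz-Stegun approximation for the normal CDF are all assumptions.

```typescript
// Abramowitz-Stegun 7.1.26 approximation of erf(x); absolute error ~1.5e-7.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF, Phi(z).
function normalCdf(z: number): number {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

interface UpliftStats {
  uplift: number;
  relativeUplift: number;
  pooled: number;
  standardError: number;
  zScore: number;
  pValue: number;
  significant: boolean;
}

// Two-proportion z-test with a pooled standard error, following the table above.
function twoProportionZTest(
  treatmentConversions: number,
  nT: number,
  holdoutConversions: number,
  nH: number,
  alpha = 0.05,
): UpliftStats {
  const pT = treatmentConversions / nT;
  const pH = holdoutConversions / nH;
  const pooled = (treatmentConversions + holdoutConversions) / (nT + nH);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / nT + 1 / nH));
  // Guard against a zero standard error (no conversions, or all converted).
  const zScore = standardError === 0 ? 0 : (pT - pH) / standardError;
  const pValue = 2 * (1 - normalCdf(Math.abs(zScore)));
  return {
    uplift: pT - pH,
    relativeUplift: (pT - pH) / pH, // Infinity or NaN when pH is 0
    pooled,
    standardError,
    zScore,
    pValue,
    significant: pValue < alpha,
  };
}

// 35 of 1000 treated customers converted versus 20 of 1000 in control.
const stats = twoProportionZTest(35, 1000, 20, 1000);
console.log(stats.zScore.toFixed(2), stats.pValue.toFixed(3), stats.significant);
```

The test is two-sided, so a treatment that performs significantly worse than control is also reported as significant; check the sign of `uplift` before acting on the result.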

Worked Example

Step	Calculation	Result
Treatment rate (pT)	~35 conversions / 1000 customers	0.035
Control rate (pH)	~20 conversions / 1000 customers	0.020
Absolute uplift	0.035 - 0.020	0.015 (+1.5pp)
Relative uplift	(0.035 - 0.020) / 0.020	0.75 (+75%)
Pooled proportion	(35 + 20) / (1000 + 1000)	0.0275
Standard error	sqrt(0.0275 x 0.9725 x (1/1000 + 1/1000))	0.00731
z-score	0.015 / 0.00731	2.052
p-value	2 x (1 - Phi(2.052))	0.040
Significant?	0.040 < 0.05	Yes

Two Bayesian Models

Property	Treatment Model	Control Model
Key	`criteo-treatment-model`	`criteo-control-model`
Type	Bayesian (Naive Bayes)	Bayesian (Naive Bayes)
Prior Positive Rate	5% (0.05)	3% (0.03)
Target Field	`converted`	`converted`
Predictors	f0-f5 (6 features)	f0-f5 (6 features)

The treatment model has a higher prior because customers who see ads are expected to convert at a higher base rate. After training on the recorded outcomes, each model's estimate moves away from its prior and toward the conversion rate observed in its own group.
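To make the role of the prior concrete, here is a hedged illustration using a Beta-Binomial update, a standard way to combine a prior rate with observed counts. It is an assumption, not a description of KaireonAI's Naive Bayes training: the prior strength of 100 pseudo-observations and the helper names `priorFromRate` and `posteriorMean` are hypothetical.

```typescript
interface BetaPrior {
  alpha: number; // pseudo-count of conversions
  beta: number; // pseudo-count of non-conversions
}

// Encode a prior positive rate as Beta(alpha, beta) with a chosen strength
// (the number of pseudo-observations the prior is worth).
function priorFromRate(rate: number, strength: number): BetaPrior {
  return { alpha: rate * strength, beta: (1 - rate) * strength };
}

// Posterior mean conversion rate after observing `conversions` out of `samples`.
function posteriorMean(prior: BetaPrior, conversions: number, samples: number): number {
  return (prior.alpha + conversions) / (prior.alpha + prior.beta + samples);
}

const STRENGTH = 100; // hypothetical prior strength, chosen for illustration

// Priors of 5% and 3%, updated with roughly 35/1000 and 20/1000 conversions.
const treatmentPosterior = posteriorMean(priorFromRate(0.05, STRENGTH), 35, 1000);
const controlPosterior = posteriorMean(priorFromRate(0.03, STRENGTH), 20, 1000);

console.log(treatmentPosterior.toFixed(4), controlPosterior.toFixed(4));
```

Under these assumptions each posterior mean sits between the prior and the observed rate, and with 1000 observations against 100 pseudo-observations it lies much closer to the observed rate.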

Anonymous Feature Schema (f0-f11)

The dataset uses 12 anonymous numeric features normalized to 0-1:

Feature	Correlations	In Model
`f0`	Independent (base)	Yes (importance 0.20)
`f1`	Independent (base)	Yes (importance 0.18)
`f2`	0.3 x f0 + 0.7 x random	Yes (importance 0.17)
`f3`	0.4 x f1 + 0.6 x random	Yes (importance 0.16)
`f4`	Independent	Yes (importance 0.15)
`f5`	Independent	Yes (importance 0.14)
`f6`-`f11`	Various	No (available for feature engineering)
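The correlation structure in the table can be reproduced with a short generator. The sketch below covers only `f0` through `f5`, since the table does not specify how `f6` through `f11` are built. The seeded `mulberry32` PRNG and the names `generateFeatures` and `pearson` are assumptions for illustration; the actual seeding code may differ.

```typescript
// mulberry32: a small seeded PRNG so the sketch is reproducible.
// Returns uniform values in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface Features {
  f0: number;
  f1: number;
  f2: number;
  f3: number;
  f4: number;
  f5: number;
}

// f2 and f3 blend a base feature with fresh noise; the weights sum to 1,
// so every value stays in the 0-1 range.
function generateFeatures(rand: () => number): Features {
  const f0 = rand();
  const f1 = rand();
  return {
    f0,
    f1,
    f2: 0.3 * f0 + 0.7 * rand(),
    f3: 0.4 * f1 + 0.6 * rand(),
    f4: rand(),
    f5: rand(),
  };
}

// Pearson correlation over (x, y) pairs.
function pearson(pairs: Array<[number, number]>): number {
  const n = pairs.length;
  const mx = pairs.reduce((s, [x]) => s + x, 0) / n;
  const my = pairs.reduce((s, [, y]) => s + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (const [x, y] of pairs) {
    sxy += (x - mx) * (y - my);
    sxx += (x - mx) * (x - mx);
    syy += (y - my) * (y - my);
  }
  return sxy / Math.sqrt(sxx * syy);
}

const rand = mulberry32(42);
const rows = Array.from({ length: 2000 }, () => generateFeatures(rand));
const corrF0F2 = pearson(rows.map((r): [number, number] => [r.f0, r.f2]));
console.log(`corr(f0, f2) = ${corrF0F2.toFixed(2)}`);
```

For independent uniform inputs the theoretical correlation between `f0` and `f2` is 0.3 / sqrt(0.3^2 + 0.7^2), about 0.39, so the sampled value should land near that.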

Decision Framework

p-value Range	Decision	Action
< 0.01	Strong significance	Roll out treatment to 100% of traffic
0.01 — 0.05	Significant	Roll out treatment, monitor closely
0.05 — 0.10	Marginal	Continue experiment, collect more data
> 0.10	Not significant	No evidence of uplift — consider stopping
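The framework above maps directly onto a small lookup function. This is a sketch under one stated assumption: the table does not say which band owns the exact boundary values, so the code treats 0.01 and 0.05 as belonging to the weaker band and 0.10 as still marginal. The `decide` name and return shape are hypothetical.

```typescript
interface Decision {
  decision: string;
  action: string;
}

// Map a p-value to the decision bands in the table above.
// Boundary handling (which band owns exactly 0.01, 0.05, 0.10) is an assumption.
function decide(pValue: number): Decision {
  if (Number.isNaN(pValue) || pValue < 0 || pValue > 1) {
    throw new RangeError(`p-value must be between 0 and 1, got ${pValue}`);
  }
  if (pValue < 0.01) {
    return { decision: "Strong significance", action: "Roll out treatment to 100% of traffic" };
  }
  if (pValue < 0.05) {
    return { decision: "Significant", action: "Roll out treatment, monitor closely" };
  }
  if (pValue <= 0.1) {
    return { decision: "Marginal", action: "Continue experiment, collect more data" };
  }
  return { decision: "Not significant", action: "No evidence of uplift, consider stopping" };
}

// The worked example's p-value of 0.040 falls in the "Significant" band.
console.log(decide(0.04));
```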

What to Look For

Binary treatment decision: The Recommend API returns either one ad (treatment) or nothing (control). The experimentGroup field tells you which.
Deterministic assignment: The same customer always falls into the same group via hash — no contamination between groups.
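Statistical significance: With ~1000 per group and 3.5% vs 2.0% conversion rates, the z-test gives z ≈ 2.05 and p ≈ 0.040, which is significant at the 0.05 level. The margin is narrow, so a small change in the realized conversion counts can push p above 0.05.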
Minimal entity graph: Only 1 offer, 1 channel, 1 creative. This is intentional — the focus is on the uplift math, not the decisioning pipeline.
computeUplift() function: The underlying computation is in src/lib/experimentation/uplift.ts and can be called directly in code.

Criteo vs Starbucks — When to Use Which

Aspect	Criteo Uplift	Starbucks
Focus	Uplift measurement & z-test	Full decisioning pipeline
Offers	1	10
Channels	1	4
Qualification Rules	0	2
Contact Policies	0	2
Models	2 (treatment/control)	3 (scorecard/bayesian/bandit)
Experiment Type	50/50 (pure uplift)	80/20 (champion/challenger)
Best For	Testing statistical significance engine	Testing end-to-end NBA pipeline
