The Criteo Uplift dataset is a minimal, experiment-focused pack designed to test the uplift z-test calculation engine. Unlike the Starbucks dataset (10 offers, 4 channels), Criteo has just 1 offer, 1 channel, and 1 creative. Its purpose is narrow: show an ad (treatment) or withhold it (control), then measure whether the ad caused a conversion.
The pack is based on the Criteo Uplift Prediction Dataset from Criteo Research, available on Kaggle. The original dataset contains 25M rows with 12 anonymous features; KaireonAI generates a synthetic 2000-row sample.
What Gets Created
| Entity | Count | Description |
|---|---|---|
| Data Schemas + DDL Tables | 2 | criteo_customers (2000 rows), criteo_experiments (2000 rows) |
| Categories | 1 | Criteo: Digital Advertising |
| Channels | 1 | Criteo: Display (web/api) |
| Offers | 1 | Criteo: Ad Campaign (priority 90, 100k/month, 0.50/unit) |
| Creatives | 1 | Criteo: Ad Campaign — Display (banner) |
| Qualification Rules | 0 | None — all customers eligible |
| Algorithm Models | 2 | Treatment Group Model (5% prior), Control Group Model (3% prior) |
| Experiment | 1 | Treatment vs Control — 50/50 split |
| Customer Rows | 2000 | CRIT-000000 through CRIT-001999 |
| Experiment Rows | 2000 | Treatment/control assignment + visit/conversion outcomes |
Pure uplift measurement does not need qualification rules, contact policies, multiple channels, or customer segments. The entire point is binary: show ad or not, then measure causation.
Step-by-Step Walkthrough
Load the Criteo Uplift Dataset
curl -X POST "http://localhost:3000/api/v1/seed-dataset/criteo-uplift?force=true" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest"
Expected Response (201 Created)
{
  "success": true,
  "dataset": "criteo-uplift",
  "created": {
    "schemas": 2,
    "categories": 1,
    "subCategories": 1,
    "channels": 1,
    "offers": 1,
    "creatives": 1,
    "qualificationRules": 0,
    "contactPolicies": 0,
    "outcomeTypes": 5,
    "models": 2,
    "experiments": 1,
    "decisionFlows": 1,
    "segments": 0,
    "customerRows": 2000,
    "experimentRows": 2000,
    "interactionHistory": 500,
    "interactionSummaries": 500
  }
}
Understand the Treatment vs Control Setup
Each customer is randomly assigned to treatment or control:
| Group | Share | Visit Rate | Conversion Rate | Exposure |
|---|---|---|---|---|
| Treatment (treatment_group = 1) | ~50% (~1000) | 8% | 3.5% | Shown display ads |
| Control (treatment_group = 0) | ~50% (~1000) | 5% | 2.0% | Not shown any ads |
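To make the table concrete, here is a minimal sketch of how a sample with these target rates could be drawn. It is not KaireonAI's seeding code: the function names, the `customer_id` key, the seed, and the assumption that a conversion can only follow a visit are all illustrative.

```python
import random

# Target rates copied from the table above; everything else is illustrative.
RATES = {
    1: {"visit": 0.08, "conversion": 0.035},  # treatment
    0: {"visit": 0.05, "conversion": 0.020},  # control
}


def simulate_experiment_rows(n=2000, seed=42):
    """Draw n synthetic rows with a 50/50 random assignment (hypothetical generator)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        group = 1 if rng.random() < 0.5 else 0
        rates = RATES[group]
        visited = rng.random() < rates["visit"]
        # Assumption: a conversion requires a visit, so it is drawn conditionally
        # with probability conversion_rate / visit_rate.
        p_convert_given_visit = rates["conversion"] / rates["visit"]
        converted = visited and rng.random() < p_convert_given_visit
        rows.append({
            "customer_id": f"CRIT-{i:06d}",
            "treatment_group": group,
            "visited": int(visited),
            "converted": int(converted),
        })
    return rows


def summarize(rows):
    """Return per-group sample size, visit rate, and conversion rate."""
    summary = {}
    for group in (0, 1):
        members = [r for r in rows if r["treatment_group"] == group]
        n = len(members)
        summary[group] = {
            "n": n,
            "visit_rate": sum(r["visited"] for r in members) / n,
            "conversion_rate": sum(r["converted"] for r in members) / n,
        }
    return summary


if __name__ == "__main__":
    for group, stats in summarize(simulate_experiment_rows()).items():
        print(group, stats)
```

Because each row is an independent random draw, the observed group sizes and rates in any one sample land near the targets rather than exactly on them, which is why the table reports approximate shares.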
Make a Binary Treatment Decision
curl -X POST "http://localhost:3000/api/v1/recommend" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{
    "customerId": "CRIT-000042",
    "channel": "web",
    "placement": "display_banner",
    "limit": 1,
    "context": {
      "source": "criteo-uplift-test"
    }
  }'
Response — Treatment (ad served)
{
  "recommendations": [
    {
      "offerId": "uuid-...",
      "offerName": "Criteo: Ad Campaign",
      "creativeId": "uuid-...",
      "creativeName": "Criteo: Ad Campaign — Display",
      "score": 0.847,
      "rank": 1,
      "content": {
        "headline": "Criteo Display Ad",
        "body": "Targeted display advertisement for uplift measurement",
        "cta": "Learn More"
      }
    }
  ],
  "experimentGroup": "treatment",
  "modelUsed": "criteo-treatment-model"
}
Response — Control (no ad)
{
  "recommendations": [],
  "experimentGroup": "control",
  "modelUsed": "criteo-control-model"
}
The experiment engine uses deterministic hashing on the customer ID. With a 50/50 split, roughly half of CRIT-* customers receive the ad and the other half get an empty recommendations array. The experimentGroup field tells you which group.
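The idea of hash-based assignment can be sketched as follows. This is an illustration only: the source does not say which hash function or key the experiment engine uses, so SHA-256, the `experiment_key` salt, and the function name `assign_group` are assumptions.

```python
import hashlib


def assign_group(customer_id: str, experiment_key: str = "criteo-uplift",
                 treatment_share: float = 0.5) -> str:
    """Map a customer to 'treatment' or 'control' deterministically.

    Hypothetical scheme: hash the experiment key plus customer ID, scale the
    hash to [0, 1), and compare it with the treatment share.
    """
    digest = hashlib.sha256(f"{experiment_key}:{customer_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and scale to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


if __name__ == "__main__":
    ids = [f"CRIT-{i:06d}" for i in range(2000)]
    groups = [assign_group(cid) for cid in ids]
    print("treatment share:", groups.count("treatment") / len(groups))
    # Repeating the call for the same customer gives the same group.
    print(assign_group("CRIT-000042") == assign_group("CRIT-000042"))
```

Since the group depends only on the customer ID and the experiment key, repeated calls to the Recommend API for one customer stay in the same group without any stored assignment table.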
Record Visit and Conversion Outcomes
# Record the site visit
curl -X POST "http://localhost:3000/api/v1/respond" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{
    "customerId": "CRIT-000042",
    "decisionId": "uuid-from-recommend-response",
    "offerId": "uuid-of-ad-campaign",
    "outcomeType": "click",
    "context": {
      "source": "site-visit-tracker",
      "visited": 1
    }
  }'
# Record the conversion
curl -X POST "http://localhost:3000/api/v1/respond" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{
    "customerId": "CRIT-000042",
    "decisionId": "uuid-from-recommend-response",
    "offerId": "uuid-of-ad-campaign",
    "outcomeType": "convert",
    "conversionValue": 47.50,
    "context": {
      "source": "checkout-tracker",
      "converted": 1,
      "visited": 1
    }
  }'
# Get model IDs
TREATMENT_MODEL_ID=$(curl -s "http://localhost:3000/api/v1/algorithm-models" \
  -H "X-Requested-With: XMLHttpRequest" | jq -r '.[] | select(.key == "criteo-treatment-model") | .id')
CONTROL_MODEL_ID=$(curl -s "http://localhost:3000/api/v1/algorithm-models" \
  -H "X-Requested-With: XMLHttpRequest" | jq -r '.[] | select(.key == "criteo-control-model") | .id')
# Train treatment model
curl -X POST "http://localhost:3000/api/v1/algorithm-models/$TREATMENT_MODEL_ID/train" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{ "trainingConfig": { "lookbackDays": 90, "experimentGroup": "treatment" } }'
# Train control model
curl -X POST "http://localhost:3000/api/v1/algorithm-models/$CONTROL_MODEL_ID/train" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{ "trainingConfig": { "lookbackDays": 90, "experimentGroup": "control" } }'
# Get the experiment ID (takes the first experiment in the list)
EXPERIMENT_ID=$(curl -s "http://localhost:3000/api/v1/experiments" \
  -H "X-Requested-With: XMLHttpRequest" | jq -r '.[0].id')
# Compute uplift for the experiment
curl -X POST "http://localhost:3000/api/v1/experiments/$EXPERIMENT_ID/uplift" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest"
Uplift Results
{
  "experimentId": "uuid-...",
  "experimentName": "Criteo: Treatment vs Control",
  "uplift": {
    "treatmentConversionRate": 0.035,
    "holdoutConversionRate": 0.020,
    "uplift": 0.015,
    "relativeUplift": 0.75,
    "zScore": 2.05,
    "pValue": 0.040,
    "significant": true,
    "treatmentSamples": 1000,
    "holdoutSamples": 1000
  }
}
KaireonAI computes uplift using a two-proportion z-test:
| Formula | Expression |
|---|---|
| Absolute Uplift | `uplift = pT - pH` |
| Relative Uplift | `relativeUplift = (pT - pH) / pH` |
| Pooled Proportion | `pooled = (treatmentConversions + holdoutConversions) / (nT + nH)` |
| Standard Error | `SE = sqrt(pooled x (1 - pooled) x (1/nT + 1/nH))` |
| z-Score | `z = (pT - pH) / SE` |
| Significance | `p-value = 2 x (1 - Phi(\|z\|))`; significant if p < 0.05 |
Worked Example
| Step | Calculation | Result |
|---|---|---|
| Treatment rate (pT) | ~35 conversions / 1000 customers | 0.035 |
| Control rate (pH) | ~20 conversions / 1000 customers | 0.020 |
| Absolute uplift | 0.035 - 0.020 | 0.015 (+1.5pp) |
| Relative uplift | (0.035 - 0.020) / 0.020 | 0.75 (+75%) |
| Pooled proportion | (35 + 20) / (1000 + 1000) | 0.0275 |
| Standard error | sqrt(0.0275 x 0.9725 x (1/1000 + 1/1000)) | 0.00731 |
| z-score | 0.015 / 0.00731 | 2.052 |
| p-value | 2 x (1 - Phi(2.052)) | 0.040 |
| Significant? | 0.040 < 0.05 | Yes |
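The worked example can be reproduced independently with a few lines of Python. This is a sketch of the pooled two-proportion z-test as described in this section, not the code in `computeUplift()`. The function name `two_proportion_z_test` and its argument names are assumptions, and the result keys simply mirror the response fields shown earlier.

```python
import math


def two_proportion_z_test(conv_t, n_t, conv_h, n_h, alpha=0.05):
    """Two-sided, pooled two-proportion z-test for treatment vs holdout."""
    p_t = conv_t / n_t  # treatment conversion rate (pT)
    p_h = conv_h / n_h  # holdout/control conversion rate (pH)
    pooled = (conv_t + conv_h) / (n_t + n_h)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_h))
    if se == 0:
        raise ValueError("standard error is zero (no conversions, or all converted)")
    z = (p_t - p_h) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "uplift": p_t - p_h,
        "relativeUplift": (p_t - p_h) / p_h if p_h > 0 else None,
        "zScore": z,
        "pValue": p_value,
        "significant": p_value < alpha,
    }


if __name__ == "__main__":
    # Same inputs as the worked example: 35/1000 vs 20/1000 conversions.
    for key, value in two_proportion_z_test(35, 1000, 20, 1000).items():
        print(key, value)
```

With the inputs above, the z-score comes out at roughly 2.05 and the p-value at roughly 0.040, matching the table up to rounding.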
Two Bayesian Models
| Property | Treatment Model | Control Model |
|---|---|---|
| Key | criteo-treatment-model | criteo-control-model |
| Type | Bayesian (Naive Bayes) | Bayesian (Naive Bayes) |
| Prior Positive Rate | 5% (0.05) | 3% (0.03) |
| Target Field | converted | converted |
| Predictors | f0-f5 (6 features) | f0-f5 (6 features) |
The treatment model starts with a higher prior because customers who see ads are expected to convert at a higher base rate. After training on the experiment data, each model's predictions are driven by the conversion rate observed in its own group rather than by the prior alone.
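How a prior rate and observed data combine can be illustrated with a Beta-Binomial update. This is a deliberately simplified stand-in, not the Naive Bayes models themselves: the function name `posterior_rate` and the `prior_strength` of 100 pseudo-observations are assumptions made for the example.

```python
def posterior_rate(prior_rate, prior_strength, conversions, n):
    """Posterior mean conversion rate under a Beta prior (illustrative only).

    The prior is Beta(alpha, beta) with mean prior_rate and a weight of
    prior_strength pseudo-observations; the data are `conversions` out of `n`.
    """
    alpha = prior_rate * prior_strength
    beta = (1 - prior_rate) * prior_strength
    return (alpha + conversions) / (alpha + beta + n)


if __name__ == "__main__":
    # Priors from the table above, observed counts from the worked example.
    treatment = posterior_rate(0.05, 100, conversions=35, n=1000)
    control = posterior_rate(0.03, 100, conversions=20, n=1000)
    print("treatment posterior mean:", treatment)
    print("control posterior mean:", control)
```

Under these assumptions, 1000 observations outweigh 100 pseudo-observations, so both posterior means end up much closer to the observed 3.5% and 2.0% than to the 5% and 3% priors.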
Anonymous Feature Schema (f0-f11)
The dataset uses 12 anonymous numeric features normalized to 0-1:
| Feature | Correlation | In Model |
|---|---|---|
| f0 | Independent (base) | Yes (importance 0.20) |
| f1 | Independent (base) | Yes (importance 0.18) |
| f2 | 0.3 x f0 + 0.7 x random | Yes (importance 0.17) |
| f3 | 0.4 x f1 + 0.6 x random | Yes (importance 0.16) |
| f4 | Independent | Yes (importance 0.15) |
| f5 | Independent | Yes (importance 0.14) |
| f6-f11 | Various | No (available for feature engineering) |
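The correlation structure of the six model features can be sketched like this. The code assumes that "random" means an independent uniform draw on [0, 1); the source does not state the distribution. Features f6-f11 are left out because their construction is not specified, and the helper names are hypothetical.

```python
import math
import random


def generate_features(rng):
    """Draw f0-f5 for one customer, following the mixing weights in the table."""
    f0 = rng.random()
    f1 = rng.random()
    return {
        "f0": f0,
        "f1": f1,
        "f2": 0.3 * f0 + 0.7 * rng.random(),  # partly driven by f0
        "f3": 0.4 * f1 + 0.6 * rng.random(),  # partly driven by f1
        "f4": rng.random(),
        "f5": rng.random(),
    }


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external libraries."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rng = random.Random(7)
    rows = [generate_features(rng) for _ in range(2000)]
    column = lambda name: [row[name] for row in rows]
    print("corr(f0, f2):", pearson(column("f0"), column("f2")))
    print("corr(f1, f3):", pearson(column("f1"), column("f3")))
    print("corr(f0, f4):", pearson(column("f0"), column("f4")))
```

Because the mixing weights sum to 1, f2 and f3 stay within the 0-1 range, and they correlate moderately with their base features while f4 and f5 remain uncorrelated with everything else.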
Decision Framework
| p-value Range | Decision | Action |
|---|---|---|
| < 0.01 | Strong significance | Roll out treatment to 100% of traffic |
| 0.01 — 0.05 | Significant | Roll out treatment, monitor closely |
| 0.05 — 0.10 | Marginal | Continue experiment, collect more data |
| > 0.10 | Not significant | No evidence of uplift — consider stopping |
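The table above can be expressed as a small lookup. The function name `decide` is hypothetical, and the handling of the exact boundaries is an assumption: 0.05 is treated as marginal (since significance requires p < 0.05) and 0.10 is also treated as marginal.

```python
def decide(p_value):
    """Map a p-value to the (decision, action) pair from the decision framework."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    if p_value < 0.01:
        return ("Strong significance", "Roll out treatment to 100% of traffic")
    if p_value < 0.05:
        return ("Significant", "Roll out treatment, monitor closely")
    if p_value <= 0.10:
        return ("Marginal", "Continue experiment, collect more data")
    return ("Not significant", "No evidence of uplift; consider stopping")


if __name__ == "__main__":
    for p in (0.004, 0.040, 0.070, 0.300):
        print(p, decide(p))
```

The test is two-sided, so a small p-value only says the groups differ. Check that the absolute uplift is positive before acting on a "roll out" decision.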
What to Look For
- Binary treatment decision: The Recommend API returns either one ad (treatment) or nothing (control). The experimentGroup field tells you which.
- Deterministic assignment: The same customer always falls into the same group via hash — no contamination between groups.
- Statistical significance: With ~1000 customers per group and conversion rates of 3.5% vs 2.0%, the z-test gives z ≈ 2.05 and p ≈ 0.04, which is significant at the 5% level (p < 0.05).
- Minimal entity graph: Only 1 offer, 1 channel, 1 creative. This is intentional — the focus is on the uplift math, not the decisioning pipeline.
- computeUplift() function: The underlying computation is in src/lib/experimentation/uplift.ts and can be called directly in code.
Criteo vs Starbucks — When to Use Which
| Aspect | Criteo Uplift | Starbucks |
|---|---|---|
| Focus | Uplift measurement & z-test | Full decisioning pipeline |
| Offers | 1 | 10 |
| Channels | 1 | 4 |
| Qualification Rules | 0 | 2 |
| Contact Policies | 0 | 2 |
| Models | 2 (treatment/control) | 3 (scorecard/bayesian/bandit) |
| Experiment Type | 50/50 (pure uplift) | 80/20 (champion/challenger) |
| Best For | Testing statistical significance engine | Testing end-to-end NBA pipeline |