The Criteo Uplift dataset is a minimal, experiment-focused pack for testing the uplift z-test calculation engine. Unlike the Starbucks dataset (10 offers, 4 channels), Criteo has just 1 offer, 1 channel, and 1 creative. Its purpose is simple: show an ad (treatment) or withhold it (control), then measure whether the ad caused a conversion. The pack is based on the Criteo Uplift Prediction Dataset from Kaggle (Criteo Research). The original dataset contains 25M rows with 12 anonymous features; KaireonAI generates a synthetic 2000-row sample.

What Gets Created

| Entity | Count | Description |
|---|---|---|
| Data Schemas + DDL Tables | 2 | criteo_customers (2000 rows), criteo_experiments (2000 rows) |
| Categories | 1 | Criteo: Digital Advertising |
| Channels | 1 | Criteo: Display (web/api) |
| Offers | 1 | Criteo: Ad Campaign (priority 90, $100k/month, $0.50/unit) |
| Creatives | 1 | Criteo: Ad Campaign — Display (banner) |
| Qualification Rules | 0 | None; all customers are eligible |
| Algorithm Models | 2 | Treatment Group Model (5% prior), Control Group Model (3% prior) |
| Experiment | 1 | Treatment vs Control, 50/50 split |
| Customer Rows | 2000 | CRIT-000000 through CRIT-001999 |
| Experiment Rows | 2000 | Treatment/control assignment plus visit/conversion outcomes |

Pure uplift measurement does not need qualification rules, contact policies, multiple channels, or customer segments. The question is binary: show the ad or not, then measure whether showing it caused the conversion.

Step-by-Step Walkthrough

Load the Criteo Uplift Dataset
curl -X POST "http://localhost:3000/api/v1/seed-dataset/criteo-uplift?force=true" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest"

Expected Response (201 Created)

{
  "success": true,
  "dataset": "criteo-uplift",
  "created": {
    "schemas": 2,
    "categories": 1,
    "subCategories": 1,
    "channels": 1,
    "offers": 1,
    "creatives": 1,
    "qualificationRules": 0,
    "contactPolicies": 0,
    "outcomeTypes": 5,
    "models": 2,
    "experiments": 1,
    "decisionFlows": 1,
    "segments": 0,
    "customerRows": 2000,
    "experimentRows": 2000,
    "interactionHistory": 500,
    "interactionSummaries": 500
  }
}
Understand the Treatment vs Control Setup
Each customer is randomly assigned to treatment or control:

| Group | Share | Visit Rate | Conversion Rate | Exposure |
|---|---|---|---|---|
| Treatment (treatment_group = 1) | ~50% (~1000) | 8% | 3.5% | Shown display ads |
| Control (treatment_group = 0) | ~50% (~1000) | 5% | 2.0% | Not shown any ads |
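To see how rows like these could arise, here is a minimal TypeScript simulation that reproduces the table's shares and rates. It is a sketch under stated assumptions, not KaireonAI's seed generator: the seeded PRNG (`mulberry32`), the rule that a conversion only happens after a visit, and the names `simulateRows` and `ExperimentRow` are all hypothetical.

```typescript
// Hypothetical sketch: simulate treatment/control outcomes with the rates from the table.
// Not KaireonAI's generator; the PRNG and the "conversion implies visit" rule are assumptions.

type Group = "treatment" | "control";

interface ExperimentRow {
  customerId: string;
  treatmentGroup: 0 | 1;
  visited: 0 | 1;
  converted: 0 | 1;
}

// Small seeded PRNG (mulberry32) so runs are reproducible; returns values in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const RATES: Record<Group, { visit: number; conversion: number }> = {
  treatment: { visit: 0.08, conversion: 0.035 },
  control: { visit: 0.05, conversion: 0.02 },
};

function simulateRows(n: number, seed = 42): ExperimentRow[] {
  const rand = mulberry32(seed);
  const rows: ExperimentRow[] = [];
  for (let i = 0; i < n; i++) {
    // 50/50 random assignment.
    const group: Group = rand() < 0.5 ? "treatment" : "control";
    const { visit, conversion } = RATES[group];
    const visited = rand() < visit;
    // P(convert | visit) = conversion / visit, so the marginal conversion rate equals `conversion`.
    const converted = visited && rand() < conversion / visit;
    rows.push({
      customerId: `CRIT-${String(i).padStart(6, "0")}`,
      treatmentGroup: group === "treatment" ? 1 : 0,
      visited: visited ? 1 : 0,
      converted: converted ? 1 : 0,
    });
  }
  return rows;
}
```

`simulateRows(2000)` returns 2000 rows with IDs CRIT-000000 through CRIT-001999. With about 1000 customers per group, the observed rates scatter around 8%/3.5% and 5%/2.0% rather than hitting them exactly.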
Make a Binary Treatment Decision
curl -X POST "http://localhost:3000/api/v1/recommend" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{
  "customerId": "CRIT-000042",
  "channel": "web",
  "placement": "display_banner",
  "limit": 1,
  "context": {
    "source": "criteo-uplift-test"
  }
}'

Response — Treatment (ad served)

{
  "recommendations": [
    {
      "offerId": "uuid-...",
      "offerName": "Criteo: Ad Campaign",
      "creativeId": "uuid-...",
      "creativeName": "Criteo: Ad Campaign — Display",
      "score": 0.847,
      "rank": 1,
      "content": {
        "headline": "Criteo Display Ad",
        "body": "Targeted display advertisement for uplift measurement",
        "cta": "Learn More"
      }
    }
  ],
  "experimentGroup": "treatment",
  "modelUsed": "criteo-treatment-model"
}

Response — Control (no ad)

{
  "recommendations": [],
  "experimentGroup": "control",
  "modelUsed": "criteo-control-model"
}
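On the client side, the only branch that matters is the experiment group. The sketch below assumes the response shapes shown in the two examples above; the type names and the `adToRender` helper are hypothetical, not part of a published KaireonAI SDK.

```typescript
// Hypothetical types mirroring the example /api/v1/recommend responses.
interface Recommendation {
  offerId: string;
  offerName: string;
  creativeId: string;
  creativeName: string;
  score: number;
  rank: number;
  content: { headline: string; body: string; cta: string };
}

interface RecommendResponse {
  recommendations: Recommendation[];
  experimentGroup: "treatment" | "control";
  modelUsed: string;
}

// Decide what the page should render for this decision.
// Treatment: the single ad (if one was returned). Control: nothing at all.
function adToRender(resp: RecommendResponse): Recommendation | null {
  if (resp.experimentGroup === "control") return null;
  return resp.recommendations[0] ?? null;
}
```

Returning `null` for control, rather than falling back to some other ad, is what keeps the control arm unexposed; rendering a substitute would contaminate the holdout.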
The experiment engine assigns groups by deterministically hashing the customer ID, so a given customer always lands in the same group. With a 50/50 split, roughly half of the CRIT-* customers receive the ad and the other half get an empty recommendations array. The experimentGroup field tells you which group a customer is in.
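Deterministic, hash-based assignment can be sketched as follows. The engine's actual hash function and any salt are not documented here, so SHA-256 (via Node's built-in `crypto` module), the `experimentKey` salt, and the name `assignArm` are assumptions.

```typescript
import { createHash } from "node:crypto";

type Arm = "treatment" | "control";

// Hypothetical sketch of deterministic assignment; not KaireonAI's actual hashing scheme.
function assignArm(customerId: string, experimentKey: string, treatmentPercent = 50): Arm {
  // Hash "experimentKey:customerId" so the same customer always maps to the same bucket.
  const digest = createHash("sha256").update(`${experimentKey}:${customerId}`).digest();
  // First 4 bytes as an unsigned 32-bit integer, reduced to a bucket in [0, 100).
  const bucket = digest.readUInt32BE(0) % 100;
  return bucket < treatmentPercent ? "treatment" : "control";
}
```

Salting the hash with an experiment key keeps a customer's assignment stable across calls while letting different experiments split the same customers independently.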
Record Visit and Conversion Outcomes
Record a Visit (click)
curl -X POST "http://localhost:3000/api/v1/respond" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{
  "customerId": "CRIT-000042",
  "decisionId": "uuid-from-recommend-response",
  "offerId": "uuid-of-ad-campaign",
  "outcomeType": "click",
  "context": {
    "source": "site-visit-tracker",
    "visited": 1
  }
}'
Record a Conversion
curl -X POST "http://localhost:3000/api/v1/respond" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{
  "customerId": "CRIT-000042",
  "decisionId": "uuid-from-recommend-response",
  "offerId": "uuid-of-ad-campaign",
  "outcomeType": "convert",
  "conversionValue": 47.50,
  "context": {
    "source": "checkout-tracker",
    "converted": 1,
    "visited": 1
  }
}'
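The visit and conversion flags recorded above are what the uplift calculation ultimately aggregates. A minimal sketch of that per-group roll-up, assuming one record per customer; the `OutcomeRecord` shape and the `summarize` helper are hypothetical and do not describe KaireonAI's storage.

```typescript
// Hypothetical per-customer record: assigned group plus the outcome flags sent to /respond.
interface OutcomeRecord {
  experimentGroup: "treatment" | "control";
  visited: 0 | 1;
  converted: 0 | 1;
}

interface GroupStats {
  n: number;
  visits: number;
  conversions: number;
  visitRate: number;
  conversionRate: number;
}

function summarize(records: OutcomeRecord[]): Record<"treatment" | "control", GroupStats> {
  const empty = (): GroupStats => ({ n: 0, visits: 0, conversions: 0, visitRate: 0, conversionRate: 0 });
  const out = { treatment: empty(), control: empty() };
  // Count customers, visits, and conversions per group.
  for (const r of records) {
    const g = out[r.experimentGroup];
    g.n += 1;
    g.visits += r.visited;
    g.conversions += r.converted;
  }
  // Turn counts into rates, guarding against empty groups.
  for (const g of [out.treatment, out.control]) {
    g.visitRate = g.n > 0 ? g.visits / g.n : 0;
    g.conversionRate = g.n > 0 ? g.conversions / g.n : 0;
  }
  return out;
}
```

Counting each customer once matters: summing raw click events instead would inflate the visit rate for customers who click several times.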
Train Both Models
# Get model IDs
TREATMENT_MODEL_ID=$(curl -s "http://localhost:3000/api/v1/algorithm-models" \
  -H "X-Requested-With: XMLHttpRequest" | jq -r '.[] | select(.key == "criteo-treatment-model") | .id')

CONTROL_MODEL_ID=$(curl -s "http://localhost:3000/api/v1/algorithm-models" \
  -H "X-Requested-With: XMLHttpRequest" | jq -r '.[] | select(.key == "criteo-control-model") | .id')

# Train treatment model
curl -X POST "http://localhost:3000/api/v1/algorithm-models/$TREATMENT_MODEL_ID/train" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{ "trainingConfig": { "lookbackDays": 90, "experimentGroup": "treatment" } }'

# Train control model
curl -X POST "http://localhost:3000/api/v1/algorithm-models/$CONTROL_MODEL_ID/train" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{ "trainingConfig": { "lookbackDays": 90, "experimentGroup": "control" } }'
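The jq filter `.[] | select(.key == "criteo-treatment-model") | .id` assumes the model list is a JSON array of objects with `key` and `id` fields. Under that same assumption, the lookup and the training request body look like this in TypeScript; `findModelId` and `trainBody` are hypothetical helpers.

```typescript
// Shape implied by the jq filter: an array of models, each with an `id` and a `key`.
interface AlgorithmModel {
  id: string;
  key: string;
}

// Return the ID of the first model with the given key, or throw if none matches.
function findModelId(models: AlgorithmModel[], key: string): string {
  const match = models.find((m) => m.key === key);
  if (!match) throw new Error(`model not found: ${key}`);
  return match.id;
}

// Body for POST /api/v1/algorithm-models/{id}/train, matching the curl calls above.
function trainBody(experimentGroup: "treatment" | "control", lookbackDays = 90) {
  return { trainingConfig: { lookbackDays, experimentGroup } };
}
```

Unlike the shell version, where a missing model leaves `TREATMENT_MODEL_ID` empty and the request goes to `/algorithm-models//train`, this lookup fails loudly.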
Compute Uplift
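# Look up the experiment ID (assumes the Criteo experiment is the first one returned;
# filter on another field instead if other datasets are also loaded)
EXPERIMENT_ID=$(curl -s "http://localhost:3000/api/v1/experiments" \
  -H "X-Requested-With: XMLHttpRequest" | jq -r '.[0].id')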

curl -X POST "http://localhost:3000/api/v1/experiments/$EXPERIMENT_ID/uplift" \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest"
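
Uplift Results

Example output; exact values depend on the generated sample, and "holdout" refers to the control group. With exactly 35 and 20 conversions out of 1000 customers each, the two-proportion z-test gives z ≈ 2.05 and p ≈ 0.040, slightly different from the illustrative zScore and pValue shown here.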

{
  "experimentId": "uuid-...",
  "experimentName": "Criteo: Treatment vs Control",
  "uplift": {
    "treatmentConversionRate": 0.035,
    "holdoutConversionRate": 0.020,
    "uplift": 0.015,
    "relativeUplift": 0.75,
    "zScore": 2.14,
    "pValue": 0.032,
    "significant": true,
    "treatmentSamples": 1000,
    "holdoutSamples": 1000
  }
}

The Uplift Formulas

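KaireonAI computes uplift using a two-proportion z-test. Here pT and pH are the treatment and holdout (control) conversion rates, and nT and nH are the group sizes:

| Quantity | Expression |
|---|---|
| Absolute uplift | uplift = pT - pH |
| Relative uplift | relativeUplift = (pT - pH) / pH |
| Pooled proportion | pooled = (treatmentConversions + holdoutConversions) / (nT + nH) |
| Standard error | SE = sqrt(pooled x (1 - pooled) x (1/nT + 1/nH)) |
| z-score | z = (pT - pH) / SE |
| Significance (two-sided) | `p-value = 2 x (1 - Phi(abs(z)))`, where Phi is the standard normal CDF; significant if p < 0.05 |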

Worked Example

| Step | Calculation | Result |
|---|---|---|
| Treatment rate (pT) | ~35 conversions / 1000 customers | 0.035 |
| Control rate (pH) | ~20 conversions / 1000 customers | 0.020 |
| Absolute uplift | 0.035 - 0.020 | 0.015 (+1.5 pp) |
| Relative uplift | (0.035 - 0.020) / 0.020 | 0.75 (+75%) |
| Pooled proportion | (35 + 20) / (1000 + 1000) | 0.0275 |
| Standard error | sqrt(0.0275 x 0.9725 x (1/1000 + 1/1000)) | 0.00731 |
| z-score | 0.015 / 0.00731 | 2.052 |
| p-value | 2 x (1 - Phi(2.052)) | 0.040 |
| Significant? | 0.040 < 0.05 | Yes |
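The formulas and worked example above translate directly into code. This TypeScript sketch is not the code in src/lib/experimentation/uplift.ts; it assumes the standard normal CDF is computed from the Abramowitz and Stegun 7.1.26 approximation of erf (absolute error below 1.5e-7), and `twoProportionZTest` is a hypothetical name.

```typescript
// Abramowitz & Stegun 7.1.26 approximation of erf(x), absolute error < 1.5e-7.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF, Phi(z).
function normalCdf(z: number): number {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

interface UpliftResult {
  treatmentConversionRate: number;
  holdoutConversionRate: number;
  uplift: number;
  relativeUplift: number;
  zScore: number;
  pValue: number;
  significant: boolean;
}

// Hypothetical sketch of the pooled two-proportion z-test described above.
function twoProportionZTest(
  treatmentConversions: number,
  nT: number,
  holdoutConversions: number,
  nH: number,
  alpha = 0.05,
): UpliftResult {
  const pT = treatmentConversions / nT;
  const pH = holdoutConversions / nH;
  const pooled = (treatmentConversions + holdoutConversions) / (nT + nH);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nT + 1 / nH));
  const zScore = se > 0 ? (pT - pH) / se : 0;
  // Two-sided p-value from |z|; with zero variance (no or all conversions) there is no evidence either way.
  const pValue = se > 0 ? 2 * (1 - normalCdf(Math.abs(zScore))) : 1;
  return {
    treatmentConversionRate: pT,
    holdoutConversionRate: pH,
    uplift: pT - pH,
    relativeUplift: pH > 0 ? (pT - pH) / pH : NaN,
    zScore,
    pValue,
    significant: pValue < alpha,
  };
}
```

`twoProportionZTest(35, 1000, 20, 1000)` returns an uplift of 0.015, a zScore of about 2.051, and a pValue of about 0.040, matching the worked example.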

Two Bayesian Models

| Property | Treatment Model | Control Model |
|---|---|---|
| Key | criteo-treatment-model | criteo-control-model |
| Type | Bayesian (Naive Bayes) | Bayesian (Naive Bayes) |
| Prior Positive Rate | 5% (0.05) | 3% (0.03) |
| Target Field | converted | converted |
| Predictors | f0-f5 (6 features) | f0-f5 (6 features) |
The treatment model starts with a higher prior because customers who see ads are expected to convert at a higher base rate. After training on observed outcomes, each model's posterior moves toward the conversion rate of its own group (about 3.5% for treatment and 2.0% for control in this dataset).
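How a prior rate gets pulled toward observed data can be illustrated with a Beta-Binomial update. This is a swapped-in technique for illustration, not KaireonAI's Naive Bayes training; `posteriorRate` and the `priorStrength` parameter (the prior's weight in pseudo-observations, set to 100 here) are hypothetical.

```typescript
// Illustration only: Beta-Binomial posterior mean for a conversion rate.
// priorStrength is a hypothetical weight, expressed as a number of pseudo-observations.
function posteriorRate(priorRate: number, priorStrength: number, conversions: number, n: number): number {
  const alpha = priorRate * priorStrength; // prior pseudo-conversions
  const beta = (1 - priorRate) * priorStrength; // prior pseudo-non-conversions
  return (alpha + conversions) / (alpha + beta + n);
}

const treatmentPosterior = posteriorRate(0.05, 100, 35, 1000); // (5 + 35) / 1100
const controlPosterior = posteriorRate(0.03, 100, 20, 1000); // (3 + 20) / 1100
```

With 1000 observations per group against 100 pseudo-observations, the data dominate: the posteriors land near 3.6% and 2.1%, and the gap between them (about 1.5 points) is narrower than the 2-point gap between the priors.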

Anonymous Feature Schema (f0-f11)

The dataset uses 12 anonymous numeric features normalized to 0-1:
| Feature | Construction | In Model |
|---|---|---|
| f0 | Independent (base) | Yes (importance 0.20) |
| f1 | Independent (base) | Yes (importance 0.18) |
| f2 | 0.3 x f0 + 0.7 x random | Yes (importance 0.17) |
| f3 | 0.4 x f1 + 0.6 x random | Yes (importance 0.16) |
| f4 | Independent | Yes (importance 0.15) |
| f5 | Independent | Yes (importance 0.14) |
| f6-f11 | Various | No (available for feature engineering) |
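The mixing rules in the table produce moderate correlations, not correlations equal to the mixing weight. Assuming the base features and the random terms are independent uniform draws on [0, 1), corr(f0, f2) = 0.3 / sqrt(0.3² + 0.7²) ≈ 0.39 and corr(f1, f3) = 0.4 / sqrt(0.4² + 0.6²) ≈ 0.55. The sketch below generates f0-f5 that way and measures the correlations; `generateFeatures`, `pearson`, and the seeded `mulberry32` PRNG are hypothetical choices, not the dataset's generator.

```typescript
// Small seeded PRNG (mulberry32); returns values in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface FeatureRow { f0: number; f1: number; f2: number; f3: number; f4: number; f5: number; }

// Hypothetical generator for f0-f5 following the construction column of the table.
function generateFeatures(n: number, seed = 7): FeatureRow[] {
  const rand = mulberry32(seed);
  const rows: FeatureRow[] = [];
  for (let i = 0; i < n; i++) {
    const f0 = rand();
    const f1 = rand();
    rows.push({
      f0,
      f1,
      f2: 0.3 * f0 + 0.7 * rand(), // weights sum to 1, so f2 stays in [0, 1)
      f3: 0.4 * f1 + 0.6 * rand(),
      f4: rand(),
      f5: rand(),
    });
  }
  return rows;
}

// Pearson correlation coefficient of two equal-length samples.
function pearson(xs: number[], ys: number[]): number {
  const n = xs.length;
  const mx = xs.reduce((s, v) => s + v, 0) / n;
  const my = ys.reduce((s, v) => s + v, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}
```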

Decision Framework

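| p-value Range | Decision | Action |
|---|---|---|
| < 0.01 | Strong significance | Roll out treatment to 100% of traffic |
| 0.01 to 0.05 | Significant | Roll out treatment, monitor closely |
| 0.05 to 0.10 | Marginal | Continue experiment, collect more data |
| > 0.10 | Not significant | No evidence of uplift; consider stopping |

These actions assume the measured uplift is positive. The two-sided p-value does not indicate direction, so a significant negative uplift means the ad is hurting conversions.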
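The decision table can be encoded as a small helper. Boundary handling (which range includes 0.01, 0.05, and 0.10) and the extra "treatment underperforms" outcome for a significant negative uplift are assumptions, since a two-sided p-value alone does not say which arm won; `decide` is a hypothetical name.

```typescript
type Decision =
  | "roll out to 100%"
  | "roll out, monitor closely"
  | "continue experiment"
  | "no evidence of uplift"
  | "treatment underperforms";

// Hypothetical mapping from (p-value, absolute uplift) to the actions in the table.
function decide(pValue: number, uplift: number): Decision {
  // Check direction first: a significant result with negative uplift is not a rollout signal.
  if (uplift < 0 && pValue < 0.05) return "treatment underperforms";
  if (pValue < 0.01) return "roll out to 100%";
  if (pValue < 0.05) return "roll out, monitor closely";
  if (pValue <= 0.1) return "continue experiment";
  return "no evidence of uplift";
}
```

For the worked example, `decide(0.040, 0.015)` falls in the "roll out, monitor closely" band.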

What to Look For

  • Binary treatment decision: The Recommend API returns either one ad (treatment) or nothing (control). The experimentGroup field tells you which.
  • Deterministic assignment: The same customer always falls into the same group via hash — no contamination between groups.
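  • Statistical significance: With ~1000 customers per group and 3.5% vs 2.0% conversion rates, the z-test gives p ≈ 0.04. That is significant at 95% confidence but close to the 0.05 threshold, so a different random sample could land on either side of it.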
  • Minimal entity graph: Only 1 offer, 1 channel, 1 creative. This is intentional — the focus is on the uplift math, not the decisioning pipeline.
  • computeUplift() function: The underlying computation is in src/lib/experimentation/uplift.ts and can be called directly in code.

Criteo vs Starbucks — When to Use Which

| Aspect | Criteo Uplift | Starbucks |
|---|---|---|
| Focus | Uplift measurement & z-test | Full decisioning pipeline |
| Offers | 1 | 10 |
| Channels | 1 | 4 |
| Qualification Rules | 0 | 2 |
| Contact Policies | 0 | 2 |
| Models | 2 (treatment/control) | 3 (scorecard/bayesian/bandit) |
| Experiment Type | 50/50 (pure uplift) | 80/20 (champion/challenger) |
| Best For | Testing the statistical significance engine | Testing the end-to-end NBA pipeline |