
3/23/24, 5:34 PM Uplift_modeling

Uplift modeling (Booking.com)


Lecturers:

- ML Scientists: Amsterdam, London

 ML Highlights:

- ~350 production models


- 800+ ML scientists/engineers

Causal Inference

- How likely is it to rain tomorrow?


- Prediction
- associated factors
- act according to factors
- What would make it less rainy tomorrow?
- Inference
- causal factors
- change factors

Randomised control trial

- A/B Test:
- A (Control): 50%
- B (Variant): 50%
- Metric (comparison)
- Identify reason for difference in metric, if any
- Individual treatment effect (assuming both outcomes y could be observed):
y(B) - y(A)
- Average treatment effect
ATE: \tau = E[y(B) - y(A)]
- Conditional average treatment effect (where X is another factor e.g. users f
CATE: \tau(x) = E[y(B) - y(A)|X=x]
- Reality: can only observe one outcome y(A) or y(B) for one individual.
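The gap between the definition of the ATE and what an A/B test can measure can be illustrated with a small simulation. This is a minimal sketch under made-up assumptions: the conversion rates (10% under A, 13% under B), the sample size, and the function name `simulate_ab_test` are all hypothetical. The simulation knows both potential outcomes, so it can compute the true ATE; the A/B test only sees one outcome per user and uses the difference in group means.

```python
import random

def simulate_ab_test(n=20000, seed=0):
    """Simulate users with both potential outcomes, then reveal only one."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y_a = 1 if rng.random() < 0.10 else 0   # potential outcome under A (assumed rate)
        y_b = 1 if rng.random() < 0.13 else 0   # potential outcome under B (assumed rate)
        t = "B" if rng.random() < 0.5 else "A"  # 50/50 randomisation
        y_obs = y_b if t == "B" else y_a        # only one outcome is ever observed
        rows.append((y_a, y_b, t, y_obs))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rows = simulate_ab_test()

# True ATE: needs both potential outcomes, so it is only available in a simulation.
true_ate = mean(y_b - y_a for y_a, y_b, _, _ in rows)

# Estimated ATE: difference in observed means between the two randomised groups.
est_ate = (mean(y for _, _, t, y in rows if t == "B")
           - mean(y for _, _, t, y in rows if t == "A"))

print(f"true ATE = {true_ate:.4f}, estimated ATE = {est_ate:.4f}")
```

Because assignment is random, the two numbers should be close, even though no individual treatment effect y(B) - y(A) is ever observed in the A/B data.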

file:///C:/Users/matth/Downloads/Uplift_modeling.html 1/4

Identification

- Causal estimand -> Statistical estimand


- Only some individuals receive treatment (T) B and some A
- Does E[Y|T=B] - E[Y|T=A] = E[Y(B) - Y(A)]?
- True under 4 conditions:
1. Exchangeability:
- E[Y(B)|T=B] = E[Y(B)|T=A]
- Swapping the groups wouldn't change the expected value (i.e. outcomes y
- Violate: If not randomized experiments.
2. Positivity:
- 0 < P(T=B|X=x) < 1
- All subgroups have some probability of receiving different treatment.
- Violate: If all individuals receive the same treatment.
3. Consistency:
- Y = Y(T)
- The observed outcome y corresponds to the potential outcome under observ
- Violate: If treatment T=B has multiple variants, then the potential outc
4. No interference
- A given user's outcome Y_i is unaffected by anyone else's treatment T_j.
- Violate:
- E.g. Users communicate about their treatment.
- E.g. Users compete for the same supply. Both users T=B, one books, t
- How to prevent interference from communication? e.g. choose users from d

- If the conditions above hold, treatment effects can be estimated from the observed data, e.g. the CATE for users in Germany:
$CATE = E[Y|T=B, X=Germany] - E[Y|T=A, X=Germany]$

- If too many conditions X= [x_1, x_2, ...]: may not have enough samples in each t
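The subgroup estimate above can be sketched as a difference in mean outcomes within each segment, together with the number of samples behind it. This is only an illustration: the function name `cate_by_segment`, the segments, and the simulated uplift values are assumptions, not data from the lecture. Returning the per-arm counts makes it visible when a segment is too small to trust.

```python
import random
from collections import defaultdict

def cate_by_segment(rows):
    """rows: iterable of (segment, treatment, outcome), treatment in {"A", "B"}.

    Returns {segment: (cate, n_A, n_B)}; cate is None when an arm is empty,
    i.e. when positivity fails in the sample.
    """
    stats = defaultdict(lambda: {"A": [0, 0], "B": [0, 0]})  # [sum of y, count]
    for segment, t, y in rows:
        stats[segment][t][0] += y
        stats[segment][t][1] += 1
    result = {}
    for segment, arms in stats.items():
        (sum_a, n_a), (sum_b, n_b) = arms["A"], arms["B"]
        cate = (sum_b / n_b - sum_a / n_a) if n_a and n_b else None
        result[segment] = (cate, n_a, n_b)
    return result

# Hypothetical randomised data: uplift of 5 points in one segment, none in the other.
rng = random.Random(1)
assumed_uplift = {"Germany": 0.05, "France": 0.00}
rows = []
for _ in range(40000):
    segment = rng.choice(list(assumed_uplift))
    t = rng.choice("AB")
    p = 0.10 + (assumed_uplift[segment] if t == "B" else 0.0)
    rows.append((segment, t, 1 if rng.random() < p else 0))

result = cate_by_segment(rows)
for segment, (cate, n_a, n_b) in result.items():
    print(segment, round(cate, 4), n_a, n_b)
```

Conditioning on more features at once multiplies the number of segments, so the counts per cell shrink quickly. That is the motivation for estimating the CATE with a model instead of a lookup table.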

Uplift modeling

- Estimate CATE using ML


- 2 types of methods: Metalearners, Tailored methods


Metalearners

- Two-model method (T-learner): two regression models, one per group, to predict the (binary) outcome


\mu_1(x) = E[y|T=1, X=x]
\mu_0(x) = E[y|T=0, X=x]
- Treatment effect:
\tau(x) = \mu_1(x) - \mu_0(x)
- Drawback: Models trained independently may lead to spurious effects
- Single-model method (S-learner):
- Treatment becomes a feature
- \mu(t,x) = E[y|T=t, X=x]
- Treatment effect:
\tau(x) = \mu(T=B, X=x) - \mu(T=A, X=x)
- Drawback: Model might drop treatment as a feature if it's not useful (L1 nor
- X-learner method:
- Use T-learner and then impute treatment effect for the control and for the t
T-learners:
\mu_1(x) = E[y|T=1, X=x]
\mu_0(x) = E[y|T=0, X=x]

Cross the models (enables predicting what would be the treatment effect if you
\mu_{\tau_0}(x) = E[\mu_1(x) - y|T=0, X=x]
\mu_{\tau_1}(x) = E[y - \mu_0(x)|T=1, X=x]

Learn final model:


\tau(x) = e(x) \mu_{\tau_0}(x) + (1 - e(x)) \mu_{\tau_1}(x)
where e(x) = P(T=1|X=x) is the propensity score

X-learners can use info from control group to derive better estimators for the

In general, the X-learner tends to do better when the groups are strongly imbalanced, e.g. treatment group >> control group
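The T-learner and X-learner steps can be sketched in a few lines. This is a minimal illustration under several assumptions that are not from the lecture: a single numeric feature, a continuous outcome instead of a binary one, a hand-written least-squares line (`fit_line`) as the base learner, and a constant propensity e(x) = 0.5 as in a 50/50 randomised test. The imputed effects use \mu_1(x) - y for control users and y - \mu_0(x) for treated users.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def split_by_arm(x, t, y, arm):
    """Features and outcomes of the users whose treatment equals arm."""
    xs = [xi for xi, ti in zip(x, t) if ti == arm]
    ys = [yi for yi, ti in zip(y, t) if ti == arm]
    return xs, ys

def t_learner(x, t, y):
    """Fit one outcome model per group: mu_0 on control, mu_1 on treated."""
    x0, y0 = split_by_arm(x, t, y, 0)
    x1, y1 = split_by_arm(x, t, y, 1)
    return fit_line(x0, y0), fit_line(x1, y1)

def x_learner(x, t, y, e=0.5):
    """Cross the T-learner models, fit the imputed effects, blend with e."""
    mu0, mu1 = t_learner(x, t, y)
    x0, y0 = split_by_arm(x, t, y, 0)
    x1, y1 = split_by_arm(x, t, y, 1)
    d0 = [mu1(xi) - yi for xi, yi in zip(x0, y0)]  # imputed effect, control users
    d1 = [yi - mu0(xi) for xi, yi in zip(x1, y1)]  # imputed effect, treated users
    tau0, tau1 = fit_line(x0, d0), fit_line(x1, d1)
    return lambda xi: e * tau0(xi) + (1 - e) * tau1(xi)

# Hypothetical data with a known effect tau(x) = 0.5 + x.
rng = random.Random(0)
x = [rng.random() for _ in range(5000)]
t = [1 if rng.random() < 0.5 else 0 for _ in x]
y = [1 + 2 * xi + ti * (0.5 + xi) + rng.gauss(0, 0.1) for xi, ti in zip(x, t)]

mu0, mu1 = t_learner(x, t, y)
tau_t = lambda xi: mu1(xi) - mu0(xi)
tau_x = x_learner(x, t, y)
print(tau_t(0.2), tau_x(0.2))  # both should be close to the true 0.5 + 0.2
```

With straight-line fits at both stages the two estimates coincide up to rounding, so this only shows the mechanics. Any difference between the methods would come from more flexible base learners and from unequal group sizes.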

Tailored Methods

- Find features to split the users who underwent different treatments.


- Allows us to find the feature that best split users into pure outcomes (e.g. fea
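One way to make the splitting idea concrete is a tree-style search that scores each candidate feature by how differently the treatment works on the two sides of the split. The criterion below (absolute difference in uplift between the two children) is an assumption chosen for illustration, not necessarily the one used in the lecture; the function names and the toy data are hypothetical as well.

```python
def uplift(rows):
    """Mean outcome of treated rows (t=1) minus mean outcome of control rows (t=0)."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    if not treated or not control:
        return None  # uplift is undefined without both groups
    return sum(treated) / len(treated) - sum(control) / len(control)

def best_split(rows, n_features):
    """rows: (binary feature tuple, treatment, outcome).

    Returns (feature index, score) for the split whose children differ most
    in uplift, or None if no feature gives two valid children.
    """
    best = None
    for j in range(n_features):
        left = [r for r in rows if r[0][j] == 0]
        right = [r for r in rows if r[0][j] == 1]
        u_left, u_right = uplift(left), uplift(right)
        if u_left is None or u_right is None:
            continue
        score = abs(u_left - u_right)
        if best is None or score > best[1]:
            best = (j, score)
    return best

# Toy data: treatment only helps users with feature 0 switched on.
rows = []
for f0 in (0, 1):
    for f1 in (0, 1):
        for t in (0, 1):
            conversions = 6 if (t == 1 and f0 == 1) else 2  # out of 10 users
            for k in range(10):
                rows.append(((f0, f1), t, 1 if k < conversions else 0))

best = best_split(rows, n_features=2)
print(best)  # feature 0 is expected to win, since feature 1 carries no uplift signal
```

A full uplift tree would apply this search recursively to each child and stop when the children become too small.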

Evaluating uplift models

- MSE cannot be used because the individual treatment effect is never observed (no ground truth)


- Sort by predicted CATE descending and calculate average CATE per segment
- Plot cumulative avg CATE: more uplift in the beginning then fall (less effec
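The sort-and-segment check can be sketched as follows. Since the true CATE is unobserved, each segment's effect is estimated as the difference in mean outcome between its treated and control users, which assumes randomised assignment. The function name `uplift_by_segment`, the number of segments, and the simulated data are assumptions for illustration.

```python
import random

def uplift_by_segment(pred, t, y, n_segments=5):
    """Sort users by predicted CATE (descending) and estimate uplift per segment.

    Returns a list of (mean treated - mean control) values, best segment first.
    A segment missing one of the two groups gets None.
    """
    order = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)
    size = len(order) // n_segments
    segments = []
    for k in range(n_segments):
        # The last segment also takes any leftover users.
        idx = order[k * size:] if k == n_segments - 1 else order[k * size:(k + 1) * size]
        treated = [y[i] for i in idx if t[i] == 1]
        control = [y[i] for i in idx if t[i] == 0]
        if treated and control:
            segments.append(sum(treated) / len(treated) - sum(control) / len(control))
        else:
            segments.append(None)
    return segments

# Hypothetical data: true uplift grows with x, and the model score is x itself.
rng = random.Random(0)
n = 20000
x = [rng.random() for _ in range(n)]
t = [rng.randint(0, 1) for _ in range(n)]
y = [1 if rng.random() < 0.2 + 0.3 * xi * ti else 0 for xi, ti in zip(x, t)]

segments = uplift_by_segment(pred=x, t=t, y=y)
print([round(s, 3) for s in segments])
```

For a useful model the estimated uplift should be highest in the first segment and fall towards the last one; a flat profile suggests the ranking carries little information.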


Uplift curve

- Plot Cumulative incremental conversions vs % of population treated


- Users with a negative treatment effect: people who would already have converted but no longer convert once treated
- Multi-treatment cumulative gain
1. Find best treatment per individual: \tau_i^* = \max_t \tau_t(x_i)
2. Sort by \tau_i^*
3. Average against control
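The single-treatment uplift curve can be sketched as below. Several conventions exist for "cumulative incremental conversions"; this sketch assumes one of them: among the top-k users by predicted uplift, take the treated conversion rate minus the control conversion rate and scale it by k. The function name `uplift_curve` and the four-user example are hypothetical.

```python
def uplift_curve(pred, t, y):
    """Return (fraction of population targeted, cumulative incremental conversions).

    Users are ranked by predicted uplift, highest first. While the top-k set
    lacks either treated or control users, the point is reported as 0.0.
    """
    order = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)
    n_treated = n_control = conv_treated = conv_control = 0
    curve = []
    for k, i in enumerate(order, start=1):
        if t[i] == 1:
            n_treated += 1
            conv_treated += y[i]
        else:
            n_control += 1
            conv_control += y[i]
        if n_treated and n_control:
            rate_gap = conv_treated / n_treated - conv_control / n_control
            incremental = rate_gap * k
        else:
            incremental = 0.0
        curve.append((k / len(order), incremental))
    return curve

# Tiny hand-made example: four users, alternating treatment.
pred = [0.9, 0.8, 0.7, 0.6]
t = [1, 0, 1, 0]
y = [1, 0, 0, 0]
curve = uplift_curve(pred, t, y)
for fraction, incremental in curve:
    print(fraction, incremental)
```

Plotting the second value against the first gives the curve; a curve that rises quickly and then flattens or drops indicates that the model ranks the most persuadable users first.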
