Dr. Robert Kübler
Bayesian Hierarchical Marketing Mix Modeling in PyMC
Learn how to build MMMs for different countries the right way
Photo by Annie Spratt on Unsplash
Introduction to Marketing Mix Modeling in Python: Which advertising spendings are really driving your sales? (towardsdatascience.com)
I have also covered Bayesian marketing mix modeling, a way to get more
robust models and uncertainty estimates for everything you forecast.
Bayesian Marketing Mix Modeling in Python via PyMC3: Estimate the saturation, carryover, and other parameters all at once, including their uncertainty (towardsdatascience.com)
1. Ignore the fact that there are several countries in the dataset and build a
single big model.
Bayesian Hierarchical Modeling in PyMC3: Curing your model's amnesia (towardsdatascience.com)
How exactly can we benefit from this in our case, you ask? As an example,
Bayesian hierarchical modeling can produce a model where the TV carryover
values of neighboring countries are not too far apart from each other, which
counters overfitting.
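The simplest hierarchical setting shows why this counters overfitting. Take a normal model in which the observation noise sigma, the population mean mu, and the spread between countries tau are all assumed known. This is not the model built in this article, and the numbers are made up. The sketch only shows how partial pooling weighs a country's own data against the population.

```python
def partially_pooled_mean(y_bar, n, sigma, mu, tau):
    """Posterior mean of one country's parameter in a normal-normal hierarchy.

    Simplifying assumption: sigma (observation noise), mu (population mean)
    and tau (spread between countries) are known. A full Bayesian hierarchical
    model would learn mu and tau from all countries' data as well.
    """
    data_precision = n / sigma**2   # grows with the number of observations
    prior_precision = 1 / tau**2    # how tightly countries cluster around mu
    return (data_precision * y_bar + prior_precision * mu) / (data_precision + prior_precision)


# hypothetical example: two countries whose own data point to a value of 0.8
mu, tau, sigma = 0.5, 0.1, 1.0
big_country = partially_pooled_mean(y_bar=0.8, n=1000, sigma=sigma, mu=mu, tau=tau)
small_country = partially_pooled_mean(y_bar=0.8, n=20, sigma=sigma, mu=mu, tau=tau)
print(round(big_country, 3), round(small_country, 3))  # → 0.773 0.55
```

With 1000 observations, the estimate stays close to the country's own value. With only 20 observations, it is pulled strongly towards the population mean. This mirrors the "given enough data" behavior described above.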
However, if the data clearly suggests that parameters are in fact completely
different, the Bayesian hierarchical model will be able to pick this up as well,
given enough data.
In the following, I will show you how to combine Bayesian marketing mix
modeling (BMMM) with Bayesian hierarchical modeling (BHM) to create, as
you may have guessed, a Bayesian hierarchical marketing mix model (BHMMM)
in Python using PyMC.
BHMMM = BMMM + BHM
Researchers from the former Google Inc. have also written a paper about this
idea that I encourage you to check out later as well. [1] You should be able to
understand this paper quite well after you have understood my articles about
BMMM and BHM.
Note that I no longer use PyMC3 but PyMC, which is a facelift of this great
library. Fortunately, if you already know PyMC3, you will be able to pick up
PyMC quickly. Let's get started!
Preparations
First, we will load a synthetic dataset that I made up myself, which is fine for
training purposes.
```python
import pandas as pd

dataset_link = "https://raw.githubusercontent.com/Garve/datasets/fdb81840fb96faeda5a874efa1b9bbfb83ce1929/bhmmm.csv"
data = pd.read_csv(dataset_link)
X = data.drop(columns=["Sales", "Date"])  # every column except the target and the date
y = data["Sales"]  # target
```
The data, created by the author. Image by the author.
Now, let me copy over some functions from my other article: one for
computing exponential saturation and one for dealing with carryovers. I
adjusted them for PyMC, i.e., I changed theano.tensor to aesara.tensor and tt to at:
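```python
import aesara.tensor as at

def saturate(x, a):
    return 1 - at.exp(-a * x)  # exponential saturation

def carryover(x, strength, length=21):
    # geometric weights strength**i for a lag of i time steps
    w = at.as_tensor_variable([at.power(strength, i) for i in range(length)])
    # row i of x_lags is x shifted right by i steps, padded with i zeros
    x_lags = at.stack(
        [at.concatenate([at.zeros(i), x[: x.shape[0] - i]]) for i in range(length)]
    )
    return at.dot(w, x_lags)
```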
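Here is a NumPy sketch of the same two transformations, which lets you check what they compute without building an Aesara graph. It assumes geometric carryover weights strength**i for a lag of i time steps, truncated after length steps. The names saturate_np and carryover_np are hypothetical and are not used anywhere else in the article.

```python
import numpy as np


def saturate_np(x, a):
    # exponential saturation: 0 for x = 0, approaching 1 as x grows
    return 1 - np.exp(-a * np.asarray(x, dtype=float))


def carryover_np(x, strength, length=21):
    # each output step is the weighted sum of the current and up to
    # length - 1 past inputs, with weight strength**i for a lag of i
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for i in range(min(length, len(x))):
        # add the input from i steps ago, damped by strength**i
        out[i:] += strength**i * x[: len(x) - i]
    return out


print(carryover_np([100, 0, 0, 0], strength=0.5, length=3))
```

A single spend of 100 keeps contributing 50 and then 25 in the following time steps. It drops to 0 once the window of length 3 is exhausted.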
BHMMM Building
Before we build the full model, we could start with separate models per
country, just to see what happens and to have a baseline.
Separate Models
If we follow the methodology from my Bayesian marketing mix modeling article, we get the following for Germany:
Model prediction for Germany. Image by the author.
The fit is quite good. However, for Switzerland we only have 20 observations, so the
predictions are not that great:
Model prediction for Switzerland. Image by the author.
This is exactly the reason why the separate models approach is sometimes
problematic. There is reason to believe that the people in Switzerland are not
completely different from the people in Germany regarding the impact of
media on them, and a model should be able to capture this.
We can also see what the Switzerland model has learned about the
parameters:
Posteriors for the Switzerland model (panels: coefTV, satTV, carTV, coefBanners, satBanners, carBanners, base, noise; each shows the posterior mean and 94% HDI). Image by the author.
The posteriors are still quite wide due to the lack of Switzerland data points.
You can see this from the car_ parameters on the right: the 94% HDI of the
carryovers spans nearly the entire possible range between 0 and 1.
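The 94% highest density interval (HDI) in these plots is the narrowest interval that contains 94% of the posterior draws. The small NumPy function below sketches how it can be computed from samples, assuming a unimodal posterior. In practice, ArviZ's az.hdi does this for you. The function name hdi here is only illustrative.

```python
import numpy as np


def hdi(samples, prob=0.94):
    """Narrowest interval containing `prob` of the samples (unimodal case, 0 < prob < 1)."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = len(s)
    k = int(np.floor(prob * n))       # number of sorted steps the interval spans
    widths = s[k:] - s[: n - k]       # width of every candidate interval
    start = int(np.argmin(widths))    # keep the narrowest candidate
    return s[start], s[start + k]


# illustrative draws from a skewed distribution on [0, 1], like a carryover posterior
draws = np.random.default_rng(0).beta(2, 5, size=10_000)
print(hdi(draws))
```

For a posterior that is nearly flat on [0, 1], the interval becomes almost as wide as the whole range. That is what the Switzerland carryover panels show.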
Let us build a proper BHMMM now, so that Switzerland in particular can benefit from
the larger amount of data that we have from Germany and Austria.
PyMC Implementation
We introduce some hyperpriors that shape the underlying distribution over
all countries. For example, the carryover is modeled using a Beta distribution.
This distribution has two parameters α and β, and we reserve two hyperpriors
car_alpha and car_beta to model these.
The hyperpriors are then used to define the carryover per country and
channel. Furthermore, I use more tuning steps than usual (3000 instead of
1000) because the model is quite complex. More tuning steps give the sampler
more time to adapt, which makes inference easier.
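The NumPy sketch below illustrates the structure these hyperpriors create. It draws one prior sample of carryover strengths for every country and channel, all from a single shared Beta(car_alpha, car_beta). The Gamma(2, 1) hyperpriors and the function name sample_hierarchical_carryover are assumptions for illustration only. They are not necessarily the priors of the actual model.

```python
import numpy as np


def sample_hierarchical_carryover(n_countries, n_channels, rng):
    """Draw one prior sample of carryover strengths per country and channel."""
    # hyperpriors for the Beta parameters (Gamma(2, 1) is an illustrative assumption)
    car_alpha = rng.gamma(shape=2.0, scale=1.0)
    car_beta = rng.gamma(shape=2.0, scale=1.0)
    # every country/channel carryover comes from the same Beta(car_alpha, car_beta);
    # in the full model, data from every country therefore informs these shared values
    carryover = rng.beta(car_alpha, car_beta, size=(n_countries, n_channels))
    return car_alpha, car_beta, carryover


rng = np.random.default_rng(42)
countries = ["Germany", "Austria", "Switzerland"]
channels = ["TV", "Banners"]
car_alpha, car_beta, carryover = sample_hierarchical_carryover(len(countries), len(channels), rng)
print(f"shared prior mean of the carryover: {car_alpha / (car_alpha + car_beta):.2f}")
print(carryover.round(2))  # rows: countries, columns: channels
```

In PyMC, the same idea amounts to passing the hyperprior random variables as the alpha and beta arguments of pm.Beta, with one entry per country and channel.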
Checking the Output
Let us just take a look at how well the model captures the data.
BHMMM prediction results for Germany and Austria. Image by the author.
I will not conduct any real checks with metrics now, but from the plots, we can
see that the fits for Germany and Austria look quite good.
BHMMM (left) and BMMM (right) prediction results for Switzerland. Image by the author.
For Switzerland, the BHMMM predictions look better than those of the separate BMMM.
This is only possible because we have given Switzerland some context using
the data of other, similar countries.
We can also see how the posteriors of Switzerland's carryovers narrowed:
Image by the author.
Some distributions are still a bit wild, and we would have to take a deeper look
into how to fix this. There might be sampling issues or the priors might be bad,
among other things. However, we will not do that here.
Conclusion
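In this article, we have taken a quick look at two different Bayesian concepts, Bayesian marketing mix modeling and Bayesian hierarchical modeling, and combined them into a Bayesian hierarchical marketing mix model.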
This method works so well because it gives the model context: if you tell the
model to give a forecast for one country, it can take the information about
other countries into account. This is crucial if the model has to operate on a
dataset that is otherwise too small.
Another thought that I want to give you on your way: in this
article, we used a country hierarchy. However, you can think of other
hierarchies as well, for example, a channel hierarchy. A channel hierarchy
can arise if you say that different channels should not behave too differently,
for example, if your model does not just take banner spendings as a whole but
banner spendings on website A and banner spendings on website B, where the user
behavior on websites A and B is not too different.
References
[1] Y. Sun, Y. Wang, Y. Jin, D. Chan, J. Koehler, Geo-level Bayesian
Hierarchical Media Mix Modeling (2017)
I hope that you learned something new, interesting, and useful today. Thanks
for reading!