Lecture 04
Exponential Families
Philipp Hennig
27 April 2023
Faculty of Science
Department of Computer Science
Chair for the Methods of Machine Learning
Example — inferring probability of wearing glasses
Step 1: Construct σ-algebra (exposition by Stefan Harmeling)
…
Probability of wearing glasses after five observations
$$p(\pi \mid x_1, x_2, x_3, x_4, x_5) = Z_5^{-1} \prod_{i=1}^{5} p(x_i \mid \pi)\, p(\pi) = Z_5^{-1}\, \pi^{n} (1-\pi)^{m}\, p(\pi) = p(\pi \mid n, m)$$
Pierre-Simon, marquis de Laplace (1749–1827), Théorie Analytique des Probabilités, 1814, p. 364
Translated by a Deep Network, assisted by a human
Conjugate Priors
More in the next lecture
E. Pitman. Sufficient statistics and intrinsic accuracy. Math. Proc. Cambr. Phil. Soc. 32(4), 1936.
P. Diaconis and D. Ylvisaker. Conjugate priors for exponential families. Annals of Statistics 7(2), 1979.
$$p(x \mid f) = \prod_{i=1}^{n} f^{x_i} (1-f)^{1-x_i} = f^{n_1} (1-f)^{n_0}, \qquad x_i \in \{0,1\}, \quad n_0 := n - n_1$$

$$p(f \mid \alpha, \beta) = \mathcal{B}(f; \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, f^{\alpha-1} (1-f)^{\beta-1}$$

$$p(f \mid x) = \mathcal{B}(f; \alpha + n_1, \beta + n_0) = \frac{1}{B(\alpha + n_1, \beta + n_0)}\, f^{\alpha + n_1 - 1} (1-f)^{\beta + n_0 - 1}$$

with

$$B(x, y) := \int_0^1 t^{x-1} (1-t)^{y-1}\, dt = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} = \frac{(x-1)!\,(y-1)!}{(x+y-1)!}, \qquad \Gamma(z) = \int_0^\infty e^{-t} t^{z-1}\, dt = \int_0^1 (-\log u)^{z-1}\, du = (z-1)!$$

Pierre-Simon, marquis de Laplace (1749–1827)
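A minimal numerical check of this conjugate update (a Python sketch assuming numpy and scipy; the data and hyperparameters are arbitrary illustration values): the analytic Beta(α + n₁, β + n₀) posterior should match the grid-normalized product of likelihood and prior.

import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

rng = np.random.default_rng(0)
alpha, beta = 2.0, 3.0                       # Beta prior hyperparameters (arbitrary)
x = rng.integers(0, 2, size=20)              # 20 Bernoulli observations
n1 = int(x.sum())
n0 = len(x) - n1

f = np.linspace(1e-6, 1 - 1e-6, 2001)
unnorm = f**n1 * (1 - f)**n0 * stats.beta.pdf(f, alpha, beta)   # likelihood times prior
numeric = unnorm / trapezoid(unnorm, f)                         # normalize on the grid
analytic = stats.beta.pdf(f, alpha + n1, beta + n0)             # conjugate posterior
print(np.max(np.abs(numeric - analytic)))                       # small (grid error only)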
Can we predict observations?
marginalizing over a Beta posterior – the Beta binomial distribution
$$p(x \mid f) = f^{n_1} (1-f)^{n_0}, \qquad n_0 := n - n_1$$

$$p(f) = \mathcal{B}(f; \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, f^{\alpha-1} (1-f)^{\beta-1}$$

$$p(x) = \int p(x \mid f)\, p(f)\, df = \int f^{n_1} (1-f)^{n_0} \cdot \frac{1}{B(\alpha, \beta)}\, f^{\alpha-1} (1-f)^{\beta-1}\, df = \frac{B(\alpha + n_1, \beta + n_0)}{B(\alpha, \beta)}$$

Pierre-Simon, marquis de Laplace (1749–1827)
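The closed-form evidence B(α + n₁, β + n₀)/B(α, β) can be compared against direct numerical integration. A Python sketch with scipy (illustrative values only):

from scipy import integrate
from scipy.special import beta as beta_fn

a, b, n1, n0 = 2.0, 3.0, 7, 5                # arbitrary illustration values
closed_form = beta_fn(a + n1, b + n0) / beta_fn(a, b)
numeric, _ = integrate.quad(
    lambda f: f**n1 * (1 - f)**n0 * f**(a - 1) * (1 - f)**(b - 1) / beta_fn(a, b),
    0.0, 1.0)
print(closed_form, numeric)                  # agree up to quadrature error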
Demo
$$p(x \mid f) = \prod_{i=1}^{n} f_{x_i} = \prod_{k=1}^{K} f_k^{n_k}, \qquad x_i \in \{1, \dots, K\}, \quad n_k := |\{x_i \mid x_i = k\}|$$

$$p(f \mid \alpha) = \mathcal{D}(f; \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} f_k^{\alpha_k - 1}$$

$$p(f \mid x) = \mathcal{D}(f; \alpha + n)$$

where

$$B(\alpha) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\!\left(\sum_k \alpha_k\right)}$$

Peter Gustav Lejeune Dirichlet (1805–1859)
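A Python sketch of the Dirichlet update (assuming numpy; categories are coded 0, …, K−1 here rather than 1, …, K, and the symmetric prior is an arbitrary choice):

import numpy as np

K = 4
alpha = np.ones(K)                           # symmetric Dirichlet prior (arbitrary)
rng = np.random.default_rng(1)
x = rng.integers(0, K, size=100)             # categorical observations, coded 0..K-1
n = np.bincount(x, minlength=K)              # n_k: number of observations in class k
alpha_post = alpha + n                       # Dirichlet posterior parameters
print(alpha_post, alpha_post / alpha_post.sum())   # parameters and posterior mean of f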
Analytic Bayesian Inference
Inferring the variance of a Gaussian
$$p(x \mid \sigma) = \prod_{i=1}^{n} \mathcal{N}(x_i; \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right), \qquad p(\sigma) = \,?$$

$$\log p(x \mid \sigma) = -\frac{n}{2} \log \sigma^2 - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2 \cdot \frac{1}{\sigma^2} - \frac{n}{2} \log 2\pi$$

$$\log p(\sigma \mid \alpha, \beta) = (\alpha - 1) \log \sigma^{-2} - \beta \cdot \frac{1}{\sigma^2} - \log Z(\alpha, \beta)$$

$$p(\sigma \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma^{-2})^{\alpha - 1} e^{-\beta \sigma^{-2}} =: \mathcal{G}(\sigma^{-2}; \alpha, \beta)$$

$$p(\sigma \mid \alpha, \beta, x) = \mathcal{G}\!\left(\sigma^{-2};\, \alpha + \frac{n}{2},\, \beta + \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2\right)$$

Daniel Bernoulli (1700–1782)
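A Python sketch (numpy/scipy; arbitrary data, with μ treated as known) that checks this Gamma update on the precision σ⁻² against a brute-force grid normalization:

import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

rng = np.random.default_rng(2)
mu = 0.0                                     # mean assumed known
x = rng.normal(mu, 1.5, size=50)
alpha, beta = 2.0, 1.0                       # Gamma prior on the precision (arbitrary)

alpha_post = alpha + len(x) / 2
beta_post = beta + 0.5 * np.sum((x - mu) ** 2)

# brute-force check: posterior over the precision tau = sigma^{-2} on a grid
tau = np.linspace(1e-4, 3.0, 4000)
loglik = 0.5 * len(x) * np.log(tau) - 0.5 * tau * np.sum((x - mu) ** 2)
logpost = loglik + stats.gamma.logpdf(tau, alpha, scale=1 / beta)
post = np.exp(logpost - logpost.max())
post /= trapezoid(post, tau)
print(np.max(np.abs(post - stats.gamma.pdf(tau, alpha_post, scale=1 / beta_post))))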
For $m, n \in \mathbb{N}$ and $x, y, z \in \mathbb{C}$:

$$B(x, y) = \int_0^1 t^{x-1} (1-t)^{y-1}\, dt = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} \quad \text{if } \operatorname{Re}(x), \operatorname{Re}(y) > 0, \qquad B(m, n) = \frac{(m-1)!\,(n-1)!}{(m+n-1)!}$$

$$\Gamma(z) = \int_0^\infty e^{-t} t^{z-1}\, dt = \int_0^1 (-\log u)^{z-1}\, du, \qquad \Gamma(n) = (n-1)!$$

Hadamard:

$$H(x) = \frac{1}{\Gamma(1-x)} \frac{d}{dx} \log\!\left(\frac{\Gamma\!\left(\frac{1-x}{2}\right)}{\Gamma\!\left(1 - \frac{x}{2}\right)}\right)$$

Leonhard Euler (1707–1783)
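These identities can be spot-checked with scipy.special (a sketch with arbitrary test values, not part of the derivation):

import math
from scipy import integrate, special

x, y = 2.5, 4.0                              # arbitrary test values with positive real part
lhs, _ = integrate.quad(lambda t: t**(x - 1) * (1 - t)**(y - 1), 0.0, 1.0)
print(lhs, special.beta(x, y), special.gamma(x) * special.gamma(y) / special.gamma(x + y))
print(special.gamma(5), math.factorial(4))   # Gamma(n) = (n-1)!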
Can we predict observations?
marginalizing over a Gamma posterior – the t-distribution
$$p(x \mid \sigma) = \mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

$$p(\sigma) = \mathcal{G}(\sigma^{-2}; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma^{-2})^{\alpha - 1} e^{-\beta \sigma^{-2}}$$

$$p(x) = \int p(x \mid \sigma)\, p(\sigma)\, d\sigma = \int \frac{(\sigma^{-2})^{1/2}}{\sqrt{2\pi}} \exp\!\left(-\sigma^{-2}\, \frac{(x - \mu)^2}{2}\right) \cdot \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma^{-2})^{\alpha - 1} e^{-\beta \sigma^{-2}}\, d\sigma$$

$$= \frac{1}{\sqrt{2\pi}} \frac{\beta^\alpha}{\Gamma(\alpha)} \int (\sigma^{-2})^{\alpha + \frac{1}{2} - 1} \exp\!\left(-\sigma^{-2}\left(\beta + \frac{(x - \mu)^2}{2}\right)\right) d\sigma$$

$$= \frac{1}{\sqrt{2\pi}} \frac{\Gamma(\alpha + \frac{1}{2})}{\Gamma(\alpha)} \frac{\beta^\alpha}{\left(\beta + \frac{(x - \mu)^2}{2}\right)^{\alpha + \frac{1}{2}}} = \frac{\Gamma(\alpha + \frac{1}{2})}{\sqrt{2\pi\beta}\, \Gamma(\alpha)} \left(1 + \frac{(x - \mu)^2}{2\beta}\right)^{-\left(\alpha + \frac{1}{2}\right)}$$

William Sealy Gosset (1876–1937)
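A Python sketch (scipy; arbitrary α, β, μ) comparing the closed form above with direct numerical marginalization over the precision. The same density can also be written as a Student-t with 2α degrees of freedom and scale √(β/α), which scipy provides as stats.t:

import numpy as np
from scipy import stats, integrate, special

mu, alpha, beta = 0.0, 3.0, 2.0              # arbitrary illustration values
x = 1.3

closed = (special.gamma(alpha + 0.5) / (np.sqrt(2 * np.pi * beta) * special.gamma(alpha))
          * (1 + (x - mu) ** 2 / (2 * beta)) ** (-(alpha + 0.5)))

# numeric marginal: integrate N(x; mu, 1/tau) against G(tau; alpha, beta), tau = sigma^{-2}
numeric, _ = integrate.quad(
    lambda tau: stats.norm.pdf(x, mu, 1 / np.sqrt(tau))
    * stats.gamma.pdf(tau, alpha, scale=1 / beta),
    0.0, np.inf)

# the same density, written as a Student-t with 2*alpha degrees of freedom
student = stats.t.pdf(x, df=2 * alpha, loc=mu, scale=np.sqrt(beta / alpha))
print(closed, numeric, student)              # all three agree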
Analytic Bayesian Inference
Inferring the Mean of a Gaussian
$$p(y \mid x) = \prod_{i=1}^{n} \mathcal{N}(y_i; x, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - x)^2}{2\sigma^2}\right), \qquad p(x) = \,?$$

$$\log p(y \mid x) = -\frac{n}{2} \log \sigma^2 - \frac{1}{2} \sum_{i=1}^{n} (y_i - x)^2 \cdot \frac{1}{\sigma^2} - \frac{n}{2} \log 2\pi = -\frac{n}{2}\left(\log \sigma^2 + \log 2\pi\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y_i^2 - 2 y_i x + x^2\right)$$

$$\log p(x \mid m, v^2) = -\frac{1}{2v^2} x^2 + \frac{m}{v^2} x - \frac{m^2}{2v^2} - \frac{1}{2} \log 2\pi v^2$$

$$\log p(x \mid y, m, v^2) = -\frac{v^{-2} + n\sigma^{-2}}{2} \left( x - \frac{1}{v^{-2} + n\sigma^{-2}} \left( \frac{m}{v^2} + \frac{1}{\sigma^2} \sum_{i=1}^{n} y_i \right) \right)^{2} + \text{const.}$$

$$p(x \mid y, m, v^2) = \mathcal{N}\!\left(x;\; \Psi \left( \frac{m}{v^2} + \frac{1}{\sigma^2} \sum_{i=1}^{n} y_i \right),\; \Psi \right), \qquad \Psi := \left(v^{-2} + n\sigma^{-2}\right)^{-1}$$

Carl Friedrich Gauss (1777–1855)
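A Python sketch (numpy/scipy; arbitrary data and prior) that checks the posterior N(x; Ψ(m/v² + Σᵢ yᵢ/σ²), Ψ) against a brute-force grid computation:

import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

rng = np.random.default_rng(3)
sigma, m, v = 1.0, 0.0, 2.0                  # likelihood std, prior mean and std (arbitrary)
y = rng.normal(1.5, sigma, size=10)          # observations
n = len(y)

Psi = 1.0 / (1 / v**2 + n / sigma**2)        # posterior variance
mean = Psi * (m / v**2 + y.sum() / sigma**2) # posterior mean
print(mean, Psi)

# brute-force check on a grid over x
x = np.linspace(-5.0, 5.0, 4001)
logpost = stats.norm.logpdf(x, m, v) + stats.norm.logpdf(y[:, None], x[None, :], sigma).sum(axis=0)
post = np.exp(logpost - logpost.max())
post /= trapezoid(post, x)
print(np.max(np.abs(post - stats.norm.pdf(x, mean, np.sqrt(Psi)))))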
Can we predict observations?
marginalizing over a Gaussian posterior
$$p(y \mid x) = \mathcal{N}(y; x, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y - x)^2}{2\sigma^2}\right), \qquad p(x) = \mathcal{N}(x; m, v^2)$$

$$p(y) = \int p(y \mid x)\, p(x)\, dx = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y - x)^2}{2\sigma^2}\right) \cdot \frac{1}{\sqrt{2\pi v^2}} \exp\!\left(-\frac{(x - m)^2}{2v^2}\right) dx$$

$$= \frac{1}{2\pi\, \sigma v} \int \exp\!\left(-\frac{y^2 - 2xy + x^2}{2\sigma^2} - \frac{x^2 - 2mx + m^2}{2v^2}\right) dx$$

$$= \frac{1}{2\pi\, \sigma v} \exp\!\left(-\frac{y^2 - 2ym + m^2}{2(\sigma^2 + v^2)}\right) \int \exp\!\left(-\frac{\sigma^2 + v^2}{2\sigma^2 v^2} \left(x - \frac{y v^2 + m \sigma^2}{\sigma^2 + v^2}\right)^{2}\right) dx$$

$$= \frac{1}{\sqrt{2\pi(\sigma^2 + v^2)}} \exp\!\left(-\frac{(y - m)^2}{2(\sigma^2 + v^2)}\right)$$

Carl Friedrich Gauss (1777–1855)
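A short numerical check of this marginal (a Python/scipy sketch with arbitrary values): integrating the product of the two Gaussians should reproduce N(y; m, σ² + v²).

import numpy as np
from scipy import stats, integrate

m, v, sigma, y = 0.5, 2.0, 1.0, 1.7          # arbitrary illustration values
numeric, _ = integrate.quad(
    lambda x: stats.norm.pdf(y, x, sigma) * stats.norm.pdf(x, m, v), -np.inf, np.inf)
closed = stats.norm.pdf(y, m, np.sqrt(sigma**2 + v**2))
print(numeric, closed)                       # agree up to quadrature error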
A family of probability distributions with densities of the form
$$p_w(x) = \frac{h(x)\, \exp\!\left(w^\intercal \phi(x)\right)}{Z(w)} = h(x)\, \exp\!\left(w^\intercal \phi(x) - \log Z(w)\right)$$
is called an exponential family of probability measures. The function $\phi: X \to \mathbb{R}^d$ is called the sufficient statistics. The parameters $w \in \mathbb{R}^d$ are the natural parameters of $p_w$. The normalization constant $Z(w): \mathbb{R}^d \to \mathbb{R}$ is the partition function. The function $h(x): X \to \mathbb{R}_+$ is the base measure. For notational convenience, it can be useful to re-parametrize the natural parameters $w$ as $w := \eta(\theta)$ in terms of canonical parameters $\theta$.
$$p(k \mid q) = \binom{n}{k} q^k (1-q)^{n-k} \qquad \text{(nb: treating } n \text{ as fixed)}$$

$$= \binom{n}{k} \exp\!\big(k \log q + (n-k) \log(1-q)\big) = \underbrace{\binom{n}{k}}_{=:h(k)} \exp\Big(\underbrace{k}_{\phi(k)}\, \underbrace{\log \tfrac{q}{1-q}}_{w = \eta(q)} + \underbrace{n \log(1-q)}_{-\log Z(w)}\Big)$$

$$\log Z(w) = n \log(1 + e^w)$$
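A Python sketch (numpy/scipy; arbitrary n, q, k) confirming that the natural-parameter form h(k) exp(w φ(k) − log Z(w)) reproduces the binomial pmf:

import numpy as np
from scipy import stats
from scipy.special import comb

n, q, k = 10, 0.3, 4                         # arbitrary illustration values
w = np.log(q / (1 - q))                      # natural parameter
log_Z = n * np.log1p(np.exp(w))              # log partition function
ef_form = comb(n, k) * np.exp(k * w - log_Z) # h(k) exp(phi(k) * w - log Z(w))
print(ef_form, stats.binom.pmf(k, n, q))     # identical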
$$p(q \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, q^{\alpha-1} (1-q)^{\beta-1}$$

$$= \underbrace{1}_{h(q)} \exp\Bigg( \underbrace{\begin{pmatrix} \log q \\ \log(1-q) \end{pmatrix}^{\!\intercal}}_{=:\phi^\intercal(q)} \underbrace{\begin{pmatrix} \alpha - 1 \\ \beta - 1 \end{pmatrix}}_{w} - \log B(\alpha, \beta) \Bigg) = \underbrace{\frac{1}{q(1-q)}}_{\tilde h(q)} \exp\Bigg( \begin{pmatrix} \log q \\ \log(1-q) \end{pmatrix}^{\!\intercal} \underbrace{\begin{pmatrix} \alpha \\ \beta \end{pmatrix}}_{\tilde w} - \log B(\alpha, \beta) \Bigg)$$

The sufficient statistics $\phi$, natural parameters $w$, and base measure $h$ are not uniquely defined.
$$\mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) = \underbrace{\frac{1}{\sqrt{2\pi}}}_{=:h(x)} \cdot \exp\Bigg( \underbrace{x \cdot \frac{\mu}{\sigma^2} + \frac{x^2}{2} \cdot \frac{-1}{\sigma^2}}_{=:\phi(x)^\intercal w} - \underbrace{\left(\frac{\mu^2}{2\sigma^2} + \log \sigma\right)}_{=:\log Z(w)} \Bigg)$$

Thus we identify the precision and precision-adjusted mean as the natural parameters, and the first two sample moments as the sufficient statistics:

$$w := \begin{pmatrix} \frac{\mu}{\sigma^2} \\ -\frac{1}{\sigma^2} \end{pmatrix}, \qquad \phi(x) := \begin{pmatrix} x \\ \frac{1}{2} x^2 \end{pmatrix}, \qquad \log Z(w) := -\frac{w_1^2}{2 w_2} - \frac{1}{2} \log(-w_2)$$
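A Python sketch (numpy/scipy; arbitrary μ, σ, x) confirming that h(x) exp(w⊺φ(x) − log Z(w)) with these choices reproduces the Gaussian density:

import numpy as np
from scipy import stats

mu, sigma, x = 1.2, 0.7, 0.4                 # arbitrary illustration values
w = np.array([mu / sigma**2, -1 / sigma**2]) # natural parameters
phi = np.array([x, 0.5 * x**2])              # sufficient statistics
log_Z = -w[0] ** 2 / (2 * w[1]) - 0.5 * np.log(-w[1])
ef_form = 1 / np.sqrt(2 * np.pi) * np.exp(w @ phi - log_Z)
print(ef_form, stats.norm.pdf(x, mu, sigma)) # identical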
$$p(x) = h(x)\, \frac{F(\phi(x) + \alpha,\, \nu + 1)}{F(\alpha, \nu)}$$

Computing $F(\alpha, \nu)$, the normalization constant of the conjugate prior, can be tricky. In general, this is the challenge when constructing an EF.
… an inference algorithm!
For a long time, exponential families were the only way to do tractable Bayesian inference. In a way, the
essence of machine learning is to use computers to break free from exponential families.