
Learning From Data

-1: Probability Theory: Recap


Jörg Schäfer
Frankfurt University of Applied Sciences
Department of Computer Sciences
Nibelungenplatz 1
D-60318 Frankfurt am Main



Content

Basic Definitions

Random Variables, Independence and Conditional Expectations

Bayes’ Rule

Convergence and Limit Laws

Concentration Inequalities

Probability and Machine Learning

Bibliography

Probability Theory

How dare we speak of the laws of chance? Is not chance the antithesis of all law?
(Joseph Bertrand, Calcul des probabilités)

The theory of probability as a mathematical discipline can and should be developed from axioms in exactly the same way as geometry and algebra.
(Andrey Kolmogorov)

σ-Algebras

Definition 1
A system A of subsets of a set Ω is called a σ-algebra iff
1. Ω ∈ A
2. A ∈ A ⇒ A^c ∈ A
3. Ai ∈ A for all i ∈ N ⇒ ⋃_{i=1}^∞ Ai ∈ A

Examples of σ-algebras are

The power set P(Ω) of an arbitrary set Ω
The intersection of arbitrarily many σ-algebras on Ω

Generating σ-Algebras

Definition 2
Let C denote an arbitrary collection of subsets of Ω. Then we denote by A(C) the
smallest σ-algebra containing C.

Theorem 3
A(C) as defined above is well-defined.

Measures

Definition 4
Let A denote a σ-algebra. A function µ : A → R with
1. µ(∅) = 0
2. µ ≥ 0
3. If Ai ∈ A are pairwise disjoint, then

   µ(⋃_{i=1}^∞ Ai) = Σ_{i=1}^∞ µ(Ai)

is called a measure on A.

Probability Spaces
Definition 5
A set Ω with a σ-algebra A including Ω and a measure P defined on A
such that P(Ω) = 1 is called a probability space with a probability
measure P. We usually use the notation (Ω, A, P).

Lemma 6
For any probability space, the following holds true

P(∅) = 0
P(A^c) = 1 − P(A)
P(A \ B) = P(A) − P(A ∩ B)
P(⋃_i Ai) ≤ Σ_i P(Ai).

Probability Space Examples

1. Any finite set Ω with the set of all subsets as a σ-algebra can be
   made into a probability space by choosing an arbitrary function
   g : Ω → R+ such that Σ_{x∈Ω} g(x) = 1 and specifying a probability
   measure as
   P(A) := Σ_{x∈A} g(x)

2. Ω = [0, 1], A(Ω) the standard σ-algebra of Lebesgue-measurable sets
   and P(A) := λ(A) = ∫_A dλ, where λ denotes the Lebesgue measure.
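A minimal sketch of the first example in Python (the sample space Ω, the weight function g, and the event below are illustrative choices, not taken from the slides):

from fractions import Fraction

# Finite probability space: Omega with weights g summing to 1 (a fair die here).
Omega = {1, 2, 3, 4, 5, 6}
g = {x: Fraction(1, 6) for x in Omega}
assert sum(g.values()) == 1

def P(A):
    """Probability measure induced by g: P(A) = sum of g(x) over x in A."""
    return sum(g[x] for x in A)

print(P({2, 4, 6}))  # probability of rolling an even number -> 1/2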

Random Variables
Definition 7
Let Ω be a measurable space, i.e. a space equipped with a σ-algebra A
on Ω. Let B denote a measurable space with σ-algebra B. Then a map
f : Ω → B from Ω to B is called measurable iff

∀B ∈ B : f −1 (B) ∈ A.

Definition 8
Let X be a measurable map from a probability space (Ω, A, P) to a
measurable space B. Then X is called a random variable.
Special cases:
1. Real-valued random variables with B = R (equipped with the usual
   σ-algebra of Lebesgue-measurable sets).
2. Discrete random variables with B = M ⊆ Z.
Push-forward Probability Measures
A random variable X provides a way to push-forward a probability
measure, by defining PX (B) := P(X −1 (B)). This makes the following
diagram commutative:


        X
   Ω ------> B
   |         |
 P |         | PX
   v         v
 [0,1] --id--> [0,1]

Figure: Push-forward Probability Measure

Henceforth, without loss of generality, we may consider only measures
on "standard" probability spaces such as B = R or B = Z.
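A minimal sketch of the push-forward construction in Python (the fair die and the parity map X below are illustrative assumptions):

from fractions import Fraction

# Probability measure on Omega = {1, ..., 6} (fair die).
P = {omega: Fraction(1, 6) for omega in range(1, 7)}

# Random variable X : Omega -> B with B = {0, 1} (parity of the roll).
def X(omega):
    return omega % 2

def push_forward(P, X):
    """P_X(b) = P(X^{-1}({b})): add up the weights of all omega mapped to b."""
    PX = {}
    for omega, p in P.items():
        PX[X(omega)] = PX.get(X(omega), 0) + p
    return PX

print(push_forward(P, X))  # {1: Fraction(1, 2), 0: Fraction(1, 2)}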
Random/Stochastic Process

Definition 9
Let I be an arbitrary index set. Then any quadruple (Ω, A, P, (Xt)t∈I) is
called a stochastic process, where (Ω, A, P) is a probability space and
(Xt)t∈I is a collection of random variables with values in a common
measure space E. For fixed ω ∈ Ω, the map I → E defined by t ↦ Xt(ω) is
called a path of the process.
Common examples of index sets are the real line R or [0, 1].

(We ignore the technical details of how to construct such spaces and the
associated σ-algebras, filtrations, etc.)

Brownian Motion
Definition 10
Let (Ω, A, P, (Xt)t∈R+) be a stochastic process with X0 = 0, independent
increments over disjoint intervals, and Xt − Xs distributed as N(0, t − s)
for s ≤ t. Then we call Xt a Brownian Motion or Wiener Process.
[Plot: simulated sample path of a Brownian motion; x-axis: time (0 to 100), y-axis: phenotype (roughly −2 to 2)]
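A minimal simulation sketch in Python (horizon, step count and seed are illustrative choices; the discretized path approximates a Wiener process):

import random

def brownian_path(T=100.0, n=1000, seed=0):
    """Simulate a Brownian path on [0, T] via independent N(0, dt) increments."""
    random.seed(seed)
    dt = T / n
    t, x = [0.0], [0.0]
    for _ in range(n):
        x.append(x[-1] + random.gauss(0.0, dt ** 0.5))  # X_{t+dt} - X_t ~ N(0, dt)
        t.append(t[-1] + dt)
    return t, x

t, x = brownian_path()
print(x[-1])  # value of the simulated path at time T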
Distribution Functions
Definition 11
Let X be a real-valued random variable on (Ω, A, P). Then
F : R → [0, 1], F(x) := P(X ≤ x) is called the distribution function of X.

Lemma 12
Properties:
lim_{x→−∞} F(x) = 0, lim_{x→∞} F(x) = 1
F is monotonically increasing
F is continuous from the right
P(a < X ≤ b) = F(b) − F(a)
P(X > a) = 1 − F(a)
P(X = a) = F(a) − lim_{x↓0} F(a − x)

Distribution Functions: Example
Normal Distribution

0.5
N ~ (0, 1)

0.4
0.3
Density

0.2
0.1
0.0

−6 −4 −2 0 2 4 6

Figure: Gaussian distribution

F(x) := 1/√(2πσ²) ∫_{−∞}^{x} e^{−t²/(2σ²)} dt
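This integral has no closed form in elementary functions; a minimal sketch in Python using the error function (mean 0 assumed, matching the plot above):

import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = P(X <= x) for X ~ N(mu, sigma^2), expressed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(normal_cdf(0.0))   # 0.5 by symmetry
print(normal_cdf(1.96))  # ~0.975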

Expectations

Definition 13
Let X be a real-valued random variable on (Ω, A, P). Then
E[X] := ∫ X(ω) dP(ω), if it exists, is called the Expectation (or Mean) of X.

Lemma 14
E[X] = ∫ x dF_X(x)

Remark: In the case of discrete random variables we have
E[X] = Σ_i x_i P(X = x_i).

Moments

Definition 15
Let X be a real-valued random variable on (Ω, A, P). Then
Var[X] := E[(X − E[X])²], if it exists, is called the Variance of X. In
general, we define the n-th (central) moment as µ_n[X] := E[(X − E[X])^n],
wherever the integrals exist.

Independence of Events

Definition 16
Two events A and B are called independent (often written as A ⊥ B) iff
their joint probability equals the product of their probabilities:

P(A ∩ B) = P(A)P(B)

Independence of Random Variables

Definition 17
Two random variables X and Y are independent iff the elements of the
σ-algebras generated by them are independent; i.e. for every a and b, the
events {X ≤ a} and {Y ≤ b} are independent (in the sense defined above).

Lemma 18
Let X and Y be independent. Then the (joint) distribution function of
X , Y factors as follows:

F_{X,Y}(x, y) = F_X(x) F_Y(y)

Conversely, if the (joint) distribution function of X , Y factors as above,


then X and Y are independent.

Covariance and Correlations
Covariance is a measure of the joint variability of two random variables:
Definition 19
Let X and Y be real-valued random variables on (Ω, A, P). Then
Cov[X, Y] := E[(X − E[X])(Y − E[Y])], if it exists, is called the covariance.

Definition 20
Let X and Y be real-valued random variables on (Ω, A, P). Then
ρ(X, Y) := Cov(X, Y)/√(Var(X) Var(Y)) is called the correlation.

For vector-valued random variables, the covariance matrix is useful:


Definition 21
Let X and Y be random variables on (Ω, A, P) with values in R^n. Then
σ[X, Y] := E[(X − E[X])(Y − E[Y])^T], if it exists, is called the covariance matrix.
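A minimal numerical sketch with NumPy (the synthetic data below is made up for illustration; np.cov estimates the covariance matrix from samples):

import numpy as np

rng = np.random.default_rng(0)
# 1000 samples of a 2-dimensional random vector (X, Y) with correlated components.
x = rng.normal(size=1000)
y = 0.8 * x + 0.6 * rng.normal(size=1000)

cov = np.cov(x, y)                                 # 2x2 sample covariance matrix
rho = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])   # sample correlation
print(cov)
print(rho)  # close to the true correlation 0.8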

Conditional Probabilities

Definition 22
Let (Ω, A, P) be a probability space and A, B ∈ A with P(B) > 0. Then
the conditional probability of A with respect to B (or given B),
denoted by P(A|B) is defined as

P(A|B) := P(A ∩ B) / P(B).

Lemma 23
For any A and any B with 0 < P(B) < 1 we have P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).

Conditional Probabilities – Example
Three tosses of a fair coin:

Ω = {hhh, hht, hth, htt, thh, tht, tth, ttt}, all outcomes equally likely
2 or more heads in 3 tosses: A = {hhh, hht, hth, thh} ⇒ P(A) = 1/2
What is the probability of A, given that the first toss lands head
(event B)?
By counting, we conclude P(A|B) = 3/4
Indeed, using the definition of conditional probability we also get
P(A ∩ B) = 3/8, P(B) = 1/2, thus P(A|B) = P(A ∩ B)/P(B) = 3/4
Similarly, what is the probability of A, given that the first toss lands
tail?
P(A|B c ) = 1/4
Indeed, P(A) = P(A|B)P(B) + P(A|B c )P(B c ) =
3/4 × 1/2 + 1/4 × 1/2 = 1/2.
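A minimal Monte Carlo check of this example in Python (the number of trials is an arbitrary choice):

import random

random.seed(0)
n = 100_000
count_B, count_A_and_B = 0, 0
for _ in range(n):
    tosses = [random.choice("ht") for _ in range(3)]
    A = tosses.count("h") >= 2   # at least two heads
    B = tosses[0] == "h"         # first toss lands heads
    count_B += B
    count_A_and_B += A and B

print(count_A_and_B / count_B)   # estimate of P(A|B), close to 3/4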
Conditional Expectations

Definition 24
Let (Ω, A, P) be a probability space, let X denote a random variable
defined on Ω, and B ∈ A an event with P(B) > 0. Then

E[X|B] := E[X 1_B] / P(B) = ∫ x dP(x|B)

is called the conditional expectation with respect to B (P(·|B) denotes
the probability measure corresponding to the conditional probability
defined as above).
One can show that ∫_C E[X|B] dP = ∫_C X dP for all C ∈ σ(B), and this can
be used to define E[X|B] even for cases with P(B) = 0, but this requires
more measure theory than we can afford here. . .

Bayes’ Rule

Theorem 25
Let (Ω, A, P) be a probability space, and let A and B be events with
P(A) > 0 and P(B) > 0. Then

P(B|A) = P(A|B)P(B) / P(A)

Proof.
Trivial exercise.

Bayes – Example
Suppose you are tested positive for HIV. Should you be worried?
Here’s the math:
Prevalence of HIV in the German population: 0.05% (Source:
Robert-Koch-Institut)
If a person carries HIV, the test is positive (T+) with probability 99.9% (sensitivity)
If a person does not carry HIV, the test is negative with probability 99.7% (specificity)
Thus, according to Bayes

P(HIV|T+) = P(T+|HIV)P(HIV) / P(T+)
          = P(T+|HIV)P(HIV) / (P(T+|HIV)P(HIV) + P(T+|HIV^c)P(HIV^c))
          = (99.9% × 0.05%) / (99.9% × 0.05% + 0.3% × 99.95%)
          ≈ 14%
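A minimal check of this calculation in Python (numbers taken from the slide above):

prevalence = 0.0005   # P(HIV) = 0.05%
sensitivity = 0.999   # P(T+ | HIV)
specificity = 0.997   # P(T- | HIV^c), so P(T+ | HIV^c) = 0.003

# Law of total probability for the denominator, then Bayes' rule.
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_hiv_given_pos = sensitivity * prevalence / p_pos
print(p_hiv_given_pos)  # ~0.143, i.e. roughly 14%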
Bayes and Machine Learning
In machine learning, we often have the following interpretation of Bayes’
rule and nomenclature:
Let D denote the data and θ denote parameters of our model.
Then according to Bayes we have

P(θ|D) = P(D|θ)P(θ)/P(D).

P(D) is just a normalization constant (it does not depend on θ) and can often be ignored.


P(D|θ) is called the likelihood of data given model parameters.
P(θ) is the prior and our (subjective?!) belief of what the model
parameters might be.
P(θ|D) is called the posterior, a probability distribution of the
model parameters given the data (see the sketch below).
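A minimal sketch of this nomenclature for a toy model in Python (a coin-flip data set, a uniform prior over the bias θ, and a parameter grid; all of these are illustrative assumptions, not taken from the slides):

import numpy as np

D = np.array([1, 0, 1, 1, 0, 1, 1, 1])         # observed coin flips (1 = heads)
thetas = np.linspace(0.01, 0.99, 99)            # grid of candidate parameters theta

prior = np.ones_like(thetas) / len(thetas)      # P(theta): uniform prior
# P(D | theta): Bernoulli likelihood of the whole data set
likelihood = thetas ** D.sum() * (1 - thetas) ** (len(D) - D.sum())

unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()   # dividing by P(D), the normalization constant
print(thetas[np.argmax(posterior)])             # MAP estimate, close to 6/8 = 0.75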

Convergence

Definition 26
Let Xn be a sequence of random variables. Then
1. Xn converges to X almost surely (notation Xn →a.s. X) iff
   P(lim_{n→∞} |Xn − X| = 0) = 1.
2. Xn converges to X in probability (notation Xn →i.p. X) iff ∀ε > 0
   lim_{n→∞} P(|Xn − X| ≥ ε) = 0.
3. Xn converges to X in distribution (notation Xn →d X) iff
   lim_{n→∞} Fn(x) = F(x) for all x where F(x) is continuous.
4. Xn converges to X in r-th mean (notation Xn →L^r X) iff
   lim_{n→∞} E(|Xn − X|^r) = 0.

Convergence Relations

Xn →a.s. X  ⇒  Xn →i.p. X  ⇒  Xn →d X
Xn →L^r X  ⇒  Xn →i.p. X

Figure: Convergence Relations

The reverse directions are not true, in general.

Counterexample

Let Xn be a sequence of independent random variables with Bernoulli(1/n)
distribution, i.e. P(Xn = 1) = 1/n and P(Xn = 0) = 1 − 1/n.
Lemma 27
For Xn defined as above we have:
1. Xn →i.p. 0
2. Xn →L² 0
3. Xn does not converge almost surely to any random variable X

Proof.
Homework

Law of Large Numbers

Let Xn denote a sequence of independent, identically distributed random
variables on a probability space (Ω, A, P).

Theorem 28
Weak Law of Large Numbers: If µ = E[Xn], then the sample average
X̄n := (1/n) Σ_{i=1}^n Xi converges in probability to µ, i.e. X̄n →i.p. µ

Theorem 29
Strong Law of Large Numbers: If E[|Xn|] < ∞, then the sample average
X̄n := (1/n) Σ_{i=1}^n Xi converges almost surely to µ, i.e. X̄n →a.s. µ
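A minimal simulation sketch in Python (Bernoulli(1/2) variables and the sample sizes are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=100_000)                  # i.i.d. Bernoulli(1/2), mu = 0.5
running_mean = np.cumsum(X) / np.arange(1, len(X) + 1)
for n in (10, 100, 1_000, 100_000):
    print(n, running_mean[n - 1])                     # sample averages approach 0.5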

Central Limit Theorem
Let Xn denote a sequence of independent, identically distributed random
variables on a probability space (Ω, A, P). We already know that the
sample average converges under certain assumptions to the mean. But
how fast?
It turns out that we cannot just describe how fast, but we can even
calculate the limit distribution of the sample average:
Theorem 30
Central Limit Theorem: Let σ² := Var(Xn) and Sn := Σ_{i=1}^n (Xi − µ). Then
E[Sn] = 0 and Var(Sn) = nσ². Hence, for S̃n := Sn/(√n σ) we have E[S̃n] = 0 and
Var[S̃n] = 1. Then S̃n →d N(0, 1).

The key point of the theorem is that the limiting distribution of the
standardized sum is always normal, no matter what the distribution
of the Xn is (universality property of the normal distribution).
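A minimal simulation sketch in Python (uniform summands and the sample counts are illustrative; any square-integrable distribution would do):

import numpy as np

rng = np.random.default_rng(0)
n, trials = 1_000, 10_000
X = rng.uniform(size=(trials, n))            # i.i.d. Uniform(0, 1): mu = 0.5, sigma^2 = 1/12
S_tilde = (X.sum(axis=1) - n * 0.5) / (np.sqrt(n) * np.sqrt(1 / 12))
print(S_tilde.mean(), S_tilde.std())         # approximately 0 and 1
print(np.mean(np.abs(S_tilde) <= 1.96))      # approximately 0.95, as for N(0, 1)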
Speed of Convergence
In the context of the Central Limit Theorem we have for arbitrary λ ∈ R+:

P(|Sn − nµ|/(√n σ) ≤ λ) → 1/√(2π) ∫_{−λ}^{λ} e^{−t²/2} dt

Often we are interested in the following quantity

P(|Sn − nµ|/(√n σ) > λ)

The central limit theorem suggests that

P(|Sn − nµ|/(√n σ) > λ) → 1 − 1/√(2π) ∫_{−λ}^{λ} e^{−t²/2} dt,

which becomes arbitrarily small for big enough λ.


However, the central limit theorem only controls the typical behaviour of
statistics. We are now interested in the outliers.
Chebyshev’s Inequality

Theorem 31
For any random variable X with finite expectation µ and finite non-zero
variance σ², and for any real number k > 0, we have

P(|X − µ| ≥ kσ) ≤ 1/k².
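A minimal numerical check in Python (exponentially distributed samples chosen arbitrarily; the bound is usually far from tight):

import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=1_000_000)   # mu = 1, sigma = 1
mu, sigma = X.mean(), X.std()
for k in (2, 3, 5):
    empirical = np.mean(np.abs(X - mu) >= k * sigma)
    print(k, empirical, "<=", 1 / k ** 2)        # empirical tail probability vs Chebyshev bound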

Concentration Inequalities
Fortunately, there are so-called concentration of measure inequalities
providing exponential-type bounds for quantities as above. This allows us
to bound how much the sum Sn deviates from its expectation
(keep this in mind!).
For instance, the famous Hoeffding inequality states:
Theorem 32
Let the Xn be bounded in an interval, then

P(|Sn − nµ|/(√n σ) ≥ λ) ≤ 2 exp(−cλ²)

for some c > 0.


See also https://dustingmixon.wordpress.com/2014/09/29/an-intuition-for-concentration-inequalities/.
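A minimal numerical illustration of the exponential decay in Python (Bernoulli(1/2) summands, so the boundedness assumption holds; the constant c is not computed here):

import numpy as np

rng = np.random.default_rng(0)
n, trials = 500, 100_000
X = rng.integers(0, 2, size=(trials, n), dtype=np.int8)   # bounded in [0, 1], mu = 0.5, sigma = 0.5
dev = np.abs(X.sum(axis=1) - n * 0.5) / (np.sqrt(n) * 0.5)
for lam in (1, 2, 3, 4):
    print(lam, np.mean(dev >= lam))   # tail probabilities drop roughly like exp(-c * lam**2)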

Example

[Plots: sample mean vs. sample size m (left panel) and a histogram (frequency) of standardized values on −4 to 4 (right panel)]
Figure: Law of Large Numbers and Central Limit for Binomial Distributions

Why Probability?

We need probability for machine learning, because


the laws of the universe are fundamentally probabilistic,
uncertainty arises because of complex systems,
uncertainty arises because of imprecise measurements,
we want to use imperfect ad-hoc models,
we want to use probability as a toolkit,
it provides the fundamental concepts for statistical learning, and
statistical learning theory is – by now – the only scientific foundation
of machine learning (the rest is engineering and heuristics. . . ).

References I

[Bau02] H. Bauer, Wahrscheinlichkeitstheorie, ser. De-Gruyter-Lehrbuch. de Gruyter, 2002. [Online]. Available: https://books.google.de/books?id=G87VCyJHsb0C

[Fel50] W. Feller, An Introduction to Probability Theory and Its Applications, ser. Wiley Mathematical Statistics Series. Wiley, 1950, vol. 1. [Online]. Available: https://books.google.de/books?id=_DNMAAAAMAAJ

[Geo04] H. Georgii, Stochastik: Einführung in die Wahrscheinlichkeitstheorie und Statistik, ser. De Gruyter Lehrbuch. De Gruyter, 2004. [Online]. Available: https://books.google.de/books?id=m2unQRKR2acC

[JP04] J. Jacod and P. Protter, Probability Essentials, 2nd ed. Springer, 2004.

[Par60] E. Parzen, Modern Probability Theory and Its Applications. John Wiley and Sons, Inc., New York and London, 1960.

