Lecture 1 - Introduction
VNU-UET
2022
Table of contents
Machine learning
Example
(Single) Face detection
▶ T : input = a 224×224 RGB image; output = (x1, y1, x2, y2), the
top-left and bottom-right corners of the face in the input
▶ P: IoU (intersection over union) between the predicted and ground-truth boxes
▶ E : a set of (millions of) (image, (x1, y1, x2, y2)) pairs
Exercises: Specify T , P, E for
▶ Predicting tomorrow’s weather given geographic information,
satellite images, and a trailing window of past weather.
▶ Answering questions expressed in free-form text.
▶ Identifying all people depicted in an image and drawing
outlines around each.
▶ Recommending products that users are likely to enjoy while
browsing.
Types of Machine Learning
Key phases in Machine Learning
Prerequisites for Machine learning
Math
▶ Linear Algebra
▶ Calculus
▶ Probability and Statistics
▶ Optimization
Programming
▶ Data structures and algorithms
▶ Python/C++
▶ Libraries: numpy, pandas, scikit-learn, pytorch
▶ Frameworks: jupyter, django, fastapi, Android, iOS
Probability
Definitions:
▶ Sample space: Ω is the set of all possible outcomes or results
(of a random experiment).
▶ Event space: a set F ⊂ 2^Ω that is a σ-algebra over Ω.
Each element of F is an event (a subset of Ω).
▶ A σ-algebra must satisfy: (i) F ≠ ∅, (ii) A ∈ F ⇒ Ω \ A ∈ F,
(iii) A_i ∈ F ∀i ⇒ ⋃_{i=1}^∞ A_i ∈ F
▶ Probability measure: a function P : F → R⁺ satisfying the
following properties:
▶ P(Ω) = 1, P(∅) = 0
▶ A_i ∈ F, A_i ∩ A_j = ∅ ∀i ≠ j ⇒ P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i)
As a result, the probability of a random event is specified by a
probability triple (Ω, F, P).
Probability
Example
Consider a random experiment: A closed box contains 100
marbles, of which 40 are red and 60 are blue. Take out one marble
randomly.
▶ Sample space: Ω is the set of 100 marbles in the box.
▶ Event space: F = {∅, Ω, red marble, blue marble}, i.e., F
contains 4 subsets of Ω. Notice that F is a σ-algebra over Ω.
▶ Probability measure: if each marble is equally likely to be
drawn, then
▶ P(∅) = 0, P(Ω) = 1, P(red) = 0.4, P(blue) = 0.6
▶ Event ∅: no marble is taken (happens with probability 0).
▶ Event Ω: a red or blue marble is taken (happens with
probability 1).
▶ Event red marble: the marble taken is red (probability 0.4).
▶ Event blue marble: the marble taken is blue (probability 0.6).
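The probabilities in this triple can be checked by simulation; a minimal sketch in Python, assuming the 40 red / 60 blue split from the example:

```python
import random

# Simulate the marble experiment: 40 red and 60 blue marbles in a box,
# each equally likely to be drawn. Empirical frequencies over many
# draws should approach P(red) = 0.4 and P(blue) = 0.6.
random.seed(0)
box = ["red"] * 40 + ["blue"] * 60

draws = [random.choice(box) for _ in range(100_000)]
p_red = draws.count("red") / len(draws)
p_blue = draws.count("blue") / len(draws)

print(round(p_red, 2), round(p_blue, 2))   # ≈ 0.4 0.6
```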
Probability
Bayes’ theorem
Consider two events A, B with P(A) ≠ 0. Then
P(B|A) = P(A ∩ B)/P(A) = P(A|B)P(B)/P(A)
where
▶ P(B|A): the probability of event B occurring given that A is
true (the posterior).
▶ P(A|B): the likelihood of A given a fixed B.
▶ P(B): the marginal or prior probability of B.
Independence
Two events A and B are independent iff P(A ∩ B) = P(A)P(B)
Probability
Example: COVID-19
▶ Test results are accurate on a sick person with probability 90%
(true positive rate).
▶ Test results are accurate on a healthy person with probability
99% (true negative rate).
▶ 3% of the population have COVID-19.
Question: what is the probability that a random person who tests
positive is really sick?
▶ Event A: positive test result.
▶ Event B: has the disease.
P(A|B) × P(B) = 0.9 × 0.03 = 0.027
P(A) = 0.9 × 0.03 + 0.01 × 0.97 = 0.0367
⇒ P(B|A) = 0.027/0.0367 ≈ 73.569%
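Plugging the numbers above into Bayes’ theorem, as a quick sanity check:

```python
# Bayes' theorem for the COVID-19 test example.
tpr = 0.90      # P(A|B): true positive rate
tnr = 0.99      # true negative rate, so P(A|not B) = 1 - tnr
prior = 0.03    # P(B): fraction of the population with the disease

# Law of total probability for the evidence P(A): sick and healthy
# people both contribute positive tests.
p_positive = tpr * prior + (1 - tnr) * (1 - prior)   # 0.0367
posterior = tpr * prior / p_positive                 # P(B|A)

print(round(posterior, 5))   # ≈ 0.73569
```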
Random variable
Types of random variables
▶ Discrete: X ∈ {1, 2, …, C},
with parameters θ_c = P(X = c), c = 1, 2, …, C
▶ Continuous: X ∈ ℝ
Properties of random distribution
▶ Expectation
E[X] = Σ_c c P(X = c)  (discrete),   E[X] = ∫_ℝ x p(x) dx  (continuous)
E[f(X)] = ∫_ℝ f(x) p(x) dx
▶ Variance
V[X] = E[(X − E[X])²]
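These definitions are direct to compute for a discrete variable; a small sketch, where the probabilities θ_c are made-up illustrative values:

```python
import numpy as np

# Expectation and variance of a discrete variable X ∈ {1, 2, 3};
# the probabilities below are arbitrary but must sum to 1.
values = np.array([1, 2, 3])
probs = np.array([0.2, 0.5, 0.3])

mean = np.sum(values * probs)                # E[X] = Σ_c c·P(X = c)
var = np.sum((values - mean) ** 2 * probs)   # V[X] = E[(X − E[X])²]

print(round(mean, 2), round(var, 2))   # 2.1 0.49
```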
Properties of expectation
E_Y[E_X[X|Y]] = ∫_ℝ (∫_ℝ x p(x|y) dx) p(y) dy = ∫_ℝ x p(x) dx = E[X]
Proof:
∫_ℝ (∫_ℝ x p(x|y) dx) p(y) dy = ∫_ℝ ∫_ℝ x p(x|y) p(y) dx dy
= ∫_ℝ ∫_ℝ x p(x, y) dx dy
= ∫_ℝ ∫_ℝ x p(y|x) p(x) dx dy
= ∫_ℝ (∫_ℝ p(y|x) dy) x p(x) dx = ∫_ℝ x p(x) dx = E[X]
since ∫_ℝ p(y|x) dy = 1.
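The tower rule can be verified by Monte Carlo; a sketch using an arbitrary illustrative mixture model (Y ~ Bernoulli(0.5), X|Y ~ N(3Y, 1), so E[X] = 0.5·0 + 0.5·3 = 1.5):

```python
import numpy as np

# Monte Carlo check of E_Y[E_X[X|Y]] = E[X].
rng = np.random.default_rng(0)
n = 200_000
y = rng.integers(0, 2, size=n)            # Y ∈ {0, 1}, fair coin
x = rng.normal(loc=3.0 * y, scale=1.0)    # X | Y ~ N(3Y, 1)

# Inner expectation E[X|Y=y] estimated within each group,
# then averaged over the (empirical) distribution of Y.
inner = np.array([x[y == 0].mean(), x[y == 1].mean()])
outer = inner[y].mean()                   # E_Y[E_X[X|Y]]

print(round(outer, 2), round(x.mean(), 2))   # both ≈ 1.5
```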
Bernoulli distribution
Parameter estimation
Maximum Likelihood Estimation - MLE
L(θ) is the likelihood of θ with respect to the dataset D.
MLE: find θ for which L(θ) is maximized.
For N Bernoulli observations x_1, …, x_N ∈ {0, 1}:
ℓ(θ) = log L(θ) = Σ_{i=1}^N [x_i log θ + (1 − x_i) log(1 − θ)]
ℓ′(θ) = Σ_{i=1}^N [x_i/θ − (1 − x_i)/(1 − θ)] = 0
(1/θ) Σ_{i=1}^N x_i = (1/(1 − θ)) Σ_{i=1}^N (1 − x_i)
With s = Σ_{i=1}^N x_i (so the right-hand sum is N − s):
s(1 − θ) = (N − s)θ
θ_MLE = s/N
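The closed form θ_MLE = s/N is easy to check on synthetic data; a sketch assuming an illustrative true value θ = 0.3:

```python
import numpy as np

# Draw Bernoulli data with a known θ and recover it with the
# closed-form MLE derived above.
rng = np.random.default_rng(0)
theta_true, N = 0.3, 10_000
x = rng.binomial(1, theta_true, size=N)   # x_i ∈ {0, 1}

s = x.sum()           # number of successes
theta_mle = s / N     # θ_MLE = s/N

print(round(theta_mle, 2))   # ≈ 0.3
```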
How good is the MLE?
▶ Unbiased: E[θ_MLE] = θ
▶ Variance goes to 0: V[θ_MLE] = θ(1 − θ)/N
▶ Consistent: P{|θ_MLE − θ| ≥ ϵ} → 0 as N → ∞
▶ Asymptotic normality: √N(θ_MLE − θ) →d N(0, θ(1 − θ))
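The unbiasedness and shrinking variance can be seen empirically; a sketch with illustrative values θ = 0.3 and growing sample sizes:

```python
import numpy as np

# For each sample size N, repeat the experiment many times and compare
# the empirical mean/variance of θ̂ = s/N against θ and θ(1 − θ)/N.
rng = np.random.default_rng(0)
theta, trials = 0.3, 5_000

for N in (10, 100, 1000):
    # s ~ Binomial(N, θ) is the success count of one size-N sample.
    estimates = rng.binomial(N, theta, size=trials) / N
    print(N, round(estimates.mean(), 3), round(estimates.var(), 5),
          round(theta * (1 - theta) / N, 5))
```

The last two columns (empirical vs. theoretical variance) should agree more closely as N grows.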
Binomial distribution
Binomial distribution (cont)
For n observations s_1, …, s_n, each the number of successes in N
Bernoulli trials with parameter θ:
ℓ(θ) = log L(θ) = const + Σ_{i=1}^n [s_i log θ + (N − s_i) log(1 − θ)]
ℓ′(θ) = Σ_{i=1}^n [s_i/θ − (N − s_i)/(1 − θ)] = 0
(1/θ) Σ_{i=1}^n s_i = (1/(1 − θ)) Σ_{i=1}^n (N − s_i)
θ_MLE = (1/n) Σ_{i=1}^n s_i/N
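As with the Bernoulli case, the estimator is the average success fraction; a sketch with illustrative values θ = 0.4, N = 20:

```python
import numpy as np

# n Binomial(N, θ) observations s_1, …, s_n; recover θ with the
# average success fraction derived above.
rng = np.random.default_rng(0)
theta_true, N, n = 0.4, 20, 5_000
s = rng.binomial(N, theta_true, size=n)   # s_i successes out of N trials

theta_mle = s.mean() / N                  # (1/n) Σ_i s_i / N

print(round(theta_mle, 2))   # ≈ 0.4
```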
Gaussian distribution
p(X = x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))
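The density above normalizes to 1 and peaks at x = µ with value 1/√(2π) when σ = 1; a quick numerical sketch (µ = 0, σ = 1 are illustrative defaults):

```python
import numpy as np

# Evaluate the Gaussian density and check its normalization.
def gaussian_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Riemann sum over a wide interval ≈ 1; the tail beyond ±5σ is negligible.
xs = np.linspace(-5, 5, 100_001)
area = gaussian_pdf(xs).sum() * (xs[1] - xs[0])

print(round(area, 3), round(gaussian_pdf(0.0), 4))   # ≈ 1.0 0.3989
```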