Math Essentials
HOWEVER
• To get really useful results, you need good mathematical intuitions about certain general machine learning principles, as well as the inner workings of the individual algorithms.
Intuition:
• In some process, several outcomes are possible. When the process is repeated a large number of times, each outcome occurs with a characteristic relative frequency, or probability. If a particular outcome happens more often than another outcome, we say it is more probable.
1. Non-negativity: for any event E ∈ F, p( E ) ≥ 0
2. All possible outcomes: p( Ω ) = 1
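These two axioms are easy to sanity-check numerically. A minimal sketch in Python, assuming a fair six-sided die as the discrete distribution (an illustrative choice, not from the slides):

```python
# Check the probability axioms for a discrete distribution.
# The fair six-sided die here is an illustrative assumption.
die = {face: 1 / 6 for face in range(1, 7)}

# 1. Non-negativity: p(E) >= 0 for every elementary event
assert all(p >= 0 for p in die.values())

# 2. All possible outcomes: p(Omega) = 1
assert abs(sum(die.values()) - 1.0) < 1e-9
```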
• Continuous space: | Ω | is infinite
– Analysis involves integrals ( ∫ )
[ Figures: example probability distributions, e.g. for the sum of two fair dice ]
• Scenario
– Several random processes occur (doesn't matter whether in parallel or in sequence)
– Want to know probabilities for each possible combination of outcomes
• Can describe as joint probability of several random variables
– Example: two processes whose outcomes are represented by random variables X and Y. Probability that process X has outcome x and process Y has outcome y is denoted as:
p( X = x, Y = y )
Jeff Howbert, Introduction to Machine Learning, Winter 2012
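As a sketch, the joint distribution p( X = x, Y = y ) for two independent fair dice (an assumption that matches the earlier sum-of-two-dice example) can be tabulated directly, and any event probability is then a sum of joint entries:

```python
from fractions import Fraction
from itertools import product

# p( X = x, Y = y ) for two independent fair dice (illustrative assumption)
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# any event's probability is a sum of joint probabilities,
# e.g. the event "the two dice sum to 7"
p_sum_7 = sum(p for (x, y), p in joint.items() if x + y == 7)
```

With 6 of the 36 equally likely pairs summing to 7, `p_sum_7` comes out to 1/6.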
Example of multivariate distribution
• Marginal probability
– Probability distribution of a single variable in a joint distribution
– Example: two random variables X and Y:
p( X = x ) = ∑b = all values of Y p( X = x, Y = b )
• Conditional probability
– Probability distribution of one variable given that another variable takes a certain value
– Example: two random variables X and Y:
p( X = x | Y = y ) = p( X = x, Y = y ) / p( Y = y )
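Both formulas translate directly into code. In this sketch the joint-table values are made-up assumptions (loosely echoing the model-type / manufacturer example that follows), not numbers from the slides:

```python
from fractions import Fraction

# hypothetical joint distribution p( X = x, Y = y ); values are assumptions
joint = {
    ("sedan", "American"): Fraction(2, 10),
    ("sedan", "Asian"):    Fraction(1, 10),
    ("SUV",   "American"): Fraction(3, 10),
    ("SUV",   "Asian"):    Fraction(4, 10),
}

def marginal_x(x):
    # p( X = x ) = sum over b of p( X = x, Y = b )
    return sum(p for (xv, _), p in joint.items() if xv == x)

def conditional_x_given_y(x, y):
    # p( X = x | Y = y ) = p( X = x, Y = y ) / p( Y = y )
    p_y = sum(p for (_, yv), p in joint.items() if yv == y)
    return joint[(x, y)] / p_y
```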
Example of marginal probability
[ Figure: bar chart of a joint distribution over X = model type (sport, SUV, minivan, sedan) and Y = manufacturer (American, Asian, European); vertical axis = probability, 0 to 0.2 ]
Given:
• A discrete random variable X, with possible values x = x1, x2, … xn
• Probabilities p( X = xi ) that X takes on the various values xi
• A function yi = f( xi ) defined on X

The expected value of f is:
E( f ) = ∑i p( xi ) ⋅ f( xi ) (discrete case), or E( f ) = ∫a → b p( x ) ⋅ f( x ) dx (continuous case)
• Mean (μ)
f( xi ) = xi ⇒ μ = E( f ) = ∑i p( xi ) ⋅ xi
– Average value of X = xi, taking into account probability of the various xi
– Most common measure of "center" of a distribution
• Variance (σ2)
f( xi ) = ( xi − μ )2 ⇒ σ2 = ∑i p( xi ) ⋅ ( xi − μ )2
– Average value of squared deviation of X = xi from mean μ, taking into account probability of the various xi
– Most common measure of "spread" of a distribution
– σ is the standard deviation
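Both definitions translate directly into code. A sketch, again assuming a fair die as the distribution:

```python
# Mean and variance of a discrete random variable (fair die assumed)
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6

mu = sum(p * x for p, x in zip(ps, xs))               # mu = sum_i p(xi) * xi
var = sum(p * (x - mu) ** 2 for p, x in zip(ps, xs))  # sigma^2 = sum_i p(xi) * (xi - mu)^2
sigma = var ** 0.5                                    # standard deviation
```

For a fair die this gives μ = 3.5 and σ² = 35/12 ≈ 2.92.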
[ Figure: scatter plots illustrating high (positive) covariance and no covariance ]
• Compare to formula for covariance of actual samples:
cov( x, y ) = ( 1 / ( N − 1 ) ) ∑i = 1 → N ( xi − μx )( yi − μy )
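The sample-covariance formula transcribes directly; the data vectors below are illustrative assumptions:

```python
def cov(x, y):
    # sample covariance: 1/(N-1) * sum_i (xi - mean_x)(yi - mean_y)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # y = 2x: noiseless linear dependence, positive covariance
```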
[ Figures: linear dependence without noise; various nonlinear dependencies ]
p( not A ) = 1 - p( A )
[ Venn diagram: Ω split into A and not A ]
p( A, B ) = p( A | B ) ⋅ p( B )
(same expression given previously to define conditional probability)
[ Venn diagram: Ω partitioned into ( A, B ), ( A, not B ), ( not A, B ), ( not A, not B ) ]
p( A ) = p( A, B ) + p( A, not B )
(same expression given previously to define marginal probability)
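Both the sum rule and the product rule can be verified on a small joint table. The numbers below are assumed for illustration (they must sum to 1):

```python
from fractions import Fraction

# hypothetical joint distribution over events A and B (assumed values)
p = {
    ("A", "B"):         Fraction(1, 8),
    ("A", "not B"):     Fraction(3, 8),
    ("not A", "B"):     Fraction(1, 8),
    ("not A", "not B"): Fraction(3, 8),
}

p_A = p[("A", "B")] + p[("A", "not B")]    # sum rule: p(A) = p(A,B) + p(A, not B)
p_B = p[("A", "B")] + p[("not A", "B")]
p_A_given_B = p[("A", "B")] / p_B          # conditional probability

# product rule: p( A, B ) = p( A | B ) * p( B )
assert p[("A", "B")] == p_A_given_B * p_B
```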
p( A | B ) = p( A ) or p( A, B ) = p( A ) ⋅ p( B )
• Independence:
– Outcomes on multiple rolls of a die
– Outcomes on multiple flips of a coin
– Height of two unrelated individuals
– Probability of getting a king on successive draws from a deck, if card from each draw is replaced
• Dependence:
– Height of two related individuals
– Duration of successive eruptions of Old Faithful
– Probability of getting a king on successive draws from a deck, if card from each draw is not replaced
p( B | A ) = p( A | B ) ⋅ p( B ) / p( A )
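A numeric sketch of Bayes' rule. The prior and likelihoods are assumed values, and p( A ) is obtained from the sum rule over B:

```python
from fractions import Fraction

p_B = Fraction(1, 4)             # assumed prior
p_A_given_B = Fraction(1, 2)     # assumed likelihoods
p_A_given_not_B = Fraction(1, 6)

# sum rule: p( A ) = p( A | B ) p( B ) + p( A | not B ) p( not B )
p_A = p_A_given_B * p_B + p_A_given_not_B * (1 - p_B)

# Bayes' rule: p( B | A ) = p( A | B ) * p( B ) / p( A )
p_B_given_A = p_A_given_B * p_B / p_A
```

With these assumed numbers, p( A ) = 1/4 and the posterior p( B | A ) = 1/2, twice the prior.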
• rows represent samples (records, items, datapoints)
• columns represent attributes (features, variables)
• Natural to think of each sample as a vector of attribute values

[ Table: example dataset; each row is a sample, each column an attribute:
No, Married, 100K, No
No, Single, 70K, No
Yes, Married, 120K, No
No, Divorced, 95K, Yes
No, Married, 60K, No
Yes, Divorced, 220K, No
No, Single, 85K, Yes
No, Married, 75K, No
No, Single, 90K, Yes ]
– result is a vector
– result is a scalar
• Dot product alternative form
a = x ⋅ y = | x | | y | cos( θ )
[ Figure: vectors x and y with angle θ between them ]
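The two forms of the dot product agree, which is easy to check numerically. The example vectors below are assumptions chosen so that the angle between them is exactly π/4:

```python
import math

# dot product two ways; the example vectors are illustrative assumptions
x = [3.0, 0.0]
y = [3.0, 3.0]

dot = sum(a * b for a, b in zip(x, y))      # componentwise form

def norm(v):
    return math.sqrt(sum(c * c for c in v))

theta = math.pi / 4                         # angle between x and y
alt = norm(x) * norm(y) * math.cos(theta)   # |x| |y| cos(theta)
```

Both forms give 9 for these vectors.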
• Matrix-matrix multiplication
– vector-matrix multiplication is just a special case
TO THE BOARD!!
• Multiplication is associative
A ⋅ ( B ⋅ C ) = ( A ⋅ B ) ⋅ C
• Multiplication is not commutative
A ⋅ B ≠ B ⋅ A (generally)
• Transposition rule:
( A ⋅ B )T = BT ⋅ AT
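All three rules can be spot-checked on random matrices; the shapes and seed below are arbitrary choices:

```python
import numpy as np

# spot-check the three multiplication rules on random 3x3 matrices
rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

assert np.allclose(A @ (B @ C), (A @ B) @ C)   # associative
assert not np.allclose(A @ B, B @ A)           # not commutative (generally)
assert np.allclose((A @ B).T, B.T @ A.T)       # transposition rule
```

Note the non-commutativity check only demonstrates "generally": specific pairs (e.g. A and the identity) do commute.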
• Maximum likelihood
• Expectation maximization
• Gradient descent