Jia-Bin Huang
ECE-5424G / CS-5824 Virginia Tech Spring 2019
Administrative
• HW 1 out today. Please start early!
• Office hours
• Chen: Wed 4pm-5pm
• Shih-Yang: Fri 3pm-4pm
• Location: Whittemore 266
Linear Regression
• Model representation
• Cost function
• Gradient descent for linear regression
Repeat until convergence {
  θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i)   (update all j simultaneously)
}
• Features and polynomial regression
Can combine features; can use different functions to generate features (e.g.,
polynomial)
• Normal equation
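The gradient-descent update in the recap above can be sketched in a few lines of NumPy; the toy data set and the learning rate are illustrative assumptions, not values from the lecture:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent for linear regression.
    X: (m, n) design matrix with a leading column of ones; y: (m,) targets."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        # simultaneous update: theta_j -= alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij
        grad = X.T @ (X @ theta - y) / m
        theta -= alpha * grad
    return theta

# toy data generated from y = 1 + 2x (no noise), so theta should recover [1, 2]
X = np.c_[np.ones(5), np.arange(5.0)]
y = 1 + 2 * np.arange(5.0)
theta = gradient_descent(X, y)
print(theta)  # ~ [1.0, 2.0]
```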
x₀ | Size in feet² (x₁) | Number of bedrooms (x₂) | Number of floors (x₃) | Age of home in years (x₄) | Price ($) in 1000's (y)
1  | 2104 | 5 | 1 | 45 | 460
1  | 1416 | 3 | 2 | 40 | 232
1  | 1534 | 3 | 2 | 30 | 315
1  |  852 | 2 | 1 | 36 | 178
…  | …

𝑦 = [460, 232, 315, 178]⊤

𝜃 = (𝑋⊤𝑋)⁻¹ 𝑋⊤𝑦
Slide credit: Andrew Ng
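As a sketch, the normal equation can be applied to the four housing examples above with NumPy. One wrinkle: with 4 examples and 5 parameters (including the intercept), 𝑋⊤𝑋 is singular, so the pseudoinverse is used instead of a plain matrix inverse; it returns the minimum-norm least-squares solution:

```python
import numpy as np

# design matrix from the table: intercept, size, bedrooms, floors, age
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

# normal equation theta = (X^T X)^{-1} X^T y; here X^T X is singular
# (more parameters than examples), so use the pseudoinverse
theta = np.linalg.pinv(X) @ y

# with fewer examples than parameters the fit is exact
print(X @ theta)  # ~ [460, 232, 315, 178]
```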
Least square solution
• Justification/interpretation 1
  • Geometric interpretation: 𝑋𝜃 lies in the column space of 𝑋; the least-squares choice of 𝜃 makes the residual 𝑋𝜃 − 𝑦 orthogonal to that column space, so 𝑋𝜃 is the projection of 𝑦 onto it.
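The geometric interpretation can be checked numerically: at the least-squares solution, the residual is orthogonal to every column of 𝑋. A minimal sketch with synthetic data (the random matrix is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # arbitrary full-rank design matrix
y = rng.normal(size=20)

# least-squares solution via the normal equation
theta = np.linalg.solve(X.T @ X, X.T @ y)
residual = y - X @ theta

# X @ theta is the projection of y onto the column space of X,
# so the residual has zero inner product with each column
print(X.T @ residual)  # all entries ~ 0
```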
Today’s plan
• Probability basics
• Naïve Bayes
Random variables
• Outcome space S
• Space of possible outcomes
• Random variables
• Functions that map outcomes to real numbers
• Event E
• Subset of S
Visualizing probability
• Sample space with total area 1; an event A divides it into a region where A is true and a region where A is false
• 𝑃(𝐴) + 𝑃(¬𝐴) = 1
Visualizing probability
• The region where A is true splits into A ∧ B and A ∧ ¬B
• 𝑃(𝐴) = 𝑃(𝐴, 𝐵) + 𝑃(𝐴, ¬𝐵)
Visualizing conditional probability
• 𝑃(𝐴 | 𝐵) = 𝑃(𝐴, 𝐵) / 𝑃(𝐵)
• Example:
  𝑃(𝐴 | 𝐵) = (0.8 × 0.05) / (0.8 × 0.05 + 0.2 × 0.95) ≈ 0.17
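The arithmetic above can be reproduced in a few lines. Reading 0.05 as the prior 𝑃(𝐴) and 0.8, 0.2 as 𝑃(𝐵 | 𝐴), 𝑃(𝐵 | ¬𝐴) is an assumption inferred from the numbers shown:

```python
p_a = 0.05              # assumed prior P(A)
p_b_given_a = 0.80      # assumed P(B | A)
p_b_given_not_a = 0.20  # assumed P(B | not A)

# Bayes rule: P(A|B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|~A)P(~A))
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 2))  # 0.17
```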
Joint distribution

A B C | Prob
0 0 0 | 0.30
0 0 1 | 0.05
0 1 0 | 0.10
0 1 1 | 0.05
1 0 0 | 0.05
1 0 1 | 0.10
1 1 0 | 0.25
1 1 1 | 0.10

• Making a joint distribution of M variables:
  1. Make a truth table listing all combinations of values (2^M rows for Boolean variables)
  2. For each combination, record its probability (the entries must sum to 1)
• Be smart about how to represent joint distributions
  • Bayes network, graphical models (more on this later)
Slide credit: Tom Mitchell
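The truth-table representation can be queried directly: any probability is a sum over matching rows, and conditionals are ratios of such sums. A minimal sketch using the eight entries above:

```python
# joint distribution over (A, B, C), taken from the truth table
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05,
    (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10,
    (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

# a marginal is a sum of matching rows, e.g. P(A=1)
p_a = sum(p for (a, b, c), p in joint.items() if a == 1)

# a conditional is a ratio of sums, e.g. P(B=1 | A=1) = P(A=1,B=1)/P(A=1)
p_ab = sum(p for (a, b, c), p in joint.items() if a == 1 and b == 1)
p_b_given_a = p_ab / p_a

print(p_a, p_b_given_a)  # 0.5 and 0.7
```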
Today’s plan
• Probability basics
• Naive Bayes
Estimating the probability
• Flip the coin repeatedly, observing 𝑋 ∈ {1, 0}
• It turns heads 𝛼₁ times
• It turns tails 𝛼₀ times
• Your estimate for 𝑃(𝑋 = 1) is? The natural estimate is 𝛼₁ / (𝛼₁ + 𝛼₀)
• Beta prior over 𝜃 = 𝑃(𝑋 = 1):
  𝑃(𝜃) = Beta(𝛽₁, 𝛽₀) = 𝜃^(𝛽₁−1) (1 − 𝜃)^(𝛽₀−1) / 𝐵(𝛽₁, 𝛽₀)
Slide credit: Tom Mitchell
Maximum likelihood estimate
• Data set 𝐷 of iid flips, 𝑋 ∈ {1, 0}: choose the 𝜃 that maximizes 𝑃(𝐷 | 𝜃)
• Conjugate prior:
  A prior is the conjugate prior for a likelihood function if the prior and the posterior have the same form.
• Example (coin flip problem)
  • Prior: Beta; Likelihood: Binomial
  • Posterior: Beta
Slide credit: Tom Mitchell
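The conjugacy can be made concrete: with a Beta(𝛽₁, 𝛽₀) prior and Binomial likelihood, the posterior is Beta(𝛽₁ + 𝛼₁, 𝛽₀ + 𝛼₀), i.e. the same family with the observed counts added in. A sketch with hypothetical counts and prior parameters:

```python
# hypothetical observed counts: alpha1 heads, alpha0 tails
alpha1, alpha0 = 7, 3

# MLE: theta_hat = alpha1 / (alpha1 + alpha0)
theta_mle = alpha1 / (alpha1 + alpha0)

# hypothetical Beta prior; posterior is Beta(beta1 + alpha1, beta0 + alpha0),
# whose mode gives the MAP estimate -- the prior acts as pseudo-counts
beta1, beta0 = 3, 3
theta_map = (alpha1 + beta1 - 1) / (alpha1 + alpha0 + beta1 + beta0 - 2)

print(theta_mle, theta_map)  # 0.7 vs 9/14, pulled toward the prior mean 0.5
```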
How many parameters?
• Suppose we model 𝑃(𝑌 | 𝑋₁, …, 𝑋ₙ), where the 𝑋ᵢ and 𝑌 are Boolean random variables
• To estimate 𝑃(𝑌 | 𝑋₁, …, 𝑋ₙ) directly requires a parameter for each of the 2ⁿ settings of the inputs
• When n is large? With n = 30, for example, that is already over 10⁹ parameters
Naïve Bayes
• Assumption: features are conditionally independent given the label
• Example: 𝑃(𝑋₁, 𝑋₂ | 𝑌) = 𝑃(𝑋₁ | 𝑌) 𝑃(𝑋₂ | 𝑌)
• General form: 𝑃(𝑋₁, …, 𝑋ₙ | 𝑌) = ∏ᵢ 𝑃(𝑋ᵢ | 𝑌)
• How many parameters to describe 𝑃(𝑋₁, …, 𝑋ₙ | 𝑌)?
  • Without conditional independence assumption: 2(2ⁿ − 1)
  • With conditional independence assumption: 2n
Slide credit: Tom Mitchell
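The two parameter counts can be computed directly; n = 30 is an illustrative value:

```python
n = 30  # number of Boolean features (example value)

# full class-conditional joint P(X1..Xn | Y): 2^n - 1 free parameters
# per class, with 2 classes for Boolean Y
full = 2 * (2**n - 1)

# naive Bayes: one parameter P(Xi = 1 | Y = y) per feature per class
naive = 2 * n

print(full, naive)  # 2147483646 vs 60
```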
Naïve Bayes classifier
• Bayes rule:
  𝑃(𝑌 = y_k | 𝑋₁, …, 𝑋ₙ) = 𝑃(𝑌 = y_k) 𝑃(𝑋₁, …, 𝑋ₙ | 𝑌 = y_k) / Σⱼ 𝑃(𝑌 = yⱼ) 𝑃(𝑋₁, …, 𝑋ₙ | 𝑌 = yⱼ)
• Classify: y* = argmax_{y_k} 𝑃(𝑌 = y_k) ∏ᵢ 𝑃(𝑋ᵢ | 𝑌 = y_k)
Estimating parameters: discrete 𝑋ᵢ, 𝑌
• Maximum likelihood estimates (MLE):
  π_k = 𝑃(𝑌 = y_k) estimated as #D{𝑌 = y_k} / |D|
  θ_ijk = 𝑃(𝑋ᵢ = x_ij | 𝑌 = y_k) estimated as #D{𝑋ᵢ = x_ij, 𝑌 = y_k} / #D{𝑌 = y_k}
• Additional assumption on 𝜎_ik (for continuous 𝑋ᵢ, Gaussian Naïve Bayes): 𝜎_ik may be assumed
  • independent of 𝑋ᵢ (giving 𝜎_k)
  • independent of 𝑌 (giving 𝜎_i)
  • independent of both 𝑋ᵢ and 𝑌 (giving 𝜎)
• Classify: y* = argmax_{y_k} 𝑃(𝑌 = y_k) ∏ᵢ 𝑃(𝑋ᵢ | 𝑌 = y_k)
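The MLE counting estimates and the argmax classification rule can be combined into a minimal Naive Bayes classifier; the toy Boolean data set below is invented for illustration:

```python
from collections import Counter, defaultdict

# toy data: each example is ((x1, x2), y) with Boolean features and label
data = [((1, 1), 1), ((1, 0), 1), ((0, 1), 1),
        ((0, 0), 0), ((0, 1), 0), ((1, 0), 0)]

# MLE by counting: pi_k = #{Y=k}/|D|, theta[i][k] = #{Xi=1, Y=k}/#{Y=k}
y_counts = Counter(y for _, y in data)
pi = {k: c / len(data) for k, c in y_counts.items()}
theta = defaultdict(dict)
for i in range(2):
    for k in y_counts:
        theta[i][k] = sum(1 for x, y in data if y == k and x[i] == 1) / y_counts[k]

def classify(x):
    # argmax_y P(Y=y) * prod_i P(Xi=xi | Y=y)
    def score(k):
        s = pi[k]
        for i, xi in enumerate(x):
            s *= theta[i][k] if xi == 1 else 1 - theta[i][k]
        return s
    return max(pi, key=score)

print(classify((1, 1)), classify((0, 0)))  # 1 0
```

In practice the per-feature probabilities are usually smoothed (e.g. Laplace/MAP estimates) so that an unseen feature value does not zero out an entire class score.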
Next class
• Logistic regression