CS229 Machine Learning: Supervised Learning
Lecture Notes
Andrew Ng (CS229, Autumn 2018)
May 30, 2025
Contents
1 Recap of Linear Regression
  1.1 Overview
  1.2 Key Components
  1.3 Limitations
  1.4 Example
  1.5 Insight
2 Locally Weighted Regression (LWR)
  2.1 Purpose
  2.2 Mechanism
  2.3 Bandwidth Parameter (𝜏)
  2.4 Parametric vs. Non-Parametric
  2.5 Challenges
  2.6 Applications
  2.7 Extensions
  2.8 Practice
  2.9 Example
  2.10 Insight
3 Probabilistic Interpretation of Linear Regression
  3.1 Objective
  3.2 Model Assumptions
  3.3 Likelihood
  3.4 Key Insights
  3.5 Conclusion
  3.6 Insight
4 Logistic Regression
  4.1 Overview
  4.2 Why Not Linear Regression?
  4.3 Hypothesis
  4.4 Probabilistic Model
  4.5 Likelihood
  4.6 Optimization
  4.7 Why Sigmoid?
  4.8 Practical Notes
  4.9 Insight
5 Newton’s Method
  5.1 Purpose
  5.2 Mechanism
  5.3 Quadratic Convergence
  5.4 Trade-offs
  5.5 Guidelines
  5.6 Context
  5.7 Insight
6 Future Topics
7 Practical Insights
1 Recap of Linear Regression
1.1 Overview
Linear regression predicts continuous outputs using a linear combination of features and is foundational to supervised learning. Previously covered: problem setup, gradient descent, and the normal equations.
1.2 Key Components
Notation:
• x^{(i)}: feature vector for the i-th example ((n+1)-dimensional, with x_0^{(i)} = 1).
• y^{(i)}: continuous output (a real number).
• m: number of training examples.
• n: number of features.
Hypothesis:
h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n
Cost Function:
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
Measures average squared error; the factor 1/2 simplifies the derivative.
Optimization:
Gradient Descent:
\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
Iterative, suitable for large datasets.
Normal Equations:
\theta = (X^T X)^{-1} X^T y
Closed-form, but computationally expensive (O(n^3)) for large n.
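As a concrete illustration, here is a minimal sketch (not from the lecture) that fits the same model with batch gradient descent and with the normal equations on synthetic data; the dataset, learning rate, and iteration count are arbitrary choices for the example.

```python
# A minimal sketch comparing batch gradient descent with the normal equations
# on synthetic data; alpha and the iteration count are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 2
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # prepend x_0 = 1
true_theta = np.array([4.0, 3.0, -2.0])
y = X @ true_theta + rng.normal(scale=0.5, size=m)

# Batch gradient descent on J(theta) = (1/2m) * sum_i (h(x^{(i)}) - y^{(i)})^2
theta = np.zeros(n + 1)
alpha = 0.1
for _ in range(2000):
    grad = (X.T @ (X @ theta - y)) / m   # gradient of J(theta)
    theta -= alpha * grad

# Normal equations: theta = (X^T X)^{-1} X^T y (lstsq is the numerically stable route)
theta_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta, theta_closed)  # both should be close to true_theta
```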
1.3 Limitations
• Assumes linear relationships; struggles with non-linear data.
• Requires feature engineering (e.g., adding an x^2 term) for non-linearity.
1.4 Example
Housing price prediction: Features (size, bedrooms) fit a line, but non-linear trends (e.g., diminishing returns for large houses) need advanced methods.
1.5 Insight
Linear regression is a baseline for regression tasks, widely used in economics and forecasting.
Visualization Placeholder: A scatter plot of house sizes vs. prices with a fitted line would
illustrate the linear fit and highlight non-linear deviations.
2 Locally Weighted Regression (LWR)
2.1 Purpose
Addresses non-linearity by fitting linear models locally around each prediction point, avoiding manual feature engineering (e.g., adding x^2 or \sqrt{x} terms).
2.2 Mechanism
For a prediction at point 𝑥:
• Assign weights to training examples:
w_i = \exp\left( -\frac{(x_i - x)^2}{2\tau^2} \right)
• w_i ≈ 1 if x_i is close to x.
• w_i ≈ 0 if x_i is far from x.
• Minimize the weighted cost:
J(\theta) = \sum_{i=1}^{m} w_i \left( h_\theta(x_i) - y_i \right)^2
• Solve using weighted least squares:
\theta = (X^T W X)^{-1} X^T W y
W: diagonal matrix with W_{ii} = w_i.
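A minimal sketch of a single LWR prediction under the Gaussian weighting above; the toy dataset, the choice τ = 0.5, and the helper name lwr_predict are illustrative, not from the lecture.

```python
# A minimal LWR sketch: Gaussian weights around the query point, then one
# weighted least-squares fit per prediction. Data and tau are illustrative.
import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    """Fit a weighted least-squares line around x_query and return its prediction.

    X is m x (n+1) with a leading column of ones; x_query is one (n+1)-vector.
    """
    # Gaussian weights: w_i = exp(-||x_i - x_query||^2 / (2 tau^2))
    diffs = X - x_query
    w = np.exp(-np.sum(diffs**2, axis=1) / (2 * tau**2))
    W = np.diag(w)
    # theta = (X^T W X)^{-1} X^T W y, solved without forming an explicit inverse
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ theta

# Toy non-linear data: y = sin(x) + noise
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 6, size=200))
X = np.column_stack([np.ones_like(x), x])
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)

print(lwr_predict(np.array([1.0, 3.0]), X, y, tau=0.5))  # prediction near sin(3.0)
```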
2.3 Bandwidth Parameter (𝜏)
Controls the “neighborhood” size:
• Small 𝜏: Narrow focus, risks overfitting (jagged fit).
• Large 𝜏: Broad focus, risks underfitting (smooth, linear-like fit).
Tuning: Use cross-validation to select optimal 𝜏.
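To illustrate the tuning step, the following sketch continues the lwr_predict example from Section 2.2 (it reuses that helper and its X, y arrays) and scores candidate τ values by leave-one-out cross-validation; the candidate grid is arbitrary.

```python
# Continues the Section 2.2 sketch: assumes lwr_predict, X, and y are defined there.
import numpy as np

def loocv_error(X, y, tau, predict):
    """Mean squared leave-one-out error of an LWR predictor for a given tau."""
    errors = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i                 # hold out example i
        pred = predict(X[i], X[mask], y[mask], tau=tau)
        errors.append((pred - y[i]) ** 2)
    return np.mean(errors)

# Candidate bandwidths (illustrative); pick the one with the lowest held-out error.
taus = [0.1, 0.3, 1.0, 3.0]
best_tau = min(taus, key=lambda t: loocv_error(X, y, t, lwr_predict))
print(best_tau)
```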
2.4 Parametric vs. Non-Parametric
• Parametric: Fixed parameters (e.g., linear regression’s 𝜃).
• Non-Parametric: Parameters scale with data (LWR stores all data, memory grows with
𝑚).
2.5 Challenges
• Computationally intensive: Requires solving a new linear system per prediction.
• Poor extrapolation outside the training data range.
2.6 Applications
• Low-dimensional data (𝑛 ≤ 3) with sufficient examples.
• Time-series forecasting, robotics (e.g., path planning).
2.7 Extensions
• Alternative kernels: Triangular, Epanechnikov for different weighting schemes.
• Scalability: KD-trees to efficiently find nearby points.
2.8 Practice
CS229 Problem Set 1: Implement LWR, experiment with 𝜏.
2.9 Example
Non-linear housing prices: LWR captures curves (e.g., price plateaus for large houses) without
explicit non-linear features.
2.10 Insight
LWR is intuitive but less common in high-dimensional settings due to computational cost.
Visualization Placeholder: A plot showing a non-linear dataset with LWR’s local linear fits at
different points would clarify the method.
3 Probabilistic Interpretation of Linear Regression
3.1 Objective
Justify the squared error cost using a probabilistic framework, connecting to maximum likelihood estimation (MLE).
3.2 Model Assumptions
True output:
y_i = \theta^T x_i + \epsilon_i
\epsilon_i: error (unmodeled effects + noise).
Error distribution:
\epsilon_i \sim \mathcal{N}(0, \sigma^2)
Gaussian, mean 0, variance \sigma^2, independent and identically distributed (IID).
Density:
p(\epsilon_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{\epsilon_i^2}{2\sigma^2} \right)
Implies:
y_i \mid x_i; \theta \sim \mathcal{N}(\theta^T x_i, \sigma^2)
Density:
p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right)
3.3 Likelihood
Likelihood:
L(\theta) = \prod_{i=1}^{m} p(y_i \mid x_i; \theta)
Log-likelihood:
\ell(\theta) = -\frac{m}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{m} (y_i - \theta^T x_i)^2
Maximizing \ell(\theta) is equivalent to minimizing
\sum_{i=1}^{m} (y_i - \theta^T x_i)^2
which matches linear regression’s least squares objective.
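As a quick numerical check of this equivalence (not from the notes), the sketch below maximizes the Gaussian log-likelihood in θ directly and compares the result with the least-squares solution; the synthetic data, the assumed σ, and the use of scipy.optimize.minimize are illustrative choices.

```python
# Numerical check: the MLE under Gaussian IID errors matches the least-squares fit.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
m = 200
X = np.column_stack([np.ones(m), rng.normal(size=m)])
theta_true = np.array([1.0, 2.0])
y = X @ theta_true + rng.normal(scale=0.3, size=m)

sigma = 0.3  # treated as known; it only rescales/offsets l(theta), not its argmax

def neg_log_likelihood(theta):
    residuals = y - X @ theta
    return 0.5 * m * np.log(2 * np.pi * sigma**2) + np.sum(residuals**2) / (2 * sigma**2)

theta_mle = minimize(neg_log_likelihood, x0=np.zeros(2)).x
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_mle, theta_ls)  # the two estimates agree up to optimizer tolerance
```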
3.4 Key Insights
• Gaussian Justification: Central Limit Theorem suggests errors from many small sources
are approximately Gaussian.
• IID Assumption: Simplifies math but may not hold (e.g., correlated housing prices).
• Likelihood vs. Probability: Likelihood treats data as fixed, 𝜃 as variable; probability
fixes 𝜃.
• Notation: Semicolon (;) denotes 𝜃 as a parameter (frequentist convention).
3.5 Conclusion
• Least squares is the MLE under Gaussian, IID errors.
• Non-Gaussian errors (e.g., Poisson) require generalized linear models (GLMs).
3.6 Insight
This probabilistic view unifies regression with other models, setting the stage for logistic regression and GLMs.
Visualization Placeholder: A plot of a Gaussian error distribution around a linear fit would
illustrate the error model.
4 Logistic Regression
4.1 Overview
Used for binary classification (𝑦 ∈ {0, 1}), e.g., tumor malignancy (1 = malignant, 0 = benign).
4.2 Why Not Linear Regression?
• Outputs unbounded values, not probabilities in [0, 1].
• Sensitive to outliers, distorting decision boundaries.
• Non-binary outputs are unnatural for classification.
4.3 Hypothesis
Sigmoid function:
g(z) = \frac{1}{1 + e^{-z}}
Maps z \in \mathbb{R} to (0, 1).
Hypothesis:
h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}
Represents p(y = 1 \mid x; \theta).
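A minimal implementation sketch of the sigmoid; the piecewise form is a standard practical detail (not covered in the lecture) that avoids overflow in exp for large negative inputs.

```python
# A minimal sigmoid sketch; the piecewise form keeps exp() arguments non-positive.
import numpy as np

def sigmoid(z):
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    expz = np.exp(z[~pos])            # safe: z < 0 here, so exp(z) <= 1
    out[~pos] = expz / (1.0 + expz)
    return out

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[4.5e-05, 0.5, 0.99995]
```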
4.4 Probabilistic Model
Assumptions:
p(y = 1 \mid x; \theta) = h_\theta(x)
p(y = 0 \mid x; \theta) = 1 - h_\theta(x)
Combined:
p(y \mid x; \theta) = \left( h_\theta(x) \right)^{y} \left( 1 - h_\theta(x) \right)^{1-y}
4.5 Likelihood
Likelihood:
L(\theta) = \prod_{i=1}^{m} \left( h_\theta(x^{(i)}) \right)^{y^{(i)}} \left( 1 - h_\theta(x^{(i)}) \right)^{1 - y^{(i)}}
Log-likelihood:
\ell(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left( 1 - h_\theta(x^{(i)}) \right) \right]
4.6 Optimization
Batch Gradient Ascent:
Maximize \ell(\theta):
\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}
Similar to linear regression’s gradient descent, but uses the sigmoid-based h_\theta.
Properties:
• Concave log-likelihood ensures a global maximum.
• No closed-form solution, requiring iterative methods.
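A minimal batch gradient ascent sketch for this update rule on synthetic data; the learning rate, iteration count, and data-generating parameters are illustrative choices, not from the lecture.

```python
# Batch gradient ascent on the logistic-regression log-likelihood (synthetic data).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
m = 200
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])   # x_0 = 1 intercept term
theta_true = np.array([-0.5, 2.0, -1.5])
y = (rng.uniform(size=m) < sigmoid(X @ theta_true)).astype(float)

theta = np.zeros(X.shape[1])
alpha = 0.01
for _ in range(5000):
    h = sigmoid(X @ theta)
    theta += alpha * (X.T @ (y - h))    # ascent step on the concave log-likelihood

print(theta)  # roughly recovers theta_true, up to sampling noise
```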
4.7 Why Sigmoid?
• Ensures probabilistic outputs in [0, 1].
• Derived from GLMs, guaranteeing concavity.
4.8 Practical Notes
• Decision Boundary: 𝜃 𝑇 𝑥 = 0, where ℎ𝜃 (𝑥) = 0.5.
• Applications: Medical diagnosis, spam detection, credit scoring.
• Extensions: L1/L2 regularization for overfitting, softmax for multiclass classification.
4.9 Insight
Logistic regression is robust and interpretable, serving as a baseline for classification tasks.
Visualization Placeholder: A plot of the sigmoid function and a 2D decision boundary separating two classes would clarify the model.
5 Newton’s Method
5.1 Purpose
Optimizes 𝜃 faster than gradient ascent using second-order information (Hessian).
5.2 Mechanism
Goal: Find 𝜃 where ∇𝑙 (𝜃) = 0.
Scalar Case:
\theta_{t+1} = \theta_t - \frac{\ell'(\theta_t)}{\ell''(\theta_t)}
Vector Case:
\theta_{t+1} = \theta_t - H^{-1} \nabla_\theta \ell(\theta_t)
H: Hessian matrix ((n+1) \times (n+1)) of second derivatives of \ell(\theta).
Process: Uses tangent approximation to find the zero of the derivative.
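A minimal sketch of the vector-case update applied to the logistic-regression log-likelihood; the synthetic data and the ten-iteration budget are illustrative, and the Hessian used is the standard −Xᵀ diag(h(1−h)) X for that model.

```python
# Newton's method for logistic regression: theta := theta - H^{-1} grad l(theta).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, num_iters=10):
    """Maximize the logistic log-likelihood with Newton updates."""
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)                  # gradient of the log-likelihood
        H = -(X.T * (h * (1 - h))) @ X        # Hessian: -X^T diag(h(1-h)) X
        theta -= np.linalg.solve(H, grad)     # solve H d = grad instead of inverting H
    return theta

# Example usage on synthetic data (illustrative):
rng = np.random.default_rng(4)
m = 200
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
y = (rng.uniform(size=m) < sigmoid(X @ np.array([-0.5, 2.0, -1.5]))).astype(float)
print(newton_logistic(X, y))   # typically converges in about ten iterations
```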
5.3 Quadratic Convergence
• Error reduces quadratically (e.g., from 0.01 to 0.0001).
• Requires fewer iterations (e.g., 10 vs. 100–1000 for gradient ascent).
5.4 Trade-offs
• Advantages: Rapid convergence for low-dimensional 𝜃 (𝑛 ≤ 50).
• Disadvantages: Hessian inversion is costly (O(n^3)) for high-dimensional 𝜃 (e.g., n = 10,000).
5.5 Guidelines
• Use Newton’s method for 𝑛 ≤ 50.
• Use gradient ascent or L-BFGS for large 𝑛.
5.6 Context
• Also known as Newton-Raphson.
• Applications: Small-scale optimization in finance, control systems.
5.7 Insight
Newton’s method is powerful for small problems but impractical for modern ML’s high-dimensional
data.
Visualization Placeholder: A plot comparing convergence paths of gradient ascent vs. New-
ton’s method would highlight quadratic convergence.
6 Future Topics
• Problem Set 1: Implement LWR, experiment with 𝜏.
• Generalized Linear Models (GLMs): Unify linear and logistic regression under a common framework.
• Feature Selection: Automate feature choice to improve model performance.
• Overfitting/Underfitting: Address via 𝜏 tuning, regularization techniques.
Insight: These topics build toward robust, scalable ML systems.
7 Practical Insights
LWR:
• Ideal for non-linear, low-dimensional data.
• Used in time-series, robotics (e.g., trajectory prediction).
• Requires careful 𝜏 tuning to balance fit.
Logistic Regression:
• Robust for binary classification.
• Applications: Medical diagnostics, spam filters, credit risk.
• Use as a baseline before complex models like neural networks.
Newton’s Method:
• Efficient for small 𝑛, but modern ML prefers stochastic gradient descent for scalability.
Learning Tips:
• Visualize LWR fits and logistic decision boundaries to build intuition.
• Study GLMs for a unified perspective on regression and classification.
• Use Python libraries (e.g., scikit-learn) for implementation.
Insight: These methods are foundational, bridging theory and practice in ML engineering.
Visualization Placeholder: A flowchart of supervised learning algorithms (linear regression
→ LWR → logistic regression → GLMs) would clarify their relationships.