Ch2 NonParametricRegression Part2

The document discusses nonparametric regression techniques, focusing on local regression methods such as nearest neighbor and kernel methods. It explains the use of local averages, kernel regression, and local linear regression to create flexible and continuous fits, while addressing issues like boundary problems and bias-variance trade-offs. Additionally, it covers the application of these techniques to multiple predictor variables and the use of different kernel functions for improved estimation.

Nonparametric Regression

• Fit more flexible regression functions f(X)

• Local regression at each query point x0
  → Nearest neighbor methods
  → Kernel methods

1
Local average
• Only one predictor variable

• K-nearest neighbor average at x0: average of the yi
  for the K points xi closest to x0
  → Simple and flexible estimator
  → Discontinuous (bumpy) fit
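
A minimal NumPy sketch of the k-nearest-neighbor average at a query point (illustration only; the function name knn_average and the toy data are not from the slides):

```python
import numpy as np

def knn_average(x, y, x0, k):
    """k-NN average: mean of the y-values of the k training points
    whose x-values are closest to the query point x0."""
    idx = np.argsort(np.abs(x - x0))[:k]   # indices of the k closest points
    return y[idx].mean()

# Toy data for illustration
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(4 * x) + rng.normal(0.0, 0.3, 100)

print(knn_average(x, y, x0=0.5, k=20))   # 20-nearest-neighbor average at x0 = 0.5
```

Because the set of k nearest neighbors changes abruptly as x0 moves, the estimated curve is piecewise constant and hence bumpy.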

2
Example: 20-nearest neighbor
average

[Figure: 20-nearest-neighbor average fit; y plotted against x over [0, 1]]

3
Kernel regression
• Resolve discontinuity
• Use locally weighted fits
• Weight function Kλ (x0 , x)
• Weight decreases smoothly with distance from
target point: smooth fit
• \hat{f}_\lambda(x_0) = \dfrac{\sum_{i=1}^{n} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{n} K_\lambda(x_0, x_i)}
→ Nadaraya-Watson kernel-weighted average
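
A sketch of the Nadaraya-Watson estimator with a Gaussian weight function (the names gaussian_kernel and nadaraya_watson are chosen here for illustration):

```python
import numpy as np

def gaussian_kernel(x0, x, lam):
    """K_lambda(x0, x) with Gaussian D(t) = phi(t), where t = |x - x0| / lambda."""
    t = np.abs(x - x0) / lam
    return np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)

def nadaraya_watson(x, y, x0, lam):
    """Kernel-weighted average: sum_i K(x0, x_i) y_i / sum_i K(x0, x_i)."""
    w = gaussian_kernel(x0, x, lam)
    return np.sum(w * y) / np.sum(w)
```

Evaluating nadaraya_watson over a grid of query points x0 traces out a smooth curve, since the weights vary continuously with x0.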

4
Weight function
K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{\lambda}\right)

• Epanechnikov: D(t) = \frac{3}{4}\,(1 - t^2)\, I(|t| \le 1)

• Tri-cube: D(t) = (1 - |t|^3)^3\, I(|t| \le 1)

• Gaussian: D(t) = \phi(t)
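
The three weight functions written directly in NumPy (a sketch; the function names are not from the slides):

```python
import numpy as np

def epanechnikov(t):
    return 0.75 * (1.0 - t**2) * (np.abs(t) <= 1)

def tricube(t):
    return (1.0 - np.abs(t)**3)**3 * (np.abs(t) <= 1)

def gaussian(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)
```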

5
Kernel functions

[Figure: weight functions D(t) for the Epanechnikov, tri-cube, and Gaussian kernels, plotted for t in [−3, 3]]

6
Weight function
• Epanechnikov and tri-cube: compact support

• Gaussian: noncompact support

• Tri-cube is flatter on top than Epanechnikov


→ More efficient results but more bias

7
Kernel-weighted average
• Continuous fit

• Uses fixed-width neighborhoods

• λ in the kernel function controls the window size


→ Bias-variance trade-off
→ λ ↗ ⇒ bias ↗, variance ↘

8
Example: Gaussian kernel, λ = 0.2

[Figure: Nadaraya-Watson fit with a Gaussian kernel, λ = 0.2; y plotted against x over [0, 1]]

9
NN and kernels
• Continuous fit and adaptive neighborhoods
→ Kernels with variable window width
E.g.  K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{|x_{(k)} - x_0|}\right)

→ λ(x_0) = |x_{(k)} - x_0|: distance from x_0 to its k-th nearest neighbor
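
A sketch of this adaptive, nearest-neighbor bandwidth (illustrative names, not from the slides):

```python
import numpy as np

def knn_bandwidth(x, x0, k):
    """lambda(x0) = |x_(k) - x0|: distance from x0 to its k-th closest
    training point, giving a window that widens where the data are sparse."""
    return np.sort(np.abs(x - x0))[k - 1]

# Example use with a compact-support weight function such as tri-cube:
#   t = np.abs(x - x0) / knn_bandwidth(x, x0, k)
#   w = tricube(t)
```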

10
Boundary problems
Local averages can have problems at the boundary

• Asymmetric neighborhoods

• NN: wider neighborhood ⇒ bias↗

• Kernel: fewer points ⇒ variance ↗

→ Use higher-order local regression

11
Local linear regression
• Use local linear fits (lines)
• Reduces bias substantially
• Solve at each target x0

\min_{\beta_0, \beta_1} \; \sum_{i=1}^{n} K_\lambda(x_0, x_i)\,\bigl(y_i - \beta_0 - \beta_1 x_i\bigr)^2

→ \hat{f}(x_0) = \hat{\beta}_0(x_0) + \hat{\beta}_1(x_0)\, x_0


→ Different linear model at each target x0
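
A sketch of the local linear fit as a weighted least-squares problem at a single target point (the name local_linear is chosen here; kernel is any weight function D, e.g. the tricube or gaussian sketches above):

```python
import numpy as np

def local_linear(x, y, x0, lam, kernel):
    """Solve min over (b0, b1) of sum_i K_lambda(x0, x_i) (y_i - b0 - b1 x_i)^2
    and return f_hat(x0) = b0_hat + b1_hat * x0."""
    w = kernel(np.abs(x - x0) / lam)                    # K_lambda(x0, x_i)
    X = np.column_stack([np.ones_like(x), x])           # design matrix, rows (1, x_i)
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)    # weighted LS coefficients
    return beta[0] + beta[1] * x0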

12
Local linear regression
• W(x_0) = \mathrm{diag}\bigl(K_\lambda(x_0, x_i)\bigr)

→ \hat{f}(x_0) = \tilde{x}_0^t \bigl(X^t W(x_0) X\bigr)^{-1} X^t W(x_0)\, y = l(x_0)^t\, y

• S_\lambda^{kernel} = \bigl(l(x_1), \ldots, l(x_n)\bigr)^t

→ \hat{\mathbf{f}} = S_\lambda^{kernel}\, y

→ A linear operator!

13
Effective degrees of freedom
• \hat{\mathbf{f}} = S_\lambda^{kernel}\, y

→ Effective degrees of freedom: \mathrm{trace}\bigl(S_\lambda^{kernel}\bigr)

→ Useful to select the tuning parameter λ
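
A sketch that assembles the smoother matrix for the local linear fit and uses its trace as the effective degrees of freedom (names chosen here; kernel is a weight function D such as the gaussian sketch above):

```python
import numpy as np

def smoother_matrix(x, lam, kernel):
    """Rows l(x_j)^t of S_lambda for local linear regression, so that the
    fitted values are f_hat = S @ y."""
    X = np.column_stack([np.ones_like(x), x])             # rows (1, x_i)
    S = np.zeros((len(x), len(x)))
    for j, x0 in enumerate(x):
        w = kernel(np.abs(x - x0) / lam)
        W = np.diag(w)
        # l(x0)^t = x0_tilde^t (X^t W X)^{-1} X^t W
        S[j] = np.array([1.0, x0]) @ np.linalg.solve(X.T @ W @ X, X.T @ W)
    return S

# Effective degrees of freedom for a given lambda:
#   df = np.trace(smoother_matrix(x, lam, gaussian))
```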

14
Example: Gaussian kernel, linear fit

[Figure: local linear fit with a Gaussian kernel; y plotted against x over [0, 1]]

15
Local polynomial regression
• Fit a local polynomial of degree M

\min_{\beta_0, \beta_1, \ldots, \beta_M} \; \sum_{i=1}^{n} K_\lambda(x_0, x_i) \left(y_i - \beta_0 - \sum_{m=1}^{M} \beta_m x_i^m\right)^2

→ \hat{f}(x_0) = \hat{\beta}_0(x_0) + \sum_{m=1}^{M} \hat{\beta}_m(x_0)\, x_0^m

• Further (smaller) reduction of bias, especially in high-curvature regions

• Increased variance
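
The local linear sketch above extends directly to degree M (a sketch; np.vander builds the polynomial design matrix, and the names are chosen here for illustration):

```python
import numpy as np

def local_poly(x, y, x0, lam, kernel, M):
    """Local polynomial fit of degree M at x0; M = 1 recovers the local
    linear fit, M = 2 a local quadratic."""
    w = kernel(np.abs(x - x0) / lam)
    X = np.vander(x, M + 1, increasing=True)              # columns 1, x, ..., x^M
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return float(np.vander(np.array([x0]), M + 1, increasing=True) @ beta)
```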

16
Ex: Gaussian kernel, quadratic fit

[Figure: local quadratic fit with a Gaussian kernel; y plotted against x over [0, 1]]

17
More than 1 predictor
• d-dimensional kernel functions
• Typically radial functions
K_\lambda(x_0, x) = D\!\left(\frac{\lVert x - x_0\rVert}{\lambda}\right)
→ Standardize predictors
• More boundary problems
→ Use linear fits!
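
A sketch of the d-dimensional radial kernel with standardized predictors (names chosen here for illustration):

```python
import numpy as np

def radial_weights(X, x0, lam, D):
    """K_lambda(x0, x_i) = D(||x_i - x0|| / lambda), after standardizing
    each predictor to zero mean and unit standard deviation."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd                       # standardized predictors
    z0 = (x0 - mu) / sd
    dist = np.linalg.norm(Z - z0, axis=1)   # Euclidean distances to x0
    return D(dist / lam)
```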

18
More than 1 predictor
More general kernel
K_\lambda(x_0, x) = D\!\left(\frac{(x - x_0)^t A^{-1} (x - x_0)}{\lambda}\right)
• A: positive semidefinite matrix

• Weights individual components differently

• Can account for correlations between features
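
A sketch of this structured kernel (the name structured_kernel is an assumption, and A is taken to be invertible so the quadratic form can be computed):

```python
import numpy as np

def structured_kernel(x0, x, A, lam, D):
    """K_lambda(x0, x) = D(((x - x0)^t A^{-1} (x - x0)) / lambda).
    A weights coordinate directions and can encode correlations; A = I
    reduces the argument to the squared Euclidean distance over lambda."""
    diff = x - x0
    q = diff @ np.linalg.solve(A, diff)   # (x - x0)^t A^{-1} (x - x0)
    return D(q / lam)
```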

19
