Summer Course: Data Mining

Regression Analysis

Presenter: Georgi Nalbantov

August 2009

2/34

Structure

Regression analysis: definition and examples

Classical Linear Regression

LASSO and Ridge Regression (linear and nonlinear)

Nonparametric (local) regression estimation:

kNN for regression, Decision trees, Smoothers

Support Vector Regression (linear and nonlinear)

Variable/feature selection (AIC, BIC, R^2-adjusted)

3/34

Feature Selection, Dimensionality Reduction, and Clustering in the KDD Process

U. M. Fayyad, G. Piatetsky-Shapiro and P. Smyth (1995)

4/34

Common Data Mining tasks

[Figure: three scatter plots over features X1 and X2, illustrating Clustering, Classification, and Regression]

Clustering: k-th Nearest Neighbour, Parzen Window, Unfolding, Conjoint Analysis, Cat-PCA

Classification: Linear Discriminant Analysis, QDA, Logistic Regression (Logit), Decision Trees, LSSVM, NN, VS

Regression: Classical Linear Regression, Ridge Regression, NN, CART

5/34

Linear regression analysis: examples

6/34

Linear regression analysis: examples

7/34

The Regression task

Given m observations of n explanatory variables and one explained variable, where the explained variable takes real values, find a function that gives the "best" fit:

Given: (x_1, y_1), …, (x_m, y_m) ∈ R^n × R

Find: f : R^n → R

"Best function" = one whose expected error on unseen data (x_{m+1}, y_{m+1}), …, (x_{m+k}, y_{m+k}) is minimal.

8/34

Classical Linear Regression (OLS)

Explanatory and Response Variables are Numeric

The relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line).

Model:

Y = β0 + β1·x + ε,   ε ~ N(0, σ²)

β1 > 0: positive association
β1 < 0: negative association
β1 = 0: no association

9/34

Classical Linear Regression (OLS)

β0 — mean response when x = 0 (the y-intercept)

β1 — change in mean response when x increases by 1 unit (the slope)

β0 + β1·x — mean response when the explanatory variable takes on the value x

β0 and β1 are unknown parameters (like μ)

Task: minimize the sum of squared errors of the fitted line ŷ = β̂0 + β̂1·x:

SSE = Σ_{i=1}^{n} (y_i − ŷ_i)² = Σ_{i=1}^{n} (y_i − β̂0 − β̂1·x_i)²
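Minimizing the SSE above has a well-known closed-form solution. A minimal numpy sketch, with made-up numbers for x and y:

```python
import numpy as np

# Hypothetical data: a noisy straight line (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.9])

# Closed-form least-squares estimates that minimize SSE:
#   beta1_hat = Sxy / Sxx,  beta0_hat = y_bar - beta1_hat * x_bar
x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

y_hat = beta0_hat + beta1_hat * x
sse = np.sum((y - y_hat) ** 2)
print(beta0_hat, beta1_hat, sse)
```

Any other choice of intercept and slope on these data gives a larger SSE, which is exactly the least-squares criterion on this slide.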
 
10/34

Classical Linear Regression (OLS)

Parameter: slope in the population model (β1)

Estimator: the least squares estimate β̂1, from the fitted line ŷ = β̂0 + β̂1·x

Estimated standard error: SE(β̂1) = s / √Sxx, where

s² = σ̂² = SSE / (n − 2) = Σ (y_i − ŷ_i)² / (n − 2)
Sxx = Σ (x_i − x̄)²

Methods of making inference regarding the population: hypothesis tests (2-sided or 1-sided) and confidence intervals.

Coefficients(a)
Model        B        Std. Error   Standardized Beta    t        Sig.
(Constant)   89.124   7.048                             12.646   .000
LSD_CONC     -9.009   1.503        -.937                -5.994   .002
a. Dependent Variable: SCORE
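The standard error and t-statistic in a table like the one above can be reproduced from the formulas on this slide. A sketch with made-up data (not the LSD_CONC/SCORE data):

```python
import numpy as np

# Hypothetical data (numbers are made up).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([90.0, 71.0, 62.0, 54.0, 44.0, 35.0])

n = len(x)
x_bar = x.mean()
Sxx = np.sum((x - x_bar) ** 2)

beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / Sxx
beta0_hat = y.mean() - beta1_hat * x_bar
residuals = y - (beta0_hat + beta1_hat * x)

s2 = np.sum(residuals ** 2) / (n - 2)   # s^2 = SSE / (n - 2)
se_beta1 = np.sqrt(s2 / Sxx)            # SE(beta1_hat) = s / sqrt(Sxx)
t_stat = beta1_hat / se_beta1           # t-statistic for H0: beta1 = 0
print(beta1_hat, se_beta1, t_stat)
```

A large |t| (compared with the t distribution on n − 2 degrees of freedom) leads to rejecting H0: β1 = 0, exactly the 2-sided hypothesis test mentioned above.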

11/34

Classical Linear Regression (OLS)

12/34

Classical Linear Regression (OLS)

13/34

Classical Linear Regression (OLS)

Coefficient of determination (r²): the proportion of variation in y "explained" by the regression on x.

r² = (Syy − SSE) / Syy,   where 0 ≤ r² ≤ 1,

Syy = Σ (y − ȳ)²   and   SSE = Σ (y − ŷ)².
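The definition above translates directly into code. A minimal sketch, with made-up observed and fitted values:

```python
import numpy as np

# Hypothetical data (illustrative only).
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([3.2, 4.8, 7.1, 8.9, 11.0])  # fitted values from some regression

Syy = np.sum((y - y.mean()) ** 2)   # total variation in y
SSE = np.sum((y - y_hat) ** 2)      # unexplained (residual) variation
r2 = (Syy - SSE) / Syy
print(r2)
```

Here almost all of Syy is explained by the fit, so r² is close to 1; a fit no better than ȳ would give r² near 0.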

14/34

Classical Linear Regression (OLS):

Multiple regression

Numeric response variable (y) and p numeric predictor variables. Model:

Y = β0 + β1·x1 + ⋯ + βp·xp + ε

Partial regression coefficients: βi is the effect (on the mean response) of increasing the i-th predictor variable by 1 unit, holding all other predictors constant.

15/34

Classical Linear Regression (OLS):

Ordinary Least Squares estimation

Population model for the mean response:

E(Y | x1, …, xp) = β0 + β1·x1 + ⋯ + βp·xp

Least squares fitted (predicted) equation, minimizing SSE:

Ŷ = β̂0 + β̂1·x1 + ⋯ + β̂p·xp,   SSE = Σ (Y − Ŷ)²
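In the multiple-predictor case the SSE-minimizing coefficients solve the normal equations β̂ = (X′X)⁻¹X′y. A sketch with made-up, noise-free data so the recovered coefficients are exact:

```python
import numpy as np

# Hypothetical data: y = 1 + 2*x1 + 1*x2 exactly (numbers are made up).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([5.0, 6.0, 11.0, 12.0, 16.0])

# Add an intercept column, then solve the least-squares problem,
# which minimizes SSE = sum (Y - Y_hat)^2.
Xd = np.column_stack([np.ones(len(X)), X])
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)

y_hat = Xd @ beta_hat
sse = np.sum((y - y_hat) ** 2)
print(beta_hat, sse)
```

Because the data were generated without noise, the fit recovers β̂ = (1, 2, 1) and SSE ≈ 0; with real data the same call gives the OLS fit with positive SSE.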

16/34

Classical Linear Regression (OLS):

Ordinary Least Squares estimation

Model:

Ŷ = β̂0 + β̂1·x1 + ⋯ + β̂p·xp

OLS estimation:

min SSE = Σ_{i=1}^{n} (Y_i − Ŷ_i)²

LASSO estimation:

min Σ_{i=1}^{n} (Y_i − Ŷ_i)² + λ Σ_{j=1}^{p} |β_j|

Ridge regression estimation:

min Σ_{i=1}^{n} (Y_i − Ŷ_i)² + λ Σ_{j=1}^{p} β_j²
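Unlike the LASSO, the ridge criterion above has a closed form: β̂_ridge = (X′X + λI)⁻¹X′y. A numpy sketch with made-up data (predictors standardized and y centered, as is customary for penalized regression):

```python
import numpy as np

# Hypothetical data (made up): 3 predictors, one of them irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([2.0, 0.0, -1.0])
y = X @ beta_true + 0.1 * rng.normal(size=50)

# Standardize X and center y so the (unpenalized) intercept drops out.
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = y - y.mean()

lam = 5.0
p = X.shape[1]
# Ridge closed form: beta_hat = (X'X + lam*I)^{-1} X'y.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# OLS for comparison (lam = 0): ridge shrinks the coefficient vector.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_ols, beta_ridge)
```

The L1 (LASSO) penalty has no such closed form; it is typically minimized by coordinate descent or LARS, and unlike ridge it can set coefficients exactly to zero.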

17/34

LASSO and Ridge estimation of model coefficients

[Figure: LASSO and Ridge coefficient paths plotted against sum(|beta|)]

18/34

Nonparametric (local) regression estimation:

k-NN, Decision trees, smoothers

19/34

Nonparametric (local) regression estimation:

k-NN, Decision trees, smoothers

20/34

Nonparametric (local) regression estimation:

k-NN, Decision trees, smoothers

21/34

Nonparametric (local) regression estimation:

k-NN, Decision trees, smoothers

How to Choose k or h?

When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): High complexity

As k or h increases, we average over more instances and variance decreases but bias increases (oversmoothing):

Low complexity

Cross-validation is used to fine-tune k or h.
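The choice of k described above can be sketched end to end. A minimal numpy kNN regressor, with k picked by leave-one-out cross-validation standing in for the generic cross-validation mentioned here; all data and candidate k values are made up:

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    """Predict by averaging the y-values of the k nearest training points."""
    dists = np.abs(x_train - x_query)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

# Hypothetical 1-D data: a noisy sine wave (illustrative only).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 6, size=60))
y = np.sin(x) + 0.2 * rng.normal(size=60)

def loocv_mse(k):
    """Leave-one-out CV error: small k -> high variance (undersmoothing),
    large k -> high bias (oversmoothing)."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        pred = knn_predict(x[mask], y[mask], x[i], k)
        errs.append((y[i] - pred) ** 2)
    return float(np.mean(errs))

ks = [1, 3, 5, 9, 15, 30]
scores = {k: loocv_mse(k) for k in ks}
best_k = min(scores, key=scores.get)
print(best_k, scores[best_k])
```

The same loop works for a bandwidth h in a kernel smoother: replace the k-nearest average with a kernel-weighted average and scan over candidate h values.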

22/34

Linear Support Vector Regression

[Figure: three fits of Expenditures vs. Age]

Biggest area: "lazy case" (underfitting)

Small area: "suspiciously smart case" (overfitting)

Middle-sized area, with "support vectors": "compromise case", SVR (good generalisation)

The thinner the "tube", the more complex the model.

23/34

Nonlinear Support Vector Regression

Map the data into a higher-dimensional space:

[Figure: Expenditures vs. Age]
24/34

Nonlinear Support Vector Regression

Map the data into a higher-dimensional space:

[Figure: Expenditures vs. Age after the mapping]

25/34

Nonlinear Support Vector Regression:

Technicalities

The SVR function:

f(x) = w·φ(x) + b

To find the unknown parameters of the SVR function, solve:

min  (1/2)·‖w‖² + C·Σ_{i=1}^{m} (ξ_i + ξ_i*)

Subject to:

y_i − w·φ(x_i) − b ≤ ε + ξ_i
w·φ(x_i) + b − y_i ≤ ε + ξ_i*
ξ_i, ξ_i* ≥ 0

How to choose C, ε, and the kernel? With an RBF kernel, K(x_i, x_j) = exp(−gamma·‖x_i − x_j‖²); find C, ε, and gamma from a cross-validation procedure.
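Two ingredients of the formulation above can be sketched directly: the RBF kernel and the ε-insensitive "tube" loss that the slack variables ξ, ξ* encode. All numbers below are made up:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def eps_insensitive_loss(y, y_hat, eps):
    """Errors inside the epsilon-tube cost nothing; outside, cost grows linearly."""
    return np.maximum(np.abs(y - y_hat) - eps, 0.0)

X = np.array([[0.0], [1.0], [2.0]])
K = rbf_kernel(X, X, gamma=0.5)

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.05, 2.5, 2.0])
loss = eps_insensitive_loss(y, y_hat, eps=0.1)
print(K, loss)
```

Points whose loss is zero lie inside the tube; the remaining points become the "support vectors" from the earlier slide. Solving the full constrained quadratic program is usually left to a library solver.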

26/34

SVR Technicalities: Model Selection

Do 5-fold cross-validation to find C and gamma, for several fixed values of ε.

[Figures: CV_MSE as a function of C and gamma, for epsilon = 0.15]
27/34

SVR Study: Model Training, Selection and Prediction

[Figures: CVMSE at (IR*, HR*, CR*); true SP500 returns (red) and raw predictions (blue)]

28/34

SVR: Individual Effects

[Figures: individual effects on SP500 of the 3-month treasury bill, the VIX, VIX futures, and the credit spread]
29/34

SVR Technicalities: SVR vs. OLS

[Figure: performance on the test set, observation by observation]

SVR: MSE = 0.04
OLS: MSE = 0.23

30/34

Technical Note:

Number of Training Errors vs. Model Complexity

[Figure: minimum number of training errors and test errors vs. model complexity; functions ordered in increasing complexity; best trade-off marked]

MATLAB video here…

31/34

Variable selection for regression

Akaike Information Criterion (AIC). Final prediction error:

32/34

Variable selection for regression

Bayesian Information Criterion (BIC), also known as the Schwarz criterion. Final prediction error:


BIC tends to choose simpler models than AIC.

33/34

Variable selection for regression

R^2-adjusted:
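The three selection criteria from these slides can be sketched together. This uses the common Gaussian-error forms AIC = n·ln(SSE/n) + 2p and BIC = n·ln(SSE/n) + p·ln(n); the exact additive constants vary across textbooks, and the comparison data below are made up:

```python
import numpy as np

def selection_criteria(sse, syy, n, p):
    """AIC, BIC, and adjusted R^2 for a model with p estimated parameters.

    Gaussian-error forms; additive constants differ between textbooks.
    """
    aic = n * np.log(sse / n) + 2 * p
    bic = n * np.log(sse / n) + p * np.log(n)   # heavier penalty once ln(n) > 2
    r2_adj = 1 - (sse / (n - p)) / (syy / (n - 1))
    return aic, bic, r2_adj

# Hypothetical comparison: a small model vs. a bigger model that
# fits only slightly better.
n, syy = 100, 500.0
small = selection_criteria(sse=100.0, syy=syy, n=n, p=3)
big = selection_criteria(sse=98.0, syy=syy, n=n, p=10)
print(small, big)
```

Here both criteria prefer the small model, but since ln(100) ≈ 4.6 > 2, BIC penalizes the extra parameters harder, which illustrates why BIC tends to choose simpler models than AIC.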

34/34

Conclusion / Summary / References

Classical Linear Regression
  (any introductory statistical/econometric book)

LASSO and Ridge Regression (linear and nonlinear)
  http://www-stat.stanford.edu/~tibs/lasso.html ; Bishop, 2006

Nonparametric (local) regression estimation: kNN for regression, Decision trees, Smoothers
  Alpaydin, 2004; Hastie et al., 2001

Support Vector Regression (linear and nonlinear)
  Smola and Schoelkopf, 2003

Variable/feature selection (AIC, BIC, R^2-adjusted)
  Hastie et al., 2001; any statistical/econometric book