1. Descriptive statistics 2. Foundations of inferential statistics 3. Estimation and confidence intervals 4. Testing statistical hypotheses 5. Regression analysis

5.1 Correlation 5.2 Simple linear regression 5.3 Multiple regression

All these notions can be extended to the case with multiple predictors...


Veronika Czellar HEC Paris

Statistics


Example We can use two predictors for Intel: S&P500 and inflation.

[Figure: scatter plots of Intel against SP500 and against Inflation.]

5.3.1 Multiple regression equation

We extend the regression theory to $k$ explanatory variables.

Definition: multiple linear regression equation

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, \ldots, n,$$

where $x_{i1}, \ldots, x_{ik}$ are observable variables, $\beta_0, \beta_1, \ldots, \beta_k$ are fixed and unknown parameters, $\varepsilon_1, \ldots, \varepsilon_n \sim \text{i.i.d. } N(0, \sigma^2)$, and $\sigma > 0$ is a fixed and unknown parameter.

Definitions

The least squares (LS) estimators of $\beta_0, \ldots, \beta_k$ are

$$(\hat\beta_0, \ldots, \hat\beta_k) = \arg\min_{\beta_0, \ldots, \beta_k} \sum_{i=1}^{n} \bigl[ y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}) \bigr]^2 .$$

Remark: explicit formulas for these estimators are available, but require a matrix form of the regression model:

$$Y = X\beta + \varepsilon,$$

with

$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$

The LS estimator of $\beta$ minimizes $(Y - X\beta)^T (Y - X\beta)$ and is

$$\hat\beta = (X^T X)^{-1} X^T Y.$$

No need to learn this slide by heart: we will use Excel to estimate the parameters.
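For readers who want to see the matrix formula at work outside Excel, here is a minimal Python sketch. It assumes NumPy is available; the function name `ls_estimate` and the synthetic data are illustrative choices, not part of the course material, and the data are not the Intel data.

```python
import numpy as np

def ls_estimate(x, y):
    """Return the least squares estimate of (beta_0, ..., beta_k).

    x: array of shape (n, k) holding the k predictors (no intercept column).
    y: array of shape (n,) holding the response.
    """
    n = x.shape[0]
    # Design matrix X: a column of ones (intercept) followed by the predictors.
    X = np.column_stack([np.ones(n), x])
    # Solve the normal equations (X^T X) beta = X^T Y instead of inverting X^T X.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data, made up for illustration only.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 1] + rng.normal(scale=0.1, size=n)

beta_hat = ls_estimate(x, y)
print(beta_hat)  # estimates of (beta_0, beta_1, beta_2)
```

Solving the normal equations gives the same estimate as the explicit formula with the inverse, but avoids forming $(X^T X)^{-1}$ explicitly.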

Regressing Intel on S&P500 and inflation: [Excel output]

The R Square in the Excel output is higher than in the case of one predictor (S&P500 only).

Question

What does R Square mean in multiple regression?

5.3.2 Evaluating a multiple regression equation

Definition

The coefficient of multiple determination $R^2$ is defined by

$$R^2 = \frac{\sum_{i=1}^{n} (\hat y_i - \bar y)^2}{S_{yy}},$$

where $S_{yy} = \sum_{i=1}^{n} (y_i - \bar y)^2$ and $\hat y_i$ are the fitted values

$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}.$$

Proposition

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{S_{yy}}.$$
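The definition and the proposition can be checked numerically. The sketch below, assuming NumPy and synthetic data invented for the purpose, computes $R^2$ both ways for a model with an intercept; the function name `r_squared_two_ways` is hypothetical.

```python
import numpy as np

def r_squared_two_ways(X, y):
    """Return R^2 from the definition and from the proposition.

    X must contain a column of ones (the intercept).
    """
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ beta_hat                      # fitted values
    s_yy = np.sum((y - y.mean()) ** 2)        # total variation S_yy
    r2_definition = np.sum((y_hat - y.mean()) ** 2) / s_yy
    r2_proposition = 1.0 - np.sum((y - y_hat) ** 2) / s_yy
    return r2_definition, r2_proposition

# Synthetic data, made up for illustration only.
rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=(n, 2))
y = 0.3 + 1.2 * x[:, 0] + 0.4 * x[:, 1] + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

r2_def, r2_prop = r_squared_two_ways(X, y)
print(r2_def, r2_prop)  # the two values agree up to rounding error
```

The agreement relies on the intercept being part of the model; without it the two expressions can differ.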

Properties of the coefficient of multiple determination

1. It can range from 0 to 1. A value near 0 indicates little linear association between the set of independent variables and the dependent variable. A value near 1 means a strong association.
2. $R^2$ cannot go down when an extra predictor is added to the model and it will generally increase.
3. $R^2$ can almost always be made very close to 1 by using a model with $k$ quite close to $n$, even if many of the predictors would contribute only marginally to variation in $y$.

Definition

The adjusted $R^2$ is defined by

$$\text{Adjusted } R^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{S_{yy}}.$$

Properties of the adjusted $R^2$

1. Adjusted $R^2$ penalizes the addition of extraneous predictors to the model.
2. Adjusted $R^2$ is smaller than $R^2$ (whenever $k \geq 1$ and $R^2 < 1$, since the factor $(n-1)/(n-k-1)$ then exceeds 1).
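A small simulation can illustrate how $R^2$ and adjusted $R^2$ behave when useless predictors are added. The following sketch assumes NumPy; the data are synthetic and the added predictors are pure noise by construction. The function name `r2_and_adjusted` is an illustrative choice.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for a design matrix X with an intercept column."""
    n, p = X.shape
    k = p - 1                                  # number of predictors
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    sse = np.sum((y - X @ beta_hat) ** 2)      # residual sum of squares
    s_yy = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / s_yy
    adjusted = 1.0 - (n - 1) / (n - k - 1) * sse / s_yy
    return r2, adjusted

# Synthetic data: y depends on x1 only.
rng = np.random.default_rng(2)
n = 30
x1 = rng.normal(size=n)
y = 0.5 + 1.5 * x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1])
results = [r2_and_adjusted(X, y)]
for _ in range(5):
    # Append one pure-noise predictor and refit.
    X = np.column_stack([X, rng.normal(size=n)])
    results.append(r2_and_adjusted(X, y))

for k, (r2, adjusted) in enumerate(results, start=1):
    print(f"k={k}: R2={r2:.4f}, adjusted R2={adjusted:.4f}")
```

In such runs $R^2$ never decreases from one line to the next, while adjusted $R^2$ stays below $R^2$ and may go down when the new predictor adds nothing.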

Question

High values of $R^2$ suggest that the model fit is a useful one. But how large should this value be before we draw this conclusion?

5.3.3 Testing the global utility of the multiple regression

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$

$$H_a: \text{at least one among } \beta_1, \ldots, \beta_k \text{ is not zero}$$

Model utility F test:

$$F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} \overset{H_0}{\sim} F_{(k,\, n-k-1)}.$$
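As a rough illustration of the model utility F statistic, the sketch below computes it from $R^2$ on synthetic data. It assumes NumPy only; the function name `model_utility_f` is hypothetical, and the comparison with the critical value of the $F_{(k,\, n-k-1)}$ distribution is left to a table or to the Excel output.

```python
import numpy as np

def model_utility_f(X, y):
    """Return (F statistic, numerator df, denominator df).

    X must contain a column of ones (the intercept) plus k predictors.
    """
    n, p = X.shape
    k = p - 1
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    sse = np.sum((y - X @ beta_hat) ** 2)
    s_yy = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / s_yy
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return f_stat, k, n - k - 1

# Synthetic data, made up for illustration only.
rng = np.random.default_rng(3)
n = 60
x = rng.normal(size=(n, 2))
y = 0.2 + 0.8 * x[:, 0] + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

f_stat, df1, df2 = model_utility_f(X, y)
print(f"F = {f_stat:.2f} with ({df1}, {df2}) degrees of freedom")
```

H0 is rejected at level $\alpha$ when the statistic exceeds the upper $\alpha$ quantile of the F distribution with these degrees of freedom.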

Model utility F test for the Intel example with two predictors: [Excel output]

Warning

If the F test results in the rejection of $H_0$, it does not mean that all predictors are useful.

5.3.4 Evaluating individual regression coefficients

The t tests can be extended to the multivariate case. For any given $j \in \{0, \ldots, k\}$, we can test

$$H_0: \beta_j = 0, \qquad H_a: \beta_j \neq 0$$

using a t test:

$$T_{\beta_j} = \frac{\hat\beta_j}{SE(\hat\beta_j)} \overset{H_0}{\sim} t_{n-k-1},$$

where $SE(\hat\beta_j)$ is the standard error of the coefficient $j$ (and has the matrix form $\sqrt{\hat\sigma^2 \, [(X^T X)^{-1}]_{jj}}$), and

$$\hat\sigma = \sqrt{\frac{1}{n-k-1} \sum_{i=1}^{n} (y_i - \hat y_i)^2}$$

is called the multiple standard error of estimate.

Example

Do an individual test of each independent variable for the Intel regression with two predictors. Which variable would you consider eliminating? Use the 0.05 significance level.
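The standard errors and t statistics that Excel reports can be reproduced from the matrix formulas. The sketch below assumes NumPy and uses synthetic data (not the Intel data) in which the second predictor has a true coefficient of zero; `coefficient_t_stats` is an illustrative name.

```python
import numpy as np

def coefficient_t_stats(X, y):
    """Return (beta_hat, standard errors, t statistics) for each coefficient.

    X must contain a column of ones (the intercept) plus k predictors.
    """
    n, p = X.shape                              # p = k + 1
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    # Multiple standard error of estimate, with n - k - 1 degrees of freedom.
    sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - p))
    se = sigma_hat * np.sqrt(np.diag(xtx_inv))
    return beta_hat, se, beta_hat / se

# Synthetic data: x1 matters, x2 does not.
rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.0 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

beta_hat, se, t = coefficient_t_stats(X, y)
for name, b, s, tj in zip(["intercept", "x1", "x2"], beta_hat, se, t):
    print(f"{name}: estimate={b:.3f}, SE={s:.3f}, t={tj:.2f}")
```

Each t statistic is then compared with the critical value of the $t_{n-k-1}$ distribution at the chosen significance level.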


Remark: if there is more than one nonsignificant variable, we should delete only one variable at a time. Each time we delete a variable, we need to rerun the regression and check the remaining variables. This method is called the backward stepwise regression method.
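The one-variable-at-a-time procedure can be sketched as a loop. The version below is only one possible reading of the remark: it assumes NumPy, removes the predictor with the smallest absolute t statistic at each step, and uses a fixed critical value `t_crit` supplied by the user (1.97 here, an approximate two-sided 5% value for roughly 195 degrees of freedom). The function names and the data are hypothetical.

```python
import numpy as np

def t_statistics(X, y):
    """Return the t statistic of each coefficient (intercept first)."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (n - p))
    return beta_hat / (sigma_hat * np.sqrt(np.diag(xtx_inv)))

def backward_elimination(x, y, names, t_crit):
    """Drop one predictor at a time until every remaining |t| is at least t_crit."""
    keep = list(range(x.shape[1]))
    while keep:
        # Rerun the regression on the predictors that are still in the model.
        X = np.column_stack([np.ones(len(y)), x[:, keep]])
        abs_t = np.abs(t_statistics(X, y))[1:]   # skip the intercept
        worst = int(np.argmin(abs_t))
        if abs_t[worst] >= t_crit:
            break                                # all remaining predictors significant
        del keep[worst]                          # delete only the least significant one
    return [names[j] for j in keep]

# Synthetic data: only x0 and x2 have nonzero true coefficients.
rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * x[:, 0] - 1.5 * x[:, 2] + rng.normal(size=n)

selected = backward_elimination(x, y, ["x0", "x1", "x2", "x3"], t_crit=1.97)
print(selected)
```

Because the critical value depends on the degrees of freedom, a more careful version would recompute it after each deletion.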

5.3.5 Transformed variables

We can also include transformed variables or mixtures of variables in a multiple regression model.

Example

Global warming is the increase in the average temperature of Earth's near-surface air and oceans since the mid-20th century and its projected continuation. It is well known that climate change is influenced by human CO2 emissions. Global CO2 emissions totalled 31.1 billion tons in 2009, 37 percent above those in 1990. Global data (GlobalAirpollution.txt) for more than 65 countries were released in August 2010 and are available on the CERINA Plan website (and on the course website as well).

Example continued

We would like to investigate the impact of GDP per capita and population growth on the increase of CO2 emissions.

Year2008: emissions of CO2 in 2008 (in million tons, $y_{i,2008}$)
Year2009: emissions of CO2 in 2009 (in million tons, $y_{i,2009}$)
GDP2009realgrowth: GDP real growth rate (in %, $x_{i,1}$)
PopGrowth2009: population growth rate (in %, $x_{i,2}$)
SquareGDP2009: squared GDP2009realgrowth ($x_{i,1}^2$)
SquarePopGrowth2009: squared PopGrowth2009 ($x_{i,2}^2$)

Fit the following model:

$$\frac{y_{i,2009}}{y_{i,2008}} = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,1}^2 + \beta_4 x_{i,2}^2 + \varepsilon_i, \qquad i = 1, \ldots, 65.$$
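Squared terms enter the model as extra columns of the design matrix, so the regression stays linear in the parameters. The sketch below assumes NumPy and that the response is the ratio of 2009 to 2008 emissions. It does not read GlobalAirpollution.txt: the arrays `gdp_growth`, `pop_growth` and `ratio` are synthetic stand-ins with invented coefficients.

```python
import numpy as np

# Synthetic stand-ins for the 65 countries (NOT the real data set).
rng = np.random.default_rng(6)
n = 65
gdp_growth = rng.normal(loc=0.0, scale=4.0, size=n)   # plays the role of x_{i,1}
pop_growth = rng.normal(loc=1.2, scale=1.0, size=n)   # plays the role of x_{i,2}
ratio = (1.0 + 0.01 * gdp_growth + 0.005 * pop_growth
         + 0.0005 * gdp_growth ** 2
         + rng.normal(scale=0.02, size=n))            # plays the role of y_2009 / y_2008

# Design matrix: intercept, the two predictors, and their squares.
X = np.column_stack([
    np.ones(n),
    gdp_growth,
    pop_growth,
    gdp_growth ** 2,
    pop_growth ** 2,
])

# Least squares estimates of (beta_0, ..., beta_4).
beta_hat = np.linalg.lstsq(X, ratio, rcond=None)[0]
print(beta_hat)
```

In Excel the same effect is obtained by adding the squared columns to the worksheet before running the regression tool.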


5.3.6 Dummy variables

We can also include a dummy variable as a predictor, which takes the values 0 or 1 to indicate the absence or presence of some categorical effect.

Example: CEO salaries (see NorthwestCEOsalaries.txt on course website)

5.3.7 Qualitative variables

A categorical (or qualitative) variable is a predictor that takes a finite number $d$ of possible values. Only $d - 1$ categories are added to the regression model, each as a dummy variable; the remaining category serves as the baseline, since with an intercept the $d$ dummies would sum to the column of ones.

Example: prices of LCD televisions (see LCD.txt on course website, and exercise 5.12)
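The reason for keeping only $d - 1$ categories can be seen on a toy example. The sketch below assumes NumPy; the labels "A", "B", "C" and the helper `dummy_columns` are hypothetical and unrelated to LCD.txt.

```python
import numpy as np

def dummy_columns(labels, baseline=None):
    """Return (matrix, names): one 0/1 column per category except the baseline.

    With baseline=None, a column is produced for every category.
    """
    categories = sorted(set(labels))
    kept = [c for c in categories if c != baseline]
    columns = np.array([[1.0 if lab == c else 0.0 for c in kept] for lab in labels])
    return columns, kept

# Hypothetical categorical predictor with d = 3 possible values.
brands = ["A", "B", "C", "A", "C", "B", "A", "C"]
ones = np.ones(len(brands))

# d - 1 = 2 dummies, with "A" as the baseline category.
D, names = dummy_columns(brands, baseline="A")
X_ok = np.column_stack([ones, D])

# All d = 3 dummies: their sum equals the intercept column.
D_all, names_all = dummy_columns(brands)
X_bad = np.column_stack([ones, D_all])

print(names, np.linalg.matrix_rank(X_ok), X_ok.shape[1])
print(names_all, np.linalg.matrix_rank(X_bad), X_bad.shape[1])
```

With all $d$ dummies the design matrix has more columns than its rank, so $X^T X$ is not invertible and the LS estimator is not uniquely defined.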

5.3.8 Interaction variables

In some cases, it can be useful to add interaction terms, which are products of at least two variables.

Example: CEO salaries

The product between the woman dummy and sales is an interaction term.
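An interaction between a dummy and a numeric predictor lets the slope differ between the two groups. The sketch below assumes NumPy and uses invented numbers, not NorthwestCEOsalaries.txt; the variable names `sales`, `woman` and `salary` only mirror the example.

```python
import numpy as np

# Synthetic data, made up for illustration only.
rng = np.random.default_rng(7)
n = 100
sales = rng.uniform(1.0, 10.0, size=n)
woman = (rng.random(n) < 0.3).astype(float)        # dummy: 1 = woman, 0 = otherwise
salary = (2.0 + 0.5 * sales + 0.3 * woman + 0.2 * woman * sales
          + rng.normal(scale=0.5, size=n))

# Design matrix: intercept, sales, dummy, and the interaction (product) term.
X = np.column_stack([np.ones(n), sales, woman, woman * sales])
beta_hat = np.linalg.lstsq(X, salary, rcond=None)[0]

slope_baseline = beta_hat[1]                # slope of sales when woman = 0
slope_woman = beta_hat[1] + beta_hat[3]     # slope of sales when woman = 1
print(slope_baseline, slope_woman)
```

The coefficient of the interaction term is therefore the difference in the sales slope between the two groups, and its t test asks whether that difference is zero.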

Model assumptions in multiple regression can be verified in the same way as in simple linear regression (see 5.2.8).

However, there is an additional requirement in multiple regression: predictors should not be correlated with one another.

5.3.9 Multicollinearity

Multicollinearity exists when independent variables are correlated.

Several clues that indicate problems with multicollinearity:

- An independent variable known to be an important predictor ends up being not significant.
- A regression coefficient that should have a positive sign turns out to be negative, or vice versa.
- When an independent variable is added or removed, there is a drastic change in the values of the remaining coefficients.
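These symptoms can be reproduced on synthetic data. The sketch below assumes NumPy and builds a second predictor that is almost a copy of the first; it then compares the correlation between predictors and the standard error of the first coefficient with and without the second predictor. The helper name `coef_standard_errors` is illustrative.

```python
import numpy as np

def coef_standard_errors(X, y):
    """Return (beta_hat, standard errors) for a design matrix with intercept."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (n - p))
    return beta_hat, sigma_hat * np.sqrt(np.diag(xtx_inv))

# Synthetic data: x2 is nearly identical to x1.
rng = np.random.default_rng(9)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
y = 1.0 + x1 + x2 + rng.normal(size=n)

corr = np.corrcoef(x1, x2)[0, 1]               # correlation between the predictors
ones = np.ones(n)
beta_one, se_one = coef_standard_errors(np.column_stack([ones, x1]), y)
beta_two, se_two = coef_standard_errors(np.column_stack([ones, x1, x2]), y)

print(f"correlation(x1, x2) = {corr:.4f}")
print(f"x1 alone:   estimate={beta_one[1]:.2f}, SE={se_one[1]:.3f}")
print(f"x1 with x2: estimate={beta_two[1]:.2f}, SE={se_two[1]:.3f}")
```

When the second predictor is added, the standard error of the first coefficient grows sharply, so a predictor that was clearly significant alone can appear nonsignificant, and its estimate can change markedly.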

For further details about linear regression, see:

Kutner, Nachtsheim, Neter and Li (2005), Applied Linear Statistical Models, 5th ed., McGraw-Hill.
Fox (2008), Applied Regression Analysis and Generalized Linear Models, 2nd ed., Sage Publications.

Thank you. Merci Danke Grazie Gracias Spasibo Köszönöm