
CARS  HH SIZE
1     1
2     2
2     3
2     4
3     5

[Chart: CARS regressed on HHSIZE, linear trendline. x-axis: No. of household members; y-axis: No. of cars. f(x) = 0.4x + 0.8, R² = 0.8]

[Chart: CARS regressed on HHSIZE, logarithmic trendline. x-axis: No. of household members; y-axis: No. of cars. f(x) = 0.9962543387 ln(x) + 1.0460881159, R² = 0.8017047516]

[Chart: CARS regressed on HHSIZE, exponential trendline. x-axis: No. of household members; y-axis: No. of cars. f(x) = 0.9767186839 exp(0.2197224577 x), R² = 0.7683868434]

[Chart: CARS regressed on HHSIZE, power trendline. x-axis: No. of household members; y-axis: No. of cars. f(x) = 1.0893387646 x^0.574455186, R² = 0.8484911839]

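[Chart: CARS regressed on HHSIZE, polynomial (order 2) trendline. x-axis: No. of household members; y-axis: No. of cars. f(x) = -8.90158753083262E-17x^2 + 0.4x + 0.8, R² = 0.8. The x^2 coefficient is zero up to rounding error, so the fit coincides with the linear trendline.]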

Logarithmic: y = c + b*ln(x) + u
This is a linear-log relationship.

Exponential: y = c*exp(b*x)*u
This is a log-linear relationship since taking logs gives ln(y) = ln(c) + b*x + ln(u)

Power: y = a*(x^b)*u
This is a log-log relationship since taking logs gives ln(y) = ln(a) + b*ln(x) + ln(u)

Polynomial: y = a + b*x + c*x^2 + d*x^3 + .... + u
This is, for example, a quadratic relationship if the polynomial is of order 2.
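The trendline forms above can be checked outside Excel. The following is a minimal sketch, assuming the logarithmic, exponential and power trendlines are ordinary least squares fits on the transformed variables (this matches the chart equations for this data set). The helper name `ols` and the variable names are illustrative and not part of the original workbook.

```python
import math

hh_size = [1, 2, 3, 4, 5]  # x: number of household members
cars = [1, 2, 2, 2, 3]     # y: number of cars


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


ln_x = [math.log(v) for v in hh_size]
ln_y = [math.log(v) for v in cars]

# Logarithmic: regress y on ln(x)
log_c, log_b = ols(ln_x, cars)

# Exponential: regress ln(y) on x, then c = exp(intercept)
exp_ln_c, exp_b = ols(hh_size, ln_y)
exp_c = math.exp(exp_ln_c)

# Power: regress ln(y) on ln(x), then a = exp(intercept)
pow_ln_a, pow_b = ols(ln_x, ln_y)
pow_a = math.exp(pow_ln_a)

print(f"logarithmic: y = {log_c:.6f} + {log_b:.6f}*ln(x)")
print(f"exponential: y = {exp_c:.6f}*exp({exp_b:.6f}*x)")
print(f"power:       y = {pow_a:.6f}*x^{pow_b:.6f}")
```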

CORRELATION COEFFICIENT

The correlation coefficient between two series, say x and y, equals
  Covariance(x,y) / [Sqrt(Variance(x)) * Sqrt(Variance(y))]
where
  Covariance(x,y) is the sample covariance between x and y: (1/(n-1)) × Σi (xi - xbar)(yi - ybar)
  Variance(x) is the sample variance of x: (1/(n-1)) × Σi (xi - xbar)^2
  Variance(y) is the sample variance of y: (1/(n-1)) × Σi (yi - ybar)^2

CALCULATION USING THE DATA ANALYSIS ADD-IN
Select Correlation and fill out the dialog box. This yields

          CARS      HH SIZE
CARS      1
HH SIZE   0.894427  1

The correlation coefficient is 0.894427.
This can be extended to several series. For example, if there are data in columns A, B, C, D and E then the array chosen is A1:E6 and produces a 5 x 5 table of correlations.

CALCULATION USING THE CORREL FUNCTION
On the Formula Tab select the Function Library group, then More Functions and Statistical.
Alternatively directly type = CORREL(A1:A6,B1:B6) which yields 0.894427. Note that Excel dropped the first row (of labels).
= CORREL(A2:A6,B2:B6) yields the same result.

COVARIANCE
This is obtained in a similar way to correlation. We can use the Data Analysis Add-in and Covariance.

          CARS  HH SIZE
CARS      0.4
HH SIZE   0.8   2
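As a cross-check on the formulas above, here is a small sketch that computes the correlation coefficient directly from the definitions. The variable names are illustrative. For this data the Covariance table values (0.4, 0.8, 2) are reproduced when the sums are divided by n rather than n-1, so both versions are shown; the correlation coefficient is the same either way because the divisor cancels.

```python
import math

cars = [1, 2, 2, 2, 3]     # y
hh_size = [1, 2, 3, 4, 5]  # x
n = len(cars)

xbar = sum(hh_size) / n
ybar = sum(cars) / n

# Sums of squares and cross-products around the means
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(hh_size, cars))
sxx = sum((x - xbar) ** 2 for x in hh_size)
syy = sum((y - ybar) ** 2 for y in cars)

# Sample versions (divide by n-1), as in the formulas above
cov_sample = sxy / (n - 1)
var_x = sxx / (n - 1)
var_y = syy / (n - 1)
corr = cov_sample / (math.sqrt(var_x) * math.sqrt(var_y))

# Versions dividing by n, which match the Covariance table above
cov_population = sxy / n
var_x_population = sxx / n
var_y_population = syy / n

print(f"correlation = {corr:.6f}")
print(f"sample covariance = {cov_sample:.4f}, divide-by-n covariance = {cov_population:.4f}")
```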


TWO-VARIABLE LINEAR REGRESSION

CARS  HH SIZE
1     1
2     2
2     3
2     4
3     5

The population regression model is: y = β1 + β2 x + u
We wish to estimate the regression line: y = b1 + b2 x

We obtain

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.894427
R Square           0.8
Adjusted R Square  0.733333
Standard Error     0.365148
Observations       5

ANOVA
            df  SS   MS        F   Significance F
Regression  1   1.6  1.6       12  0.040519
Residual    3   0.4  0.133333
Total       4   2

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  0.8           0.382971        2.088932  0.127907  -0.418784  2.018784
HH SIZE    0.4           0.11547         3.464102  0.040519  0.032523   0.767477

INTERPRETING THE REGRESSION SUMMARY OUTPUT
The key output is given in the Coefficients column in the last set of output:
  b1 = 0.8 (the Intercept coefficient)
  b2 = 0.4 (the Coefficient of HH SIZE: the slope coefficient)
Thus the fitted line is: y = 0.8 + 0.4 x or CARS = 0.8 + 0.4 HHSIZE

The regression statistics output gives measures of how well the model fits the data. In particular
  R2 = 0.8, which measures the fit of the model. This means that 80% of the variation of yi around ybar is explained by the regressor xi.
  Standard error = 0.365, which measures the standard deviation of yi around its fitted value.

The remaining output (ANOVA table and t Stat, p-value, ...) is used for statistical inference.
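The key numbers in the summary output can be reproduced from the usual least-squares formulas. The sketch below is illustrative only (variable names are assumptions, not Excel's); it computes b1, b2 and R2 for the CARS and HH SIZE data.

```python
cars = [1, 2, 2, 2, 3]     # y
hh_size = [1, 2, 3, 4, 5]  # x
n = len(cars)

xbar = sum(hh_size) / n
ybar = sum(cars) / n

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(hh_size, cars))
sxx = sum((x - xbar) ** 2 for x in hh_size)

# OLS estimates for y = b1 + b2*x
b2 = sxy / sxx
b1 = ybar - b2 * xbar

# R-squared = 1 - residual SS / total SS
fitted = [b1 + b2 * x for x in hh_size]
ss_resid = sum((y - f) ** 2 for y, f in zip(cars, fitted))
ss_total = sum((y - ybar) ** 2 for y in cars)
r2 = 1 - ss_resid / ss_total

print(f"CARS = {b1:.1f} + {b2:.1f} HHSIZE, R2 = {r2:.1f}")
```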

Statistical Inference for Two-variable Regression

REGRESSION USING THE DATA ANALYSIS ADD-IN
The regression output (SUMMARY OUTPUT) for the CARS and HH SIZE data has three components:
  Regression statistics table
  ANOVA table
  Regression coefficients table.

INTERPRET REGRESSION STATISTICS TABLE

Regression Statistics          Explanation
Multiple R         0.894427    R = square root of R2
R Square           0.8         R2 = coefficient of determination
Adjusted R Square  0.733333    Adjusted R2, used if more than one x variable
Standard Error     0.365148    This is the sample estimate of the standard deviation of the error u
Observations       5           Number of observations used in the regression (n)

The standard error here refers to the estimated standard deviation of the error term u. It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)).
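To make the entries of the regression statistics table concrete, the following sketch recomputes them from the residual and total sums of squares reported in the ANOVA table (0.4 and 2), with n = 5 observations and k = 2 regressors including the intercept. The adjusted R2 formula used here is the standard one and is an assumption about what Excel reports; it reproduces 0.733333.

```python
import math

n, k = 5, 2            # observations; regressors including the intercept
sse = 0.4              # residual (error) sum of squares from the ANOVA table
ss_total = 2.0         # total sum of squares from the ANOVA table

r2 = 1 - sse / ss_total
multiple_r = math.sqrt(r2)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
std_error = math.sqrt(sse / (n - k))   # standard error of the regression

print(f"Multiple R = {multiple_r:.6f}")
print(f"R Square = {r2:.1f}, Adjusted R Square = {adj_r2:.6f}")
print(f"Standard Error = {std_error:.6f}")
```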

INTERPRET ANOVA TABLE

ANOVA
            df  SS   MS        F   Significance F
Regression  1   1.6  1.6       12  0.040519
Residual    3   0.4  0.133333
Total       4   2

The ANOVA (analysis of variance) table splits the sum of squares into its components.
Total sum of squares = Residual (or error) sum of squares + Regression (or explained) sum of squares.
Thus Σi (yi - ybar)^2 = Σi (yi - yhati)^2 + Σi (yhati - ybar)^2
where yhati is the value of yi predicted from the regression line and ybar is the sample mean of y.
SSE = Residual (or error) sum of squares.

For example:
  R2 = 1 - Residual SS / Total SS (general formula for R2)
     = 1 - 0.4/2.0 (from data in the ANOVA table)
     = 0.8 (which equals R2 given in the regression Statistics table).

The remainder of the ANOVA table is described in more detail in Excel: Multiple Regression.

INTERPRET REGRESSION COEFFICIENTS TABLE

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  0.8           0.382971        2.088932  0.127907  -0.418784  2.018784
HH SIZE    0.4           0.11547         3.464102  0.040519  0.032523   0.767477
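The sum-of-squares decomposition and the first three columns of the coefficients table can be verified with a short sketch. The standard-error formulas below are the textbook ones for two-variable OLS and are assumed, not taken from Excel documentation; they reproduce the table values. P-values and confidence limits need the t distribution and are not computed here.

```python
import math

cars = [1, 2, 2, 2, 3]     # y
hh_size = [1, 2, 3, 4, 5]  # x
n, k = len(cars), 2

xbar = sum(hh_size) / n
ybar = sum(cars) / n
sxx = sum((x - xbar) ** 2 for x in hh_size)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(hh_size, cars))

b2 = sxy / sxx
b1 = ybar - b2 * xbar
fitted = [b1 + b2 * x for x in hh_size]

# ANOVA decomposition: total = residual + regression
ss_total = sum((y - ybar) ** 2 for y in cars)
ss_resid = sum((y - f) ** 2 for y, f in zip(cars, fitted))
ss_reg = sum((f - ybar) ** 2 for f in fitted)

ms_reg = ss_reg / (k - 1)
ms_resid = ss_resid / (n - k)
f_stat = ms_reg / ms_resid

# Standard errors and t statistics of the coefficients
s = math.sqrt(ms_resid)
se_b2 = s / math.sqrt(sxx)
se_b1 = s * math.sqrt(1 / n + xbar ** 2 / sxx)
t_b1, t_b2 = b1 / se_b1, b2 / se_b2

print(f"SS: total={ss_total:.1f} residual={ss_resid:.1f} regression={ss_reg:.1f}, F={f_stat:.1f}")
print(f"se(b1)={se_b1:.6f} t={t_b1:.6f}; se(b2)={se_b2:.6f} t={t_b2:.6f}")
```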

CONFIDENCE INTERVALS FOR SLOPE COEFFICIENT

TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")

TEST HYPOTHESIS OF SLOPE COEFFICIENT EQUAL TO VALUE OTHER THAN ZERO
For a hypothesized slope of 1.0:
  t = (0.4 - 1.0) / 0.11547 = -5.196155

FITTED VALUES AND RESIDUALS FROM REGRESSION LINE

RESIDUAL OUTPUT
Observation  Predicted CARS  Residuals  Standard Residuals
1            1.2             -0.2       -0.632456
2            1.6             0.4        1.264911
3            2               0          0
4            2.4             -0.4       -1.264911
5            2.8             0.2        0.632456

PROBABILITY OUTPUT
Percentile  CARS
10          1
30          2
50          2
70          2
90          3
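The residual output and the t statistic above can be reproduced as follows. This is a sketch under one stated assumption: the Standard Residuals column is matched here by dividing each residual by the standard deviation of the residuals computed with an n-1 divisor. That rule is inferred from the numbers in the table, not from Excel documentation.

```python
import math

cars = [1, 2, 2, 2, 3]
hh_size = [1, 2, 3, 4, 5]
b1, b2 = 0.8, 0.4          # fitted intercept and slope from the output above
se_b2 = 0.11547            # standard error of the slope from the output above
n = len(cars)

predicted = [b1 + b2 * x for x in hh_size]
residuals = [y - p for y, p in zip(cars, predicted)]

# Assumed scaling: residual / sqrt(sum(residual^2) / (n - 1))
resid_sd = math.sqrt(sum(e ** 2 for e in residuals) / (n - 1))
standard_residuals = [e / resid_sd for e in residuals]

# t statistic for H0: slope = 1.0
t_slope_equals_one = (b2 - 1.0) / se_b2

for i, (p, e, s) in enumerate(zip(predicted, residuals, standard_residuals), start=1):
    print(f"{i}  {p:.1f}  {e:+.1f}  {s:+.6f}")
print(f"t for H0: slope = 1.0 is {t_slope_equals_one:.6f}")
```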

[Charts: HH SIZE Residual Plot (Residuals against HH SIZE); HH SIZE Line Fit Plot (CARS and Predicted CARS against HH SIZE); Normal Probability Plot (CARS against Sample Percentile)]

REGRESSION USING EXCEL FUNCTIONS INTERCEPT, SLOPE, RSQ, STEYX and FORECAST

The population regression model is: y = β1 + β2 x + u
We wish to estimate the regression line: y = b1 + b2 x

The individual functions INTERCEPT, SLOPE, RSQ, STEYX and FORECAST can be used to get key results for two-variable regression.

0.8       INTERCEPT(A1:A6,B1:B6) yields the OLS intercept estimate of 0.8
0.4       SLOPE(A1:A6,B1:B6) yields the OLS slope estimate of 0.4
0.8       RSQ(A1:A6,B1:B6) yields the R-squared of 0.8
0.365148  STEYX(A1:A6,B1:B6) yields the standard error of the regression of 0.36515
3.2       FORECAST(6,A1:A6,B1:B6) yields the OLS forecast value of Yhat=3.2 for X=6 (forecast 3.2 cars for household of size 6).

First in cell D2 enter the function LINEST(A2:A6,B2:B6,1,1).
Then highlight the desired array D2:E6.
Hit the F2 key (then "Edit" appears at the bottom left of the spreadsheet).
Finally hit CTRL-SHIFT-ENTER.
This yields

0.4      0.8
0.11547  0.382971
0.8      0.365148
12       3
1.6      0.4

where the results in D2:E6 represent

Slope coef         Intercept coef
St.error of slope  St.error of intercept
R-squared          St.error of regression
F-test overall     Degrees of freedom (n-k)
Regression SS      Residual SS

In particular, the fitted regression is CARS = 0.8 + 0.4 HH SIZE with R2 = 0.8.
The estimated coefficients have standard errors of 0.11547 (slope) and 0.382971 (intercept).

To get just the coefficients give the LINEST command with the last entry 0 rather than 1, i.e. LINEST(A2:A6,B2:B6,1,0), and then highlight cells, say, A8:B8, hit the F2 key, and hit CTRL-SHIFT-ENTER. This yields

0.4  0.8

PREDICTION USING EXCEL FUNCTION TREND

CARS  HH SIZE  New Values
1     1        6
2     2        7
2     3
2     4
3     5

New Values  trend  y = 0.8 + 0.4*x (model)
6           3.2    3.2
7           3.6    3.6
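The forecast and trend values above are simply the fitted line evaluated at the new household sizes. A minimal sketch, with the hypothetical helper name `predict_cars`:

```python
cars = [1, 2, 2, 2, 3]
hh_size = [1, 2, 3, 4, 5]
n = len(cars)

xbar = sum(hh_size) / n
ybar = sum(cars) / n
slope = sum((x - xbar) * (y - ybar) for x, y in zip(hh_size, cars)) / sum(
    (x - xbar) ** 2 for x in hh_size
)
intercept = ybar - slope * xbar


def predict_cars(new_hh_size):
    """Evaluate the fitted line CARS = intercept + slope * HH SIZE."""
    return intercept + slope * new_hh_size


new_values = [6, 7]
predictions = [predict_cars(x) for x in new_values]
for x, p in zip(new_values, predictions):
    print(f"HH SIZE = {x}: predicted CARS = {p:.1f}")
```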


The LOGEST function is the same as the LINEST function, except that an exponential relationship is estimated rather than a linear relationship.

Exponential: y = c*exp(b*x)*u
This is a log-linear relationship since taking logs gives ln(y) = ln(c) + b*x + ln(u)

LOGEST output for the CARS and HH SIZE data (same layout as the LINEST output):

1.245731  0.976719
0.069647  0.230995
0.768387  0.220245
9.952632  3
0.48278   0.145523

Coefficients only:

1.245731  0.976719
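The LOGEST numbers can be reproduced by running least squares of ln(y) on x. The sketch below assumes that the first row reports exp(slope) and exp(intercept) while the remaining rows are statistics on the log scale; this interpretation is inferred from the values above (for example 1.245731 = exp(0.2197224577)). Variable names are illustrative.

```python
import math

cars = [1, 2, 2, 2, 3]
hh_size = [1, 2, 3, 4, 5]
n, k = len(cars), 2

ln_y = [math.log(y) for y in cars]
xbar = sum(hh_size) / n
lbar = sum(ln_y) / n
sxx = sum((x - xbar) ** 2 for x in hh_size)
sxy = sum((x - xbar) * (l - lbar) for x, l in zip(hh_size, ln_y))

slope = sxy / sxx                  # b in ln(y) = ln(c) + b*x
intercept = lbar - slope * xbar    # ln(c)
m = math.exp(slope)                # first LOGEST coefficient
c = math.exp(intercept)            # second LOGEST coefficient

# Fit statistics on the log scale
fitted = [intercept + slope * x for x in hh_size]
ss_resid = sum((l - f) ** 2 for l, f in zip(ln_y, fitted))
ss_reg = sum((f - lbar) ** 2 for f in fitted)
r2 = ss_reg / (ss_reg + ss_resid)
se_y = math.sqrt(ss_resid / (n - k))
f_stat = (ss_reg / (k - 1)) / (ss_resid / (n - k))

print(f"{m:.6f}  {c:.6f}")
print(f"R2={r2:.6f}  se={se_y:.6f}  F={f_stat:.6f}")
print(f"SSreg={ss_reg:.6f}  SSresid={ss_resid:.6f}")
```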

[Chart: CARS regressed on HHSIZE, exponential trendline. x-axis: No. of household members; y-axis: No. of cars. f(x) = 0.9767186839 exp(0.2197224577 x), R² = 0.7683868434]

EXCEL 2007: Multiple Regression

CARS  HH SIZE  CUBED HH SIZE
1     1        1
2     2        8
2     3        27
2     4        64
3     5        125

The regression yields the following output. Of greatest interest is R Square.

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.895828
R Square           0.802508
Adjusted R Square  0.605016
Standard Error     0.444401
Observations       5

ANOVA
            df  SS        MS        F         Significance F
Regression  2   1.605016  0.802508  4.063492  0.197492
Residual    2   0.394984  0.197492
Total       4   2

               Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept      0.896552      0.764398        1.172886  0.361624  -2.392388  4.185491
HH SIZE        0.336468      0.422704        0.79599   0.509507  -1.482279  2.155216
CUBED HH SIZE  0.00209       0.013114        0.159364  0.888021  -0.054334  0.058514

The regression output has three components:
  Regression statistics table
  ANOVA table
  Regression coefficients table.

INTERPRET REGRESSION STATISTICS TABLE

Regression Statistics          Explanation
Multiple R         0.895828    R = square root of R2
R Square           0.802508    R2 = coefficient of determination
Adjusted R Square  0.605016    Adjusted R2, used if more than one x variable
Standard Error     0.444401    This is the sample estimate of the standard deviation of the error u
Observations       5           Number of observations used in the regression (n)
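The coefficient estimates in this output can be checked by solving the least-squares normal equations. With two regressors plus an intercept, working in deviations from the means reduces this to a 2 x 2 system, which the sketch below solves directly. This is an illustrative shortcut for this small example, not a description of how Excel computes the fit; variable names are assumptions.

```python
cars = [1, 2, 2, 2, 3]                 # y
hh_size = [1, 2, 3, 4, 5]              # x
cubed = [x ** 3 for x in hh_size]      # z = CUBED HH SIZE
n = len(cars)

xbar, zbar, ybar = sum(hh_size) / n, sum(cubed) / n, sum(cars) / n
dx = [x - xbar for x in hh_size]
dz = [z - zbar for z in cubed]
dy = [y - ybar for y in cars]

sxx = sum(a * a for a in dx)
szz = sum(a * a for a in dz)
sxz = sum(a * b for a, b in zip(dx, dz))
sxy = sum(a * b for a, b in zip(dx, dy))
szy = sum(a * b for a, b in zip(dz, dy))

# Solve the 2 x 2 normal equations for the slope coefficients
det = sxx * szz - sxz ** 2
b2 = (sxy * szz - sxz * szy) / det
b3 = (sxx * szy - sxz * sxy) / det
b1 = ybar - b2 * xbar - b3 * zbar

fitted = [b1 + b2 * x + b3 * z for x, z in zip(hh_size, cubed)]
ss_resid = sum((y - f) ** 2 for y, f in zip(cars, fitted))
ss_total = sum(a * a for a in dy)
r2 = 1 - ss_resid / ss_total

print(f"b1={b1:.6f}  b2={b2:.6f}  b3={b3:.6f}")
print(f"Residual SS={ss_resid:.6f}  R Square={r2:.6f}")
```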

The above gives the overall goodness-of-fit measures:
  R2 = 0.8025
  Correlation between y and y-hat is 0.8958 (when squared gives 0.8025).
  Adjusted R2 = R2 - (1-R2)*(k-1)/(n-k) = .8025 - .1975*2/2 = 0.6050, where k = 3.

The standard error here refers to the estimated standard deviation of the error term u. It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)), where SSE = Residual (or error) sum of squares.

INTERPRET ANOVA TABLE
For example:
  R2 = 1 - Residual SS / Total SS (general formula for R2)
     = 1 - 0.3950/2.0 (from data in the ANOVA table)
     = 0.8025 (which equals R2 given in the regression Statistics table).

The column labeled F gives the overall F-test of H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.
Aside: Excel computes F as:
  F = [Regression SS/(k-1)] / [Residual SS/(n-k)] = [1.6050/2] / [.39498/2] = 4.0635.

The column labeled Significance F has the associated P-value.
Since 0.1975 > 0.05, we do not reject H0 at significance level 0.05.
Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors including the intercept.
Here FDIST(4.0635,2,2) = 0.1975.

INTERPRET REGRESSION COEFFICIENTS TABLE
t Stat = (coeff./std error)

               Coefficient  Std error  t Stat  P-value  Lower 95%  Upper 95%
Intercept      0.8966       0.7644     1.1729  0.3616   -2.3924    4.1855
HH SIZE        0.3365       0.4227     0.7960  0.5095   -1.4823    2.1552
CUBED HH SIZE  0.0021       0.0131     0.1594  0.8880   -0.0543    0.0585
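The goodness-of-fit and F-test arithmetic above can be sketched as follows, starting from the sums of squares in the ANOVA table. One loud assumption: the function `f_upper_tail_num_df_2` uses a closed form for the upper-tail probability of the F distribution that holds only when the numerator degrees of freedom equal 2, which is the case here (k-1 = 2). It is a stand-in for FDIST in this special case, not a general replacement.

```python
import math

n, k = 5, 3                      # observations; regressors including the intercept
ss_reg, ss_resid = 1.605016, 0.394984   # from the ANOVA table
ss_total = ss_reg + ss_resid

r2 = 1 - ss_resid / ss_total
adj_r2 = r2 - (1 - r2) * (k - 1) / (n - k)
std_error = math.sqrt(ss_resid / (n - k))
f_stat = (ss_reg / (k - 1)) / (ss_resid / (n - k))


def f_upper_tail_num_df_2(f, denom_df):
    """P(F > f) for an F distribution with 2 numerator degrees of freedom only."""
    return (1 + 2 * f / denom_df) ** (-denom_df / 2)


significance_f = f_upper_tail_num_df_2(f_stat, n - k)
reject_h0 = significance_f < 0.05

print(f"R2={r2:.4f}  Adjusted R2={adj_r2:.4f}  Standard error={std_error:.4f}")
print(f"F={f_stat:.4f}  Significance F={significance_f:.4f}  reject H0: {reject_h0}")
```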

A simple summary of the above output is that the fitted line is
  y = 0.8966 + 0.3365*x + 0.0021*z
where x is HH SIZE and z is CUBED HH SIZE.

CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS
95% confidence interval for slope coefficient β2 is from Excel output (-1.4823, 2.1552).
Excel computes this as
  b2 ± t_.025(2) × se(b2)
  = 0.33647 ± TINV(0.05, 2) × 0.42270
  = 0.33647 ± 4.303 × 0.42270
  = 0.33647 ± 1.8189
  = (-1.4823, 2.1552).
Here TINV(0.05,2) = 4.303, with n-k = 5-3 = 2 degrees of freedom.
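The confidence interval arithmetic can be written out as a short sketch. The critical value is taken from the text (TINV(0.05,2) = 4.302653) rather than computed, since the Python standard library has no t distribution; the variable names are illustrative.

```python
b2 = 0.336468          # coefficient of HH SIZE from the output
se_b2 = 0.422704       # its standard error from the output
t_crit = 4.302653      # TINV(0.05, 2): two-sided 5% critical value with 2 df

margin = t_crit * se_b2
lower, upper = b2 - margin, b2 + margin

print(f"95% CI for beta2: ({lower:.4f}, {upper:.4f}), margin = {margin:.4f}")
```

Because the interval contains zero, it is consistent with the t test that does not reject a zero slope at the 5% level.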

0635 with p-value of 0.2) = 4. 2) = 0. n-k) where k is the number of regressors including hte intercept.05. OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS We test H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.2) = 0. For example.2) = 0.025(2) = TINV(0.1975. Using the critical value approach We computed t = -1. Since the p-value is not less than 0.569| < 4.2. 2.1975. Conclude that the parameters are jointly statistically insignificant at significance level 0. 4.33647 . for HH SIZE p = TDIST(0.197492 PREDICTED VALUE OF Y GIVEN REGRESSORS Consider case where x = 4 in which case CUBED HH SIZE = x^3 = 4^3 = 64. [Here n=5 and k=3 so n-k=2]. 0.796.05 since t = |-1. Note: Significance F in general = FINV(F. 0.303. [Here n=5 and k=3 so n-k=2].5095.05 we do not reject the null hypothesis that the regression parameters are zero at significance level 0.05 since the p-value is > 0.257049 Do not reject the null hypothesis at level .05.303.05.257. k-1.4227 -1. 0.0635.H0 value of β2) / (standard error of b2 ) = (0.05.5095 TEST HYPOTHESIS ON A REGRESSION PARAMETER Then t = (b2 . From the ANOVA table the F-test statistic is 4.302653 So do not reject null hypothesis at level . Here FDIST(4.2.569. .569733 Using the p-value approach p-value = TDIST(1.TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE") There are 5 observations and 3 regressors (intercept and x) so we use t(5-3)=t(2).1) / 0.569 The critical value is t_.

. . such asheteroskedastic-robust or autocorrelation-robust standard errors and t-statistics and p-values.yhat = b1 + b2 x2 + b3 x3 = 0.376176 EXCEL LIMITATIONS Excel restricts the number of regressors (only up to 16 regressors ??). SAS. e. is needed. . Excel standard errors and t-statistics and p-values are based on the assumption that the error is independent with constant variance (homoskedastic). EVIEWS.3365×4 + 0.g. If the regressors are in columns B and D you need to copy at least one of columns B and D so that they are adjacent to each other.88966 + 0. LIMDEP. You may need to move columns to ensure this.. More specialized software such as STATA. Excel does not provide alternaties. PC-TSP.37006 2. Excel requires that all the regressor variables be in adjoining columns.0021×64 = 2.
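The parameter test and the prediction at x = 4 from this section can be sketched in a few lines. One loud assumption: `two_sided_p_t2` uses a closed form for the t distribution that is valid only for exactly 2 degrees of freedom, which is the case here (n-k = 2). It stands in for TDIST(|t|, 2, 2) and does not generalize to other degrees of freedom. Coefficients are the rounded values from the output, so results agree with Excel to about four decimals.

```python
import math

b1, b2, b3 = 0.896552, 0.336468, 0.00209   # coefficients from the output
se_b2 = 0.422704                            # standard error of b2
t_crit = 4.302653                           # TINV(0.05, 2)


def two_sided_p_t2(t):
    """Two-sided p-value for a t statistic with exactly 2 degrees of freedom."""
    return 1 - abs(t) / math.sqrt(2 + t * t)


# Test H0: beta2 = 1.0
t_stat = (b2 - 1.0) / se_b2
p_value = two_sided_p_t2(t_stat)
reject_by_p = p_value < 0.05
reject_by_critical_value = abs(t_stat) > t_crit

# Test of statistical significance for HH SIZE (H0: beta2 = 0)
p_zero_slope = two_sided_p_t2(b2 / se_b2)

# Predicted CARS for a household of size 4
x = 4
y_hat = b1 + b2 * x + b3 * x ** 3

print(f"t = {t_stat:.4f}, p-value = {p_value:.4f}, reject H0: {reject_by_p}")
print(f"p-value for zero slope = {p_zero_slope:.4f}")
print(f"predicted CARS at HH SIZE = 4: {y_hat:.4f}")
```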