Multiple Regression Sample Model
Copyright 2005-2007. Dr. Johnathan Mun. All Rights Reserved.

Multiple Regression
This sample model illustrates how to use Risk Simulator (Enterprise Simulator) for:
1. Running a Multiple Regression Analysis
Model Background
File Name: Multiple Regression.xls
This example shows how a multiple regression can be run using Risk Simulator. The raw data are arranged in the Cross-Sectional Data worksheet,
which contains cross-sectional data on all 50 U.S. states on the number of aggravated assaults (in thousands) per year, the number of bachelor's degrees
awarded per year, police expenditure per capita, population size in millions, population density (persons per square mile), and unemployment
rate. The idea is to see if there is a relationship between the number of aggravated assaults per year and these explanatory variables using multiple
regression analysis.
Multiple Regression Analysis
To run this model, simply:
1. In the Cross-Sectional Data worksheet, select the area C5:H55.
2. Select Simulation | Forecasting | Multiple Regression.
3. Choose Aggravated Assault as the dependent variable in the regression and click on OK.
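
For readers who want to cross-check the fit outside Risk Simulator, the following is a minimal sketch of an ordinary least squares regression in plain Python. It is not Risk Simulator's implementation: the function name fit_ols, the variable names, and the choice to solve the normal equations by Gaussian elimination are all assumptions made for illustration. The data are transcribed from the data set listed in this document, with Aggravated Assault as the dependent variable.

```python
# Columns transcribed from the Multiple Regression Analysis Data Set (50 states).
assault = [
    521, 367, 443, 365, 614, 385, 286, 397, 764, 427,
    153, 231, 524, 328, 240, 286, 285, 569, 96, 498,
    481, 468, 177, 198, 458, 108, 246, 291, 68, 311,
    606, 512, 426, 47, 265, 370, 312, 222, 280, 759,
    114, 419, 435, 186, 87, 188, 303, 102, 127, 251,
]
bachelors = [
    18308, 1148, 18068, 7729, 100484, 16728, 14630, 4008, 38927, 22322,
    3711, 3136, 50508, 28886, 16996, 13035, 12973, 16309, 5227, 19235,
    44487, 44213, 23619, 9106, 24917, 3872, 8945, 2373, 7128, 23624,
    5242, 92629, 28795, 4487, 48799, 14067, 12693, 62184, 9153, 14250,
    3680, 18063, 65112, 11340, 4553, 28960, 19201, 7533, 26343, 1641,
]
police = [
    185, 600, 372, 142, 432, 290, 346, 328, 354, 266,
    320, 197, 266, 173, 190, 239, 190, 241, 189, 358,
    315, 303, 228, 134, 189, 196, 183, 417, 233, 349,
    284, 499, 231, 143, 249, 195, 288, 229, 287, 224,
    161, 221, 237, 220, 185, 260, 261, 118, 268, 300,
]
population = [
    4.041, 0.55, 3.665, 2.351, 29.76, 3.294, 3.287, 0.666, 12.938, 6.478,
    1.108, 1.007, 11.431, 5.544, 2.777, 2.478, 3.685, 4.22, 1.228, 4.781,
    6.016, 9.295, 4.375, 2.573, 5.117, 0.799, 1.578, 1.202, 1.109, 7.73,
    1.515, 17.99, 6.629, 0.639, 10.847, 3.146, 2.842, 11.882, 1.003, 3.487,
    0.696, 4.877, 16.987, 1.723, 0.563, 6.187, 4.867, 1.793, 4.892, 0.454,
]
density = [
    79.6, 1, 32.3, 45.1, 190.8, 31.8, 678.4, 340.8, 239.6, 111.9,
    172.5, 12.2, 205.6, 154.6, 49.7, 30.3, 92.8, 96.9, 39.8, 489.2,
    767.6, 163.6, 55, 54.9, 74.3, 5.5, 20.5, 10.9, 123.7, 1042,
    12.5, 381, 136.1, 9.3, 264.9, 45.8, 29.6, 265.1, 960.3, 115.8,
    9.2, 118.3, 64.9, 21, 60.8, 156.3, 73.1, 74.5, 90.1, 4.7,
]
unemployment = [
    7.2, 8.5, 5.7, 7.3, 7.5, 5, 6.7, 6.2, 7.3, 5,
    2.8, 6.1, 7.1, 5.9, 4.6, 4.4, 7.4, 7.1, 7.5, 5.9,
    9, 9.2, 5.1, 8.6, 6.6, 6.9, 2.7, 5.5, 7.2, 6.6,
    6.9, 7.2, 5.8, 4.1, 6.4, 6.7, 6, 6.9, 8.5, 6.2,
    3.4, 6.6, 6.6, 4.9, 6.4, 5.8, 6.3, 10.5, 5.4, 5.1,
]


def fit_ols(y, columns):
    """Hypothetical helper: OLS with an intercept via the normal equations."""
    # Rescale each regressor to magnitude <= 1 so the linear system is better conditioned.
    scales = [max(abs(v) for v in col) for col in columns]
    rows = [[1.0] + [col[i] / s for col, s in zip(columns, scales)]
            for i in range(len(y))]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
           + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(aug[r][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for r in range(k + 1, p):
            factor = aug[r][k] / aug[k][k]
            for j in range(k, p + 1):
                aug[r][j] -= factor * aug[k][j]
    # Back substitution.
    beta = [0.0] * p
    for k in reversed(range(p)):
        tail = sum(aug[k][j] * beta[j] for j in range(k + 1, p))
        beta[k] = (aug[k][p] - tail) / aug[k][k]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    # Undo the rescaling so slopes are expressed in the original units.
    coefficients = [beta[0]] + [b / s for b, s in zip(beta[1:], scales)]
    return coefficients, fitted


coefficients, fitted = fit_ols(
    assault, [bachelors, police, population, density, unemployment])
mean_y = sum(assault) / len(assault)
ss_total = sum((y - mean_y) ** 2 for y in assault)
ss_residual = sum((y - f) ** 2 for y, f in zip(assault, fitted))
r_squared = 1 - ss_residual / ss_total
print("Coefficients (intercept first):", [round(c, 4) for c in coefficients])
print("R-Squared:", round(r_squared, 4))
```

If the transcription is faithful, the printed R-Squared and coefficients should land close to the values in the Regression Analysis Report. Small differences can come from rounding in the listed data.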

Model Results Analysis

Note that more advanced regressions, such as regressions with lagged regressors, stepwise regression, and nonlinear regressions, can also be run using Risk Simulator.
For details on running such regressions as well as interpreting the results, refer to Modeling Risk (Wiley, 2006) by Dr. Johnathan Mun.
Results Summary
Refer to the Report worksheet for details on the regression output. The report has more details on the interpretation of specific statistical results.
It provides the following elements: multiple regression and analysis of variance output, including coefficients of determination, hypothesis
test results (single variable t-test and multiple variable F-test), computed coefficients for each regressor, fitted chart, and much more.
Disclaimer
DEVELOPER SPECIFICALLY DISCLAIMS ALL OTHER WARRANTIES, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. As standard practice for software development and end-user applications, it is important for
the Licensee to note that the valuation results attached herein are accurate to the software Developer's best knowledge and are solely based on the information furnished
by the Licensee or end-user. While the software Developer has used his best efforts in preparing this report, he makes no representations or warranties with respect to
the accuracy or completeness of the contents of this model and specifically disclaims any implied warranties of merchantability or fitness for a particular purpose. The
Licensee hereby agrees that the Developer is not held liable for any loss of profit or any other commercial damages, including, but not limited to, special, incidental,
consequential or other damages. This model is only an illustration of using the software and in no way represents the correct and complete picture of an investor's
investment, risk, and return profile. The user is advised to take great care in using and interpreting the model and its results. Model was developed by Dr. Johnathan Mun
of www.realoptionsvaluation.com.

Copyright 2005-2007. Real Options Valuation, Inc. (www.realoptionsvaluation.com)

Multiple Regression Analysis Data Set

Aggravated  Bachelor's  Police Expenditure  Population   Population Density  Unemployment
Assault     Degree      Per Capita          in Millions  (Persons/Sq Mile)   Rate
521         18308       185                 4.041        79.6                7.2
367         1148        600                 0.55         1                   8.5
443         18068       372                 3.665        32.3                5.7
365         7729        142                 2.351        45.1                7.3
614         100484      432                 29.76        190.8               7.5
385         16728       290                 3.294        31.8                5
286         14630       346                 3.287        678.4               6.7
397         4008        328                 0.666        340.8               6.2
764         38927       354                 12.938       239.6               7.3
427         22322       266                 6.478        111.9               5
153         3711        320                 1.108        172.5               2.8
231         3136        197                 1.007        12.2                6.1
524         50508       266                 11.431       205.6               7.1
328         28886       173                 5.544        154.6               5.9
240         16996       190                 2.777        49.7                4.6
286         13035       239                 2.478        30.3                4.4
285         12973       190                 3.685        92.8                7.4
569         16309       241                 4.22         96.9                7.1
96          5227        189                 1.228        39.8                7.5
498         19235       358                 4.781        489.2               5.9
481         44487       315                 6.016        767.6               9
468         44213       303                 9.295        163.6               9.2
177         23619       228                 4.375        55                  5.1
198         9106        134                 2.573        54.9                8.6
458         24917       189                 5.117        74.3                6.6
108         3872        196                 0.799        5.5                 6.9
246         8945        183                 1.578        20.5                2.7
291         2373        417                 1.202        10.9                5.5
68          7128        233                 1.109        123.7               7.2
311         23624       349                 7.73         1042                6.6
606         5242        284                 1.515        12.5                6.9
512         92629       499                 17.99        381                 7.2
426         28795       231                 6.629        136.1               5.8
47          4487        143                 0.639        9.3                 4.1
265         48799       249                 10.847       264.9               6.4
370         14067       195                 3.146        45.8                6.7
312         12693       288                 2.842        29.6                6
222         62184       229                 11.882       265.1               6.9
280         9153        287                 1.003        960.3               8.5
759         14250       224                 3.487        115.8               6.2
114         3680        161                 0.696        9.2                 3.4
419         18063       221                 4.877        118.3               6.6
435         65112       237                 16.987       64.9                6.6
186         11340       220                 1.723        21                  4.9
87          4553        185                 0.563        60.8                6.4
188         28960       260                 6.187        156.3               5.8
303         19201       261                 4.867        73.1                6.3
102         7533        118                 1.793        74.5                10.5
127         26343       268                 4.892        90.1                5.4
251         1641        300                 0.454        4.7                 5.1

Regression Analysis Report


Regression Statistics
R-Squared (Coefficient of Determination)         0.3272
Adjusted R-Squared                               0.2508
Multiple R (Multiple Correlation Coefficient)    0.5720
Standard Error of the Estimates (SEy)            149.6720
Observations (n)                                 50

The R-Squared, or Coefficient of Determination, indicates that 0.33 (about 33%) of the variation in the dependent variable can be explained and accounted for by the independent variables in this
regression analysis. However, in a multiple regression, the Adjusted R-Squared takes into account the existence of additional independent variables or regressors and adjusts this R-Squared value to a more accurate view of the regression's explanatory power. Hence, only 0.25 (about 25%) of the variation in the dependent variable can be explained by the regressors.
The Multiple Correlation Coefficient (Multiple R) measures the correlation between the actual dependent variable (Y) and the estimated or fitted values of Y based on the regression
equation. This is also the square root of the Coefficient of Determination (R-Squared).
The Standard Error of the Estimates (SEy) describes the dispersion of data points above and below the regression line or plane. This value is used as part of the calculation to obtain
the confidence interval of the estimates later.
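
To make the relationships among these statistics concrete, the sketch below recomputes them from the sums of squares reported in the Analysis of Variance section of this report. The variable names are illustrative, and the formulas are the standard textbook definitions, which are assumed (not confirmed) to be what Risk Simulator uses internally.

```python
import math

# Values copied from the Analysis of Variance section of this report.
ss_regression = 479388.4898
ss_residual = 985675.1902
ss_total = 1465063.6800
n_obs = 50         # number of observations
k_regressors = 5   # independent variables, excluding the intercept

df_residual = n_obs - k_regressors - 1  # 44, as in the Degrees of Freedom table

# Share of total variation explained by the regression.
r_squared = ss_regression / ss_total
# Penalize R-Squared for the number of regressors used.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
# Correlation between actual and fitted Y.
multiple_r = math.sqrt(r_squared)
# Typical size of a residual, in units of Y.
se_y = math.sqrt(ss_residual / df_residual)

print(f"R-Squared:          {r_squared:.4f}")
print(f"Adjusted R-Squared: {adj_r_squared:.4f}")
print(f"Multiple R:         {multiple_r:.4f}")
print(f"SEy:                {se_y:.4f}")
```

The adjustment shrinks R-Squared more when many regressors are used relative to the number of observations, which is why the adjusted figure is the lower of the two here.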

Regression Results

                  Intercept   Bachelor's   Police Expenditure   Population    Population Density   Unemployment
                              Degree       Per Capita           in Millions   (Persons/Sq Mile)    Rate
Coefficients      57.9555     -0.0035      0.4644               25.2377       -0.0086              16.5579
Standard Error    108.7901    0.0035       0.2535               14.1172       0.1016               14.7996
t-Statistic       0.5327      -1.0066      1.8316               1.7877        -0.0843              1.1188
p-Value           0.5969      0.3197       0.0738               0.0807        0.9332               0.2693
Lower 5%          -161.2966   -0.0106      -0.0466              -3.2137       -0.2132              -13.2687
Upper 95%         277.2076    0.0036       0.9753               53.6891       0.1961               46.3845

Degrees of Freedom
Degrees of Freedom for Regression    5
Degrees of Freedom for Residual      44
Total Degrees of Freedom             49

Hypothesis Test
Critical t-Statistic (99% confidence with df of 44)    2.6923
Critical t-Statistic (95% confidence with df of 44)    2.0154
Critical t-Statistic (90% confidence with df of 44)    1.6802

The Coefficients provide the estimated regression intercept and slopes. For instance, the coefficients are estimates of the true population $\beta$ values in the following regression
equation: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$. The Standard Error measures how accurate the predicted Coefficients are, and the t-Statistics are the ratios of each predicted Coefficient to
its Standard Error.
The t-Statistic is used in hypothesis testing, where we set the null hypothesis (Ho) such that the real mean of the Coefficient = 0, and the alternate hypothesis (Ha) such that the real
mean of the Coefficient is not equal to 0. A t-test is performed and the calculated t-Statistic is compared to the critical values at the relevant Degrees of Freedom for Residual. The
t-test is very important as it determines whether each of the coefficients is statistically significant in the presence of the other regressors. This means that the t-test statistically verifies
whether a regressor or independent variable should remain in the regression or be dropped.
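
As a small illustration of the ratio described above, the sketch below recomputes each t-Statistic from the Coefficients and Standard Errors in the Regression Results table and compares it against the critical values in the Hypothesis Test table. The function names are hypothetical. Because the table values are rounded to four decimals, the recomputed ratios can differ slightly from the reported t-Statistics, most visibly for very small coefficients such as Bachelor's Degree.

```python
# (coefficient, standard error) pairs copied from the Regression Results table.
results = {
    "Intercept": (57.9555, 108.7901),
    "Bachelor's Degree": (-0.0035, 0.0035),
    "Police Expenditure Per Capita": (0.4644, 0.2535),
    "Population in Millions": (25.2377, 14.1172),
    "Population Density": (-0.0086, 0.1016),
    "Unemployment Rate": (16.5579, 14.7996),
}

# Critical t-Statistics at 44 residual degrees of freedom, from the report.
critical_t = {"90%": 1.6802, "95%": 2.0154, "99%": 2.6923}


def t_statistic(coefficient, standard_error):
    """Ratio of a coefficient to its standard error."""
    return coefficient / standard_error


def significant_levels(t_stat):
    """Confidence levels at which |t| exceeds the critical value."""
    return [level for level, crit in critical_t.items() if abs(t_stat) > crit]


for name, (coef, se) in results.items():
    t = t_statistic(coef, se)
    levels = significant_levels(t) or ["none"]
    print(f"{name:32s} t = {t:8.4f}  significant at: {', '.join(levels)}")
```

The comparison uses the absolute value of the t-Statistic, since the test is two-sided: a large negative ratio is as much evidence against a zero coefficient as a large positive one.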

The Coefficient is statistically significant if the absolute value of its calculated t-Statistic exceeds the Critical t-Statistic at the relevant degrees of freedom (df). The three main confidence levels used to
test for significance are 90%, 95%, and 99%. Alternatively, the p-Value gives the probability of observing a t-Statistic at least this extreme if the true Coefficient were zero, which means that the smaller the p-Value, the more significant the Coefficient. The usual significance levels for the p-Value are 0.01, 0.05, and
0.10, corresponding to the 99%, 95%, and 90% confidence levels.
The Coefficients with their p-Values highlighted in blue indicate that they are statistically significant at the 90% confidence or 0.10 alpha level, while those highlighted in red indicate
that they are not statistically significant at any of these alpha levels.
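
The p-Value rule can be sketched in a few lines. The example below takes the p-Values from the Regression Results table and reports the strictest of the three usual alpha levels that each one satisfies. The helper name strictest_alpha is an assumption for illustration, not part of Risk Simulator.

```python
# p-Values copied from the Regression Results table.
p_values = {
    "Intercept": 0.5969,
    "Bachelor's Degree": 0.3197,
    "Police Expenditure Per Capita": 0.0738,
    "Population in Millions": 0.0807,
    "Population Density": 0.9332,
    "Unemployment Rate": 0.2693,
}

# Usual alpha levels, strictest first (99%, 95%, 90% confidence).
alphas = (0.01, 0.05, 0.10)


def strictest_alpha(p_value):
    """Smallest alpha the p-Value falls under, or None if not significant."""
    for alpha in alphas:
        if p_value <= alpha:
            return alpha
    return None


significant = [name for name, p in p_values.items()
               if strictest_alpha(p) is not None]

for name, p in p_values.items():
    alpha = strictest_alpha(p)
    verdict = f"significant at alpha = {alpha}" if alpha else "not significant"
    print(f"{name:32s} p = {p:.4f}  {verdict}")
```

With these values, only Police Expenditure Per Capita and Population in Millions fall under the 0.10 level, and none of the regressors falls under 0.05.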

Analysis of Variance

              Sums of Squares   Mean of Squares   F-Statistic   p-Value
Regression    479388.4898       95877.6980        4.2799        0.0029
Residual      985675.1902       22401.7089
Total         1465063.6800

Hypothesis Test
Critical F-statistic (99% confidence with df of 5 and 44)    3.4651
Critical F-statistic (95% confidence with df of 5 and 44)    2.4270
Critical F-statistic (90% confidence with df of 5 and 44)    1.9828

The Analysis of Variance (ANOVA) table provides an F-test of the regression model's overall statistical significance. Instead of looking at individual regressors as in the t-test, the F-test looks at all the estimated Coefficients' statistical properties jointly. The F-Statistic is calculated as the ratio of the Regression's Mean of Squares to the Residual's Mean of Squares.
The numerator measures how much of the variation is explained by the regression, while the denominator measures how much is unexplained. Hence, the larger the F-Statistic, the more significant
the model. The corresponding p-Value is calculated to test the null hypothesis (Ho) where all the slope Coefficients are simultaneously equal to zero, versus the alternate hypothesis (Ha)
that at least one of them is different from zero, indicating a significant overall regression model. If the p-Value is smaller than the 0.01, 0.05, or 0.10 alpha significance level, then the
regression is significant at that level. The same approach can be applied to the F-Statistic by comparing the calculated F-Statistic with the critical F values at various significance levels.
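
The F-Statistic calculation can be reproduced from the Analysis of Variance table. The sketch below divides each sum of squares by its degrees of freedom (5 for the regression and 44 for the residual, as listed in the Degrees of Freedom table), forms the ratio, and compares it with the critical F values reported above. Variable names are illustrative only.

```python
# Values copied from the Analysis of Variance and Degrees of Freedom tables.
ss_regression = 479388.4898
ss_residual = 985675.1902
df_regression = 5
df_residual = 44

# Mean of Squares = Sum of Squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F-Statistic = explained variation per df / unexplained variation per df.
f_statistic = ms_regression / ms_residual

# Critical F values from the report's Hypothesis Test table.
critical_f = {"90%": 1.9828, "95%": 2.4270, "99%": 3.4651}
exceeded = [level for level, crit in critical_f.items() if f_statistic > crit]

print(f"Mean of Squares (Regression): {ms_regression:.4f}")
print(f"Mean of Squares (Residual):   {ms_residual:.4f}")
print(f"F-Statistic:                  {f_statistic:.4f}")
print("Exceeds critical F at:", ", ".join(exceeded) or "none")
```

Here the calculated F-Statistic exceeds all three critical values, which agrees with the reported p-Value of 0.0029 being below 0.01: the model as a whole is significant even though no individual regressor is significant at the 95% level.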

Forecasting

Period  Actual (Y)  Forecast (F)  Error (E)
1       521         299.5124      221.4876
2       367         487.1243      (120.1243)
3       443         353.2789      89.7211
4       365         276.3296      88.6704
5       614         776.1336      (162.1336)
6       385         298.9993      86.0007
7       286         354.8718      (68.8718)
8       397         312.6155      84.3845
9       764         529.7550      234.2450
10      427         347.7034      79.2966
11      153         266.2526      (113.2526)
12      231         264.6375      (33.6375)
13      524         406.8009      117.1991
14      328         272.2226      55.7774
15      240         231.7882      8.2118
16      286         257.8862      28.1138
17      285         314.9521      (29.9521)
18      569         335.3140      233.6860
19      96          282.0356      (186.0356)
20      498         370.2062      127.7938
21      481         340.8742      140.1258
22      468         427.5118      40.4882
23      177         274.5298      (97.5298)
24      198         294.7795      (96.7795)
25      458         295.2180      162.7820
26      108         269.6195      (161.6195)
27      246         195.5955      50.4045
28      291         364.5004      (73.5004)
29      68          287.0426      (219.0426)
30      311         431.7568      (120.7568)
31      606         323.6399      282.3601
32      512         531.4356      (19.4356)
33      426         325.3641      100.6359
34      47          192.3960      (145.3960)
35      265         378.1250      (113.1250)
36      370         288.6064      81.3936
37      312         317.5374      (5.5374)
38      222         355.8075      (133.8075)
39      280         316.6280      (36.6280)
40      759         301.1523      457.8477
41      114         193.4630      (79.4630)
42      419         327.9304      91.0696
43      435         474.7332      (39.7332)
44      186         244.3734      (58.3734)
45      87          247.3897      (160.3897)
46      188         326.9181      (138.9181)
47      303         337.6402      (34.6402)
48      102         304.5301      (202.5301)
49      127         301.1671      (174.1671)
50      251         287.3149      (36.3149)
