
Regression Applications and Minitab Output
Simple Regression
 An Insurance Example
– An insurance company suspects that distance from the nearest fire station is a good predictor of the damage incurred if there is a house fire. A sample of claims over the past year yielded the following data:

Claim  Distance  Damage
    1       3.4    26.2
    2       1.8    17.8
    3       4.6    31.3
    4       2.3    23.1
    5       3.1    27.5
    6       5.5    36.0
    7       0.7    14.1
    8       3.0    22.3
    9       2.6    19.6
   10       4.3    31.3
   11       2.1    24.0
   12       1.1    17.3
   13       6.1    43.2
   14       4.8    36.4
   15       3.8    26.1
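As a rough cross-check on the Minitab fit, the least-squares line can be computed directly from the 15 claims. The sketch below uses only the data in the table; the function name `fit_simple_regression` is illustrative and not part of the original slides.

```python
# Fire-damage data copied from the table (15 claims).
distance = [3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0,
            2.6, 4.3, 2.1, 1.1, 6.1, 4.8, 3.8]
damage = [26.2, 17.8, 31.3, 23.1, 27.5, 36.0, 14.1, 22.3,
          19.6, 31.3, 24.0, 17.3, 43.2, 36.4, 26.1]


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of cross-products and corrected sum of squares of x.
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_simple_regression(distance, damage)
print(f"Damage = {intercept:.3f} + {slope:.4f} Distance")
```

The printed coefficients should agree with the Constant and Distance rows of the Minitab output, up to rounding.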
Scatter Plot of Fire Damage vs. Distance
Scatterplot with Regression Line
Simple Regression Example
Minitab Output
 Results for: Fire Damage_Regression.MTW
 Regression Analysis: Damage versus Distance
 The regression equation is
 Damage = 10.3 + 4.92 Distance
 Predictor Coef SE Coef T P
 Constant 10.278 1.420 7.24 0.000
 Distance 4.9193 0.3927 12.53 0.000

 S = 2.31635 R-Sq = 92.3% R-Sq(adj) = 91.8%


 Analysis of Variance
 Source DF SS MS F P
 Regression 1 841.77 841.77 156.89 0.000
 Residual Error 13 69.75 5.37
 Total 14 911.52

 Note: The model is significant (F = 156.89, P = 0.000) and R-Sq is high (92.3%), indicating a strong linear relationship between Damage and Distance.
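The summary statistics in the output follow from the Analysis of Variance table. As a small sketch, the arithmetic below recomputes MS, F, S, R-Sq and R-Sq(adj) from the printed SS and DF values; because the inputs are rounded, the results match the output only to within rounding.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression, df_regression = 841.77, 1
ss_error, df_error = 69.75, 13
ss_total, df_total = 911.52, 14

ms_regression = ss_regression / df_regression   # MS = SS / DF
ms_error = ss_error / df_error                  # residual mean square
f_statistic = ms_regression / ms_error          # F = MS(Regression) / MS(Error)
s = math.sqrt(ms_error)                         # S = square root of MS(Error)
r_sq = ss_regression / ss_total                 # R-Sq = SS(Regression) / SS(Total)
r_sq_adj = 1 - ms_error / (ss_total / df_total) # R-Sq(adj) penalizes for DF used

print(f"F = {f_statistic:.2f}, S = {s:.4f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```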


Minitab Output
Residual Plots
[Figure: Residual Plots for Damage. Four panels: Normal Probability Plot of the Residuals (Percent vs. Residual); Residuals Versus the Fitted Values; Histogram of the Residuals (Frequency vs. Residual); Residuals Versus the Order of the Data (Residual vs. Observation Order).]
Multiple Regression Example
 Handout example of estimating time (man-hours)
required to erect boiler drums in future projects
– 4 independent variables
– 2 quantitative variables (x1 = capacity, x2 = pressure)
– 2 qualitative variables (x3 = boiler type, x4 = drum type)
– Each qualitative variable has two possibilities
 x3 = 1 if industrial, x3 = 0 if utility
 x4 = 1 if steam, x4 = 0 if mud
 Utility and Mud are the base cases
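The 0/1 coding of the two qualitative variables can be sketched as follows. The function name `encode_boiler` and the lowercase string labels are assumptions made for illustration; only the coding rule itself comes from the slide.

```python
def encode_boiler(boiler_type, drum_type):
    """Return (x3, x4) indicator values for one boiler.

    x3 = 1 if industrial, 0 if utility (utility is the base case).
    x4 = 1 if steam, 0 if mud (mud is the base case).
    """
    boiler_codes = {"industrial": 1, "utility": 0}
    drum_codes = {"steam": 1, "mud": 0}
    if boiler_type not in boiler_codes:
        raise ValueError(f"unknown boiler type: {boiler_type!r}")
    if drum_type not in drum_codes:
        raise ValueError(f"unknown drum type: {drum_type!r}")
    return boiler_codes[boiler_type], drum_codes[drum_type]


# The base case (utility boiler, mud drum) maps to (0, 0).
base_case = encode_boiler("utility", "mud")
industrial_steam = encode_boiler("industrial", "steam")
print(base_case, industrial_steam)
```

With this coding, the coefficient on each indicator is the estimated difference in man-hours relative to the base case, holding the other predictors fixed.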
Multiple Regression Example
Minitab Output
[Figure: Scatterplot of ManHrs vs Capacity. Vertical axis: ManHrs (0 to 16000); horizontal axis: Capacity (0 to 1200000).]

Man-hours and Capacity appear to be positively related


More Minitab Output
[Figure: Scatterplot of ManHrs vs Pressure. Vertical axis: ManHrs (0 to 16000); horizontal axis: Pressure (0 to 2000).]

Man-hours and Pressure appear to be positively related


Minitab Output Continued
[Figure: Scatterplot of ManHrs vs BoilType. Vertical axis: ManHrs (0 to 16000); horizontal axis: BoilType (0 or 1).]

Man-hours and boiler type appear to be negatively related


Minitab Output Continued
[Figure: Scatterplot of ManHrs vs DrumType. Vertical axis: ManHrs (0 to 16000); horizontal axis: DrumType (0 or 1).]

Man-hours and Drum type appear to be positively related


Minitab Output
 Regression Analysis: ManHrs versus Capacity, Pressure, ...

 The regression equation is


 ManHrs = - 3783 + 0.00875 Capacity + 1.93 Pressure + 3444 BoilType + 2093 DrumType

 Predictor Coef SE Coef T P


 Constant -3783 1205 -3.14 0.004
 Capacity 0.008749 0.0009035 9.68 0.000
 Pressure 1.9265 0.6489 2.97 0.006
 BoilType 3444.3 911.7 3.78 0.001
 DrumType 2093.4 305.6 6.85 0.000

 S = 894.603 R-Sq = 90.3% R-Sq(adj) = 89.0%

 Analysis of Variance
 Source DF SS MS F P
 Regression 4 230854854 57713714 72.11 0.000
 Residual Error 31 24809761 800315
 Total 35 255664615

 Source DF Seq SS
 Capacity 1 175007141
 Pressure 1 490357
 BoilType 1 17813091
 DrumType 1 37544266

 Unusual Observations
 Obs Capacity ManHrs Fit SE Fit Residual St Resid
 19 1089490 14791 12022 523 2769 3.81R

 R denotes an observation with a large standardized residual.
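To illustrate how the fitted equation would be used for a future project, the sketch below plugs predictor values into the coefficients from the Predictor table. The input values (capacity 500000, pressure 1000, utility boiler, steam drum) are hypothetical and chosen only for illustration; they are not taken from the data set.

```python
# Coefficients copied from the Predictor table of the four-variable model.
COEFFICIENTS = {
    "Constant": -3783.0,
    "Capacity": 0.008749,
    "Pressure": 1.9265,
    "BoilType": 3444.3,   # 1 = industrial, 0 = utility
    "DrumType": 2093.4,   # 1 = steam, 0 = mud
}


def predict_man_hours(capacity, pressure, boil_type, drum_type):
    """Evaluate the fitted regression equation for one boiler."""
    return (COEFFICIENTS["Constant"]
            + COEFFICIENTS["Capacity"] * capacity
            + COEFFICIENTS["Pressure"] * pressure
            + COEFFICIENTS["BoilType"] * boil_type
            + COEFFICIENTS["DrumType"] * drum_type)


# Hypothetical project: utility boiler (0) with a steam drum (1).
estimate = predict_man_hours(capacity=500_000, pressure=1_000,
                             boil_type=0, drum_type=1)
print(f"Estimated man-hours: {estimate:.0f}")
```

A point estimate like this carries uncertainty on the order of S (about 895 man-hours), so it is best read as a rough planning figure.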


Residual Plots
[Figure: Residual Plots for ManHrs. Four panels: Normal Probability Plot of the Residuals (Percent vs. Residual); Residuals Versus the Fitted Values; Histogram of the Residuals (Frequency vs. Residual); Residuals Versus the Order of the Data (Residual vs. Observation Order).]
Stepwise Regression
 Stepwise Regression: ManHrs versus Capacity, Pressure, ...
 Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15
 Response is ManHrs on 4 predictors, with N = 36
 Step               1        2        3        4
 Constant      1760.3    652.9   -949.3  -3783.4

 Capacity     0.00795  0.00750  0.00916  0.00875
 T-Value         8.59    11.84     9.20     9.68
 P-Value        0.000    0.000    0.000    0.000

 DrumType                 2249     2233     2093
 T-Value                  6.36     6.63     6.85
 P-Value                 0.000    0.000    0.000

 BoilType                          1390     3444
 T-Value                           2.10     3.78
 P-Value                          0.044    0.001

 Pressure                                   1.93
 T-Value                                    2.97
 P-Value                                   0.006

 S               1540     1048      998      895
 R-Sq           68.45    85.82    87.54    90.30
 R-Sq(adj)      67.52    84.96    86.37    89.04
 Mallows C-p     68.8     15.3     11.8      5.0
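The R-Sq(adj) row can be recovered from the R-Sq row with the usual adjustment for the number of predictors, R-Sq(adj) = 1 - (1 - R-Sq)(n - 1)/(n - k - 1), where n = 36 and k is the number of predictors at that step. A minimal sketch using the printed values (results agree with the output to within rounding):

```python
N_OBS = 36  # from "with N = 36" in the stepwise output

# (number of predictors in the model, R-Sq in percent) for steps 1 to 4.
steps = [(1, 68.45), (2, 85.82), (3, 87.54), (4, 90.30)]


def adjusted_r_sq(r_sq_percent, n_predictors, n_obs=N_OBS):
    """Return adjusted R-squared in percent."""
    r_sq = r_sq_percent / 100
    adj = 1 - (1 - r_sq) * (n_obs - 1) / (n_obs - n_predictors - 1)
    return 100 * adj


adjusted = [adjusted_r_sq(r_sq, k) for k, r_sq in steps]
for (k, r_sq), adj in zip(steps, adjusted):
    print(f"{k} predictor(s): R-Sq = {r_sq:.2f}, R-Sq(adj) = {adj:.2f}")
```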
Best Subsets Regression
 Best Subsets Regression: ManHrs versus Capacity, Pressure, ...
 Response is ManHrs
 Vars  R-Sq  R-Sq(adj)  Mallows C-p       S  Capacity  Pressure  BoilType  DrumType
    1  68.5       67.5         68.8  1540.2      X
    1  43.4       41.8        148.7  2062.7                X
    2  85.8       85.0         15.3  1048.1      X                             X
    2  70.4       68.6         64.5  1513.8      X                   X
    3  87.5       86.4         11.8  997.87      X                   X         X
    3  85.8       84.5         17.3  1064.1      X         X                   X
    4  90.3       89.0          5.0  894.60      X         X         X         X

Correlation Matrix
 Correlations: ManHrs, Capacity, Pressure, BoilType, DrumType

            ManHrs  Capacity  Pressure  BoilType
 Capacity    0.827
             0.000

 Pressure    0.659     0.762
             0.000     0.000

 BoilType   -0.575    -0.797    -0.902
             0.000     0.000     0.000

 DrumType    0.506     0.111     0.138    -0.075
             0.002     0.520     0.421     0.665

 Cell Contents: Pearson correlation
                P-Value
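The correlations among the predictors point to multicollinearity. As a rough screen, the sketch below flags predictor pairs whose absolute correlation exceeds 0.7; that cutoff is a common rule of thumb assumed here for illustration, not a value given in the slides.

```python
# Pairwise correlations among the predictors, copied from the matrix.
predictor_correlations = {
    ("Capacity", "Pressure"): 0.762,
    ("Capacity", "BoilType"): -0.797,
    ("Pressure", "BoilType"): -0.902,
    ("Capacity", "DrumType"): 0.111,
    ("Pressure", "DrumType"): 0.138,
    ("BoilType", "DrumType"): -0.075,
}

THRESHOLD = 0.7  # assumed rule-of-thumb cutoff


def highly_correlated(correlations, threshold=THRESHOLD):
    """Return the predictor pairs with |r| above the threshold."""
    return [pair for pair, r in correlations.items() if abs(r) > threshold]


flagged = highly_correlated(predictor_correlations)
for a, b in flagged:
    print(f"{a} and {b}: r = {predictor_correlations[(a, b)]}")
```

Capacity, Pressure and BoilType are all strongly intercorrelated, while DrumType is nearly uncorrelated with the other three, which is consistent with a reduced model built on Capacity and DrumType.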

Final Regression Model
 Regression Analysis: ManHrs versus Capacity, DrumType
 The regression equation is
 ManHrs = 653 + 0.00750 Capacity + 2249 DrumType

 Predictor Coef SE Coef T P


 Constant 652.9 317.9 2.05 0.048
 Capacity 0.0074998 0.0006334 11.84 0.000
 DrumType 2248.9 353.7 6.36 0.000

 S = 1048.10 R-Sq = 85.8% R-Sq(adj) = 85.0%

 Analysis of Variance
 Source DF SS MS F P
 Regression 2 219413720 109706860 99.87 0.000
 Residual Error 33 36250895 1098512
 Total 35 255664615

 Source DF Seq SS
 Capacity 1 175007141
 DrumType 1 44406579

 Unusual Observations
 Obs Capacity ManHrs Fit SE Fit Residual St Resid
 19 1089490 14791 11073 526 3718 4.10RX
 35 610000 3211 5228 334 -2017 -2.03R

 R denotes an observation with a large standardized residual.


 X denotes an observation whose X value gives it large influence.
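The Fit and Residual columns for the two unusual observations can be reproduced from the final equation. The DrumType values are not printed in the table; the values used below (1 for observation 19, 0 for observation 35) are inferred as the ones that reproduce the printed fits, so treat them as an assumption.

```python
# Coefficients of the final model, from the Predictor table.
CONSTANT = 652.9
B_CAPACITY = 0.0074998
B_DRUMTYPE = 2248.9


def fitted_man_hours(capacity, drum_type):
    """Evaluate ManHrs = 652.9 + 0.0074998 Capacity + 2248.9 DrumType."""
    return CONSTANT + B_CAPACITY * capacity + B_DRUMTYPE * drum_type


# (obs, Capacity, observed ManHrs, DrumType inferred from the printed Fit)
unusual = [
    (19, 1_089_490, 14_791, 1),
    (35, 610_000, 3_211, 0),
]

results = {}
for obs, capacity, man_hours, drum_type in unusual:
    fit = fitted_man_hours(capacity, drum_type)
    residual = man_hours - fit          # residual = observed - fitted
    results[obs] = (fit, residual)
    print(f"Obs {obs}: Fit = {fit:.0f}, Residual = {residual:.0f}")
```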
Residual Analysis
[Figure: Residual Plots for ManHrs. Four panels: Normal Probability Plot of the Residuals (Percent vs. Residual); Residuals Versus the Fitted Values; Histogram of the Residuals (Frequency vs. Residual); Residuals Versus the Order of the Data (Residual vs. Observation Order).]
