Data (sales price and assessed value in $1000):
Property  Sales price  Assessed value  Sales price - Assessed value  LN(Sale Price)
1         167.9        152.7           15.2                          5.1233685640835
2         168          163.8           4.2                           5.12396397940326
3         155          167.6           -12.6                         5.04342511691925
4         158.5        127.3           31.2                          5.06575459331734
5         159.9        155.7           4.2                           5.07454861983991
6         162          169             -7.0                          5.08759633523238
7         165          187.1           -22.1                         5.10594547390058
8         174.5        153.6           20.9                          5.16192474164248
9         175          167.1           7.9                           5.16478597392351
10        159          148.9           10.1                          5.06890420222023
11        230          225.4           4.6                           5.4380793089232
12        230          170.4           59.6                          5.4380793089232
13        222.5        200.4           22.1                          5.4049271016063
14        225.5        209.6           15.9                          5.41832015894273
15        220          205.2           14.8                          5.39362754635236
16        216          220.9           -4.9                          5.37527840768417
17        215          194.9           20.1                          5.37063802812766
18        228          231.4           -3.4                          5.42934562895444
19        209          224.2           -15.2                         5.34233425196481
20        267          235.1           31.9                          5.58724865840025
21        283          303.9           -20.9                         5.64544689764324
22        269          233.7           35.3                          5.59471137960184
23        255          233.6           21.4                          5.54126354515843
24        285          234.2           50.8                          5.65248918026865
25        146          145.1           0.9                           4.98360662170834
26        128          108.3           19.7                          4.85203026391962
27        126.5        136.2           -9.7                          4.84024230816758
28        129.9        113.3           16.6                          4.86676492367655
29        150          121.4           28.6                          5.01063529409626
30        195          184             11.0                          5.27299955856375
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.911634743
R Square            0.831077905
Adjusted R Square   0.825044973
Standard Error      0.100872244
Observations        30
ANOVA
            df  SS                MS                 F               Significance F
Regression  1   1.40170517505172  1.40170517505172   137.7568833154  2.50828089E-12
Residual    28  0.28490587153883  0.010175209697815
Total       29  1.68661104659054
RESIDUAL OUTPUT
[Figure: Assessed value residual plot for LN(Sale Price)]
[Figure: Normal probability plot of LN(Sale Price) versus sample percentile]
PROBABILITY OUTPUT
[Figure: Assessed value line fit plot, LN(Sale Price) and Predicted LN(Sale Price) versus Assessed value]
a. How many have a selling price greater than the assessed value?
Answer: 22 of the 30 properties (73.33%) have a selling price greater than the assessed value, so sales prices are higher than assessed values in 73.33% of the cases.
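The count in part (a) can be checked mechanically from the "Sales price - Assessed value" column: a property sold above its assessed value exactly when that difference is positive. A minimal sketch, with the differences copied from the data table (variable names are illustrative):

```python
# Sales price minus assessed value ($1000) for properties 1-30, copied from the data table
differences = [15.2, 4.2, -12.6, 31.2, 4.2, -7.0, -22.1, 20.9, 7.9, 10.1,
               4.6, 59.6, 22.1, 15.9, 14.8, -4.9, 20.1, -3.4, -15.2, 31.9,
               -20.9, 35.3, 21.4, 50.8, 0.9, 19.7, -9.7, 16.6, 28.6, 11.0]

# A property sold above its assessment when its difference is strictly positive
above = sum(1 for d in differences if d > 0)
share = above / len(differences)
print(f"{above} of {len(differences)} properties ({share:.2%})")
```

Running it prints `22 of 30 properties (73.33%)`.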
b. Scatterplot
[Figure: scatterplot of Sales Price versus Assessed Value]
c. Report the R-Square value, standard error for the regression and the least-squares
regression line for predicting selling price from assessed value (M1)
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.911688603703238
R Square 0.831176110122359
Adjusted R Square 0.825146685483872
Standard Error 19.7279088762718
Observations 30
ANOVA
df SS
Regression 1 53651.1811183464
Residual 28 10897.3308816536
Total 29 64548.512
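As a cross-check on the M1 summary output, the least-squares line can be recomputed from the 30 (sales price, assessed value) pairs in the data table. This is a minimal standard-library sketch; the helper name `simple_ols` is our own and not part of the original workbook.

```python
from math import sqrt
from statistics import fmean

# (sales price, assessed value) in $1000 for properties 1-30, copied from the data table
data = [
    (167.9, 152.7), (168.0, 163.8), (155.0, 167.6), (158.5, 127.3), (159.9, 155.7),
    (162.0, 169.0), (165.0, 187.1), (174.5, 153.6), (175.0, 167.1), (159.0, 148.9),
    (230.0, 225.4), (230.0, 170.4), (222.5, 200.4), (225.5, 209.6), (220.0, 205.2),
    (216.0, 220.9), (215.0, 194.9), (228.0, 231.4), (209.0, 224.2), (267.0, 235.1),
    (283.0, 303.9), (269.0, 233.7), (255.0, 233.6), (285.0, 234.2), (146.0, 145.1),
    (128.0, 108.3), (126.5, 136.2), (129.9, 113.3), (150.0, 121.4), (195.0, 184.0),
]


def simple_ols(xs, ys):
    """Fit y = b0 + b1*x by least squares; return (b0, b1, R Square, standard error)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # residual SS
    sst = sum((y - y_bar) ** 2 for y in ys)                       # total SS
    return b0, b1, 1 - sse / sst, sqrt(sse / (len(xs) - 2))


prices = [price for price, _ in data]
assessed = [value for _, value in data]
b0, b1, r_square, std_error = simple_ols(assessed, prices)
print(f"Sales price = {b0:.3f} + {b1:.4f} * Assessed value")
print(f"R Square = {r_square:.4f}, Standard Error = {std_error:.3f}")
```

It should reproduce the Excel figures (R Square of about 0.8312, standard error of about 19.73) and gives the line Sales price = 21.499 + 0.9468 * Assessed value.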
RESIDUAL OUTPUT
[Figure: Assessed value residual plot (residuals versus Assessed value)]
According to the normal probability plot, the residuals generally cluster around a straight line. As a result, it can be concluded that the residuals are roughly normal.
[Figure: Normal probability plot (residuals versus sample percentile)]
f. Based on your answers to parts (b), (d), and (e), do the assumptions for the linear
regression analysis appear reasonable? Explain your answer.
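Assumptions:
i. Linearity: the relationship between the dependent and independent variables is described by a straight line.
ii. Independence: the independent variable and the residuals are independent.
iii. Normality: the errors are approximately normal.
iv. Homoscedasticity: the variance of the errors is constant across all levels of the independent variable.
Together with the scatterplot in (b), the residual plot (no clear relationship between the residuals and assessed value) and the normal probability plot (points roughly on a straight line) support these assumptions, so the assumptions for the linear regression analysis appear reasonable.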
g. Report the R-Square value, standard error for the regression and the least-squares regression line for the model predicting ln(selling price) from assessed value (M2).
Tell whether the estimated coefficients are significant at the level of α=0.05.
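For M2, R Square = 0.8311, the standard error of the regression is 0.1009, and the least-squares regression line is ln(Sales Price) = 4.3582 + 0.00484*(Assessed Value).
At the level of significance α=5%, both the intercept (β0) and the slope coefficient (β1) are significant (p-value of about 0.0% for each).
When the p-value < α, H0 is rejected.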
Which model better fits the data? Which one is suitable to use for prediction. Explain your answer.
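The coefficients of determination of the two models are close (0.8312 for M1 and 0.8311 for M2). However, when the significance of the coefficients is considered, model (M2) is preferred: both of its coefficients are significant, whereas the intercept of M1 is not.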
h. Use the better fit model to calculate the predicted selling prices for homes currently
assessed at $155,000, $220,000, and $285,000.
We use model M2, since it was judged the better model in part (g).
(M2) ln(Sales Price) = β0 + β1*(Assessed Value) + Error, estimated as ln(Sales Price) = 4.3582 + 0.00484*(Assessed Value)
Assessed Value ($1000)   ln(Sales Price)     Predicted Sales Price ($1000)
155                      5.10828389338818    165.386290694272
220                      5.42285482719691    226.524889702463
285                      5.73742576100563    310.26468662732
The predicted selling prices are therefore about $165,386, $226,525 and $310,265.
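Part (h) can be reproduced end to end: fit M2 by regressing the natural log of sales price on assessed value, then apply exp() to the predicted log price, as in the table above. A minimal standard-library sketch, assuming the 30 data pairs from the data table; variable names are illustrative.

```python
from math import exp, log
from statistics import fmean

# (sales price, assessed value) in $1000 for properties 1-30, copied from the data table
data = [
    (167.9, 152.7), (168.0, 163.8), (155.0, 167.6), (158.5, 127.3), (159.9, 155.7),
    (162.0, 169.0), (165.0, 187.1), (174.5, 153.6), (175.0, 167.1), (159.0, 148.9),
    (230.0, 225.4), (230.0, 170.4), (222.5, 200.4), (225.5, 209.6), (220.0, 205.2),
    (216.0, 220.9), (215.0, 194.9), (228.0, 231.4), (209.0, 224.2), (267.0, 235.1),
    (283.0, 303.9), (269.0, 233.7), (255.0, 233.6), (285.0, 234.2), (146.0, 145.1),
    (128.0, 108.3), (126.5, 136.2), (129.9, 113.3), (150.0, 121.4), (195.0, 184.0),
]

# M2: least-squares fit of ln(sales price) on assessed value
xs = [assessed for _, assessed in data]
ys = [log(price) for price, _ in data]
x_bar, y_bar = fmean(xs), fmean(ys)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Predict ln(price), then back-transform with exp() to get a price in $1000
predictions = {}
for assessed in (155, 220, 285):
    ln_price = b0 + b1 * assessed
    predictions[assessed] = exp(ln_price)
    print(f"{assessed}: ln(price) = {ln_price:.4f}, price = {predictions[assessed]:.1f}")
```

Carrying full-precision coefficients matters here: rounding the slope to 0.0048 would shift each prediction by several thousand dollars.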
Tell whether the estimated coefficients are significant at the level of α=0.05.
At the level of significance α=5%, the intercept (β0) of M1 is not significant (p-value of about 17%), whereas the slope coefficient (β1) is significant (p-value of about 0.0%).
When the p-value < α, H0 is rejected; when the p-value ≥ α, H0 is not rejected.
ANOVA (M1, continued)
            MS                 F              Significance F
Regression  53651.1811183464   137.853304413  2.48780718077414E-12
Residual    389.190388630485
PROBABILITY OUTPUT
[Figure: Normal probability plot of Sales price versus sample percentile]
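Coefficients of M1 (95% confidence limits):
                Lower 95.0%          Upper 95.0%
Intercept       -9.79912001297451    52.797583544021
Assessed value  0.781631696942221    1.11200445332364
Each estimate lies at the midpoint of its interval, so the least-squares regression line for M1 is Sales price = 21.499 + 0.9468*(Assessed value), with R Square = 0.8312 and standard error = 19.728. The intercept interval contains 0, which is consistent with the intercept not being significant.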
[Figure: Residuals versus Assessed value]
There is no clear evidence of a relationship between the residuals and assessed value; they appear to act independently.
[Figure: Normal Probability Plot of Sales price versus sample percentile]
[Figure: Assessed value Line Fit Plot (Sales price and Predicted Sales price versus Assessed value)]
Mean
Standard dev
Min
Max
Q1
Q2
Q3
[Figure: Market Shares vs Accounts]
Market Share
Accounts
Assets
SUMMARY OUTPUT
Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations
ANOVA
Regression
Residual
Total
Intercept
Accounts
Assets
RESIDUAL OUTPUT
Observation
1
2
3
4
5
6
7
8
9
10
We have;
l. Residuals
Find the residuals for the multiple regression used to predict market share with number of accounts and assets as explanatory variables.
According to the Regression Analysis Table:
RESIDUAL OUTPUT
o. Plot the residuals versus assets. Describe the plot and any unusual cases.
Assets
219
21.1
38.8
5.5
160
19.5
11.2
5.9
1.3
6.8
Mean
Standard dev
Min
Max
Q1
Q2
Q3
(b)
[Figure: boxplots of Market Share, Accounts and Assets]
(c)
The boxplot of Market Share shows a very symmetric distribution.
The boxplot of Accounts shows that it is likely strongly positively skewed.
The boxplot of Assets shows that it is likely positively skewed.
(d)
[Figure: scatterplot with Accounts on the horizontal axis]
(e)
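There are positive linear relationships between each pair of variables (Market Share vs Accounts, Market Share vs Assets, and Accounts vs Assets) in all graphs.
Yet there is a probable outlier influencing the other observations in the plots.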
(f)
Correlation Study
Market Share
Accounts
Assets
(g)
Due to the significant correlations, multicollinearity could be a problem if we were to analyze the link between these three variables.
(h)
SUMMARY OUTPUT
Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations
ANOVA
Regression
Residual
Total
Intercept
Accounts
Assets
RESIDUAL OUTPUT
Observation
1
2
3
4
5
6
7
8
We have;
(i)
Some coefficients are not significant because of high multicollinearity between the two independent variables (accounts and assets), or because …
(j)
Market Share = β0 + β1*(Accounts) + β2*(Assets) + Error = 1.845 + 0.00663*(Accounts) + 0.1566*(Assets)
(k)
(l)
Find the residuals for the multiple regression used to predict market share with number of accounts and assets as explanatory variables.
According to the Regression Analysis Table:
RESIDUAL OUTPUT
(m)
(n)
Accounts
909
615
205
428
590
134
130
125
(o)
Assets
21.1
38.8
5.5
19.5
11.2
5.9
1.3
6.8
SUMMARY OUTPUT
Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations
ANOVA
Regression
Residual
Total
Intercept
Assets
RESIDUAL OUTPUT
Observation
1
2
3
4
5
6
7
8
9
10
We have;
For all 3 distributions, there is an outlier: Market Share at 27.5, Accounts at 2500 and Assets at 219.
[Figure: scatterplots for the pairs of variables, including Market Shares vs Accounts]
(h) …regression equation. Run a multiple regression to predict market share using accounts and assets as explanatory variables.
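Regression Statistics
Multiple R          0.78022081343515
R Square            0.608744517717408
Adjusted R Square   0.496957237065238
Standard Error      5.4876770230626
Observations        10
ANOVA
            df  SS                MS                F
Regression  2   327.981806233856  163.990903116928  5.44556155374726
Residual    7   210.802193766144  30.1145991094492
Total       9   538.784
β0 (Intercept) = 5.1593630976492
β1 (Accounts) = -0.00031226419256
β2 (Assets) = 0.082773456666666
When the p-value > α, H0 is not rejected.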
Residuals
4.99391037375163
6.27796511772144
3.42106926210873
4.449397050159
-8.38490852142713
1.76020357176661
-2.3021899387053
-2.80588309017945
-3.02637424628303
-4.38318957891249
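The residual list can be tied back to the ANOVA table: the squared residuals should sum to the Residual SS (210.802), and the regression standard error is the square root of SSE/(n - k - 1) with n = 10 observations and k = 2 explanatory variables. A minimal sketch using the residuals exactly as printed:

```python
from math import sqrt

# Residuals of the 10-observation regression of Market Share on Accounts and Assets
residuals = [4.99391037375163, 6.27796511772144, 3.42106926210873, 4.449397050159,
             -8.38490852142713, 1.76020357176661, -2.3021899387053, -2.80588309017945,
             -3.02637424628303, -4.38318957891249]
n, k = len(residuals), 2  # observations and explanatory variables

sse = sum(e ** 2 for e in residuals)   # residual sum of squares
std_error = sqrt(sse / (n - k - 1))    # regression standard error
print(f"sum of residuals = {sum(residuals):.1e}")
print(f"SSE = {sse:.4f}, standard error = {std_error:.4f}")
```

Least-squares residuals from a model with an intercept sum to zero, which the first printed line confirms up to rounding.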
(m) …cal summary of the distribution of the residuals. Are there any outliers in this distribution?
(n) Plot the residuals versus the number of accounts. Describe the plot and any unusual cases.
[Figure: Residuals versus Accounts]
Some points on the far right of the diagram likely indicate two clusters or possibly some outliers. The linear link between the Residuals and Accounts can be restored if these two points are eliminated. However, given the tiny dataset, it would be premature to conclude that there is a positive linear association between the Residuals and Accounts variables.
(o) Plot the residuals versus assets. Describe the plot and any unusual cases.
[Figure: Residuals versus Assets]
Some points on the far right of the diagram likely indicate two clusters or possibly some outliers. The linear link between the Residuals and Assets can be restored if these two points are eliminated. However, given the tiny dataset, it would be premature to conclude that there is a positive linear association between the Residuals and Assets variables.
[Figure: scatterplot with Accounts on the horizontal axis]
Regression Statistics
Multiple R          0.769781562431823
R Square            0.592563653859978
Adjusted R Square   0.429589115403969 (decreased compared with the previous model)
Standard Error      3.50057931584292
Observations        8
ANOVA
            df  SS                MS                F
Regression  2   89.1097222674635  44.5548611337317  3.63592779260993
Residual    5   61.2702777325366  12.2540555465073
Total       7   150.38
β0 (Intercept) = 1.84526113461613
β1 (Accounts) = 0.0066302275554295
β2 (Assets) = 0.156635034598039
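The "decreased" remark on Adjusted R Square can be verified from the two ANOVA tables, since Adjusted R Square = 1 - [SSE/(n - k - 1)] / [SST/(n - 1)]. A minimal sketch (the helper `fit_summary` is our own name) comparing the 10-observation model with the 8-observation model that excludes the two outlying observations:

```python
from math import sqrt


def fit_summary(ss_reg, ss_res, n, k):
    """Return (R Square, Adjusted R Square, standard error) from ANOVA sums of squares.

    n is the number of observations and k the number of explanatory variables.
    """
    ss_tot = ss_reg + ss_res
    r_square = ss_reg / ss_tot
    adj_r_square = 1 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))
    return r_square, adj_r_square, sqrt(ss_res / (n - k - 1))


# Market Share on Accounts and Assets: all 10 observations versus 8 observations
all_obs = fit_summary(327.981806233856, 210.802193766144, n=10, k=2)
eight_obs = fit_summary(89.1097222674635, 61.2702777325366, n=8, k=2)
for label, (r2, adj, se) in (("10 obs", all_obs), ("8 obs", eight_obs)):
    print(f"{label}: R Square = {r2:.4f}, Adjusted R Square = {adj:.4f}, SE = {se:.4f}")
```

Dropping two observations while keeping two predictors costs residual degrees of freedom (7 down to 5), which is why Adjusted R Square falls even though the standard error shrinks.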
Residuals
1.72286278747984
-0.400290423609167
5.93404952623161
0.662618296998293
-3.91140777981757
-0.85785833117211
-0.710816261799415
-2.43915781431148
[Figure: Residuals versus Accounts]
The residuals appear normal, and no linear relationship between the Residuals and Accounts is observed.
[Figure: Residuals versus Assets]
The residuals appear normal, and no linear relationship between the Residuals and Assets is observed.
…that the model is a good one? Propose another model that has a higher adjusted R Square and a lower regression standard error.
Whether we include or exclude the outliers, we observe that the coefficients of the aforementioned model are not significant, so neither analysis nor prediction can be done reliably using this model.
This suggests that the coefficients are not significant because of multicollinearity. We can therefore reduce the number of variables in the model to increase the significance of the coefficients.
Since the correlation between assets and market share in the original data is stronger than the correlation between accounts and market share, assets will remain the single independent variable in our analysis.
We perform the analysis using the original data and compare the R Square and the Regression Standard Error.
ANOVA (Assets as the single explanatory variable, all 10 observations)
            df  SS                MS                F
Regression  1   327.938753267235  327.938753267235  12.4428227185175
Residual    8   210.845246732765  26.3556558415957
Total       9   538.784
β0 (Intercept) = 5.08364415690648
β1 (Assets) = 0.079254873095349
β2 = 0
This model is better than the previous model, both for analysis and prediction.
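The comparison asked for above can be computed from the two ANOVA tables, which share the same Total SS (538.784, all 10 observations). In this sketch (the helper `fit_summary` is our own name), R Square is 0.6087 for both models, while Adjusted R Square rises from about 0.497 to about 0.560 and the standard error falls from about 5.488 to about 5.134 when Accounts is dropped.

```python
from math import sqrt


def fit_summary(ss_reg, ss_res, n, k):
    """Return (R Square, Adjusted R Square, standard error) from ANOVA sums of squares."""
    ss_tot = ss_reg + ss_res
    adj = 1 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))
    return ss_reg / ss_tot, adj, sqrt(ss_res / (n - k - 1))


# Both models use all 10 observations
accounts_and_assets = fit_summary(327.981806233856, 210.802193766144, n=10, k=2)
assets_only = fit_summary(327.938753267235, 210.845246732765, n=10, k=1)
for label, (r2, adj, se) in (("Accounts + Assets", accounts_and_assets),
                             ("Assets only", assets_only)):
    print(f"{label}: R Square = {r2:.4f}, Adjusted R Square = {adj:.4f}, SE = {se:.4f}")
```

Dropping Accounts leaves the explained variation almost unchanged but frees one residual degree of freedom, which is what raises Adjusted R Square and lowers the standard error.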
[Figure: Market Shares vs Assets]
[Figure: Accounts vs Assets]
PROBABILITY OUTPUT
[Figure: scatterplot with Assets on the horizontal axis]
Significance F (8-observation model)
0.105961822934921
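Significance F is the upper-tail probability of the F statistic. With two explanatory variables the numerator has 2 degrees of freedom, and in that special case the tail probability has the closed form (1 + 2F/df2)^(-df2/2), so the value above can be checked without a statistics library. This sketch covers only the 2-numerator-df case; the function name is our own.

```python
def f_upper_tail_df1_2(f_stat, df2):
    """P(F > f_stat) for an F distribution with 2 numerator degrees of freedom.

    For numerator df = 2 the tail probability is (1 + 2*f/df2) ** (-df2/2).
    """
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


# F statistic and residual df of the 8-observation model (Accounts and Assets)
significance_f = f_upper_tail_df1_2(3.63592779260993, df2=5)
print(f"Significance F = {significance_f:.6f}")
```

Because the result (about 0.106) exceeds α = 0.05, the 8-observation model is not significant at the 5% level.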
PROBABILITY OUTPUT
Percentile Market Share
6.25 1.3
18.75 2.2
31.25 2.8
43.75 3.6
56.25 8.4
68.75 10
81.25 11.6
93.75 12.9
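Two quick checks on this table: the listed percentiles match the plotting positions 100*(i - 0.5)/n for n = 8, and the eight Market Share values reproduce the Total SS of 150.38 in the 8-observation ANOVA table. A minimal sketch:

```python
from statistics import fmean

# Sorted Market Share values from the PROBABILITY OUTPUT (8 observations)
market_share = [1.3, 2.2, 2.8, 3.6, 8.4, 10, 11.6, 12.9]
n = len(market_share)

# Plotting positions 100*(i - 0.5)/n for i = 1..n
percentiles = [100 * (i - 0.5) / n for i in range(1, n + 1)]

# Total sum of squares about the mean
mean_share = fmean(market_share)
ss_total = sum((y - mean_share) ** 2 for y in market_share)
print(percentiles)
print(f"mean = {mean_share:.2f}, Total SS = {ss_total:.2f}")
```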
At the level of significance α=5%, the coefficient (β1) is not significant (with p-value = 34.76%) and the coefficient (β2) …
Significance F (Assets-only model)
0.0077610096754001
[Figure: Accounts vs Assets (full data and the 8-observation data)]