
TABLE 10.2
Property Sales price Assessed value Property Sales price
1 167.9 152.7 11 230
2 168 163.8 12 230
3 155 167.6 13 222.5
4 158.5 127.3 14 225.5
5 159.9 155.7 15 220
6 162 169 16 216
7 165 187.1 17 215
8 174.5 153.6 18 228
9 175 167.1 19 209
10 159 148.9 20 267

a. Data Compressed into Column

Property Sales price Assessed value Sales Price > Assessed value
1 167.9 152.7 YES
2 168.0 163.8 YES
3 155.0 167.6 NO
4 158.5 127.3 YES
5 159.9 155.7 YES
6 162.0 169.0 NO
7 165.0 187.1 NO
8 174.5 153.6 YES
9 175.0 167.1 YES
10 159.0 148.9 YES
11 230.0 225.4 YES
12 230.0 170.4 YES
13 222.5 200.4 YES
14 225.5 209.6 YES
15 220.0 205.2 YES
16 216.0 220.9 NO
17 215.0 194.9 YES
18 228.0 231.4 NO
19 209.0 224.2 NO
20 267.0 235.1 YES
21 283.0 303.9 NO
22 269.0 233.7 YES
23 255.0 233.6 YES
24 285.0 234.2 YES
25 146.0 145.1 YES
26 128.0 108.3 YES
27 126.5 136.2 NO
28 129.9 113.3 YES
29 150.0 121.4 YES
30 195.0 184.0 YES

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.911634743
R Square 0.831077905
Adjusted R Square 0.825044973
Standard Error 0.100872244
Observations 30

ANOVA
df SS MS F
Regression 1 1.40170517505172 1.40170517505172 137.7568833154
Residual 28 0.28490587153883 0.010175209697815
Total 29 1.68661104659054

Coefficients Standard Error t Stat P-value


Intercept 4.358153205 0.07812604097169 55.7836177396251 3.01898261E-30
Assessed value 0.004839553 0.00041233346222 11.7369878297374 2.50828089E-12

RESIDUAL OUTPUT

Observation Predicted LN(Sale Price) Residuals Standard Residuals


1 5.097152922 0.02621564219931 0.264489722021359
2 5.150871958 -0.0269079788698 -0.27147470953868
3 5.169262259 -0.1258371420995 -1.26957144443869
4 4.97422828 0.09152631325995 0.923409351093293
5 5.11167158 -0.0371229605278 -0.37453370151854
6 5.176037633 -0.0884412977454 -0.89228461687246
7 5.263633539 -0.1576880652608 -1.59091553927348
8 5.101508519 0.06041622221326 0.609539514511021
9 5.166842483 -0.0020565086814 -0.02074812454847
10 5.078762621 -0.0098584189182 -0.09946162903136
11 5.448988412 -0.010909103544 -0.11006199054405
12 5.182813007 0.25526630198649 2.57538277111482
13 5.327999592 0.07692750983479 0.776121963265535
14 5.372523478 0.04579668115522 0.462042904619855
15 5.351229445 0.04239810100729 0.427754615522818
16 5.427210425 -0.0519320170578 -0.52394233378652
17 5.301382051 0.0692559769092 0.698723836012366
18 5.478025729 -0.0486801004797 -0.49113373404907
19 5.443180949 -0.100846697109 -1.01744274209767
20 5.495932075 0.09131658350317 0.92129338671413
21 5.828893309 -0.1834464118083 -1.85079160358176
22 5.489156701 0.10555467866371 1.06494158737599
23 5.488672746 0.05259079950308 0.530588792587904
24 5.491576477 0.16091270291661 1.62344892185141
25 5.06037232 -0.0767656986844 -0.77448944990305
26 4.882276776 -0.0302465124091 -0.30515718815383
27 5.0173003 -0.1770579920575 -1.78633881042838
28 4.90647454 -0.0397096167913 -0.40063048720471
29 4.945674918 0.06496037572304 0.655385497969607
30 5.248630925 0.0243686331686 0.245855240310347

[Figure: Assessed value Residual Plot; x-axis: Assessed value, y-axis: Residuals]
[Figure: LN(Sale Price) plot]
Assessed value Property Sales price Assessed value
225.4 21 283 303.9
170.4 22 269 233.7
200.4 23 255 233.6
209.6 24 285 234.2
205.2 25 146 145.1
220.9 26 128 108.3
194.9 27 126.5 136.2
231.4 28 129.9 113.3
224.2 29 150 121.4
235.1 30 195 184

Sales price -
Assessed value LN(Sale Price)
15.2 5.1233685640835
4.2 5.12396397940326
-12.6 5.04342511691925
31.2 5.06575459331734
4.2 5.07454861983991
-7.0 5.08759633523238
-22.1 5.10594547390058
20.9 5.16192474164248
7.9 5.16478597392351
10.1 5.06890420222023
4.6 5.4380793089232
59.6 5.4380793089232
22.1 5.4049271016063
15.9 5.41832015894273
14.8 5.39362754635236
-4.9 5.37527840768417
20.1 5.37063802812766
-3.4 5.42934562895444
-15.2 5.34233425196481
31.9 5.58724865840025
-20.9 5.64544689764324
35.3 5.59471137960184
21.4 5.54126354515843
50.8 5.65248918026865
0.9 4.98360662170834
19.7 4.85203026391962
-9.7 4.84024230816758
16.6 4.86676492367655
28.6 5.01063529409626
11.0 5.27299955856375

Significance F
2.50828089E-12

Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 4.1981192647885 4.51818714536167 4.198119265 4.51818714536167
Assessed value 0.003994926019 0.0056841796366316 0.003994926 0.00568417963663
PROBABILITY OUTPUT

Percentile LN(Sale Price)


1.6666666666667 4.84024230816758
5 4.85203026391962
8.3333333333333 4.86676492367655
11.666666666667 4.98360662170834
15 5.01063529409626
18.333333333333 5.04342511691925
21.666666666667 5.06575459331734
25 5.06890420222023
28.333333333333 5.07454861983991
31.666666666667 5.08759633523238
35 5.10594547390058
38.333333333333 5.1233685640835
41.666666666667 5.12396397940326
45 5.16192474164248
48.333333333333 5.16478597392351
51.666666666667 5.27299955856375
55 5.34233425196481
58.333333333333 5.37063802812766
61.666666666667 5.37527840768417
65 5.39362754635236
68.333333333333 5.4049271016063
71.666666666667 5.41832015894273
75 5.42934562895444
78.333333333333 5.4380793089232
81.666666666667 5.4380793089232
85 5.54126354515843
88.333333333333 5.58724865840025
91.666666666667 5.59471137960184
95 5.64544689764324
98.333333333333 5.65248918026865

[Figure: Assessed value Line Fit Plot; series: LN(Sale Price) and Predicted LN(Sale Price); x-axis: Assessed value]
a. How many have a selling price greater than the assessed value?

Answer: 22 of the 30 properties (73.33%).

Sales prices are higher than assessed values in about 73.33% of the cases.
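
The count behind part (a) can be checked with a short script. This is only a sketch: the pairs are copied from the table above, and the variable names are illustrative rather than taken from the workbook.

```python
# (sales price, assessed value) pairs in $1000s, copied from Table 10.2.
properties = [
    (167.9, 152.7), (168.0, 163.8), (155.0, 167.6), (158.5, 127.3),
    (159.9, 155.7), (162.0, 169.0), (165.0, 187.1), (174.5, 153.6),
    (175.0, 167.1), (159.0, 148.9), (230.0, 225.4), (230.0, 170.4),
    (222.5, 200.4), (225.5, 209.6), (220.0, 205.2), (216.0, 220.9),
    (215.0, 194.9), (228.0, 231.4), (209.0, 224.2), (267.0, 235.1),
    (283.0, 303.9), (269.0, 233.7), (255.0, 233.6), (285.0, 234.2),
    (146.0, 145.1), (128.0, 108.3), (126.5, 136.2), (129.9, 113.3),
    (150.0, 121.4), (195.0, 184.0),
]

# Count the properties whose sales price exceeds the assessed value.
count_above = sum(1 for sale, assessed in properties if sale > assessed)
share_above = count_above / len(properties)

print(f"{count_above} of {len(properties)} properties ({share_above:.2%})")
```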

b. Scatterplot

[Figure: Sales Price vs Assessed Value scatterplot; x-axis: Assessed Value, y-axis: Sales Price]
c. Report the R-Square value, standard error for the regression and the least-squares
regression line for predicting selling price from assessed value (M1)

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.911688603703238
R Square 0.831176110122359
Adjusted R Square 0.825146685483872
Standard Error 19.7279088762718
Observations 30

ANOVA
df SS
Regression 1 53651.1811183464
Residual 28 10897.3308816536
Total 29 64548.512

Coefficients Standard Error


Intercept β0 21.4992317655233 15.27936079693
Assessed value β1 0.94681807513293 0.080641379743451
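
As a cross-check on the Excel output for M1, the least-squares line can be recomputed from the closed-form formulas. The sketch below assumes the data in Table 10.2 are exactly what the workbook used; the variable names are illustrative.

```python
import math

# Sales price and assessed value in $1000s, copied from Table 10.2.
sales = [167.9, 168.0, 155.0, 158.5, 159.9, 162.0, 165.0, 174.5, 175.0, 159.0,
         230.0, 230.0, 222.5, 225.5, 220.0, 216.0, 215.0, 228.0, 209.0, 267.0,
         283.0, 269.0, 255.0, 285.0, 146.0, 128.0, 126.5, 129.9, 150.0, 195.0]
assessed = [152.7, 163.8, 167.6, 127.3, 155.7, 169.0, 187.1, 153.6, 167.1, 148.9,
            225.4, 170.4, 200.4, 209.6, 205.2, 220.9, 194.9, 231.4, 224.2, 235.1,
            303.9, 233.7, 233.6, 234.2, 145.1, 108.3, 136.2, 113.3, 121.4, 184.0]

n = len(sales)
mean_x = sum(assessed) / n
mean_y = sum(sales) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in assessed)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(assessed, sales))
syy = sum((y - mean_y) ** 2 for y in sales)

slope = sxy / sxx                      # β1
intercept = mean_y - slope * mean_x    # β0

# Residual sum of squares, R Square, and regression standard error.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(assessed, sales))
r_square = 1 - sse / syy
std_error = math.sqrt(sse / (n - 2))

print(f"Sales price = {intercept:.4f} + {slope:.4f} * Assessed value")
print(f"R Square = {r_square:.4f}, Standard Error = {std_error:.4f}")
```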

RESIDUAL OUTPUT

Observation Predicted Sales price Residuals


1 166.078351838322 1.82164816167841
2 176.588032472297 -8.58803247229713
3 180.185941157802 -25.1859411578022
4 142.029172729945 16.4708272700548
5 168.91880606372 -9.01880606372038
6 181.511486462988 -19.5114864629884
7 198.648893622894 -33.6488936228944
8 166.930488105941 7.56951189405876
9 179.712532120236 -4.71253212023578
10 162.480443152816 -3.48044315281646
11 234.912025900486 -4.91202590048556
12 182.837031768174 47.1629682318255
13 211.241574022162 11.2584259778377
14 219.952300313385 5.54769968661472
15 215.7863007828 4.21369921719963
16 230.651344562387 -14.6513445623874
17 206.034074608931 8.96592539106877
18 240.592934351283 -12.5929343512832
19 233.775844210326 -24.775844210326
20 244.096161229275 22.903838770725
21 309.237244798421 -26.2372447984205
22 242.770615924089 26.2293840759111
23 242.675934116576 12.3240658834244
24 243.244024961655 41.7559750383447
25 158.882534467311 -12.8825344673113
26 124.03962930242 3.96037069758047
27 150.455853598628 -23.9558535986282
28 128.773719678084 1.12628032191583
29 136.442946086661 13.5570539133391
30 195.713757589982 -0.713757589982293
Standard deviation of the residuals: 19.3847886344383
d. Obtain the residuals and plot them versus assessed value. Is there anything unusual to
report? If so, explain.

[Figure: Assessed value Residual Plot; x-axis: Assessed value, y-axis: Residuals]

e. Do the residuals appear to be approximately Normal? Explain your answer.

According to the normal probability plot, the residuals generally cluster around a straight line. As a result, it can be concluded
that the residuals are roughly normal.

[Figure: Normal Probability Plot; x-axis: Sample Percentile, y-axis: Residuals]

f. Based on your answers to parts (b), (d), and (e), do the assumptions for the linear
regression analysis appear reasonable? Explain your answer.

Assumptions:
i. Linearity: The relationship between the dependent and independent variables is described by a linear relationship.
ii. Independence: The independent variable and the residuals are independent.
iii. Normality: The errors are approximately normal.
iv. Homoscedasticity: The variance of the errors is constant across all levels of the independent variables.
Yes, the assumptions for the linear regression analysis appear reasonable.

g. Report the R-Square value, standard error for the regression and the least-squares regression line for the model M2.

According to the Regression Analysis Table:


R Square 0.831077904941533
Standard Error 0.100872244437285

Tell whether the estimated coefficients are significant at the level of α=0.05.
At the level of significance α=5%, both the intercept (β0) and the slope coefficient (β1) are significant (each with p-value ≈ 0%).
A coefficient is significant when its p-value < α, in which case H0 is rejected.

Which model better fits the data? Which one is suitable to use for prediction? Explain your answer.
The coefficients of determination for both models are close. However, when considering the significance of the coefficients,
model (M2) is preferred.

h. Use the better fit model to calculate the predicted selling prices for homes currently
assessed at $155,000, $220,000, and $285,000.
We will use model M2, as it is better suited for prediction.
(M2) ln(Sales Price) = β0 + β1*(Assessed Value) + Error, with fitted equation ln(Sales Price) = 4.3582 + 0.0048*(Assessed Value)

Assessed Value (in $1000) ln(Sales Price) Sales Price (in $1000)
155 5.10828389338818 165.386290694272
220 5.42285482719691 226.524889702463
285 5.73742576100563 310.26468662732
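
The predictions in part (h) come from exponentiating the fitted log model. A minimal sketch, using the M2 coefficients reported in the regression output; the function name is illustrative.

```python
import math

# Coefficients of model M2 as reported in the regression output.
B0 = 4.358153205   # intercept
B1 = 0.004839553   # slope on assessed value (in $1000s)


def predict_sales_price(assessed_value):
    """Return the predicted sales price in $1000s for an assessed value in $1000s."""
    ln_price = B0 + B1 * assessed_value
    return math.exp(ln_price)  # undo the natural-log transform


predictions = {x: predict_sales_price(x) for x in (155, 220, 285)}
for assessed_value, price in predictions.items():
    print(f"Assessed {assessed_value}: predicted sales price {price:.3f}")
```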

According to the Regression Analysis Table:


R Square 0.831176110122
Standard Error 19.72790887627

Tell whether the estimated coefficients are significant at the level of α=0.05.
At the level of significance α=5%, the intercept (β0) is not significant (p-value = 17%), whereas the slope
coefficient (β1) is significant (p-value ≈ 0.0%).
A coefficient is significant when its p-value < α, in which case H0 is rejected.
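
The same decision can be reached from the t statistics. This sketch divides each M1 coefficient by its standard error and compares the result with 2.048, the two-sided 5% critical value of Student's t with 28 degrees of freedom. That critical value is a hardcoded table value, not something read from the workbook.

```python
# Two-sided 5% critical value of Student's t with 28 degrees of freedom
# (table value, hardcoded as an assumption of this sketch).
T_CRITICAL = 2.048

# (estimate, standard error) for model M1, copied from the regression output.
coefficients = {
    "Intercept": (21.4992317655233, 15.27936079693),
    "Assessed value": (0.94681807513293, 0.080641379743451),
}

t_stats = {name: est / se for name, (est, se) in coefficients.items()}

# H0 (coefficient = 0) is rejected when |t| exceeds the critical value.
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name in coefficients:
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: t = {t_stats[name]:.4f} -> {verdict} at the 5% level")
```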

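ANOVA (M1), remaining columns
MS F Significance F
Regression 53651.1811183464 137.853304413 2.48780718077414E-12
Residual 389.190388630485

Coefficients (M1), remaining columns
t Stat P-value Lower 95%
Intercept β0 1.40707664746309 0.17040860349 -9.79912001297451
Assessed value β1 11.7410946854467 2.4878072E-12 0.781631696942221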

PROBABILITY OUTPUT

Standard Residuals Percentile


0.0939730732189743 1.67
-0.443029461618166 5.00
-1.29926313011522 8.33
0.849677939783845 11.67
-0.465251710183618 15.00
-1.00653594067696 18.33
-1.73584010934816 21.67
0.390487202971667 25.00
-0.243104642980923 28.33
-0.179545065899415 31.67
-0.253395896809472 35.00
2.43298852111483 38.33
0.580786625541863 41.67
0.286188299043864 45.00
0.217371429560688 48.33
-0.75581657549561 51.67
0.4625237633564 55.00
-0.64962969618926 58.33
-1.27810752428377 61.67
1.18153667819906 65.00
-1.35349656337281 68.33
1.35309105353426 71.67
0.635759621414182 75.00
2.15405882549385 78.33
-0.664569251192386 81.67
0.204303011617296 85.00
-1.2358067993617 88.33
0.0581012433591833 91.67
0.699365578289263 95.00
-0.0368204989717689 98.33
This graph is from the Regression Data Analysis Tool.

The plot is typical, but one point sticks out greatly from the rest and is likely to be an outlier.


P-values expressed as percentages: Intercept 0%, Assessed value 0%, since both p-values are very small.


Coefficients (M1), remaining columns
Upper 95% Lower 95.0% Upper 95.0%
Intercept β0 52.797583544021 -9.79912001297451 52.797583544021
Assessed value β1 1.11200445332364 0.781631696942221 1.11200445332364

[Figure: Normal Probability Plot; x-axis: Sample Percentile, y-axis: Sales price]

[Figure: Assessed value Residual Plot; x-axis: Assessed value, y-axis: Residuals]

Sales price Residuals
126.5 -33.64889 -39.5783518383216
128 -26.23724 -48.5880324722971
129.9 -25.18594 -50.2859411578022
146 -24.77584 3.97082727005483
150 -23.95585 -18.9188060637204
155 -19.51149 -26.5114864629884
158.5 -14.65134 -40.1488936228944
159 -12.88253 -7.93048810594124
159.9 -12.59293 -19.8125321202358
162 -9.01881 -0.480443152816463
165 -8.58803 -69.9120259004856
167.9 -4.91203 -14.9370317681745
168 -4.71253 -43.2415740221623
174.5 -3.48044 -45.4523003133853
175 -0.71376 -40.7863007828004
195 1.12628 -35.6513445623874
209 1.82165 2.96592539106877
215 3.96037 -25.5929343512832
216 4.21370 -17.775844210326
220 5.54770 -24.096161229275
222.5 7.56951 -86.7372447984205
225.5 8.96593 -17.2706159240889
228 11.25843 -14.6759341165756
230 12.32407 -13.2440249616553
230 13.55705 71.1174655326887
255 16.47083 130.96037069758
267 22.90384 116.544146401372
269 26.22938 140.226280321916
283 41.75598 146.557053913339
285 47.16297 89.2862424100177
There is no clear evidence of a relationship between the residuals and assessed value; they appear to act independently.

[Figure: Assessed value Line Fit Plot; series: Sales price and Predicted Sales price; x-axis: Assessed value]
Brokerage Market Share Accounts Assets
Charles Schwab 27.5 2500 219
E*Trade 12.9 909 21.1
TD Waterhouse 11.6 615 38.8
Datek 10 205 5.5
Fidelity 9.3 2300 160
Ameritrade 8.4 428 19.5
DLJ Direct 3.6 590 11.2
Discover 2.8 134 5.9
Suretrade 2.2 130 1.3
National Discount Brokers 1.3 125 6.8
From the previous plots, part (b), we can detect one outlier: the values for Charles Schwab.
However, after removing this outlier, another outlier is detected (Fidelity).
The two outliers are:
Charles Schwab 27.5 2500 219
Fidelity 9.3 2300 160
Hence the resulting data would be:
Brokerage Market Share Accounts Assets
E*Trade 12.9 909 21.1
TD Waterhouse 11.6 615 38.8
Datek 10 205 5.5
Ameritrade 8.4 428 19.5
DLJ Direct 3.6 590 11.2
Discover 2.8 134 5.9
Suretrade 2.2 130 1.3
National Discount Brokers 1.3 125 6.8
Brokerage Market Share Assets Accounts
Charles Schwab 27.5 219 2500
E*Trade 12.9 21.1 909
TD Waterhouse 11.6 38.8 615
Datek 10 5.5 205
Fidelity 9.3 160 2300
Ameritrade 8.4 19.5 428
DLJ Direct 3.6 11.2 590
Discover 2.8 5.9 134
Suretrade 2.2 1.3 130
National Discount Brokers 1.3 6.8 125
Solution
a. Data analysis: Individual variables.

Mean
Standard dev
Min
Max
Q1
Q2
Q3

b. Box-plots of all 3 distributions

[Figure: box plots of Market Share, Accounts, and Assets; not rendered in the source workbook]

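c. Describe the distributions. Are there any unusual observations?

The boxplot of Market Share shows a very symmetric distribution.
The boxplot of Accounts shows that it is likely strongly positively skewed (to the right).
The boxplot of Assets shows that it is likely positively skewed (to the right) as well, but probably less so than Accounts.

Each variable has an outlier: Market Share at 27.5, Accounts at 2500 (also acts as the 3rd quartile), and Assets at 219.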

d. Data analysis: pairs of variables.

[Figure: scatterplot of Market Shares vs Accounts; x-axis: Accounts, y-axis: Market Shares]

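e. Summarize these relationships. Are there any influential observations?

There are positive linear relationships between each pair of variables: Market Share vs Accounts, Market Share vs Assets, and Accounts vs Assets.

Yet, there is a probable outlier influencing the other observations in the plots.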

f. Find the correlation between each pair of variables. Describe.

Market Share Accounts Assets
Market Share 100%
Accounts 75% 100%
Assets 78% 97% 100%

All 3 variables have strong positive linear correlations.

The correlation (Market Share vs Accounts) = 75%.
The correlation (Market Share vs Assets) = 78%.
The correlation (Accounts vs Assets) = 97%.
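
The three correlations can be reproduced from the brokerage table with a plain Pearson formula. This is a sketch with illustrative names; the data are copied from the table of ten brokerages.

```python
import math

# Columns copied from the brokerage table (all ten brokerages).
market_share = [27.5, 12.9, 11.6, 10, 9.3, 8.4, 3.6, 2.8, 2.2, 1.3]
accounts = [2500, 909, 615, 205, 2300, 428, 590, 134, 130, 125]
assets = [219, 21.1, 38.8, 5.5, 160, 19.5, 11.2, 5.9, 1.3, 6.8]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_share_accounts = pearson(market_share, accounts)
r_share_assets = pearson(market_share, assets)
r_accounts_assets = pearson(accounts, assets)

print(f"Market Share vs Accounts: {r_share_accounts:.2%}")
print(f"Market Share vs Assets:   {r_share_assets:.2%}")
print(f"Accounts vs Assets:       {r_accounts_assets:.2%}")
```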

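g. From the correlation table, do you think there is an issue of multicollinearity? Explain.

Due to the significant correlations, multicollinearity could be a problem if we were to analyze the link between these three variables.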
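
One way to quantify the concern is the variance inflation factor (VIF). With only two explanatory variables it reduces to 1 / (1 - r²), where r is their correlation. The sketch below uses the rounded 97% reported above. The threshold of 10 is a common rule of thumb, not something stated in the exercise.

```python
# Correlation between the two explanatory variables (Accounts vs Assets),
# using the rounded value from the correlation table.
r_accounts_assets = 0.97

# With exactly two predictors, each predictor's VIF is 1 / (1 - r^2).
vif = 1 / (1 - r_accounts_assets ** 2)

# Rule-of-thumb threshold (an assumption of this sketch, not from the source).
VIF_THRESHOLD = 10
print(f"VIF = {vif:.2f}; above threshold: {vif > VIF_THRESHOLD}")
```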


h. Multiple regression equation. Run a multiple regression to predict market share using
number of accounts and assets as explanatory variables.

SUMMARY OUTPUT

Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations

ANOVA
Regression
Residual
Total

Intercept
Accounts
Assets

RESIDUAL OUTPUT

Observation
1
2
3
4
5
6
7
8
9
10

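We have:
β0 = 5.1593630976492
β1 = -0.00031226419256
β2 = 0.082773456666666

Describe the significance of the estimated coefficients.

Tell whether the estimated coefficients are significant at the level of α=5%.
None of the three coefficients is significant: the p-values are 11.8% (intercept), 97.1% (accounts), and 41.8% (assets), all above 5%.
A coefficient is significant when its p-value < α, in which case H0 is rejected.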

i. Can you explain why some coefficients are not significant?

Some coefficients are not significant because of high multicollinearity between two independent variables (accounts and assets) or because th…

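j. Give the equation for predicted market share.

Market Share = β0 + β1*(Accounts) + β2*(Assets) + Error, with fitted equation Market Share = 5.16 - 0.0003*(Accounts) + 0.0828*(Assets)
k. What are the values of the adjusted coefficient of determination and the regression standard error?

According to the Regression Analysis Table:

Adjusted R Square 0.496957237065238
Standard Error 5.4876770230626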

l. Residuals
Find the residuals for the multiple regression used to predict market share with number of accounts and assets as explanatory variables.
According to the Regression Analysis Table:

RESIDUAL OUTPUT

m. Give a graphical summary of the distribution of the residuals. Are there any outliers in this distribution?

[Figure: graphical summary of the residuals; not rendered in the source workbook]

The residual distribution appears to be normal, as there are no outliers to be seen.

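n. Plot the residuals versus the number of accounts. Describe the plot and any unusual cases.

Accounts Residuals
2500 4.99391037375163
909 6.27796511772144
615 3.42106926210873
205 4.449397050159
2300 -8.38490852142713
428 1.76020357176661
590 -2.3021899387053
134 -2.80588309017945
130 -3.02637424628303
125 -4.38318957891249

The two extreme points on the far right of the diagram likely imply the existence of two clusters or possibly some outliers. The linear link between the Residuals and Accounts can be restored if these two points are eliminated. However, given the tiny dataset, it would be premature to conclude that there is a positive linear association between the Residuals and Accounts variables.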

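o. Plot the residuals versus assets. Describe the plot and any unusual cases.

Assets Residuals
219 4.99391037375163
21.1 6.27796511772144
38.8 3.42106926210873
5.5 4.449397050159
160 -8.38490852142713
19.5 1.76020357176661
11.2 -2.3021899387053
5.9 -2.80588309017945
1.3 -3.02637424628303
6.8 -4.38318957891249

The two extreme points on the far right of the diagram likely imply the existence of two clusters or possibly some outliers. The linear link between the Residuals and Assets can be restored if these two points are eliminated. However, given the tiny dataset, it would be premature to conclude that there is a positive linear association between the Residuals and Assets variables.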

p. Without including the outlier(s), redo (a) to (o)


(a)

Mean
Standard dev
Min
Max
Q1
Q2
Q3

(b)
[Figure: box plots of Market Share, Accounts, and Assets; not rendered in the source workbook]

(c)
The boxplot of Market Share shows a very symmetric distribution.
The boxplot of Accounts shows that it is likely strongly positively skewed (to the right).
The boxplot of Assets shows that it is likely positively skewed (to the right) as well, but probably less so than Accounts.

(d)

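[Figure: scatterplot of Market Shares vs Accounts; x-axis: Accounts, y-axis: Market Shares]

(e)
There are positive linear relationships between each pair of variables: Market Share vs Accounts, Market Share vs Assets, and Accounts vs Assets.
Yet, there is a probable outlier influencing the other observations in the plots.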

(f)
Correlation Study

Market Share Accounts Assets
Market Share 100%
Accounts 71.18% 100%
Assets 71.07% 70.74% 100%

All 3 variables have strong positive linear correlations.

For example, the correlation (Market Share vs Accounts) = 71.18%.
The correlation (Market Share vs Assets) = 71.07%.
The correlation (Accounts vs Assets) = 70.74%.

(g)
Due to the significant correlations, multicollinearity could be a problem if we were to analyze the link between these three variables.

(h)
SUMMARY OUTPUT

Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations

ANOVA

Regression
Residual
Total

Intercept
Accounts
Assets

RESIDUAL OUTPUT
Observation
1
2
3
4
5
6
7
8

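We have:
β0 = 1.84526113461613
β1 = 0.0066302275554295
β2 = 0.156635034598039

Describe the significance of the estimated coefficients.

Tell whether the estimated coefficients are significant at the level of α=5%.
At the level of significance α=5%, the intercept (β0) is not significant (p-value = 43.7%), and neither slope coefficient is significant: β1 (p-value = 34.76%) and β2 (p-value = 35.16%).

*Remember: a coefficient is significant when its p-value < α, in which case H0 is rejected.

(i)
Some coefficients are not significant because of high multicollinearity between two independent variables (accounts and assets) or because th…

(j)
Market Share = β0 + β1*(Accounts) + β2*(Assets) + Error, with fitted equation Market Share = 1.845 + 0.0066*(Accounts) + 0.1566*(Assets)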

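(k)

According to the Regression Analysis Table:

Adjusted R Square 0.429589115403969
Standard Error 3.50057931584292

(l)
Find the residuals for the multiple regression used to predict market share with number of accounts and assets as explanatory variables.
According to the Regression Analysis Table: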

RESIDUAL OUTPUT
(m)

[Figure: graphical summary of the residuals; not rendered in the source workbook]

The residual distribution appears to be normal, as there are no outliers to be seen.

(n)

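Accounts Residuals
909 1.72286278747984
615 -0.400290423609167
205 5.93404952623161
428 0.662618296998293
590 -3.91140777981757
134 -0.85785833117211
130 -0.710816261799415
125 -2.43915781431148

The plot looks normal, and no linear relationship between the Residuals and Accounts is observed.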

(o)
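Assets Residuals
21.1 1.72286278747984
38.8 -0.400290423609167
5.5 5.93404952623161
19.5 0.662618296998293
11.2 -3.91140777981757
5.9 -0.85785833117211
1.3 -0.710816261799415
6.8 -2.43915781431148

The plot looks normal, and no linear relationship between the Residuals and Assets is observed.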

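q. Do you think that the model is a good one? Propose another model that has higher adjusted
R-Square and lower regression standard error.

Whether we include or exclude the outliers, we can observe that the aforementioned model is not significant because all
estimated coefficients are not significant.
As a result, neither analysis nor prediction can be done using this model.
We are aware that the coefficients are not significant because of multicollinearity. We can then decrease the number of
independent variables in the model to increase the significance of the coefficients.
Because the correlation between assets and market share in the original data is stronger than the correlation between
accounts and market share, assets will remain the single independent variable in our analysis.
We will perform the analysis using the original data and contrast the R Square and the Regression Standard Error.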

SUMMARY OUTPUT

Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations

ANOVA

Regression
Residual
Total

Intercept
Assets
RESIDUAL OUTPUT

Observation
1
2
3
4
5
6
7
8
9
10

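We have:
β0 = 5.08364415690648
β1 = 0.079254873095349

Describe the significance of the estimated coefficients.

Tell whether the estimated coefficients are significant at the level of α=5%.
At the level of significance α=5%, the intercept (β0) is significant (p-value = 3.2%), and the slope coefficient (β1) is significant (p-value = 0.78%).
*Remember: a coefficient is significant when its p-value < α, in which case H0 is rejected.

Market Share = β0 + β1*(Assets) + Error, with fitted equation Market Share = 5.084 + 0.0793*(Assets)

In conclusion, this model is better than the previous model, both for analysis and prediction.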


Individual variables.

Market Share Accounts Assets
Mean 8.96 793.6 48.91
Standard dev 7.73724 886.33367 76.16389
Min 1.3 125 1.3
Max 27.5 2500 219
Q1 3 151.75 6.125
Q2 8.85 509 15.35
Q3 11.2 835.5 34.375
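
These summary statistics can be reproduced with Python's standard library. The sketch assumes the quartiles were computed with Excel's inclusive method, which matches `method="inclusive"` in `statistics.quantiles`. Only the Market Share column is shown; the `summarize` helper is illustrative.

```python
import statistics

# Market Share column copied from the brokerage table (all ten brokerages).
market_share = [27.5, 12.9, 11.6, 10, 9.3, 8.4, 3.6, 2.8, 2.2, 1.3]


def summarize(values):
    """Return mean, sample standard deviation, min, max, and quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q2": q2,
        "q3": q3,
    }


summary = summarize(market_share)
for name, value in summary.items():
    print(f"{name}: {value:.5f}")
```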



Regression Statistics
Multiple R 0.78022081343515
R Square 0.608744517717408
Adjusted R Square 0.496957237065238
Standard Error 5.4876770230626
Observations 10

ANOVA
df SS MS F
Regression 2 327.981806233856 163.990903116928 5.44556155374726
Residual 7 210.802193766144 30.1145991094492
Total 9 538.784

Coefficients Standard Error t Stat P-value
Intercept 5.1593630976492 2.89856280843925 1.77997284813963 0.118299636815563
Accounts -0.000312264192560271 0.0082586524160534 -0.0378105502967146 0.970894448288601
Assets 0.082773456666666 0.09610750340693 0.86125904567715 0.417617753920517

RESIDUAL OUTPUT
Observation Predicted Market Share Residuals Standard Residuals
1 22.5060896262484 4.99391037375163 1.03186869129023
2 6.62203488227856 6.27796511772144 1.29718700680692
3 8.17893073789127 3.42106926210873 0.706879779192633
4 5.55060294984101 4.449397050159 0.919358412059177
5 17.6849085214271 -8.38490852142713 -1.73253501465895
6 6.63979642823339 1.76020357176661 0.363702753968073
7 5.9021899387053 -2.3021899387053 -0.475690899788564
8 5.60588309017945 -2.80588309017945 -0.579766694932046
9 5.22637424628303 -3.02637424628303 -0.625325766613733
10 5.68318957891249 -4.38318957891249 -0.905678267323732

β0 = 5.1593630976492
β1 = -0.00031226419256
β2 = 0.082773456666666
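
With these coefficients, predicted market shares and residuals follow directly from the fitted equation. A minimal sketch using the original ten-brokerage data; the names are illustrative.

```python
# Coefficients of the two-variable model, copied from the values above.
B0 = 5.1593630976492
B1 = -0.00031226419256   # Accounts
B2 = 0.082773456666666   # Assets

# (brokerage, market share, accounts, assets) from the brokerage table.
brokerages = [
    ("Charles Schwab", 27.5, 2500, 219),
    ("E*Trade", 12.9, 909, 21.1),
    ("TD Waterhouse", 11.6, 615, 38.8),
    ("Datek", 10, 205, 5.5),
    ("Fidelity", 9.3, 2300, 160),
    ("Ameritrade", 8.4, 428, 19.5),
    ("DLJ Direct", 3.6, 590, 11.2),
    ("Discover", 2.8, 134, 5.9),
    ("Suretrade", 2.2, 130, 1.3),
    ("National Discount Brokers", 1.3, 125, 6.8),
]

# Each row: (name, predicted market share, residual = observed - predicted).
rows = []
for name, share, accounts, assets in brokerages:
    predicted = B0 + B1 * accounts + B2 * assets
    rows.append((name, predicted, share - predicted))

for name, predicted, residual in rows:
    print(f"{name}: predicted {predicted:.4f}, residual {residual:.4f}")
```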



Adjusted R Square 0.496957237065238
Standard Error 5.4876770230626

s for the multiple regression used to predict market share with number of accounts and assets as explanatory variables.
to the Regression Analysis Table:

Residuals
4.99391037375163
6.27796511772144
3.42106926210873
4.449397050159
-8.38490852142713
1.76020357176661
-2.3021899387053
-2.80588309017945
-3.02637424628303
-4.38318957891249

[Figure: Residuals vs Accounts; x-axis: Accounts, y-axis: Residuals]
[Figure: Residuals vs Assets; x-axis: Assets, y-axis: Residuals]

Market Share Accounts Assets
Mean 6.60 392.00 13.76
Standard dev 4.63496 292.52204 12.27121
Min 1.3 125 1.3
Max 12.9 909 38.8
Q1 2.65 133 5.8
Q2 6 316.5 9
Q3 10.4 596.25 19.9


Regression Statistics
Multiple R 0.769781562431823
R Square 0.592563653859978
Adjusted R Square 0.429589115403969 (decreased, if we compare with the previous study)
Standard Error 3.50057931584292
Observations 8

ANOVA
df SS MS F
Regression 2 89.1097222674635 44.5548611337317 3.63592779260993
Residual 5 61.2702777325366 12.2540555465073
Total 7 150.38

Coefficients Standard Error t Stat P-value
Intercept 1.84526113461613 2.18655975093228 0.843910683817062 0.437220907160777
Accounts 0.0066302275554295 0.0063988003726177 1.03616727657299 0.347613175292493
Assets 0.156635034598039 0.152535048055267 1.02687898024123 0.351566153206588

RESIDUAL OUTPUT
Observation Predicted Market Share Residuals Standard Residuals
1 11.1771372125202 1.72286278747984 0.582337538243141
2 12.0002904236092 -0.400290423609167 -0.135300467083537
3 4.06595047376839 5.93404952623161 2.00574289376427
4 7.73738170300171 0.662618296998293 0.223968798138176
5 7.51140777981757 -3.91140777981757 -1.32207834199953
6 3.65785833117211 -0.85785833117211 -0.289961053408603
7 2.91081626179942 -0.710816261799415 -0.240259987648208
8 3.73915781431148 -2.43915781431148 -0.824449380005712

β0 = 1.84526113461613
β1 = 0.0066302275554295
β2 = 0.156635034598039


to the Regression Analysis Table:


Adjusted R Square 0.429589115403969
Standard Error 3.50057931584292

s for the multiple regression used to predict market share with number of accounts and assets as explanatory variables.
to the Regression Analysis Table:

Residuals
1.72286278747984
-0.400290423609167
5.93404952623161
0.662618296998293
-3.91140777981757
-0.85785833117211
-0.710816261799415
-2.43915781431148

[Figure: Residuals vs Accounts, outliers removed; x-axis: Accounts, y-axis: Residuals]
[Figure: Residuals vs Assets, outliers removed; x-axis: Assets, y-axis: Residuals]

PREVIOUS MODEL

Regression Statistics    Current model (Assets only)    Previous model
Multiple R               0.780169603394825              0.78022081343515
R Square                 0.608664610061239              0.608744517717408
Adjusted R Square        0.559747686318894      >       0.496957237065238
Standard Error           5.1337759828021        <       5.4876770230626
Observations             10                             10
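
The comparison of Adjusted R Square between the two models can be checked by hand. The sketch below assumes the standard adjustment formula, 1 - (1 - R²)(n - 1)/(n - k - 1), and assumes that the previous model used two predictors (Accounts and Assets) while the current one uses Assets only. The function name is illustrative, not part of the workbook.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R Square values as reported in the two regression summaries (10 observations each).
current_model = adjusted_r_squared(0.608664610061239, n_obs=10, n_predictors=1)
previous_model = adjusted_r_squared(0.608744517717408, n_obs=10, n_predictors=2)

print(f"current (Assets only):        {current_model:.6f}")
print(f"previous (Accounts + Assets): {previous_model:.6f}")
```

R Square is almost identical in both models, so the gain in Adjusted R Square comes from dropping a predictor that added no explanatory power.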

ANOVA
            df   SS                 MS                 F
Regression  1    327.938753267235   327.938753267235   12.4428227185175
Residual    8    210.845246732765   26.3556558415957
Total       9    538.784
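
The ANOVA figures are tied together by a few identities, which also reproduce the R Square and Standard Error reported in the regression statistics. The following is a minimal sketch using only the sums of squares and degrees of freedom from the table; the variable names are my own.

```python
import math

# Sums of squares and degrees of freedom as reported in the ANOVA table.
ss_regression = 327.938753267235
ss_residual = 210.845246732765
df_regression, df_residual = 1, 8

ss_total = ss_regression + ss_residual        # total sum of squares
ms_regression = ss_regression / df_regression  # mean square, regression
ms_residual = ss_residual / df_residual        # mean square, residual
f_statistic = ms_regression / ms_residual      # F = MS regression / MS residual
r_squared = ss_regression / ss_total           # share of variation explained
standard_error = math.sqrt(ms_residual)        # regression standard error

print(f_statistic, r_squared, standard_error)
```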

            Coefficients        Standard Error       t Stat             P-value
Intercept   5.08364415690648    1.96040307172376     2.59316271752038   0.0319556108003974
Assets      0.079254873095349   0.0224681087960817   3.52743854921918   0.00776100967540011

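
Each t Stat in the coefficient table is the estimate divided by its standard error. The short sketch below recomputes both values; the row labels "Intercept" and "Assets" are assumed from the single-predictor model described in the text.

```python
# (estimate, standard error) pairs as reported in the coefficient table.
coefficients = {
    "Intercept": (5.08364415690648, 1.96040307172376),
    "Assets": (0.079254873095349, 0.0224681087960817),
}

# t statistic for H0: coefficient = 0 is estimate / standard error.
t_stats = {name: est / se for name, (est, se) in coefficients.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.6f}")
```
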
Predicted Market Share Residuals Standard Residuals
22.4404613647879 5.05953863521209 1.04532241685809
6.75592197921834 6.14407802078166 1.26939291289331
8.15873323300602 3.44126676699398 0.710980497093639
5.5195459589309 4.4804540410691 0.92568105207
17.7644238521623 -8.46442385216232 -1.74878632942441
6.62911418226579 1.77088581773421 0.365872617335205
5.97129873557439 -2.37129873557439 -0.489920505421586
5.55124790816904 -2.75124790816904 -0.568419636669592
5.18667549193043 -2.98667549193043 -0.617059986827172
5.62257729395485 -4.32257729395485 -0.893063037907481
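
The Standard Residuals column appears to be each residual divided by the sample standard deviation of the residuals (n - 1 in the denominator). That convention is an assumption inferred from the numbers above, not something the workbook states. A minimal sketch:

```python
import statistics

# Residuals as listed in the residual output (actual minus predicted Market Share).
residuals = [
    5.05953863521209, 6.14407802078166, 3.44126676699398, 4.4804540410691,
    -8.46442385216232, 1.77088581773421, -2.37129873557439, -2.75124790816904,
    -2.98667549193043, -4.32257729395485,
]

# Sample standard deviation (divides by n - 1); assumed to match the worksheet.
residual_sd = statistics.stdev(residuals)
standard_residuals = [r / residual_sd for r in residuals]

for r, z in zip(residuals, standard_residuals):
    print(f"{r:12.6f} {z:10.6f}")
```

No standardized residual exceeds 2 in absolute value; the largest in magnitude is the fifth observation, at about -1.75.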

β0 = 5.08364415690648
β1 = 0.079254873095349
β2 = 0

ificances of the estimated coefficients.


Tell whether the estimated coefficients are significant at the level of α=5%.
At significance level α=5%, the Intercept (β0) is significant (p-value = 3.2%), and the slope coefficient (β1) is also significant (p-value = 0.78%).
A coefficient is significant when p-value < α, in which case H0 is rejected.
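
The decision rule can be written out explicitly. The sketch below applies it to the two p-values of the Assets-only model and, for contrast, to the 34.76% p-value quoted for β1 in the model without outliers; the function and dictionary names are illustrative.

```python
ALPHA = 0.05  # significance level used throughout the analysis


def is_significant(p_value: float, alpha: float = ALPHA) -> bool:
    """Reject H0 (coefficient = 0) when the p-value is below alpha."""
    return p_value < alpha


p_values = {
    "Intercept (β0), Assets-only model": 0.0319556108003974,
    "Assets (β1), Assets-only model": 0.00776100967540011,
    "β1, model without outliers": 0.3476,
}

decisions = {name: is_significant(p) for name, p in p_values.items()}
for name, significant in decisions.items():
    verdict = "reject H0 (significant)" if significant else "do not reject H0"
    print(f"{name}: {verdict}")
```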

Market Share = β0 + β1*Assets + Error = 5.084 + 0.079*Assets + Error
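
As an illustration of how the fitted line is used for prediction, the sketch below evaluates it with the unrounded coefficients. The input of 219 is the Assets value mentioned in the text; that it corresponds to the first row of the residual output is my inference from the numbers, not something the workbook states.

```python
# Unrounded estimates from the coefficient table.
BETA_0 = 5.08364415690648   # intercept
BETA_1 = 0.079254873095349  # slope on Assets


def predict_market_share(assets: float) -> float:
    """Point prediction from the fitted line; the error term has mean zero."""
    return BETA_0 + BETA_1 * assets


predicted = predict_market_share(219)
print(f"Predicted Market Share at Assets = 219: {predicted:.4f}")
```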

This model is better than the previous model, both for analysis and prediction.

tile) and for Assets at 219.

[Chart: Market Shares vs Assets Account; x-axis: Assets (0 to 250), y-axis: Accounts (0 to 3000)]

VS Assets and between Accounts VS Assets for all graphs.

hese three variables.


Significance F
0.0374637740775383

            Lower 95%             Upper 95%     Lower 95.0%   Upper 95.0%
Intercept   -1.69464881395113     12.01337501   -1.69464881   12.01337501
Accounts    -0.0198408739810355   0.019216346   -0.01984087   0.019216346
Assets      -0.144484676649397    0.31003159    -0.14448468   0.31003159

PROBABILITY OUTPUT

Percentile Market Share


5 1.3
15 2.2
25 2.8
35 3.6
45 8.4
55 9.3
65 10
75 11.6
85 12.9
95 27.5
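
The Percentile column in the probability output is consistent with the plotting position 100*(i - 0.5)/n applied to the sorted Market Share values. That formula is an assumption that happens to reproduce both the 10-observation and the 8-observation tables. A small sketch:

```python
def sample_percentiles(n: int) -> list[float]:
    """Plotting positions 100 * (i - 0.5) / n for i = 1..n."""
    return [100.0 * (i - 0.5) / n for i in range(1, n + 1)]


# Market Share values from the probability output (all 10 observations).
market_share = sorted([1.3, 2.2, 2.8, 3.6, 8.4, 9.3, 10, 11.6, 12.9, 27.5])

for pct, share in zip(sample_percentiles(len(market_share)), market_share):
    print(f"{pct:6.2f}  {share}")
```

With 8 observations the same formula gives 6.25, 18.75, and so on up to 93.75.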

unts and assets) or because there can be the existence of an outlier.

Market Share = 5.16 - 0.0003*Accounts + 0.0828*Assets + Error


ory variables.

e outliers. The linear link between


t would be premature to conclude
bles.


[Chart: Market Shares vs Assets Account; x-axis: Assets, y-axis: Accounts (0 to 1000)]

VS Assets and between Accounts VS Assets for all graphs.

hese three variables.

Significance F
0.105961822934921

            Lower 95%              Upper 95%     Lower 95.0%   Upper 95.0%
Intercept   -3.77546964366387      7.465991913   -3.77546964   7.465991913
Accounts    -0.0098184124522843    0.023078868   -0.00981841   0.023078868
Assets      -0.235468789230742     0.548738858   -0.23546879   0.548738858

PROBABILITY OUTPUT
Percentile Market Share
6.25 1.3
18.75 2.2
31.25 2.8
43.75 3.6
56.25 8.4
68.75 10
81.25 11.6
93.75 12.9

of α=5%.
nt (β1) is not significant (with p-value = 34.76%) and coefficient (β2)

unts and assets) or because there can be the existence of an outlier.

Market Share = 1.845 + 0.0066*Accounts + 0.1566*Assets + Error

ory variables.

Significance F
0.0077610096754001

            Lower 95%            Upper 95%     Lower 95.0%   Upper 95.0%
Intercept   0.56294656684454     9.604341747   0.562946567   9.604341747
Assets      0.0274433213013675   0.131066425   0.027443321   0.131066425

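
The 95% bounds can be reproduced as estimate ± t critical × standard error. In the sketch below the critical value 2.306004135 (two-sided 95%, 8 residual degrees of freedom) is taken from a standard t table rather than from the workbook, and the estimates and standard errors are those of the Assets-only model.

```python
# Two-sided 95% critical value of Student's t with 8 degrees of freedom.
# Assumption: looked up from a t table, not reported in the workbook.
T_CRIT_95_DF8 = 2.306004135

# (estimate, standard error) pairs from the Assets-only coefficient table.
coefficients = {
    "Intercept": (5.08364415690648, 1.96040307172376),
    "Assets": (0.079254873095349, 0.0224681087960817),
}

intervals = {
    name: (est - T_CRIT_95_DF8 * se, est + T_CRIT_95_DF8 * se)
    for name, (est, se) in coefficients.items()
}

for name, (lower, upper) in intervals.items():
    print(f"{name}: [{lower:.6f}, {upper:.6f}]")
```

Neither interval contains zero, which agrees with both coefficients being significant at α=5%.
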
PROBABILITY OUTPUT

Percentile Market Share


5 1.3
15 2.2
25 2.8
35 3.6
45 8.4
55 9.3
65 10
75 11.6
85 12.9
95 27.5

Market Share = 5.084 + 0.079*Assets + Error



[Chart: Accounts vs Assets; x-axis: Assets (0 to 250), y-axis: Accounts (0 to 3000)]

[Chart: Accounts vs Assets; x-axis: Assets (0 to 45), y-axis: Accounts (0 to 1000)]
