
Regression Assignment

By: Ashita Jain


B-17(MBA2)
AMSoM

reg = read.csv("Regression1.csv")
pairs(~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15, data=reg)
results = lm(B~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15, data=reg)
summary(results)
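The fifteen-term formula can also be built programmatically instead of being typed out. A minimal sketch using base R's reformulate(); the names A1 to A15 and B come from the calls above, and nothing is fitted here because the data file is not part of this document:

```r
# Build B ~ A1 + A2 + ... + A15 without typing every term.
predictors <- paste0("A", 1:15)
full_formula <- reformulate(predictors, response = "B")
print(full_formula)

# With the data loaded, the same model could then be fitted as (not run here):
# reg <- read.csv("Regression1.csv")
# results <- lm(full_formula, data = reg)
```

Dropping a predictor later is then a matter of removing its name from the predictors vector and rebuilding the formula.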

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.863e+03  4.109e+02   4.535 4.41e-05 ***
A1           2.073e+00  8.419e-01   2.462  0.01781 *
A2          -2.177e+00  6.753e-01  -3.224  0.00238 **
A3          -2.833e+00  1.771e+00  -1.600  0.11682
A4          -1.405e+01  7.747e+00  -1.813  0.07658 .
A5          -1.155e+02  6.201e+01  -1.862  0.06931 .
A6          -2.426e+01  1.121e+01  -2.164  0.03596 *
A7          -1.145e+00  1.467e+00  -0.780  0.43933
A8           1.004e-02  4.124e-03   2.435  0.01903 *
A9           3.533e+00  1.283e+00   2.754  0.00852 **
A10          5.245e-01  1.551e+00   0.338  0.73690
A11          2.659e-01  2.565e+00   0.104  0.91792
A12         -8.896e-01  4.525e-01  -1.966  0.05560 .
A13          1.868e+00  9.346e-01   1.999  0.05186 .
A14         -3.477e-02  1.423e-01  -0.244  0.80812
A15          5.329e-01  1.052e+00   0.507  0.61494
A3, A7, A10, A11, A14 and A15 are not significant.
Their p-values are above 0.1, so they are dropped.
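The elimination step can be automated. The sketch below is one possible version, not the procedure actually run for these slides: it drops the predictor with the largest p-value, refits, and repeats until every remaining p-value is at or below a cutoff. The 0.1 cutoff, the helper name backward_eliminate, and the built-in mtcars data (standing in for Regression1.csv) are assumptions made for illustration.

```r
# Backward elimination by p-value (illustrative helper, not from the slides).
backward_eliminate <- function(response, predictors, data, alpha = 0.1) {
  repeat {
    fit <- lm(reformulate(predictors, response = response), data = data)
    # p-values of the slopes only (row 1 is the intercept)
    p <- summary(fit)$coefficients[-1, "Pr(>|t|)", drop = FALSE]
    worst <- which.max(p[, 1])
    # stop when the weakest predictor passes the cutoff or only one is left
    if (p[worst, 1] <= alpha || length(predictors) == 1) return(fit)
    predictors <- setdiff(predictors, rownames(p)[worst])
  }
}

# mtcars is a stand-in data set; mpg plays the role of B.
fit <- backward_eliminate("mpg", c("cyl", "disp", "hp", "drat", "wt", "qsec"), mtcars)
print(summary(fit)$coefficients)
```

Removing one variable per refit matters because p-values change once a correlated predictor leaves the model.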

results = lm(B~A1+A2+A4+A5+A6+A8+A9+A12+A13, data=reg)
summary(results)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.397e+03  2.906e+02   4.807 1.44e-05 ***
A1           1.436e+00  7.803e-01   1.840 0.071698 .
A2          -2.021e+00  5.126e-01  -3.943 0.000251 ***
A4          -5.846e+00  6.561e+00  -0.891 0.377220
A5          -7.383e+01  5.683e+01  -1.299 0.199878
A6          -2.141e+01  7.513e+00  -2.850 0.006330 **
A8           8.727e-03  3.324e-03   2.626 0.011442 *
A9           3.874e+00  9.251e-01   4.188 0.000114 ***
A12         -7.555e-01  3.101e-01  -2.437 0.018432 *
A13          1.606e+00  5.985e-01   2.683 0.009855 **
A4 and A5 are not significant.
Their p-values are above 0.1, so they are dropped.

results = lm(B~A1+A2+A5+A6+A8+A9+A12+A13, data=reg)
summary(results)
results = lm(B~A1+A2+A6+A8+A9+A12+A13, data=reg)
summary(results)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.034e+03  8.190e+01  12.626  < 2e-16 ***
A1           1.219e+00  6.206e-01   1.964 0.054876 .
A2          -1.672e+00  4.257e-01  -3.927 0.000254 ***
A6          -1.578e+01  6.126e+00  -2.576 0.012873 *
A8           9.285e-03  3.227e-03   2.877 0.005811 **
A9           4.081e+00  5.676e-01   7.190 2.46e-09 ***
A12         -7.372e-01  3.083e-01  -2.391 0.020444 *
A13          1.576e+00  5.889e-01   2.677 0.009914 **

Scaling Variables
reg1 = reg
reg1$A1 = scale(reg1$A1)
reg1$A2 = scale(reg1$A2)
reg1$A3 = scale(reg1$A3)
reg1$A4 = scale(reg1$A4)
reg1$A5 = scale(reg1$A5)
reg1$A6 = scale(reg1$A6)
reg1$A7 = scale(reg1$A7)
reg1$A8 = scale(reg1$A8)
reg1$A9 = scale(reg1$A9)
reg1$A10 = scale(reg1$A10)
reg1$A11 = scale(reg1$A11)
reg1$A12 = scale(reg1$A12)
reg1$A13 = scale(reg1$A13)
reg1$A14 = scale(reg1$A14)
reg1$A15 = scale(reg1$A15)

results1 = lm(B~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15, data=reg1)
summary(results1)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  940.313      4.175 225.238  < 2e-16 ***
A1            20.694      8.406   2.462  0.01781 *
A2           -26.074      8.086  -3.224  0.00238 **
A3           -13.504      8.442  -1.600  0.11682
A4           -20.577     11.346  -1.813  0.07658 .
A5           -15.616      8.387  -1.862  0.06931 .
A6           -20.508      9.479  -2.164  0.03596 *
A7            -5.885      7.541  -0.780  0.43933
A8            14.598      5.996   2.435  0.01903 *
A9            31.511     11.441   2.754  0.00852 **
A10            2.427      7.177   0.338  0.73690
A11            1.106     10.672   0.104  0.91792
A12          -81.827     41.616  -1.966  0.05560 .
A13           86.591     43.327   1.999  0.05186 .
A14           -2.204      9.020  -0.244  0.80812
A15            2.909      5.743   0.507  0.61494
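Compared with the unscaled fit, the t values and p-values of the predictors are unchanged. Only the estimates and standard errors move. The sketch below reproduces that behaviour on the built-in mtcars data, which is an assumption standing in for Regression1.csv, with wt and hp playing the role of the A variables.

```r
# mtcars stands in for the assignment's data (assumption for illustration).
vars <- c("wt", "hp")
raw_fit <- lm(mpg ~ wt + hp, data = mtcars)

scaled <- mtcars
scaled[vars] <- lapply(mtcars[vars], function(x) as.numeric(scale(x)))
scaled_fit <- lm(mpg ~ wt + hp, data = scaled)

# Each slope is multiplied by its predictor's standard deviation ...
print(coef(scaled_fit)[vars] / coef(raw_fit)[vars])
print(sapply(mtcars[vars], sd))

# ... the intercept becomes the mean of the response ...
print(coef(scaled_fit)["(Intercept)"])
print(mean(mtcars$mpg))

# ... and the t values (hence p-values) of the slopes do not change.
print(summary(raw_fit)$coefficients[vars, "t value"])
print(summary(scaled_fit)$coefficients[vars, "t value"])
```

Because scaled slopes are all expressed per standard deviation of their predictor, their magnitudes can be compared with each other, which is not true of the raw estimates.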

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  940.313      4.182 224.846  < 2e-16 ***
A1            12.171      6.197   1.964 0.054876 .
A2           -20.020      5.097  -3.927 0.000254 ***
A6           -13.340      5.178  -2.576 0.012873 *
A8            13.502      4.693   2.877 0.005811 **
A9            36.400      5.063   7.190 2.46e-09 ***
A12          -67.810     28.357  -2.391 0.020444 *
A13           73.080     27.299   2.677 0.009914 **
Remove predictors one at a time, starting with the highest p-value.
A3, A4, A5, A7, A10, A11, A14 and A15 are dropped.

Scaling Variables Including Dependent Variable


reg2 = reg
reg2$B = scale(reg2$B)
reg2$A1 = scale(reg2$A1)
reg2$A2 = scale(reg2$A2)
reg2$A3 = scale(reg2$A3)
reg2$A4 = scale(reg2$A4)
reg2$A5 = scale(reg2$A5)
reg2$A6 = scale(reg2$A6)
reg2$A7 = scale(reg2$A7)
reg2$A8 = scale(reg2$A8)
reg2$A9 = scale(reg2$A9)
reg2$A10 = scale(reg2$A10)
reg2$A11 = scale(reg2$A11)
reg2$A12 = scale(reg2$A12)
reg2$A13 = scale(reg2$A13)
reg2$A14 = scale(reg2$A14)
reg2$A15 = scale(reg2$A15)
reg2

results2 = lm(B~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15, data=reg2)
summary(results2)
results2 = lm(B~A1+A2+A6+A8+A9+A12+A13, data=reg2)
summary(results2)
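Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  940.313      4.182 224.846  < 2e-16 ***
A1            12.171      6.197   1.964 0.054876 .
A2           -20.020      5.097  -3.927 0.000254 ***
A6           -13.340      5.178  -2.576 0.012873 *
A8            13.502      4.693   2.877 0.005811 **
Note: these estimates are identical to those of the model with unscaled B (the intercept is still 940.313, the mean of B), so B was not standardized when this output was produced. This happens when the scaled values are assigned to a separate object rather than to reg2$B. With B standardized, the intercept would be approximately 0 and each coefficient would be divided by the standard deviation of B.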
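When the response is standardized as well, the slopes become standardized (beta) coefficients and the intercept is zero up to rounding error. A minimal sketch on the built-in mtcars data, which is an assumption standing in for Regression1.csv, with mpg playing the role of B:

```r
# Standardize response and predictors in one step (mtcars is a stand-in data set).
vars <- c("wt", "hp")
std <- as.data.frame(scale(mtcars[c("mpg", vars)]))

raw_fit <- lm(mpg ~ wt + hp, data = mtcars)
std_fit <- lm(mpg ~ wt + hp, data = std)

# Intercept is numerically zero once every variable is centered.
print(coef(std_fit))

# Standardized slope = raw slope * sd(x) / sd(y)
beta_from_raw <- coef(raw_fit)[vars] * sapply(mtcars[vars], sd) / sd(mtcars$mpg)
print(beta_from_raw)
```

The t values and p-values are again the same as in the raw fit, so standardizing changes interpretation, not significance.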

Calculating Leverage

lev=hat(model.matrix(results))
plot(lev)

reg[lev>0.2,]
   Sno A1 A2 A3   A4   A5   A6   A7   A8   A9  A10  A11 A12 A13 A14 A15     B
7    7 43 30 74 10.9 3.23 12.1 83.9 4679  3.5 49.2 11.3  21  32  62  56 934.7
29  29 11 53 68  9.2 2.99 12.1 90.6 4700  7.8 48.9 12.3 648 319 130  47 861.8
32  32 60 67 82 10.0 2.98 11.5 88.6 4657 13.6 47.3 22.4   3   1   1  60 861.4
40  40 36 29 72  9.5 3.32 10.6 77.6 3437  8.1 45.5 13.8  45  59 263  56 991.2
47  47 10 55 70  7.3 3.11 12.1 88.9 3033  5.9 51.0 14.0 144  66  20  61 839.7
48  48 18 48 63  9.2 2.92 12.2 87.7 4253 13.7 51.2 12.0 311 171  86  71 911.7
49  49 13 49 68  7.0 3.36 12.2 90.7 2702  3.0 51.9  9.7 105  32   3  71 790.7
55  55 41 37 78  6.2 3.25 12.3 89.5 5308 25.9 59.7 10.3  65  28 102  52 967.8
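The rows above were selected with a fixed cutoff of 0.2. A common alternative, shown here only as a sketch, is the rule of thumb 2p/n, where p is the number of estimated coefficients and n the number of observations. The example uses hatvalues() from base R on the built-in mtcars data, which is an assumption standing in for Regression1.csv.

```r
# Leverage with a data-dependent cutoff (mtcars is a stand-in data set).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
lev <- hatvalues(fit)            # diagonal of the hat matrix

p <- length(coef(fit))           # coefficients, including the intercept
n <- nrow(mtcars)
cutoff <- 2 * p / n              # rule-of-thumb threshold

# Rows whose leverage exceeds the cutoff
high <- mtcars[lev > cutoff, c("mpg", "wt", "hp", "qsec")]
print(round(sort(lev, decreasing = TRUE)[1:5], 3))
print(high)

plot(lev, ylab = "Leverage")
abline(h = cutoff, lty = 2)
```

Leverages always sum to p, so their average is p/n. The rule flags points with more than twice the average.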

Diagnosing Residuals
par(mfrow=c(1,5))
plot(reg$A1, results$residuals)
plot(reg$A2, results$residuals)
plot(reg$A3, results$residuals)
plot(reg$A4, results$residuals)
plot(reg$A5, results$residuals)
plot(reg$A6, results$residuals)
plot(reg$A7, results$residuals)
plot(reg$A8, results$residuals)
plot(reg$A9, results$residuals)
plot(reg$A10, results$residuals)
plot(reg$A11, results$residuals)
plot(reg$A12, results$residuals)
plot(reg$A13, results$residuals)
plot(reg$A14, results$residuals)
plot(reg$A15, results$residuals)
plot(results$fitted.values, results$residuals)
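The repeated plot() calls can be collapsed into a loop. A minimal sketch on the built-in mtcars data, which is an assumption standing in for Regression1.csv; the predictor names here are mtcars columns, not the A variables:

```r
# One residual-vs-predictor panel per variable (mtcars is a stand-in data set).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
predictors <- c("wt", "hp", "qsec")

old_par <- par(mfrow = c(1, length(predictors)))  # remember the old layout
for (v in predictors) {
  plot(mtcars[[v]], residuals(fit), xlab = v, ylab = "Residuals")
  abline(h = 0, lty = 2)                          # reference line at zero
}
par(old_par)                                      # restore the layout
```

A patternless band around the zero line in each panel is what a well-specified linear model should produce. Curvature or a funnel shape points to a missing transformation.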

Plot Studentized Residuals vs. Fitted Values

Suggested power transformation: 0.5839741
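The call that produced this plot is not shown. The "Suggested power transformation" line has the form printed by spreadLevelPlot() in the car package, but that is an inference. The base-R sketch below plots studentized residuals against fitted values and computes a rough power estimate as one minus the slope of log absolute studentized residuals on log fitted values. It uses ordinary least squares on the built-in mtcars data (an assumption standing in for Regression1.csv). car may fit the line differently, so the numbers need not match its output.

```r
# Studentized residuals vs. fitted values (mtcars is a stand-in data set).
fit <- lm(mpg ~ wt + hp, data = mtcars)
stud <- rstudent(fit)

plot(fitted(fit), stud,
     xlab = "Fitted values", ylab = "Studentized residuals")
abline(h = c(-2, 0, 2), lty = c(3, 2, 3))   # zero line and rough +/-2 bands

# Rough spread-level estimate: power = 1 - slope of log|residual| on log(fitted).
# Requires positive fitted values, which holds for this example.
spread_fit <- lm(log(abs(stud)) ~ log(fitted(fit)))
suggested_power <- 1 - unname(coef(spread_fit)[2])
print(suggested_power)
```

A suggested power near 1 means no transformation is needed. A value near 0.5, as reported above, points toward a square-root transformation of the response.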

qqnorm(results$res)
qqline(results$res)
hist(results$res)

Test Of Multicollinearity
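library(car)
vif(results)
       A1        A2        A6        A8        A9       A12       A13
 2.158907  1.460934  1.507709  1.238291  1.440988 45.211507 41.899843
A VIF greater than 10 for a variable suggests strong multicollinearity. Here A12 (45.2) and A13 (41.9) both exceed 10, so these two predictors are strongly collinear with the remaining predictors, most likely with each other.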
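The VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The sketch below computes it two ways in base R, without the car package, on the built-in mtcars data (an assumption standing in for Regression1.csv). The variable names are mtcars columns, not the A variables.

```r
# VIF by hand for a model with numeric predictors (mtcars is a stand-in data set).
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
X <- model.matrix(fit)[, -1, drop = FALSE]   # predictors without the intercept column

# Definition: regress each predictor on the others, VIF = 1 / (1 - R^2).
vif_by_regression <- sapply(colnames(X), function(v) {
  others <- X[, colnames(X) != v, drop = FALSE]
  r2 <- summary(lm(X[, v] ~ others))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix.
vif_by_correlation <- diag(solve(cor(X)))

print(round(vif_by_regression, 3))
print(round(vif_by_correlation, 3))
```

A VIF of 45 corresponds to an R^2 of about 0.978, meaning almost all of that predictor's variation is explained by the other predictors.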
