Project Business Report - Regression
Code:
# packages used in the report; install any that are missing, then load them all
toload_libraries <- c("reshape2", "rpsychi", "car", "psych", "corrplot", "forecast", "GPArotation", "psy",
"MVN", "DataExplorer", "ppcor", "Metrics", "foreign", "MASS", "lattice", "nortest", "Hmisc","factoextra",
"nFactors")
new.packages <- toload_libraries[!(toload_libraries %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(toload_libraries, require, character.only= TRUE)
a) Dataset Introduction
Setting the working directory and loading the dataset
Code:
setwd("C:/Users/ashishj/Desktop/great lakes/R/datasets")
mydata=read.csv("Factor-Mydata-Revised.csv")
dim(mydata)   # checking the dimensionality of the dataset
## [1] 100 13
Checking for any missing values in the dataset
Code:
sapply(mydata,function(x) sum(is.na(x)))
          ID     ProdQual         Ecom      TechSup      CompRes  Advertising     ProdLine
           0            0            0            0            0            0            0
 SalesFImage   ComPricing   WartyClaim   OrdBilling     DelSpeed Satisfaction
           0            0            0            0            0            0
b) Dataset Summarization
Checking the structure of the Dataset
Code:
str(mydata)
## 'data.frame': 100 obs. of 13 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ ProdQual : num 8.5 8.2 9.2 6.4 9 6.5 6.9 6.2 5.8 6.4 ...
## $ Ecom : num 3.9 2.7 3.4 3.3 3.4 2.8 3.7 3.3 3.6 4.5 ...
## $ TechSup : num 2.5 5.1 5.6 7 5.2 3.1 5 3.9 5.1 5.1 ...
## $ CompRes : num 5.9 7.2 5.6 3.7 4.6 4.1 2.6 4.8 6.7 6.1 ...
## $ Advertising : num 4.8 3.4 5.4 4.7 2.2 4 2.1 4.6 3.7 4.7 ...
## $ ProdLine : num 4.9 7.9 7.4 4.7 6 4.3 2.3 3.6 5.9 5.7 ...
## $ SalesFImage : num 6 3.1 5.8 4.5 4.5 3.7 5.4 5.1 5.8 5.7 ...
## $ ComPricing : num 6.8 5.3 4.5 8.8 6.8 8.5 8.9 6.9 9.3 8.4 ...
## $ WartyClaim : num 4.7 5.5 6.2 7 6.1 5.1 4.8 5.4 5.9 5.4 ...
## $ OrdBilling : num 5 3.9 5.4 4.3 4.5 3.6 2.1 4.3 4.4 4.1 ...
## $ DelSpeed : num 3.7 4.9 4.5 3 3.5 3.3 2 3.7 4.6 4.4 ...
## $ Satisfaction: num 8.2 5.7 8.9 4.8 7.1 4.7 5.7 6.3 7 5.5 ...
A five-number summary of the variables in the dataset (summary(mydata)) is used to spot obvious
outliers before going deeper
c) Histogram Plots
plot_histogram(mydata)
d) Density Plots
plot_density(mydata)
Code:
skewdata=skew(mydata1)
list=names(mydata1)
skewlist=data.frame(list,skewdata)
skewlist
list skewdata
1 ProdQual -0.237215714
2 Ecom 0.640710684
3 TechSup -0.197201529
4 CompRes -0.131763526
5 Advertising 0.042299656
6 ProdLine -0.089689444
7 SalesFImage 0.365660982
8 ComPricing -0.232782461
9 WartyClaim 0.008120531
10 OrdBilling -0.323600855
11 DelSpeed -0.449292744
12 Satisfaction 0.075851399
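The density plots and skewness values show that some variables are left skewed, such as Delivery Speed (-0.45) and, to a lesser extent, Tech Support (-0.20).
Sales Force Image is right skewed (0.37).
Some are bimodal, such as Product Quality and Warranty Claims.
Most of the others broadly resemble a normal distribution, for example Ecommerce and Complaint Resolution, although Ecommerce has the largest positive skewness (0.64).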
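To make the skewness figures easier to interpret, here is a minimal base-R sketch of one common definition, the moment-based coefficient (third central moment divided by the variance raised to 3/2). The helper name skew_moment is hypothetical, and psych::skew() may apply a slightly different small-sample adjustment, so the values need not match the table exactly. The ten DelSpeed values are the ones visible in the str() output and serve only as an illustration.

```r
# Moment-based sample skewness: third central moment over variance^(3/2).
# Hypothetical helper, not part of the report's code.
skew_moment <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# First ten DelSpeed values shown in the str() output (illustration only)
delspeed_head <- c(3.7, 4.9, 4.5, 3.0, 3.5, 3.3, 2.0, 3.7, 4.6, 4.4)
skew_moment(delspeed_head)

# Sanity checks: symmetric data gives 0, a long right tail gives a positive value
sym_skew   <- skew_moment(c(1, 2, 3, 4, 5))
right_skew <- skew_moment(c(1, 1, 1, 2, 10))
```

A negative result points to a longer left tail and a positive one to a longer right tail, which is how the signs in the skewlist table are read.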
e) Correlations
## Extracting the sub-dataset of relevant variables
## The ID column has been removed as it adds no value
As the correlation matrix shows:
1. CompRes and DelSpeed are highly correlated (0.87)
2. TechSup and WartyClaim are highly correlated (0.80)
3. Ecom and SalesFImage are highly correlated (0.79)
4. CompRes and OrdBilling are highly correlated (0.76)
5. OrdBilling and DelSpeed are highly correlated (0.75)
# listing every variable pair once, sorted by absolute correlation
cormatrix2=cor1
cormatrix2[lower.tri(cormatrix2,diag=TRUE)]=NA
cormatrix2=as.data.frame(as.table(cormatrix2))
cormatrix2=na.omit(cormatrix2)
cormatrix2=cormatrix2[order(-abs(cormatrix2$Freq)),]
cormatrix2
Var1 Var2 Freq
124 CompRes DelSpeed 0.8650916968
99 TechSup WartyClaim 0.7971679258
74 Ecom SalesFImage 0.7915437115
112 CompRes OrdBilling 0.7568685913
130 OrdBilling DelSpeed 0.7510030675
136 CompRes Satisfaction 0.6032626039
126 ProdLine DelSpeed 0.6018502083
143 DelSpeed Satisfaction 0.5770422745
64 CompRes ProdLine 0.5614169522
138 ProdLine Satisfaction 0.5505459359
77 Advertising SalesFImage 0.5422036582
142 OrdBilling Satisfaction 0.5217319124
139 SalesFImage Satisfaction 0.5002053063
90 ProdLine ComPricing -0.4949484016
133 ProdQual Satisfaction 0.4863249980
61 ProdQual ProdLine 0.4774934132
50 Ecom Advertising 0.4298907110
114 ProdLine OrdBilling 0.4244082496
85 ProdQual ComPricing -0.4012818841
137 Advertising Satisfaction 0.3046694747
134 Ecom Satisfaction 0.2827450147
125 Advertising DelSpeed 0.2758630832
102 ProdLine WartyClaim 0.2730775284
127 SalesFImage DelSpeed 0.2715512592
87 TechSup ComPricing -0.2707866821
91 SalesFImage ComPricing 0.2645965539
104 ComPricing WartyClaim -0.2449860542
76 CompRes SalesFImage 0.2297517611
86 Ecom ComPricing 0.2294624014
140 ComPricing Satisfaction -0.2082956889
117 WartyClaim OrdBilling 0.1970651213
52 CompRes Advertising 0.1969168472
115 SalesFImage OrdBilling 0.1951274057
63 TechSup ProdLine 0.1926254565
122 Ecom DelSpeed 0.1916360683
113 Advertising OrdBilling 0.1842355941
141 WartyClaim Satisfaction 0.1775448190
110 Ecom OrdBilling 0.1561473316
73 ProdQual SalesFImage -0.1518128743
100 CompRes WartyClaim 0.1404082967
38 Ecom CompRes 0.1401792611
13 ProdQual Ecom -0.1371632174
89 Advertising ComPricing 0.1342168943
88 CompRes ComPricing -0.1279542529
116 ComPricing OrdBilling -0.1145670257
135 TechSup Satisfaction 0.1125971788
129 WartyClaim DelSpeed 0.1093946024
103 SalesFImage WartyClaim 0.1074553447
37 ProdQual CompRes 0.1063700009
109 ProdQual OrdBilling 0.1043030736
39 TechSup CompRes 0.0966565978
25 ProdQual TechSup 0.0956004542
97 ProdQual WartyClaim 0.0883123063
111 TechSup OrdBilling 0.0801018246
128 ComPricing DelSpeed -0.0728717289
51 TechSup Advertising -0.0628700668
78 ProdLine SalesFImage -0.0613155277
49 ProdQual Advertising -0.0534731340
62 Ecom ProdLine -0.0526878383
98 Ecom WartyClaim 0.0518981915
121 ProdQual DelSpeed 0.0277180027
123 TechSup DelSpeed 0.0254406935
75 TechSup SalesFImage 0.0169905395
65 Advertising ProdLine -0.0115508187
101 Advertising WartyClaim 0.0107920743
26 Ecom TechSup 0.0008667887
Code:
cor1=cor(mydata1)
corrplot(cor1,type = "upper",method = "number")
print(cor1,digits=2)
3. Evidence of Multicollinearity
# modelling the response variable against the predictors using linear regression
model1 = lm(Satisfaction ~ . , data = mydata1)
summary(model1)
▪ The adjusted R-squared of the model is 0.7774, meaning that the independent variables explain
about 78% of the variance of the dependent variable.
▪ Only 3 of the 11 independent variables are significant at the 0.05 level: ProdQual, Ecom and
SalesFImage.
▪ The p-value of the F-statistic is less than 0.05 (the level of significance), which means the model
as a whole is significant: at least one of the predictor variables is significantly related to the
outcome variable.
▪ Our model equation can be written as:
Satisfaction = -0.67 + 0.37*ProdQual - 0.44*Ecom + 0.03*TechSup + 0.17*CompRes -
0.03*Advertising + 0.14*ProdLine + 0.81*SalesFImage - 0.04*ComPricing - 0.10*WartyClaim +
0.15*OrdBilling + 0.17*DelSpeed
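As an illustration of how the fitted equation is used, the sketch below applies the estimated coefficients to the first respondent. The coefficient values are copied from the summary(model1) output and the predictor values from the first entries of the str() output; the object names coefs and row1 are hypothetical. In practice predict(model1) would do the same for every row.

```r
# Coefficients copied from the summary(model1) output in the report
coefs <- c("(Intercept)" = -0.66961, ProdQual = 0.37137, Ecom = -0.44056,
           TechSup = 0.03299, CompRes = 0.16703, Advertising = -0.02602,
           ProdLine = 0.14034, SalesFImage = 0.80611, ComPricing = -0.03853,
           WartyClaim = -0.10298, OrdBilling = 0.14635, DelSpeed = 0.16570)

# Predictor values of respondent 1, as shown in the str(mydata) output
row1 <- c(ProdQual = 8.5, Ecom = 3.9, TechSup = 2.5, CompRes = 5.9,
          Advertising = 4.8, ProdLine = 4.9, SalesFImage = 6.0,
          ComPricing = 6.8, WartyClaim = 4.7, OrdBilling = 5.0, DelSpeed = 3.7)

# Linear predictor: intercept plus the sum of coefficient * value
pred1 <- unname(coefs["(Intercept)"] + sum(coefs[names(row1)] * row1))
pred1          # predicted Satisfaction for respondent 1
8.2 - pred1    # residual against the observed rating of 8.2
```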
model1=lm(Satisfaction~.,data = mydata1)
summary(model1)
Call:
lm(formula = Satisfaction ~ ., data = mydata1)
Residuals:
Min 1Q Median 3Q Max
-1.43005 -0.31165 0.07621 0.37190 0.90120
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.66961 0.81233 -0.824 0.41199
ProdQual 0.37137 0.05177 7.173 2.18e-10 ***
Ecom -0.44056 0.13396 -3.289 0.00145 **
TechSup 0.03299 0.06372 0.518 0.60591
CompRes 0.16703 0.10173 1.642 0.10416
Advertising -0.02602 0.06161 -0.422 0.67382
ProdLine 0.14034 0.08025 1.749 0.08384 .
SalesFImage 0.80611 0.09775 8.247 1.45e-12 ***
ComPricing -0.03853 0.04677 -0.824 0.41235
WartyClaim -0.10298 0.12330 -0.835 0.40587
OrdBilling 0.14635 0.10367 1.412 0.16160
DelSpeed 0.16570 0.19644 0.844 0.40124
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
To check for multicollinearity among the predictors in the mydata dataset, Variance Inflation Factors
(VIF) were used.
A VIF above 4 for any variable suggests the presence of multicollinearity among the predictor
variables; here CompRes (4.73) and DelSpeed (6.52) exceed that threshold.
vif(model1)
   ProdQual        Ecom     TechSup     CompRes Advertising    ProdLine
   1.635797    2.756694    2.976796    4.730448    1.508933    3.488185
SalesFImage  ComPricing  WartyClaim  OrdBilling    DelSpeed
   3.439420    1.635000    3.198337    2.902999    6.516014
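The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The following base-R sketch shows the calculation without the car package. The helper vif_manual is hypothetical, and the built-in mtcars data is only a stand-in for the report's predictors (mydata1 without Satisfaction).

```r
# Sketch: VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j
# on all remaining predictors. Hypothetical helper, not the report's code.
vif_manual <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    fit <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(fit)$r.squared)
  })
}

# mtcars is a stand-in dataset used only to make the example runnable
X <- mtcars[, c("disp", "hp", "wt", "qsec")]
vifs <- vif_manual(X)
round(vifs, 2)
names(vifs)[vifs > 4]   # predictors above the threshold of 4 used in the report
```

The same numbers appear on the diagonal of the inverse of the predictors' correlation matrix, which is a quick cross-check.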
Code:
mydata2=mydata[,2:12]
cor2=cor(mydata2)
KMO(cor2)
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = cor2)
Overall MSA = 0.65
MSA for each item =
   ProdQual        Ecom     TechSup     CompRes Advertising    ProdLine
       0.51        0.63        0.52        0.79        0.78        0.62
SalesFImage  ComPricing  WartyClaim  OrdBilling    DelSpeed
       0.62        0.75        0.51        0.76        0.67
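The MSA compares the squared correlations with the squared partial correlations: values near 1 mean the variables share enough common variance for factoring. Below is a minimal base-R sketch of that ratio. The helper kmo_manual is hypothetical, and mtcars again stands in for the report's data, so the numbers are not those of the table above.

```r
# Sketch of the Kaiser-Meyer-Olkin measure from a correlation matrix R.
# Partial correlations are obtained from the inverse of R.
kmo_manual <- function(R) {
  Rinv <- solve(R)
  Q <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))  # partial correlations
  diag(Q) <- 0
  R0 <- R
  diag(R0) <- 0                                      # off-diagonal correlations only
  list(overall  = sum(R0^2) / (sum(R0^2) + sum(Q^2)),
       per_item = colSums(R0^2) / (colSums(R0^2) + colSums(Q^2)))
}

# Stand-in data used only to make the example runnable
R_demo <- cor(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])
kmo_demo <- kmo_manual(R_demo)
round(kmo_demo$overall, 2)
round(kmo_demo$per_item, 2)
```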
➢ Step 2: The number of factors (nfactors) is kept at 4, based on the eigenvalues and the scree plot,
and the factors are extracted with the fa function
Summary :
▪ The first 4 factors have an eigenvalue > 1 and together explain almost 69% of the variance.
▪ We can effectively reduce the dimensionality from 11 to 4 while losing only about 31% of the
variance.
▪ Factor 1 accounts for 29.20% of the variance; Factor 2 accounts for 20.20% of the
variance; Factor 3 accounts for 13.60% of the variance; Factor 4 accounts for 6% of the
variance.
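The eigenvalue rule and the variance shares can be reproduced from any correlation matrix with base R. The sketch below uses mtcars as a stand-in (an assumption made only to keep the example runnable); with the report's data the same steps would be applied to cor2. The last line simply re-adds the four percentages quoted above.

```r
# Stand-in correlation matrix; the report would use cor2 = cor(mydata2)
R_demo <- cor(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")])

ev   <- eigen(R_demo)$values      # eigenvalues, largest first
prop <- ev / sum(ev)              # share of total variance per component

variance_table <- data.frame(component  = seq_along(ev),
                             eigenvalue = round(ev, 3),
                             proportion = round(prop, 3),
                             cumulative = round(cumsum(prop), 3))
variance_table

n_keep <- sum(ev > 1)             # Kaiser criterion: keep eigenvalues above 1
n_keep

# Shares reported for the four factors in the summary above
reported_total <- sum(c(29.2, 20.2, 13.6, 6.0))
reported_total
```

Because the eigenvalues of a correlation matrix sum to the number of variables, each eigenvalue divided by that count is the proportion of variance it explains.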
Factor Analysis using method = pa
Call: fa(r = mydata2, nfactors = 4, rotate = "none", fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 PA3 PA4 h2 u2 com
ProdQual 0.20 -0.41 -0.06 0.46 0.42 0.576 2.4
Ecom 0.29 0.66 0.27 0.22 0.64 0.362 2.0
TechSup 0.28 -0.38 0.74 -0.17 0.79 0.205 1.9
CompRes 0.86 0.01 -0.26 -0.18 0.84 0.157 1.3
Advertising 0.29 0.46 0.08 0.13 0.31 0.686 1.9
ProdLine 0.69 -0.45 -0.14 0.31 0.80 0.200 2.3
SalesFImage 0.39 0.80 0.35 0.25 0.98 0.021 2.1
ComPricing -0.23 0.55 -0.04 -0.29 0.44 0.557 1.9
WartyClaim 0.38 -0.32 0.74 -0.15 0.81 0.186 2.0
OrdBilling 0.75 0.02 -0.18 -0.18 0.62 0.378 1.2
DelSpeed 0.90 0.10 -0.30 -0.20 0.94 0.058 1.4
The degrees of freedom for the null model are 55 and the objective function was 6.55 with Chi Square of 619.27
The degrees of freedom for the model are 17 and the objective function was 0.33
The harmonic number of observations is 100 with the empirical chi square 3.19 with prob < 1
The total number of observations was 100 with Likelihood Chi Square = 30.27 with prob < 0.024
➢ Step 6: Using the fa diagram to compare the solutions with and without rotation: the red dotted
line indicates that Competitive Pricing falls marginally under the PA4 bucket and that its loading is
negative.
➢ Step 8: Building the new dataset from the factor scores and the new factor names mentioned above
sat=as.data.frame(mydata[,13])
colnames(sat)=c("satisfaction")
mydatanew1=cbind(scores1,sat) ## Factor Analysis
mydatanew1
Experience compbrand aftersaleservice productsuperioity satisfaction
1 -0.13388710 0.91751661 -1.719604873 0.09135411 8.2
2 1.62976040 -2.00900531 -0.596361722 0.65808192 5.7
3 0.36376581 0.83617362 0.002979966 1.37548765 8.9
4 -1.22252302 -0.54913358 1.245473305 -0.64421384 4.8
5 -0.48542093 -0.42762231 -0.026980304 0.47360747 7.1
6 -0.59509240 -1.30353334 -1.183019401 -0.95913571 4.7
7 -2.52885363 0.38836877 -0.603275803 -1.29659025 5.7
8 -0.11315168 -0.13097631 -0.699238481 -1.36606005 6.3
9 0.95751096 0.34755882 -0.142256076 -0.93477420 7.0
10 0.58135807 0.43427719 -0.481549064 -0.66519579 5.5
11 -0.04744554 -0.34677999 -0.477931226 0.62086386 7.4
12 -1.22969845 1.22373499 0.307420873 -1.06601488 6.0
13 0.70120038 1.40162126 -0.077278204 0.61198552 8.4
14 0.18944710 -0.12001589 0.341391428 1.43748733 7.6
15 1.59586476 0.51484865 -0.307216912 -0.62265003 8.0
16 1.11215548 -1.25985548 -0.535588676 0.99091689 6.6
17 0.90477581 -0.30392244 0.909413294 -1.04926552 6.4
18 1.35863182 0.09820639 0.147598367 -0.63536585 7.4
19 0.76821232 0.25113902 -0.444327163 -0.84712501 6.8
20 0.61161128 1.74911250 -0.747772366 -0.37770002 7.6
21 -0.49662748 -0.40513549 1.413398115 -1.42620085 5.4
22 -0.24583333 2.83259042 0.458183224 2.15737479 9.9
23 -0.08593028 -0.20647990 0.954813784 1.29099542 7.0
24 1.30419410 -0.65840510 -0.735880788 0.79535237 8.6
25 0.02837015 0.11289267 0.352466288 -0.83695432 4.8
26 0.37516895 -0.08949421 0.586591370 -1.33839027 6.6
27 0.69040218 -1.27676215 0.980470048 0.97700863 6.3
28 0.19330562 -1.09019060 0.410744110 -1.15018948 5.4
29 0.74807174 -1.20388888 -0.169444094 1.11442401 6.3
30 -0.53645976 -0.31470400 -1.389463973 -0.62508147 5.4
31 -0.98478652 -0.32465411 2.054616799 0.57213335 6.1
32 -0.89540824 -1.30592549 1.145669622 -0.17981750 6.4
33 -0.61006900 -0.25385457 0.201747129 -0.51103804 5.4
34 0.58139098 -0.57102185 0.160508882 -1.09267691 7.3
35 -1.08254233 1.61817367 0.121735970 -0.32924397 6.3
36 -1.51860194 -1.82264781 0.280394795 1.08323735 5.4
37 -0.54298600 -0.45867623 0.359185746 0.44557509 7.1
38 1.35035690 0.28270522 0.116158567 0.69743150 8.7
39 0.98244317 -0.26293227 -1.015054009 -0.98172621 7.6
40 -0.92383910 1.28059000 -1.167921851 -0.79331056 6.0
41 0.09932064 0.09827305 -1.837451060 -0.63276007 7.0
42 0.10869713 -0.04357923 -1.050920110 0.49209145 7.6
43 0.44811474 1.20846081 -0.968824054 0.72015394 8.9
44 0.70660340 2.17850627 1.233358555 -0.84868686 7.6
45 1.38263370 -2.03732511 -0.954180731 0.77953010 5.5
46 0.94936859 0.24232567 0.291172997 -0.56344261 7.4
47 0.08771633 -0.60985289 0.976205747 0.45795642 7.1
48 1.81030056 0.43834059 0.814478797 -1.17962559 7.6
49 -0.19224074 1.62888205 -0.733730711 1.33934942 8.7
50 0.22531882 0.78430768 -0.957752366 0.92784955 8.6
51 -1.41135829 -0.16930613 0.091870699 -0.12631052 5.4
52 1.75365232 -1.99153570 -1.147542434 0.70646283 5.7
53 0.93209854 -0.51397965 -0.162358377 0.70759613 8.7
54 -0.94184081 -0.25288506 0.358406630 0.67268912 6.1
55 0.72242047 -0.54924734 -0.909388995 -1.09941166 7.3
56 -0.63244099 0.41450294 0.914064214 0.81720176 7.7
57 1.99193341 1.41977208 -0.077384854 -0.41266731 9.0
58 0.08548402 0.30077469 0.319615332 0.74357193 8.2
59 -0.57061091 -0.34885418 0.365475898 0.45601852 7.1
60 0.83067496 -1.65626897 0.641678078 0.23244952 7.9
61 0.86635294 -1.24543164 1.546930711 0.94382806 6.6
62 -0.60680134 0.74525730 -0.146586558 -0.25509333 8.0
63 -1.00306329 -0.09710517 -1.081705300 0.42111790 6.3
64 -1.28310920 -1.59838664 0.403626668 -0.02388625 6.0
65 -1.39461685 -0.23395772 0.557826636 -0.26460336 5.4
66 1.60952000 0.55610373 -0.999093684 -1.16665258 7.6
67 1.07257520 -0.39772241 1.815878900 -1.14631126 6.4
68 0.50884788 -0.29395183 -0.416389316 -0.75909753 6.1
69 -0.70601563 -0.44707894 -0.973611521 -0.83304582 5.2
70 0.23899120 0.05018369 -1.219545924 -1.27828253 6.6
71 0.48631145 1.74069413 0.854795068 -0.43411595 7.6
72 -1.37477720 -0.22770098 -1.265787815 1.00015839 5.8
73 0.76539809 0.81612817 -1.748046205 -0.13852761 7.9
74 -0.64249465 1.66245576 1.253902625 1.20305576 8.6
75 -0.26909538 0.83656999 -0.210138132 -0.11412392 8.2
76 -0.12296694 -0.33391431 1.172210398 0.39677906 7.1
77 0.10371878 -0.33487753 2.082854211 -1.18017022 6.4
78 0.46301479 -0.33956677 1.214737442 0.10357998 7.6
79 0.98918468 0.65624508 0.485336460 1.22733204 8.9
80 -1.74919939 0.82890385 0.003384046 0.09231225 5.7
81 -0.32137992 -0.20110631 0.470666208 0.60888606 7.1
82 -0.03471489 -0.38974940 0.412648407 0.57771035 7.4
83 -1.17790878 -0.90045717 0.198122454 0.40011473 6.6
84 -2.55956258 -0.25686675 1.554283149 -1.14101556 5.0
85 0.73971919 -0.90696938 0.655846983 0.51129012 8.2
86 -0.50161741 -0.56408382 -1.009145207 -0.86106988 5.2
87 -1.33057806 -0.01745886 -2.201996451 -0.92403144 5.2
88 0.70358137 -0.91236060 1.299031585 0.56968494 8.2
89 0.10912530 -0.46505791 0.266161987 0.17120142 7.3
90 1.02813129 2.57446865 1.640665437 -0.80151398 8.2
91 -1.04672561 0.44260491 1.394021786 0.75011393 7.4
92 -1.90712581 -0.46993540 -0.522475751 -1.31422452 4.8
93 0.07279015 -0.02235421 -0.352359525 1.61044504 7.6
94 1.08706436 0.61924340 0.089460676 1.16081123 8.9
95 -0.95457978 0.66256003 -0.981787816 1.06717592 7.7
96 -0.41931326 0.70755398 -0.077703201 0.52522023 7.3
97 -0.12315824 -0.25275815 -1.762967608 -0.63424275 6.3
98 -1.79270636 -1.59315365 -1.309147686 1.28219570 5.4
99 -0.33991434 1.89138931 0.122487640 -0.17511674 6.4
100 -0.31758889 -0.42356050 -0.453981729 -1.03250054 6.4
## Since the p-value of the test is well below the significance level alpha = 0.001, we reject the null
## hypothesis H0 (that there is no correlation amongst the predictor variables, in which case PCA
## could not be conducted)
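For reference, the statistic behind Bartlett's test of sphericity can be written out in base R. This is a sketch under the usual formula, chi-square = -((n - 1) - (2p + 5)/6) * log(det(R)) with p(p - 1)/2 degrees of freedom; the helper name bartlett_sphericity is hypothetical and mtcars is a stand-in dataset. With the report's 11 predictors the degrees of freedom would be 11 * 10 / 2 = 55.

```r
# Sketch of Bartlett's test of sphericity for a correlation matrix R
# computed from n observations. Hypothetical helper, base R only.
bartlett_sphericity <- function(R, n) {
  p     <- ncol(R)
  chisq <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
  df    <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Stand-in data used only to make the example runnable
demo_vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
bart_demo <- bartlett_sphericity(cor(demo_vars), nrow(demo_vars))
bart_demo

# An identity correlation matrix (no correlation at all) gives a p-value of 1
bart_identity <- bartlett_sphericity(diag(4), 50)
```

A small p-value means the correlation matrix differs from the identity matrix, so there is shared variance for PCA or factor analysis to summarize.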
➢ Step 3: The PCA without rotation does not give a clear picture, so varimax rotation is used
unrotate=principal(mydata2,nfactors = 4,rotate = "none")
print(unrotate)
Principal Components Analysis
Call: principal(r = mydata2, nfactors = 4, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
PC1 PC2 PC3 PC4 h2 u2 com
ProdQual 0.25 -0.50 -0.08 0.67 0.77 0.232 2.2
Ecom 0.31 0.71 0.31 0.28 0.78 0.223 2.1
TechSup 0.29 -0.37 0.79 -0.20 0.89 0.107 1.9
CompRes 0.87 0.03 -0.27 -0.22 0.88 0.119 1.3
Advertising 0.34 0.58 0.11 0.33 0.58 0.424 2.4
ProdLine 0.72 -0.45 -0.15 0.21 0.79 0.213 2.0
SalesFImage 0.38 0.75 0.31 0.23 0.86 0.141 2.1
ComPricing -0.28 0.66 -0.07 -0.35 0.64 0.359 1.9
WartyClaim 0.39 -0.31 0.78 -0.19 0.89 0.108 2.0
OrdBilling 0.81 0.04 -0.22 -0.25 0.77 0.234 1.3
DelSpeed 0.88 0.12 -0.30 -0.21 0.91 0.086 1.4
Loadings:
            RC1    RC2    RC3    RC4
ProdQual                          0.876
Ecom               0.871
TechSup                   0.939
CompRes     0.926
Advertising        0.742
ProdLine
SalesFImage        0.900
ComPricing                       -0.723
WartyClaim                0.931
OrdBilling  0.864
DelSpeed    0.938
print(scores,digits=2)
Experience compbrand aftersaleservice productsuperioity
1 0.127 0.770 -1.8784 0.366
2 1.222 -1.646 -0.6140 0.813
3 0.616 0.580 0.0037 1.570
4 -0.845 -0.272 1.2675 -1.254
5 -0.320 -0.834 -0.0081 0.448
6 -0.647 -1.067 -1.3032 -1.053
7 -2.627 -0.246 -0.5554 -1.226
8 -0.279 -0.157 -0.7493 -1.015
9 1.052 -0.172 -0.0923 -1.658
10 0.429 0.764 -0.4504 -0.891
11 -0.136 -0.768 -0.4637 0.606
12 -1.450 1.360 0.4378 -1.070
13 0.625 2.113 -0.1683 0.875
14 0.427 -0.404 0.4322 0.902
15 1.439 0.664 -0.2681 -1.044
16 0.920 -1.058 -0.5568 1.167
17 0.522 -0.320 1.1060 -1.032
18 1.713 -0.164 0.2549 -1.478
19 1.161 -0.419 -0.3756 -1.762
20 0.293 1.776 -0.9501 0.241
21 -0.615 -0.179 1.5259 -1.832
22 -0.113 2.834 0.6343 2.244
23 0.081 -0.351 1.1413 1.335
24 1.949 -1.671 -0.8592 0.503
25 0.115 -0.016 0.4718 -1.250
26 0.575 -0.245 0.6243 -1.354
27 0.829 -0.986 1.0426 0.922
28 0.117 -1.107 0.3797 -1.360
29 1.158 -1.606 -0.0558 0.795
30 -0.507 0.162 -1.5513 -0.306
31 -0.811 -0.179 2.2566 0.216
32 -1.074 -1.601 1.1867 -0.070
33 -0.500 0.306 0.1571 -0.970
34 0.279 0.071 -0.0329 -0.656
35 -1.211 0.612 0.2758 -0.689
36 -1.376 -1.059 0.2775 1.029
37 -0.625 -0.244 0.3109 0.661
38 1.364 0.035 0.1112 0.582
39 0.601 0.471 -1.2915 -0.446
40 -0.586 1.482 -1.1845 -1.039
41 0.192 -0.390 -1.9817 -0.596
42 0.043 0.090 -1.1657 0.537
43 0.410 1.958 -1.0947 0.989
44 0.775 1.613 1.5121 -1.149
45 1.270 -1.774 -0.9828 0.737
46 1.060 0.679 0.3242 -1.103
47 -0.123 -0.091 0.9961 1.417
48 2.098 0.462 0.8401 -1.681
49 0.156 0.882 -0.8353 1.298
50 0.230 0.503 -0.8770 1.037
51 -0.942 -0.376 0.1942 -0.653
52 1.561 -1.908 -1.1765 0.721
53 0.860 -1.089 -0.2414 0.872
54 -0.818 -0.529 0.5399 0.331
55 0.541 -0.680 -1.0607 -0.815
56 -0.369 0.283 0.9175 0.604
57 1.979 1.432 -0.0853 -0.839
58 0.206 0.517 0.3475 0.858
59 -1.341 0.553 0.3266 1.940
60 0.853 -1.578 0.5660 0.740
61 0.993 -1.265 1.7001 0.791
62 -1.105 0.709 -0.1528 0.396
63 -0.759 0.260 -1.1884 0.780
64 -1.095 -1.951 0.4272 -0.149
65 -1.209 0.153 0.5776 -0.516
66 1.343 0.537 -1.0391 -1.249
67 0.902 -0.588 2.0624 -1.319
68 0.423 -0.248 -0.3013 -0.847
69 -0.875 -0.604 -0.9976 -0.529
70 0.144 -0.151 -1.2760 -1.000
71 0.344 2.056 0.6863 0.094
72 -1.160 -0.185 -1.2052 0.714
73 0.926 1.316 -1.8699 -0.559
74 -0.567 1.400 1.2266 1.350
75 -0.299 0.872 -0.2946 0.303
76 -0.891 0.233 1.0379 1.613
77 -0.355 0.144 2.0573 -0.633
78 0.211 0.342 1.0733 0.309
79 1.130 0.640 0.4414 1.465
80 -1.532 0.288 0.0325 -0.311
81 -0.850 -0.248 0.4526 1.531
82 0.028 -0.916 0.4936 0.404
83 -1.392 -0.985 0.2076 0.626
84 -2.486 -0.736 1.6335 -1.445
85 1.003 -1.782 0.7977 -0.011
86 -0.829 -0.419 -1.0805 -0.452
87 -1.425 -0.298 -2.1553 -1.270
88 1.071 -1.298 1.4008 0.040
89 0.088 -0.059 0.1342 0.235
90 1.076 2.377 1.8930 -1.013
91 -0.785 0.463 1.3918 0.613
92 -2.348 -0.264 -0.5345 -1.189
93 0.299 0.206 -0.3714 1.208
94 1.107 0.370 0.0538 1.445
95 -0.797 0.712 -1.0877 1.061
96 -0.113 0.396 0.0483 0.348
97 -0.208 -0.253 -1.8809 -0.321
98 -1.586 -1.123 -1.3375 1.237
99 -0.328 1.902 0.1402 -0.121
100 -0.627 0.211 -0.7489 -0.696
Call:
lm(formula = satisfaction ~ ., data = mydatanew)
Residuals:
Min 1Q Median 3Q Max
-1.6308 -0.4996 0.1372 0.4623 1.5228
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.91800 0.07089 97.589 < 2e-16 ***
Experience 0.61805 0.07125 8.675 1.12e-13 ***
compbrand 0.50973 0.07125 7.155 1.74e-10 ***
aftersaleservice 0.06714 0.07125 0.942 0.348
productsuperioity 0.54032 0.07125 7.584 2.24e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
vif(modelnew)
Experience compbrand aftersaleservice productsuperioity
1.002184 1.004588 1.002890 1.007353
backtrack=data.frame(mydata$Satisfaction,predictsat)
backtrack
mydata.Satisfaction predict.modelnew.
1 8.2 7.461131
2 5.7 7.232186
3 8.9 8.442790
4 4.8 5.664819
5 7.1 6.536470
6 4.7 5.317754
7 5.7 4.469453
8 6.3 6.066612
9 7.0 6.577969
10 5.5 7.060438
11 7.4 6.739281
12 6.0 6.166029
13 8.4 8.842472
14 7.6 7.492682
15 8.0 7.563360
16 6.6 7.540149
17 6.4 6.594089
18 7.4 7.111975
19 6.8 6.444675
20 7.6 8.071185
21 5.4 5.559359
22 9.9 9.548008
23 7.0 7.586644
24 8.6 7.484882
25 4.8 6.337033
26 6.6 6.458666
27 6.3 7.495895
28 5.4 5.716737
29 6.3 7.240977
30 5.4 6.417368
31 6.1 6.593980
32 6.4 5.479441
33 5.4 6.251209
34 7.3 6.769940
35 6.3 6.127986
36 5.4 6.102572
37 7.1 6.785459
38 8.7 8.101165
39 7.6 7.201950
40 6.0 6.670603
41 7.0 6.382541
42 7.6 7.202833
43 8.9 8.630260
44 7.6 7.700263
45 5.5 7.130855
46 7.4 7.344975
47 7.1 7.627871
48 7.6 7.598427
49 8.7 8.109555
50 8.6 7.817810
51 5.4 5.804808
52 5.7 7.220864
53 8.7 7.349170
54 6.1 6.357687
55 7.3 6.394126
56 7.7 7.222578
57 9.0 8.411725
58 8.2 7.795489
59 7.1 7.441189
60 7.9 7.078808
61 6.6 7.428834
62 8.0 6.800197
63 6.3 6.923048
64 6.0 5.195447
65 5.4 6.008776
66 7.6 7.276799
67 6.4 6.601811
68 6.1 6.575471
69 5.2 5.716483
70 6.6 6.303540
71 7.6 8.275765
72 5.8 6.411607
73 7.9 7.733521
74 8.6 8.093294
75 8.2 7.321437
76 7.1 7.427831
77 6.4 6.567802
78 7.6 7.461656
79 8.9 8.763895
80 5.7 5.952047
81 7.1 7.123864
82 7.4 6.719964
83 6.6 5.907461
84 5.0 4.335592
85 8.2 6.677173
86 5.2 5.875296
87 5.2 5.054001
88 8.2 7.033722
89 7.3 7.078467
90 8.2 8.374157
91 7.4 7.093570
92 4.8 4.653626
93 7.6 7.835808
94 8.9 8.575632
95 7.7 7.288792
96 7.3 7.241431
97 6.3 6.360840
98 5.4 5.943548
99 6.4 7.629094
100 6.4 6.211478
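# actual ratings in blue, predicted scores in red (matching the text labels)
plot(mydata$Satisfaction,col="blue",xlab="data points")
lines(mydata$Satisfaction,col="blue")
lines(predictsat,col="red")
text(28, 9.9, "Actual value", col = "blue")
text(14.5, 9, "Predicted value", col ="red")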
The plot shows that the predictions of the new regression model stay close to the actual Satisfaction
scores. Blue represents the actual Satisfaction ratings. Red represents the predicted Satisfaction
scores derived from the multiple linear regression model.
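The visual comparison can be backed by simple error measures. The sketch below computes the root mean squared error, the mean absolute error and the correlation between actual and predicted values for the first ten rows of the backtrack table (values copied from the listing above). It is an illustration on a subset, not the fit statistics of the full model; on the full data the same lines would be applied to mydata$Satisfaction and predict(modelnew).

```r
# First ten rows of the backtrack table (actual vs predicted Satisfaction)
actual    <- c(8.2, 5.7, 8.9, 4.8, 7.1, 4.7, 5.7, 6.3, 7.0, 5.5)
predicted <- c(7.461131, 7.232186, 8.442790, 5.664819, 6.536470,
               5.317754, 4.469453, 6.066612, 6.577969, 7.060438)

errors <- actual - predicted
rmse   <- sqrt(mean(errors^2))    # root mean squared error
mae    <- mean(abs(errors))       # mean absolute error
r_ap   <- cor(actual, predicted)  # agreement between actual and predicted

c(rmse = rmse, mae = mae, correlation = r_ap)
```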
9. Conclusion:
• The "Satisfaction" ratings in mydata depend most strongly on the customer's overall purchasing
experience, which has the highest factor loadings and the largest beta (0.62): faster delivery and
the complaint resolution mechanism are the most important
• In the regression on the component scores, product superiority (beta 0.54) and company brand
(beta 0.51) follow with nearly equal weight, while after-sale service is not significant (p = 0.348)
Disclaimer: The real operating world can differ from these statistical models, but this model
comes closest to explaining the data provided for deduction
Annexure : R Code
### loading appropriate libraries
setwd("C:/Users/ashishj/Desktop/great lakes/R/datasets")
mydata=read.csv("Factor-Hair-Revised.csv")
mydata
dim(mydata)
## data summary
str(mydata)
summary(mydata)
#histogram of data
#since ID is of no use, removing it from the dataset
mydata1=mydata[,-1]
plot_histogram(mydata1)
## skewness - density plots and skew data show that Delivery Speed and Tech Support are left
## skewed while Sales Force Image is right skewed; Product Quality and Warranty Claims look bimodal;
## Ecommerce and Complaint Resolution most resemble a normal distribution
plot_density(mydata1)
skewdata=skew(mydata1)
list=names(mydata1)
skewlist=data.frame(list,skewdata)
skewlist
## correlation test
cor1=cor(mydata1)
corrplot(cor1,type = "upper",method = "number")
print(cor1,digits=2)
cormatrix2=cor1
cormatrix2[lower.tri(cormatrix2,diag=TRUE)]=NA
cormatrix2=as.data.frame(as.table(cormatrix2))
cormatrix2=na.omit(cormatrix2)
cormatrix2=cormatrix2[order(-abs(cormatrix2$Freq)),]
cormatrix2
## multicollinearity
attach(mydata1)
model1=lm(Satisfaction~.,data = mydata1)
summary(model1)
multcol=as.data.frame(vif(model1))
multcol
## vif for complaint resolution & delivery speed shows high multicollinearity
## factor analysis
mydata2=mydata[,2:12]
cor2=cor(mydata2)
KMO(cor2)
scores1=factor2$scores
head(scores1)
colnames(scores1)=c("Experience","compbrand","aftersaleservice","productsuperioity")
print(scores1,digits=2)
sat=as.data.frame(mydata[,13])
colnames(sat)=c("satisfaction")
mydatanew1=cbind(scores1,sat) ## Factor Analysis
mydatanew1
mode
## Principal Component Analysis
mydata2=mydata[,2:12]
cor2=cor(mydata2)
test1=cortest.bartlett(cor2,nrow(mydata2))
test1$p.value
## Since the p-value of the test is well below the significance level alpha = 0.001, we reject the null
## hypothesis H0 (that there is no correlation amongst the predictor variables, in which case PCA
## could not be conducted)
## deciding on the number of factors to be considered using the scree plot and eigenvalues
ev=eigen(cor2)
factor=c(1:11)
scree=data.frame(factor,ev$values)
plot(scree,main="scree plot",col="red")
lines(scree,col="blue")
abline(h=1,col="violet")
## the elbow appears at 5 factors, but the eigenvalues suggest 4 factors, so we consider
## 4 factors for PCA
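## assumption: the call that creates `rotate` is missing from this listing; varimax rotation as stated in Step 3
rotate=principal(mydata2,nfactors = 4,rotate = "varimax")
print(rotate$loadings,cutoff=0.7)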
scores=as.data.frame(rotate$scores)
print(scores,digits = 2)
class(scores)
colnames(scores)=c("Experience","compbrand","aftersaleservice","productsuperioity")
print(scores,digits=2)
sat=as.data.frame(mydata[,13])
colnames(sat)=c("satisfaction")
mydatanew=cbind(scores,sat) ## PCA scores plus satisfaction
mydatanew
attach(mydatanew)
modelnew=lm(satisfaction~.,data=mydatanew)
summary(modelnew)
vif(modelnew)
predictsat=as.data.frame(predict(modelnew))
backtrack=data.frame(mydata$Satisfaction,predictsat)
backtrack
# actual ratings in blue, predicted scores in red (matching the text labels)
plot(mydata$Satisfaction,col="blue",xlab="data points")
lines(mydata$Satisfaction,col="blue")
lines(predictsat,col="red")
text(28, 9.9, "Actual value", col = "blue")
text(14.5, 9, "Predicted value", col ="red")