
Index

1. Importing the Libraries & setting working directory


2. Exploratory Data Analysis
a. Dataset Introduction
b. Dataset Summarization
c. Histogram Plots
d. Density Plots & skewlist
e. Correlations
3. Evidence of Multicollinearity
4. Simple Linear Models
5. Factor Analysis
6. Principal Component Analysis
7. New Multiple Linear Regression Model from PCA
8. New Model analysis
9. Testing of New Model-Predicted v/s Actual Satisfactions
10. Conclusion
Annexure : R Code
1. Importing the Libraries
The following libraries were imported: "reshape2", "rpsychi", "car", "psych", "corrplot", "forecast",
"GPArotation", "psy", "MVN", "DataExplorer", "ppcor", "Metrics", "foreign", "MASS", "lattice", "nortest",
"Hmisc", "factoextra", "nFactors"

Code:
toload_libraries <- c("reshape2", "rpsychi", "car", "psych", "corrplot", "forecast", "GPArotation", "psy",
"MVN", "DataExplorer", "ppcor", "Metrics", "foreign", "MASS", "lattice", "nortest", "Hmisc","factoextra",
"nFactors")
# install any packages that are not yet installed, then load them all
new.packages <- toload_libraries[!(toload_libraries %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(toload_libraries, require, character.only= TRUE)

2. Exploratory Data Analysis

a) Dataset Introduction
Setting the working directory and reading the dataset
Code:
setwd("C:/Users/ashishj/Desktop/great lakes/R/datasets")
mydata=read.csv("Factor-Mydata-Revised.csv")
dim(mydata) # checking the dimensionality of the dataset
## [1] 100 13
Checking for any missing values in the dataset
Code:
sapply(mydata,function(x) sum(is.na(x)))
##           ID     ProdQual         Ecom      TechSup      CompRes  Advertising
##            0            0            0            0            0            0
##     ProdLine  SalesFImage   ComPricing   WartyClaim   OrdBilling     DelSpeed
##            0            0            0            0            0            0
## Satisfaction
##            0
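
The missing-value check can also be written with colSums(), and complete.cases() shows which rows are affected. The following is a minimal sketch on a small made-up data frame; toy_df and its values are hypothetical and are not part of the report's dataset.

```r
# Toy data frame standing in for mydata (values are made up for illustration)
toy_df <- data.frame(
  ProdQual     = c(8.5, NA, 9.2),
  Ecom         = c(3.9, 2.7, 3.4),
  Satisfaction = c(8.2, 5.7, NA)
)

# Count missing values per column; same result as sapply(df, function(x) sum(is.na(x)))
na_per_column <- colSums(is.na(toy_df))
print(na_per_column)

# Row numbers that contain at least one missing value
rows_with_na <- which(!complete.cases(toy_df))
print(rows_with_na)
```

On the report's dataset every count is 0, so no imputation or row removal is needed.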

b) Dataset Summarization
Checking the structure of the Dataset
Code:
str(mydata)
## 'data.frame': 100 obs. of 13 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ ProdQual : num 8.5 8.2 9.2 6.4 9 6.5 6.9 6.2 5.8 6.4 ...
## $ Ecom : num 3.9 2.7 3.4 3.3 3.4 2.8 3.7 3.3 3.6 4.5 ...
## $ TechSup : num 2.5 5.1 5.6 7 5.2 3.1 5 3.9 5.1 5.1 ...
## $ CompRes : num 5.9 7.2 5.6 3.7 4.6 4.1 2.6 4.8 6.7 6.1 ...
## $ Advertising : num 4.8 3.4 5.4 4.7 2.2 4 2.1 4.6 3.7 4.7 ...
## $ ProdLine : num 4.9 7.9 7.4 4.7 6 4.3 2.3 3.6 5.9 5.7 ...
## $ SalesFImage : num 6 3.1 5.8 4.5 4.5 3.7 5.4 5.1 5.8 5.7 ...
## $ ComPricing : num 6.8 5.3 4.5 8.8 6.8 8.5 8.9 6.9 9.3 8.4 ...
## $ WartyClaim : num 4.7 5.5 6.2 7 6.1 5.1 4.8 5.4 5.9 5.4 ...
## $ OrdBilling : num 5 3.9 5.4 4.3 4.5 3.6 2.1 4.3 4.4 4.1 ...
## $ DelSpeed : num 3.7 4.9 4.5 3 3.5 3.3 2 3.7 4.6 4.4 ...
## $ Satisfaction: num 8.2 5.7 8.9 4.8 7.1 4.7 5.7 6.3 7 5.5 ...

A summary of each variable (minimum, quartiles, median, mean and maximum) was produced to spot
obvious outliers before going deeper.

No obvious outliers are visible in the summary.


Code:
summary(mydata)
## ID ProdQual Ecom TechSup
## Min. : 1.00 Min. : 5.000 Min. :2.200 Min. :1.300
## 1st Qu.: 25.75 1st Qu.: 6.575 1st Qu.:3.275 1st Qu.:4.250
## Median : 50.50 Median : 8.000 Median :3.600 Median :5.400
## Mean : 50.50 Mean : 7.810 Mean :3.672 Mean :5.365
## 3rd Qu.: 75.25 3rd Qu.: 9.100 3rd Qu.:3.925 3rd Qu.:6.625
## Max. :100.00 Max. :10.000 Max. :5.700 Max. :8.500
## CompRes Advertising ProdLine SalesFImage
## Min. :2.600 Min. :1.900 Min. :2.300 Min. :2.900
## 1st Qu.:4.600 1st Qu.:3.175 1st Qu.:4.700 1st Qu.:4.500
## Median :5.450 Median :4.000 Median :5.750 Median :4.900
## Mean :5.442 Mean :4.010 Mean :5.805 Mean :5.123
## 3rd Qu.:6.325 3rd Qu.:4.800 3rd Qu.:6.800 3rd Qu.:5.800
## Max. :7.800 Max. :6.500 Max. :8.400 Max. :8.200
## ComPricing WartyClaim OrdBilling DelSpeed
## Min. :3.700 Min. :4.100 Min. :2.000 Min. :1.600
## 1st Qu.:5.875 1st Qu.:5.400 1st Qu.:3.700 1st Qu.:3.400
## Median :7.100 Median :6.100 Median :4.400 Median :3.900
## Mean :6.974 Mean :6.043 Mean :4.278 Mean :3.886
## 3rd Qu.:8.400 3rd Qu.:6.600 3rd Qu.:4.800 3rd Qu.:4.425
## Max. :9.900 Max. :8.100 Max. :6.700 Max. :5.500
## Satisfaction
## Min. :4.700
## 1st Qu.:6.000
## Median :7.050
## Mean :6.918
## 3rd Qu.:7.625
## Max. :9.900
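
A visual read of the summary can be complemented with a numeric screen. The sketch below applies Tukey's 1.5 * IQR rule to four of the variables, using the quartiles and extremes copied from the summary output above. The rule is one common heuristic chosen here for illustration; it is an assumption, not the method used in the report, and the name five_num is hypothetical.

```r
# Quartiles and extremes copied from the summary(mydata) output above
five_num <- data.frame(
  variable = c("ProdQual", "Ecom", "DelSpeed", "Satisfaction"),
  min = c(5.0, 2.2, 1.6, 4.7),
  q1  = c(6.575, 3.275, 3.400, 6.000),
  q3  = c(9.100, 3.925, 4.425, 7.625),
  max = c(10.0, 5.7, 5.5, 9.9)
)

# Tukey's rule: values more than 1.5 * IQR beyond the quartiles are flagged
iqr <- five_num$q3 - five_num$q1
five_num$lower_fence <- five_num$q1 - 1.5 * iqr
five_num$upper_fence <- five_num$q3 + 1.5 * iqr
five_num$below <- five_num$min < five_num$lower_fence
five_num$above <- five_num$max > five_num$upper_fence
print(five_num)
```

With these numbers the extremes of Ecom and the minimum of DelSpeed fall outside the fences, so a boxplot of those columns may be worth a look. The rule is a screening heuristic and does not by itself indicate a data error.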

c) Histogram Plots
plot_histogram(mydata)

d) Density Plots & skewlist
plot_density(mydata)
Code:
## mydata1 is the dataset without the ID column (see section e)
skewdata=skew(mydata1)
list=names(mydata1)
skewlist=data.frame(list,skewdata)
skewlist
list skewdata
1 ProdQual -0.237215714
2 Ecom 0.640710684
3 TechSup -0.197201529
4 CompRes -0.131763526
5 Advertising 0.042299656
6 ProdLine -0.089689444
7 SalesFImage 0.365660982
8 ComPricing -0.232782461
9 WartyClaim 0.008120531
10 OrdBilling -0.323600855
11 DelSpeed -0.449292744
12 Satisfaction 0.075851399

The density plots show that some variables are left skewed, such as Delivery Speed and, to some extent, Tech Support.
Some are right skewed, such as Sales Force Image.
Some are bimodal, such as Product Quality and Warranty Claims.
Most resemble a normal distribution, such as E-commerce and Complaint Resolution, although Ecom has the
largest positive skew in the list above (0.64).
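
To make the sign convention of the skew values concrete, here is a minimal base-R sketch of the moment-based skewness coefficient g1 = m3 / m2^(3/2). The function name moment_skew and the two toy vectors are hypothetical; psych::skew may apply a slightly different small-sample convention, so its values can differ a little from this formula.

```r
# Moment-based sample skewness: g1 = m3 / m2^(3/2)
moment_skew <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)  # second central moment
  m3 <- mean(d^3)  # third central moment
  m3 / m2^1.5
}

right_tailed <- c(1, 2, 3, 4, 10)  # long tail to the right: positive skew
symmetric    <- c(1, 2, 3, 4, 5)   # symmetric: zero skew

print(moment_skew(right_tailed))
print(moment_skew(symmetric))
```

A negative value (as for DelSpeed) means the longer tail is on the left, a positive value (as for SalesFImage) means it is on the right.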

e) Correlations
## Extracting the sub-dataset of relevant variables
## The ID column has been removed as it adds no value
As the correlation matrix shows, the most highly correlated pairs are:
1. CompRes and DelSpeed (0.87)
2. TechSup and WartyClaim (0.80)
3. Ecom and SalesFImage (0.79)
4. CompRes and OrdBilling (0.76)
5. OrdBilling and DelSpeed (0.75)
Code (all variable pairs sorted by absolute correlation):
cormatrix2=cor1
cormatrix2[lower.tri(cormatrix2,diag=TRUE)]=NA
cormatrix2=as.data.frame(as.table(cormatrix2))
cormatrix2=na.omit(cormatrix2)
cormatrix2=cormatrix2[order(-abs(cormatrix2$Freq)),]
cormatrix2
Var1 Var2 Freq
124 CompRes DelSpeed 0.8650916968
99 TechSup WartyClaim 0.7971679258
74 Ecom SalesFImage 0.7915437115
112 CompRes OrdBilling 0.7568685913
130 OrdBilling DelSpeed 0.7510030675
136 CompRes Satisfaction 0.6032626039
126 ProdLine DelSpeed 0.6018502083
143 DelSpeed Satisfaction 0.5770422745
64 CompRes ProdLine 0.5614169522
138 ProdLine Satisfaction 0.5505459359
77 Advertising SalesFImage 0.5422036582
142 OrdBilling Satisfaction 0.5217319124
139 SalesFImage Satisfaction 0.5002053063
90 ProdLine ComPricing -0.4949484016
133 ProdQual Satisfaction 0.4863249980
61 ProdQual ProdLine 0.4774934132
50 Ecom Advertising 0.4298907110
114 ProdLine OrdBilling 0.4244082496
85 ProdQual ComPricing -0.4012818841
137 Advertising Satisfaction 0.3046694747
134 Ecom Satisfaction 0.2827450147
125 Advertising DelSpeed 0.2758630832
102 ProdLine WartyClaim 0.2730775284
127 SalesFImage DelSpeed 0.2715512592
87 TechSup ComPricing -0.2707866821
91 SalesFImage ComPricing 0.2645965539
104 ComPricing WartyClaim -0.2449860542
76 CompRes SalesFImage 0.2297517611
86 Ecom ComPricing 0.2294624014
140 ComPricing Satisfaction -0.2082956889
117 WartyClaim OrdBilling 0.1970651213
52 CompRes Advertising 0.1969168472
115 SalesFImage OrdBilling 0.1951274057
63 TechSup ProdLine 0.1926254565
122 Ecom DelSpeed 0.1916360683
113 Advertising OrdBilling 0.1842355941
141 WartyClaim Satisfaction 0.1775448190
110 Ecom OrdBilling 0.1561473316
73 ProdQual SalesFImage -0.1518128743
100 CompRes WartyClaim 0.1404082967
38 Ecom CompRes 0.1401792611
13 ProdQual Ecom -0.1371632174
89 Advertising ComPricing 0.1342168943
88 CompRes ComPricing -0.1279542529
116 ComPricing OrdBilling -0.1145670257
135 TechSup Satisfaction 0.1125971788
129 WartyClaim DelSpeed 0.1093946024
103 SalesFImage WartyClaim 0.1074553447
37 ProdQual CompRes 0.1063700009
109 ProdQual OrdBilling 0.1043030736
39 TechSup CompRes 0.0966565978
25 ProdQual TechSup 0.0956004542
97 ProdQual WartyClaim 0.0883123063
111 TechSup OrdBilling 0.0801018246
128 ComPricing DelSpeed -0.0728717289
51 TechSup Advertising -0.0628700668
78 ProdLine SalesFImage -0.0613155277
49 ProdQual Advertising -0.0534731340
62 Ecom ProdLine -0.0526878383
98 Ecom WartyClaim 0.0518981915
121 ProdQual DelSpeed 0.0277180027
123 TechSup DelSpeed 0.0254406935
75 TechSup SalesFImage 0.0169905395
65 Advertising ProdLine -0.0115508187
101 Advertising WartyClaim 0.0107920743
26 Ecom TechSup 0.0008667887
Code:
cor1=cor(mydata1)
corrplot(cor1,type = "upper",method = "number")
print(cor1,digits=2)
3. Evidence of Multicollinearity

Modelling the response variable against all 11 predictors using linear regression gives the following:
▪ The adjusted R-squared of the model is 0.7774, meaning that the independent variables explain about
78% of the variance of the dependent variable.
▪ Only 3 of the 11 independent variables are significant at the 5% level: ProdQual, Ecom and
SalesFImage.
▪ The p-value of the F-statistic is less than 0.05 (the level of significance), which means the model is
significant: at least one of the predictor variables is significantly related to the outcome variable.
▪ The model equation can be written as:
Satisfaction = -0.67 + 0.37*ProdQual - 0.44*Ecom + 0.03*TechSup + 0.17*CompRes -
0.03*Advertising + 0.14*ProdLine + 0.81*SalesFImage - 0.04*ComPricing - 0.10*WartyClaim +
0.15*OrdBilling + 0.17*DelSpeed
Code:
model1=lm(Satisfaction~.,data = mydata1)
summary(model1)

Call:
lm(formula = Satisfaction ~ ., data = mydata1)

Residuals:
Min 1Q Median 3Q Max
-1.43005 -0.31165 0.07621 0.37190 0.90120

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.66961 0.81233 -0.824 0.41199
ProdQual 0.37137 0.05177 7.173 2.18e-10 ***
Ecom -0.44056 0.13396 -3.289 0.00145 **
TechSup 0.03299 0.06372 0.518 0.60591
CompRes 0.16703 0.10173 1.642 0.10416
Advertising -0.02602 0.06161 -0.422 0.67382
ProdLine 0.14034 0.08025 1.749 0.08384 .
SalesFImage 0.80611 0.09775 8.247 1.45e-12 ***
ComPricing -0.03853 0.04677 -0.824 0.41235
WartyClaim -0.10298 0.12330 -0.835 0.40587
OrdBilling 0.14635 0.10367 1.412 0.16160
DelSpeed 0.16570 0.19644 0.844 0.40124
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5623 on 88 degrees of freedom


Multiple R-squared: 0.8021, Adjusted R-squared: 0.7774
F-statistic: 32.43 on 11 and 88 DF, p-value: < 2.2e-16

To check for multicollinearity amongst the predictor variables, Variance Inflation Factors (VIF)
were computed.
As a rule of thumb, a VIF > 4 for any variable suggests the presence of multicollinearity amongst the
predictor variables.

vif(model1)
   ProdQual        Ecom     TechSup     CompRes Advertising    ProdLine
   1.635797    2.756694    2.976796    4.730448    1.508933    3.488185
SalesFImage  ComPricing  WartyClaim  OrdBilling    DelSpeed
   3.439420    1.635000    3.198337    2.902999    6.516014

▪ The VIF for Delivery Speed (DelSpeed) is 6.52 (greater than 4).
▪ The VIF for Complaint Resolution (CompRes) is 4.73, also suggesting the presence of multicollinearity,
which can destabilise the regression model.
▪ To address multicollinearity, a dimension-reduction technique such as Principal Component Analysis
(PCA) or Factor Analysis can be applied to the correlated variables.
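
For intuition on what the VIF measures, the sketch below computes it by hand as VIF_j = 1 / (1 - R^2_j), where R^2_j is obtained by regressing predictor j on the remaining predictors. It uses the built-in mtcars data purely as a stand-in for the report's dataset, and the name manual_vif is hypothetical; the report itself uses vif() from the car package.

```r
# Built-in mtcars columns used purely as a stand-in for the report's predictors
predictors <- mtcars[, c("disp", "hp", "wt", "drat")]

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others
manual_vif <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  fit <- lm(reformulate(others, response = v), data = predictors)
  1 / (1 - summary(fit)$r.squared)
})
print(round(manual_vif, 2))

# Predictors exceeding the rule-of-thumb cutoff of 4 used in this report
print(names(manual_vif)[manual_vif > 4])
```

A VIF of 6.52 for DelSpeed therefore corresponds to an R^2 of about 0.85 when DelSpeed is regressed on the other ten predictors, since 1 - 1/6.52 is roughly 0.85.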
4. Simple Linear Models
Simple linear regression models of the response variable Satisfaction on each of the 11 predictors
were fitted using the lm function.

Satisfaction = 3.6759 + 0.4151 * ProdQual
Intercept: 3.6759
Slope: 0.4151; a one-unit increase in ProdQual is associated with a 0.4151 increase in the Satisfaction
rating, as explained by the model.

Satisfaction = 5.1516 + 0.4811 * Ecom
Intercept: 5.1516
Slope: 0.4811; a one-unit increase in Ecom is associated with a 0.4811 increase in the Satisfaction
rating, as explained by the model.

Satisfaction = 6.44757 + 0.08768 * TechSup
Intercept: 6.44757
Slope: 0.08768; a one-unit increase in TechSup is associated with a 0.08768 increase in the Satisfaction
rating, as explained by the model.

Satisfaction = 3.680 + 0.595 * CompRes
Intercept: 3.680
Slope: 0.595; a one-unit increase in CompRes is associated with a 0.595 increase in the Satisfaction
rating, as explained by the model.

Satisfaction = 5.6259 + 0.3222 * Advertising
Intercept: 5.6259
Slope: 0.3222; a one-unit increase in Advertising is associated with a 0.3222 increase in the Satisfaction
rating, as explained by the model.

Satisfaction = 4.0220 + 0.4989 * ProdLine
Intercept: 4.0220
Slope: 0.4989; a one-unit increase in ProdLine is associated with a 0.4989 increase in the Satisfaction
rating, as explained by the model.

Satisfaction = 4.070 + 0.556 * SalesFImage
Intercept: 4.070
Slope: 0.556; a one-unit increase in SalesFImage is associated with a 0.556 increase in the Satisfaction
rating, as explained by the model.

Satisfaction = 8.0386 + (-0.1607) * ComPricing
Intercept: 8.0386
Slope: -0.1607; a one-unit increase in ComPricing is associated with a 0.1607 decrease in the Satisfaction
rating, as explained by the model.

Satisfaction = 5.3581 + 0.2581 * WartyClaim
Intercept: 5.3581
Slope: 0.2581; a one-unit increase in WartyClaim is associated with a 0.2581 increase in the Satisfaction
rating, as explained by the model.

Satisfaction = 4.0541 + 0.6695 * OrdBilling
Intercept: 4.0541
Slope: 0.6695; a one-unit increase in OrdBilling is associated with a 0.6695 increase in the Satisfaction
rating, as explained by the model.

Satisfaction = 3.2791 + 0.9364 * DelSpeed
Intercept: 3.2791
Slope: 0.9364; a one-unit increase in DelSpeed is associated with a 0.9364 increase in the Satisfaction
rating, as explained by the model.
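
Fitting one simple regression per predictor is easy to do in a loop. The sketch below shows one possible way, using the built-in mtcars data as a stand-in (mpg plays the role of Satisfaction). The names response, predictors and simple_fits are hypothetical and the report does not show how its eleven models were produced.

```r
# Built-in mtcars data used as a stand-in; "mpg" plays the role of Satisfaction
response <- "mpg"
predictors <- c("wt", "hp", "disp")

# Fit one simple regression per predictor and collect intercept and slope
simple_fits <- t(sapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = response), data = mtcars)
  setNames(coef(fit), c("intercept", "slope"))
}))
print(round(simple_fits, 4))
```

Each row of simple_fits corresponds to one equation of the form response = intercept + slope * predictor. Because every model contains a single predictor, its slope ignores the other variables, which is why these slopes differ from the coefficients of the multiple regression in section 3.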
5. Factor Analysis
➢ Step 1: Perform the KMO (Kaiser-Meyer-Olkin) test of sampling adequacy
Overall MSA = 0.65 > 0.5, so the data are suitable for factor analysis

Code:
mydata2=mydata[,2:12]
cor2=cor(mydata2)
KMO(cor2)
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = cor2)
Overall MSA = 0.65
MSA for each item =
   ProdQual        Ecom     TechSup     CompRes Advertising    ProdLine
       0.51        0.63        0.52        0.79        0.78        0.62
SalesFImage  ComPricing  WartyClaim  OrdBilling    DelSpeed
       0.62        0.75        0.51        0.76        0.67
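
The overall MSA compares the squared correlations with the squared partial correlations: it is close to 1 when the partial correlations are small relative to the raw correlations. A minimal base-R sketch of that calculation follows, using built-in mtcars columns as a stand-in for the 11 predictors; the variable names are hypothetical and the report itself relies on KMO() from the psych package.

```r
# Built-in mtcars columns used as a stand-in for the report's predictors
R <- cor(mtcars[, c("mpg", "disp", "hp", "wt", "drat", "qsec")])

# Partial correlations obtained from the inverse correlation matrix
R_inv <- solve(R)
partial <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))

# Overall MSA: sum of squared correlations divided by the sum of squared
# correlations plus squared partial correlations (off-diagonal elements only)
off_diag <- row(R) != col(R)
sum_r2 <- sum(R[off_diag]^2)
sum_p2 <- sum(partial[off_diag]^2)
overall_msa <- sum_r2 / (sum_r2 + sum_p2)
print(round(overall_msa, 2))
```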

➢ Step 2: nfactors kept at 4 as per the eigenvalues and scree plot, using the fa function
Summary:
▪ The first 4 eigenvalues of the correlation matrix are greater than 1, and the 4 extracted factors
explain almost 69% of the variance.
▪ We can effectively reduce dimensionality from 11 to 4 while losing only about 31% of the
variance.
▪ Factor 1 accounts for 29.2% of the variance; Factor 2 accounts for 20.2% of the
variance; Factor 3 accounts for 13.6% of the variance; Factor 4 accounts for 6.2% of the
variance.
Code:
factor1=fa(r=mydata2,nfactors = 4,rotate = "none",fm="pa")
factor1
Factor Analysis using method = pa
Call: fa(r = mydata2, nfactors = 4, rotate = "none", fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 PA3 PA4 h2 u2 com
ProdQual 0.20 -0.41 -0.06 0.46 0.42 0.576 2.4
Ecom 0.29 0.66 0.27 0.22 0.64 0.362 2.0
TechSup 0.28 -0.38 0.74 -0.17 0.79 0.205 1.9
CompRes 0.86 0.01 -0.26 -0.18 0.84 0.157 1.3
Advertising 0.29 0.46 0.08 0.13 0.31 0.686 1.9
ProdLine 0.69 -0.45 -0.14 0.31 0.80 0.200 2.3
SalesFImage 0.39 0.80 0.35 0.25 0.98 0.021 2.1
ComPricing -0.23 0.55 -0.04 -0.29 0.44 0.557 1.9
WartyClaim 0.38 -0.32 0.74 -0.15 0.81 0.186 2.0
OrdBilling 0.75 0.02 -0.18 -0.18 0.62 0.378 1.2
DelSpeed 0.90 0.10 -0.30 -0.20 0.94 0.058 1.4

PA1 PA2 PA3 PA4


SS loadings 3.21 2.22 1.50 0.68
Proportion Var 0.29 0.20 0.14 0.06
Cumulative Var 0.29 0.49 0.63 0.69
Proportion Explained 0.42 0.29 0.20 0.09
Cumulative Proportion 0.42 0.71 0.91 1.00

Mean item complexity = 1.9


Test of the hypothesis that 4 factors are sufficient.
The degrees of freedom for the null model are 55 and the objective function was 6.55
with Chi Square of 619.27
The degrees of freedom for the model are 17 and the objective function was 0.33

The root mean square of the residuals (RMSR) is 0.02


The df corrected root mean square of the residuals is 0.03

The harmonic number of observations is 100 with the empirical chi square 3.19 with prob < 1
The total number of observations was 100 with Likelihood Chi Square = 30.27 with prob < 0.024

Tucker Lewis Index of factoring reliability = 0.921


RMSEA index = 0.088 and the 90 % confidence intervals are 0.032 0.139
BIC = -48.01
Fit based upon off diagonal values = 1
Measures of factor score adequacy
PA1 PA2 PA3 PA4
Correlation of (regression) scores with factors 0.98 0.97 0.95 0.88
Multiple R square of scores with factors 0.96 0.95 0.91 0.78
Minimum correlation of possible factor scores 0.92 0.90 0.82 0.56

>

➢ Step 3: check the loadings

factor1$loadings

Loadings:
            PA1    PA2    PA3    PA4
ProdQual     0.201 -0.408         0.463
Ecom         0.290  0.659  0.270  0.216
TechSup      0.278 -0.381  0.738 -0.166
CompRes      0.862        -0.255 -0.184
Advertising  0.286  0.457         0.129
ProdLine     0.689 -0.453 -0.142  0.315
SalesFImage  0.395  0.801  0.346  0.251
ComPricing  -0.232  0.553        -0.286
WartyClaim   0.379 -0.324  0.735 -0.153
OrdBilling   0.747        -0.175 -0.181
DelSpeed     0.895        -0.303 -0.198

                 PA1   PA2   PA3   PA4
SS loadings    3.215 2.223 1.499 0.678
Proportion Var 0.292 0.202 0.136 0.062
Cumulative Var 0.292 0.494 0.631 0.692
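
The h2 and u2 columns of the fa() output follow directly from the loadings: the communality h2 of a variable is the sum of its squared loadings across the factors, and the uniqueness u2 is 1 - h2. As a worked check, the sketch below recomputes both for two variables using the two-decimal unrotated loadings printed in Step 2; the object name unrotated is hypothetical.

```r
# Unrotated loadings for two variables, copied from the fa() output in Step 2
unrotated <- rbind(
  ProdQual = c(PA1 = 0.20, PA2 = -0.41, PA3 = -0.06, PA4 = 0.46),
  CompRes  = c(PA1 = 0.86, PA2 = 0.01, PA3 = -0.26, PA4 = -0.18)
)

# Communality h2 = sum of squared loadings; uniqueness u2 = 1 - h2
h2 <- rowSums(unrotated^2)
u2 <- 1 - h2
print(round(cbind(h2, u2), 2))
```

The results agree with the reported h2 values of 0.42 and 0.84 up to rounding of the printed loadings. A low communality, as for Advertising (0.31), indicates that the four factors capture relatively little of that variable's variance.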

➢ Step 4: using orthogonal (varimax) rotation to perform FA
factor2=fa(r=mydata2,nfactors = 4,rotate = "varimax",fm="pa")
factor2
Factor Analysis using method = pa
Call: fa(r = mydata2, nfactors = 4, rotate = "varimax", fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 PA3 PA4 h2 u2 com
ProdQual 0.02 -0.07 0.02 0.65 0.42 0.576 1.0
Ecom 0.07 0.79 0.03 -0.11 0.64 0.362 1.1
TechSup 0.02 -0.03 0.88 0.12 0.79 0.205 1.0
CompRes 0.90 0.13 0.05 0.13 0.84 0.157 1.1
Advertising 0.17 0.53 -0.04 -0.06 0.31 0.686 1.2
ProdLine 0.53 -0.04 0.13 0.71 0.80 0.200 1.9
SalesFImage 0.12 0.97 0.06 -0.13 0.98 0.021 1.1
ComPricing -0.08 0.21 -0.21 -0.59 0.44 0.557 1.6
WartyClaim 0.10 0.06 0.89 0.13 0.81 0.186 1.1
OrdBilling 0.77 0.13 0.09 0.09 0.62 0.378 1.1
DelSpeed 0.95 0.19 0.00 0.09 0.94 0.058 1.1

PA1 PA2 PA3 PA4


SS loadings 2.63 1.97 1.64 1.37
Proportion Var 0.24 0.18 0.15 0.12
Cumulative Var 0.24 0.42 0.57 0.69
Proportion Explained 0.35 0.26 0.22 0.18
Cumulative Proportion 0.35 0.60 0.82 1.00

Mean item complexity = 1.2


Test of the hypothesis that 4 factors are sufficient.

The degrees of freedom for the null model are 55 and the objective function was 6.55
with Chi Square of 619.27
The degrees of freedom for the model are 17 and the objective function was 0.33

The root mean square of the residuals (RMSR) is 0.02


The df corrected root mean square of the residuals is 0.03

The harmonic number of observations is 100 with the empirical chi square 3.19 with prob < 1
The total number of observations was 100 with Likelihood Chi Square = 30.27 with prob < 0.024

Tucker Lewis Index of factoring reliability = 0.921


RMSEA index = 0.088 and the 90 % confidence intervals are 0.032 0.139
BIC = -48.01
Fit based upon off diagonal values = 1
Measures of factor score adequacy
PA1 PA2 PA3 PA4
Correlation of (regression) scores with factors 0.98 0.99 0.94 0.88
Multiple R square of scores with factors 0.96 0.97 0.88 0.78
Minimum correlation of possible factor scores 0.93 0.94 0.77 0.55

>

➢ Step 5: check the loadings

factor2$loadings

Loadings:
            PA1    PA2    PA3    PA4
ProdQual                          0.647
Ecom                0.787        -0.113
TechSup                    0.883  0.116
CompRes      0.898  0.130         0.132
Advertising  0.166  0.530
ProdLine     0.525         0.127  0.712
SalesFImage  0.115  0.971        -0.135
ComPricing          0.213 -0.209 -0.590
WartyClaim   0.103         0.885  0.128
OrdBilling   0.768  0.127
DelSpeed     0.949  0.185

                 PA1   PA2   PA3   PA4
SS loadings    2.635 1.967 1.641 1.371
Proportion Var 0.240 0.179 0.149 0.125
Cumulative Var 0.240 0.418 0.568 0.692

➢ Step 6: using fa.diagram to compare the solutions with and without rotation: the red dotted line
indicates that Competitive Pricing falls marginally under the PA4 bucket and that its loading is
negative.

➢ Step 7: Naming the factors

• PA1 => Experience => complaint resolution, order billing, delivery speed
Talks about customer experience
• PA2 => compbrand => Ecom, advertising, sales force image
Talks about the company’s image and brand as a whole
• PA3 => aftersaleservice => techsup, wartyclaim
Talks about after-sales service
• PA4 => productsuperioity => prodquality, comppricing

➢ Step 8: getting the new dataset using the scores and the new factors named above

sat=as.data.frame(mydata[,13])
colnames(sat)=c("satisfaction")
mydatanew1=cbind(scores1,sat) ## Factor Analysis
mydatanew1
Experience compbrand aftersaleservice productsuperioity satisfaction
1 -0.13388710 0.91751661 -1.719604873 0.09135411 8.2
2 1.62976040 -2.00900531 -0.596361722 0.65808192 5.7
3 0.36376581 0.83617362 0.002979966 1.37548765 8.9
4 -1.22252302 -0.54913358 1.245473305 -0.64421384 4.8
5 -0.48542093 -0.42762231 -0.026980304 0.47360747 7.1
6 -0.59509240 -1.30353334 -1.183019401 -0.95913571 4.7
7 -2.52885363 0.38836877 -0.603275803 -1.29659025 5.7
8 -0.11315168 -0.13097631 -0.699238481 -1.36606005 6.3
9 0.95751096 0.34755882 -0.142256076 -0.93477420 7.0
10 0.58135807 0.43427719 -0.481549064 -0.66519579 5.5
11 -0.04744554 -0.34677999 -0.477931226 0.62086386 7.4
12 -1.22969845 1.22373499 0.307420873 -1.06601488 6.0
13 0.70120038 1.40162126 -0.077278204 0.61198552 8.4
14 0.18944710 -0.12001589 0.341391428 1.43748733 7.6
15 1.59586476 0.51484865 -0.307216912 -0.62265003 8.0
16 1.11215548 -1.25985548 -0.535588676 0.99091689 6.6
17 0.90477581 -0.30392244 0.909413294 -1.04926552 6.4
18 1.35863182 0.09820639 0.147598367 -0.63536585 7.4
19 0.76821232 0.25113902 -0.444327163 -0.84712501 6.8
20 0.61161128 1.74911250 -0.747772366 -0.37770002 7.6
21 -0.49662748 -0.40513549 1.413398115 -1.42620085 5.4
22 -0.24583333 2.83259042 0.458183224 2.15737479 9.9
23 -0.08593028 -0.20647990 0.954813784 1.29099542 7.0
24 1.30419410 -0.65840510 -0.735880788 0.79535237 8.6
25 0.02837015 0.11289267 0.352466288 -0.83695432 4.8
26 0.37516895 -0.08949421 0.586591370 -1.33839027 6.6
27 0.69040218 -1.27676215 0.980470048 0.97700863 6.3
28 0.19330562 -1.09019060 0.410744110 -1.15018948 5.4
29 0.74807174 -1.20388888 -0.169444094 1.11442401 6.3
30 -0.53645976 -0.31470400 -1.389463973 -0.62508147 5.4
31 -0.98478652 -0.32465411 2.054616799 0.57213335 6.1
32 -0.89540824 -1.30592549 1.145669622 -0.17981750 6.4
33 -0.61006900 -0.25385457 0.201747129 -0.51103804 5.4
34 0.58139098 -0.57102185 0.160508882 -1.09267691 7.3
35 -1.08254233 1.61817367 0.121735970 -0.32924397 6.3
36 -1.51860194 -1.82264781 0.280394795 1.08323735 5.4
37 -0.54298600 -0.45867623 0.359185746 0.44557509 7.1
38 1.35035690 0.28270522 0.116158567 0.69743150 8.7
39 0.98244317 -0.26293227 -1.015054009 -0.98172621 7.6
40 -0.92383910 1.28059000 -1.167921851 -0.79331056 6.0
41 0.09932064 0.09827305 -1.837451060 -0.63276007 7.0
42 0.10869713 -0.04357923 -1.050920110 0.49209145 7.6
43 0.44811474 1.20846081 -0.968824054 0.72015394 8.9
44 0.70660340 2.17850627 1.233358555 -0.84868686 7.6
45 1.38263370 -2.03732511 -0.954180731 0.77953010 5.5
46 0.94936859 0.24232567 0.291172997 -0.56344261 7.4
47 0.08771633 -0.60985289 0.976205747 0.45795642 7.1
48 1.81030056 0.43834059 0.814478797 -1.17962559 7.6
49 -0.19224074 1.62888205 -0.733730711 1.33934942 8.7
50 0.22531882 0.78430768 -0.957752366 0.92784955 8.6
51 -1.41135829 -0.16930613 0.091870699 -0.12631052 5.4
52 1.75365232 -1.99153570 -1.147542434 0.70646283 5.7
53 0.93209854 -0.51397965 -0.162358377 0.70759613 8.7
54 -0.94184081 -0.25288506 0.358406630 0.67268912 6.1
55 0.72242047 -0.54924734 -0.909388995 -1.09941166 7.3
56 -0.63244099 0.41450294 0.914064214 0.81720176 7.7
57 1.99193341 1.41977208 -0.077384854 -0.41266731 9.0
58 0.08548402 0.30077469 0.319615332 0.74357193 8.2
59 -0.57061091 -0.34885418 0.365475898 0.45601852 7.1
60 0.83067496 -1.65626897 0.641678078 0.23244952 7.9
61 0.86635294 -1.24543164 1.546930711 0.94382806 6.6
62 -0.60680134 0.74525730 -0.146586558 -0.25509333 8.0
63 -1.00306329 -0.09710517 -1.081705300 0.42111790 6.3
64 -1.28310920 -1.59838664 0.403626668 -0.02388625 6.0
65 -1.39461685 -0.23395772 0.557826636 -0.26460336 5.4
66 1.60952000 0.55610373 -0.999093684 -1.16665258 7.6
67 1.07257520 -0.39772241 1.815878900 -1.14631126 6.4
68 0.50884788 -0.29395183 -0.416389316 -0.75909753 6.1
69 -0.70601563 -0.44707894 -0.973611521 -0.83304582 5.2
70 0.23899120 0.05018369 -1.219545924 -1.27828253 6.6
71 0.48631145 1.74069413 0.854795068 -0.43411595 7.6
72 -1.37477720 -0.22770098 -1.265787815 1.00015839 5.8
73 0.76539809 0.81612817 -1.748046205 -0.13852761 7.9
74 -0.64249465 1.66245576 1.253902625 1.20305576 8.6
75 -0.26909538 0.83656999 -0.210138132 -0.11412392 8.2
76 -0.12296694 -0.33391431 1.172210398 0.39677906 7.1
77 0.10371878 -0.33487753 2.082854211 -1.18017022 6.4
78 0.46301479 -0.33956677 1.214737442 0.10357998 7.6
79 0.98918468 0.65624508 0.485336460 1.22733204 8.9
80 -1.74919939 0.82890385 0.003384046 0.09231225 5.7
81 -0.32137992 -0.20110631 0.470666208 0.60888606 7.1
82 -0.03471489 -0.38974940 0.412648407 0.57771035 7.4
83 -1.17790878 -0.90045717 0.198122454 0.40011473 6.6
84 -2.55956258 -0.25686675 1.554283149 -1.14101556 5.0
85 0.73971919 -0.90696938 0.655846983 0.51129012 8.2
86 -0.50161741 -0.56408382 -1.009145207 -0.86106988 5.2
87 -1.33057806 -0.01745886 -2.201996451 -0.92403144 5.2
88 0.70358137 -0.91236060 1.299031585 0.56968494 8.2
89 0.10912530 -0.46505791 0.266161987 0.17120142 7.3
90 1.02813129 2.57446865 1.640665437 -0.80151398 8.2
91 -1.04672561 0.44260491 1.394021786 0.75011393 7.4
92 -1.90712581 -0.46993540 -0.522475751 -1.31422452 4.8
93 0.07279015 -0.02235421 -0.352359525 1.61044504 7.6
94 1.08706436 0.61924340 0.089460676 1.16081123 8.9
95 -0.95457978 0.66256003 -0.981787816 1.06717592 7.7
96 -0.41931326 0.70755398 -0.077703201 0.52522023 7.3
97 -0.12315824 -0.25275815 -1.762967608 -0.63424275 6.3
98 -1.79270636 -1.59315365 -1.309147686 1.28219570 5.4
99 -0.33991434 1.89138931 0.122487640 -0.17511674 6.4
100 -0.31758889 -0.42356050 -0.453981729 -1.03250054 6.4

6. Principal Component Analysis


➢ Step 1: Using cortest.bartlett to check whether Principal Component Analysis can be done on the
independent variables of the dataset
test1=cortest.bartlett(cor2,nrow(mydata2))
test1$p.value
[1] 1.79337e-96

## Since the p-value of the test is far below the significance level of alpha = 0.001, we reject the null
## hypothesis H0 (that there is no correlation amongst the predictor variables, in which case PCA
## would not be meaningful)
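
For reference, Bartlett's test of sphericity is based on the determinant of the correlation matrix R: the statistic is -(n - 1 - (2p + 5)/6) * ln|R|, compared against a chi-square distribution with p(p - 1)/2 degrees of freedom. The sketch below computes it by hand on built-in mtcars columns, used here only as a stand-in; the variable names are hypothetical, and the report itself uses cortest.bartlett() from the psych package.

```r
# Built-in mtcars columns used as a stand-in for the report's predictors
X <- mtcars[, c("mpg", "disp", "hp", "wt", "drat", "qsec")]
R <- cor(X)
n <- nrow(X)  # number of observations
p <- ncol(X)  # number of variables

# Bartlett's statistic and its chi-square reference distribution
chi_sq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df <- p * (p - 1) / 2
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(c(chi_sq = chi_sq, df = df, p_value = p_value))
```

The more strongly the variables are correlated, the closer det(R) is to 0 and the larger the statistic becomes, which is why the report's p-value is so small.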

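➢ Step 2: selection of the number of components to be considered for PCA

• Eigenvalues: components with an eigenvalue < 1 are not retained, so 4 components are considered

ev=eigen(cor1)
ev$values
[1] 4.04285997 2.55292440 1.69222417 1.21754639 0.63596293 0.56853132 0.40282774 0.32448016
0.23613948 0.14422355 0.09913845 0.08314143
## Note: cor1 includes Satisfaction, hence 12 eigenvalues are listed. For the 11 predictors (cor2) the
## first four eigenvalues are 3.43, 2.55, 1.69 and 1.09 (the SS loadings of the unrotated solution in
## Step 3), which are also all greater than 1
• Scree plot: the elbow occurs at 5 components but the eigenvalues suggest 4, so 4 components are
considered for PCA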

➢ Step 3: PCA without rotation does not give a clear picture, so varimax rotation is used
unrotate=principal(mydata2,nfactors = 4,rotate = "none")
print(unrotate)
Principal Components Analysis
Call: principal(r = mydata2, nfactors = 4, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
PC1 PC2 PC3 PC4 h2 u2 com
ProdQual 0.25 -0.50 -0.08 0.67 0.77 0.232 2.2
Ecom 0.31 0.71 0.31 0.28 0.78 0.223 2.1
TechSup 0.29 -0.37 0.79 -0.20 0.89 0.107 1.9
CompRes 0.87 0.03 -0.27 -0.22 0.88 0.119 1.3
Advertising 0.34 0.58 0.11 0.33 0.58 0.424 2.4
ProdLine 0.72 -0.45 -0.15 0.21 0.79 0.213 2.0
SalesFImage 0.38 0.75 0.31 0.23 0.86 0.141 2.1
ComPricing -0.28 0.66 -0.07 -0.35 0.64 0.359 1.9
WartyClaim 0.39 -0.31 0.78 -0.19 0.89 0.108 2.0
OrdBilling 0.81 0.04 -0.22 -0.25 0.77 0.234 1.3
DelSpeed 0.88 0.12 -0.30 -0.21 0.91 0.086 1.4

PC1 PC2 PC3 PC4


SS loadings 3.43 2.55 1.69 1.09
Proportion Var 0.31 0.23 0.15 0.10
Cumulative Var 0.31 0.54 0.70 0.80
Proportion Explained 0.39 0.29 0.19 0.12
Cumulative Proportion 0.39 0.68 0.88 1.00

Mean item complexity = 1.9


Test of the hypothesis that 4 components are sufficient.

The root mean square of the residuals (RMSR) is 0.06


with the empirical chi square 39.02 with prob < 0.0018

Fit based upon off diagonal values = 0.97

rotate=principal(mydata2,nfactors = 4,rotate = "varimax")
print(rotate)
Principal Components Analysis
Call: principal(r = mydata2, nfactors = 4, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
RC1 RC2 RC3 RC4 h2 u2 com
ProdQual 0.00 -0.01 -0.03 0.88 0.77 0.232 1.0
Ecom 0.06 0.87 0.05 -0.12 0.78 0.223 1.1
TechSup 0.02 -0.02 0.94 0.10 0.89 0.107 1.0
CompRes 0.93 0.12 0.05 0.09 0.88 0.119 1.1
Advertising 0.14 0.74 -0.08 0.01 0.58 0.424 1.1
ProdLine 0.59 -0.06 0.15 0.64 0.79 0.213 2.1
SalesFImage 0.13 0.90 0.08 -0.16 0.86 0.141 1.1
ComPricing -0.09 0.23 -0.25 -0.72 0.64 0.359 1.5
WartyClaim 0.11 0.05 0.93 0.10 0.89 0.108 1.1
OrdBilling 0.86 0.11 0.08 0.04 0.77 0.234 1.1
DelSpeed 0.94 0.18 0.00 0.05 0.91 0.086 1.1

RC1 RC2 RC3 RC4


SS loadings 2.89 2.23 1.86 1.77
Proportion Var 0.26 0.20 0.17 0.16
Cumulative Var 0.26 0.47 0.63 0.80
Proportion Explained 0.33 0.26 0.21 0.20
Cumulative Proportion 0.33 0.59 0.80 1.00

Mean item complexity = 1.2


Test of the hypothesis that 4 components are sufficient.

The root mean square of the residuals (RMSR) is 0.06


with the empirical chi square 39.02 with prob < 0.0018

Fit based upon off diagonal values = 0.97

The 4 rotated components (RCs) together explain about 80% of the variance, which is a good proportion.
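
The "Proportion Var" and "Cumulative Var" rows can be reproduced by hand: with 11 standardized variables the total variance is 11, so each component's share is its SS loading divided by 11. The sketch below does this for the SS loadings of the unrotated solution printed above; the object names are hypothetical.

```r
# SS loadings of the four retained components, copied from the unrotated output above
ss_loadings <- c(PC1 = 3.43, PC2 = 2.55, PC3 = 1.69, PC4 = 1.09)
n_variables <- 11  # standardized variables, so the total variance equals 11

proportion_var <- ss_loadings / n_variables
cumulative_var <- cumsum(proportion_var)
print(round(rbind(proportion_var, cumulative_var), 2))
```

Rotation redistributes the variance among the four components (2.89, 2.23, 1.86 and 1.77 after varimax) but leaves the total, and therefore the 80% figure, unchanged.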

➢ Step 4: Naming the components

print(rotate$loadings,cutoff=0.7)

Loadings:
            RC1    RC2    RC3    RC4
ProdQual                          0.876
Ecom                0.871
TechSup                    0.939
CompRes      0.926
Advertising         0.742
ProdLine
SalesFImage         0.900
ComPricing                       -0.723
WartyClaim                 0.931
OrdBilling   0.864
DelSpeed     0.938

RC1 => Experience => complaint resolution, order billing, delivery speed
Talks about customer experience
RC2 => compbrand => Ecom, advertising, sales force image
Talks about the company’s image and brand as a whole
RC3 => aftersaleservice => techsup, wartyclaim
Talks about after-sales service
RC4 => productsuperioity => prodquality, comppricing
Talks about how good the product is after sales
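
The grouping of variables into components can also be derived programmatically by assigning each variable to the component on which its absolute loading is largest. The sketch below does this with the varimax-rotated loadings copied from the principal() output in Step 3; the object names and the 0.5 cross-loading threshold are assumptions made for illustration.

```r
# Varimax-rotated loadings copied from the principal() output in Step 3
rotated <- matrix(
  c( 0.00, -0.01, -0.03,  0.88,
     0.06,  0.87,  0.05, -0.12,
     0.02, -0.02,  0.94,  0.10,
     0.93,  0.12,  0.05,  0.09,
     0.14,  0.74, -0.08,  0.01,
     0.59, -0.06,  0.15,  0.64,
     0.13,  0.90,  0.08, -0.16,
    -0.09,  0.23, -0.25, -0.72,
     0.11,  0.05,  0.93,  0.10,
     0.86,  0.11,  0.08,  0.04,
     0.94,  0.18,  0.00,  0.05),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("ProdQual", "Ecom", "TechSup", "CompRes", "Advertising", "ProdLine",
      "SalesFImage", "ComPricing", "WartyClaim", "OrdBilling", "DelSpeed"),
    c("RC1", "RC2", "RC3", "RC4"))
)

# Assign each variable to the component with the largest absolute loading
dominant <- colnames(rotated)[apply(abs(rotated), 1, which.max)]
names(dominant) <- rownames(rotated)
print(split(names(dominant), dominant))

# Variables whose second-largest absolute loading exceeds 0.5 (cross-loading)
second <- apply(abs(rotated), 1, function(r) sort(r, decreasing = TRUE)[2])
print(names(second)[second > 0.5])
```

ProdLine does not appear in the cutoff = 0.7 printout because it loads on both RC1 (0.59) and RC4 (0.64); by the largest-loading rule it falls under RC4.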

➢ Step 5: getting the scores and generating the new dataset

scores=as.data.frame(rotate$scores)
print(scores,digits = 2)
RC1 RC2 RC3 RC4
1 0.127 0.770 -1.8784 0.366
2 1.222 -1.646 -0.6140 0.813
3 0.616 0.580 0.0037 1.570
4 -0.845 -0.272 1.2675 -1.254
5 -0.320 -0.834 -0.0081 0.448
6 -0.647 -1.067 -1.3032 -1.053
7 -2.627 -0.246 -0.5554 -1.226
8 -0.279 -0.157 -0.7493 -1.015
9 1.052 -0.172 -0.0923 -1.658
10 0.429 0.764 -0.4504 -0.891
11 -0.136 -0.768 -0.4637 0.606
12 -1.450 1.360 0.4378 -1.070
13 0.625 2.113 -0.1683 0.875
14 0.427 -0.404 0.4322 0.902
15 1.439 0.664 -0.2681 -1.044
16 0.920 -1.058 -0.5568 1.167
17 0.522 -0.320 1.1060 -1.032
18 1.713 -0.164 0.2549 -1.478
19 1.161 -0.419 -0.3756 -1.762
20 0.293 1.776 -0.9501 0.241
21 -0.615 -0.179 1.5259 -1.832
22 -0.113 2.834 0.6343 2.244
23 0.081 -0.351 1.1413 1.335
24 1.949 -1.671 -0.8592 0.503
25 0.115 -0.016 0.4718 -1.250
26 0.575 -0.245 0.6243 -1.354
27 0.829 -0.986 1.0426 0.922
28 0.117 -1.107 0.3797 -1.360
29 1.158 -1.606 -0.0558 0.795
30 -0.507 0.162 -1.5513 -0.306
31 -0.811 -0.179 2.2566 0.216
32 -1.074 -1.601 1.1867 -0.070
33 -0.500 0.306 0.1571 -0.970
34 0.279 0.071 -0.0329 -0.656
35 -1.211 0.612 0.2758 -0.689
36 -1.376 -1.059 0.2775 1.029
37 -0.625 -0.244 0.3109 0.661
38 1.364 0.035 0.1112 0.582
39 0.601 0.471 -1.2915 -0.446
40 -0.586 1.482 -1.1845 -1.039
41 0.192 -0.390 -1.9817 -0.596
42 0.043 0.090 -1.1657 0.537
43 0.410 1.958 -1.0947 0.989
44 0.775 1.613 1.5121 -1.149
45 1.270 -1.774 -0.9828 0.737
46 1.060 0.679 0.3242 -1.103
47 -0.123 -0.091 0.9961 1.417
48 2.098 0.462 0.8401 -1.681
49 0.156 0.882 -0.8353 1.298
50 0.230 0.503 -0.8770 1.037
51 -0.942 -0.376 0.1942 -0.653
52 1.561 -1.908 -1.1765 0.721
53 0.860 -1.089 -0.2414 0.872
54 -0.818 -0.529 0.5399 0.331
55 0.541 -0.680 -1.0607 -0.815
56 -0.369 0.283 0.9175 0.604
57 1.979 1.432 -0.0853 -0.839
58 0.206 0.517 0.3475 0.858
59 -1.341 0.553 0.3266 1.940
60 0.853 -1.578 0.5660 0.740
61 0.993 -1.265 1.7001 0.791
62 -1.105 0.709 -0.1528 0.396
63 -0.759 0.260 -1.1884 0.780
64 -1.095 -1.951 0.4272 -0.149
65 -1.209 0.153 0.5776 -0.516
66 1.343 0.537 -1.0391 -1.249
67 0.902 -0.588 2.0624 -1.319
68 0.423 -0.248 -0.3013 -0.847
69 -0.875 -0.604 -0.9976 -0.529
70 0.144 -0.151 -1.2760 -1.000
71 0.344 2.056 0.6863 0.094
72 -1.160 -0.185 -1.2052 0.714
73 0.926 1.316 -1.8699 -0.559
74 -0.567 1.400 1.2266 1.350
75 -0.299 0.872 -0.2946 0.303
76 -0.891 0.233 1.0379 1.613
77 -0.355 0.144 2.0573 -0.633
78 0.211 0.342 1.0733 0.309
79 1.130 0.640 0.4414 1.465
80 -1.532 0.288 0.0325 -0.311
81 -0.850 -0.248 0.4526 1.531
82 0.028 -0.916 0.4936 0.404
83 -1.392 -0.985 0.2076 0.626
84 -2.486 -0.736 1.6335 -1.445
85 1.003 -1.782 0.7977 -0.011
86 -0.829 -0.419 -1.0805 -0.452
87 -1.425 -0.298 -2.1553 -1.270
88 1.071 -1.298 1.4008 0.040
89 0.088 -0.059 0.1342 0.235
90 1.076 2.377 1.8930 -1.013
91 -0.785 0.463 1.3918 0.613
92 -2.348 -0.264 -0.5345 -1.189
93 0.299 0.206 -0.3714 1.208
94 1.107 0.370 0.0538 1.445
95 -0.797 0.712 -1.0877 1.061
96 -0.113 0.396 0.0483 0.348
97 -0.208 -0.253 -1.8809 -0.321
98 -1.586 -1.123 -1.3375 1.237
99 -0.328 1.902 0.1402 -0.121
100 -0.627 0.211 -0.7489 -0.696
> class(scores)
[1] "data.frame"

> colnames(scores)=c("Experience","compbrand","aftersaleservice","productsuperioity")
> print(scores,digits=2)
Experience compbrand aftersaleservice productsuperioity
1 0.127 0.770 -1.8784 0.366
2 1.222 -1.646 -0.6140 0.813
3 0.616 0.580 0.0037 1.570
4 -0.845 -0.272 1.2675 -1.254
5 -0.320 -0.834 -0.0081 0.448
6 -0.647 -1.067 -1.3032 -1.053
7 -2.627 -0.246 -0.5554 -1.226
8 -0.279 -0.157 -0.7493 -1.015
9 1.052 -0.172 -0.0923 -1.658
10 0.429 0.764 -0.4504 -0.891
11 -0.136 -0.768 -0.4637 0.606
12 -1.450 1.360 0.4378 -1.070
13 0.625 2.113 -0.1683 0.875
14 0.427 -0.404 0.4322 0.902
15 1.439 0.664 -0.2681 -1.044
16 0.920 -1.058 -0.5568 1.167
17 0.522 -0.320 1.1060 -1.032
18 1.713 -0.164 0.2549 -1.478
19 1.161 -0.419 -0.3756 -1.762
20 0.293 1.776 -0.9501 0.241
21 -0.615 -0.179 1.5259 -1.832
22 -0.113 2.834 0.6343 2.244
23 0.081 -0.351 1.1413 1.335
24 1.949 -1.671 -0.8592 0.503
25 0.115 -0.016 0.4718 -1.250
26 0.575 -0.245 0.6243 -1.354
27 0.829 -0.986 1.0426 0.922
28 0.117 -1.107 0.3797 -1.360
29 1.158 -1.606 -0.0558 0.795
30 -0.507 0.162 -1.5513 -0.306
31 -0.811 -0.179 2.2566 0.216
32 -1.074 -1.601 1.1867 -0.070
33 -0.500 0.306 0.1571 -0.970
34 0.279 0.071 -0.0329 -0.656
35 -1.211 0.612 0.2758 -0.689
36 -1.376 -1.059 0.2775 1.029
37 -0.625 -0.244 0.3109 0.661
38 1.364 0.035 0.1112 0.582
39 0.601 0.471 -1.2915 -0.446
40 -0.586 1.482 -1.1845 -1.039
41 0.192 -0.390 -1.9817 -0.596
42 0.043 0.090 -1.1657 0.537
43 0.410 1.958 -1.0947 0.989
44 0.775 1.613 1.5121 -1.149
45 1.270 -1.774 -0.9828 0.737
46 1.060 0.679 0.3242 -1.103
47 -0.123 -0.091 0.9961 1.417
48 2.098 0.462 0.8401 -1.681
49 0.156 0.882 -0.8353 1.298
50 0.230 0.503 -0.8770 1.037
51 -0.942 -0.376 0.1942 -0.653
52 1.561 -1.908 -1.1765 0.721
53 0.860 -1.089 -0.2414 0.872
54 -0.818 -0.529 0.5399 0.331
55 0.541 -0.680 -1.0607 -0.815
56 -0.369 0.283 0.9175 0.604
57 1.979 1.432 -0.0853 -0.839
58 0.206 0.517 0.3475 0.858
59 -1.341 0.553 0.3266 1.940
60 0.853 -1.578 0.5660 0.740
61 0.993 -1.265 1.7001 0.791
62 -1.105 0.709 -0.1528 0.396
63 -0.759 0.260 -1.1884 0.780
64 -1.095 -1.951 0.4272 -0.149
65 -1.209 0.153 0.5776 -0.516
66 1.343 0.537 -1.0391 -1.249
67 0.902 -0.588 2.0624 -1.319
68 0.423 -0.248 -0.3013 -0.847
69 -0.875 -0.604 -0.9976 -0.529
70 0.144 -0.151 -1.2760 -1.000
71 0.344 2.056 0.6863 0.094
72 -1.160 -0.185 -1.2052 0.714
73 0.926 1.316 -1.8699 -0.559
74 -0.567 1.400 1.2266 1.350
75 -0.299 0.872 -0.2946 0.303
76 -0.891 0.233 1.0379 1.613
77 -0.355 0.144 2.0573 -0.633
78 0.211 0.342 1.0733 0.309
79 1.130 0.640 0.4414 1.465
80 -1.532 0.288 0.0325 -0.311
81 -0.850 -0.248 0.4526 1.531
82 0.028 -0.916 0.4936 0.404
83 -1.392 -0.985 0.2076 0.626
84 -2.486 -0.736 1.6335 -1.445
85 1.003 -1.782 0.7977 -0.011
86 -0.829 -0.419 -1.0805 -0.452
87 -1.425 -0.298 -2.1553 -1.270
88 1.071 -1.298 1.4008 0.040
89 0.088 -0.059 0.1342 0.235
90 1.076 2.377 1.8930 -1.013
91 -0.785 0.463 1.3918 0.613
92 -2.348 -0.264 -0.5345 -1.189
93 0.299 0.206 -0.3714 1.208
94 1.107 0.370 0.0538 1.445
95 -0.797 0.712 -1.0877 1.061
96 -0.113 0.396 0.0483 0.348
97 -0.208 -0.253 -1.8809 -0.321
98 -1.586 -1.123 -1.3375 1.237
99 -0.328 1.902 0.1402 -0.121
100 -0.627 0.211 -0.7489 -0.696

➢ Step 6: creating the new model with satisfaction as a dependent variable


> sat=as.data.frame(mydata[,13])
> colnames(sat)=c("satisfaction")
> mydatanew=cbind(scores,sat) ## rotated PCA component scores plus the response
> mydatanew
Experience compbrand aftersaleservice productsuperioity satisfaction
1 0.12749104 0.76986860 -1.878446273 0.36648477 8.2
2 1.22166663 -1.64586166 -0.614030010 0.81306481 5.7
3 0.61582140 0.58000368 0.003689252 1.56997685 8.9
4 -0.84462665 -0.27192183 1.267493254 -1.25416452 4.8
5 -0.31979430 -0.83406501 -0.008096627 0.44753766 7.1
6 -0.64702925 -1.06726829 -1.303198892 -1.05277921 4.7
7 -2.62679851 -0.24588272 -0.555423494 -1.22601470 5.7
8 -0.27936394 -0.15732039 -0.749311481 -1.01464175 6.3
9 1.05151341 -0.17228834 -0.092252815 -1.65809634 7.0
10 0.42875382 0.76353272 -0.450377116 -0.89116595 5.5
11 -0.13580761 -0.76759698 -0.463706767 0.60634140 7.4
12 -1.45030579 1.35959912 0.437785016 -1.06981053 6.0
13 0.62461823 2.11311565 -0.168284409 0.87466736 8.4
14 0.42724294 -0.40405102 0.432245882 0.90236591 7.6
15 1.43869881 0.66394839 -0.268050576 -1.04431806 8.0
16 0.91969055 -1.05791159 -0.556847385 1.16667179 6.6
17 0.52182175 -0.31959634 1.106009732 -1.03228845 6.4
18 1.71349224 -0.16356534 0.254874808 -1.47834954 7.4
19 1.16101062 -0.41943765 -0.375574495 -1.76167798 6.8
20 0.29327394 1.77627892 -0.950139113 0.24112808 7.6
21 -0.61501848 -0.17897273 1.525943540 -1.83178487 5.4
22 -0.11282553 2.83382456 0.634265462 2.24434088 9.9
23 0.08062000 -0.35141218 1.141318858 1.33498913 7.0
24 1.94944755 -1.67141336 -0.859208476 0.50283683 8.6
25 0.11534004 -0.01629685 0.471841920 -1.25041487 4.8
26 0.57499258 -0.24490397 0.624292860 -1.35435360 6.6
27 0.82896381 -0.98564797 1.042612499 0.92163700 6.3
28 0.11695051 -1.10728007 0.379702318 -1.35959873 5.4
29 1.15812632 -1.60628019 -0.055788125 0.79531052 6.3
30 -0.50739097 0.16192496 -1.551322987 -0.30617006 5.4
31 -0.81074131 -0.17909238 2.256638942 0.21624964 6.1
32 -1.07438259 -1.60132074 1.186706049 -0.07026025 6.4
33 -0.49992323 0.30576561 0.157100923 -0.97020760 5.4
34 0.27885747 0.07142401 -0.032941868 -0.65628441 7.3
35 -1.21092268 0.61247373 0.275773660 -0.68907425 6.3
36 -1.37569442 -1.05901060 0.277541003 1.02901615 5.4
37 -0.62476762 -0.24359504 0.310901127 0.66051905 7.1
38 1.36407521 0.03533514 0.111220579 0.58229289 8.7
39 0.60127495 0.47053204 -1.291508459 -0.44567425 7.6
40 -0.58595295 1.48246242 -1.184474889 -1.03900017 6.0
41 0.19167763 -0.38987441 -1.981705114 -0.59621998 7.0
42 0.04337736 0.09038218 -1.165712378 0.53711635 7.6
43 0.40978439 1.95821980 -1.094672035 0.98888677 8.9
44 0.77547735 1.61343935 1.512055016 -1.14923990 7.6
45 1.26977129 -1.77421869 -0.982794252 0.73741113 5.5
46 1.06006213 0.67869812 0.324241314 -1.10289754 7.4
47 -0.12283972 -0.09120895 0.996132311 1.41658476 7.1
48 2.09832312 0.46224836 0.840138645 -1.68134357 7.6
49 0.15604110 0.88202250 -0.835276700 1.29848126 8.7
50 0.22982346 0.50302016 -0.877037378 1.03687279 8.6
51 -0.94183170 -0.37565064 0.194174450 -0.65267018 5.4
52 1.56112818 -1.90837771 -1.176496580 0.72135781 5.7
53 0.86011758 -1.08934973 -0.241431240 0.87182584 8.7
54 -0.81818435 -0.52905894 0.539901007 0.33090833 6.1
55 0.54057306 -0.67964718 -1.060702696 -0.81493134 7.3
56 -0.36862437 0.28299033 0.917529711 0.60437604 7.7
57 1.97865621 1.43218345 -0.085319811 -0.83928511 9.0
58 0.20552648 0.51721871 0.347543516 0.85780222 8.2
59 -1.34118399 0.55279292 0.326579529 1.94033636 7.1
60 0.85269365 -1.57772836 0.565957142 0.74035745 7.9
61 0.99335190 -1.26473291 1.700148685 0.79107349 6.6
62 -1.10480994 0.70911509 -0.152796271 0.39572776 8.0
63 -0.75921278 0.26001089 -1.188441475 0.78014681 6.3
64 -1.09474826 -1.95079477 0.427161087 -0.14850194 6.0
65 -1.20922892 0.15287985 0.577570622 -0.51556079 5.4
66 1.34313803 0.53659415 -1.039141561 -1.24941075 7.6
67 0.90215965 -0.58791187 2.062390350 -1.31875384 6.4
68 0.42318247 -0.24798003 -0.301264201 -0.84662237 6.1
69 -0.87487795 -0.60376193 -0.997620068 -0.52944051 5.2
70 0.14372369 -0.15149397 -1.275988102 -1.00015303 6.6
71 0.34387385 2.05641521 0.686346140 0.09426189 7.6
72 -1.16028876 -0.18463387 -1.205197353 0.71392258 5.8
73 0.92620350 1.31556747 -1.869872622 -0.55887325 7.9
74 -0.56659595 1.40049678 1.226627789 1.34965616 8.6
75 -0.29927186 0.87194345 -0.294625640 0.30300903 8.2
76 -0.89076271 0.23334622 1.037887857 1.61337977 7.1
77 -0.35535699 0.14354788 2.057316893 -0.63270298 6.4
78 0.21054781 0.34218260 1.073262401 0.30917078 7.6
79 1.12960563 0.64023318 0.441396478 1.46536309 8.9
80 -1.53178615 0.28775431 0.032504303 -0.31110748 5.7
81 -0.84995072 -0.24812793 0.452562850 1.53107516 7.1
82 0.02821132 -0.91638751 0.493585747 0.40440014 7.4
83 -1.39215814 -0.98489128 0.207609940 0.62550901 6.6
84 -2.48589153 -0.73564594 1.633547463 -1.44488070 5.0
85 1.00347560 -1.78211709 0.797684019 -0.01141758 8.2
86 -0.82905678 -0.41939997 -1.080457442 -0.45156381 5.2
87 -1.42542804 -0.29820535 -2.155317026 -1.27019948 5.2
88 1.07076650 -1.29822928 1.400760179 0.04006707 8.2
89 0.08823132 -0.05909838 0.134228700 0.23513720 7.3
90 1.07621515 2.37671168 1.892951438 -1.01341980 8.2
91 -0.78483349 0.46274897 1.391773475 0.61318828 7.4
92 -2.34793070 -0.26426141 -0.534487111 -1.18940207 4.8
93 0.29898878 0.20636519 -0.371416070 1.20810631 7.6
94 1.10722906 0.37021414 0.053771549 1.44542651 8.9
95 -0.79676401 0.71175008 -1.087719898 1.06131961 7.7
96 -0.11270919 0.39627233 0.048312077 0.34767120 7.3
97 -0.20833274 -0.25264090 -1.880921516 -0.32081680 6.3
98 -1.58596201 -1.12347151 -1.337515839 1.23670188 5.4
99 -0.32827278 1.90243479 0.140227444 -0.12061112 6.4
100 -0.62744070 0.21100398 -0.748923176 -0.69590553 6.4

6. New model linear regression testing


> attach(mydatanew)
> modelnew=lm(satisfaction~.,data=mydatanew)
> summary(modelnew)

Call:
lm(formula = satisfaction ~ ., data = mydatanew)
Residuals:
Min 1Q Median 3Q Max
-1.6308 -0.4996 0.1372 0.4623 1.5228

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.91800 0.07089 97.589 < 2e-16 ***
Experience 0.61805 0.07125 8.675 1.12e-13 ***
compbrand 0.50973 0.07125 7.155 1.74e-10 ***
aftersaleservice 0.06714 0.07125 0.942 0.348
productsuperioity 0.54032 0.07125 7.584 2.24e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7089 on 95 degrees of freedom
Multiple R-squared: 0.6605, Adjusted R-squared: 0.6462
F-statistic: 46.21 on 4 and 95 DF, p-value: < 2.2e-16
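
As a quick arithmetic check, the adjusted R-squared and the F-statistic in the summary can be recomputed from the multiple R-squared alone. The sketch below uses only the printed values (R-squared 0.6605, 100 observations, 4 predictors); the variable names are illustrative and not part of the original code.

```r
# Values copied from the summary(modelnew) output above
r2 <- 0.6605   # Multiple R-squared
n  <- 100      # number of observations
p  <- 4        # number of predictors (the four component scores)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic on p and n - p - 1 degrees of freedom
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

print(round(adj_r2, 4))
print(round(f_stat, 2))
```

Because the inputs are rounded to four decimals, the results should match the printed 0.6462 and 46.21 only up to rounding.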

7. New Model Analysis:


1. The intercept has a p-value < 0.001, so it is clearly non-zero and contributes to the regression model
2. Experience, compbrand and productsuperioity have p-values < 0.001, implying that the response variable
Satisfaction is linearly associated with them, so their beta coefficients (0.618, 0.510 and 0.540) contribute
significantly to the model
3. After-sales service is the only variable with a high p-value (0.348), implying that its beta coefficient
is not significantly different from zero
4. The overall p-value of the model, given by the F-statistic, is strong evidence against the null hypothesis
that all coefficients are zero, so the model as a whole is statistically significant
5. The adjusted R-squared shows that these predictors explain 64.6% of the variability in Satisfaction,
which is good (though not excellent)
6. The VIF of every predictor in the new model is around 1, so there is no multicollinearity

vif(modelnew)
Experience compbrand aftersaleservice productsuperioity
1.002184 1.004588 1.002890 1.007353
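
To illustrate why component scores give VIF values near 1, the sketch below computes VIF by hand as 1 / (1 - R_j^2) on simulated data. Everything in it is an assumption made for illustration: the predictors x1, x2, x3 are hypothetical, the helper vif_by_hand is not from the original code, and base R prcomp (unrotated) stands in for the psych::principal call with varimax rotation used in this report.

```r
set.seed(42)
n <- 100
# Simulated, hypothetical predictors: x3 is deliberately built from x1
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 0.9 * x1 + rnorm(n, sd = 0.3)
raw <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others
vif_by_hand <- function(d) {
  sapply(names(d), function(v) {
    f <- reformulate(setdiff(names(d), v), response = v)
    1 / (1 - summary(lm(f, data = d))$r.squared)
  })
}

vif_raw <- vif_by_hand(raw)

# Principal component scores (base R prcomp, unrotated) are uncorrelated
pcs <- as.data.frame(prcomp(raw, scale. = TRUE)$x)
vif_pcs <- vif_by_hand(pcs)

print(round(vif_raw, 2))
print(round(vif_pcs, 2))
```

The correlated raw predictors show inflated VIFs, while the component scores sit at 1, which is the same pattern seen in vif(modelnew) above.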

8. Testing of New Model-Predicted v/s Actual Satisfactions

> predictsat=as.data.frame(predict(modelnew))
> backtrack=data.frame(mydata$Satisfaction,predictsat)
> backtrack
mydata.Satisfaction predict.modelnew.
1 8.2 7.461131
2 5.7 7.232186
3 8.9 8.442790
4 4.8 5.664819
5 7.1 6.536470
6 4.7 5.317754
7 5.7 4.469453
8 6.3 6.066612
9 7.0 6.577969
10 5.5 7.060438
11 7.4 6.739281
12 6.0 6.166029
13 8.4 8.842472
14 7.6 7.492682
15 8.0 7.563360
16 6.6 7.540149
17 6.4 6.594089
18 7.4 7.111975
19 6.8 6.444675
20 7.6 8.071185
21 5.4 5.559359
22 9.9 9.548008
23 7.0 7.586644
24 8.6 7.484882
25 4.8 6.337033
26 6.6 6.458666
27 6.3 7.495895
28 5.4 5.716737
29 6.3 7.240977
30 5.4 6.417368
31 6.1 6.593980
32 6.4 5.479441
33 5.4 6.251209
34 7.3 6.769940
35 6.3 6.127986
36 5.4 6.102572
37 7.1 6.785459
38 8.7 8.101165
39 7.6 7.201950
40 6.0 6.670603
41 7.0 6.382541
42 7.6 7.202833
43 8.9 8.630260
44 7.6 7.700263
45 5.5 7.130855
46 7.4 7.344975
47 7.1 7.627871
48 7.6 7.598427
49 8.7 8.109555
50 8.6 7.817810
51 5.4 5.804808
52 5.7 7.220864
53 8.7 7.349170
54 6.1 6.357687
55 7.3 6.394126
56 7.7 7.222578
57 9.0 8.411725
58 8.2 7.795489
59 7.1 7.441189
60 7.9 7.078808
61 6.6 7.428834
62 8.0 6.800197
63 6.3 6.923048
64 6.0 5.195447
65 5.4 6.008776
66 7.6 7.276799
67 6.4 6.601811
68 6.1 6.575471
69 5.2 5.716483
70 6.6 6.303540
71 7.6 8.275765
72 5.8 6.411607
73 7.9 7.733521
74 8.6 8.093294
75 8.2 7.321437
76 7.1 7.427831
77 6.4 6.567802
78 7.6 7.461656
79 8.9 8.763895
80 5.7 5.952047
81 7.1 7.123864
82 7.4 6.719964
83 6.6 5.907461
84 5.0 4.335592
85 8.2 6.677173
86 5.2 5.875296
87 5.2 5.054001
88 8.2 7.033722
89 7.3 7.078467
90 8.2 8.374157
91 7.4 7.093570
92 4.8 4.653626
93 7.6 7.835808
94 8.9 8.575632
95 7.7 7.288792
96 7.3 7.241431
97 6.3 6.360840
98 5.4 5.943548
99 6.4 7.629094
100 6.4 6.211478
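
The first row of the table above can be reproduced by hand from the fitted coefficients, which shows how each prediction is formed. This is a minimal sketch using the rounded coefficients from summary(modelnew) and the component scores of observation 1 from mydatanew; the object names beta, row1 and pred1 are illustrative only.

```r
# Coefficients as printed by summary(modelnew)
beta <- c(intercept = 6.91800, Experience = 0.61805, compbrand = 0.50973,
          aftersaleservice = 0.06714, productsuperioity = 0.54032)

# Component scores for observation 1, copied from mydatanew
row1 <- c(Experience = 0.12749104, compbrand = 0.76986860,
          aftersaleservice = -1.878446273, productsuperioity = 0.36648477)

# Prediction = intercept + sum of (coefficient * score)
pred1 <- beta[["intercept"]] + sum(beta[names(row1)] * row1)
print(pred1)
```

The result should agree with the tabulated 7.461131 up to the rounding of the printed coefficients.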

> plot(mydata$Satisfaction,col="blue",xlab="data points")
> lines(mydata$Satisfaction,col="blue")
> lines(predictsat,col="red")
> text(28, 9.9, "Actual value", col = "blue")
> text(14.5, 9, "Predicted value", col ="red")

The plot shows that the predictions of the new regression model track the actual Satisfaction scores
closely. Blue represents the actual Satisfaction ratings. Red represents the predicted Satisfaction scores
derived from the multiple linear regression model.
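
The visual comparison can be complemented with a numeric error measure. The sketch below computes the root mean squared error and mean absolute error on the first ten rows of the predicted v/s actual table only, so the figures it prints describe that subset and not the full model; the names actual, predicted, rmse and mae are illustrative.

```r
# First ten rows of the backtrack table above
actual    <- c(8.2, 5.7, 8.9, 4.8, 7.1, 4.7, 5.7, 6.3, 7.0, 5.5)
predicted <- c(7.461131, 7.232186, 8.442790, 5.664819, 6.536470,
               5.317754, 4.469453, 6.066612, 6.577969, 7.060438)

err  <- actual - predicted
rmse <- sqrt(mean(err^2))   # penalises large misses more heavily
mae  <- mean(abs(err))      # average size of a miss

print(c(rmse = rmse, mae = mae))
```

On the full data the same calculation could be run on the two columns of backtrack.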

9. Conclusion:

We conclude by saying the following

• The "Satisfaction" ratings in mydata depend most strongly on the customer's overall purchasing
experience, which has the highest factor loading and beta: faster delivery and the complaint resolution
mechanism are most important

• Company brand (beta 0.510) and product superiority (beta 0.540) come next with coefficients of similar
size, so the product itself is, perhaps surprisingly, not the leading driver of customer satisfaction

Disclaimer: There can be differences between the real operating world and these statistical models, but this
model comes closest to explaining the data provided

Annexure : R Code
### loading appropriate libraries

toload_libraries <- c("scatterplot3d","e1071","reshape2","rpsychi", "car", "psych", "corrplot", "forecast",
"GPArotation", "psy", "MVN", "DataExplorer", "ppcor", "Metrics", "foreign", "MASS", "lattice", "nortest",
"Hmisc","factoextra", "nFactors")
new.packages <- toload_libraries[!(toload_libraries %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(toload_libraries, require, character.only= TRUE)

### setting the working directory and loading the data

setwd("C:/Users/ashishj/Desktop/great lakes/R/datasets")
mydata=read.csv("Factor-Hair-Revised.csv")
mydata

## Exploratory data analysis

dim(mydata)

# checking for NA values

sapply(mydata,function(x) sum(is.na(x)))

## data summary
str(mydata)
summary(mydata)

#histogram of data
#since ID is of no use, removing it from the dataset

mydata1=mydata[,-1]

plot_histogram(mydata1)
## skewness - density plots & skew data reveal that Delivery Speed and Tech Support are left
## skewed while Sales Force Image is right skewed; Warranty Claims, Ecommerce and
## Complaint Resolution most resemble a normal distribution

plot_density(mydata1)

skewdata=skew(mydata1)
list=names(mydata1)

skewlist=data.frame(list,skewdata)

skewlist

## correlation test

cor1=cor(mydata1)
corrplot(cor1,type = "upper",method = "number")
print(cor1,digits=2)

cormatrix2=cor1
cormatrix2[lower.tri(cormatrix2,diag=TRUE)]=NA
cormatrix2=as.data.frame(as.table(cormatrix2))
cormatrix2=na.omit(cormatrix2)
cormatrix2=cormatrix2[order(-abs(cormatrix2$Freq)),]
cormatrix2
## multicollinearity
attach(mydata1)
model1=lm(Satisfaction~.,data = mydata1)
summary(model1)

multcol=as.data.frame(vif(model1))
multcol

## VIF values greater than 2 indicate multicollinearity

## VIF for complaint resolution & delivery speed shows high multicollinearity

# to handle multicollinearity we are using PCA & FA

## simple linear regression model for each independent variable
## (abline is called after plot so the fitted line is drawn on the finished scatter plot)

lm.ProdQual = lm(Satisfaction ~ ProdQual, mydata1)
lm.ProdQual
plot(Satisfaction ~ ProdQual, data = mydata1, col="red")
abline(lm.ProdQual,col="blue")

lm.Ecom = lm(Satisfaction ~ Ecom, mydata1)
lm.Ecom
plot(Satisfaction ~ Ecom, data = mydata1, col="red")
abline(lm.Ecom,col="blue")

lm.TechSup = lm(Satisfaction ~ TechSup, mydata1)
lm.TechSup
plot(Satisfaction ~ TechSup, data = mydata1, col="red")
abline(lm.TechSup,col="blue")

lm.CompRes = lm(Satisfaction ~ CompRes, mydata1)
lm.CompRes
plot(Satisfaction ~ CompRes, data = mydata1, col="red")
abline(lm.CompRes,col="blue")

lm.Advertising = lm(Satisfaction ~ Advertising, mydata1)
lm.Advertising
plot(Satisfaction ~ Advertising, data = mydata1, col="red")
abline(lm.Advertising,col="blue")

lm.ProdLine = lm(Satisfaction ~ ProdLine, mydata1)
lm.ProdLine
plot(Satisfaction ~ ProdLine, data = mydata1, col="red")
abline(lm.ProdLine,col="blue")

lm.SalesFImage = lm(Satisfaction ~ SalesFImage, mydata1)
lm.SalesFImage
plot(Satisfaction ~ SalesFImage, data = mydata1, col="red")
abline(lm.SalesFImage,col="blue")

lm.ComPricing = lm(Satisfaction ~ ComPricing, mydata1)
lm.ComPricing
plot(Satisfaction ~ ComPricing, data = mydata1, col="red")
abline(lm.ComPricing,col="blue")

lm.WartyClaim = lm(Satisfaction ~ WartyClaim, mydata1)
lm.WartyClaim
plot(Satisfaction ~ WartyClaim, data = mydata1, col="red")
abline(lm.WartyClaim,col="blue")

lm.OrdBilling = lm(Satisfaction ~ OrdBilling, mydata1)
lm.OrdBilling
plot(Satisfaction ~ OrdBilling, data = mydata1, col="red")
abline(lm.OrdBilling,col="blue")

lm.DelSpeed = lm(Satisfaction ~ DelSpeed, mydata1)
lm.DelSpeed
plot(Satisfaction ~ DelSpeed, data = mydata1, col="red")
abline(lm.DelSpeed,col="blue")

# factor analysis

mydata2=mydata[,2:12] ## predictors only (also defined in the PCA section below)
cor2=cor(mydata2)
KMO(cor2)

factor1=fa(r=mydata2,nfactors = 4,rotate = "none",fm="pa")
factor1
fa.diagram(factor1,main="FA without rotation")
factor1$loadings

factor2=fa(r=mydata2,nfactors = 4,rotate = "varimax",fm="pa")


factor2
factor2$loadings
fa.diagram(factor2,main="FA with rotation")

scores1=factor2$scores
head(scores1)
colnames(scores1)=c("Experience","compbrand","aftersaleservice","productsuperioity")
print(scores1,digits=2)
sat=as.data.frame(mydata[,13])
colnames(sat)=c("satisfaction")
mydatanew1=cbind(scores1,sat) ## Factor Analysis
mydatanew1

mode
## Principal Component Analysis

mydata2=mydata[,2:12]
cor2=cor(mydata2)
test1=cortest.bartlett(cor2,nrow(mydata2))
test1$p.value

## Since the p value of the test is far below the significance level alpha = 0.001, we reject the null
## hypothesis Ho (that the correlation matrix is an identity matrix, i.e. there is no correlation
## amongst the predictor variables), so PCA can be conducted

## deciding on the number of factors to be considered using the scree plot and eigen values

ev=eigen(cor2)

ev$values ## eigen values < 1 are not significant, so 4 factors to be considered

factor=c(1:11)
scree=data.frame(factor,ev$values)
plot(scree,main="scree plot",col="red")
lines(scree,col="blue")
abline(h=1,col="violet")

## the elbow appears at 5 factors but the eigen values suggest 4 factors, so we will consider
## 4 factors for PCA

unrotate=principal(mydata2,nfactors = 4,rotate = "none")


print(unrotate,digits = 4)

rotate=principal(mydata2,nfactors = 4,rotate = "varimax")


print(rotate,digits = 4)

print(rotate$loadings[,1],cutoff=0.7)

scores=as.data.frame(rotate$scores)
print(scores,digits = 2)
class(scores)

colnames(scores)=c("Experience","compbrand","aftersaleservice","productsuperioity")
print(scores,digits=2)
sat=as.data.frame(mydata[,13])
colnames(sat)=c("satisfaction")
mydatanew=cbind(scores,sat) ## rotated PCA component scores plus the response
mydatanew

attach(mydatanew)

modelnew=lm(satisfaction~.,data=mydatanew)
summary(modelnew)
vif(modelnew)

predictsat=as.data.frame(predict(modelnew))

backtrack=data.frame(mydata$Satisfaction,predictsat)
backtrack
plot(mydata$Satisfaction,col="blue",xlab="data points")
lines(mydata$Satisfaction,col="blue")
lines(predictsat,col="red")
text(28, 9.9, "Actual value", col = "blue")
text(14.5, 9, "Predicted value", col ="red")
