By Divyansh Srivastava
1. Project Objective
The objective of this report is to explore the Factor Hair data set (“Factor-Hair-Revised.csv”) in R and generate insights about it. The exploration covers the analyses described in the sections below.
1.0.1 Importing the dataset in R
The following packages were used in the analysis and representation of the dataset.
‘corrplot’ – Used to plot a graph of the correlation matrix. To visualize a general (non-correlation) matrix, use is.corr=FALSE. Available visualization methods include “circle”, “color”, “number”, etc.
‘ppcor’ - The R package ppcor provides four functions: pcor(), pcor.test(), spcor(), and spcor.test(). The function pcor() (respectively spcor()) calculates the partial (semi-partial) correlations of all pairs of variables in a matrix or data frame, and returns matrices of statistics and p-values for each pairwise partial (semi-partial) correlation.
‘tidyverse’ - The "tidyverse" collects some of the most versatile R packages: ggplot2, dplyr, tidyr, readr,
purrr, and tibble. The packages work in harmony to clean, process, model, and visualize data.
‘ggplot2’ - The ggplot2 package, created by Hadley Wickham, offers a powerful graphics language for
creating elegant and complex plots.
‘psych’ - The psych package has been developed at Northwestern University since 2005 to include
functions most useful for personality, psychometric, and psychological research. The package is also
meant to supplement a text on psychometric theory.
‘car’ - Provides, among other things, functions that calculate type-II or type-III analysis-of-variance tables for model objects produced by lm, glm, multinom (in the nnet package), polr (in the MASS package), coxph (in the survival package), coxme (in the coxme package), svyglm (in the survey package), rlm (in the MASS package), lmer (in the lme4 package), lme (in the nlme package), and (by the default method) most models with a linear predictor and asymptotically normal coefficients.
‘nFactors’ - Indices, heuristics and strategies to help determine the number of factors/components to
retain.
Setting a working directory at the start of an R session makes importing and exporting data and code files easier. To set the working directory, we use the command ‘setwd()’; to fetch the path of the current working directory, we use ‘getwd()’.
The given dataset is in .csv format, so the command ‘read.csv’ is used to import the file.
str – After applying this command, we infer that all variables are numeric, except ‘ID’, which is an integer.
dim - The dataset has 100 rows and 13 columns. Please refer to Appendix 6 for the source code.
Since the first column (ID) is of no use to us, we discard it and create a new dataset called ‘newdata’.
For plotting a histogram of all the independent variables, we use the following function.
‘par’ – we use this function to divide the plotting space into a 3x4 grid of 12 panels so that all the histograms can be displayed at the same time.
For bivariate analysis, we use the following construct.
‘for’ – for loops are among the basic control-flow constructs of the R language. They function in much the same way as control statements in any Algol-like language.
1.2 EDA - Check for Outliers and missing values and check the summary
of the dataset
For finding the missing values in our dataset ‘newdata’, we use the function ‘sum’ along with ‘is.na’.
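As a minimal sketch of this check (a toy data frame, not the Hair dataset), the is.na()/sum() combination counts missing cells because sum() treats each TRUE as 1:

```r
# is.na() returns a logical matrix; sum() counts the TRUEs (the missing cells)
d <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))
sum(is.na(d))   # 1 missing value
```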
For finding the outliers in our dataset, we use the boxplot function. The resulting graph is displayed
below.
As we can see, there are 4 outliers in E-Commerce, 2 in Salesforce Image, 3 in Order & Billing, and 1 in Delivery Speed.
2 Check for Multicollinearity - Plot the graph based on Multicollinearity
Before checking the correlations in the data, we create a new dataset, ‘newdata2’, consisting of only the independent variables.
Now, we use the function ‘corrplot’ to plot the correlations between these independent variables.
Please note that blue shows positive correlation and red shows negative correlation; therefore, dark blue shows the strongest positive correlation and dark red the strongest negative correlation.
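The sign convention can be checked with base R's cor() on toy vectors (illustrative simulated values, not from the Hair data):

```r
# cor() returns values in [-1, 1]; corrplot colours positives blue, negatives red
set.seed(5)
x <- 1:20
y <- x + rnorm(20, sd = 2)    # positively related to x
z <- -x + rnorm(20, sd = 2)   # negatively related to x
cor(x, y) > 0   # TRUE
cor(x, z) < 0   # TRUE
```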
Order Billing and Complaint Resolution are highly correlated
Delivery Speed and Complaint Resolution are highly correlated
E-Commerce and Salesforce Image are highly correlated
Technical Support and Warranty Claim are highly correlated
We also use the Variance Inflation Factor (VIF) method to check for multicollinearity.
The VIF for a coefficient tells us by what factor its variance (i.e. the standard error squared) is inflated by collinearity with the other predictors. A rule of thumb for interpreting the variance inflation factor is
1 – Not correlated
Between 1 and 5 – Moderately Correlated
Greater than 5 – Highly Correlated
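The definition behind this rule of thumb can be sketched in base R: regress each predictor on all the others and take VIF = 1 / (1 - R²). The toy predictors x1..x3 below are assumptions for illustration only, with x2 made deliberately collinear with x1:

```r
# Manual VIF: 1 / (1 - R^2) from regressing each predictor on the rest
set.seed(42)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.3)   # deliberately collinear with x1
x3 <- rnorm(100)
X  <- data.frame(x1, x2, x3)
vif_manual <- sapply(names(X), function(v) {
  r2 <- summary(lm(reformulate(setdiff(names(X), v), v), data = X))$r.squared
  1 / (1 - r2)
})
round(vif_manual, 2)   # x1 and x2 come out inflated, x3 near 1
```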
From the values we get, we can infer that Delivery Speed is a cause for concern.
For simple linear regression, we use the function ‘lm’ and regress the dependent variable ‘Satisfaction’ on each independent variable in turn.
4.1 Perform PCA/FA and Interpret the Eigen Values (apply Kaiser
Normalization Rule)
Before running PCA/FA on our dataset, we first run the Kaiser-Meyer-Olkin (KMO) factor adequacy test to check whether factor analysis is a suitable method here.
We run this test on the correlation matrix created earlier, which we named ‘CorMat’.
Since the overall MSA for the data is greater than 0.5, we can run factor analysis on our dataset.
First, we compute the eigenvalues for the dataset of independent variables, ‘newdata2’.
We then draw a Scree plot for the same to understand our data better.
As per the Kaiser normalization rule, we retain only factors with eigenvalues greater than one when deciding the number of factors to which these 11 variables will be reduced.
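The rule can be sketched with base R's eigen() on the correlation matrix of toy data (simulated here purely for illustration):

```r
# Kaiser rule: retain components whose correlation-matrix eigenvalue exceeds 1
set.seed(7)
Z <- matrix(rnorm(200 * 6), ncol = 6)
Z[, 2] <- Z[, 1] + rnorm(200, sd = 0.5)   # give two columns shared variance
ev <- eigen(cor(Z))$values                # eigenvalues, largest first
n_keep <- sum(ev > 1)                     # number of factors to retain
```

The eigenvalues of a correlation matrix always sum to the number of variables, so values above 1 mark components carrying more than one variable's worth of variance.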
Next, we run factor analysis on our data using the principal axis method, with the number of factors set to four.
After running the factor analysis method, we graphically represent the factor loadings, as mentioned
below.
In bringing 11 variables down to 4 factors, we lose around 31% of the variance. Only the first four factors have eigenvalues greater than 1.
Factor 1 = 29.2% of the variance;
Factor 2 = 20.2% of the variance;
Factor 3 = 13.60% of the variance;
Factor 4 = 6.2% of the variance.
After rotating the data, we get the below mentioned loading graph.
The red dotted lines mean that the loadings are negative; the affected variable falls only marginally under PA4.
4.2 Output Interpretation Tell why only 4 factors are being asked in the
questions and tell whether it is correct in choosing 4 factors. Name
the factors with correct explanations
As explained in the previous answer, we brought the number of factors down to four as per the Kaiser normalization rule.
We now take the newly created independent variables PA1, PA2, PA3 and PA4 and combine them with the dependent variable, Customer Satisfaction.
After creating the new dataset ‘Zdata’, we create the ‘Test’ and ‘Train’ datasets out of it to test the
model.
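A base-R sketch of a 70/30 split by row index (the 70% proportion mirrors Appendix 24; the variable names here are assumptions for illustration):

```r
# Sample 70% of the row numbers for training; the remainder form the test set
set.seed(100)
n <- 100
idx <- sample(1:n, 0.7 * n)
train_rows <- idx
test_rows  <- setdiff(1:n, idx)
c(length(train_rows), length(test_rows))   # 70 and 30
```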
After running the multiple linear regression analysis on the ‘Train’ dataset, we can see that
‘Product_Purchase’, ‘Marketing’, and ‘Positioning’ are highly significant, as can be noted from the
three stars, which have been highlighted in the Appendix as well.
Therefore, we will now run multiple linear regression for Customer Satisfaction with respect to these
three factors.
5.3 MLR summary interpretation and significance (R, R2, Adjusted R2,
Degrees of Freedom, f-statistic, coefficients along with p-values)
R (Residual Standard Error) - The residual standard error is a measure of the quality of a linear regression fit. Theoretically, every linear model is assumed to contain an error term E. Because of this error term, we cannot perfectly predict our response variable (Customer Satisfaction) from the predictors (independent variables). The residual standard error in this case is 0.6683.
R2 (Multiple R-squared) - The R-squared (R2) statistic measures how well the model fits the actual data. It takes the form of a proportion of variance: R2 measures the strength of the linear relationship between our predictor variables and our response variable (Customer Satisfaction). It always lies between 0 and 1 (a number near 0 indicates a regression that explains little of the variance in the response variable, while a number close to 1 indicates one that explains most of the observed variance). In our example, the R2 we get is 0.6951: roughly 69% of the variance in the response variable (Customer Satisfaction) can be explained by the predictor variables (Product_Purchase, Marketing and Positioning).
Adjusted R2 - In multiple regression settings, R2 always increases as more variables are included in the model. That is why the adjusted R2 is the preferred measure: it adjusts for the number of variables considered. The adjusted R2 in this case is 0.6856.
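The adjustment can be verified by hand from the reported numbers using adj R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), with n = 100 observations and p = 3 predictors:

```r
# Recompute adjusted R-squared from the reported multiple R-squared
r2 <- 0.6951; n <- 100; p <- 3
adj <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj, 4)   # 0.6856, matching the reported value
```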
Degrees of Freedom - The residual standard error was calculated with 96 degrees of freedom. Simplistically, the degrees of freedom are the number of data points that went into estimating the parameters, minus the number of parameters estimated: we had 100 data points and estimated 4 coefficients (the intercept plus three slopes), leaving 96.
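This arithmetic (100 observations minus 4 estimated coefficients) can be confirmed on a toy model of the same shape; the data below are simulated, not the Hair data:

```r
# df.residual = n - (number of coefficients) = 100 - 4 = 96
set.seed(3)
d <- data.frame(y = rnorm(100), a = rnorm(100), b = rnorm(100), c = rnorm(100))
fit <- lm(y ~ a + b + c, data = d)
fit$df.residual   # 96
```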
F-statistic - F-statistic is a good indicator of whether there is a relationship between our predictor and
the response variables. The further the F-statistic is from 1 the better it is. However, how much larger
the F-statistic needs to be depends on both the number of data points and the number of predictors.
The F-statistic for this analysis is 72.96.
Coefficients along with p-values - A small p-value indicates that it is unlikely we would observe a relationship between the predictors (Product_Purchase, Marketing and Positioning) and the response (Customer Satisfaction) purely by chance. Typically, a p-value of 5% or less is a good cut-off point. In our model, the p-values are very close to zero. Note the ‘Signif. codes’ associated with each estimate: three stars (asterisks) represent a highly significant p-value.
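Where those p-values live in the lm output can be sketched on a toy fit (simulated data; the column name is R's standard label in the coefficient table):

```r
# summary(lm(...))$coefficients holds estimates, SEs, t values, and p-values
set.seed(9)
d <- data.frame(y = rnorm(50), x = rnorm(50))
ctab <- summary(lm(y ~ x, data = d))$coefficients
ctab[, "Pr(>|t|)"]   # p-values for the intercept and the slope
```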
The output has been well interpreted in 5.3. Along with that, the following points were covered in this
project.
Multiple linear regression with Customer Satisfaction as the dependent variable and the other factors as independent variables
Appendix
##Appendix 1
setwd("D:/learning/BABI Online")
getwd()
##Appendix2
mydata=read.csv("Factor-Hair-Revised.csv")
##Appendix3
str(mydata)
##Appendix4
library(corrplot)
library(tidyverse)
## -- Conflicts ------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
library(psych)
##
## Attaching package: 'psych'
library(car)
##
## Attaching package: 'car'
library(cartools)
##
## Attaching package: 'cartools'
library(ppcor)
##
## Attaching package: 'MASS'
library(nFactors)
##
## Attaching package: 'boot'
##
## Attaching package: 'lattice'
##
## Attaching package: 'nFactors'
##Appendix5
summary(mydata)
## Min. :4.700
## 1st Qu.:6.000
## Median :7.050
## Mean :6.918
## 3rd Qu.:7.625
## Max. :9.900
##Appendix6
dim(mydata)
## [1] 100 13
##Appendix7
names(mydata)
##Appendix8
attach(mydata)
##Appendix9
newdata=mydata[c(2:13)]
##Appendix10
names=c("Product Quality","E-Commerce","Technical Support","Complaint Resolution","Advertising","Product Line","Salesforce Image","Competitive Pricing","Warranty & Claims","Order & Billing","Delivery Speed","Customer Satisfaction")
##Appendix11
# Histogram of independent variables
par(mfrow = c(3,4))
for (i in (1:11)) {
  h = round(max(newdata[,i]),0)+1
  l = round(min(newdata[,i]),0)-1
  n = names[i]
  # hist() call restored (the listing was truncated at a page break)
  hist(newdata[,i], breaks = seq(l, h, 1), main = NULL, xlab = n, col = "grey")
}
##Appendix12
# Bivariate Analysis ####
par(mfrow = c(4,3))
for (i in c(1:11)) {
  plot(newdata[,i], `Satisfaction`,
       xlab = names[i], ylab = NULL, col = "red",
       cex.lab = 1, cex.axis = 1, cex.main = 1, cex.sub = 1,
       xlim = c(0,10), ylim = c(0,10))
  abline(lm(formula = `Satisfaction` ~ newdata[,i]), col = "blue")
}
##Appendix13
#Clear all plots
#Finding the missing values in the data
sum(is.na(newdata))
## [1] 0
##Appendix14
#Checking for outliers
boxplot(newdata)
##Appendix15
#Checking correlation
newdata2=newdata[1:11]
CorMat=cor(newdata2)   # keep the correlation matrix itself; KMO() needs it later
corrplot(CorMat)
CorMat
## TechSup 0.19262546 0.01699054 -0.27078668 0.79716793 0.08010182
## CompRes 0.56141695 0.22975176 -0.12795425 0.14040830 0.75686859
## Advertising -0.01155082 0.54220366 0.13421689 0.01079207 0.18423559
## ProdLine 1.00000000 -0.06131553 -0.49494840 0.27307753 0.42440825
## SalesFImage -0.06131553 1.00000000 0.26459655 0.10745534 0.19512741
## ComPricing -0.49494840 0.26459655 1.00000000 -0.24498605 -0.11456703
## WartyClaim 0.27307753 0.10745534 -0.24498605 1.00000000 0.19706512
## OrdBilling 0.42440825 0.19512741 -0.11456703 0.19706512 1.00000000
## DelSpeed 0.60185021 0.27155126 -0.07287173 0.10939460 0.75100307
## DelSpeed
## ProdQual 0.02771800
## Ecom 0.19163607
## TechSup 0.02544069
## CompRes 0.86509170
## Advertising 0.27586308
## ProdLine 0.60185021
## SalesFImage 0.27155126
## ComPricing -0.07287173
## WartyClaim 0.10939460
## OrdBilling 0.75100307
## DelSpeed 1.00000000
##Appendix16
#Check for multicollinearity in independent variables using VIF
vif(lm(`Satisfaction`~.,newdata2))
##Appendix17
#Simple Linear Regression with every variable
SLM1=lm(Satisfaction~ProdQual)
summary(SLM1)
##
## Call:
## lm(formula = Satisfaction ~ ProdQual)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.88746 -0.72711 -0.01577 0.85641 2.25220
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.67593 0.59765 6.151 1.68e-08 ***
## ProdQual 0.41512 0.07534 5.510 2.90e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.047 on 98 degrees of freedom
## Multiple R-squared: 0.2365, Adjusted R-squared: 0.2287
## F-statistic: 30.36 on 1 and 98 DF, p-value: 2.901e-07
SLM2=lm(Satisfaction~Ecom)
summary(SLM2)
##
## Call:
## lm(formula = Satisfaction ~ Ecom)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.37200 -0.78971 0.04959 0.68085 2.34580
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.1516 0.6161 8.361 4.28e-13 ***
## Ecom 0.4811 0.1649 2.918 0.00437 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.149 on 98 degrees of freedom
## Multiple R-squared: 0.07994, Adjusted R-squared: 0.07056
## F-statistic: 8.515 on 1 and 98 DF, p-value: 0.004368
SLM3=lm(Satisfaction~TechSup)
summary(SLM3)
##
## Call:
## lm(formula = Satisfaction ~ TechSup)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.26136 -0.93297 0.04302 0.82501 2.85617
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.44757 0.43592 14.791 <2e-16 ***
## TechSup 0.08768 0.07817 1.122 0.265
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.19 on 98 degrees of freedom
## Multiple R-squared: 0.01268, Adjusted R-squared: 0.002603
## F-statistic: 1.258 on 1 and 98 DF, p-value: 0.2647
SLM4=lm(Satisfaction~CompRes)
summary(SLM4)
##
## Call:
## lm(formula = Satisfaction ~ CompRes)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.40450 -0.66164 0.04499 0.63037 2.70949
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.68005 0.44285 8.310 5.51e-13 ***
## CompRes 0.59499 0.07946 7.488 3.09e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9554 on 98 degrees of freedom
## Multiple R-squared: 0.3639, Adjusted R-squared: 0.3574
## F-statistic: 56.07 on 1 and 98 DF, p-value: 3.085e-11
SLM5=lm(Satisfaction~Advertising)
summary(SLM5)
##
## Call:
## lm(formula = Satisfaction ~ Advertising)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.34033 -0.92755 0.05577 0.79773 2.53412
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.6259 0.4237 13.279 < 2e-16 ***
## Advertising 0.3222 0.1018 3.167 0.00206 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.141 on 98 degrees of freedom
## Multiple R-squared: 0.09282, Adjusted R-squared: 0.08357
## F-statistic: 10.03 on 1 and 98 DF, p-value: 0.002056
SLM6=lm(Satisfaction~ProdLine)
summary(SLM6)
##
## Call:
## lm(formula = Satisfaction ~ ProdLine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3634 -0.7795 0.1097 0.7604 1.7373
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.02203 0.45471 8.845 3.87e-14 ***
## ProdLine 0.49887 0.07641 6.529 2.95e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1 on 98 degrees of freedom
## Multiple R-squared: 0.3031, Adjusted R-squared: 0.296
## F-statistic: 42.62 on 1 and 98 DF, p-value: 2.953e-09
SLM7=lm(Satisfaction~SalesFImage)
summary(SLM7)
##
## Call:
## lm(formula = Satisfaction ~ SalesFImage)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2164 -0.5884 0.1838 0.6922 2.0728
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.06983 0.50874 8.000 2.54e-12 ***
## SalesFImage 0.55596 0.09722 5.719 1.16e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.037 on 98 degrees of freedom
## Multiple R-squared: 0.2502, Adjusted R-squared: 0.2426
## F-statistic: 32.7 on 1 and 98 DF, p-value: 1.164e-07
SLM8=lm(Satisfaction~ComPricing)
summary(SLM8)
##
## Call:
## lm(formula = Satisfaction ~ ComPricing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9728 -0.9915 -0.1156 0.9111 2.5845
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.03856 0.54427 14.769 <2e-16 ***
## ComPricing -0.16068 0.07621 -2.108 0.0376 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.172 on 98 degrees of freedom
## Multiple R-squared: 0.04339, Adjusted R-squared: 0.03363
## F-statistic: 4.445 on 1 and 98 DF, p-value: 0.03756
SLM9=lm(Satisfaction~WartyClaim)
summary(SLM9)
##
## Call:
## lm(formula = Satisfaction ~ WartyClaim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.36504 -0.90202 0.03019 0.90763 2.88985
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.3581 0.8813 6.079 2.32e-08 ***
## WartyClaim 0.2581 0.1445 1.786 0.0772 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.179 on 98 degrees of freedom
## Multiple R-squared: 0.03152, Adjusted R-squared: 0.02164
## F-statistic: 3.19 on 1 and 98 DF, p-value: 0.0772
SLM10=lm(Satisfaction~OrdBilling)
summary(SLM10)
##
## Call:
## lm(formula = Satisfaction ~ OrdBilling)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4005 -0.7071 -0.0344 0.7340 2.9673
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0541 0.4840 8.377 3.96e-13 ***
## OrdBilling 0.6695 0.1106 6.054 2.60e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.022 on 98 degrees of freedom
## Multiple R-squared: 0.2722, Adjusted R-squared: 0.2648
## F-statistic: 36.65 on 1 and 98 DF, p-value: 2.602e-08
SLM11=lm(Satisfaction~DelSpeed)
summary(SLM11)
##
## Call:
## lm(formula = Satisfaction ~ DelSpeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.22475 -0.54846 0.08796 0.54462 2.59432
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.2791 0.5294 6.194 1.38e-08 ***
## DelSpeed 0.9364 0.1339 6.994 3.30e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9783 on 98 degrees of freedom
## Multiple R-squared: 0.333, Adjusted R-squared: 0.3262
## F-statistic: 48.92 on 1 and 98 DF, p-value: 3.3e-10
##Appendix18
#Factor Analysis
#Kaiser Test
KMO(CorMat)
##Appendix19
#Since MSA > 0.5 we can run factor analysis on this data
#Eigen value computation
ev=eigen(cor(newdata2))
print(ev,digits=5)
## eigen() decomposition
## $values
## [1] 3.426971 2.550897 1.690976 1.086556 0.609424 0.551884 0.401518
## [8] 0.246952 0.203553 0.132842 0.098427
##
## $vectors
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] -0.13379 0.313498 0.062272 0.64314 0.231666 0.564570 -0.1916413
## [2,] -0.16595 -0.446509 -0.235248 0.27238 0.422288 -0.263257 -0.0596262
## [3,] -0.15769 0.230967 -0.610951 -0.19339 -0.023957 0.108769 0.0171999
## [4,] -0.47068 -0.019444 0.210351 -0.20632 0.028657 0.028152 0.0084996
## [5,] -0.18373 -0.363665 -0.088097 0.31789 -0.803870 0.200569 0.0630696
## [6,] -0.38677 0.284781 0.116279 0.20290 0.116674 -0.098195 0.6081476
## [7,] -0.20367 -0.470696 -0.241342 0.22218 0.204373 -0.104972 -0.0014374
## [8,] 0.15169 -0.413457 0.053045 -0.33354 0.248926 0.709736 0.3082489
## [9,] -0.21293 0.191672 -0.598564 -0.18530 -0.032927 0.139840 0.0306402
## [10,] -0.43722 -0.026399 0.168930 -0.23685 0.026754 0.119480 -0.6593199
## [11,] -0.47309 -0.073052 0.232625 -0.19733 -0.035433 -0.029800 0.2342393
## [,8] [,9] [,10] [,11]
## [1,] 0.135473 0.031328 -0.066597 -0.182792
## [2,] -0.122026 -0.542511 -0.281558 -0.062339
## [3,] 0.464710 -0.359300 0.388171 0.051930
## [4,] 0.513398 0.093248 -0.534672 0.362534
## [5,] -0.053477 -0.154682 -0.037158 0.081187
## [6,] -0.333207 -0.084155 0.234798 0.385078
## [7,] 0.169107 0.644899 0.353412 0.084699
## [8,] -0.098832 -0.094144 0.045182 0.102958
## [9,] -0.443540 0.317566 -0.435348 -0.128932
## [10,] -0.366018 -0.099073 0.303865 0.194151
## [11,] 0.065391 -0.021885 0.120104 -0.775632
EigenValue=ev$values
EigenValue
Factor=c(1,2,3,4,5,6,7,8,9,10,11)
Scree=data.frame(Factor,EigenValue)
plot(Scree,main="Scree Plot", col="Blue",ylim=c(0,5))
lines(Scree,col="Blue")
##Appendix20
#we will take 4 values because any value less than 1 is not of value as per Kaiser
Unrotate=fa(newdata2,nfactors=4,rotate="none",fm="pa")
print(Unrotate,digits=4)
## Cumulative Proportion 0.4222 0.7142 0.9110 1.0000
##
## Mean item complexity = 1.9
## Test of the hypothesis that 4 factors are sufficient.
##
## The degrees of freedom for the null model are 55 and the objective function was 6.5531 with Chi Square of 619.2726
## The degrees of freedom for the model are 17 and the objective function was 0.3297
##
## The root mean square of the residuals (RMSR) is 0.017
## The df corrected root mean square of the residuals is 0.0306
##
## The harmonic number of observations is 100 with the empirical chi square 3.1886 with prob < 0.9999
## The total number of observations was 100 with Likelihood Chi Square = 30.2733 with prob < 0.02444
##
## Tucker Lewis Index of factoring reliability = 0.92146
## RMSEA index = 0.09639 and the 90 % confidence intervals are 0.03169 0.13934
## BIC = -48.0146
## Fit based upon off diagonal values = 0.9974
## Measures of factor score adequacy
## PA1 PA2 PA3
## Correlation of (regression) scores with factors 0.9806 0.9738 0.9528
## Multiple R square of scores with factors 0.9616 0.9483 0.9078
## Minimum correlation of possible factor scores 0.9232 0.8966 0.8155
## PA4
## Correlation of (regression) scores with factors 0.8825
## Multiple R square of scores with factors 0.7789
## Minimum correlation of possible factor scores 0.5577
fa.diagram(Unrotate)
Unrotate$loadings
##
## Loadings:
## PA1 PA2 PA3 PA4
## ProdQual 0.201 -0.408 0.463
## Ecom 0.290 0.659 0.270 0.216
## TechSup 0.278 -0.381 0.738 -0.166
## CompRes 0.862 -0.255 -0.184
## Advertising 0.286 0.457 0.129
## ProdLine 0.689 -0.453 -0.142 0.315
## SalesFImage 0.395 0.801 0.346 0.251
## ComPricing -0.232 0.553 -0.286
## WartyClaim 0.379 -0.324 0.735 -0.153
## OrdBilling 0.747 -0.175 -0.181
## DelSpeed 0.895 -0.303 -0.198
##
## PA1 PA2 PA3 PA4
## SS loadings 3.215 2.223 1.499 0.678
## Proportion Var 0.292 0.202 0.136 0.062
## Cumulative Var 0.292 0.494 0.631 0.692
##Appendix21
Rotate=fa(newdata2,nfactors=4,rotate="varimax",fm="pa")
print(Rotate,digits=4)
## Factor Analysis using method = pa
## Call: fa(r = newdata2, nfactors = 4, rotate = "varimax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 PA3 PA4 h2 u2 com
## ProdQual 0.0240 -0.0700 0.0157 0.6470 0.4243 0.57570 1.027
## Ecom 0.0676 0.7874 0.0279 -0.1132 0.6382 0.36183 1.059
## TechSup 0.0198 -0.0252 0.8832 0.1164 0.7946 0.20539 1.037
## CompRes 0.8977 0.1295 0.0535 0.1317 0.8428 0.15719 1.093
## Advertising 0.1662 0.5300 -0.0429 -0.0624 0.3142 0.68579 1.239
## ProdLine 0.5255 -0.0353 0.1273 0.7118 0.8003 0.19971 1.922
## SalesFImage 0.1154 0.9715 0.0635 -0.1345 0.9792 0.02076 1.076
## ComPricing -0.0757 0.2129 -0.2089 -0.5904 0.4433 0.55673 1.566
## WartyClaim 0.1026 0.0566 0.8851 0.1280 0.8135 0.18647 1.078
## OrdBilling 0.7682 0.1267 0.0882 0.0887 0.6218 0.37818 1.109
## DelSpeed 0.9487 0.1852 -0.0049 0.0874 0.9420 0.05796 1.094
##
## PA1 PA2 PA3 PA4
## SS loadings 2.6349 1.9671 1.6409 1.3714
## Proportion Var 0.2395 0.1788 0.1492 0.1247
## Cumulative Var 0.2395 0.4184 0.5675 0.6922
## Proportion Explained 0.3460 0.2583 0.2155 0.1801
## Cumulative Proportion 0.3460 0.6044 0.8199 1.0000
##
## Mean item complexity = 1.2
## Test of the hypothesis that 4 factors are sufficient.
##
## The degrees of freedom for the null model are 55 and the objective function was 6.5531 with Chi Square of 619.2726
## The degrees of freedom for the model are 17 and the objective function was 0.3297
##
## The root mean square of the residuals (RMSR) is 0.017
## The df corrected root mean square of the residuals is 0.0306
##
## The harmonic number of observations is 100 with the empirical chi square 3.1886 with prob < 0.9999
## The total number of observations was 100 with Likelihood Chi Square = 30.2733 with prob < 0.02444
##
## Tucker Lewis Index of factoring reliability = 0.92146
## RMSEA index = 0.09639 and the 90 % confidence intervals are 0.03169 0.13934
## BIC = -48.0146
## Fit based upon off diagonal values = 0.9974
## Measures of factor score adequacy
## PA1 PA2 PA3
## Correlation of (regression) scores with factors 0.9819 0.9861 0.9396
## Multiple R square of scores with factors 0.9641 0.9724 0.8828
## Minimum correlation of possible factor scores 0.9281 0.9448 0.7657
## PA4
## Correlation of (regression) scores with factors 0.8816
## Multiple R square of scores with factors 0.7772
## Minimum correlation of possible factor scores 0.5545
fa.diagram(Rotate)
Rotate$loadings
##
## Loadings:
## PA1 PA2 PA3 PA4
## ProdQual 0.647
## Ecom 0.787 -0.113
## TechSup 0.883 0.116
## CompRes 0.898 0.130 0.132
## Advertising 0.166 0.530
## ProdLine 0.525 0.127 0.712
## SalesFImage 0.115 0.971 -0.135
## ComPricing 0.213 -0.209 -0.590
## WartyClaim 0.103 0.885 0.128
## OrdBilling 0.768 0.127
## DelSpeed 0.949 0.185
##
## PA1 PA2 PA3 PA4
## SS loadings 2.635 1.967 1.641 1.371
## Proportion Var 0.240 0.179 0.149 0.125
## Cumulative Var 0.240 0.418 0.568 0.692
##Appendix22
#Data for all rows
head(Rotate$scores)
Zdata=cbind(newdata[12],Rotate$scores)
##Appendix23
#Naming the new columns
names(Zdata)=c("Satisfaction","Product_Purchase","Marketing","After_Sales","Positioning")
head(Zdata)
##Appendix24
set.seed(100)
sample=sample(1:nrow(Zdata),0.7*nrow(Zdata))
# subset(Zdata, sample=T) ignores 'sample' and returns every row, so the
# original call did not actually split the data; index by row numbers instead
Train=Zdata[sample,]
Test=Zdata[-sample,]
MLTrain=lm(Satisfaction~.,Train)
summary(MLTrain)
##
## Call:
## lm(formula = Satisfaction ~ ., data = Train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7125 -0.4708 0.1024 0.4158 1.3483
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.91800 0.06696 103.317 < 2e-16 ***
## Product_Purchase 0.57963 0.06857 8.453 3.32e-13 ***
## Marketing 0.61978 0.06834 9.070 1.61e-14 ***
## After_Sales 0.05692 0.07173 0.794 0.429
## Positioning 0.61168 0.07656 7.990 3.16e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6696 on 95 degrees of freedom
## Multiple R-squared: 0.6971, Adjusted R-squared: 0.6844
## F-statistic: 54.66 on 4 and 95 DF, p-value: < 2.2e-16
##Appendix25
MLR=lm(Satisfaction~Product_Purchase+Marketing+Positioning,data=Zdata)
summary(MLR)
##
## Call:
## lm(formula = Satisfaction ~ Product_Purchase + Marketing + Positioning,
## data = Zdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.68988 -0.46632 0.08656 0.41138 1.38575
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.91800 0.06683 103.517 < 2e-16 ***
## Product_Purchase 0.57944 0.06844 8.466 2.90e-13 ***
## Marketing 0.62068 0.06819 9.102 1.27e-14 ***
## Positioning 0.61488 0.07630 8.058 2.14e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6683 on 96 degrees of freedom
## Multiple R-squared: 0.6951, Adjusted R-squared: 0.6856
## F-statistic: 72.96 on 3 and 96 DF, p-value: < 2.2e-16