
Advanced Statistics – Project 2

Topic: Factor Hair (Revised)


Presented by: Sanan Sahadevan Olachery.
Submission Date: Dec 8th 2019.

Contents:

• Notes
• Question 1: Exploratory data analysis
• Question 2: Multicollinearity
• Question 3: Simple linear regression
• Question 4: PCA / factor analysis

NOTES:
The project was carried out in RStudio. The variables in the dataset and their abbreviations are listed below.

Variable                Abbreviation
Product Quality         ProdQual
E-commerce              Ecom
Technical Support       TechSup
Complaint Resolution    CompRes
Advertising             Advertising
Product Line            ProdLine
Salesforce Image        SalesFImage
Competitive Pricing     ComPricing
Warranty & Claims       WartyClaim
Order & Billing         OrdBilling
Delivery Speed          DelSpeed
Satisfaction            Satisfaction

NOTE: R packages used for the project:

• corrplot – visualizing the correlation matrix.
• ggplot2 – plotting.
• psych – factor analysis / PCA.
• car – checking multicollinearity.
• ppcor – partial and pairwise correlation tests.

Q1. Perform exploratory data analysis on the dataset. Show some charts and graphs. Check for outliers and missing values.

Solution:

Definition: Exploratory data analysis (EDA) is an approach to analyzing datasets that summarizes their main characteristics, often with visual methods.
A basic summary of the dataset was produced in RStudio to examine its structure and to check for outliers and missing values. The ID column can be omitted, as it is only a row identifier and does not contribute to the factor analysis.
The summary and structure output (Fig#1, Fig#2 and Fig#3) shows no outliers and no missing values in the dataset. The dataset was further examined with the plot_intro() function (from the DataExplorer package), which confirms that there are no missing values (Fig#4).

To confirm that there are no outliers, histograms of the variables were also plotted (Fig#5); they likewise suggest that the dataset contains no outliers.
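The report performed these checks in RStudio. As an illustration only, the sketch below does the same two checks for one numeric column in Python (not the report's R code): it counts missing values and flags outliers. The 1.5 × IQR rule it uses is the usual boxplot convention and is an assumption here, since the report relied on visual inspection; the `ratings` values are hypothetical, not taken from the dataset.

```python
import statistics

def count_missing(values):
    """Count missing entries, represented here as None."""
    return sum(1 for v in values if v is None)

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed boxplot-style rule)."""
    data = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles of the non-missing data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < low or v > high]

# Hypothetical ratings on a 0-10 scale, not from the project dataset.
ratings = [6.1, 7.3, 5.9, 6.8, None, 7.0, 6.5, 2.0]
print(count_missing(ratings))  # 1
print(iqr_outliers(ratings))   # [2.0]
```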

Fig#1: Summary

Fig#2: Structure

Fig#3

Fig#4: plot_intro() output

Fig#5: Histograms

Q2. Is there evidence of multicollinearity?

Solution:

Definition: Multicollinearity is a statistical phenomenon in which two or more independent variables are highly correlated with each other. Because such variables carry overlapping information, it becomes difficult to separate the individual effect of each one on the dependent variable.

The correlations between the independent variables were computed in RStudio. The table below (Fig:1, with the full output in Fig:1.1 and Fig:1.2) shows that several independent variables are highly correlated:
• SalesFImage is strongly correlated with Ecom (0.79) and moderately with Advertising (0.54).
• CompRes is strongly correlated with DelSpeed (0.87) and OrdBilling (0.76), and also with the dependent variable Satisfaction (0.60).
• TechSup is strongly correlated with WartyClaim (0.80).

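The screening above amounts to scanning every pair of independent variables for a large correlation. A minimal Python sketch of that scan follows; the 0.7 cut-off and the column names x1 to x3 are hypothetical choices for illustration, since the report does not state a numeric threshold.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def high_correlation_pairs(columns, threshold=0.7):
    """Return (name, name, r) for every pair with |r| >= threshold (assumed cut-off)."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged

# Hypothetical columns: x2 is an exact multiple of x1, x3 is only weakly related.
columns = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],
    "x3": [3, 5, 1, 4, 2],
}
print(high_correlation_pairs(columns))  # [('x1', 'x2', 1.0)]
```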
A pairwise correlation test in RStudio using the Pearson method (Fig:2 and Fig:2.1) leads to the same conclusion: there is a high degree of collinearity between the independent variables. The correlation of SalesFImage with Ecom, and of OrdBilling and DelSpeed with CompRes, is highly significant.
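For reference, the Pearson test in R's cor.test() judges significance with the statistic t = r·√(n − 2) / √(1 − r²), compared against a t distribution with n − 2 degrees of freedom. The Python sketch below computes that statistic only (no p-value); the r and n values in the example are hypothetical, as the sample size is not stated in this section.

```python
from math import sqrt

def pearson_t(r, n):
    """t statistic for H0: rho = 0, given sample correlation r and sample size n."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical example: r = 0.6 on n = 27 cases gives t = 0.6 * 5 / 0.8.
print(round(pearson_t(0.6, 27), 2))  # 3.75
```

A larger |t| (relative to the t distribution with n − 2 degrees of freedom) means a smaller p-value, which is what "highly significant" refers to.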

We therefore conclude that the independent variables in the dataset are correlated and that the degree of collinearity between them is highly significant, confirming multicollinearity in the dataset.
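The NOTES list the car package for checking multicollinearity; its vif() function reports the variance inflation factor VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors. A minimal NumPy sketch of that definition follows (it is not car's implementation); the data are hypothetical, and the rule of thumb that a VIF above about 5 or 10 signals a problem is a common convention, not a result from this report.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (rows = cases, cols = predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on an intercept plus every other predictor.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centred = y - y.mean()
        r_squared = 1 - (resid @ resid) / (centred @ centred)
        factors.append(float(1 / (1 - r_squared)))
    return factors

# Hypothetical predictors with correlation r = -0.3, so each VIF = 1 / (1 - 0.09).
X = np.column_stack([[1, 2, 3, 4, 5], [3, 5, 1, 4, 2]])
print([round(v, 3) for v in vif(X)])  # [1.099, 1.099]
```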

Fig:1
Variable 1      Variable 2      Correlation (r)
ProdQual        Satisfaction    0.49
Ecom            SalesFImage     0.79
TechSup         WartyClaim      0.80
CompRes         DelSpeed        0.87
Advertising     SalesFImage     0.54
ProdLine        DelSpeed        0.60
OrdBilling      CompRes         0.76
Satisfaction    CompRes         0.60

Fig:1.1

Fig:1.2

Fig:2

Fig:2.1

Q3. Perform simple linear regression for the dependent variable with every independent variable.

Solution:

Using the lm() function, a simple linear regression of the dependent variable (Satisfaction) on each independent variable was fitted. The results are listed below.

1) lm(formula = Satisfaction ~ ProdQual, data = AssignmentHair1)

Coefficients: (Intercept) = 3.6759, ProdQual = 0.4151

Satisfaction = 3.6759 + 0.4151 * ProdQual

2) lm(formula = Satisfaction ~ Ecom, data = AssignmentHair1)

Coefficients: (Intercept) = 5.1516, Ecom = 0.4811

Satisfaction = 5.1516 + 0.4811 * Ecom

3) lm(formula = Satisfaction ~ TechSup, data = AssignmentHair1)

Coefficients: (Intercept) = 6.44757, TechSup = 0.08768

Satisfaction = 6.44757 + 0.08768 * TechSup

4) lm(formula = Satisfaction ~ CompRes, data = AssignmentHair1)

Coefficients: (Intercept) = 3.680, CompRes = 0.595

Satisfaction = 3.680 + 0.595 * CompRes

5) lm(formula = Satisfaction ~ Advertising, data = AssignmentHair1)

Coefficients: (Intercept) = 5.6259, Advertising = 0.3222

Satisfaction = 5.6259 + 0.3222 * Advertising

6) lm(formula = Satisfaction ~ ProdLine, data = AssignmentHair1)

Coefficients: (Intercept) = 4.0220, ProdLine = 0.4989

Satisfaction = 4.0220 + 0.4989 * ProdLine

7) lm(formula = Satisfaction ~ SalesFImage, data = AssignmentHair1)

Coefficients: (Intercept) = 4.070, SalesFImage = 0.556

Satisfaction = 4.070 + 0.556 * SalesFImage

8) lm(formula = Satisfaction ~ ComPricing, data = AssignmentHair1)

Coefficients: (Intercept) = 8.0386, ComPricing = -0.1607

Satisfaction = 8.0386 - 0.1607 * ComPricing

9) lm(formula = Satisfaction ~ WartyClaim, data = AssignmentHair1)

Coefficients: (Intercept) = 5.3581, WartyClaim = 0.2581

Satisfaction = 5.3581 + 0.2581 * WartyClaim

10) lm(formula = Satisfaction ~ OrdBilling, data = AssignmentHair1)

Coefficients: (Intercept) = 4.0541, OrdBilling = 0.6695

Satisfaction = 4.0541 + 0.6695 * OrdBilling

11) lm(formula = Satisfaction ~ DelSpeed, data = AssignmentHair1)

Coefficients: (Intercept) = 3.2791, DelSpeed = 0.9364

Satisfaction = 3.2791 + 0.9364 * DelSpeed
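Each lm() call above is an ordinary least-squares fit with a single predictor, whose closed form is slope = Sxy / Sxx and intercept = mean(y) − slope · mean(x). A minimal Python sketch that loops over the predictors in the same way follows; the `predictors` and `satisfaction` values are hypothetical, not the project data.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_each_predictor(predictors, y):
    """One simple regression of y on each predictor, as in the list above."""
    return {name: simple_ols(x, y) for name, x in predictors.items()}

# Hypothetical data: y = 1 + 2 * x1 exactly, and x2 runs opposite to y.
satisfaction = [3, 5, 7, 9]
predictors = {"x1": [1, 2, 3, 4], "x2": [4, 3, 2, 1]}
for name, (b0, b1) in fit_each_predictor(predictors, satisfaction).items():
    print(f"Satisfaction = {b0:.4f} + {b1:.4f} * {name}")
```

Because each model uses only one predictor, its slope also absorbs the effect of any correlated variables left out, which is why these single-variable coefficients should be read alongside the multicollinearity found in Q2.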

Q4. Perform PCA/factor analysis by extracting 4 factors. Interpret the output and name the factors.

Solution:

A principal component analysis (PCA) with varimax rotation was carried out, extracting 4 factors.

The four components together explain a cumulative 80% (0.80) of the variance in the dataset.
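The cumulative proportion comes from the eigenvalues of the correlation matrix of the analysed variables: it is the share of their total that the largest k eigenvalues account for. Because varimax is an orthogonal rotation, it redistributes this variance among the retained components without changing the total. A minimal NumPy sketch of that calculation follows; it is not the psych package's code, and the example data are hypothetical.

```python
import numpy as np

def cumulative_variance(X, k):
    """Share of total standardized variance explained by the first k principal components."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(R)[::-1]  # eigvalsh returns ascending order
    return float(eigenvalues[:k].sum() / eigenvalues.sum())

# Hypothetical data: the first two columns are perfectly correlated, the third is not.
X = np.array([[1, 2, 3], [2, 4, 5], [3, 6, 1], [4, 8, 4], [5, 10, 2]])
print(round(cumulative_variance(X, 1), 3))
print(round(cumulative_variance(X, 2), 3))
```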

The mean item complexity of 1.4 shows that each variable loads mainly on a single factor; together with the 80% cumulative variance, this suggests that 4 factors are sufficient.
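The item complexity that the psych package reports is Hofmann's index, (Σ a²)² / Σ a⁴ over an item's loadings a on the retained factors. It equals 1 when an item loads on a single factor and grows as the loading is spread across factors; the mean item complexity averages it over the items. A minimal Python sketch with hypothetical loadings:

```python
def item_complexity(loadings):
    """Hofmann's complexity index for one item's loadings across factors."""
    squares = [a * a for a in loadings]
    return sum(squares) ** 2 / sum(s * s for s in squares)

def mean_item_complexity(loading_rows):
    """Average Hofmann complexity over all items (rows of the loading matrix)."""
    return sum(item_complexity(row) for row in loading_rows) / len(loading_rows)

# Hypothetical loadings on 4 factors: one "pure" item and one split evenly over two factors.
rows = [[0.9, 0.0, 0.0, 0.0], [0.6, 0.6, 0.0, 0.0]]
print(round(mean_item_complexity(rows), 2))  # 1.5 (item complexities 1.0 and 2.0)
```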

The independent variables of the original dataset were thereby reduced to 4 factors, each made up of a group of related variables. The factors were named as follows:

Component   Factor Name             Abbreviation
RC1         Purchasing Experience   PExp
RC2         Marketing               Mkt
RC3         Post Sales              P.Sale
RC4         Product                 Prod

• PExp: includes OrdBilling, DelSpeed and CompRes from the original dataset, describing the purchasing experience.
• Mkt: includes Ecom, SalesFImage and Advertising, describing the marketing pattern.
• P.Sale: includes TechSup and WartyClaim, describing post-sales service.
• Prod: includes ProdQual and ProdLine from the original dataset, describing the product.

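The grouping above follows the usual practice of assigning each variable to the rotated component on which it has the largest absolute loading. A minimal Python sketch of that rule follows; the variable names x1 to x4 and their loadings are hypothetical, not the project's output.

```python
def assign_to_factors(loadings, factor_names):
    """Group variables under the factor where their absolute loading is largest."""
    groups = {name: [] for name in factor_names}
    for variable, row in loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        groups[factor_names[best]].append(variable)
    return groups

# Hypothetical rotated loadings on four components.
loadings = {
    "x1": [0.88, 0.10, 0.05, 0.02],
    "x2": [0.07, 0.91, 0.12, -0.10],
    "x3": [0.15, 0.04, 0.93, 0.06],
    "x4": [0.11, -0.02, 0.09, -0.79],
}
print(assign_to_factors(loadings, ["RC1", "RC2", "RC3", "RC4"]))
```

Using the absolute value means a strongly negative loading (as for x4) still ties a variable to that component; naming each factor is then a judgement about what its group of variables has in common.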