
Data Analysis

Reliability Measures
Cronbach’s alpha
Alpha = (k/(k-1)) * (1 - (sum of item variances / variance of the summated scale))
Standardized alpha = (k * average inter-item correlation) / (1 + (k-1) * average inter-item correlation)

 SPSS: Analyze -> Scale -> Reliability Analysis (Model: Alpha)
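If you want to check alpha outside SPSS, the formula above can be applied directly; a minimal Python sketch, assuming the items of one scale are columns of a pandas DataFrame (the DataFrame and column names are illustrative):

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Alpha = (k/(k-1)) * (1 - sum of item variances / variance of the summated scale)."""
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summated scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example (hypothetical item columns):
# alpha = cronbach_alpha(df[["q1", "q2", "q3", "q4"]])
```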
Data screening
Case Screening
Missing data in rows
 Use the COUNTBLANK function in Excel; if the blanks in a questionnaire are 15% or more of the items (Hair et al., 2014), discard the questionnaire.
 Else replace the missing values with
 Mean or median
 Expectation-maximization (EM) likelihood approach
 Regression imputation techniques
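The same 15% screening rule is easy to reproduce outside Excel; a minimal pandas sketch, assuming each row of the (hypothetical) responses file is one questionnaire:

```python
import pandas as pd

df = pd.read_csv("responses.csv")        # hypothetical data file, one row per questionnaire

# Proportion of blank (NaN) answers per questionnaire (row)
pct_missing = df.isna().mean(axis=1)

# Keep rows with less than 15% missing (Hair et al., 2014); discard the rest
df_kept = df[pct_missing < 0.15].copy()
```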

Unengaged responses (monotone responses)

 Find the variance of each questionnaire (row); if it is very low, discard the questionnaire (Hair et al., 2014).
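A quick way to flag such monotone ("straight-lining") respondents is the row variance across the Likert items; a sketch with an illustrative column list and an illustrative cut-off:

```python
# likert_cols = list of the Likert-scale item columns (illustrative)
row_var = df_kept[likert_cols].var(axis=1)

# A row with near-zero spread answered every item the same way;
# the 0.3 cut-off here is only an illustrative choice, not a fixed rule.
df_engaged = df_kept[row_var > 0.3].copy()
```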
Outliers (on continuous variables)
 Use box plots in SPSS, check the reason behind each outlier, and, where justified, replace the value with the mean of that variable
Variable Screening
Mahalanobis distance: for multivariate outliers
 Remove questionnaires whose Mahalanobis distance has a chi-square probability less than 0.001
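The same screening can be reproduced outside SPSS: the squared Mahalanobis distance is compared against a chi-square distribution with degrees of freedom equal to the number of variables. A minimal numpy/scipy sketch (variable names are illustrative):

```python
import numpy as np
from scipy.stats import chi2

# continuous_cols: illustrative list of the continuous variables used for screening
X = df_engaged[continuous_cols].to_numpy()
diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)   # squared Mahalanobis distance per case

p = chi2.sf(d2, df=X.shape[1])                       # upper-tail chi-square probability
df_clean = df_engaged[p >= 0.001].copy()             # drop multivariate outliers (p < 0.001)
```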

Missing data in columns


Go to SPSS, Transform -> Replace Missing Values
 For categorical variables, replace with the median
 For continuous variables, replace with the series mean
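The same replacement rules are straightforward in pandas; a sketch assuming the categorical items are coded numerically (the column lists are illustrative):

```python
# Categorical (ordinal-coded) variables: replace missing values with the median
for col in categorical_cols:
    df_clean[col] = df_clean[col].fillna(df_clean[col].median())

# Continuous variables: replace missing values with the series mean
for col in continuous_cols:
    df_clean[col] = df_clean[col].fillna(df_clean[col].mean())
```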

Skewness & Kurtosis


Skewness is examined for continuous variables, while kurtosis is used for categorical variables. Use SPSS Descriptives.
Normality Test, Cross Tabulation and Chi-Square Tests
Normality test
Analytical method
Categorical variables: ratio of kurtosis to the standard error of kurtosis
 Rule of thumb: within ±1.96 (or 2), or kurtosis value within ±3
 Standard error of kurtosis = sqrt(24/n), where n is the sample size
Continuous variables: ratio of skewness to the standard error of skewness
 Rule of thumb: within ±1.96 (or 2), or skewness within ±1 (or ±0.8)
 Standard error of skewness = sqrt(6/n), where n is the sample size
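These ratios are easy to compute by hand; a scipy sketch for one variable x (a numeric array or Series), using the same standard-error formulas as above:

```python
import numpy as np
from scipy.stats import skew, kurtosis

n = len(x)
skew_ratio = skew(x) / np.sqrt(6 / n)        # skewness / standard error of skewness
kurt_ratio = kurtosis(x) / np.sqrt(24 / n)   # excess kurtosis / standard error of kurtosis

# Rule of thumb: |ratio| within 1.96 (or 2) suggests no serious departure from normality
```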

Graphical method
Normal Q-Q plot: plots the variable against an equivalent normal distribution with the same parameters.
Histogram
Tests in SPSS: Kolmogorov-Smirnov and Shapiro-Wilk
H0: data is normal
P value must be greater than 0.05 to retain H0
Go to Analyze -> Descriptive Statistics -> Frequencies -> Charts -> Histogram (check "Show normal curve on histogram")
Go to Analyze -> Descriptive Statistics -> Explore -> Plots -> check "Histogram" and "Normality plots with tests"
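Both tests are also available in scipy; a minimal sketch for one variable x (note that the Kolmogorov-Smirnov p value is only approximate when the mean and SD are estimated from the same sample):

```python
from scipy.stats import shapiro, kstest

w_stat, p_sw = shapiro(x)                                            # Shapiro-Wilk
ks_stat, p_ks = kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))    # Kolmogorov-Smirnov

# H0: data is normal; p > 0.05 means H0 is retained
```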
Cross Tabulation
Used to find the relationship between two categorical variables
Excel:
Using Pivot Table

 SPSS:
 Analyze -> Descriptive Statistics -> Crosstabs -> select the row and column variables and press OK
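Outside Excel and SPSS, pandas builds the same table; a sketch with illustrative column names:

```python
import pandas as pd

# Cross-tabulation of two categorical variables (column names are illustrative)
table = pd.crosstab(df_clean["gender"], df_clean["usage_group"])
print(table)
```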
Chi-Square Test
Test of goodness of fit:
Null hypothesis: no significant difference between the observed and expected frequencies (Of = Ef)
 P value > 0.05 means the null hypothesis is retained

Test of Independence
H0: the two variables are independent and have no effect on each other.

 Degrees of freedom: df = (C-1)(R-1), where C = number of columns and R = number of rows
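Both chi-square tests can be run with scipy; a sketch reusing the crosstab built above (the degrees of freedom returned correspond to (C-1)(R-1), and the goodness-of-fit counts are illustrative):

```python
from scipy.stats import chi2_contingency, chisquare

# Test of independence on the contingency table from the crosstab above
chi2_stat, p_value, dof, expected = chi2_contingency(table)
# H0 (independence) is retained when p_value > 0.05

# Test of goodness of fit: observed vs. expected frequencies (illustrative counts)
gof_stat, gof_p = chisquare(f_obs=[18, 22, 20], f_exp=[20, 20, 20])
```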
ANOVA
ONE WAY: one factor (independent variable with multiple levels, generally groups) and one dependent variable (generally scaled values)

ONE WAY REPEATED MEASURES: the one-way, paired-sample case

TWO WAY: two factors (independent variables with multiple levels, generally representing groups) and one dependent variable

MANOVA: one or more factors (independent variables) and multiple dependent variables (more than one)
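A one-way ANOVA is a one-liner in scipy once the dependent variable is split by group; a sketch with illustrative column names:

```python
from scipy.stats import f_oneway

# One factor ("group") with several levels, one scaled dependent variable ("score")
groups = [g["score"].to_numpy() for _, g in df_clean.groupby("group")]
f_stat, p_value = f_oneway(*groups)
# p_value < 0.05 indicates that at least one group mean differs
```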
Contd..
Levene's test of homogeneity: test of equality of variances
Needs to be non-significant (P > .05); otherwise check
 The P value for F at alpha = .001 for significance
 Or check the Brown-Forsythe and Welch tests (they are robust to unequal variances)
 Box's test for equality of covariance matrices (MANOVA) (interpreted the same way)
 If significant, use Pillai's trace (robust to violations of the covariance assumption)
 Else Roy's largest root
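Levene's test (and its Brown-Forsythe variant) is available in scipy; a sketch reusing the groups from the ANOVA example above:

```python
from scipy.stats import levene

lev_stat, lev_p = levene(*groups, center="mean")    # classic Levene test
bf_stat, bf_p = levene(*groups, center="median")    # Brown-Forsythe variant

# lev_p > 0.05: equality of variances holds, so the standard ANOVA F test is appropriate
```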
Factor Analysis
KMO Statistics (for sample adequacy):
Marvellous: .90s; Meritorious: .80s; Middling: .70s;
Mediocre: .60s; Miserable: .50s; Unacceptable: <.50

Bartlett’s Test of Sphericity: tests the hypothesis that the correlation matrix is an identity matrix (diagonals are ones, off-diagonals are zeros).
A significant result (Sig. < 0.05) indicates matrix is not an identity
matrix; i.e., the variables do relate to one another enough to run a
meaningful EFA.
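Both diagnostics are implemented in the factor_analyzer package (assumed installed); a minimal sketch on the item-level DataFrame:

```python
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

chi_square, p_value = calculate_bartlett_sphericity(items)   # items: DataFrame of the scale items
kmo_per_item, kmo_total = calculate_kmo(items)

# kmo_total is judged against the scale above (.50 is unacceptable);
# a significant Bartlett result (p_value < .05) supports running the EFA
```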
Factor Analysis
• Done to find/reduce the factors

Exploratory Factor Analysis: retain factors with eigenvalue >= 1

• Iterate until you arrive at a clean pattern matrix (see the sketch below), checking:
• Adequacy
• Convergent validity
• Discriminant validity
• Reliability: of each factor, must be > .7
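The iterate-until-clean loop can also be run outside SPSS; a sketch with the factor_analyzer package (assumed installed), where the number of factors and the promax rotation are illustrative choices:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(n_factors=4, rotation="promax")   # starting point; re-run with other counts
fa.fit(items)

eigenvalues, _ = fa.get_eigenvalues()                  # retain factors with eigenvalue >= 1
loadings = pd.DataFrame(fa.loadings_, index=items.columns)

# Inspect the pattern matrix, drop low-communality / cross-loading items, and refit
print(loadings.round(3))
```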

Factor Analysis
 Communality: The extent to which an item correlates with all other
items. Higher communalities are better. If communalities for a particular
variable are low (between 0.0 and 0.4), then that variable may struggle to load
significantly on any factor.

 Convergent validity: means that the variables within a single factor are
highly correlated. Regardless of sample size, it is best to have loadings
greater than 0.500 and averaging out to greater than 0.700 for each factor.

 Discriminant validity: refers to the extent to which factors are distinct and
uncorrelated. The rule is that variables should relate more strongly to their own
factor than to another factor. If "cross-loadings" do exist (variable loads on
multiple factors), then the cross-loadings should differ by more than 0.2.
Correlations between factors should not exceed 0.7. A correlation greater than
0.7 indicates a majority of shared variance (0.7 * 0.7 = 49% shared variance)
Conditions of a best-fit model
Chi-square (CMIN): the p value should not be significant
GFI (goodness-of-fit index) value > .9
Fit indexes (NFI (normed), RFI (relative), IFI (incremental), CFI (Bentler's comparative)) > .9
RMSEA < .05
Go to Estimates: look for significant path coefficients (P values < .05)
Look for model modification
These thresholds are from Hu and Bentler (1999).
CFA
Reliability
CR (composite reliability) > 0.7
Convergent Validity
AVE > 0.5, CR > AVE
Discriminant Validity
MSV (maximum shared variance) < AVE; AVE > ASV (average shared variance)
Square root of AVE greater than inter-construct correlations
A significant standardized residual covariance is one with an
absolute value greater than 2.58. Significant residual covariances
significantly decrease your model fit. Fixing model fit per the
residuals matrix is similar to fixing model fit per the modification
indices.
Convergent validity
 Average variance extracted (AVE) = (sum of Li^2 over all items) / n

 Where:
 Li = the standardized factor loading of item i,
 n = the number of items.
 Convergent validity is also assessed by examining the loading paths of all items, which should be statistically significant and exceed 0.50 (Hair et al., 2010, p. 709). It should be noted that reliability is also considered an indicator of good convergent validity if CR is 0.7 or more.
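Given the standardized loadings of one factor, AVE and CR can be computed directly; a small sketch of the standard formulas (the loading values shown are illustrative):

```python
import numpy as np

def ave_and_cr(loadings):
    """AVE = mean of squared standardized loadings;
    CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    l = np.asarray(loadings, dtype=float)
    ave = np.mean(l ** 2)
    cr = l.sum() ** 2 / (l.sum() ** 2 + np.sum(1 - l ** 2))
    return ave, cr

ave, cr = ave_and_cr([0.72, 0.81, 0.68, 0.77])   # illustrative loadings for one factor
# Thresholds: AVE > 0.5 and CR > 0.7 support convergent validity
```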
Common Method Bias (CMB)

Common Latent Factor


This method uses a common latent factor (CLF) to capture the
common variance among all observed variables in the model.
To do this, simply add a latent factor to your AMOS CFA model and connect it to all observed items in the model. Then compare the standardized regression weights from this model to the standardized regression weights of a model without the CLF. If there are large differences (e.g., greater than 0.200), then you will want to retain the CLF, either as you impute composites from factor scores or as you move on to the structural model. The CLF video tutorial demonstrates how to do this.
