
CHAPTER 13

QUANTITATIVE DATA ANALYSIS


Raw Data
 The unedited responses from a respondent exactly as indicated by that respondent.

Editing
 The process of checking the completeness, consistency, and legibility of data and
making the data ready for coding and transfer to storage.

Checking for Consistency


 Respondents match defined population
 Check for consistency within the data collection framework

Item Nonresponse
 The technical term for an unanswered question on an otherwise complete
questionnaire resulting in missing data.

Coding
 The process of assigning a numerical score or other character symbol to
previously edited data.

Codes
 Rules for interpreting, classifying, and recording data in the coding
process.
 The actual numerical or other character symbols assigned to raw data.

Data File
 The way a data set is stored electronically in spreadsheet-like form in
which the rows represent sampling units and the columns represent
variables.

Two Basic Rules for Coding Categories:

 They should be exhaustive, meaning that a coding category should exist for all possible responses.

 They should be mutually exclusive and independent, meaning that there should be no overlap among the categories, so that a subject or response can be placed in only one category.

Possible-Code Cleaning

Any given variable will have a specified set of answer choices and codes to match each answer choice.
For example, the variable gender will have three answer choices and codes for each: 1 for male, 2 for
female, and 0 for no answer. If you have a respondent coded as 6 for this variable, it is clear that an error
has been made since that is not a possible answer code. Possible-code cleaning is the process of checking
to see that only the codes assigned to the answer choices for each question (possible codes) appear in
the data file.

If you are not using a computer program that checks for coding errors during the data entry process, you can locate some errors simply by examining the distribution of responses to each item in the data set. For example, you could generate a frequency table for the variable gender, where the mis-entered value 6 would stand out. You could then search for that entry in the data file and correct it.
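The cleaning steps above can be sketched in Python with pandas. This is a minimal illustration, not the chapter's own procedure: the column names, code lists, and data are all hypothetical.

```python
# Possible-code cleaning sketch: flag any value that falls outside a
# variable's set of valid answer codes. Data and codes are illustrative.
import pandas as pd

data = pd.DataFrame({
    "respondent": [1, 2, 3, 4],
    "gender": [1, 2, 6, 0],   # 6 is not a possible code
})

# 0 = no answer, 1 = male, 2 = female (per the slide's example)
valid_codes = {"gender": {0, 1, 2}}

for var, codes in valid_codes.items():
    bad = data[~data[var].isin(codes)]
    if not bad.empty:
        print(f"Invalid codes found in '{var}':")
        print(bad)

# A frequency table also reveals the stray code:
print(data["gender"].value_counts())
```

Running this lists respondent 3 as the row carrying the impossible code, which can then be traced back and corrected in the data file.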

• Data integrity is essential to successful research and
decision making.

• What limits data integrity?


• Researcher making up data
• Nonresponse
• Poor editing or coding

• Consistent coding is important for secondary data.


PARAMETRIC STATISTICAL TEST
 In the literal meaning of the term, a parametric statistical test is one that makes assumptions about the parameters (defining properties) of the population distribution(s) from which one's data are drawn.

 For practical purposes, you can think of "parametric" as referring to tests, such as t-tests and the analysis of variance, that assume the underlying source population(s) to be normally distributed;

 they generally also assume that one's measures derive from an equal-interval scale (interval or ratio variables).

 And you can think of "non-parametric" as referring to tests that do not make these particular assumptions.
BASIS FOR COMPARISON          | PARAMETRIC TEST                        | NONPARAMETRIC TEST
Meaning                       | A statistical test in which specific   | A statistical test used in the case
                              | assumptions are made about the         | of non-metric independent
                              | population parameters                  | variables
Basis of test statistic       | Distribution                           | Arbitrary
Measurement level             | Interval or ratio                      | Nominal or ordinal
Measure of central tendency   | Mean                                   | Median
Information about population  | Completely known                       | Unavailable
Applicability                 | Variables                              | Variables and attributes
Correlation test              | Pearson                                | Spearman

PARAMETRIC TEST                                  | NON-PARAMETRIC TEST
Independent-samples t test                       | Mann-Whitney test
Paired-samples t test                            | Wilcoxon signed-rank test
One-way analysis of variance (ANOVA)             | Kruskal-Wallis test
One-way repeated-measures analysis of variance   | Friedman's ANOVA

2. BASIC ASSUMPTION OF PARAMETRIC
TEST – NORMALLY DISTRIBUTED DATA SET
 Mean, median, and mode are equal.
 A standard deviation close to zero.
 Skewness and kurtosis close to zero, or within the range of -1 to +1.
 Shapiro-Wilk's W or Kolmogorov-Smirnov D test is not significant.
 A histogram of the variable shows rough normality, taking the form of a symmetric bell-shaped curve.
 The Q-Q plot forms a nearly straight line.
 The boxplot shows few outliers, the median in the center of the box, and all four quartiles about equally ranged.
MEAN, MEDIAN, MODE, SD, SKEWNESS

Descriptives for purchase intention by gender (SPSS Explore output; standard errors in parentheses):

Statistic                      | Male              | Female
Mean                           | 3.3645 (0.06771)  | 3.3866 (0.05262)
95% CI for mean, lower bound   | 3.2305            | 3.2824
95% CI for mean, upper bound   | 3.4984            | 3.4908
5% trimmed mean                | 3.3715            | 3.3769
Median                         | 3.4286            | 3.4286
Variance                       | 0.582             | 0.338
Standard deviation             | 0.76302           | 0.58121
Minimum                        | 1.14              | 1.86
Maximum                        | 5.00              | 5.00
Range                          | 3.86              | 3.14
Interquartile range            | 0.86              | 0.71
Skewness                       | -0.243 (0.215)    | 0.173 (0.219)
Kurtosis                       | -0.016 (0.427)    | 0.371 (0.435)

Interpretation for males' purchase intention:
1. Mean (3.3645) is close to the median (3.4286).
2. Standard deviation (0.76302) is close to 0, not more than 1.
3. Skewness and kurtosis are close to zero, within -1 to +1.
SHAPIRO-WILK'S W OR KOLMOGOROV-SMIRNOV D

Tests of normality for purchase intention by gender:

Gender  | Kolmogorov-Smirnov(a): Statistic, df, Sig. | Shapiro-Wilk: Statistic, df, Sig.
Male    | 0.105, 127, 0.002                          | 0.984, 127, 0.141
Female  | 0.067, 122, 0.200*                         | 0.986, 122, 0.223

*. This is a lower bound of the true significance.
a. Lilliefors significance correction

- Male: K-S significant (0.002) indicates the data are not normally distributed (conduct another test).
- Male and female: Shapiro-Wilk not significant (0.141; 0.223) indicates the data are normally distributed.
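The same two normality tests can be run outside SPSS with scipy. This sketch uses simulated scores (drawn to resemble the male sample's mean and SD), so the exact statistics will differ from the table above.

```python
# Normality-check sketch: Shapiro-Wilk W and Kolmogorov-Smirnov D,
# plus skewness and kurtosis. Data are simulated, not the chapter's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
purchase = rng.normal(loc=3.36, scale=0.76, size=127)  # hypothetical scores

# Shapiro-Wilk W: p > 0.05 suggests normality is plausible
w_stat, w_p = stats.shapiro(purchase)

# K-S D against a normal with the sample's mean and SD. Note: estimating
# the parameters from the sample makes this approximate; SPSS applies
# the Lilliefors correction for exactly this reason.
d_stat, d_p = stats.kstest(purchase, "norm",
                           args=(purchase.mean(), purchase.std(ddof=1)))

print(f"Shapiro-Wilk W={w_stat:.3f}, p={w_p:.3f}")
print(f"K-S D={d_stat:.3f}, p={d_p:.3f}")
print(f"skewness={stats.skew(purchase):.3f}, "
      f"kurtosis={stats.kurtosis(purchase):.3f}")
```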
HISTOGRAM

[Histogram] Female purchase: the histogram shows a symmetric bell-shaped curve, indicating the data are normally distributed.
Q-Q PLOT

[Q-Q plot] The circles lie quite close to the line, indicating the data follow a straight-line relationship with the normal line and are thus normally distributed.
BOXPLOT

[Boxplot] The four quartiles are not equally ranged, indicating the data are not normally distributed. Remove the outliers to obtain a normal distribution.
3. STATE THE HYPOTHESES
 State the null and alternative hypotheses based on the underlying theory.
 Choose the significance level: 0.01, 0.05, or 0.10.
 Collect data and compute the p-value using an appropriate statistical test.
 Compare the p-value (generated by software) with the significance level (0.01, 0.05, or 0.10, determined by the researcher).
 Draw a conclusion.
4. TESTING OF MEANS
a) The t-test is used to assess hypotheses involving a single sample, paired samples, or two independent samples.

b) There are three t-test procedures: the one-sample t-test, the paired-samples t-test, and the independent-samples t-test.

c) One-way analysis of variance (ANOVA) is used to compare the mean of a variable between two or more independent groups.
ONE-SAMPLE T-TEST
 Used to compare the mean of a variable against a standard mean.
 H1: Mean customer satisfaction is significantly greater than 3.
 Univariate analysis.
 CS – interval variable.
 SPSS steps: Analyze/Compare Means/One-Sample T Test/move CS to the Test Variable box/type "3" as the test value/OK.
SPSS output:
a) The p-value in the one-sample t-test table is 0.000, meaning the sample mean is significantly different from 3.
b) The one-sample statistics table shows mean = 3.366, indicating the sample mean is significantly greater than 3.
Thus, H1 is supported.
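As an analogue of the SPSS procedure, the one-sample t-test can be sketched with scipy. The satisfaction scores below are simulated (drawn around the slide's mean of about 3.37), so the exact t and p values will differ from the SPSS output.

```python
# One-sample t-test sketch: is mean customer satisfaction greater than 3?
# Data are simulated; sample size and scale are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
cs = rng.normal(loc=3.37, scale=0.68, size=249)  # hypothetical CS scores

# Directional H1 (mean > 3), so use a one-sided alternative
t_stat, p_value = stats.ttest_1samp(cs, popmean=3, alternative="greater")

print(f"sample mean={cs.mean():.3f}, t={t_stat:.3f}, p={p_value:.4f}")
if p_value < 0.05 and cs.mean() > 3:
    print("H1 supported: mean CS is significantly greater than 3")
```

Note the design choice: SPSS reports a two-sided p-value, which the analyst halves for a directional hypothesis; scipy's `alternative="greater"` does this directly.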
PAIRED-SAMPLE T-TEST
 Used to compare the mean of a variable for two related samples, or to compare two means from the same sample.
 H2: The mean customer satisfaction score before and after changes in the service is not equal.
 Bivariate analysis.
 CS before & after – interval variables.
 SPSS steps: Analyze/Compare Means/Paired-Samples T Test/move "CS before & after" to the variable box/OK.
SPSS output:
a) Paired-samples test table: the p-value is < 0.05, indicating a mean difference before and after the service changes.
b) Paired-samples statistics table: CS before = 3.46 and CS after = 3.37, suggesting that CS before the service changes is significantly higher than after.

Thus, H2 is supported.
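A scipy sketch of the paired test, with simulated before/after scores built to mimic the slide's small drop in satisfaction (exact numbers are illustrative):

```python
# Paired-samples t-test sketch: CS before vs after a service change.
# The pairing matters: each respondent contributes both scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
cs_before = rng.normal(3.46, 0.60, size=249)
# After-scores drop slightly, with respondent-level noise
cs_after = cs_before - 0.09 + rng.normal(0, 0.20, size=249)

t_stat, p_value = stats.ttest_rel(cs_before, cs_after)
print(f"t={t_stat:.3f}, p={p_value:.4f}")
print(f"mean before={cs_before.mean():.2f}, after={cs_after.mean():.2f}")
```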
INDEPENDENT-SAMPLE T-TEST
 Used to compare the mean of a variable between two unrelated groups.
 H3: The mean customer satisfaction score between males and females is not equal.
 Bivariate analysis.
 Gender = nominal.
 CS = interval.
 SPSS steps: Analyze/Compare Means/Independent-Samples T Test/move CS to the Test Variable box/move gender to the Grouping Variable box/Define Groups: 1 into the Group 1 box, 2 into the Group 2 box/Continue/OK.
SPSS output:
a) Group statistics: the mean for males is 3.34 and for females is 3.38; the difference is not significant, so the means for males and females are concluded to be the same.

Thus, H3 is not supported.
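A scipy sketch of the independent-samples comparison, using simulated groups sized like the slide's samples (127 males, 122 females). Levene's test stands in for the "equality of variances" check SPSS reports alongside the t-test:

```python
# Independent-samples t-test sketch: CS by gender, simulated so the
# group means echo the slide (male ~3.34, female ~3.38).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
cs_male = rng.normal(3.34, 0.76, size=127)
cs_female = rng.normal(3.38, 0.58, size=122)

# Levene's test for equality of variances decides which t-test row to read
lev_stat, lev_p = stats.levene(cs_male, cs_female)
t_stat, p_value = stats.ttest_ind(cs_male, cs_female,
                                  equal_var=bool(lev_p > 0.05))

print(f"Levene p={lev_p:.3f}, t={t_stat:.3f}, p={p_value:.3f}")
```

With a mean gap this small relative to the spread, the test will typically be non-significant, matching the slide's conclusion.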
ONE-WAY ANALYSIS OF VARIANCE (ANOVA)
 Used to compare the mean of a variable between two or more independent groups.
 H4: The mean employee job-performance score after training programmes A, B, and C is not equal.
 Bivariate analysis.
 Performance: interval.
 Training: nominal.
 SPSS steps: Analyze/Compare Means/One-Way ANOVA/move performance to the Dependent List box/move training to the Factor box/Options/tick Descriptive/tick Homogeneity of variance test/Continue/OK.
SPSS output:
a) Test of homogeneity of variances table: the p-value is 0.986, so the assumption of homogeneity of variance is met.
b) ANOVA table: the p-value is 0.00 (< 0.05); there is a significant difference.

Thus, H4 is supported.
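The one-way ANOVA can be sketched in scipy as well. Group means, spreads, and sizes below are invented so that the F test comes out significant, as on the slide:

```python
# One-way ANOVA sketch: job performance across training programmes A, B, C.
# Simulated data with deliberately different group means.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
perf_a = rng.normal(3.2, 0.5, size=40)
perf_b = rng.normal(3.6, 0.5, size=40)
perf_c = rng.normal(4.0, 0.5, size=40)

# Homogeneity of variances first (Levene), then the omnibus F test
lev_stat, lev_p = stats.levene(perf_a, perf_b, perf_c)
f_stat, p_value = stats.f_oneway(perf_a, perf_b, perf_c)

print(f"Levene p={lev_p:.3f}, F={f_stat:.2f}, p={p_value:.4f}")
```

A significant F only says at least one group mean differs; identifying which pairs differ requires post-hoc comparisons, which SPSS offers under the same One-Way ANOVA dialog.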
5. MEASURES OF ASSOCIATION

a. The Pearson correlation coefficient is used to measure the strength of a linear association between two variables.

b. Linear regression analysis is used to predict changes in a dependent variable based on the value of independent variable(s) or predictor(s).

c. Logistic regression analysis is used to predict changes in a categorical dependent variable based on the value of independent variable(s) or predictor(s).
PEARSON CORRELATION COEFFICIENT
 Used to measure the strength of a linear association between two variables.
 H5: There is a relationship between employee motivation and performance.
 Bivariate analysis.
 Employee motivation & performance = interval.
 SPSS steps: Analyze/Correlate/Bivariate/move motivation & performance to the Variables box/tick Pearson/OK.
SPSS output:
a) Correlations table: r = 0.59, p-value = 0.00 (< 0.05), indicating a significant result. H5 is supported.
b) Since r is positive, there is a positive and significant relationship between employee motivation and performance.
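The Pearson correlation can be sketched with scipy; the two simulated variables below are constructed so the correlation lands near the slide's r ≈ 0.59 (the exact value will vary):

```python
# Pearson correlation sketch: motivation vs performance (simulated).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
motivation = rng.normal(3.5, 0.6, size=249)
# Performance depends linearly on motivation plus noise
performance = 0.6 * motivation + rng.normal(0, 0.49, size=249)

r, p_value = stats.pearsonr(motivation, performance)
print(f"r={r:.2f}, p={p_value:.4f}")
```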
LINEAR REGRESSION

 Used to predict changes in the dependent variable based on the value of independent variable(s) or predictor(s).
 H6: There is a relationship between performance and involvement.
 H7: There is a relationship between performance and welfare.
 Multivariate analysis.
 Dependent variable: performance = interval.
 Independent variables: involvement & welfare = interval.
 SPSS steps: Analyze/Regression/Linear/move performance to the Dependent box/move involvement & welfare to the Independent(s) box/Statistics/tick Estimates, Model fit, Descriptives & Collinearity diagnostics/Continue/OK.
SPSS output:
 Model summary: adjusted R square = 0.45, indicating that 45% of the variance in the dependent variable can be predicted from the independent variables.
 ANOVA table: p-value = 0.00 (< 0.05), indicating the equation is a good fit.
 Coefficients table: the standardized coefficient for involvement is 0.435 (p = 0.00) and for welfare is 0.314 (p = 0.00), indicating both are significantly related to performance.
 Thus, H6 & H7 are supported.
LOGISTIC REGRESSION

 Used to predict changes in a categorical dependent variable based on the value of independent variable(s) or predictor(s).
6. MEDIATION AND MODERATION

a. Mediation: an initial independent variable X1 may influence the dependent variable Y through a mediator X2. The interrelationship can be grouped into three relationships: (a) the independent variable influences the dependent variable (X1 → Y), (b) the independent variable influences the mediator (X1 → X2), and (c) the mediator directly influences the dependent variable (X2 → Y).

b. A moderator is a variable that alters the relationship between an independent variable and a dependent variable.
MEDIATION
 An initial independent variable X1 may influence the dependent variable Y through a mediator X2.

[Path diagram: X1 → Y (path a); X1 → X2 (path b); X2 → Y (path c)]
MODERATION
 A moderator is a variable that alters the relationship between an independent variable and a dependent variable.

[Path diagram: X → Y, with M moderating the X → Y path]
7. STRUCTURAL EQUATION MODELLING (SEM)
Structural equation modelling (SEM) is used when a researcher is faced with a set of interrelated variables, yet none of the other multivariate techniques allows the researcher to address all the issues. SEM is widely used for the following:

a. Confirmatory factor analysis (CFA) is used to test a hypothesized pattern of relationships among a number of variables, specifying which variables load onto which factors.

b. Estimating a path model is used to show the path diagram, decompose covariances and correlations, and estimate direct, indirect, and total effects.
