
Introduction to Statistics:

Quantitative Data Analysis using SPSS

Figure: The research process. Stages shown: discover the management dilemma; define the management question; define and refine the research question(s), with an exploratory design where needed; research proposal; choose the research design type (descriptive or causal design; qualitative or quantitative method); design the sample; identify data collection methods (survey by interview or questionnaire, observation study, experiment in the laboratory or field, secondary data, focus group, depth interview); collect, process and analyse the data; interpret the findings; research report.

Source: Cooper and Schindler (2003), Business Research Methods, McGraw-Hill

Stages of data analysis

Zikmund et al., 2007

Coding Questionnaires

The respondent code and the record number appear on each record in the data.

The first record contains the additional codes: project code, interviewer code, date and time codes, and validation code.

Survey
Thank you for taking the time to complete this survey. We are interested in obtaining some basic information
relating to you and your sporting preferences. The survey should only take a couple of minutes to complete. Please
read the following questions carefully and answer in the spaces provided.

Q1. How old are you?


_______________________________

Q2. Please indicate your gender

Male Female
Q3. What is your current annual gross income (to the nearest thousand)?

_______________________________

Q4. Which of the following states do you live in?


NSW VIC QLD WA

Q5. Which sports do you most prefer to watch? Please rank the following sports in order
of preference (1=Most preferred and 3=Least preferred):

Cricket Soccer AFL


Q6. For the sport you most prefer to watch, please explain why this is your most preferred sport.

_________________________________________________________________________________________
_________________________________________________________________________________________

Survey Respondent ID #

Q1. How old are you? [Q1 - Age]
_______________________________

Q2. Please indicate your gender [Q2 - Gender]
Male Female

Q3. What is your current annual gross income (to the nearest thousand)? [Q3 - Annual gross income]
_______________________________

Q4. Which of the following states do you live in? [Q4 - State of residence]
NSW VIC QLD WA

Q5. Which sports do you most prefer to watch? Please rank the following sports in order
of preference (1=Most preferred and 3=Least preferred): [Q5-1 - First preference; Q5-2 - Second preference; Q5-3 - Third preference]
Cricket Soccer AFL

Q6. For the sport you most prefer to watch, please explain why this is your most preferred sport. [Q6 - Comment]
_______________________________________________________________________________________
_______________________________________________________________________________________

Coding Questionnaire:
Variable Names and Labels

Name Label
Respid Respondent ID
Q1 Age
Q2 Gender
Q3 Annual Gross Income
Q4 State of Residence
Q5-1 First Preference
Q5-2 Second Preference
Q5-3 Third Preference
Q6 Reasons for Preference

Value and Coding

Levels of Scales

Nominal
Ordinal
Interval
Ratio

Figures 8B to 8E: Copyright 2003 John Wiley & Sons, Inc. Sekaran/RESEARCH 4E

Connolly, Paul (2007), Quantitative Data Analysis in Education, Routledge, New York, NY

SPSS Term

Traditional term | Definition | SPSS term
Nominal | Two or more unordered categories | Nominal
Ordinal | Ordered levels, in which the difference in magnitude between levels is not equal | Ordinal
Interval | Ordered levels, in which the difference between levels is equal, but there is no true zero | Scale
Ratio | Ordered levels; the difference between levels is equal, and there is a true zero | Scale

Value and Coding

Survey
Q1. How old are you?
_____25 __________________________ Ratio
Q2. Please indicate your gender

Male (1) Female (2) Nominal


Q3. What is your current annual gross income (to the nearest thousand)?

______$100,000____________________ Ratio
Q4. Which of the following states do you live in?
NSW (1) VIC (2) QLD (3) WA (4) Nominal
Q5. Which sports do you most prefer to watch? Please rank the following sports in order
of preference (1=Most preferred and 3=Least preferred):
Cricket (1) Soccer (2) AFL (3) Ordinal
Q6. For the sport you most prefer to watch, please explain why this is your most preferred sport.

_______________________________________________________________________________________

Value and Coding

Name | Label | Numeric code
Respid | Respondent ID | -
Q1 | Age | -
Q2 | Gender | 1 = Male; 2 = Female
Q3 | Annual Gross Income | -
Q4 | State of Residence | 1 = NSW; 2 = VIC; 3 = QLD; 4 = WA
Q5-1 | First Preference | 1 = Cricket; 2 = Soccer; 3 = AFL
Q5-2 | Second Preference | 1 = Cricket; 2 = Soccer; 3 = AFL
Q5-3 | Third Preference | 1 = Cricket; 2 = Soccer; 3 = AFL
Q6 | Reasons for Preference | -
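The coding scheme above can be held as a small lookup structure. A minimal Python sketch, assuming a plain dictionary layout (hypothetical; this is not how SPSS stores value labels). It decodes the first respondent from the data-entry table and maps each variable's level of measurement, as annotated on the survey, to the SPSS term given in the SPSS Term table:

```python
# Hypothetical in-memory codebook mirroring the "Value and Coding" table (not an SPSS format).
SPORTS = {1: "Cricket", 2: "Soccer", 3: "AFL"}
VALUE_LABELS = {
    "Q2": {1: "Male", 2: "Female"},
    "Q4": {1: "NSW", 2: "VIC", 3: "QLD", 4: "WA"},
    "Q5-1": SPORTS,
    "Q5-2": SPORTS,
    "Q5-3": SPORTS,
}

# Level of measurement from the annotated survey, and the SPSS term it maps to.
LEVEL = {"Q1": "ratio", "Q2": "nominal", "Q3": "ratio", "Q4": "nominal",
         "Q5-1": "ordinal", "Q5-2": "ordinal", "Q5-3": "ordinal"}
SPSS_MEASURE = {"nominal": "Nominal", "ordinal": "Ordinal",
                "interval": "Scale", "ratio": "Scale"}


def decode(record):
    """Replace numeric codes with value labels; uncoded variables (age, income) pass through."""
    return {var: VALUE_LABELS.get(var, {}).get(value, value)
            for var, value in record.items()}


# First respondent from the "Entering Data" table.
respondent_1 = {"Respid": 1, "Q1": 18, "Q2": 1, "Q3": 38000, "Q4": 1,
                "Q5-1": 1, "Q5-2": 2, "Q5-3": 3}
print(decode(respondent_1))
print({var: SPSS_MEASURE[level] for var, level in LEVEL.items()})
```

Keeping the labels separate from the stored codes mirrors the idea on these slides: the data file holds numbers, and the codebook explains what each number means.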

Practical Examples

Value and Coding


Exercise


Value and Coding

Entering Data
Respid | Age | Gender | Income | State | Pref1 | Pref2 | Pref3 | Comments
1 | 18 | Male = 1 | 38000 | NSW = 1 | Cricket = 1 | Soccer = 2 | AFL = 3 | Everyone likes cricket!
2 | 23 | Female = 2 | 45000 | VIC = 2 | AFL = 3 | Soccer = 2 | Cricket = 1 | Its the most exciting!
3 | 18 | Male = 1 | 40000 | WA = 4 | AFL = 3 | Soccer = 2 | Cricket = 1 | Its the sport of WA
4 | 19 | Male = 1 | 32000 | QLD = 3 | Cricket = 1 | Soccer = 2 | AFL = 3 | My friends watch the cricket
5 | 25 | Male = 1 | 51000 | QLD = 3 | Cricket = 1 | AFL = 3 | Soccer = 2 | Cricket is more social
6 | 19 | Female = 2 | 45000 | WA = 4 | Soccer = 2 | AFL = 3 | Cricket = 1 | Soccer is more fun
7 | 19 | Female = 2 | | VIC = 2 | Soccer = 2 | Cricket = 1 | AFL = 3 | Soccer is the new sport of Australia
8 | 22 | Male = 1 | 28000 | NSW = 1 | Cricket = 1 | AFL = 3 | Soccer = 1 |
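The table above contains entries worth checking: respondent 7 has no income recorded, and respondent 8's preferences use code 1 twice. A minimal Python sketch of an automated entry check, assuming the coded rows are held as plain tuples (a hypothetical layout, not an SPSS feature); the field names are illustrative:

```python
# Coded rows from the "Entering Data" table; None marks a blank cell (comments omitted).
FIELDS = ("respid", "age", "gender", "income", "state", "pref1", "pref2", "pref3")
ROWS = [
    (1, 18, 1, 38000, 1, 1, 2, 3),
    (2, 23, 2, 45000, 2, 3, 2, 1),
    (3, 18, 1, 40000, 4, 3, 2, 1),
    (4, 19, 1, 32000, 3, 1, 2, 3),
    (5, 25, 1, 51000, 3, 1, 3, 2),
    (6, 19, 2, 45000, 4, 2, 3, 1),
    (7, 19, 2, None, 2, 2, 1, 3),
    (8, 22, 1, 28000, 1, 1, 3, 1),
]


def check_record(record):
    """Return a list of problems found in one coded record."""
    problems = [f"missing {name}" for name, value in record.items() if value is None]
    if record["gender"] not in (1, 2):
        problems.append("invalid gender code")
    if record["state"] not in (1, 2, 3, 4):
        problems.append("invalid state code")
    prefs = [record["pref1"], record["pref2"], record["pref3"]]
    # A ranking of three sports should use each sport code exactly once.
    if None not in prefs and sorted(prefs) != [1, 2, 3]:
        problems.append("preference codes should use 1, 2 and 3 once each")
    return problems


report = {}
for row in ROWS:
    record = dict(zip(FIELDS, row))
    problems = check_record(record)
    if problems:
        report[record["respid"]] = problems
print(report)
```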


Online Questionnaire

http://macquariefbe.qualtrics.com/SE/?SID=SV_db7FFQ8ocbRGOiw

Variable

A variable is defined as a characteristic of the participants or situation for a given study that has different values in that study
A variable must be able to vary, that is, have different values
  E.g.: gender has two values (male and female); age has a large number of values


Variable

Variables are divided into:
  Independent variables
    Active: experimental studies
    Attribute: non-experimental studies
  Dependent variables
  Extraneous variables
    Also called nuisance variables or covariates
    Variables that are not of interest in a particular study but could influence the dependent variable
    E.g.: environmental factors, time of day, characteristics of the experimenter, teacher, or therapist


Exploratory Data Analysis


Refining the datasets
Testing for normality
Data transformation
Practical examples


Exploratory Data Analysis

To examine and get to know your data using various descriptive statistics and graphs before running any inferential statistics

Purposes of EDA

To check that the data have been entered correctly and are free of missing values
To check that the data are free of errors or omissions, as these can bias or limit the analysis
To examine whether there are problems in the data, such as non-normal distributions of the variables that are to be used in later analysis
To examine relationships between variables to determine how to conduct the hypothesis-testing analysis
  Check the strength of correlation between variables
  Check the number of dimensions of constructs/variables


How to Do EDA

Generating plots of the data
  Boxplots
  Histograms or stem-and-leaf plots
Generating numbers from the data using descriptive statistics
  Min-max, mean, standard deviation, skewness
  Frequency distribution tables
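The descriptive statistics listed above are simple to compute. A minimal sketch using Python's standard `statistics` module on the Age and Income columns from the data-entry table, treating the blank income for respondent 7 as missing (None); `describe` is a hypothetical helper name:

```python
import statistics


def describe(values):
    """Min, max, mean and SD of the valid (non-missing) values, plus counts."""
    valid = [v for v in values if v is not None]
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "min": min(valid),
        "max": max(valid),
        "mean": statistics.mean(valid),
        "sd": statistics.stdev(valid),  # sample SD (n - 1 denominator)
    }


# Age and income columns from the "Entering Data" table; None = blank cell.
age = [18, 23, 18, 19, 25, 19, 19, 22]
income = [38000, 45000, 40000, 32000, 51000, 45000, None, 28000]

for name, column in (("Age", age), ("Income", income)):
    print(name, describe(column))
```

Reporting the missing count next to the summary makes it obvious when a statistic rests on fewer cases than the sample size.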

Conducting a Frequency Analysis

Case situation:
An honours student at Murdoch University, Joyce, is interested in determining what factors influence consumer satisfaction with backpacker hostels. She conducts a survey; the data collected are in the dataset missval.sav


Frequency Analysis
Analyse > Descriptive Statistics > Frequencies
Number of valid cases: no missing values

Minimum (1) and maximum (26) values of Nationality

Frequency of male respondents (176), female respondents (127) and total (303)

Percentage of male respondents (58.1%) and female respondents (41.9%)

The tables show there are no unusual values
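The frequency output above reports, for each value, a count and percentages. A sketch of the same layout in Python; the coding 1 = male, 2 = female is assumed for illustration, and only the counts (176 and 127) come from the output. Percent uses all cases, while Valid Percent and Cumulative Percent exclude missing values, as in an SPSS frequency table:

```python
from collections import Counter


def frequency_table(values):
    """value -> (frequency, percent, valid percent, cumulative percent).

    Percent is based on all cases; valid and cumulative percent exclude None.
    """
    total = len(values)
    counts = Counter(v for v in values if v is not None)
    n_valid = sum(counts.values())
    table, running = {}, 0
    for value in sorted(counts):
        freq = counts[value]
        running += freq
        table[value] = (
            freq,
            round(100 * freq / total, 1),
            round(100 * freq / n_valid, 1),
            round(100 * running / n_valid, 1),
        )
    return table


# Hypothetical coding 1 = male, 2 = female, with the counts reported above.
gender = [1] * 176 + [2] * 127
for code, row in frequency_table(gender).items():
    print(code, row)
```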


Charts
Analyse > Descriptive Statistics > Frequencies > Charts
Around 45% intend to stay in Perth for more than 21 days (three weeks)

A large proportion of backpackers are between 20 and 29 years old

The charts show there are no unusual values


Missing Values

Frequency Analysis
Analyse > Descriptive Statistics > Frequencies

Minimum (0) and maximum (9) values of Repute Impt: mistyped values

Number of valid cases

Missing values

Minimum (0) and maximum (9) values of Repute Impt
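Out-of-range and missing entries like these can be screened before imputation. A minimal sketch, assuming Repute Impt is rated on a 1 to 7 scale and using made-up responses (both assumptions); `screen` is a hypothetical helper:

```python
def screen(values, low=1, high=7):
    """Return (missing case numbers, (case number, value) pairs outside low..high).

    Case numbers are 1-based, as in the SPSS Data View.
    """
    missing = [i for i, v in enumerate(values, start=1) if v is None]
    out_of_range = [(i, v) for i, v in enumerate(values, start=1)
                    if v is not None and not low <= v <= high]
    return missing, out_of_range


# Hypothetical Repute Impt responses on an assumed 1-7 scale; None = blank.
repute_impt = [5, 6, None, 0, 7, 9, 4]
missing, mistyped = screen(repute_impt)
print("Missing cases:", missing)
print("Out-of-range values:", mistyped)
```

Listing case numbers rather than just counts makes it easy to go back to the questionnaires and correct mistyped entries.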


Data Imputation

Data imputation is the operation of deciding what data to use to fill blanks (missing values) in the data.
The term means that you assign data to a blank based on some reasonable heuristic (a rule or set of rules).
In deciding what values to use to fill blanks in the record, you should follow the cardinal rule of data imputation: "Do the least harm" (Allison, 2002).

Nisbet, Robert, John Elder, Gary Miner (2009), Handbook of Statistical Analysis and Data Mining Applications, Elsevier Inc., Burlington, MA

Data Imputation
Estimation Methods for Replacing Missing Values in SPSS
1. Series mean. Replaces missing values with the mean for the
entire series.
2. Mean of nearby points (moving average). Replaces
missing values with the mean of valid surrounding values.
The span of nearby points is the number of valid values
above and below the missing value used to compute the
mean.
3. Median of nearby points. Replaces missing values with the
median of valid surrounding values. The span of nearby
points is the number of valid values above and below the
missing value used to compute the median.

4. Linear interpolation. Replaces missing values using a
linear interpolation. The last valid value before the missing
value and the first valid value after the missing value are
used for the interpolation. If the first or last case in the series
has a missing value, the missing value is not replaced.
5. Linear trend at point. Replaces missing values with the
linear trend for that point. The existing series is regressed on
an index variable scaled 1 to n. Missing values are replaced
with their predicted values.

MissVal | SerMean | Nearby | MedNearby | Interpo | LinTren
5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00
5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00
7.00 | 7.00 | 7.00 | 7.00 | 7.00 | 7.00
5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00
4.00 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00
7.00 | 7.00 | 7.00 | 7.00 | 7.00 | 7.00
(missing) | 4.36 | 4.50 | 4.00 | 5.00 | 5.00
3.00 | 3.00 | 3.00 | 3.00 | 3.00 | 3.00
4.00 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00
3.00 | 3.00 | 3.00 | 3.00 | 3.00 | 3.00
4.00 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00
1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
4.00 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00
5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00
4.00 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00
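The five estimation methods can be checked against the table. A minimal Python sketch, assuming a span of 2 nearby points (this matches the Nearby and MedNearby columns); it reproduces the SerMean (4.36), Nearby (4.50), MedNearby (4.00) and Interpo (5.00) values for the missing seventh case. The linear-trend function, fitted to these 15 values only, returns about 4.54 rather than the 5.00 shown, so it should not be read as a reproduction of SPSS output; edge handling in SPSS may also differ from these helpers:

```python
from statistics import mean, median

# Column "MissVal" from the table; None marks the missing seventh case.
series = [5, 5, 7, 5, 4, 7, None, 3, 4, 3, 4, 1, 4, 5, 4]


def series_mean(xs):
    """1. Series mean: the mean of all valid values."""
    return mean(x for x in xs if x is not None)


def nearby_points(xs, i, span):
    """Up to `span` valid values immediately before and after position i."""
    before = [x for x in xs[:i] if x is not None][-span:]
    after = [x for x in xs[i + 1:] if x is not None][:span]
    return before + after


def nearby_mean(xs, i, span=2):
    """2. Mean of nearby points."""
    return mean(nearby_points(xs, i, span))


def nearby_median(xs, i, span=2):
    """3. Median of nearby points."""
    return median(nearby_points(xs, i, span))


def linear_interpolation(xs, i):
    """4. Straight line between the last valid value before i and the first after it."""
    lo = max((j for j in range(i) if xs[j] is not None), default=None)
    hi = min((j for j in range(i + 1, len(xs)) if xs[j] is not None), default=None)
    if lo is None or hi is None:
        return None  # a missing first or last value is not replaced
    return xs[lo] + (xs[hi] - xs[lo]) * (i - lo) / (hi - lo)


def linear_trend(xs, i):
    """5. Regress valid values on an index 1..n and return the prediction at case i."""
    pts = [(k + 1, x) for k, x in enumerate(xs) if x is not None]
    x_bar = mean(k for k, _ in pts)
    y_bar = mean(y for _, y in pts)
    slope = (sum((k - x_bar) * (y - y_bar) for k, y in pts)
             / sum((k - x_bar) ** 2 for k, _ in pts))
    return y_bar + slope * ((i + 1) - x_bar)


i = series.index(None)  # 0-based position of the missing case
print(round(series_mean(series), 2), nearby_mean(series, i), nearby_median(series, i),
      linear_interpolation(series, i), round(linear_trend(series, i), 2))
```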

Replace Missing Values


Transform > Replace Missing Values

Research Question and Hypothesis


Research hypotheses are predictive statements about the relationship between variables
  E.g.: Students who take only one test per day will score better on standardised tests than will students who take two tests in one day
Research questions are similar to hypotheses, except that they do not entail specific predictions and are phrased in question format
  E.g.: Is there a difference in students' scores on standardised tests if they took two tests in one day versus taking only one test on each of two days?
Types of research questions
  Difference
  Associational
  Descriptive

Schematic Diagram: research question and type of statistic used

General purpose | Explore relationships between variables | Explore relationships between variables | Description (only)
Specific purpose | Compare groups | Find strength of association, relate variables | Summarise data
Type of question/hypothesis | Difference | Associational | Descriptive
General type of statistic | Difference inferential statistics (e.g. t-test, ANOVA) | Associational inferential statistics (e.g. correlation, multiple regression) | Descriptive statistics (e.g. mean, percentage, range)

Morgan, George A., Leech, Nancy L., Gloeckner, Gene W., Barrett, Karen C. (2004), SPSS for Introductory Statistics, second edition, Lawrence Erlbaum Associates, Mahwah, NJ

Research Question and Hypothesis

Difference research questions
  We compare scores (on the dependent variable) of two or more different groups, each of which is composed of individuals with one of the values or levels on the independent variable
  This type of question attempts to demonstrate that groups are not the same on the dependent variable
Associational research questions
  We associate or relate two or more variables
  This approach usually involves an attempt to see how two or more variables co-vary (e.g., higher values on one variable correspond to higher, or lower, values on another variable for the same persons) or how one or more variables enable one to predict another variable
Descriptive research questions
  They merely describe or summarise data, without trying to generalise to a larger population of individuals

Inferential
Statistics

Foster, Jeremy J. (2001), Data Analysis Using SPSS for Windows Versions 8 to 10: A Beginner's Guide, SAGE Publications, London

Descriptive Statistics

Connolly, Paul (2007), Quantitative Data Analysis in Education, Routledge, New York, NY

Selecting the Appropriate Statistical Test

Connolly, Paul (2007), Quantitative Data Analysis in Education, Routledge, New York, NY

Test Selection Grid
Ho, Robert (2006), Handbook of Univariate and Multivariate Data Analysis and Interpretation with SPSS, p. 9


In the courtroom analogy, the null hypothesis is that the defendant is innocent.

Decision | Truth: guilty | Truth: innocent
Reject null hypothesis: find guilty | Correct decision (1 - β), or power | Incorrect decision: Type I error (α)
Fail to reject null hypothesis: innocent | Incorrect decision: Type II error (β) | Correct decision

Decision | Truth in the population: association present | Truth in the population: no association
Reject null hypothesis | Correct | Type I error
Fail to reject null hypothesis | Type II error | Correct
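The decision table can be illustrated by simulation. A sketch under stated assumptions (a two-sided one-sample z-test of H0: mean = 0 with known standard deviation 1, n = 30, the 0.05 level, and a true mean of 0.5 standing in for "association present"; all values hypothetical). The rejection rate when H0 is true estimates the Type I error rate α; the rejection rate when H0 is false estimates power (1 - β):

```python
import math
import random
import statistics


def rejection_rate(true_mean, n=30, trials=2000, z_crit=1.96, seed=1):
    """Share of simulated samples in which H0: mean = 0 is rejected.

    Samples come from a normal distribution with the given true mean and a
    known standard deviation of 1, so the test statistic is z = mean * sqrt(n).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = statistics.fmean(sample) * math.sqrt(n)
        if abs(z) > z_crit:  # two-sided test at the 0.05 level
            rejections += 1
    return rejections / trials


alpha_hat = rejection_rate(0.0)  # H0 true: every rejection is a Type I error
power_hat = rejection_rate(0.5)  # H0 false: rejecting is the correct decision
beta_hat = 1 - power_hat         # failing to reject a false H0 is a Type II error
print(f"Type I error rate ~ {alpha_hat:.3f}, power ~ {power_hat:.3f}, Type II ~ {beta_hat:.3f}")
```

The estimated Type I error rate should sit close to the chosen 0.05 level, while power depends on the true effect and the sample size.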
Normality

Dataset: normality.sav


Explore Analysis
Analyse > Descriptive Statistics > Explore
Skewed toward 5 and 7, with a longer tail toward the low values

Distribution with negative skewness: most respondents agree that the location of the backpacker hostel is important

Peak and longer tail

Distribution with positive kurtosis, showing a peaked distribution (peaked, or pulled upward)

Kurtosis > 2

Outliers

Test of Normality

The advice from SPSS is to use the Shapiro-Wilk normality test when sample sizes are small (n < 50).
If the p-value is less than 0.05, you reject the normality assumption; if the p-value is greater than 0.05, there is insufficient evidence to suggest the distribution is not normal (meaning that you can proceed with the assumption of normality). Since the p-value is reported as 0.000 (that is, p < 0.001), there is reason to doubt that the distribution is normal.

Removal of Outliers

If an identified outlier has been caused by a data recording error, the value should be corrected and the analysis can proceed.
If it is determined that the data have been recorded correctly, other reasons why an observation is extreme should be investigated, if at all possible.
If the data have been investigated thoroughly and other sources of error have been identified, it may not be possible to recover the true data value (e.g., a scale was found to be incorrectly calibrated). In this instance, the only alternative is to discard the contaminated data. However, all data that have been subject to the same source of error should also be eliminated, not just the extreme values.

Thode, Henry C., Jr. (2002), Testing for Normality, Marcel Dekker, Inc., New York

Removal of Outliers
If no source of error is discovered, several alternatives are available; for example, robust methods of analysis can be used. Other ways of accommodating outliers include (Tietjen, 1986):
1. removing the outliers and proceeding with the analysis;
2. removing the outliers and treating the reduced sample as a censored sample;
3. replacing the outliers with the nearest non-outlier value;
4. replacing the outliers with new observations;
5. using standard methods for analysing the data, both including and excluding the outliers, and reporting both results. Interpretation becomes difficult if the two results differ drastically.
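Options 1, 3 and 5 can be sketched in Python. Here an outlier is assumed to be any value more than 1.5 times the interquartile range beyond the quartiles (the usual boxplot rule); the quartiles use Python's "inclusive" method, which may differ slightly from the hinges SPSS uses for boxplots. The data are hypothetical:

```python
from statistics import mean, quantiles


def iqr_fences(xs, k=1.5):
    """Lower and upper fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def replace_with_nearest(xs, k=1.5):
    """Option 3: replace outliers with the nearest non-outlier value."""
    lo, hi = iqr_fences(xs, k)
    inside = [x for x in xs if lo <= x <= hi]
    return [min(inside) if x < lo else max(inside) if x > hi else x for x in xs]


data = [4, 5, 5, 6, 6, 6, 7, 7, 8, 40]  # hypothetical scores with one extreme value
lo, hi = iqr_fences(data)
removed = [x for x in data if lo <= x <= hi]  # option 1
replaced = replace_with_nearest(data)         # option 3

# Option 5: report the result both with and without the outliers.
print("mean, all data:", mean(data))
print("mean, outliers removed:", mean(removed))
print("mean, outliers replaced:", mean(replaced))
```

A single extreme value moves the mean from about 6 to 9.4 here, which is why reporting both versions (option 5) is informative.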

In the Data View, click in row 212 and delete the case. Delete outlying cases starting from the biggest case number (#212), so that the row numbers of the other cases still to be deleted do not shift.

Normal Probability Plot and Corresponding
Univariate Distribution


Test of Normality

-2.58 < Z value < 2.58 (0.01 significance level)

-1.96 < Z value < 1.96 (0.05 significance level)
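One common way to apply these cut-offs, assumed here, is to the z-values of skewness and kurtosis, each statistic divided by its standard error. A sketch using the bias-adjusted sample formulas for skewness (G1) and excess kurtosis (G2) and their usual standard errors, which are believed to match what SPSS reports; `skew_kurt_z` and `within_normal_range` are hypothetical helpers:

```python
import math
from statistics import mean, stdev


def skew_kurt_z(xs):
    """Return (skewness, excess kurtosis, z of skewness, z of kurtosis)."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 values")
    m, s = mean(xs), stdev(xs)
    z3 = sum(((x - m) / s) ** 3 for x in xs)
    z4 = sum(((x - m) / s) ** 4 for x in xs)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew, kurt, skew / se_skew, kurt / se_kurt


def within_normal_range(z, crit=1.96):
    """True if -crit < z < crit (1.96 for the 0.05 level, 2.58 for the 0.01 level)."""
    return -crit < z < crit


# Valid values from the imputation example, used only as illustrative input.
skew, kurt, z_skew, z_kurt = skew_kurt_z([5, 5, 7, 5, 4, 7, 3, 4, 3, 4, 1, 4, 5, 4])
print(round(skew, 3), round(kurt, 3), within_normal_range(z_skew), within_normal_range(z_kurt))
```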

Practical Examples
Normality

Check the normality of qns21c (staff are friendly)


Data Transformations

Data Transformation

Flat distribution
  Inverse transformation, e.g. 1/Y or 1/X
Skewed distribution
  Square root, logarithm, squared, or cubed
  Negatively skewed distributions are best transformed by employing a squared or cubed transformation
  Positively skewed distributions are best transformed by employing a logarithm or square root
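The effect of these transformations can be seen by comparing skewness before and after. A sketch on hypothetical data chosen to make the effect clear: a square root or logarithm (the logarithm requires values above zero) reduces positive skewness, and squaring reduces negative skewness:

```python
import math
from statistics import mean, stdev


def skewness(xs):
    """Bias-adjusted sample skewness (the G1 formula)."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


# Hypothetical data chosen so the effect of each transformation is easy to see.
positive = [1, 2, 4, 8, 16, 32, 64]             # long right tail
negative = [math.sqrt(k) for k in range(1, 8)]  # long left tail

sqrt_positive = [math.sqrt(x) for x in positive]
log_positive = [math.log(x) for x in positive]  # logarithm needs values > 0
squared_negative = [x ** 2 for x in negative]

print("positive skew:", round(skewness(positive), 3),
      "-> sqrt:", round(skewness(sqrt_positive), 3),
      "-> log:", round(skewness(log_positive), 3))
print("negative skew:", round(skewness(negative), 3),
      "-> squared:", round(skewness(squared_negative), 3))
```

Real data rarely become perfectly symmetric after transformation; the point is to compare skewness (and the normality tests) before and after, and to keep the transformation that helps most.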
