
Quantitative methods of data analysis
Statistics or Sadistics?
• Statistics describes a set of tools and
techniques that are used for describing,
organising and interpreting information or
data.

• Methods of quantitative data analysis:
descriptive statistics and inferential statistics.
Descriptive and Inferential
• Descriptive statistics is used to organise and
describe the characteristics of a collection of
data.

• Inferential statistics is used to make
inferences from a smaller group of data
(sample, n) to a possibly larger one
(population, N).
Descriptive Analysis
Descriptive statistics involve the use of:
– Univariate analysis (one variable)
– Bivariate analysis (two variables)
– Multivariate analysis (more than two variables)
The number of variables to be analysed at any time
will relate back to the aims, objectives or
hypotheses of the research project.
Variables
Sample: Questionnaire
No. Questions
1 Gender? [var0001]
Male (code: 1) Female (code:2)
2 Age? [var0002]
3 Which of the following best describes your main reason for going to the gym?
[var0003]
Relaxation (code:1) Maintain or improve fitness (code:2)
Lose weight (code:3) Meet others (code:4)
Build strength (code:5) Other (code: 6)
4 When you go to the gym, how often do you use the equipment (jogger, bike)?
[var0004]
Always (code: 1) Usually (code:2) Rarely (code:3) Never (code:4)
5 Who do you go with? [var0005]
On my own (code:1) With a friend (code:2) With a partner/spouse (code:3)

Each variable number corresponds to the question number (coding data).
Missing data
• When respondents fail to reply to a question,
it could be
– by accident;
– because they do not want to answer the question;
– because it is not applicable to them.

Missing data must be taken into account during the analysis.
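A minimal sketch of how coded answers and missing replies might be handled when counting frequencies. The codes come from question 3 of the sample questionnaire; the answers themselves are hypothetical, and storing a missing reply as None is an assumption, since the slides do not prescribe a convention.

```python
from collections import Counter

# Codes for question 3 [var0003], copied from the sample questionnaire.
REASON_CODES = {
    1: "Relaxation",
    2: "Maintain or improve fitness",
    3: "Lose weight",
    4: "Meet others",
    5: "Build strength",
    6: "Other",
}

# Assumption: a missing answer is stored as None.
MISSING = None

# Hypothetical answers to question 3 from ten respondents.
var0003 = [2, 3, MISSING, 5, 2, 1, MISSING, 3, 2, 6]


def valid_frequencies(values, codebook):
    """Count each answer, basing percentages on valid (non-missing) replies only."""
    valid = [v for v in values if v is not MISSING]
    counts = Counter(valid)
    table = {
        codebook[code]: (counts[code], round(100 * counts[code] / len(valid), 1))
        for code in sorted(counts)
    }
    return table, len(valid), len(values) - len(valid)


table, n_valid, n_missing = valid_frequencies(var0003, REASON_CODES)
print(f"valid replies: {n_valid}, missing: {n_missing}")
for label, (count, percent) in table.items():
    print(f"{label}: {count} ({percent}%)")
```

Reporting the number of missing replies alongside the valid percentages makes it clear which base the percentages use.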
Types of variables
Interval/ratio
– Description: variables where the distances between the categories are identical across the range.
– Example: How old are you?

Ordinal
– Description: variables whose categories can be rank ordered but the distances between the categories are not equal across the range.
– Example: How often do you travel? (always, usually, rarely, never)

Nominal
– Description: variables whose categories cannot be rank ordered; also known as categorical.
– Example: Why do you participate in tourism? (relaxation, VFR, culture)

Dichotomous
– Description: variables containing data that have only two categories.
– Example: Are you male or female? (male or female)

(Bryman, 2004)
Categorising a variable
Are there more than 2 categories?
– No: variable is dichotomous
– Yes: Can the categories be rank ordered?
   – No: variable is nominal
   – Yes: Are the distances between the categories equal?
      – No: variable is ordinal
      – Yes: variable is interval/ratio
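The three questions above can be sketched as a small function. The function name and its parameters are hypothetical, chosen only for illustration; the answers passed in are the researcher's own judgements about each questionnaire variable.

```python
def classify_variable(n_categories, can_rank, equal_distances):
    """Walk the three questions in order and return the variable type."""
    if n_categories <= 2:
        return "dichotomous"
    if not can_rank:
        return "nominal"
    if not equal_distances:
        return "ordinal"
    return "interval/ratio"


# Variables from the sample questionnaire, with judgements supplied by hand.
print(classify_variable(2, can_rank=False, equal_distances=False))   # gender
print(classify_variable(6, can_rank=False, equal_distances=False))   # reason for going
print(classify_variable(4, can_rank=True, equal_distances=False))    # how often
print(classify_variable(80, can_rank=True, equal_distances=True))    # age in years
```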
Univariate analysis: one variable
Univariate analysis refers to the analysis of one
variable at a time.
(e.g. aim: to describe the sociodemographic characteristics
of daytrippers: age, gender, education, occupation,
income, etc.)

– Mean is the arithmetic average of a set of values (the sum
of the values divided by the number of values);
– Median is the midpoint in a set of scores (useful when the data
contain extreme scores that would distort the mean);
– Mode is the value that occurs most frequently (suited to data that
are qualitative, categorical or nominal).
– Measures of variation: range, percentile and standard
deviation
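As an illustration, the measures above can be computed with Python's standard statistics module. The ages are hypothetical; one extreme value (68) is included to show how it pulls the mean away from the median.

```python
import statistics

# Hypothetical ages of eight gym members; 68 is an extreme score.
ages = [19, 22, 22, 25, 27, 31, 34, 68]

mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # midpoint of the sorted values
mode_age = statistics.mode(ages)      # most frequent value
age_range = max(ages) - min(ages)     # simplest measure of variation
sd_age = statistics.stdev(ages)       # sample standard deviation (n - 1)

print(f"mean={mean_age}, median={median_age}, mode={mode_age}")
print(f"range={age_range}, standard deviation={sd_age:.2f}")
```

Here the mean (31) is higher than the median (26) because of the single extreme age, which is why the median is often preferred for data with extreme scores.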
Diagrams
• A picture really is worth a thousand words.
– Graphs
– Charts

(Figure: examples of a histogram, a bar chart, a line graph and a pie chart.)
Bivariate analysis
• Bivariate analysis is concerned with the
analysis of two variables at a time in order to
uncover whether the two variables are
related.
• Exploring relationships between variables
means searching for evidence that the
variation in one variable coincides with
variation in another variable.
(e.g. aim: to determine the effect of age on
destination selection)
Cross-tabulation
(relationship between gender and reasons for visiting the gym)
Reasons          Male          Female
                 No.    %      No.    %
Relaxation        3     7       6    13
Fitness          15    36      16    33
Lose weight       8    19      25    52
Build strength   16    38       1     2
Total            42   100      48   100

• The presumed independent variable is placed in the columns and the presumed
dependent variable in the rows.

• In other words, gender (the independent variable) may influence the reasons for going
to the gym (the dependent variable), whereas the reasons for going to the gym cannot
influence gender.
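A short sketch of how the column percentages in the cross-tabulation are obtained: each count is divided by the total for its gender column. The counts are those shown in the table; the function name is hypothetical. Percentages are kept to one decimal place here, whereas the slide rounds them to whole numbers.

```python
# Counts from the cross-tabulation (rows: reasons, columns: gender).
counts = {
    "Relaxation": {"Male": 3, "Female": 6},
    "Fitness": {"Male": 15, "Female": 16},
    "Lose weight": {"Male": 8, "Female": 25},
    "Build strength": {"Male": 16, "Female": 1},
}


def column_percentages(table):
    """Express each cell as a percentage of its column (gender) total."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    percentages = {
        reason: {c: round(100 * row[c] / totals[c], 1) for c in columns}
        for reason, row in table.items()
    }
    return totals, percentages


totals, percentages = column_percentages(counts)
print(totals)
for reason, row in percentages.items():
    print(reason, row)
```

Percentaging within the independent variable (gender) is what allows the male and female columns to be compared despite their different totals.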
Types of correlations and the corresponding
relationship between variables
X increases in value, Y increases in value
– Type of correlation: direct or positive
– Example: The more time you spend studying, the higher your test score will be.

X decreases in value, Y decreases in value
– Type of correlation: direct or positive
– Example: The less money you put in the bank, the less interest you will earn.

X increases in value, Y decreases in value
– Type of correlation: indirect or negative
– Example: The more you exercise, the less you will weigh.

X decreases in value, Y increases in value
– Type of correlation: indirect or negative
– Example: The less time you take to complete a test, the more you’ll get wrong.

(Salkind, 2000)
Pearson’s r
• Pearson’s r (Pearson correlation coefficient) is
a method for examining relationships between
interval/ratio variables.
– Interpreting a correlation coefficient (the size refers to the absolute value of r; the sign gives the direction)

Size of the correlation coefficient    General interpretation
.8 to 1.0                              Very strong relationship
.6 to .8                               Strong relationship
.4 to .6                               Moderate relationship
.2 to .4                               Weak relationship
.0 to .2                               Weak or no relationship
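A minimal sketch of Pearson's r computed from its definition, followed by a lookup of the interpretation bands above. The study-hours data are hypothetical. Because the bands share their end points (for example .8 appears in two rows), the sketch assumes a boundary value belongs to the stronger band.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their own variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def interpret(r):
    """Map the absolute size of r onto the bands in the table (assumed boundaries)."""
    size = abs(r)
    if size >= 0.8:
        return "Very strong relationship"
    if size >= 0.6:
        return "Strong relationship"
    if size >= 0.4:
        return "Moderate relationship"
    if size >= 0.2:
        return "Weak relationship"
    return "Weak or no relationship"


# Hypothetical data: hours spent studying and test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 75]

r = pearson_r(hours, scores)
print(f"r = {r:.2f}: {interpret(r)}")
```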
Methods of bivariate analysis
Nominal with:
– Nominal: contingency table + chi-square (χ²) + Cramér’s V
– Ordinal: contingency table + chi-square (χ²) + Cramér’s V
– Interval/ratio: contingency table + chi-square (χ²) + Cramér’s V
– Dichotomous: contingency table + chi-square (χ²) + Cramér’s V

Ordinal with:
– Nominal: contingency table + chi-square (χ²) + Cramér’s V
– Ordinal: Spearman’s rho (ρ)
– Interval/ratio: Spearman’s rho (ρ)
– Dichotomous: Spearman’s rho (ρ)

Interval/ratio with:
– Nominal: contingency table + chi-square (χ²) + Cramér’s V
– Ordinal: Spearman’s rho (ρ)
– Interval/ratio: Pearson’s r
– Dichotomous: Spearman’s rho (ρ)

Dichotomous with:
– Nominal: contingency table + chi-square (χ²) + Cramér’s V
– Ordinal: Spearman’s rho (ρ)
– Interval/ratio: Spearman’s rho (ρ)
– Dichotomous: phi (φ)
(Bryman, 2004)
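As a worked illustration of the nominal-by-dichotomous case, the sketch below computes chi-square and Cramér's V by hand for the gender and reasons cross-tabulation shown earlier. The function name is hypothetical. The value 7.815 is the standard chi-square critical value for 3 degrees of freedom at the 0.05 level.

```python
import math

# Observed counts from the cross-tabulation: rows are reasons
# (relaxation, fitness, lose weight, build strength); columns are male, female.
observed = [
    [3, 6],
    [15, 16],
    [8, 25],
    [16, 1],
]


def chi_square_and_cramers_v(table):
    """Return (chi-square, degrees of freedom, Cramér's V) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    v = math.sqrt(chi2 / (n * min(len(table) - 1, len(table[0]) - 1)))
    return chi2, df, v


chi2, df, v = chi_square_and_cramers_v(observed)
CRITICAL_05_DF3 = 7.815  # chi-square critical value, df = 3, p = 0.05
print(f"chi-square = {chi2:.2f}, df = {df}, Cramér's V = {v:.2f}")
print("significant at 0.05" if chi2 > CRITICAL_05_DF3 else "not significant at 0.05")
```

Chi-square indicates whether a relationship is likely to exist in the population, while Cramér's V (between 0 and 1) indicates how strong it is.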
Multivariate analysis
• Multivariate analysis entails the simultaneous
analysis of three or more variables.
– For example, a project that seeks to determine
relationships between the sociodemographic
characteristics of tourists and the prices of four
destination packages.
• Multiple variables: sociodemographics (age, gender,
etc), destination packages and the prices of those
packages.
Inferential statistics
Inferential statistics are based on probability
sampling and are important when testing a
hypothesis and making statements about the
sample in relation to the population being
studied.
When considering inferential statistics, the focus
is on statistical significance, levels of
significance and Type I and Type II errors.
Statistical significance
A test of statistical significance allows the
analyst to estimate how confident he or she
can be that the results deriving from a study
based on a randomly selected sample are
generalisable to the population from which
the sample was drawn.
– How do you know how many questionnaires to distribute to ensure
that the sample represents the larger population?
– If the sample is unrepresentative of the wider population, the findings
will be invalid.
To be normal or not to be normal

Normal distribution
• The normal curve represents a distribution where the mean, median and
mode are equal to one another.
• If the median and the mean are different, then the distribution is skewed in
one direction or the other.
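A rough sketch of the mean-versus-median comparison described above. The data sets and the function name are hypothetical. Note that an equal mean and median only shows the absence of this sign of skew; it does not by itself establish that a distribution is normal.

```python
import statistics


def skew_direction(values, tolerance=1e-9):
    """Compare mean and median to suggest the direction of skew."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if abs(mean - median) <= tolerance:
        return "no skew indicated"
    if mean > median:
        return "positively skewed (tail to the right)"
    return "negatively skewed (tail to the left)"


# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]    # mean 3, median 3
right_tail = [1, 2, 2, 3, 12]  # mean 4, median 2
left_tail = [1, 10, 11, 11, 12]  # mean 9, median 11

for name, data in [("symmetric", symmetric), ("right_tail", right_tail), ("left_tail", left_tail)]:
    print(name, "->", skew_direction(data))
```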
Tests of significance
• Parametric tests assume that the variable
being studied reflects a normal distribution in
the population.

• Non-parametric tests do not assume that the
variable being studied reflects a normal
distribution in the population.
Tests of significance
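Non-parametric tests apply at the nominal and ordinal levels; parametric tests apply at the interval/ratio level.

One sample
– Nominal level: Chi-square test
– Ordinal level: Kolmogorov-Smirnov test
– Interval/ratio level: t-test

Two independent samples
– Nominal level: Chi-square test
– Ordinal level: Mann-Whitney U-test
– Interval/ratio level: t-test

Two dependent samples
– Nominal level: McNemar test
– Ordinal level: Sign test; Wilcoxon test
– Interval/ratio level: t-test

More than two independent samples
– Nominal level: Chi-square test
– Ordinal level: Kruskal-Wallis H-test
– Interval/ratio level: ANOVA

More than two dependent samples
– Nominal level: Cochran Q-test
– Ordinal level: Friedman test
– Interval/ratio level: ANOVA

(Sarantakos, 1998)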
Levels of significance
• Even if the sample ‘perfectly’ represents the
population, there are always other influences
on the sample, so there is always the possibility of an error.

• It is inevitable that you accept some risk.

• Statistical significance is the degree of risk you
are willing to take that you will reject the null
hypothesis when it is actually true.
Hypothesis
• A null hypothesis, H0
– This stipulates that there is no relationship
between the variables being tested for
relationships.
• For example: there is no relationship or difference
between gender and visiting the gym in the population
from which the sample was selected.
• Any differences observed are due to chance.
• As good researchers, our job is to eliminate chance
factors from explaining observed differences and to
evaluate other factors that might contribute to these
differences.
Hypothesis
• A research (alternative) hypothesis, H1
– It is a definite statement of the relationship
between variables.
– There are two types of research hypothesis:
• Directional: the research hypothesis posits a
direction to the inequality (i.e. more than or less than).
– A one-tailed test
• Nondirectional: the research hypothesis posits
no direction to the inequality.
– A two-tailed test
Null hypothesis and research hypotheses
Sample questions

Example 1
– Null hypothesis: There will be no difference in the average score of 9th graders and the average score of 12th graders on the ABC memory test.
– Nondirectional research hypothesis: 12th graders and 9th graders will differ on the ABC memory test.
– Directional research hypothesis: 12th graders will have a higher average score on the ABC memory test than will 9th graders.

Example 2
– Null hypothesis: There is no difference between the reading levels of learning disabled children in resource rooms when compared with learning disabled children in regular classrooms.
– Nondirectional research hypothesis: The reading scores of learning disabled children who are taught in resource rooms will differ from those of learning disabled children taught in regular classrooms.
– Directional research hypothesis: Learning disabled children who are taught in resource rooms will have higher reading scores when compared with learning disabled children taught in regular classrooms.
Null and Research hypotheses
• Null hypothesis refers to the population
whereas research hypothesis refers to the
sample.
• The null hypothesis cannot be directly tested
but the research hypothesis can be directly
tested.
• Because the null hypothesis cannot be tested,
it is an implied hypothesis.
Levels of significance
• Statistical analysis is generally described with
respect to levels of significance.
• Three levels of significance: 0.05 (5 in 100),
0.01 (1 in 100) or 0.001 (1 in 1000).
– i.e. 0.05 or 5 in 100 chances of an association
between the variables being a result of sampling
error, or;
– findings are due to chance only 5 in 100 times, or;
– there is a 95% chance that the sample findings
reflect the population;
– These sample statements assume a 0.05 level of
statistical significance.
Example
Null hypothesis: no difference
Data however shows a difference.
In reality: no difference
(However, you will never know this true state
since the null cannot be directly tested because it
applies only to the population).
If you reject the null, you would be making an
error (a Type I error).
Different types of errors
                                       Action you take
True nature of the null hypothesis     Accept the null hypothesis    Reject the null hypothesis
The null hypothesis is really true     Correct                       Type I error
The null hypothesis is really false    Type II error                 Correct
Type I error
• Type I error, or level of significance, has certain
values associated with it that define the risk
you are willing to take in any test of the null
hypothesis. The conventional levels set are
between .01 and .05.
– Example: as many as 5 out of 100 samples
might exhibit a relationship when there is not one
in the population (p < 0.05)
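The "5 out of 100 samples" idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not a method from the slides: the population values are hypothetical, and a two-tailed z-test with a known population standard deviation is used only because it needs nothing beyond Python's standard library. Every sample is drawn from a population in which the null hypothesis is true, so every rejection is a Type I error.

```python
import random
from statistics import NormalDist, mean

random.seed(42)  # fixed seed so the run is repeatable

ALPHA = 0.05                     # level of significance
N_SAMPLES = 4000                 # number of simulated studies
SAMPLE_SIZE = 30                 # respondents per study
POP_MEAN, POP_SD = 50.0, 10.0    # hypothetical population; the null is true

# Two-tailed critical value (about 1.96 for a 0.05 level).
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)


def rejects_null(sample):
    """Two-tailed z-test of H0: the population mean equals POP_MEAN."""
    standard_error = POP_SD / SAMPLE_SIZE ** 0.5
    z = (mean(sample) - POP_MEAN) / standard_error
    return abs(z) > z_crit


false_rejections = sum(
    rejects_null([random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
)
type_i_rate = false_rejections / N_SAMPLES
print(f"Share of samples wrongly rejecting a true null: {type_i_rate:.3f}")
```

The share of wrong rejections should come out close to the chosen level of significance, which is the sense in which that level defines the risk of a Type I error.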
Correlation and statistical significance
• Correlation coefficient r is -0.62
• Significance level is p < 0.05
• Conclusion:
– Reject the null hypothesis that there is no
relationship in the population.
– It is inferred that there are fewer than 5 chances in 100 that
a correlation at least as strong as -0.62 could have arisen by
chance alone.
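One way to make "could have arisen by chance alone" concrete is a permutation test, sketched below. This is a swapped-in technique for illustration, not necessarily how the p value on the slide was obtained (statistical packages usually derive it from a t distribution). The data and function names are hypothetical. Shuffling one variable breaks any real link with the other, so the share of shuffles that produce a correlation at least as strong as the observed one estimates p.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Two-tailed p: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_strong = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_strong += 1
    return at_least_as_strong / n_permutations


# Hypothetical data: weekly hours of exercise and body weight (kg).
exercise = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
weight = [92, 88, 90, 85, 80, 83, 78, 75, 77, 70]

r = pearson_r(exercise, weight)
p = permutation_p_value(exercise, weight)
print(f"r = {r:.2f}, estimated p = {p:.4f}")
print("reject the null hypothesis" if p < 0.05 else "do not reject the null hypothesis")
```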
Conclusion
• Descriptive statistics are used to describe a
sample’s characteristics.
• Inferential statistics are used to infer
something about the population based on the
sample’s characteristics.
• SPSS – Statistical Package for the Social
Sciences
References
• Bryman, A. (2004) Social Research Methods
(2nd ed.) New York: Oxford University Press.
• Jennings, G. (2001) Tourism Research.
Sydney: Wiley.
• Salkind, N. (2000) Statistics for People Who
(Think They) Hate Statistics. London: Sage.
