of data analysis
Statistics or Sadistics?
• Statistics describes a set of tools and
techniques that is used for describing,
organising and interpreting information or
data.
Ordinal: Variables whose categories can be rank ordered but the distances between the categories are not equal across the range.
Example: How often do you travel? (always, usually, rarely, never)
Dichotomous: Variables containing data that have only two categories.
Example: Are you male or female? (male or female)
(Bryman, 2004)
Categorising a variable
Are there more than 2 categories?
Variable is interval/ratio
Univariate analysis: one variable
Univariate analysis refers to the analysis of one
variable at a time.
(e.g. aim: to describe the sociodemographic characteristics
of daytrippers: age, gender, education, occupation,
income, etc.)
Chart types: histogram, bar chart, line chart, pie chart
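As a rough illustration of describing one variable at a time, the sketch below builds a frequency table for a single categorical variable. The responses are hypothetical and the function name `frequency_table` is an assumption made for this example; it is not taken from the slides.

```python
from collections import Counter

def frequency_table(responses):
    """Return (category, count, percent) tuples, most frequent category first."""
    counts = Counter(responses)
    n = len(responses)
    return [(category, k, 100 * k / n) for category, k in counts.most_common()]

# Hypothetical answers to an "education" question asked of ten daytrippers.
education = ["secondary", "degree", "degree", "secondary", "degree",
             "postgraduate", "degree", "secondary", "degree", "postgraduate"]

for category, k, percent in frequency_table(education):
    print(f"{category:<13} {k:>3} {percent:6.1f}%")
```

The same counts are what a bar or pie chart of this variable would display.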
Bivariate analysis
• Bivariate analysis is concerned with the
analysis of two variables at a time in order to
uncover whether the two variables are
related.
• Exploring relationships between variables
means searching for evidence that the
variation in one variable coincides with
variation in another variable.
(e.g. aim: to determine the effect of age on
destination selection)
Cross-tabulation
(relationship between gender and reasons for visiting the gym)
Reasons                 Gender
                  Male          Female
                  No.    %      No.    %
Relaxation          3    7        6   13
Fitness            15   36       16   33
Lose weight         8   19       25   52
Build strength     16   38        1    2
Total              42  100       48  100
•The presumed independent variable is placed as the column variable and the presumed
dependent variable as the row variable.
•In other words, gender (the independent variable) can influence reasons for going
to the gym (the dependent variable), whereas going to the gym cannot
influence gender.
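The percentages in the cross-tabulation are column percentages: each count is divided by the total for its gender. A minimal sketch of that arithmetic follows, using the counts from the table. The dictionary layout and the function name `column_percentages` are assumptions made for illustration.

```python
# Counts copied from the gym cross-tabulation; the data layout is an assumption.
counts = {
    "Relaxation":     {"Male": 3,  "Female": 6},
    "Fitness":        {"Male": 15, "Female": 16},
    "Lose weight":    {"Male": 8,  "Female": 25},
    "Build strength": {"Male": 16, "Female": 1},
}

def column_percentages(table):
    """Return (column totals, each cell as a percentage of its column total)."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    percentages = {
        reason: {c: 100 * row[c] / totals[c] for c in columns}
        for reason, row in table.items()
    }
    return totals, percentages

totals, percentages = column_percentages(counts)
for reason, row in percentages.items():
    cells = "  ".join(f"{c}: {row[c]:5.1f}%" for c in row)
    print(f"{reason:<15} {cells}")
```

Percentaging down the columns of the independent variable lets the two genders be compared directly even though the group sizes differ.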
Types of correlations and the corresponding
relationship between variables
What happens to X      What happens to Y      Type of correlation     Example
X increases in value   Y increases in value   Direct or positive      The more time you spend studying, the higher your test score will be.
X decreases in value   Y decreases in value   Direct or positive      The less money you put in the bank, the less interest you will earn.
X increases in value   Y decreases in value   Indirect or negative    The more you exercise, the less you will weigh.
X decreases in value   Y increases in value   Indirect or negative    The less time you take to complete a test, the more you’ll get wrong.
(Salkind, 2000)
Pearson’s r
• Pearson’s r (Pearson correlation coefficient) is
a method for examining relationships between
interval/ratio variables.
– Interpreting a correlation coefficient
Size of the correlation coefficient General interpretation
.6 to .8 Strong relationship
.4 to .6 Moderate relationship
.2 to .4 Weak relationship
.0 to .2 Weak or no relationship
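A minimal sketch of how Pearson's r could be computed by hand and then read against the interpretation table above. The sample data are hypothetical, the function names are assumptions, and two choices are mine rather than the source's: a value on a band edge (for example .6) is assigned to the higher band, and sizes above .8 are reported as outside the listed bands because the table stops there.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return co_variation / math.sqrt(var_x * var_y)

def describe(r):
    """Return (direction, strength) using the size bands from the table."""
    if r > 0:
        direction = "direct (positive)"
    elif r < 0:
        direction = "indirect (negative)"
    else:
        direction = "none"
    size = abs(r)
    if size > 0.8:
        strength = "above the bands listed in the table"  # not covered by the source
    elif size >= 0.6:
        strength = "strong relationship"
    elif size >= 0.4:
        strength = "moderate relationship"
    elif size >= 0.2:
        strength = "weak relationship"
    else:
        strength = "weak or no relationship"
    return direction, strength

# Hypothetical data: hours spent studying and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 57, 70, 74]
r = pearson_r(hours, scores)
print(round(r, 2), describe(r))
```

The sign of r gives the type of correlation (direct or indirect) and its absolute size gives the strength.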
Methods of bivariate analysis
                 Nominal        Ordinal               Interval/ratio        Dichotomous
Nominal          CT + χ² + V    CT + χ² + V           CT + χ² + V           CT + χ² + V
Ordinal          CT + χ² + V    Spearman’s rho (ρ)    Spearman’s rho (ρ)    Spearman’s rho (ρ)
Interval/ratio   CT + χ² + V    Spearman’s rho (ρ)    Pearson’s (r)         Spearman’s rho (ρ)
Dichotomous      CT + χ² + V    Spearman’s rho (ρ)    Spearman’s rho (ρ)    Phi (φ)
CT + χ² + V = contingency table + chi-square (χ²) + Cramér’s (V)
(Bryman, 2004)
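The table can be read as a lookup from a pair of variable types to a method. The sketch below encodes it that way. The constant and function names are assumptions made for illustration, and the rules simply restate the cells of the table.

```python
NOMINAL = "nominal"
ORDINAL = "ordinal"
INTERVAL_RATIO = "interval/ratio"
DICHOTOMOUS = "dichotomous"
VALID_TYPES = {NOMINAL, ORDINAL, INTERVAL_RATIO, DICHOTOMOUS}

def bivariate_method(type_a, type_b):
    """Return the method the table lists for a pair of variable types."""
    types = {type_a, type_b}
    if not types <= VALID_TYPES:
        raise ValueError(f"unknown variable type in {sorted(types)}")
    if NOMINAL in types:
        # Entire nominal row and column of the table.
        return "contingency table + chi-square + Cramér's V"
    if types == {INTERVAL_RATIO}:
        return "Pearson's r"
    if types == {DICHOTOMOUS}:
        return "phi"
    # Every remaining cell of the table.
    return "Spearman's rho"

print(bivariate_method(NOMINAL, DICHOTOMOUS))
print(bivariate_method(ORDINAL, INTERVAL_RATIO))
```

Because the table is symmetric, the order of the two arguments does not matter.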
Multivariate analysis
• Multivariate analysis entails the simultaneous
analysis of three or more variables.
– For example, a project that seeks to determine
relationships between the sociodemographic
characteristics of tourists and the prices of four
destination packages.
• Multiple variables: sociodemographics (age, gender,
etc), destination packages and the prices of those
packages.
Inferential statistics
Inferential statistics are based on probability
sampling and are important when testing a
hypothesis and making statements about the
sample in relation to the population being
studied.
When considering inferential statistics, the focus
is on statistical significance, levels of
significance and Type I and Type II errors.
Statistical significance
A test of statistical significance allows the
analyst to estimate how confident he or she
can be that the results deriving from a study
based on a randomly selected sample are
generalisable to the population from which
the sample was drawn.
– How do you know how many questionnaires to distribute to ensure
that the sample represents the larger population?
– If the sample is unrepresentative of the wider population, the findings
will be invalid.
To be normal or not to be normal
Normal distribution
•The normal curve represents a distribution where the mean, median and
mode are equal to one another.
•If the median and the mean are different, then the distribution is skewed in
one direction or the other.
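A quick way to check these two points on a data set is to compare the mean, median and mode directly. The sketch below does that with Python's standard `statistics` module. The data are hypothetical, the function name `describe_shape` is an assumption, and reading "mean above median" as a right skew is a common rule of thumb rather than a formal test.

```python
import statistics

def describe_shape(values):
    """Return (mean, median, modes, shape) for a list of numbers."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    modes = statistics.multimode(values)  # list, since there may be several modes
    if mean > median:
        shape = "skewed to the right (positive skew)"
    elif mean < median:
        shape = "skewed to the left (negative skew)"
    else:
        shape = "no skew indicated by mean and median"
    return mean, median, modes, shape

# Hypothetical samples: one symmetric, one with a single large value.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(describe_shape(symmetric))
print(describe_shape(right_tailed))
```

In the symmetric sample the three measures coincide, while in the second sample the single large value pulls the mean above the median.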
Tests of significance
• Parametric tests assume that the variable
being studied reflects a normal distribution in
the population.
(Sarantakos, 1998)
Levels of significance
• Even if the sample ‘perfectly’ represents the
population, there are always other influences
on the sample, so there is always the possibility of an error.