How To Choose The Correct Statistical Test

Quantitative Methods:
Choosing a statistical test

Summer School June 2015
Dr. Tracie Afifi
Learning Objective
How to pick the right statistical test
To pick the correct statistical test you need to know…
• What your research question is asking

• The level of measurement of the variables
• The distribution of the data
Common Statistical Tests
• T-test • Mann-Whitney U
• ANOVA • Kruskal-Wallis Test
• Chi-Square Test
• Pearson's Correlation • Spearman's Correlation
• Linear Regression
• Logistic Regression
What is your research question asking?
Choosing a Statistical Test
Is there a difference?
• Is there a difference in depression among
adolescents who are sexually abused
compared to adolescents who are not sexually
abused?
But how do you know which test to choose?
• What are the variables?
• How are the variables measured?
• What is the distribution of the data?
What are the Variables?
• One variable is sexual abuse
• One variable is depression

How are the Variables Measured?
Sexual Abuse Depression
• Categories (yes or no)
• Categories (none, minor, moderate, severe)
• Scores (e.g., 0-10)
How are the Variables Measured?
Level of Measurement
• Nominal
– Named categories with no order
• Ordinal
– Categories with a logical order or rank order
• Interval
– Rank order AND the distance between intervals of measurement has
meaning (zero value is arbitrary).
• Ratio
– Same properties as interval data AND the ratio between two
measurements is defined, because the scale has an empirical (not
arbitrary) zero value.
– You can say a score of 20 is “twice as much” as 10.
Liamputtong 2013
Type: Description
Nominal: Classes or categories without numerical order
• Male, female
• Jewish, Catholic, Muslim
Ordinal (ranked): Ordered categories
• Mild pain, moderate pain, and severe pain
• High school, undergraduate, graduate
Interval: The distance or interval between two measurements has meaning
• Temperature in Celsius (zero = 273.15 Kelvin)
Ratio: The distance and ratio between two measurements are defined and zero has a meaning of zero, and therefore you can say “twice as much”
• Weight
• Age in years
• Temperature in Kelvin (absolute zero)
What is the Distribution of the Data?
Central Tendency and Dispersion
• Central tendency
– Where the bulk of the data lie.
• Mode, Median, Mean, etc.
• Dispersion
– How widely or narrowly the data are spread out.
• Number of categories, Range, Standard Deviation, etc.
Health Research Methods: A Canadian Perspective (2014) Edited by K. Bassil & D. Zabkiewicz; Chapter 7, pp. 119-142
Central Tendency
• Mode
– The value that appears most often
– (3, 4, 5, 6, 8, 8, 15) Mode = 8
• Mean
– The arithmetic average of the observations
– (3, 4, 5, 6, 8, 8, 15) Mean = 7
• Median
– Middle value (3, 4, 5, 6, 8, 8, 15) Median = 6
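The three measures above can be checked with Python's standard `statistics` module. This is a minimal sketch using the slide's own example values; the variable names are illustrative.

```python
import statistics

scores = [3, 4, 5, 6, 8, 8, 15]  # the example values from the slide

mode_value = statistics.mode(scores)      # value that appears most often
mean_value = statistics.mean(scores)      # sum of the values divided by their count
median_value = statistics.median(scores)  # middle value of the sorted data

print(mode_value, mean_value, median_value)
```

The single large value (15) pulls the mean above the median, which is one reason the median is preferred for skewed data.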
Level of Measurement | Central Tendency | Dispersion
Nominal | Mode (most frequent category) | Number of categories
Ordinal | Median (data are ranked; middle value with half above and half below) | Range and the interquartile range (IQR: the difference between the median of the upper half and the median of the lower half)
Interval | Mean (summed and divided by number) | Standard Deviation (how much each data point deviates from the mean)
Ratio | Mean (summed and divided by number) | Standard Deviation
Health Research Methods: A Canadian Perspective (2014) Edited by K. Bassil & D. Zabkiewicz; Chapter 7, pp. 119-142
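As a rough illustration of the table, the sketch below picks the summary statistics by level of measurement. The function name `summarize` and the level labels are assumptions made for this example, and the IQR uses the simple "median of each half" rule described in the table (it needs at least two values).

```python
import statistics

def summarize(values, level):
    """Return (central tendency, dispersion) for a given level of measurement."""
    ordered = sorted(values)
    if level == "nominal":
        # Mode and number of distinct categories
        return statistics.mode(ordered), len(set(ordered))
    if level == "ordinal":
        half = len(ordered) // 2
        lower = ordered[:half]   # values below the median position
        upper = ordered[-half:]  # values above the median position
        iqr = statistics.median(upper) - statistics.median(lower)
        return statistics.median(ordered), iqr
    if level in ("interval", "ratio"):
        # Mean and sample standard deviation
        return statistics.mean(ordered), statistics.stdev(ordered)
    raise ValueError(f"unknown level of measurement: {level}")

scores = [3, 4, 5, 6, 8, 8, 15]  # example values from the central tendency slide
print(summarize(scores, "ordinal"))
print(summarize(scores, "interval"))
print(summarize(["yes", "no", "no"], "nominal"))
```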
Level of Measurement | Type of Test
Nominal | NON-PARAMETRIC TESTS
Ordinal | NON-PARAMETRIC TESTS
Interval | PARAMETRIC TESTS
Ratio | PARAMETRIC TESTS
What is the Distribution of the Data?
Normal Distribution or Non-Normal Distribution
Normal Distribution
Average Hours of Sleep
Mean = 7.92
Std Error = 0.13
95% CI = 7.68 to 8.18
Non-Normal Distribution
Among respondents with babies
Mean = 5.88
Std Error = 0.30
95% CI = 5.27 to 6.49
Distribution of the Data
• Parametric test
– Interval or ratio level data with a NORMAL
DISTRIBUTION
• Non-parametric test
– Nominal or ordinal level data or interval or ratio
with a NON-NORMAL DISTRIBUTION
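The rule above can be written as a small helper. This is only a sketch: the function name `family_of_test` and its arguments are hypothetical, and whether the data are normal is passed in as a flag because judging normality is a separate step.

```python
def family_of_test(level, is_normal=False):
    """Apply the slide's rule: parametric only for normal interval/ratio data."""
    level = level.lower()
    if level not in ("nominal", "ordinal", "interval", "ratio"):
        raise ValueError(f"unknown level of measurement: {level}")
    if level in ("interval", "ratio") and is_normal:
        return "parametric"
    return "non-parametric"

print(family_of_test("ratio", is_normal=True))
print(family_of_test("ratio", is_normal=False))
print(family_of_test("ordinal"))
```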
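Common Statistical Tests
• Parametric: T-test, ANOVA, Pearson's Correlation, Linear Regression, Logistic Regression
• Non-Parametric: Mann-Whitney U, Kruskal-Wallis Test, Chi-Square Test, Spearman's Correlation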
T-test
• Tests whether two means are statistically different
– One variable is Continuous (interval or ratio level)
– One variable is Dichotomous (two categories)
– Distribution of continuous variable is NORMAL (bell curve)
T-test
• Is the mean depression score different for
adolescents who are sexually abused compared
to adolescents who are not sexually abused?
• Sexual abuse = Yes or No (nominal, dichotomous)
• Depression = 1 to 10 (interval, with higher scores indicating worse
depression)
Group | Depression (mean)
Total Sample | 4
No Sexual Abuse | 2
Sexual Abuse | 8
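To make the comparison of two means concrete, here is a minimal sketch of the pooled-variance (Student's) t statistic. The scores are invented so that the group means match the 2 and 8 in the example; they are not real data, and the function name `student_t` is illustrative. Turning t into a p-value needs the t distribution, which a library routine such as `scipy.stats.ttest_ind` would normally handle.

```python
import math
import statistics

def student_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance of each group
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, df

abuse = [7, 8, 9, 8, 8]     # hypothetical depression scores, mean 8
no_abuse = [1, 2, 2, 3, 2]  # hypothetical depression scores, mean 2

t_stat, df = student_t(abuse, no_abuse)
print(round(t_stat, 2), df)
```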
What if the Distribution was
NON-NORMAL?
– One variable is Continuous (interval or ratio level)
with a NON-NORMAL DISTRIBUTION
– One variable is Dichotomous (two categories)

Mann-Whitney U test
• A non-parametric test for comparing ordinal or non-normal
continuous level data for two independent groups
• Non-normal distribution
– One Variable
• Ordinal or non-normal continuous level
– One Variable
• Two-level categorical (dichotomous)
Bruce, 2008 Quantitative Methods for Health Research, pp. 491-495
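As a sketch of what the test measures, the U statistic can be computed by counting, over every pair of one value from each group, how often the first group's value is larger (ties count as half). The function name and data are hypothetical, and only the statistic is computed, not its p-value.

```python
def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two U statistics for two independent groups."""
    wins_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins_a += 1.0   # value from group A outranks value from group B
            elif a == b:
                wins_a += 0.5   # ties are split evenly
    wins_b = len(group_a) * len(group_b) - wins_a
    return min(wins_a, wins_b)

# Hypothetical ordinal scores for two independent groups
print(mann_whitney_u([1, 3, 5], [2, 4, 6]))
print(mann_whitney_u([1, 2, 2, 3, 2], [7, 8, 9, 8, 8]))
```

A U of 0 means the two groups do not overlap at all; the largest possible value of the smaller U is half the number of pairs.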

• T-test: difference in means in two groups
• Mann-Whitney U: difference in medians in two groups
• What if you have three groups or more?
– No sexual abuse, minor sexual abuse, moderate
sexual abuse, severe sexual abuse?
ANOVA
Analysis of Variance
• Used to test for a statistically significant difference between three or more group means
• ANOVA compares differences across all means at the same time
• Distribution of the sample means is normal (Parametric)
– Dependent Variable
• Continuous (one variable)
– Independent Variable
• Categorical (One variable with more than two levels or groups)
Bruce, (2008); Tabachnick & Fidell (2007); Winston (1999); Liamputtong, 2013
ANOVA
• Is the mean depression score different for adolescents who experience
mild sexual abuse, moderate sexual abuse, or severe sexual abuse?
– Distribution of depression scores is NORMAL
• Sexual abuse (ordinal: none, minor, moderate, severe)
• Depression (interval, ranging from 0 to 10)
Group | Depression (mean)
Total Sample | 4
No Sexual Abuse | 2
Minor Sexual Abuse | 4
Moderate Sexual Abuse | 7
Severe Sexual Abuse | 9
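A one-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes the F statistic by hand; the scores are invented so that the group means match 2, 4, 7, and 9, and the function name is illustrative. A p-value would additionally require the F distribution.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic with between- and within-group degrees of freedom."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each score around its own group mean
    ss_within = sum((value - statistics.mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical depression scores: none, minor, moderate, severe sexual abuse
groups = [[1, 2, 3], [3, 4, 5], [6, 7, 8], [8, 9, 10]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```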
ANOVA
• Tests whether three or more means are statistically different
– One variable is continuous (interval or ratio level) with a NORMAL DISTRIBUTION
– One variable is categorical (three or more categories)

What if the Distribution was
NON-NORMAL?
– One variable is ordinal OR continuous (interval or ratio level) with a NON-NORMAL DISTRIBUTION
– One variable is Categorical (three or more categories)

Kruskal Wallis Test
• Compares median scores from three or more groups
– One variable = continuous (non-normal) or ordinal
– One variable = categorical with 3 levels or more
– An extension of the Mann-Whitney U test and the
non-parametric equivalent to ANOVA.
Liamputtong, 2013
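The Kruskal-Wallis test replaces the scores with ranks across all groups and compares the rank sums. The sketch below computes the H statistic without the correction for ties, which is a simplifying assumption; the helper names and data are hypothetical.

```python
def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def kruskal_wallis_h(groups):
    """H statistic (no tie correction) for three or more independent groups."""
    pooled = [value for group in groups for value in group]
    ranks = midranks(pooled)
    n_total = len(pooled)
    weighted_sum = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])  # ranks belonging to this group
        weighted_sum += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)

# Hypothetical ordinal scores for three groups
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_stat, 2))
```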
Chi-Square Test of Significance (χ²)
• Non-parametric test (Non-normal distribution)
– One Variable
• Categorical with 2 or more levels
– One Variable
• Categorical with 2 or more levels
Bruce (2007); Tabachnick & Fidell (2007); Winston (1999)
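For two categorical variables, the chi-square statistic compares the observed counts in each cell with the counts expected if the variables were unrelated. This is a minimal sketch with invented counts (rows: sexual abuse yes/no; columns: depression yes/no); the function name is illustrative.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

counts = [[30, 20],   # hypothetical: abused, depressed / not depressed
          [10, 40]]   # hypothetical: not abused, depressed / not depressed
chi2_stat, chi2_df = chi_square(counts)
print(round(chi2_stat, 2), chi2_df)
```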

Is there a relationship?
• Is there a positive correlation between sexual abuse and depression?
• Is sexual abuse severity associated with increased severity of depression?
• Is sexual abuse associated with increased odds of depression?
• Is there a positive correlation between sexual abuse and depression?
– Correlation
• Is sexual abuse severity associated with increased severity of depression?
– Linear Regression
• Is sexual abuse associated with increased odds of depression?
– Logistic Regression
Correlation
Strength of a linear relationship
Pearson
• Distribution of the variables is normal (parametric test)
– One Variable
• Continuous
– One Variable
• Continuous
Spearman
• Distribution of the variables is non-normal (non-parametric test) OR one or more variables are ordinal
– One Variable
• Continuous or ordinal
– One Variable
• Continuous or ordinal
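The difference between the two coefficients can be sketched in a few lines: Pearson's r is computed on the raw values, while Spearman's rho is the same calculation applied to the ranks. The helper names and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(ss_x * ss_y)

def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman correlation: Pearson's r computed on the ranks."""
    return pearson_r(midranks(x), midranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # increases with x, but not in a straight line

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 3), round(rho, 3))
```

Because y always increases with x, the ranks agree perfectly and rho is 1, while r is slightly lower because the relationship is curved.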

Linear Regression
• Describes how one variable (DV) depends on
the other variable (IV)
• Regression estimates the relationship between two variables
– One Dependent Variable
• Continuous
– One or more Independent Variables
• Any level of measurement
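With a single independent variable, the regression line can be estimated by ordinary least squares. In this sketch, severity is coded 0 to 3 and the four depression values reuse the group means from the ANOVA example purely as illustrative data points; the coding and the function name `fit_line` are assumptions.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one independent variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = covariation / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

severity = [0, 1, 2, 3]    # hypothetical coding: none, minor, moderate, severe
depression = [2, 4, 7, 9]  # illustrative depression scores

slope, intercept = fit_line(severity, depression)
print(round(slope, 2), round(intercept, 2))
```

The slope is read as the change in the depression score for each one-step increase in severity.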
Logistic Regression
• Predicts a dichotomous outcome from one or more
independent variables (Odds Ratio)
• Parametric test (some distribution assumptions apply)
– One Dependent Variable
• Dichotomous (two categories)
– One or More Independent Variables
• Any level
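When the only independent variable is itself dichotomous, the odds ratio from a logistic regression equals the odds ratio read directly from a 2 x 2 table, so the idea can be sketched without fitting a model. The counts and names below are hypothetical.

```python
import math

def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of the outcome in the exposed group divided by the odds in the unexposed group."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed

# Hypothetical counts: depressed / not depressed, by sexual abuse (yes / no)
ratio = odds_ratio(30, 20, 10, 40)
log_odds = math.log(ratio)  # the coefficient a logistic regression would report

print(ratio, round(log_odds, 3))
```

An odds ratio above 1 means higher odds of depression in the exposed group; with several independent variables, a fitted model is needed instead.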
Parametric Test (Normal Distribution) | Non-Parametric Test (Non-Normal Distribution)
Pearson's Correlation | Spearman's Correlation
One variable = continuous | One variable = continuous or ordinal
One variable = continuous | One variable = continuous or ordinal
Linear Regression
Dependent variable = continuous (1 variable)
Independent variable = any level (1 or more)
Logistic Regression
Dependent variable = dichotomous (1 variable)
Independent variable = any level (1 or more)
Parametric Test (Normal Distribution) | Non-Parametric Test (Non-Normal Distribution)
T-test (difference in means) | Mann-Whitney U (difference in medians)
One variable = continuous | One variable = continuous or ordinal
One variable = dichotomous | One variable = dichotomous
ANOVA | Kruskal-Wallis Test
One variable = continuous | One variable = continuous or ordinal
One variable = 3 or more categories | One variable = 3 or more categories
Chi-Square Test (non-parametric)
One variable = 2 or more categories
One variable = 2 or more categories
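The summary tables can be read as a decision procedure. The sketch below is one possible encoding, not a complete guide: the function name `choose_test`, its argument names, and the category labels are all assumptions made for illustration.

```python
def choose_test(question, outcome, n_groups=2, is_normal=False):
    """Suggest a test from the summary tables.

    question: 'difference', 'correlation', or 'regression'
    outcome:  'categorical', 'dichotomous', 'ordinal', or 'continuous'
    """
    parametric = outcome == "continuous" and is_normal
    if question == "difference":
        if outcome in ("categorical", "dichotomous"):
            return "Chi-Square Test"
        if n_groups == 2:
            return "T-test" if parametric else "Mann-Whitney U"
        return "ANOVA" if parametric else "Kruskal-Wallis Test"
    if question == "correlation" and outcome in ("ordinal", "continuous"):
        return "Pearson's Correlation" if parametric else "Spearman's Correlation"
    if question == "regression" and outcome == "continuous":
        return "Linear Regression"
    if question == "regression" and outcome == "dichotomous":
        return "Logistic Regression"
    raise ValueError("no test in the summary tables matches these inputs")

print(choose_test("difference", "continuous", n_groups=2, is_normal=True))
print(choose_test("difference", "ordinal", n_groups=4))
print(choose_test("regression", "dichotomous"))
```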
To pick the correct statistical test you need to know…
• What your research question is asking

• The level of measurement of the variables
• The distribution of the data

Bruce, 2008 Quantitative Methods for Health Research, pp. 74-78
