
Choose the Right Statistical Analysis using Four Key Questions

Karen Grace-Martin

What You’ll Learn Today:
• The Four Questions you must answer to choose an appropriate statistical
method
• How they come together to help you narrow it way down
• Our focus today is on choosing the right type of model or test, not details of
that model or test
• Where to get help with the process

2
The Problem

3
What Makes a Statistical Method Right?

1. The statistical test or model answers the research question or sets you up to do so in another analysis
2. The data meet all statistical assumptions of the test

4
Four Questions to Answer for Each Analysis

1. What is your research question?


2. What is the design?
3. Which variables will you use to answer the research
question and what is the scale of measurement of each?

4. Are there any data issues?

5
The Steps

Part 1: Define & Design
1. Write out research questions in theoretical and operational terms
2. Design the study or define the design
3. Choose the variables and determine their measurement levels
4. Write an analysis plan
5. Estimate sample size

Part 2: Prepare & Explore
6. Collect, code, enter, and clean data
7. Run univariate and bivariate descriptives
8. Create new variables

Part 3: Test & Refine
9. Run an initial model
10. Refine predictors and check model fit
11. Check assumptions
12. Check and resolve data issues

Part 4: Answer
13. Interpret results
14. Communicate results

6
An Example
I want to do the tests on a measure of Gestational Diabetes in conjunction with
Iron and/or Vitamin C supplementation, so:
Dependent Variable:
Gestational Diabetes (0 = no; 1 = yes)
Independent Variables:
Iron Supplementation (0 = no; 1 = yes)
Vitamin C Supplementation (0 = no; 1 = yes)
So from what I understand, I should be able to do a chi-squared test, since
they're all categorical variables? Is that correct, or did I miss something big?
Also, I need to validate those against a few confounding variables, namely
Age (continuous)
Body Mass Index (continuous)
Parity (continuous)

7
Question 1. What is your Research Question?

Step 1: Write out the Research Questions in Theoretical and Operational Terms

8
Theoretical Question:
Do levels of Iron and Vitamin C affect the likelihood of gestational
diabetes?

Operational Question:
Do women who have received Iron supplements, Vitamin C
supplements, or both, during the first trimester of pregnancy have
different likelihood of developing gestational diabetes at any point
during the pregnancy, controlling for age, BMI, and parity, compared
to those who received a standard prenatal vitamin?

Key Info:
1. Comparing four groups on a dependent variable
2. Control for covariates

9
What We Get from the Operational Research Question

If the Research Question Contains                         Statistical Method Needs to be able to include
Predicts; Relationship between; Affects                   Regression modeling, usually
Controlling for; Confounding variables; Above and beyond  Control Variables
When …; In the presence of …; Moderate                    Interactions
Group comparisons                                         Categorical Predictors
Mediates the relationship between;                        Mediation or Path Analysis
Affects M, which in turn affects Y
Change over time                                          Repeated Measures
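
The lookup on this slide can be sketched as a small keyword scan. This is only an illustration, not a tool from the presentation: the cue strings in `CUES` (for example "compared to" and "differ" standing in for group comparisons) and the function name `required_features` are assumptions chosen to match the wording of the example question.

```python
# Cue phrases and the feature each one calls for, loosely transcribed from
# the slide. The exact cue strings are illustrative assumptions about how a
# research question might be worded, not an exhaustive list.
CUES = {
    "regression modeling": ["predict", "relationship between", "affect"],
    "control variables": ["controlling for", "confounding", "above and beyond"],
    "interactions": ["in the presence of", "moderate"],
    "categorical predictors": ["compared to", "differ", "groups"],
    "mediation or path analysis": ["mediate", "in turn affects"],
    "repeated measures": ["change over time", "over time"],
}


def required_features(question):
    """Return the features whose cue phrases appear in the question text."""
    text = question.lower()
    return [
        feature
        for feature, cues in CUES.items()
        if any(cue in text for cue in cues)
    ]


# The operational question from the example.
operational_question = (
    "Do women who have received Iron supplements, Vitamin C supplements, "
    "or both, during the first trimester of pregnancy have different "
    "likelihood of developing gestational diabetes at any point during the "
    "pregnancy, controlling for age, BMI, and parity, compared to those who "
    "received a standard prenatal vitamin?"
)

print(required_features(operational_question))
```

For the example question this flags control variables and categorical predictors, which lines up with the two Key Info points noted for it. A keyword scan is only a prompt for thinking; it cannot replace reading the question carefully.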

10
What Makes Question 1 Important to Answer

• Not all research questions are testable
• This will directly affect the design, the variables, and the analysis

What Makes Question 1 Difficult to Answer

• Translating from theory to operation
• Knowing what tests are available helps
• Knowing what variables you can actually collect will help narrow it down

11
2. What is the design?

Step 2. Design the study or define the design

12
Step 2: Design Elements

Sampling: simple, stratified, convenience, matching, clustering

Assignment of subjects to conditions/predictors: random assignment, observed

Restrictions on randomization: blocking, order effects

Co-occurrence of conditions: nested and crossed effects

Independence: matching, clustering, repeated measures, longitudinal

13
Step 2: Design Names are Not Helpful

Design Names:
• Stratified Survey
• Randomized Control Trial
• Case-Control Study
• Split Plot Design
• Observational Design
• Crossover Design
• Repeated Measures Design

The Detail we need:
• Sampling
• Assignment of subjects to conditions
• Restrictions on randomization
• Co-occurrence of conditions
• Independence
• Selection of factors

14
What Makes Question 2 Important to Answer

• Some design decisions are very logical but make the analysis much more difficult
• Failing to account for design issues in the analysis will lead to inaccurate results
• The design affects which research questions you can test

What Makes Question 2 Difficult to Answer

• You need to consider logistical constraints now
• Different research questions can have different designs in the same study
• Names of designs aren’t helpful
• Design issues can get easily complicated

15
Step 2: Define The Design

Missing Info in the example:
1. Are iron and vitamin C conditions crossed?
   - Assume Yes
2. Are patients nested within doctors or randomly sampled?
   - Assume Yes: patients are nested within doctors

16
Question 3. Which variables will you use to answer
the research question and what is the scale of
measurement of each?

Step 3. Choose the variables and determine their level of measurement

17
Dependent Variable Types

Continuous, unbounded, interval: Linear Model


Binary: Logistic or Probit Regression
Unordered categories: Multinomial Logistic
Ordered Categories: Ordinal Logistic
Discrete counts: Poisson Family – Poisson, Negative Binomial
Proportion: Logistic, Tobit, Beta
Time to event: Survival Analysis
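
The mapping on this slide is essentially a lookup table, which can be sketched as a dictionary. The key names and the function `suggest_model` are hypothetical labels introduced for illustration; the model families on the right are the ones listed above.

```python
# Dependent variable type -> model family, transcribed from the slide.
# The short key names are illustrative assumptions.
MODEL_FOR_OUTCOME = {
    "continuous": "Linear Model",
    "binary": "Logistic or Probit Regression",
    "unordered categories": "Multinomial Logistic",
    "ordered categories": "Ordinal Logistic",
    "discrete counts": "Poisson Family (Poisson, Negative Binomial)",
    "proportion": "Logistic, Tobit, Beta",
    "time to event": "Survival Analysis",
}


def suggest_model(outcome_type):
    """Look up the model family for a dependent variable type."""
    key = outcome_type.strip().lower()
    if key not in MODEL_FOR_OUTCOME:
        known = ", ".join(sorted(MODEL_FOR_OUTCOME))
        raise ValueError(f"Unknown outcome type {outcome_type!r}; expected one of: {known}")
    return MODEL_FOR_OUTCOME[key]


# Gestational Diabetes (0 = no; 1 = yes) is a binary outcome.
print(suggest_model("binary"))
```

The lookup only narrows the field. The design and any data issues still decide which member of a family is appropriate.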

18
Independent Variable Types

1. Numerical 2. Categorical

19
Other Types of Terms

1. Interactions 2. Polynomials

20
What Makes Question 3 Important to Answer

• It has a direct impact on assumptions being met
• Huge impact on the difficulty of the statistical method chosen

What Makes Question 3 Difficult to Answer

• Data sets often contain (or could contain) multiple versions of the same variable
• Part of the analysis may be about creating variables
• The same variable can be considered different levels of measurement in different contexts

21
Step 3: The Variables

Key Info in the example:
1. DV is Binary
2. Both IVs are Categorical
3. All Covariates are Continuous

22
Now, pulling these together and anticipating
later steps…

Step 4. Write an analysis plan

23
The Data Analysis Plan Will Usually Change

24
“To consult the statistician after an experiment is finished is often
merely to ask him to conduct a post mortem examination. He can
perhaps say what the experiment died of.”

- Ronald Fisher

25
The Analysis Plan

Questions               Statistical Method Needs to be able to include   Indicates need for
1. Research questions   Comparing groups on DV                           Some kind of ANCOVA or regression
                        Controlling for covariates
2. Design               Crossed Factors                                  Include interaction
                        Nesting of Individuals within Doctors            Mixed Model
3. Variables            Binary outcome                                   Logistic Regression
                        Two categorical predictors
                        Three continuous covariates

Conclusion: Generalized Linear Mixed Model
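
One way to write out the model this plan points to is sketched below. It rests on assumptions the slide does not state: a logit link (suggested by "Logistic Regression"), a single random intercept per doctor as the simplest way to represent nesting, and subscripts i for patient and j for doctor. The symbol names are illustrative only.

```latex
\operatorname{logit}\,\Pr(\mathrm{GD}_{ij} = 1)
  = \beta_0
  + \beta_1\,\mathrm{Iron}_{ij}
  + \beta_2\,\mathrm{VitC}_{ij}
  + \beta_3\,(\mathrm{Iron}_{ij} \times \mathrm{VitC}_{ij})
  + \beta_4\,\mathrm{Age}_{ij}
  + \beta_5\,\mathrm{BMI}_{ij}
  + \beta_6\,\mathrm{Parity}_{ij}
  + u_j,
\qquad u_j \sim N(0, \sigma_u^2)
```

Each row of the plan shows up as a term: the two categorical predictors and their interaction (crossed factors), the three continuous covariates, and the doctor-level term u_j that makes it a mixed model.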


27
Question 4. Now that you’ve collected the data, are
there any data issues?

Step 7. Run univariate and bivariate statistics


Step 12. Check for and resolve data issues

28
Data Issues

1. Outliers and Influential Points
2. Missing Data
3. Multicollinearity
4. Small Sample Sizes
5. Lack of Variation
6. Censoring and Truncation
7. Zero Inflation
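
Two of these issues, missing data and lack of variation, can be spotted with very simple counts during the univariate descriptives. The sketch below is a minimal illustration only: the function name `screen_variable`, the use of `None` to mark a missing value, and the toy data are all assumptions, not data from the example study.

```python
def screen_variable(values):
    """Count missing values and distinct observed values for one variable.

    None is treated as the missing-value marker (an assumption of this sketch).
    """
    observed = [v for v in values if v is not None]
    n_distinct = len(set(observed))
    return {
        "n": len(values),
        "n_missing": len(values) - len(observed),
        "n_distinct": n_distinct,
        "no_variation": n_distinct <= 1,
    }


# Hypothetical toy data, for illustration only.
iron = [0, 1, 1, None, 0, 1]          # one missing value
gestational_diabetes = [0, 0, 0, 0, 0, 0]  # outcome never occurs

print(screen_variable(iron))
print(screen_variable(gestational_diabetes))
```

A binary outcome with no variation, as in the second toy variable, cannot be modeled at all, which is why these checks come before the initial model is run.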

29
Step 4: Data Issues

Potential Issues in the example:
1. Outliers on Covariates
2. Missing Data
3. Multicollinearity among Covariates
4. Sample Size
5. Lack of Variation

30
To Review:
Steps:
1. Write out research questions in theoretical and operational terms
2. Design the study or define the design
3. Choose the variables and determine their level of measurement
4. Write an analysis plan
7. Run univariate and bivariate statistics
12. Check for and resolve data issues

31
Poll

32
Strategies to Make this Easier

1. Do all 14 Steps in this general order and expect to loop back


2. Specifically, write the data analysis plan before collecting data
3. Write out the answers to the four questions
4. Keep learning and practicing
5. Talk it out with someone knowledgeable

33
Bonus Guide:

Types of Regression Models


An Overview of 18 Types of Models and When To Use Them

34
