
Beyond PhD Coaching

Stats for Graduate Research
Beyond the College Stats Textbook and Course
Welcome & Overview https://www.beyondphdcoaching.com

• This course is designed to go beyond your stats textbook and coursework.
• Answer the question, “But, what do I need to do to complete the <thesis or dissertation>?”
• Four sessions.
• Major topics:
• Review & Summary of Basic Statistics Concepts
• Data: Collection, Sampling, and Sample Size
• Statistical Tools & Hypothesis Testing
• ANOVA & Regression
• Regression Model-Building & Interpretation
• What Happens After the Analysis?
• Sources:
• Listed at end.
• Using common graduate level stats textbook for most of the common ideas.1,2
• Specific citations for the specialized or unique ideas.
© 2021 by B. McAllister, PhD (All Rights Reserved)

“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”3
− Sir Ronald Aylmer (“R. A.”) Fisher
− British biologist, mathematician, statistician, geneticist
− author, The Design of Experiments, 1935
− credited with inventing Design of Experiments (DOE)
at Rothamsted Experimental Station
− credited for introducing the concept of the null hypothesis

Plan your analysis rigorously and meticulously, then execute it properly.


Trying to recover after the data have been collected and analyzed is a fool’s errand.

Review & Summary of Basic Statistics Concepts

Review of Basic Concepts
• Objectives:
• Same sheet of music.
• Same basic level of understanding.
• Focus on capstone research.
• Use proper and common terms.
• Mathematics, including statistics, has a precise language.
• Speaking another language poorly interferes with communication.
• Likewise, using statistical terms improperly confuses the message and
detracts from credibility of the analysis.
• Our goal is to understand the terms and use them correctly.
Stats Concepts: Why We Sample
• Want to understand some attribute of the population.
• Resources (time & money) may not permit a census.
• Sample a subset to infer (characterize) the attribute for the population from the sample.
• May be comparing the population value between groups.
• Same or different? Greater or less than?
• e.g., are basketball players taller in the east than in the west U.S.?
• Or, comparing against some specification.
• Greater than or less than?
• e.g., are basketball players taller on average than 6’ tall?
• Or, assessing relationships among factors and responses (e.g., cause-effect).
Stats Concepts: Hypothesis Testing
• Start with a premise: e.g., eastern basketball players are taller than western.
• What we are trying to prove to be true.
• We call that a research hypothesis or the alternate hypothesis.
• We “prove” our premise by demonstrating that the converse is not likely.
• The converse: e.g., there is no difference in heights, east vs. west
• We call that the null hypothesis.
• We sample, collect data, calculate means and variation, and calculate a
test statistic (e.g., t or Z statistic).
• Compare the test statistic to a critical value (or compare p to α).
• If the sample (and its statistics) is not likely given that the null is true => we reject the null hypothesis.
• Otherwise, we fail to reject the null hypothesis.
Stats Concepts: Type I and Type II errors
• Type I: false positive.
• We detect, prove, declare an effect (difference?) when it is not true.
• e.g., declare east coast players are taller when they are not.
• Type II: false negative.
• Fail to detect an effect when it is true.
• e.g., declare no difference when one exists.
• Each error is problematic.
• Relative importance is situational.
• Consider their implications:
• e.g., cancer test
• e.g., drug test for employment
• Drives sample size.

Truth vs. Findings (result of the hypothesis test and conclusion):
• Finding: Reject the null hypothesis. Conclude the alternate hypothesis is true (positive detection of an effect).
  • Truth: the alternate hypothesis is true or an effect exists => Correct outcome and conclusion: take appropriate action for the outcome.
  • Truth: the alternate hypothesis is not true or no effect exists => Type I error (false positive, α): incorrect conclusion; may take an inappropriate action.
• Finding: Do not reject the null hypothesis. Conclude there is insufficient evidence that the alternate hypothesis is true (failure to detect any effect).
  • Truth: the alternate hypothesis is true or an effect exists => Type II error (false negative, β): incorrect or inconclusive result; take an inappropriate action or continue to investigate to gather more evidence that leads to a correct or conclusive result.
  • Truth: the alternate hypothesis is not true or no effect exists => Correct outcome and conclusion: take appropriate action and/or continue to investigate for more evidence.
Stats Concepts: Terms
• Level of significance = α = probability of a Type I error (false positive)
• Complement is 1 − α = confidence [that we will not detect/declare an effect falsely]
• β = probability of a Type II error (false negative)
• Complement is 1 − β = power [that we will detect/declare a true effect]
• Sampling and hypothesis testing strive to minimize these statistical errors.
• p value = probability of observing an effect at least as large as the one in the sample, by chance alone, given that the null hypothesis is true.
• Lower p value => stronger evidence against the null.
• p < α => reject the null hypothesis.
• Choice of α and β drives sample size.
Stats Concepts: Building Blocks of Analysis

We use statistical tests to characterize a population.
• May infer a population attribute from a sample (using hypothesis tests).
• Techniques used to compare the means of two groups (e.g., t test).
• Or, compare the group mean to a specified parameter.
• Or, to detect a correlation (relationship) between two variables.
Stats Concepts: Regression and ANOVA
• Different question: what is the influence of k factors
or independent variables (Xs) on a dependent variable, Y?
• e.g., What factors predict GRE scores for college seniors? (chronological age,
GPA, family disposable income, gender, race, geographical region, etc.)
• The effect in this analysis is whether or not any Xi is a significant predictor.
• Does a change in Xi predict, influence, correlate with, or cause*** a change in Y?
• Still care about power, confidence, effect size, variation, and resources!!!
• Sample size calculations more complex (use a tool; e.g., G*Power).
• Influence of one Xi on Y depends on interactions with other predictors.
• Proper analytical approach is to build a predictive model of Y composed of
various Xi s and their coefficients.
• The Xi s in the predictive model are the “significant” predictors of Y.
***Use caution when making claims about cause-effect relationships.
Stats Concepts: Which Tool to Choose?
Methodology is chosen that best aligns with . . .

Real-World Issue => Research Problem => Purpose => Research Question => Qualitative Research? or Quantitative Research?
Stats Concepts: Which Tool to Choose?
Methodology is chosen that best aligns with . . .

• Real-World Issue: Employees in industry XXX do not use remote work when company policies permit it.
• Research Problem: No scholarly research into reasons why employees do or do not work remotely in industry XXX.
• Purpose: Explore the factors which drive the decisions related to employees working or not working remotely.
• Research Question (qualitative): What are the lived experiences of employees and managers related to remote work?
• Research Question (quantitative): What factors are correlated with the rate at which employees work remotely?
Stats Concepts: Variables (part 1)
• Validity of research depends on proper selection, measurement, use of
variables!
• Numerical: quantitative values.
• Continuous: measurements (number of significant digits to right of decimal
represents precision of the measurement—e.g., 62.389 inches).
• Discrete: counts (integers—e.g., number of people using remote work [62, 89, . . . ])
• Ordinal: rank or superiority is implied.
• Difference in values not accurately measured.
• e.g., Likert scale; freshman-sophomore-junior-senior.
• Nominal: distinct categories or attributes.
• No measurement or counting; no order.
• e.g., gender, race, geographical location, hair color.
• Statistical strength (power, confidence, precision) increases from nominal to ordinal to numerical.

Never convert a continuous numerical variable to a weaker form!!!!
Stats Concepts: Variables (part 2)

[Diagram: inputs => Real-World phenomenon, process, system, mechanism, treatment, transformation, influence, correlation, or relationship => outputs]
• Inputs: numerical (linear regression) or nominal (ANOVA, t-tests).
• Outputs: numerical (except for binary logistic regression).
• Names for inputs: independent variables, explanatory variables, predictors, control factors, causes.
• The link between inputs and outputs: an “effect.”
• Names for outputs: dependent variables, outcome variables, responses, response variables, effects.
Stats Concepts: Dummy Variables
• Regression requires numerical independent variables.
• Can use nominal variables if converted to dummy variables.
• Nominal variable with k values => k – 1 dummy variables (D1, D2, . . . Dk-1).
• Dummy coding:
• Each dummy variable corresponds to a yes or no question.
• Di asks, is the participant or record a member of group i (or variable i, or level i)? Yes = 1, No = 0.
Stats Concepts: Dummy Variables
• Nominal variable with k = 4 values: 3 dummy variables (D1, D2, D3).
• Does the record belong to group A, B, or C?
• Dummy variable, D1, is the dummy variable corresponding to group A; etc.
• Group D has no dummy variable; represents the reference group. If the record or participant is not associated with A, B,
or C, then D1 = D2 = D3 = 0.
• Regression equation is Y′ = b0 + b1D1 + b2D2 + b3D3.
• bi represents the difference or contrast between the mean of group i and the reference group: the change when moving from a code of 0 to a code of 1 on the dummy variable; how much belonging to group i influences Y.

Original categorical variable, group, or level and its dummy codes:

Group   D1   D2   D3
A       1    0    0
B       0    1    0
C       0    0    1
D       0    0    0    (the reference/default/comparison/control variable; not included as a separate dummy variable)

• k = 4: there are 4 original categorical variables; or, 4 values for an original categorical variable; or 4 groups; we need k − 1 = 3 dummy variables.
• The "absence" of the others (a value of 0 on A, B, and C) indicates the reference/default/comparison/control variable.
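
The dummy-coding rule above can be sketched in a few lines of Python. This is a minimal illustration, not how SPSS does it internally; the function name `dummy_code` and the choice of D as the reference group are assumptions made for the example.

```python
def dummy_code(value, levels, reference):
    """Return the k - 1 dummy variables (0 or 1) for one categorical value.

    levels    -- all k values of the nominal variable, in a fixed order
    reference -- the level that gets no dummy variable (all zeros)
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    # One yes/no question per non-reference level: "is this record in that group?"
    return [1 if value == level else 0 for level in levels if level != reference]


levels = ["A", "B", "C", "D"]  # k = 4 groups, so k - 1 = 3 dummy variables
for group in levels:
    print(group, dummy_code(group, levels, reference="D"))
```

Each printed row corresponds to one row of the coding table: groups A, B, and C each switch on one dummy variable, and the reference group D is coded as all zeros.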

Any questions?

Data: Collection Instruments, Sampling, and Sample Size
Data Collection: Instruments
• Designed Experiments
• Control the inputs to measure their influence on the output (response).
• Example: use of a mechanical/computer device to dispense patient meds.
• Observation
• Human or inanimate behavior
(static behavior or reactions to stimuli).
• Example: consumer-customer service interactions.
• Minimal control over inputs.
• Documents & Secondary Data
• Existing data from reliable, scholarly sources (e.g., company performance).
• Cannot control the variables, but collect what has already been measured.
• Surveys vs. Questionnaires
• What’s the difference?
The Nature of Data
• Likert Scales (questionnaires)
• 5-level (1-2-3-4-5 with midpoint = 3);
or, 7-level (1 through 7 with midpoint = 4).
• Ordinal variable.
• Often considered to be numerical.
• Aggregated Indices
• Mean response of several items on a Likert scale.
• Continuous numerical variable [higher statistical power & confidence].
• Operationalization of Variables
• Every variable must be defined two ways:
• How is it informed by the instrument (e.g., Likert scale, index, arithmetic operation)?
• What do the variable and its values mean in the real-world?
• Example: what does each number on the Likert scale mean operationally?
Reliability and Validity of Data
• For instruments (i.e., questionnaires) that use a multi-item scale to inform numerical variables, there is a need to measure and/or report on the instrument's internal consistency reliability.
• Cronbach’s alpha: Do the items measure the same underlying construct?
• If so, correlations among all items should be positive.
• Cronbach’s α can be increased by deleting poor items or increasing number of items (k).
• SPSS provides correlation matrix and statistics for identifying problem items.
• For off-the-shelf instruments (used with permission), Cronbach’s alpha should be provided.
• For self-developed instruments, it can be calculated with a formula2,4 or using SPSS.
• Normally done with a pilot test of sufficient sample size.
• With a sample different from the main study, but that mirrors main study attributes.
• Pilot test also provides test of the length and understandability of the instrument.
• Sample size to determine Cronbach’s alpha depends on parameters.2,4 Example: level of significance α = .05, power (1 − β) = .80, k = 5 items, CA0 = 0, CA1 = .7 => n = 16
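
For a self-developed instrument, Cronbach's alpha can be computed by hand from pilot data. The sketch below uses the standard variance form, alpha = k/(k − 1) × (1 − sum of item variances ÷ variance of total scores). The pilot scores are hypothetical, and the function name `cronbach_alpha` is an assumption for the example; SPSS reports the same statistic with additional item diagnostics.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for k items.

    item_scores -- list of k lists; each inner list holds one item's scores
                   for the same n respondents, in the same order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Total scale score for each respondent (sum across the k items).
    totals = [sum(responses) for responses in zip(*item_scores)]
    sum_of_item_variances = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - sum_of_item_variances / variance(totals))


# Hypothetical pilot test: 3 Likert items answered by 5 respondents.
pilot = [
    [2, 3, 3, 4, 5],
    [1, 3, 4, 4, 5],
    [2, 2, 3, 5, 5],
]
print(round(cronbach_alpha(pilot), 3))
```

Items that move together across respondents push the variance of the total score well above the sum of the item variances, which is what drives alpha toward 1.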
Sampling
• Remember why we sample!
• Population: The entire set of entities you are interested in.
• Target population: A subset of the population delimited for the purpose of your research (scope & delimitations, feasibility, etc.).
• Sample: A subset of the target population selected for the purpose of inferring
some attributes about the target population.
• Sampling Methods & Considerations
• Random sample: every element of the target population equally likely to be chosen.
• Stratified sample: reflects proportions of the target population (+ random sampling).
• Purposive sampling: non-probability sampling; researcher uses judgement when choosing
members of the target population to participate.
• Convenience sampling.
• Snowball sampling.
• Survey options: conduct manually or use a service (e.g., SurveyMonkey).
Drivers of Sample Size
• Probability of Type I error (false positive) = α = level of significance
• 1 − α = Confidence = complement of level of significance
• Probability of Type II error (false negative) = β
• 1 − β = Power*
• How precisely we measure or compare = effect size = e (e.g., Δ means of d)
• Variation (standard deviation) of the variable in the population = σ
• May not be known, but can be estimated:
• In a normal dist, 99.7% of items are within a range of 6σ; σ ≈ range ÷ 6 (e.g., range of 48, σ ≈ 8).1
• Another driver: your resources (time & money to perform the sampling)*
• Tradeoff among all five considerations
• *Many sample size “experts,” calculators, calculations fail to consider power.
• *Reality: money & time may prevent highly precise test with high power & confidence.
Sample Size Calculations
• Common errors with “experts,” calculators, tables: failure to consider statistical power, unequal group size,
population size.
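• Sample size calculations for t and Z tests of means can be performed with a tool such as G*Power5 or with the formula:

$$n = \frac{r+1}{r} \cdot \frac{\sigma^2 \left(Z_\beta + Z_{\alpha/2}\right)^2}{e^2} = \frac{r+1}{r} \cdot \frac{\sigma^2 \left(Z_\beta + Z_{\alpha/2}\right)^2}{(d\sigma)^2} = \frac{r+1}{r} \cdot \frac{\left(Z_\beta + Z_{\alpha/2}\right)^2}{d^2}$$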

where, n = sample size per group (assuming equal sized groups) or sample size of the smaller group (the larger group then has r × n members)

r = ratio of larger group to smaller group (note: if two groups are of equal size, the expression (r + 1)/r = 2)

σ = population standard deviation for the dependent variable

e = difference in means (target difference) to detect

Z = standardized normal distribution critical value given the probabilities, α and β

d = Cohen’s6 d; represents the target difference as a fraction of the standard deviation, σ (d = e/σ); can be substituted for e and σ
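
The Z-based formula can be checked with a short Python sketch. The function name `sample_size_two_means` is an assumption for the example, and n is treated here as the size of the group that r multiplies (the other group has r × n members). Because the formula uses the normal approximation, a t-based tool such as G*Power may return a slightly larger n.

```python
import math
from statistics import NormalDist


def sample_size_two_means(d, alpha=0.05, power=0.80, r=1.0):
    """Sample size n from n = (r + 1)/r * (Z_beta + Z_alpha/2)^2 / d^2.

    d     -- Cohen's d (target difference divided by sigma)
    alpha -- level of significance (two-tailed)
    power -- 1 - beta
    r     -- ratio of group sizes (1.0 for equal groups)
    """
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = .05
    z_beta = z.inv_cdf(power)              # about 0.84 for power = .80
    n = (r + 1) / r * (z_beta + z_alpha) ** 2 / d ** 2
    return math.ceil(n)                    # always round up


for label, d in [("small", 0.20), ("medium", 0.50), ("large", 0.80)]:
    print(label, d, sample_size_two_means(d))
```

Halving the effect size roughly quadruples the required sample, which is why the choice of d (or e and σ) deserves a written rationale.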


Sample Size: t test of means, G*Power
Effect size from Cohen6 for difference in means:

effect size may be an operationally relevant difference that someone cares about

d = percentage of population σ
requires knowledge of population σ
or, an estimate of population σ

or, real-world descriptions of effect size:6


Medium effect = .50
“an effect likely to be visible to the naked
eye of a careful observer”
Small effect = .20
“noticeably smaller than medium but not
so small as to be trivial”
Large effect = .80
“equally distant from medium as small is”
Sample Size: Multiple Regression
• Sample size calculations for MLR
are complex.
• Use G*Power5
• Common issues:
• Choice of test (for multiple regression, need the random model).
• Number of predictors including
factor-interactions.
• Use & rationale of effect size.6
• H0: no effect; HA: specified effect
• population squared correlation coefficient = ρ²
• small: ρ² = 0.02
• medium: ρ² = 0.13
• large: ρ² = 0.26
Sample Size: ANOVA
• Sample size calculations for ANOVA are
complex.
• Use G*Power5
• Common issues:
• Example: 3 factors with levels of 4, 3, 2
• Number of groups should be i × j × k where i, j, k represent levels for factors I, J, K (e.g., 4 × 3 × 2 = 24 [combinations of levels])
• Numerator df should be number of levels for the factor with the most levels, less one (e.g., k − 1 = 3)
• Use & rationale of effect size6 (f: a function of population standard deviation).
• small: f = 0.10
• medium: f = 0.25
• large: f = 0.40
Sample Size: Additional Considerations
• Calculate minimum sample size
using appropriate technique and tool.
• Example from multiple regression:
• n = 120 (for your α, β, e or d)
• Response rate (for surveys performed manually):
• Research good estimate for your survey (rate of valid, complete questionnaires).
• Example: 20% => need 120 ÷ 0.20 = 600. Send out 600 to get 120.
• Consider rate of valid/complete samples.
• May have some questionnaires that have missing data, outliers, some corruption.
• Example: expect 10% reject rate => 90% valid => need 120 ÷ 0.90 ≈ 133.
• Re-calculate number to send: 133 ÷ 0.20 = 665.
• Bottom line: Challenging to recover after the survey is conducted if there are not enough valid, complete questionnaires to meet min sample size.
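
The response-rate arithmetic above can be wrapped in a small helper. This is a sketch under stated assumptions: the function name `questionnaires_to_send` is hypothetical, and it rounds up at every step, so with a 90% valid rate it returns 134 returns and 670 questionnaires, slightly more conservative than the rounded 133 and 665 shown above.

```python
import math
from fractions import Fraction


def questionnaires_to_send(min_sample, response_rate, valid_rate=1.0):
    """Number of questionnaires to send so that enough valid ones come back.

    min_sample    -- minimum sample size from the power analysis
    response_rate -- expected share of questionnaires returned (e.g., 0.20)
    valid_rate    -- expected share of returns that are valid and complete
    """
    # Exact fractions avoid floating-point surprises when rounding up.
    needed_returns = math.ceil(min_sample / Fraction(str(valid_rate)))
    return math.ceil(needed_returns / Fraction(str(response_rate)))


print(questionnaires_to_send(120, response_rate=0.20))
print(questionnaires_to_send(120, response_rate=0.20, valid_rate=0.90))
```

Running the calculation before data collection makes the cost of a low response rate visible while there is still time to budget for it.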

Statistical Tools & Hypothesis Testing
Data Preparation: Missing Data
• Missing Data (parts of records missing or corrupted)
• Remedies (many, including these):
• Prevent missing data through careful collection.
• Collect sufficient number of samples.
• Use researcher judgment.
• Listwise deletion: eliminate record.
• Mean substitution:
• substitute the mean value for the variable for that record.
• Regression imputation:
• replace with prediction from regression (requires some iteration).
• SPSS runs some of these routines to automate the missing data remedy.

Example data set with missing or corrupted entries (20 records, four unlabeled columns):

Record   (1)       (2)       (3)       (4)
1        44.83     114.65    1.18      115.84
2        46.09     117.18    0.03      117.22
3        49.54     124.07    -1.42     122.65
4        50.45     125.90    1.02      126.93
5        missing   126.22    3.74      129.96
6        52.31     129.62    -0.13     129.48
7        50.81     126.61    0.26      967.20
8        50.27     125.54    -2.65     122.90
9        50.07     125.14    -1.51     123.63
10       41.62     108.24    0.40      108.65
11       55.25     135.50    missing   136.39
12       49.20     123.41    -0.11     123.30
13       49.29     123.58    -0.14     123.45
14       47.83     120.65    3.78      124.43
15       0.23      118.18    2.23      120.41
16       46.71     118.41    -1.95     116.46
17       46.18     missing   0.68      118.05
18       54.55     134.11    4.34      138.45
19       49.93     124.87    -0.43     missing
20       48.63     122.26    -0.19     122.07
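
Two of the remedies, listwise deletion and mean substitution, are simple enough to sketch directly. The code below is an illustration rather than the SPSS routine; the function names are assumptions, missing cells are represented by `None`, and the four records are a small illustrative subset.

```python
from statistics import mean


def listwise_delete(records):
    """Keep only the records that have no missing (None) values."""
    return [rec for rec in records if all(v is not None for v in rec)]


def mean_substitute(records):
    """Replace each missing value with the mean of its column's observed values."""
    columns = list(zip(*records))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(rec)]
        for rec in records
    ]


# Illustrative records; None marks a missing cell.
records = [
    [44.83, 114.65, 1.18, 115.84],
    [None, 126.22, 3.74, 129.96],
    [55.25, 135.50, None, 136.39],
    [49.20, 123.41, -0.11, 123.30],
]
print(len(listwise_delete(records)), "complete records remain")
print(mean_substitute(records)[1])
```

Listwise deletion costs sample size (here half the records), while mean substitution keeps every record but shrinks the variable's variation; which trade-off is acceptable is a researcher judgment.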
Data Preparation: Outliers (Part 1)
• Outlier: an observation lying an abnormal distance from other values in a random sample; outside expected range; may be defined to be abnormal.
• Judgment: Useful information or corrupted data?
• Detection & Criteria:
• Graphical analysis
• Subject matter judgment
• Distance from the mean
• In normal distribution, 99.7% of items are within ±3σ of the mean.
• Any value beyond mean ± 3σ may be an outlier.
• Cook's Distance statistic > 1 or > (4 ÷ n) where n = sample size
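
The distance-from-the-mean criterion can be sketched as a simple screen. The function name `flag_outliers` and the data are illustrative assumptions; the sample standard deviation stands in for σ, and a flagged value still needs graphical review and subject matter judgment before any remedy is chosen.

```python
from statistics import mean, stdev


def flag_outliers(values, k=3.0):
    """Return the values lying more than k standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation as the estimate of sigma
    return [v for v in values if abs(v - m) > k * s]


# Illustrative measurements; one value looks like a data-entry error.
values = [
    115.84, 117.22, 122.65, 126.93, 129.96, 129.48, 967.20, 122.90, 123.63,
    108.65, 136.39, 123.30, 123.45, 124.43, 120.41, 116.46, 118.05, 138.45,
    122.07,
]
print(flag_outliers(values))
```

Note that an extreme value inflates both the mean and the standard deviation it is compared against, so in small samples this screen can miss outliers that a plot would show clearly.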
Data Preparation: Outliers (Part 2)
• Implications and remedies:
• Explore the record and determine why it is an outlier.
• True value?
• Error or other corruption?
• Use researcher judgment:
• Useful information?
• Statistical or anecdotal?
• Unwanted statistical impacts on analysis?
• Consider the statistical test:
• Some include “no outliers” as an assumption.
• Remedies:
• Have sufficient sample size to provide options.
• Ignore the outlier and accept it as a valid sample record.
• Eliminate the record.
• Report anecdotally on the phenomenon.
• Change the value of the outlier to overcome an error with trimming, flooring, and capping.
• Accomplish statistical test with and without the outlier—compare outcomes.
Statistical Tools and Techniques

• Most of these tools employ parametric hypothesis tests.
• Parametric tests rely on the assumption that the data are distributed normally.
• Scholarly research favors parametric tests because of their ability to provide an understandable, probabilistic test of H0, and control over Type I and II errors.
• But, if the data do not subscribe to the normality assumption, some non-parametric tests are valid and useful.
Statistical Assumptions
• Every statistical test relies on assumptions about the data set.
• t test assumptions:
• Nominal independent variable (A, B groups).
• Continuous, numerical dependent variable.
• Each observation of the dependent variable is independent of the other observations.
• Measurements for each sample not influenced by or related to the measurements of other subjects.
• Example: time-related influences (learning curve).
[Figure: Response, Y vs. Sequence of Observation]
• Dependent variable has a normal distribution.
[Figure: histogram, Frequency vs. Bin]
• ANOVA assumptions (same, except . . .)7
• Homogeneity of variance among all groups.
• Tested with Levene’s test.
Linear Regression Assumptions
• numerical variables
• linearity: a straight-line relationship between the IVs and the RV.
• independence: the values of the residuals are independent; no autocorrelation (Durbin-Watson test).
• homoscedasticity: the variation of the residuals (error terms) is constant for all values of the IVs.
• absence of multicollinearity: no relationship among IVs (tested with variance inflation factors [VIFs]).
• normally distributed residuals (normal probability plot).
• no influential cases: no significant outliers (Cook’s distance).
[Figure: scatter plot of Y vs. X]
Hypothesis Testing
• Start with a premise: e.g., eastern basketball players are taller than western.
• This is what we are trying to prove is true through our research & analysis.
• We call that a research hypothesis or the alternate hypothesis.
• We “prove” our premise by demonstrating that the converse is not likely.
• The converse: e.g., there is no difference in heights, east vs. west.
• We call that the null hypothesis.
• We sample, collect data, calculate means and variation, and calculate a test
statistic (e.g., t or Z statistic and p value).
• Compare the test statistic to a critical value (or compare p to α).
• If test statistic > critical value or p < α . . .
• . . . the sample (+ statistics) is not likely given that the null is true.
• In other words, the null hypothesis is not likely.
• We “reject the null; declare sufficient evidence exists that the alt is true.”
• Otherwise, we “fail to reject the null; insufficient evidence for the alt.”
• What then?
We do not . . . WE NEVER . . . “accept the alternate hypothesis.”
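
The decision rule can be sketched with a one-sided, two-sample Z test. Everything specific here is an assumption for illustration: the heights are hypothetical, the population σ is treated as known and common to both groups, and the function name `one_sided_z_test` is invented for the example. With small samples or an unknown σ, a t test is the usual choice.

```python
import math
from statistics import NormalDist


def one_sided_z_test(mean_a, mean_b, sigma, n_a, n_b, alpha=0.05):
    """Test H0: mean A = mean B against HA: mean A > mean B (known sigma)."""
    std_error = sigma * math.sqrt(1 / n_a + 1 / n_b)
    z = (mean_a - mean_b) / std_error            # test statistic
    critical = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    p = 1 - NormalDist().cdf(z)                  # one-sided p value
    decision = "reject the null" if p < alpha else "fail to reject the null"
    return z, critical, p, decision


# Hypothetical sample means (inches) for eastern vs. western players.
z, critical, p, decision = one_sided_z_test(78.5, 77.0, sigma=3.0, n_a=50, n_b=50)
print(f"z = {z:.2f}, critical value = {critical:.3f}, p = {p:.4f}: {decision}")
```

Comparing z with the critical value and comparing p with α are the same test expressed two ways, so the two comparisons always lead to the same decision.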
The Concept of p Value
• Definition: p value = probability of observing an effect at least as large as the one in the sample, by chance alone, when the null hypothesis is true.
• We put a case of really good wine on a bet:
• You claim eastern BBers are taller than western.
• I say, “Let’s measure every BBer.”
• You say, “Before I take that bet (and incur the cost),
let’s take a small sample of sample size n with α = .05.”
• Alt hypothesis: eastern BBers taller; null: they’re equal.
• Sample yields a p value of .01 < α = .05.
• Meaning, 1% chance of the sample given that the null hyp is true.
• So, since we took the sample, not likely that the null hyp is true.
• You say, “I’ll make that bet!”
• Based on the sample and the p value, we reject the null hyp and conclude there
is sufficient evidence that the alt is true (eastern BBers are taller). But wait!!!!
The Concept of p Value (part 2)
• What if p = .05 (5%)? .10? .50? .75?
• Choice of α matters (the level of significance).
• What if the bet was your car? Your house?
• Consequences of a Type I or Type II error matter.
• What about if p = .05, but the actual difference was 1” with effect size 3”?
• Effect size (margin of error, precision) matters.
• What if p = .07 and your statistical power (1 − β) = .80?
• You might have preferred a bit more power (detecting a true effect).
• Sample size has consequences.
• α, β, e, σ matter; these are important choices made by the researcher/analyst.

Any questions?

ANOVA & Regression
Advanced Tools
• Comparison of two groups: t or Z test, correlation, simple hypothesis tests
• When question is, “which factors are influential?” or multiple group comparison . . .
• . . . different, more advanced, related set of tools
Assumptions compared across ANOVA, Multiple Linear Regression (MLR), and Binary Logistic Regression (BLR):
• DV: numerical (ANOVA, MLR); categorical: binary (BLR)
• Residuals: normally distributed
• Variance: homogeneity of variance among groups (ANOVA); homoscedasticity (constant variation of residuals across all values of IVs) (regression)
• IVs: categorical, ≥ 2 groups, values, levels (ANOVA); numerical, convert categorical to numerical (regression)
• Independence: independent observations (ANOVA); independent residuals (regression)
• No multicollinearity among IVs
• No outliers
• Linear relationship: no (ANOVA); DV to IVs (MLR); logit to IVs (BLR)
• Model-building: yes, to find best predictive model (significant predictors)
Multiple Regression

[Diagram: inputs => Real-World phenomenon, process, system, mechanism, treatment, transformation, influence, correlation, or relationship => outputs]
• Inputs: numerical independent variables + factor interactions.
• Output: numerical dependent variable (except for binary logistic regression).
• Names for inputs: independent variables, explanatory variables, predictors, causes.
• The link between inputs and outputs: an “effect.”
• Names for outputs: dependent variables, outcome variables, responses, response variables, effects.
Regression Analysis: Key Features
• All data collection instruments (experiments, observation, surveys).
• Y = β0 + β1X1 + ⋯ + βkXk + ε
• k predictors which may include factor-interactions
• coefficient for each predictor = sensitivity of Y to unit change in predictor
• ε is the error or noise in the phenomenon not accounted for by the predictors
• Predictive model: error (residual) = difference between actual and predicted Y
• One research question & hypothesis pair for each dependent variable.
• Regression hypotheses:
Null (H0): β1 = β2 = ⋯ = βk = 0 (no significant relationship between Y and any predictor)
Alt (HA): at least one βj ≠ 0
• Hypothesis tests: F test (and p value) for model; t test for predictors
• Adjusted R² is measure of goodness-of-fit: percentage of variation in Y attributed to the model (of predictors).
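
Adjusted R² can be reproduced from the residuals of any fitted model. The sketch below uses the standard definitions R² = 1 − SSE/SST and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1); the function name `adjusted_r_squared` and the actual and predicted values are hypothetical, standing in for output from a tool such as SPSS.

```python
def adjusted_r_squared(actual, predicted, k):
    """Return (R squared, adjusted R squared) for a model with k predictors."""
    n = len(actual)
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    y_bar = sum(actual) / n
    # Residual = difference between actual and predicted Y.
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    sst = sum((y - y_bar) ** 2 for y in actual)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj_r2


# Hypothetical actual and model-predicted values of Y for a one-predictor model.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, adj_r2 = adjusted_r_squared(actual, predicted, k=1)
print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}")
```

The adjustment penalizes each added predictor, which is why adjusted R² rather than R² is the better yardstick when comparing models during model-building.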
Regression Execution
• Complex; requires statistical app such as SPSS
• Steps:
• Prep data in Excel
• Copy and paste to SPSS (paste with variable names)
• Analyze / Regression / Linear
• Choose dependent & independent variables
• Choose statistics
• Choose options
• Choose method
• Enter (manual; all predictors)
• Backward [elimination]
• Forward [selection]
• OK: run the analysis
• Check assumptions
• Perform model-building
Regression Output (SPSS)
Binary Logistic Regression
• Special case of multiple regression analysis: binary dependent variable.
• Instead of Y . . . Logit = Li = B0 + B1X1 + . . . + BKXK
• “Lōgit” is a mathematical function of the odds of one value of the binary Y
• Predict the logit then convert it to the probability of one outcome versus the other.
• Otherwise, steps the same as MLR, except . . .
• Analyze / Regression / Binary Logistic
• Choose dependent & independent variables
• Choose statistics
• Choose options
• Choose method
• Enter (manual; all predictors)
• Backward [elimination]
• Forward [selection]
• OK: run the analysis
• Perform model-building
Binary Logistic Regression Output (SPSS)
• Evaluate significance of the full model:
• χ² = 50.452, p < .001 (displayed by SPSS as .000) => full model different from constant-only (null) model. Model is a significant predictor of Y.
• Evaluate strength of association between the model (all IVs) and Y using the Model Summary table.
• Based on *Nagelkerke’s R² = .042: 4.2% of variation in Y is attributed to the model. The model is a significant predictor of Y, but there are other IVs that may be significant predictors.
• [*] Used Nagelkerke because it is normalized to produce values between 0 and 1, as in R² used in conventional regression analysis.
Binary Logistic Regression Output (part 2)
• Evaluate strength of association between each IV and Y using the Variables
in the Equation table.
• Use the Wald ratios for each IV and associated p values:
• χ₁² = 26.711, p < .001 and χ₂² = 24.350, p < .001 respectively; conclude both IVs are significantly different from even odds (the null model) and significant predictors of Y.
• Logit = Li = B0 + B1X1 + . . . + BKXK
• Logit increases (or decreases) by Bi for a unit increase in predictor, Xi.
• Given values for predictors, calculate Li; convert to probability of outcome.

$$p_i = \frac{e^{L_i}}{1 + e^{L_i}}$$

• e.g., probability of a 30 year old male owning a gun is .314, or 31.4%.
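
The conversion from logit to probability is a one-line calculation once the coefficients are known. In this sketch the function names and the coefficient values are hypothetical (the fitted values behind the gun-ownership example are not listed here), so the printed probability is illustrative only.

```python
import math


def predict_logit(intercept, coefficients, predictors):
    """L = B0 + B1*X1 + ... + BK*XK for one record."""
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


def logit_to_probability(logit):
    """p = e^L / (1 + e^L), written in the equivalent form 1 / (1 + e^-L)."""
    return 1 / (1 + math.exp(-logit))


# Hypothetical coefficients for two predictors (age in years, male = 1).
logit = predict_logit(-1.5, [0.01, 0.4], [30, 1])
print(f"logit = {logit:.3f}, probability = {logit_to_probability(logit):.3f}")
```

A logit of 0 corresponds to even odds (p = .50); negative logits give probabilities below .50 and positive logits give probabilities above it.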
ANOVA

[Diagram: inputs => Real-World phenomenon, process, system, mechanism, treatment, transformation, influence, correlation, or relationship => outputs]
• Inputs: categorical [control] factors + factor interactions.
• Output: numerical dependent variable.
• Names for inputs: factors, control factors, explanatory variables, predictors, causes.
• The link between inputs and outputs: an “effect.”
• Names for outputs: dependent variables, outcome variables, responses, response variables, effects.
ANOVA: Key Features
• Often use data collection instruments (experiments, observation) with
controlled or fixed factors.
• Y = β0 + β1X1 + ⋯ + βkXk + ε (as in regression except categorical Xs)
• k predictors (control factors) which may include factor-interactions
• coefficient for each predictor = sensitivity of Y to unit change in predictor
• ε is the error or noise in the phenomenon not accounted for by the predictors
• Predictive model: error (residual) = difference between actual and predicted Y
• Hypotheses (one for each control factor: A, B, . . .)
Null (H0): μ1 = μ2 = ⋯ = μi = ⋯ = μm (means for all levels of A are equal)
where m = number of levels, values, or groups associated with factor A
Alt (HA): not all μi are equal.
• Hypothesis of the interaction between factors A and B (one for each):
Null (H0): the interaction of A and B is equal to zero
Alt (HA): the interaction of A and B is not equal to zero
ANOVA: Key Features (part 2)
• Hypothesis tests: F test (and p value) for model; t test for predictors
• Adjusted R² is measure of goodness-of-fit:
• percentage of variation in Y attributed to the model
• Factorial ANOVA often used in controlled conditions for balanced design*.
• Case: unique combination of factor levels.
• Each case is replicated.
• Example:
• min sample size = 232
• groups = 24
• sample size per group = 232 ÷ 24 ≈ 9.7
• round up to 10
• total sample size = N = 240
*SPSS can handle other designs.

Case   A (2 levels)   B (3 levels)   C (4 levels)   Y (dep var)
1      e              g              j
2      e              g              k
3      e              g              m
4      e              g              p
5      e              h              j
6      e              h              k
7      e              h              m
8      e              h              p
9      e              i              j
10     e              i              k
11     e              i              m
12     e              i              p
13     f              g              j
14     f              g              k
15     f              g              m
16     f              g              p
17     f              h              j
18     f              h              k
19     f              h              m
20     f              h              p
21     f              i              j
22     f              i              k
23     f              i              m
24     f              i              p
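
The balanced-design arithmetic in the example can be sketched as follows. The function name `balanced_design` is an assumption for the illustration; the inputs are the minimum sample size from the power analysis and the number of levels of each factor.

```python
import math


def balanced_design(min_sample, levels):
    """Return (groups, sample size per group, total N) for a balanced factorial design."""
    groups = math.prod(levels)                   # unique combinations of factor levels
    per_group = math.ceil(min_sample / groups)   # round up so every case is covered
    return groups, per_group, groups * per_group


groups, per_group, total = balanced_design(232, [2, 3, 4])
print(f"groups = {groups}, per group = {per_group}, total N = {total}")
```

Rounding up per group means the total sample (240) exceeds the calculated minimum (232); the extra records are the price of keeping the design balanced.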
ANOVA Execution
• Complex; requires statistical app such as SPSS
• Steps:
• Prep data in Excel
• Copy and paste to SPSS (paste with variable names)
• Analyze / General Linear Model / Univariate
• Choose dependent variable & fixed factors
• Choose model
• Build terms:
• Main effects [primary factors]
• All 2-way [interactions]
ANOVA Execution (part 2)
• Choose EM [Marginal] Means
• For interaction graphs
• Choose options:
• Descriptive statistics
• Homogeneity tests
• Levene’s test
• Homogeneity of variance
• OK: run the analysis
• Check assumptions
• Perform model-building
ANOVA Output (SPSS)
Factor Interactions (Regression)
• Provides additional insights; qualifies the effect of B on Y (“it depends!”).
• “The influence of B (relationship between B & Y) depends on the value of D.”
• Interaction indicated by non-parallel lines (tested statistically).
Always test significance of at least the two-factor interactions.
[Figure: depiction of a 2FI, Y-predicted vs. B (35.0 to 95.0) when D = min, mean, max; Y-pred axis from 10.0 to 30.0. This is a regression example; same concept for ANOVA.]
Y = hardness of steel
B = temperature the alloy is heated to
D = percentage of element D
Hardness increases with increased temperature . . .
But, that depends on the percentage content of D.
When D is at its lowest level, hardness increases steeply with temperature.
Less so with an average % of D.
And, hardness decreases with increased temperature at highest % of D.
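A minimal sketch of how an interaction term produces the non-parallel lines described here. In a model Y = b0 + b1·B + b2·D + b3·B·D, the slope of Y against B is b1 + b3·D, so it changes with D. The coefficients and the three values of D below are hypothetical, chosen only to reproduce the pattern (steep, flatter, negative); they are not fitted to the slide's data.

```python
def predict_hardness(b, d, b0=5.0, b1=0.4, b2=1.0, b3=-0.1):
    """Predicted Y for temperature b and percentage d (hypothetical coefficients)."""
    return b0 + b1 * b + b2 * d + b3 * b * d

def slope_in_b(d, b1=0.4, b3=-0.1):
    """Change in predicted Y per unit change in B at a fixed value of D."""
    return b1 + b3 * d

# Hypothetical min, mean, and max percentage of element D.
d_levels = {"min": 1.0, "mean": 3.0, "max": 5.0}

for label, d in d_levels.items():
    line = [round(predict_hardness(b, d), 1) for b in (35.0, 65.0, 95.0)]
    print(label, "slope =", round(slope_in_b(d), 2), "Y-pred at B = 35, 65, 95:", line)
```

With b3 = 0 the three lines would be parallel; testing whether b3 differs from zero is the statistical test of the interaction.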
Factor Interactions (ANOVA)
• “The influence of X (relationship between X & Y) depends on the value of C.”
• Provides additional insights; qualifies the effect of X on Y (“it depends!”).
• Again, interaction indicated by non-parallel lines (tested statistically).

Y = socioeconomic index
X = marital status (5 levels or groups)
C = American citizen (2 levels: yes or no)

Non-parallel lines => possible interaction (X*C)


Look at two cases:
Widowed and not a citizen.
Separated and not a citizen.
Refer back to ANOVA outputs to interpret the
interaction.

Any questions?

Regression Model-Building & Interpretation
Model-Building
• For multi-attribute research & analysis techniques (regression, ANOVA)
• Interrelated questions which generate research:
• Which independent variables or factors are significant predictors of a response?
• Sensitivity of response to various predictors?
• Issues with real-world phenomena:
• Incompletely understood and complex.
• Unknown causes; lurking influences; interactions.
• Sampling yields less than 100% certainty.
• The influence of a predictor depends on the presence of other predictors.
• Better questions:
• What model is the best predictor of a response? (and, is it a significant predictor?)
• What set of predictors makes up that model? (factors & interactions)
Defenses Against Uncertainty
• Define the problem (including the variables).
• Plan the approach.
• Choose an appropriate instrument.
• Adequate sample size.*
• Rigorous sampling.
• Careful data collection.
• Correct analysis tool.
• Proper use of the tool.
• Model-building: recognize one run of SPSS will not answer the question.
• Model validation & stability investigations, using bootstrapping.⁸
• Thorough explanation & interpretation, including subject matter expertise.
*Some recommend sample events-per-variable (EPV) > 25 with some stability investigation; never < 10.⁸,⁹
Earlier: 5 predictors, med effect size, α = .05, power = .90 => n = 120; EPV of 25 => n’ = 5 × 25 = 125.
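The EPV check in the footnote is simple arithmetic, sketched here under the assumption that the working sample size is the larger of the power-analysis result and the EPV result (that rule and the function name are illustrative, not stated in the sources):

```python
def n_for_epv(n_predictors, epv):
    """Sample size implied by a target events-per-variable (EPV)."""
    return n_predictors * epv

n_predictors = 5
n_power = 120                                # from the earlier power analysis
n_epv = n_for_epv(n_predictors, 25)          # recommended EPV of 25
n_floor = n_for_epv(n_predictors, 10)        # EPV should never fall below 10

n_required = max(n_power, n_epv)             # assumption: take the stricter requirement
print(n_epv, n_floor, n_required)
```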
Model-Building Strategy
• Factor screening (theory + SMEs + initial statistical investigation).
• Appropriate variable selection criteria (inclusion criteria, or αB).
• Understand Type I and II errors related to models.
• Focus more on model correctness, completeness, and goodness-of-fit than on variable significance.¹⁰
• Avoid reliance on automated stepwise methods*; but use as evidence.
• Use multiple model-building techniques.¹¹
• Develop collaborative evidence to develop and choose the best model.
• Try different combinations of predictors.
• Investigate factor interactions.
*Documented flaws: included predictors highly dependent on the order of selection or elimination.
Four Stages of Model-Building
• Stage 1: Theory, previous research, empirical results, subject matter expertise to identify candidate independent variables.
• Stage 2: Screening: perform multiple analyses to identify and eliminate independent variables highly unlikely to be significant predictors.
• Stage 3: Analyze remaining independent variables + 2FIs.
• Stage 4: Compare results from all stages and all model-building techniques to decide on a final predictive model.
• Stages 2 & 3 employ four regression techniques collaboratively . . .
• . . . to generate evidence to select a final predictive model.
[Figure: Stage 1, Stage 2, Stage 3, Stage 4, Final Model]

It’s all about variable selection!


Regression Technique 1 (of 4)
(1) Best-subsets regression
• Assesses all predictor combinations
• Mallows’ CP ¹,¹²
• Compares bias & precision between a fitted model and a full model (all predictors)
• Small CP : precise but biased
• Large CP : lacking in precision
• Want CP close to or < k + 1
• Adjusted R2
• Model goodness-of-fit
• Higher is better
• Consider models with . . .
• highest adjusted R2
• and, right CP

[Table: best-subsets output, two best models for each number of predictors. The X columns marking which of the 11 candidate predictors enter each model are not reproduced.]
Vars  R-Sq  R-Sq (adj)  R-Sq (pred)  Mallows Cp
1     15.8  15.4        13.7         24.5
1     13.4  13.0        11.3         31.2
2     20.0  19.2        17.1         14.8
2     18.9  18.1        16.0         17.9
3     21.4  20.3        17.6         12.9
3     21.4  20.2        17.7         13.0
4     22.7  21.2        18.2         11.2
4     22.6  21.1        18.1         11.4
5     24.0  22.2        18.6         9.5
5     23.9  22.1        18.7         9.9
6     24.7  22.5        18.9         9.6
6     24.6  22.4        18.6         9.9
7     25.8  23.3        19.0         8.6
7     25.5  23.0        18.8         9.4
8     26.5  23.7        19.3         8.5
8     26.4  23.5        19.2         8.9
9     27.1  23.9        19.2         9.0
9     26.9  23.7        19.0         9.3
10    27.4  23.8        18.9         10.1
10    27.2  23.6        18.5         10.7
11    27.4  23.5        17.6         12.0
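The two selection statistics in this output can be computed from quantities any regression run reports. A minimal sketch, assuming the common textbook definitions (k = number of predictors, n = sample size, SSE = residual sum of squares, MSE of the full model = SSE_full / (n − k_full − 1)); the numbers in the example are hypothetical and are not taken from the table.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def mallows_cp(sse_subset, mse_full, n, k):
    """Mallows' Cp for a subset model with k predictors (k + 1 parameters)."""
    return sse_subset / mse_full - n + 2 * (k + 1)

# Hypothetical example: n = 100 observations, full model with 4 predictors.
n = 100
k_full = 4
sse_full = 190.0
mse_full = sse_full / (n - k_full - 1)

cp_full = mallows_cp(sse_full, mse_full, n, k_full)   # always equals k_full + 1
cp_subset = mallows_cp(200.0, mse_full, n, 2)         # a 2-predictor subset
print(cp_full, cp_subset, round(adjusted_r2(0.5, 101, 4), 4))
```

The full model always has CP = k + 1, which is consistent with the last row of the table (11 predictors, CP = 12.0); subset models are judged by how close their CP comes to their own k + 1.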
Regression Techniques 2 and 3
• Stepwise regression successively adds or
removes independent variables.
• Statistical regression is one form of stepwise.
• Automated variable selection.
(2) Backward elimination
• All variables are entered to start.
• Then sequentially removed.
• The variable with the smallest partial correlation with the dependent variable is considered for removal.
• Removed only if it satisfies the variable exit criterion.
(3) Forward selection
• No variables to start.
• Variables sequentially entered.
• The variable with the largest partial correlation with the dependent variable is considered for entry.
• Entered only if it satisfies the variable entry criterion.
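Forward selection can be sketched in a few lines. This is an illustration under stated assumptions, not SPSS's implementation: the entry criterion is an F-to-enter threshold of 4.0 (SPSS defaults to a p-value criterion instead), the least-squares fit uses plain normal equations, and the data and names are hypothetical. The candidate that leaves the smallest residual sum of squares is the one with the largest partial correlation with the dependent variable.

```python
def fit_sse(columns, y):
    """Least-squares SSE for y on an intercept plus the given predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    aug = [[sum(row[r] * row[c] for row in design) for c in range(p)]
           + [sum(row[r] * yi for row, yi in zip(design, y))]
           for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[r][p] / aug[r][r] for r in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def forward_selection(predictors, y, f_to_enter=4.0):
    """Return predictor names in order of entry (hypothetical F-to-enter rule)."""
    n = len(y)
    selected = []
    remaining = list(predictors)
    sse_current = fit_sse([], y)          # intercept-only model to start
    while remaining:
        trial = {name: fit_sse([predictors[s] for s in selected + [name]], y)
                 for name in remaining}
        best = min(trial, key=trial.get)  # largest partial correlation with y
        df_resid = n - (len(selected) + 1) - 1
        f_stat = (sse_current - trial[best]) / (trial[best] / df_resid)
        if f_stat < f_to_enter:           # entry criterion not met: stop
            break
        selected.append(best)
        remaining.remove(best)
        sse_current = trial[best]
    return selected

# Hypothetical data: y depends on x1 and x2; x3 is an unrelated indicator.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [5, 3, 8, 1, 9, 2, 7, 4, 10, 6]
x3 = [0, 0, 1, 1, 0, 0, 1, 0, 0, 1]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.1]
y = [2 + 3 * a - 2 * b + e for a, b, e in zip(x1, x2, noise)]

selected = forward_selection({"x1": x1, "x2": x2, "x3": x3}, y)
print(selected)
```

Backward elimination mirrors this loop: start with every predictor, then repeatedly drop the one whose removal raises SSE the least, as long as it meets the exit criterion.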
Regression Technique 4
(4) Sequential regression
• aka, purposeful selection model-building
• Manual stepwise regression with judgment . . .
• . . . and, iteration—different combinations.
• Informed by previous regression analyses.
• Consider variable inclusion criterion (αB) . . .
• . . . and, impact on adjusted R2.
• Continue with sequence and iteration . . .
• . . . until all terms (including interactions) meet inclusion criterion and improve adjusted R2.
Strategy: generate collective evidence;
avoid pitfalls with automated stepwise;
develop “best” predictive model.
Variable Selection Criteria: Two Issues
• 2 issues to address & balance:¹⁰
1. Overspecification of the model.
• Too many predictors for the sample size.
• Some are not true predictors.
• Noise and artifacts of random sampling.
• Higher variation in the average predicted response (Ŷ); mean may be true.
2. Missing variable bias.
• Did not identify or select a true predictor.
• Contributes to bias in average predicted response.
[Figure: true population distribution of Y]
• Some scholarly research supports greater concern for missing variable bias.
• Less stringent variable inclusion criterion (αB) than model α = .05.
• Akaike information criterion (AIC) equivalent of αB = .157.⁸,⁹
• Only with EPV > 100, consider αB ≤ .05.
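Where the .157 comes from: AIC charges a penalty of 2 per added parameter, so one extra predictor lowers AIC only when its likelihood-ratio chi-square statistic (1 degree of freedom) exceeds 2. The matching significance level is the upper-tail probability P(χ² > 2). A minimal sketch using only the standard library; the function name is illustrative.

```python
from math import erfc, sqrt

def chi2_1df_upper_tail(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    # A chi-square(1) variable is a squared standard normal,
    # so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return erfc(sqrt(x / 2.0))

alpha_aic = chi2_1df_upper_tail(2.0)             # AIC-equivalent inclusion level
alpha_conventional = chi2_1df_upper_tail(3.841)  # cutoff behind the usual .05
print(round(alpha_aic, 3), round(alpha_conventional, 3))
```

Seen this way, selecting at .05 demands a chi-square of about 3.84 per predictor, nearly twice the AIC penalty, which is why it is the more stringent criterion.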

What Happens After the Analysis?
What Next?
• Statistics are not just numbers; they tell a story.¹³
• Someone is expecting both the numbers and the story.
• Several levels of rigorous statistical analysis:
• data: Using tools to provide graphical analysis, descriptive stats, hypothesis testing, model-building. Some stop here . . . .
• information: Presentation of outputs suggests relationships, patterns, observations about mean & variance, results of hypothesis tests, what adj R2 says, factor interactions. . . . . or, here.
• knowledge: Explanation of stats:
• Subject matter expertise, prior research, theory to explain and interpret statistical outputs.
• Corroboration or refutation of prior thinking.
• wisdom: Not what was found, but why?
• Correlation and causation. This is where you become a scholar.
• action: What real-world decisions are recommended?
What Might You Be Asked (Orals)?
• What’s the real-world issue at the heart of (the motivation for) your research?
• Why quantitative?¹⁴
• Why your choice of quantitative methodology?
• Where did your variables come from? Operational definitions?
• Rationale for statistical parameters (α, power, effect size)?
• Rationale for sampling technique?
• Rationale for instrument?
• How do you know your instrument is valid and reliable?
• Did you consider factor interactions?
• What do factor interactions tell you?
• What did you learn that was not previously known or understood?
• How can someone use your research, analysis, and findings?
• What are your plans for disseminating your research, analysis, and findings?
So, What’s It All Mean?
• Facts are stubborn, but statistics are more pliable – Mark Twain.¹⁵
• It is the mark of a truly intelligent person to be moved by statistics – George Bernard Shaw.¹⁵
• In ancient times they had no statistics so they had to fall back on lies – Stephen Leacock.¹⁵
• There is a good reason why quantitative research is used.
• There is an opportunity for valid, reliable, believable analysis.
• But, it is not guaranteed just because we choose a well respected tool.
• Only one person responsible here—the researcher/analyst/candidate.
• Burden of proof!!!
• You are now armed and dangerous; make the most of it; don’t stop researching
and asking the tough questions!
• I would say, “Good Luck,” but you don’t need luck. So, “All the best!”

To consult a statistician after an analysis is finished is merely asking him (or her) to conduct a post mortem examination.
He or she can perhaps say what the analysis died of.

− (paraphrased) Sir Ronald Aylmer (“R. A.”) Fisher

Plan your analysis rigorously and meticulously, then execute it properly.


Trying to recover after the data have been collected and analyzed is a fool’s errand.

Any final questions?


References (part 1): in order of use
1. Levine, D. M., Berenson, M. L., Krehbiel, T. C., & Stephan, D. F. (2011). Statistics for managers using MS Excel.
Boston, MA: Prentice Hall/Pearson.
2. Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques. Los Angeles: Sage.
3. Fisher, R. A. Experiments. In S. Ratcliffe (Ed.), Oxford Essential Quotations (2018). Oxford University Press.
https://www.oxfordreference.com/view/10.1093/acref/9780191866692.001.0001/q-oro-ed6-00004418
4. Bujang, M. A., Omar, E. D., & Baharum, N. A. (2018). A review on sample size determination for Cronbach’s
Alpha test: A simple guide for researchers. Malays J Med Sci, 25(6), 85–99. https://doi.org/10.21315/mjms2018.25.6.9
5. Faul, F., Erdfelder, E., Lang, A., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program
for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191.
6. Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum
Associates.
7. University of Alberta (2022). [Lecture notes on ANOVA assumptions].
https://sites.ualberta.ca/~lkgray/uploads/7/3/6/2/7362679/slides_-_anova_assumptions.pdf
8. Heinze, G., Wallisch, C., & Dunkler, D. (2018). Variable selection: A review and recommendations for the
practicing statistician. Biometrical Journal, 60, 431-449. https://doi.org/10.1002/bimj.201700067
9. Heinze, G. & Dunkler, D. (2017). Five myths about variable selection. Transplant International, 30, 6-10.
References (part 2)
10. Dranove, D. (2009, Spring). [MGMT 469 lecture notes on Model specification: Choosing the right variables for
the right hand side]. Department of Management and Strategy, Kellogg School of Management,
Northwestern University.
https://www.kellogg.northwestern.edu/faculty/dranove/htm/dranove/coursepages/Mgmt%20469/choosing%20variables.pdf
11. Nunez, E., Steyerberg, E., & Nunez, J. (2011). Regression modeling strategies. Revista Española de Cardiología
(English Edition), 64(6), 501-507. https://doi.org/10.1016/j.rec.2011.01.017
12. Pennsylvania State University (PSU), Eberly College of Science. (2018). Best subsets regression, adjusted R-Sq,
Mallows CP. https://online.stat.psu.edu/stat462/node/197.
13. James, B. (2016, April 1). Creating baseball fiction with numbers [Paper presentation]. Baseball Fiction
Academic Conference, Ottawa, KS, United States.
https://www.billjamesonline.com/creating_baseball_fiction_with_numbers
14. Cham, J. (2012). Your thesis committee [Image]. Piled Higher and Deeper.
https://phdcomics.com/comics/archive_print.php?comicid=1537
15. Koretsky, I. (2022). The chief storyteller. https://www.thechiefstoryteller.com/2017/09/27/15-funny-quotes-statistics-math-science
16. Education photo created by freepik. https://www.freepik.com/photos/education
