
Assessing Relationships: Regression Analyses
Lecture 7
February 25, 2020
Announcements
■ Midterm viewing on February 28; 11:00 AM – 1:00 PM in R2-013
– No cellphones/cameras allowed
■ Final exam date: April 28; 2:00 PM

Assessing relationships with continuous variables

Samples                    Parametric test        Non-parametric test
2 continuous variables     Pearson correlation    Spearman correlation (last week)
2+ continuous variables    Linear regression      Non-parametric regression
Regression analyses

Dependent variable    Independent variable    Parametric test        Non-parametric test
Continuous            Continuous              Linear regression      Non-parametric regression
Categorical           Continuous              Logistic regression    Non-parametric regression
Linear regression
■ Approach for modelling the linear relationship between a dependent variable (y) and one or more independent variables (X)
– Independent variable: predictor
– Dependent variable: outcome
■ Simple linear regression: one dependent variable and one independent variable (Y = β0 + β1X1 + ε)
■ Multiple linear regression: one dependent variable and more than one independent variable
■ Used to assess the strength of a relationship between predictor(s) and an outcome; can also be used to predict values of the outcome
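
A minimal sketch of fitting a simple linear regression in Python with statsmodels; the data here are simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: y depends linearly on x plus random noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)

# Fit Y = B0 + B1*X1 + E; add_constant supplies the intercept column
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()
print(model.params)  # [B0, B1], which should land near [2.0, 0.5]
```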
Significance of a linear model
■ Regression line: straight line that attempts to predict the relationship between two variables
■ Compares predictor(s) (X) and outcome (y), and the consistent change between those values
– β0 (y-intercept): point where the regression line crosses the y-axis (i.e. when x = 0); the constant
– β1 (slope): tells you how much y changes as you move along the values of x
■ Hypothesis testing in regression: tests the null hypothesis that β1 = 0
– Does the model predict the outcome?
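
As a hedged illustration (simulated data, not from the lecture), scipy's linregress reports the two-sided p-value for the null hypothesis that the slope is zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=40)
y = 1.0 + 0.8 * x + rng.normal(scale=2.0, size=40)

res = stats.linregress(x, y)
print(f"slope (B1) = {res.slope:.3f}, intercept (B0) = {res.intercept:.3f}")
print(f"p-value for H0: B1 = 0 -> {res.pvalue:.4g}")
# A small p-value rejects H0: B1 = 0, i.e. the model does predict the outcome
```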
Least Squares Criterion
■ Method of minimizing the sum of squared residuals in a model
■ Total Sum of Squares = Explained Sum of Squares + Residual Sum of Squares
– TSS = ESS + RSS

[Figure: scatter plot with fitted regression line; observed data point (xi, yi), predicted data point (xi, yp), residual = yi − yp]
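
A short numpy/statsmodels sketch (simulated data) verifying the decomposition TSS = ESS + RSS for an ordinary least squares fit with an intercept:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 5, size=30)
y = 3.0 + 1.2 * x + rng.normal(scale=0.8, size=30)

fit = sm.OLS(y, sm.add_constant(x)).fit()
y_pred = fit.fittedvalues

tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
ess = np.sum((y_pred - y.mean()) ** 2)  # explained sum of squares
rss = np.sum((y - y_pred) ** 2)         # residual sum of squares

print(tss, ess + rss)  # identical up to floating-point error
```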
Assumptions of linear regression
■ Residuals are normally distributed
■ Linear relationship between dependent and independent variables
■ No extreme outliers
■ Ample sample size for the variables in the model
– At least 10 cases per independent variable
■ No multicollinearity between independent variables (predictors)
– Occurs when there is too strong a correlation between predictors
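
One way to screen two of these assumptions in Python; this is a sketch of my own choosing (Shapiro-Wilk for residual normality, variance inflation factors for multicollinearity), not a method prescribed by the lecture:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated predictors, deliberately collinear to trigger the VIF check
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=100)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# Residual normality: p > 0.05 is consistent with normally distributed residuals
stat, p = stats.shapiro(fit.resid)
print("Shapiro-Wilk p =", p)

# Multicollinearity: a VIF above roughly 5-10 flags a problematic predictor
for i in (1, 2):  # column 0 is the constant, so skip it
    print(f"VIF for x{i}:", variance_inflation_factor(X, i))
```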
Multiple linear regression
■ Multiple predictor variables (referred to as “covariates”)
■ Let’s say we want to assess the relationship between LDL cholesterol (outcome) and dietary fat intake (main predictor), but we also want to consider age, waist circumference, protein intake, and physical activity level in the analysis (all covariates)
– Can determine which variables are significant predictors of LDL (or are significantly associated with LDL) with this approach
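
A sketch of what this model could look like with statsmodels' formula API; the data frame, column names, and units below are invented to mirror the example:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset mirroring the LDL example (all values simulated)
rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({
    "fat": rng.normal(70, 15, n),      # dietary fat intake, g/day (assumed units)
    "age": rng.integers(30, 70, n),    # years
    "waist": rng.normal(90, 12, n),    # waist circumference, cm
    "protein": rng.normal(80, 20, n),  # protein intake, g/day
    "pa": rng.normal(3, 1, n),         # physical activity level (arbitrary scale)
})
df["ldl"] = 1.5 + 0.02 * df["fat"] + 0.01 * df["waist"] + rng.normal(0, 0.4, n)

# LDL as the outcome; fat is the main predictor, the rest are covariates
fit = smf.ols("ldl ~ fat + age + waist + protein + pa", data=df).fit()
print(fit.summary())  # per-covariate coefficients and p-values
```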
Confounders

A variable that is associated with an exposure and, independent of that association, is also a risk factor for the outcome. A confounder distorts the estimate of the association between the independent and dependent variables.

Confounders can be controlled for in statistical analyses, but only if the variable has been measured during data collection! Residual confounding is something to consider in observational studies.
Identifying Confounders
■ General approach: conduct a simple linear regression between a predictor variable and an outcome variable
■ Repeat the regression with the potential confounder (covariate) now included in the model
■ The variable is considered a confounder if:
– A p-value that was initially significant is attenuated (i.e. no longer significant)
– The β coefficient for your main predictor variable changes by more than 10%
■ Example: the relationship between systolic blood pressure and total fat intake is confounded by sodium intake
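
A hedged sketch of that crude-versus-adjusted comparison, using simulated data in which sodium drives both fat intake and systolic blood pressure (variable names and units are assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a confounder: sodium raises both fat intake and systolic BP
rng = np.random.default_rng(4)
n = 300
sodium = rng.normal(3000, 600, n)               # mg/day
fat = 40 + 0.01 * sodium + rng.normal(0, 5, n)  # g/day
sbp = 90 + 0.01 * sodium + rng.normal(0, 5, n)  # mmHg
df = pd.DataFrame({"sbp": sbp, "fat": fat, "sodium": sodium})

crude = smf.ols("sbp ~ fat", data=df).fit()              # without the covariate
adjusted = smf.ols("sbp ~ fat + sodium", data=df).fit()  # with the covariate

b_crude = crude.params["fat"]
b_adj = adjusted.params["fat"]
pct_change = abs(b_adj - b_crude) / abs(b_crude) * 100

print(f"crude B(fat) = {b_crude:.3f}, p = {crude.pvalues['fat']:.3g}")
print(f"adjusted B(fat) = {b_adj:.3f}, p = {adjusted.pvalues['fat']:.3g}")
print(f"coefficient change = {pct_change:.1f}% (>10% suggests confounding)")
```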
Thought Experiment
■ Think of a variable that could confound the following relationships:
– Alcohol intake and heightened risk of heart attack (e.g. smoking)
– Pet ownership and aging without onset of T2D (e.g. physical activity)
