Regression Relationships: Analyses
Lecture 7, February 25, 2020

Announcements
■ Midterm viewing on February 28; 11:00 AM – 1:00 PM in R2-013
– No cellphones/cameras allowed
■ Final exam date: April 28, 2:00 PM
Assessing relationships with continuous variables

Samples | Parametric test | Non-parametric test
2 continuous variables (last week) | Pearson correlation | Spearman correlation
2+ continuous variables | Linear regression | Non-parametric regression

Regression analyses

Dependent variable | Independent variable | Parametric test | Non-parametric test
Continuous | Continuous | Linear regression | Non-parametric regression
Categorical | Continuous | Logistic regression | Non-parametric regression

Linear regression
■ Approach for modelling the linear relationship between a dependent variable (Y) and one or more independent variables (X)
– Independent variable: predictor
– Dependent variable: outcome
■ Simple linear regression: one dependent variable and one independent variable (Y = β0 + β1X1 + ε)
■ Multiple linear regression: one dependent variable and more than one independent variable
■ Used to assess the strength of a relationship between predictor(s) and an outcome. Can also be used to predict values of the outcome.

Significance of a linear model
■ Regression line: straight line that best predicts the relationship between the two variables
■ Compares predictor(s) (X) and outcome (Y), and the consistent change between those values
– β0 (y-intercept): point where the regression line crosses the y-axis (i.e., when X = 0); the constant
– β1 (slope): tells you how much Y changes as you move along the values of X
■ Hypothesis testing in regression: tests the null hypothesis that β1 = 0
– Does the model predict the outcome?

Least Squares Criterion
■ Method of minimizing the sum of squared residuals in a model
■ Total Sum of Squares = Explained Sum of Squares + Residual Sum of Squares
– TSS = ESS + RSS
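To make these ideas concrete, here is a minimal sketch of a simple linear regression in Python using statsmodels; the data are synthetic and invented purely for illustration. It fits Y = β0 + β1X + ε, reports the test of the null hypothesis β1 = 0, and verifies the least-squares decomposition TSS = ESS + RSS.

```python
# A minimal sketch of simple linear regression (Y = b0 + b1*X + e) with
# statsmodels, on synthetic data invented for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)            # one continuous predictor
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)   # outcome with a known slope of 0.5

X = sm.add_constant(x)                     # adds the intercept column (b0)
model = sm.OLS(y, X).fit()

b0, b1 = model.params
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
print(f"p-value for H0: b1 = 0 -> {model.pvalues[1]:.2e}")

# Least squares criterion: TSS = ESS + RSS
y_hat = model.fittedvalues
tss = np.sum((y - y.mean()) ** 2)          # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)      # explained sum of squares
rss = np.sum((y - y_hat) ** 2)             # residual sum of squares
print(f"TSS = {tss:.2f}, ESS + RSS = {ess + rss:.2f}")  # equal up to rounding
```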
[Figure: scatter plot of a fitted regression line, showing an observed data point (xi, yi), the predicted data point (xi, yp), and the residual = yi − yp]

Assumptions of linear regression
■ Residuals are normally distributed
■ Linear relationship between dependent and independent variables
■ No extreme outliers
■ Ample sample size for the variables in the model
– At least 10 cases per independent variable
■ No multicollinearity between independent variables (predictors)
– Occurs when there is too strong a correlation between two or more of the predictors
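As an illustrative sketch (not from the slides), two of these assumptions can be checked in Python: residual normality with a Shapiro-Wilk test, and multicollinearity with variance inflation factors (VIF). The data and variable names are invented, and the VIF cutoff of 5–10 is a common rule of thumb rather than a fixed standard.

```python
# Sketch of two assumption checks on synthetic data: Shapiro-Wilk on the
# residuals (normality) and variance inflation factors (multicollinearity).
import numpy as np
import statsmodels.api as sm
from scipy.stats import shapiro
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)     # deliberately collinear with x1
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
resid = sm.OLS(y, X).fit().resid

stat, p = shapiro(resid)                    # H0: residuals are normal
print(f"Shapiro-Wilk p = {p:.3f} (p > 0.05 -> no evidence of non-normality)")

# VIF > 5-10 is a common rule-of-thumb flag for multicollinearity
for i, name in enumerate(["x1", "x2"], start=1):   # skip the constant column
    print(f"VIF({name}) = {variance_inflation_factor(X, i):.1f}")
```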
Multiple linear regression
■ Multiple predictor variables (referred to as “covariates”)
■ Let’s say we want to assess the relationship between LDL cholesterol (outcome) and dietary fat intake (main predictor), but we also want to consider age, waist circumference, protein intake, and physical activity level in the analysis (all covariates).
– Can determine which variables are significant predictors of LDL (or are significantly associated with LDL) with this approach.
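A minimal sketch of how this LDL model might be fit with statsmodels’ formula interface follows. The data frame and column names (ldl, fat, age, waist, protein, activity) are hypothetical stand-ins; in practice they would come from the study dataset.

```python
# Sketch of the LDL example as a multiple linear regression; all data are
# simulated and the column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "fat": rng.normal(70, 15, n),        # main predictor: dietary fat (g/day)
    "age": rng.uniform(20, 70, n),       # covariates below
    "waist": rng.normal(90, 12, n),
    "protein": rng.normal(80, 20, n),
    "activity": rng.normal(30, 10, n),
})
df["ldl"] = 1.5 + 0.02 * df["fat"] + 0.01 * df["age"] + rng.normal(0, 0.3, n)

# Outcome on the left, main predictor plus covariates on the right
fit = smf.ols("ldl ~ fat + age + waist + protein + activity", data=df).fit()
print(fit.summary())   # coefficients and p-values for each predictor
```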
Confounders
■ A variable that is associated with an exposure and, independent of that association, is also a risk factor for the outcome.
■ Distorts the estimate of the association between the independent and dependent variables.
■ Can be controlled for in statistical analyses, but only if the variable has been measured during data collection!
■ Residual confounding is something to consider in any observational analysis.

Identifying Confounders
■ General approach: conduct a simple linear regression between a predictor variable and an outcome variable.
■ Repeat the regression with the potential confounder (covariate) now included in the model.
■ The variable is considered a confounder if:
– a p-value that was initially significant is attenuated (i.e., no longer significant), or
– the B coefficient for your main predictor variable changes by more than 10%
■ Example: the relationship between systolic blood pressure and total fat intake is confounded by sodium intake (see the sketch below).
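The sketch below implements this two-model comparison for the systolic blood pressure / fat intake / sodium example. The data are simulated so that sodium drives both fat intake and SBP, making the confounding visible; the variable names are illustrative.

```python
# Sketch of the confounder check: fit the model with and without the
# candidate covariate, then compare the main predictor's coefficient
# (>10% change) and p-value. All data are simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 300
sodium = rng.normal(3000, 800, n)                  # suspected confounder
fat = 40 + 0.01 * sodium + rng.normal(0, 8, n)     # associated with exposure
sbp = 100 + 0.01 * sodium + rng.normal(0, 5, n)    # and with the outcome
df = pd.DataFrame({"sbp": sbp, "fat": fat, "sodium": sodium})

crude = smf.ols("sbp ~ fat", data=df).fit()
adjusted = smf.ols("sbp ~ fat + sodium", data=df).fit()

b_crude = crude.params["fat"]
b_adj = adjusted.params["fat"]
pct_change = 100 * abs(b_adj - b_crude) / abs(b_crude)

print(f"fat coefficient: crude {b_crude:.3f} -> adjusted {b_adj:.3f} "
      f"({pct_change:.0f}% change)")
print(f"fat p-value: crude {crude.pvalues['fat']:.3g} -> "
      f"adjusted {adjusted.pvalues['fat']:.3g}")
# A >10% coefficient change (or loss of significance) after adjustment
# suggests sodium confounds the fat-SBP relationship.
```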
Thought Experiment
■ Think of a variable that could confound the following relationships:
– Alcohol intake and heightened risk of heart attack: smoking
– Pet ownership and aging without onset of type 2 diabetes (T2D): physical activity