
APPLIED STATISTICS IN CLINICAL AND MEDICAL RESEARCH:
MULTIPLE REGRESSION ANALYSIS (MULTIVARIATE)

NEILY ZAKIYAH, PhD., Apt
MULTIPLE REGRESSION

• Multiple regression: one dependent variable (Y) and a set of (≥ 2) independent variables (X)
• Model:
   • Choice of variables
   • Research method
   • Independent variables: categorical, continuous (numerical) → dummies
• Confounding, interaction
Regression models
• The type of dependent variable determines the model:

Model                 Dependent variable      Independent variables
Linear regression     Numerical/continuous    Numerical/continuous, categorical
Logistic regression   Categorical             Numerical/continuous, categorical
Multiple linear regression

• Biological factors: mostly multifactorial
• Multiple linear regression equation:
   $Y_{pred} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$
   $a$ = intercept (where the line crosses the y-axis, i.e. $Y_{pred}$ when all $X$ = 0)
   $b$ = regression coefficient, slope
   $b_1$ = effect of $X_1$ on $Y$, independent of the effects of $X_2$, $X_3$ to $X_k$
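As an illustration of how the intercept and the regression coefficients in the equation above could be estimated, here is a minimal least-squares sketch in plain Python. The data are hypothetical and generated without noise, and the function names `fit_ols` and `predict` are illustrative assumptions; in practice a statistical package (such as the SPSS used later in this lecture) does this work.

```python
# Minimal ordinary-least-squares fit of Ypred = a + b1*X1 + ... + bk*Xk,
# solved through the normal equations (X'X) beta = X'y. Toy data only.

def fit_ols(rows, y):
    """Return [a, b1, ..., bk] for predictor rows and outcomes y."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 = intercept
    p = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * c for x, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coefs, row):
    """Evaluate a + b1*X1 + ... + bk*Xk for one set of predictor values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Hypothetical data generated exactly from Y = 2 + 3*X1 - 1*X2 (no noise)
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (3, 2)]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in rows]
coefs = fit_ols(rows, y)  # recovers a = 2, b1 = 3, b2 = -1 up to rounding
```

Because the toy outcome was built from known coefficients, the fit recovers them; with real data the residuals would not be zero.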
Multiple linear regression

• Strength of the estimated linear regression relation:
   R = multiple correlation coefficient
   R² = the proportion of variation in the dependent variable Y accounted for by the set of independent variables (X's), i.e. the % explained variance
• Total variance:
   → Explained by the independent variables in the model
   → Unexplained: biological variation, measurement error
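The split of total variance into an explained and an unexplained part can be made concrete with a short calculation. This is a sketch with hypothetical observed and predicted values (not an actual model fit); the function name `r_squared` is an illustrative assumption.

```python
def r_squared(observed, predicted):
    """Proportion of the variation in Y accounted for by the model."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)                    # total variation
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # unexplained part
    return 1 - ss_residual / ss_total


# Hypothetical values, for illustration only
observed = [10.0, 12.0, 15.0, 19.0]
predicted = [10.5, 11.5, 15.5, 18.5]
r2 = r_squared(observed, predicted)  # total = 46, unexplained = 1
```

Here the unexplained sum of squares is 1 out of a total of 46, so about 98% of the variation is explained.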
Multiple linear regression

• Assumptions:
   1. Linearity between X and Y
   2. Residuals are normally distributed
   3. Constant variance: the dispersion around the line is independent of the value of X
   4. Independent observations
Model choice multiple regression

• Explanatory (independent) variables are chosen on the basis of:
   • Literature, previous research
   • Biological plausibility
   • Descriptive analysis (p < 0.25)
   • Number of observations (the number of explanatory variables should be < 10% of the number of observations)
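The rule of thumb on the number of observations can be written as a one-line check. A minimal sketch; the function name `max_explanatory_variables` is an illustrative assumption, not part of the lecture.

```python
def max_explanatory_variables(n_observations):
    """Largest k with k < 10% of n_observations (integer arithmetic only)."""
    if n_observations < 1:
        raise ValueError("need at least one observation")
    return (n_observations - 1) // 10


limit_100 = max_explanatory_variables(100)  # strictly fewer than 10 variables
limit_250 = max_explanatory_variables(250)  # strictly fewer than 25 variables
```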
Model choice multiple regression:
Ways of building regression models
Method:
• Simultaneous: all independent variables are entered into the model at once
• Stepwise (backward/forward): independent variables are entered in an order determined by how much of the variance each variable explains; every variable is regarded as equally important
• Hierarchical: independent variables are entered in stages

Rule of thumb: a variable is a confounder if including it in the model changes the estimate by 10% or more
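The 10% rule of thumb can be checked with a small helper. This sketch assumes the change is expressed relative to the crude (unadjusted) estimate, which is one common convention; the numbers and function names are hypothetical.

```python
def percent_change(crude, adjusted):
    """Absolute change of the estimate, as a percentage of the crude estimate."""
    return abs(adjusted - crude) / abs(crude) * 100


def is_confounder(crude, adjusted, threshold=10.0):
    """Apply the rule of thumb: a change of 10% or more suggests confounding."""
    return percent_change(crude, adjusted) >= threshold


# Hypothetical regression coefficients before and after adding a candidate variable
flag_a = is_confounder(crude=2.0, adjusted=1.7)  # about 15% change
flag_b = is_confounder(crude=2.0, adjusted=1.9)  # about 5% change
```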


Multiple linear regression

Variables in the model:
• Categorical
   • > 2 categories → make dummies!
• Continuous: only if there is a linear relation
   No linear relation → make categories / transform

For instance, 3 smoking categories: current smoker, ex-smoker, never smoker
As 1 variable in the model? Then 'smoking habits' =
   0 for never smokers
   1 for ex-smokers
   2 for current smokers
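The alternative to a single 0/1/2 variable, following the "> 2 categories" rule above, is a set of dummy variables. A minimal sketch, assuming never smokers are taken as the reference category; the function name `smoking_dummies` is hypothetical.

```python
CATEGORIES = ("never smoker", "ex-smoker", "current smoker")


def smoking_dummies(status):
    """Return (ex_smoker, current_smoker); never smoker is the reference (0, 0)."""
    if status not in CATEGORIES:
        raise ValueError(f"unknown smoking status: {status!r}")
    return (int(status == "ex-smoker"), int(status == "current smoker"))


coded = {status: smoking_dummies(status) for status in CATEGORIES}
```

With three categories, two dummies are enough: each coefficient then compares one category with the reference, without assuming that the step from never to ex-smoker equals the step from ex-smoker to current smoker.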
Confounding/ interaction
• Influence of different independent variables on each other: confounding / interaction
• A confounding variable is any variable, other than the independent variable of interest, that also has an effect on the dependent variable.
• Confounding variables can cause major problems in the analysis:
   • Increase variance
   • Introduce bias
Confounding vs interaction

• Is the association between X and Y influenced by another variable Z?
   yes → confounding
• Is the association between X and Y different for different values of Z?
   yes → interaction, effect modification
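Interaction is commonly modelled by adding a product term X*Z to the regression, so that the slope of X depends on Z. The sketch below assumes a model of the form Y = a + b_x*X + b_z*Z + b_xz*X*Z with hypothetical coefficients; none of these numbers come from the lecture.

```python
def effect_of_x(b_x, b_xz, z):
    """Slope of X at a given value of Z in Y = a + b_x*X + b_z*Z + b_xz*X*Z."""
    return b_x + b_xz * z


# Hypothetical coefficients; Z is a 0/1 dummy variable
b_x, b_xz = -10.0, -8.0
slope_z0 = effect_of_x(b_x, b_xz, z=0)  # effect of X when Z = 0
slope_z1 = effect_of_x(b_x, b_xz, z=1)  # effect of X when Z = 1
```

If the coefficient of the product term is zero, the two slopes coincide and there is no effect modification.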
Multiple linear regression
• Example
Cigarette smoking is known to be one of the risk factors for the
development of chronic respiratory diseases (i.e. asthma and COPD)
because it decreases respiratory function. A study examines the effect
of cigarette smoking habits on respiratory function (FEV1) in asthmatic
patients.
   • What is the association between smoking habits and forced expiratory volume in these patients?
   • What is the association after adjustment for age, gender, weight, and blood eosinophil profile as potential confounders?
   • Is the association different in men and women?
Multiple linear regression
Dependent variable: FEV1 → forced expiratory volume in one second (cL)
Independent variable: smoking habits (current smokers, ex-smokers, never smoked)
Confounding variables: age, gender, weight, blood eosinophil profile

SPSS output, univariate linear regression (for details, please refer to univariate analysis):

FEV1 = 290.981 + 33.740*ex-smokers + 27.976*current smokers → crude estimate!


Multiple linear regression
• SPSS output, multiple linear regression (multivariate analysis), adjusted for potential confounders:

FEV1 = -218.708 - 3.967*ex-smokers - 14.156*current smokers + 33.356*sex - 3.297*age + 3.803*height + 0.132*weight - 0.344*eosi → adjusted estimate!
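To see how the adjusted equation is read, it can be evaluated for a single patient. The coefficients below are copied from the adjusted equation above; the patient values are hypothetical, and the units of age, height, weight and eosinophils are assumptions (years, cm, kg, and an unspecified eosinophil measure), since the slide does not state them. Sex is coded Women = 0, Men = 1, following the dummy coding given later in the lecture.

```python
# Coefficients of the adjusted model, copied from the equation above
ADJUSTED = {
    "ex_smoker": -3.967,
    "current_smoker": -14.156,
    "sex": 33.356,
    "age": -3.297,
    "height": 3.803,
    "weight": 0.132,
    "eosi": -0.344,
}
INTERCEPT = -218.708


def predict_fev1(**values):
    """Predicted FEV1 (cL) for one patient; every model variable must be given."""
    return INTERCEPT + sum(ADJUSTED[name] * values[name] for name in ADJUSTED)


# Hypothetical patient (assumed units): man, 40 years, 175 cm, 70 kg, eosi = 10
patient = {"sex": 1, "age": 40, "height": 175, "weight": 70, "eosi": 10}
never = predict_fev1(ex_smoker=0, current_smoker=0, **patient)
current = predict_fev1(ex_smoker=0, current_smoker=1, **patient)
difference = current - never  # equals the current-smoker coefficient
```

For two patients who differ only in smoking status, the predicted difference is exactly the smoking coefficient, which is what "adjusted estimate" means here.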
Multiple linear regression
• SPSS output, multiple linear regression (multivariate analysis), adjusted for potential confounders, with interaction (is the association different for men and women?):

Note: dummy variable for sex
Women = 0
Men = 1

• The association between being an ex-smoker and FEV1 does not differ significantly between men and women (interaction p = 0.627). The association between being a current smoker and FEV1, on the other hand, differs significantly between men and women (interaction p < 0.05).
Hospers, J. J., Postma, D. S., Rijcken, B., Weiss, S. T., & Schouten, J. P. (2000). Histamine airway hyper-responsiveness and mortality from chronic obstructive pulmonary disease: a cohort study. The Lancet, 356(9238), 1313–1317. doi:10.1016/s0140-6736(00)02815-4
Multiple Logistic Regression
Model                 Dependent variable      Independent variables
Linear regression     Numerical/continuous    Numerical/continuous, categorical
Logistic regression   Categorical             Numerical/continuous, categorical

• OR = (exposure odds among the patients) / (exposure odds among the controls)

OR     Exposure to X is associated with:
> 1    Increased risk of Y; X is a risk factor (exposure associated with higher odds of outcome)
< 1    Decreased risk of Y; X is a protective factor (exposure associated with lower odds of outcome)
= 1    No association between X and Y (exposure does not affect odds of outcome)
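The odds ratio and its interpretation can be illustrated with a small 2x2 calculation. The counts below are hypothetical and the function names are illustrative assumptions; this is the crude (unadjusted) odds ratio, not the output of a logistic regression model.

```python
def odds_ratio(exposed_patients, unexposed_patients, exposed_controls, unexposed_controls):
    """Exposure odds among the patients divided by exposure odds among the controls."""
    odds_patients = exposed_patients / unexposed_patients
    odds_controls = exposed_controls / unexposed_controls
    return odds_patients / odds_controls


def interpret(or_value):
    """Map an odds ratio to the interpretation in the table above."""
    if or_value > 1:
        return "risk factor"
    if or_value < 1:
        return "protective factor"
    return "no association"


# Hypothetical counts: 30 of 100 patients exposed, 15 of 100 controls exposed
crude_or = odds_ratio(30, 70, 15, 85)  # (30/70) / (15/85), about 2.43
label = interpret(crude_or)
```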
Zakiyah N, Ter Heijne LF, Bos JH, Hak E, Postma MJ, Schuiling-Veninga CCM. Antidepressant use during pregnancy and the risk of developing gestational hypertension: a retrospective cohort study. BMC Pregnancy Childbirth. 2018;18(1):187. Published 2018 May 29. doi:10.1186/s12884-018-1825-y
Unadjusted OR: without adjustment for potential confounders
Adjusted OR: after adjustment for potential confounders

Dependent variable: gestational hypertension (using dispensing of antihypertensives as a proxy), categorical
Independent variable: antidepressant dispensing in the pharmacy, categorical
Confounding variables: maternal age, use of benzodiazepines and antibiotics
Multiple Logistic Regression
Interpretation of the primary results:
• Exposure to antidepressants during pregnancy was associated with significantly increased odds of developing gestational hypertension; the odds were approximately doubled (aOR 2.00, 95% CI 1.28–3.13)
• Significant associations were also found for the subgroup of selective serotonin reuptake inhibitors (SSRIs) (aOR 2.07, 95% CI 1.25–3.44), ≥ 30 defined daily doses (DDDs) (aOR 2.50, 95% CI 1.55–3.99), and maternal age of 30–34 years (aOR 2.59, 95% CI 1.35–4.98).
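In logistic regression the odds ratio is the exponential of the regression coefficient, and a 95% CI is often obtained as exp(b ± 1.96*SE). The sketch below back-calculates b and SE from the reported aOR 2.00 (95% CI 1.28–3.13) and rebuilds the interval. It assumes the published interval is a Wald-type interval, which the slides do not state, so the derived b and SE are illustrative rather than values from the paper.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def beta_se_from_or_ci(or_value, lower, upper, z=Z_95):
    """Back-calculate the log-odds coefficient and its standard error."""
    beta = math.log(or_value)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def or_with_ci(beta, se, z=Z_95):
    """Odds ratio and confidence limits from a coefficient and standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Reported result: aOR 2.00, 95% CI 1.28-3.13
beta, se = beta_se_from_or_ci(2.00, 1.28, 3.13)
or_value, ci_low, ci_high = or_with_ci(beta, se)
significant = ci_low > 1 or ci_high < 1  # the interval excludes OR = 1
```

The rebuilt limits agree with the reported ones to within rounding, and because the interval does not include 1, the association is statistically significant at the 5% level.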
