
APPLIED STATISTICS IN CLINICAL AND MEDICAL RESEARCH:
CORRELATIONS AND REGRESSION ANALYSIS (UNIVARIATE)
NEILY ZAKIYAH, PhD., Apt
APPLIED STATISTICS IN CLINICAL AND MEDICAL RESEARCH

• Descriptive statistics, exploring the data
  • Type of variables, distribution
• Testing differences between groups
• Correlations
• Univariate regressions
  • Linear regression
  • Logistic regression
Correlation and regression
Correlation
• Used to estimate the association between two quantitative (numerical) variables.
  Assumption → the association is linear, meaning that one variable changes
  (increases or decreases) by a constant amount for a unit increase or decrease in the
  other.
• Correlation is measured by a correlation coefficient, which represents the
  strength of the linear association between the variables in question.

+1, -1: complete correlation (perfect linear relationship)
0: no correlation
Correlation

a, b: complete correlation
a: the variables are directly related (i.e., as the value of one variable goes up, the value of the other also tends to do so)
b: the variables are inversely related (i.e., as the value of one variable goes up, the value of the other tends to go down)
c: no correlation
d: no linear correlation

Source: The BMJ


Correlation
• Strength of the association (absolute value of the correlation coefficient):
0 – 0.19 very poor
0.20 – 0.39 poor
0.40 – 0.59 moderate
0.60 – 0.79 strong
0.80 – 1.00 very strong
These cut-offs are arbitrary!
• Correlation ≠ causality
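The cut-offs above can be written as a small lookup. This is only a sketch: the function name `correlation_strength` is hypothetical, and treating the bands as half-open intervals (for example, 0.195 counts as "very poor") is an assumption, since the slide does not say how values between bands are handled.

```python
def correlation_strength(r: float) -> str:
    """Label a correlation coefficient using the (arbitrary) cut-offs above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # strength ignores the sign; the sign only gives the direction
    if size < 0.20:
        return "very poor"
    if size < 0.40:
        return "poor"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"


def correlation_direction(r: float) -> str:
    """Report whether the variables are directly or inversely related."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Hypothetical coefficients, for illustration only
for r in (0.45, -0.85, 0.10):
    print(r, correlation_strength(r), correlation_direction(r))
```

The label describes the strength of the linear association only; it says nothing about causality.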
Correlation
Types of correlation coefficients:
1. Pearson's product moment correlation coefficient (r): used when both variables being studied are normally distributed
2. Spearman's rank correlation coefficient (rs): appropriate when one or both variables are skewed or ordinal; it is robust when extreme values are present
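To make the difference between the two coefficients concrete, here is a minimal sketch in plain Python (no statistics package assumed). The function names and the data are hypothetical. Spearman's coefficient is computed here as Pearson's coefficient applied to the ranks, with tied values given their average rank.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always increases with x, but with one extreme value
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 400]
print("Pearson r  :", round(pearson_r(x, y), 3))
print("Spearman rs:", round(spearman_rs(x, y), 3))
```

Because the ranks of x and y agree perfectly in this made-up data, rs reaches 1 while r is pulled below 1 by the extreme value, which illustrates why the rank-based coefficient is preferred for skewed data.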
Correlation
• Example:
In a study, the researcher would like to test whether there is a statistically
significant linear relationship between weight and height in a population. Height
(in inches) ranges from 55.00 to 84.41 and weight (in pounds) ranges from
101.71 to 350.07. How can the strength and direction of the association be
determined? Normality is assumed.
Scatterplots:
• Pearson correlation in SPSS output:
A: height – height
B, C: height – weight
D: weight – weight
P value
Interpretation:
• Weight and height have a statistically significant linear relationship (p<0.001)
• The direction of the correlation is positive
• The strength of the association is approximately moderate

Size of the association? → regression


Regression analysis
• Regression analysis describes how one dependent variable depends on one or
more independent variables (a direction of influence is assumed; the analysis
alone does not prove causality)
• It enables the estimation of relationships between a "response" or
"outcome" variable (usually described as the dependent variable,
denoted by Y) and "explanatory" or "predictor" variable(s) (usually
described as independent variable(s), denoted by X)
• 1 independent variable → univariate regression
• ≥ 2 independent variables → multivariate regression
Regression models
• Type of dependent variable determines the model:

                      Dependent               Independent
Linear regression     Numerical/continuous    Numerical/continuous or categorical
Logistic regression   Categorical             Numerical/continuous or categorical
Regression models

Schneider et al, 2010


Linear Regression
• Univariate linear regression
Regression equation: Ypred = a + bX
Y = dependent variable
Ypred = predicted value
X = independent variable
a = intercept (where the line crosses the y-axis, i.e. Ypred at X = 0)
b = regression coefficient, slope
Y – Ypred = residual

*Interpretation of b (regression coefficient): the change in Y per unit change of X
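As an illustration of the equation Ypred = a + bX, the sketch below estimates a and b by ordinary least squares in plain Python and then computes the residuals Y – Ypred. The function name `fit_line` and the data are hypothetical, and least squares is assumed as the estimation method (it is the standard one for linear regression).

```python
def fit_line(x, y):
    """Least-squares estimates of the intercept a and slope b in Ypred = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope: change in Y per unit change of X
    a = mean_y - b * mean_x  # intercept: Ypred at X = 0
    return a, b


# Hypothetical data, for illustration only
x = [0, 1, 2, 3, 4]
y = [1.0, 3.1, 4.9, 7.2, 8.8]

a, b = fit_line(x, y)
y_pred = [a + b * xi for xi in x]
residuals = [yi - yp for yi, yp in zip(y, y_pred)]

print(f"Ypred = {a:.2f} + {b:.2f} * X")
print("residuals:", [round(e, 2) for e in residuals])
```

With an intercept in the model, the least-squares residuals always sum to zero, which is a quick sanity check on any fitted line.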
Univariate linear regression
• Example
Cigarette smoking is a known risk factor for the development of chronic
respiratory diseases (e.g. asthma and COPD) because it decreases
respiratory function. A cross-sectional study examines the effect of
cigarette smoking habits on respiratory function (FEV1) in asthmatic
patients. What is the association between smoking habits and forced
expiratory volume in these patients?
Assumptions:
• Linearity between X and Y
• Independent observations
• Residuals normally distributed
Univariate linear regression
• SPSS: Analyze – Regression – Linear
Dependent variable: FEV1 (cl)
Independent variable: Smoking habits (current smokers, ex-smokers, never smoked)
SPSS output: model summary
Annotations on the output: correlation coefficient; % explained variance; residual variance
(31% of the variance in FEV1 is explained by smoking habits)
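The "% explained variance" is the coefficient of determination, R squared: one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up numbers, not the FEV1 data; the predictions are simply the two group means, which is what a regression on a two-category predictor would produce.

```python
def r_squared(y, y_pred):
    """Proportion of the variance in y explained by the model predictions."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))  # unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # total
    return 1 - ss_res / ss_tot


# Hypothetical outcome values for two groups of two subjects each
y = [2, 4, 6, 8]
# Predictions from a categorical predictor: each subject gets its group mean
y_pred = [3, 3, 7, 7]

print(f"{r_squared(y, y_pred):.0%} of the variance is explained")
```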
Univariate linear regression
• SPSS output: coefficients
Annotations on the output:
P value
Regression coefficient (the constant is the value at which the regression line crosses the Y axis, X = 0)
95% CI (a range of values that we can be 95% certain contains the true population value)
FEV1 = 290.981 + 33.740*ex smokers + 27.976*current smokers
Interpretation: The mean FEV1 is 33.740 cl higher in ex-smokers and 27.976 cl higher in
current smokers than in never smokers.
P = 0.000 in the SPSS output (i.e. p < 0.001): statistically significant
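The fitted equation can be read as follows, assuming "ex smokers" and "current smokers" are 0/1 indicator (dummy) variables with never smokers as the reference category (the slide's interpretation implies this coding, but the SPSS setup itself is not shown). The function name `predicted_fev1` is hypothetical; the coefficients are the ones reported above.

```python
INTERCEPT = 290.981  # predicted mean FEV1 (cl) in never smokers (reference group)
B_EX = 33.740        # difference in mean FEV1, ex-smokers vs never smokers
B_CURRENT = 27.976   # difference in mean FEV1, current smokers vs never smokers


def predicted_fev1(habit: str) -> float:
    """Predicted mean FEV1 (cl) for one smoking category."""
    if habit not in ("never smoked", "ex smoker", "current smoker"):
        raise ValueError(f"unknown smoking habit: {habit!r}")
    ex = 1 if habit == "ex smoker" else 0            # dummy variable 1
    current = 1 if habit == "current smoker" else 0  # dummy variable 2
    return INTERCEPT + B_EX * ex + B_CURRENT * current


for habit in ("never smoked", "ex smoker", "current smoker"):
    print(f"{habit:15s} {predicted_fev1(habit):.3f} cl")
```

Under this coding the intercept is the mean of the reference group, and each coefficient is a difference in means relative to that group, not a slope.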
Univariate linear regression
• Check assumptions:
Linear relation between X and Y?
→ Scatterplots!
Distribution of residuals?
→ Normal probability plot (histogram)
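A normal probability plot pairs each ordered residual with the value expected under a normal distribution; points close to a straight line suggest normally distributed residuals. The sketch below computes those pairs with Python's standard library only. The function name, the residual values, and the plotting positions (i + 0.5)/n are assumptions; statistical packages may use slightly different positions.

```python
from statistics import NormalDist, mean, stdev


def normal_probability_points(residuals):
    """Pairs of (theoretical normal quantile, standardized ordered residual)."""
    n = len(residuals)
    if n < 2:
        raise ValueError("at least two residuals are needed")
    ordered = sorted(residuals)
    m, s = mean(ordered), stdev(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, e in enumerate(ordered):
        theoretical = std_normal.inv_cdf((i + 0.5) / n)  # expected quantile
        points.append((theoretical, (e - m) / s))
    return points


# Hypothetical residuals, for illustration only
residuals = [-1.2, 0.4, -0.3, 1.5, -0.4]
for theoretical, observed in normal_probability_points(residuals):
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

Plotting the second column against the first (with any plotting tool) gives the normal probability plot.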
Logistic Regression
• Appropriate analysis when the dependent variable is categorical →
basically the same concept as linear regression.
• Using the regression coefficients, we can estimate the odds ratio (OR)
for (each of) the independent variable(s).
• OR = (exposure odds among the patients) / (exposure odds among the controls)
• Exposure odds = number of exposed / number of non-exposed
Logistic Regression
              Disease +   Disease -
Exposure +        a           b
Exposure -        c           d

OR = (exposure odds among the patients) / (exposure odds among the controls)

$$\mathrm{OR} = \frac{a/c}{b/d} = \frac{ad}{bc}$$

OR is a measure of association between exposure and outcome and represents
the odds that an outcome will occur given a particular exposure, compared to
the odds of the outcome occurring in the absence of the exposure.
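The 2×2 calculation can be sketched in a few lines. The counts below are made up, and the function names are hypothetical. The confidence interval uses the common log-based (Woolf) approximation, which is an assumption here because the slides do not say how the interval is obtained.

```python
from math import exp, log, sqrt


def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / bc for the 2x2 table above."""
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI for the OR, computed on the log scale (Woolf method)."""
    log_or = log(odds_ratio(a, b, c, d))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    return exp(log_or - z * se), exp(log_or + z * se)


# Hypothetical counts: a = exposed patients, b = exposed controls,
#                      c = non-exposed patients, d = non-exposed controls
a, b, c, d = 40, 20, 60, 80

or_value = odds_ratio(a, b, c, d)
low, high = odds_ratio_ci(a, b, c, d)
print(f"OR = {or_value:.2f} (95% CI {low:.2f} to {high:.2f})")
```

If the interval excludes 1, the association is statistically significant at the 5% level; all four cells must be non-zero for this approximation to work.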
Logistic Regression
• OR interpretation

OR    Exposure to X is associated with:
>1    Increased risk of Y; X is a risk factor (exposure associated with higher odds of outcome)
<1    Decreased risk of Y; X is a protective factor (exposure associated with lower odds of outcome)
=1    No association between X and Y (exposure does not affect odds of outcome)
Univariate logistic regression
• Example
The objective of a case-control study is to estimate the odds of having
depression according to people's smoking behavior.
Dependent variable : depression, 2 categories:
1 = No
2 = Yes
Independent variable : smoking behavior, 3 categories:
1 = Ex smoker
2 = Current smoker
3 = Never smoked
Univariate logistic regression
• SPSS: Analyze – Regression – Binary logistic
• SPSS output:
Annotations on the output: P value; Odds ratio = Exp(B); 95% CI

Interpretation:
Compared to people who never smoked, people who used to smoke (ex-smokers) have 1.14
times the odds (95% CI 1.05 to 1.24), and current smokers have 1.79 times the odds (95% CI
1.64 to 1.95), of having depression.
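Since the odds ratio is Exp(B), the regression coefficient B is the natural logarithm of the OR, and an OR of 1.14 means the odds are 14% higher, not 1.14 units higher. The sketch below works backwards from the odds ratios reported above; the B values it prints are derived from those rounded ORs and are not taken from the SPSS output, which is not reproduced here. The helper names are hypothetical.

```python
from math import exp, log

# Odds ratios and 95% CIs reported above (reference group: never smoked)
reported = {
    "ex smoker": (1.14, 1.05, 1.24),
    "current smoker": (1.79, 1.64, 1.95),
}


def coefficient_from_or(odds_ratio: float) -> float:
    """Logistic regression coefficient B such that Exp(B) equals the OR."""
    return log(odds_ratio)


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage by which the odds differ from the reference group."""
    return (odds_ratio - 1) * 100


for group, (or_value, low, high) in reported.items():
    b = coefficient_from_or(or_value)
    significant = low > 1 or high < 1  # CI excludes 1
    print(f"{group:15s} B = {b:.3f}, odds {percent_change_in_odds(or_value):+.0f}%, "
          f"CI excludes 1: {significant}")
```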
Correlation and regression
In short:
• Both measure association
• Correlation: strength, direction and significance of the correlation between two
variables
• Regression: also measures the size of the association → between a
dependent variable and independent variable(s); causality is assumed in the
model, not proven by it
• ≥ 2 independent variables, how to deal with confounders → multivariate
regression
