PPTs for the sessions by Dr. Indranil Saha (Professor & HOD,
Community Medicine, IQ City Medical College, Durgapur)
Page 1 of 30
Fundamentals of Correlation and Regression
Dr. Indranil Saha
Professor & HOD
Community Medicine
IQ City Medical College, Durgapur
Dr. Indranil Saha: B M Birla College of Nursing 16/07/2020
Page 2 of 30
Correlation
Correlation represents the relationship between two variables, such as weight and height, weight and cholesterol, or age and
life expectancy.
It is also known as simple bivariate correlation (that is, correlation between two variables) or zero-order correlation.
Page 3 of 30
This relationship is displayed in a scatter plot (scatter diagram), which shows how closely the points lie in relation to a
straight line.
There must be a logical relationship between the two variables.
Correlation establishes a relationship between two variables, but it does not determine causation.
Page 4 of 30
Page 5 of 30
Positive correlation between SBP and weight
Page 6 of 30
Positive correlation between height and SBP
Page 7 of 30
Negative correlation between disease activity score (DAS) and HDL level
Page 8 of 30
No correlation between height and cholesterol
Page 9 of 30
Page 10 of 30
Assumptions for correlation
First, draw a scatter diagram to check for linearity and homoscedasticity (the variability in scores of one variable should be
similar at all values of the other variable).
If the scores are evenly spread in a cigar-shaped pattern and a straight line can be drawn through the main cluster of points,
both conditions are reasonably satisfied.
The data set should be generated from a random sample.
If a curved line is found (suggesting a curvilinear relationship), then the Pearson correlation coefficient is not appropriate.
Page 11 of 30
r (Pearson’s)
Both variables are measured on an interval or ratio scale and are normally distributed.
ρ (rho) (Spearman’s)
One variable is on an ordinal scale and the other is on an ordinal or higher scale.
Interpretation of correlation coefficient (using the magnitude of r; the sign gives the direction):
When |r| is more than 0.7: High correlation.
When |r| is between 0.3 and 0.7: Moderate correlation.
When |r| is less than 0.3: Weak correlation.
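The two coefficients and the interpretation bands above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the software used for the slides. The function names and the weight/SBP values are hypothetical. In practice a statistics package (for example `scipy.stats.pearsonr` and `scipy.stats.spearmanr`) would also report the P value.

```python
import math

def pearson_r(x, y):
    """Pearson's r: co-variation of x and y scaled by their individual spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

def interpret(r):
    """Apply the bands from the slide to the magnitude of r."""
    size = abs(r)
    if size > 0.7:
        return "high"
    if size >= 0.3:
        return "moderate"
    return "weak"

# Hypothetical data for illustration only (not from the lecture).
weight_kg = [50, 55, 60, 65, 70, 75, 80]
sbp_mmhg = [110, 114, 118, 117, 125, 130, 134]

r = pearson_r(weight_kg, sbp_mmhg)
rho = spearman_rho(weight_kg, sbp_mmhg)
print(f"Pearson r = {r:.2f} ({interpret(r)}), Spearman rho = {rho:.2f} ({interpret(rho)})")
```

The bands are applied to the absolute value, so a coefficient of -0.8 is classed as high in the same way as +0.8.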
Page 12 of 30
Evidence of normal distribution in a data set:
(i) Skewness and kurtosis
(ii) Kolmogorov-Smirnov test: A non-significant result (P value of more than 0.05) will indicate normality
(iii) Shapiro-Wilk W test
(iv) Histogram
(v) Quantile-quantile (Q-Q) plot
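As a rough illustration of item (i), skewness and kurtosis can be computed directly from the data. The sketch below uses the simple moment-based (population) formulas, which is an assumption: statistical packages often apply a small-sample correction, so their values may differ slightly. The function name and the sample values are hypothetical. The tests in (ii) and (iii) are better left to a statistics package; in SciPy they are available as `scipy.stats.kstest` and `scipy.stats.shapiro`.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both are 0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance (population form)
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis

# Hypothetical SBP readings, for illustration only.
sbp = [112, 118, 120, 121, 123, 125, 126, 128, 131, 140]
skew, kurt = skewness_and_excess_kurtosis(sbp)
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```

Values close to zero for both measures are consistent with a normal distribution; a clearly positive skewness indicates a long right tail.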
Page 13 of 30
Distribution of the blood pressure according to body weight in both sexes.
120.34 75.81 116.63 71.45
r1 = 0.77 (p1 < 0.001); r2 = 0.76 (p2 < 0.001); r3 = 0.76 (p3 < 0.001); r4 = 0.63 (p4 < 0.001)
Page 14 of 30
Distribution of the study subjects according to BDI score in different years of study.
Spearman’s correlation coefficient (between academic year and BDI score): rho = -0.219, P = 0.003
Page 15 of 30
Correlation matrix
Page 16 of 30
Multiple correlation (R)
When there are two or more independent variables, the analysis of the relationship between the dependent variable and the
independent variables is known as multiple correlation (denoted by R).
Partial correlation
Partial correlation measures the relationship between two variables separately, in such a way that the effects of other
related variables are eliminated.
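For the simplest case of one controlled variable, partial correlation can be computed from the three pairwise correlation coefficients. The formula below is the standard first-order partial correlation; it is not shown on the slide and is included here only as a sketch. The function name and the example coefficients are hypothetical.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z.

    Assumes |r_xz| and |r_yz| are both below 1, otherwise the denominator is zero.
    """
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical coefficients: x = weight, y = SBP, z = age.
print(round(partial_correlation(r_xy=0.70, r_xz=0.50, r_yz=0.60), 3))
```

If the controlled variable is unrelated to both of the others (both of its coefficients are 0), the partial correlation equals the ordinary correlation.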
Page 17 of 30
Regression:
Correlation gives the degree and direction of the relationship between two variables, whereas regression analysis enables us to
predict the values of one variable (dependent) on the basis of one or more other variables (independent).
The regression coefficient measures the change in the dependent variable for a one-unit change in the independent variable.
Page 18 of 30
Dependent variable continuous
Example: SBP (systolic blood pressure)
Dependent variable discrete and dichotomous
Example: hypertensive (HTN) / normotensive
Page 19 of 30
Linear Regression
The dependent variable is quantitative (preferably continuous).
Linear regression is a parametric test and is based on a linear relationship between variables.
The correlation between the independent and dependent variables should be above 0.3.
Regression line: fitted by the method of least squares; the resulting line is called the least squares line.
Page 20 of 30
Assumptions for Linear Regression
The relationship between the two variables must be linear. In the scatter plot the points need not all fall exactly on the line,
but they should be closely scattered around it.
The r value should be above 0.3.
Linear regression is an extension of Pearson’s correlation coefficient.
There must be one dependent variable; it should be quantitative (preferably continuous), measured on a ratio scale, and
normally distributed.
Page 21 of 30
Assumptions for Linear Regression (continued)
Independent variables can be either qualitative or quantitative and may belong to any scale.
The sample size should be large.
Multicollinearity and singularity must be absent.
The data should be free of outliers.
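One simple way to screen for multicollinearity is to look at how strongly the independent variables correlate with each other. The sketch below covers only the two-predictor case, where the variance inflation factor (VIF) reduces to 1 / (1 - r²). The VIF is a common diagnostic that the slide does not name, so treat it as an assumption about how the check might be done. The function names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r between two lists of equal length."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two independent variables.

    Returns infinity when one predictor is an exact linear function of the
    other (singularity).
    """
    r = pearson_r(x1, x2)
    if 1 - r ** 2 <= 1e-12:
        return math.inf
    return 1 / (1 - r ** 2)

# Hypothetical predictors: body weight (kg) and BMI usually move together.
weight_kg = [52, 60, 65, 71, 78, 85]
bmi = [20.1, 22.5, 23.0, 25.8, 27.2, 30.4]
print(f"r = {pearson_r(weight_kg, bmi):.2f}, VIF = {vif_two_predictors(weight_kg, bmi):.1f}")
```

A VIF of 1 means the predictors are uncorrelated; large values suggest that one of the two variables adds little independent information.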
Page 22 of 30
Simple Linear Regression
y (total Cholesterol level) = a + b (calorie intake)
One dependent variable like cholesterol level
One independent variable like calorie intake
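The constants a (intercept) and b (regression coefficient) in this equation come from the method of least squares. A minimal sketch in plain Python follows; the function name and the calorie and cholesterol values are hypothetical and chosen only to illustrate the calculation.

```python
def least_squares_line(x, y):
    """Return (a, b) for the least squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # change in y per one-unit change in x
    a = mean_y - b * mean_x  # value of y when x is 0
    return a, b

# Hypothetical data, for illustration only.
calorie_intake = [1800, 2000, 2200, 2400, 2600, 2800]  # kcal/day
total_cholesterol = [170, 182, 190, 205, 211, 226]     # mg/dL

a, b = least_squares_line(calorie_intake, total_cholesterol)
print(f"cholesterol = {a:.1f} + {b:.4f} x calorie intake")
```

Here b is read as the expected change in total cholesterol for each additional unit of calorie intake.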
Page 23 of 30
Multivariable linear regression
y = a + bx1 + cx2 + dx3
y (total Cholesterol level) = a + b (calorie intake) + c (physical activity) + d (BMI)
b, c and d are the regression coefficients, also known as beta.
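With several independent variables, the coefficients a, b, c and d are still obtained by least squares, now by solving a small system of equations (the normal equations). The sketch below does this in plain Python so that it runs without extra packages. The helper names, the data, and the "true" coefficients used to generate y are all hypothetical; real analyses would normally use a statistics package.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_linear(rows, y):
    """Least squares fit of y = a + b*x1 + c*x2 + ...; returns [a, b, c, ...]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 gives the intercept a
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical subjects: (calorie intake in 1000 kcal/day, activity in h/week, BMI).
rows = [(2.0, 1, 22), (2.5, 3, 25), (3.0, 2, 30), (1.8, 5, 20), (2.2, 0, 27), (2.8, 4, 24)]
# y is generated from made-up coefficients so the fit can be checked by eye.
y = [100 + 20 * cal - 3 * act + 2 * bmi for cal, act, bmi in rows]

a, b, c, d = fit_linear(rows, y)
print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}, d = {d:.2f}")
```

Each coefficient is interpreted with the other independent variables held constant; for example, c is the expected change in cholesterol per extra hour of activity at fixed calorie intake and BMI.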
Page 24 of 30
Output:
The R-square value indicates the proportion of variance in the dependent variable that is explained by the model.
The ANOVA table in the output indicates the statistical significance of the model.
The role of each individual independent variable in relation to the dependent variable is explained by its regression
coefficient: unstandardized and standardized.
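R-square can be computed by comparing the model's prediction errors with the total variation of the dependent variable around its mean. A minimal sketch follows; the function name and the observed and predicted cholesterol values are hypothetical.

```python
def r_squared(observed, predicted):
    """Proportion of the variance in the observed values explained by the model."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_residual / ss_total

# Hypothetical observed and model-predicted cholesterol values (mg/dL).
observed = [180, 200, 220, 240]
predicted = [185, 195, 225, 235]
print(r_squared(observed, predicted))
```

A value near 1 means the model accounts for most of the variation; a value near 0 means it does little better than predicting the mean for everyone.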
Page 25 of 30
Interpretation of linear regression equation
Men aged 40 to 55 years
y (DBP) = 40 + 1.2 × age
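To read this equation: the regression coefficient 1.2 means that predicted DBP rises by 1.2 units for each additional year of age, and 40 is the intercept. The short sketch below evaluates the equation for a given age. The function name is hypothetical, the unit (mmHg) is assumed because the slide does not state it, and the age check reflects the stated study group, since predictions outside the observed range are not supported by the data.

```python
def predicted_dbp(age_years):
    """Evaluate y (DBP) = 40 + 1.2 x age for men aged 40 to 55 years."""
    if not 40 <= age_years <= 55:
        raise ValueError("equation applies only to ages 40 to 55 years")
    return 40 + 1.2 * age_years

for age in (40, 50, 55):
    print(age, "years ->", round(predicted_dbp(age), 1), "mmHg (unit assumed)")
```

For a man aged 50, the equation gives 40 + 1.2 × 50 = 100.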
Page 26 of 30
Logistic Regression
Used when the dependent variable is qualitative.
This qualitative dependent variable may be either dichotomous / binary (having two categories) or polychotomous (with
more than two categories).
Logistic regression is an example of nonlinear regression.
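The nonlinearity comes from the logistic function, which converts the linear combination a + bx into a probability between 0 and 1. The sketch below shows this for the binary case; the function name and the coefficients (a = -6.0, b = 0.1, with age as the independent variable) are hypothetical and not taken from any fitted model.

```python
import math

def logistic_probability(a, b, x):
    """Probability of the outcome (e.g. hypertension) for a given value of x."""
    return 1 / (1 + math.exp(-(a + b * x)))

# Hypothetical coefficients, for illustration only.
for age in (40, 60, 80):
    print(age, round(logistic_probability(a=-6.0, b=0.1, x=age), 3))
```

Equal steps in x do not produce equal steps in probability, which is why the fitted curve is S-shaped rather than a straight line.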
Page 27 of 30
Assumptions for Logistic Regression
There must be one dependent variable, qualitative in nature, with either two categories (dichotomous) or more than two
(polychotomous).
The independent variables can be either qualitative or quantitative and may belong to any scale.
Unlike linear regression, logistic regression can be performed straightaway without first checking correlation.
Page 28 of 30
Selection of independent variables is crucial, and one has to assess the fit of the model. Independent variables with a P value
of less than 0.25 in the simple regression model are considered for inclusion in the final model.
As with linear regression, the sample size should be large.
Multicollinearity and outliers should be absent.
Page 29 of 30
Important output of logistic regression model:
A significant Omnibus test and a non-significant Hosmer-Lemeshow test support good fit of the model.
Cox & Snell R² and Nagelkerke R²: variation in the dependent variable explained by the model.
Classification table: proportion of outcomes correctly classified by the model.
Adjusted odds ratio (AOR) / Exp(B): role of each individual independent variable; AOR is > 1 or < 1, reported with its 95% CI.
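Exp(B) is the exponential of the logistic regression coefficient B, and its 95% CI is obtained by exponentiating B ± 1.96 × standard error. The sketch below shows the arithmetic. The function names and the example values of B and its standard error are hypothetical; in practice they are read from the software output.

```python
import math

def adjusted_odds_ratio(b, standard_error, z=1.96):
    """Return (AOR, lower, upper): Exp(B) with its 95% confidence interval."""
    odds_ratio = math.exp(b)
    lower = math.exp(b - z * standard_error)
    upper = math.exp(b + z * standard_error)
    return odds_ratio, lower, upper

def interval_excludes_one(lower, upper):
    """An interval lying entirely above or below 1 indicates significance at the 5% level."""
    return lower > 1 or upper < 1

# Hypothetical coefficient and standard error for one independent variable.
aor, lower, upper = adjusted_odds_ratio(b=0.693, standard_error=0.2)
print(f"AOR = {aor:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print("significant:", interval_excludes_one(lower, upper))
```

An AOR above 1 means the variable is associated with higher odds of the outcome and an AOR below 1 with lower odds, after adjusting for the other variables in the model.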
Page 30 of 30
Thank you