
PAHS 306

HEALTH STATISTICS AND INFORMATION

Session 6 – Simple Linear Regression

Lecturer: Dr. Roger A. Atinga, UGBS


Contact Information: raatinga@ug.edu.gh

College of Education
School of Continuing and Distance Education
2014/2015 – 2016/2017
Session Outline

• Topic One – Definition of Regression


• Topic Two – Types of Regression

PAHS 306 Health Statistics and Information Slide 2


Regression analysis

Regression is used to model the presumed cause-and-effect relationship between two or more variables by establishing a functional relationship between them
Differences between correlation and regression

• Unlike correlation, regression specifies the direction of the presumed causal relationship between the independent and dependent variables

• The effect of each independent variable on the dependent variable can be determined

• The significance of each independent variable can be ascertained
Some uses of regression
• It is used for forecasting and prediction purposes

• It is used to describe the relationship between two or more variables and whether that relationship is direct or inverse

• It can be used for quality control purposes


Types of regression
• Simple linear regression: used to predict scores on one variable from scores on another variable
• Multiple regression: used to predict scores on a dependent variable from scores on a number of independent variables
Simple linear regression

A simple linear regression (SLR) has two variables of interest, e.g. X and Y

X denotes the independent variable

Y denotes the dependent variable

In SLR, Y can be predicted from X and vice versa

[Figure: SLR predicting Y on X]
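
Written as an equation, the two-variable setup described here is usually expressed as follows. The symbols β0, β1 and ε are standard textbook notation assumed for this sketch; they do not appear on this slide:

```latex
Y = \beta_0 + \beta_1 X + \varepsilon
```

Under this notation β0 is the intercept (the predicted Y when X = 0), β1 is the slope (the change in predicted Y for a one-unit change in X), and ε is the error term.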
Multiple regression
• Describes the effect of two or more independent variables on one dependent variable
• In other words, several predictor variables are used to explain one response variable
• Example: predicting health service use (Y) from income (X) and age (A)
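$$Y = \beta_0 + \beta_1 X + \beta_2 A + \varepsilon$$

where Y is the dependent variable, X and A are the independent variables, β0, β1 and β2 are the coefficients, and ε is the error term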
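
The two-predictor model can be estimated numerically. The sketch below is one possible illustration, not the course's prescribed method: it builds the normal equations and solves them with a small Gaussian elimination routine, using only the Python standard library. The function names `solve` and `fit_multiple`, and all data values, are hypothetical; the data are generated without any error term so that the known coefficients can be recovered exactly.

```python
def solve(a, b):
    """Solve the square linear system a * coef = b by Gaussian elimination."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - known) / m[i][i]
    return coef


def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + b2*x2 + ..."""
    design = [[1.0, *r] for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: income (X), age (A) and health service use (Y).
income = [1, 2, 3, 4, 5]
age = [30, 25, 40, 35, 50]
# Y is generated from known coefficients (1, 2, 0.5) with no error term.
use = [1 + 2 * x + 0.5 * a for x, a in zip(income, age)]

b0, b1, b2 = fit_multiple(list(zip(income, age)), use)
print(f"Y = {b0:.2f} + {b1:.2f} X + {b2:.2f} A")
```

With real survey data the error term would not be zero, so the fitted coefficients would only approximate the underlying relationship.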


The method of least squares
Regression line of best fit

• The method of least squares is a way of finding the line that best fits the data points

• This line of best fit either goes through or is close to as many of the data points as possible
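
For a single independent variable, the line of best fit can be computed directly from the data. The sketch below uses the standard closed-form least-squares formulas (slope = Sxy / Sxx, intercept = mean of Y minus slope times mean of X). The function name `fit_line` and the five data points are hypothetical and chosen only for illustration.

```python
# Hypothetical data points, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sxy: how X and Y vary together; Sxx: how X varies on its own.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0, b1 = fit_line(x, y)
print(f"fitted line: Y = {b0:.2f} + {b1:.2f} X")
```

A positive slope indicates a direct relationship between X and Y; a negative slope would indicate an inverse one.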
The method of least squares

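• The vertical differences between the observed data points and the line of best fit show how far the observed values of the dependent variable lie from the values the line predicts

• In regression these differences are usually called residuals

• The line of best fit is the one that makes the sum of the squared residuals as small as possible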
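
To make the idea of residuals concrete, the sketch below computes them for a small hypothetical data set and compares two candidate lines. The helper names `residuals` and `sum_of_squares` are illustrative, and the coefficients (2.2, 0.6) are assumed to be the least-squares fit for these particular points; the second line is an arbitrary alternative.

```python
# Hypothetical data points, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def residuals(x, y, intercept, slope):
    """Observed Y minus the Y predicted by the line, for each data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sum_of_squares(values):
    """Sum of the squared values."""
    return sum(v ** 2 for v in values)


# Least-squares line for this data (assumed: intercept 2.2, slope 0.6).
ls_res = residuals(x, y, 2.2, 0.6)
# An arbitrary alternative line, for comparison.
alt_res = residuals(x, y, 2.0, 0.7)

ssr_ls = sum_of_squares(ls_res)
ssr_alt = sum_of_squares(alt_res)
print("least-squares line, sum of squared residuals:", round(ssr_ls, 2))
print("alternative line, sum of squared residuals:", round(ssr_alt, 2))
```

The least-squares line gives the smaller sum of squared residuals, which is what "best fit" means under this method.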
Assessing the goodness of fit: sum of squares, R and R²
• One way of evaluating how well the line of best fit describes the data is to compute the R² value

• R² is called the coefficient of determination of the regression model and is a measure of goodness of fit (GoF)

• R² is used to assess the quality of the regression line
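
The slide title mentions sums of squares without giving a formula. A common textbook definition, stated here as an assumption about what the slide intends, is the following, where ŷ_i is the value predicted by the regression line for observation i and ȳ is the mean of the observed Y values:

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

SS_res is the sum of squared residuals (variability the line fails to explain) and SS_tot is the total sum of squares (total variability in Y).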


Assessing the goodness of fit: sum of squares, R and R²
• R² lies between 0 and 1

• The larger the R² value, the greater the explanatory power of the independent variable(s)

• A large R² value means the regression model is able to account for a larger proportion of the total variability in the observed values of Y
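
The interpretation of R² as a proportion of explained variability can be checked numerically. The sketch below fits a least-squares line and computes R² as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and both data sets are hypothetical.

```python
def r_squared(x, y):
    """R² of the least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variability left unexplained by the line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variability of Y around its mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: points scattered around a line.
x = [1, 2, 3, 4, 5]
y_scattered = [2, 4, 5, 4, 5]
# Hypothetical data: points lying exactly on the line Y = 1 + 2X.
y_exact = [3, 5, 7, 9, 11]

r2_scattered = r_squared(x, y_scattered)
r2_exact = r_squared(x, y_exact)
print("scattered points:", round(r2_scattered, 2))
print("points exactly on a line:", round(r2_exact, 2))
```

In the scattered example the line accounts for about 60% of the variability in Y; when every point lies on the line, R² reaches its maximum of 1.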
Reading List
• Unit Three, Section 5 of the reader – Health Statistics and Information Module by R. Atinga

• Daniel, W.W. (2009), Biostatistics: A Foundation for Analysis in the Health Sciences, John Wiley and Sons, New Baskerville. (Chapter 9)



THANK YOU
