
Chapter Four

Correlation and regression


Introduction
• Linear regression and correlation analysis study and measure the linear
relationship between two or more variables.

• When only two variables are involved, the analysis is referred to as simple
correlation and simple linear regression analysis.

• When there are more than two variables, the terms multiple regression and
partial correlation are used.
Linear regression analysis
Regression Analysis
• Regression analysis is a statistical technique that can be used to develop a mathematical equation showing
how variables are related.
• Regression analysis is the process of predicting the value of the dependent variable (Y)
on the basis of the known value of the independent variable (X).
• The primary objective of regression is prediction.
For instance,
• By how much will the yield per hectare increase if we increase the amount of fertilizer by 1
gram?
• By how much will a student's GPA increase if the student increases his reading time by one hour?
• As a businessman decreases his advertising budget, by how much will his profit
increase or decrease?

Determining the equation of regression line

• In order to predict (estimate) the value of y for a given value of x, or vice versa, we
must first determine the equation of the regression line mathematically.

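• The equation of the regression line of y on x for any two variables x and y is given by:

$$\hat{y} = a + bx$$

where $\hat{y}$ is the predicted (estimated) value of y,

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \quad \text{(the slope)}$$

and

$$a = \bar{y} - b\bar{x} \quad \text{(the y-intercept)}$$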
• If b > 0, then we say x and y are positively related

• If b < 0, then we say x and y are negatively related
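
The slope and intercept formulas translate directly into a few lines of code. The following is a minimal Python sketch, not part of the original notes: the function name `fit_line` and the sample data are hypothetical, and the data were chosen so that the points lie exactly on y = 1 + 2x.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = sum_y / n - b * sum_x / n              # intercept: y-bar minus b times x-bar
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
print("positively related" if b > 0 else "negatively related or unrelated")
```

Since the fitted slope here is positive, x and y are positively related in the sense described above.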

Example: a) Determine the equation of the regression line of y on x for the following data on
advertising expenditure and sales revenue.

b) Estimate the sales revenue if the advertising expenditure is 700 Birr.

Advertising expenditure (1000)   Sales revenue (1000)
x                                y
2                                10
3                                15
5                                12
8                                17
10                               18
12                               20
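
A worked sketch of part (a), obtained by substituting the column sums into the least-squares formulas for b and a. Part (b) rests on an assumption that is not confirmed by the source: both columns are taken to be in thousands of Birr, so an expenditure of 700 Birr corresponds to x = 0.7, which lies outside the observed range of x.

```latex
n = 6,\quad \sum x = 40,\quad \sum y = 92,\quad \sum xy = 681,\quad \sum x^2 = 346

b = \frac{6(681) - (40)(92)}{6(346) - (40)^2} = \frac{406}{476} \approx 0.853

a = \bar{y} - b\bar{x} \approx 15.333 - (0.8529)(6.667) \approx 9.647

\hat{y} \approx 9.647 + 0.853x

% Part (b), ASSUMING x is measured in thousands of Birr (700 Birr -> x = 0.7):
\hat{y} \approx 9.647 + 0.853(0.7) \approx 10.24
```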
• Example 2: The following data show the scores of 12 students in Accounting and
Statistics examinations.
• a) Fit a regression equation of Statistics on Accounting using least squares estimates.
• b) Predict the Statistics score if the Accounting score is 85.
Exercise: A car rental agency is interested in studying the relationship between the
distance driven in kilometers (Y) and the maintenance cost of its cars (X, in Birr).
The following summarized information is given, based on a sample of size 5.

a) Find the least squares regression equation of Y on X.
b) Compute the correlation coefficient and interpret it.
c) Estimate the maintenance cost of a car that has been driven for 6 km.
Linear Correlation
• Regression analysis helps us in predicting the value of one variable on the basis of the
given value of another variable, when these two variables are related to each other.

• Correlation analysis helps us determine the strength of the linear relationship between
the two variables x and y; in other words, how strongly the two variables are
correlated in cases where the relationship is linear.

• A measure that expresses the extent to which two variables are related is called the
correlation coefficient.

• It is a measure of the strength of the relationship between the variables.


• The degree of linear relationship between two variables x and y can be assessed by
any of the following methods:
  • By presenting the data in a scatter diagram
  • By calculating Pearson's coefficient of correlation (r)
  • By calculating the rank correlation coefficient

I. The scatter diagram

• The form of the relationship between two variables can usually be presented in a scatter
diagram. A scatter diagram is a graphic method used to visually summarize the
relationship between two variables.

• The horizontal axis represents the independent variable x and the vertical axis
represents the dependent variable y.

• Whether the variables are positively related, negatively related, or unrelated
can be seen from the scatter diagram.

• Now consider the following scatter diagrams.

• In a, b and c the variables are positively related, whereas in d, e and f they are
negatively related.

• In c and f the variables are perfectly correlated.



II. Correlation coefficient
• The correlation coefficient is used to measure the degree and direction of the linear
relationship between two quantitative variables.

• A number of statistical procedures can quantify a linear relationship.

Pearson Correlation Coefficient

• Suppose we have two random variables, X and Y. The Pearson correlation coefficient is

$$r = r_{xy} = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

• The correlation coefficient can assume any value in the interval from -1 to 1, that is, $-1 \le r \le 1$.


• Interpretation of r
1. Perfect positive linear relationship (if r = 1)
2. Some positive linear relationship (if r is between 0 and 1)
3. No linear relationship (if r = 0)
4. Some negative linear relationship (if r is between -1 and 0)
5. Perfect negative linear relationship (if r = -1)
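
The five cases above can be written as a small lookup. This is an illustrative Python sketch only: the function name `describe_r` and the tolerance `tol` (used because a computed r is rarely exactly 0 or ±1 in floating-point arithmetic) are assumptions, not part of the original notes.

```python
import math


def describe_r(r, tol=1e-9):
    """Map a correlation coefficient to one of the five interpretations."""
    if not (-1 - tol <= r <= 1 + tol):
        raise ValueError("r must lie between -1 and 1")
    if math.isclose(r, 1.0, abs_tol=tol):
        return "perfect positive linear relationship"
    if math.isclose(r, -1.0, abs_tol=tol):
        return "perfect negative linear relationship"
    if math.isclose(r, 0.0, abs_tol=tol):
        return "no linear relationship"
    if r > 0:
        return "some positive linear relationship"
    return "some negative linear relationship"


for value in (1.0, 0.6, 0.0, -0.6, -1.0):
    print(value, "->", describe_r(value))
```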
• Examples:
1. Calculate the simple correlation coefficient between the mid-semester and final exam scores of 10
students (both out of 50).
Solution

Exercise: Compute the value of the correlation coefficient for the data
obtained in a study of the number of absences and the final grades of
seven students in a statistics class, given in the following table.

Student   Number of absences (x)   Final grade (%) (y)
A         6                        82
B         2                        86
C         15                       43
D         9                        74
E         12                       58
F         5                        90
G         8                        78
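
One way to check a hand computation for this exercise is to code the Pearson formula directly. The sketch below is a minimal illustration, not part of the original notes: the function name `pearson_r` is hypothetical, and the data are the seven (absences, grade) pairs from the table.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator


absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]
r = pearson_r(absences, grades)
print(round(r, 3))  # about -0.944
```

A value this close to -1 indicates a strong negative linear relationship: in this sample, students with more absences tend to have lower final grades.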

III. Rank correlation

Many characteristics, such as beauty, honesty and intelligence, are expressed
in comparative terms.

In such cases the units are ranked with respect to the particular characteristic
instead of being measured.


• Steps
i. Rank the items in X and in Y.
ii. Find the difference between the ranks in each pair and denote it by D_i.
iii. Use the following formula.

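The coefficient of rank correlation is given by the formula

$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$

where $D_i$ is the difference between the ranks of the i-th pair and n is the number of pairs.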
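
The ranking steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the notes' own method: it assumes the coefficient meant here is Spearman's rank correlation computed from the rank differences D_i, it assumes there are no tied values, and the function names and sample scores are hypothetical.

```python
def rank_values(values):
    """Rank values from 1 (smallest) to n; ties are not handled in this sketch."""
    ordered = sorted(values)
    if len(set(ordered)) != len(ordered):
        raise ValueError("tied values are not handled in this sketch")
    position = {v: i + 1 for i, v in enumerate(ordered)}
    return [position[v] for v in values]


def rank_correlation(xs, ys):
    """Spearman's coefficient: 1 - 6*sum(D_i^2) / (n*(n^2 - 1)), assuming no ties."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    rank_x = rank_values(xs)                       # step i: rank the items in X
    rank_y = rank_values(ys)                       # step i: rank the items in Y
    sum_d2 = sum((rx - ry) ** 2                    # step ii: rank differences D_i
                 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))     # step iii: apply the formula


# Hypothetical scores given by two judges to five items
judge_x = [10, 20, 30, 40, 50]
judge_y = [1, 3, 2, 5, 4]
r_s = rank_correlation(judge_x, judge_y)
print(round(r_s, 3))
```

For these hypothetical scores the squared rank differences sum to 4, so the coefficient works out to 1 - 24/120 = 0.8.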
