LEARNING UNIT 3: LINEAR REGRESSION AND CORRELATION ANALYSIS

LEARNING OUTCOMES:
• Understand the meaning of regression analysis;
• Construct a simple regression model;
• Understand correlation analysis and relationships between variables;
• Understand and calculate Pearson’s correlation coefficient;
• Interpret the correlation coefficient;
• Calculate and interpret the coefficient of determination.

3.1 Simple linear regression analysis


Regression analysis and correlation analysis are two statistical methods that aim to
quantify the relationship between two numeric variables and to measure the strength
of this relationship.

One variable is called the independent or predictor variable, x, and the other is
called the dependent or response variable, y.

Regression Equation
Regression analysis finds the equation of the straight line that best fits the
scatter points. A straight-line graph is defined as follows:
$$\hat{y} = b_0 + b_1 x$$
where: $x$ = values of the independent variable
$\hat{y}$ = estimated values of the dependent variable
$b_0$ = y-intercept coefficient (where the regression line cuts the y-axis)
$b_1$ = slope (gradient) coefficient of the regression line

Regression analysis uses the method of least squares to find the best-fitting
straight-line equation, $\hat{y} = b_0 + b_1 x$, where

$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$b_0 = \frac{\sum y - b_1 \sum x}{n}$$
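
The two formulas above follow from minimising the sum of squared vertical distances between the observed points and the line. A short sketch of that derivation is given here; the symbol $S$ and the index $i$ are notation introduced for illustration and do not appear in the original notes.

$$S(b_0, b_1) = \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2$$

Setting the partial derivatives of $S$ with respect to $b_0$ and $b_1$ equal to zero gives two equations:

$$\sum y = n b_0 + b_1 \sum x, \qquad \sum xy = b_0 \sum x + b_1 \sum x^2$$

The first equation rearranges to $b_0 = \left(\sum y - b_1 \sum x\right)/n$. Substituting this into the second equation and multiplying through by $n$ gives

$$n\sum xy = \sum x \sum y + b_1\left[n\sum x^2 - \left(\sum x\right)^2\right]$$

which rearranges to the slope formula for $b_1$.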

[Figure: scatter plot titled "Flat-screen TV sales per week", with Adverts on the horizontal axis and Sales on the vertical axis, and the fitted line f(x) = 4.368x + 12.816 superimposed.]
Fig. 3.1 Scatter plot of TV sales, with superimposed regression equation.

Example 3.1
Calculation of the regression coefficients $b_0$ and $b_1$ for the flat-screen TV sales.

Adverts (x)    Sales (y)    x²     xy
4              26           16     104
4              28           16     112
3              24           9      72
2              18           4      36
5              35           25     175
2              24           4      48
4              36           16     144
3              25           9      75
5              31           25     155
5              37           25     185
3              30           9      90
4              32           16     128
Σx = 44        Σy = 346     Σx² = 174    Σxy = 1324

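From the table, with n = 12:

$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{12(1324) - (44)(346)}{12(174) - (44)^2} = \frac{664}{152} = 4.368$$

$$b_0 = \frac{\sum y - b_1 \sum x}{n} = \frac{346 - 4.368(44)}{12} = 12.817$$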

The simple linear regression equation to estimate flat-screen TV sales is given by:
$$\hat{y} = 12.817 + 4.368x$$
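
The hand calculation in Example 3.1 can be checked with a few lines of code. The following is a minimal sketch in plain Python, assuming the twelve (adverts, sales) pairs from the table; the function name `least_squares_line` and the variable names are illustrative choices and are not part of the original notes.

```python
# Data from Example 3.1: weekly adverts (x) and flat-screen TV sales (y).
adverts = [4, 4, 3, 2, 5, 2, 4, 3, 5, 5, 3, 4]
sales = [26, 28, 24, 18, 35, 24, 36, 25, 31, 37, 30, 32]


def least_squares_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    # Slope: (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2), as in the formula for b1.
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: (Sy - b1*Sx) / n, using the unrounded slope.
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


b0, b1 = least_squares_line(adverts, sales)
print(f"b1 = {b1:.3f}")  # b1 = 4.368
print(f"b0 = {b0:.3f}")  # b0 = 12.816
```

The intercept prints as 12.816 because the code carries the unrounded slope through the calculation; the hand calculation uses the slope rounded to 4.368 and therefore arrives at 12.817. The difference is a rounding effect only.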
Estimate y-values using the regression equation

The regression equation can now be used to estimate y-values from known x-values
by substituting a given x-value into the regression equation.
Example 3.2
Using the regression equation from Example 3.1, what will the average sales of
flat-screen TVs be in a week when 6 advertisements are placed?
Solution
Substitute x = 6 into the regression equation:
$$\hat{y} = 12.817 + 4.368x$$

$$\hat{y} = 12.817 + 4.368(6) = 39.025$$

On average, about 39 flat-screen TVs are expected to be sold.
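
The substitution in Example 3.2 can be written as a small function. This is a sketch only, assuming the rounded coefficients 12.817 and 4.368 from Example 3.1; the function name `predict_sales` is hypothetical.

```python
def predict_sales(adverts, b0=12.817, b1=4.368):
    """Estimate weekly TV sales from the number of adverts: y-hat = b0 + b1 * x."""
    return b0 + b1 * adverts


estimate = predict_sales(6)
print(f"{estimate:.3f}")  # 39.025
print(round(estimate))    # 39
```

Calling `predict_sales` with any other number of adverts gives the corresponding estimate from the same regression line.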

3.2 Correlation Analysis


Correlation analysis measures the strength of the linear association between two
numeric variables x and y. This measure is called Pearson’s correlation coefficient
and is represented by the symbol r when calculated from sample data.
The following formula is used to calculate the sample correlation coefficient:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

where: $r$ = the sample correlation coefficient
$x$ = the values of the independent variable
$y$ = the values of the dependent variable
$n$ = the number of paired data points in the sample

Example 3.3
Refer to Example 3.1. Find the sample correlation coefficient, r, between the number
of adverts placed and flat-screen TV sales. Comment on the strength of the linear
relationship.
Solution:

Adverts (x)    Sales (y)    x²     xy      y²
4              26           16     104     676
4              28           16     112     784
3              24           9      72      576
2              18           4      36      324
5              35           25     175     1225
2              24           4      48      576
4              36           16     144     1296
3              25           9      75      625
5              31           25     155     961
5              37           25     185     1369
3              30           9      90      900
4              32           16     128     1024
Σx = 44        Σy = 346     Σx² = 174    Σxy = 1324    Σy² = 10336

Then:

$$r = \frac{12(1324) - (44)(346)}{\sqrt{\left[12(174) - (44)^2\right]\left[12(10336) - (346)^2\right]}} = \frac{664}{\sqrt{152 \times 4316}} = 0.8198$$

Since r = 0.8198 is positive and lies between +0.5 and +1, there is a strong positive
linear relationship between the number of adverts placed and flat-screen TV sales.
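
The calculation of r in Example 3.3 can be reproduced in code. The following is a minimal sketch in plain Python, assuming the same twelve data pairs; the function name `pearson_r` is an illustrative choice.

```python
import math

# Data from Example 3.1 / 3.3: weekly adverts (x) and flat-screen TV sales (y).
adverts = [4, 4, 3, 2, 5, 2, 4, 3, 5, 5, 3, 4]
sales = [26, 28, 24, 18, 35, 24, 36, 25, 31, 37, 30, 32]


def pearson_r(x, y):
    """Sample Pearson correlation coefficient from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(adverts, sales)
print(f"r = {r:.4f}")  # r = 0.8198
```

On Python 3.10 or later, `statistics.correlation(adverts, sales)` from the standard library should return the same value and can serve as an independent check.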
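How to interpret a correlation coefficient:
A correlation coefficient is a number that lies between -1 and +1:
$$-1 \le r \le +1$$
The sign of r gives the direction of the relationship, and its distance from 0 gives
the strength:

Value of r          Interpretation
-1                  Perfect negative correlation
-1 to -0.5          Strong negative correlation
-0.5 to -0.25       Moderate negative correlation
-0.25 to 0          Weak negative correlation
0                   No correlation
0 to +0.25          Weak positive correlation
+0.25 to +0.5       Moderate positive correlation
+0.5 to +1          Strong positive correlation
+1                  Perfect positive correlation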
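
The scale above can be turned into a small helper that labels a given r. This is a sketch under one assumption that the notes do not settle: a value exactly on a cut-off (0.25 or 0.5 in absolute value) is assigned to the stronger band. The function name `describe_correlation` is hypothetical.

```python
def describe_correlation(r):
    """Label r using the cut-offs 0.25 and 0.5 from the interpretation scale.

    Assumption: a value exactly on a cut-off falls in the stronger band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"perfect {direction}"
    if size >= 0.5:
        strength = "strong"
    elif size >= 0.25:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


print(describe_correlation(0.8198))  # strong positive
```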

The $r^2$ coefficient (the coefficient of determination)


The coefficient of determination measures the proportion or percentage of variation
in the dependent variable, y, that is explained by the independent variable, x. The
coefficient of determination ranges between 0 and 1 (or 0% and 100%).
Interpretation of the $r^2$ coefficient:
When: $r^2 = 0$, there is no linear association between x and y
$r^2 = 1$, there is perfect linear association between x and y
$0 < r^2 < 1$: the strength of association depends on how close $r^2$ lies to either
0 or 1.
• When $r^2$ lies closer to 0, there is a weak association between x and y
• When $r^2$ lies closer to 1, there is a strong association between x and y
Example 3.4
Calculate the sample coefficient of determination, $r^2$, between the number of adverts
placed and flat-screen TV sales in Example 3.3.
Solution:
Given r = 0.8198, then $r^2 = (0.8198)^2 = 0.6721$
This means that about 67.21% of the variation in weekly flat-screen TV sales is
explained by the number of adverts placed, which indicates a moderate to strong
association between adverts and sales.
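
To illustrate what "proportion of variation explained" means, the sketch below computes $r^2$ a second way: as one minus the ratio of unexplained variation to total variation around the mean of y. The names `sst` (total variation) and `sse` (unexplained variation) are introduced here for illustration and are not used in the original notes; the data are the twelve pairs from Example 3.1.

```python
# Data from Example 3.1: weekly adverts (x) and flat-screen TV sales (y).
adverts = [4, 4, 3, 2, 5, 2, 4, 3, 5, 5, 3, 4]
sales = [26, 28, 24, 18, 35, 24, 36, 25, 31, 37, 30, 32]

n = len(adverts)
sum_x, sum_y = sum(adverts), sum(sales)
sum_x2 = sum(x * x for x in adverts)
sum_xy = sum(x * y for x, y in zip(adverts, sales))

# Least-squares coefficients, using the formulas for b1 and b0.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n

mean_y = sum_y / n
# Total variation of y around its mean.
sst = sum((y - mean_y) ** 2 for y in sales)
# Variation left unexplained by the regression line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(adverts, sales))

r_squared = 1 - sse / sst
print(f"r^2 = {r_squared:.4f}")  # r^2 = 0.6721
```

For simple linear regression this agrees with squaring the correlation coefficient, as in Example 3.4.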