Chapter 3 STA104 : Correlation and Simple Linear Regression DEC2016-APRIL2017

CHAPTER 3: CORRELATION AND SIMPLE LINEAR REGRESSION

3.1 CORRELATION
Correlation is a statistical method used to determine whether a relationship between variables
exists. The coefficient used to measure the strength of the relationship between two
variables is called the correlation coefficient. The sample correlation coefficient takes
values ranging from -1 to 1.

3.2 DEPENDENT AND INDEPENDENT VARIABLES


Dependent Variable: The variable that is being predicted or estimated. It is scaled on the Y-axis.
Independent Variable: The variable that provides the basis for estimation. It is the predictor
variable. It is scaled on the X-axis.

3.3 SCATTER DIAGRAM


A scatter diagram helps describe the nature of the relationship between the independent and
dependent variables. Scatter diagrams can show the different possible correlation categories,
namely positive correlation, negative correlation, no correlation, perfect positive correlation and
perfect negative correlation.

a) Positive correlation
Positive correlation shows the existence of a positive relationship between two variables, x and y.
The direction of change for both variables is the same. If x increases, then y would increase too.


b) Negative correlation
Negative correlation shows the existence of a negative relationship between two variables, x
and y; that is, x and y change in opposite directions. If x increases, y
would decrease.

c) No correlation
No correlation simply means that no relationship exists between two variables, x and y. We
cannot relate the changes that occur in x to the changes in y in any way.

d) Perfect negative and perfect positive correlation


Example 3.1
The table shows the finance and mathematics test scores of students in a faculty.

Finance (x)        39  12  21  64  57  47  28  75  34
Mathematics (y)    65  35  52  82  92  89  73  98  56

Plot a scatter diagram and determine whether there is a relationship between finance test scores
and mathematics test scores.

Example 3.2
Fuzi Company supplies prawns to restaurants. The demand for prawns depends on the price per
kg. The data are shown below.

Price per kg (RM)    20   22   24   26   28   30   32
Sales (kg)          600  550  480  450  400  330  250

Draw a scatter diagram and state the type of relationship between price per kg and sales.
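
As a rough illustration of what a scatter diagram for Example 3.2 looks like, the sketch below draws a crude text plot using only the Python standard library. The function name `ascii_scatter` and the grid size are assumptions made for this illustration; in practice a plotting library or graph paper would be used.

```python
def ascii_scatter(x, y, width=40, height=12):
    """Return a crude text scatter plot: x grows to the right, y grows upwards.

    Assumes x and y each contain at least two distinct values.
    """
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Scale each point onto the character grid.
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((yi - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 of the grid is the top line
    return "\n".join("".join(line).rstrip() for line in grid)


# Data from Example 3.2
price = [20, 22, 24, 26, 28, 30, 32]          # RM per kg
sales = [600, 550, 480, 450, 400, 330, 250]   # kg

plot = ascii_scatter(price, sales)
print(plot)
```

The points run from the top left to the bottom right of the grid, which is the pattern described above for a negative correlation.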

3.4 CORRELATION ANALYSIS

Correlation Analysis is a statistical method used to measure the strength of the relationship
between two variables. The Coefficient of Correlation (r) is a measure of the strength of the
relationship between two variables. It requires interval or ratio-scaled data. The sample
correlation coefficient would take on values ranging from -1 to 1.

(Figure: scale of r running from -1 through 0 to 1.)

Linear correlation coefficient (r):

1. Pearson’s Product Moment Correlation Coefficient.

2. Spearman’s Coefficient of Rank Correlation.


3.4.1 PEARSON’S PRODUCT MOMENT CORRELATION COEFFICIENT.

The sample correlation coefficient is denoted by r_xy.

\[
r_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}
              {\sqrt{\left[\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right]
                     \left[\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}\right]}}
\]

Example 3.3
The table shows the finance and mathematics test scores of students in a faculty.

Finance (x)        39  12  21  64  57  47  28  75  34
Mathematics (y)    65  35  52  82  92  89  73  98  56

Calculate Pearson’s Product Moment Correlation Coefficient and comment on the value
obtained.
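
The calculation in Example 3.3 can be checked with a short script. This is a minimal sketch of the formula in Section 3.4.1 in plain Python; the function name `pearson_r` and the variable names are assumptions made for this illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product moment correlation coefficient (computational formula)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Data from Example 3.3
finance = [39, 12, 21, 64, 57, 47, 28, 75, 34]
mathematics = [65, 35, 52, 82, 92, 89, 73, 98, 56]

r = pearson_r(finance, mathematics)
print(round(r, 4))  # prints 0.8976
```

A value this close to 1 points to a strong positive linear relationship between the two sets of scores.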


Example 3.4
Fuzi Company supplies prawns to restaurants. The demand for prawns depends on the price per
kg. The data are shown below.

Price per kg (RM)    20   22   24   26   28   30   32
Sales (kg)          600  550  480  450  400  330  250

Calculate the correlation coefficient. What can be said about the relationship?


3.4.2 SPEARMAN’S COEFFICIENT OF RANK CORRELATION.

• Spearman’s rank correlation coefficient is a measure of association between two
variables (at least of ordinal scale).
• It is suitable for qualitative data.
• For example: ratings of popular newspapers, beauty, etc.
• For quantitative data, however, the variables must first be ranked, and Spearman’s rank
correlation is then calculated from these rankings.
• It may be difficult to determine which variable is dependent and which is independent.

Steps in Calculating Spearman’s Coefficient of Rank Correlation

Step 1: Rank each data set in ascending order, noting that:
a) If there are no tied values, rank the values from the lowest to the highest.
b) If there are tied values, give each of them the average of the ranks they would
otherwise occupy.

Step 2: Calculate the difference, d, between the ranks for each pair of values.
d = rank of x – rank of y

Step 3: Determine the value of Spearman’s coefficient of rank correlation, r_s, as follows:

\[
r_s = 1 - \frac{6 \sum d^2}{n\left(n^2 - 1\right)}
\]
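
The three steps above can be sketched in plain Python. The helper names `average_ranks` and `spearman_rs` are assumptions made for this illustration. Note that the formula in Step 3 is exact only when there are no tied values; with ties, applying it to averaged ranks, as these notes do, gives an approximation.

```python
def average_ranks(values):
    """Step 1: rank in ascending order (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Steps 2 and 3: rank differences d, then r_s = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Tied values: the three scores of 80 would occupy ranks 5, 6 and 7, so each gets 6.
print(average_ranks([80, 70, 60, 35, 80, 48, 80]))  # [6.0, 4.0, 3.0, 1.0, 6.0, 2.0, 6.0]

# Data from Example 3.6 (already ranks, no ties)
first_woman = [2, 5, 3, 6, 1, 4, 7, 8]
second_woman = [4, 3, 2, 6, 1, 8, 5, 7]
print(round(spearman_rs(first_woman, second_woman), 4))  # 0.6429
```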


Example 3.5
The table shows the finance and mathematics test scores of students in a faculty.

Finance (x)        80  70  60  35  80  48  80
Mathematics (y)    50  60  80  40  75  60  90

Calculate Spearman’s Coefficient of Rank Correlation and interpret its meaning.


Example 3.6
Two women rank eight flower bouquets in a competition as follows:

Flower       A  B  C  D  E  F  G  H
1st woman    2  5  3  6  1  4  7  8
2nd woman    4  3  2  6  1  8  5  7

Calculate Spearman’s Coefficient of Rank Correlation and interpret the meaning of the value
obtained.


3.5 SIMPLE LINEAR REGRESSION


Simple linear regression describes a linear relationship involving only two variables. One is the
dependent variable (y) and the other is the independent variable (x). The dependent variable is the
variable in regression that cannot be controlled or manipulated. Besides helping to determine the
type of relationship between two variables, regression analysis also helps predict the value of the
dependent variable when the value of the independent variable is given.

3.5.1 FITTING THE SIMPLE LINEAR REGRESSION EQUATION USING THE LEAST SQUARES METHOD.

The linear regression equation can be represented by:

\[
\hat{y} = a + bx
\]

where a = y-intercept and b = slope.

Thus the regression equation can be calculated using the following formula:

\[
b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}},
\qquad
a = \frac{\sum y}{n} - b\left(\frac{\sum x}{n}\right)
\]

where n = number of observations.

Interpretation:
Slope b
• If the slope is positive, a positive relationship exists between the two variables; if it is
negative, a negative relationship exists.
• A positive slope b indicates that for every unit increase in the independent variable (x), the
dependent variable (y) would increase by b units.
• A negative slope -b indicates that for every unit increase in the independent variable (x), the
dependent variable (y) would decrease by b units.

Intercept a
When the independent variable (x) = 0, the dependent variable (y) would be a.
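
The least squares formulas above can be sketched in plain Python. The function names `least_squares_fit` and `predict` are assumptions made for this illustration, and the data are taken from Example 3.7.

```python
def least_squares_fit(x, y):
    """Return (a, b) for the fitted line y_hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # slope
    a = sum_y / n - b * (sum_x / n)                               # y-intercept
    return a, b


def predict(a, b, x_new):
    """Estimate the dependent variable for a given value of x."""
    return a + b * x_new


# Data from Example 3.7
quantity = [10, 13, 20, 16, 17, 15, 18, 14, 11, 12]   # '000 units
cost = [20, 28, 38, 35, 32, 30, 31, 29, 23, 25]       # RM'000

a, b = least_squares_fit(quantity, cost)
print(f"y_hat = {a:.2f} + {b:.2f}x")  # y_hat = 5.97 + 1.58x
```

Read against the interpretation rules above, a slope of about 1.58 suggests that each additional 1,000 units produced raises the estimated cost by roughly RM1,580, and `predict(a, b, x_new)` gives the estimated cost for a chosen quantity.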


Example 3.7
The table gives the daily production quantity and the estimated production cost of Butik Ceria Sdn.
Bhd. The data were collected continuously over a period of 10 days. Using the least squares method,
find the regression line for cost against quantity. Interpret the meaning of the slope and intercept
of the regression line.

Day    Quantity (’000 units)    Cost (RM’000)
1      10                       20
2      13                       28
3      20                       38
4      16                       35
5      17                       32
6      15                       30
7      18                       31
8      14                       29
9      11                       23
10     12                       25


Example 3.8
The number of kilograms of steam used per month by a chemical plant is thought to be related to
the average ambient temperature (in °F) for that month. The past year’s usage and temperature
are shown in the following table. Using the least squares method, find the regression line for
steam usage against temperature. Interpret the meaning of the slope of the regression line.

Month    Temperature (X)    Steam Usage (Y)
1        21                 185.79
2        24                 214.47
3        32                 288.03
4        47                 424.84
5        50                 454.58
6        59                 539.03
7        68                 621.55
8        74                 675.06
9        62                 562.03
10       50                 452.93
11       41                 369.95
12       30                 273.98


3.6 COEFFICIENT OF DETERMINATION

• The coefficient of determination (r²) is the proportion of the total variation in the
dependent variable (Y) that is explained or accounted for by the variation in the
independent variable (X).
• It is the square of the coefficient of correlation.
• It ranges from 0 to 1.

Example 3.9
The correlation coefficient for the strength of the relationship between the marks of Additional
Mathematics and Physics for 9 students was found to equal 0.8976. Find the coefficient of
determination and interpret the value.

Solution 3.9
Coefficient of determination, r² = (0.8976)² = 0.8057

This means that 80.57% of the variation in the dependent variable is explained by the model,
while the remaining 0.1943, or 19.43%, is caused by random errors. Therefore, the model built is quite
good.
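
As a quick check of the arithmetic in Solution 3.9, the sketch below squares r and splits the result into explained and unexplained percentages. Squaring 0.8976 gives 0.80568576, which is 0.8057 to four decimal places. The function name `coefficient_of_determination` is an assumption made for this illustration.

```python
def coefficient_of_determination(r):
    """Square a correlation coefficient r (which must lie in [-1, 1]) to get r^2."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    return r ** 2


r_squared = coefficient_of_determination(0.8976)   # r from Example 3.9
explained_pct = 100 * r_squared
unexplained_pct = 100 - explained_pct

print(f"r^2 = {r_squared:.4f}")                 # r^2 = 0.8057
print(f"explained = {explained_pct:.2f}%")      # explained = 80.57%
print(f"unexplained = {unexplained_pct:.2f}%")  # unexplained = 19.43%
```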

Example 3.10
The correlation coefficient between the marks of 10 students in Malay and English Language
quizzes is 0.4909. Find the coefficient of determination and interpret the value.


Example 3.11
The table below shows the advertising expenditures and company sales for Exandra Company
over the past 12 weeks.

Week    Advertising expenditure (RM’000)    Company sales (RM’000)
1       1.4                                 180
2       1.6                                 184
3       2.0                                 215
4       2.2                                 220
5       2.4                                 225
6       1.6                                 185
7       2.0                                 224
8       2.6                                 187
9       1.5                                 178
10      2.1                                 222
11      1.7                                 190
12      2.3                                 215

a) Draw a scatter diagram for the above data.
b) Compute the product moment correlation coefficient and explain its meaning.
c) Using the least squares method, find the regression line for company sales against
advertising expenditure. Interpret the meaning of the slope of the regression line.
d) Calculate the coefficient of determination and interpret the value.
e) Estimate the company sales if the advertising expenditure is 2.8 (RM’000).
