CHAPTER 5

BIVARIATE ANALYSIS
Lecturer: Dr Ruzanita Mat Rani
Prepared By:
HAZIYAH BINTI MD JASMIN
CHAPTER OUTLINE
5.0 INTRODUCTION
5.1 SCATTER DIAGRAM/SCATTER PLOT
5.2 CORRELATION
1. Pearson Correlation Coefficient (r)
2. Coefficient of Determination (R²)
5.3 SIMPLE LINEAR REGRESSION (The Least Squares Method / Least Squares Regression Line)
1. Estimated Regression Coefficients
2. Estimating The Dependent Variable
5.0
INTRODUCTION
What is Bivariate Analysis?
■ The analysis involving TWO QUANTITATIVE VARIABLES (X and Y).
■ One of the simplest forms of statistical analysis, used to find out whether there is a relationship between two sets of values.

Types of Variables
■ Independent Variable (X): also known as the Regressor/Predictor/Factor.
■ Dependent Variable (Y): also known as the Response Variable.

Method of Analysis
■ Simple Linear Regression: a statistical method that allows us to summarize and study the linear relationship between two continuous (quantitative) variables.

EXAMPLE: The management of a chain of fast-food restaurants wishes to determine whether sales are related to advertising expenditure.
Ind. Var (X) : Advertising Expenditure
Dep. Var (Y) : Sales
5.1
SCATTER DIAGRAM
■ Also known as a Scatter Plot.
■ A graphical analysis which shows the DIRECTION of the relationship between the variables.
■ The dots on the diagram represent the pairs of observations (x, y), where x is the value of the independent variable and y is the value of the dependent variable.
Types/Direction of Relationship

[Figure: three scatter diagrams of Tensile Strength (y) against Percentage of Hardwood (x), one for each type of relationship.]

■ Positive linear relationship: as the values of x increase, the values of y increase.
■ Negative linear relationship: as the values of x increase, the values of y decrease.
■ No linear relationship: as the values of x increase, the values of y show no consistent increase or decrease.
5.2
CORRELATION
■ CORRELATION: a measurement used to determine the DIRECTION and the STRENGTH of the linear relationship between the two variables (X & Y).
■ CORRELATION COEFFICIENT (r): a value/coefficient obtained to determine the direction and the strength of the linear relationship between the two variables (X & Y).
■ COEFFICIENT OF DETERMINATION (R²): a value/coefficient obtained to measure the proportion of the total variation in Y that is explained by the independent variable X in the regression model.
5.2.1
PEARSON CORRELATION COEFFICIENT (r)
■ Also known as the Pearson Correlation Coefficient or the Pearson Product-Moment Correlation Coefficient.
■ It is a measurement used to measure the strength and determine the direction of the linear relationship between the X and Y variables.
■ The population correlation coefficient is denoted by ρ (the Greek letter “rho”) and the sample correlation coefficient is denoted by r. Its value ranges from -1 to +1.
■ The formula used to calculate the correlation coefficient is:

$$
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}}
  = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
  = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}
$$

where Σx = sum of the x values
Σy = sum of the y values
Σxy = sum of the products of the x and y values
Σx² = sum of the squared values of x
Σy² = sum of the squared values of y
n = number of paired observations
CORRELATION COEFFICIENT (r)
■ The following table and figure give a simple guideline for interpreting the values of r:

Values of r          Interpretation
r = -1               Perfect negative linear relationship
-1 < r ≤ -0.7        Strong negative linear relationship
-0.7 < r ≤ -0.5      Moderate negative linear relationship
-0.5 < r < 0         Weak negative linear relationship
r = 0                No linear relationship
0 < r < 0.5          Weak positive linear relationship
0.5 ≤ r < 0.7        Moderate positive linear relationship
0.7 ≤ r < 1          Strong positive linear relationship
r = 1                Perfect positive linear relationship

[Figure: number line from -1 to +1. r = -1 is perfect negative correlation, r = 0 is no correlation and r = +1 is perfect positive correlation; values below 0 indicate negative correlation and values above 0 indicate positive correlation.]
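The guideline can also be written as a small lookup function. This is only a sketch in Python: the function name `interpret_r` is made up for illustration, and it assumes the negative ranges mirror the positive ones (for example, r = -0.6 is read as "moderate negative").

```python
def interpret_r(r: float) -> str:
    """Return the guideline interpretation for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No linear relationship"

    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # strength depends only on the magnitude of r

    if size == 1:
        strength = "Perfect"
    elif size >= 0.7:
        strength = "Strong"
    elif size >= 0.5:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction} linear relationship"


print(interpret_r(0.909))
print(interpret_r(-0.6))
```

The cut-offs 0.5 and 0.7 are the ones used in the table; other textbooks may use different boundaries.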
EXAMPLE 1
Calculate the correlation coefficient between the scores of Test 1 and the Final Examination. Interpret the value obtained.

TEST 1 (X)    FINAL EXAM (Y)    XY             X²             Y²
65            69                4485           4225           4761
70            83                5810           4900           6889
68            60                4080           4624           3600
59            58                3422           3481           3364
46            51                2346           2116           2601
50            53                2650           2500           2809
74            76                5624           5476           5776
40            38                1520           1600           1444
ΣX = 472      ΣY = 488          ΣXY = 29937    ΣX² = 28922    ΣY² = 31244

Solution:

$$
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}}
  = \frac{29937 - \dfrac{(472)(488)}{8}}{\sqrt{\left[28922 - \dfrac{(472)^2}{8}\right]\left[31244 - \dfrac{(488)^2}{8}\right]}}
  = \frac{1145}{\sqrt{(1074)(1476)}} = 0.909
$$

Interpretation: There is a strong positive linear relationship between the scores of Test 1 and the scores of the Final Examination.
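The hand calculation can be cross-checked with a few lines of Python. This is a minimal sketch: the function name `pearson_r` and the variable names are illustrative, and the formula is the first (sum-based) form given for r.

```python
import math

# Data from Example 1
test1 = [65, 70, 68, 59, 46, 50, 74, 40]   # x: Test 1 scores
final = [69, 83, 60, 58, 51, 53, 76, 38]   # y: Final Examination scores


def pearson_r(x, y):
    """Pearson correlation coefficient computed from the five summations."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)

    ss_xy = sum_xy - sum_x * sum_y / n   # 1145 for Example 1
    ss_xx = sum_x2 - sum_x ** 2 / n      # 1074
    ss_yy = sum_y2 - sum_y ** 2 / n      # 1476
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = pearson_r(test1, final)
print(round(r, 3))  # 0.909
```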
How to Obtain the Value of the Correlation Coefficient (r) Using a Calculator (CASIO fx-570MS)

1st: Enter/Key in the Data
Step 1: Press MODE twice until you see “REG”. Then press 2.
Step 2: Press 1 (“Lin”) to select Linear Regression.
Step 3: Start keying in the data. Enter the Ind. Var (x) first, then press comma (,), followed by the Dep. Var (y).
Step 4: After you enter each pair of data (x, y), press M+. The screen will display n=1, which indicates that this is your first pair of data.
Step 5: Repeat Steps 3 and 4 until all the data have been entered.

2nd: Obtain the Numerical Summaries (Σx, Σy, Σxy, Σx², Σy²)
Step 1: Press SHIFT followed by 1. Your screen will display the summation of x.
Step 2: To yield the value of Σx², press 1 followed by “=”. Press the corresponding number to yield the value of the other symbols (to do so, repeat Step 1 first).
Step 3: To obtain the summation of y, repeat Step 1, then click on the “ “ button.

3rd: Obtain the Correlation Coefficient (r)
Step 1: Press SHIFT followed by 2. Your screen will display x̄.
Step 2: To get the value of r, click on the “ “ button twice.
Step 3: Press 3 followed by “=” to yield the value of r.


How to Obtain the Value of the Correlation Coefficient (r) Using a Calculator (CASIO fx-570ES PLUS)

1st: Enter/Key in the Data
Step 1: Press MODE.
Step 2: Press 3 to choose STAT.
Step 3: Press 2 to choose A + Bx.
Step 4: Start keying in the data. Enter the Ind. Var (x), followed by the Dep. Var (y).
Step 5: Repeat Step 4 until all the data have been entered.

2nd: Obtain the Numerical Summaries (Σx, Σy, Σxy, Σx², Σy²)
Step 1: Press AC.
Step 2: Press SHIFT followed by 1 to select statistical mode.
Step 3: Press 3 to select “SUM”.
Step 4: To yield the value of Σx², press 1 followed by “=”. Press the corresponding number to yield the value of the other symbols.

3rd: Obtain the Correlation Coefficient (r)
Step 1: Press AC.
Step 2: Press SHIFT followed by 1 to select statistical mode.
Step 3: Press 5 to select “REG”.
Step 4: Press 3 followed by “=” to yield the value of r.


5.2.2
COEFFICIENT OF DETERMINATION (R²)
■ Measures the proportion of the total variation in the dependent variable (Y) that is explained by the independent variable (X) in the regression model.
■ The formula used to calculate R² is: $R^2 = r^2$

EXAMPLE 2
Referring to Example 1, determine and interpret the meaning of R².
Solution: $R^2 = r^2 = (0.909)^2 = 0.826 = 82.6\%$

Interpretation of R² (convert the value of R² into a percentage):
82.6% of the total variation in the Final Examination scores is explained by the variation in the Test 1 scores, and the remaining 17.4% is explained by other variables/factors.
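To see why r² can be read as a "proportion of variation explained", the sketch below computes it two ways for the Example 1 data: as the square of r, and as 1 − SSE/SS_yy, where SSE is the sum of squared residuals around the fitted least squares line. The variable names are illustrative. Note that the unrounded r gives R² ≈ 0.827, slightly different from the value obtained by squaring r after rounding it to 0.909.

```python
# Data from Example 1
test1 = [65, 70, 68, 59, 46, 50, 74, 40]   # x
final = [69, 83, 60, 58, 51, 53, 76, 38]   # y

n = len(test1)
mean_x = sum(test1) / n
mean_y = sum(final) / n

# Sums of squares written as deviations from the means
ss_xx = sum((x - mean_x) ** 2 for x in test1)
ss_yy = sum((y - mean_y) ** 2 for y in final)          # total variation in y
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(test1, final))

# Route 1: square of the correlation coefficient
r_squared = ss_xy ** 2 / (ss_xx * ss_yy)

# Route 2: 1 - (unexplained variation / total variation)
b = ss_xy / ss_xx              # slope of the least squares line
a = mean_y - b * mean_x        # y-intercept
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(test1, final))
explained = 1 - sse / ss_yy

print(round(r_squared, 3), round(explained, 3))
```

Both routes agree, which is what justifies the interpretation "about 83% of the variation in Y is explained by X".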
5.3
SIMPLE LINEAR REGRESSION
■ Simple Linear Regression is a statistical method that allows us to summarize and study the linear relationship between two continuous (quantitative) variables.
■ The LEAST SQUARES METHOD is used to represent the linear relationship between the independent variable (X) and the dependent variable (Y) using an EQUATION.

Standard form of the linear regression equation:

$$Y_i = \beta_0 + \beta_1 X + \varepsilon_i$$

where X = value of X
Y_i = value of Y
β0 = Y-intercept
β1 = slope
ε_i = a random error

Estimated regression line (also called the least squares regression line or the least squares line):

$$\hat{y} = a + bx$$

where a and b are constant values, also known as the estimated regression coefficients.
5.3.1
ESTIMATED REGRESSION
COEFFICIENTS (𝒂, 𝒃)
EXAMPLE 3
Using the data in Example 1, find the least squares regression line and interpret the values obtained. The summations are given as follows:
Σx = 472    Σy = 488    Σxy = 29937    Σx² = 28922    Σy² = 31244

Solution:

$$
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}
  = \frac{8(29937) - (472)(488)}{8(28922) - (472)^2}
  = \frac{9160}{8592} = 1.066
$$

Interpretation: The mean Final Exam score is predicted to increase by 1.066 marks for every additional one-mark increase in Test 1.

$$
a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{488}{8} - 1.066\left(\frac{472}{8}\right) = -1.894
$$

Interpretation: A student who scores 0 marks in Test 1 is predicted to get -1.894 marks in the Final Exam. Clearly this prediction is not meaningful: a negative mark is impossible, and a Test 1 score of 0 lies far outside the observed range of scores (40 to 74).

The Least Squares Regression Line:

Final Exam Score = _______ + _____ Test 1 Score
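The two coefficient calculations can be sketched in Python directly from the summations. The function name `least_squares_from_sums` is illustrative, not from the slides. One detail worth noticing: the slide computes a with b already rounded to 1.066, which gives -1.894, whereas carrying the unrounded slope through gives about -1.900.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                  # y-intercept
    return a, b


# Summations from Example 1 / Example 3
a, b = least_squares_from_sums(n=8, sum_x=472, sum_y=488,
                               sum_xy=29937, sum_x2=28922)

# Intercept as computed on the slide, using the slope rounded to 3 d.p.
a_slide = 488 / 8 - round(b, 3) * 472 / 8

print(round(b, 3))        # slope
print(round(a, 3))        # intercept with unrounded slope
print(round(a_slide, 3))  # intercept with rounded slope, as on the slide
```

Either intercept is acceptable at this level of precision; what matters is stating which value of b was used.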

Relationship Between the Value of the Correlation Coefficient (r) and the Slope (b) of the Regression Line
❖ The value of the slope b and the value of r have the following relationship:

Value of r    Value of b
Positive      Positive
Negative      Negative
0             0
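The sign agreement in the table follows from the fact that b and r share the same numerator. A short derivation, assuming the usual shorthand SS_xy = Σxy − ΣxΣy/n, SS_xx = Σx² − (Σx)²/n and SS_yy = Σy² − (Σy)²/n:

```latex
\begin{align*}
b &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{SS_{xy}}{SS_{xx}},
\qquad
r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} \\[4pt]
\Rightarrow\quad b &= r\,\sqrt{\frac{SS_{yy}}{SS_{xx}}}
\end{align*}
```

Because the square root is positive whenever both variables vary, b always has the same sign as r, and b = 0 exactly when r = 0. For the Example 1 data, 0.909 × √(1476/1074) ≈ 1.066, which matches the slope found in Example 3.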
How to Obtain the Values of the Estimated Regression Coefficients (a, b) Using a Calculator (CASIO fx-570MS)
After you have finished entering your data as shown before, follow these steps:
Step 1: Press SHIFT followed by 2. Your screen will display x̄.
Step 2: To get the values of the estimated regression coefficients (a, b), click on the “ “ button twice.
Step 3: Press 1 followed by “=” to yield the value of the y-intercept, a.
Press 2 followed by “=” to yield the value of the slope, b.
How to Obtain the Values of the Estimated Regression Coefficients (a, b) Using a Calculator (CASIO fx-570ES PLUS)
After you have finished entering your data as shown before, follow these steps:
Step 1: Press AC.
Step 2: Press SHIFT followed by 1 to select statistical mode.
Step 3: Press 5 to select “REG”.
Step 4: Press 1 followed by “=” to yield the value of the y-intercept, a.
Press 2 followed by “=” to yield the value of the slope, b.
5.3.2
ESTIMATING THE DEPENDENT VARIABLE
■ The regression line, $\hat{y} = a + bx$, can be used to estimate the value of the dependent variable (y) for a given value of x.
■ Using the least squares regression line in Example 3, estimate the Final Exam score if the Test 1 score is 90%.

Solution:
Substituting x = 90: $\hat{y} = -1.894 + 1.066x = -1.894 + 1.066(90) = 94.046\%$
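The substitution step can be sketched as a one-line function. The helper name `predict` is illustrative; the coefficients are the rounded values from Example 3. The extra check simply flags that x = 90 lies outside the Test 1 scores actually observed in Example 1 (40 to 74), so the estimate is an extrapolation and should be read with some caution.

```python
def predict(a, b, x):
    """Estimate y from the regression line y-hat = a + b*x."""
    return a + b * x


# Coefficients from Example 3 (rounded, as used on the slide)
a, b = -1.894, 1.066

# Observed Test 1 scores from Example 1, used only for the range check
test1 = [65, 70, 68, 59, 46, 50, 74, 40]

x_new = 90
y_hat = predict(a, b, x_new)
is_extrapolation = not (min(test1) <= x_new <= max(test1))

print(round(y_hat, 3), is_extrapolation)
```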
EXAMPLE 4
The following data show the number of years people smoked and the percentage of lung damage they sustained.

a) Write the least squares line to estimate the percentage of lung damage from the number of years of smoking.
b) Estimate the percentage of lung damage for a person who has been smoking for 30 years.
Solution:
a) There is no need to calculate the estimated regression coefficients (a, b) by hand; use the calculator to obtain these two values directly, since the question asks you to “WRITE” the line.
a = -10.944
b = 1.969
The least squares line is ŷ = a + bx: ŷ = -10.944 + 1.969x
Lung damage = -10.944 + 1.969 (Years people smoked)
b) Substituting x = 30: ŷ = -10.944 + 1.969x
= -10.944 + 1.969(30)
= 48.126%
SPSS Output for Simple Linear Regression
[Figure: annotated SPSS output showing where to read the correlation coefficient (r), the coefficient of determination (R²), the y-intercept (a), the slope (b) and the Ind. Var (x).]
EXAMPLE 5 – DEC’19 Q.6
Consider the following data on the number of songs stored in an MP3 player and the number of months the user has owned the MP3 player, for a sample of 15 owners. The data were recorded and analyzed using IBM SPSS Statistics. The results are as follows.

a) Based on the scatter plot, describe the relationship between the number of months the MP3 player has been owned and the number of songs stored in it.
Answer: Positive linear relationship (as the values of x (number of months owned) increase, the values of y (number of songs stored) increase).

b) Calculate the value of the correlation coefficient. Hence, explain its meaning.
Answer: Σx² = 5252, Σx = 224, Σy² = 2249085, Σy = 4539, Σxy = 108069, n = 15

$$
r = \frac{108069 - \dfrac{224(4539)}{15}}{\sqrt{\left[5252 - \dfrac{(224)^2}{15}\right]\left[2249085 - \dfrac{(4539)^2}{15}\right]}}
  = \frac{40286.6}{\sqrt{(1906.933)(875583.6)}}
  = \frac{40286.6}{40861.71} = 0.986
$$

There is a strong positive linear relationship between the number of months owned and the number of songs stored.

c) What percentage of the variation in the number of songs stored is explained by the variation in the number of months owned?
Answer: $R^2 = r^2 = (0.986)^2 = 0.972 = 97.2\%$
97.2% of the total variation in the number of songs stored is explained by the variation in the number of months owned, and the remaining 2.8% is explained by other variables/factors.

d) Name the statistic used in c).
Answer: Coefficient of determination

e) Based on the given output, write the complete estimated regression equation. Hence, interpret the slope in the context of the problem.
Answer: Complete estimated regression equation: ŷ = a + bx = -12.887 + 21.126x
Interpretation of the slope: The mean number of songs stored is predicted to increase by 21.126 for every additional month the MP3 player has been owned.

f) Predict the number of songs stored if the MP3 player has been owned for 37 months.
Answer: Substituting x = 37: ŷ = -12.887 + 21.126(37) = 768.775 ≈ 769
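Exam questions of this type often give only the summations rather than the raw data. The sketch below shows how parts (b) and (c) could be checked in that situation; the function name `pearson_r_from_sums` is illustrative.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Correlation coefficient from the five summations and the sample size."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Summations quoted in the answer to Example 5(b)
r = pearson_r_from_sums(n=15, sum_x=224, sum_y=4539,
                        sum_xy=108069, sum_x2=5252, sum_y2=2249085)

print(round(r, 3))       # correlation coefficient, part (b)
print(round(r ** 2, 3))  # coefficient of determination, part (c)
```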


EXAMPLE 6 – DEC’18 Q.7
A study was conducted to determine the relationship between working experience (in years) and the monthly salary (RM’00) of teachers at a primary school in Kuantan. The data were analysed using SPSS and produced the following tables.

Answer the following questions based on the given output.

c) Determine the value of M. Comment on the value. (2 marks)
*Since no data are provided, we cannot find the value of r using a calculator, and, crucially, the question asks you to use only the values given in the output. To solve this question, we therefore use the value of R².*

Answer: $r = \sqrt{R^2} = \sqrt{0.878} = 0.937$
The positive root is taken because the slope in the output (1.194) is positive. There is a strong positive linear relationship between working experience and monthly salary.

e) Estimate the monthly salary for a teacher with 15 years of working experience. (2 marks)
Answer: Substituting x = 15: ŷ = 23.470 + 1.194(15) = 41.38 (RM’00) ≈ RM4138
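Recovering r from R² needs one extra piece of information, because the square root alone loses the sign. A minimal sketch, assuming the sign is taken from the slope b reported in the same output (the function name `r_from_r_squared` is illustrative):

```python
import math


def r_from_r_squared(r_squared, slope):
    """Recover r from R^2, taking the sign of r from the sign of the slope."""
    if not 0 <= r_squared <= 1:
        raise ValueError("R^2 must lie between 0 and 1")
    magnitude = math.sqrt(r_squared)
    return magnitude if slope >= 0 else -magnitude


# Values quoted in Example 6: R^2 = 0.878 and slope b = 1.194
r = r_from_r_squared(0.878, 1.194)
print(round(r, 3))
```

With a negative slope, the same R² would correspond to r ≈ -0.937, so it is worth checking the sign of b before writing the answer.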
THE END
