BIVARIATE ANALYSIS
Lecturer: Dr Ruzanita Mat Rani
Prepared By:
HAZIYAH BINTI MD JASMIN
CHAPTER OUTLINE
5.0 INTRODUCTION
5.1 SCATTER DIAGRAM/SCATTER PLOT
5.2 CORRELATION
1. Pearson Correlation Coefficient (r)
2. Coefficient of Determination (R²)
5.3 SIMPLE LINEAR REGRESSION (The Least Square Method / Least Square Regression Line)
1. Estimated Regression Coefficients
2. Estimating The Dependent Variable
5.0 INTRODUCTION
What is Bivariate Analysis?
The analysis involving TWO QUANTITATIVE VARIABLES (X and Y).
Types of Variables
Independent Variable (X): also known as Regressor/Predictor/Factor.
Dependent Variable (Y): also known as Response Variable.
Method of Analysis
Simple Linear Regression: a statistical method that allows us to summarize and study the linear relationship between two continuous (quantitative) variables.
[Figure: three scatter plots of Tensile Strength (y) against Percentage of Hardwood (x)]
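\[
r = \frac{\Sigma xy - \dfrac{\Sigma x \Sigma y}{n}}{\sqrt{\left[\Sigma x^2 - \dfrac{(\Sigma x)^2}{n}\right]\left[\Sigma y^2 - \dfrac{(\Sigma y)^2}{n}\right]}}
  = \frac{n\Sigma xy - \Sigma x \Sigma y}{\sqrt{\left[n\Sigma x^2 - (\Sigma x)^2\right]\left[n\Sigma y^2 - (\Sigma y)^2\right]}}
  = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}
\]
where Σx = sum of the x values
Σy = sum of the y values
Σxy = sum of the products of the x and y values
Σx² = sum of the squared values of x
Σy² = sum of the squared values of y
n = number of paired observations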
CORRELATION COEFFICIENT (r)
■ The following table and figure give a simple guideline for interpreting the values of r:
Values of r Interpretation
-1 Perfect negative linear relationship
-1 < r ≤ -0.7 Strong negative linear relationship
-0.7 < r ≤ -0.5 Moderate negative linear relationship
-0.5 < r < 0 Weak negative linear relationship
0 No linear relationship
0 < r < 0.5 Weak positive linear relationship
0.5 ≤ r < 0.7 Moderate positive linear relationship
0.7 ≤ r < 1 Strong positive linear relationship
1 Perfect positive linear relationship
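The guideline table can be sketched as a small Python helper. This is only an illustration: the function name `interpret_r` is hypothetical, and the cut-offs are taken directly from the table (0.5 and 0.7 on the absolute value of r).

```python
def interpret_r(r: float) -> str:
    """Return the guideline label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "No linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # strength depends only on the magnitude of r
    if size == 1:
        strength = "Perfect"
    elif size >= 0.7:
        strength = "Strong"
    elif size >= 0.5:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction} linear relationship"


print(interpret_r(0.909))  # Strong positive linear relationship
```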
[Figure: number line for r running from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation)]
EXAMPLE 1
Calculate the correlation coefficient between scores of Test 1 and Final Examination. Interpret the
value obtained.
TEST 1 (X)   FINAL EXAM (Y)   XY           X²           Y²
65           69               4485         4225         4761
70           83               5810         4900         6889
68           60               4080         4624         3600
59           58               3422         3481         3364
46           51               2346         2116         2601
50           53               2650         2500         2809
74           76               5624         5476         5776
40           38               1520         1600         1444
ΣX = 472     ΣY = 488         ΣXY = 29937  ΣX² = 28922  ΣY² = 31244
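Solution: With n = 8, SSxy = 29937 - (472)(488)/8 = 1145, SSxx = 28922 - (472)²/8 = 1074 and SSyy = 31244 - (488)²/8 = 1476, so r = 1145 / √[(1074)(1476)] = 0.909.
Interpretation: There is a strong positive linear relationship between scores of Test 1 and scores of
Final Examination.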
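As a cross-check, the calculation in Example 1 can be sketched in Python straight from the raw scores. The function name `pearson_r` and the variable names are illustrative choices, not part of the original notes.

```python
from math import sqrt

# Paired scores from the Example 1 table
test1 = [65, 70, 68, 59, 46, 50, 74, 40]
final = [69, 83, 60, 58, 51, 53, 76, 38]


def pearson_r(x, y):
    """Pearson correlation coefficient via the summation formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    return ss_xy / sqrt(ss_xx * ss_yy)


r = pearson_r(test1, final)
print(round(r, 3))  # 0.909
```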
How to Obtain the Value of the Correlation Coefficient
(r) by Using a Calculator (CASIO fx-570MS)?
1st: Enter/Key in the Data
Step 1: Press MODE twice until you see “REG”. Then press 2.
Step 3: Start keying in the data. Enter the independent variable (x) first,
then press comma (,), followed by the dependent variable (y).
Step 4: After you enter each pair of data (x, y), press M+.
The screen will display n=1, which indicates that this is your
first pair of data.
Step 5: Repeat Steps 3 and 4 until all of the data have been entered.
2nd: Obtain the Numerical Summaries (Σx, Σy, Σxy, Σx², Σy²)
Step 1: Press SHIFT followed by 1. Your screen will display the summation of x.
Step 2: To yield the value of Σx², press 1 followed by “=”. Press the corresponding number to yield the
value of the other symbols (to do so, repeat Step 1 first).
Step 3: To obtain the summation of y, repeat Step 1 then click on the “ “ button.
3rd: Obtain the Correlation Coefficient (r)
EXAMPLE 2
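Referring to Example 1, determine and interpret the meaning of R².
Solution: R² = r² = (0.909)² = 0.826 = 82.6%
Interpretation: About 82.6% of the total variation in Final Examination scores is explained by the linear relationship with Test 1 scores.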
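One way to see what R² measures is to fit the least square line and compare the unexplained variation with the total variation in y. The sketch below assumes the Example 1 data; the names `sse` (sum of squared residuals) and `r_squared` are illustrative and do not appear in the notes. Note that squaring the unrounded r gives 0.827, while squaring the rounded value 0.909 gives the 0.826 shown in the solution.

```python
# Paired scores from the Example 1 table
test1 = [65, 70, 68, 59, 46, 50, 74, 40]
final = [69, 83, 60, 58, 51, 53, 76, 38]

n = len(test1)
mean_x = sum(test1) / n
mean_y = sum(final) / n

ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(test1, final))
ss_xx = sum((x - mean_x) ** 2 for x in test1)
ss_yy = sum((y - mean_y) ** 2 for y in final)  # total variation in y

# Least square line y_hat = a + b*x
b = ss_xy / ss_xx
a = mean_y - b * mean_x

# Variation left unexplained by the line
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(test1, final))

r = ss_xy / (ss_xx * ss_yy) ** 0.5
r_squared = 1 - sse / ss_yy  # proportion of variation explained

print(round(r_squared, 3), round(r ** 2, 3))  # 0.827 0.827
```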
The LEAST SQUARE METHOD is used to represent the linear relationship between the independent
variable (X) and the dependent variable (Y) using an EQUATION:
\[
Y_i = \beta_0 + \beta_1 X_i + \epsilon_i
\]
where X_i = value of X
Y_i = value of Y
β0 = Y-intercept
β1 = slope
ε_i = a random error
The estimated line is ŷ = a + bx, where a and b are constants, also known as the
estimated regression coefficients.
5.3.1 ESTIMATED REGRESSION COEFFICIENTS (a, b)
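For reference, the standard least square formulas for the two coefficients are written out below, using the same summation notation as the correlation formula. The shorthand SS_xy and SS_xx and the sample means x̄ and ȳ follow the usual textbook convention and are assumed here rather than quoted from the slide.

```latex
\[
b = \frac{SS_{xy}}{SS_{xx}}
  = \frac{\Sigma xy - \dfrac{\Sigma x \Sigma y}{n}}{\Sigma x^2 - \dfrac{(\Sigma x)^2}{n}},
\qquad
a = \bar{y} - b\,\bar{x} = \frac{\Sigma y}{n} - b\,\frac{\Sigma x}{n}
\]
```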
EXAMPLE 3
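By using Example 1, find the least square regression line and interpret the values obtained. The
summations of X and Y are given as follows:
Σx = 472 Σy = 488 Σxy = 29937 Σx² = 28922 Σy² = 31244
Solution:
\[
b = \frac{\Sigma xy - \dfrac{\Sigma x \Sigma y}{n}}{\Sigma x^2 - \dfrac{(\Sigma x)^2}{n}}
  = \frac{29937 - \dfrac{(472)(488)}{8}}{28922 - \dfrac{(472)^2}{8}}
  = \frac{1145}{1074} = 1.066
\]
Interpretation: We predict the mean Final Examination score to increase by 1.066 marks for every
additional one-mark increase in Test 1.
\[
a = \bar{y} - b\,\bar{x} = \frac{488}{8} - 1.066\left(\frac{472}{8}\right) = -1.894
\]
Interpretation: A person who scores 0 marks in Test 1 is predicted to get -1.894 marks in the
Final Examination. Clearly this prediction is nonsense: a Test 1 score of 0 lies far outside the observed scores (40 to 74), so the intercept has no meaningful interpretation here.
The Least Square Regression Line: ŷ = -1.894 + 1.066x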
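The coefficients in Example 3 can be checked with a few lines of Python working only from the given summations. This is a sketch with illustrative variable names. It also shows a rounding effect worth knowing: the intercept is -1.894 when the slope is first rounded to 1.066, as in the worked solution, but about -1.900 when the unrounded slope is carried through, which is what a calculator would report.

```python
# Summations given in Example 3 (n = 8 paired observations)
n = 8
sum_x, sum_y = 472, 488
sum_xy, sum_x2 = 29937, 28922

ss_xy = sum_xy - sum_x * sum_y / n  # 1145.0
ss_xx = sum_x2 - sum_x ** 2 / n     # 1074.0

b = ss_xy / ss_xx                   # slope
a = sum_y / n - b * sum_x / n       # intercept, unrounded slope

# Intercept as computed by hand with the slope rounded to 3 decimals
a_rounded_b = sum_y / n - round(b, 3) * sum_x / n

print(round(b, 3), round(a, 3), round(a_rounded_b, 3))  # 1.066 -1.9 -1.894
```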
Value of 𝒓 Value of 𝒃
Positive Positive
Negative Negative
0 0
How to Obtain the Values of the Estimated Regression
Coefficients (a, b) by Using a Calculator (CASIO fx-570MS)?
After you have finished entering your data as shown before, follow these steps:
Step 2: To get the values of the estimated regression coefficients (a, b), click on the “ “ button twice.
Solution:
By substituting x = 90: ŷ = -1.894 + 1.066x = -1.894 + 1.066(90) = 94.046%
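The substitution above can be wrapped in a small helper. The sketch below is illustrative only: the function name `predict_final` and the `observed_range` argument are assumptions, with the range (40 to 74) taken from the Test 1 scores in Example 1. It also flags that x = 90 lies outside the observed scores, so the prediction is an extrapolation and should be read with caution.

```python
def predict_final(test1_score, a=-1.894, b=1.066, observed_range=(40, 74)):
    """Predict the Final Examination score from a Test 1 score.

    Returns the prediction and a flag telling whether the input lies
    outside the range of Test 1 scores used to fit the line.
    """
    y_hat = a + b * test1_score
    low, high = observed_range
    extrapolated = not (low <= test1_score <= high)
    return y_hat, extrapolated


y_hat, extrapolated = predict_final(90)
print(round(y_hat, 3), extrapolated)  # 94.046 True
```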
EXAMPLE 4
The following data show the number of years people smoked and the percentage of lung damage they sustained.
a) Write the least square line to estimate the percentage of lung damage from the number of years of smoking.
b) Estimate the percentage of lung damage for a person who has been smoking for 30 years.
Solution:
a) There is no need to calculate the estimated regression coefficients (a, b) by hand; use a calculator to obtain these two
values directly, as the question asks you to “WRITE” the line.
a = -10.944
b = 1.969
The least square line is ŷ = a + bx; ŷ = -10.944 + 1.969x
Lung damage = -10.944 + 1.969 (Years people smoked)
b) Substituting x = 30: ŷ = -10.944 + 1.969x
= -10.944 + 1.969(30)
= 48.126%
SPSS Output for Simple Linear Regression
[Figure: annotated SPSS output marking the correlation coefficient (r), the coefficient of determination (R²) and the y-intercept (a)]
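\[
r = \frac{108069 - \dfrac{224(4539)}{15}}{\sqrt{\left[5252 - \dfrac{(224)^2}{15}\right]\left[2249085 - \dfrac{(4539)^2}{15}\right]}}
  = \frac{40286.6}{\sqrt{(1906.933)(875583.6)}}
  = \frac{40286.6}{40861.71} = 0.986
\]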
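The same value can be reproduced in Python from the summations appearing in the calculation (n = 15, Σx = 224, Σy = 4539, Σxy = 108069, Σx² = 5252, Σy² = 2249085). The sketch uses the equivalent form of the formula with every term multiplied through by n, which avoids the intermediate fractions; the variable names are illustrative.

```python
from math import sqrt

# Summations read off the worked calculation above
n = 15
sum_x, sum_y = 224, 4539
sum_xy, sum_x2, sum_y2 = 108069, 5252, 2249085

# r = (n*Sxy - Sx*Sy) / sqrt[(n*Sx2 - Sx^2)(n*Sy2 - Sy^2)]
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 3))  # 0.986
```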
e) Based on the given output, write the complete estimated regression equation. Hence, interpret the slope in the context
of the problem.
Answer: Complete estimated regression equation: ŷ = a + bx = -12.887 + 21.126x
Interpretation of slope:
We predict the mean number of songs stored in the MP3 player to increase by 21.126 for every additional
month of owning the MP3 player.
f) Predict the number of songs stored in the MP3 player if it has been owned for 37 months.
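A possible way to work part f) is sketched below in Python. It first recomputes a and b from the summations used in the correlation calculation (n = 15, Σx = 224, Σy = 4539, Σxy = 108069, Σx² = 5252) to confirm they agree with the output, then substitutes x = 37 into the rounded equation from part e). The variable names are illustrative, and whether 37 months lies inside the observed data range is not known from the notes.

```python
# Summations from the correlation calculation (n = 15)
n = 15
sum_x, sum_y = 224, 4539
sum_xy, sum_x2 = 108069, 5252

b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n
print(round(a, 3), round(b, 3))  # -12.887 21.126

# Prediction using the rounded equation y_hat = -12.887 + 21.126x
months_owned = 37
songs_predicted = -12.887 + 21.126 * months_owned
print(round(songs_predicted, 3))  # 768.775, i.e. about 769 songs
```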