
Lesson 3: Linear Regression and Correlations

Learning Objectives

Upon completion of this topic, you are expected to:


a. recall the concepts of linear correlation and the least-squares line;
b. describe a set of data using the computed correlation coefficient;
c. identify the relationship that exists between two variables; and
d. estimate a value of the dependent variable based on the derived regression equation.
 

Presentation of Content

I. Linear Correlation
The linear correlation coefficient r measures the strength and direction of the linear relationship between two variables
(Larson and Farber, 2000; Pagala, 2011). We will use the formula below to determine the value
of the correlation coefficient.

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Where:
n = number of ordered pairs
x = value of the independent variable
y = value of the dependent variable

How will you use the formula to determine the relationship between two variables?

To apply the formula, follow this procedure:


1. Multiply each x value by its paired y value and add the products: $\sum xy$.
2. Multiply this sum of products by the number of ordered pairs: $n\sum xy$.
3. Add the x values: $\sum x$.
4. Add the y values: $\sum y$.
5. Multiply the sum of the x values by the sum of the y values: $\sum x \sum y$.
6. Square each x value and add the squares: $\sum x^2$.
7. Multiply the sum of the squares of the x values by the number of ordered pairs: $n\sum x^2$.
8. Square each y value and add the squares: $\sum y^2$.
9. Multiply the sum of the squares of the y values by the number of ordered pairs: $n\sum y^2$.
10. Square the sum of the x values: $\left(\sum x\right)^2$.
11. Square the sum of the y values: $\left(\sum y\right)^2$.
12. Substitute these values into the formula to determine the value of the coefficient.
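The twelve steps above can be sketched in Python as follows. This is a minimal illustration, not part of the original lesson; the function name `pearson_r` is hypothetical, and the step numbers in the comments refer to the procedure above.

```python
import math


def pearson_r(x, y):
    """Linear correlation coefficient r computed from raw sums (steps 1-12)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two values")
    n = len(x)                                   # number of ordered pairs
    sum_xy = sum(a * b for a, b in zip(x, y))    # step 1
    sum_x = sum(x)                               # step 3
    sum_y = sum(y)                               # step 4
    sum_x2 = sum(a * a for a in x)               # step 6
    sum_y2 = sum(b * b for b in y)               # step 8
    numerator = n * sum_xy - sum_x * sum_y       # steps 2 and 5
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2)                # steps 7 and 10
        * (n * sum_y2 - sum_y ** 2)              # steps 9 and 11
    )
    return numerator / denominator               # step 12


print(pearson_r([1, 2, 3], [2, 4, 6]))   # 1.0 (points on a rising straight line)
```

If all x values (or all y values) are equal, the denominator is zero and r is undefined; the sketch then raises `ZeroDivisionError`.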
 
Note: This correlation coefficient can be used only when both variables are measured on an interval or ratio scale.

II. Simple Regression Analysis


We start with the concept of simple regression analysis.
When only one independent variable is used, the analysis is referred to as simple regression
analysis.
The formal statement of the simple linear regression model is:
 
$$y = \alpha + \beta x$$
Where:
y = the value of the dependent variable
α = the y-intercept
β = the slope of the regression line
x = the value of the independent variable

How can we apply the formula to predict values of the dependent variable?
 
Method of Least Squares
Since α and β are generally not known in a regression problem, they must be estimated from
sample data: observations of the dependent variable y taken at a number of values of the
independent variable x.
  
Note: The standard approach to estimating α and β is the method of least squares, which
minimizes the sum of the squared errors, that is, the squared vertical distances between the data points and the line.
 
Sample estimates of α and β are denoted by $\hat{\alpha}$ and $\hat{\beta}$, respectively, and the resulting regression
line is called the sample least-squares regression equation:

$$\hat{y} = \hat{\alpha} + \hat{\beta}x$$

The estimates are chosen so that the sum of the squared deviations between the line and the scatter of points is as small as possible.
Minimizing this sum gives the following formulas for the estimates:

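$$\hat{\beta} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

Note: Here, $\bar{x}$ and $\bar{y}$ denote the sample means of x and y.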
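A minimal Python sketch of the deviation-from-the-mean formulas above. The function name `least_squares_deviation` is hypothetical, and the toy data (points lying exactly on y = 1 + 2x) are made up for illustration, so the expected estimates are known in advance.

```python
def least_squares_deviation(x, y):
    """Return (alpha_hat, beta_hat) using deviations from the sample means."""
    n = len(x)
    x_bar = sum(x) / n                                          # sample mean of x
    y_bar = sum(y) / n                                          # sample mean of y
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))  # sum of cross-deviations
    sxx = sum((a - x_bar) ** 2 for a in x)                      # sum of squared x-deviations
    beta_hat = sxy / sxx                                        # slope
    alpha_hat = y_bar - beta_hat * x_bar                        # intercept
    return alpha_hat, beta_hat


print(least_squares_deviation([1, 2, 3, 4], [3, 5, 7, 9]))   # (1.0, 2.0)
```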
 
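Alternative Formulas
Equivalent formulas for $\hat{\alpha}$ and $\hat{\beta}$ that use only sums are as follows.

$$\hat{\beta} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$\hat{\alpha} = \frac{\sum y - \hat{\beta}\sum x}{n}$$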
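The sums-only formulas can be sketched the same way. This is an illustrative function with a hypothetical name; on the same toy data it gives the same estimates as the deviation form, since the two sets of formulas are algebraically equivalent.

```python
def least_squares_sums(x, y):
    """Return (alpha_hat, beta_hat) using the alternative (sums-only) formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    alpha_hat = (sum_y - beta_hat * sum_x) / n
    return alpha_hat, beta_hat


print(least_squares_sums([1, 2, 3, 4], [3, 5, 7, 9]))   # (1.0, 2.0)
```

The sums form suits hand computation, because every quantity it needs can be read off the totals of an x, y, xy, x² table.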
 

Application

Example 1
Now, let us apply what we have learned. Here is an activity where we can use the formula
given. Remember to follow the procedure for determining the correlation coefficient. Try to solve the
problem independently before comparing your answers with the solution provided.
 
Problem: The heights and weights of 10 basketball players are listed below. Determine the
value of the linear correlation coefficient.
 
Heights and weights of 10 basketball players
Height, X (inches):    67 70 71 70 66 69 72 78 64 65
Weight, Y (kilograms): 71 70 69 68 66 65 71 70 64 65

Have you tried answering the problem? Great! Now, we can compare your answers.

Solution:
First, compute XY, X², and Y² for each player.
Height (X) Weight (Y) XY X² Y²
67 71 4,757 4,489 5,041
70 70 4,900 4,900 4,900
71 69 4,899 5,041 4,761
70 68 4,760 4,900 4,624
66 66 4,356 4,356 4,356
69 65 4,485 4,761 4,225
72 71 5,112 5,184 5,041
78 70 5,460 6,084 4,900
64 64 4,096 4,096 4,096
65 65 4,225 4,225 4,225
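The required sums are:

$$\sum xy = 47{,}050 \qquad n\sum xy = 470{,}500 \qquad \sum x = 692$$
$$\sum y = 679 \qquad \sum x \sum y = 469{,}868 \qquad \sum x^2 = 48{,}036$$
$$n\sum x^2 = 480{,}360 \qquad \sum y^2 = 46{,}169 \qquad n\sum y^2 = 461{,}690$$
$$\left(\sum x\right)^2 = 478{,}864 \qquad \left(\sum y\right)^2 = 461{,}041$$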
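These sums can be double-checked with a few lines of Python (a sketch using built-ins only; the variable names are illustrative). In particular, it confirms that (∑y)² = 679² = 461,041.

```python
heights = [67, 70, 71, 70, 66, 69, 72, 78, 64, 65]   # X, inches
weights = [71, 70, 69, 68, 66, 65, 71, 70, 64, 65]   # Y, kilograms

n = len(heights)                                     # 10 ordered pairs
sum_x = sum(heights)
sum_y = sum(weights)
sum_xy = sum(x * y for x, y in zip(heights, weights))
sum_x2 = sum(x * x for x in heights)
sum_y2 = sum(y * y for y in weights)

print("sum xy =", sum_xy, "  n*sum xy =", n * sum_xy)
print("sum x =", sum_x, "  sum y =", sum_y, "  sum x * sum y =", sum_x * sum_y)
print("sum x^2 =", sum_x2, "  n*sum x^2 =", n * sum_x2)
print("sum y^2 =", sum_y2, "  n*sum y^2 =", n * sum_y2)
print("(sum x)^2 =", sum_x ** 2, "  (sum y)^2 =", sum_y ** 2)
```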

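We are now ready to substitute them into the formula.

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{470{,}500 - (692)(679)}{\sqrt{\left[480{,}360 - (692)^2\right]\left[461{,}690 - (679)^2\right]}}$$

$$r = \frac{470{,}500 - 469{,}868}{\sqrt{\left[480{,}360 - 478{,}864\right]\left[461{,}690 - 461{,}041\right]}}$$

$$r = \frac{632}{\sqrt{(1{,}496)(649)}} = \frac{632}{\sqrt{970{,}904}} = \frac{632}{985.34} \approx 0.64$$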

The value of the linear correlation coefficient is r ≈ 0.64.

What could be the meaning of the value we computed?


 
Interpreting the Correlation Coefficient
After determining the correlation coefficient, we need to interpret the value. The table below
gives the verbal interpretation of the degree of linear relationship used in this lesson.
Value of r        Interpretation
±1.00             Perfect positive/negative correlation
±0.91 to ±0.99    Very high positive/negative correlation
±0.71 to ±0.90    High positive/negative correlation
±0.51 to ±0.70    Moderate positive/negative correlation
±0.31 to ±0.50    Low positive/negative correlation
±0.01 to ±0.30    Slight positive/negative correlation
0                 No correlation
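The interpretation table can be encoded as a small lookup function. This is a sketch with a hypothetical name; rounding |r| to two decimals before comparing is an assumption added here so that values falling between the table's ranges (such as 0.905) still get a label.

```python
def interpret_r(r):
    """Verbal label for r, adapted from the interpretation table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = round(abs(r), 2)          # assumption: compare at two decimal places
    if size == 0:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    if size == 1:
        return f"Perfect {direction} correlation"
    if size >= 0.91:
        label = "Very high"
    elif size >= 0.71:
        label = "High"
    elif size >= 0.51:
        label = "Moderate"
    elif size >= 0.31:
        label = "Low"
    else:
        label = "Slight"
    return f"{label} {direction} correlation"


print(interpret_r(0.64))   # Moderate positive correlation
```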

From the previous activity, the correlation coefficient is r ≈ 0.64, which is interpreted as a
moderate positive correlation. Taller players in this sample tend to be heavier, although the linear
relationship between the height and weight of the ten basketball players is far from perfect.
 
Awesome! Keep up the good work!
 
Exercise 1
Let us put your understanding into practice. Below are the test results of 10 students in their
Mathematics and English examinations. Determine the linear correlation coefficient and interpret
its value.
X (Score in Mathematics): 34 23 45 44 37 46 23 41 40 35
Y (Score in English):     35 21 43 42 32 45 23 47 43 37

Example 2
Using the given formulas, try to determine the values of the variables to come up with the least
squares regression equation.
 
Problem:
The Cagayan State University (CSU) officials wished to determine whether the CSU College Admission
Test (CSU-CAT) score is a good indicator of the General Weighted Average (GWA). Sixteen scholars were
selected at random from the first-year class; their CSU-CAT raw scores and GWAs are shown below.

What will be the estimated GWA of a student with a CAT score of 83?
 
Student CAT Raw Score (x) GWA (y)
1 80 85
2 82 87
3 90 90
4 87 88
5 80 84
6 85 89
7 95 97
8 97 98
9 98 98
10 90 92
11 82 85
12 81 83
13 85 87
14 86 88
15 88 88
16 92 95

How can one predict and estimate GWA from CAT scores?
 
Solution
Now, we need to obtain the equation for the line that best fits the sample data.
Student CAT Raw Score (x) GWA (y) xy x² y²
1 80 85 6,800 6,400 7,225
2 82 87 7,134 6,724 7,569
3 90 90 8,100 8,100 8,100
4 87 88 7,656 7,569 7,744
5 80 84 6,720 6,400 7,056
6 85 89 7,565 7,225 7,921
7 95 97 9,215 9,025 9,409
8 97 98 9,506 9,409 9,604
9 98 98 9,604 9,604 9,604
10 90 92 8,280 8,100 8,464
11 82 85 6,970 6,724 7,225
12 81 83 6,723 6,561 6,889
13 85 87 7,395 7,225 7,569
14 86 88 7,568 7,396 7,744
15 88 88 7,744 7,744 7,744
16 92 95 8,740 8,464 9,025
Total 1,398 1,434 125,720 122,670 128,892
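The column totals can be recomputed in Python as a check. This is a sketch using built-ins only; the list and key names are illustrative.

```python
cat = [80, 82, 90, 87, 80, 85, 95, 97, 98, 90, 82, 81, 85, 86, 88, 92]   # x, CAT raw score
gwa = [85, 87, 90, 88, 84, 89, 97, 98, 98, 92, 85, 83, 87, 88, 88, 95]   # y, GWA

totals = {
    "x": sum(cat),
    "y": sum(gwa),
    "xy": sum(x * y for x, y in zip(cat, gwa)),
    "x2": sum(x * x for x in cat),
    "y2": sum(y * y for y in gwa),
}
print(totals)
# {'x': 1398, 'y': 1434, 'xy': 125720, 'x2': 122670, 'y2': 128892}
```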

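Using the formulas:

$$\bar{y} = \frac{1{,}434}{16} = 89.625 \qquad \bar{x} = \frac{1{,}398}{16} = 87.375$$

$$\hat{\beta} = \frac{16(125{,}720) - (1{,}398)(1{,}434)}{16(122{,}670) - (1{,}398)^2} = \frac{6{,}788}{8{,}316} = 0.8163$$

$$\hat{\alpha} = 89.625 - (0.8163)(87.375) = 18.3008$$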

The fitted equation describing the relationship between GWA and CAT scores is GWA =
18.3008 + 0.8163x, where x is the CAT raw score.

To predict the GWA of a student with a CAT score of 83, substitute x = 83:


$$\text{GWA} = 18.3008 + 0.8163(83) = 86.0537 \approx 86$$
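The prediction step is a single substitution, sketched below. The function name `predict_gwa` is hypothetical, and its default coefficients are the lesson's rounded estimates.

```python
def predict_gwa(cat_score, intercept=18.3008, slope=0.8163):
    """Estimated GWA for a CAT raw score, using the fitted line GWA = a + b*x."""
    return intercept + slope * cat_score


estimate = predict_gwa(83)
print(round(estimate, 2))   # 86.05
```

The CAT scores in the sample range from 80 to 98, so predictions for scores well outside that range rely on the line continuing beyond the data and should be treated with caution.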
 
Congratulations! You have just learned how to predict a student's General Weighted Average.
 
Exercise 2:
Determine the least-squares regression equation that best fits the following set of observations.

Age (x):   10 12 11 26 28 21 22 18 16 15
Score (y): 32 30 34 39 38 32 29 28 25 20
