Lesson 3

REGRESSION ANALYSIS

Learning Concept

We often make predictions in order to make better-informed decisions. In this lesson, we shall discuss regression analysis, the process of predicting the value of one variable from the value of another. From the previous lesson, we learned that the statistic commonly used to measure correlation is the Pearson Product-Moment Correlation Coefficient, or simply r. We also learned how to compute r using a formula and how to interpret r in terms of its direction and strength.
Check your readiness for this lesson by answering the following activities.

What’s In

Activity 1: What’s Happening


In our daily life, many things depend on others.
A. Match the dependent variable on the left with the independent variable on the
right.
Dependent Variable                       Independent Variable
___ 1. Price of banana cake              a. Cost of appliance
___ 2. Amount of appliance payment       b. Distance and speed
___ 3. Volume of a cube                  c. Size of cake
___ 4. Amount of cellular phone bill     d. Length, width and thickness of the cube
___ 5. Time it takes for a road trip     e. Number of minutes consumed

B. Write the dependency statement for each event.


1. The price of banana cake depends on _______________________________
2. The amount of appliance payment depends on _______________________
3. ______________________________________________________________
4. ______________________________________________________________
5. ______________________________________________________________

Activity 2: Dependent or Independent?
Directions: Identify the dependent and independent variables in each of the following pairs of variables. Write your answers in the table provided.
Pair 1: Growth of a plant and the amount of sunlight and water
Pair 2: Monthly salary and annual income of an employee
Pair 3: Amount of time spent in studying and the scores in Statistics quiz
Pair 4: Ticket sales in raffle draw and the number of tickets sold
Pair 5: Price of commodities and the demand
Pair Dependent Variable Independent Variable
1
2
3
4
5
Then, place each variable on the space provided.
(Dependent Variable) (Independent Variable)
________________________ depends upon ______________________________
________________________ depends upon ______________________________
________________________ depends upon ______________________________
________________________ depends upon ______________________________
________________________ depends upon ______________________________

What is it

When two variables are related, one is the dependent variable while the other is the independent variable. To identify which is which, place each variable in a blank of the statement: _________ depends upon _________, then evaluate whether the statement is logical.
In a scatterplot, we can draw the trend line if there is an evident correlation
between the bivariate data. We also learned that the trend line is the line “closest” to
the points in the scatterplot.
When the trend line is drawn, we observe that some of the points are on the line while others are below or above the line. In other words, we say that the points in the scatterplot regress with reference to the line. If the sum of the squared vertical (Y) distances of the points from this line is the smallest possible, then we call this line the regression line or the line that “best fits” the scatterplot. The regression line is the same as the trend line.

To find the regression line, we use the least-squares method, which is summarized in a formula. Like the equation of a line in Algebra, we write the equation of the regression line in slope-intercept form.
Formula for linear regression:

$$Y = mX + b$$

where m is the slope and b is the y-intercept. The slope is not the correlation coefficient r itself, but it is computed from the same sums used for Pearson's r and has the same numerator:

$$m = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$

To solve for the value of the y-intercept b, we have:

$$b = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2}$$

The regression line $Y = mX + b$ is also called the prediction equation because we use it to predict Y if X is known. Since only the Y distances were considered in the analysis, the line should not be used to predict X from Y.
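
The slope and intercept formulas above translate directly into a few lines of code. The following is a minimal Python sketch and is not part of the original lesson; the function name `slope_intercept` and the sample data are hypothetical, chosen only so that the answer is easy to check by hand.

```python
def slope_intercept(xs, ys):
    """Return (m, b) of the least-squares line Y = mX + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Both formulas share the same denominator: n(sum of X^2) - (sum of X)^2
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are equal; the slope is undefined")

    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    return m, b


# Hypothetical data lying exactly on the line Y = 2X + 1
m, b = slope_intercept([1, 2, 3], [3, 5, 7])
print(m, b)
```

Because the three sample points lie exactly on Y = 2X + 1, the sketch should recover m = 2 and b = 1.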

To determine the regression line or do a regression analysis, we go through the following steps:

1. Find the value of the correlation coefficient (r).
2. Test the significance of r. If r is significant, proceed to regression analysis (Proceed to Step 3). If r is not significant, regression analysis cannot be done (Stop).
3. Find the values of b and m.
4. Substitute the values of b and m in the regression line $Y = mX + b$.
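
The steps above can be sketched as a single Python function. This is an illustrative sketch under stated assumptions, not the lesson's own procedure: the function name `regression_analysis` and both data sets are hypothetical, the critical value must be looked up in a t-table by the reader and passed in as `t_critical`, and the test compares the absolute value of t with the critical value, which is the usual convention for a two-tailed test.

```python
import math


def regression_analysis(xs, ys, t_critical):
    """Follow the four steps; return (m, b), or None if r is not significant."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Step 1: correlation coefficient r
    numerator = n * sum_xy - sum_x * sum_y
    x_part = n * sum_x2 - sum_x ** 2
    y_part = n * sum_y2 - sum_y ** 2
    r = numerator / math.sqrt(x_part * y_part)

    # Step 2: t-test for r with n - 2 degrees of freedom
    remainder = 1 - r * r
    t = math.inf if remainder <= 0 else abs(r) * math.sqrt((n - 2) / remainder)
    if t < t_critical:
        return None  # r is not significant: stop

    # Steps 3 and 4: slope m and intercept b of Y = mX + b
    m = numerator / x_part
    b = (sum_y * sum_x2 - sum_x * sum_xy) / x_part
    return m, b


# Hypothetical data, n = 5, so df = 3; 3.182 is the two-tailed
# critical t-value at alpha = 0.05 for df = 3.
weak = regression_analysis([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3.182)
strong = regression_analysis([1, 2, 3, 4, 5], [2, 4, 6, 8, 11], 3.182)
print(weak, strong)
```

Returning `None` for a non-significant r mirrors the "Stop" branch of Step 2, so a caller cannot accidentally use a regression line that the procedure says should not be computed.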

Example 1
The following data show the number of tardiness and the number of performance tasks missed by five students. If there is a significant relationship between the two variables, predict the number of performance tasks missed by a student who was late for 6 days.

Student    Number of Tardiness    Number of Missed Performance Tasks
1          1                      1
2          1                      2
3          2                      4
4          3                      2
5          4                      4

Solution:

1. Identify the dependent and independent variables.
Here, the dependent variable is the number of missed performance tasks while the independent variable is the number of tardiness.

2. Compute the correlation coefficient (r) using the formula:

$$r = \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[n(\sum X^2) - (\sum X)^2][n(\sum Y^2) - (\sum Y)^2]}}$$

Let us put the data in columns and find the following: $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$, $\sum XY$, and substitute them in the formula:

X          Y          X²          Y²          XY
1          1          1           1           1
1          2          1           4           2
2          4          4           16          8
3          2          9           4           6
4          4          16          16          16
∑X = 11    ∑Y = 13    ∑X² = 31    ∑Y² = 41    ∑XY = 33

$$r = \frac{5(33) - (11)(13)}{\sqrt{[5(31) - (11)^2][5(41) - (13)^2]}}$$

$$r = 0.63$$

3. Test the significance of r using the formula:

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$

Here $n = 5$ and $r = 0.63$.

$$t = 0.63\sqrt{\frac{5 - 2}{1 - (0.63)^2}}$$

$$t = 1.41$$

4. Compare the computed t-value to the critical t-value.
Using $df = n - 2 = 5 - 2 = 3$, $\alpha = 0.05$, two-tailed test, we find from the t-table that the critical value of t is 3.182.

5. Make a decision.
Since the computed $t = 1.41$ is less than the critical value of $t = 3.182$, we fail to reject the null hypothesis. So there is no significant relationship between the two variables.
(Note: If the absolute value of the computed t-value is equal to or greater than the critical value, reject the null hypothesis and accept the alternative hypothesis.)

6. Summarize the results.
It appears that there is no significant relationship between the number of tardiness and the number of missed performance tasks. Therefore, we will not proceed to the regression analysis.
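
The arithmetic in Example 1 can be checked with a short script. This is only a verification sketch in Python, not part of the original solution; the variable names are hypothetical, and r is rounded to two decimal places before computing t because that is what the worked solution does.

```python
import math

# Data from Example 1
tardiness = [1, 1, 2, 3, 4]      # X, independent variable
missed_tasks = [1, 2, 4, 2, 4]   # Y, dependent variable
n = len(tardiness)

sum_x = sum(tardiness)
sum_y = sum(missed_tasks)
sum_xy = sum(x * y for x, y in zip(tardiness, missed_tasks))
sum_x2 = sum(x * x for x in tardiness)
sum_y2 = sum(y * y for y in missed_tasks)

# Pearson r from the column sums
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r_rounded = round(r, 2)

# t-value using the rounded r, as in the worked solution
t = r_rounded * math.sqrt((n - 2) / (1 - r_rounded ** 2))

t_critical = 3.182  # from the t-table: df = 3, alpha = 0.05, two-tailed
significant = abs(t) >= t_critical
print(r_rounded, round(t, 2), significant)
```

The script should reproduce r = 0.63 and t = 1.41, and report that the relationship is not significant.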

Example 2
The following data pertain to the heights of mothers and their eldest daughters in inches. If there is a significant relationship between the two variables, predict the height of the daughter if the height of her mother is 78 inches.

Height of the Mother    Height of the Daughter
71                      71
69                      69
69                      71
65                      68
66                      68
63                      66
68                      70
70                      72
60                      65
58                      60

Solution:

1. Identify the dependent and independent variables.
Here, the dependent variable is the height of the daughter while the independent variable is the height of the mother.

2. Compute the correlation coefficient (r) using the formula:

$$r = \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[n(\sum X^2) - (\sum X)^2][n(\sum Y^2) - (\sum Y)^2]}}$$

Let us put the data in columns and find the following: $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$, $\sum XY$, and substitute them in the formula:

X           Y           X²             Y²             XY
71          71          5041           5041           5041
69          69          4761           4761           4761
69          71          4761           5041           4899
65          68          4225           4624           4420
66          68          4356           4624           4488
63          66          3969           4356           4158
68          70          4624           4900           4760
70          72          4900           5184           5040
60          65          3600           4225           3900
58          60          3364           3600           3480
∑X = 659    ∑Y = 680    ∑X² = 43601    ∑Y² = 46356    ∑XY = 44947

$$r = \frac{10(44947) - (659)(680)}{\sqrt{[10(43601) - (659)^2][10(46356) - (680)^2]}}$$

$$r = 0.95$$

3. Test the significance of r using the formula:

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$

Here $n = 10$ and $r = 0.95$.

$$t = 0.95\sqrt{\frac{10 - 2}{1 - (0.95)^2}}$$

$$t = 8.61$$

4. Compare the computed t-value to the critical t-value.
Using $df = n - 2 = 10 - 2 = 8$, $\alpha = 0.05$, two-tailed test, we find from the t-table that the critical value of t is 2.306.

5. Make a decision.
Since the computed $t = 8.61$ is greater than the critical value of $t = 2.306$, we reject the null hypothesis. So there is a significant relationship between the two variables.

6. Summarize the results.
There is sufficient evidence to conclude that there is a significant relationship between the height of the mother and the height of the daughter. Thus, we will proceed to regression analysis.

7. Compute the values of m and b in the regression equation $Y = mX + b$ using the following formulas.

$$m = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$

$$b = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2}$$

Using the values obtained in step 2, we have the following:

$$m = \frac{10(44947) - (659)(680)}{10(43601) - (659)^2}$$

$$m = 0.78$$

$$b = \frac{(680)(43601) - (659)(44947)}{10(43601) - (659)^2}$$

$$b = 16.55$$

8. Form the regression equation.
Substitute the values of m and b in the equation

$$Y = mX + b$$

$$Y = 0.78X + 16.55$$

The regression equation for predicting the height of the daughter given the height of the mother is $Y = 0.78X + 16.55$.

9. Predict the height of the daughter if the height of the mother is 78 inches.
Find the value of Y when X = 78 in the regression equation.

$$Y = 0.78X + 16.55$$

$$Y = 0.78(78) + 16.55$$

$$Y = 77.39 \text{ or } 77 \text{ inches}$$

So the predicted height of the daughter whose mother’s height is 78 inches is 77 inches. Remember that this is just a predicted value based on the given data. Note also that 78 inches lies outside the range of the mothers’ heights in the data (58 to 71 inches), so the prediction should be treated with caution.
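
The regression steps of Example 2 can be checked the same way. The following Python sketch is a verification aid and not part of the original solution; the variable names are hypothetical, and m and b are rounded to two decimal places before predicting because the worked solution does so.

```python
# Data from Example 2 (heights in inches)
mothers = [71, 69, 69, 65, 66, 63, 68, 70, 60, 58]    # X
daughters = [71, 69, 71, 68, 68, 66, 70, 72, 65, 60]  # Y
n = len(mothers)

sum_x = sum(mothers)
sum_y = sum(daughters)
sum_xy = sum(x * y for x, y in zip(mothers, daughters))
sum_x2 = sum(x * x for x in mothers)

# Slope and intercept share the denominator n(sum of X^2) - (sum of X)^2
denominator = n * sum_x2 - sum_x ** 2
m = (n * sum_xy - sum_x * sum_y) / denominator
b = (sum_y * sum_x2 - sum_x * sum_xy) / denominator

# Round as the worked solution does, then predict for a mother of 78 inches
m_rounded, b_rounded = round(m, 2), round(b, 2)
predicted = m_rounded * 78 + b_rounded
print(m_rounded, b_rounded, round(predicted, 2))
```

The script should reproduce m = 0.78, b = 16.55, and a predicted height of 77.39 inches.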

What I Can Do

1. For each of the following pairs of variables, identify the dependent and
independent variables.
a. Hourly rate and monthly salary of a part-time teacher
b. Total time used and amount of electrical energy of a ceiling fan
c. Pressure and depth of water
d. Side and area of a square
e. Cost and age of a vehicle
2. Graph the following regression lines.
a. $Y = 0.2X + 1.5$
b. $Y = 2.5X + 1.56$
c. $Y = 3.4X + 0.75$
d. $Y = -3.6X + 7.12$
e. $Y = -0.87X + 21.8$
3. For each regression line in No. 2 above, predict Y for the given values of X.
a. $Y = 0.2X + 1.5$, (X = 8)
b. $Y = 2.5X + 1.56$, (X = 4.5)
c. $Y = 3.4X + 0.75$, (X = 7)
d. $Y = -3.6X + 7.12$, (X = 3)
e. $Y = -0.87X + 1.82$, (X = 75)
4. Survey tests on leadership skills and on self-concept were administered to student-leaders. Both tests use a 10-point Likert scale, with 10 indicating the highest score for each test. The scores of the student-leaders on the tests follow:
Student Code        A     B     C     D     E     F     G     H     I     J
Self-concept        9.5   9.2   6.3   4.1   5.4   8.3   7.8   6.8   5.6   7.1
Leadership Skill    9.2   8.8   7.3   3.4   6.0   7.8   8.8   7.0   6.5   8.3
a. Find the regression line that will predict the leadership skill if the self-concept score is known.
b. Predict the leadership skill of a student-leader whose self-concept score is 1.5.
5. In an attempt to assess the effect of students’ admission test scores on their academic performance, a survey was conducted among eight randomly selected students to determine their scores in the admission test and their grades. The data are as follows:
Student number             1     2     3     4     5     6     7     8
Score in admission test    52    61    86    79    45    58    60    45
Grade                      80    78    95    90    75    82    80    70
a. Find the regression line that will predict the grade of the student for a given score in the admission test.
b. What is the expected grade of the student if his/her score in the admission test is 68?

Key to answer on page 33

What I Have Learned

1. Bivariate Data - pairs of variables in gathered data.
2. Dependent Variable - a variable that is affected by the independent variable.
3. Independent Variable - a variable that can cause the dependent variable to change.
4. No Correlation - points do not follow a straight line pointing either upward or downward.
5. Pearson Product-Moment Correlation Coefficient (r)

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$

Where:
n = number of paired values
$\sum x$ = sum of x-values
$\sum y$ = sum of y-values
$\sum xy$ = sum of the products of paired values x and y
$\sum x^2$ = sum of squared x-values
$\sum y^2$ = sum of squared y-values

6. Perfect Positive Correlation - dots are on a straight line that points upward to the right.
7. Perfect Negative Correlation - dots are on a straight line that points downward to the right.
8. Scatter Plot - a diagram on the Cartesian coordinate plane in which two variables are plotted along two axes.
9. Strong Positive Correlation - dots seem to follow a linear trend line that points upward to the right.
10. Strong Negative Correlation - dots seem to follow a linear trend line that points downward to the right.
11. The value of Pearson r ranges from -1 to +1. If the value of r is exactly +1, then there is a perfect positive correlation, and if the value of r is exactly -1, then there is a perfect negative correlation. A value of r close to +1 indicates a strong positive correlation, and a value of r close to -1 indicates a strong negative correlation.
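
As a quick illustration of the Pearson r formula and of the extreme values +1 and -1, here is a minimal Python sketch. It is not part of the original lesson; the function names `pearson_r` and `direction` and the sample data are hypothetical, and `direction` reports only the sign of r, not its strength.

```python
import math


def pearson_r(xs, ys):
    """Pearson r computed from the column sums of paired values."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def direction(r):
    """Describe only the direction of the correlation from the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Hypothetical data: points exactly on a rising line, then on a falling line
rising = pearson_r([1, 2, 3], [2, 4, 6])
falling = pearson_r([1, 2, 3], [6, 4, 2])
print(rising, direction(rising))
print(falling, direction(falling))
```

Points that lie exactly on a rising line give r = +1, and points that lie exactly on a falling line give r = -1, matching the definitions of perfect positive and perfect negative correlation.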

Assessment

Directions: Read and analyze the statements below. Encircle the letter of the correct
answer.

1. It refers to a diagram in the xy-plane in which two variables are plotted along two axes.
a. variables                  c. scatter plot
b. independent variable       d. dependent variable
2. What is the value of r in the given data below?
X    1     2     3     4     5
Y    24    18    12    28    24
a. -0.21 c. 0.25
b. -0.04 d. 0.62
3. What is the dependent variable from the table in number 2?
a. X c. X,Y
b. Y d. none of them
4. From the given data in number 2, what is the strength of correlation between the two variables?
a. no correlation c. weak positive correlation
b. negligible positive correlation d. strong positive correlation
5. The points on a graph are all on a straight line and pointing upward to the
right. What is the strength of correlation?
a. perfect negative correlation c. perfect positive correlation
b. strong positive correlation d. strong negative correlation
6. There is no correlation between two variables. Which of the following describes the pattern of the points?
a. Points are all on a straight line that points upward to the right
b. Points are all on a straight line that points downward to the right
c. Points seem to follow a straight line that points downward to the right
d. Points follow neither a straight line pointing upward nor one pointing downward
7. In a regression analysis, what do you call the variable being explained or
predicted?
a. Independent variable c. Discrete variable
b. Dependent variable d. Continuous variable

8. In a regression analysis, if the slope is -3.22 and the y-intercept is 1.35, which of the following should be the least-squares regression equation?
a. $Y = -3.22X + 1.35$        c. $Y = 3.22X - 1.35$
b. $Y = 1.35X - 3.22$         d. $Y = -1.35X + 3.22$

For numbers 9 – 10, refer to the problem below:


A student conducted a regression analysis between the math grades of his classmates and the number of times they were absent in the subject. He found that the regression line that will predict the grade (Y) if the number of absences (X) is known is $Y = 98.125 - 2.45X$.
9. What is the predicted grade of a student who has no absences?
a. 95 c. 98
b. 97 d. 99
10. Which of the following shows the graph of the prediction line?
[Figure: four answer choices, a to d, each a line graphed on X and Y axes]
