REGRESSION ANALYSIS
3
Learning Concept
What’s In
Activity 2: Dependent or Independent?
Directions: Identify the dependent and independent variables in each of the
following pairs of variables. Write your answers in the table provided.
Pair 1: Growth of a plant and the amount of sunlight and water
Pair 2: Monthly salary and annual income of an employee
Pair 3: Amount of time spent in studying and the scores in Statistics quiz
Pair 4: Ticket sales in a raffle draw and the number of tickets sold
Pair 5: Price of commodities and the demand
Pair Dependent Variable Independent Variable
1
2
3
4
5
Then, place each variable on the space provided.
(Dependent Variable) (Independent Variable)
________________________ depends upon ______________________________
________________________ depends upon ______________________________
________________________ depends upon ______________________________
________________________ depends upon ______________________________
________________________ depends upon ______________________________
What is it
When two variables are related, one is the dependent variable while the other
is the independent variable. To identify which is the dependent and which is the
independent variable, put each one in a blank in the statement: _________ depends
upon _________, then evaluate whether the statement is logical.
In a scatterplot, we can draw the trend line if there is an evident correlation
between the bivariate data. We also learned that the trend line is the line “closest” to
the points in the scatterplot.
When the trend line is drawn, we observe that some of the points are on the
line while others are below or above it. In other words, we say that the points in
the scatterplot regress with reference to the line. If the sum of the squared vertical
(Y) distances of the points from this line is the least possible, then we call this line
the regression line, or the line that “best fits” the scatterplot. The regression line is
the same as the trend line.
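One way to make "closest" precise is sketched below, assuming the usual least-squares criterion. The symbols $S$, $X_i$, and $Y_i$ are notation introduced here for illustration only. For a candidate line with slope $m$ and y-intercept $b$, add up the squared vertical distances of the $n$ points from the line:

```latex
S(m, b) = \sum_{i=1}^{n} \bigl( Y_i - (m X_i + b) \bigr)^2
```

Under this criterion, the regression line is the choice of $m$ and $b$ that makes $S(m, b)$ as small as possible.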
To find the regression line, we use the least-squares method, which is
summarized in a formula. Like the equation of a line in Algebra, we write the
equation of the regression line in slope-intercept form.
Formula for linear regression:
$$Y = mX + b$$
where m is the slope and b is the y-intercept. The slope is computed from the
same sums used for the Pearson Product-Moment Correlation coefficient
(Pearson's r), namely
$$m = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$
The regression equation $Y = mX + b$ is also called the prediction equation
because we use it to predict Y if X is known. Since only the Y distance was
considered in the analysis, the line should not be used to predict X from Y.
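The slope formula above, together with the y-intercept formula used later in Example 2, can be sketched in Python as follows. This is only an illustration: the function name `fit_line` and the sample points are hypothetical and chosen so that the answer is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line Y = mX + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)               # sum of X squared
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of X times Y

    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("All X values are equal, so the slope is undefined.")

    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return m, b


# Hypothetical points that lie exactly on the line Y = 2X + 1
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)  # prints 2.0 1.0
```

Because the sample points lie exactly on a line, the function recovers that line. With real data the points scatter around the line, and the same formulas give the best-fitting slope and intercept.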
Example 1
The following data show the number of days tardy and the number of performance
tasks missed by five students. If there is a significant relationship between the two
variables, predict the number of performance tasks missed by a student who was late
for 6 days.
Student   Days Tardy (X)   Performance Tasks Missed (Y)
1         1                1
2         1                2
3         2                4
4         3                2
5         4                4
Solution:
1. Identify the dependent and independent variables.
   Solution: Here, the dependent variable is the number of missed performance
   tasks, while the independent variable is the number of days tardy.
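2. Compute the correlation coefficient (r) using the formula:
   $$r = \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[n(\sum X^2) - (\sum X)^2][n(\sum Y^2) - (\sum Y)^2]}}$$
   Solution: Let us put the data in columns, find $\sum X$, $\sum Y$, $\sum X^2$,
   $\sum Y^2$, and $\sum XY$, and substitute them in the formula:

   X    Y    X^2    Y^2    XY
   1    1    1      1      1
   1    2    1      4      2
   2    4    4      16     8
   3    2    9      4      6
   4    4    16     16     16
   $\sum X = 11$   $\sum Y = 13$   $\sum X^2 = 31$   $\sum Y^2 = 41$   $\sum XY = 33$

   $$r = \frac{5(33) - (11)(13)}{\sqrt{[5(31) - (11)^2][5(41) - (13)^2]}}$$
   $$r = 0.63$$
3. Test the significance of r using the formula:
   $$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
   Solution: Here $n = 5$ and $r = 0.63$.
   $$t = 0.63\sqrt{\frac{5 - 2}{1 - (0.63)^2}}$$
   $$t = 1.41$$
4. Compare the computed t-value to the critical t-value.
   Solution: Using $df = n - 2 = 5 - 2 = 3$, $\alpha = 0.05$, two-tailed test, we
   find from the t-table that the critical value of t is 3.182.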
5. Make a decision.
   Solution: Since the computed $t = 1.41$ is less than the critical value of
   $t = 3.182$, we fail to reject the null hypothesis. So there is no significant
   relationship between the two variables, and we do not proceed to regression
   analysis or prediction.
   (Note: If the computed t-value is equal to or greater than the critical value,
   reject the null hypothesis and accept the alternative hypothesis.)
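The computations in Example 1 can be checked with a short Python sketch. The function names `pearson_r` and `t_statistic` are hypothetical helpers written for this illustration. The data are the five (X, Y) pairs from the table in the solution, and the critical value 3.182 is the one read from the t-table. As in the hand computation, r is rounded to two decimal places before t is computed.

```python
import math

# Example 1 data: X = days tardy, Y = performance tasks missed
xs = [1, 1, 2, 3, 4]
ys = [1, 2, 4, 2, 4]


def pearson_r(xs, ys):
    """Pearson's r computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    )
    return numerator / denominator


def t_statistic(r, n):
    """t-value for testing the significance of r with n pairs."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = round(pearson_r(xs, ys), 2)        # rounded first, as in the hand solution
t = round(t_statistic(r, len(xs)), 2)

T_CRITICAL = 3.182                     # df = 3, alpha = 0.05, two-tailed
significant = abs(t) >= T_CRITICAL

print(r, t, significant)  # prints 0.63 1.41 False
```

Since `significant` is `False`, the sketch reaches the same decision as step 5: the relationship is not significant, so no regression line is fitted.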
Example 2
The following data pertain to the heights, in inches, of mothers and their eldest
daughters. If there is a significant relationship between the two variables, predict
the height of the daughter if the height of her mother is 78 inches.
Solution:
1. Identify the dependent and independent variables.
   Solution: Here, the dependent variable is the height of the daughter, while the
   independent variable is the height of the mother.
2. Compute the correlation coefficient (r) using the formula:
   $$r = \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[n(\sum X^2) - (\sum X)^2][n(\sum Y^2) - (\sum Y)^2]}}$$
   Solution: Let us put the data in columns, find $\sum X$, $\sum Y$, $\sum X^2$,
   $\sum Y^2$, and $\sum XY$, and substitute them in the formula:

   X     Y     X^2     Y^2     XY
   71    71    5041    5041    5041

   $$r = \frac{10(44947) - (659)(680)}{\sqrt{[10(43601) - (659)^2][10(46356) - (680)^2]}}$$
   $$r = 0.95$$
3. Test the significance of r using the formula:
   $$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
   Solution: Here $n = 10$ and $r = 0.95$.
   $$t = 0.95\sqrt{\frac{10 - 2}{1 - (0.95)^2}}$$
   $$t = 8.61$$
4. Compare the computed t-value to the critical t-value.
   Solution: Using $df = n - 2 = 10 - 2 = 8$, $\alpha = 0.05$, two-tailed test, we
   find from the t-table that the critical value of t is 2.306.
5. Make a decision.
   Solution: Since the computed $t = 8.61$ is greater than the critical value of
   $t = 2.306$, we reject the null hypothesis. So there is a significant relationship
   between the two variables.
6. Summarize the results.
   Solution: There is sufficient evidence to conclude that there is a significant
   relationship between the height of the mother and the height of the daughter.
   Thus, we will proceed to regression analysis.
7. Compute the values of m and b in the regression equation $Y = mX + b$ using
   the following formulas:
   $$m = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$
   $$b = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2}$$
   Solution: Using the values obtained in step 2, we have the following:
   $$m = \frac{10(44947) - (659)(680)}{10(43601) - (659)^2}$$
   $$m = 0.78$$
   $$b = \frac{(680)(43601) - (659)(44947)}{10(43601) - (659)^2}$$
   $$b = 16.55$$
8. Form the regression equation.
   Solution: Substitute the values of m and b in the equation
   $$Y = mX + b$$
   $$Y = 0.78X + 16.55$$
   The regression equation for predicting the height of the daughter given the
   height of the mother is $Y = 0.78X + 16.55$.
9. Predict the height of the daughter if the height of the mother is 78 inches.
   Solution: Find the value of Y when X = 78 in the regression equation.
   $$Y = 0.78X + 16.55$$
   $$Y = 0.78(78) + 16.55$$
   $$Y = 77.39$$, or about 77 inches
   So the predicted height of the daughter whose mother's height is 78 inches is
   77 inches. Remember that this is just a predicted value based on the given data.
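The arithmetic in Example 2 can be retraced from the sums that appear in the substitutions above. The sketch below is only a check of the hand computation, not a general tool. The variable names are chosen for this illustration, and each result is rounded to two decimal places at the same point as in the worked solution.

```python
import math

# Sums taken from the substitutions in steps 2 and 7 of Example 2
n = 10
sum_x, sum_y = 659, 680                        # sum of X, sum of Y
sum_xx, sum_yy, sum_xy = 43601, 46356, 44947   # sums of X^2, Y^2, XY

numerator = n * sum_xy - sum_x * sum_y         # shared by r and m
spread_x = n * sum_xx - sum_x ** 2             # denominator of m and b
spread_y = n * sum_yy - sum_y ** 2

# Step 2: correlation coefficient
r = round(numerator / math.sqrt(spread_x * spread_y), 2)

# Step 3: t-value, using the rounded r as in the hand solution
t = round(r * math.sqrt((n - 2) / (1 - r ** 2)), 2)

# Step 7: slope and y-intercept
m = round(numerator / spread_x, 2)
b = round((sum_y * sum_xx - sum_x * sum_xy) / spread_x, 2)

# Step 9: predicted daughter's height for a mother who is 78 inches tall
predicted = round(m * 78 + b, 2)

print(r, t, m, b, predicted)  # prints 0.95 8.61 0.78 16.55 77.39
```

The printed values agree with steps 2, 3, 7, and 9. Rounding m and b before predicting, as the module does, gives 77.39 inches.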
What I Can Do
1. For each of the following pairs of variables, identify the dependent and
independent variables.
a. Hourly rate and monthly salary of a part-time teacher
b. Total time used and amount of electrical energy of a ceiling fan
c. Pressure and depth of water
d. Side and area of a square
e. Cost and age of a vehicle
2. Graph the following regression lines.
a. $Y = 0.2X + 1.5$
b. $Y = 2.5X + 1.56$
c. $Y = 3.4X + 0.75$
d. $Y = -3.6X + 7.12$
e. $Y = -0.87X + 21.8$
3. For each regression line in No. 2 above, predict Y for the given values of X.
a. $Y = 0.2X + 1.5$, (X = 8)
b. $Y = 2.5X + 1.56$, (X = 4.5)
c. $Y = 3.4X + 0.75$, (X = 7)
d. $Y = -3.6X + 7.12$, (X = 3)
e. $Y = -0.87X + 1.82$, (X = 75)
4. Survey tests on leadership skills and on self-concept were administered to
student-leaders. Both tests use a 10-point Likert scale, with 10 indicating the
highest score for each test. The scores of the student-leaders on the tests
follow:
Student Code A B C D E F G H I J
Self-concept 9.5 9.2 6.3 4.1 5.4 8.3 7.8 6.8 5.6 7.1
Leadership Skill 9.2 8.8 7.3 3.4 6.0 7.8 8.8 7.0 6.5 8.3
a. Find the regression line that will predict the leadership skill if the self-
concept score is known.
b. Predict the leadership skill of a student leader whose self-concept is 1.5.
5. In an attempt to assess the effect of students' scores in the admission test
on their academic performance, a survey was conducted among eight randomly
selected students to determine their scores in the admission test and their
grades. The data are as follows:
Student number 1 2 3 4 5 6 7 8
Score in admission test 52 61 86 79 45 58 60 45
Grade 80 78 95 90 75 82 80 70
a. Find the regression line that will predict the grade of the student for a given
score in the admission test.
b. What is the expected grade of the student if his/her score in the admission
test is 68?
A value of r close to 1 indicates a strong positive correlation, and a value of r
close to -1 indicates a strong negative correlation.
Assessment
Directions: Read and analyze the statements below. Encircle the letter of the correct
answer.
1. It refers to a diagram in the xy-plane in which two variables are plotted along
two axes.
a. variables c. scatter plot
b. independent variable d. dependent variable
2. What is the value of r in the given data below?
X 1 2 3 4 5
Y 24 18 12 28 24
a. -0.21 c. 0.25
b. -0.04 d. 0.62
3. What is the dependent variable from the table in number 2?
a. X c. X,Y
b. Y d. none of them
4. From the given data in number 2, what is the strength of correlation between
the two variables?
a. no correlation c. weak positive correlation
b. negligible positive correlation d. strong positive correlation
5. The points on a graph are all on a straight line and pointing upward to the
right. What is the strength of correlation?
a. perfect negative correlation c. perfect positive correlation
b. strong positive correlation d. strong negative correlation
6. Two variables have no correlation. Which of the following describes the pattern
of the points in their scatterplot?
a. Points are all on a straight line that points upward to the right
b. Points are all on a straight line that points downward to the right
c. Points seem to follow a straight line that points downward to the right
d. Points follow neither a straight line nor any other pattern
7. In a regression analysis, what do you call the variable being explained or
predicted?
a. Independent variable c. Discrete variable
b. Dependent variable d. Continuous variable
8. In a regression analysis, if the slope is -3.22 and the y-intercept is 1.35,
which of the following should be the least-squares regression equation?
a. $Y = -3.22X + 1.35$ c. $Y = 3.22X - 1.35$
b. $Y = 1.35X - 3.22$ d. $Y = -1.35X + 3.22$