
REGRESSION ANALYSIS

Definition
Independent Variable is the variable that causes the dependent variable to change and
can be controlled or manipulated. It is also called an explanatory variable or a predictor
variable.

Dependent Variable is the variable that is influenced or affected by the independent variable.
It is also called a response variable.

Simple Regression is a relationship analysis in which a single independent variable
is used to predict the dependent variable.

Examples

Situation 1:
You want to test a new dosage of a drug that supposedly prevents sneezing in people.

Independent variable (x-axis):
New dosage of drug

Dependent variable (y-axis):
Sneezing

Situation 2:
A soap manufacturer wants to prove that a small amount of detergent can remove a greater
amount of stain.

Independent variable (x-axis):
Amount of detergent

Dependent variable (y-axis):
Amount of stain removed

Note

If the value of the correlation coefficient is significant, the next step is to determine the
equation of the regression line, which is the data’s line of best fit. The purpose of the
regression line is to enable the researcher to see the trend and make predictions on
the basis of the data.

NDDU-IBED-F-081
FORMULAS FOR THE REGRESSION LINE 𝒚′ = 𝒂 + 𝒃𝒙

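Slope

\[
b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}
\]

y-intercept

\[
a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}
\]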

Note
Rounding Rule for the Intercept and Slope
Round the values of a and b to three decimal places
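
The two formulas and the rounding rule can be sketched in Python. This is only a minimal illustration and not part of the original handout; the function name `regression_coefficients` and its argument names are assumptions chosen for this example, and the sample points are hypothetical.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for the line y' = a + b*x, rounded to three decimals."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Both formulas share the same denominator: n(sum of x^2) - (sum of x)^2
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denominator        # slope
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept
    return round(a, 3), round(b, 3)


# Hypothetical points lying exactly on the line y = 1 + 2x
a, b = regression_coefficients([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

The function returns the intercept first and the slope second, matching the order of the terms in y′ = a + bx.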

Note
The magnitude of the change in one variable when the other variable changes by exactly
1 unit is called a marginal change. The value of the slope b of the regression line equation
represents the marginal change.

The y-intercept a (that is, the value of y′ when x = 0) is also referred to as the starting value.
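
As a small check of these two readings, the sketch below evaluates a regression line at x, x + 1, and 0. The coefficients a = 2.0 and b = 0.5 and the helper name `predict` are hypothetical, chosen only for illustration.

```python
def predict(x, a, b):
    """Predicted value y' = a + b*x."""
    return a + b * x


# Hypothetical coefficients, chosen only for illustration
a, b = 2.0, 0.5

marginal_change = predict(4, a, b) - predict(3, a, b)  # x goes up by exactly 1 unit
starting_value = predict(0, a, b)                      # value of y' at x = 0

print(marginal_change)  # 0.5, the slope b
print(starting_value)   # 2.0, the y-intercept a
```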

Example 1

Consider the following data on the number of ads and sales:

No. of ads x    2    5    8    8    10    12
Sales ($) y     2    4    7    6    9     10

a) complete the table,
b) calculate the slope and y-intercept of the regression line,
c) interpret the calculated slope and y-intercept of the regression line,
d) write the equation of the regression line,
e) find y′ when x = 7 and x = 3 ads.

Solution:
a)
No. of ads x    Sales ($) y    x²          xy
2               2              4           4
5               4              25          20
8               7              64          56
8               6              64          48
10              9              100         90
12              10             144         120
Σx = 45         Σy = 38        Σx² = 401   Σxy = 338
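
The completed table can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic; the variable names are assumptions made for this example.

```python
ads = [2, 5, 8, 8, 10, 12]     # x, from Example 1
sales = [2, 4, 7, 6, 9, 10]    # y, from Example 1

# One row per observation: (x, y, x squared, x times y)
rows = [(x, y, x * x, x * y) for x, y in zip(ads, sales)]

# Column totals: sum of x, sum of y, sum of x squared, sum of xy
totals = tuple(sum(column) for column in zip(*rows))

for row in rows:
    print(*row, sep="\t")
print("totals:", totals)  # totals: (45, 38, 401, 338)
```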

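b)
\[
b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{6(338) - (45)(38)}{6(401) - (45)^2} = 0.835
\]

\[
a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{(38)(401) - (45)(338)}{6(401) - (45)^2} = 0.073
\]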
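
The intermediate values behind these two results can be traced step by step. The sketch below starts from the sums in part a); the variable names are assumptions for this example.

```python
# Sums taken from the completed table in part a)
n, sum_x, sum_y, sum_x2, sum_xy = 6, 45, 38, 401, 338

slope_numerator = n * sum_xy - sum_x * sum_y            # 6(338) - (45)(38)
intercept_numerator = sum_y * sum_x2 - sum_x * sum_xy   # (38)(401) - (45)(338)
denominator = n * sum_x2 - sum_x ** 2                   # 6(401) - (45)^2

b = round(slope_numerator / denominator, 3)
a = round(intercept_numerator / denominator, 3)

print(slope_numerator, intercept_numerator, denominator)  # 318 28 381
print(b, a)                                               # 0.835 0.073
```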

c)
The slope of the regression line is 0.835, which means that for each additional ad, the value of
sales changes by 0.835 unit ($) on average.

Since the y-intercept is 0.073, this could mean that when there are no ads, the starting sales
could be $0.073.

d)
y′ = 0.073 + 0.835x

e)
x = 7 ads

y′ = 0.073 + 0.835(7)
y′ = $5.918

x = 3 ads

y′ = 0.073 + 0.835(3)
y′ = $2.578
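
The two substitutions can be checked with a short function built from the rounded equation y′ = 0.073 + 0.835x. The function name `predict_sales` is an assumption for this sketch.

```python
def predict_sales(ads):
    """Predicted sales in $ from the rounded equation y' = 0.073 + 0.835x."""
    return round(0.073 + 0.835 * ads, 3)


for ads in (7, 3):
    print(ads, predict_sales(ads))
# 7 5.918
# 3 2.578
```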

Example 2

The number of faculty and the number of students in a random selection of small colleges are
shown below.

Students x    1353    1290    1091    1213    1384    1283    2075
Faculty y     99      110     113     116     138     174     220

a) complete the table,
b) calculate the slope and y-intercept of the regression line,
c) interpret the calculated slope and y-intercept of the regression line,
d) write the equation of the regression line,
e) find y′ when x = 1100 and x = 1500 students.

Solution:
a)
Students x    Faculty y    x²                 xy
1,353         99           1,830,609          133,947
1,290         110          1,664,100          141,900
1,091         113          1,190,281          123,283
1,213         116          1,471,369          140,708
1,384         138          1,915,456          190,992
1,283         174          1,646,089          223,242
2,075         220          4,305,625          456,500
Σx = 9,689    Σy = 970     Σx² = 14,023,529   Σxy = 1,410,572

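b)
\[
b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{7(1{,}410{,}572) - (9{,}689)(970)}{7(14{,}023{,}529) - (9{,}689)^2} = 0.111
\]

\[
a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{(970)(14{,}023{,}529) - (9{,}689)(1{,}410{,}572)}{7(14{,}023{,}529) - (9{,}689)^2} = -14.974
\]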

c)
The slope of the regression line is 0.111, which means that for each additional student, the
value of y changes by 0.111 unit (number of faculty) on average.

Since the y-intercept is −14.974, the predicted number of faculty when there are no students is
−14.974 ≈ −15. A negative number of faculty is not possible, so this could mean that when there
are no students, there are no faculty.
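
The intercept is the prediction at x = 0 students, while the sampled colleges have between 1,091 and 2,075 students. The sketch below checks whether a given x falls inside the observed data, which is one way to see why the intercept need not have a realistic meaning here. The helper name `within_observed_range` is an assumption for this example.

```python
students = [1353, 1290, 1091, 1213, 1384, 1283, 2075]  # x values from Example 2


def within_observed_range(x, observed):
    """True if x lies between the smallest and largest observed x value."""
    return min(observed) <= x <= max(observed)


print(min(students), max(students))            # 1091 2075
print(within_observed_range(0, students))      # False
print(within_observed_range(1100, students))   # True
```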

d)
y′ = −14.974 + 0.111x

e)
x = 1100 students

y′ = −14.974 + 0.111(1100)
y′ = 107.126 ≈ 107 faculty

x = 1500 students

y′ = −14.974 + 0.111(1500)
y′ = 151.526 ≈ 152 faculty

Example 3

A study was conducted on the number of absences and the final grades of seven randomly selected
students from a statistics class. The data are shown here.

Number of absences x    6     2     15    9     12    5     8
Final grades (%) y      82    86    43    74    58    90    78

a) complete the table,
b) calculate the slope and y-intercept of the regression line,
c) interpret the calculated slope and y-intercept of the regression line,
d) write the equation of the regression line,
e) find y′ when x = 4 and x = 11 absences.

Solution:
a)
Number of absences x    Final grades (%) y    x²          xy
6                       82                    36          492
2                       86                    4           172
15                      43                    225         645
9                       74                    81          666
12                      58                    144         696
5                       90                    25          450
8                       78                    64          624
Σx = 57                 Σy = 511              Σx² = 579   Σxy = 3,745

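b)
\[
b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{7(3{,}745) - (57)(511)}{7(579) - (57)^2} = -3.622
\]

\[
a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{(511)(579) - (57)(3{,}745)}{7(579) - (57)^2} = 102.493
\]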
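
To see the slope and intercept before any rounding, the same ratios can be kept as exact fractions. This sketch uses Python's standard `fractions` module; the variable names are assumptions for this example.

```python
from fractions import Fraction

n, sum_x, sum_y, sum_x2, sum_xy = 7, 57, 511, 579, 3745  # sums from part a)

denominator = n * sum_x2 - sum_x ** 2                              # 7(579) - (57)^2
b_exact = Fraction(n * sum_xy - sum_x * sum_y, denominator)        # slope
a_exact = Fraction(sum_y * sum_x2 - sum_x * sum_xy, denominator)   # y-intercept

print(b_exact, a_exact)  # exact ratios, reduced to lowest terms
print(round(float(b_exact), 3), round(float(a_exact), 3))  # -3.622 102.493
```

The negative slope appears directly in the sign of the numerator, 7(3,745) − (57)(511) = −2,912.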

c)
The slope of the regression line is −3.622, which means that for each additional absence, the
value of y changes by −3.622 unit (grade in %) on average; that is, the final grade drops.

Since the y-intercept is 102.493, this could mean that when there are no absences, the starting
final grade could be 102.493%. (Note: This is just an estimate.)

d)
y′ = 102.493 − 3.622x

e)
x = 4 absences

y′ = 102.493 − 3.622(4)
y′ = 88.005 ≈ 88%

x = 11 absences

y′ = 102.493 − 3.622(11)
y′ = 62.651 ≈ 63%
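
Because a and b were rounded to three decimal places before substitution, the predictions can differ slightly from those of the unrounded line. The sketch below compares the two; the helper name `predict_grade` is an assumption, and the unrounded coefficients are recomputed from the sums in part a).

```python
def predict_grade(absences, a, b):
    """Predicted final grade (%) from y' = a + b*x."""
    return a + b * absences


# Rounded coefficients, as used in parts d) and e)
a_rounded, b_rounded = 102.493, -3.622

# Unrounded coefficients recomputed from the sums in part a)
denominator = 7 * 579 - 57 ** 2
a_full = (511 * 579 - 57 * 3745) / denominator
b_full = (7 * 3745 - 57 * 511) / denominator

for absences in (4, 11):
    with_rounded = predict_grade(absences, a_rounded, b_rounded)
    with_full = predict_grade(absences, a_full, b_full)
    # Print both predictions to three decimals, then the whole-percent value
    print(absences, round(with_rounded, 3), round(with_full, 3), round(with_rounded))
```

For these two x values the difference is far below one percentage point, so the whole-number answers are unaffected.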

Integration
Do you think it is important to be prepared for the future? Why or why not?

The future belongs to those who prepare for it today.
