
LINEAR REGRESSION AND CORRELATION ANALYSIS

Overview

In characterizing the association between two variables, there is a need for a statistical procedure that can handle both variables simultaneously.

Regression analysis describes the effect of one or more variables (designated as independent variables) on a single variable (designated as the dependent variable) by expressing the latter as a function of the former. In this analysis, it is important to distinguish between the dependent and independent variables.

Correlation analysis, on the other hand, provides a measure of the degree of association between the variables.

In this area, the discussion is limited to simple linear regression and simple linear correlation.

Objectives

At the end of this area, students should be able to:

a. Define simple linear regression and simple linear correlation.

b. Perform the computations for simple linear regression and simple linear correlation analysis.

c. Interpret the results of the analysis.

Simple Linear Regression

Linear regression analyzes the relationship between two variables, X and Y. For
each subject (or experimental unit), you know both X and Y and you want to find the best
straight line through the data. In some situations, the slope and/or intercept have a
scientific meaning. In other cases, you use the linear regression line as a standard curve
to find new values of X from Y, or Y from X.
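As a minimal sketch of the standard-curve use, assuming a line whose intercept and slope are already known (the values 2.0 and 0.5 below are hypothetical, not from this module), Y is read off the line directly and X is recovered by inverting the equation, X = (Y − a)/b:

```python
# Hypothetical standard-curve coefficients (not from the text): Y = a + bX
A = 2.0   # intercept
B = 0.5   # slope


def predict_y(x: float, a: float = A, b: float = B) -> float:
    """Read Y off the line for a given X."""
    return a + b * x


def predict_x(y: float, a: float = A, b: float = B) -> float:
    """Invert the line to read X off for a given Y (requires b != 0)."""
    if b == 0:
        raise ValueError("a horizontal line cannot be inverted")
    return (y - a) / b


if __name__ == "__main__":
    print(predict_y(10.0))  # 2.0 + 0.5 * 10.0 = 7.0
    print(predict_x(7.0))   # (7.0 - 2.0) / 0.5 = 10.0
```

Inverting the line only makes sense within the range of X values used to fit it, and only when the slope is clearly different from zero.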

For simple linear regression analysis to be applicable, the following conditions must hold:

a. There is only one independent variable X affecting the dependent variable Y.

b. The relationship between Y and X is known, or can be assumed, to be linear.

The simple linear regression analysis deals with the estimation and test of
significance concerning the two parameters α and β in the equation:

$$Y = \alpha + \beta X$$

The estimated linear regression is given by:

$$\hat{Y} = a + bX$$
where:

a = the y-intercept of the regression line, that is, the point where the line crosses the y-axis. It is calculated through the equation $a = \bar{y} - b\bar{x}$; therefore, the means of both variables in the sample and the value of b must be known before a can be calculated.

b = the slope of the regression line, calculated by the formula:

$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$

The slope can be either positive or negative. A positive slope indicates that an increase in the independent variable is associated with an increase in the dependent variable. A negative slope indicates that an increase in the independent variable is associated with a decrease in the dependent variable.
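The two formulas for b and a can be sketched in Python as follows. This is an illustrative helper (the name fit_line is hypothetical), checked here on made-up points that lie exactly on Y = 1 + 2X:

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares intercept a and slope b using the computational formulas."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = sum_y / n - b * sum_x / n   # a = ybar - b * xbar
    return a, b


# Hypothetical points lying exactly on Y = 1 + 2X
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Note that b must be computed first, because a depends on it, exactly as stated in the definition of a.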

Activity

A college bookstore must order books two months before each semester starts.
They believe that the number of books that will ultimately be sold for any particular
course is related to the number of students registered for the course when the books are
ordered. They would like to develop a linear regression equation to help plan how many
books to order. From past records, the bookstore obtains the number of students
registered, x, and the number of books actually sold for a course, y, for 12 different
semesters. These data are shown below.

           Number of      Number of
Semester   students (x)   books sold (y)   x²      xy      y²
1          36             31               1296    1116    961
2          28             29               784     812     841
3          35             34               1225    1190    1156
4          39             35               1521    1365    1225
5          30             29               900     870     841
6          30             30               900     900     900
7          31             30               961     930     900
8          38             38               1444    1444    1444
9          36             34               1296    1224    1156
10         38             33               1444    1254    1089
11         29             29               841     841     841
12         26             26               676     676     676
Total      396            378              13288   12622   12030
Mean       33.0           31.5
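The derived columns and totals in the table can be re-checked with a short script; the two lists below are copied from the x and y columns:

```python
# Bookstore data from the table: students registered (x), books sold (y)
x = [36, 28, 35, 39, 30, 30, 31, 38, 36, 38, 29, 26]
y = [31, 29, 34, 35, 29, 30, 30, 38, 34, 33, 29, 26]

# Column totals for x, y, x², xy and y²
totals = {
    "x": sum(x),
    "y": sum(y),
    "x2": sum(xi ** 2 for xi in x),
    "xy": sum(xi * yi for xi, yi in zip(x, y)),
    "y2": sum(yi ** 2 for yi in y),
}
mean_x = totals["x"] / len(x)
mean_y = totals["y"] / len(y)

print(totals)          # {'x': 396, 'y': 378, 'x2': 13288, 'xy': 12622, 'y2': 12030}
print(mean_x, mean_y)  # 33.0 31.5
```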
A. Computational Procedure

a. Calculate the components of the regression equation, beginning with the slope (b).

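$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$

$$b = \frac{12(12622) - (396)(378)}{12(13288) - (396)^2}$$

$$b = \frac{151464 - 149688}{159456 - 156816}$$

$$b = \frac{1776}{2640}$$

$$b = 0.6727$$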
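The same slope arithmetic can be reproduced with Python's standard fractions module, so that no rounding enters until the final step:

```python
from fractions import Fraction

n = 12
sum_x, sum_y = 396, 378
sum_x2, sum_xy = 13288, 12622

numerator = n * sum_xy - sum_x * sum_y      # 151464 - 149688
denominator = n * sum_x2 - sum_x ** 2       # 159456 - 156816
b = Fraction(numerator, denominator)        # exact ratio, reduced automatically

print(numerator, denominator)  # 1776 2640
print(b, round(float(b), 4))   # 37/55 0.6727
```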

This is the slope of the regression line: for each additional student registered for a course, the number of books sold increases by about 0.6727 on average.

b. Calculate the y-intercept (a).

$$a = \bar{y} - b\bar{x}$$

$$a = 31.5 - 0.6727(33.0)$$

$$a = 31.5 - 22.2$$

$$a = 9.30$$

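Strictly, a is the predicted number of books sold when no students have registered for a course: 9.30 (or about 9) books. Because x = 0 lies well outside the observed range of 26 to 39 students, this value is an extrapolation and should not be taken literally.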
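A quick check of the intercept, again in exact arithmetic and using the unrounded slope 1776/2640. (With the rounded slope 0.6727, the product is 22.1991 and a ≈ 9.3009, which also rounds to 9.30.)

```python
from fractions import Fraction

mean_x = Fraction(33)           # 396 / 12
mean_y = Fraction(63, 2)        # 378 / 12 = 31.5
b = Fraction(1776, 2640)        # slope from the previous step

a = mean_y - b * mean_x         # a = ybar - b * xbar
print(a, float(a))  # 93/10 9.3
```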

Thus, the estimated linear regression is

$$\hat{Y} = a + bX$$

$$\hat{y} = 9.30 + 0.6727x$$
c. Plot the observed points and draw the line representing the estimated regression equation. Two points are enough to draw the line, for example those at x = 0 and x = 39:

$$\hat{y}_{\min} = 9.30 + 0.6727(0) = 9.30$$

$$\hat{y}_{\max} = 9.30 + 0.6727(39) = 35.54$$
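The two endpoints used to draw the line can be computed with a small prediction function (the name y_hat is illustrative only). The resulting pairs, (0, 9.30) and (39, 35.54), can be handed to any plotting tool and drawn over the scatter of observed points:

```python
A, B = 9.30, 0.6727  # estimated intercept and slope from the text


def y_hat(x: float) -> float:
    """Predicted number of books sold for x registered students."""
    return A + B * x


for x in (0, 39):  # endpoints used in the text to draw the line
    print(x, round(y_hat(x), 2))
# 0 9.3
# 39 35.54
```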

Testing the significance of β (slope)

a. State the null and appropriate alternative hypotheses.

H₀: β = 0
H₁: β ≠ 0

b. Critical values. From the t distribution table, find the critical values at the 0.05 and 0.01 levels of significance with df = n − 2 = 10.

$$t_{0.05}(10) = 2.2282$$

$$t_{0.01}(10) = 3.1693$$

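c. Computational procedure

Compute the residual mean square ($s^2_{y\cdot x}$). The bracketed numerator below equals n times the residual sum of squares, which is why it is divided by n(n − 2) rather than by n − 2:

$$s^2_{y\cdot x} = \frac{\left[n\sum y^2 - \left(\sum y\right)^2\right] - \dfrac{\left[n\sum xy - \left(\sum x\right)\left(\sum y\right)\right]^2}{n\sum x^2 - \left(\sum x\right)^2}}{n(n-2)}$$

$$s^2_{y\cdot x} = \frac{\left[12(12030) - (378)^2\right] - \dfrac{\left[12(12622) - (396)(378)\right]^2}{12(13288) - (396)^2}}{12(12-2)}$$

$$s^2_{y\cdot x} = \frac{1476 - \dfrac{(1776)^2}{2640}}{120} = \frac{1476 - 1194.7636}{120} = \frac{281.2364}{120}$$

$$s^2_{y\cdot x} = 2.3436$$

Compute the $t_b$ value:

$$t_b = \frac{b}{\sqrt{\dfrac{n\,s^2_{y\cdot x}}{n\sum x^2 - \left(\sum x\right)^2}}}$$

$$t_b = \frac{0.6727}{\sqrt{\dfrac{12(2.3436)}{2640}}} = \frac{0.6727}{\sqrt{0.01065}} = \frac{0.6727}{0.1032}$$

$$t_b \approx 6.518$$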
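As a cross-check of the test statistic, the sketch below works from the corrected sums of squares and cross-products S_xx = ∑x² − (∑x)²/n, S_yy = ∑y² − (∑y)²/n and S_xy = ∑xy − (∑x)(∑y)/n. It computes the residual mean square as (S_yy − S_xy²/S_xx)/(n − 2) and then t_b = b/√(s²_{y·x}/S_xx). This deviation-sum form is a standard equivalent of the computational formulas, not the module's own notation:

```python
import math

n = 12
sum_x, sum_y = 396, 378
sum_x2, sum_xy, sum_y2 = 13288, 12622, 12030

# Corrected sums of squares and cross-products
s_xx = sum_x2 - sum_x ** 2 / n           # 220.0
s_yy = sum_y2 - sum_y ** 2 / n           # 123.0
s_xy = sum_xy - sum_x * sum_y / n        # 148.0

b = s_xy / s_xx                          # slope
sse = s_yy - s_xy ** 2 / s_xx            # residual sum of squares
s2_yx = sse / (n - 2)                    # residual mean square
se_b = math.sqrt(s2_yx / s_xx)           # standard error of b
t_b = b / se_b

print(round(t_b, 4))  # 6.5179
```

Carrying full precision gives t_b = 6.5179; the slightly different last digit in the hand computation comes only from rounding the intermediate values.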

From the t distribution table, the t values at the 0.05 and 0.01 levels of significance with (n − 2) = 10 degrees of freedom are 2.2282 and 3.1693, respectively. Because the computed t_b value is greater than the tabular value at the 0.01 level of significance, reject H₀. The linear response of the number of books sold to the change in the number of students registered, within the range of 26 to 39 students, is highly significant.
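The decision rule used here can be written as a small function. The function name is hypothetical, and the labels follow the module's usage ("highly significant" at the 0.01 level). Because the alternative hypothesis is two-sided, the absolute value of t_b is compared with the tabular values:

```python
def significance(t_computed: float, t_05: float, t_01: float) -> str:
    """Classify a two-tailed t test result against tabular critical values."""
    t_abs = abs(t_computed)
    if t_abs > t_01:
        return "highly significant (reject H0 at the 0.01 level)"
    if t_abs > t_05:
        return "significant (reject H0 at the 0.05 level)"
    return "not significant (do not reject H0)"


# Critical values for df = 10 as quoted in the text
print(significance(6.5179, t_05=2.2282, t_01=3.1693))
# highly significant (reject H0 at the 0.01 level)
```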
Progress Check

1. When anthropologists find skeletal remains, they need to estimate the height of the person. The height of a person (in cm) and the length of their metacarpal bone (in cm) were collected and are presented below.

Sample   Length of metacarpal (cm)   Height of person (cm)
1        45                          171
2        51                          178
3        39                          157
4        41                          163
5        48                          172
6        49                          183
7        46                          173
8        43                          175
9        47                          173
10       50                          168
11       45                          175
12       44                          167
13       46                          170
14       42                          162
15       48                          177

a. Determine the regression equation for estimating the height of a person (y) from the length of the metacarpal (x).
b. Test the significance of β.
