
CHAPTER EIGHT

8 SIMPLE LINEAR REGRESSION AND CORRELATION

Regression analysis is a conceptually simple method for investigating functional relationships
among variables.
Regression may be defined as the estimation or prediction of the unknown value of one
variable from the known values of one or more other variables. The variable whose values are
to be estimated or predicted is known as the dependent or explained variable, while the
variables that are used to determine the value of the dependent variable are called
independent or predictor variables.

A regression study that involves only two variables is called simple regression, and a
regression analysis that studies more than two variables is called multiple regression. If the
relationship between the two variables can be described by a straight line, the regression
is known as linear regression; otherwise it is called non-linear regression.

The regression analysis involving only two variables and having a linear relationship is called
Simple Linear Regression. This linear relationship between the two variables is represented
by a straight line.

Regression Line (Line of Regression): the line that gives the best estimate of one variable
for any given value of another variable. The regression line used to predict the values of Y
for any given value of X is called the regression line of Y on X. Similarly, the regression
line used to predict the values of X for any given value of Y is called the regression line
of X on Y.
Regression Equation: a mathematical equation that defines the relationship between two
variables.

Regression of Y on X
Model: Y = α + βX + ε
where Y is the dependent variable,
X is the independent variable,
α is the constant term (intercept),
β is the slope (the change in Y for a unit change in X), and
ε is the error term.
To estimate the regression coefficients (α and β), the procedure is to minimize the sum of the
squares of the errors. Let the estimated model be Ŷ = a + bX. Then, from sample data, the
values of a (estimate of α) and b (estimate of β) can be obtained as follows:

$$ b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} \qquad \text{and} \qquad a = \bar{Y} - b\bar{X}. $$
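For readers who want to see where these two formulas come from, the following is a brief sketch of the standard least-squares argument. The symbol S for the sum of squared errors is introduced here only for illustration and is not part of the original notation.

```latex
\begin{align*}
S(a, b) &= \sum_{i=1}^{n} \left(Y_i - a - bX_i\right)^2 \\
\frac{\partial S}{\partial a} = 0 \;&\Rightarrow\; \sum Y = na + b\sum X
  \;\Rightarrow\; a = \bar{Y} - b\bar{X} \\
\frac{\partial S}{\partial b} = 0 \;&\Rightarrow\; \sum XY = a\sum X + b\sum X^2 \\
\text{substituting } a \;&\Rightarrow\;
  n\sum XY - \sum X \sum Y = b\left[n\sum X^2 - \left(\sum X\right)^2\right]
\end{align*}
```

Dividing the last line by the bracketed term gives the expression for b stated above.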
Interpretation of the slope (b)
1. If b is positive, there is a direct relationship between the two variables.
2. If b is zero, there is no linear relationship between the two variables.
3. If b is negative, there is an indirect relationship between the two variables.

Example

Advertising budget (X)    5    6    7    8    9   10   11
Profit (Y)                8    7    9   10   13   12   13

If we plot these points in a scatter diagram and draw a line through them, the regression line
is the one that passes through, or as close as possible to, all of the points.
Advertising budget (X)   Profit (Y)    XY     X²     Y²
         5                   8         40     25     64
         6                   7         42     36     49
         7                   9         63     49     81
         8                  10         80     64    100
         9                  13        117     81    169
        10                  12        120    100    144
        11                  13        143    121    169

∑X = 56, ∑Y = 72, ∑XY = 605, ∑X² = 476, ∑Y² = 776

$$ b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{7(605) - (56)(72)}{7(476) - 56^2} = \frac{4235 - 4032}{3332 - 3136} = \frac{203}{196} \approx 1.036 $$
Since b is positive, there is a direct relationship between the advertising budget and profit.
a = Ȳ - bX̄ = 72/7 - 1.036(56/7) = 10.286 - 1.036(8) = 10.286 - 8.288 = 1.998
Hence Ŷ = a + bX = 1.998 + 1.036X.
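The hand calculation for the advertising example can be cross-checked with a short Python sketch. The function name `fit_line` is an illustrative choice, not something defined in these notes; it applies the same sum-based formulas for b and a.

```python
# Advertising budget (X) and profit (Y) from the worked example.
xs = [5, 6, 7, 8, 9, 10, 11]
ys = [8, 7, 9, 10, 13, 12, 13]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the sum formula, then intercept from the two means.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = fit_line(xs, ys)
print(f"b = {b:.3f}, a = {a:.3f}")
```

Without intermediate rounding the intercept comes out as 2.000. The value 1.998 in the hand calculation arises because b was rounded to 1.036 before a was computed.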

1.1. Correlation

Most variables in economics and business are related to one another: for example, price
and supply, income and expenditure, or advertising expenditure and sales. To know the degree
and direction of such a relationship between variables, correlation analysis is
important. Correlation is a mathematical tool designed to measure the degree of the
relationship (degree of association) between variables. Correlation that involves only two
variables is called simple correlation, and correlation that involves more than two variables
is called multiple correlation.
Covariance is a measure of the joint variation in two variables, i.e. it measures the way in
which the values of the two variables vary together. If the covariance is zero, there is no
linear relationship between the two variables. If it is negative, there is an indirect linear
relationship between them. If the covariance is positive, there is a direct linear relationship
between the variables.
Pearson’s Coefficient of Correlation (r)
Pearson’s coefficient of correlation (r) is used to measure the strength of the linear
relationship between two variables.
The population correlation coefficient is denoted by ρ and the sample correlation coefficient
is denoted by r.
$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - \left(\sum X\right)^2}\;\sqrt{n\sum Y^2 - \left(\sum Y\right)^2}} $$

The value of r always lies between -1 and 1, inclusive.
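As an illustration, the formula for r can be applied to the advertising budget and profit data from the regression example. This is a minimal sketch; the function name `pearson_r` is an assumption made for this example.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Advertising budget (X) and profit (Y) from the regression example.
xs = [5, 6, 7, 8, 9, 10, 11]
ys = [8, 7, 9, 10, 13, 12, 13]
r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

For these data r is about 0.92, which is close to 1 and has the same sign as the slope b found earlier.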

Interpretation of r
• If the value of r is -1 or 1, there is a perfect negative or perfect positive linear
relationship between the variables.
• If the value of r is approximately -1 or 1, there is a strong negative or strong positive
linear relationship between the variables.
• If r is -0.5 (or approximately -0.5) or 0.5 (or approximately 0.5), there is a moderate
negative or moderate positive linear relationship between the variables.
• If r = 0, there is no linear relationship.
Covariance of X and Y measures the co-variability of X and Y together. It is denoted
by S_XY and given by

$$ S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n-1} = \frac{\sum XY - n\bar{X}\bar{Y}}{n-1} $$
Next, we will see the relationship between the coefficients.

$$ r = \frac{S_{XY}}{S_X S_Y} \;\Rightarrow\; r^2 = \frac{S_{XY}^2}{S_X^2\, S_Y^2} $$

$$ r = \frac{b\,S_X}{S_Y} \;\Rightarrow\; b = \frac{r\,S_Y}{S_X} $$

where S_X is the standard deviation of the X values and S_Y is the standard deviation of the Y
values.
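These relationships can be checked numerically on the advertising data from the regression example. The sketch below assumes sample statistics with divisor n - 1, as in the covariance formula; all variable names are illustrative.

```python
import math

# Advertising budget (X) and profit (Y) from the regression example.
xs = [5, 6, 7, 8, 9, 10, 11]
ys = [8, 7, 9, 10, 13, 12, 13]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sample covariance, by the definition and by the shortcut formula.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
s_xy_shortcut = (sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y) / (n - 1)

# Sample standard deviations of X and Y.
s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))

r = s_xy / (s_x * s_y)   # r = S_XY / (S_X * S_Y)
b = r * s_y / s_x        # b = r * S_Y / S_X

print(f"S_XY = {s_xy:.4f}, r = {r:.4f}, b = {b:.4f}")
```

The slope recovered from r, S_X and S_Y agrees with the value b ≈ 1.036 obtained from the sum formula.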
Coefficient of determination (r²)
It is the proportion of the variation in the dependent variable that is explained by the
independent variable in the regression model. It is the square of the correlation coefficient.
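One way to see the "proportion of variation explained" reading is to compute r² as 1 - SSE/SST, where SSE is the sum of squared residuals around the fitted line and SST is the total sum of squares of Y around its mean. The names `sse` and `sst` are introduced here for illustration and do not appear in these notes. The sketch uses the advertising data and the deviation form of the slope, which is algebraically equivalent to the sum formula.

```python
# Advertising budget (X) and profit (Y) from the regression example.
xs = [5, 6, 7, 8, 9, 10, 11]
ys = [8, 7, 9, 10, 13, 12, 13]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept (deviation form of the same formulas).
dev_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
dev_xx = sum((x - mean_x) ** 2 for x in xs)
b = dev_xy / dev_xx
a = mean_y - b * mean_x

# Total variation in Y and the part left unexplained by the line.
sst = sum((y - mean_y) ** 2 for y in ys)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / sst
print(f"r^2 = {r_squared:.3f}")
```

For these data r² is about 0.848, so roughly 85% of the variation in profit is explained by the advertising budget under the fitted linear model.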

1. Given the following data on supply (X) and sales (Y) of a certain commodity:

Supply (X)   60   62   65   70   73   75   71
Sales (Y)    10   11   13   15   16   19   14

a. Estimate the regression equation of sales (Y) on supply (X).
b. Interpret the estimated coefficient (the slope).
c. Calculate the correlation coefficient between supply and sales, and interpret it.
d. Find the coefficient of determination and interpret it.
e. Predict the amount of sales of the commodity if the supply amount is 80.
2. The following summary results are obtained from the price and demand of a commodity:
∑price = 30, ∑demand = 40, ∑(price)(demand) = 214,
∑(price)² = 220, ∑(demand)² = 340, n = 5
a. Identify the dependent and independent variable.
b. Estimate the regression equation.
c. Interpret the estimated coefficients.
d. Calculate the correlation coefficient between price and demand, and interpret
it.
e. Find the coefficient of determination and interpret it.

Regression analysis is useful in predicting the value of one variable from the given values of
another variable.
Example: A researcher wants to find out whether there is any relationship between the height
of a son and that of his father. He took a random sample of 6 fathers and their sons. The
heights in inches are given in the table below.

Height of father (X)   63   65   66   67   67   68
Height of son (Y)      60   70   72   75   73   77

(i) Find the regression line of Y on X.
(ii) What would be the height of the son if his father's height is 70 inches?

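Solution: ∑X = 396, ∑Y = 427, ∑X² = 26152, ∑XY = 28234, ∑Y² = 30567

(i)
$$ b = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{\left(\sum X\right)^2}{n}} = \frac{6(28234) - (396)(427)}{6(26152) - (396)^2} = \frac{312}{96} = 3.25 $$

$$ a = \bar{Y} - b\bar{X} = \frac{\sum Y - b\sum X}{n} = \frac{427 - (3.25)(396)}{6} = -143.33 $$

Therefore Ŷ = -143.33 + 3.25X.
(ii) If X = 70, then Ŷ = -143.33 + 3.25(70) = 84.17. Thus the predicted height of the son
is about 84.2 inches.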
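The father and son example can be reproduced with a few lines of Python as a check on the arithmetic. This is a sketch using the same sum formulas; the variable names are illustrative.

```python
# Heights in inches from the example: fathers (X) and sons (Y).
fathers = [63, 65, 66, 67, 67, 68]
sons = [60, 70, 72, 75, 73, 77]

n = len(fathers)
sum_x = sum(fathers)
sum_y = sum(sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))
sum_x2 = sum(x * x for x in fathers)

# Slope and intercept of the regression line of Y on X.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Predicted height of the son when the father's height is 70 inches.
predicted = a + b * 70
print(f"b = {b:.2f}, a = {a:.2f}, prediction at X = 70: {predicted:.2f}")
```

Note that X = 70 lies outside the observed range of fathers' heights (63 to 68 inches), so this prediction is an extrapolation and should be read with caution.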
3. Given n = 25, X̄ = 3.95, Ȳ = 2.03, S_X² = 85.35, S_Y² = 98.75, S_XY = 90:
a. Fit the regression equation Y on X.
b. Interpret the estimated coefficients.
c. Calculate the correlation coefficient and interpret it.
d. Find the coefficient of determination and interpret it.
