CHAPTER 15

CORRELATION AND REGRESSION ANALYSIS

Dr Deepak Chawla
Neena Sondhi

SLIDE 15-1

What is Correlation?
Correlation measures the degree of association between two or more variables. When we are dealing with two variables, we speak of simple correlation; when more than two variables are involved, the subject of interest is called multiple correlation.

SLIDE 15-2

Types of Correlation
Positive correlation: When two variables X and Y move in the same direction, the correlation between the two is positive.

Negative correlation: When two variables X and Y move in the opposite direction, the correlation between them is negative.

Zero correlation: The correlation between two variables X and Y is zero when the variables move with no connection to each other.

SLIDE 15-3

Graphical Presentation of Positive Correlation

[Scatter diagram: the plotted points of X and Y trend upward, illustrating positive correlation]

SLIDE 15-4

Graphical Presentation of Negative Correlation

[Scatter diagram: the plotted points of X and Y trend downward, illustrating negative correlation]

SLIDE 15-5

Graphical Presentation of Zero Correlation

[Scatter diagram: the plotted points of X and Y show no discernible pattern, illustrating zero correlation]

SLIDE 15-6

Quantitative Estimate of a Linear Correlation

• A quantitative estimate of a linear correlation between two variables X and Y is given by Karl Pearson as:
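In the usual notation, with X̄ and Ȳ denoting the sample means of the n paired observations,

r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \; \sum (Y_i - \bar{Y})^2}}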

SLIDE 15-7

Quantitative Estimate of a Linear Correlation

• The linear correlation coefficient takes a value between –1 and +1 (both values inclusive).

• If the value of the correlation coefficient is equal to +1, the two variables are perfectly positively correlated and the scatter of the points of the variables X and Y will lie on a positively sloped straight line.

• Similarly, if the correlation coefficient between the two variables X and Y is –1, the scatter of the points of these variables will lie on a negatively sloped straight line and such a correlation is called a perfect negative correlation.

SLIDE 15-8

Testing the Significance of the Correlation Coefficient

The hypothesis to be tested is:
H0: ρ = 0
H1: ρ ≠ 0
The test statistic is given by:
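With r and n as defined below, the statistic takes the standard form

t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}

which follows a t distribution with n – 2 degrees of freedom under H0.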
Where,
ρ = population correlation coefficient between the variables X and Y
r = sample correlation coefficient between the variables X and Y
n – 2 = the degrees of freedom

Now, for a given level of significance, if the computed |t| is greater than the tabulated |t| with n – 2 degrees of freedom, the null hypothesis of no correlation between X and Y is rejected.
SLIDE 15-9

Simple Regression Analysis


The simple linear regression equation can be presented as:

Y = α + βX + U

Where,
U = stochastic error term
α, β = parameters to be estimated

• The equation is estimated using the ordinary least squares (OLS) method of estimation.
• The OLS method states that the regression line should be drawn so as to minimize the error sum of squares.
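In symbols, the OLS estimates α̂ and β̂ are the values that minimize the error sum of squares

\sum \hat{U}_i^2 = \sum \left( Y_i - \hat{\alpha} - \hat{\beta} X_i \right)^2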



SLIDE 15-10

Simple Regression Analysis


The OLS method of estimation would result in the following two normal equations:
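In the notation of the model above, the two normal equations are

ΣYi = nα̂ + β̂ ΣXi
ΣXiYi = α̂ ΣXi + β̂ ΣXi²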
Solving the above normal equations results in:
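Eliminating α̂ between the two normal equations yields the familiar expression

\hat{\beta} = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left( \sum X_i \right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}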

Once β̂ is estimated, the value of α̂ may be computed as:
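From the first normal equation,

α̂ = Ȳ – β̂X̄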



SLIDE 15-11

Simple Regression Analysis


• The estimate of the error (residual) term is obtained as:
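That is, the residual is the difference between the observed and the estimated value of the dependent variable:

Ûi = Yi – Ŷi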

where
Ŷ = estimated value of the dependent variable
Y = observed value of the dependent variable

The estimate of the variance of the error term is given by:
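With n and k as defined below, a standard expression is

\hat{\sigma}^2 = \frac{\sum \hat{U}_i^2}{n - k}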



• n and k denote the sample size and the number of parameters to be estimated, respectively.



SLIDE 15-12

Test of Significance of Regression Parameters

The hypothesis to be tested for the slope coefficient is:
H0: β = 0
H1: β ≠ 0

The test statistic to be used to test the significance of the slope coefficient is given by:
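In its standard form, with SE(β̂) denoting the estimated standard error of the slope,

t = \frac{\hat{\beta}}{\mathrm{SE}(\hat{\beta})}, \qquad \mathrm{SE}(\hat{\beta}) = \sqrt{\frac{\hat{\sigma}^2}{\sum (X_i - \bar{X})^2}}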
At a given level of significance, the computed |t| is compared with the tabulated t with n – 2 degrees of freedom to accept or reject the null hypothesis.


SLIDE 15-13

Goodness of Fit of the Regression Equation

• r² is used to measure the goodness of fit of the regression equation. This measure is also called the coefficient of determination of a regression equation, and it takes a value between 0 and 1. The higher the value of r², the better the goodness of fit.
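In terms of the residuals defined earlier, the coefficient of determination can be written as

r^2 = 1 - \frac{\sum \hat{U}_i^2}{\sum (Y_i - \bar{Y})^2}

i.e. the proportion of the total variation in Y explained by the regression.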


SLIDE 15-14

Testing the Significance of r²


The hypothesis to be tested is:
H0: r² = 0
H1: r² > 0

The test statistic F is given by the expression:
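Consistent with the degrees of freedom described below, the statistic is

F = \frac{r^2 / (k - 1)}{(1 - r^2) / (n - k)}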

For a given level of significance α, the computed value of the F statistic is compared with the tabulated value of F with k – 1 degrees of freedom in the numerator and n – k degrees of freedom in the denominator. If the computed F exceeds the tabulated F, the null hypothesis is rejected in favour of the alternative hypothesis.


SLIDE 15-15

Multiple Regression Model


In the multiple regression model, there are at least two independent variables. The linear multiple regression model with two independent variables can be written as:

Y = b0 + b1X1 + b2X2 + U

b0, b1 and b2 are the parameters to be estimated. The estimation is carried out using the OLS method, which results in the following:
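One standard way to write the OLS estimators uses lower-case letters for deviations from the sample means (x1 = X1 – X̄1, x2 = X2 – X̄2, y = Y – Ȳ):

\hat{b}_1 = \frac{\left(\sum x_1 y\right)\left(\sum x_2^2\right) - \left(\sum x_2 y\right)\left(\sum x_1 x_2\right)}{\left(\sum x_1^2\right)\left(\sum x_2^2\right) - \left(\sum x_1 x_2\right)^2}

\hat{b}_2 = \frac{\left(\sum x_2 y\right)\left(\sum x_1^2\right) - \left(\sum x_1 y\right)\left(\sum x_1 x_2\right)}{\left(\sum x_1^2\right)\left(\sum x_2^2\right) - \left(\sum x_1 x_2\right)^2}

\hat{b}_0 = \bar{Y} - \hat{b}_1\bar{X}_1 - \hat{b}_2\bar{X}_2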



SLIDE 15-16

Multiple Regression Model


• In the case of the multiple regression model, we have the concept of the multiple correlation squared, given by R²Y.X1X2, which indicates the explanatory power of the model. The formulae for R² are given as under:
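In the deviation notation introduced above, two equivalent forms are

R^2_{Y.X_1X_2} = \frac{\hat{b}_1 \sum x_1 y + \hat{b}_2 \sum x_2 y}{\sum y^2} = 1 - \frac{\sum \hat{U}_i^2}{\sum y^2}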

The test of significance of the individual parameters is conducted using the t statistic. To be able to use the t statistic, we need the estimates of the variance of the estimated coefficients of the regression equation.



SLIDE 15-17

Multiple Regression Model


• The estimates of the variance of the estimated coefficients are presented below:
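In the same deviation notation, with σ̂² = ΣÛi²/(n – k) denoting the estimated error variance, the standard expressions for the two-regressor model are

\mathrm{var}(\hat{b}_1) = \frac{\hat{\sigma}^2 \sum x_2^2}{\left(\sum x_1^2\right)\left(\sum x_2^2\right) - \left(\sum x_1 x_2\right)^2}

\mathrm{var}(\hat{b}_2) = \frac{\hat{\sigma}^2 \sum x_1^2}{\left(\sum x_1^2\right)\left(\sum x_2^2\right) - \left(\sum x_1 x_2\right)^2}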


SLIDE 15-18

Multiple Regression Model


Let us assume that we want to test the significance of the slope coefficient of the variable X1. We can write the null and alternative hypotheses as:
H0: b1 = 0
H1: b1 ≠ 0
The test statistic may be written as:
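Using the variance estimate from the previous slide,

t = \frac{\hat{b}_1}{\sqrt{\mathrm{var}(\hat{b}_1)}}

which follows a t distribution with n – k degrees of freedom under H0.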
The value of the test statistic t is computed and compared with the table value of t for a given level of significance. If the computed |t| is greater than the table |t|, we reject H0 in favour of the alternative hypothesis H1.


SLIDE 15-19

Testing the Significance of R²


• The test for the significance of R² is carried out using the F statistic, which was already explained in the case of the two-variable linear regression model. The hypothesis to be tested is:

H0: b1 = b2 = 0 ⇒ R² = 0
H1: b1 and b2 are not both zero ⇒ R² > 0

• If R² is equal to 0, none of the independent variables explains any variation in Y, which means all the slope coefficients are equal to zero.

• The test for the significance of R² is shown through the analysis of variance (ANOVA) table.



SLIDE 15-20

Testing the Significance of R²
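A generic ANOVA layout for the regression model, with ESS, RSS and TSS denoting the explained, residual and total sums of squares, is:

Source of variation      Sum of squares        df       Mean square        F
Regression (explained)   ESS = Σ(Ŷi – Ȳ)²      k – 1    ESS / (k – 1)      [ESS / (k – 1)] / [RSS / (n – k)]
Residual (error)         RSS = ΣÛi²            n – k    RSS / (n – k)
Total                    TSS = Σ(Yi – Ȳ)²      n – 1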




SLIDE 15-21

Dummy Variables in Regression Analysis

• In regression analysis, the dependent variable is generally metric in nature and is most often influenced by other metric variables.
• However, there could be situations where the dependent variable is influenced by qualitative variables like gender, marital status, profession, geographical region, colour, or religion.
• The question arises as to how to quantify qualitative variables.
• In situations like this, dummy variables come to our rescue. They are used to quantify the qualitative variables.
• The number of dummy variables required in the regression model is equal to the number of categories of the data less one (for example, a qualitative variable with four geographical regions would require three dummy variables).
• Dummy variables usually assume two values, 0 and 1.


SLIDE 15-22

Example of a Dummy Variable Regression

Suppose the starting salary of a college lecturer is influenced not only by years of teaching experience but also by gender. Therefore, the model could be specified as:

Y = f(X, D)

Where,
Y = starting salary of a college lecturer in thousands of ₹ per month
X = number of years of work experience
D = a dummy variable which takes the values
    D = 1 (if the respondent is male)
    D = 0 (if the respondent is female)

The model could be written as:

Y = α + βX + γD + U


SLIDE 15-23

Example of a Dummy Variable Regression

This can be estimated using ordinary least squares (OLS) techniques. Suppose the estimated regression equation, evaluated separately for males (D = 1) and females (D = 0), looks like:
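Substituting D = 1 and D = 0 into the estimated equation Ŷ = α̂ + β̂X + γ̂D gives the two equations referred to below:

For males (D = 1):    Ŷ = (α̂ + γ̂) + β̂X
For females (D = 0):  Ŷ = α̂ + β̂X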
The above two equations differ by the amount γ̂. γ̂ can be positive or negative. If γ̂ is positive, it implies that the average salary of a male lecturer is higher than that of a female lecturer by the amount γ̂, keeping the number of years of experience constant.
END OF CHAPTER
