MBA – LSCM Semester II

07/14/2020
DNA/BRM/Session 26 (a). Correlation & Regression
BRM
MODULE 5 SESSION 26 (a)

CORRELATION AND REGRESSION ANALYSIS

Dr. Neeraj Anand


Prof. – LSCM & HOD - DoTM
SoB, UPES
BUSINESS RESEARCH METHODS
QUIZ 2

1. Richter's scale for earthquake is an example of ……………….scale.

2. Preference for quality, price and availability, in order, is a type of ………….scale.
3. Container’s code is an example of……………………………………..
4. Cronbach alpha is a measure of …………………………………………
5. For ordinal data, measure of central tendency used is………………
6. Simplest measure of dispersion is………………………………………………
7. COV is calculated as………………………………………
8. BCG matrix is an example of ……………………
Quantitative Techniques
PURPOSE: TO PROVIDE A RATIONAL BASIS FOR MAKING
DECISIONS IN THE ABSENCE OF COMPLETE
INFORMATION.
IT DEALS WITH THREE CLASSICAL ASPECTS OF SCIENCE:
- DESCRIBING THE BEHAVIOUR OF SYSTEMS
- ANALYZING THE BEHAVIOUR BY CONSTRUCTING
APPROPRIATE MODELS.
- APPLYING THESE MODELS TO PREDICT FUTURE
BEHAVIOUR OR DESCRIBING RELATIONSHIP AMONG VARIOUS
VARIABLES.
O.R. FUNCTION IS A STAFF FUNCTION.
CORRELATION
Definitions
• Correlation
A method used to determine whether a relationship exists between variables.
• Correlation Coefficient
A statistic or parameter that measures the strength and direction of a relationship between two variables. Its value ranges between -1 and +1.
• Dependent Variable
A variable in correlation or regression that cannot be controlled; that is, it depends on the independent variable.
• Independent Variable
A variable in correlation or regression that can be controlled; that is, it is independent of the other variable.
Coefficient of Determination
The percent of the variation in dependent variable that can be
explained by the variation in the independent variable in the
regression model. It is expressed as r².
Scatter Plot
A plot of the data values on a coordinate system. The
independent variable is graphed along the x-axis and the
dependent variable along the y-axis.
Pearson's Correlation Coefficient
• This is a measure of linear correlation. The population parameter is denoted by the Greek letter 'rho' (ρ) and the sample statistic is denoted by the Roman letter 'r'.
The Pearson Product Moment Correlation
• Measures the extent to which one variable covaries with another. The correlation standardizes the two variables when it computes the covariance. Hence, the correlation is a standardized covariance.
Types of Correlation
• POSITIVE
• NEGATIVE
• LINEAR
• PERFECT (Positive/Negative)
• NON-LINEAR
• SIMPLE
• PARTIAL
• NONSENSE (Spurious)
• BIVARIATE
• MULTIPLE
Types of Correlation
• Bivariate correlations are correlations between two variables. Some bivariate correlations are non-directional and are called symmetric correlations. Other bivariate correlations are directional and are called asymmetric correlations.
• Multiple correlations are those between one variable and a set of variables.
There are multiple correlations that hold part of the set of variables constant:
Assumptions
1. Interval level data
2. Linearity (plot the relationship between the variables with a scattergram or
fit the functional curve formed by the relationship to be sure of linearity)
3. Homoskedasticity or equal variances
4. Independence of observations
5. Representative sampling
Scatter diagram
• Rectangular coordinates
• Two quantitative variables
• One variable is called independent (X) and the second is called dependent (Y)
• Points are not joined
• No frequency table
Example

Wt. (kg)      67   69   85   83   74   81   97   92  114   85
SBP (mmHg)   120  125  140  160  130  180  150  140  200  130

[Figure: Scatter diagram of weight and systolic blood pressure. x-axis: Wt (kg), 60 to 120; y-axis: SBP (mmHg), 80 to 220.]
Scatter plots

The pattern of data is indicative of the type of relationship between your two variables:
• positive relationship
• negative relationship
• no relationship
Positive relationship
[Figure: scatter plot of Height in CM (y-axis) against Age in Weeks (x-axis)]
Negative relationship
[Figure: scatter plot of Reliability (y-axis) against Age of Car (x-axis)]
No relation
Correlation Coefficient

A statistic showing the degree of relation between two variables.

Simple Correlation coefficient (r)
• It is also called Pearson's correlation or the product moment correlation coefficient.
• It measures the nature and strength of the association between two quantitative variables.

The sign of r denotes the nature of the association, while the magnitude of r denotes its strength.
• If the sign is positive, the relation is direct: an increase in one variable is associated with an increase in the other, and a decrease in one variable is associated with a decrease in the other.
• If the sign is negative, the relation is inverse (indirect): an increase in one variable is associated with a decrease in the other.
• The value of r ranges between -1 and +1.
• The magnitude of r denotes the strength of the association, as illustrated by the following scale.

r:          -1      -0.75         -0.25    0    0.25          0.75      1
strength:     strong | intermediate | weak | weak | intermediate | strong
r = -1: perfect indirect correlation; r = 0: no relation; r = +1: perfect direct correlation

If r = 0, there is no association or correlation between the two variables.

If 0 < |r| < 0.25: weak correlation.

If 0.25 ≤ |r| < 0.75: intermediate correlation.

If 0.75 ≤ |r| < 1: strong correlation.

If |r| = 1: perfect correlation.
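
The cut-offs above can be sketched as a small lookup. This is only an illustration: the function name `describe_correlation` is hypothetical, and it assumes the bands apply to the magnitude of r, so that negative values are labelled the same way as positive ones.

```python
def describe_correlation(r: float) -> str:
    """Label r using the slide's cut-offs (0.25 and 0.75), applied to |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "no correlation"
    direction = "direct" if r > 0 else "indirect"
    if size == 1:
        return f"perfect {direction} correlation"
    if size < 0.25:
        strength = "weak"
    elif size < 0.75:
        strength = "intermediate"
    else:
        strength = "strong"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.76))   # strong direct correlation
print(describe_correlation(-0.94))  # strong indirect correlation
```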
How to compute the simple correlation coefficient (r)

$$ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}} $$
Example:
A sample of 6 children was selected, and data about their age in years and weight in kilograms was recorded as shown in the following table. It is required to find the correlation between age and weight.

Serial no.   Age (years)   Weight (kg)
1            7             12
2            6             8
3            8             12
4            5             10
5            6             11
6            9             13
These two variables are of the quantitative type. One variable (age) is called the independent variable and is denoted X; the other (weight) is called the dependent variable and is denoted Y. To find the relation between age and weight, compute the simple correlation coefficient using the following formula:

$$ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}} $$
Serial no.   Age (years) (x)   Weight (kg) (y)   xy     x²     y²
1            7                 12                84     49     144
2            6                 8                 48     36     64
3            8                 12                96     64     144
4            5                 10                50     25     100
5            6                 11                66     36     121
6            9                 13                117    81     169
Total        ∑x = 41           ∑y = 66           ∑xy = 461   ∑x² = 291   ∑y² = 742

$$ r = \frac{461 - \dfrac{41 \times 66}{6}}{\sqrt{\left(291 - \dfrac{(41)^2}{6}\right)\left(742 - \dfrac{(66)^2}{6}\right)}} = \frac{10}{\sqrt{(10.83)(16)}} $$

r ≈ 0.76
strong direct correlation
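
As a check on the arithmetic, the computational formula can be sketched in a few lines of Python. The function name `pearson_r` is an illustrative choice, not something from the slides; the data are the six age and weight pairs from the table.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r via the computational (sums) formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


age = [7, 6, 8, 5, 6, 9]           # x, years
weight = [12, 8, 12, 10, 11, 13]   # y, kg

r = pearson_r(age, weight)
print(f"r = {r:.4f}")  # r = 0.7596
```

The intermediate sums match the table totals (∑x = 41, ∑y = 66, ∑xy = 461, ∑x² = 291, ∑y² = 742).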
EXAMPLE: Relationship between Anxiety and Test Scores

Anxiety (X)   Test score (Y)   X²          Y²          XY
10            2                100         4           20
8             3                64          9           24
2             9                4           81          18
1             7                1           49          7
5             6                25          36          30
6             5                36          25          30
∑X = 32       ∑Y = 32          ∑X² = 230   ∑Y² = 204   ∑XY = 129

Calculating Correlation Coefficient

$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \frac{(6)(129) - (32)(32)}{\sqrt{\left[6(230) - 32^2\right]\left[6(204) - 32^2\right]}} = \frac{774 - 1024}{\sqrt{(356)(200)}} = -0.94 $$

r = -0.94

Indirect strong correlation


Pearson Product Moment Correlation

consists of the covariance divided by the product of the standard deviations of the two variables.

a = (∑Y)/N    b = ∑xy/∑x²

$$ r_{x,y} = \frac{\mathrm{Cov}(x, y)}{s_x s_y} $$
Test scores and work experience of 5 executives are given below. Determine the coefficient of correlation.
X1 = Test Score, X2 = Experience (years)

X1   X2
50   2
80   8
20   6
90   5
60   4

Using EXCEL:

r = 0.204124
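
The same figure can be reproduced without a spreadsheet using the covariance form of r. The sketch below is an illustration only: the function name `covariance_form_r` is hypothetical, and it assumes sample (n − 1) denominators for both the covariance and the standard deviations. The choice of n or n − 1 cancels in the ratio, so r is unaffected.

```python
from math import sqrt


def covariance_form_r(x, y):
    """r = Cov(x, y) / (s_x * s_y), using sample (n - 1) denominators."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (s_x * s_y)


test_score = [50, 80, 20, 90, 60]  # X1
experience = [2, 8, 6, 5, 4]       # X2, years

r_test_exp = covariance_form_r(test_score, experience)
print(round(r_test_exp, 6))  # 0.204124
```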
Properties of “r”
• r is always between -1 and 1 inclusive. -1 means perfect negative linear correlation and +1 means perfect positive linear correlation.
• r only measures the strength of a linear relationship. There are other kinds of relationships besides linear.
• r has the same sign as the slope of the regression (best fit) line.
• r does not change if the independent (x) and dependent (y) variables are interchanged.
• r does not change if the scale on either variable is changed. You may multiply or divide all the x-values or y-values by a positive constant, or add or subtract a constant, without changing the value of r.
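
The symmetry and scale properties can be checked numerically. The sketch below reuses the age and weight data from the earlier example; the helper `pearson_r` and the particular rescalings (years to months, kilograms to grams plus an arbitrary offset) are illustrative assumptions. It also shows one caveat: multiplying a variable by a negative constant reverses the sign of r.

```python
from math import isclose, sqrt


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

r_original = pearson_r(age, weight)
r_swapped = pearson_r(weight, age)                   # x and y interchanged
r_rescaled = pearson_r([12 * a for a in age],        # years -> months
                       [1000 * w + 5 for w in weight])  # kg -> g, plus an offset
r_negated = pearson_r([-a for a in age], weight)     # negative multiplier

print(isclose(r_original, r_swapped))    # True
print(isclose(r_original, r_rescaled))   # True
print(isclose(r_negated, -r_original))   # True
```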
Spearman’s Rank Correlation Coefficient
Nonparametric correlation between two ordinal variables.
Rank correlation coefficient:

$$ r_s = 1 - \frac{6 \sum D^2}{N^3 - N} $$

where D (difference of ranks) = R1 - R2
N = total number of observations
Procedure
1. Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample.
2. Rank the values of Y from 1 to n.
3. Compute the value of di for each pair of observations by subtracting the rank of Yi from the rank of Xi.
4. Square each di and compute ∑di², which is the sum of the squared values.
5. Apply the following formula:

$$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$

The value of rs denotes the magnitude and nature of the association, with the same interpretation as simple r.
Example
In a study of the relationship between level of education and income of 7 executives, the following data was obtained. Find the relationship between them and comment.

Sample   Level of education (X)   Income (Y)
A        Preparatory (middle)     25
B        Primary                  10
C        University               8
D        Secondary                10
E        Secondary                15
F        Illiterate               50
G        University               60
Answer:

Sample   X            Y    Rank (X)   Rank (Y)   di     di²
A        Middle       25   5          3          2      4
B        Primary      10   6          5.5        0.5    0.25
C        University   8    1.5        7          -5.5   30.25
D        Secondary    10   3.5        5.5        -2     4
E        Secondary    15   3.5        4          -0.5   0.25
F        Illiterate   50   7          2          5      25
G        University   60   1.5        1          0.5    0.25

∑di² = 64

$$ r_s = 1 - \frac{6 \times 64}{7(48)} = 1 - \frac{384}{336} \approx -0.14 $$

Comment:
There is an indirect weak correlation between level of education and income.
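
The ranking procedure, including the averaging of tied ranks, can be sketched as follows. This is a minimal illustration using the seven rows of the answer table. The numeric coding of education levels in `EDUCATION_LEVEL` and the helper names are assumptions made for the sketch; rank 1 goes to the highest value, as in the table.

```python
def average_ranks(values):
    """Rank from 1 (largest) to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rs(x, y):
    """Return (rs, sum of squared rank differences) using 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


# Assumed ordinal coding: a higher number means a higher level of education.
EDUCATION_LEVEL = {"illiterate": 0, "primary": 1, "middle": 2, "secondary": 3, "university": 4}

education = ["middle", "primary", "university", "secondary", "secondary", "illiterate", "university"]
income = [25, 10, 8, 10, 15, 50, 60]

rs, sum_d2 = spearman_rs([EDUCATION_LEVEL[e] for e in education], income)
print(sum_d2, round(rs, 3))  # 64.0 -0.143
```

The formula used here is the simple one from the procedure; with tied ranks it is an approximation, which is also what the worked answer uses.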
Causation

If there is a significant linear correlation between two variables, then one of five situations can be true.
• There is a direct cause and effect relationship
• There is a reverse cause and effect relationship
• The relationship may be caused by a third variable
• The relationship may be caused by complex interactions of several variables
• The relationship may be coincidental
Regression
Regression
• A method used to describe the relationship between two variables.
Regression Line
• The best fit line.
• When there is significant linear correlation, we can use a line to estimate the value of the dependent variable for certain values of the independent variable.
When the regression equation should be used:

• There is significant linear correlation. (That is, when we reject the null hypothesis that rho = 0 in a correlation hypothesis test.)

• The value of the independent variable being used in the estimation is close to the original values, not much beyond their range. (That is, we should not use a regression equation obtained using x's between 10 and 20 to estimate y when x is 350.)
Regression (contd..)
• The regression equation should not be used with different populations. (That is, if x is the height of a male, and y is the weight of a male, then you shouldn't use the regression equation to estimate the weight of a female.)
• The regression equation shouldn't be used to forecast values outside the time frame of the data. (That is, if the data is from the 1970s, it probably isn't a valid basis for estimates in the 2000s.)
Regression Equation
• The regression equation is:
y = a + bx + e, with fitted (estimated) value y' = a + bx
• 'b' is the slope of the regression line, 'a' is the y-intercept of the regression line, and 'e' is the error (residual) term. The regression line is sometimes called the "line of best fit" or the "best fit line".
• Since it "best fits" the data, it makes sense that the line passes through the point of means (x̄, ȳ).
Coefficient of Determination
The coefficient of determination is:
• the percent of the variation that can be explained by the regression equation
• the explained variation divided by the total variation
• the square of r
• Every sample has some variation in it. The total variation is made up of two parts, the part that can be explained by the regression equation and the part that can't be explained by the regression equation.
• The ratio of the explained variation to the total variation is a measure of how good the regression line is. If the regression line passed through every point on the scatter plot exactly, it would be able to explain all of the variation. The further the line is from the points, the less it is able to explain.
Total variation = Unexplained variation (UV) + Explained variation (EV)
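
The split of total variation into explained and unexplained parts can be illustrated numerically. The sketch below fits a least-squares line to the age and weight data from the earlier correlation example; the function names are hypothetical. It checks that explained variation divided by total variation equals r².

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y' = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    return mean_y - b * mean_x, b


def variation_split(x, y):
    """Return (total, explained, unexplained) variation about the mean of y."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    predicted = [a + b * xi for xi in x]
    total = sum((yi - mean_y) ** 2 for yi in y)
    explained = sum((pi - mean_y) ** 2 for pi in predicted)
    unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    return total, explained, unexplained


age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

total, explained, unexplained = variation_split(age, weight)
r_squared = explained / total
print(f"total={total:.3f} explained={explained:.3f} unexplained={unexplained:.3f}")
print(f"r squared = {r_squared:.3f}")  # about 0.577, the square of r = 0.7596
```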
Applications
• Finance:
  • Profit and sales revenue
  • Return on stock and return of BSE/NIFTY
  • Capital reserve and return on stock
• Marketing:
  • Sales of any product and advertising budget
  • Sales revenue and salary of executive
• Economics:
  • Inflation and GDP
  • Demand of any product and temperature
A Case on Multiple Regression
Himalayan Plastics Ltd. manufactures different types of plastic products. The company has decided to train its supervisors and conducted an aptitude test for their selection. A random sample of five supervisors with a minimum of two years' experience was selected. They were trained for two weeks as Quality Control Inspectors, and after the training their proficiency was measured by their output per shift. The output per shift (number of units) recorded for the five supervisors was 20, 60, 30, 50 and 70 respectively.
Contd.
The values of test scores (X1) and experience (X2) in terms of number of years are as under:
X1   X2
50   2
80   8
20   3
90   5
60   4

i) Determine the relationship between the performance of the Inspectors and their test scores as well as experience.

ii) Determine the correlation between test score and experience of Inspectors.

{Hint:

$$ Y_c = a + b_1 X_1 + b_2 X_2 $$

$$ \sum Y = Na + b_1 \sum X_1 + b_2 \sum X_2 $$

$$ \sum X_1 Y = a \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2 $$

$$ \sum X_2 Y = a \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2 $$
}
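
One way to work part (i) is to build the three normal equations from the hint and solve them for a, b1 and b2. The sketch below does this with a small Gauss-Jordan solver written for the purpose. The helper names (`solve_linear_system`, `dot`) are hypothetical, and the data are the outputs per shift and the X1, X2 values given in the case.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def dot(u, v):
    """Sum of element-wise products, e.g. dot(x1, y) is the sum of X1*Y."""
    return sum(a * b for a, b in zip(u, v))


y = [20, 60, 30, 50, 70]    # output per shift (units)
x1 = [50, 80, 20, 90, 60]   # test score
x2 = [2, 8, 3, 5, 4]        # experience (years)
n = len(y)

# Coefficient matrix and right-hand side of the three normal equations.
normal_matrix = [
    [n,       sum(x1),     sum(x2)],
    [sum(x1), dot(x1, x1), dot(x1, x2)],
    [sum(x2), dot(x1, x2), dot(x2, x2)],
]
normal_rhs = [sum(y), dot(x1, y), dot(x2, y)]

a, b1, b2 = solve_linear_system(normal_matrix, normal_rhs)
print(f"Yc = {a:.3f} + {b1:.3f} X1 + {b2:.3f} X2")
```

With only five observations and two predictors, the fitted coefficients are best read as an illustration of the method rather than as a reliable model.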
FORECAST USING EXCEL
X Y
40 2
45 1.8

50 2.3
52 2.5
60 3
65 3.4
68 3.1
72 4.2
80 4.8
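
For readers without Excel, a straight-line forecast from the table above can be sketched in Python. This assumes the forecast is an ordinary least-squares line of Y on X, which is what Excel's FORECAST function computes. The value `new_x = 75` is a hypothetical input chosen inside the observed range of X (40 to 80), in line with the earlier advice not to extrapolate far beyond the data.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y' = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    return mean_y - b * mean_x, b


x = [40, 45, 50, 52, 60, 65, 68, 72, 80]
y = [2, 1.8, 2.3, 2.5, 3, 3.4, 3.1, 4.2, 4.8]

a, b = fit_line(x, y)
new_x = 75  # hypothetical value to forecast for, within the observed range
forecast = a + b * new_x
print(f"y' = {a:.4f} + {b:.4f} x")
print(f"forecast at x = {new_x}: {forecast:.3f}")
```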
THANK YOU
