
Data Analysis for Managers

Unit V:
Correlation and Regression
Rajashree Kamath, Ph.D. (Statistics),
Assistant Professor (Economics and Quantitative Techniques),
Coordinator – AcadX, CSR Karma Club, Cell for Sustainable Development,
Documentation
(Kengeri Campus),
School of Business and of Management, CHRIST (Deemed to be University), Kengeri,
Bangalore 560074.
Ph.: +918040129879 (O), Cell: +919448067196.
MISSION: CHRIST is a nurturing ground for an individual’s holistic development to make effective contribution to the society in a dynamic environment
VISION: Excellence and Service
CORE VALUES: Faith in God | Moral Uprightness | Love of Fellow Beings | Social Responsibility | Pursuit of Excellence
CHRIST
Deemed to be University

Correlation

● Correlation is a measure of the degree of linear relationship between
two variables.
● It takes a value between -1 and +1.
● A value of -1 indicates a perfect linear negative relationship.
● A value of +1 indicates a perfect linear positive relationship.

● In Excel, arrange all the columns whose (pairwise) correlations are
needed. Excel gives output in the form of a correlation matrix where
the diagonal elements are all 1 (correlation between a variable and
itself is 1) and the off-diagonal elements give the correlations between
the corresponding pair of variables.
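
The same calculation can be sketched outside Excel. The Python below is a minimal illustration, not the Excel procedure itself: the function names pearson_r and correlation_matrix and the small data set are hypothetical, made up only to show the -1, +1 and diagonal-equals-1 properties listed above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Pairwise correlations for a dict of {column name: values}."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical data, invented for illustration only.
data = {
    "x": [1, 2, 3, 4, 5],
    "up": [2, 4, 6, 8, 10],     # rises exactly in step with x
    "down": [10, 8, 6, 4, 2],   # falls exactly in step with x
    "mixed": [2, 1, 4, 3, 5],   # rises with x, but not perfectly
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(row[other], 3) for other in data])
```

Each printed row corresponds to one row of the correlation matrix; unlike the Excel output, both the upper and lower triangles are filled in, since the matrix is symmetric.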


Example 1
● AndersonData\Ch 15 Multiple Regression\CarValues.xlsx

                        Price ($)   Cost/Mile   Road-Test Score   Predicted Reliability   Value Score
Price ($)                1
Cost/Mile                0.944       1
Road-Test Score          0.458       0.410       1
Predicted Reliability    0.126       0.068       0.220             1
Value Score             -0.494      -0.590       0.194             0.644                   1

Simple Linear Regression

● We begin by drawing a scatterplot of the pair of variables whose
linear relationship we want to explore.
● The variable on the X-axis is called the independent variable.
● The variable on the Y-axis is called the dependent variable.
● The trend line on the scatterplot can be estimated by a line of the
form:
○ Y = mX + c
● The purpose of regression is to estimate the values of m (the slope)
and c (the intercept) using the observed values of X and Y.
● The purpose is also to test whether the equation, called a model, is
a "good fit", and to test whether m and c are significant (non-zero).
● A model is said to be a good fit when X and Y have a statistically
significant linear relationship.
● The method of estimating m and c is called the least squares method:
it chooses m and c to minimize the sum of squared vertical distances
between the observed Y values and the line.
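
As a rough illustration of the least squares method, the sketch below estimates m and c for Y = mX + c using the standard closed-form formulas. The function name least_squares and the five data points are hypothetical; they are not taken from any data set used in this unit.

```python
def least_squares(x, y):
    """Return (m, c) for the least squares line Y = m*X + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of X and Y divided by the variation in X.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    # Intercept: forces the line through the point (mean of X, mean of Y).
    c = mean_y - m * mean_x
    return m, c

# Hypothetical observations, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

m, c = least_squares(x, y)
print(f"Y = {m:.2f} X + {c:.2f}")

# Residuals (observed minus fitted) of a least squares line sum to zero.
residuals = [b - (m * a + c) for a, b in zip(x, y)]
print(round(sum(residuals), 10))
```

Excel's regression tool reports the same two estimates in its "Coefficients" column, with c labelled "Intercept" and m labelled by the name of the X variable.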

Assumptions of SLR

Example 2

● Elliptical trainers are becoming one of the more popular exercise
machines. Their smooth and steady low-impact motion makes them a
preferred choice for individuals with knee and ankle problems. Price
and quality are two important factors in any purchase decision. Are
higher prices generally associated with higher quality elliptical
trainers? The following data show the price and rating for eight
elliptical trainers tested (Consumer Reports, February 2008).
AndersonData\Ch 14 Simple Regression\Ellipticals.xlsx
● a. Develop a scatter diagram with price as the independent variable.
● b. An exercise equipment store that sells primarily higher priced
equipment has a sign over the display area that says “Quality: You Get
What You Pay For.” Based upon your analysis of the data for elliptical
trainers, do you think this sign fairly reflects the price-quality
relationship for elliptical trainers?
● c. Use the least squares method to develop the estimated regression
equation and test whether the model is a good fit. Also test if intercept
= 0.
● d. Use the estimated regression equation to predict the rating for an

Solution 2

a. <Scatterplot with Price as X and Rating as Y.>


b. <Compute Correlation Coefficient and verify that it is close to +1.>
c. SUMMARY OUTPUT

Regression Statistics
Multiple R          0.877498
R Square            0.770003
Adjusted R Square   0.731671
Standard Error      5.383267
Observations        8

ANOVA
             df   SS         MS         F          Significance F
Regression   1    582.1226   582.1226   20.08735   0.004184
Residual     6    173.8774   28.97956
Total        7    756

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    58.15849       4.014418         14.4874    6.78E-06   48.33557    67.98142
Price        0.008449       0.001885         4.481891   0.004184   0.003836    0.013061


Solution 2 (Contd.)
● The regression equation is:
Rating = 58.15849 + 0.008449 * Price
The hypotheses in question are
H0: Rating does not have a linear relationship with price (or, the model is
not a good fit)
Ha: Rating has a linear relationship with price (or, the model is a good fit)

Since the p-value, 0.004, < 0.05, we reject H0. That means, rating has a
significant linear relationship with price (or, in other words, the model is
a good fit), at the 5% level of significance.
The second set of hypotheses, to test whether the intercept is zero, are:
H0: The intercept is zero (or the regression line passes through the
origin).
Ha: The intercept is non-zero.
Since the p-value corresponding to "Intercept", 6.78 x 10^-6, < 0.05, we
reject H0. That means the regression line does not pass through the
origin.
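
The test statistics behind these two decisions can be cross-checked with simple arithmetic on the summary output. The sketch below is only a check, assuming the figures are copied correctly from the Excel output for Solution 2; the variable names are hypothetical, and the p-values are taken as reported rather than recomputed.

```python
# Figures copied from the Excel summary output for Solution 2.
ms_regression = 582.1226
ms_residual = 28.97956
slope, slope_se = 0.008449, 0.001885
intercept, intercept_se = 58.15849, 4.014418
p_value_model = 0.004184       # "Significance F" (same as the Price P-value)
p_value_intercept = 6.78e-06   # P-value in the "Intercept" row
alpha = 0.05                   # 5% level of significance

# F = mean square for regression / mean square for residual.
f_stat = ms_regression / ms_residual
# t = coefficient / its standard error.
t_slope = slope / slope_se
t_intercept = intercept / intercept_se

print(f"F = {f_stat:.3f}")
print(f"t (Price) = {t_slope:.3f}, t (Intercept) = {t_intercept:.3f}")
# With one independent variable, t squared for the slope equals F
# (up to rounding in the reported figures).
print(f"t (Price) squared = {t_slope ** 2:.3f}")

# Decision rule: reject H0 when the p-value is below alpha.
reject_no_linear_relationship = p_value_model < alpha
reject_zero_intercept = p_value_intercept < alpha
print(reject_no_linear_relationship, reject_zero_intercept)
```

Small differences from the reported "t Stat" values come from the coefficients and standard errors being rounded in the output.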

Coefficient of Determination

● In the regression output, "Multiple R" is the correlation between the
independent and dependent variable (in simple linear regression, its
absolute value).
● "R Square" is the square of this correlation coefficient, and is called
the Coefficient of Determination.
● The coefficient of determination gives the proportion of variation in Y
that is explained by X.
● For example, in the previous example, 77% of the variation in rating
is explained by price ("R Square 0.770003").
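
The "proportion of variation explained" reading can be verified from the ANOVA table of the previous example. This is a small arithmetic sketch, assuming the sums of squares are copied correctly from that output; the variable names are hypothetical, and the adjusted R Square line uses the standard textbook formula, which these slides do not state.

```python
import math

# Figures copied from the ANOVA table of the previous example.
ss_regression = 582.1226   # variation in rating explained by price
ss_total = 756.0           # total variation in rating
n = 8                      # number of observations
k = 1                      # number of independent variables

# Coefficient of determination: explained variation / total variation.
r_square = ss_regression / ss_total
# "Multiple R" is the square root of R Square.
multiple_r = math.sqrt(r_square)
# Adjusted R Square (standard formula, assumed here).
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(f"R Square          = {r_square:.6f}")
print(f"Multiple R        = {multiple_r:.6f}")
print(f"Adjusted R Square = {adjusted_r_square:.6f}")
```

The three printed values should agree with the "Regression Statistics" block of the Excel output up to rounding.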
