
Simple Regression & Correlation

Business Statistics
Regression and Correlation
• Regression analysis is the process of
constructing a mathematical model or
function that can be used to predict or
determine one variable by another variable.

• Correlation is a measure of the degree of
relatedness of two variables.
Simple Regression Analysis
• Simple linear regression -- the most elementary
regression model
– Dependent variable, the variable to be predicted,
usually called Y
– Single independent variable, the predictor or
explanatory variable, usually called X
– Relationship between X and Y is described by a
linear function
– Changes in Y are assumed to be caused by changes
in X
Examples
• A firm may be interested in estimating the
relationship between
– Advertising expenditure (X) and sales (Y)
– Extent of training (X) and job performance (Y)
– Riskiness of the stock (X) and return on a stock (Y)
– State of the economy (X) and company profits (Y)
Simple Linear Regression Model
The population simple linear regression model:
Y = β0 + β1X + ε
where,
Y is the response variable
X is the predictor variable
β0 and β1 are regression coefficients
ε is the error component

The problem is to find estimators for β0 and β1.


Model Assumptions
• The relationship between X and Y is a straight line
relationship.
• The values of the independent variable X are assumed
fixed (not random); the only randomness in the values
of Y comes from the error term ε.
• The errors ε are normally distributed with mean 0 and
a constant variance σ². The errors are uncorrelated
(not related) with one another in successive
observations. In symbols:
ε ~ N(0, σ²)
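
The assumptions above can be illustrated with a small simulation. This is only a sketch: the function name `simulate_y` and the parameter values (β0 = 2.0, β1 = 0.5, σ = 1.0) are hypothetical and chosen purely for illustration. The X values are held fixed, and the only randomness in Y comes from the normally distributed error term.

```python
import random


def simulate_y(x_values, beta0, beta1, sigma, seed=0):
    """Draw one Y per fixed X from Y = beta0 + beta1*X + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]


# Hypothetical fixed X values and parameters (not taken from the slides).
x_fixed = [10, 20, 30, 40, 50]
y_sim = simulate_y(x_fixed, beta0=2.0, beta1=0.5, sigma=1.0)

for x, y in zip(x_fixed, y_sim):
    print(x, round(y, 2))
```

Setting `sigma=0.0` removes the error term, so every simulated Y falls exactly on the line β0 + β1X.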
Example
• Recently, research efforts have focused on the problem
of predicting a manufacturer's market share by using
information on the quality of its product. Suppose that
the following data are available on market share (in
percent) and product quality, on a scale of 0 to 100,
determined by an objective evaluation procedure:

Product quality:   27 39 73 66 33 43 47 55 60 68 70 75 82
Market share (%):   2  3 10  9  4  6  5  8  7  9 10 13 12
Example (contd.)
• Here,
X = Product quality (independent variable)
Y = Manufacturer's market share (dependent variable)
because we are assuming that the manufacturer's market share (Y)
is affected by the quality of its product (X).
Scatter Plot
[Figure: scatter plot of market share (%) against product quality]

Scatter plots (also called scatter diagrams) are used to investigate the
possible relationship between two variables that both relate to the
same "event".
Estimators of the Coefficients
• Good estimates of the population coefficients β0 and
β1 can be obtained from sample data.
• Let b0 and b1 be the estimators of β0 and β1.
• Then our prediction equation is:
Ŷ = b0 + b1X
where Ŷ is the predicted value of Y for a given X.
• The best way of obtaining the estimators is known as
the least squares technique.
Simple Linear Regression Model
(continued)

Yi = β0 + β1Xi + εi

[Figure: the population regression line with intercept β0 and slope β1, showing for a given Xi the observed value of Y, the predicted value of Y, and the random error εi between them]
Interpretation of b0 and b1
• b0 is the estimated average value of Y when the value
of X is zero.

• b1 is the estimated change in the average value of Y
as a result of a one-unit change in X.
Least Squares Method
• In the least squares method we minimize the sum of
squared errors (SSE) over all n data points.
• The deviation ei = (Yi – Ŷi) is called the i-th error or
residual, for i = 1,…,n.
• That is, we minimize
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$
Least Squares Method
• Finally we obtain
$$b_1 = \frac{SS_{XY}}{SS_X}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
where
$$SS_X = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$SS_{XY} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
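
The slides state the result without the intermediate step. A brief sketch of the standard derivation, using the same symbols as above, is to set the partial derivatives of the SSE with respect to b0 and b1 to zero and solve the resulting pair of "normal equations":

```latex
\begin{aligned}
\frac{\partial}{\partial b_0}\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i)^2
  &= -2\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i) = 0
  &&\Rightarrow\quad \sum Y_i = n b_0 + b_1 \sum X_i \\
\frac{\partial}{\partial b_1}\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i)^2
  &= -2\sum_{i=1}^{n}X_i (Y_i - b_0 - b_1 X_i) = 0
  &&\Rightarrow\quad \sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2 \\[4pt]
\text{First equation:}\quad b_0 &= \bar{y} - b_1 \bar{x} \\
\text{Substituting into the second:}\quad
\sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}
  &= b_1\left(\sum X_i^2 - \frac{(\sum X_i)^2}{n}\right)
  &&\Rightarrow\quad b_1 = \frac{SS_{XY}}{SS_X}
\end{aligned}
```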
Problem
• The following sample data shows the demand for a product in
thousands of units and its price (Rs.) charged in six different
market areas:
Price: 10 18 14 11 16 13
Demand: 125 58 90 100 72 85
Estimate the simple linear regression relationship between
price and demand.
Estimate the demand for the product in a market where it is
priced at Rs. 15.
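
One possible way to work this problem is sketched below. It applies the computational formulas for SS_X and SS_XY to the price and demand data; the helper name `least_squares` is an illustrative choice, not something defined in the slides.

```python
price = [10, 18, 14, 11, 16, 13]       # X, in Rs.
demand = [125, 58, 90, 100, 72, 85]    # Y, in thousands of units


def least_squares(x, y):
    """Return (b0, b1) for the fitted line Y-hat = b0 + b1*X."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Computational forms: SS_X = sum(x^2) - (sum x)^2 / n, and similarly SS_XY.
    ss_x = sum(v * v for v in x) - sum_x ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    b1 = ss_xy / ss_x
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(price, demand)
demand_at_15 = b0 + b1 * 15  # point estimate of demand at a price of Rs. 15

print(f"Y-hat = {b0:.4f} + ({b1:.4f}) X")
print(f"Estimated demand at Rs. 15: {demand_at_15:.4f} thousand units")
```

With these data the fitted line is approximately Ŷ = 188.52 – 7.33X, which gives an estimated demand of roughly 78.56 thousand units at a price of Rs. 15.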
Error Variance
• The error variance σ² is a measure of the spread of
the population elements about the regression line.
• Generally, the smaller the error variance, the more closely
the population elements follow the regression line.
• An unbiased estimator of σ², denoted by s², is the mean
squared error (MSE) of the regression.
• The estimate s = √(MSE) of the standard deviation of
the regression errors is called the standard error of
estimate.
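Computation of MSE
• Degrees of freedom (error) = n-2
$$SSE = \sum (Y - \hat{Y})^2 = SS_Y - \frac{(SS_{XY})^2}{SS_X} = SS_Y - b_1 SS_{XY}$$
$$MSE = \frac{SSE}{n-2}$$
where $SS_Y = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$.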
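
As a rough check on these formulas, the sketch below computes the SSE two ways for the price and demand data from the earlier problem (summing the squared residuals directly, and using the shortcut SS_Y – b1·SS_XY) and then derives the MSE and the standard error of estimate. Variable names are illustrative.

```python
price = [10, 18, 14, 11, 16, 13]
demand = [125, 58, 90, 100, 72, 85]
n = len(price)

mean_x = sum(price) / n
mean_y = sum(demand) / n
ss_x = sum((x - mean_x) ** 2 for x in price)
ss_y = sum((y - mean_y) ** 2 for y in demand)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(price, demand))

b1 = ss_xy / ss_x
b0 = mean_y - b1 * mean_x

# Route 1: sum the squared residuals directly.
sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(price, demand))
# Route 2: the shortcut SSE = SS_Y - b1 * SS_XY.
sse_shortcut = ss_y - b1 * ss_xy

mse = sse_shortcut / (n - 2)   # n - 2 error degrees of freedom
s = mse ** 0.5                 # standard error of estimate

print(f"SSE = {sse_shortcut:.4f}, MSE = {mse:.4f}, s = {s:.4f}")
```

The two routes agree up to floating-point rounding; for these data the SSE is about 245.04, the MSE about 61.26, and s about 7.83.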
Standard Error of Regression Coefficients

The standard error of b0 (intercept):
$$s(b_0) = \frac{s \sqrt{\sum x^2}}{\sqrt{n \, SS_X}}$$
The standard error of b1 (slope):
$$s(b_1) = \frac{s}{\sqrt{SS_X}}$$
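
A minimal sketch of these two standard errors, again using the price and demand data from the earlier problem as the illustration. The names `se_b0` and `se_b1` are illustrative.

```python
import math

price = [10, 18, 14, 11, 16, 13]
demand = [125, 58, 90, 100, 72, 85]
n = len(price)

mean_x = sum(price) / n
mean_y = sum(demand) / n
ss_x = sum((x - mean_x) ** 2 for x in price)
ss_y = sum((y - mean_y) ** 2 for y in demand)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(price, demand))

b1 = ss_xy / ss_x
s = math.sqrt((ss_y - b1 * ss_xy) / (n - 2))  # standard error of estimate

sum_x_sq = sum(x * x for x in price)          # sum of x^2 (not centered)
se_b0 = s * math.sqrt(sum_x_sq) / math.sqrt(n * ss_x)
se_b1 = s / math.sqrt(ss_x)

print(f"s(b0) = {se_b0:.4f}, s(b1) = {se_b1:.4f}")
```

For these data s(b0) comes out near 16.21 and s(b1) near 1.16.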
Confidence Intervals for the Regression
Coefficients

A (1-α) 100% confidence interval for β0:
$$b_0 \pm t_{\alpha/2,\,(n-2)} \, s(b_0)$$

A (1-α) 100% confidence interval for β1:
$$b_1 \pm t_{\alpha/2,\,(n-2)} \, s(b_1)$$
Regression Model for Prediction
• Point Estimation
– A single-valued estimate of Y for a given value of X
obtained by inserting the value of X in the
estimated regression equation.
• Confidence Interval
– For an average value of Y given a value of X
• Prediction Interval
– For a value of Y given a value of X
Prediction Intervals
A (1-α) 100% prediction interval for Y:
$$\hat{y} \pm t_{\alpha/2} \, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$
where tα/2 is based on (n-2) degrees of freedom.
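
The interval formulas for the coefficients and for a predicted Y can be sketched together for the price and demand data. The choice of a 95% level and of x = 15 is an assumption made for illustration. The critical value t(0.025, 4) = 2.776 is taken from a standard t-table and hard-coded rather than computed.

```python
import math

price = [10, 18, 14, 11, 16, 13]
demand = [125, 58, 90, 100, 72, 85]
n = len(price)

mean_x = sum(price) / n
mean_y = sum(demand) / n
ss_x = sum((x - mean_x) ** 2 for x in price)
ss_y = sum((y - mean_y) ** 2 for y in demand)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(price, demand))

b1 = ss_xy / ss_x
b0 = mean_y - b1 * mean_x
s = math.sqrt((ss_y - b1 * ss_xy) / (n - 2))
se_b0 = s * math.sqrt(sum(x * x for x in price)) / math.sqrt(n * ss_x)
se_b1 = s / math.sqrt(ss_x)

# Assumption: 95% level, so alpha/2 = 0.025 with n - 2 = 4 degrees of freedom.
T_CRIT = 2.776  # from a standard t-table

ci_b0 = (b0 - T_CRIT * se_b0, b0 + T_CRIT * se_b0)
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

# Prediction interval for a single Y at x_new = 15 (illustrative choice).
x_new = 15
y_hat = b0 + b1 * x_new
half_width = T_CRIT * s * math.sqrt(1 + 1 / n + (x_new - mean_x) ** 2 / ss_x)
pi_low, pi_high = y_hat - half_width, y_hat + half_width

print("95% CI for beta0:", ci_b0)
print("95% CI for beta1:", ci_b1)
print("95% PI for Y at X = 15:", (pi_low, pi_high))
```

Because the sample has only six points, the prediction interval is wide: roughly 54.7 to 102.4 thousand units around the point estimate of about 78.56.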
Hypothesis Tests for the Slope
of the Regression Model
• Suppose our null and alternative hypotheses are (for airline
cost data):
H0 : β1 = 0
H1 : β1 ≠ 0
• We have to calculate the t-statistic,
$$t = \frac{b_1 - \beta_{1(H_0)}}{S_b}$$
where, b1 = slope of the fitted regression
β1(H0) = actual slope hypothesized for the population
Sb = Standard error of the regression coefficient
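
As an illustration of the test, the sketch below applies it to the price and demand data from the earlier problem, not the airline cost data mentioned above, which are not reproduced here. It assumes a two-sided test at the 5% level with n – 2 = 4 degrees of freedom, using the table value t(0.025, 4) = 2.776.

```python
import math

price = [10, 18, 14, 11, 16, 13]
demand = [125, 58, 90, 100, 72, 85]
n = len(price)

mean_x = sum(price) / n
mean_y = sum(demand) / n
ss_x = sum((x - mean_x) ** 2 for x in price)
ss_y = sum((y - mean_y) ** 2 for y in demand)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(price, demand))

b1 = ss_xy / ss_x
s = math.sqrt((ss_y - b1 * ss_xy) / (n - 2))
se_b1 = s / math.sqrt(ss_x)        # S_b in the slide's notation

beta1_h0 = 0.0                     # slope hypothesized under H0
t_stat = (b1 - beta1_h0) / se_b1

T_CRIT = 2.776                     # t(0.025, 4) from a standard t-table
reject_h0 = abs(t_stat) > T_CRIT   # two-sided decision rule

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

Here t is about –6.31, which lies beyond ±2.776, so under these assumptions H0: β1 = 0 would be rejected.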
Correlation Analysis
• Correlation analysis is the statistical tool we
can use to describe the degree to which one
variable is linearly related to another.
• The population correlation, denoted by ρ, can
take on any value from -1 to 1.
• Correlation can also be measured by
calculating the coefficient of determination.
Correlation

ρ = -1 indicates a perfect negative linear relationship
-1 < ρ < 0 indicates a negative linear relationship
ρ = 0 indicates no linear relationship
0 < ρ < 1 indicates a positive linear relationship
ρ = 1 indicates a perfect positive linear relationship

The absolute value of ρ indicates the strength or exactness of the
relationship.
Illustrations of Correlation

[Figure: six scatter plots of Y against X illustrating ρ = -1, ρ = 0, ρ = 1 (top row) and ρ = -.8, ρ = 0, ρ = .8 (bottom row)]
Covariance and Correlation
The covariance of two random variables X and Y:
$$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
where $\mu_X$ and $\mu_Y$ are the population means of X and Y respectively.
The population correlation coefficient:
$$\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$$
The sample correlation coefficient*:
$$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}}$$
*Note: If ρ < 0, β1 < 0. If ρ = 0, β1 = 0. If ρ > 0, β1 > 0.
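
To connect the covariance definition to the sum-of-squares formula for r, the sketch below computes the sample correlation both ways for the product quality and market share data from the earlier example. Using the n – 1 divisor for the sample covariance and standard deviations is an assumption here; the slide defines only the population versions.

```python
import math

quality = [27, 39, 73, 66, 33, 43, 47, 55, 60, 68, 70, 75, 82]  # X
share = [2, 3, 10, 9, 4, 6, 5, 8, 7, 9, 10, 13, 12]             # Y, in percent
n = len(quality)

mean_x = sum(quality) / n
mean_y = sum(share) / n
ss_x = sum((x - mean_x) ** 2 for x in quality)
ss_y = sum((y - mean_y) ** 2 for y in share)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(quality, share))

# Route 1: sample covariance divided by the product of sample standard deviations.
cov_xy = ss_xy / (n - 1)
sd_x = math.sqrt(ss_x / (n - 1))
sd_y = math.sqrt(ss_y / (n - 1))
r_from_cov = cov_xy / (sd_x * sd_y)

# Route 2: the sum-of-squares formula r = SS_XY / sqrt(SS_X * SS_Y).
r_from_ss = ss_xy / math.sqrt(ss_x * ss_y)

b1 = ss_xy / ss_x  # fitted slope; it shares its sign with r

print(f"r = {r_from_ss:.4f}, b1 = {b1:.4f}")
```

The n – 1 factors cancel, so both routes give the same r (about 0.96 for these data), and the fitted slope is positive, like r.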
Examples of Approximate
r² Values

r² = 1
Perfect linear relationship between X and Y:
100% of the variation in Y is explained by
variation in X.
[Figure: two scatter plots of Y against X with all points on a straight line, each with r² = 1]
Examples of Approximate
r² Values

0 < r² < 1
Weaker linear relationships between X and Y:
some but not all of the variation in Y is
explained by variation in X.
[Figure: two scatter plots of Y against X with points scattered around a line]
Examples of Approximate
r² Values

r² = 0
No linear relationship between X and Y:
the value of Y does not depend on X.
(None of the variation in Y is explained by
variation in X.)
[Figure: scatter plot of Y against X with r² = 0]
Problem
• The following sample data shows the demand for a product in
thousands of units and its price (Rs.) charged in six different
market areas:
Price: 10 18 14 11 16 13
Demand: 125 58 90 100 72 85
Compute correlation coefficient and coefficient of
determination.
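
A possible worked solution is sketched below. It uses r = SS_XY / √(SS_X · SS_Y) and takes the coefficient of determination as r². It also cross-checks r² against 1 – SSE/SS_Y, with SSE = SS_Y – b1·SS_XY as in the MSE slide.

```python
import math

price = [10, 18, 14, 11, 16, 13]
demand = [125, 58, 90, 100, 72, 85]
n = len(price)

mean_x = sum(price) / n
mean_y = sum(demand) / n
ss_x = sum((x - mean_x) ** 2 for x in price)
ss_y = sum((y - mean_y) ** 2 for y in demand)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(price, demand))

r = ss_xy / math.sqrt(ss_x * ss_y)   # sample correlation coefficient
r_squared = r ** 2                   # coefficient of determination

# Cross-check: share of the variation in Y explained by the regression.
b1 = ss_xy / ss_x
sse = ss_y - b1 * ss_xy
r_squared_check = 1 - sse / ss_y

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

For these data r is about –0.953 and r² about 0.909, so roughly 91% of the variation in demand is explained by price in this sample.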
