Regression and Correlation
Introduction
• Regression refers to the statistical technique of
modeling the relationship between variables.
• In simple linear regression, we model the
relationship between two variables.
• One of the variables, denoted by Y, is called the
dependent variable and the other, denoted by X, is
called the independent variable.
• The model we will use to depict the relationship
between X and Y will be a straight-line relationship.
• A graphical sketch of the pairs (X, Y) is called a
scatter plot.
Using Statistics
This scatterplot locates pairs of observations of advertising expenditures (X) and sales (Y).
[Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y); horizontal axis: Advertising, vertical axis: Sales]
[Figures: further example scatter plots of Y against X]
Simple Linear Regression Model
The equation that describes how y is related to x and
an error term is called the regression model.
The simple linear regression model is:
$y = a + bx + \varepsilon$
where:
$a$ and $b$ are called the parameters of the model,
$a$ is the intercept and $b$ is the slope.
$\varepsilon$ is a random variable called the error term.
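To make the role of the error term concrete, the following minimal sketch generates Y values from the model for a few X values. The function name `simulate_simple_linear` and the parameter values (a = 1, b = 2, standard deviation 0.5) are hypothetical, chosen only for illustration; they do not come from the slides.

```python
import random

def simulate_simple_linear(xs, a, b, sigma, seed=0):
    """Return y = a + b*x + eps for each x, with eps drawn from N(0, sigma^2)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [a + b * x + rng.gauss(0.0, sigma) for x in xs]

xs = [1, 2, 3, 4, 5]
# Illustrative (hypothetical) parameter values, not taken from the slides.
exact = simulate_simple_linear(xs, a=1.0, b=2.0, sigma=0.0)  # no error term
noisy = simulate_simple_linear(xs, a=1.0, b=2.0, sigma=0.5)  # with error term
print(exact)
print(noisy)
```

With sigma set to 0 the points fall exactly on the line; with a positive sigma they scatter around it.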
Assumptions of the Simple Linear Regression Model
• The relationship between X and Y is a straight-line
relationship.
• The errors $\varepsilon$ are normally distributed with mean 0 and
variance $\sigma^2$, and are uncorrelated (not related) in successive
observations.
• That is: $\varepsilon \sim N(0, \sigma^2)$
[Figure: Assumptions of the Simple Linear Regression Model. Identical normal distributions of errors, all centered on the regression line.]
Errors in Regression
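[Figure: the regression line, with an observed data point and its predicted value at $X_i$]
$Y_i$: the observed data point
$\hat{Y}_i$: the predicted value of Y for $X_i$
Error: $e_i = Y_i - \hat{Y}_i$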
SIMPLE REGRESSION AND CORRELATION
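$\hat{Y} = a + bX$
where Y is the dependent variable and X is the independent variable.
$b = \dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$
$a = \bar{Y} - b\bar{X}$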
SIMPLE REGRESSION - EXAMPLE
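X      Y      X^2    XY
1      1      1      1
2      1      4      2
3      2      9      6
4      2      16     8
5      4      25     20
Sum:   15     10     55     37
$\sum X = 15$, $\sum Y = 10$, $\sum X^2 = 55$, $\sum XY = 37$
$\bar{X} = \dfrac{15}{5} = 3$, $\bar{Y} = \dfrac{10}{5} = 2$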
SIMPLE REGRESSION - EXAMPLE
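$b = \dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \dfrac{5(37) - (15)(10)}{5(55) - (15)^2} = \dfrac{35}{50} = 0.7$
$a = \bar{Y} - b\bar{X} = 2 - 0.7(3) = -0.1$
$\hat{Y} = -0.1 + 0.7X$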
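The calculation of a and b can be checked with a short script. This is a minimal sketch that applies the sum formulas to the five (X, Y) pairs of the example; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b from the sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)  # a = Y-bar minus b times X-bar
    return a, b

X = [1, 2, 3, 4, 5]
Y = [1, 1, 2, 2, 4]
a, b = fit_line(X, Y)
print(round(a, 4), round(b, 4))
```

Up to floating-point rounding, the result agrees with a = -0.1 and b = 0.7.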
Standard Error of Estimate
$s_e = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - 2}}$
Short-cut:
$s_e = \sqrt{\dfrac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}$
Standard Error of Estimate
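$Y^2$: 1, 1, 4, 4, 16, so $\sum Y^2 = 26$
$s_e = \sqrt{\dfrac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}} = \sqrt{\dfrac{26 - (-0.1)(10) - 0.7(37)}{5 - 2}} = \sqrt{\dfrac{1.1}{3}} = 0.6055$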
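The definitional formula and the short-cut formula should give the same standard error. The sketch below checks this on the example data, taking a = -0.1 and b = 0.7 from the fitted line; the variable names are hypothetical.

```python
import math

X = [1, 2, 3, 4, 5]
Y = [1, 1, 2, 2, 4]
a, b = -0.1, 0.7  # fitted intercept and slope from the example
n = len(Y)

# Definitional form: sum of squared residuals (Y - Y-hat)^2 over n - 2.
residuals = [y - (a + b * x) for x, y in zip(X, Y)]
se_definition = math.sqrt(sum(e * e for e in residuals) / (n - 2))

# Short-cut form: uses only the sums of Y^2, Y and XY.
sum_y2 = sum(y * y for y in Y)
sum_y = sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
se_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(round(se_definition, 4), round(se_shortcut, 4))
```

Both forms agree with the value 0.6055 obtained by hand.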
Correlation Analysis
Correlation analysis is used to describe
the degree to which one variable is
linearly related to another.
The population correlation, denoted by $\rho$, can take on any value from -1 to 1.
$\rho = -1$ indicates a perfect negative linear relationship
$-1 < \rho < 0$ indicates a negative linear relationship
$\rho = 0$ indicates no linear relationship
$0 < \rho < 1$ indicates a positive linear relationship
$\rho = 1$ indicates a perfect positive linear relationship
The absolute value of $\rho$ indicates the strength or exactness of the relationship.
Illustrations of Correlation
[Figure: six scatter plots of Y against X, illustrating $\rho = -1$, $\rho = 0$, $\rho = 1$, $\rho = -0.8$, $\rho = 0$ and $\rho = 0.8$]
The coefficient of correlation:
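$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$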
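As a check on the formula for r, here is a minimal sketch that evaluates it directly from the sums. The function name `pearson_r` and the two three-point data sets are hypothetical; the five-point data set is the advertising example.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient computed from the sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r_example = pearson_r([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])  # advertising example
r_up = pearson_r([1, 2, 3], [2, 4, 6])    # points exactly on a rising line
r_down = pearson_r([1, 2, 3], [6, 4, 2])  # points exactly on a falling line
print(round(r_example, 4), r_up, r_down)
```

Points lying exactly on a line give r = 1 or r = -1, while the example data give a value of about 0.9037.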
Sample Coefficient of Determination $r^2$
Alternate formula:
$r^2 = \dfrac{a\sum Y + b\sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}$
Sample Coefficient of Determination
$r^2 = \dfrac{a\sum Y + b\sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2} = \dfrac{(-0.1)(10) + 0.7(37) - 5(2)^2}{26 - 5(2)^2} = \dfrac{4.9}{6} = 0.8167$
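The alternate formula can be compared with the definition of the coefficient of determination as explained variation over total variation. The sketch below does this for the example data, assuming a = -0.1 and b = 0.7; the variable names are hypothetical.

```python
X = [1, 2, 3, 4, 5]
Y = [1, 1, 2, 2, 4]
a, b = -0.1, 0.7  # fitted intercept and slope from the example
n = len(Y)
y_bar = sum(Y) / n

# Alternate (computational) formula using sums only.
sum_y = sum(Y)
sum_y2 = sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
r2_alternate = (a * sum_y + b * sum_xy - n * y_bar ** 2) / (sum_y2 - n * y_bar ** 2)

# Definition: variation explained by the line divided by total variation.
explained = sum((a + b * x - y_bar) ** 2 for x in X)
total = sum((y - y_bar) ** 2 for y in Y)
r2_definition = explained / total

print(round(r2_alternate, 4), round(r2_definition, 4))
```

Both routes give approximately 0.8167.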
Interpretation: $r^2$ is the percentage of variation explained by
the regression.
We can conclude that 81.67% of the total variation
in the sales revenues is explained by the variation
in advertising expenditure.
The Coefficient of Correlation or
Karl Pearson’s Coefficient of Correlation
$r = \sqrt{r^2} = \sqrt{0.8167} = 0.9037$
The relationship between the two variables is direct: $r$ takes the sign of the slope $b$, which is positive here.
Hypothesis Tests for the Correlation Coefficient
H0: $\rho = 0$ (No linear relationship)
H1: $\rho \neq 0$ (Some linear relationship)
Test statistic, with n - 2 degrees of freedom: $t_{(n-2)} = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
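Applied to the example (n = 5, $r^2$ = 4.9/6), the test statistic can be evaluated as in the sketch below. The critical value 3.182 is an assumption taken from a standard t table (two-tailed, 5% level, 3 degrees of freedom); it is not stated in the slides.

```python
import math

n = 5
r2 = 4.9 / 6.0     # coefficient of determination from the example
r = math.sqrt(r2)  # positive root, since the slope is positive

# t statistic with n - 2 degrees of freedom.
t_stat = r / math.sqrt((1.0 - r2) / (n - 2))

# Assumption: two-tailed 5% critical value for 3 df, from a standard t table.
t_critical = 3.182
reject_h0 = abs(t_stat) > t_critical

print(round(t_stat, 4), reject_h0)
```

Under that assumed critical value, the statistic of roughly 3.66 leads to rejecting H0 at the 5% level.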
Analysis-of-Variance Table and
an F Test of the Regression Model
H0 : The regression model is not significant
H1 : The regression model is significant
Source of      Sum of      Degrees of
Variation      Squares     Freedom       Mean Square    F Ratio
Regression     SSR         1             MSR            MSR/MSE
Error          SSE         n - 2         MSE
Total          SST         n - 1         MST
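The entries of the table can be worked out for the five-point example as a sketch. It assumes the fitted line $\hat{Y} = -0.1 + 0.7X$; the variable names are hypothetical.

```python
X = [1, 2, 3, 4, 5]
Y = [1, 1, 2, 2, 4]
a, b = -0.1, 0.7  # fitted intercept and slope from the example
n = len(Y)
y_bar = sum(Y) / n
fitted = [a + b * x for x in X]

# Sums of squares: total, regression (explained) and error (unexplained).
sst = sum((y - y_bar) ** 2 for y in Y)
ssr = sum((f - y_bar) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(Y, fitted))

# Mean squares divide each sum of squares by its degrees of freedom.
msr = ssr / 1
mse = sse / (n - 2)
f_ratio = msr / mse

print(round(ssr, 3), round(sse, 3), round(sst, 3), round(f_ratio, 3))
```

The sums of squares satisfy SSR + SSE = SST, and the F ratio comes out near 13.364.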
Testing for the existence of linear relationship
H0: b = 0
H1: b is not equal to zero.
Test statistic, with n - 2 degrees of freedom: $t = \dfrac{b}{s_b}$
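The slides do not give a formula for $s_b$. The sketch below assumes the standard textbook expression $s_b = s_e / \sqrt{\sum (X - \bar{X})^2}$ and applies it to the example with b = 0.7 and $s_e = \sqrt{1.1/3}$.

```python
import math

X = [1, 2, 3, 4, 5]
b = 0.7                    # fitted slope from the example
s_e = math.sqrt(1.1 / 3)   # standard error of estimate from the example
n = len(X)
x_bar = sum(X) / n

# Assumption: standard error of the slope, s_b = s_e / sqrt(sum (X - X-bar)^2).
sxx = sum((x - x_bar) ** 2 for x in X)
s_b = s_e / math.sqrt(sxx)

t_stat = b / s_b  # compared against a t distribution with n - 2 df
print(round(s_b, 4), round(t_stat, 4))
```

In simple regression the square of this t statistic equals the F ratio from the analysis-of-variance table.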
Correlations
                                         Advertising        Sales revenue
                                         expenses ($00)     ($000)
Advertising       Pearson Correlation    1                  .904*
expenses ($00)    Sig. (2-tailed)                           .035
                  N                      5                  5
Sales revenue     Pearson Correlation    .904*              1
($000)            Sig. (2-tailed)        .035
                  N                      5                  5
*. Correlation is significant at the 0.05 level (2-tailed).
Model Summary
Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .904(a)    .817        .756                 .606
a. Predictors: (Constant), Advertising expenses ($00)
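The Model Summary figures can be reproduced from the hand calculations. The adjusted R Square formula used below, $1 - (1 - R^2)\dfrac{n - 1}{n - k - 1}$ with k = 1 predictor, is the standard definition and is an assumption here, since the slides do not state it.

```python
import math

n = 5          # number of observations
k = 1          # number of predictors (advertising expenses)
sse = 1.1      # residual sum of squares from the example
sst = 6.0      # total sum of squares from the example

r_square = 1.0 - sse / sst
r = math.sqrt(r_square)
# Assumption: standard adjusted R-square formula (not given in the slides).
adjusted_r_square = 1.0 - (1.0 - r_square) * (n - 1) / (n - k - 1)
std_error = math.sqrt(sse / (n - 2))

print(round(r, 3), round(r_square, 3), round(adjusted_r_square, 3), round(std_error, 3))
```

Rounded to three decimals, these values agree with the R, R Square, Adjusted R Square and Std. Error of the Estimate columns.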
ANOVA(b)
Model            Sum of Squares    df    Mean Square    F         Sig.
1  Regression    4.900             1     4.900          13.364    .035(a)
   Residual      1.100             3     .367
   Total         6.000             4
a. Predictors: (Constant), Advertising expenses ($00)
b. Dependent Variable: Sales revenue ($000)
$\hat{Y} = -0.1 + 0.7X$
Test statistic: $F = \dfrac{MSR}{MSE} = 13.364$
Conclusion: There is sufficient evidence to reject
the null hypothesis in favor of the alternative hypothesis.
The slope $b$ is not equal to zero. Thus, the independent variable is
linearly related to y.
This linear regression model is valid.