[Figure: scatter plot of Sales (vertical axis, 0 to 100) against Advertising (horizontal axis, 0 to 50)]

Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.
The scatter of points tends to be distributed around a positively sloped straight line.
The pairs of values of advertising expenditures and sales are not located exactly on a
straight line.
The scatter plot reveals a more or less strong tendency rather than a precise linear
relationship.
The line represents the nature of the relationship on average.
Examples of Other Scatterplots
[Figure: several example scatterplots of Y against X, showing various patterns of relationship]
Model Building
The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship.

A statistical model separates the systematic component of a relationship from the random component:

Data = Statistical model (systematic component) + Random errors

In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE). In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.
The Simple Linear Regression Model
The population simple linear regression model:

Y = β₀ + β₁X + ε

where β₀ + β₁X is the non-random/systematic part and ε is the random error, and

• Y is the dependent variable, the variable we wish to explain or predict;
• X is the independent variable, also called the predictor variable;
• ε is the error term, the only random component in the model, and thus the only source of randomness in Y.

[Figure: the population regression line E[Yᵢ] = β₀ + β₁Xᵢ, with intercept β₀ and slope β₁; the error εᵢ is the vertical distance of an observation from the line]

Actual observed values of Y differ from the expected value by an unexplained or random error:

Yᵢ = E[Yᵢ] + εᵢ = β₀ + β₁Xᵢ + εᵢ
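The population model can be illustrated with a small simulation; the parameter values below are hypothetical, chosen only to show how the systematic part and the random error combine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (illustration only)
beta0, beta1, sigma = 2.0, 1.5, 0.5

x = np.linspace(0, 10, 100)
eps = rng.normal(0.0, sigma, size=x.size)  # the error term, the only source of randomness in Y
y = beta0 + beta1 * x + eps                # Y = beta0 + beta1*X + epsilon

# Each observation scatters around its expected value E[Y_i] = beta0 + beta1*X_i
```

Plotting y against x would reproduce the scatter-around-a-line pattern seen in the advertising and sales example.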
Assumptions of the SLR Model
• The relationship between X and Y is a straight-line relationship: Y = β₀ + β₁X + ε.
• The errors ε are normally distributed with mean 0 and constant variance σ², and are independent of one another.

The estimated regression equation:

Ŷ = b₀ + b₁X

where Ŷ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.
Errors in Regression
[Figure: the fitted regression line Ŷ = b₀ + b₁X, an observed point (Xᵢ, Yᵢ), and the predicted value Ŷᵢ for Xᵢ]

The error is the vertical distance between the observed value Yᵢ and the predicted value Ŷᵢ:

eᵢ = Yᵢ − Ŷᵢ
Least Squares Regression
The sum of squared errors in regression is:

SSE = Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

The least squares regression line is the one that minimizes the SSE with respect to the estimates b₀ and b₁. These estimates satisfy the normal equations:

Σᵢ₌₁ⁿ yᵢ = n b₀ + b₁ Σᵢ₌₁ⁿ xᵢ
Σᵢ₌₁ⁿ xᵢyᵢ = b₀ Σᵢ₌₁ⁿ xᵢ + b₁ Σᵢ₌₁ⁿ xᵢ²
Least Squares Estimators
SSx = Σ(x − x̄)² = Σx² − (Σx)²/n
SSy = Σ(y − ȳ)² = Σy² − (Σy)²/n
SSxy = Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy)/n

Least squares regression estimators:

b₁ = SSxy / SSx
b₀ = ȳ − b₁x̄
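The SS formulas translate directly into code; a minimal sketch with a small hypothetical data set:

```python
import numpy as np

# Hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

n = x.size
ss_x = np.sum(x**2) - np.sum(x)**2 / n              # SSx
ss_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n   # SSxy

b1 = ss_xy / ss_x               # slope: b1 = SSxy / SSx
b0 = y.mean() - b1 * x.mean()   # intercept: b0 = y-bar minus b1 * x-bar
```

The same estimates fall out of any least squares routine (e.g. np.polyfit(x, y, 1)), since both minimize the SSE.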
Example 10-1: Aczel & Sounderpandian
A 95% confidence interval for the slope:

b₁ ± t(0.025, 25−2) · s(b₁)
= 1.25533 ± (2.069)(0.04972)
= 1.25533 ± 0.10287
= [1.15246, 1.35820]
Hypothesis Tests for the Regression Slope
Example 10-1:

H₀: β₁ = 0
H₁: β₁ ≠ 0

t(n − 2) = b₁ / s(b₁) = 1.25533 / 0.04972 = 25.25
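Both the confidence interval and the test statistic can be reproduced from the reported values of Example 10-1; a sketch using scipy for the critical point:

```python
from scipy import stats

# Values reported in Example 10-1
b1, s_b1, n = 1.25533, 0.04972, 25

t_crit = stats.t.ppf(0.975, df=n - 2)   # t(0.025, 23), about 2.069

# 95% confidence interval for the slope
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)

# Test statistic for H0: beta1 = 0
t_stat = b1 / s_b1                      # about 25.25, far beyond t_crit
```

Since the test statistic greatly exceeds the critical point, H₀ is rejected at the 5% level.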
The Coefficient of Determination
[Figure: an observed point y, the sample mean ȳ, and the fitted line, showing the total deviation split into the explained and unexplained deviations]

Total Deviation = Explained Deviation + Unexplained Deviation:

(y − ȳ) = (ŷ − ȳ) + (y − ŷ)

Squaring and summing over all observations gives

SST = SSR + SSE

where SST = Σ(y − ȳ)², SSR = Σ(ŷ − ȳ)², and SSE = Σ(y − ŷ)².

r² = SSR/SST = 1 − SSE/SST is the percentage of total variation explained by the regression.
Example 10-1:

r² = SSR/SST = 64527736.8/66855898 = 0.96518

[Figure: scatter plot of Dollars against Miles with the fitted regression line]
ANOVA Table and an F Test of the Regression Model
Example 10-1:

Source of     Sum of        Degrees of
Variation     Squares       Freedom      Mean Square   F Ratio   p-Value
Regression    64527736.8     1           64527736.8    637.47    0.000
Error          2328161.2    23             101224.4
Total         66855898.0    24
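Every derived entry in the table follows from SSR, SSE, and their degrees of freedom; a minimal sketch, using scipy only for the p-value:

```python
from scipy import stats

# Sums of squares and degrees of freedom from Example 10-1
ssr, sse = 64527736.8, 2328161.2
df_r, df_e = 1, 23

msr = ssr / df_r      # mean square for regression
mse = sse / df_e      # mean square error, about 101224.4
f_ratio = msr / mse   # F ratio, about 637.47
p_value = stats.f.sf(f_ratio, df_r, df_e)  # upper-tail p-value, effectively 0
```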
The k-Variable Multiple Regression Model

Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε

Model assumptions:
1. ε ~ N(0, σ²), independent of other errors.
2. The variables Xᵢ are uncorrelated with the error term.
Simple and Multiple Least-Squares Regression
[Figure: in simple regression, a line ŷ = b₀ + b₁x is fitted in the (x, y) plane; in multiple regression with two predictors, a plane ŷ = b₀ + b₁x₁ + b₂x₂ is fitted in (x₁, x₂, y) space]

The normal equations for two predictors:

Σy = n b₀ + b₁ Σx₁ + b₂ Σx₂
Σx₁y = b₀ Σx₁ + b₁ Σx₁² + b₂ Σx₁x₂
Σx₂y = b₀ Σx₂ + b₁ Σx₁x₂ + b₂ Σx₂²
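Rather than solving the normal equations by hand, the same coefficients can be obtained from a least squares solver; a sketch on hypothetical data:

```python
import numpy as np

# Hypothetical data with two predictors (illustration only)
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 10, 30)
x2 = rng.uniform(0, 10, 30)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 0.1, 30)

# Design matrix with a column of ones for the intercept;
# lstsq finds the b that minimizes SSE, i.e. solves (X'X)b = X'y
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b   # estimates of the intercept and the two slopes
```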
Example 11-1: Aczel & Sounderpandian
The estimated regression equation:

Ŷ = 4.7164942 + 1.5990404 X₁ + 1.1487479 X₂
Decomposition of the Total Deviation in a Multiple Regression Model
[Figure: for an observed point, the total deviation Y − Ȳ splits into the regression deviation Ŷ − Ȳ and the error deviation Y − Ŷ]

Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE
The F Test of a Multiple Regression Model
A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X₁, X₂, ..., Xₖ:

H₀: β₁ = β₂ = ... = βₖ = 0
H₁: Not all the βᵢ (i = 1, 2, ..., k) are equal to 0

[Figure: F distribution with 2 and 7 degrees of freedom; critical point F₀.₀₁ = 9.55, test statistic F = 86.34]

The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we might conclude that the dependent variable is related to one or more of the independent variables.
How Good is the Regression?

The coefficient of determination:

R² = SSR/SST = 1 − SSE/SST

The F statistic in terms of R²:

F = [R² (n − (k + 1))] / [(1 − R²) k]

The adjusted coefficient of determination:

R̄² = 1 − [SSE/(n − (k + 1))] / [SST/(n − 1)] = 1 − MSE/MST
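These quantities are linked; computing them from the Example 10-1 sums of squares (a simple regression, so k = 1) shows that the R²-based F statistic equals the ANOVA F ratio:

```python
# Sums of squares from Example 10-1 (simple regression: k = 1, n = 25)
ssr, sse = 64527736.8, 2328161.2
sst = ssr + sse
n, k = 25, 1

r2 = ssr / sst                                        # R-squared, about 0.96518
f = (r2 * (n - (k + 1))) / ((1 - r2) * k)             # about 637.47, matching the ANOVA table
r2_adj = 1 - (sse / (n - (k + 1))) / (sst / (n - 1))  # adjusted R-squared, slightly below R-squared
```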
References
• Statistics for Management: Srivastava,
T.N. & Rego, S. (Tata McGraw-Hill)
• Research for Marketing Decisions: Green,
P.E., Tull, D.S., & Albaum, G. – 5th Ed.
(Prentice Hall – India)
• Complete Business Statistics: Aczel, A.D.
& Sounderpandian, J. – Sixth Edition (Tata
McGraw-Hill)
• https://www.datacamp.com/community/tutorials/tutorial-ridge-lasso-elastic-net