Professional Documents
Culture Documents
1
11.1
Probabilistic Models
Models
• Representation of some phenomenon
• Mathematical model is a mathematical expression of some
phenomenon
• Often describe relationships between variables
• Types
• Deterministic models
• Probabilistic models
Deterministic Models
• Hypothesize exact relationships
• Suitable when prediction error is negligible
• Example: force is exactly mass times acceleration
• F = m·a
0 (beta zero) = y-intercept of the line, that is, the point at which the
line intercepts or cuts through the y-axis
1 (beta one) = slope of the line, that is, the change (amount of
increase or decrease) in the deterministic
component of y for every 1-unit increase in x
A First-Order (Straight Line) Probabilistic
Model
[Note: A positive slope implies that E(y) increases by the amount 1 for
each unit increase in x. A negative slope implies that E(y) decreases by
the amount 1.]
Five-Step Procedure
Step 1: Hypothesize the deterministic component of the
model that relates the mean, E(y), to the
independent variable x.
Step 2: Use the sample data to estimate unknown
parameters in the model.
Step 3: Specify the probability distribution of the random
error term and estimate the standard deviation
of this distribution.
Step 4: Statistically evaluate the usefulness of the model.
Step 5: When satisfied that the model is useful, use it for
prediction, estimation, and other purposes.
11.2
y
60
40
20
0 x
0 20 40 60
Thinking Challenge
y
60
40
20
0 x
0 20 40 60
Least Squares Line
The least squares line yˆ ˆ0 ˆ1 x is one that has the following two
properties:
1. The sum of the errors equals 0,
i.e., mean error = 0.
2. The sum of squared errors (SSE) is smaller than for any other
straight-line model, i.e., the error variance is minimum.
Formula for the Least Squares Estimates
SS xy
Slope : ˆ1
SS xx
where SS xy xi x y y i
x x
2
SS xx i
n = Sample size
Interpreting the Estimates of 0 and 1 in
Simple Liner Regression
y-intercept: ̂ 0 represents the predicted value of y when x
= 0 (Caution: This value will not be meaningful
if the value x = 0 is nonsensical or outside the
range of the sample data.)
slope: represents the increase (or decrease) in y for
every 1-unit increase in x (Caution: This ˆ1
interpretation is valid only for x-values within
the range of the sample data.)
Least Squares Graphically
n
LS minimizes i 1 2 3 4
ˆ 2
ˆ 2
ˆ 2
ˆ 2
ˆ 2
i 1
Sales
4
3
2
1
0
0 1 2 3 4 5
Advertising
Parameter Estimation Solution
x x 15
3 y y 10
2
5 5 5 5
SS xy x x y y
SS xx x x
2
x 3 y 2 7 x 3 10
2
Parameter Estimation Solution
The slope of the least squares line is:
ˆ SS xy 7
B1 .7
SS xx 10
yˆ .1 .7 x
Parameter Estimation Computer
Output
Parameter Estimates
^1
yˆ .1 .7 x
Coefficient Interpretation Solution
^
1. Slope (1)
• Sales Volume (y) is expected to increase by $700
for each $100 increase in advertising (x), over the
sampled range of advertising expenditures from
$100 to $500
^
2. y-Intercept (0)
• Since 0 is outside of the range of the sampled
values of x, the y-intercept has no meaningful
interpretation
11.3
Model Assumptions
Basic Assumptions of the Probability
Distribution
Assumption 1:
The mean of the probability distribution of is 0 –
that is, the average of the values of over an
infinitely long series of experiments is 0 for each
setting of the independent variable x. This
assumption implies that the mean value of y, E(y),
for a given value of x is
E(y) = 0 + 1x.
Basic Assumptions of the Probability
Distribution
Assumption 2:
The variance of the probability distribution of is
constant for all settings of the independent
variable x. For our straight-line model, this
assumption means that the variance of is equal
to a constant, say 2, for all values of x.
Basic Assumptions of the Probability
Distribution
Assumption 3:
The probability distribution of is normal.
Assumption 4:
The values of associated with any two observed
values of y are independent–that is, the value of
associated with one value of y has no effect on the
values of associated with other y values.
Basic Assumptions of the Probability
Distribution
.
Estimation of 2 for a (First-Order) Straight-
Line Model
SSE SSE
s
2
Degrees of freedom for error n 2
SS yy yi y
2
n2
We will refer to s as the estimated standard error of the regression
model.
Calculating SSE, s2, s Example
You’re a marketing analyst for Hasbro Toys.
You gather the following data:
Ad Expenditure (100$) Sales (Units)
1 1
2 1
3 2
4 2
5 4
Find SSE, s2, and s.
Calculating s2 and s Solution
SSE 1.1
s
2
.36667
n2 52
s .36667 .6055
11.4
ˆ
1
SSxx
Sampling Distribution of ̂1
s
sˆ
We estimate ̂1by 1 SSxx and refer to this quantity as the
estimated standard error of the least squares slope ̂1 .
A Test of Model Utility: Simple Linear
Regression
Interpreting p-Values for Coefficients
in Regression
Almost all statistical computer software packages
report a two-tailed p-value for each of the
parameters in the regression model. For example,
in simple linear regression, the p-value for the
two-tailed test H0: 1 = 0 versus Ha: 1 ≠ 0 is given
on the printout. If you want to conduct a one-
tailed test of hypothesis, you will need to adjust
the p-value reported on the printout as follows:
Interpreting p-Values for Coefficients
in Regression
-3.182 0 3.182 t
Test Statistic
Solution
s .6055
sö .1914
15
2
1
SS xx
55
5
ö1 .70
t 3.657
Sö .1914
1
Test of Slope Coefficient
Solution
• H0: 1 = 0 Test Statistic:
• Ha: 1 0
• .05 t 3.657
• df 5 – 2 = 3
• Critical Value(s):
Decision:
Reject H0 Reject H0 Reject at = .05
.025 .025
Conclusion:
There is evidence of a
-3.182 0 3.182 t relationship
Test of Slope Coefficient
Computer Output
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Param=0 Prob>|T|
INTERCEP 1 -0.1000 0.6350 -0.157 0.8849
ADVERT 1 0.7000 0.1914 3.656 0.0354
^
1 S^
1
t = ^1 / S^
1
P-Value
11.5
where
y y
SS xy x x
x x
2
SS xx
y y
2
SS yy
Coefficient of Correlation
Coefficient of Correlation
Coefficient of Correlation
Coefficient of Correlation Example
y y 7
SS xy x x
y y 6
2
SS yy
x x 10
2
SS xx
SS xy 7
r .904
SS xx SS yy 10 6
A Test for Linear Correlation
Condition Required for a Valid Test
of Correlation
y y 26
SS xy x x
y y 18.5
2
SS yy
x x 40
2
SS xx
SS xy 26
r .956
SS xx SS yy 40 18.5
Coefficient of Determination
It represents the proportion of the total sample variability
around y that is explained by the linear relationship
between y and x.
0 r2 1
r2 = (coefficient of correlation)2
Coefficient of Determination
Example
r2 = (coefficient of correlation)2
r2 = (.904)2
r2 = .817
r2
Root MSE 0.60553 R-square 0.8167
Dep Mean 2.00000 Adj R-sq 0.7556
C.V. 30.27650
A Complete Example
Example
Suppose a fire insurance company wants to relate the
amount of fire damage in major residential fires to the
distance between the burning house and the nearest
fire station. The study is to be conducted in a large
suburb of a major city; a sample of 15 recent fires in
this suburb is selected. The amount of damage, y, and
the distance between the fire and the nearest fire
station, x, are recorded for each fire.
Example
Example
Step 1: First, we hypothesize a model to relate fire
damage, y, to the distance from the nearest fire station,
x. We hypothesize a straight-line probabilistic model:
y = 0 + 1x +
Example
Step 2: Use a statistical software package to estimate
the unknown parameters in the deterministic
component of the hypothesized model. The Excel
printout for the simple linear regression analysis is
shown on the next slide. The least squares estimates of
the slope 1 and intercept 0, highlighted on the
printout, are
ˆ1 4.919331
ˆ0 10.277929
Example