Professional Documents
Culture Documents
Regress Model Chap 2 Four Per Page
Regress Model Chap 2 Four Per Page
• What factors affect lottery sales? Helpful to know for marketing, e.g.,
1 Correlations and Least Squares where to establish new retail outlets.
• i unit of analysis, ZIP (postal) code
2 Basic Linear Regression Model • n = 50 randomly selected geographic areas
• y= average lottery sales (SALES) over a forty-week period, April, 1998
through January, 1999,
3 Is the Model Useful?: Some Basic Summary Measures • x = population (POP), measure of size of the area.
• Later, we will introduce other factors including area’s typical age,
education level, income, and so forth. Population is the obvious place to
4 Properties of Regression Coefficient Estimators start.
• Here are some summary statistics.
5 Statistical Inference
Table: Summary Statistics of Each Variable
10000
Figure: Histograms of Population and Sales. Each distribution is skewed to the right,
indicating that there are many small areas compared to a few areas with larger sales 5000
and populations.
0
POP
Figure: A scatter
Frees (Regression plot of the lottery
Modeling) data.
Basic Linear Each of the 50 plotting symbols
Regression 4 / 40
Correlations and Least Squares Correlations and Least Squares
• It is unaffected by scale and location changes of either, or both, • Minimize this quantity by taking derivatives with respect to the intercept and
variables. slope, setting equal to zero and solving
• It can readily be compared across different data sets.
∂ n
• Correlation coefficients take up less space to report than a scatter ∗ SS(b0∗ , b1∗ ) = (−2) (yi − (b0∗ + b1∗ xi )) = 0
∂b0
plot and are often the primary statistic of interest. i=1
• Scatter plots help us understand other aspects of the data, such as and
the range, and also provide indications of nonlinear relationships in ∂ n
y = 470.8 + (0.647)x.
Frees (Regression Modeling) Basic Linear Regression 7 / 40 Frees (Regression Modeling) Basic Linear Regression 8 / 40
Correlations and Least Squares Correlations and Least Squares
20
S(Z ) = 100 exp (.08)10 + .15 10Z ,
18
CTE Estimates
16
annual mean return of 8% and standard deviation 15% .
• The present value of this option is
14
12
C(Z ) = e−0.06(10) max (0, 110 − S(Z )) ,
10
based on a 6% discount rate.
8
• 1,000 i.i.d. standard normal random variables were simulated and 0 2 4 6 8 10 12
calculate each of 1000 present values, Ci1 , . . . , Ci,1000 .
VaR Estimates
• Vari is the 95th percentile
• CTEi is the average of the highest 50. Figure: Plot of Conditional Tail Expectation (CTE) versus Value at Risk (VaR). Based
on n = 1, 000 simulations from a 10 year European put bond.
Frees (Regression Modeling) Basic Linear Regression 9 / 40
However, approximate normality is enough for central limit Figure: The distribution of the response varies by the level of the explanatory
theorems that we will need for inference. variable.
Frees (Regression Modeling) Basic Linear Regression 11 / 40 Frees (Regression Modeling) Basic Linear Regression 12 / 40
Basic Linear Regression Model Basic Linear Regression Model
Frees (Regression Modeling) Basic Linear Regression 13 / 40 Frees (Regression Modeling) Basic Linear Regression 14 / 40
Is the Model Useful?: Some Basic Summary Measures Is the Model Useful?: Some Basic Summary Measures
yi − y
= y − y
i
i
+ y − y
i
n
n
2
n
2
total = unexplained + explained (yi − y)2 = yi − yi + yi − y ,
deviation deviation deviation i=1 i=1 i=1
y
y^ = b0 + b1x • Summarize with “R-square,” the coefficient of determination, defined as
y − y^
Regression SS
y^ R2 = .
Total SS
y^ − y = b1(x − x)
Is the Model Useful?: Some Basic Summary Measures Properties of Regression Coefficient Estimators
n
σ2
Var b1 = wi2 Var yi = .
i=1
sx2 (n − 1)
x x x
Is the Explanatory Variable Important?: The t-Test Example. Wisconsin Lottery Sales
Frees (Regression Modeling) Basic Linear Regression 23 / 40 Frees (Regression Modeling) Basic Linear Regression 24 / 40
Statistical Inference Statistical Inference
Alternative Hypothesis (Ha ) Procedure: Reject H0 in favor of Ha if • If r = 0, then b1 = 0 and t(b1 ) = 0. No correlation, no relationship.
β1 > d t − ratio > tn−2,1−α . • The correlation between y and x, r = r (x, y ) is the same as
β1 < d t − ratio < −tn−2,1−α . between y and y, say r (x, y).
β1 = d |t − ratio| > tn−2,1−α/2 . • Because r is location and scale invariant (assuming that
y = b0 + b1 x and b1 > 0.
Notes: The significance level is α. Here, tn−2,1−α is the (1-α)th percentile
from the t -distribution using df = n − 2 degrees of freedom. • It turns out (Ex 2.13) that R 2 = r 2 .
• Further, one can check that (Ex 2.16)
Frees (Regression Modeling) Basic Linear Regression 25 / 40 Frees (Regression Modeling) Basic Linear Regression 26 / 40
Frees (Regression Modeling) Basic Linear Regression 27 / 40 Frees (Regression Modeling) Basic Linear Regression 28 / 40
Statistical Inference Building a Better Model: Residual Analysis
Building a Better Model: Residual Analysis Building a Better Model: Residual Analysis
Frees (Regression Modeling) Basic Linear Regression 31 / 40 Frees (Regression Modeling) Basic Linear Regression 32 / 40
Building a Better Model: Residual Analysis Building a Better Model: Residual Analysis
The Effect of Outliers and High Leverage Points The Effect of Outliers and High Leverage Points
Table: 19 Base Points Plus Three Types of Unusual Observations Table: Results from Four Regressions
Building a Better Model: Residual Analysis Building a Better Model: Residual Analysis
5000
Figure: qq Plots of Wisconsin Lottery Residuals. The left-hand panel is based on all
0
50 points. The right-hand panel is based on 49 points, residuals from a regression
0 10000 20000 30000 40000
after removing Kenosha.
POP
Figure: Scatter plot of SALES versus POP, with the outlier corresponding to Kenosha
marked.
Application: Capital Asset Pricing Model Application: Capital Asset Pricing Model
Data Data
• Consider monthly returns over the five year period from January, • Scatter plots of the returns versus time are called time series plots.
1986 to December, 1990, inclusive. • A quick glance at the horizontal axis reveals that this unusual point is in
• y = security returns from the Lincoln National Insurance October, 1987, the time of the well-known market crash.
Corporation as the dependent variable • We also see two outliers in 1990
• x = market returns from the index of the Standard & Poor’s 500
Index. Monthly Return
0.3 LINCOLN
MARKET
0.2
0.1
Table: Summary Statistics of 60 Monthly Observations
0.0
Mean Median
Standard Minimum Maximum −0.1
Deviation −0.2
MARKET 0.0074 0.0142 0.0525 -0.2205 0.1275 1986 1987 1988 1989 1990 1991
Frees (Regression Modeling) Basic Linear Regression 37 / 40 Frees (Regression Modeling) Basic Linear Regression 38 / 40
Application: Capital Asset Pricing Model Application: Capital Asset Pricing Model
0.2
• The important point is that the R 2 decreased when omitting this
0.1
1990 OUTLIERS
unusual point.
0.0
−0.1
OCTOBER, 1987 CRASH
• The outliers were due to some unfounded rumors in the market
−0.2
−0.3 that made Lincoln’s price drop one month and subsequently
−0.3 −0.2 −0.1 0.0 0.1 0.2
recover.
MARKET
Frees (Regression Modeling) Basic Linear Regression 39 / 40 Frees (Regression Modeling) Basic Linear Regression 40 / 40