Empirical Methods (MW24.1)

[Cover figure: scatter plot of testscr against str]
Oliver Kirchkamp
© Oliver Kirchkamp
This handout is a summary of the slides we use in the lecture. The handout is
perhaps not very helpful unless you also attend the lecture. This handout is also
not supposed to replace a book. The principal text for the lecture is Stock and
Watson's book. All formulas we use in the lecture can be found there (with fewer
mistakes). Please expect slides and small parts of the lecture to change from time
to time, and print only the material you currently need.
Homepage: http://www.kirchkamp.de/oekonometrie/
Schedule: Lecture: Fri, 10:15-11:45, HS5
Exercise: Fri, 14:15-15:45, SR207
Mon, 16:15-17:45, HS4
Exam: Wed. 18.2., 8-10 (please check homepage!)
Literature:
* Stock and Watson; Introduction to Econometrics, Pearson, 2006
* Studenmund; Using Econometrics, Pearson, 2006
Software: In the lecture we will illustrate many things with R. You should try
these examples on your own computer. Use the online help to look up unknown
commands. R is free and has a wide range of applications; many alternatives are
more specialised and have a more heterogeneous syntax. Helpful hints, links to
the documentation, etc., can be found on the homepage.
Contents

1 Introduction
  1.1 What is the purpose of economics
  1.2
  1.3
  1.4 Example
  1.5 Plan
  1.6 Exercises

2 Statistical theory
  2.1 Population
  2.2 Sample
  2.3 Random variables and distributions
    2.3.1 Conditional expected value and conditional variance
  2.4 Samples of a population
  2.5 Estimations
    2.5.1 The distribution of Ȳ
    2.5.2 Characteristics of sampling distributions
    2.5.3 Why should we use Ȳ to estimate μY?
    2.5.4 Testing hypotheses
    2.5.5 Estimating the variance of Ȳ
    2.5.6 Calculating the p-value with the help of an estimated σ²Y
    2.5.7 Relation between p-value and the level of significance
    2.5.8 What happened to the t table and the degrees of freedom?
    2.5.9 A comment
    2.5.10 Another problem
  2.6 Confidence intervals
  2.7 An alternative: The Bayesian Approach
  2.8 Summary
  2.9 Exercises

3 Linear regression with a single regressor
  3.1 Measures of determination
  3.2 OLS assumptions
    3.2.1 Digression: The Existence of Moments
  3.3 The distribution of the OLS estimator
  3.4 Distribution of β̂1
  3.5 Distribution of β̂0
  3.6 Hypothesis tests for β̂1
  3.7 Confidence intervals and p-values
  3.8 Bayesian Regression
  3.9 Reporting estimation results
  3.10 Continuous and nominal variables
  3.11 Heteroscedastic and homoscedastic error terms
    3.11.1 An example from labour market economics
1 Introduction
What is an interesting economic theory?
Claim:
For each economic theory there is an alternative theory predicting the opposite.
Economic theories often suggest relationships, frequently with policy implications, but these relationships are rarely quantified.
How large is the increase in the performance of students when courses are
smaller?
How large is the increase in your income when you study for another year?
What is the elasticity of demand for cigarettes?
By how much does the GDP increase if the ECB raises interest rates by 1%?
Experiments in economics are hard to do. Instead we often use data from uncontrolled processes, e.g.:
incomes of alumni
time series data about monetary policy

Problems related to data from uncontrolled processes:
unobserved factors
simultaneous causality
coincidental causality
1.4 Example
How does learning success change when class size is reduced by one student? What if class size is reduced by eight students?
Can we answer this question without using data?
E.g. test scores from 420 school districts in California from 1998-1999
str = student-teacher ratio = number of students in the district / full-time equivalent teachers
testscr = 5th-grade test score (Stanford-9 achievement test)
The command data enables access to the data set contained in a library.
data(Caschool)
We can now access this data set. summary displays an overview of the statistical characteristics of the data
set.
names(Caschool)
 [1] "distcod"  "county"   "district" "grspan"   "enrltot"  "teachers"
 [7] "calwpct"  "mealpct"  "computer" "testscr"  "compstu"  "expnstu"
[13] "str"      "avginc"   "elpct"    "readscr"  "mathscr"
summary(Caschool$str)
   Min. 1st Qu.  Median    Max.
  14.00   18.58   19.72   25.80
It is quite cumbersome to write the name of a data set - here Caschool - time and time again. Whenever
we intend to work with the same data set for a while we can use attach(Caschool). This will tell R to
look at Caschool first, whenever we ask for a variable.
attach(Caschool)
summary(str)
   Min. 1st Qu.  Median    Max.
  14.00   18.58   19.72   25.80

hist(str)
[Figure: Histogram of str — frequency of districts by student-teacher ratio]
library(car)
scatterplot(testscr ~ str)

[Figure: scatter plot of testscr against str]
Test results testscr seem to be getting worse as student teacher ratios str are
getting higher.
Is it possible to show that districts with low student teacher ratios str have
higher test scores testscr?
Compare average test scores in districts with small str to test scores in districts with high str (estimation)
Test the null hypothesis that mean test scores are the same against the alternative hypothesis that they are not (hypothesis testing)
Estimate an interval for the difference of the mean test scores (confidence
interval)
Is the difference large enough
… for a school reform?
… to convince parents?
… to convince the school authority?
In the following example we want to split up the data set into two pieces schools with a student teacher
ratio above and below 20. In other words, we will introduce a nominal variable. In R a nominal variable
is called a factor and factor converts a continuous variable (str) into a factor.
t.test performs a Student t test to compare mean values. We write t.test(testscr ~ large).
The variable to be tested is given before the tilde. The factor describing the two groups to be compared is
given after the tilde.
large <- str>20
t.test(testscr ~ large)
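The statistic behind this comparison can be written as follows (a sketch; here Ȳ with subscripts s and l denotes the group means of the small and large group, s² the group sample variances, n the group sizes — this is the unequal-variance form that t.test uses by default):

```latex
t \;=\; \frac{\bar Y_s - \bar Y_l}{\sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_l^2}{n_l}}}
```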
This simple test tells us that there is a significant difference of the test scores
testscr between large and small school groups.
We can estimate the difference between the two groups, we can test a hypothesis,
and we can calculate a confidence interval.
1.5 Plan
You already know estimates, hypotheses tests and confidence intervals.
We will generalize these concepts for regressions.
Before we do this, we will take a brief look at the underlying theory.
1.6 Exercises
1. Econometrics
What is econometrics?
2 Statistical theory
Population, random variable, distribution
Moments of a distribution (mean value, variance, standard deviation, covariance, correlation)
Conditional distribution, conditional mean values
Distribution of a random sample
2.1 Population
The set of all entities which could theoretically be observed (e.g. all imaginable school districts at all points in time under all imaginable conditions)
Quite often we assume that the population is of infinite size (or at least very
large)
Usually we know something about A (our sample) and we want to say
something about B. We can do this, if we assume that both A and B are
drawn from the same population.
2.2 Sample
A part of the population that we observe (e.g. Californian school districts in
1998 (and under the conditions of this year))
corr(X, Z) = cov(X, Z) / (σX · σZ) = σXZ / sqrt(var(X) · var(Z))
[Figure: densities of testscr for the two groups FALSE and TRUE]
data(Wages)
The data set Wages contains, among others, the following two variables:
exp: years of full-time work experience
lwage: logarithm of wage

library(lattice)  # xyplot comes from the lattice package
xyplot(lwage ~ exp, group=sex, data=Wages, auto.key=list(corner=c(1,1)))
[Figure: scatter plot of lwage against exp, grouped by sex (female/male)]
with(subset(Caschool,str>=20),mean(testscr))
[1] 649.9788
Example: recovery rate of all patients who have received a certain drug (X = recovery, Y = drug).
If E(X|Y = y) = const for all values of y (i.e. does not depend on y), then corr(X, Y) = 0
(but not vice versa!)
2.5 Estimations
In econometrics we often estimate unknown quantities. Let's suppose we have a
sample Y1, …, Yn of a random variable Y. We start with a simple problem: How
can we estimate the mean μY of Y (not the mean of Y1, …, Yn)?
Idea:
We could simply use the mean Ȳ of the sample Y1, …, Yn.
We could simply use the first observation Y1.
We could use the median of the sample Y1, …, Yn.
2.5.1 The distribution of Ȳ
The observations of the sample are drawn randomly.
Thus, the values of Y1, …, Yn are random.
Thus, functions of Y1, …, Yn are random (e.g. the sample mean Ȳ).
If we had drawn a different sample, the function (e.g. the sample mean)
would have a different value.
var(Ȳ) = σ²Y / n

Question: Does Ȳ converge to μY if n is large?
Law of large numbers: Ȳ is a consistent estimator of μY.
Formally: If Y1, …, Yn are i.i.d. and σ²Y < ∞, then Ȳ is a consistent estimator of μY,
i.e.

Ȳ →p μY

Central limit theorem: for large n, approximately

Ȳ ~ N(μY, σ²Y / n)
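As a rough plausibility check of var(Ȳ) = σ²Y/n (a sketch using the California data introduced above, with a sample standard deviation of testscr of about 19 and n = 420 districts — numbers taken from the outputs shown later):

```latex
\operatorname{se}(\bar Y) \;=\; \sqrt{\frac{s_Y^2}{n}} \;\approx\; \sqrt{\frac{19^2}{420}} \;\approx\; 0.93
```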
Of course, R knows distributions, too. In the following example we draw the density functions of two binomially distributed variables using dbinom.
par(mfrow=c(1,2))
x <- 1:10
plot(x/10, dbinom(x, size=10, prob=0.8))
x <- 750:850
plot(x/1000, dbinom(x, size=1000, prob=0.8))

[Figure: binomial densities for n = 10 (left) and n = 1000 (right)]
The distribution on the left, which is based on a small sample size, does not
quite look like a normal distribution. The distribution on the right is based on a
much larger sample size (n = 1000) and has a lot more similarities with a normal
distribution.
2.5.3 Why should we use Ȳ to estimate μY?
Ȳ is unbiased: E(Ȳ) = μY
Ȳ is consistent: Ȳ →p μY
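The unbiasedness claim follows in one line from the linearity of the expectation (assuming i.i.d. draws with E(Yi) = μY):

```latex
E(\bar Y) \;=\; E\!\left(\frac{1}{n}\sum_{i=1}^n Y_i\right) \;=\; \frac{1}{n}\sum_{i=1}^n E(Y_i) \;=\; \mu_Y
```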
2.5.4 Testing hypotheses

[Figure: rejection regions for a two-sided test and two one-sided tests]
p-value = PrH0( |Ȳ − μY,0| > |Ȳsample − μY,0| )                                   (1)
        = PrH0( |(Ȳ − μY,0) / (σY/√n)| > |(Ȳsample − μY,0) / (σY/√n)| )           (2)
        = PrH0( |(Ȳ − μY,0) / σȲ| > |(Ȳsample − μY,0) / σȲ| )                     (3)

[Figure: the p-value as the two tail areas Φ(−|g|) of the standard normal distribution beyond ±|g|]

If n is large: p-value = the probability that an N(0, 1)-distributed random
variable is outside of ± |(Ȳsample − μY,0) / σȲ|.

Statistic: g = (x̄ − μ0) / (σ/√n)
2.5.5 Estimating the variance of Ȳ

s²Y = 1/(n−1) · Σ_{i=1}^{n} (Yi − Ȳ)² = sample variance of Y

We demand E(Y⁴) < ∞, because the sample variance is calculated not from Yi
directly but from its square.
2.5.6 Calculating the p-value with the help of an estimated σ²Y

p-value = PrH0( |(Ȳ − μY,0) / (σY/√n)| > |(Ȳsample − μY,0) / (σY/√n)| )
        ≈ PrH0( |(Ȳ − μY,0) / (sY/√n)| > |(Ȳsample − μY,0) / (sY/√n)| )
        = PrH0( |t| > |tsample| )

[Figure: the p-value as the two tail areas beyond ±|t|]

H0 is rejected if p < α.
2.5.7 Relation between p-value and the level of significance
The level of significance is given. E.g. if the given level of significance is 5%, …
… the null hypothesis is rejected if |t| > 1.96,
… equivalently the null hypothesis is rejected if p < 0.05.
The p-value is also called the marginal level of significance.
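To make this concrete (a sketch using the t value −4.751 from the regression output reported later): for a two-sided test,

```latex
p \;=\; 2\,\Phi(-|t|) \;=\; 2\,\Phi(-4.751) \;\approx\; 2.0\times10^{-6} \;<\; 0.05
```

so the null hypothesis is rejected at any conventional level of significance.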
2.5.9 A comment
The theory of the t-distribution is a mathematically beautiful and interesting
result.
If Y is i.i.d. and normally distributed, we know the exact distribution of the
t statistic.
But
If the Y are not exactly normally distributed, this does not help us at all.
data(OFP, package="Ecdat")
hist(OFP[["faminc"]],breaks=40)
[Figure: Histogram of OFP[["faminc"]] (family income)]
2.6 Confidence intervals

Confidence interval for μY:

Ȳ + σȲ · Q(α/2) ≤ μY ≤ Ȳ + σȲ · Q(1 − α/2)

where Q(·) denotes the respective quantile of the standard normal distribution.

2.7 An alternative: The Bayesian Approach

posterior ∝ likelihood × prior
Here we use a numerical approximation to calculate the Bayesian posterior distribution for the mean of testscr. We employ the Gibbs sampler jags (which is
similar to Bugs).
The first lines specify the stochastic process (y[i] ~ dnorm(mu,tau)), the next
lines specify the priors. Here we use uninformed priors, mu ~ dnorm (0,.0001)
means that mu could take almost any value. The precision of the normal distribution (0.0001) is very small.
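The model below implements the usual Bayesian recipe (a sketch; N(·|μ, 1/τ) denotes the normal density parameterised, as in JAGS, by the precision τ):

```latex
p(\mu,\tau \mid y) \;\propto\; \underbrace{\prod_{i=1}^{n} \mathcal{N}(y_i \mid \mu, 1/\tau)}_{\text{likelihood}} \;\times\; \underbrace{p(\mu)\,p(\tau)}_{\text{priors}}
```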
library(runjags)
modelX <- "model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(mu,tau)
  }
  mu  ~ dnorm(0,.0001)
  tau ~ dgamma(.01,.01)
  sd <- sqrt(1/tau)
}"
bayesX <- run.jags(model=modelX, data=list(y=testscr), monitor=c("mu","sd"))
[Figure: cumulative distribution function of the prior for μ]

The prior distribution for μ (i.e. dnorm(0,.0001)) assigns (more or less) the
same a priori probability to any reasonable value of μ.
s <- 10^seq(-1,4.5,.1)
x <- 1 - pgamma(1/s^2,.01,.01)
xyplot(x ~ s, scales=list(x=list(log=T)))
[Figure: cumulative distribution function of the implied prior for the standard deviation]
JAGS now gives us a posterior distribution for μ and for the standard deviation
of testscr.
plot(bayesX, var="mu", type=c("trace","density"))

[Figure: trace and posterior density of mu]
plot(bayesX, var="sd", type=c("trace","density"))

[Figure: trace and posterior density of sd]
summary(bayesX)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

     Mean     SD Naive SE Time-series SE
mu 654.10 0.9329 0.006597       0.006387
sd  19.09 0.6630 0.004688       0.004688

2. Quantiles for each variable:

     2.5%    25%    50%    75%  97.5%
mu 652.28 653.46 654.11 654.73 655.93
sd  17.84  18.64  19.07  19.53  20.44
Comparison with the frequentist approach: The credible interval which can
be obtained from the last line of the summary is very similar to the confidence
interval from the frequentist approach.
Credible interval:
    2.5%    97.5%
652.2768 655.9267

Confidence interval:
               2.5 %  97.5 %
(Intercept) 652.3291 655.984
Also the estimated mean and its standard deviation are very similar to mean
and standard error of the mean from the frequentist approach.
Priors can be uninformed, mildly informed, or informed. Which priors are reasonable?
Example:
You measure the eye colour of your fellow students. You sample 5 students and
they all have blue eyes.
100% of your sample has blue eyes. You have no variance. How many of the
remaining students will have blue eyes? Can you give a confidence interval?
Informed priors: Above we used (similar to the frequentist approach) an uninformed prior. Here we will assume that we already know something. Actually,
we will pretend that we already did a similar study. That study gave us results
of similar precision but with a different mean. Here we pretend that our prior
distribution for μ is dnorm(664,1). Everything else remains the same.
library(runjags)
modelI <- "model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(mu,tau)
  }
  mu  ~ dnorm(664,1)
  tau ~ dgamma(.01,.01)
}"
bayesI <- run.jags(model=modelI, data=list(y=testscr), monitor="mu")
Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 1 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation
plot(bayesI, var="mu", type=c("trace","density"))

[Figure: trace and posterior density of mu under the informed prior]
Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

    Mean     SD Naive SE Time-series SE
mu 658.9 0.7124 0.005037       0.005344

2. Quantiles for each variable:

    2.5%   25%   50%   75% 97.5%
mu 657.5 658.4 658.9 659.4 660.3
We see that the informed prior has shifted the posterior away from the previous
results. The new results are now somewhere between the ones we got with an
uninformed prior and the new prior.
Comparison: Frequentist versus Bayesian approach
Frequentist: Null Hypothesis Significance Testing (Ronald A. Fisher, Statistical
Methods for Research Workers, 1925, p. 43)
Bayesian: (Thomas Bayes, 1702-1761; Metropolis et al., Equations of State Calculations by Fast Computing Machines. Journal of Chemical Physics, 1953.)
Pr(θ | X): X is fixed, θ is random.
[Figure: estimates with 95% confidence intervals for 20 groups]
A researcher who a priori only suspects group 16 to have μ ≠ 0 will find (correctly) a significant effect.
A researcher who does not have this a priori hypothesis, but who just carries
out 20 independent tests, must correct for multiple testing and will find no significant effect. After all, it is not surprising to find in 5% of all samples a 95%
confidence interval which does not include the null-hypothetical value.
2.8 Summary
Having started from these assumptions:
simple random samples of a population (Y1, …, Yn are i.i.d.)
E(Y⁴) < ∞
the sample is large (n is large)
2.9 Exercises
1. Revision I
In the following task we will refresh some basic concepts:
You have the following data about children's age (a) and the pocket money
(pm) they receive from their parents, for children in elementary school.

age in years (a): 6, 7, 6, 7, 8, 8, 9, 10, 9, 10

Calculate the median of pm.
2. Revision II
Define the following items:
Confidence interval
Histogram
Scatter plot
Box plot
3. First steps in R: I
Do the following tasks using R and the data from the exercise above on
children's pocket money.
Read the data into R assigning the names age and pm to the variables
age and pocket money, respectively.
Compute the descriptive statistics that you calculated for the exercise
above in R.
Visualize the data with a scatter plot.
Give a summary statistic about the hourly wage in cents in 1976 (wage76).
Draw a histogram on the hourly wage in 1976 (wage76).
Draw a scatter plot on the hourly wage (wage76) and the years of education (ed76) both in the year 1976.
Are the wage (wage76) and the years of education received (ed76) correlated? What does this result mean?
5. Female labor supply
Do the following tasks using R and the library Ecdat. Use the data set
Workinghours on female labor supply:
Are the hours worked by wives (hours) related to the other income of
the household (income)? Calculate the answer and illustrate it with a
graph.
Are the hours worked by wives (hours) related to the education they
received (education)? Calculate the answer and illustrate it with a
graph.
Does the number of hours worked by wives (hours) who have at least
one child below 6 differ compared to wives without children under 6?
Illustrate your answer with a graph.
Do wives who live in a home owned by the household work more
hours?
How many hours do wives below 26 years of age work on average? Is
this significantly more than wives of age 26 and above work?
3 Linear regression with a single regressor

data(Caschool)
attach(Caschool)
scatterplot(testscr ~ str)

[Figure: scatter plot of testscr against str]
Yi = β1 · Xi + β0 + ui,   i = 1, …, n

Y: dependent variable
X: independent variable
β1: slope
β0: axis intercept
u: error term
lm estimates an OLS regression. The result is saved to a variable (est1 in this case):

est1 <- lm(testscr ~ str)
To take a look at the result, we have to tell R to display it. E.g. we can use summary(est1).
R can also display the result graphically. For example we could type abline(est1).
Of course, we do not have to calculate these results manually. R can do that for
us.
lm(testscr ~ str, data=Caschool)

Call:
lm(formula = testscr ~ str, data = Caschool)

Coefficients:
(Intercept)          str
     698.93        -2.28
Approximating Y:

Ŷi = β̂1 · Xi + β̂0,   i = 1, …, n

Residuals:

ûi = Yi − Ŷi,   i = 1, …, n

0 ≤ R² ≤ 1
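The bound above follows from the usual decomposition (a sketch; ESS, SSR and TSS denote the explained, residual and total sums of squares):

```latex
R^2 \;=\; \frac{ESS}{TSS} \;=\; 1 - \frac{SSR}{TSS},
\qquad
ESS=\sum_{i=1}^n(\hat Y_i-\bar Y)^2,\quad
SSR=\sum_{i=1}^n \hat u_i^2,\quad
TSS=\sum_{i=1}^n(Y_i-\bar Y)^2
```

With a single regressor, R² equals the squared sample correlation of Y and X, which is what cor(testscr,str)^2 below confirms.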
Call:
lm(formula = testscr ~ str)

Residuals:
    Min      1Q  Median      3Q     Max
-47.727 -14.251   0.483  12.822  48.540

Coefficients:
             Estimate Std. Error t value   Pr(>|t|)
(Intercept)  698.9330     9.4675  73.825    < 2e-16 ***
str           -2.2798     0.4798  -4.751 0.00000278 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.58 on 418 degrees of freedom
Multiple R-squared: 0.05124,  Adjusted R-squared: 0.04897
F-statistic: 22.58 on 1 and 418 DF,  p-value: 0.000002783
cor(testscr,str)^2
[1] 0.0512401
SER = sqrt( 1/(n−2) · Σ_{i=1}^{n} (ûi − ū̂)² ) = sqrt( 1/(n−2) · Σ_{i=1}^{n} û²i )
The SER is not only a part of summary. We can also calculate it manually:
est <- lm(testscr~str)
sqrt(with(est,sum(residuals^2)/df.residual))
[1] 18.58097
3.2 OLS assumptions

Yi = β0 + β1 · Xi + ui,   i = 1, …, n

1. E(ui | Xi = x) = 0
2. (Xi, Yi) are i.i.d.
3. Large outliers in X and Y are rare (the fourth moments of X and Y exist)
3.2.1 Digression: The Existence of Moments

The Cauchy distribution has density f(x) = 1 / (π · (1 + x²)) and distribution
function F(x) = (1/π) · arctan(x) + 1/2. Its variance does not exist:

σ²X = ∫ x² / (π · (1 + x²)) dx = ∞
plot(dnorm, from=-4, to=4, ylab="density")
plot(dcauchy, from=-4, to=4, add=TRUE, lty=2)
legend("topleft", c("Normal","Cauchy"), lty=1:2)

[Figure: densities of the standard normal (solid) and Cauchy (dashed) distributions]
The sampling distribution of s² converges if X is a normally distributed random
variable. This is not the case for the Cauchy distribution.
In the following example, we draw two plots side by side. This is done using the
command par(mfrow=c(1,2)). We can always get back to the original state (one
diagram at a time) by typing par(mfrow=c(1,1)).
rnorm creates a vector of normally distributed (pseudo-)random variables.
rcauchy creates a vector of Cauchy distributed (pseudo-)random variables.
par(mfrow=c(1,2))
set.seed(127)
N <- 1000
z <- rnorm(N)
plot(1:N, sapply(1:N, function(x) var(z[1:x])), ylab="$\\sigma^2$", main="normal", t="l")
z <- rcauchy(N)
plot(1:N, sapply(1:N, function(x) var(z[1:x])), ylab="$\\sigma^2$", main="Cauchy", t="l")
[Figure: running sample variance of a normal sample (left) and a Cauchy sample (right)]
[Figure: simulated densities of β̂1: strH0dist (under the null) and strCIdist (under resampling)]
sd(strH0dist)
[1] 0.4673529
sd(strCIdist)
[1] 0.4957402
sqrt(diag(vcov(est)))["str"]
str
0.4798256
coef(est)["str"]
str
-2.279808
coef(est)["str"]/sd(strH0dist)
str
-4.87813
β̂0 and β̂1 are calculated by means of the sample. A different sample results
in different values for β̂0 and β̂1.
Just as there is a distribution for Ȳ, there is a distribution for β̂0 and β̂1.
Is E(β̂1) = β1? (Is OLS unbiased?)
Is var(β̂1) small?
How do we test hypotheses? (e.g. β1 = 0)
How do we calculate a confidence interval for β0 and β1?
Mean value and variance of β̂1

We are interested in β̂1 − β1. We know:

Yi = β0 + β1 · Xi + ui
Ȳ  = β0 + β1 · X̄ + ū

and therefore

Yi − Ȳ = β1 · (Xi − X̄) + (ui − ū)

Hence

β̂1 = Σᵢ (Xi − X̄)(Yi − Ȳ) / Σᵢ (Xi − X̄)²
   = Σᵢ (Xi − X̄) · (β1 · (Xi − X̄) + (ui − ū)) / Σᵢ (Xi − X̄)²
   = β1 · Σᵢ (Xi − X̄)(Xi − X̄) / Σᵢ (Xi − X̄)²  +  Σᵢ (Xi − X̄)(ui − ū) / Σᵢ (Xi − X̄)²

Now

Σᵢ (Xi − X̄)(ui − ū) = Σᵢ (Xi − X̄) · ui − ū · Σᵢ (Xi − X̄)
                     = Σᵢ (Xi − X̄) · ui − ū · (Σᵢ Xi − n · X̄)
                     = Σᵢ (Xi − X̄) · ui

Hence,

β̂1 − β1 = Σᵢ (Xi − X̄)(ui − ū) / Σᵢ (Xi − X̄)² = Σᵢ (Xi − X̄) · ui / Σᵢ (Xi − X̄)²

Now we can calculate E(β̂1) − β1:

E(β̂1 − β1) = E( Σᵢ (Xi − X̄) · ui / Σᵢ (Xi − X̄)² )
            = E( E( Σᵢ (Xi − X̄) · ui / Σᵢ (Xi − X̄)² | X1, …, Xn ) )
            = 0

i.e. E(β̂1) = β1.

Next we calculate the variance σ²β̂1:

β̂1 − β1 = Σᵢ (Xi − X̄) · ui / Σᵢ (Xi − X̄)²

Call vi = (Xi − X̄) · ui; furthermore we have s²X = Σᵢ (Xi − X̄)² / (n − 1), so that

β̂1 − β1 = (1/n) · Σᵢ vi / ( (n−1)/n · s²X )

Since (n−1)/n ≈ 1 for large n, we have

var(β̂1) = var(β̂1 − β1) ≈ var( (1/n) · Σᵢ vi / σ²X ) = (1/n) · var(vi) / (σ²X)² = (1/n) · var((Xi − μX) · ui) / σ⁴X
Summary
If the three OLS assumptions are true, …
1. E(ui | Xi = x) = 0
2. (Xi, Yi) are i.i.d.
3. Large outliers in X and Y are rare (the fourth moments of X and Y exist)
… then it is also true that …

E(β̂1) = β1   (β̂1 is unbiased)
var(β̂1) = (1/n) · var((Xi − μX) · ui) / σ⁴X
3.4 Distribution of β̂1

Mean value: E(β̂1) = β1

β̂1 − β1 = Σ_{i=1}^{n} (Xi − X̄) · ui / Σ_{i=1}^{n} (Xi − X̄)²

If n is large, then (1/n) · Σ_{i=1}^{n} (Xi − X̄) · ui is approximately normally distributed
(Central Limit Theorem).

var(β̂1) = (1/n) · var((Xi − X̄) · ui) / σ⁴X

β̂1 ~ N( β1, var((Xi − X̄) · ui) / (n · σ⁴X) )
The larger the variance of X, the smaller the variance of β̂1.

mathematically:  β̂1 ~ N( β1, var((Xi − X̄) · ui) / (n · σ⁴X) )

intuitively:
We determine the regression line for two cases: Firstly, we use the entire
sample.
est1<-lm(testscr~str)
summary(est1)
Call:
lm(formula = testscr ~ str)

Residuals:
    Min      1Q  Median      3Q     Max
-47.727 -14.251   0.483  12.822  48.540

Coefficients:
             Estimate Std. Error t value   Pr(>|t|)
(Intercept)  698.9330     9.4675  73.825    < 2e-16 ***
str           -2.2798     0.4798  -4.751 0.00000278 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.58 on 418 degrees of freedom
Multiple R-squared: 0.05124,  Adjusted R-squared: 0.04897
F-statistic: 22.58 on 1 and 418 DF,  p-value: 0.000002783
Then we use only those observations where str does not deviate much from the
mean value.
lowVar <- str>19 & str<21
est2<-lm(testscr~str,subset=lowVar)
summary(est2)
Call:
lm(formula = testscr ~ str, subset = lowVar)

Residuals:
   Min     1Q Median     3Q    Max
-46.98 -13.39   2.82  12.74  42.40

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  677.205     44.538  15.205   <2e-16 ***
str           -1.204      2.233  -0.539     0.59
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A third estimate, est2b, uses only the subsample lowSize:

est2b <- lm(testscr ~ str, subset=lowSize)
summary(est2b)

Call:
lm(formula = testscr ~ str, subset = lowSize)

Residuals:
    Min      1Q  Median      3Q     Max
-47.022 -13.591   0.844  12.196  48.722

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 704.4589    15.7465  44.737  < 2e-16 ***
str          -2.5993     0.7956  -3.267  0.00129 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.99 on 187 degrees of freedom
Multiple R-squared: 0.054,  Adjusted R-squared: 0.04894
F-statistic: 10.67 on 1 and 187 DF,  p-value: 0.001292
We observe that standard errors and p-values are larger in the second case.
The next diagram clarifies the problem:
plot(testscr ~ str)
points(testscr ~ str,col=2,subset=lowVar)
points(testscr ~ str,col=3,subset=lowSize)
abline(est1)
abline(est2,col=2)
abline(est2b,col=3)
[Figure: scatter plot of testscr against str with the three estimated regression lines (est1, est2, est2b)]
β̂1 →p β1   (β̂1 is consistent)

(β̂1 − E(β̂1)) / sqrt(var(β̂1)) ~ N(0, 1)
3.5 Distribution of β̂0

For large n, β̂0, too, is approximately normally distributed, with β̂0 ~ N(β0, σ²β̂0), where

σ²β̂0 = (1/n) · var(Hi · ui) / (E(H²i))²   and   Hi = 1 − (μX / E(X²i)) · Xi

3.6 Hypothesis tests for β̂1
The standard error of the estimator is derived from the estimated variance
of the estimator. For Ȳ we had

t = (Ȳ − μY,0) / σ̂Ȳ

Recall the theoretical variance of β̂1:

σ²β̂1 = (1/n) · var((Xi − μX) · ui) / σ⁴X

Similarly, the sample variance:

σ̂²β̂1 = (1/n) · (estimate for var((Xi − X̄) · ui)) / (estimate for σ²X)²
      = (1/n) · ( 1/(n−2) · Σ_{i=1}^{n} v̂²i ) / ( 1/n · Σ_{i=1}^{n} (Xi − X̄)² )²

with v̂i = (Xi − X̄) · ûi. The test statistic is

t = (β̂1 − β1,0) / σ̂β̂1
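Plugging in the numbers from the regression summary above (a quick check; estimate −2.2798, standard error 0.4798, null value β1,0 = 0):

```latex
t \;=\; \frac{\hat\beta_1 - \beta_{1,0}}{\hat\sigma_{\hat\beta_1}} \;=\; \frac{-2.2798 - 0}{0.4798} \;\approx\; -4.75
```

which matches the t value that summary reports for str.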
[Figure: the p-value as the two tail areas beyond ±|t|]
coef(est1)["str"] + qnorm(0.025) * sqrt(diag(vcov(est1)))["str"]
      str
-3.220249
coef(est1)["str"] + qnorm(0.975) * sqrt(diag(vcov(est1)))["str"]
      str
-1.339367
confint(est1)
                2.5 %     97.5 %
(Intercept) 680.32313 717.542779
str          -3.22298  -1.336637
We have already seen the p-value in the summary above. But we can also calculate it manually:
2 * pnorm (- abs(coef(est1) / sqrt(diag(vcov(est1)))))
   (Intercept)            str
0.000000000000 0.000002020858
We have just used the approximation to the normal distribution. R uses the t
distribution in the summary command.
2 * pt (- abs(coef(est1) / sqrt(diag(vcov(est1)))),est1$df.resid)
  (Intercept)           str
6.569925e-242  2.783307e-06
3.8 Bayesian Regression
modelR <- "model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(beta0 + beta1*x[i],tau)
  }
  beta0 ~ dunif (0,1200)
  beta1 ~ dnorm (0,.0001)
  tau   ~ dgamma(.01,.01)
}"
bayesR<-run.jags(model=modelR,data=list(y=testscr,x=str),
monitor=c("beta0","beta1"))
Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 2 variables....
Convergence may have failed for this run for 2 parameters after 10000
iterations (multi-variate psrf = 1.164)
Finished running the simulation
plot(bayesR, var="beta1", type=c("trace","density"))

[Figure: trace and posterior density of beta1]
summary(bayesR)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean     SD Naive SE Time-series SE
beta0 700.611 9.9421 0.070301        1.10111
beta1  -2.365 0.5038 0.003563        0.05573

2. Quantiles for each variable:

         2.5%     25%     50%     75%   97.5%
beta0 682.074 693.408 700.372 707.782 719.225
beta1  -3.305  -2.732  -2.353  -1.999  -1.423
coef(est1)["str"]
      str
-2.279808
sqrt(diag(vcov(est1)))
(Intercept)         str
  9.4674914   0.4798256
confint(est1)
                2.5 %     97.5 %
(Intercept) 680.32313 717.542779
str          -3.22298  -1.336637
As in section 2.7 we see that the credible intervals are similar to the frequentist
confidence intervals. The interpretation, however, is quite different. The credible
intervals make a direct statement about the probability that β1 is in a certain interval. Confidence intervals make a much more indirect statement which is harder
to interpret.
summary(lm(testscr ~ str))

Call:
lm(formula = testscr ~ str)

Residuals:
    Min      1Q  Median      3Q     Max
-47.727 -14.251   0.483  12.822  48.540

Coefficients:
             Estimate Std. Error t value   Pr(>|t|)
(Intercept)  698.9330     9.4675  73.825    < 2e-16 ***
str           -2.2798     0.4798  -4.751 0.00000278 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.58 on 418 degrees of freedom
Multiple R-squared: 0.05124,  Adjusted R-squared: 0.04897
F-statistic: 22.58 on 1 and 418 DF,  p-value: 0.000002783
Standard errors are often shown in parentheses below the estimated coefficients.
The estimated regression line is testscr = 698.933 − 2.2798 · str.
The standard error of β̂0 is 9.4675.
The standard error of β̂1 is 0.4798.
The R² = 0.05; the standard error of the residuals is SER = 18.58.
These are almost all of the numbers we need to perform a hypothesis test and calculate confidence intervals.
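The confidence interval reported by confint above can be rebuilt from exactly these numbers. A minimal sketch in R (the estimate and standard error are copied from the summary output; qt gives the t quantile for 418 degrees of freedom):

```r
# 95% confidence interval for the str coefficient, rebuilt by hand from
# the summary output above (estimate -2.2798, standard error 0.4798).
est <- -2.2798
se  <- 0.4798
ci  <- est + qt(c(.025, .975), df = 418) * se
round(ci, 3)   # close to the confint() result above
```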
Nominal / discrete variables:
- sex
- profession
- sector of a firm
- income in categories

Binary variables (dummy variables) are a special case of nominal variables:
- sex: male/female
Call:
lm(formula = testscr ~ large)

Residuals:
    Min      1Q  Median      3Q     Max 
-50.435 -14.071  -0.285  12.778  49.565 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  657.185      1.202  546.62  < 2e-16 ***
largeTRUE     -7.185      1.852   -3.88 0.000121 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.74 on 418 degrees of freedom
Multiple R-squared: 0.03476,  Adjusted R-squared: 0.03245
F-statistic: 15.05 on 1 and 418 DF,  p-value: 0.0001215
[Figure: testscr against str (scatter plot) and testscr by large (box plots)]
confint(est1)
                 2.5 %     97.5 %
(Intercept) 654.82130 659.547833
largeTRUE   -10.82554  -3.544715

t.test(testscr ~ large)

tapply(testscr,large,sd)
   FALSE     TRUE 
19.28629 17.96589 
x <- runif(1000)
u <- rnorm(1000)
y <- 10 - 3.1*x + u
plot(y ~ x)
[Figure: scatter plot of y against x]
est<-lm(y ~ x)
plot(est,which=1:2)
[Figure: Residuals vs Fitted and Normal Q-Q diagnostic plots for est]
u2 <- rnorm(1000)*x
y2 <- 10 - 3.1*x + u2
plot(y2 ~ x)
[Figure: scatter plot of y2 against x]
est2<-lm(y2 ~ x)
plot(est2,which=1:2)
[Figure: Residuals vs Fitted and Normal Q-Q diagnostic plots for est2]
In both examples it is true that E(ui | Xi = x) = 0.
In the first example u is homoscedastic, in the second example u is heteroscedastic.
data(uswages,package="faraway")
plot(wage ~ educ,data=uswages)
plot(lm(wage ~ educ,data=uswages),which=1:2)
[Figure: wage against educ, with Residuals vs Fitted and Normal Q-Q diagnostic plots]
data(Caschool,package="Ecdat")
attach(Caschool)
est <- lm(testscr ~ str)
plot(testscr ~ str)
plot(est,which=1:2)
[Figure: testscr against str, with Residuals vs Fitted and Normal Q-Q diagnostic plots]
$$\mathrm{var}(\hat\beta_1) = \frac{1}{n}\cdot\frac{\big(E(X_i-\mu_X)\big)^2\,\mathrm{var}(u_i) + \big(E(u_i)\big)^2\,\mathrm{var}(X_i-\mu_X) + \mathrm{var}(X_i-\mu_X)\,\mathrm{var}(u_i)}{\sigma_X^4} = \frac{1}{n}\cdot\frac{\sigma_X^2\,\sigma_u^2}{\sigma_X^4} = \frac{1}{n}\cdot\frac{\sigma_u^2}{\sigma_X^2}$$

Under homoscedasticity the standard error of β̂1 is

$$\hat\sigma_{\hat\beta_1} = \sqrt{\frac{1}{n}\cdot\frac{\frac{1}{n-2}\sum_{i=1}^n \hat u_i^2}{\frac{1}{n}\sum_{i=1}^n (X_i-\bar X)^2}}$$

In general (allowing for heteroscedasticity)

$$\hat\sigma^2_{\hat\beta_1} = \frac{1}{n}\cdot\frac{\widehat{\mathrm{var}}\big((X_i-\bar X)\,\hat u_i\big)}{\hat\sigma_X^4},\qquad \hat\sigma_{\hat\beta_1} = \sqrt{\frac{1}{n}\cdot\frac{\frac{1}{n-2}\sum_{i=1}^n (X_i-\bar X)^2\,\hat u_i^2}{\Big(\frac{1}{n}\sum_{i=1}^n (X_i-\bar X)^2\Big)^2}}$$
The formula for the case of homoscedastic error terms is simpler, but it is only correct if the assumption of homoscedastic error terms is actually satisfied.
Since the formulas are different, we usually get different results.
Homoscedasticity is the default setting of the software (if not the only possible setting).
Typically, it will give us smaller standard errors for β̂ than the formula for heteroscedasticity.
hccm calculates the variance-covariance matrix for β̂ under the assumption of heteroscedastic residuals.
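What hccm computes can be sketched by hand. The following is a minimal illustration of the sandwich formula (the simple HC0 variant, with simulated data and base R only), not the exact small-sample correction hccm applies by default:

```r
# Heteroscedasticity-consistent (HC0) standard errors by hand:
# (X'X)^{-1} X' diag(u^2) X (X'X)^{-1}
set.seed(1)
n <- 100
x <- runif(n)
y <- 2 + 3*x + rnorm(n)*x          # errors scale with x: heteroscedastic
X <- cbind(1, x)
u <- residuals(lm(y ~ x))
bread <- solve(t(X) %*% X)
meat  <- t(X) %*% diag(u^2) %*% X
V <- bread %*% meat %*% bread       # robust variance-covariance matrix
sqrt(diag(V))                       # robust standard errors
```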
est <- lm(testscr ~ str)
summary(est)
Call:
lm(formula = testscr ~ str)

Residuals:
    Min      1Q  Median      3Q     Max 
-47.727 -14.251   0.483  12.822  48.540 

Coefficients:
            Estimate Std. Error t value   Pr(>|t|)    
(Intercept) 698.9330     9.4675  73.825    < 2e-16 ***
str          -2.2798     0.4798  -4.751 0.00000278 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.58 on 418 degrees of freedom
Multiple R-squared: 0.05124,  Adjusted R-squared: 0.04897
F-statistic: 22.58 on 1 and 418 DF,  p-value: 0.000002783
Call:
lm(formula = testscr ~ str)

Residuals:
    Min      1Q  Median      3Q     Max 
-47.727 -14.251   0.483  12.822  48.540 

Coefficients:
            Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 698.9330    10.4605  66.816   < 2e-16 ***
str          -2.2798     0.5244  -4.348 0.0000173 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.58 on 418 degrees of freedom
Multiple R-squared: 0.05124,  Adjusted R-squared: 0.04897
F-statistic: 18.9 on 1 and 418 DF,  p-value: 0.00001729
            (Intercept)       str
(Intercept)    9.467491       NaN
str                 NaN 0.4798256
sqrt(hccm(est))
            (Intercept)       str
(Intercept)    10.46053       NaN
str                 NaN 0.5243585
We can calculate confidence intervals for β̂.
We can test hypotheses about β̂.

A large amount of econometric analysis is presented in the form of OLS.
One reason for this is that many people understand how OLS works.
Whenever we use a different estimator we run the risk of not being understood by others.
Is that enough of a reason to use OLS?
Are there better estimators? Estimators with a lower variance?
To answer this question we will make additional assumptions.
Assumptions 4 and 5 are more restrictive; they are warranted less often.

Gauss-Markov: Assuming 1-4, β̂1 has the smallest variance of all linear estimators (of all estimators which are linear functions of Y).

Efficiency of OLS-II: Assuming 1-5, β̂1 has the smallest variance of all consistent estimators as n → ∞ (regardless of whether the estimators are linear or non-linear).
OLS: $\min_{b_0,b_1} \sum_{i=1}^n \big(Y_i - (b_0 + b_1 X_i)\big)^2$

LAD: $\min_{b_0,b_1} \sum_{i=1}^n \big|Y_i - (b_0 + b_1 X_i)\big|$

However, OLS is used in most applications; we will do the same here.
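The difference between the two objectives matters when there are outliers. A small sketch (base R only; LAD is minimized numerically with optim here instead of using quantreg):

```r
# One large outlier pulls the OLS slope far more than the LAD slope.
set.seed(42)
x <- 1:20
y <- 2 + 0.5*x + rnorm(20, sd = 0.5)
y[20] <- 50                                    # one large outlier
ols <- coef(lm(y ~ x))
lad <- optim(c(0, 1), function(b) sum(abs(y - b[1] - b[2]*x)))$par
ols[2]; lad[2]   # the OLS slope is inflated, the LAD slope much less so
```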
library(quantreg)
data(engel)
attach(engel)
[Figure: foodexp against income (Engel data), observation 138 marked]
The estimation result depends on the inclusion or exclusion of observation 138:
lm(foodexp ~ income)

Call:
lm(formula = foodexp ~ income)

Coefficients:
(Intercept)       income  
   147.4754       0.4852  

lm(foodexp ~ income,data=engel[-138,])

Call:
lm(formula = foodexp ~ income, data = engel[-138, ])

Coefficients:
(Intercept)       income  
    91.3330       0.5465  
plot(foodexp ~ income)
text(engel[138,1],engel[138,2],138,pos=2)
est <- lm(foodexp ~ income)
abline(est)
abline(lm(foodexp ~ income,data=engel[-138,]),lty=2)
legend("bottomright",c("all","138 dropped"),lty=1:2,cex=.5)
plot(est,which=2)
[Figure: foodexp against income with regression lines for all observations and with observation 138 dropped; Normal Q-Q plot of the residuals]
More generally, such estimators minimize $\sum_{i=1}^n \rho\big(Y_i - b_0 - b_1 X_i\big)$, where

1. $\rho(x) = x^2$ : OLS
2. $\rho(x) = |x|$ : LAD (quantile regression)
3. $\rho(x) = \begin{cases} x^2/2 & \text{if } |x| \leq c \\ c|x| - c^2/2 & \text{else} \end{cases}$ : Huber's method. c is an estimated value for $\sigma_u$.

[Figure: the functions ρ(x) for OLS, LAD, and Huber]
rq performs a quantile regression, minimizing the sum of the absolute values of the residuals. rlm performs a robust regression.
LAD:
library(quantreg)
summary(rq(foodexp ~ income))
Huber:
library(MASS)
summary(rlm(foodexp ~ income))
plot(foodexp ~ income)
abline(lm(foodexp ~ income))
abline(rq(foodexp ~ income),lty=2)
abline(rlm(foodexp ~ income),lty=3)
legend("bottomright",c("OLS","LAD","Huber"),lty=1:3)
[Figure: foodexp against income with OLS, LAD, and Huber regression lines]
plot(bayesR,var="beta1",type=c("trace","density"),newwindows=FALSE)

[Figure: trace and density plot for beta1]
summary(bayesR)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean       SD  Naive SE Time-series SE
beta0 143.872 15.91473 0.1125341      0.3198132
beta1   0.488  0.01429 0.0001011      0.0002814

2. Quantiles for each variable:

          2.5%      25%     50%      75%    97.5%
beta0 112.6957 133.1380 143.882 154.6010 174.6390
beta1   0.4602   0.4785   0.488   0.4976   0.5164
So far our results are very similar to the (non-robust) OLS results from above.
The Normal distribution does not put much weight on its tails. In other words, observations (outliers) which are several standard deviations away from the expected value are very unlikely. When we use the Normal distribution above we implicitly shift the estimator closer to these observations, so that the distance between posterior estimate and observation becomes smaller.
A distribution which may (but need not) put more weight on its tails, and which still contains the Normal distribution as a special case, is the t-distribution. If the degrees of freedom are large, the t-distribution is very close to the Normal distribution. If the degrees of freedom are small, the t-distribution has very fat tails.
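The tail probabilities make this concrete. A quick check in R (the printed values are approximate):

```r
# Probability of an observation more than 3 standard units out:
2 * pnorm(-3)     # Normal: about 0.0027
2 * pt(-3, 20)    # t with 20 degrees of freedom: about 0.007
2 * pt(-3, 1)     # t with 1 degree of freedom (Cauchy): about 0.2
```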
[Figure: densities of the t-distribution with 20 and with 1 degrees of freedom]
summary(bayesRR20)
Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean       SD  Naive SE Time-series SE
beta0 97.8757 15.84981 0.1120751      0.4986725
beta1  0.5389  0.01606 0.0001136      0.0004964

2. Quantiles for each variable:

         2.5%     25%    50%      75%    97.5%
beta0 66.5581 87.4478 97.828 108.3922 129.4566
beta1  0.5067  0.5284  0.539   0.5495   0.5705
Bayesian, k = 1: Let us compare this with a small value for the degrees of freedom:

modelRR1<-'model {
  for (i in 1:length(y)) {
    y[i] ~ dt(beta0 + beta1*x[i],tau,1)
  }
  beta0 ~ dnorm(0,.0001)
  beta1 ~ dnorm(0,.0001)
  tau   ~ dgamma(.01,.01)
}'
bayesRR1<-run.jags(model=modelRR1,data=list(y=foodexp,x=income),
monitor=c("beta0","beta1"))
Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 2 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation
summary(bayesRR1)
Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean       SD  Naive SE Time-series SE
beta0 62.3640 14.47629 0.1023629      0.5464357
beta1  0.5892  0.01818 0.0001286      0.0006682

2. Quantiles for each variable:

         2.5%     25%     50%    75%   97.5%
beta0 35.2204 52.2198 62.0430 72.192 90.9461
beta1  0.5533  0.5767  0.5897  0.602  0.6234
modelRR<-'model {
  for (i in 1:length(y)) {
    y[i] ~ dt(beta0 + beta1*x[i],tau,k)
  }
  beta0 ~ dnorm(0,.0001)
  beta1 ~ dnorm(0,.0001)
  tau   ~ dgamma(.01,.01)
  k     ~ dexp(1/30)
}'
bayesRR<-run.jags(model=modelRR,data=list(y=foodexp,x=income),
monitor=c("beta0","beta1","k"))
Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 3 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation
k<-10^(seq(-.5,2,.1))
xyplot(pexp(k,1/30) ~ k,type="l",scales=list(x=list(log=T)),xscale.components = xscale.compone
[Figure: pexp(k,1/30) against k, logarithmic scale for k]
A t-distribution with 20 degrees of freedom is very similar to a normal distribution. If our prior for k follows dexp(k,1/30), then the probability for k > 20
is (slightly) larger than 1/2, i.e. we are giving the traditional model (of an almost
normal distribution) a very good chance.
plot(bayesRR,var="k",type=c("trace","density"))
[Figure: trace and density plot for k]
plot(bayesRR,var="beta1",type=c("trace","density"))
[Figure: trace and density plot for beta1]
summary(bayesRR)
Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean       SD  Naive SE Time-series SE
beta0 79.4320 15.32741 0.1083812      0.5138090
beta1  0.5622  0.01785 0.0001262      0.0006043
k      3.5468  0.86702 0.0061308      0.0131400

2. Quantiles for each variable:

         2.5%     25%     50%    75%    97.5%
beta0 48.6728 69.2156 79.5799 89.814 108.5751
beta1  0.5284  0.5498  0.5619  0.574   0.5984
k      2.2482  2.9405  3.4172  4.001   5.6210
We see that our estimate of the slope (based on the t-distribution) is larger, similar to the LAD and Huber regressions.
Different from LAD and Huber, the regression itself has chosen the optimal way (the optimal value of k) to accommodate outliers.
3.14 Exercises
1. Regressions I
Define the following items:
Regression
Independent variable
Dependent variable
Give the formula for a linear regression with a single regressor.
What does β1 indicate?
What does u indicate?
2. Regressions II
Use the data set Crime of the library Ecdat in R.
How do you interpret positive (negative) coefficients in simple linear
models?
What is the influence of the number of policemen per capita (polpc) on the crime rate in crimes committed per person (crmrte)? Interpret your result. Do you have an explanation for this result?
What is the correlation coefficient of the number of policemen per capita (polpc) and the crime rate (crmrte)? Interpret the result.
What do the standard errors tell you?
What does R2 indicate?
3. Regressions III
List and explain the assumptions that have to be fulfilled to be able to use
OLS.
4. Classes of variables
Explain and give examples of the following types of variables:
continuous
discrete
binary
5. Dummies
Use the data set BudgetFood of the library Ecdat in R.
What is a dummy variable? Which values can it take?
Draw a scatter plot with age on the x-axis and pm on the y-axis.
Draw the same scatter plot, this time without box plots and without
the lowess spline, but with a linear regression line.
Label your scatter plot with age on the x-axis and pocket money on the y-axis. Give your graph the title "Children's pocket money".
7. Exam 28.7.2007, exercise 5a+d
Your task is to work on a hypothetical data set in R.
The variable names A, B, C, D, E, and Year are in the header of your data file
file.csv. The data set contains 553 observations in the format .csv (comma
separated values). Explain what the following commands do and choose
the correct one (with explanation).
First, read your data set into R.
Create a dummy variable which takes the value 1 if the size of the sales
floor is > 120 sqm and 0 otherwise.
Draw a graph with separate box plots for large and small sales floors
on the sales per square meter.
Measure the influence of the size of the sales floor on the sales per
square meter.
Do the same task as above, this time using your variable for large sales
floors.
9. Heteroscedasticity
What is heteroscedasticity?
Give an example of data where the residual variances differ along the dimension of a second variable.
What is homoscedasticity?
Look at the variables the data set contains. Formulate some sensible
hypotheses and test them in R.
12. Wages
Use the data set Wages1 of the library Ecdat in R.
Do you think that gender (sex) matters when it comes to wages (wage)?
Check your assumption in R.
Are years of education received (school) and gender correlated with
each other?
Do you think that experience (exper) or years of schooling matter more
for wage? Check your assumption in R. Which tests could you use to
test it?
Do employees with a college education (more than 12 years of education) earn more than those without? Test this in R. Which type of variable do you use to answer this question?
Do you think that our models above are well specified? How would
you change the model if you could?
How can we include more than one factor at the same time?
Keep one factor constant by only looking at a small group (e.g. all students with a very similar elpct (english learner percentage))
The subset option of the command lm limits the estimation to a certain part of the dataset.
data(Caschool)
attach(Caschool)
summary(elpct)
   Min. 1st Qu.  Median    Max. 
  0.000   1.941   8.778  85.540 
Call:
lm(formula = testscr ~ str, subset = (elpct < 9))

Coefficients:
(Intercept)          str  
    680.252       -0.835  

Call:
lm(formula = testscr ~ str, subset = (elpct >= 9 & elpct < 23))

Coefficients:
(Intercept)          str  
    696.445       -2.231  

Call:
lm(formula = testscr ~ str, subset = (elpct >= 23))

Coefficients:
(Intercept)          str  
   653.0746      -0.8656  
[Figure: testscr against str with separate regression lines for the elpct groups [0,9], (9,23], and (23,100]]
generally:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$$
for every observation:
$$y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_k x_{1k} + u_1$$
$$y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_k x_{2k} + u_2$$
$$\vdots$$
$$y_n = \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_k x_{nk} + u_n$$
Call:
lm(formula = testscr ~ str + elpct)

Coefficients:
(Intercept)          str        elpct  
   686.0322      -1.1013      -0.6498  
$$y = \begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{pmatrix};\quad X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k}\\ 1 & x_{21} & x_{22} & \cdots & x_{2k}\\ 1 & x_{31} & x_{32} & \cdots & x_{3k}\\ \vdots & & & \ddots & \vdots\\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix};\quad \beta = \begin{pmatrix} \beta_0\\ \beta_1\\ \beta_2\\ \vdots\\ \beta_k \end{pmatrix};\quad u = \begin{pmatrix} u_1\\ u_2\\ u_3\\ \vdots\\ u_n \end{pmatrix}$$
Addition of two n×m matrices is elementwise. Multiplication of an n×m matrix A with an m×k matrix B gives an n×k matrix AB with elements
$$(AB)_{ij} = \sum_{l=1}^m a_{il}\,b_{lj}$$
We define vectors with c(...). We can then stack vectors as rows or as columns with rbind(...) or cbind(...).
A <- rbind(c(1,2,3),c(4,5,6))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

B <- cbind(c(2,2,2),c(3,3,3))
     [,1] [,2]
[1,]    2    3
[2,]    2    3
[3,]    2    3
+ adds matrices elementwise. This requires that the matrices have the same dimensions. In this example we can not calculate A+B but we can calculate A+t(B).

A + t(B)
     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    7    8    9

* multiplies the elements of a matrix. This is not the usual matrix multiplication:

A * t(B)
     [,1] [,2] [,3]
[1,]    2    4    6
[2,]   12   15   18
%*% is the usual matrix multiplication:

A %*% B
     [,1] [,2]
[1,]   12   18
[2,]   30   45
In matrix notation the model is
$$y = X\beta + u$$
with y, X, β, and u as defined above. The residuals are
$$u = y - X\beta$$
The sum of squares of the residuals is
$$S(\beta) = \sum_{i=1}^n u_i^2 = u'u = (y - X\beta)'(y - X\beta) = y'y - y'X\beta - \beta'X'y + \beta'X'X\beta = y'y - 2\beta'X'y + \beta'X'X\beta$$
(recall: $(AB)' = B'A'$)

To minimize S(β), we take the first derivative with respect to β:
$$\frac{\partial S(\beta)}{\partial\beta} = -2X'y + 2X'X\beta \stackrel{!}{=} 0$$
Normal equations:
$$X'X\hat\beta = X'y \qquad (7)$$
$$\hat\beta = (X'X)^{-1}X'y = X^+ y \qquad\text{with}\qquad X^+ = (X'X)^{-1}X'$$
As an exercise: show that
$$XX^+X = X,\qquad X^+XX^+ = X^+,\qquad (XX^+)' = XX^+,\qquad X^+X = I$$
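The normal equations are easy to check numerically. A minimal sketch with simulated data, comparing the matrix-algebra solution with lm:

```r
# beta-hat = (X'X)^{-1} X'y reproduces the lm() coefficients.
set.seed(1)
x <- runif(50)
y <- 1 + 2*x + rnorm(50)
X <- cbind(1, x)
betahat <- solve(t(X) %*% X) %*% t(X) %*% y
all.equal(as.vector(betahat), unname(coef(lm(y ~ x))))   # TRUE
```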
Call $\hat y = X\hat\beta$ and $\hat u = y - \hat y$.

Orthogonality:
$$X'\hat u = X'(y - \hat y) = X'y - X'X\hat\beta = X'y - X'X(X'X)^{-1}X'y = 0$$
$$\hat y'\hat u = \hat\beta'X'\hat u = 0$$
Multiplying the normal equation $X'X\hat\beta = X'y$ with $\hat\beta'$ yields
$$\hat\beta'X'X\hat\beta = \hat\beta'X'y$$
then
$$\hat u'\hat u = (y - X\hat\beta)'(y - X\hat\beta) = y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta = y'y - 2\hat\beta'X'X\hat\beta + \hat\beta'X'X\hat\beta = y'y - \hat\beta'X'X\hat\beta = y'y - (X\hat\beta)'X\hat\beta = y'y - \hat y'\hat y$$
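This identity is easy to verify numerically (a sketch with simulated data):

```r
# Check u'u = y'y - yhat'yhat for an OLS fit.
set.seed(2)
x <- runif(30)
y <- 3 - x + rnorm(30)
fit <- lm(y ~ x)
lhs <- sum(residuals(fit)^2)
rhs <- sum(y^2) - sum(fitted(fit)^2)
all.equal(lhs, rhs)   # TRUE
```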
Dividing by n and subtracting $\bar y^2$:
$$\frac{1}{n}y'y = \frac{1}{n}\hat y'\hat y + \frac{1}{n}\hat u'\hat u$$
$$\frac{1}{n}y'y - \bar y^2 = \frac{1}{n}\hat y'\hat y - \bar y^2 + \frac{1}{n}\hat u'\hat u$$
$$\underbrace{s_y^2}_{TSS} = \underbrace{s_{\hat y}^2}_{ESS} + \underbrace{s_{\hat u}^2}_{SSR}$$
$$R^2 = \frac{ESS}{TSS} = \frac{s_{\hat y}^2}{s_y^2} = 1 - \frac{s_{\hat u}^2}{s_y^2}$$
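The decomposition gives a hand calculation of R² (a sketch; with an intercept the residuals have mean zero, so sample variances can be used directly):

```r
# R^2 = 1 - s^2_uhat / s^2_y, compared with the value summary() reports.
set.seed(3)
x <- runif(100)
y <- 1 + x + rnorm(100)
fit <- lm(y ~ x)
r2 <- 1 - var(residuals(fit)) / var(y)
all.equal(r2, summary(fit)$r.squared)   # TRUE
```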
French paradox: despite red wine and foie gras, fewer illnesses of the coronary blood vessels (Samuel Black, 1819).
(missing variable: percentage of fish and sugar in the diet, ...)
c Oliver Kirchkamp
18
1966
1965
1953
1963
1962
1961 1960
16
1967
1958
1954
1956
1957
1955
1959
1972
1968
1971
1969
1975
1973
1970
14
birth rate
1964
12
1974
1976
1977
50
100
150
200
10
Scotland
Canada
England
Sweden
Norway
Ireland
Belgium
Netherlands
Denmark
Austria
Germany
Mortality
Italy
Switzerland
c Oliver Kirchkamp
88
France
20
40
60
Wine [l/year]
Mortality due to coronary heart disease (per 1000 men, 55 64 years). St. Leger
A.S., Cochrane, A.L. and Moore, F. (1979). Factors Associated with Cardiac Mor-
89
| {z }
X+
b1
(X1 X1 )1 X1 y
(X1 X1 )1 X1 (X1 1 + X2 2 + u)
E(b1 ) =
1 + (X1 X1 )1 X1 X2 2
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$$
$X_2$ is correlated with $X_1$, e.g.:
$$X_2 = \gamma X_1 + \nu$$
Omitting $X_2$ means:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2(\gamma X_1 + \nu) + u = \beta_0 + (\beta_1 + \gamma\beta_2)X_1 + \beta_2\nu + u$$
We overestimate $\beta_1$ if $\gamma\beta_2 > 0$.
We underestimate $\beta_1$ if $\gamma\beta_2 < 0$.
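A simulation makes the direction and size of the bias visible. A sketch in R, where the numbers 0.8 and 1.5 play the roles of γ and β2:

```r
# Omitted-variable bias: leaving out x2 biases the x1 coefficient
# towards beta1 + gamma*beta2 = 2 + 0.8*1.5 = 3.2.
set.seed(4)
n  <- 10000
x1 <- rnorm(n)
x2 <- 0.8*x1 + rnorm(n)
y  <- 1 + 2*x1 + 1.5*x2 + rnorm(n)
bshort <- coef(lm(y ~ x1))["x1"]        # near 3.2 (biased)
blong  <- coef(lm(y ~ x1 + x2))["x1"]   # near 2 (unbiased)
bshort; blong
```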
4.6 Multicollinearity
Example
$$\text{testscr} = \beta_0 + \beta_1\,\text{str} + \beta_2\,\text{elpct}$$
Now, we extend the model by adding another variable, the fraction of English learners FracEL=elpct/100:
$$\text{testscr} = \beta_0 + \beta_1\,\text{str} + \beta_2\,\text{elpct} + \beta_3\,\text{FracEL}$$
FracEL<-elpct/100
Call:
lm(formula = testscr ~ str + elpct)

Residuals:
    Min      1Q  Median      3Q     Max 
-48.845 -10.240  -0.308   9.815  43.461 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 686.03225    7.41131  92.566  < 2e-16 ***
str          -1.10130    0.38028  -2.896  0.00398 ** 
elpct        -0.64978    0.03934 -16.516  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.46 on 417 degrees of freedom
Multiple R-squared: 0.4264,  Adjusted R-squared: 0.4237
F-statistic: 155 on 2 and 417 DF,  p-value: < 2.2e-16
Call:
lm(formula = testscr ~ str + elpct + FracEL)

Residuals:
    Min      1Q  Median      3Q     Max 
-48.845 -10.240  -0.308   9.815  43.461 
FracEL<-elpct/100+rnorm(4)*.0000001
summary(lm(testscr ~ str + elpct + FracEL))
Call:
lm(formula = testscr ~ str + elpct + FracEL)

Residuals:
    Min      1Q  Median      3Q     Max 
-48.608 -10.063  -0.152   9.613  43.857 

Coefficients:
                  Estimate   Std. Error t value Pr(>|t|)    
(Intercept)       685.9887       7.4172  92.486  < 2e-16 ***
str                -1.1040       0.3806  -2.901  0.00392 ** 
elpct          -53520.1940   87264.2711  -0.613  0.54001    
FracEL        5351954.3231 8726426.9534   0.613  0.54001    
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.48 on 416 degrees of freedom
Multiple R-squared: 0.4269,  Adjusted R-squared: 0.4228
F-statistic: 103.3 on 3 and 416 DF,  p-value: < 2.2e-16
We notice that R detects the multicollinearity on its own. It simplifies the model
accordingly.
But this does not always work.
We slightly perturb the variable. This can happen accidentally (e.g. through
rounding errors). Multicollinearity between the variables is no longer perfect.
We get the same result for all coefficients, but any (ever so slight) perturbation
changes the result considerably. The standard errors get very large.
elpct[1:5]/100
[1] 0.00000000 0.04583333 0.30000002 0.00000000 0.13857677

elpct[1:5]/100+rnorm(4)*.0000001
[1]  0.00000001292877  0.04583350642929  0.30000006516511 -0.00000012650612
[5]  0.13857678752594

elpct[1:5]/100+rnorm(4)*.0000001
[1] -0.00000006868529  0.04583329035659  0.30000014148167  0.00000003598138
[5]  0.13857670591188

elpct[1:5]/100+rnorm(4)*.0000001
[1] 0.00000004007715 0.04583334599106 0.29999996348937 0.00000017869131
[5] 0.13857681467431
perturbedEstimate(1)
    elpct      FracEL 
 56766.73 -5676738.56 

perturbedEstimate(1)
    elpct     FracEL 
-64347.34 6434669.12 

[Figure: estimated coefficients of FracEL against elpct under repeated perturbation]
Large coefficients for elpct are balanced by small coefficients for FracEL. What happened? What is the true relationship?
$$\text{testscr} = 686.0322 - 1.1013\,\text{str} - 0.6498\,\text{elpct}$$
$$\text{FracEL} = \text{elpct}/100$$
hence, for any a,
$$\text{testscr} = 686.0322 - 1.1013\,\text{str} + (a - 0.6498)\,\text{elpct} - 100a\cdot\text{elpct}/100$$
$$\text{testscr} = 686.0322 - 1.1013\,\text{str} + (a - 0.6498)\,\text{elpct} - 100a\cdot\text{FracEL}$$
The coefficients cannot be identified anymore.
4.6.1 Example 2
The dummy variable NVS assumes the value 1 if str > 12 (the group is not very small):
NVS <- str>12
lm(testscr ~ str + elpct + NVS)
Call:
lm(formula = testscr ~ str + elpct + NVS)

Coefficients:
(Intercept)          str        elpct      NVSTRUE  
   686.0322      -1.1013      -0.6498           NA  
The new variable NVS is always TRUE and, hence, it is perfectly correlated with
the constant term. Explanation: There are no groups with str < 12. Therefore,
we cannot assess the effect of such a small group size.
4.6.2 Example 3
ESpct = 100 − elpct

ESpct <- 100 - elpct
lm(testscr ~ str + elpct + ESpct)

Call:
lm(formula = testscr ~ str + elpct + ESpct)

Coefficients:
(Intercept)          str        elpct        ESpct  
   686.0322      -1.1013      -0.6498           NA  
coef(est)[3:4]
}
estList <- sapply(1:100,perturbedEstimate2)
plot(t(estList),main="multicollinearity 2, estimated coefficients")

[Figure: estimated coefficients of ESpct against elpct under repeated perturbation]
The variance inflation factor for regressor i is
$$\frac{1}{1 - R_i^2}$$
where $R_i^2$ is the $R^2$ of a regression of regressor i on all other regressors.
In the following example we build an (almost) linearly dependent variable elpct2. Additionally, we add an obviously pointless regressor to the equation: the number of the school district.
set.seed(123)
elpct2 <- elpct + rnorm(4)
est <- lm (testscr ~ str + elpct2 + elpct + as.numeric(district))
summaryR(est)
Call:
lm(formula = testscr ~ str + elpct2 + elpct + as.numeric(district))

Residuals:
    Min      1Q  Median      3Q     Max 
-48.328 -10.212  -0.168   9.518  43.872 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)          685.454379   8.923996  76.810   <2e-16 ***
str                   -1.096242   0.438310  -2.501   0.0128 *  
elpct2                 0.544466   0.853474   0.638   0.5239    
elpct                 -1.196071   0.857852  -1.394   0.1640    
as.numeric(district)   0.001910   0.005937   0.322   0.7478    
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.49 on 415 degrees of freedom
Multiple R-squared: 0.4271,  Adjusted R-squared: 0.4216
F-statistic: 109 on 4 and 415 DF,  p-value: < 2.2e-16
We notice that there are some factors with a high variance. We calculate the
variance inflation factor to test for collinearity:
library(car)
elpct2 <- elpct + rnorm(4)
est <- lm (testscr ~ str + elpct2 + elpct + as.numeric(district))
vif(est)
                 str               elpct2                elpct as.numeric(district) 
            1.040984           512.718019           512.761866             1.008520 
     str    elpct  mealpct  calwpct 
1.044388 1.962265 3.870485 2.476509 
We notice (at least we would if we had not known it in advance) that the number of the school district is not significant, but neither is it collinear. The two versions of elpct are collinear. If we remove one, the variance of the other gets smaller.
summaryR(lm (testscr ~ str +
elpct + as.numeric(district)))
constant  group 1  group 2  group 3
    1        1        0        0
    1        1        0        0
    1        0        1        0
    1        0        1        0
    1        0        0        1
    1        0        0        1

The constant equals the sum of the three group dummies.
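In R this dummy-variable trap shows up as an NA coefficient: lm detects the perfect collinearity and drops one regressor. A minimal sketch:

```r
# Dummy-variable trap: a constant plus a full set of group dummies.
g <- factor(rep(c("a", "b", "c"), each = 2))
y <- rnorm(6)
D <- model.matrix(~ g - 1)   # three dummies; their sum equals the constant
coef(lm(y ~ D))              # one coefficient is reported as NA
```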
The distribution of β̂

4.7.1 The variance of β̂
For the simple regression (as a reminder):
$$\hat\sigma^2_{\hat\beta_1} = \frac{1}{n}\cdot\frac{\frac{1}{n-2}\sum_{i=1}^n \hat v_i^2}{\Big(\frac{1}{n}\sum_{i=1}^n (X_i-\bar X)^2\Big)^2}\qquad\text{(with } \hat v_i = (X_i-\bar X)\,\hat u_i\text{)}$$
Under homoscedasticity:
$$\hat\sigma^2_{\hat\beta_1} = \frac{1}{n}\cdot\frac{\frac{1}{n-2}\sum_{i=1}^n \hat u_i^2}{\frac{1}{n}\sum_{i=1}^n (X_i-\bar X)^2}$$
In matrix notation (homoscedasticity):
$$\hat\Sigma_{\hat\beta} = \hat\sigma_u^2\,(X'X)^{-1}$$
The t statistic is
$$t = \frac{\hat\beta_j - \beta_{j,0}}{\hat\sigma_{\hat\beta_j}}$$
The p-value is $p = \Pr\big(|t| > |t^{\text{sample}}|\big) = 2\,\Phi\big(-|t^{\text{sample}}|\big)$.
Homoscedastic standard deviation of β̂:
$$\hat\Sigma_{\hat\beta} = \hat\sigma_u^2\,(X'X)^{-1}$$
(stddevh <-sqrt(diag(vcov(est))))
(Intercept)         str       elpct 
 7.41131248  0.38027832  0.03934255 

coef(est) / stddevh
(Intercept)         str       elpct 
  92.565554   -2.896026  -16.515879 

round(2*pnorm(- abs(coef(est)) / stddevh),5)
(Intercept)         str       elpct 
    0.00000     0.00378     0.00000 
summary(est)

Call:
lm(formula = testscr ~ str + elpct)

Residuals:
    Min      1Q  Median      3Q     Max 
-48.845 -10.240  -0.308   9.815  43.461 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 686.03225    7.41131  92.566  < 2e-16 ***
str          -1.10130    0.38028  -2.896  0.00398 ** 
elpct        -0.64978    0.03934 -16.516  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.46 on 417 degrees of freedom
Multiple R-squared: 0.4264,  Adjusted R-squared: 0.4237
F-statistic: 155 on 2 and 417 DF,  p-value: < 2.2e-16
summaryR(est)

Call:
lm(formula = testscr ~ str + elpct)

Residuals:
    Min      1Q  Median      3Q     Max 
-48.845 -10.240  -0.308   9.815  43.461 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 686.0322     8.8122   77.85   <2e-16 ***
str          -1.1013     0.4371   -2.52   0.0121 *  
elpct        -0.6498     0.0313  -20.76   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.46 on 417 degrees of freedom
Multiple R-squared: 0.4264,  Adjusted R-squared: 0.4237
F-statistic: 220.1 on 2 and 417 DF,  p-value: < 2.2e-16
Heteroscedastic variance-covariance matrix of β̂

Now we perform the same steps using heteroscedasticity-consistent standard errors:

sqrt(diag(vcov(est)))
(Intercept)         str       elpct 
 7.41131248  0.38027832  0.03934255 

The heteroscedasticity-consistent standard errors (stddev, from hccm) are

        str       elpct 
 0.43706612  0.03129693 

coef(est) / stddev
(Intercept)         str       elpct 
  77.849920   -2.519747  -20.761681 

round(2*pnorm(- abs(coef(est)) / stddev),5)
(Intercept)         str       elpct 
    0.00000     0.01174     0.00000 
$$\hat\Sigma_{\hat\beta} = (X'X)^{-1}\,X'\,\mathrm{diag}(\hat u_i^2)\,X\,(X'X)^{-1}$$
Elementwise product *:
$$(a_1,\ldots,a_k)' * (b_1,\ldots,b_k)' = (a_1 b_1, a_2 b_2, \ldots, a_k b_k)'$$

Inner product %*%:
$$(a_0, a_1, \ldots, a_k)\ \%*\%\ (b_0, b_1, \ldots, b_k)' = \sum_{i=0}^k a_i b_i$$

Outer product A%o%B:
$$(a_1,\ldots,a_k)'\ \%o\%\ (b_0, b_1, \ldots, b_m) = \begin{pmatrix} a_1 b_0 & a_1 b_1 & \cdots & a_1 b_m\\ a_2 b_0 & a_2 b_1 & \cdots & a_2 b_m\\ \vdots & & & \vdots\\ a_k b_0 & a_k b_1 & \cdots & a_k b_m \end{pmatrix}$$

In particular,
$$(a_1,\ldots,a_k)'\ \%o\%\ (+1, -1) = \begin{pmatrix} a_1 & -a_1\\ a_2 & -a_2\\ \vdots & \vdots\\ a_k & -a_k \end{pmatrix}$$
Confidence interval for β̂:

qnorm(.975)
[1] 1.959964

coef(est) + qnorm(.975) * stddev %o% c(-1,1)
                   [,1]        [,2]
(Intercept) 668.7605740 703.3039234
str          -1.9579298  -0.2446620
elpct        -0.7111176  -0.5884359
(coef(est) + qnorm(.975) * stddev %o% c(-1,1))["str",]

Call:
lm(formula = testscr ~ str + elpct)

Coefficients:
(Intercept)          str        elpct  
   686.0322      -1.1013      -0.6498  

stddev
        str       elpct 
 0.43706612  0.03129693 

coef(est) / stddev
(Intercept)         str       elpct 
  77.849920   -2.519747  -20.761681 

round(2*pnorm(-abs(coef(est) / stddev)),5)
(Intercept)         str       elpct 
    0.00000     0.01174     0.00000 
Call:
lm(formula = testscr ~ str + elpct + expnstu)

Coefficients:
(Intercept)          str        elpct      expnstu  
 649.577947    -0.286399    -0.656023     0.003868  

stddev
        str       elpct     expnstu 
0.487512918 0.032114291 0.001607407 

coef(est) / stddev
(Intercept)         str       elpct     expnstu 
 41.4572475  -0.5874701 -20.4277485   2.4062993 

round(2*pnorm(-abs(coef(est) / stddev)),5)
(Intercept)         str       elpct     expnstu 
    0.00000     0.55689     0.00000     0.01612 
Compare the standard error of the coefficient of str in the different estimation
equations.
sqrt(diag(hccm(lm(testscr ~ str ))))["str"]
      str 
0.5243585 

sqrt(diag(hccm(lm(testscr ~ str + elpct))))["str"]
      str 
0.4370661 

sqrt(diag(hccm(lm(testscr ~ str + elpct + expnstu))))["str"]
      str 
0.4875129 
$$t_1 = \frac{\hat\beta_1 - \beta_{1,0}}{\hat\sigma_{\hat\beta_1}}\qquad t_2 = \frac{\hat\beta_2 - \beta_{2,0}}{\hat\sigma_{\hat\beta_2}}$$
set.seed(100)
N<-1000
p<-0.05
qcrit<- -qnorm(p/2)
b1<-rnorm(N)
mean(abs(b1)>qcrit)*100
[1] 5.9
b2<-rnorm(N)
mean(abs(b2)>qcrit)*100
[1] 4.6
reject<-abs(b1)>qcrit | abs(b2)>qcrit
mean(reject)*100
[1] 10.3
In the example 10.3 % of the values are rejected by the joint test, not 5%. This is
not a coincidence. The next diagram shows that we are not only cutting off on the
left and on the right, but also at the top and at the bottom.
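With two independent 5% tests the exact probability can be computed directly, and it matches the simulation closely:

```r
# Probability that at least one of two independent 5% tests rejects:
1 - (1 - 0.05)^2   # 0.0975, close to the simulated 10.3%
```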
plot(b2 ~ b1,cex=.7)
points(b2 ~ b1,subset=reject,col="red",pch=7,cex=.5)
abline(v=c(qcrit,-qcrit),h=c(qcrit,-qcrit))
dataEllipse(b1,b2,levels=1-p,plot.points=FALSE)
legend("topleft",c("naive rejection","95\\% region"),pch=c(7,NA),col="red",lty=c(NA,1),cex=.7)
[Figure: joint distribution of b1 and b2 with the naive rejection region and the 95% ellipse]
Additionally we can see that this naive approach only takes the maximum deviation of the variables into account. It would be more sensible to exclude all observations outside of the red ellipse.
The second problem becomes even more annoying if the random variables are correlated:
set.seed(100)
b1<-rnorm(N)
b2<-.3* rnorm(N) + .7*b1
reject<-abs(b1)>qcrit | abs(b2)>qcrit
plot(b2 ~ b1,cex=.5)
points(b2 ~ b1,subset=reject,col="red",pch=7,cex=.5)
abline(v=c(qcrit,-qcrit),h=c(qcrit,-qcrit))
dataEllipse(b1,b2,levels=1-p,plot.points=FALSE)
text(-1,1,"A")
legend("topleft",c("naive rejection","95\\% region"),pch=c(7,NA),col="red",lty=c(NA,1),cex=.7)
[Figure: joint distribution of the correlated b1 and b2 with the naive rejection region and the 95% ellipse; point "A" marked]
For example, "A" in the diagram is clearly outside the confidence ellipse, but none of its single coordinates is conspicuous.
4.8.1 F statistic for two restrictions
t_1 = \frac{\hat\beta_1 - \beta_{1,0}}{\hat\sigma_{\hat\beta_1}}, \qquad t_2 = \frac{\hat\beta_2 - \beta_{2,0}}{\hat\sigma_{\hat\beta_2}}

F = \frac{1}{2}\cdot\frac{t_1^2 + t_2^2 - 2\,\hat\rho_{t_1 t_2}\,t_1 t_2}{1 - \hat\rho^2_{t_1 t_2}}

where \hat\rho_{t_1 t_2} is the estimated correlation between t_1 and t_2.
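As a small numeric illustration (all numbers invented), we can evaluate this formula for two hypothetical t statistics and their correlation:

```r
# Invented values: combine two t statistics into one F statistic
t1 <- 2; t2 <- 1
rho <- 0.5   # hypothetical estimated correlation between t1 and t2
Fstat <- 0.5 * (t1^2 + t2^2 - 2*rho*t1*t2) / (1 - rho^2)
Fstat        # [1] 2
```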
Recall:

\frac{N(0,1)}{\sqrt{\chi^2_n/n}} \sim t_n, \qquad \sum_{i=1}^{n}\left(N(0,1)\right)^2 \sim \chi^2_n, \qquad \frac{\chi^2_{n_1}/n_1}{\chi^2_{n_2}/n_2} \sim F_{n_1,n_2}
A single restriction, e.g. \beta_1 = 0, can be written as R\beta = r with

R = (0, 1, 0, \ldots, 0), \qquad \beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_k)', \qquad r = 0

Several restrictions, e.g. \beta_1 = 0 and \beta_2 = 0, are stacked:

R = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \end{pmatrix}, \qquad r = \begin{pmatrix} 0 \\ 0 \end{pmatrix}

Restrictions involving several coefficients, e.g. \beta_0 + \beta_2 = 7 and \beta_1 = 1, fit the same pattern:

R = \begin{pmatrix} 1 & 0 & 1 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \end{pmatrix}, \qquad r = \begin{pmatrix} 7 \\ 1 \end{pmatrix}
F = \frac{1}{q}\,(R\hat\beta - r)'\left(R\,\hat\Sigma_{\hat\beta}\,R'\right)^{-1}(R\hat\beta - r)

with q being the number of restrictions. If assumptions 1–4 (see 4.4) are satisfied:

F \xrightarrow{p} F_{q,\infty}
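The matrix form of the statistic is easy to compute directly; a minimal sketch with invented values for the coefficient vector and its covariance matrix:

```r
# Sketch with invented numbers: F = (R b - r)' (R S R')^{-1} (R b - r) / q
b <- c(1, 2)    # hypothetical coefficient estimates
S <- diag(2)    # hypothetical covariance matrix of b
R <- diag(2)    # restrictions: b1 = 0 and b2 = 0
r <- c(0, 0)
q <- nrow(R)
d <- R %*% b - r
Fstat <- as.numeric(t(d) %*% solve(R %*% S %*% t(R)) %*% d / q)
Fstat           # (1 + 4)/2 = 2.5
```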
[Figure: density of the F_{q,\infty} distribution for q = 2, 3, 5, 10]
Call:
lm(formula = testscr ~ str + elpct + expnstu)

Residuals:
    Min      1Q  Median      3Q     Max
-51.340 -10.111   0.293  10.318  43.181

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 649.577947  15.668622  41.457   <2e-16 ***
str          -0.286399   0.487513  -0.587   0.5572
elpct        -0.656023   0.032114 -20.428   <2e-16 ***
expnstu       0.003868   0.001607   2.406   0.0166 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.35 on 416 degrees of freedom
Multiple R-squared: 0.4366, Adjusted R-squared: 0.4325
F-statistic: 144.3 on 3 and 416 DF, p-value: < 2.2e-16
Testing all slope coefficients jointly, \beta_1 = \beta_2 = \ldots = \beta_k = 0:

R = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}, \qquad r = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}

For a single restriction the F statistic is just the square of the t statistic t_1 = (\hat\beta_1 - \beta_{1,0})/\hat\sigma_{\hat\beta_1}: if X \sim t_{(k)} then X^2 \sim F_{(1,k)}.
c constructs a vector by joining the arguments together. rbind joins the arguments of the function (vectors, matrices) row-wise; cbind joins them column-wise. linearHypothesis tests linear hypotheses. pf calculates the distribution function of the F-distribution, df calculates the density function, qf calculates quantiles of the F-distribution, and rf draws F-distributed random numbers.
For the joint hypothesis str = 0 and expnstu = 0 in the model testscr ~ str + elpct + expnstu:

R = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \qquad r = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
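These matrices can be built with c and rbind; a short sketch (it assumes, as in the output below's formula, that the coefficient vector has the four elements (Intercept), str, elpct, expnstu):

```r
# One row per restriction; columns correspond to (Intercept, str, elpct, expnstu)
R <- rbind(c(0, 1, 0, 0),   # str = 0
           c(0, 0, 0, 1))   # expnstu = 0
r <- c(0, 0)
dim(R)                      # 2 restrictions on 4 coefficients
```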
linearHypothesis(est, R, r)
Linear hypothesis test
Hypothesis:
str = 0
expnstu = 0
Model 1: restricted model
Model 2: testscr ~ str + elpct + expnstu
  Res.Df   RSS Df Sum of Sq      F   Pr(>F)
1    418 89000
2    416 85700  2    3300.3 8.0101 0.000386 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
testh<-linearHypothesis(est, R, r)
pf(testh$F[2],2,Inf,lower.tail=FALSE)
[1] 0.0003320828
linearHypothesis(est, R, r, vcov=hccm)
Linear hypothesis test
Hypothesis:
str = 0
expnstu = 0
Model 1: restricted model
Model 2: testscr ~ str + elpct + expnstu
Note: Coefficient covariance matrix supplied.
  Res.Df Df      F   Pr(>F)
1    418
2    416  2 5.2617 0.005537 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
test<-linearHypothesis(est, R, r, vcov=hccm)
pf(test$F[2],2,Inf,lower.tail=FALSE)
[1] 0.005186642
linearHypothesis(est,c("str=0","expnstu=0"),vcov=hccm)
Linear hypothesis test
Hypothesis:
str = 0
expnstu = 0
Model 1: restricted model
Model 2: testscr ~ str + elpct + expnstu
Note: Coefficient covariance matrix supplied.
  Res.Df Df      F   Pr(>F)
1    418
2    416  2 5.2617 0.005537 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
F = \frac{1}{q}\,(R\hat\beta - r)'\left(R\,\hat\Sigma_{\hat\beta}\,R'\right)^{-1}(R\hat\beta - r)
In the case of homoscedasticity, if we test \beta_1 = \beta_2 = \ldots = \beta_k = 0:

F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n-k-1)}
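This identity is easy to verify on simulated data (all names and numbers here are invented for the check; deviance() returns the residual sum of squares of an lm fit):

```r
# Verify the SSR form of the F statistic against anova() on simulated data
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)
unrestricted <- lm(y ~ x1 + x2)   # k = 2 regressors
restricted   <- lm(y ~ 1)         # q = 2 restrictions: both slopes zero
q <- 2; k <- 2
Fstat <- ((deviance(restricted) - deviance(unrestricted)) / q) /
          (deviance(unrestricted) / (n - k - 1))
all.equal(Fstat, anova(restricted, unrestricted)$F[2])
```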
where SSR_{restricted} is the sum of squared residuals of the restricted model, SSR_{unrestricted} that of the unrestricted model, k the number of regressors of the unrestricted model, and q the number of restrictions.

Recall:

R^2 = \frac{s^2_{\hat y}}{s^2_y} = 1 - \frac{s^2_{\hat u}}{s^2_y} = 1 - \frac{SSR}{TSS}

hence

F = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n-k-1)}

F is distributed according to F_{q,n-k-1}.
To test \beta_1 = \beta_2 we can also rewrite the model:

\beta_0 + \beta_1 X_1 + \beta_2 X_2 + u = \beta_0 + (\beta_1 - \beta_2) X_1 + \beta_2 (X_2 + X_1) + u
data(RetSchool,package="Ecdat")
attach(RetSchool)
summary(wage76)
   Min. 1st Qu.  Median    Max.    NA's
  0.000   1.377   1.683   3.180    2147
table(grade76)
grade76
   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
   3    2    2    4    6   13   22   42   90   92  148  194 1213  332  314  209  539  182  264
est <- lm(wage76 ~ grade76 + age76 + black + daded + momed)
summaryR(est)

Call:
lm(formula = wage76 ~ grade76 + age76 + black + daded + momed)

Residuals:
     Min       1Q   Median       3Q      Max
-1.75969 -0.25153  0.02054  0.25961  1.36709

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.002560   0.078653   0.033   0.9740
grade76      0.039486   0.003060  12.902   <2e-16 ***
age76        0.039229   0.002317  16.930   <2e-16 ***
black       -0.218286   0.017866 -12.218   <2e-16 ***
daded        0.000465   0.002732   0.170   0.8648
momed        0.007247   0.003009   2.408   0.0161 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3894 on 3053 degrees of freedom
  (2166 observations deleted due to missingness)
Multiple R-squared: 0.2274, Adjusted R-squared: 0.2261
F-statistic: 177.9 on 5 and 3053 DF, p-value: < 2.2e-16
Although the coefficient of momed is significantly different from zero and the
coefficient of daded is not, they are not significantly different from each other:
linearHypothesis(est,c("daded=momed"),vcov=hccm)
Linear hypothesis test
Hypothesis:
daded - momed = 0
Model 1: restricted model
Model 2: wage76 ~ grade76 + age76 + black + daded + momed
Note: Coefficient covariance matrix supplied.
  Res.Df Df      F Pr(>F)
1   3054
2   3053  1 1.9809 0.1594
alternatively:
momdaded <- momed+daded
est2<-lm(wage76 ~ grade76 + age76 + black + momed + momdaded)
linearHypothesis(est2,"momed=0",vcov=hccm)
Linear hypothesis test
c Oliver Kirchkamp
c Oliver Kirchkamp
114
Hypothesis:
momed = 0
Model 1: restricted model
Model 2: wage76 ~ grade76 + age76 + black + momed + momdaded
Note: Coefficient covariance matrix supplied.
  Res.Df Df      F Pr(>F)
1   3054
2   3053  1 1.9809 0.1594
or even simpler:
summaryR(lm(wage76 ~ grade76 + age76 + black + momed + momdaded))
Call:
lm(formula = wage76 ~ grade76 + age76 + black + momed + momdaded)

Residuals:
     Min       1Q   Median       3Q      Max
-1.75969 -0.25153  0.02054  0.25961  1.36709

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.002560   0.078653   0.033    0.974
grade76      0.039486   0.003060  12.902   <2e-16 ***
age76        0.039229   0.002317  16.930   <2e-16 ***
black       -0.218286   0.017866 -12.218   <2e-16 ***
momed        0.006782   0.004818   1.407    0.159
momdaded     0.000465   0.002732   0.170    0.865
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3894 on 3053 degrees of freedom
  (2166 observations deleted due to missingness)
Multiple R-squared: 0.2274, Adjusted R-squared: 0.2261
F-statistic: 177.9 on 5 and 3053 DF, p-value: < 2.2e-16
confidence.ellipse(est,c("daded","momed"),levels=c(.9,.95,.975,.99))
abline(v=0,h=0,a=0,b=1)
[Figure: confidence ellipses (90%, 95%, 97.5%, 99%) for the coefficients of daded and momed]
linearHypothesis(est,c("daded=0","momed=0"),vcov=hccm)
Linear hypothesis test
Hypothesis:
daded = 0
momed = 0
Model 1: restricted model
Model 2: wage76 ~ grade76 + age76 + black + daded + momed
Note: Coefficient covariance matrix supplied.
  Res.Df Df      F  Pr(>F)
1   3055
2   3053  2 3.6955 0.02495 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linearHypothesis(est,c("daded=0","momed=0.01"),vcov=hccm)
Linear hypothesis test
Hypothesis:
daded = 0
momed = 0.01
  Res.Df Df      F Pr(>F)
1   3055
2   3053  2 0.4433  0.642
linearHypothesis(est,c("daded=0.01","momed=0"),vcov=hccm)
Linear hypothesis test
Hypothesis:
daded = 0.01
momed = 0
Model 1: restricted model
Model 2: wage76 ~ grade76 + age76 + black + daded + momed
Note: Coefficient covariance matrix supplied.
  Res.Df Df      F   Pr(>F)
1   3055
2   3053  2 6.6741 0.001282 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
}
}
bayesR<-run.jags(model=modelR,data=mData,monitor=c("MomMinusDad"))
Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 1 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation
plot(bayesR,var="MomMinusDad",type=c("trace","density"))

[Figure: trace plot and posterior density of MomMinusDad]
The credible interval of this difference contains zero.
summary(bayesR)
Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000
1. Empirical mean and standard deviation for each variable,
(The first number is about half of the p-value we got for daded=momed above.
linearHypothesis(est,c("daded=momed"))
Linear hypothesis test
Hypothesis:
daded - momed = 0
Model 1: restricted model
Model 2: wage76 ~ grade76 + age76 + black + daded + momed
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1   3054 463.32
2   3053 463.01  1   0.31293 2.0634  0.151
This is to be expected since above we did a two-sided test, while here the alternative is one-sided.)
The model testscr ~ str perhaps contains too few regressors (omitted variable bias).

Omitted variable bias:

E(b_1) = \beta_1 + (X_1'X_1)^{-1} X_1'X_2 \beta_2

Only when X_1 is orthogonal to X_2 or \beta_2 is zero do we have no bias.

Overfitting (multicollinearity):

\hat\Sigma_{\hat\beta} = (X'X)^{-1} X'\,\sigma_u^2 I\,X (X'X)^{-1}

When X is (almost) collinear, then (X'X)^{-1} is large, and then \hat\Sigma_{\hat\beta} is large; hence our estimates are not precise.
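A quick simulated illustration (all data invented): the same regression fitted once with an independent second regressor and once with a nearly collinear one shows how collinearity inflates the standard error:

```r
# Collinearity inflates standard errors: simulated example
set.seed(1)
n  <- 100
x1 <- rnorm(n)
z  <- rnorm(n)                # independent second regressor
w  <- x1 + 0.01 * rnorm(n)    # almost collinear with x1
y  <- 1 + x1 + rnorm(n)
se_indep <- sqrt(diag(vcov(lm(y ~ x1 + z))))["x1"]
se_coll  <- sqrt(diag(vcov(lm(y ~ x1 + w))))["x1"]
c(se_indep, se_coll)          # se_coll is far larger
```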
start with a base specification
          str         elpct       expnstu
 -0.286399240 -0.656022660   0.003867902

          str       elratio       expnstu
 -0.286399240 -65.602266008   0.003867902

          str         elpct    expnstuTSD
   -0.2863992    -0.6560227     3.8679018
[Figure: testscr plotted against elpct (English learner percentage), mealpct (percentage qualifying for reduced-price lunch), and calwpct (percentage qualifying for income assistance)]
4.10.1 Measure R2
R^2 = 1 - \frac{SSR}{TSS}
calc.relimp(est,type=c("first","last","lmg","pmvd"),rela=TRUE)
Response variable: testscr
Total response variance: 363.0301
Analysis based on 420 observations

4 Regressors:
str elpct mealpct calwpct
Proportion of variance explained by model: 77.49%
Metrics are normalized to sum to 100% (rela=TRUE).

Relative importance metrics:
               lmg         pmvd        last      first
str     0.03119231 0.0148176134 0.059126952 0.03175031
elpct   0.22371548 0.0242703918 0.048159854 0.25708495
mealpct 0.53343971 0.9600101671 0.890678586 0.46768098
calwpct 0.21165250 0.0009018276 0.002034608 0.24348376
              str         elpct       mealpct      calwpct
1 X    -2.2798083    -0.6711562    -0.6102858   -1.0426750
2 Xs   -1.4612232    -0.4347537    -0.5922408   -0.5863541
3 Xs   -1.1371224    -0.2510901    -0.5645062   -0.2639020
4 Xs   -1.01435328   -0.12982189   -0.52861908  -0.04785371
F = \frac{(RSS_1 - RSS_2)/(k_2 - k_1)}{RSS_2/(n - k_2)}

The log-likelihood of a regression is, up to a constant C,

L = -\frac{n}{2}\,\log\frac{RSS}{n} + C

But then

2(L_2 - L_1) = n\log\frac{RSS_1}{n} - n\log\frac{RSS_2}{n} = n\log\frac{RSS_1}{RSS_2} \sim \chi^2_{k_2 - k_1}
anova(est1,est2,test="Chisq")
Analysis of Variance Table
Model 1: testscr ~ str + mealpct + calwpct
Model 2: testscr ~ str + elpct + mealpct + calwpct
  Res.Df   RSS Df Sum of Sq Pr(>Chi)
1    416 35451
2    415 34247
anova(est1,est2)
Analysis of Variance Table
Model 1: testscr ~ str + mealpct + calwpct
Model 2: testscr ~ str + elpct + mealpct + calwpct
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1    416 35451
2    415 34247  1    1203.3 14.582 0.0001547 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(est2)
Call:
lm(formula = testscr ~ str + elpct + mealpct + calwpct)
Residuals:
    Min      1Q  Median      3Q     Max
-32.179  -5.239  -0.185   5.171  31.308

Coefficients:
             Estimate Std. Error t value  Pr(>|t|)
(Intercept) 700.39184    4.69797 149.084   < 2e-16 ***
str          -1.01435    0.23974  -4.231 0.0000286 ***
elpct        -0.12982    0.03400  -3.819  0.000155 ***
mealpct      -0.52862    0.03219 -16.422   < 2e-16 ***
calwpct      -0.04785    0.06097  -0.785  0.432974
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
       RSS    AIC
     34239 1858.4
     34247 1858.5
     34169 1859.5
     35431 1872.7
     35721 1876.2
     54871 2056.4
Step:  AIC=1858.36
testscr ~ str + elpct + mealpct + enrltot

          Df Sum of Sq   RSS    AIC
- enrltot  1        60 34298 1857.1
<none>                 34239 1858.4
- elpct    1      1208 35446 1870.9
- str      1
- mealpct  1

Step:  AIC=1857.09
testscr ~ str + elpct + mealpct

          Df Sum of Sq   RSS    AIC
<none>                 34298 1857.1
- elpct    1      1167 35465 1869.1
- str      1      1441 35740 1872.4
- mealpct  1     52947 87245 2247.2
Call:
lm(formula = testscr ~ str + elpct + mealpct)
Coefficients:
(Intercept)          str        elpct      mealpct
   700.1500      -0.9983      -0.1216      -0.5473
set.seed(123)
N<-nrow(Caschool)
mySamp<-sample(1:N,N/2)
CaIn<-Caschool[mySamp,]
CaOut<-Caschool[-mySamp,]
est <-lm(testscr ~ str + elpct + mealpct + calwpct + enrltot,data=CaIn)
estSm<-lm(testscr ~ str + elpct + mealpct
+ enrltot,data=CaIn)
deviance(est)
[1] 15996.78
deviance(estSm)
[1] 16128.62
sum((CaOut$testscr-predict(est,newdata=CaOut))^2)
[1] 18658.43
125
c Oliver Kirchkamp
sum((CaOut$testscr-predict(estSm,newdata=CaOut))^2)
[1] 18513.59
newdata<-list(avginc=5:55)
plot(testscr ~ avginc)
lines(predict(lm(testscr ~ poly(avginc,2)),newdata=newdata)~newdata$avginc,lty=1,lwd=4)
lines(predict(lm(testscr ~ poly(avginc,5)),newdata=newdata)~newdata$avginc,lty=2,lwd=4)
lines(predict(lm(testscr ~ poly(avginc,15)),newdata=newdata)~newdata$avginc,lty=3,lwd=4)
legend("bottomright",c("$r=2$","$r=5$","$r=15$"),lty=1:3,lwd=4)
[Figure: testscr against avginc with polynomial fits of degree r = 2, 5, 15]
Of course, the above graph depends on the specific sample of CaIn and CaOut.
We could repeat this exercise for many samples.
plot(sapply(1:15,function(r) deviance(lm(testscr~poly(avginc,r),data=CaIn))),
xlab="degree of polynomial $r$",ylab="within sample deviance")
[Figure: within sample deviance against degree of polynomial r]
Out of sample deviance:
plot(sapply(1:9,function(r) sum((CaOut$testscr-predict(lm(testscr~poly(avginc,r),data=CaIn),
newdata=CaOut))^2)),
xlab="degree of polynomial $r$",ylab="out of sample deviance")
[Figure: out of sample deviance against degree of polynomial r]
Of course, the above graph depends on the specific sample of CaIn and CaOut.
We could repeat this exercise for many samples.
AIC tries to capture the quality of out of sample prediction:
plot(sapply(1:15,function(r) AIC(lm(testscr~poly(avginc,r)))),
xlab="degree of polynomial $r$",ylab="AIC")
[Figure: AIC against degree of polynomial r]
If we want to model the relation between avginc and testscr, perhaps the best
is a polynomial of degree 5:
newdata<-list(avginc=5:55)
plot(testscr ~ avginc)
lines(predict(lm(testscr ~ poly(avginc,5)),newdata=newdata)~newdata$avginc,lty=2,lwd=4)
[Figure: testscr against avginc with a polynomial fit of degree 5]
t_i = \frac{\hat\beta_i - \beta_{i,0}}{\hat\sigma_{\hat\beta_i}}

        str      elpct     mealpct    calwpct
 -3.7184527 -3.5192705 -13.5610567 -0.7778974
round(2*pnorm(-abs(coef(est)/sqrt(diag(hccm(est))))),5)
(Intercept)         str       elpct     mealpct     calwpct
    0.00000     0.00020     0.00043     0.00000     0.43663
Instead of always calculating heteroscedasticity-consistent standard errors manually, as we did above, we can also use the function summaryR from the library tonymisc.
summaryR(lm(testscr ~ str + elpct + mealpct + calwpct))
Call:
lm(formula = testscr ~ str + elpct + mealpct + calwpct)

Residuals:
    Min      1Q  Median      3Q     Max
-32.179  -5.239  -0.185   5.171  31.308

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 700.39184    5.61546 124.726  < 2e-16 ***
str          -1.01435    0.27279  -3.718 0.000228 ***
elpct        -0.12982    0.03689  -3.519 0.000481 ***
mealpct      -0.52862    0.03898 -13.561  < 2e-16 ***
calwpct      -0.04785    0.06152  -0.778 0.437073
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.084 on 415 degrees of freedom
Multiple R-squared: 0.7749, Adjusted R-squared: 0.7727
F-statistic: 349.4 on 4 and 415 DF, p-value: < 2.2e-16
mod1.sum<- summary(mod1.jags)[["statistics"]][,1:2]
                 Mean           SD
beta[1] 682.925223960 7.5409541182
beta[2]  -0.651991336 0.0397160356
beta[3]  -0.942712391 0.3874377114
tau       0.004783592 0.0003318526
myData2<-list(testscr=testscr,elpct=elpct)
mod2 <- model {
for (i in 1:length(testscr)) {
testscr[i] ~ dnorm(beta[1] + beta[2]*elpct[i],tau)
}
for (j in 1:2) {
beta[j] ~ dnorm(0,.0001)
}
tau ~ dgamma(.1,.1)
}
mod2.jags <-run.jags(model=mod2,data=myData2,monitor=c("beta","tau"))
Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 3 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation
mod2.sum<- summary(mod2.jags)[["statistics"]][,1:2]
                 Mean           SD
beta[1] 664.691032674 0.9445561702
beta[2]  -0.669906945 0.0389640021
tau       0.004698533 0.0003245831
To turn the estimated mean \mu and variance \mathrm{var} of a parameter into a prior we can use a \Gamma(\alpha,\beta) distribution with \alpha = \mu^2/\mathrm{var} and \beta = \mu/\mathrm{var}, which has exactly this mean and variance.
myData12<-list(str=str,testscr=testscr,elpct=elpct,sum=mod1.sum,sumX=mod2.sum)
mod12 <- model {
for (i in 1:length(testscr)) {
testscr[i] ~ dnorm(ifelse(equals(mI,1),
beta[1]+beta[2]*elpct[i]+beta[3]*str[i],
betaX[1]+betaX[2]*elpct[i]),
ifelse(equals(mI,1),tau,tauX))
}
for (j in 1:3) {
beta[j] ~ dnorm(sum[j,1],1/sum[j,2]^2)
}
tau ~ dgamma(sum[4,1]^2/sum[4,2]^2,sum[4,1]/sum[4,2]^2)
for (j in 1:2) {
betaX[j] ~ dnorm(sumX[j,1],1/sumX[j,2]^2)
}
tauX ~ dgamma(sumX[3,1]^2/sumX[3,2]^2,sumX[3,1]/sumX[3,2]^2)
mI ~ dcat(mProb[])
mProb[1]<-.5
mProb[2]<-.5
}
mod12.jags <-run.jags(model=mod12,data=myData12,
monitor=c("beta","tau","betaX","tauX","mI"))
Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 8 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation
summary(mod12.jags)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

               Mean        SD    Naive SE Time-series SE
beta[1]  683.257424 4.8295929 0.034150379    0.245753589
beta[2]   -0.651894 0.0303106 0.000214328    0.000329866
beta[3]   -0.964436 0.2439432 0.001724939    0.012495793
betaX[1] 664.697289 0.9011985 0.006372436    0.008693537
betaX[2]  -0.669097 0.0374353 0.000264708    0.000340590
mI         1.169200 0.3749378 0.002651211    0.012153832
tau        0.004784 0.0002554 0.000001806    0.000002364
tauX       0.004700 0.0003110 0.000002199    0.000002849
The result for mI is interesting. This variable takes the value 1 or 2 for model 1 or model 2. An average of 1.1692 means that in 0.8308 of all cases we have model 1 and in 0.1692 of all cases we have model 2. In other words, model 1 is 4.91 times more likely than model 2.
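The posterior model probabilities and odds follow from the mean of mI by simple arithmetic:

```r
mI_mean <- 1.1692      # posterior mean of the model indicator from above
p2 <- mI_mean - 1      # posterior probability of model 2
p1 <- 2 - mI_mean      # posterior probability of model 1
p1 / p2                # posterior odds in favour of model 1, about 4.91
```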
Compare with p-values from the frequentist analysis:
summaryR(lm(testscr ~ elpct + str ))
Call:
lm(formula = testscr ~ elpct + str)
Residuals:
    Min      1Q  Median      3Q     Max
-48.845 -10.240  -0.308   9.815  43.461

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 686.0322     8.8122   77.85   <2e-16 ***
elpct        -0.6498     0.0313  -20.76   <2e-16 ***
str          -1.1013     0.4371   -2.52   0.0121 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
est1 <- lm(testscr ~ str)
est2 <- lm(testscr ~ str + elpct)
est3 <- lm(testscr ~ str + elpct + mealpct)
est4 <- lm(testscr ~ str + elpct + calwpct)
est5 <- lm(testscr ~ str + elpct + mealpct + calwpct)
mtable("(1)"=est1,"(2)"=est2,"(3)"=est3,"(4)"=est4,"(5)"=est5,summary.stats=c("R-squared","AIC","N"))
                 (1)        (2)        (3)        (4)        (5)
(Intercept)  698.933    686.032    700.150    697.999    700.392
             (10.461)    (8.812)    (5.641)    (7.006)    (5.615)
str           -2.280     -1.101     -0.998     -1.308     -1.014
              (0.524)    (0.437)    (0.274)    (0.343)    (0.273)
elpct                    -0.650     -0.122     -0.488     -0.130
                         (0.031)    (0.033)    (0.030)    (0.037)
mealpct                             -0.547                -0.529
                                    (0.024)               (0.039)
calwpct                                        -0.790     -0.048
                                               (0.070)    (0.062)
R-squared      0.051      0.426      0.775      0.629      0.775
AIC         3650.499   3441.123   3050.999   3260.656   3052.376
N                420        420        420        420        420
4.10.7 Discussion
Controlling for the student characteristics shrinks the coefficient of str to
half its size
The student characteristics are good predictors
The sign of the coefficients of the student characteristics is consistent with
the pictures
Not all control variables are significant
4.11 Exercises
1. Multiple regression I
What is a multiple regression?
where male=1 if it is a boy, age denotes the child's age in years, and rain=1 if it has been raining during the day so that the ground is muddy.
Which of the variables are dummy variables? Explain the meaning of
the coefficients of the dummy variables.
What would the estimated model look like if we dropped the variable male (1 if male, 0 otherwise) and added a variable female (1 if female, 0 otherwise) instead?
You would like to predict the time that it takes the next girl to run 100
meters. She is 10 years old. You know that it has not been raining today.
What is the predicted time?
What would be the prediction for a 13-year-old boy on a day when it has been raining?
Assume that you use the estimation above for a 20-year-old man on a rainy day. What would be the estimated time? Is that realistic? Why?
In your prediction of the time that the members of your team need to
run 100 meters you would like to add a measure for ability. Unfortunately, you have never classified your team members according to ability. Which other measures could help you to approximate the ability of
each child?
Do you think the model is well specified? What would you change if
you could? Why might you not be able to specify the model the way
you want?
4. Determinants of income
You would like to conduct a survey to find out which factors influence the
income of employees.
Which variables do you think have an influence on income and should
be included in your model?
You are only allowed to ask your respondents about their age, their
gender, and the number of years of education they have obtained. Build
a model with these variables as regressors. Which signs do you expect
for the coefficients of each of these variables?
Your assistant has estimated the following equation for monthly incomes: \hat I = 976.9 + 38.2\,a + 80.5\,b - 350.7\,c with N = 250. Unfortunately, he has not noted which variables indicate what. Look at the regression. Can you tell which variable stands for which factor?
What is the estimated income of a woman aged 27 who has obtained
17 years of education?
One employee wonders whether she should pursue an MBA program. The one-year program costs 6,000 in tuition fees. During this year she will forego a monthly salary of 3,200 (assume 12.5 salaries per year; for simplicity assume that you live in a world without taxation and where future income is not discounted). Will this degree pay off during the next ten years?
Do you think that the above model is a good one? What would you
change if you could?
5. Multiple regressors in R, I
Use the data set Icecream of the library Ecdat in R.
You want to estimate the effect of average weekly family income (income),
price of ice cream (price), and the average temperature (temp) on ice
cream consumption (cons). Formulate your model.
Which variables do you expect to have an influence on ice cream consumption? In which direction do you expect the effect?
Check your assumptions in R.
6. Multiple regressors in R, II
Use the data set Computers of the library Ecdat in R.
What does the data set contain?
A company rents apartments for students in Jena. The manager would like
to estimate a model for rents for apartments. He has information on the
size of the apartment, the number of bedrooms, the number of bathrooms,
whether the kitchen is large, whether the apartment has a balcony, whether
there is a tub in the bath room, and the location measured as the distance to
the ThULB.
Specify a model to estimate the rents in Jena. Which of the above variables would you include? Which signs do you expect the coefficients
of these variables to take? Explain your answer.
Do you think the model is well specified? Are there any other variables
you would like to add?
13. Exam 21.7.2007, exercise 3
Product Z of your company has been advertised during the last year on two
different TV channels: EURO1 and SAT5. Prices for spots are the same on
both channels. A study with data on the last available periods has provided
the following model (standard errors in parentheses):
\hat Y_i = 300 + 10\,X_1 + 20\,X_2
                 (1.0)     (2.5)

You have 44 observations and R² = 0.9. Y stands for the sales amount of your product Z (in 1,000 Euros), X1 stands for expenses on commercials at EURO1 (in 1,000 Euros), and X2 for expenses on SAT5 (in 1,000 Euros).
Which advertisement method should you prefer according to your regression results (all other factors constant)? Explain your answer.
14. Detecting multicollinearity in R
Use the data set Housing of the library Ecdat in R.
Build a model predicting the price of a house (price) depending on
the lotsize (lotsize), the number of bathrooms (bathrms), the number
of bedrooms (bedrooms), whether the house has air condition (airco),
and whether the house is located in a preferred neighbourhood (prefarea).
Estimate this model in R.
Create a dummy which takes the value 1 if the house has at least one
bathroom. Estimate the same model as above, this time using the dummy
for bathroom instead of the number of bathrooms. What happens?
Why?
Construct a variable which indicates the prices in Euros. Assume an
exchange rate of 0.74 Euros for each Canadian Dollar. Estimate the
same model as above, this time estimating the price of the house in
Euros. Interpret your result.
[Figure: three panels — a linear relation, a non-linear relation without interaction, and an interaction of two variables (y against x1 for x2 = 1 and x2 = 2)]
Approach:
Logarithmic transformation
Interactions
data(Caschool)
attach(Caschool)
est1 <- lm(testscr ~ avginc)
summaryR(est1)
Call:
lm(formula = testscr ~ avginc)
Residuals:
    Min      1Q  Median      3Q     Max
-39.574  -8.803   0.603   9.032  32.530

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 625.3836     1.9290  324.20   <2e-16 ***
avginc        1.8785     0.1188   15.82   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13.39 on 418 degrees of freedom
Multiple R-squared: 0.5076, Adjusted R-squared: 0.5064
F-statistic: 250.2 on 1 and 418 DF, p-value: < 2.2e-16
[Figure: testscr against avginc with the linear fit]
The diagnostic plot confirms that the residuals in this linear model are not independent of avginc.
par(mfrow=c(1,2))
plot(est1,which=1:2)
[Figure: Residuals vs Fitted and Normal Q-Q diagnostic plots for the linear model]
Here we have two options: Either we specify a precise functional form for the
relation between avginc and testscr or we leave the precise form open, requiring
only a smooth relationship, like the following:
[Figure: testscr against avginc with a smooth fit]
avginc2 <- avginc^2
est2 <- lm(testscr ~ avginc + avginc2)
summary(est2)

Call:
lm(formula = testscr ~ avginc + avginc2)

Residuals:
    Min      1Q  Median      3Q     Max
-44.416  -9.048   0.440   8.348  31.639

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 607.301735   2.924223 207.680   <2e-16 ***
avginc        3.850995   0.271104  14.205   <2e-16 ***
avginc2      -0.042308   0.004881  -8.668   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.72 on 417 degrees of freedom
Multiple R-squared: 0.5562, Adjusted R-squared: 0.554
F-statistic: 400.9 on 2 and 417 DF, p-value: < 2.2e-16
[Figure: Residuals vs Fitted and Normal Q-Q diagnostic plots for the quadratic model]
or <- order(avginc)
[Figure: testscr against avginc with linear and quadratic fits]
\frac{\partial\, testscr}{\partial\, avginc} = 3.851 - 0.042308 \cdot 2 \cdot avginc
coef(est2)["avginc"] + 2*10*coef(est2)["avginc2"]
avginc
3.004826
coef(est2) %*% c(0,1,2*10)
[,1]
[1,] 3.004826
coef(est2) %*% c(0,1,2*40)
[,1]
[1,] 0.4663179
coef(est2) %*% c(0,1,2*60)
[,1]
[1,] -1.226021
To find a confidence interval for the marginal effect \hat Y_d = \hat\beta_1 + 2\hat\beta_2 x we can use a test of H_0: Y_d = 0. Under H_0 we have

\frac{\hat Y_d^2}{\hat\sigma^2_{\hat Y_d}} \sim F_{1,\infty}, \qquad \text{hence} \qquad \hat\sigma_{\hat Y_d} = \frac{|\hat Y_d|}{\sqrt{F}}

hence the confidence interval is approximately

\left[\hat Y_d - 1.96\,\frac{|\hat Y_d|}{\sqrt{F}},\ \hat Y_d + 1.96\,\frac{|\hat Y_d|}{\sqrt{F}}\right]
lhtest$F

[1]       NA 16.98182

coef(est2) %*% c(0,1,2*10) * (1 + qnorm(.975)/sqrt(lhtest$F)[2])

         [,1]
[1,] 3.351629

coef(est2) %*% c(0,1,2*10) * (1 - qnorm(.975)/sqrt(lhtest$F)[2])

         [,1]
[1,] 2.658022
Procedure:
1. theoretical motivation for non-linear dependencies

  Res.Df Df      F Pr(>F)
1    417
2    416  1 2.4615 0.1174
Call:
lm(formula = testscr ~ poly(avginc, 10, raw = TRUE))

Residuals:
    Min      1Q  Median      3Q     Max
-42.435  -9.159   0.424   8.764  33.066

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                     1.004e+03  5.773e+02   1.739   0.0828 .
poly(avginc, 10, raw = TRUE)1  -2.714e+02  3.245e+02  -0.837   0.4034
poly(avginc, 10, raw = TRUE)2   7.436e+01  7.721e+01   0.963   0.3361
poly(avginc, 10, raw = TRUE)3  -1.073e+01  1.026e+01  -1.046   0.2961
poly(avginc, 10, raw = TRUE)4   9.349e-01  8.449e-01   1.106   0.2692
poly(avginc, 10, raw = TRUE)5  -5.202e-02  4.519e-02  -1.151   0.2503
poly(avginc, 10, raw = TRUE)6   1.888e-03  1.594e-03   1.184   0.2370
poly(avginc, 10, raw = TRUE)7  -4.444e-05  3.678e-05  -1.208   0.2276
 [ reached getOption("max.print") -- omitted 3 rows ]
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
      [,1]     [,2]
 [1,]    2 2181.164
 [2,]    3 2139.508
 [3,]    4 2139.384
 [4,]    5 2136.448
 [5,]    6 2135.989
 [6,]    7 2137.467
 [7,]    8 2139.372
 [8,]    9 2141.179
 [9,]   10 2143.160
[10,]   11 2143.578
r<-2:50
plot(sapply(r,function(x) extractAIC(lm(testscr ~ poly(avginc,x,raw=TRUE)))[2]) ~ r,t="l",
ylab="AIC")
[Figure: AIC against degree of polynomial r = 2, …, 50]
r<-1:7
plot(sapply(r,function(x) extractAIC(lm(testscr ~ poly(avginc,x,raw=TRUE)))[2]) ~ r,t="l",
ylab="AIC")
[Figure: AIC against degree of polynomial r = 1, …, 7]

[Figure: testscr against avginc with linear, quadratic, cubic, and 10th-degree polynomial fits]
curve(log(x))

[Figure: the function log(x)]
linear-log:  Y_i = \beta_0 + \beta_1 \ln X_i + u_i
log-linear:  \ln Y_i = \beta_0 + \beta_1 X_i + u_i
log-log:     \ln Y_i = \beta_0 + \beta_1 \ln X_i + u_i

In the linear-log model, if X_i changes by 1%, i.e. \Delta X_i = 0.01\,X_i, then Y_i changes by \beta_1\,\frac{\Delta X_i}{X_i} = 0.01\,\beta_1.
(estL <- lm(testscr ~ log(avginc)))

Call:
lm(formula = testscr ~ log(avginc))

Coefficients:
(Intercept)  log(avginc)
     557.83        36.42
plot(testscr ~ avginc,main="district average income")
abline(est1,col="blue",lwd=3)
lines(avginc[or],fitted(est2)[or],col="red",lwd=3)
lines(avginc[or],fitted(estL)[or],col="green",lwd=3)
legend("bottomright",c("linear","quadratic","linear-log"),lwd=3,col=c("blue","red","green"))
[Figure: testscr against avginc with linear, quadratic, and linear-log fits]
coef(estL)[2]/10
log(avginc)
3.641968
coef(estL)[2]/40
log(avginc)
0.9104921
In the log-linear model, since \Delta \ln Y_i \approx \frac{\Delta Y_i}{Y_i}, we have

\frac{\Delta Y_i}{Y_i} \approx \beta_1\,\Delta X_i

i.e. a one-unit change in X_i changes Y_i by about 100\,\beta_1 percent.
Call:
lm(formula = log(testscr) ~ avginc)

Coefficients:
(Intercept)       avginc
   6.439362     0.002844
[Figure: testscr against avginc with linear, quadratic, linear-log, and log-lin fits]
coef(estLL)[2]

     avginc
 0.00284407
library(lattice)
data(Wages, package="Ecdat")
lm(lwage ~ exp,data=Wages)
Call:
lm(formula = lwage ~ exp, data = Wages)
Coefficients:
(Intercept)
6.50143
exp
0.00881
marginal effect:

\frac{\partial Y_i}{\partial X_i} = e^{\beta_0}\,\beta_1\,X_i^{\beta_1 - 1} = \beta_1\,\frac{Y_i}{X_i}

\frac{\partial Y_i}{\partial X_i}\cdot\frac{X_i}{Y_i} = \beta_1

\beta_1 is the elasticity of Y_i with respect to X_i.
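A quick numeric check of the elasticity interpretation (all numbers invented): in Y = e^{\beta_0} X^{\beta_1}, a 1% increase in X changes Y by approximately \beta_1 percent:

```r
# Numeric check of the elasticity interpretation with invented parameters
b0 <- 1; b1 <- 0.3
x  <- 2
y  <- exp(b0) * x^b1
y1 <- exp(b0) * (1.01 * x)^b1   # increase X by 1%
(y1 - y) / y                    # close to 0.01 * b1 = 0.003
```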
(estLLL<- lm(log(testscr) ~ log(avginc)))
Call:
lm(formula = log(testscr) ~ log(avginc))
Coefficients:
(Intercept)  log(avginc)
    6.33635      0.05542
(estLL <- lm(log(testscr) ~ avginc))
Call:
lm(formula = log(testscr) ~ avginc)
Coefficients:
(Intercept)
6.439362
avginc
0.002844
[Figure: testscr against avginc with linear, quadratic, linear-log, log-lin, and log-log fits]
where

g_\lambda(Y) = \begin{cases} \frac{Y^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log Y & \text{if } \lambda = 0 \end{cases}
library(MASS)
est <- lm (testscr ~ avginc)
plot(boxcox(est,lambda=seq(-2,10,by=.5),plotit=FALSE),t="l",
xlab="$\\lambda$",ylab="log-Likelihood")
est2 <- lm(testscr^8 ~ avginc)
plot(testscr ~ avginc)
lines(fitted(est2)[or]^(1/8) ~ avginc[or],col="blue")
lines(exp(fitted(estLL)[or]) ~ avginc[or],col="orange")
[Figure: Box-Cox log-likelihood as a function of λ]

[Figure: testscr against avginc with the testscr^8 fit and the log-lin fit]
curve(1-exp(-x),xlim=c(1,10))

[Figure: the function 1 − exp(−x)]
Estimate the parameters of

Y_i = \beta_0\,e^{\beta_1 X_i} + u_i

or (a reparameterization of the same family)

Y_i = \beta_0\left(1 - e^{-\beta_1 (X_i - \beta_2)}\right) + u_i

Unlike

Y_i = \beta_0 + \beta_1 \ln X_i + u_i \qquad \text{or} \qquad Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i

these models are not linear in their parameters. Linearizing Y_i = \beta_0(1 - e^{-\beta_1 (X_i - \beta_2)}) is not possible, so we minimize the sum of squared residuals numerically:

\min_{\beta_0,\beta_1,\beta_2} \sum_{i=1}^{n} \left( Y_i - \beta_0\left(1 - e^{-\beta_1 (X_i - \beta_2)}\right) \right)^2
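In R this numerical minimization can be done with nls; a self-contained sketch on simulated data (the parameter values, starting values, and object name are invented):

```r
# Nonlinear least squares for Y = b0 (1 - exp(-b1 (X - b2))) + u, simulated data
set.seed(1)
x <- runif(200, 0, 20)
y <- 10 * (1 - exp(-0.5 * (x - 2))) + rnorm(200, sd = 0.1)
fit <- nls(y ~ b0 * (1 - exp(-b1 * (x - b2))),
           start = list(b0 = 8, b1 = 0.3, b2 = 1))
coef(fit)   # close to the true values 10, 0.5, 2
```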
summary(nest)
[Figure: testscr against avginc with the nls fit and the log fit]
5.2 Interactions
Maybe the effect of group size on test scores depends on further circumstances.
Maybe small group sizes have a particularly large effect if groups have a
lot of foreign students.

  Δtestscr/Δstr depends on elpct.

Generally: ΔY/ΔX1 depends on X2.
lm(lwage ~ ed)
Call:
lm(formula = lwage ~ ed)
Coefficients:
(Intercept)           ed  
     5.8388       0.0652  
Example 3:

  lwage = β0 + β1 college + β2 sex + u

college <- ed > 16
lm(lwage ~ college + sex)
Call:
lm(formula = lwage ~ college + sex)
Coefficients:
(Intercept)  collegeTRUE      sexmale  
     6.2254       0.3340       0.4626  
Call:
lm(formula = lwage ~ college + sex + sex:college)
Coefficients:
        (Intercept)          collegeTRUE              sexmale  collegeTRUE:sexmale  
             6.2057               0.5543               0.4850              -0.2412  
Instead of regression coefficients we can calculate mean values for the individual categories:
mean(lwage[college==FALSE & sex=="female"])
[1] 6.205665
mean(lwage[college==TRUE & sex=="female"])
[1] 6.760007
mean(lwage[college==FALSE & sex=="male"])
[1] 6.690634
mean(lwage[college==TRUE & sex=="male"])
[1] 7.003751
mean(lwage):

                          college
                      FALSE                TRUE
  sex  female   6.21 = β0            6.76 = β0 + β1
       male     6.69 = β0 + β2       7.00 = β0 + β1 + β2 + β3
Effect of college education for women: β1. Effect of college education for men:
β1 + β3.
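The correspondence between the cell means and the dummy coefficients above can be checked by plain arithmetic (a Python sketch using the coefficients from the lm output above):

```python
# Coefficients from lm(lwage ~ college + sex + sex:college) above
b0, b1, b2, b3 = 6.2057, 0.5543, 0.4850, -0.2412  # intercept, college, male, college x male

cells = {
    ("female", False): b0,
    ("female", True):  b0 + b1,
    ("male",   False): b0 + b2,
    ("male",   True):  b0 + b1 + b2 + b3,
}
for key, mean in cells.items():
    print(key, round(mean, 4))

# Effect of college education:
print(round(cells[("female", True)] - cells[("female", False)], 4))  # b1 = 0.5543
print(round(cells[("male", True)] - cells[("male", False)], 4))      # b1 + b3 = 0.3131
```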
Histr<-str>=20
Hiel<-elpct>=10
table(Histr,Hiel)

       Hiel
Histr   FALSE TRUE
  FALSE   149   89
  TRUE     79  103
Call:
lm(formula = testscr ~ Histr * Hiel)

Coefficients:
       (Intercept)          HistrTRUE           HielTRUE  HistrTRUE:HielTRUE  
           664.143             -1.908            -18.163              -3.494  
If we do not want to calculate the mean values for the individual categories by
hand, we can leave that job to R:
library(memisc)
aggregate(mean(testscr)~Hiel+Histr)
   Hiel Histr mean(testscr)
1 FALSE FALSE      664.1433
2 FALSE  TRUE      662.2354
3  TRUE FALSE      645.9803
4  TRUE  TRUE      640.5782
mean(testscr):

                            Hiel
                       FALSE                    TRUE
  Histr  FALSE   664.1433 = β0            645.9803 = β0 + β2
         TRUE    662.2354 = β0 + β1       640.5782 = β0 + β1 + β2 + β3
[Figure: lwage plotted against ed, separately for males and females]
(lm(lwage ~ ed,subset=(sex=="female")))
Call:
lm(formula = lwage ~ ed, subset = (sex == "female"))
Coefficients:
(Intercept)           ed  
    5.04207      0.09452  
(lm(lwage ~ ed,subset=(sex=="male")))
Call:
lm(formula = lwage ~ ed, subset = (sex == "male"))
Coefficients:
(Intercept)           ed  
    5.93060      0.06221  
Call:
lm(formula = lwage ~ sex * ed)
Coefficients:
(Intercept)      sexmale           ed   sexmale:ed  
    5.04207      0.88854      0.09452     -0.03231  
detach(Wages)
Hiel=elpct>=10
plot(testscr ~ str)
abline(lm(testscr ~ str,subset=(Hiel==FALSE)))
abline(lm(testscr ~ str,subset=(Hiel==TRUE)),col="red")
legend("topright",c("few el","many el"),lwd=3,col=c("black","red"))
[Figure: testscr vs. str with separate regression lines for districts with few and with many English learners]
(lm(testscr ~ str,subset=(Hiel==FALSE)))
Call:
lm(formula = testscr ~ str, subset = (Hiel == FALSE))
Coefficients:
(Intercept)          str  
   682.2458      -0.9685  
(lm(testscr ~ str,subset=(Hiel==TRUE)))
Call:
lm(formula = testscr ~ str, subset = (Hiel == TRUE))
Coefficients:
(Intercept)          str  
    687.885       -2.245  
lm(testscr ~ str*Hiel)
Call:
lm(formula = testscr ~ str * Hiel)
Coefficients:
 (Intercept)           str      HielTRUE  str:HielTRUE  
    682.2458       -0.9685        5.6391       -1.2766  
With the coefficients from lm(testscr ~ str * Hiel), the fitted model is

  testscr = 682.2458 − 0.9685 · str + 5.6391 · Hiel − 1.2766 · str · Hiel
linearHypothesis(est,"str:HielTRUE=0",vcov=hccm)
Linear hypothesis test
Hypothesis:
str:HielTRUE = 0
Model 1: restricted model
Model 2: testscr ~ str * Hiel
Note: Coefficient covariance matrix supplied.
  Res.Df Df      F Pr(>F)
1    417                 
2    416  1 1.6778 0.1959
  Res.Df Df      F    Pr(>F)    
1    418                        
2    416  2 88.806 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  Res.Df Df      F Pr(>F)
1    417                 
2    416  1 0.0804 0.7769
ifelse returns either the second or the third argument, depending on the first
argument. as.data.frame converts its argument (e.g. a matrix) into a data frame.
Here, this is helpful because the returned structure mixes numbers and strings.
colnames provides access to column names.
attach(Wages)
est1 <- lm(lwage ~ ed)
est2 <- lm(lwage ~ ed + sex)
est3 <- lm(lwage ~ ed * sex)
est4 <- lm(lwage ~ ed * sex + exp + black + union + south + wks + married +
           smsa + ind)
mtable("(1)"=est1,"(2)"=est2,"(3)"=est3,"(4)"=est4,summary.stats=c("R-squared","N"))

                          (1)        (2)        (3)        (4)
  (Intercept)           5.839      5.419      5.042      4.666
                       (0.032)    (0.034)    (0.087)    (0.107)
  ed                    0.065      0.065      0.095      0.086
                       (0.002)    (0.002)    (0.007)    (0.006)
  sex: male/female                 0.474      0.889      0.552
                                  (0.018)    (0.093)    (0.092)
  ed x sex: male/female                      -0.032     -0.016
                                            (0.007)    (0.007)
  exp                                                    0.011
                                                       (0.001)
  black: yes/no                                         -0.168
                                                       (0.022)
  union: yes/no                                          0.063
                                                       (0.012)
  south: yes/no                                         -0.055
                                                       (0.013)
  wks                                                    0.005
                                                       (0.001)
  married: yes/no                                        0.066
                                                       (0.022)
  smsa: yes/no                                           0.161
                                                       (0.012)
  ind                                                    0.043
                                                       (0.012)
  R-squared             0.155      0.260      0.264      0.387
  N                      4165       4165       4165       4165
mtable("(1)"=est1,"(2)"=est2,summary.stats=c("R-squared","N"))

                   (1)        (2)
  (Intercept)     5.436      5.446
                 (0.037)    (0.075)
  ed              0.076      0.076
                 (0.002)    (0.005)
  exp             0.013      0.013
                 (0.001)    (0.003)
  ed x exp                   0.000
                            (0.000)
  R-squared       0.247      0.247
  N                4165       4165
Marginal effects:

  ∂Yi/∂X1 = β1 + β3 X2
  ∂Yi/∂X2 = β2 + β3 X1

since

  Y + ΔY = β0 + β1 (X1 + ΔX1) + β2 (X2 + ΔX2) + β3 (X1 + ΔX1)(X2 + ΔX2)
  ΔY     = β1 ΔX1 + β2 ΔX2 + β3 (X1 ΔX2 + X2 ΔX1 + ΔX1 ΔX2)
         = (β1 + β3 X2) ΔX1 + (β2 + β3 X1) ΔX2 + β3 ΔX1 ΔX2
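This decomposition is easy to verify numerically (a Python sketch with arbitrary illustrative values for the coefficients and the changes):

```python
# Y = b0 + b1*X1 + b2*X2 + b3*X1*X2, with arbitrary illustrative values
b0, b1, b2, b3 = 1.0, 2.0, -0.5, 0.25
X1, X2 = 3.0, 4.0
dX1, dX2 = 0.1, -0.2

def Y(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

dY_direct = Y(X1 + dX1, X2 + dX2) - Y(X1, X2)
dY_formula = (b1 + b3 * X2) * dX1 + (b2 + b3 * X1) * dX2 + b3 * dX1 * dX2
print(dY_direct, dY_formula)  # both agree

# The marginal effect of X1 depends on X2:
print(b1 + b3 * X2)  # dY/dX1 at X2 = 4 -> 3.0
```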
With the coefficients from the following regression, the fitted model is

  testscr = 686.34 − 1.1170 · str − 0.6729 · elpct + 0.001162 · str · elpct
attach(Caschool)
summaryR(lm(testscr ~ str * elpct ))
Call:
lm(formula = testscr ~ str * elpct)
Residuals:
    Min      1Q  Median      3Q     Max 
-48.836 -10.226  -0.343   9.796  43.447 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 686.338525  11.937855  57.493   <2e-16 ***
str          -1.117018   0.596515  -1.873   0.0618 .  
elpct        -0.672911   0.386538  -1.741   0.0824 .  
str:elpct     0.001162   0.019158   0.061   0.9517    
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.48 on 416 degrees of freedom
Multiple R-squared: 0.4264,  Adjusted R-squared: 0.4223
F-statistic: 150.3 on 3 and 416 DF,  p-value: < 2.2e-16
  testscr = β0 + β1 · str + β2 · elpct + β3 · str · elpct + u
What is the effect of the group size str for a group with median share of foreigners?
median calculates the median of a vector. quantile calculates quantiles of a
vector; the smallest observation corresponds to the 0 quantile, the largest to the
1 quantile. mean calculates the arithmetic mean.
median(elpct)
[1] 8.777634
est<- lm(testscr ~ str * elpct)
coef(est)

  (Intercept)           str         elpct     str:elpct 
686.338524629  -1.117018345  -0.672911392   0.001161752 
What does this effect look like for a group with a share of foreigners at the 75%
quantile?
quantile(elpct,.5)
50%
8.777634
quantile(elpct,.75)
75%
22.97
(eff2=coef(est)["str"] + coef(est)["str:elpct"] * quantile(elpct,.75))
str
-1.090333
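The same computation as plain arithmetic (Python, with the coefficients and quantiles from the output above):

```python
# Coefficients from lm(testscr ~ str * elpct) and quantiles of elpct above
b_str, b_int = -1.117018345, 0.001161752
q50, q75 = 8.777634, 22.97

# Marginal effect of str at a given share of English learners:
eff_median = b_str + b_int * q50
eff_q75 = b_str + b_int * q75
print(round(eff_median, 6))  # -> -1.106821
print(round(eff_q75, 6))     # -> -1.090333, matching the R output
```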
  Res.Df Df      F Pr(>F)
1    417                 
2    416  1 0.0037 0.9517
mtable("(1)"=est1,"(2)"=est2,"(3)"=est3,"(4)"=est4,"(5)"=est5,"(6)"=est6,"(7)"=est7,summary.st
                    (1)        (2)        (3)        (4)        (5)
  (Intercept)     700.150    658.552    682.246    653.666    252.051
                  (5.641)    (8.749)   (12.071)   (10.053)  (179.724)
  str              -0.998     -0.734     -0.968     -0.531     64.339
                  (0.274)    (0.261)    (0.599)    (0.350)   (27.295)
  elpct            -0.122     -0.176
                  (0.033)    (0.034)
  mealpct          -0.547     -0.398                -0.411     -0.420
                  (0.024)    (0.034)               (0.029)    (0.029)
  log(avginc)                 11.569                12.124     11.748
                             (1.841)               (1.823)    (1.799)
  Hiel                                   5.639      5.498      5.474
                                       (19.889)   (10.012)    (1.046)
  str x Hiel                            -1.277     -0.578
                                        (0.986)    (0.507)
  str2                                                         -3.424
                                                              (1.373)
  str3                                                          0.059
                                                              (0.023)
  R-squared         0.775      0.796      0.310      0.797      0.801
  N                   420        420        420        420        420
It looks like there is a non-linear effect of str on testscr. Let's take another
look at model 6 and estimate the marginal effect of str.
estC <- coef(est6)
mEffstr <- function (str,Hiel) {
estC %*% c(0,1,2*str,3*str^2,0,0,0,Hiel,Hiel*2*str,Hiel*3*str^2)
}
mEffstr(20,0)

          [,1]
[1,] -1.622543

mEffstr(20,1)

           [,1]
[1,] -0.7771982
sapply applies a function to every element of a vector. Within a formula, I()
prevents a term from being interpreted as an interaction.
noHitest<-sapply(str,function(x) {mEffstr(x,0)})
Hitest<-sapply(str,function(x) {mEffstr(x,1)})
plot(noHitest ~ str,ylim=c(-6,6))
points(Hitest ~ str,col="red")
abline(h=0)
legend("bottomright",c("noHitest","Hitest"),pch=1,col=c("black","red"))
[Figure: marginal effect of str on testscr, with (Hitest, red) and without (noHitest, black) a high share of English learners]
5.3.2 Summary
Non-linear transformations (log, polynomials) enable us to write non-linear
models as multiple regressions.
Estimation works in the same way as for OLS.
We have to take the transformations into account when we interpret the
coefficients.
A large number of non-linear specifications is possible. Consider. . .
What non-linear effects are we interested in?
What makes sense with respect to the problem we are trying to solve?
5.4 Exercises
1. Polynomial Regression I
Give the formula of a quadratic regression.
How do you interpret the coefficients and the dependent variable of
this quadratic regression?
Draw a graph illustrating the impact of X1 on Y in a quadratic regression.
Give some real world examples where the use of a quadratic regression
function could be useful.
2. Polynomial Regression II
You want to estimate the time that it takes the members of your running
team to run 1km. You have information on age, gender, and whether the
members are, generally speaking, in good physical condition.
Suggest a model to estimate the time it takes your club members to run
1km.
Which sign do you expect for each of the coefficients?
Draw a graph that illustrates the relation between time needed to run
1km and age.
3. Polynomial Regression III
Use the data set Bwages of the library Ecdat in R on wages in Belgium.
4. Logarithmic Regression I
What is a logarithm?
Where do logarithms occur in nature or science?
Give some examples for the use of logarithmic functions in economic
contexts.
5. Logarithmic Regression II
Which different types of logarithmic regressions do you know? Give
the formulas for each of them.
How do you interpret the coefficients of the different logarithmic models?
Give economic examples for each of them.
6. Logarithmic Regression III
Use the data set Wages of the library Ecdat in R on wages in the United
States.
Estimate the effect of years of experience (exp), whether the employee
has a blue collar job (bluecol), whether the employee lives in a standard metropolitan area (smsa), gender (sex), years of education (ed),
and whether the employee is black (black) on the logarithm of wage
(lwage). Do you think it makes sense to use this model? Would you
rather suggest a different model? Which one would you suggest?
Estimate both models in R. Interpret and compare the outputs.
Visualize the relationship of experience and wage with a graph. Does
this graph support the choice of your model?
7. Polynomial Regression IV
Use the data set Bwages of the library Ecdat in R on wages in Belgium.
You want to estimate the wage increase per year of job experience (exper).
You use the level of education (educ) as an additional control. You do
not have information on wage increases, but only on absolute wages
(wage). Solve this problem using R.
8. Linear and Non-linear Regressions I
You want to estimate different models for the following problems, each with one
dependent and one independent variable (assume that the models are otherwise
specified correctly, i.e. all other important variables are included and there is no
high correlation between two independent variables). Name the appropriate
model (linear, quadratic, log-lin, lin-log, or log-log) for each of the problems and
explain your choice (exercise adapted from Studenmund's "Using Econometrics",
chapter 7, exercise 2).
Dependent variable: time it takes to walk from A to B
independent variable: distance from A to B
Dependent variable: total amount spent on food
independent variable: income
Dependent variable: monthly wage
independent variable: age
Dependent variable: number of ski lift tickets sold
independent variable: whether there is snow
Dependent variable: GDP growth rate
independent variable: years passed since beginning of transformation
to an industrialized country
Dependent variable: CO2 emission
independent variable: kilometers driven with car
Dependent variable: hourly wage
independent variable: number of years of job experience
Dependent variable: physical ability
independent variable: age
9. Linear and Non-linear Regressions II
How do you decide which model (linear model or one of the nonlinear
models) to use?
10. Interaction terms I
What is an interaction term?
Why do you have to include not only the interaction of two variables
in your regression function, but also each of the individual variables?
What would happen if you did not include the individual variables?
Give some examples of situations where you think that interactions
play a role.
11. Exam BW 24.1, 26.5.2010, exercise 19
A group of athletes prepares for a competition. You have the following information about the athletes: age (A), gender (G; 1 if female, 0 otherwise),
daily training (T; 1 if true, 0 otherwise), healthy diet (E; 1 if true, 0 otherwise), and ranking list scores (R). Age and gender are not correlated with
the other variables. You assume that athletes only do especially well in the
ranking list if they practice daily and if they follow a healthy diet; daily
training is only effective in combination with a healthy diet. What would
be possible specifications of your model to test your assumption? (Here we
don't ask for the "best" specification.)
a) R = β0 + β1 T + β2 E + u
b) R = β0 + β1 A + β2 G + β3 T + β4 E + β5 T · E + u
c) R = β0 + β1 T + β2 E + β3 T · E + u
d) T · E = β0 + β1 R + u
e) R = β0 + β1 A + β2 G + β3 T + β4 E + u
You estimate the following model for the number of days it takes them to
learn the new technique so well that they are able to do their first tours:
  days-hat_i = 8 − 1 · iceskating − 2.5 · alpineskiing − 1.5 · iceskating · alpineskiing
How many days does a person need to learn the skating technique with
cross-country skis, . . .
who has never done any ice skating nor downhill skiing
who has never done any ice skating, but some downhill skiing
179
who knows how to ice skate, but has never done any downhill
skiing
who is familiar with both ice skating and downhill skiing
13. Interaction terms in R I
You would like to estimate students school achievements measured in test
scores (testscore) in a developing country. You think that gender (female)
and educational background of the parents (eduparents; measured in years)
have an impact. In particular, you think that poor people cannot afford that
their children spend all their time on learning, because they also need their
help to earn money. This might be especially true for girls, because their
parents might think that education is less important for them. How would
you test this assumption in R? Which of the following commands is correct
(multiple correct answers possible)?
a) summary(lm(testscore ~ female+eduparents+eduparents*female))
b) summary(lm(testscore=female+eduparents+eduparents:female))
c) summary(lm(testscore ~ female*eduparents))
d) summary(lm(testscore ~ female+eduparents+eduparents:female))
e) eduparentsfemale <- eduparents*female
   summary(lm(testscore ~ female+eduparents+eduparentsfemale))
f) summary(lm(testscore <- eduparents*female))
g) summary(lm(testscore ~ eduparents:female))
14. Interaction terms in R II
Use the data set RetSchool of the library Ecdat in R on returns to schooling
in the United States.
You are interested whether people considered as "black" (black) and
people living in the south (south76) earn less than others. Further, you
are interested whether Afro Americans (black) who live in the south
(south76) earn even less. You control for years of experience (exp76)
and grades (grade76). Solve this problem using R.
15. Interaction terms in R III
Use the data set DoctorContacts of the library Ecdat in R on contacts to medical doctors.
How do gender (sex), age (age), income (linc), the education of the
head of the household (educdec), health (health), physical limitations
(physlim), and the number of chronic diseases (ndisease) affect the
e.g.: What can we really say about the effect of group sizes on testscores?
6.1.1 Can we evaluate multiple regressions systematically?
Advantages (in comparison to the simple regression model):

  Marginal effects ΔY/ΔX can be estimated.
  Omitted variable bias can sometimes be prevented (if the variable can be
  measured).
  Non-linear effects (which depend on X) can be analysed.

Still: OLS can be a biased estimator of the true effect.

Internal validity: statistical inferences about causal dependencies apply to the
population / to the model we are studying. The estimator is unbiased and
consistent.
California 1998/99
Massachusetts 1997/98
Mexico 1997/98
est1 <- lm(testscr ~ str)
est2 <- lm(testscr ~ str + elpct)
est3 <- lm(testscr ~ str + elpct + mealpct)
est4 <- lm(testscr ~ str + elpct + calwpct)
est5 <- lm(testscr ~ str + elpct + mealpct + calwpct)
mtable("(1)"=est1,"(2)"=est2,"(3)"=est3,"(4)"=est4,"(5)"=est5,
summary.stats=c("R-squared","N"))
                  (1)        (2)        (3)        (4)        (5)
  (Intercept)   698.933    686.032    700.150    697.999    700.392
               (10.461)    (8.812)    (5.641)    (7.006)    (5.615)
  str            -2.280     -1.101     -0.998     -1.308     -1.014
                (0.524)    (0.437)    (0.274)    (0.343)    (0.273)
  elpct                     -0.650     -0.122     -0.488     -0.130
                           (0.031)    (0.033)    (0.030)    (0.037)
  mealpct                              -0.547                -0.529
                                      (0.024)               (0.039)
  calwpct                                         -0.790     -0.048
                                                 (0.070)    (0.062)
  R-squared       0.051      0.426      0.775      0.629      0.775
  N                 420        420        420        420        420
If the variable does not change over time → regression using panel data.
If the variable is correlated with another variable which we can measure →
regression using instruments.

Logarithmic/polynomial specification.
In case of a discrete (e.g. binary) dependent variable: extending multiple
regression models (probit, logit).
6.2.3 Errors in the variables
What, if we cannot measure our X precisely:
Typos
Imprecise recollections (When did you start working on this project?)
Imprecise questions (What was your income last year?)
Conscious lying (Alcohol intake / sexual preferences)
  Yi = β0 + β1 Xi + ui

For this specification it holds that E(ui | Xi) = 0.
Let Xi be the true value of X and X̃i the imprecisely measured value of X.
We estimate

  Yi = β0 + β1 Xi + ui
     = β0 + β1 X̃i + β1 (Xi − X̃i) + ui
     = β0 + β1 X̃i + vi

where vi = β1 (Xi − X̃i) + ui. If (Xi − X̃i) is correlated with X̃i, then X̃i is
correlated with vi, and β̂1 is biased and inconsistent.

Example: Let X̃i = Xi + wi, where wi is a random variable with a mean value
of zero and a variance of σ²w, and wi is uncorrelated with Xi and ui. Then

  vi = β1 (Xi − X̃i) + ui = β1 (Xi − Xi − wi) + ui = −β1 wi + ui

and

  cov(X̃i, vi) = −β1 cov(X̃i, wi) + cov(X̃i, ui) = −β1 cov(Xi + wi, wi) = −β1 σ²w ≠ 0
Recall

  β̂1 →p β1 + cov(vi, X̃i)/σ²X̃

In our example

  β̂1 →p β1 − β1 σ²w/(σ²X + σ²w) = β1 (σ²X + σ²w − σ²w)/(σ²X + σ²w) = β1 σ²X/(σ²X + σ²w)

Since σ²X/(σ²X + σ²w) < 1, β̂1 is biased towards zero.
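A small Monte Carlo illustration of this attenuation result (Python with numpy; all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta0, beta1 = 1.0, 2.0
sig_x, sig_w = 1.0, 1.0                  # var(X) = var(w) = 1

X = rng.normal(0.0, sig_x, n)            # true regressor
u = rng.normal(0.0, 0.5, n)
Y = beta0 + beta1 * X + u
X_tilde = X + rng.normal(0.0, sig_w, n)  # mismeasured regressor

# OLS slope of Y on the mismeasured regressor
slope = np.cov(X_tilde, Y)[0, 1] / np.var(X_tilde)
attenuated = beta1 * sig_x**2 / (sig_x**2 + sig_w**2)  # = 1.0 here
print(round(slope, 2), attenuated)  # slope is attenuated towards 1.0, not 2.0
```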
Draw 100 equity funds today and observe their average performance
over the past 10 years.
Draw 100 equity funds ten years ago and observe their average performance over the past 10 years.
Equity funds do not beat the market.
Simultaneous causality: not only may testscr depend on str, str may also depend
on testscr:

  Yi = β0 + β1 Xi + ui
  Xi = γ0 + γ1 Yi + vi
Geographical effect
OLS is still consistent, but the estimators for OLS standard errors are inconsistent.
Alternative formulae for standard errors exist for panel data, time series data,
and data with correlated groups.
code       district code (numerical)
municipa   municipality (name)
district   district name
regday     spending per pupil, regular
specneed   spending per pupil, special needs
bilingua   spending per pupil, bilingual
occupday   spending per pupil, occupational
totday     spending per pupil, total
spc        students per computer
speced     special education students
lnchpct    eligible for free or reduced price lunch
tchratio   students per teacher
percap     per capita income
totsc4     4th grade score (math+english+science)
totsc8     8th grade score (math+english+science)
avgsalary  average teacher salary
pctel      percent english learners

Source: Massachusetts Comprehensive Assessment System (MCAS), Massachusetts Department of Education, 1990 U.S. Census
Datensatz$variable denotes a variable (column) in a data set. Alternatively we can write Datensatz[,"variable"],
or Datensatz[,c("variable1","variable2")] for more than one column.
data(MCAS)
MCAS<-within(MCAS,{
type<-"MA"
str<-tchratio
testscr<-totsc4
elpct<-pctel
avginc<-percap
mealpct<-lnchpct})
Caschool$type<-"CA"
head(Caschool[,c("type","str","testscr","elpct","avginc","mealpct")])
  type      str testscr     elpct    avginc mealpct
1   CA 17.88991  690.80  0.000000 22.690001  2.0408
2   CA 21.52466  661.20  4.583333  9.824000 47.9167
3   CA 18.69723  643.60 30.000002  8.978000 76.3226
4   CA 17.35714  647.70  0.000000  8.978000 77.0492
5   CA 18.67133  640.85 13.857677  9.080333 78.4270
6   CA 21.40625  605.55 12.408759 10.415000 86.9565
head(MCAS[,c("type","str","testscr","elpct","avginc","mealpct")])
  type  str testscr     elpct avginc mealpct
1   MA 19.0     714 0.0000000 16.379    11.8
2   MA 22.6     731 1.2461059 25.792     2.5
3   MA 19.3     704 0.0000000 14.040    14.1
4   MA 17.9     704 0.3225806 16.111    12.1
5   MA 17.5     701 0.0000000 15.423    17.4
6   MA 15.7     714 3.9215686 11.144    26.8
merge merges two data sets. Let one data set contain the matriculation numbers and grades of students, and
let another data set contain matriculation numbers and names. merge assigns the matriculation
numbers, grades, and names correctly to each other. If the two data sets have nothing in common, merge can also append
one data set to another.
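The behaviour of such a merge can be sketched in a few lines (a hypothetical Python toy example mirroring the matriculation-number story; R's merge(..., all=TRUE) corresponds to a full outer join):

```python
# Toy data: one table with grades, one with names, keyed by matriculation number
grades = {101: "A", 102: "B", 104: "C"}
names  = {101: "Ada", 102: "Bob", 103: "Cem"}

# Full outer join on the matriculation number (like merge(..., all=TRUE) in R)
merged = {
    k: (names.get(k), grades.get(k))
    for k in sorted(set(grades) | set(names))
}
for k, (name, grade) in merged.items():
    print(k, name, grade)
# 101 Ada A
# 102 Bob B
# 103 Cem None   <- only in 'names'
# 104 None C     <- only in 'grades'
```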
cama=merge(Caschool[,c("type","str","testscr","elpct","avginc",
"mealpct")],MCAS[,c("type","str","testscr","elpct","avginc",
"mealpct")],all=TRUE)
The new dataframe cama contains now data from both regions, CA and MA.
First comes the CA data, followed by the MA data.
head(cama)

  type      str testscr    elpct avginc mealpct
1   CA 14.00000  635.60 0.000000 10.656 68.8235
2   CA 14.20176  656.50 0.000000 13.712 20.0000
3   CA 14.54214  695.30 3.765690 35.342  0.0000
4   CA 14.70588  666.85 2.500000 11.826 53.5032
5   CA 15.13898  698.25 2.807284 35.810  0.0000
6   CA 15.22436  646.40 0.000000 10.268 76.2774
tail(cama)

    type  str testscr     elpct avginc mealpct
635   MA 21.9     691  2.816901 15.905    27.1
636   MA 22.0     706  0.000000 14.471    18.3
637   MA 22.0     711  0.000000 15.603    12.4
638   MA 22.6     731  1.246106 25.792     2.5
639   MA 23.5     699  0.000000 16.189     6.8
640   MA 27.0     664 10.798017 15.581    70.0
aggregate can be used to split a data set into parts and apply a function to each of the parts.
aggregate(cama[,2:6],list(cama$type),mean)

  Group.1      str  testscr     elpct   avginc  mealpct
1      CA 19.64043 654.1565 15.768155 15.31659 44.70524
2      MA 17.34409 709.8273  1.117676 18.74676 15.31591

aggregate(cama[,2:6],list(cama$type),sd)

  Group.1      str  testscr    elpct   avginc  mealpct
1      CA 1.891812 19.05335 18.28593 7.225890 27.12338
2      MA 2.276666 15.12647  2.90094 5.807637 15.06007

aggregate(cama[,2:6],list(cama$type),length)

  Group.1 str testscr elpct avginc mealpct
1      CA 420     420   420    420     420
2      MA 220     220   220    220     220
subset selects a subset of a data set. If a function supports the parameter data, we can supply it with an
appropriate subset. Many functions also have a parameter subset which selects a subset directly. ylim
defines the scale of the y-axis and pch defines which symbol is used to plot a point.
attach(cama)
plot(testscr ~ avginc,subset=(type=="MA"),col="blue",pch=3,ylim=c(600,750))
points(testscr ~ avginc,subset=(type=="CA"),col="red")
legend("bottomright",c("MASS","CA"),pch=c(3,1),col=c("blue","red"))
[Figure: testscr vs. avginc; MA as blue crosses, CA as red circles]
myPlot(avginc)
myPlot(str)
myPlot(elpct)
myPlot(mealpct)
[Figure: testscr plotted against avginc, str, elpct, and mealpct, for MA and CA]
estC <- lm(testscr ~ avginc, data=subset(cama,type=="CA"))

Call:
lm(formula = testscr ~ avginc, data = subset(cama, type == "CA"))

Coefficients:
(Intercept)       avginc  
    625.384        1.879  

estM <- lm(testscr ~ avginc, data=subset(cama,type=="MA"))

Call:
lm(formula = testscr ~ avginc, data = subset(cama, type == "MA"))

Coefficients:
(Intercept)       avginc  
    679.387        1.624  
myPlot(avginc)
abline(estC,col="red")
abline(estM,col="blue")
[Figure: testscr vs. avginc for MA and CA, with the two regression lines]
If a function has the ... parameter in its definition, we can supply additional parameters when calling
the function. These additional parameters are substituted where ... appears within the function.
ePlot <- function(model,data,...) {
  est <- lm(model,data)
  stdev <- sqrt(diag(hccm(est)))
  pvalue <- round(2*pnorm(-abs(coef(est)/stdev)),4)
  stars <- ifelse(pvalue<.001,"***",ifelse(pvalue<.01,"**",
           ifelse(pvalue<.05,"*",ifelse(pvalue<.1,".",""))))
  a <- as.data.frame(cbind(coef(est),stdev,pvalue))
  a$stars <- stars
  colnames(a)[1] <- "beta"
  print(a,digits=3)
  sum <- summary(est)
  cat("R2= ",round(sum$r.squared,2),"\n")
  or <- order(data$avginc)
  if (substr(model[2],1,4)=="log(") {
    lines(data$avginc[or],exp(fitted(est)[or]),...)
  } else
    lines(data$avginc[or],fitted(est)[or],...)
  est
}
myPlot(avginc)
est<-ePlot(testscr ~ avginc + I(avginc^2),data=subset(cama,type=="CA"))
                beta   stdev pvalue stars
(Intercept) 607.3017 2.92422      0   ***
avginc        3.8510 0.27110      0   ***
I(avginc^2)  -0.0423 0.00488      0   ***
R2=  0.56
est<-ePlot(testscr ~ avginc + I(avginc^2),data=subset(cama,type=="MA"))
                beta  stdev pvalue stars
(Intercept) 638.3711 8.0401      0   ***
avginc        5.4703 0.6893      0   ***
I(avginc^2)  -0.0808 0.0136      0   ***
R2=  0.48
est<-ePlot(testscr ~ avginc + I(avginc^2)+ I(avginc^3),
data=subset(cama,type=="CA"),col="red")
                  beta    stdev pvalue stars
(Intercept) 600.078985 5.462310 0.0000   ***
avginc        5.018677 0.787290 0.0000   ***
I(avginc^2)  -0.095805 0.034052 0.0049    **
I(avginc^3)   0.000685 0.000437 0.1167      
R2=  0.56
[Figure: testscr vs. avginc for MA and CA with the quadratic and cubic fits]
Multiple regression:
myPlot(avginc)
est<-ePlot(testscr ~ str,data=subset(cama,type=="CA"))
              beta  stdev pvalue stars
(Intercept) 698.93 10.461      0   ***
str          -2.28  0.524      0   ***
R2=  0.05
est<-ePlot(testscr ~ str,data=subset(cama,type=="MA"))
              beta stdev pvalue stars
(Intercept) 739.62 8.882 0.0000   ***
str          -1.72 0.516 0.0009   ***
R2=  0.07
myPlot(avginc)
est<-ePlot(testscr ~ str + elpct + mealpct + log(avginc),
data=subset(cama,type=="CA"))
               beta  stdev pvalue stars
(Intercept) 658.552 8.7489 0.0000   ***
str          -0.734 0.2606 0.0048    **
elpct        -0.176 0.0342 0.0000   ***
mealpct      -0.398 0.0336 0.0000   ***
log(avginc)  11.569 1.8413 0.0000   ***
R2=  0.8
[Figure: testscr vs. avginc for MA and CA with the fitted curves]
est<-ePlot(testscr ~ str + elpct + mealpct + log(avginc),
data=subset(cama,type=="MA"))

               beta   stdev pvalue stars
(Intercept) 682.432 12.0943 0.0000   ***
str          -0.689  0.2779 0.0131     *
elpct        -0.411  0.3512 0.2422      
mealpct      -0.521  0.0834 0.0000   ***
log(avginc)  16.529  3.3010 0.0000   ***
R2=  0.68
avginc      -3.06669 2.53398 0.2262      
I(avginc^2)  0.16369 0.09172 0.0743     .
I(avginc^3) -0.00218 0.00104 0.0370     *
R2=  0.69
linearHypothesis(est,c("I(avginc^2)=0","I(avginc^3)=0"),vcov=hccm)

Linear hypothesis test

Hypothesis:
I(avginc^2) = 0
I(avginc^3) = 0

Model 1: restricted model
Model 2: testscr ~ str + elpct + mealpct + avginc + I(avginc^2) + I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df     F   Pr(>F)   
1    215                     
2    213  2 6.227 0.002354 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
myPlot(avginc)
est<-ePlot(testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct +
avginc + I(avginc^2) + I(avginc^3),data=subset(cama,type=="CA"))
                  beta      stdev pvalue stars
(Intercept) 330.079169 173.293815 0.0568     .
str          55.618325  26.486559 0.0357     *
I(str^2)     -2.914810   1.340721 0.0297     *
I(str^3)      0.049866   0.022437 0.0262     *
elpct        -0.196440   0.035054 0.0000   ***
mealpct      -0.411538   0.033874 0.0000   ***
avginc       -0.912858   0.587802 0.1204      
I(avginc^2)   0.067430   0.022781 0.0031    **
I(avginc^3)  -0.000826   0.000262 0.0016    **
R2=  0.81
est<-ePlot(testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct +
avginc + I(avginc^2) + I(avginc^3),data=subset(cama,type=="MA"))
                 beta     stdev pvalue stars
(Intercept) 665.49605 116.07834 0.0000   ***
str          12.42598  20.27945 0.5401      
I(str^2)     -0.68030   1.12956 0.5470      
I(str^3)      0.01147   0.02081 0.5814      
elpct        -0.43417   0.36722 0.2371      
mealpct      -0.58722   0.11724 0.0000   ***
avginc       -3.38154   2.74013 0.2172      
I(avginc^2)   0.17410   0.09819 0.0762     .
I(avginc^3)  -0.00229   0.00111 0.0398     *
R2=  0.69
linearHypothesis(est,c("str=0","I(str^2)","I(str^3)"),vcov=hccm)
Linear hypothesis test
Hypothesis:
str = 0
I(str^2) = 0
I(str^3) = 0
Model 1: restricted model
Model 2: testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct + avginc +
I(avginc^2) + I(avginc^3)
Note: Coefficient covariance matrix supplied.
  Res.Df Df      F  Pr(>F)  
1    214                    
2    211  3 2.3364 0.07478 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linearHypothesis(est,c("I(str^2)=0","I(str^3)=0"),vcov=hccm)
  Res.Df Df      F Pr(>F)
1    213                 
2    211  2 0.3396 0.7124
linearHypothesis(est,c("I(avginc^2)=0","I(avginc^3)=0"),vcov=hccm)
Linear hypothesis test
Hypothesis:
I(avginc^2) = 0
I(avginc^3) = 0
Model 1: restricted model
Model 2: testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct + avginc +
I(avginc^2) + I(avginc^3)
Note: Coefficient covariance matrix supplied.
  Res.Df Df      F   Pr(>F)   
1    213                      
2    211  2 5.7043 0.003866 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aggregate(elpct,list(type),median)

  Group.1        x
1      CA 8.777634
2      MA 0.000000
cama$HiEL=cama$elpct>0
est<-ePlot(testscr ~ str + HiEL + HiEL:str + mealpct + avginc +
I(avginc^2) + I(avginc^3),data=subset(cama,type=="CA"))
                   beta     stdev pvalue stars
(Intercept)   658.16110 16.404309 0.0000   ***
str             1.35471  0.810345 0.0946     .
HiELTRUE       36.11583 16.262280 0.0264     *
mealpct        -0.50670  0.027027 0.0000   ***
avginc         -0.90724  0.616350 0.1410      
I(avginc^2)     0.05912  0.023531 0.0120     *
I(avginc^3)    -0.00068  0.000272 0.0123     *
str:HiELTRUE   -2.18763  0.864613 0.0114     *
R2=  0.8
est<-ePlot(testscr ~ str + HiEL + HiEL:str + mealpct + avginc +
I(avginc^2) + I(avginc^3),data=subset(cama,type=="MA"))
                  beta    stdev pvalue stars
(Intercept)  759.91422 25.28938 0.0000   ***
str           -1.01768  0.38182 0.0077    **
HiELTRUE     -12.56073 10.22789 0.2194      
mealpct       -0.70851  0.09894 0.0000   ***
avginc        -3.86651  2.71955 0.1551      
I(avginc^2)    0.18412  0.09930 0.0637     .
I(avginc^3)   -0.00234  0.00115 0.0414     *
str:HiELTRUE   0.79861  0.58020 0.1687      
R2=  0.69
linearHypothesis(est,c("str=0","str:HiELTRUE=0"),vcov=hccm)
201
c Oliver Kirchkamp
650
700
750
MASS
CA
600
testscr
c Oliver Kirchkamp
202
10
20
30
40
var
I(str^2)
-0.68030
I(str^3)
0.01147
elpct
-0.43417
mealpct
-0.58722
avginc
-3.38154
I(avginc^2)
0.17410
I(avginc^3) -0.00229
R2=
0.69
1.12956
0.02081
0.36722
0.11724
2.74013
0.09819
0.00111
0.5470
0.5814
0.2371
0.0000
0.2172
0.0762
0.0398
***
.
*
linearHypothesis(est,c("str=0","I(str^2)","I(str^3)"),vcov=hccm)
Linear hypothesis test

Hypothesis:
str = 0
I(str^2) = 0
I(str^3) = 0

Model 1: restricted model
Model 2: testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct + avginc +
    I(avginc^2) + I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F  Pr(>F)
1    214
2    211  3 2.3364 0.07478 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linearHypothesis(est,c("I(str^2)=0","I(str^3)=0"),vcov=hccm)
Linear hypothesis test

Hypothesis:
I(str^2) = 0
I(str^3) = 0

Model 1: restricted model
Model 2: testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct + avginc +
    I(avginc^2) + I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(>F)
1    213
2    211  2 0.3396 0.7124
linearHypothesis(est,c("I(avginc^2)=0","I(avginc^3)=0"),vcov=hccm)
Linear hypothesis test

Hypothesis:
I(avginc^2) = 0
I(avginc^3) = 0

Model 1: restricted model
Model 2: testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct + avginc +
    I(avginc^2) + I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F   Pr(>F)
1    213
2    211  2 5.7043 0.003866 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aggregate(elpct,list(type),median)
  Group.1        x
1      CA 8.777634
2      MA 0.000000
cama$HiEL=cama$elpct>0
est<-ePlot(testscr ~ str + HiEL + HiEL:str + mealpct + avginc +
    I(avginc^2) + I(avginc^3),data=subset(cama,type=="CA"))
                  beta     stdev pvalue stars
(Intercept)  658.16110 16.404309 0.0000   ***
str            1.35471  0.810345 0.0946     .
HiELTRUE      36.11583 16.262280 0.0264     *
mealpct       -0.50670  0.027027 0.0000   ***
avginc        -0.90724  0.616350 0.1410
I(avginc^2)    0.05912  0.023531 0.0120     *
I(avginc^3)   -0.00068  0.000272 0.0123     *
str:HiELTRUE  -2.18763  0.864613 0.0114     *
R2= 0.8
est<-ePlot(testscr ~ str + HiEL + HiEL:str + mealpct + avginc +
    I(avginc^2) + I(avginc^3),data=subset(cama,type=="MA"))
                  beta    stdev pvalue stars
(Intercept)  759.91422 25.28938 0.0000   ***
str           -1.01768  0.38182 0.0077    **
HiELTRUE     -12.56073 10.22789 0.2194
mealpct       -0.70851  0.09894 0.0000   ***
avginc        -3.86651  2.71955 0.1551
I(avginc^2)    0.18412  0.09930 0.0637     .
I(avginc^3)   -0.00234  0.00115 0.0414     *
str:HiELTRUE   0.79861  0.58020 0.1687
R2= 0.69
                 beta    stdev pvalue stars
(Intercept) 747.36389 21.67952 0.0000   ***
str          -0.67188  0.27679 0.0152     *
mealpct      -0.65308  0.07859 0.0000   ***
avginc       -3.21795  2.46635 0.1920
I(avginc^2)   0.16479  0.09113 0.0706     .
I(avginc^3)  -0.00216  0.00106 0.0415     *
R2= 0.68
linearHypothesis(est,c("I(avginc^2)=0","I(avginc^3)=0"),vcov=hccm)
  Res.Df Df      F  Pr(>F)
1    216
2    214  2 4.2776 0.01508 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Comparing the mean values and standard deviations of testscr in California and
Massachusetts:
(mmean<-aggregate(cama$testscr,list(type),mean))
  Group.1        x
1      CA 654.1565
2      MA 709.8273
colnames(mmean)<-c("type","testmean")
(msd=aggregate(cama$testscr,list(type),sd))
  Group.1        x
1      CA 19.05335
2      MA 15.12647
colnames(msd)<-c("type","testsd")
cama2=merge(merge(cama,mmean),msd)
head(cama2)
  type      str testscr    elpct avginc mealpct  HiEL testmean   testsd
1   CA 14.00000  635.60 0.000000 10.656 68.8235 FALSE 654.1565 19.05335
2   CA 14.20176  656.50 0.000000 13.712 20.0000 FALSE 654.1565 19.05335
3   CA 14.54214  695.30 3.765690 35.342  0.0000  TRUE 654.1565 19.05335
4   CA 14.70588  666.85 2.500000 11.826 53.5032  TRUE 654.1565 19.05335
[ reached getOption("max.print") -- omitted 2 rows ]
tail(cama2)
    type  str testscr    elpct avginc mealpct  HiEL testmean   testsd
635   MA 21.9     691 2.816901 15.905    27.1  TRUE 709.8273 15.12647
636   MA 22.0     706 0.000000 14.471    18.3 FALSE 709.8273 15.12647
637   MA 22.0     711 0.000000 15.603    12.4 FALSE 709.8273 15.12647
638   MA 22.6     731 1.246106 25.792     2.5  TRUE 709.8273 15.12647
[ reached getOption("max.print") -- omitted 2 rows ]
cama2$testnorm=(cama2$testscr - cama2$testmean) / cama2$testsd
detach(cama)
myPlot(avginc)
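As an aside, the within-state standardisation carried out above can be sketched on a small hypothetical data frame (the names d, score, gmean, gsd, norm below are invented for this illustration; cama itself is not needed):

```r
# Hypothetical toy data standing in for cama:
d <- data.frame(type  = rep(c("CA","MA"), each = 3),
                score = c(640, 650, 660, 700, 710, 720))
# group means and standard deviations, as with mmean and msd above
m <- aggregate(d$score, list(d$type), mean); colnames(m) <- c("type","gmean")
s <- aggregate(d$score, list(d$type), sd);   colnames(s) <- c("type","gsd")
# merge them back into the data and standardise within each group
d2 <- merge(merge(d, m), s)
d2$norm <- (d2$score - d2$gmean) / d2$gsd
# within each group, norm now has mean 0 and standard deviation 1
```

This mirrors the testnorm construction: merging the group statistics back into the data frame lets the standardisation be written as a single vectorised expression.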
est<-ePlot(testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct +
    avginc + I(avginc^2) + I(avginc^3),data=subset(cama,type=="CA"))
                  beta      stdev pvalue stars
(Intercept) 330.079169 173.293815 0.0568     .
str          55.618325  26.486559 0.0357     *
I(str^2)     -2.914810   1.340721 0.0297     *
I(str^3)      0.049866   0.022437 0.0262     *
elpct        -0.196440   0.035054 0.0000   ***
mealpct      -0.411538   0.033874 0.0000   ***
avginc       -0.912858   0.587802 0.1204
I(avginc^2)   0.067430   0.022781 0.0031    **
I(avginc^3)  -0.000826   0.000262 0.0016    **
R2= 0.81
linearHypothesis(est,c("str=0","str:HiELTRUE=0"),vcov=hccm)
Linear hypothesis test

Hypothesis:
str = 0
str:HiELTRUE = 0

Model 1: restricted model
Model 2: testscr ~ str + HiEL + HiEL:str + mealpct + avginc + I(avginc^2) +
    I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(>F)
1    214
2    212  2 3.7663 0.0247 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linearHypothesis(est,c("I(avginc^2)=0","I(avginc^3)=0"),vcov=hccm)
Linear hypothesis test

Hypothesis:
I(avginc^2) = 0
I(avginc^3) = 0

Model 1: restricted model
Model 2: testscr ~ str + HiEL + HiEL:str + mealpct + avginc + I(avginc^2) +
    I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(>F)
1    214
2    212  2 3.2201 0.04191 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
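The logic behind these tests — comparing a restricted with an unrestricted model — can also be illustrated in base R with anova() on simulated data. Note this sketch uses homoskedastic standard errors, unlike the hccm-based tests above, and all names below (x, y, res, unres) are invented:

```r
# Simulated data: the true relation is linear by construction
set.seed(1)
n <- 200
x <- runif(n, 10, 50)
y <- 650 - 0.5 * x + rnorm(n, sd = 15)
unres <- lm(y ~ x + I(x^2) + I(x^3))   # unrestricted model
res   <- lm(y ~ x)                     # restricted model: I(x^2)=0, I(x^3)=0
anova(res, unres)                      # F test of the two restrictions
```

As in the linearHypothesis output, the F statistic measures how much the fit deteriorates when the restrictions are imposed, with Df equal to the number of restrictions.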
Functional form:

Simultaneous causality:
Massachusetts: no measures. ✓

6.4.3 Result

© Oliver Kirchkamp