Panel Data Lecture Rome

Dynamic Panel Data Methods
Lecture II
Microeconometrics Lectures
Richard Blundell
UCL and IFS
March 2005
Background
The standard panel data model is
$$y_{it} = \beta_0 + x_{it1}\beta_1 + x_{it2}\beta_2 + \dots + x_{itk}\beta_k + \eta_i + v_{it} = x_{it}'\beta + \eta_i + v_{it},$$
where the $\eta_i$ are the unobserved constant individual effects, and $i = 1, \dots, N$; $t = 1, \dots, T$, with $N$ large and $T$ small.
Often lagged values of $y$ are included in $x$.
An Example: Company Investment Rates
The panel data model is
$$\left(\frac{I}{K}\right)_{it} = \beta \left(\frac{I}{K}\right)_{i,t-1} + \eta_i + \lambda_t + v_{it}.$$
Unbalanced panel of company-level data; $T = 4$ to $10$, $N = 700$.
Example
Coefficient | OLS Levels | Within Groups | DIF 2SLS
$(I/K)_{i,t-1}$ | 0.2669 (.0185) | -0.0094 (.0181) | 0.1626 (.0362)
Instruments | | | $(I/K)_{i,t-2}$
(Standard errors in parentheses.)
STATA command for GMM: xtabond2
f.windmeijer@ifs.org.uk
On the CeMMAP website http://cemmap.ifs.org.uk/ (resources page), the Windmeijer course is available together with the computer exercises and some of the data sets.
Three common specifications to deal with $\eta_i$:
1. Random effects
2. Fixed effects
3. First differences
In the model
$$y_{it} = x_{it}'\beta + u_{it}, \qquad u_{it} = \eta_i + v_{it},$$
we assume that
$$E(v_{it}) = 0; \qquad E(v_{it} \mid x_{it}) = 0.$$
The Random Effects specification further assumes that
$$E(\eta_i) = 0; \qquad E(\eta_i \mid x_{it}) = 0,$$
i.e. it assumes that the individual effect $\eta_i$ is uncorrelated with the regressors $x_{it}$. Therefore
$$E(y_{it} \mid x_{it}) = x_{it}'\beta + E(\eta_i \mid x_{it}) + E(v_{it} \mid x_{it}) = x_{it}'\beta,$$
and the simple OLS estimator on the pooled data is unbiased. However, it is not efficient, and the usual OLS standard errors are wrong, because they do not take account of the dependence of the error term within individuals over time.
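Let $u_{it} = \eta_i + v_{it}$, and assume that $v_{is}$ and $v_{it}$ are independent for $s \neq t$, and that $\eta_i$ is independent of the $v_{it}$. Then
$$E(u_{is} u_{it}) = E(\eta_i^2) = \sigma_\eta^2, \qquad s \neq t,$$
and therefore $u_{is}$ and $u_{it}$ are correlated. With $u_i' = (u_{i1}, u_{i2}, \dots, u_{iT})$, the within-individual variance-covariance matrix is
$$\Omega = E(u_i u_i') = \begin{pmatrix} \sigma_\eta^2 + \sigma_v^2 & \sigma_\eta^2 & \cdots & \sigma_\eta^2 \\ \sigma_\eta^2 & \sigma_\eta^2 + \sigma_v^2 & \cdots & \sigma_\eta^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\eta^2 & \sigma_\eta^2 & \cdots & \sigma_\eta^2 + \sigma_v^2 \end{pmatrix},$$
and the Random Effects (GLS) estimator is
$$\hat\beta_{RE} = \left(\sum_{i=1}^N X_i' \Omega^{-1} X_i\right)^{-1} \sum_{i=1}^N X_i' \Omega^{-1} y_i.$$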
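The GLS formula can be sketched in NumPy as follows. This is a minimal sketch under several assumptions. The panel is balanced and stored as `y` of shape (N, T) and `X` of shape (N, T, K). The variance components $\sigma_\eta^2$ and $\sigma_v^2$ are treated as known, whereas a feasible RE estimator would estimate them first. The function names `re_covariance` and `re_gls` and the simulated design are illustrative, not taken from the lecture.

```python
import numpy as np


def re_covariance(T, sigma_eta2, sigma_v2):
    """Omega = sigma_eta^2 * (T x T matrix of ones) + sigma_v^2 * I_T."""
    return sigma_eta2 * np.ones((T, T)) + sigma_v2 * np.eye(T)


def re_gls(y, X, sigma_eta2, sigma_v2):
    """GLS with known variance components.

    y: (N, T) outcomes; X: (N, T, K) regressors (include a column of ones for beta_0).
    """
    T = y.shape[1]
    omega_inv = np.linalg.inv(re_covariance(T, sigma_eta2, sigma_v2))
    # sum_i X_i' Omega^{-1} X_i  and  sum_i X_i' Omega^{-1} y_i
    xox = np.einsum("itk,ts,isl->kl", X, omega_inv, X)
    xoy = np.einsum("itk,ts,is->k", X, omega_inv, y)
    return np.linalg.solve(xox, xoy)


# Hypothetical simulation in which eta_i is independent of x_it (the RE assumption).
rng = np.random.default_rng(0)
N, T = 2000, 5
x = rng.normal(size=(N, T))
X = np.stack([np.ones((N, T)), x], axis=2)
eta = rng.normal(size=(N, 1))
y = 1.0 + 0.5 * x + eta + rng.normal(size=(N, T))
beta_re = re_gls(y, X, sigma_eta2=1.0, sigma_v2=1.0)
print(beta_re)
```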
Fixed Effects
The more likely and interesting case is when the unobserved individual effects are correlated with the regressors:
$$E(\eta_i \mid x_{it}) \neq 0.$$
Clearly, in this case OLS and the Random Effects estimator are biased and inconsistent, as
$$E(y_{it} \mid x_{it}) = x_{it}'\beta + E(\eta_i \mid x_{it}) + E(v_{it} \mid x_{it}) = x_{it}'\beta + E(\eta_i \mid x_{it}) \neq x_{it}'\beta.$$
A solution is to estimate the model by OLS with a separate intercept for every individual. As
$$\eta_i = \bar y_i - \bar x_i'\beta - \bar v_i,$$
where bars denote individual means over time, this is equivalent, for the $\beta$ parameters, to estimating the transformed, within-group model by OLS:
$$y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + (v_{it} - \bar v_i).$$
Therefore, with the fixed effects, or within-group, estimator only the effects of variables that change over time can be estimated.
(OLS standard errors in this model are again wrong, as they ignore the fact that $N$ intercepts have been estimated.)
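The within-group transformation can be illustrated with a small simulation in which $\eta_i$ is deliberately correlated with $x_{it}$. This is a sketch with a hypothetical data-generating process and an illustrative function name `within_groups`. It shows pooled OLS picking up the correlation with $\eta_i$, while demeaning each individual's data removes it.

```python
import numpy as np


def within_groups(y, X):
    """Within-group (fixed effects) estimator.

    y: (N, T); X: (N, T, K) time-varying regressors only (a constant is swept out).
    """
    N, T, K = X.shape
    y_dm = y - y.mean(axis=1, keepdims=True)      # y_it - ybar_i
    X_dm = X - X.mean(axis=1, keepdims=True)      # x_it - xbar_i
    beta, *_ = np.linalg.lstsq(X_dm.reshape(N * T, K), y_dm.reshape(N * T), rcond=None)
    return beta


rng = np.random.default_rng(1)
N, T = 2000, 5
eta = rng.normal(size=(N, 1))
x = 0.8 * eta + rng.normal(size=(N, T))           # x_it correlated with eta_i
y = 0.5 * x + eta + rng.normal(size=(N, T))       # true beta = 0.5

# Pooled OLS with an intercept ignores eta_i and is biased upwards in this design.
Xp = np.column_stack([np.ones(N * T), x.ravel()])
b_pooled = np.linalg.lstsq(Xp, y.ravel(), rcond=None)[0][1]
b_wg = within_groups(y, x[:, :, None])[0]
print(b_pooled, b_wg)
```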
For the fixed effects estimator to be unbiased, the $x_{it}$ in all periods must be uncorrelated with the $v_{is}$ in all periods:
$$E(v_{is} x_{it}) = 0, \qquad s = 1, \dots, T; \; t = 1, \dots, T.$$
When $x_{it}$ satisfies this condition, we call it strictly exogenous.
Assuming strict exogeneity, the Hausman test can be used to test whether the unobserved heterogeneity is correlated with the regressors. When they are not correlated, the RE estimator is efficient. If they are correlated, the FE estimator is consistent, but the RE estimator is not. The statistic is
$$H = (\hat\beta_{FE} - \hat\beta_{RE})' \left[Var(\hat\beta_{FE}) - Var(\hat\beta_{RE})\right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}).$$
If $H$ is large, RE is rejected in favour of FE. In large samples $H \sim \chi^2_k$, with $k$ the number of elements in $\beta$.
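The Hausman statistic is a quadratic form in the coefficient contrast. The sketch below computes it. The numbers in the example are hypothetical, not estimates from the lecture. The variance difference need not be positive definite in finite samples, in which case $H$ can come out negative or undefined.

```python
import numpy as np


def hausman(b_fe, b_re, V_fe, V_re):
    """H = (b_FE - b_RE)' [V_FE - V_RE]^{-1} (b_FE - b_RE), compared with chi2(k)."""
    d = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    V = np.atleast_2d(np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float))
    return float(d @ np.linalg.solve(V, d))


# Hypothetical scalar example: FE estimate 0.30 (s.e. 0.03), RE estimate 0.40 (s.e. 0.02).
H = hausman(0.30, 0.40, 0.03**2, 0.02**2)
print(H)  # about 20, far above the 5% critical value of chi2(1), 3.84
```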
First Differencing
Again consider the model
$$y_{it} = x_{it}'\beta + u_{it}, \qquad u_{it} = \eta_i + v_{it},$$
where the unobserved individual effects $\eta_i$ are correlated with $x_{it}$.
Taking first differences eliminates $\eta_i$:
$$y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (u_{it} - u_{i,t-1}) = (x_{it} - x_{i,t-1})'\beta + (v_{it} - v_{i,t-1}).$$
Therefore OLS on the differenced equation is unbiased if $(v_{it} - v_{i,t-1})$ and $(x_{it} - x_{i,t-1})$ are uncorrelated. This is a weaker assumption than the strict exogeneity assumption of the fixed effects estimator.
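The first-difference estimator is simply OLS on differenced data without an intercept. A minimal sketch, using the same kind of hypothetical design as a within-groups illustration: $\eta_i$ is correlated with $x_{it}$, but $x_{it}$ is unrelated to $v_{it}$. The function name `first_difference` is illustrative.

```python
import numpy as np


def first_difference(y, X):
    """OLS on first differences. y: (N, T); X: (N, T, K). eta_i drops out of the differences."""
    dy = np.diff(y, axis=1)            # y_it - y_i,t-1, shape (N, T-1)
    dX = np.diff(X, axis=1)            # x_it - x_i,t-1, shape (N, T-1, K)
    N, Tm1, K = dX.shape
    beta, *_ = np.linalg.lstsq(dX.reshape(N * Tm1, K), dy.reshape(N * Tm1), rcond=None)
    return beta


rng = np.random.default_rng(2)
N, T = 2000, 5
eta = rng.normal(size=(N, 1))
x = 0.8 * eta + rng.normal(size=(N, T))      # correlated with eta_i, unrelated to v_it
y = 0.5 * x + eta + rng.normal(size=(N, T))
b_fd = first_difference(y, x[:, :, None])[0]
print(b_fd)
```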
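Again, the OLS estimated standard errors are wrong, as they do not take account of the correlation between $(v_{it} - v_{i,t-1})$ and $(v_{i,t-1} - v_{i,t-2})$:
$$E\big[(v_{it} - v_{i,t-1})(v_{i,t-1} - v_{i,t-2})\big] = -\sigma_v^2,$$
so that, for the vector of differenced errors $\Delta v_i$,
$$E(\Delta v_i \Delta v_i') = \sigma_v^2 \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix}$$
(when the $v_{it}$ themselves are not correlated over time).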
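The tridiagonal matrix follows from writing the differenced errors as $D v_i$, where $D$ is the $(T-1) \times T$ differencing operator. If $E(v_i v_i') = \sigma_v^2 I$, then $E(\Delta v_i \Delta v_i') = \sigma_v^2 D D'$. The sketch below checks this algebraically and by simulation, assuming homoscedastic, serially uncorrelated normal $v_{it}$. `diff_operator` is an illustrative name.

```python
import numpy as np


def diff_operator(T):
    """(T-1) x T matrix D such that D @ v stacks v_t - v_{t-1}, t = 2..T."""
    D = np.zeros((T - 1, T))
    rows = np.arange(T - 1)
    D[rows, rows] = -1.0
    D[rows, rows + 1] = 1.0
    return D


T, sigma_v2 = 5, 1.0
D = diff_operator(T)
cov_theory = sigma_v2 * D @ D.T       # 2 on the diagonal, -1 next to it, 0 elsewhere

# Monte Carlo check with serially uncorrelated, homoscedastic v_it.
rng = np.random.default_rng(3)
v = rng.normal(scale=np.sqrt(sigma_v2), size=(200_000, T))
dv = v @ D.T                          # each row is Delta v_i'
cov_sim = dv.T @ dv / dv.shape[0]
print(np.round(cov_sim, 2))
```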
Endogenous Variables
Consider again the model in first differences:
$$y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (v_{it} - v_{i,t-1}).$$
$x_{it}$ is endogenous if it is correlated with $v_{it}$.
There can also be feedback from $v_{i,t-1}$ to $x_{it}$, such that $E(x_{it} v_{i,t-1}) \neq 0$. In this case we call $x_{it}$ predetermined or weakly exogenous.
In both cases $E[(x_{it} - x_{i,t-1})(v_{it} - v_{i,t-1})] \neq 0$ and OLS is biased.
If the $v_{it}$ are not correlated over time, lagged values of $x_{it}$ can be used as instruments for the endogenous differences, and the model can be estimated by the Instrumental Variables estimator.
If $x_{it}$ is endogenous, $E(x_{it} v_{it}) \neq 0$ and $E(x_{i,t-1} v_{i,t-1}) \neq 0$. Valid instruments are $x_{is}$, $s = 1, \dots, t-2$, as $E[x_{i,t-2}(v_{it} - v_{i,t-1})] = 0$.
If $x_{it}$ is predetermined, $E(x_{it} v_{i,t-1}) \neq 0$ but $E(x_{i,t-1} v_{i,t-1}) = 0$. Valid instruments therefore are $x_{is}$, $s = 1, \dots, t-1$.
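The sketch below simulates the endogenous case with a hypothetical design. In it, $x_{it}$ loads on the contemporaneous $v_{it}$ and is persistent, so OLS on first differences is inconsistent. Instrumenting $(x_{it} - x_{i,t-1})$ with $x_{i,t-2}$ recovers $\beta$. It uses a single just-identified instrument rather than the full set $x_{i1}, \dots, x_{i,t-2}$, purely to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, burn = 20_000, 6, 50
beta, rho = 0.5, 0.5

# Hypothetical DGP: x_it is endogenous because it loads on the contemporaneous v_it.
eta = rng.normal(size=N)
v = rng.normal(size=(N, T + burn))
x = np.zeros((N, T + burn))
for t in range(1, T + burn):
    x[:, t] = rho * x[:, t - 1] + 0.3 * eta + 0.5 * v[:, t] + rng.normal(size=N)
y = beta * x + eta[:, None] + v
x, y = x[:, burn:], y[:, burn:]          # drop burn-in periods

# OLS on first differences: inconsistent because E[dx * dv] != 0.
dx_all, dy_all = np.diff(x, axis=1), np.diff(y, axis=1)
b_fd_ols = np.sum(dx_all * dy_all) / np.sum(dx_all**2)

# Just-identified IV on first differences: x_i,t-2 instruments (x_it - x_i,t-1).
dy = y[:, 2:] - y[:, 1:-1]
dx = x[:, 2:] - x[:, 1:-1]
z = x[:, :-2]
b_fd_iv = np.sum(z * dy) / np.sum(z * dx)
print(b_fd_ols, b_fd_iv)
```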
Treatment Effects in Panels
Suppose the model is
$$y_{it} = \alpha_i d_{it} + x_{it}'\beta + \lambda_t + \eta_i + v_{it},$$
where $d_{it} = 1$ if the program impacts on group $i$ in period $t$. Typically, once the program is in place, this dummy is set to unity for all remaining time periods.
If the time effects, the group effects and the $x$ are sufficient to render $d_{it}$ exogenous, then within groups (fixed effects) will be consistent for the average impact of the treatment on the treated (ATT).
In this case, if the treatment occurs at the same time for all treated groups, then diff-in-diff and within groups are identical estimators.
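The equivalence with common treatment timing can be checked numerically. The sketch makes several assumptions: a balanced panel, no $x$ regressors, and hypothetical simulated data with heterogeneous impacts $\alpha_i$. Within groups with time effects is implemented by the two-way demeaning $z_{it} - \bar z_i - \bar z_t + \bar z$. The diff-in-diff is computed from the four cell means. The function names are illustrative.

```python
import numpy as np


def two_way_demean(z):
    """Within transformation for individual and time effects (balanced panel)."""
    return z - z.mean(axis=1, keepdims=True) - z.mean(axis=0, keepdims=True) + z.mean()


def twfe_effect(y, d):
    """OLS of the two-way demeaned outcome on the two-way demeaned treatment dummy."""
    yd, dd = two_way_demean(y), two_way_demean(d)
    return np.sum(dd * yd) / np.sum(dd * dd)


def diff_in_diff(y, treated, post):
    """(treated post - treated pre) - (control post - control pre), using cell means."""
    def cell(g, p):
        return y[np.ix_(g, p)].mean()
    return (cell(treated, post) - cell(treated, ~post)) - (cell(~treated, post) - cell(~treated, ~post))


rng = np.random.default_rng(5)
N, T = 200, 6
treated = np.arange(N) < 80                   # first 80 groups are treated
post = np.arange(T) >= 3                      # common treatment date
d = np.outer(treated, post).astype(float)     # d_it = 1 from the treatment date onwards
alpha_i = 1.0 + 0.5 * rng.normal(size=N)      # heterogeneous impacts
lam = rng.normal(size=T)
eta = rng.normal(size=N)
y = alpha_i[:, None] * d + lam[None, :] + eta[:, None] + rng.normal(size=(N, T))
print(twfe_effect(y, d), diff_in_diff(y, treated, post))
```

Both numbers should agree up to floating-point rounding. With staggered treatment dates this exact equality would no longer hold.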
Dynamic Panel Data Models
A dynamic panel data model is specified as
$$y_{it} = \alpha y_{i,t-1} + x_{it}'\beta + \eta_i + v_{it}.$$
Consider a model without other explanatory variables:
$$y_{it} = \alpha y_{i,t-1} + \eta_i + v_{it}.$$
Clearly, $y_{i,t-1} = \alpha y_{i,t-2} + \eta_i + v_{i,t-1}$ is correlated with $\eta_i$.
The OLS estimator is biased upwards.
The Fixed Effects estimator is biased downwards (this bias gets smaller for larger $T$).
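The two opposite biases can be seen in a small simulation. This sketch assumes $\sigma_\eta = \sigma_v = 1$, a mean-stationary start $y_{i1} = \eta_i/(1-\alpha) + \varepsilon_{i1}$, $\alpha = 0.5$ and $T = 6$, all chosen for illustration only.

```python
import numpy as np


def simulate_ar1_panel(N, T, alpha, rng):
    """y_it = alpha*y_i,t-1 + eta_i + v_it, sigma_eta = sigma_v = 1, mean-stationary start."""
    eta = rng.normal(size=N)
    y = np.empty((N, T))
    y[:, 0] = eta / (1 - alpha) + rng.normal(scale=1 / np.sqrt(1 - alpha**2), size=N)
    for t in range(1, T):
        y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=N)
    return y


rng = np.random.default_rng(6)
alpha = 0.5
y = simulate_ar1_panel(5000, 6, alpha, rng)
y_t, y_lag = y[:, 1:], y[:, :-1]

# Pooled OLS in levels (with intercept): y_i,t-1 is positively correlated with eta_i.
Xp = np.column_stack([np.ones(y_t.size), y_lag.ravel()])
a_ols = np.linalg.lstsq(Xp, y_t.ravel(), rcond=None)[0][1]

# Within groups: demean y_t and y_lag over each individual's estimation periods.
yt_dm = y_t - y_t.mean(axis=1, keepdims=True)
yl_dm = y_lag - y_lag.mean(axis=1, keepdims=True)
a_wg = np.sum(yl_dm * yt_dm) / np.sum(yl_dm**2)
print(a_ols, a_wg)
```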
For the first-differenced model
$$y_{it} - y_{i,t-1} = \alpha (y_{i,t-1} - y_{i,t-2}) + (v_{it} - v_{i,t-1}),$$
$y_{i,t-1}$ is of course correlated with $v_{i,t-1}$ ($y$ is predetermined), and the OLS estimator in the differenced model is severely downward biased.
Valid instruments for $(y_{i,t-1} - y_{i,t-2})$ are the lagged levels $y_{i,t-2}, y_{i,t-3}, \dots, y_{i1}$, as $E[y_{i,t-2}(v_{it} - v_{i,t-1})] = 0$.
An Instrumental Variables estimator that uses this information optimally is the Generalised Method of Moments (GMM) estimator.
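Before turning to GMM, the simplest use of these instruments is a just-identified IV estimator that instruments $\Delta y_{i,t-1}$ with $y_{i,t-2}$ only. This is the same idea as the "2SLS DIF" column in the investment example. The sketch uses a hypothetical simulated AR(1) panel with a mean-stationary start, $\sigma_\eta = \sigma_v = 1$ and $\alpha = 0.5$.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, alpha = 20_000, 6, 0.5
eta = rng.normal(size=N)
y = np.empty((N, T))
y[:, 0] = eta / (1 - alpha) + rng.normal(scale=1 / np.sqrt(1 - alpha**2), size=N)
for t in range(1, T):
    y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=N)

dy = y[:, 2:] - y[:, 1:-1]          # Delta y_it,     t = 3..T
dy_lag = y[:, 1:-1] - y[:, :-2]     # Delta y_i,t-1
z = y[:, :-2]                       # instrument y_i,t-2

a_fd_ols = np.sum(dy_lag * dy) / np.sum(dy_lag**2)   # severely downward biased
a_fd_iv = np.sum(z * dy) / np.sum(z * dy_lag)         # just-identified IV
print(a_fd_ols, a_fd_iv)
```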
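Let $\Delta v_i$ be the vector of errors for individual $i$ in the first-differenced equation:
$$\Delta v_i = \begin{pmatrix} v_{i3} - v_{i2} \\ v_{i4} - v_{i3} \\ \vdots \\ v_{iT} - v_{i,T-1} \end{pmatrix} = \begin{pmatrix} \Delta y_{i3} - \alpha \Delta y_{i2} \\ \Delta y_{i4} - \alpha \Delta y_{i3} \\ \vdots \\ \Delta y_{iT} - \alpha \Delta y_{i,T-1} \end{pmatrix},$$
and let $Z_i$ be the matrix of instruments for individual $i$:
$$Z_i = \begin{pmatrix} y_{i1} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & y_{i1} & y_{i2} & \cdots & 0 & \cdots & 0 \\ \vdots & & & \ddots & & & \vdots \\ 0 & \cdots & 0 & y_{i1} & y_{i2} & \cdots & y_{i,T-2} \end{pmatrix}.$$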
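The block-diagonal instrument matrix can be built row by row. Row $r$ (0-based) corresponds to the differenced equation for period $t = r + 3$ and holds $y_{i1}, \dots, y_{i,t-2}$. The total number of columns is $(T-1)(T-2)/2$. The function name `ab_instruments` is an illustrative choice.

```python
import numpy as np


def ab_instruments(y_i):
    """Block-diagonal Z_i for one individual; y_i holds the levels y_i1, ..., y_iT.

    Row r (0-based) is the differenced equation for period t = r + 3 and uses
    y_i1, ..., y_i,t-2 as instruments.
    """
    T = len(y_i)
    Z = np.zeros((T - 2, (T - 1) * (T - 2) // 2))
    col = 0
    for r in range(T - 2):
        n_inst = r + 1
        Z[r, col:col + n_inst] = y_i[:n_inst]
        col += n_inst
    return Z


print(ab_instruments(np.array([1.0, 2.0, 3.0, 4.0, 5.0])))
```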
Then
$$E(Z_i' \Delta v_i) = 0,$$
a total of $(T-1)(T-2)/2$ moment conditions.
The GMM estimator uses these moment conditions to estimate the parameters consistently and efficiently in two steps. The one-step estimator minimises
$$J_N(\alpha) = \left(\frac{1}{N}\sum_{i=1}^N Z_i' \Delta v_i\right)' W_N \left(\frac{1}{N}\sum_{i=1}^N Z_i' \Delta v_i\right),$$
where $W_N$ is a weight matrix.
Choosing
$$W_N = \left(\frac{1}{N}\sum_{i=1}^N Z_i' Z_i\right)^{-1}$$
results in the Two-Stage Least Squares estimator.
The one-step GMM estimator uses as the weight matrix
$$W_{N1} = \left(\frac{1}{N}\sum_{i=1}^N Z_i' A_N Z_i\right)^{-1}, \qquad A_N = \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix},$$
and is efficient when the errors are homoscedastic and not correlated over time. This is often too restrictive. However, the one-step results are consistent, and robust standard errors that adjust for heteroscedasticity and autocorrelation are easily obtained.
The two-step estimator is efficient under more general conditions, like heteroscedasticity. The efficient weight matrix is computed as
$$W_{N2} = \left(\frac{1}{N}\sum_{i=1}^N Z_i' \widehat{\Delta v}_i \widehat{\Delta v}_i' Z_i\right)^{-1}, \qquad \widehat{\Delta v}_i = \Delta y_i - \hat\alpha_1 \Delta y_{i,-1},$$
where $\hat\alpha_1$ is the one-step GMM estimator.
A problem is that in small samples (a small number of individuals) the estimated standard errors of the two-step GMM estimator tend to be too small.
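The one- and two-step estimators for the AR(1) model without other regressors can be sketched as follows. This is an illustrative implementation under several assumptions: a balanced panel, no time effects, and a single parameter $\alpha$. It computes point estimates only, with no robust or corrected standard errors. It is not the xtabond2 implementation. The $1/N$ factors in the weight matrices cancel in the estimator, so they are omitted.

```python
import numpy as np


def ab_instruments(y_i):
    """Z_i: the row for period t uses y_i1, ..., y_i,t-2 (block-diagonal)."""
    T = len(y_i)
    Z = np.zeros((T - 2, (T - 1) * (T - 2) // 2))
    col = 0
    for r in range(T - 2):
        Z[r, col:col + r + 1] = y_i[:r + 1]
        col += r + 1
    return Z


def ab_gmm(y):
    """One- and two-step difference GMM for y_it = alpha*y_i,t-1 + eta_i + v_it.

    y: (N, T) balanced panel. Returns (alpha_one_step, alpha_two_step).
    """
    N, T = y.shape
    dy = np.diff(y, axis=1)
    dep, lag = dy[:, 1:], dy[:, :-1]                   # Delta y_it and Delta y_i,t-1, t = 3..T
    Zs = [ab_instruments(y[i]) for i in range(N)]
    zx = sum(Z.T @ lag[i] for i, Z in enumerate(Zs))   # sum_i Z_i' Delta y_i,-1
    zy = sum(Z.T @ dep[i] for i, Z in enumerate(Zs))   # sum_i Z_i' Delta y_i

    def estimate(W):
        # Minimiser of the GMM criterion for a scalar parameter
        return (zx @ W @ zy) / (zx @ W @ zx)

    A = 2 * np.eye(T - 2) - np.eye(T - 2, k=1) - np.eye(T - 2, k=-1)
    W1 = np.linalg.inv(sum(Z.T @ A @ Z for Z in Zs))   # one-step weight
    a1 = estimate(W1)

    resid = dep - a1 * lag                             # one-step residuals
    g = [Z.T @ resid[i] for i, Z in enumerate(Zs)]
    W2 = np.linalg.inv(sum(np.outer(gi, gi) for gi in g))  # two-step weight
    a2 = estimate(W2)
    return a1, a2


# Hypothetical mean-stationary AR(1) panel with sigma_eta = sigma_v = 1.
rng = np.random.default_rng(8)
N, T, alpha = 3000, 7, 0.5
eta = rng.normal(size=N)
y = np.empty((N, T))
y[:, 0] = eta / (1 - alpha) + rng.normal(scale=1 / np.sqrt(1 - alpha**2), size=N)
for t in range(1, T):
    y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=N)
a1, a2 = ab_gmm(y)
print(a1, a2)
```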
Sargan test for overidentifying restrictions:
The null hypothesis for this test is that the instruments are valid, in the sense that they are not correlated with the errors in the first-differenced equation. It is computed as
$$S = N J_N(\hat\alpha_2) = N \left(\frac{1}{N}\sum_{i=1}^N Z_i' \widehat{\Delta v}_{i2}\right)' W_{N2} \left(\frac{1}{N}\sum_{i=1}^N Z_i' \widehat{\Delta v}_{i2}\right),$$
where $\widehat{\Delta v}_{i2}$ are the residuals at the two-step estimate $\hat\alpha_2$.
Under the null, this test statistic has a $\chi^2_q$ distribution, with $q$ equal to the total number of instruments minus the number of parameters in the model.
Only use the two-step result for the Sargan test.
Also test for serial correlation in the errors.
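The Sargan statistic follows directly from the two-step quantities. Writing $s = \sum_i Z_i' \widehat{\Delta v}_{i2}$ and $\widehat W = (\sum_i Z_i' \widehat{\Delta v}_i \widehat{\Delta v}_i' Z_i)^{-1}$ built from one-step residuals, the $N$ factors cancel and $S = s' \widehat W s$. The sketch below assumes a balanced AR(1) panel without other regressors and simulated data in which the instruments are valid, so $S$ should behave like a $\chi^2_q$ draw with $q = (T-1)(T-2)/2 - 1$. Function names are illustrative.

```python
import numpy as np


def ab_instruments(y_i):
    T = len(y_i)
    Z = np.zeros((T - 2, (T - 1) * (T - 2) // 2))
    col = 0
    for r in range(T - 2):
        Z[r, col:col + r + 1] = y_i[:r + 1]
        col += r + 1
    return Z


def ab_sargan(y):
    """Two-step difference GMM for an AR(1) panel plus the Sargan statistic S = N * J_N."""
    N, T = y.shape
    dy = np.diff(y, axis=1)
    dep, lag = dy[:, 1:], dy[:, :-1]
    Zs = [ab_instruments(y[i]) for i in range(N)]
    zx = sum(Z.T @ lag[i] for i, Z in enumerate(Zs))
    zy = sum(Z.T @ dep[i] for i, Z in enumerate(Zs))
    A = 2 * np.eye(T - 2) - np.eye(T - 2, k=1) - np.eye(T - 2, k=-1)
    W1 = np.linalg.inv(sum(Z.T @ A @ Z for Z in Zs))
    a1 = (zx @ W1 @ zy) / (zx @ W1 @ zx)               # one-step estimate
    g = [Z.T @ (dep[i] - a1 * lag[i]) for i, Z in enumerate(Zs)]
    W2 = np.linalg.inv(sum(np.outer(gi, gi) for gi in g))
    a2 = (zx @ W2 @ zy) / (zx @ W2 @ zx)               # two-step estimate
    s = zy - a2 * zx                                   # sum_i Z_i' Delta v_i at a2
    S = float(s @ W2 @ s)
    q = zx.shape[0] - 1                                # instruments minus parameters
    return a2, S, q


rng = np.random.default_rng(9)
N, T, alpha = 3000, 7, 0.5
eta = rng.normal(size=N)
y = np.empty((N, T))
y[:, 0] = eta / (1 - alpha) + rng.normal(scale=1 / np.sqrt(1 - alpha**2), size=N)
for t in range(1, T):
    y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=N)
a2, S, q = ab_sargan(y)
print(a2, S, q)
```

If SciPy is available, a p-value could be obtained as `scipy.stats.chi2.sf(S, q)`.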
An Example: Investment Rates across Firms
The estimated model is
$$\left(\frac{I}{K}\right)_{it} = \lambda_t + \alpha \left(\frac{I}{K}\right)_{i,t-1} + \eta_i + v_{it},$$
and results are presented in Table 1 for OLS, within groups, just-identified Two-Stage Least Squares for the differenced model, with $(I/K)_{i,t-2}$ as an instrument for $\Delta(I/K)_{i,t-1}$, and two GMM estimates of $\alpha$ in the differenced model, one using $(I/K)_{i,t-2}$ and $(I/K)_{i,t-3}$, the other using $(I/K)_{i,t-2}, \dots, (I/K)_{i1}$ as instruments.
 | OLS Levels | Within Groups | 2SLS DIF | GMM1 DIF | GMM1 DIF
$(I/K)_{i,t-1}$ | 0.2669 (.0185) | -0.0094 (.0181) | 0.1626 (.0362) | 0.1593 (.0327) | 0.1560 (.0318)
m1 | -4.71 | -11.36 | -10.56 | -10.91 | -11.12
m2 | 2.52 | -2.02 | 0.61 | 0.52 | 0.46
Sargan (p) | | | | 0.36 | 0.43
Instruments | | | $(I/K)_{i,t-2}$ | $(I/K)_{i,t-2}$, $(I/K)_{i,t-3}$ | $(I/K)_{i,t-2}, \dots, (I/K)_{i1}$
(Standard errors in parentheses.)
Exogeneity/Endogeneity of additional regressors and instrument set
Consider again the dynamic model with one other explanatory variable,
$$y_{it} = \alpha y_{i,t-1} + \beta x_{it} + \eta_i + v_{it},$$
and the model in first differences:
$$\Delta y_{it} = \alpha \Delta y_{i,t-1} + \beta \Delta x_{it} + \Delta v_{it}.$$
Consider the case with $T = 4$. When $x$ is strictly exogenous w.r.t. $v$, the instruments are
$$Z_i = \begin{pmatrix} y_{i1}, x_{i1}, \dots, x_{i4} & 0 \\ 0 & y_{i1}, y_{i2}, x_{i1}, \dots, x_{i4} \end{pmatrix}.$$
When $x$ is predetermined,
$$Z_i = \begin{pmatrix} y_{i1}, x_{i1}, x_{i2} & 0 \\ 0 & y_{i1}, y_{i2}, x_{i1}, x_{i2}, x_{i3} \end{pmatrix}.$$
And when $x$ is endogenous,
$$Z_i = \begin{pmatrix} y_{i1}, x_{i1} & 0 \\ 0 & y_{i1}, y_{i2}, x_{i1}, x_{i2} \end{pmatrix}.$$
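The three instrument sets follow one rule for the differenced equation in period $t$. The $y$ instruments are always $y_{i1}, \dots, y_{i,t-2}$. The $x$ instruments run to $x_{iT}$ (strictly exogenous), $x_{i,t-1}$ (predetermined) or $x_{i,t-2}$ (endogenous). The sketch below, with illustrative function names, lists them for general $T$ and reproduces the $T = 4$ matrices row by row.

```python
def last_x_index(t, T, x_type):
    """Latest x_is usable as an instrument in the differenced equation for period t."""
    if x_type == "exogenous":
        return T
    if x_type == "predetermined":
        return t - 1
    if x_type == "endogenous":
        return t - 2
    raise ValueError(f"unknown x_type: {x_type}")


def instrument_lists(T, x_type):
    """Instrument names for each differenced equation t = 3..T (one row of Z_i per t)."""
    rows = {}
    for t in range(3, T + 1):
        ys = [f"y{s}" for s in range(1, t - 1)]                       # y_i1, ..., y_i,t-2
        xs = [f"x{s}" for s in range(1, last_x_index(t, T, x_type) + 1)]
        rows[t] = ys + xs
    return rows


for kind in ("exogenous", "predetermined", "endogenous"):
    rows = instrument_lists(4, kind)
    print(kind, rows, "columns:", sum(len(r) for r in rows.values()))
```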
An example and finite sample inference
Arellano and Bond (1991) estimate dynamic employment equations using a sample of 140 UK quoted firms over the years 1976-1984. One model was specified as
$$n_{it} = \alpha_1 n_{i,t-1} + \alpha_2 n_{i,t-2} + \beta w_{it} + \beta_1 w_{i,t-1} + \gamma k_{it} + \delta ys_{it} + \delta_1 ys_{i,t-1} + \lambda_t + \eta_i + u_{it},$$
where $n_{it}$ is the logarithm of UK employment in company $i$ at the end of period $t$, $w_{it}$ is the log of the real product wage, $k_{it}$ is the log of gross capital and $ys_{it}$ is the log of industry output.
The table presents estimation results for the one- and two-step GMM estimators.
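 | One-Step coeff | One-Step std err | Two-Step coeff | Two-Step std err | Two-Step std err_c
$n_{i,t-1}$ | .535 | .166 | .474 | .085 | .185
$n_{i,t-2}$ | -.075 | .068 | -.052 | .027 | .052
$w_{it}$ | -.592 | .168 | -.513 | .049 | .146
$w_{i,t-1}$ | .292 | .142 | .225 | .080 | .142
$k_{it}$ | .359 | .054 | .293 | .040 | .063
$ys_{it}$ | .597 | .172 | .610 | .109 | .156
$ys_{i,t-1}$ | -.612 | .212 | -.446 | .125 | .217

 | One-Step | Two-Step (std err) | Two-Step (std err_c)
m1 | -2.493 | -2.826 | -1.999
m2 | -0.359 | -0.327 | -0.316
Wald | 219.6 | 372.0 | 142.0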
Another test statistic with reasonable finite sample properties is the difference between the Sargan test statistics in the models with and without the restriction imposed. Imposing $\alpha_2 = 0$ (keeping time periods and instruments the same) results in a Sargan test of 30.58. The difference from the Sargan statistic of the unrestricted model is 0.47, which is much smaller than the 5% critical value of the chi-squared distribution with one degree of freedom (3.84). $H_0: \alpha_2 = 0$ is therefore not rejected.
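As a worked check of the arithmetic, assume the unrestricted Sargan statistic is the value implied by the two figures quoted, $30.58 - 0.47 = 30.11$. The difference statistic with one restriction is then
$$D = S_{\text{restricted}} - S_{\text{unrestricted}} = 30.58 - 30.11 = 0.47 < \chi^2_{1,\,0.95} = 3.84.$$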
Weak Instruments and Dynamic Panels
Remember that instruments have to satisfy two conditions:
1. They are not correlated with the error term in the equation of interest.
2. They are correlated with the endogenous explanatory variable.
Whether the instruments are correlated with the error term is tested by means of the Sargan test. If the Sargan test rejects the null of no correlation, the IV estimator is biased and inconsistent.
However, even if the instruments are not correlated with the error term, a serious small sample bias can occur if they are only weakly correlated with the endogenous explanatory variable.
For the dynamic panel data model in first differences,
$$\Delta y_{it} = \alpha \Delta y_{i,t-1} + \Delta v_{it},$$
the lagged levels $y_{i,t-2}, \dots, y_{i1}$ become less informative instruments for $\Delta y_{i,t-1}$ as $\alpha$ increases. (In the extreme unit root case, $y_{it} = y_{i,t-1} + v_{it}$, $\alpha$ is not identified in the first-differenced GMM model.)
The weak instrument bias tends to go in the direction of the within groups bias (i.e. downward).
This occurs for any highly persistent endogenous right-hand-side variable, such as capital.
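One way to see the loss of information is the first-stage slope from regressing $\Delta y_{i2}$ on $y_{i1}$. Assume a mean-stationary AR(1) panel, $y_{i1} = \eta_i/(1-\alpha) + \varepsilon_{i1}$ with $Var(\varepsilon_{i1}) = \sigma_v^2/(1-\alpha^2)$. Then $Cov(\Delta y_{i2}, y_{i1}) = -\sigma_v^2/(1+\alpha)$, and the slope simplifies to $\pi = (\alpha - 1)k/(\sigma_\eta^2/\sigma_v^2 + k)$ with $k = (1-\alpha)/(1+\alpha)$. The sketch below is our own derivation under that assumption, checked by simulation. It shows $\pi$ shrinking towards zero as $\alpha \to 1$.

```python
import numpy as np


def first_stage_coef(alpha, var_ratio):
    """Population slope of Delta y_i2 on y_i1 in a mean-stationary AR(1) panel.

    var_ratio = sigma_eta^2 / sigma_v^2; slope = (alpha - 1) * k / (var_ratio + k),
    with k = (1 - alpha) / (1 + alpha).
    """
    k = (1 - alpha) / (1 + alpha)
    return (alpha - 1) * k / (var_ratio + k)


for a in (0.0, 0.5, 0.8, 0.9, 0.95, 0.99):
    print(a, round(first_stage_coef(a, 1.0), 5))

# Simulation check for alpha = 0.5 and sigma_eta = sigma_v = 1.
rng = np.random.default_rng(10)
N, alpha = 200_000, 0.5
eta = rng.normal(size=N)
y1 = eta / (1 - alpha) + rng.normal(scale=1 / np.sqrt(1 - alpha**2), size=N)
y2 = alpha * y1 + eta + rng.normal(size=N)
pi_hat = np.sum(y1 * (y2 - y1)) / np.sum(y1**2)
print(pi_hat)
```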
There are $T - 2$ additional moment conditions (in addition to the moment conditions for the model in first differences) for this case:
$$E(u_{it} \Delta y_{i,t-1}) = E[(\eta_i + v_{it}) \Delta y_{i,t-1}] = E[(y_{it} - \alpha y_{i,t-1}) \Delta y_{i,t-1}] = 0.$$
These additional moment conditions are available if the initial conditions satisfy
$$E(\eta_i \Delta y_{i2}) = 0,$$
which holds when the process is mean stationary:
$$y_{i1} = \frac{\eta_i}{1-\alpha} + \varepsilon_{i1}, \qquad E(\varepsilon_{i1}) = E(\eta_i \varepsilon_{i1}) = 0.$$
The GMM estimator that combines the moment conditions for the differenced model with those for the levels model is called the SYSTEM estimator (Blundell and Bond (1998)). It has been shown to perform much better (less bias and more precision), especially when $\alpha$ is large, i.e. when the series are persistent. This is because $\Delta y_{i,t-1}$ is a good instrument for $y_{i,t-1}$: it explains $y_{i,t-1}$ well, irrespective of the value of $\alpha$. Whether the additional moment conditions are valid has, of course, to be tested using the Sargan test.
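The levels moment condition can be used on its own as a just-identified IV estimator, with $\Delta y_{i,t-1}$ instrumenting $y_{i,t-1}$ in the levels equation. This is only an illustration of the extra moment in isolation, not the full SYSTEM GMM estimator, which stacks it with the differenced moments and uses an optimal weight matrix. The sketch assumes a mean-stationary AR(1) panel with a persistent $\alpha = 0.9$ and $\sigma_\eta = \sigma_v = 1$. The differenced-equation IV with $y_{i,t-2}$ is printed alongside for comparison.

```python
import numpy as np

rng = np.random.default_rng(11)
N, T, alpha = 5000, 8, 0.9
eta = rng.normal(size=N)
y = np.empty((N, T))
# Mean-stationary initial condition: y_i1 = eta_i / (1 - alpha) + eps_i1
y[:, 0] = eta / (1 - alpha) + rng.normal(scale=1 / np.sqrt(1 - alpha**2), size=N)
for t in range(1, T):
    y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=N)

y_t, y_lag = y[:, 2:], y[:, 1:-1]
dy_lag = y[:, 1:-1] - y[:, :-2]                  # Delta y_i,t-1

# Levels moment E[(y_it - alpha*y_i,t-1) * Delta y_i,t-1] = 0, just identified.
a_levels = np.sum(dy_lag * y_t) / np.sum(dy_lag * y_lag)

# Differenced equation instrumented with y_i,t-2 (weak when alpha is large).
dy = y_t - y_lag
a_diff = np.sum(y[:, :-2] * dy) / np.sum(y[:, :-2] * dy_lag)
print(a_levels, a_diff)
```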
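The model is
$$y_{it} = \alpha y_{i,t-1} + \beta x_{it} + \eta_i + v_{it},$$
$$x_{it} = \rho x_{i,t-1} + \tau \eta_i + \theta v_{it} + e_{it},$$
with $T = 8$, $N = 500$, $\beta = 1$, $\tau = 0.25$, $\theta = -0.1$, $\sigma_\eta^2 = 1$, $\sigma_v^2 = 1$, $\sigma_e^2 = 0.16$ (Normal), and 10,000 replications.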
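$\rho = 0.5$
 | | OLS Mean | OLS St D | WG Mean | WG St D | DIF Mean | DIF St D | SYS Mean | SYS St D
 | $\rho$ | 0.762 | .012 | 0.265 | .018 | 0.494 | .034 | 0.501 | .024
$\alpha = 0.5$ | $\alpha$ | 0.820 | .007 | 0.311 | .017 | 0.480 | .040 | 0.511 | .027
 | $\beta$ | 0.775 | .034 | 0.490 | .045 | 0.930 | .136 | 0.997 | .124
$\alpha = 0.95$ | $\alpha$ | 0.990 | .001 | 0.662 | .016 | 0.548 | .177 | 0.979 | .011
 | $\beta$ | 0.581 | .035 | 0.388 | .044 | 0.226 | .356 | 0.983 | .101

$\rho = 0.95$
 | | OLS Mean | OLS St D | WG Mean | WG St D | DIF Mean | DIF St D | SYS Mean | SYS St D
 | $\rho$ | 0.997 | .001 | 0.591 | .017 | 0.676 | .222 | 0.958 | .031
$\alpha = 0.5$ | $\alpha$ | 0.650 | .009 | 0.396 | .015 | 0.480 | .033 | 0.518 | .021
 | $\beta$ | 0.830 | .022 | 0.796 | .040 | 0.800 | .290 | 1.075 | .059
$\alpha = 0.95$ | $\alpha$ | 0.962 | .001 | 0.882 | .009 | 0.927 | .025 | 0.957 | .003
 | $\beta$ | 0.902 | .017 | 0.745 | .040 | 0.615 | .400 | 1.019 | .031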
An Example: Company Capital Stock
The estimated model is
$$k_{it} = \lambda_t + \alpha k_{i,t-1} + \eta_i + v_{it}.$$
 | OLS Levels | Within Groups | GMM1 DIF $(t-3)$ | GMM1 SYS $(t-3)$
$k_{i,t-1}$ | 0.987 (.002) | 0.733 (.027) | 0.768 (.070) | 0.925 (.021)
m1 | 7.72 | -6.82 | -5.80 | -6.51
m2 | 2.29 | -1.73 | -1.73 | -1.81
Sargan (p) | | | .563 | 0.627
Dif-Sar | | | | 0.562
Count Data Models
Often the dependent variable is an integer-valued, non-negative count variable, like the number of visits to the doctor, the number of patents granted or the average daily number of cigarettes smoked.
A standard model for analysing such data is the Poisson regression model. The Poisson density for a count variable $y_i$ given $x_i$ is
$$f(y_i \mid x_i) = \frac{e^{-\mu_i} \mu_i^{y_i}}{y_i!},$$
where
$$\mu_i = E(y_i \mid x_i) = \exp(x_i'\beta)$$
is the conditional mean of $y_i$ given $x_i$, which is positive.
As $\ln \mu_i = x_i'\beta$, the model is often called a log-linear model. The Poisson distribution has the property that the conditional variance is equal to the conditional mean (equidispersion):
$$Var(y_i \mid x_i) = E(y_i \mid x_i) = \exp(x_i'\beta).$$
Consider the regression model
$$y_i = \exp(x_i'\beta) + u_i,$$
with $E(u_i \mid x_i) = 0$, from which it follows that $E(x_i u_i) = 0$. As long as these conditions are valid in the population, the Poisson estimator for $\beta$ is consistent, even if the true distribution is not Poisson.
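The Poisson estimator solves the sample analogue of $E(x_i u_i) = 0$, namely $\sum_i x_i (y_i - \exp(x_i'\beta)) = 0$. The sketch below does this by Newton-Raphson. To illustrate the consistency claim, it is applied to hypothetical data that are not Poisson: a gamma-mixed Poisson whose conditional mean is still $\exp(x_i'\beta)$. The function name and starting value are our own choices.

```python
import numpy as np


def poisson_qmle(y, X, max_iter=100, tol=1e-10):
    """Poisson (quasi-)ML by Newton-Raphson; X must have a constant in column 0."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                   # start at the constant-only fit
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                   # sample analogue of E[x_i u_i]
        info = X.T @ (X * mu[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Non-Poisson data (gamma-mixed) with E(y|x) = exp(x'beta) still holding.
rng = np.random.default_rng(12)
n = 50_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -0.3])
eps = rng.gamma(shape=2.0, scale=0.5, size=n)    # mean 1, variance 0.5
y = rng.poisson(np.exp(X @ beta_true) * eps)
beta_hat = poisson_qmle(y, X)
print(beta_hat)
```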
Parameter Interpretation
The partial effects are given by
$$\frac{\partial E(y \mid x)}{\partial x_j} = \beta_j \exp(x'\beta).$$
Further,
$$\beta_j = \frac{\partial E(y \mid x)}{\partial x_j} \frac{1}{E(y \mid x)},$$
so $\beta_j$ is a semi-elasticity: it equals the proportionate change in the conditional mean if the $j$th regressor changes by one unit (approximately, for small changes). If $x_j$ is replaced by $\ln(x_j)$, $\beta_j$ is the elasticity of $E(y \mid x)$ with respect to $x_j$.
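The partial-effect and semi-elasticity formulas can be checked numerically at a single evaluation point. The coefficients and the point below are hypothetical. The last line shows that, for a discrete one-unit change, the exact proportionate change is $\exp(\beta_j) - 1$, which is close to $\beta_j$ only when $\beta_j$ is small.

```python
import numpy as np

beta = np.array([0.1, 0.3])        # hypothetical coefficients: constant, x_1
x = np.array([1.0, 2.0])           # evaluation point (first element is the constant)


def cond_mean(x, beta):
    return np.exp(x @ beta)        # E(y|x) = exp(x'beta)


pe = beta * cond_mean(x, beta)                 # partial effects beta_j * exp(x'beta)
semi_elasticity = pe / cond_mean(x, beta)      # recovers beta

# Central finite-difference check of the partial effect of x_1
h = 1e-6
e1 = np.array([0.0, 1.0])
pe_fd = (cond_mean(x + h * e1, beta) - cond_mean(x - h * e1, beta)) / (2 * h)

# Exact ratio of conditional means after a one-unit change in x_1
ratio = cond_mean(x + e1, beta) / cond_mean(x, beta)
print(pe, pe_fd, ratio - 1)
```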
Overdispersion
In many applications there is overdispersion, i.e. the conditional variance is larger than the conditional mean (and sometimes there is underdispersion). The Poisson maximum likelihood standard errors are then wrong, but they can easily be corrected by using robust standard errors that allow for general heteroskedasticity.
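A common way to compute such robust standard errors is the sandwich formula $H^{-1} B H^{-1}$. Here $H = \sum_i \mu_i x_i x_i'$ is the Poisson information matrix, which alone gives the ML variance. $B = \sum_i (y_i - \mu_i)^2 x_i x_i'$ replaces the equidispersion assumption with squared residuals. The sketch below uses hypothetical overdispersed data with $Var(y \mid x) = \mu + \mu^2$. Its purpose is to show the robust standard errors exceeding the ML ones.

```python
import numpy as np


def poisson_fit_se(y, X, max_iter=100, tol=1e-10):
    """Poisson ML estimates with ML and heteroskedasticity-robust (sandwich) standard errors."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (X * mu[:, None]))     # ML variance, assumes Var = mean
    meat = X.T @ (X * ((y - mu) ** 2)[:, None])         # uses squared residuals instead
    se_ml = np.sqrt(np.diag(bread))
    se_robust = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, se_ml, se_robust


rng = np.random.default_rng(13)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.gamma(shape=1.0, scale=1.0, size=n)           # Var(y|x) = mu + mu^2
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])) * eps)
beta_hat, se_ml, se_robust = poisson_fit_se(y, X)
print(beta_hat, se_ml, se_robust)
```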
Overdispersion can be introduced directly into the model through an unobserved heterogeneity term, $\eta_i$. Conditional on $x_i$ and $\eta_i$, the $y_i$ are Poisson distributed with
$$E(y_i \mid x_i, \eta_i) = \exp(x_i'\beta + \eta_i) = \exp(x_i'\beta)\,\varepsilon_i, \qquad \varepsilon_i = \exp(\eta_i).$$
If $\varepsilon_i$ is independent of $x_i$ and has a gamma distribution with $E(\varepsilon_i) = 1$ and $Var(\varepsilon_i) = \delta^2$, then the conditional distribution of $y_i$ given $x_i$ is negative binomial with
$$E(y_i \mid x_i) = \exp(x_i'\beta), \qquad Var(y_i \mid x_i) = \exp(x_i'\beta) + \delta^2 \left[\exp(x_i'\beta)\right]^2.$$
Note again that, as the conditional mean has not changed, the Poisson ML estimator is consistent in this case, but not efficient.
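The negative binomial moments can be checked by simulation at a fixed value of $x$, so that $\exp(x'\beta) = \mu$ is constant. The values $\mu = 2$ and $\delta^2 = 0.5$ are hypothetical. The gamma draw with shape $1/\delta^2$ and scale $\delta^2$ has mean 1 and variance $\delta^2$, so the theory gives mean 2 and variance $2 + 0.5 \cdot 4 = 4$.

```python
import numpy as np

rng = np.random.default_rng(14)
n, mu, delta2 = 400_000, 2.0, 0.5                          # fixed x, so exp(x'beta) = mu
eps = rng.gamma(shape=1 / delta2, scale=delta2, size=n)    # E(eps) = 1, Var(eps) = delta^2
y = rng.poisson(mu * eps)

# Theory: mean = mu = 2.0, variance = mu + delta^2 * mu^2 = 4.0
print(y.mean(), y.var())
```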
Panel Data
To allow for general correlation between $x_{it}$ and $\eta_i$, the fixed effects Poisson estimator is obtained by including $N$ individual-specific dummies in the model. This is similar to the within groups estimator in the linear model, and, as there, the Poisson model does not suffer from the incidental parameter problem. This fixed effects, within groups mean scaling estimator is obtained from the regression
$$y_{it} = \frac{\bar y_i}{\bar\mu_i}\,\mu_{it} + w_{it},$$
where $\mu_{it} = \exp(x_{it}'\beta)$ and $\bar\mu_i = \frac{1}{T}\sum_{t=1}^T \mu_{it}$.
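One way to compute this estimator is to maximise the Poisson log-likelihood with the individual effects concentrated out, $\sum_i \sum_t y_{it} \ln(\mu_{it} / \sum_s \mu_{is})$. Its first-order condition, $\sum_i \sum_t x_{it}(y_{it} - \mu_{it}\bar y_i/\bar\mu_i) = 0$, is the moment condition of the mean-scaling regression. The sketch below uses Newton-Raphson on that concentrated likelihood. The implementation choice, the function name and the simulated design (with $x_{it}$ correlated with $\eta_i$) are our own assumptions.

```python
import numpy as np


def fe_poisson(y, X, max_iter=100, tol=1e-10):
    """Fixed effects Poisson via the concentrated log-likelihood.

    y: (N, T) counts; X: (N, T, K) time-varying regressors.
    """
    N, T, K = X.shape
    beta = np.zeros(K)
    n = y.sum(axis=1)                                  # total count per individual
    for _ in range(max_iter):
        xb = X @ beta                                  # (N, T)
        xb = xb - xb.max(axis=1, keepdims=True)        # numerical stability
        p = np.exp(xb)
        p /= p.sum(axis=1, keepdims=True)              # mu_it / sum_s mu_is
        xbar = np.einsum("it,itk->ik", p, X)           # p-weighted mean of x_it
        score = np.einsum("it,itk->k", y, X) - n @ xbar
        Xc = X - xbar[:, None, :]
        info = np.einsum("i,it,itk,itl->kl", n, p, Xc, Xc)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(15)
N, T = 3000, 5
eta = 0.5 * rng.normal(size=N)
x = 0.5 * eta[:, None] + rng.normal(size=(N, T))      # regressor correlated with eta_i
y = rng.poisson(np.exp(0.5 * x + eta[:, None]))
beta_fe = fe_poisson(y, x[:, :, None])
print(beta_fe)
```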
#cigs | Pooled Poisson coeff | Pooled Poisson se | Fixed Effects Poisson coeff | Fixed Effects Poisson se
age | .789 | .052 | .725 | .078
age2 | -.102 | .006 | -.095 | .002
lrhi | -.112 | .020 | .019 | .004
hsownd | -.586 | .028 | -.132 | .008
unemp | .218 | .030 | -.016 | .008
dkid04 | -.115 | .028 | -.041 | .006
female | -.202 | .028 | |
# obs | 89844 | | 32043 |
# indiv | 19070 | | 5657 |
Weak exogeneity
For both the random and fixed effects estimators, the $x_{it}$ have to be strictly exogenous. This assumption can be relaxed, and quasi-differencing techniques can be used to allow for endogenous and/or predetermined explanatory variables.
Write the panel data model with unobserved heterogeneity as
$$y_{it} = \exp(x_{it}'\beta)\, u_{it}, \qquad u_{it} = \varepsilon_i v_{it},$$
where $\varepsilon_i$ is not correlated with $v_{it}$.
Then
$$\frac{y_{it}}{\mu_{it}} - \frac{y_{i,t-1}}{\mu_{i,t-1}} = u_{it} - u_{i,t-1} = \varepsilon_i (v_{it} - v_{i,t-1}).$$
If, for example, $x_{it}$ is endogenous, $E(x_{it} v_{it}) \neq 0$, valid moment conditions are
$$E\left[x_{i,t-s}\left(\frac{y_{it}}{\mu_{it}} - \frac{y_{i,t-1}}{\mu_{i,t-1}}\right)\right] = 0, \qquad s \geq 2.$$
Alternative moment conditions that are only valid when $x_{it}$ is predetermined are
$$E\left[x_{i,t-s}\left(y_{it}\frac{\mu_{i,t-1}}{\mu_{it}} - y_{i,t-1}\right)\right] = E\left[x_{i,t-s}\left(y_{it}\exp\big(-(x_{it} - x_{i,t-1})'\beta\big) - y_{i,t-1}\right)\right] = 0, \qquad s \geq 1.$$
These moment conditions can be used for estimation by GMM.