
A state implements tough new penalties on drunk drivers: What is the effect on highway fatalities? A school district cuts the size of its elementary school classes: What is the effect on its students' standardized test scores? You successfully complete one more year of college classes: What is the effect on your future earnings?

All three of these questions are about the unknown effect of changing one variable, X (X being penalties for drunk driving, class size, or years of schooling), on another variable, Y (Y being highway deaths, student test scores, or earnings).

This chapter introduces the linear regression model relating one variable, X, to another, Y. This model postulates a linear relationship between X and Y; the slope of the line relating X and Y is the effect of a one-unit change in X on Y. Just as the mean of Y is an unknown characteristic of the population distribution of Y, the slope of the line relating X and Y is an unknown characteristic of the population joint distribution of X and Y. The econometric problem is to estimate this slope, that is, to estimate the effect on Y of a unit change in X, using a sample of data on these two variables.

This chapter describes methods for estimating this slope using a random sample of data on X and Y. For instance, using data on class sizes and test scores from different school districts, we show how to estimate the expected effect on test scores of reducing class sizes by, say, one student per class. The slope and the intercept of the line relating X and Y can be estimated by a method called ordinary least squares (OLS).

4.1 The Linear Regression Model

The superintendent of an elementary school district must decide whether to hire additional teachers, and she wants your advice. If she hires the teachers, she will reduce the number of students per teacher (the student–teacher ratio) by two. She faces a trade-off. Parents want smaller classes so that their children can receive more individualized attention. But hiring more teachers means spending more money, which is not to the liking of those paying the bill! So she asks you: If she cuts class sizes, what will the effect be on student performance?


In many districts, student performance is measured by standardized tests, and the job status or pay of some administrators can depend in part on how well their students do on these tests. We therefore sharpen the superintendent's question: If she reduces the average class size by two students, what will the effect be on standardized test scores in her district?

A precise answer to this question requires a quantitative statement about changes. If the superintendent changes the class size by a certain amount, what would she expect the change in standardized test scores to be? We can write this as a mathematical relationship using the Greek letter beta, β_ClassSize, where the subscript ClassSize distinguishes the effect of changing the class size from other effects. Thus,

    β_ClassSize = (change in TestScore) / (change in ClassSize) = ΔTestScore / ΔClassSize,    (4.1)

where the Greek letter Δ (delta) stands for "change in." That is, β_ClassSize is the change in the test score that results from changing the class size divided by the change in the class size.

If you were lucky enough to know β_ClassSize, you would be able to tell the superintendent that decreasing class size by one student would change districtwide test scores by β_ClassSize. You could also answer the superintendent's actual question, which concerned changing class size by two students per class. To do so, rearrange Equation (4.1) so that

    ΔTestScore = β_ClassSize × ΔClassSize.    (4.2)

Suppose that β_ClassSize = −0.6. Then a reduction in class size of two students per class would yield a predicted change in test scores of (−0.6) × (−2) = 1.2; that is, you would predict that test scores would rise by 1.2 points as a result of the reduction in class sizes by two students per class.
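The arithmetic behind this prediction can be sketched in a few lines of code; the value −0.6 for β_ClassSize is the hypothetical value from the text, not an estimate:

```python
# Predicted change in test scores, following Equation (4.2):
# change in TestScore = beta_ClassSize * change in ClassSize
beta_class_size = -0.6    # hypothetical slope from the text
delta_class_size = -2     # class size falls by two students

delta_test_score = beta_class_size * delta_class_size
print(delta_test_score)   # 1.2: scores predicted to rise by 1.2 points
```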

Equation (4.1) is the definition of the slope of a straight line relating test scores and class size. This straight line can be written

    TestScore = β₀ + β_ClassSize × ClassSize,    (4.3)

where β₀ is the intercept of this straight line and, as before, β_ClassSize is the slope. According to Equation (4.3), if you knew β₀ and β_ClassSize, not only would you be able to determine the change in test scores at a district associated with a change in class size, but you also would be able to predict the average test score itself for a given class size.


When you propose Equation (4.3) to the superintendent, she tells you that something is wrong with this formulation. She points out that class size is just one of many facets of elementary education and that two districts with the same class sizes will have different test scores for many reasons. One district might have better teachers or it might use better textbooks. Two districts with comparable class sizes, teachers, and textbooks still might have very different student populations; perhaps one district has more immigrants (and thus fewer native English speakers) or wealthier families. Finally, she points out that even if two districts are the same in all these ways they might have different test scores for essentially random reasons having to do with the performance of the individual students on the day of the test. She is right, of course; for all these reasons, Equation (4.3) will not hold exactly for all districts. Instead, it should be viewed as a statement about a relationship that holds on average across the population of districts.

A version of this linear relationship that holds for each district must incorporate these other factors influencing test scores, including each district's unique characteristics (for example, quality of their teachers, background of their students, how lucky the students were on test day). One approach would be to list the most important of these factors and introduce them explicitly into the analysis (an idea we return to in Chapter 6). For now, however, we simply lump all these "other factors" together and write the relationship for a given district as

    TestScore = β₀ + β_ClassSize × ClassSize + other factors.    (4.4)

Thus the test score for the district is written in terms of one component, β₀ + β_ClassSize × ClassSize, that represents the average effect of class size on scores in the population of school districts, and a second component that represents all other factors.

Although this discussion has focused on test scores and class size, the idea expressed in Equation (4.4) is much more general, so it is useful to introduce more general notation. Suppose you have a sample of n districts. Let Yᵢ be the average test score in the iᵗʰ district, let Xᵢ be the average class size in the iᵗʰ district, and let uᵢ denote the other factors influencing the test score in the iᵗʰ district. Then Equation (4.4) can be written more generally as

    Yᵢ = β₀ + β₁Xᵢ + uᵢ    (4.5)

for each district (that is, i = 1, …, n), where β₀ is the intercept of this line and β₁ is the slope. [The general notation β₁ is used for the slope in Equation (4.5) instead of β_ClassSize because this equation is written in terms of a general variable Xᵢ.]


Y is the dependent variable and X is the independent variable or the regressor. The first part of Equation (4.5), β₀ + β₁Xᵢ, is the population regression line or the population regression function. This is the relationship that holds between Y and X on average over the population. Thus, if you knew the value of X, according to this population regression line you would predict that the value of the dependent variable, Y, is β₀ + β₁X.

The intercept β₀ and the slope β₁ are the coefficients of the population regression line, also known as the parameters of the population regression line. The slope β₁ is the change in Y associated with a unit change in X. The intercept is the value of the population regression line when X = 0; it is the point at which the population regression line intersects the Y axis. In some econometric applications, the intercept has a meaningful economic interpretation. In other applications, the intercept has no real-world meaning; for example, when X is the class size, strictly speaking the intercept is the predicted value of test scores when there are no students in the class! When the real-world meaning of the intercept is nonsensical, it is best to think of it mathematically as the coefficient that determines the level of the regression line.

The term uᵢ in Equation (4.5) is the error term. The error term incorporates all of the factors responsible for the difference between the district's average test score and the value predicted by the population regression line. This error term contains all the other factors besides X that determine the value of the dependent variable, Y, for a specific observation, i. In the class size example, these other factors include all the unique features of the district that affect the performance of its students on the test, including teacher quality, student economic background, luck, and even any mistakes in grading the test. The linear regression model and its terminology are summarized in Key Concept 4.1.

Figure 4.1 summarizes the linear regression model with a single regressor for seven hypothetical observations on test scores (Y) and class size (X). The population regression line is the straight line β₀ + β₁X. The population regression line slopes down (β₁ < 0), which means that districts with lower student–teacher ratios (smaller classes) tend to have higher test scores. The intercept β₀ has a mathematical meaning as the value of the Y axis intersected by the population regression line, but, as mentioned earlier, it has no real-world meaning in this example.

Because of the other factors that determine test performance, the hypothetical observations in Figure 4.1 do not fall exactly on the population regression line. For example, the value of Y for district #1, Y₁, is above the population regression line. This means that test scores in district #1 were better than predicted by the


KEY CONCEPT 4.1
Terminology for the Linear Regression Model with a Single Regressor

The linear regression model is

    Yᵢ = β₀ + β₁Xᵢ + uᵢ,

where

the subscript i runs over observations, i = 1, …, n;
Yᵢ is the dependent variable, the regressand, or simply the left-hand variable;
Xᵢ is the independent variable, the regressor, or simply the right-hand variable;
β₀ + β₁X is the population regression line or the population regression function;
β₀ is the intercept of the population regression line;
β₁ is the slope of the population regression line; and
uᵢ is the error term.

Figure 4.1: Scatterplot of Test Score vs. Class Size (Hypothetical Data). The scatterplot shows hypothetical observations on test scores and class size. The population regression line is β₀ + β₁X, and the vertical distance between an observation such as (X₂, Y₂) and the line is the population error term u₂ for that observation. (Axes: test score, 600–720; student–teacher ratio, 10–30.)


population regression line, so the error term for that district, u₁, is positive. In contrast, Y₂ is below the population regression line, so test scores for that district were worse than predicted, and u₂ < 0.

Return now to the superintendent's question: What is the expected effect on test scores of reducing the student–teacher ratio by two students per teacher? The answer is easy: The expected change is (−2) × β_ClassSize. But what is the value of β_ClassSize?

4.2 Estimating the Coefficients of the Linear Regression Model

In a practical situation such as the application to class size and test scores, the intercept β₀ and slope β₁ of the population regression line are unknown. Therefore, we must use data to estimate the unknown slope and intercept of the population regression line.

This estimation problem is similar to others you have faced in statistics. For example, suppose you want to compare the mean earnings of men and women who recently graduated from college. Although the population mean earnings are unknown, we can estimate the population means using a random sample of male and female college graduates. Then the natural estimator of the unknown population mean earnings for women, for example, is the average earnings of the female college graduates in the sample.

The same idea extends to the linear regression model. We do not know the population value of β_ClassSize, the slope of the unknown population regression line relating X (class size) and Y (test scores). But just as it was possible to learn about the population mean using a sample of data drawn from that population, so is it possible to learn about the population slope β_ClassSize using a sample of data.

The data we analyze here consist of test scores and class sizes in 1999 in 420 California school districts that serve kindergarten through eighth grade. The test score is the districtwide average of reading and math scores for fifth graders. Class size can be measured in various ways. The measure used here is one of the broadest, which is the number of students in the district divided by the number of teachers, that is, the districtwide student–teacher ratio. These data are described in more detail in Appendix 4.1.

Table 4.1 summarizes the distributions of test scores and class sizes for this sample. The average student–teacher ratio is 19.6 students per teacher, and the standard deviation is 1.9 students per teacher.


Table 4.1: Summary of the Distribution of Student–Teacher Ratios and Fifth-Grade Test Scores for 420 K–8 Districts in California in 1999

                                             Percentile
                       Average  Std. Dev.  10%    25%    40%    50% (median)  60%    75%    90%
Student–teacher ratio  19.6     1.9        17.3   18.6   19.3   19.7          20.1   20.9   21.9
Test score             654.2    19.1       630.4  640.0  649.1  654.5         659.4  666.7  679.1

The 10th percentile of the distribution of the student–teacher ratio is 17.3 students per teacher; the district at the 90th percentile has a student–teacher ratio of 21.9.

A scatterplot of these 420 observations on test scores and the student–teacher ratio is shown in Figure 4.2. The sample correlation is −0.23, indicating a weak negative relationship between the two variables. Although larger classes in this sample tend to have lower test scores, there are other determinants of test scores that keep the observations from falling perfectly along a straight line.

Despite this low correlation, if one could somehow draw a straight line through these data, then the slope of this line would be an estimate of β_ClassSize

Figure 4.2: Scatterplot of Test Score vs. Student–Teacher Ratio (California School District Data). Data from 420 California school districts. There is a weak negative relationship between the student–teacher ratio and test scores: the sample correlation is −0.23. (Axes: test score, 600–720; student–teacher ratio, 10–30.)


based on these data. One way to draw the line would be to take out a pencil and a ruler and to "eyeball" the best line you could. While this method is easy, it is very unscientific, and different people will create different estimated lines.

How, then, should you choose among the many possible lines? By far the most common way is to choose the line that produces the "least squares" fit to these data, that is, to use the ordinary least squares (OLS) estimator.

The OLS estimator chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of the squared mistakes made in predicting Y given X.

As discussed in Section 3.1, the sample average, Ȳ, is the least squares estimator of the population mean; that is, Ȳ minimizes the total squared estimation mistakes among all possible estimators.

The OLS estimator extends this idea to the linear regression model. Let b₀ and b₁ be some estimators of β₀ and β₁. The regression line based on these estimators is b₀ + b₁X, so the value of Yᵢ predicted using this line is b₀ + b₁Xᵢ. Thus the mistake made in predicting the iᵗʰ observation is Yᵢ − (b₀ + b₁Xᵢ) = Yᵢ − b₀ − b₁Xᵢ. The sum of these squared prediction mistakes over all n observations is

    Σᵢ₌₁ⁿ (Yᵢ − b₀ − b₁Xᵢ)².    (4.6)

The sum of the squared mistakes for the linear regression model in Expression (4.6) is the extension of the sum of the squared mistakes for the problem of estimating the mean in Expression (3.2). In fact, if there is no regressor, then b₁ does not enter Expression (4.6) and the two problems are identical except for the different notation [m in Expression (3.2), b₀ in Expression (4.6)]. Just as there is a unique estimator, Ȳ, that minimizes Expression (3.2), so is there a unique pair of estimators of β₀ and β₁ that minimize Expression (4.6).

The estimators of the intercept and slope that minimize the sum of squared mistakes in Expression (4.6) are called the ordinary least squares (OLS) estimators of β₀ and β₁.

OLS has its own special notation and terminology. The OLS estimator of β₀ is denoted β̂₀, and the OLS estimator of β₁ is denoted β̂₁. The OLS regression line, also called the sample regression line or sample regression function, is the straight line constructed using the OLS estimators: β̂₀ + β̂₁X. The predicted value of Yᵢ given Xᵢ, based on the OLS regression line, is Ŷᵢ = β̂₀ + β̂₁Xᵢ. The residual for the iᵗʰ observation is the difference between Yᵢ and its predicted value: ûᵢ = Yᵢ − Ŷᵢ.

KEY CONCEPT 4.2
The OLS Estimator, Predicted Values, and Residuals

The OLS estimators of the slope β₁ and the intercept β₀ are

    β̂₁ = [Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ)] / [Σᵢ₌₁ⁿ (Xᵢ − X̄)²]    (4.7)

    β̂₀ = Ȳ − β̂₁X̄.    (4.8)

The OLS predicted values Ŷᵢ and residuals ûᵢ are

    Ŷᵢ = β̂₀ + β̂₁Xᵢ,  i = 1, …, n    (4.9)

    ûᵢ = Yᵢ − Ŷᵢ,  i = 1, …, n.    (4.10)

The estimated intercept (β̂₀), slope (β̂₁), and residual (ûᵢ) are computed from a sample of n observations of Xᵢ and Yᵢ, i = 1, …, n. These are estimates of the unknown true population intercept (β₀), slope (β₁), and error term (uᵢ).

The OLS estimators, β̂₀ and β̂₁, are sample counterparts of the population coefficients, β₀ and β₁. Similarly, the OLS regression line β̂₀ + β̂₁X is the sample counterpart of the population regression line β₀ + β₁X, and the OLS residuals ûᵢ are sample counterparts of the population errors uᵢ.

You could compute the OLS estimators β̂₀ and β̂₁ by trying different values of b₀ and b₁ repeatedly until you find those that minimize the total squared mistakes in Expression (4.6); they are the least squares estimates. This method would be quite tedious, however. Fortunately, there are formulas, derived by minimizing Expression (4.6) using calculus, that streamline the calculation of the OLS estimators.

The OLS formulas and terminology are collected in Key Concept 4.2. These formulas are implemented in virtually all statistical and spreadsheet programs. These formulas are derived in Appendix 4.2.
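As a concrete illustration, the formulas in Key Concept 4.2 can be coded directly. The sketch below uses plain Python and a small made-up data set (the numbers are illustrative, not the California data):

```python
# OLS estimators computed from Equations (4.7)-(4.10).
def ols(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Equation (4.7): slope = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
    beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
            / sum((xi - x_bar) ** 2 for xi in x)
    # Equation (4.8): intercept = Ybar - slope * Xbar
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Illustrative data: test scores that fall roughly linearly with class size.
x = [15, 17, 19, 20, 22, 25]
y = [671, 665, 663, 661, 655, 649]

beta0, beta1 = ols(x, y)
y_hat = [beta0 + beta1 * xi for xi in x]            # predicted values, Eq. (4.9)
residuals = [yi - yh for yi, yh in zip(y, y_hat)]   # residuals, Eq. (4.10)
print(round(beta1, 2), round(beta0, 1))  # -2.16 703.1
```

In practice one would use a statistical package, but writing out Equations (4.7) and (4.8) once makes clear that the OLS slope is just the sample covariance of X and Y divided by the sample variance of X, and that the residuals of an OLS fit with an intercept sum to zero.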

OLS Estimates of the Relationship Between Test Scores and the Student–Teacher Ratio

When OLS is used to estimate a line relating the student–teacher ratio to test scores using the 420 observations in Figure 4.2, the estimated slope is −2.28 and the estimated intercept is 698.9. Accordingly, the OLS regression line for these 420 observations is

    TestScore^ = 698.9 − 2.28 × STR,    (4.11)

where TestScore is the average test score in the district and STR is the student–teacher ratio. The "^" over TestScore in Equation (4.11) indicates that it is the predicted value based on the OLS regression line. Figure 4.3 plots this OLS regression line superimposed over the scatterplot of the data previously shown in Figure 4.2.

The estimated slope of −2.28 means that an increase in the student–teacher ratio by one student per class is, on average, associated with a decline in districtwide test scores of 2.28 points; a decrease of two students per class is, on average, associated with an increase in test scores of 4.56 points. The negative slope indicates that a higher student–teacher ratio (larger classes) is associated with poorer performance on the test.

Figure 4.3: The Estimated Regression Line for the California Data. The estimated regression line, TestScore^ = 698.9 − 2.28 × STR, shows a negative relationship between test scores and the student–teacher ratio. (Axes: test score, 600–720; student–teacher ratio, 10–30.)


It is now possible to predict the districtwide test score given a value of the student–teacher ratio. For example, for a district with 20 students per teacher, the predicted test score is 698.9 − 2.28 × 20 = 653.3. Of course, this prediction will not be exactly right because of the other factors that determine a district's performance. But the regression line does give a prediction (the OLS prediction) of what test scores would be for that district, based on its student–teacher ratio, absent those other factors.

Is this estimate of the slope large or small? To answer this, we return to the superintendent's problem: she is contemplating hiring enough teachers to reduce the student–teacher ratio by 2. Suppose her district is at the median of the California districts. From Table 4.1, the median student–teacher ratio is 19.7 and the median test score is 654.5. A reduction of two students per class, from 19.7 to 17.7, would move her district from the median to very near the 10th percentile. This is a big change, and she would need to hire many new teachers. How would it affect test scores?

Based on Equation (4.11), cutting the student–teacher ratio by two is predicted to increase test scores by approximately 4.6 points; if her district's test scores are at the median, 654.5, they are predicted to increase to 659.1. Is this improvement large or small? According to Table 4.1, this improvement would move her district from the median to just short of the 60th percentile. Thus a decrease in class size that would place her district close to the 10% with the smallest classes would move her test scores from the 50th to the 60th percentile. According to these estimates, at least, cutting the student–teacher ratio by a large amount (two students per teacher) would help and might be worth doing depending on her budgetary situation, but it would not be a panacea.

What if the superintendent were contemplating a far more radical change, such as reducing the student–teacher ratio from 20 students per teacher to 5? Unfortunately, the estimates in Equation (4.11) would not be very useful to her. This regression was estimated using the data in Figure 4.2, and, as the figure shows, the smallest student–teacher ratio in these data is 14. These data contain no information on how districts with extremely small classes perform, so these data alone are not a reliable basis for predicting the effect of a radical move to such an extremely low student–teacher ratio.


Why Use the OLS Estimator?

There are both practical and theoretical reasons to use the OLS estimators β̂₀ and β̂₁. Because OLS is the dominant method used in practice, it has become the common language for regression analysis throughout economics, finance (see "The 'Beta' of a Stock" box), and the social sciences more generally. Presenting results


The "Beta" of a Stock

A fundamental idea of modern finance is that an investor needs a financial incentive to take a risk. Said differently, the expected return on a risky investment, R, must exceed the return on a safe, or risk-free, investment, Rf. Thus the expected excess return, R − Rf, on a risky investment should be positive.

At first it might seem that the risk of a stock should be measured by its variance. Much of that risk, however, can be reduced by holding other stocks in a portfolio, that is, by diversifying one's financial holdings. This means that the right way to measure the risk of a stock is not by its variance but rather by its covariance with the market.

The capital asset pricing model (CAPM) formalizes this idea. According to the CAPM, the expected excess return on an asset is proportional to the expected excess return on a portfolio of all available assets (the "market portfolio"). That is, the CAPM says that

    R − Rf = β(Rm − Rf),    (4.12)

where Rm is the expected return on the market portfolio and β is the coefficient in the population regression of R − Rf on Rm − Rf. In practice, β is estimated by OLS regression of a stock's actual excess return on the actual excess return on a broad market index.

Low-risk companies like Kellogg have stocks with low betas; riskier stocks have high betas.

Company                                     Estimated β
Kellogg (breakfast cereal)                  0.3
Waste Management (waste disposal)           0.5
Verizon (telecommunications)                0.6
Microsoft (software)                        0.6
Best Buy (electronic equipment retailer)    1.0
Bank of America (bank)                      1.3
…                                           2.4

Source: SmartMoney.com.

The return on an investment is the change in its price plus any payout (such as a dividend), expressed as a percentage of its initial price. For example, a stock bought on January 1 for $100, which then paid a $2.50 dividend during the year…
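In the regression form of Equation (4.12), β is estimated by OLS. The sketch below estimates a beta from simulated excess returns; the return series and the true beta of 1.3 are invented for illustration, not taken from the table above:

```python
# Sketch: estimating a stock's beta by OLS, per the regression form of the
# CAPM in Equation (4.12). All returns here are simulated, not real data.
import random

random.seed(1)
true_beta = 1.3
market_excess = [random.gauss(0.5, 4.0) for _ in range(5000)]   # % per period
stock_excess = [true_beta * m + random.gauss(0, 2.0) for m in market_excess]

m_bar = sum(market_excess) / len(market_excess)
s_bar = sum(stock_excess) / len(stock_excess)
# OLS slope: sample covariance over sample variance, as in Equation (4.7).
beta_hat = sum((m - m_bar) * (s - s_bar)
               for m, s in zip(market_excess, stock_excess)) \
           / sum((m - m_bar) ** 2 for m in market_excess)
print(round(beta_hat, 2))  # close to the true beta of 1.3
```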

using OLS means that you are "speaking the same language" as other economists and statisticians. The OLS formulas are built into virtually all spreadsheet and statistical software packages, making OLS easy to use.

The OLS estimators also have desirable theoretical properties. They are analogous to the desirable properties, studied in Section 3.1, of Ȳ as an estimator of the population mean. Under the assumptions introduced in Section 4.4, the OLS estimator is unbiased and consistent. The OLS estimator is also efficient among a certain class of unbiased estimators; however, this efficiency result holds under some additional special conditions, and further discussion of this result is deferred until Section 5.5.

4.3 Measures of Fit

Having estimated a linear regression, you might wonder how well that regression line describes the data. Does the regressor account for much or for little of the variation in the dependent variable? Are the observations tightly clustered around the regression line, or are they spread out?

The R² and the standard error of the regression measure how well the OLS regression line fits the data. The R² ranges between 0 and 1 and measures the fraction of the variance of Yᵢ that is explained by Xᵢ. The standard error of the regression measures how far Yᵢ typically is from its predicted value.

The R²

The regression R² is the fraction of the sample variance of Yᵢ explained by (or predicted by) Xᵢ. The definitions of the predicted value and the residual (see Key Concept 4.2) allow us to write the dependent variable Yᵢ as the sum of the predicted value, Ŷᵢ, plus the residual ûᵢ:

    Yᵢ = Ŷᵢ + ûᵢ.    (4.13)


In this notation, the R² is the ratio of the sample variance of Ŷᵢ to the sample variance of Yᵢ.

Mathematically, the R² can be written as the ratio of the explained sum of squares to the total sum of squares. The explained sum of squares (ESS) is the sum of squared deviations of the predicted values of Yᵢ, Ŷᵢ, from their average, and the total sum of squares (TSS) is the sum of squared deviations of Yᵢ from its average:

    ESS = Σᵢ₌₁ⁿ (Ŷᵢ − Ȳ)²    (4.14)

    TSS = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)².    (4.15)


Equation (4.14) uses the fact that the sample average OLS predicted value equals Ȳ (proven in Appendix 4.3).

The R² is the ratio of the explained sum of squares to the total sum of squares:

    R² = ESS / TSS.    (4.16)

Alternatively, the R² can be written in terms of the fraction of the variance of Yᵢ not explained by Xᵢ. The sum of squared residuals (SSR) is the sum of the squared OLS residuals:

    SSR = Σᵢ₌₁ⁿ ûᵢ².    (4.17)

It is shown in Appendix 4.3 that TSS = ESS + SSR. Thus the R² also can be expressed as 1 minus the ratio of the sum of squared residuals to the total sum of squares:

    R² = 1 − SSR / TSS.    (4.18)

Finally, the R² of the regression of Y on the single regressor X is the square of the correlation coefficient between Y and X.

The R² ranges between 0 and 1. If β̂₁ = 0, then Xᵢ explains none of the variation of Yᵢ and the predicted value of Yᵢ based on the regression is just the sample average of Yᵢ. In this case, the explained sum of squares is zero and the sum of squared residuals equals the total sum of squares; thus the R² is zero. In contrast, if Xᵢ explains all of the variation of Yᵢ, then Yᵢ = Ŷᵢ for all i and every residual is zero (that is, ûᵢ = 0), so that ESS = TSS and the R² equals 1. In general, the R² does not take on the extreme values of 0 or 1 but falls somewhere in between. An R² near 1 indicates that the regressor is good at predicting Yᵢ, while an R² near 0 indicates that the regressor is not very good at predicting Yᵢ.
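These definitions translate directly into code. The sketch below computes ESS, TSS, and SSR from Equations (4.14), (4.15), and (4.17) for a small made-up sample and checks the identities TSS = ESS + SSR and R² = corr(X, Y)²:

```python
# R^2 computed from Equations (4.14)-(4.18), using a small illustrative sample.
def r_squared(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
            / sum((xi - x_bar) ** 2 for xi in x)
    beta0 = y_bar - beta1 * x_bar
    y_hat = [beta0 + beta1 * xi for xi in x]
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)           # Eq. (4.14)
    tss = sum((yi - y_bar) ** 2 for yi in y)               # Eq. (4.15)
    ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # Eq. (4.17)
    assert abs(tss - (ess + ssr)) < 1e-6                   # TSS = ESS + SSR
    return ess / tss       # Eq. (4.16); equals 1 - SSR/TSS by Eq. (4.18)

x = [15, 17, 19, 20, 22, 25]
y = [671, 665, 663, 661, 655, 649]
r2 = r_squared(x, y)

# With a single regressor, R^2 is the squared correlation between X and Y.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
corr = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (
    sum((a - x_bar) ** 2 for a in x) ** 0.5
    * sum((b - y_bar) ** 2 for b in y) ** 0.5)
print(abs(r2 - corr ** 2) < 1e-9)  # True
```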

The Standard Error of the Regression

The standard error of the regression (SER) is an estimator of the standard deviation of the regression error uᵢ. The units of uᵢ and Yᵢ are the same, so the SER is a measure of the spread of the observations around the regression line, measured in the units of the dependent variable. For example, if the units of the dependent variable are dollars, then the SER measures the magnitude of a typical deviation from the regression line, that is, the magnitude of a typical regression error, in dollars.

Because the regression errors u₁, …, uₙ are unobserved, the SER is computed using their sample counterparts, the OLS residuals û₁, …, ûₙ. The formula for the SER is

    SER = s_û,  where s_û² = [1 / (n − 2)] Σᵢ₌₁ⁿ ûᵢ² = SSR / (n − 2),    (4.19)

where the formula for s_û² uses the fact (proven in Appendix 4.3) that the sample average of the OLS residuals is zero.

The formula for the SER in Equation (4.19) is similar to the formula for the sample standard deviation of Y given in Equation (3.7), except that Yᵢ − Ȳ in Equation (3.7) is replaced by ûᵢ and the divisor in Equation (3.7) is n − 1, whereas here it is n − 2. The reason for using the divisor n − 2 here (instead of n) is the same as the reason for using the divisor n − 1 in Equation (3.7): It corrects for a slight downward bias introduced because two regression coefficients were estimated. This is called a "degrees of freedom" correction; because two coefficients were estimated (β₀ and β₁), two "degrees of freedom" of the data were lost, so the divisor in this factor is n − 2. (The mathematics behind this is discussed in Section 5.6.) When n is large, the difference between using the divisor n, n − 1, or n − 2 is negligible.
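Equation (4.19) is a short computation; this sketch uses a made-up residual vector (which sums to zero, as OLS residuals do):

```python
# Standard error of the regression, Eq. (4.19): SER = sqrt(SSR / (n - 2)).
def ser(residuals):
    n = len(residuals)
    ssr = sum(u ** 2 for u in residuals)
    return (ssr / (n - 2)) ** 0.5  # n - 2: two coefficients were estimated

residuals = [1.5, -0.8, 0.3, -1.0, 0.0]   # made-up OLS residuals
print(round(ser(residuals), 3))
```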

Application to the Test Score Data

Equation (4.11) reports the regression line, estimated using the California test score data, relating the standardized test score (TestScore) to the student–teacher ratio (STR). The R² of this regression is 0.051, or 5.1%, and the SER is 18.6.

The R² of 0.051 means that the regressor STR explains 5.1% of the variance of the dependent variable TestScore. Figure 4.3 superimposes this regression line on the scatterplot of the TestScore and STR data. As the scatterplot shows, the student–teacher ratio explains some of the variation in test scores, but much variation remains unaccounted for.

The SER of 18.6 means that the standard deviation of the regression residuals is 18.6, where the units are points on the standardized test. Because the standard deviation is a measure of spread, the SER of 18.6 means that there is a large spread of the scatterplot in Figure 4.3 around the regression line as measured in points on the test. This large spread means that predictions of test scores made using only the student–teacher ratio for that district will often be wrong by a large amount.


What should we make of this low R² and large SER? The fact that the R² of this regression is low (and the SER is large) does not, by itself, imply that this regression is either "good" or "bad." What the low R² does tell us is that other important factors influence test scores. These factors could include differences in the student body across districts, differences in school quality unrelated to the student–teacher ratio, or luck on the test. The low R² and high SER do not tell us what these factors are, but they do indicate that the student–teacher ratio alone explains only a small part of the variation in test scores in these data.

4.4 The Least Squares Assumptions

This section presents a set of three assumptions on the linear regression model and the sampling scheme under which OLS provides an appropriate estimator of the unknown regression coefficients, β₀ and β₁. Initially, these assumptions might appear abstract. They do, however, have natural interpretations, and understanding these assumptions is essential for understanding when OLS will, and will not, give useful estimates of the regression coefficients.

Assumption #1: The Conditional Distribution of uᵢ Given Xᵢ Has a Mean of Zero

The first of the three least squares assumptions is that the conditional distribution of uᵢ given Xᵢ has a mean of zero. This assumption is a formal mathematical statement about the "other factors" contained in uᵢ and asserts that these other factors are unrelated to Xᵢ in the sense that, given a value of Xᵢ, the mean of the distribution of these other factors is zero.

In the class size example, the population regression line is the relationship that holds on average between class size and test scores in the population, and the error term uᵢ represents the other factors that lead test scores at a given district to differ from the prediction based on the population regression line.

As shown in Figure 4.4, at a given value of class size, say 20 students per class, sometimes these other factors lead to better performance than predicted (uᵢ > 0) and sometimes to worse performance (uᵢ < 0), but on average over the population the prediction is right. In other words, given Xᵢ = 20, the mean of the distribution of test scores is the population regression line; the distribution of test scores is centered on the population regression line at Xᵢ = 20 and, more generally, at other values x of Xᵢ as well. Said differently, the distribution of uᵢ, conditional on Xᵢ = x, has a mean of zero; stated mathematically, E(uᵢ | Xᵢ = x) = 0, or, in somewhat simpler notation, E(uᵢ | Xᵢ) = 0.


Figure 4.4: The Conditional Probability Distributions and the Population Regression Line. The figure shows the conditional probability distributions of test scores for districts with class sizes of 15, 20, and 25 students. The mean of the conditional distribution of test scores, given the student–teacher ratio, E(Y | X), is the population regression line β₀ + β₁X. At a given value of X, Y is distributed around the regression line and the error, u = Y − (β₀ + β₁X), has a conditional mean of zero for all values of X.

The conditional mean assumption E(uᵢ | Xᵢ) = 0 is equivalent to assuming that the population regression line is the conditional mean of Yᵢ given Xᵢ (a mathematical proof of this is left as Exercise 4.6).

The conditional mean of u in a randomized controlled experiment. In a randomized controlled experiment, subjects are randomly assigned to the treatment group (X = 1) or to the control group (X = 0). The random assignment typically is done using a computer program that uses no information about the subject, ensuring that X is distributed independently of all personal characteristics of the subject. Random assignment makes X and u independent, which in turn implies that the conditional mean of u given X is zero.
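This can be illustrated with a small simulation; everything below (the distribution of u, the assignment rule) is invented for illustration:

```python
# Simulation sketch: when X is randomly assigned independently of u,
# the mean of u within each group defined by X is approximately zero.
import random

random.seed(0)
n = 100_000
data = [(random.choice([0, 1]), random.gauss(0, 1)) for _ in range(n)]

for level in (0, 1):
    us = [u for x, u in data if x == level]
    print(level, round(sum(us) / len(us), 3))  # both group means close to 0
```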

In observational data, X is not randomly assigned in an experiment. Instead, the best that can be hoped for is that X is as if randomly assigned, in the precise sense that E(uᵢ | Xᵢ) = 0. Whether this assumption holds in a given empirical application with observational data requires careful thought and judgment, and we return to this issue repeatedly.


Correlation and conditional mean. Recall from Section 2.3 that if the conditional mean of one random variable given another is zero, then the two random variables have zero covariance and thus are uncorrelated [Equation (2.27)]. Thus the conditional mean assumption E(u_i|X_i) = 0 implies that X_i and u_i are uncorrelated, or corr(X_i, u_i) = 0. Because correlation is a measure of linear association, this implication does not go the other way: even if X_i and u_i are uncorrelated, the conditional mean of u_i given X_i might be nonzero. However, if X_i and u_i are correlated, then it must be the case that E(u_i|X_i) is nonzero. It is therefore often convenient to discuss the conditional mean assumption in terms of possible correlation between X_i and u_i: if X_i and u_i are correlated, then the conditional mean assumption is violated.
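The one-way nature of this implication is easy to see in a small simulation. The sketch below is our own illustration (the variable names and the choice u = X² − 1 are ours, not from the text): with X standard normal, u = X² − 1 has cov(X, u) = E(X³) = 0, so X and u are uncorrelated even though E(u|X) = X² − 1 is clearly nonzero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)
u = x**2 - 1  # E(u | X = x) = x^2 - 1: nonzero, yet cov(X, u) = E(X^3) = 0

# X and u are (essentially) uncorrelated...
corr_xu = np.corrcoef(x, u)[0, 1]
print(f"sample corr(X, u) = {corr_xu:.4f}")

# ...but the conditional mean of u given X is far from zero in the tails:
tail_mean = u[np.abs(x) > 2].mean()
print(f"sample mean of u given |X| > 2 = {tail_mean:.2f}")
```

The sample correlation is near zero while the average of u among observations with |X| > 2 is large and positive, exactly the situation the text warns about: zero correlation does not certify that E(u_i|X_i) = 0.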

Assumption #2: (X_i, Y_i), i = 1, ..., n, Are Independently and Identically Distributed

The second least squares assumption is that (X_i, Y_i), i = 1, ..., n, are independently and identically distributed (i.i.d.) across observations. As discussed in Section 2.5 (Key Concept 2.5), this assumption is a statement about how the sample is drawn. If the observations are drawn by simple random sampling from a single large population, then (X_i, Y_i), i = 1, ..., n, are i.i.d. For example, let X be the age of a worker and Y be his or her earnings, and imagine drawing a person at random from the population of workers. That randomly drawn person will have a certain age and earnings (that is, X and Y will take on some values). If a sample of n workers is drawn from this population, then (X_i, Y_i), i = 1, ..., n, necessarily have the same distribution. If they are drawn at random, they are also distributed independently from one observation to the next; that is, they are i.i.d.

The i.i.d. assumption is a reasonable one for many data collection schemes. For example, survey data from a randomly chosen subset of the population typically can be treated as i.i.d.

Not all sampling schemes produce i.i.d. observations on (X_i, Y_i), however. One example is when the values of X are not drawn from a random sample of the population but rather are set by a researcher as part of an experiment. For example, suppose a horticulturalist wants to study the effects of different organic weeding methods (X) on tomato production (Y) and accordingly grows different plots of tomatoes using different organic weeding techniques. If she picks the technique (the level of X) to be used on the i-th plot and applies the same technique to the i-th plot in all repetitions of the experiment, then the value of X_i does not change from one sample to the next. Thus X_i is nonrandom (although the outcome Y_i is random), so the sampling scheme is not i.i.d. The results presented in this chapter developed for i.i.d. regressors are also true if the regressors are nonrandom. The case of a nonrandom regressor is, however, quite special. For example, modern experimental protocols would have the horticulturalist assign the level of X to the different plots using a random assignment procedure, thereby circumventing any possible bias by the horticulturalist (she might use her favorite weeding method for the tomatoes in the sunniest plot). When this modern experimental protocol is used, the level of X is random and (X_i, Y_i) are i.i.d.

Another example of non-i.i.d. sampling arises when data are collected on the same unit of observation over time. For example, we might have data on inventory levels (Y) at a firm and the interest rate at which the firm can borrow (X), where these data are collected over time from a specific firm; for example, they might be recorded four times a year (quarterly) for 30 years. This is an example of time series data, and a key feature of time series data is that observations falling close to each other in time are not independent but rather tend to be correlated with each other; if interest rates are low now, they are likely to be low next quarter. This pattern of correlation violates the "independence" part of the i.i.d. assumption. Time series data introduce a set of complications that are best handled after developing the basic tools of regression analysis.

Assumption #3: Large Outliers Are Unlikely

The third least squares assumption is that large outliers, that is, observations with values of X_i, Y_i, or both that are far outside the usual range of the data, are unlikely. Large outliers can make OLS regression results misleading. This potential sensitivity of OLS to extreme outliers is illustrated in Figure 4.5 using hypothetical data.

In this book, the assumption that large outliers are unlikely is made mathematically precise by assuming that X and Y have nonzero finite fourth moments: 0 < E(X_i⁴) < ∞ and 0 < E(Y_i⁴) < ∞. Another way to state this assumption is that X and Y have finite kurtosis.

We encountered this assumption in Chapter 3 when discussing the consistency of the sample variance. Specifically, Equation (3.9) states that the sample variance s_Y² is a consistent estimator of the population variance σ_Y² (s_Y² →p σ_Y²). If Y₁, ..., Y_n are i.i.d. and the fourth moment of Y_i is finite, then the law of large numbers in Key Concept 2.6 applies to the average (1/n) Σᵢ₌₁ⁿ (Y_i − μ_Y)², a key step in the proof in Appendix 3.3 showing that s_Y² is consistent.
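The consistency result just cited can be illustrated numerically. This is our own sketch (the population here, N(0, 2²), is an arbitrary choice with finite fourth moment): as n grows, the sample variance settles down on the population variance of 4.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 4.0  # population variance of N(0, 2^2), which has finite fourth moment

# The sample variance converges to sigma2 as the sample size grows.
for n in (50, 500, 50_000):
    y = rng.normal(0, 2, n)
    print(f"n = {n:6d}: sample variance = {y.var(ddof=1):.3f}")
```

With a heavy-tailed population lacking a finite fourth moment, the analogous sequence would be far more erratic, which is exactly why the third assumption rules those distributions out.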

Figure 4.5 The Sensitivity of OLS to Large Outliers

[Figure: scatterplot of hypothetical data, with Y plotted from 200 to 2000 against X from 30 to 70; one OLS line is fit including the outlier and one excluding it.]

This hypothetical data set has one outlier. The OLS regression line estimated with the outlier shows a strong relationship between X and Y, but the OLS regression line estimated without the outlier shows no relationship.

One source of large outliers is data entry errors, such as a typographical error or incorrectly using different units for different observations: imagine collecting data on the height of students in meters but recording one student's height in centimeters instead. One way to find outliers is to plot your data. If you decide that an outlier is due to a data entry error, then you can either correct the error or, if that is impossible, drop the observation from your data set.

Data entry errors aside, the assumption of finite kurtosis is a plausible one in many applications with economic data. Class size is capped by the physical capacity of a classroom; the best you can do on a standardized test is to get all the questions right, and the worst you can do is to get all the questions wrong. Because class size and test scores have a finite range, they necessarily have finite kurtosis. More generally, commonly used distributions such as the normal distribution have four moments. Still, as a mathematical matter, some distributions have infinite fourth moments, and this assumption rules out those distributions. If the assumption of finite fourth moments holds, then it is unlikely that statistical inferences using OLS will be dominated by a few observations.
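The sensitivity illustrated in Figure 4.5 is easy to reproduce numerically. In this sketch (our own example; the particular numbers are arbitrary), the true slope is zero, yet a single mis-recorded observation produces a sizable estimated slope:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.uniform(30, 70, n)
y = 500 + rng.normal(0, 30, n)  # Y is unrelated to X: the true slope is 0

# Append one large outlier, e.g. a data entry error recording 500 as 2000.
x_out = np.append(x, 70.0)
y_out = np.append(y, 2000.0)

slope_clean = np.polyfit(x, y, 1)[0]      # polyfit(deg=1) returns [slope, intercept]
slope_outlier = np.polyfit(x_out, y_out, 1)[0]
print(f"OLS slope excluding the outlier: {slope_clean:6.2f}")
print(f"OLS slope including the outlier: {slope_outlier:6.2f}")
```

One observation out of 51 is enough to move the slope estimate far from zero, which is why plotting the data and checking suspicious observations is worthwhile before running OLS.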

Use of the Least Squares Assumptions

The three least squares assumptions for the linear regression model are summarized in Key Concept 4.3. The least squares assumptions play twin roles, and we return to them repeatedly throughout this textbook.

Key Concept 4.3
The Least Squares Assumptions

Y_i = β₀ + β₁X_i + u_i, i = 1, ..., n, where

1. The error term u_i has conditional mean zero given X_i: E(u_i|X_i) = 0;
2. (X_i, Y_i), i = 1, ..., n, are independent and identically distributed (i.i.d.) draws from their joint distribution; and
3. Large outliers are unlikely: X_i and Y_i have nonzero finite fourth moments.

Their first role is mathematical: if these assumptions hold, then, as is shown in the next section, in large samples the OLS estimators have sampling distributions that are normal. In turn, this large-sample normal distribution lets us develop methods for hypothesis testing and constructing confidence intervals using the OLS estimators.

Their second role is to organize the circumstances that pose difficulties for OLS regression. As we will see, the first least squares assumption is the most important to consider in practice. One reason why the first least squares assumption might not hold in practice is discussed in Chapter 6, and additional reasons are discussed in Section 9.2.

It is also important to consider whether the second assumption holds in an application. Although it plausibly holds in many cross-sectional data sets, the independence assumption is inappropriate for time series data. Therefore, the regression methods developed under assumption 2 require modification for some applications with time series data.

The third assumption serves as a reminder that OLS, just like the sample mean, can be sensitive to large outliers. If your data set contains large outliers, you should examine those outliers carefully to make sure those observations are correctly recorded and belong in the data set.

4.5 Sampling Distribution of the OLS Estimators

Because the OLS estimators β̂₀ and β̂₁ are computed from a randomly drawn sample, the estimators themselves are random variables with a probability distribution (the sampling distribution) that describes the values they could take over different possible random samples. This section presents these sampling distributions.


In small samples, these distributions are complicated, but in large samples, they are approximately normal because of the central limit theorem.

Review of the sampling distribution of Ȳ. Recall the discussion in Sections 2.5 and 2.6 about the sampling distribution of the sample average, Ȳ, an estimator of the unknown population mean of Y, μ_Y. Because Ȳ is calculated using a randomly drawn sample, Ȳ is a random variable that takes on different values from one sample to the next; the probability of these different values is summarized in its sampling distribution. Although this sampling distribution can be complicated when the sample size is small, it is possible to make certain statements about it that hold for all n. In particular, the mean of the sampling distribution is μ_Y, that is, E(Ȳ) = μ_Y, so Ȳ is an unbiased estimator of μ_Y. If n is large, then more can be said about the sampling distribution. In particular, the central limit theorem (Section 2.6) states that this distribution is approximately normal.

The sampling distribution of β̂₀ and β̂₁. These ideas carry over to the OLS estimators β̂₀ and β̂₁ of the unknown intercept β₀ and slope β₁ of the population regression line. Because the OLS estimators are calculated using a random sample, β̂₀ and β̂₁ are random variables that take on different values from one sample to the next; the probability of these different values is summarized in their sampling distributions.

Although the sampling distribution of β̂₀ and β̂₁ can be complicated when the sample size is small, it is possible to make certain statements about it that hold for all n. In particular, the means of the sampling distributions of β̂₀ and β̂₁ are β₀ and β₁. In other words, under the least squares assumptions in Key Concept 4.3,

E(β̂₀) = β₀ and E(β̂₁) = β₁;  (4.20)

that is, β̂₀ and β̂₁ are unbiased estimators of β₀ and β₁. The proof that β̂₁ is unbiased is given in Appendix 4.3, and the proof that β̂₀ is unbiased is left as Exercise 4.7.

If the sample is sufficiently large, by the central limit theorem the sampling distribution of β̂₀ and β̂₁ is well approximated by the normal distribution. This argument invokes the central limit theorem. Technically, the central limit theorem concerns the distribution of averages (like Ȳ). If you examine the numerator in Equation (4.7) for β̂₁, you will see that it, too, is a type of average, not a simple average like Ȳ, but an average of the product (Y_i − Ȳ)(X_i − X̄). As discussed

Key Concept 4.4
Large-Sample Distributions of β̂₀ and β̂₁

If the least squares assumptions in Key Concept 4.3 hold, then in large samples β̂₀ and β̂₁ have a jointly normal sampling distribution. The large-sample normal distribution of β̂₁ is N(β₁, σ²_β̂₁), where the variance of this distribution, σ²_β̂₁, is

σ²_β̂₁ = (1/n) · var[(X_i − μ_X)u_i] / [var(X_i)]².  (4.21)

The large-sample normal distribution of β̂₀ is N(β₀, σ²_β̂₀), where

σ²_β̂₀ = (1/n) · var(H_i u_i) / [E(H_i²)]², where H_i = 1 − [μ_X / E(X_i²)] X_i.  (4.22)

further in Appendix 4.3, the central limit theorem applies to this average so that, like the simpler average Ȳ, it is normally distributed in large samples.

The normal approximation to the distribution of the OLS estimators in large samples is summarized in Key Concept 4.4. (Appendix 4.3 summarizes the derivation of these formulas.) A relevant question in practice is how large n must be for these approximations to be reliable. In Section 2.6, we suggested that n = 100 is sufficiently large for the sampling distribution of Ȳ to be well approximated by a normal distribution, and sometimes smaller n suffices. This criterion carries over to the more complicated averages appearing in regression analysis. In virtually all modern econometric applications, n > 100, so we will treat the normal approximations to the distributions of the OLS estimators as reliable unless there are good reasons to think otherwise.

The results in Key Concept 4.4 imply that the OLS estimators are consistent; that is, when the sample size is large, β̂₀ and β̂₁ will be close to the true population coefficients β₀ and β₁ with high probability. This is because the variances σ²_β̂₀ and σ²_β̂₁ of the estimators decrease to zero as n increases (n appears in the denominator of the formulas for the variances), so the distribution of the OLS estimators will be tightly concentrated around their means, β₀ and β₁, when n is large.
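Unbiasedness and the shrinking variance are both easy to check by simulation. The following sketch is our own illustration (the data-generating values β₀ = 2 and β₁ = 0.5 are arbitrary): it repeatedly draws i.i.d. samples and computes the OLS slope from each.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1 = 2.0, 0.5

def ols_slopes(n, reps=2000):
    """OLS slope estimates from `reps` independent samples of size n."""
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(0, 1, n)
        y = beta0 + beta1 * x + rng.normal(0, 1, n)
        est[r] = np.polyfit(x, y, 1)[0]  # slope of the degree-1 fit
    return est

results = {n: ols_slopes(n) for n in (100, 400)}
for n, est in results.items():
    print(f"n = {n}: mean(beta1_hat) = {est.mean():.3f}, sd = {est.std():.3f}")
```

The average of the slope estimates sits at the true β₁ = 0.5 for both sample sizes (unbiasedness), while the spread of the estimates is roughly halved when n is quadrupled, as the 1/n in Equation (4.21) predicts.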

Another implication of the distributions in Key Concept 4.4 is that, in general, the larger the variance of X_i, the smaller the variance σ²_β̂₁ of β̂₁. Mathematically, this implication arises because the variance of β̂₁ in Equation (4.21) is inversely proportional to the square of the variance of X_i: the larger is var(X_i), the larger is the denominator in Equation (4.21), so the smaller is σ²_β̂₁. To get a better sense

Figure 4.6 The Variance of β̂₁ and the Variance of X

[Figure: scatterplot of 150 data points, with Y from 194 to 206 plotted against X from 97 to 103. The colored dots are a set of X's with a small variance; the black dots are a set of X's with a large variance. The regression line can be estimated more accurately with the black dots than with the colored dots.]

of why this is so, look at Figure 4.6, which presents a scatterplot of 150 artificial data points on X and Y. The data points indicated by the colored dots are the 75 observations closest to X̄. Suppose you were asked to draw a line as accurately as possible through either the colored or the black dots: which would you choose? It would be easier to draw a precise line through the black dots, which have a larger variance than the colored dots. Similarly, the larger the variance of X, the more precise is β̂₁.
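Figure 4.6's point can also be checked by simulation. In this sketch (our own, with arbitrary parameter values), the only difference between the two designs is the spread of the regressor; the slope estimate is markedly less variable when var(X) is large:

```python
import numpy as np

rng = np.random.default_rng(4)

def slope_spread(x_sd, n=100, reps=2000):
    """Monte Carlo standard deviation of the OLS slope when sd(X) = x_sd."""
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(0, x_sd, n)
        y = 1.0 + 0.5 * x + rng.normal(0, 1, n)
        est[r] = np.polyfit(x, y, 1)[0]
    return est.std()

sd_small_varx = slope_spread(0.5)
sd_large_varx = slope_spread(2.0)
print(f"sd of slope estimate, sd(X) = 0.5: {sd_small_varx:.3f}")
print(f"sd of slope estimate, sd(X) = 2.0: {sd_large_varx:.3f}")
```

Quadrupling sd(X) cuts the standard deviation of β̂₁ by roughly a factor of four here, consistent with Equation (4.21), in which [var(X_i)]² appears in the denominator.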

The distributions in Key Concept 4.4 also imply that the smaller is the variance of the error u_i, the smaller is the variance of β̂₁. This can be seen mathematically in Equation (4.21) because u_i enters the numerator, but not the denominator, of σ²_β̂₁: if all u_i were smaller by a factor of one-half but the X's did not change, then σ²_β̂₁ would be smaller by a factor of one-fourth (Exercise 4.13). Stated less mathematically, if the errors are smaller (holding the X's fixed), then the data will have a tighter scatter around the population regression line, so its slope will be estimated more precisely.

The normal approximation to the sampling distribution of β̂₀ and β̂₁ is a powerful tool. With this approximation in hand, we are able to develop methods for making inferences about the true population values of the regression coefficients using only a sample of data.


4.6 Conclusion

This chapter has focused on the use of ordinary least squares to estimate the intercept and slope of a population regression line using a sample of n observations on a dependent variable, Y, and a single regressor, X. There are many ways to draw a straight line through a scatterplot, but doing so using OLS has several virtues. If the least squares assumptions hold, then the OLS estimators of the slope and intercept are unbiased, are consistent, and have a sampling distribution with a variance that is inversely proportional to the sample size n. Moreover, if n is large, then the sampling distribution of the OLS estimator is normal.

These important properties of the sampling distribution of the OLS estimator hold under the three least squares assumptions.

The first assumption is that the error term in the linear regression model has a conditional mean of zero, given the regressor X. This assumption implies that the OLS estimator is unbiased.

The second assumption is that (X_i, Y_i) are i.i.d., as is the case if the data are collected by simple random sampling. This assumption yields the formula, presented in Key Concept 4.4, for the variance of the sampling distribution of the OLS estimator.

The third assumption is that large outliers are unlikely. Stated more formally, X and Y have finite fourth moments (finite kurtosis). The reason for this assumption is that OLS can be unreliable if there are large outliers. Taken together, the three least squares assumptions imply that the OLS estimator is normally distributed in large samples as described in Key Concept 4.4.

The results in this chapter describe the sampling distribution of the OLS estimator. By themselves, however, these results are not sufficient to test a hypothesis about the value of β₁ or to construct a confidence interval for β₁. Doing so requires an estimator of the standard deviation of the sampling distribution, that is, the standard error of the OLS estimator. This step, moving from the sampling distribution of β̂₁ to its standard error, hypothesis tests, and confidence intervals, is taken in the next chapter.

Summary

1. The population regression line, β₀ + β₁X, is the mean of Y as a function of the value of X. The slope, β₁, is the expected change in Y associated with a one-unit change in X. The intercept, β₀, determines the level (or height) of the regression line. Key Concept 4.1 summarizes the terminology of the population linear regression model.

2. The population regression line can be estimated using sample observations (Y_i, X_i), i = 1, ..., n, by ordinary least squares (OLS). The OLS estimators of the regression intercept and slope are denoted β̂₀ and β̂₁.

3. The R² and standard error of the regression (SER) are measures of how close the values of Y_i are to the estimated regression line. The R² is between 0 and 1, with a larger value indicating that the Y_i's are closer to the line. The standard error of the regression is an estimator of the standard deviation of the regression error.

4. There are three key assumptions for the linear regression model: (1) the regression errors, u_i, have a mean of zero conditional on the regressors X_i; (2) the sample observations are i.i.d. random draws from the population; and (3) large outliers are unlikely. If these assumptions hold, the OLS estimators β̂₀ and β̂₁ are (1) unbiased, (2) consistent, and (3) normally distributed when the sample is large.

Key Terms

linear regression model with a single regressor (152)
dependent variable (152)
independent variable (152)
regressor (152)
population regression function (152)
population intercept (152)
population slope (152)
population coefficients (152)
parameters (152)
error term (152)
estimators (156)
sample regression line (156)
sample regression function (156)
predicted value (156)
residual (157)
regression R² (161)
explained sum of squares (ESS) (161)
total sum of squares (TSS) (161)
sum of squared residuals (SSR) (162)
standard error of the regression (SER) (162)

Review the Concepts

4.1 Explain the difference between β̂₁ and β₁; between the residual û_i and the regression error u_i; and between the OLS predicted value Ŷ_i and E(Y_i|X_i).

4.2 For each least squares assumption, provide an example in which the assumption is valid, and then provide an example in which the assumption fails.


4.3 Sketch a hypothetical scatterplot of data for an estimated regression with R² = 0.9. Sketch a hypothetical scatterplot of data for a regression with R² = 0.5.

Exercises

4.1 Suppose that a researcher, using data on class size (CS) and average test scores from 100 third-grade classes, estimates the OLS regression

TestScore = 520.4 − 5.82 × CS, R² = 0.08, SER = 11.5.

a. A classroom has 22 students. What is the regression's prediction for that classroom's average test score?

b. Last year a classroom had 19 students, and this year it has 23 students. What is the regression's prediction for the change in the classroom average test score?

c. The sample average class size across the 100 classrooms is 21.4. What is the sample average of the test scores across the 100 classrooms? (Hint: Review the formulas for the OLS estimators.)

d. What is the sample standard deviation of test scores across the 100 classrooms? (Hint: Review the formulas for the R² and SER.)

4.2 Suppose that a random sample of 200 twenty-year-old men is selected from a population and that these men's height and weight are recorded. A regression of weight on height yields an estimated OLS regression, where Weight is measured in pounds and Height is measured in inches.

a. What is the regression's weight prediction for someone who is 70 in. tall? 65 in. tall? 74 in. tall?

b. A man has a late growth spurt and grows 1.5 in. over the course of a year. What is the regression's prediction for the increase in this man's weight?

c. Suppose that instead of measuring weight and height in pounds and inches, these variables are measured in centimeters and kilograms. What are the regression estimates from this new regression?

4.3 A regression of average weekly earnings (AWE, measured in dollars) on age (measured in years) using a random sample of college-educated full-time workers aged 25–65 yields the following:

AWE = 696.7 + 9.6 × Age, R² = 0.023, SER = 624.1.

a. Explain what the coefficient values 696.7 and 9.6 mean.

b. The standard error of the regression (SER) is 624.1. What are the units of measurement for the SER (dollars? years? or is SER unit-free)?

c. The regression R² is 0.023. What are the units of measurement for the R² (dollars? years? or is R² unit-free)?

d. What is the regression's predicted earnings for a 25-year-old worker? A 45-year-old worker?

e. Will the regression give reliable predictions for a 99-year-old worker? Why or why not?

f. Given what you know about the distribution of earnings, do you think it is plausible that the distribution of errors in the regression is normal? (Hint: Do you think that the distribution is symmetric or skewed? What is the smallest value of earnings, and is it consistent with a normal distribution?)

g. The average age in this sample is 41.6 years. What is the average value of AWE in the sample? (Hint: Review Key Concept 4.2.)

4.4 Read the box "The 'Beta' of a Stock" in Section 4.2.

a. Suppose that the value of β is greater than 1 for a particular stock. Show that the variance of (R − R_f) for this stock is greater than the variance of (R_m − R_f).

b. Suppose that the value of β is less than 1 for a particular stock. Is it possible that the variance of (R − R_f) for this stock is greater than the variance of (R_m − R_f)? (Hint: Don't forget the regression error.)

c. In a given year, the rate of return on 3-month Treasury bills is 3.5% and the rate of return on a large diversified portfolio of stocks (the S&P 500) is 7.3%. For each company listed in the table in the box, use the estimated value of β to estimate the stock's expected rate of return.

4.5 A professor decides to run an experiment to measure the effect of time pressure on final exam scores. He gives each of the 400 students in his course the same final exam, but some students have 90 minutes to complete the exam while others have 120 minutes. Each student is randomly assigned one of the examination times based on the flip of a coin. Let Y_i denote the number of points scored on the exam by the i-th student (0 ≤ Y_i ≤ 100), let X_i denote the amount of time that the student has to complete the exam (X_i = 90 or 120), and consider the regression model Y_i = β₀ + β₁X_i + u_i.

a. Explain what the term u_i represents. Why will different students have different values of u_i?

b. Explain why E(u_i|X_i) = 0 for this regression model.

c. Are the other assumptions in Key Concept 4.3 satisfied? Explain.

d. i. Compute the estimated regression's prediction for the average score of students given 90 minutes to complete the exam. Repeat for 120 minutes and 150 minutes.

ii. Compute the estimated gain in score for a student who is given an additional 10 minutes on the exam.

4.6 Show that the first least squares assumption, E(u_i|X_i) = 0, implies that E(Y_i|X_i) = β₀ + β₁X_i.

4.7 Show that β̂₀ is an unbiased estimator of β₀. (Hint: Use the fact that β̂₁ is an unbiased estimator of β₁.)

4.8 Suppose that all of the regression assumptions in Key Concept 4.3 are satisfied except that the first assumption is replaced with E(u_i|X_i) = 2. Which parts of Key Concept 4.4 continue to hold? Which change? Why? (Is β̂₁ normally distributed in large samples with mean and variance given in Key Concept 4.4? What about β̂₀?)

4.9 a. A linear regression yields β̂₁ = 0. Show that R² = 0.

b. A linear regression yields R² = 0. Does this imply that β̂₁ = 0?

4.10 Suppose that Y_i = β₀ + β₁X_i + u_i, where (X_i, u_i) are i.i.d. and X_i is a Bernoulli random variable with Pr(X = 1) = 0.20. When X = 1, u_i is N(0, 4); when X = 0, u_i is N(0, 1).

a. Show that the regression assumptions in Key Concept 4.3 are satisfied.

b. Derive an expression for the large-sample variance of β̂₁. [Hint: Evaluate the terms in Equation (4.21).]

4.11 Consider the regression model Y_i = β₀ + β₁X_i + u_i.

a. Suppose you know that β₀ = 0. Derive a formula for the least squares estimator of β₁.

b. Suppose you know that β₀ = 4. Derive a formula for the least squares estimator of β₁.

4.12 a. Show that the regression R² in the regression of Y on X is the squared value of the sample correlation between X and Y. That is, show that R² = r²_XY.

b. Show that the R² from the regression of Y on X is the same as the R² from the regression of X on Y.

c. Show that β̂₁ = r_XY (s_Y/s_X), where r_XY is the sample correlation between X and Y, and s_X and s_Y are the sample standard deviations of X and Y.

4.13 Suppose that Y_i = β₀ + β₁X_i + κu_i, where κ is a non-zero constant and (Y_i, X_i) satisfy the three least squares assumptions. Show that the large-sample variance of β̂₁ is given by σ²_β̂₁ = κ² (1/n) var[(X_i − μ_X)u_i] / [var(X_i)]².

4.14 Show that the sample regression line passes through the point (X̄, Ȳ).

Empirical Exercises

E4.1 On the text Web site http://www.pearsonglobaleditions.com/stock, you will find a data file CPS08 that contains an extended version of the data set used in Table 3.1 for 2008. It contains data for full-time, full-year workers, age 25–34, with a high school diploma or B.A./B.S. as their highest degree. A detailed description is given in CPS08_Description, also available on the Web site. (These are the same data as in CPS92_08 but are limited to the year 2008.) In this exercise, you will investigate the relationship between a worker's age and earnings. (Generally, older workers have more job experience, leading to higher productivity and earnings.)

a. Run a regression of average hourly earnings (AHE) on age (Age). What is the estimated intercept? What is the estimated slope? Use the estimated regression to answer this question: How much do earnings increase as workers age by 1 year?

b. Bob is a 26-year-old worker. Predict Bob's earnings using the estimated regression. Alexis is a 30-year-old worker. Predict Alexis's earnings using the estimated regression.

c. Does age account for a large fraction of the variance in earnings across individuals? Explain.


E4.2 On the text Web site http://www.pearsonglobaleditions.com/stock, you will find a data file TeachingRatings that contains data on course evaluations, course characteristics, and professor characteristics for 463 courses at the University of Texas at Austin.¹ A detailed description is given in TeachingRatings_Description, also available on the Web site. One of the characteristics is an index of the professor's "beauty" as rated by a panel of six judges. In this exercise, you will investigate how course evaluations are related to the professor's beauty.

a. Construct a scatterplot of average course evaluations (Course_Eval) on the professor's beauty (Beauty). Does there appear to be a relationship between the variables?

b. Run a regression of average course evaluations (Course_Eval) on the professor's beauty (Beauty). What is the estimated intercept? What is the estimated slope? Explain why the estimated intercept is equal to the sample mean of Course_Eval. (Hint: What is the sample mean of Beauty?)

c. Professor Watson has an average value of Beauty, while Professor Stock's value of Beauty is one standard deviation above the average. Predict Professor Stock's and Professor Watson's course evaluations.

d. Comment on the size of the regression's slope. Is the estimated effect of Beauty on course evaluations large or small? Explain what you mean by "large" and "small."

e. Does Beauty explain a large fraction of the variance in evaluations across courses? Explain.

E4.3 On the text Web site http://www.pearsonglobaleditions.com/stock, you will find a data file CollegeDistance that contains data from a random sample of high school seniors. In this exercise, you will use these data to investigate the relationship between the number of completed years of education for young adults and the distance from each student's high school to the nearest four-year college. (Proximity to college lowers the cost of education, so that students who live closer to a four-year college should, on average, complete more years of higher education.)

¹ These data were provided by Professor Daniel Hamermesh of the University of Texas at Austin and were used in his paper with Amy Parker, "Beauty in the Classroom: Instructors' Pulchritude and Putative Pedagogical Productivity," Economics of Education Review, August 2005, 24(4): 369–376.

A detailed description is given in CollegeDistance_Description, also available on the Web site.²

a. Run a regression of years of completed education (ED) on distance to the nearest college (Dist), where Dist is measured in tens of miles. What is the estimated intercept? What is the estimated slope? Use the estimated regression to answer this question: How does the average value of years of completed schooling change when colleges are built close to where students go to high school?

b. Bob's high school was 20 miles from the nearest college. Predict Bob's years of completed education using the estimated regression. How would the prediction change if Bob lived 10 miles from the nearest college?

c. Does distance to college explain a large fraction of the variance in educational attainment across individuals? Explain.

d. What is the value of the standard error of the regression? What are the units for the standard error (meters, grams, years, dollars, cents, or something else)?

E4.4 On the text Web site http://www.pearsonglobaleditions.com/stock, you will find a data file Growth that contains data on average growth rates from 1960 through 1995 for 65 countries, along with variables that are potentially related to growth. A detailed description is given in Growth_Description, also available on the Web site. In this exercise, you will investigate the relationship between growth and trade.³

a. Construct a scatterplot of average annual growth rate (Growth) on the average trade share (TradeShare). Does there appear to be a relationship between the variables?

b. One country, Malta, has a trade share much larger than the other countries. Find Malta on the scatterplot. Does Malta look like an outlier?

c. Using all observations, run a regression of Growth on TradeShare. What is the estimated slope? What is the estimated intercept? Use the

² These data were provided by Professor Cecilia Rouse of Princeton University and were used in her paper "Democratization or Diversion? The Effect of Community Colleges on Educational Attainment," Journal of Business and Economic Statistics, April 1995, 12(2): 217–224.

³ These data were provided by Professor Ross Levine of Brown University and were used in his paper with Thorsten Beck and Norman Loayza, "Finance and the Sources of Growth," Journal of Financial Economics, 2000, 58: 261–300.

regression to predict the growth rate for a country with a trade share of 0.5 and with a trade share equal to 1.0.

d. Estimate the same regression, excluding the data from Malta. Answer the same questions in c.

e. Where is Malta? Why is the Malta trade share so large? Should Malta be included or excluded from the analysis?

APPENDIX 4.1 The California Test Score Data Set

The California Standardized Testing and Reporting data set contains data on test performance, school characteristics, and student demographic backgrounds. The data used here are from all 420 K–6 and K–8 districts in California with data available for 1999. Test scores are the average of the reading and math scores on the Stanford 9 Achievement Test, a standardized test administered to fifth-grade students. School characteristics (averaged across the district) include the number of computers per classroom and expenditures per student. The student–teacher ratio used here is the number of students in the district divided by the number of full-time equivalent teachers. Demographic variables for the students also are averaged across the district. The demographic variables include the percentage of students who are in the public assistance program CalWorks (formerly AFDC), the percentage of students who qualify for a reduced-price lunch, and the percentage of students who are English learners (that is, students for whom English is a second language). All of these data were obtained from the California Department of Education (www.cde.ca.gov).

APPENDIX 4.2 Derivation of the OLS Estimators

This appendix uses calculus to derive the formulas for the OLS estimators given in Key Concept 4.2. To minimize the sum of squared prediction mistakes Σᵢ₌₁ⁿ (Y_i − b₀ − b₁X_i)² [Equation (4.6)], first take the partial derivatives with respect to b₀ and b₁:

∂/∂b₀ Σᵢ₌₁ⁿ (Y_i − b₀ − b₁X_i)² = −2 Σᵢ₌₁ⁿ (Y_i − b₀ − b₁X_i) and  (4.23)

∂/∂b₁ Σᵢ₌₁ⁿ (Y_i − b₀ − b₁X_i)² = −2 Σᵢ₌₁ⁿ (Y_i − b₀ − b₁X_i)X_i.  (4.24)

The OLS estimators, β̂₀ and β̂₁, are the values of b₀ and b₁ that minimize Σᵢ₌₁ⁿ (Y_i − b₀ − b₁X_i)² or, equivalently, the values of b₀ and b₁ for which the derivatives in Equations (4.23) and (4.24) equal zero. Accordingly, setting these derivatives equal to zero, collecting terms, and dividing by n shows that the OLS estimators, β̂₀ and β̂₁, must satisfy the two equations

Ȳ − β̂₀ − β̂₁X̄ = 0 and  (4.25)

(1/n) Σᵢ₌₁ⁿ X_iY_i − β̂₀X̄ − β̂₁ (1/n) Σᵢ₌₁ⁿ X_i² = 0.  (4.26)

Solving this pair of equations for β̂₀ and β̂₁ yields

β̂₁ = Σᵢ₌₁ⁿ (X_i − X̄)(Y_i − Ȳ) / Σᵢ₌₁ⁿ (X_i − X̄)² and  (4.27)

β̂₀ = Ȳ − β̂₁X̄.  (4.28)

Equations (4.27) and (4.28) are the formulas for β̂₀ and β̂₁ given in Key Concept 4.2; the formula β̂₁ = s_XY/s_X² is obtained by dividing the numerator and denominator in Equation (4.27) by n − 1.
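The closed-form solutions in Equations (4.27) and (4.28) are straightforward to check numerically. In this sketch (our own illustration on simulated data), the formulas reproduce a library least-squares fit essentially to machine precision:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 200)
y = 3.0 + 1.5 * x + rng.normal(0, 2, 200)

# Equations (4.27) and (4.28): slope and intercept in closed form.
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

# Compare with numpy's polynomial least squares fit (degree 1).
lib_b1, lib_b0 = np.polyfit(x, y, 1)
print(f"closed form: b0 = {b0:.6f}, b1 = {b1:.6f}")
print(f"np.polyfit:  b0 = {lib_b0:.6f}, b1 = {lib_b1:.6f}")
```

Agreement between the analytic formulas and the library solver is exactly what the first-order conditions (4.25) and (4.26) guarantee: both are solving the same minimization problem.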

APPENDIX 4.3 Sampling Distribution of the OLS Estimator

In this appendix, we show that the OLS estimator β̂₁ is unbiased and, in large samples, has the normal sampling distribution given in Key Concept 4.4.

We start by providing an expression for β̂₁ in terms of the regressors and errors. Because Y_i = β₀ + β₁X_i + u_i, Y_i − Ȳ = β₁(X_i − X̄) + (u_i − ū), so the numerator of the formula for β̂₁ in Equation (4.27) is

Σᵢ₌₁ⁿ (X_i − X̄)(Y_i − Ȳ) = Σᵢ₌₁ⁿ (X_i − X̄)[β₁(X_i − X̄) + (u_i − ū)]
  = β₁ Σᵢ₌₁ⁿ (X_i − X̄)² + Σᵢ₌₁ⁿ (X_i − X̄)(u_i − ū).  (4.29)

Now Σᵢ₌₁ⁿ (X_i − X̄)(u_i − ū) = Σᵢ₌₁ⁿ (X_i − X̄)u_i − Σᵢ₌₁ⁿ (X_i − X̄)ū = Σᵢ₌₁ⁿ (X_i − X̄)u_i, where the final equality follows from the definition of X̄, which implies that Σᵢ₌₁ⁿ (X_i − X̄)ū = [Σᵢ₌₁ⁿ X_i − nX̄]ū = 0. Substituting Σᵢ₌₁ⁿ (X_i − X̄)(u_i − ū) = Σᵢ₌₁ⁿ (X_i − X̄)u_i into the final expression in Equation (4.29) yields Σᵢ₌₁ⁿ (X_i − X̄)(Y_i − Ȳ) = β₁ Σᵢ₌₁ⁿ (X_i − X̄)² + Σᵢ₌₁ⁿ (X_i − X̄)u_i. Substituting this expression in turn into the formula for β̂₁ in Equation (4.27) yields

β̂₁ = β₁ + [(1/n) Σᵢ₌₁ⁿ (X_i − X̄)u_i] / [(1/n) Σᵢ₌₁ⁿ (X_i − X̄)²].  (4.30)

The expectation of β̂₁ is obtained by taking the expectation of both sides of Equation (4.30). Thus,

E(β̂₁) = β₁ + E{[(1/n) Σᵢ₌₁ⁿ (X_i − X̄)u_i] / [(1/n) Σᵢ₌₁ⁿ (X_i − X̄)²]}
  = β₁ + E{[(1/n) Σᵢ₌₁ⁿ (X_i − X̄) E(u_i|X₁, ..., X_n)] / [(1/n) Σᵢ₌₁ⁿ (X_i − X̄)²]} = β₁,  (4.31)

where the second equality in Equation (4.31) follows by using the law of iterated expectations (Section 2.3). By the second least squares assumption, u_i is distributed independently of X for all observations other than i, so E(u_i|X₁, ..., X_n) = E(u_i|X_i). By the first least squares assumption, however, E(u_i|X_i) = 0. It follows that the conditional expectation in large brackets in the second line of Equation (4.31) is zero, so that E(β̂₁ − β₁|X₁, ..., X_n) = 0. Equivalently, E(β̂₁|X₁, ..., X_n) = β₁; that is, β̂₁ is conditionally unbiased. By the law of iterated expectations, E(β̂₁ − β₁) = E[E(β̂₁ − β₁|X₁, ..., X_n)] = 0, so that E(β̂₁) = β₁; that is, β̂₁ is an unbiased estimator of β₁.

The large-sample normal approximation to the limiting distribution of $\hat\beta_1$ (Key Concept 4.4) is obtained by considering the behavior of the final term in Equation (4.30).

First consider the numerator of this term. Because $\bar X$ is consistent, if the sample size is large, $\bar X$ is nearly equal to $\mu_X$. Thus, to a close approximation, the term in the numerator of Equation (4.30) is the sample average $\bar v$, where $v_i = (X_i - \mu_X)u_i$. By the first least squares assumption, $v_i$ has a mean of zero. By the second least squares assumption, $v_i$ is i.i.d. The variance of $v_i$ is $\sigma^2_v = \mathrm{var}[(X_i - \mu_X)u_i]$, which, by the third least squares assumption, is nonzero and finite. Therefore, $\bar v$ satisfies all the requirements of the central limit theorem (Key Concept 2.7). Thus $\bar v/\sigma_{\bar v}$ is, in large samples, distributed $N(0,1)$, where $\sigma^2_{\bar v} = \sigma^2_v/n$. Thus the distribution of $\bar v$ is well approximated by the $N(0, \sigma^2_v/n)$ distribution.

Next consider the expression in the denominator in Equation (4.30); this is the sample variance of $X$ (except dividing by $n$ rather than $n-1$, which is inconsequential if $n$ is large). As discussed in Section 3.2 [Equation (3.8)], the sample variance is a consistent estimator of the population variance, so in large samples it is arbitrarily close to the population variance of $X$.

Combining these two results, we have that, in large samples, $\hat\beta_1 - \beta_1 \cong \bar v/\mathrm{var}(X_i)$, so that the sampling distribution of $\hat\beta_1$ is, in large samples, $N(\beta_1, \sigma^2_{\hat\beta_1})$, where $\sigma^2_{\hat\beta_1} = \mathrm{var}(\bar v)/[\mathrm{var}(X_i)]^2 = \mathrm{var}[(X_i - \mu_X)u_i]/\{n[\mathrm{var}(X_i)]^2\}$, which is the expression in Equation (4.21).
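The variance formula in Equation (4.21) can also be checked by simulation. In the sketch below the design is illustrative (normal $X$ and $u$, drawn independently, with parameters chosen for convenience); under independence, $\mathrm{var}[(X_i-\mu_X)u_i]$ simplifies to $\sigma_X^2\sigma_u^2$, so the formula's standard deviation can be computed in closed form and compared with the spread of the simulated slope estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 2.0, 0.5
mu_X, sigma_X, sigma_u = 10.0, 2.0, 1.0   # illustrative parameters
n, reps = 200, 5000

slopes = np.empty(reps)
for r in range(reps):
    X = rng.normal(mu_X, sigma_X, size=n)
    u = rng.normal(0.0, sigma_u, size=n)
    Y = beta0 + beta1 * X + u
    slopes[r] = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()

# Equation (4.21): var(beta1_hat) = var[(Xi - mu_X) ui] / (n [var(Xi)]^2).
# With X and u independent here, var[(Xi - mu_X) ui] = sigma_X^2 * sigma_u^2.
sd_formula = np.sqrt((sigma_X**2 * sigma_u**2) / (n * sigma_X**4))
sd_simulated = slopes.std()
print(sd_formula, sd_simulated)
```

The two standard deviations agree closely, and a histogram of `slopes` would look approximately normal and centered at $\beta_1$, as Key Concept 4.4 states.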

The OLS residuals and predicted values satisfy:

$$\frac{1}{n}\sum_{i=1}^{n}\hat u_i = 0, \tag{4.32}$$

$$\frac{1}{n}\sum_{i=1}^{n}\hat Y_i = \bar Y, \tag{4.33}$$

$$\sum_{i=1}^{n}\hat u_i X_i = 0 \ \text{ and } \ s_{\hat u X} = 0, \ \text{and} \tag{4.34}$$

$$TSS = SSR + ESS. \tag{4.35}$$

Equations (4.32) through (4.35) say that the sample average of the OLS residuals is zero; the sample average of the OLS predicted values equals $\bar Y$; the sample covariance $s_{\hat u X}$ between the OLS residuals and the regressors is zero; and the total sum of squares is the sum of the sum of squared residuals and the explained sum of squares [the ESS, TSS, and SSR are defined in Equations (4.14), (4.15), and (4.17)].

To verify Equation (4.32), note that the definition of $\hat\beta_0$ lets us write the OLS residuals as $\hat u_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i = (Y_i - \bar Y) - \hat\beta_1(X_i - \bar X)$; thus

$$\sum_{i=1}^{n}\hat u_i = \sum_{i=1}^{n}(Y_i - \bar Y) - \hat\beta_1\sum_{i=1}^{n}(X_i - \bar X).$$

But the definitions of $\bar Y$ and $\bar X$ imply that $\sum_{i=1}^{n}(Y_i - \bar Y) = 0$ and $\sum_{i=1}^{n}(X_i - \bar X) = 0$, so $\sum_{i=1}^{n}\hat u_i = 0$.

To verify Equation (4.33), note that $Y_i = \hat Y_i + \hat u_i$, so $\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n}\hat Y_i + \sum_{i=1}^{n}\hat u_i = \sum_{i=1}^{n}\hat Y_i$, where the second equality is a consequence of Equation (4.32); dividing both sides by $n$ gives Equation (4.33).

To verify Equation (4.34), use $\hat u_i = (Y_i - \bar Y) - \hat\beta_1(X_i - \bar X)$ to write

$$\sum_{i=1}^{n}\hat u_i X_i = \sum_{i=1}^{n}(Y_i - \bar Y)X_i - \hat\beta_1\sum_{i=1}^{n}(X_i - \bar X)X_i = \sum_{i=1}^{n}(Y_i - \bar Y)(X_i - \bar X) - \hat\beta_1\sum_{i=1}^{n}(X_i - \bar X)^2 = 0, \tag{4.36}$$

where the final equality in Equation (4.36) is obtained using the formula for $\hat\beta_1$ in Equation (4.27). This result, combined with the preceding results, implies that $s_{\hat u X} = 0$.

Equation (4.35) follows from the previous results and some algebra:

$$
TSS = \sum_{i=1}^{n}(Y_i - \bar Y)^2 = \sum_{i=1}^{n}(Y_i - \hat Y_i + \hat Y_i - \bar Y)^2
= \sum_{i=1}^{n}(Y_i - \hat Y_i)^2 + \sum_{i=1}^{n}(\hat Y_i - \bar Y)^2 + 2\sum_{i=1}^{n}(Y_i - \hat Y_i)(\hat Y_i - \bar Y)
= SSR + ESS + 2\sum_{i=1}^{n}\hat u_i(\hat Y_i - \bar Y) \tag{4.37}
= SSR + ESS,
$$

where the final equality follows from $\sum_{i=1}^{n}\hat u_i(\hat Y_i - \bar Y) = \sum_{i=1}^{n}\hat u_i\hat Y_i - \bar Y\sum_{i=1}^{n}\hat u_i = \sum_{i=1}^{n}\hat u_i(\hat\beta_0 + \hat\beta_1 X_i) = \hat\beta_0\sum_{i=1}^{n}\hat u_i + \hat\beta_1\sum_{i=1}^{n}\hat u_i X_i = 0$ by the previous results.
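All four algebraic facts hold exactly (up to floating-point rounding) for any data set, because they follow from the OLS formulas rather than from any statistical assumption. A minimal numerical check, using simulated data whose coefficients and distributions are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = rng.normal(10.0, 2.0, size=n)          # illustrative regressor
Y = 2.0 + 0.5 * X + rng.normal(size=n)     # illustrative outcome

# OLS coefficients from the textbook formulas
beta1_hat = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
beta0_hat = Y.mean() - beta1_hat * X.mean()
Y_hat = beta0_hat + beta1_hat * X
u_hat = Y - Y_hat

resid_mean = u_hat.mean()                        # Equation (4.32): should be 0
pred_mean_gap = Y_hat.mean() - Y.mean()          # Equation (4.33): should be 0
cov_uX = (u_hat * (X - X.mean())).sum() / n      # Equation (4.34): should be 0
tss = ((Y - Y.mean()) ** 2).sum()
ssr = (u_hat ** 2).sum()
ess = ((Y_hat - Y.mean()) ** 2).sum()
tss_gap = tss - (ssr + ess)                      # Equation (4.35): should be 0
print(resid_mean, pred_mean_gap, cov_uX, tss_gap)
```

Each printed quantity is zero up to rounding error regardless of the data used, since the identities are consequences of the definitions of $\hat\beta_0$ and $\hat\beta_1$ alone.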
