The variance of the sum of X and Y is the sum of their variances plus two times their covariance:

var(X + Y) = var(X) + var(Y) + 2cov(X, Y) = σ²_X + σ²_Y + 2σ_XY. (2.36)

If X and Y are independent, then the covariance is zero and the variance of their sum is the sum of their variances:

var(X + Y) = var(X) + var(Y) = σ²_X + σ²_Y (if X and Y are independent). (2.37)

Useful expressions for means, variances, and covariances involving weighted sums of random variables are collected in Key Concept 2.3. Equations (2.29) through (2.35) follow from the definitions of the mean, variance, and covariance; the results in Key Concept 2.3 are derived in Appendix 2.1.

Key Concept 2.3: Means, Variances, and Covariances of Sums of Random Variables

Let X, Y, and V be random variables; let μ_X and σ²_X be the mean and variance of X; let σ_XY be the covariance between X and Y (and so forth for the other variables); and let a, b, and c be constants. Then

E(a + bX + cY) = a + bμ_X + cμ_Y, (2.29)
var(a + bY) = b²σ²_Y, (2.30)
var(aX + bY) = a²σ²_X + 2abσ_XY + b²σ²_Y, (2.31)
E(Y²) = σ²_Y + μ²_Y, (2.32)
cov(a + bX + cV, Y) = bσ_XY + cσ_VY, (2.33)
E(XY) = σ_XY + μ_Xμ_Y, and (2.34)
|corr(X, Y)| ≤ 1 and |σ_XY| ≤ √(σ²_X σ²_Y) (correlation inequality). (2.35)
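These identities are easy to check numerically. The sketch below is my own illustration (the simulated distributions and variable names are not from the text); it verifies that the sample variance of X + Y equals var(X) + var(Y) + 2cov(X, Y), which holds exactly for sample moments because it is an algebraic identity:

```python
import random
import statistics

random.seed(0)

# Draw correlated pairs: X standard normal, Y built from X plus noise
# (an arbitrary illustrative choice, not an example from the text).
n = 20_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [0.5 * x + random.gauss(0, 1) for x in xs]

mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
cov_xy = sum((x - mean_x) * (y - mean_y)
             for x, y in zip(xs, ys)) / (n - 1)

# var(X + Y) = var(X) + var(Y) + 2 cov(X, Y): an algebraic identity,
# so the two sides agree up to floating-point error.
lhs = statistics.variance([x + y for x, y in zip(xs, ys)])
rhs = statistics.variance(xs) + statistics.variance(ys) + 2 * cov_xy
print(abs(lhs - rhs) < 1e-9)  # True
```

The same kind of check works for any of the weighted-sum identities above, since each is exact rather than approximate.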
2.4 The Normal, Chi-Squared, Student t, and F Distributions

The probability distributions most often encountered in econometrics are the normal, chi-squared, Student t, and F distributions.

The Normal Distribution

A continuous random variable with a normal distribution has the familiar bell-shaped probability density shown in Figure 2.5. The function defining the normal probability density is given in Appendix 17.1. As Figure 2.5 shows, the normal density with mean μ and variance σ² is symmetric around its mean and has 95% of its probability between μ - 1.96σ and μ + 1.96σ.

[Figure 2.5: The normal probability density function with mean μ and variance σ² is a bell-shaped curve, centered at μ. The area under the normal p.d.f. between μ - 1.96σ and μ + 1.96σ is 0.95.]

Some special notation and terminology have been developed for the normal distribution. The normal distribution with mean μ and variance σ² is expressed concisely as "N(μ, σ²)." The standard normal distribution is the normal distribution with mean μ = 0 and variance σ² = 1 and is denoted N(0, 1). Random variables that have a N(0, 1) distribution are often denoted Z, and the standard normal cumulative distribution function is denoted by the Greek letter Φ; accordingly, Pr(Z ≤ c) = Φ(c), where c is a constant. Values of the standard normal cumulative distribution function are tabulated in Appendix Table 1.

To look up probabilities for a normal variable with a general mean and variance, we must standardize the variable by first subtracting the mean and then by dividing the result by the standard deviation.
Key Concept 2.4: Computing Probabilities Involving Normal Random Variables

Suppose Y is normally distributed with mean μ and variance σ²; in other words, Y is distributed N(μ, σ²). Then Y is standardized by subtracting its mean and dividing by its standard deviation, that is, by computing Z = (Y - μ)/σ.

Let c1 and c2 denote two numbers with c1 < c2, and let d1 = (c1 - μ)/σ and d2 = (c2 - μ)/σ. Then

Pr(Y ≤ c2) = Pr(Z ≤ d2) = Φ(d2), (2.38)
Pr(Y ≥ c1) = Pr(Z ≥ d1) = 1 - Φ(d1), and (2.39)
Pr(c1 ≤ Y ≤ c2) = Pr(d1 ≤ Z ≤ d2) = Φ(d2) - Φ(d1). (2.40)

The normal cumulative distribution function Φ is tabulated in Appendix Table 1.

For example, suppose Y is distributed N(1, 4); that is, Y is normally distributed with a mean of 1 and a variance of 4. What is the probability that Y ≤ 2, that is, what is the shaded area in Figure 2.6a? The standardized version of Y is Y minus its mean, divided by its standard deviation; that is, (Y - 1)/√4 = (1/2)(Y - 1). Accordingly, the random variable (1/2)(Y - 1) is normally distributed with mean zero and variance one, as shown in Figure 2.6b. Now Y ≤ 2 is equivalent to (1/2)(Y - 1) ≤ (1/2)(2 - 1), that is, (1/2)(Y - 1) ≤ 1/2. Thus

Pr(Y ≤ 2) = Pr[(1/2)(Y - 1) ≤ 1/2] = Pr(Z ≤ 0.5) = Φ(0.5) = 0.691, (2.41)

where the value 0.691 is taken from Appendix Table 1.

The same approach can be applied to compute the probability that a normally distributed random variable exceeds some value or that it falls in a certain range. These steps are summarized in Key Concept 2.4. The box "A Bad Day on Wall Street" presents an unusual application of the cumulative normal distribution.

The normal distribution is symmetric, so its skewness is zero; the kurtosis of the normal distribution is 3.
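The standardization steps in Key Concept 2.4 can be reproduced in a few lines. This sketch is my own illustration (the helper function normal_cdf is an assumption, not part of the text); it recovers Pr(Y ≤ 2) = Φ(0.5) = 0.691 for Y distributed N(1, 4):

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

mu, sigma = 1.0, 2.0   # Y ~ N(1, 4): mean 1, variance 4, std dev 2
c = 2.0
d = (c - mu) / sigma   # standardize: d = (2 - 1)/2 = 0.5
p = normal_cdf(d)
print(round(p, 3))     # 0.691, the value read from Appendix Table 1
```

Changing c, mu, and sigma applies the same recipe, Equations (2.38) through (2.40), to any normal variable.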
[Figure 2.6: Calculating the probability that Y ≤ 2 when Y is distributed N(1, 4). To calculate Pr(Y ≤ 2), standardize Y and then use the standard normal distribution table. Y is standardized by subtracting its mean (μ = 1) and dividing by its standard deviation (σ = 2). The probability that Y ≤ 2 is shown in panel (a), and the corresponding probability after standardizing Y is shown in panel (b). Because the standardized random variable, (Y - 1)/2, is a standard normal (Z) variable, Pr(Y ≤ 2) = Pr((Y - 1)/2 ≤ (2 - 1)/2) = Pr(Z ≤ 0.5). From Appendix Table 1, Pr(Z ≤ 0.5) = Φ(0.5) = 0.691.]

The normal distribution can be generalized to describe the joint distribution of a set of random variables. In this case, the distribution is called the multivariate normal distribution or, if only two variables are being considered, the bivariate normal distribution. The formula for the bivariate normal p.d.f. is given in Appendix 17.1, and the formula for the general multivariate normal p.d.f. is given in Appendix 18.1.

The multivariate normal distribution has four important properties. If X and Y have a bivariate normal distribution with covariance σ_XY and if a and b are two constants, then aX + bY has the normal distribution:

aX + bY is distributed N(aμ_X + bμ_Y, a²σ²_X + b²σ²_Y + 2abσ_XY) (X, Y bivariate normal). (2.42)
A Bad Day on Wall Street

On a typical day the overall value of stocks traded on the U.S. stock market can rise or fall by 1% or even more. This is a lot, but nothing compared to what happened on Monday, October 19, 1987. On "Black Monday," the Dow Jones Industrial Average (an average of 30 large industrial stocks) fell by 22.6%! From January 1, 1980, to December 31, 2009, the standard deviation of daily percentage price changes on the Dow was 1.13%, so the drop of 22.6% was a negative return of 20 (= 22.6/1.13) standard deviations. The enormity of this drop can be seen in Figure 2.7, a plot of the daily returns on the Dow during the 1980s.

[Figure 2.7: Daily Percentage Changes in the Dow Jones Industrial Average in the 1980s. During the 1980s, the average percentage daily change of "the Dow" index was 0.05% and its standard deviation was 1.16%. On October 19, 1987, "Black Monday," the index fell 25.6%, or more than 22 standard deviations.]

If daily percentage price changes are normally distributed, then the probability of a change of at least 20 standard deviations is Pr(|Z| ≥ 20) = 2 × Φ(-20). You will not find this value in Appendix Table 1, but you can calculate it using a computer (try it!). This probability is 5.5 × 10⁻⁸⁹, that is, 0.000...00055, where there are a total of 88 zeros!

How small is 5.5 × 10⁻⁸⁹? Consider the following:

• The world population is about 7 billion, so the probability of winning a random lottery among all living people is about one in 7 billion, or 1.4 × 10⁻¹⁰.
• The universe is believed to have existed for 14 billion years, or about 5 × 10¹⁷ seconds, so the probability of choosing a particular second at random from all the seconds since the beginning of time is 2 × 10⁻¹⁸.
• There are approximately 10⁴³ molecules of gas in the first kilometer above the earth's surface. The probability of choosing one at random is 10⁻⁴³.

Although Wall Street did have a bad day, the fact that it happened at all suggests its probability was more than 5.5 × 10⁻⁸⁹. In fact, there have been many days, good and bad, with stock price changes too large to be consistent with a normal distribution with a constant variance. Table 2.5 lists the ten largest daily percentage price changes in the Dow Jones Industrial Average in the 7571 trading days between January 1, 1980, and December 31, 2009, along with the normal probability of a change at least as large. Clearly, stock price percentage changes have a distribution with heavier tails than the normal distribution, an idea popularized in Nassim Taleb's 2007 book, The Black Swan. For this reason, finance professionals use models in which the variance of the stock price change evolves over time; such models, which allow for periods, like October 1987 and the financial crisis in the fall of 2008, of much higher volatility than others, are discussed later in this book. These models are more consistent with the very bad, and very good, days we actually see on Wall Street.

[Table 2.5: The Ten Largest Daily Percentage Changes in the Dow Jones Industrial Average, 1980-2009, and the Normal Probability of a Change at Least as Large. Columns: Date; Percentage Change (x); Standardized Change (z), computed using the mean and variance over this period; Normal Probability of a Change at Least as Large, Pr(|Z| ≥ |z|) = 2Φ(-|z|). The dates include October 19, 21, and 26, 1987; October 27, 1997; September 17, 2001; and several days in October-December 2008.]
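The tail probability 2 × Φ(-20) can indeed be calculated with a computer, as the box suggests. A minimal sketch (my own illustration; the erfc-based helper is an assumption, not part of the text):

```python
import math

def normal_cdf(z: float) -> float:
    # Phi(z) expressed through erfc, which keeps precision far into
    # the tail, where a printed table is useless.
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Probability of a move of at least 20 standard deviations,
# assuming normality: Pr(|Z| >= 20) = 2 * Phi(-20).
p = 2 * normal_cdf(-20.0)
print(p)  # roughly 5.5e-89, matching the box
```

The erfc route matters here: computing 1 - Φ(20) directly would round to zero in double precision, while erfc keeps the tiny tail mass.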
More generally, if n random variables have a multivariate normal distribution, then any linear combination of these variables (such as their sum) is normally distributed.

Second, if a set of variables has a multivariate normal distribution, then the marginal distribution of each of the variables is normal [this follows from Equation (2.42) by setting a = 1 and b = 0].

Third, if variables with a multivariate normal distribution have covariances that equal zero, then the variables are independent. Thus, if X and Y have a bivariate normal distribution and σ_XY = 0, then X and Y are independent. In Section 2.3 it was stated that if X and Y are independent, then, regardless of their joint distribution, σ_XY = 0. If X and Y are jointly normally distributed, then the converse is also true. This result, that zero covariance implies independence, is a special property of the multivariate normal distribution that is not true in general.

Fourth, if X and Y have a bivariate normal distribution, then the conditional expectation of Y given X is linear in X; that is, E(Y | X = x) = a + bx, where a and b are constants (Exercise 17.11). Joint normality implies linearity of conditional expectations, but linearity of conditional expectations does not imply joint normality.

The Chi-Squared Distribution

The chi-squared distribution is used when testing certain types of hypotheses in statistics and econometrics. The chi-squared distribution is the distribution of the sum of m squared independent standard normal random variables. This distribution depends on m, which is called the degrees of freedom of the chi-squared distribution. The name for this distribution derives from the Greek letter used to denote it: A chi-squared distribution with m degrees of freedom is denoted χ²_m.

For example, let Z1, Z2, and Z3 be independent standard normal random variables. Then Z1² + Z2² + Z3² has a chi-squared distribution with 3 degrees of freedom. Selected percentiles of the χ²_m distribution are given in Appendix Table 3. For example, Appendix Table 3 shows that the 95th percentile of the χ²_3 distribution is 7.81, so Pr(Z1² + Z2² + Z3² ≤ 7.81) = 0.95.
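The definition of the chi-squared distribution lends itself to simulation. In this sketch (my own illustration, not from the text), repeatedly drawing Z1² + Z2² + Z3² reproduces the tabulated 95th percentile:

```python
import random

random.seed(1)

def chi_squared_draw(m: int) -> float:
    """One draw from chi-squared with m degrees of freedom:
    the sum of m squared independent standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(m))

# Fraction of draws of Z1^2 + Z2^2 + Z3^2 at or below 7.81;
# the table says this should be about 0.95.
reps = 100_000
frac = sum(chi_squared_draw(3) <= 7.81 for _ in range(reps)) / reps
print(round(frac, 2))
```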
The Student t Distribution

The Student t distribution with m degrees of freedom is defined to be the distribution of the ratio of a standard normal random variable to the square root of an independently distributed chi-squared random variable with m degrees of freedom divided by m. To state this mathematically, let Z be a standard normal random variable, let W be a chi-squared random variable with m degrees of freedom, and suppose Z and W are independently distributed. Then the random variable Z/√(W/m) has a Student t distribution (also called the t distribution) with m degrees of freedom. This distribution is denoted t_m. Selected percentiles of the Student t distribution are given in Appendix Table 2.

The Student t distribution depends on the degrees of freedom m. The t distribution has a bell shape similar to that of the normal distribution, but when m is small (20 or less), it has more mass in the tails; that is, it is a "fatter" bell shape than the normal. When m is 30 or more, the Student t distribution is well approximated by the standard normal distribution, and the t_∞ distribution equals the standard normal distribution.

The F Distribution

The F distribution with m and n degrees of freedom, denoted F_{m,n}, is defined to be the distribution of the ratio of a chi-squared random variable with degrees of freedom m, divided by m, to an independently distributed chi-squared random variable with degrees of freedom n, divided by n. To state this mathematically, let W be a chi-squared random variable with m degrees of freedom and let V be a chi-squared random variable with n degrees of freedom, where W and V are independently distributed. Then (W/m)/(V/n) has an F_{m,n} distribution, that is, an F distribution with numerator degrees of freedom m and denominator degrees of freedom n.

In statistics and econometrics, an important special case of the F distribution arises when the denominator degrees of freedom is large enough that the F_{m,n} distribution can be approximated by the F_{m,∞} distribution. In this limiting case, the denominator random variable V/n is the mean of infinitely many squared standard normal random variables, and that mean is 1 because the mean of a squared standard normal random variable is 1 (see Exercise 2.24). Thus the F_{m,∞} distribution is the distribution of a chi-squared random variable with m degrees of freedom, divided by m: W/m is distributed F_{m,∞}. For example, from Appendix Table 4, the 95th percentile of the F_{3,∞} distribution is 2.60, which is the same as the 95th percentile of the χ²_3 distribution, 7.81 (from Appendix Table 3), divided by 3 (7.81/3 = 2.60).

The 90th, 95th, and 99th percentiles of the F_{m,n} distribution are given in Appendix Table 5 for selected values of m and n. For example, the 95th percentile of the F_{3,30} distribution is 2.92, and as the denominator degrees of freedom n increases, the 95th percentile of the F_{3,n} distribution tends to the F_{3,∞} limit of 2.60.
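The link between the F_{m,∞} and chi-squared distributions can be checked the same way. This sketch (my own illustration) estimates the 95th percentile of a chi-squared(3) random variable divided by 3, which should sit near 7.81/3 = 2.60:

```python
import random

random.seed(2)

def chi_squared_draw(m: int) -> float:
    # Sum of m squared independent standard normals.
    return sum(random.gauss(0, 1) ** 2 for _ in range(m))

# W/3 with W ~ chi-squared(3) is a draw from the F(3, infinity)
# distribution, so its 95th percentile should be about 2.60.
reps = 100_000
draws = sorted(chi_squared_draw(3) / 3 for _ in range(reps))
pctl_95 = draws[int(0.95 * reps)]
print(round(pctl_95, 2))  # should land near 2.60
```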
2.5 Random Sampling and the Distribution of the Sample Average

Almost all the statistical and econometric procedures used in this book involve averages or weighted averages of a sample of data. Characterizing the distributions of sample averages therefore is an essential step toward understanding the performance of econometric procedures.

This section introduces some basic concepts about random sampling and the distributions of averages that are used throughout the book. We begin by discussing random sampling. The act of random sampling, that is, randomly drawing a sample from a larger population, has the effect of making the sample average itself a random variable. Because the sample average is a random variable, it has a probability distribution, which is called its sampling distribution. This section concludes with some properties of the sampling distribution of the sample average.

Random Sampling

Simple random sampling. Suppose our commuting student from Section 2.1 aspires to be a statistician and decides to record her commuting time on various days. She selects these days at random from the school year, and her daily commuting time has the cumulative distribution function in Figure 2.2a. Because these days were selected at random, knowing the value of the commuting time on one of these randomly selected days provides no information about the commuting time on another of the days; that is, because the days were selected at random, the values of the commuting time on each of the different days are independently distributed random variables.

The situation described in the previous paragraph is an example of the simplest sampling scheme used in statistics, called simple random sampling, in which n objects are selected at random from a population (the population of commuting days) and each member of the population (each day) is equally likely to be included in the sample.

The n observations in the sample are denoted Y1, ..., Yn, where Y1 is the first observation, Y2 is the second observation, and so forth. In the commuting example, Y1 is the commuting time on the first of her n randomly selected days and Yi is the commuting time on the ith of her randomly selected days.

Because the members of the population included in the sample are selected at random, the values of the observations Y1, ..., Yn are themselves random. If different members of the population are chosen, their values of Y will differ.
Key Concept 2.5: Simple Random Sampling and i.i.d. Random Variables

In a simple random sample, n objects are drawn at random from a population, and each object is equally likely to be drawn. The value of the random variable Y for the ith randomly drawn object is denoted Yi. Because each object is equally likely to be drawn and the distribution of Yi is the same for all i, the random variables Y1, ..., Yn are independently and identically distributed (i.i.d.); that is, the distribution of Yi is the same for all i = 1, ..., n, and Y1 is distributed independently of Y2, ..., Yn, and so forth.

Before they are sampled, Y1, ..., Yn can take on many possible values; after they are sampled, a specific value is recorded for each observation. Thus the act of random sampling means that Y1, ..., Yn can be treated as random variables.

Because Y1, ..., Yn are randomly drawn from the same population, the marginal distribution of Yi is the same for each i = 1, ..., n; this marginal distribution is the distribution of Y in the population being sampled. When Yi has the same marginal distribution for i = 1, ..., n, then Y1, ..., Yn are said to be identically distributed.

Under simple random sampling, knowing the value of Y1 provides no information about Y2, so the conditional distribution of Y2 given Y1 is the same as the marginal distribution of Y2. In other words, under simple random sampling, Y1 is distributed independently of Y2, ..., Yn. When Y1, ..., Yn are drawn from the same distribution and are independently distributed, they are said to be independently and identically distributed (or i.i.d.). Simple random sampling and i.i.d. draws are summarized in Key Concept 2.5.

The Sampling Distribution of the Sample Average

The sample average or sample mean, Ȳ, of the n observations Y1, ..., Yn is

Ȳ = (1/n)(Y1 + Y2 + ... + Yn) = (1/n) Σ(i=1 to n) Yi. (2.43)

An essential concept is that the act of drawing a random sample has the effect of making the sample average Ȳ a random variable. Because the sample was drawn at random, the value of each Yi is random.
Had a different sample been drawn, then the observations and their sample average would have been different: The value of Ȳ differs from one randomly drawn sample to the next.

For example, suppose our student commuter selected five days at random to record her commute times and then computed the average of those five times. Had she chosen five different days, she would have recorded five different times and thus would have computed a different value of the sample average.

Because Ȳ is random, it has a probability distribution. The distribution of Ȳ is called the sampling distribution of Ȳ because it is the probability distribution associated with possible values of Ȳ that could be computed for different possible samples Y1, ..., Yn.

The sampling distribution of averages and weighted averages plays a central role in statistics and econometrics. We start our discussion of the sampling distribution of Ȳ by computing its mean and variance under general conditions on the population distribution of Y.

Mean and variance of Ȳ. Suppose that the observations Y1, ..., Yn are i.i.d., and let μ_Y and σ²_Y denote the mean and variance of Yi (because the observations are i.i.d., the mean and variance are the same for all i = 1, ..., n). When n = 2, the mean of the sum Y1 + Y2 is given by applying Equation (2.28): E(Y1 + Y2) = μ_Y + μ_Y = 2μ_Y. Thus the mean of the sample average is E[(1/2)(Y1 + Y2)] = (1/2) × 2μ_Y = μ_Y. Because Y1 and Y2 are independently distributed, cov(Y1, Y2) = 0, so [by applying Equation (2.31) with a = b = 1 and cov(Y1, Y2) = 0] var(Y1 + Y2) = 2σ²_Y; thus var(Ȳ) = (1/4) × 2σ²_Y = (1/2)σ²_Y.

For general n, because Y1, ..., Yn are i.i.d., Yi and Yj are independently distributed for i ≠ j, so cov(Yi, Yj) = 0. Thus

E(Ȳ) = (1/n) Σ(i=1 to n) E(Yi) = μ_Y and (2.44)

var(Ȳ) = var((1/n) Σ(i=1 to n) Yi)
= (1/n²) Σ(i=1 to n) var(Yi) + (1/n²) Σ(i=1 to n) Σ(j=1, j≠i, to n) cov(Yi, Yj)
= σ²_Y/n. (2.45)

The standard deviation of Ȳ is the square root of the variance, σ_Y/√n.
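Equations (2.44) and (2.45) can be seen in a small simulation. The sketch below is my own illustration (the population parameters are arbitrary); it draws many samples of size n = 5 and compares the mean and variance of the resulting sample averages with μ_Y and σ²_Y/n:

```python
import random
import statistics

random.seed(3)

mu_y, sigma_y = 10.0, 2.0   # population mean and std dev (arbitrary)
n = 5                       # size of each random sample
reps = 20_000               # number of independently drawn samples

# The sample average of each of many random samples of size n.
ybars = [statistics.fmean(random.gauss(mu_y, sigma_y) for _ in range(n))
         for _ in range(reps)]

# E(Ybar) = mu_Y = 10 and var(Ybar) = sigma_Y^2 / n = 4/5 = 0.8.
print(round(statistics.fmean(ybars), 2))      # near 10
print(round(statistics.variance(ybars), 2))   # near 0.8
```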
In summary, the mean, the variance, and the standard deviation of Ȳ are

E(Ȳ) = μ_Y, (2.46)
var(Ȳ) = σ²_Ȳ = σ²_Y/n, and (2.47)
std.dev(Ȳ) = σ_Ȳ = σ_Y/√n. (2.48)

These results hold whatever the distribution of Yi is; that is, the distribution of Yi does not need to take on a specific form, such as the normal distribution, for Equations (2.46) through (2.48) to hold.

The notation σ²_Ȳ denotes the variance of the sampling distribution of the sample average Ȳ. In contrast, σ²_Y is the variance of each individual Yi, that is, the variance of the population distribution from which the observation is drawn. Similarly, σ_Ȳ denotes the standard deviation of the sampling distribution of Ȳ.

Sampling distribution of Ȳ when Y is normally distributed. Suppose that Y1, ..., Yn are i.i.d. draws from the N(μ_Y, σ²_Y) distribution. As stated following Equation (2.42), the sum of n normally distributed random variables is itself normally distributed. Because the mean of Ȳ is μ_Y and the variance of Ȳ is σ²_Y/n, this means that, if Y1, ..., Yn are i.i.d. draws from the N(μ_Y, σ²_Y) distribution, then Ȳ is distributed N(μ_Y, σ²_Y/n).

Financial Diversification and Portfolios

The principle of diversification says that you can reduce your risk by holding small investments in multiple assets, compared to putting all your money into one asset. That is, you shouldn't put all your eggs in one basket.

The math of diversification follows from Equation (2.45). Suppose you divide $1 equally among n assets. Let Yi represent the payout in 1 year of $1 invested in the ith asset. Because you invested 1/n dollars in each asset, the actual payoff of your portfolio after 1 year is (Y1 + Y2 + ... + Yn)/n = Ȳ. To keep things simple, suppose that each asset has the same expected payout, μ_Y, the same variance, σ², and the same positive correlation ρ across assets [so that cov(Yi, Yj) = ρσ²]. Then the expected payout is E(Ȳ) = μ_Y, and, for large n, the variance of the portfolio payout is var(Ȳ) = ρσ² (Exercise 2.26). Putting all your money into one asset or spreading it equally across all n assets has the same expected payout, but diversifying reduces the variance from σ² to ρσ².

The math of diversification has led to financial products such as stock mutual funds, in which the fund holds many stocks and an individual owns a share of the fund, thereby owning a small amount of many stocks. But diversification has its limits: For many assets, payouts are positively correlated, so var(Ȳ) remains positive even if n is large. In the case of stocks, risk is reduced by holding a portfolio, but that portfolio remains subject to the unpredictable fluctuations of the overall stock market.
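The variance claim in the box follows from the variance-of-a-sum rule: with common variance σ² and common pairwise covariance ρσ², var(Ȳ) = σ²/n + ((n - 1)/n)ρσ², which approaches ρσ² as n grows. A short sketch of this calculation (my own illustration; the parameter values are arbitrary):

```python
# For n equally weighted assets, each with variance sigma2 and each
# pair with covariance rho * sigma2, the variance-of-a-sum rule gives
#   var(Ybar) = (1/n^2) * (n * sigma2 + n*(n - 1) * rho * sigma2)
#             = sigma2/n + ((n - 1)/n) * rho * sigma2.
sigma2, rho = 0.04, 0.3  # arbitrary illustrative values

def portfolio_variance(n: int) -> float:
    return sigma2 / n + (n - 1) / n * rho * sigma2

for n in (1, 10, 100, 10_000):
    print(n, round(portfolio_variance(n), 5))
# Diversifying removes the sigma2/n term, but the common component
# rho * sigma2 = 0.012 remains however large n gets.
```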
2.6 Large-Sample Approximations to Sampling Distributions

Sampling distributions play a central role in the development of statistical and econometric procedures, so it is important to know, in a mathematical sense, what the sampling distribution of Ȳ is. There are two approaches to characterizing sampling distributions: an "exact" approach and an "approximate" approach.

The "exact" approach entails deriving a formula for the sampling distribution that holds exactly for any value of n. The sampling distribution that exactly describes the distribution of Ȳ for any n is called the exact distribution or finite-sample distribution of Ȳ. For example, if Y is normally distributed and Y1, ..., Yn are i.i.d. draws from the N(μ_Y, σ²_Y) distribution, then (as discussed in Section 2.5) the exact distribution of Ȳ is normal with mean μ_Y and variance σ²_Y/n. Unfortunately, if the distribution of Y is not normal, then in general the exact sampling distribution of Ȳ is very complicated and depends on the distribution of Y.

The "approximate" approach uses approximations to the sampling distribution that rely on the sample size being large. The large-sample approximation to the sampling distribution is often called the asymptotic distribution, "asymptotic" because the approximations become exact in the limit that n → ∞. As we see in this section, these approximations can be very accurate even if the sample size is only n = 30 observations. Because sample sizes used in practice in econometrics typically number in the hundreds or thousands, these asymptotic distributions can be counted on to provide very good approximations to the exact sampling distribution.

This section presents the two key tools used to approximate sampling distributions when the sample size is large: the law of large numbers and the central limit theorem. The law of large numbers says that, when the sample size is large, Ȳ will be close to μ_Y with very high probability. The central limit theorem says that, when the sample size is large, the sampling distribution of the standardized sample average, (Ȳ - μ_Y)/σ_Ȳ, is approximately normal.

Although exact sampling distributions are complicated and depend on the distribution of Y, the asymptotic distributions are simple. Moreover, remarkably, the asymptotic normal distribution of (Ȳ - μ_Y)/σ_Ȳ does not depend on the distribution of Y. This normal approximate distribution provides enormous simplifications and underlies the theory of regression used throughout this book.
CHAPTER 2 Review of Probability

Key Concept 2.6: Convergence in Probability, Consistency, and the Law of Large Numbers
The sample average Ȳ converges in probability to μ_Y (or, equivalently, Ȳ is consistent for μ_Y) if the probability that Ȳ is in the range μ_Y - c to μ_Y + c becomes arbitrarily close to 1 as n increases for any constant c > 0. The convergence of Ȳ to μ_Y in probability is written Ȳ →p μ_Y.

The law of large numbers says that if Yi, i = 1, ..., n, are independently and identically distributed with E(Yi) = μ_Y and if large outliers are unlikely (technically, if var(Yi) = σ²_Y < ∞), then Ȳ →p μ_Y.
The Law of Large Numbers and Consistency
The law of large numbers states that, under general conditions, Ȳ will be near μ_Y with very high probability when n is large. This is sometimes called the "law of averages." When a large number of random variables with the same mean are averaged together, the large values balance the small values and their sample average is close to their common mean.

For example, consider a simplified version of our student commuter's experiment in which she simply records whether her commute was short (less than 20 minutes) or long. Let Yi equal 1 if her commute was short on the ith randomly selected day and equal 0 if it was long. Because she used simple random sampling, Y1, ..., Yn are i.i.d. Thus Yi, i = 1, ..., n, are i.i.d. draws of a Bernoulli random variable, where (from Table 2.2) the probability that Yi = 1 is 0.78. Because the expectation of a Bernoulli random variable is its success probability, E(Yi) = μ_Y = 0.78. The sample average Ȳ is the fraction of days in her sample in which her commute was short.

Figure 2.8 shows the sampling distribution of Ȳ for various sample sizes n. When n = 2 (Figure 2.8a), Ȳ can take on only three values: 0, 1/2, and 1 (neither commute was short, one was short, and both were short), none of which is particularly close to the true proportion in the population, 0.78. As n increases, however (Figures 2.8b-d), Ȳ takes on more values and the sampling distribution becomes tightly centered on μ_Y.
The property that Ȳ is near μ_Y with increasing probability as n increases is called convergence in probability or, more concisely, consistency (see Key Concept 2.6). The law of large numbers states that, under certain conditions, Ȳ converges in probability to μ_Y or, equivalently, that Ȳ is consistent for μ_Y.
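The convergence described here can be watched directly. This sketch is my own illustration (it reuses the short-commute probability 0.78 from the text); sample averages of Bernoulli draws settle near μ_Y = 0.78 as n grows:

```python
import random

random.seed(4)

p = 0.78  # probability of a short commute, as in the text

def sample_average(n: int) -> float:
    """Average of n i.i.d. Bernoulli(p) draws."""
    return sum(random.random() < p for _ in range(n)) / n

# Small samples scatter widely; large samples hug 0.78.
for n in (2, 25, 100, 10_000):
    print(n, sample_average(n))

ybar_big = sample_average(100_000)
print(round(ybar_big, 2))  # close to 0.78
```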
Figure 2.8: Sampling Distribution of the Sample Average of n Bernoulli Random Variables
[Figure 2.8 has four panels, (a) n = 2, (b) n = 5, (c) n = 25, and (d) n = 100, each plotting probability against the value of the sample average, with μ = 0.78 marked.]
The distributions are the sampling distributions of Ȳ, the sample average of n independent Bernoulli random variables with p = Pr(Yi = 1) = 0.78 (the probability of a short commute is 78%). The variance of the sampling distribution of Ȳ decreases as n gets larger, so the sampling distribution becomes more tightly concentrated around its mean μ = 0.78 as the sample size n increases.
The conditions for the law of large numbers that we will use in this book are that Yi, i = 1, ..., n, are i.i.d. and that the variance of Yi, σ²_Y, is finite. The mathematical role of these conditions is made clear in Section 17.2, where the law of large numbers is proven. If the data are collected by simple random sampling, then the i.i.d. assumption holds. The assumption that the variance is finite says that extremely large values of Yi (that is, outliers) are unlikely and observed infrequently; otherwise, these large values could dominate Ȳ and the sample average would be unreliable. This assumption is plausible for the applications in this book. For example, because there is an upper limit to our student's commuting time (she could park and walk if the traffic is dreadful), the variance of the distribution of commuting times is finite.
The Central Limit Theorem
The central limit theorem says that, under general conditions, the distribution of Ȳ is well approximated by a normal distribution when n is large. Recall that the mean of Ȳ is μ_Y and its variance is σ²_Ȳ = σ²_Y/n. According to the central limit theorem, when n is large, the distribution of Ȳ is approximately N(μ_Y, σ²_Ȳ). As discussed at the end of Section 2.5, the distribution of Ȳ is exactly N(μ_Y, σ²_Ȳ) when the sample is drawn from a population with the normal distribution N(μ_Y, σ²_Y). The central limit theorem says that this same result is approximately true when n is large even if Y1, ..., Yn are not themselves normally distributed.

The convergence of the distribution of Ȳ to the bell-shaped, normal approximation can be seen (a bit) in Figure 2.8. However, because the distribution gets quite tight for large n, this requires some squinting. It would be easier to see the shape of the distribution of Ȳ if you used a magnifying glass or had some other way to zoom in or to expand the horizontal axis of the figure.

One way to do this is to standardize Ȳ by subtracting its mean and dividing by its standard deviation so that it has a mean of 0 and a variance of 1. This process leads to examining the distribution of the standardized version of Ȳ, (Ȳ - μ_Y)/σ_Ȳ. According to the central limit theorem, this distribution should be well approximated by a N(0, 1) distribution when n is large.

The distribution of the standardized average (Ȳ - μ_Y)/σ_Ȳ is plotted in Figure 2.9 for the distributions in Figure 2.8; the distributions in Figure 2.9 are exactly the same as in Figure 2.8, except that the scale of the horizontal axis is changed so that the standardized variable has a mean of 0 and a variance of 1. After this change of scale, it is easy to see that, if n is large enough, the distribution of Ȳ is well approximated by a normal distribution.

One might ask, how large is "large enough"?
That is, how large must n be for the distribution of Ȳ to be approximately normal? The answer is, "It depends."
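The standardization just described can be checked by simulation. The sketch below uses assumed values matching Figure 2.9 (Bernoulli draws with p = 0.78, n = 100) and estimates Pr[(Ȳ − μ_Y)/σ_Ȳ ≤ 1], which the central limit theorem says should be near Φ(1) ≈ 0.841:

```python
import math
import random

random.seed(42)

def standardized_mean(n, p=0.78):
    """(Ybar - p) / sqrt(p(1 - p)/n) for n i.i.d. Bernoulli(p) draws."""
    ybar = sum(random.random() < p for _ in range(n)) / n
    return (ybar - p) / math.sqrt(p * (1 - p) / n)

# Share of standardized sample means at or below 1.0 across many samples;
# the central limit theorem says this approaches Phi(1), about 0.841, as n
# grows (the discreteness of the Bernoulli keeps it slightly off at n = 100).
reps = 20_000
share = sum(standardized_mean(100) <= 1.0 for _ in range(reps)) / reps
print(round(share, 3))
```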
2.6 Large-Sample Approximations to Sampling Distributions
[Figure 2.9: Distribution of the Standardized Sample Average of n Bernoulli Random Variables with p = 0.78. Panels: (a) n = 2, (b) n = 5, (c) n = 25, (d) n = 100; horizontal axis: standardized value of the sample average; vertical axis: probability. The sampling distribution of Ȳ in Figure 2.8 is plotted here after standardizing Ȳ. This plot centers the distributions in Figure 2.8 and magnifies the scale on the horizontal axis by a factor of √n. When the sample size is large, the sampling distributions are increasingly well approximated by the normal distribution (the solid line), as predicted by the central limit theorem. The normal distribution is scaled so that the height of the distributions is approximately the same in all figures.]
How large must n be for the normal approximation to be good? The quality of the normal approximation depends on the distribution of the underlying Y_i that make up the average. At one extreme, if the Y_i are themselves normally distributed, then Ȳ is exactly normally distributed for all n. In contrast, when the underlying Y_i have a distribution that is far from normal, the normal approximation can require n = 30 or even more.

This point is illustrated in Figure 2.10 for a population distribution, shown in Figure 2.10a, that is quite different from the Bernoulli distribution. This distribution has a long right tail (it is "skewed" to the right). The sampling distribution of Ȳ, after centering and scaling, is shown in Figures 2.10b through 2.10d for n = 5, 25, and 100, respectively. Although the sampling distribution is approaching the bell shape for n = 25, the normal approximation still has noticeable imperfections. By n = 100, however, the normal approximation is quite good. In fact, for n ≥ 100, the normal approximation to the distribution of Ȳ typically is very good for a wide variety of population distributions.

The convenience of the normal approximation, combined with its wide applicability because of the central limit theorem, makes it a key underpinning of modern applied econometrics. The central limit theorem is summarized in Key Concept 2.7.

Key Concept 2.7: The Central Limit Theorem

Suppose that Y_1, ..., Y_n are i.i.d. with E(Y_i) = μ_Y and var(Y_i) = σ²_Y, where 0 < σ²_Y < ∞. As n → ∞, the distribution of (Ȳ − μ_Y)/σ_Ȳ (where σ²_Ȳ = σ²_Y/n) becomes arbitrarily well approximated by the standard normal distribution. Because the distribution of Ȳ approaches the normal as n grows large, Ȳ is said to have an asymptotic normal distribution.
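The claim that the approximation improves more slowly for skewed populations can be illustrated numerically. The sketch below uses an assumed Exponential(1) population (a stand-in for the skewed distribution of Figure 2.10; its skewness is 2) and estimates the skewness of Ȳ, which falls toward 0, the skewness of the normal, as n grows:

```python
import random
import statistics

random.seed(7)

def skewness_of_mean(n, reps=4000):
    """Monte Carlo estimate of the skewness of Ybar for n Exponential(1) draws."""
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    m = statistics.fmean(means)
    s = statistics.pstdev(means)
    return statistics.fmean(((x - m) / s) ** 3 for x in means)

# The population skewness is 2; the skewness of Ybar shrinks roughly like
# 2/sqrt(n), so the sampling distribution looks more normal as n grows.
print(round(skewness_of_mean(5), 2), round(skewness_of_mean(100), 2))
```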
Summary

1. The probabilities with which a random variable takes on different values are summarized by the cumulative distribution function, the probability distribution (for discrete random variables), and the probability density function (for continuous random variables).
2. The expected value of a random variable Y (also called its mean, μ_Y), denoted E(Y), is its probability-weighted average value. The variance of Y is σ²_Y = E[(Y − μ_Y)²], and the standard deviation of Y is the square root of its variance.
3. The joint probabilities for two random variables X and Y are summarized by their joint probability distribution. The conditional probability distribution of Y given X = x is the probability distribution of Y, conditional on X taking on the value x.
4. To calculate a probability associated with a normal random variable, first standardize the variable and then use the standard normal cumulative distribution tabulated in Appendix Table 1.
5. Simple random sampling produces n random observations Y_1, ..., Y_n that are independently and identically distributed (i.i.d.).
6. The sample average, Ȳ, varies from one randomly chosen sample to the next and thus is a random variable with a sampling distribution. If Y_1, ..., Y_n are i.i.d., then:
a. the sampling distribution of Ȳ has mean μ_Y and variance σ²_Ȳ = σ²_Y/n;
b. the law of large numbers says that Ȳ converges in probability to μ_Y; and
c. the central limit theorem says that the standardized version of Ȳ, (Ȳ − μ_Y)/σ_Ȳ, has a standard normal [N(0, 1)] distribution when n is large.
[Figure 2.10: Distribution of the Standardized Sample Average of n Draws from a Skewed Distribution. Panels: (a) n = 1, (b) n = 5, (c) n = 25, (d) n = 100; horizontal axis: standardized value of the sample average; vertical axis: probability. The figures show the sampling distribution of the standardized sample average of n draws from the skewed (asymmetric) population distribution shown in Figure 2.10a. When n is small (n = 5), the sampling distribution, like the population distribution, is skewed. But when n is large (n = 100), the sampling distribution is well approximated by a standard normal distribution (the solid line), as predicted by the central limit theorem. The normal distribution is scaled so that the height of the distributions is approximately the same in all figures.]
Key Terms

outcomes; probability; sample space; event; discrete random variable; continuous random variable; probability distribution; cumulative probability distribution; cumulative distribution function (c.d.f.); Bernoulli random variable; Bernoulli distribution; probability density function; density function; density; expected value; expectation; mean; variance; standard deviation; moments of a distribution; skewness; kurtosis; outlier; leptokurtic; rth moment; joint probability distribution; marginal probability distribution; conditional distribution; conditional expectation; conditional mean; law of iterated expectations; conditional variance; independently distributed; independent; covariance; correlation; uncorrelated; normal distribution; standard normal distribution; standardize a variable; multivariate normal distribution; bivariate normal distribution; chi-squared distribution; Student t distribution; t distribution; F distribution; simple random sampling; population; identically distributed; independently and identically distributed (i.i.d.); sample average; sample mean; sampling distribution; exact (finite-sample) distribution; asymptotic distribution; law of large numbers; convergence in probability; consistency; central limit theorem; asymptotic normal distribution

Review the Concepts

2.1 Examples of random variables used in this chapter included (a) the gender of the next person you meet, (b) the number of times a computer crashes, (c) the time it takes to commute to school, (d) whether the computer you are assigned in the library is new or old, and (e) whether it is raining or not. Explain why each can be thought of as random.

2.2 Suppose that the random variables X and Y are independent and you know their distributions. Explain why knowing the value of X tells you nothing about the value of Y.

2.3 Suppose that X denotes the amount of rainfall in your hometown during a given month and Y denotes the number of children born in Los Angeles during the same month. Are X and Y independent? Explain.

2.4 An econometrics class has 80 students, and the mean student weight is 145 lb. A random sample of four students is selected from the class, and their average weight is calculated. Will the average weight of the students in the sample equal 145 lb? Why or why not? Use this example to explain why the sample average, Ȳ, is a random variable.
2.5 Suppose that Y_1, ..., Y_n are i.i.d. random variables with a N(1, 4) distribution. Sketch the probability density of Ȳ when n = 2. Repeat this for n = 10 and n = 100. In words, describe how the densities differ. What is the relationship between your answer and the law of large numbers?

2.6 Suppose that Y_1, ..., Y_n are i.i.d. random variables with the probability distribution given in Figure 2.10a. You want to calculate Pr(Ȳ ≤ 0.1). Would it be reasonable to use the normal approximation if n = 5? What about n = 25 or n = 100? Explain.

2.7 Y is a random variable with μ_Y = 0, σ_Y = 1, skewness = 0, and kurtosis = 100. Sketch a hypothetical probability distribution of Y. Explain why n random variables drawn from this distribution might have some large outliers.

Exercises

2.1 Let Y denote the number of "heads" that occur when two coins are tossed.
a. Derive the probability distribution of Y.
b. Derive the cumulative probability distribution of Y.
c. Derive the mean and variance of Y.

2.2 Use the probability distribution given in Table 2.2 to compute (a) E(Y) and E(X), (b) σ²_X and σ²_Y, and (c) σ_XY and corr(X, Y).

2.3 Using the random variables X and Y from Table 2.2, consider two new random variables W = 3 + 6X and V = 20 − 7Y. Compute (a) E(W) and E(V), (b) σ²_W and σ²_V, and (c) σ_WV and corr(W, V).

2.4 Suppose X is a Bernoulli random variable with P(X = 1) = p.
a. Show E(X³) = p.
b. Show E(X^k) = p for k > 0.
c. Suppose that p = 0.3. Compute the mean, variance, skewness, and kurtosis of X. (Hint: You might find it helpful to use the formulas given in Exercise 2.21.)

2.5 In September, Seattle's daily high temperature has a mean of 70°F and a standard deviation of 7°F. What are the mean, standard deviation, and variance in °C?

2.6 The following table gives the joint probability distribution between employment status and college graduation among those either employed or looking for work (unemployed) in the working-age U.S. population for 2008.

Joint Distribution of Employment Status and College Graduation in the U.S. Population Aged 25 and Greater, 2008

                              Unemployed (Y = 0)   Employed (Y = 1)   Total
Non-college grads (X = 0)          0.037                0.622         0.659
College grads (X = 1)              0.009                0.332         0.341
Total                              0.046                0.954         1.000

a. Compute E(Y).
b. The unemployment rate is the fraction of the labor force that is unemployed. Show that the unemployment rate is given by 1 − E(Y).
c. Calculate E(Y | X = 1) and E(Y | X = 0).
d. Calculate the unemployment rate for (i) college graduates and (ii) non-college graduates.
e. A randomly selected member of this population reports being unemployed. What is the probability that this worker is a college graduate? A non-college graduate?
f. Are educational achievement and employment status independent? Explain.

2.7 In a given population of two-earner male/female couples, male earnings have a mean of $40,000 per year and a standard deviation of $12,000. Female earnings have a mean of $45,000 per year and a standard deviation of $18,000. The correlation between male and female earnings for a randomly selected couple is 0.80. Let C denote the combined earnings for a randomly selected couple.
a. What is the mean of C?
b. What is the covariance between male and female earnings?
c. What is the standard deviation of C?
d. Convert the answers to (a) through (c) from U.S. dollars ($) to euros (€).

2.8 The random variable Y has a mean of 1 and a variance of 4. Let Z = ½(Y − 1). Show that μ_Z = 0 and σ²_Z = 1.

2.9 X and Y are discrete random variables with the following joint distribution:

                        Value of Y
                 14     22     30     40     65
Value   1       0.02   0.05   0.10   0.03   0.01
of X    5       0.17   0.15   0.05   0.02   0.01
        8       0.02   0.03   0.15   0.10   0.09

That is, Pr(X = 1, Y = 14) = 0.02, and so forth.
a. Calculate the probability distribution, mean, and variance of Y.
b. Calculate the probability distribution, mean, and variance of Y given X = 8.
c. Calculate the covariance and correlation between X and Y.

2.10 Compute the following probabilities:
a. If Y is distributed N(1, 4), find Pr(Y ≤ 3).
b. If Y is distributed N(3, 9), find Pr(Y > 0).
c. If Y is distributed N(50, 25), find Pr(40 ≤ Y ≤ 52).
d. If Y is distributed N(5, 2), find Pr(6 ≤ Y ≤ 8).

2.11 Compute the following probabilities:
a. If Y is distributed χ²₄, find Pr(Y ≤ 7.78).
b. If Y is distributed χ²₁₀, find Pr(Y > 18.31).
c. If Y is distributed F₁₀,∞, find Pr(Y > 1.83).
d. Why are the answers to (b) and (c) the same?
e. If Y is distributed χ²₁, find Pr(Y ≤ 1.0). (Hint: Use the definition of the χ²₁ distribution.)

2.12 Compute the following probabilities:
a. If Y is distributed t₁₅, find Pr(Y > 1.75).
b. If Y is distributed t₉₀, find Pr(−1.99 ≤ Y ≤ 1.99).
c. If Y is distributed N(0, 1), find Pr(−1.99 ≤ Y ≤ 1.99).
d. Why are the answers to (b) and (c) approximately the same?
e. If Y is distributed F₇,₄, find Pr(Y > 4.12).
f. If Y is distributed F₇,₁₂₀, find Pr(Y > 2.79).
2.13 X is a Bernoulli random variable with Pr(X = 1) = 0.99. Y is distributed N(0, 1). W is distributed N(0, 100). X, Y, and W are independent. Let S = XY + (1 − X)W. (That is, S = Y when X = 1, and S = W when X = 0.)
a. Show that E(Y²) = 1 and E(W²) = 100.
b. Show that E(Y³) = 0 and E(W³) = 0. (Hint: What is the skewness for a symmetric distribution?)
c. Show that E(Y⁴) = 3 and E(W⁴) = 3 × 100². (Hint: Use the fact that the kurtosis is 3 for a normal distribution.)
d. Derive E(S), E(S²), E(S³), and E(S⁴). (Hint: Use the law of iterated expectations, conditioning on X = 0 and X = 1.)
e. Derive the skewness and kurtosis for S.

2.14 In a population, μ_Y = 100 and σ²_Y = 43. Use the central limit theorem to answer the following questions:
a. In a random sample of size n = 100, find Pr(Ȳ ≤ 101).
b. In a random sample of size n = 165, find Pr(Ȳ > 98).
c. In a random sample of size n = 64, find Pr(101 ≤ Ȳ ≤ 103).

2.15 Suppose Y_i, i = 1, 2, ..., n, are i.i.d. random variables, each distributed N(10, 4).
a. Compute Pr(9.6 ≤ Ȳ ≤ 10.4) when (i) n = 20, (ii) n = 100, and (iii) n = 1,000.
b. Suppose c is a positive number. Show that Pr(10 − c ≤ Ȳ ≤ 10 + c) becomes close to 1.0 as n grows large.
c. Use your answer in (b) to argue that Ȳ converges in probability to 10.

2.16 Y is distributed N(5, 100) and you want to calculate Pr(Y < 3.6). Unfortunately, you do not have your textbook and do not have access to a normal probability table like Appendix Table 1. However, you do have your computer and a computer program that can generate i.i.d. draws from the N(5, 100) distribution. Explain how you can use your computer to compute an accurate approximation for Pr(Y < 3.6).

2.17 Y_i, i = 1, ..., n, are i.i.d. Bernoulli random variables with p = 0.4. Let Ȳ denote the sample mean.
a. Use the central limit theorem to compute approximations for
i. Pr(Ȳ ≥ 0.43) when n = 100, and
ii. Pr(Ȳ ≤ 0.37) when n = 400.
b. How large would n need to be to ensure that Pr(0.39 ≤ Ȳ ≤ 0.41) ≥ 0.95? (Use the central limit theorem to compute an approximate answer.)
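The simulation idea described in Exercise 2.16 can be sketched in a few lines: approximate Pr(Y < 3.6) for Y ~ N(5, 100) (variance 100, so standard deviation 10) by the fraction of many simulated draws that fall below 3.6.

```python
import random

random.seed(0)

# Monte Carlo approximation of Pr(Y < 3.6) for Y ~ N(5, 100):
# draw many i.i.d. values and count how often a draw falls below 3.6.
n_draws = 200_000
hits = sum(random.gauss(5, 10) < 3.6 for _ in range(n_draws))
estimate = hits / n_draws
print(round(estimate, 3))
```

The exact value is Φ((3.6 − 5)/10) = Φ(−0.14) ≈ 0.444, and the simulation error shrinks like one over the square root of the number of draws.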
2.18 In any year, the weather can inflict storm damage to a home. From year to year, the damage is random. Let Y denote the dollar value of damage in any given year. Suppose that in 95% of the years Y = $0, but in 5% of the years Y = $20,000.
a. What are the mean and standard deviation of the damage in any year?
b. Consider an "insurance pool" of 100 people whose homes are sufficiently dispersed so that, in any year, the damage to different homes can be viewed as independently distributed random variables. Let Ȳ denote the average damage to these 100 homes in a year. (i) What is the expected value of the average damage Ȳ? (ii) What is the probability that Ȳ exceeds $2000?

2.19 Consider two random variables X and Y. Suppose that Y takes on k values y_1, ..., y_k and that X takes on l values x_1, ..., x_l.
a. Show that Pr(Y = y_j) = Σ_{i=1}^{l} Pr(Y = y_j | X = x_i) Pr(X = x_i). [Hint: Use the definition of Pr(Y = y_j | X = x_i).]
b. Use your answer to (a) to verify Equation (2.19).
c. Suppose that X and Y are independent. Show that σ_XY = 0 and corr(X, Y) = 0.

2.20 Consider three random variables X, Y, and Z. Suppose that Y takes on k values y_1, ..., y_k, that X takes on l values x_1, ..., x_l, and that Z takes on m values z_1, ..., z_m. The joint probability distribution of X, Y, and Z is Pr(X = x, Y = y, Z = z), and the conditional probability distribution of Y given X and Z is Pr(Y = y | X = x, Z = z) = Pr(Y = y, X = x, Z = z)/Pr(X = x, Z = z).
a. Explain how the marginal probability that Y = y can be calculated from the joint probability distribution. [Hint: This is a generalization of Equation (2.16).]
b. Show that E(Y) = E[E(Y | X, Z)]. [Hint: This is a generalization of Equations (2.19) and (2.20).]

2.21 X is a random variable with moments E(X), E(X²), E(X³), and so forth.
a. Show E(X − μ)³ = E(X³) − 3[E(X²)][E(X)] + 2[E(X)]³.
b. Show E(X − μ)⁴ = E(X⁴) − 4[E(X)][E(X³)] + 6[E(X)]²[E(X²)] − 3[E(X)]⁴.

2.22 Suppose you have some money to invest, for simplicity $1, and you are planning to put a fraction w into a stock market mutual fund and the rest, 1 − w, into a bond mutual fund. Suppose that $1 invested in a stock fund yields R_s after 1 year and that $1 invested in a bond fund yields R_b. Suppose that R_s is random with mean 0.08 (8%) and standard deviation 0.07, and suppose that R_b is random with mean 0.05 (5%) and standard deviation 0.04. The correlation between R_s and R_b is 0.25. If you place a fraction w of your money in the stock fund and the rest, 1 − w, in the bond fund, then the return on your investment is R = wR_s + (1 − w)R_b.
a. Suppose that w = 0.5. Compute the mean and standard deviation of R.
b. Suppose that w = 0.75. Compute the mean and standard deviation of R.
c. What value of w makes the mean of R as large as possible? What is the standard deviation of R for this value of w?
d. (Harder) What is the value of w that minimizes the standard deviation of R? (Show using a graph, algebra, or calculus.)

2.23 This exercise provides an example of a pair of random variables X and Y for which the conditional mean of Y given X depends on X but corr(X, Y) = 0. Let X and Z be two independently distributed standard normal random variables, and let Y = X² + Z.
a. Show that E(Y | X) = X².
b. Show that μ_Y = 1.
c. Show that E(XY) = 0. (Hint: Use the fact that the odd moments of a standard normal random variable are all zero.)
d. Show that cov(X, Y) = 0 and thus corr(X, Y) = 0.

2.24 Suppose Y_i is distributed i.i.d. N(0, σ²) for i = 1, 2, ..., n.
a. Show that E(Y_i²/σ²) = 1.
b. Show that W = (1/σ²) Σ_{i=1}^{n} Y_i² is distributed χ²_n.
c. Show that E(W) = n. [Hint: Use your answer to (a).]
d. Show that V = Y_1/√(Σ_{i=2}^{n} Y_i²/(n − 1)) is distributed t_{n−1}.

2.25 (Review of summation notation.) Let x_1, ..., x_n denote a sequence of numbers; y_1, ..., y_n denote another sequence of numbers; and a, b, and c denote three constants. Show that:
a. Σ_{i=1}^{n} a x_i = a Σ_{i=1}^{n} x_i
b. Σ_{i=1}^{n} (x_i + y_i) = Σ_{i=1}^{n} x_i + Σ_{i=1}^{n} y_i
c. Σ_{i=1}^{n} a = na
d. Σ_{i=1}^{n} (a + b x_i + c y_i)² = na² + b² Σ_{i=1}^{n} x_i² + c² Σ_{i=1}^{n} y_i² + 2ab Σ_{i=1}^{n} x_i + 2ac Σ_{i=1}^{n} y_i + 2bc Σ_{i=1}^{n} x_i y_i
2.26 Suppose that Y_1, Y_2, ..., Y_n are random variables with a common mean μ_Y, a common variance σ²_Y, and the same correlation ρ (so that the correlation between Y_i and Y_j is equal to ρ for all pairs i and j, where i ≠ j).
a. Show that cov(Y_i, Y_j) = ρσ²_Y for i ≠ j.
b. Suppose that n = 2. Show that E(Ȳ) = μ_Y and var(Ȳ) = ½σ²_Y + ½ρσ²_Y.
c. For n ≥ 2, show that E(Ȳ) = μ_Y and var(Ȳ) = σ²_Y/n + [(n − 1)/n]ρσ²_Y.
d. When n is very large, show that var(Ȳ) ≈ ρσ²_Y.

2.27 X and Z are two jointly distributed random variables. Suppose you know the value of Z, but not the value of X. Let X̂ = E(X | Z) denote a guess of the value of X using the information on Z, and let W = X − X̂ denote the error associated with this guess.
a. Show that E(W) = 0. (Hint: Use the law of iterated expectations.)
b. Show that E(WZ) = 0.
c. Let X̃ = g(Z) denote another guess of X using Z, and let V = X − X̃ denote its error. Show that E(V²) ≥ E(W²). [Hint: Let h(Z) = g(Z) − E(X | Z), so that V = [X − E(X | Z)] − h(Z). Derive E(V²).]

APPENDIX 2.1 Derivation of Results in Key Concept 2.3

This appendix derives the equations in Key Concept 2.3. Equation (2.29) follows from the definition of the expectation.

To derive Equation (2.30), use the definition of the variance to write var(a + bY) = E{[a + bY − E(a + bY)]²} = E{[b(Y − μ_Y)]²} = b²E[(Y − μ_Y)²] = b²σ²_Y.

To derive Equation (2.31), use the definition of the variance to write

var(aX + bY) = E{[(aX + bY) − (aμ_X + bμ_Y)]²}
  = E{[a(X − μ_X) + b(Y − μ_Y)]²}
  = E[a²(X − μ_X)²] + 2E[ab(X − μ_X)(Y − μ_Y)] + E[b²(Y − μ_Y)²]
  = a²var(X) + 2ab cov(X, Y) + b²var(Y)
  = a²σ²_X + 2abσ_XY + b²σ²_Y,   (2.49)

which is Equation (2.31).

To derive Equation (2.32), write E(Y²) = E{[(Y − μ_Y) + μ_Y]²} = E[(Y − μ_Y)²] + 2μ_Y E(Y − μ_Y) + μ²_Y = σ²_Y + μ²_Y because E(Y − μ_Y) = 0.

To derive Equation (2.33), use the definition of the covariance to write

cov(a + bX + cV, Y) = E{[a + bX + cV − E(a + bX + cV)][Y − μ_Y]}
  = E{[b(X − μ_X) + c(V − μ_V)][Y − μ_Y]}
  = E{[b(X − μ_X)][Y − μ_Y]} + E{[c(V − μ_V)][Y − μ_Y]}
  = bσ_XY + cσ_VY,   (2.50)

which is Equation (2.33).

To derive Equation (2.34), write

E(XY) = E{[(X − μ_X) + μ_X][(Y − μ_Y) + μ_Y]}
  = E[(X − μ_X)(Y − μ_Y)] + μ_X E(Y − μ_Y) + μ_Y E(X − μ_X) + μ_X μ_Y
  = σ_XY + μ_X μ_Y.

We now prove the correlation inequality in Equation (2.35); that is, |corr(X, Y)| ≤ 1. Let a = −σ_XY/σ²_X and b = 1. Applying Equation (2.31), we have that

var(aX + Y) = a²σ²_X + σ²_Y + 2aσ_XY
  = (−σ_XY/σ²_X)²σ²_X + σ²_Y + 2(−σ_XY/σ²_X)σ_XY
  = σ²_Y − σ²_XY/σ²_X.   (2.51)

Because var(aX + Y) is a variance, it cannot be negative, so from the final line of Equation (2.51) it must be that σ²_Y − σ²_XY/σ²_X ≥ 0. Rearranging this inequality yields

σ²_XY ≤ σ²_X σ²_Y (covariance inequality).   (2.52)

The covariance inequality implies that σ²_XY/(σ²_X σ²_Y) ≤ 1 or, equivalently, |σ_XY/(σ_X σ_Y)| ≤ 1, which (using the definition of the correlation) proves the correlation inequality |corr(X, Y)| ≤ 1.
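Equation (2.31) is an algebraic identity, so it also holds exactly for sample moments computed with a common denominator, which gives a quick numerical check. The sketch below uses arbitrary assumed constants (a = 2, b = −1) and a simulated correlated pair to verify var(aX + bY) = a²var(X) + 2ab·cov(X, Y) + b²var(Y):

```python
import random
import statistics

random.seed(3)

def cov(u, v):
    """Sample covariance with an (n - 1) denominator, matching statistics.variance."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

# Simulate X and Y with corr(X, Y) = 0.6, then check the identity
# var(aX + bY) = a^2 var(X) + 2ab cov(X, Y) + b^2 var(Y) on sample moments.
n = 10_000
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
x = z1
y = [0.6 * u + 0.8 * v for u, v in zip(z1, z2)]

a, b = 2.0, -1.0
combo = [a * xi + b * yi for xi, yi in zip(x, y)]
lhs = statistics.variance(combo)
rhs = a ** 2 * statistics.variance(x) + 2 * a * b * cov(x, y) + b ** 2 * statistics.variance(y)
print(abs(lhs - rhs) < 1e-9)  # identity holds up to floating-point rounding
```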
CHAPTER 3 Review of Statistics

Statistics is the science of using data to learn about the world around us. Statistical tools help us answer questions about unknown characteristics of distributions in populations of interest. For example, what is the mean of the distribution of earnings of recent college graduates? Do mean earnings differ for men and women, and, if so, by how much?

These questions relate to the distribution of earnings in the population of workers. One way to answer them would be to perform an exhaustive survey of the population of workers, measuring the earnings of each worker and thus finding the population distribution of earnings. In practice, however, such a comprehensive survey would be extremely expensive. The only comprehensive survey of the U.S. population is the decennial census. The 2000 U.S. Census cost $10 billion, and the 2010 Census could cost $15 billion or more. The process of designing the census forms, managing and conducting the surveys, and compiling and analyzing the data takes ten years. Despite this extraordinary commitment, many members of the population slip through the cracks and are not surveyed. Thus a different, more practical approach is needed.

The key insight of statistics is that one can learn about a population distribution by selecting a random sample from that population. Rather than survey the entire U.S. population, we might survey, say, 1000 members of the population, selected at random by simple random sampling. Using statistical methods, we can use this sample to reach tentative conclusions, that is, to draw statistical inferences, about characteristics of the full population.

Three types of statistical methods are used throughout econometrics: estimation, hypothesis testing, and confidence intervals. Estimation entails computing a "best guess" numerical value for an unknown characteristic of a population distribution, such as its mean, from a sample of data. Hypothesis testing entails formulating a specific hypothesis about the population, then using sample evidence to decide whether it is true. Confidence intervals use a set of data to estimate an interval or range for an unknown population characteristic. Sections 3.1, 3.2, and 3.3 review estimation, hypothesis testing, and confidence intervals in the context of statistical inference about an unknown population mean.
Most of the interesting questions in economics involve relationships between two or more variables or comparisons between different populations. For example, is there a gap between the mean earnings for male and female recent college graduates? In Section 3.4, the methods for learning about the mean of a single population in Sections 3.1 through 3.3 are extended to compare means in two different populations. Section 3.5 discusses how the methods for comparing the means of two populations can be used to estimate causal effects in experiments. Sections 3.2 through 3.5 focus on the use of the normal distribution for performing hypothesis tests and for constructing confidence intervals when the sample size is large; in some special circumstances, hypothesis tests and confidence intervals can be based on the Student t distribution instead of the normal distribution, and these special circumstances are discussed in Section 3.6. The chapter concludes with a discussion of the sample correlation and scatterplots in Section 3.7.

3.1 Estimation of the Population Mean

Suppose you want to know the mean value of Y (that is, μ_Y) in a population, such as the mean earnings of women recently graduated from college. A natural way to estimate this mean is to compute the sample average Ȳ from a sample of n independently and identically distributed (i.i.d.) observations Y_1, ..., Y_n (recall that Y_1, ..., Y_n are i.i.d. if they are collected by simple random sampling). This section discusses estimation of μ_Y and the properties of Ȳ as an estimator of μ_Y.

Estimators and Their Properties

Estimators. The sample average Ȳ is a natural way to estimate μ_Y, but it is not the only way. For example, another way to estimate μ_Y is simply to use the first observation, Y_1. Both Ȳ and Y_1 are functions of the data that are designed to estimate μ_Y; using the terminology in Key Concept 3.1, both are estimators of μ_Y. When evaluated in repeated samples, Ȳ and Y_1 take on different values (they produce different estimates) from one sample to the next, so both have sampling distributions. There are, in fact, many estimators of μ_Y, of which Ȳ and Y_1 are two examples.
Key Concept 3.1: Estimators and Estimates

An estimator is a function of a sample of data to be drawn randomly from a population. An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample. An estimator is a random variable because of randomness in selecting the sample, while an estimate is a nonrandom number.

There are many possible estimators, so what makes one estimator "better" than another? Because estimators are random variables, this question can be phrased more precisely: What are desirable characteristics of the sampling distribution of an estimator? In general, we would like an estimator that gets as close as possible to the unknown true value, at least in some average sense; in other words, we would like the sampling distribution of an estimator to be as tightly centered on the unknown value as possible. This observation leads to three specific desirable characteristics of an estimator: unbiasedness (a lack of bias), consistency, and efficiency.

Unbiasedness. Suppose you evaluate an estimator many times over repeated randomly drawn samples. It is reasonable to hope that, on average, you would get the right answer. Thus a desirable property of an estimator is that the mean of its sampling distribution equals μ_Y; if so, the estimator is said to be unbiased. To state this concept mathematically, let μ̂_Y denote some estimator of μ_Y, such as Ȳ or Y_1. The estimator μ̂_Y is unbiased if E(μ̂_Y) = μ_Y, where E(μ̂_Y) is the mean of the sampling distribution of μ̂_Y; otherwise, μ̂_Y is biased.

Consistency. Another desirable property of an estimator μ̂_Y is that, when the sample size is large, the uncertainty about the value of μ_Y arising from random variations in the sample is very small. Stated more precisely, a desirable property of μ̂_Y is that the probability that it is within a small interval of the true value μ_Y approaches 1 as the sample size increases; that is, μ̂_Y is consistent for μ_Y (Key Concept 2.6).

Variance and efficiency. Suppose you have two candidate estimators, μ̂_Y and μ̃_Y, both of which are unbiased. How might you choose between them? One way to do so is to choose the estimator with the tightest sampling distribution. This suggests choosing between μ̂_Y and μ̃_Y by picking the estimator with the smallest variance. If μ̂_Y has a smaller variance than μ̃_Y, then μ̂_Y is said to be more efficient than μ̃_Y. The terminology "efficiency" stems from the notion that if μ̂_Y has a smaller variance than μ̃_Y, then it uses the information in the data more efficiently than does μ̃_Y. Bias, consistency, and efficiency are summarized in Key Concept 3.2.

Key Concept 3.2: Bias, Consistency, and Efficiency

Let μ̂_Y be an estimator of μ_Y. Then:
- The bias of μ̂_Y is E(μ̂_Y) − μ_Y.
- μ̂_Y is an unbiased estimator of μ_Y if E(μ̂_Y) = μ_Y.
- μ̂_Y is a consistent estimator of μ_Y if μ̂_Y converges in probability to μ_Y.
- Let μ̃_Y be another estimator of μ_Y, and suppose that both μ̂_Y and μ̃_Y are unbiased. Then μ̂_Y is said to be more efficient than μ̃_Y if var(μ̂_Y) < var(μ̃_Y).

Properties of Ȳ

How does Ȳ fare as an estimator of μ_Y when judged by the three criteria of bias, consistency, and efficiency?

Bias and consistency. The sampling distribution of Ȳ has already been examined in Sections 2.5 and 2.6. As shown in Section 2.5, E(Ȳ) = μ_Y, so Ȳ is an unbiased estimator of μ_Y. Similarly, the law of large numbers (Key Concept 2.6) states that Ȳ converges in probability to μ_Y; that is, Ȳ is consistent.

Efficiency. What can be said about the efficiency of Ȳ? Because efficiency entails a comparison of estimators, we need to specify the estimator or estimators to which Ȳ is to be compared. We start by comparing the efficiency of Ȳ to the estimator Y_1. Because Y_1, ..., Y_n are i.i.d., the mean of the sampling distribution of Y_1 is E(Y_1) = μ_Y; thus Y_1 is an unbiased estimator of μ_Y. Its variance is var(Y_1) = σ²_Y. From Section 2.5, the variance of Ȳ is σ²_Y/n. Thus, for n ≥ 2, the variance of Ȳ is less than the variance of Y_1; that is, Ȳ is a more efficient estimator than Y_1, so, according to the criterion of efficiency, Ȳ should be used instead of Y_1. The estimator Y_1 might strike you as an obviously poor estimator (why would you go to the trouble of collecting a sample of n observations only to throw away all but the first?), and the concept of efficiency provides a formal way to show that Ȳ is a more desirable estimator than Y_1.
.1) where the number of observations 11is assumed to be even for convenience.
Landon Wins!

Shortly before the 1936 U.S. presidential election, the Literary Gazette published a poll indicating that Alf M. Landon would defeat the incumbent, Franklin D. Roosevelt, by a landslide: 57% to 43%. The Gazette was right that the election was a landslide, but it was wrong about the winner: Roosevelt won by 59% to 41%!

How could the Gazette have made such a big mistake? The Gazette's sample was chosen from telephone records and automobile registration files. But in 1936 many households did not have cars or telephones, and those that did tended to be richer and were also more likely to be Republican. Because the telephone survey did not sample randomly from the population but instead undersampled Democrats, the estimator was biased and the Gazette made an embarrassing mistake. Do you think surveys conducted over the Internet might have a similar problem with bias?

The estimator m that minimizes the sum of squared gaps Y_i − m in Expression (3.2) is called the least squares estimator. One can imagine using trial and error to solve the least squares problem: Try many values of m until you are satisfied that you have the value that makes Expression (3.2) as small as possible. Alternatively, as is done in Appendix 3.2, you can use algebra or calculus to show that choosing m = Ȳ minimizes the sum of squared gaps in Expression (3.2), so that Ȳ is the least squares estimator of μ_Y.
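The trial-and-error search over m can be sketched with a simple grid search; the toy data here are my own, chosen only to illustrate that the minimizer of Expression (3.2) is the sample average.

```python
# Sketch: verify numerically that m = Y-bar minimizes the sum of squared gaps
# sum_i (Y_i - m)^2 of Expression (3.2), by scanning a grid of trial values
# (the "trial and error" approach described in the text).
sample = [19.0, 22.5, 20.1, 24.3, 18.7, 21.9]   # toy data, not from the text
ybar = sum(sample) / len(sample)

def sum_sq_gaps(m):
    return sum((y - m) ** 2 for y in sample)

# grid of trial values of m spanning the sample range: 15.000, 15.001, ..., 24.999
grid = [15 + 0.001 * k for k in range(10000)]
best_m = min(grid, key=sum_sq_gaps)
print(best_m, ybar)   # best_m lands (essentially) on the sample average
```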
The Importance of Random Sampling

We have assumed that Y₁, …, Yₙ are i.i.d. draws, such as those that would be obtained from simple random sampling. This assumption is important because nonrandom sampling can result in Ȳ being biased. Suppose that, to estimate the monthly national unemployment rate, a statistical agency adopts a sampling scheme in which interviewers survey working-age adults sitting in city parks at 10:00 A.M. on the second Wednesday of the month. Because most employed people are at work at that hour (not sitting in the park!), the unemployed are overly represented in the sample, and an estimate of the unemployment rate based on this sampling plan would be biased. This bias arises because this sampling scheme overrepresents, or oversamples, the unemployed members of the population. This example is fictitious, but the "Landon Wins!" box gives a real-world example of biases introduced by sampling that is not entirely random.
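The park-survey bias can be illustrated with a small simulation; all numbers below (the 6% unemployment rate and the 10-to-1 oversampling of the unemployed) are made up for illustration and are not from the text.

```python
import random

# Simulation sketch: nonrandom sampling that oversamples one part of the
# population biases Y-bar. Suppose 6% of working-age adults are unemployed
# (Y = 1), but the park-survey scheme is 10 times as likely to pick up an
# unemployed person as an employed one.
random.seed(3)
p_unemployed, reps, n = 0.06, 2000, 100

def biased_sample(n):
    # draw people until n are accepted: unemployed accepted with probability
    # 1.0, employed with probability 0.1 (the park-survey distortion)
    out = []
    while len(out) < n:
        y = 1 if random.random() < p_unemployed else 0
        if y == 1 or random.random() < 0.1:
            out.append(y)
    return out

ybars = [sum(biased_sample(n)) / n for _ in range(reps)]
print(sum(ybars) / reps)   # far above the true rate of 0.06
```

Under this scheme the expected estimate is 0.06/(0.06 + 0.94 × 0.1) ≈ 0.39, more than six times the true unemployment rate, which is exactly the kind of bias the text warns about.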
It is important to design sample selection schemes in a way that minimizes bias. Appendix 3.1 includes a discussion of what the Bureau of Labor Statistics actually does when it conducts the U.S. Current Population Survey (CPS), the survey it uses to estimate the monthly U.S. unemployment rate.

3.2 Hypothesis Tests Concerning the Population Mean

Many hypotheses about the world around us can be phrased as yes/no questions. Do the mean hourly earnings of recent U.S. college graduates equal $20 per hour? Are mean earnings the same for male and female college graduates? Both these questions embody specific hypotheses about the population distribution of earnings. The statistical challenge is to answer these questions based on a sample of evidence. This section describes hypothesis tests concerning the population mean (Does the population mean of hourly earnings equal $20?). Hypothesis tests involving two populations (Are mean earnings the same for men and women?) are taken up in Section 3.4.

Null and Alternative Hypotheses

The starting point of statistical hypothesis testing is specifying the hypothesis to be tested, called the null hypothesis. Hypothesis testing entails using data to compare the null hypothesis to a second hypothesis, called the alternative hypothesis, that holds if the null does not.

The null hypothesis is that the population mean, E(Y), takes on a specific value, denoted μ_{Y,0}. The null hypothesis is denoted H₀ and thus is

H₀: E(Y) = μ_{Y,0}.   (3.3)

For example, the conjecture that, on average in the population, college graduates earn $20 per hour constitutes a null hypothesis about the population distribution of hourly earnings. Stated mathematically, if Y is the hourly earning of a randomly selected recent college graduate, then the null hypothesis is that E(Y) = 20; that is, μ_{Y,0} = 20 in Equation (3.3).

The alternative hypothesis specifies what is true if the null hypothesis is not. The most general alternative hypothesis is that E(Y) ≠ μ_{Y,0}, which is called a two-sided alternative hypothesis because it allows E(Y) to be either less than or greater than μ_{Y,0}. The two-sided alternative is written as

H₁: E(Y) ≠ μ_{Y,0}   (two-sided alternative).   (3.4)
One-sided alternatives are also possible, and these are discussed later in this section.

The problem facing the statistician is to use the evidence in a randomly selected sample of data to decide whether to accept the null hypothesis H₀ or to reject it in favor of the alternative hypothesis H₁. If the null hypothesis is "accepted," this does not mean that the statistician declares it to be true; rather, it is accepted tentatively with the recognition that it might be rejected later based on additional evidence. For this reason, statistical hypothesis testing can be posed as either rejecting the null hypothesis or failing to do so.

The p-Value

In any given sample, the sample average Ȳ will rarely be exactly equal to the hypothesized value μ_{Y,0}. Differences between Ȳ and μ_{Y,0} can arise because the true mean in fact does not equal μ_{Y,0} (the null hypothesis is false) or because the true mean equals μ_{Y,0} (the null hypothesis is true) but Ȳ differs from μ_{Y,0} because of random sampling. It is impossible to distinguish between these two possibilities with certainty. Although a sample of data cannot provide conclusive evidence about the null hypothesis, it is possible to do a probabilistic calculation that permits testing the null hypothesis in a way that accounts for sampling uncertainty. This calculation involves using the data to compute the p-value of the null hypothesis.

The p-value, also called the significance probability, is the probability of drawing a statistic at least as adverse to the null hypothesis as the one you actually computed in your sample, assuming the null hypothesis is correct. In the case at hand, the p-value is the probability of drawing Ȳ at least as far in the tails of its distribution under the null hypothesis as the sample average you actually computed.

For example, suppose that, in your sample of recent college graduates, the average wage is $22.64. The p-value is the probability of observing a value of Ȳ at least as different from $20 (the population mean under the null) as the observed value of $22.64, assuming that the null hypothesis is true. If this p-value is small, say 0.5%, then it is very unlikely that this sample would have been drawn if the null hypothesis is true; thus it is reasonable to conclude that the null hypothesis is not true. By contrast, if this p-value is large, say 40%, then it is quite likely that the observed sample average of $22.64 could have arisen just by random sampling variation if the null hypothesis is true; accordingly, the evidence against the null hypothesis is weak in this probabilistic sense, and it is reasonable not to reject the null hypothesis.

To state the definition of the p-value mathematically, let Ȳ^act denote the value of the sample average actually computed in the data set at hand, and let Pr_{H₀}
denote the probability computed under the null hypothesis (that is, computed assuming that E(Y_i) = μ_{Y,0}). The p-value is

p-value = Pr_{H₀}[ |Ȳ − μ_{Y,0}| > |Ȳ^act − μ_{Y,0}| ].   (3.5)

That is, the p-value is the probability of obtaining a sample average Ȳ farther from μ_{Y,0} than Ȳ^act under the null hypothesis or, equivalently, the area in the tails of the distribution of Ȳ under the null hypothesis beyond |Ȳ^act − μ_{Y,0}|.

To compute the p-value, it is necessary to know the sampling distribution of Ȳ under the null hypothesis. As discussed in Section 2.6, when the sample size is small this distribution is complicated. However, according to the central limit theorem, when the sample size is large the sampling distribution of Ȳ is well approximated by a normal distribution. Under the null hypothesis the mean of this normal distribution is μ_{Y,0}, so under the null hypothesis Ȳ is distributed N(μ_{Y,0}, σ²_Ȳ), where σ²_Ȳ = σ²_Y/n. This large-sample normal approximation makes it possible to compute the p-value without needing to know the population distribution of Y, as long as the sample size is large. The details of the calculation, however, depend on whether σ²_Y is known.

Calculating the p-Value When σ_Y Is Known

The calculation of the p-value when σ_Y is known is summarized in Figure 3.1. If the sample size is large, then under the null hypothesis the sampling distribution of Ȳ is N(μ_{Y,0}, σ²_Ȳ), where σ²_Ȳ = σ²_Y/n. Thus, under the null hypothesis, the standardized version of Ȳ, (Ȳ − μ_{Y,0})/σ_Ȳ, has a standard normal distribution. The p-value is the probability of obtaining a value of Ȳ farther from μ_{Y,0} than Ȳ^act or, equivalently, of obtaining (Ȳ − μ_{Y,0})/σ_Ȳ greater than (Ȳ^act − μ_{Y,0})/σ_Ȳ in absolute value. That is, the p-value is the area in the tails of a standard normal distribution outside ±(Ȳ^act − μ_{Y,0})/σ_Ȳ. Written mathematically,

p-value = Pr_{H₀}( |(Ȳ − μ_{Y,0})/σ_Ȳ| > |(Ȳ^act − μ_{Y,0})/σ_Ȳ| ) = 2Φ( −|Ȳ^act − μ_{Y,0}|/σ_Ȳ ),   (3.6)

where Φ is the standard normal cumulative distribution function. That is, the p-value is the shaded tail probability in Figure 3.1. If the p-value is large, then the observed value Ȳ^act is consistent with the null hypothesis, but if the p-value is small, it is not.

FIGURE 3.1  Calculating a p-value. The p-value is the probability of drawing a value of Ȳ that differs from μ_{Y,0} by at least as much as Ȳ^act. In large samples, Ȳ is distributed N(μ_{Y,0}, σ²_Ȳ) under the null hypothesis, so (Ȳ − μ_{Y,0})/σ_Ȳ is distributed N(0, 1), and the p-value is the shaded standard normal tail probability outside ±(Ȳ^act − μ_{Y,0})/σ_Ȳ.

The formula for the p-value in Equation (3.6) depends on the variance of the population distribution, σ²_Y. In practice, this variance is typically unknown. [An exception is when Y_i is binary, so that its distribution is Bernoulli, in which case the variance is determined by the null hypothesis; see Equation (2.7) and Exercise 3.2.]
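As a sketch, the tail probability in Equation (3.6) can be computed directly, using the fact that the standard normal CDF Φ can be written in terms of the error function. The numbers plugged in below reuse the wage example that appears later in this section, treating the standard deviation as if it were the known population value purely for illustration.

```python
import math

def phi(x):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_value_known_sigma(ybar_act, mu_0, sigma_Y, n):
    """Two-sided p-value from Equation (3.6), valid when sigma_Y is known."""
    sigma_ybar = sigma_Y / math.sqrt(n)     # sd of the sampling distribution of Y-bar
    z = (ybar_act - mu_0) / sigma_ybar      # standardized sample average
    return 2.0 * phi(-abs(z))               # area in both tails

# illustration: Y-bar_act = 22.64, H0: mu = 20, sigma_Y taken as 18.14, n = 200
print(p_value_known_sigma(ybar_act=22.64, mu_0=20.0, sigma_Y=18.14, n=200))
```

With these inputs the standardized average is about 2.06, so the p-value comes out near 0.04, matching the worked example later in the section.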
Because in general σ²_Y must be estimated before the p-value can be computed, we now turn to the problem of estimating σ²_Y.

The Sample Variance, Sample Standard Deviation, and Standard Error

The sample variance s²_Y is an estimator of the population variance σ²_Y, the sample standard deviation s_Y is an estimator of the population standard deviation σ_Y, and the standard error of the sample average Ȳ is an estimator of the standard deviation of the sampling distribution of Ȳ.

The sample variance and standard deviation. The sample variance, s²_Y, is

s²_Y = [1/(n − 1)] Σ_{i=1}^n (Y_i − Ȳ)².   (3.7)

The sample standard deviation, s_Y, is the square root of the sample variance.

The formula for the sample variance is much like the formula for the population variance. The population variance, E(Y − μ_Y)², is the average value of (Y − μ_Y)² in the population distribution. Similarly, the sample variance is the sample average of (Y_i − μ_Y)², i = 1, …, n, with two modifications: First, μ_Y is replaced by Ȳ, and second, the average uses the divisor n − 1 instead of n.
The reason for the first modification (replacing μ_Y by Ȳ) is that μ_Y is unknown and thus must be estimated; the natural estimator of μ_Y is Ȳ. The reason for the second modification (dividing by n − 1 instead of by n) is that estimating μ_Y by Ȳ introduces a small downward bias in (Y_i − Ȳ)². Specifically, as is shown in Exercise 3.18, E[(Y_i − Ȳ)²] = [(n − 1)/n]σ²_Y. Thus E[Σ_{i=1}^n (Y_i − Ȳ)²] = (n − 1)σ²_Y. Dividing by n − 1 in Equation (3.7) instead of n corrects for this small downward bias, and as a result s²_Y is unbiased.

Dividing by n − 1 in Equation (3.7) instead of n is called a degrees of freedom correction: Estimating the mean uses up some of the information (that is, uses up one "degree of freedom") in the data, so that only n − 1 degrees of freedom remain.

Consistency of the sample variance. The sample variance is a consistent estimator of the population variance:

s²_Y →p σ²_Y.   (3.9)

In other words, the sample variance is close to the population variance with high probability when n is large. The result in Equation (3.9) is proven in Appendix 3.3 under the assumptions that Y₁, …, Yₙ are i.i.d. and Y_i has a finite fourth moment; that is, E(Y_i⁴) < ∞. Intuitively, the reason that s²_Y is consistent is that it is a sample average, so s²_Y obeys the law of large numbers. But for s²_Y to obey the law of large numbers in Key Concept 2.6, (Y_i − μ_Y)² must have finite variance, which in turn means that E(Y_i⁴) must be finite; in other words, Y_i must have a finite fourth moment.

The standard error of Ȳ. Because the standard deviation of the sampling distribution of Ȳ is σ_Ȳ = σ_Y/√n, Equation (3.9) justifies using s_Y/√n as an estimator of σ_Ȳ. The estimator of σ_Ȳ, s_Y/√n, is called the standard error of Ȳ and is denoted SE(Ȳ) or σ̂_Ȳ (the caret "^" over the symbol means that it is an estimator of σ_Ȳ). The standard error of Ȳ is summarized in Key Concept 3.4.

KEY CONCEPT 3.4
The Standard Error of Ȳ

The standard error of Ȳ is an estimator of the standard deviation of Ȳ. The standard error of Ȳ is denoted SE(Ȳ) or σ̂_Ȳ. When Y₁, …, Yₙ are i.i.d.,

SE(Ȳ) = σ̂_Ȳ = s_Y/√n.   (3.8)
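Equations (3.7) and (3.8) can be sketched directly; the toy data below are my own. The cross-check against Python's `statistics.variance` works because that function also uses the n − 1 divisor.

```python
import math
import statistics

# Sketch of Equations (3.7) and (3.8): the sample variance uses the divisor
# n - 1 (the degrees-of-freedom correction), and SE(Y-bar) = s_Y / sqrt(n).
def sample_variance(ys):
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)   # Equation (3.7)

def standard_error(ys):
    return math.sqrt(sample_variance(ys)) / math.sqrt(len(ys))   # Equation (3.8)

data = [4.0, 7.0, 13.0, 16.0]   # toy data, not from the text
print(sample_variance(data), statistics.variance(data), standard_error(data))
```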
Calculating the p-Value When σ_Y Is Unknown

Because s²_Y is a consistent estimator of σ²_Y, the p-value can be computed by replacing σ_Ȳ in Equation (3.6) by the standard error, SE(Ȳ) = σ̂_Ȳ. That is, when σ_Y is unknown and Y₁, …, Yₙ are i.i.d., the p-value is calculated using the formula

p-value = 2Φ( −|Ȳ^act − μ_{Y,0}| / SE(Ȳ) ).   (3.10)

When Y₁, …, Yₙ are i.i.d. draws from a Bernoulli distribution with success probability p, the formula for the variance of Ȳ simplifies to p(1 − p)/n (see Exercise 3.2). The formula for the standard error also takes on a simple form that depends only on Ȳ and n: SE(Ȳ) = √(Ȳ(1 − Ȳ)/n).

The t-Statistic

The standardized sample average (Ȳ − μ_{Y,0})/SE(Ȳ) plays a central role in testing statistical hypotheses and has a special name, the t-statistic or t-ratio:

t = (Ȳ − μ_{Y,0}) / SE(Ȳ).   (3.11)

In general, a test statistic is a statistic used to perform a hypothesis test. The t-statistic is an important example of a test statistic.

Large-sample distribution of the t-statistic. When n is large, s²_Y is close to σ²_Y with high probability. Thus the distribution of the t-statistic is approximately the same as the distribution of (Ȳ − μ_{Y,0})/σ_Ȳ, which in turn is well approximated by the standard normal distribution when n is large because of the central limit theorem (Key Concept 2.7). Accordingly, under the null hypothesis,

t is approximately distributed N(0, 1) for large n.   (3.12)

The formula for the p-value in Equation (3.10) can be rewritten in terms of the t-statistic. Let t^act denote the value of the t-statistic actually computed:

t^act = (Ȳ^act − μ_{Y,0}) / SE(Ȳ).   (3.13)
Accordingly, when n is large, the p-value can be calculated using

p-value = 2Φ( −|t^act| ).   (3.14)

As a hypothetical example, suppose that a sample of n = 200 recent college graduates is used to test the null hypothesis that the mean wage, E(Y), is $20 per hour. The sample average wage is Ȳ^act = $22.64, and the sample standard deviation is s_Y = $18.14. Then the standard error of Ȳ is s_Y/√n = 18.14/√200 = 1.28. The value of the t-statistic is t^act = (22.64 − 20)/1.28 = 2.06. From Appendix Table 1, the p-value is 2Φ(−2.06) = 0.039, or 3.9%. That is, assuming the null hypothesis to be true, the probability of obtaining a sample average at least as different from the null as the one actually computed is 3.9%.

Hypothesis Testing with a Prespecified Significance Level

When you undertake a statistical hypothesis test, you can make two types of mistakes: You can incorrectly reject the null hypothesis when it is true, or you can fail to reject the null hypothesis when it is false. Hypothesis tests can be performed without computing the p-value if you are willing to specify in advance the probability you are willing to tolerate of making the first kind of mistake, that is, of incorrectly rejecting the null hypothesis when it is true. If you choose a prespecified probability of rejecting the null hypothesis when it is true (for example, 5%), then you will reject the null hypothesis if and only if the p-value is less than 0.05. This approach gives preferential treatment to the null hypothesis, but in many practical situations this preferential treatment is appropriate.

Hypothesis tests using a fixed significance level. Suppose it has been decided that the hypothesis will be rejected if the p-value is less than 5%. Because the area under the tails of the normal distribution outside ±1.96 is 5%, this gives a simple rule:

Reject H₀ if |t^act| > 1.96.   (3.15)

That is, reject if the absolute value of the t-statistic computed from the sample is greater than 1.96. If n is large enough, then under the null hypothesis the t-statistic has a N(0, 1) distribution. Thus the probability of erroneously rejecting the null hypothesis (rejecting the null hypothesis when it is in fact true) is 5%.
This framework for testing statistical hypotheses has some specialized terminology, summarized in Key Concept 3.5. The significance level of the test in Equation (3.15) is 5%, the critical value of this two-sided test is 1.96, and the rejection region is the values of the t-statistic outside ±1.96. If the test rejects at the 5% significance level, the population mean μ_Y is said to be statistically significantly different from μ_{Y,0} at the 5% significance level.

Testing hypotheses using a prespecified significance level does not require computing p-values. In the previous example of testing the hypothesis that the mean earnings of recent college graduates is $20 per hour, the t-statistic was 2.06. This value exceeds 1.96, so the hypothesis is rejected at the 5% level. Although performing the test with a 5% significance level is easy, reporting only whether the null hypothesis is rejected at a prespecified significance level conveys less information than reporting the p-value.

KEY CONCEPT 3.5
The Terminology of Hypothesis Testing

A statistical hypothesis test can make two types of mistakes: a type I error, in which the null hypothesis is rejected when in fact it is true, and a type II error, in which the null hypothesis is not rejected when in fact it is false. The prespecified rejection probability of a statistical hypothesis test when the null hypothesis is true (that is, the prespecified probability of a type I error) is the significance level of the test. The critical value of the test statistic is the value of the statistic for which the test just rejects the null hypothesis at the given significance level. The set of values of the test statistic for which the test rejects the null hypothesis is the rejection region, and the set of values of the test statistic for which it does not reject the null hypothesis is the acceptance region. The probability that the test actually incorrectly rejects the null hypothesis when it is true is the size of the test, and the probability that the test correctly rejects the null hypothesis when the alternative is true is the power of the test.

The p-value is the probability of obtaining a test statistic, by random sampling variation, at least as adverse to the null hypothesis value as is the statistic actually observed, assuming that the null hypothesis is correct. Equivalently, the p-value is the smallest significance level at which you can reject the null hypothesis.

What significance level should you use in practice? In many cases, statisticians and econometricians use a 5% significance level. If you were to test many statistical
hypotheses at the 5% level, you would incorrectly reject the null on average once in 20 cases. Sometimes a more conservative significance level might be in order. For example, legal cases sometimes involve statistical evidence, and the null hypothesis could be that the defendant is not guilty; then one would want to be quite sure that a rejection of the null (conclusion of guilt) is not just a result of random sample variation. In some legal settings, the significance level used is 1% or even 0.1%, to avoid this sort of mistake. Similarly, if a government agency is considering permitting the sale of a new drug, a very conservative standard might be in order so that consumers can be sure that the drugs available in the market actually work.

Being conservative, in the sense of using a very low significance level, has a cost: The smaller the significance level, the larger the critical value and the more difficult it becomes to reject the null when the null is false. In fact, the most conservative thing to do is never to reject the null hypothesis, but if that is your view, then you never need to look at any statistical evidence, for you will never change your mind! The lower the significance level, the lower the power of the test. Many economic and policy applications can call for less conservatism than a legal case, so a 5% significance level is often considered to be a reasonable compromise.

Key Concept 3.6 summarizes hypothesis tests for the population mean against the two-sided alternative.

KEY CONCEPT 3.6
Testing the Hypothesis E(Y) = μ_{Y,0} Against the Alternative E(Y) ≠ μ_{Y,0}

1. Compute the standard error of Ȳ, SE(Ȳ) [Equation (3.8)].
2. Compute the t-statistic [Equation (3.13)].
3. Compute the p-value [Equation (3.14)].

Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 (equivalently, if |t^act| > 1.96).

One-Sided Alternatives

In some circumstances, the alternative hypothesis might be that the mean exceeds μ_{Y,0}. For example, one hopes that education helps in the labor market, so the relevant alternative to the null hypothesis that earnings are the same for college graduates and noncollege graduates is not just that their earnings differ, but rather
that graduates earn more than nongraduates. This is called a one-sided alternative hypothesis and can be written

H₁: E(Y) > μ_{Y,0}   (one-sided alternative).   (3.16)

The general approach to computing p-values and to hypothesis testing is the same for one-sided alternatives as it is for two-sided alternatives, with the modification that only large positive values of the t-statistic reject the null hypothesis, rather than values that are large in absolute value. Specifically, to test the one-sided hypothesis in Equation (3.16), construct the t-statistic in Equation (3.13). The p-value is the area under the standard normal distribution to the right of the calculated t-statistic. That is, the p-value, based on the N(0, 1) approximation to the distribution of the t-statistic, is

p-value = Pr_{H₀}(Z > t^act) = 1 − Φ(t^act).   (3.17)

The N(0, 1) critical value for a one-sided test with a 5% significance level is 1.64. The rejection region for this test is all values of the t-statistic exceeding 1.64.

If instead the alternative hypothesis is that E(Y) < μ_{Y,0}, then the discussion of the previous paragraph applies except that the signs are switched; for example, the 5% rejection region consists of values of the t-statistic less than −1.64.

3.3 Confidence Intervals for the Population Mean

Because of random sampling error, it is impossible to learn the exact value of the population mean of Y using only the information in a sample. However, it is possible to use data from a random sample to construct a set of values that contains the true population mean μ_Y with a certain prespecified probability. Such a set is called a confidence set, and the prespecified probability that μ_Y is contained in this set is called the confidence level. The confidence set for μ_Y turns out to be all the possible values of the mean between a lower and an upper limit, so that the confidence set is an interval, called a confidence interval.

Here is one way to construct a 95% confidence set for the population mean. Begin by picking some arbitrary value for the mean; call it μ_{Y,0}. Test the null hypothesis that μ_Y = μ_{Y,0} against the alternative that μ_Y ≠ μ_{Y,0} by computing the t-statistic; if it is less than 1.96 in absolute value, this hypothesized value μ_{Y,0} is not rejected at the 5% level, so write down this nonrejected value μ_{Y,0}. Now pick another arbitrary
value of μ_{Y,0} and test it; if you cannot reject it, write this value down on your list. Do this again and again; indeed, do so for all possible values of the population mean. Continuing this process yields the set of all values of the population mean that cannot be rejected at the 5% level by a two-sided hypothesis test.

This list is useful because it summarizes the set of hypotheses you can and cannot reject (at the 5% level) based on your data: If someone walks up to you with a specific number in mind, you can tell him whether his hypothesis is rejected or not simply by looking up his number on your handy list. A bit of clever reasoning shows that this set of values has a remarkable property: The probability that it contains the true value of the population mean is 95%.

The clever reasoning goes like this. Suppose the true value of μ_Y is 21.5 (although we do not know this). Then Ȳ has a normal distribution centered on 21.5, and the t-statistic testing the null hypothesis μ_Y = 21.5 has a N(0, 1) distribution. Thus, if n is large, the probability of rejecting the null hypothesis μ_Y = 21.5 at the 5% level is 5%. But because you tested all possible values of the population mean in constructing your set, in particular you tested the true value, μ_Y = 21.5. In 95% of all samples, you will correctly accept 21.5; that is, in 95% of all samples, your list will contain the true value of μ_Y. Thus the values on your list constitute a 95% confidence set for μ_Y.

This method of constructing a confidence set is impractical, for it requires you to test all possible values of μ_Y as null hypotheses. Fortunately, there is a much easier approach. According to the formula for the t-statistic in Equation (3.13), a trial value of μ_{Y,0} is rejected at the 5% level if it is more than 1.96 standard errors away from Ȳ. Thus the set of values of μ_Y that are not rejected at the 5% level consists of those values within ±1.96SE(Ȳ) of Ȳ; that is, a 95% confidence interval for μ_Y is Ȳ − 1.96SE(Ȳ) ≤ μ_Y ≤ Ȳ + 1.96SE(Ȳ). Key Concept 3.7 summarizes this approach.

KEY CONCEPT 3.7
Confidence Intervals for the Population Mean

A 95% two-sided confidence interval for μ_Y is an interval constructed so that it contains the true value of μ_Y in 95% of all possible random samples. When the sample size n is large, 90%, 95%, and 99% confidence intervals for μ_Y are:

90% confidence interval for μ_Y = {Ȳ ± 1.64SE(Ȳ)},
95% confidence interval for μ_Y = {Ȳ ± 1.96SE(Ȳ)},
99% confidence interval for μ_Y = {Ȳ ± 2.58SE(Ȳ)}.
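A sketch of Key Concept 3.7's 95% interval, together with a small simulation of the coverage reasoning described above; the simulation parameters (a normal population with mean 21.5) are my own illustration.

```python
import math
import random

def confidence_interval(ybar, se, z=1.96):
    """Large-sample interval Y-bar +/- z*SE(Y-bar), per Key Concept 3.7."""
    return (ybar - z * se, ybar + z * se)

# the text's wage example: Y-bar = 22.64, SE(Y-bar) = 1.28
ci_example = confidence_interval(22.64, 1.28)
print(ci_example)

# Simulation sketch of the coverage property: across repeated samples, about
# 95% of the intervals Y-bar +/- 1.96*SE(Y-bar) contain the true mean mu_Y.
random.seed(0)
mu_Y, sigma_Y, n, reps = 21.5, 10.0, 100, 5000

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu_Y, sigma_Y) for _ in range(n)]
    ybar = sum(sample) / n
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)   # sample variance, Eq. (3.7)
    se = math.sqrt(s2 / n)                                # SE(Y-bar) = s_Y / sqrt(n)
    lo, hi = confidence_interval(ybar, se)
    if lo <= mu_Y <= hi:
        covered += 1

cov = covered / reps
print(cov)
```

The simulated coverage should land close to 0.95 (slightly below, since n = 100 is large but finite and the normal critical value is used in place of an exact small-sample one).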
As an example, consider the problem of constructing a 95% confidence interval for the mean hourly earnings of recent college graduates using a hypothetical random sample of 200 recent college graduates where Ȳ = $22.64 and SE(Ȳ) = 1.28. The 95% confidence interval for mean hourly earnings is 22.64 ± 1.96 × 1.28 = 22.64 ± 2.51 = [$20.13, $25.15].

This discussion so far has focused on two-sided confidence intervals. One could instead construct a one-sided confidence interval as the set of values of μ_Y that cannot be rejected by a one-sided hypothesis test. Although one-sided confidence intervals have applications in some branches of statistics, they are uncommon in applied econometric analysis.

Coverage probabilities. The coverage probability of a confidence interval for the population mean is the probability, computed over all possible random samples, that it contains the true population mean.

3.4 Comparing Means from Different Populations

Do recent male and female college graduates earn the same amount on average? This question involves comparing the means of two different population distributions. This section summarizes how to test hypotheses and how to construct confidence intervals for the difference in the means from two different populations.

Hypothesis Tests for the Difference Between Two Means

To illustrate a test for the difference between two means, let μ_w be the mean hourly earning in the population of women recently graduated from college, and let μ_m be the population mean for recently graduated men. Consider the null hypothesis that mean earnings for these two populations differ by a certain amount, say d₀.
is distributed can be = do· In N[Jio". . then this approximate normal distribution fJw used to compute pvalues for the test of the null hypothesis that fJ". practice.82 CHAPTER 3 Reviewof Statistics Because these population means are unknown./I1". s.. is the population ance of earnings for men.Jio". however./ and (T~. Thus the 1.4 that a weigbted average of two normal random variables is itself normally distributed."1. the p·value of the tWOSIde d . according to the central distributed limit theovari distributed eI.Y.. 1.. they are independent random variables.. III ·IV and dividing t=("1./ nm). where s. .) = (3... and 1. where eI'. .Jio". by subtracting Y. . except that the statistic is is computed only for the men in the sample.. Y./1  ~v 5£(1. these population variances are typically unknown be estimated. (320) If both fIlii and nw are large. (J~ t:v is approximately constructed N( Mill! a." are from different randomly Thus elected samples. The rstatistic for testing the null hypothesis is constructed analog variable.y ..y". two means). = do using Y. is defined similarly for the women. Suppose we have samples of be"'Y. then this rstatistic has a standard normal distribution... we need to know "1.:. As before." .:/I1".19) when Y is a Bernoulli random see Exercise 3. If (J"I. Because the rstatistic in Equation (3. "1. from the estimator the result by the standard error of YIII .. Let the sample average annual yoIV for women. and s.)]..). is the population variance of earnings for women. .~ is defined standard error of as in Equation (3. usly to the Istatistic for testing a hypothesis about a single population the null hypothesized value of 1'". Also. 11/ 5£( Y _ V) lw (rstatistic for comparing .) + (eI.Y. they must be estimated 11.Jio"" (eI. for men and III the distribution of rem.. approximately where men and nw women earnings ~I" ~Il  drawn at random from their populations..15...19) For a simplified version of Equation (3. 11" mean.Yw)dO ..7/I1"." ." 
is...20) has a standard normal distribution under the null hypothesis whe n fIlii an d· Ilw ale large.. recall from Sec tion 2. Then an estimator o[!LI/I ""'1 is Recall that N(Jio"" To test the null hypothesis that Jio". Similarly. are known." . Because Y. they can be estimated so they must using the sample variances.. . and s.7)./11 from samples of men and women.
Because the t-statistic in Equation (3.20) has a standard normal distribution under the null hypothesis when nm and nw are large, the p-value of the two-sided test is computed exactly as it was in the case of a single population. That is, the null hypothesis is rejected at the 5% significance level if the absolute value of the t-statistic exceeds 1.96. If the alternative is one-sided rather than two-sided (that is, if the alternative is that μm − μw > d0), then the test is modified as outlined in Section 3.2: The p-value is computed using Equation (3.14), and a test with a 5% significance level rejects when t > 1.64.

Confidence Intervals for the Difference Between Two Population Means

The method for constructing confidence intervals summarized in Section 3.3 extends to constructing a confidence interval for the difference between the means, d = μm − μw. Because the hypothesized value d0 is rejected at the 5% level if |t| > 1.96, d0 will be in the confidence set if |t| ≤ 1.96. But |t| ≤ 1.96 means that the estimated difference, Y̅m − Y̅w, is less than 1.96 standard errors away from d0. Thus the 95% two-sided confidence interval for d consists of those values of d within ±1.96 standard errors of Y̅m − Y̅w:

95% confidence interval for d = μm − μw is (Y̅m − Y̅w) ± 1.96SE(Y̅m − Y̅w). (3.21)

With these formulas in hand, the box "The Gender Gap of Earnings of College Graduates in the United States" contains an empirical investigation of gender differences in earnings of U.S. college graduates.

3.5 Differences-of-Means Estimation of Causal Effects Using Experimental Data

Recall from Section 1.2 that a randomized controlled experiment randomly selects subjects (individuals or, more generally, entities) from a population of interest, then randomly assigns them either to a treatment group, which receives the experimental treatment, or to a control group, which does not receive the treatment. The difference between the sample means of the treatment and control groups is an estimator of the causal effect of the treatment.
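As a numerical check of Equations (3.19) and (3.21), the 2008 summary statistics reported in the box "The Gender Gap of Earnings of College Graduates in the United States" can be plugged in directly (a sketch; small differences from the printed interval reflect rounding):

```python
from math import sqrt

# 2008 summary statistics for college graduates aged 25-34 (from Table 3.1)
m_mean, m_sd, m_n = 24.98, 11.78, 1838  # men
w_mean, w_sd, w_n = 20.87, 9.66, 1871   # women

gap = m_mean - w_mean                     # estimated difference in means
se = sqrt(m_sd**2 / m_n + w_sd**2 / w_n)  # Equation (3.19)
ci = (gap - 1.96 * se, gap + 1.96 * se)   # Equation (3.21)

print(f"gap = {gap:.2f}, SE = {se:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```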
The Causal Effect as a Difference of Conditional Expectations

The causal effect of a treatment is the expected effect on the outcome of interest of the treatment as measured in an ideal randomized controlled experiment. This effect can be expressed as the difference of two conditional expectations. Specifically, the causal effect on Y of treatment level x is the difference in the conditional expectations, E(Y|X = x) − E(Y|X = 0), where E(Y|X = x) is the expected value of Y for the treatment group (which receives treatment level X = x) in an ideal randomized controlled experiment and E(Y|X = 0) is the expected value of Y for the control group (which receives treatment level X = 0). In the context of experiments, the causal effect is also called the treatment effect. If there are only two treatment levels (that is, if the treatment is binary), then we can let X = 0 denote the control group and X = 1 denote the treatment group. If the treatment is binary, then the causal effect (that is, the treatment effect) is E(Y|X = 1) − E(Y|X = 0) in an ideal randomized controlled experiment.

Estimation of the Causal Effect Using Differences of Means

If the treatment in a randomized controlled experiment is binary, then the causal effect can be estimated by the difference in the sample average outcomes between the treatment and control groups. The hypothesis that the treatment is ineffective is equivalent to the hypothesis that the two means are the same, which can be tested using the t-statistic for comparing two means, given in Equation (3.20). A 95% confidence interval for the difference in the means of the two groups is a 95% confidence interval for the causal effect, so a 95% confidence interval for the causal effect can be constructed using Equation (3.21).

A well-designed, well-run experiment can provide a compelling estimate of a causal effect. For this reason, randomized controlled experiments are commonly conducted in some fields, such as medicine. In economics, however, experiments tend to be expensive, difficult to administer, and, in some cases, ethically questionable, so they remain rare. For this reason, econometricians sometimes study "natural experiments," also called quasi-experiments, in which some event unrelated to the treatment or subject characteristics has the effect of assigning different treatments to different subjects as if they had been part of a randomized controlled experiment. The box "A Novel Way to Boost Retirement Savings" provides an example of such a quasi-experiment that yielded some surprising conclusions.
The Gender Gap of Earnings of College Graduates in the United States

The box in Chapter 2, "The Distribution of Earnings in the United States in 2008," shows that, on average, male college graduates earn more than female college graduates. Social norms and laws governing gender discrimination in the workplace have changed substantially in the United States. What are the recent trends in this "gender gap" in earnings? Is the gender gap in earnings of college graduates stable, or has it diminished over time?

Table 3.1 gives estimates of hourly earnings for college-educated full-time workers aged 25–34 in the United States in 1992, 1996, 2000, 2004, and 2008, using data collected by the Current Population Survey. Earnings for 1992, 1996, 2000, and 2004 were adjusted for inflation by putting them in 2008 dollars using the Consumer Price Index (CPI).¹ In 2008, the average hourly earnings of the 1838 men surveyed was $24.98, and the standard deviation of earnings for men was $11.78. The average hourly earnings in 2008 of the 1871 women surveyed was $20.87, and the standard deviation of earnings for women was $9.66. Thus the estimate of the gender gap in earnings for 2008 is $4.11 (= $24.98 − $20.87), with a standard error of $0.35 (= sqrt(11.78²/1838 + 9.66²/1871)). The 95% confidence interval for the gender gap in earnings in 2008 is 4.11 ± 1.96 × 0.35 = ($3.41, $4.80).

TABLE 3.1 Trends in Hourly Earnings in the United States of Working College Graduates, Ages 25–34, 1992 to 2008, in 2008 Dollars

Year   Y̅m      sm      nm     Y̅w      sw     nw     Y̅m − Y̅w   SE(Y̅m − Y̅w)   95% Confidence Interval for d
1992   23.27   10.17   1594   20.05   7.87   1368   3.22**     0.33           …
1996   22.48   …       1379   18.98   7.95   1230   3.50**     …              …
2000   24.88   …       1303   20.74   9.36   1181   4.14**     0.42           …
2004   25.12   12.01   1894   21.02   9.36   1735   4.10**     0.36           …
2008   24.98   11.78   1838   20.87   9.66   1871   4.11**     0.35           3.41–4.80

These estimates are computed using data on all full-time workers aged 25–34 surveyed in the Current Population Survey conducted in March of the next year (for example, the data for 2008 were collected in March 2009). **Difference is significantly different from zero at the 1% significance level.

The results in Table 3.1 suggest four conclusions. First, the gender gap is large. An hourly gap of $4.11 might not sound like much, but over a year it adds up to $8220, assuming a 40-hour work week and 50 paid weeks per year. Second, from 1992 to 2008, the estimated gender gap increased by $0.89 per hour in real terms, from $3.22 per hour to $4.11 per hour; however, this increase is not statistically significant at the 5% significance level (Exercise 3.17).
Third, the gap is large if it is measured instead in percentage terms: According to the estimates in Table 3.1, in 2008 women earned 16% less per hour than men did ($4.11/$24.98), slightly more than the gap of 14% seen in 1992 ($3.22/$23.27). Fourth, the gender gap is smaller for young college graduates (the group analyzed in Table 3.1) than it is for all college graduates (analyzed in Table 2.4): As reported in Table 2.4, the mean earnings for all college-educated women working full-time in 2008 was $23.93, while for men this mean was $30.97, which corresponds to a gender gap of 23% [= (30.97 − 23.93)/30.97] among all full-time college-educated workers.

This empirical analysis documents that the "gender gap" in hourly earnings is large and has been fairly stable (or perhaps increased slightly) over the recent past. The analysis does not, however, tell us why this gap exists. Does it arise from gender discrimination in the labor market? Does it reflect differences in skills, experience, or education between men and women? Does it reflect differences in choice of jobs? Or is there some other cause? We return to these questions once we have in hand the tools of multiple regression analysis, the topic of Part II.

¹Because of inflation, a dollar in 1992 was worth more than a dollar in 2008; that is, a dollar in 1992 could buy more goods and services than a dollar in 2008 could. Thus earnings in 1992 cannot be directly compared to earnings in 2008 without adjusting for inflation. One way to make this adjustment is to use the CPI, a measure of the price of a "market basket" of consumer goods and services constructed by the Bureau of Labor Statistics. Over the 16 years from 1992 to 2008, the price of the CPI market basket rose by 53.4%; in other words, the CPI basket of goods and services that cost $100 in 1992 cost $153.40 in 2008. To make earnings in 1992 and 2008 comparable in Table 3.1, 1992 earnings are inflated by the amount of overall CPI price inflation, that is, by multiplying 1992 earnings by 1.534 to put them into "2008 dollars."

3.6 Using the t-Statistic When the Sample Size Is Small

In Sections 3.2 through 3.5, the t-statistic is used in conjunction with critical values from the standard normal distribution for hypothesis testing and for the construction of confidence intervals. The use of the standard normal distribution is justified by the central limit theorem, which applies when the sample size is large. When the sample size is small, the standard normal distribution can provide a poor approximation to the distribution of the t-statistic. If, however, the population distribution is itself normally distributed, then the exact distribution (that is, the finite-sample distribution; see Section 2.6) of the t-statistic testing the mean of a single population is the Student t distribution with n − 1 degrees of freedom, and critical values can be taken from the Student t distribution.

The t-Statistic and the Student t Distribution

The t-statistic testing the mean. Consider the t-statistic used to test the hypothesis that the mean of Y is μY,0, using data Y1, . . . , Yn. The formula for this statistic
is given by Equation (3.10), where the standard error of Y̅ is given by Equation (3.8). Substitution of the latter expression into the former yields the formula for the t-statistic:

t = (Y̅ − μY,0)/sqrt(s²Y/n), (3.22)

where s²Y is given in Equation (3.7).

As discussed in Section 3.2, under general conditions the t-statistic has a standard normal distribution if the sample size is large and the null hypothesis is true [see Equation (3.12)]. Although the standard normal approximation to the distribution of the t-statistic is reliable for a wide range of distributions of Y if n is large, it can be unreliable if n is small. The exact distribution of the t-statistic depends on the distribution of Y, and it can be very complicated. There is, however, one special case in which the exact distribution of the t-statistic is relatively simple: If Y is normally distributed, then the t-statistic in Equation (3.22) has a Student t distribution with n − 1 degrees of freedom.

To verify this result, recall from Section 2.4 that the Student t distribution with n − 1 degrees of freedom is defined to be the distribution of Z/sqrt(W/(n − 1)), where Z is a random variable with a standard normal distribution, W is a random variable with a chi-squared distribution with n − 1 degrees of freedom, and Z and W are independently distributed. When Y1, . . . , Yn are i.i.d. and the population distribution of Y is N(μY, σ²Y), the t-statistic can be written as such a ratio. Specifically, let Z = (Y̅ − μY,0)/sqrt(σ²Y/n) and let W = (n − 1)s²Y/σ²Y; then some algebra¹ shows that the t-statistic in Equation (3.22) can be written as t = Z/sqrt(W/(n − 1)). Recall that if Y1, . . . , Yn are i.i.d. and the population distribution of Y is N(μY, σ²Y), then the sampling distribution of Y̅ is exactly N(μY, σ²Y/n) for all n; thus, if the null hypothesis μY = μY,0 is correct, then Z has a standard normal distribution for all n. In addition, W = (n − 1)s²Y/σ²Y has a χ²(n−1) distribution for all n, and Y̅ and s²Y are independently distributed. It follows that if the population distribution of Y is normal, then under the null hypothesis the t-statistic given in Equation (3.22) has an exact Student t distribution with n − 1 degrees of freedom.

If the population distribution of Y is normally distributed, then critical values from the Student t distribution can be used to perform hypothesis tests and to construct confidence intervals.

¹The desired expression is obtained by multiplying and dividing by sqrt(σ²Y) and collecting terms:
t = (Y̅ − μY,0)/sqrt(s²Y/n) = [(Y̅ − μY,0)/sqrt(σ²Y/n)] × [1/sqrt(s²Y/σ²Y)] = Z/sqrt{[(n − 1)s²Y/σ²Y]/(n − 1)} = Z/sqrt(W/(n − 1)).
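The ratio form of the t-statistic can be checked numerically: because σ²Y cancels, writing t as Z/sqrt(W/(n − 1)) reproduces Equation (3.22) exactly for any positive value of σ²Y (the sample values below are made up):

```python
from math import sqrt
from statistics import mean, variance

y = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5]  # illustrative sample
mu0 = 5.0                           # hypothesized mean
sigma2 = 0.7                        # any positive value; it cancels

n = len(y)
s2 = variance(y)                    # sample variance (divides by n - 1)

# Direct formula, Equation (3.22)
t_direct = (mean(y) - mu0) / sqrt(s2 / n)

# Ratio of a standard normal numerator Z to the square root of a
# chi-squared variable W divided by its degrees of freedom
z = (mean(y) - mu0) / sqrt(sigma2 / n)
w = (n - 1) * s2 / sigma2
t_ratio = z / sqrt(w / (n - 1))

print(t_direct, t_ratio)  # equal up to floating-point rounding
```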
As an example, consider a hypothetical problem in which t_act = 2.15 and n = 20, so that the degrees of freedom is n − 1 = 19. From Appendix Table 2, the 5% two-sided critical value for the t19 distribution is 2.09. Because the t-statistic is larger in absolute value than the critical value (2.15 > 2.09), the null hypothesis would be rejected at the 5% significance level against the two-sided alternative. The 95% confidence interval for μY, constructed using the t19 distribution, would be Y̅ ± 2.09SE(Y̅). This confidence interval is somewhat wider than the confidence interval constructed using the standard normal critical value of 1.96.

The t-statistic testing differences of means. The t-statistic testing the difference of two means, given in Equation (3.20), does not have a Student t distribution, even if the population distribution of Y is normal. (The Student t distribution does not apply here because the variance estimator used to compute the standard error in Equation (3.19) does not produce a denominator in the t-statistic with a chi-squared distribution.)

A modified version of the differences-of-means t-statistic, based on a different standard error formula, the "pooled" standard error formula, has an exact Student t distribution when Y is normally distributed; however, the pooled standard error formula applies only in the special case that the two groups have the same variance or that each group has the same number of observations (Exercise 3.21). Adopt the notation of Equation (3.19) so that the two groups are denoted m and w. The pooled variance estimator is

s²pooled = 1/(nm + nw − 2) × [Σ(group m) (Yi − Y̅m)² + Σ(group w) (Yi − Y̅w)²], (3.23)

where the first summation is over the observations in group m and the second summation is over the observations in group w. The pooled standard error of the difference in means is SEpooled(Y̅m − Y̅w) = spooled × sqrt(1/nm + 1/nw), and the pooled t-statistic is computed using Equation (3.20), where the standard error is the pooled standard error.

If the population distribution of Y in group m is N(μm, σ²m), if the population distribution of Y in group w is N(μw, σ²w), and if the two group variances are the same (that is, σ²m = σ²w), then under the null hypothesis the t-statistic computed using the pooled standard error has a Student t distribution with nm + nw − 2 degrees of freedom.
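A quick numerical illustration of the special case noted above: when the two groups have the same number of observations, the pooled standard error built from Equation (3.23) coincides with the standard error of Equation (3.19), and otherwise the two generally differ (the data are made up):

```python
from math import sqrt
from statistics import variance

def se_unpooled(y1, y2):
    """Standard error of Ym - Yw, Equation (3.19)."""
    return sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))

def se_pooled(y1, y2):
    """Pooled standard error based on Equation (3.23)."""
    n1, n2 = len(y1), len(y2)
    # (n - 1) * variance(y) is the sum of squared deviations in Equation (3.23)
    s2_pooled = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return sqrt(s2_pooled) * sqrt(1 / n1 + 1 / n2)

a = [1.0, 2.0, 4.0, 7.0]
b = [3.0, 3.5, 5.0, 9.0]
c = [1.0, 2.0, 4.0, 7.0, 8.0, 2.5]

print(se_unpooled(a, b), se_pooled(a, b))  # equal group sizes: identical
print(se_unpooled(a, c), se_pooled(a, c))  # unequal sizes: they differ
```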
A Novel Way to Boost Retirement Savings

Many economists think that people do not save enough for retirement. Conventional methods for encouraging retirement savings focus on financial incentives, but there also has been an upsurge in interest in unconventional ways to encourage saving for retirement.

In an important study published in 2001, Brigitte Madrian and Dennis Shea considered one such unconventional method for stimulating retirement savings. Many firms offer retirement savings plans in which the firm matches, in full or in part, savings taken out of the paycheck of participating employees. Enrollment in such plans, called 401(k) plans after the applicable section of the U.S. tax code, is always optional. However, at some firms employees are automatically enrolled in the plan, although they can opt out; at other firms, employees are enrolled only if they choose to opt in. According to conventional economic models of behavior, the method of enrollment, opt out or opt in, should not matter: The rational worker computes the optimal action and then takes it. But, Madrian and Shea wondered, could conventional economics be wrong? Could the method of enrollment in a savings plan directly affect its enrollment rate?

To measure the effect of the method of enrollment, Madrian and Shea studied a large firm that changed the default option for its 401(k) plan from nonparticipation to participation. They compared two groups of workers: those hired the year before the change and not automatically enrolled (but who could opt in), and those hired in the year after the change and automatically enrolled (but who could opt out). The financial aspects of the plan remained the same, and Madrian and Shea found no systematic differences between the workers hired before and after the change. Thus, from an econometrician's perspective, the change was like a randomly assigned treatment, and the causal effect of the change could be estimated by the difference in means between the two groups.

Madrian and Shea found that the default enrollment rule made a huge difference: The enrollment rate for the "opt-in" (control) group was 37.4% (n = 4249), whereas the enrollment rate for the "opt-out" (treatment) group was 85.9% (n = 5801). The estimate of the treatment effect is 48.5% (= 85.9% − 37.4%). Because their sample is large, the 95% confidence interval for the treatment effect is tight, 46.8% to 50.2% (computed in Exercise 3.15).

How could the default choice matter so much? Maybe workers found these financial choices too confusing, or maybe they just didn't want to think about growing old. Neither explanation is economically rational, but both are consistent with the predictions of the growing field of "behavioral economics," and both could lead to accepting the default enrollment option.

This research had an important practical impact. In August 2006, Congress passed the Pension Protection Act that (among other things) encouraged firms to offer 401(k) plans in which enrollment is the default. The econometric findings of Madrian, Shea, and others featured prominently in testimony on this part of the legislation.

To learn more about behavioral economics and the design of retirement savings plans, see Benartzi and Thaler (2007) and Beshears, Choi, Laibson, and Madrian (2008).
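The headline numbers in the box can be reproduced with the difference-of-means machinery of Section 3.4; for a binary (Bernoulli) outcome such as enrollment, the sample variance is approximately p(1 − p), which gives the simplified standard error used below (a sketch of the calculation, not the authors' code):

```python
from math import sqrt

# Enrollment rates and group sizes reported in the box
p_treat, n_treat = 0.859, 5801  # "opt-out" (treatment) group
p_ctrl, n_ctrl = 0.374, 4249    # "opt-in" (control) group

effect = p_treat - p_ctrl
# Bernoulli simplification of Equation (3.19): s^2 is roughly p(1 - p)
se = sqrt(p_treat * (1 - p_treat) / n_treat + p_ctrl * (1 - p_ctrl) / n_ctrl)
ci = (effect - 1.96 * se, effect + 1.96 * se)

print(f"effect = {effect:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```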
The drawback of using the pooled variance estimator s²pooled is that it applies only if the two population variances are the same (assuming nm ≠ nw). If the population variances are different, the pooled variance estimator is biased and inconsistent. If the population variances are different but the pooled variance formula is used, the null distribution of the pooled t-statistic is not a Student t distribution, even if the data are normally distributed; in fact, it does not even have a standard normal distribution in large samples. Therefore, the pooled standard error and the pooled t-statistic should not be used unless you have a good reason to believe that the population variances are the same.

Use of the Student t Distribution in Practice

For the problem of testing the mean of Y, the Student t distribution is applicable if the underlying population distribution of Y is normal. For economic variables, however, normal distributions are the exception (for example, see the boxes in Chapter 2, "The Distribution of Earnings in the United States in 2008" and "A Bad Day on Wall Street"). Even if the underlying data are not normally distributed, the normal approximation to the distribution of the t-statistic is valid if the sample size is large; therefore, inferences (hypothesis tests and confidence intervals) about the mean of a distribution should be based on the large-sample normal approximation.

When comparing two means, any economic reason for the two groups having different means typically implies that the two groups also could have different variances. Accordingly, the pooled standard error formula is inappropriate, and the correct standard error formula, which allows for different group variances, is as given in Equation (3.19). Even if the population distributions are normal, the t-statistic computed using the standard error formula in Equation (3.19) does not have a Student t distribution. In practice, therefore, inferences about differences in means should be based on Equation (3.19), used in conjunction with the standard normal approximation.

Even though the Student t distribution is rarely applicable in economics, some software uses the Student t distribution to compute p-values and confidence intervals. In practice, this does not pose a problem because the difference between the Student t distribution and the standard normal distribution is negligible if the sample size is large. For n > 15, the difference in the p-values computed using the Student t and standard normal distributions never exceeds 0.01; for n > 80, the difference never exceeds 0.002. In most modern applications, and in all applications in this textbook, the sample sizes are in the hundreds or thousands, large enough for the difference between the Student t distribution and the standard normal distribution to be negligible.
3.7 Scatterplots, the Sample Covariance, and the Sample Correlation

What is the relationship between age and earnings? This question, like many others, relates one variable, X (age), to another, Y (earnings). This section reviews three ways to summarize the relationship between variables: the scatterplot, the sample covariance, and the sample correlation coefficient.

Scatterplots

A scatterplot is a plot of n observations on Xi and Yi, in which each observation is represented by the point (Xi, Yi). For example, Figure 3.2 is a scatterplot of age (X) and hourly earnings (Y) for a sample of 200 computer and information systems managers from the March 2009 CPS. Each dot in Figure 3.2 corresponds to an (X, Y) pair for one of the observations. For example, one of the workers in this sample is 40 years old and earns $35.78 per hour; this worker's age and earnings are indicated by the highlighted dot in Figure 3.2. The scatterplot shows a positive relationship between age and earnings in this sample: Older workers tend to earn more than younger workers. This relationship is not exact, however, and earnings could not be predicted perfectly using only a person's age.

FIGURE 3.2 Scatterplot of Average Hourly Earnings vs. Age
Each point in the plot represents the age and average earnings of one of the 200 workers in the sample. The highlighted dot corresponds to a 40-year-old worker who earns $35.78 per hour. The data are for computer and information systems managers from the March 2009 CPS.

Sample Covariance and Correlation

The covariance and correlation were introduced in Section 2.3 as two properties of the joint probability distribution of the random variables X and Y. Because the population distribution is unknown in practice, the population covariance and correlation can be estimated by taking a random sample of n members of the population and collecting the data (Xi, Yi), i = 1, . . . , n.

The sample covariance and correlation are estimators of the population covariance and correlation. Like the estimators discussed previously in this chapter, they are computed by replacing a population mean (the expectation) with a sample mean. The sample covariance, denoted sXY, is

sXY = 1/(n − 1) × Σ(i=1 to n) (Xi − X̄)(Yi − Y̅). (3.24)

Like the sample variance, the average in Equation (3.24) is computed by dividing by n − 1 instead of n; here, too, this difference stems from using X̄ and Y̅ to estimate the respective population means. When n is large, it makes little difference whether division is by n or n − 1.

The sample correlation coefficient, or sample correlation, is denoted rXY and is the ratio of the sample covariance to the product of the sample standard deviations:

rXY = sXY/(sX sY). (3.25)

The sample correlation measures the strength of the linear association between X and Y in a sample of n observations. Like the population correlation, the sample correlation is unitless and lies between −1 and 1: |rXY| ≤ 1.
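Equations (3.24) and (3.25) translate directly into code; the check at the end illustrates that rescaling the units of one variable (say, dollars to cents) leaves the sample correlation unchanged (the data are made up):

```python
from statistics import mean, stdev

def sample_cov(x, y):
    """Sample covariance, Equation (3.24): divide by n - 1."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)

def sample_corr(x, y):
    """Sample correlation coefficient, Equation (3.25)."""
    return sample_cov(x, y) / (stdev(x) * stdev(y))

age = [28, 35, 41, 52, 30, 47, 39, 60]                   # years
earn = [18.0, 24.5, 27.0, 33.5, 21.0, 29.0, 26.5, 31.0]  # dollars per hour

r = sample_corr(age, earn)
earn_cents = [100 * e for e in earn]  # same data, measured in cents

print(f"r = {r:.3f}")
print(abs(sample_corr(age, earn_cents) - r) < 1e-9)
```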
The sample correlation equals 1 if Xi = Yi for all i and equals −1 if Xi = −Yi for all i. More generally, the correlation is ±1 if the scatterplot is a straight line. If the line slopes upward, then there is a positive relationship between X and Y and the correlation is 1; if the line slopes down, then there is a negative relationship and the correlation is −1. The closer the scatterplot is to a straight line, the closer is the correlation to ±1. A high correlation coefficient does not necessarily mean that the line has a steep slope; rather, it means that the points in the scatterplot fall very close to a straight line.

Example. As an example, consider the data on age and earnings in Figure 3.2. For these 200 workers, the sample standard deviation of age is sA = 9.07 years and the sample standard deviation of earnings is sE = $14.37 per hour. The sample covariance between age and earnings is sAE = 33.16 (the units are years × dollars per hour, not readily interpretable). Thus the correlation coefficient is rAE = 33.16/(9.07 × 14.37) = 0.25, or 25%. The correlation of 0.25 means that there is a positive relationship between age and earnings, but as is evident in the scatterplot, this relationship is far from perfect.

To verify that the correlation does not depend on the units of measurement, suppose that earnings had been reported in cents, in which case the sample standard deviation of earnings is 1437¢ per hour and the covariance between age and earnings is 3316 (units are years × cents per hour); then the correlation is 3316/(9.07 × 1437) = 0.25, or 25%.

Consistency of the sample covariance and correlation. Like the sample variance, the sample covariance is consistent. That is,

sXY →p σXY. (3.26)

In other words, in large samples the sample covariance is close to the population covariance with high probability. The proof of the result in Equation (3.26) under the assumption that (Xi, Yi) are i.i.d. and that Xi and Yi have finite fourth moments is similar to the proof in Appendix 3.3 that the sample variance is consistent and is left as an exercise (Exercise 3.20). Because the sample variance and sample covariance are consistent, the sample correlation coefficient is consistent; that is, rXY →p corr(Xi, Yi).

Figure 3.3 gives additional examples of scatterplots and correlation. Figure 3.3a shows a strong positive linear relationship between the variables, and the sample correlation is 0.9. Figure 3.3b shows a strong negative relationship, with a sample correlation of −0.8. Figure 3.3c shows a scatterplot with no evident relationship, and the sample correlation is zero. Figure 3.3d shows a clear relationship: As X increases, Y initially increases but then decreases. Despite this discernable relationship between X and Y, the sample correlation is zero; the reason is that, for these data, small values of Y are associated with both large and small values of X.

This final example emphasizes an important point: The correlation coefficient is a measure of linear association. There is a relationship in Figure 3.3d, but it is not linear.

FIGURE 3.3 Scatterplots for Four Hypothetical Data Sets
The scatterplots in Figures 3.3a and 3.3b show strong linear relationships between X and Y. In Figure 3.3c, X is independent of Y and the two variables are uncorrelated. In Figure 3.3d, the two variables also are uncorrelated even though they are related nonlinearly. (a) Correlation = +0.9; (b) Correlation = −0.8; (c) Correlation = 0.0; (d) Correlation = 0.0.
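The point that the correlation measures only linear association can be made concrete with a constructed data set in the spirit of Figure 3.3d: Y is an exact quadratic function of X, yet the sample correlation is zero.

```python
from statistics import mean, stdev

x = [-3, -2, -1, 0, 1, 2, 3]  # symmetric around its mean
y = [xi**2 for xi in x]       # a perfect, but nonlinear, relationship

n = len(x)
cov = sum((xi - mean(x)) * (yi - mean(y)) for xi, yi in zip(x, y)) / (n - 1)
r = cov / (stdev(x) * stdev(y))
print(r)  # prints 0.0: zero correlation despite an exact relationship
```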
Summary

1. The sample average, Y̅, is an estimator of the population mean, μY. When Y1, . . . , Yn are i.i.d.,
   a. the sampling distribution of Y̅ has mean μY and variance σ²Y̅ = σ²Y/n;
   b. Y̅ is unbiased;
   c. by the law of large numbers, Y̅ is consistent; and
   d. by the central limit theorem, Y̅ has an approximately normal sampling distribution when the sample size is large.
2. The t-statistic is used to test the null hypothesis that the population mean takes on a particular value. If n is large, the t-statistic has a standard normal sampling distribution when the null hypothesis is true.
3. The t-statistic can be used to calculate the p-value associated with the null hypothesis. A small p-value is evidence that the null hypothesis is false.
4. A 95% confidence interval for μY is an interval constructed so that it contains the true value of μY in 95% of all possible samples.
5. Hypothesis tests and confidence intervals for the difference in the means of two populations are conceptually similar to tests and intervals for the mean of a single population.
6. The sample correlation coefficient is an estimator of the population correlation coefficient and measures the linear relationship between two variables, that is, how well their scatterplot is approximated by a straight line.

Key Terms

estimator (66); estimate (66); bias, consistency, and efficiency (68); BLUE (Best Linear Unbiased Estimator) (69); least squares estimator (70); hypothesis tests (70); null hypothesis (70); alternative hypothesis (70); two-sided alternative hypothesis (70); p-value (significance probability) (71); sample variance (73); sample standard deviation (73); degrees of freedom (73); standard error of Y̅ (74); t-statistic (t-ratio) (75); test statistic (75); type I error (77); type II error (77); significance level (77); critical value (77); rejection region (77); acceptance region (77); size of a test (77);
power of a test (77); one-sided alternative hypothesis (79); confidence set (79); confidence level (79); confidence interval (79); coverage probability (79); test for the difference between two means (81); causal effect (84); treatment effect (84); scatterplot (91); sample covariance (91); sample correlation coefficient (sample correlation) (92)

Review the Concepts

3.1 Explain the difference between the sample average Y̅ and the population mean.
3.2 Explain the difference between an estimator and an estimate. Provide an example of each.
3.3 A population distribution has a mean of 10 and a variance of 16. Determine the mean and variance of Y̅ from an i.i.d. sample from this population for (a) n = 10; (b) n = 100; and (c) n = 1000. Relate your answer to the law of large numbers.
3.4 What role does the central limit theorem play in statistical hypothesis testing? In the construction of confidence intervals?
3.5 What is the difference between a null hypothesis and an alternative hypothesis? Among size, significance level, and power? Between a one-sided alternative hypothesis and a two-sided alternative hypothesis?
3.6 Why does a confidence interval contain more information than the result of a single hypothesis test?
3.7 Explain why the differences-of-means estimator, applied to data from a randomized controlled experiment, is an estimator of the treatment effect.
3.8 Sketch a hypothetical scatterplot for a sample of size 10 for two random variables with a population correlation of (a) 1.0; (b) −1.0; (c) 0.9; (d) −0.5; (e) 0.0.

Exercises

3.1 In a population, μY = 100 and σ²Y = 43.0. Use the central limit theorem to
a. In a random sample of size n = 100, find Pr(Ȳ < 101).
b. In a random sample of size n = 64, find Pr(101 < Ȳ < 103).
c. In a random sample of size n = 165, find Pr(Ȳ > 98).

3.2 Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p, and let Y₁, ..., Yₙ be i.i.d. draws from this distribution. Let p̂ be the fraction of successes (1s) in this sample.
a. Show that p̂ = Ȳ.
b. Show that p̂ is an unbiased estimator of p.
c. Show that var(p̂) = p(1 − p)/n.

3.3 In a survey of 400 likely voters, 215 responded that they would vote for the incumbent and 185 responded that they would vote for the challenger. Let p denote the fraction of all likely voters who preferred the incumbent at the time of the survey, and let p̂ be the fraction of survey respondents who preferred the incumbent.
a. Use the survey results to estimate p.
b. Use the estimator of the variance of p̂, p̂(1 − p̂)/n, to calculate the standard error of your estimator.
c. What is the p-value for the test H₀: p = 0.5 vs. H₁: p ≠ 0.5?
d. What is the p-value for the test H₀: p = 0.5 vs. H₁: p > 0.5?
e. Why do the results from (c) and (d) differ?
f. Did the survey contain statistically significant evidence that the incumbent was ahead of the challenger at the time of the survey? Explain.

3.4 Using the data in Exercise 3.3:
a. Construct a 95% confidence interval for p.
b. Construct a 99% confidence interval for p.
c. Why is the interval in (b) wider than the interval in (a)?
d. Without doing any additional calculations, test the hypothesis H₀: p = 0.50 vs. H₁: p ≠ 0.50 at the 5% significance level.

3.5 A survey of 1055 registered voters is conducted, and the voters are asked to choose between candidate A and candidate B. Let p denote the fraction of voters in the population who prefer candidate A, and let p̂ denote the fraction of voters in the sample who prefer candidate A.
a. You are interested in the competing hypotheses H₀: p = 0.5 vs. H₁: p ≠ 0.5. Suppose that you decide to reject H₀ if |p̂ − 0.5| > 0.02.
   i. What is the size of this test?
   ii. Compute the power of this test if p = 0.53.
b. In the survey, p̂ = 0.54.
   i. Test H₀: p = 0.5 vs. H₁: p ≠ 0.5 using a 5% significance level.
   ii. Test H₀: p = 0.5 vs. H₁: p > 0.5 using a 5% significance level.
   iii. Construct a 95% confidence interval for p.
   iv. Construct a 99% confidence interval for p.
   v. Construct a 50% confidence interval for p.
c. Suppose that the survey is carried out 20 times, using independently selected voters in each survey. For each of these 20 surveys, a 95% confidence interval for p is constructed.
   i. What is the probability that the true value of p is contained in all 20 of these confidence intervals?
   ii. How many of these confidence intervals do you expect to contain the true value of p?
d. In survey jargon, the "margin of error" is 1.96 × SE(p̂); that is, it is half the length of the 95% confidence interval. Suppose you wanted to design a survey that had a margin of error of at most 1%. That is, you wanted Pr(|p̂ − p| > 0.01) ≤ 0.05. How large should n be if the survey uses simple random sampling?

3.6 Let Y₁, ..., Yₙ be i.i.d. draws from a distribution with mean μ. A test of H₀: μ = 5 versus H₁: μ ≠ 5 using the usual t-statistic yields a p-value of 0.03.
a. Does the 95% confidence interval contain μ = 5? Explain.
b. Can you determine if μ = 6 is contained in the 95% confidence interval? Explain.

3.7 In a given population, 11% of the likely voters are African American. A survey using a simple random sample of 600 landline telephone numbers finds 8% African Americans. Is there evidence that the survey is biased? Explain.

3.8 A new version of the SAT test is given to 1000 randomly selected high school seniors. The sample mean test score is 1110, and the sample standard deviation is 123. Construct a 95% confidence interval for the population mean test score for high school seniors.
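The margin-of-error question in Exercise 3.5(d) turns on a single formula: the margin of error is 1.96 × SE(p̂) = 1.96 √(p(1 − p)/n). The arithmetic can be sketched in a few lines of Python. This is an illustrative sketch, not a solution to the exercise; the function name and the example margin of 2% are my own choices, and p = 0.5 is used as the planning value because it maximizes p(1 − p).

```python
import math

def required_sample_size(margin, p_planning=0.5, z=1.96):
    """Smallest n such that z * sqrt(p(1-p)/n) <= margin.

    p_planning = 0.5 maximizes p(1 - p), giving a conservative n.
    """
    return math.ceil(z**2 * p_planning * (1 - p_planning) / margin**2)

# Example: a 2-percentage-point margin of error at the 95% level
n = required_sample_size(0.02)
print(n)  # 2401
```

Because p(1 − p) ≤ 0.25 for every p, the sample size returned with the default planning value is conservative: it guarantees the target margin of error whatever the true p turns out to be.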
3.9 Suppose that a lightbulb manufacturing plant produces bulbs with a mean life of 2000 hours and a standard deviation of 200 hours. An inventor claims to have developed an improved process that produces bulbs with a longer mean life and the same standard deviation. The plant manager randomly selects 100 bulbs produced by the process. She says that she will believe the inventor's claim if the sample mean life of the bulbs is greater than 2100 hours; otherwise, she will conclude that the new process is no better than the old process. Let μ denote the mean of the new process. Consider the null and alternative hypotheses H₀: μ = 2000 vs. H₁: μ > 2000.
a. What is the size of the plant manager's testing procedure?
b. Suppose the new process is in fact better and has a mean bulb life of 2150 hours. What is the power of the plant manager's testing procedure?
c. What testing procedure should the plant manager use if she wants the size of her test to be 5%?

3.10 Suppose a new standardized test is given to 100 randomly selected third-grade students in New Jersey. The sample average score Ȳ on the test is 58 points, and the sample standard deviation, s_Y, is 8 points.
a. The authors plan to administer the test to all third-grade students in New Jersey. Construct a 95% confidence interval for the mean score of all New Jersey third graders.
b. Suppose the same test is given to 200 randomly selected third graders from Iowa, producing a sample average of 62 points and sample standard deviation of 11 points. Construct a 90% confidence interval for the difference in mean scores between Iowa and New Jersey.
c. Can you conclude with a high degree of confidence that the population means for Iowa and New Jersey students are different? (What is the standard error of the difference in the two sample means? What is the p-value of the test of no difference in means versus some difference?)

3.11 Consider the estimator Ỹ, defined in Equation (3.1). Show that (a) E(Ỹ) = μ_Y and (b) var(Ỹ) = 1.25σ²_Y/n.

3.12 To investigate possible gender discrimination in a firm, a sample of 100 men and 64 women with similar job descriptions are selected at random.
A summary of the resulting monthly salaries follows:
          Average Salary (Ȳ)    Standard Deviation (s_Y)    n
Men       $3100                 $200                        100
Women     $2900                 $320                        64

a. What do these data suggest about wage differences in the firm? Do they represent statistically significant evidence that average wages of men and women are different? (To answer this question, first state the null and alternative hypotheses; second, compute the relevant t-statistic; third, compute the p-value associated with the t-statistic; and, finally, use the p-value to answer the question.)
b. Do these data suggest that the firm is guilty of gender discrimination in its compensation policies? Explain.

3.13 Data on fifth-grade test scores (reading and mathematics) for 420 school districts in California yield Ȳ = 646.2 and standard deviation s_Y = 19.5.
a. Construct a 95% confidence interval for the mean test score in the population.
b. When the districts were divided into districts with small classes (< 20 students per teacher) and large classes (≥ 20 students per teacher), the following results were found:

Class Size    Average Score (Ȳ)    Standard Deviation (s_Y)    n
Small         657.4                19.4                        238
Large         650.0                17.9                        182

Is there statistically significant evidence that the districts with smaller classes have higher average test scores? Explain.

3.14 Values of height in inches (X) and weight in pounds (Y) are recorded from a sample of 300 male college students. The resulting summary statistics are X̄ = 70.5 in., Ȳ = 158 lb., s_X = 1.8 in., s_Y = 14.2 lb., s_XY = 21.73 in. × lb., and r_XY = 0.85. Convert these statistics to the metric system (meters and kilograms).

3.15 Let Ȳ_a and Ȳ_b denote Bernoulli random variables from two different populations, denoted a and b. Suppose that E(Y_a) = p_a and E(Y_b) = p_b. A random sample of size n_a is chosen from population a, with sample average denoted p̂_a, and a random sample of size n_b is chosen from population b,
with sample average denoted p̂_b. Suppose the sample from population a is independent of the sample from population b.
a. Show that E(p̂_a) = p_a and var(p̂_a) = p_a(1 − p_a)/n_a. Show that E(p̂_b) = p_b and var(p̂_b) = p_b(1 − p_b)/n_b.
b. Show that var(p̂_a − p̂_b) = p_a(1 − p_a)/n_a + p_b(1 − p_b)/n_b. (Hint: Remember that the samples are independent.)
c. Suppose that n_a and n_b are large. Show that a 95% confidence interval for p_a − p_b is given by

   (p̂_a − p̂_b) ± 1.96 √( p̂_a(1 − p̂_a)/n_a + p̂_b(1 − p̂_b)/n_b ).

   How would you construct a 90% confidence interval for p_a − p_b?
d. Read the box "A Novel Way to Boost Retirement Savings" in Section 3.5. Let population a denote the "opt-out" (treatment) group and population b denote the "opt-in" (control) group. Construct a 95% confidence interval for the treatment effect, p_a − p_b.
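The interval in part (c) is mechanical once p̂_a, n_a, p̂_b, and n_b are known. Here is a minimal sketch of that formula; the function name and the sample figures are illustrative choices of mine, not data from the text.

```python
import math

def diff_proportions_ci(pa_hat, na, pb_hat, nb, z=1.96):
    """Large-sample confidence interval for pa - pb (z = 1.96 gives 95%)."""
    se = math.sqrt(pa_hat * (1 - pa_hat) / na + pb_hat * (1 - pb_hat) / nb)
    diff = pa_hat - pb_hat
    return diff - z * se, diff + z * se

# Illustrative (made-up) data: 140/200 successes in group a, 120/200 in group b
lo, hi = diff_proportions_ci(0.70, 200, 0.60, 200)
print(round(lo, 3), round(hi, 3))
```

Replacing z = 1.96 with 1.64 yields the 90% interval asked about at the end of part (c).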
3.16 Grades on a standardized test are known to have a mean of 1000 for students in the United States. The test is administered to 453 randomly selected students in Florida; in this sample, the mean is 1013 and the standard deviation (s) is 108.
a. Construct a 95% confidence interval for the average test score for Florida students.
b. Is there statistically significant evidence that Florida students perform differently than other students in the United States?
c. Another 503 students are selected at random from Florida. They are given a 3-hour preparation course before the test is administered. Their average test score is 1019 with a standard deviation of 95.
   i. Construct a 95% confidence interval for the change in average test score associated with the prep course.
   ii. Is there statistically significant evidence that the prep course helped?
d. The original 453 students are given the prep course and then are asked to take the test a second time. The average change in their test scores is 9 points, and the standard deviation of the change is 60 points.
   i. Construct a 95% confidence interval for the change in average test scores.
   ii. Is there statistically significant evidence that students will perform better on their second attempt after taking the prep course?
   iii. Students may have performed better in their second attempt because of the prep course or because they gained test-taking experience in their first attempt. Describe an experiment that would quantify these two effects.

3.17 Read the box "The Gender Gap of Earnings of College Graduates in the United States" in Section 3.5.
a. Construct a 95% confidence interval for the change in men's average hourly earnings between 1992 and 2008.
b. Construct a 95% confidence interval for the change in women's average hourly earnings between 1992 and 2008.
c. Construct a 95% confidence interval for the change in the gender gap in average hourly earnings between 1992 and 2008. (Hint: Ȳ_m,1992 − Ȳ_w,1992 is independent of Ȳ_m,2008 − Ȳ_w,2008.)

3.18 This exercise shows that the sample variance is an unbiased estimator of the population variance σ²_Y when Y₁, ..., Yₙ are i.i.d. with mean μ_Y and variance σ²_Y.
a. Use Equation (2.31) to show that E[(Y_i − Ȳ)²] = var(Y_i) − 2cov(Y_i, Ȳ) + var(Ȳ).
b. Use Equation (2.33) to show that cov(Ȳ, Y_i) = σ²_Y/n.
c. Use the results in (a) and (b) to show that E(s²_Y) = σ²_Y.

3.19 a. Ȳ is an unbiased estimator of μ_Y. Is Ȳ² an unbiased estimator of μ²_Y?
b. Ȳ is a consistent estimator of μ_Y. Is Ȳ² a consistent estimator of μ²_Y?

3.20 Suppose that (X_i, Y_i) are i.i.d. with finite fourth moments. Prove that the sample covariance is a consistent estimator of the population covariance, that is, s_XY →p σ_XY, where s_XY is defined in Equation (3.24). (Hint: Use the strategy of Appendix 3.3 and the Cauchy-Schwarz inequality.)

3.21 Show that the pooled standard error [SE_pooled(Ȳ_m − Ȳ_w)] given following Equation (3.23) equals the usual standard error for the difference in means in Equation (3.19) when the two group sizes are the same (n_m = n_w).
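The claims behind Exercises 3.18 and 3.19 can be checked by simulation before being proved: averaging s²_Y over many samples comes out near σ²_Y, while averaging Ȳ² comes out near μ²_Y + σ²_Y/n rather than μ²_Y. A Monte Carlo sketch; every number here (the distribution, sample size, and replication count) is an illustrative choice of mine, not from the text.

```python
import random
import statistics

random.seed(0)

mu, sigma, n, reps = 10.0, 2.0, 5, 20000

s2_vals, ybar2_vals = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2_vals.append(statistics.variance(sample))       # divides by n - 1
    ybar2_vals.append(statistics.mean(sample) ** 2)

print(round(statistics.mean(s2_vals), 2))     # close to sigma**2 = 4
print(round(statistics.mean(ybar2_vals), 2))  # close to mu**2 + sigma**2/n = 100.8, not 100
```

The upward bias of Ȳ², equal to var(Ȳ) = σ²_Y/n, shrinks to zero as n grows, which is consistent with Ȳ² being a consistent estimator of μ²_Y even though it is biased.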
Empirical Exercise

E3.1 On the text Web site http://www.pearsonhighered.com/stock_watson/ you will find a data file CPS92_08 that contains an extended version of the dataset used in Table 3.1 of the text for the years 1992 and 2008. It contains data on full-time, full-year workers, age 25-34, with a high school diploma or B.A./B.S. as their highest degree. A detailed description is given in CPS92_08_Description, available on the Web site. Use these data to answer the following questions.
a. Compute the sample mean for average hourly earnings (AHE) in 1992 and in 2008. Construct a 95% confidence interval for the population means of AHE in 1992 and 2008 and the change between 1992 and 2008.
b. In 2008, the value of the Consumer Price Index (CPI) was 215.2. In 1992, the value of the CPI was 140.3. Repeat (a) but use AHE measured in real 2008 dollars ($2008); that is, adjust the 1992 data for the price inflation that occurred between 1992 and 2008.
c. If you were interested in the change in workers' purchasing power from 1992 to 2008, would you use the results from (a) or from (b)? Explain.
d. Use the 2008 data to construct a 95% confidence interval for the mean of AHE for high school graduates. Construct a 95% confidence interval for the mean of AHE for workers with a college degree. Construct a 95% confidence interval for the difference between the two means.
e. Repeat (d) using the 1992 data expressed in $2008.
f. Did real (inflation-adjusted) wages of high school graduates increase from 1992 to 2008? Explain. Did real wages of college graduates increase? Did the gap between earnings of college and high school graduates increase? Explain, using appropriate estimates, confidence intervals, and test statistics.
g. Table 3.1 presents information on the gender gap for college graduates. Prepare a similar table for high school graduates using the 1992 and 2008 data. Are there any notable differences between the results for high school and college graduates?
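The confidence intervals requested in parts (a), (d), and (e) all use the same large-sample formula, Ȳ ± 1.96 s_Y/√n. A generic sketch of that computation follows; the data below are made up, not from the CPS92_08 file, and with so few observations the normal approximation is poor, but the arithmetic is identical.

```python
import math
import statistics

def mean_ci_95(values):
    """Large-sample 95% confidence interval for the population mean."""
    n = len(values)
    y_bar = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # stdev divides by n - 1
    return y_bar - 1.96 * se, y_bar + 1.96 * se

# Illustrative hourly-earnings figures (not from the CPS file)
earnings = [15.0, 18.5, 22.0, 19.5, 17.0, 21.5, 16.5, 20.0]
lo, hi = mean_ci_95(earnings)
print(round(lo, 2), round(hi, 2))
```

For part (b), the same function can be applied after multiplying each 1992 observation by the CPI ratio 215.2/140.3 to express it in 2008 dollars.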
APPENDIX

3.1

The U.S. Current Population Survey

Each month, the Bureau of Labor Statistics in the U.S. Department of Labor conducts the Current Population Survey (CPS), which provides data on labor force characteristics of the population, including the level of employment, unemployment, and earnings. More than 50,000 U.S. households are surveyed each month. The sample is chosen by randomly selecting addresses from a database of addresses from the most recent decennial census augmented with data on new housing units constructed after the last census. The exact random sampling scheme is rather complicated (first, small geographical areas are randomly selected, then housing units within these areas are randomly selected); details can be found in the Handbook of Labor Statistics and on the Bureau of Labor Statistics Web site (www.bls.gov). The survey conducted each March is more detailed than in other months and asks questions about earnings during the previous year. The statistics in Tables 2.4 and 3.1 were computed using the March surveys. The CPS earnings data are for full-time workers, defined to be somebody employed more than 35 hours per week for at least 48 weeks in the previous year.
APPENDIX

3.2

Two Proofs That Ȳ Is the Least Squares Estimator of μ_Y

This appendix provides two proofs, one using calculus and one not, that Ȳ minimizes the sum of squared prediction mistakes in Equation (3.2), that is, that Ȳ is the least squares estimator of E(Y).

Calculus Proof

To minimize the sum of squared prediction mistakes, take its derivative and set it to zero:

   d/dm Σᵢ₌₁ⁿ (Y_i − m)² = −2 Σᵢ₌₁ⁿ (Y_i − m) = −2 Σᵢ₌₁ⁿ Y_i + 2nm = 0.   (3.27)

Solving the final equation for m shows that Σᵢ₌₁ⁿ (Y_i − m)² is minimized when m = Ȳ.
Noncalculus Proof

The strategy is to show that the difference between the least squares estimator and Ȳ must be zero, from which it follows that Ȳ is the least squares estimator. Let d = Ȳ − m, so that m = Ȳ − d. Then the sum of squared prediction mistakes [Equation (3.2)] is

   Σᵢ₌₁ⁿ (Y_i − m)² = Σᵢ₌₁ⁿ (Y_i − Ȳ + d)² = Σᵢ₌₁ⁿ (Y_i − Ȳ)² + 2d Σᵢ₌₁ⁿ (Y_i − Ȳ) + nd² = Σᵢ₌₁ⁿ (Y_i − Ȳ)² + nd²,   (3.28)

where the final equality follows from the definition of Ȳ [which implies that Σᵢ₌₁ⁿ (Y_i − Ȳ) = 0]. Because both terms in the final line of Equation (3.28) are nonnegative and because the first term does not depend on d, Σᵢ₌₁ⁿ (Y_i − m)² is minimized by choosing d to make the second term, nd², as small as possible. This is done by setting d = 0, that is, by setting m = Ȳ, so that Ȳ is the least squares estimator of E(Y).

APPENDIX

3.3

A Proof That the Sample Variance Is Consistent

This appendix uses the law of large numbers to prove that the sample variance s²_Y is a consistent estimator of the population variance σ²_Y, as stated in Equation (3.9), when Y₁, ..., Yₙ are i.i.d. and E(Y⁴ᵢ) < ∞.

First, add and subtract μ_Y to write (Y_i − Ȳ)² = [(Y_i − μ_Y) − (Ȳ − μ_Y)]² = (Y_i − μ_Y)² − 2(Y_i − μ_Y)(Ȳ − μ_Y) + (Ȳ − μ_Y)². Substituting this expression for (Y_i − Ȳ)² into the definition of s²_Y and collecting terms [using Σᵢ₌₁ⁿ (Y_i − μ_Y) = n(Ȳ − μ_Y)], we have that

   s²_Y = [1/(n − 1)] Σᵢ₌₁ⁿ (Y_i − Ȳ)² = [n/(n − 1)] (1/n) Σᵢ₌₁ⁿ (Y_i − μ_Y)² − [n/(n − 1)] (Ȳ − μ_Y)².   (3.29)
The law of large numbers can now be applied to the two terms in the final line of Equation (3.29). Define W_i = (Y_i − μ_Y)², so that the first term in Equation (3.29) is [n/(n − 1)] W̄, where W̄ is the sample average of W₁, ..., Wₙ. Because Y₁, ..., Yₙ are i.i.d., the random variables W₁, ..., Wₙ are i.i.d. Also, E(W_i) = E[(Y_i − μ_Y)²] = σ²_Y (by the definition of the variance), and var(W_i) is finite because, by assumption, E(Y⁴ᵢ) < ∞. Thus W̄ satisfies the conditions for the law of large numbers in Key Concept 2.6, and W̄ →p E(W_i) = σ²_Y. In addition, n/(n − 1) → 1, so the first term in Equation (3.29) converges in probability to σ²_Y. Because Ȳ →p μ_Y, (Ȳ − μ_Y)² →p 0, so the second term converges in probability to zero. Combining these results yields s²_Y →p σ²_Y.
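The consistency result just proved can be illustrated numerically: as n grows, the sample variance of n i.i.d. draws settles down near the population variance. A simulation sketch, in which the distribution, seed, and sample sizes are illustrative choices of mine:

```python
import random
import statistics

def sample_variance_of_n_draws(n, mu=0.0, sigma=3.0, seed=1):
    """Draw n i.i.d. N(mu, sigma^2) values and return the sample variance s2."""
    rng = random.Random(seed)
    draws = [rng.gauss(mu, sigma) for _ in range(n)]
    return statistics.variance(draws)  # divides by n - 1

# s2 converges in probability to sigma**2 = 9 as n grows
for n in [10, 100, 10000]:
    print(n, round(sample_variance_of_n_draws(n), 2))
```

For small n the realized s² can be far from 9, but the spread of its sampling distribution, var(s²) = 2σ⁴/(n − 1) for normal data, collapses as n increases, which is what consistency means in practice.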
CHAPTER 4

Linear Regression with One Regressor

A state implements tough new penalties on drunk drivers: What is the effect on highway deaths? A school district cuts the size of its elementary school classes: What is the effect on its students' standardized test scores? You successfully complete one more year of college classes: What is the effect on your future earnings?

All three of these questions are about the unknown effect of changing one variable, X (X being penalties for drunk driving, class size, or years of schooling), on another variable, Y (Y being highway deaths, student test scores, or earnings).

This chapter introduces the linear regression model relating one variable, X, to another, Y. This model postulates a linear relationship between X and Y; the slope of the line relating X and Y is the effect of a one-unit change in X on Y. Just as the mean of Y is an unknown characteristic of the population distribution of Y, the slope of the line relating X and Y is an unknown characteristic of the population joint distribution of X and Y. The econometric problem is to estimate this slope, that is, to estimate the effect on Y of a unit change in X, using a sample of data on these two variables.

This chapter describes methods for estimating this slope using a random sample of data on X and Y. For instance, using data on class sizes and test scores from different school districts, we show how to estimate the expected effect on test scores of reducing class sizes by, say, one student per class. The slope and the intercept of the line relating X and Y can be estimated by a method called ordinary least squares (OLS).

4.1 The Linear Regression Model

The superintendent of an elementary school district must decide whether to hire additional teachers, and she wants your advice. If she hires the teachers, she will reduce the number of students per teacher (the student-teacher ratio) by two. She faces a trade-off. Parents want smaller classes so that their children can receive more individualized attention. But hiring more teachers means spending more money, which is not to the liking of those paying the bill! So she asks you: If she cuts class sizes, what will the effect be on student performance?
In many school districts, student performance is measured by standardized tests, and the job status or pay of some administrators can depend in part on how well their students do on these tests. We therefore sharpen the superintendent's question: If she reduces the average class size by two students, what will the effect be on standardized test scores in her district?

A precise answer to this question requires a quantitative statement about changes. If the superintendent changes the class size by a certain amount, what would she expect the change in standardized test scores to be? We can write this as a mathematical relationship using the Greek letter beta, β_ClassSize, where the subscript ClassSize distinguishes the effect of changing the class size from other effects. Thus,

   β_ClassSize = (change in TestScore)/(change in ClassSize) = ΔTestScore/ΔClassSize,   (4.1)

where the Greek letter Δ (delta) stands for "change in." That is, β_ClassSize is the change in the test score that results from changing the class size, divided by the change in the class size.

If you were lucky enough to know β_ClassSize, you would be able to tell the superintendent that decreasing class size by one student would change districtwide test scores by β_ClassSize. You could also answer the superintendent's actual question, which concerned changing class size by two students per class. To do so, rearrange Equation (4.1) so that

   ΔTestScore = β_ClassSize × ΔClassSize.   (4.2)

Suppose that β_ClassSize = −0.6. Then a reduction in class size of two students per class would yield a predicted change in test scores of (−0.6) × (−2) = 1.2; that is, you would predict that test scores would rise by 1.2 points as a result of the reduction in class sizes by two students per class.

Equation (4.1) is the definition of the slope of a straight line relating test scores and class size. This straight line can be written

   TestScore = β₀ + β_ClassSize × ClassSize,   (4.3)

where β₀ is the intercept of this straight line and, as before, β_ClassSize is the slope. According to Equation (4.3), if you knew β₀ and β_ClassSize, not only would you be able to determine the change in test scores at a district associated with a change in class size, but you also would be able to predict the average test score itself for a given class size.
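The predictions in Equations (4.2) and (4.3) are simple enough to sketch in code. The slope of −0.6 below is the value assumed in the text's example; the intercept is a made-up number for illustration only.

```python
beta_class_size = -0.6   # slope assumed in the text's example
beta_0 = 720.0           # illustrative intercept, not a value from the text

def predicted_change(delta_class_size, slope=beta_class_size):
    """Equation (4.2): predicted change in test score from a change in class size."""
    return slope * delta_class_size

def predicted_score(class_size, intercept=beta_0, slope=beta_class_size):
    """Equation (4.3): predicted average test score for a given class size."""
    return intercept + slope * class_size

print(predicted_change(-2))                                # 1.2
print(round(predicted_score(20) - predicted_score(22), 1)) # 1.2
```

The two print statements agree because Equation (4.2) is just Equation (4.3) differenced: the intercept cancels, leaving only the slope times the change in class size.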
When you propose Equation (4.3) to the superintendent, she tells you that something is wrong with this formulation. She points out that class size is just one of many facets of elementary education and that two districts with the same class sizes will have different test scores for many reasons. One district might have better teachers or it might use better textbooks. Two districts with comparable class sizes, teachers, and textbooks still might have very different student populations; perhaps one district has more immigrants (and thus fewer native English speakers) or wealthier families. Finally, she points out that even if two districts are the same in all these ways, they might have different test scores for essentially random reasons having to do with the performance of the individual students on the day of the test. She is right, of course; for all these reasons, Equation (4.3) will not hold exactly for all districts. Instead, it should be viewed as a statement about a relationship that holds on average across the population of districts.

A version of this linear relationship that holds for each district must incorporate these other factors influencing test scores, including each district's unique characteristics (for example, quality of their teachers, background of their students, how lucky the students were on test day). One approach would be to list the most important factors and to introduce them explicitly into Equation (4.3) (an idea we return to in Chapter 6). For now, we simply lump all these "other factors" together and write the relationship for a given district as

   TestScore = β₀ + β_ClassSize × ClassSize + other factors.   (4.4)

Thus the test score for the district is written in terms of one component, β₀ + β_ClassSize × ClassSize, that represents the average effect of class size on scores in the population of school districts, and a second component that represents all other factors.

Although this discussion has focused on test scores and class size, the idea expressed in Equation (4.4) is much more general, so it is useful to introduce more general notation. Suppose you have a sample of n districts. Let Y_i be the average test score in the i-th district, let X_i be the average class size in the i-th district, and let u_i denote the other factors influencing the test score in the i-th district. Then Equation (4.4) can be written more generally as

   Y_i = β₀ + β₁X_i + u_i,   (4.5)

for each district (that is, i = 1, ..., n), where β₀ is the intercept of this line and β₁ is the slope. [The general notation β₁ is used for the slope in Equation (4.5) instead of β_ClassSize because this equation is written in terms of a general variable X_i.] Y_i is the dependent variable and X_i is the independent variable or the regressor.

The first part of Equation (4.5), β₀ + β₁X_i, is the population regression line or the population regression function. This is the relationship that holds between Y and X on average over the population. Thus, if you knew the value of X, according to this population regression line you would predict that the value of the dependent variable, Y, is β₀ + β₁X.

The intercept β₀ and the slope β₁ are the coefficients of the population regression line, also known as the parameters of the population regression line. The slope β₁ is the change in Y associated with a unit change in X. The intercept is the value of the population regression line when X = 0; it is the point at which the population regression line intersects the Y axis. In some econometric applications, the intercept has a meaningful economic interpretation. In other applications, the intercept has no real-world meaning; for example, when X is the class size, strictly speaking the intercept is the predicted value of test scores when there are no students in the class! When the real-world meaning of the intercept is nonsensical, it is best to think of it mathematically as the coefficient that determines the level of the regression line.

The term u_i in Equation (4.5) is the error term. The error term incorporates all of the factors responsible for the difference between the i-th district's average test score and the value predicted by the population regression line. This error term contains all the other factors besides X that determine the value of the dependent variable, Y, for a specific observation, i. In the class size example, these other factors include all the unique features of the i-th district that affect the performance of its students on the test, including teacher quality, student economic background, luck, and even any mistakes in grading the test.

The linear regression model and its terminology are summarized in Key Concept 4.1.

KEY CONCEPT 4.1
Terminology for the Linear Regression Model with a Single Regressor

The linear regression model is

   Y_i = β₀ + β₁X_i + u_i,

where the subscript i runs over observations, i = 1, ..., n;
Y_i is the dependent variable, the regressand, or simply the left-hand variable;
X_i is the independent variable, the regressor, or simply the right-hand variable;
β₀ + β₁X is the population regression line or the population regression function;
β₀ is the intercept of the population regression line;
β₁ is the slope of the population regression line; and
u_i is the error term.

Figure 4.1 summarizes the linear regression model with a single regressor for seven hypothetical observations on test scores (Y) and class size (X). The population regression line is the straight line β₀ + β₁X. The population regression line slopes down (β₁ < 0), which means that districts with lower student-teacher ratios (smaller classes) tend to have higher test scores. The intercept β₀ has a mathematical meaning as the value of the Y axis intersected by the population regression line, but, as mentioned earlier, it has no real-world meaning in this example.

FIGURE 4.1 Scatterplot of Test Score vs. Student-Teacher Ratio (Hypothetical Data)
[Figure: The scatterplot shows hypothetical observations for seven school districts, with test score (Y) plotted against the student-teacher ratio (X). The vertical distance from the i-th point to the population regression line is Y_i − (β₀ + β₁X_i), which is the population error term u_i for the i-th observation.]

Because of the other factors that determine test performance, the hypothetical observations in Figure 4.1 do not fall exactly on the population regression line. For example, the value of Y for district #1, Y₁, is above the population regression line. This means that test scores in district #1 were better than predicted
which is the number of students in the district divided by the number of teachersthat is. is the average earnings of the female college graduates in the sample.the districtwide studentteacher ratio. Now return to your problem as advisor to the superintendent: What is the expected effect on test scores of reducing the studentteacher ratio by two students per teacher? 11.suppose you want to compare the mean earnings of men and women who recently graduated from college. Although the population mean earnings are unknown. "I.Therefore. the intercept Po and slope f3.The measure used here isone of the broadest.1.
17.3 (that is, only 10% of districts have student-teacher ratios below 17.3), while the district at the 90th percentile has a student-teacher ratio of 21.9.

A scatterplot of these 420 observations on test scores and the student-teacher ratio is shown in Figure 4.2. The sample correlation is −0.23, indicating a weak negative relationship between the two variables. Although larger classes in this sample tend to have lower test scores, there are other determinants of test scores that keep the observations from falling perfectly along a straight line. Despite this low correlation, if one could somehow draw a straight line through these data, then the slope of this line would be an estimate of β_ClassSize

Figure 4.2 Scatterplot of Test Score vs. Student-Teacher Ratio (California School District Data)
Data from 420 California school districts. There is a weak negative relationship between the student-teacher ratio and test scores: The sample correlation is −0.23.
based on these data. One way to draw the line would be to take out a pencil and a ruler and "eyeball" the best line you could. While this method is easy, it is very unscientific, and different people will create different estimated lines. How, then, should you choose among the many possible lines? By far the most common way is to choose the line that produces the "least squares" fit to these data, that is, to use the ordinary least squares (OLS) estimator.

The Ordinary Least Squares Estimator

The OLS estimator chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of the squared mistakes made in predicting Y given X.

As discussed in Section 3.1, the sample average Ȳ is the least squares estimator of the population mean, E(Y); that is, Ȳ minimizes the total squared estimation mistakes Σᵢ₌₁ⁿ (Yᵢ − m)² among all possible estimators m [see Expression (3.2)].

The OLS estimator extends this idea to the linear regression model. Let b₀ and b₁ be some estimators of β₀ and β₁. The regression line based on these estimators is b₀ + b₁X, so the value of Yᵢ predicted using this line is b₀ + b₁Xᵢ. Thus the mistake made in predicting the iᵗʰ observation is Yᵢ − (b₀ + b₁Xᵢ). The sum of these squared prediction mistakes over all n observations is

   Σᵢ₌₁ⁿ (Yᵢ − b₀ − b₁Xᵢ)².   (4.6)

The sum of the squared mistakes for the linear regression model in Expression (4.6) is the extension of the sum of the squared mistakes for the problem of estimating the mean in Expression (3.2). In fact, if there is no regressor, then b₁ does not enter Expression (4.6) and the two problems are identical except for the different notation [m in Expression (3.2), b₀ in Expression (4.6)]. Just as there is a unique estimator, Ȳ, that minimizes Expression (3.2), so is there a unique pair of estimators of β₀ and β₁ that minimize Expression (4.6).

The estimators of the intercept and slope that minimize the sum of squared mistakes in Expression (4.6) are called the ordinary least squares (OLS) estimators of β₀ and β₁.

OLS has its own special notation and terminology. The OLS estimator of β₀ is denoted β̂₀, and the OLS estimator of β₁ is denoted β̂₁. The OLS regression line, also called the sample regression line or sample regression function, is the straight line constructed using the OLS estimators: β̂₀ + β̂₁X. The predicted value of Yᵢ given Xᵢ, based on the OLS regression line, is Ŷᵢ = β̂₀ + β̂₁Xᵢ.
The residual for the iᵗʰ observation is the difference between Yᵢ and its predicted value: ûᵢ = Yᵢ − Ŷᵢ.

The OLS estimators, β̂₀ and β̂₁, are sample counterparts of the population coefficients, β₀ and β₁. Similarly, the OLS regression line β̂₀ + β̂₁X is the sample counterpart of the population regression line β₀ + β₁X, and the OLS residuals ûᵢ are sample counterparts of the population errors uᵢ.

You could compute the OLS estimators β̂₀ and β̂₁ by trying different values of b₀ and b₁ repeatedly until you find those that minimize the total squared mistakes in Expression (4.6). This method would be quite tedious, however. Fortunately, there are formulas, derived by minimizing Expression (4.6) using calculus, that streamline the calculation of the OLS estimators. These formulas are implemented in virtually all statistical and spreadsheet programs. They are collected in Key Concept 4.2 and are derived in Appendix 4.2.

Key Concept 4.2: The OLS Estimator, Predicted Values, and Residuals

The OLS estimators of the slope β₁ and the intercept β₀ are

   β̂₁ = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ₌₁ⁿ (Xᵢ − X̄)²,   (4.7)

   β̂₀ = Ȳ − β̂₁X̄.   (4.8)

The OLS predicted values Ŷᵢ and residuals ûᵢ are

   Ŷᵢ = β̂₀ + β̂₁Xᵢ, i = 1, …, n,   (4.9)

   ûᵢ = Yᵢ − Ŷᵢ, i = 1, …, n.   (4.10)

The estimated intercept (β̂₀), slope (β̂₁), and residuals (ûᵢ) are computed from a sample of n observations of Xᵢ and Yᵢ, i = 1, …, n. These are estimates of the unknown true population intercept (β₀), slope (β₁), and error term (uᵢ).
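The formulas in Key Concept 4.2 are simple enough to compute by hand or in a few lines of code. The following sketch (our own illustration, not part of the textbook, and using a tiny invented data set rather than the California data) implements Equations (4.7) through (4.10):

```python
# Minimal sketch of the OLS formulas in Key Concept 4.2.
# The four (x, y) pairs below are invented purely for illustration.
def ols(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Equation (4.7): slope = sum of cross-deviations over sum of squared X-deviations
    beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
            / sum((xi - x_bar) ** 2 for xi in x)
    # Equation (4.8): intercept
    beta0 = y_bar - beta1 * x_bar
    y_hat = [beta0 + beta1 * xi for xi in x]           # Equation (4.9): predicted values
    resid = [yi - yh for yi, yh in zip(y, y_hat)]      # Equation (4.10): residuals
    return beta0, beta1, y_hat, resid

x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
b0, b1, y_hat, u = ols(x, y)
print(round(b0, 3), round(b1, 3))   # prints: 1.15 1.94
print(round(sum(u), 6))             # OLS residuals sum to (numerically) zero
```

A useful check on any implementation is the last line: a property of OLS (proven in Appendix 4.3) is that the residuals sum to zero whenever an intercept is included.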
OLS Estimates of the Relationship Between Test Scores and the Student-Teacher Ratio

When OLS is used to estimate a line relating the student-teacher ratio to test scores using the 420 observations in Figure 4.2, the estimated slope is −2.28 and the estimated intercept is 698.9. Accordingly, the OLS regression line for these 420 observations is

   TestScore-hat = 698.9 − 2.28 × STR,   (4.11)

where TestScore is the average test score in the district and STR is the student-teacher ratio. The "hat" over TestScore in Equation (4.11) indicates that it is the predicted value based on the OLS regression line. Figure 4.3 plots this OLS regression line superimposed over the scatterplot of the data previously shown in Figure 4.2.

Figure 4.3 The Estimated Regression Line for the California Data
The estimated regression line shows a negative relationship between test scores and the student-teacher ratio. If class sizes fall by one student, the estimated regression predicts that test scores will increase by 2.28 points.

The slope of −2.28 means that an increase in the student-teacher ratio by one student per class is, on average, associated with a decline in districtwide test scores by 2.28 points on the test. A decrease in the student-teacher ratio by two students per class is, on average, associated with an increase in test scores of 4.56 points [= −2 × (−2.28)]. The negative slope indicates that more students per teacher (larger classes) is associated with poorer performance on the test.
It is now possible to predict the districtwide test score given a value of the student-teacher ratio. For example, for a district with 20 students per teacher, the predicted test score is 698.9 − 2.28 × 20 = 653.3. Of course, this prediction will not be exactly right, because of the other factors that determine a district's performance. But the regression line does give a prediction (the OLS prediction) of what test scores would be for that district, based on its student-teacher ratio, absent those other factors.

Is this estimate of the slope large or small? To answer this, we return to the superintendent's problem. Recall that she is contemplating hiring enough teachers to reduce the student-teacher ratio by 2. Suppose her district is at the median of the California districts. From Table 4.1, the median student-teacher ratio is 19.7 and the median test score is 654.5. A reduction of two students per class, from 19.7 to 17.7, would move her student-teacher ratio from the 50th percentile to very near the 10th percentile. This is a big change, and she would need to hire many new teachers. How would it affect test scores?

According to Equation (4.11), cutting the student-teacher ratio by 2 is predicted to increase test scores by approximately 4.6 points; if her district's test scores are at the median, 654.5, they are predicted to increase to 659.1. Is this improvement large or small? According to Table 4.1, this improvement would move her district from the median to just short of the 60th percentile. Thus a decrease in class size that would place her district close to the 10% of districts with the smallest classes would move her test scores from the 50th to the 60th percentile. According to these estimates, at least, cutting the student-teacher ratio by a large amount (two students per teacher) would help and might be worth doing depending on her budgetary situation, but it would not be a panacea.

What if the superintendent were contemplating a far more radical change, such as reducing the student-teacher ratio from 20 students per teacher to 5? Unfortunately, the estimate in Equation (4.11) would not be very useful to her. This regression was estimated using the data in Figure 4.2, and, as the figure shows, the smallest student-teacher ratio in these data is 14. These data contain no information on how districts with extremely small classes perform, so these data alone are not a reliable basis for predicting the effect of a radical move to such an extremely low student-teacher ratio.
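The predictions discussed above are simple arithmetic on the estimated line in Equation (4.11). As a quick check of the numbers in the text (a sketch of our own, not from the textbook):

```python
# Predictions from the estimated regression line in Equation (4.11):
# TestScore-hat = 698.9 - 2.28 * STR.
def predicted_test_score(str_ratio):
    return 698.9 - 2.28 * str_ratio

# A district with 20 students per teacher, as in the text:
print(round(predicted_test_score(20), 1))   # prints: 653.3

# Effect of cutting the ratio by two students (19.7 -> 17.7): about 4.6 points.
effect = predicted_test_score(17.7) - predicted_test_score(19.7)
print(round(effect, 2))                     # prints: 4.56
```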
Why Use the OLS Estimator?

There are both practical and theoretical reasons to use the OLS estimators β̂₀ and β̂₁. Because OLS is the dominant method used in practice, it has become the common language for regression analysis throughout economics, finance (see "The 'Beta' of a Stock" box), and the social sciences more generally. Presenting results using OLS (or its variants discussed later in this book) means that you are "speaking the same language" as other economists and statisticians. The OLS formulas are built into virtually all spreadsheet and statistical software packages, making OLS easy to use.
The "Beta" of a Stock

A fundamental idea of modern finance is that an investor needs a financial incentive to take a risk. Said differently, the expected return¹ on a risky investment, R, must exceed the return on a safe, or risk-free, investment, Rf. Thus the expected excess return, R − Rf, on a risky investment, like owning stock in a company, should be positive.

At first it might seem like the risk of a stock should be measured by its variance. Much of that risk, however, can be reduced by holding other stocks in a "portfolio", in other words, by diversifying your financial holdings. This means that the right way to measure the risk of a stock is not by its variance but rather by its covariance with the market.

The capital asset pricing model (CAPM) formalizes this idea. According to the CAPM, the expected excess return on an asset is proportional to the expected excess return on a portfolio of all available assets (the "market portfolio"). That is, the CAPM says that

   R − Rf = β(Rm − Rf),   (4.12)

where Rm is the expected return on the market portfolio and β is the coefficient in the population regression of R − Rf on Rm − Rf. In practice, the risk-free return is often taken to be the rate of interest on short-term U.S. government debt. According to the CAPM, a stock with a β < 1 has less risk than the market portfolio and therefore has a lower expected excess return than the market portfolio. In contrast, a stock with a β > 1 is riskier than the market portfolio and thus commands a higher expected excess return.

The "beta" of a stock has become a workhorse of the investment industry, and you can obtain estimated betas for hundreds of stocks on investment firm Web sites. Those betas typically are estimated by OLS regression of the actual excess return on the stock against the actual excess return on a broad market index.

The table below gives estimated betas for seven U.S. stocks. Low-risk producers of consumer staples like Kellogg have stocks with low betas; riskier stocks have high betas.

Company | Estimated β
Wal-Mart (discount retailer) | 0.3
Kellogg (breakfast cereal) | 0.5
Waste Management (waste disposal) | 0.6
Verizon (telecommunications) | 0.7
Microsoft (software) | 1.0
Best Buy (electronic equipment retailer) | 1.6
Bank of America (bank) | 2.4

Source: SmartMoney.com.

¹The return on an investment is the change in its price plus any payout (dividend) from the investment as a percentage of its initial price. For example, a stock bought on January 1 for $100, which then paid a $2.50 dividend during the year and sold on December 31 for $105, would have a return of R = [($105 − $100) + $2.50]/$100 = 7.5%.
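The box notes that betas are estimated by an OLS regression of a stock's excess return on the market's excess return. A minimal sketch of that calculation, using invented monthly return series and an assumed risk-free rate (none of these numbers come from the textbook or from real market data):

```python
# Sketch of estimating a stock's "beta": the OLS slope from regressing the
# stock's excess return on the market's excess return. All numbers invented.
def ols_slope(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) \
           / sum((xi - xb) ** 2 for xi in x)

rf = 0.01                                   # assumed per-period risk-free rate
market = [0.05, -0.02, 0.03, 0.07, -0.04]   # market portfolio returns (made up)
stock  = [0.09, -0.05, 0.04, 0.12, -0.08]   # stock returns (made up)

excess_m = [r - rf for r in market]         # Rm - Rf
excess_s = [r - rf for r in stock]          # R - Rf
beta = ols_slope(excess_m, excess_s)
print(round(beta, 2))  # a beta above 1: riskier than the market portfolio
```

In practice, one would use many periods of actual return data and a broad market index; the point here is only that "beta" is nothing more than the OLS slope of Equation (4.7) applied to excess returns.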
The OLS estimators also have desirable theoretical properties. They are analogous to the desirable properties, studied in Section 3.1, of Ȳ as an estimator of the population mean. Under the assumptions introduced in Section 4.4, the OLS estimator is unbiased and consistent. The OLS estimator is also efficient among a certain class of unbiased estimators; however, this efficiency result holds under some additional special conditions, and further discussion of this result is deferred until Section 5.5.

4.3 Measures of Fit

Having estimated a linear regression, you might wonder how well that regression line describes the data. Does the regressor account for much or for little of the variation in the dependent variable? Are the observations tightly clustered around the regression line, or are they spread out? The R² and the standard error of the regression measure how well the OLS regression line fits the data.

The R²

The regression R² is the fraction of the sample variance of Yᵢ explained by (or predicted by) Xᵢ. The R² ranges between 0 and 1 and measures the fraction of the variance of Yᵢ that is explained by Xᵢ.

The definitions of the predicted value and the residual (see Key Concept 4.2) allow us to write the dependent variable Yᵢ as the sum of the predicted value, Ŷᵢ, plus the residual ûᵢ:

   Yᵢ = Ŷᵢ + ûᵢ.   (4.13)

In this notation, the R² is the ratio of the sample variance of Ŷᵢ to the sample variance of Yᵢ.

Mathematically, the R² can be written as the ratio of the explained sum of squares to the total sum of squares. The explained sum of squares (ESS) is the sum of squared deviations of the predicted values of Yᵢ, Ŷᵢ, from their average, and the total sum of squares (TSS) is the sum of squared deviations of Yᵢ from its average:

   ESS = Σᵢ₌₁ⁿ (Ŷᵢ − Ȳ)²,   (4.14)

   TSS = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)².   (4.15)
Equation (4.14) uses the fact that the sample average of the OLS predicted values equals Ȳ (proven in Appendix 4.3).

The R² is the ratio of the explained sum of squares to the total sum of squares:

   R² = ESS / TSS.   (4.16)

Alternatively, the R² can be written in terms of the fraction of the variance of Yᵢ not explained by Xᵢ. The sum of squared residuals (SSR) is the sum of the squared OLS residuals:

   SSR = Σᵢ₌₁ⁿ ûᵢ².   (4.17)

It is shown in Appendix 4.3 that TSS = ESS + SSR. Thus the R² also can be expressed as 1 minus the ratio of the sum of squared residuals to the total sum of squares:

   R² = 1 − SSR / TSS.   (4.18)

Finally, the R² of the regression of Y on the single regressor X is the square of the correlation coefficient between Y and X.

The R² ranges between 0 and 1. If β̂₁ = 0, then Xᵢ explains none of the variation of Yᵢ, and the predicted value of Yᵢ based on the regression is just the sample average of Yᵢ. In this case, the explained sum of squares is zero and the sum of squared residuals equals the total sum of squares; thus the R² is zero. In contrast, if Xᵢ explains all of the variation of Yᵢ, then Yᵢ = Ŷᵢ for all i and every residual is zero (that is, ûᵢ = 0), so that ESS = TSS and R² = 1. In general, the R² does not take on the extreme values of 0 or 1 but falls somewhere in between. An R² near 1 indicates that the regressor is good at predicting Yᵢ, while an R² near 0 indicates that the regressor is not very good at predicting Yᵢ.

The Standard Error of the Regression

The standard error of the regression (SER) is an estimator of the standard deviation of the regression error uᵢ. The units of uᵢ and Yᵢ are the same, so the SER is a measure of the spread of the observations around the regression line, measured in the units of the dependent variable. For example, if the units of the dependent variable are dollars, then the SER measures the magnitude of a typical deviation from the regression line, that is, the magnitude of a typical regression error, in dollars.
Because the regression errors u₁, …, uₙ are unobserved, the SER is computed using their sample counterparts, the OLS residuals û₁, …, ûₙ. The formula for the SER is

   SER = s_û, where s_û² = (1/(n − 2)) Σᵢ₌₁ⁿ ûᵢ² = SSR / (n − 2),   (4.19)

where the formula for s_û² uses the fact (proven in Appendix 4.3) that the sample average of the OLS residuals is zero.

The formula for the SER in Equation (4.19) is similar to the formula for the sample standard deviation of Y given in Equation (3.7) in Section 3.2, except that Yᵢ − Ȳ in Equation (3.7) is replaced by ûᵢ and the divisor in Equation (3.7) is n − 1, whereas here it is n − 2. The reason for using the divisor n − 2 here (instead of n) is the same as the reason for using the divisor n − 1 in Equation (3.7): It corrects for a slight downward bias introduced because two regression coefficients were estimated. This is called a "degrees of freedom" correction because two coefficients were estimated (β₀ and β₁), so two "degrees of freedom" of the data were lost, and the divisor in this factor is n − 2. (The mathematics behind this is discussed in Section 5.6.) When n is large, the difference between dividing by n, by n − 1, or by n − 2 is negligible.

Application to the Test Score Data

Equation (4.11) reports the regression line, estimated using the California test score data, relating the standardized test score (TestScore) to the student-teacher ratio (STR). The R² of this regression is 0.051, or 5.1%, and the SER is 18.6.

The R² of 0.051 means that the regressor STR explains 5.1% of the variance of the dependent variable TestScore. Figure 4.3 superimposes this regression line on the scatterplot of the TestScore and STR data. As the scatterplot shows, the student-teacher ratio explains some of the variation in test scores, but much variation remains unaccounted for.

The SER of 18.6 means that the standard deviation of the regression residuals is 18.6, where the units are points on the standardized test. Because the standard deviation is a measure of spread, the SER of 18.6 means that there is a large spread of the scatterplot in Figure 4.3 around the regression line as measured in points on the test. This large spread means that predictions of test scores made using only the student-teacher ratio for that district will often be wrong by a large amount.
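The measures of fit above follow mechanically from the OLS residuals. The sketch below (our own illustration on a tiny invented data set, not the California data) implements Equations (4.14) through (4.19) and checks the identity TSS = ESS + SSR:

```python
# Sketch of the measures of fit in Equations (4.14)-(4.19). Data invented.
def fit_measures(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) \
         / sum((xi - xb) ** 2 for xi in x)
    b0 = yb - b1 * xb
    y_hat = [b0 + b1 * xi for xi in x]
    ess = sum((yh - yb) ** 2 for yh in y_hat)               # Equation (4.14)
    tss = sum((yi - yb) ** 2 for yi in y)                   # Equation (4.15)
    ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # Equation (4.17)
    r2 = ess / tss                                          # (4.16); equals 1 - ssr/tss
    ser = (ssr / (n - 2)) ** 0.5                            # (4.19), n - 2 divisor
    return r2, ser, ess, ssr, tss

r2, ser, ess, ssr, tss = fit_measures([1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8])
print(round(r2, 3), round(ser, 3))
print(round(ess + ssr, 6) == round(tss, 6))   # TSS = ESS + SSR; prints: True
```

Note the n − 2 divisor in the SER line, the degrees-of-freedom correction discussed in the text.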
" What the low SER is large) does no. being at other centered on the population = 20 and. or luck on the test. by itself. in the sense that. valuesx of X. represents the other factors that lead test scores at a given district to differ from the prediction based on the population sometimes these other factors lead to better performance and sometimes to worse performance tion the prediction bution of is right. They do. at a given value of class size. or. but they do indicate ratio. on thc linear regression Initially.does tell us IS that other in in school quality unrelated that the studentteacher to the important factors influence test scores. the mean of the distri 1. = x. In other words. E(u.4. and the error term u. As shown in Figure 4. have natural interpretations. > 0) iu. the distribution has a mean of zero. these assumptions when OL willand model of might will and the sampling scheme under which OLS provides an appropriate the unknown regression ing these assumptions notgive estimator f30 and f3. 4. is essential for understanding of the regression coefficients. is zero. pler notation. regression line at X. but on average over the popula = 20. (II.This assumption is a formal mathematical statement about the "other factors" contained in Iii and asserts that these other factors are unrelated to X. < 0). useful estimates and understand Assumption #1 : The Conditional Distribution of u.IX. stated mathematically. regression line. this is shown as the distribution 01 u. The population regression is the relationship that holds on average between class size and test scores in the population. as well. more generally. In Figure 4. sim E( u.[X. These factors could include dIfferences the student body across districts. Said differently. however..) = o. of It"conditional on X..4 The Least Squares Assumptions This section presents a set of three assumptions coefficients. = x) = 0. 
Given X Has a Mean of Zero The first of the three least squares assumptions is that the conditional distribution of given Xi has a mean of zero. given X. imply that this R. This assumption is illustrated in Figure 4. in somewhat . appear abstract.4. say 20 students than predicted per class. The low R2 and high SER do not tell us ratio alone explains only a small part of the variation in test scores in these data.122 CHAPTER 4 Linear Regression with One Regressor What should we make of this low this regression is low (and the regression is either R2 and large SER? The fact that the R2 of "good" or "bad. given a value of X" the mean of the distribuHi tion of these other factors is zero. differences studentteacher what these factors are.4./.
Figure 4.4 The Conditional Probability Distributions and the Population Regression Line
The figure shows the conditional probability distributions of test scores for districts with class sizes of 15, 20, and 25 students. The mean of the conditional distribution of test scores, given the student-teacher ratio, E(Y | X), is the population regression line β₀ + β₁X. At a given value of X, Y is distributed around the regression line and the error, u = Y − (β₀ + β₁X), has a conditional mean of zero for all values of X.

As shown in Figure 4.4, the assumption that E(uᵢ | Xᵢ) = 0 is equivalent to assuming that the population regression line is the conditional mean of Yᵢ given Xᵢ (a mathematical proof of this is left as an exercise).

The conditional mean of u in a randomized controlled experiment. In a randomized controlled experiment, subjects are randomly assigned to the treatment group (X = 1) or to the control group (X = 0). The random assignment typically is done using a computer program that uses no information about the subject, ensuring that X is distributed independently of all personal characteristics of the subject. Random assignment makes X and u independent, which in turn implies that the conditional mean of u given X is zero.

In observational data, X is not randomly assigned in an experiment. Instead, the best that can be hoped for is that X is as if randomly assigned, in the precise sense that E(uᵢ | Xᵢ) = 0. Whether this assumption holds in a given empirical application with observational data requires careful thought and judgment, and we return to this issue repeatedly.
Correlation and conditional mean. Recall from Section 2.3 that if the conditional mean of one random variable given another is zero, then the two random variables have zero covariance and thus are uncorrelated [Equation (2.27)]. Thus the conditional mean assumption E(uᵢ | Xᵢ) = 0 implies that Xᵢ and uᵢ are uncorrelated, or corr(Xᵢ, uᵢ) = 0. Because correlation is a measure of linear association, this implication does not go the other way; even if Xᵢ and uᵢ are uncorrelated, the conditional mean of uᵢ given Xᵢ might be nonzero. However, if Xᵢ and uᵢ are correlated, then it must be the case that E(uᵢ | Xᵢ) is nonzero. It is therefore often convenient to discuss the conditional mean assumption in terms of possible correlation between Xᵢ and uᵢ: If Xᵢ and uᵢ are correlated, then the conditional mean assumption is violated.

Assumption #2: (Xᵢ, Yᵢ), i = 1, …, n, Are Independently and Identically Distributed

The second least squares assumption is that (Xᵢ, Yᵢ), i = 1, …, n, are independently and identically distributed (i.i.d.) across observations. As discussed in Section 2.5 (Key Concept 2.5), this assumption is a statement about how the sample is drawn. If the observations are drawn by simple random sampling from a single large population, then (Xᵢ, Yᵢ), i = 1, …, n, are i.i.d. For example, let X be the age of a worker and Y be his or her earnings, and imagine drawing a person at random from the population of workers. That randomly drawn person will have a certain age and earnings (that is, X and Y will take on some values). If a sample of n workers is drawn from this population, then (Xᵢ, Yᵢ), i = 1, …, n, necessarily have the same distribution. If they are drawn at random, they are also distributed independently from one observation to the next; that is, they are i.i.d.

The i.i.d. assumption is a reasonable one for many data collection schemes. For example, survey data from a randomly chosen subset of the population typically can be treated as i.i.d.

Not all sampling schemes produce i.i.d. observations on (Xᵢ, Yᵢ), however. One example is when the values of Xᵢ are not drawn from a random sample of the population but rather are set by a researcher as part of an experiment. For example, suppose a horticulturalist wants to study the effects of different organic weeding methods (X) on tomato production (Y) and accordingly grows different plots of tomatoes using different organic weeding techniques. If she picks the techniques (the level of X) to be used on the iᵗʰ plot and applies the same technique to the iᵗʰ plot in all repetitions of the experiment, then the value of Xᵢ does not change from one sample to the next. Thus Xᵢ is nonrandom (although the outcome Yᵢ is random), so the sampling scheme is not i.i.d.
The results presented in this chapter developed for i.i.d. regressors are also true if the regressors are nonrandom. The case of a nonrandom regressor is, however, quite special. For example, modern experimental protocols would have the horticulturalist assign the level of X to the different plots using a computerized random number generator, thereby circumventing any possible bias by the horticulturalist (she might use her favorite weeding method for the tomatoes in the sunniest plot). When this modern experimental protocol is used, the level of X is random and (Xᵢ, Yᵢ) are i.i.d.

Another example of non-i.i.d. sampling is when observations refer to the same unit of observation over time. For example, we might have data on inventory levels (Y) at a firm and the interest rate at which the firm can borrow (X), where these data are collected over time from a specific firm; for example, they might be recorded four times a year (quarterly) for 30 years. This is an example of time series data, and a key feature of time series data is that observations falling close to each other in time are not independent but rather tend to be correlated with each other; if interest rates are low now, they are likely to be low next quarter. This pattern of correlation violates the "independence" part of the i.i.d. assumption. Time series data introduce a set of complications that are best handled after developing the basic tools of regression analysis.

Assumption #3: Large Outliers Are Unlikely

The third least squares assumption is that large outliers, that is, observations with values of Xᵢ, Yᵢ, or both that are far outside the usual range of the data, are unlikely. Large outliers can make OLS regression results misleading. This potential sensitivity of OLS to extreme outliers is illustrated in Figure 4.5 using hypothetical data.

In this book, the assumption that large outliers are unlikely is made mathematically precise by assuming that X and Y have nonzero finite fourth moments: 0 < E(Xᵢ⁴) < ∞ and 0 < E(Yᵢ⁴) < ∞. Another way to state this assumption is that X and Y have finite kurtosis.

The assumption of finite kurtosis is used in the mathematics that justify the large-sample approximations to the distributions of the OLS test statistics. We encountered this assumption in Chapter 3 when discussing the consistency of the sample variance. Specifically, Equation (3.9) states that the sample variance s²_Y is a consistent estimator of the population variance σ²_Y. If Y₁, …, Yₙ are i.i.d. and the fourth moment of Yᵢ is finite, then the law of large numbers in Key Concept 2.6 applies to the average (1/n) Σᵢ₌₁ⁿ (Yᵢ − μ_Y)², a key step in the proof in Appendix 3.3 showing that s²_Y is consistent.

One source of large outliers is data entry errors, such as a typographical error or incorrectly using different units for different observations. Imagine collecting
data on the height of students in meters but inadvertently recording one student's height in centimeters instead. One way to find outliers is to plot your data. If you decide that an outlier is due to a data entry error, then you can either correct the error or, if that is impossible, drop the observation from your data set.

Data entry errors aside, the assumption of finite kurtosis is a plausible one in many applications with economic data. Class size is capped by the physical capacity of a classroom; the best you can do on a standardized test is to get all the questions right, and the worst you can do is to get all the questions wrong. Because class size and test scores have a finite range, they necessarily have finite kurtosis. More generally, commonly used distributions such as the normal distribution have finite fourth moments. Still, as a mathematical matter, some distributions have infinite fourth moments, and this assumption rules out those distributions. If the assumption of finite fourth moments holds, then it is unlikely that statistical inferences using OLS will be dominated by a few observations.

Figure 4.5 The Sensitivity of OLS to Large Outliers
This hypothetical data set has one outlier. The OLS regression line estimated with the outlier shows a strong positive relationship between X and Y, but the OLS regression line estimated without the outlier shows no relationship.

Use of the Least Squares Assumptions

The three least squares assumptions for the linear regression model are summarized in Key Concept 4.3. The least squares assumptions play twin roles, and we return to them repeatedly throughout this textbook.
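The sensitivity illustrated in Figure 4.5 is easy to reproduce numerically. The sketch below (our own illustration with invented data, not the figure's actual data set) fits OLS with and without a single outlier of the data-entry-error variety:

```python
# Illustration of Figure 4.5's point: one large outlier can swamp the OLS slope.
# All data invented; the clean x-y pairs have essentially no relationship.
def ols_slope(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) \
           / sum((xi - xb) ** 2 for xi in x)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.9, 2.1, 2.0, 1.9]        # flat: slope near zero
slope_clean = ols_slope(x, y)

x_out = x + [70.0]                   # one observation far outside the usual range
y_out = y + [2000.0]                 # e.g., a unit error when entering the data
slope_outlier = ols_slope(x_out, y_out)

print(round(slope_clean, 3))         # near 0
print(round(slope_outlier, 1))       # large positive slope, driven by one point
```

Plotting the data, as the text recommends, would reveal the outlier immediately; the regression output alone would not.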
Key Concept 4.3: The Least Squares Assumptions

   Yᵢ = β₀ + β₁Xᵢ + uᵢ, i = 1, …, n, where

1. The error term uᵢ has conditional mean zero given Xᵢ: E(uᵢ | Xᵢ) = 0;
2. (Xᵢ, Yᵢ), i = 1, …, n, are independent and identically distributed (i.i.d.) draws from their joint distribution; and
3. Large outliers are unlikely: Xᵢ and Yᵢ have nonzero finite fourth moments.

The least squares assumptions play twin roles. Their first role is mathematical: If these assumptions hold, then, as is shown in the next section, in large samples the OLS estimators have sampling distributions that are normal. In turn, this large-sample normal distribution lets us develop methods for hypothesis testing and constructing confidence intervals using the OLS estimators.

Their second role is to organize the circumstances that pose difficulties for OLS regression. As we will see, the first least squares assumption is the most important to consider in practice. One reason why the first least squares assumption might not hold in practice is discussed in Chapter 6, and additional reasons are discussed in Section 9.2.

It is also important to consider whether the second assumption holds in an application. Although it plausibly holds in many cross-sectional data sets, the independence assumption is inappropriate for time series data. Therefore, the regression methods developed under assumption 2 require modification for some applications with time series data.

The third assumption serves as a reminder that OLS, just like the sample mean, can be sensitive to large outliers. If your data set contains large outliers, you should examine those outliers carefully to make sure those observations are correctly recorded and belong in the data set.

4.5 Sampling Distribution of the OLS Estimators

Because the OLS estimators β̂₀ and β̂₁ are computed from a randomly drawn sample, the estimators themselves are random variables with a probability distribution, the sampling distribution, that describes the values they could take over different possible random samples. This section presents these sampling distributions.
The Sampling Distribution of the OLS Estimators

Review of the sampling distribution of Ȳ. Recall the discussion in Sections 2.5 and 2.6 about the sampling distribution of the sample average, Ȳ, an estimator of the unknown population mean of Y, μY. Because Ȳ is calculated using a randomly drawn sample, Ȳ is a random variable that takes on different values from one sample to the next; the probability of these different values is summarized in its sampling distribution. Although the sampling distribution of Ȳ can be complicated when the sample size is small, it is possible to make certain statements about it that hold for all n. In particular, the mean of the sampling distribution is μY, that is, E(Ȳ) = μY, so Ȳ is an unbiased estimator of μY. If n is large, then more can be said about the sampling distribution: In particular, the central limit theorem (Section 2.6) states that this distribution is approximately normal.

The sampling distribution of β̂0 and β̂1. These ideas carry over to the OLS estimators β̂0 and β̂1 of the unknown intercept β0 and slope β1 of the population regression line. Because the OLS estimators are calculated using a random sample, β̂0 and β̂1 are random variables that take on different values from one sample to the next; the probability of these different values is summarized in their sampling distributions.

Although the sampling distribution of β̂0 and β̂1 can be complicated when the sample size is small, it is possible to make certain statements about it that hold for all n. In particular, the means of the sampling distributions of β̂0 and β̂1 are β0 and β1; that is, under the least squares assumptions in Key Concept 4.3,

E(β̂0) = β0 and E(β̂1) = β1;   (4.20)

that is, β̂0 and β̂1 are unbiased estimators of β0 and β1. The proof that β̂1 is unbiased is given in Appendix 4.3, and the proof that β̂0 is unbiased is left as Exercise 4.7.

If the sample is sufficiently large, by the central limit theorem the sampling distribution of β̂0 and β̂1 is well approximated by the bivariate normal distribution (Section 2.4). This implies that the marginal distributions of β̂0 and β̂1 are normal in large samples. This argument invokes the central limit theorem. Technically, the central limit theorem concerns the distribution of averages (like Ȳ). If you examine the numerator in Equation (4.7) for β̂1, you will see that it, too, is a type of average: not a simple average, like Ȳ, but an average of the product (Yi − Ȳ)(Xi − X̄). As discussed further in Appendix 4.3, the central limit theorem applies to this average, so that, like the simpler average Ȳ, it is normally distributed in large samples.
KEY CONCEPT 4.4
Large-Sample Distributions of β̂0 and β̂1

If the least squares assumptions in Key Concept 4.3 hold, then in large samples β̂0 and β̂1 have a jointly normal sampling distribution. The large-sample normal distribution of β̂1 is N(β1, σ²β̂1), where the variance of this distribution, σ²β̂1, is

σ²β̂1 = (1/n) var[(Xi − μX)ui] / [var(Xi)]².   (4.21)

The large-sample normal distribution of β̂0 is N(β0, σ²β̂0), where

σ²β̂0 = (1/n) var(Hiui) / [E(Hi²)]², where Hi = 1 − [μX / E(Xi²)]Xi.   (4.22)

The results in Key Concept 4.4 imply that the OLS estimators are consistent; that is, when the sample size is large, β̂0 and β̂1 will be close to the true population coefficients β0 and β1 with high probability. This is because the variances σ²β̂0 and σ²β̂1 of the estimators decrease to zero as n increases (n appears in the denominator of the formulas for the variances), so the distribution of the OLS estimators will be tightly concentrated around their means. (Appendix 4.3 summarizes the derivation of these formulas.)

A relevant question in practice is how large n must be for these approximations to be reliable. In Section 2.6, we suggested that n = 100 is sufficiently large for the sampling distribution of Ȳ to be well approximated by a normal distribution, and sometimes smaller n suffices. This criterion carries over to the more complicated averages appearing in regression analysis. In virtually all modern econometric applications, n > 100, so we will treat the normal approximations to the distributions of the OLS estimators as reliable unless there are good reasons to think otherwise.

Another implication of the distributions in Key Concept 4.4 is that, in general, the larger is the variance of Xi, the smaller is the variance σ²β̂1 of β̂1. Mathematically, this implication arises because the variance of β̂1 in Equation (4.21) is inversely proportional to the square of the variance of Xi: the larger is var(Xi), the larger is the denominator in Equation (4.21), so the smaller is σ²β̂1. To get a better sense of why this is so, look at Figure 4.6, which presents a scatterplot of 150 artificial data points on X and Y. The data points indicated by the colored dots are the 75 observations closest to X̄. Suppose you were asked to draw a line as accurately as possible through either the colored or the black dots; which would you choose? It would be easier to draw a precise line through the black dots, which have a larger variance than the colored dots. Similarly, the larger the variance of X, the more precise is β̂1.

FIGURE 4.6 The Variance of β̂1 and the Variance of X

The colored dots represent a set of Xi's with a small variance. The black dots represent a set of Xi's with a large variance. The regression line can be estimated more accurately with the black dots than with the colored dots. [Scatterplot omitted: X runs from 97 to 103 and Y from roughly 194 to 206.]

The distributions in Key Concept 4.4 also imply that the smaller is the variance of the error ui, the smaller is the variance of β̂1. This can be seen mathematically in Equation (4.21) because ui enters the numerator, but not the denominator, of Equation (4.21): If all ui were smaller by a factor of one-half but the X's did not change, then σβ̂1 would be smaller by a factor of one-half and σ²β̂1 would be smaller by a factor of one-fourth (Exercise 4.13). Stated less mathematically, if the errors are smaller (holding the X's fixed), then the data will have a tighter scatter around the population regression line, so its slope will be estimated more precisely.

The normal approximation to the sampling distribution of β̂0 and β̂1 is a powerful tool. With this approximation in hand, we are able to develop methods for making inferences about the true population values of the regression coefficients using only a sample of data.
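Both implications, that β̂1 is centered at β1 and that a larger var(X) yields a tighter sampling distribution, can be seen in a small Monte Carlo sketch. The simulation design below (true coefficients β0 = 2 and β1 = 3, standard normal errors) is my own choice for illustration, not from the text.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope: sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
            / sum((xi - xbar) ** 2 for xi in x))

def simulate_slopes(sd_x, beta0=2.0, beta1=3.0, n=100, reps=1000):
    """Draw many samples from Y = beta0 + beta1*X + u and collect beta1_hat."""
    slopes = []
    for _ in range(reps):
        x = [random.gauss(0, sd_x) for _ in range(n)]
        y = [beta0 + beta1 * xi + random.gauss(0, 1) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

random.seed(1)
small_var = simulate_slopes(sd_x=0.5)   # regressor with small variance
large_var = simulate_slopes(sd_x=2.0)   # regressor with large variance

# both sampling distributions are centered near the true beta1 = 3 ...
print(statistics.mean(small_var), statistics.mean(large_var))
# ... but the spread is much smaller when var(X) is larger
print(statistics.stdev(small_var), statistics.stdev(large_var))
```

A histogram of either list of slopes would also look bell-shaped, consistent with the large-sample normal approximation in Key Concept 4.4.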
4.6 Conclusion

This chapter has focused on the use of ordinary least squares to estimate the intercept and slope of a population regression line using a sample of n observations on a dependent variable, Y, and a single regressor, X. There are many ways to draw a straight line through a scatterplot, but doing so using OLS has several virtues. If the least squares assumptions hold, then the OLS estimators of the slope and intercept are unbiased, are consistent, and have a sampling distribution with a variance that is inversely proportional to the sample size n. Moreover, if n is large, the sampling distribution of the OLS estimator is normal.

These important properties of the sampling distribution of the OLS estimator hold under the three least squares assumptions. The first assumption is that the error term in the linear regression model has a conditional mean of zero, given the regressor X. This assumption implies that the OLS estimator is unbiased. The second assumption is that (Xi, Yi) are i.i.d., as is the case if the data are collected by simple random sampling. This assumption yields the formula, presented in Key Concept 4.4, for the variance of the sampling distribution of the OLS estimator. The third assumption is that large outliers are unlikely. Stated more formally, X and Y have finite fourth moments (finite kurtosis). The reason for this assumption is that OLS can be unreliable if there are large outliers. Taken together, the three least squares assumptions imply that the OLS estimator is normally distributed in large samples as described in Key Concept 4.4.

The results in this chapter describe the sampling distribution of the OLS estimator. By themselves, however, these results are not sufficient to test a hypothesis about the value of β1 or to construct a confidence interval for β1. Doing so requires an estimator of the standard deviation of the sampling distribution, that is, the standard error of the OLS estimator. This step, moving from the sampling distribution of β̂1 to its standard error, hypothesis tests, and confidence intervals, is taken in the next chapter.

Summary

1. The population regression line, β0 + β1X, is the mean of Y as a function of the value of X. The slope, β1, is the expected change in Y associated with a one-unit change in X. The intercept, β0, determines the level (or height) of the regression line. Key Concept 4.1 summarizes the terminology of the population linear regression model.
2. The population regression line can be estimated using sample observations (Yi, Xi), i = 1, . . . , n, by ordinary least squares (OLS). The OLS estimators of the regression intercept and slope are denoted β̂0 and β̂1.

3. The R² and standard error of the regression (SER) are measures of how close the values of Yi are to the estimated regression line. The R² is between 0 and 1, with a larger value indicating that the Yi's are closer to the line. The standard error of the regression is an estimator of the standard deviation of the regression error ui.

4. There are three key assumptions for the linear regression model: (1) The regression errors, ui, have a mean of zero conditional on the regressors Xi; (2) the sample observations are i.i.d. random draws from the population; and (3) large outliers are unlikely. If these assumptions hold, the OLS estimators β̂0 and β̂1 are (1) unbiased, (2) consistent, and (3) normally distributed when the sample is large.

Key Terms

linear regression model with a single regressor (110), dependent variable (110), independent variable (110), regressor (110), population regression line (110), population regression function (110), population intercept (110), population slope (110), population coefficients (110), parameters (110), error term (110), ordinary least squares (OLS) estimators (114), OLS regression line (114), sample regression line (114), sample regression function (114), predicted value (114), residual (115), regression R² (119), explained sum of squares (ESS) (119), total sum of squares (TSS) (119), sum of squared residuals (SSR) (120), standard error of the regression (SER) (120), least squares assumptions (122)

Review the Concepts

4.1 Explain the difference between β̂1 and β1; between the residual ûi and the regression error ui; and between the OLS predicted value Ŷi and E(Yi | Xi).

4.2 For each least squares assumption, provide an example in which the assumption is valid, and then provide an example in which the assumption fails.
4.3 Sketch a hypothetical scatterplot of data for an estimated regression with R² = 0.9. Sketch a hypothetical scatterplot of data for a regression with R² = 0.5.

Exercises

4.1 Suppose that a researcher, using data on class size (CS) and average test scores from 100 third-grade classes, estimates the OLS regression

TestScore-hat = 520.4 − 5.82 × CS, R² = 0.08, SER = 11.5.

a. A classroom has 22 students. What is the regression's prediction for that classroom's average test score?
b. Last year a classroom had 19 students, and this year it has 23 students. What is the regression's prediction for the change in the classroom average test score?
c. The sample average class size across the 100 classrooms is 21.4. What is the sample average of the test scores across the 100 classrooms? (Hint: Review the formulas for the OLS estimators.)
d. What is the sample standard deviation of test scores across the 100 classrooms? (Hint: Review the formulas for the R² and SER.)

4.2 Suppose that a random sample of 200 twenty-year-old men is selected from a population and that these men's height and weight are recorded. A regression of weight on height yields

Weight-hat = −99.41 + 3.94 × Height, R² = 0.81, SER = 10.2,

where Weight is measured in pounds and Height is measured in inches.

a. What is the regression's weight prediction for someone who is 70 in. tall? 65 in. tall? 74 in. tall?
b. A man has a late growth spurt and grows 1.5 in. over the course of a year. What is the regression's prediction for the increase in this man's weight?
c. Suppose that instead of measuring weight and height in pounds and inches these variables are measured in centimeters and kilograms. What are the regression estimates from this new centimeter-kilogram regression? (Give all results: estimated coefficients, R², and SER.)
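The arithmetic behind predictions like those in Exercise 4.1(a) through (c) is a direct application of the fitted line. The short sketch below uses the coefficients reported in the exercise; part (c) relies on the fact that the OLS line passes through the point of sample means.

```python
def predict_test_score(class_size):
    """Fitted line from Exercise 4.1: TestScore-hat = 520.4 - 5.82 * CS."""
    return 520.4 - 5.82 * class_size

# (a) prediction for a classroom of 22 students
pred_22 = predict_test_score(22)

# (b) predicted change when class size rises from 19 to 23 students
change = predict_test_score(23) - predict_test_score(19)

# (c) the OLS line passes through the sample means, so the sample mean of
# test scores equals the prediction at the mean class size of 21.4
mean_score = predict_test_score(21.4)

print(round(pred_22, 2), round(change, 2), round(mean_score, 2))
```

Note that the predicted change in (b) depends only on the slope times the change in class size, not on the intercept.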
4.3 A regression of average weekly earnings (AWE, measured in dollars) on age (measured in years) using a random sample of college-educated full-time workers aged 25-65 yields the following:

AWE-hat = 696.7 + 9.6 × Age, R² = 0.023, SER = 624.1.

a. Explain what the coefficient values 696.7 and 9.6 mean.
b. The standard error of the regression (SER) is 624.1. What are the units of measurement for the SER? (Dollars? Years? Or is SER unit-free?)
c. The regression R² is 0.023. What are the units of measurement for the R²? (Dollars? Years? Or is R² unit-free?)
d. What is the regression's predicted earnings for a 25-year-old worker? A 45-year-old worker?
e. Will the regression give reliable predictions for a 99-year-old worker? Why or why not?
f. Given what you know about the distribution of earnings, do you think it is plausible that the distribution of errors in the regression is normal? (Hint: Do you think that the distribution is symmetric or skewed? What is the smallest value of earnings, and is it consistent with a normal distribution?)
g. The average age in this sample is 41.6 years. What is the average value of AWE in the sample? (Hint: Review Key Concept 4.2.)

4.4 Read the box "The 'Beta' of a Stock" in Section 4.2.

a. Suppose that the value of β is greater than 1 for a particular stock. Show that the variance of (R − Rf) for this stock is greater than the variance of (Rm − Rf).
b. Suppose that the value of β is less than 1 for a particular stock. Is it possible that the variance of (R − Rf) for this stock is greater than the variance of (Rm − Rf)? (Hint: Don't forget the regression error.)
c. In a given year, the rate of return on 3-month Treasury bills is 3.5% and the rate of return on a large diversified portfolio of stocks (the S&P 500) is 7.3%. For each company listed in the table in the box, use the estimated value of β to estimate the stock's expected rate of return.

4.5 A professor decides to run an experiment to measure the effect of time pressure on final exam scores. He gives each of the 400 students in his course the same final exam, but some students have 90 minutes to complete the exam while others have 120 minutes. Each student is randomly assigned one of the examination times based on the flip of a coin.
Let Yi denote the number of points scored on the exam by the ith student (0 ≤ Yi ≤ 100), let Xi denote the amount of time that the student has to complete the exam (Xi = 90 or 120), and consider the regression model Yi = β0 + β1Xi + ui.

a. Explain what the term ui represents. Why will different students have different values of ui?
b. Explain why E(ui | Xi) = 0 for this regression model.
c. Are the other assumptions in Key Concept 4.3 satisfied? Explain.
d. The estimated regression is Ŷi = 49 + 0.24Xi.
   i. Compute the estimated regression's prediction for the average score of students given 90 minutes to complete the exam. Repeat for 120 minutes and 150 minutes.
   ii. Compute the estimated gain in score for a student who is given an additional 10 minutes on the exam.

4.6 Show that the first least squares assumption, E(ui | Xi) = 0, implies that E(Yi | Xi) = β0 + β1Xi.

4.7 Show that β̂0 is an unbiased estimator of β0. (Hint: Use the fact that β̂1 is unbiased, which is shown in Appendix 4.3.)

4.8 Suppose that all of the least squares assumptions in Key Concept 4.3 are satisfied except that the first assumption is replaced with E(ui | Xi) = 2. Which parts of Key Concept 4.4 continue to hold? Which change? Why? (Is β̂1 normally distributed in large samples with mean and variance given in Key Concept 4.4? What about β̂0?)

4.9 a. A linear regression yields β̂1 = 0. Show that R² = 0.
    b. A linear regression yields R² = 0. Does this imply that β̂1 = 0?

4.10 Suppose that Yi = β0 + β1Xi + ui, where (Xi, ui) are i.i.d. and Xi is a Bernoulli random variable with Pr(X = 1) = 0.20. When X = 1, ui is N(0, 4); when X = 0, ui is N(0, 1).

a. Show that the regression assumptions in Key Concept 4.3 are satisfied.
b. Derive an expression for the large-sample variance of β̂1. [Hint: Evaluate the terms in Equation (4.21).]

4.11 Consider the regression model Yi = β0 + β1Xi + ui.

a. Suppose you know that β0 = 0. Derive a formula for the least squares estimator of β1.
b. Suppose you know that β0 = 4. Derive a formula for the least squares estimator of β1.

4.12 a. Show that the regression R² in the regression of Y on X is the squared value of the sample correlation between X and Y; that is, show that R² = r²XY.
     b. Show that the R² from the regression of Y on X is the same as the R² from the regression of X on Y.
     c. Show that β̂1 = rXY(sY/sX), where rXY is the sample correlation between X and Y, and sX and sY are the sample standard deviations of X and Y.

4.13 Suppose that Yi = β0 + β1Xi + κui, where κ is a nonzero constant and (Yi, Xi) satisfy the three least squares assumptions. Show that the large-sample variance of β̂1 is given by σ²β̂1 = κ² var[(Xi − μX)ui] / {n[var(Xi)]²}. [This is the variance given in Equation (4.21) multiplied by κ².]

4.14 Show that the sample regression line passes through the point (X̄, Ȳ).

Empirical Exercises

E4.1 On the text Web site http://www.pearsonhighered.com/stock_watson/, you will find a data file CPS08 that contains an extended version of the data set used in Table 3.1 for 2008. It contains data for full-time, full-year workers, age 25-34, with a high school diploma or B.A./B.S. as their highest degree. A detailed description is given in CPS08_Description, also available on the Web site. (These are the same data as in CPS92_08 but are limited to the year 2008.) In this exercise, you will investigate the relationship between a worker's age and earnings. (Generally, older workers have more job experience, leading to higher productivity and earnings.)

a. Run a regression of average hourly earnings (AHE) on age (Age). What is the estimated intercept? What is the estimated slope? Use the estimated regression to answer this question: How much do earnings increase as workers age by 1 year?
b. Bob is a 26-year-old worker. Predict Bob's earnings using the estimated regression. Alexis is a 30-year-old worker. Predict Alexis's earnings using the estimated regression.
c. Does age account for a large fraction of the variance in earnings across individuals? Explain.
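The fact asserted in Exercise 4.14, that the OLS line passes through (X̄, Ȳ), is easy to check numerically. The data below are made up for illustration; the check demonstrates, but does not prove, the claim.

```python
def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) computed from the OLS formulas."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    beta1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    beta0 = ybar - beta1 * xbar  # this line is why the claim holds
    return beta0, beta1

x = [2.0, 3.0, 5.0, 9.0, 11.0]
y = [4.0, 7.0, 6.0, 12.0, 15.0]
b0, b1 = ols_fit(x, y)
xbar, ybar = sum(x) / len(x), sum(y) / len(y)

# the fitted value at xbar equals ybar, so (xbar, ybar) lies on the line
print(abs((b0 + b1 * xbar) - ybar) < 1e-12)  # True
```

The algebraic reason is visible in the code: β̂0 is defined as Ȳ − β̂1X̄, so β̂0 + β̂1X̄ = Ȳ by construction.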
E4.2 On the text Web site http://www.pearsonhighered.com/stock_watson/, you will find a data file TeachingRatings that contains data on course evaluations, course characteristics, and professor characteristics for 463 courses at the University of Texas at Austin. A detailed description is given in TeachingRatings_Description, also available on the Web site. One of the characteristics is an index of the professor's "beauty" as rated by a panel of six judges. In this exercise, you will investigate how course evaluations are related to the professor's beauty.

a. Construct a scatterplot of average course evaluations (Course_Eval) on the professor's beauty (Beauty). Does there appear to be a relationship between the variables?
b. Run a regression of average course evaluations (Course_Eval) on the professor's beauty (Beauty). What is the estimated intercept? What is the estimated slope? Explain why the estimated intercept is equal to the sample mean of Course_Eval. (Hint: What is the sample mean of Beauty?)
c. Professor Watson has an average value of Beauty, while Professor Stock's value of Beauty is one standard deviation above the average. Predict Professor Stock's and Professor Watson's course evaluations.
d. Comment on the size of the regression's slope. Is the estimated effect of Beauty on Course_Eval large or small? Explain what you mean by "large" and "small."
e. Does Beauty explain a large fraction of the variance in evaluations across courses? Explain.

[These data were provided by Professor Daniel Hamermesh of the University of Texas at Austin and were used in his paper with Amy Parker, "Beauty in the Classroom: Instructors' Pulchritude and Putative Pedagogical Productivity," Economics of Education Review, August 2005, 24(4): 369-376.]

E4.3 On the text Web site http://www.pearsonhighered.com/stock_watson/, you will find a data file CollegeDistance that contains data from a random sample of high school seniors interviewed in 1980 and re-interviewed in 1986. A detailed description is given in CollegeDistance_Description, also available on the Web site. In this exercise, you will use these data to investigate the relationship between the number of completed years of education for young adults and the distance from each student's high school to the nearest four-year college. (Proximity to college lowers the cost of education, so that students who live closer to a four-year college should, on average, complete more years of higher education.)

a. Run a regression of years of completed education (ED) on distance to the nearest college (Dist), where Dist is measured in tens of miles. (For example, Dist = 2 means that the distance is 20 miles.) What is the estimated intercept? What is the estimated slope? Use the estimated regression to answer this question: How does the average value of years of completed schooling change when colleges are built close to where students go to high school?
b. Bob's high school was 20 miles from the nearest college. Predict Bob's years of completed education using the estimated regression. How would the prediction change if Bob lived 10 miles from the nearest college?
c. Does distance to college explain a large fraction of the variance in educational attainment across individuals? Explain.
d. What is the value of the standard error of the regression? What are the units for the standard error (meters, grams, years, dollars, cents, or something else)?

[These data were provided by Professor Cecilia Rouse of Princeton University and were used in her paper "Democratization or Diversion? The Effect of Community Colleges on Educational Attainment," Journal of Business and Economic Statistics, April 1995, 12(2): 217-224.]

E4.4 On the text Web site http://www.pearsonhighered.com/stock_watson/, you will find a data file Growth that contains data on average growth rates from 1960 through 1995 for 65 countries along with variables that are potentially related to growth. A detailed description is given in Growth_Description, also available on the Web site. In this exercise, you will investigate the relationship between growth and trade.

a. Construct a scatterplot of average annual growth rate (Growth) on the average trade share (TradeShare). Does there appear to be a relationship between the variables?
b. One country, Malta, has a trade share much larger than the other countries. Find Malta on the scatterplot. Does Malta look like an outlier?
c. Using all observations, run a regression of Growth on TradeShare. What is the estimated slope? What is the estimated intercept? Use the regression to predict the growth rate for a country with a trade share of 0.5 and with a trade share equal to 1.0.
d. Estimate the same regression excluding the data from Malta. Answer the same questions in c.
e. Where is Malta? Why is the Malta trade share so large? Should Malta be included or excluded from the analysis?

[These data were provided by Professor Ross Levine of Brown University and were used in his paper with Thorsten Beck and Norman Loayza, "Finance and the Sources of Growth," Journal of Financial Economics, 2000, 58: 261-300.]

APPENDIX 4.1 The California Test Score Data Set

The California Standardized Testing and Reporting data set contains data on test performance, school characteristics, and student demographic backgrounds. The data used here are from all 420 K-6 and K-8 districts in California with data available for 1999. Test scores are the average of the reading and math scores on the Stanford 9 Achievement Test, a standardized test administered to fifth-grade students. School characteristics (averaged across the district) include enrollment, number of teachers (measured as "full-time equivalents"), number of computers per classroom, and expenditures per student. The student-teacher ratio used here is the number of students in the district divided by the number of full-time equivalent teachers. Demographic variables for the students also are averaged across the district. The demographic variables include the percentage of students who are in the public assistance program CalWorks (formerly AFDC), the percentage of students who qualify for a reduced-price lunch, and the percentage of students who are English learners (that is, students for whom English is a second language). All of these data were obtained from the California Department of Education (www.cde.ca.gov).

APPENDIX 4.2 Derivation of the OLS Estimators

This appendix uses calculus to derive the formulas for the OLS estimators given in Key Concept 4.2. To minimize the sum of squared prediction mistakes Σ(Yi − b0 − b1Xi)² [Equation (4.6)], first take the partial derivatives with respect to b0 and b1:

∂/∂b0 Σ(Yi − b0 − b1Xi)² = −2 Σ(Yi − b0 − b1Xi) and   (4.23)

∂/∂b1 Σ(Yi − b0 − b1Xi)² = −2 Σ Xi(Yi − b0 − b1Xi),   (4.24)

where all summations run over i = 1, . . . , n.
The OLS estimators, β̂0 and β̂1, are the values of b0 and b1 that minimize Σ(Yi − b0 − b1Xi)²; equivalently, they are the values of b0 and b1 for which the derivatives in Equations (4.23) and (4.24) equal zero. Accordingly, setting these derivatives equal to zero, collecting terms, and dividing by n shows that the OLS estimators, β̂0 and β̂1, must satisfy the two equations

Ȳ − β̂0 − β̂1X̄ = 0 and   (4.25)

(1/n) ΣXiYi − β̂0X̄ − β̂1 (1/n) ΣXi² = 0.   (4.26)

Solving this pair of equations for β̂0 and β̂1 yields

β̂1 = [(1/n) ΣXiYi − X̄Ȳ] / [(1/n) ΣXi² − (X̄)²] = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and   (4.27)

β̂0 = Ȳ − β̂1X̄.   (4.28)

Equations (4.27) and (4.28) are the formulas for β̂0 and β̂1 given in Key Concept 4.2; the formula β̂1 = sXY/s²X is obtained by dividing the numerator and denominator in Equation (4.27) by n − 1.

APPENDIX 4.3 Sampling Distribution of the OLS Estimator

In this appendix, we show that the OLS estimator β̂1 is unbiased and, in large samples, has the normal sampling distribution given in Key Concept 4.4.

Representation of β̂1 in Terms of the Regressors and Errors

We start by providing an expression for β̂1 in terms of the regressors and errors. Because Yi = β0 + β1Xi + ui, Yi − Ȳ = β1(Xi − X̄) + (ui − ū), so the numerator of the formula for β̂1 in Equation (4.27) is

Σ(Xi − X̄)(Yi − Ȳ) = Σ(Xi − X̄)[β1(Xi − X̄) + (ui − ū)] = β1 Σ(Xi − X̄)² + Σ(Xi − X̄)(ui − ū).   (4.29)
Now Σ(Xi − X̄)(ui − ū) = Σ(Xi − X̄)ui − ū Σ(Xi − X̄) = Σ(Xi − X̄)ui, where the final equality follows from the definition of X̄, which implies that Σ(Xi − X̄) = ΣXi − nX̄ = 0. Substituting Σ(Xi − X̄)(ui − ū) = Σ(Xi − X̄)ui into the final expression in Equation (4.29) yields Σ(Xi − X̄)(Yi − Ȳ) = β1 Σ(Xi − X̄)² + Σ(Xi − X̄)ui. Substituting this expression in turn into the formula for β̂1 in Equation (4.27) yields

β̂1 = β1 + [(1/n) Σ(Xi − X̄)ui] / [(1/n) Σ(Xi − X̄)²].   (4.30)

Proof That β̂1 Is Unbiased

The expectation of β̂1 is obtained by taking the expectation of both sides of Equation (4.30). Thus,

E(β̂1) = β1 + E{[(1/n) Σ(Xi − X̄)ui] / [(1/n) Σ(Xi − X̄)²]}
      = β1 + E{E[(1/n) Σ(Xi − X̄)ui | X1, . . . , Xn] / [(1/n) Σ(Xi − X̄)²]} = β1,   (4.31)

where the second equality in Equation (4.31) follows by using the law of iterated expectations (Section 2.3). By the second least squares assumption, ui is distributed independently of X for all observations other than i, so E(ui | X1, . . . , Xn) = E(ui | Xi). By the first least squares assumption, however, E(ui | Xi) = 0. It follows that the conditional expectation in large brackets in the second line of Equation (4.31) is zero, so that E(β̂1 − β1 | X1, . . . , Xn) = 0; that is, β̂1 is conditionally unbiased. Equivalently, by the law of iterated expectations, E(β̂1 − β1) = E[E(β̂1 − β1 | X1, . . . , Xn)] = 0, so that E(β̂1) = β1; that is, β̂1 is unbiased.

Large-Sample Normal Distribution of the OLS Estimator

The large-sample normal approximation to the limiting distribution of β̂1 (Key Concept 4.4) is obtained by considering the behavior of the final term in Equation (4.30).

First consider the numerator of this term. Because X̄ is consistent, if the sample size is large, X̄ is nearly equal to μX. Thus, to a close approximation, the term in the numerator of Equation (4.30) is the sample average v̄, where vi = (Xi − μX)ui. By the first least squares assumption, vi has a mean of zero. By the second least squares assumption, vi is i.i.d. The variance of vi is σ²v = var[(Xi − μX)ui], which, by the third least squares assumption, is nonzero and finite. Therefore, v̄ satisfies all the requirements of the central limit theorem (Key Concept 2.7): v̄/σv̄ is, in large samples, distributed N(0, 1), where σ²v̄ = σ²v/n. Thus the distribution of v̄ is well approximated by the N(0, σ²v/n) distribution.

Next consider the expression in the denominator in Equation (4.30); this is the sample variance of X (except that it divides by n rather than n − 1, which is inconsequential if n is large). As discussed in Section 3.2, the sample variance is a consistent estimator of the population variance, so in large samples it is arbitrarily close to the population variance of X.

Combining these two results, we have that, in large samples, β̂1 − β1 is closely approximated by v̄/var(Xi), so that the sampling distribution of β̂1 is, in large samples, N(β1, σ²β̂1), where σ²β̂1 = var(v̄)/[var(Xi)]² = var[(Xi − μX)ui] / {n[var(Xi)]²}, which is the expression in Equation (4.21).

Some Additional Algebraic Facts About OLS

The OLS residuals and predicted values satisfy

(1/n) Σûi = 0,   (4.32)

(1/n) ΣŶi = Ȳ,   (4.33)

ΣûiXi = 0 and sûX = 0, and   (4.34)

TSS = SSR + ESS.   (4.35)

Equations (4.32) through (4.35) say that the sample average of the OLS residuals is zero; the sample average of the OLS predicted values equals Ȳ; the sample covariance sûX between the OLS residuals and the regressors is zero; and the total sum of squares is the sum of the sum of squared residuals and the explained sum of squares (the TSS, SSR, and ESS are defined in Section 4.3).

To verify Equation (4.32), note that the definition of β̂0 lets us write the OLS residuals as ûi = Yi − β̂0 − β̂1Xi = (Yi − Ȳ) − β̂1(Xi − X̄); thus Σûi = Σ(Yi − Ȳ) − β̂1 Σ(Xi − X̄). But the definitions of Ȳ and X̄ imply that Σ(Yi − Ȳ) = 0 and Σ(Xi − X̄) = 0, so Σûi = 0.
u. so 2:7"'1}j = where the second equality is a consequence of Equation (4. SSR + ESS. note that j L.Y) 1=1 (4.liY.I.(1. .36) is obtained using the formula for (4. where ~l the L.~Ii/'iX:' final equality follows from = 0 by the previous results. This result. "A Lio=ll.:"lUi(X X).36) wbere the final equality in Equation (4. :~')J(X.27). so i=l ±u.~ltJ'i= 0 implies 2:7".~l B. To verify Equation (4. _ j 2:.J' + 2. :y) .X.~lU'J = L.I. Equation (4.)(Y. .=1 /I A Y)'= 2.=1 ... i==l Y)' + 2 2:(y.0=.~. 1/ = II"" Li=lltj(f30 " + f31Xi ) _ " .1 II A 1. .(X. r/ II = 2: (I. = .. (4....::::::====::' . UiXi = L.Y)(X.:y) ±[(I.~li1+ L. implies that Some ffil in Equation sux = O.I "+ ~ iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii~~.32). note that tj= Y. ~I ~l  X)' ~ 0.(1.Sampling Distribution of the OLS Estimator 143 To verify Equation (4.= i"'l " .~l2:(X.y)' II" A _ ~ 2:(y.{30£"i=lUi ". + Uj. combined with the preceding results.33).l.Y).34).35) follows from the previous results and II '" algebra: /I A _ TSS~ 2:(1.37) i=J = SSR + ESS+Z2.+ 1.
CHAPTER 5  Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

This chapter continues the treatment of linear regression with a single regressor. Chapter 4 explained how the OLS estimator β̂₁ of the slope coefficient β₁ differs from one sample to the next; that is, how β̂₁ has a sampling distribution. In this chapter, we show how knowledge of this sampling distribution can be used to make statements about β₁ that accurately summarize the sampling uncertainty. The starting point is the standard error of the OLS estimator, which measures the spread of the sampling distribution of β̂₁. Section 5.1 provides an expression for this standard error (and for the standard error of the OLS estimator of the intercept), then shows how to use β̂₁ and its standard error to test hypotheses. Section 5.2 explains how to construct confidence intervals for β₁. Section 5.3 takes up the special case of a binary regressor.

Sections 5.1 through 5.3 assume that the three least squares assumptions of Chapter 4 hold. If, in addition, some stronger conditions hold, then some stronger results can be derived regarding the distribution of the OLS estimator. One of these stronger conditions is that the errors are homoskedastic, a concept introduced in Section 5.4. Section 5.5 presents the Gauss-Markov theorem, which states that, under certain conditions, OLS is efficient (has the smallest variance) among a certain class of estimators. Section 5.6 discusses the distribution of the OLS estimator when the population distribution of the regression errors is normal.

5.1 Testing Hypotheses About One of the Regression Coefficients

Your client, the superintendent, calls you with a problem. She has an angry taxpayer in her office who asserts that cutting class size will not help boost test scores, so reducing class sizes further is a waste of money: Class sizes are already small enough, the taxpayer claims.

The taxpayer's claim can be rephrased in the language of regression analysis. Because the effect on test scores of a unit change in class size is β_ClassSize, the taxpayer is asserting that the population regression line is flat; that is, that the slope β_ClassSize of the population regression line is zero. Is there, the superintendent asks, evidence in your sample of 420 observations on California school districts that this slope is nonzero? Can you reject the taxpayer's hypothesis that β_ClassSize = 0, or should you accept it, at least tentatively pending further new evidence?
This section discusses tests of hypotheses about the slope β₁ or intercept β₀ of the population regression line. We start by discussing two-sided tests of the slope β₁ in detail, then turn to one-sided tests and to tests of hypotheses regarding the intercept β₀.

[Key Concept 5.1]
General Form of the t-Statistic

In general, the t-statistic has the form

  t = (estimator − hypothesized value) / (standard error of the estimator).    (5.1)

Two-Sided Hypotheses Concerning β₁

The general approach to testing hypotheses about the coefficient β₁ is the same as the approach to testing hypotheses about the population mean, so we begin with a brief review.

Testing hypotheses about the population mean. Recall from Section 3.2 that the null hypothesis that the mean of Y is a specific value μ_Y,0 can be written as H₀: E(Y) = μ_Y,0, and the two-sided alternative is H₁: E(Y) ≠ μ_Y,0.

The test of the null hypothesis against the two-sided alternative proceeds as in the three steps summarized in Key Concept 3.5. The first is to compute the standard error of Ȳ, SE(Ȳ), which is an estimator of the standard deviation of the sampling distribution of Ȳ. The second step is to compute the t-statistic, which has the general form given in Key Concept 5.1; applied here, the t-statistic is t = (Ȳ − μ_Y,0)/SE(Ȳ).

The third step is to compute the p-value, which is the smallest significance level at which the null hypothesis could be rejected, based on the test statistic actually observed. Equivalently, the p-value is the probability of obtaining a statistic, by random sampling variation, at least as different from the null hypothesis value as is the statistic actually observed, assuming that the null hypothesis is correct. Because the t-statistic has a standard normal distribution in large samples under the null hypothesis, the p-value for a two-sided hypothesis test is 2Φ(−|t^act|), where t^act is the value of the t-statistic actually computed and Φ is the standard normal cumulative distribution function tabulated in Appendix Table 1. Alternatively, the third step can be replaced by simply comparing the t-statistic to the critical value appropriate for the test with the desired significance level; for example, a two-sided test with a 5% significance level would reject the null hypothesis if |t^act| > 1.96. In this case, the population mean is said to be statistically significantly different from the hypothesized value at the 5% significance level.
Testing hypotheses about the slope β₁. At a theoretical level, the critical feature justifying the foregoing testing procedure for the population mean is that, in large samples, the sampling distribution of Ȳ is approximately normal. Because β̂₁ also has a normal sampling distribution in large samples, hypotheses about the true value of the slope β₁ can be tested using the same general approach.

The null and alternative hypotheses need to be stated precisely before they can be tested. The angry taxpayer's hypothesis is that β_ClassSize = 0. More generally, under the null hypothesis the true population slope β₁ takes on some specific value, β₁,₀. Under the two-sided alternative, β₁ does not equal β₁,₀. That is, the null hypothesis and the two-sided alternative hypothesis are

  H₀: β₁ = β₁,₀  vs.  H₁: β₁ ≠ β₁,₀  (two-sided alternative).    (5.2)

To test the null hypothesis H₀, we follow the same three steps as for the population mean.

The first step is to compute the standard error of β̂₁, SE(β̂₁). The standard error of β̂₁ is an estimator of σ_β̂₁, the standard deviation of the sampling distribution of β̂₁. Specifically,

  SE(β̂₁) = √(σ̂²_β̂₁),    (5.3)

where

  σ̂²_β̂₁ = (1/n) × { [1/(n − 2)] Σᵢ₌₁ⁿ (Xᵢ − X̄)²ûᵢ² } / { [(1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄)²] ² }.    (5.4)

The estimator of the variance in Equation (5.4) is discussed in Appendix 5.1. Although the formula for σ̂²_β̂₁ is complicated, in applications the standard error is computed by regression software, so it is easy to use in practice.

The second step is to compute the t-statistic,

  t = (β̂₁ − β₁,₀) / SE(β̂₁).    (5.5)

The third step is to compute the p-value, the probability of observing a value of β̂₁ at least as different from β₁,₀ as the estimate actually computed (β̂₁^act), assuming that the null hypothesis is correct. Stated mathematically,

  p-value = Pr_H₀[ |β̂₁ − β₁,₀| > |β̂₁^act − β₁,₀| ]
          = Pr_H₀[ |(β̂₁ − β₁,₀)/SE(β̂₁)| > |(β̂₁^act − β₁,₀)/SE(β̂₁)| ] = Pr_H₀(|t| > |t^act|),    (5.6)

where Pr_H₀ denotes the probability computed under the null hypothesis, the second equality follows by dividing by SE(β̂₁), and t^act is the value of the t-statistic actually computed. Because β̂₁ is approximately normally distributed in large samples, under the null hypothesis the t-statistic is approximately distributed as a standard normal random variable, so in large samples,

  p-value = Pr(|Z| > |t^act|) = 2Φ(−|t^act|).    (5.7)

A p-value of less than 5% provides evidence against the null hypothesis in the sense that, under the null hypothesis, the probability of obtaining a value of β̂₁ at least as far from the null as that actually observed is less than 5%. If so, the null hypothesis is rejected at the 5% significance level.

Alternatively, the hypothesis can be tested at the 5% significance level simply by comparing the value of the t-statistic to ±1.96, the critical value for a two-sided test, and rejecting the null hypothesis at the 5% level if |t^act| > 1.96.

These steps are summarized in Key Concept 5.2.

[Key Concept 5.2]
Testing the Hypothesis β₁ = β₁,₀ Against the Alternative β₁ ≠ β₁,₀

1. Compute the standard error of β̂₁, SE(β̂₁) [Equation (5.3)].
2. Compute the t-statistic [Equation (5.5)].
3. Compute the p-value [Equation (5.7)]. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 or, equivalently, if |t^act| > 1.96.

The standard error and (typically) the t-statistic and p-value testing β₁ = 0 are computed automatically by regression software.
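The three steps of Key Concept 5.2 can be coded directly from Equations (5.3) through (5.7). The sketch below uses the heteroskedasticity-robust variance formula of Equation (5.4) and simulated data (the data-generating numbers are illustrative assumptions, not the California data set).

```python
import math
import numpy as np

def t_test_slope(x, y, beta1_null=0.0):
    """Steps 1-3 of Key Concept 5.2 for the OLS slope."""
    n = len(x)
    xd = x - x.mean()
    b1 = xd @ (y - y.mean()) / (xd @ xd)
    b0 = y.mean() - b1 * x.mean()
    uhat = y - b0 - b1 * x
    # Step 1: robust standard error, Equation (5.4)
    var_b1 = ((xd ** 2 * uhat ** 2).sum() / (n - 2)) / (xd @ xd / n) ** 2 / n
    se = math.sqrt(var_b1)
    # Step 2: t-statistic, Equation (5.5)
    t = (b1 - beta1_null) / se
    # Step 3: p-value, Equation (5.7); 2*Phi(-|t|) = erfc(|t|/sqrt(2))
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return b1, se, t, p

rng = np.random.default_rng(2)
x = rng.uniform(14.0, 26.0, 420)
y = 699.0 - 2.3 * x + rng.normal(0.0, 18.0, 420)
b1, se, t, p = t_test_slope(x, y)
print(p < 0.05)   # the null beta1 = 0 is rejected at the 5% level for these data
```

The identity 2Φ(−|t|) = erfc(|t|/√2) lets the p-value be computed with the standard library alone; regression software reports the same three quantities automatically.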
Reporting regression equations and application to test scores. The OLS regression of the test score against the student-teacher ratio, reported in Equation (4.11), yielded β̂₀ = 698.9 and β̂₁ = −2.28. The standard errors of these estimates are SE(β̂₀) = 10.4 and SE(β̂₁) = 0.52.

Because of the importance of the standard errors, by convention they are included when reporting the estimated OLS coefficients. One compact way to report the standard errors is to place them in parentheses below the respective coefficients of the OLS regression line:

  TestScore = 698.9 − 2.28 × STR,  R² = 0.051,  SER = 18.6.    (5.8)
              (10.4)  (0.52)

Equation (5.8) also reports the regression R² and the standard error of the regression (SER) following the estimated regression line. Thus Equation (5.8) provides the estimated regression line, estimates of the sampling uncertainty of the slope and the intercept (the standard errors), and two measures of the fit of this regression line (the R² and the SER). This is a common format for reporting a single regression equation, and it will be used throughout the rest of this book.

Suppose you wish to test the null hypothesis that the slope β₁ is zero in the population counterpart of Equation (5.8) at the 5% significance level. To do so, construct the t-statistic and compare it to 1.96, the 5% (two-sided) critical value taken from the standard normal distribution. The t-statistic is constructed by substituting the hypothesized value of β₁ under the null hypothesis (zero), the estimated slope, and its standard error from Equation (5.8) into the general formula in Equation (5.5); the result is t^act = (−2.28 − 0)/0.52 = −4.38. This t-statistic exceeds (in absolute value) the 5% two-sided critical value of 1.96, so the null hypothesis is rejected in favor of the two-sided alternative at the 5% significance level.

Alternatively, we can compute the p-value associated with t^act = −4.38. This probability is the area in the tails of the standard normal distribution, as shown in Figure 5.1. This probability is extremely small, approximately 0.00001, or 0.001%. That is, if the null hypothesis β_ClassSize = 0 is true, the probability of obtaining a value of β̂₁ as far from the null as the value we actually obtained is extremely small, less than 0.001%. Because this event is so unlikely, it is reasonable to conclude that the null hypothesis is false.
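The worked calculation can be reproduced in a few lines: with β̂₁ = −2.28 and SE(β̂₁) = 0.52 as reported, form the t-statistic of Equation (5.5) and its two-sided p-value from Equation (5.7), using 2Φ(−|t|) = erfc(|t|/√2).

```python
import math

b1_hat, se_b1, b1_null = -2.28, 0.52, 0.0
t_act = (b1_hat - b1_null) / se_b1
p_value = math.erfc(abs(t_act) / math.sqrt(2.0))   # = 2 * Phi(-|t_act|)

print(round(t_act, 2))      # -4.38
print(p_value < 0.0001)     # True: the p-value is roughly 0.00001, as reported
```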
One-Sided Hypotheses Concerning β₁

The discussion so far has focused on testing the hypothesis that β₁ = β₁,₀ against the hypothesis that β₁ ≠ β₁,₀. This is a two-sided hypothesis test, because under the alternative β₁ could be either larger or smaller than β₁,₀. Sometimes, however, it is appropriate to use a one-sided hypothesis test.
[Figure 5.1 Calculating the p-Value of a Two-Sided Test When t^act = −4.38. The p-value of a two-sided test is the probability that |Z| > |t^act|, where Z is a standard normal random variable and t^act is the value of the t-statistic calculated from the sample. When t^act = −4.38, the p-value is only 0.00001: the area to the left of −4.38 plus the area to the right of +4.38 under the standard normal density.]

For example, in the student-teacher ratio/test score problem, many people think that smaller classes provide a better learning environment. Under that hypothesis, β₁ is negative: Smaller classes lead to higher scores. It might make sense, therefore, to test the null hypothesis that β₁ = 0 (no effect) against the one-sided alternative that β₁ < 0.

For a one-sided test, the null hypothesis and the one-sided alternative hypothesis are

  H₀: β₁ = β₁,₀  vs.  H₁: β₁ < β₁,₀  (one-sided alternative),    (5.9)

where β₁,₀ is the value of β₁ under the null (0 in the student-teacher ratio example) and the alternative is that β₁ is less than β₁,₀. If the alternative is that β₁ is greater than β₁,₀, the inequality in Equation (5.9) is reversed.

Because the null hypothesis is the same for a one- and a two-sided hypothesis test, the construction of the t-statistic is the same. The only difference between a one- and a two-sided hypothesis test is how you interpret the t-statistic. For the one-sided alternative in Equation (5.9), the null hypothesis is rejected against the one-sided alternative for large negative, but not large positive, values of the t-statistic: Instead of rejecting if |t^act| > 1.96, the hypothesis is rejected at the 5% significance level if t^act < −1.645.
B "u on lh~.uion j ke thai a university's secret of ucce s is 10 admit mlemed . the IIlcquUlities in Equations (5. If the alternative i one.lO) If the alternative hYPolhesi is that fJ. upOn reflection this might not nccessaril be newt ormul H 'd dru Undergoing clinical trials actually could prove harmful he u e of pr \ II U I unr 'cognized side effects.llIp" {JI ca ionally.lI) 1lte general approach t testing thi null h pothell " n\l ts of Ihe Ihree stepS Key Concept S.th .150 CHAPTER 5 . In the class size example. rd ncr. so the IJV lu Pr(Z > (""').lgOlll't Ih· n '"J u Iltcrtlotive at the 1% level. How_ ever.m ###BOT_TEXT### then make sure that the faculty stays out of their way and eoe llute dum I' .33 (Ihe critical valuc fro onc 'Id 'd I "1 \ Itil I 1% igni(icance level).' d 10.2 applied 10 fJo (Ihe f nnula for Ihe lanu. When should a onesided test be used? In I rh n hI I III ('Ir bability. Testing Hypotheses About the Int rc pt 130 This discussion has focused on lesting h pot he".(l()(1Il .o (t wll'Sld 'd lit rn 1\ 1\ c). .. 8. however.. This value is less than 2.o vs.. The r·Slalhlic resrln the h pothe I Ih II there is no effect of class size on lest seore [ fJt. pn r empin tl evid nc or h th. we or' remind 'd 01 th r ulu. d d Iternalive hypotheses should be use I nly when ther . even if it initially seems that the rdev nl nit rn 11
First. there are many times that no single hypothesis about a regression coefficient is dominant. Being able to accept or to reject this null hypothesis based on the statistical evidence provides a powerful tool for coping with the uncertainty inherent in using a sample to learn about the population.0 is outside the range ~ iiiiiiiiiiiiiiilliii ""'". the true value of {3. Because the 95% confidence interval (as defined in the first definition) is the set of all values of f31 that are not rejected at the 5% significance level.testing the null hypothesis f3. in 95% of all possible samples.. in 95% of possible samples that might be drawn. reject the true value of {31 in only 5% of all possible samples.3). it follows that the true value of f31 will be contained in the confidence interval in 95 % of all possible samples. The reason these two definitions are equivalent is as follows. however. 5. and instead one would like to know a range of values of the coefficient that are consistent with the data.The 95% confidence interval is then the collection of all the values of {31 that are not rejected. in principle a 95% confidence interval can be computed by testing all possible values of {3J (that is. it is the set of values that cannot be rejected using a twosided hypothesis test with a 5% significance level. It is possible. Second. A hypothesis test with a 5% significance level will. An easier way to construct the confidence interval is to note that the tstatistic will reject the hypothesized value {31. that is. Confidence interval for {3. Yet.2 Confidence Intervals for a Regression Coefficient Because any statistical estimate of the slope {31 necessarily has sampling uncertainty.0 for all values of {3J. we cannot determine the true value of {31 exactly from a sample of data. the confidence interval will contain the true value of f31' Because this interval contains the true value in 95% of all samples. 
= {31."" _ .5.2 Confidence Intervals for a Regression Coefficient 151 Hypothesis tests are useful if you have a specific null hypothesis in mind (as did Our angry taxpayer). Recall that a 95% confidence interval for {31 has two equivalent definitions.o) at the 5% significance level using the tstatistic. by definition. it is an interval that has a 95% probability of containing the true value of f31. it is said to have a confidence level of 95%. will not be rejected.O whenever {31. to use the OLS estimator and its standard error to construct a confidence interval for the slope f31 or for the intercept f3o. This calls for constructing a confidence interval. As in the case of a confidence interval for the population mean (Section 3. But constructing the zstatistic for all values of f31 would take forever. that is...
That is, the 95% confidence interval for β₁ is the interval [β̂₁ − 1.96SE(β̂₁), β̂₁ + 1.96SE(β̂₁)]. This argument parallels the argument used to develop a confidence interval for the population mean. The construction of a confidence interval for β₁ is summarized as Key Concept 5.3.

[Key Concept 5.3]
Confidence Interval for β₁

A 95% two-sided confidence interval for β₁ is an interval that contains the true value of β₁ with a 95% probability; that is, it contains the true value of β₁ in 95% of all possible randomly drawn samples. Equivalently, it is the set of values of β₁ that cannot be rejected by a 5% two-sided hypothesis test. When the sample size is large, it is constructed as

  95% confidence interval for β₁ = [β̂₁ − 1.96SE(β̂₁), β̂₁ + 1.96SE(β̂₁)].    (5.12)

Confidence interval for β₀. A 95% confidence interval for β₀ is constructed as in Key Concept 5.3, with β̂₀ and SE(β̂₀) replacing β̂₁ and SE(β̂₁).

Application to test scores. The OLS regression of the test score against the student-teacher ratio, reported in Equation (5.8), yielded β̂₁ = −2.28 and SE(β̂₁) = 0.52. The 95% two-sided confidence interval for β₁ is {−2.28 ± 1.96 × 0.52}, or −3.30 ≤ β₁ ≤ −1.26. The value β₁ = 0 is not contained in this confidence interval, so (as we knew already from Section 5.1) the hypothesis β₁ = 0 can be rejected at the 5% significance level.

Confidence intervals for predicted effects of changing X. The 95% confidence interval for β₁ can be used to construct a 95% confidence interval for the predicted effect of a general change in X. Consider changing X by a given amount, Δx. The predicted change in Y associated with this change in X is β₁Δx. The population slope β₁ is unknown, but because we can construct a confidence interval for β₁, we can construct a confidence interval for the predicted effect β₁Δx. Because one end of a 95% confidence interval for β₁ is β̂₁ − 1.96SE(β̂₁), the predicted effect of the change Δx using this estimate of β₁ is [β̂₁ − 1.96SE(β̂₁)] × Δx. The other end of the confidence interval is β̂₁ + 1.96SE(β̂₁), and the predicted effect of the change using that estimate is [β̂₁ + 1.96SE(β̂₁)] × Δx. Thus a 95% confidence interval for the effect of changing x by the amount Δx can be expressed as

  95% confidence interval for β₁Δx = [β̂₁Δx − 1.96SE(β̂₁) × Δx, β̂₁Δx + 1.96SE(β̂₁) × Δx].    (5.13)
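Both intervals are simple arithmetic on the reported estimates (β̂₁ = −2.28, SE = 0.52). A sketch, including the sign care needed when Δx is negative (multiplying an interval by a negative number swaps its endpoints):

```python
b1_hat, se_b1 = -2.28, 0.52
ci_lo = b1_hat - 1.96 * se_b1          # lower endpoint, about -3.30
ci_hi = b1_hat + 1.96 * se_b1          # upper endpoint, about -1.26

dx = -2.0                               # contemplated reduction in the student-teacher ratio
effect_lo = ci_hi * dx                  # smallest predicted gain, about 2.52 points
effect_hi = ci_lo * dx                  # largest predicted gain, about 6.60 points
print(ci_lo < 0 < effect_lo)            # beta1 = 0 is excluded; predicted gains are positive
```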
For example, our hypothetical superintendent is contemplating reducing the student-teacher ratio by 2. Her 95% confidence interval for β₁ is −3.30 ≤ β₁ ≤ −1.26, so, using Equation (5.13) with Δx = −2, the predicted effect of the reduction lies between −3.30 × (−2) = 6.60 and −1.26 × (−2) = 2.52. Thus decreasing the student-teacher ratio by 2 is predicted to increase test scores by between 2.52 and 6.60 points, with a 95% confidence level.

5.3 Regression When X Is a Binary Variable

The discussion so far has focused on the case that the regressor is a continuous variable. Regression analysis can also be used when the regressor is binary; that is, when it takes on only two values, 0 or 1. For example, X might be a worker's gender (= 1 if female, = 0 if male), whether a school district is urban or rural (= 1 if urban, = 0 if rural), or whether the district's class size is small or large (= 1 if small, = 0 if large). A binary variable is also called an indicator variable or sometimes a dummy variable.

Interpretation of the Regression Coefficients

The mechanics of regression with a binary regressor are the same as if it is continuous, and it turns out that regression with a binary variable is equivalent to performing a difference of means analysis, as described in Section 3.4. The interpretation of β₁, however, is different.

To see this, suppose you have a variable Dᵢ that equals either 0 or 1, depending on whether the student-teacher ratio is less than 20:

  Dᵢ = 1 if the student-teacher ratio in the iᵗʰ district < 20,
  Dᵢ = 0 if the student-teacher ratio in the iᵗʰ district ≥ 20.    (5.14)

The population regression model with Dᵢ as the regressor is

  Yᵢ = β₀ + β₁Dᵢ + uᵢ,  i = 1, …, n.    (5.15)
This is the same as the regression model with the continuous regressor Xᵢ, except that now the regressor is the binary variable Dᵢ. Because Dᵢ is not continuous, it is not useful to think of β₁ as a slope; indeed, because Dᵢ can take on only two values, there is no "line," so it makes no sense to talk about a slope. Thus we will not refer to β₁ as the slope in Equation (5.15); instead we will simply refer to β₁ as the coefficient multiplying Dᵢ in this regression or, more compactly, the coefficient on Dᵢ.

If β₁ in Equation (5.15) is not a slope, what is it? The best way to interpret β₀ and β₁ in a regression with a binary regressor is to consider, one at a time, the two possible cases, Dᵢ = 0 and Dᵢ = 1. If the student-teacher ratio is high, then Dᵢ = 0, and Equation (5.15) becomes

  Yᵢ = β₀ + uᵢ  (Dᵢ = 0).    (5.16)

Because E(uᵢ|Dᵢ) = 0, the conditional expectation of Yᵢ when Dᵢ = 0 is E(Yᵢ|Dᵢ = 0) = β₀; that is, β₀ is the population mean value of test scores when the student-teacher ratio is high. Similarly, when Dᵢ = 1,

  Yᵢ = β₀ + β₁Dᵢ + uᵢ = β₀ + β₁ + uᵢ  (Dᵢ = 1).    (5.17)

Thus, when Dᵢ = 1, E(Yᵢ|Dᵢ = 1) = β₀ + β₁; that is, β₀ + β₁ is the population mean value of test scores when the student-teacher ratio is low.

Because β₀ + β₁ is the population mean of Yᵢ when Dᵢ = 1 and β₀ is the population mean of Yᵢ when Dᵢ = 0, the difference (β₀ + β₁) − β₀ = β₁ is the difference between these two means. In other words, β₁ is the difference between the conditional expectation of Yᵢ when Dᵢ = 1 and when Dᵢ = 0, or β₁ = E(Yᵢ|Dᵢ = 1) − E(Yᵢ|Dᵢ = 0). In the test score example, β₁ is the difference between the mean test score in districts with low student-teacher ratios and the mean test score in districts with high student-teacher ratios.

Because β₁ is the difference in the population means, it makes sense that the OLS estimator β̂₁ is the difference between the sample averages of Yᵢ in the two groups, and, in fact, this is the case.
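The equivalence between regression on a binary variable and a difference of means analysis can be checked directly: the OLS slope equals the difference of the two group means, and the intercept equals the mean of the D = 0 group. The sketch below uses simulated test scores (the numbers are illustrative assumptions, not the California data).

```python
import numpy as np

rng = np.random.default_rng(3)
d = rng.integers(0, 2, 200)                    # binary regressor
y = 650.0 + 7.0 * d + rng.normal(0.0, 19.0, 200)

dd = d - d.mean()
b1 = dd @ (y - y.mean()) / (dd @ dd)           # OLS coefficient on D
b0 = y.mean() - b1 * d.mean()                  # OLS intercept

diff_means = y[d == 1].mean() - y[d == 0].mean()
print(np.isclose(b1, diff_means))              # coefficient on D = difference in means
print(np.isclose(b0, y[d == 0].mean()))        # intercept = mean of the D = 0 group
```

This holds as an algebraic identity for any data set with a binary regressor, not just in expectation.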
Hypothesis tests and confidence intervals. If the two population means are the same, then β₁ in Equation (5.15) is zero. Thus the null hypothesis that the two population means are the same can be tested against the alternative hypothesis that they differ by testing the null hypothesis β₁ = 0 against the alternative β₁ ≠ 0. This hypothesis can be tested using the procedure outlined in Section 5.1: the null hypothesis can be rejected at the 5% level against the two-sided alternative when the OLS t-statistic t = β̂₁/SE(β̂₁) exceeds 1.96 in absolute value. Similarly, a 95% confidence interval for β₁, constructed as β̂₁ ± 1.96SE(β̂₁) as described in Section 5.2, provides a 95% confidence interval for the difference between the two population means.

Application to test scores. As an example, a regression of the test score against the student-teacher ratio binary variable D defined in Equation (5.14), estimated by OLS using the 420 observations in Figure 4.2, yields

  TestScore = 650.0 + 7.4D,  R² = 0.037,  SER = 18.7,    (5.18)
              (1.3)   (1.8)

where the standard errors of the OLS estimates of the coefficients β₀ and β₁ are given in parentheses below the OLS estimates. Thus the average test score for the subsample with student-teacher ratios greater than or equal to 20 (that is, for which D = 0) is 650.0, and the average test score for the subsample with student-teacher ratios less than 20 (so D = 1) is 650.0 + 7.4 = 657.4. The difference between the sample average test scores for the two groups is 7.4. This is the OLS estimate of β₁, the coefficient on the student-teacher ratio binary variable D.

Is the difference in the population mean test scores in the two groups statistically significantly different from zero at the 5% level? To find out, construct the t-statistic on β₁: t = 7.4/1.8 = 4.04. This exceeds 1.96 in absolute value, so the hypothesis that the population mean test scores in districts with high and low student-teacher ratios is the same can be rejected at the 5% significance level.

The OLS estimator and its standard error can be used to construct a 95% confidence interval for the true difference in means. This is 7.4 ± 1.96 × 1.8 = (3.9, 10.9). This confidence interval excludes β₁ = 0, so that (as we know from the previous paragraph) the hypothesis β₁ = 0 can be rejected at the 5% significance level.

5.4 Heteroskedasticity and Homoskedasticity

Our only assumption about the distribution of uᵢ conditional on Xᵢ is that it has a mean of zero (the first least squares assumption). If, furthermore, the variance of this conditional distribution does not depend on Xᵢ, then the errors are said to be homoskedastic. This section discusses homoskedasticity, its theoretical implications, the simplified formulas for the standard errors of the OLS estimators that arise if the errors are homoskedastic, and the risks you run if you use these simplified formulas in practice.
What Are Heteroskedasticity and Homoskedasticity?

Definitions of heteroskedasticity and homoskedasticity. The error term uᵢ is homoskedastic if the variance of the conditional distribution of uᵢ given Xᵢ, var(uᵢ|Xᵢ = x), is constant for i = 1, …, n and in particular does not depend on x. Otherwise, the error term is heteroskedastic.

As an illustration, return to Figure 4.4. The distribution of the errors uᵢ is shown for various values of x. Because this distribution applies specifically for the indicated value of x, this is the conditional distribution of uᵢ given Xᵢ = x. As drawn in that figure, all these conditional distributions have the same spread; more precisely, the variance of these distributions is the same for the various values of x. That is, in Figure 4.4 the conditional variance of uᵢ given Xᵢ = x does not depend on x, so the errors illustrated in Figure 4.4 are homoskedastic.

In contrast, Figure 5.2 illustrates a case in which the conditional distribution of uᵢ spreads out as x increases. Like Figure 4.4, it shows the conditional distribution of test scores for three different class sizes. For small values of x, this distribution is tight, but for larger values of x, it has a greater spread. Thus in Figure 5.2 the variance of uᵢ given Xᵢ = x increases with x, so the errors in Figure 5.2 are heteroskedastic.

[Figure 5.2 An Example of Heteroskedasticity. Like Figure 4.4, this figure shows the conditional distribution of test scores (vertical axis, 600 to 720) for three different class sizes (student-teacher ratio on the horizontal axis, 15 to 30); unlike Figure 4.4, these distributions become more spread out (have a larger variance) for larger class sizes.]

The definitions of heteroskedasticity and homoskedasticity are summarized in Key Concept 5.4.
[Key Concept 5.4]
Heteroskedasticity and Homoskedasticity

The error term uᵢ is homoskedastic if the variance of the conditional distribution of uᵢ given Xᵢ, var(uᵢ|Xᵢ = x), is constant for i = 1, …, n and in particular does not depend on x. Otherwise, the error term is heteroskedastic.

Example. These terms are a mouthful, and the definitions might seem abstract. To help clarify them with an example, we digress from the student-teacher ratio/test score problem and instead return to the example of earnings of male versus female college graduates considered in the box in Chapter 3, "The Gender Gap in Earnings of College Graduates in the United States." Let MALEᵢ be a binary variable that equals 1 for male college graduates and equals 0 for female graduates. The binary variable regression model relating a college graduate's earnings to his or her gender is

  Earningsᵢ = β₀ + β₁MALEᵢ + uᵢ    (5.19)

for i = 1, …, n. Because the regressor is binary, β₁ is the difference in the population means of the two groups; in this case, the difference in mean earnings between men and women who graduated from college.

The definition of homoskedasticity states that the variance of uᵢ does not depend on the regressor. Here the regressor is MALEᵢ, so at issue is whether the variance of the error term depends on MALEᵢ. In other words, is the variance of the error term the same for men and for women? If so, the error is homoskedastic; if not, it is heteroskedastic.

Deciding whether the variance of uᵢ depends on MALEᵢ requires thinking hard about what the error term actually is. In this regard, it is useful to write Equation (5.19) as two separate equations, one for women and one for men:

  Earningsᵢ = β₀ + uᵢ  (women) and    (5.20)
  Earningsᵢ = β₀ + β₁ + uᵢ  (men).    (5.21)

Thus, for women, uᵢ is the deviation of the iᵗʰ woman's earnings from the population mean earnings for women (β₀), and for men, uᵢ is the deviation of the iᵗʰ man's earnings from the population mean earnings for men (β₀ + β₁). It follows that the statement "the variance of uᵢ does not depend on MALEᵢ" is equivalent to the statement "the variance of earnings is the same for men as it is for women." In other words, in this example the error term is homoskedastic if the variance of the population distribution of earnings is the same for men and women; if these variances differ, the error term is heteroskedastic.
Mathematical Implications of Homoskedasticity The OLS estimators remain unbiased and asymptotically norrnal. ' tIlOy ' e an where s' is giiven ~n 'E f. e 'd' t ibution of earnings is the same for men and women. then there is a specialized formula that can be used fOl}he standard errors 01' ~o and ~ I' 11..3 hold and thc errors are hornoskedas. homoskedasticityonly estimator of the variance of ~I: (hornoskedasticity(522) ' quauon (4. they apply to both the general case of hetcroskcdasticity and the special case of homoskedasticity. if these va ' popu Ianon IS n I'. ditional variance. then the formulas for the variances of ~o and ~ I in Key oncept4. consistent. the OLS estima· tors have sampling distributions that are normal in large samples even if the errors are homoskedastic. the OLS estimators remain unbiased and consistent even if the errors are homoskedastic. n. is the variance formula. then the OLS estimators Po and ~I are efficient among all estimators that are linear in 1'\. . In the special case that X is a binary variable.4 simplify. is discussed in Secti n 5.the square of the standard error 01' /3.. ances differ. Homoskedasticityonly If the err r term is homoskedastic. " h t e V31lanc " e of earnings is the same for men as it is for Wonlen" I .I' 111is result.. ."/ and are unbiased.1.1. under homoskedasticity) i the socalled pooled vanance formula for the difference in means.3 place no restrictions on thecon. and asympt tically normal.. the error term is heteroskedastlc. [f the least squares assumptions in Key Concept 4. 158 CHAPTER 5 ' with a S' Reqressron statement.. the OLS estimator is unbiased. In addition. Because the least squares assumptions in Key Concept 4.. Therefore. . given in Equation (3. tic. if the errors are homoskedastic. .5. 
Because these alternative formulas are derived for the special case that the errors are homoskedastic and do not apply if the errors are heteroskedastic, they will be referred to as the "homoskedasticity-only" formulas for the variance and standard error of the OLS estimators. As the name suggests, if the errors are heteroskedastic, then the homoskedasticity-only standard errors are inappropriate. Specifically, if the errors are heteroskedastic, then the t-statistic computed using the homoskedasticity-only standard error does not have a standard normal distribution, even in large samples. In fact, the correct critical values to use for this homoskedasticity-only t-statistic depend on the precise nature of the heteroskedasticity, so those critical values cannot be tabulated. Similarly, if the errors are heteroskedastic but a confidence interval is constructed as +/-1.96 homoskedasticity-only standard errors, then in general the probability that this interval contains the true value of the coefficient is not 95%, even in large samples.

In contrast, because homoskedasticity is a special case of heteroskedasticity, the estimators of the variances of beta_1-hat and beta_0-hat given in Equations (5.4) and (5.26) produce valid statistical inferences whether the errors are heteroskedastic or homoskedastic. Thus hypothesis tests and confidence intervals based on those standard errors are valid whether or not the errors are heteroskedastic. Because the standard errors we have used so far [that is, those based on Equations (5.4) and (5.26)] lead to statistical inferences that are valid whether or not the errors are heteroskedastic, they are called heteroskedasticity-robust standard errors. Because such formulas were proposed by Eicker (1967), Huber (1967), and White (1980), they are also referred to as Eicker-Huber-White standard errors.

5.4 Heteroskedasticity and Homoskedasticity 159

What Does This Mean in Practice?

Which is more realistic, heteroskedasticity or homoskedasticity? The answer to this question depends on the application. However, the issues can be clarified by returning to the example of the gender gap in earnings among college graduates. Familiarity with how people are paid in the world around us gives some clues as to which assumption is more sensible. For many years, and to a lesser extent today, women were not found in the top-paying jobs: There have always been poorly paid men, but there have rarely been highly paid women. This suggests that the distribution of earnings among women is tighter than among men (see the box in Chapter 3, "The Gender Gap in Earnings of College Graduates in the United States"). In other words, the variance of the error term in Equation (5.20) for women is plausibly less than the variance of the error term in Equation (5.21) for men. Thus the presence of a "glass ceiling" for women's jobs and pay suggests that the error term in the binary variable regression model in Equation (5.19) is heteroskedastic. Unless there are compelling reasons to the contrary, and we can think of none, it makes sense to treat the error term in this example as heteroskedastic.
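The claim that a confidence interval built from homoskedasticity-only standard errors need not cover the true coefficient 95% of the time can be checked by simulation. The sketch below is a hedged illustration with simulated data (the true coefficients and error variances are assumptions of the simulation, not values from the text): a binary regressor whose two groups have very different error variances, with coverage of the +/-1.96 homoskedasticity-only interval compared against the heteroskedasticity-robust one. The robust variance used here is the Eicker-Huber-White form without the small-sample degrees-of-freedom correction that appears in Equation (5.4).

```python
import random, math

random.seed(1)
n, n1 = 100, 10                       # 10 observations with X=1, 90 with X=0
x = [1] * n1 + [0] * (n - n1)
beta0, beta1 = 1.0, 2.0               # true coefficients (simulation assumption)
sd = [3.0 if xi == 1 else 0.5 for xi in x]  # strongly heteroskedastic errors

xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

cover_homo = cover_robust = 0
reps = 2000
for _ in range(reps):
    y = [beta0 + beta1 * xi + random.gauss(0, s) for xi, s in zip(x, sd)]
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # homoskedasticity-only variance: s^2_u / Sxx
    var_homo = (sum(ui ** 2 for ui in u) / (n - 2)) / sxx
    # heteroskedasticity-robust variance (no df correction)
    var_rob = sum((xi - xbar) ** 2 * ui ** 2 for xi, ui in zip(x, u)) / sxx ** 2
    if abs(b1 - beta1) <= 1.96 * math.sqrt(var_homo):
        cover_homo += 1
    if abs(b1 - beta1) <= 1.96 * math.sqrt(var_rob):
        cover_robust += 1

print(cover_homo / reps, cover_robust / reps)
# robust coverage is near 0.95; homoskedasticity-only coverage is far below
```

With this design the homoskedasticity-only interval is badly too narrow because the rare X=1 group carries most of the error variance, exactly the situation the text warns about.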
The Economic Value of a Year of Education: Homoskedasticity or Heteroskedasticity?

On average, workers with more education have higher earnings than workers with less education. But if the best-paying jobs mainly go to the college educated, it might also be that the spread of the distribution of earnings is greater for workers with more education. Does the distribution of earnings spread out as education increases? This is an empirical question, so answering it requires analyzing data.

[Figure 5.3: Scatterplot of Hourly Earnings and Years of Education for 29- to 30-Year-Olds in the United States in 2008. Hourly earnings are plotted against years of education for 2,989 full-time 29- to 30-year-old workers, along with the fitted OLS regression line. The spread around the regression line increases with the years of education, indicating that the regression errors are heteroskedastic.]

Figure 5.3 is a scatterplot of hourly earnings and the number of years of education for a sample of 2,989 full-time workers in the United States in 2008, ages 29 and 30, with between 6 and 18 years of education. The data come from the March 2009 Current Population Survey, which is described in Appendix 3.1.

Figure 5.3 has two striking features. The first is that the mean of the distribution of earnings increases with the number of years of education. This increase is summarized by the OLS regression line,

Earnings-hat = 1.38 + 1.76 YearsEducation,  R^2 = 0.159.   (5.23)
              (1.05)  (0.08)

This line is plotted in Figure 5.3. The coefficient of 1.76 in the OLS regression line means that, on average, hourly earnings increase by $1.76 for each additional year of education. The 95% confidence interval for this coefficient is 1.76 +/- 1.96 x 0.08, or 1.60 to 1.92.

The second striking feature of Figure 5.3 is that the spread of the distribution of earnings increases with the years of education. While some workers with many years of education have low-paying jobs, very few workers with low levels of education have high-paying jobs. In real-world terms, not all college graduates will be earning $50 per hour by the time they are 29, but some will, and workers with only ten years of education have no shot at those jobs. This increase in spread can be quantified by looking at the spread of the residuals around the OLS regression line: The standard deviation of the residuals is roughly $4 for workers with ten years of education, larger for workers with a high school diploma, and larger still, about $12.50, for workers with a college degree. Because the standard deviations of the residuals differ for different levels of education, the variance of the errors in the regression of Equation (5.23) depends on the value of the regressor (the years of education); in other words, the regression errors are heteroskedastic.
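A simple way to see the pattern described in the box is to compare the spread of OLS residuals at low and high values of the regressor. The sketch below uses hypothetical simulated earnings data, not the CPS sample, with an error standard deviation that grows with years of education by construction; it fits OLS and computes the residual standard deviation separately for the lowest and highest education levels.

```python
import random, math

random.seed(2)
# Hypothetical data: the error spread grows with education (heteroskedastic)
data = []
for educ in range(8, 19):                    # 8 to 18 years of education
    for _ in range(200):
        u = random.gauss(0, 0.6 * educ)      # error sd rises with education
        data.append((educ, 1.4 + 1.8 * educ + u))

n = len(data)
xbar = sum(e for e, _ in data) / n
ybar = sum(y for _, y in data) / n
sxx = sum((e - xbar) ** 2 for e, _ in data)
b1 = sum((e - xbar) * (y - ybar) for e, y in data) / sxx
b0 = ybar - b1 * xbar

def resid_sd(level):
    # standard deviation of OLS residuals at one education level
    r = [y - b0 - b1 * e for e, y in data if e == level]
    return math.sqrt(sum(ri ** 2 for ri in r) / len(r))

sd_low, sd_high = resid_sd(8), resid_sd(18)
print(round(sd_low, 2), round(sd_high, 2))   # spread rises with education
```

Binning residuals by the regressor in this way is a crude but useful diagnostic: in a homoskedastic model the binned standard deviations would be roughly equal.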
As this example of modeling earnings illustrates, heteroskedasticity arises in many econometric applications. At a general level, economic theory rarely gives any reason to believe that the errors are homoskedastic. It therefore is prudent to assume that the errors might be heteroskedastic unless you have compelling reasons to believe otherwise.

Practical implications. The main issue of practical relevance in this discussion is whether one should use heteroskedasticity-robust or homoskedasticity-only standard errors. In this regard, it is useful to imagine computing both, then choosing between them. If the homoskedasticity-only and heteroskedasticity-robust standard errors are the same, nothing is lost by using the heteroskedasticity-robust standard errors; if they differ, however, then you should use the more reliable ones that allow for heteroskedasticity. The simplest thing, then, is always to use the heteroskedasticity-robust standard errors.

For historical reasons, many software programs report homoskedasticity-only standard errors as their default setting, so it is up to the user to specify the option of heteroskedasticity-robust standard errors. The details of how to implement heteroskedasticity-robust standard errors depend on the software package you use. All of the empirical examples in this book employ heteroskedasticity-robust standard errors unless explicitly stated otherwise.(1)

(1) In case this book is used in conjunction with other texts, it might be helpful to note that some textbooks add homoskedasticity to the list of least squares assumptions. As just discussed, however, this additional assumption is not needed for the validity of OLS regression analysis as long as heteroskedasticity-robust standard errors are used.

*5.5 The Theoretical Foundations of Ordinary Least Squares(2)

As discussed in Section 4.5, the OLS estimator is unbiased, is consistent, has a variance that is inversely proportional to n, and has a normal sampling distribution when the sample size is large. In addition, under certain conditions the OLS estimator is more efficient than some other candidate estimators. Specifically, if the least squares assumptions hold and if the errors are homoskedastic, then the OLS estimator has the smallest variance of all conditionally unbiased estimators that are linear functions of Y_1, ..., Y_n. This section explains and discusses this result, which is a consequence of the Gauss-Markov theorem.

(2) This section is optional and is not used in later chapters.
The section concludes with a discussion of alternative estimators that are more efficient than OLS when the conditions of the Gauss-Markov theorem do not hold.

Linear Conditionally Unbiased Estimators and the Gauss-Markov Theorem

If the three least squares assumptions (Key Concept 4.3) hold and if the error is homoskedastic, then the OLS estimator has the smallest variance, conditional on X_1, ..., X_n, among all estimators in the class of linear conditionally unbiased estimators; that is, the OLS estimator is the Best Linear conditionally Unbiased Estimator, so it is BLUE. This result extends the result, summarized in Key Concept 3.3, that the sample average Y-bar is the most efficient estimator of the population mean among the class of all estimators that are unbiased and are linear functions (weighted averages) of Y_1, ..., Y_n.

Linear conditionally unbiased estimators. The class of linear conditionally unbiased estimators consists of all estimators of beta_1 that are linear functions of Y_1, ..., Y_n and that are unbiased, conditional on X_1, ..., X_n. That is, if beta_1-tilde is a linear estimator, then it can be written as

beta_1-tilde = sum_{i=1}^{n} a_i Y_i   (beta_1-tilde is linear),   (5.24)

where the weights a_1, ..., a_n can depend on X_1, ..., X_n but not on Y_1, ..., Y_n. The estimator beta_1-tilde is conditionally unbiased if the mean of its conditional sampling distribution, given X_1, ..., X_n, is beta_1. That is, the estimator beta_1-tilde is conditionally unbiased if

E(beta_1-tilde | X_1, ..., X_n) = beta_1   (beta_1-tilde is conditionally unbiased).   (5.25)

The estimator beta_1-tilde is a linear conditionally unbiased estimator if it can be written in the form of Equation (5.24) (it is linear) and if Equation (5.25) holds (it is conditionally unbiased). It is shown in Appendix 5.2 that the OLS estimator is linear and conditionally unbiased.

The Gauss-Markov theorem. The Gauss-Markov theorem states that, under a set of conditions known as the Gauss-Markov conditions, the OLS estimator beta_1-hat has the smallest conditional variance, given X_1, ..., X_n, of all linear conditionally unbiased estimators of beta_1; that is, the OLS estimator is BLUE. The Gauss-Markov conditions, which are stated in Appendix 5.2, are implied by the three least squares assumptions plus the assumption that the errors are homoskedastic.
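Equation (5.24) is concrete for OLS: the OLS slope can be written as sum a_i Y_i with weights a_i = (X_i - X-bar) / sum_j (X_j - X-bar)^2, which depend only on the X's. The sketch below (simulated data; the "true" coefficients are assumptions of the example) verifies this numerically and checks the two weight properties that deliver conditional unbiasedness: sum a_i = 0 and sum a_i X_i = 1, so that E(sum a_i Y_i | X) = beta_0 sum a_i + beta_1 sum a_i X_i = beta_1.

```python
import random

random.seed(3)
n = 50
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 + 0.7 * xi + random.gauss(0, 1) for xi in x]  # assumed line 2 + 0.7X

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# OLS slope computed the usual way
b1_ols = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

# The same slope written as a linear estimator sum(a_i * Y_i), as in Eq. (5.24)
a = [(xi - xbar) / sxx for xi in x]
b1_linear = sum(ai * yi for ai, yi in zip(a, y))

# Weight properties behind conditional unbiasedness
w_sum = sum(a)                                   # should be 0
wx_sum = sum(ai * xi for ai, xi in zip(a, x))    # should be 1
print(b1_ols, b1_linear, w_sum, wx_sum)
```

The two slope computations agree to rounding error because sum (X_i - X-bar)(Y_i - Y-bar) = sum (X_i - X-bar) Y_i; the deviations in X already remove the contribution of Y-bar.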
5.5 The Theoretical Foundations of Ordinary Least Squares 163

Key Concept 5.5
The Gauss-Markov Theorem for beta_1-hat

If the three least squares assumptions in Key Concept 4.3 hold and if errors are homoskedastic, then the OLS estimator beta_1-hat is the Best (most efficient) Linear conditionally Unbiased Estimator (it is BLUE).

The Gauss-Markov theorem is stated in Key Concept 5.5 and proven in Appendix 5.2. It provides a theoretical justification for using OLS. However, the theorem has two important limitations.

Limitations of the Gauss-Markov theorem. The first limitation is that the theorem's conditions might not hold in practice. In particular, if the error term is heteroskedastic, as it often is in economic applications, then the OLS estimator is no longer BLUE. As discussed in Section 5.4, the presence of heteroskedasticity does not pose a threat to inference based on heteroskedasticity-robust standard errors, but it does mean that OLS is no longer the efficient linear conditionally unbiased estimator. An alternative to OLS when there is heteroskedasticity of a known form, called the weighted least squares estimator, is discussed below. The second limitation of the Gauss-Markov theorem is that even if the conditions of the theorem hold, there are other candidate estimators that are not linear and conditionally unbiased; under some conditions, these other estimators are more efficient than OLS.

Regression Estimators Other Than OLS

Under certain conditions, some regression estimators are more efficient than OLS.

The weighted least squares estimator. If the nature of the heteroskedasticity is known, specifically, if the conditional variance of u_i given X_i is known up to a constant factor of proportionality, then it is possible to construct an estimator that has a smaller variance than the OLS estimator. This method, called weighted least squares (WLS), weights the i-th observation by the inverse of the square root of the conditional variance of u_i given X_i. Because of this weighting, the errors in the weighted regression are homoskedastic, so OLS, when applied to the weighted data, is BLUE.
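The efficiency gain from weighting can be seen in a small simulation. The sketch below is a hedged illustration with simulated data and, for simplicity, a model without an intercept (an assumption of the example, not the text's general model). It assumes var(u_i | X_i) is proportional to X_i^2, so dividing the regression through by X_i makes the transformed errors homoskedastic; the WLS slope then has a visibly smaller sampling variance than OLS across repeated samples.

```python
import random

random.seed(4)
reps, n = 3000, 50
beta = 2.0                      # true slope in Y_i = beta*X_i + u_i (assumed)
ols_draws, wls_draws = [], []
for _ in range(reps):
    x = [random.uniform(0.5, 2.5) for _ in range(n)]
    # var(u|X) proportional to X^2: u = X * z with z standard normal
    y = [beta * xi + xi * random.gauss(0, 1) for xi in x]
    # OLS without intercept: minimize sum (Y - b*X)^2
    b_ols = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    # WLS: divide through by X, so Y/X = beta + (u/X) with homoskedastic
    # errors; OLS of Y/X on a constant is just the sample mean of Y/X
    b_wls = sum(yi / xi for xi, yi in zip(x, y)) / n
    ols_draws.append(b_ols)
    wls_draws.append(b_wls)

def var(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

print(var(ols_draws), var(wls_draws))  # WLS sampling variance is smaller
```

Both estimators are unbiased here; the weighting only changes efficiency, which is exactly what the Gauss-Markov argument applied to the transformed regression predicts.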
Although theoretically elegant, the practical problem with weighted least squares is that you must know how the conditional variance of u_i depends on X_i, something that is rarely known in econometric applications. Weighted least squares is therefore used far less frequently than OLS, and further discussion of WLS is deferred to Chapter 17.

The least absolute deviations estimator. As discussed in Section 4.3, the OLS estimator can be sensitive to outliers. If extreme outliers are not rare, then other estimators can be more efficient than OLS and can produce inferences that are more reliable. One such estimator is the least absolute deviations (LAD) estimator, in which the regression coefficients beta_0 and beta_1 are obtained by solving a minimization like that in Equation (4.6), except that the absolute value of the prediction "mistake" is used instead of its square. That is, the LAD estimators of beta_0 and beta_1 are the values of b_0 and b_1 that minimize sum_{i=1}^{n} |Y_i - b_0 - b_1 X_i|. The LAD estimator is less sensitive to large outliers in u than is OLS.

In many economic data sets, severe outliers in u are rare, so use of the LAD estimator, or of other estimators with reduced sensitivity to outliers, is uncommon in applications. Thus the treatment of linear regression throughout the remainder of this text focuses exclusively on least squares methods.

*5.6 Using the t-Statistic in Regression When the Sample Size Is Small(3)

When the sample size is small, the exact distribution of the t-statistic is complicated and depends on the unknown population distribution of the data. If, however, the three least squares assumptions hold, the regression errors are homoskedastic, and the regression errors are normally distributed, then the OLS estimator is normally distributed and the homoskedasticity-only t-statistic has a Student t distribution. These five assumptions (the three least squares assumptions, that the errors are homoskedastic, and that the errors are normally distributed) are collectively called the homoskedastic normal regression assumptions.

(3) This section is optional and is not used in later chapters.

The t-Statistic and the Student t Distribution

Recall from Section 2.4 that the Student t distribution with m degrees of freedom is defined to be the distribution of Z / sqrt(W/m), where Z is a random variable with a standard normal distribution and W is a random variable with a chi-squared distribution
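The LAD estimator's reduced sensitivity to outliers is easy to illustrate. The sketch below (simulated data; the true line and the outlier are assumptions of the example) fits a line by OLS and by LAD. The LAD fit uses a crude profile search rather than a production algorithm: for any trial slope, the intercept minimizing the sum of absolute deviations is the median of Y_i - b_1 X_i, and the slope is then chosen by grid search. After one wild outlier is added, the LAD slope stays near the true value while the OLS slope is pulled away.

```python
import random, statistics

random.seed(5)
n = 50
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]  # true line 1 + 2X
x.append(9.5); y.append(150.0)                          # one wild outlier

def ols_slope(x, y):
    m = len(x); xbar = sum(x) / m; ybar = sum(y) / m
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
           sum((xi - xbar) ** 2 for xi in x)

def lad_slope(x, y):
    # Profile search: for each trial slope the optimal intercept is the
    # median of Y - b1*X; pick the slope minimizing sum |Y - b0 - b1*X|.
    best, best_b1 = float('inf'), None
    b1 = 0.0
    while b1 <= 4.0:
        b0 = statistics.median(yi - b1 * xi for xi, yi in zip(x, y))
        s = sum(abs(yi - b0 - b1 * xi) for xi, yi in zip(x, y))
        if s < best:
            best, best_b1 = s, b1
        b1 += 0.005
    return best_b1

b_ols, b_lad = ols_slope(x, y), lad_slope(x, y)
print(round(b_ols, 2), round(b_lad, 2))
```

The grid search is only a sketch; dedicated LAD (median regression) routines solve the same minimization exactly via linear programming.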
with m degrees of freedom, and Z and W are independently distributed.

Under the homoskedastic normal regression assumptions, the t-statistic testing beta_1 = beta_1,0, computed using the homoskedasticity-only standard error, can be written in this form. Y_i has a normal distribution, conditional on X_1, ..., X_n, and, as discussed in Section 5.5, the OLS estimator is a weighted average of Y_1, ..., Y_n, where the weights depend on X_1, ..., X_n [see Equation (5.32) in Appendix 5.2]. Because a weighted average of independent normal random variables is normally distributed, beta_1-hat has a normal distribution, conditional on X_1, ..., X_n. Thus (beta_1-hat minus beta_1,0) has a normal distribution under the null hypothesis, conditional on X_1, ..., X_n. In addition, the (normalized) homoskedasticity-only variance estimator has a chi-squared distribution with n - 2 degrees of freedom, divided by n - 2, and sigma~^2_beta_1-hat and beta_1-hat are independently distributed. Consequently, under the null hypothesis the homoskedasticity-only t-statistic

t = (beta_1-hat - beta_1,0) / sigma~_beta_1-hat

has a Student t distribution with n - 2 degrees of freedom.

This result is closely related to a result discussed in Section 3.5 in the context of testing for the equality of the means in two samples. In that problem, if the two population distributions are normal with the same variance and if the t-statistic is constructed using the pooled standard error formula [Equation (3.23)], then the (pooled) t-statistic has a Student t distribution. When X is binary, the homoskedasticity-only standard error for beta_1-hat simplifies to the pooled standard error formula for the difference of means. It follows that the result of Section 3.5 is a special case of the result that if the homoskedastic normal regression assumptions hold, then the homoskedasticity-only regression t-statistic has a Student t distribution with n - 2 degrees of freedom (see Exercise 5.10).

Use of the Student t Distribution in Practice

If the regression errors are homoskedastic and normally distributed and if the homoskedasticity-only t-statistic is used, then critical values should be taken from the Student t distribution (Appendix Table 2) instead of the standard normal distribution. Because the difference between the Student t distribution and the normal distribution is negligible if n is moderate or large, this distinction is relevant only if the sample size is small.

In econometric applications, there is rarely a reason to believe that the errors are homoskedastic and normally distributed. Because sample sizes typically are large, however, inference can proceed as described in Sections 5.1 and 5.2: first compute heteroskedasticity-robust standard errors, and then use the standard normal distribution to compute p-values, hypothesis tests, and confidence intervals.
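The gap between Student t and normal critical values, and why it matters only when n is small, can be quantified with a short computation. The sketch below evaluates the Student t density directly from its formula (using the gamma function) and finds the two-sided 5% critical value by bisection; it is standard-library only, so the integration is a simple Simpson rule rather than a library CDF.

```python
import math

def t_pdf(t, df):
    # Student t density with df degrees of freedom
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def prob_between(a, b, df, steps=1000):
    # Simpson's rule for the integral of the density from a to b
    h = (b - a) / steps
    s = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return s * h / 3

def crit_975(df):
    # two-sided 5% critical value c: P(-c <= T <= c) = 0.95
    lo, hi = 1.5, 4.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if prob_between(-mid, mid, df) < 0.95:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (5, 10, 30, 120):
    print(df, round(crit_975(df), 3))
# the critical values shrink toward the normal value 1.96 as df grows
```

For 5 degrees of freedom the critical value is well above 1.96, while by 120 degrees of freedom it is essentially the normal value, which is the practical point of this section.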
5.7 Conclusion

Return for a moment to the problem that started Chapter 4: the superintendent who is considering hiring additional teachers to cut the student-teacher ratio. What have we learned that she might find useful?

Our regression analysis, based on the 420 observations for 1998 in the California test score data set, showed that there was a negative relationship between the student-teacher ratio and test scores: Districts with smaller classes have higher test scores. The coefficient is moderately large, in a practical sense: Districts with two fewer students per teacher have, on average, test scores that are 4.6 points higher. This corresponds to moving a district at the 50th percentile of the distribution of test scores to approximately the 60th percentile.

The coefficient on the student-teacher ratio is statistically significantly different from 0 at the 5% significance level. The population coefficient might be 0, and we might simply have estimated our negative coefficient by random sampling variation. However, the probability of doing so (and of obtaining a t-statistic on beta_1 as large as we did) purely by random variation over potential samples is exceedingly small, approximately 0.001%. A 95% confidence interval for beta_1 is -3.30 <= beta_1 <= -1.26.

This result represents considerable progress toward answering the superintendent's question, yet a nagging concern remains. There is a negative relationship between the student-teacher ratio and test scores, but is this relationship necessarily the causal one that the superintendent needs to make her decision? Districts with lower student-teacher ratios have, on average, higher test scores. But does this mean that reducing the student-teacher ratio will, in fact, increase scores? There is, in fact, reason to worry that it might not.

Hiring more teachers, after all, costs money, so wealthier school districts can better afford smaller classes. But students at wealthier schools also have other advantages over their poorer neighbors, including better facilities, newer books, and better-paid teachers. Moreover, students at wealthier schools tend themselves to come from more affluent families and thus have other advantages not directly associated with their school. For example, California has a large immigrant community; these immigrants tend to be poorer than the overall population, and, in many cases, their children are not native English speakers. It thus might be that our negative estimated relationship between test scores and the student-teacher ratio is a consequence of large classes being found in conjunction with many other factors that are, in fact, the real cause of the lower test scores.

These other factors, or "omitted variables," could mean that the OLS analysis done so far has little value to the superintendent. Indeed, it could be misleading:
Changing the student-teacher ratio alone would not change these other factors that determine a child's performance at school. To address this problem, we need a method that will allow us to isolate the effect on test scores of changing the student-teacher ratio, holding these other factors constant. That method is multiple regression analysis, the topic of Chapters 6 and 7.

Summary

1. Hypothesis testing for regression coefficients is analogous to hypothesis testing for the population mean: Use the t-statistic to calculate the p-values and either accept or reject the null hypothesis. Like a confidence interval for the population mean, a 95% confidence interval for a regression coefficient is computed as the estimator +/-1.96 standard errors.

2. When X is binary, the regression model can be used to estimate and test hypotheses about the difference between the population means of the "X = 0" group and the "X = 1" group.

3. In general, the error u_i is heteroskedastic; that is, the variance of u_i at a given value of X_i, var(u_i | X_i = x), depends on x. A special case is when the error is homoskedastic; that is, var(u_i | X_i = x) is constant. Homoskedasticity-only standard errors do not produce valid statistical inferences when the errors are heteroskedastic, but heteroskedasticity-robust standard errors do.

4. If the three least squares assumptions hold and if the regression errors are homoskedastic, then, as a result of the Gauss-Markov theorem, the OLS estimator is BLUE.

5. If the three least squares assumptions hold, if the regression errors are homoskedastic, and if the regression errors are normally distributed, then the OLS t-statistic computed using homoskedasticity-only standard errors has a Student t distribution when the null hypothesis is true. The difference between the Student t distribution and the normal distribution is negligible if the sample size is moderate or large.
Key Terms

null hypothesis (146)
two-sided alternative hypothesis (146)
standard error of beta_1-hat (146)
t-statistic (146)
p-value (147)
confidence interval for beta_1 (151)
confidence level (151)
indicator variable (153)
dummy variable (153)
coefficient multiplying D_i (154)
coefficient on D_i (154)
heteroskedasticity and homoskedasticity (156)
homoskedasticity-only standard errors (158)
heteroskedasticity-robust standard error (159)
Gauss-Markov theorem (162)
best linear unbiased estimator (BLUE) (163)
weighted least squares (163)
homoskedastic normal regression assumptions (164)
Gauss-Markov conditions (170)

Review the Concepts

5.1 Outline the procedures for computing the p-value of a two-sided test of H0: mu_Y = 0 using an i.i.d. set of observations Y_i, i = 1, ..., n. Outline the procedures for computing the p-value of a two-sided test of H0: beta_1 = 0 in a regression model using an i.i.d. set of observations (Y_i, X_i), i = 1, ..., n.

5.2 Explain how you could use a regression model to estimate the wage gender gap using the data on earnings of men and women. What are the dependent and independent variables?

5.3 Define homoskedasticity and heteroskedasticity. Provide a hypothetical empirical example in which you think the errors would be heteroskedastic and explain your reasoning.

Exercises

5.1 Suppose that a researcher, using data on class size (CS) and average test scores from 100 third-grade classes, estimates the OLS regression

TestScore-hat = 520.4 - 5.82 x CS,  R^2 = 0.08, SER = 11.5.
               (20.4)  (2.21)

a. Construct a 95% confidence interval for beta_1, the regression slope coefficient.

b. Calculate the p-value for the two-sided test of the null hypothesis H0: beta_1 = 0. Do you reject the null hypothesis at the 5% level? At the 1% level?
c. Calculate the p-value for the two-sided test of the null hypothesis H0: beta_1 = -5.6. Without doing any additional calculations, determine whether -5.6 is contained in the 95% confidence interval for beta_1.

d. Construct a 99% confidence interval for beta_0.

5.2 Suppose that a researcher, using wage data on 250 randomly selected male workers and 280 female workers, estimates the OLS regression

Wage-hat = 12.52 + 2.12 x Male,  R^2 = 0.06, SER = 4.2,
           (0.23)  (0.36)

where Wage is measured in dollars per hour and Male is a binary variable that is equal to 1 if the person is a male and 0 if the person is a female. Define the wage gender gap as the difference in mean earnings between men and women.

a. What is the estimated gender gap?

b. Is the estimated gender gap significantly different from zero? (Compute the p-value for testing the null hypothesis that there is no gender gap.)

c. Construct a 95% confidence interval for the gender gap.

d. In the sample, what is the mean wage of women? Of men?

e. Another researcher uses these same data but regresses Wages on Female, a variable that is equal to 1 if the person is female and 0 if the person is male. What are the regression estimates calculated from this regression?

Wage-hat = ___ + ___ x Female,  R^2 = ___, SER = ___.

5.3 Suppose that a random sample of 200 twenty-year-old men is selected from a population and their heights and weights are recorded. A regression of weight on height yields

Weight-hat = -99.41 + 3.94 x Height,  R^2 = 0.81, SER = 10.2,
             (2.15)   (0.31)

where Weight is measured in pounds and Height is measured in inches. A man has a late growth spurt and grows 1.5 inches over the course of a year. Construct a 99% confidence interval for the person's weight gain.
5.4 Read the box "The Economic Value of a Year of Education: Homoskedasticity or Heteroskedasticity?" in Section 5.4. Use the regression reported in Equation (5.23) to answer the following.

a. A randomly selected 30-year-old worker reports an education level of 16 years. What is the worker's expected average hourly earnings?

b. A high school graduate (12 years of education) is contemplating going to a community college for a 2-year degree. How much is this worker's average hourly earnings expected to increase?

c. A high school counselor tells a student that, on average, college graduates earn $10 per hour more than high school graduates. Is this statement consistent with the regression evidence? What range of values is consistent with the regression evidence?

5.5 In the 1980s, Tennessee conducted an experiment in which kindergarten students were randomly assigned to "regular" and "small" classes and given standardized tests at the end of the year. (Regular classes contained approximately 24 students, and small classes contained approximately 15 students.) Suppose that, in the population, the standardized tests have a mean score of 925 points and a standard deviation of 75 points. Let SmallClass denote a binary variable equal to 1 if the student is assigned to a small class and equal to 0 otherwise. A regression of TestScore on SmallClass yields

TestScore-hat = 918.0 + 13.9 x SmallClass,  R^2 = 0.01, SER = 74.6.
                (1.6)   (2.5)

a. Do small classes improve test scores? By how much? Is the effect large? Explain.

b. Is the estimated effect of class size on test scores statistically significant? Carry out a test at the 5% level.

c. Construct a 99% confidence interval for the effect of SmallClass on TestScore.

5.6 Refer to the regression described in Exercise 5.5.

a. Do you think that the regression errors plausibly are homoskedastic? Explain.

b. SE(beta_1-hat) was computed using Equation (5.3). Suppose that the regression errors were homoskedastic: Would this affect the validity of the confidence interval constructed in Exercise 5.5(c)? Explain.
5.7 Suppose that (Y_i, X_i) satisfy the assumptions in Key Concept 4.3. A random sample of size n = 250 is drawn and yields

Y-hat = 5.4 + 3.2X,  R^2 = 0.26, SER = 6.2.
       (3.1) (1.5)

a. Test H0: beta_1 = 0 vs. H1: beta_1 != 0 at the 5% level.

b. Construct a 95% confidence interval for beta_1.

c. Suppose you learned that Y_i and X_i were independent. Would you be surprised? Explain.

d. Suppose that Y_i and X_i are independent and many samples of size n = 250 are drawn, regressions estimated, and (a) and (b) answered. In what fraction of the samples would H0 from (a) be rejected? In what fraction of samples would the value beta_1 = 0 be included in the confidence interval from (b)?

5.8 Suppose that (Y_i, X_i) satisfy the assumptions in Key Concept 4.3 and, in addition, u_i is N(0, sigma^2_u) and is independent of X_i. A sample of size n = 30 yields

Y-hat = 43.2 + 61.5X,  R^2 = 0.54, SER = 1.52,
       (10.2)  (7.4)

where the numbers in parentheses are the homoskedastic-only standard errors for the regression coefficients.

a. Construct a 95% confidence interval for beta_0.

b. Test H0: beta_1 = 55 vs. H1: beta_1 != 55 at the 5% level.

c. Test H0: beta_1 = 55 vs. H1: beta_1 > 55 at the 5% level.

5.9 Consider the regression model Y_i = beta X_i + u_i, where u_i and X_i satisfy the assumptions in Key Concept 4.3. Let beta-bar denote an estimator of beta that is constructed as beta-bar = Y-bar / X-bar, where Y-bar and X-bar are the sample means of Y_i and X_i, respectively.

a. Show that beta-bar is a linear function of Y_1, ..., Y_n.

b. Show that beta-bar is conditionally unbiased.

5.10 Let X_i denote a binary variable and consider the regression Y_i = beta_0 + beta_1 X_i + u_i. Let Y-bar_0 denote the sample mean for observations with X = 0 and let Y-bar_1 denote the sample mean for observations with X = 1. Show that beta_0-hat = Y-bar_0, beta_0-hat + beta_1-hat = Y-bar_1, and beta_1-hat = Y-bar_1 - Y-bar_0.

5.11 A random sample of workers contains n_m = 120 men and n_w = 131 women. The sample average of men's weekly earnings (Y-bar_m = (1/n_m) sum_i Y_m,i) is $523.10, and the sample standard deviation (s_m = sqrt[(1/(n_m - 1)) sum_i (Y_m,i - Y-bar_m)^2]) is $68.10. The corresponding values for women are Y-bar_w = $485.10 and s_w = $51.10. Let Women denote an indicator variable that is equal to 1 for women and 0 for men, and suppose that all 251 observations are used in the regression Y_i = beta_0 + beta_1 Women_i + u_i. Find the OLS estimates of beta_0 and beta_1 and their corresponding standard errors.

5.12 Starting from Equation (4.22), derive the variance of beta_0-hat under homoskedasticity given in Appendix 5.1.

5.13 Suppose that (Y_i, X_i) satisfy the assumptions in Key Concept 4.3 and, in addition, u_i is N(0, sigma^2_u) and is independent of X_i.

a. Is beta_1-hat conditionally unbiased?

b. Is beta_1-hat the best linear conditionally unbiased estimator of beta_1?

c. How would your answers to (a) and (b) change if you assumed only that (Y_i, X_i) satisfied the assumptions in Key Concept 4.3 and var(u_i | X_i = x) is constant?

d. How would your answers to (a) and (b) change if you assumed only that (Y_i, X_i) satisfied the assumptions in Key Concept 4.3?

5.14 Suppose that Y_i = beta X_i + u_i, where (u_i, X_i) satisfy the Gauss-Markov conditions given in Equation (5.31).

a. Derive the least squares estimator of beta and show that it is a linear function of Y_1, ..., Y_n.

b. Show that the estimator is conditionally unbiased.

c. Derive the conditional variance of the estimator.

d. Prove that the estimator is BLUE.

5.15 A researcher has two independent samples of observations on (Y_i, X_i). To be specific, suppose that Y_i denotes earnings, X_i denotes years of schooling, and the independent samples are for men and women. Write the regression for men as Y_m,i = beta_m,0 + beta_m,1 X_m,i + u_m,i and the regression for women as Y_w,i = beta_w,0 + beta_w,1 X_w,i + u_w,i. Let beta_m,1-hat denote the OLS estimator constructed using the sample of men,
Empirical Exercises

E5.1 Using the data set CPS08 described in Empirical Exercise E4.1, run a regression of average hourly earnings (AHE) on Age and carry out the following exercises.

a. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis H0: β1 = 0 versus a two-sided alternative at the 10%, 5%, or 1% significance level? What is the p-value associated with the coefficient's t-statistic?
b. Construct a 95% confidence interval for the slope coefficient.
c. Repeat (a) using only the data for high school graduates.
d. Repeat (a) using only the data for college graduates.
e. Is the effect of age on earnings different for high school graduates than for college graduates? (Hint: See Exercise 5.15.)
E5.2 Using the data set TeachingRatings described in Empirical Exercise E4.2, run a regression of Course_Eval on Beauty. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis H0: β1 = 0 versus a two-sided alternative at the 10%, 5%, or 1% significance level? What is the p-value associated with the coefficient's t-statistic?

E5.3 Using the data set CollegeDistance described in Empirical Exercise E4.3, run a regression of years of completed education (ED) on distance to the nearest college (Dist) and carry out the following exercises.

a. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis H0: β1 = 0 versus a two-sided alternative at the 10%, 5%, or 1% significance level? What is the p-value associated with the coefficient's t-statistic?
b. Construct a 95% confidence interval for the slope coefficient.
c. Run the regression using data only on females and repeat (b).
d. Run the regression using data only on males and repeat (b).
e. Is the effect of distance on completed years of education different for men than for women? (Hint: See Exercise 5.15.)
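Several parts of these empirical exercises ask whether a slope differs across two independent subsamples. By the result of Exercise 5.15, with independent samples the standard error of the difference in slopes is the square root of the sum of the squared standard errors. A minimal sketch of the resulting two-sided test (the slope estimates and standard errors below are invented placeholders, not output from the actual data sets):

```python
import math

# Invented placeholder output from two independent subsample regressions.
b_men, se_men = -0.040, 0.012        # hypothetical slope and SE, male subsample
b_women, se_women = -0.055, 0.014    # hypothetical slope and SE, female subsample

# Exercise 5.15: SE of the difference when the samples are independent.
se_diff = math.sqrt(se_men ** 2 + se_women ** 2)
t_stat = (b_men - b_women) / se_diff
reject_at_5pct = abs(t_stat) > 1.96

print(round(se_diff, 4), round(t_stat, 2), reject_at_5pct)  # 0.0184 0.81 False
```

With these placeholder numbers |t| = 0.81 < 1.96, so the hypothetical difference in slopes would not be statistically significant at the 5% level.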
APPENDIX 5.1 Formulas for OLS Standard Errors

This appendix discusses the formulas for OLS standard errors. These are first presented under the least squares assumptions in Key Concept 4.3, which allow for heteroskedasticity; these are the "heteroskedasticity-robust" standard errors. Formulas for the variances of the OLS estimators and the associated standard errors are then given for the special case of homoskedasticity.

Heteroskedasticity-Robust Standard Errors

The estimator σ̂²β̂1 defined in Equation (5.4) is obtained by replacing the population variances in Equation (4.21) by the corresponding sample variances, with a modification. The variance in the numerator of Equation (4.21) is estimated by (1/(n − 2)) Σi (Xi − X̄)² ûi², where the divisor n − 2 (instead of n) incorporates a degrees-of-freedom adjustment to correct for downward bias, analogously to the degrees-of-freedom adjustment used in the definition of the SER in Section 4.3. The variance in the denominator is estimated by (1/n) Σi (Xi − X̄)². Replacing var[(Xi − μX)ui] and var(Xi) in Equation (4.21) by these two estimators yields σ̂²β̂1 in Equation (5.4). The consistency of heteroskedasticity-robust standard errors is discussed in Section 17.3.

The estimator of the variance of β̂0 is

    σ̂²β̂0 = (1/n) × [ (1/(n − 2)) Σi Ĥi² ûi² ] / [ (1/n) Σi Ĥi² ]²,  where Ĥi = 1 − [X̄ / ((1/n) Σi Xi²)] Xi.   (5.26)

The standard error of β̂0 is SE(β̂0) = √σ̂²β̂0. The reasoning behind the estimator σ̂²β̂0 is the same as the reasoning behind σ̂²β̂1 and stems from replacing population expectations with sample averages.

Homoskedasticity-Only Variances

Under homoskedasticity, the conditional variance of ui given Xi is a constant: var(ui | Xi) = σ²u. If the errors are homoskedastic, the formulas in Key Concept 4.4 simplify to

    σ²β̂1 = σ²u / (n σ²X)   (5.27)

    σ²β̂0 = E(Xi²) σ²u / (n σ²X).   (5.28)
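The heteroskedasticity-robust variance estimator for the slope described above can be computed directly from data. A NumPy sketch, mirroring the numerator/denominator construction just described (the data-generating process here is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0.0, 10.0, size=n)
u = rng.normal(0.0, 1.0 + 0.3 * x)     # heteroskedastic errors (invented DGP)
y = 2.0 + 0.5 * x + u

# OLS estimates and residuals.
xbar = x.mean()
beta1 = np.sum((x - xbar) * (y - y.mean())) / np.sum((x - xbar) ** 2)
beta0 = y.mean() - beta1 * xbar
uhat = y - beta0 - beta1 * x

# Robust variance of the slope: the numerator uses the n - 2
# degrees-of-freedom adjustment; the denominator is the squared
# sample variance of X (divisor n).
num = np.sum((x - xbar) ** 2 * uhat ** 2) / (n - 2)
den = (np.sum((x - xbar) ** 2) / n) ** 2
se_beta1_robust = np.sqrt(num / den / n)
print(round(se_beta1_robust, 4))
```

Algebraically this is the same as [n/(n − 2)] Σ (Xi − X̄)² ûi² / [Σ (Xi − X̄)²]², which for a single regressor coincides with the common "HC1" sandwich estimator.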
To derive Equation (5.27), write the numerator in Equation (4.21) as var[(Xi − μX)ui] = E{[(Xi − μX)ui − E[(Xi − μX)ui]]²} = E{[(Xi − μX)ui]²} = E[(Xi − μX)² ui²] = E[(Xi − μX)² var(ui | Xi)], where the second equality follows because E[(Xi − μX)ui] = 0 (by the first least squares assumption) and where the final equality follows from the law of iterated expectations (Section 2.3). If ui is homoskedastic, then var(ui | Xi) = σ²u, so E[(Xi − μX)² var(ui | Xi)] = σ²u E[(Xi − μX)²] = σ²u σ²X. The result in Equation (5.27) follows by substituting this expression into the numerator of Equation (4.21) and simplifying. A similar calculation yields Equation (5.28).

The homoskedasticity-only standard errors are obtained by substituting sample means and variances for the population means and variances in Equations (5.27) and (5.28) and by estimating the variance of ui by the square of the SER. The homoskedasticity-only estimators of these variances are

    σ̃²β̂1 = s²û / Σi (Xi − X̄)²   (homoskedasticity-only)   (5.29)

    σ̃²β̂0 = [(1/n) Σi Xi²] s²û / Σi (Xi − X̄)²   (homoskedasticity-only)   (5.30)

where s²û is the square of the SER. The homoskedasticity-only standard errors are the square roots of σ̃²β̂0 and σ̃²β̂1.

APPENDIX 5.2 The Gauss-Markov Conditions and a Proof of the Gauss-Markov Theorem

As discussed in Section 5.5, the Gauss-Markov theorem states that if the Gauss-Markov conditions hold, then the OLS estimator is the best (most efficient) conditionally linear unbiased estimator (is BLUE). This appendix begins by stating the Gauss-Markov conditions and showing that they are implied by the three least squares conditions plus homoskedasticity.
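The homoskedasticity-only variance estimators just described replace population moments with sample moments and estimate the error variance by the squared SER. A NumPy sketch on simulated homoskedastic data (the data-generating process is invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(0.0, 1.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)   # homoskedastic errors (invented DGP)

# OLS estimates and residuals.
xbar = x.mean()
beta1 = np.sum((x - xbar) * (y - y.mean())) / np.sum((x - xbar) ** 2)
beta0 = y.mean() - beta1 * xbar
uhat = y - beta0 - beta1 * x

s2 = np.sum(uhat ** 2) / (n - 2)                   # squared SER (n - 2 adjustment)

# Homoskedasticity-only variance estimators for the slope and intercept.
var_b1 = s2 / np.sum((x - xbar) ** 2)
var_b0 = np.mean(x ** 2) * s2 / np.sum((x - xbar) ** 2)

se_b1, se_b0 = np.sqrt(var_b1), np.sqrt(var_b0)
print(round(se_b1, 4), round(se_b0, 4))
```

Note that the intercept variance is just the slope variance scaled by the sample mean of Xi², matching the relationship between the two displayed formulas.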
The Gauss-Markov Conditions

The Gauss-Markov conditions are

    (i)   E(ui | X1, …, Xn) = 0,
    (ii)  var(ui | X1, …, Xn) = σ²u, where 0 < σ²u < ∞, and
    (iii) E(ui uj | X1, …, Xn) = 0, i ≠ j,   (5.31)

where the conditions hold for i, j = 1, …, n. The first condition is that ui has mean zero, conditional on all the observed X's; the second is that the errors are homoskedastic, with a nonzero finite variance; and the third is that the errors are uncorrelated for different observations. These conditions are implied by the three least squares assumptions in Key Concept 4.3 plus homoskedasticity: because the observations are i.i.d., E(ui | X1, …, Xn) = E(ui | Xi), which is zero by the first least squares assumption, and similarly var(ui | X1, …, Xn) = var(ui | Xi), which equals σ²u under homoskedasticity.
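The claim of the Gauss-Markov theorem can be illustrated by simulation. Under the conditions in Equation (5.31), OLS and the "endpoint" estimator (Yn − Y1)/(Xn − X1) are both linear, conditionally unbiased estimators of the slope, but OLS has the smaller variance. A Monte Carlo sketch with all numbers invented:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 30, 2000
x = np.linspace(0.0, 1.0, n)       # regressors held fixed across replications

ols_draws, end_draws = [], []
for _ in range(reps):
    u = rng.normal(0.0, 1.0, size=n)           # Gauss-Markov conditions hold
    y = 1.0 + 2.0 * x + u                      # true slope = 2
    b_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b_end = (y[-1] - y[0]) / (x[-1] - x[0])    # "endpoint" estimator: linear, unbiased
    ols_draws.append(b_ols)
    end_draws.append(b_end)

ols_draws, end_draws = np.array(ols_draws), np.array(end_draws)
# Both estimators are centered near the true slope, but OLS is far less variable.
print(round(ols_draws.mean(), 2), round(end_draws.mean(), 2))
print(ols_draws.var() < end_draws.var())       # True
```

This does not prove the theorem, of course; it only shows, for one alternative linear unbiased estimator, the variance ranking that the theorem guarantees in general.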