Part 1

Cross Sectional Data
•Simple Linear Regression Model – Chapter 2
•Multiple Regression Analysis – Chapters 3 and 4
•Advanced Regression Topics – Chapter 6
•Dummy Variables – Chapter 7
•Note: Appendices A, B, and C are additional review if needed.

2. The Simple Regression Model
2.1 Definition of the Simple Regression Model
2.2 Deriving the Ordinary Least Squares Estimates
2.3 Properties of OLS on Any Sample of Data
2.4 Units of Measurement and Functional Form
2.5 Expected Values and Variances of the OLS Estimators
2.6 Regression through the Origin

2.1 The Simple Regression Model
• Economics is built upon assumptions
-assume people are utility maximizers
-assume perfect information
-assume we have a can opener
• The Simple Regression Model is based on assumptions
-more assumptions are required for more analysis
-disproving assumptions leads to more complicated models

2.1 The Simple Regression Model

$y = \beta_0 + \beta_1 x + u$


-relates two variables (x and y)
-also called the two-variable linear regression
model or bivariate linear regression model
y is the DEPENDENT or EXPLAINED variable
y is a function of x

2.1 The Simple Regression Model
• Recall the SIMPLE LINEAR REGRESSION MODEL:

$y = \beta_0 + \beta_1 x + u$  (2.1)

u is the ERROR TERM or DISTURBANCE variable
-u takes into account all factors other than x that affect y
-u accounts for all “unobserved” impacts on y

2.1 The Simple Regression Model
• Example of the SIMPLE LINEAR REGRESSION MODEL:

$taste = \beta_0 + \beta_1 \, cookingtime + u$  (ie)

-taste depends on cooking time
-taste is explained by cooking time
-taste is a function of cooking time
-u accounts for other factors affecting taste (cooking skill, ingredients available, random luck, differing taste buds, etc.)

2.1 The Simple Regression Model
• The SRM shows how y changes:

$\Delta y = \beta_1 \Delta x$ if $\Delta u = 0$  (2.2)

-for example, if B1=3, a 2 unit increase in x would cause a 6 unit change in y (2 x 3 = 6)
-B1 is the SLOPE PARAMETER
-B0 is the INTERCEPT PARAMETER or CONSTANT TERM
-not always useful in analysis

2.1 The Simple Regression Model

$y = \beta_0 + \beta_1 x + u$  (2.1)

-note that this equation implies CONSTANT returns
-the first x has the same impact on y as the 100th x
-to avoid this we can include powers or change functional forms

2.1 The Simple Regression Model
-in order to achieve a ceteris paribus analysis of x’s effect on y, we need assumptions about u’s relationship with x
-to simplify our assumptions, we first assume that the average of u in the population is zero:

$E(u) = 0$  (2.5)

-if B0 is included in the equation, it can always be modified to make (2.5) true
-ie: if E(u)>0, simply increase B0 by E(u) and subtract the same amount from u

2.1 x, u and Dependence
-we now need to assume that x and u are unrelated
-if x and u are uncorrelated, u may still be correlated with functions of x such as x^2
-we therefore need a stronger assumption:

$E(u \mid x) = E(u) = 0$  (2.6)

-the average value of u does not depend on x
-second equality comes from (2.5)
-called the ZERO CONDITIONAL MEAN assumption
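To see why zero correlation is weaker than (2.6), the following minimal sketch uses a made-up population. The five values of x and the rule u = x^2 - E(x^2) are assumptions chosen purely for illustration; they are not from the text. Here E(u) = 0 and Cov(x, u) = 0, yet the average of u clearly changes with x, so (2.6) fails.

```python
# Hypothetical population: x takes five equally likely values, symmetric around zero.
xs = [-2, -1, 0, 1, 2]


def mean(values):
    """Population average of equally likely values."""
    return sum(values) / len(values)


# Assumed error for illustration: u = x^2 - E(x^2), a deterministic function of x.
e_x2 = mean([x ** 2 for x in xs])
us = [x ** 2 - e_x2 for x in xs]

e_u = mean(us)                                                   # E(u)
cov_xu = mean([x * u for x, u in zip(xs, us)]) - mean(xs) * e_u  # Cov(x, u)
cond_mean_u = dict(zip(xs, us))                                  # E(u | x) for each value of x

print("E(u) =", e_u)
print("Cov(x, u) =", cov_xu)
print("E(u | x) =", cond_mean_u)
```

Because u is completely determined by x in this example, E(u | x) is just the value of u at each x, which makes the dependence easy to read off.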

2.1 Example
Take the regression:

$Papermark = \beta_0 + \beta_1 \, Paperquality + u$  (ie)

-where u takes into account other factors of the paper, in particular length exceeding 10 pages
-assumption (2.6) requires that a paper’s length does not depend on how good it is:

$E(length \mid \text{good paper}) = E(length \mid \text{bad paper}) = 0$

2.1 The Simple Regression Model
• Conditional Expectations of (2.1) and (2.6) give us:

$E(y \mid x) = \beta_0 + \beta_1 x$  (2.8)

-(2.8) is called the POPULATION REGRESSION FUNCTION (PRF)
-a one unit increase in x increases the expected value of y by B1
-B0+B1x is the systematic (explained) part of y
-u is the unsystematic (unexplained) part of y

2.2 Deriving the OLS Estimates
• In order to estimate B0 and B1, we need sample data
-let {(xi, yi): i=1,…,n} be a sample of n observations from the population

$y_i = \beta_0 + \beta_1 x_i + u_i$  (2.9)

-here yi is explained by xi with error term ui
-y5 indicates the observation of y from the 5th data point
-this regression plots a “best fit” line through our data points

2.2 Deriving the OLS Estimates
These OLS estimates create a straight line going through the “middle” of the data points

2.2 Deriving OLS Estimates
In order to derive OLS, we first need assumptions. We must first assume that u has zero expected value:

$E(u) = 0$  (2.10)

-Secondly, we must assume that the covariance between x and u is zero:

$Cov(x, u) = E(xu) = 0$  (2.11)

-(2.10) and (2.11) can also be rewritten in terms of x and y as:

$E(y - \beta_0 - \beta_1 x) = 0$  (2.12)
$E[x(y - \beta_0 - \beta_1 x)] = 0$  (2.13)

2.2 Deriving OLS Estimates
-(2.12) and (2.13) imply restrictions on the joint probability distribution of the POPULATION
-given SAMPLE data, these equations become:

$\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$  (2.14)
$\frac{1}{n} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$  (2.15)

-notice that the “hat” above B0 and B1 indicates we are now dealing with estimates
-this is an example of “method of moments” estimation (see Section C for a discussion)
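The two sample conditions (2.14) and (2.15) are two linear equations in the two unknowns B0hat and B1hat, so they can be solved directly. The sketch below does this with Cramer's rule on a small made-up data set. The numbers and variable names are illustrative assumptions, not data from the text.

```python
# Made-up sample, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi ** 2 for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Multiplying (2.14) and (2.15) by n and rearranging gives the linear system
#   n * b0     + sum_x  * b1 = sum_y
#   sum_x * b0 + sum_x2 * b1 = sum_xy
det = n * sum_x2 - sum_x ** 2          # nonzero as long as the x values are not all equal
b0_hat = (sum_y * sum_x2 - sum_x * sum_xy) / det
b1_hat = (n * sum_xy - sum_x * sum_y) / det

# Check that both sample moment conditions hold at the solution.
moment_1 = sum(yi - b0_hat - b1_hat * xi for xi, yi in zip(x, y)) / n
moment_2 = sum(xi * (yi - b0_hat - b1_hat * xi) for xi, yi in zip(x, y)) / n

print("b0_hat =", b0_hat, "b1_hat =", b1_hat)
print("moment conditions:", moment_1, moment_2)
```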

2.2 Deriving OLS Estimates
Using summation properties, (2.14) simplifies to:

$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$  (2.16)

Which can be rewritten as:

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$  (2.17)

Which is our OLS estimate for the intercept
-therefore given data and an estimate of the slope, the estimated intercept can be determined

2.2 Deriving OLS Estimates
By cancelling out 1/n and combining (2.17) and (2.15) we get:

$\sum_{i=1}^{n} x_i [y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i] = 0$

Which can be rewritten as:

$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})$

2.2 Deriving OLS Estimates
Recall the algebraic properties:

$\sum_{i=1}^{n} x_i [x_i - \bar{x}] = \sum_{i=1}^{n} [x_i - \bar{x}]^2$

And

$\sum_{i=1}^{n} x_i [y_i - \bar{y}] = \sum_{i=1}^{n} [x_i - \bar{x}][y_i - \bar{y}]$
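Both properties rest on the fact that deviations from a sample mean sum to zero. A short derivation, written out here as a reading aid rather than taken from the slides:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})^2
  = \sum_{i=1}^{n} x_i (x_i - \bar{x}) - \bar{x} \sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i (x_i - \bar{x}),
  \qquad \text{since } \sum_{i=1}^{n} (x_i - \bar{x}) = 0 .

\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  = \sum_{i=1}^{n} x_i (y_i - \bar{y}) - \bar{x} \sum_{i=1}^{n} (y_i - \bar{y})
  = \sum_{i=1}^{n} x_i (y_i - \bar{y}),
  \qquad \text{since } \sum_{i=1}^{n} (y_i - \bar{y}) = 0 .
```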

2.2 Deriving OLS Estimates
We can make the simple assumption that:

$\sum_{i=1}^{n} [x_i - \bar{x}]^2 > 0$  (2.18)

Which essentially states that not all x’s are the same
-ie: you didn’t do a survey where one question is “are you alive?”
-This is essentially the key assumption needed to estimate B1hat

2.2 Deriving OLS Estimates
All this gives us the OLS estimate for B1:

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} [x_i - \bar{x}][y_i - \bar{y}]}{\sum_{i=1}^{n} [x_i - \bar{x}]^2}$  (2.19)

Note that assumption (2.18) ensures the denominator is not zero.
-also note that if x and y are positively (negatively) correlated, B1hat will be positive (negative)
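Formulas (2.17) and (2.19) translate almost line for line into code. The following is a minimal sketch in plain Python. The function name `ols` and the sample numbers are assumptions made for illustration.

```python
def ols(x, y):
    """Return (b0_hat, b1_hat) for a simple regression of y on x.

    Uses the slope formula (2.19) and the intercept formula (2.17).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        # Assumption (2.18) fails: every x is identical, so the slope is undefined.
        raise ValueError("all x values are the same; slope cannot be estimated")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


# Made-up sample, for illustration only.
b0_hat, b1_hat = ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print("b0_hat =", b0_hat, "b1_hat =", b1_hat)
```

The explicit check on the denominator mirrors assumption (2.18): a sample in which x never varies carries no information about the slope.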

2.2 Fitted Values
OLS estimates of B0 and B1 give us a FITTED value for y when x=xi:

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$  (2.20)

-there is one fitted or predicted value of y for each observation of x
-the predicted y’s can be greater than, less than or (rarely) equal to the actual y’s

2.2 Residuals
The difference between the actual y values and the estimates is the ESTIMATED error, or residuals:

$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$  (2.21)

-again, there is one residual for each observation
-these residuals ARE NOT the same as the actual error term
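The distinction between residuals and errors is easiest to see when the errors are known, which only happens in a constructed example. In the sketch below the "true" parameters (1 and 2) and the error values are assumptions invented for illustration. In real data u is never observed.

```python
def ols(x, y):
    """Return (b0_hat, b1_hat) using formulas (2.17) and (2.19)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Assumed population parameters and errors, chosen only for illustration.
beta0, beta1 = 1.0, 2.0
x = [1, 2, 3, 4, 5]
u = [0.5, -0.3, 0.2, -0.6, 0.4]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

b0_hat, b1_hat = ols(x, y)
y_hat = [b0_hat + b1_hat * xi for xi in x]        # fitted values, as in (2.20)
u_hat = [yi - fi for yi, fi in zip(y, y_hat)]     # residuals, as in (2.21)

for ui, ri in zip(u, u_hat):
    print(f"error u = {ui:+.2f}   residual uhat = {ri:+.2f}")
```

The residuals are measured from the estimated line, while the errors are measured from the unknown population line, so the two columns differ whenever the estimates differ from the true parameters.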

2.2 Residuals
The SUM OF SQUARED RESIDUALS can be expressed as:

$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$  (2.22)

-if B0hat and B1hat are chosen to minimize (2.22), (2.14) and (2.15) are our FIRST ORDER CONDITIONS (FOC’S) and we are able to derive the same OLS estimates as above, (2.17) and (2.19)
-the term “OLS” comes from the fact that the sum of the squared residuals is minimized
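A quick numerical check of the minimization claim: the sketch below evaluates the sum of squared residuals (2.22) at the OLS estimates and at a grid of nearby intercept and slope values. The data and the step size of 0.1 are arbitrary assumptions for illustration.

```python
# Made-up sample, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def ssr(b0, b1):
    """Sum of squared residuals (2.22) for a candidate intercept and slope."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# OLS estimates from (2.19) and (2.17).
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_hat = y_bar - b1_hat * x_bar

ssr_min = ssr(b0_hat, b1_hat)

# Move the intercept and/or slope by 0.1 in each direction and recompute.
steps = (-0.1, 0.0, 0.1)
nearby = [ssr(b0_hat + d0, b1_hat + d1)
          for d0 in steps for d1 in steps
          if (d0, d1) != (0.0, 0.0)]

print("SSR at OLS estimates:", ssr_min)
print("smallest SSR among nearby lines:", min(nearby))
```

Every perturbed line gives a larger sum of squared residuals than the OLS line, as the first order conditions suggest.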

2.2 Why OLS?
Why minimize the sum of the squared residuals?
-Why not minimize the residuals themselves?
-Why not minimize the cube of the residuals?
-not all minimization techniques can be expressed as formulas
-OLS has the advantage of deriving unbiasedness, consistency, and other important statistical properties.

2.2 Regression Line
Our OLS regression supplies us with an OLS REGRESSION LINE:

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$  (2.23)

-note that as this is an equation of a line, there are no subscripts
-B0hat is the predicted value of y when x=0
-not always a valid value
-(2.23) is also called the SAMPLE REGRESSION FUNCTION (SRF)
-different data sets will estimate different B’s

2.2 Deriving OLS Estimates
The slope estimate:

$\hat{\beta}_1 = \Delta\hat{y} / \Delta x$  (2.24)

Shows the change in yhat when x changes, or alternatively,

$\Delta\hat{y} = \hat{\beta}_1 \Delta x$  (2.25)

The change in x can be multiplied by B1hat to estimate the change in y.

2.2 Deriving OLS Estimates
• Notes:
1) As the calculations required for OLS are tedious with more than a few data points, econometrics software (like Shazam) is normally used.
2) A successful regression cannot establish causality; it can only comment on positive or negative relations between x and y
3) We often use the terminology “regress y on x” to estimate y=f(x)

2.3 Properties of OLS on Any Sample of Data
•Review
-Once again, simple algebraic properties are needed in order to build OLS’s foundation
-OLS (B0hat and B1hat) can be used to calculate fitted values (yhat)
-the residual (uhat) is the difference between the actual y values and the estimated y values (yhat)

2.3 Properties of OLS
[Figure: a data point y lying above its fitted value yhat on the regression line, with residual uhat = y - yhat. Here yhat underpredicts y.]

2.3 Properties of OLS
1) From the FOC of OLS (2.14), the sum of all residuals is zero:

$\sum_{i=1}^{n} \hat{u}_i = 0$  (2.30)

2) Also from the FOC of OLS (2.15), the sample covariance between the regressors and the OLS residuals is zero:

$\sum_{i=1}^{n} x_i \hat{u}_i = 0$  (2.31)

Because of (2.30), the left side of (2.31) is proportional to the required sample covariance

2.3 Properties of OLS
3) The point (xbar, ybar) is always on the OLS regression line (from 2.16):

$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$  (2.16)

Further Algebraic Gymnastics:
1) From (2.30) we know that the sample average of the fitted y values equals the sample average of the actual y values:

$\bar{\hat{y}} = \bar{y}$

2.3 Properties of OLS
Further Algebraic Gymnastics:
2) (2.30) and (2.31) combine to prove that the covariance between yhat and uhat is zero
Therefore OLS breaks down yi into two uncorrelated parts, a fitted value and a residual:

$y_i = \hat{y}_i + \hat{u}_i$  (2.32)
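These algebraic properties hold in every sample, so they can be confirmed on any data set. The sketch below checks them on a small made-up sample. The numbers and variable names are illustrative assumptions, and the results are compared with zero only up to floating-point rounding.

```python
# Made-up sample, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_hat = y_bar - b1_hat * x_bar

y_hat = [b0_hat + b1_hat * xi for xi in x]
u_hat = [yi - fi for yi, fi in zip(y, y_hat)]

sum_resid = sum(u_hat)                                     # (2.30): should be zero
sum_x_resid = sum(xi * ui for xi, ui in zip(x, u_hat))     # (2.31): should be zero
line_at_x_bar = b0_hat + b1_hat * x_bar                    # (2.16): should equal y_bar
mean_fitted = sum(y_hat) / n                               # should equal y_bar
cov_fitted_resid = sum((fi - mean_fitted) * ui
                       for fi, ui in zip(y_hat, u_hat)) / n  # should be zero

print("sum of residuals:", sum_resid)
print("sum of x times residual:", sum_x_resid)
print("line at x_bar:", line_at_x_bar, "y_bar:", y_bar)
print("mean of fitted values:", mean_fitted)
print("covariance of fitted values and residuals:", cov_fitted_resid)
```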

2.3 Sum of Squares
From the idea of fitted and residual components, we can calculate the TOTAL SUM OF SQUARES (SST), the EXPLAINED SUM OF SQUARES (SSE) and the RESIDUAL SUM OF SQUARES (SSR):

$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$  (2.33)
$SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$  (2.34)
$SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2$  (2.35)

2.3 Sum of Squares
SST measures the sample variation in y. SSE measures the sample variation in yhat (the fitted component). SSR measures the sample variation in uhat (the residual component). These relate to each other as follows:

$SST = SSE + SSR$  (2.36)

2.3 Proof of Squares
The proof of (2.36) is as follows:

$\sum (y_i - \bar{y})^2 = \sum [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2$
$= \sum [\hat{u}_i + (\hat{y}_i - \bar{y})]^2$
$= \sum \hat{u}_i^2 + 2 \sum \hat{u}_i (\hat{y}_i - \bar{y}) + \sum (\hat{y}_i - \bar{y})^2$
$= SSR + 2 \sum \hat{u}_i (\hat{y}_i - \bar{y}) + SSE$

Since we showed that the sample covariance between residuals and fitted values is zero,

$2 \sum \hat{u}_i (\hat{y}_i - \bar{y}) = 0$  (2.37)
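The decomposition can be checked numerically. The sketch below computes SST, SSE, SSR and the cross term from (2.37) on a small made-up sample. The data values are assumptions for illustration only.

```python
# Made-up sample, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_hat = y_bar - b1_hat * x_bar

y_hat = [b0_hat + b1_hat * xi for xi in x]
u_hat = [yi - fi for yi, fi in zip(y, y_hat)]

sst = sum((yi - y_bar) ** 2 for yi in y)        # (2.33)
sse = sum((fi - y_bar) ** 2 for fi in y_hat)    # (2.34)
ssr = sum(ui ** 2 for ui in u_hat)              # (2.35)
cross = 2 * sum(ui * (fi - y_bar) for ui, fi in zip(u_hat, y_hat))  # (2.37)

print("SST =", sst)
print("SSE =", sse)
print("SSR =", ssr)
print("cross term =", cross)
```

For this sample SST is 6, which splits into an explained part of about 3.6 and a residual part of about 2.4, with a cross term of zero up to rounding.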

2.3 Properties of OLS on Any Sample of Data
•Notes
-An in-depth analysis of sample and inter-variable covariance is available in section C for individual study
-SST, SSE and SSR are given differing interpretations and labels by different econometric software. As such, it is always important to look up the base formula

2.3 Goodness of Fit
-Once we’ve run a regression, the question arises, “How well does x explain y?”
-We can’t answer that yet, but we can ask, “How well does the OLS regression line fit the data?”
-To measure this, we use R2, the COEFFICIENT OF DETERMINATION:

$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$  (2.38)
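Both forms of (2.38) give the same number, which the following sketch confirms on a small made-up sample. The function name `r_squared` and the data are assumptions for illustration.

```python
def r_squared(x, y):
    """Return R2 computed both ways in (2.38): (SSE/SST, 1 - SSR/SST)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((fi - y_bar) ** 2 for fi in y_hat)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return sse / sst, 1 - ssr / sst


# Made-up sample, for illustration only.
r2_explained, r2_residual = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print("SSE/SST     =", r2_explained)
print("1 - SSR/SST =", r2_residual)

# A sample lying exactly on a line gives R2 = 1.
r2_perfect, _ = r_squared([1, 2, 3, 4], [3, 5, 7, 9])
print("perfect fit =", r2_perfect)
```

In the first sample about 60% of the sample variation in y is explained by x.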

2.3 Goodness of Fit
-“R2 is the ratio of the explained variation compared to the total variation”
-“the fraction of the sample variation in y that is explained by x”
-R2 always lies between zero and 1
-if R2=1, all actual points lie on the regression line (usually an error)
-if R2≈0, the regression explains very little; OLS is a “poor fit”

2.3 Properties of OLS on Any Sample of Data
•Notes
-A low R2 is not uncommon in the social sciences, especially in cross-sectional analysis
-econometric regressions should not be judged harshly because of a low R2 alone
-for example, if R2=0.12, that means 12% of the variation is explained, which is better than the 0% before the regression