19 views

Uploaded by Celine Lin

save

- Stat Solutions
- HLM
- GRM: Generalized Regression Model for Clustering Linear Sequences
- Chap 2
- IJCAI13-032.pdf
- EPA Detailed Comments on Moffat Project
- 0523-Business Mathematics and Statistics
- Simple Lin Regress Inference
- regression.pdf
- January 2015 (IAL) QP - S1 Edexcel
- 81063542 Regression Analysis
- Multiple Regression Analysis Part 1
- ISO-8859-1 Brand Equity Estimation
- Regression Estimators
- MELJUN CORTES ABSTRACT Research Paper 6 Metro South JULY 9 2018
- a
- 1905702
- MTH302
- TQM en La Rentabilidad
- A Contribution to the Understanding of Illegal Copying of Software
- Accenture Top Considerations Wholesale Credit Loss Ccar
- ContentServer (1)
- R Si Rezilienta La Studenti
- Determinants of Demand for International Tourist Flow to Turkey. Tourism Management
- Job Card
- Demand Estimation
- tmp31EB
- 1.5.2 VPH-GCH
- EM 513 - Practice Problem Set 2 - Panama
- Uso de La US en Deshidratación.
- Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
- Dispatches from Pluto: Lost and Found in the Mississippi Delta
- The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution
- Sapiens: A Brief History of Humankind
- Yes Please
- The Unwinding: An Inner History of the New America
- Grand Pursuit: The Story of Economic Genius
- This Changes Everything: Capitalism vs. The Climate
- A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
- The Emperor of All Maladies: A Biography of Cancer
- The Prize: The Epic Quest for Oil, Money & Power
- John Adams
- Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
- The World Is Flat 3.0: A Brief History of the Twenty-first Century
- Rise of ISIS: A Threat We Can't Ignore
- The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
- Smart People Should Build Things: How to Restore Our Culture of Achievement, Build a Path for Entrepreneurs, and Create New Jobs in America
- Team of Rivals: The Political Genius of Abraham Lincoln
- The New Confessions of an Economic Hit Man
- How To Win Friends and Influence People
- Angela's Ashes: A Memoir
- Steve Jobs
- Bad Feminist: Essays
- You Too Can Have a Body Like Mine: A Novel
- The Incarnations: A Novel
- The Light Between Oceans: A Novel
- Leaving Berlin: A Novel
- The Silver Linings Playbook: A Novel
- The Sympathizer: A Novel (Pulitzer Prize for Fiction)
- Extremely Loud and Incredibly Close: A Novel
- A Man Called Ove: A Novel
- The Master
- Bel Canto
- We Are Not Ourselves: A Novel
- The First Bad Man: A Novel
- The Rosie Project: A Novel
- The Blazing World: A Novel
- Brooklyn: A Novel
- The Flamethrowers: A Novel
- Life of Pi
- The Love Affairs of Nathaniel P.: A Novel
- The Bonfire of the Vanities: A Novel
- Lovers at the Chameleon Club, Paris 1932: A Novel
- The Perks of Being a Wallflower
- A Prayer for Owen Meany: A Novel
- The Cider House Rules
- Wolf Hall: A Novel
- The Art of Racing in the Rain: A Novel
- The Wallcreeper
- Interpreter of Maladies
- The Kitchen House: A Novel
- Beautiful Ruins: A Novel
- Good in Bed

You are on page 1of 59

January 26, 2009

Outline

Descriptive Analysis

Causal Estimation

Forecasting

Regression Model

We are actually going to derive the linear regression model in 3

very different ways

While the math for doing it is identical, conceptually they are

very different ideas

Outline

Descriptive Analysis

Causal Estimation

Forecasting

Descriptive Analysis

Here our goal is simply to estimate E(y | x)

However, when x is continuous we can’t do it directly

thus, we need a model for what this looks like

The easiest model I can think of is

E(y | x) = β

0

+ β

1

x

To interpret it notice that

E(y | x = 0) = β

0

you can also see that β

1

is the slope coefﬁcient

∂E(y | x = 0)

∂x

= β

1

Estimation

Now we need some way to estimate it

It will be useful to deﬁne

u = y −β

0

−β

1

x

What do we know about u?

The fact that E(y | x) = β

0

+ β

1

x means that

E(u | x) =E(y −β

0

−β

1

x | x)

=E(y | x) −β

0

−β

1

x

=β

0

+ β

1

x −β

0

−β

1

x

=0

The fact that E(u | x) = 0 means two important things:

1

E(u) = 0

2

The expected value of u does not change when we change

x.

This second thing has a number of implications, but the

most useful (and intuitive) is that

0 = cov(u, x)

= E(ux) −E(u)E(x)

Putting this together we have two conditions:

E(u) = 0

E(ux) = 0

What was the point of all of that?

To estimate an expectation we use a sample mean.

What if replace the expectations above with sample mean

expressions?

Before doing this lets be clear what we mean the sample.

Assume that I observe a sample size of N

Let i = 1, ..., N index the people in the data

Lets look at this for the voting data set

We will write the population regression model as

y

i

= β

0

+ β

1

x

i

+u

i

where (for example) y

i

is the value of y for individual i .

Now lets go back to the equations.

To think about estimation lets deﬁne the sample regression

model

y

i

=

´

β

0

+

´

β

1

x

i

+

ˆ

u

i

where

´

β

0

and

´

β

1

are our estimates of β

0

and β

1

from the sample

ˆ

u

i

is deﬁned as

ˆ

u

i

= y

i

−

´

β

0

−

´

β

1

x

i

How do we want to estimate this model?

Well if

´

β

0

and

´

β

1

are like β

0

and β

1

, then

ˆ

u

i

should be like u

i

.

We know that in the population E(u

i

) = 0 so it makes sense to

force this to be approximately true in the sample

This is the idea of a sample analogue

If the sample looks like the population, then lets force things to

be true in the sample which we know would be true in the

population

0 =

1

N

N

i =1

´

u

i

=

1

N

N

i =1

_

y

i

−

´

β

0

−

´

β

1

x

i

_

=

1

N

N

i =1

y

i

−

1

N

N

i =1

´

β

0

−

1

N

N

i =1

´

β

1

x

i

=

1

N

N

i =1

y

i

−

´

β

0

−

´

β

1

1

N

N

i =1

x

i

= y −

´

β

0

−

´

β

1

x

Which I can write as

´

β

0

= y −

´

β

1

x

Now we have two unknowns an one equation, we need one

more.

The population expression is

E(u

i

x

i

) = 0.

The sample analogue of this is

0 =

1

N

N

i =1

´

u

i

x

i

=

1

N

N

i =1

_

y

i

−

´

β

0

−

´

β

1

x

i

_

x

i

=

1

N

N

i =1

_

y

i

x

i

−

´

β

0

x

i

−

´

β

1

x

2

i

_

=

1

N

N

i =1

y

i

x

i

−

1

N

N

i =1

´

β

0

x

i

−

1

N

N

i =1

´

β

1

x

2

i

=

1

N

N

i =1

y

i

x

i

−

´

β

0

1

N

N

i =1

x

i

−

´

β

1

1

N

N

i =1

x

2

i

now lets take this equation and plug in the fact that

´

β

0

= y −

´

β

1

x

0 =

1

N

N

i =1

y

i

x

i

−

´

β

0

1

N

N

i =1

x

i

−

´

β

1

1

N

N

i =1

x

2

i

=

1

N

N

i =1

y

i

x

i

−

_

y −

´

β

1

x

_

1

N

N

i =1

x

i

−

´

β

1

1

N

N

i =1

x

2

i

=

1

N

N

i =1

y

i

x

i

−

1

N

N

i =1

yx

i

+

´

β

1

1

N

N

i =1

x

i

x −

´

β

1

1

N

N

i =1

x

i

x

i

=

1

N

N

i =1

(y

i

−y) x

i

−

´

β

1

1

N

N

i =1

x

i

(x

i

−x)

Next we will use a result about means that we discussed in the

statistics review

N

i =1

x

i

(x

i

−x) =

N

i =1

x

i

(x

i

−x) −

N

i =1

x (x

i

−x)

=

N

i =1

(x

i

−x)

2

N

i =1

(y

i

−y) x

i

=

N

i =1

(y

i

−y) x

i

−

N

i =1

(y

i

−y) x

=

N

i =1

(y

i

−y) (x

i

−

¯

x)

Using that we can solve for

´

β

1

´

β

1

=

1

N

N

i =1

(y

i

−y) (x

i

−

¯

x)

1

N

N

i =1

(x

i

−x)

2

and we still know that

´

β

0

= y −

´

β

1

x

And we are done.

Note that since

´

β

1

=

1

N

N

i =1

(y

i

−y) (x

i

−

¯

x)

1

N

N

i =1

(x

i

−x)

2

it is essentially the sample covariance between x and y divided

by the sample variance of x.

The sample variance must be positive therefore:

The regression coefﬁcient, the covariance, and the

correlation coefﬁcient must all have the same sign

They will only be zero if all of them are zero

Voting Example

Lets use this to think about the voting example

We can say nothing about causality.

There is a clear positive relationship, but it could be due to

anything

x →y Spending a lot of money gets people to like you

which leads to more votes

z →x, z →y People who are popular attract a lot of money

and get a lot of votes

y →x Its all about inﬂuence. I only give money to the

people who are going to win so I can inﬂuence

their future votes.

CEO example

As another example lets consider the relationship between

CEO salary and the Return on Equity

This comes from the data set CEOSAL1 in Wooldridge

We know that CEOs make tons of money, but is it just a scam

or are they actually worth something?

The key variable is return on equity (roe) deﬁned as “net

income as a percentage of common equity”

A number of 10 means if I invest a $100 in equity in a ﬁrm I

earn $10 each year

Salary is measured in terms of $1000 per year

We will write the conditional expectation as

E(salary | roe) = β

0

+ β

1

roe

We can run this regression in stata

For every point higher in return on equity CEO salary raised by

$18,501 per year

Really not that much if you think about it.

Is this causal?

Again we have all of the three possibilities

x →y When stock price does well, CEOs get a raise

z →x, z →y Good CEOs tend to make a lot of money and

their companies perform well

y →x By paying my CEO more I get her to work harder

and push the stock price up

Outline

Descriptive Analysis

Causal Estimation

Forecasting

To really say something about causality we need to make some

more assumptions

This involves writing down a “structural data generating model”

This will look similar to what we have been doing, but

conceptually is very different

For the conditional expectation data we started with the

data and then asked which model could help summarize it

For the structural case we start with the model and then

use it to say what the data will look like

OK we need a model to do this

We will use the same regression model:

y = β

0

+ β

1

x +u

and think about it this way:

1

β

0

and β

1

are real number which are out there in nature

and we want to uncover them

2

you choose x

3

Nature (or God) chose u in a way that is unrelated to your

choice of x

This is now a real causal model

suppose that

y is your annual income

x is your education

β

0

= 10, 000

β

1

= 5000

u = −5000

So what does this mean?

Suppose I choose different levels of education, here is the

earnings I will get

Education Earnings

0 $5000

12 $65,000

16 $85,000

18 $95,000

It might seem a little goofy to assume that u were known

You might think that you make your college going decision

without knowing u

Actually, it kind of doesn’t really matter

If you know the model you know that graduating college (16)

versus not going to college (12) is worth about $20,000 per year

Probably a pretty good investment

What is u?

It is things that affect your earnings other than education.

Examples:

Intelligence

Work Effort

Smoothness

Family Connections

race

gender

age

Estimation

Now we need a way to estimate our model In particular we

want to estimate β

0

and β

1

The question is how is u determined?

The standard assumption (at least the one to start with) is that

u is essentially assigned at random

We can write this as

E (u | x) = 0

Note that this is the same as what we did before, but

conceptually very different

Before u was deﬁned simply as y −β

0

−β

1

x it didn’t

actually mean anything

Now we think of u as this real thing that is actually out there

and means something-it is just that we can’t observe it

As before, the fact that we have assumed E (u | x) = 0 really

means two separate things one of which is a big deal the other

isn’t:

1

E (u | x) does not vary with x

2

E(u) = 0 (as opposed to some other number)

The ﬁrst is a really important assumption.

The second really isn’t. Suppose we instead assumed that

E (u | x) = 6

with the model

β

0

+ β

1

x +u

This is equivalent to another model

γ

0

+ γ

1

x + ν

with

γ

0

=β

0

+6

γ

1

=β

1

ν =u −6

Notice that now

E (ν | x) = E (u −6 | x)

= E (u | x) −6

= 6 −6

= 0

β

1

and γ

1

are the same and that is really what we care

about (remember the education model above)

Thus, the fact that we pick zero is really a Normalization.

It makes the model well deﬁned (identiﬁed) without any

real imposition on the data

However, the fact that E (u | x) does not vary with x is

much more than a normalization. It has real content.

Now that we have a model how do we estimate it?

Now nothing is really any different than before

We can write down the sample regression function as

y

i

=

´

β

0

+

´

β

1

x

i

+

ˆ

u

i

we know that

E(u

i

) =0

E (u

i

x

i

) =0

So it makes sense to use the sample analogues

1

N

N

i =1

ˆ

u

i

=0

1

N

N

i =1

ˆ

u

i

x

i

=0

which will give you

´

β

1

=

1

N

N

i =1

(y

i

−y) (x

i

−

¯

x)

1

N

N

i =1

(x

i

−x)

2

´

β

0

=y −

´

β

1

x

exactly as before

It is only the interpretation that has changed.

Lets look back on some of our models and interpret things in

this way.

Let me be very clear about something: the question of whether

this additional assumption is reasonable or not is deﬁnitely

questionable in all of these cases

Lets worry about that later and now just think about given the

assumption what is the interpretation.

Voting Example

For the voting example we get the sample regression function

voteA = 26.812 +0.463shareA +

´

u

i

As usual the

ˆ

β

0

parameter is not very interesting. It tells me the

fraction of the votes I would get if I spent no money.

I would get about 27% of the vote on average if I spent no

money (thats actually pretty good if you think about it)

The key thing is that for every 1% that I outspend my opponent,

my votes increase by 0.463.

Suppose that currently I am spending the same amount as my

opponent, then

shareA = 50

If I double my spending (and my opponent does not react) then

shareA = 67

I would get

17 ×0.463 = 7.8

more of the vote

Is it worth it? (I don’t know the answer to this, but this is the

economic question)

CEO Example

For the CEO model we got

salary = 963.191 +18.501roe +

´

u

i

I would interpret this as the CEO’s pay schedule

If the CEO can get the return on Equity to increase by 10

percentage points more, she will earn an extra $185,010 next

year

This really is not that much money given how large a 10

percentage point increase is

Schooling Example

Lets think about the schooling example

In some ways income is better, but Wooldridge used hourly

wage instead so lets use that

The sample regression model is

wage

i

=

´

β

0

+

´

β

1

educ

i

+

ˆ

u

i

We use the data set wage1 which gives data from 1976

we got

wage

i

= −0.90 +0.54educ

i

+

ˆ

u

i

For every extra year of education I get, my wage increase by

about 50 cents an hour

Thus four years of education is worth

4 ×0.54 = 2.16

an hour.

To get this into annual income ﬁgure if I am full time, I work

about 2000 hours per year (50 weeks ×40hours per week).

This works out to about $4,320 a year

Was it worth it?

Outline

Descriptive Analysis

Causal Estimation

Forecasting

Lets completely shift gears

Imagine the following

You have a bunch of data on x

i

and y

i

now

You know the value of x

∗

tomorrow and want to predict

what the value of y will be tomorrow

Examples

Inﬂation Rate this year, Unemployment Rate next year

Corporate Proﬁts Today, Stock Price Tomorrow

SAT Scores, College GPA

We are going to use the sample regression model.

In order to do prediction we need a model.

Lets once again use the linear model

y

i

=

´

β

0

+

´

β

1

x

i

+

ˆ

u

i

except now we also want to deﬁne a predicted value as

´

y

i

=

´

β

0

+

´

β

1

x

i

Our goal is to predict

´

y

∗

=

´

β

0

+

´

β

1

x

∗

It is important to note that this is a very different question than

the causal model question in that WE DO NOT CHOOSE x

∗

Think about the inﬂation/unemployment example to see the

difference between the following two questions:

I am Ben Bernanke. If I change the inﬂation rate to 2.3%

what will happen to the unemployment rate next year

I observe the current inﬂation rate is 2.3%, given that what

is my best guess of the unemployment rate next year

The second question is much less ambitious (which doesn’t

mean it isn’t hard) This second question is the forecasting

question

How might I do this?

Well, I want y to be close to

´

y

∗

so for a good model we want the

´

y

i

’s “close” to the y

i

’s.

How do we decide what close means?

We want some function to measure the “distance” between

ˆ

y

i

and y

i

One obvious example is just how far apart they are from each

other

The ﬁrst thing to try is just the absolute difference between the

two

¸

¸

y

i

−

´

y

i

¸

¸

The problem is the absolute value is a really ugly function

A much smoother function is

_

y

i

−

´

y

i

_

2

This looks much nicer

We have shown what the difference is for one data point, but

we want to do it for all of the data points

N

i =1

_

y

i

−

´

y

i

_

2

=

N

i =1

_

y

i

−

´

β

0

−

´

β

1

x

i

_

2

This says how close our model is to the data.

We want it as close as possible so we want to choose

´

β

0

and

´

β

1

to minimize this function.

To do this we take the derivative of this function with respect to

´

β

0

and

´

β

1

and set to zero

First consider

´

β

0

,

0 =

∂

N

i =1

_

y

i

−

´

β

0

−

´

β

1

x

i

_

2

∂

´

β

0

=

N

i =1

∂

_

y

i

−

´

β

0

−

´

β

1

x

i

_

2

∂

´

β

0

=

N

i =1

−2

_

y

i

−

´

β

0

−

´

β

1

x

i

_

=

1

N

N

i =1

_

y

i

−

´

β

0

−

´

β

1

x

i

_

=

1

N

N

i =1

y

i

−

1

N

N

i =1

´

β

0

−

1

N

N

i =1

´

β

1

x

i

= y −

´

β

0

−

´

β

1

x

or

´

β

0

= y −

´

β

1

x

which is exactly what we got before

Next consider

ˆ

β

1

0 =

∂

N

i =1

_

y

i

−

´

β

0

−

´

β

1

x

i

_

2

∂

´

β

1

=

N

i =1

∂

_

y

i

−

´

β

0

−

´

β

1

x

i

_

2

∂

´

β

1

=

N

i =1

−2x

i

_

y

i

−

´

β

0

−

´

β

1

x

i

_

=

1

N

N

i =1

x

i

_

y

i

−

´

β

0

−

´

β

1

x

i

_

=

1

N

N

i =1

x

i

y

i

−

1

N

N

i =1

x

i

´

β

0

−

1

N

N

i =1

x

i

´

β

1

x

i

=

1

N

N

i =1

x

i

y

i

−

´

β

0

1

N

N

i =1

x

i

−

´

β

1

1

N

N

i =1

x

2

i

Now lets plug in the value for

´

β

0

0 =

1

N

N

i =1

x

i

y

i

−

_

y −

´

β

1

x

_

1

N

N

i =1

x

i

−

´

β

1

1

N

N

i =1

x

2

i

=

1

N

N

i =1

x

i

(y

i

−y) −

´

β

1

1

N

N

i =1

x

i

(x

i

−x)

for the same reasons as before we can write this as

´

β

1

=

1

N

N

i =1

(x

i

−x) (y

i

−y)

1

N

N

i =1

(x

i

−x)

2

again exactly the same as before

Again we have another way of thinking about the same

regression model.

Lets use stata to think about a few examples

Inﬂation and Unemployment

SAT scores and college GPA

Since we got

´

β

0

=1.2792

´

β

1

=0.0008918

Thus suppose I have two applicants, one with SAT scores of

1000, the other with 12000

what is my forecast of their GPAs

1.2792 +0.0008918 ×1000 = 2.17

1.2792 +0.0008918 ×1200 = 2.35

Outline

Descriptive Analysis

Causal Estimation

Forecasting

Regression Model

We are actually going to derive the linear regression model in 3 very different ways While the math for doing it is identical, conceptually they are very different ideas

Outline

Descriptive Analysis

Causal Estimation

Forecasting

Descriptive Analysis

Here our goal is simply to estimate E(y | x) However, when x is continuous we can’t do it directly thus, we need a model for what this looks like The easiest model I can think of is E(y | x) = β0 + β1 x

To interpret it notice that E(y | x = 0) = β0 you can also see that β1 is the slope coefﬁcient ∂E(y | x = 0) = β1 ∂x .

Estimation Now we need some way to estimate it It will be useful to deﬁne u = y − β0 − β1 x What do we know about u? The fact that E(y | x) = β0 + β1 x means that E(u | x) =E(y − β0 − β1 x | x) =E(y | x) − β0 − β1 x =0 =β0 + β1 x − β0 − β1 x .

This second thing has a number of implications. but the most useful (and intuitive) is that 0 = cov (u. x) = E(ux) − E(u)E(x) .The fact that E(u | x) = 0 means two important things: 1 E(u) = 0 2 The expected value of u does not change when we change x.

Putting this together we have two conditions: E(u) = 0 E(ux) = 0 .

Assume that I observe a sample size of N Let i = 1. N index the people in the data Lets look at this for the voting data set . What if replace the expectations above with sample mean expressions? Before doing this lets be clear what we mean the sample.... .What was the point of all of that? To estimate an expectation we use a sample mean.

To think about estimation lets deﬁne the sample regression model ˆ yi = β0 + β1 xi + ui where β0 and β1 are our estimates of β0 and β1 from the sample ˆ ui is deﬁned as ˆ ui = yi − β0 − β1 xi . Now lets go back to the equations.We will write the population regression model as yi = β0 + β1 xi + ui where (for example) yi is the value of y for individual i.

then lets force things to be true in the sample which we know would be true in the population . We know that in the population E(ui ) = 0 so it makes sense to force this to be approximately true in the sample This is the idea of a sample analogue If the sample looks like the population.How do we want to estimate this model? ˆ Well if β0 and β1 are like β0 and β1 . then ui should be like ui .

0 = 1 N 1 N 1 N 1 N N ui i=1 N i=1 N i=1 N i=1 = yi − β0 − β1 xi yi − 1 N N i=1 = β0 − 1 N 1 N N N β1 xi i=1 = yi − β0 − β1 xi i=1 = y − β0 − β1 x Which I can write as β0 = y − β1 x .

The population expression is E(ui xi ) = 0. . we need one more.Now we have two unknowns an one equation.

The sample analogue of this is 0 = 1 N 1 N 1 N 1 N 1 N N ui xi i=1 N i=1 N i=1 N i=1 N i=1 = yi − β0 − β1 xi xi yi xi − β0 xi − β1 xi2 1 yi xi − N yi xi − β0 N i=1 = = 1 β0 xi − N N N i=1 β1 xi2 N i=1 = 1 N i=1 xi − β1 1 N xi2 .

now lets take this equation and plug in the fact that β0 = y − β1 x N i=1 N i=1 N i=1 N i=1 N i=1 N i=1 0 = 1 N 1 N 1 N 1 N yi xi − β0 1 N xi − β1 1 N 1 N xi2 1 N N i=1 N N i=1 = yi xi − y − β1 x 1 yi xi − N N i=1 xi − β1 N i=1 xi2 xi xi i=1 = 1 yxi + β1 N N i=1 1 xi x − β1 N = 1 (yi − y ) xi − β1 N xi (xi − x) .

Next we will use a result about means that we discussed in the statistics review N i=1 N N xi (xi − x) = = i=1 N i=1 N xi (xi − x) − (xi − x)2 i=1 x (xi − x) N i=1 N (yi − y ) xi = = i=1 N i=1 (yi − y ) xi − i=1 (yi − y ) x ¯ (yi − y ) (xi − x ) .

.Using that we can solve for β1 β1 = 1 N N i=1 (yi − y ) (xi − N 2 1 i=1 (xi − x) N ¯ x) and we still know that β0 = y − β1 x And we are done.

Note that since β1 = 1 N N i=1 (yi − y ) (xi − N 2 1 i=1 (xi − x) N ¯ x) it is essentially the sample covariance between x and y divided by the sample variance of x. and the correlation coefﬁcient must all have the same sign They will only be zero if all of them are zero . the covariance. The sample variance must be positive therefore: The regression coefﬁcient.

but it could be due to anything x → y Spending a lot of money gets people to like you which leads to more votes z → x.Voting Example Lets use this to think about the voting example We can say nothing about causality. . There is a clear positive relationship. I only give money to the people who are going to win so I can inﬂuence their future votes. z → y People who are popular attract a lot of money and get a lot of votes y → x Its all about inﬂuence.

but is it just a scam or are they actually worth something? The key variable is return on equity (roe) deﬁned as “net income as a percentage of common equity” A number of 10 means if I invest a $100 in equity in a ﬁrm I earn $10 each year Salary is measured in terms of $1000 per year .CEO example As another example lets consider the relationship between CEO salary and the Return on Equity This comes from the data set CEOSAL1 in Wooldridge We know that CEOs make tons of money.

.We will write the conditional expectation as E(salary | roe) = β0 + β1 roe We can run this regression in stata For every point higher in return on equity CEO salary raised by $18.501 per year Really not that much if you think about it.

CEOs get a raise z → x.Is this causal? Again we have all of the three possibilities x → y When stock price does well. z → y Good CEOs tend to make a lot of money and their companies perform well y → x By paying my CEO more I get her to work harder and push the stock price up .

Outline Descriptive Analysis Causal Estimation Forecasting .

but conceptually is very different For the conditional expectation data we started with the data and then asked which model could help summarize it For the structural case we start with the model and then use it to say what the data will look like .To really say something about causality we need to make some more assumptions This involves writing down a “structural data generating model” This will look similar to what we have been doing.

OK we need a model to do this We will use the same regression model: y = β0 + β1 x + u and think about it this way: 1 β0 and β1 are real number which are out there in nature and we want to uncover them you choose x Nature (or God) chose u in a way that is unrelated to your choice of x 2 3 This is now a real causal model .

suppose that y is your annual income x is your education β0 = 10. 000 β1 = 5000 u = −5000 So what does this mean? .

Suppose I choose different levels of education.000 $85. here is the earnings I will get Education 0 12 16 18 Earnings $5000 $65.000 .000 $95.

000 per year Probably a pretty good investment .It might seem a little goofy to assume that u were known You might think that you make your college going decision without knowing u Actually. it kind of doesn’t really matter If you know the model you know that graduating college (16) versus not going to college (12) is worth about $20.

What is u? It is things that affect your earnings other than education. Examples: Intelligence Work Effort Smoothness Family Connections race gender age .

Estimation Now we need a way to estimate our model In particular we want to estimate β0 and β1 The question is how is u determined? The standard assumption (at least the one to start with) is that u is essentially assigned at random We can write this as E (u | x) = 0 .

Note that this is the same as what we did before. but conceptually very different Before u was deﬁned simply as y − β0 − β1 x it didn’t actually mean anything Now we think of u as this real thing that is actually out there and means something-it is just that we can’t observe it .

.As before. the fact that we have assumed E (u | x) = 0 really means two separate things one of which is a big deal the other isn’t: 1 2 E(u) = 0 (as opposed to some other number) E (u | x) does not vary with x The ﬁrst is a really important assumption.

Suppose we instead assumed that E (u | x) = 6 with the model β0 + β1 x + u This is equivalent to another model γ0 + γ1 x + ν with γ0 =β0 + 6 γ1 =β1 ν =u − 6 .The second really isn’t.

It makes the model well deﬁned (identiﬁed) without any real imposition on the data However.Notice that now E (ν | x) = E (u − 6 | x) = E (u | x) − 6 = 6−6 = 0 β1 and γ1 are the same and that is really what we care about (remember the education model above) Thus. . It has real content. the fact that we pick zero is really a Normalization. the fact that E (u | x) does not vary with x is much more than a normalization.

Now that we have a model how do we estimate it? Now nothing is really any different than before We can write down the sample regression function as ˆ yi = β0 + β1 xi + ui we know that E(ui ) =0 E (ui xi ) =0 .

.So it makes sense to use the sample analogues 1 N 1 N which will give you β1 = N 1 N ˆ ui =0 i=1 N ˆ ui xi =0 i=1 β0 =y − β1 x exactly as before N i=1 (yi − y ) (xi − N 2 1 i=1 (xi − x) N ¯ x) It is only the interpretation that has changed.

Lets look back on some of our models and interpret things in this way. Let me be very clear about something: the question of whether this additional assumption is reasonable or not is deﬁnitely questionable in all of these cases Lets worry about that later and now just think about given the assumption what is the interpretation. .

.463. It tells me the fraction of the votes I would get if I spent no money.463shareA + ui ˆ As usual the β0 parameter is not very interesting. I would get about 27% of the vote on average if I spent no money (thats actually pretty good if you think about it) The key thing is that for every 1% that I outspend my opponent.812 + 0.Voting Example For the voting example we get the sample regression function voteA = 26. my votes increase by 0.

463 = 7. but this is the economic question) .8 Is it worth it? (I don’t know the answer to this.Suppose that currently I am spending the same amount as my opponent. then shareA = 50 If I double my spending (and my opponent does not react) then shareA = 67 I would get more of the vote 17 × 0.

CEO Example For the CEO model we got salary = 963.191 + 18.010 next year This really is not that much money given how large a 10 percentage point increase is .501roe + ui I would interpret this as the CEO’s pay schedule If the CEO can get the return on Equity to increase by 10 percentage points more. she will earn an extra $185.

my wage increase by about 50 cents an hour ˆ wagei = −0.54educi + ui . but Wooldridge used hourly wage instead so lets use that The sample regression model is ˆ wagei = β0 + β1 educi + ui We use the data set wage1 which gives data from 1976 we got For every extra year of education I get.90 + 0.Schooling Example Lets think about the schooling example In some ways income is better.

I work about 2000 hours per year (50 weeks ×40hours per week).320 a year Was it worth it? . To get this into annual income ﬁgure if I am full time.Thus four years of education is worth 4 × 0.54 = 2.16 an hour. This works out to about $4.

Outline Descriptive Analysis Causal Estimation Forecasting .

Lets completely shift gears Imagine the following You have a bunch of data on xi and yi now You know the value of x ∗ tomorrow and want to predict what the value of y will be tomorrow .

Examples Inﬂation Rate this year. College GPA . Unemployment Rate next year Corporate Proﬁts Today. Stock Price Tomorrow SAT Scores.

In order to do prediction we need a model.We are going to use the sample regression model. Lets once again use the linear model ˆ yi = β0 + β1 xi + ui except now we also want to deﬁne a predicted value as yi = β0 + β1 xi Our goal is to predict y ∗ = β0 + β1 x ∗ It is important to note that this is a very different question than the causal model question in that WE DO NOT CHOOSE x ∗ .

Think about the inﬂation/unemployment example to see the difference between the following two questions: I am Ben Bernanke.3% what will happen to the unemployment rate next year I observe the current inﬂation rate is 2.3%. If I change the inﬂation rate to 2. given that what is my best guess of the unemployment rate next year The second question is much less ambitious (which doesn’t mean it isn’t hard) This second question is the forecasting question How might I do this? .

How do we decide what close means? ˆ We want some function to measure the “distance” between yi and yi One obvious example is just how far apart they are from each other The ﬁrst thing to try is just the absolute difference between the two yi − yi . I want y to be close to y ∗ so for a good model we want the yi ’s “close” to the yi ’s.Well.

The problem is the absolute value is a really ugly function .

A much smoother function is yi − yi This looks much nicer 2 .

We have shown what the difference is for one data point. To do this we take the derivative of this function with respect to β0 and β1 and set to zero . We want it as close as possible so we want to choose β0 and β1 to minimize this function. but we want to do it for all of the data points N i=1 yi − yi 2 N = i=1 yi − β0 − β1 xi 2 This says how close our model is to the data.

0 = ∂ N i=1 yi − β0 − β1 xi ∂ β0 2 2 N = i=1 N ∂ yi − β0 − β1 xi ∂ β0 −2 yi − β0 − β1 xi N i=1 N i=1 = i=1 = 1 N 1 N yi − β0 − β1 xi 1 yi − N N i=1 = 1 β0 − N N β1 xi i=1 = y − β0 − β1 x .First consider β0 .

or which is exactly what we got before β0 = y − β1 x .

ˆ Next consider β1 0 = ∂ N i=1 yi − β0 − β1 xi ∂ β1 2 2 N = i=1 N ∂ yi − β0 − β1 xi ∂ β1 = i=1 −2xi yi − β0 − β1 xi N = 1 N i=1 xi yi − β0 − β1 xi .

= 1 N 1 N N i=1 N i=1 1 xi yi − N xi yi − β0 N i=1 1 xi β0 − N N N xi β1 xi i=1 = 1 N i=1 xi − β1 1 N N i=1 xi2 .

Now lets plug in the value for β0 0 = 1 N 1 N N i=1 N i=1 1 xi yi − y − β1 x N xi (yi − y ) − β1 1 N N N i=1 1 xi − β1 N N i=1 xi2 = i=1 xi (xi − x) for the same reasons as before we can write this as β1 = 1 N N i=1 (xi − x) (yi − N 2 1 i=1 (xi − x) N y) again exactly the same as before .

Lets use stata to think about a few examples Inﬂation and Unemployment SAT scores and college GPA .Again we have another way of thinking about the same regression model.

2792 + 0.35 . one with SAT scores of 1000.0008918 × 1000 = 2.Since we got β0 =1.0008918 × 1200 = 2.2792 β1 =0.0008918 Thus suppose I have two applicants. the other with 12000 what is my forecast of their GPAs 1.17 1.2792 + 0.

- Stat SolutionsUploaded byrajeev_khanna_15
- HLMUploaded byamirriaz1984
- GRM: Generalized Regression Model for Clustering Linear SequencesUploaded byFanny Sylvia C.
- Chap 2Uploaded byHuyen My
- IJCAI13-032.pdfUploaded byvarunt92
- EPA Detailed Comments on Moffat ProjectUploaded byBob Berwyn
- 0523-Business Mathematics and StatisticsUploaded byShujaat Naqvi
- Simple Lin Regress InferenceUploaded byfa2heem
- regression.pdfUploaded byapriani_herni
- January 2015 (IAL) QP - S1 EdexcelUploaded byahamed
- 81063542 Regression AnalysisUploaded byAlphaeus
- Multiple Regression Analysis Part 1Uploaded bykinhai_see
- ISO-8859-1 Brand Equity EstimationUploaded byonkar_mohapatra
- Regression EstimatorsUploaded byosiccor
- MELJUN CORTES ABSTRACT Research Paper 6 Metro South JULY 9 2018Uploaded byMELJUN CORTES, MBA,MPA
- aUploaded byArham Rahim
- 1905702Uploaded byAlejandro Ávila
- MTH302Uploaded byMisbah Sarwar
- TQM en La RentabilidadUploaded byGilberth
- A Contribution to the Understanding of Illegal Copying of SoftwareUploaded bydmacedo
- Accenture Top Considerations Wholesale Credit Loss CcarUploaded byAnonymous iVNvuRKGV
- ContentServer (1)Uploaded byCANDYPA
- R Si Rezilienta La StudentiUploaded byAnonymous fgWsWiSLl
- Determinants of Demand for International Tourist Flow to Turkey. Tourism ManagementUploaded bySriyantha Fernando
- Job CardUploaded byEr Pupone de Naza
- Demand EstimationUploaded byvivek1119
- tmp31EBUploaded byFrontiers
- 1.5.2 VPH-GCHUploaded byCarlos HV
- EM 513 - Practice Problem Set 2 - PanamaUploaded byAbel Carr Soto
- Uso de La US en Deshidratación.Uploaded byJose Abi Mb