Econometrics Complete Note



Although measurement is an important part of econometrics, the scope of econometrics is much broader, as can be seen from the following quotations:

Econometrics, the result of a certain outlook on the role of economics, consists of

the application of mathematical statistics to economic data to lend empirical sup-

port to the models constructed by mathematical economics and to obtain numerical

results.

Econometrics may be defined as the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference.

Econometrics may be defined as the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena.

Econometrics is concerned with the empirical determination of economic laws.

METHODOLOGY OF ECONOMETRICS

How do econometricians proceed in their analysis of an economic problem?

That is, what is their methodology? Although there are several schools of thought

on econometric methodology, we present here the traditional or classical

methodology, which still dominates empirical research in economics and other

social and behavioral sciences.

Broadly speaking, traditional econometric methodology proceeds along the

following lines:

1. Statement of theory or hypothesis.

2. Specification of the mathematical model of the theory.

3. Specification of the statistical, or econometric, model.

4. Obtaining the data.

5. Estimation of the parameters of the econometric model.

6. Hypothesis testing.

7. Forecasting or prediction.

8. Using the model for control or policy purposes.

To illustrate the preceding steps, let us consider the well-known Keynesian theory

of consumption.

Keynes stated:

The fundamental psychological law . . . is that men [women] are disposed, as a rule

and on average, to increase their consumption as their income increases, but not as

much as the increase in their income.

In short, Keynes postulated that the marginal propensity to consume (MPC), the rate of

change of consumption for a unit (say, a dollar) change in income, is greater than

zero but less than 1.

Although Keynes postulated a positive relationship between consumption and income, he did not specify the precise form of the functional relationship between the two. For simplicity, a mathematical economist might suggest the following form of the Keynesian consumption function:

Y = β1 + β2X    (I.3.1)

where Y = consumption expenditure and X = income, and where β1 and β2, known as the parameters of the model, are, respectively, the intercept and slope coefficients. The slope coefficient β2 measures the MPC. Geometrically, Eq. (I.3.1) is as shown in Figure I.1. This equation, which states that consumption is linearly related to income, is an example of a mathematical model of the relationship between consumption and income that is called the consumption function in economics. A model is simply a set of mathematical equations.

If the model has only one equation, as in the preceding example, it is called a single

equation model, whereas if it has more than one equation, it is known as a multiple-

equation model (the latter will be considered later in the book).

In Eq. (I.3.1) the variable appearing on the left side of the equality sign is called the

dependent variable and the variable(s) on the right side are called the independent,

or explanatory, variable(s). Thus, in the Keynesian consumption function, Eq. (I.3.1),

consumption (expenditure) is the dependent variable and income is the explanatory

variable.

The purely mathematical model of the consumption function given in Eq. (I.3.1) is of

limited interest to the econometrician, for it assumes that there is an exact or

deterministic relationship between consumption and income. But relationships

between economic variables are generally inexact.

Thus, if we were to obtain data on consumption expenditure and disposable (i.e.,

after tax) income of a sample of, say, 500 American families and plot these data on

a graph paper with consumption expenditure on the vertical axis and disposable

income on the horizontal axis, we would not expect all 500 observations to lie exactly

on the straight line of Eq. (I.3.1) because, in addition to income, other variables

affect consumption expenditure. For example, the size of the family, the ages of the family members, family religion, etc., are likely to exert some influence on consumption.

To allow for the inexact relationships between economic variables, the

econometrician would modify the deterministic consumption function (I.3.1) as

follows:

Y = β1 + β2X + u    (I.3.2)

Where u, known as the disturbance, or error, term, is a random (stochastic) variable

that has well-defined probabilistic properties. The disturbance term u may well

represent all those factors that affect consumption but are not taken into account

explicitly.

Equation (I.3.2) is an example of an econometric model. More technically, it is an

example of a linear regression model, which is the major concern of this book. The

econometric consumption function hypothesizes that the dependent variable Y

(consumption) is linearly related to the explanatory variable X(income) but that the

relationship between the two is not exact; it is subject to individual variation.

The econometric model of the consumption function can be depicted as shown in

Figure I.2.

4. Obtaining Data

To estimate the econometric model given in (I.3.2), that is, to obtain the numerical values of β1 and β2, we need data. Let us look at the data given in Table I.1, which relate to Y (personal consumption expenditure) and X (gross domestic product) for 1982–1996, both in billions of 1992 dollars.

5. Estimation of the Econometric Model

Now that we have the data, our next task is to estimate the parameters of the

consumption function. The numerical estimates of the parameters give empirical

content to the consumption function. The actual mechanics of estimating the

parameters will be discussed in Chapter 3. For now, note that the statistical

technique of regression analysis is the main tool used to obtain the estimates. Using

this technique and the data given in Table I.1, we obtain the following estimates of

β1 and β2, namely, −184.08 and 0.7064.

Thus, the estimated consumption function is:

Ŷi = −184.08 + 0.7064Xi    (I.3.3)

The hat on the Y indicates that it is an estimate.

The estimated consumption function (i.e., regression line) is shown in Figure I.3.

*As a matter of convention, a hat over a variable or parameter indicates that it is an estimated value.

As Figure I.3 shows, the regression line fits the data quite well in that the data points

are very close to the regression line. From this figure we see that for the period

1982–1996 the slope coefficient (i.e., the MPC) was about 0.70, suggesting that for

the sample period an increase in real income of 1 dollar led, on average, to an

increase of about 70 cents in real consumption expenditure.

We say on average because the relationship between consumption and income is

inexact; as is clear from Figure I.3; not all the data points lie exactly on the regression

line. In simple terms we can say that, according to our data, the average, or mean,

consumption expenditure went up by about 70 cents for a dollar’s increase in real

income.
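The actual mechanics are deferred to Chapter 3, but the least-squares formulas behind estimates like those in Eq. (I.3.3) can be sketched briefly. Since Table I.1 is not reproduced here, the income and consumption figures below are hypothetical stand-ins, used only to show the computation:

```python
# Least-squares estimates of the intercept (b1) and slope (b2), the kind of
# computation that produced Eq. (I.3.3). Data below are hypothetical.

def ols(x, y):
    """Return (b1, b2) minimizing the sum of squared residuals of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # b2 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b1 = ybar - b2 * xbar  # the fitted line passes through the sample means
    return b1, b2

# Hypothetical income (X) and consumption (Y) pairs:
X = [3100, 3300, 3500, 3800, 4000, 4300]
Y = [2000, 2180, 2290, 2500, 2640, 2850]

b1, b2 = ols(X, Y)
print(round(b1, 2), round(b2, 4))  # b2 plays the role of the estimated MPC
```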

6. Hypothesis Testing

Assuming that the fitted model is a reasonably good approximation of reality, we

have to develop suitable criteria to find out whether the estimates obtained in, say,

Eq. (I.3.3) are in accord with the expectations of the theory that is being tested.

According to “positive” economists like Milton Friedman, a theory or hypothesis that

is not verifiable by appeal to empirical evidence may not be admissible as a part of

scientific enquiry.

As noted earlier, Keynes expected the MPC to be positive but less than 1. In our

example we found the MPC to be about 0.70. But before we accept this finding as

confirmation of Keynesian consumption theory, we must enquire whether this

estimate is sufficiently below unity to convince us that this is not a chance

occurrence or peculiarity of the particular data we have used. In other words, is 0.70

statistically less than 1? If it is, it may support Keynes’ theory.

Such confirmation or refutation of economic theories on the basis of sample

evidence is based on a branch of statistical theory known as statistical inference

(hypothesis testing). Throughout this book we shall see how this inference process

is actually conducted.

7. Forecasting or Prediction

If the chosen model does not refute the hypothesis or theory under consideration,

we may use it to predict the future value(s) of the dependent, or forecast, variable Y

on the basis of known or expected future value(s) of the explanatory, or predictor,

variable X.

To illustrate, suppose we want to predict the mean consumption expenditure for

1997. The GDP value for 1997 was 7269.8 billion dollars.

*Do not worry now about how these values were obtained. As we show in Chap. 3, the statistical

method of least squares has produced these estimates. Also, for now do not worry about the

negative value of the intercept.

*See Milton Friedman, “The Methodology of Positive Economics,” Essays in Positive Economics,

University of Chicago Press, Chicago, 1953.

*Data on PCE and GDP were available for 1997 but we purposely left them out to illustrate the topic

discussed in this section. As we will discuss in subsequent chapters, it is a good idea to save a portion

of the data to find out how well the fitted model predicts the out-of-sample observations.

Ŷ1997 = −184.0779 + 0.7064(7269.8) = 4951.3167    (I.3.4)

or about 4951 billion dollars. Thus, given the value of GDP, the mean, or average, forecast consumption expenditure is about 4951 billion dollars. The actual value of the consumption expenditure reported in 1997 was 4913.5 billion dollars. The estimated model (I.3.3) thus overpredicted the actual consumption expenditure by about 37.82 billion dollars. We could say the forecast error is about 37.82 billion dollars, which is about 0.76 percent of the actual consumption expenditure for 1997. When we fully discuss the linear regression model in subsequent chapters, we will try to find out if such an error is “small” or “large.” But what is important for now is to note that such forecast errors are inevitable given the statistical nature of our analysis.
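The forecast in Eq. (I.3.4) and the resulting forecast error can be reproduced directly from the estimates reported above:

```python
# Reproducing the 1997 forecast of Eq. (I.3.4) and the forecast error.
b1, b2 = -184.0779, 0.7064       # estimates from Eq. (I.3.3)
gdp_1997 = 7269.8                # GDP for 1997, billions of 1992 dollars
actual_pce_1997 = 4913.5         # reported consumption expenditure for 1997

forecast = b1 + b2 * gdp_1997            # about 4951.3 billion dollars
error = forecast - actual_pce_1997       # about 37.8 billion (overprediction)
print(round(forecast, 1), round(error, 1))  # prints: 4951.3 37.8
```

The small gap between this result and the 4951.3167 reported in (I.3.4) comes from rounding β2 to four decimal places.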

There is another use of the estimated model (I.3.3). Suppose the President decides

to propose a reduction in the income tax. What will be the effect of such a policy on

income and thereby on consumption expenditure and ultimately on employment?

Suppose that, as a result of the proposed policy change, investment expenditure

increases. What will be the effect on the economy? As macroeconomic theory

shows, the change in income following, say, a dollar’s worth of change in investment

expenditure is given by the income multiplier M, which is defined as

M = 1/(1 − MPC)

If we use the MPC of 0.70 obtained in (I.3.3), this multiplier becomes about M = 3.33.

That is, an increase (decrease) of a dollar in investment will eventually lead to more

than a threefold increase (decrease) in income; note that it takes time for the

multiplier to work.
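A quick check of this multiplier computation:

```python
# Income multiplier implied by the estimated MPC: M = 1/(1 - MPC).
mpc = 0.70
M = 1 / (1 - mpc)
print(round(M, 2))  # prints: 3.33
```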

The critical value in this computation is MPC, for the multiplier depends on it. And

this estimate of the MPC can be obtained from regression models such as (I.3.3).

Thus, a quantitative estimate of the MPC provides valuable information for policy

purposes. Knowing MPC, one can predict the future course of income, consumption

expenditure, and employment following a change in the government’s fiscal policies.

Suppose we have the estimated consumption function given in (I.3.3). Suppose further the government believes that consumer expenditure of about 4900 (billions of 1992 dollars) will keep the unemployment rate at its current level.

LOGIT, PROBIT & TOBIT MODELS

Logit and probit models are appropriate when attempting to model a dichotomous dependent

variable, e.g. yes/no, agree/disagree, like/dislike, etc. The problems with utilizing the familiar

linear regression line are most easily understood visually. As an example, say we want to model

whether somebody does or does not have Bieber fever by how much beer they’ve consumed. We

collect data from a college frat house and attempt to model the relationship with linear (OLS)

regression.

There are several problems with this approach. First, the regression line may lead to predictions

outside the range of zero and one. Second, the functional form assumes the first beer has the same

marginal effect on Bieber fever as the tenth, which is probably not appropriate. Third, a residuals

plot would quickly reveal heteroskedasticity.

Logit and probit models solve each of these problems by fitting a nonlinear, S-shaped function to the data.

The straight line has been replaced by an S-shaped curve that 1) respects the boundaries of the

dependent variable; 2) allows for different rates of change at the low and high ends of the beer

scale; and 3) (assuming proper specification of independent variables) does away with

heteroskedasticity.

What Logit and probit do, in essence, is take the linear model and feed it through a function to yield a nonlinear relationship. The linear regression predictor looks like

α + βx

and Logit and probit differ in how they define the function f(·) applied to that predictor. The Logit model uses something called the

cumulative distribution function of the logistic distribution. The probit model uses something

called the cumulative distribution function of the standard normal distribution to define f(·). Both

functions will take any number and rescale it to fall between 0 and 1. Hence, whatever α + βx

equals, it can be transformed by the function to yield a predicted probability. Any function that

would return a value between zero and one would do the trick, but there is a deeper theoretical

model underpinning Logit and probit that requires the function to be based on a probability

distribution. The logistic and standard normal cdfs turn out to be convenient mathematically and

are programmed into just about any general purpose statistical package.
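The two links can be sketched in a few lines of code; the standard normal cdf is written here with math.erf, and the coefficients a and b below are hypothetical, chosen only for illustration:

```python
import math

def logit_prob(z):
    """Logistic cdf: maps any linear predictor z = a + b*x into (0, 1)."""
    return 1 / (1 + math.exp(-z))

def probit_prob(z):
    """Standard normal cdf, expressed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Both links send z = 0 to 0.5 and squash extreme predictors toward 0 or 1.
a, b = -1.0, 0.5  # hypothetical coefficients
for x in (0, 2, 10):
    z = a + b * x
    print(round(logit_prob(z), 3), round(probit_prob(z), 3))
```

Note the S-shape: equal steps in x produce large probability changes near the middle of the scale and small changes near 0 or 1, which is the point made above about differing marginal effects.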

Is Logit better than probit, or vice versa? Both methods will yield similar (though not identical)

inferences. Logit – also known as logistic regression – is more popular in health sciences like

epidemiology partly because coefficients can be interpreted in terms of odds ratios. Probit models

can be generalized to account for non-constant error variances in more advanced econometric settings (known as heteroskedastic probit models) and hence are used in some contexts by economists and political scientists. If these more advanced applications are not of relevance, then it does not matter which method you choose to go with.

Tobit Model

• This model is for a metric dependent variable that is “limited” in the sense that we observe it only if it is above or below some cutoff level. For example,

–wages may be limited from below by the minimum wage

–the donation amount given to charity

–“top coding” income at, say, $300,000

–time use and leisure activity of individuals

–extramarital affairs

• It is also called the censored regression model. Censoring can be from below or from above, also called left and right censoring. [Do not confuse the term “censoring” with the one used in dynamic modeling.]

Uses of Logit Model:

Logistic regression is used in various fields, including machine learning, most medical

fields, and social sciences. For example, the Trauma and Injury Severity Score (TRISS),

which is widely used to predict mortality in injured patients, was originally developed by

Boyd et al. using logistic regression. Another example might be to predict whether an

American voter will vote Democratic or Republican, based on age, income, sex, race, state

of residence, votes in previous elections, etc. The technique can also be used

in engineering, especially for predicting the probability of failure of a given process,

system or product. It is also used in marketing applications such as prediction of a

customer's propensity to purchase a product or halt a subscription, etc. In economics it

can be used to predict the likelihood of a person's choosing to be in the labor force, and

a business application would be to predict the likelihood of a homeowner defaulting on

a mortgage.

Examples of probit regression

Example 1: Suppose that we are interested in the factors that influence whether a

political candidate wins an election. The outcome (response) variable is binary (0/1); win

or lose. The predictor variables of interest are the amount of money spent on the

campaign, the amount of time spent campaigning negatively and whether the candidate

is an incumbent.

Example 2: A researcher is interested in how variables, such as GRE (Graduate Record

Exam scores), GPA (grade point average) and prestige of the undergraduate institution,

affect admission into graduate school. The response variable, admit/don’t admit, is a

binary variable.

Examples of Tobit Model

Example 1.

In the 1980s there was a federal law restricting speedometer readings to no more than

85 mph. So if you wanted to try and predict a vehicle’s top-speed from a combination of

horse-power and engine size, you would get a reading no higher than 85, regardless of

how fast the vehicle was really traveling. This is a classic case of right-censoring (censoring

from above) of the data. The only thing we are certain of is that those vehicles were

traveling at least 85 mph.
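The speedometer example can be written out directly: the recorded value is the minimum of the true top speed and the 85 mph cap. The true speeds below are hypothetical:

```python
# Right-censoring as in the speedometer example: readings are capped at 85 mph,
# whatever the true top speed. True speeds below are hypothetical.
CAP = 85.0
true_speeds = [72.0, 84.0, 91.0, 103.0, 85.0]
observed = [min(s, CAP) for s in true_speeds]
censored = [s > CAP for s in true_speeds]  # True where we only know speed >= 85

print(observed)   # prints: [72.0, 84.0, 85.0, 85.0, 85.0]
print(censored)   # prints: [False, False, True, True, False]
```

Fitting OLS to the observed (capped) values would bias the estimates; the tobit model accounts for the censoring explicitly.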

Example 2. A research project is studying the level of lead in home drinking water as a

function of the age of a house and family income. The water testing kit cannot detect lead

concentrations below 5 parts per billion (ppb). The EPA considers levels above 15 ppb to

be dangerous. These data are an example of left-censoring (censoring from below).

Example 3. Consider the situation in which we have a measure of academic aptitude

(scaled 200-800) which we want to model using reading and math test scores, as well as,

the type of program the student is enrolled in (academic, general, or vocational). The

problem here is that students who answer all questions on the academic aptitude test

correctly receive a score of 800, even though it is likely that these students are not “truly”

equal in aptitude. The same is true of students who answer all of the questions

incorrectly. All such students would have a score of 200, although they may not all be of

equal aptitude.

Importance of Quantitative Research

2. Can use statistics to generalize a finding

3. Often reduces and restructures a complex problem to a limited number of

variables

4. Looks at relationships between variables and can establish cause and effect in

highly controlled circumstances

5. Tests theories or hypotheses

6. Assumes sample is representative of the population

7. Subjectivity of researcher in methodology is recognized less

8. Less detailed than qualitative data and may miss a desired response from the

participant

1. In modern business it is not possible to rely on unscientific decisions based on intuition. Quantitative techniques provide scientific methods for tackling the various problems of modern business.

2. Tools for scientific analysis– Quantitative techniques provide the managers with a

variety of tools from mathematics, statistics, economics and operational research.

These tools help the manager to provide a more precise description and solution

of the problem. The solutions obtained by using quantitative techniques are often

free from the bias of the manager or the owner of the business.

3. Solution for various business problems. Quantitative techniques provide solutions

to almost every area of a business. These can be used in production, marketing,

inventory, finance and other areas to find answers to various questions, like: (a) How should the resources be used in production so that profits are maximized? (b) How should production be matched to demand so as to minimize the cost of inventory?

4. Optimum allocation of resources – An allocation of resources is said to be optimal if either a given level of output is being produced at minimum cost or maximum output is being produced at a given cost. Quantitative techniques enable a manager to optimally allocate the resources of a business or industry.

5. Selection of an optimal strategy– Using quantitative techniques it is possible to

determine the optimal strategy of a business or firm that is facing competition

from its rivals. The techniques for determining the optimal strategy are based on game theory.

6. Optimal deployment of resources – Using quantitative techniques it is possible to find out the earliest and latest times for successful completion of a project; this is the basis of the program evaluation and review technique (PERT).

7. Facilitate the process of decision making – Quantitative techniques provide a

method of decision making in the face of uncertainty. These techniques are based

upon decision theory.

What is Multicollinearity?

Multicollinearity generally occurs when there are high correlations between two or

more predictor variables. In other words, one predictor variable can be used to predict the

other. This creates redundant information, skewing the results in a regression model.

Examples of correlated predictor variables (also called multicollinearity predictors) are: a

person’s height and weight, age and sales price of a car, or years of education and annual

income.

An easy way to detect multicollinearity is to calculate correlation coefficients for all pairs

of predictor variables. If the correlation coefficient, r, is exactly +1 or -1, this is called

perfect multicollinearity. If r is close to or exactly -1 or +1, one of the variables should be

removed from the model if at all possible.
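This detection step is easy to sketch: compute r for each pair of predictors. The height and weight figures below are hypothetical, echoing the correlated-predictor example above:

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient between two predictor variables."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictors: height and weight move closely together, so
# including both in one regression invites multicollinearity.
height = [150, 160, 165, 172, 180, 188]
weight = [52, 60, 63, 70, 79, 88]
print(round(pearson_r(height, weight), 3))
```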

It’s more common for multicollinearity to rear its ugly head in observational studies; it’s

less common with experimental data. When the condition is present, it can result in

unstable and unreliable regression estimates. Several other problems can interfere with

analysis of results, including:

The t-statistic will generally be very small and coefficient confidence intervals will be

very wide. This means that it is harder to reject the null hypothesis.

The partial regression coefficient may be an imprecise estimate; standard errors may

be very large.

Partial regression coefficients may have sign and/or magnitude changes as they pass

from sample to sample.

Multicollinearity makes it difficult to gauge the effect of independent

variables on dependent variables.

The two types are:

Data-based multicollinearity: caused by poorly designed experiments or by data collection methods that cannot be manipulated (100% observational data). In some cases, variables may be highly correlated (usually due to collecting data from purely observational studies) and there is no error on the researcher’s part. For this reason, you should conduct experiments whenever possible, setting the levels of the predictor variables in advance.

Structural multicollinearity: caused by you, the researcher, creating new predictor variables.

Causes for multicollinearity can also include:

Insufficient data. In some cases, collecting more data can resolve the issue.

Dummy variables may be incorrectly used. For example, the researcher may fail to

exclude one category, or add a dummy variable for every category (e.g. spring,

summer, autumn, winter).

Including a variable in the regression that is actually a combination of two other

variables. For example, including “total investment income” when total investment

income = income from stocks and bonds + income from savings interest.

Including two identical (or almost identical) variables. For example, weight in pounds

and weight in kilos, or investment income and savings/bond income.


What is Multicollinearity?

As stated in the lesson overview, multicollinearity exists whenever two or more of the predictors in a

regression model are moderately or highly correlated. Now, you might be wondering why a researcher

can’t just collect his data in such a way to ensure that the predictors aren't highly correlated. Then,

multicollinearity wouldn't be a problem, and we wouldn't have to bother with this silly lesson.

Unfortunately, researchers often can't control the predictors. Obvious examples include a person's

gender, race, grade point average, math SAT score, IQ, and starting salary. For each of these predictor

examples, the researcher just observes the values as they occur for the people in her random sample.

Multicollinearity happens more often than not in such observational studies. And, unfortunately,

regression analyses most often take place on data obtained from observational studies. If you aren't

convinced, consider the example data sets for this course. Most of the data sets were obtained from

observational studies, not experiments. It is for this reason that we need to fully understand the impact

of multicollinearity on our regression analyses.

Types of multicollinearity

There are two types of multicollinearity:

Structural multicollinearity is a mathematical artifact caused by creating new predictors from other

predictors — such as creating the predictor x² from the predictor x.

Data-based multicollinearity, on the other hand, is a result of a poorly designed experiment, reliance

on purely observational data, or the inability to manipulate the system on which the data are

collected.

In the case of structural multicollinearity, the multicollinearity is induced by what you have done. Data-

based multicollinearity is the more troublesome of the two types of multicollinearity. Unfortunately it

is the type we encounter most often!
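The structural case can be seen numerically: a predictor x² created from x is highly correlated with x by construction. A sketch with x running from 1 to 10:

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - xbar) ** 2 for a in x)
                           * sum((b - ybar) ** 2 for b in y))

# Structural multicollinearity: x^2 is built from x itself, so the two
# predictors are strongly correlated by construction.
x = list(range(1, 11))
x_squared = [v ** 2 for v in x]
print(round(pearson_r(x, x_squared), 4))  # prints: 0.9746
```

Centering x before squaring is a common remedy for this induced correlation.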

Example

Let's take a quick look at an example in which multicollinearity exists. Some researchers observed — notice the choice of word! — the following data

(bloodpress.txt) on 20 individuals with high blood pressure:

blood pressure (y = BP, in mm Hg)

age (x1 = Age, in years)

weight (x2 = Weight, in kg)

body surface area (x3 = BSA, in sq m)

duration of hypertension (x4 = Dur, in years)

basal pulse (x5 = Pulse, in beats per minute)

stress index (x6 = Stress)

The researchers were interested in determining if a relationship exists between blood pressure and age,

weight, body surface area, duration, pulse rate and/or stress level.

Dummy Variables

A dummy variable is a numerical variable used in regression analysis to

represent subgroups of the sample in your study. In research design, a

dummy variable is often used to distinguish different treatment groups. In

the simplest case, we would use a 0,1 dummy variable where a person

is given a value of 0 if they are in the control group or a 1 if they are in

the treated group. Dummy variables are useful because they enable us

to use a single regression equation to represent multiple groups. This

means that we don't need to write out separate equation models for each

subgroup. The dummy variables act like 'switches' that turn various

parameters on and off in an equation. Another advantage of a 0,1

dummy-coded variable is that even though it is a nominal-level variable

you can treat it statistically like an interval-level variable (if this made no

sense to you, you probably should refresh your memory on levels of

measurement). For instance, if you take an average of a 0,1 variable, the

result is the proportion of 1s in the distribution.

Consider the regression model yi = β0 + β1Zi + ei for a posttest-only two-group randomized experiment, where Z is a 0,1 dummy variable. This model is essentially the same as conducting a t-test on the posttest means for two groups or conducting a one-way Analysis of Variance (ANOVA). The key term in the model is β1, the estimate of the difference between the groups. To see how dummy variables work, we'll use this simple model to show you how to use them to pull out the separate sub-equations for each subgroup. Then we'll show how you estimate the difference between the subgroups by subtracting their respective equations. You'll see that we can pack an enormous amount of information into a single equation using dummy variables. All I want to show you here is that β1 is the difference between the treatment and control groups.

To see this, the first step is to compute what the equation would be for each of our two groups separately. For the control group, Z = 0. When we substitute that into the equation, and recognize that by assumption the error term averages to 0, we find that the predicted value for the control group is β0, the intercept. Now, to figure out the treatment group line, we substitute the value of 1 for Z, again recognizing that by assumption the error term averages to 0. The equation for the treatment group indicates that the treatment group value is the sum of the two beta values, β0 + β1.

We want the difference between the groups. How do we determine that? Well, the difference must be the difference between the equations for the two groups that we worked out above. In other words, to find the difference between the groups we just find the difference between the equations for the two groups! It should be obvious from the figure that the difference is (β0 + β1) − β0 = β1. Think about what this means. The difference between the groups is β1. OK, one more time just for the sheer heck of it: the difference between the groups in this model is β1!

Whenever you have a regression model with dummy variables, you can

always see how the variables are being used to represent multiple

subgroup equations by following the two steps described above:

create separate equations for each subgroup by substituting the dummy values

find the difference between groups by finding the difference between their equations
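These two steps can be checked numerically. In a fit of y = b0 + b1·Z with a single 0,1 dummy, the least-squares intercept equals the control-group mean and the slope equals the difference in group means. The posttest scores below are hypothetical:

```python
# With a single 0,1 dummy Z, the least-squares fit y = b0 + b1*Z reduces to
# b0 = mean of the Z = 0 group and b1 = difference of the two group means.
# Posttest scores below are hypothetical.
control = [60.0, 64.0, 66.0, 70.0]  # Z = 0
treated = [68.0, 72.0, 75.0, 77.0]  # Z = 1

b0 = sum(control) / len(control)       # control-group mean
b1 = sum(treated) / len(treated) - b0  # between-group difference
print(b0, b1)  # prints: 65.0 8.0

# Cross-check with the least-squares slope formula on the stacked data:
Z = [0] * len(control) + [1] * len(treated)
y = control + treated
zbar = sum(Z) / len(Z)
ybar = sum(y) / len(y)
slope = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(Z, y)) / \
        sum((zi - zbar) ** 2 for zi in Z)
print(slope)  # prints: 8.0
```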

A dummy variable is a dichotomous variable which has been coded to represent a variable with a

higher level of measurement. Dummy variables are often used in multiple linear regression (MLR).

Dummy coding refers to the process of coding a categorical variable into dichotomous variables. For

example, we may have data about participants' religion, with each participant coded as follows:

A categorical or nominal variable with three categories

Religion Code

Christian 1

Muslim 2

Atheist 3

This is a nominal variable (see level of measurement) which would be inappropriate as a predictor in

MLR. However, this variable could be represented using a series of three dichotomous variables

(coded as 0 or 1), as follows:

Full dummy coding for a categorical variable with three categories

Religion Christian Muslim Atheist

Christian 1 0 0

Muslim 0 1 0

Atheist 0 0 1

There is some redundancy in this dummy coding. For instance, in this simplified data set, if we know

that someone is not Christian and not Muslim, then they are Atheist.

So we only need to use two of these three dummy-coded variables as predictors. More generally, the

number of dummy-coded variables needed is one less than the number of categories.

Choosing which dummy variable to omit is up to the researcher and depends on the research question. For

example, if I'm interested in the effect of being religious, my reference (or baseline) category would be

Atheist. I would then be interested to see whether the extent to which being Christian (0 (No) or 1

(Yes)) or Muslim (0 (No) or 1 (Yes)) predicts the variance in a dependent variable (such as Happiness)

in a regression analysis. In this case, the dummy coding to be used would be the following subset of

the previous full dummy coding table:

Dummy coding for a categorical variable with three categories, using Atheist as the reference

category

Religion Christian Muslim

Christian 1 0

Muslim 0 1

Atheist 0 0

Alternatively, I may simply be interested to recode into a single dichotomous variable to indicate, for

example, whether a participant is Atheist (0) or Religious (1), where Religious is Christian or Muslim.

The coding would be as follows:

A dichotomous variable recoding religion into two categories

Religiosity Code

Atheism 0

Religious 1
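Dummy coding like the tables above can be produced mechanically. This is a sketch using pandas (the data are hypothetical; the reference category is dropped explicitly rather than with `drop_first`, so we control which category is the baseline):

```python
import pandas as pd

# Hypothetical data with a three-category nominal variable
df = pd.DataFrame({"Religion": ["Christian", "Muslim", "Atheist", "Muslim"]})

# Full dummy coding: one 0/1 column per category (columns are alphabetical)
full = pd.get_dummies(df["Religion"]).astype(int)

# Using Atheist as the reference (baseline) category:
# keep only the Christian and Muslim dummies for use in MLR
coded = full[["Christian", "Muslim"]]
print(coded)
```

An Atheist participant gets 0 on both retained dummies, which is exactly the redundancy the text describes: with k categories, k - 1 dummies suffice.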

Autocorrelation

In this part of the book (Chapters 20 and 21), we discuss issues especially related to the study of

economic time series. A time series is a sequence of observations on a variable over time.

Macroeconomists generally work with time series (e.g., quarterly observations on GDP and monthly

observations on the unemployment rate). Time series econometrics is a huge and complicated subject.

Our goal is to introduce you to some of the main issues.

We concentrate in this book on static models. A static model deals with the contemporaneous

relationship between a dependent variable and one or more independent variables. A simple example
would be a model that relates average cigarette consumption in a given year for a given state to the
average real price of cigarettes in that year:

    Quantity_t = β0 + β1 RealPrice_t + ε_t

In this model we assume that the price of cigarettes in a given year affects quantity demanded in that
year. In many cases, a static model does not adequately capture the relationship between the variables

of interest. For example, cigarettes are addictive, and so quantity demanded this year might depend on

prices last year. Capturing this idea in a model requires some additional notation and terminology. If we

denote year t’s real price by RealPrice_t, then the previous year’s price is RealPrice_t-1. The latter
quantity is called a one-period lag of RealPrice. We could then write down a distributed lag model:

    Quantity_t = β0 + β1 RealPrice_t + β2 RealPrice_t-1 + ε_t

Although highly relevant to time series applications, distributed lag models are an advanced topic which
we will not cover in this book.
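In practice, a one-period lag such as RealPrice_t-1 is just the same column shifted down one row. A minimal sketch with made-up prices, using pandas' `shift`:

```python
import pandas as pd

# Hypothetical annual data; the values are made up for illustration
df = pd.DataFrame({
    "Year": [1990, 1991, 1992, 1993],
    "RealPrice": [1.10, 1.25, 1.40, 1.35],
})

# One-period lag: row t gets year t-1's price; the first row has no
# previous year, so its lag is missing (NaN)
df["RealPrice_lag1"] = df["RealPrice"].shift(1)
print(df)
```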

As always, before we can proceed to draw inferences from regressions on sample data, we need a
model of the data generating process. We will attempt to stick as close as possible to the classical

econometric model. Thus, to keep things simple, in our discussion of static models we continue to

assume that the X’s, the independent variables, are fixed in repeated samples. Although this assumption

is pretty clearly false for most time series, for static models it does not do too much harm to pretend it

is true. Chapter 21 points out how things change when one considers more realistic models for the data

generating process.

Unfortunately, we cannot be so cavalier with another key assumption of the classical econometric model:

the assertion that the error terms for each observation are independent of one another. In the case we

are considering, the error term reflects omitted variables that influence the demand for cigarettes. For

example, social attitudes toward cigarette smoking and the amount of cigarette advertising both

probably affect the demand for cigarettes. Now social attitudes are fairly similar from one year to the

next, though they may vary considerably over longer time periods. Thus, social attitudes in 1961 were

probably similar to those in 1960, and those in 1989 were probably similar to those in 1988. If that is

true and if social attitudes are an important component of the error term in our model of cigarette

demand, the assumption of independent error terms across observations is violated.

These considerations apply quite generally. In most time series, it is plausible that the omitted variables

change slowly over time. Thus, the influence of the omitted variable is similar from one time period to

the next. Therefore, the error terms are correlated with one another. This violation of the classical

econometric model is generally known as autocorrelation of the errors. As is the case with

heteroskedasticity, OLS estimates remain unbiased, but the estimated SEs are biased.

For both heteroskedasticity and autocorrelation there are two approaches to dealing with the problem.

You can either attempt to correct the bias in the estimated SE, by constructing a heteroskedasticity- or

autocorrelation-robust estimated SE, or you can transform the original data and use generalized least

squares (GLS) or feasible generalized least squares (FGLS). The advantage of the former method is that

it is not necessary to know the exact nature of the heteroskedasticity or autocorrelation to come up with

consistent estimates of the SE. The advantage of the latter method is that, if you know enough about

the form of the heteroskedasticity or autocorrelation, the GLS or FGLS estimator has a smaller SE than

OLS. In our discussion of heteroskedasticity we have chosen to emphasize the first method of dealing

with the problem; this chapter emphasizes the latter method. These choices reflect the actual practice

of empirical economists who have spent much more time trying to model the exact nature of the

autocorrelation in their data sets than the heteroskedasticity.

In this chapter, we analyze autocorrelation in the errors and apply the results to the study of static time

series models. In many ways our discussion of autocorrelation parallels that of heteroskedasticity. The

chapter is organized in four main parts:

Understanding Autocorrelation

Consequences of Autocorrelation for the OLS estimator

Diagnosing the Presence of Autocorrelation

Correcting for Autocorrelation

Chapter 21 goes on to consider several topics that stem from the discussion of

autocorrelation in static models: trends and seasonal adjustment, issues surrounding the data

generation process (stationarity and weak dependence), forecasting, and lagged dependent variable

models.

Autocorrelation

Autocorrelation is a characteristic of data in which values of the same variable are correlated across
related observations. It violates the assumption of instance independence, which underlies most
conventional models. It generally arises in data sets in which the data, instead of being randomly
selected, come from the same source.

Presence

The presence of autocorrelation is generally unexpected by the researcher. It occurs mostly

due to dependencies within the data. Its presence is a strong motivation for those

researchers who are interested in relational learning and inference.

Examples

In order to understand autocorrelation, we can discuss some instances that are based upon

cross sectional and time series data. In cross sectional data, if the change in the income of

a person A affects the savings of person B (a person other than person A), then

autocorrelation is present. In the case of time series data, if the observations show intercorrelation,
specifically in those cases where the time intervals are small, then these intercorrelations are
referred to as autocorrelation.

In time series data, autocorrelation is defined as the lagged correlation of a series with itself:
the series is correlated with its own values, delayed by some specific number of time units. Serial
correlation, on the other hand, refers to the lag correlation between two different series in time
series data.
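The "delayed correlation" idea can be computed directly: the lag-1 autocorrelation of a series is just the correlation between the series and itself shifted by one period. The numbers below are made up for illustration:

```python
import numpy as np

# A short, made-up upward-drifting series
series = np.array([2.0, 2.1, 2.3, 2.2, 2.5, 2.7, 2.6, 2.9])

# Lag-1 autocorrelation: correlate the series with its one-period lag
lag1 = np.corrcoef(series[1:], series[:-1])[0, 1]
print(lag1)
```

Because the series drifts upward, neighboring values move together and the lag-1 autocorrelation comes out strongly positive.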

Patterns

Autocorrelation depicts various types of curves which show certain kinds of patterns, for

example, a curve that shows a discernible pattern among the residual errors, a curve that

shows a cyclical pattern of upward or downward movement, and so on.

In time series, it generally occurs due to sluggishness or inertia within the data. If a researcher
working with time series data uses an incorrect functional form, this too can cause autocorrelation.

The handling of the data by the researcher, when it involves extrapolation and interpolation,

can also give rise to autocorrelation. Thus, one should make the data stationary in order to

remove autocorrelation in the handling of time series data.

If a series (like an economic series) depicts a sustained upward or downward pattern, then the series is
considered to exhibit positive autocorrelation. If, on the other hand, the series constantly alternates
between upward and downward movements, then the series is considered to exhibit negative
autocorrelation.

When a researcher applies ordinary least squares in the presence of autocorrelation, the resulting
estimator is inefficient.

A very popular test, the Durbin-Watson test, detects the presence of autocorrelation. If the
researcher detects autocorrelation in the data, the first step is to determine whether it is pure
autocorrelation or the result of model misspecification. If it is pure, the original model can be
transformed into one that is free from pure autocorrelation.
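Detection and correction can both be sketched with simulated data. This is a sketch only — the AR(1) coefficient (0.6) and all variable names are made up, and the transformation shown is a Cochrane-Orcutt style quasi-differencing, one common way to remove pure AR(1) autocorrelation:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)

# Simulate y with AR(1) errors (rho = 0.6, made up for illustration)
u = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + u[t]
y = 1.0 + 2.0 * x + e

# OLS fit, then Durbin-Watson on the residuals
# (values near 2 suggest no autocorrelation; well below 2, positive autocorrelation)
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
dw = durbin_watson(resid)

# Estimate rho from the residuals, then quasi-difference and re-fit:
# y*_t = y_t - rho * y_{t-1}, and likewise for the regressors
rho_hat = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
y_star = y[1:] - rho_hat * y[:-1]
X_star = X[1:] - rho_hat * X[:-1]
beta_gls = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
print(dw, rho_hat, beta_gls)
```

With strong positive autocorrelation the Durbin-Watson statistic falls well below 2, and the quasi-differenced regression recovers the slope while its errors are (approximately) free of pure autocorrelation.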
