
The nature of regression analysis

Chapter 2
Learning goals

• Why do we use regression analysis?
• Different kinds of functional forms and relationships between the dependent and independent variables
• How to get the best-fitting line for the regression model
• Significance of the error term and the intercept
• Regression vs. correlation
Nature of regression analysis
• In simple regression models, such as the two-variable regression model, one variable, known as the dependent variable, is expressed as a linear function of one or more other variables, known as the independent variables.
• In such models, it is implicitly assumed that the causal relationship, if any, between the dependent and independent variable(s) flows in one direction only, namely from the independent/explanatory variables to the dependent variable.
Why do we use regression analysis?
• Typically, regression analysis is used for one or more of the following three purposes:
a. Modeling the relationship between X and Y
b. Predicting/forecasting the target variable
c. Testing hypotheses about the relationship between X and Y
• Before discussing the nature of regression analysis, we need to know about the possible kinds of relationships that may exist between the dependent and independent variable(s) included in a regression model.
Types of relationships
• The relationship between an explained (dependent) variable and explanatory (independent) variable(s) may be of two types:
a. Deterministic or mathematical
b. Stochastic or statistical

Types of relationships: Deterministic
• These types of deterministic relationships play only a minor role in econometrics.
• For example, a deterministic or mathematical (exact) relationship between a dependent variable Y and an independent variable X can be written as:

Y = β1 + β2X        (EQ. 1)
Sources of randomness
• Real-world economic relationships are not as precise as the one in EQ. 1. This is because of:
a. Measurement error
b. Omitted variables
c. Other factors, e.g., a random component of Y that is not related to X
• Consider the Keynesian consumption function: consumption goes up as income goes up, but not exactly at the same rate as income does.
Stochastic relationship
• Adding a stochastic error term u, which captures these sources of randomness, to the deterministic relationship gives the stochastic or statistical form:

Y = β1 + β2X + u        (EQ. 2)
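As a quick illustration, here is a minimal sketch in Python with made-up parameter values (β1 = 50, β2 = 0.8) and a normal error term, chosen purely for demonstration: EQ. 1 generates the exact line, while EQ. 2 scatters the observed values around it.

```python
import numpy as np

# Hypothetical parameter values, for illustration only
beta1, beta2 = 50.0, 0.8                     # intercept and slope
rng = np.random.default_rng(0)

income = np.linspace(100, 1000, 50)          # X: income
consumption_exact = beta1 + beta2 * income   # EQ. 1: deterministic relationship
u = rng.normal(0, 25, size=income.shape)     # stochastic error term
consumption_obs = consumption_exact + u      # EQ. 2: stochastic relationship

print(consumption_obs[:5])
```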
Meaning of the term “Regression Analysis”
❶ Modeling the relationship between the dependent and independent variables; in other words, establishing whether there exists any relationship between the dependent and independent variable(s).
• We have one particular variable that we are interested in understanding/modeling, for example, the sales of a particular product or the stock price of a publicly traded firm. This variable is called the target or dependent variable and is usually represented by Y.
• We have a set of p other variables that we think might be helpful in predicting/modeling the target variable.
Meaning of the term “Regression Analysis”
❷ Testing hypotheses about the nature of the dependence between the dependent and independent variables, or hypotheses suggested by economic theory or common sense. Put differently, we need to know whether the relationship specified in a regression model is statistically significant or not.
❸ Predicting/forecasting the mean/average value of the target or dependent variable, given the values of the independent variable(s).
Meaning of the term “Regression Analysis”
• A positive value of the slope coefficient β2 is consistent with a direct relationship between x and y; that is, higher (lower) values of y are related to higher (lower) values of x. For a negative β2, lower (higher) values of y are related to higher (lower) values of x.
Finding the best fitting line
• The main task of regression analysis is to find a line, consistent with economic theory, that best fits the data. The best-fitting line is the one that passes through the points in such a way that positive and negative deviations (errors) cancel each other out.
• Therefore, the best-fitting line makes the total error zero, that is, Σeᵢ = 0, or at least makes the total error as small as possible, that is, min Σeᵢ.
• However, an infinite number of lines exists that will minimize the sum of residuals. The solution is to minimize the sum of squared residuals, that is, min Σeᵢ².
• This results in a unique regression line that is the closest to the actual data points. This procedure explains why regression analysis is often called least squares analysis.
Finding the best fitting line
• The intuitive idea behind regression analysis can be illustrated graphically. The vertical distance between the regression line and each observed point is termed the residual or error. We wish to fit the regression line in such a way as to make these residuals as small as possible. There are a number of ways of doing this; the most obvious (and also an efficient) way is to minimize the sum of squared residuals.
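A minimal sketch of this idea, assuming the two-variable model above and simulated data with made-up coefficients: the intercept and slope that minimize Σeᵢ² have a well-known closed-form solution, and the resulting residuals sum to (essentially) zero.

```python
import numpy as np

def least_squares_fit(x, y):
    """Return the intercept and slope that minimize the sum of squared residuals."""
    x_bar, y_bar = x.mean(), y.mean()
    slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Simulated data, for illustration only
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=x.shape)

b1, b2 = least_squares_fit(x, y)
residuals = y - (b1 + b2 * x)
print(b1, b2)                  # close to the true 2.0 and 0.5
print(residuals.sum())         # approximately zero
print((residuals ** 2).sum())  # the minimized sum of squared residuals
```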
Significance of the Stochastic Error Term
• We always need to keep in mind that a regression model is a simplified representation of a real-world problem.
❶ For instance, saying that the quantity demanded of oranges depends on their price is a simplified representation of reality, because there are a host of other variables that one can think of that determine the demand for oranges.
• For this reason, it is often suggested to keep a regression model as simple as possible until it is proved to be an inadequate representation of reality.
Significance of the Stochastic Error Term
❷ Even if we have theoretically correct variable(s) in our regression model explaining a phenomenon, and even if we can obtain data on these variables, very often we do not know the form of the functional relationship between the dependent and independent variables.
• Is consumption expenditure a linear or nonlinear function of income? If it is the former, Y = β1 + β2X + u is the proper functional form.
• If it is the latter, that is, nonlinear, then a form such as Y = β1 + β2X + β3X² + u may be the proper functional form of the relationship between x and y.
❸ In addition, because of errors in measurement, the vagaries of human behavior, and the use of poor-quality variables, the disturbance term plays a critical role in regression analysis.
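One informal way to see the difference between the two specifications, sketched below with simulated data (the coefficients are invented for illustration): fit both forms by least squares and compare how much variation each leaves unexplained. The choice of functional form should still be guided by theory, not by the fit alone.

```python
import numpy as np

# Hypothetical income/consumption data, for illustration only
rng = np.random.default_rng(2)
income = np.linspace(1, 10, 60)
consumption = 1.0 + 0.9 * income - 0.03 * income ** 2 + rng.normal(0, 0.2, size=income.shape)

# Candidate functional forms, each fitted by least squares
linear = np.polyfit(income, consumption, deg=1)      # Y = b1 + b2*X
quadratic = np.polyfit(income, consumption, deg=2)   # Y = b1 + b2*X + b3*X^2

# Compare the sum of squared residuals of the two specifications
sse_linear = np.sum((consumption - np.polyval(linear, income)) ** 2)
sse_quadratic = np.sum((consumption - np.polyval(quadratic, income)) ** 2)
print(sse_linear, sse_quadratic)
```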
Significance of the Intercept Term
• In a regression of orange consumption on price, the intercept gives the consumption at zero price, which is positive. But in reality, the price of a product can never be zero, unless it is a free good, like the air we breathe (note, though, that pollution-free air is not a free good!). For this reason, the value of the intercept in regression analysis often may not have any particular economic meaning.
Significance of the Intercept Term
• Now, one relevant question: if the intercept term does not have any economic meaning, why do we keep a constant term in our regression model? The answer is that in regression analysis we keep an intercept term for mathematical reasons: omitting it forces the fitted line through the origin, which can distort the slope estimate and means the residuals no longer sum to zero.
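A small sketch of that mathematical point, using simulated data with a nonzero true intercept (the numbers are made up): dropping the constant forces the fitted line through the origin, and the slope estimate absorbs the difference.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(1, 10, 50)
y = 5.0 + 0.5 * x + rng.normal(0, 0.3, size=x.shape)   # true intercept is nonzero

# With an intercept: least squares on the columns [1, x]
X = np.column_stack([np.ones_like(x), x])
b_with, *_ = np.linalg.lstsq(X, y, rcond=None)

# Without an intercept: the fitted line is forced through the origin
b_without, *_ = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)

print(b_with)     # close to the true (5.0, 0.5)
print(b_without)  # slope is distorted to compensate for the missing intercept
```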
Regression & Causality
• While regression results cannot be used to prove that a causal
relationship exists between the data, regression models are
always constructed with the idea of causality in mind.
• That is, we must assume that one variable is affected (caused)
by other variable(s). But on the basis of regression results,
we cannot confirm a causal relationship between
variables unless it is supported by theory or at least by
common sense.
Regression & Causality
• Suppose that you found a strong statistical result to support
the hypothesis that rainfall does depend on the crop yield.
Although there is no theory that bolsters such statistical
finding, we know from our common sense that such
hypothesis is simply absurd.
• “A statistical relationship, however strong and however
suggestive, can never establish causal connection: our ideas
of causation must come from outside statistics, ultimately
from some theory or common sense”- Kendall & Stuart
Regression and Correlation
• The two concepts are rather different. The primary objective of correlation analysis is to measure the strength or degree of linear association between two variables. In regression analysis, by contrast, we try to estimate or predict the average value of one variable on the basis of the fixed values of the other variables.
• In regression analysis, the dependent variable is assumed to be statistical or random; that is, it can only be described by a probability distribution. The independent variables, on the other hand, are assumed to have fixed values. In correlation analysis, there is no such distinction between the dependent and independent variables, and both of them are assumed to be stochastic.
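To make the distinction concrete, here is a small sketch with simulated data: the correlation coefficient is the same whichever variable we list first, while the regression slope of y on x differs from the slope of x on y.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200)                 # treated as fixed in regression analysis
y = 1.0 + 2.0 * x + rng.normal(size=200)

# Correlation: a single, symmetric measure of linear association
r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]

# Regression: the slope of y on x differs from the slope of x on y
slope_y_on_x = np.polyfit(x, y, deg=1)[0]
slope_x_on_y = np.polyfit(y, x, deg=1)[0]

print(r_xy, r_yx)                  # identical
print(slope_y_on_x, slope_x_on_y)  # different
```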
