Part 3-Quantitative Analysis
[Figure: bar chart by Treatment Group (groups 1, 2 and 3); vertical axis from 0 to 30]
Pie Chart: Lists the categories and presents the percent or
count of individuals who fall in each category.
Measures of Descriptive Statistics
Descriptive statistics are methods for organizing and
summarizing data.
Mode:
$$\mathrm{Var}(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
Standard deviation: the square root of the variance. It is expressed in the
same units as the data.
$$\mathrm{SD}(x) = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$
[Figure: two distributions compared, one with small variance (small SD) and one with large variance (large SD)]
Coefficient of variation (CV):
$$\mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$
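The variance, standard deviation and covariance formulas above can be checked on a small data set. The sketch below is only an illustration: the data values and function names are hypothetical, and it divides by n as the formulas do (many software packages divide by n - 1 by default).

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def std_dev(values):
    """Square root of the variance; same units as the data."""
    return math.sqrt(variance(values))

def covariance(xs, ys):
    """Sum of cross-products of deviations, divided by n."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical observations, chosen only for illustration.
x = [2, 4, 4, 4, 5, 5, 7, 9]
y = [1, 3, 2, 5, 4, 6, 8, 9]

print("Var(x) =", variance(x))
print("SD(x)  =", std_dev(x))
print("Cov(x, y) =", covariance(x, y))
```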
Shape of Frequency Distribution
Skewness:
Kurtosis:
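[Figure: bell-shaped distribution of SCORE from -4 to 4; about 34% of the area lies in each of the two central bands, 14% in each of the next bands, and 2% in each tail]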
CORRELATION BETWEEN VARIABLES
In statistics, dependence refers to any statistical relationship
between two random variables or two sets of data. Correlation
also tells you the degree to which the variables tend to move
together.
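$$r(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}}$$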
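A short sketch of the correlation coefficient, computed directly from sums of deviations. The data and the function name `pearson_r` are hypothetical; the two examples are built so that the variables move together perfectly, once in the same direction and once in opposite directions.

```python
import math

def pearson_r(xs, ys):
    """Correlation: cross-product of deviations over the root of the
    product of the two sums of squared deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y_up = [2 * v + 1 for v in x]    # rises exactly with x
y_down = [10 - v for v in x]     # falls exactly as x rises

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(r_up, r_down)
```

The coefficient always lies between -1 and 1, so its size can be compared across pairs of variables measured in different units.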
ECONOMETRIC ANALYSIS: Robust Regression Analysis
Regression analysis is a statistical tool for the investigation of
relationships between variables. Usually, we seek to ascertain the
causal effect of one variable upon another.
•How then does one know that the data really support the Keynesian theory of
consumption? Is it because the Keynesian consumption function (i.e., the
regression line) shown in Figure I.3 is extremely close to the actual data
points?
•Is it possible that another consumption model (theory) might equally fit the
data as well? For example, Milton Friedman has developed a model of
consumption, called the permanent income hypothesis.
•Robert Hall has also developed a model of consumption, called the life-cycle
permanent income hypothesis. Could one or both of these models also fit the
data in Table I.1?
•In short, the question facing a researcher in practice is how to choose among
competing hypotheses or models of a given phenomenon, such as the
consumption–income relationship.
•Let us use the Keynesian model for the time being. Keynes states that, on average,
consumers increase their consumption as their income increases, but not by as much as the
increase in their income (MPC < 1).
2. Specification of the Mathematical Model of Consumption (single-equation model)
Y = β1 + β2X, 0 < β2 < 1 (I.3.1)
• Y = β1 + β2X + u (I.3.2)
[Figure: the regression line $Y_i = \beta_1 + \beta_2 X_i + u_i$, where $Y_i$ is the dependent (response) variable and $X_i$ is the independent (explanatory) variable]
Ordinary Least Squares Method
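$$E(Y) = \hat{Y} = a + bX$$
$$e = Y - \hat{Y} = Y - (a + bX) = Y - a - bX$$
$$e^2 = (Y - \hat{Y})^2 = (Y - a - bX)^2$$
$$(Y - a - bX)^2 = Y^2 + a^2 + b^2X^2 - 2aY - 2bXY + 2abX$$
$$\sum e^2 = \sum (Y - \hat{Y})^2 = \sum (Y - a - bX)^2$$
$$\min \sum e^2 = \min \sum (Y - a - bX)^2$$
$$\frac{\partial \sum e^2}{\partial a} = \frac{\partial \sum (Y - a - bX)^2}{\partial a} = 2na - 2\sum Y + 2b\sum X = 0$$
$$na - \sum Y + b\sum X = 0$$
$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \bar{Y} - b\bar{X}$$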
Take the partial derivative with respect to b and substitute the expression
for a to obtain:
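$$\frac{\partial \sum e^2}{\partial b} = \frac{\partial \sum (Y - a - bX)^2}{\partial b} = 2b\sum X^2 - 2\sum XY + 2a\sum X = 0$$
$$b\sum X^2 - \sum XY + a\sum X = 0, \qquad a = \bar{Y} - b\bar{X}$$
$$b\sum X^2 - \sum XY + (\bar{Y} - b\bar{X})\sum X = 0$$
$$b\sum X^2 - \sum XY + \left(\frac{\sum Y}{n} - b\,\frac{\sum X}{n}\right)\sum X = 0$$
$$b\sum X^2 - \sum XY + \frac{\sum X \sum Y}{n} - b\,\frac{(\sum X)^2}{n} = 0$$
$$b\,\frac{n\sum X^2 - (\sum X)^2}{n} = \frac{n\sum XY - \sum X \sum Y}{n}$$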
Least squares method is an algebraic solution that minimizes
the sum of squares of errors (variance component of error)
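$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{SP_{xy}}{SS_x}$$
$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \bar{Y} - b\bar{X}$$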
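The closed-form expressions for a and b translate directly into code. The sketch below is a minimal illustration with hypothetical income and consumption figures (they are not the data of Table I.1), and the function name `ols_simple` is likewise an assumption.

```python
def ols_simple(xs, ys):
    """Return (a, b) for the fitted line Y = a + bX using
    b = (n*sum(XY) - sum(X)*sum(Y)) / (n*sum(X^2) - sum(X)^2)
    a = mean(Y) - b*mean(X)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

# Hypothetical data for illustration only.
income = [80, 100, 120, 140, 160]
consumption = [70, 85, 98, 110, 125]

a, b = ols_simple(income, consumption)
print("intercept a =", a)
print("slope b     =", b)
```

With these made-up figures the slope falls between 0 and 1, which is the pattern the Keynesian consumption function would lead one to expect.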
Properties of OLS estimators: The outcome of least squares method is
OLS parameter estimators a and b.
•OLS estimators are linear
•OLS estimators are unbiased (correct on average)
•OLS estimators are efficient (small variance)
•Gauss-Markov Theorem: Among linear unbiased estimators, the least
squares estimator (OLS estimator) has minimum variance. It is BLUE (best
linear unbiased estimator).
In order to estimate coefficients, first we need to build the Classical
linear regression model:
– Linear in parameters
– Linear relationship between Y and Xs
– Constant slopes (coefficients of Xs)
– Xs are fixed; Y is conditional on Xs
– X is exogenous and the error is not related to Xs
– Constant variance of errors (Homoscedasticity)
– No autocorrelation among the error terms
Therefore, the estimation of the Econometric Model of the example we have
is as follows:
• Regression analysis is the main tool used to obtain the estimates. Using this
technique and the data given in Table I.1, we obtain the following estimates
of β1 and β2, namely, −184.08 and 0.7064. Thus, the estimated consumption
function is:
Ŷ = −184.08 + 0.7064X
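[Figure: response plane for a regression with two explanatory variables X1 and X2; at each point $(X_{1i}, X_{2i})$ the height of the plane is $E(Y \mid X) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$]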
R2 and Goodness-of-fit
Goodness-of-fit measures evaluate how well a regression
model fits the data. The smaller the RSS, the better the model fits.
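$$TSS = ESS + RSS, \qquad R^2 = \frac{ESS}{TSS}$$
$$TSS = \sum_{t=1}^{n} (Y_t - \bar{Y})^2, \qquad ESS = \sum_{t=1}^{n} (\hat{Y}_t - \bar{Y})^2, \qquad RSS = \sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2$$
In simple regression, $ESS = b^2 \sum_{t=1}^{n} (X_t - \bar{X})^2$.
Adjusted R2:
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{(n - 1)}{(n - k)}$$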
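The decomposition TSS = ESS + RSS and the two R2 measures can be verified numerically. In this sketch the data are hypothetical, and the fitted values come from the least squares line for those data (intercept 16.6, slope 0.675); k counts the estimated parameters including the intercept.

```python
def goodness_of_fit(ys, fitted, k):
    """Return TSS, ESS, RSS, R-squared and adjusted R-squared."""
    n = len(ys)
    y_bar = sum(ys) / n
    tss = sum((y - y_bar) ** 2 for y in ys)
    ess = sum((f - y_bar) ** 2 for f in fitted)
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    r2 = ess / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return tss, ess, rss, r2, adj_r2

# Hypothetical data; the line below is the OLS fit for these points.
income = [80, 100, 120, 140, 160]
consumption = [70, 85, 98, 110, 125]
fitted = [16.6 + 0.675 * x for x in income]

tss, ess, rss, r2, adj_r2 = goodness_of_fit(consumption, fitted, k=2)
print("TSS =", tss, "ESS =", ess, "RSS =", rss)
print("R2 =", r2, "adjusted R2 =", adj_r2)
```

The identity TSS = ESS + RSS holds here because the fitted values come from a least squares line with an intercept; for arbitrary fitted values it need not hold.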
Analysis of Variance and F Statistic
$$F = \frac{R^2 / (k - 1)}{(1 - R^2) / (n - k)}$$
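The F statistic can be computed from R2 alone once n and k are known. A small sketch follows; the input values are hypothetical and the function name is an assumption.

```python
def f_statistic(r2, n, k):
    """F = [R2 / (k - 1)] / [(1 - R2) / (n - k)], where k is the number
    of estimated parameters including the intercept."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Hypothetical example: R2 = 0.5 from 22 observations and 2 parameters.
f_value = f_statistic(0.5, n=22, k=2)
print("F =", f_value)
```

The computed value is then compared with the critical F value for (k - 1, n - k) degrees of freedom at the chosen significance level.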
Statistical Test:
Inferential Statistics and Hypothesis Testing
Inferential statistics:
They are methods for using sample data to make general conclusions (inferences)
about populations.
Because a sample is typically only a part of the whole population, sample data
provide only limited information about the population. As a result, sample statistics
are generally imperfect representatives of the corresponding population parameters.
The discrepancy between a sample statistic and its population parameter is called
sampling error.
In this method, we test some hypothesis by determining the likelihood that a sample
statistic could have been selected, if the hypothesis regarding the population
parameter were true.
The goal of hypothesis testing is to determine how likely it is that
a hypothesized value of a population parameter, such as the mean, is true.
The method can be summarized in five steps.
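Individual significance:
$$H_0: \beta_i = 0 \qquad H_1: \beta_i \neq 0$$
Joint significance:
$$H_0: \beta_1 = \beta_2 = 0 \qquad H_1: \text{at least one of } \beta_1, \beta_2 \text{ is not } 0$$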
B) Compute the Test Statistic:
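$$s_{\hat{b}} = \sqrt{\frac{\sum_t (Y_t - \hat{Y}_t)^2}{(n - k)\sum_t (X_t - \bar{X})^2}} = \sqrt{\frac{\sum_t e_t^2}{(n - k)\sum_t (X_t - \bar{X})^2}}$$
$$t = \frac{\hat{\beta}}{se(\hat{\beta})}$$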
1) Statistically Testing for joint level of significance
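$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$
$$t = \frac{0.7064 - 0}{0.025} \approx 28.56$$
(With the standard error as printed, 0.025, the ratio is about 28.26; the reported 28.56 is consistent with an unrounded standard error of about 0.0247.)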
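The standard error of the slope and the resulting t ratio can be computed step by step. The sketch below uses hypothetical data and a hypothetical function name; it fits the simple regression, forms the residual sum of squares, and applies the standard error formula with n - k degrees of freedom (k = 2 here).

```python
import math

def slope_t_statistic(xs, ys, k=2):
    """Return (b, se_b, t) for the slope of a simple regression,
    testing H0: slope = 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sp_xy / ss_x
    a = y_bar - b * x_bar
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(rss / ((n - k) * ss_x))
    return b, se_b, b / se_b

# Hypothetical data for illustration only.
income = [80, 100, 120, 140, 160]
consumption = [70, 85, 98, 110, 125]

b, se_b, t = slope_t_statistic(income, consumption)
print("b =", b, "se(b) =", se_b, "t =", t)
```

The t value is then compared with the critical value of the t distribution with n - k degrees of freedom.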
C) Decision Rules:
a) Decision Rule
$$e_t = \rho\, e_{t-1} + \varepsilon_t$$
Given the residuals from the regression above, Durbin and Watson propose the
following statistic to detect the existence of autocorrelation:
$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}, \qquad d \approx 2(1 - \hat{\rho})$$
Having this decision rule, the Hypothesis Testing is:
$H_0: \rho = 0$ or $d = 2$, no autocorrelation
$H_1: \rho \neq 0$ or $d \neq 2$, there is autocorrelation
If $d < d_L$, reject $H_0: \rho = 0$
If $d > d_U$, do not reject $H_0: \rho = 0$
If $d_L \le d \le d_U$, the test is inconclusive
Example: regress Y on X in a simple regression with
sample size 20. After the regression you obtain the following:
$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} = 1.08$$
H 0 : No serial correlation
H1 : Null Hypothesis is not true
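The Durbin-Watson statistic and the decision rule above can be sketched as follows. The residual series is hypothetical. The bounds 1.201 and 1.411 are assumed here to be the 5% lower and upper critical values for n = 20 with one explanatory variable; they should be checked against a published Durbin-Watson table before being relied on.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

def dw_decision(d, d_lower, d_upper):
    """Apply the three-way rule: reject, do not reject, or inconclusive."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"

# Hypothetical residuals that alternate in sign, so d lies above 2.
example_d = durbin_watson([1.0, -1.0, 1.0, -1.0])
print("d =", example_d)

# The example in the text: d = 1.08 with n = 20 (bounds are assumed values).
decision = dw_decision(1.08, d_lower=1.201, d_upper=1.411)
print(decision)
```

Under the assumed bounds, d = 1.08 falls below the lower bound, so the null hypothesis of no serial correlation would be rejected.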
• which gives X = 7197, approximately. That is, an income level of about 7197
(billion) dollars, given an MPC of about 0.70, will produce an expenditure
of about 4900 billion dollars. As these calculations suggest, an estimated
model may be used for control, or policy, purposes. By appropriate fiscal
and monetary policy mix, the government can manipulate the control
variable X to produce the desired level of the target variable Y.
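The control calculation can be reproduced by inverting the estimated consumption function. The sketch below uses the estimates quoted earlier (−184.08 and 0.7064) and the target expenditure of 4900; the function names are hypothetical.

```python
beta1_hat = -184.08   # estimated intercept
beta2_hat = 0.7064    # estimated slope (MPC)

def predicted_consumption(income):
    """Estimated consumption function: Y = beta1 + beta2 * X."""
    return beta1_hat + beta2_hat * income

def required_income(target_consumption):
    """Solve Y = beta1 + beta2 * X for X at a target level of Y."""
    return (target_consumption - beta1_hat) / beta2_hat

income_needed = required_income(4900)
print("Income needed:", round(income_needed))
print("Check:", predicted_consumption(income_needed))
```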
• Figure: Summarizes The Anatomy Of Classical Econometric Modeling.
Introducing Qualitative/Categorical/Discrete Explanatory Variables
$$y_i = \alpha + \beta D_i + u_i$$
Keys:
$$y_i = \alpha + \beta D_i + \gamma x_i + u_i$$
[Figure: regression lines of y on x for non-smokers and smokers, with the intercept α + β marked]
Two ways of Specifying Model with Dummy:
1) A model with a constant term:
• Drop one of the dummy categories and treat it as the
reference category. This protects the model from perfect
multicollinearity (the dummy variable trap).
•The coefficient of the constant term is the mean value of the reference category.
•The coefficients of the dummy variables measure the marginal difference from the reference category.
•Example: a model for four seasons, with one season dropped as the reference category:
Examining the impacts of seasonality on wage income
$$Y = \beta_0 + \beta_1 d_2 + \beta_2 d_3 + \beta_3 d_4$$
Exercise 1: seasonality is represented by dummy variables and
agricultural wage income is captured by Y.
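A minimal sketch of the seasonal dummy model, assuming season 1 is the dropped reference category. When the only regressors are a constant and the season dummies, the least squares fit reproduces the group means, so the coefficients can be read off as the reference-season mean and the differences from it. The wage figures and function names are hypothetical.

```python
# Hypothetical agricultural wage income observations, keyed by season 1 to 4.
wages_by_season = {
    1: [100.0, 110.0],
    2: [120.0, 130.0],
    3: [90.0, 100.0],
    4: [150.0, 160.0],
}

def season_dummies(season):
    """Return (d2, d3, d4); season 1 is the reference, so it maps to (0, 0, 0)."""
    return tuple(1 if season == s else 0 for s in (2, 3, 4))

def fit_seasonal_model(data, reference=1):
    """Constant = mean of the reference season; each dummy coefficient =
    that season's mean minus the reference mean."""
    means = {s: sum(v) / len(v) for s, v in data.items()}
    beta0 = means[reference]
    diffs = {s: means[s] - beta0 for s in sorted(means) if s != reference}
    return beta0, diffs

def predict(beta0, diffs, season):
    d2, d3, d4 = season_dummies(season)
    return beta0 + diffs[2] * d2 + diffs[3] * d3 + diffs[4] * d4

beta0, diffs = fit_seasonal_model(wages_by_season)
print("constant (season 1 mean):", beta0)
print("dummy coefficients:", diffs)
print("predicted income in season 4:", predict(beta0, diffs, 4))
```

Including a fourth dummy alongside the constant would make the four dummies sum to the constant column, which is the multicollinearity problem that dropping one category avoids.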
Set D = 0: Y = 70 + .44Age
Set D = 1: Y = 75 + .65Age
[Figure: BLOOD PRESSURE (70 to 90) against AGE (10 to 40); the DRUG line Y = 75 + .65Age lies above the CONTROL line Y = 70 + .44Age]
– Note that for those taking the drug not only does the intercept
increase (that is, the average level of blood pressure), but so does
the slope.
– Interpretation of an interactive term -- The effect of one independent
variable (DRUG) depends on the level of another independent
variable (AGE).
– The results here suggest that for people not taking the drug, each
additional year adds .44 units to blood pressure.
– For people taking the drug, each additional year increases blood
pressure by .65 units.
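The two fitted lines can be written as a single equation with an interaction term, Y = 70 + 5·D + .44·Age + .21·D·Age. The coefficients 5 and .21 are derived here as the differences between the two intercepts (75 − 70) and the two slopes (.65 − .44); the function names are hypothetical.

```python
def blood_pressure(age, drug):
    """Predicted blood pressure; drug is the dummy D (0 = control, 1 = drug)."""
    intercept = 70.0
    drug_shift = 5.0      # 75 - 70: change in intercept when D = 1
    age_slope = 0.44      # effect of one more year when D = 0
    interaction = 0.21    # .65 - .44: extra effect of age when D = 1
    return intercept + drug_shift * drug + age_slope * age + interaction * drug * age

def age_effect(drug):
    """Change in predicted blood pressure for one additional year of age."""
    return blood_pressure(1, drug) - blood_pressure(0, drug)

print("Effect of one more year, control:", age_effect(0))
print("Effect of one more year, drug:   ", age_effect(1))
print("Predicted at age 30:", blood_pressure(30, 0), blood_pressure(30, 1))
```

Setting D to 0 or 1 recovers the two separate equations, which shows how the effect of age depends on whether the drug is taken.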