Econometrics Cheat Sheet
The Econometrics Cheat Sheet Project
By Marcelo Moreno - King Juan Carlos University
Basic concepts

Definitions
Econometrics - a social science discipline whose objective is to quantify the relationships between economic agents, test economic theories, and evaluate and implement government and business policies.
Econometric model - a simplified representation of reality used to explain economic phenomena.
Ceteris paribus - all other relevant factors remain constant.

Data types
Cross section - data taken at a given moment in time, a static photo. Order does not matter.
Time series - observations of variables across time. Order does matter.
Panel data - a time series for each observation of a cross section.
Pooled cross sections - combines cross sections from different time periods.

Phases of an econometric model
1. Specification.  2. Estimation.  3. Validation.  4. Utilization.

Regression analysis
Study and predict the mean value of a variable (the dependent variable, y) on the basis of fixed values of other variables (the independent variables, x's). In econometrics it is common to use Ordinary Least Squares (OLS) for regression analysis.

Correlation analysis
Correlation analysis does not distinguish between dependent and independent variables.
Simple correlation measures the degree of linear association between two variables:
r = Cov(x, y) / (σx · σy) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / sqrt(Σᵢ₌₁ⁿ (xᵢ − x̄)² · Σᵢ₌₁ⁿ (yᵢ − ȳ)²)
Partial correlation measures the degree of linear association between two variables while controlling for a third.

Ordinary Least Squares

Objective - minimize the Sum of Squared Residuals (SSR):
min Σᵢ₌₁ⁿ ûᵢ², where ûᵢ = yᵢ − ŷᵢ

Simple regression model
Equation: yᵢ = β₀ + β₁xᵢ + uᵢ
Estimation: ŷᵢ = β̂₀ + β̂₁xᵢ
where:
β̂₀ = ȳ − β̂₁x̄
β̂₁ = Cov(y, x) / Var(x)

Multiple regression model
Equation: yᵢ = β₀ + β₁x₁ᵢ + ··· + βₖxₖᵢ + uᵢ
Estimation: ŷᵢ = β̂₀ + β̂₁x₁ᵢ + ··· + β̂ₖxₖᵢ
where:
β̂₀ = ȳ − β̂₁x̄₁ − ··· − β̂ₖx̄ₖ
β̂ⱼ = Cov(y, resid xⱼ) / Var(resid xⱼ), where resid xⱼ is the residual from regressing xⱼ on the other x's
Matrix: β̂ = (XᵀX)⁻¹(Xᵀy)

Interpretation of coefficients
Model        Dependent  Independent  β₁ interpretation
Level-level  y          x            Δy = β₁Δx
Level-log    y          log(x)       Δy ≈ (β₁/100)(%Δx)
Log-level    log(y)     x            %Δy ≈ (100β₁)Δx
Log-log      log(y)     log(x)       %Δy ≈ β₁(%Δx)
Quadratic    y          x + x²       Δy = (β₁ + 2β₂x)Δx

Error measurements
Sum of Squared Residuals: SSR = Σᵢ₌₁ⁿ ûᵢ² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
Explained Sum of Squares: SSE = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²
Total Sum of Squares: SST = SSE + SSR = Σᵢ₌₁ⁿ (yᵢ − ȳ)²
Standard Error of the Regression: σ̂ᵤ = sqrt(SSR / (n − k − 1))
Standard Error of the β̂'s: se(β̂) = sqrt(σ̂ᵤ² · (XᵀX)⁻¹)
Mean Squared Error: MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
Absolute Mean Error: AME = (1/n) Σᵢ₌₁ⁿ |yᵢ − ŷᵢ|
Mean Percentage Error: MPE = (1/n) Σᵢ₌₁ⁿ |ûᵢ / yᵢ| · 100
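A minimal Python sketch of the estimators and error measurements above (not part of the original cheat sheet; the data are simulated and statsmodels is assumed to be available):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, k = 200, 2
x = rng.normal(size=(n, k))                   # independent variables (simulated)
u = rng.normal(scale=1.5, size=n)             # error term
y = 1.0 + 0.5 * x[:, 0] - 2.0 * x[:, 1] + u   # assumed "true" model

X = sm.add_constant(x)                        # add the intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # matrix formula (X'X)^-1 X'y
y_hat = X @ beta_hat
resid = y - y_hat

SSR = np.sum(resid**2)
SSE = np.sum((y_hat - y.mean())**2)
SST = np.sum((y - y.mean())**2)               # SST = SSE + SSR
sigma_hat = np.sqrt(SSR / (n - k - 1))        # standard error of the regression
se_beta = np.sqrt(np.diag(sigma_hat**2 * np.linalg.inv(X.T @ X)))

ols = sm.OLS(y, X).fit()                      # same estimates via statsmodels
print(beta_hat, ols.params)                   # should coincide
print(se_beta, ols.bse)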
Assumptions and properties

Econometric model assumptions
Under these assumptions, the OLS estimator presents good properties. Gauss-Markov assumptions:
1. Parameters linearity (and weak dependence in time series). y must be a linear function of the β's.
2. Random sampling. The sample has been randomly taken from the population. (Only for cross-sectional data.)
3. No perfect collinearity. There are no independent variables that are constant: Var(xⱼ) ≠ 0, ∀j = 1, ..., k. There is no exact linear relation between independent variables.
4. Conditional mean zero and correlation zero.
   a. There are no systematic errors: E(u | x₁, ..., xₖ) = E(u) = 0 → strong exogeneity (a implies b).
   b. There are no relevant variables left out of the model: Cov(xⱼ, u) = 0, ∀j = 1, ..., k → weak exogeneity.
5. Homoscedasticity. The variability of the residuals is the same for all levels of x: Var(u | x₁, ..., xₖ) = σᵤ².
6. No auto-correlation. Residuals do not contain information about any other residuals: Corr(uₜ, uₛ | x₁, ..., xₖ) = 0, ∀t ≠ s.
7. Normality. Residuals are independent and identically distributed: u ∼ N(0, σᵤ²).
8. Data size. The number of observations available must be greater than the (k + 1) parameters to estimate. (Already satisfied in asymptotic situations.)

Asymptotic properties of OLS
Under the econometric model assumptions and the Central Limit Theorem (CLT):
Hold 1 to 4a: OLS is unbiased. E(β̂ⱼ) = βⱼ
Hold 1 to 4: OLS is consistent. plim(β̂ⱼ) = βⱼ (with only 4b, weak exogeneity, leaving out 4a: biased but consistent)
Hold 1 to 5: asymptotic normality of OLS (then, 7 is necessarily satisfied): u is asymptotically N(0, σᵤ²)
Hold 1 to 6: unbiased estimate of σᵤ². E(σ̂ᵤ²) = σᵤ²
Hold 1 to 6: OLS is BLUE (Best Linear Unbiased Estimator) or efficient.
Hold 1 to 7: hypothesis testing and confidence intervals can be done reliably.
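A small Monte Carlo sketch (simulated data, not from the original cheat sheet) illustrating unbiasedness and consistency: the OLS slope is centred on the true β₁ and its dispersion shrinks as n grows.

import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 2.0, 0.7                          # assumed "true" parameters

def ols_slope(n):
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)   # E(u|x) = 0 holds by construction
    x_c = x - x.mean()
    return np.sum(x_c * (y - y.mean())) / np.sum(x_c**2)  # Cov(y,x)/Var(x)

for n in (25, 100, 1000):
    draws = np.array([ols_slope(n) for _ in range(2000)])
    print(f"n={n}: mean={draws.mean():.3f}, sd={draws.std():.3f}")  # mean ~ 0.7, sd shrinks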
R-squared
A measure of the goodness of fit, how well the regression fits the data:
R² = SSE/SST = 1 − SSR/SST
Measures the percentage of the variation of y that is linearly explained by the variations of the x's.
Takes values between 0 (no linear explanation of the variations of y) and 1 (total explanation of the variations of y).
When the number of regressors increases, the value of the R-squared increases as well, whether or not the new variables are relevant. To solve this problem, there is an adjusted R-squared by degrees of freedom (or corrected R-squared):
R̄² = 1 − (n − 1)/(n − k − 1) · SSR/SST = 1 − (n − 1)/(n − k − 1) · (1 − R²)
For big sample sizes: R̄² ≈ R²

Hypothesis testing

Definitions
A hypothesis test is a rule designed to determine, from a sample, whether there is evidence to reject a hypothesis made about one or more population parameters.
Elements of a hypothesis test:
Null hypothesis (H0) - the hypothesis to be tested.
Alternative hypothesis (H1) - the hypothesis that cannot be rejected when the null hypothesis is rejected.
Test statistic - a random variable whose probability distribution is known under the null hypothesis.
Critical value - the value against which the test statistic is compared to determine whether or not the null hypothesis is rejected. It marks the frontier between the acceptance and rejection regions of the null hypothesis.
Significance level (α) - the probability of rejecting the null hypothesis when it is true (Type I Error). It is chosen by whoever conducts the test, commonly 0.10, 0.05 or 0.01.
p-value - the highest level of significance by which the null hypothesis cannot be rejected. The rule is: if the p-value is less than α, there is evidence to reject the null hypothesis at that given α (there is evidence to accept the alternative hypothesis).

Individual tests
Test whether a parameter is significantly different from a given value, ϑ.
H0: βⱼ = ϑ   H1: βⱼ ≠ ϑ
Under H0: t = (β̂ⱼ − ϑ) / se(β̂ⱼ) ∼ t(n−k−1, α/2)
If |t| > |t(n−k−1, α/2)|, there is evidence to reject H0.
Individual significance test - tests whether a parameter is significantly different from zero.
H0: βⱼ = 0   H1: βⱼ ≠ 0
Under H0: t = β̂ⱼ / se(β̂ⱼ) ∼ t(n−k−1, α/2)
If |t| > |t(n−k−1, α/2)|, there is evidence to reject H0.

The F test
Simultaneously tests multiple (linear) hypotheses about the parameters. It makes use of a non-restricted model and a restricted model:
Non-restricted model - the model on which we want to test the hypothesis.
Restricted model - the model on which the hypotheses that we want to test have been imposed.
Then, looking at the errors, there are:
SSR_UR - the SSR of the non-restricted model.
SSR_R - the SSR of the restricted model.
Under H0: F = ((SSR_R − SSR_UR) / SSR_UR) · ((n − k − 1) / q) ∼ F(q, n−k−1)
where k is the number of parameters of the non-restricted model and q is the number of linear hypotheses tested.
If F(q, n−k−1) < F, there is evidence to reject H0.

Global significance test
Tests whether all the parameters associated with the x's are simultaneously equal to zero.
H0: β₁ = β₂ = ··· = βₖ = 0
H1: β₁ ≠ 0 and/or β₂ ≠ 0 ... and/or βₖ ≠ 0
In this case, we can simplify the formula for the F statistic:
Under H0: F = (R² / (1 − R²)) · ((n − k − 1) / k) ∼ F(k, n−k−1)
If F(k, n−k−1) < F, there is evidence to reject H0.
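A Python sketch of the tests above with statsmodels (simulated data; the variable names x1, x2, x3 are the default exog names, not part of the original cheat sheet):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 150
x = rng.normal(size=(n, 3))
y = 1 + 0.8 * x[:, 0] + 0.0 * x[:, 1] - 0.3 * x[:, 2] + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(x)).fit()
print(res.rsquared, res.rsquared_adj)     # R-squared and adjusted R-squared
print(res.tvalues, res.pvalues)           # individual significance tests (H0: beta_j = 0)
print(res.fvalue, res.f_pvalue)           # global significance test (all slopes = 0)
print(res.f_test("x1 = 0, x2 = 0"))       # F test of q = 2 linear hypotheses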
Confidence intervals
The confidence intervals at the (1 − α) confidence level can be calculated:
β̂ⱼ ∓ t(n−k−1, α/2) · se(β̂ⱼ)

Dummy variables
Dummy (or binary) variables are used for qualitative information like sex, civil status, country, etc.
They take the value 1 in a given category and 0 in the rest.
They are used to analyze and model structural changes in the model parameters.
If a qualitative variable has m categories, we only have to include (m − 1) dummy variables.

Structural change
Structural change refers to changes in the values of the parameters of the econometric model produced by the effect of different sub-populations. Structural change can be included in the model through dummy variables.
The location of the dummy variable (D) matters:
On the intercept (additive effect) - represents the mean difference between the values produced by the structural change:
y = β₀ + δ₁D + β₁x₁ + u
On the slope (multiplicative effect) - represents the effect (slope) difference between the values produced by the structural change:
y = β₀ + β₁x₁ + δ₁D·x₁ + u
Chow's structural test - used when we want to analyze the existence of structural changes in all the model parameters; it is a particular expression of the F test, where the null hypothesis is H0: No structural change (all δ = 0).

Changes of scale
Changes in the measurement units of the variables:
In the endogenous variable, y* = y·λ - affects all model parameters, βⱼ* = βⱼ·λ, ∀j = 1, ..., k.
In an exogenous variable, xⱼ* = xⱼ·λ - only affects the parameter linked to that exogenous variable, βⱼ* = βⱼ/λ.
Same scale change on the endogenous and the exogenous variables - only affects the intercept, β₀* = β₀·λ.

Changes of origin
Changes in the measurement origin of the variables (endogenous or exogenous), y* = y + λ - only affect the model's intercept, β₀* = β₀ + λ.
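A sketch of intercept and slope dummies with a Chow-type F test and confidence intervals (simulated data; the names D and Dx are hypothetical, not from the original cheat sheet):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
D = (rng.random(n) > 0.5).astype(int)           # dummy: 1 for one sub-population, 0 otherwise
y = 1 + 0.5 * x + 1.2 * D + 0.8 * D * x + rng.normal(size=n)

df = pd.DataFrame({"y": y, "x": x, "D": D, "Dx": D * x})
res = smf.ols("y ~ x + D + Dx", data=df).fit()  # additive (D) and multiplicative (Dx) effects
print(res.conf_int(alpha=0.05))                 # confidence intervals beta_hat -+ t * se
print(res.f_test("D = 0, Dx = 0"))              # H0: no structural change (all deltas = 0)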
Multicollinearity
Perfect multicollinearity - there are independent variables that are constant and/or there is an exact linear relation between independent variables. It is the breaking of the third (3) econometric model assumption.
Approximate multicollinearity - there are independent variables that are approximately constant and/or there is an approximately linear relation between independent variables. It does not break any econometric model assumption, but it has an effect on OLS.
Consequences
Perfect multicollinearity - the OLS equation system cannot be solved due to infinite solutions.
Approximate multicollinearity:
- Small sample variations can induce big variations in the OLS estimations.
- The variance of the OLS estimators of the collinear x's increases, so the inference on the parameter is affected. The estimation of the parameter is very imprecise (big confidence interval).
One typical characteristic of multicollinearity is that the regression coefficients of the model are not individually different from zero (due to high variances), but jointly they are different from zero.
Detection
Correlation analysis - look for high correlations between independent variables, > |0.7|.
Variance Inflation Factor (VIF) - indicates the increment of Var(β̂ⱼ) because of the multicollinearity:
VIF(β̂ⱼ) = 1 / (1 − Rⱼ²)
where Rⱼ² denotes the R-squared from a regression between xⱼ and all the other x's.
- Values between 4 and 10 suggest that it is advisable to analyze in more depth whether there might be multicollinearity problems.
- Values bigger than 10 indicate that there are multicollinearity problems.
Correction
Delete one of the collinear variables.
Perform factorial analysis (or any other dimension reduction technique) on the collinear variables.
Interpret coefficients with multicollinearity jointly.

Heteroscedasticity
The residuals uᵢ of the population regression function do not have the same variance σᵤ²:
Var(u | x₁, ..., xₖ) = Var(u) ≠ σᵤ²
It is the breaking of the fifth (5) econometric model assumption.
Consequences
OLS estimators are still unbiased.
OLS estimators are still consistent.
OLS is not efficient anymore, but it is still a LUE (Linear Unbiased Estimator).
Variance estimations of the estimators are biased: the construction of confidence intervals and hypothesis testing is not reliable.
Detection
Graphs - look for scatter patterns on x vs. u or x vs. y plots.
Formal tests - White, Bartlett, Breusch-Pagan, etc. Commonly, the null hypothesis is H0: Homoscedasticity.
Correction
Use OLS with a variance-covariance matrix estimator robust to heteroscedasticity (HC), for example, the one proposed by White.
If the variance structure is known, make use of Weighted Least Squares (WLS) or Generalized Least Squares (GLS):
- Supposing that Var(u) = σᵤ²·xᵢ, divide the model variables by the square root of xᵢ and apply OLS.
- Supposing that Var(u) = σᵤ²·xᵢ², divide the model variables by xᵢ (the square root of xᵢ²) and apply OLS.
If the variance structure is not known, make use of Feasible Weighted Least Squares (FWLS), which estimates a possible variance, divides the model variables by it and then applies OLS.
Make a new model specification, for example, a logarithmic transformation (lower variance).
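A sketch of the detection tools named above, VIF and Breusch-Pagan, plus White-type (HC) robust standard errors as a correction (simulated data; statsmodels assumed, not part of the original cheat sheet):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
n = 300
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)       # x2 nearly collinear with x1
u = rng.normal(size=n) * (1 + np.abs(x1))      # heteroscedastic errors
y = 1 + 0.5 * x1 + 0.5 * x2 + u

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

vifs = [variance_inflation_factor(X, j) for j in range(1, X.shape[1])]
print("VIF:", vifs)                            # values > 10 signal multicollinearity problems

lm, lm_pval, fstat, f_pval = het_breuschpagan(res.resid, X)
print("Breusch-Pagan p-value:", lm_pval)       # H0: homoscedasticity

res_hc = sm.OLS(y, X).fit(cov_type="HC1")      # heteroscedasticity-robust (White-type) SEs
print(res.bse, res_hc.bse)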
Auto-correlation
The residual of any observation, uₜ, is correlated with the residual of any other observation. The observations are not independent:
Corr(uₜ, uₛ | x₁, ..., xₖ) = Corr(uₜ, uₛ) ≠ 0, ∀t ≠ s
The "natural" context of this phenomenon is time series. It is the breaking of the sixth (6) econometric model assumption.
Consequences
OLS estimators are still unbiased.
OLS estimators are still consistent.
OLS is not efficient anymore, but it is still a LUE (Linear Unbiased Estimator).
Variance estimations of the estimators are biased: the construction of confidence intervals and hypothesis testing is not reliable.
Detection
Graphs - look for scatter patterns on uₜ₋₁ vs. uₜ, or make use of a correlogram. (The reference plots of uₜ vs. uₜ₋₁ show no auto-correlation, positive auto-correlation and negative auto-correlation patterns.)
Formal tests - Durbin-Watson, Breusch-Godfrey, etc. Commonly, the null hypothesis is H0: No auto-correlation.
Correction
Use OLS with a variance-covariance matrix estimator robust to heteroscedasticity and auto-correlation (HAC), for example, the one proposed by Newey-West.
Use Generalized Least Squares. Supposing yₜ = β₀ + β₁xₜ + uₜ, with uₜ = ρuₜ₋₁ + εₜ, where |ρ| < 1 and εₜ is white noise:
- If ρ is known, create a quasi-differentiated model where uₜ is white noise and estimate it by OLS.
- If ρ is not known, estimate it by, for example, the Cochrane-Orcutt method, create a quasi-differentiated model where uₜ is white noise and estimate it by OLS.
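A sketch of auto-correlation detection and correction (simulated AR(1) errors, not from the original cheat sheet): Durbin-Watson, Newey-West (HAC) standard errors, and an iterative Cochrane-Orcutt-type fit via statsmodels' GLSAR.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
n, rho = 300, 0.6
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                               # u_t = rho * u_{t-1} + eps_t
    u[t] = rho * u[t - 1] + rng.normal()
y = 1 + 0.5 * x + u

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
print("Durbin-Watson:", durbin_watson(res.resid))   # values near 2 suggest no auto-correlation

res_hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # Newey-West SEs
print(res.bse, res_hac.bse)

res_co = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)  # estimates rho, quasi-differences, refits
print(res_co.params, res_co.model.rho)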
3.3-en - github.com/marcelomijas/econometrics-cheatsheet - CC-BY-4.0 license