LECTURE 1
LECTURE 2 AND 3
LECTURE 5
Lecture 6
LECTURE 1
Textbook: 1.3, 1.5-1.7, 2.1, 2.3.1-2.3.2, 2.3.4, 2.4, 2.7.
Lecture notes: C1
LECTURE 2 AND 3
Textbook: C3 and C4
Lecture notes: C2.1 - 2.3
2 Econometric Methods
We have to make sure these guesses are as close to the real numbers as possible.
Sometimes, we talk about how good or preferred one guess is compared to another. We use words like "unbiased" or "efficient" to describe them. But remember, it's the way we guess (the formula) that's unbiased, not the guess itself.
.2 Properties of Estimators
2
A good estimator is like a tool we use to make guesses about numbers in economics. There are several criteria for judging how good an estimator is: unbiasedness (on average, the estimation rule hits the true value), efficiency (it has the smallest variance among comparable estimators), and consistency (it gets arbitrarily close to the true value as the sample grows).
These criteria help us pick the best estimator for the job. We want it to be accurate, consistent, and efficient, especially when dealing with a lot of data.
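To illustrate (a minimal Python sketch with made-up numbers, not from the notes): applying the sample mean as an estimator to many simulated samples shows what "unbiased" means for the estimation rule rather than for any single guess.

import numpy as np

rng = np.random.default_rng(42)
true_mean = 5.0            # hypothetical population mean
n_samples, n_obs = 10_000, 50

# Draw many samples and apply the estimator (the sample mean) to each one
estimates = rng.normal(loc=true_mean, scale=2.0, size=(n_samples, n_obs)).mean(axis=1)

# Unbiasedness: the estimates average out to the true value,
# even though any single estimate misses it
print(estimates.mean())   # ~5.0
print(estimates.var())    # sampling variance; "efficiency" compares this across estimators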
The linear regression model can be written as y = Xβ + ε. Here, X is a matrix containing the values of the independent variables, β is a vector of unknown parameters, and ε is a vector of errors. The goal is to estimate the parameters β using the method of ordinary least squares (OLS), which minimizes the sum of the squared differences between the observed and predicted values of y; the solution is β^ = (X'X)^(-1)X'y (a numerical sketch follows the list of assumptions below).
2. Zero mean error: On average, the errors have a mean of zero.
3. Homoskedasticity: The errors have constant variance and are uncorrelated with each other.
5. No multicollinearity: There is no exact linear relationship among the independent variables.
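As a concrete illustration of the OLS recipe, here is a minimal Python sketch on simulated data; the parameter values 1 and 2 and the noise scale are arbitrary assumptions, not from the notes.

import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
eps = rng.normal(scale=0.5, size=n)      # zero-mean, homoskedastic errors
y = 1.0 + 2.0 * x + eps                  # true parameters: beta = (1, 2)

X = np.column_stack([np.ones(n), x])     # design matrix with a constant column

# OLS minimizes the sum of squared residuals; closed form: beta_hat = (X'X)^(-1) X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                          # close to [1.0, 2.0]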
To estimate the variance σ^2, we use the sample variance of the residuals ε^ = y − Xβ^. This helps us construct statistical tests to check whether our parameters are significantly different from zero.
For example, if we want to test whether one of the parameters (call it βk) is significantly different from zero, we use the test statistic tk = βk^ / se(βk^), the estimate divided by its standard error. This tk statistic follows a t-distribution with (N−K) degrees of freedom; for large enough samples we can also use the standard normal distribution. This test lets us draw conclusions about our parameters from the sample data.
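Continuing in the same spirit, a self-contained sketch of the variance estimate and the t-statistics (same kind of simulated data as above; illustrative only):

import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals and the estimated error variance (unbiased divisor N - K)
resid = y - X @ beta_hat
N, K = X.shape
s2 = resid @ resid / (N - K)

# Estimated covariance of beta_hat: s^2 (X'X)^(-1); standard errors on the diagonal
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

# t-statistics for H0: beta_k = 0; compare with t(N-K) critical values,
# or with the standard normal (e.g. 1.96) in large samples
print(beta_hat / se)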
LECTURE 5
Lecture Notes (2019), Chapter 2.5
For example, let's say we're studying how people's heights vary with their ages. We may assume that the heights follow a normal distribution, where the average height depends on age. Using maximum likelihood, we estimate the parameters, such as the average height and the variance, that make our observed data the most likely.
So, maximum likelihood helps us find the best estimates for unknown parameters in a distribution, assuming we already have an idea about how the distribution works.
Let's say we have N balls in total and N1 of them are red. The likelihood of this result, as a function of the proportion p of red balls, is L(p) = p^N1 (1−p)^(N−N1) (the binomial coefficient is a constant that does not affect the maximization).
We want to find the value of p that makes this likelihood as large as possible. We call this the maximum likelihood estimator, denoted as p^. To make calculations easier, we often work with the natural logarithm of the likelihood, the log-likelihood function log L(p) = N1 log p + (N−N1) log(1−p); setting its derivative to zero gives p^ = N1/N.
So, the maximum likelihood estimator is like our best guess for the proportion of red balls based on the sample we took. This method helps us estimate unknown parameters from data by maximizing the likelihood of observing that data.
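A minimal numerical sketch of this example (the counts 100 and 37 are made up): maximizing the log-likelihood numerically reproduces the analytic answer p^ = N1/N.

import numpy as np
from scipy.optimize import minimize_scalar

N, N1 = 100, 37   # hypothetical sample: 37 red balls out of 100

# Log-likelihood of the proportion p (binomial coefficient omitted:
# it does not depend on p, so it does not change the maximizer)
def log_lik(p):
    return N1 * np.log(p) + (N - N1) * np.log(1.0 - p)

# Maximize by minimizing the negative log-likelihood on (0, 1)
res = minimize_scalar(lambda p: -log_lik(p), bounds=(1e-9, 1 - 1e-9), method="bounded")
print(res.x)    # ~0.37
print(N1 / N)   # analytic ML estimator p_hat = N1/N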
The likelihood function, which represents the probability of observing the data given the model parameters, is expressed using the normal distribution formula: each observation yi contributes a normal density with mean β1 + β2xi and variance σ^2, and the likelihood is the product of these terms.
To find the maximum likelihood estimators for the parameters β1, β2 and σ^2, we maximize the log-likelihood function. This involves taking the derivative of the log-likelihood with respect to each parameter and setting it equal to zero. Solving these first-order conditions gives us the maximum likelihood estimators.
For β1 and β2, the estimators turn out to be equivalent to the ordinary least squares (OLS) estimators: with normal errors, maximizing the log-likelihood over the coefficients amounts to minimizing the sum of squared residuals.
For σ^2, the ML estimator differs slightly from the OLS estimator due to the difference in the divisor: ML divides the sum of squared residuals by N, while the unbiased OLS estimator divides by N−K. However, in large samples both estimators converge and become equivalent.
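A tiny numerical illustration of the divisor difference (all numbers hypothetical; rss stands for the sum of squared residuals from some fitted regression):

N, K = 200, 2                 # hypothetical sample size and parameter count
rss = 49.0                    # hypothetical sum of squared residuals

sigma2_ml = rss / N           # ML estimator of sigma^2
sigma2_ols = rss / (N - K)    # unbiased OLS estimator of sigma^2
print(sigma2_ml, sigma2_ols)  # 0.245 vs ~0.2475; the ratio (N-K)/N -> 1 as N grows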
Lecture 6
Chapter 3
3.1 Time Series Properties
A process is considered weakly stationary or covariance stationary if:
1. The expected value (mean) of the process at any given time is finite and constant.
2. The variance of the process is finite and the same at every point in time, regardless of when it is observed.
3. The covariance (a measure of how two variables change together) between observations at different times only depends on the time difference between those observations. In other words, how much one observation changes when another observation changes is consistent over time.
These conditions mean that the process has a consistent average value and variability over time, and the way variables are related to each other doesn't change as time progresses.
Additionally, for a zero-mean white noise process (where the average value is zero):
1. The expected value of the disturbance term at any given time is zero.
2. The expected value of the square of the disturbance term at any given time is constant and finite.
3. The expected value of the product of the disturbance terms at two different times is zero; only when the two times coincide is the expectation nonzero (and finite, equal to the variance).
These properties help define how random disturbances behave over time in a time-series analysis.
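As a quick illustrative check (not from the notes), simulated zero-mean white noise exhibits these three properties in sample form:

import numpy as np

rng = np.random.default_rng(1)
T = 10_000
eps = rng.normal(scale=1.0, size=T)           # zero-mean white noise, variance 1

print(eps.mean())                             # ~0: zero expected value
print((eps ** 2).mean())                      # ~1: constant, finite second moment
print(np.corrcoef(eps[:-1], eps[1:])[0, 1])   # ~0: no correlation across time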
3.2 ARMA process
In simpler terms, the ARMA (AutoRegressive Moving Average) process is a way to model how one observation in a time series depends on previous observations.
● The value at time 't' (Yt) depends linearly on its previous value at time 't-1' (Yt-1), along with some random noise (εt).
● The expected value of Yt can be calculated based on its previous value and the noise, assuming that the expected value of Yt doesn't change with time.
● The variance of Yt is also determined based on its previous value and the noise,
ensuring consistency over time.
● The covariance between Yt and its lagged values (Yt-k, where 'k' represents how
many steps back) depends only on the time difference ('k'), not on the specific time 't'.
This reflects the stationarity of the process.
● In the moving average (MA) version, the value at time 't' (Yt) is determined by a weighted average of the current noise (εt) and the noise from the previous time step (εt-1), along with a constant mean.
● The variance and autocovariances (covariance between different time steps) are
calculated based on the properties of the noise terms.
● Both AR and MA processes are essentially different ways to model dependence in time series data. The choice between them depends on simplicity and suitability for the data at hand. In fact, under suitable stationarity and invertibility conditions you can convert one into the other, demonstrating that they are closely related ways of representing the data (a small simulation sketch follows this list).
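As referenced in the list above, here is a minimal AR(1) simulation sketch in Python; the coefficient value 0.7 and the symbol theta are illustrative assumptions, not from the notes.

import numpy as np

rng = np.random.default_rng(2)
T, theta = 5_000, 0.7                 # hypothetical AR(1) coefficient, |theta| < 1

# AR(1): Y_t = theta * Y_{t-1} + eps_t (zero mean for simplicity)
y = np.zeros(T)
eps = rng.normal(size=T)
for t in range(1, T):
    y[t] = theta * y[t - 1] + eps[t]

# For a stationary AR(1), autocorrelations decay geometrically: rho_k = theta^k
for k in (1, 2, 3):
    print(k, np.corrcoef(y[:-k], y[k:])[0, 1], theta ** k)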
For the MA(1) process described above, the theoretical autocorrelations are:
- At lag 0 (ρ0), the correlation is always 1, because it's the correlation of a variable with itself.
- At lag 1 (ρ1), the correlation is α / (1 + α^2), where α is the MA(1) coefficient; the noise variance σ^2 cancels out of the ratio.
- At all other lags (k = 2, 3, 4, ...), the correlation is 0.
These formulas allow us to calculate the autocorrelation at different lags, providing insights into the temporal dependency structure of the time series data.
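These formulas can be checked by simulation; a minimal sketch with an assumed MA(1) coefficient α = 0.5 (so the theoretical ρ1 is 0.4):

import numpy as np

rng = np.random.default_rng(3)
T, alpha = 100_000, 0.5
eps = rng.normal(size=T + 1)

# MA(1): Y_t = eps_t + alpha * eps_{t-1} (zero mean)
y = eps[1:] + alpha * eps[:-1]

print(np.corrcoef(y[:-1], y[1:])[0, 1])   # sample rho_1
print(alpha / (1 + alpha ** 2))           # theoretical rho_1 = 0.4
print(np.corrcoef(y[:-2], y[2:])[0, 1])   # sample rho_2, ~0 as predicted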
The formula for calculating the sample autocorrelation at lag 'k' is ρ^k = Σt=k+1..T (yt − ȳ)(yt−k − ȳ) / Σt=1..T (yt − ȳ)^2, i.e. the sample covariance at lag k divided by the sample variance.
The SACF values can be plotted against the corresponding lag values to create a correlogram, which provides insights into the temporal dependence structure of the data.
When the observations are independent, the sample autocorrelations are asymptotically normal with mean zero and variance 1/T.
Therefore, to test the statistical significance of the sample autocorrelation at a particular lag, we check whether it falls outside the approximate 95% bounds of ±1.96/√T; values outside these bounds are significantly different from zero (a sketch follows).
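A minimal sketch of this significance check, computing the SACF by the formula above for independent data (all values illustrative):

import numpy as np

rng = np.random.default_rng(4)
T = 500
y = rng.normal(size=T)                     # independent observations

def sacf(y, k):
    """Sample autocorrelation at lag k."""
    ybar = y.mean()
    num = np.sum((y[k:] - ybar) * (y[:-k] - ybar))
    den = np.sum((y - ybar) ** 2)
    return num / den

bound = 1.96 / np.sqrt(T)                  # approximate 95% significance bound
for k in range(1, 6):
    r = sacf(y, k)
    print(k, round(r, 3), abs(r) > bound)  # rarely significant for independent data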
Models
Moving Average (MA)
The simplest class of time-series model that one could entertain is that of
the moving average process.
This equation will later have to be manipulated; such a manipulation is most easily achieved by introducing the lag operator notation. This would be written as Lyt = yt-1 in order to denote that yt is lagged once.
In order to show that the ith lag of yt is being taken (that is, the value that yt took i periods ago), the notation would be L^i yt = yt-i.
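In code, lagging corresponds to shifting a series; a small pandas sketch with arbitrary toy values:

import pandas as pd

y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# L y_t = y_{t-1}: each value moves forward one period (NaN where no lag exists)
print(y.shift(1))

# L^i y_t = y_{t-i}, e.g. i = 3
print(y.shift(3))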
Using the lag operator notation, the MA(q) model can be written as yt = μ + ut + θ1ut-1 + ... + θqut-q, or, in lag-operator form, yt = μ + θ(L)ut, where θ(L) = 1 + θ1L + θ2L^2 + ... + θqL^q.
In much of what follows, the constant (μ) is dropped from the equations. Removing μ considerably eases the algebra involved and is inconsequential, for it can be done without loss of generality. To see this, consider a sample of observations on a series zt that has mean z̄. A zero-mean series yt can be constructed simply by subtracting z̄ from each observation zt.
Dropping μ, the process is simply yt = ut + θ1ut-1 + ... + θqut-q, where ut is a zero-mean white noise process with variance σ^2, or, in lag-operator form, yt = θ(L)ut, where θ(L) is the lag polynomial defined above.