
Summary of EMFIN

Contents

LECTURE 1
LECTURE 2 AND 3
LECTURE 5
LECTURE 6
LECTURE 1
Textbook: 1.3, 1.5-1.7, 2.1, 2.3.1-2.3.2, 2.3.4, 2.4, 2.7.
Lecture notes: C1

LECTURE 2 AND 3
Textbook: C3 and C4
Lecture notes: C2.1 - 2.3

2 Econometric Methods

2.1 Estimates and Estimators

Econometrics is about using mathematics and data to understand how the economy works. If an economist says something like "how much people spend depends on how much they make," an econometrician adds that there is also a random component, such as unexpected expenses or changes in prices. We use data to quantify these relationships, but because of the randomness we can never be exactly right, so we use formulas and techniques to make educated guesses (estimates) instead.

We have to make sure these guesses are as close to the true values as possible. Sometimes we discuss how good or preferred one guess is compared to another, using words like "unbiased" or "efficient" to describe them. But remember: it is the guessing rule (the estimator, i.e. the formula) that is unbiased, not the individual guess (the estimate) itself.

2.2 Properties of Estimators

A good estimator is a tool we use to make guesses about unknown quantities in economics. There are different ways to measure how good an estimator is. Here are some of them:

1. Least squares: Choose the estimator that makes the differences between our fitted values and the observed values as small as possible.
2. Unbiasedness: An estimator is unbiased if, on average (over repeated samples), it gives us the correct answer.
3. Efficiency: Sometimes we cannot find an unbiased estimator, but when we can, we want the one with the smallest variance. This is called efficiency.
4. Mean squared error: This combines bias (how far off our guesses tend to be) and variance (how much our guesses vary from sample to sample). We want to minimize this error.
5. Asymptotic properties: These describe how well the estimator works when we have a lot of data. We want the estimator to get closer to the true value as the sample size grows (consistency).

These criteria help us pick the best estimator for the job. We want it to be accurate, consistent, and efficient, especially when dealing with large samples.
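As a concrete illustration (not taken from the lecture notes), the short Monte Carlo sketch below compares two estimators of a population variance: the ML-style estimator that divides by n and the unbiased estimator that divides by n − 1. It reports bias, variance and mean squared error for each, showing how the criteria above can be checked by simulation.

```python
# Monte Carlo sketch (illustrative values assumed): bias, variance and MSE of
# two variance estimators for a population with sigma^2 = 4.
import numpy as np

rng = np.random.default_rng(0)
sigma2, n, reps = 4.0, 20, 10_000

est_n = np.empty(reps)      # divide-by-n estimator (biased, ML-style)
est_n1 = np.empty(reps)     # divide-by-(n-1) estimator (unbiased)
for r in range(reps):
    x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=n)
    est_n[r] = np.sum((x - x.mean()) ** 2) / n
    est_n1[r] = np.sum((x - x.mean()) ** 2) / (n - 1)

for name, est in [("divide by n", est_n), ("divide by n-1", est_n1)]:
    bias = est.mean() - sigma2
    var = est.var()
    mse = bias ** 2 + var          # MSE = bias^2 + variance
    print(f"{name:>14}: bias={bias:+.3f}  variance={var:.3f}  MSE={mse:.3f}")
```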

2.3 Ordinary Least Squares: OLS

The linear regression model is a fundamental tool in statistics for understanding relationships between variables. Suppose we have data on a dependent variable y_i and several independent variables x_i. The model assumes that the dependent variable is a linear combination of the independent variables plus an error term:

y_i = x_i'β + ε_i,   i = 1, ..., N

This equation can be written in matrix form as:

y = Xβ + ε

Here, X is a matrix containing the values of the independent variables, β is a vector of unknown parameters, and ε is a vector of errors. The goal is to estimate the parameters β using the method of ordinary least squares (OLS), which minimizes the sum of the squared differences between the observed and fitted values of y:

β̂ = (X'X)⁻¹ X'y
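A minimal sketch of the OLS formula β̂ = (X'X)⁻¹X'y, with simulated data assumed purely for illustration:

```python
# Compute the OLS estimator on simulated data (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
N = 200
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])        # intercept + one regressor
beta_true = np.array([1.0, 0.5])
y = X @ beta_true + rng.normal(scale=2.0, size=N)

# OLS estimate: solve the normal equations (X'X) beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat
print("beta_hat:", beta_hat)
```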

The assumptions of the multiple linear regression model are:

1. Functional form: The relationship between the independent and dependent variables is linear.
2. Zero mean error: On average, the errors have a mean of zero.
3. Homoskedasticity and no autocorrelation: The errors have constant variance and are uncorrelated with each other.
4. Independence: The errors are independent of the independent variables X.
5. No multicollinearity: There is no exact linear relationship among the independent variables.
6. Normality: The errors follow a multivariate normal distribution.


When these assumptions are met, the OLS estimator is the best linear unbiased estimator (BLUE) of the parameters β. This means that, among all linear unbiased estimators, OLS has the smallest variance and therefore gives the most precise estimates.

2.3.1 Statistical inference for β

Under the assumptions mentioned earlier, the OLS estimator β̂ has some nice properties. Firstly, it is unbiased, meaning that on average it gives us the right answer. We can prove this by showing that its expected value equals the true parameter vector β.

To know the distribution of β̂ we need its variance-covariance matrix, which tells us how spread out our estimates are. Under the assumptions above, the OLS estimator follows a normal distribution with mean β and variance σ²(X'X)⁻¹:

β̂ ~ N(β, σ²(X'X)⁻¹)

To estimate the variance σ², we use the sample variance of the residuals ε̂ = y − Xβ̂, i.e. s² = ε̂'ε̂ / (N − K). This lets us construct statistical tests of whether our parameters are significantly different from zero.

For example, if we want to test whether one of the parameters (say β_k) is significantly different from zero, we use the test statistic t_k = β̂_k / se(β̂_k). Under the null hypothesis, t_k follows a t-distribution with (N − K) degrees of freedom; for large enough samples we can also use the standard normal distribution. This test lets us draw conclusions about the parameters from our sample data.
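The sketch below (again with simulated data, not course data) computes s², the estimated covariance matrix s²(X'X)⁻¹, the standard errors, and the t-statistics t_k = β̂_k / se(β̂_k):

```python
# Standard errors and t-statistics for OLS on simulated data (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=2.0, size=N)

K = X.shape[1]
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (N - K)                  # unbiased estimate of sigma^2
cov_beta = s2 * np.linalg.inv(X.T @ X)        # estimated Var(beta_hat) = s^2 (X'X)^-1
se = np.sqrt(np.diag(cov_beta))               # standard errors
t_stats = beta_hat / se                       # t_k = beta_hat_k / se(beta_hat_k)
print("beta_hat:", beta_hat, "se:", se, "t:", t_stats)
```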

LECTURE 5
Lecture Notes (2019), Chapter 2.5

2.5 Maximum Likelihood Estimation

Maximum likelihood estimation starts from the idea that we mostly know how something behaves, except for a few unknown parameters. We estimate these unknown parts by finding the parameter values that make what we observed the most likely to have happened.

For example, suppose we study how people's heights vary with their ages. We may assume that heights follow a normal distribution whose mean depends on age. Using maximum likelihood, we estimate parameters such as the mean height and the variance so that the observed data are as likely as possible.

So, maximum likelihood finds the best estimates of the unknown parameters of a distribution, assuming we already know the form of that distribution.

2.5.1 A simple example

Let's break down maximum likelihood estimation using a simple example. Imagine a big bag filled with red and yellow balls, and we want to know the proportion of red balls, call it p. To figure this out, we randomly draw some balls and count how many are red.

Say we draw N balls in total and N1 of them are red. The likelihood of this result, as a function of the proportion p, is (up to a constant that does not depend on p):

L(p) = p^N1 (1 − p)^(N − N1)

We want to find the value of p that makes this likelihood as large as possible. We call this the maximum likelihood estimator, denoted p̂. To make the calculations easier, we usually work with the natural logarithm of the likelihood, the log-likelihood function:

log L(p) = N1 log p + (N − N1) log(1 − p)

For example, if we have a sample of 100 balls and 44 of them are red, we can maximize the log-likelihood (set its derivative with respect to p equal to zero) to find p̂. It turns out that p̂ equals the fraction of red balls in our sample, so with N = 100 and N1 = 44:

p̂ = N1 / N = 44 / 100 = 0.44

So the maximum likelihood estimator is our best guess of the proportion of red balls given the sample we took. The method estimates unknown parameters from data by maximizing the likelihood of observing that data.
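A quick numerical check of the ball example (illustrative only): maximize the log-likelihood over a grid of values for p and compare with the analytic answer p̂ = N1/N.

```python
# Grid-maximize log L(p) = N1*log(p) + (N - N1)*log(1 - p) and compare to N1/N.
import numpy as np

N, N1 = 100, 44
p_grid = np.linspace(0.001, 0.999, 9_999)
loglik = N1 * np.log(p_grid) + (N - N1) * np.log(1 - p_grid)

p_hat_numeric = p_grid[np.argmax(loglik)]
print("numeric p_hat:", round(p_hat_numeric, 3))   # ~0.44
print("analytic p_hat:", N1 / N)                   # 0.44
```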

2.5.2 The linear regression model with the ML method

In linear regression, we can also use the method of maximum likelihood (ML) to estimate the parameters of the model. ML is based on the idea of maximizing the probability (or likelihood) of observing the data given certain parameter values.

Consider the following simple model:

y_i = β1 + β2 x_i + ε_i

where:
- y_i = observed outcome
- β1 and β2 = parameters
- x_i = predictor variable
- ε_i = error term, assumed to be i.i.d. (independently and identically distributed) and normally distributed with mean 0 and variance σ²

The likelihood function, the probability of observing the data given the model parameters, is built from the normal density. Taking logs, the log-likelihood is

log L(β1, β2, σ²) = −(N/2) log(2πσ²) − (1/(2σ²)) Σ_i (y_i − β1 − β2 x_i)²

To find the maximum likelihood estimators of β1, β2 and σ², we maximize the log-likelihood function. This involves taking the derivative of the log-likelihood with respect to each parameter, setting the derivatives equal to zero, and solving the resulting equations.

For β1 and β2 the ML estimators turn out to be identical to the ordinary least squares (OLS) estimators: with normally distributed errors, maximizing the likelihood is the same as minimizing the sum of squared residuals.

For σ² the ML estimator differs slightly from the OLS estimator because of the divisor: the ML estimator divides the sum of squared residuals by N, while the unbiased OLS estimator divides by N − K. In large samples the two converge and become equivalent.

In summary, the ML method estimates the parameters of a linear regression model by maximizing the likelihood of observing the data under given parameter values.
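The sketch below (illustrative, with simulated data and a general-purpose optimizer, not the course's code) estimates β1, β2 and σ² by numerically maximizing the normal log-likelihood and compares the result with OLS; σ² is parameterized as exp(·) to keep it positive.

```python
# ML estimation of a simple regression via numerical optimization (sketch).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
N = 500
x = rng.normal(size=N)
y = 1.0 + 0.5 * x + rng.normal(scale=1.5, size=N)
X = np.column_stack([np.ones(N), x])

def neg_loglik(params):
    b1, b2, log_s2 = params
    s2 = np.exp(log_s2)                      # keep sigma^2 positive
    resid = y - b1 - b2 * x
    return 0.5 * N * np.log(2 * np.pi * s2) + 0.5 * np.sum(resid**2) / s2

res = minimize(neg_loglik, x0=np.array([0.0, 0.0, 0.0]), method="BFGS")
b1_ml, b2_ml, s2_ml = res.x[0], res.x[1], np.exp(res.x[2])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("ML  :", b1_ml, b2_ml, "sigma2 =", s2_ml)   # sigma2 uses divisor N
print("OLS :", beta_ols)                          # same betas as ML
```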

2.5.3 Inference with ML

The asymptotic distribution of the ML estimator θ̂ = (β̂1, β̂2, σ̂²) can be expressed as (in time-series notation)

√T (θ̂ − θ) → N(0, I(θ)⁻¹)

where I(θ) is the information matrix.

The information matrix can be estimated in two ways:

1. by the negative of the expected value of the second derivatives (the Hessian) of the log-likelihood function, or
2. by the outer product of the score vectors (the first derivatives), giving I_OP.

Given θ̂ = (β̂1, β̂2, σ̂²), the score contributions (the partial derivatives of the log-likelihood for observation t) are

∂log L_t/∂β1 = ε_t/σ²,   ∂log L_t/∂β2 = x_t ε_t/σ²,   ∂log L_t/∂σ² = −1/(2σ²) + ε_t²/(2σ⁴)

Since this is a 3×1 vector s_t of derivatives, I_OP is a 3×3 matrix. Therefore we can write I_OP as the average of the outer products:

I_OP = (1/T) Σ_t s_t(θ̂) s_t(θ̂)'

In order to calculate I_OP you have to:

1. calculate the three partial derivatives (score contributions) for each observation,
2. calculate the sums of their products (the outer products), and
3. divide each element of the result by the number of observations.


Once you have calculated the information matrix, you can invert it to obtain the covariance matrix of the parameter estimates. The standard errors of the parameters are the square roots of the diagonal elements of this covariance matrix.
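A sketch of the recipe above (with simulated data assumed for illustration): build the score contributions, average their outer products to get I_OP, invert, and take square roots of the diagonal.

```python
# OPG-based standard errors for the simple regression ML example (sketch).
import numpy as np

rng = np.random.default_rng(3)
T = 500
x = rng.normal(size=T)
y = 1.0 + 0.5 * x + rng.normal(scale=1.5, size=T)

# ML estimates (betas via OLS, sigma^2 with divisor T)
X = np.column_stack([np.ones(T), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
s2_ml = e @ e / T

# Score contributions s_t = (e_t/s2, x_t*e_t/s2, -1/(2*s2) + e_t^2/(2*s2^2))
scores = np.column_stack([
    e / s2_ml,
    x * e / s2_ml,
    -1.0 / (2 * s2_ml) + e**2 / (2 * s2_ml**2),
])

I_op = scores.T @ scores / T            # average of outer products (3x3)
cov_theta = np.linalg.inv(I_op) / T     # Cov(theta_hat) ~ I_op^{-1} / T
std_errors = np.sqrt(np.diag(cov_theta))
print("standard errors (b1, b2, sigma2):", std_errors)
```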

LECTURE 6
Chapter 3

3.1 Time series properties

A process is considered weakly stationary, or covariance stationary, if:

1. The expected value (mean) of the process is finite and constant over time.
2. The variance of the process is finite and constant over time.
3. The covariance (a measure of how two variables move together) between observations at different times depends only on the time difference (the lag) between those observations, not on the point in time itself.

These conditions mean that the process has a constant average value and variability over time, and that the way observations are related to each other does not change as time progresses.

Additionally, for a zero-mean white noise process (where the average value is zero):

1. The expected value of the disturbance term at any given time is zero.
2. The expected value of the square of the disturbance term (its variance) is constant and finite.
3. The expected value of the product of disturbance terms at two different times is zero; only when the two times coincide is it non-zero (and finite).

These properties describe how the random disturbances behave over time in a time-series analysis.

3.2 ARMA processes

In simpler terms, an ARMA (AutoRegressive Moving Average) process is a way to model how one observation in a time series depends on previous observations.

For instance, an AR(1) process models the dependence between consecutive observations:

Y_t = δ + θ Y_{t−1} + ε_t

- Y_t depends linearly upon its previous value Y_{t−1} (θ is the AR coefficient, δ a constant)
- ε_t = serially uncorrelated innovation with a mean of zero and a constant variance

● The value at time t (Y_t) depends linearly on its previous value at time t−1 (Y_{t−1}), along with some random noise (ε_t).
● The expected value of Y_t can be calculated from its previous value and the noise, assuming the expected value of Y_t does not change with time.
● The variance of Y_t is likewise determined from its previous value and the noise, and is constant over time.
● The covariance between Y_t and its lagged values (Y_{t−k}, where k is the number of steps back) depends only on the time difference k, not on the specific time t. This reflects the stationarity of the process.

Similarly, in an MA(1) process (Y_t = μ + ε_t + α ε_{t−1}):

● The value at time t (Y_t) is a weighted combination of the current noise (ε_t) and the noise from the previous period (ε_{t−1}), along with a constant mean μ.
● The variance and autocovariances (covariances between different time steps) follow from the properties of the noise terms.
● AR and MA processes are simply different ways of modelling dependence in time series data. The choice between them depends on simplicity and suitability for the data at hand; under suitable conditions one representation can be rewritten as the other (an MA process as an infinite-order AR process and vice versa), so they are closely related ways of representing the data.
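As an illustration (parameter values chosen arbitrarily), the sketch below simulates an AR(1) and an MA(1) process driven by the same white noise series:

```python
# Simulate an AR(1) and an MA(1) process from the same innovations (sketch).
import numpy as np

rng = np.random.default_rng(4)
T, theta, alpha = 1_000, 0.7, 0.5
eps = rng.normal(size=T)                 # white noise innovations

# AR(1): Y_t = theta * Y_{t-1} + eps_t
y_ar = np.zeros(T)
for t in range(1, T):
    y_ar[t] = theta * y_ar[t - 1] + eps[t]

# MA(1): Y_t = eps_t + alpha * eps_{t-1}
y_ma = eps.copy()
y_ma[1:] += alpha * eps[:-1]

print("AR(1) sample variance:", y_ar.var())   # ~ 1 / (1 - theta^2)
print("MA(1) sample variance:", y_ma.var())   # ~ 1 + alpha^2
```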

3.2.1 Autocorrelation function (ACF)

The autocorrelation function (ACF) measures the correlation between observations at different time lags. It helps us understand how much each observation in a time series is related to previous observations.

For an MA(1) process (with MA coefficient α):

- At lag 0, the autocorrelation ρ0 is always 1, because it is the correlation of a variable with itself.
- At lag 1, ρ1 = α / (1 + α²); the noise variance σ² cancels out of the ratio.
- At all other lags (k = 2, 3, 4, ...), the autocorrelation is 0.

For an AR(1) process (with AR coefficient θ):

- At lag 0, ρ0 is always 1.
- At lag 1, ρ1 = θ.
- At longer lags (k = 2, 3, 4, ...), the autocorrelation decays exponentially with the lag: ρk = θ^k.

These formulas give the autocorrelation at every lag and thereby describe the temporal dependence structure of the time series.
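A quick sketch of these formulas with assumed parameter values (α = 0.5, θ = 0.7):

```python
# Theoretical ACFs of an MA(1) and an AR(1) process for lags 0..5 (sketch).
import numpy as np

alpha, theta = 0.5, 0.7
lags = np.arange(6)

acf_ma1 = np.where(lags == 0, 1.0,
                   np.where(lags == 1, alpha / (1 + alpha**2), 0.0))
acf_ar1 = theta ** lags                      # rho_k = theta^k (rho_0 = 1)

print("MA(1) ACF:", np.round(acf_ma1, 3))    # [1, 0.4, 0, 0, 0, 0]
print("AR(1) ACF:", np.round(acf_ar1, 3))    # [1, 0.7, 0.49, ...]
```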

3.2.2 The sample ACF

The sample autocorrelation function (SACF) is an estimate of the theoretical autocorrelation function (ACF) based on observed data. It is calculated from the sample autocovariances.

The sample autocorrelation at lag k is

r_k = Σ_{t=k+1}^{T} (Y_t − Ȳ)(Y_{t−k} − Ȳ) / Σ_{t=1}^{T} (Y_t − Ȳ)²

where:
- T is the total number of observations
- Y_t is the value of the time series at time t
- Ȳ is the sample mean of the time series.

The SACF values can be plotted against the corresponding lags to create a correlogram, which gives insight into the temporal dependence structure of the data.

When the observations are independent, the sample autocorrelations are asymptotically normal with mean zero and variance 1/T. Therefore, to test the statistical significance of the sample autocorrelation at a particular lag k, the statistic z_k = √T r_k can be calculated and treated as a standard normal variable.

The null hypothesis of no autocorrelation at lag k is rejected if |z_k| > 1.96, indicating statistically significant autocorrelation at that lag.
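A sketch of the sample ACF and the significance rule |z_k| = |√T r_k| > 1.96, applied to simulated white noise (data assumed purely for illustration):

```python
# Sample autocorrelations with a normal-approximation significance test (sketch).
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(size=400)
T = len(y)
y_dm = y - y.mean()
denom = np.sum(y_dm**2)

def sample_acf(k):
    # r_k = sum_{t=k+1..T} (Y_t - Ybar)(Y_{t-k} - Ybar) / sum_t (Y_t - Ybar)^2
    return np.sum(y_dm[k:] * y_dm[:T - k]) / denom

for k in range(1, 6):
    r_k = sample_acf(k)
    z_k = np.sqrt(T) * r_k
    flag = "significant" if abs(z_k) > 1.96 else "not significant"
    print(f"lag {k}: r_k = {r_k:+.3f}, z_k = {z_k:+.2f} ({flag})")
```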
3.2.3 The Partial ACF

In time series analysis, when dealing with an autoregressive (AR) process of order p, the partial autocorrelation function (PACF) becomes more useful than the ordinary autocorrelation function (ACF). The PACF helps us determine the order of the AR process.

1. The k-th order partial autocorrelation coefficient θ_kk measures the correlation between Y_t and Y_{t−k} after adjusting for the intermediate values Y_{t−1}, Y_{t−2}, ..., Y_{t−k+1}.
2. If the true model is an AR(p) process, then estimating an AR(k) model by ordinary least squares (OLS) provides consistent estimators of the model parameters if k ≥ p. Therefore θ_kk = 0 if k > p.
3. The partial autocorrelations can be plotted against the lag k, or a test for the significance of the coefficients can be performed. If the observations come from an AR(p) process, then θ_kk = 0 for all k > p.
4. Testing an AR(k−1) model against an AR(k) model amounts to testing the null hypothesis θ_kk = 0. Under the null hypothesis, the approximate standard error of θ̂_kk is 1/√T, where T is the total number of observations, so θ_kk = 0 is rejected if |√T θ̂_kk| > 1.96.
5. For a genuine AR(p) model, the partial autocorrelations will be close to zero after the p-th lag. (A sketch of estimating the PACF by successive OLS regressions is given after the summary below.)

In summary, for an AR(p) process:

- The ACF tails off.
- The PACF is (close to) zero for lags larger than p.

For an MA(q) process:

- The ACF is (close to) zero for lags larger than q.
- The PACF tails off.

If neither of these patterns is observed, a combined ARMA model may be a more appropriate representation of the data.
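A sketch of the idea in point 1 of Section 3.2.3, estimating θ_kk as the OLS coefficient on Y_{t−k} in a regression of Y_t on its first k lags, using a simulated AR(2) series (assumed for illustration):

```python
# Estimate partial autocorrelations by successive OLS regressions (sketch).
import numpy as np

rng = np.random.default_rng(6)
T = 2_000
y = np.zeros(T)
for t in range(2, T):                      # AR(2): Y_t = 0.5 Y_{t-1} + 0.3 Y_{t-2} + eps_t
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

def pacf(k):
    # Regress Y_t on a constant and its first k lags; return the coefficient on Y_{t-k}.
    Y = y[k:]
    X = np.column_stack([np.ones(T - k)] + [y[k - j:T - j] for j in range(1, k + 1)])
    coefs = np.linalg.lstsq(X, Y, rcond=None)[0]
    return coefs[-1]

for k in range(1, 5):
    theta_kk = pacf(k)
    sig = "significant" if abs(np.sqrt(T) * theta_kk) > 1.96 else "~ zero"
    print(f"lag {k}: theta_kk = {theta_kk:+.3f} ({sig})")   # close to zero for k > 2
```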

Models

Moving Average (MA)

The simplest class of time-series model that one could entertain is that of the moving average process.

Let u_t (t = 1, 2, 3, ...) be a white noise process with E(u_t) = 0 and Var(u_t) = σ². Then

y_t = μ + u_t + θ1 u_{t−1} + θ2 u_{t−2} + ... + θq u_{t−q}

is a q-th order moving average model, MA(q). This can be expressed using sigma notation as

y_t = μ + Σ_{i=1}^{q} θ_i u_{t−i} + u_t

A moving average model is a linear combination of white noise processes, so that y_t depends on the current and previous values of the white noise disturbance term.

This equation will later have to be manipulated; such manipulations are most easily achieved by introducing the lag operator notation. We write L y_t = y_{t−1} to denote that y_t is lagged once.

To show that the i-th lag of y_t is being taken (that is, the value that y_t took i periods ago), the notation is L^i y_t = y_{t−i}.

Using the lag operator notation, the MA(q) model can be written as

y_t = μ + Σ_{i=1}^{q} θ_i L^i u_t + u_t

or

y_t = μ + θ(L) u_t

where θ(L) = 1 + θ1 L + θ2 L² + ... + θq L^q.

In much of what follows, the constant (μ) is dropped from the equations. Removing μ considerably eases the algebra involved and can be done without loss of generality. To see this, consider a sample of observations on a series z_t that has mean z̄. A zero-mean series y_t can be constructed simply by subtracting z̄ from each observation z_t.

The distinguishing properties of the moving average process of order q given above are

E(y_t) = μ
Var(y_t) = γ0 = (1 + θ1² + θ2² + ... + θq²) σ²
Cov(y_t, y_{t−s}) = γs = (θs + θ_{s+1}θ1 + θ_{s+2}θ2 + ... + θq θ_{q−s}) σ² for s = 1, 2, ..., q, and γs = 0 for s > q.

So a moving average process has a constant mean, a constant variance, and autocovariances which may be non-zero up to lag q and are always zero thereafter.
Ex.
Consider the following MA(2) process:

y_t = u_t + θ1 u_{t−1} + θ2 u_{t−2}

where u_t is a zero-mean white noise process with variance σ².

1) Calculate the mean and variance of y_t.
2) Derive the autocorrelation function for this process (i.e. express the autocorrelations τ1, τ2, ... as functions of the parameters θ1 and θ2).
3) If θ1 = −0.5 and θ2 = 0.25, sketch the ACF of y_t.
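A quick numerical check of part 3 (not the official solution): compute the analytic autocorrelations τ1 and τ2 for θ1 = −0.5, θ2 = 0.25 and compare them with the sample ACF of a long simulated path.

```python
# Analytic and simulated ACF of the MA(2) exercise (illustrative check).
import numpy as np

theta1, theta2 = -0.5, 0.25

# Analytic ACF from the MA(2) autocovariances (theta0 = 1, sigma^2 cancels)
gamma0 = 1 + theta1**2 + theta2**2
tau1 = (theta1 + theta1 * theta2) / gamma0
tau2 = theta2 / gamma0
print("analytic: tau1 =", round(tau1, 3), " tau2 =", round(tau2, 3))  # -0.476, 0.190

# Simulation check
rng = np.random.default_rng(7)
T = 100_000
u = rng.normal(size=T)
y = u.copy()
y[1:] += theta1 * u[:-1]
y[2:] += theta2 * u[:-2]
yd = y - y.mean()
for k in (1, 2, 3):
    r_k = np.sum(yd[k:] * yd[:-k]) / np.sum(yd**2)
    print(f"sample r_{k} =", round(r_k, 3))    # ~ tau1, tau2, then ~0
```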

Autoregressive processes (AR)

An autoregressive model is one where the current value of a variable, y, depends only upon the values that the variable took in previous periods, plus an error term. An autoregressive model of order p, denoted AR(p), can be expressed as

y_t = μ + φ1 y_{t−1} + φ2 y_{t−2} + ... + φp y_{t−p} + u_t

where u_t is a white noise disturbance term. Some manipulation of this expression will be required to demonstrate the properties of an autoregressive model. It can be written more compactly using sigma notation:

y_t = μ + Σ_{i=1}^{p} φ_i y_{t−i} + u_t

or, using the lag operator:

y_t = μ + Σ_{i=1}^{p} φ_i L^i y_t + u_t

or

φ(L) y_t = μ + u_t

where φ(L) = 1 − φ1 L − φ2 L² − ... − φp L^p.
