
Course: Basic Econometrics (807)

Semester: Autumn, 2021


ASSIGNMENT No. 2
Q.1 Write down the relationship between two variables in the form of a distributed lag model and an autoregressive model. What is the difference between the two models?
Many interesting studies of time series variables make use of multivariate methods. These techniques seek to describe the information contained in the temporal and cross-sectional dependence of the variables. In most cases, the goal of the analysis is to provide a better understanding of the dynamic relationship between variables, and in certain cases these techniques may be used to improve forecasting accuracy. The models developed in this area of research may also be used for policy analysis or for making specific inferences about potential relationships. Some of the early multivariate time series models fall within the class of linear distributed lag models; early examples include the polynomial and geometric distributed lag models. In addition, the autoregressive distributed lag (ARDL) model, which incorporates what has been termed the rational distributed lag model, continues to be used in a number of studies in the current literature.
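In the standard textbook notation, with Y the dependent variable and X the explanatory variable, the two relationships can be written as follows. A (finite) distributed lag model regresses Y on current and lagged values of X only:

Y_t = α + β_0 X_t + β_1 X_{t-1} + β_2 X_{t-2} + … + β_k X_{t-k} + u_t

An autoregressive model includes one or more lagged values of the dependent variable itself among the regressors:

Y_t = α + β X_t + γ Y_{t-1} + u_t

The difference between the two models therefore lies in the lag structure: the distributed lag model spreads the effect of X over several periods through lagged values of X, while the autoregressive model captures the dynamics through the lagged dependent variable Y_{t-1}. The ARDL model mentioned above combines the two by including lags of both X and Y.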


Q.2 What is a dummy variable? Explain interaction effects using dummy variables.
A dummy variable is a numerical variable used in regression analysis to represent subgroups of the sample in
your study. In research design, a dummy variable is often used to distinguish different treatment groups. In the
simplest case, we would use a 0,1 dummy variable where a person is given a value of 0 if they are in the control
group or a 1 if they are in the treated group. Dummy variables are useful because they enable us to use a single
regression equation to represent multiple groups. This means that we don’t need to write out a separate equation for each subgroup. The dummy variables act like ‘switches’ that turn various parameters on and off in
an equation. Another advantage of a 0,1 dummy-coded variable is that even though it is a nominal-level
variable you can treat it statistically like an interval-level variable (if this made no sense to you, you probably
should refresh your memory on levels of measurement). For instance, if you take an average of a 0,1 variable,
the result is the proportion of 1s in the distribution.
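For example, a quick check in R (with an arbitrary dummy-coded vector):

z <- c(0, 1, 1, 0, 1)   # dummy-coded group membership for five units
mean(z)                 # 0.6, i.e. the proportion of 1s (units in the treated group)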
The simple regression model with a single 0,1 dummy variable can be written as:
y_i = β0 + β1Z_i + e_i
where:
- y_i is the outcome score of the ith unit,
- β0 is the coefficient for the intercept,
- β1 is the coefficient for the slope,
- Z_i is:
  - 1 if the ith unit is in the treatment group;
  - 0 if the ith unit is in the control group;
- e_i is the residual for the ith unit.
To illustrate dummy variables, consider the simple regression model for a posttest-only two-group randomized
experiment. This model is essentially the same as conducting a t-test on the posttest means for two groups
or conducting a one-way Analysis of Variance (ANOVA). The key term in the model is β1, the estimate of
the difference between the groups. To see how dummy variables work, we’ll use this simple model to show you
how to use them to pull out the separate sub-equations for each subgroup. Then we’ll show how you estimate
the difference between the subgroups by subtracting their respective equations. You’ll see that we can pack an
enormous amount of information into a single equation using dummy variables. All I want to show you here is
that β1 is the difference between the treatment and control groups.
To see this, the first step is to compute what the equation would be for each of our two groups separately. For
the control group, Z = 0. When we substitute that into the equation, and recognize that by assumption the error
term averages to 0, we find that the predicted value for the control group is β0, the intercept. Now, to figure out
the treatment group line, we substitute the value of 1 for Z, again recognizing that by assumption the error term
averages to 0. The equation for the treatment group indicates that the treatment group value is the sum of the
two beta values.
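Written out, the substitution gives the two sub-equations:

For the control group (Z_i = 0):    predicted y_i = β0
For the treatment group (Z_i = 1):  predicted y_i = β0 + β1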

Now, we’re ready to move on to the second step – computing the difference between the groups. How do we determine that? Well, the difference must be the difference between the equations for the two groups that we worked out above. In other words, to find the difference between the groups we just find the difference between the equations for the two groups! Subtracting the control-group equation from the treatment-group equation makes it clear that the difference is β1. Think about what
this means. The difference between the groups is β1. OK, one more time just for the sheer heck of it. The
difference between the groups in this model is β1!

Whenever you have a regression model with dummy variables, you can always see how the variables are being
used to represent multiple subgroup equations by following the two steps described above:
- create separate equations for each subgroup by substituting the dummy values
- find the difference between groups by finding the difference between their equations
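A minimal R sketch of these two steps, using simulated data (the sample size, group means and error spread below are arbitrary):

set.seed(1)
Z <- rep(c(0, 1), each = 50)          # dummy: 0 = control, 1 = treatment
y <- 40 + 5*Z + rnorm(100, sd = 3)    # true control mean 40, true treatment effect 5

m <- lm(y ~ Z)                        # one regression equation for both groups
b <- coef(m)

# Step 1: sub-equations by substituting the dummy values
b["(Intercept)"]                      # Z = 0: predicted control-group mean (beta0)
b["(Intercept)"] + b["Z"]             # Z = 1: predicted treatment-group mean (beta0 + beta1)

# Step 2: the difference between the groups is the coefficient on the dummy
b["Z"]                                # beta1
mean(y[Z == 1]) - mean(y[Z == 0])     # identical to b["Z"]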
Q.3 Explain Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models.
Autoregressive conditional heteroskedasticity (ARCH) is a statistical model used to analyze volatility in time
series in order to forecast future volatility. In the financial world, ARCH modeling is used to estimate risk by
providing a model of volatility that more closely resembles real markets. ARCH modeling shows that periods
of high volatility are followed by more high volatility and periods of low volatility are followed by more low
volatility.
In practice, this means that volatility or variance tends to cluster, which is useful to investors when considering
the risk of holding an asset over different time periods. The ARCH concept was developed by
economist Robert F. Engle III in the 1980s. ARCH immediately improved financial modeling, resulting in
Engle winning the 2003 Nobel Memorial Prize in Economic Sciences.
The autoregressive conditional heteroskedasticity (ARCH) model was designed to
improve econometric models by replacing assumptions of constant volatility with conditional volatility. Engle
and others working on ARCH models recognized that past financial data influences future data—that is the
definition of autoregressive. The conditional heteroskedasticity portion of ARCH simply refers to the
observable fact that volatility in financial markets is nonconstant—all financial data, whether stock market
values, oil prices, exchange rates, or GDP, go through periods of high and low volatility. Economists have
always known the amount of volatility changes, but they often kept it constant for a given period because they
lacked a better option when modeling markets.
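In its standard form, the ARCH(q) model writes the error term ε_t with a conditional variance σ_t² that depends on the squared errors of the previous q periods:

ε_t = σ_t z_t, where z_t is independent with mean 0 and variance 1
σ_t² = α_0 + α_1 ε_{t-1}² + α_2 ε_{t-2}² + … + α_q ε_{t-q}², with α_0 > 0 and α_i ≥ 0

A large shock in one period therefore raises the variance in the following periods, which is exactly the volatility clustering described above.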

ARCH provided a model that economists could use instead of a constant or average for volatility. ARCH
models could also recognize and forecast beyond the volatility clusters that are seen in the market during
periods of financial crisis or other black swan events. For example, volatility for the S&P 500 was unusually
low for an extended period during the bull market from 2003 to 2007, before spiking to record levels during
the market correction of 2008. This uneven and extreme variation is difficult for standard-deviation-based
models to deal with. ARCH models, however, are able to correct for the statistical problems that arise from
this type of pattern in the data. Moreover, ARCH models work best with high-frequency data (hourly, daily,
monthly, quarterly), so they are ideal for financial data. As a result, ARCH models have become mainstays for
modeling financial markets that exhibit volatility (which is really all financial markets in the long run).
According to Engle's Nobel lecture in 2003, he developed ARCH in response to Milton Friedman's conjecture
that it was the uncertainty about what the rate of inflation would be rather than the actual rate of inflation that
negatively impacts an economy. Once the model was built, it proved to be invaluable for forecasting all
manner of volatility. ARCH has spawned many related models that are also widely used in research and in
finance, including GARCH, EGARCH, STARCH, and others.
These variant models often introduce changes in terms of weighting and conditionality in order to achieve
more accurate forecasting ranges. For example, EGARCH, or exponential GARCH, gives a greater weighting
to negative returns in a data series as these have been shown to create more volatility. Put another way,
volatility in a price chart increases more after a large drop than after a large rise. Most ARCH model variants
analyze past data to adjust the weightings using a maximum likelihood approach. This results in a dynamic
model that can forecast near-term and future volatility with increasing accuracy—which is, of course, why so
many financial institutions use them.
Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) is a statistical model used in analyzing
time-series data in which the variance of the error term is believed to be serially autocorrelated. GARCH models assume that
the variance of the error term follows an autoregressive moving average process.
Although GARCH models can be used in the analysis of a number of different types of financial data, such as
macroeconomic data, financial institutions typically use them to estimate the volatility of returns for stocks,
bonds, and market indices. They use the resulting information to help determine pricing and judge which assets
will potentially provide higher returns, as well as to forecast the returns of current investments to help in
their asset allocation, hedging, risk management, and portfolio optimization decisions.
GARCH models are used when the variance of the error term is not constant. That is, the error term
is heteroskedastic. Heteroskedasticity describes the irregular pattern of variation of an error term, or variable,
in a statistical model.
Essentially, wherever there is heteroskedasticity, the spread of the observations around the model is not constant over time. Instead, periods of large and small errors tend to cluster. Therefore, if statistical models that assume constant variance are used on this data, then the
conclusions and predictive value one can draw from the model will not be reliable.

The variance of the error term in GARCH models is assumed to vary systematically, conditional on the
average size of the error terms in previous periods. In other words, it has conditional heteroskedasticity, and
the reason for the heteroskedasticity is that the error term is following an autoregressive  moving
average pattern. This means that it is a function of an average of its own past values.
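The most widely used version, GARCH(1,1), writes today's conditional variance as a function of a constant, yesterday's squared error and yesterday's conditional variance:

σ_t² = ω + α ε_{t-1}² + β σ_{t-1}², with ω > 0, α ≥ 0, β ≥ 0 and α + β < 1 for a stationary (mean-reverting) variance

The ε_{t-1}² term is the ARCH part and the σ_{t-1}² term is the GARCH part; together they produce the autoregressive moving average behaviour of the variance described above.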
History of GARCH
GARCH was developed in 1986 by Dr. Tim Bollerslev, a doctoral student at the time, as a way to address the
problem of forecasting volatility in asset prices. It built on economist Robert Engle's breakthrough 1982 work
in introducing the Autoregressive Conditional Heteroskedasticity (ARCH) model. Engle's model assumed that the variation of financial returns was not constant over time but autocorrelated, or conditional on/dependent on its own past values. For instance, one can see this in stock returns, where periods of volatile returns tend to be clustered together.
Since the original introduction, many variations of GARCH have emerged. These include Nonlinear
(NGARCH), which addresses correlation and observed "volatility clustering" of returns, and Integrated
GARCH (IGARCH), which restricts the volatility parameter. All the GARCH model variations seek to
incorporate the direction, positive or negative, of returns in addition to the magnitude (addressed in the original
model).
Each derivation of GARCH can be used to accommodate the specific qualities of the stock, industry, or
economic data. When assessing risk, financial institutions incorporate GARCH models into their Value-at-Risk (VaR) estimates, i.e. the maximum expected loss (whether for a single investment or trading position, a portfolio, or at a division or firm-wide level) over a specified time period. GARCH models are viewed as providing better gauges of risk than can be obtained through tracking the standard deviation alone.
Various studies have examined the reliability of different GARCH models under different market conditions, including the periods leading up to and after the Great Recession.
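A minimal R sketch, assuming the tseries package is installed, that simulates a GARCH(1,1) series with known parameters and then fits the model back to the simulated data (the parameter values are arbitrary):

library(tseries)                      # provides garch() for fitting GARCH(p, q) models
set.seed(42)

# simulate 2000 observations from sigma2_t = 0.1 + 0.2*e_{t-1}^2 + 0.7*sigma2_{t-1}
n <- 2000
e <- numeric(n)
sigma2 <- numeric(n)
sigma2[1] <- 0.1 / (1 - 0.2 - 0.7)    # start at the unconditional variance
e[1] <- rnorm(1, sd = sqrt(sigma2[1]))
for (t in 2:n) {
  sigma2[t] <- 0.1 + 0.2 * e[t - 1]^2 + 0.7 * sigma2[t - 1]
  e[t] <- rnorm(1, sd = sqrt(sigma2[t]))
}

fit <- garch(e, order = c(1, 1))      # estimated coefficients should be roughly 0.1, 0.2 and 0.7
summary(fit)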
Q.4 What should be done when a problem of autocorrelation is found?
ACF functions are used for model criticism, to test if there is structure left in the residuals. An important
prerequisite is that the data is correctly ordered before running the regression models. If there is structure in the
residuals of a GAMM model, an AR1 model can be included to reduce the effects of this autocorrelation.
There are basically two methods to reduce autocorrelation, of which the first one is most important:
1. Improve model fit. Try to capture structure in the data in the model. See the vignette on model
evaluation on how to evaluate the model fit: vignette("evaluation", package="itsadug").
2. If no more predictors can be added, include an AR1 model. By including an AR1 model, the GAMM
takes into account the structure in the residuals and reduces the confidence in the predictors accordingly.

How to include an AR1 model

1. First mark the start of each time series as TRUE, and all other data points as FALSE. For measures that develop over time, this typically means marking the start of each participant-trial combination. For behavioral or response measures, such as reaction times, this typically means marking the first trial of each participant.
2. Determine the value for the autocorrelation coefficient rho.
Example

Loading the data:


library(itsadug)
library(mgcv)
data(simdat)
# add missing values to simdat:
simdat[sample(nrow(simdat), 15),]$Y <- NA
Mark start of time series and determine rho
In this data, for each individual subject each trial is a unique time series of at most 100 measurements. Mark the
start of each time series using the function start_event:
simdat <- start_event(simdat, column="Time", event=c("Subject", "Trial"), label.event="Event")
head(simdat)
## Group Time Trial Condition Subject Y Event start.event
## 1 Adults 0.00000 -10 -1 a01 0.7554469 a01.-10 TRUE
## 2 Adults 20.20202 -10 -1 a01 2.7834759 a01.-10 FALSE
## 3 Adults 40.40404 -10 -1 a01 1.9696963 a01.-10 FALSE
## 4 Adults 60.60606 -10 -1 a01 0.6814298 a01.-10 FALSE
## 5 Adults 80.80808 -10 -1 a01 1.6939195 a01.-10 FALSE
## 6 Adults 101.01010 -10 -1 a01 2.3651969 a01.-10 FALSE
To determine the value of rho, we first have to run a ‘plain’ model to see how strongly the residuals are correlated.
library(mgcv)
# example model:
m1 <- bam(Y ~ te(Time, Trial)+s(Subject, bs='re'), data=simdat)
There are different ways to inspect the autocorrelation; they all result in the same picture:
par(mfrow=c(1,3), cex=1.1)

# default ACF function:

acf(resid(m1), main="acf(resid(m1))")
# resid_gam:
acf(resid_gam(m1), main="acf(resid_gam(m1))")
# acf_resid:
acf_resid(m1, main="acf_resid(m1)")

Determine the value of lag 1, as indicated by the red dot in the picture below:
# we also ask to plot the ACF by specifying plot (FALSE by default):
r1 <- start_value_rho(m1, plot=TRUE)

The function start_value_rho basically implements the following line:


acf(resid(m1), plot=FALSE)$acf[2]
## [1] 0.9506088
Run model with AR1 included
Now we have all information to run a model with AR1 model included:
# example model:
m1AR1 <- bam(Y ~ te(Time, Trial)+s(Subject, bs='re'), data=simdat, rho=r1, AR.start=simdat$start.event)
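To check whether including the AR1 model has dealt with the autocorrelation, the residuals of the new model can be inspected again. A minimal check using the same functions as above (here it is assumed, as in the itsadug documentation, that resid_gam returns the AR1-corrected residuals):

# compare uncorrected and AR1-corrected residuals of the new model:
par(mfrow=c(1,2), cex=1.1)
acf(resid(m1AR1), main="acf(resid(m1AR1))")           # raw residuals: autocorrelation still visible
acf(resid_gam(m1AR1), main="acf(resid_gam(m1AR1))")   # corrected residuals: the lag-1 peak should largely disappear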

Q.5 Outline the steps you would follow in the Principal Component Analysis (PCA).
Principal component analysis (PCA) is a statistical procedure used to reduce the dimensionality of a dataset. It uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
Step 1: Standardize the dataset.
Step 2: Calculate the covariance matrix for the features in the dataset.
Step 3: Calculate the eigenvalues and eigenvectors for the covariance matrix.
Step 4: Sort eigenvalues and their corresponding eigenvectors.
Step 5: Pick k eigenvalues and form a matrix of eigenvectors.
Step 6: Transform the original matrix.
Let's go to each step one by one.
1. Standardize the Dataset
Assume we have the below dataset which has 4 features and a total of 5 training examples.


[Figure: dataset matrix (5 examples × 4 features)]
First, we need to standardize the dataset and for that, we need to calculate the mean and standard deviation for
each feature.

Standardization formula: x′ = (x − μ) / σ, where μ is the mean and σ the standard deviation of the feature.

[Figure: mean and standard deviation of each feature before standardization]


After applying the formula, each feature in the dataset is transformed as below:

[Figure: standardized dataset]

2. Calculate the covariance matrix for the whole dataset
The formula to calculate the covariance between two features X and Y (population version, dividing by n) is:
cov(X, Y) = Σ (X_i − mean(X))(Y_i − mean(Y)) / n
Using this formula, the covariance matrix for the given dataset is calculated as below.

Since we have standardized the dataset, the mean of each feature is 0 and its standard deviation is 1. (Because the standardization apparently used the sample standard deviation while the covariance below divides by n, the variance comes out as (n − 1)/n = 0.8 rather than exactly 1.)
var(f1) = ((-1.0 - 0)² + (0.33 - 0)² + (-1.0 - 0)² + (0.33 - 0)² + (1.33 - 0)²)/5
var(f1) = 0.8
cov(f1, f2) =
((-1.0 - 0)*(-0.632456 - 0) +
(0.33 - 0)*(1.264911 - 0) +
(-1.0 - 0)*(0.632456 - 0) +
(0.33 - 0)*(0.000000 - 0) +
(1.33 - 0)*(-1.264911 - 0))/5
cov(f1, f2) = -0.25298
In a similar way we can calculate the other covariances, which results in the covariance matrix below.


[Figure: covariance matrix (population formula)]


3. Calculate eigenvalues and eigenvectors.
An eigenvector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that transformation is applied to it. The corresponding eigenvalue is the factor by which the eigenvector is scaled.
Let A be a square matrix (in our case the covariance matrix), ν a vector and λ a scalar that satisfy Aν = λν; then λ is called an eigenvalue associated with the eigenvector ν of A.
Rearranging the above equation: Aν − λν = 0, i.e. (A − λI)ν = 0.
Since ν is a nonzero vector, the only way this equation can hold is if
det(A − λI) = 0
Solving det(A − λI) = 0 for the covariance matrix above gives the eigenvalues
λ = 2.51579324, 1.0652885, 0.39388704, 0.02503121
Eigenvectors:
Solving the (A − λI)ν = 0 equation for the vector ν at each of the λ values:

11
Course: Basic Econometrics (807)
Semester: Autumn, 2021
For λ = 2.51579324, solving the above equation using Cramer's rule, the values of the eigenvector are
v1 = 0.16195986
v2 = -0.52404813
v3 = -0.58589647
v4 = -0.59654663
Following the same approach, we can calculate the eigenvectors for the other eigenvalues and form a matrix from these eigenvectors.

[Figure: matrix of eigenvectors (4 × 4)]
4. Sort eigenvalues and their corresponding eigenvectors.
Since the eigenvalues are already in decreasing order in this case, there is no need to sort them again.
5. Pick k eigenvalues and form a matrix of eigenvectors
If we choose the top 2 eigenvectors, the matrix will look like this:

[Figure: top 2 eigenvectors (4 × 2 matrix)]


6. Transform the original matrix.
Standardized feature matrix (5 × 4) * matrix of the top k eigenvectors (4 × 2) = transformed data (5 × 2)

[Figure: transformed data]

Compare with sklearn library

[Figure: code snippet for PCA using sklearn]


The results are the same apart from a change in the sign (direction) of PC1, which makes no practical difference. So we have successfully reduced our data from 4 dimensions to 2. PCA is most useful when the data features are highly correlated.
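Since the worked dataset above is only shown as images, the minimal R sketch below repeats the six steps on an arbitrary 5 × 4 matrix and checks the manual result against prcomp (an R analogue of the sklearn comparison; the numbers differ from the worked example because the data are different):

set.seed(7)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)     # arbitrary dataset: 5 examples, 4 features
colnames(X) <- paste0("f", 1:4)

Xs <- scale(X)                                 # Step 1: standardize each feature
S  <- cov(Xs)                                  # Step 2: covariance matrix (note: cov() divides by n-1,
                                               #         unlike the population formula used above)
e  <- eigen(S)                                 # Step 3: eigenvalues and eigenvectors
e$values                                       # Step 4: already returned in decreasing order
W  <- e$vectors[, 1:2]                         # Step 5: keep the top k = 2 eigenvectors
scores <- Xs %*% W                             # Step 6: project the standardized data

# check against prcomp; columns may differ only by a sign flip, as noted in the text
prcomp(X, center = TRUE, scale. = TRUE)$x[, 1:2]
scores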
