
Non-Stochastic Volatility Vs. Stochastic Volatility Models

Student's Name

Institutional Affiliation

Course Number and Name

Professor's Name

Assignment Due Date



Non-Stochastic Volatility Vs. Stochastic Volatility Models

Introduction

A time-varying parameter model can be classified into one of two categories: parameter-driven and observation-driven specifications (Koopman et al., 2016). In an observation-driven model, the current parameters are deterministic functions of lagged dependent and lagged exogenous variables; the parameters evolve randomly but can be predicted one step ahead from past data. The prediction error decomposition therefore yields a closed-form conditional likelihood for observation-driven models (Koopman et al., 2016). This property has made the framework popular in applied econometrics and analytics, since it permits straightforward estimation procedures.

In parameter-driven models, the parameters are dynamic processes driven by their own idiosyncratic innovations, so they change over time. For these models the likelihood function has no closed-form analytical expression, and evaluating it generally requires efficient simulation methods. Examples of parameter-driven specifications include stochastic volatility models, stochastic conditional duration models, and stochastic copula models (Koopman et al., 2016). In light of the substantial effort that has been invested in studying and implementing both classes of models, it is important to examine the relative advantages of parameter-driven and observation-driven frameworks (Koopman et al., 2016). Any time series model's usefulness ultimately hinges on robust out-of-sample performance.

The generalized autoregressive score (GAS) model is an observation-driven approach that rivals nonlinear non-Gaussian state-space models in generality. The GAS approach updates the time-varying parameters using a scaled score vector of the conditional observation density (Blasques et al., 2018), so the GAS model can be used with any observation density. The GARCH model belongs to the GAS class, but new models, such as mixed-measurement dynamic factor models, can also be developed within the GAS framework. As a natural alternative to the state-space framework, the GAS framework can be applied to a wide variety of data-generating processes (Blasques et al., 2018). Regardless of how much data is available, observation-driven and parameter-driven models cannot be compared directly: in parameter-driven models the predictive density is a mixture of the observation distribution over the random time-varying parameter, whereas in observation-driven models the forecast density is simply the observation distribution given a perfectly predictable parameter (Blasques et al., 2018). Overdispersion, fatter tails, and other properties of parameter-driven models may give them a direct advantage over observation-driven models.

Research Motivation

Since the financial crisis of 2008, credit risk analysis has become increasingly important, and financial institutions and regulators focus on determining the common variation in corporate failures (Blasques et al., 2018). This study developed a novel modeling approach for mixed-measurement time series data. The framework accommodates a wide range of distributions, the possibility of missing data, and cross-sectional dependence arising from shared exposure to dynamic common components. The primary goal is to build a versatile framework for estimating, assessing, and projecting this risk. A significant advantage of the framework is that the likelihood is available in closed form and does not need to be evaluated through simulation (Blasques et al., 2018). As a result, parameter estimation can be performed using simple methods.

Stochastic volatility models can also be used to price options. Volatility forecasts with large standard errors need not produce equally imprecise option prices: because option prices depend on the average volatility over the life of the contract, this averaging is expected to lower the standard errors in option prices. Over a long horizon, volatility converges to the process's unconditional variance, which is known without error. If the volatility process's persistence parameter is close to unity, however, the smoothing estimator's errors will be highly autocorrelated and its convergence to the unconditional volatility will be slow. In that case, large standard errors will be present in both the option prices and the deltas.

Stochastic volatility (SVOL) models are a natural alternative to GARCH-type models of time-varying volatility. In SVOL models there is a separate error process for the conditional variance and the conditional mean. The basis of the SVOL model is an autoregressive lognormal variance equation with innovations independent of those in the mean equation. The SVOL family of models has proven more extensible than the GARCH family: an SVOL model can be extended naturally to allow fat-tailed conditional mean innovations or a "leverage" effect. Gallant and coworkers have reported findings in favor of fat tails. When the mean and variance errors are negatively correlated, volatility increases (decreases) in response to negative (positive) shocks to the mean, an effect that Black investigated; EGARCH and Glosten et al.'s (1991) modified GARCH are two models designed to capture it. According to Dumas et al. (1998), index option prices are negatively correlated with the underlying index return.

Academic Contribution

Using volatility models to analyze financial time series data has become widespread in recent years, and a large literature has developed. The ARCH model is a critical tool for studying changes in variance over time: Engle (1982) uses the ARCH process to explain time-varying conditional variance in terms of past disturbances. Early empirical evidence suggested that a high ARCH order is required to capture the persistence of conditional variance, and Bollerslev's (1986) GARCH model provides an answer to this problem. Comprehensive assessments of ARCH/GARCH models may be found in Bollerslev, Chou, and Kroner (1992), and Pantula (1986) describes maximum likelihood inference strategies for ARCH models under the normality assumption (Asai & McAleer, 2009).

Related applications are discussed in Mark (1988), Bodurtha and Mark (1991), and Simon (1989). In addition, Geweke (1988) established Bayesian inference procedures for ARCH models, using Monte Carlo approaches to compute exact posterior distributions. Robinson (1987) uses a nonparametric method, while Gallant and Nychka (1987) and Gallant, Rossi, and Tauchen (1990) use a semiparametric approach. Recent studies have found that a good vector autoregression (VAR) for forecasting and structural modeling of macroeconomic data requires a broad spectrum of macroeconomic variables as well as time variation in their volatilities. According to Banbura, Giannone, and Reichlin (2010), larger systems are better for prediction and structural analysis; volatility changes over time, and Primiceri (2005) emphasizes the importance of this variation. Bayesian estimation relies on the posterior density, which combines the likelihood and the prior and thereby accommodates both of these components (Asai & McAleer, 2009). The many parameters of a large model necessitate Bayesian shrinkage, and Bayesian computation is well suited to stochastic volatility.

Literature Review

Observation-Driven Models

An observation-driven model treats the current parameters as predictable functions of lagged dependent variables and exogenous variables (Creal et al., 2013). In this setting, parameters change randomly yet can be predicted exactly one step ahead based on prior information.

i. GARCH

According to Creal et al. (2013), heteroskedasticity means that the error variances are expected to be larger for some points or ranges of the data than for others, so the variances of the disturbances are not equal. Although the coefficient estimates of an ordinary least squares regression remain unbiased in the presence of heteroskedasticity, the standard errors and confidence intervals produced by conventional techniques will be too narrow, giving an illusion of precision. GARCH models treat heteroskedasticity as a variance to be modeled rather than a problem to be corrected. In addition to correcting the deficiencies of least squares, a prediction is produced for the variance of each error term, which is frequently of interest in the financial industry. The general GARCH formula is given by:
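A standard GARCH(1,1) specification of the conditional variance, which the surrounding discussion appears to assume, is

\[ r_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \alpha\, r_{t-1}^2 + \beta\, \sigma_{t-1}^2, \]

where z_t is an i.i.d. innovation with zero mean and unit variance, ω > 0, and α, β ≥ 0.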



The return and GARCH equations define a class of GARCH-X models, where the X refers to the exogenous variable x_t. Chen et al. (2009) developed the HYBRID GARCH framework, which encompasses GARCH-X model variants and other related models. Because x_t can be used to measure h_t, we call the equation linking them the measurement equation; the most basic measurement equation is x_t = h_t + u_t. The measurement equation is a crucial part of the model and completes it. Since r_t and x_t are dependent on each other, the measurement equation gives a simple way to model this relationship, and the presence of z_t in the measurement equation captures this interdependence, which our empirical research shows to be quite significant. The Realized GARCH approach nests most, if not all, of the ARCH and GARCH model variants: in such models the measurement equation reduces to an identity, and nesting is achieved by setting x_t = r_t or x_t = r_t^2.

The multivariate Realized GARCH model, which includes return equations, GARCH equations, and measurement equations, can now be introduced using the appropriate notation. The return equation for the i-th asset at time t is:
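A return equation consistent with this setup (a sketch, since the exact specification is not reproduced here) is

\[ r_{i,t} = \mu_i + \sqrt{h_{i,t}}\, z_{i,t}, \qquad z_{i,t} \sim \text{i.i.d.}(0, 1), \]

where h_{i,t} denotes the conditional variance of the i-th asset and z_{i,t} its standardized return shock.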



For example, the original observation equation can be substituted for the simplified one under the structural model. In most cases, a useful model for the returns matters more than a model for the realized measurements, because the latter are secondary considerations. Using the transformed variance and correlation measurement equations, we can incorporate the realized measurements into the models more easily. As long as a factor structure is in place, the lower-dimensional measurement equation can be used for this purpose, making it possible to gain numerical advantages by employing a lower-dimensional system of equations.

It is important to remember that the first measurement equation should be used in some cases. To compare the overall log-likelihood of several model specifications, all models must use the same measurement equations. Different factor structures can be combined with distinct measurement equations, but comparing the total likelihoods of models with distinct measurement equations amounts to comparing models of different variables (and dimensions), which would be like comparing apples and oranges. If the total likelihoods of different models are to be compared, a common set of measurement equations, such as the first equation, is therefore required. Alternatively, a partial log-likelihood for the returns can be used to evaluate the models, instead of considering the entire likelihood and then omitting the part concerning the realized measurements. Since multivariate GARCH models are designed to model returns, this may be the most suitable basis for comparison. The following section goes through the log-likelihood terms pertinent to these comparisons.

Because all dynamic variables are specified in an observation-driven manner, it is simple to forecast return distributions from the model: all conditional variances and correlations for period t+1 can be computed from the GARCH equations. The components of H_{t+h} are not determined beyond horizon h = 1, because the future values of z_t and u_t cannot be predicted; however, simulated or bootstrapped estimates of the distribution of H_{t+h} are straightforward to compute, so multi-step predictions can be obtained from this model at any forecasting horizon. Such forecasting methodologies for Realized GARCH models are discussed in depth by Lunde and Olesen. The bootstrap method has the advantage of not relying on distributional assumptions about the data.
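A minimal sketch of such a simulation-based multi-step forecast, assuming a univariate GARCH(1,1) with Gaussian innovations (the multivariate case follows the same logic):

import numpy as np

def simulate_garch_forecast(h_t, omega, alpha, beta, horizon, n_paths=10_000, seed=0):
    """Simulate the distribution of future conditional variances for a GARCH(1,1):
    r_t = sqrt(h_t) * z_t,  h_{t+1} = omega + alpha * r_t**2 + beta * h_t."""
    rng = np.random.default_rng(seed)
    h = np.full(n_paths, h_t)                  # current conditional variance
    paths = np.empty((horizon, n_paths))
    for step in range(horizon):
        z = rng.standard_normal(n_paths)       # innovation draws
        r = np.sqrt(h) * z                     # simulated returns
        h = omega + alpha * r**2 + beta * h    # one-step GARCH recursion
        paths[step] = h
    return paths                               # rows: horizons, columns: simulated paths

# Distribution of the 10-step-ahead conditional variance
sims = simulate_garch_forecast(h_t=0.02, omega=0.00002, alpha=0.05, beta=0.90, horizon=10)
print(sims[-1].mean(), np.percentile(sims[-1], [5, 95]))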

Multivariate GARCH models describe the conditional distribution of the return vector, and this objective is reflected in the log-likelihood function for returns used in estimation. There are many ways to evaluate the different specifications, but comparing their return log-likelihoods, which measure their ability to predict the distribution of the return vector, is an excellent starting point. Since the parameter vector is estimated from in-sample data, specifications should be evaluated and compared using the average value of the return log-likelihood term, both in and out of sample. Using the mean predictive log-likelihood as a gain function in this type of model evaluation is similar to one-day-ahead density forecasting of the return vector.

ii. GAS

Huang et al. (2014) asserted that observation-driven models such as GAS have advantages over other models: the likelihood is simple to evaluate, and asymmetry, long memory, and other more intricate dynamics can be introduced without additional complexity. Because it is based on the score, the GAS model exploits the full density structure rather than means and higher-order moments alone. According to Benjamin et al. (2003) and Cipollini et al. (2012), this distinguishes it from other observation-driven modeling approaches, such as generalized autoregressive moving average models and vector multiplicative error models. The general formulation of the GAS model is represented below:
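A sketch of the model structure consistent with the description that follows (the exact equations in the source are not reproduced here) is

\[ r_t = \sqrt{f_t}\, z_t, \qquad (2.1) \]
\[ x_t = \xi + \varphi f_t + d_1\!\left(z_t^2 - 1\right) + d_2 z_t + u_t, \qquad (2.2) \]

where f_t is the conditional variance, x_t the realized variance, and z_t and u_t the return and measurement shocks.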

Equation (2.2) is the measurement equation because it connects the realized variance to the conditional variance. The shocks z_t and u_t are assumed to follow a bivariate Gaussian distribution, and the leverage function, d_1(z_t^2 - 1) + d_2 z_t, introduces dependence between the return shock and the volatility shock. The observations y_{1t} and y_{2t} are jointly distributed with density p(y_{1t}, y_{2t}). The innovations (z_t, u_t) embody the new information at time t, and the volatility changes as a result of these innovations and its previous value. The observation density, following the GAS definition, determines the precise functional form: the GAS model is an observation-driven model for latent dynamic components in which a scaled score drives the time-varying parameter θ_t = f_t.
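In the general GAS(p, q) formulation of Creal et al. (2013), the parameter is updated as

\[ f_{t+1} = \omega + \sum_{i=1}^{p} A_i\, s_{t-i+1} + \sum_{j=1}^{q} B_j\, f_{t-j+1}, \qquad s_t = S_t \nabla_t, \qquad \nabla_t = \frac{\partial \log p(y_t \mid f_t; \theta)}{\partial f_t}, \]

where S_t is a scaling matrix, typically based on the inverse of the Fisher information.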



A major advantage of the GAS(p, q) specification is that it can be used for a wide variety of models and parameterizations, since the recursion applies to essentially any model with a parametric likelihood. In the GAS model the score is scaled by the inverse of the information matrix. A simpler scaling option is the unit matrix, S_t = I, in which case unscaled gradients are used and the update resembles a steepest-ascent optimization step. However, our experience has shown that this updating mechanism is generally less reliable than the alternatives. For these reasons, we believe it is best to scale the score by the inverse information matrix, S_t = I_{t|t-1}^{-1}, rather than to leave it unscaled. A potential problem is carrying out the inversion when the information matrix is not of full rank, or is numerically unstable, for specific models.

The GAS model has the advantage of using all of the likelihood information: the time-varying parameter takes a scaled (local density) score step that reduces the one-step-ahead estimation error at the current observation. Despite being built on a fundamentally different paradigm, the GAS model offers a powerful and highly competitive alternative to conventional observation-driven models and parameter-driven models, as extensive nontrivial empirical and computational examples have demonstrated. Intriguing extensions and alternative specifications arise for state-space structures with stochastically time-varying parameters, multivariate marked point processes, and time-varying copula structures.

This versatility and relevance for a wide range of models make it challenging to develop a standard set of stationarity and regularity conditions that applies to all relevant scenarios. A more promising approach may be to derive such conditions for specific subsets of GAS specifications. A second research direction is to examine the finite-sample aspects of GAS models in greater detail. The statistical features of parameter estimates for GAS models would benefit from a more comprehensive investigation than the few empirical and computational examples supplied so far. This is especially true when the information matrix is unambiguously non-singular for all sample data, in which case the likelihood maximization converges quickly and reliably. When the observations carry little or no information about a particular parameter, it becomes more important to introduce information smoothing and to find suitable starting values; this is particularly pertinent at points where the information matrix degenerates, and in our opinion data smoothing is essential in these situations. Automated smoothing, in which the smoothing parameter is computed directly from the data, has also improved the likelihood value in numerous circumstances.

iii. Realized variance

According to Barndorff‐Nielsen et al. (2009), the realized kernel estimators make it possible to estimate the quadratic variation of an efficient price process from high-frequency noisy data. Together with alternative methodologies such as subsampling and pre-averaging, this extends the influential realized variance literature, allowing us to understand time-varying volatility better and to anticipate future volatility better. Realized variance (RV) is a popular empirical statistical measure, and RV provides a perfect approximation of volatility in the idealized setting in which prices are observed continuously and without measurement error.

Since RV is a sum-of-squared-returns calculation, this finding suggests that the data be sampled as frequently as possible when calculating RV. However, market microstructure noise then leads to a bias issue: Awartani, Corradi, and Distaso have recently documented the presence of noise in the volatility signature plots introduced by Andersen, Bollerslev, Diebold, and Labys (2000) (Barndorff‐Nielsen et al., 2009). As a result, returns are often sampled at a modest frequency, such as every five minutes, because of the trade-off between bias and variance. Filtering techniques, which earlier studies have employed to correct the bias, are an alternative method of dealing with the problem.

Parameter-Driven Models

Parameter-driven models come in many distinct variants because the parameters themselves change over time. Closed-form analytical expressions for the likelihood function are not available for these models, so effective simulation methodologies are generally required to evaluate the likelihood (Hansen et al., 2010). Time-varying parameters can be specified as stochastic processes for any conditional observation density, which makes parameter-driven models applicable in a wide variety of situations. However, in the absence of a flexible unifying framework of the kind available for observation-driven models, a new function for updating the time-varying parameter must be designed for each new observation density and parameterization, and it is often not obvious what the right function is in a given situation, such as volatility modeling.

i. Stochastic Volatility Models



According to Hansen et al. (2010), simplicity, flexibility, and resilience characterize the multivariate factor stochastic volatility (SV) model (Abbara & Zevallos, 2019). In this model, a potentially large observation space is reduced to a smaller space of orthogonal factors, much as in a factor model, which makes it simple. These factors are allowed to display volatility clustering and to follow stochastic volatility processes themselves, allowing the extent of volatility co-movement to be time-varying; this makes the model both flexible and resilient.
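A canonical univariate stochastic volatility specification of the kind these models build on (a sketch, not the exact multivariate factor formulation discussed above) is

\[ y_t = \exp(h_t/2)\, \varepsilon_t, \qquad h_{t+1} = \mu + \phi\,(h_t - \mu) + \sigma_\eta\, \eta_t, \]

with ε_t and η_t independent standard normal innovations and |φ| < 1 governing the persistence of log-volatility.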

In stochastic volatility models, volatility tends to revert to its long-term mean value. Stochastic volatility models thereby address the shortcoming of derivative-pricing models that assume constant volatility over a given time frame (Hansen et al., 2010), and they are used to value and manage the risk relating to derivative contracts. By contrast, in an ARCH model the variance of the error term is a function of past squared errors, while in a generalized autoregressive (GARCH) model the error variance is a function of past squared errors and the previous period's estimated variance (Hansen et al., 2010). These specifications capture volatility clustering, the empirical regularity in finance that periods of high volatility tend to be followed by further high volatility and calm periods by further calm.

ii. State-Space Models

The state-space model of a continuous-time dynamic system can be generated either from the time-domain system model supplied by a differential equation or from its transfer-function representation (Hansen & Lunde, 2006). This section deals with scenarios that involve the controller form, the observer form, the modal form, and the Jordan form, four state-space forms commonly employed in current control theory and applications. The general formula is represented below:
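The standard continuous-time state-space form assumed in discussions of this kind is

\[ \dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t), \]

where x(t) is the state vector, u(t) the input, y(t) the output, and A, B, C, D constant matrices.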

Depending on how the state-space model is constructed, it may or may not incorporate all of the underlying transfer function's modes, i.e., the poles of the underlying transfer function before any zero-pole cancellation takes place (Hansen & Lunde, 2006). The state-space model will have a lower order if some of the transfer function's zeros and poles cancel, and the corresponding modes will not be visible in the transition matrix.

According to Costa and Alpuim (2010), the Kalman filter, developed by Kalman in 1960, has been widely applied to study the evolution of dynamic structures. Using a group of equations called a state-space model, the technique derives estimates of unobservable variables from associated observable variables. These models are defined by the equations used to construct them.
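Consistent with the description that follows, the model can be written as a measurement equation (1) and a transition (state) equation (2):

\[ Y_t = H_t\, b_t + e_t, \qquad (1) \]
\[ b_t = U\, b_{t-1} + \varepsilon_t, \qquad (2) \]

where Y_t is the n×1 vector of observed variables, b_t the m×1 state vector, H_t an n×m coefficient matrix, U the autoregressive coefficient matrix, and e_t and ε_t uncorrelated white-noise disturbances.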



The measurement equation (1) connects the n×1 vector of observed variables, Y_t, to the m×1 vector of unobservable states, b_t; it involves the n×1 white-noise vector e_t, referred to as the measurement error, and the n×m coefficient matrix H_t (Costa & Alpuim, 2010). Equation (2), the transition or state equation, shows how the state vector b_t changes over time through the vector autoregression coefficient matrix U and a disturbance with its own covariance matrix. The two disturbances e_t and ε_t are uncorrelated. One family of models of major relevance arises when the state vector follows a process with a constant mean.

iii. Quasi Maximum Likelihood

According to Wooldridge (2014), two-stage least squares (2SLS) is the most common method for estimating a linear model with one or more endogenous explanatory variables. However, as many authors have shown, the limited-information maximum likelihood estimator, computed under the nominal assumption of jointly normally distributed unobservables, has better small-sample properties, particularly when there are many overidentifying restrictions. The quasi-maximum likelihood estimator can be computed by:
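In generic form (a sketch; the specific objective used in the source is not reproduced here), the quasi-maximum likelihood estimator maximizes a possibly misspecified log-likelihood,

\[ \hat{\theta}_{QML} = \arg\max_{\theta} \sum_{t=1}^{n} \log f(y_t \mid x_t; \theta), \]

where f is the assumed conditional density, which need not coincide with the true one.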

Although dealing with endogenous explanatory variables in nonlinear models is notoriously challenging, solutions for specific cases have been provided. In nonlinear models, the nature of the endogenous explanatory variables (EEVs), whether continuous, discrete, or some mix, is of immediate importance (Durbin & Koopman, 2000). 2SLS can be utilized irrespective of the type of the EEVs, but plugging first-stage fitted values into the second stage frequently yields structural parameters and other quantities of relevance, e.g., average partial effects, that are inconsistent. Most of the time, two strategies are utilized to estimate nonlinear models with EEVs.

To achieve maximum likelihood, a model with unobserved errors requires an explicit specification of the distribution of the EEVs and of the response variable conditional on the EEVs (Durbin & Koopman, 2000). The MLE technique has various downsides, especially when dealing with binary responses. Managing many EEVs can be computationally taxing, and if the distributional assumptions are erroneous, the estimator is typically inconsistent, which is perhaps the most important reason to avoid it.

In the control function approach, residuals from a first-stage estimation involving the EEVs are used in a second-stage estimation problem. Researchers use the control function (CF) technique in many settings, including nonlinear models with cross-section data and with panel data (Wooldridge, 2014), and Wooldridge (2014) has shown how the method can be used in semiparametric and nonparametric circumstances. According to Blundell and Powell (BP), quantities of interest can be identified generically, without distributional or functional-form restrictions. Durbin & Koopman (2000) treat the average structural function and average partial effects as similar notions, with the APE accommodating unobservables that are not assumed independent of the external causes. Inserting first-stage fitted values for the EEVs can generate consistent estimators of the parameters up to a common scale factor in some cases, but the constraints under which this occurs are quite restrictive, and average partial effects cannot be easily retrieved (Wooldridge, 2014). It is also difficult to test the premise that the EEVs are exogenous because of the fitted-value technique.

iv. Indirect Inference

According to Gourieroux et al. (1993), indirect inference exploits the simplicity and convenience with which data can be simulated from even complex structural frameworks. The central notion is to look at the observed values and the simulated findings through the lens of an auxiliary (or instrumental) statistical framework with auxiliary parameters; the structural parameters are then chosen so that, viewed through this lens, the simulated outcomes resemble the observed data, i.e., so that the auxiliary parameter estimates match. To formalize these notions, suppose the actual choices {y_it}, i = 1, . . . , n, t = 1, . . . , T, are generated by the structural discrete choice model specified in (2.1) for a given value β0 of the structural parameter. An auxiliary framework can be estimated on the observed values to obtain parameter estimates θ̂_n. Formally, θ̂_n solves:
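A sketch of the auxiliary estimation problem, in the form typically used in this literature (the exact criterion in the source is not reproduced here), is

\[ \hat{\theta}_n = \arg\max_{\theta} \sum_{i=1}^{n} \sum_{t=1}^{T} \log \tilde{f}(y_{it} \mid x_{it}; \theta), \]

where \tilde{f} denotes the likelihood of the auxiliary model.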



Here the x's are observable exogenous variables, while the u's and ε's are not observed. Suppose, as assumption (A1), that (y_t, x_t) is a stationary Markov process and that (ε_t) is white noise whose distribution G0 is known; we further assume that (x_t) is a homogeneous Markov process. Notably, in the parametric situation it is not necessary to assume that ε_t is white noise with some known distribution such as the standard normal, since parameters describing its distribution can always be included in the structural parameter vector instead.

Because any simulated choice is a step function, step functions appear naturally when indirect inference is applied to discrete choice models. The sample binding function is hence discontinuous, and so are the II estimators' criterion functions. Due to this discreteness, gradient-based optimization methods cannot be used; the options that remain are derivative-free approaches, random search algorithms (such as simulated annealing), or simply abandoning optimization in favor of a Laplace-type estimator (Gourieroux et al., 1993). MCMC, on the other hand, can provide (in finite samples) an approximation that differs significantly from the statistical criterion's optimum, even when it converges slowly. Since non-smooth criterion functions define II estimators, their use with nonlinear data of this form is extremely problematic (Gourieroux et al., 1993). Despite the challenges of applying II to discrete choice models, several authors have persisted in doing so because of the attraction of the II approach.

v. Importance Sampling

According to Yuan and Druzdzel (2012), convergence of a Monte Carlo estimator is accelerated if the samples are drawn from a distribution comparable to the function in the integrand, and importance sampling exploits this fact. The underlying principle is that an accurate approximation is produced more quickly by focusing effort where the integrand's value is relatively high. The scattering equation for this model is computed using the formula below:
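A standard form of the scattering equation used in this rendering setting (a sketch; the exact equation in the source is not reproduced here) is

\[ L_o(p, \omega_o) = \int_{S^2} f(p, \omega_o, \omega_i)\, L_i(p, \omega_i)\, |\cos\theta_i|\, d\omega_i, \]

where f is the BSDF, L_i the incident radiance, and θ_i the angle between the incoming direction ω_i and the surface normal.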

The scattering equation illustrates the point. Suppose a random direction is selected that is nearly perpendicular to the surface normal, so that the cosine term is close to zero. Durbin & Koopman (2000) assert that evaluating the BSDF and then tracing a ray to determine the incident radiance at the sample point would then be largely wasted effort, since the contribution to the estimate would be insignificant. It is better to sample directions in such a manner that directions close to the horizon are chosen less often. In general, efficiency improves when directions are selected from distributions that match other integrand factors (the BSDF or the distribution of incoming illumination), and variance is decreased as long as the sample points are drawn from an integrand-like probability distribution.
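A minimal numerical sketch of this idea, using a generic one-dimensional integrand and an assumed proposal density (not tied to the rendering setting above):

import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Integrand concentrated near x = 0 (illustrative choice)
    return np.exp(-0.5 * x**2) * (1 + np.cos(x))

# Importance distribution p(x): standard normal, which resembles the integrand's shape
n = 100_000
x = rng.standard_normal(n)
p = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

# Importance sampling estimate of the integral of f over the real line:
# the sample mean of f(X)/p(X) with X ~ p approximates the integral
estimate = np.mean(f(x) / p)
print(estimate)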

vi. Local-Level Model

According to Durbin and Koopman (2000), the local level model is the setting in which filtering tools such as the Kalman filter, the regression lemma, the Bayesian treatment, the minimum variance linear unbiased treatment, and smoothed state variances are developed. Consider a time series in which the observations are ordered sequentially from y1 to yn. The additive model is the most fundamental representation of a time series, and the resultant model formula is:
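Consistent with the description of the three components below, the additive decomposition can be written as

\[ y_t = \mu_t + \gamma_t + \varepsilon_t, \qquad t = 1, \ldots, n, \qquad (2.1) \]

where μ_t is the trend, γ_t the seasonal, and ε_t the error or disturbance.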

These three components are called the trend, the seasonal, and the error or disturbance, respectively; μ_t is the slowly changing component. For this model, we assume the observation y_t and the other quantities listed in (2.1) are real-valued. In various applications, especially in economics, the components are multiplied together instead (Durbin & Koopman, 2000). The simplest special case of (2.1), the local level model, takes the condensed form:
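In the notation of Durbin and Koopman (2000), with the level α_t a random walk and the seasonal dropped, this is

\[ y_t = \alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2), \]
\[ \alpha_{t+1} = \alpha_t + \eta_t, \qquad \eta_t \sim N(0, \sigma_\eta^2), \]

with ε_t and η_t mutually independent.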

This model, despite its simplicity, does not represent a contrived special case; rather, it serves as a foundation for the investigation of key real-world challenges in time series analysis. According to standard multivariate analysis results, estimation of conditional means, variances, and covariance matrices is a routine affair based on the properties of the multivariate normal distribution, which apply here (Durbin & Koopman, 2000). However, as the number of observations y_t rises, the usual computations become increasingly time-consuming. This naïve approach to estimation can be greatly improved using the filtering and smoothing techniques discussed in the following sections (Durbin & Koopman, 2000): they yield the same results as multivariate analysis theory while providing fast computational algorithms.
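A minimal sketch of the Kalman filter for this local level model, assuming known variances (the parameter values are illustrative, not taken from the source):

import numpy as np

def local_level_kalman_filter(y, sigma2_eps, sigma2_eta, a0=0.0, p0=1e7):
    # Kalman filter for y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t
    n = len(y)
    filtered_mean = np.empty(n)
    filtered_var = np.empty(n)
    a_pred, p_pred = a0, p0                 # (nearly) diffuse initialization
    for t in range(n):
        f = p_pred + sigma2_eps             # prediction error variance
        k = p_pred / f                      # Kalman gain
        v = y[t] - a_pred                   # one-step-ahead prediction error
        filtered_mean[t] = a_pred + k * v   # updated (filtered) level
        filtered_var[t] = p_pred * (1 - k)  # updated variance
        a_pred = filtered_mean[t]           # state transition: level is a random walk
        p_pred = filtered_var[t] + sigma2_eta
    return filtered_mean, filtered_var

# Example with simulated data
rng = np.random.default_rng(0)
alpha = np.cumsum(rng.normal(0.0, 0.5, 200))
y = alpha + rng.normal(0.0, 1.0, 200)
level, variance = local_level_kalman_filter(y, sigma2_eps=1.0, sigma2_eta=0.25)
print(level[-5:])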

The Student T-Distribution

According to Zhu and Galbraith (2010), the t distribution is a family of distributions that resembles the normal distribution curve, although it is a little shorter and thicker in the tails. The t distribution is preferred over the normal distribution when working with small samples, and it resembles the normal distribution more closely as the sample size increases; for sample sizes greater than about 20, the distribution is nearly identical to the normal distribution. The associated statistic is computed using the formula presented below:
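The one-sample t score, which matches the hypothesis-testing usage described below (a sketch based on the cited page's standard notation), is

\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \]

where x̄ is the sample mean, μ the hypothesized population mean, s the sample standard deviation, and n the sample size.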

Adapted from: https://www.statisticshowto.com/probability-and-statistics/t-distribution/



In hypothesis testing, the t distribution (and its associated t scores) is used to decide whether to accept or reject the null hypothesis. The center of the graph represents the region of acceptance, while the graph's extremities represent the region(s) of rejection; in a two-tailed test, both tails form the rejection region (Zhu & Galbraith, 2010). Either z-scores or t-scores can be used to describe the tail area.

Gaussian Distribution

The Gaussian distribution is defined by the formula presented below.
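In the usual notation, the Gaussian (normal) probability density function is

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty. \]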

According to Giner and Smyth (2016), the Gaussian distribution is a theoretical symmetric distribution used to compare scores or to make other statistical decisions involving the mean and standard deviation. Its shape implies that the bulk of the scores lies near the center of the distribution and that the frequency of scores diminishes as they deviate from the center. The normal distribution is the most common continuous probability distribution. A probability density function expresses the likelihood of a random variable taking a specific value; plotting the variable x against its likelihood of occurring, y, creates the familiar curve. Normal distributions have a symmetric bell shape and can have any real mean and any positive standard deviation. Put simply, normal distributions describe continuous data, so any value in the relevant range can be represented. As a special instance, the normal distribution can be standardized so that its mean is zero and its standard deviation is one, and it is possible to standardize any normal distribution to fit this normal curve. Its condensed formula is:
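In condensed form, a normally distributed variable and its standardization can be written as

\[ X \sim N(\mu, \sigma^2), \qquad Z = \frac{X - \mu}{\sigma} \sim N(0, 1), \]

which corresponds to the shorthand discussed next.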

The density is an exponential function of the mean (μ), the standard deviation (σ), and the variance (σ²); shorthand for the distribution is N(μ, σ²), and N(0, 1) denotes the standard normal distribution obtained when the parameter values equal zero and one (Giner & Smyth, 2016). The mean and the standard deviation determine the normal distribution's form: the mean fixes the location of the distribution's peak along the x-axis, so it is referred to as the location parameter, whereas the standard deviation determines how spread out the distribution appears and is referred to as the "scale parameter." The bell curve will be broader if the variance is greater.

Data Cleaning

Volatility estimation from high-frequency data requires careful data cleaning, and high-frequency data cleaning has been given considerable attention. Discarding a large amount of data can actually improve volatility estimators, as demonstrated by Barndorff‐Nielsen et al. (2009). The reasoning behind this conclusion may initially appear counterintuitive, yet it is rather simple: an estimator that is to use all available data to the fullest extent would have to place a high weight on reliable observations and a lower weight on inaccurate ones.

The generalized least-squares (GLS) estimator provides a suitable comparison point (Barndorff‐Nielsen et al., 2009). The conventional least squares estimator, by contrast, loses precision when it includes noisy data and can suffer more than it benefits from using low-quality observations. The comparison is apt because the realized kernel and related estimators, which weight all observations equally, can likewise be significantly affected by outliers (Barndorff‐Nielsen et al., 2009). The following steps can be followed for trade data to achieve high-frequency data cleaning; a brief sketch of these filters in code follows the list.

T1. Retain only the correct trades, deleting entries with a Correction Indicator CORR > 0 (Barndorff‐Nielsen et al., 2009).

T2. Remove any entries with an abnormal Sale Condition, i.e., trades whose letter-coded COND field contains a letter other than "E" or "F." The TAQ 3 User's Guide has more information on how sale conditions are handled (Barndorff‐Nielsen et al., 2009).

T3. If there are multiple transactions with the same time stamp, use the median price.

T4. Delete entries with prices above the ask plus the bid-ask spread; the treatment is analogous for prices below the bid minus the bid-ask spread (Barndorff‐Nielsen et al., 2009).
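A minimal pandas sketch of these filters, assuming a trades table and a quotes table with hypothetical column names (time, price, corr, cond, bid, ask); the names and rules are illustrative, not taken from the source:

import pandas as pd

def clean_trades(trades: pd.DataFrame, quotes: pd.DataFrame) -> pd.DataFrame:
    # Apply T1-T4 style cleaning rules to a trades table, using quotes for T4
    df = trades.copy()
    df = df[df["corr"] == 0]                                    # T1: keep uncorrected trades only
    df = df[df["cond"].isin(["", "E", "F"])]                    # T2: drop abnormal sale conditions
    df = df.groupby("time", as_index=False)["price"].median()   # T3: median price per time stamp
    # T4: drop prices outside [bid - spread, ask + spread] using the prevailing quote
    df = pd.merge_asof(df.sort_values("time"), quotes.sort_values("time"), on="time")
    spread = df["ask"] - df["bid"]
    keep = (df["price"] <= df["ask"] + spread) & (df["price"] >= df["bid"] - spread)
    return df[keep]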

Conclusion

This paper has discussed stochastic and non-stochastic volatility models within the observation-driven and parameter-driven classes. An observation-driven model treats the current parameters as predictable functions of lagged dependent and exogenous variables; the parameters change at random yet can be predicted exactly one step ahead based on prior information. In parameter-driven models, time-varying parameters can be specified as stochastic processes for any conditional observation density, which is why such models can be applied in a wide variety of situations.



References

Abbara, O., & Zevallos, M. (2019). A note on stochastic volatility model estimation. Brazilian

Review of Finance, 17(4), 22-32.

Asai, M., & McAleer, M. (2009). The structure of dynamic correlations in multivariate stochastic

volatility models. Journal of Econometrics, 150(2), 182-192.

Barndorff‐Nielsen, O. E., Hansen, P. R., Lunde, A., & Shephard, N. (2009). Realized kernels in

practice: trades and quotes.

Blasques, F., Gorgi, P., Koopman, S. J., & Wintenberger, O. (2018). Feasible invertibility

conditions and maximum likelihood estimation for observation-driven

models. Electronic Journal of Statistics, 12(1), 1019-1052.

Costa, M., & Alpuim, T. (2010). Parameter estimation of state-space models for univariate

observations. Journal of Statistical Planning and Inference, 140(7), 1889-1902.

Creal, D., Koopman, S. J., & Lucas, A. (2013). Generalized autoregressive score models with

applications. Journal of Applied Econometrics, 28(5), 777-795.

Durbin, J., & Koopman, S. J. (2000). Time series analysis of non‐Gaussian observations based on state space models from both classical and Bayesian perspectives. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(1), 3-56.



Giner, G., & Smyth, G. K. (2016). statmod: Probability calculations for the inverse Gaussian distribution. arXiv preprint arXiv:1603.06687.

Gourieroux, C., Monfort, A., & Renault, E. (1993). Indirect inference. Journal of Applied Econometrics, 8(S1), S85-S118.

Gustafsson, F. (2010). Particle filter theory and practice with positioning applications. IEEE

Aerospace and Electronic Systems Magazine, 25(7), 53-82.

Hansen, P. R., & Lunde, A. (2006). Realized variance and market microstructure noise. Journal

of Business & Economic Statistics, 24(2), 127-161.

Hansen, P. R., Huang, Z., & Shek, H. H. (2010). Realized GARCH: A complete model of returns

and realized volatility measures, mimeograph, Department of Economics.

Huang, Z., Wang, T., & Zhang, X. (2014). Generalized autoregressive score model with realized

measures of volatility. Available at SSRN 2461831.

Koopman, S. J., Lucas, A., & Scharth, M. (2016). Predicting time-varying parameters with

parameter-driven and observation-driven models. Review of Economics and

Statistics, 98(1), 97-110.

Wooldridge, J. M. (2014). Quasi-maximum likelihood estimation and testing for nonlinear

models with endogenous explanatory variables. Journal of Econometrics, 182(1), 226-

234.

Yuan, C., & Druzdzel, M. J. (2012). An importance sampling algorithm based on evidence pre-

propagation. arXiv preprint arXiv:1212.2507.

Zhu, D., & Galbraith, J. W. (2010). A generalized asymmetric Student-t distribution with

application to financial econometrics. Journal of Econometrics, 157(2), 297-305.
