Statistics For Quants 1696336178

Statistics for Quants: A Primer
Amit Kumar Jha, UBS
Contents
1 Introduction to Statistics 2
2 Descriptive Statistics 3
2.1 Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Median . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.3 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.4 Variance and Standard Deviation . . . . . . . . . . . . . . . . . . . . . . 3
3 Probability Distributions 4
3.1 Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.2 Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
4 Statistical Inference 4
4.1 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4.1.1 Type I Error (False Positive) . . . . . . . . . . . . . . . . . . . . . 5
4.1.2 Type II Error (False Negative) . . . . . . . . . . . . . . . . . . . . 5
4.2 Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4.3 Sample Size and Power . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
5 Regression Analysis 6
5.1 Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 6
5.2 Multiple Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
5.3 Coefficient of Determination (R2 ) . . . . . . . . . . . . . . . . . . . . . . 6
5.4 Applications in Finance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
6 Time Series Analysis 7

6.1 Components of Time Series . . . . . . . . . . . . . . . . . . . . . . . . . 7
6.2 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
6.3 Autocorrelation and Partial Autocorrelation . . . . . . . . . . . . . . . . 7
7 Other Advanced Statistical Techniques 7

7.1 Machine Learning in Finance . . . . . . . . . . . . . . . . . . . . . . . . 8
7.2 Volatility Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
7.3 Risk Management and Value at Risk (VaR) . . . . . . . . . . . . . . . . . 8
1
8 Time Series Models 8
8.1 Autoregressive Model (AR) . . . . . . . . . . . . . . . . . . . . . . . . . 8
8.2 Moving Average Model (MA) . . . . . . . . . . . . . . . . . . . . . . . . 8
8.3 Autoregressive Moving Average Model (ARMA) . . . . . . . . . . . . . . 8
8.4 Autoregressive Integrated Moving Average (ARIMA) . . . . . . . . . . . 9
8.5 Seasonal ARIMA (SARIMA) . . . . . . . . . . . . . . . . . . . . . . . . . 9
8.6 Seasonal ARIMA with Exogenous Variables (SARIMAX) . . . . . . . . . . 9
8.7 Generalized Autoregressive Conditional Heteroskedasticity (GARCH) . . 9
8.8 Variants of GARCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
8.8.1 EGARCH (Exponential GARCH) . . . . . . . . . . . . . . . . . . . 9
8.8.2 IGARCH (Integrated GARCH) . . . . . . . . . . . . . . . . . . . . 9
8.8.3 TGARCH (Threshold GARCH) . . . . . . . . . . . . . . . . . . . . 10
8.8.4 PGARCH (Power GARCH) . . . . . . . . . . . . . . . . . . . . . . 10
8.8.5 VGARCH (Variance GARCH) . . . . . . . . . . . . . . . . . . . . . 10
8.8.6 DCC GARCH (Dynamic Conditional Correlation GARCH) . . . . . 10
9 Tests in Statistics 10
9.1 t-test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
9.2 ANOVA (Analysis of Variance) . . . . . . . . . . . . . . . . . . . . . . . . 11
9.3 Chi-Squared Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
9.4 Shapiro-Wilk Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
9.5 Durbin-Watson Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
9.6 Mann-Whitney U Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
9.7 Variance Inflation Factor (VIF) . . . . . . . . . . . . . . . . . . . . . . . . 12
9.8 Breusch-Pagan Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
9.9 Ljung-Box Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
9.10 Augmented Dickey-Fuller (ADF) Test . . . . . . . . . . . . . . . . . . . . 12
9.11 Engle’s ARCH Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
9.12 Bollerslev’s Multivariate GARCH . . . . . . . . . . . . . . . . . . . . . . . 12
1 Introduction to Statistics
Statistics is like a magnifying glass for data. It lets us zoom in on important details, spot
patterns, and make educated guesses about future events. In the financial world, this
is akin to analyzing market trends, evaluating risks, and predicting stock performance.
Every day, countless financial transactions take place around the world, from buy-
ing a cup of coffee to purchasing billion-dollar companies. Each of these transactions
generates data. Over time, this massive collection of data can tell us stories about eco-
nomic health, consumer behavior, and market dynamics. But to unlock these stories,
we need the tools of statistics.
Example: Consider the housing market in a bustling city. Over a year, the prices of
homes might fluctuate due to various factors like interest rates, employment levels, or
even seasonal changes. If a real estate investor merely glances at this data, they might
feel lost amidst the monthly ups and downs. But with statistics, they can calculate the
average house price for the year, understand the degree of price variation, and even
2
predict future price trends based on historical data. By doing this, our investor can
make informed decisions about when to buy or sell properties.
Imagine a bustling stock market. Without statistics, an individual trader might feel
overwhelmed by the sheer volume of numbers flashing on the screen. How do you
differentiate between a momentary price dip and a significant downward trend? How
do you gauge the general sentiment of the market? This is where statistics come in. By
computing averages, measuring variability, and testing hypotheses, traders can discern
patterns, evaluate risks, and predict potential market movements.
Furthermore, statistics isn’t just about crunching numbers. It’s about understanding
uncertainty and making the best possible decisions with the information at hand. In the
world of finance, where the stakes are high and uncertainty is a given, statistical analy-
sis becomes a crucial tool. Whether it’s an investment bank assessing the risk of a new
financial product, an insurance company setting premiums, or a retail investor deciding
which stocks to buy, statistics provides the framework for making these decisions.
In essence, while financial markets might seem chaotic and unpredictable, beneath
the surface, there are patterns, trends, and relationships waiting to be discovered. And
the key to unveiling them? Statistics.
2 Descriptive Statistics
Descriptive statistics is like giving a quick snapshot of data. Instead of sifting through
pages of numbers or trying to interpret rows of data, you receive a distilled overview
in the form of a few key measures that summarize everything. These measures help in
understanding the main characteristics of the data at a glance, providing insights into
its general behavior and key trends.
Imagine you’re trying to understand the performance of a stock over a year. Looking
at the daily closing prices might be overwhelming; that’s where descriptive statistics
come in. They offer an organized summary, making it easier to see the overall trend,
the typical price, and the variability in price.
2.1 Mean
The mean, often termed the average, tells us the central value around which all other
data points gravitate.
Example: Suppose you have the monthly returns of a stock over a year. The mean
would give you an idea of the stock’s typical monthly return. If the mean return is
positive, it indicates that, on average, the stock had a gain every month.
2.2 Median
The median is the middle value when all data points are arranged in ascending or
descending order. If there are an odd number of data points, the median is the one
right in the center. For an even number, it’s the average of the two central numbers.
Example: If you’re evaluating the annual salaries of employees in a firm, and the
salaries vary widely due to a few top executives with hefty compensation, the median
salary would provide a more representative figure of what the typical employee earns,
unaffected by the extremely high values.
3
2.3 Mode
The mode identifies the most frequently occurring value in a dataset. If a particular
number appears more often than any other, that number is the mode.
Example: In analyzing the sizes of shoes sold in a store, if size 8 is the most fre-
quently sold, then that’s the mode. This information could be useful for inventory
management.
2.4 Variance and Standard Deviation

These two measures describe the spread or dispersion of data. While variance calculates
the average squared difference from the mean, standard deviation, its square root, gives
an idea of the typical deviation a data point has from the mean.
Example: In a mutual fund, if the monthly returns have a high standard deviation, it
indicates that the returns vary significantly from the average return, signaling a higher
risk.
In essence, descriptive statistics provide a concise summary of data, making it more
digestible and interpretable. Whether it’s for understanding market trends, comparing
investment options, or making business decisions, these measures offer a solid founda-
tion for deeper analysis and decision-making.
3 Probability Distributions
A probability distribution is like a playbook for randomness. Instead of leaving us in
the dark about what might happen next, it illuminates the various possible outcomes
and tells us the odds of each. In the realm of finance, these outcomes could be anything
from stock prices and interest rates to market returns and even the success of a new
product launch.
Imagine throwing a dice. The outcome could be any number between 1 and 6. Each
of these outcomes has a known probability (in this case, 16 for each number). But what
if we’re dealing with something more complex, like the future price of a stock? There
are countless factors at play, and the possible outcomes might range in the hundreds,
thousands, or even more. A probability distribution helps map out these possibilities,
giving a clearer picture of potential futures.
3.1 Normal Distribution

Often called the ”bell curve” because of its shape, the normal distribution is one of the
most commonly encountered probability distributions in finance. It assumes that most
occurrences (like stock prices or company revenues) take place near the average, and
fewer occur as we move away from the average in either direction.
Example: Imagine a company that has been recording its monthly sales for several
years. If we plotted these sales figures, we might find that most of the months had
sales figures close to a central number, say 10,000 units. Some months might have
seen slightly more sales, some slightly less. Very few months would have extremely
high or extremely low sales. This distribution of sales figures could potentially form a
bell shape, indicating that the company’s monthly sales follow a normal distribution.
4
3.2 Binomial Distribution
The binomial distribution deals with binary outcomes – situations where there are only
two possible results, like a coin toss yielding heads or tails. In finance, this might be
used to predict the likelihood of success or failure of a particular event based on a
known probability.
Example: Consider a trader who, based on their analysis, believes that there’s a
70% chance a particular stock will rise tomorrow. If they were to trade based on this
belief over 10 days, the binomial distribution could be used to calculate scenarios like
the probability that the stock will rise on exactly 7 out of those 10 days.
Ultimately, probability distributions offer a structured approach to uncertainty. They
don’t predict the future but rather provide a framework to understand the various pos-
sibilities and their likelihoods. For financial professionals, this knowledge is invaluable.
It helps them gauge risks, set expectations, and devise strategies that align with their
goals and risk tolerance.
4 Statistical Inference
Statistical inference is like detective work for data. Instead of investigating crime
scenes, we’re probing datasets. We gather clues from a sample, a smaller group se-
lected from a larger population, and make educated guesses or inferences about the
characteristics of the entire population.
Imagine you’re a financial analyst trying to understand the spending habits of con-
sumers in a large city. It’s impractical to ask every single person about their spending.
Instead, you might survey a smaller group and use their responses to make inferences
about the entire city’s habits.
4.1 Hypothesis Testing

Hypothesis testing is akin to the courtroom drama of the statistical world. We start with
a claim or hypothesis about a population parameter. Then, using sample data, we test
this claim to see if there’s enough evidence to support it.
Example: Suppose a mutual fund claims to have consistently outperformed the
market average. To test this claim, an analyst might set up a null hypothesis stating
that the fund’s performance is equal to the market average. Using past return data,
the analyst will then determine whether there’s enough evidence to reject this null
hypothesis in favor of the alternative hypothesis, which posits that the fund has indeed
outperformed the market.
4.1.1 Type I Error (False Positive)

A Type I error occurs when we incorrectly reject a true null hypothesis. In other words,
we believe there’s an effect or difference when there actually isn’t one. The probability
of making a Type I error is denoted by the symbol α, often referred to as the significance
level of the test.
Example: Imagine a financial analyst testing the effectiveness of a new trading
strategy. The null hypothesis might state that the new strategy performs no better
than the old one. If the analyst concludes that the new strategy is superior (rejecting
5
the null) when it’s actually not, they’ve made a Type I error. This could lead to the
adoption of an ineffective strategy.
4.1.2 Type II Error (False Negative)

A Type II error happens when we fail to reject a false null hypothesis. Essentially, we
overlook an effect or difference that exists. The probability of making a Type II error is
denoted by β. The power of a test, which is 1 − β, represents the probability of correctly
rejecting a false null hypothesis.
Example: Using the same trading strategy scenario, suppose the new strategy is
genuinely better, but the analyst’s test fails to detect this improvement. This oversight,
a Type II error, could result in missing out on potential profits from the superior strategy.
4.2 Confidence Intervals

While hypothesis testing gives a binary outcome (accept or reject a claim), confidence
intervals provide a range of values in which we’re fairly confident the population pa-
rameter lies.
Example: An economist trying to estimate the average annual income of house-
holds in a region might sample 1,000 households. Based on this sample, they might
estimate the average income to be $50,000 with a 95% confidence interval of $48,000
to $52,000. This means they’re 95% confident that the true average income for the
entire region falls within that range.
4.3 Sample Size and Power

The accuracy of inferences often hinges on the sample size. A larger sample can lead
to more precise estimates. Additionally, the concept of power in hypothesis testing
refers to the probability of correctly rejecting a false null hypothesis. Ensuring adequate
sample size can boost the power of a test.
Example: Imagine a bank introducing a new app feature and wanting to know if it
increases user engagement. If they test the feature on just 10 users, the results might
not be very reliable. But if they test it on 1,000 users, the conclusions drawn will be
much more robust.
In essence, statistical inference provides the tools to make sense of data in mean-
ingful ways, allowing decisions to be data-driven and evidence-based. This is especially
crucial in finance, where decisions can have significant monetary implications.
5 Regression Analysis
Regression analysis is akin to finding the best path through a forest of data points.
Imagine you’re looking at a scatter plot of points, and you want to draw a single line
that best captures the overall trend of those points. That’s what regression analysis
does. It’s a statistical tool that lets you predict a dependent variable based on one
or more independent variables. In the realm of finance, this can be incredibly useful,
as it allows professionals to make educated predictions about future events based on
historical data.
6
5.1 Simple Linear Regression
Simple linear regression is the most basic form of regression. It involves one indepen-
dent variable and one dependent variable. The goal is to find a straight line that best
fits the data.
Example: Suppose an analyst wants to understand the relationship between a com-
pany’s advertising spend and its sales revenue. By plotting past data and fitting a line,
the analyst can predict how much revenue increase can be expected for a given increase
in advertising spend.
5.2 Multiple Regression

When there’s more than one independent variable, we turn to multiple regression.
It’s like drawing a multi-dimensional line of best fit, accounting for various factors
simultaneously.
Example: Consider a portfolio manager trying to predict a stock’s future price. They
might consider multiple factors like the company’s earnings, the broader market’s per-
formance, interest rates, and economic indicators. Multiple regression would allow the
manager to account for all these variables in a single model, providing a comprehensive
prediction.
5.3 Coefficient of Determination (R2 )

The R2 value, or the coefficient of determination, measures how well the regression
line fits the data. A value of 1 means a perfect fit, while a value closer to 0 indicates
that the model doesn’t explain much of the variability in the data.
Example: If an analyst builds a regression model to predict future stock prices based
on several economic indicators and gets an R2 value of 0.85, it indicates that 85% of the
stock price movement can be explained by the chosen indicators, while the remaining
15% might be due to other factors not included in the model.
5.4 Applications in Finance

Regression analysis is widely used in finance for tasks such as portfolio diversification,
risk assessment, and predicting future values of assets. It provides a structured way to
understand relationships between variables and can be a powerful tool when combined
with domain expertise.
Example: Credit analysts might use regression analysis to determine which factors
most influence a person’s credit score. By understanding these factors (like income,
debt level, and payment history), banks can make informed lending decisions and set
appropriate interest rates.
In summary, regression analysis offers a way to navigate the complex world of finan-
cial data, highlighting relationships and guiding decision-making processes. As markets
and economies are influenced by countless factors, having a tool to systematically ana-
lyze and predict trends is invaluable.
7
6 Time Series Analysis
Time series analysis is like watching a movie of data points. Instead of seeing a snapshot
or a single frame, you’re observing how data changes over time. In finance, time series
analysis helps in tracking and forecasting the movement of stocks, commodities, and
other financial metrics over periods ranging from days to years.
6.1 Components of Time Series

A time series can often be broken down into four main components:
1. Trend: The overall direction in which the data is moving over the long term.
2. Seasonality: Regular fluctuations in the data that occur at consistent intervals,

like daily, monthly, or yearly.
3. Cycles: Long-term wave-like patterns that aren’t as regular as seasonality.
4. Irregular or Noise: The random fluctuations that can’t be attributed to the above
components.
Example: When analyzing monthly sales of a seasonal product, like beachwear,

you might notice a trend of increasing sales year-over-year, a seasonal spike during
summer months, cycles related to broader economic conditions, and irregularities due
to unforeseen events or anomalies.
6.2 Forecasting
A major application of time series analysis in finance is forecasting future values based
on past and present data. Predicting stock prices, interest rates, and economic indica-
tors are all areas where time series forecasting plays a pivotal role.
Example: An investment firm might use time series models to predict the future
performance of a stock based on its historical prices. This prediction can guide decisions
about buying, selling, or holding the stock.
6.3 Autocorrelation and Partial Autocorrelation

Autocorrelation measures the relationship between a time series and a lagged version
of itself. Partial autocorrelation, on the other hand, measures the relationship between
a time series and its lags while excluding the influence of other lags.
Example: In analyzing a stock’s daily returns, if we find a significant autocorrelation
at a lag of 7 days, it might suggest a weekly pattern in the stock’s performance.
7 Other Advanced Statistical Techniques

As financial markets evolve and data becomes more intricate, advanced statistical tech-
niques are constantly being developed and refined to glean insights and make predic-
tions.
8
7.1 Machine Learning in Finance
Machine learning, a subset of artificial intelligence, involves training algorithms on vast
amounts of data to make predictions or decisions without being explicitly programmed.
Example: Hedge funds might employ machine learning models to predict stock
movements based on a vast array of variables, from company fundamentals to global
economic indicators.
7.2 Volatility Modeling

Volatility, a measure of price variability, is a critical component in finance, especially in
options pricing and risk management. Models like GARCH (Generalized Autoregressive
Conditional Heteroskedasticity) are often used to predict future volatility based on past
data.
Example: An options trader might use volatility models to price options contracts
more accurately, ensuring they’re neither overcharging nor undercharging their clients.
7.3 Risk Management and Value at Risk (VaR)

Quantitative techniques are often employed to assess and manage financial risks. Value
at Risk (VaR) is a popular measure that quantifies the maximum potential loss an in-
vestment portfolio could face over a specified period for a given confidence interval.
Example: A portfolio manager might calculate that the 1-day 95% VaR of their
portfolio is $1 million, meaning there’s a 5% chance the portfolio could lose more than
$1 million over the next day.
8 Time Series Models

In the realm of finance, understanding past data and predicting future movements can
be of immense value. Time series models are specially designed to work with data
points ordered in time, making them indispensable tools for financial forecasting.
8.1 Autoregressive Model (AR)

The Autoregressive model posits that the value of a series at a given time depends
linearly on its previous values.
Example: A stock analyst might use an AR model to predict stock prices, assuming
that today’s price is a function of its prices over the past few days.
8.2 Moving Average Model (MA)

The MA model represents a series as a combination of a white noise series and its past
values.
Example: When analyzing daily returns of a stock, short-term fluctuations can be
better understood using an MA model, capturing the random shocks in the system.
9
8.3 Autoregressive Moving Average Model (ARMA)
ARMA combines both AR and MA components. It’s suitable for univariate time series
data that exhibits both trend and seasonality.
Example: The daily trading volume of a stock might be influenced by its past vol-
umes (AR part) and also by recent random events like news or macroeconomic factors
(MA part).
8.4 Autoregressive Integrated Moving Average (ARIMA)

ARIMA is an extension of ARMA, adding an integration component to make the data
stationary (constant mean and variance over time).
Example: For a stock whose prices have been consistently rising over time, the
analyst might use ARIMA to first ”detrend” the data before making forecasts.
8.5 Seasonal ARIMA (SARIMA)

SARIMA extends ARIMA by adding a seasonal differentiation step, making it apt for
data with regular seasonal patterns.
Example: A retailer might use SARIMA to forecast monthly sales data, taking into
account both long-term trends and yearly seasonal patterns like holiday sales spikes.
8.6 Seasonal ARIMA with Exogenous Variables (SARIMAX)

SARIMAX extends SARIMA by incorporating external or exogenous variables which
might influence the time series but are not a part of the series itself.
Example: When forecasting a company’s quarterly revenues, an analyst might in-
corporate external factors like GDP growth or industry-specific indicators using SARI-
MAX.
8.7 Generalized Autoregressive Conditional Heteroskedasticity (GARCH)

GARCH models are used to estimate and predict volatility. They assume that the volatil-
ity of a time series is dependent on its past values.
Example: An options trader might use GARCH to understand and forecast the
volatility of underlying assets, which is crucial for pricing options.
8.8 Variants of GARCH

8.8.1 EGARCH (Exponential GARCH)
EGARCH stands for Exponential GARCH. Unlike the basic GARCH model, EGARCH can
capture the asymmetric effects of positive and negative shocks (or returns) on volatility.
In simple terms, it allows for the possibility that bad news might have a different impact
on volatility compared to good news.
Example: Imagine a company that has been consistently performing well. Sud-
denly, there’s a negative rumor about its financial health. The stock price might plum-
met, and the volatility (or the magnitude of price fluctuations) might increase dramat-
ically. EGARCH can capture this kind of asymmetric response in volatility.
10
8.8.2 IGARCH (Integrated GARCH)
The ”I” in IGARCH stands for ”integrated”. In this model, the sum of the ARCH and
GARCH coefficients is constrained to be exactly one. This implies that shocks to the
volatility persist indefinitely into the future. In simpler terms, once there’s a shock, its
effects never completely fade away.
Example: Consider a financial market that has experienced a major crash. Even
after many years, the memories of that crash might still affect the market’s volatility.
IGARCH captures this never-ending impact of past shocks.
8.8.3 TGARCH (Threshold GARCH)

TGARCH, or Threshold GARCH, is another model that captures the asymmetric impact
of shocks on volatility. It introduces a ”threshold” component, allowing it to differenti-
ate between positive and negative shocks in a slightly different manner than EGARCH.
Example: Think of a well-established company’s stock. Minor good news might
not change its volatility much. However, even a small piece of bad news might result
in significant stock price fluctuations. TGARCH can model such scenarios where the
stock’s reaction to negative news is more pronounced than its reaction to positive news.
8.8.4 PGARCH (Power GARCH)

PGARCH introduces a power term to the GARCH model, allowing for more flexibility
in modeling volatility. This ”power” can be any positive number, and it doesn’t have
to be an integer. By adjusting this power, the model can capture different patterns in
volatility.
Example: Suppose analysts are studying a cryptocurrency whose price fluctuations
don’t fit neatly into standard models. PGARCH, with its added flexibility, might be a
better fit for capturing the unique volatility patterns of this cryptocurrency.
8.8.5 VGARCH (Variance GARCH)

VGARCH models the variance directly, as opposed to the standard deviation, which
is typically modeled in other GARCH variants. This direct modeling of variance can
sometimes provide a more accurate representation of volatility.
Example: In analyzing an emerging market with high volatility, modeling the vari-
ance directly using VGARCH might offer insights that are more consistent with the
observed data, compared to other models.
8.8.6 DCC GARCH (Dynamic Conditional Correlation GARCH)

DCC GARCH, or Dynamic Conditional Correlation GARCH, is an extension of the tradi-
tional GARCH model designed to analyze correlations between multiple time series. In-
stead of focusing on the volatility of a single series, DCC GARCH captures the dynamic
relationships between different financial assets. This is especially useful in portfolio
management and risk assessment where understanding the co-movements of assets is
crucial.
11
In simpler terms, while basic GARCH models tell us about the volatility of a single
asset, DCC GARCH reveals how different assets move together over time. Do they tend
to rise and fall in tandem? Or does one go up when the other goes down?
Example: Consider an investment portfolio containing stocks, bonds, and com-
modities. The DCC GARCH model can help in understanding how these assets interact
with each other. For instance, when stock prices drop, do bond prices usually rise (indi-
cating a negative correlation), or do they also fall (positive correlation)? Understand-
ing these dynamic correlations can guide investment strategies, especially in terms of
diversification and hedging.
In financial markets, where large changes (jumps or crashes) can occur, models like
EGARCH, which can capture the asymmetric response of volatility to shocks, might be
preferred.
9 Tests in Statistics
Statistical tests provide a structured way to make decisions based on data. Whether
you’re trying to determine if a new investment strategy outperforms an old one, or if
a certain stock’s returns are normally distributed, statistical tests offer a way to draw
conclusions with a known level of confidence.
9.1 t-test
The t-test is used to determine if there’s a significant difference between the means of
two groups. It can be a two-sample t-test (for comparing two groups) or a paired t-test
(for comparing two measurements taken on the same group).
Example: Imagine two investment strategies, A and B. To determine which strategy
yields higher returns, a financial analyst could use the two-sample t-test. If Strategy A
is applied to a portfolio in one year and Strategy B in the next, a paired t-test might be
more appropriate to account for market variations.
9.2 ANOVA (Analysis of Variance)

ANOVA tests the hypothesis that the means among two or more groups are equal. It’s
an extension of the t-test to more than two groups.
Example: If a fund manager wants to compare the performance of three different
investment strategies over a year, ANOVA can help determine if there’s a significant
difference in the returns generated by these strategies.
9.3 Chi-Squared Test

This test is used to determine if there’s a significant association between two categorical
variables. It’s commonly used for testing relationships on categorical data.
Example: To understand if there’s a relationship between investment risk categories
(high, medium, low) and investment outcomes (profit, loss), a financial analyst might
use the Chi-Squared test.
12
9.4 Shapiro-Wilk Test
This test checks whether a variable follows a normal distribution, which is a common
assumption in many financial models.
Example: Before applying certain stock pricing models, an analyst might use the
Shapiro-Wilk test to check if the stock returns are normally distributed.
9.5 Durbin-Watson Test

Used primarily in regression analysis, the Durbin-Watson test checks for autocorrelation
(a relationship between values separated from each other by a given time lag) in the
residuals of a regression model.
Example: After building a regression model to forecast stock prices, a financial
analyst might use the Durbin-Watson test to ensure that the model’s predictions are not
unduly influenced by past errors.
9.6 Mann-Whitney U Test

This non-parametric test is used to compare two independent samples when the data
doesn’t meet the criteria for a t-test (e.g., data is not normally distributed).
Example: If an analyst has data on the returns of two non-normally distributed
investment portfolios, the Mann-Whitney U test can determine if one portfolio typically
outperforms the other.
9.7 Variance Inflation Factor (VIF)

VIF is a measure used to detect multicollinearity in regression models. Multicollinear-
ity occurs when two or more independent variables in a regression model are highly
correlated. A high VIF indicates that an independent variable is highly linearly related
to the others.
Example: In a model predicting stock returns based on multiple financial indi-
cators, a high VIF for two indicators might suggest that they’re providing redundant
information, possibly leading to unreliable or unstable estimates.
9.8 Breusch-Pagan Test

This test is used to detect heteroskedasticity in the residuals of a regression model.
Heteroskedasticity means that the variance of the errors varies across observations,
violating one of the classical linear regression assumptions.
Example: When modeling the returns of a volatile stock, the Breusch-Pagan test
can help ensure that the model’s predictions are consistent across different levels of
returns.
9.9 Ljung-Box Test

Used in time series analysis, the Ljung-Box test checks for autocorrelation in the residu-
als of a model. Autocorrelation occurs when values in a time series are correlated with
past values.
13
Example: After modeling a stock’s daily returns, an analyst might use the Ljung-
Box test to check that the model captures all significant temporal patterns, ensuring no
autocorrelation remains in the residuals.
9.10 Augmented Dickey-Fuller (ADF) Test

The ADF test checks a time series for stationarity. A stationary series has properties
(like mean and variance) that don’t change over time, which is an essential assumption
for many time series models.
Example: Before applying ARIMA modeling to forecast bond yields, an economist
might use the ADF test to ensure the time series is stationary or to determine the nec-
essary differencing order to achieve stationarity.
9.11 Engle’s ARCH Test

This test, developed by Robert Engle, detects autoregressive conditional heteroskedas-
ticity (ARCH) effects in the residuals of a time series model. It’s a precursor to the
GARCH model.
Example: When analyzing the volatility of a currency exchange rate, an analyst
might use Engle’s ARCH test to determine if a GARCH model, which accounts for chang-
ing volatility over time, is appropriate.
9.12 Bollerslev’s Multivariate GARCH

While the standard GARCH model is univariate, Bollerslev introduced a multivariate
version that can model the volatility of multiple time series simultaneously, capturing
the interdependencies between them.
Example: A portfolio manager interested in the co-movements of several stocks
might use a multivariate GARCH model to understand how volatilities and correlations
between stocks change over time, aiding in portfolio diversification and risk manage-
ment.
Thank You
14

Statistics For Quants 1696336178

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Statistics For Quants 1696336178

Uploaded by

Copyright:

Available Formats

Statistics for Quants: A Primer

Amit Kumar Jha, UBS

6 Time Series Analysis 7

7 Other Advanced Statistical Techniques 7

2.4 Variance and Standard Deviation

3.1 Normal Distribution

4.1 Hypothesis Testing

4.1.1 Type I Error (False Positive)

4.1.2 Type II Error (False Negative)

4.2 Confidence Intervals

4.3 Sample Size and Power

5.2 Multiple Regression

5.3 Coefficient of Determination (R2 )

5.4 Applications in Finance

6.1 Components of Time Series

2. Seasonality: Regular fluctuations in the data that occur at consistent intervals,

3. Cycles: Long-term wave-like patterns that aren’t as regular as seasonality.

Example: When analyzing monthly sales of a seasonal product, like beachwear,

6.3 Autocorrelation and Partial Autocorrelation

7 Other Advanced Statistical Techniques

7.2 Volatility Modeling

7.3 Risk Management and Value at Risk (VaR)

8 Time Series Models

8.1 Autoregressive Model (AR)

8.2 Moving Average Model (MA)

8.4 Autoregressive Integrated Moving Average (ARIMA)

8.5 Seasonal ARIMA (SARIMA)

8.6 Seasonal ARIMA with Exogenous Variables (SARIMAX)

8.7 Generalized Autoregressive Conditional Heteroskedasticity (GARCH)

8.8 Variants of GARCH

8.8.3 TGARCH (Threshold GARCH)

8.8.4 PGARCH (Power GARCH)

8.8.5 VGARCH (Variance GARCH)

8.8.6 DCC GARCH (Dynamic Conditional Correlation GARCH)

9.2 ANOVA (Analysis of Variance)

9.3 Chi-Squared Test

9.5 Durbin-Watson Test

9.6 Mann-Whitney U Test

9.7 Variance Inflation Factor (VIF)

9.8 Breusch-Pagan Test

9.9 Ljung-Box Test

9.10 Augmented Dickey-Fuller (ADF) Test

9.11 Engle’s ARCH Test

9.12 Bollerslev’s Multivariate GARCH

You might also like