
Market Correlation and Market Volatility in US Blue Chip Stocks

Craig Mounfield (craig.mounfield@volterra.co.uk)


and
Paul Ormerod (pormerod@volterra.co.uk)

Volterra Consulting Ltd

The Old Power Station


121 Mortlake High Street
London SW14 8SN

Crowell Prize Submission


20th March 2001
Abstract

We analyse the daily rates of return of US blue chip stocks over the 1993-2001 period.
Using the technique of random matrix theory, we show that the correlation matrix of
these rates of return is to a large extent dominated by noise rather than by true
information. These results confirm for this data set findings recently documented in the
econophysics literature.

However, the eigenvector associated with the principal eigenvalue of the correlation
matrix does contain true information and shows stability over time. This, the market
eigenvector, shows the extent to which the individual stocks tend to move together. We
quantify the fraction of total information contained within this eigenmode, which we
define as the information index.

We find a clear positive relationship between the absolute changes in the variability of
the information index and the absolute changes in the variability of the market index.
Further, the absolute change in the variability of the information index lagged one day
has statistically significant predictive power for the absolute change in the variability of
the market index.
1. Introduction
A precise quantification of the correlations between the returns of different assets
traded in financial markets is of fundamental importance to risk management,
where one attempts to diversify the portfolio as widely as possible
(reducing exposure to sector- or industry-specific shocks). As a consequence, the
correlation matrix is one of the cornerstones of much of modern financial
engineering, such as the CAPM (Elton et al, 1995) and Value at Risk.

However, it is well understood that empirical measurements of the correlations
between assets are subject to a number of significant sources of potential error.
The difficulties associated with determining the true correlations between
financial assets arise primarily from:

• Non-stationary correlations between assets (due, for example, to an
organisation's profile changing over time)

• A finite number of observations of asset price movements (spurious
correlations become statistically insignificant only in the limit of an
infinite number of observations of asset pair price movements). In this case
the empirically measured correlations may be significantly noise dominated,
masking the true correlations between asset returns.

The technique of Random Matrix Theory (RMT) has recently been applied to
financial market data to assess how much genuine information is contained
within empirical correlation matrices formed from equity returns. RMT was
originally developed for the study of complex quantum mechanical systems.

In order to assess the degree to which an empirical correlation matrix is noise
dominated, we can compare the spectral properties of the empirical matrix
with the theoretical spectral properties of a random matrix. This analysis
identifies those eigenstates of the empirical correlation matrix which
contain genuine information. The remaining eigenstates are noise
dominated and hence unstable over time. This technique has recently been applied
by a number of researchers to financial market data (for example, Mantegna et al
1999, Laloux et al 1999, Plerou et al 1999, Gopikrishnan et al 2000, Plerou 2000,
Bouchaud et al 2000, Drozdz et al 2001) as well as to macroeconomic data
(Ormerod et al, 2000).

We apply this technique to the daily returns on leading US blue chip stocks
over the 1993-2001 period. The results are consistent with those of
the recent literature: empirical financial correlation matrices are in general
dominated by noise, but there do exist some significant, stable deviations of
empirical financial correlation matrices from the universal predictions of RMT.

The main purpose of this paper is two-fold. First, to show that this technique may
also yield an information index which characterises the degree to which the
movements of assets in a portfolio are correlated. Second, to show that the temporal
evolution of this index is well correlated with the volatility of the overall market
index.

The structure of the paper is as follows. Section 2 outlines the relevant concepts of
RMT. Section 3 then applies these concepts to the analysis of a portfolio of US
blue chip equities. Finally the main results are summarised in section 4.

2. Random Matrix Theory


The problem of understanding the properties of matrices with stochastically
fluctuating entries has been studied intensively since the 1950s in
the context of nuclear physics. There the problem was to understand the
empirically observed energy spectra of complex quantum mechanical systems
(specifically heavy nuclei composed of many interacting constituents).
In order to characterise these properties it was assumed that the numerous many-body
interactions are so complex that in the aggregate they may be
considered to be random. That is, the elements of the Hamiltonian matrix $H_{ij}$
may be considered to be mutually independent random variables. Under this
assumption it was possible to derive the statistics of the eigenvalue distribution of
the Hamiltonian, which were in remarkable agreement with experimental data (a
contemporary exposition of RMT may be found in Mehta, 1991).

It was also demonstrated that RMT predictions represent an average over all
possible interactions. Hence RMT predictions are universal predictions that will
apply to wide classes of systems. Deviations from the universal predictions of
RMT identify system-specific, non-random properties of the system under
consideration. These deviations provide clues about the underlying interactions
within the system (Mehta 1991).

In order to assess the degree to which an empirical correlation matrix is noise
dominated one may compare the eigenspectra properties of the empirical matrix
with the theoretical eigenspectra properties of a random matrix. Undertaking this
analysis will identify those eigenstates of the empirical matrix which contain
genuine information content. The remaining eigenstates are understood to be
noise dominated and hence potentially unstable over time. The eigenstates that
contain genuine information content are specific to the system under
consideration and are indicative of the presence of collective modes of motion.

2.1 Eigenspectra Properties of Random Matrices


Consider a matrix $M$ of $T$ observations of the price changes of $N$ assets (sampled
at, e.g., daily intervals). If the inter-period logarithmic returns are defined as

$$M_i(t) = \ln P_i(t) - \ln P_i(t-1),$$

and each of the $N$ return series is normalised to zero mean and unit variance, then
the $N \times N$ correlation matrix measuring the correlations between the $N$ assets is
given by

$$C = \frac{1}{T} M M^{T}.$$
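The sketch below shows the construction above with NumPy. It assumes a hypothetical array `prices` of shape (T+1, N), with one column of closing prices per asset. The normalisation step is what makes $M M^T / T$ a correlation matrix rather than a covariance matrix. The random-walk prices are synthetic and are not the paper's data.

```python
import numpy as np

def correlation_matrix(prices: np.ndarray) -> np.ndarray:
    """Illustrative helper (not from the paper): C = M M^T / T from (T+1, N) prices."""
    # Log returns M_i(t) = ln P_i(t) - ln P_i(t-1), arranged as an (N, T) matrix
    m = np.diff(np.log(prices), axis=0).T
    # Normalise each asset's return series to zero mean and unit variance
    m = (m - m.mean(axis=1, keepdims=True)) / m.std(axis=1, keepdims=True)
    t = m.shape[1]
    return m @ m.T / t

# Synthetic random-walk prices for 31 assets over 2068 days (not real data)
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal((2068, 31)), axis=0))
c = correlation_matrix(prices)
print(c.shape)                       # (31, 31)
print(np.allclose(np.diag(c), 1.0))  # True
```

The result agrees with NumPy's own `np.corrcoef` applied to the return columns, which is a convenient cross-check.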

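If the $T$ observations are i.i.d. random variables then, in the limit $N \to \infty$,
$T \to \infty$ with $Q = T/N \geq 1$ held fixed, the density of eigenvalues $\lambda$ of the
random correlation matrix $C$ is given by (Sengupta et al 1999)

$$\rho_C(\lambda) = \frac{Q}{2\pi\sigma^2}\,\frac{\sqrt{(\lambda_{max} - \lambda)(\lambda - \lambda_{min})}}{\lambda}$$

for $\lambda \in [\lambda_{min}, \lambda_{max}]$.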

The upper and lower bounds on the theoretical eigenvalue distribution are given
by

$$\lambda_{max} = \sigma^2\left(1 + \frac{1}{\sqrt{Q}}\right)^2, \qquad \lambda_{min} = \sigma^2\left(1 - \frac{1}{\sqrt{Q}}\right)^2$$

($\sigma^2$ is the variance of the elements of $M$, usually rescaled to unity). This
distribution is plotted below in figure 1 for $Q = 3.22$. As can be seen from this
figure, the density is non-zero only within a well-defined range of eigenvalues,
$\lambda_{min} < \lambda < \lambda_{max}$. This range corresponds to a random, noisy subspace
band where the postulates of RMT hold. That is to say, the eigenvectors corresponding to
eigenvalues within $\lambda_{min} < \lambda < \lambda_{max}$ contain no genuine information.
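The bounds and density above are sketched below with NumPy; the helper names are illustrative. For the dimensions used in section 3.2 ($N = 31$, $T = 2067$) the bounds come out at 0.77 and 1.26. A simple numerical sum also confirms that the density for $Q = 3.22$ (the value used in figure 1) integrates to one.

```python
import numpy as np

def mp_bounds(q: float, sigma2: float = 1.0) -> tuple[float, float]:
    """Illustrative helper: (lambda_min, lambda_max) for Q = T/N >= 1."""
    r = 1.0 / np.sqrt(q)
    return sigma2 * (1.0 - r) ** 2, sigma2 * (1.0 + r) ** 2

def mp_density(lam: np.ndarray, q: float, sigma2: float = 1.0) -> np.ndarray:
    """Illustrative helper: theoretical density rho_C(lambda), zero outside the bounds."""
    lam = np.asarray(lam, dtype=float)
    lo, hi = mp_bounds(q, sigma2)
    rho = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    x = lam[inside]
    rho[inside] = q / (2.0 * np.pi * sigma2) * np.sqrt((hi - x) * (x - lo)) / x
    return rho

lo, hi = mp_bounds(2067 / 31)          # dimensions used in section 3.2
print(round(lo, 2), round(hi, 2))      # 0.77 1.26

# Riemann-sum check that the density for Q = 3.22 integrates to one
grid = np.linspace(*mp_bounds(3.22), 200_001)
print(mp_density(grid, 3.22).sum() * (grid[1] - grid[0]))
```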

The eigenvalue distribution of correlation matrices formed from actual data
can be compared to this ‘null-hypothesis’ distribution. In principle, if the
distribution of eigenvalues of an empirically formed matrix differs from the above
distribution, then that matrix does not have completely random elements; in other
words, there is structure present in the correlation matrix. Each isolated
eigenstate outside the RMT bounds represents a correlated group whose size
and participants are obtained from the eigenvalue and eigenvector respectively.

When the dimensions of the random matrix under consideration are finite (but still
‘large’), the spectral distribution is broadened. In such cases Monte Carlo
simulation can be used to estimate the expected broadened eigenvalue distribution.
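The Monte Carlo step can be sketched as follows. The assumptions are Gaussian i.i.d. noise, NumPy, and an illustrative helper name and trial count. The sketch pools the eigenvalues of many random correlation matrices with the same dimensions as the data, so their spread can be compared with the asymptotic bounds.

```python
import numpy as np

def simulate_random_spectrum(n: int, t: int, n_trials: int = 200, seed: int = 0) -> np.ndarray:
    """Illustrative helper: pooled eigenvalues of correlation matrices of pure noise."""
    rng = np.random.default_rng(seed)
    eigs = []
    for _ in range(n_trials):
        x = rng.standard_normal((t, n))           # T observations of N i.i.d. series
        c = np.corrcoef(x, rowvar=False)          # N x N sample correlation matrix
        eigs.append(np.linalg.eigvalsh(c))
    return np.concatenate(eigs)

n, t = 31, 2067
eigs = simulate_random_spectrum(n, t)
q = t / n
lo, hi = (1 - 1 / np.sqrt(q)) ** 2, (1 + 1 / np.sqrt(q)) ** 2
# Fraction of simulated eigenvalues falling outside the asymptotic bounds
outside = np.mean((eigs < lo) | (eigs > hi))
print(f"simulated {eigs.min():.3f} to {eigs.max():.3f}; bounds {lo:.3f} to {hi:.3f}; outside {outside:.1%}")
```

A histogram of `eigs` then gives the finite-size ("broadened") null distribution against which empirical eigenvalues can be judged.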
[Figure 1 plot: density against eigenvalue]

Figure 1 : Theoretical Density of Eigenvalues for a Random Matrix


2.2 The Inverse Participation Ratio
To analyse the structure of the eigenvectors of the empirical correlation matrix, the
inverse participation ratio (IPR) may be calculated. The IPR is commonly
used in localisation theory to quantify the contribution of the different
components of an eigenvector to the magnitude of that eigenvector (thus
determining whether an eigenstate is localised or extended) (Plerou et al 1999).

Component $i$ of an eigenvector $v^\alpha$ corresponds to the contribution of time series
$i$ to that eigenvector. That is to say, in this context, it corresponds to the
contribution of asset $i$ to eigenvector $\alpha$. In order to quantify this we define the
IPR for eigenvector $\alpha$ to be

$$I^\alpha = \sum_{i=1}^{N} \left(v_i^\alpha\right)^4$$

Hence an eigenvector with identical components $v_i^\alpha = 1/\sqrt{N}$ will have
$I^\alpha = 1/N$, and an eigenvector with one non-zero component will have $I^\alpha = 1$.
The inverse participation ratio is therefore approximately the reciprocal of the number of
eigenvector components significantly different from zero (i.e. the number of
assets contributing to that eigenvector).
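As a small NumPy sketch (the helper name is illustrative), the IPR can be computed directly from a unit-normalised eigenvector. It reproduces the two limiting cases above: a uniform vector gives $1/N$ and a single-asset vector gives 1.

```python
import numpy as np

def inverse_participation_ratio(v: np.ndarray) -> float:
    """Illustrative helper: I = sum_i v_i^4 for a unit-normalised eigenvector."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)     # eigenvectors from eigh already have unit norm
    return float(np.sum(v ** 4))

n = 31
uniform = np.full(n, 1.0 / np.sqrt(n))   # every asset contributes equally
localised = np.zeros(n)
localised[0] = 1.0                       # a single asset carries the eigenvector
print(inverse_participation_ratio(uniform))    # approximately 1/31 = 0.0323
print(inverse_participation_ratio(localised))  # 1.0
```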

2.3 Temporal Stability of the Eigenvector Structure


For those eigenvectors whose eigenvalues lie outside the theoretically predicted bounds of
RMT, it is important to quantify the stability of the information content
of the eigenmode (i.e. the stability of the correlations between the assets). This is
necessary since spurious correlations may be introduced by the particular choice of
data window used to calculate the correlation matrix. We may assess this stability by
calculating the scalar products of eigenvectors from non-overlapping analysis periods.
That is, for two analysis periods $T_A$ and $T_B$ we form the overlap matrix
$$O(T_A, T_B) = \begin{pmatrix}
v^N(T_A)\cdot v^N(T_B) & \cdots & v^N(T_A)\cdot v^1(T_B) \\
\vdots & \ddots & \vdots \\
v^1(T_A)\cdot v^N(T_B) & \cdots & v^1(T_A)\cdot v^1(T_B)
\end{pmatrix}$$

Hence if the eigenvector structure remains perfectly stable in time (i.e. the
correlations between the assets contributing to each eigenvector remain stable from
period to period) then the elements of the overlap matrix would be
$O_{ij}(T_A, T_B) = \delta_{ij}$. No inter-period stability would imply that $O_{ij}(T_A, T_B) = 0$.
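A sketch of the overlap matrix in NumPy is given below; the helper name and the synthetic one-factor returns are illustrative. Two assumptions are labelled. First, absolute values of the dot products are taken, because an eigenvector is defined only up to its sign. Second, `np.linalg.eigh` orders eigenvalues in ascending order, which matches the layout above: the market eigenvector ($v^1$, largest eigenvalue) sits in the bottom-right entry.

```python
import numpy as np

def overlap_matrix(returns_a: np.ndarray, returns_b: np.ndarray) -> np.ndarray:
    """Illustrative helper: |v^i(T_A) . v^j(T_B)| for two (T, N) blocks of returns."""
    _, va = np.linalg.eigh(np.corrcoef(returns_a, rowvar=False))
    _, vb = np.linalg.eigh(np.corrcoef(returns_b, rowvar=False))
    # Columns are eigenvectors in ascending eigenvalue order, so the last column is the
    # market eigenvector. Absolute value because eigenvector signs are arbitrary.
    return np.abs(va.T @ vb)

# Synthetic one-factor returns: a common "market" shock plus idiosyncratic noise
rng = np.random.default_rng(1)
n, t = 31, 2067
market = rng.standard_normal((t, 1))
returns = 0.5 * market + rng.standard_normal((t, n))
half = t // 2
o = overlap_matrix(returns[:half], returns[half:])
print(f"market-eigenvector overlap: {o[-1, -1]:.3f}")
```

In this synthetic example only the bottom-right entry is expected to be close to one. The noise eigenvectors decorrelate between the two halves.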

3. RMT Applied to Empirical Correlation Matrices


Having described the basic analysis tools of RMT, we now apply them to
financial correlation matrices.

3.1 Data Analysed


The data set comprises daily closing prices for 31 US equities (blue chips, mostly
Dow Jones Industrial Average constituents) for the period 4th January 1993 to 13th
March 2001. There are 2068 separate trading days (excluding holidays).

As a control, this data set is also analysed after each of the time series of the
individual assets has been shuffled at random 10000 times. This destroys the
correlations between the assets (as well as any temporal correlations within each
series), while preserving the statistical properties of each asset's return
distribution (e.g. mean and variance). This randomly shuffled portfolio acts as a
control to demonstrate that there is a quantitative difference between the
eigenspectra of random and empirical correlation matrices.
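The shuffle control can be sketched as below; the helper name and synthetic returns are illustrative. Each asset's column is permuted independently, which leaves every marginal distribution intact but breaks the alignment between assets. A single uniformly random permutation per asset already gives a uniformly random ordering, so repeating the shuffle many times does not change its distribution.

```python
import numpy as np

def shuffle_each_asset(returns: np.ndarray, seed: int = 0) -> np.ndarray:
    """Illustrative helper: independently permute each column (asset) of (T, N) returns."""
    rng = np.random.default_rng(seed)
    shuffled = returns.copy()
    for j in range(shuffled.shape[1]):
        shuffled[:, j] = rng.permutation(shuffled[:, j])
    return shuffled

# Synthetic one-factor returns (not the paper's data)
rng = np.random.default_rng(2)
n, t = 31, 2067
returns = 0.5 * rng.standard_normal((t, 1)) + rng.standard_normal((t, n))
shuffled = shuffle_each_asset(returns)
lam_orig = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))
lam_shuf = np.linalg.eigvalsh(np.corrcoef(shuffled, rowvar=False))
print(f"largest eigenvalue: original {lam_orig[-1]:.2f}, shuffled {lam_shuf[-1]:.2f}")
```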

3.2 Analysis of the Eigenspectra Properties


To demonstrate that RMT can distinguish genuine information from noise in an
empirical correlation matrix, we calculate the spectral properties of the two
portfolios described above (the original and the shuffled data). That is, we form
the correlation matrix from the inter-day returns of the assets (there are thus 2067
observations of daily price changes for each of the 31 assets). For a matrix of these
dimensions ($Q = 2067/31 \approx 66.7$) the theoretical upper and lower bounds for the
eigenvalue distribution are 1.26 and 0.77 respectively.

For the correlation matrix (of dimension 31 x 31) formed from the shuffled data
we find that all 31 eigenvalues of the correlation matrix fall within the upper and
lower bounds. For the correlation matrix formed from the original data set we
observe that there are 17 eigenvalues below the lower bound, 4 eigenvalues above
the upper bound and therefore 10 eigenvalues which fall between the upper and
lower bounds.
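The counting step above can be sketched as follows, assuming NumPy and a (T, N) array of returns; the helper name is illustrative. It classifies each eigenvalue against the asymptotic bounds for $Q = T/N$ with $\sigma^2 = 1$.

```python
import numpy as np

def classify_eigenvalues(returns: np.ndarray) -> dict[str, int]:
    """Illustrative helper: count eigenvalues below, within and above the RMT bounds."""
    t, n = returns.shape
    q = t / n
    lo, hi = (1 - 1 / np.sqrt(q)) ** 2, (1 + 1 / np.sqrt(q)) ** 2
    lam = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))
    return {
        "below": int(np.sum(lam < lo)),
        "within": int(np.sum((lam >= lo) & (lam <= hi))),
        "above": int(np.sum(lam > hi)),
    }

# Synthetic data: a one-factor model versus pure noise (illustrative only)
rng = np.random.default_rng(2)
n, t = 31, 2067
factor_returns = 0.5 * rng.standard_normal((t, 1)) + rng.standard_normal((t, n))
noise_returns = rng.standard_normal((t, n))
print(classify_eigenvalues(factor_returns))
print(classify_eigenvalues(noise_returns))
```

The eigenvalues always sum to $N$. In synthetic one-factor data, a large market eigenvalue therefore leaves less for the others, and this tends to push several of them below $\lambda_{min}$.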

We would of course expect the shuffled data to contain no information, since
shuffling destroys the correlations in the data. However, the observation of a
significant number of eigenvalues outside the RMT bounds for the original,
unshuffled portfolio demonstrates that there do indeed exist genuine, non-random,
correlated movements between groups of assets within the portfolio.

We may also examine the stability of these correlations over time. First we
choose two non-overlapping time periods of approximately 4 years each
and calculate the spectral properties of the correlation matrices formed from
these two analysis periods. For matrices of these dimensions the theoretical upper
and lower bounds on the eigenvalues are 1.38 and 0.68 respectively. For the two
analysis periods the numbers of eigenvalues below the theoretical minimum are 12
and 15 respectively, while the number above the theoretical maximum is 4 in both
periods. This indicates that the large-scale macrostructure of the portfolio remains
unchanged over the course of the 8 year total analysis period.
We can also calculate the overlap matrix between the two periods. This is shown
in figure 2. For these two analysis periods the overlap between the eigenvectors
corresponding to the largest eigenvalue is 0.99. In addition, if we repeat the
analysis with 10 non-overlapping periods (each of 200 trading days in duration),
we observe an average overlap of 0.95 between the eigenvectors corresponding to the
largest eigenvalue. These numbers indicate a high degree of temporal stability of
the eigenvector structure.
Figure 2 : Colour coded plot of the degree of overlap of the eigenvectors
corresponding to 2 non-overlapping analysis periods for the US blue chip
portfolio. A white square corresponds to perfect overlap between the structure of
the 2 eigenvectors (perfect stability of the degree of information content in that
eigenmode) and black corresponds to no degree of overlap whatsoever. As can be
seen, the degree of stability of the market eigenmode (i.e. the dot product of
eigenvector 1 with itself in each of the two periods - bottom right hand corner) is
significantly different from that of any of the other overlaps.
3.3 Analysis of the ‘Market’ Eigenmode
Of those eigenvalues which lie outside the noisy subspace band, the most
important is the largest. Applications of RMT techniques to equities
traded in financial markets have demonstrated that this eigenmode corresponds to
the ‘market’ (e.g. Gopikrishnan et al, 2000).

In particular, for this data set (2067 observations of daily returns for 31 assets),
the maximum eigenvalue of the correlation matrix is 7.05 (the remaining
eigenvalues lie between 0.34 and 2.15). The theoretical maximum eigenvalue is
1.26, so the largest empirically observed eigenvalue is clearly well above this
threshold.

Analysis of the eigenvector corresponding to the largest eigenvalue demonstrates
that each of the 31 components of the eigenvector contributes an approximately
equal amount to the eigenvector. Indeed, the IPR for this eigenvector is 0.037,
compared with the value of 0.032 (1/N) that we would expect if all of
the assets contributed equally. This indicates that this
eigenmode is ‘extended’. Hence the behaviour of this eigenmode is indicative of
large-scale correlated movements of all of the assets within the portfolio.

In order to quantify this overall collective motion of the portfolio’s asset price
dynamics, we may exploit the fact that the trace of the correlation matrix, which
equals the sum of its eigenvalues, is fixed. That is, for the US blue chip portfolio
of 31 assets, the trace of the correlation matrix is equal to 31 (since each of the
31 diagonal elements, the correlation of a series with itself, equals one). The
closer the 'market' eigenvalue (i.e. the maximum eigenvalue) is to this value, the
more information is contained within this mode and the more correlated the
movements of the price changes of the assets within the portfolio are. We may
therefore quantify the fraction of total information contained within this
eigenmode, the information index, expressed as a percentage by

$$Q(t) = 100\,\frac{\lambda_{max}}{N}$$

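If the assets in the portfolio move together very closely, then we would expect
$Q(t) \to 100\%$. Conversely, if the asset price movements are completely
uncorrelated, then $\lambda_{max}$ lies close to the RMT upper bound and $Q(t)$ takes a
small value. Because the eigenvalues sum to $N$, $\lambda_{max} \geq 1$, so $Q(t)$ is bounded
below by $100/N\%$ (about 3.2% for $N = 31$). It approaches 0% only as the number of
assets grows large (corresponding to no collective dynamics).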
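The information index can be sketched in NumPy as follows; the helper name is illustrative. It shows the two limiting cases, perfectly correlated and uncorrelated assets. It also applies the formula to the full-sample maximum eigenvalue of 7.05 reported in section 3.3.

```python
import numpy as np

def information_index(corr: np.ndarray) -> float:
    """Illustrative helper: Q = 100 * lambda_max / N for an N x N correlation matrix."""
    n = corr.shape[0]
    lam_max = np.linalg.eigvalsh(corr)[-1]   # eigvalsh returns ascending eigenvalues
    return float(100.0 * lam_max / n)

n = 31
print(information_index(np.ones((n, n))))  # perfectly correlated assets: 100 (up to rounding)
print(information_index(np.eye(n)))        # uncorrelated assets: 100/31, about 3.23
print(100.0 * 7.05 / n)                    # full-sample value from section 3.3, about 22.7
```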

3.4 Temporal Evolution of the Market Eigenmode


We have seen that the eigenmode of the empirical correlation matrix
corresponding to the maximum eigenvalue represents a collective motion of all of
the assets within the portfolio. The question of interest is how this
eigenmode evolves over time.

The analysis is undertaken with a fixed-length window of data. Within this window, the
spectral properties of the correlation matrix formed from the constituent stocks
of the US blue chip portfolio are calculated; in particular, the maximum
eigenvalue. This window is then advanced by one period
(corresponding, in this data set, to one trading day) and the maximum eigenvalue
is recorded for each period. The same procedure is followed for the Dow Jones
Industrial Average (DJIA) itself. A window of 250 periods, which corresponds to
approximately one year of trading, was chosen for the analysis. As
before, the correlation matrix is formed from the returns on the assets.
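A rolling-window version of the calculation can be sketched as below, assuming NumPy and a (T, N) array of daily returns; the helper name is illustrative. The synthetic returns contain a hypothetical increase in common-factor strength halfway through. This is done only so that the resulting information index visibly changes.

```python
import numpy as np

def rolling_max_eigenvalue(returns: np.ndarray, window: int = 250) -> np.ndarray:
    """Illustrative helper: largest correlation eigenvalue per window, stepping one day.

    returns: (T, N) array of daily returns; the output has length T - window + 1.
    """
    t = returns.shape[0]
    out = np.empty(t - window + 1)
    for start in range(t - window + 1):
        c = np.corrcoef(returns[start:start + window], rowvar=False)
        out[start] = np.linalg.eigvalsh(c)[-1]
    return out

# Synthetic returns whose common factor strengthens halfway through (illustrative only)
rng = np.random.default_rng(3)
n, t = 31, 2067
loading = np.where(np.arange(t) < t // 2, 0.3, 0.8)[:, None]
returns = loading * rng.standard_normal((t, 1)) + rng.standard_normal((t, n))
lam_max = rolling_max_eigenvalue(returns)
q_index = 100.0 * lam_max / n          # information index for each window
print(len(q_index), round(q_index[0], 1), round(q_index[-1], 1))
```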

Figures 3a and 3b plot, respectively, the absolute values of the logarithmic
differences of the DJIA and of the information index, Q (the latter computed with a
window of 250 trading days). The absolute value of the logarithmic difference is a
proxy for the volatility of a time series (Ponzi, 2000). Figure 3a (for the DJIA)
demonstrates that there exist periods of ‘bursts’ of volatility interspersed with
periods of low volatility (so-called volatility clustering, characteristic of the
presence of long-range temporal correlations in the volatility). Inspection of the
charts suggests that the two measures exhibit a significant degree of correlation.
In particular, it is apparent that bursts of extreme volatility in the DJIA are
reflected in similar bursts in the information index.
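The volatility proxy is a one-line calculation; the NumPy sketch below uses made-up index levels and eigenvalues, and the helper name is illustrative. Because $Q(t) = 100\,\lambda_{max}/N$, the constant factor cancels in the log difference. The volatility of the information index is therefore the same as that of the maximum eigenvalue.

```python
import numpy as np

def abs_log_change(series: np.ndarray) -> np.ndarray:
    """Illustrative helper: volatility proxy |ln x(t) - ln x(t-1)| for a positive series."""
    return np.abs(np.diff(np.log(np.asarray(series, dtype=float))))

index_levels = np.array([10000.0, 10100.0, 9999.0, 9999.0])   # made-up index levels
print(abs_log_change(index_levels))

# The factor 100/N cancels, so Q(t) and lambda_max share the same volatility proxy
lam_max = np.array([6.8, 7.05, 7.2])                           # made-up maximum eigenvalues
q_index = 100.0 * lam_max / 31
print(np.allclose(abs_log_change(q_index), abs_log_change(lam_max)))  # True
```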
[Figure 3a plot: volatility of DJIA against date, 29/12/93 to 29/12/00]

Figure 3a : Plot of volatility of the DJIA for the period 4th January 1998 – 13th
March 2001

[Figure 3b plot: volatility of information index against date, 29/12/93 to 29/12/00]

Figure 3b : Plot of volatility of the information index for the period 4th January 1998 –
13th March 2001
A scatter plot of the two variables, set out in Figure 4, does suggest a positive
relationship between them. The simple correlation coefficient, ρ, is 0.462,
which is highly significantly different from zero.
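The usual significance check for a sample correlation is the statistic $t = \rho\sqrt{(n-2)/(1-\rho^2)}$. The sketch below applies it with the paper's full-sample figures ($\rho = 0.462$, 1818 observations) and to synthetic data; the helper names are illustrative. The test assumes independent observations. Overlapping rolling windows and volatility clustering do not strictly satisfy this, so the statistic may overstate significance.

```python
import math
import numpy as np

def t_statistic(r: float, n: int) -> float:
    """Illustrative helper: t = r * sqrt((n - 2) / (1 - r^2)) for a sample correlation r."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def correlation_test(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Illustrative helper: Pearson correlation of x and y and its t-statistic."""
    r = float(np.corrcoef(x, y)[0, 1])
    return r, t_statistic(r, len(x))

# The paper's full-sample figures: rho = 0.462 with 1818 observations
print(round(t_statistic(0.462, 1818), 1))   # 22.2

# Synthetic, positively related volatility series (illustrative only)
rng = np.random.default_rng(6)
x = np.abs(rng.standard_normal(1818))
y = 0.5 * x + np.abs(rng.standard_normal(1818))
print(correlation_test(x, y))
```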

[Figure 4 plot: volatility of information index (vertical axis) against volatility of DJIA (horizontal axis)]

Figure 4 : Scatter plot demonstrating the relationship between the volatility of the
returns on the DJIA and the volatility of the returns on the information index

Using the full data set from the windowing, we have N = 1818 trading days. The
significant positive correlation persists even when the large potential outliers are
trimmed from the data set. For example, using only those observations where the
absolute logarithmic change in the maximum eigenvalue is < 0.03 gives N = 1789 and
ρ = 0.371. Figure 4 shows that the overwhelming bulk of the data is concentrated at
low values of the variables, but even choosing only those observations where the
absolute logarithmic change in the maximum eigenvalue is < 0.01 gives ρ = 0.283,
with N = 1610.
These results suggest that the volatility of the Dow Jones Industrial Average is
positively correlated with the volatility of the degree of information in the
eigenvector associated with the 'market' eigenvalue.

We then examined the possibility that the volatility of the degree of information in
the market eigenvector - the degree to which the constituent stocks move together
- might have some predictive power as far as the volatility of the overall index is
concerned.

We carried out a classical least squares regression of the volatility of the Dow Jones
index on lagged values of the volatility of the maximum eigenvalue. Empirically,
only the first lagged value was statistically significant. The estimated coefficient
was 0.0965 with a standard error of 0.0213, so the coefficient is significantly
different from zero at p < 0.0001.
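A one-lag least squares regression with conventional standard errors can be sketched in NumPy as below; the helper name is illustrative. The paper does not say whether an intercept was included, so including one here is an assumption. The synthetic data are illustrative only. The last line shows the t-ratio implied by the reported coefficient and standard error.

```python
import numpy as np

def lagged_ols(y: np.ndarray, x: np.ndarray, lag: int = 1) -> tuple[np.ndarray, np.ndarray]:
    """Illustrative helper: OLS of y(t) on a constant and x(t - lag), with lag >= 1.

    Returns (coefficients, standard errors) as [intercept, slope].
    """
    yt = y[lag:]
    xt = x[:-lag]
    design = np.column_stack([np.ones_like(xt), xt])
    coef, *_ = np.linalg.lstsq(design, yt, rcond=None)
    resid = yt - design @ coef
    sigma2 = resid @ resid / (len(yt) - design.shape[1])     # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)          # classical OLS covariance
    return coef, np.sqrt(np.diag(cov))

# Synthetic series where y responds to x with a one-period lag (illustrative only)
rng = np.random.default_rng(4)
n = 1818
x = np.abs(rng.standard_normal(n))
y = np.empty(n)
y[0] = 0.0
y[1:] = 0.5 + 0.1 * x[:-1] + 0.2 * rng.standard_normal(n - 1)
coef, se = lagged_ols(y, x)
print(coef, se)
print(round(0.0965 / 0.0213, 2))   # 4.53: t-ratio of the reported estimate
```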

We examined this relationship further using the non-parametric (locally weighted
least squares) technique of local regression (available in the program S-Plus, for
example). This technique fits a curve to the data points locally, so that the value of
the fitted curve at any point depends only on the observations at that point and at
some specified neighbouring points. For any given data point, x(t) say, we choose the
k nearest neighbours of x(t), which constitute a neighbourhood N(x(t)). The number of
neighbours k is specified as a fraction of the total available number of data
points. This fraction is called the span.

By choosing a sufficiently large value for the span, all the points in the data set
are in the neighbourhood of every single point. In other words, in the limit the
local regression technique is identical to classical least squares. This
enables us to carry out a standard analysis of variance on the results for different
choices of the span. In this case, a value of 0.8 represents the best choice of the
span. The reduction in the residual sum of squares compared to that obtained with
classical least squares is significantly different from zero at p = 0.00012.
However, the degree of non-linearity is not strong. The equivalent number of
parameters in the local regression model with span = 0.8 is only 2.5, indicating
that the local regression model is somewhere between a linear and a quadratic model
in complexity (S-Plus Modern Statistics and Advanced Graphics, Guide to
Statistics vol. 1, Mathsoft, Seattle, 2000).
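The sketch below is a minimal loess-style smoother in NumPy that illustrates the span and its limit. It is not the S-Plus implementation. It uses local linear fits, whereas R's `loess`, for example, defaults to local quadratic fits. It also uses tricube weights and no robustness iterations. For span > 1 it follows the convention documented for R's `loess` of inflating the maximum neighbour distance by the span. The trace of the smoother matrix is shown as one common measure of the equivalent number of parameters, although packages may define that quantity slightly differently. The function name is hypothetical.

```python
import math
import numpy as np

def local_linear_fit(x: np.ndarray, y: np.ndarray, span: float = 0.8) -> tuple[np.ndarray, float]:
    """Hypothetical loess-style smoother: local linear fits with tricube weights.

    Returns fitted values at each x[i] and the trace of the smoother matrix.
    Assumes x has enough distinct values for every local fit to be well defined.
    """
    n = len(x)
    k = min(n, max(3, math.ceil(span * n)))      # neighbourhood size set by the span
    smoother = np.zeros((n, n))
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]                 # the k nearest neighbours of x[i]
        h = dist[idx].max()
        if span > 1.0:
            h *= span                              # span > 1: inflate the maximum distance
        w = (1.0 - (dist[idx] / h) ** 3) ** 3      # tricube weights
        design = np.column_stack([np.ones(k), x[idx] - x[i]])   # local line centred at x[i]
        wd = design * w[:, None]                   # W X
        # Row i of the smoother matrix: first row of (X^T W X)^{-1} X^T W
        smoother[i, idx] = np.linalg.solve(design.T @ wd, wd.T)[0]
    return smoother @ y, float(np.trace(smoother))

# Synthetic, mildly non-linear data (illustrative only)
rng = np.random.default_rng(5)
x = rng.uniform(0.0, 1.0, 300)
y = 1.0 + 2.0 * x + 0.5 * x ** 2 + 0.1 * rng.standard_normal(300)
fit_08, enp_08 = local_linear_fit(x, y, span=0.8)
fit_big, enp_big = local_linear_fit(x, y, span=1e6)
slope, intercept = np.polyfit(x, y, 1)
print(enp_08, enp_big)                               # enp_big is close to 2
print(np.allclose(fit_big, intercept + slope * x))   # True: global least squares
```

With a very large span every weight is effectively one. The fit then coincides with classical least squares and the trace is 2, which illustrates the limit described above.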

4. Conclusions
The correlation matrix of returns is of fundamental importance to much modern
portfolio analysis. However, recent literature in the physics journals using the
technique of random matrix theory has shown that such empirical correlation
matrices contain substantial amounts of noise rather than true information. The
results presented here confirm these findings with a data set of daily returns on
US blue chip stocks over the 1993-2001 period.

However, the correlation matrix does contain a certain amount of true
information. In particular, the eigenvector associated with the principal
eigenvalue of the correlation matrix enables us to identify the extent to which the
individual stocks are genuinely moving together over time. We use the term
'market eigenmode' to characterise this eigenvalue and vector. We demonstrate
that the market eigenmode is stable over time. We define the information index to
be the fraction of total information contained within this eigenmode.

We analyse the temporal movements of the variability of the information index and
of the variability of the index formed from the component stocks, and find a clear
positive correlation between their absolute values. Further, the variability of the
information index lagged one day has statistically significant power in accounting
for movements in the current variability of the index.
5. References
J.-P. Bouchaud and M. Potters, Theory of Financial Risks – From Statistical
Physics to Risk Management, Cambridge University Press (2000)

S. Drozdz, J. Kwapien, F. Grummer, F. Ruf and J. Speth, Quantifying the Dynamics
of Financial Correlations, cond-mat/0102402 (2001)

E. J. Elton and M. J. Gruber, Modern Portfolio Theory and Investment Analysis,
J. Wiley and Sons, New York (1995)

P. Gopikrishnan, B. Rosenow, V. Plerou and H. E. Stanley, Identifying Business
Sectors from Stock Price Fluctuations, cond-mat/0011145 (2000)

L. Laloux, P. Cizeau, J.-P. Bouchaud and M. Potters, Noise Dressing of Financial
Correlation Matrices, Phys Rev Lett 83, 1467 (1999)

R. N. Mantegna and H. E. Stanley, An Introduction to Econophysics, Cambridge
University Press (2000)

M. Mehta, Random Matrices, Academic Press (1991)

P. Ormerod and C. Mounfield, Random Matrix Theory and the Failure of
Macroeconomic Forecasts, Physica A 280, 497 (2000)

V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral and H. E. Stanley,
Universal and Non-universal Properties of Cross-correlations in Financial Time
Series, Phys Rev Lett 83, 1471 (1999)

V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral and H. E. Stanley, A
Random Matrix Theory Approach to Financial Cross-Correlations, Physica A 287,
374 (2000)

A. Ponzi, The Volatility in a Multi-share Financial Market Model,
cond-mat/0012309 (2000), to appear in European Physical Journal

A. M. Sengupta and P. P. Mitra, Phys Rev E 60, 3389 (1999)
