We analyse the daily rates of return of US blue chip stocks over the 1993-2001 period.
Using the technique of random matrix theory, we show that the correlation matrix of
these rates of return is to a large extent dominated by noise rather than by true
information. These results confirm for this data set findings recently documented in the
econophysics literature.
However, the eigenvector associated with the principal eigenvalue of the correlation
matrix does contain true information and shows stability over time. This, the market
eigenvector, shows the extent to which the individual stocks tend to move together. We
quantify the fraction of total information contained within this eigenmode, which we
define as the information index.
We find a clear positive relationship between the absolute changes in the variability of
the information index and the absolute changes in the variability of the market index.
Further, the absolute change in the variability of the information index lagged one day
has statistically significant predictive power for the absolute change in the variability of
the market index.
1. Introduction
A precise quantification of the correlations between the returns of different assets
traded in financial markets is of fundamental importance to risk management
where one attempts to diversify as widely as possible the character of the portfolio
(reducing exposure to sector/industry specific shocks). As a consequence of this
the correlation matrix is one of the cornerstones of much of modern financial
engineering such as CAPM (Elton et al, 1995) and Value at Risk.
The technique of Random Matrix Theory (RMT) has recently been applied to
financial market data to analyse the true degree of information content contained
within empirical correlation matrices formed from equity returns. RMT was
originally developed for the study of complex quantum mechanical systems.
We apply this technique to daily returns on leading US blue chip stocks using
daily data over the 1993 - 2001 period. The results are consistent with those of
the recent literature, in that empirical financial correlation matrices are in general
dominated by noise, but there do exist some significant, stable deviations of
empirical financial correlation matrices from the universal predictions of RMT.
The main purpose of this paper is two-fold. First, to show that this technique may
also yield an information index which characterises the degree to which the
movements of assets in a portfolio are correlated. Second, that the temporal
evolution of this index is well correlated with the volatility of the overall market
index.
The structure of the paper is as follows. Section 2 outlines the relevant concepts of
RMT. Section 3 then applies these concepts to the analysis of a portfolio of US
blue chip equities. Finally the main results are summarised in section 4.
It was also demonstrated that RMT predictions represent an average over all
possible interactions. Hence RMT predictions are universal predictions that will
apply to wide classes of systems. Deviations from the universal predictions of
RMT identify system-specific, non-random properties of the system under
consideration. These deviations provide clues about the underlying interactions
within the system (Mehta 1991).
then the correlation matrix measuring the correlations between the N assets is
given by
$$ C = \frac{1}{T} M M^{T}. $$
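As an illustrative sketch of this construction (not the authors' code), the following assumes that M is stored as an N x T array with one row per asset, and that each row is standardised to zero mean and unit variance before the product is taken. The function name `correlation_matrix` and the synthetic `demo` data are hypothetical.

```python
import numpy as np

def correlation_matrix(returns):
    """Return C = (1/T) M M^T for an N x T array of returns.

    Assumption: each row (one asset) is standardised to zero mean and
    unit variance first, so that the diagonal of C equals 1.
    """
    returns = np.asarray(returns, dtype=float)
    n_assets, n_obs = returns.shape
    mean = returns.mean(axis=1, keepdims=True)
    std = returns.std(axis=1, keepdims=True)  # population standard deviation
    m = (returns - mean) / std
    return m @ m.T / n_obs

# Synthetic stand-in for real return data: 5 assets, 400 observations.
rng = np.random.default_rng(0)
demo = rng.standard_normal((5, 400))
c = correlation_matrix(demo)
```

With this standardisation the result coincides with the usual sample correlation matrix, so `np.corrcoef(demo)` could be used instead.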
If the T observations are i.i.d. random variables then in the limit N → ∞ and
T → ∞ the density of eigenvalues, λ, of the random correlation matrix C is
given by (Sengupta et al 1999)
The upper and lower bounds on the theoretical eigenvalue distribution are given
by,
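$$ \lambda_{\max} = \sigma^{2} \left( 1 + \frac{1}{\sqrt{Q}} \right)^{2} $$
$$ \lambda_{\min} = \sigma^{2} \left( 1 - \frac{1}{\sqrt{Q}} \right)^{2} $$
(σ² is the variance of the elements of M, usually rescaled to unity). This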
distribution is plotted below in figure 1 for Q = 3.22. As can be seen from this
figure there is a well-defined range of non-zero eigenvalues λmin < λ < λmax.
This range of eigenvalues corresponds to a random, noisy subspace band where
the postulates of RMT hold. That is to say, the eigenvectors corresponding to
eigenvalues within λmin < λ < λmax contain no genuine information.
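The bounds and the density can be evaluated numerically. The sketch below rests on two assumptions that are not spelled out in the surrounding text: that Q = T/N, and that the bounds take the form σ²(1 ± 1/√Q)², with the density in the standard Marchenko-Pastur form. The first two assumptions reproduce the theoretical maximum of 1.26 quoted later for 2067 observations of 31 assets. The function names are hypothetical.

```python
import math

def rmt_bounds(q, sigma2=1.0):
    """Lower and upper eigenvalue bounds, assuming sigma^2 * (1 -/+ 1/sqrt(Q))^2."""
    root = 1.0 / math.sqrt(q)
    return sigma2 * (1.0 - root) ** 2, sigma2 * (1.0 + root) ** 2

def rmt_density(lam, q, sigma2=1.0):
    """Eigenvalue density in the assumed Marchenko-Pastur form (zero outside the bounds)."""
    lam_min, lam_max = rmt_bounds(q, sigma2)
    if not lam_min < lam < lam_max:
        return 0.0
    root = math.sqrt((lam_max - lam) * (lam - lam_min))
    return q / (2.0 * math.pi * sigma2) * root / lam

# Assumption: Q = T / N, here 2067 observations of 31 assets.
lam_min, lam_max = rmt_bounds(2067 / 31)
```

Under these assumptions `lam_max` rounds to 1.26, and the density integrates to one over the interval between the two bounds.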
When the dimensions of the random matrix under consideration are finite (but still
‘large’) the spectral distribution is broadened. In these instances, however,
Monte Carlo simulation can be used to generate the broadened eigenvalue
distribution that is to be expected.
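A minimal sketch of such a Monte Carlo exercise is given below. It assumes Gaussian i.i.d. entries and matrix dimensions matching the data set analysed later (31 assets, 2067 observations); the number of trials and the function name are arbitrary illustrative choices.

```python
import numpy as np

def simulate_extreme_eigenvalues(n_assets, n_obs, n_trials=200, seed=0):
    """Smallest and largest eigenvalue of random correlation matrices.

    Each trial draws an n_assets x n_obs matrix of i.i.d. standard normal
    entries (an assumption) and records the extremes of its spectrum.
    """
    rng = np.random.default_rng(seed)
    smallest = np.empty(n_trials)
    largest = np.empty(n_trials)
    for k in range(n_trials):
        m = rng.standard_normal((n_assets, n_obs))
        eig = np.linalg.eigvalsh(np.corrcoef(m))  # returned in ascending order
        smallest[k], largest[k] = eig[0], eig[-1]
    return smallest, largest

smallest, largest = simulate_extreme_eigenvalues(31, 2067)
```

The empirical distribution of `smallest` and `largest` across trials indicates how far the finite-size spectrum edges can be expected to sit from the asymptotic bounds.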
[Figure 1 : Plot of the theoretical eigenvalue density (horizontal axis: Eigenvalue; vertical axis: Density)]
Component i of an eigenvector, $v_i^{\alpha}$, corresponds to the contribution of time series
i to that eigenvector.
$$ I^{\alpha} = \sum_{i=1}^{N} \left( v_i^{\alpha} \right)^{4} $$
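The sum of fourth powers of the components can be computed directly. The sketch below normalises the eigenvector to unit length first (an assumption; eigenvectors returned by most numerical routines already satisfy it). The function name is hypothetical.

```python
def inverse_participation_ratio(vector):
    """Sum of the fourth powers of the components of a unit-normalised vector."""
    norm2 = sum(x * x for x in vector)
    return sum((x * x / norm2) ** 2 for x in vector)

# An eigenvector to which all 31 assets contribute equally.
uniform = inverse_participation_ratio([1.0] * 31)
# An eigenvector dominated by a single asset.
localised = inverse_participation_ratio([0.0, 0.0, 1.0, 0.0])
```

For a vector with N equal components the quantity equals 1/N, and for a vector with a single non-zero component it equals 1, so larger values indicate that fewer assets contribute to the eigenvector.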
Hence if the eigenvector structure remains perfectly stable in time (i.e. the
correlations between the assets contributing to that eigenvector remain stable from
period to period) then each element of the overlap matrix would be equal to
$O_{ij}(T_A, T_B) = \delta_{ij}$. No inter-period stability would imply that $O_{ij}(T_A, T_B) = 0$.
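One way to compute such an overlap matrix is sketched below. It assumes that the overlap is the absolute value of the dot product between eigenvector i of period A and eigenvector j of period B, with eigenvectors ordered by decreasing eigenvalue; the absolute value is taken because the sign of an eigenvector is arbitrary. The synthetic one-factor returns and all names are hypothetical.

```python
import numpy as np

def eigenvectors_desc(corr):
    """Eigenvectors of a symmetric matrix as columns, largest eigenvalue first."""
    vals, vecs = np.linalg.eigh(corr)
    order = np.argsort(vals)[::-1]
    return vecs[:, order]

def overlap_matrix(corr_a, corr_b):
    """Assumed definition: O_ij = |v_i(T_A) . v_j(T_B)|."""
    va = eigenvectors_desc(corr_a)
    vb = eigenvectors_desc(corr_b)
    return np.abs(va.T @ vb)

# Synthetic returns driven by one common factor, split into two periods.
rng = np.random.default_rng(0)
market = rng.standard_normal(1000)
returns = market + rng.standard_normal((10, 1000))
overlap = overlap_matrix(np.corrcoef(returns[:, :500]),
                         np.corrcoef(returns[:, 500:]))
```

In this synthetic example `overlap[0, 0]`, the overlap of the leading eigenvector with itself across the two periods, is close to one, while the remaining entries show no comparable stability.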
As a control this data set is also analysed after each of the time series of the
individual assets is shuffled at random 10000 times. This has the effect of
destroying any temporal correlations in the data, while at the same time
preserving the statistical properties of the distributions (e.g. mean and variance).
This randomly shuffled portfolio will act as a control to demonstrate there exists a
quantitative difference between the eigenspectra of random and empirical
correlation matrices.
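The control can be sketched as follows, under the assumption that each asset's series is permuted independently of the others. A single random permutation per series is used here rather than 10000 repeated shuffles, and the synthetic one-factor returns and function name are hypothetical.

```python
import numpy as np

def shuffle_each_series(returns, seed=0):
    """Independently permute the observations of every asset (row)."""
    rng = np.random.default_rng(seed)
    shuffled = np.array(returns, dtype=float)  # np.array copies its input
    for i in range(shuffled.shape[0]):
        shuffled[i] = rng.permutation(shuffled[i])
    return shuffled

# Synthetic returns with one common factor, standing in for real data.
rng = np.random.default_rng(0)
market = rng.standard_normal(1000)
returns = market + rng.standard_normal((10, 1000))
shuffled = shuffle_each_series(returns)

top_original = np.linalg.eigvalsh(np.corrcoef(returns))[-1]
top_shuffled = np.linalg.eigvalsh(np.corrcoef(shuffled))[-1]
```

Each row keeps exactly the same set of values, so its mean and variance are unchanged, but the observations of different assets are no longer aligned in time and the large leading eigenvalue disappears.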
For the correlation matrix (of dimension 31 x 31) formed from the shuffled data
we find that all 31 eigenvalues of the correlation matrix fall within the upper and
lower bounds. For the correlation matrix formed from the original data set we
observe that there are 17 eigenvalues below the lower bound, 4 eigenvalues above
the upper bound and therefore 10 eigenvalues which fall between the upper and
lower bounds.
We of course expect that for the shuffled data there should be no information
content contained within the time series since the process of shuffling the data
destroys any temporal correlations in the data. However the observation of a
significant number of eigenvalues outside the RMT bounds for the original,
unshuffled portfolio demonstrates that there do indeed exist genuine, non-random,
correlated movements between groups of assets within the portfolio.
We may also examine the stability of these correlations over time. Firstly we
choose two non-overlapping time periods of approximately 4 years in duration
and calculate the eigenspectra properties of the correlation matrices formed from
these two analysis periods. For matrices of these dimensions the theoretical upper
and lower eigenvalues are 1.38 and 0.68 respectively. For the two analysis periods
it is found that the numbers of eigenvalues below the theoretical minimum are 12
and 15, while the number above the theoretical maximum is 4 in both periods. This
indicates that the large-scale macrostructure of the portfolio remains unchanged
over the course of the 8 year total analysis period.
We can also calculate the overlap matrix between the two periods. This is shown
in figure 2. For these two analysis periods the overlap between the eigenvectors
corresponding to the largest eigenvalue is 0.99. In addition, if we repeat the
analysis with 10 non-overlapping periods (each of 200 trading days in duration),
the eigenvectors corresponding to the largest eigenvalue show an average overlap
of 0.95. These numbers
represent a significant degree of temporal stability of the eigenvector structure.
Figure 2 : Colour coded plot of the degree of overlap of the eigenvectors
corresponding to 2 non-overlapping analysis periods for the US blue chip
portfolio. A white square corresponds to perfect overlap between the structure of
the 2 eigenvectors (perfect stability of the degree of information content in that
eigenmode) and black corresponds to no degree of overlap whatsoever. As can be
seen, the degree of stability of the market eigenmode (i.e. the dot product of
eigenvector 1 with itself in each of the two periods - bottom right hand corner) is
significantly different from that of any of the other overlaps.
3.3 Analysis of the ‘Market’ Eigenmode
In terms of those eigenvalues which lie outside the noisy sub-space band the most
important is the largest eigenvalue. The application of RMT techniques to equities
traded in financial markets has demonstrated that this eigenmode corresponds to
the ‘market’ (e.g. Gopikrishnan et al, 2000).
In particular, for this data set (2067 observations of daily returns for 31 assets),
the maximum eigenvalue of the correlation matrix is 7.05 (the remainder of the
eigenvalues are in the range 2.15 to 0.34). The theoretical maximum eigenvalue is
1.26 so it is clear that the largest empirically observed eigenvalue is significantly
above this threshold.
In order to quantify this overall collective motion of the portfolio’s asset price
dynamics we may exploit the fact that the trace of the correlation matrix is
preserved. That is, for the US blue chip portfolio of 31 assets, the trace of the
correlation matrix is equal to 31 (since there are 31 independent time series). The
closer the 'market' eigenmode (i.e. the maximum eigenvalue) is to this value the
more information is contained within this mode and the more correlated the
movements of the price changes of the assets within the portfolio are. We may
therefore quantify the fraction of total information contained within this
eigenmode – the information index - expressed as a percentage by the following
formula
$$ Q(t) = 100 \, \frac{\lambda_{\max}}{N} $$
If the assets in the portfolio move together very closely, then we would expect
Q(t) → 100%. Conversely, if the asset price movements are completely
uncorrelated then we would expect Q(t) → 0% (corresponding to no collective
dynamics).
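As a worked example, the index can be evaluated for the full-sample values quoted above (maximum eigenvalue 7.05, 31 assets). The function name is hypothetical.

```python
def information_index(lambda_max, n_assets):
    """Information index in percent: 100 * lambda_max / N."""
    return 100.0 * lambda_max / n_assets

reported = information_index(7.05, 31)   # values quoted in the text
ceiling = information_index(31.0, 31)    # all assets perfectly correlated
floor = information_index(1.0, 31)       # all assets uncorrelated
```

Here `reported` is about 22.7%. For a finite portfolio the uncorrelated case gives 100/N (about 3.2% for 31 assets), which approaches 0% only as N becomes large.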
The analysis is undertaken with a fixed window of data. Within this window, the
spectral properties of the correlation matrix formed from the constituent elements
of the US blue chip portfolio are calculated. In particular, the maximum
eigenvalue is calculated. This window is then advanced by one period
(corresponding, in this data set, to one trading day) and the maximum eigenvalue
noted for each period. The same procedure is followed for the Dow Jones
Industrial Index (DJIA) itself. A window of 250 periods, which corresponds to
approximately one year in terms of elapsed time, was chosen for the analysis. As
previously the correlation matrix is formed from the returns on the assets.
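The windowing procedure might be sketched as follows, assuming the returns are held in an N x T array and the window is advanced one observation at a time. The synthetic data, with 8 assets and 400 observations, and the function name are hypothetical.

```python
import numpy as np

def rolling_information_index(returns, window=250):
    """Information index (percent) for every full window of observations."""
    returns = np.asarray(returns, dtype=float)
    n_assets, n_obs = returns.shape
    values = []
    for end in range(window, n_obs + 1):
        corr = np.corrcoef(returns[:, end - window:end])
        lambda_max = np.linalg.eigvalsh(corr)[-1]  # eigenvalues are ascending
        values.append(100.0 * lambda_max / n_assets)
    return np.array(values)

# Synthetic returns with one common factor, standing in for real data.
rng = np.random.default_rng(0)
market = rng.standard_normal(400)
returns = 0.5 * market + rng.standard_normal((8, 400))
index = rolling_information_index(returns, window=250)
```

A series of T observations yields T - window + 1 values; with 2067 observations and a window of 250 this gives 1818 windows.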
Plotted in figures 3a and 3b respectively are the absolute values of the logarithmic
differences of DJIA and the information index, Q. The absolute value of the
logarithmic differences represents a proxy for the volatility of the time series
(Ponzi, 2000) (with a window of 250 trading days). Figure 3a (for the DJIA)
demonstrates that there exist ‘bursts’ of volatility interspersed with
periods of low volatility (so-called volatility clustering, characteristic of the
presence of long-range temporal correlations in the volatility). Inspection of the
charts suggests that the two measures exhibit a significant degree of correlation.
In particular it is apparent that bursts of extreme volatility in the DJIA are
reflected in similar bursts in the information index.
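The volatility proxy itself is simple to compute. The sketch below takes a strictly positive series (index levels or information index values) and returns the absolute values of its logarithmic differences; the function name and the three-point example are hypothetical.

```python
import math

def abs_log_differences(series):
    """Absolute logarithmic differences |ln(x[t] / x[t-1])| of a positive series."""
    return [abs(math.log(b / a)) for a, b in zip(series, series[1:])]

# A rise of 10% followed by a fall of 10%.
example = abs_log_differences([100.0, 110.0, 99.0])
```

The output has one element fewer than the input, and rises and falls of similar size yield similar values because the sign of the change is discarded.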
Figure 3a : Plot of volatility of the DJIA for the period 4th January 1998 – 13th
March 2001 (vertical axis: Volatility of DJIA, 0 to 0.08; horizontal axis: date,
29/12/93 to 29/12/00)
Figure 3b : Plot of volatility of the information index for the period 4th January 1998 –
13th March 2001 (vertical axis: 0 to 0.14; horizontal axis: date, 29/12/93 to
29/12/00)
A scatter plot of the two variables, set out in Figure 4, does suggest a positive
relationship between them. The simple correlation coefficient, ρ, is in fact 0.462,
which is highly statistically significant.
Figure 4 : Scatter plot demonstrating the relationship between the volatility of the
returns on the DJIA and the volatility of the returns on the information index
(one axis: Volatility of Information Index, 0 to 0.12)
Using the full data set from the windowing, we have N = 1818 trading days. The
significant positive correlation persists even when the large potential outliers are
trimmed from the data set. For example, using only those observations where the
absolute value of the maximum eigenvalue is < 0.03 gives N = 1789 and ρ =
0.371. Figure 4 shows that the overwhelming bulk of the data is concentrated at
low values of the variables, but even choosing only those observations where the
absolute value of the maximum eigenvalue is < 0.01 gives ρ = 0.283, with N =
1610.
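The trimming exercise can be sketched as below. It assumes that observations are dropped whenever the eigenvalue-based volatility measure reaches the threshold and that the simple (Pearson) correlation is then recomputed on the remaining pairs. The function names and the four-point toy data are hypothetical and are not the paper's data.

```python
import math

def pearson(x, y):
    """Simple (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def trimmed_correlation(index_vol, market_vol, threshold):
    """Number of retained pairs and their correlation after trimming on index_vol."""
    kept = [(a, b) for a, b in zip(index_vol, market_vol) if a < threshold]
    xs = [a for a, _ in kept]
    ys = [b for _, b in kept]
    return len(kept), pearson(xs, ys)

# Toy data with one large outlier in the index volatility.
index_vol = [0.001, 0.002, 0.003, 0.05]
market_vol = [0.002, 0.004, 0.006, 0.0]
n_all, rho_all = trimmed_correlation(index_vol, market_vol, float("inf"))
n_trim, rho_trim = trimmed_correlation(index_vol, market_vol, 0.03)
```

Comparing the retained sample size and the coefficient across thresholds shows how much of the measured relationship depends on the extreme observations.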
These results suggest that the volatility of the Dow Jones Industrial Average is
positively correlated with the volatility of the degree of information in the
eigenvector associated with the 'market' eigenvalue.
We then examined the possibility that the volatility of the degree of information in
the market eigenvector - the degree to which the constituent stocks move together
- might have some predictive power as far as the volatility of the overall index is
concerned.
We carried out classical least squares regression of the volatility of the Dow Jones
index on lagged values of the volatility of the maximum eigenvalue. Empirically
only the first lagged value was statistically significant. The estimated coefficient
was 0.0965 with a standard error of 0.0213, so the coefficient is significantly
different from zero at p< 0.0001.
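A regression of this kind, and the significance of the reported coefficient, can be sketched as follows. The p-value uses a normal approximation to the t distribution, an assumption that is reasonable for a sample of this size, and the function names and toy series are hypothetical.

```python
import math

def lagged_ols(y, x, lag=1):
    """Least squares fit of y[t] = a + b * x[t - lag], for lag >= 1.

    Returns the intercept a, the slope b and the standard error of b.
    """
    xs, ys = x[:-lag], y[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    b = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((v - a - b * u) ** 2 for u, v in zip(xs, ys))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return a, b, se_b

def two_sided_p(coef, se):
    """Two-sided p-value of coef / se under a normal approximation."""
    return math.erfc(abs(coef / se) / math.sqrt(2.0))

# Coefficient and standard error reported in the text.
t_stat = 0.0965 / 0.0213
p_value = two_sided_p(0.0965, 0.0213)

# Toy check: y[t] = 1 + 2 * x[t - 1] exactly.
a, b, se_b = lagged_ols([0.0, 3.0, 5.0, 7.0, 9.0, 11.0],
                        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

The reported estimate corresponds to a t-ratio of about 4.5, and the approximate p-value lies below the 0.0001 level stated above.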
By choosing a sufficiently large value for the span, all the points in the data set
are in the neighbourhood of every single point. In other words, in the limit the
local regression technique is identical to that of classical least squares. This
enables us to carry out standard analysis of variance on the results for different
choices of the span. In this case, a value of 0.8 represents the best choice of the
span. The reduction in the residual sum of squares compared to that obtained with
classical least squares is significantly different from zero at p = 0.00012.
However, the degree of non-linearity is not strong. The equivalent number of
parameters in the local regression model with span = 0.8 is only 2.5, indicating
that the local regression model is somewhere between a linear and a quadratic one
in complexity (S-Plus Modern Statistics and Advanced Graphics, Guide to
Statistics vol. 1, Mathsoft, Seattle, 2000).
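The role of the span can be illustrated with a simplified local linear regression. This is a sketch only and is not the S-Plus routine: it assumes tricube weights, a bandwidth equal to the distance to the nearest fraction `span` of the points when span is at most 1, and a bandwidth of `span` times the largest distance otherwise. The function name and synthetic data are hypothetical.

```python
import numpy as np

def local_linear_fit(x, y, span):
    """Fitted values from a tricube-weighted local linear regression."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        if span <= 1.0:
            k = max(2, int(np.ceil(span * n)))
            h = np.sort(dist)[k - 1]       # distance to the k-th nearest point
        else:
            h = span * dist.max()          # neighbourhood covers every point
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(n), x - x[i]])
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]                # local intercept is the fit at x[i]
    return fitted

# Synthetic, mildly non-linear relationship.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1.0, 1.0, 200))
y = x ** 2 + 0.05 * rng.standard_normal(200)

ols_fit = np.polyval(np.polyfit(x, y, 1), x)
wide_fit = local_linear_fit(x, y, span=1e6)
local_fit = local_linear_fit(x, y, span=0.8)
rss_ols = float(np.sum((y - ols_fit) ** 2))
rss_local = float(np.sum((y - local_fit) ** 2))
```

With a very large span all weights are effectively equal and the fitted values coincide with those of classical least squares, while a span of 0.8 reduces the residual sum of squares when the underlying relationship is curved.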
4. Conclusions
The correlation matrix of returns is of fundamental importance to much modern
portfolio analysis. However, recent literature in the physics journals using the
technique of random matrix theory has shown that such empirical correlation
matrices contain substantial amounts of noise rather than true information. The
results presented here confirm these findings with a data set of daily returns on
US blue chip stocks over the 1993-2001 period.