
Market Correlation and Market Volatility in US Blue Chip Stocks

Craig Mounfield (craig.mounfield@volterra.co.uk) and Paul Ormerod (pormerod@volterra.co.uk)

Volterra Consulting Ltd

The Old Power Station 121 Mortlake High Street London SW14 8SN

Crowell Prize Submission 20th March 2001

Abstract

We analyse the daily rates of return of US blue chip stocks over the 1993-2001 period. Using the technique of random matrix theory, we show that the correlation matrix of these rates of return is to a large extent dominated by noise rather than by true information. These results confirm for this data set findings recently documented in the econophysics literature.

However, the eigenvector associated with the principal eigenvalue of the correlation matrix does contain true information and shows stability over time. This, the market eigenvector, shows the extent to which the individual stocks tend to move together. We quantify the fraction of total information contained within this eigenmode, which we define as the information index.

We find a clear positive relationship between the absolute changes in the variability of the information index and the absolute changes in the variability of the market index. Further, the absolute change in the variability of the information index lagged one day has statistically significant predictive power for the absolute change in the variability of the market index.

1. Introduction

A precise quantification of the correlations between the returns of different assets traded in financial markets is of fundamental importance to risk management where one attempts to diversify as widely as possible the character of the portfolio (reducing exposure to sector/industry specific shocks). As a consequence of this the correlation matrix is one of the cornerstones of much of modern financial engineering such as CAPM (Elton et al, 1995) and Value at Risk.

However, it is well understood that empirical measurements of the correlations between assets are subject to a number of significant sources of potential error. The difficulties associated with determining the true correlations between financial assets arise primarily from:

Non-stationary correlations between assets (due, for example, to an organisation's profile changing over time).

A finite number of observations of asset price movements (spurious measured correlations become insignificant only in the limit of an infinite number of observations of asset pair price movements). In this case the empirically measured correlations may be significantly noise dominated, masking the true correlations between asset returns.

The technique of Random Matrix Theory (RMT) has recently been applied to financial market data to analyse the true degree of information content contained within empirical correlation matrices formed from equity returns. RMT was originally developed for the study of complex quantum mechanical systems.

In order to assess the degree to which an empirical correlation matrix is noise dominated we can compare the eigenspectra properties of the empirical matrix with the theoretical eigenspectra properties of a random matrix. Undertaking this analysis will identify those eigenstates of the empirical correlation matrix which contain genuine information content. The remaining eigenstates will be noise dominated and hence unstable over time. This technique has recently been applied by a number of researchers to financial market data (for example, Mantegna et al 1999, Laloux et al 1999, Plerou et al 1999, Gopikrishnan et al 2000, Plerou 2000, Bouchaud et al 2000, Drozdz et al 2001) as well as to macroeconomic data (Ormerod et al, 2000).

We apply this technique to daily returns on leading US blue chip stocks using daily data over the 1993 - 2001 period. The results are consistent with those of the recent literature, in that empirical financial correlation matrices are in general dominated by noise, but there do exist some significant, stable deviations of empirical financial correlation matrices from the universal predictions of RMT.

The main purpose of this paper is two-fold. First, to show that this technique may also yield an information index which characterises the degree to which the movements of assets in a portfolio are correlated. Second, that the temporal evolution of this index is well correlated with the volatility of the overall market index.

The structure of the paper is as follows. Section 2 outlines the relevant concepts of RMT. Section 3 then applies these concepts to the analysis of a portfolio of US blue chip equities. Finally the main results are summarised in section 4.

2. Random Matrix Theory


The problem of understanding the properties of matrices with stochastically fluctuating entries is one which has been studied intensively since the 1950s in the context of nuclear physics. In this context the problem was to understand the empirically observed energy spectra of complex quantum mechanical systems (specifically heavy nuclei composed of many interacting constituents).

In order to characterise these properties it was assumed that the numerous many-body interactions are in fact so complex that in the aggregate they may be considered to be random. That is, the elements of the Hamiltonian matrix $H_{ij}$ may be considered to be mutually independent random variables. Under this assumption it was possible to derive the statistics of the eigenvalue distribution of the Hamiltonian, which were in remarkable agreement with experimental data (a contemporary exposition of RMT may be found in Mehta, 1991).

It was also demonstrated that RMT predictions represent an average over all possible interactions. Hence RMT predictions are universal predictions that will apply to wide classes of systems. Deviations from the universal predictions of RMT identify system-specific, non-random properties of the system under consideration. These deviations provide clues about the underlying interactions within the system (Mehta 1991).

In order to assess the degree to which an empirical correlation matrix is noise dominated one may compare the eigenspectra properties of the empirical matrix with the theoretical eigenspectra properties of a random matrix. Undertaking this analysis will identify those eigenstates of the empirical matrix which contain genuine information content. The remaining eigenstates are understood to be noise dominated and hence potentially unstable over time. The eigenstates that contain genuine information content are specific to the system under consideration and are indicative of the presence of collective modes of motion.

2.1 Eigenspectra Properties of Random Matrices

Consider a matrix $M$ of $T$ observations of price changes of $N$ assets (at a frequency of, e.g., inter-day observations). If the inter-period logarithmic returns are defined as

$$M_i(t) = \ln P_i(t) - \ln P_i(t-1)$$

then the correlation matrix measuring the correlations between the $N$ assets is given by

$$C = \frac{1}{T} M M^{\mathrm{T}}$$

If the $T$ observations are i.i.d. random variables then in the limit $N \to \infty$ and $T \to \infty$ the density of eigenvalues, $\lambda$, of the random correlation matrix $C$ is given by (Sengupta et al 1999)

$$\rho_C(\lambda) = \frac{Q}{2\pi\sigma^2} \, \frac{\sqrt{(\lambda_{max} - \lambda)(\lambda - \lambda_{min})}}{\lambda}$$

for $\lambda \in [\lambda_{min}, \lambda_{max}]$, where $Q = T/N \geq 1$.

The upper and lower bounds on the theoretical eigenvalue distribution are given by

$$\lambda_{max} = \sigma^2 \left(1 + \frac{1}{Q} + 2\sqrt{\frac{1}{Q}}\right)$$

$$\lambda_{min} = \sigma^2 \left(1 + \frac{1}{Q} - 2\sqrt{\frac{1}{Q}}\right)$$

($\sigma^2$ is the variance of the elements of $M$, usually rescaled to unity). This distribution is plotted below in figure 1 for $Q = 3.22$. As can be seen from this figure there is a well-defined range of non-zero eigenvalues $\lambda_{min} < \lambda < \lambda_{max}$.

This range of eigenvalues corresponds to a random, noisy subspace band where the postulates of RMT hold. That is to say, the eigenvectors corresponding to eigenvalues within $\lambda_{min} < \lambda < \lambda_{max}$ contain no genuine information.

The eigenvalue distribution of the correlation matrices of matrices of actual data can be compared to this null-hypothesis distribution and thus, in theory, if the distribution of eigenvalues of an empirically formed matrix differs from the above distribution, then that matrix will not have completely random elements. In other words, there will be structure present in the correlation matrix. Each isolated eigenstate outside of the RMT bounds represents a correlated group whose size and participants are obtained from the eigenvalue and eigenvector respectively.
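The null-hypothesis band can be evaluated numerically. The following is a minimal Python sketch, not the authors' code: the function names `rmt_bounds` and `rmt_density` are illustrative, and it assumes the bounds take the form $\sigma^2 (1 + 1/Q \pm 2\sqrt{1/Q})$ with $Q = T/N$ and $\sigma^2 = 1$ by default. It is evaluated for the dimensions used later in the paper (2067 observations of 31 assets).

```python
import math

def rmt_bounds(T, N, sigma2=1.0):
    """Return (lambda_min, lambda_max) of the random-matrix eigenvalue band."""
    Q = T / N  # ratio of observations to assets, assumed >= 1
    spread = 2.0 * math.sqrt(1.0 / Q)
    return (sigma2 * (1.0 + 1.0 / Q - spread),
            sigma2 * (1.0 + 1.0 / Q + spread))

def rmt_density(lam, T, N, sigma2=1.0):
    """Theoretical eigenvalue density at lam; zero outside the band."""
    lam_min, lam_max = rmt_bounds(T, N, sigma2)
    if lam <= lam_min or lam >= lam_max:
        return 0.0
    Q = T / N
    root = math.sqrt((lam_max - lam) * (lam - lam_min))
    return Q / (2.0 * math.pi * sigma2) * root / lam

lam_min, lam_max = rmt_bounds(T=2067, N=31)
print(round(lam_min, 2), round(lam_max, 2))  # prints 0.77 1.26
```

The printed values agree with the bounds of 0.77 and 1.26 quoted in section 3.2.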

When the dimensions of the random matrix under consideration are finite (but still large) this has the effect of broadening the spectral distribution. However in these instances Monte-Carlo simulation can generate what the broadened eigenvalue distribution is expected to be.
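A Monte-Carlo check of this kind can be sketched as follows. This is an illustrative Python example under stated assumptions, not the simulation used by the authors: the returns are drawn as i.i.d. standard normal variables, and the function name, the number of trials and the seed are arbitrary choices.

```python
import numpy as np

def simulate_random_spectra(N, T, n_trials=200, seed=0):
    """Eigenvalues of correlation matrices built from i.i.d. Gaussian 'returns'."""
    rng = np.random.default_rng(seed)
    eigenvalues = np.empty((n_trials, N))
    for k in range(n_trials):
        M = rng.standard_normal((N, T))
        # Standardise each row so that C below is a correlation matrix.
        M = (M - M.mean(axis=1, keepdims=True)) / M.std(axis=1, keepdims=True)
        C = M @ M.T / T
        eigenvalues[k] = np.linalg.eigvalsh(C)  # ascending order
    return eigenvalues

eig = simulate_random_spectra(N=31, T=2067)
print("smallest simulated eigenvalue:", eig.min())
print("largest simulated eigenvalue: ", eig.max())
print("mean largest eigenvalue per trial:", eig[:, -1].mean())
```

Comparing the simulated extremes with the asymptotic bounds gives a feel for how far the band is broadened at finite N and T.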

Figure 1: Theoretical Density of Eigenvalues for a Random Matrix (horizontal axis: eigenvalue, 0 to 2.8; vertical axis: density, 0 to 0.9)

2.2 The Inverse Participation Ratio

To analyse the structure of the eigenvectors of the empirical correlation matrix the Inverse Participation Ratio (IPR) may be calculated. The IPR is commonly utilised in localisation theory to quantify the contribution of the different components of an eigenvector to the magnitude of that eigenvector, thus determining whether an eigenstate is localised or extended (Plerou et al 1999).

Component $i$ of an eigenvector, $v_i^{\alpha}$, corresponds to the contribution of time series $i$ to that eigenvector. That is to say, in this context, it corresponds to the contribution of asset $i$ to eigenvector $\alpha$. In order to quantify this we define the IPR for eigenvector $\alpha$ to be

$$I^{\alpha} = \sum_{i=1}^{N} \left(v_i^{\alpha}\right)^4$$

Hence an eigenvector with identical components $v_i^{\alpha} = 1/\sqrt{N}$ will have $I^{\alpha} = 1/N$ and an eigenvector with one non-zero component will have $I^{\alpha} = 1$.


Therefore the inverse participation ratio is the reciprocal of the number of eigenvector components significantly different from zero (i.e. the number of assets contributing to that eigenvector).
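The two limiting cases can be checked directly. The following Python sketch is illustrative only (the function name is an assumption); it normalises the eigenvector to unit length before summing the fourth powers of its components.

```python
import numpy as np

def inverse_participation_ratio(v):
    """IPR of one eigenvector, after normalising it to unit length."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return float(np.sum(v ** 4))

N = 31
extended = np.ones(N)      # every asset contributes equally
localised = np.zeros(N)
localised[0] = 1.0         # a single asset carries the whole eigenvector

ipr_extended = inverse_participation_ratio(extended)
ipr_localised = inverse_participation_ratio(localised)
# 1 / IPR is the effective number of contributing assets.
print(ipr_extended, 1.0 / ipr_extended)
print(ipr_localised, 1.0 / ipr_localised)
```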

2.3 Temporal Stability of the Eigenvector Structure

For those eigenvectors that deviate from the theoretically predicted bounds of RMT it is important to quantify the degree of stability of the information content of the eigenmode (i.e. the stability of the correlations between the assets). This is necessary since spurious correlations may be introduced by a particular choice of data to calculate the correlation matrix from. We may assess this stability by calculating the scalar product of eigenvectors in non-overlapping analysis periods. That is, for two analysis periods $T_A$ and $T_B$ we form the overlap matrix

$$O(T_A, T_B) = \begin{pmatrix} v^N(T_A) \cdot v^N(T_B) & \cdots & v^N(T_A) \cdot v^1(T_B) \\ \vdots & \ddots & \vdots \\ v^1(T_A) \cdot v^N(T_B) & \cdots & v^1(T_A) \cdot v^1(T_B) \end{pmatrix}$$

with elements $O_{ij}(T_A, T_B) = v^i(T_A) \cdot v^j(T_B)$. Hence if the eigenvector structure remains perfectly stable in time (i.e. the correlations between the assets contributing to that eigenvector remain stable from period to period) then each element of the overlap matrix would be equal to $O_{ij}(T_A, T_B) = \delta_{ij}$. No inter-period stability would imply that $O_{ij}(T_A, T_B) = 0$.

3. RMT Applied to Empirical Correlation Matrices

Having described the basic analysis tools of RMT we will now apply this technology to financial correlation matrices.

3.1 Data Analysed

The data set is for 31 US equities (blue chips, mostly Dow Jones Industrial Average constituents), daily closing data for the period 4th January 1993 to 13th March 2001. There are 2068 separate trading days (taking out holidays etc).

As a control this data set is also analysed after each of the time series of the individual assets are shuffled at random 10000 times. This has the effect of destroying any temporal correlations in the data, while at the same time preserving the statistical properties of the distributions (e.g. mean and variance). This randomly shuffled portfolio will act as a control to demonstrate there exists a quantitative difference between the eigenspectra of random and empirical correlation matrices.
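The shuffled control can be illustrated with a short Python sketch. It is not the authors' procedure in detail: a single uniformly random permutation per asset is used here in place of the repeated shuffling described above, and the data are synthetic, with a hypothetical common factor standing in for the market.

```python
import numpy as np

def shuffle_each_series(returns, seed=0):
    """Independently permute the time ordering of every asset's return series.

    `returns` has shape (N assets, T observations).
    """
    rng = np.random.default_rng(seed)
    shuffled = np.empty_like(returns)
    for i, series in enumerate(returns):
        shuffled[i] = rng.permutation(series)
    return shuffled

# Synthetic stand-in data: a common factor makes the assets co-move.
rng = np.random.default_rng(1)
N, T = 31, 2067
market = rng.standard_normal(T)
returns = 0.5 * market + rng.standard_normal((N, T))

control = shuffle_each_series(returns)
largest_original = np.linalg.eigvalsh(np.corrcoef(returns))[-1]
largest_control = np.linalg.eigvalsh(np.corrcoef(control))[-1]
print(largest_original, largest_control)
```

Each shuffled series keeps exactly the same set of values, so its mean and variance are unchanged, while the dominant eigenvalue of the correlation matrix falls back towards the random-matrix band.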

3.2 Analysis of the Eigenspectra Properties

To demonstrate that RMT may yield genuine information as to the true information content contained within an empirical correlation matrix we will calculate the eigenspectra properties of the two portfolios described above. That is to say we form the correlation matrix from the inter-day returns of the assets (there are thus 2067 observations of daily price changes for the 31 assets). For a matrix of this dimension the theoretical upper and lower bounds for the eigenvalue distribution are 1.26 and 0.77 respectively.

For the correlation matrix (of dimension 31 x 31) formed from the shuffled data we find that all 31 eigenvalues of the correlation matrix fall within the upper and lower bounds. For the correlation matrix formed from the original data set we observe that there are 17 eigenvalues below the lower bound, 4 eigenvalues above the upper bound and therefore 10 eigenvalues which fall between the upper and lower bounds.
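The counting of eigenvalues relative to the band can be sketched in Python as follows. The example is a hedged illustration on synthetic one-factor data, not a reproduction of the results above; the function name and the factor loading of 0.5 are assumptions, and the variance is taken as rescaled to unity.

```python
import numpy as np

def classify_eigenvalues(returns):
    """Count eigenvalues below, inside and above the random-matrix band.

    `returns` has shape (N assets, T observations).
    """
    N, T = returns.shape
    q_inv = N / T  # 1/Q
    lam_min = 1.0 + q_inv - 2.0 * np.sqrt(q_inv)
    lam_max = 1.0 + q_inv + 2.0 * np.sqrt(q_inv)
    eigenvalues = np.linalg.eigvalsh(np.corrcoef(returns))
    below = int(np.sum(eigenvalues < lam_min))
    above = int(np.sum(eigenvalues > lam_max))
    return below, N - below - above, above

rng = np.random.default_rng(2)
N, T = 31, 2067
market = rng.standard_normal(T)
returns = 0.5 * market + rng.standard_normal((N, T))
below, inside, above = classify_eigenvalues(returns)
print(below, inside, above)
```

Because the trace of the correlation matrix is fixed at N, a large eigenvalue above the band necessarily pushes the remaining eigenvalues downwards, which is one reason eigenvalues can also appear below the lower bound.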

We of course expect that for the shuffled data there should be no information content contained within the time series, since the process of shuffling the data destroys any temporal correlations in the data. However, the observation of a significant number of eigenvalues outside the RMT bounds for the original, unshuffled, portfolio demonstrates that there do indeed exist genuine, non-random, correlated movements between groups of assets within the portfolio.

We may also examine the stability of these correlations over time. Firstly we choose two non-overlapping time periods of approximately 4 years in duration and calculate the eigenspectra properties of the correlation matrices formed from these two analysis periods. For matrices of these dimensions the theoretical upper and lower eigenvalue bounds are 1.38 and 0.68 respectively. For the two analysis periods the numbers of eigenvalues below the theoretical minimum are 12 and 15, while the number above the theoretical maximum is 4 in both periods. This indicates that the large-scale macrostructure of the portfolio remains unchanged over the course of the 8 year total analysis period.

We can also calculate the overlap matrix between the two periods. This is shown in figure 2. For these two analysis periods the overlap between the eigenvectors corresponding to the largest eigenvalue is 0.99. In addition, if we repeat the analysis with 10 non-overlapping periods (each of 200 trading days in duration), the eigenvectors corresponding to the largest eigenvalue have an average overlap of 0.95. These numbers represent a significant degree of temporal stability of the eigenvector structure.

Figure 2 : Colour coded plot of the degree of overlap of the eigenvectors corresponding to 2 non-overlapping analysis periods for the US blue chip portfolio. A white square corresponds to perfect overlap between the structure of the 2 eigenvectors (perfect stability of the degree of information content in that eigenmode) and black corresponds to no degree of overlap whatsoever. As can be seen, the degree of stability of the market eigenmode (i.e. the dot product of eigenvector 1 with itself in each of the two periods - bottom right hand corner) is significantly different from that of any of the other overlaps.
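An overlap matrix of this kind can be computed as in the following Python sketch. It is illustrative only and runs on synthetic one-factor data. Two choices here are assumptions rather than the authors' stated method: the absolute value of each scalar product is taken, because the sign of a numerically computed eigenvector is arbitrary, and the eigenvectors are ordered with the largest eigenvalue first, so the market eigenmode sits at index [0, 0] rather than in the bottom right hand corner.

```python
import numpy as np

def eigenvectors_descending(returns):
    """Eigenvectors of the correlation matrix as columns, largest eigenvalue first."""
    _, eigenvectors = np.linalg.eigh(np.corrcoef(returns))
    return eigenvectors[:, ::-1]

def overlap_matrix(returns_a, returns_b):
    """O[i, j] = |v^i(T_A) . v^j(T_B)| for two analysis periods."""
    V_a = eigenvectors_descending(returns_a)
    V_b = eigenvectors_descending(returns_b)
    return np.abs(V_a.T @ V_b)

rng = np.random.default_rng(3)
N, T = 31, 2066
market = rng.standard_normal(T)
returns = 0.5 * market + rng.standard_normal((N, T))
period_a, period_b = returns[:, : T // 2], returns[:, T // 2 :]

O = overlap_matrix(period_a, period_b)
print("overlap of the market eigenmode:", O[0, 0])
```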

3.3 Analysis of the Market Eigenmode

In terms of those eigenvalues which lie outside the noisy sub-space band the most important is the largest eigenvalue. The application of RMT techniques to equities traded in financial markets has demonstrated that this eigenmode corresponds to the market (e.g. Gopikrishnan et al, 2000).

In particular, for this data set (2067 observations of daily returns for 31 assets), the maximum eigenvalue of the correlation matrix is 7.05 (the remainder of the eigenvalues are in the range 2.15 to 0.34). The theoretical maximum eigenvalue is 1.26 so it is clear that the largest empirically observed eigenvalue is significantly above this threshold.

Analysis of the eigenvector corresponding to the largest eigenvalue demonstrates that each of the 31 components of the eigenvector contributes an approximately equal amount to the eigenvector. Indeed, the IPR for this eigenvector is 0.037. This is to be compared with the value of 0.032 (1/N) that we would expect if all of the assets contributed equally to the eigenvector. This indicates that this eigenmode is extended. Hence the behaviour of this eigenmode is indicative of large-scale correlated movements of all of the assets within the portfolio.

In order to quantify this overall collective motion of the portfolio's asset price dynamics we may exploit the fact that the trace of the correlation matrix is preserved. That is, for the US blue chip portfolio of 31 assets, the trace of the correlation matrix is equal to 31 (since there are 31 independent time series). The closer the 'market' eigenmode (i.e. the maximum eigenvalue) is to this value the more information is contained within this mode and the more correlated the movements of the price changes of the assets within the portfolio are. We may therefore quantify the fraction of total information contained within this eigenmode, the information index, expressed as a percentage, by the following formula

$$Q(t) = 100 \, \frac{\lambda_{max}}{N}$$

If the assets in the portfolio move together very closely, then we would expect $Q(t) \to 100\%$. Conversely, if the asset price movements are completely uncorrelated then we would expect $Q(t) \to 0\%$ (corresponding to no collective dynamics).
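The index is simple to compute. The Python sketch below is an illustration on synthetic data, with a hypothetical function name. For reference, the largest eigenvalue of 7.05 reported above corresponds to an index of 100 x 7.05 / 31, or about 22.7%. Note also that, because the eigenvalues of a correlation matrix average to one, the largest eigenvalue is at least 1, so for a finite portfolio the index cannot fall below 100/N.

```python
import numpy as np

def information_index(returns):
    """100 * lambda_max / N for returns of shape (N assets, T observations)."""
    C = np.corrcoef(returns)
    lam_max = np.linalg.eigvalsh(C)[-1]  # eigvalsh returns ascending order
    return 100.0 * lam_max / C.shape[0]

rng = np.random.default_rng(4)
N, T = 31, 2067
market = rng.standard_normal(T)

# Assets that track a common factor almost perfectly.
q_together = information_index(market + 0.01 * rng.standard_normal((N, T)))
# Assets with no common factor at all.
q_independent = information_index(rng.standard_normal((N, T)))
print(q_together, q_independent)
```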

3.4 Temporal Evolution of the Market Eigenmode

We have seen that the eigenmode of the empirical correlation matrix corresponding to the maximum eigenvalue represents a collective motion of all of the assets within the portfolio. What is of interest is to determine how this eigenmode evolves temporally.

The analysis is undertaken with a fixed window of data. Within this window, the spectral properties of the correlation matrix formed from the constituent elements of the US blue chip portfolio are calculated. In particular, the maximum eigenvalue is calculated. This window is then advanced by one period (corresponding, in this data set, to one trading day) and the maximum eigenvalue noted for each period. The same procedure is followed for the Dow Jones Industrial Average (DJIA) itself. A window of 250 periods, which corresponds to approximately one year in terms of elapsed time, was chosen for the analysis. As previously, the correlation matrix is formed from the returns on the assets.
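The rolling-window calculation can be sketched in Python as follows. This is a minimal illustration on synthetic data with a hypothetical function name, not the authors' implementation. With 2067 daily returns and a window of 250, the number of windows is 2067 - 250 + 1 = 1818, which appears to match the sample size quoted later in the text.

```python
import numpy as np

def rolling_information_index(returns, window=250):
    """Information index for every window of `window` consecutive observations.

    `returns` has shape (N assets, T observations); the result has
    T - window + 1 entries, one per position of the window.
    """
    N, T = returns.shape
    index = np.empty(T - window + 1)
    for start in range(T - window + 1):
        C = np.corrcoef(returns[:, start:start + window])
        index[start] = 100.0 * np.linalg.eigvalsh(C)[-1] / N
    return index

# A short synthetic sample keeps the example fast.
rng = np.random.default_rng(5)
N, T = 31, 400
market = rng.standard_normal(T)
returns = 0.5 * market + rng.standard_normal((N, T))
q_series = rolling_information_index(returns, window=250)
print(len(q_series))  # T - window + 1 = 151
```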

Plotted in figures 3a and 3b respectively are the absolute values of the logarithmic differences of the DJIA and the information index, Q. The absolute value of the logarithmic differences represents a proxy for the volatility of the time series (Ponzi, 2000) (with a window of 250 trading days). Figure 3a (for the DJIA) demonstrates that there exist periods of bursts of volatility interspersed by periods of low volatility (so-called volatility clustering, characteristic of the presence of long-range temporal correlations in the volatility). Inspection of the charts suggests that the two measures exhibit a significant degree of correlation. In particular it is apparent that bursts of extreme volatility in the DJIA are reflected in similar bursts in the information index.
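The volatility proxy is a one-line transformation of a series. The following Python sketch shows it on a few made-up values; the function name and the numbers are purely illustrative.

```python
import numpy as np

def abs_log_differences(series):
    """|ln x(t) - ln x(t-1)| for a strictly positive series."""
    series = np.asarray(series, dtype=float)
    return np.abs(np.diff(np.log(series)))

# Hypothetical index levels on five consecutive days.
levels = np.array([100.0, 101.0, 99.0, 99.0, 102.0])
volatility = abs_log_differences(levels)
print(volatility)
```

The same function would be applied both to the windowed DJIA series and to the information index series.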

Figure 3a: Plot of volatility of the DJIA for the period 4th January 1998 to 13th March 2001 (vertical axis: volatility of DJIA, 0 to 0.08; horizontal axis: date, with ticks from 12/29/93 to 12/29/00)

Figure 3b: Plot of volatility of the information index for the period 4th January 1998 to 13th March 2001 (vertical axis: volatility of information index, 0 to 0.14; horizontal axis: date, with ticks from 12/29/93 to 12/29/00)

A scatter plot of the two variables, set out in Figure 4, does suggest a positive relationship between them. The simple correlation coefficient, $\rho$, is in fact 0.462, highly statistically significantly different from zero.

Figure 4: Scatter plot demonstrating the relationship between the volatility of the returns on the DJIA and the volatility of the returns on the information index (horizontal axis: volatility of DJIA, 0 to 0.06; vertical axis: volatility of information index, 0 to 0.12)

Using the full data set from the windowing, we have N = 1818 trading days. The significant positive correlation persists even when the large potential outliers are trimmed from the data set. For example, using only those observations where the absolute value of the maximum eigenvalue is < 0.03 gives N = 1789 and $\rho$ = 0.371. Figure 4 shows that the overwhelming bulk of the data is concentrated at low values of the variables, but even choosing only those observations where the absolute value of the maximum eigenvalue is < 0.01 gives $\rho$ = 0.283, with N = 1610.
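The trimming exercise can be sketched in Python as below. The data are synthetic and the function names are hypothetical. The significance check uses the textbook t-statistic for a correlation coefficient, which assumes independent observations; that assumption is unlikely to hold exactly for series built from overlapping windows, so it should be read as a rough guide only. Under that formula, a correlation of 0.462 with 1818 observations gives a t-statistic of roughly 22.

```python
import numpy as np

def trimmed_correlation(djia_vol, index_vol, threshold=None):
    """Correlation and sample size, optionally keeping only index_vol < threshold."""
    x = np.asarray(djia_vol, dtype=float)
    y = np.asarray(index_vol, dtype=float)
    if threshold is not None:
        keep = y < threshold
        x, y = x[keep], y[keep]
    return float(np.corrcoef(x, y)[0, 1]), int(x.size)

def correlation_t_statistic(rho, n):
    """t-statistic for the null hypothesis of zero correlation (i.i.d. assumption)."""
    return rho * np.sqrt((n - 2) / (1.0 - rho ** 2))

# Synthetic stand-ins for the two volatility series.
rng = np.random.default_rng(6)
x = np.abs(rng.standard_normal(1818)) * 0.01
y = 0.5 * x + np.abs(rng.standard_normal(1818)) * 0.01

rho_all, n_all = trimmed_correlation(x, y)
rho_trim, n_trim = trimmed_correlation(x, y, threshold=0.03)
print(rho_all, n_all, rho_trim, n_trim)
print(correlation_t_statistic(0.462, 1818))
```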

These results suggest that the volatility of the Dow Jones Industrial Average is positively correlated with the volatility of the degree of information in the eigenvector associated with the 'market' eigenvalue.

We then examined the possibility that the volatility of the degree of information in the market eigenvector - the degree to which the constituent stocks move together - might have some predictive power as far as the volatility of the overall index is concerned.

We carried out classical least squares regression of the volatility of the Dow Jones index on lagged values of the volatility of the maximum eigenvalue. Empirically only the first lagged value was statistically significant. The estimated coefficient was 0.0965 with a standard error of 0.0213, so the coefficient is significantly different from zero at p < 0.0001.
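A lagged regression of this form can be sketched in Python as follows. The example is illustrative and runs on synthetic data; including a constant term is an assumption, since the specification is not spelled out in the text. For the reported estimates, the ratio of coefficient to standard error is 0.0965 / 0.0213, or about 4.5.

```python
import numpy as np

def lagged_ols(y, x, lag=1):
    """Regress y(t) on a constant and x(t - lag).

    Returns (coefficients, standard errors), each ordered as [constant, slope].
    """
    y = np.asarray(y, dtype=float)[lag:]
    x_lagged = np.asarray(x, dtype=float)[:-lag]
    X = np.column_stack([np.ones_like(x_lagged), x_lagged])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = residuals @ residuals / dof
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, std_err

# Synthetic data in which y responds to yesterday's x with slope 0.1.
rng = np.random.default_rng(7)
n = 1818
x = np.abs(rng.standard_normal(n)) * 0.02
y = np.empty(n)
y[0] = 0.005
y[1:] = 0.005 + 0.1 * x[:-1] + 0.002 * rng.standard_normal(n - 1)

beta, std_err = lagged_ols(y, x, lag=1)
t_stat = beta[1] / std_err[1]
print(beta, std_err, t_stat)
```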

We examined this relationship using the general non-linear least squares technique of local regression (available in the program S-Plus, for example). This technique fits a curve to the data points locally, so that any point on the curve at that point depends only on the observations at that point and some specified neighbouring points. For any given data point, x(t) say, we choose the k nearest neighbours of x(t), which constitute a neighbourhood N(x(t)). The number of neighbours k is specified as a percentage of the total available number of data points. This percentage is called the span.

By choosing a sufficiently large value for the span, all the points in the data set are in the neighbourhood of every single point. In other words, in the limit the local regression technique is identical to that of classical least squares. This enables us to carry out standard analysis of variance on the results for different choices of the span. In this case, a value of 0.8 represents the best choice of the span. The reduction in the residual sum of squares compared to that obtained with classical least squares is significantly different from zero at p = 0.00012. However, the degree of non-linearity is not strong. The equivalent number of parameters in the local regression model with span = 0.8 is only 2.5, indicating that the local regression model is somewhere between a linear and a quadratic one in complexity [ref S-Plus Modern Statistics and Advanced Graphics, Guide to Statistics vol. 1, Mathsoft, Seattle, 2000].
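The idea of local regression with a span can be illustrated by a small Python sketch. This is not the S-Plus routine: it is a simplified nearest-neighbour local linear fit, and the tricube weighting and the function name are assumptions chosen for the illustration.

```python
import numpy as np

def local_linear_fit(x, y, span=0.8):
    """Fitted values from a local linear regression with tricube weights.

    For every point, the k = ceil(span * n) nearest neighbours are used.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(2, int(np.ceil(span * n)))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        neighbours = np.argsort(dist)[:k]
        h = dist[neighbours].max()
        if h > 0:
            w = (1.0 - (dist[neighbours] / h) ** 3) ** 3
        else:
            w = np.ones(k)
        sw = np.sqrt(w)  # weighted least squares via rescaled rows
        X = np.column_stack([np.ones(k), x[neighbours]])
        beta, _, _, _ = np.linalg.lstsq(X * sw[:, None], y[neighbours] * sw, rcond=None)
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted

# Exactly linear data are reproduced by the local fit.
x = np.linspace(0.0, 1.0, 50)
y_linear = 1.0 + 2.0 * x
fit_linear = local_linear_fit(x, y_linear, span=0.8)

# Mildly curved data: the local fit can bend where a straight line cannot.
y_curved = 1.0 + 2.0 * x + 0.5 * x ** 2
fit_curved = local_linear_fit(x, y_curved, span=0.8)
print(np.max(np.abs(fit_linear - y_linear)), np.max(np.abs(fit_curved - y_curved)))
```

A smaller span makes the fit more flexible, while a larger span makes it smoother and closer to a single straight line.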

4. Conclusions

The correlation matrix of returns is of fundamental importance to much modern portfolio analysis. However, recent literature in the physics journals using the technique of random matrix theory has shown that such empirical correlation matrices contain substantial amounts of noise rather than true information. The results presented here confirm these findings with a data set of daily returns on US blue chip stocks over the 1993-2001 period.

However, the correlation matrix does contain a certain amount of true information. In particular, the eigenvector associated with the principal eigenvalue of the correlation matrix enables us to identify the extent to which the individual stocks are genuinely moving together over time. We use the term 'market eigenmode' to characterise this eigenvalue and vector. We demonstrate that the market eigenmode is stable over time. We define the information index to be the fraction of total information contained within this eigenmode.

We analyse the temporal movements of variability of the information index and of the variability of the index formed from the component stocks and find a clear positive correlation between their absolute values. Further, the variability of the information index lagged one day has statistically significant power in accounting for movements in the current variability of the index.

5. References

J.-P. Bouchaud and M. Potters, Theory of Financial Risks: From Statistical Physics to Risk Management, Cambridge University Press (2000)

S. Drozdz, J. Kwapien, F. Grummer, F. Ruf and J. Speth, Quantifying the Dynamics of Financial Correlations, cond-mat/0102402 (2001)

E. J. Elton and M. J. Gruber, Modern Portfolio Theory and Investment Analysis, J. Wiley and Sons, New York (1995)

P. Gopikrishnan, B. Rosenow, V. Plerou and H. E. Stanley, Identifying Business Sectors from Stock Price Fluctuations, cond-mat/0011145 (2000)

L. Laloux, P. Cizeau, J.-P. Bouchaud and M. Potters, Noise Dressing of Financial Correlation Matrices, Phys Rev Lett 83, 1467 (1999)

R. N. Mantegna and H. E. Stanley, An Introduction to Econophysics, Cambridge University Press (2000)

M. Mehta, Random Matrices, Academic Press (1991)

P. Ormerod and C. Mounfield, Random Matrix Theory and the Failure of Macroeconomic Forecasts, Physica A 280, 497 (2000)

V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral and H. E. Stanley, Universal and Non-universal Properties of Cross-correlations in Financial Time Series, Phys Rev Lett 83, 1471 (1999)

V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral and H. E. Stanley, A Random Matrix Theory Approach to Financial Cross-Correlations, Physica A 287, 374 (2000)

A. Ponzi, The Volatility in a Multi-share Financial Market Model, cond-mat/0012309 (2000). To appear in European Physical Journal

A. M. Sengupta and P. P. Mitra, Phys Rev E 60, 3389 (1999)
