
Second order code


Code for the estimation of Scaling Exponents
The Matlab (and C) routines available here allow the analysis of diverse scaling processes or time series, including Long-Range Dependent (LRD), Self-Similar (SS), and Multiscaling (MS) processes such as Multifractals (MF). The methods are wavelet based and have been developed jointly with Patrice Abry and colleagues. The routines are entirely self contained and do not require any special Matlab toolboxes; in particular the wavelet toolbox is not used. On this page the analysis of second order scaling is given, whereas more general scaling is discussed on the MultiScaling page. As the second order case is simpler and better known, some refinements are possible which are not available with the more general MS tools, which deal with the scaling structure across moments of all orders.

Second Order Tools

The Matlab routines available here generate the Logscale Diagram, a wavelet analysis framework for scaling processes or time series, which in particular allows the estimation of their key parameter, the scaling exponent. The two most important cases are: Long-Range Dependent (LRD) and Self-Similar (SS) processes. Measurement of these, and of other kinds of scaling, is implemented by exactly the same routines and procedure. The only difference is in the final interpretation of the resulting estimate, as explained below. Note that only the second order Logscale Diagram, and therefore second order statistics, are analysed by the routines here. Although these can be validly applied to any kind of scaling process, to examine the greater richness of scaling of Multifractals and other MultiScaling processes, higher order moments must also be analysed (see the MultiScaling page). The routines fully implement the wavelet based joint estimation of LRD described in detail in A wavelet based joint estimator for the parameters of LRD. Further papers relating to the estimator and its use can be found on my homepage.
In addition, the special prefiltering required in the analysis of intrinsically discrete time series (such as fGn and the well known fARIMA processes), without which errors are made in the analysis at small scales (full paper), is integrated into the routines as an option. A robust routine is also provided to enable the automatic selection of the lower cutoff scale of a scaling range, given the upper cutoff scale. The fundamental issues involved in the definition and selection of such a scale are discussed in this book chapter, and full details are in On the Automatic Selection of the Onset of Scaling.

On a slightly more general note, a useful tool in the investigation of non-stationarity is also provided. It incorporates many of the wavelet tools and scaling results, such as LRD confidence intervals for mean estimates, and the statistical test for the constancy in time of scaling exponents, as described in the extended writeup. Finally, an On-Line version of the main estimation tool is also provided. This allows data to be `piped' into the wavelet filter bank, effectively allowing wavelet analysis to be performed on data of arbitrary length.

This page is only intended to make the routines available and to give some basic instructions in their use and interpretation, not to provide detailed documentation or a comprehensive software support service! News of any errors found will however be gratefully received. Note that quite comprehensive comments are given at the beginning of each Matlab file, which answer most questions not covered below.

The Code

The tarred and gzip-ed Matlab code is here. For users who have downloaded a previous release and wish to obtain the new functions, it is strongly recommended that the old versions be deleted and the entire set downloaded again, to ensure compatibility.
For Unix users, after downloading first perform: > gunzip LDestimate_code.tar.gz to decompress the file, followed by > tar -xvf LDestimate_code.tar to unwrap the functions. A directory called "Secondorder" will be created inside "Wavelet_Estimation_Tools", containing 19 `.m' files, each containing a single Matlab function, and a single test data file, `fgn8.dat'. This data file, a row vector, is a sample path of fractional Gaussian noise generated by a spectral method, with H=0.8 (alpha = 0.6). Its analysis by "LDestimate" appears in the figures below. The on-line code is written in C and is entirely independent from the Matlab code. It was developed by Matthew Roughan, Alex Stiessel and Darryl Veitch. After unpacking, a subdirectory "On-Line" will appear inside "Wavelet_Estimation_Tools" with 6 files. For Windows users, try using WinZip to unpack the files.

Copyright, conditions of use

The code is made freely available for non-commercial use only, specifically for research and teaching purposes, provided that the copyright headers at the head of each file are not removed, and suitable citation is made in published work using the routine(s). No responsibility is taken for any errors in the code. The headers in the files contain copyright dates and authors on an individual basis. In addition there is a copyright on the collection as made available here, dated 3 June 2001.

Theoretical Notes
LRD: The definition of LRD we use is the divergence of the spectrum at the origin of a stationary stochastic process with finite second moments: f(v) ~ cf |v|^(-alpha), v -> 0. In this definition there are two parameters: (alpha, cf). Alpha is the dimensionless scaling exponent, and is the most important parameter, describing the qualitative nature of the scaling. cf has the dimensions of variance and describes the quantitative aspect, or `size', of the LRD. As an example of the importance of cf, confidence intervals of mean estimates of LRD data are essentially proportional to the square root of cf! For particular families of processes there may be fixed relationships between alpha, cf, and other quantities, for instance the variance of the process; however in principle these parameters are all independent, and for empirical data they must be estimated as such.

Alpha takes values in [0,1). For our purposes here a short range dependent process is simply a stationary process which is not long range dependent. Such a process has alpha=0 at large scales, corresponding to white noise at scales beyond the characteristic scale(s). (Note however that this definition is inadequate for processes with negative alpha, such as fGn with H<1/2, which are not LRD but are not `classical SRD' either.)

The estimator is semi-parametric, so prior to estimation an analysis phase is necessary to determine the lower cutoff scale at which the LRD `begins', and to see if LRD is present at all. To do this one looks for alignment in the Logscale Diagram; an example is given below for the test data set supplied. The Logscale Diagram is essentially a log-log plot of variance estimates of the wavelet details at each scale, against scale, complete with confidence intervals about these estimates at each scale. It can be thought of as a spectral estimator where large scale corresponds to low frequency. The functions provided generate the Logscale Diagram and allow different scaling ranges to be tried.
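To make the construction concrete, here is a minimal Python sketch of a Logscale Diagram and the resulting slope estimate. It is an illustration only, not the toolbox's "wtspec"/"regrescomp": it uses the Haar wavelet (N=1), plain variance estimates of the details at each octave, and a weighted regression with weights proportional to the number n_j of coefficients per octave, omitting the bias corrections and exact weights of the published estimator.

```python
import numpy as np

def logscale_diagram(x, num_octaves=8):
    """Haar-DWT sketch of the Logscale Diagram: returns octaves j,
    y_j = log2 of the estimated detail variance at j, and counts n_j."""
    approx = np.asarray(x, dtype=float)
    js, log_mu, n = [], [], []
    for j in range(1, num_octaves + 1):
        m = len(approx) // 2
        if m < 2:
            break
        pairs = approx[:2 * m].reshape(m, 2)
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # Haar details at octave j
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)  # Haar approximation
        js.append(j)
        log_mu.append(np.log2(np.mean(detail ** 2)))       # variance (energy) estimate mu_j
        n.append(m)
    return np.array(js), np.array(log_mu), np.array(n)

def estimate_alpha(x, j1=1, j2=None):
    """Weighted least-squares slope of log2(mu_j) against j over [j1, j2]."""
    js, y, n = logscale_diagram(x)
    sel = (js >= j1) & (js <= (j2 if j2 is not None else js[-1]))
    j, y, w = js[sel], y[sel], n[sel].astype(float)  # weight ~ n_j: more data, tighter estimate
    jb = np.sum(w * j) / np.sum(w)
    yb = np.sum(w * y) / np.sum(w)
    return np.sum(w * (j - jb) * (y - yb)) / np.sum(w * (j - jb) ** 2)
```

For white Gaussian noise the diagram is flat and the fitted slope is close to alpha = 0; for an LRD series the slope over the aligned region estimates alpha.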
Details of the reading of Logscale Diagrams can be found in this review chapter, though some pointers are given below. cf takes positive real values. LRD implies that the sum over all correlations is large (in fact infinite), but their individual sizes at large lags are controlled by cf, and can be arbitrarily small. The value of cf therefore has a large impact on the statistics of LRD processes.
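The role of cf can be made concrete using the asymptotic rates alone. In the sketch below the LRD variance of the sample mean is taken to behave as C(alpha) * cf * n^(alpha-1), against 1/n in the IID case; the constant C(alpha) used here is a hypothetical placeholder, purely for illustration, not the expression from the papers, but the rates and the sqrt(cf) dependence are the point.

```python
import math

def iid_ci_halfwidth(var, n, z=1.96):
    """Classical 95% CI half-width for the mean of n IID samples: shrinks as 1/sqrt(n)."""
    return z * math.sqrt(var / n)

def lrd_ci_halfwidth(cf, alpha, n, z=1.96):
    """Asymptotic LRD CI half-width: Var(mean) ~ C(alpha) * cf * n**(alpha-1).
    C(alpha) below is an illustrative placeholder constant only."""
    c = 2.0 / ((1.0 - alpha) * (2.0 - alpha))  # hypothetical shape constant
    return z * math.sqrt(c * cf * n ** (alpha - 1.0))
```

Quadrupling n halves the IID interval but shrinks the LRD one only by the factor 4^((1-alpha)/2), about 1.32 for alpha = 0.6, while quadrupling cf doubles its width.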


Self-Similarity (SS)

Exact Self-Similarity is a very strong statistical scale invariance property that holds for all of the distributions and moments of the process, not just the second order properties. More precisely, the statistics of the process X are equivalent to those of a^(-H) X(at), a>0. The Hurst parameter H is the scaling parameter of self-similarity. All self-similar processes are non-stationary. For example Fractional Brownian Motion (FBM) is a H-SS process which has stationary increments, but is not itself stationary. Its increment process is the stationary Fractional Gaussian Noise (fGn), which is LRD with alpha = 2H-1 when H > 1/2.

The comments above on the use of the Logscale Diagram for LRD processes hold true for SS processes. The only difference is that to obtain H, the slope alpha must be transformed as H = (alpha+1)/2. The ideal scaling ranges differ, however: for LRD j2 is infinite, for SS processes j1 = -infty and j2 is infinite, and `Fractal processes', dealing with fine time structure, could have j2 finite but j1 = -infty. In practice there are always limits on the range: the data only has a finite length, limiting j2, and a finite resolution, limiting j1. The convention here is that j=0 corresponds to the finest resolution of the series itself, and thus j=1 to the finest series of wavelet detail coefficients.

Note: Although strictly speaking the Hurst parameter characterises exactly self-similar processes (H-SS processes), it is often also used to describe the LRD of the `derivatives' of such processes, and by extension `Hurst parameter values' of LRD processes are often quoted. It is common practice, for example, to label fGn by the H of the FBM from which it came. Care should be taken to avoid confusion, and we recommend that H be reserved for H-SS processes.

Stationarity

The tool is concerned with second order or `wide sense' stationarity, that is, it examines only first and second order statistics. The basic approach is to split the data into m equally spaced blocks; over each block some or all of the following quantities are examined: Mean, Variance, cf, alpha. The first nice feature is the confidence intervals on the mean estimates: theoretical asymptotic CI's are displayed based both on IID assumptions (blue), and LRD assumptions (red). The latter are often far larger, as they decrease at a slower rate than 1/n. The LRD confidence interval is a function of both alpha and cf, and both are calculated for the block in question, provided alpha is in the range (0,1) of LRD: otherwise the LRD CI's don't apply and are set equal to the IID ones.

The other feature is the incorporation of the test for the constancy of alpha, described in detail here. This test relies on several advantageous properties of the estimator, namely: that estimates are nearly unbiased, Gaussian, and with a known variance, and that correlations between estimates over adjacent blocks are very low. Such a simple situation leads to a relatively simple inference problem for which the optimal test (Uniformly Most Powerful Invariant test) is known. The test assumes that over each block the estimate means something, and also that the estimates are made over a common scaling range (otherwise the scaling would be different, even if the exponent values were the same). Thus m must be large enough to resolve any variations in alpha that may be present. If m is too large, however, the CI's will become so large that the test will always accept the null hypothesis: the power has become too low. Too large an m can also mean that the scales over which the scaling exists are no longer present; for example this can occur with LRD as the largest scales are lost. Thus in practice an experimental procedure must be followed which involves combining test results for different m values.

Initialisation for discrete time series

The Discrete Wavelet Transform on which the routines here are based, like the continuous wavelet transform, is defined on continuous time data only. It has nonetheless become common practice to use the DWT to analyse discrete time data, without asking what this might mean or what errors may be incurred. It is however possible to use the DWT to study the second order properties of discrete series directly in a rigorous way; see the document for details. The practical step needed is an initial discrete prefiltering of the data to obtain the correct approximation sequence which initialises the fast recursive algorithm calculating the DWT. The errors due to not initialising are concentrated on the first two octaves. In practice, except for nearly perfect scaling processes such as fGn and fARIMA(0,d,0), one does not normally need to perform it. See the note on this below concerning the output of "LDestimate".

Automatic selection of lower cutoff scale

The range of scales (j1,j2) over which a scaling phenomenon exists varies, but j1, where the LRD `begins', must be chosen. Because of the dyadic nature of the wavelet basis, there are ~ half as many points at scale j+1 as there were at the finer scale j; thus confidence intervals in the LD increase by ~ root 2 for each increase in j (see figure), and in the weighted regression far higher weight is given to the smaller scales. One outcome of this is that if one chooses a j1 which is too small, any reasonable goodness of fit measure will plummet. Thus a graph of Q(j1), the goodness of fit measure with j2 fixed, against j1, shows huge improvements as j1 increases until the alignment region is reached, when it stabilises, as can be seen below for the data set provided. This can be used as the basis of selecting the optimal j1 value, j1*. Even if the goodness of fit measure is based on Gaussianity, the method is robust, as it is based on this `huge improvement' phenomenon.

Example: LDestimate output for the test data set, a sample path of fractional Gaussian Noise with Hurst parameter H=0.8, analysed with a call to "LDestimate" on fgn8. In the left hand plot, we see the goodness of fit statistic Q(j1) as a function of j1 improve dramatically at j1=2; this is thereby recommended as the beginning of the scaling region. The slope of the straight line (red) over the range of scales selected, estimated using weighted regression, yields an accurate estimate for the LRD exponent.
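The j1 selection idea can be illustrated numerically. The sketch below is a simplified Python stand-in for the "newchoosej1"/"method6" pair, not the toolbox code: Q is computed as the chi-squared tail probability of the weighted regression residuals (approximated by the Wilson-Hilferty formula, to stay dependency-free), and j1* is taken as the smallest j1 for which Q exceeds a threshold.

```python
import math
import numpy as np

def chi2_sf(x, d):
    """Chi-squared tail probability via the Wilson-Hilferty normal approximation."""
    z = ((x / d) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * d))) / math.sqrt(2.0 / (9.0 * d))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def q_of_j1(j, y, var_y, j1):
    """Goodness of fit Q for a weighted linear fit of y against j using scales >= j1."""
    sel = j >= j1
    jj, yy, w = j[sel], y[sel], 1.0 / var_y[sel]
    jb = np.sum(w * jj) / np.sum(w)
    yb = np.sum(w * yy) / np.sum(w)
    slope = np.sum(w * (jj - jb) * (yy - yb)) / np.sum(w * (jj - jb) ** 2)
    resid = yy - (yb + slope * (jj - jb))
    return chi2_sf(float(np.sum(w * resid ** 2)), len(jj) - 2)

def choose_j1(j, y, var_y, q_min=0.05):
    """Smallest j1 whose Q exceeds q_min: the `huge improvement' rule, simplified."""
    for j1 in j[:-2]:  # keep at least 3 points in the fit
        if q_of_j1(j, y, var_y, j1) >= q_min:
            return int(j1)
    return int(j[-3])
```

On a synthetic diagram whose first octaves deviate from an otherwise good line, Q(j1) jumps from essentially zero to near one at the onset of alignment, and the rule returns that octave.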

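Given near-unbiased, Gaussian block estimates with known variances and negligible correlation, testing constancy of alpha reduces to a classical homogeneity problem. The sketch below is a generic inverse-variance chi-squared homogeneity test in Python, standing in for, but not identical to, the UMPI test implemented in the toolbox; the chi-squared tail is again approximated by the Wilson-Hilferty formula.

```python
import math

def constancy_test(alphas, variances):
    """Chi-squared test that block estimates share a common mean.

    Under the null, alphas[i] ~ N(alpha, variances[i]) independently;
    returns (statistic, significance level) with m-1 degrees of freedom."""
    m = len(alphas)
    w = [1.0 / v for v in variances]
    pooled = sum(wi * a for wi, a in zip(w, alphas)) / sum(w)
    stat = sum(wi * (a - pooled) ** 2 for wi, a in zip(w, alphas))
    d = m - 1
    # Wilson-Hilferty approximation to the chi-squared tail probability
    z = ((stat / d) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * d))) / math.sqrt(2.0 / (9.0 * d))
    return stat, 0.5 * math.erfc(z / math.sqrt(2.0))
```

A small significance level rejects constancy; the tool reports the critical level as a percentage, 100*(1 - significance level).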
Notes for use

Overview

The main function is "LDestimate" (LD for Logscale Diagram). First the Logscale Diagram is generated, and examined to find a lower cutoff scale j1 and an upper cutoff j2, where alignment (a straight line) is seen. These cutoffs should be experimented with to find a range where the regression fits the confidence intervals plotted on the Logscale Diagram well. (Initial values must be given in the argument list to "LDestimate", but these can then be changed interactively.) Judgement by eye of the alignment region (scaling range) is difficult, as the confidence intervals become very small at small scales; the statistic Q is a better guide. Q is the probability of observing the data given that the expectations of the variance estimates at each scale really do follow the linear form assumed by the regression. A value greater than say 0.05 is acceptable. The choice of j1 can also be made effectively using the "newchoosej1" function described below. For each alignment range chosen, the function outputs the estimate of the slope `alpha', which takes real values; this is sometimes rewritten as an 'H' (see the notes on "wantH" below). (For LRD of course j2 should always be the largest available given the length of data; for convenience, the function is 'shipped' with j2 set to the largest possible.)

Experimentation with the number of vanishing moments N of the wavelet is first needed to (a) ensure that the wavelet details are well defined (as H increases, higher values will be needed to ensure this; N=1 is sufficient for LRD), and (b) eliminate or decrease the influence of deterministic trends (such as linear trends or mean level variations) which may be present (see here and here). In either case one increases N until a stable Logscale Diagram is seen, though higher N takes a little longer to run.

Different kinds of scaling are possible; however the analysis procedure is the same in each case. It is up to the user to determine which kind of scaling is present, and therefore which parameter is appropriate. For example a value in (0,1) with alignment at large scales suggests LRD, as in the figure above. On the other hand alignment at all (or almost all) scales with alpha>1 suggests SS, though this may not be seen in real data.

Scale Parameter Estimation: LDestimate.m

"LDestimate" calls "initDWT_discrete" if requested to perform the initialisation for discrete data, then "wtspec" which performs the discrete wavelet decomposition (DWT) using Daubechies wavelets and estimates the variance of the wavelet details at each scale (until there are no points left), and finally performs the weighted regression through a call to "regrescomp", leading to the estimates and the plotting of the Logscale Diagram itself.

An internal variable "wantH" can be changed to set the scaling exponent output from being expressed as alpha to H, the Hurst parameter of the corresponding integrated process (alpha = 2H-1), but this H is not the H of strict self-similarity; it is merely a convention to rewrite alpha in this way for LRD processes! NOTE: in the case of LRD, alpha is the relevant parameter directly. Both `H's are outputted.

If the data is highly non-Gaussian then the confidence intervals, both for the points in the Logscale Diagram and for the final estimates, may be quite a bit smaller than they should be. The Q value will also then be ridiculously small, like 0.0000001 or worse! In this case one shouldn't take the confidence intervals literally, and the Q values can only be used in a relative way.

A warning: an advantage of the wavelet method is that estimates at different scales are almost decorrelated from each other; hence in the Logscale Diagram they will be seen to vary considerably about the regression line (particularly for small alpha, due to the automatic rescaling of the plot). This variation is natural and desirable and should not be interpreted as a lack of alignment.

Notes on the parameters needed for "LDestimate", and on the details of the output, can be found in the comment block at the beginning of the file LDestimate.m. Examples of use are also given there.

Correct Initialisation for discrete data: initDWT_discrete.m

The third last argument to "LDestimate" should be 1 if the initialisation for discrete series is desired; for example this should be used when analysing the fgn8.dat test series supplied. Wavelet and pre-filter coefficients for Daubechies1 (Haar) to Daubechies10 are supplied, precalculated and stored in the function initfilterDaub_DWT_discrete.m. Currently the length of the filter used is chosen automatically by the following rule: all of the stored filter is used, except that the filter is truncated if necessary to ensure that no more than one eighth of the data is lost (pre-filtering always results in an effective loss of data, due to 'edge effects'). Such truncation will not be necessary if the data set is very long: the optimal filter length is then likely to be longer than the longest available, and the automated choice will correctly use the entire stored filter. A filter length can be chosen directly if desired by changing the "filterlength" variable in the call to "initDWT_discrete" in "LDestimate".

Automatic lower scale chooser: newchoosej1.m

The function "newchoosej1" takes the information of a LD and calculates the goodness of fit Q(j1) values, then passes them to "method6" which applies the heuristic, robust to departures from Gaussianity, to return the optimal value j1*. A vector of j2 values can be inputted; j1* values for each are returned. The vector of j2 values can be changed by editing the call to "newchoosej1" in "LDestimate"; examples are given there, commented out, including one which covers all possible j2 values. It is convenient and natural to call "newchoosej1" from within "LDestimate", so the latter has an input parameter to allow this if desired, and the vector (one per j2 value selected) of j1* values is conveniently returned by "LDestimate". Either way, the j1* is not actually used in the LD calculated by "LDestimate". A future version will provide greater ability to compare different choices of scaling range.

Stationarity Tool: eda_staty.m (incorporating the constancy test for alpha)

The basic approach is to split the data into m equally spaced blocks. Over each block some or all of the following quantities are examined: Mean, Variance, cf, alpha, each selected by a logical input variable. For each statistic selected, the constancy test is automatically applied, and the critical level corresponding to the data, expressed as a percentage (100*(1 - significance level of data)), is plotted in the title; the value of alpha is plotted over the graph. For the mean, at the far left of the plot the sample value and CI, calculated over the entire series, are also given. If alpha is selected, alpha is transformed into values of related parameters, such as the Hurst parameter H, or the fractal dimension of the sample path (valid only if Gaussian) D, to help determine which kind of scaling is present. For example D makes no sense unless alpha is in (1,3), and H (the self-similarity parameter) makes no sense if alpha<1. The time series itself can be displayed; to save time, long time series are thinned before display. Similarly to "LDestimate", the input variable "discrete_init" can be set to allow the special initialisation for discrete series to be performed.

On-Line C code: LDestimate.c

The on-line code has very little documentation at present; see the comments in the head of the file. After unpacking the functions, just type make to compile (with gcc) the top level function. To run it on the test data on a Unix system one can simply pipe it in: LDestimate < fgn8.dat. Currently all of the input parameters are hardwired, but the program could easily be modified to read them more conveniently. Nothing is currently done with the outcome, but the output could be trapped and stored with little effort.

Last modified: May 18 15:33:41 2007