

Far East Journal of Mathematical Sciences (FJMS)
© 2013 Pushpa Publishing House, Allahabad, India
Published Online: March 2014
Available online at http://pphmj.com/journals/fjms.htm
Special Volume 2013, Part VI, Pages 563-576
(Devoted to articles on Comput. Sci., Info. Sci., Financial Manag. & Biol. Sci.)

AUTOMATIC BOUNDARY CORRECTION USING
EMPIRICAL MODE DECOMPOSITION AND
LOCAL LINEAR REGRESSION

Abobaker M. Jaber, Mohd Tahir Ismail and Alsaidi M. Altaher

School of Mathematical Sciences
Universiti Sains Malaysia
11800 Penang, Malaysia
e-mail: jaber3t@yahoo.co.uk
mtahir@cs.usm.my
assaedi76@yahoo.com

Abstract

This study proposes a new two-stage technique, empirical mode decomposition (EMD) with local linear (LL) regression, for boundary adjustment in EMD. By exploiting the advantages of LL regression, the proposed EMD-LL method is highly robust in the presence of the boundary problem. Detailed experiments compare EMD-LL with the classical EMD and show that the proposed method is more successful in reducing the boundary effects.

1. Introduction

Consider the regression model given by


Received: August 16, 2013; Revised: October 22, 2013; Accepted: November 1, 2013
2010 Mathematics Subject Classification: 62M10.
Keywords and phrases: empirical mode decomposition, local linear regression, bandwidth
selection, boundary problem.

$$y_i = f\!\left(\frac{i}{n}\right) + \varepsilon_i, \qquad i = 1, \ldots, n. \qquad (1)$$

Huang et al. [8] proposed a new nonparametric method known as empirical mode decomposition (EMD). EMD demonstrates great efficiency in handling unstable data behavior. However, this method suffers from the boundary problem (end or edge effect) (Huang et al. [8]). When using the classical EMD, the boundary effects are handled by imposing on $f$ some boundary assumptions (conditions), such as symmetry (mirror) and periodicity. Unfortunately, such assumptions are not always met, and in some cases the problem persists. In this study, we apply local polynomial regression within EMD as a preliminary step to overcome the boundary problem. One advantage of local linear (LL) regression over other nonparametric smoothers is its relatively good performance near the boundary (Fan [5]). Instead of using classical boundary solutions, such as the characteristic period extending method, the symmetric (mirror) extending method, and the similarity searching method, this study utilizes the advantages of LL regression to reduce the boundary effects of EMD. The proposed method involves two stages. At the first stage, LL regression is applied to the corrupted noisy data; the remaining series is then expected to be hidden in the residuals. At the second stage, EMD is applied to the residuals. The final estimate is the sum of the fitted estimates from the LL regression and EMD. Such a combination has previously been used to eliminate the boundary effects present in wavelet regression; see Oh et al. [14], Oh and Lee [13], and Altaher and Ismail [1]. The proposed EMD-LL method is therefore expected to demonstrate good estimation performance compared with the classical EMD.

The rest of this paper is organized as follows: Section 2 provides a brief background on EMD, LL regression, and several classical solutions to the boundary problem of EMD. Section 3 introduces the proposed method. Section 4 discusses the simulation study and its results. Section 5 presents the conclusion.

2. Literature Review

2.1. Empirical mode decomposition


EMD, proposed by Huang et al. [8], is a natural extension of and an alternative to traditional methods for analyzing non-linear and non-stationary signals, such as wavelet methods, Fourier methods, and empirical orthogonal functions (Blakely [3]). The main objective of EMD is to decompose data $(y_t)$ into small signals called intrinsic mode functions (IMFs). An IMF is a function whose upper and lower envelopes are symmetric, and whose numbers of zero crossings and extrema are equal or differ at most by one (Amine and El Abidine Guennoun [2]). The algorithm for extracting the IMFs of a given time series $y_t$ is called sifting and consists of the following steps:

I. Set an initial estimate for the residue, $r_0(t) = y_t$. Set $g_0(t) = r_{k-1}(t)$ and $i = 1$, with the IMF index $k = 1$.

II. Construct the lower envelope $I\mathrm{min}_{i-1}$ and the upper envelope $I\mathrm{max}_{i-1}$ of the signal using the cubic spline method.

III. Compute the mean $m_{i-1}$ by averaging the upper and lower envelopes: $m_{i-1} = [I\mathrm{max}_{i-1} + I\mathrm{min}_{i-1}]/2$.

IV. Subtract the mean from the signal, $g_i = g_{i-1} - m_{i-1}$, and set $i = i + 1$. Repeat steps (II) to (IV) until $g_i$ is an IMF; the $k$th IMF is then $\mathrm{IMF}_k = g_i$.

V. Update the residue $r_k(t) = r_{k-1}(t) - \mathrm{IMF}_k$. This residual component is treated as new data and subjected to the above process to extract the next $\mathrm{IMF}_{k+1}$.

VI. Repeat the above steps until the final residual component $r(t)$ becomes a monotonic function; denote the final residue estimate $\hat r(t)$.
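The sifting loop in steps I-VI can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: for brevity the envelopes are built by piecewise-linear interpolation between extrema, whereas step II calls for cubic splines, and a fixed iteration cap stands in for a formal IMF stopping criterion. All names (`sift`, `find_extrema`, `interp_envelope`) are ours.

```python
def find_extrema(x):
    """Indices of the local maxima and minima of a sequence."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima

def interp_envelope(idx, x):
    """Piecewise-linear envelope through the points (idx, x[idx]),
    pinned at both endpoints (a simplification of the spline step)."""
    n = len(x)
    knots = [0] + idx + [n - 1]
    env = [0.0] * n
    for a, b in zip(knots, knots[1:]):
        for t in range(a, b + 1):
            w = (t - a) / (b - a) if b > a else 0.0
            env[t] = (1 - w) * x[a] + w * x[b]
    return env

def sift(y, max_iter=50):
    """Steps II-IV: iterate g_i = g_{i-1} - m_{i-1} until the
    extrema run out or the iteration cap is reached."""
    g = list(y)
    for _ in range(max_iter):
        maxima, minima = find_extrema(g)
        if len(maxima) < 2 or len(minima) < 2:
            break  # (nearly) monotonic: no envelopes to build
        upper = interp_envelope(maxima, g)
        lower = interp_envelope(minima, g)
        mean = [(u + l) / 2 for u, l in zip(upper, lower)]  # step III
        g = [gi - mi for gi, mi in zip(g, mean)]            # step IV
    return g
```

Step V would then subtract the extracted IMF from the running residue and repeat `sift` on what remains, stopping once the residue is monotonic (step VI).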

Numerous methods have been employed to extract the trend of a series. Freehand and least squares methods are the most commonly used techniques, but the former depends on user experience, whereas the latter is difficult to use when the original series is extremely irregular (Liu et al. [10]; Fan et al. [7]). EMD is also an effective method for trend extraction (Deng et al. [4]).

2.2. Local polynomial regression


Local polynomial regression was initially proposed by Stone [16]. The present work follows the local polynomial regression of Fan and Gijbels [5, 6]. Suppose that $(X_i, Y_i)$, $i = 1, \ldots, n$, are bivariate random samples from equation (1), where $\varepsilon_i$ is a random error with zero mean and variance $\sigma^2$. Assume that the unknown mean function $m(x)$ is smooth and that the $(p+1)$th derivative of $m(x)$ exists at the point $x_0$. We may approximate $m(x)$ locally by a polynomial of order $p$; that is, for $x$ in a neighborhood of $x_0$, the Taylor expansion gives

$$m(x) \approx m(x_0) + m'(x_0)(x - x_0) + \cdots + \frac{m^{(p)}(x_0)}{p!}(x - x_0)^p.$$

Locally at the point $x_0$, we can then consider a weighted polynomial regression, which involves the minimization of

$$\sum_{i=1}^{n} \Big\{ Y_i - \sum_{j=0}^{p} \beta_j (X_i - x_0)^j \Big\}^2 K_h(X_i - x_0),$$

where $h$ is the bandwidth that controls the neighborhood size, and $K_h(\cdot) = K(\cdot/h)/h$, with $K(\cdot)$ a kernel function that assigns weights to the data points according to their distance from the center of the neighborhood. $K(\cdot)$ is usually taken to be a symmetric probability density function. The LL estimator is obtained with order $p = 1$; additional details can be found in Fan and Gijbels [6].
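Minimizing the weighted sum above with $p = 1$ has a simple closed form. The following is a plain-Python sketch under our own naming (`local_linear`; the Gaussian kernel is one concrete choice for $K$, not prescribed by the paper); it solves the 2x2 weighted least squares normal equations and returns $\beta_0$, the LL estimate at $x_0$.

```python
import math

def gauss_kernel(u):
    """Standard normal density, a common symmetric choice for K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def local_linear(x, y, x0, h):
    """LL estimate m_hat(x0): intercept of the locally weighted line."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        w = gauss_kernel((xi - x0) / h) / h   # K_h(X_i - x0)
        d = xi - x0
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    # Solve the 2x2 normal equations (X'WX) beta = X'Wy;
    # beta_0 is the fitted value at x0.
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det
```

Because the local model contains every straight line, this estimator reproduces linear functions exactly, at interior and boundary points alike; this design-adaptive property underlies its good boundary behavior (Fan [5]).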

2.2.1. Bandwidth selection

The practical performance of any nonparametric regression technique relies heavily on the smoothing parameter (bandwidth). Bandwidth selection is an important aspect of any nonparametric estimation; it amounts to determining the width of the local neighborhood needed to ensure a good local approximation (Mami [12]). Various methods for selecting the optimal bandwidth are available in the literature, among them classical and data-driven methods.

2.2.2. Direct plug-in bandwidth selection

In this study, we adopt a data-driven bandwidth selection method referred to as the direct plug-in bandwidth selector for the LL kernel estimator:

$$h_{\mathrm{MISE}} = C_1(K) \left[ \frac{\int v(x)\, dx}{n \int \{ m^{(2)}(x) \}^2 f(x)\, dx} \right]^{1/5},$$

where $C_1(K)$ is a pure constant depending only on the kernel function and on the order $p$ of the polynomial. The other unknown quantity in the optimal bandwidth formula is $\int_a^b v(x)\, dx = (b - a)\sigma^2$ (Ruppert et al. [15]).

2.3. Some boundary solutions in EMD

Several studies have attempted to solve the boundary problem of EMD. Wu and Riemenschneider [17] proposed using a ratio extension at the boundary instead of the traditional mirror extension. Deng et al. [4] applied a neural network to each IMF to restrain the end effect. Liu [11] provided an algorithm based on the sigma-pi neural network to extend signals before applying EMD. Lin et al. [9] proposed a new approach that combines mirror expansion with the extrapolation prediction of a regression function to solve the boundary problem. Their algorithm includes two steps: extrapolating the signal through support vector regression at both endpoints to form the primary expansion signal, and expanding the primary signal through extrema mirror expansion. EMD is then performed on the resulting signal, yielding reduced end effects.

3. Proposed Method

This section introduces in detail the proposed method, a combination of EMD and local linear regression (EMD-LL). The basic idea behind the proposed method is to estimate the underlying function $f$ by the sum of an EMD component $f_{\mathrm{EMD}}$ and an LL regression component $f_{\mathrm{LL}}$. That is,

$$\hat f_{\mathrm{LL.EMD}} = \hat f_{\mathrm{EMD}} + \hat f_{\mathrm{LL}}.$$

To obtain the estimate $\hat f_{\mathrm{LL.EMD}}$, we must estimate the two components $f_{\mathrm{EMD}}$ and $f_{\mathrm{LL}}$. The estimation process is as follows:

1. Apply LL regression to the corrupted noisy data $y_i$ and obtain the trend estimate $\hat f_{\mathrm{LL}}$.

2. Compute the residuals from the LL fit, $e_i = y_i - \hat f_{\mathrm{LL}}$.

3. As the remaining series is expected to be hidden in the residuals $e_i$, apply EMD to $e_i$ as follows:

I. Set an initial estimate for the residue, $r_0(t) = e_t$, and set $g_0(t) = r_{k-1}(t)$ and $i = 1$, with the IMF index $k = 1$.

II. Construct the lower envelope $I\mathrm{min}_{i-1}$ and the upper envelope $I\mathrm{max}_{i-1}$ of the signal using the cubic spline method.

III. Compute the mean by averaging the upper and lower envelopes: $m_{i-1} = [I\mathrm{max}_{i-1} + I\mathrm{min}_{i-1}]/2$.

IV. Subtract the mean from the signal, $g_i = g_{i-1} - m_{i-1}$, and set $i = i + 1$. Repeat steps (II) to (IV) until $g_i$ is an IMF; the $k$th IMF is then $\mathrm{IMF}_k = g_i$.

V. Update the residue $r_k(t) = r_{k-1}(t) - \mathrm{IMF}_k$. This residual component is treated as new data and subjected to the above process to extract the next $\mathrm{IMF}_{k+1}$.

VI. Repeat the above steps until the final residual component $r(t)$ becomes a monotonic function; denote the final residue estimate $\hat r(t)$.

4. The final estimate is the sum of the fitted estimates from the LL regression and EMD:

$$\hat f_{\mathrm{LL.EMD}} = \hat f_{\mathrm{LL}} + \hat r(t).$$

To ensure good performance of the proposed method, the bandwidth parameter should be selected with care using an efficient bandwidth criterion, such as the direct plug-in method.
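The two-stage recipe above amounts to a short composition. The sketch below takes the two components as callables, since any LL smoother (with a plug-in bandwidth) and any EMD routine returning the final residue $\hat r(t)$ could be plugged in; `ll_smooth`, `emd_trend`, and the deliberately trivial stand-ins at the end are our illustrations, not the paper's code.

```python
def emd_ll(y, ll_smooth, emd_trend):
    """Two-stage estimate: f_hat = f_hat_LL + EMD residue of the LL residuals."""
    f_ll = ll_smooth(y)                             # stage 1: LL trend fit
    resid = [yi - fi for yi, fi in zip(y, f_ll)]    # e_i = y_i - f_hat_LL
    r_hat = emd_trend(resid)                        # stage 2: EMD on e_i
    return [a + b for a, b in zip(f_ll, r_hat)]

# Trivial stand-ins, only to make the wiring concrete:
mean_smooth = lambda y: [sum(y) / len(y)] * len(y)  # "smoother" = global mean
zero_trend = lambda e: [0.0] * len(e)               # "EMD residue" = zero
```

For example, `emd_ll([1.0, 2.0, 3.0, 4.0], mean_smooth, zero_trend)` returns the stage-1 fit unchanged, since the stand-in EMD residue is zero.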

4. Simulation Study

Simulation is a technique for conducting experiments on a model. Compared with analytical methods, simulation is easily understandable and highly realistic, but only when used correctly. In this study, all computations and graphics were carried out using the software package R.

A simulation study was conducted to evaluate and compare the two methods, namely, the classical EMD of Huang et al. [8] and the proposed EMD-LL. Eight test functions with different features were used (Table 1).

Table 1. Formula of test functions used in simulation

$f_1 = \sin(\pi x) + \sin(2\pi x) + \sin(6\pi x) + 0.5x$

$f_2 = \sin(\pi x) - \sin(2\pi x) + \sin(6\pi x) + 0.5x$

$f_3 = \sin(\pi x) - \sin(2\pi x) - \sin(6\pi x) + 0.5x$

$f_4 = \sin(\pi x^3) + \sin(2\pi x^2) + \sin(6\pi x) + 0.5x$

$f_5 = 2 - 5x + 5e^{-500(x - 0.5)^2}$

$f_6 = \begin{cases} 10e^{-10x} + 2 & \text{if } x \le 0.5 \\ 3\cos(10\pi x) & \text{if } 0.5 < x < 1 \end{cases}$

$f_7 = 5e^{-100(x - 0.5)^2} + 5e^{-10x}$

$f_8 = 4/(1 + e^{-6.4x + 3.2}) + \begin{cases} \cos(100\pi x) & \text{if } 1/3 \le x \le 2/3 \\ 0 & \text{otherwise} \end{cases}$

For each function in each case, 1000 replications of sample size $n = 100$ were carried out. Approximately 10 observations were generated at each boundary side ($\tau = 10$). The mean squared error (MSE) was used as a numerical measure for assessing the quality of the estimates. The MSE was calculated for those observations that were at most 10 sample points away from the boundaries of the test functions:

$$\mathrm{MSE}_\Delta(\hat f) = \frac{1}{2\Delta} \sum_{i \in N(\Delta)} \{f(x_i) - \hat f(x_i)\}^2, \qquad (\Delta = 1, 2, \ldots, [n/2];\; x_i = i/n),$$

where $N(\Delta) = \{1, \ldots, \Delta, n - \Delta + 1, \ldots, n\}$.
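The boundary-restricted MSE above is straightforward to transcribe: average the squared errors over the $\Delta$ points nearest each end of the design (the index set $N(\Delta)$, shifted to 0-based indexing below). The function name is ours.

```python
def mse_boundary(f_true, f_hat, delta):
    """MSE over the delta observations nearest each boundary."""
    n = len(f_true)
    near = list(range(delta)) + list(range(n - delta, n))  # N(delta), 0-based
    return sum((f_true[i] - f_hat[i]) ** 2 for i in near) / (2 * delta)
```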

Tables 2-7 report the numerical results for the classical EMD with respect to the proposed method. MSE1 represents the classical EMD, and MSE2 represents the proposed method. Random errors $\varepsilon_i$ were drawn from the following three distributions, respectively:

a. The normal distribution with zero mean and unit variance.

b. Correlated noise from a first-order autoregressive model, AR(1), with parameter 0.5.

c. Heavy-tailed noise from the t distribution with three degrees of freedom.

Figure 1. The eight test functions used in simulation.

4.1. Results

Regardless of the boundary assumptions, test functions, and noise structures, the proposed method is consistently superior to the classical EMD under both periodic and symmetric conditions. The following tables summarize the results.

Table 2. MSEs of the classical EMD under periodic boundary conditions with respect to the proposed method, when the error follows a normal distribution with zero mean and unit variance. Sample size = 100; the bandwidth is selected for each simulation run using the direct plug-in method (dpill)

Test function    Classical EMD (MSE1)    Combined EMD-LL (MSE2)
1 0.61613559 0.02954017
2 1.40568406 0.03009826
3 0.56671031 0.03116958
4 1.12892732 0.03196148
5 0.80624152 0.03030510
6 7.12565735 0.08911224
7 1.62367717 0.02680036
8 0.03286615 0.01605067

Table 3. MSEs of the classical EMD under periodic boundary conditions with respect to the proposed method, when the error follows heavy-tailed noise from the t distribution with three degrees of freedom. Sample size = 100; the bandwidth is selected for each simulation run using the direct plug-in method (dpill)

Test function    Classical EMD (MSE1)    Combined EMD-LL (MSE2)
1 0.62037024 0.03412744
2 1.45878570 0.04321219
3 0.57069504 0.03326110
4 1.16887311 0.03321835
5 0.86660865 0.03233413
6 7.01218410 0.09135692
7 1.71649813 0.03304281
8 0.03549267 0.02097989

Table 4. MSEs of the classical EMD under periodic boundary conditions with respect to the proposed method, when the error follows correlated noise from the first-order autoregressive model AR(1) with parameter 0.5. Sample size = 100; the bandwidth is selected for each simulation run using the direct plug-in method (dpill)

Test function    Classical EMD (MSE1)    Combined EMD-LL (MSE2)
1 0.63136138 0.07736668
2 1.45512953 0.07623352
3 0.59694980 0.07184908
4 1.22839379 0.07913441
5 0.9363689 0.07744646
6 6.72913451 0.13978112
7 1.87913454 0.08180875
8 0.07877662 0.04823973

Table 5. MSEs of the classical EMD under symmetric (mirror) boundary conditions with respect to the proposed method, when the error follows a normal distribution with zero mean and unit variance. Sample size = 100; the bandwidth is selected for each simulation run using the direct plug-in method (dpill)

Test function    Classical EMD (MSE1)    Combined EMD-LL (MSE2)
1 0.5910341 0.02434128
2 0.8589354 0.02377881
3 0.5138173 0.02484214
4 0.7414731 0.02492693
5 1.825077 0.03626081
6 8.901597 0.06692215
7 4.78779 0.02604103
8 0.01480898 0.01137586

Table 6. MSEs of the classical EMD under symmetric (mirror) boundary conditions with respect to the proposed method, when the error follows a t distribution with three degrees of freedom. Sample size = 100; the bandwidth is selected for each simulation run using the direct plug-in method (dpill)

Test function    Classical EMD (MSE1)    Combined EMD-LL (MSE2)
1 0.5881285 0.02425279
2 0.9172949 0.02336476
3 0.5162273 0.03121375
4 0.7889681 0.02383214
5 1.550273 0.09336214
6 8.82306 0.06529681
7 4.540607 0.02456862
8 0.01874432 0.01374723
Table 7. MSEs of the classical EMD under symmetric (mirror) boundary conditions with respect to the proposed method, when the error follows correlated noise from the first-order autoregressive model AR(1) with parameter 0.5. Sample size = 100; the bandwidth is selected for each simulation run using the direct plug-in method (dpill)

Test function    Classical EMD (MSE1)    Combined EMD-LL (MSE2)
1 0.6008803 0.07351091
2 1.008095 0.07283503
3 0.5245122 0.07184672
4 0.8066677 0.07478093
5 1.374693 0.09225864
6 8.701722 0.1163033
7 4.565272 0.07638075
8 0.04763058 0.03940312

5. Conclusion

This study proposed a method that combines LL regression at the first stage with the classical EMD at the second stage. Results confirm the good performance of the proposed method compared with the classical boundary solutions, such as the symmetric and periodic conditions. The empirical performance of the proposed method in eliminating the boundary effects of EMD was tested through different numerical experiments involving simulation. Results from these experiments illustrate the improvement in EMD estimation in terms of MSE.

References

[1] Alsaidi M. Altaher and Mohd Tahir Ismail, Robust estimation for boundary
correction in wavelet regression, J. Stat. Comput. Simul. 82(10) (2012),
1531-1544.
[2] Amar Amine and Zine El Abidine Guennoun, Contribution of wavelet
transformation and empirical mode decomposition to measurement of US core
inflation, Appl. Math. Sci. 6(135) (2012), 6739-6752.
[3] Christopher D. Blakely, A fast empirical mode decomposition technique for
nonstationary nonlinear time series, Preprint submitted to Elsevier Science, 2005,
p. 3.
[4] Yongjun Deng, Wei Wang, Chengchun Qian, Zhong Wang and Dejun Dai,
Boundary-processing-technique in EMD method and Hilbert transform, Chinese
Sci. Bull. 46(11) (2001), 954-960.
[5] Jianqing Fan, Design-adaptive nonparametric regression, J. Amer. Statist. Assoc.
87(420) (1992), 998-1004.
[6] J. Fan and J. Gijbels, Local linear smoothers in regression function estimation,
Institute of Statistics Mimeo Series # 2055, University of North Carolina, Chapel
Hill, 1991.
[7] Y. Fan, J. W. Zhi and S. L. Yuan, Improvement in time-series trend analysis,
Computer Technology and Development 16 (2006), 82-84.
[8] Norden E. Huang, Zheng Shen, Steven R. Long, Manli C. Wu, Hsing H. Shih, Quanan Zheng, N.-C. Yen, C. C. Tung and Henry H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, Proc. R. Soc. Lond. A 454 (1998), 903-995.
[9] Da-Chao Lin, Zhang-Lin Guo, Feng-Ping An and Fan-Lei Zeng, Elimination of
end effects in empirical mode decomposition by mirror image coupled with
support vector regression, Mechanical Systems and Signal Processing 31 (2012),
13-28.
[10] Hui-ting Liu, Zhi-wei Ni and Jian-yang Li, Extracting trend of time series based
on improved empirical mode decomposition method, Advances in Data and Web
Management, Springer, 2007, pp. 341-349.
[11] Zhuofu Liu, A novel boundary extension approach for empirical mode decomposition, Intelligent Computing, Springer, 2006, pp. 299-304.
[12] A. Mami, Local polynomial regression with applications to both independent and longitudinal data, unpublished Ph.D. thesis, Dept. of Statistics, University of Manchester, 2002.
[13] Hee-Seok Oh and Thomas Lee, Hybrid local polynomial wavelet shrinkage:
wavelet regression with automatic boundary adjustment, Comput. Statist. Data
Anal. 48(4) (2005), 809-819.
[14] Hee-Seok Oh, Philippe Naveau and Geunghee Lee, Polynomial boundary
treatment for wavelet regression, Biometrika 88(1) (2001), 291-298.
[15] David Ruppert, Simon J. Sheather and Matthew P. Wand, An effective bandwidth
selector for local least squares regression, J. Amer. Statist. Assoc. 90(432) (1995),
1257-1270.
[16] Charles J. Stone, Consistent nonparametric regression, Ann. Statist. 5(4) (1977),
595-620.
[17] Qin Wu and Sherman D. Riemenschneider, Boundary extension and stop criteria
for empirical mode decomposition, Advances in Adaptive Data Analysis 2(02)
(2010), 157-169.
