

Far East Journal of Mathematical Sciences (FJMS)
© 2013 Pushpa Publishing House, Allahabad, India
Published Online: March 2014
Available online at http://pphmj.com/journals/fjms.htm
Special Volume 2013, Part VI, Pages 563-576
(Devoted to articles on Comput. Sci., Info. Sci., Financial Manag. & Biol. Sci.)

AUTOMATIC BOUNDARY CORRECTION USING
EMPIRICAL MODE DECOMPOSITION AND
LOCAL LINEAR REGRESSION

Abobaker M. Jaber, Mohd Tahir Ismail and Alsaidi M. Altaher

School of Mathematical Sciences
Universiti Sains Malaysia
11800 Penang, Malaysia
e-mail: jaber3t@yahoo.co.uk
mtahir@cs.usm.my
assaedi76@yahoo.com

Abstract

This study proposes a new two-stage technique, empirical mode decomposition (EMD) with local linear (LL) regression, for boundary adjustment in EMD. By exploiting the advantages of LL regression, the proposed EMD-LL method is highly robust in the presence of the boundary problem. Detailed experiments compare EMD-LL with the classical EMD and show that the proposed method is more successful in reducing the boundary effects.

1. Introduction

Consider the regression model given by


Received: August 16, 2013; Revised: October 22, 2013; Accepted: November 1, 2013
2010 Mathematics Subject Classification: 62M10.
Keywords and phrases: empirical mode decomposition, local linear regression, bandwidth
selection, boundary problem.

$$y_i = f\!\left(\frac{i}{n}\right) + \varepsilon_i, \qquad i = 1, \ldots, n. \qquad (1)$$

Huang et al. [8] proposed a new nonparametric method known as empirical mode decomposition (EMD). EMD demonstrates great efficiency in handling unstable data behavior. However, this method suffers from the boundary problem (end or edge effect) (Huang et al. [8]). When using the classical EMD, the boundary effects are handled by imposing on $f$ some boundary assumptions (conditions), such as symmetry (mirror) and periodicity. Unfortunately, such assumptions are not always met, and in some cases the problem persists. In this study, we apply local polynomial regression within EMD as a preliminary step to overcome the boundary problem. One advantage of local linear (LL) regression over other nonparametric smoothers is its relatively good performance near the boundary (Fan [5]). Instead of using classical boundary solutions, such as the characteristic period extending method, the symmetric (mirror) extending method, and the similarity searching method, this study utilizes the advantages of LL regression to reduce the boundary effects of EMD. The proposed method involves two stages. At the first stage, LL regression is applied to the corrupted noisy data; the remaining series is then expected to be hidden in the residuals. At the second stage, EMD is applied to the residuals. The final estimate is the sum of the fitted estimates from the LL regression and EMD. Such a combination has previously been used to eliminate the boundary effects present in wavelet regression; see Oh et al. [14], Oh and Lee [13], and Altaher and Ismail [1]. The proposed EMD-LL method is therefore expected to demonstrate good estimation performance compared with the classical EMD.

The rest of this paper is organized as follows: Section 2 provides a brief background on EMD, LL regression, and several classical solutions to the boundary problem of EMD. Section 3 introduces the proposed method. Section 4 discusses the simulation study and its results. Section 5 presents the conclusion.

2. Literature Review

2.1. Empirical mode decomposition


EMD, proposed by Huang et al. [8], is a natural extension of and an alternative to traditional methods for analyzing non-linear and non-stationary signals, such as wavelet methods, Fourier methods, and empirical orthogonal functions (Blakely [3]). The main objective of EMD is to decompose data $(y_t)$ into small signals called intrinsic mode functions (IMFs). An IMF is a function whose upper and lower envelopes are symmetric, and whose numbers of zero crossings and extrema are equal or differ at most by one (Amine and El Abidine Guennoun [2]). The algorithm for extracting the IMFs of a given time series $y_t$ is called sifting and consists of the following steps:

I. Set an initial estimate for the residue, $r_0(t) = y_t$. Set $g_0(t) = r_{k-1}(t)$ and $i = 1$, with the IMF index $k = 1$.

II. Construct the lower envelope $I\mathrm{min}_{i-1}$ and the upper envelope $I\mathrm{max}_{i-1}$ of the signal using the cubic spline method.

III. Compute the mean $m_{i-1}$ by averaging the upper and lower envelopes: $m_{i-1} = [I\mathrm{max}_{i-1} + I\mathrm{min}_{i-1}]/2$.

IV. Subtract the mean from the signal, $g_i = g_{i-1} - m_{i-1}$, and set $i = i + 1$. Repeat steps (II) to (IV) until $g_i$ is an IMF; the $k$th IMF is then $\mathrm{IMF}_k = g_i$.

V. Update the residue $r_k(t) = r_{k-1}(t) - \mathrm{IMF}_k$. This residual component is treated as new data and subjected to the above process to extract the next $\mathrm{IMF}_{k+1}$.

VI. Repeat the above steps until the final residual component $r(t)$ becomes a monotonic function; denote the final residue estimate $\hat r(t)$.
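The sifting loop in steps I-VI can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: for brevity the envelopes are built by piecewise-linear interpolation between extrema, whereas step II calls for cubic splines, and a fixed iteration cap stands in for a formal IMF stopping criterion. All names (`sift`, `find_extrema`, `interp_envelope`) are ours.

```python
def find_extrema(x):
    """Indices of the local maxima and minima of a sequence."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima

def interp_envelope(idx, x):
    """Piecewise-linear envelope through the points (idx, x[idx]),
    pinned at both endpoints (a simplification of the spline step)."""
    n = len(x)
    knots = [0] + idx + [n - 1]
    env = [0.0] * n
    for a, b in zip(knots, knots[1:]):
        for t in range(a, b + 1):
            w = (t - a) / (b - a) if b > a else 0.0
            env[t] = (1 - w) * x[a] + w * x[b]
    return env

def sift(y, max_iter=50):
    """Steps II-IV: iterate g_i = g_{i-1} - m_{i-1} until the
    extrema run out or the iteration cap is reached."""
    g = list(y)
    for _ in range(max_iter):
        maxima, minima = find_extrema(g)
        if len(maxima) < 2 or len(minima) < 2:
            break  # (nearly) monotonic: no envelopes to build
        upper = interp_envelope(maxima, g)
        lower = interp_envelope(minima, g)
        mean = [(u + l) / 2 for u, l in zip(upper, lower)]  # step III
        g = [gi - mi for gi, mi in zip(g, mean)]            # step IV
    return g
```

Step V would then subtract the extracted IMF from the running residue and repeat `sift` on what remains, stopping once the residue is monotonic (step VI).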

Numerous methods have been employed to extract the trend of a series. Freehand and least squares methods are the most commonly used techniques, but the former depends on user experience, whereas the latter is difficult to use when the original series is extremely irregular (Liu et al. [10]; Fan et al. [7]). EMD is also an effective method for trend extraction (Deng et al. [4]).

2.2. Local polynomial regression


Local polynomial regression was initially proposed by Stone [16]. The present work follows the local polynomial regression of Fan and Gijbels [5, 6]. Suppose that $(X_i, Y_i)$, $i = 1, \ldots, n$, are bivariate random samples from equation (1), where $\varepsilon_i$ is a random error with zero mean and variance $\sigma^2$. Assume that the unknown mean function $m(x)$ is smooth and that the $(p+1)$th derivative of $m(x)$ exists at the point $x_0$. We may approximate $m(x)$ locally by a polynomial of order $p$; that is, for $x$ in a neighborhood of $x_0$, the Taylor expansion gives

$$m(x) \approx m(x_0) + m'(x_0)(x - x_0) + \cdots + \frac{m^{(p)}(x_0)}{p!}(x - x_0)^p.$$

Locally at the point $x_0$, we can then consider a weighted polynomial regression, which involves the minimization of

$$\sum_{i=1}^{n} \Big\{ Y_i - \sum_{j=0}^{p} \beta_j (X_i - x_0)^j \Big\}^2 K_h(X_i - x_0),$$

where $h$ is the bandwidth that controls the neighborhood size, and $K_h(\cdot) = K(\cdot/h)/h$, with $K(\cdot)$ a kernel function that assigns weights to the data points according to their distance from the center of the neighborhood. $K(\cdot)$ is usually taken to be a symmetric probability density function. The LL estimator is obtained with order $p = 1$; additional details can be found in Fan and Gijbels [6].
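Minimizing the weighted sum above with $p = 1$ has a simple closed form. The following is a plain-Python sketch under our own naming (`local_linear`; the Gaussian kernel is one concrete choice for $K$, not prescribed by the paper); it solves the 2x2 weighted least squares normal equations and returns $\beta_0$, the LL estimate at $x_0$.

```python
import math

def gauss_kernel(u):
    """Standard normal density, a common symmetric choice for K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def local_linear(x, y, x0, h):
    """LL estimate m_hat(x0): intercept of the locally weighted line."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        w = gauss_kernel((xi - x0) / h) / h   # K_h(X_i - x0)
        d = xi - x0
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    # Solve the 2x2 normal equations (X'WX) beta = X'Wy;
    # beta_0 is the fitted value at x0.
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det
```

Because the local model contains every straight line, this estimator reproduces linear functions exactly, at interior and boundary points alike; this design-adaptive property underlies its good boundary behavior (Fan [5]).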

2.2.1. Bandwidth selection

The practical performance of any nonparametric regression technique relies heavily on the smoothing parameter (bandwidth). Bandwidth selection is an important aspect of any nonparametric estimation; it amounts to determining the width of the local neighborhood needed to ensure a good local approximation (Mami [12]). Various methods for selecting the optimal bandwidth are available in the literature, among them classical and data-driven methods.

2.2.2. Direct plug-in bandwidth selection

In this study, we adopt a data-driven bandwidth selection method referred to as the direct plug-in bandwidth selector for the LL kernel estimator:

$$h_{\mathrm{MISE}} = C_1(K) \left[ \frac{\int v(x)\, dx}{n \int \{ m^{(2)}(x) \}^2 f(x)\, dx} \right]^{1/5},$$

where $C_1(K)$ is a pure constant depending only on the kernel function and on the order $p$ of the polynomial. The other unknown quantity in the optimal bandwidth formula is $\int_a^b v(x)\, dx = (b - a)\sigma^2$ (Ruppert et al. [15]).

2.3. Some boundary solutions in EMD

Several studies have attempted to solve the boundary problem of EMD. Wu and Riemenschneider [17] proposed using a ratio extension at the boundary instead of the traditional mirror extension. Deng et al. [4] applied a neural network to each IMF to restrain the end effect. Liu [11] provided an algorithm based on the sigma-pi neural network to extend signals before applying EMD. Lin et al. [9] proposed a new approach that combines mirror expansion with the extrapolation prediction of a regression function to solve the boundary problem. Their algorithm includes two steps: extrapolating the signal through support vector regression at both endpoints to form the primary expansion signal, and expanding the primary signal through extrema mirror expansion. EMD is then performed on the resulting signal, yielding reduced end effects.

3. Proposed Method

This section introduces in detail the proposed method, a combination of EMD and local linear regression (EMD-LL). The basic idea behind the proposed method is to estimate the underlying function $f$ by the sum of an EMD component $f_{\mathrm{EMD}}$ and an LL regression component $f_{\mathrm{LL}}$. That is,

$$\hat f_{\mathrm{LL.EMD}} = \hat f_{\mathrm{EMD}} + \hat f_{\mathrm{LL}}.$$

To obtain the estimate $\hat f_{\mathrm{LL.EMD}}$, we must estimate the two components $f_{\mathrm{EMD}}$ and $f_{\mathrm{LL}}$. The estimation process is as follows:

1. Apply LL regression to the corrupted noisy data $y_i$ and obtain the trend estimate $\hat f_{\mathrm{LL}}$.

2. Compute the residuals from the LL fit, $e_i = y_i - \hat f_{\mathrm{LL}}$.

3. As the remaining series is expected to be hidden in the residuals $e_i$, apply EMD to $e_i$ as follows:

I. Set an initial estimate for the residue, $r_0(t) = e_t$, and set $g_0(t) = r_{k-1}(t)$ and $i = 1$, with the IMF index $k = 1$.

II. Construct the lower envelope $I\mathrm{min}_{i-1}$ and the upper envelope $I\mathrm{max}_{i-1}$ of the signal using the cubic spline method.

III. Compute the mean by averaging the upper and lower envelopes: $m_{i-1} = [I\mathrm{max}_{i-1} + I\mathrm{min}_{i-1}]/2$.

IV. Subtract the mean from the signal, $g_i = g_{i-1} - m_{i-1}$, and set $i = i + 1$. Repeat steps (II) to (IV) until $g_i$ is an IMF; the $k$th IMF is then $\mathrm{IMF}_k = g_i$.

V. Update the residue $r_k(t) = r_{k-1}(t) - \mathrm{IMF}_k$. This residual component is treated as new data and subjected to the above process to extract the next $\mathrm{IMF}_{k+1}$.

VI. Repeat the above steps until the final residual component $r(t)$ becomes a monotonic function; denote the final residue estimate $\hat r(t)$.

4. The final estimate is the sum of the fitted estimates from the LL regression and EMD:

$$\hat f_{\mathrm{LL.EMD}} = \hat f_{\mathrm{LL}} + \hat r(t).$$

To ensure good performance of the proposed method, the bandwidth parameter should be selected with care using an efficient bandwidth criterion, such as the direct plug-in method.
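The two-stage recipe above amounts to a short composition. The sketch below takes the two components as callables, since any LL smoother (with a plug-in bandwidth) and any EMD routine returning the final residue $\hat r(t)$ could be plugged in; `ll_smooth`, `emd_trend`, and the deliberately trivial stand-ins at the end are our illustrations, not the paper's code.

```python
def emd_ll(y, ll_smooth, emd_trend):
    """Two-stage estimate: f_hat = f_hat_LL + EMD residue of the LL residuals."""
    f_ll = ll_smooth(y)                             # stage 1: LL trend fit
    resid = [yi - fi for yi, fi in zip(y, f_ll)]    # e_i = y_i - f_hat_LL
    r_hat = emd_trend(resid)                        # stage 2: EMD on e_i
    return [a + b for a, b in zip(f_ll, r_hat)]

# Trivial stand-ins, only to make the wiring concrete:
mean_smooth = lambda y: [sum(y) / len(y)] * len(y)  # "smoother" = global mean
zero_trend = lambda e: [0.0] * len(e)               # "EMD residue" = zero
```

For example, `emd_ll([1.0, 2.0, 3.0, 4.0], mean_smooth, zero_trend)` returns the stage-1 fit unchanged, since the stand-in EMD residue is zero.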

4. Simulation Study

Simulation is a technique for conducting experiments on a model. Compared with analytical methods, simulation is easily understandable and highly realistic, but only when used correctly. In this study, all computations and graphics were carried out using the software package R.

A simulation study was conducted to evaluate and compare the two methods, namely, the classical EMD of Huang et al. [8] and the proposed EMD-LL. Eight test functions with different features were used (Table 1).

Table 1. Formula of test functions used in simulation

$f_1 = \sin(\pi x) + \sin(2\pi x) + \sin(6\pi x) + 0.5x$

$f_2 = \sin(\pi x) - \sin(2\pi x) + \sin(6\pi x) + 0.5x$

$f_3 = \sin(\pi x) - \sin(2\pi x) - \sin(6\pi x) + 0.5x$

$f_4 = \sin(\pi x^3) + \sin(2\pi x^2) + \sin(6\pi x) + 0.5x$

$f_5 = 2 - 5x + 5e^{-500(x - 0.5)^2}$

$f_6 = \begin{cases} 10e^{-10x} + 2 & \text{if } x \le 0.5 \\ 3\cos(10\pi x) & \text{if } 0.5 < x < 1 \end{cases}$

$f_7 = 5e^{-100(x - 0.5)^2} + 5e^{-10x}$

$f_8 = 4/(1 + e^{-6.4x + 3.2}) + \begin{cases} \cos(100\pi x) & \text{if } 1/3 \le x \le 2/3 \\ 0 & \text{otherwise} \end{cases}$

For each function in each case, 1000 replications of sample size $n = 100$ were carried out. Approximately 10 observations were generated at each boundary side ($\tau = 10$). The mean squared error (MSE) was used as a numerical measure for assessing the quality of the estimates. The MSE was calculated for those observations that were at most 10 sample points away from the boundaries of the test functions:

$$\mathrm{MSE}_\Delta(\hat f) = \frac{1}{2\Delta} \sum_{i \in N(\Delta)} \{f(x_i) - \hat f(x_i)\}^2, \qquad (\Delta = 1, 2, \ldots, [n/2];\; x_i = i/n),$$

where $N(\Delta) = \{1, \ldots, \Delta, n - \Delta + 1, \ldots, n\}$.
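The boundary-restricted MSE above is straightforward to transcribe: average the squared errors over the $\Delta$ points nearest each end of the design (the index set $N(\Delta)$, shifted to 0-based indexing below). The function name is ours.

```python
def mse_boundary(f_true, f_hat, delta):
    """MSE over the delta observations nearest each boundary."""
    n = len(f_true)
    near = list(range(delta)) + list(range(n - delta, n))  # N(delta), 0-based
    return sum((f_true[i] - f_hat[i]) ** 2 for i in near) / (2 * delta)
```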

Tables 2-7 report the numerical results for the classical EMD with respect to the proposed method. MSE1 represents the classical EMD, and MSE2 represents the proposed method. Random errors $\varepsilon_i$ were drawn from the following three distributions, respectively:

a. The normal distribution with zero mean and unit variance.

b. Correlated noise from a first-order autoregressive model, AR(1), with parameter 0.5.

c. Heavy-tailed noise from the t distribution with three degrees of freedom.

Figure 1. The eight test functions used in simulation.

4.1. Results

Regardless of the boundary assumptions, test functions, and noise structures, the proposed method is consistently superior to the classical EMD under both periodic and symmetric conditions. The following tables summarize the results.

Table 2. MSEs of the classical EMD under periodic boundary conditions with respect to the proposed method, when the error follows a normal distribution with zero mean and unit variance. Sample size = 100; the bandwidth is selected for each simulation run using the direct plug-in method (dpill)

Test function    Classical EMD (MSE1)    Combined EMD-LL (MSE2)
1 0.61613559 0.02954017
2 1.40568406 0.03009826
3 0.56671031 0.03116958
4 1.12892732 0.03196148
5 0.80624152 0.03030510
6 7.12565735 0.08911224
7 1.62367717 0.02680036
8 0.03286615 0.01605067

Table 3. MSEs of the classical EMD under periodic boundary conditions with respect to the proposed method, when the error follows heavy-tailed noise from the t distribution with three degrees of freedom. Sample size = 100; the bandwidth is selected for each simulation run using the direct plug-in method (dpill)

Test function    Classical EMD (MSE1)    Combined EMD-LL (MSE2)
1 0.62037024 0.03412744
2 1.45878570 0.04321219
3 0.57069504 0.03326110
4 1.16887311 0.03321835
5 0.86660865 0.03233413
6 7.01218410 0.09135692
7 1.71649813 0.03304281
8 0.03549267 0.02097989

Table 4. MSEs of the classical EMD under periodic boundary conditions with respect to the proposed method, when the error follows correlated noise from the first-order autoregressive model AR(1) with parameter 0.5. Sample size = 100; the bandwidth is selected for each simulation run using the direct plug-in method (dpill)

Test function    Classical EMD (MSE1)    Combined EMD-LL (MSE2)
1 0.63136138 0.07736668
2 1.45512953 0.07623352
3 0.59694980 0.07184908
4 1.22839379 0.07913441
5 0.9363689 0.07744646
6 6.72913451 0.13978112
7 1.87913454 0.08180875
8 0.07877662 0.04823973

Table 5. MSEs of the classical EMD under symmetric (mirror) boundary conditions with respect to the proposed method, when the error follows a normal distribution with zero mean and unit variance. Sample size = 100; the bandwidth is selected for each simulation run using the direct plug-in method (dpill)

Test function    Classical EMD (MSE1)    Combined EMD-LL (MSE2)
1 0.5910341 0.02434128
2 0.8589354 0.02377881
3 0.5138173 0.02484214
4 0.7414731 0.02492693
5 1.825077 0.03626081
6 8.901597 0.06692215
7 4.78779 0.02604103
8 0.01480898 0.01137586

Table 6. MSEs of the classical EMD under symmetric (mirror) boundary conditions with respect to the proposed method, when the error follows a t distribution with three degrees of freedom. Sample size = 100; the bandwidth is selected for each simulation run using the direct plug-in method (dpill)

Test function    Classical EMD (MSE1)    Combined EMD-LL (MSE2)
1 0.5881285 0.02425279
2 0.9172949 0.02336476
3 0.5162273 0.03121375
4 0.7889681 0.02383214
5 1.550273 0.09336214
6 8.82306 0.06529681
7 4.540607 0.02456862
8 0.01874432 0.01374723
Table 7. MSEs of the classical EMD under symmetric (mirror) boundary conditions with respect to the proposed method, when the error follows correlated noise from the first-order autoregressive model AR(1) with parameter 0.5. Sample size = 100; the bandwidth is selected for each simulation run using the direct plug-in method (dpill)

Test function    Classical EMD (MSE1)    Combined EMD-LL (MSE2)
1 0.6008803 0.07351091
2 1.008095 0.07283503
3 0.5245122 0.07184672
4 0.8066677 0.07478093
5 1.374693 0.09225864
6 8.701722 0.1163033
7 4.565272 0.07638075
8 0.04763058 0.03940312

5. Conclusion

This study proposed a method that combines LL regression at the first stage with the classical EMD at the second stage. Results confirm the good performance of the proposed method compared with the classical boundary solutions, such as the symmetric and periodic conditions. The empirical performance of the proposed method in eliminating the boundary effects of EMD was tested through different numerical experiments involving simulation. Results from these experiments illustrate the improvement in EMD estimation in terms of MSE.

References

[1] Alsaidi M. Altaher and Mohd Tahir Ismail, Robust estimation for boundary
correction in wavelet regression, J. Stat. Comput. Simul. 82(10) (2012),
1531-1544.
[2] Amar Amine and Zine El Abidine Guennoun, Contribution of wavelet
transformation and empirical mode decomposition to measurement of US core
inflation, Appl. Math. Sci. 6(135) (2012), 6739-6752.
[3] Christopher D. Blakely, A fast empirical mode decomposition technique for
nonstationary nonlinear time series, Preprint submitted to Elsevier Science, 2005,
p. 3.
[4] Yongjun Deng, Wei Wang, Chengchun Qian, Zhong Wang and Dejun Dai,
Boundary-processing-technique in EMD method and Hilbert transform, Chinese
Sci. Bull. 46(11) (2001), 954-960.
[5] Jianqing Fan, Design-adaptive nonparametric regression, J. Amer. Statist. Assoc.
87(420) (1992), 998-1004.
[6] J. Fan and J. Gijbels, Local linear smoothers in regression function estimation,
Institute of Statistics Mimeo Series # 2055, University of North Carolina, Chapel
Hill, 1991.
[7] Y. Fan, J. W. Zhi and S. L. Yuan, Improvement in time-series trend analysis,
Computer Technology and Development 16 (2006), 82-84.
[8] Norden E. Huang, Zheng Shen, Steven R. Long, Manli C. Wu, Hsing H. Shih, Quanan Zheng, N.-C. Yen, C. C. Tung and Henry H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, Proc. R. Soc. Lond. A 454 (1998), 903-995.
[9] Da-Chao Lin, Zhang-Lin Guo, Feng-Ping An and Fan-Lei Zeng, Elimination of
end effects in empirical mode decomposition by mirror image coupled with
support vector regression, Mechanical Systems and Signal Processing 31 (2012),
13-28.
[10] Hui-ting Liu, Zhi-wei Ni and Jian-yang Li, Extracting trend of time series based
on improved empirical mode decomposition method, Advances in Data and Web
Management, Springer, 2007, pp. 341-349.
[11] Zhuofu Liu, A novel boundary extension approach for empirical mode decomposition, Intelligent Computing, Springer, 2006, pp. 299-304.
[12] A. Mami, Local polynomial regression with applications to both independent and longitudinal data, unpublished Ph.D. thesis, Dept. of Statistics, University of Manchester, 2002.
[13] Hee-Seok Oh and Thomas Lee, Hybrid local polynomial wavelet shrinkage:
wavelet regression with automatic boundary adjustment, Comput. Statist. Data
Anal. 48(4) (2005), 809-819.
[14] Hee-Seok Oh, Philippe Naveau and Geunghee Lee, Polynomial boundary
treatment for wavelet regression, Biometrika 88(1) (2001), 291-298.
[15] David Ruppert, Simon J. Sheather and Matthew P. Wand, An effective bandwidth
selector for local least squares regression, J. Amer. Statist. Assoc. 90(432) (1995),
1257-1270.
[16] Charles J. Stone, Consistent nonparametric regression, Ann. Statist. 5(4) (1977),
595-620.
[17] Qin Wu and Sherman D. Riemenschneider, Boundary extension and stop criteria
for empirical mode decomposition, Advances in Adaptive Data Analysis 2(02)
(2010), 157-169.
