
Journal Pre-proof

Statistics Variable Kernel Width for Maximum Correntropy Criterion Algorithm

Shuyong Zhou , Haiquan Zhao

PII: S0165-1684(20)30132-8
DOI: https://doi.org/10.1016/j.sigpro.2020.107589
Reference: SIGPRO 107589

To appear in: Signal Processing

Received date: 6 October 2019


Revised date: 3 March 2020
Accepted date: 16 March 2020

Please cite this article as: Shuyong Zhou , Haiquan Zhao , Statistics Variable Ker-
nel Width for Maximum Correntropy Criterion Algorithm, Signal Processing (2020), doi:
https://doi.org/10.1016/j.sigpro.2020.107589

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2020 Published by Elsevier B.V.


Highlights

- This paper summarizes several variable kernel width maximum correntropy criterion (MCC) algorithms and discusses their basic principles. A close relationship between these algorithms and the LMS algorithm is analyzed and established.
- A new statistics variable kernel width MCC (SVKW-MCC) algorithm is then proposed on the basis of previous variable kernel width algorithms.
- The SVKW-MCC algorithm addresses the shortcomings of some well-known variable kernel width algorithms: it computes the kernel width by a statistical method and eliminates the abnormal errors caused by impulsive noise.
- The stability and steady-state mean-square performance of the proposed algorithm are analyzed and verified by experiments.
Statistics Variable Kernel Width for Maximum Correntropy Criterion Algorithm


Shuyong Zhou, Haiquan Zhao*

Abstract: Since the maximum correntropy criterion (MCC) algorithm with a constant kernel width leads to a trade-off between the convergence rate and the steady-state misalignment, various adaptive kernel width MCC algorithms have been derived to solve this problem. However, the superior performance of these algorithms depends mainly on a specific data range, or comes at the cost of complicated calculations and parameter settings. Thus, this paper proposes a statistics variable kernel width MCC (SVKW-MCC) algorithm to overcome these problems. Specifically, the proposed algorithm calculates the mean and variance of the error signal and removes the data that deviate significantly from the mean. The mean and variance are then recalculated after removing these abnormal data, and the new kernel width is computed from the recalculated mean and variance. Simulation results in system identification and echo cancellation scenarios show that the proposed algorithm outperforms the existing variable kernel width methods. Moreover, the stability and steady-state mean-square performance of the proposed algorithm are analyzed and verified by experiments.

Keywords: Maximum correntropy criterion, variable kernel width, impulsive interferences, statistics variable kernel width, steady-state excess mean square error.

1. Introduction

In recent years, information theoretic learning (ITL) [1, 2] has been applied in non-Gaussian signal processing, especially in impulsive noise environments. The minimum entropy [3-5] and maximum entropy [6-15] criteria are the most widely used in ITL. The maximum correntropy criterion (MCC) is popular for its simplicity and robustness.

Correntropy is defined as the probability of how similar [16] two random variables are in a neighborhood of the joint space controlled by the kernel width; i.e., the kernel width acts as a zoom lens [9], controlling the "observation window" in which similarity is assessed. The smaller the kernel width, the more sensitive the MCC algorithm is to observation errors; conversely, when the kernel width is large enough, the algorithm degrades to the LMS algorithm [22]. The kernel width of MCC therefore leads to a trade-off between learning speed and steady-state accuracy. The selection of a suitable kernel width

Shuyong Zhou, Haiquan Zhao are with the Key Laboratory of Magnetic Suspension Technology and Maglev Vehicle, and the National Rail Transportation
Electrification and Automation Engineering Technology Research Center under Grant NEEC-2019-A02, Ministry of Education, and the School of Electrical
Engineering, Southwest Jiaotong University, Chengdu, 610031, China.
* Corresponding author
E-mail addresses: 2241903@qq.com; hqzhao_swjtu@126.com
is critical to MCC-based algorithms.

In order to resolve the contradiction between steady-state performance and a fast convergence rate, several adaptive kernel width methods have been proposed in [16-21], such as the switch kernel width maximum correntropy criterion (SMCC) algorithm [16], the variable kernel width maximum correntropy criterion (VKW-MCC) algorithm [18], the adaptive kernel width maximum correntropy criterion (AMCC) algorithm [17], and the improved variable kernel width maximum correntropy criterion (IVKW-MCC) algorithm [21]. However, existing kernel width selection methods are not entirely satisfactory, and the aforementioned algorithms have obvious weaknesses: the SMCC and AMCC algorithms perform well only in certain environments, the parameter setting of the VKW-MCC algorithm is very complicated, and the IVKW-MCC algorithm is computationally expensive.

This paper summarizes several variable kernel width MCC algorithms and discusses their basic principles. A close relationship between these algorithms and the LMS algorithm is analyzed and established [22]. Then a new statistics variable kernel width MCC (SVKW-MCC) algorithm is proposed on the basis of previous variable kernel width algorithms. Afterwards, the convergence performance of the algorithm is analyzed, and the steady-state excess mean square error (EMSE) of the SVKW-MCC algorithm is studied based on the energy conservation relation [24-28]. Simulations in system identification and echo cancellation scenarios that include non-Gaussian impulsive interferences show that the proposed algorithm outperforms some well-known variable kernel width algorithms.

2. Review of variable kernel width MCC algorithms

In the MCC algorithm, correntropy is a nonlinear local similarity measure between two random variables X and Y in kernel space [6-9],

$$V(X,Y) = E\left[k(X,Y)\right] = \int k(x,y)\,dF_{X,Y}(x,y) \qquad (1)$$

where E[·] denotes the expectation operator, $F_{X,Y}(x,y)$ is the joint distribution function of X and Y, and k(·,·) is a symmetric positive definite Gaussian kernel controlled by the kernel width $\sigma_0$, defined as [8]:

$$k(x,y) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left(-\frac{(x-y)^2}{2\sigma_0^2}\right). \qquad (2)$$

Like the MSE criterion, the cost function of the MCC algorithm can be defined as [6, 9],

$$J_{MCC}(w_n) = E\left[\exp\left(-\frac{\left(d_n - x_n^T w_n\right)^2}{2\sigma_0^2}\right)\right] \qquad (3)$$
where $e_n = d_n - x_n^T w_n$ is the a priori output error, $w_n$ is the estimate at iteration time n of the unknown system $w_{opt}$, $w_{opt}$ is an L×1 weight vector, E[·] denotes the expectation operator, and $d_n$ is defined as:

$$d_n = x_n^T w_{opt} + v_n \qquad (4)$$

where $d_n$ is the output signal, $x_n$ is the L×1 input vector defined as $x_n = [x_n, x_{n-1}, \ldots, x_{n-L+1}]^T$, and $v_n$ is the Gaussian background noise plus an impulsive component. Similarly to the MSE criterion, we can use the stochastic gradient ascent approach to search for the optimal solution as

$$w_{n+1} = w_n + \mu E\left[\exp\left(-\frac{e_n^2}{2\sigma_0^2}\right)e_n x_n\right]. \qquad (5)$$

According to [9], the weight update equation using the maximum correntropy cost function can be reduced to the simple form

$$w_{n+1} = w_n + \mu\exp\left(-\frac{e_n^2}{2\sigma_0^2}\right)e_n x_n.$$
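As a concrete illustration, the update above can be written in a few lines of NumPy (a minimal sketch, not the authors' code; the step size and kernel width values are arbitrary examples):

```python
import numpy as np

def mcc_update(w, x_n, d_n, mu=0.01, sigma0=2.0):
    """One MCC iteration: w <- w + mu * exp(-e^2 / (2*sigma0^2)) * e * x."""
    e = d_n - x_n @ w                        # a priori error e_n
    g = np.exp(-e**2 / (2.0 * sigma0**2))    # Gaussian weight: ~1 for small errors,
    return w + mu * g * e * x_n              # ~0 for impulsive outliers
```

The factor g is what suppresses impulsive samples: a large error drives it toward zero, so the corresponding update is effectively skipped.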

The switch kernel width maximum correntropy criterion (SMCC) algorithm is obtained by maximizing the following cost function [16]:

$$\max_{\sigma_n} J'_{SMCC}(n) = \frac{1}{\sigma_n^2}\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n^2. \qquad (6)$$

The SMCC algorithm considers $\sigma_n$ as a function of $e_n$ in (6). After a simple calculation, the weight update and the kernel width of the SMCC algorithm can be obtained as:

$$w_{n+1} = w_n + \frac{\mu}{\sigma_n^2}\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n x_n, \qquad \sigma_n^2 = e_n^2/2. \qquad (7)$$

Substituting this kernel width into the weight update equation of the SMCC algorithm yields (8):

$$w_{n+1} = w_n + \frac{2\mu}{e_n}\exp(-1)\,x_n. \qquad (8)$$

In this case, the algorithm is no longer an MCC-based algorithm and, according to (8), it will diverge; therefore, the kernel width of the SMCC algorithm [16] is set as

$$\sigma_n^2 = \max\left(e_n^2/2,\ \sigma_0^2\right), \qquad (9)$$

where $\sigma_0$ is a constant denoting the predetermined kernel width, calculated by Silverman's rule [29] or other methods [30, 31]. The kernel width of the SMCC thus switches between an error-based kernel width and a predetermined kernel width [16].

The kernel width calculation principle of the AMCC algorithm [17] is the same as that of the previous algorithm, and the kernel width of the AMCC algorithm is

$$\sigma_n^2 = e_n^2 + \sigma_0^2. \qquad (10)$$
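In code, the two kernel width rules (9) and (10) are one-liners (a sketch; the value of sigma0 is an assumed example):

```python
def smcc_sigma2(e_n, sigma0=2.0):
    # eq. (9): switch between the error-based and the predetermined width
    return max(e_n**2 / 2.0, sigma0**2)

def amcc_sigma2(e_n, sigma0=2.0):
    # eq. (10): error-based width with a fixed floor term
    return e_n**2 + sigma0**2
```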

However, these algorithms are only robust over the specific data range matching the predetermined kernel width; thus, [18] proposes a variable kernel width MCC (VKW-MCC) algorithm to overcome this problem. The kernel width of the VKW-MCC algorithm is calculated at each iteration by maximizing the following cost function with respect to the kernel width $\sigma_n$:

$$\max_{\sigma_n} J_{VKW}(e_n) = \exp\left(-\frac{e_n^2}{2\sigma_n^2}\right) \qquad (11)$$

In [18], the kernel width $\sigma_n$ is calculated as

$$\sigma_n = \kappa|e_n|. \qquad (12)$$

To prevent the VKW-MCC algorithm from degrading to the LMS algorithm, the VKW-MCC algorithm uses the following method. First, take an error window

$$A_{e,n} = \left[\,|e_n|, |e_{n-1}|, \ldots, |e_{n-N_w+1}|\,\right] \qquad (13)$$

where $N_w$ is the width of the window $A_{e,n}$; then the error threshold is expressed as

$$\bar{e}_n = \lambda\bar{e}_{n-1} + (1-\lambda)\min\left(A_{e,n}\right), \qquad (14)$$

where $0<\lambda<1$ is the smoothing factor and min(·) denotes the sample minimum operation, which helps to remove the impulsive interference-corrupted $e_n$ [18]. The kernel width is then defined as

$$\sigma_n = \begin{cases}\sigma_0, & \text{if } \bar{e}_n \le \sigma_0/\kappa \\ \kappa\bar{e}_n, & \text{if } \bar{e}_n > \sigma_0/\kappa\end{cases} \qquad (15)$$

where $\sigma_0$ is a predetermined constant and $\kappa$ is a constant, arbitrarily set to twenty in [18].
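A compact sketch of the VKW-MCC width rule (13)-(15) follows; the function and variable names are ours, and the smoothing factor value is an assumed example:

```python
import numpy as np

def vkw_sigma(e_window, ebar_prev, sigma0=20.0, kappa=20.0, lam=0.95):
    """VKW-MCC kernel width. e_window holds the last Nw errors."""
    # eq. (14): smoothed minimum of |e| over the window rejects impulses
    ebar = lam * ebar_prev + (1.0 - lam) * np.min(np.abs(e_window))
    # eq. (15): lower-bound the width by the predetermined sigma0
    sigma = sigma0 if ebar <= sigma0 / kappa else kappa * ebar
    return sigma, ebar
```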

3. Proposed algorithm

3.1 Discussion

First, let us discuss the SMCC algorithm. Substituting the kernel width of the SMCC algorithm into the weight update equation, the update can be written in two branches:

$$w_{n+1} = \begin{cases} w_n + \dfrac{2\mu}{e_n}\exp(-1)\,x_n, & \text{if } \dfrac{e_n^2}{2} > \sigma_0^2 \\[2mm] w_n + \dfrac{\mu}{\sigma_0^2}\exp\left(-\dfrac{e_n^2}{2\sigma_0^2}\right)e_n x_n, & \text{if } \dfrac{e_n^2}{2} \le \sigma_0^2 \end{cases} \qquad (16)$$

It is clearly seen from (16) that the update equation of the SMCC algorithm is divided into two segments at $|e_n| = \sqrt{2}\sigma_0$. When $|e_n| \le \sqrt{2}\sigma_0$, the SMCC algorithm is the ordinary MCC algorithm. When $|e_n| > \sqrt{2}\sigma_0$, the SMCC algorithm uses the $1/e_n$ factor to guarantee the robustness of the algorithm in impulsive environments. The convergence speed of the SMCC algorithm can be observed intuitively from the following formula:

$$\Delta_{SMCC} = \frac{\mu}{\sigma_n^2}\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n \qquad (17)$$

Similarly, the convergence speed of the MCC algorithm can be defined as

$$\Delta_{MCC} = \frac{\mu}{\sigma_0^2}\exp\left(-\frac{e_n^2}{2\sigma_0^2}\right)e_n, \qquad (18)$$

where $\sigma_0$ is a constant. The curves of (17) and (18) are depicted in Fig. 1. It can be seen from Fig. 1 that the SMCC algorithm is essentially similar to the MCC algorithm; the SMCC algorithm only works better than the MCC algorithm for certain specific data.

We now analyse the AMCC algorithm. Substituting (10) into the weight update equation (7), the AMCC algorithm [17] can be expressed as

$$w_{n+1} = w_n + \frac{\mu}{e_n^2+\sigma_0^2}\exp\left(-\frac{e_n^2}{2\left(e_n^2+\sigma_0^2\right)}\right)e_n x_n. \qquad (19)$$

The convergence speed of the AMCC algorithm is defined as

$$\Delta_{AMCC} = \frac{\mu}{e_n^2+\sigma_0^2}\exp\left(-\frac{e_n^2}{2\left(e_n^2+\sigma_0^2\right)}\right)e_n. \qquad (20)$$

The curves of (20) under different $\sigma_0$ are depicted in Fig. 2. It can be seen from Fig. 2 that the peak of the curve always lies between errors of -1 and 1, no matter how the kernel width changes. Hence the algorithm only maintains robustness against impulsive noise over a special data range. Simulation results show that when the data deviate from this range, the convergence is even worse than that of the conventional MCC algorithm.
Fig. 1. The curves of $\Delta_{MCC}$ and $\Delta_{SMCC}$ versus $e_n$.

Fig. 2. The curves of $\Delta_{AMCC}$ versus $e_n$ with different $\sigma_0$.
3.2. Proposed SVKW-MCC algorithm

Because the existing variable kernel width algorithms have the shortcomings discussed above, the following statistics variable kernel width algorithm (SVKW-MCC) is proposed based on statistical probability. The MCC algorithm can be regarded as a variable step size algorithm [22], whose equivalent step size is expressed as:

$$\mu_n = \mu\exp\left(-\frac{e_n^2}{2\sigma^2}\right) \qquad (21)$$

From a mathematical point of view, this exponential term has the form of a Gaussian function with zero mean and standard deviation $\sigma$, which is usually defined as the kernel width of the MCC algorithm. A simulation result in Fig. 3 illustrates the influence of different kernel widths. It is obvious that the equivalent step size $\mu_n$ decreases rapidly as the error increases; when the error exceeds three times the kernel width, it almost decays to zero. This is the reason why the MCC algorithm is robust against impulsive noise. At the same time, the curve clearly displays the inherent shortcomings of the MCC algorithm.
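For example, at $|e_n| = 3\sigma$ the exponential factor equals $\exp(-9/2)\approx 0.011$, so the effective step size is only about 1% of its maximum value; a large impulsive error is thus almost completely ignored, but a legitimately large error during the transient phase is slowed down in the same way.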

As the filter iterates, the error decreases while the step size increases. The relationship between the error and the iteration speed can be observed through the function $\mu_n e_n$:

$$\mu_n e_n = \mu\exp\left(-\frac{e_n^2}{2\sigma^2}\right)e_n \qquad (22)$$

The simulation result of (22) is shown in Fig. 4. It can be seen from Fig. 4 that the iteration speed is fastest when the error is equal to the kernel width. From a mathematical point of view, this is derived from

$$\max_{\sigma_n} J_{SVKW\text{-}MCC}(e_n) = \exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n. \qquad (23)$$
 

We take the derivative of (23) with respect to $e_n$ and set it equal to zero,

$$1 - e_n^2/\sigma_n^2 = 0. \qquad (24)$$

The kernel width is obtained by solving (24):

$$\sigma_n = |e_n| \qquad (25)$$

The new variable kernel width update formula is then derived as:

$$w_{n+1} = w_n + \mu\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n x_n, \qquad \sigma_n = |e_n|. \qquad (26)$$

Observing (26), note that with $\sigma_n = |e_n|$ the exponential term becomes the constant $\exp(-1/2)$, so (26) reverts to the LMS algorithm, and when impulsive interferences occur the algorithm loses robustness. To avoid this drawback, the SMCC and AMCC algorithms change the form of the kernel width. However, it is well known that the LMS algorithm performs excellently in the ordinary Gaussian environment. The fundamental point is that the kernel width should not be set equal to the error when impulsive noise occurs.
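A quick numerical check of (23)-(25) (a sketch, not part of the paper) confirms that, for a fixed kernel width, the update speed peaks exactly where the error equals the kernel width:

```python
import numpy as np

sigma = 1.5                                   # any fixed kernel width
e = np.linspace(0.01, 10.0, 100000)
speed = np.exp(-e**2 / (2.0 * sigma**2)) * e  # update-speed curve, cf. (23)
print(e[np.argmax(speed)])                    # ~1.5, i.e. the maximum is at e_n = sigma_n
```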

Fig. 3. Equivalent step size and error relationship curve.

Fig. 4. The curves of $\mu_n e_n$ of the MCC algorithm versus $e_n$ with different $\sigma$.


If the kernel width is set equal to the error at the current iteration, the algorithm will diverge under impulsive noise. When calculating the kernel width, the most important principle is to remove the impact of impulsive noise on the error, so we propose that the optimal kernel width should equal the mean absolute error after the impulsive impacts have been removed:

$$\sigma_n = E\left[|e_n|\right] \qquad (27)$$

The error $e_n$ can be regarded as a function of $\sigma_n$ as shown below:

$$e_n = \beta E\left[|e_n|\right] \qquad (28)$$

where $\beta$ is a coefficient: when the error is normal, it is close to one; when the error is an impulse, it is very large. Substituting (28) and (27) into the update formula of the variable kernel width algorithm gives

$$w_{n+1} = w_n + \mu\exp\left(-\frac{\beta^2}{2}\right)\beta E\left[|e_n|\right]x_n. \qquad (29)$$

From (29), when the error is corrupted only by white Gaussian noise, the coefficient $\beta$ is small and the algorithm decays to the LMS algorithm, achieving the optimal convergence effect. When the error is corrupted by impulsive noise, the exponential term converges quickly to near zero, which gives better robustness. The equivalent step-size coefficient of (29) is expressed as:

$$\mu_n = \mu\beta\exp\left(-\frac{\beta^2}{2}\right) \qquad (30)$$

It can be seen from Fig. 5 that the step size is largest when the error is equal to the mean value, and decreases sharply when the error deviates from the mean value.

Fig.5. Equivalent step size curve of SVKW-MCC.

According to probability theory, most data lie close to the mean:

$$P\left(\big|\,|e_n| - E\left[|e_n|\right]\big| \le 3\sqrt{D(|e_n|)}\right) \ge 0.95 \qquad (31)$$

In (31), P(·) denotes probability, E(·) represents the mean value, and D(·) denotes the variance. According to (31), most of the $e_n$ fluctuate near the mean value during the update process. In order to make most of the $e_n$ fall within the kernel width, the kernel width of SVKW-MCC is set to

$$\sigma_n = E\left[|e_n|\right] + 3\sqrt{D(|e_n|)}. \qquad (32)$$

According to (32), the mean and variance of the absolute error need to be calculated. In order to estimate them at the current iteration, we take a stretch of error data starting from time n. The window is defined as:

$$A_{e,n} = \left[e_n, e_{n-1}, e_{n-2}, \ldots, e_{n-i}, \ldots, e_{n-N_w+1}\right] \qquad (33)$$

Potential impulsive noise inside the window can lead to serious distortion of the mean and variance. Since impulses are rare, they can be regarded as a small part of the data lying far from the mean value. The mean and standard deviation of the error window are calculated as:

$$E_A(n) = E\left(A_{e,n}\right), \qquad D_A(n) = \sqrt{D\left(A_{e,n}\right)}. \qquad (34)$$

The proposed algorithm zeros the data that deviate far from the mean of the error:

$$e_{n-i} = \begin{cases} 0, & \text{if } |e_{n-i}| > E_A(n) + 3D_A(n) \\ e_{n-i}, & \text{if } |e_{n-i}| \le E_A(n) + 3D_A(n) \end{cases} \qquad (35)$$

More accurate estimates of the mean and standard deviation than the first calculation are obtained by recalculating $E_A(n)$ and $D_A(n)$ on the cleaned data; based on experience, this is repeated three times. To guarantee a smooth update of the kernel width, the sliding average of (36) is used:

$$\sigma_n = \gamma\sigma_{n-1} + (1-\gamma)\left(E_A(n) + 3D_A(n)\right) \qquad (36)$$

where $\gamma$ is a constant coefficient close to 1. The proposed algorithm is summarized in Table 1 below.
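In code, the repeated trimming of (34)-(35) can be sketched as follows (a minimal sketch; the variable names are ours):

```python
import numpy as np

def trimmed_stats(window, n_pass=3):
    """Mean and standard deviation of |e| after zeroing 3-sigma outliers, cf. (34)-(35)."""
    a = np.abs(np.asarray(window, dtype=float))
    for _ in range(n_pass):
        ea, da = a.mean(), a.std()                # E_A(n) and D_A(n) of the current data
        a = np.where(a > ea + 3.0 * da, 0.0, a)   # zero samples attributed to impulses
    return ea, da
```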

TABLE 1. A simple summary of the SVKW-MCC algorithm

For each iteration n:
    $e_n = d_n - x_n^T w_n$
    $A_{e,n} = [e_n, e_{n-1}, e_{n-2}, \ldots, e_{n-N_w+1}]$
    For j = 1 : 3
        $E_A(n) = E(A_{e,n})$,  $D_A(n) = \sqrt{D(A_{e,n})}$
        For i = 1 : $N_w$
            If $|A_{e,n}(i)| > E_A(n) + 3D_A(n)$
                $A_{e,n}(i) = 0$
            End if
        End for
    End for
    $\sigma_n = \gamma\sigma_{n-1} + (1-\gamma)\left(E_A(n) + 3D_A(n)\right)$
    $w_{n+1} = w_n + \mu\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n x_n$
End for
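For concreteness, a complete NumPy sketch of Table 1 is given below, reusing trimmed_stats from the sketch above (step size, window length, smoothing factor, and initial kernel width are assumed example values, not prescriptions from the paper):

```python
import numpy as np

def svkw_mcc(x, d, L=64, mu=0.006, Nw=64, gamma=0.99):
    """SVKW-MCC adaptive filter following Table 1 (gamma is the factor in (36))."""
    w = np.zeros(L)
    sigma = 1.0                                    # initial kernel width (assumed)
    e_win = np.ones(Nw)                            # window A_{e,n} of recent errors
    for n in range(L - 1, len(d)):
        x_n = x[n - L + 1:n + 1][::-1]             # input vector x_n
        e = d[n] - x_n @ w                         # a priori error e_n
        e_win = np.roll(e_win, 1)
        e_win[0] = e
        ea, da = trimmed_stats(e_win)              # E_A(n), D_A(n) after outlier removal
        sigma = gamma * sigma + (1 - gamma) * (ea + 3 * da)   # eq. (36)
        w += mu * np.exp(-e**2 / (2 * sigma**2)) * e * x_n    # MCC update with sigma_n
    return w
```

Note that, as in Table 1, the trimmed samples are zeroed rather than discarded, so they still enter the mean; this biases the statistics slightly downward but keeps the window length fixed.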

4. Performance analysis

In this section, we analyze the stability and steady-state mean-square performance of the proposed SVKW-MCC algorithm. As for the MCC algorithm [6-15], the weight vector update of the SVKW-MCC algorithm can be written as

$$w_{n+1} = w_n + \mu f(e_n)x_n \qquad (37)$$

where, according to the proposed algorithm, $f(e_n)$ is defined as

$$f(e_n) = \exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n, \qquad (38)$$

where $\sigma_n = E\left[|e_n|\right] + 3\sqrt{D(|e_n|)}$ is the calculated kernel width. To analyze the convergence in the mean-square sense, we define the weight error vector $\tilde{w}_n$ as

$$\tilde{w}_n = w_{opt} - w_n \qquad (39)$$

where $w_{opt}$ is the system parameter vector to be identified. Subtracting both sides of (37) from $w_{opt}$ yields

Wn 1  Wn   f  e n  xn (40)

(41) is obtained by taking the squared norm and the expectation of both sides of (40) [25-28]:

$$E\left[\|\tilde{w}_{n+1}\|^2\right] = E\left[\|\tilde{w}_n\|^2\right] - 2\mu E\left[e_{a,n}f(e_n)\right] + \mu^2 E\left[f^2(e_n)\|x_n\|^2\right]. \qquad (41)$$

4.1. Stability analysis

In (41), $e_{a,n} = \tilde{w}_n^T x_n$ and $e_n = e_{a,n} + v_n$, where $v_n$ is the Gaussian background noise plus impulsive noise in the SVKW-MCC algorithm. To simplify the calculation, we only consider the white Gaussian noise. Convergence of the SVKW-MCC algorithm in the mean-square sense is guaranteed if the squared weight error satisfies

$$E\left[\|\tilde{w}_{n+1}\|^2\right] \le E\left[\|\tilde{w}_n\|^2\right]. \qquad (42)$$

To ensure (42), a mean-square convergence condition is obtained by requiring

$$-2\mu E\left[e_{a,n}f(e_n)\right] + \mu^2 E\left[f^2(e_n)\|x_n\|^2\right] \le 0. \qquad (43)$$

Then the mean-square convergence condition is given by

$$0 < \mu \le \frac{2E\left[e_{a,n}f(e_n)\right]}{E\left[f^2(e_n)\|x_n\|^2\right]}. \qquad (44)$$

In order to solve (44), the following assumptions, widely used in the theoretical analysis of adaptive filters [17, 18], are adopted:

A1: The noise sequence $\{v_n\}$ is independent, identically distributed, and independent of the input sequence $\{x_n\}$.

A2: The filter is long enough that the a priori error $e_{a,n}$ is zero-mean Gaussian and independent of the noise $\{v_n\}$.

A3: $\|x_n\|^2$ and $f^2(e_n)$ are asymptotically uncorrelated [21].

Substituting $e_n = e_{a,n} + v_n$ and (38) into the numerator of (44), after a simple calculation we obtain

$$\lim_{n\to\infty} 2E\left[e_{a,n}f(e_n)\right] = \lim_{n\to\infty} 2E\left[\left(e_{a,n}^2 + e_{a,n}v_n\right)\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)\right]. \qquad (45)$$

According to assumption A2, $\lim_{n\to\infty} E\left[e_{a,n}v_n\right] = 0$. Substituting this into (45) yields

$$\lim_{n\to\infty} 2E\left[e_{a,n}f(e_n)\right] = \lim_{n\to\infty} 2E\left[e_{a,n}^2\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)\right]. \qquad (46)$$

According to assumption A3, the denominator of (44) can be factored as:

$$E\left[f^2(e_n)\|x_n\|^2\right] = E\left[f^2(e_n)\right]E\left[\|x_n\|^2\right] \qquad (47)$$

Using $e_n = e_{a,n} + v_n$ and (38), $E\left[f^2(e_n)\right]$ can be calculated as:

$$E\left[f^2(e_n)\right] = E\left[\left(e_{a,n}+v_n\right)^2\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right] \qquad (48)$$

$$E\left[f^2(e_n)\right] = E\left[\left(e_{a,n}^2 + 2e_{a,n}v_n + v_n^2\right)\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right] \qquad (49)$$

According to assumption A2, substituting $E\left[e_{a,n}v_n\right] = 0$ into (49) gives

$$E\left[f^2(e_n)\right] = E\left[\left(e_{a,n}^2 + v_n^2\right)\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right] \qquad (50)$$

Substituting (46) and (50) into (44), we can obtain

$$0 < \mu \le \frac{2E\left[e_{a,n}^2\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)\right]}{E\left[\left(e_{a,n}^2 + v_n^2\right)\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right]E\left[\|x_n\|^2\right]} \qquad (51)$$

In particular, if the variance of the background noise is much smaller than the a priori error, i.e., $E\left[v_n^2\right] \ll E\left[e_{a,n}^2\right]$, we have $e_{a,n}^2 + v_n^2 \approx e_{a,n}^2$, and (51) gives rise to

$$0 < \mu \le \frac{2}{E\left[\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right]E\left[\|x_n\|^2\right]} \qquad (52)$$

where $E\left[\|x_n\|^2\right] = E\left[\mathrm{Tr}(R_{xx})\right]$, Tr(·) denotes the trace of a matrix, and $R_{xx}$ is the covariance matrix of the input vectors; for a unit-variance input we have $E\left[\mathrm{Tr}(R_{xx})\right] = L$ [26].
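As a quick sanity check, since the exponential term is at most one, (52) is implied by the conservative bound $\mu \le 2/L$; with L = 64 as in Section 5, this gives $\mu \le 0.031$, which the SVKW-MCC step sizes used there ($\mu \le 0.03$) satisfy.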

4.2. Steady-state excess mean square error (EMSE) analysis

As the filter reaches the steady state [21], we have

$$E\left[\|\tilde{w}_{n+1}\|^2\right] = E\left[\|\tilde{w}_n\|^2\right]. \qquad (53)$$

Using (41), it holds that

$$-2\mu E\left[e_{a,n}f(e_n)\right] + \mu^2 E\left[f^2(e_n)\|x_n\|^2\right] = 0 \qquad (54)$$

Comparing (54) with (43), the same steps that led to (51) give

$$\mu = \frac{2E\left[e_{a,n}^2\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)\right]}{E\left[\left(e_{a,n}^2 + v_n^2\right)\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right]E\left[\|x_n\|^2\right]}. \qquad (55)$$

The steady-state behavior of an adaptive filtering algorithm is generally evaluated by the excess mean square error (EMSE), which can be defined as

$$S = \lim_{n\to\infty}E\left[e_{a,n}^2\right] \qquad (56)$$

Combining (56) with (55) and taking the limit of both sides of (55), we can obtain

$$S = \lim_{n\to\infty}\frac{\mu E\left[\mathrm{Tr}(R_{xx})\right]E\left[\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)v_n^2\right]}{2 - \mu E\left[\mathrm{Tr}(R_{xx})\right]E\left[\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right]}. \qquad (57)$$
The input signal is generated from a zero-mean Gaussian distribution with unit variance, and the length of the system to be identified is L. Let $E\left[v_n^2\right] = \sigma_v^2$ and $\lim_{n\to\infty}E\left[\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right] = \varphi_e$, where $\sigma_v^2$ is the variance of the background noise and $\varphi_e$ is the steady-state value of the exponential term. Therefore, (57) simplifies to

$$S = \frac{\mu L\varphi_e\sigma_v^2}{2 - \mu L\varphi_e}. \qquad (58)$$

Substituting $\sigma_n = E\left[|e_n|\right] + 3\sqrt{D(|e_n|)}$ into $\varphi_e$, we can obtain

$$\varphi_e = \lim_{n\to\infty}E\left[\exp\left(-\frac{e_n^2}{\left(E\left[|e_n|\right] + 3\sqrt{D(|e_n|)}\right)^2}\right)\right]. \qquad (59)$$

In steady state the error obeys a normal distribution with zero mean and variance $\sigma_e^2$, so $\lim_{n\to\infty}E\left[e_n^2\right] = \sigma_e^2$. According to probability and statistics theory, $E\left[|e_n|\right] = \sqrt{\frac{2}{\pi}}\sigma_e$ and $D(|e_n|) = \left(1-\frac{2}{\pi}\right)\sigma_e^2$; these results are derived as follows. Starting from the definition,

$$E\left[|e_n|\right] = \int_{-\infty}^{\infty}|e_n|f(e_n)\,de_n = \int_{-\infty}^{\infty}|e_n|\frac{1}{\sqrt{2\pi}\,\sigma_e}\exp\left(-\frac{e_n^2}{2\sigma_e^2}\right)de_n. \qquad (60)$$

Observing the above formula, the integrand has obvious symmetry, so we can deduce that

$$E\left[|e_n|\right] = 2\int_{0}^{\infty}e_n\frac{1}{\sqrt{2\pi}\,\sigma_e}\exp\left(-\frac{e_n^2}{2\sigma_e^2}\right)de_n = \frac{2\sigma_e}{\sqrt{2\pi}}\int_{0}^{\infty}\exp\left(-\frac{e_n^2}{2\sigma_e^2}\right)d\!\left(\frac{e_n^2}{2\sigma_e^2}\right). \qquad (61)$$

After taking the integral in (61), we can obtain

$$E\left[|e_n|\right] = \frac{2\sigma_e}{\sqrt{2\pi}}\left[-\exp\left(-\frac{e_n^2}{2\sigma_e^2}\right)\right]_{0}^{\infty} = \sqrt{\frac{2}{\pi}}\sigma_e. \qquad (62)$$

According to the variance formula, we can obtain

$$D(|e_n|) = E\left[|e_n|^2\right] - \left(E\left[|e_n|\right]\right)^2 = D(e_n) - \left(E\left[|e_n|\right]\right)^2 \qquad (63)$$
 

Substituting (62) into (63) yields

 2
D( en )  1    e2 (64)
 
Substituting (62) and (64) into (59) yields

$$\varphi_e = \exp\left(-\frac{1}{9 - \frac{16}{\pi} + 6\sqrt{\frac{2}{\pi}\left(1-\frac{2}{\pi}\right)}}\right) \qquad (65)$$

$$\varphi_e = \lim_{n\to\infty}E\left[\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right] \approx 0.8 \qquad (66)$$

Substituting (66) into (58), the excess mean square error (EMSE) is obtained:

$$S = \frac{0.8\mu L\sigma_v^2}{2 - 0.8\mu L} \qquad (67)$$
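The constants in (62)-(67) can be checked numerically (a sketch using the Section 5.4 parameter values; not part of the paper's code):

```python
import numpy as np

e = np.random.randn(1_000_000)                   # unit-variance Gaussian errors (sigma_e = 1)
print(np.abs(e).mean(), np.sqrt(2 / np.pi))      # E|e_n| ~ 0.798, cf. (62)
print(np.abs(e).var(), 1 - 2 / np.pi)            # D(|e_n|) ~ 0.363, cf. (64)

denom = 9 - 16 / np.pi + 6 * np.sqrt(2 / np.pi * (1 - 2 / np.pi))
phi_e = np.exp(-1 / denom)                       # eq. (65): ~0.86, rounded to 0.8 in (66)

mu, L, sv2 = 0.005, 64, 0.001                    # Section 5.4 values
S = phi_e * mu * L * sv2 / (2 - phi_e * mu * L)  # steady-state EMSE, cf. (58)
print(phi_e, 10 * np.log10(S))                   # about -38 dB
```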

Fig. 6. Steady-state excess mean square error and step-size curve when $\sigma_v^2 = 0.001$.

Fig. 7. Noise variance and steady-state excess mean square error curve.
The normalized mean square deviation is defined as $\mathrm{NMSD}(n) = 10\log_{10}\left(\|w_{opt}-w_n\|^2/\|w_{opt}\|^2\right)$ in the simulations. The system parameters obey a uniform distribution between zero and one half, and the order of the system is 64 taps (L = 64). The input signal follows a Gaussian distribution with zero mean and unit variance. Halfway through each run, the system parameters suddenly change sign. Furthermore, the desired signal is disturbed by Gaussian white noise with a signal-to-noise ratio of 30 dB. The impulsive noise $\eta(n)$ is modeled by the Bernoulli-Gaussian (BG) distribution, $\eta(n) = c(n)A(n)$, where $A(n)$ is Gaussian with zero mean and a standard deviation of 100, and $c(n)$ is a Bernoulli process with the probability density function defined by $P(c(n)=1) = P_r$ and $P(c(n)=0) = 1-P_r$, with $P_r = 0.01$ [13]. All simulation results are averaged over 300 independent trials.
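The impulsive noise model above can be generated as follows (a minimal sketch; function and variable names are ours):

```python
import numpy as np

def bg_noise(n_samples, pr=0.01, std=100.0):
    """Bernoulli-Gaussian impulsive noise: eta(n) = c(n) * A(n)."""
    c = np.random.rand(n_samples) < pr      # Bernoulli process, P(c(n) = 1) = pr
    A = std * np.random.randn(n_samples)    # zero-mean Gaussian, standard deviation 100
    return c * A
```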

The initial parameter values of the algorithms are listed in Table 2 below:


TABLE 2. Simulation parameter settings

Algorithm | Gaussian background | Colored signals | Speech signal
--------- | ------------------- | --------------- | -------------
MCC       | μ = 0.0008, σ = 2   | μ = 0.0009, σ = 2   | μ = 0.0009, σ = 2.1
MCC       | μ = 0.0008, σ = 4   | μ = 0.0009, σ = 4   | —
SMCC      | μ = 0.003, σ = 1.8  | μ = 0.003, σ = 1.8  | μ = 0.03, σ = 1.5
AMCC      | μ = 0.0048, σ = 1.8 | μ = 0.0049, σ = 1.8 | μ = 0.1, σ = 2.1
VKW-MCC   | μ = 0.008, Nw = 26, σ0 = 20, K = 20 | μ = 0.008, Nw = 26, σ0 = 20, K = 20 | μ = 0.03, Nw = 26, σ0 = 1, K = 40
SVKW-MCC  | μ = 0.006, Nw = 64  | μ = 0.005, Nw = 64  | μ = 0.03, Nw = 512

5. Simulation results

5.1. System identification in Gaussian noise background

Fig. 8 shows the system identification results for a Gaussian random input signal with a signal-to-noise ratio of 30 dB and impulsive noise with an occurrence probability of 0.01.


Fig. 8. Experimental results for the MCC, SMCC, AMCC, VKW-MCC, and proposed SVKW-MCC algorithms in the Gaussian scenario, where the system parameter is suddenly changed from $w_{opt}$ to $-w_{opt}$ at the 30000th iteration. (a) System impulse response. (b) NMSD learning curves. (c) Kernel width curve.

From the above simulation, it is clearly seen that a constant kernel width forces the MCC algorithm into a contradiction between the convergence rate and the steady-state performance.

When the system to be identified is changed to the impulse response shown in Fig. 9(a) and the variance of the impulsive noise is 1000, the simulation results are as follows:

Fig. 9. Experimental results for the MCC, SMCC, AMCC, VKW-MCC, and proposed SVKW-MCC algorithms in the Gaussian scenario, where the system parameter is suddenly changed from $w_{opt}$ to $-w_{opt}$ at the 20000th iteration. (a) System impulse response. (b) NMSD learning curves. (c) Kernel width curve.

The SMCC and AMCC algorithms perform worse than MCC under this particular condition. Experiments show that the VKW-MCC algorithm can surpass the MCC algorithm only with appropriately chosen parameters, and such parameters are very difficult to set. The proposed algorithm needs neither this prior knowledge nor careful parameter tuning. The convergence speed and steady-state performance of the proposed algorithm outperform those of the above algorithms in the Gaussian noise environment. From the kernel width curve, an approximately linear relationship between the kernel width and the convergence curve can be observed.

5.2. System identification under colored signals


When the input is an AR(1) colored process with a pole at -0.7 in the background of impulsive noise, the simulation results are as follows:


Fig. 10. Experimental results for the MCC, SMCC, AMCC, VKW-MCC, and proposed SVKW-MCC algorithms in the correlated-input scenario, where the system parameter is suddenly changed from $w_{opt}$ to $-w_{opt}$ at the 30000th iteration. (a) NMSD learning curves. (b) Kernel width curve.

The above figure shows that the convergence speed and steady-state performance of the proposed algorithm are better than those of the other algorithms under correlated input conditions.
5.3. Echo cancellation simulation

In this section, a speech signal is used as the input, and an abrupt change of the echo path occurs at the $4\times10^5$-th input sample. Fig. 11 compares the algorithms in this case; the parameters are set as in Table 2. Evidently, compared with the other variable kernel width algorithms, the proposed algorithm much better resolves the trade-off between the convergence rate/tracking capability and the steady-state error.


Fig. 11. Experimental results for the MCC, SMCC, AMCC, VKW-MCC, and proposed SVKW-MCC algorithms in an echo cancellation scenario, where the system parameter is suddenly changed from $w_{opt}$ to $-w_{opt}$ at the 400000th iteration. (a) Impulse response of the echo channel. (b) Input speech signal for the echo cancellation experiment. (c) NMSD learning curves.

The above figure shows that the convergence speed and steady-state performance of the proposed algorithm are better than those of the other algorithms in the speech signal environment.

5.4. Excess mean square error (EMSE) simulation


Most of the simulation conditions are the same as in Section 5.1, but here the performance measure is the EMSE, defined as $\mathrm{EMSE}(n) = 10\log_{10}E\left[\left(\left(w_{opt}-w_n\right)^T x_n\right)^2\right]$, which differs from the $\mathrm{NMSD}(n) = 10\log_{10}\left(\|w_{opt}-w_n\|^2/\|w_{opt}\|^2\right)$ used in the previous sections. The main simulation conditions of the proposed SVKW-MCC algorithm are set as follows: $\mu = 0.005$, $L = 64$, $\sigma_v^2 = 0.001$.

Fig. 12. Experimental results for the MCC, SMCC, AMCC, VKW-MCC, and proposed SVKW-MCC algorithms in the Gaussian scenario, where the system parameter is suddenly changed from $w_{opt}$ to $-w_{opt}$ at the 20000th iteration.

Because the analysis concerns the steady-state performance after the algorithm has stabilized, a straight line is used to represent the analyzed steady-state value. This line is therefore not mathematically meaningful during the transient phase of the EMSE simulation, but it can be seen that the steady-state simulation results are close to the theoretical values.

6. Conclusion

In this paper, a statistics variable kernel width maximum correntropy criterion (SVKW-MCC) algorithm is proposed to address the shortcomings of some well-known variable kernel width algorithms. The SVKW-MCC algorithm eliminates the abnormal errors caused by impulsive noise using a statistical method, and then calculates the optimal kernel width from the mean and variance of the remaining errors. Simulation results show that the proposed algorithm has obvious advantages in the Gaussian environment. Since the analysis is based on the assumption of a Gaussian distribution, the parameters still need to be adjusted to achieve better results for non-Gaussian inputs such as colored signals and speech. Notwithstanding that, the algorithm can be extended to other MCC algorithms or M-estimation algorithms. System identification and echo cancellation experiments show that the new algorithm is simple to compute, widely applicable, and better in both convergence speed and steady-state performance.

Acknowledgment

This work was partially supported by the National Science Foundation of P.R. China (Grants 61871461, 61571374, 61433011) and the Sichuan Science and Technology Program (Grant 19YYJC0681).

References
[1] Principe J C. Information theoretic learning: Renyi's entropy and kernel perspectives. Springer Science & Business Media, 2010.
[2] B. Chen, Y. Zhu, J. Hu, J.C. Principe. System parameter identification: information criteria and algorithms. Newnes, 2013.
[3] D. Erdogmus and J.C. Principe, "Generalized information potential criterion for adaptive system training," Neural Networks, IEEE
Transactions on. 13.5(2002). pp.1035-1044
[4] B. Chen, J. Hu, L. Pu, Z. Sun. "Stochastic gradient algorithm under (h, φ)-entropy criterion." Circuits, Systems & Signal Processing 26.6 (2007): pp. 941-960.
[5] B. Chen, P. Zhu, J.C. Principe. "Survival information potential: a new criterion for adaptive system training." Signal Processing, IEEE Transactions on, 60.3 (2012): pp. 1184-1194.
[6] W. Liu, P.P. Pokharel, J.C. Principe. "Correntropy: A localized similarity measure." Neural Networks, 2006. IJCNN'06. International Joint Conference on. IEEE, 2006. pp. 4919-4924.
[7] Wang W, Zhao H, Zeng X. "Geometric Algebra Correntropy: Definition and Application to Robust Adaptive Filtering." IEEE
Transactions on Circuits and Systems II: Express Briefs, 2019.
[8] I. Santamaría, P.P. Pokharel, J.C. Príncipe. "Generalized correlation function: definition, properties, and application to blind
equalization." Signal Processing, IEEE Transactions on, 54.6 (2006).pp.2187-2197.
[9] W. Liu, P.P. Pokharel, J.C. Principe. "Correntropy: properties and applications in non-Gaussian signal processing." Signal Processing,
IEEE Transactions on, 55.11 (2007): pp.5286-5298
[10] Gogineni V. C, Mula S. “Improved proportionate-type sparse adaptive filtering under maximum correntropy criterion in impulsive
noise environments,” Digital Signal Processing, 2018, 79: 190-198.
[11] Li Y, Jiang Z, Shi W, et al. “Blocked maximum correntropy criterion algorithm for cluster-sparse system identifications,” IEEE
Transactions on Circuits and Systems II: Express Briefs, 2019, 66(11): 1915-1919.
[12] Li Y, Wang Y, Yang R, et al. “A soft parameter function penalized normalized maximum correntropy criterion algorithm for sparse
system identification,” Entropy, 2017, 19(1): 45.
[13] Chen B, Liu X, Zhao H, et al. “Maximum correntropy Kalman filter,” Automatica, 2017, 76: 70-77.
[14] Ma W, Chen B, Duan J, et al. “Diffusion maximum correntropy criterion algorithms for robust distributed estimation,” Digital Signal
Processing, 2016, 58: 10-19.
[15] Liu C, Qi Y, Ding W. “The data-reusing MCC-based algorithm and its performance analysis,” Chinese Journal of Electronics, 2016,
25(4): 719-725.
[16] Wang W, J. Zhao, H. Qu, B. Chen, J. C. Principe "A switch kernel width method of correntropy for channel estimation." Proc. IEEE
International Joint Conference on Neural Networks, (IJCNN’15), Killarney, Ireland. p. 1–7.
[17] W. Wang, J. Zhao, H. Qu, B. Chen, and J. C. Principe, “An adaptive kernel width update method of correntropy for channel
estimation,” in Proc. Inter. Conf. Digit. signal process. (ICDSP), 2015, pp. 916–920.
[18] F. Huang, J. Zhang, and S. Zhang, “Adaptive filtering under a variable kernel width maximum correntropy criterion,” IEEE
Transactions on Circuits and Systems II: Express Briefs, vol. 64, no. 10, pp. 1247–1251, 2017.
[19] L. Shi and Y. Lin, “Convex combination of adaptive filters under the maximum correntropy criterion in impulsive interference,”
IEEE Signal Process. Lett., vol. 21, no. 11, pp. 1385–1388, 2014.
[20] S. Zhao, B. Chen, and J. C. Principe, “An adaptive kernel width update for correntropy,” in Proc. Inter. Joint Conf. Neural Netw.,
2012, pp.1–5.
[21] Shi L, Zhao H, Y. Zakharov. “An improved variable kernel width for maximum correntropy criterion algorithm.” IEEE Transactions
on Circuits and Systems II, Exp. Briefs, to be published. doi: 10.1109/TCSII.2018.2880564.
[22] Chen B, Xing L, Zhao H, et al. “Generalized correntropy for robust adaptive filtering.” IEEE Transactions on Signal Processing,
2016, 64(13): 3376-3387.
[23] S. Wang, L. Dang, B. Chen, S. Duan, L. Wang, and C. K. Tse, “Random Fourier filters under maximum correntropy criterion,” IEEE
Transactions on Circuits and Systems I: Regular Papers., DOI:10.1109/TCSI.2018.2825241, 2018.
[24] Al-Naffouri TY, Sayed AH. “Adaptive filters with error non-linearities: Mean square analysis and optimum design.” EURASIP J
Adv Signal Process 2001;2001(1):192–205.
[25] Chen B, Xing L, Liang J, Zheng N, Principe JC. “Steady-state mean-square error analysis for adaptive filtering under the maximum
correntropy criterion.” IEEE Signal Process Letters, 2014, 21(7):880–4.
[26] Wang W, Zhao J, Qu H, et al. “Convergence performance analysis of an adaptive kernel width MCC algorithm.“ AEU-International
Journal of Electronics and Communications, 2017, 76: 71-76.
[27] Wang W, Zhao H, Zeng X, et al. “Steady-State Performance Analysis of Nonlinear Spline Adaptive Filter under Maximum
Correntropy Criterion,” IEEE Transactions on Circuits and Systems II: Express Briefs, 2019.
[28] Khalili A, Rastegarnia A, Islam M K, et al. “Steady-state tracking analysis of adaptive filter with maximum correntropy criterion,”
Circuits, Systems, and Signal Processing, 2017, 36(4): 1725-1734.
[29] B.W. Silverman, "Density estimation for statistics and data analysis," vol. 3, Chapman and Hall, London, 1986.
[30] M.C. Jones, J.S. Marron, and S.J. Sheather, "A brief survey of bandwidth selection for density estimation," Journal of the American Statistical Association, 91.433 (1996): pp. 401-407.
[31] A.W. Bowman, "An alternative method of cross-validation for the smoothing of density estimates," Biometrika, 71.2 (1984): pp. 353-360.
Ethical Statement

Dear Editor:

I certify that this manuscript is original and has not been published and will not be submitted elsewhere for

publication while being considered by Signal Processing. And the study is not split up into several parts to

increase the quantity of submissions and submitted to various journals or to one journal over time. No data have

been fabricated or manipulated (including images) to support my conclusions. No data, text, or theories by others

are presented as if they were our own. The submission has been received explicitly from all co-authors. And

authors whose names appear on the submission have contributed sufficiently to the scientific work and therefore

share collective responsibility and accountability for the results.

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent was obtained from all individual participants included in the study.
