
Journal Pre-proof

Statistics Variable Kernel Width for Maximum Correntropy Criterion Algorithm

Shuyong Zhou , Haiquan Zhao

PII: S0165-1684(20)30132-8
DOI: https://doi.org/10.1016/j.sigpro.2020.107589
Reference: SIGPRO 107589

To appear in: Signal Processing

Received date: 6 October 2019


Revised date: 3 March 2020
Accepted date: 16 March 2020

Please cite this article as: Shuyong Zhou , Haiquan Zhao , Statistics Variable Ker-
nel Width for Maximum Correntropy Criterion Algorithm, Signal Processing (2020), doi:
https://doi.org/10.1016/j.sigpro.2020.107589

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2020 Published by Elsevier B.V.


Highlights

- This paper summarizes several variable kernel width maximum correntropy criterion (MCC) algorithms and discusses their basic principles. A close relationship between these algorithms and the LMS algorithm is analyzed and established.
- A new statistics variable kernel width MCC (SVKW-MCC) algorithm is then proposed on the basis of previous variable kernel width algorithms.
- The SVKW-MCC algorithm addresses the shortcomings of some well-known variable kernel width algorithms: it computes the kernel width by a statistical method and eliminates the abnormal errors caused by impulsive noise.
- The stability and steady-state mean-square performance of the proposed algorithm are analyzed and verified by experiments.
Statistics Variable Kernel Width for Maximum Correntropy Criterion Algorithm


Shuyong Zhou, Haiquan Zhao*

Abstract: Since the maximum correntropy criterion (MCC) algorithm with a constant kernel width leads to a trade-off between the convergence rate and the steady-state misalignment, various adaptive kernel width MCC algorithms have been derived to solve this problem. However, the superior performance of these algorithms depends mainly on a specific data range, or comes at the cost of complicated calculations and parameter settings. Thus, this paper proposes a statistics variable kernel width MCC (SVKW-MCC) algorithm to overcome these problems. Specifically, the proposed algorithm calculates the mean and variance of the error signal and removes the data that deviate significantly from the mean. The mean and variance are then recalculated after removing these abnormal data, and the new kernel width is computed from the recalculated mean and variance. Simulation results in system identification and echo cancellation scenarios show that the proposed algorithm outperforms the existing variable kernel width methods. Moreover, the stability and steady-state mean-square performance of the proposed algorithm are analyzed and verified by experiments.

Keywords: Maximum correntropy criterion, variable kernel width, impulsive interferences, statistics variable kernel width, steady-state excess mean square error.

1. Introduction

In recent years, information theoretic learning (ITL) [1, 2] has been applied in non-Gaussian signal processing, especially in impulsive noise environments. The minimum entropy [3-5] and maximum entropy [6-15] criteria are the most widely used in ITL. The maximum correntropy criterion (MCC) is popular for its simplicity and robustness.

Correntropy is defined as the probability of how similar [16] two random variables are in a neighborhood of the joint space controlled by the kernel width; i.e., the kernel width acts as a zoom lens [9], controlling the "observation window" in which similarity is assessed. The smaller the kernel width, the more sensitive the MCC algorithm is to observation errors; conversely, when the kernel width is large enough, the algorithm degrades to the LMS algorithm [22]. The kernel width of MCC therefore leads to a trade-off between learning speed and steady-state accuracy. The selection of a suitable kernel width

Shuyong Zhou, Haiquan Zhao are with the Key Laboratory of Magnetic Suspension Technology and Maglev Vehicle, and the National Rail Transportation
Electrification and Automation Engineering Technology Research Center under Grant NEEC-2019-A02, Ministry of Education, and the School of Electrical
Engineering, Southwest Jiaotong University, Chengdu, 610031, China.
* Corresponding author
E-mail addresses: 2241903@qq.com; hqzhao_swjtu@126.com
is critical to MCC-based algorithms.

In order to resolve the contradiction between steady-state performance and a fast convergence rate, several adaptive kernel width methods have been proposed in [16-21], such as the switch kernel width maximum correntropy criterion (SMCC) algorithm [16], the variable kernel width maximum correntropy criterion (VKW-MCC) algorithm [18], the adaptive kernel width maximum correntropy criterion (AMCC) algorithm [17], and the improved variable kernel width maximum correntropy criterion (IVKW-MCC) algorithm [21]. However, existing kernel width selection methods are not entirely satisfactory, and the aforementioned algorithms have obvious weaknesses: the SMCC and AMCC algorithms perform well only in certain environments, the parameter setting of the VKW-MCC algorithm is very complicated, and the IVKW-MCC algorithm is computationally expensive.

This paper summarizes several variable kernel width MCC algorithms and discusses their basic principles. A close relationship between these algorithms and the LMS algorithm is analyzed and established [22]. Then a new statistics variable kernel width MCC (SVKW-MCC) algorithm is proposed on the basis of previous variable kernel width algorithms. Afterwards, the convergence performance of the algorithm is analyzed, and the steady-state excess mean square error (EMSE) of the SVKW-MCC algorithm is studied based on the energy conservation relation [24-28]. Simulations in system identification and echo cancellation scenarios that include non-Gaussian impulsive interferences show that the proposed algorithm outperforms some well-known variable kernel width algorithms.

2. Review of variable kernel width MCC algorithms

In the MCC algorithm, correntropy is a nonlinear local similarity measure between two random variables X and Y in kernel space [6-9],

$$V(X,Y) = E\left[k(X,Y)\right] = \int k(x,y)\,dF_{X,Y}(x,y) \qquad (1)$$

where E[·] denotes the expectation operator, $F_{X,Y}(x,y)$ is the joint distribution function of X and Y, and k(·,·) is a symmetric positive definite Gaussian kernel controlled by the kernel width $\sigma_0$, defined as [8]:

$$k(x,y) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left(-\frac{(x-y)^2}{2\sigma_0^2}\right). \qquad (2)$$

Like the MSE criterion, the cost function of the MCC algorithm can be defined as [6, 9],

$$J_{MCC}(w_n) = E\left[\exp\left(-\frac{\left(d_n - x_n^T w_n\right)^2}{2\sigma_0^2}\right)\right] \qquad (3)$$
where $e_n = d_n - x_n^T w_n$ is the a priori output error, $w_n$ is the estimate at iteration time n of the unknown system $w_{opt}$, $w_{opt}$ is an L×1 weight vector, E[·] denotes the expectation operator, and $d_n$ is defined as:

$$d_n = x_n^T w_{opt} + v_n \qquad (4)$$

where $d_n$ is the output signal, $x_n$ is the L×1 input vector defined as $x_n = [x_n, x_{n-1}, \ldots, x_{n-L+1}]^T$, and $v_n$ is the Gaussian background noise plus an impulsive component. Similarly to the MSE criterion, we can use the stochastic gradient ascent approach to search for the optimal solution as

$$w_{n+1} = w_n + \mu E\left[\exp\left(-\frac{e_n^2}{2\sigma_0^2}\right)e_n x_n\right]. \qquad (5)$$

According to [9], the weight update equation using the maximum correntropy cost function can be reduced to the simple form

$$w_{n+1} = w_n + \mu\exp\left(-\frac{e_n^2}{2\sigma_0^2}\right)e_n x_n.$$
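As a concrete illustration, the update above can be written in a few lines of NumPy (a minimal sketch, not the authors' code; the step size and kernel width values are arbitrary examples):

```python
import numpy as np

def mcc_update(w, x_n, d_n, mu=0.01, sigma0=2.0):
    """One MCC iteration: w <- w + mu * exp(-e^2 / (2*sigma0^2)) * e * x."""
    e = d_n - x_n @ w                        # a priori error e_n
    g = np.exp(-e**2 / (2.0 * sigma0**2))    # Gaussian weight: ~1 for small errors,
    return w + mu * g * e * x_n              # ~0 for impulsive outliers
```

The factor g is what suppresses impulsive samples: a large error drives it toward zero, so the corresponding update is effectively skipped.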

The switch kernel width maximum correntropy criterion (SMCC) algorithm is obtained by maximizing the following cost function [16]:

$$\max_{\sigma_n} J'_{SMCC}(n) = \frac{1}{\sigma_n^2}\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n^2. \qquad (6)$$

The SMCC algorithm considers $\sigma_n$ as a function of $e_n$ in (6). After a simple calculation, the weight update and the kernel width of the SMCC algorithm can be obtained as:

$$w_{n+1} = w_n + \frac{\mu}{\sigma_n^2}\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n x_n, \qquad \sigma_n^2 = e_n^2/2. \qquad (7)$$

Substituting this kernel width into the weight update equation of the SMCC algorithm yields (8):

$$w_{n+1} = w_n + \frac{2\mu}{e_n}\exp(-1)\,x_n. \qquad (8)$$

In this case, the algorithm is no longer an MCC-based algorithm and, according to (8), it will diverge; therefore, the kernel width of the SMCC algorithm [16] is set as

$$\sigma_n^2 = \max\left(e_n^2/2,\ \sigma_0^2\right), \qquad (9)$$

where $\sigma_0$ is a constant denoting the predetermined kernel width, calculated by Silverman's rule [29] or other methods [30, 31]. The kernel width of the SMCC thus switches between an error-based kernel width and a predetermined kernel width [16].

The kernel width calculation principle of the AMCC algorithm [17] is the same as that of the previous algorithm, and the kernel width of the AMCC algorithm is

$$\sigma_n^2 = e_n^2 + \sigma_0^2. \qquad (10)$$
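In code, the two kernel width rules (9) and (10) are one-liners (a sketch; the value of sigma0 is an assumed example):

```python
def smcc_sigma2(e_n, sigma0=2.0):
    # eq. (9): switch between the error-based and the predetermined width
    return max(e_n**2 / 2.0, sigma0**2)

def amcc_sigma2(e_n, sigma0=2.0):
    # eq. (10): error-based width with a fixed floor term
    return e_n**2 + sigma0**2
```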

However, these algorithms are only robust over the specific data range matching the predetermined kernel width; thus, [18] proposes a variable kernel width MCC (VKW-MCC) algorithm to overcome this problem. The kernel width of the VKW-MCC algorithm is calculated at each iteration by maximizing the following cost function with respect to the kernel width $\sigma_n$:

$$\max_{\sigma_n} J_{VKW}(e_n) = \exp\left(-\frac{e_n^2}{2\sigma_n^2}\right) \qquad (11)$$

In [18], the kernel width $\sigma_n$ is calculated as

$$\sigma_n = \kappa|e_n|. \qquad (12)$$

To prevent the VKW-MCC algorithm from degrading to the LMS algorithm, the VKW-MCC algorithm uses the following method. First, take an error window

$$A_{e,n} = \left[\,|e_n|, |e_{n-1}|, \ldots, |e_{n-N_w+1}|\,\right] \qquad (13)$$

where $N_w$ is the width of the window $A_{e,n}$; then the error threshold is expressed as

$$\bar{e}_n = \lambda\bar{e}_{n-1} + (1-\lambda)\min\left(A_{e,n}\right), \qquad (14)$$

where $0<\lambda<1$ is the smoothing factor and min(·) denotes the sample minimum operation, which helps to remove the impulsive interference-corrupted $e_n$ [18]. The kernel width is then defined as

$$\sigma_n = \begin{cases}\sigma_0, & \text{if } \bar{e}_n \le \sigma_0/\kappa \\ \kappa\bar{e}_n, & \text{if } \bar{e}_n > \sigma_0/\kappa\end{cases} \qquad (15)$$

where $\sigma_0$ is a predetermined constant and $\kappa$ is a constant, arbitrarily set to twenty in [18].
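A compact sketch of the VKW-MCC width rule (13)-(15) follows; the function and variable names are ours, and the smoothing factor value is an assumed example:

```python
import numpy as np

def vkw_sigma(e_window, ebar_prev, sigma0=20.0, kappa=20.0, lam=0.95):
    """VKW-MCC kernel width. e_window holds the last Nw errors."""
    # eq. (14): smoothed minimum of |e| over the window rejects impulses
    ebar = lam * ebar_prev + (1.0 - lam) * np.min(np.abs(e_window))
    # eq. (15): lower-bound the width by the predetermined sigma0
    sigma = sigma0 if ebar <= sigma0 / kappa else kappa * ebar
    return sigma, ebar
```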

3. Proposed algorithm

3.1 Discussion

First, let us discuss the SMCC algorithm. Substituting the kernel width of the SMCC algorithm into the weight update equation, the update can be written in two branches:

$$w_{n+1} = \begin{cases} w_n + \dfrac{2\mu}{e_n}\exp(-1)\,x_n, & \text{if } \dfrac{e_n^2}{2} > \sigma_0^2 \\[2mm] w_n + \dfrac{\mu}{\sigma_0^2}\exp\left(-\dfrac{e_n^2}{2\sigma_0^2}\right)e_n x_n, & \text{if } \dfrac{e_n^2}{2} \le \sigma_0^2 \end{cases} \qquad (16)$$

It is clearly seen from (16) that the update equation of the SMCC algorithm is divided into two segments at $|e_n| = \sqrt{2}\sigma_0$. When $|e_n| \le \sqrt{2}\sigma_0$, the SMCC algorithm is the ordinary MCC algorithm. When $|e_n| > \sqrt{2}\sigma_0$, the SMCC algorithm uses the $1/e_n$ factor to guarantee the robustness of the algorithm in impulsive environments. The convergence speed of the SMCC algorithm can be observed intuitively from the following formula:

$$\Delta_{SMCC} = \frac{\mu}{\sigma_n^2}\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n \qquad (17)$$

Similarly, the convergence speed of the MCC algorithm can be defined as

$$\Delta_{MCC} = \frac{\mu}{\sigma_0^2}\exp\left(-\frac{e_n^2}{2\sigma_0^2}\right)e_n, \qquad (18)$$

where $\sigma_0$ is a constant. The curves of (17) and (18) are depicted in Fig. 1. It can be seen from Fig. 1 that the SMCC algorithm is essentially similar to the MCC algorithm; the SMCC algorithm only works better than the MCC algorithm for certain specific data.

We now analyse the AMCC algorithm. Substituting (10) into the weight update equation (7), the AMCC algorithm [17] can be expressed as

$$w_{n+1} = w_n + \frac{\mu}{e_n^2+\sigma_0^2}\exp\left(-\frac{e_n^2}{2\left(e_n^2+\sigma_0^2\right)}\right)e_n x_n. \qquad (19)$$

The convergence speed of the AMCC algorithm is defined as

$$\Delta_{AMCC} = \frac{\mu}{e_n^2+\sigma_0^2}\exp\left(-\frac{e_n^2}{2\left(e_n^2+\sigma_0^2\right)}\right)e_n. \qquad (20)$$

The curves of (20) under different $\sigma_0$ are depicted in Fig. 2. It can be seen from Fig. 2 that the peak of the curve always lies between errors of -1 and 1, no matter how the kernel width changes. Hence the algorithm only maintains robustness against impulsive noise over a special data range. Simulation results show that when the data deviate from this range, the convergence is even worse than that of the conventional MCC algorithm.
Fig. 1. The curves of $\Delta_{MCC}$ and $\Delta_{SMCC}$ versus $e_n$.

Fig. 2. The curves of $\Delta_{AMCC}$ versus $e_n$ with different $\sigma_0$.
3.2. Proposed SVKW-MCC algorithm

Because the existing variable kernel width algorithms have the shortcomings discussed above, the following statistics variable kernel width algorithm (SVKW-MCC) is proposed based on statistical probability. The MCC algorithm can be regarded as a variable step size algorithm [22], whose equivalent step size is expressed as:

$$\mu_n = \mu\exp\left(-\frac{e_n^2}{2\sigma^2}\right) \qquad (21)$$

From a mathematical point of view, this exponential term has the form of a Gaussian function with zero mean and standard deviation $\sigma$, which is usually defined as the kernel width of the MCC algorithm. A simulation result in Fig. 3 illustrates the influence of different kernel widths. It is obvious that the equivalent step size $\mu_n$ decreases rapidly as the error increases; when the error exceeds three times the kernel width, it almost decays to zero. This is the reason why the MCC algorithm is robust against impulsive noise. At the same time, the curve clearly displays the inherent shortcomings of the MCC algorithm.
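For example, at $|e_n| = 3\sigma$ the exponential factor equals $\exp(-9/2)\approx 0.011$, so the effective step size is only about 1% of its maximum value; a large impulsive error is thus almost completely ignored, but a legitimately large error during the transient phase is slowed down in the same way.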

As the filter iterates, the error decreases while the step size increases. The relationship between the error and the iteration speed can be observed through the function $\mu_n e_n$:

$$\mu_n e_n = \mu\exp\left(-\frac{e_n^2}{2\sigma^2}\right)e_n \qquad (22)$$

The simulation result of (22) is shown in Fig. 4. It can be seen from Fig. 4 that the iteration speed is fastest when the error is equal to the kernel width. From a mathematical point of view, this is derived from

$$\max_{\sigma_n} J_{SVKW\text{-}MCC}(e_n) = \exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n. \qquad (23)$$
 

We take the derivative of (23) with respect to $e_n$ and set it equal to zero,

$$1 - e_n^2/\sigma_n^2 = 0. \qquad (24)$$

The kernel width is obtained by solving (24):

$$\sigma_n = |e_n| \qquad (25)$$

The new variable kernel width update formula is then derived as:

$$w_{n+1} = w_n + \mu\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n x_n, \qquad \sigma_n = |e_n|. \qquad (26)$$

Observing (26), note that with $\sigma_n = |e_n|$ the exponential term becomes the constant $\exp(-1/2)$, so (26) reverts to the LMS algorithm, and when impulsive interferences occur the algorithm loses robustness. To avoid this drawback, the SMCC and AMCC algorithms change the form of the kernel width. However, it is well known that the LMS algorithm performs excellently in the ordinary Gaussian environment. The fundamental point is that the kernel width should not be set equal to the error when impulsive noise occurs.
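A quick numerical check of (23)-(25) (a sketch, not part of the paper) confirms that, for a fixed kernel width, the update speed peaks exactly where the error equals the kernel width:

```python
import numpy as np

sigma = 1.5                                   # any fixed kernel width
e = np.linspace(0.01, 10.0, 100000)
speed = np.exp(-e**2 / (2.0 * sigma**2)) * e  # update-speed curve, cf. (23)
print(e[np.argmax(speed)])                    # ~1.5, i.e. the maximum is at e_n = sigma_n
```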

Fig. 3. Equivalent step size and error relationship curve.

Fig. 4. The curves of $\mu_n e_n$ of the MCC algorithm versus $e_n$ with different $\sigma$.


If the kernel width is set equal to the error at the current iteration, the algorithm will diverge under impulsive noise. When calculating the kernel width, the most important principle is to remove the impact of impulsive noise on the error, so we propose that the optimal kernel width should equal the mean absolute error after the impulsive impacts have been removed:

$$\sigma_n = E\left[|e_n|\right] \qquad (27)$$

The error $e_n$ can be regarded as a function of $\sigma_n$ as shown below:

$$e_n = \beta E\left[|e_n|\right] \qquad (28)$$

where $\beta$ is a coefficient: when the error is normal, it is close to one; when the error is an impulse, it is very large. Substituting (28) and (27) into the update formula of the variable kernel width algorithm gives

$$w_{n+1} = w_n + \mu\exp\left(-\frac{\beta^2}{2}\right)\beta E\left[|e_n|\right]x_n. \qquad (29)$$

From (29), when the error is corrupted only by white Gaussian noise, the coefficient $\beta$ is small and the algorithm decays to the LMS algorithm, achieving the optimal convergence effect. When the error is corrupted by impulsive noise, the exponential term converges quickly to near zero, which gives better robustness. The equivalent step-size coefficient of (29) is expressed as:

$$\mu_n = \mu\beta\exp\left(-\frac{\beta^2}{2}\right) \qquad (30)$$

It can be seen from Fig. 5 that the step size is largest when the error is equal to the mean value, and decreases sharply when the error deviates from the mean value.

Fig.5. Equivalent step size curve of SVKW-MCC.

According to probability theory, most data lie close to the mean:

$$P\left(\big|\,|e_n| - E\left[|e_n|\right]\big| \le 3\sqrt{D(|e_n|)}\right) \ge 0.95 \qquad (31)$$

In (31), P(·) denotes probability, E(·) represents the mean value, and D(·) denotes the variance. According to (31), most of the $e_n$ fluctuate near the mean value during the update process. In order to make most of the $e_n$ fall within the kernel width, the kernel width of SVKW-MCC is set to

$$\sigma_n = E\left[|e_n|\right] + 3\sqrt{D(|e_n|)}. \qquad (32)$$

According to (32), the mean and variance of the absolute error need to be calculated. In order to estimate them at the current iteration, we take a stretch of error data starting from time n. The window is defined as:

$$A_{e,n} = \left[e_n, e_{n-1}, e_{n-2}, \ldots, e_{n-i}, \ldots, e_{n-N_w+1}\right] \qquad (33)$$

Potential impulsive noise inside the window can lead to serious distortion of the mean and variance. Since impulses are rare, they can be regarded as a small part of the data lying far from the mean value. The mean and standard deviation of the error window are calculated as:

$$E_A(n) = E\left(A_{e,n}\right), \qquad D_A(n) = \sqrt{D\left(A_{e,n}\right)}. \qquad (34)$$

The proposed algorithm zeros the data that deviate far from the mean of the error:

$$e_{n-i} = \begin{cases} 0, & \text{if } |e_{n-i}| > E_A(n) + 3D_A(n) \\ e_{n-i}, & \text{if } |e_{n-i}| \le E_A(n) + 3D_A(n) \end{cases} \qquad (35)$$

More accurate estimates of the mean and standard deviation than the first calculation are obtained by recalculating $E_A(n)$ and $D_A(n)$ on the cleaned data; based on experience, this is repeated three times. To guarantee a smooth update of the kernel width, the sliding average of (36) is used:

$$\sigma_n = \gamma\sigma_{n-1} + (1-\gamma)\left(E_A(n) + 3D_A(n)\right) \qquad (36)$$

where $\gamma$ is a constant coefficient close to 1. The proposed algorithm is summarized in Table 1 below.
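In code, the repeated trimming of (34)-(35) can be sketched as follows (a minimal sketch; the variable names are ours):

```python
import numpy as np

def trimmed_stats(window, n_pass=3):
    """Mean and standard deviation of |e| after zeroing 3-sigma outliers, cf. (34)-(35)."""
    a = np.abs(np.asarray(window, dtype=float))
    for _ in range(n_pass):
        ea, da = a.mean(), a.std()                # E_A(n) and D_A(n) of the current data
        a = np.where(a > ea + 3.0 * da, 0.0, a)   # zero samples attributed to impulses
    return ea, da
```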

TABLE 1. A simple summary of the SVKW-MCC algorithm

For each iteration n:
    $e_n = d_n - x_n^T w_n$
    $A_{e,n} = [e_n, e_{n-1}, e_{n-2}, \ldots, e_{n-N_w+1}]$
    For j = 1 : 3
        $E_A(n) = E(A_{e,n})$,  $D_A(n) = \sqrt{D(A_{e,n})}$
        For i = 1 : $N_w$
            If $|A_{e,n}(i)| > E_A(n) + 3D_A(n)$
                $A_{e,n}(i) = 0$
            End if
        End for
    End for
    $\sigma_n = \gamma\sigma_{n-1} + (1-\gamma)\left(E_A(n) + 3D_A(n)\right)$
    $w_{n+1} = w_n + \mu\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n x_n$
End for
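For concreteness, a complete NumPy sketch of Table 1 is given below, reusing trimmed_stats from the sketch above (step size, window length, smoothing factor, and initial kernel width are assumed example values, not prescriptions from the paper):

```python
import numpy as np

def svkw_mcc(x, d, L=64, mu=0.006, Nw=64, gamma=0.99):
    """SVKW-MCC adaptive filter following Table 1 (gamma is the factor in (36))."""
    w = np.zeros(L)
    sigma = 1.0                                    # initial kernel width (assumed)
    e_win = np.ones(Nw)                            # window A_{e,n} of recent errors
    for n in range(L - 1, len(d)):
        x_n = x[n - L + 1:n + 1][::-1]             # input vector x_n
        e = d[n] - x_n @ w                         # a priori error e_n
        e_win = np.roll(e_win, 1)
        e_win[0] = e
        ea, da = trimmed_stats(e_win)              # E_A(n), D_A(n) after outlier removal
        sigma = gamma * sigma + (1 - gamma) * (ea + 3 * da)   # eq. (36)
        w += mu * np.exp(-e**2 / (2 * sigma**2)) * e * x_n    # MCC update with sigma_n
    return w
```

Note that, as in Table 1, the trimmed samples are zeroed rather than discarded, so they still enter the mean; this biases the statistics slightly downward but keeps the window length fixed.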

4. Performance analysis

In this section, we analyze the stability and steady-state mean-square performance of the proposed SVKW-MCC algorithm. As for the MCC algorithm [6-15], the weight vector update of the SVKW-MCC algorithm can be written as

$$w_{n+1} = w_n + \mu f(e_n)x_n \qquad (37)$$

where, according to the proposed algorithm, $f(e_n)$ is defined as

$$f(e_n) = \exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)e_n, \qquad (38)$$

where $\sigma_n = E\left[|e_n|\right] + 3\sqrt{D(|e_n|)}$ is the calculated kernel width. To analyze the convergence in the mean-square sense, we define the weight error vector $\tilde{w}_n$ as

$$\tilde{w}_n = w_{opt} - w_n \qquad (39)$$

where $w_{opt}$ is the system parameter vector to be identified. Subtracting both sides of (37) from $w_{opt}$ yields

Wn 1  Wn   f  e n  xn (40)

(41) is obtained by taking the squared norm and the expectation of both sides of (40) [25-28]:

$$E\left[\|\tilde{w}_{n+1}\|^2\right] = E\left[\|\tilde{w}_n\|^2\right] - 2\mu E\left[e_{a,n}f(e_n)\right] + \mu^2 E\left[f^2(e_n)\|x_n\|^2\right]. \qquad (41)$$

4.1. Stability analysis

In (41), $e_{a,n} = \tilde{w}_n^T x_n$ and $e_n = e_{a,n} + v_n$, where $v_n$ is the Gaussian background noise plus impulsive noise in the SVKW-MCC algorithm. To simplify the calculation, we only consider the white Gaussian noise. Convergence of the SVKW-MCC algorithm in the mean-square sense is guaranteed if the squared weight error satisfies

$$E\left[\|\tilde{w}_{n+1}\|^2\right] \le E\left[\|\tilde{w}_n\|^2\right]. \qquad (42)$$

To ensure (42), a mean-square convergence condition is obtained by requiring

$$-2\mu E\left[e_{a,n}f(e_n)\right] + \mu^2 E\left[f^2(e_n)\|x_n\|^2\right] \le 0. \qquad (43)$$

Then the mean-square convergence condition is given by

$$0 < \mu \le \frac{2E\left[e_{a,n}f(e_n)\right]}{E\left[f^2(e_n)\|x_n\|^2\right]}. \qquad (44)$$

In order to solve (44), the following assumptions, widely used in the theoretical analysis of adaptive filters [17, 18], are adopted:

A1: The noise sequence $\{v_n\}$ is independent, identically distributed, and independent of the input sequence $\{x_n\}$.

A2: The filter is long enough that the a priori error $e_{a,n}$ is zero-mean Gaussian and independent of the noise $\{v_n\}$.

A3: $\|x_n\|^2$ and $f^2(e_n)$ are asymptotically uncorrelated [21].

Substituting $e_n = e_{a,n} + v_n$ and (38) into the numerator of (44), after a simple calculation we obtain

$$\lim_{n\to\infty} 2E\left[e_{a,n}f(e_n)\right] = \lim_{n\to\infty} 2E\left[\left(e_{a,n}^2 + e_{a,n}v_n\right)\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)\right]. \qquad (45)$$

According to assumption A2, $\lim_{n\to\infty} E\left[e_{a,n}v_n\right] = 0$. Substituting this into (45) yields

$$\lim_{n\to\infty} 2E\left[e_{a,n}f(e_n)\right] = \lim_{n\to\infty} 2E\left[e_{a,n}^2\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)\right]. \qquad (46)$$

According to assumption A3, the denominator of (44) can be factored as:

$$E\left[f^2(e_n)\|x_n\|^2\right] = E\left[f^2(e_n)\right]E\left[\|x_n\|^2\right] \qquad (47)$$

Using $e_n = e_{a,n} + v_n$ and (38), $E\left[f^2(e_n)\right]$ can be calculated as:

$$E\left[f^2(e_n)\right] = E\left[\left(e_{a,n}+v_n\right)^2\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right] \qquad (48)$$

$$E\left[f^2(e_n)\right] = E\left[\left(e_{a,n}^2 + 2e_{a,n}v_n + v_n^2\right)\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right] \qquad (49)$$

According to assumption A2, substituting $E\left[e_{a,n}v_n\right] = 0$ into (49) gives

$$E\left[f^2(e_n)\right] = E\left[\left(e_{a,n}^2 + v_n^2\right)\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right] \qquad (50)$$

Substituting (46) and (50) into (44), we can obtain

$$0 < \mu \le \frac{2E\left[e_{a,n}^2\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)\right]}{E\left[\left(e_{a,n}^2 + v_n^2\right)\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right]E\left[\|x_n\|^2\right]} \qquad (51)$$

In particular, if the variance of the background noise is much smaller than the a priori error, i.e., $E\left[v_n^2\right] \ll E\left[e_{a,n}^2\right]$, we have $e_{a,n}^2 + v_n^2 \approx e_{a,n}^2$, and (51) gives rise to

$$0 < \mu \le \frac{2}{E\left[\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right]E\left[\|x_n\|^2\right]} \qquad (52)$$

where $E\left[\|x_n\|^2\right] = E\left[\mathrm{Tr}(R_{xx})\right]$, Tr(·) denotes the trace of a matrix, and $R_{xx}$ is the covariance matrix of the input vectors; for a unit-variance input we have $E\left[\mathrm{Tr}(R_{xx})\right] = L$ [26].
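As a quick sanity check, since the exponential term is at most one, (52) is implied by the conservative bound $\mu \le 2/L$; with L = 64 as in Section 5, this gives $\mu \le 0.031$, which the SVKW-MCC step sizes used there ($\mu \le 0.03$) satisfy.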

4.2. Steady-state excess mean square error (EMSE) analysis

As the filter reaches the steady state [21], we have

$$E\left[\|\tilde{w}_{n+1}\|^2\right] = E\left[\|\tilde{w}_n\|^2\right]. \qquad (53)$$

Using (41), it holds that

$$-2\mu E\left[e_{a,n}f(e_n)\right] + \mu^2 E\left[f^2(e_n)\|x_n\|^2\right] = 0 \qquad (54)$$

Comparing (54) with (43), the same steps that led to (51) give

$$\mu = \frac{2E\left[e_{a,n}^2\exp\left(-\frac{e_n^2}{2\sigma_n^2}\right)\right]}{E\left[\left(e_{a,n}^2 + v_n^2\right)\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right]E\left[\|x_n\|^2\right]}. \qquad (55)$$

The steady-state behavior of an adaptive filtering algorithm is generally evaluated by the excess mean square error (EMSE), which can be defined as

$$S = \lim_{n\to\infty}E\left[e_{a,n}^2\right] \qquad (56)$$

Combining (56) with (55) and taking the limit of both sides of (55), we can obtain

$$S = \lim_{n\to\infty}\frac{\mu E\left[\mathrm{Tr}(R_{xx})\right]E\left[\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)v_n^2\right]}{2 - \mu E\left[\mathrm{Tr}(R_{xx})\right]E\left[\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right]}. \qquad (57)$$
The input signal is generated from a zero-mean Gaussian distribution with unit variance, and the length of the system to be identified is L. Let $E\left[v_n^2\right] = \sigma_v^2$ and $\lim_{n\to\infty}E\left[\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right] = \varphi_e$, where $\sigma_v^2$ is the variance of the background noise and $\varphi_e$ is the steady-state value of the exponential term. Therefore, (57) simplifies to

$$S = \frac{\mu L\varphi_e\sigma_v^2}{2 - \mu L\varphi_e}. \qquad (58)$$

Substituting $\sigma_n = E\left[|e_n|\right] + 3\sqrt{D(|e_n|)}$ into $\varphi_e$, we can obtain

$$\varphi_e = \lim_{n\to\infty}E\left[\exp\left(-\frac{e_n^2}{\left(E\left[|e_n|\right] + 3\sqrt{D(|e_n|)}\right)^2}\right)\right]. \qquad (59)$$

In steady state the error obeys a normal distribution with zero mean and variance $\sigma_e^2$, so $\lim_{n\to\infty}E\left[e_n^2\right] = \sigma_e^2$. According to probability and statistics theory, $E\left[|e_n|\right] = \sqrt{\frac{2}{\pi}}\sigma_e$ and $D(|e_n|) = \left(1-\frac{2}{\pi}\right)\sigma_e^2$; these results are derived as follows. Starting from the definition,

$$E\left[|e_n|\right] = \int_{-\infty}^{\infty}|e_n|f(e_n)\,de_n = \int_{-\infty}^{\infty}|e_n|\frac{1}{\sqrt{2\pi}\,\sigma_e}\exp\left(-\frac{e_n^2}{2\sigma_e^2}\right)de_n. \qquad (60)$$

Observing the above formula, the integrand has obvious symmetry, so we can deduce that

$$E\left[|e_n|\right] = 2\int_{0}^{\infty}e_n\frac{1}{\sqrt{2\pi}\,\sigma_e}\exp\left(-\frac{e_n^2}{2\sigma_e^2}\right)de_n = \frac{2\sigma_e}{\sqrt{2\pi}}\int_{0}^{\infty}\exp\left(-\frac{e_n^2}{2\sigma_e^2}\right)d\!\left(\frac{e_n^2}{2\sigma_e^2}\right). \qquad (61)$$

After taking the integral in (61), we can obtain

$$E\left[|e_n|\right] = \frac{2\sigma_e}{\sqrt{2\pi}}\left[-\exp\left(-\frac{e_n^2}{2\sigma_e^2}\right)\right]_{0}^{\infty} = \sqrt{\frac{2}{\pi}}\sigma_e. \qquad (62)$$

According to the variance formula, we can obtain

$$D(|e_n|) = E\left[|e_n|^2\right] - \left(E\left[|e_n|\right]\right)^2 = D(e_n) - \left(E\left[|e_n|\right]\right)^2 \qquad (63)$$
 

Substituting (62) into (63) yields

 2
D( en )  1    e2 (64)
 
Substituting (62) and (64) into (59) yields

$$\varphi_e = \exp\left(-\frac{1}{9 - \frac{16}{\pi} + 6\sqrt{\frac{2}{\pi}\left(1-\frac{2}{\pi}\right)}}\right) \qquad (65)$$

$$\varphi_e = \lim_{n\to\infty}E\left[\exp\left(-\frac{e_n^2}{\sigma_n^2}\right)\right] \approx 0.8 \qquad (66)$$

Substituting (66) into (58), the excess mean square error (EMSE) is obtained:

$$S = \frac{0.8\mu L\sigma_v^2}{2 - 0.8\mu L} \qquad (67)$$
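The constants in (62)-(67) can be checked numerically (a sketch using the Section 5.4 parameter values; not part of the paper's code):

```python
import numpy as np

e = np.random.randn(1_000_000)                   # unit-variance Gaussian errors (sigma_e = 1)
print(np.abs(e).mean(), np.sqrt(2 / np.pi))      # E|e_n| ~ 0.798, cf. (62)
print(np.abs(e).var(), 1 - 2 / np.pi)            # D(|e_n|) ~ 0.363, cf. (64)

denom = 9 - 16 / np.pi + 6 * np.sqrt(2 / np.pi * (1 - 2 / np.pi))
phi_e = np.exp(-1 / denom)                       # eq. (65): ~0.86, rounded to 0.8 in (66)

mu, L, sv2 = 0.005, 64, 0.001                    # Section 5.4 values
S = phi_e * mu * L * sv2 / (2 - phi_e * mu * L)  # steady-state EMSE, cf. (58)
print(phi_e, 10 * np.log10(S))                   # about -38 dB
```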

Fig. 6. Steady-state excess mean square error and step-size curve when $\sigma_v^2 = 0.001$.

Fig. 7. Noise variance and steady-state excess mean square error curve.
The normalized mean square deviation is defined as $\mathrm{NMSD}(n) = 10\log_{10}\left(\|w_{opt}-w_n\|^2/\|w_{opt}\|^2\right)$ in the simulations. The system parameters obey a uniform distribution between zero and one half, and the order of the system is 64 taps (L = 64). The input signal follows a Gaussian distribution with zero mean and unit variance. Halfway through each run, the system parameters suddenly change sign. Furthermore, the desired signal is disturbed by Gaussian white noise with a signal-to-noise ratio of 30 dB. The impulsive noise $\eta(n)$ is modeled by the Bernoulli-Gaussian (BG) distribution, $\eta(n) = c(n)A(n)$, where $A(n)$ is Gaussian with zero mean and a standard deviation of 100, and $c(n)$ is a Bernoulli process with the probability density function defined by $P(c(n)=1) = P_r$ and $P(c(n)=0) = 1-P_r$, with $P_r = 0.01$ [13]. All simulation results are averaged over 300 independent trials.
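The impulsive noise model above can be generated as follows (a minimal sketch; function and variable names are ours):

```python
import numpy as np

def bg_noise(n_samples, pr=0.01, std=100.0):
    """Bernoulli-Gaussian impulsive noise: eta(n) = c(n) * A(n)."""
    c = np.random.rand(n_samples) < pr      # Bernoulli process, P(c(n) = 1) = pr
    A = std * np.random.randn(n_samples)    # zero-mean Gaussian, standard deviation 100
    return c * A
```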

The initial parameter values of the algorithms are listed in Table 2 below:


TABLE 2. Simulation parameter settings

Algorithm | Gaussian background | Colored signals | Speech signal
--------- | ------------------- | --------------- | -------------
MCC       | μ = 0.0008, σ = 2   | μ = 0.0009, σ = 2   | μ = 0.0009, σ = 2.1
MCC       | μ = 0.0008, σ = 4   | μ = 0.0009, σ = 4   | —
SMCC      | μ = 0.003, σ = 1.8  | μ = 0.003, σ = 1.8  | μ = 0.03, σ = 1.5
AMCC      | μ = 0.0048, σ = 1.8 | μ = 0.0049, σ = 1.8 | μ = 0.1, σ = 2.1
VKW-MCC   | μ = 0.008, Nw = 26, σ0 = 20, K = 20 | μ = 0.008, Nw = 26, σ0 = 20, K = 20 | μ = 0.03, Nw = 26, σ0 = 1, K = 40
SVKW-MCC  | μ = 0.006, Nw = 64  | μ = 0.005, Nw = 64  | μ = 0.03, Nw = 512

5. Simulation results

5.1. System identification in Gaussian noise background

Fig. 8 shows the system identification results for a Gaussian random input signal with a signal-to-noise ratio of 30 dB and impulsive noise with an occurrence probability of 0.01.


Fig. 8. Experimental results for the MCC, SMCC, AMCC, VKW-MCC, and proposed SVKW-MCC algorithms in the Gaussian scenario, where the system parameter is suddenly changed from $w_{opt}$ to $-w_{opt}$ at the 30000th iteration. (a) System impulse response. (b) NMSD learning curves. (c) Kernel width curve.

From the above simulation, it is clearly seen that a constant kernel width forces the MCC algorithm into a contradiction between the convergence rate and the steady-state performance.

When the system to be identified is changed to the impulse response shown in Fig. 9(a) and the variance of the impulsive noise is 1000, the simulation results are as follows:

Fig. 9. Experimental results for the MCC, SMCC, AMCC, VKW-MCC, and proposed SVKW-MCC algorithms in the Gaussian scenario, where the system parameter is suddenly changed from $w_{opt}$ to $-w_{opt}$ at the 20000th iteration. (a) System impulse response. (b) NMSD learning curves. (c) Kernel width curve.

The SMCC and AMCC algorithms perform worse than MCC under this particular condition. Experiments show that the VKW-MCC algorithm can surpass the MCC algorithm only with appropriately chosen parameters, and such parameters are very difficult to set. The proposed algorithm needs neither this prior knowledge nor careful parameter tuning. The convergence speed and steady-state performance of the proposed algorithm outperform those of the above algorithms in the Gaussian noise environment. From the kernel width curve, an approximately linear relationship between the kernel width and the convergence curve can be observed.

5.2. System identification under colored signals


When the input is an AR(1) colored process with a pole at -0.7 in the background of impulsive noise, the simulation results are as follows:


Fig. 10. Experimental results for the MCC, SMCC, AMCC, VKW-MCC, and proposed SVKW-MCC algorithms in the correlated-input scenario, where the system parameter is suddenly changed from $w_{opt}$ to $-w_{opt}$ at the 30000th iteration. (a) NMSD learning curves. (b) Kernel width curve.

The above figure shows that the convergence speed and steady-state performance of the proposed algorithm are better than those of the other algorithms under correlated input conditions.
5.3. Echo cancellation simulation

In this section, a speech signal is used as the input, and an abrupt change of the echo path occurs at the $4\times10^5$-th input sample. Fig. 11 compares the algorithms in this case; the parameters are set as in Table 2. Evidently, compared with the other variable kernel width algorithms, the proposed algorithm much better resolves the trade-off between the convergence rate/tracking capability and the steady-state error.


Fig. 11. Experimental results for the MCC, SMCC, AMCC, VKW-MCC, and proposed SVKW-MCC algorithms in an echo cancellation scenario, where the system parameter is suddenly changed from $w_{opt}$ to $-w_{opt}$ at the 400000th iteration. (a) Impulse response of the echo channel. (b) Input speech signal for the echo cancellation experiment. (c) NMSD learning curves.

The above figure shows that the convergence speed and steady-state performance of the proposed algorithm are better than those of the other algorithms in the speech signal environment.

5.4. Excess mean square error (EMSE) simulation


Most of the simulation conditions are the same as in Section 5.1, but here the performance measure is the EMSE, defined as $\mathrm{EMSE}(n) = 10\log_{10}E\left[\left(\left(w_{opt}-w_n\right)^T x_n\right)^2\right]$, which differs from the $\mathrm{NMSD}(n) = 10\log_{10}\left(\|w_{opt}-w_n\|^2/\|w_{opt}\|^2\right)$ used in the previous sections. The main simulation conditions of the proposed SVKW-MCC algorithm are set as follows: $\mu = 0.005$, $L = 64$, $\sigma_v^2 = 0.001$.

Fig. 12. Experimental results for the MCC, SMCC, AMCC, VKW-MCC, and proposed SVKW-MCC algorithms in the Gaussian scenario, where the system parameter is suddenly changed from $w_{opt}$ to $-w_{opt}$ at the 20000th iteration.

Because the analysis concerns the steady-state performance after the algorithm has stabilized, a straight line is used to represent the analyzed steady-state value. This line is therefore not mathematically meaningful during the transient phase of the EMSE simulation, but it can be seen that the steady-state simulation results are close to the theoretical values.

6. Conclusion

In this paper, a statistics variable kernel width maximum correntropy criterion (SVKW-MCC) algorithm is proposed to address the shortcomings of some well-known variable kernel width algorithms. The SVKW-MCC algorithm eliminates the abnormal errors caused by impulsive noise using a statistical method, and then calculates the optimal kernel width from the mean and variance of the remaining errors. Simulation results show that the proposed algorithm has obvious advantages in the Gaussian environment. Since the analysis is based on the assumption of a Gaussian distribution, the parameters still need to be adjusted to achieve better results for non-Gaussian inputs such as colored signals and speech. Notwithstanding that, the algorithm can be extended to other MCC algorithms or M-estimation algorithms. System identification and echo cancellation experiments show that the new algorithm is simple to compute, widely applicable, and better in both convergence speed and steady-state performance.

Acknowledgment

This work was partially supported by the National Science Foundation of P.R. China (Grants 61871461, 61571374, 61433011) and the Sichuan Science and Technology Program (Grant 19YYJC0681).

References
[1] Principe J C. Information theoretic learning: Renyi's entropy and kernel perspectives. Springer Science & Business Media, 2010.
[2] B. Chen, Y. Zhu, J. Hu, J.C. Principe. System parameter identification: information criteria and algorithms. Newnes, 2013.
[3] D. Erdogmus and J.C. Principe, "Generalized information potential criterion for adaptive system training," Neural Networks, IEEE
Transactions on. 13.5(2002). pp.1035-1044
[4] B. Chen, J. Hu, L. Pu, Z. Sun. "Stochastic gradient algorithm under (h, φ)-entropy criterion." Circuits, Systems & Signal Processing 26.6 (2007): pp. 941-960.
[5] B. Chen, P. Zhu, J.C. Principe. "Survival information potential: a new criterion for adaptive system training." Signal Processing, IEEE Transactions on, 60.3 (2012): pp. 1184-1194.
[6] W. Liu, P.P. Pokharel, J.C. Principe. "Correntropy: A localized similarity measure." Neural Networks, 2006. IJCNN'06. International Joint Conference on. IEEE, 2006. pp. 4919-4924.
[7] Wang W, Zhao H, Zeng X. "Geometric Algebra Correntropy: Definition and Application to Robust Adaptive Filtering." IEEE
Transactions on Circuits and Systems II: Express Briefs, 2019.
[8] I. Santamaría, P.P. Pokharel, J.C. Príncipe. "Generalized correlation function: definition, properties, and application to blind
equalization." Signal Processing, IEEE Transactions on, 54.6 (2006).pp.2187-2197.
[9] W. Liu, P.P. Pokharel, J.C. Principe. "Correntropy: properties and applications in non-Gaussian signal processing." Signal Processing,
IEEE Transactions on, 55.11 (2007): pp.5286-5298
[10] Gogineni V. C, Mula S. “Improved proportionate-type sparse adaptive filtering under maximum correntropy criterion in impulsive
noise environments,” Digital Signal Processing, 2018, 79: 190-198.
[11] Li Y, Jiang Z, Shi W, et al. “Blocked maximum correntropy criterion algorithm for cluster-sparse system identifications,” IEEE
Transactions on Circuits and Systems II: Express Briefs, 2019, 66(11): 1915-1919.
[12] Li Y, Wang Y, Yang R, et al. “A soft parameter function penalized normalized maximum correntropy criterion algorithm for sparse
system identification,” Entropy, 2017, 19(1): 45.
[13] Chen B, Liu X, Zhao H, et al. “Maximum correntropy Kalman filter,” Automatica, 2017, 76: 70-77.
[14] Ma W, Chen B, Duan J, et al. “Diffusion maximum correntropy criterion algorithms for robust distributed estimation,” Digital Signal
Processing, 2016, 58: 10-19.
[15] Liu C, Qi Y, Ding W. “The data-reusing MCC-based algorithm and its performance analysis,” Chinese Journal of Electronics, 2016,
25(4): 719-725.
[16] Wang W, J. Zhao, H. Qu, B. Chen, J. C. Principe "A switch kernel width method of correntropy for channel estimation." Proc. IEEE
International Joint Conference on Neural Networks, (IJCNN’15), Killarney, Ireland. p. 1–7.
[17] W. Wang, J. Zhao, H. Qu, B. Chen, and J. C. Principe, “An adaptive kernel width update method of correntropy for channel
estimation,” in Proc. Inter. Conf. Digit. signal process. (ICDSP), 2015, pp. 916–920.
[18] F. Huang, J. Zhang, and S. Zhang, “Adaptive filtering under a variable kernel width maximum correntropy criterion,” IEEE
Transactions on Circuits and Systems II: Express Briefs, vol. 64, no. 10, pp. 1247–1251, 2017.
[19] L. Shi and Y. Lin, “Convex combination of adaptive filters under the maximum correntropy criterion in impulsive interference,”
IEEE Signal Process. Lett., vol. 21, no. 11, pp. 1385–1388, 2014.
[20] S. Zhao, B. Chen, and J. C. Principe, “An adaptive kernel width update for correntropy,” in Proc. Inter. Joint Conf. Neural Netw.,
2012, pp.1–5.
[21] Shi L, Zhao H, Y. Zakharov. “An improved variable kernel width for maximum correntropy criterion algorithm.” IEEE Transactions
on Circuits and Systems II, Exp. Briefs, to be published. doi: 10.1109/TCSII.2018.2880564.
[22] Chen B, Xing L, Zhao H, et al. “Generalized correntropy for robust adaptive filtering.” IEEE Transactions on Signal Processing,
2016, 64(13): 3376-3387.
[23] S. Wang, L. Dang, B. Chen, S. Duan, L. Wang, and C. K. Tse, “Random Fourier filters under maximum correntropy criterion,” IEEE
Transactions on Circuits and Systems I: Regular Papers., DOI:10.1109/TCSI.2018.2825241, 2018.
[24] Al-Naffouri TY, Sayed AH. “Adaptive filters with error non-linearities: Mean square analysis and optimum design.” EURASIP J
Adv Signal Process 2001;2001(1):192–205.
[25] Chen B, Xing L, Liang J, Zheng N, Principe JC. “Steady-state mean-square error analysis for adaptive filtering under the maximum
correntropy criterion.” IEEE Signal Process Letters, 2014, 21(7):880–4.
[26] Wang W, Zhao J, Qu H, et al. “Convergence performance analysis of an adaptive kernel width MCC algorithm.“ AEU-International
Journal of Electronics and Communications, 2017, 76: 71-76.
[27] Wang W, Zhao H, Zeng X, et al. “Steady-State Performance Analysis of Nonlinear Spline Adaptive Filter under Maximum
Correntropy Criterion,” IEEE Transactions on Circuits and Systems II: Express Briefs, 2019.
[28] Khalili A, Rastegarnia A, Islam M K, et al. “Steady-state tracking analysis of adaptive filter with maximum correntropy criterion,”
Circuits, Systems, and Signal Processing, 2017, 36(4): 1725-1734.
[29] B.W. Silverman, "Density estimation for statistics and data analysis," vol. 3, Chapman and Hall, London, 1986.
[30] M.C. Jones, J.S. Marron, and S.J. Sheather, "A brief survey of bandwidth selection for density estimation," Journal of the American Statistical Association, 91.433 (1996): pp. 401-407.
[31] A.W. Bowman, "An alternative method of cross-validation for the smoothing of density estimates," Biometrika, 71.2 (1984): pp. 353-360.
Ethical Statement

Dear Editor:

I certify that this manuscript is original and has not been published and will not be submitted elsewhere for

publication while being considered by Signal Processing. And the study is not split up into several parts to

increase the quantity of submissions and submitted to various journals or to one journal over time. No data have

been fabricated or manipulated (including images) to support my conclusions. No data, text, or theories by others

are presented as if they were our own. The submission has been received explicitly from all co-authors. And

authors whose names appear on the submission have contributed sufficiently to the scientific work and therefore

share collective responsibility and accountability for the results.

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent was obtained from all individual participants included in the study.
