
DESIGN OF FPGA HARDWARE FOR A REAL-TIME

BLIND SOURCE SEPARATION OF FETAL ECG SIGNALS


Charayaphan Chareonsak, Yu Wei, Xiong Bing, and Farook Sattar
School of Electrical and Electronic Engineering, Nanyang Technological University,
Nanyang Avenue, Singapore 639798.
E-mail: {ecchara, efsattar}@ntu.edu.sg
ABSTRACT
In monitoring the fetal ECG (FECG) signal, the maternal
ECG (MECG) is an unavoidable and therefore major source
of interference. The fetal heart is very small, and thus the
electrical current it generates is much lower than that of
the mother [8]. To extract the fetal ECG for proper clinical
diagnosis, an adaptive filtering technique can be used to
remove or suppress the maternal ECG [9]. Often, a number
of electrodes are placed around the general area of the
fetus to pick up multiple FECG signals. In this case, a
Blind Source Separation (BSS) algorithm can deal with the
problem much more effectively.
Blind source separation of independent sources from
their mixtures is a common problem in many real-world
multi-sensor applications. However, the algorithm
requires very high computing power, and thus a real-time
implementation in software is not practical.
We present a low-cost real-time FPGA (Field
Programmable Gate Array) implementation of an
improved BSS algorithm based on ICA (Independent
Component Analysis) technique. The separation is
performed by implementing noncausal filters instead of
causal filters within the feedback network. This reduces
the required length of the unmixing filters as well as
provides better separation and faster convergence. Results
of FPGA testing using real ECG signals are reported.
1. INTRODUCTION
Blind signal separation, or BSS, refers to performing
inverse channel estimation despite having no knowledge
about the true channel (or mixing filter) [1,2,3,4,5]. BSS
methods based on the ICA (independent component analysis)
technique have been found effective and are thus commonly
used. A limitation of the ICA technique is the need for
long unmixing filters in order to estimate inverse
channels [1]. Here, we propose the use of noncausal filters
[10] to shorten the filter length. In addition, using
noncausal filters in the feedback network allows a good
separation even if the direct channel filters do not have
stable inverses. A variable step-size parameter for
adaptation of the learning process is used to improve the
convergence.
FPGA (Field Programmable Gate Array) architecture
allows the parallelism needed to handle the high
computation load of real-time DSP. Being fully
custom-programmable, FPGA offers rapid hardware prototyping
and algorithm investigation. Here, we present an FPGA
design of a real-time ICA-based BSS for the application
of separating the FECG from the MECG.
2. THEORY
2.1 Separation of Convolutive Mixture
The architecture proposed by Torkkola for separation of
convolutive mixtures is shown in Fig. 1 [3]. Minimizing
the mutual information between outputs u1 and u2 is
achieved by maximizing the total entropy at the output.
By forcing W11 and W22 to be mere scaling coefficients,
the architecture is simplified:

    u_1(t) = x_1(t) + \sum_{k=0}^{L_{12}} w_{12}(k)\, u_2(t-k)    (1)

    u_2(t) = x_2(t) + \sum_{k=0}^{L_{21}} w_{21}(k)\, u_1(t-k)    (2)

And the learning rule for the separation matrix is:

    \Delta w_{ij}(k) \propto (1 - 2 y_i)\, u_j(t-k)    (3)

Fig. 1. Torkkola's feedback network for BSS.

2.2 Improved ICA Based BSS Method


Torkkola's algorithm works only when a stable inverse
of the direct channel filters exists, which is not
guaranteed. It was shown that the algorithm can be
modified to use noncausal filters. The relationships
between the signals then become:

    u_1(t) = x_1(t+M) + \sum_{k=-M}^{M-1} w_{12}(k)\, u_2(t-k)    (4)

    u_2(t) = x_2(t+M) + \sum_{k=-M}^{M-1} w_{21}(k)\, u_1(t-k)    (5)

where M is half of the filter length, L, i.e. L = 2M+1, and
the learning rule is:

    w_{ij}^{(t_1 - p_1 + M)} = w_{ij}^{(t_0 - p_0 + M)} + K(u_i(t_0))\, u_j(p_0)    (6)

where

    K(u_i(t_0)) = stepsize \cdot (1 - 2 y_i(t_0))    (7)

    y_i(t_0) = \frac{1}{1 + e^{-u_i(t_0)}}    (8)

and t_1 = t_0 + 1, p_0 = t_0 - k and p_1 = t_1 - k for
k = -M, -M+1, ..., M.

The variable learning step size, stepsize, in Equation 7
is explained in more detail in Subsection 3.1.4.
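The update in Equations 6 to 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the FPGA implementation: the function names are hypothetical, the coefficient for lag k is stored at index k + M, and samples of u_j that fall outside the block are simply skipped.

```python
import math


def logistic(u):
    """Equation 8: y = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def update_weights(w_ij, u_i, u_j, t0, stepsize):
    """Apply Equation 6 once at time t0, updating w_ij in place.

    w_ij holds L = 2M+1 coefficients; w_ij[k + M] is the tap for lag k.
    Lags whose sample u_j(t0 - k) lies outside the block are skipped
    (an assumption of this sketch, not stated in the paper).
    """
    M = (len(w_ij) - 1) // 2
    # Equation 7: K depends only on u_i(t0) and the current step size.
    K = stepsize * (1.0 - 2.0 * logistic(u_i[t0]))
    for k in range(-M, M + 1):
        p0 = t0 - k
        if 0 <= p0 < len(u_j):
            w_ij[k + M] += K * u_j[p0]
    return w_ij
```

Note that when u_i(t0) = 0 the logistic output is 0.5, so K = 0 and the coefficients are left unchanged.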
3. ARCHITECTURE OF FPGA DESIGN FOR BSS
In this section, we describe the architecture of the FPGA
design of the ICA-based BSS algorithm using Torkkola's
feedback network. The system-level design is shown,
followed by detailed FPGA simulations on real ECG
signals. Then, topics on hardware realization of the FPGA
are discussed and the FPGA synthesis results are given.
In our work, the FPGA design tools used were
Xilinx System Generator version 2.3 [6] and MATLAB
version 6.5 from MathWorks. The FPGA synthesis tool
used was Xilinx ISE 5.2i. System Generator provides
bit-true and cycle-true FPGA blocksets for simulation
under MATLAB Simulink, thus offering a convenient
and realistic system-level FPGA simulation.
3.1 Practical Implementation of Torkkola's Network
for FPGA Realization
As a result of our earlier experimentation [7][10], we
propose that the specifications shown below be used in
order to minimize the FPGA resources needed and to ensure
real-time BSS separation given the limited FPGA clock
speed. Subsections 3.1.1 to 3.1.5 explain the impact of
each parameter on the hardware requirement.
- Filter length, L = 321 taps
- Buffer size for iterative convolution, N = 640
  (implemented using an overlapped window to shorten
  the latency time; see Subsection 3.1.1)
- Maximum number of iterations, I = 200
- Approximation of the exponential learning step size
  using linear piecewise approximation
The linear piecewise approximation is used to avoid the
complex circuitry needed to implement the exponential
function in hardware (see Subsection 3.1.4 for more
explanation). The MATLAB simulation in our earlier
work shows that using piecewise approximation does not
affect the performance of the BSS algorithm significantly [6].
3.1.1 Three-buffer technique
In a real-time hardware implementation, to achieve
uninterrupted processing, the hardware must process the
input and output as continuous streams of samples.
However, this conflicts with the batch processing required
by the BSS algorithm: to perform the separation, a block
of buffered data has to be filtered iteratively. Here,
we implement a buffering mechanism using three
640-sample (N = 640) buffers per input source. While one
buffer is being filled with the input, the second buffer is
being filtered, and the third is being streamed out.
A side effect of this three-buffer technique is that the
system produces a processing delay equivalent to twice
the time needed to fill up a buffer. For example, if the
signal sampling frequency is 100 Hz, the time to fill up
one buffer is 640/100 = 6.4 seconds. The system will then
need another 6.4 seconds of processing before the result is
ready for output. The total delay is then 6.4+6.4 = 12.8
sec. This processing delay is too long for practical
real-time ECG monitoring, and thus we applied an overlapped
window technique. In our implementation, the 640-sample
block is sampled with overlap of 32 samples. In
this case the processing delay is reduced to
(64/100)*2 = 1.28 sec.
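The delay arithmetic above can be reproduced with a short Python sketch. The function name is hypothetical, and the value 64 is taken directly from the delay formula in the text; treating it as the number of new samples collected per overlapped window is an assumption of this sketch.

```python
def processing_delay_s(samples_per_fill, fs_hz):
    """Delay of the three-buffer scheme: one fill time plus one
    processing time of equal length, i.e. twice the fill time."""
    fill_time = samples_per_fill / fs_hz
    return 2.0 * fill_time


FS_HZ = 100  # ECG sampling frequency used in the paper

# Without overlap, a whole 640-sample buffer must be filled first.
delay_full = processing_delay_s(640, FS_HZ)

# With the overlapped window, the text's formula uses 64 samples
# (assumed here to be the new samples gathered per window).
delay_overlapped = processing_delay_s(64, FS_HZ)

print(f"no overlap: {delay_full:.2f} s, overlapped: {delay_overlapped:.2f} s")
```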
3.1.2 Implementation of mechanism for the feedback
network
According to Equations 4 and 5, there is a need to refer to
negative addresses for the values of w12(i) when i < 0.
The equation can be modified to include only positive
addresses:

    u_1(t) = x_1(t+M) + \sum_{i=-M}^{M-1} w_{12}(i+M)\, u_2(t-i)    (9)

Equation 9 performs the same noncausal filtering on
u2 as in Equation 4 without the need for negative
addressing of w12. Equation 5 is also modified
accordingly.

Fig. 2. Implementation of (9) for Torkkola's feedback network

The block diagram shown in Fig. 2 depicts the hardware
implementation of Equation 9. Note that the FIR filtering
of w12 is implemented with a multiply-accumulate (MAC)
unit, which significantly reduces the number of multipliers
and adders needed compared to a direct implementation
(see Subsection 3.1.5).
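The MAC-style evaluation of Equation 9 can be sketched in Python as a single accumulator that is updated once per coefficient. This is a software illustration only: the function name is hypothetical, the number of taps is taken from the length of the coefficient list, and samples outside the buffers are treated as zero, which is an assumption of this sketch.

```python
def mac_filter_output(x1, u2, w12, t, M):
    """Evaluate the right-hand side of Equation 9 for one output sample.

    w12[i + M] holds the coefficient for lag i, so only non-negative
    addresses are ever used. Out-of-range samples are taken as zero
    (an assumption of this sketch).
    """
    def sample(buf, n):
        return buf[n] if 0 <= n < len(buf) else 0.0

    acc = 0.0                      # single accumulator, as in a MAC unit
    for addr in range(len(w12)):   # addr = i + M, always >= 0
        i = addr - M               # recover the (possibly negative) lag
        acc += w12[addr] * sample(u2, t - i)
    return sample(x1, t + M) + acc
```

One multiplier and one adder are reused for every tap, which is why the MAC clock must run much faster than the input sampling rate.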
3.1.3 Mechanism for learning the filter coefficients
The mechanism for learning the filter coefficients was
implemented according to Equation 6.
3.1.4 Implementation of variable learning step size
In order to speed up the learning of the filter coefficients
shown in Equation 6, we implement a variable step size
technique. In our application, the variable learning step
size in Equation 7, i.e. the parameter stepsize, is
implemented using Equation 10 below, where n is the
iteration level, initstep is the initial step size, and I is the
maximum number of iterations, i.e. 200.
    stepsize = \exp(-u_0 n / I)    (10)

where

    u_0 = \frac{1}{I} \log_2(initstep)    (11)
The exponential term is difficult to implement in
digital hardware. A look-up table could be used but would
require a large block of ROM (Read Only Memory).
An alternative to the look-up ROM is the CORDIC
algorithm (COordinate Rotation DIgital Computer).
However, the CORDIC algorithm imposes a long latency
(if not heavily pipelined), which results in the need for a
higher FPGA clock speed. Instead, we used a linearly
decreasing variable step size as shown in Equation 12.

    stepsize = 0.0006 - 0.000012 n    (12)

Fig. 3. Top-level design of BSS using System Generator

Fig. 4. Detailed circuit for updating the filter coefficients



3.1.5 Calculation of required FPGA clock speed


As mentioned earlier, a multiply-accumulate (MAC)
technique is used in order to save hardware resources.
The MAC operation has to be done at a much higher rate than
the input sampling frequency. This MAC operating frequency
determines the frequency of the FPGA clock and can be
calculated using Equation 13, where Fs is the sampling
frequency of the input signals, L is the tap length of the
FIR filter, and I is the number of iterations.

    FPGA Clock Frequency = L * I * Fs    (13)

With filter taps L = 321, iterations I = 200, and sampling
frequency Fs = 100 Hz, and including the factor 640/64 = 10
for the overlapped window of Subsection 3.1.1, the required
FPGA clock frequency is 321*200*100*(640/64) = 64.2 MHz.
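The clock requirement can be checked with a small Python sketch of Equation 13 and the worked figure above. The function and parameter names are hypothetical, and reading the factor 640/64 as the extra work caused by the overlapped window is an interpretation of the calculation in the text.

```python
def required_clock_hz(taps, iterations, fs_hz, block_len, new_samples):
    """Equation 13 (L * I * Fs) scaled by the overlap factor used in
    the paper's worked example (block_len / new_samples)."""
    macs_per_sample = taps * iterations        # one MAC per tap per iteration
    overlap_factor = block_len / new_samples   # 640 / 64 = 10 in the paper
    return macs_per_sample * fs_hz * overlap_factor


clock_hz = required_clock_hz(taps=321, iterations=200, fs_hz=100,
                             block_len=640, new_samples=64)
print(f"required FPGA clock: {clock_hz / 1e6:.1f} MHz")
```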

4. SIMULATION OF THE FPGA DESIGN USING ECG SIGNALS
The top level of the BSS FPGA design using System
Generator is shown in Fig. 3. A more detailed diagram for
the circuit for updating the filter coefficients is shown in
Fig. 4. The FPGA was then simulated using ECG signals
and the results are given in the following paragraphs.

Fig. 5. (a) Original MECG, (b) original FECG, (c) and (d) the
mixed and noisy ECG signals used for BSS

In order to create the mixtures of Maternal ECG and
Fetal ECG, we mixed two ECG signals downloaded from
PhysioNet (www.physionet.org). The sampling frequency
of the ECG signals is 100 Hz. The signals were mixed at
different ratios between 0.4 and 0.9 in order to simulate
two electrodes placed at two locations. Two low-pass
filtered Gaussian noise sources were then added to the
mixtures in order to simulate the flicker (or 1/f) noise
that commonly appears in low-frequency signals. The two
noise sources were scaled to provide an SNR of
approximately 30 dB. Fig. 5 (a) shows the original ECG
signal used to represent the MECG, and the one in Fig. 5 (b)
is used for the FECG. Fig. 5 (c) and (d) show the two mixed,
noisy ECG signals used in the FPGA simulation of the
BSS algorithm. It can be seen in Fig. 5 (c) and (d) that the
FECG can hardly be identified due to the much larger
MECG. Although some of the QRS complexes of the
FECG are still visible, most are hidden by the larger
MECG.
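The test-signal construction described above can be sketched in Python as follows. Everything specific here is an assumption for illustration: the function names, the particular mixing coefficients 0.4 and 0.9, the simple instantaneous mixing, and the one-pole low-pass filter standing in for the unspecified filter in the paper.

```python
import math
import random


def lowpass(noise, alpha=0.1):
    """One-pole low-pass filter (stand-in for the paper's unspecified filter)."""
    out, state = [], 0.0
    for v in noise:
        state += alpha * (v - state)
        out.append(state)
    return out


def power(sig):
    """Mean power of a signal."""
    return sum(v * v for v in sig) / len(sig)


def add_noise(sig, snr_db, rng):
    """Add low-pass filtered Gaussian noise scaled to the requested SNR."""
    noise = lowpass([rng.gauss(0.0, 1.0) for _ in sig])
    gain = math.sqrt(power(sig) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(sig, noise)]


def make_mixtures(mecg, fecg, a12=0.4, a21=0.9, snr_db=30.0, seed=0):
    """Mix two sources as seen by two electrodes, then add noise.

    a12 and a21 are hypothetical cross-mixing ratios picked from the
    0.4 to 0.9 range mentioned in the text.
    """
    rng = random.Random(seed)
    x1 = [m + a12 * f for m, f in zip(mecg, fecg)]
    x2 = [a21 * m + f for m, f in zip(mecg, fecg)]
    return add_noise(x1, snr_db, rng), add_noise(x2, snr_db, rng)
```

Real recordings would replace the two input lists; the noise gain is derived from the measured signal and noise powers so that each mixture meets the 30 dB target.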
Fig. 6 (a) and (b) show the separated output ECG
signals resulting from the FPGA simulation. It can be
seen that the separated FECG signal in Fig. 6 (b) is clean
and the QRS complex, as well as other components, can
be easily detected. Comparing Fig. 6 (b) to the original
FECG signal in Fig. 5 (b), it can be seen that the BSS
algorithm preserves the shape of the signal well. A similar
conclusion can be drawn from inspecting the MECG
result shown in Fig. 6 (a).
Fig. 6. Result of FPGA simulation: (a) separated MECG and (b)
separated FECG

5. FPGA SYNTHESIS RESULTS
After the successful simulation, the VHDL code was
automatically generated from the design using System
Generator. The VHDL code was then synthesized using
Xilinx ISE 5.2i and targeted for a 600,000-gate Virtex-E
device.
Table 1 details the gate requirement of the FPGA design.
The total gate requirement reported by ISE is
approximately 100 Kgates. Table 2 shows the reported
maximum path delay and the maximum clock frequency. The
maximum FPGA operating frequency of 71.2 MHz is
higher than the required 64.2 MHz, and thus the design
will operate in real time.

Table 1: Detailed gate requirement of the BSS FPGA design
Resource                                        Count
Number of Slices for Logic                        550
Number of Slices for Flip Flops                   405
Number of 4-input LUTs                          3,002
- used as LUTs                                  2,030
- used as a route-thru                            450
- used as Shift registers                         522
Total equivalent gate count for the design    100,213

Table 2: Maximum combinational path delay and operating frequency of
the FPGA design for BSS
Parameter                                   Value
Maximum path delay from/to any node       15.8 ns
Maximum operating frequency              71.2 MHz

6. CONCLUSION
In this paper, we have shown that our FPGA design
performs the improved BSS algorithm and successfully
separates the Maternal ECG (MECG) and the Fetal ECG
(FECG) from the mixtures of recorded ECG signals. The
algorithm is robust against flicker (or 1/f) noise and
preserves the components in the ECG signals.
A simple and practical implementation of an ICA-based
blind source separation circuit using FPGA is described.
The FPGA design achieves real-time speed using a
relatively low system clock of 64.2 MHz.

7. REFERENCES
[1] T-W Lee, "Independent Component Analysis - Theory
and Applications", Kluwer Academic Publishers, 1998.
[2] R.M. Gray, "Entropy and Information Theory", New
York: Springer-Verlag, 1990.
[3] P. Comon, "Independent component analysis, a new
concept?", Signal Processing, vol. 36, 1994, pp. 287-314.
[4] K. Torkkola, "Blind Source Separation For Audio Signals
- Are We there yet?", IEEE Workshop on Independent
Component Analysis and Blind Signal Separation,
Aussois, France, Jan 1999.
[5] T-W Lee, A.J. Bell, and R. Orglmeister, "Blind source
separation of real world signals", Proc. IEEE Int. Conf.
Neural Networks, June 97, Houston, pp. 2129-2135.
[6] Xilinx Inc., "Xilinx System Generator v2.3 for The
MathWorks Simulink: Quick Start Guide", February 2002.
[7] F. Sattar and C. Charayaphan, "Low-Cost Design and
Implementation of an ICA-Based Blind Source Separation
Algorithm", IEEE ASIC/SoC Conference, Rochester, NY,
Sept 25-28, 2002, pp. 15-19.
[8] D. Adam and D. Shavit, "Complete foetal ECG
morphology recording by synchronized adaptive
filtration", Medical and Biological Engineering and
Computing, 28, 1990, pp. 287-292.
[9] A. Kam and A. Cohen, "Maternal ECG elimination and
Foetal ECG Detection - Comparison of Several
Algorithms", Proc. of the 20th Ann. Int. Conf. IEEE
EMBS, Hong-Kong, 1998.
[10] Charayaphan Charoensak and Farook Sattar, "Hardware
for real-time ICA-based blind source separation", in
Proc. 15th IEEE Int. Conf. SOCC, Sept. 12-15, 2004.
