
DESIGN OF FPGA HARDWARE FOR A REAL-TIME BLIND SOURCE SEPARATION OF FETAL ECG SIGNALS

Charayaphan Chareonsak, Yu Wei, Xiong Bing, and Farook Sattar
School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798. E-mail: {ecchara, efsattar}@ntu.edu.sg

ABSTRACT

In monitoring the fetal ECG (FECG) signal, the maternal ECG (MECG) is an unavoidable and therefore major source of interference. The fetal heart is very small, and thus the electrical current it generates is much lower than that of the mother [8]. In order to extract the fetal ECG for proper clinical diagnosis, an adaptive filtering technique can be used to remove or suppress the maternal ECG [9]. Often, a number of electrodes are placed around the general area of the fetus to pick up multiple FECG signals. In this case, a Blind Source Separation (BSS) algorithm can deal with the problem much more effectively. Blind source separation of independent sources from their mixtures is a common problem in many real-world multi-sensor applications. However, the algorithm requires very high computing power, and thus a real-time implementation in software is not practical. We present a low-cost real-time FPGA (Field Programmable Gate Array) implementation of an improved BSS algorithm based on the ICA (Independent Component Analysis) technique. The separation is performed by implementing noncausal filters instead of causal filters within the feedback network. This reduces the required length of the unmixing filters and provides better separation and faster convergence. Results of FPGA testing using real ECG signals are reported.

1. INTRODUCTION

Blind signal separation, or BSS, refers to performing inverse channel estimation despite having no knowledge about the true channel (or mixing filter) [1,2,3,4,5]. BSS methods based on the ICA (independent component analysis) technique have been found effective and are thus commonly used. A limitation of the ICA technique is the need for long unmixing filters in order to estimate the inverse channels [1].
Here, we propose the use of noncausal filters [10] to shorten the filter length. In addition, using noncausal filters in the feedback network allows good separation even if the direct channel filters do not have stable inverses. A variable step-size parameter for adaptation of the learning process is used to improve the convergence.

FPGA (Field Programmable Gate Array) architecture allows the parallelism needed to handle the high computation load of real-time DSP. Being fully custom-programmable, an FPGA offers rapid hardware prototyping and algorithm investigation. Here, we present an FPGA design of a real-time ICA-based BSS for the application of separating the FECG from the MECG.

2. THEORY

2.1 Separation of Convolutive Mixture

The architecture proposed by Torkkola for separation of a convolutive mixture is shown in Fig. 1 [3]. Minimizing the mutual information between outputs u1 and u2 is achieved by maximizing the total entropy at the output. By forcing W11 and W22 to be mere scaling coefficients, the architecture is simplified:

$$u_1(t) = x_1(t) + \sum_{k=0}^{L_{12}} w_{12}(k)\, u_2(t-k) \tag{1}$$

$$u_2(t) = x_2(t) + \sum_{k=0}^{L_{21}} w_{21}(k)\, u_1(t-k) \tag{2}$$

And the learning rules for the separation matrix:

$$\Delta w_{ij} \propto (1 - 2 y_i)\, u_j(t-k) \tag{3}$$

Fig. 1. Torkkola’s feedback network for BSS.
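As an illustration of Equations 1 and 2, the following is a minimal Python sketch of the feedback network evaluated with fixed cross filters. The function name `feedback_network`, the zero initial conditions, and the way the zero-lag coupling is resolved (solving the 2x2 system formed by the k = 0 terms, which requires 1 - w12(0)·w21(0) ≠ 0) are assumptions of this sketch and are not taken from the paper.

```python
import numpy as np

def feedback_network(x1, x2, w12, w21):
    """Evaluate Eqs. (1)-(2) sample by sample for fixed cross filters.

    w12[k] and w21[k] hold the coefficients for lags k = 0..L12 and k = 0..L21.
    x1 and x2 must have the same length; outputs before t = 0 are taken
    to be zero (an assumption of this sketch).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    u1 = np.zeros_like(x1)
    u2 = np.zeros_like(x2)
    # The k = 0 terms couple u1(t) and u2(t); the 2x2 solve needs det != 0.
    det = 1.0 - w12[0] * w21[0]
    for t in range(len(x1)):
        # Parts of the sums that depend only on past outputs (k >= 1).
        a1 = x1[t] + sum(w12[k] * u2[t - k]
                         for k in range(1, len(w12)) if t - k >= 0)
        a2 = x2[t] + sum(w21[k] * u1[t - k]
                         for k in range(1, len(w21)) if t - k >= 0)
        # Solve u1 = a1 + w12[0]*u2 and u2 = a2 + w21[0]*u1 simultaneously.
        u1[t] = (a1 + w12[0] * a2) / det
        u2[t] = (a2 + w21[0] * a1) / det
    return u1, u2
```

If the inputs are built as x1(t) = s1(t) - Σ w12(k) s2(t-k) and x2(t) = s2(t) - Σ w21(k) s1(t-k), the network returns s1 and s2 exactly, which is a convenient sanity check before adding the learning rule of Equation 3.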

2.2 Improved ICA Based BSS Method

Torkkola's algorithm works only when a stable inverse of the direct channel filters exists, which is not guaranteed. It was shown that the algorithm can be modified to use noncausal filters. The relationships between the signals are now changed to:

$$u_1(t) = x_1(t+M) + \sum_{k=-M}^{M} w_{12}(k)\, u_2(t-k) \tag{4}$$

$$u_2(t) = x_2(t+M) + \sum_{k=-M}^{M} w_{21}(k)\, u_1(t-k) \tag{5}$$

where M is half of the filter length L, i.e. L = 2M+1, and the learning rule is

$$\Delta w_{ij}^{(t_1 - p_1 + M)} = \Delta w_{ij}^{(t_0 - p_0 + M)} + K(u_i(t_0))\, u_j(p_0) \tag{6}$$

where

$$K(u_i(t_0)) = \mathit{stepsize} \times (1 - 2\, y_i(t_0)) \tag{7}$$

$$y_i(t_0) = \frac{1}{1 + e^{-u_i(t_0)}} \tag{8}$$

and t1 = t0 + 1, p0 = t0 - k and p1 = t1 - k for k = -M, -M+1, …, M.

The variable learning step size, stepsize, in Equation 7 is explained in more detail in Subsection 3.1.4.

3. ARCHITECTURE OF FPGA DESIGN FOR BSS

In this section, we describe the architecture of the FPGA design of the ICA-based BSS algorithm using Torkkola's feedback network. The system-level design is shown, followed by detailed FPGA simulations on real ECG signals. Then, topics on hardware realization of the FPGA are discussed and the FPGA synthesis results are given. In our work, the FPGA design tools used were Xilinx™ System Generator version 2.3 [6] and MATLAB™ version 6.5 from MathWorks. The FPGA synthesis tool used was Xilinx™ ISE 5.2i. System Generator provides bit-true and cycle-true FPGA blocksets for simulation under MATLAB Simulink™, thus offering a convenient and realistic system-level FPGA simulation.

3.1 Practical Implementation of Torkkola's Network for FPGA Realization

As a result of our earlier experimentation [7][10], we propose that, in order to minimize the FPGA resources needed and to ensure real-time BSS separation given the limited FPGA clock speed, the specifications shown below be used. Subsections 3.1.1 to 3.1.5 explain the impact of each parameter on the hardware requirement.

• Filter length, L = 321 taps,
• Buffer size for iterative convolution, N = 640 (implemented using an overlapped window to shorten the latency; see Subsection 3.1.1),
• Maximum number of iterations, I = 200,
• Approximation of the exponential learning step size using a linear piecewise approximation.

The linear piecewise approximation is used to avoid the complex circuitry needed to implement the exponential function in hardware (see Subsection 3.1.4 for more explanation). The MATLAB simulation in our earlier work shows that using the piecewise approximation does not significantly affect the performance of the BSS algorithm [6].

3.1.1 Three-buffer technique

In a real-time hardware implementation, to achieve uninterrupted processing, the hardware must process the input and output as continuous streams of samples. However, this is in contrast with the batch processing needed by the BSS algorithm: to perform the separation, a block of buffered data has to be filtered iteratively. Here, we implement a buffering mechanism using three 640-sample (N = 640) buffers per input source. While one buffer is being filled with the input, the second buffer is being filtered, and the third is being streamed out.

A side effect of this three-buffer technique is that the system produces a processing delay equivalent to twice the time needed to fill up a buffer. For example, if the signal sampling frequency is 100 Hz, the time to fill up one buffer is 640/100 = 6.4 seconds. The system then needs another 6.4 seconds to process the buffer before the result is ready for output. The total delay is then 6.4 + 6.4 = 12.8 seconds. This processing delay is too long for practical real-time ECG monitoring, and thus we applied an overlapped window technique. In our implementation, the 640-sample block is sampled with an overlap of 32 samples. In this case the processing delay is reduced to (64/100)*2 = 1.28 seconds.

3.1.2 Implementation of mechanism for the feedback network

According to Equations 4 and 5, there is a need to refer to negative addresses for the values of w12(i) when i < 0. The equation can be modified to include only positive addresses:

$$u_1(t) = x_1(t+M) + \sum_{i=-M}^{M} w_{12}(i+M)\, u_2(t-i) \tag{9}$$

Equation 9 performs the same noncausal filtering on u2 as Equation 4 without the need for negative addressing of w12. Equation 5 is modified accordingly.

Fig. 2. Implementation of (9) for Torkkola’s feedback network
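The index shift in Equation 9 can be illustrated with a short Python sketch of the cross-filter term alone, written as an explicit multiply-accumulate loop. The function name `noncausal_cross_term` and the choice to treat samples outside the buffer as zero are assumptions of this sketch, not details given in the paper.

```python
import numpy as np

def noncausal_cross_term(w, u, M):
    """Compute sum_{i=-M}^{M} w[i + M] * u[t - i] for every t in the buffer.

    w is stored at non-negative addresses 0..2M, as in Eq. (9).
    Samples of u outside the buffer are treated as zero (assumption).
    """
    w = np.asarray(w, dtype=float)
    u = np.asarray(u, dtype=float)
    if len(w) != 2 * M + 1:
        raise ValueError("w must have 2M + 1 coefficients")
    n = len(u)
    out = np.zeros(n)
    for t in range(n):
        acc = 0.0  # accumulator of the multiply-accumulate loop
        for i in range(-M, M + 1):
            if 0 <= t - i < n:
                acc += w[i + M] * u[t - i]  # address i + M is never negative
        out[t] = acc
    return out
```

The result matches an ordinary full convolution shifted by M samples, `np.convolve(u, w)[M:M + len(u)]`, which is one way to check the addressing. Each output sample costs 2M + 1 = L multiply-accumulate steps.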

The block diagram in Fig. 2 depicts the hardware implementation of Equation 9. Note that the FIR filtering with w12 is implemented with a multiply-accumulate (MAC) unit, which significantly reduces the number of multipliers and adders needed compared to a direct implementation (see Subsection 3.1.5).

3.1.3 Mechanism for learning the filter coefficients

The mechanism for learning the filter coefficients was implemented according to Equation 6.

3.1.4 Implementation of variable learning step size

In order to speed up the learning of the filter coefficients in Equation 6, we implement a variable step size technique. In our application, the variable learning step size in Equation 7, i.e. the parameter stepsize, is given by Equation 10 below, where n is the iteration level, initstep is the initial step size, and I is the maximum number of iterations, i.e. 200.

$$\mathit{stepsize} = \exp(-u_0 - n/I) \tag{10}$$

where

$$u_0 = -\log_2(\mathit{initstep}) - \frac{1}{I} \tag{11}$$

The exponential term is difficult to implement in digital hardware. A look-up table could be used but would require a large block of ROM (Read Only Memory). An alternative to a look-up ROM is the CORDIC (COordinate Rotation DIgital Computer) algorithm. However, the CORDIC algorithm imposes a long latency (if not heavily pipelined), which results in the need for a higher FPGA clock speed. Instead, we used a linearly decreasing variable step size as shown in Equation 12.

$$\mathit{stepsize} = 0.0006 - 0.000012\, n \tag{12}$$

Fig. 3. Top-level design of BSS using System Generator

Fig. 4. Detailed circuit for updating the filter coefficients
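The coefficient update of Equations 6 to 8 can be sketched in Python as an accumulation over one buffer. This is only an illustrative software model, not the circuit of Fig. 4: the function name `accumulate_updates`, the single fixed `stepsize` argument, and the skipping of samples that fall outside the buffer are assumptions of this sketch.

```python
import numpy as np

def accumulate_updates(u_i, u_j, M, stepsize):
    """Accumulate the increments of Eq. (6) over one buffer.

    For each time t0 and each lag k in -M..M, the entry at address k + M
    receives K(u_i(t0)) * u_j(t0 - k), with K from Eq. (7) and y from Eq. (8).
    Terms whose index t0 - k falls outside the buffer are skipped (assumption).
    """
    u_i = np.asarray(u_i, dtype=float)
    u_j = np.asarray(u_j, dtype=float)
    n = len(u_i)
    delta_w = np.zeros(2 * M + 1)
    for t0 in range(n):
        y = 1.0 / (1.0 + np.exp(-u_i[t0]))   # Eq. (8)
        gain = stepsize * (1.0 - 2.0 * y)    # Eq. (7)
        for k in range(-M, M + 1):
            p0 = t0 - k
            if 0 <= p0 < n:
                delta_w[k + M] += gain * u_j[p0]  # Eq. (6)
    return delta_w
```

Because 1 - 2y equals -tanh(u/2), the gain is bounded in magnitude by `stepsize`, which is one reason the step size schedule of Subsection 3.1.4 directly controls the size of each update.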


3.1.5 Calculation of required FPGA clock speed

As mentioned earlier, a multiply-accumulate (MAC) technique is used in order to save hardware resources. The MAC operation has to run at a much higher rate than the input sampling frequency, and this MAC operating frequency determines the frequency of the FPGA clock. It can be calculated using Equation 13, where Fs is the sampling frequency of the input signals, L is the tap length of the FIR filter, and I is the number of iterations.

$$\text{FPGA clock frequency} = L \times I \times F_s \tag{13}$$

With filter length L = 321 taps, I = 200 iterations, and sampling frequency Fs = 100 Hz, and with the additional factor 640/64 = 10 that accounts for the overlapped-window processing of Subsection 3.1.1, the required FPGA clock frequency is 321 × 200 × 100 × (640/64) = 64.2 MHz.

4. SIMULATION OF THE FPGA DESIGN USING ECG SIGNALS

The top level of the BSS FPGA design using System Generator is shown in Fig. 3. A more detailed diagram of the circuit for updating the filter coefficients is shown in Fig. 4. The FPGA design was then simulated using ECG signals, and the results are given in the following paragraphs.
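As a quick check of the timing figures quoted in Subsections 3.1.1 and 3.1.5, the arithmetic can be reproduced in a few lines of Python. The constant and function names are hypothetical, and the value of 64 new samples per overlapped block is an assumption read off the 64/100 and 640/64 terms in the text.

```python
# Timing budget for the buffering and MAC scheme, using figures from the text.
FS_HZ = 100      # input sampling frequency
N_BUFFER = 640   # samples per buffer
N_NEW = 64       # new samples per overlapped block (assumed, see lead-in)
L_TAPS = 321     # FIR filter length
N_ITER = 200     # maximum number of iterations per block

def buffering_delay_s(samples_per_block, fs_hz):
    """Fill time plus an equal processing time: twice the block fill time."""
    return 2 * samples_per_block / fs_hz

def required_clock_hz(taps, iterations, fs_hz, n_buffer, n_new):
    """Eq. (13) scaled by the overlap factor n_buffer / n_new."""
    return taps * iterations * fs_hz * n_buffer // n_new

print("delay without overlap [s]:", buffering_delay_s(N_BUFFER, FS_HZ))
print("delay with overlap    [s]:", buffering_delay_s(N_NEW, FS_HZ))
print("required clock      [MHz]:",
      required_clock_hz(L_TAPS, N_ITER, FS_HZ, N_BUFFER, N_NEW) / 1e6)
```

With the values above this reproduces the 12.8 s and 1.28 s delays and the 64.2 MHz clock requirement stated in the text.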

Fig. 5. (a) Original MECG, (b) original FECG, (c) and (d) the mixed and noisy ECG signals used for BSS

In order to create the mixtures of maternal ECG and fetal ECG, we mixed two ECG signals downloaded from PhysioNet (www.physionet.org). The sampling frequency of the ECG signals is 100 Hz. The signals were mixed at different ratios between 0.4 and 0.9 in order to simulate two electrodes placed at two locations. Two low-pass filtered Gaussian noise sources were then added to the mixtures in order to simulate the flicker (or 1/f) noise that commonly appears in low-frequency signals. The two noise sources were scaled to provide an SNR of approximately 30 dB.

Fig. 5 (a) shows the original ECG signal used to represent the MECG, and the one in Fig. 5 (b) is used for the FECG. Fig. 5 (c) and (d) show the two mixed and noisy ECG signals used in the FPGA simulation of the BSS algorithm. It can be seen in Fig. 5 (c) and (d) that the FECG can hardly be identified due to the much larger MECG. Although some of the QRS complexes of the FECG are still visible, most are hidden by the larger MECG.

Fig. 6 (a) and (b) show the separated output ECG signals obtained from the FPGA simulation. It can be seen that the separated FECG signal in Fig. 6 (b) is clean and that the QRS complex, as well as other components, can be easily detected. Comparing Fig. 6 (b) to the original FECG signal in Fig. 5 (b), it can be seen that the BSS algorithm preserves the shape of the signal well. A similar conclusion can be drawn from inspecting the MECG result shown in Fig. 6 (a).
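The construction of the test mixtures described at the start of this section can be sketched in Python as follows. This is a rough sketch under several assumptions that the paper does not specify: an instantaneous two-by-two mixing with cross gains 0.4 and 0.9, a moving-average low-pass filter for the noise, and sinusoidal placeholder signals standing in for the PhysioNet recordings. All function names are hypothetical.

```python
import numpy as np

def lowpass(x, width=8):
    """Crude low-pass filter: moving average over `width` samples (assumption)."""
    return np.convolve(x, np.ones(width) / width, mode="same")

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from mean powers."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

def scale_to_snr(signal, noise, target_db):
    """Rescale `noise` so that snr_db(signal, scaled noise) equals target_db."""
    gain = np.sqrt(np.mean(signal ** 2)
                   / (np.mean(noise ** 2) * 10.0 ** (target_db / 10.0)))
    return gain * noise

def make_mixtures(mecg, fecg, a12=0.4, a21=0.9, target_db=30.0, seed=0):
    """Return two noisy mixtures and the noise that was added to each."""
    rng = np.random.default_rng(seed)
    clean1 = mecg + a12 * fecg   # electrode 1 (assumed mixing model)
    clean2 = a21 * mecg + fecg   # electrode 2 (assumed mixing model)
    n1 = scale_to_snr(clean1, lowpass(rng.standard_normal(len(mecg))), target_db)
    n2 = scale_to_snr(clean2, lowpass(rng.standard_normal(len(mecg))), target_db)
    return clean1 + n1, clean2 + n2, n1, n2

# Placeholder waveforms sampled at 100 Hz; these are NOT real ECG data.
t = np.arange(500) / 100.0
mecg = np.sin(2 * np.pi * 1.2 * t)
fecg = 0.2 * np.sin(2 * np.pi * 2.3 * t)
x1, x2, n1, n2 = make_mixtures(mecg, fecg)
```

Scaling the noise from the measured powers fixes the SNR of each mixture at the target value regardless of the signal amplitudes, which makes it easy to repeat the experiment with other recordings.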

Fig. 6. Result of FPGA simulation: (a) separated MECG and (b) separated FECG

5. FPGA SYNTHESIS RESULTS

After the successful simulation, the VHDL code was automatically generated from the design using System Generator. The VHDL code was then synthesized using Xilinx ISE 5.2i, targeting a Virtex-E device with 600,000 gates.

Table 1 details the gate requirement of the FPGA design. The total gate requirement reported by ISE is approximately 100 Kgates. Table 2 shows the reported maximum path delay and the maximum clock frequency. The maximum FPGA operating frequency of 71.2 MHz is higher than the required 64.2 MHz, and thus the design will operate in real time.

Table 1: Detailed gate requirement of the BSS FPGA design

Resource                                       Count
Number of Slices for Logic                     550
Number of Slices for Flip Flops                405
Number of 4-input LUTs                         3,002
  - used as LUTs                               2,030
  - used as a route-thru                       450
  - used as Shift registers                    522
Total equivalent gate count for the design     100,213

Table 2: Maximum combinational path delay and operating frequency of the FPGA design for BSS

Parameter                                      Value
Maximum path delay from/to any node            15.8 ns
Maximum operating frequency                    71.2 MHz

6. CONCLUSION

In this paper, we have shown that our FPGA design performs the improved BSS algorithm and successfully separates the maternal ECG (MECG) and the fetal ECG (FECG) from mixtures of recorded ECG signals. The algorithm is robust against flicker (or 1/f) noise and preserves the components of the ECG signals. A simple and practical implementation of an ICA-based blind source separation circuit using an FPGA is described. The FPGA design achieves real-time speed using a relatively low system clock of 64.2 MHz.

7. REFERENCES

[1] T-W Lee, "Independent Component Analysis - Theory and Applications", Kluwer Academic Publishers, 1998.
[2] R.M. Gray, "Entropy and Information Theory", New York: Springer-Verlag, 1990.
[3] P. Comon, "Independent component analysis, a new concept?", Signal Processing, vol. 36, 1994, pp. 287-314.
[4] K. Torkkola, "Blind Source Separation For Audio Signals - Are We there yet?", IEEE Workshop on Independent Component Analysis and Blind Signal Separation, Aussois, France, Jan 1999.
[5] T-W Lee, A.J. Bell, and R. Orglmeister, "Blind source separation of real world signals", Proc. IEEE Int. Conf. Neural Networks, June 97, Houston, pp. 2129-2135.
[6] Xilinx Inc., Xilinx System Generator v2.3 for The MathWorks Simulink: Quick Start Guide, February 2002.
[7] F. Sattar and C. Charayaphan, "Low-Cost Design and Implementation of an ICA-Based Blind Source Separation Algorithm", IEEE ASIC/SoC Conference, Rochester, NY, Sept 25-28, 2002, pp. 15-19.
[8] D. Adam and D. Shavit, "Complete foetal ECG morphology recording by synchronized adaptive filtration", Medical and Biological Engineering and Computing, 28, 287-292, 1990.
[9] A. Kam and A. Cohen, "Maternal ECG elimination and Foetal ECG Detection – Comparison of Several Algorithms", Proc. of the 20th Ann. Int. Conf. IEEE EMBS, Hong-Kong, 1998.
[10] Charayaphan Charoensak and Farook Sattar, "Hardware for real-time ICA-based blind source separation," in Proc. 15th IEEE Int. Conf. SOCC, Sept. 12-15, 2004.
