
3D Haar Wavelet Transform with Dynamic Partial Reconfiguration for 3D Medical Image Compression

Afandi Ahmad, Benjamin Krill, Abbes Amira
Electronic and Computer Engineering, School of Engineering and Design, Brunel University, West London, United Kingdom Email: {Afandi.Ahmad, Benjamin.Krill, Abbes.Amira}

Hassan Rabah
Laboratoire d’Instrumentation Electronique de Nancy, University Henri Poincare, France Email:

Abstract— This paper describes the design and implementation of a 3D Haar wavelet transform (HWT) with transpose-based computation and dynamic partial reconfiguration (DPR). Exploiting the separability property of the multi-dimensional HWT, the proposed architecture is implemented as a cascade of three N-point 1D HWTs and two transpose memories for a 3D volume of N × N × N, suitable for 3D medical image compression. The proposed 3D HWT architectures were implemented on a Xilinx Virtex-5 field programmable gate array (FPGA) using VHDL. An in-depth performance analysis and comparison shows that the DPR-based implementation improves both speed and power consumption while reducing the hardware required for the system.

I. INTRODUCTION
Medical image processing applications involve performing complex tasks, mainly matrix transforms, repeatedly on large sets of volume data, often under real-time requirements. As an example, the computational complexity of the fast Fourier transform (FFT) and the recently developed curvelet transform ranges from O(N × log N) to O(N² × J), where N is the transform size and J is the maximum transform resolution level, so both are extremely computationally intensive for large medical volume data [1]. To address this issue, efficient implementations of these operations are of great importance and lead to efficient solutions for three-dimensional (3D) medical image compression. Higher compression ratios can be achieved using multi-resolution analysis, where the 3D wavelet transform is widely applied because of its perfect reconstruction property and lack of blocking artifacts. In this research, the Haar wavelet transform (HWT), the simplest of all wavelets, has been chosen for the following features: it is conceptually simple, fast, memory efficient, and exactly reversible without the edge effects that are a problem with other wavelet transforms [2].

Reconfigurable hardware, especially field programmable gate arrays (FPGAs), offers significant potential for the efficient implementation of a wide range of computationally intensive signal and image processing algorithms and applications, from simple low-resolution and low-bandwidth (multimedia, picture phone) to very high-resolution and high-bandwidth (medical imaging, HDTV) applications [3].

The complexity of data addressing and access, the massive amount of data to be processed and the need for several building blocks for the computationally intensive matrix transformation operations have severely restricted hardware implementation of 3D medical image compression. FPGAs with dynamic partial reconfiguration (DPR) are a promising solution for reducing the hardware required for an efficient design implementation as well as improving the performance, speed and power consumption of the 3D medical image compression system.

Despite its complexity, there has recently been interest in 3D discrete wavelet transform (DWT) implementation on FPGAs. However, a survey of existing implementations and architectures indicates that the research is still in its infancy, as demonstrated by the limited contributions [4], [5]. The DPR mechanism, in turn, has been widely studied in various fields [6]-[10]. A significant contribution is presented in [11], a novel FPGA-based scalable architecture for the discrete cosine transform (DCT) using DPR, which reports significant results for the partial reconfiguration process: lower power consumption, fewer processing clock cycles and reduced reconfiguration overhead. These achievements provide a strong justification for further exploring 3D HWT implementation with DPR and evaluating its performance in terms of area, power consumption and maximum speed.

This paper discusses the evaluation of the proposed architectures for 3D HWT with transpose-based computation and the DPR mechanism on FPGA, which are suitable for 3D medical image compression. Comparative studies of both architectures in terms of area, power consumption, maximum speed and the influence of the transform size on the hardware performance are also presented. The structure of the paper is organised as follows. Section II presents the proposed architecture of 3D HWT with DPR mechanism.
Experimental results, comparison and analysis are described in Section III. Section IV concludes this paper.

978-1-4244-4918-7/09/$25.00 ©2009 IEEE

II. PROPOSED ARCHITECTURE FOR 3D HWT WITH DPR
In this section, the proposed system architecture depicted in Fig. 1(a) to (e) is briefly explained, including the implementation of the 3D wavelet compression and decompression system, the computation process of 3D HWT with transpose-based computation, the top-level architecture for 3D HWT with DPR and the transpose module implementation without DPR.

Fig. 1. Proposed system architecture framework. (a) and (b) Block diagrams of the 3D wavelet compression/decompression. (c) Computation of 3D HWT coefficients using transpose based computation. (d) Proposed top level architecture for 3D HWT using DPR. (e) Transpose module implementation without DPR mechanism.

A. 3D HWT and Transpose
Computation of the 3D HWT is performed as follows. The input to the first one-dimensional (1D) HWT is read row by row, and the 1D HWT is performed on each input vector as it is provided. The transpose T1 acts as a memory forwarder and performs the matrix transpose, since row vectors are provided by the 1D HWT. After transposition of the resultant matrix, another 1D HWT is performed on the coefficients stored in memory to yield the two-dimensional (2D) HWT coefficients. This is the conventional row-column 2D HWT computation. For N = 8, the 2D HWT computation is performed on each sub-image S0 to S7, where S0 is the first sub-image and S7 is the eighth sub-image of the input volume. The output coefficients of the 2D HWT are sent to the second transpose, T2. Instead of using logic and other embedded resources for the transpose implementation, this work considers an optimised use of block random access memory (BRAM). This approach significantly improves the utilisation of available storage resources, optimises system performance and meets the design goals.

B. DPR System Architecture and Implementation
There are two areas in the DPR framework: reconfigurable and static. The reconfigurable areas are used for the 1D HWT and the different transposition modules, while the static area consists of the data fetch unit and the memory controller (Wishbone compliant). The proposed system is implemented with the current partial reconfiguration suites, ISE 9.2PR and PlanAhead 10.1 from Xilinx [12]. It uses module-based DPR, in which configuration frames are reconfigured and bus macros connect the DPR areas with the static area [13]. This methodology has the restriction that all design files and reconfigurable modules must be available to the build environment in order to build partial modules. The main advantage of DPR is that an implementation of a given design can be integrated into a smaller FPGA. This reduces cost, package size and power. Power consumption and logic size can also be reduced by cascading calculation modules.

Fig. 1(d) illustrates the details of the working system for the implementation of 3D HWT with DPR. The DPR module connections are made with simple bus interfaces. The data fetch unit and the HWT DPR area are connected by a bus of defined data bit width, a request line and a returning free signal. The fetch unit sends data and the request to the HWT core as long as the free signal is active. The HWT and transposition modules are connected by a bus of the defined data bit width and an enable signal. In each cycle in which the enable signal is active, data are transposed and written into memory. The calculated values are sent to the transpose module T1, which calculates the memory addresses for the transposition and stores the data in memory. As described before, all coefficients are stored in memory; the transpositions of T2 are likewise stored in memory after transformation.
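The transpose-based computation of Section II-A can be illustrated in software. The following Python sketch is not the authors' VHDL design. It assumes a single-level, averaging/differencing form of the 1D Haar step, since the paper does not state the normalisation or the number of decomposition levels. The names haar_1d and haar_3d_transpose are hypothetical. The same 1D step is applied three times, with array transposes standing in for the T1 and T2 memories.

```python
import numpy as np


def haar_1d(rows):
    """Single-level 1D Haar step along the last axis (assumed averaging form).

    The first half of each output row holds pairwise averages, the second
    half holds pairwise half-differences. The row length must be even.
    """
    even = rows[..., 0::2]
    odd = rows[..., 1::2]
    approx = (even + odd) / 2.0
    detail = (even - odd) / 2.0
    return np.concatenate([approx, detail], axis=-1)


def haar_3d_transpose(volume):
    """3D Haar step on a volume indexed [slice, row, column].

    The same row-wise 1D step is used three times; transposes reorder the
    data between stages. The result is indexed [row, column, slice].
    """
    stage1 = haar_1d(volume)          # 1D HWT on every row of every slice
    t1 = np.swapaxes(stage1, 1, 2)    # T1: rows <-> columns within each slice
    stage2 = haar_1d(t1)              # second 1D HWT completes the 2D HWT per slice
    t2 = np.swapaxes(stage2, 0, 2)    # T2: move the slice axis to the last position
    return haar_1d(t2)                # third 1D HWT runs across the slices


if __name__ == "__main__":
    # Eight 8 x 8 sub-images, playing the role of S0..S7 for N = 8.
    volume = np.arange(8 * 8 * 8, dtype=float).reshape(8, 8, 8)
    coeffs = haar_3d_transpose(volume)
    print(coeffs.shape)
```

Under these assumptions, the leading coefficient of the result equals the mean of the first 2 × 2 × 2 block of voxels, and the output matches a direct separable transform along the three axes up to the final axis ordering.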

The HWT DPR area can be reconfigured to switch between different transform sizes. The dependency on the transform size N is propagated from the HWT module to all connected modules. In the 3D HWT case, the transposition module and the 1D HWT module can be changed, which offers the advantage that no other logic changes are necessary. During image calculation, the transposition module is changed three times for each sub-image. The first transposition, T1, performs the row-to-column transposition and remains active until a sub-image is transposed. After the T1 sub-image transposition, the DPR area is reconfigured with the T2 transposition, which saves the sub-images, and these operations are repeated for all sub-images. After all sub-images have been computed and transposed with T2, the transposition DPR area is reconfigured with the straight transposition and the last 1D HWT is performed on all T2 sub-images. Fig. 1(e) illustrates the implementation of the transpose module without DPR, in which all modules have to be combined and connected with a multiplexer, leading to a higher demand for area resources.

C. FPGA Implementation
Both architectures were implemented using VHDL on the Xilinx University Program XUPV5-LX110T Development System. This development platform comes with on-board memory and industry standard connectivity interfaces and is equipped with an XC5VLX110T FPGA.

III. RESULTS AND ANALYSIS
A. Medical Images Simulation
Fig. 2(a) to (i) illustrate the best quality and compression comparison for the first slices of the original and reconstructed medical volumes for computerised tomography (CT), magnetic resonance imaging (MRI) and positron emission tomography (PET) images using 3D HWT in a medical image compression system with context-based adaptive variable length coding (CAVLC).

Fig. 2. Comparison of original and reconstructed CT, MRI and PET images for the first slices.

B. Discussions
In order to evaluate the relationship of the transform size to the area, power consumption and maximum speed, five different transform sizes (N = 8, 16, 32, 64 and 128) have been used for the FPGA implementation. The various transform sizes reflect the various sizes of volume data in 3D medical imaging. Table I lists the overall performance results for both proposed architectures. The implementation of 3D HWT with the DPR mechanism provides significant results, saving area and reducing power consumption by 1.27% and 13.96% respectively. In terms of maximum frequency, the DPR mechanism yields a 17.216% better maximum frequency than the implementation without DPR.

TABLE I
RESOURCES UTILISATION AND OVERALL PROPOSED ARCHITECTURES PERFORMANCE ON VIRTEX-5 (XC5VLX110T) FOR N = 128.

Parameters                 Without DPR        With DPR
Area (Slices)              21,047 (30.45%)    20,779 (30.06%)
Power consumption (mW)     1964.14            1689.84
Maximum frequency (MHz)    288.02             347.92

Concerning the generated bitstream files and the configuration times required, a full bitstream of 3,889,941 bytes is required for the 3D HWT configuration, and the minimum configuration time needed is also the worst, at 4.8 ms. On the other hand, comparing the file sizes of the bitstreams shows that the partial bitstreams generated are significantly smaller, which reduces the storage space required to store the various bitstreams. The results show that, for transform size N = 64, the file size of the partial bitstreams is reduced by more than 86% relative to a full bitstream, and the configuration time is also reduced by more than 86%. In summary, partial reconfiguration has a more efficient bitstream and, as proven, a smaller bitstream decreases the configuration time.

The influence of transform size on area, power consumption and maximum frequency is depicted in Fig. 3. For ease of visualisation, the graphs are plotted on a log scale to the base 10. Results indicate that the proposed 3D HWT with transpose-based computation requires more area, while with the DPR mechanism an area saving of between 2.75% and just under 13% can be achieved. In terms of power consumption, non-partial reconfiguration consumes up to 1377.96 mW for N = 64, and partial reconfiguration saves between 4.20% and 18.81%.
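As a quick arithmetic check of the percentages quoted for N = 128, the following Python sketch recomputes them from the Table I entries. It assumes the table values are read as 21,047 and 20,779 slices, 1964.14 and 1689.84 mW, and 288.02 and 347.92 MHz without and with DPR respectively. The helper name percent_of is hypothetical. Under that reading, the area and power savings are taken relative to the non-DPR design, while the 17.216% frequency figure is reproduced when the frequency difference is taken relative to the DPR design.

```python
def percent_of(difference, reference):
    """Return difference expressed as a percentage of reference."""
    return 100.0 * difference / reference


# Table I entries for N = 128 (assumed reading of the table).
without_dpr = {"area": 21047, "power_mw": 1964.14, "fmax_mhz": 288.02}
with_dpr = {"area": 20779, "power_mw": 1689.84, "fmax_mhz": 347.92}

# Savings measured against the design without DPR.
area_saving = percent_of(without_dpr["area"] - with_dpr["area"],
                         without_dpr["area"])
power_saving = percent_of(without_dpr["power_mw"] - with_dpr["power_mw"],
                          without_dpr["power_mw"])

# Frequency gain measured against the design with DPR (assumed convention).
fmax_gain = percent_of(with_dpr["fmax_mhz"] - without_dpr["fmax_mhz"],
                       with_dpr["fmax_mhz"])

print(f"area saving:    {area_saving:.3f}%")
print(f"power saving:   {power_saving:.3f}%")
print(f"frequency gain: {fmax_gain:.3f}%")
```

The three results agree with the quoted 1.27%, 13.96% and 17.216% to within one unit in the last quoted digit.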

Using DPR, several large systems can be mapped to small hardware resources, and the area, power and maximum frequency are optimised and improved.

Fig. 3. Influence of transform size on area, power consumption and maximum frequency.

Moreover, in order to visualise the impact of non-partial and partial reconfiguration on the proposed architecture, chip layouts on different Virtex-5 FPGA devices are shown in Fig. 4. With the DPR mechanism, the static and reconfigurable areas can be specified, and they can be clearly seen in the layouts generated. The comparative study of the non-partial and partial reconfiguration processes leads to an important conclusion concerning the advantages offered by DPR, especially in processing large medical volumes.

Fig. 4. Comparison of chip layout for different Virtex-5 devices for N = 64.

IV. CONCLUSIONS
Two architectures for 3D HWT have been proposed in this paper, based on transpose computation and partial reconfiguration. Analysis of the performance achieved for different parameters, such as area utilised, power consumed and maximum frequency achieved, clearly reveals that with DPR complex designs can be implemented on limited hardware resources, leading to better performance. The comparative study of the non-partial and partial reconfiguration processes yields interesting conclusions concerning the advantages offered by DPR and points to a promising solution for implementing computationally intensive applications such as 3D medical image compression.

REFERENCES
[1] I. S. Uzun, “Design and FPGA Implementation of Matrix Transforms for Image and Video Processing”, PhD Thesis, School of Computer Science, The Queen’s University of Belfast, 2006.
[2] A. Khashman and K. Dimililer, “Image Compression using Neural Networks and Haar Wavelet”, WSEAS Trans. on Sig. Proc., vol. 4, pp. 330–339, 2008.
[3] A. Ahmad, J. Loo and J. Cosmas, “VLSI Architecture Design Approaches for Real-time Video Processing”, WSEAS Trans. on Cir. and Sys., vol. 7, pp. 855–868, 2008.
[4] M. Jiang and D. Crookes, “Area-Efficient High-Speed 3D DWT Processor Architecture”, Electronics Letter, vol. 43, pp. 502–503, 2007.
[5] M. Jiang and D. Crookes, “FPGA Implementation of 3D Discrete Wavelet Transform for Real-time Medical Imaging”, in Proc. 18th European Conf. on Circuit Theory and Design (ECCTD 2007), Seville, Spain, pp. 519–522, 2007.
[6] M. Majer, J. Teich, A. Ahmadinia and C. Bobda, “The Erlangen Slot Machine: A Dynamically Reconfigurable FPGA-based Computer”, Journal of VLSI Signal Processing, vol. 47, pp. 15–31, 2007.
[7] C. Claus, J. Zeppenfeld, F. Muller and W. Stechele, “Using Partial-Run-Time Reconfigurable Hardware to Accelerate Video Processing in Driver Assistance System”, in Proc. Conference Design, Automation, Test and Exhibition in Europe (DATE ’07), Nice, France, pp. 1–6, 2007.
[8] L. Braun, K. Paulsson, H. Kromer, M. Hubner and J. Becker, “Data Path Driven Waveform-like Reconfiguration”, in Proc. International Conference on Field Programmable Logic and Applications (FPL 2008), Heidelberg, Germany, pp. 607–610, 2008.
[9] A. Shoa and S. Shirani, “Run-Time Reconfigurable Systems for Digital Signal Processing Applications: A Survey”, Journal of VLSI Signal Processing, vol. 39, pp. 213–235, 2005.
[10] P. Manet et al., “An Evaluation of Dynamic Partial Reconfiguration for Signal and Image Processing in Professional Electronics Applications”, EURASIP J. Embedded Syst., vol. 2008, pp. 1–11, 2008.
[11] J. Huang, M. Parris, J. Lee and R. F. DeMara, “Scalable FPGA-based Architecture for DCT Computation Using Dynamic Partial Reconfiguration”, ACM Trans. on Embedded Comput. Syst., pp. 1–18, 2008.
[12] Xilinx Inc., “Partial Reconfiguration Design with PlanAhead”, v2.1, 2008.
[13] P. Lysaght, B. Blodget, J. Mason, J. Young and B. Bridgford, “Invited Paper: Enhanced Architectures, Design Methodologies and CAD Tools for Dynamic Reconfiguration of Xilinx FPGAs”, in Proc. International Conference on Field Programmable Logic and Applications (FPL ’06), Madrid, Spain, pp. 1–6, 2006.