J Asc 2012

Archives Des Sciences
Vol 65, No. 12;Dec 2012
HW/SW Co-design for FPGA based Video Processing Platform

Yahia SAID (Corresponding author)
Laboratory of Electronics and Microelectronics (EE), Faculty of Sciences of Monastir, University of Monastir 5019, Tunisia
E-mail: said.yahia1@gmail.com

Taoufik SAIDANI, Wajdi ELHAMZI, Mohamed ATRI
Laboratory of Electronics and Microelectronics (EE), Faculty of Sciences of Monastir, University of Monastir 5019, Tunisia
E-mail: saidani_taoufik@yahoo.fr, elhamziwajdi@yahoo.fr, mohamed.atri@fsm.rnu.tn
Abstract

In this paper we present a Video Processing Platform (VPP) for rapid prototyping, based on an FPGA (Field Programmable Gate Array) architecture and built with the Xilinx EDK embedded system tools and Xilinx System Generator. This hardware/software co-design platform has been implemented on a Xilinx Spartan-3A DSP FPGA. The video interface blocks are written in RTL, and the MicroBlaze soft processor is used as an embedded video controller. This paper describes the architectural building blocks and shows the flexibility of the proposed platform, which is achieved through a new design flow based on Xilinx System Generator. The VPP allows custom processing blocks to be plugged into the platform architecture without modifying the front end (video capture) or the back end (display of the processed output). We present two video processing applications realized with the VPP for real-time video processing: a Prewitt edge detector and video wavelet coding.

Keywords: Field Programmable Gate Arrays (FPGA), Real-Time Video Processing, Embedded Development Kit (EDK), System Generator (XSG).

1. Introduction

Image and video processing form an ever-expanding and dynamic area, with applications reaching into everyday life: medicine, astronomy, ultrasonic imaging, remote sensing, space exploration, surveillance, authentication, automated industrial inspection and many more [1]. Reconfigurable hardware in the form of Field Programmable Gate Arrays (FPGAs) has been proposed as a way of obtaining high performance for image processing, even under real-time requirements [2]. Implementing image processing algorithms on reconfigurable hardware reduces time-to-market cost, enables rapid prototyping of complex algorithms and simplifies debugging and verification. FPGAs are therefore well suited to the implementation of real-time image processing algorithms [3].
ISSN 1661-464X
As FPGA architectures have evolved, they have come to include embedded processors for designing reconfigurable embedded systems. Such a design combines a processor, hardware logic IP and their integration; this is termed System on Chip (SoC) design [4]. Xilinx offers the Embedded Development Kit (EDK) as an SoC design platform. It provides a rich set of tools, such as the Software Development Kit (SDK) for developing embedded software applications and Xilinx Platform Studio (XPS) for hardware development, together with a wide range of embedded processing Intellectual Property (IP) cores, including processors and peripherals. Integrating all the cores with processors inside the FPGA yields a reconfigurable embedded processor system [5].

The introduction of high-level hardware system modeling tools has further accelerated the design of image processing systems in FPGAs. The Xilinx System Generator (XSG) offers a design methodology that uses a model-based approach for the design and implementation of Digital Signal Processing (DSP) applications in FPGAs [6]. XSG is an extension of Simulink and consists of a Simulink library, called the Xilinx blockset, whose blocks map directly onto the target FPGA hardware. XSG also supports hardware co-simulation, in which part of a design runs on the FPGA while the rest runs in software; this makes it possible to complete even very long simulations in a much shorter time [6]. Figure 1 shows a design flow using XSG. The software automatically converts the high-level system block diagram to RTL, which can then be synthesized to Xilinx FPGA technology with the ISE tools. All the downstream FPGA implementation steps, including synthesis and place and route, are performed automatically to generate an FPGA programming file.
Figure 1. XSG based design flow for hardware implementation
System Generator provides a system integration platform for the design of video processing systems on FPGAs, allowing the RTL, Simulink, MATLAB and C/C++ components of a DSP system to come together in a single simulation and implementation environment. It also supports a black box block that allows RTL to be imported into Simulink and co-simulated. System Generator constructs the VHDL design of the model, generates a pcore for it and integrates the pcore with the hardware/software platform in the XPS project. The EDK Processor block provides an interface between MicroBlaze and the custom logic developed in XSG. In this approach, the export-IP-core technique is used to design the SoC [6].

The Xilinx Embedded Development Kit (EDK) tools make it possible to implement a complete video processing system on a single FPGA using hardware/software co-design methods. In this approach, custom image/video processing modules developed in System Generator can be integrated into the existing framework as dedicated hardware peripherals. The objective of this work is to develop a real-time video processing platform (VPP) with input from a CMOS camera and output to a DVI display, and to verify the resulting video in real time. The platform supports rapid development of image and video processing algorithms: model-based designs developed with XSG are converted to hardware blocks that can easily be incorporated into the VPP.

This paper is organized as follows. Section 2 gives an overview of the platform design. Section 3 presents two video processing applications developed with XSG: a Prewitt edge detector and video wavelet coding. Finally, Section 4 gives a brief conclusion and directions for future work.

2. Overall Platform Design

The board used for the VPP is the Spartan-3A DSP Video Starter Kit (VSK) developed by Xilinx [7]. It carries a Xilinx Spartan-3A DSP XC3SD3400A-4FGG676C FPGA with 53,712 logic cells, 126 DSP48A slices and 2,268 Kb of block RAM (BRAM).
The board has an add-on card, the FMC-Video I/O daughter card, which augments the video capabilities of the platform. The FMC-Video card includes a camera interface that allows data to be captured from a custom camera based on the Micron MT9V022 digital CMOS color image sensor [8]. Images of 742H × 480V pixels, with 8 or 10 bits per pixel, are captured at 60 frames per second by the sensor's 10-bit A/D converter and serialized for transmission [9]. The data from the camera arrive as a high-speed LVDS stream, which is received and deserialized by a National Semiconductor DS92LV1212A deserializer; this deserializer can carry LVDS data from a camera with a pixel rate of 26.6 MHz [8]. The board is well suited to a video processing platform because it has all the hardware needed to capture data and display it on a monitor.

Video data are captured from the camera at a resolution of 742×480 (progressive) at 60 Hz. The data first pass through a Gamma block for correction and then on to the Video-to-VFBC block (VFBC: Video Frame Buffer Controller), so that only active video data are sent to the MPMC (Multi-Port Memory Controller). By default, a 3-frame buffer is kept in external memory, and a simple sync signal connected between the Video-to-VFBC block and the Display Controller ensures that the display reads out one frame behind the frame being written. The Display Controller then reads the data out of memory and passes them to the DVI output.
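To make the capture numbers and the buffering policy concrete, the sketch below computes the active pixel rate implied by 742×480 at 60 Hz and models which of the three frame slots is written and which is read for each incoming frame. The modular slot arithmetic is a simplified, hypothetical model of the read-one-frame-behind behaviour described above, not the actual VFBC/MPMC logic.

```python
# Sketch: active pixel rate of the capture stream and a simplified,
# hypothetical model of "read one frame behind" with a 3-frame buffer.
# This is not the VFBC/MPMC implementation.

WIDTH, HEIGHT, FPS = 742, 480, 60   # capture format quoted in the text
PIXEL_CLOCK_HZ = 26.6e6             # camera pixel rate quoted in the text
NUM_BUFFERS = 3                     # default 3-frame buffer


def active_pixel_rate(width: int, height: int, fps: int) -> int:
    """Active (non-blanking) pixels per second."""
    return width * height * fps


def buffer_indices(frame_number: int, num_buffers: int = NUM_BUFFERS) -> tuple[int, int]:
    """Return (write_slot, read_slot) for a given input frame.

    The writer fills slot frame_number % num_buffers; the display reads the
    previous frame, i.e. the slot one behind the writer.
    """
    write_slot = frame_number % num_buffers
    read_slot = (frame_number - 1) % num_buffers
    return write_slot, read_slot


if __name__ == "__main__":
    rate = active_pixel_rate(WIDTH, HEIGHT, FPS)
    print(f"active pixel rate: {rate / 1e6:.2f} Mpixel/s")
    print(f"pixel clock / active rate: {PIXEL_CLOCK_HZ / rate:.2f}")
    for n in range(1, 5):
        print(n, buffer_indices(n))
```

The active rate (about 21.4 Mpixel/s) is below the 26.6 MHz camera pixel rate; the difference is consistent with the blanking intervals that a pixel clock must also cover. Because the reader always trails the writer by one slot, the display never reads a frame that is still being written.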
We have built a flexible architecture that enables real-time image and video processing. An overview of the design is given in Figure 2.
Figure 2. Platform design overview

The complete streaming video application includes video interfaces, run-time configurable processing blocks and a real-time video processing block. The system is controlled by a MicroBlaze processor [10], which initializes the VPP peripherals and controls the video processing and frame buffer pipelines by reading and writing control registers in the system. The MicroBlaze soft processor core is a 32-bit Harvard Reduced Instruction Set Computer (RISC) architecture optimized for implementation in Xilinx FPGAs. Its separate 32-bit instruction and data buses run at full speed, so it can execute programs and access data from both on-chip and external memory at the same time [10]. In this design it serves as an embedded video controller; its block diagram is shown in Figure 3. The peripherals are connected to the embedded MicroBlaze processor through the Processor Local Bus (PLB). The processor is connected to dual-port on-chip SRAM, called Block RAM (BRAM), through a dedicated Local Memory Bus (LMB). This bus has separate 32-bit-wide channels for program instructions and program data, using the dual-port feature of the BRAM, and provides single-cycle access to the on-chip dual-port Block RAM.
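Controlling a pipeline "by reading and writing control registers" usually means read-modify-write accesses that change one control bit without disturbing the others. The sketch below models a peripheral's register window in software to show that pattern. The base address, register offsets and bit assignments are all hypothetical, since the VPP register map is not given here; on the real system the accesses would be 32-bit loads and stores issued by MicroBlaze over the PLB.

```python
# Hypothetical model of memory-mapped control registers. Every offset,
# bit assignment and address below is invented for illustration.

CTRL_OFFSET = 0x00    # hypothetical control register
STATUS_OFFSET = 0x04  # hypothetical status register
ENABLE_BIT = 1 << 0   # hypothetical "pipeline enable" bit
BYPASS_BIT = 1 << 1   # hypothetical "bypass processing" bit


class MockRegisterBank:
    """Stands in for a peripheral's 32-bit register window on the bus."""

    def __init__(self, base_address: int):
        self.base = base_address
        self._regs = {}

    def write32(self, offset: int, value: int) -> None:
        # Registers are 32 bits wide, so keep only the low 32 bits.
        self._regs[self.base + offset] = value & 0xFFFFFFFF

    def read32(self, offset: int) -> int:
        # Unwritten registers read as zero in this model.
        return self._regs.get(self.base + offset, 0)


def set_bits(bank: MockRegisterBank, offset: int, mask: int) -> None:
    """Read-modify-write: set the bits in mask, leave the others untouched."""
    bank.write32(offset, bank.read32(offset) | mask)


def clear_bits(bank: MockRegisterBank, offset: int, mask: int) -> None:
    """Read-modify-write: clear the bits in mask, leave the others untouched."""
    bank.write32(offset, bank.read32(offset) & ~mask)


if __name__ == "__main__":
    accel = MockRegisterBank(base_address=0xC000_0000)  # hypothetical address
    set_bits(accel, CTRL_OFFSET, ENABLE_BIT | BYPASS_BIT)
    clear_bits(accel, CTRL_OFFSET, BYPASS_BIT)
    print(hex(accel.read32(CTRL_OFFSET)))
```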
Figure 3. MicroBlaze core block diagram

The complete video system is created using the Xilinx Embedded Development Kit (EDK) [5] and System Generator for DSP [6]. The Embedded Development Kit is an integrated development environment for designing embedded processing systems. System Generator is a system-level modeling tool from Xilinx that facilitates FPGA hardware design. It can automatically generate accelerator blocks as custom peripherals for the embedded video application, allowing the MicroBlaze processor to read and write shared memories in the custom video accelerators. The synthesis results for the overall system are given in Table 1. The VPP uses about a third of the FPGA slices and half of the block RAMs, leaving space for additional logic such as image and video processing accelerators.
Table 1. Synthesis results of the proposed platform

| Resource type | Slices | Slice flip-flops | 4-input LUTs | Bonded IOBs | BRAMs | DSP48s |
|---|---|---|---|---|---|---|
| Used | 7810 | 9706 | 11170 | 78 | 64 | 3 |
| Available | 23872 | 47744 | 47744 | 469 | 126 | 126 |
| Utilization | 33% | 20% | 24% | 17% | 50% | 3% |
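The utilization row of Table 1 is simply used divided by available. The sketch below recomputes it from the used/available counts and reports how many units of each resource remain free for accelerators. Rounding to the nearest integer is a choice made for this sketch; the published percentages appear to follow a different rounding, so some recomputed values differ from the table by one percentage point.

```python
# Recompute utilization from the used/available counts of Table 1.
# Nearest-integer rounding is an assumption of this sketch.

TABLE_1 = {
    # resource: (used, available)
    "Slices": (7810, 23872),
    "Slice flip-flops": (9706, 47744),
    "4-input LUTs": (11170, 47744),
    "Bonded IOBs": (78, 469),
    "BRAMs": (64, 126),
    "DSP48s": (3, 126),
}


def utilization(used: int, available: int) -> float:
    """Percentage of the device resource consumed."""
    return 100.0 * used / available


def remaining(used: int, available: int) -> int:
    """Units of the resource left for additional accelerators."""
    return available - used


if __name__ == "__main__":
    for name, (used, avail) in TABLE_1.items():
        pct = utilization(used, avail)
        print(f"{name:18s} {pct:5.1f}%  free: {remaining(used, avail)}")
```

For example, 62 of the 126 BRAMs and 123 of the 126 DSP48 slices stay available for custom processing blocks.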
3. Case Study Using Xilinx System Generator

Two video processing applications, a Prewitt edge detector and a video wavelet coding block, have been designed with Xilinx System Generator and tested on the VPP described above. In this section, the output images are real-time video results produced by the hardware components generated by System Generator.

3.1 Prewitt Gradient Edge Detector

Edges characterize object boundaries and carry information about the location, shape, size and texture of objects. Edge detection is therefore of fundamental importance in image processing: because edges mark object boundaries, they are useful for segmentation, registration and identification of objects in a scene. Edge detection refers to the process of identifying and locating sharp discontinuities in an image [11]. These discontinuities are abrupt changes in pixel intensity that characterize the boundaries of objects in a scene. The best-known technique for edge detection involves convolving the image with a 2-D filter that is constructed to be sensitive to large gradients in the image while returning zero in uniform regions [12]. Prewitt is a gradient-based edge detection algorithm that performs a 2-D spatial gradient measurement on the video data. It convolves the original image with two 3×3 kernels, one per direction; combining the two directional edge enhancement results makes it possible to detect all the edges in an image, regardless of their direction. First, RGB data are converted into grayscale to obtain the image intensity, using the following equation:
(1)
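A common choice for this conversion is the ITU-R BT.601 luma weighting, I = 0.299R + 0.587G + 0.114B. The sketch below assumes those weights for Eq. (1) and shows both a floating-point reference model and the kind of fixed-point form a hardware block would typically use, where the weights are scaled by 256 so that the division becomes an 8-bit right shift. Both the weights and the fixed-point scaling are assumptions, not taken from the authors' XSG design.

```python
# RGB-to-grayscale sketch. ASSUMPTION: ITU-R BT.601 luma weights
# (0.299, 0.587, 0.114) for Eq. (1).


def gray_float(r: int, g: int, b: int) -> float:
    """Floating-point reference model."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_fixed(r: int, g: int, b: int) -> int:
    """Hardware-style fixed-point version.

    The weights are scaled by 256 (77 + 150 + 29 = 256), so the division
    becomes a right shift by 8 bits; adding 128 rounds to nearest instead
    of truncating.
    """
    return (77 * r + 150 * g + 29 * b + 128) >> 8


if __name__ == "__main__":
    for rgb in [(255, 255, 255), (255, 0, 0), (12, 200, 90)]:
        print(rgb, round(gray_float(*rgb), 2), gray_fixed(*rgb))
```

With 8-bit inputs the fixed-point result stays within one gray level of the floating-point model, which is why integer weights like these are a popular trade-off in FPGA pipelines.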
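The kernels are then applied separately to the image intensity I to produce separate measurements of the gradient component in each orientation (called $G_x$ and $G_y$), as shown in (2):

$$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -1 & 0 & +1 \\ -1 & 0 & +1 \end{bmatrix} * I \quad \text{and} \quad G_y = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ +1 & +1 & +1 \end{bmatrix} * I \qquad (2)$$

where $*$ denotes 2-D convolution.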
These can then be combined to find the absolute magnitude of the gradient at each point and the orientation of that gradient, as follows:

$$|G| = \sqrt{G_x^2 + G_y^2} \quad \text{and} \quad \theta = \arctan\left(\frac{G_y}{G_x}\right) \qquad (3)$$
The Prewitt edge detector is built as a video processing accelerator using System Generator for DSP and Simulink. The design of our filter is shown in Figure 4.
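The internal structure of the Prewitt core is not described here. A common way to feed a 3×3 operator from a raster-scan video stream is to keep two line buffers (typically mapped to block RAM in an FPGA) plus a 3×3 register window, so that each new pixel completes one neighbourhood. The sketch below models that approach in software; it is an assumed architecture for illustration, not the authors' XSG design.

```python
from collections import deque


def stream_windows(pixels, width):
    """Yield (row, col, window) for every complete 3x3 neighbourhood while
    reading pixels one at a time in raster order. Only two stored lines and
    a 3x3 shift register are kept; (row, col) is the window centre."""
    line1 = deque([0] * width)  # FIFO whose head is the pixel directly above
    line2 = deque([0] * width)  # FIFO whose head is the pixel two lines above
    window = [[0] * 3 for _ in range(3)]
    for n, p in enumerate(pixels):
        row, col = divmod(n, width)
        # Read this column's pixels from the two stored lines.
        above = line1.popleft()
        above2 = line2.popleft()
        # Push the current column down into the stored lines.
        line2.append(above)
        line1.append(p)
        # Shift the window one column left and insert the new column.
        for r in range(3):
            window[r].pop(0)
        window[0].append(above2)
        window[1].append(above)
        window[2].append(p)
        # A full neighbourhood exists once three rows and columns are seen.
        if row >= 2 and col >= 2:
            yield row - 1, col - 1, [list(r) for r in window]


if __name__ == "__main__":
    img = [[10 * r + c for c in range(5)] for r in range(4)]
    flat = [v for line in img for v in line]
    for centre_row, centre_col, win in stream_windows(flat, 5):
        print(centre_row, centre_col, win)
```

For a 742-pixel-wide frame, the two line buffers hold only 2 × 742 pixels rather than a full frame, which is what makes this structure a good fit for on-chip block RAM.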
Figure 4. Prewitt IP core in system generator
The System Generator design contains an EDK Processor block, which allows it to be exported as an EDK pcore using the EDK Export Tool compilation target. The export process creates a PLB-based pcore, which is integrated with the 32-bit MicroBlaze soft RISC processor in Xilinx Platform Studio (XPS) [6]. In the VPP setup, a DVI display shows the edges detected in the camera input. The experimental setup for the Prewitt edge detector is presented in Figure 5.
Figure 5. Experimental setup for implementation of edge detection. Input is from CMOS camera and the output is on a DVI display.
The total resource usage of the system, including the MicroBlaze, the bus structure, the Prewitt edge core and the peripherals, is 9096 slices, or 38% of the FPGA's total resources. Table 2 shows the logic used by the Prewitt edge module. Its post-synthesis resource usage is 5%, and its estimated post-synthesis maximum frequency is 68.432 MHz.

Table 2. Post-synthesis device utilization for the Prewitt edge pcore

| Resource type | Slices | Slice flip-flops | 4-input LUTs | Bonded IOBs | BRAMs | DSP48s |
|---|---|---|---|---|---|---|
| Used | 1286 | 1746 | 1710 | 0 | 5 | 4 |
| Available | 23872 | 47744 | 47744 | 469 | 126 | 126 |
| Utilization | 6% | 4% | 4% | 0% | 3% | 3% |

Maximum frequency: 68.432 MHz
3.2 Discrete Wavelet Transform

The Discrete Wavelet Transform (DWT) is a widely used digital signal processing technique with applications in diverse areas such as speech recognition, feature extraction, multi-resolution video processing and data compression [13]. The DWT, originally implemented with Mallat's filter bank algorithm [14], has been made more efficient by the lifting scheme, which has been incorporated into the JPEG 2000 image compression standard. The lifting scheme operates entirely in the spatial domain and has many advantages over the filter bank structure, such as lower area, power consumption and computational complexity. It also allows in-place computation of the DWT and integer-to-integer wavelet transforms, which are useful for lossless coding. The lifting scheme was developed as a flexible tool for constructing second-generation wavelets. It is composed of three basic operation stages: split, predict and update (Figure 6).
Figure 6. Lifting scheme forward transform (image input; split, prediction and updating stages)
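The three lifting stages can be made concrete with the reversible LeGall 5/3 wavelet used for lossless coding in JPEG 2000: split the signal into even and odd samples, predict each odd sample from its two even neighbours (the remainder is the high-pass detail d), then update each even sample with its two neighbouring details (giving the low-pass approximation s). The sketch below is a one-dimensional, one-level software model with whole-sample symmetric extension at the borders; whether the authors' XSG core uses exactly this rounding and boundary handling is an assumption.

```python
# One level of the reversible (integer) LeGall 5/3 lifting transform:
#   predict: d[n] = x[2n+1] - floor((x[2n] + x[2n+2]) / 2)
#   update:  s[n] = x[2n]   + floor((d[n-1] + d[n] + 2) / 4)
# Python's >> on ints is a floor division by a power of two, also for
# negative values. Borders use whole-sample symmetric extension.


def lift53_forward(x):
    """Return (s, d): low-pass and high-pass coefficients of x (len >= 2)."""
    if len(x) < 2:
        raise ValueError("need at least two samples")
    even, odd = x[0::2], x[1::2]  # split
    d = []
    for n, xo in enumerate(odd):  # predict
        right = even[n + 1] if n + 1 < len(even) else even[n]  # mirror at end
        d.append(xo - ((even[n] + right) >> 1))
    s = []
    for n, xe in enumerate(even):  # update
        left = d[n - 1] if n > 0 else d[0]          # mirror at start
        right = d[n] if n < len(d) else d[n - 1]    # mirror at end
        s.append(xe + ((left + right + 2) >> 2))
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward exactly by running the lifting steps backwards."""
    even = []
    for n, sv in enumerate(s):  # undo update
        left = d[n - 1] if n > 0 else d[0]
        right = d[n] if n < len(d) else d[n - 1]
        even.append(sv - ((left + right + 2) >> 2))
    odd = []
    for n, dv in enumerate(d):  # undo predict
        right = even[n + 1] if n + 1 < len(even) else even[n]
        odd.append(dv + ((even[n] + right) >> 1))
    x = [0] * (len(even) + len(odd))  # merge (inverse of split)
    x[0::2], x[1::2] = even, odd
    return x


if __name__ == "__main__":
    signal = [10, 12, 15, 20, 18, 9, 7, 7]
    low, high = lift53_forward(signal)
    print(low, high, lift53_inverse(low, high) == signal)
```

Because each lifting step only adds a function of the other half of the samples, running the steps backwards with the same integer rounding recovers the input exactly; this is the integer-to-integer property that makes lifting suitable for lossless coding.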
The implementation consists of a two-level 2D-DWT, which may be computed using filter banks as shown in Figure 7. The input samples X(n) are passed through two stages of analysis filters.
Figure 7. Lifting scheme decomposition of 5/3 filter
They are first processed by low-pass (h(n)) and high-pass (g(n)) horizontal filters and subsampled by two. The outputs (L1, H1) are then processed by low-pass and high-pass vertical filters. Here L1 and H1 are the outputs of the 1D-DWT, and LL1, LH1, HL1 and HH1 form the one-level decomposition of the 2D-DWT. With this structure, a separable 2D-DWT with N levels of transformation can easily be obtained by concatenating 1D-DWT units, the first stage processing N transformation levels on rows and the second N transformation levels on columns. For image compression, JPEG 2000 recommends an alternating row/column structure such as the one presented in Figure 7. The sub-band decomposition of an image under a standard two-level 2D-DWT is presented in Figure 8, where H and L denote the high-pass and low-pass filter stages, respectively.
Figure 8. Subband decomposition for two-level 2D-DWT (X(n) is filtered horizontally and then vertically by low-pass Hn and high-pass Gn filters, each followed by downsampling by 2; level 1 yields LH1, HL1 and HH1, and LL1 is decomposed again into LL2, LH2, HL2 and HH2)
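The row/column structure described above can be modeled in software by applying a 1D transform to every row (horizontal filters), then to every column of each resulting half (vertical filters), and repeating on the LL band. The sketch below does this with a compact integer 5/3 lifting step. The choice of lifting kernel, the symmetric boundary handling and the subband naming (first letter for the horizontal filter, second for the vertical one) are assumptions for illustration, and naming conventions vary between texts. Each dimension must have at least two samples at every level.

```python
def lift53(x):
    """Forward integer 5/3 lifting (JPEG 2000 reversible form) with
    symmetric borders. Returns (low, high); len(x) must be >= 2."""
    even, odd = x[0::2], x[1::2]
    # Predict: odd sample minus the floor-average of its even neighbours.
    d = [o - ((even[n] + even[min(n + 1, len(even) - 1)]) >> 1)
         for n, o in enumerate(odd)]
    # Update: even sample plus a rounded quarter of the neighbouring details.
    s = [e + ((d[max(n - 1, 0)] + d[min(n, len(d) - 1)] + 2) >> 2)
         for n, e in enumerate(even)]
    return s, d


def dwt2d_level(img):
    """One 2D level: rows first (horizontal), then columns (vertical).
    Returns LL, LH, HL, HH as lists of rows."""
    lows, highs = zip(*(lift53(list(row)) for row in img))  # L1 and H1 halves

    def vertical(half):
        cols = [lift53(list(col)) for col in zip(*half)]
        low = [list(r) for r in zip(*(c[0] for c in cols))]
        high = [list(r) for r in zip(*(c[1] for c in cols))]
        return low, high

    ll, lh = vertical(lows)
    hl, hh = vertical(highs)
    return ll, lh, hl, hh


def dwt2d(img, levels=2):
    """Multi-level 2D-DWT: repeat on the LL band. Returns the final LL band
    and a list of (LH, HL, HH) detail triples, finest level first."""
    details = []
    ll = [list(r) for r in img]
    for _ in range(levels):
        ll, lh, hl, hh = dwt2d_level(ll)
        details.append((lh, hl, hh))
    return ll, details


if __name__ == "__main__":
    ramp = [[8 * r + c for c in range(8)] for r in range(8)]
    ll2, det = dwt2d(ramp)
    print("LL2:", ll2)
    print("level-1 band size:", len(det[0][0]), "x", len(det[0][0][0]))
```

For an 8×8 input, the level-1 detail bands are 4×4 and the level-2 bands, including LL2, are 2×2, matching the quadrant layout of a two-level decomposition.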
The design of the DWT2D codec in System Generator is shown in Figure 9, and the experimental results of the DWT2D codec implementation are presented in Figure 10.
Figure 9. DWT2D IP core in system generator
Figure 10. Experimental setup for implementation of DWT2D Codec
The total resource usage of the system, including the MicroBlaze, the bus structure, the DWT2D codec pcore and the peripherals, is 8833 slices, or 38% of the FPGA's total resources. Table 3 shows the logic used by the DWT2D codec module. Its post-synthesis resource usage is 5%, and its estimated post-synthesis maximum frequency is 65.167 MHz.
Table 3. Post-synthesis device utilization for the DWT2D codec pcore

| Resource type | Slices | Slice flip-flops | 4-input LUTs | Bonded IOBs | BRAMs | DSP48s |
|---|---|---|---|---|---|---|
| Used | 1023 | 1246 | 1323 | 162 | 3 | 4 |
| Available | 23872 | 47744 | 47744 | 469 | 126 | 126 |
| Utilization | 5% | 3% | 3% | 4% | 3% | 4% |

Maximum frequency: 65.167 MHz
4. Conclusion

Continual growth in the size and functionality of FPGAs in recent years has led to increasing interest in their use as implementation platforms for image processing applications, particularly real-time video processing [15]. In this work, we have presented a video processing platform (VPP) for real-time video processing applications. The platform provides a development environment that allows designers to begin experimenting quickly with video processing on the Spartan-3A DSP family of FPGAs. An embedded base system shipped with the VSK [7] provides a familiar starting point from which existing processor-based video applications can be ported or new designs created. Users can build flexible video processing systems that include embedded processors and custom video accelerators, and can verify video hardware designs in a fraction of the time using the hardware co-simulation provided by System Generator. Two applications have been presented to show the performance and flexibility of the proposed platform. For the Prewitt edge detection system, including the MicroBlaze, bus structure, Prewitt edge core and peripherals, the total resource usage is 9096 slices, or 38% of the FPGA's total resources, with an estimated post-synthesis maximum frequency of 88.547 MHz. The DWT2D codec system has a maximum frequency of 85.292 MHz and uses 8833 CLB slices (38% utilization), so there is room to implement more parallel processes with this architecture on the same platform. The Xilinx System Generator tool offers an efficient and straightforward way to move from a PC-based model in Simulink to a real-time FPGA-based hardware implementation. Custom video accelerator blocks are captured in the DSP-friendly Simulink modeling environment, converted into custom peripherals for Platform Studio and then connected to the embedded system through the Processor Local Bus.
Future work includes using the Xilinx System Generator and EDK development tools to implement a computer vision application, an object detection and tracking system, on the proposed platform.
References
[1] J. C. Russ, The Image Processing Handbook, Sixth Edition, CRC Press, 2011.
[2] D. Crookes, Design and implementation of a high level programming environment for FPGA-based image processing, IEEE Proceedings on Vision, Image, and Signal Processing, vol. 4, 2000.
[3] D. V. Rao, S. Patil, N. A. Muthukuma, Implementation and Evaluation of Image Processing Algorithms on Reconfigurable Architecture using C-based Hardware Descriptive Languages, International Journal of Theoretical and Applied Computer Sciences, pp. 9-34, 2006.
[4] R. Peesapati, S. Sabat, K. Venu, Automatic IP Core generation in SoC, International Journal of Recent Trends in Engineering, vol. 2, no. 6, 2009.
[5] Xilinx Inc., Embedded System Tools Reference Manual, http://www.xilinx.com
[6] Xilinx Inc., System Generator User Guide, http://www.xilinx.com
[7] Xilinx Inc., Spartan-3A DSP FPGA Video Starter Kit User Guide, http://www.xilinx.com
[8] Xilinx Inc., XtremeDSP Solution FMC-Video Daughter Board Technical Reference Guide, http://www.xilinx.com
[9] Micron, MT9V022 CMOS image sensor product brief, http://www.micron.com
[10] Xilinx Inc., MicroBlaze soft processor, http://www.xilinx.com
[11] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679-698, Nov. 1986.
[12] S. Behera, M. N. Mohanty, S. Patnaik, A Comparative Analysis on Edge Detection of Colloid Cyst: A Medical Imaging Approach, Soft Computing Techniques in Vision Science, Studies in Computational Intelligence, Springer, vol. 395, pp. 63-85, 2012.
[13] D. S. Taubman, M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice, Kluwer Academic Publishers, ch. 6, 2002.
[14] S. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, pp. 674-693, 1989.
[15] B. Hutchings, J. Villasenor, The Flexibility of Configurable Computing, IEEE Signal Processing Magazine, vol. 15, pp. 67-84, 1998.