The Ninth International Conference on Electronic Measurement & Instruments ICEMI’2009

Design Of High-Speed Image Processing System Based On FPGA
Xinxi Zhang Yong Li Jinyang Wang Yulin Chen
(Department of Control Engineering, Academy of Armored Force Engineering, Beijing 100072, China)
Abstract- A high-speed image processing system is designed
in this paper to resolve problems such as low system
integration and slow processing speed in the image
processing of vehicle-loaded computers. The system realizes
image acquisition, storage and overlapping by building the
main hardware on a single FPGA chip, configuring a Nios II
soft-core CPU together with function modules such as image
preprocessing, processing and display, and by designing the
system software. Owing to the use of programmable chips and
parallel processing technology, the system features high
integration, good maintainability, fast image processing and
strong real-time capability.
Key words- Image Processing, FPGA, Nios II CPU.
Recently, the main problems in vehicle-loaded computers
lie in two aspects. Firstly, when a low-power PowerPC CPU
is used, an integrated board providing both image
acquisition and image display is needed. Secondly, with
the wide use of video cameras and thermal imagers and the
development of electronic integration, a high-speed image
processing system is required.
To resolve these two problems, the authors design a
high-speed image processing system based on FPGA that
realizes the overlapping of multi-channel image
information. Function modules such as image acquisition,
processing and display are realized in a single FPGA chip,
which reduces peripheral circuitry and enhances the
performance of the whole system. Thanks to parallel
processing technology, processing speed and real-time
performance are improved dramatically.
A. Image magnification based on bilinear interpolation
Magnification based on the pixel-replication principle is
simple and fast, but it merely copies each original pixel
into its neighborhood. As the magnification factor
increases, the image shows obvious blocks and saw-tooth
artifacts and cannot retain the edge information of the
original image. This problem can be resolved by bilinear
interpolation, which eliminates the saw-tooth, retains the
edge information of the original image and gives a better
visual effect.
Fig.1. Original Image
Fig.2. Amplified Image
Fig.1 is the original image, in which f(i, j), f(i, j+1),
f(i+1, j) and f(i+1, j+1) are adjacent pixels. Fig.2 is
obtained after magnifying the image k times in the
horizontal direction and l times in the vertical direction.
These pixels only change position in the amplified image;
their values remain unchanged. Thus we get the following
equations:

F(ki, lj) = f(i, j)
F(ki, l(j+1)) = f(i, j+1)
F(k(i+1), lj) = f(i+1, j)
F(k(i+1), l(j+1)) = f(i+1, j+1)    (1)
In the amplified image, we define the data which retain
the information of the original image as initial data,
whose coordinates are integer multiples of the original
coordinates, and define the data which must be interpolated
from the original image data as interpolation data, whose
coordinates are not integer multiples. When carrying out
interpolation, the pixel values at the initial-data
positions remain unchanged and the other pixel values are
calculated by the interpolation algorithm.
978-1-4244-3864-8/09/$25.00 ©2009 IEEE
The specific algorithm is as follows. Let the pixel
position in the amplified image be (x, y); the coordinates
in the amplified image and the original image are related
by

x/k = i + u,    y/l = j + v    (2)

where (x, y) is the coordinate in the amplified image;
(i, j) is the coordinate in the original image; k is the
horizontal magnification factor and l is the vertical
magnification factor; 0 ≤ u < 1, 0 ≤ v < 1.
We can work out the pixel values in the amplified image by
bilinear interpolation:

F(ki, l(j+v)) = F(ki, lj) + v[F(ki, l(j+1)) − F(ki, lj)]
             = f(i, j) + v[f(i, j+1) − f(i, j)]
F(k(i+1), l(j+v)) = F(k(i+1), lj) + v[F(k(i+1), l(j+1)) − F(k(i+1), lj)]
             = f(i+1, j) + v[f(i+1, j+1) − f(i+1, j)]
F(x, y) = F(ki, l(j+v)) + u[F(k(i+1), l(j+v)) − F(ki, l(j+v))]    (3)
Formula (3) shows that a pixel value can be calculated with
only one interpolation when the pixel lies at an
initial-data position, e.g. F(ki, l(j+v)) and
F(k(i+1), l(j+v)), but three interpolations are needed
when the pixel lies at an interpolation-data position.
Formula (3) also shows that the expressions conform to a
multiply-add pattern and can be realized easily in an FPGA
chip. As a result, this method guarantees both the image
magnification quality and the real-time requirement.
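As a software cross-check of formulas (1)-(3), the following is a minimal Python sketch of bilinear magnification. The function name and the clamping at the last row/column are the authors' illustration, not part of the paper:

```python
def magnify_bilinear(f, k, l):
    """Magnify grayscale image f (list of rows) k times along the row
    index and l times along the column index, per formulas (2)/(3)."""
    rows, cols = len(f), len(f[0])
    out = [[0.0] * (cols * l) for _ in range(rows * k)]
    for x in range(rows * k):
        for y in range(cols * l):
            i, ru = divmod(x, k)          # x/k = i + u  (formula (2))
            j, rv = divmod(y, l)          # y/l = j + v
            u, v = ru / k, rv / l         # fractional offsets, 0 <= u, v < 1
            i1 = min(i + 1, rows - 1)     # clamp neighbors at the border
            j1 = min(j + 1, cols - 1)
            top = f[i][j] + v * (f[i][j1] - f[i][j])      # first row of (3)
            bot = f[i1][j] + v * (f[i1][j1] - f[i1][j])   # second row of (3)
            out[x][y] = top + u * (bot - top)             # final row of (3)
    return out
```

Initial-data positions (x, y both integer multiples of k, l) reproduce the original pixels exactly, as formula (1) requires.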
B. Multi-channel images overlapping
1. Alpha channel
The α-channel (Alpha channel) is an extra channel, besides
the base color channels, that determines the transparency
of each pixel of an image. The color value of each channel
is multiplied by its α value to determine its contribution
to the pixel.
The α-channel uses different gray levels to indicate the
degree of transparency. The α value varies from 0 to 1 and
gives 256 levels of transparency when the α-channel has an
8-bit binary data width. White (α = 1, corresponding to
255) is opaque, black (α = 0, corresponding to 0) is
completely transparent, and gray values between white and
black indicate partial transparency.
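As a small illustration of this mapping (the helper name is the authors' own), an 8-bit α-channel value converts to a transparency factor in [0, 1]:

```python
def alpha_from_gray(gray):
    """Map an 8-bit α-channel gray level (0-255) to the α factor in
    [0, 1]: 255 (white) is opaque, 0 (black) is fully transparent."""
    return gray / 255.0
```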
2. Multi-channel images overlapping
Associativity: image overlapping conforms to the
associative law:

I_2 overlapping (I_1 overlapping B) = (I_2 overlapping I_1) overlapping B    (4)

The formula of multi-channel image overlapping can be
deduced from this associativity.
Set α_1 as the α value of foreground I_1 and i_1 as the
color of foreground I_1; set α_2 as the α value of I_2 and
i_2 as the color of I_2. Let b denote the color of
background B, whose α value is 1.
For (I_1 overlapping B), the color contributed by I_1 at a
certain pixel is α_1 i_1, which is the sampled value of
color i_1 at this pixel. Because α_1 is the transparency of
I_1, the contribution of background B at this pixel is
1 − α_1 and its color value is (1 − α_1)b. The whole color
value i_{1,b} is the sum of the two parts:

i_{1,b} = α_1 i_1 + (1 − α_1)b    (5)
Then, calculating I_2 overlapping (I_1 overlapping B), we
obtain the color value i_{2,1,b}:

i_{2,1,b} = α_2 i_2 + (1 − α_2) i_{1,b}    (6)
For (I_2 overlapping I_1), set α as the α value of the
overlapped image and i as its color value; then the color
value of (I_2 overlapping I_1) overlapping B is
αi + (1 − α)b. Based on associativity, we have:

α_2 i_2 + (1 − α_2)[α_1 i_1 + (1 − α_1)b] = αi + (1 − α)b
α_2 i_2 + (1 − α_2)α_1 i_1 + (1 − α_2)(1 − α_1)b = αi + (1 − α)b    (7)
Because the background is arbitrary, the overlapping
formula of two-layer images can be obtained from formula
(7):

αi = α_2 i_2 + (1 − α_2)α_1 i_1    (8)
The composite α-channel value is:

α = α_2 + (1 − α_2)α_1    (9)
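The derivation in formulas (5)-(9) can be checked numerically. The sketch below (helper name and layer values are the authors' illustration) composites I_2 over (I_1 over B) step by step via (5)/(6) and compares the result with the two-layer composite of (8)/(9) applied over the same background:

```python
def over(alpha, color, backdrop):
    """One overlapping step per formulas (5)/(6): α·i + (1 − α)·b."""
    return alpha * color + (1.0 - alpha) * backdrop

# Example layer values (arbitrary grayscale test data in [0, 1]).
a1, i1 = 0.5, 0.8        # foreground I_1
a2, i2 = 0.25, 0.4       # foreground I_2
b = 0.1                  # opaque background B

left = over(a2, i2, over(a1, i1, b))   # I_2 overlapping (I_1 overlapping B)

alpha = a2 + (1 - a2) * a1             # composite α, formula (9)
pre = a2 * i2 + (1 - a2) * a1 * i1     # composite α·i, formula (8)
right = pre + (1 - alpha) * b          # (I_2 overlapping I_1) overlapping B
```

Both orders give the same pixel value, confirming the associativity of formula (4).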
In formula (8), the product of the color value of each
channel and its α value is called the pre-multiplied color.
Letting I = αi, I_2 = α_2 i_2 and I_1 = α_1 i_1, formula
(8) abbreviates to:

I = I_2 + (1 − α_2) I_1    (10)
The overlapping formula expressed with pre-multiplied color
not only makes the expression briefer but also gives the
composite color and the composite α-channel value the same
form. When a third layer I_3 overlaps layers I_2 and I_1,
the overlapping formula can be deduced from the two-layer
formula. Set I_3 as the pre-multiplied color of the third
layer and α_3 as its α-channel value; the pre-multiplied
color of the three-layer overlap is then:

I = I_3 + (1 − α_3)[I_2 + (1 − α_2) I_1]    (11)

The composite α-channel value is:

α = α_3 + (1 − α_3)[α_2 + (1 − α_2)α_1]    (12)
Similarly, the composite pre-multiplied color and composite
α-channel value of n layers of overlapping are:

I = I_n + (1 − α_n){I_{n−1} + (1 − α_{n−1})[I_{n−2} + (1 − α_{n−2})(⋯(I_2 + (1 − α_2) I_1))]}
  = I_n + (1 − α_n) I_{n−1} + ⋯ + (1 − α_n)(1 − α_{n−1})⋯(1 − α_{k+1}) I_k + ⋯ + (1 − α_n)⋯(1 − α_2) I_1    (13)

α = α_n + (1 − α_n){α_{n−1} + (1 − α_{n−1})[α_{n−2} + (1 − α_{n−2})(⋯(α_2 + (1 − α_2)α_1))]}
  = α_n + (1 − α_n) α_{n−1} + ⋯ + (1 − α_n)(1 − α_{n−1})⋯(1 − α_{k+1}) α_k + ⋯ + (1 − α_n)⋯(1 − α_2) α_1    (14)
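The nested forms of formulas (13)/(14) amount to a single bottom-to-top fold over the layers. A minimal Python sketch (the function name and the bottom-first layer representation are the authors' choice):

```python
def composite_n_layers(layers):
    """Fold n layers, given bottom-first as pairs (α_k, pre-multiplied
    color I_k), using I = I_k + (1 − α_k)·I and α = α_k + (1 − α_k)·α,
    per formulas (13)/(14)."""
    alpha, pre = 0.0, 0.0            # empty stack: fully transparent
    for a_k, i_k in layers:          # layer 1 first, layer n last (on top)
        pre = i_k + (1.0 - a_k) * pre
        alpha = a_k + (1.0 - a_k) * alpha
    return alpha, pre
```

With two layers the fold reduces to formulas (9)/(10); with three, to formulas (11)/(12).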
The system consists of a Nios II soft-core CPU, an image
preprocessing module, an image processing module, an image
display module and universal peripheral interfaces. Fig.3
shows the general structure of the system.

Fig.3. General Structure of System
The system integrates the soft-core CPU developed by
Altera, an SDRAM controller, general-purpose input/output
interfaces, and the image preprocessing, processing and
display modules defined by the authors. The modules are
connected by the Avalon bus and work under the coordination
of the Nios II CPU.
Apart from the video decoder chip, the video DAC chip, the
memory and the keyboard, all other components of the system
are integrated in the FPGA chip. The working process of the
system is as follows. The analog video signal from the
camera is decoded by the video decoder chip, which is
configured over the I2C bus, and transmitted to the image
preprocessing module. The image preprocessing module
formats and de-interlaces the video data, and then stores
them in SDRAM. Several DMA channels are defined in the
image processing module. The images of the channels are
overlapped in the image overlapping module. The display
controller generates the scanning timing signals and
converts the digital image data into analog signals with
the DAC chip. All functional modules of the system work in
parallel under a unified clock and are controlled by the
Nios II CPU. In addition, users can invoke the
corresponding image processing functions through keyboard
commands.
A. Image Preprocessing Module Design
The image preprocessing module contains an ITU-R656
decoder, a FIFO and an input DMA.
The video signals decoded by the video decoder chip are
interlaced and are transmitted as separate odd and even
fields. The data format of each line of the digital video
signal is shown in fig.4. “FF 00 00 SAV” is the timing
reference code that marks the beginning of the effective
video data. “Cb0 Y0 Cr0 Y1 Cb2 Y2 Cr2 Y3 … Cr718 Y719” is
the effective video data, which accords with the ITU-R656
standard. “FF 00 00 EAV” is the timing reference code that
marks the end of the effective video data.
Fig.4. Data Format of Digital Video Signals
The ITU-R656 decoder detects the timing reference code
“FF 00 00 SAV”, generates an effective-line signal, and
starts to decode the following video data, converting the
8-bit ITU-R656 data into 16-bit YCrCb data. It also
generates the input DMA control signals.
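The detection step can be sketched in software as follows. This is a simplified illustration by the authors: in an actual ITU-R BT.656 stream the fourth byte of the timing reference code carries F/V/H status bits, where the H bit distinguishes SAV (H = 0, e.g. 0x80 in the active region of field 1) from EAV (H = 1, e.g. 0x9D); the hardware decoder performs the same match on the “FF 00 00” preamble.

```python
def extract_active_lines(stream):
    """Simplified sketch: collect the bytes between each SAV and EAV
    timing reference code ('FF 00 00 XY') in a BT.656-style stream.
    Only bit 4 (H) of the fourth byte XY is examined here: H = 0
    marks SAV, H = 1 marks EAV; the F/V flags are ignored."""
    lines, i, start = [], 0, None
    while i + 3 < len(stream):
        if stream[i] == 0xFF and stream[i + 1] == 0x00 and stream[i + 2] == 0x00:
            h_bit = (stream[i + 3] >> 4) & 1
            if h_bit == 0:            # SAV: active video begins after the code
                start = i + 4
            elif start is not None:   # EAV: close the current active line
                lines.append(bytes(stream[start:i]))
                start = None
            i += 4
        else:
            i += 1
    return lines
```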
The input DMA defines a DMA master port. On the one hand,
it provides the display-memory write address, the data and
the write request signal to the Avalon bus, starts the bus
transfer and stores the YCrCb data in the SDRAM display
memory; on the other hand, it de-interlaces the data
source, that is, the interlaced signal is converted into a
progressive (line-by-line) signal.
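The de-interlacing step can be illustrated with a simple weave. The paper does not specify the algorithm; this sketch assumes the odd field carries the first line of the frame:

```python
def weave(odd_field, even_field):
    """Weave de-interlacing sketch: interleave the lines of the odd
    and even fields back into one progressive frame (odd field on
    top, by assumption)."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)
        frame.append(even_line)
    return frame
```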
B. Image Processing Module Design
The image processing module contains two DMA master ports,
a data buffer and an image overlapping sub-module, as shown
in fig.5.
Fig.5. Image Processing Module
1. Multi-channel DMA
The multi-channel DMA is a master-port device on the Avalon
bus. It is responsible for providing the effective address,
the data and the read request signal to the Avalon bus, and
it starts a bus transfer on the rising clock edge to read
data from memory; meanwhile, the DMA module produces the
write address of the data buffer and transfers the data
from memory into the buffer.
When the multi-channel DMA reads data from different
memories, the DMA master ports can work in parallel without
interference. However, when they read data from the same
memory, the DMA ports conflict on the address bus and the
data bus. To resolve this problem, the module uses a
pipelined, time-multiplexed design: each DMA master port
occupies the address bus and data bus at a different time.
While the first DMA master port uses the bus, the other
master ports wait. After a fixed delay, the first DMA
master port releases the bus, the next port takes it over,
and the cycle repeats. The delay length is inversely
proportional to the difference between the memory read
clock rate and the display frequency.
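The time-multiplexed bus sharing described above can be modeled with a small round-robin simulation (the port count, slot length and word counts below are hypothetical illustration values, not figures from the paper):

```python
def simulate_shared_bus(n_ports, words_per_port, words_per_slot):
    """Round-robin bus model: in each slot exactly one DMA master
    port drives the shared address/data bus and reads words_per_slot
    words; the other ports wait. Returns the number of slots elapsed
    until every port has read words_per_port words."""
    read = [0] * n_ports
    slots = 0
    while min(read) < words_per_port:
        port = slots % n_ports                      # current bus owner
        read[port] = min(read[port] + words_per_slot, words_per_port)
        slots += 1
    return slots
```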
2. Overlapping sub-module
The overlapping sub-module realizes the overlapping of
multi-channel image information. The transparency (Alpha)
value of each channel's pixels and the starting and ending
addresses of the overlapping region are defined in this
sub-module. Take the overlapping of three channels of
images as an example: according to formula (11), the image
overlapping algorithm can easily be realized in the FPGA
chip because it consists of multiply-add operations.
C. Display Controller
This module generates the scanning signals that drive the
display, as well as the read address and read request
signals for the data cache. At the same time it conveys the
current processing address to the image processing module
and controls the related modules.
The system software runs on the Nios II CPU. Since all the
functional modules are implemented in hardware, the CPU
only configures the registers of each module through their
register slave ports. As a result, the burden on the CPU is
small, which effectively improves the speed of the system.
The software flow is shown in fig.6.
Software and hardware verification was carried out on
Altera's DE2 board. The results are shown in fig.7.
Fig.7(a) shows the image after two channels of images are
overlapped. Fig.7(b) shows the image after a video image is
overlapped with characters.

Fig.6. Software Flow
Compared with a traditional software-based image processing
system, the main advantages of the image processing module
designed in this paper are as follows:
1) Each DMA channel works in parallel, processing
multi-channel image information simultaneously and speeding
up image data processing.
2) The use of the hardware multipliers embedded in the FPGA
achieves high-speed, real-time computation.
3) Using the embedded RAM resources as a cache to store
part of (or a few lines of) the image data increases the
data processing throughput.
4) In this module, the CPU is only responsible for changing
the configuration parameters dynamically rather than
participating in the concrete processing operations, which
raises the system's speed.

Fig.7 (a).Image After Two Channels Images Overlap

Fig.7 (b). Image After Video Image Overlaps Characters
Fig. 7. Verification Images
This high-speed image processing system based on FPGA can
acquire and process images in parallel. Compared with a
general PC-based image processing system, it has the
characteristics of high integration, high-speed image
processing and real-time processing. The system implements
the various image processing functional modules on a
programmable chip, so with reconfiguration technology its
functions can be changed without modifying the hardware
structure. The existing system can be updated online or
through the network, which makes it applicable in many
areas.