
An FPGA-Based Fully Synchronized Design of a

Bilateral Filter for Real-Time Image Denoising

Anna Gabiger-Rose, Student Member, IEEE, Matthias Kube, Robert Weigel, Fellow, IEEE, and

Richard Rose, Student Member, IEEE

Abstract: In this paper, a detailed description of a synchronous field-programmable gate array implementation of a bilateral filter for image processing is given. The bilateral filter is chosen for one unique reason: It reduces noise while preserving details. The design is described at the register-transfer level. The distinctive feature of our design concept consists of changing the clock domain in a manner that kernel-based processing is possible, which means the processing of the entire filter window in one pixel clock cycle. This feature of the kernel-based design is supported by the arrangement of the input data into groups so that the internal clock of the design is a multiple of the pixel clock given by a targeted system. Additionally, by the exploitation of the separability and the symmetry of one filter component, the complexity of the design is considerably reduced. Combining these features, the bilateral filter is implemented as a highly parallelized pipeline structure with very economical and effective utilization of dedicated resources. Due to the modularity of the filter design, kernels of different sizes can be implemented with low effort using our design and the given instructions for scaling. As the original form of the bilateral filter with no approximations or modifications is implemented, the resulting image quality depends on the chosen filter parameters only. Due to the quantization of the filter coefficients, only negligible quality loss is introduced.

Index Terms: Bilateral filter, field-programmable gate array (FPGA), image processing, noise reduction, real-time processing.

I. INTRODUCTION

Bilateral filtering has gained great popularity in image processing due to its capability of reducing noise while preserving the structural information of an image. The bilateral filter [1] consists of two components. The detail-preserving property of the filter is mainly caused by the nonlinear filter component, also called the photometric filter. It selects the pixels of similar intensity, which are afterward averaged by the linear component. Very often, the linear component is formulated as a low-pass filter. The amount of noise reduction via selective averaging and the amount of blurring via low-pass filtering are adjusted by two parameters. The understanding of these parameters is very intuitive, which makes the bilateral filter an almost all-purpose solution in image processing. The authors of [2] and [3] show that noise filtering, despite the prevailing view, does not always imply resolution reduction but can even be used to sharpen the edges [2] or to enhance flowlike structures [3]. In [4], the motion-adaptive bilateral filter is used for quality improvement in low bit rate video coding. Also, in [5], the bilateral filter is applied for noise reduction in a method for local tone mapping, which maps a high dynamic range image to a low dynamic range image.

Manuscript received March 5, 2012; revised August 6, 2012 and October 24, 2012; accepted December 6, 2012. Date of publication October 25, 2013; date of current version February 7, 2014.

A. Gabiger-Rose, R. Weigel, and R. Rose are with the Institute for Electronics Engineering, Friedrich-Alexander University of Erlangen-Nuremberg, 91058 Erlangen, Germany (e-mail: anna.gabiger-rose@fau.de; robert.weigel@fau.de; richard.rose@fau.de).

M. Kube is with the Department of Contactless Test and Measuring Systems, Fraunhofer Institute for Integrated Circuits, 91058 Erlangen, Germany (e-mail: matthias.kube@iis.fraunhofer.de).

Digital Object Identifier 10.1109/TIE.2013.2284133

Recently, bilateral filtering has attracted considerable attention in medical image processing and nondestructive testing. The authors of [6] studied the impact of noise reduction by the bilateral filter applied to the reconstructed images. They concluded that the images processed with this filter show a significant improvement in image quality compared to their unfiltered counterparts. In [7], the authors discuss the results of noise reduction by the bilateral filter in projection space. This means that the noise filtering takes place prior to computing the reconstructed volume. It has been concluded that noise reduction of this kind can be translated into a dose reduction in X-ray computed tomography. Considering industrial applications, the dose reduction permits the reduction of the scanning time and thus allows a higher throughput of test items.

Our own experiments and studies shown in [8] and [9] confirm the possible dose reduction. As the reduction of the exposure time due to filtering is feasible, we are interested in a real-time filtering of projections. Moreover, the filter is not supposed to reduce the spatial resolution of projections, so that the visibility of defects in a reconstruction is maintained. Since we achieve very satisfying results considering detail preservation with our field-programmable gate array (FPGA) implementation presented in [10], we intend to give a deeper insight into our work.

The major contribution of this paper is the detailed description of a novel FPGA design architecture of the bilateral filter at the register-transfer level (RTL). This abstraction level is chosen for the possibility of direct specification of the clocking scheme [11]. The main advantages of this design are the capability of real-time processing and economical and effective utilization of resources through the following.

1) Sorting the data into equal groups to which separate pipelines are assigned.

2) Raising the internal clock frequency according to the data flow.

3) Avoiding the need for an external image buffer.

0278-0046 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

4094 IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 61, NO. 8, AUGUST 2014

Moreover, due to the modularity of the design, it can be

extended to implement arbitrary kernel size with low effort. The

instructions required for this can be found later in this paper.

The remainder of this paper is organized as follows. In Section II, we consider the related work. After a short description of the bilateral filter in Section III, we give a detailed description of our FPGA design in Section IV. Section IV is the main part of this paper, presenting the filter design stage by stage. In Section V, the criteria applied to the evaluation of the image quality before and after the noise filtering are detailed. After that, in Section VI, the results are discussed, and the performance potential of our filter design is analyzed.

II. RELATED WORK

Since the bilateral filter is in widespread use, a lot of effort has been put into its acceleration for use in practical applications. Among the publications concerning the speeding up of bilateral filtering, two main trends can be identified. One is focused on the modification of the filtering components, resulting in an efficient algorithm. The other is to accelerate the filtering through parallelizing the algorithm or through hardware acceleration, including modifications of the filter at the same time.

In [12], a fast approximation of the original bilateral filter is proposed. Here, the 2-D filtering is separated into two 1-D operations, performing 1-D bilateral filtering in one arbitrary dimension and filtering the intermediate result in the same manner in the subsequent dimension. The authors report that the dependence of the execution time on the number of filter dimensions decreases from exponential to linear. This approach requires a little memory overhead but results in a filter which is fast enough to be used for preprocessing in video compression systems. However, as the photometric component of the bilateral filter is not separable, the image resulting from the modified filter is documented to be slightly different from the image produced by the original filter.

Another acceleration approach proposed in [13] has provided a basis for numerous subsequent works. This approach provides a numerical scheme for speeding up the filtering via a piecewise-linear approximation of the bilateral filter in the intensity domain and substituting the low-pass filtering by downsampling. In [14], this technique is extended by transposing the computation to a 3-D space presenting the image intensity as a third dimension over the 2-D image coordinate space. After that, the authors of [15] formulated the concept of the bilateral grid and implemented the bilateral filter using the proposed data structure on three different graphics processing units (GPUs). Only then, by means of their hardware acceleration, did processing at 30 fps become possible, which they designate as real-time performance. Later, the technique proposed in [13] was also implemented on a GPU by the authors of [16] and is also capable of real-time processing with the same frame rate. More recently, the lazy sliding window implementation of the approach in [13] was proposed in [17]. This method is suitable for single-instruction-multiple-data-type processors like DSPs. In this case, the speedup also allows applications requiring real-time performance. The main drawback of the filter acceleration approaches discussed so far is the high amount of memory required for the implementation.

Instead of a piecewise-linear approximation and subsampling, the idea of utilizing a histogram-based approach for accelerating the filter is presented in [18] and [19]. The main difference between these two works is that, in [18], a hierarchy of partial distributed histograms on multiple tiers is computed and adjusted for each output pixel, while the author of [19] calculates the integral histogram of the image and extracts the histogram for each target filter window to obtain one output pixel. These methods are both fast, but a real-time performance of the histogram-based approach in [19] can only be achieved by the very-large-scale-integration design of the filter shown in [20]. The memory demand of the histogram-based acceleration method is also high but is lower than that of the piecewise-linear approximation and subsampling approach.

The aforementioned examples show that a filter modification technique reaches real-time performance only if its implementation utilizes hardware acceleration. Most of the referred works rely on GPUs for acceleration. However, in fields of application in which high power efficiency is crucial, an FPGA solution is preferable. In [21], an algorithm for the denoising of medical images is implemented on an FPGA and four different GPUs. The authors show that the power consumption of their FPGA implementation is always significantly lower. Furthermore, the authors of [21] point out that an FPGA implementation allows latency to be counted in image lines, resulting in delays lower than one frame, while the latency on a GPU is always one frame. This is relevant for many medical applications which demand fast image output to support interactive operations.

The authors of [22] also choose an FPGA implementation

for their image processing system because moving time-critical

functionalities, like the edge detection in an image, to hardware

platforms makes it possible to keep delays in the control loop

to a minimum. The authors of [23] and [24] report excellent

experience of using FPGAs for motion control of robots based

on real-time image processing. The main reason for using

FPGAs for real-time robotics tasks is the ability of FPGAs to

satisfy the requirement for high computational power and data

throughput [24]. Moreover, FPGA solutions offer additional

advantages, such as reconfigurability and portability.

However, considering complexity and timing constraints of

the algorithm to be implemented, the suitability of the chosen

hardware platform has to be checked [25]. A DSP implemen-

tation has been regarded to be more appropriate for complex

algorithms with high data dependence. For algorithms with

low data dependence and high timing constraints, an FPGA

solution is more suitable. The authors of [25] discuss in detail

the advantages of using FPGAs even if the algorithm shows

both high complexity and timing constraints. At the same time,

the authors of [26] emphasize in their conclusion that FPGA-

based digital processing systems achieve better performance, at

a lower cost, than traditional solutions based on DSPs.

Furthermore, the parallel architecture of the FPGA provides an excellent platform for the implementation of parallel and pipelined structures. Many authors reach this conclusion.

Therefore, implementing an algorithm for color image segmen-

tation for object detection in full parallelism on an FPGA, the

GABIGER-ROSE et al.: FPGA-BASED FULLY SYNCHRONIZED DESIGN OF BILATERAL FILTER 4095

authors of [27] report a drastic improvement of the speed of

segmentation compared with the sequential-code-based seg-

mentation. In [28], a design of a fully pipelined data path for

real-time face detection using FPGA is described which sup-

ports high-speed detection irrespective of the number of faces

in an image. The authors of [29] implement their paralleled and

fully pipelined hardware for real-time electromagnetic transient

simulation on an FPGA and thereby solve a challenging prob-

lem of implementation of the complex simulation models.

There are several publications dealing with FPGA implementations of the bilateral filter. In [30], one of these designs is presented. The VHSIC hardware description language (VHDL) code of this design is generated automatically from the models for FPGA synthesis using System Generator from Xilinx. Although the optimization setting for the code generation was for maximum clock frequency, the authors admit that the speed of their implementation for a 15 × 15 pixel filter kernel is insufficient for a real-time application. The authors of [31] compared a VHDL and a high-level synthesis (HLS) description, created by System Generator, of an adaptive impulse noise filter and concluded that a higher system clock speed can be achieved using the VHDL description. Thus, these publications show by example that the handcrafted optimization of an FPGA design regarding both the operating frequency and the resource utilization is still irreplaceable.

A different approach for the FPGA implementation of a real-time bilateral filter has been proposed in [32]. The modified filter is based on the calculation of the filter coefficients from the photometric filter only. The spatial filtering is eliminated due to the processing of the minimal window of 3 × 3 and raising of the derived photometric coefficients to the power of 8. According to the authors, for a moderate noise level, their modified bilateral filter can achieve slightly better results compared to the traditional bilateral filter shown in [1]. However, the original bilateral filter can be tuned by two parameters which strongly influence the filtering performance. Unfortunately, no description of the parameters used for this comparison is given in [32].

The work published in [33] is most closely related to our work. The major parallel to our design consists in implementing the bilateral filter on an FPGA without any modification. This approach is sometimes called the brute-force method. However, the main difference to our work is that the authors developed their design using an HLS tool. The resulting architecture presents a 3 × 3 filter kernel. In contrast, our design is based on an RTL description and presents a 5 × 5 filter kernel. Our design allows high clock frequency and high data throughput and shows only a slight increase of resource demand considering the larger kernel. It follows that our architecture utilizes hardware resources more efficiently and more economically.

III. BILATERAL FILTER

The bilateral filter [1] embodies the idea of a combination of domain and range filtering. The domain filter averages the nearby pixel values and thereby acts as a low-pass filter. The range filter stands for the nonlinear component and plays an important part in edge preserving. This component allows averaging of similar pixel values only, regardless of their position in the filter window. If the value of a pixel in the filter window diverges from the value of the pixel being filtered by a certain amount, the pixel is effectively skipped because its weight becomes negligible.

Taking Gaussian noise into account, the shift-variant filtering operation of the bilateral filter is given by

$$\tilde{\Phi}(\tilde{\mathbf{m}}_0) = \frac{1}{k(\mathbf{m}_0)} \sum_{\mathbf{m} \in F} \Phi(\mathbf{m}) \cdot s\big(\Phi(\mathbf{m}_0), \Phi(\mathbf{m})\big) \cdot c(\mathbf{m}_0, \mathbf{m}). \tag{1}$$

The term $\mathbf{m} = (m, n)$ denotes the pixel coordinates in the image to be filtered, and $\mathbf{m}_0 = (m_0, n_0)$ and $\tilde{\mathbf{m}}_0 = (\tilde{m}_0, \tilde{n}_0)$ represent the coordinates of the centered pixel in the noisy and in the filtered images, respectively. With these notations, $\tilde{\Phi}(\tilde{\mathbf{m}}_0)$ means the gray value of the filtered pixel, and $\Phi(\mathbf{m})$ identifies the gray value of the spatially neighboring pixels to $\Phi(\mathbf{m}_0)$ in the filter window $F$.

The following expressions (2) and (3) describe the photometric and the geometric components $s(\Phi(\mathbf{m}_0), \Phi(\mathbf{m}))$ and $c(\mathbf{m}_0, \mathbf{m})$, respectively:

$$s\big(\Phi(\mathbf{m}_0), \Phi(\mathbf{m})\big) = \exp\left(-\frac{1}{2}\left(\frac{\lvert \Phi(\mathbf{m}_0) - \Phi(\mathbf{m}) \rvert}{\sigma_{ph}}\right)^{2}\right) \tag{2}$$

$$c(\mathbf{m}_0, \mathbf{m}) = \exp\left(-\frac{1}{2}\left(\frac{\lVert \mathbf{m}_0 - \mathbf{m} \rVert}{\sigma_{c}}\right)^{2}\right) \tag{3}$$

where the parameters $\sigma_{ph}$ and $\sigma_{c}$ regulate the width of the Gaussian curve assigned to $s(\Phi(\mathbf{m}_0), \Phi(\mathbf{m}))$ and $c(\mathbf{m}_0, \mathbf{m})$, respectively. The photometric component compares the gray value of the centered pixel with the gray values of the spatial neighborhood and computes the corresponding weight coefficients depending on the factor $\sigma_{ph}$. The more the absolute difference of the gray values exceeds $\sigma_{ph}$, the lower is the corresponding filter coefficient and vice versa. The domain filter $c(\mathbf{m}_0, \mathbf{m})$ acts as a standard low-pass filter, the weights of which decrease with the spatial distance of the centered pixel to the pixels in the neighborhood.

Normalization with

$$k(\mathbf{m}_0) = \sum_{\mathbf{m} \in F} s\big(\Phi(\mathbf{m}_0), \Phi(\mathbf{m})\big) \cdot c(\mathbf{m}_0, \mathbf{m}) \tag{4}$$

guarantees that the range of the filtered images does not change significantly due to the filtering. Owing to the fact that the coefficients of the photometric component cannot be computed in advance, the division by the normalization factor cannot be avoided by means of prescaling of the filter coefficients.
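As a software reference for (1) to (4), the following Python sketch evaluates the filter by brute force in floating point. It is only an illustration under stated assumptions, not the hardware design described in this paper: the function name `bilateral_filter`, the list-of-rows image format, and the choice to copy border pixels unchanged are all hypothetical.

```python
import math

def bilateral_filter(image, sigma_ph, sigma_c, radius=2):
    """Brute-force bilateral filter following (1)-(4).

    image is a list of rows of gray values; radius=2 gives a 5 x 5 window.
    """
    height, width = len(image), len(image[0])
    # Assumption of this sketch: border pixels are copied unchanged.
    out = [row[:] for row in image]
    for m0 in range(radius, height - radius):
        for n0 in range(radius, width - radius):
            center = image[m0][n0]
            acc = 0.0   # weighted sum in (1)
            norm = 0.0  # normalization factor k(m0) from (4)
            for dm in range(-radius, radius + 1):
                for dn in range(-radius, radius + 1):
                    value = image[m0 + dm][n0 + dn]
                    # Photometric weight (2): depends on the gray value difference.
                    s = math.exp(-0.5 * ((center - value) / sigma_ph) ** 2)
                    # Geometric weight (3): depends on the spatial distance.
                    c = math.exp(-0.5 * (dm * dm + dn * dn) / sigma_c ** 2)
                    acc += value * s * c
                    norm += s * c
            # norm >= 1 because the centered pixel always contributes s = c = 1.
            out[m0][n0] = acc / norm
    return out
```

Because the photometric weight depends on the gray values inside each window, `norm` has to be recomputed for every output pixel, which is why the division cannot be removed by prescaling.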

IV. DESIGN CONCEPT

The image data, as well as all constants and coefficients used in the following design concept, are integer numbers. As discussed in Section VI, there is no need to implement floating-point computation. With the aid of the presented design concept, the bilateral filter can be realized as a highly parallelized pipeline structure giving great importance to the effective resource utilization. In this paper, the data paths are detailed. The description of the control signals is not addressed here.


Fig. 1. Order of the functional units of the bilateral filter.

Fig. 2. Principle of the input data retrieval for the image filtering.

For the design description, a window size of 5 × 5 is chosen. This window size is a tradeoff between high noise reduction and low blurring effect.

The design concept for the implementation of the bilateral filter is subdivided into three functional blocks. The block-based design approach reduces design complexity and simplifies validation [34]. Fig. 1 presents these units and their order in the concept. The input data marked by Data_in are read line by line and arranged for further processing in the register matrix. The second unit is the photometric filter, which weights the input data according to the intensity of the processed pixels. The filtering is completed by the geometric filter, and the filtered data are marked by Data_out.

A. Register Matrix

The photometric filter component, also often referred to as a range filter in the related literature, is a nonlinear filter. This means that the filter coefficients change for every filter position. Thus, the pixel weights for the photometric component have to be calculated separately for every pixel in the filter window. The number of weights depends on the filter window size. Here, 24 weights have to be computed for the filtering of one image pixel.

The filter window is shifted first along the input lines representing the image rows, moving one row down every time the preceding row has been filtered. Consequently, the demand arising from this filtering technique is that at least five lines have to be stored for the period of time during which a line is filtered. As an external image buffer is undesired because of the additional expenses of resources due to the memory controller and because of the additional latency due to the memory accesses, the five input lines are stored in the line storages, which are implemented as block RAMs for data with N bits. The five input lines are called image rows or rows in the following. These five rows include the row to be filtered, two preceding rows, and two succeeding rows.

This arrangement is depicted in Fig. 2. The pixel being filtered is marked by mid_pix. This pixel and its neighborhood in the solid box represent the kernel of the bilateral filter. After the middle row has been filtered, the outer preceding row, line storage n-2, moves out of the register matrix. As the input data are read into the register matrix pixel by pixel, the content of the line storages and of the filter kernel is shifted by one pixel at each clock event. This shift emulates the shift of the filter kernel. In this way, at the end of an image line, all remaining rows are shifted one row down. The former succeeding row, line storage n + 1, can now be processed. The output lines form the output image which is stored externally.

Fig. 3. Register matrix of the kernel-based design concept.
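The interplay of line storages and register matrix can be mimicked in software. The Python sketch below is a behavioral model under simplifying assumptions, not the RTL design: it keeps four stored lines plus the incoming pixel instead of five block RAMs, ignores image borders, and uses the hypothetical name `stream_windows`.

```python
def stream_windows(image, k=5):
    """Yield (row, col, window) for each complete k x k window of a pixel stream.

    image is a list of rows; window is returned as a list of k rows.
    """
    height, width = len(image), len(image[0])
    # k-1 line storages for earlier rows; the newest row arrives pixel by pixel.
    lines = [[0] * width for _ in range(k - 1)]
    # Register matrix holding the current k x k neighborhood.
    window = [[0] * k for _ in range(k)]
    for r in range(height):
        for c in range(width):
            pixel = image[r][c]
            # Column entering the register matrix: stored pixels plus the new one.
            column = [lines[i][c] for i in range(k - 1)] + [pixel]
            # Shift every register row by one pixel and append the new column.
            for i in range(k):
                window[i] = window[i][1:] + [column[i]]
            # Move the stored pixels of this column one line storage up.
            for i in range(k - 2):
                lines[i][c] = lines[i + 1][c]
            lines[k - 2][c] = pixel
            # A window is complete once k rows and k columns have been read.
            if r >= k - 1 and c >= k - 1:
                yield r - k // 2, c - k // 2, [row[:] for row in window]
```

Each yielded window is centered on a pixel that entered the stream two rows and two columns earlier, which reflects why the output lags the input by a few lines rather than by a whole frame.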

The parallel calculation of 24 weights in the photometric filter component and the subsequent weighting in the geometric component combined with the final normalization at the filter output require a large amount of resources considering the short time of just one pixel cycle. Due to the flexibility of the clock management in FPGAs, this challenge can be met. The solution is offered by our kernel-based design concept in Fig. 3. The single registers are interconnected in a manner that, aside from the shift of the filter window by one pixel, the entire kernel is provided to the next filter stage simultaneously. This is an important advantage of the presented kernel-based design concept as no extra data buffer is required. On the other hand, it is necessary to process all 25 pixels in one pixel cycle in order to keep up with the reading of the input lines into the register matrix.

The output of the register matrix is sorted into groups, in this case into six groups, and fed synchronously into the photometric filter component at four times the pixel clock frequency. The number of the groups is explained by the symmetry of the geometric filter component which is discussed later in Section IV-C. The sorting is done by means of multiplexing the pixels in the manner shown in Fig. 3. The quadruplication of the filter processing clock is implemented by setting the select signal of the multiplexers four times in one pixel clock. Here, the clock domain changes to the fourfold of the input pixel clock. The counter on the top of Fig. 3 generates the select signal and thus controls the readout of the register matrix. This counter is clocked with the quadruple pixel clock as well. The counter is first enabled after the whole register matrix is filled. The pixels in each group are processed in parallel while each group is pipelined through to the register matrix output stage. The pixel in the center of the filter window is not a part of any group and is forwarded to a latch belonging to the input stage of the photometric filter component. The sorting of the pixels into groups and the quadruplication of the pixel clock are the key to the presented synchronous FPGA design concept using a parallelized pipeline architecture.

Fig. 4. Abstract illustration of the photometric filter component.
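The arithmetic behind the fourfold clock is that 24 neighbors spread over six pipelines need 24 / 6 = 4 internal cycles per pixel cycle. The short Python sketch below models the multiplexer select counter. The function name `mux_schedule` and the pixel labels are hypothetical placeholders; the actual assignment of pixels to groups is given by Fig. 3 and is not reproduced here.

```python
def mux_schedule(groups):
    """Return, per internal clock tick, the pixels presented to the parallel pipelines."""
    ticks = len(groups[0])
    assert all(len(group) == ticks for group in groups), "groups must be equally large"
    # The select counter runs 0..ticks-1 within one pixel clock period.
    return [[group[select] for group in groups] for select in range(ticks)]

# Hypothetical labels: pixel j of group g is called "g{g}p{j}".
groups = [[f"g{g}p{j}" for j in range(4)] for g in range(6)]
schedule = mux_schedule(groups)
```

Here `schedule[t]` lists the six pixels that enter the six pipelines at internal tick t, so all 24 neighbors are consumed after four ticks.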

B. Photometric Component

After the register matrix has been filled, the grouped image data are provided to the photometric filter component which is pictured in Fig. 4. At the output of the photometric filter, the weighted pixels appear, still sorted into groups, accompanied by the weighted mid_pix. Additionally, the photometric coefficients have to be forwarded for the required normalization at the last stage of the filtering according to (4). Thus, in parallel to the pixels, the photometric coefficients also have to be processed by the geometric filter in order to obtain the normalization factor defined in (4). For this reason, the output of the photometric filter consists of the following:

1) weighted pixels sorted into groups 0 . . . 5;

2) the weighted pixel being filtered, marked by mid_pix;

3) photometric coefficients corresponding to groups 0 . . . 5.

In further stages of the design, the weighted pixel values, i.e., the outputs of the multipliers, are named by their groups 0 . . . 5.

A detailed functional flow block diagram of the photometric filter is shown in Fig. 5. The pixel in the center of the filter window has to be available during the calculation of the required 24 pixel weights. Latching the centered pixel allows the computation of the gray value differences between the centered pixel and the remaining pixels inside of the filter window. Each group contains four pixels. A separate pipeline belonging to each group makes it possible to process the entire neighborhood of mid_pix within one pixel clock period. All six pipelines are designed identically.

Fig. 5. Photometric filter component.

Fig. 6. Processing order of input data in the photometric filter component.

The arrangement and the processing order of the input data of the photometric component are shown in Fig. 6. At the first internal clock event $t_0$, the first pixels of each group are provided to the respective pipeline. At the second internal clock event $t_1$, the second pixels of each group enter the component. This organization of groups allows the processing of the whole filter window in four internal clock cycles corresponding to one pixel cycle. In the upper part of Fig. 5, the processing path for group 0 is shown; in the lower part, there is the processing path for group 5.


Fig. 7. Limitation of the number of coefficients.

The combinatorial blocks comb.0 . . . 5 compute the absolute gray value difference required by (2). In order to keep the design synchronous, the gray values of each pipeline are registered during the difference calculation. The upper path in Fig. 5 shows the required registers labeled group 0, which make sure that the gray value appears at the input of the multiplier at the same time as the corresponding photometric coefficient. Throughout the following, we use registers to keep our design synchronous. This makes any delay control inside of our architecture redundant.

To avoid the calculation of the expensive exponential, all possible values of the function (2) are precalculated and stored in the lookup table (LUT). The absolute difference of the gray values itself is directly interpreted as the address of the corresponding weight coefficient in the LUT.

Due to the quantization, the number of the weight coefficients is limited. This limit depends on three parameters:

1) the word length N of the input data;

2) the parameter $\sigma_{ph}$;

3) the word length W of the coefficients.

The first point means that increasing the color depth of an image causes a larger amount of intensity differences that have to be stored in the LUT. Depending on the parameter $\sigma_{ph}$, the slope of the Gaussian curve is steeper or flatter, which influences the number of coefficients different from zero after the quantization. The word length W itself determines which coefficients actually are different from zero after the quantization.

In Fig. 7, the coefficients are plotted for N = 8 b, W = 8 b, and $\sigma_{ph} = 60$. As the negative exponential converges toward zero for increasing gray value differences, there are only a limited number of quantized coefficients that are different from zero. Considering the example in Fig. 7, there are only 188 coefficients to be stored. For simplification of the internal control, the number of coefficients is extended to the next power of 2, resulting in the highest address $2^P - 1$. In the example, the highest address is 255. The coefficients are stored in the LUT of each pipeline in the initialization phase of the filtering.
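The LUT initialization can be sketched in Python as follows. This is an illustration under assumptions that the text does not fix: the coefficients are scaled by $2^W - 1$ (the highest coefficient mentioned for the centered pixel) and quantized by truncation, and the name `build_photometric_lut` is hypothetical. Since the exact scaling and rounding rule are assumed, the number of nonzero coefficients produced by this sketch need not match the 188 reported for Fig. 7.

```python
import math

def build_photometric_lut(sigma_ph, n_bits=8, w_bits=8):
    """Precompute quantized values of (2) for every possible gray value difference.

    Returns (lut, nonzero, p_bits); lut has 2**p_bits entries.
    """
    max_coeff = (1 << w_bits) - 1  # highest coefficient, used for difference 0
    full = [
        int(max_coeff * math.exp(-0.5 * (d / sigma_ph) ** 2))  # truncation is an assumption
        for d in range(1 << n_bits)
    ]
    nonzero = sum(1 for coeff in full if coeff > 0)
    # Choose P so that address 2**P - 1 still holds a zero, unless every
    # possible difference already has a nonzero coefficient (then P = N).
    p_bits = min(n_bits, nonzero.bit_length())
    return full[: 1 << p_bits], nonzero, p_bits
```

Keeping a zero at the highest address is a design choice of this sketch; it lets that address double as the target for all differences beyond the table.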

Fig. 8. Abstract illustration of the geometric filter component.

If N is greater than P, a logical disjunction of the upper (N − P) bits is used to check whether the gray value difference is greater than the chosen limit $2^P - 1$. The result of the disjunction selects the coefficient address. If the gray value difference is greater than the limit, the weight coefficient is set to zero, which is stored at the address $2^P - 1$. In the opposite case, the corresponding coefficient is read out of the LUT. This coefficient may also be zero as the number of coefficients is extended up to the address $2^P - 1$. During the readout of the coefficient, the related gray value is registered for synchronicity. At the next internal clock event, the gray values of each group are multiplied by the corresponding coefficients while registering the coefficients in coeff. group 0 . . . 5 for the final normalization.
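The address selection just described amounts to a clamp that needs only an OR over the upper bits. A minimal Python sketch, with the hypothetical function name `lut_address`, could look like this:

```python
def lut_address(diff, n_bits, p_bits):
    """Map an N-bit absolute gray value difference to a P-bit LUT address."""
    assert 0 <= diff < (1 << n_bits) and p_bits <= n_bits
    upper = diff >> p_bits  # the upper N-P bits of the difference
    if upper != 0:
        # The logical OR of the upper bits is 1: the difference exceeds the
        # limit, so the address holding the zero coefficient is selected.
        return (1 << p_bits) - 1
    # Otherwise the lower P bits address the LUT directly.
    return diff
```

For example, with N = 12 and P = 8, a difference of 300 maps to address 255, while a difference of 5 is used as the address unchanged.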

The pixel in the center of the filter window does not belong to any group and is processed separately. This pixel is multiplied by the highest coefficient $2^W - 1$ and delayed by registers photo_k middle and geom_in middle for synchronicity.

C. Geometric Component

For the design of the geometric filter component, advantage is taken of its separability and its symmetry. Because of the separability, the geometric filter is split into the vertical and horizontal parts. Therefore, 2-D filtering is replaced by successive 1-D filtering in vertical and horizontal directions. This solution is preferred in the design of the geometric filter because 1-D filtering can be implemented more efficiently. Both parts are implemented twice to filter the weighted image data and the photometric weights simultaneously, which is shown in Fig. 8. The input of the vertical component parts is the 2-D array of the filter window and the 2-D array of the corresponding coefficients. Each output is a 1-D vector in which each entry represents one filtered and accumulated column. The coefficients of the geometric component are labeled C_0, C_1, C_2. The output of the geometric filter consists of the filtered unnormalized gray value (kernel result) and the normalization factor (norm result).

Fig. 9. Vertical part of the geometric filter component.

Due to the symmetry of the weight coefficients of the geometric component, the order of multiplication and addition is swapped in both filter parts. This fact plays an important role in pixel group formation. At first, the weighted gray values which are located at the same distance from the centered pixel in the filter window are summed up [35]. Because of the equal distance, these gray values should be weighted with the same coefficient anyway. For a 5 × 5 window, there are always 4 or 8 pixels at the same distance from the centered pixel. For the simplicity of the design, it makes sense to assemble the pixels into equally large groups. Smaller groups allow for better handling of the design. For this reason, the pixels are divided into groups of four with regard to the subsequent processing explained in the following sections. After the accumulation of the pixels according to their symmetry, the sum is multiplied by the corresponding coefficient. The horizontal processing is done in the same way.
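The effect of swapping addition and multiplication can be illustrated for a single column. The following Python sketch is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, and the coefficients are indexed by their distance from the center tap, which is only one possible reading of the labels C_0, C_1, C_2.

```python
def filter_direct(column, coeffs):
    """Multiply every pixel by its coefficient, then add (K multiplications)."""
    r = len(column) // 2
    return sum(column[i] * coeffs[abs(i - r)] for i in range(len(column)))


def filter_preadd(column, coeffs):
    """Add symmetric pixels first, then multiply ((K + 1) / 2 multiplications)."""
    r = len(column) // 2
    acc = column[r] * coeffs[0]          # center tap has no symmetric partner
    for d in range(1, r + 1):
        # Pixels at equal distance d share one coefficient, so they are
        # summed before the single multiplication.
        acc += (column[r - d] + column[r + d]) * coeffs[d]
    return acc
```

Both functions return the same value for any column of odd length; the second one needs three multiplications instead of five for a 5-tap filter, which is the saving the symmetry provides.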

The coefficients for the geometric component are scaled in such a manner that the sum of the vertical coefficients (and the horizontal ones, respectively) is equivalent to the so-called normalized one [35]. For the signed coefficients with the word length W, the normalized one is equal to $2^{W-1}$. This means that the division of the weighted gray values and photometric coefficients after geometric filtering can be realized as a simple shift operation. In the last stage, the normalized filtered gray value has to be divided by the normalized product of the photometric coefficients. The geometric coefficients are calculated in advance and stored in a block RAM.
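A minimal sketch of this scaling, assuming Gaussian geometric weights and a simple rounding scheme, is given below. The helper names and the choice to absorb the rounding error in the center tap are assumptions made for illustration; the text does not say how the authors quantize their coefficients.

```python
import math


def quantize_coeffs(sigma_c, k, w):
    """Quantize k Gaussian taps so that they sum to the normalized one 2**(w-1)."""
    r = k // 2
    raw = [math.exp(-(d * d) / (2.0 * sigma_c ** 2)) for d in range(-r, r + 1)]
    one = 1 << (w - 1)                   # normalized one for word length w
    total = sum(raw)
    q = [int(round(v / total * one)) for v in raw]
    q[r] += one - sum(q)                 # assumption: center tap absorbs rounding error
    return q


def filter_1d(pixels, q, w):
    """Weighted sum followed by division by 2**(w-1), realized as a right shift."""
    acc = sum(p * c for p, c in zip(pixels, q))
    return acc >> (w - 1)
```

Because the quantized taps sum exactly to $2^{W-1}$, a constant input passes through unchanged, and no divider is needed for this normalization step.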

1) Vertical Component Part: The first stage of the geometric component is the vertical part, which is pictured in Fig. 9. With the aid of Fig. 6, it can be seen that the pixels of the first column numbered 1, 2, 3, 4, 5 and the first pixel of the middle column numbered 11 enter the vertical component part simultaneously. For the corresponding photometric coefficients, the same order of processing is valid.

The groups 0, 1, 2, 3, 4, which means all columns with the exception of the centered column, are processed as shown in the upper part of Fig. 9. The geometrically symmetrical pixels are cumulated at first and then multiplied by the geometric weight coefficient. All coefficients for the geometric filter are constant for the chosen filter window size. Due to the scaling of the geometric coefficients, it is assured that the accumulation does not result in a carry. The registers REGcol 0,1,2 in this part of the design are used to delay weighted data to maintain synchronicity. After the multiplication, the weighted values are summed up by the adder tree to one value at each internal clock event.

Fig. 10. Horizontal part of the geometric filter component.

The processing of the centered column is detailed in the lower part of Fig. 9. The centered pixel is weighted and delayed by REGcen so that this pixel and the remaining pixels in the centered column can be fed to the input of the adder tree simultaneously. The remaining pixels enter the dedicated processing path one by one. They were multiplexed in the register matrix in such a way that they can be combined pairwise and multiplied by the same coefficient in the geometric component. In order to weight the pixels properly, every incoming pixel is stored in the register REGcol mid so that the subsequently calculated sum is valid at every second internal clock event. The multiplexing of the filter coefficients with zeros ensures that invalid sums vanish through the multiplication by zero and do not falsify the result.
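The pairing mechanism of the centered column can be mimicked in software as follows. This is a hypothetical behavioral model: the function name, the variable `reg` standing in for REGcol mid, and the assumed arrival order (symmetric partners directly after one another) are illustrative assumptions rather than details taken from the design.

```python
def center_column_stream(pixels, pair_coeffs):
    """Accumulate pairwise sums of a pixel stream, masking invalid sums with zero.

    pixels      -- stream in which symmetric partners arrive back to back
    pair_coeffs -- one geometric coefficient per pixel pair
    """
    reg = 0                              # models the register holding the previous pixel
    acc = 0
    for i, p in enumerate(pixels):
        pair_sum = reg + p               # valid only at every second clock event
        # Coefficients are multiplexed with zeros: odd steps carry the real
        # coefficient, even steps carry zero so the invalid sum vanishes.
        coeff = pair_coeffs[i // 2] if i % 2 == 1 else 0
        acc += pair_sum * coeff
        reg = p
    return acc
```

For the stream 10, 30, 20, 40 with pair coefficients 3 and 5, only the sums 10 + 30 and 20 + 40 contribute, while the intermediate sum 30 + 20 is multiplied by zero.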

As shown in Fig. 8, the vertical part of the geometric filter for the weighting of the photometric coefficients is designed identically.

2) Horizontal Component Part: In Fig. 10, the horizontal part of the geometric component is displayed. After processing in the vertical dimension, the filter window is reduced to one row, and its elements are computed at one internal clock event each. In order to be able to reuse the symmetrical design, the values of the filtered columns 0, 1, 3, 4 are stored in the shift registers according to the order of their reception. The filtered photometrical coefficients are stored in the same way. Since the content of the shift register in the left part of Fig. 10 is valid at every fourth internal clock event, the time domain changes here to the domain of the pixel clock. This domain change is indicated by the dashed line in Fig. 10. All operations on the right-hand side of the dashed line are executed according to the pixel clock.

At every pixel clock signal, the valid column values are written to the registers which perform the division of the weighted gray values by the normalized ones. The division is implemented through a shift operation. The remaining processing is similar to the processing described in the previous paragraph. The geometrically symmetrical pixels are cumulated at first and multiplied afterward by the geometric weight coefficient. For the geometric filtering in the horizontal direction, the same geometric coefficients are used as for the vertical filtering. The final division by the normalized one is performed in the next stage.

Fig. 11. Final normalization of the filtered data.

D. Normalization

At the final stage, the kernel result has to be normalized by the norm result as shown in Fig. 11. After the final accumulation of these values, they are both divided by the normalized one again. In this manner, the word lengths of the weighted gray values and of the norm are both (W − 1) bits shorter. Finally, after the division, N bits of the final result are forwarded to the output of the bilateral filter.
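The roles of the kernel result and the norm result can be made concrete with a floating-point reference model for a single filter window. The sketch below assumes Gaussian photometric and geometric weights, as in the original bilateral filter of [1]; the function name and parameter names are hypothetical, and no quantization or fixed-point effect of the FPGA design is modeled.

```python
import math


def bilateral_pixel(window, sigma_ph, sigma_c):
    """Filter the center pixel of a square K x K window (floating-point model)."""
    k = len(window)
    r = k // 2
    center = window[r][r]
    kernel_result = 0.0                  # sum of gray values times both weights
    norm_result = 0.0                    # sum of the weights alone
    for y in range(k):
        for x in range(k):
            g = window[y][x]
            # Photometric weight: depends on the gray value difference.
            s = math.exp(-((g - center) ** 2) / (2.0 * sigma_ph ** 2))
            # Geometric weight: depends on the distance to the center pixel.
            c = math.exp(-((y - r) ** 2 + (x - r) ** 2) / (2.0 * sigma_c ** 2))
            kernel_result += g * s * c
            norm_result += s * c
    return kernel_result / norm_result   # final normalization by division
```

A flat window is returned unchanged, and pixels on the far side of a strong edge receive photometric weights close to zero, so the edge is preserved.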

E. Design Scalability

In previous paragraphs, we detailed the filter design for the 5 × 5 kernel. However, depending on the application, another kernel size might be required. For small images, a 3 × 3 window size is more suitable to prevent blurring. Some authors choose to work with a larger kernel of 11 × 11 pixels [36]. Our design can be scaled for different kernel sizes. Starting at the register matrix, it has to be dimensioned according to the required kernel size. The kernel size in one dimension is denoted by K in the following:

$$N_{\mathrm{groups}} = K + 1 \tag{5}$$

where $N_{\mathrm{groups}}$ means the number of pixel groups. The quantity of line storages equals K. The number of required multiplexers equals $N_{\mathrm{groups}}$. The multiplexing pattern of the pixels remains unchanged for every kernel size. According to the symmetry of the kernel, the pixels have to be grouped into $N_{\mathrm{groups}}$ groups containing $n_{\mathrm{group\_member}}$ pixels each

$$n_{\mathrm{group\_member}} = K - 1. \tag{6}$$

The groups are always built up in the manner that each row except for the middle pixel forms a pixel group. The middle column represents the last pixel group in which particular attention has to be paid to the arrangement of the pixels in order to keep the weighting in the geometric component valid.

Furthermore, the number of pipelines, including combinatory blocks and coefficient LUTs in the photometric component, equals $N_{\mathrm{groups}}$. The design of the pipelines remains the same.

The number of pipelines in the vertical part of the geometric component changes according to the kernel size. For the structure in the upper part of Fig. 9, (K + 1)/2 pipelines are required because the geometrical symmetry of the pixels has to be taken into account. The lower part of the vertical geometric component remains unchanged except for the multiplexer, which has $n_{\mathrm{group\_member}}$ inputs according to the required filter window size. The shift register of the horizontal part of the geometric component has to be dimensioned for (K − 1) values. The number of connected pipelines has to be adjusted to the length of the shift register, taking the geometrical symmetry into account again. The processing of the centered column remains unchanged. The same holds for the normalization coefficients as well.

Finally, if the maximal operating frequency $f_{\mathrm{operating}}$ is known, the internal clock frequency $f_{\mathrm{internal}}$ can be determined as follows:

$$f_{\mathrm{internal}} = f_{\mathrm{operating}} \cdot n_{\mathrm{group\_member}}. \tag{7}$$

According to the internal clock frequency $f_{\mathrm{internal}}$, the counter has to be adjusted, which generates the select signal for the multiplexers and the enable signal EnREG for the horizontal part of the geometric component.
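The scaling rules of this subsection can be collected in a small helper. The sketch below is illustrative only: the function name and the dictionary keys are hypothetical, and (7) is read as a product of the operating frequency and the group size, which agrees with the 40 MHz pixel clock and 160 MHz internal clock reported for the 5 × 5 kernel.

```python
def scale_design(k, f_operating_hz):
    """Derive the structural parameters of the filter for an odd kernel size k."""
    if k < 3 or k % 2 == 0:
        raise ValueError("kernel size must be an odd number >= 3")
    n_groups = k + 1                     # equation (5)
    n_group_member = k - 1               # equation (6)
    return {
        "n_groups": n_groups,
        "n_group_member": n_group_member,
        "line_storages": k,
        "multiplexers": n_groups,
        "photometric_pipelines": n_groups,
        "vertical_pipelines": (k + 1) // 2,
        "shift_register_length": k - 1,
        "f_internal_hz": f_operating_hz * n_group_member,  # equation (7)
    }
```

Calling `scale_design(5, 40e6)` gives six groups of four pixels and an internal clock of 160 MHz, while a 3 × 3 kernel needs only twice the pixel clock.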

V. IMAGE QUALITY ASSESSMENT

To evaluate the performance of the noise reduction and the accuracy of the detail preservation, criteria for the image quality assessment are required. The criteria chosen in this work are $\mathrm{PSNR}_{\mathrm{dB}}$ and MSSIM.

1) $\mathrm{PSNR}_{\mathrm{dB}}$: The well-known peak signal-to-noise ratio $\mathrm{PSNR}_{\mathrm{dB}}$ in decibels is defined as follows:

$$\mathrm{PSNR}_{\mathrm{dB}} = 20 \log_{10}\left(\frac{GV_{\max}}{\sqrt{\mathrm{MSE}}}\right) \tag{8}$$

$$\mathrm{MSE} = \frac{1}{MN} \sum_{m} \left(\Phi_{\mathrm{ref}}(m) - \Phi(m)\right)^{2} \tag{9}$$

where MSE denotes the mean squared error between the image to be compared and the reference image, summed over all pixel positions m. $GV_{\max}$ represents the maximum gray value depending on the word length after the digitalization of the images. The noiseless M × N image with gray values $\Phi_{\mathrm{ref}}(m)$ provides the reference for the measurement of the MSE. The gray values $\Phi(m)$ originate from the image to be compared. Considering the quality of the noise filter, $\mathrm{PSNR}_{\mathrm{dB}}$ describes the capability of the filter to suppress noise regardless of the perceived visual quality of the filtered image.


2) MSSIM: The mean structural similarity index MSSIM is a method for the assessment of the image quality that takes advantage of the characteristics of the human visual system [37]. First, the local structural similarity SSIM of the 11 × 11 image blocks $v(\Phi_{\mathrm{ref}})$ and $v(\Phi)$ is calculated

$$\mathrm{SSIM}\left(v(\Phi_{\mathrm{ref}}), v(\Phi)\right) = l\left(v(\Phi_{\mathrm{ref}}), v(\Phi)\right) \cdot c\left(v(\Phi_{\mathrm{ref}}), v(\Phi)\right) \cdot s\left(v(\Phi_{\mathrm{ref}}), v(\Phi)\right) \tag{10}$$

where $l(v(\Phi_{\mathrm{ref}}), v(\Phi))$ is the luminance comparison function, $c(v(\Phi_{\mathrm{ref}}), v(\Phi))$ compares the contrast of the image blocks after luminance subtraction, and $s(v(\Phi_{\mathrm{ref}}), v(\Phi))$ compares the structure of the image blocks after contrast normalization. After averaging the SSIM of J blocks over the whole image, the mean value MSSIM

$$\mathrm{MSSIM}(\Phi_{\mathrm{ref}}, \Phi) = \frac{1}{J} \sum_{j=1}^{J} \mathrm{SSIM}\left(v_j(\Phi_{\mathrm{ref}}), v_j(\Phi)\right) \tag{11}$$

of an entire image represented by $\Phi$ is identified. The value MSSIM = 1 means that two images are completely identical. The smaller the MSSIM, the less structural similarity the two images show. The detailed description of MSSIM can be found in [37].
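As a small illustration of the first criterion, the peak signal-to-noise ratio of (8) and (9) can be computed as follows. This is a sketch with hypothetical function names that operates on images stored as nested lists; it is not the evaluation code used by the authors.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized gray value images."""
    rows = len(reference)
    cols = len(reference[0])
    total = 0.0
    for ref_row, test_row in zip(reference, test):
        for a, b in zip(ref_row, test_row):
            total += (a - b) ** 2
    return total / (rows * cols)


def psnr_db(reference, test, gv_max=255):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    err = mse(reference, test)
    if err == 0:
        return math.inf
    return 20.0 * math.log10(gv_max / math.sqrt(err))
```

For 8-b images, `gv_max` is 255. An MSE of 100, for example, corresponds to roughly 28.1 dB.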

VI. RESULTS

After an implementation in Matlab, the proposed architecture of the bilateral filter was implemented in VHDL and simulated with ModelSim. A test image was filtered by the Matlab implementation as well as the ModelSim simulation, and the filtered images were compared. The purpose of this comparison is to analyze the image quality drop due to the quantization of the filter coefficients in our FPGA design.

The test image Lighthouse shown in Fig. 12(a) is an 8-b grayscale image with a size of 512 × 512 pixels. Hence, in the following, $GV_{\max} = 255$ is used.

In order to apply the bilateral filter to a color image, the color data have to be transformed into the CIELab color space [1]. The structure of the filter remains unchanged. However, processing of color images is beyond our research interest, so no results on this topic will be reported.

A. Performance Analysis

For the comparison of the filtering capability between the Matlab implementation and the ModelSim simulation, Gaussian noise with standard deviation $\sigma_{\mathrm{noise}}$ = [10, 20, 30, 40, 50, 60] was added to the test image.

In Fig. 12, the test image is contrasted with its noisy counterpart with $\sigma_{\mathrm{noise}} = 20$ and two filtered images. The filter parameters $\sigma_{\mathrm{ph}} = 3\sigma_{\mathrm{noise}}$ and $\sigma_{c} = 1$ were chosen for the photometric and geometric components, respectively. For filtering in Matlab, no quantization of the filter coefficients was applied. The corresponding filtered image is shown in Fig. 12(c). For the simulation with ModelSim, the coefficient word length W = 8 was used. The simulation result is shown in Fig. 12(d). Between the Matlab implementation and the ModelSim simulation, no visually distinguishable difference can be registered.

Fig. 12. (a) Original image. (b) Noisy image with $\sigma_{\mathrm{noise}} = 20$. (c) Filtering in Matlab. (d) Filtering in ModelSim.

Fig. 13. Performance comparison of the Matlab implementation and the ModelSim simulation.

The results of the quantitative comparison between the Matlab implementation and the ModelSim simulation are contrasted in Fig. 13 and summarized in Table I. As our recent research shows, by adjusting $\sigma_{\mathrm{ph}}$ as a multiple of the measured standard deviation of noise rather than by a single constant, even better $\mathrm{PSNR}_{\mathrm{dB}}$ can be achieved. Thus, an optimal setting for the filter can be chosen which reduces noise and prevents blurring at the same time as far as possible. Exceeding this point causes oversmoothing, and choosing the adjusting parameter below this point leads to insufficient noise suppression. The discussion of this topic is important but beyond the scope of this paper. For more details, refer to [38].

Fig. 13 reveals that, for increasing noise levels, $\mathrm{PSNR}_{\mathrm{dB}}$ and MSSIM both increase after noise filtering. For higher standard deviation of noise, the gain is higher. Using our setting $\sigma_{\mathrm{ph}} = 3\sigma_{\mathrm{noise}}$, averaging with higher weights is performed for increasing noise levels. Owing to this fact, $\mathrm{PSNR}_{\mathrm{dB}}$ rises by a higher amount. MSSIM also increases because the geometrical component remains narrow, preventing oversmoothing.

TABLE I

FILTERING RESULTS

TABLE II

SYNTHESIS RESULT

The numbers in Table I show that applying the presented filter architecture delivers results almost as good as those of the Matlab implementation. The slight decrease of the image quality due to filtering by ModelSim simulation is explained by coefficient quantization and by rounding of the internal values during the shift operations. No artifacts caused by quantization are introduced into the filtered image. In summary, the simulation results are highly satisfying.

B. Verification

For verification, a Virtex-5 FPGA platform equipped with a Virtex XC5VLX50-1 device was used. The shortened synthesis report of the filter design is shown in Table II. A long-term trial proved that the design is suitable for real-time processing. The FPGA board was connected to a camera with a 12-b resolution depth, generating 30 fps at a full resolution of 1024 × 1024 pixels.

Due to the technical specification of the camera, pauses between the frames are necessary so that 30 fps is the maximally achievable frame rate. Thus, the maximal data flow reaches approximately 31.5 Mpixel/s. Consequently, we restricted the clock frequency of our design to 40 MHz in this application. The internal clock frequency is 160 MHz. With this clock rate, a maximal throughput of 38 fps is possible.

With a different camera, an even higher frame rate is achievable. Using our FPGA platform, the maximal possible internal frequency shown in Table II is 220 MHz. Hence, the maximal operating frequency of our filter design with the contemplated FPGA Virtex-5 equals 55 MHz. Considering the image resolution of 1024 × 1024 pixels, the following frame rate can be computed:

$$\left(\frac{(1024 \times 1024)\ \mathrm{pixels}}{\mathrm{frame}} \cdot \frac{18.18\ \mathrm{ns}}{\mathrm{pixel}}\right)^{-1} = 52.45\ \frac{\mathrm{frames}}{\mathrm{second}}. \tag{12}$$

This calculation is valid only for a throughput of 1 pixel/cycle, which is given by our design.
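The throughput figures quoted in this subsection follow from simple arithmetic, which the short Python sketch below reproduces. The function names are hypothetical; the only assumption is the throughput of one pixel per pixel clock cycle stated in the text.

```python
def frame_rate(f_pixel_hz, width, height):
    """Frames per second at a throughput of one pixel per pixel clock cycle."""
    return f_pixel_hz / (width * height)


def pixel_rate(fps, width, height):
    """Pixels per second delivered by a camera with the given frame rate."""
    return fps * width * height
```

Here, `frame_rate(55e6, 1024, 1024)` evaluates to about 52.45 fps as in (12), `frame_rate(40e6, 1024, 1024)` to about 38 fps, and `pixel_rate(30, 1024, 1024)` to about 31.5 Mpixel/s.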


TABLE III

CITED FPGA IMPLEMENTATIONS OF THE BILATERAL FILTER

The total delay of the output pixels of our architecture with a kernel size of 5 × 5 pixels applied to an image of 512 × 512 pixels is 2560 + 36 cycles. The time required for filling up the register matrix, which depends on the kernel size and image width, results in a delay of 5 × 512 = 2560 cycles. The processing time from the multiplexers in the register matrix to the output of the normalization stage is constant and does not depend on the kernel size. The critical operations are performed at internal clock frequency. If the kernel size is changed, the pixel groups have to be reordered, and the internal clock has to be adjusted according to (7). In this case, the processing time still amounts to 36 cycles. The normalization by division costs 24 cycles, which makes up about 66% of the whole processing time.
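The delay budget above can be expressed as a one-line formula. In the following sketch, the function name and the default argument are hypothetical; the constant of 36 processing cycles is taken from the text and is assumed to hold for every kernel size, as the authors state.

```python
def total_delay_cycles(kernel_size, image_width, processing_cycles=36):
    """Fill-up time of the register matrix plus the constant processing delay."""
    fill_up = kernel_size * image_width  # e.g., 5 * 512 = 2560 cycles
    return fill_up + processing_cycles
```

For the 5 × 5 kernel and a 512-pixel-wide image, the function returns 2596 cycles, of which the 24 cycles of the division make up two-thirds of the 36 processing cycles.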

For the evaluation of the performance of the filter design, a comparison with other implementations from the references is given in Table III. Except for the authors of [32], all other authors implement the original bilateral filter from [1]. From [32], the full parallel architecture is used for the comparison in Table III. All filters are implemented on different FPGAs of different families and generations, which makes the comparison less significant, but still, itemizing some features like the maximum clock frequency of the design or the resource demand might give a good insight.

Our design works at the highest clock frequency. However, considering the kernel size of 5 × 5 pixels and the switching of the time domain, our architecture presents only the third highest frame rate. The picture changes if we implement a 3 × 3 filter kernel. In this case, the operating frequency is 110 MHz, and the resulting frame rate doubles, which puts the performance of our design in second place.

Regarding the resource demand, it should be clear that the logic elements of Altera and the logic slices of Xilinx are built differently. The values in Table III give merely a hint at the FPGA area used by each design. On the other hand, the number of required multipliers can be compared directly. In [30], the number of multipliers is not available. According to the statement of the authors of [33], an efficient parallel implementation of a bilateral filter for a 5 × 5 mask requires 25 multipliers. We have shown that our design concept is efficient and that it requires only 23 multipliers. Therefore, considering the implemented window size of 5 × 5 pixels, we use the resources more economically.

VII. CONCLUSION

In this paper, we have given a detailed description of an FPGA design of the bilateral filter for real-time image processing. The advantages of our design can be summarized in the following points.

1) The filter design for a kernel size of 5 × 5 shown here utilizes the FPGA resources economically, which makes it feasible to implement the filter on a common medium-sized FPGA.

2) The introduced register matrix at the first stage of the filter makes external image storage redundant, contributing to the decrease of the resource demand of the filter implementation.

3) The shown architecture is synchronous and capable of real-time processing supporting high clock frequencies. Maximal operating frequency depends on the chosen FPGA family.

4) Conceiving our filter architecture, we kept in mind the scalability of the design in order to enable the implementation of arbitrary filter window size with low effort.

5) The shown filter architecture assures a constant processing delay independent of the filter window size. The total delay is the sum of the processing delay and the fill-up time of the line storages, which depends on the kernel size and image width.

6) Image quality assessment in terms of $\mathrm{PSNR}_{\mathrm{dB}}$ and structural similarity assured that the image quality loss due to coefficient quantization and due to rounding of the internal results is negligible.

REFERENCES

[1] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Proc. IEEE ICCV, 1998, pp. 839–846.

[2] B. Zhang and J. P. Allebach, "Adaptive bilateral filter for sharpness enhancement and noise removal," IEEE Trans. Image Process., vol. 17, no. 5, pp. 664–678, May 2008.

[3] B. Yan and A.-D. Saleh, "Structure enhancing bilateral filtering of images," in Proc. IEEE PCSPA, 2010, pp. 614–617.

[4] M. de-Frutos-López, H. Medina-Chanca, S. Sanz-Rodríguez, C. Peláez-Moreno, and F. Díaz-de-María, "Perceptually-aware bilateral filter for quality improvement in low bit rate video coding," in Proc. IEEE PCS, 2012, pp. 477–480.

[5] J. Won Lee, R.-H. Park, and S. Chang, "Noise reduction and adaptive contrast enhancement for local tone mapping," IEEE Trans. Consum. Electron., vol. 58, no. 2, pp. 578–586, May 2012.

[6] J. Giraldo, Z. Kelm, L. Yu, J. Fletcher, B. Erickson, and C. McCollough, "Comparative study of two image space noise reduction methods for computed tomography: Bilateral filter and nonlocal means," in Proc. Conf. IEEE EMBS, 2009, pp. 3529–3532.

[7] L. Yu, A. Manduca, J. Trzasko, N. Khaylova, J. Kofler, C. McCollough, and J. Fletcher, "Sinogram smoothing with bilateral filtering for low-dose CT," in Proc. SPIE Med. Imag.: Phys. Med. Imag., 2008, vol. 6913, pp. 691329-1–691329-8.

[8] A. Gabiger, R. Weigel, S. Oeckl, and P. Schmitt, "Enhancement of CT image quality via bilateral filtering of projections," in Proc. 1st Int. Conf. Image Formation X-ray Comput. Tomography, 2010, pp. 140–143.

[9] A. Gabiger-Rose, R. Rose, M. Kube, P. Schmitt, and R. Weigel, "Noise adaptive bilateral filtering of projections for computed tomography," in Proc. 11th Int. Meet. Fully Three-Dimens. Image Reconstruction Radiol. Nucl. Med., 2011, pp. 306–309.

[10] A. Gabiger, M. Kube, and R. Weigel, "A synchronous FPGA design of a bilateral filter for image processing," in Proc. IEEE IECON, 2009, pp. 1990–1995.

[11] T. Riesgo, Y. Torroja, and E. de la Torre, "Design methodologies based on hardware description languages," IEEE Trans. Ind. Electron., vol. 46, no. 1, pp. 3–12, Feb. 1999.

[12] T. Q. Pham and L. J. van Vliet, "Separable bilateral filtering for fast video preprocessing," in Proc. IEEE ICME, 2005, pp. 1–4.

[13] F. Durand and J. Dorsey, "Fast bilateral filtering for the display of high-dynamic-range images," ACM Trans. Graph., vol. 21, no. 3, pp. 257–266, Jul. 2002.

[14] S. Paris and F. Durand, "A fast approximation of the bilateral filter using a signal processing approach," in Proc. ECCV, 2006, pp. 568–580.

[15] J. Chen, S. Paris, and F. Durand, "Real-time edge-aware image processing with the bilateral grid," ACM Trans. Graph., vol. 26, no. 3, pp. 1–9, Jul. 2007.

[16] Q. Yang, K.-H. Tan, and N. Ahuja, "Real-time O(1) bilateral filtering," in Proc. IEEE CVPR, 2009, pp. 557–564.

[17] M. M. Bronstein, "Lazy sliding window implementation of the bilateral filter on parallel architectures," IEEE Trans. Image Process., vol. 20, no. 6, pp. 1751–1756, Jun. 2011.

[18] B. Weiss, "Fast median and bilateral filtering," ACM Trans. Graph., vol. 25, no. 3, pp. 519–526, Jul. 2006.

[19] F. Porikli, "Constant time O(1) bilateral filtering," in Proc. IEEE CVPR, 2008, pp. 1–8.

[20] Y.-C. Tseng, P.-H. Hsu, and T.-S. Chang, "A 124 Mpixels/sec VLSI design for histogram-based joint bilateral filtering," IEEE Trans. Image Process., vol. 20, no. 11, pp. 3231–3241, Nov. 2011.

[21] F. Hannig, M. Schmid, J. Teich, and H. Hornegger, "A deeply pipelined and parallel architecture for denoising medical images," in Proc. IEEE FPT, 2010, pp. 485–490.

[22] L. Costas, P. Colodrón, J. J. Rodríguez-Andina, J. Fariña, and M.-Y. Chow, "Analysis of two FPGA design methodologies applied to an image processing system," in Proc. IEEE ISIE, 2010, pp. 3040–3044.

[23] N. Sudha and A. R. Mohan, "Hardware-efficient image-based robotic path planning in a dynamic environment and its FPGA implementation," IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1907–1920, May 2011.

[24] R. Marin, G. León, R. Wirz, J. Sales, J. M. Claver, P. J. Sanz, and J. Fernández, "Remote programming of network robots within the UJI industrial robotics telelaboratory: FPGA vision and SNRP network protocol," IEEE Trans. Ind. Electron., vol. 56, no. 12, pp. 4806–4816, Dec. 2009.

[25] E. Monmasson and M. N. Cirstea, "FPGA design methodology for industrial control systems - A review," IEEE Trans. Ind. Electron., vol. 54, no. 4, pp. 1824–1842, Aug. 2007.

[26] J. J. Rodriguez-Andina, M. J. Moure, and M. D. Valdes, "Features, design tools, and application domains of FPGAs," IEEE Trans. Ind. Electron., vol. 54, no. 4, pp. 1810–1823, Aug. 2007.

[27] H. Zhuang, K.-S. Low, and W.-Y. Yau, "Multichannel pulse-coupled neural-network-based color image segmentation for object detection," IEEE Trans. Ind. Electron., vol. 59, no. 8, pp. 3299–3308, Aug. 2012.

[28] S. Jin, D. Kim, T. T. Nguyen, D. Kim, M. Kim, and J. W. Jeon, "Design and implementation of a pipelined datapath for high-speed face detection using FPGA," IEEE Trans. Ind. Informat., vol. 8, no. 1, pp. 158–167, Feb. 2012.

[29] Y. Chen and V. Dinavahi, "Digital hardware emulation of universal machine and universal line models for real-time electromagnetic transient simulation," IEEE Trans. Ind. Electron., vol. 59, no. 2, pp. 1300–1309, Feb. 2012.

[30] C. Charoensak and F. Sattar, "FPGA design of a real-time implementation of dynamic range compression for improving television picture," in Proc. IEEE ICICS, 2007, pp. 1–5.

[31] A. Rosado-Muñoz, M. Bataller-Mompeán, E. Soria-Olivas, C. Scarante, and J. F. Guerrero-Martínez, "FPGA implementation of an adaptive filter robust to impulsive noise: Two approaches," IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 860–870, Mar. 2011.

[32] T. Q. Vinh, J. H. Park, Y.-C. Kim, and S. H. Hong, "FPGA implementation of real-time edge-preserving filter for video noise reduction," in Proc. IEEE ICCEE, 2008, pp. 611–614.

[33] H. Dutta, F. Hannig, J. Teich, B. Heigl, and H. Hornegger, "A design methodology for hardware acceleration of adaptive filter algorithms in image processing," in Proc. IEEE ASAP, 2006, pp. 331–340.

[34] R. Chen, L. Chen, and L. Chen, "System design consideration for digital wheelchair controller," IEEE Trans. Ind. Electron., vol. 47, no. 4, pp. 898–907, Aug. 2000.

[35] R. Turney, "Two-dimensional linear filtering," in Application Note: Xilinx FPGAs, 2007, pp. 1–8.

[36] M. Zhang and B. K. Gunturk, "Multiresolution bilateral filter for image denoising," IEEE Trans. Image Process., vol. 17, no. 12, pp. 2324–2333, Dec. 2008.

[37] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.

[38] A. Gabiger-Rose, M. Kube, P. Schmitt, R. Weigel, and R. Rose, "Image denoising using bilateral filter with noise-adaptive parameter tuning," in Proc. IEEE IECON, 2011, pp. 4515–4520.

Anna Gabiger-Rose (S'09) was born in Ordshonikidse, Ukraine, in 1978. She received the Dipl.-Ing. degree in electrical engineering, electronics, and information technology from the Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany, in 2007.

From 2001 to 2007, she was a Student Assistant with the Department of Contactless Test and Measuring Systems, Fraunhofer Institute for Integrated Circuits, Erlangen. She is currently a Research Assistant with the Institute for Electronics Engineering, University of Erlangen-Nuremberg. Her research interests include the design of embedded systems for image processing and the investigation of digital filtering techniques for image quality enhancement.

Mrs. Gabiger-Rose is a member of the IEEE Industrial Electronics Society. She served as a reviewer for the 35th Annual Conference of the IEEE Industrial Electronics Society (IECON'09).

Matthias Kube was born in Mainz, Germany, in 1975. He received the Dipl.-Ing. FH (M.Sc.) degree in electrical engineering and microelectronics from the Georg-Simon-Ohm University of Applied Science of Nuremberg, Nuremberg, Germany, in 2002.

Since 2003, he has been working as a member of the research staff at the Department of Contactless Test and Measuring Systems, Fraunhofer Institute for Integrated Circuits, Erlangen, Germany. He has the technical leadership for the development of an innovative indirect converting X-ray detector with conventional optical sensors for scientific and industrial applications of nondestructive testing (NDT), which is optimized for tasks that require a high dynamic range, a high speed, and a long life cycle. His interests in research include optical sensors and cameras, field-programmable-gate-array design, embedded systems for image processing, and X-ray imaging for NDT.

Robert Weigel (S'88–M'89–SM'95–F'02) was born in Ebermannstadt, Germany, in 1956. He received the Dr.-Ing. and Dr.-Ing.habil. degrees in electrical engineering and computer science from the Munich University of Technology, Munich, Germany, in 1989 and 1992, respectively.

He was a Research Engineer from 1982 to 1988, a Senior Research Engineer from 1988 to 1994, and a Professor for RF Circuits and Systems from 1994 to 1996 with the Munich University of Technology. From 1996 to 2002, he was the Director of the Institute for Communications and Information Engineering, University of Linz, Linz, Austria. Since 2002, he has been the Head of the Institute for Electronics Engineering, University of Erlangen-Nuremberg, Erlangen, Germany.

Dr. Weigel was the recipient of the IEEE Microwave Applications Award in 2007. Within IEEE Microwave Theory and Techniques Society (MTT-S), he has been the Founder and Chair of the Austrian Communications/Microwave Theory and Techniques Society Joint Chapter and Region 8 Coordinator. He is the Chair of MTT-2 Microwave Acoustics and the MTT-S President-Elect in 2013.

Richard Rose (S'09) was born in Nuremberg, Germany, in 1981. He received the Dipl.-Ing. degree in electrical engineering, electronics, and information technology from the Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany, in 2007.

In 2008, he joined the Institute for Electronics Engineering, University of Erlangen-Nuremberg, as a Research Assistant, and since 2010, he has been the Team Leader of the System Engineering group. His research interests include digital signal processing, receiver design, antenna design, localization techniques, and wireless communication systems.

Mr. Rose is a member of the IEEE Microwave Theory and Techniques Society, the IEEE Signal Processing Society, the IEEE Antennas and Propagation Society, and the IEEE Communications Society. He served as a reviewer for the journal of Mathematical Problems in Engineering and the International Journal of Electronics and Communications.

- En - Ssp 307 - Touran - Electrical System 2Uploaded byAma Deea
- PowerSourceLAW400@350Uploaded byMohd Asrul Arashi Sadali
- Acca Manual j and d Mechanical RequirementsUploaded bymsmith6477
- ADE6698 PT Motor Tech DataUploaded byberkahharian
- Circuits and NetworksUploaded byNaik Romik
- 15el9500 ManualUploaded byandyinsb
- Bathy500df Master-rev 4309Uploaded byykrisnanda7965
- Bus Reactor BHELUploaded byVHMANOHAR
- 54a1942a06e5a9ac3167c957cd9afdefa651110b383fbUploaded bynaguiec
- Physics ProjectUploaded byHarsh Desai
- CV Elec_instrument EngineerUploaded byAdel Mohamad Rasheed
- RENR5456RENR5456-04_SISUploaded byrjan7pe
- AUTOMATIC_VOLTAGE_REGULATOR-R250.pdfUploaded byWilman33
- IRI1-WD.pdfUploaded byAman Saini
- RSRPUploaded byAli Gomri
- DSC-H2_L2Uploaded byCarlos Gonçalves
- S171E_P3_Modbus_Protocol_Manual__V1_0A_04_2007Uploaded bySreekanthMylavarapu
- HFSS Antennas - Arrays.pdfUploaded byYohandri
- Test IC910HUploaded byScott Gillis
- Simple Tube Audio Osc (Oscilador de audio valvular)Uploaded byfacur4
- Palmer PDI-03 Part 2Uploaded byFernando Gómez
- MICA2 Datasheet DocUploaded byFares Mezrag
- Assignment #1 SolutionUploaded byfarhan
- EHDUploaded byvinitgarg1993
- ER-260 265 AU User ManualUploaded byRadu Baciu-Niculescu