
CHAPTER 1

INTRODUCTION

1.1 OVERVIEW

Lowering the dynamic power of a very-large-scale integrated circuit is an effective
way to reduce the total power consumption. One of the best ways to reduce the
dynamic power dissipation is to minimize the switching activities, i.e., the number
of signal transitions in the circuit. The token-ring architecture, adopted in the
design of a mixed-timing first-in–first-out (FIFO) interface, offers the potential for
low power consumption since data are immobile in the FIFO. In their designs, once
data is queued, it will not be moved and is simply dequeued in place. A token,
which is represented as logic state 1 in the token register, is used to control the
dequeuing of old data and queuing of new data at the same time. All the token
registers form a token ring, which is a succession of nodes interconnected in a
circular manner. The token-ring architecture with exactly one token will be
adopted in our design for lowering the power consumption of a one-dimensional (1-D)
median filter. A median filter is a nonlinear filter widely used in digital signal
and image processing for the smoothing of signals, suppression of impulse noise,
and edge preservation. The median filter replaces a sample with the middle-ranked
value among all the samples inside the sample window, centered around the
sample in question. Depending on the number of samples processed at the same
cycle, there are two types of architectures for hardware design, i.e., word-level
architecture and bit-level architecture. In the word-level architecture, the input
samples are sequentially processed word by word, and the bits of the sample are
processed in parallel. A selective median filter is a mixed filter which removes
1
spike noise (impulsive noise or most commonly called as salt and pepper) from a
noisy image while preserving sharp edges. It is rightly called as ‘selective’ because
this filter first checks each pixel whether it is a noisy or non-noisy. For this
purpose we have used a double derivative filter or Laplace filter which acts as a
noise detector. If it is found noisy then we apply a general median filter as
described in. The main objective of both of these filters is to denoise the noisy
image and restore the original one while keeping the high intensity details intact.
The median filter is a major denoising technique used in image restoration,
as described below. The following subsection also describes the degradation
process and some common noise models, along with the double-derivative technique
used for sharpening images.

Image Restoration Preview

An image is a two-dimensional function of light intensities. Light falls on an
object and is reflected back; hence we can see the object. Mathematically, it can
be represented as
Y(x, y) = I(x, y) · R(x, y)
Here, I(x, y) = intensity of light incident on the object,
R(x, y) = intensity of light reflected from that object, and
Y(x, y) = the resultant image intensity (unit: candela).
The ultimate goal of restoration techniques is to improve an image in some
predefined sense. Although there are areas of overlap, image enhancement is largely
a subjective process, while image restoration is for the most part an objective
process. Restoration attempts to reconstruct or recover an image that has been
degraded, using prior knowledge of the degradation phenomenon. Thus, restoration
techniques are oriented toward modeling the degradation and applying the inverse
process to recover the original image. In this chapter, the degradation is modeled
as a degradation function that, together with an additive noise term, operates on
an input image f(x, y) to produce a degraded image g(x, y). Given g(x, y), some
knowledge about the degradation function h, and some
knowledge about the additive noise term η(x, y), the objective of restoration is to
obtain an estimate of the original image. The degraded image model in the spatial
domain is given by
g (x, y) = h(x, y) * f(x, y) + η(x, y)
where h(x, y) is the spatial representation of the degradation function and "*"
denotes the convolution operator. Convolution in the spatial domain is equivalent
to multiplication in the frequency domain, so the model can also be written as

G (u, v) = H (u, v) F (u, v) + N (u, v)

Figure 1.1 Image degradation model (the input image passes through the degradation
function H, the additive noise η(x, y) is added, and restoration filters produce
the restored image F′(x, y))

Sometimes images are degraded by external noise, which is additive in nature. The
general degradation process of an image is shown in Figure 1.1.

Some Noise Models

The main sources of noise in digital images arise
during image acquisition (digitization) and transmission. The performance of
imaging sensors is affected by a variety of factors, such as environmental
conditions during image acquisition, and the quality of the sensing elements

themselves. For example, sensor temperature and light levels are major factors
affecting the amount of noise in the resulting image. Images are corrupted during
transmission principally due to interference in the channel used for transmission.
For example, an image transmitted over a wireless network may be corrupted by
lightning or other atmospheric disturbances. Images acquired by optical, electro-
optical or electronic means are mostly degraded by the sensing environment. The
degradations may be in the form of sensor noise.
1. Impulsive Noise
Impulse noise, shown in Figure 1.1.1, is characterized by a noise spike replacing
the actual pixel value. Impulse noise is further divided into two classes:
random-valued impulse noise (RVIN) and salt-and-pepper noise (SPN). In RVIN, the
impulse value at a particular pixel may be a random value within a particular
interval. In SPN, the impulses are either the minimum or the maximum value allowed
in the intensity range, for example, 0 and 255 in the case of an 8-bit image.

P(z) = Pa   when z = a
P(z) = Pb   when z = b
P(z) = 0    otherwise

Figure 1.1.1 PDF of impulsive noise

2. Uniform Noise
Uniform noise is another common noise model, shown in Figure 1.1.2. In this case,
the noise can take on values in an interval [a, b] with uniform probability. The
PDF of uniform noise is given by

P(z) = 1/(b − a)   when a ≤ z ≤ b
P(z) = 0           otherwise

Figure 1.1.2 PDF of uniform noise

3. Gaussian Noise
Due to their mathematical tractability in both the spatial and frequency domains,
Gaussian noise models are frequently used in practice. The PDF of a Gaussian
random variable z is given by

p(z) = (1/(σ√(2π))) e^(−(z − μ)²/(2σ²))

Figure 1.1.3 PDF of Gaussian noise

Denoising Techniques

Another type of image degradation is due to additive noise, in which random values
get added to the intensity values. Denoising involves the application of some
filtering technique to get the true image back. The denoising algorithms vary
greatly depending on the type of noise present in the image. Each type of noise is
characterized by a unique noise model, and each noise model corresponds to a
probability density function which describes the distribution of noise within the
image. When the only degradation present in an image is noise, the degradation
equation becomes
g (x, y) = f(x, y) + η(x, y)

The Fourier transform of this equation is

G (u, v) = F (u, v) + N (u, v)

Spatial Filtering

Denoising techniques exist in both the spatial domain and the frequency domain.
Spatial filtering is preferred when only additive noise is present.

The following classes of filtering techniques exist in spatial-domain filtering:
• Mean Filters
• Order-Statistics Filters
• Adaptive Filters
• Sharpening Filters

Mean Filter

In this method, the middle pixel value of the filter window is replaced with the
arithmetic, geometric, harmonic, or contra-harmonic mean of all the pixel values
within the filter window. A mean filter simply smooths local variations in an
image. Noise is reduced as a result of this smoothing, but edges within the image
get blurred.
Examples: Arithmetic Mean Filter, Geometric Mean Filter, Contra-harmonic Mean
Filter.
Order Statistic Filters

The response of order-statistics filters is based on ordering the pixels contained
in the filter window; the median filter comes under this class. The median filter
replaces the value of a pixel by the median of the gray levels within the filter
window. Median filters are particularly effective in the presence of impulse noise.
Examples: Median filter, max-min filters, midpoint filter.

Adaptive Filters

Adaptive filters change their behavior based on the statistical characteristics of
the image inside the filter window. Their performance is usually superior to that
of non-adaptive counterparts, but the improved performance comes at the cost of
added filter complexity. Mean and variance are two important statistical measures
using which adaptive filters can be designed.
Example: If the local variance is high compared to the overall image variance, the
filter should return a value close to the present value, because high variance is
usually associated with edges, and edges should be preserved.
Sharpening Filter
The main function of sharpening is to highlight the fine detail in an image or to
enhance detail that has been blurred. It is used to preserve high-intensity pixel
values such as edge pixels. In the spatial domain, image blurring happens by pixel
averaging in a neighbourhood. Averaging is essentially an integration operation,
so sharpening can be accomplished with a differentiation operation. At an edge in
the image there is a higher discontinuity, so taking the derivative enhances the
edges.
Example: Double-derivative (Laplacian) filter.
The first-order derivative produces thicker edges in an image, whereas the
second-order derivative produces more detailed information. The main purpose of
using the first order is to extract the edges in an image, but the second order is
mostly preferred as it produces even better enhancement results. Usually these
filters, along with other enhancement or restoration filters, yield impressive
results.

1.2 MEDIAN FILTERING

What is noise? Noise is any undesirable signal. Noise is everywhere, and thus
we have to learn to live with it. Noise gets introduced into the data via any
electrical system used for storage, transmission, and/or processing. In addition,
nature will always play a "noisy" trick or two with the data under observation.

When encountering an image corrupted with noise, you will want to improve
its appearance for a specific application. The techniques applied are application-
oriented. Also, the different procedures are related to the types of noise introduced
to the image. Some examples of noise are: Gaussian or White, Rayleigh, Shot or
Impulse, periodic, sinusoidal or coherent, uncorrelated, and granular.

When performing median filtering, each pixel is determined by the median


value of all pixels in a selected neighborhood (mask, template, window). The
median value m of a population (set of pixels in a neighborhood) is that value for
which half of the population has smaller values than m and the other half has
larger values than m.

This filter belongs to the class of edge-preserving smoothing filters, which are
non-linear filters. These filters smooth the data while keeping the small and
sharp details.

Median filtering is a simple and very effective noise removal filtering


process. Its performance is particularly good for removing shot noise. Shot noise
consists of strong, spike-like isolated values.

Shown below are the original image and the same image after it has been
corrupted by shot noise at 10%. This means that 10% of its pixels were replaced by
full white pixels. Also shown are the median filtering results using 3x3 and 5x5
windows; three (3) iterations of a 3x3 median filter applied to the noisy image;
and finally, for comparison, the result when applying a 5x5 mean filter to the
noisy image.

1.3 MEDIAN FILTER

Common Names: Median filtering, Rank filtering

1.3.1 Brief Description

The median filter is normally used to reduce noise in an image, somewhat


like the mean filter. However, it often does a better job than the mean filter of
preserving useful detail in the image.

1.3.2 How It Works

Like the mean filter, the median filter considers each pixel in the image in
turn and looks at its nearby neighbours to decide whether or not it is representative
of its surroundings. Instead of simply replacing the pixel value with the mean of
neighbouring pixel values, it replaces it with the median of those values. The
median is calculated by first sorting all the pixel values from the surrounding
neighbourhood into numerical order and then replacing the pixel being considered
with the middle pixel value. (If the neighbourhood under consideration contains an
even number of pixels, the average of the two middle pixel values is used.) Figure
1.2 illustrates an example calculation.

123 125 126 130 140
122 124 126 127 135
118 120 150 125 134
119 115 119 123 133
111 116 110 120 130

Neighbourhood values: 115, 119, 120, 123, 124, 125, 126, 127, 150
Median value: 124

Figure 1.2 Image for sample window


Figure 1.2 shows the calculation of the median value of a pixel neighbourhood. As
can be seen, the central pixel value of 150 is rather unrepresentative of the
surrounding pixels and is replaced with the median value of 124. A 3×3 square
neighbourhood is used here; larger neighbourhoods will produce more severe
smoothing.
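As an illustration of this calculation, the behavioral Verilog sketch below sorts the nine samples of a 3×3 neighbourhood and picks the middle-ranked value. It is only a reference model of the operation just described, not the low-power architecture proposed later in this report; the module and signal names are assumptions.

    // Behavioral reference model (sketch): median of a 3x3 neighbourhood.
    // The nine 8-bit pixels are packed into one 72-bit input.
    module median9_behav (
        input  wire [71:0] window,   // nine packed 8-bit pixels
        output reg  [7:0]  median    // middle-ranked value
    );
        reg [7:0] s [0:8];
        reg [7:0] tmp;
        integer   i, j;

        always @* begin
            // unpack the window into a local array
            for (i = 0; i < 9; i = i + 1)
                s[i] = window[i*8 +: 8];
            // simple bubble sort into ascending order
            for (i = 0; i < 8; i = i + 1)
                for (j = 0; j < 8 - i; j = j + 1)
                    if (s[j] > s[j+1]) begin
                        tmp    = s[j];
                        s[j]   = s[j+1];
                        s[j+1] = tmp;
                    end
            // the fifth element (index 4) is the median of nine values
            median = s[4];
        end
    endmodule

For the example window above, the sorted neighbourhood is 115, 119, 120, 123, 124, 125, 126, 127, 150, and s[4] returns 124.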

1.4 MEAN FILTER

Common Names: Mean filtering, Smoothing, Averaging, Box filtering.

1.4.1 Brief Description

Mean filtering is a simple, intuitive and easy to implement method of


smoothing images, i.e. reducing the amount of intensity variation between one
pixel and the next. It is often used to reduce noise in images.

1.4.2 How It Works

The idea of mean filtering is simply to replace each pixel value in an image
with the mean (`average') value of its neighbours, including itself. This has the
effect of eliminating pixel values which are unrepresentative of their surroundings.
Mean filtering is usually thought of as a convolution filter. Like other convolutions
it is based around a kernel, which represents the shape and size of the
neighbourhood to be sampled when calculating the mean. Often a 3×3 square
kernel is used, as shown in Figure 1.4, although larger kernels (e.g. 5×5 squares) can
be used for more severe smoothing. (Note that a small kernel can be applied more
than once in order to produce a similar - but not identical - effect as a single pass
with a large kernel.)

Figure 1.4 3×3 averaging kernel often used in mean filtering

Computing the straightforward convolution of an image with this kernel


carries out the mean filtering process.
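For comparison with the median example of Section 1.3, a behavioral sketch of the corresponding 3×3 mean (box) filter computation is given below: the nine pixels are summed and divided by nine. The constant division is written directly for clarity, although a real design might replace it with a shift-and-add approximation; all names are illustrative assumptions.

    // Behavioral sketch: arithmetic mean of a 3x3 neighbourhood.
    module mean9_behav (
        input  wire [71:0] window,   // nine packed 8-bit pixels
        output wire [7:0]  mean      // arithmetic mean of the window
    );
        integer     i;
        reg  [11:0] sum;             // 9 x 255 = 2295 fits in 12 bits

        always @* begin
            sum = 0;
            for (i = 0; i < 9; i = i + 1)
                sum = sum + window[i*8 +: 8];
        end

        // divide-by-9; a real design might approximate this with shifts and adds
        assign mean = sum / 9;
    endmodule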

1.5 CRIMMINS SPECKLE REMOVAL

Common Names : Crimmins Speckle Removal

1.5.1 Brief Description

Crimmins Speckle Removal reduces speckle from an image using the


Crimmins complementary hulling algorithm. The algorithm has been specifically
designed to reduce the intensity of salt and pepper noise in an image. Increased
iterations of the algorithm yield increased levels of noise removal, but also
introduce a significant amount of blurring of high frequency details.

1.5.2 How It Works

Crimmins Speckle Removal works by passing an image through a speckle


removing filter which uses the complementary hulling technique to reduce the
speckle index of that image. The algorithm uses a non-linear noise reduction
technique which compares the intensity of each pixel in an image with those of its

8 nearest neighbours and, based upon the relative values, increments or decrements
the value of the pixel in question such that it becomes more representative of its
surroundings. The noisy pixel alteration (and detection) procedure used by
Crimmins is more complicated than the ranking procedure used by the non-linear
median filter. It involves a series of pairwise operations in which the value of
the 'middle' pixel within each neighbourhood window is compared, in turn, with each
set of neighbours (N-S, E-W, NW-SE, NE-SW) in a search for intensity spikes.
The operation of the algorithm is described in more detail below.

For each iteration and for each pair of pixel neighbours, the entire image is
sent to a Pepper Filter and a Salt Filter. In the example case, the Pepper Filter
is first called to determine whether each image pixel is darker_than (i.e., by more
than 2 intensity levels) its northern neighbour. Comparisons where this condition
proves true cause the intensity value of the pixel under examination to be
incremented twice (lightened); otherwise, no change is effected. Once these changes
have been recorded, the entire image is passed through the Pepper Filter again and
the same series of comparisons is made between the current pixel and its southern
neighbour. This sequence is repeated by the Salt Filter, where the conditions
lighter_than and darken are, again, instantiated using 2 intensity levels.
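The fragment below sketches just one of these pairwise comparisons in Verilog: the darker_than test against the northern neighbour followed by the two-level lightening. It is not the complete complementary hulling algorithm, and the module name, signal names, and 8-bit width are assumptions.

    // One darker_than/lighten step from the description above (sketch only).
    module pepper_step_north (
        input  wire [7:0] pixel,       // pixel under examination
        input  wire [7:0] north,       // its northern neighbour
        output reg  [7:0] pixel_next   // pixel value after this step
    );
        always @* begin
            // darker_than: more than 2 intensity levels below the neighbour
            if ({1'b0, pixel} + 9'd2 < {1'b0, north})
                pixel_next = pixel + 8'd2;   // lighten the pixel by two levels
            else
                pixel_next = pixel;          // otherwise leave it unchanged
        end
    endmodule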

Note that, over several iterations, the effects of smoothing in this way
propagate out from the intensity spike to infect neighbouring pixels. In other
words, the algorithm smoothes by reducing the magnitude of a locally inconsistent
pixel, as well as increasing the magnitude of pixels in the neighbourhood
surrounding the spike. It is important to notice that a spike is defined here as a
pixel whose value is more than 2 intensity levels different from its surroundings.

CHAPTER 2

LITERATURE REVIEW

2.1 R.-D. Chen, P.-Y. Chen, and C.-H. Yeh, “Design of an area-efficient one
dimensional median filter,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol.
60, no. 10, pp. 662–666, Oct. 2013.

An area-efficient 1-D median filter based on the sorting network is presented


in this brief. It is a word-level filter, storing the samples in the window in
descending order according to their values. When a sample enters the window, the
oldest sample is removed, and the new sample is inserted in an appropriate position
to preserve the sorting of samples. To increase the throughput, the deletion and
insertion of samples are performed in one clock cycle, so that the median output is
generated at each cycle. The experimental results have shown the improved area
efficiency of our design in comparison with previous work. The architecture of our
1-D median filter with window size N is given.

It consists of N cascaded blocks, one for each window rank. Each cell block
ci is composed of a data register (Ri), an m-bit (m = log2 N) age register (Pi), and a
control unit for data transfer. Register Ri stores the value of the sample in cell ci,
and register Pi keeps the age of this sample, i.e., the number of clock cycles the
sample has experienced in the window. All cells are connected to a global input X,
through which they receive the new input sample. At each machine cycle,
depending on the value of the input sample, the data transfer control unit
determines if each cell will keep its original sample, receive the input sample from
the outside, or receive the sample from its adjoining cell on the left or right. The
window samples are stored in descending order from left to right, so that at any
time, the maximum sample is at the leftmost register (R1) and the minimum
sample is at the rightmost register (RN). The median value of the window will be
at the center cell (R(N+1)/2), assuming N is an odd number.

2.2 V. G. Moshnyaga and K. Hashimoto, “An efficient implementation of 1-D


median filter,” in Proc.52nd IEEE Int. MWSCAS, 2009, pp. 451–454.

This paper presents a new architecture and circuit implementation of 1-D


median filter. The proposed circuit belongs to the class of non-recursive sorting
network architectures that process the input samples sequentially in the word-based
manner. In comparison to the related schemes, it maintains sorting of samples from
the previous position of the sliding window, positioning only the incoming sample
to the correct rank. Unlike existing 1-D filter implementations, the circuit has
linear hardware complexity, minimal latency and achieves throughput of 1/2 of the
sampling rate. Experimental evaluation and comparisons show high efficiency of
our design. This paper presented a novel circuit for median filtering of one-
dimensional signals. The circuit has a linear complexity, proportional to the
window length of the filter. The prototype FPGA implementations of the filter for
the window size of 5, 9, and 25 confirm functional correctness of the proposed
circuit and its ability to operate at 1/2 of the sampling rate, 2 clock-cycle latency
and high clock frequency. Currently we are working on custom VLSI chip design
and extending the circuit functionality to 2-D median filters.

2.3 O.T.-C. Chen, S. Wang, and Y.-W. Wu, “Minimization of switching
activities of partial products for designing low-power multipliers,” IEEE
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 3, pp. 418–433 Jun.
2003.

This work presents low-power 2's complement multipliers by minimizing the


switching activities of partial products using the radix-4 Booth algorithm. Before
computation for two input data, the one with a smaller effective dynamic range is
processed to generate Booth codes, thereby increasing the probability that the
partial products become zero. By employing the dynamic-range determination unit
to control input data paths, the multiplier with a column-based adder tree of
compressors or counters is designed. To further reduce power consumption, the
two multipliers based on row-based and hybrid-based adder trees are realized with
operations on effective dynamic ranges of input data. Functional blocks of these
two multipliers can preserve their previous input states for non effective dynamic
data ranges and thus, reduce the number of their switching operations.

To illustrate the proposed multipliers exhibiting low-power dissipation, the


theoretical analyses of switching activities of partial products are derived. The
proposed 16 × 16-bit multiplier with the column-based adder tree
conserves more than 31.2%, 19.1%, and 33.0% of power consumed by the
conventional multiplier, in applications of the ADPCM audio, G.723.1 speech, and
wavelet-based image coders, respectively. Furthermore, the proposed multipliers
with row-based, hybrid-based adder trees reduce power consumption by over
35.3%, 25.3% and 39.6%, and 33.4%, 24.9% and 36.9%, respectively. When
considering product factors of hardware areas, critical delays and power
consumption, the proposed multipliers can outperform the conventional
multipliers. Consequently, the multipliers proposed herein can be broadly used in

various media processing to yield low-power consumption at limited hardware cost
or little slowing of speed.

2.4 J. Cadenas, G. M. Megson, R. S. Sherratt, and P. Huerta, “Fast median


calculation method,” Electron. Lett., vol. 48, no. 10, pp. 558–560, May
2012.

The ever increasing demand for high image quality requires fast and efficient
methods for noise reduction. The best known order-statistics filter is the median
filter. A method is presented to calculate the median on a set of N W-bit integers
in W/B time steps. Blocks containing B-bit slices are used to find B bits of the
median, using a novel quantum-like representation allowing the median to be
computed in an accelerated manner compared to the best known method (W time
steps). The general method allows a variety of designs to be synthesised
systematically. A further novel architecture to calculate the median for a moving
set of N integers is also discussed. This method reduces by B the number of steps
to calculate the median on a set of N items compared to previously known
methods. We found parameters B and N for which circuit implementations also
make the method faster. This is ideal for real-time processing of images. The
method applies to non-negative or signed integers and also easily extends to the
case of rank filtering. Multiple architectures can be derived with different tradeoffs
between area and time in fully pipelined structures.

2.5 D. Prokin and M. Prokin, “Low hardware complexity pipelined rank
filter,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 6, pp. 446–
450, Jun. 2010.
The major benefit of a disclosed low-hardware-complexity pipelined
rank filter is reduction in hardware complexity and increase in processing
speed, due to identical pipelined stages and the absence of mask bits. Field-
programmable-gate-array realization of this filter significantly reduces the
number of used logic elements and registers, in comparison with the best prior
art methods, and, at the same time, increases the maximum operating frequency.

One rank-sample result is available at the output on each clock cycle,


thus enabling real-time nonlinear image processing. A low-hardware-
complexity pipelined rank-filtering algorithm has been developed,
implemented, and verified in various FPGAs produced by Xilinx and Altera.
This architecture has provided significantly lower complexity, in comparison
with the best prior art methods. Furthermore, the hardware embodiment of
either bit-pipeline (LCBP) or bit-feedback (LCBF) rank-filtering algorithm is
very regular and can easily be adjusted for different window sizes N, as well as
different sample sizes W by simple modification of two constants N and W in
the appropriate VHDL code available from the authors. Both LCBP and LCBF
have favorably been compared with the best masking methods and the best
sorting methods, based on the maximum operating frequency and used chip
resources, confirming LCBP and LCBF applicability for real-time 1-D signal
and image processing. Both LCBP and LCBF are also easily applicable as
building blocks in complex applications of median filters, such as adaptive,
weighted, switching, and cascade filters.

CHAPTER 3

PROJECT DESCRIPTION

3.1 INTRODUCTION

Lowering the dynamic power of a very-large-scale integrated circuit is an


effective way to reduce the total power consumption. One of the best ways to
reduce the dynamic power dissipation is to minimize the switching activities,
i.e., the number of signal transitions in the circuit. The token-ring architecture,
adopted in the design of a mixed-timing first-in–first-out (FIFO) interface, offers
the potential for low power consumption since data are immobile in the FIFO. In
their designs, once data is queued, it will not be moved and is simply dequeued in
place. A token, which is represented as logic state 1 in the token register, is used to
control the dequeuing of old data and queuing of new data at the same time. All the
token registers form a token ring, which is a succession of nodes interconnected in
a circular manner. The token-ring architecture with exactly one token will be
adopted in our design for lowering the power consumption of a one-dimensional (1-D)
median filter. A median filter is a nonlinear filter widely used in digital signal
and image processing for the smoothing of signals, suppression of impulse noise,
and edge preservation.

The median filter replaces a sample with the middle-ranked value among all
the samples inside the sample window, centered around the sample in question.
Depending on the number of samples processed at the same cycle, there are two
types of architectures for hardware design, i.e., word-level architecture and bit-
level architecture. In the word-level architecture, the input samples are sequentially

processed word by word, and the bits of the sample are processed in parallel. On
the contrary, the bit-level architecture processes the samples in parallel and the bits
of the incoming samples are sequentially processed. In this brief, the word-level
architecture will be adopted in the design of a low-power median filter for practical
use. The median of a set of samples in the word-level sorting network is often
computed by first sorting the input samples and then selecting the middle value.

In their methods, the input samples are sequentially processed word by


word, and the incoming sample is inserted into the correct rank in two steps. In the
first step, the oldest sample is removed from the window by moving some of the
stored samples to the left. In the second step, the incoming sample is compared
with the already sorted samples and then inserted in the right place by moving
some of them to the right. The difference between the two architectures is that
these two steps are performed in two separate clock cycles in one design, whereas
the other takes only one cycle.

In both of their methods, however, some of the stored samples have to be


shifted left or right, depending on their values when a new input sample enters the
window. For some applications that require a larger sample width, more signal
transitions in the circuit are needed; i.e., more dynamic power will be consumed.
To overcome this problem, a new median filter architecture targeting low power
consumption is proposed. Instead of sorting the samples physically in the window,
the stored samples are kept immobile there. Only the rank of each sample, which
uses fewer bits, has to be updated at each new cycle when an input sample enters
the window. Since our architecture is implemented as a two-stage pipeline, the
median output, which is the sample with median rank, will also be generated at
each cycle. The improvement in power consumption is achieved by utilizing a

token ring in our architecture. Since the stored samples in the window are
immobile, our architecture is suitable for low-power applications.

3.2 ARCHITECTURE OF THE LOW POWER FILTER


Figure 3.2 gives an overview of our low-power median filter architecture with
window size N. It consists of a circular array of N identical cells and three
auxiliary modules: rank calculation (RankCal), rank selection (RankSel), and
median selection (MedianSel). All the cells are also connected to a global input
register X, through which they receive the incoming sample, and the median is
stored in the output register Y .

Figure 3.2 Low Power Filter Architecture

The architecture is implemented as a two-stage pipeline, where the registers
in all the cells serve as the internal pipeline registers. All the registers in the
architecture are synchronized by the rising edge of a global clock. Each cell block
ci is composed of a rank generation (RankGen) module, a comparator module
“==,” and three registers: an m-bit (m = log2 N) rank register (Pi), a data register
(Ri), and a 1-bit token register (Ti). Register Ri stores the value of the sample in
cell ci, register Pi keeps the rank of this sample, and the enable signal (en) of Ri is
stored in register Ti. All the samples in the window are ranked according to their
values, regardless of their physical locations in the window. In our design, a cell
with a greater sample value will be associated with a greater rank. However, for
two cells ci and cj , whose sample values are equal, ci will be given a greater rank
if Ri is newer than Rj (or Rj is older than Ri); i.e., the sample in cj enters the
window earlier than the sample in ci. The rank is hence unique for each cell. For a
window with size N, the rank starts from 1 for a cell with the least sample value,
and ends with N for a cell with the greatest sample value.
The median of the window can then be obtained from the sample value Ri
of a cell ci whose rank Pi is equal to (N + 1)/2, assuming N is an odd number. In
the architecture, the input sample enters the window in a FIFO manner. After it is
queued, it will not be moved and is simply dequeued in place. A token, which is
represented as logic state 1 in the token register of some cell, is used to control the
dequeuing of old sample and queuing of new input sample at the same time. After
the token is used, it will be passed to the next cell at a new cycle. All the token
registers form a token ring with exactly one token. It serves as a state machine in
the circuit, where the location of the token defines the state of the machine.
Since the data in each register Ri is immobile, our architecture has the
potential for low-power applications. At the initial state of the architecture, to

make the first incoming sample be stored in the first cell c1, the token is assumed
to exist in the last cell cN, shown as the shadowed circle at the output of register
TN. Whenever an input sample enters the window at a new cycle, the rank of each
cell has to be updated. It may have to be recalculated, or it may be the old rank
decremented by 1, incremented by 1, or kept unchanged.
The new rank of each cell, which is denoted by signal Qi in the cell, is
generated by the RankGen module. Each RankGen module receives signal A from
the RankCal module and signal B from the RankSel module. Signal A is the
recalculated rank of a cell ci that contains the token and signal B is the old rank of
ci. Moreover, signal Yi is the output of a comparator module “==,” which
compares the value of rank Pi with a constant value (N + 1)/2 so that Yi = 1 if Pi is
equal to (N + 1)/2, else Yi = 0. This signal is used to indicate if the corresponding
cell ci contains the median in Ri. Similar to the RankSel module, the MedianSel
module transfers the value of Ri to the output register Y if Yi = 1; i.e., if the median
is stored in Ri.
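A minimal sketch of the per-cell registers implied by this description is given below: Ri is loaded from the global input X only in the cell that holds the token (Ti acts as its enable), Pi is updated every cycle from the RankGen output Qi, and the token bit simply shifts in from the neighbouring cell. The reset behaviour, parameter values, and port names are assumptions.

    // Registers of one cell ci (sketch). PW is the rank width m, DW the
    // sample width; t_prev is the token bit arriving from the previous cell.
    module cell_regs #(parameter DW = 8, parameter PW = 4) (
        input  wire          clk,
        input  wire          rst,
        input  wire [DW-1:0] X,       // global input sample
        input  wire [PW-1:0] Qi,      // new rank from the RankGen module
        input  wire          t_prev,  // token from the neighbouring cell
        output reg  [DW-1:0] Ri,      // data register
        output reg  [PW-1:0] Pi,      // rank register
        output reg           Ti       // token register
    );
        always @(posedge clk) begin
            if (rst) begin
                Pi <= {PW{1'b0}};     // ranks reset to zero (window empty)
                Ti <= 1'b0;           // initial token placement is handled elsewhere
            end else begin
                Ti <= t_prev;         // token ring: pass the token each cycle
                Pi <= Qi;             // rank updated every cycle
                if (Ti)               // Ti acts as the enable of Ri:
                    Ri <= X;          // only the token cell captures the new sample
            end
        end
    endmodule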

3.3 RANK GENERATION

For the RankGen module in a cell ci, its implementation is shown in Figure 3.6.2.
Signal Fi is the output of a comparator module "<=," which compares the value of Ri
with that of the input sample X, so that Fi = 1 if Ri is less than or equal to X
(Ri <= X), else Fi = 0 (Ri > X). Signal Ai is the output of a logic AND gate, so
that Ai = 1 if Ti = 0 and Fi = 1; i.e., if cell ci does not contain the token and
Ri is less than or equal to X, else Ai = 0. The Ai signal of each cell is connected
to the RankCal module, which calculates the new rank of a cell that contains the
token. As described in Section 3.5.1, the new rank is calculated as K + 1, where K
is the number of cells that do not contain the token and whose sample value is less
than or equal to X; i.e., K is equal to the number of logic 1's on the Ai signals
of all the cells. The RankCal
module can be implemented by a multi-input adder that adds all the Ai signals and
then increments the sum by 1. If cell ci contains the token, the output A of the
RankCal module will be its new rank at the next cycle. On the contrary, if cell ci
does not contain the token, its new rank will be determined by the other signals.
Signal Gi is the output of a comparator module “>,” which compares the
value of rank Pi with that of signal B so that Gi = 1 if Pi is greater than B, else Gi =
0. Since the value of B is the rank Pj of another cell cj that contains the token, the
meaning of Gi can also be described as Gi = 1 if Pi is greater than Pj (Pi > Pj),
else Gi = 0 (Pi <= Pj). Signal Ei is the output of a comparator module "==," which
also compares the value of Pi with that of B so that Ei = 1 if Pi is equal to B, else
Ei = 0. Likewise, the meaning of Ei can be described as Ei = 1 if Pi is equal to Pj
(Pi = Pj), else Ei = 0. Combining these two signals Ei and Gi, for the three
relations between Pi and Pj (Pi > Pj , Pi = Pj , and Pi < Pj ), the corresponding
value of EiGi will be 01, 10, and 00, respectively.
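One direct way to realize the RankCal module described above is a combinational population count of the Ai flags followed by an increment. The sketch below assumes a window size of N = 9 and a 4-bit rank, which are illustrative choices rather than values fixed by the design.

    // RankCal (sketch): new rank K + 1 for the cell holding the token,
    // where K is the number of asserted Ai flags.
    module rank_cal #(parameter N = 9, parameter M = 4) (
        input  wire [N-1:0] Ai,   // one flag per cell: ~Ti & (Ri <= X)
        output reg  [M-1:0] A     // recalculated rank, K + 1
    );
        integer i;
        always @* begin
            A = 1;                        // start from 1 (the "+ 1" term)
            for (i = 0; i < N; i = i + 1)
                A = A + Ai[i];            // add each flag to count K
        end
    endmodule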

3.4 RANK SELECTION

The RankSel module is responsible for transferring the rank Pi of a cell ci to


its output B if ci contains the token; i.e., when Ti = 1. Figure 3.6 shows a simple
implementation of this module using AND/OR gates. It can also be implemented
by tristate buffers, as shown in Figure 3.6.1, where B is the output of a global data bus that
collects the output signals of all the tristate buffers. Since there exists exactly one
cell that contains the token at any time; i.e., there exists exactly one Ti signal
whose value is equal to 1, the value of B will always be valid. The MedianSel
module can also be implemented in a similar way. It transfers the value of Ri to the
output register Y if Ri is the median; i.e., when Yi = 1. However, if it is implemented

by tristate buffers, the median will be valid only when there exist at least (N + 1)/2
samples in the window; otherwise, it will remain in a high impedance state.
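One possible AND/OR realization of the RankSel module just described is sketched below: each rank Pi is gated by its replicated token bit Ti and the gated values are ORed onto B. The packed-vector ports and parameter values are assumptions made to keep the listing compact.

    // RankSel, AND/OR implementation (sketch). Exactly one Ti is 1 at any
    // time, so the OR of the gated ranks equals the rank of the token cell.
    module rank_sel #(parameter N = 9, parameter M = 4) (
        input  wire [N*M-1:0] P,   // ranks P1..PN, packed M bits per cell
        input  wire [N-1:0]   T,   // token bits T1..TN
        output reg  [M-1:0]   B    // rank Pj of the cell holding the token
    );
        integer i;
        always @* begin
            B = {M{1'b0}};
            for (i = 0; i < N; i = i + 1)
                B = B | (P[i*M +: M] & {M{T[i]}});   // AND with token, then OR
        end
    endmodule

Because exactly one token bit is high, the OR reduces to the selected rank without any priority logic.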

3.5 RANK UPDATING

This section explains how to determine the new rank for each cell. Two
types of cells will be separately discussed: a cell with the token and a cell without
the token.

3.5.1 Cell With the Token

For a cell ci with the token, its sample value Ri will be replaced by the
input sample X, and its rank Pi has to be recalculated. The new value of Pi can be
obtained by comparing X with the sample values of all the other N − 1 cells that
do not contain the token. For these cells, if K is the number of cells whose sample
value is less than or equal to X, the new value of Pi will be K + 1. At cycle t6 of
Fig. 2, for example, the new value of rank P1 will be calculated as 2 + 1 at the
next cycle t7.

3.5.2 Cell Without the Token

For a cell ci without the token, its sample value Ri will not be affected when
an input sample X enters the window. However, its rank Pi may be affected by the
sample value Rj of another cell cj that contains the token. Since the value of Rj will
be replaced by X, the relation between Ri and Rj may change. However, the
sample value Rk of any other cell ck that does not contain the token will not affect
Pi since the value of Rk will not be changed. Depending on the relation between Pi
and Pj, and the relation between Ri and X, the new value of Pi may be
decremented by 1, incremented by 1, or kept unchanged when an input sample X is

inserted into the window. The change of rank Pi can be explained as the following
five cases, where cell ci does not contain the token, but cell cj does.
Case 1 (Decremented by 1), Pi > Pj and Ri <= X: If rank Pi is greater than
rank Pj , Ri is greater than or equal to (but newer than) Rj at the current cycle. If Ri
is less than or equal to the input sample X, Ri will be less than or equal to (but
older than) the new value of Rj at the next cycle. In other words, Rj will change
from a value that is less than or equal to (but older than) Ri to a value that is
greater than or equal to (but newer than) Ri. Therefore, the number of cells whose
sample value is less than or equal to (but older than) Ri will be decremented by 1 at
the next cycle; i.e., Pi has to be decremented by 1. Take rank P4 at cycle t6 of Fig.
2, for example, its value will be decremented by 1 (from 3 to 2) at the next cycle
t7.
Case 2 (Incremented by 1), Pi < Pj and Ri > X: Similar to Case 1, if rank Pi
is less than rank Pj , Ri is less than or equal to (but older than) Rj at the current
cycle. If Ri is greater than the input sample X, Ri will be greater than the new value
of Rj at the next cycle. Therefore, the number of cells whose sample value is less
than or equal to (but older than) Ri will be incremented by 1 at the next cycle; i.e.,
Pi has to be incremented by 1.
Case 3 (Kept Unchanged), Pi < Pj and Ri <= X: If Pi is less than Pj, Ri is
less than or equal to (but older than) Rj at the current cycle. If Ri is less than or
equal to X, Ri will also be less than or equal to (but older than) the new value of Rj
at the next cycle. Therefore, the number of cells whose sample value is less than or
equal to (but older than) Ri at the current cycle will be equal to that at the next
cycle; i.e., Pi has to be kept unchanged. For example, at cycle t7 of Fig. 2, the
value of rank P3 has to be kept unchanged at one at the next cycle t8.
Case 4 (Kept Unchanged), Pi > Pj and Ri > X: Similar to Case 3, if Pi is
greater than Pj and Ri is greater than X, the number of cells whose sample value is
less than or equal to (but older than) Ri at the current cycle will also be equal to
that at the next cycle; i.e., Pi has to be kept unchanged.
Case 5 (Kept Unchanged), Pi = Pj: This case occurs when the window is
not yet fully occupied with valid data. At the initial state, the rank of each cell is
reset to be zero. After the window is fully occupied, each cell will be assigned a
nonzero and unique rank. If rank Pi is equal to rank Pj , the values of Pi and Pj are
both zero, and neither cell ci nor cell cj contains valid data. The new value of Pi at
the next cycle will still be zero since ci does not contain the token; i.e., Pi has to be
kept unchanged at zero. At cycle t2 of Fig. 3.2.1, for example, the value of rank P4
will still be zero at the next cycle t4.
It can be concluded from the above discussion that for a cell ci that contains
the token, its rank Pi has to be recalculated at a new cycle. If ci does not contain
the token, its rank Pi may be decremented by 1, incremented by 1, or kept
unchanged. Therefore, there are four sources for ci to update its rank.
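These four sources map naturally onto a per-cell 4-to-1 multiplexer, whose control encoding is described in Section 3.6.3. A behavioral sketch of the selection is given below; the signal names follow the text and the rank width is an assumption.

    // Per-cell selection of the new rank Qi from its four sources (sketch).
    // S = {S1, S0} is generated by the Ctrl module; A is the RankCal output.
    module rank_update_mux #(parameter M = 4) (
        input  wire [1:0]   S,
        input  wire [M-1:0] A,    // recalculated rank (cell holding the token)
        input  wire [M-1:0] Pi,   // current rank of this cell
        output reg  [M-1:0] Qi    // new rank loaded into Pi at the next clock edge
    );
        always @* begin
            case (S)
                2'b11:   Qi = A;        // token cell: take the recalculated rank
                2'b10:   Qi = Pi - 1;   // Case 1: decremented by 1
                2'b01:   Qi = Pi + 1;   // Case 2: incremented by 1
                default: Qi = Pi;       // Cases 3-5: kept unchanged
            endcase
        end
    endmodule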

3.6 CIRCUIT IMPLEMENTATION


Here, the implementations of the RankSel, MedianSel, RankGen, and
RankCal modules in the architecture will be discussed.

3.6.1 RankSel and MedianSel Modules

The RankSel module is responsible for transferring the rank Pi of a cell ci to


its output B if ci contains the token; i.e., when Ti = 1. Figure 3.6 shows a simple
implementation of this module using AND/OR gates. It can also be implemented
by tristate buffers in Fig. 3.6.1.

Figure 3.6 RankSel implementation using AND/OR gates.

Figure 3.6.1 RankSel implementation using tristate buffers.

Since there exists exactly one Ti signal whose value is equal to 1, the value of B
will always be valid. The MedianSel module can also be implemented in a similar
way. It transfers the value of Ri to the output register Y if Ri is the median; i.e.,
when Yi = 1. However, if it is implemented by tristate buffers, the median will be
valid only when there exist at least (N + 1)/2 samples in the window; otherwise, it
will remain in a high impedance state.
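The tristate-buffer form of Figure 3.6.1 reduces, per cell, to a single conditional driver onto the shared bus; a minimal sketch (with an assumed rank width M) is given below.

    // Per-cell tristate driver (sketch): the cell drives the shared bus B
    // only while it holds the token; otherwise it leaves B at high impedance.
    module rank_sel_tri #(parameter M = 4) (
        input  wire [M-1:0] Pi,   // rank of this cell
        input  wire         Ti,   // token bit of this cell
        output wire [M-1:0] B     // shared rank bus segment driven by this cell
    );
        assign B = Ti ? Pi : {M{1'bz}};
    endmodule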

3.6.2 RankGen and RankCal Modules

For the RankGen module in a cell ci, its implementation is given in Figure 3.6.2.
Signal Fi is the output of a comparator module "<=," which compares the value of Ri
with that of the input sample X, so that Fi = 1 if Ri is less than or equal to X
(Ri <= X), else Fi = 0 (Ri > X). Signal Ai is the output of a logic AND gate, so
that Ai = 1 if Ti = 0 and Fi = 1; i.e., if cell ci does not contain the token and
Ri is less than or equal to X, else Ai = 0.
Figure 3.6.2 Implementation of the Rank Generation module.

Figure 3.6.2.1 Implementation of the Ctrl Module.

The Ai signal of each cell is connected to the RankCal module, which


calculates the new rank of a cell that contains the token. K is equal to the number
of logic 1’s on the Ai signals of all the cells. The RankCal module can be
implemented by a multi-input adder that adds all the Ai signals and then increments
the sum by 1. If cell ci contains the token, the output A of the RankCal module will
be its new rank at the next cycle. On the contrary, if cell ci does not contain the
token, its new rank will be determined by the other signals. Signal Gi is the output
of a comparator module “>,” which compares the value of rank Pi with that of
signal B so that Gi = 1 if Pi is greater than B, else Gi = 0. Since the value of B is
the rank Pj of another cell cj that contains the token, the meaning of Gi can also be

described as Gi = 1 if Pi is greater than Pj (Pi > Pj), else Gi = 0 (Pi <= Pj).
Signal Ei is the output of a comparator module "==," which also compares the value
of Pi with that of B, so that Ei = 1 if Pi is equal to B, else Ei = 0. Likewise,
the meaning of Ei can be described as Ei = 1 if Pi is equal to Pj (Pi = Pj), else
Ei = 0. Combining these two signals Ei and Gi, for the three relations between Pi
and Pj (Pi > Pj, Pi = Pj, and Pi < Pj), the corresponding value of EiGi will be 01,
10, and 00, respectively.
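The comparator outputs described in this subsection can be written directly as continuous assignments. The sketch below gathers them for one cell; the port widths and the module name are assumptions.

    // Per-cell flag generation inside RankGen (sketch).
    module rank_gen_flags #(parameter DW = 8, parameter PW = 4) (
        input  wire [DW-1:0] Ri,   // stored sample of this cell
        input  wire [DW-1:0] X,    // incoming input sample
        input  wire [PW-1:0] Pi,   // current rank of this cell
        input  wire [PW-1:0] B,    // rank Pj of the cell holding the token
        input  wire          Ti,   // token bit of this cell
        output wire          Fi,   // Ri <= X
        output wire          Ai,   // ~Ti & Fi, sent to the RankCal module
        output wire          Gi,   // Pi > Pj
        output wire          Ei    // Pi == Pj
    );
        assign Fi = (Ri <= X);
        assign Ai = ~Ti & Fi;
        assign Gi = (Pi > B);
        assign Ei = (Pi == B);
    endmodule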

3.6.3 Ctrl Module

For a cell ci, since there are four possible sources to update its rank, a 4-to-1
multiplexer is used to select one of these sources for signal Qi, as shown in Figure 3.6.2. Then,
the value of rank Pi will be updated by the value of Qi at each cycle. The
multiplexer is controlled by two selection signals S1 and S0; and these two signals,
which are generated by the Ctrl module, are determined by four signals Ti, Ei, Fi,
and Gi. If cell ci contains the token (Ti = 1), its new rank is obtained from the
output A of the RankCal module; i.e., the value of S1S0 should be 11 when Ti = 1.
If cell ci does not contain the token (Ti = 0), there are five cases, as
described in Section 3.5.2. In Case 1, when Pi > Pj (EiGi = 01) and Ri <= X (Fi =
1), the rank of ci will be decremented by 1; i.e., the value of S1S0 should be 10
when EiFiGi = 011.
Similarly in Case 2, when Pi < Pj (EiGi = 00) and Ri > X (Fi = 0), the rank
of ci will be incremented by 1; i.e., the value of S1S0 should be 01 when EiFiGi =
000. Finally, when Pi < Pj (EiGi = 00) and Ri <= X (Fi = 1) in Case 3, Pi > Pj
(EiGi = 01) and Ri > X (Fi = 0) in Case 4, and Pi = Pj (EiGi = 10) in Case 5, the
rank of ci will be kept unchanged; i.e., when EiFiGi = 010, 001, or 1-0 (where "-"
denotes a don't care), the value of S1S0 should be 00. Figure 3.6.2.1 depicts a
simple implementation of the Ctrl module.
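The truth table just described translates into a few lines of combinational Verilog. The sketch below follows the encoding given in the text (S1S0 = 11 to take the recalculated rank, 10 to decrement, 01 to increment, and 00 to keep the rank unchanged); the module name is an assumption.

    // Ctrl module (sketch): generates the mux select {S1, S0} from Ti, Ei, Fi, Gi.
    module ctrl (
        input  wire Ti, Ei, Fi, Gi,
        output reg  S1, S0
    );
        always @* begin
            if (Ti)                      {S1, S0} = 2'b11;  // token cell: rank from RankCal
            else if (!Ei &&  Fi &&  Gi)  {S1, S0} = 2'b10;  // Case 1: EiFiGi = 011, decrement
            else if (!Ei && !Fi && !Gi)  {S1, S0} = 2'b01;  // Case 2: EiFiGi = 000, increment
            else                         {S1, S0} = 2'b00;  // Cases 3-5: keep unchanged
        end
    endmodule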
CHAPTER 4

SOFTWARE DESCRIPTION

4.1 HARDWARE DESCRIPTION LANGUAGE

In electronics, a hardware description language or HDL is any language


from a class of computer languages, specification languages, or modeling
languages for formal description and design of electronic circuits, and most
commonly, digital logic. It can describe the circuit's operation, its design and
organization, and tests to verify its operation by means of simulation.

HDLs are standard text-based expressions of the spatial and temporal


structure and behavior of electronic systems. Like concurrent programming
languages, HDL syntax and semantics include explicit notations for expressing
concurrency. However, in contrast to most software programming languages,
HDLs also include an explicit notion of time, which is a primary attribute of
hardware. Languages whose only characteristic is to express circuit connectivity
between a hierarchy of blocks are properly classified as netlist languages used in
electric computer-aided design (CAD).

HDLs are used to write executable specifications of some piece of hardware.


A simulation program, designed to implement the underlying semantics of the
language statements, coupled with simulating the progress of time, provides the
hardware designer with the ability to model a piece of hardware before it is created
physically. It is this executability that gives HDLs the illusion of being
programming languages, when they are more-precisely classed as specification
languages or modeling languages. Simulators capable of supporting discrete-event

(digital) and continuous-time (analog) modeling exist, and HDLs targeted for each
are available.

It is certainly possible to represent hardware semantics using traditional


programming languages such as C++, although to function such programs must be
augmented with extensive and unwieldy class libraries. Primarily, however,
software programming languages do not include any capability for explicitly
expressing time and this is why they do not function as a hardware description
language. Before the recent introduction of System Verilog, C++ integration with a
logic simulator was one of the few ways to use OOP in hardware verification.
System Verilog is the first major HDL to offer object orientation and garbage
collection.

Using the proper subset of virtually any (hardware description or software


programming) language, a software program called a synthesizer (or synthesis
tool) can infer hardware logic operations from the language statements and
produce an equivalent net list of generic hardware primitives to implement the
specified behaviour. Synthesizers generally ignore the expression of any timing
constructs in the text. Digital logic synthesizers, for example, generally use clock
edges as the way to time the circuit, ignoring any timing constructs. The ability to
have a synthesizable subset of the language does not itself make a hardware
description language.

4.1.1 Design using HDL

Efficiency gains realized using HDL mean that a majority of modern digital


circuit design revolves around it. Most designs begin as a set of requirements or a
high-level architectural diagram. Control and decision structures are often
prototyped in flowchart applications, or entered in a state-diagram editor. The

process of writing the HDL description is highly dependent on the nature of the
circuit and the designer's preference for coding style. The HDL is merely the
'capture language'—often beginning with a high-level algorithmic description such
as MATLAB or a C++ mathematical model. Designers often use scripting
languages (such as Perl) to automatically generate repetitive circuit structures in
the HDL language. Special text editors offer features for automatic indentation,
syntax-dependent coloration, and macro-based expansion of entity/architecture/signal
declarations.

The HDL code then undergoes a code review, or auditing. In preparation for
synthesis, the HDL description is subject to an array of automated checkers. The
checkers report deviations from standardized code guidelines, identify potential
ambiguous code constructs before they can cause misinterpretation, and check for
common logical coding errors, such as dangling ports or shorted outputs. This
process aids in resolving errors before the code is synthesized.

In industry parlance, HDL design generally ends at the synthesis stage. Once
the synthesis tool has mapped the HDL description into a gate netlist, this netlist is
passed off to the back-end stage. Depending on the physical technology (FPGA,
ASIC gate array, ASIC standard cell), HDLs may or may not play a significant role
in the back-end flow. In general, as the design flow progresses toward a physically
realizable form, the design database becomes progressively more laden with
technology-specific information, which cannot be stored in a generic HDL
description. Finally, an integrated circuit is manufactured or programmed for use.

4.1.2 Simulating and debugging HDL code

Essential to HDL design is the ability to simulate HDL programs.


Simulation allows an HDL description of a design (called a model) to pass design

verification, an important milestone that validates the design's intended function
(specification) against the code implementation in the HDL description. It also
permits architectural exploration. The engineer can experiment with design choices
by writing multiple variations of a base design, then comparing their behavior in
simulation. Thus, simulation is critical for successful HDL design.

To simulate an HDL model, an engineer writes a top-level simulation


environment (called a testbench). At minimum, a testbench contains an
instantiation of the model (called the device under test or DUT), pin/signal
declarations for the model's I/O, and a clock waveform. The testbench code is
event driven: the engineer writes HDL statements to implement the (testbench-
generated) reset-signal, to model interface transactions (such as a host–bus
read/write), and to monitor the DUT's output. An HDL simulator — the program
that executes the testbench — maintains the simulator clock, which is the master
reference for all events in the testbench simulation. Events occur only at the
instants dictated by the testbench HDL (such as a reset-toggle coded into the
testbench), or in reaction (by the model) to stimulus and triggering events. Modern
HDL simulators have full-featured graphical user interfaces, complete with a
suite of debug tools. These allow the user to stop and restart the simulation at any
time, insert simulator breakpoints (independent of the HDL code), and monitor or
modify any element in the HDL model hierarchy. Modern simulators can also link
the HDL environment to user-compiled libraries, through a defined PLI/VHPI
interface. Linking is system-dependent (Win32/Linux/SPARC), as the HDL
simulator and user libraries are compiled and linked outside the HDL environment.
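As an illustration of these ideas, a minimal Verilog testbench is sketched below. The device under test, its port names, and the stimulus are hypothetical; they are not the project's actual filter interface.

    `timescale 1ns / 1ps
    // Minimal testbench sketch: clock, reset, random stimulus, and a DUT
    // instantiation. "median_filter" and its ports are hypothetical names.
    module tb_median_filter;
        reg        clk = 1'b0;
        reg        rst = 1'b1;
        reg  [7:0] x   = 8'd0;
        wire [7:0] y;

        median_filter dut (.clk(clk), .rst(rst), .x(x), .y(y));  // device under test

        always #5 clk = ~clk;          // 10 ns clock period

        initial begin
            #20 rst = 1'b0;            // release reset after two clock cycles
            repeat (20) begin
                @(posedge clk);
                x = $random;           // drive a new random sample each cycle
            end
            $display("simulation finished");
            $finish;
        end
    endmodule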

Design verification is often the most time-consuming portion of the design


process, due to the disconnect between a device's functional specification, the
designer's interpretation of the specification, and the imprecision of the HDL

language. The majority of the initial test/debug cycle is conducted in the HDL
simulator environment, as the early stage of the design is subject to frequent and
major circuit changes. An HDL description can also be prototyped and tested in
hardware — programmable logic devices are often used for this purpose. Hardware
prototyping is comparatively more expensive than HDL simulation, but offers a
real-world view of the design. Prototyping is the best way to check interfacing
against other hardware devices and hardware prototypes. Even those running on
slow FPGAs offer much faster simulation times than pure HDL simulation.

4.1.3 Design Verification with HDLs

Historically, design verification was a laborious, repetitive loop of writing


and running simulation test cases against the design under test. As chip designs
have grown larger and more complex, the task of design verification has grown to
the point where it now dominates the schedule of a design team. Looking for ways
to improve design productivity, the EDA industry developed the Property
Specification Language.

In formal verification terms, a property is a factual statement about the


expected or assumed behavior of another object. Ideally, for a given HDL
description, a property or properties can be proven true or false using formal
mathematical methods. In practical terms, many properties cannot be proven
because they occupy an unbounded solution space. However, if provided a set of
operating assumptions or constraints, a property checker can prove (or disprove)
more properties, over the narrowed solution space.

The assertions do not model circuit activity, but capture and document the
"designer's intent" in the HDL code. In a simulation environment, the simulator
evaluates all specified assertions, reporting the location and severity of any

violations. In a synthesis environment, the synthesis tool usually operates with the
policy of halting synthesis upon any violation. Assertion-based verification is still
in its infancy, but is expected to become an integral part of the HDL design toolset.

4.1.4 HDL and programming languages

An HDL is analogous to a software programming language, but with major


differences. Many programming languages are inherently procedural (single-
threaded), with limited syntactical and semantic support to handle concurrency.
HDLs, on the other hand, resemble concurrent programming languages in their
ability to model multiple parallel processes (such as flipflops, adders, etc.) that
automatically execute independently of one another. Any change to the process's
input automatically triggers an update in the simulator's process stack. Both
programming languages and HDLs are processed by a compiler (usually called a
synthesizer in the HDL case), but with different goals. For HDLs, 'compiler' refers
to synthesis, a process of transforming the HDL code listing into a physically
realizable gate netlist. The netlist output can take any of many forms: a
"simulation" netlist with gate-delay information, a "handoff" netlist for post-
synthesis place and route, or a generic industry-standard EDIF format (for
subsequent conversion to a JEDEC-format file).

On the other hand, a software compiler converts the source-code listing into
a microprocessor-specific object-code, for execution on the target microprocessor.
As HDLs and programming languages borrow concepts and features from each
other, the boundary between them is becoming less distinct. However, pure HDLs
are unsuitable for general purpose software application development, just as
general-purpose programming languages are undesirable for modeling hardware.
Yet as electronic systems grow increasingly complex, and reconfigurable systems

become increasingly mainstream, there is growing desire in the industry for a
single language that can perform some tasks of both hardware design and software
programming. SystemC is an example of such—embedded system hardware can be
modeled as non-detailed architectural blocks (blackboxes with modeled signal
inputs and output drivers). The target application is written in C/C++, and natively
compiled for the host-development system (as opposed to targeting the embedded
CPU, which requires host-simulation of the embedded CPU). The high level of
abstraction of SystemC models is well suited to early architecture exploration, as
architectural modifications can be easily evaluated with little concern for signal-
level implementation issues. However, the threading model used in SystemC and
its reliance on shared memory mean that it does not handle parallel execution or
lower level models well.

In an attempt to reduce the complexity of designing in HDLs, which have
been compared to the equivalent of assembly languages, there are moves to raise
the abstraction level of the design. Companies such as Cadence, Synopsys and
Agility Design Solutions are promoting SystemC as a way to combine high level
languages with concurrency models to allow faster design cycles for FPGAs than
is possible using traditional HDLs. Approaches based on standard C or C++ (with
libraries or other extensions allowing parallel programming) are found in the
Catapult C tools from Mentor Graphics, the Impulse C tools from Impulse
Accelerated Technologies, and the free and open-source ROCCC 2.0 tools from
Jacquard Computing Inc. Annapolis Micro Systems, Inc.'s CoreFire Design Suite
and National Instruments LabVIEW FPGA provide a graphical dataflow approach
to high-level design entry. Languages such as SystemVerilog, SystemVHDL, and
Handel-C seek to accomplish the same goal, but are aimed at making existing
hardware engineers more productive versus making FPGAs more accessible to
existing software engineers. There is more information on C to HDL and Flow to
HDL in their respective articles.

4.2 VERILOG
Verilog, standardized as IEEE 1364, is a hardware description language
(HDL) used to model electronic systems. It is most commonly used in the design
and verification of digital circuits at the register-transfer level of abstraction. It is
also used in the verification of analog circuits and mixed-signal circuits.

Hardware description languages such as Verilog differ from software
programming languages because they include ways of describing the propagation
time and signal strengths (sensitivity). There are two types of assignment
operators; a blocking assignment (=), and a non-blocking (<=) assignment. The
non-blocking assignment allows designers to describe a state-machine update
without needing to declare and use temporary storage variables. Since these
concepts are part of Verilog's language semantics, designers could quickly write
descriptions of large circuits in a relatively compact and concise form. At the time
of Verilog's introduction (1984), Verilog represented a tremendous productivity
improvement for circuit designers who were already using graphical schematic
capture software and specially written software programs to document and
simulate electronic circuits.
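
A minimal sketch of the two assignment styles (all signal names here are illustrative, not taken from this project):

module assign_demo (input wire clk, input wire d, output reg q);
  reg stage1, stage2;

  // Non-blocking (<=): right-hand sides are sampled before any update, so a
  // two-stage shift register needs no temporary storage variables.
  always @(posedge clk)
    begin
      stage1 <= d;
      stage2 <= stage1;
    end

  // Blocking (=): statements take effect immediately and in order, the usual
  // style for combinational logic.
  always @(*)
    q = stage2;
endmodule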

The designers of Verilog wanted a language with syntax similar to the C
programming language, which was already widely used in engineering software
development. Like C, Verilog is case-sensitive and has a basic preprocessor
(though less sophisticated than that of ANSI C/C++). Its control flow keywords
(if/else, for, while, case, etc.) are equivalent, and its operator precedence is
compatible with C. Syntactic differences include: required bit-widths for variable
declarations, demarcation of procedural blocks (Verilog uses begin/end instead of
curly braces {}), and many other minor differences. Verilog requires that variables
be given a definite size. In C these sizes are assumed from the 'type' of the variable
(for instance an integer type may be 8 bits).
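
These points can be seen in a short, illustrative fragment (names are hypothetical): every signal carries an explicit width, and begin/end takes the place of C's curly braces:

module width_demo (input wire [7:0] a, input wire [7:0] b, output reg [7:0] y);
  // [7:0] gives each signal a definite 8-bit width; in C the width would
  // follow from the type instead.
  always @(*)
    begin                        // begin/end demarcates the block, not { }
      if (a > b)
        y = a - b;
      else
        y = b - a;
    end
endmodule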

A Verilog design consists of a hierarchy of modules. Modules encapsulate
design hierarchy, and communicate with other modules through a set of declared
input, output, and bidirectional ports. Internally, a module can contain any
combination of the following: net/variable declarations (wire, reg, integer, etc.),
concurrent and sequential statement blocks, and instances of other modules (sub-
hierarchies). Sequential statements are placed inside a begin/end block and
executed in sequential order within the block. However, the blocks themselves are
executed concurrently, making Verilog a dataflow language.
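
A small sketch of this hierarchy (module and signal names are illustrative): a top-level module instantiates two copies of a flip-flop sub-module, and both instances operate concurrently:

module dff (input wire clk, input wire d, output reg q);
  always @(posedge clk)
    q <= d;
endmodule

module top (input wire clk, input wire din, output wire dout);
  wire mid;                           // net declaration

  // Two instances of the same sub-module form a two-stage pipeline.
  dff u0 (.clk(clk), .d(din), .q(mid));
  dff u1 (.clk(clk), .d(mid), .q(dout));
endmodule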

Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating,
undefined") and signal strengths (strong, weak, etc.). This system allows abstract
modeling of shared signal lines, where multiple sources drive a common net. When
a wire has multiple drivers, the wire's (readable) value is resolved by a function of
the source drivers and their strengths. A subset of statements in the Verilog
language are synthesizable. Verilog modules that conform to a synthesizable
coding style, known as RTL (register-transfer level), can be physically realized by
synthesis software. Synthesis software algorithmically transforms the (abstract)
Verilog source into a netlist, a logically equivalent description consisting only of
elementary logic primitives (AND, OR, NOT, flip-flops, etc.) that are available in
a specific FPGA or VLSI technology. Further manipulations to the netlist
ultimately lead to a circuit fabrication blueprint (such as a photo mask set for an
ASIC or a bit stream file for an FPGA).

Example:

module main;
  initial
    begin
      $display("Hello world!");
      $finish;
    end
endmodule
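
The module above is simulation-only (it merely prints a message). For contrast, a small synthesizable RTL sketch that a synthesis tool would map to a multiplexer and flip-flops (the port names are hypothetical):

module mux_reg (
  input  wire       clk,
  input  wire       sel,
  input  wire [7:0] a,
  input  wire [7:0] b,
  output reg  [7:0] y
);
  // Registered 2-to-1 multiplexer: realizable with elementary gates and flip-flops.
  always @(posedge clk)
    y <= sel ? a : b;
endmodule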

The Verilog Procedural Interface (VPI), originally known as PLI 2.0, is an
interface primarily intended for the C programming language. It allows behavioral
Verilog code to invoke C functions, and C functions to invoke standard Verilog
system tasks. The Verilog Procedural Interface is part of the IEEE 1364
Programming Language Interface standard; the most recent edition of the standard
is from 2005. VPI is sometimes also referred to as PLI 2, since it replaces the
deprecated Program Language Interface (PLI).

4.3 SOFTWARE INFORMATION

Create a new ISE project that will target the FPGA device on the Spartan-3
Starter Kit demo board.
To create a new project:
1. Select File > New Project... The New Project Wizard appears.
2. Type tutorial in the Project Name field.
3. Enter or browse to a location (directory path) for the new project. A
tutorial subdirectory is created automatically.
4. Verify that HDL is selected from the Top-Level Source Type list.

5. Click Next to move to the device properties page.
Fill in the properties in the table as shown below:

♦ Product Category: All
♦ Family: Spartan3
♦ Device: XC3S200
♦ Package: FT256
♦ Speed Grade: -4
♦ Top-Level Source Type: HDL
♦ Synthesis Tool: XST (VHDL/Verilog)
♦ Simulator: ISE Simulator (VHDL/Verilog)
♦ Preferred Language: Verilog (or VHDL)
♦ Verify that Enable Enhanced Design Summary is selected.

Leave the default values in the remaining fields. When the table is complete,
our project properties will look like the following:

Figure 4.3 New project wizard


4.3.1 Creating a Verilog Source

Create the top-level Verilog source file for the project as follows:
1. Click New Source in the New Project dialog box.
2. Select Verilog Module as the source type in the New Source dialog box.
3. Type in the file name counter.
4. Verify that the Add to Project checkbox is selected.
5. Click Next.
6. Declare the ports for the counter design by filling in the port information
as shown below:

Figure 4.3.1 New source wizard

The source file containing the counter module displays in the Workspace,
and the counter displays in the Sources tab, as shown below:

Figure 4.3.1.1 Xilinx ISE window
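
At this point the file contains only the module and port declarations. A rough sketch of the generated skeleton is shown below; the exact header comments produced by ISE will differ:

module counter (
  input  wire       CLOCK,
  input  wire       DIRECTION,
  output wire [3:0] COUNT_OUT
);
  // The behavioral description is added in the next step from a language template.
endmodule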

4.3.2 Using Language Templates (Verilog)

The next step in creating the new source is to add the behavioral description
for counter. Use a simple counter code example from the ISE Language Templates
and customize it for the counter design.
1. Place the cursor on the line below the output [3:0] COUNT_OUT;
statement.
2. Open the Language Templates by selecting Edit → Language
Templates… Note: You can tile the Language Templates and the counter file by
selecting Window → Tile Vertically to make them both visible.
3. Using the “+” symbol, browse to the following code example: Verilog →
Synthesis Constructs → Coding Examples → Counters → Binary → Up/Down
Counters → Simple Counter.

4. With Simple Counter selected, select Edit → Use in File, or select the Use
Template in File toolbar button. This step copies the template into the counter
source file.
5. Close the Language Templates.
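
After customizing the template for this design, the source might look roughly like the sketch below. The wording of the actual ISE template differs slightly, and the internal signal count_int is an assumption made here for illustration:

module counter (
  input  wire       CLOCK,
  input  wire       DIRECTION,
  output wire [3:0] COUNT_OUT
);
  reg [3:0] count_int = 4'b0000;     // initial value used in simulation

  always @(posedge CLOCK)
    if (DIRECTION)
      count_int <= count_int + 1;    // count up
    else
      count_int <= count_int - 1;    // count down

  assign COUNT_OUT = count_int;
endmodule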
4.3.3 Design Simulation

To verify functionality using behavioral simulation, create a test bench
waveform containing input stimulus you can use to verify the functionality of the
counter module.

The test bench waveform is a graphical view of a test bench. Create the test
bench waveform as follows:

1. Select the counter HDL file in the Sources window.

2. Create a new test bench source by selecting Project → New Source.

3. In the New Source Wizard, select Test Bench WaveForm as the source
type, and type counter in the File Name field.

4. Click Next.

5. The Associated Source page shows that you are associating the test bench
waveform with the source file counter. Click Next.

6. The Summary page shows that the source will be added to the project, and
it displays the source directory, type, and name. Click Finish.

7. You need to set the clock frequency, setup time and output delay times in
the Initialize Timing dialog box before the test bench waveform editing window
opens.

The requirements for this design are the following:

♦ The counter must operate correctly with an input clock frequency = 25 MHz.

♦ The DIRECTION input will be valid 10 ns before the rising edge of CLOCK.

♦ The output (COUNT_OUT) must be valid 10 ns after the rising edge of CLOCK.

The design requirements correspond with the values below. Fill in the fields in
the Initialize Timing dialog box with the following information:

♦ Clock High Time: 20 ns.

♦ Clock Low Time: 20 ns.

♦ Input Setup Time: 10 ns.

♦ Output Valid Delay: 10 ns.

♦ Offset: 0 ns.

♦ Global Signals: GSR (FPGA). Note: When GSR (FPGA) is enabled, 100 ns is added
to the Offset value automatically.

♦ Initial Length of Test Bench: 1500 ns.

8. Click Finish to complete the timing initialization.

9. The blue shaded areas that precede the rising edge of the CLOCK correspond to
the Input Setup Time in the Initialize Timing dialog box.

Figure 4.3.3 Initial timing and clock wizard

Toggle the DIRECTION port to define the input stimulus for the counter design as
follows:

♦ Click on the blue cell at approximately 300 ns to assert DIRECTION high so
that the counter will count up.

♦ Click on the blue cell at approximately 900 ns to assert DIRECTION low so
that the counter will count down.

Figure 4.3.3.1 Clock pulse
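
The graphical test bench waveform is equivalent to a hand-written HDL test bench. A hedged sketch that reproduces the same stimulus (40 ns clock period, DIRECTION raised at about 300 ns and lowered at about 900 ns, 1500 ns total) is shown below; the module and instance names are hypothetical:

`timescale 1ns / 1ps

module counter_tb;
  reg        CLOCK = 1'b0;
  reg        DIRECTION = 1'b0;
  wire [3:0] COUNT_OUT;

  // Device under test.
  counter uut (.CLOCK(CLOCK), .DIRECTION(DIRECTION), .COUNT_OUT(COUNT_OUT));

  // 25 MHz clock: 20 ns high, 20 ns low.
  always #20 CLOCK = ~CLOCK;

  initial
    begin
      #300 DIRECTION = 1'b1;   // count up from about 300 ns
      #600 DIRECTION = 1'b0;   // count down from about 900 ns
      #600 $finish;            // end of the 1500 ns test bench
    end
endmodule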

CHAPTER 5

RESULT AND CONCLUSION

5.1 RESULT

A low-power architecture for the design of a 1-D median filter has been
proposed in this work. The power saving is achieved by adopting a token ring in
our design, where the stored samples are kept immobile; only the rank of each
sample has to be updated at each new cycle. As can be seen from the experimental
results, the power consumption for median filters in practical use has been
successfully reduced at the expense of some area overhead.

Figure 5.1 Output

Figure 5.1.1 Power Reduction window

By using the token register, the power consumed in this project is 0.019 W.

5.2. CONCLUSION
A low-power architecture for the design of a one-dimensional median filter
has been proposed in this work. The power saving is achieved by adopting a token
ring in our design, where the stored samples are kept immobile; only the rank of
each sample has to be updated at each new cycle. As can be seen from the
experimental results, the power consumption for median filters in practical use has
been successfully reduced at the expense of some area overhead.

REFERENCES

1. J. Cadenas, G. M. Megson, R. S. Sherratt, and P. Huerta, “Fast median
calculation method,” Electron. Lett., vol. 48, no. 10, pp. 558–560, May 2012.
2. T. Chelcea and S. M. Nowick, “Robust interfaces for mixed-timing systems,”
IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 8, pp. 857–873,
Aug. 2004.
3. O. T.-C. Chen, S. Wang, and Y.-W. Wu, “Minimization of switching activities
of partial products for designing low-power multipliers,” IEEE Trans. Very
Large Scale Integr. (VLSI) Syst., vol. 11, no. 3, pp. 418–433, Jun. 2003.
4. R.-D. Chen, P.-Y. Chen, and C.-H. Yeh, “Design of an area-efficient one
dimensional median filter,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60,
no. 10, pp. 662–666, Oct. 2013.
5. C. Choo and P. Verma, “A real-time bit-serial rank filter implementation using
Xilinx FPGA,” in Proc. SPIE Real-Time Image Process., 2008, vol. 6811, pp.
68110F-1–68110F-8.
6. S. A. Fahmy, P. Y. K. Cheung, and W. Luk, “High-throughput one dimensional
median and weighted median filters on FPGA,” IET Comput. Digit. Tech., vol.
3, no. 4, pp. 384–394, Jul. 2009.
7. I. M. Panades and A. Greiner, “Bi-synchronous FIFO for synchronous circuit
communication well suited for network-on-chip in GALS architectures,” in
Proc. 1st Int. Symp. NOCS, 2007, pp. 83–94.
8. V. G. Moshnyaga and K. Hashimoto, “An efficient implementation of 1-D
median filter,” in Proc. 52nd IEEE Int. MWSCAS, 2009, pp. 451–454.

9. M. Mottaghi-Dastjerdi, A. Afzali-Kusha, and M. Pedram, “BZ-FAD: A low-
power low-area multiplier based on shift-and-add architecture,” IEEE Trans.
Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 2, pp. 302–306, Feb. 2009.
10. D. Prokin and M. Prokin, “Low hardware complexity pipelined rank filter,”
IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 6, pp. 446–450, Jun.
2010.
11. D. S. Richards, “VLSI median filters,” IEEE Trans. Acoust., Speech, Signal
Process., vol. 38, no. 1, pp. 145–153, Jan. 1990.
12. Z. Vasicek and L. Sekanina, “Novel hardware implementation of adaptive
median filters,” in Proc. 11th IEEE Workshop DDECS, 2008, pp. 1–6.
