[Figure: bit-serial DA example for $y = a_1 x_1 + a_2 x_2$, with input words $x_{10}.x_{11}x_{12}x_{13}$ and $x_{20}.x_{21}x_{22}x_{23}$ feeding a scaling accumulator.]
2.Conventional Distributed Arithmetic
FIR Filter
$$ y(n) = h_0 x(n) + h_1 x(n-1) + h_2 x(n-2) + \dots + h_{N-1}\, x(n-(N-1)) $$
$$ y(n) = \sum_{k=0}^{N-1} h_k\, x(n-k) $$
$N$ : Number of Filter Coefficients
$h_k$ : Filter Coefficients
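[Figure: direct-form FIR filter. The input x(n) passes through a chain of unit delays $Z^{-1}$; the tap outputs are weighted by $h_0, h_1, h_2, h_3, \dots, h_{N-1}$ and summed to give y(n).]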
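The FIR sum above can be sketched in a few lines of Python. This is only an illustration: the function name `fir_output` and the assumption that samples before n = 0 are zero are not from the slides.

```python
def fir_output(h, x, n):
    """Direct-form FIR output y(n) = sum_k h[k] * x(n - k).

    Assumption (not from the slides): samples before n = 0 are zero.
    """
    return sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)


# Feeding a unit impulse returns the coefficients one by one.
h = [1, 2, 3, 4]          # illustrative coefficients h_0..h_3
x = [1, 0, 0, 0, 0]       # unit impulse
impulse_response = [fir_output(h, x, n) for n in range(len(x))]
print(impulse_response)
```

Each output sample costs N multiplications here, which is the cost that distributed arithmetic is meant to avoid.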
The input samples are represented by B bits in 2's complement format:
$$ x(n-k) = x_k = -x_{k0} + \sum_{j=1}^{B-1} x_{kj}\, 2^{-j} $$
The output y(n) is
$$ y(n) = -\sum_{k=0}^{N-1} h_k x_{k0} + \sum_{j=1}^{B-1} \left[ \sum_{k=0}^{N-1} h_k x_{kj} \right] 2^{-j} $$
$$ y(n) = \sum_{j=0}^{B-1} C_j\, 2^{-j} $$
where
$$ C_j = \sum_{k=0}^{N-1} h_k x_{kj}, \quad j = 1, \dots, B-1 $$
and
$$ C_0 = -\sum_{k=0}^{N-1} h_k x_{k0} $$
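As a check on the derivation, the following Python sketch evaluates y(n) bit plane by bit plane and compares it with the direct sum of products. The helper names `to_bits` and `da_output` and the sample values are illustrative assumptions; the inputs are chosen to be exactly representable with B = 4 bits.

```python
def to_bits(value, B):
    """Bits [x_0, ..., x_(B-1)] of a B-bit 2's complement fraction (x_0 is the sign bit)."""
    scaled = round(value * 2 ** (B - 1))
    if not -(2 ** (B - 1)) <= scaled < 2 ** (B - 1):
        raise ValueError("value does not fit in B bits")
    word = scaled & (2 ** B - 1)            # wrap negative values into B bits
    return [(word >> (B - 1 - j)) & 1 for j in range(B)]


def da_output(h, x, B):
    """y(n) = sum_j C_j * 2**-j, where x[k] stands for x(n - k)."""
    bits = [to_bits(xk, B) for xk in x]
    y = 0.0
    for j in range(B):
        c_j = sum(hk * b[j] for hk, b in zip(h, bits))
        if j == 0:
            c_j = -c_j                      # the sign-bit plane carries a minus sign
        y += c_j * 2.0 ** (-j)
    return y


h = [0.5, -0.25, 0.125, 0.75]               # illustrative coefficients
x = [0.5, -0.25, 0.875, -1.0]               # illustrative samples, multiples of 2**-3
direct = sum(hk * xk for hk, xk in zip(h, x))
print(da_output(h, x, B=4), direct)
```

Each C_j depends only on one bit from every input sample, which is what allows it to be read from a table instead of being computed with multipliers.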
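[Figure: LUT-based DA structure for N=4. A PISO register and three SISO registers supply the bits $x_{0j}, x_{1j}, x_{2j}, x_{3j}$ of x(n) and its delayed samples; the selected value $C_j$ is accumulated, with a $2^{-1}$ scaling element, to form y(n).]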
Input
x_{3j} x_{2j} x_{1j} x_{0j}      C_j
0 0 0 0                          0
0 0 0 1                          h_0
0 0 1 0                          h_1
0 0 1 1                          h_1 + h_0
0 1 0 0                          h_2
0 1 0 1                          h_2 + h_0
0 1 1 0                          h_2 + h_1
0 1 1 1                          h_2 + h_1 + h_0
1 0 0 0                          h_3
1 0 0 1                          h_3 + h_0
1 0 1 0                          h_3 + h_1
1 0 1 1                          h_3 + h_1 + h_0
1 1 0 0                          h_3 + h_2
1 1 0 1                          h_3 + h_2 + h_0
1 1 1 0                          h_3 + h_2 + h_1
1 1 1 1                          h_3 + h_2 + h_1 + h_0
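The table can be generated mechanically: entry `addr` is the sum of every h_k whose bit is set in the address, with x_{0j} taken as the least significant bit. The sketch below is illustrative only; `build_lut`, `lut_address` and the coefficient values are assumed names and numbers, not part of the slides.

```python
def build_lut(h):
    """Return the 2**N entry table: entry addr sums h[k] for every set bit k of addr."""
    n = len(h)
    return [sum(h[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(2 ** n)]


def lut_address(bit_column):
    """Pack [x_0j, x_1j, ..., x_(N-1)j] into an address with x_0j as the LSB."""
    return sum(bit << k for k, bit in enumerate(bit_column))


h = [0.5, -0.25, 0.125, 0.75]            # illustrative coefficients h_0..h_3
lut = build_lut(h)
addr = lut_address([1, 1, 0, 0])         # x_0j = x_1j = 1, x_2j = x_3j = 0
print(addr, lut[addr])                   # selects the h_1 + h_0 entry
```

The table holds 2^N entries, and in an adaptive filter all of them have to be rewritten whenever the coefficients change.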
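Writing each coefficient, instead of each input sample, in B-bit 2's complement format,
$$ h_k = -h_{k0} + \sum_{j=1}^{B-1} h_{kj}\, 2^{-j} $$
the output becomes
$$ y(n) = \sum_{k=0}^{N-1} x(n-k)\, h_k = -\sum_{k=0}^{N-1} x(n-k)\, h_{k0} + \sum_{j=1}^{B-1} \left[ \sum_{k=0}^{N-1} x(n-k)\, h_{kj} \right] 2^{-j} $$
$$ y(n) = \sum_{j=0}^{B-1} D_j\, 2^{-j} $$
where
$$ D_j = \sum_{k=0}^{N-1} x(n-k)\, h_{kj}, \quad j = 1, \dots, B-1 $$
and
$$ D_0 = -\sum_{k=0}^{N-1} x(n-k)\, h_{k0} $$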
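The same kind of numerical check works when the coefficient bits, rather than the input bits, are distributed. In this sketch the names `to_bits` and `cdda_output` and all numeric values are assumptions made for illustration; the coefficients are chosen to be exactly representable with B = 4 bits, while the input samples can be arbitrary.

```python
def to_bits(value, B):
    """Bits [h_0, ..., h_(B-1)] of a B-bit 2's complement fraction (h_0 is the sign bit)."""
    scaled = round(value * 2 ** (B - 1))
    word = scaled & (2 ** B - 1)            # wrap negative values into B bits
    return [(word >> (B - 1 - j)) & 1 for j in range(B)]


def cdda_output(h, x, B):
    """y(n) = sum_j D_j * 2**-j, where D_j adds the samples selected by coefficient bit j."""
    h_bits = [to_bits(hk, B) for hk in h]
    y = 0.0
    for j in range(B):
        d_j = sum(xk * hb[j] for xk, hb in zip(x, h_bits))
        if j == 0:
            d_j = -d_j                      # the sign-bit plane carries a minus sign
        y += d_j * 2.0 ** (-j)
    return y


h = [0.5, -0.25, 0.125, 0.75]               # illustrative 4-bit coefficients
x = [0.3, -0.7, 0.2, 0.9]                   # illustrative samples x(n), ..., x(n-3)
direct = sum(hk * xk for hk, xk in zip(h, x))
print(cdda_output(h, x, B=4), direct)
```

Because D_j is built from the current input samples and the stored coefficient bits, no coefficient-dependent table has to be rebuilt after an update.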
Hardware Architecture (N=4)
[Figure: hardware architecture for N=4. An adder network and a MUX generate the partial product $D_j$.]
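One way to picture the adder network and MUX is sketched below: the network forms every sum of a subset of the current input samples, and the coefficient bits of plane j act as the MUX select lines. This is an assumed software model, not the hardware from the slides; `adder_network`, `mux_select` and the values are hypothetical.

```python
def adder_network(x):
    """All 2**N subset sums of x; entry sel adds x[k] for every set bit k of sel."""
    n = len(x)
    return [sum(x[k] for k in range(n) if (sel >> k) & 1)
            for sel in range(2 ** n)]


def mux_select(sums, h_bits, j):
    """Pick the sum addressed by bit j of every coefficient (h_0j is the LSB)."""
    sel = sum(bits[j] << k for k, bits in enumerate(h_bits))
    return sums[sel]


x = [1, 2, 4, 8]                          # illustrative samples x(n), ..., x(n-3)
h_bits = [[0, 1], [1, 1], [0, 0], [1, 0]] # illustrative 2-bit coefficients h_0..h_3
sums = adder_network(x)
print(mux_select(sums, h_bits, 0), mux_select(sums, h_bits, 1))
```

The list returned by `adder_network` has 2^N entries, so in this model the number of sums doubles with every added tap.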
4.FIR Adaptive Filter
x(n) : Input signal
d(n) : Desired signal
y(n) : Output signal
e(n) : Error signal
Error Signal: $e(n) = d(n) - y(n)$
LMS algorithm:
$$ h_k(n+1) = h_k(n) + \mu\, e(n)\, x(n-k); \quad \mu : \text{convergence factor} $$
Output Signal:
$$ y(n) = \sum_{k=0}^{N-1} h_k(n)\, x(n-k) $$
[Figure: adaptive filter block diagram with input x(n) and desired signal d(n).]
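A single iteration of these three equations can be written as the short Python sketch below. The function name `lms_step` and the numbers in the example are illustrative assumptions.

```python
def lms_step(h, x_taps, d, mu):
    """One LMS iteration; x_taps[k] stands for x(n - k).

    Returns the output y(n), the error e(n) and the updated coefficients h(n + 1).
    """
    y = sum(hk * xk for hk, xk in zip(h, x_taps))
    e = d - y
    h_next = [hk + mu * e * xk for hk, xk in zip(h, x_taps)]
    return y, e, h_next


# Illustrative call: two taps, all coefficients initially zero.
y, e, h_next = lms_step([0.0, 0.0], [1.0, 2.0], d=1.0, mu=0.5)
print(y, e, h_next)
```

Choosing the convergence factor as a power of two, such as the 0.0625 used later in the slides, lets the multiplication by mu be done with a shift.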
4.The Proposed Structure LMS Adaptive Filter
[Figure: block diagram of the proposed LMS adaptive filter for N=4. Labelled blocks: PIPO registers holding x(n), x(n-1), x(n-2), x(n-3); adder network (Adder N/W) and MUX; buffers, PISO registers and barrel shifters for the coefficients h_0 to h_3; ADD/SUB unit, accumulator (ACC) and shifter producing y(n); quantizer for the error e(n) formed from d(n) and y(n). Labelled signals: clk, lr, lacc, clacc, s/a, upd, Shift 0, Shift 1, Shift 2. Parallel and serial data paths are marked separately.]
5.Synthesis Results
[Figure: control unit simulation result]
[Figure: hardware simulation result]
When constant values are applied to the input and the desired signal,
the adaptive filter produces its output and adaptation continues
until the error signal is zero, i.e. the output equals the desired signal.
                                                   Conventional DA    Coefficient-Distributive DA
Registered Performance (Maximum Clock Frequency)   5.47 MHz           25.06 MHz
Memory Bits                                        320 bits           0 bits
Logic Cells Utilized                               506 LCs            1,521 LCs
Performance Comparison
The throughput is defined as the ratio of the clock rate to the number of
clock cycles required to process one signal sample.
For an N-tap FIR adaptive filter, both the LUT-based and the CDDA adaptive
filters require B clock cycles to compute the output.
The LUT-based structure requires N additional clock cycles to update the LUT.
The CDDA structure requires a constant 4 clock cycles
(independent of the number of taps) to update the filter coefficients directly,
thus, with t the number of clock cycles per sample,
$$ \text{Throughput} = \frac{\text{Clock Rate}}{t} $$
$$ \text{Throughput}_{\text{LUT-based}} = \frac{\text{Clock Rate}}{B + N} $$
$$ \text{Throughput}_{\text{CDDA}} = \frac{\text{Clock Rate}}{B + 4} $$
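The two formulas are easy to evaluate for a few filter lengths. The sketch below uses the clock rate of 2 MHz and B = 8 bits quoted on the following slide; the function names are illustrative, and the numbers it prints come from the formulas only, not from the measured plot.

```python
def throughput_lut_based(clock_hz, B, N):
    """Samples per second when B cycles compute the output and N cycles update the LUT."""
    return clock_hz / (B + N)


def throughput_cdda(clock_hz, B):
    """Samples per second when B cycles compute the output and 4 cycles update the coefficients."""
    return clock_hz / (B + 4)


CLOCK_HZ = 2_000_000   # 2 MHz
B = 8                  # input word length in bits

for n_taps in (4, 8, 16, 32):
    print(n_taps,
          round(throughput_lut_based(CLOCK_HZ, B, n_taps)),
          round(throughput_cdda(CLOCK_HZ, B)))
```

Under these formulas the two structures tie at N = 4, and only the LUT-based figure drops as the number of taps grows.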
Throughput Comparison
(Clock Rate = 2 MHz, B=8 bits)
[Figure: throughput comparison of LUT-less versus LUT-based structure. x-axis: number of filter coefficients (4, 8, 16, 32); y-axis: number of samples processed per second (scale x 10^5); curves: LUT-based, LUT-less.]
5.Experimental Results
[Figure: adaptive filter test setup with signals x(n), d(n), y(n) and e(n).]
Input: Sinusoidal High Freq. + Sinusoidal Low Freq.
Desired: Sinusoidal Low Freq.
High Frequency: 7 kHz
Low Frequency: 500 Hz
Sampling Frequency: 30 kHz
Convergence Factor: 0.0625
4-tap FIR Adaptive Filter
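A floating-point simulation of this setup can be sketched as follows. It is only a rough model: unit amplitudes for both sinusoids, zero initial coefficients and the function name `run_lms` are assumptions, and no fixed-point or DA effects of the hardware are modelled.

```python
import math

FS = 30_000.0      # sampling frequency in Hz
F_HIGH = 7_000.0   # high-frequency sinusoid in Hz
F_LOW = 500.0      # low-frequency sinusoid in Hz
MU = 0.0625        # convergence factor
N_TAPS = 4


def run_lms(num_samples):
    """Run the LMS filter on the two-tone input; return final coefficients and the error history."""
    h = [0.0] * N_TAPS                     # assumed initial coefficients
    taps = [0.0] * N_TAPS                  # taps[k] holds x(n - k)
    errors = []
    for n in range(num_samples):
        t = n / FS
        low = math.sin(2 * math.pi * F_LOW * t)    # desired signal d(n)
        high = math.sin(2 * math.pi * F_HIGH * t)
        taps = [low + high] + taps[:-1]            # shift in the new input sample x(n)
        y = sum(hk * xk for hk, xk in zip(h, taps))
        e = low - y
        h = [hk + MU * e * xk for hk, xk in zip(h, taps)]
        errors.append(e)
    return h, errors


def mean_square(values):
    return sum(v * v for v in values) / len(values)


h_final, errors = run_lms(24_000)
early = mean_square(errors[:1200])
late = mean_square(errors[-1200:])
print(h_final, early, late)
```

Comparing the mean squared error over an early and a late window gives a simple indication of whether the adaptation has reduced the error.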
[Figure: desired, input and output waveforms. Panels: simulation result, experimental result.]
[Figure: ECG noise-cancelling setup with a 50 Hz sine reference, the corrupted ECG, and the signals y(n) and e(n).]
Sampling Frequency: 30 kHz
Convergence Factor: 0.0625
4-tap FIR Adaptive Filter
[Figure: corrupted ECG and filtered ECG. Panels: simulation result, experimental result.]
Conclusion
The multiplierless adaptive filter can be implemented.
The LUT is replaced by an adder network and a multiplexer.
Very simple structure
Constant throughput
(no extra time is required for LUT updating).
The complexity of the adder network increases
exponentially as the filter length grows.
Thank you