Architecture
Course Teacher: Dr.-Ing. Shehzad Hasan
CIS, NED University
Lecture # 4
4p(1)    1 1 0 1 0 0
p(1)     1 1 1 1 0 1 0 0
+z1 a    1 1 1 0 1 0
––––––––––––––––––––––––
4p(2)    1 1 0 1 1 1 0 0
p(2)     1 1 0 1 1 1 0 0
========================
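The steps above appear to come from a radix-4 multiplication with signed multiplier digits z_j. The sketch below assumes Booth radix-4 recoding (digits in [-2, 2]) and accumulates the terms z_j · a · 4^j directly, rather than reproducing the right-shift form shown on the slide. The operand values a = 0110 (6) and x = 1010 (-6) are an assumption, chosen because they reproduce the final pattern 1 1 0 1 1 1 0 0. The function names are illustrative only.

```python
def booth_radix4_digits(x, k):
    """Recode a k-bit two's-complement multiplier (k even) into
    radix-4 digits in [-2, 2], least significant digit first."""
    bits = [(x >> i) & 1 for i in range(k)]  # works for negative Python ints
    digits = []
    prev = 0  # x_{-1} is taken as 0
    for i in range(0, k, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def multiply_radix4(a, x, k):
    """Multiply a by the k-bit signed multiplier x, one radix-4 digit per step."""
    p = 0
    for j, z in enumerate(booth_radix4_digits(x, k)):
        p += z * a * 4 ** j  # add the selected multiple z_j * a, weighted by 4^j
    return p


product = multiply_radix4(6, -6, 4)
print(booth_radix4_digits(-6, 4))    # recoded digits z0, z1
print(format(product & 0xFF, "08b"))  # 8-bit two's-complement pattern
```

With these assumed operands the recoded digits are z0 = -2 and z1 = -1, and the 8-bit result pattern is 11011100, which matches p(2) above.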
[Figure: selection of the multiple z i/2 a, with an add/subtract control signal; the selected multiple goes to the adder input.]
[Figure: three multiplier organizations.
Basic binary: next multiple, adder.
High-radix or partial tree: several multiples, small CSA tree, adder.
Full tree: all multiples, full CSA tree, adder.
Arrows labeled "Speed up" and "Economize" relate the three designs.]
[Figure: CSA tree stages; labels: 12 FAs; 6 FAs; 4 FAs + 1 HA; 7-bit adder.]
h = number of CSA tree levels, n(h) = maximum number of inputs
h    n(h)
2    4
3    6
4    9
5    13
6    19
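The table values can be reproduced with a short recurrence. The sketch below assumes that h is the number of carry-save adder levels and n(h) the largest number of operands such a tree can reduce to two, so that n(0) = 2 and n(h) = floor(3 · n(h - 1) / 2). The function name max_inputs is illustrative.

```python
def max_inputs(h):
    """Largest number of operands an h-level CSA tree can reduce to two,
    assuming n(0) = 2 and n(h) = floor(3 * n(h - 1) / 2)."""
    n = 2
    for _ in range(h):
        n = (3 * n) // 2  # each level turns every 3 operands into 2
    return n


for h in range(2, 7):
    print(h, max_inputs(h))
```

Under this assumption the loop prints the same pairs as the table: 4, 6, 9, 13, 19 for h = 2 to 6.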
[Figure: addition using a CSA tree built from 7-bit CSAs; labels: 11 FAs; 1 2 3 4 3 2 1; bit-index ranges [1, 6], [2, 8], [1, 8], [5, 11], [3, 11].]
[Figure: 5 × 5 array multiplier. Multiplicand bits a0 to a4 and multiplier bits x0 to x4 form the partial-product terms ai xj, which feed rows of full adders (FA); the outputs are the product bits p0 to p9.]
A basic array multiplier uses a one-sided CSA tree and a ripple-carry adder.
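The array multiplier structure can be sketched at bit level as follows. This is a minimal model for unsigned operands, not the exact cell layout of the figure: each row of full adders absorbs one partial product in carry-save form (the one-sided CSA tree), and a final ripple-carry adder produces the upper product bits. The names full_adder and array_multiply, and the choice of unsigned operands, are assumptions for illustration.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def array_multiply(a, x, k=5):
    """Unsigned k x k array multiplier modeled row by row."""
    a_bits = [(a >> i) & 1 for i in range(k)]
    x_bits = [(x >> j) & 1 for j in range(k)]

    # Row 0: the first partial product a * x0, no carries yet.
    sums = [a_bits[i] & x_bits[0] for i in range(k)]
    carries = [0] * k
    product = [sums[0]]  # p0

    # Rows 1..k-1: carry-save addition of the next partial product.
    for j in range(1, k):
        new_sums, new_carries = [], []
        for i in range(k):
            s_in = sums[i + 1] if i + 1 < k else 0  # same-weight sum from the row above
            s, c = full_adder(a_bits[i] & x_bits[j], s_in, carries[i])
            new_sums.append(s)
            new_carries.append(c)
        sums, carries = new_sums, new_carries
        product.append(sums[0])  # p_j leaves the array on the right edge

    # Final ripple-carry adder merges the remaining sum and carry vectors.
    carry = 0
    for t in range(k):
        s_in = sums[t + 1] if t + 1 < k else 0
        s, carry = full_adder(s_in, carries[t], carry)
        product.append(s)  # p_(k+t)

    return sum(bit << i for i, bit in enumerate(product))


print(array_multiply(19, 27))
```

The carries in each row are passed down to the next row instead of rippling sideways, which is why only the last stage needs a carry-propagate adder.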
z            Dividend
d            Divisor
q            Quotient
–q3 d 2^3
–q2 d 2^2    Subtracted
–q1 d 2^1    bit-matrix
–q0 d 2^0
s            Remainder
where s = z – (d × q)
or    z = (d × q) + s
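The subtracted bit-matrix can be read as a shift-subtract procedure: for each quotient bit q_j, starting from the most significant, the term q_j · d · 2^j is subtracted from the running remainder. The sketch below assumes unsigned operands, a 2k-bit dividend and a k-bit divisor and quotient (k = 4 to match q3 to q0). The function name and the overflow check are assumptions for illustration.

```python
def shift_subtract_divide(z, d, k=4):
    """Unsigned division producing a k-bit quotient q and remainder s
    such that z = d * q + s with 0 <= s < d."""
    if d == 0:
        raise ZeroDivisionError("divisor is zero")
    if z >= (d << k):
        raise OverflowError("quotient does not fit in k bits")
    s = z
    q = 0
    for j in range(k - 1, -1, -1):
        trial = s - (d << j)  # try subtracting d * 2^j
        if trial >= 0:        # q_j = 1: keep the difference
            s = trial
            q |= 1 << j
        # otherwise q_j = 0 and the partial remainder is left unchanged
    return q, s


q, s = shift_subtract_divide(117, 10)
print(q, s, 10 * q + s)
```

For z = 117 and d = 10 the subtracted rows are 80, 20 and 10 (q = 1011), leaving s = 7, which satisfies z = (d × q) + s.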
Latency: The number of clock cycles that are required for the execution core to complete
the execution of all of the μops that form an instruction.
Cycles to wait: The number of clock cycles required to wait before the issue ports are free to
accept the same instruction again.
Source: Intel® 64 and IA-32 Architectures Optimization Reference Manual
Divide fractions like integers, then adjust the remainder. This requires zfrac < dfrac, so that the quotient is also a fraction.