
COMPUTER ORGANIZATION AND DESIGN: RISC-V Edition
The Hardware/Software Interface

Chapter 3
Arithmetic for Computers
§3.1 Introduction
Arithmetic for Computers
- Operations on integers
  - Addition and subtraction
  - Multiplication and division
  - Dealing with overflow
- Floating-point real numbers
  - Representation and operations

Chapter 3 — Arithmetic for Computers — 2


§3.2 Addition and Subtraction
Integer Addition
- Example: 7 + 6
- Overflow if result out of range
  - Adding +ve and –ve operands: no overflow
  - Adding two +ve operands
    - Overflow if result sign is 1
  - Adding two –ve operands
    - Overflow if result sign is 0
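These sign rules translate directly into C. A minimal sketch (function name is mine, not from the text); the sum is computed in unsigned arithmetic because signed overflow is undefined behavior in C:

```c
#include <assert.h>
#include <stdint.h>

/* Signed 32-bit addition overflows iff the operands have the same
 * sign and the result's sign differs, per the rules above. */
int add_overflows(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;   /* wraps, well defined */
    /* same operand signs, different result sign */
    return ((a >= 0) == (b >= 0)) && ((a >= 0) != ((int32_t)sum >= 0));
}
```

Adding operands of opposite sign can never overflow, which is why the first test short-circuits in that case.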

Integer Subtraction
- Add negation of second operand
- Example: 7 – 6 = 7 + (–6)

    +7: 0000 0000 … 0000 0111
    –6: 1111 1111 … 1111 1010
    +1: 0000 0000 … 0000 0001

- Overflow if result out of range
  - Subtracting two +ve or two –ve operands: no overflow
  - Subtracting +ve from –ve operand
    - Overflow if result sign is 0
  - Subtracting –ve from +ve operand
    - Overflow if result sign is 1

Arithmetic for Multimedia
- Graphics and media processing operates on vectors of 8-bit and 16-bit data
- Use 64-bit adder, with partitioned carry chain
  - Operate on 8×8-bit, 4×16-bit, or 2×32-bit vectors
  - SIMD (single instruction, multiple data)
- Saturating operations
  - On overflow, result is largest representable value
    - c.f. 2s-complement modulo arithmetic
  - E.g., clipping in audio, saturation in video
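A saturating add can be sketched in C by computing the exact sum in a wider type and clamping (a software model of what a multimedia unit does in hardware; the name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Saturating 8-bit add: on overflow the result clips to the
 * largest/smallest representable value instead of wrapping,
 * in contrast to two's-complement modulo arithmetic. */
int8_t sat_add8(int8_t a, int8_t b) {
    int16_t wide = (int16_t)a + (int16_t)b;  /* exact in 16 bits */
    if (wide > INT8_MAX) return INT8_MAX;    /* clip high */
    if (wide < INT8_MIN) return INT8_MIN;    /* clip low  */
    return (int8_t)wide;
}
```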
§3.3 Multiplication
Multiplication
- Start with long-multiplication approach:

    multiplicand      1000
    multiplier      × 1001
                    ------
                      1000
                     0000
                    0000
                   1000
    product        1001000

- Length of product is the sum of operand lengths
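The long-multiplication approach becomes a shift-and-add loop in software, one partial product per iteration, mirroring the hardware described below (a sketch; the function name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiplication: if the low multiplier bit is 1, add
 * the (shifted) multiplicand into the product. A 32x32-bit multiply
 * yields a 64-bit product, since the product length is the sum of
 * the operand lengths. */
uint64_t shift_add_mul(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = 0;              /* initially 0, as in hardware */
    uint64_t mcand = multiplicand;     /* shifted left each step */
    for (int i = 0; i < 32; i++) {
        if (multiplier & 1)
            product += mcand;          /* add partial product */
        mcand <<= 1;                   /* shift multiplicand left */
        multiplier >>= 1;              /* examine next multiplier bit */
    }
    return product;
}
```

The slide's example 1000 × 1001 is 8 × 9 = 72 = 1001000 in binary.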

Multiplication Hardware
[Figure: multiplication hardware — multiplicand register, 64-bit ALU, and product register; the product register is initially 0]

Optimized Multiplier
- Perform steps in parallel: add/shift
- One cycle per partial-product addition
  - That's OK if the frequency of multiplications is low
Faster Multiplier
- Uses multiple adders
  - Cost/performance tradeoff
- Can be pipelined
  - Several multiplications performed in parallel
RISC-V Multiplication
- Four multiply instructions:
  - mul: multiply
    - Gives the lower 64 bits of the product
  - mulh: multiply high
    - Gives the upper 64 bits of the product, assuming the operands are signed
  - mulhu: multiply high unsigned
    - Gives the upper 64 bits of the product, assuming the operands are unsigned
  - mulhsu: multiply high signed/unsigned
    - Gives the upper 64 bits of the product, assuming one operand is signed and the other unsigned
- Use mulh result to check for 64-bit overflow
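The mul/mulh overflow check can be sketched in C. Portable C has no 64×64→128 multiply, so this is a 32-bit analog (RV64 would do the same with mul/mulh on 64-bit operands; the function name is mine): the signed product fits in the lower half iff the upper half is the sign extension of the lower half.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit analog of the mulh overflow check: compute the full 64-bit
 * product, then compare the high word ("mulh") against the sign
 * extension of the low word ("mul"). */
int smul32_overflows(int32_t a, int32_t b) {
    int64_t full = (int64_t)a * (int64_t)b;  /* exact 64-bit product */
    int32_t lo = (int32_t)full;              /* what mul would return  */
    int32_t hi = (int32_t)(full >> 32);      /* what mulh would return */
    return hi != (lo < 0 ? -1 : 0);          /* hi must be lo's sign, replicated */
}
```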
§3.4 Division
Division
- Check for 0 divisor
- Long division approach
  - If divisor ≤ dividend bits
    - 1 bit in quotient, subtract
  - Otherwise
    - 0 bit in quotient, bring down next dividend bit
- Restoring division
  - Do the subtract, and if remainder goes < 0, add divisor back
- Signed division
  - Divide using absolute values
  - Adjust sign of quotient and remainder as required

                 1001       quotient
    divisor 1000)1001010    dividend
                -1000
                   10
                   101
                   1010
                  -1000
                     10     remainder

- n-bit operands yield n-bit quotient and remainder
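Restoring division can be sketched as a loop in C (names are mine; the caller is responsible for the divisor ≠ 0 check noted above). Keeping the running remainder wider than the divisor lets the "restore" step become simply not subtracting:

```c
#include <assert.h>
#include <stdint.h>

/* Restoring division: bring down one dividend bit per step, try the
 * subtract, and emit a 1 quotient bit if it succeeds (a real restoring
 * divider subtracts first and adds the divisor back on a negative
 * result; comparing first is equivalent). */
void restoring_div(uint32_t dividend, uint32_t divisor,
                   uint32_t *quotient, uint32_t *remainder) {
    uint32_t q = 0;
    uint64_t rem = 0;
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1);  /* bring down next bit */
        if (rem >= divisor) {                      /* subtract succeeds */
            rem -= divisor;
            q |= 1u << i;                          /* 1 bit in quotient */
        }                                          /* else 0 bit, nothing to restore */
    }
    *quotient = q;
    *remainder = (uint32_t)rem;
}
```

The slide's example is 1001010 ÷ 1000, i.e. 74 ÷ 8 = 9 remainder 2.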

Division Hardware
[Figure: division hardware — the divisor is initially in the left half of the divisor register; the remainder register initially holds the dividend]

Optimized Divider
- One cycle per partial-remainder subtraction
- Looks a lot like a multiplier!
  - Same hardware can be used for both

Faster Division
- Can't use parallel hardware as in multiplier
  - Subtraction is conditional on sign of remainder
- Faster dividers (e.g., SRT division) generate multiple quotient bits per step
  - Named for Sweeney, Robertson, and Tocher
  - SRT division uses a lookup table based on the dividend and the divisor to determine each quotient digit
  - Still requires multiple steps

RISC-V Division
- Four instructions:
  - div, rem: signed divide, remainder
  - divu, remu: unsigned divide, remainder
- Overflow and division-by-zero don't produce errors
  - Just return defined results
  - Faster for the common case of no error

§3.5 Floating Point
Floating Point
- Representation for non-integral numbers
  - Including very small and very large numbers
- Like scientific notation
  - –2.34 × 10^56 (normalized)
  - +0.002 × 10^–4 (not normalized)
  - +987.02 × 10^9 (not normalized)
- In binary
  - ±1.xxxxxxx₂ × 2^yyyy
- Types float and double in C
Floating Point Standard
- Defined by IEEE Std 754-1985
- Developed in response to divergence of representations
  - Portability issues for scientific code
- Now almost universally adopted
- Two representations
  - Single precision (32-bit)
  - Double precision (64-bit)

IEEE Floating-Point Format

    S | Exponent (single: 8 bits, double: 11 bits) | Fraction (single: 23 bits, double: 52 bits)

    x = (–1)^S × (1 + Fraction) × 2^(Exponent – Bias)

- S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
- Normalize significand: 1.0 ≤ |significand| < 2.0
  - Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  - Significand is Fraction with the "1." restored
- Exponent: excess representation: actual exponent + Bias
  - Ensures exponent is unsigned
  - Single: Bias = 127; Double: Bias = 1023
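The field layout can be checked by pulling a float apart in C; memcpy is the portable way to view a float's bits (struct and names below are mine, for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split an IEEE single into its S, Exponent, and Fraction fields. */
typedef struct { uint32_t sign, exponent, fraction; } fp_fields;

fp_fields decode_single(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret float as raw bits */
    fp_fields r;
    r.sign     = bits >> 31;          /* 1 bit */
    r.exponent = (bits >> 23) & 0xFF; /* 8 bits, biased by 127 */
    r.fraction = bits & 0x7FFFFF;     /* 23 bits; hidden 1 not stored */
    return r;
}
```

For 1.0f the actual exponent is 0, so the stored exponent field is the bias itself, 127.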

Single-Precision Range
- Exponents 00000000 and 11111111 reserved
- Smallest value
  - Exponent: 00000001 ⇒ actual exponent = 1 – 127 = –126
  - Fraction: 000…00 ⇒ significand = 1.0
  - ±1.0 × 2^–126 ≈ ±1.2 × 10^–38
- Largest value
  - Exponent: 11111110 ⇒ actual exponent = 254 – 127 = +127
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+127 ≈ ±3.4 × 10^+38

Double-Precision Range
- Exponents 0000…00 and 1111…11 reserved
- Smallest value
  - Exponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022
  - Fraction: 000…00 ⇒ significand = 1.0
  - ±1.0 × 2^–1022 ≈ ±2.2 × 10^–308
- Largest value
  - Exponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308

Floating-Point Precision
- Relative precision
  - all fraction bits are significant
  - Single: approx 2^–23
    - Equivalent to 23 × log₁₀2 ≈ 23 × 0.3 ≈ 6 decimal digits of precision
  - Double: approx 2^–52
    - Equivalent to 52 × log₁₀2 ≈ 52 × 0.3 ≈ 16 decimal digits of precision

Floating-Point Example
- Represent –0.75
  - –0.75 = (–1)^1 × 1.1₂ × 2^–1 = –(1.5/2) = –0.75
  - S = 1
  - Fraction = 1000…00₂
  - Exponent = –1 + Bias
    - Single: –1 + 127 = 126 = 01111110₂
    - Double: –1 + 1023 = 1022 = 01111111110₂
- Single: 1 01111110 1000…00
- Double: 1 01111111110 1000…00
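The worked encoding can be checked on a real machine: the single-precision pattern 1 01111110 100…0 is the hex word 0xBF400000 (a sketch; the helper name is mine):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE single-precision bit pattern of a float. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

The same helper confirms 0xC0A00000 (sign 1, exponent 129, fraction 0.01₂) encodes –5.0.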

Floating-Point Example
- What number is represented by the single-precision float 1 10000001 01000…00?
  - S = 1
  - Fraction = 01000…00₂
  - Exponent = 10000001₂ = 129
- x = (–1)^1 × (1 + 0.01₂) × 2^(129 – 127)
    = (–1) × 1.25 × 2^2
    = –5.0

Denormal Numbers
- Exponent = 000...0 ⇒ hidden bit is 0

    x = (–1)^S × (0 + Fraction) × 2^–Bias

- Smaller than normal numbers
  - allow for gradual underflow, with diminishing precision
- Denormal with Fraction = 000...0:

    x = (–1)^S × (0 + 0) × 2^–Bias = ±0.0

  Two representations of 0.0!

Infinities and NaNs
- Exponent = 111...1, Fraction = 000...0
  - ±Infinity
  - Can be used in subsequent calculations, avoiding need for overflow check
- Exponent = 111...1, Fraction ≠ 000...0
  - Not-a-Number (NaN)
  - Indicates illegal or undefined result
    - e.g., 0.0 / 0.0
  - Can be used in subsequent calculations
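These behaviors are observable from C, assuming IEEE semantics (i.e. no -ffast-math; the helper names are mine). The divisions go through a variable so the compiler performs them at run time:

```c
#include <assert.h>
#include <math.h>

/* 1.0/0.0 yields +Infinity and 0.0/0.0 yields NaN under IEEE 754;
 * strict ISO C leaves these undefined, so this relies on the
 * (near-universal) IEEE behavior. */
double make_inf(void) { volatile double z = 0.0; return 1.0 / z; }
double make_nan(void) { volatile double z = 0.0; return z / z; }
```

Note that NaN compares unequal even to itself, which is the standard idiom for detecting it.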

Floating-Point Addition
- Consider a 4-digit decimal example
  - 9.999 × 10^1 + 1.610 × 10^–1
- 1. Align decimal points
  - Shift number with smaller exponent
  - 9.999 × 10^1 + 0.016 × 10^1
- 2. Add significands
  - 9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
- 3. Normalize result & check for over/underflow
  - 1.0015 × 10^2
- 4. Round and renormalize if necessary
  - 1.002 × 10^2

Floating-Point Addition
- Now consider a 4-digit binary example
  - 1.000₂ × 2^–1 + –1.110₂ × 2^–2 (0.5 + –0.4375)
- 1. Align binary points
  - Shift number with smaller exponent
  - 1.000₂ × 2^–1 + –0.111₂ × 2^–1
- 2. Add significands
  - 1.000₂ × 2^–1 + –0.111₂ × 2^–1 = 0.001₂ × 2^–1
- 3. Normalize result & check for over/underflow
  - 1.000₂ × 2^–4, with no over/underflow
- 4. Round and renormalize if necessary
  - 1.000₂ × 2^–4 (no change) = 0.0625

FP Adder Hardware
- Much more complex than integer adder
- Doing it in one clock cycle would take too long
  - Much longer than integer operations
  - Slower clock would penalize all instructions
- FP adder usually takes several cycles
  - Can be pipelined

FP Adder Hardware
[Figure: FP adder datapath, with blocks labeled Step 1 through Step 4 of the addition algorithm]

Floating-Point Multiplication
- Consider a 4-digit decimal example
  - 1.110 × 10^10 × 9.200 × 10^–5
- 1. Add exponents
  - For biased exponents, subtract bias from sum
  - New exponent = 10 + –5 = 5
- 2. Multiply significands
  - 1.110 × 9.200 = 10.212 ⇒ 10.212 × 10^5
- 3. Normalize result & check for over/underflow
  - 1.0212 × 10^6
- 4. Round and renormalize if necessary
  - 1.021 × 10^6
- 5. Determine sign of result from signs of operands
  - +1.021 × 10^6

Floating-Point Multiplication
- Now consider a 4-digit binary example
  - 1.000₂ × 2^–1 × –1.110₂ × 2^–2 (0.5 × –0.4375)
- 1. Add exponents
  - Unbiased: –1 + –2 = –3
  - Biased: (–1 + 127) + (–2 + 127) = –3 + 254 – 127 = –3 + 127
- 2. Multiply significands
  - 1.000₂ × 1.110₂ = 1.110₂ ⇒ 1.110₂ × 2^–3
- 3. Normalize result & check for over/underflow
  - 1.110₂ × 2^–3 (no change) with no over/underflow
- 4. Round and renormalize if necessary
  - 1.110₂ × 2^–3 (no change)
- 5. Determine sign: (+ve) × (–ve) ⇒ –ve
  - –1.110₂ × 2^–3 = –0.21875

FP Arithmetic Hardware
- FP multiplier is of similar complexity to FP adder
  - But uses a multiplier for significands instead of an adder
- FP arithmetic hardware usually does
  - Addition, subtraction, multiplication, division, reciprocal, square root
  - FP ↔ integer conversion
- Operations usually take several cycles
  - Can be pipelined

FP Instructions in RISC-V
- Separate FP registers: f0, …, f31
  - double-precision
  - single-precision values stored in the lower 32 bits
- FP instructions operate only on FP registers
  - Programs generally don't do integer ops on FP data, or vice versa
  - More registers with minimal code-size impact
- FP load and store instructions
  - flw, fld
  - fsw, fsd

FP Instructions in RISC-V
- Single-precision arithmetic
  - fadd.s, fsub.s, fmul.s, fdiv.s, fsqrt.s
  - e.g., fadd.s f2, f4, f6
- Double-precision arithmetic
  - fadd.d, fsub.d, fmul.d, fdiv.d, fsqrt.d
  - e.g., fadd.d f2, f4, f6
- Single- and double-precision comparison
  - feq.s, flt.s, fle.s
  - feq.d, flt.d, fle.d
  - Result is 0 or 1 in integer destination register
- Use beq, bne to branch on comparison result
  - RISC-V has no FP condition codes or dedicated FP branch instructions

FP Instructions in RISC-V
[Figure: table of RISC-V floating-point instructions]
FP Example: °F to °C
- C code:

    float f2c (float fahr) {
      return ((5.0/9.0)*(fahr - 32.0));
    }

  - fahr in f10, result in f10, literals in global memory space
- Compiled RISC-V code:

    const5:  .dword 5
    const9:  .dword 9
    const32: .dword 32
    f2c:
      flw    f0,const5(x3)   // f0 = 5.0f; x3 is the global pointer (gp)
      flw    f1,const9(x3)   // f1 = 9.0f
      fdiv.s f0,f0,f1        // f0 = 5.0f / 9.0f
      flw    f1,const32(x3)  // f1 = 32.0f
      fsub.s f10,f10,f1      // f10 = fahr - 32.0f
      fmul.s f10,f0,f10      // f10 = (5.0f/9.0f)*(fahr-32.0f)
      jalr   x0,0(x1)        // return
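The C function from the slide can be sanity-checked directly. 32 °F maps exactly to 0, while 212 °F maps to 100 only to within float rounding, since 5.0/9.0 is not exact in binary:

```c
#include <assert.h>
#include <math.h>

/* The f2c function from the slide, verbatim. */
float f2c(float fahr) {
    return ((5.0 / 9.0) * (fahr - 32.0));
}
```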

FP Example: Array Multiplication
- C = C + A × B
  - All 32 × 32 matrices, 64-bit double-precision elements
- C code:

    void mm (double c[32][32],
             double a[32][32], double b[32][32]) {
      size_t i, j, k;
      for (i = 0; i < 32; i = i + 1)
        for (j = 0; j < 32; j = j + 1)
          for (k = 0; k < 32; k = k + 1)
            c[i][j] = c[i][j]
                    + a[i][k] * b[k][j];
    }

- Addresses of c, a, b in x10, x11, x12, and i, j, k in x5, x6, x7
FP Example: Array Multiplication
offset of element A[i][j] = (i*32 + j)*8
n RISC-V code:
mm:...
li x28,32 // x28 = 32 (row size/loop end)
li x5,0 // i = 0; initialize 1st for loop
L1: li x6,0 // j = 0; initialize 2nd for loop
L2: li x7,0 // k = 0; initialize 3rd for loop
slli x30,x5,5 // x30 = i * 2**5 (size of row of c)
add x30,x30,x6 // x30 = i * size(row) + j
slli x30,x30,3 // x30 = byte offset of [i][j]
add x30,x10,x30 // x30 = byte address of c[i][j]
fld f0,0(x30) // f0 = c[i][j]
L3: slli x29,x7,5 // x29 = k * 2**5 (size of row of b)
add x29,x29,x6 // x29 = k * size(row) + j
slli x29,x29,3 // x29 = byte offset of [k][j]
add x29,x12,x29 // x29 = byte address of b[k][j]
fld f1,0(x29) // f1 = b[k][j]

FP Example: Array Multiplication

slli x29,x5,5 // x29 = i * 2**5 (size of row of a)
add x29,x29,x7 // x29 = i * size(row) + k
slli x29,x29,3 // x29 = byte offset of [i][k]
add x29,x11,x29 // x29 = byte address of a[i][k]
fld f2,0(x29) // f2 = a[i][k]
fmul.d f1, f2, f1 // f1 = a[i][k] * b[k][j]
fadd.d f0, f0, f1 // f0 = c[i][j] + a[i][k] * b[k][j]
fsd f0,0(x30) // c[i][j] = f0
addi x7,x7,1 // k = k + 1
bltu x7,x28,L3 // if (k < 32) go to L3
addi x6,x6,1 // j = j + 1
bltu x6,x28,L2 // if (j < 32) go to L2
addi x5,x5,1 // i = i + 1
bltu x5,x28,L1 // if (i < 32) go to L1

Accurate Arithmetic
- IEEE Std 754 specifies additional rounding control
  - Extra bits of precision (guard, round, sticky)
  - Choice of rounding modes
  - Allows programmer to fine-tune numerical behavior of a computation
- Not all FP units implement all options
  - Most programming languages and FP libraries just use defaults
- Trade-off between hardware complexity, performance, and market requirements

Rounding
- Rounding occurs when converting:
  - Double to single precision
  - Floating point # to an integer
- FP hardware carries 3 extra bits of precision, and rounds for proper value: guard, round, sticky bits.
  - The goal is to obtain the final result as if the intermediate results were calculated using infinite precision, and then rounded.

  Mantissa format plus extra bits:

    1.XXXXXXXXXXXXXXXXXXXXXXX 0 0 0
    ^ ^                       ^ ^ ^
    | |                       | | +-- sticky bit (s)
    | |                       | +---- round bit (r)
    | |                       +------ guard bit (g)
    | +-- 23-bit mantissa from a representation
    +---- hidden bit
Rounding
- When a mantissa is to be shifted in order to align radix points, the bits that fall off the least significant end of the mantissa go into these extra bits.
- The guard and round bits are just 2 extra bits of precision used in calculations.
- The sticky bit is an indication of what is/could be in lesser significant bits that are not kept:
  - If a value of '1' is ever shifted into the sticky bit position, that sticky bit remains a '1' ("sticks" at 1), despite further shifts.
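The shifting mechanics can be sketched in C (struct and names are mine): each right shift moves the mantissa's outgoing bit into guard, guard into round, and ORs the old round bit into sticky, which then never clears.

```c
#include <assert.h>
#include <stdint.h>

/* One alignment shift with guard/round/sticky tracking. */
typedef struct { uint32_t mantissa; int g, r, s; } align_t;

void shift_right_grs(align_t *a) {
    a->s |= a->r;             /* round bit falls into sticky (ORed, so it "sticks") */
    a->r  = a->g;             /* guard becomes round */
    a->g  = a->mantissa & 1;  /* bit shifted out becomes guard */
    a->mantissa >>= 1;
}
```

The test below replays the 24-bit mantissa 1.11000000000000000000100 (0xE00004) from the shifting example that follows: g/r/s go 1 0 0 after 3 shifts, 0 1 0 after 4, and 0 0 1 from shift 5 onward.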

Rounding Example
Mantissa from representation, 11000000000000000000100
must be shifted by 8 places (to align radix points)
g r s
Before first shift: 1.11000000000000000000100 0 0 0
After 1 shift: 0.11100000000000000000010 0 0 0
After 2 shifts: 0.01110000000000000000001 0 0 0
After 3 shifts: 0.00111000000000000000000 1 0 0
After 4 shifts: 0.00011100000000000000000 0 1 0
After 5 shifts: 0.00001110000000000000000 0 0 1
After 6 shifts: 0.00000111000000000000000 0 0 1
After 7 shifts: 0.00000011100000000000000 0 0 1
After 8 shifts: 0.00000001110000000000000 0 0 1

Rounding
- IEEE four modes of rounding:
  1. Round towards +∞
     - ALWAYS round "up": 2.1 → 3; –2.1 → –2
  2. Round towards –∞
     - ALWAYS round "down": 1.9 → 1; –1.9 → –2
  3. Truncate
     - Just drop the extra bits (round towards 0)
  4. Round to nearest even (case of 0.5)
     - 3.5 → 4
     - 2.5 → 2 if sticky bit = 0
     - 2.5 → 3 if sticky bit = 1
§3.6 Parallelism and Computer Arithmetic: Subword Parallelism
Subword Parallelism
- Graphics and audio applications can take advantage of performing simultaneous operations on short vectors
- Example: 128-bit adder:
  - Sixteen 8-bit adds
  - Eight 16-bit adds
  - Four 32-bit adds
- Also called data-level parallelism, vector parallelism, or Single Instruction, Multiple Data (SIMD)
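The partitioned carry chain can even be imitated in plain C, a technique sometimes called SWAR (SIMD within a register; the function name is mine). Real SIMD hardware simply cuts the adder's carry chain at lane boundaries; in software we mask the lane-top bits so carries cannot cross, then add them back carry-free with XOR:

```c
#include <assert.h>
#include <stdint.h>

/* Eight independent 8-bit adds inside one 64-bit add. */
uint64_t paddb(uint64_t x, uint64_t y) {
    const uint64_t H = 0x8080808080808080ull;  /* top bit of each 8-bit lane */
    uint64_t low = (x & ~H) + (y & ~H);        /* add low 7 bits per lane; no cross-lane carry */
    return low ^ ((x ^ y) & H);                /* add the top bits without carry-out */
}
```

The second test below shows a lane wrapping (0xFF + 0x01 = 0x00) without disturbing its neighbor.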
§3.9 Fallacies and Pitfalls
Right Shift and Division
- Left shift by i places multiplies an integer by 2^i
- Right shift divides by 2^i?
  - Only for unsigned integers
- For signed integers
  - Arithmetic right shift: replicate the sign bit
  - e.g., –5 / 4
    - 11111011₂ >> 2 = 11111110₂ = –2
    - Rounds toward –∞
  - c.f. logical right shift: 11111011₂ >>> 2 = 00111110₂ = +62
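The pitfall is easy to reproduce in C (helper names are mine). Note two caveats hedged in the comments: true integer division in C rounds toward zero, so –5/4 gives –1, not the –2 the shift produces; and >> on negative signed operands is implementation-defined in ISO C, though mainstream compilers shift arithmetically.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift (typical behavior: sign bit replicated). */
int8_t asr8(int8_t x, int k)  { return (int8_t)(x >> k); }

/* Logical right shift of the same bit pattern (zeros shifted in). */
uint8_t lsr8(uint8_t x, int k) { return (uint8_t)(x >> k); }
```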

Associativity
- Parallel programs may interleave operations in unexpected orders
  - Assumptions of associativity may fail

                 (x+y)+z       x+(y+z)
    x           –1.50E+38     –1.50E+38
    y            1.50E+38      1.50E+38
    z            1.0           1.0
    result       1.00E+00      0.00E+00
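This failure of associativity is reproducible in single precision (helper names are mine; assumes default IEEE arithmetic, no -ffast-math): 1.5e38 + 1.0 rounds back to 1.5e38 because 1.0 is far below one ulp at that magnitude, so the 1.0 is lost in one grouping and preserved in the other.

```c
#include <assert.h>

/* The two groupings of the same three-operand sum. */
float assoc_left(float x, float y, float z)  { return (x + y) + z; }
float assoc_right(float x, float y, float z) { return x + (y + z); }
```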

- Need to validate parallel programs under varying degrees of parallelism
§3.10 Concluding Remarks
Concluding Remarks
- Bits have no inherent meaning
  - Interpretation depends on the instructions applied
- Computer representations of numbers
  - Finite range and precision
  - Need to account for this in programs

Concluding Remarks
- ISAs support arithmetic
  - Signed and unsigned integers
  - Floating-point approximation to reals
- Bounded range and precision
- Operations can overflow and underflow
