
EEL580 - Arquitetura de Computadores: Arithmetic for Computers

Instructor: Diego L. C. Dutra


Outline

● Introduction
● Multiplication
● Division
● Floating Point
● Parallelism and Computer Arithmetic
● Real Stuff
● Going Faster
● Fallacies and Pitfalls
● Concluding Remarks
Introduction

● Operations on integers
○ Addition and subtraction
■ Overflow if result out of range
○ Multiplication and division
○ Dealing with overflow
● Floating-point real numbers
○ Representation and operations
● Arithmetic for Multimedia
○ Graphics and media processing operates on vectors of 8-bit and 16-bit data
■ Use 64-bit adder, with partitioned carry chain
■ Operate on 8×8-bit, 4×16-bit, or 2×32-bit vectors
○ SIMD (single-instruction, multiple-data)
■ Saturating operations
■ On overflow, result is largest representable value
● cf. 2s-complement modulo arithmetic
■ E.g., clipping in audio, saturation in video
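A minimal sketch of what one lane of a partitioned, saturating SIMD adder computes, written as scalar C; the helper name sat_add_u8 is illustrative, not from the slides:

```c
#include <stdint.h>

/* 8-bit unsigned saturating add: on overflow, clamp to the largest
   representable value instead of wrapping around (modulo arithmetic). */
uint8_t sat_add_u8(uint8_t a, uint8_t b) {
    uint16_t sum = (uint16_t)a + b;         /* compute in a wider type */
    return sum > 255 ? 255 : (uint8_t)sum;  /* saturate at 255 */
}
```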
Multiplication: Basic
Multiplication: Optimized Multiplier

● Perform steps in parallel: add/shift
● One cycle per partial-product addition
○ That's OK, if frequency of multiplications is low
● Faster: uses multiple adders
○ Cost/performance tradeoff
● Can be pipelined
○ Several multiplications performed in parallel
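A sketch, in scalar C, of the shift-and-add algorithm that the multiplier hardware implements: one partial-product addition per multiplier bit, with the shifts happening in the same step. The function name mul_shift_add is illustrative.

```c
#include <stdint.h>

/* 32 × 32 -> 64-bit unsigned multiply by repeated shift-and-add. */
uint64_t mul_shift_add(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = 0;
    uint64_t mcand = multiplicand;   /* multiplicand, shifted left each step */
    for (int i = 0; i < 32; ++i) {
        if (multiplier & 1)          /* low bit of multiplier selects the add */
            product += mcand;        /* add the current partial product */
        mcand <<= 1;                 /* shift multiplicand left */
        multiplier >>= 1;            /* shift multiplier right */
    }
    return product;
}
```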
Division: Basic
Division: Optimized Divider

● One cycle per partial-remainder subtraction
● Looks a lot like a multiplier!
○ Same hardware can be used for both
● Faster division
○ Can't use parallel hardware as in multiplier
■ Subtraction is conditional on sign of remainder
■ We can still use parallel hardware, but with diminishing returns
○ Faster dividers (e.g., Newton-Raphson) generate multiple quotient bits per step
■ Still require multiple steps
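A scalar-C sketch of the restoring (shift-and-subtract) division that the basic divider hardware performs: one conditional subtraction of the divisor per quotient bit. The function name div_restoring is illustrative; it assumes a nonzero divisor.

```c
#include <stdint.h>

/* Unsigned 32-bit restoring division: produces quotient and remainder. */
void div_restoring(uint32_t dividend, uint32_t divisor,
                   uint32_t *quotient, uint32_t *remainder) {
    uint64_t rem = 0;
    uint32_t quot = 0;
    for (int i = 31; i >= 0; --i) {
        rem = (rem << 1) | ((dividend >> i) & 1);  /* bring down next dividend bit */
        quot <<= 1;
        if (rem >= divisor) {      /* subtraction is conditional on the comparison */
            rem -= divisor;
            quot |= 1;             /* this quotient bit is 1 */
        }
    }
    *quotient = quot;
    *remainder = (uint32_t)rem;
}
```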
RISC-V Multiplication & Division

● RISC-V Division: Overflow and division-by-zero don’t produce errors


○ Just return defined results
○ Faster for the common case of no error
Floating Point

● Representation for non-integral numbers
○ Including very small and very large numbers
● Types float and double in C
● Defined by IEEE Std 754-1985
● Developed in response to divergence of representations
○ Portability issues for scientific code
● Now almost universally adopted
● Two representations
○ Single precision (32-bit)
○ Double precision (64-bit)
● IEEE Floating-Point Format
[Format layout: S | Exponent | Fraction — Exponent: 8 bits (single) / 11 bits (double); Fraction: 23 bits (single) / 52 bits (double)]
○ S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
○ Normalize significand: 1.0 ≤ |significand| < 2.0
■ Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
■ Significand is Fraction with the "1." restored
○ Exponent: excess representation: actual exponent + Bias
■ Ensures exponent is unsigned
■ Single: Bias = 127; Double: Bias = 1023
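A minimal sketch that decodes the single-precision fields described above, using -0.75 = -1.1₂ × 2^-1 as the worked example; the helper name print_fields is illustrative.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Print the IEEE 754 single-precision fields of x. */
void print_fields(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);            /* reinterpret the 32 bits of x */
    uint32_t sign     = bits >> 31;            /* 1-bit sign */
    uint32_t exponent = (bits >> 23) & 0xFF;   /* 8-bit biased exponent (bias 127) */
    uint32_t fraction = bits & 0x7FFFFF;       /* 23-bit fraction (hidden bit not stored) */
    printf("S=%u  E=%u (actual %d)  F=0x%06X\n",
           sign, exponent, (int)exponent - 127, fraction);
}

int main(void) {
    print_fields(-0.75f);   /* expect S=1, E=126 (actual -1), F=0x400000 */
    return 0;
}
```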
Floating Point

● Relative precision
○ All fraction bits are significant
○ Single: approx 2^–23
■ Equivalent to 23 × log₁₀ 2 ≈ 23 × 0.3 ≈ 6 decimal digits of precision
○ Double: approx 2^–52
■ Equivalent to 52 × log₁₀ 2 ≈ 52 × 0.3 ≈ 16 decimal digits of precision
● Denormal Numbers
○ Exponent = 000...0 ⇒ hidden bit is 0
○ Smaller than normal numbers
■ Allow for gradual underflow, with diminishing precision
○ Denormal with fraction = 000...0 ⇒ ±0.0
● Infinities and NaNs
○ Exponent = 111...1, Fraction = 000...0
■ ±Infinity
■ Can be used in subsequent calculations, avoiding need for overflow check
○ Exponent = 111...1, Fraction ≠ 000...0
■ Not-a-Number (NaN)
■ Indicates illegal or undefined result (e.g., 0.0 / 0.0)
■ Can be used in subsequent calculations
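A small sketch of the special values above: under IEEE 754 default behavior, dividing by zero yields ±Infinity, 0.0/0.0 yields NaN, and neither traps.

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    volatile double zero = 0.0;                /* volatile keeps the divisions at run time */
    double inf_val = 1.0 / zero;               /* +Infinity */
    double nan_val = zero / zero;              /* NaN */
    printf("isinf=%d isnan=%d\n", isinf(inf_val), isnan(nan_val));  /* prints 1 1 */
    printf("NaN == NaN? %d\n", nan_val == nan_val);   /* prints 0: NaN is unordered */
    return 0;
}
```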
Floating Point: Adder HW

● Step 1: Compare exponents; shift the smaller number right until its exponent matches the larger
● Step 2: Add the significands
● Step 3: Normalize the sum, checking for exponent overflow or underflow
● Step 4: Round the significand and renormalize if necessary
Floating Point: HW

● FP Adder Hardware


○ Much more complex than integer adder
○ Doing it in one clock cycle would take too long
■ Much longer than integer operations
■ Slower clock would penalize all instructions
○ FP adder usually takes several cycles
■ Can be pipelined
● FP Arithmetic Hardware & Accurate Arithmetic
○ FP multiplier is of similar complexity to FP adder
■ But uses a multiplier for significands instead of an adder
○ FP arithmetic hardware usually does
■ Addition, subtraction, multiplication, division, reciprocal, square-root
■ FP ↔ integer conversion
○ Operations usually take several cycles
■ Can be pipelined
○ IEEE Std 754 specifies additional rounding control
■ Extra bits of precision (guard, round, sticky)
■ Choice of rounding modes
■ Allows programmer to fine-tune numerical behavior of a computation
■ Not all FP units implement all options: Trade-off between hardware complexity,
performance, and market requirements
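A small sketch of IEEE 754 rounding-mode control from C via <fenv.h>, illustrating the "choice of rounding modes" mentioned above; whether it takes effect depends on the compiler honoring FENV_ACCESS.

```c
#include <stdio.h>
#include <fenv.h>

#pragma STDC FENV_ACCESS ON

int main(void) {
    volatile double x = 1.0, y = 3.0;    /* volatile keeps the divisions at run time */
    fesetround(FE_DOWNWARD);             /* round toward -infinity */
    double down = x / y;
    fesetround(FE_UPWARD);               /* round toward +infinity */
    double up = x / y;
    fesetround(FE_TONEAREST);            /* restore the default mode */
    printf("%.17g\n%.17g\n", down, up);  /* the two results differ in the last bit */
    return 0;
}
```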
Floating Point Instructions in RISC-V
Floating Point Example: °F to °C

● C code:
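The C listing was an image on the original slide; a minimal sketch of the conversion it describes, assuming the usual formula °C = (5/9) × (°F − 32). The function name is illustrative.

```c
/* Convert a temperature from Fahrenheit to Celsius. */
float fahrenheit_to_celsius(float fahr) {
    return (5.0f / 9.0f) * (fahr - 32.0f);
}
```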

● Compiled RISC-V code:


Floating Point Example: Array Multiplication

● C = C + A × B
○ All 32 × 32 matrices, 64-bit double-precision elements
● C code:
○ Addresses of c, a, b in x10, x11, x12, and i, j, k in x5, x6, x7
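The C listing was an image on the original slide; a minimal sketch of the matrix multiply it describes, assuming column-major n × n matrices of doubles (here n = 32). Names are illustrative.

```c
#include <stddef.h>

/* Unoptimized C = C + A × B for column-major n × n double matrices. */
void dgemm(size_t n, double *A, double *B, double *C) {
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double cij = C[i + j * n];               /* cij = C[i][j] */
            for (size_t k = 0; k < n; ++k)
                cij += A[i + k * n] * B[k + j * n];  /* cij += A[i][k] * B[k][j] */
            C[i + j * n] = cij;
        }
}
```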

● RISC-V code:
Real Stuff: Parallelism in Computer Arithmetic and SIMD

● Graphics and audio applications can take advantage of performing simultaneous operations on short vectors
○ Example: 128-bit adder:
■ Sixteen 8-bit adds
■ Eight 16-bit adds
■ Four 32-bit adds
● Also called data-level parallelism, vector parallelism, or Single Instruction,
Multiple Data (SIMD)
● Streaming SIMD Extension 2 (SSE2) on x86_64
Going Faster: Subword Parallelism and Matrix Multiply

● Unoptimized C code:

● Optimized C code:
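Both listings were images on the original slide; the unoptimized version is the dgemm sketch shown earlier. Below is a sketch of a subword-parallel version using AVX intrinsics, assuming column-major matrices of doubles with n a multiple of 4 and 32-byte-aligned storage; each __m256d register holds four doubles, so the i loop advances by 4.

```c
#include <stddef.h>
#include <immintrin.h>

/* Subword-parallel (AVX) C = C + A × B for aligned column-major matrices. */
void dgemm_avx(size_t n, double *A, double *B, double *C) {
    for (size_t i = 0; i < n; i += 4)
        for (size_t j = 0; j < n; ++j) {
            __m256d c0 = _mm256_load_pd(C + i + j * n);   /* four elements of column j of C */
            for (size_t k = 0; k < n; ++k)
                c0 = _mm256_add_pd(c0,
                     _mm256_mul_pd(_mm256_load_pd(A + i + k * n),        /* four elements of A */
                                   _mm256_broadcast_sd(B + k + j * n))); /* B[k][j] replicated 4× */
            _mm256_store_pd(C + i + j * n, c0);           /* store the four results back */
        }
}
```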
Fallacies and Pitfalls

● Fallacy: Right Shift and Division


○ Left shift by i places multiplies an integer by 2^i, and right shift divides by 2^i?
■ Only for unsigned integers (see the sketch below)
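A small sketch of why the fallacy fails for signed values: on typical two's-complement machines an arithmetic right shift rounds toward negative infinity, while C integer division truncates toward zero (and right-shifting a negative value is implementation-defined in C).

```c
#include <stdio.h>

int main(void) {
    int x = -5;
    printf("%d %d\n", x >> 1, x / 2);   /* commonly prints "-3 -2" */
    return 0;
}
```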
● Pitfall: Floating-point addition is not associative.
○ Parallel programs may interleave operations in unexpected orders
■ Assumptions of associativity may fail (see the sketch below)
○ Need to validate parallel programs under varying degrees of parallelism
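A small sketch of the non-associativity pitfall: adding the large positive and negative values first cancels them exactly, while adding the small value to the large one first loses it to rounding.

```c
#include <stdio.h>

int main(void) {
    float big = 1.0e20f, small = 1.0f;
    printf("%g\n", (-big + big) + small);   /* prints 1 */
    printf("%g\n", -big + (big + small));   /* prints 0: small is lost in big + small */
    return 0;
}
```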

● Fallacy: Parallel execution strategies that work for integer data types also work for floating-point data types.
Fallacies and Pitfalls

● Fallacy: Only theoretical mathematicians care about floating-point accuracy.


○ Important for scientific code
○ But for everyday consumer use?
○ “My bank balance is out by 0.0002¢!” ☹
○ The Intel Pentium FDIV bug
■ The market expects accuracy
■ Recall cost $500 million
■ See Colwell, The Pentium Chronicles
Concluding Remarks

● Bits have no inherent meaning
○ Interpretation depends on the instructions applied
[Figure: The frequency of the RISC-V instructions for the SPEC CPU2006 benchmarks]
● Computer representations of
numbers
○ Finite range and precision
○ Need to account for this in programs
● ISAs support arithmetic
○ Signed and unsigned integers
○ Floating-point approximation to reals
● Bounded range and precision
○ Operations can overflow and underflow
Questions
