
EEL580 - Arquitetura de Computadores: Arithmetic for Computers

Instructor: Diego L. C. Dutra


Outline

● Introduction
● Multiplication
● Division
● Floating Point
● Parallelism and Computer Arithmetic
● Real Stuff
● Going Faster
● Fallacies and Pitfalls
● Concluding Remarks
Introduction

● Operations on integers
○ Addition and subtraction
■ Overflow if result out of range
○ Multiplication and division
○ Dealing with overflow
● Floating-point real numbers
○ Representation and operations
● Arithmetic for Multimedia
○ Graphics and media processing operates on vectors of 8-bit and 16-bit data
■ Use 64-bit adder, with partitioned carry chain
■ Operate on 8×8-bit, 4×16-bit, or 2×32-bit vectors
○ SIMD (single-instruction, multiple-data)
■ Saturating operations
■ On overflow, result is largest representable value
● cf. 2s-complement modulo arithmetic
■ E.g., clipping in audio, saturation in video
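A minimal sketch of what one lane of a partitioned, saturating SIMD adder computes, written as scalar C; the helper name sat_add_u8 is illustrative, not from the slides:

```c
#include <stdint.h>

/* 8-bit unsigned saturating add: on overflow, clamp to the largest
   representable value instead of wrapping around (modulo arithmetic). */
uint8_t sat_add_u8(uint8_t a, uint8_t b) {
    uint16_t sum = (uint16_t)a + b;         /* compute in a wider type */
    return sum > 255 ? 255 : (uint8_t)sum;  /* saturate at 255 */
}
```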
Multiplication: Basic
Multiplication: Optimized Multiplier

● Perform steps in parallel: add/shift
● One cycle per partial-product addition
○ That's OK, if frequency of multiplications is low
● Faster: uses multiple adders
○ Cost/performance tradeoff
● Can be pipelined
○ Several multiplications performed in parallel
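A sketch, in scalar C, of the shift-and-add algorithm that the multiplier hardware implements: one partial-product addition per multiplier bit, with the shifts happening in the same step. The function name mul_shift_add is illustrative.

```c
#include <stdint.h>

/* 32 × 32 -> 64-bit unsigned multiply by repeated shift-and-add. */
uint64_t mul_shift_add(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = 0;
    uint64_t mcand = multiplicand;   /* multiplicand, shifted left each step */
    for (int i = 0; i < 32; ++i) {
        if (multiplier & 1)          /* low bit of multiplier selects the add */
            product += mcand;        /* add the current partial product */
        mcand <<= 1;                 /* shift multiplicand left */
        multiplier >>= 1;            /* shift multiplier right */
    }
    return product;
}
```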
Division: Basic
Division: Optimized Divider

● One cycle per partial-remainder subtraction
● Looks a lot like a multiplier!
○ Same hardware can be used for both
● Faster division
○ Can't use parallel hardware as in multiplier
■ Subtraction is conditional on sign of remainder
■ We can still use parallel hardware, but with diminishing returns
○ Faster dividers (e.g., Newton-Raphson) generate multiple quotient bits per step
■ Still require multiple steps
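A scalar-C sketch of the restoring (shift-and-subtract) division that the basic divider hardware performs: one conditional subtraction of the divisor per quotient bit. The function name div_restoring is illustrative; it assumes a nonzero divisor.

```c
#include <stdint.h>

/* Unsigned 32-bit restoring division: produces quotient and remainder. */
void div_restoring(uint32_t dividend, uint32_t divisor,
                   uint32_t *quotient, uint32_t *remainder) {
    uint64_t rem = 0;
    uint32_t quot = 0;
    for (int i = 31; i >= 0; --i) {
        rem = (rem << 1) | ((dividend >> i) & 1);  /* bring down next dividend bit */
        quot <<= 1;
        if (rem >= divisor) {      /* subtraction is conditional on the comparison */
            rem -= divisor;
            quot |= 1;             /* this quotient bit is 1 */
        }
    }
    *quotient = quot;
    *remainder = (uint32_t)rem;
}
```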
RISC-V Multiplication & Division

● RISC-V Division: Overflow and division-by-zero don’t produce errors


○ Just return defined results
○ Faster for the common case of no error
Floating Point

● Representation for non-integral numbers
○ Including very small and very large numbers
● Types float and double in C
● Defined by IEEE Std 754-1985
● Developed in response to divergence of representations
○ Portability issues for scientific code
● Now almost universally adopted
● Two representations
○ Single precision (32-bit)
○ Double precision (64-bit)
● IEEE Floating-Point Format
[Format layout: S | Exponent | Fraction — Exponent: 8 bits (single) / 11 bits (double); Fraction: 23 bits (single) / 52 bits (double)]
○ S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
○ Normalize significand: 1.0 ≤ |significand| < 2.0
■ Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
■ Significand is Fraction with the "1." restored
○ Exponent: excess representation: actual exponent + Bias
■ Ensures exponent is unsigned
■ Single: Bias = 127; Double: Bias = 1023
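A minimal sketch that decodes the single-precision fields described above, using -0.75 = -1.1₂ × 2^-1 as the worked example; the helper name print_fields is illustrative.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Print the IEEE 754 single-precision fields of x. */
void print_fields(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);            /* reinterpret the 32 bits of x */
    uint32_t sign     = bits >> 31;            /* 1-bit sign */
    uint32_t exponent = (bits >> 23) & 0xFF;   /* 8-bit biased exponent (bias 127) */
    uint32_t fraction = bits & 0x7FFFFF;       /* 23-bit fraction (hidden bit not stored) */
    printf("S=%u  E=%u (actual %d)  F=0x%06X\n",
           sign, exponent, (int)exponent - 127, fraction);
}

int main(void) {
    print_fields(-0.75f);   /* expect S=1, E=126 (actual -1), F=0x400000 */
    return 0;
}
```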
Floating Point

● Relative precision
○ All fraction bits are significant
○ Single: approx 2^–23
■ Equivalent to 23 × log₁₀ 2 ≈ 23 × 0.3 ≈ 6 decimal digits of precision
○ Double: approx 2^–52
■ Equivalent to 52 × log₁₀ 2 ≈ 52 × 0.3 ≈ 16 decimal digits of precision
● Denormal Numbers
○ Exponent = 000...0 ⇒ hidden bit is 0
○ Smaller than normal numbers
■ Allow for gradual underflow, with diminishing precision
○ Denormal with fraction = 000...0 ⇒ ±0.0
● Infinities and NaNs
○ Exponent = 111...1, Fraction = 000...0
■ ±Infinity
■ Can be used in subsequent calculations, avoiding need for overflow check
○ Exponent = 111...1, Fraction ≠ 000...0
■ Not-a-Number (NaN)
■ Indicates illegal or undefined result (e.g., 0.0 / 0.0)
■ Can be used in subsequent calculations
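A small sketch of the special values above: under IEEE 754 default behavior, dividing by zero yields ±Infinity, 0.0/0.0 yields NaN, and neither traps.

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    volatile double zero = 0.0;                /* volatile keeps the divisions at run time */
    double inf_val = 1.0 / zero;               /* +Infinity */
    double nan_val = zero / zero;              /* NaN */
    printf("isinf=%d isnan=%d\n", isinf(inf_val), isnan(nan_val));  /* prints 1 1 */
    printf("NaN == NaN? %d\n", nan_val == nan_val);   /* prints 0: NaN is unordered */
    return 0;
}
```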
Floating Point: Adder HW

● Step 1: Compare exponents; shift the smaller number right until its exponent matches the larger
● Step 2: Add the significands
● Step 3: Normalize the sum, checking for exponent overflow or underflow
● Step 4: Round the significand and renormalize if necessary
Floating Point: HW

● FP Adder Hardware


○ Much more complex than integer adder
○ Doing it in one clock cycle would take too long
■ Much longer than integer operations
■ Slower clock would penalize all instructions
○ FP adder usually takes several cycles
■ Can be pipelined
● FP Arithmetic Hardware & Accurate Arithmetic
○ FP multiplier is of similar complexity to FP adder
■ But uses a multiplier for significands instead of an adder
○ FP arithmetic hardware usually does
■ Addition, subtraction, multiplication, division, reciprocal, square-root
■ FP ↔ integer conversion
○ Operations usually take several cycles
■ Can be pipelined
○ IEEE Std 754 specifies additional rounding control
■ Extra bits of precision (guard, round, sticky)
■ Choice of rounding modes
■ Allows programmer to fine-tune numerical behavior of a computation
■ Not all FP units implement all options: Trade-off between hardware complexity,
performance, and market requirements
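A small sketch of IEEE 754 rounding-mode control from C via <fenv.h>, illustrating the "choice of rounding modes" mentioned above; whether it takes effect depends on the compiler honoring FENV_ACCESS.

```c
#include <stdio.h>
#include <fenv.h>

#pragma STDC FENV_ACCESS ON

int main(void) {
    volatile double x = 1.0, y = 3.0;    /* volatile keeps the divisions at run time */
    fesetround(FE_DOWNWARD);             /* round toward -infinity */
    double down = x / y;
    fesetround(FE_UPWARD);               /* round toward +infinity */
    double up = x / y;
    fesetround(FE_TONEAREST);            /* restore the default mode */
    printf("%.17g\n%.17g\n", down, up);  /* the two results differ in the last bit */
    return 0;
}
```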
Floating Point Instructions in RISC-V
Floating Point Example: °F to °C

● C code:
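The C listing was an image on the original slide; a minimal sketch of the conversion it describes, assuming the usual formula °C = (5/9) × (°F − 32). The function name is illustrative.

```c
/* Convert a temperature from Fahrenheit to Celsius. */
float fahrenheit_to_celsius(float fahr) {
    return (5.0f / 9.0f) * (fahr - 32.0f);
}
```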

● Compiled RISC-V code:


Floating Point Example: Array Multiplication

● C = C + A × B
○ All 32 × 32 matrices, 64-bit double-precision elements
● C code:
○ Addresses of c, a, b in x10, x11, x12, and i, j, k in x5, x6, x7
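The C listing was an image on the original slide; a minimal sketch of the matrix multiply it describes, assuming column-major n × n matrices of doubles (here n = 32). Names are illustrative.

```c
#include <stddef.h>

/* Unoptimized C = C + A × B for column-major n × n double matrices. */
void dgemm(size_t n, double *A, double *B, double *C) {
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double cij = C[i + j * n];               /* cij = C[i][j] */
            for (size_t k = 0; k < n; ++k)
                cij += A[i + k * n] * B[k + j * n];  /* cij += A[i][k] * B[k][j] */
            C[i + j * n] = cij;
        }
}
```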

● RISC-V code:
Real Stuff: Parallelism in Computer Arithmetic and SIMD

● Graphics and audio applications can take advantage of performing simultaneous operations on short vectors
○ Example: 128-bit adder:
■ Sixteen 8-bit adds
■ Eight 16-bit adds
■ Four 32-bit adds
● Also called data-level parallelism, vector parallelism, or Single Instruction,
Multiple Data (SIMD)
● Streaming SIMD Extension 2 (SSE2) on x86_64
Going Faster: Subword Parallelism and Matrix Multiply

● Unoptimized C code:

● Optimized C code:
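Both listings were images on the original slide; the unoptimized version is the dgemm sketch shown earlier. Below is a sketch of a subword-parallel version using AVX intrinsics, assuming column-major matrices of doubles with n a multiple of 4 and 32-byte-aligned storage; each __m256d register holds four doubles, so the i loop advances by 4.

```c
#include <stddef.h>
#include <immintrin.h>

/* Subword-parallel (AVX) C = C + A × B for aligned column-major matrices. */
void dgemm_avx(size_t n, double *A, double *B, double *C) {
    for (size_t i = 0; i < n; i += 4)
        for (size_t j = 0; j < n; ++j) {
            __m256d c0 = _mm256_load_pd(C + i + j * n);   /* four elements of column j of C */
            for (size_t k = 0; k < n; ++k)
                c0 = _mm256_add_pd(c0,
                     _mm256_mul_pd(_mm256_load_pd(A + i + k * n),        /* four elements of A */
                                   _mm256_broadcast_sd(B + k + j * n))); /* B[k][j] replicated 4× */
            _mm256_store_pd(C + i + j * n, c0);           /* store the four results back */
        }
}
```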
Fallacies and Pitfalls

● Fallacy: Right Shift and Division


○ Left shift by i places multiplies an integer by 2^i, and right shift divides by 2^i?
■ Only for unsigned integers (see the sketch below)
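A small sketch of why the fallacy fails for signed values: on typical two's-complement machines an arithmetic right shift rounds toward negative infinity, while C integer division truncates toward zero (and right-shifting a negative value is implementation-defined in C).

```c
#include <stdio.h>

int main(void) {
    int x = -5;
    printf("%d %d\n", x >> 1, x / 2);   /* commonly prints "-3 -2" */
    return 0;
}
```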
● Pitfall: Floating-point addition is not associative.
○ Parallel programs may interleave operations in unexpected orders
■ Assumptions of associativity may fail (see the sketch below)
○ Need to validate parallel programs under varying degrees of parallelism
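A small sketch of the non-associativity pitfall: adding the large positive and negative values first cancels them exactly, while adding the small value to the large one first loses it to rounding.

```c
#include <stdio.h>

int main(void) {
    float big = 1.0e20f, small = 1.0f;
    printf("%g\n", (-big + big) + small);   /* prints 1 */
    printf("%g\n", -big + (big + small));   /* prints 0: small is lost in big + small */
    return 0;
}
```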

● Fallacy: Parallel execution strategies that work for integer data types also work for floating-point data types.
Fallacies and Pitfalls

● Fallacy: Only theoretical mathematicians care about floating-point accuracy.


○ Important for scientific code
○ But for everyday consumer use?
○ “My bank balance is out by 0.0002¢!” ☹
○ The Intel Pentium FDIV bug
■ The market expects accuracy
■ Recall cost $500 million
■ See Colwell, The Pentium Chronicles
Concluding Remarks

● Bits have no inherent meaning
○ Interpretation depends on the instructions applied
[Figure: The frequency of the RISC-V instructions for the SPEC CPU2006 benchmarks]
● Computer representations of
numbers
○ Finite range and precision
○ Need to account for this in programs
● ISAs support arithmetic
○ Signed and unsigned integers
○ Floating-point approximation to reals
● Bounded range and precision
○ Operations can overflow and underflow
Questions
