
Computer Architecture

A Quantitative Approach, Sixth Edition

Module 1

Fundamentals of Quantitative
Design and Analysis



Introduction
1970: The IBM 7044 Computer



Introduction
1970: Magnetic Core Memory

[Photo: magnetic core memory. Each core stores 1 bit.]


Introduction
How Does a Computer Work?
 What is the Von Neumann computer
architecture?
 What are the elements of a Von Neumann
computer?
 How do these elements interact to execute
programs?





Introduction
Von Neumann Architecture

[Diagram: a stored-program computer. The ALU and registers are connected by the system bus to a single memory holding both data and program.]

Introduction
Von Neumann Architecture

[Diagram: the stored-program computer annotated. The registers include the general registers, the FP registers, the program counter, etc. A program is a sequence of instructions: (1) load/store between memory and registers, (2) arithmetic/logical operations, (3) conditional and unconditional branches.]

Introduction
Example of Instruction Formats

 Register format: opcode RX RY RZ
   RZ ← RX op RY
 Memory format: opcode RX Address
   Load: RX ← Mem(Address); Store: Mem(Address) ← RX
 Branch format: opcode Address
   Unconditional or conditional jump to Address
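
To make the three formats concrete, here is a minimal interpreter sketch in Python (the language used for all sketches in these notes). The mnemonics and operand order follow the ILP example later in this module; the register file and the symbolic memory are illustrative assumptions, not part of the slides.

# Minimal sketch of the three instruction formats above, in Python.
# The mnemonics follow the ILP example later in this module; everything
# else (register file, symbolic memory) is an illustrative assumption.
regs = {}                       # register file, e.g. regs["R1"]
mem = {"A": 3, "B": 4, "E": 0}  # symbolic memory locations

def execute(instr):
    op, operands = instr.split(None, 1)
    ops = operands.split(",")
    if op == "LOAD":            # memory format: RX <- Mem(Address)
        mem_addr, rx = ops
        regs[rx] = mem[mem_addr]
    elif op == "STO":           # memory format: Mem(Address) <- RX
        rx, mem_addr = ops
        mem[mem_addr] = regs[rx]
    elif op == "ADD":           # register format: RZ <- RX + RY
        rx, ry, rz = ops
        regs[rz] = regs[rx] + regs[ry]
    # a branch format (opcode Address) would update a program counter here

for instr in ["LOAD A,R1", "LOAD B,R2", "ADD R1,R2,R3", "STO R3,E"]:
    execute(instr)
print(mem["E"])  # 7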


Introduction
Computer Technology
 Performance improvements:
 Improvements in semiconductor technology
 Feature size, clock speed
 Improvements in computer architectures
 Enabled by HLL compilers, UNIX
 Led to RISC architectures

 Together have enabled:
 Lightweight computers
 Productivity-based managed/interpreted
programming languages



Introduction
Single Processor Performance

[Graph: single-processor performance over time; annotations mark the RISC era and, after 2003, the move to multi-processors and from ILP to DLP and TLP.]


Introduction
Moore’s Law


The number of transistors on integrated circuits doubles approximately every two years. (Gordon Moore, 1965)

[Graph: transistor counts over time. Source: Wikimedia Commons]


Introduction
Current Trends in Architecture
 Cannot continue to leverage instruction-level parallelism (ILP)
 Single-processor performance improvement ended in 2003

 New models for performance:
 Data-level parallelism (DLP)
 Thread-level parallelism (TLP)
 Request-level parallelism (RLP)

 These require explicit restructuring of the application

Introduction
Instruction Level Parallelism (ILP)

In high-level language:
e = a + b;
f = c + d;
g = e * f;

In assembly language:
* e = a + b
001 LOAD A,R1
002 LOAD B,R2
003 ADD R1,R2,R3
004 STO R3,E
* f = c + d
005 LOAD C,R4
006 LOAD D,R5
007 ADD R4,R5,R6
008 STO R6,F
* g = e * f
009 MULT R3,R6,R7
010 STO R7,G

[Dataflow graph: a and b feed e = a + b; c and d feed f = c + d; e and f feed g = e * f.]


Introduction
Instruction Level Parallelism

In assembly language:
* e = a + b
001 LOAD A,R1
002 LOAD B,R2
003 ADD R1,R2,R3
004 STO R3,E
* f = c + d
005 LOAD C,R4
006 LOAD D,R5
007 ADD R4,R5,R6
008 STO R6,F
* g = e * f
009 MULT R3,R6,R7
010 STO R7,G

Instructions that can potentially be executed in parallel:
001, 002, 005, 006
003, 007
004, 008, 009
010
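
These groups fall out of a simple dependence analysis: an instruction can issue once every register it reads has been produced. A greedy grouping sketch in Python, with read/write sets hand-derived from the listing above (an illustration of the idea, not how real hardware schedules):

# Greedy grouping of the instructions above into parallel issue slots.
# The read/write sets are hand-derived from the listing; the stores
# (004, 008, 010) write memory locations that nothing later reads, so
# their writes are omitted here.
instrs = {
    "001": (set(), {"R1"}), "002": (set(), {"R2"}),
    "003": ({"R1", "R2"}, {"R3"}), "004": ({"R3"}, set()),
    "005": (set(), {"R4"}), "006": (set(), {"R5"}),
    "007": ({"R4", "R5"}, {"R6"}), "008": ({"R6"}, set()),
    "009": ({"R3", "R6"}, {"R7"}), "010": ({"R7"}, set()),
}

produced = set()   # registers whose values are available
done = set()
while len(done) < len(instrs):
    slot = [i for i, (reads, _) in instrs.items()
            if i not in done and reads <= produced]
    print(", ".join(slot))   # 001, 002, 005, 006 / 003, 007 / ...
    done.update(slot)
    for i in slot:
        produced |= instrs[i][1]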


Introduction
Data Level Parallelism (DLP)

[Diagram: the same task applied in parallel to different pieces of data.]

Introduction
Task Level Parallelism (TLP)

[Diagram: different tasks executing in parallel.]
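
One way to feel the difference in code, as a sketch: the worker functions below are made up, and Python threads only illustrate the structure, not true hardware parallelism.

# DLP vs. TLP sketch using Python threads. The worker functions are
# made-up placeholders.
from concurrent.futures import ThreadPoolExecutor

def square(x):           # one operation, applied to many data items (DLP)
    return x * x

def status_report():     # a different, independent task (TLP)
    return "status ok"

with ThreadPoolExecutor() as pool:
    # Data-level parallelism: the same task over different data
    squares = list(pool.map(square, range(8)))
    # Task-level parallelism: different tasks in flight at once
    futures = [pool.submit(square, 3), pool.submit(status_report)]
    results = [f.result() for f in futures]

print(squares)   # [0, 1, 4, ..., 49]
print(results)   # [9, 'status ok']
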
Classes of Computers
Classes of Computers
 Personal Mobile Device (PMD)
 e.g., smartphones, tablet computers
 Emphasis on energy efficiency and real-time constraints
 Desktop Computing
 Emphasis on price-performance
 Servers
 Emphasis on availability, scalability, throughput
 Clusters / Warehouse Scale Computers
 Used for “Software as a Service (SaaS)”
 Emphasis on availability and price-performance
 Sub-class: Supercomputers, emphasis:
floating-point performance and fast internal networks
 Embedded Computers
 Emphasis: price



Classes of Computers
Parallelism
 Classes of parallelism in applications:
 Data-Level Parallelism (DLP)
 Task-Level Parallelism (TLP)

 Classes of architectural parallelism:
 Instruction-Level Parallelism (ILP)
 Vector architectures/Graphic Processor Units (GPUs)
 Thread-Level Parallelism
 Request-Level Parallelism



Classes of Computers
Flynn’s Taxonomy
 Single instruction stream, single data stream (SISD)

 Single instruction stream, multiple data streams (SIMD)
 Vector architectures
 Multimedia extensions
 Graphics processor units

 Multiple instruction streams, single data stream (MISD)
 No commercial implementation

 Multiple instruction streams, multiple data streams (MIMD)
 Tightly-coupled MIMD (thread-level parallelism)
 Loosely-coupled MIMD (cluster or warehouse-scale computing)

Defining Computer Architecture
Defining Computer Architecture
 “Old” view of computer architecture:
 Instruction Set Architecture (ISA) design
 i.e., decisions regarding:
 Registers (register-to-memory or load-store), memory addressing (e.g., byte addressing, alignment), addressing modes (e.g., register, immediate, displacement), instruction operands (type and size), available operations (CISC or RISC), control flow instructions, instruction encoding (fixed vs. variable length)

 “Real” computer architecture:
 Specific requirements of the target machine
 Design to maximize performance within constraints: cost, power, and availability
 Includes ISA, microarchitecture, hardware

Trends in Technology
Trends in Technology
 Integrated circuit technology
 Transistor density: 35%/year
 Die size: 10-20%/year
 Integration overall (Moore’s Law): 40-55%/year

 DRAM capacity: 25-40%/year (slowing)

 Flash capacity (non-volatile semiconductor memory): 50-60%/year
 15-20X cheaper/bit than DRAM

 Magnetic disk technology: density increase: 40%/year
 15-25X cheaper/bit than Flash
 300-500X cheaper/bit than DRAM

Trends in Technology
Bandwidth and Latency
 Bandwidth or throughput
 Total work done in a given time (e.g., MB/sec, MIPS)
 10,000-25,000X improvement for processors
 300-1200X improvement for memory and disks

 Latency or response time
 Time between start and completion of an event
 30-80X improvement for processors
 6-8X improvement for memory and disks



Trends in Technology
Bandwidth and Latency

[Figure: log-log plot of bandwidth and latency milestones.]



Trends in Technology
Transistors and Wires
 Feature size
 Minimum size of transistor or wire in x or y
dimension
 10 microns in 1971 to 0.032 microns in 2011
 Transistor performance scales linearly
 Wire delay does not improve with feature size!
 Integration density scales quadratically



Trends in Power and Energy
Power (1 Watt = 1 Joule/sec) and Energy (Joules)
 Problem: Get power in, get power out

 Thermal Design Power (TDP)
 Characterizes sustained power consumption
 Used as target for power supply and cooling system
 Lower than peak power, higher than average power consumption

 Clock rate can be reduced dynamically to limit power consumption

 Energy per task is often a better measurement

Introduction
Energy and Power Example

Comparing processors A and B on the same task:
 A has 20% higher average power consumption than B: Power_A = 1.2 x P
 A executes the task in 70% of the time needed by B: Time_A = 0.7 x T
 Energy consumed by A = 1.2 x P x 0.7 x T = 0.84 x P x T
 Energy consumed by B = P x T
 A is more energy efficient!

 It is better to use energy instead of power for comparing a fixed workload.
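
The same comparison as a one-screen sketch, using the slide's numbers (P and T are arbitrary units):

# Energy = average power x execution time, with the slide's numbers.
P, T = 1.0, 1.0                    # processor B's power and time (arbitrary units)
energy_B = P * T
energy_A = (1.2 * P) * (0.7 * T)   # 20% more power, 70% of the time
print(energy_A, energy_B)          # ~0.84 vs 1.0 -> A is more energy efficient
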
Trends in Power and Energy
Dynamic Energy and Power
 Dynamic energy
 Transistor switch from 0 -> 1 or 1 -> 0
 ½ x Capacitive load x Voltage²

 Dynamic power
 ½ x Capacitive load x Voltage² x Frequency switched

 Reducing clock rate reduces power, not energy

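A sketch applying both formulas; the capacitance, voltage, and frequency values are made up for illustration:

# Dynamic energy per transition and dynamic power, per the formulas above.
# C, V, and f are illustrative values, not from any real chip.
C = 1e-9       # capacitive load, farads
V = 1.0        # supply voltage, volts
f = 2e9        # switching frequency, Hz

energy = 0.5 * C * V**2       # joules per 0->1 or 1->0 transition
power = 0.5 * C * V**2 * f    # watts

# Halving the clock halves power, but a fixed task then runs twice as
# long, so the energy spent on the task is unchanged.
print(energy, power, 0.5 * C * V**2 * (f / 2))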


Trends in Power and Energy
Power
 Intel 80386 consumed ~ 2 W
 3.3 GHz Intel Core i7 consumes 130 W
 Heat must be dissipated from a 1.5 x 1.5 cm chip
 This is the limit of what can be cooled by air


Trends in Power and Energy
Reducing Power
 Techniques for reducing power:
 Do nothing well: turn off the clock of inactive modules
 Dynamic Voltage-Frequency Scaling (DVFS)
 Low power state for DRAM, disks (spin-down)
 Overclocking some cores while turning off others (Turbo mode)


Trends in Power and Energy
Static Power
 Static power consumption
 Leakage current flows even when a transistor is off
 Leakage can be as high as 50%, due to the large SRAM caches that need power to maintain their values
 Power_static = Current_static x Voltage
 Scales with the number of transistors
 To reduce: power gating (i.e., turn off the power supply)


Trends in Cost
Trends in Cost
 Cost driven down by the learning curve
 Yield (% of manufactured devices that survive testing)

 DRAM: price closely tracks cost

 Microprocessors: price depends on volume (less time to get down the learning curve, and increases in manufacturing efficiency)
 10% less for each doubling of volume

Trends in Cost
Integrated Circuit Cost
 Integrated circuit cost:

Cost of die = Cost of wafer / (Dies per wafer x Die yield)

Dies per wafer = π x (Wafer diameter / 2)² / Die area
                 - π x Wafer diameter / sqrt(2 x Die area)

(the subtracted term counts dies lost along the wafer’s perimeter)

 Die yield (Bose-Einstein empirical formula):

Die yield = Wafer yield x 1 / (1 + Defects per unit area x Die area)^N

 Defects per unit area = 0.016-0.057 defects per square cm (2010)
 N = process-complexity factor = 11.5-15.5 (40 nm, 2010)
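
A sketch of the cost model in Python; the wafer cost, wafer diameter, and die area are assumed inputs, while the defect density and N come from the slide's 2010 ranges:

import math

# Die-cost model from the formulas above. Wafer cost, wafer diameter, and
# die area are illustrative assumptions; defect density and N are taken
# from the slide's 2010 ranges.
wafer_cost = 5000.0       # dollars (assumed)
wafer_diameter = 30.0     # cm (a 300 mm wafer)
die_area = 1.0            # cm^2 (assumed)
defect_density = 0.03     # defects per cm^2 (within 0.016-0.057)
N = 13.5                  # process-complexity factor (within 11.5-15.5)
wafer_yield = 1.0         # assume no wholly bad wafers

dies_per_wafer = (math.pi * (wafer_diameter / 2) ** 2 / die_area
                  - math.pi * wafer_diameter / math.sqrt(2 * die_area))
die_yield = wafer_yield / (1 + defect_density * die_area) ** N
cost_per_die = wafer_cost / (dies_per_wafer * die_yield)
print(round(dies_per_wafer), round(die_yield, 2), round(cost_per_die, 2))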


Dependability
Dependability
 Service Level Agreement (SLA) or Service Level
Objectives (SLO)

 Module reliability
 Mean time to failure (MTTF) – reliability measure
 Mean time to repair (MTTR) – service interruption
 Mean time between failures (MTBF) = MTTF + MTTR
 Availability = MTTF / MTBF



MTTR, MTTF, and MTBF

[Timeline: a failure occurs; MTTR is the interval until the machine is fixed; MTTF is the interval from the repair to the next failure; MTBF = MTTR + MTTF spans one failure to the next.]


Dependability
Dependability Example
 A disk subsystem has the following components and MTTF values:

Component          MTTF (hours)
10 disks           1,000,000 each
1 ATA controller   500,000
1 power supply     200,000
1 fan              200,000
1 ATA cable        1,000,000

 Assuming that lifetimes are exponentially distributed and independent, what is the system’s MTTF?


Dependability
Dependability Example Cont’d

Component          MTTF (hours)
10 disks           1,000,000 each
1 ATA controller   500,000
1 power supply     200,000
1 fan              200,000
1 ATA cable        1,000,000

Failure rate_system = 10/1,000,000 + 1/500,000 + 1/200,000 + 1/200,000 + 1/1,000,000
                    = (10 + 2 + 5 + 5 + 1) / 1,000,000 hours
                    = 23,000 / billion hours

MTTF_system = 1 / Failure rate_system ≈ 43,500 hours ≈ 4.96 years
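
The same computation as a sketch; it also derives availability from MTTF and an assumed 24-hour MTTR, which is not part of the slide's example:

# System failure rate = sum of component failure rates (exponential,
# independent lifetimes). MTTF values from the table above.
mttfs = [1_000_000] * 10 + [500_000, 200_000, 200_000, 1_000_000]
failure_rate = sum(1 / m for m in mttfs)   # failures per hour
mttf_system = 1 / failure_rate
print(round(mttf_system))                  # ~43,478 hours, about 5 years

# Availability = MTTF / (MTTF + MTTR); the 24-hour MTTR is an assumption.
mttr = 24
print(round(mttf_system / (mttf_system + mttr), 5))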


Measuring Performance
Measuring Performance
 Typical performance metrics:
 Response time (of interest to users)
 Throughput (of interest to operators)

 Speedup of X relative to Y
 Execution time_Y / Execution time_X

 Execution time
 Wall clock time: includes all system
overheads
 CPU time: only computation time

 Benchmarks
 Kernels (e.g., matrix multiply)
 Toy programs (e.g., sorting)
 Synthetic benchmarks (e.g., Dhrystone)
 Benchmark suites (e.g., SPEC, TPC)

Measuring Performance
SPECRate and SPECRatio

 SPECrate: a throughput metric; measures the number of jobs of a given type that can be processed in a given time interval.

 SPECratio: the ratio between the elapsed time for a given job on a reference machine and the elapsed time for the same job on the machine being measured:

Execution time_reference = 10.5 sec
Execution time_machine A = 5.25 sec
SPECratio_machine A = 10.5 / 5.25 = 2

 Comparing machines A and B:

Execution time_machine A = 5.25 sec
Execution time_machine B = 21 sec

SPECratio_machine A / SPECratio_machine B
  = (Execution time_reference / Execution time_machine A) / (Execution time_reference / Execution time_machine B)
  = Execution time_machine B / Execution time_machine A
  = 21 / 5.25 = 4
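
As a sketch, with the slide's times:

# SPECratio = reference time / measured time; ratios of SPECratios reduce
# to ratios of execution times. Times from the slide.
t_ref, t_a, t_b = 10.5, 5.25, 21.0
spec_a = t_ref / t_a                 # 2.0
spec_b = t_ref / t_b                 # 0.5
print(spec_a / spec_b, t_b / t_a)    # both 4.0: the reference time cancels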


Measuring Performance
Geometric Mean of SPECratios

 When computing the average of SPECratios one should use the geometric mean and not the arithmetic mean:

Geometric mean = (x_1 x x_2 x ... x x_n)^(1/n)

Program          SPECratio
A                10.2
B                21.5
C                15.2
Geometric mean   14.94

Geometric mean = (10.2 x 21.5 x 15.2)^(1/3) = 3,333.36^(1/3) ≈ 14.94
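
As a sketch, with the table's values:

# Geometric mean of the SPECratios in the table above.
import math
ratios = [10.2, 21.5, 15.2]
gmean = math.prod(ratios) ** (1 / len(ratios))
print(round(gmean, 2))   # 14.94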


Principles
Principles of Computer Design
 Take Advantage of Parallelism
 e.g., multiple processors, disks, memory banks,
pipelining, multiple functional units

 Principle of Locality
 Reuse of data and instructions

 Focus on the Common Case

 Amdahl’s Law (quantified in the sketch below)

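Amdahl's Law quantifies the focus-on-the-common-case principle: overall speedup is limited by the fraction of time that is actually enhanced. A sketch (the fraction f and local speedup s are made-up values):

# Amdahl's Law: overall speedup when a fraction f of execution time is
# sped up by a factor s. The f and s below are illustrative values.
def amdahl(f, s):
    return 1 / ((1 - f) + f / s)

print(amdahl(0.9, 10))   # ~5.26: a 10x boost on 90% of the time
print(amdahl(0.9, 1e9))  # ~10: the untouched 10% caps the speedup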


Principles
Principles of Computer Design

 The Processor Performance Equation:

CPU time = Instruction count x CPI x Clock cycle time

where CPI is the (average) number of clock cycles per instruction.
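
A sketch applying the equation; the instruction count, CPI, and clock rate below are made-up values:

# CPU time = instruction count x CPI x clock cycle time.
# All three inputs below are illustrative assumptions.
instruction_count = 2e9   # instructions
cpi = 1.5                 # average clock cycles per instruction
clock_rate = 3e9          # Hz, so cycle time = 1 / clock_rate
cpu_time = instruction_count * cpi / clock_rate
print(cpu_time)           # 1.0 second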
