Unit - 1 (Fundamentals of Computer Architecture and Technology Trends)

Unit-1
Fundamentals – Computer Architecture and Technology Trends
By
Dr. Sarvesh Vishwakarma
Professor – CSE
TCS 704 - Advanced Computer Architecture
Copyright © 2012, Elsevier Inc. All rights reserved.

Why do we need computer architecture?

Sequential Machine

Sequential Machine
 Problem with Von Neumann Model
 Speed of Information Exchange between
Memory and Processor
 Execution Rate of Information
 Solution
 Use of Cache memory
 Use of Pipelining Concepts: Overlapping the
execution of instructions

Sequential Machine
 Restrictions
 Cache memory speed is limited by technology
[Figure: memory hierarchy from CPU through cache and memory to data storage; speed increases toward the CPU, size increases toward data storage]

Sequential Machine
• Pipelining is useful only in some cases, because of pipeline hazards:
  • Data hazards
  • Branch (control) hazards
  • Resource (structural) hazards

Sequential Machine
• Do not rely on a purely sequential machine architecture
 Use Parallel Architectures
 Flynn’s Taxonomy
 SISD single instruction single data stream
 SIMD single instruction multiple data stream
 MISD multiple instruction single data stream
 MIMD Multiple instruction multiple data stream

Classes of Computers
Flynn’s Taxonomy
• Single instruction stream, single data stream (SISD)
• Single instruction stream, multiple data streams (SIMD)
  • Vector architectures
  • Multimedia extensions
  • Graphics processor units
• Multiple instruction streams, single data stream (MISD)
  • No commercial implementation
• Multiple instruction streams, multiple data streams (MIMD)
  • Tightly-coupled MIMD
  • Loosely-coupled MIMD

Single Processor Performance
[Figure: growth in single-processor performance over time, with annotations marking RISC and the move to multi-processor]

Classes of Computer
1. Mainframe
2. Minicomputer
3. Supercomputer
4. Desktop computer
5. Server
6. Embedded computer

Classes of Computer
 Mainframe
 Year 1960
 Costly
 Large size
 Multi-user
 Application: Data Processing & Scientific
Computing (Bank, Government, Corporate)
• Response time: 100 sec. for a million users

Classes of Computer
 Minicomputer
 Year 1970
 Costlier
 Small size
 Multi-user
 Application: Data Processing & Scientific
Computing (Scientific Laboratory)
 Time-sharing
 Supercomputer
 Year 1970

Classes of Computer
 Desktop Computer
 Year 1980
• Less costly
• Small size
• Feature: microprocessor
• Two classes:
  • Personal computer (also called microcomputer): an alternative to the time-sharing minicomputer; flexible, and meets a wide range of end-user needs
  • Workstation: single-user, and contains special hardware

Classes of Computer
 Server
 Year 1980
 Costlier
 Size
 Dedicated to provide large scale services
 Reliable
 Long-term file storage and access
 Large memory
 More computing power

Classes of Computer
 Server
 Characteristics
 Availability
 Operate 7 days a week 24 hours a day
• Scalability
  • Servers often grow in response to growing demand for their services
 Long-term file storage and access
 Large memory
 More computing power
 Efficient Throughput

Classes of Computer
 Server
• Cost of downtime (yearly losses in millions of $ for a given fraction of downtime)

Application                  Cost of downtime    1% downtime      0.5% downtime    0.1% downtime
                             per hour            (87.6 hrs/yr)    (43.8 hrs/yr)    (8.8 hrs/yr)
                             (thousands of $)
Brokerage operations         6450                565              283              56.5
Credit card authorization    2600                228              114              22.8
Package shipping services    150                 13               6.6              1.3
Home shopping channel        113                 9.9              4.9              1.0
Catalog sales center         90                  7.9              3.9              0.8
Airline reservation center   89                  7.9              3.9              0.8
Cellular service activation  41                  3.6              1.8              0.4
Online network fees          25                  2.2              1.1              0.2
ATM service fees             14                  1.2              0.6              0.1
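
The yearly-loss columns follow from the hourly cost and the fraction of the year spent down. A minimal sketch of that arithmetic, assuming a 365-day year (8760 hours); the function name is illustrative and not part of the original slides:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; assumes a non-leap year


def annual_downtime_loss_millions(cost_per_hour_thousands: float,
                                  downtime_fraction: float) -> float:
    """Yearly loss in millions of $ when the service is down for
    `downtime_fraction` of the year (e.g. 0.01 for 1%)."""
    downtime_hours = HOURS_PER_YEAR * downtime_fraction
    # thousands of $ per hour * hours = thousands of $; divide by 1000 for millions
    return cost_per_hour_thousands * downtime_hours / 1000.0


if __name__ == "__main__":
    # Brokerage operations row: $6450 thousand per hour of downtime
    for fraction in (0.01, 0.005, 0.001):
        loss = annual_downtime_loss_millions(6450, fraction)
        print(f"{fraction:.1%} downtime: about ${loss:.1f} million per year")
```

Small differences from the table come from rounding, for example the table's 8.8 hrs/yr for 0.1% downtime versus 8.76 here.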

Classes of Computer
 Personal Digital Assistant
 Year 1990
 First hand held computing devices
 High performance digital consumer electronics
video game, set-top box
 Embedded Computer
 Year 2000
 Handle particular task
 Reduce the size and product cost
 Cellphone, digital watches, mp3 player, factory
controller
Classes of Computer
• Comparison of three computing classes and their system characteristics

Feature                          Desktop                 Server                      Embedded
Price of system                  $1000-$10,000           $10,000-$10,000,000         $10-$100,000
Price of microprocessor module   $100-$1000              $200-$2000                  $0.20-$200
Microprocessors sold per year    150,000,000             4,000,000                   300,000,000
(estimates for 2000)
Critical system design issues    Price-performance,      Throughput, availability,   Price, power consumption,
                                 graphics performance    scalability                 application-specific performance

Classes of Computer
 Summary of some of the most important functional requirements an
architect faces

Instruction Set Architecture
• Class of ISA
  • 80x86: register-memory
  • MIPS: load-store
• Memory addressing
  • MIPS requires aligned accesses
  • 80x86 does not require alignment
 Addressing Modes
 MIPS
 Register addressing mode
 Immediate addressing mode
 Displacement addressing mode

 Types and Size of Operands
• 80x86 and MIPS support operand sizes of:
  • 8 bit (ASCII character); 16 bit (Unicode character)
  • 32 bit (integer); 64 bit (long integer)
• IEEE 754 floating point:
  • 32 bit single precision
  • 64 bit double precision
  • 80 bit extended double precision (80x86 only)
 Operations
 Data transfer
 Arithmetic logical
 Control
 Floating point

 Control Flow Instructions
 80x86/MIPS: support
 Conditional branch
 Unconditional Jump
 Procedure call
 Returns
• 80x86: JE; JNE
• MIPS: BEQ; BNE
• Encoding an ISA
  • Fixed length (MIPS: every instruction is 32 bits)
  • Variable length (80x86: 1 to 18 bytes)
Integrated Circuits: Fueling Innovation
• Chips begin with silicon, which is found in sand
• Silicon does not conduct electricity well and is therefore called a semiconductor
• A special chemical process can transform tiny areas of silicon into:
  • Excellent conductors of electricity (like copper)
  • Excellent insulators (like glass)
  • Areas that can conduct or insulate (a switch)
• A transistor is simply an on/off switch controlled by electricity
• Integrated circuits combine dozens to hundreds of transistors on a single chip
Trends in Cost
 Cost of an Integrated Circuit
Objective: derive a formula for the cost of an integrated circuit.

\[
\text{IC\_Cost} = \frac{\text{Die\_Cost} + \text{Testing\_Cost} + \text{Packing\_Cost} + \text{Final\_Test\_Cost}}{\text{Final\_Test\_Yield}}
\]

Microelectronics Process
[Figure: manufacturing flow: silicon ingot, slicing into wafers, 20-30 processing steps, wafer test, dicing into dies, packaging, final test, shipping]
• Silicon ingots:
  • 6-12 inches in diameter and about 12-24 inches long
• Impurities in the wafer can lead to defective devices and reduce the yield
Average cutting edge for non-square pieces

For a square die of side s, the irregular cutting edge is taken as twice the hypotenuse of a right triangle with legs s/2:

\[
\text{average\_cutting\_length} = 2\sqrt{\left(\frac{s}{2}\right)^2 + \left(\frac{s}{2}\right)^2} = 2\sqrt{\frac{s^2}{4} + \frac{s^2}{4}} = \sqrt{2 s^2} = \sqrt{2 \times \text{Die\_Area}}
\]

Total number of non-square pieces:

\[
\frac{\pi \times \text{Wafer\_diameter}}{\text{average\_cutting\_length}} = \frac{\pi \times \text{Wafer\_Diameter}}{\sqrt{2 s^2}} = \frac{\pi \times \text{Wafer\_Diameter}}{\sqrt{2 \times \text{Die\_Area}}}
\]
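
The count of non-square edge pieces is usually subtracted from the gross number of dies that fit in the wafer area. The sketch below assumes that common estimate (wafer area divided by die area, minus the edge pieces); the function names and the sample wafer are illustrative, not taken from the slides:

```python
import math


def edge_pieces(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Approximate number of non-square pieces along the wafer edge:
    circumference divided by the average cutting length sqrt(2 * die area)."""
    return math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Estimated whole dies per wafer (assumed model: gross dies minus edge pieces)."""
    wafer_area = math.pi * (wafer_diameter_cm / 2.0) ** 2
    return wafer_area / die_area_cm2 - edge_pieces(wafer_diameter_cm, die_area_cm2)


if __name__ == "__main__":
    # Hypothetical example: 20 cm wafer, 1 cm x 1 cm die
    print(f"edge pieces:    {edge_pieces(20.0, 1.0):.1f}")
    print(f"dies per wafer: {dies_per_wafer(20.0, 1.0):.1f}")
```

The estimate ignores test structures and unusable regions on the wafer, so it is an upper bound in practice.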
Integrated Circuits Costs
Trends in Cost
 Factors that influence the Cost of
Computer
1. Time
2. Volume
3. Commoditization

What Affects Cost?
1. Learning curve:
• The more experience there is in manufacturing a component, the better the yield
• In general, a chip, board, or system with twice the yield will have half the cost
• The learning curve differs between components, which complicates design decisions
2. Volume
• Larger volume speeds up progress along the learning curve
• Doubling the volume typically reduces cost by 10%
3. Commodities
• Are essentially identical products sold by multiple vendors in large volumes
• Competition among vendors drives efficiency higher and thus cost down
[Figure: Intel motherboard components]
[Figure: computer components]
Trends in Cost
• Cost driven down by learning curve
  • Yield
• DRAM: price closely tracks cost
• Microprocessors: price depends on volume
  • 10% less for each doubling of volume

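Integrated Circuit Cost
• Integrated circuit
• Bose-Einstein formula:

\[
\text{Die yield} = \text{Wafer yield} \times \frac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^{N}}
\]

• Defects per unit area = 0.016-0.057 defects per square cm (2010)
• N = process-complexity factor = 11.5-15.5 (40 nm, 2010)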
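
To get a feel for how strongly die area affects yield, here is a minimal sketch of the Bose-Einstein yield model in its usual form, Die yield = Wafer yield / (1 + defects per unit area × die area)^N. The sample defect density (0.031 per cm²) and N (13.5) are illustrative values picked from inside the ranges quoted above, and the wafer yield of 100% is an assumption:

```python
def die_yield(defects_per_cm2: float, die_area_cm2: float,
              complexity_n: float, wafer_yield: float = 1.0) -> float:
    """Bose-Einstein die yield: wafer_yield / (1 + defects * area) ** N."""
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** complexity_n


if __name__ == "__main__":
    # Illustrative values only: 0.031 defects/cm^2, N = 13.5
    for side_cm in (1.5, 1.0):
        area = side_cm ** 2
        print(f"{side_cm} cm die: yield = {die_yield(0.031, area, 13.5):.2f}")
```

With these inputs the larger die yields roughly 0.40 and the smaller one roughly 0.66, so a bit more than doubling the area costs well over a third of the good dies.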

Dependability
• Service accomplishment: the service is delivered as specified
• Service interruption: the delivered service differs from the service level agreement
• Module reliability: a measure of continuous service accomplishment from a reference initial instant

• MTTF: mean time to failure
  The reciprocal of MTTF is the rate of failure, usually reported as failures per 1 billion hours of operation, or FIT
  Rate of failure = 1 / MTTF
• MTBF: mean time between failures; a measure of reliability for repairable systems, but commonly used for both repairable and non-repairable systems
  For a repairable system:
  MTBF = MTTF + MTTR
• FIT: the number of expected failures per one billion hours of operation of a device
MTTF Dependability
[Figure: timeline from the reference instant t0 through t1 ... t6, alternating between operating properly and repair, with the first to fourth failures marked]

\[
\text{Mean time between failures} = t_2 - t_0 = (t_2 - t_1) + (t_1 - t_0) = \text{operating} + \text{repair} = \text{MTTF} + \text{MTTR}
\]

\[
\text{MTBF} = \text{MTTF} + \text{MTTR}
\]
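Simple failure model
[Figure: non-repairable component; uptime (operation) runs from the reference instant for the MTTF and is followed by downtime]

Renewal failure model
[Figure: repairable component; uptime (operation) is followed by downtime lasting the MTTR and then by uptime again, with the MTBF measured from the reference instant over one full cycle]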
Dependability
 Module reliability
 Mean time to failure (MTTF)
 Mean time to repair (MTTR)
 Mean time between failures (MTBF) = MTTF + MTTR
 Module Availability
\[
\text{Availability} = \frac{\text{MTTF}}{\text{MTBF}} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}
\]
 Module availability:- measure of the service
accomplishment with respect to the alternation between
the two states of accomplishment and interruption.
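
The reliability and availability definitions above translate directly into a few lines of arithmetic. A minimal sketch, with hypothetical function names and a made-up module (1000-hour MTTF, 10-hour MTTR) used only for illustration:

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the module delivers service: MTTF / (MTTF + MTTR)."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


def mttf_from_fit(fit: float) -> float:
    """Convert a FIT rate (failures per 10^9 hours) to MTTF in hours."""
    return 1e9 / fit


if __name__ == "__main__":
    # Hypothetical module: fails on average every 1000 h, takes 10 h to repair
    print(f"MTBF         = {mtbf(1000, 10):.0f} hours")
    print(f"Availability = {availability(1000, 10):.4f}")
    print(f"MTTF at 100 FIT = {mttf_from_fit(100):.0f} hours")
```

Availability depends on the ratio of repair time to failure time, so halving MTTR helps about as much as doubling MTTF.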
Real World Examples
From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15
Performance Metrics
 Response (execution) time:
 The time between the start and the completion of a task
 Measures user perception of the system speed
 Common in reactive and time critical systems, single-user computer, etc.
 Throughput:
 The total number of tasks done in a given time
 Most relevant to batch processing (billing, credit card processing)
 Mainly used for input/output systems (disk access, printer, etc.)
Introduction
Computer Technology
 Performance improvements:
 Improvements in semiconductor technology
 Feature size, clock speed
 Improvements in computer architectures
 Enabled by HLL compilers, UNIX
 Lead to RISC architectures
• Together have enabled:
  • Lightweight computers
  • Productivity-based managed/interpreted programming languages

Introduction
Current Trends in Architecture
• Cannot continue to leverage instruction-level parallelism (ILP)
  • Single-processor performance improvement ended in 2003
• New models for performance:
  • Data-level parallelism (DLP)
  • Thread-level parallelism (TLP)
  • Request-level parallelism (RLP)
• These require explicit restructuring of the application

 Personal Mobile Device (PMD)
• e.g. smart phones, tablet computers
 Emphasis on energy efficiency and real-time
 Desktop Computing
 Emphasis on price-performance
 Servers
 Emphasis on availability, scalability, throughput
 Clusters / Warehouse Scale Computers
 Used for “Software as a Service (SaaS)”
 Emphasis on availability and price-performance
 Sub-class: Supercomputers, emphasis: floating-point
performance and fast internal networks
 Embedded Computers
 Emphasis: price

Defining Computer Architecture
 “Old” view of computer architecture:
 Instruction Set Architecture (ISA) design
 i.e. decisions regarding:
 registers, memory addressing, addressing modes,
instruction operands, available operations, control flow
instructions, instruction encoding
 “Real” computer architecture:

 Specific requirements of the target machine
 Design to maximize performance within constraints:
cost, power, and availability
 Includes ISA, microarchitecture, hardware

Trends in Technology
 Integrated circuit logic technology
 Transistor density: 35%/year
 Die size: 10-20%/year
 Integration overall: 40-55%/year
• Semiconductor DRAM capacity: 25-40%/year (slowing)
• Flash capacity: 50-60%/year
  • 15-20X cheaper/bit than DRAM
• Magnetic disk technology: 40%/year
  • 15-25X cheaper/bit than Flash
  • 300-500X cheaper/bit than DRAM
• Network technology: depends on the performance of switches and of the transmission system

Bandwidth and Latency
 Bandwidth or throughput
 Total work done in a given time
 10,000-25,000X improvement for processors
 300-1200X improvement for memory and disks
 Latency or response time

 Time between start and completion of an event
 30-80X improvement for processors
 6-8X improvement for memory and disks

Scaling of Transistor Performance and Wires
 Feature size
 Minimum size of transistor or wire in x or y
dimension
• 10 microns in 1971 to 0.032 microns in 2011
• Transistor performance scales linearly
  • Wire delay does not improve with feature size!
• Integration density scales quadratically

Performance trends: Bandwidth over Latency
Log-log plot of bandwidth and latency milestones

Trends in Power and Energy
 Problem: Get power in, get power out
• Thermal Design Power (TDP)
  • Characterizes sustained power consumption
  • Used as target for power supply and cooling system
  • Lower than peak power, higher than average power consumption
• Clock rate can be reduced dynamically to limit power consumption
• Energy per task is often a better measurement

Dynamic Energy and Power
• Dynamic energy
  • Transistor switch from 0 -> 1 or 1 -> 0
  • ½ × Capacitive load × Voltage²
• Dynamic power
  • ½ × Capacitive load × Voltage² × Frequency switched
• Reducing clock rate reduces power, not energy
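
The two formulas above explain why lowering the clock alone saves power but not energy, while lowering the voltage saves both. A minimal sketch; the capacitance, voltage, and frequency values are made-up placeholders, and only the ratios matter:

```python
def dynamic_energy(capacitive_load: float, voltage: float) -> float:
    """Energy of one 0->1 or 1->0 transition: 1/2 * C * V^2 (joules)."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load: float, voltage: float,
                  frequency_switched: float) -> float:
    """Dynamic power: 1/2 * C * V^2 * f (watts)."""
    return dynamic_energy(capacitive_load, voltage) * frequency_switched


if __name__ == "__main__":
    c, v, f = 1e-9, 1.0, 3.3e9  # hypothetical load (F), voltage (V), frequency (Hz)

    # Halving only the clock rate: power halves, energy per transition is unchanged.
    print(dynamic_power(c, v, f / 2) / dynamic_power(c, v, f))
    print(dynamic_energy(c, v) / dynamic_energy(c, v))

    # Reducing voltage and frequency by 15% each:
    print(dynamic_energy(c, 0.85 * v) / dynamic_energy(c, v))
    print(dynamic_power(c, 0.85 * v, 0.85 * f) / dynamic_power(c, v, f))
```

Because voltage enters squared, a 15% voltage reduction cuts energy to about 72% of the original, and the combined voltage and frequency reduction cuts power to about 61%.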

Power
• Intel 80386 consumed ~2 W
• 3.3 GHz Intel Core i7 consumes 130 W
• Heat must be dissipated from a 1.5 x 1.5 cm chip
• This is the limit of what can be cooled by air

Reducing Power
 Techniques for reducing power:
 Do nothing well
 Dynamic Voltage-Frequency Scaling
 Low power state for DRAM, disks
 Overclocking, turning off cores

Static Power
 Static power consumption
• Current_static × Voltage
 Scales with number of transistors
 To reduce: power gating

Measuring Performance
 Typical performance metrics:
 Response time
 Throughput
 Speedup of X relative to Y
  • Execution time_Y / Execution time_X
 Execution time
 Wall clock time: includes all system overheads
 CPU time: only computation time
 Benchmarks
 Kernels (e.g. matrix multiply)
 Toy programs (e.g. sorting)
 Synthetic benchmarks (e.g. Dhrystone)
 Benchmark suites (e.g. SPEC06fp, TPC-C)

Principles
Principles of Computer Design
 Take Advantage of Parallelism
 e.g. multiple processors, disks, memory banks,
pipelining, multiple functional units
 Principle of Locality
 Reuse of data and instructions
 Focus on the Common Case

 Amdahl’s Law

Principles
 The Processor Performance Equation

Principles
 Different instruction types having different
CPIs

A 400-MHz processor was used to execute a benchmark program with the following instruction mix and clock cycle counts:

Instruction type      Instruction count   Clock cycle count
Integer arithmetic    450000              1
Data transfer         320000              2
Floating point        150000              2
Control transfer      80000               2

Determine the effective CPI, MIPS rate, and execution time for this program.
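
One way to work this exercise is with the processor performance equation, CPU time = instruction count × CPI / clock rate, where the effective CPI is the cycle count weighted by the instruction mix. The sketch below assumes the "clock cycle count" column means cycles per instruction of that type; all names are illustrative:

```python
CLOCK_HZ = 400e6  # 400-MHz processor

# instruction type -> (instruction count, cycles per instruction)
MIX = {
    "Integer arithmetic": (450_000, 1),
    "Data transfer": (320_000, 2),
    "Floating point": (150_000, 2),
    "Control transfer": (80_000, 2),
}


def total_instructions(mix) -> int:
    return sum(count for count, _ in mix.values())


def total_cycles(mix) -> int:
    return sum(count * cpi for count, cpi in mix.values())


def effective_cpi(mix) -> float:
    """Average cycles per instruction over the whole mix."""
    return total_cycles(mix) / total_instructions(mix)


def mips_rate(clock_hz: float, cpi: float) -> float:
    """Millions of instructions per second = clock rate / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


def execution_time(mix, clock_hz: float) -> float:
    """CPU time in seconds = total clock cycles / clock rate."""
    return total_cycles(mix) / clock_hz


if __name__ == "__main__":
    cpi = effective_cpi(MIX)
    print(f"Effective CPI  = {cpi:.2f}")
    print(f"MIPS rate      = {mips_rate(CLOCK_HZ, cpi):.2f}")
    print(f"Execution time = {execution_time(MIX, CLOCK_HZ) * 1e3:.3f} ms")
```

Under that reading the program runs 1,000,000 instructions in 1,550,000 cycles, giving a CPI of 1.55, about 258 MIPS, and an execution time of 3.875 ms.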
Question: Suppose that we want to enhance the processor used for web serving. The new processor is 10 times faster on computation in the web serving application than the original processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement?
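
This is an Amdahl's Law question: only the fraction of time spent on computation benefits from the faster processor. A minimal sketch, assuming the standard form Speedup = 1 / ((1 - f) + f / s), where f is the enhanced fraction and s the speedup of that fraction (the function name is illustrative):

```python
def amdahl_speedup(enhanced_fraction: float, enhancement_speedup: float) -> float:
    """Overall speedup when `enhanced_fraction` of the original execution time
    is accelerated by a factor of `enhancement_speedup`."""
    if not 0.0 <= enhanced_fraction <= 1.0:
        raise ValueError("enhanced_fraction must be between 0 and 1")
    unaffected = 1.0 - enhanced_fraction
    return 1.0 / (unaffected + enhanced_fraction / enhancement_speedup)


if __name__ == "__main__":
    # 40% of the time is computation, which becomes 10 times faster
    print(f"Overall speedup = {amdahl_speedup(0.4, 10):.4f}")
```

The overall speedup is 1 / (0.6 + 0.04) = 1.5625, well below 10, because the 60% spent waiting for I/O is untouched.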

Questions
Q1. Find the number of dies per 300 mm (30 cm) wafer for a die that is 1.5 cm on a side.
Q2. Find the die yield for dies that are 1.5 cm on a side and 1.0 cm on a side, assuming a defect density of 0.4 per cm² and α = 4.
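
A worked sketch for both questions, under two assumptions that the questions themselves do not state: dies per wafer is estimated as wafer area over die area minus the non-square edge pieces, and die yield follows the model $(1 + D \cdot A / \alpha)^{-\alpha}$ with a wafer yield of 100%, where $D$ is the defect density and $A$ the die area.

```latex
\[
\text{Dies per wafer} = \frac{\pi \times (30/2)^2}{1.5^2} - \frac{\pi \times 30}{\sqrt{2 \times 1.5^2}}
\approx 314.2 - 44.4 \approx 270
\]

\[
\text{Die yield}_{1.5\,\text{cm}} = \left(1 + \frac{0.4 \times 2.25}{4}\right)^{-4} = 1.225^{-4} \approx 0.44
\]

\[
\text{Die yield}_{1.0\,\text{cm}} = \left(1 + \frac{0.4 \times 1.00}{4}\right)^{-4} = 1.1^{-4} \approx 0.68
\]
```

If a different yield model is intended, only the last two lines change; the dies-per-wafer estimate does not depend on defect density.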
f1 = 500 MHz              f2 = 2.5 GHz
T1 = 12x seconds          T2 = x seconds
MIPS Rate1 = 100 MIPS     MIPS Rate2 = 1800 MIPS
CPI1 = ?                  CPI2 = ?
Ic = ?                    Ic = ?
Throughput1 = ?           Throughput2 = ?
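
Two of the unknowns follow from the definition of the MIPS rate: MIPS = f / (CPI × 10^6) gives the CPI, and MIPS = Ic / (T × 10^6) gives the instruction count. The sketch below treats x as a free parameter, since the slide leaves it symbolic, and does not attempt the throughput entry because the slide does not define it; function names are illustrative:

```python
def cpi_from_mips(clock_hz: float, mips: float) -> float:
    """CPI = clock rate / (MIPS * 10^6)."""
    return clock_hz / (mips * 1e6)


def instruction_count(mips: float, exec_time_s: float) -> float:
    """Ic = MIPS * 10^6 * execution time."""
    return mips * 1e6 * exec_time_s


if __name__ == "__main__":
    x = 1.0  # placeholder value for the unknown x (seconds); an assumption
    print(f"CPI1 = {cpi_from_mips(500e6, 100):.3f}")
    print(f"CPI2 = {cpi_from_mips(2.5e9, 1800):.3f}")
    print(f"Ic1  = {instruction_count(100, 12 * x):.3e} (for x = {x})")
    print(f"Ic2  = {instruction_count(1800, x):.3e} (for x = {x})")
```

Whatever x is, the ratio of the two instruction counts is fixed at 1200 : 1800, so the first machine executes two thirds as many instructions as the second.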

Ex1. The execution times (in seconds) of four programs on three computers are given below. Assume that 10^9 instructions were executed in each of the four programs. Calculate the MIPS rating of each program on each of the three machines. Based on these ratings, can you draw a clear conclusion regarding the relative performance of the three computers? Give reasons if you find a way to rank them statistically.

              Execution time (in seconds)
Program       Computer A   Computer B   Computer C
Program 1     1            10           20
Program 2     1000         100          20
Program 3     500          1000         50
Program 4     100          800          100
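
A sketch of how the ratings could be tabulated, using MIPS = Ic / (T × 10^6) with Ic = 10^9. Comparing the arithmetic and harmonic means is one possible way to rank the machines, chosen here as an assumption rather than taken from the exercise:

```python
INSTRUCTIONS = 1e9  # instructions executed by each program

# execution times in seconds for programs 1..4
TIMES = {
    "A": [1, 1000, 500, 100],
    "B": [10, 100, 1000, 800],
    "C": [20, 20, 50, 100],
}


def mips_ratings(times):
    """MIPS rating of each program: Ic / (T * 10^6)."""
    return [INSTRUCTIONS / (t * 1e6) for t in times]


def arithmetic_mean(values):
    return sum(values) / len(values)


def harmonic_mean(values):
    return len(values) / sum(1.0 / v for v in values)


if __name__ == "__main__":
    for name, times in TIMES.items():
        ratings = mips_ratings(times)
        print(f"Computer {name}: MIPS = {ratings}, "
              f"arithmetic mean = {arithmetic_mean(ratings):.2f}, "
              f"harmonic mean = {harmonic_mean(ratings):.2f}, "
              f"total time = {sum(times)} s")
```

The arithmetic mean of the MIPS ratings favors computer A because of its single very fast run, while the harmonic mean, which tracks total execution time for equal instruction counts, favors computer C. That disagreement is why the raw ratings alone give no clear conclusion.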

Q1. Assume a disk subsystem with the following components and MTTF
10 disks, each rated at 1,000,000-hour MTTF
1 SCSI controller, 500,000-hour MTTF
1 power supply, 200,000-hour MTTF
1 fan, 200,000-hour MTTF
1 SCSI cable, 1,000,000-hour MTTF
Using the simplifying assumptions that the lifetimes are exponentially distributed and that
failures are independent, compute the MTTF of the system as a whole.
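\[
\text{Failure rate}_{\text{system}} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} + \frac{1}{200{,}000} + \frac{1}{1{,}000{,}000}
\]

\[
= \frac{10 + 2 + 5 + 5 + 1}{1{,}000{,}000 \text{ hours}} = \frac{23}{1{,}000{,}000 \text{ hours}} = \frac{23 \times 1000}{1{,}000{,}000{,}000 \text{ hours}} = \frac{23{,}000}{1 \text{ billion hours}}
\]

or 23,000 FIT.

\[
\text{MTTF}_{\text{system}} = \frac{1}{\text{Failure rate}_{\text{system}}} = \frac{1{,}000{,}000{,}000 \text{ hours}}{23{,}000} \approx 43{,}500 \text{ hours}
\]

1 year = 365 × 24 hours = 8760 hours. Therefore,

\[
\text{MTTF}_{\text{system}} \approx \frac{43{,}500}{8760} \approx 4.97 \text{ years}
\]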
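
The same calculation can be checked in a few lines. With exponentially distributed lifetimes and independent failures, the system failure rate is the sum of the component failure rates; the names below are illustrative:

```python
# (number of units, MTTF of one unit in hours)
COMPONENTS = [
    (10, 1_000_000),  # disks
    (1, 500_000),     # SCSI controller
    (1, 200_000),     # power supply
    (1, 200_000),     # fan
    (1, 1_000_000),   # SCSI cable
]

HOURS_PER_YEAR = 365 * 24  # 8760


def system_failure_rate(components) -> float:
    """Failures per hour: the sum of count / MTTF over all components."""
    return sum(count / mttf for count, mttf in components)


def system_mttf_hours(components) -> float:
    """System MTTF is the reciprocal of the system failure rate."""
    return 1.0 / system_failure_rate(components)


if __name__ == "__main__":
    rate = system_failure_rate(COMPONENTS)
    mttf = system_mttf_hours(COMPONENTS)
    print(f"Failure rate = {rate * 1e9:.0f} FIT")
    print(f"MTTF = {mttf:.0f} hours = {mttf / HOURS_PER_YEAR:.2f} years")
```

Without rounding the MTTF to 43,500 hours first, the result is about 43,478 hours, or roughly 4.96 years; either way the system lasts just under 5 years on average.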
Q2. Availability is the most important consideration for designing servers, followed closely by scalability and throughput.
(a) We have a single processor with a failures in time (FIT) rate of 100. What is the mean time to failure (MTTF) for this system?
(b) If it takes 1 day to get the system running again, what is the availability of the system?
(c) Imagine that the government, to cut costs, is going to build a supercomputer out of inexpensive computers rather than expensive, reliable computers. What is the MTTF for a system with 1000 processors? Assume that if one fails, they all fail.
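a)
\[
\text{MTTF}_{\text{system}} = \frac{1}{\text{Failure rate}_{\text{system}}} = \frac{1{,}000{,}000{,}000 \text{ hours}}{100} = 10^{7} \text{ hours}
\]

b) MTTR = 24 hours, MTTF = 10^7 hours
\[
\text{System availability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}} = \frac{10^{7}}{10^{7} + 24} = \frac{10{,}000{,}000}{10{,}000{,}000 + 24} = \frac{10{,}000{,}000}{10{,}000{,}024} \approx 0.999998 \approx 1
\]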
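c)
\[
\text{Failure rate}_{\text{system}} = 1000 \times \frac{1}{10^{7}} = \frac{1000 \times 100}{10^{7} \times 100} = \frac{100{,}000}{10^{9} \text{ hours}} = \frac{100{,}000}{1 \text{ billion hours}} = 100{,}000 \text{ FIT}
\]

\[
\text{MTTF}_{\text{system}} = \frac{1{,}000{,}000{,}000}{100{,}000} = 10{,}000 \text{ hours}
\]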
