
Computer Architecture
Chapter 1: Computer Abstractions and Technology
Adapted from Computer Organization and Design: The Hardware/Software Interface – 5th

The Computer Revolution
• The third revolution along with agriculture and industry
• Progress in computer technology
– Underpinned by Moore’s Law
• Makes novel applications feasible
– Computers in automobiles
– Cell phones
– Human genome project
– World Wide Web
– Search Engines
• Computers are pervasive

Computer Engineering – CSE – HCMUT Chapter 1: Computer Technology 2

Moore’s Law
[Photo caption: Co-founder of Intel Corp.]
The number of transistors integrated in a chip has doubled every 18-24 months (1975)

Intel Processors & Chips
• World record, in terms of the number of transistors integrated into a chip:
– Altera FPGA device: 30+ billion
• Intel processor
– Core i 9th generation Cannon Lake
– 14 nm technology
– >1.4B transistors (6th generation – SkyLake)
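The doubling rule stated for Moore’s Law can be illustrated with a short Python sketch. The function name `projected_transistors` and the starting count of one billion transistors are hypothetical choices for illustration, not figures from the slides.

```python
def projected_transistors(start_count: float, years: float, doubling_months: float) -> float:
    """Project a transistor count, assuming it doubles every `doubling_months` months."""
    doublings = years * 12.0 / doubling_months
    return start_count * 2.0 ** doublings

# Hypothetical starting point: 1.0e9 transistors, projected 6 years ahead
# for the two ends of the 18-24 month doubling range.
for months in (18, 24):
    print(months, projected_transistors(1.0e9, 6, months))
```

Under these assumptions the 18-month period gives four doublings in six years and the 24-month period gives three, so the two ends of the range already differ by a factor of two.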

The First “Computer”
[Figure. Source: Internet]

The First “Computer” (cont.)
[Figure: The ENIAC Computer. Source: US Army photo]



The ENIAC Computer
• 30+ tons
• 1,500+ square feet (140 square meters)
• 18,000+ vacuum tubes
• 140+ kW power
• 5,000+ additions per second

A Brief History of Computers
• The first generation
– Vacuum tubes
– 1946 – 1955
• The second generation
– Transistors
– 1955 – 1965
• The third generation
– Integrated circuits
– 1965 – 1980
• The current generation
– Personal computers
– 1980 – …
• What’s next?
– Quantum computers?
– Memristors?

Classes of Computers
• Personal computers
– General purpose, variety of software
– Subject to cost/performance tradeoff
• Server computers
– Network based
– High capacity, performance, reliability
– Range from small servers to building sized

Classes of Computers (cont.)
• Supercomputers
– High-end scientific and engineering calculations
– Highest capability but represent a small fraction of the overall computer market
• Embedded computers
– Hidden as components of systems
– Stringent power/performance/cost constraints


The PostPC Era
[Figure. Source: BusinessInsider]

The PostPC Era (cont.)
• Personal Mobile Device (PMD)
– Battery operated
– Connects to the Internet
– Hundreds of dollars
– Smart phones, tablets, electronic glasses,…
• Cloud computing
– Warehouse Scale Computers (WSC)
– Software as a Service (SaaS)
– A portion of the software runs on a PMD and a portion runs in the Cloud
– Amazon and Google
Understanding Performance
• Algorithm
– Determines number of operations executed
• Programming language, compiler, architecture
– Determine number of machine instructions executed per operation
• Processor and memory system
– Determine how fast instructions are executed
• I/O system (including OS)
– Determines how fast I/O operations are executed

Below Your Program
• Application software
– Written in high-level language
• System software
– Compiler: translates HLL code to machine code
– Operating System: service code
• Handling input/output
• Managing memory and storage
• Scheduling tasks & sharing resources
• Hardware
– Processor, memory, I/O controllers


Levels of Program Code
• High-level language
– Level of abstraction closer to problem domain
– Provides for productivity and portability
• Assembly language
– Textual representation of instructions
• Hardware representation
– Binary digits (bits)
– Encoded instructions and data

Components of a Computer
The BIG Picture
• Same components for all kinds of computer
– Desktop, server, embedded
• Input/output includes
– User-interface devices
• Display, keyboard, mouse
– Storage devices
• Hard disk, CD/DVD, flash
– Network adapters
• For communicating with other computers

Touchscreen
• PostPC device
• Supersedes keyboard and mouse
• Resistive and Capacitive types
– Most tablets, smart phones use capacitive
– Capacitive allows multiple touches simultaneously

Through the Looking Glass
• LCD screen: picture elements (pixels)
– Mirrors content of frame buffer memory


Opening the Box
[Figure labels: capacitive multitouch LCD screen; 3.8 V, 25 Watt-hour battery; computer board]

Inside the Processor (CPU)
• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
– Small fast SRAM memory for immediate access to data

Inside the Processor
• Apple A5

Abstractions
The BIG Picture
• Abstraction helps us deal with complexity
– Hide lower-level detail
• Instruction set architecture (ISA)
– The hardware/software interface
• Application binary interface
– The ISA plus system software interface
• Implementation
– The details underlying an interface

A Safe Place for Data
• Volatile main memory
– Loses instructions and data when power off
• Non-volatile secondary memory
– Magnetic disk
– Flash memory
– Optical disk (CDROM, DVD)

Networks
• Communication, resource sharing, nonlocal access
• Local area network (LAN): Ethernet
• Wide area network (WAN): the Internet
• Wireless network: WiFi, Bluetooth

Technology Trends
• Electronics technology continues to evolve
– Increased capacity and performance
– Reduced cost
[Figure: DRAM capacity]

Year   Technology                   Relative performance/cost
1951   Vacuum tube                  1
1965   Transistor                   35
1975   Integrated circuit (IC)      900
1995   Very large scale IC (VLSI)   2,400,000
2013   Ultra large scale IC         250,000,000,000

Semiconductor Technology
• Silicon: semiconductor
• Add materials to transform properties:
– Conductors
– Insulators
– Switch


Manufacturing ICs
• Yield: proportion of working dies per wafer

Intel Core i7 Wafer
• 300mm wafer, 280 chips, 32nm technology
• Each chip is 20.7 x 10.5 mm
Integrated Circuit Cost

$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$

$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$

$$\text{Yield} = \frac{1}{\left(1 + (\text{Defects per area} \times \text{Die area}/2)\right)^{2}}$$

• Nonlinear relation to area and defect rate
– Wafer cost and area are fixed
– Defect rate determined by manufacturing process
– Die area determined by architecture and circuit design

Defining Performance
• Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers x mph]
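The integrated circuit cost relations can be sketched in Python as follows. The sketch assumes the yield model Yield = 1 / (1 + defects per area × die area / 2)^2 and the simple area-ratio approximation for dies per wafer. The function names and the sample numbers (wafer cost, areas, defect density) are hypothetical and chosen only for illustration.

```python
def dies_per_wafer(wafer_area: float, die_area: float) -> float:
    """Approximate die count as an area ratio (ignores dies lost at the wafer edge)."""
    return wafer_area / die_area

def die_yield(defects_per_area: float, die_area: float) -> float:
    """Fraction of working dies under the assumed yield model."""
    return 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2

def cost_per_die(cost_per_wafer: float, wafer_area: float,
                 die_area: float, defects_per_area: float) -> float:
    """Cost per wafer divided by the number of working dies."""
    good_dies = dies_per_wafer(wafer_area, die_area) * die_yield(defects_per_area, die_area)
    return cost_per_wafer / good_dies

# Hypothetical inputs: 5000 cost units per wafer, 700 cm^2 wafer, 0.5 defects per cm^2.
for die_area_cm2 in (1.0, 2.0, 4.0):
    print(die_area_cm2, cost_per_die(5000.0, 700.0, die_area_cm2, 0.5))
```

With these made-up inputs, doubling the die area more than doubles the cost per die, which is the nonlinear relation to area and defect rate that the slide points out.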


Response Time and Throughput
• Response time
– How long it takes to do a task
• Throughput
– Total work done per unit time
• e.g., tasks/transactions/… per hour
• How are response time and throughput affected by
– Replacing the processor with a faster version?
– Adding more processors?
• We’ll focus on response time for now…

Relative Performance
• Define Performance = 1/Execution Time
• “X is n times faster than Y”

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

• Example: time taken to run a program
– 10s on A, 15s on B
– Execution Time_B / Execution Time_A = 15s / 10s = 1.5
– So A is 1.5 times faster than B
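The relative performance definition can be checked with a few lines of Python. The function name `relative_performance` is a hypothetical helper introduced here for illustration.

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y (Performance = 1 / Execution Time)."""
    return exec_time_y / exec_time_x

# Example from the slide: the program takes 10 s on A and 15 s on B.
n = relative_performance(exec_time_x=10.0, exec_time_y=15.0)
print(n)  # 1.5
```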
Measuring Execution Time
• Elapsed time
– Total response time, including all aspects
• Processing, I/O, OS overhead, idle time
– Determines system performance
• CPU time
– Time spent processing a given job
• Discounts I/O time, other jobs’ shares
– Comprises user CPU time and system CPU time
– Different programs are affected differently by CPU and system performance

CPU Clocking
• Operation of digital hardware governed by a constant-rate clock
[Figure: clock waveform showing the clock period, clock cycles, data transfer and computation, and state update]
• Clock period: duration of a clock cycle
– e.g., 250ps = 0.25ns = 250×10^–12 s
• Clock frequency (rate): cycles per second
– e.g., 4.0GHz = 4000MHz = 4.0×10^9 Hz
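Clock period and clock rate are reciprocals, which the two examples on the CPU Clocking slide illustrate. A minimal Python sketch, with hypothetical helper names:

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock frequency in Hz for a given clock period in seconds."""
    return 1.0 / period_s

def clock_period_s(rate_hz: float) -> float:
    """Clock period in seconds for a given clock frequency in Hz."""
    return 1.0 / rate_hz

print(clock_rate_hz(250e-12))   # a 250 ps period corresponds to roughly 4.0e9 Hz
print(clock_period_s(4.0e9))    # a 4.0 GHz clock corresponds to roughly 2.5e-10 s
```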

CPU Time

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

• Performance improved by
– Reducing number of clock cycles
– Increasing clock rate
– Hardware designer must often trade off clock rate against cycle count

CPU Time Example
• Computer A: 2GHz clock, 10s CPU time
• Designing Computer B
– Aim for 6s CPU time
– Can do faster clock, but causes 1.2 × clock cycles
• How fast must Computer B clock be?

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$
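The CPU Time Example can be reproduced step by step in Python. This is a small sketch using the numbers from the slide; the function and variable names are hypothetical.

```python
def cpu_clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """CPU Clock Cycles = CPU Time x Clock Rate."""
    return cpu_time_s * clock_rate_hz

def required_clock_rate(clock_cycles: float, target_cpu_time_s: float) -> float:
    """Clock Rate = CPU Clock Cycles / CPU Time."""
    return clock_cycles / target_cpu_time_s

cycles_a = cpu_clock_cycles(10.0, 2.0e9)      # Computer A: 10 s at 2 GHz
cycles_b = 1.2 * cycles_a                     # Computer B needs 1.2x as many cycles
rate_b = required_clock_rate(cycles_b, 6.0)   # target CPU time of 6 s
print(rate_b / 1e9, "GHz")
```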

9
Instruction Count and CPI

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

• Instruction Count for a program
– Determined by program, ISA and compiler
• Average cycles per instruction
– Determined by CPU hardware
– If different instructions have different CPI
• Average CPI affected by instruction mix

CPI Example
• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2
• Same ISA, compiler
• Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps} \quad \text{(A is faster…)}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2 \quad \text{(…by this much)}$$
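The CPI Example comparison can be written as a short Python sketch. Because both computers run the same instruction count I, it is enough to compare the time per instruction; the helper name `time_per_instruction_ps` is hypothetical.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction in picoseconds: CPI x Cycle Time."""
    return cpi * cycle_time_ps

time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)  # Computer A
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)  # Computer B
ratio = time_b / time_a  # CPU Time_B / CPU Time_A, since the instruction count cancels
print(time_a, time_b, ratio)
```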

CPI in More Detail
• If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$$

• Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$

where the fraction Instruction Count_i / Instruction Count is the relative frequency of class i.

CPI Example
• Alternative compiled code sequences using instructions in classes A, B, C

Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1

• Sequence 1: IC = 5
– Clock Cycles = 2×1 + 1×2 + 2×3 = 10
– Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
– Clock Cycles = 4×1 + 1×2 + 1×3 = 9
– Avg. CPI = 9/6 = 1.5
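The weighted average CPI calculation for the two compiled code sequences can be sketched in Python as follows. The dictionary layout and function names are hypothetical; the CPI values and instruction counts are the ones from the CPI Example table.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

def clock_cycles(instruction_counts: dict) -> int:
    """Sum of CPI_i x Instruction Count_i over all instruction classes."""
    return sum(CPI_BY_CLASS[cls] * count for cls, count in instruction_counts.items())

def average_cpi(instruction_counts: dict) -> float:
    """Clock Cycles divided by the total Instruction Count."""
    return clock_cycles(instruction_counts) / sum(instruction_counts.values())

sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
    print(name, clock_cycles(seq), average_cpi(seq))
```

Sequence 2 executes more instructions but needs fewer clock cycles, so instruction count alone does not decide which sequence is faster.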
Exercise
• A program consists of 1000 instructions in which:
– 30% load/store instructions, CPI = 2.5
– 10% jump instructions, CPI = 1
– 20% branch instructions, CPI = 1.5
– The rest are arithmetic instructions, CPI = 2.0
• The program is executed on a 2 GHz CPU
a) What is the execution time (CPU time) of the program?
b) What is the weighted average CPI of the program?
c) If load/store instructions are improved so that their execution time is reduced by a factor of 2, what is the speed-up of the system?

Performance Summary
The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

• Performance depends on
– Algorithm: affects IC, possibly CPI
– Programming language: affects IC, CPI
– Compiler: affects IC, CPI
– Instruction set architecture: affects IC, CPI, Tc
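One way to work through the exercise is the Python sketch below. It is an illustration rather than an official solution: the names are hypothetical, and part (c) is interpreted as halving the load/store CPI at an unchanged clock rate, which is an assumption about what the question intends.

```python
CLOCK_RATE_HZ = 2.0e9
TOTAL_INSTRUCTIONS = 1000

# (fraction of instructions, CPI) per class; arithmetic is "the rest" (40%).
MIX = {
    "load/store": (0.30, 2.5),
    "jump": (0.10, 1.0),
    "branch": (0.20, 1.5),
    "arithmetic": (0.40, 2.0),
}

def average_cpi(mix: dict) -> float:
    """Weighted average CPI: sum of relative frequency x CPI."""
    return sum(fraction * cpi for fraction, cpi in mix.values())

def cpu_time_s(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU Time = Instruction Count x CPI / Clock Rate."""
    return instruction_count * cpi / clock_rate_hz

cpi_before = average_cpi(MIX)                                              # part (b)
time_before = cpu_time_s(TOTAL_INSTRUCTIONS, cpi_before, CLOCK_RATE_HZ)   # part (a)

# Part (c), assumed interpretation: load/store CPI drops from 2.5 to 1.25.
improved = dict(MIX)
improved["load/store"] = (0.30, 2.5 / 2)
time_after = cpu_time_s(TOTAL_INSTRUCTIONS, average_cpi(improved), CLOCK_RATE_HZ)
speedup = time_before / time_after

print(cpi_before, time_before, speedup)
```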


Power Trends
[Figure: clock frequency (MHz) and power (W) of Intel processors from the 80286 (1982) to the Core i5 Skylake (2015)]
• In CMOS IC technology

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^{2} \times \text{Frequency}$$

– Over the period shown: power ×30, voltage 5V → 1V, frequency ×1000

Reducing Power
• Suppose a new CPU has
– 85% of capacitive load of old CPU
– 15% voltage and 15% frequency reduction

$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^{2} \times F_{old} \times 0.85}{C_{old} \times V_{old}^{2} \times F_{old}} = 0.85^{4} = 0.52$$

• The power wall
– We can’t reduce voltage further
– We can’t remove more heat
• How else can we improve performance?
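The Reducing Power calculation follows directly from Power = Capacitive load × Voltage^2 × Frequency. A minimal Python sketch, with hypothetical function names:

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Power = Capacitive load x Voltage^2 x Frequency."""
    return capacitive_load * voltage ** 2 * frequency

def power_ratio(load_scale: float, voltage_scale: float, frequency_scale: float) -> float:
    """P_new / P_old when each factor is scaled; the old values cancel out."""
    return load_scale * voltage_scale ** 2 * frequency_scale

# Slide example: 85% capacitive load, 15% lower voltage, 15% lower frequency.
ratio = power_ratio(0.85, 0.85, 0.85)
print(round(ratio, 2))  # 0.52
```

Voltage enters squared, which is why lowering the supply voltage was the most effective lever until the power wall was reached.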
Uniprocessor Performance
[Figure: uniprocessor performance chart]
Constrained by power, instruction-level parallelism, memory latency

Multiprocessors
• Multicore microprocessors
– More than one processor per chip
• Requires explicitly parallel programming
– Compare with instruction level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer
– Hard to do
• Programming for performance
• Load balancing
• Optimizing communication and synchronization

SPEC CPU Benchmark
• Programs used to measure performance
– Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
– Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
– Elapsed time to execute a selection of programs
• Negligible I/O, so focuses on CPU performance
– Normalize relative to reference machine
– Summarize as geometric mean of performance ratios
• CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

CINT2006 for Intel Core i7 920
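The geometric-mean summary used for SPEC CPU2006 can be sketched in Python as below. The function name and the sample ratios are hypothetical; real SPEC ratios come from measured runs against the reference machine.

```python
import math

def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n ratios, computed via logarithms for stability."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical execution time ratios for three benchmark programs.
sample_ratios = [2.0, 8.0, 4.0]
print(geometric_mean(sample_ratios))  # close to 4.0
```

A useful property of the geometric mean is that the ratio of two machines' summary scores does not depend on which reference machine was used for normalization.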

12
SPEC Power Benchmark
• Power consumption of server at different workload levels
– Performance: ssj_ops/sec
– Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \left( \sum_{i=0}^{10} \text{ssj\_ops}_i \right) \Big/ \left( \sum_{i=0}^{10} \text{power}_i \right)$$

SPECpower_ssj2008 for Xeon X5650
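The overall ssj_ops per Watt metric is a ratio of two sums taken over the workload levels. A minimal Python sketch follows; the function name and the sample measurements are hypothetical and are not the Xeon X5650 results.

```python
def overall_ssj_ops_per_watt(ssj_ops: list, power_watts: list) -> float:
    """Sum of ssj_ops over all load levels divided by the sum of power over the same levels."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("one power reading is needed per workload level")
    return sum(ssj_ops) / sum(power_watts)

# Hypothetical measurements at three load levels (a full run would use eleven levels, i = 0..10).
sample_ops = [100_000.0, 50_000.0, 0.0]
sample_power = [250.0, 170.0, 80.0]
print(overall_ssj_ops_per_watt(sample_ops, sample_power))  # 300.0
```

Note that this is a ratio of sums, not an average of per-level ratios, so the power drawn at low and idle load weighs on the overall score.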

Pitfall: Amdahl’s Law
• Improving an aspect of a computer and expecting a proportional improvement in overall performance

$$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$$

• Example: multiply accounts for 80s/100s
– How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20$$

– Can’t be done!
• Corollary: make the common case fast

Fallacy: Low Power at Idle
• Look back at i7 power benchmark
– At 100% load: 258W
– At 50% load: 170W (66%)
– At 10% load: 121W (47%)
• Google data center
– Mostly operates at 10% – 50% load
– At 100% load less than 1% of the time
• Consider designing processors to make power proportional to load
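The Amdahl’s Law example (multiply accounts for 80 s of a 100 s run) can be explored with a short Python sketch. The function names are hypothetical.

```python
def improved_time(t_affected: float, t_unaffected: float, improvement_factor: float) -> float:
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / improvement_factor + t_unaffected

def overall_speedup(t_affected: float, t_unaffected: float, improvement_factor: float) -> float:
    """Original time divided by improved time."""
    original = t_affected + t_unaffected
    return original / improved_time(t_affected, t_unaffected, improvement_factor)

# Slide example: 80 s of multiply, 20 s of everything else.
for n in (2, 10, 100, 1_000_000):
    print(n, overall_speedup(80.0, 20.0, n))
```

However large the improvement factor becomes, the speedup stays below 5 because the unaffected 20 s never shrinks, which is why the 5× target on the slide cannot be reached.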
Pitfall: MIPS as a Performance Metric
• MIPS: Millions of Instructions Per Second
– Doesn’t account for
• Differences in ISAs between computers
• Differences in complexity between instructions

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Instruction count}}{\frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$

• CPI varies between programs on a given CPU

Concluding Remarks
• Cost/performance is improving
– Due to underlying technology development
• Hierarchical layers of abstraction
– In both hardware and software
• Instruction set architecture
– The hardware/software interface
• Execution time: the best performance measure
• Power is a limiting factor
– Use parallelism to improve performance
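The two equivalent forms of the MIPS formula from the MIPS pitfall can be compared in Python. The function names and the sample program (instruction count, CPI, clock rate) are hypothetical.

```python
def mips_from_time(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = Instruction count / (Execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

def mips_from_clock(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = Clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

# Hypothetical program: 1e9 instructions, CPI = 2.0, on a 4 GHz clock.
instruction_count, cpi, clock_rate = 1.0e9, 2.0, 4.0e9
execution_time = instruction_count * cpi / clock_rate  # 0.5 s

print(mips_from_time(instruction_count, execution_time))
print(mips_from_clock(clock_rate, cpi))
```

Both forms give the same rating for a given program, but because CPI changes from program to program, the same CPU can report very different MIPS values.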
