
Chapter 1

Computer Abstractions and Technology

1.1 Introduction

The Computer Revolution

Progress in computer technology

Underpinned by Moore's Law

Makes novel applications feasible

- Computers in automobiles
- Cell phones
- Human genome project
- World Wide Web
- Search engines

Computers are pervasive

Classes of Computers

Desktop computers

- Designed to deliver good performance to a single user at low cost, usually executing third-party software and usually incorporating a graphics display, a keyboard, and a mouse

Servers

- Used to run larger programs for multiple, simultaneous users, typically accessed only via a network, with a greater emphasis on dependability and (often) security

Supercomputers

- A high-performance, high-cost class of servers with hundreds to thousands of processors, terabytes of memory, and petabytes of storage, used for high-end scientific and engineering applications

Embedded computers (processors)

- A computer inside another device, used for running one predetermined application

Irwin, PSU, 2008

CSE431 Chapter 1.3

Review: Some Basic Definitions


Kilobyte: $2^{10}$ or 1,024 bytes

Megabyte: $2^{20}$ or 1,048,576 bytes
- sometimes rounded to $10^{6}$ or 1,000,000 bytes

Gigabyte: $2^{30}$ or 1,073,741,824 bytes
- sometimes rounded to $10^{9}$ or 1,000,000,000 bytes

Terabyte: $2^{40}$ or 1,099,511,627,776 bytes
- sometimes rounded to $10^{12}$ or 1,000,000,000,000 bytes

Petabyte: $2^{50}$ or 1,024 terabytes
- sometimes rounded to $10^{15}$ or 1,000,000,000,000,000 bytes

Exabyte: $2^{60}$ or 1,024 petabytes
- sometimes rounded to $10^{18}$ or 1,000,000,000,000,000,000 bytes
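The gap between the power-of-two sizes and their power-of-ten roundings grows with each unit. A minimal Python sketch of that arithmetic follows; the names `UNITS`, `unit_sizes`, and `rounding_error` are illustrative and not part of the course material.

```python
# Binary (power-of-two) sizes versus their decimal (power-of-ten) roundings.
UNITS = ["Kilobyte", "Megabyte", "Gigabyte", "Terabyte", "Petabyte", "Exabyte"]


def unit_sizes():
    """Return (name, binary size in bytes, decimal rounding in bytes) per unit."""
    return [(name, 2 ** (10 * k), 10 ** (3 * k))
            for k, name in enumerate(UNITS, start=1)]


def rounding_error(binary, decimal):
    """Fraction by which the decimal rounding understates the binary size."""
    return (binary - decimal) / binary


for name, binary, decimal in unit_sizes():
    error = rounding_error(binary, decimal)
    print(f"{name}: {binary:,} vs {decimal:,} bytes ({error:.1%} smaller)")
```

The understatement is a little over 2% for a kilobyte and keeps growing for each larger unit.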

Growth in Cell Phone Sales (Embedded)


embedded growth >> desktop growth

Where else are embedded processors found?



Embedded Processor Characteristics


The largest class of computers, spanning the widest range of applications and performance

- Often have minimum performance requirements. Example?
- Often have stringent limitations on cost. Example?
- Often have stringent limitations on power consumption. Example?
- Often have low tolerance for failure. Example?

Below the Program


Layers: applications software, systems software, hardware

System software

Operating system: supervising program that interfaces the user's program with the hardware (e.g., Linux, MacOS, Windows)
- Handles basic input and output operations
- Allocates storage and memory
- Provides for protected sharing among multiple applications

Compiler: translates programs written in a high-level language (e.g., C, Java) into instructions that the hardware can execute

Below the Program, Cont

High-level language program (in C)

    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

one-to-many: C compiler

Assembly language program (for MIPS)

    swap: sll  $2, $5, 2
          add  $2, $4, $2
          lw   $15, 0($2)
          lw   $16, 4($2)
          sw   $16, 0($2)
          sw   $15, 4($2)
          jr   $31

one-to-one: assembler

Machine (object, binary) code (for MIPS)

    000000 00000 00101 0001000010000000
    000000 00100 00010 0001000000100000
    . . .


Advantages of Higher-Level Languages?

Higher-level languages

- Allow the programmer to think in a more natural language and for their intended use (Fortran for scientific computation, Cobol for business programming, Lisp for symbol manipulation, Java for web programming, ...)
- Improve programmer productivity: more understandable code that is easier to debug and validate
- Improve program maintainability
- Allow programs to be independent of the computer on which they are developed (compilers and assemblers can translate high-level language programs to the binary instructions of any machine)
- Emergence of optimizing compilers that produce very efficient assembly code optimized for the target machine

As a result, very little programming is done today at the assembler level

Under the Covers

Five classic components of a computer: input, output, memory, datapath, and control

datapath + control = processor (CPU)

Anatomy of a Computer
Output device

Network cable

Input device

Input device


Anatomy of a Mouse

Optical mouse

- LED illuminates desktop
- Small low-res camera
- Basic image processor: looks for x, y movement
- Buttons & wheel

Supersedes roller-ball mechanical mouse

Through the Looking Glass

LCD screen: picture elements (pixels)

Mirrors content of frame buffer memory


Opening the Box


Inside the Processor (CPU)


Datapath: performs operations on data

Control: sequences datapath, memory, ...

Cache memory
- Small fast SRAM memory for immediate access to data

AMD's Barcelona Multicore Chip

- Four out-of-order cores on one chip
- 1.9 GHz clock rate
- 65nm technology
- Three levels of caches (L1, L2, L3) on chip
- Integrated Northbridge

[Figure: die photo with labels Core 1, Core 2, Core 3, Core 4, four 512KB L2 caches, 2MB shared L3 cache, and Northbridge]

http://www.techwarelabs.com/reviews/processors/barcelona/

Instruction Set Architecture (ISA)

ISA, or simply architecture: the abstract interface between the hardware and the lowest-level software that encompasses all the information necessary to write a machine language program, including instructions, registers, memory access, I/O, ...
- Enables implementations of varying cost and performance to run identical software

The combination of the basic instruction set (the ISA) and the operating system interface is called the application binary interface (ABI)
- ABI: the user portion of the instruction set plus the operating system interfaces used by application programmers. Defines a standard for binary portability across computers.

Moore's Law

In 1965, Intel's Gordon Moore predicted that the number of transistors that can be integrated on a single chip would double about every two years

[Figure: Dual Core Itanium with 1.7B transistors; feature size & die size]

Courtesy, Intel
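Doubling about every two years compounds quickly. The sketch below is only an idealized model of that statement, not data from the slide; the function name `transistor_growth` and the default doubling period are assumptions made for illustration.

```python
def transistor_growth(years, doubling_period_years=2.0):
    """Growth factor after `years` if the count doubles every doubling period."""
    return 2.0 ** (years / doubling_period_years)


# One doubling after 2 years, five doublings after 10 years.
for years in (2, 10, 20):
    print(f"after {years} years: x{transistor_growth(years):,.0f}")
```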

Technology Scaling Road Map (ITRS)

Year                   2004   2006   2008   2010   2012
Feature size (nm)        90     65     45     32     22
Intg. Capacity (BT)       2      4      6     16     32

Fun facts about 45nm transistors

- 30 million can fit on the head of a pin
- You could fit more than 2,000 across the width of a human hair
- If car prices had fallen at the same rate as the price of a single transistor has since 1968, a new car today would cost about 1 cent

Another Example of Moore's Law Impact

DRAM capacity growth over 3 decades

[Figure: DRAM chip capacity rising through 16K, 64K, 256K, 1M, 4M, 16M, 64M, 128M, 256M, 512M, and 1G]

But What Happened to Clock Rates and Why?

Clock rates hit a power wall

[Figure: plot with power (Watts) on the vertical axis, 0 to 120]

For the P6, success criteria included performance above a certain level and failure criteria included power dissipation above some threshold.
Bob Colwell, Pentium Chronicles

A Sea Change is at Hand

The power challenge has forced a change in the design of microprocessors
- Since 2002 the rate of improvement in the response time of programs on desktop computers has slowed from a factor of 1.5 per year to less than a factor of 1.2 per year

As of 2006 all desktop and server companies are shipping microprocessors with multiple processor cores per chip

Product          AMD Barcelona   Intel Nehalem   IBM Power 6   Sun Niagara 2
Cores per chip   4               4               2             8
Clock rate       2.5 GHz         ~2.5 GHz?       4.7 GHz       1.4 GHz
Power            120 W           ~100 W?         ~100 W?       94 W

Plan of record is to double the number of cores per chip per generation (about every two years)

What You Will Learn

How programs are translated into the machine language
- And how the hardware executes them

The hardware/software interface

What determines program performance
- And how it can be improved

How hardware designers improve performance

What is parallel processing

Understanding Performance

Algorithm
- Determines number of operations executed

Programming language, compiler, architecture
- Determine number of machine instructions executed per operation

Processor and memory system
- Determine how fast instructions are executed

I/O system (including OS)
- Determines how fast I/O operations are executed

Abstractions
The BIG Picture

Abstraction helps us deal with complexity
- Hide lower-level detail

Instruction set architecture (ISA)
- The hardware/software interface

Application binary interface
- The ISA plus system software interface

Implementation
- The details underlying the interface

A Safe Place for Data

Volatile main memory
- Loses instructions and data when power off

Non-volatile secondary memory
- Magnetic disk
- Flash memory
- Optical disk (CDROM, DVD)

Networks

Communication and resource sharing

Local area network (LAN): Ethernet
- Within a building

Wide area network (WAN): the Internet

Wireless network: WiFi, Bluetooth

Technology Trends

Electronics technology continues to evolve
- Increased capacity and performance
- Reduced cost

[Figure: DRAM capacity]

Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2005   Ultra large scale IC          6,200,000,000

1.4 Performance

Defining Performance

Which airplane has the best performance?


[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 on passenger capacity, cruising range (miles), cruising speed (mph), and passengers x mph]

Response Time and Throughput

Response time
- How long it takes to do a task

Throughput
- Total work done per unit time
- e.g., tasks/transactions/... per hour

How are response time and throughput affected by
- Replacing the processor with a faster version?
- Adding more processors?

We'll focus on response time for now

Relative Performance

Define Performance = 1/Execution Time

"X is n times faster than Y"

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

Example: time taken to run a program
- 10s on A, 15s on B
- Execution Time_B / Execution Time_A = 15s / 10s = 1.5
- So A is 1.5 times faster than B
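The relative-performance ratio is a single division of execution times. A small Python sketch of the A-versus-B example, with the helper name `relative_performance` chosen here purely for illustration:

```python
def relative_performance(time_x, time_y):
    """Return n such that X is n times faster than Y.

    Performance is 1 / execution time, so Performance_X / Performance_Y
    reduces to time_y / time_x.
    """
    return time_y / time_x


# Program takes 10 s on A and 15 s on B.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")
```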

Measuring Execution Time

Elapsed time
- Total response time, including all aspects: processing, I/O, OS overhead, idle time
- Determines system performance

CPU time
- Time spent processing a given job
- Discounts I/O time, other jobs' shares
- Comprises user CPU time and system CPU time
- Different programs are affected differently by CPU and system performance

CPU Clocking

Operation of digital hardware governed by a constant-rate clock


[Figure: clock timing diagram showing the clock period, clock cycles, data transfer and computation, and state update]

Clock period: duration of a clock cycle
- e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s

Clock frequency (rate): cycles per second
- e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
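Clock period and clock rate are reciprocals, so the two examples (250ps and 4.0GHz) describe the same clock. A brief Python sketch of the conversion; the function names are illustrative assumptions.

```python
def period_to_rate_hz(period_s):
    """Clock rate in Hz for a clock period given in seconds."""
    return 1.0 / period_s


def rate_to_period_s(rate_hz):
    """Clock period in seconds for a clock rate given in Hz."""
    return 1.0 / rate_hz


rate = period_to_rate_hz(250e-12)   # 250 ps period
period = rate_to_period_s(4.0e9)    # 4.0 GHz rate
print(f"{rate / 1e9:.2f} GHz, {period * 1e12:.0f} ps")
```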

CPU Time
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

Performance improved by
- Reducing number of clock cycles
- Increasing clock rate
- Hardware designer must often trade off clock rate against cycle count

CPU Time Example


Computer A: 2GHz clock, 10s CPU time

Designing Computer B
- Aim for 6s CPU time
- Can do faster clock, but causes 1.2 × clock cycles

How fast must Computer B clock be?

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\text{s} \times 2\text{GHz} = 20 \times 10^{9}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\text{s}} = \frac{24 \times 10^{9}}{6\text{s}} = 4\text{GHz}$$
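The Computer B calculation can be checked numerically. The following is a minimal Python sketch using the slide's numbers; the helper names `clock_cycles` and `required_clock_rate` are hypothetical.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles, target_cpu_time_s):
    """Clock rate needed to finish `cycles` within the target CPU time."""
    return cycles / target_cpu_time_s


cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)  # Computer A
cycles_b = 1.2 * cycles_a                                    # B needs 1.2x the cycles
rate_b = required_clock_rate(cycles_b, target_cpu_time_s=6.0)
print(f"Computer B needs about {rate_b / 1e9:.1f} GHz")
```

Even though B's clock is twice as fast as A's, the extra cycles mean the CPU time drops only from 10s to 6s.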

Instruction Count and CPI


$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

Instruction Count for a program
- Determined by program, ISA and compiler

Average cycles per instruction
- Determined by CPU hardware
- If different instructions have different CPI, average CPI is affected by instruction mix

CPI Example

Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\text{ps} = I \times 500\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\text{ps} = I \times 600\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\text{ps}}{I \times 500\text{ps}} = 1.2$$

A is faster, by a factor of 1.2
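Because both computers run the same ISA, the instruction count I cancels and only CPI × cycle time matters. A short Python sketch of the comparison, with an illustrative helper name:

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds: CPI x cycle time."""
    return cpi * cycle_time_ps


time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)

# Same ISA, so the instruction count is identical and drops out of the ratio.
ratio = time_b / time_a
faster = "A" if time_a < time_b else "B"
print(f"{faster} is faster by a factor of about {ratio:.2f}")
```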

CPI in More Detail

If different instruction classes take different numbers of cycles


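$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

The fraction $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$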

CPI Example

Alternative compiled code sequences using instructions in classes A, B, C


Class                A   B   C
CPI for class        1   2   3
IC in sequence 1     2   1   2
IC in sequence 2     4   1   1

Sequence 1: IC = 5
- Clock Cycles = 2×1 + 1×2 + 2×3 = 10
- Avg. CPI = 10/5 = 2.0

Sequence 2: IC = 6
- Clock Cycles = 4×1 + 1×2 + 1×3 = 9
- Avg. CPI = 9/6 = 1.5
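The two code sequences can be compared with a few lines of Python. This is only a sketch of the weighted-average calculation; the dictionary and function names are chosen for illustration.

```python
# CPI of each instruction class and the instruction counts of each sequence.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
SEQUENCE_1 = {"A": 2, "B": 1, "C": 2}
SEQUENCE_2 = {"A": 4, "B": 1, "C": 1}


def clock_cycles(counts):
    """Sum of CPI_i x instruction count_i over all classes."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts):
    """Weighted average CPI: total clock cycles / total instruction count."""
    return clock_cycles(counts) / sum(counts.values())


for name, seq in (("Sequence 1", SEQUENCE_1), ("Sequence 2", SEQUENCE_2)):
    print(name, clock_cycles(seq), average_cpi(seq))
```

Sequence 2 executes more instructions but takes fewer clock cycles, so instruction count alone does not predict which sequence is faster.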

Performance Summary
The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

Performance depends on

- Algorithm: affects IC, possibly CPI
- Programming language: affects IC, CPI
- Compiler: affects IC, CPI
- Instruction set architecture: affects IC, CPI, Tc

1.5 The Power Wall

Power Trends

In CMOS IC technology
$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

Slide annotations: power ×30, voltage 5V → 1V, frequency ×1000

Reducing Power

Suppose a new CPU has


- 85% of capacitive load of old CPU
- 15% voltage and 15% frequency reduction

$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$

The power wall

- We can't reduce voltage further
- We can't remove more heat

How else can we improve performance?
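Since power scales with capacitive load, the square of voltage, and frequency, scaling each by 0.85 gives a factor of 0.85 to the fourth power. A minimal Python sketch of that ratio (the function name `power_ratio` is an assumption made here):

```python
def power_ratio(cap_scale, volt_scale, freq_scale):
    """P_new / P_old when load, voltage, and frequency are each scaled.

    Follows Power = Capacitive load x Voltage^2 x Frequency, so the
    voltage scale factor is squared.
    """
    return cap_scale * volt_scale ** 2 * freq_scale


ratio = power_ratio(cap_scale=0.85, volt_scale=0.85, freq_scale=0.85)
print(f"new CPU uses about {ratio:.2f} of the old CPU's power")
```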

1.6 The Sea Change: The Switch to Multiprocessors

Uniprocessor Performance

Constrained by power, instruction-level parallelism, memory latency

Multiprocessors

Multicore microprocessors
- More than one processor per chip

Requires explicitly parallel programming
- Compare with instruction level parallelism
  - Hardware executes multiple instructions at once
  - Hidden from the programmer
- Hard to do
  - Programming for performance
  - Load balancing
  - Optimizing communication and synchronization

1.7 Real Stuff: The AMD Opteron X4

Manufacturing ICs

Yield: proportion of working dies per wafer



AMD Opteron X2 Wafer

X2: 300mm wafer, 117 chips, 90nm technology
X4: 45nm technology

Integrated Circuit Cost


$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$

$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$

$$\text{Yield} = \frac{1}{\left(1 + (\text{Defects per area} \times \text{Die area}/2)\right)^2}$$

Nonlinear relation to area and defect rate

- Wafer cost and area are fixed
- Defect rate determined by manufacturing process
- Die area determined by architecture and circuit design
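The nonlinear dependence on die area shows up when the three formulas are chained together. The Python sketch below uses made-up numbers (wafer cost 1000, wafer area 100, 2 defects per unit area) purely for illustration; none of them come from the slides, and the function names are hypothetical.

```python
def die_yield(defects_per_area, die_area):
    """Yield = 1 / (1 + defects per area x die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2


def dies_per_wafer(wafer_area, die_area):
    """Approximation: wafer area / die area (ignores edge losses)."""
    return wafer_area / die_area


def cost_per_die(wafer_cost, wafer_area, die_area, defects_per_area):
    """Cost per die = cost per wafer / (dies per wafer x yield)."""
    good_dies = dies_per_wafer(wafer_area, die_area) * die_yield(defects_per_area, die_area)
    return wafer_cost / good_dies


# Hypothetical inputs: doubling the die area more than doubles the cost.
small = cost_per_die(wafer_cost=1000.0, wafer_area=100.0, die_area=1.0, defects_per_area=2.0)
large = cost_per_die(wafer_cost=1000.0, wafer_area=100.0, die_area=2.0, defects_per_area=2.0)
print(small, large, large / small)
```

With these assumed inputs, doubling the die area halves the dies per wafer and also lowers the yield, so the cost per die rises by well over a factor of two.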

SPEC CPU Benchmark

Programs used to measure performance
- Supposedly typical of actual workload

Standard Performance Evaluation Corp (SPEC)
- Develops benchmarks for CPU, I/O, Web, ...

SPEC CPU2006
- Elapsed time to execute a selection of programs
  - Negligible I/O, so focuses on CPU performance
- Normalize relative to reference machine
- Summarize as geometric mean of performance ratios
  - CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

CINT2006 for Opteron X4 2356


Name         Description                     IC×10^9     CPI   Tc (ns)   Exec time   Ref time   SPECratio
perl         Interpreted string processing     2,118    0.75      0.40         637      9,777        15.3
bzip2        Block-sorting compression         2,389    0.85      0.40         817      9,650        11.8
gcc          GNU C Compiler                    1,050    1.72      0.40         724      8,050        11.1
mcf          Combinatorial optimization          336   10.00      0.40       1,345      9,120         6.8
go           Go game (AI)                      1,658    1.09      0.40         721     10,490        14.6
hmmer        Search gene sequence              2,783    0.80      0.40         890      9,330        10.5
sjeng        Chess game (AI)                   2,176    0.96      0.40         837     12,100        14.5
libquantum   Quantum computer simulation       1,623    1.61      0.40       1,047     20,720        19.8
h264avc      Video compression                 3,102    0.80      0.40         993     22,130        22.3
omnetpp      Discrete event simulation           587    2.94      0.40         690      6,250         9.1
astar        Games/path finding                1,082    1.79      0.40         773      7,020         9.1
xalancbmk    XML parsing                       1,058    2.70      0.40       1,143      6,900         6.0
Geometric mean                                                                                       11.7

High cache miss rates
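The geometric mean of the twelve SPECratios can be recomputed from the table. This Python sketch takes the SPECratio column as read from the table above and computes the n-th root of the product via logarithms; the variable and function names are illustrative.

```python
import math

# SPECratio column of the CINT2006 table, perl through xalancbmk.
SPEC_RATIOS = [15.3, 11.8, 11.1, 6.8, 14.6, 10.5, 14.5, 19.8, 22.3, 9.1, 9.1, 6.0]


def geometric_mean(values):
    """n-th root of the product, computed as exp(mean(log(x)))."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


gm = geometric_mean(SPEC_RATIOS)
print(f"geometric mean of SPECratios: {gm:.1f}")
```

Working in logarithms avoids forming a very large product and gives the same result as taking the root of the product directly.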

SPEC Power Benchmark

Power consumption of server at different workload levels


- Performance: ssj_ops/sec
- Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \Big/ \left(\sum_{i=0}^{10} \text{power}_i\right)$$

SPECpower_ssj2008 for X4
Target Load %        Performance (ssj_ops/sec)   Average Power (Watts)
100%                                   231,867                     295
90%                                    211,282                     286
80%                                    185,803                     275
70%                                    163,427                     265
60%                                    140,160                     256
50%                                    118,324                     246
40%                                     92,035                     233
30%                                     70,500                     222
20%                                     47,126                     206
10%                                     23,066                     180
0%                                           0                     141
Overall sum                          1,283,590                   2,605
∑ssj_ops / ∑power                                                  493
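The overall figure is the sum of performance across the eleven load levels divided by the sum of average power. A Python sketch using the table's values follows; the 40% load entry is taken as 92,035, the reading consistent with the stated total of 1,283,590, and the names are illustrative.

```python
# Load levels 100%, 90%, ..., 10%, 0% for the Opteron X4 measurement.
SSJ_OPS = [231867, 211282, 185803, 163427, 140160, 118324,
           92035, 70500, 47126, 23066, 0]
POWER_WATTS = [295, 286, 275, 265, 256, 246, 233, 222, 206, 180, 141]


def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by the sum of power."""
    return sum(ssj_ops) / sum(power_watts)


overall = overall_ssj_ops_per_watt(SSJ_OPS, POWER_WATTS)
print(sum(SSJ_OPS), sum(POWER_WATTS), round(overall))
```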

1.8 Fallacies and Pitfalls

Pitfall: Amdahl's Law

Improving an aspect of a computer and expecting a proportional improvement in overall performance

$$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$$

Example: multiply accounts for 80s/100s

- How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20$$

- Can't be done!

Corollary: make the common case fast
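The multiply example can be explored numerically: however large the improvement factor n, the 20s of unaffected time keeps the overall speedup below 5×. A minimal Python sketch, with function names chosen here for illustration:

```python
def improved_time(t_affected, t_unaffected, factor):
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / factor + t_unaffected


def overall_speedup(t_affected, t_unaffected, factor):
    """Original total time divided by the improved total time."""
    original = t_affected + t_unaffected
    return original / improved_time(t_affected, t_unaffected, factor)


# Multiply takes 80 s of a 100 s run; the other 20 s is unaffected.
for n in (2, 10, 100, 1_000_000):
    print(n, round(overall_speedup(80.0, 20.0, n), 3))
```

The speedup approaches 100/20 = 5 as n grows but never reaches it, which is why the 5× target cannot be met by improving multiply alone.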

Fallacy: Low Power at Idle

Look back at X4 power benchmark

- At 100% load: 295W
- At 50% load: 246W (83%)
- At 10% load: 180W (61%)

Google data center

- Mostly operates at 10% to 50% load
- At 100% load less than 1% of the time

Consider designing processors to make power proportional to load

Pitfall: MIPS as a Performance Metric

MIPS: Millions of Instructions Per Second

Doesn't account for

- Differences in ISAs between computers
- Differences in complexity between instructions

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Instruction count}}{\frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$

CPI varies between programs on a given CPU
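Because MIPS reduces to clock rate divided by CPI, the same processor reports very different MIPS values on different programs. The Python sketch below uses the CPI values listed for perl (0.75) and mcf (10.00) in the CINT2006 table and a 0.40ns cycle time (2.5 GHz); the function name `mips` is an illustrative choice.

```python
def mips(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


CLOCK_RATE_HZ = 1.0 / 0.40e-9   # 0.40 ns cycle time, i.e. 2.5 GHz

perl_mips = mips(CLOCK_RATE_HZ, cpi=0.75)
mcf_mips = mips(CLOCK_RATE_HZ, cpi=10.00)
print(f"perl: {perl_mips:.0f} MIPS, mcf: {mcf_mips:.0f} MIPS")
```

The two ratings differ by more than a factor of ten on identical hardware, so a single MIPS number says little about how fast a given program will run.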

1.9 Concluding Remarks

Concluding Remarks

Cost/performance is improving
- Due to underlying technology development

Hierarchical layers of abstraction
- In both hardware and software

Instruction set architecture
- The hardware/software interface

Execution time: the best performance measure

Power is a limiting factor
- Use parallelism to improve performance