Computer Architecture 13-14

Chapter 1: Measuring &
Understanding Performance

[Adapted from Computer Organization and Design, 4th Edition,
Patterson & Hennessy, © 2009, MK]

Chapter 1. Measuring and understanding performance

1

Dept. of Computer Architecture, UMA, Sept 2013

What you Should Know from Last Year
Basic logic design & machine organization
logical minimization, component design
processor, memory, I/O

Create, assemble, run, debug programs in an
assembly language
MIPS preferred

Create, simulate, and debug hardware structures in a
simulator
Xillin

Create, compile, and run C (C++, Java) programs

Chapter 1. Measuring and understanding performance

2

Dept. of Computer Architecture, UMA, Sept 2013

Review: Some Basic Definitions
Kilobyte – 210 or 1,024 bytes
Megabyte– 220 or 1,048,576 bytes
sometimes “rounded” to 106 or 1,000,000 bytes

Gigabyte – 230 or 1,073,741,824 bytes
sometimes rounded to 109 or 1,000,000,000 bytes

Terabyte – 240 or 1,099,511,627,776 bytes
sometimes rounded to 1012 or 1,000,000,000,000 bytes

Petabyte – 250 or 1024 terabytes
sometimes rounded to 1015 or 1,000,000,000,000,000 bytes

Exabyte – 260 or 1024 petabytes
Sometimes rounded to 1018 or 1,000,000,000,000,000,000 bytes
Chapter 1. Measuring and understanding performance

3

Dept. of Computer Architecture, UMA, Sept 2013

Classes of Computers
1.

Desktop computers

(ej.: PC, laptop)

1. good performance,
2. single user
3. low cost
4. (graphics display, a keyboard, and a mouse)
2.

Servers (ej.: workstations, web servers, file storage)
1. larger programs
2. simultaneous users
3. network
4. security

Chapter 1. Measuring and understanding performance

4

Dept. of Computer Architecture, UMA, Sept 2013

Classes of Computers
3.

Supercomputers (weather, oil exploration, protein structure)
1. hundreds to thousands of processors,
2. terabytes of memory and petabytes of storage
3. high-end scientific and engineering applications

4.

Embedded computers (processors) (ej.: cell phones, cars, video
game, TV, digital cameras)

- A computer inside another device used for running one
predetermined application

Chapter 1. Measuring and understanding performance

5

Dept. of Computer Architecture, UMA, Sept 2013

Sept 2013 . Measuring and understanding performance 6 Dept.The Computer Revolution Progress in computer technology Underpinned by Moore’s Law Makes novel applications feasible Computers in automobiles Cell phones Human genome project World Wide Web Search Engines Computers are pervasive Chapter 1. UMA. of Computer Architecture.

Measuring and understanding performance 7 Dept. Sept 2013 . of Computer Architecture. UMA.The Processor Market embedded growth >> desktop growth Where else are embedded processors found? Chapter 1.

Measuring and understanding performance 9 Dept. of Computer Architecture.What You Will Learn How programs are translated into the machine language And how the hardware executes them About the hardware/software interface What determines program performance And how it can be improved How hardware designers improve performance What parallel processing is Chapter 1. Sept 2013 . UMA.

architecture Determine number of machine instructions executed per operation Processor and memory system Determine how fast instructions are executed I/O system (including OS) Determines how fast I/O operations are executed Chapter 1.Understanding Performance Algorithm Determines number of operations executed Programming language. Measuring and understanding performance 10 Dept. Sept 2013 . of Computer Architecture. UMA. compiler.

Windows) . Linux.Allocates storage and memory .Handles basic input and output operations . Measuring and understanding performance 11 Dept. C..g. of Computer Architecture.Underneath the Program Applications software Written in high-level language Systems software Hardware System software Operating system – supervising program that interfaces the user’s program with the hardware (e. UMA. Sept 2013 .Schedules tasks & Provides for protected sharing among multiple applications Compiler – translate programs written in a high-level language (e. Java) into instructions that the hardware can execute Chapter 1..g. MacOS.

4($2) $16. binary) code (for MIPS) 000000 00000 00101 0001000010000000 000000 00100 00010 0001000000100000 .Below the Program. $5. $2 $15. of Computer Architecture. 4($2) $31 one-to-one assembler Machine (object. Measuring and understanding performance 12 Dept. int k) (int temp. temp = v[k]. 0($2) $15. ) one-to-many C compiler Assembly language program (for MIPS) swap: sll add lw lw sw sw jr $2. . 0($2) $16. Sept 2013 . Chapter 1. 2 $2. v[k+1] = temp. . UMA. Con’t High-level language program (in C) swap (int v[]. $4. v[k] = v[k+1].

Sept 2013 . UMA. very little programming is done today at the assembler level Chapter 1.Advantages of Higher-Level Languages ? Higher-level languages Allow the programmer to think in a more natural language Improve programmer productivity Improve program maintainability Allow programs to be independent of the computer on which they are developed Emergence of optimizing compilers that produce very efficient assembly code optimized for the target machine As a result. Measuring and understanding performance 13 Dept. of Computer Architecture.

of Computer Architecture. mouse Storage devices Hard disk. UMA. Measuring and understanding performance 14 Dept.Under the Covers Same components for all kinds of computer Five classic components of a computer – input. CD/DVD. memory. flash Network adapters For communicating with other computers Chapter 1. datapath. and control datapath + control = processor (CPU) Input/output includes User-interface devices Display. Sept 2013 . keyboard. output.

of Computer Architecture. UMA. Measuring and understanding performance 15 Dept.Opening the Box Chapter 1. Sept 2013 .

of Computer Architecture. UMA.512KB L2 Four out-oforder cores on one chip 1. L3) on chip Core 4 Integrated Northbridge http://www.techwarelabs.9 GHz clock rate Core 2 65nm technology Core 3 512KB L2 Northbridg e 512KB L2 2MB shared L3 Cache Core 1 512KB L2 AMD’s Barcelona Multicore Chip Three levels of caches (L1. Sept 2013 . L2. Measuring and understanding performance 16 Dept.com/reviews/processors/barcelona/ Chapter 1.

UMA. DVD) Chapter 1. Sept 2013 . of Computer Architecture. Measuring and understanding performance 17 Dept.A Safe Place for Data Volatile main memory Loses instructions and data when power off Non-volatile secondary memory Magnetic disk Flash memory Optical disk (CDROM.

Measuring and understanding performance 18 Dept. of Computer Architecture. Bluetooth Chapter 1. UMA. Sept 2013 .Networks Communication and resource sharing Local area network (LAN): Ethernet Within a building Wide area network (WAN: the Internet Wireless network: WiFi.

The BIG Picture Abstraction helps us deal with complexity Hide lower-level detail Instruction set architecture (ISA) The hardware/software interface Application binary interface The ISA plus system software interface Implementation The details underlying and interface Chapter 1. Measuring and understanding performance 19 Dept. of Computer Architecture. UMA. Sept 2013 .

000 6. of Computer Architecture. Sept 2013 .Technology Trends Electronics technology continues to evolve Increased capacity and performance Reduced cost Year Technology 1951 Vacuum tube 1965 Transistor 1975 Integrated circuit (IC) 1995 Very large scale IC (VLSI) 2005 Ultra large scale IC Chapter 1. Measuring and understanding performance DRAM capacity Relative performance/cost 1 35 900 2.000 20 Dept.400.200.000. UMA.

7B transistors feature size & die size Chapter 1.Moore’s Law In 1965. Sept 2013 . of Computer Architecture. Intel ® Dept. Intel’s Gordon Moore predicted that the number of transistors that can be integrated on single chip would double about every two years Dual Core Itanium with 1. UMA. Measuring and understanding performance 22 Courtesy.

a new car today would cost about 1 cent Chapter 1. Sept 2013 .Technology Scaling Road Map (ITRS) Year 2004 2006 2008 2010 2012 Feature size (nm) 90 65 45 32 22 Intg.000 across the width of a human hair If car prices had fallen at the same rate as the price of a single transistor has since 1968. UMA. Measuring and understanding performance 23 Dept. Capacity (BT) 2 4 6 16 32 Fun facts about 45nm transistors 30 million can fit on the head of a pin You could fit more than 2. of Computer Architecture.

Measuring and understanding performance 24 Dept. Sept 2013 .Another Example of Moore’s Law Impact DRAM capacity growth over 3 decades 1G 256M 512M 64M 128M 4M 16M 1M 64K 256K 16K Chapter 1. UMA. of Computer Architecture.

Measuring and understanding performance 25 Dept. of Computer Architecture. UMA.But What Happened to Clock Rates and Why? Power (Watts) Clock rates hit a “power wall” Chapter 1. Sept 2013 .

Tendency “For the P6.” Bob Colwell. UMA. Sept 2013 . Measuring and understanding performance 26 Dept. Pentium Chronicles Chapter 1. of Computer Architecture. success criteria included performance above a certain level and failure criteria included power dissipation above some threshold.

5 per year to less than a factor of 1. of Computer Architecture. UMA.5 GHz 4.7 GHz 1.A Sea Change is at Hand The power challenge has forced a change in the design of microprocessors Since 2002 the rate of improvement in the response time of programs on desktop computers has slowed from a factor of 1.4 GHz 120 W ~100 W? ~100 W 94 W Plan of record is to double the number of cores per chip per generation (about every two years) Chapter 1.2 per year As of 2006 all desktop and server companies are shipping microprocessors with multiple processors – cores – per chip Product Cores per chip Clock rate Power AMD Barcelona Intel Nehalem IBM Power 6 Sun Niagara 2 4 4 2 8 2.5 GHz ~2. Measuring and understanding performance 27 Dept. Sept 2013 .

best cost/performance? Both require basis for comparison metric for evaluation Our goal is to compare which factors affect performances and their relative weight Chapter 1. Sept 2013 .least cost ? .best performance improvement ? . which has the . UMA.best performance ? . Measuring and understanding performance 28 Dept. which has the . of Computer Architecture.What is Performance Metrics Purchasing perspective given a collection of machines.best cost/performance? Design perspective faced with design options.least cost ? .

UMA. which are more focused on throughput • How are response time and throughput affected by o Replacing the processor with a faster version? o Adding more processors? We’ll focus on response time for now… Chapter 1. which are more focused on response time. Sept 2013 . versus servers. of Computer Architecture.Comparing Throughput and Response Time Response time (execution time) – the time between the start and the completion of a task • Important to individual users Throughput (bandwidth) – the total amount of work done in a given unit time • Important to data center managers Comparison • We will need different performance metrics as well as a different set of applications to benchmark embedded and desktop computers. Measuring and understanding performance 29 Dept.

need to minimize execution time performanceX = 1 / execution_timeX If X is n times faster than Y.= --------------------. UMA. Measuring and understanding performance 31 Dept. Sept 2013 . then performanceX execution_timeY -------------------. of Computer Architecture.= n performanceY execution_timeX Decreasing response time frequently improves throughput Chapter 1.What is Speed To maximize performance.

how much faster is A than B? We know that A is n times faster than B if performanceA execution_timeB -------------------.5 times faster than B Chapter 1. of Computer Architecture.5 10 So A is 1.= 1.= n performanceB execution_timeA The performance ratio is 15 -----. Sept 2013 .= --------------------. UMA. Measuring and understanding performance 32 Dept.Relative Performance Example If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds.

Measuring and understanding performance 33 Dept.it includes all aspects to complete a task . idle time Determines system performance Productivity Throughput: the total amount of work done in a given unit time Chapter 1. I/O operations. of Computer Architecture. UMA. Sept 2013 .Processing. OS overhead.Measuring Execution Time Elapsed time Total response time= Wall clock Time = Elapsed Time .

other jobs’ shares Comprises user CPU time and system CPU time Different programs are affected differently by CPU and system performance Elapsed time Example: ‘time’ in Unix: 90.Substract I/O time. Measuring and understanding performance 34 Dept. of Computer Architecture. Sept 2013 .9) / (2*60 + 39) I/O and other processes system CPU time Our goal: user CPU time + system CPU time Chapter 1.7u 12.: (90.9s 2:39 user CPU time 65% CPU utiliz.Measuring Execution Time CPU time Time spent processing a given job . UMA.7 + 12.

UMA. Measuring and understanding performance 35 Dept. of Computer Architecture. Sept 2013 .Review: Machine Clock Rate Clock rate (clock cycles per second in MHz or GHz) is inverse of clock cycle time (clock period) CC = 1 / CR one clock period 10 nsec clock cycle => 100 MHz clock rate 5 nsec clock cycle => 200 MHz clock rate 2 nsec clock cycle => 500 MHz clock rate 1 nsec (10-9) clock cycle => 1 GHz (109) clock rate 500 psec clock cycle => 2 GHz clock rate 250 psec clock cycle => 4 GHz clock rate 200 psec clock cycle => 5 GHz clock rate Chapter 1.

Performance Factors CPU execution time (CPU time) – time the CPU spends working on a task Does not include time waiting for I/O or running other programs CPU execution time = # CPU clock cyclesx clock cycle time for a program for a program or CPU execution time = #------------------------------------------CPU clock cycles for a program for a program clock rate Can improve performance by increasing the the clock rate or by reducing the number of clock cycles required for a program ● Hardware designer must often trade off clock rate against cycle count Chapter 1. of Computer Architecture. Measuring and understanding performance Dept. UMA. Sept 2013 36 .

Improving Performance Example A program runs on computer A with a 2 GHz clock in 10 seconds. CPU timeA = ------------------------------CPU clock cyclesA clock rateA CPU clock cyclesA = 10 sec x 2 x 109 cycles/sec = 20 x 109 cycles CPU timeB = ------------------------------1. UMA.2 times as many clock cycles as computer A to run the program.2 x 20 x 109 cycles clock rateB clock rateB = ------------------------------1. computer B will require 1. of Computer Architecture. What clock rate must computer B run at to run this program in 6 seconds? Unfortunately. Sept 2013 . Measuring and understanding performance 37 Dept.2 x 20 x 109 cycles = 4 GHz 6 seconds Chapter 1. to accomplish this.

Sept 2013 . UMA. of Computer Architecture.Clock Cycles per Instruction Not all instructions take the same amount of time to execute One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction # CPU clock cycles # Instructions Average clock cycles = for a program x for a program per instruction Clock cycles per instruction (CPI) – the average number of clock cycles each instruction takes to execute A way to compare two different implementations of the same ISA CPI CPI for this instruction class A B C 1 2 3 Chapter 1. Measuring and understanding performance 38 Dept.

I. of Computer Architecture. Computer A has a clock cycle time of 250 ps and an effective CPI of 2.0 x 250 ps = 500 x I ps CPU timeB = I x 1.2 for the same program. so CPU timeA = I x 2.= --------------------.2 x 500 ps = 600 x I ps Clearly.0 for some program and computer B has a clock cycle time of 500 ps and an effective CPI of 1. Which computer is faster and by how much? Each computer executes the same number of instructions. Measuring and understanding performance 39 Dept.= ---------------.2 performanceB execution_timeA 500 x I ps Chapter 1. A is faster … by the ratio of execution times performanceA execution_timeB 600 x I ps ------------------.Using the Performance Equation Computers A and B implement the same ISA. Sept 2013 .= 1. UMA.

of Computer Architecture.Effective (Average) CPI Computing the overall effective CPI is done by looking at the different types of instructions and their individual cycle counts and averaging n Overall effective CPI = Σ (CPIi x ICi) i=1 Where ICi is the count (percentage) of the number of instructions of class i executed CPIi is the (average) number of clock cycles per instruction for that instruction class n is the number of instruction classes The overall effective CPI varies by instruction mix – a measure of the dynamic frequency of instructions across one or many programs Chapter 1. UMA. Measuring and understanding performance 40 Dept. Sept 2013 .

THE Performance Equation Our basic performance equation is then CPU time = Instruction_count x CPI x clock_cycle or CPU time = Instruction_count x CPI ----------------------------------------------clock_rate These equations separate the three key factors that affect performance Can measure the CPU execution time by running the program The clock rate is usually given Can measure overall instruction count by using profilers/ simulators without knowing all of the implementation details CPI varies by instruction type and ISA implementation for which we must know the implementation details Chapter 1. UMA. Measuring and understanding performance 41 Dept. of Computer Architecture. Sept 2013 .

Measuring and understanding performance clock_cycle X 42 Dept.Determinates of CPU Performance CPU time = Instruction_count x CPI x clock_cycle Instruction_ count CPI X X X X X X X X X X X Algorithm Programming language Compiler ISA Core organization Technology Chapter 1. UMA. Sept 2013 . of Computer Architecture.

6 means 37.A Simple Example Op Freq CPIi Freq x CPIi ALU 50% 1 .2/1.2 1.95 x IC x CC so 2.6 2.95 Σ= How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? CPU time new = 1.4 .8% faster Chapter 1.2 .3 .3 .5% faster How does this compare with using branch prediction to shave a cycle off the branch time? CPU time new = 2.2/2. of Computer Architecture.0 means 10% faster What if two ALU instructions could be executed at once? CPU time new = 1.5 .3 Branch 20% 2 .0 1.0 1.5 .0 .95 means 12. UMA.0 Store 10% 3 .5 .6 x IC x CC so 2.4 .0 x IC x CC so 2.25 Load 20% 5 1. Measuring and understanding performance 43 Dept.2/1.3 . Sept 2013 .4 2.4 1.

the processor organization. Chapter 1. and compilation techniques. at what cost? Can it be programmed? Ease of compilation? Static Metrics: How many bytes does the program occupy in memory? Dynamic Metrics: How many instructions are executed? How many bytes does the processor fetch to execute the program? CPI How many clocks are required per instruction? How "lean" a clock is practical? Best Metric: Time to execute the program! depends on the instructions set. in how long. UMA. Count Cycle Time Dept. Sept 2013 . Measuring and understanding performance 44 Inst. of Computer Architecture.Summary: Evaluating ISAs Design-time metrics: Can it be implemented.

of Computer Architecture. UMA. Sept 2013 .Differences in ISAs between computers . Measuring and understanding performance 45 Dept.Warning: MIPS as a Performance Metric MIPS: Millions of Instructions Per Second Doesn’t account for .Differences in complexity between instructions MIPS = = Instruction count Execution time × 10 6 Instruction count Clock rate = 6 Instruction count × CPI CPI × 10 6 × 10 Clock rate CPI varies between programs on a given CPU Chapter 1.

org There are also benchmark collections for power workloads (SPECpower_ssj2008). of Computer Architecture.Workloads and Benchmarks Benchmarks – a set of programs that form a “workload” specifically chosen to measure performance SPEC (System Performance Evaluation Cooperative) creates standard sets of benchmarks starting with SPEC89. UMA.spec. Sept 2013 . The latest is SPEC CPU2006 which consists of 12 integer benchmarks (CINT2006) and 17 floatingpoint benchmarks (CFP2006). … Chapter 1. www. for mail workloads (SPECmail2008). for multimedia workloads (mediabench). Measuring and understanding performance 46 Dept.

Web.SPEC CPU Benchmark Programs used to measure performance Supposedly typical of actual workload Standard Performance Evaluation Corp (SPEC) Develops benchmarks for CPU. so focuses on CPU performance Normalize relative to reference machine Summarize as geometric mean of performance ratios . UMA. Sept 2013 . … SPEC CPU2006 Elapsed time to execute a selection of programs .CINT2006 (integer) and CFP2006 (floating-point) n n ∏ Execution time ratio i=1 Chapter 1.Negligible I/O. Measuring and understanding performance 48 i Dept. of Computer Architecture. I/O.

specific computer configuration (clock rate. input set used.Comparing and Summarizing Performance How do we summarize the performance for benchmark set with a single number? First the execution times are normalized giving the “SPEC ratio” (bigger is faster. Measuring and understanding performance 49 Dept. cache sizes and speed. compiler settings.)) Chapter 1. Sept 2013 . of Computer Architecture. etc. SPEC ratio is the inverse of execution time) The SPEC ratios are then “averaged” using the geometric mean (GM) n GM = n Π SPEC ratioi i=1 Guiding principle in reporting performance measurements is reproducibility – list everything another experimenter would need to duplicate the experiment (version of the operating system. UMA..e. memory size and speed. i.

1 mcf Combinatorial optimization 336 10.020 9.61 0.40 721 10.40 690 6.900 6.130 22.40 993 22.40 817 9.047 20.3 omnetpp Discrete event simulation 587 2.3 bzip2 Block-sorting compression 2.050 1.40 1.40 637 9.80 0.120 6.96 0.389 0.330 10.082 1.8 gcc GNU C Compiler 1.47 24 8.8 h264avc Video compression 3.40 1.85 0.783 0.250 9.345 9.79 0.8 go Go game (AI) 1.102 0.40 890 9.00 0.490 14.720 19.658 1.777 15.CINT2006 for Opteron X4 2356 Name Description IC×10 CPI Tc (ns) Exec time Ref time SPECratio 9 perl Interpreted string processing 2.40 1.1 xalancbmk XML parsing 1.1 astar Games/path finding 1.70 0.050 11.48 37 12. of Computer Architecture.0 Geometric mean 11.5 libquantum Quantum computer simulation 1.09 0.623 1. UMA.6 hmmer Search gene sequence 2.40 773 7.118 0.94 0.100 14.7 High cache miss rates Chapter 1. Sept 2013 .143 6.650 11.80 0.5 sjeng Chess game (AI) 2.75 0.176 0. Measuring and understanding performance 51 Dept.72 0.058 2.

Sept 2013 . Measuring and understanding performance 54 Dept.Other Performance Metrics Power consumption – especially in the embedded market where battery life is important For power-limited applications. the most important metric is energy efficiency Chapter 1. UMA. of Computer Architecture.

Power Trends In CMOS IC technology Power = Capacitive load × Voltage 2 × Frequency ×30 Chapter 1. Measuring and understanding performance 5V → 1V 55 ×1000 Dept. of Computer Architecture. UMA. Sept 2013 .

of Computer Architecture.85 × (Vold × 0. Measuring and understanding performance 56 Dept.85 4 = = 0. UMA. Sept 2013 .52 2 Pold Cold × Vold × Fold The power wall We can’t reduce voltage further We can’t remove more heat How else can we improve performance? Chapter 1.Reducing Power Suppose a new CPU has 85% of capacitive load of old CPU 15% voltage and 15% frequency reduction Pnew Cold × 0.85) 2 × Fold × 0.85 = 0.

Measuring and understanding performance 57 Dept. Sept 2013 . of Computer Architecture. UMA.Warning: Low Power at Idle X4 power benchmark At 100% load: 295W At 50% load: 246W (83%) At 10% load: 180W (61%) Google data center Mostly operates at 10% – 50% load At 100% load less than 1% of the time Consider designing processors to make power proportional to load Chapter 1.

UMA.Amdahl’s Law Pitfall: Improving an aspect of a computer and expecting a proportional improvement in overall performance TCPU _ org 1 S= = TCPU _ imp Fm + (1− Fm) Sm Fm = Fraction of improvement Sm = Factor of improvement The opportunity for improvement is affected by how much time the modified event consumes Corollary 1: make the common case fast The performance enhancement possible with a given improvement is limited by the amount the improved feature is used Corollary 2: the bottleneck will limit the improv. Chapter 1. Measuring and understanding performance 58 Dept. Sept 2013 . of Computer Architecture.

Sept 2013 . Measuring and understanding performance 59 1 S= = Sm 1 / Sm+ (1−1) S= 1 0 + (1− Fm) Dept. UMA. of Computer Architecture.Amdahl’s Law Example: multiply accounts for 80s/100s How much improvement in multiply performance to get 5× overall? 80 Can’t be done! 20 = + 20 n Study the limit cases in the Amdahl law Fm=0 Sm=1 1 S= =1 0 + (1) Fm=1 1 S= = 1 Sm=inf Fm+ (1− Fm) Chapter 1.

Tendency: the switch to multiprocessors Uniprocessor Performance Constrained by power. Measuring and understanding performance 60 Dept. of Computer Architecture. UMA. memory latency Chapter 1. Sept 2013 . instruction-level parallelism.

Why Multicore improves performance Multicore microprocessors More than one processor per chip Requires explicitly parallel programming Compare with instruction level parallelism .Hidden from the programmer Hard to do . of Computer Architecture.Hardware executes multiple instructions at once . UMA. Sept 2013 . Measuring and understanding performance 61 Dept.Programming for performance .Load balancing .Optimizing communication and synchronization Chapter 1.

UMA.Manufacturing ICs Yield: proportion of working dies per wafer Chapter 1. of Computer Architecture. Sept 2013 . Measuring and understanding performance 62 Dept.

AMD Opteron X2 Wafer X2: 300mm wafer. of Computer Architecture. UMA. Sept 2013 . Measuring and understanding performance 63 Dept. 117 chips. 90nm technology X4: 45nm technology Chapter 1.

UMA. Measuring and understanding performance 64 Dept.Integrated Circuit Cost Cost per wafer Cost per die = Dies per wafer × Yield Dies per wafer ≈ Wafer area Die area 1 Yield = (1+ (Defects per area × Die area/2))2 Nonlinear relation to area and defect rate Wafer cost and area are fixed Defect rate determined by manufacturing process Die area determined by architecture and circuit design Chapter 1. of Computer Architecture. Sept 2013 .

Concluding Remarks Cost/performance is improving Due to underlying technology development Hierarchical layers of abstraction In both hardware and software Instruction set architecture The hardware/software interface Execution time: the best performance measure Power is a limiting factor Use parallelism to improve performance Chapter 1. UMA. Measuring and understanding performance 65 Dept. of Computer Architecture. Sept 2013 .