You are on page 1of 993
TRADEMARKS “The following trademarks are the property of the following organizations: “TeX isa trademark of Americal Mathematical Society. ‘Apple Il and Macintosh are trademarks of Apple Computes Inc. ‘CDC 6400, CDC 7500, CDC STAR-100, CYBER-I80, CYBER: 180/980, and CYBER.205 are trademarks of Control Data Corpora- tion. “The Cosmic Cube isa trademark of California Instiate of Technol- ony. (CP3100 i trademark of Conner Peripherals Cray, CRAY-1, CRAY J90, CRAY 190, CRAY X-MP/416, and CRAY Y-MP are trademarks of Cray Research, ‘Alpha, AiphaServer, AlpsStation, DEC, DECéystem, DECeystem 3100, DECstation, PDP“, PDP-11, Unibus, VAX, VAX 8700, and ‘VAXI1/780 ae trademarks of Digital Equipment Corporation. [MP2361A, Super Eagle, VP100,VP200, and VPP200 are trademarks ‘of Fujitsu Corporation. ‘Gru C Compiler is a trademark of Free Software Foundation. ‘Goodyear MPP isa trademark of Goclyear Tire and Rubber Co., Ine. ‘Apollo DN 300, Apollo DN 10000, Convex, HP, HP Precision ‘Architecture, HPPA, HP8S0, HP 3000, HP 300/70, PA-RISC, and Precision are registered trademarks of Hewlet-Packard Company. 482, 960CA. 4004, 8008, 8080, 8086, 887, 888, $0186, 80286, 80386, ‘8486, Dela, iAP 452, 1860, Ine, Intel, Intel Hypercube, SC/2, MMX, Multibus, Multibus Il, Paragon, and Pentium are trademarks of Intel Corporation. Intel Inside is a registered trade- ‘mark of Intl Corporation, 340, 360/20, 360/40, 360/80, 360/65, 360/85 360/91,370, 370/158, ‘370/165, 370/168, 370-XA, ESA/370 701, 74, 708,801, 3033, 3080, ‘3080 series, 3080 VF, 3081, 3080, 3090/00, 3050/200, 090/400, 080/600 3090/60S, 3080 VE, 330, 338, 33800, 3380 Disk Model 'AKS, 338, 3990, 3880-23, 3990, 709, 7054, IBM, IBM PC, IBM PC- AT,IBMSVS, ISAM, MVS, PLS, Power?C, POWERstation, RT-PC, RAMAC, RS/6000, Sage, Stretch, System /360, Vector Fality, and VM are trademarks of International Business Machines Corpore- tion. POWERserver, RISC System/4000, and SP2 are registered trodemarks of International Business Machines Corporation. ICL DAPisa trademark of International Computers Limited. Inmos and Transputer are trademaks of mos FtureBus isa trademark ofthe institute of Electrical and Hlectron- ie Engineers KSR-1 isa trademark of Kendall Square Research MASPAR MP-1 and MASPAR MP2 are trademarks of MasPar Corporation. MIPS, R2000, R3000, and R10Q00 are registered trademarks of MIPS Technology, Inc Windows is trademark of Microsoft Corporation. [NuBus isa trademark of Massachusetts Institute of Technology. Delta Series S608, System V/88 R3ZVI, VME bus, 6808, 68000, (68010, e120, e830, GBS, 68852, $900, 89000 1.84mt4, 88100, ‘and 88200 are trademarks of Motorola Corporation, "Neubeand nCube/ten are trademarks of Neube Corporation. [NEC is registered trademark of NEC Corporation. [Network Computer i trademark of Oracle Corporation Parsytec OC isa trademark of Parsytec Inc Imprimis, P12, Sabre, Sabre 97208, Seagate, and Wren 1V are ‘trademarks of Seagate Technology, I: NUMA. Sequent, and Symmetry are trademarks of Sequent ‘Computers Power Challenge, Silicon Graphics, Silicon Graphics 43/240, Sion Graphics 4D/60, Silicon Graphics 4D/240, and Silicon Graphics 4D Series ae trademark of Silicon Graphics. Origin2000 isa registered trademark of Silicon Graphics. ‘SPEC i registered trademark ofthe Standard Performance Eval- ‘ution Corporation ‘Spice ia trademark of University of Califia at Berkey. Enterprise Java, Sun, Sun Ultra, Sun Microsystems, and Ura are trademarks of Sun Microsystems, Inc. SPARC and UltraSPARC te registered trademarks of SPARC International, Inc, license to ‘Sen Microsystems, Ine ‘Connection Machine, CM-2, and CMS are trademarks of Thinking Machines ‘Burroughts 6500, B00), 85500, D-machine, UNIVAC, UNIVAC 1, and UNIVAC 1103 are trademarks of UNISYS. {Aito, PARC, Palo Alto Research Center, and Xerox ae trademarks of Xerox Corporation, ‘The UNIX trademark is Heensed exclusively through X/Open Company Lid [Allother product names ae trademarks or registered trademarks oftheir respective companies. Where trademarks appear in this tookand Morgan Kaufmann Publishers was aware of trademark ‘aim, the trademarks have been printed in inital caps oral caps. SEC OND Om mn + 1% Computer Organization and Design THE HARDWARE/SOFTWARE INTERFACE John L. Hennessy Stanford University David A. Patterson University of California, Berkeley With a contribution by James R. Larus University of Wisconsin r ™ <4 Morgan Kaufmann Publishers, Inc. San Francisco, California ‘Sponsoring Editor Denise Penrose Production Manager Yonie Overton Production Editor Julie Pabst Editorial Coordinator Jane Eliott ‘Text and Cover Design Ross Carron Design IMustration Alexander Teshin Associates, with second edition modification by Dartmouth, Publishing Inc. chapter Opener illustrations Canary Studios Copyeditor Ken Dellaenta Composition Nancy Logan Proofreader Jennifer McClain Indexer Steve Rath Printer Courier Corporation ‘Morgan Kaufmann Publishers, Inc. Fditorial and Sales Office: 340 Pine Street, Sixth Floor ‘San Francisco, CA 94104-3205 USA, ‘Telephone 415/392-2665 Facsimile 415/982-2665 Email mp@mkp com WWW htip/oo.mkp com Order tll fee 800/745-7323 (© 1998 by Morgan Kaufmann Publishers, Inc Alrghts reserved Printed in the United States of America anow 5432 [No pat ofthis publication may be reproduced, stored in a retrieval system, o transmitted in any form, fr by any means-—electroni, mechanical, photocopying, recording, or otherwise—without the prior ‘written permission ofthe publisher. |Advice, Praise, and Errors: Any correspondence related to ths publication or intended forthe authors should be sent electronically toco2bugs@nkp.com. Information regarding error sightings is encouraged. ‘Any error sightings that are accepted for correction in subsequent printings will be rewarded by the ‘authors with a payment of $1.00 (U.S.) per correction atthe time oftheir implementation ina reprint. Library of Congress Cataloging-in-Publication Data Patterson, David A. ‘Computer organization and design: the hardware/software interface 7 David A. Patterson, John L. Hennessy —2nd ed. \ | | Ince! ilingzaphicl references and index ISBN 1 Sate (cloth) ISBN 185860491 (paper) 1 Compute organisation. 2. Computer-—Desigh and Computer interac: "I Henney, John Tle Qarescente ser our dean 9716050 TO LINDA AND ANDREA Foreword by John H. Crawford Intel Fellow, Director of Microprocessor Architecture Intel Corporation, Santa Clara, California Computer design is an exciting and competitive discipline. The microproces- sor industry is on a treadmill where we double microprocessor performance every 18 months and double microprocessor complexity—measured by the number of transistors per chip—every 24 months. This unprecedented rate of change has been evident for the entire 25-year history of the microprocessor, and it promises to continue for many years to come as the creativity and ‘energy of many people are harnessed to drive innovation ahead in spite of the challenge of ever-smaller dimensions. This book trains the student with the concepts needed to lay a solid foundation for joining this exciting field. More importantly, this book provides a framework for thinking about computer organization and design that will enable the reader to continue the lifetime of learning necessary for staying at the forefront of this competitive discipline. ‘The text focuses on the boundary between hardware and software and ex- plores the levels of hardware in the vicinity of this boundary. This boundary is captured in a computer's architecture specification. It is a critical boundary for a successful computer product: an architect must define an interface that can be efficiently implemented by hardware and efficiently targeted by com- pilers. The interface must be able to retain these efficiencies for many genera~ tions of hardware and compiler technology, much of which will be unknown at the time the architecture is specified. This boundary is central to the disci- pline of computer design: it is where compilation (in software) ends and inter- pretation (in hardware) begins. This book builds on introductory programming skills to introduce the con- cepts of assembly language programming and the tools needed for this task: the assembler, linker, and loader. Once these prerequisites are completed, the remainder of the book explores the first few levels of hardware below the ar- chitectural interface. The basic concepts are motivated and introduced with clear and intuitive examples, then elaborated into the “real stuff” used in to- day’s modem microprocessors. For example, doing the laundry is used as an analogy in Chapter 6 to explain the basic concepts of pipelining, a key tech- nique used in all modern computers. In Chapter 4, algorithms for the basic Foreword vil floating-point arithmetic operators such as addition, multiplication, and divi- sion are first explained in decimal, then in binary, and finally they are elabo- ated into the best-known methods used for high-speed arithmetic in today’s ‘computers. New to this edition are sections in each chapter entitled “Real Stuff.” These sections describe how the concepts from the chapter are implemented in com- mercially successful products. These provide relevant, tangible examples of the concepts and reinforce their importance. As an example, the Real Stuff in Chapter 6, Enhancing Performance with Pipelining, provides an overview of a dynamically scheduled pipeline as implemented in both the IBM/Motorola PowerPC 604 and Intel’s Pentium Pro microprocessor. The history of computing is woven as a thread throughout the book to re- ward the reader with a glimpse of key successes from the brief history of this young discipline. The other side of history is reported in the Fallacies and Pit- falls section of each chapter. Since we can learn more from failure than from success, these sections provide a wealth of learning! The authors are two of the most admired teachers, researchers, and practi- tioners of the art of computer design today. John Hennessy has straddled both sides of the hardware/software boundary, providing technical leadership for the legendary MIPS compiler as well as the MIPS hardware products through many generations. David Patterson was one of the original RISC proponents: he coined the acronym RISC, evangelized the case for RISC, and served as a key consultant on Sun Microsystem’s SPARC line of processors. Continuing talent for marketable acronyms, his next breakthrough was RAID (Redun- dant Arrays of Inexpensive Disks), which revolutionized the disk storage in- dustry for large data servers, and then NOW (Networks of Workstations). Like other great “software” products, this second edition went through an extensive beta testing program: 13 beta sites tested the draft manuscript in classes to “debug” the text. Changes from this testing have been incorporated into the “production” version. Patterson and Hennessy have succeeded in taking the first edition of their excellent introductory textbook on computer design and making it even better. This edition retains all of the good points of the original, yet adds significant new content and some minor enhancements. What results is an outstanding in- troduction to the exciting Contents Foreword vi by John H. Crawford Worked Examples xii Computer Organization and Design Online wi Preface xix CHAPTERS B Computer Abstractions and Technology 2 1.1 Introduction 3 1.2 Below Your Program 5 4.3 Under the Covers 10 4.4 Integrated Circuits: Fueling Innovation 21 4.5 Real Stuff: Manufacturing Pentium Chips 24 1.6 Fallacies and Pitfalls 29 1.7. Concluding Remarks 30 18 Historical Perspective and Further Reading 32 1.9 Key Terms 44 1.10 Exercises 45 B The Role of Performance 52 2.4 Introduction 54 2.2 Measuring Performance 58 2.3 Relating the Metrics 60 2.4 Choosing Programs to Evaluate Performance 66 2.5 Comparing and Summarizing Performance 69 2.6 Real Stuff: The SPEC95 Benchmarks and Performance of Recent Processors 71 Fallacies and Pitfalls 75 Concluding Remarks 52 27 Contents: ix 2.9 Historical Perspective and Further Reading &3 2.10 Key Terms 39 2.14 Exercises 90 B Instructions: Language of the Machine 10: 3.1. Introduction 106 3.2 Operations of the Computer Hardware 107 3.3 Operands of the Computer Hardware 109 3.4 Representing Instructions in the Computer 116 3.5 Instruetions for Making Decisions 122 Supporting Procedures in Computer Hardware 132 Beyond Numbers 142 Other Styles of MIPS Addressing 145 Starting. 156 An Example to Put It All Together 163 Arrays versus Pointers 171 3.12 Real Stuff: PowerPC and 80x86 Instructions 175 3.13 Fallacies and Pitfalls 185 3.14 Concluding Remarks 187 3.15 Historical Perspective and Further Reading 159 3.16 Key Terms 196 3.47 Exercises 196 Gg Arithmetic for Computers 205 4.4 Introduction 210 4.2. Signed and Unsigned Number 4.3 Addition and Subtraction 220 4.4 Logical Operations 225 4.5 Constructing an Arithmetic Logic Unit 230 4.6 Multiplication 250 4.7 Division 265 4.8 Floating Point 275 4.9 Real Stuff: Floating Point in the PowerPC and 80x86 301 4.10 Fallacies and Pitfalls 304 4.11 Concluding Remarks 308 4.12 Historical Perspective and Further Reading 312 4.13 Key Terms 322 4.14 Exercises 322 210 B The Processor: Datapath and Control 336 5.1 Introduction 338 5.2 Building a Datapath 343 5.3 A Simple Implementation Scheme 351 OC EE x Contents ‘A Multicycle implementation 377 Microprogramming: Simplifying Control Design 399 Exceptions 410 | Real Stuff: The Pentium Pro Implementation 416 Fallacies and Pitfalls 419 5.9 Concluding Remarks 421 5.20 Historical Porspective and Further Reading 423 5.1 Key Terms 426 8.12 Exercises 427 Gg Enhancing Performance with Pipelining 434 6.1 An Overview of Pipelining 436 Pipelined Datapath 449 Pipelined Control 466 Data Hazards and Forwarding 476 Data Hazards and Stalls 459 Branch Hazards 496 Exceptions 505 ‘Superscalar and Dynamic Pipelining 510 Real Stuff: PowerPC 604 and Pentium Pro Pipelines 517 Fallacies and Pitfalls 520 Concluding Remarks 521 Historical Perspective and Further Reading 525 Key Terms 529 Exercises 529 Large and Fast: Exploiting Memory Hierarchy 535 7.1 Introduction 540 7.2 The Basics of Caches 545 7.3 Measuring and Improving Cache Performance 564 74 Virtual Memory 579 7.5 ACommon Framework for Memory Hierarchies 603 7.6 Real Stuff: The Pentium Pro and PowerPC 604 Memory Hierarchies 611 7.7 Fallacies and Pitfalls 615, 7.8 Concluding Remarks 618 Historical Perspective and Further Reading 621 7.10 Key Terms 627 TAL Exercises 628 a Interfacing Processors and Peripherals 636 8.1 Introduction 638 8.2 1/0 Performance Measures: Some Examples from Disk and File Systems 641 8.3 Types and Characteristics of 1/0 Devices 644 Contents xi 8.4 Buses: Connecting 1/0 Devices to Processor and Memory 655, 8.5 Interfacing 1/0 Devices to the Memory, Processor, and Operating ystem 67: Designing an 1/0 System 654 Real Stuff: A Typical Desktop 1/0 System 687 Fallacies and Pitfalls 638 1.9 Coneluding Remarks 69) -10 Historical Perspective and Further Reading 694 .11 Key Terms 700 12 Exercises 700 i Multiprocessors 710 9.4 Introduction 712 9.2 Programming Multiprocessors 714 9.3 Multiprocessors Connected by a Single Bus 717 4 Multiprocessors Connected by a Network 727 Clusters 734 Network Topologies 736 Real Stuff: Future Directions for Multiprocessors 740 Fallacies and Pitfalls 743 9.9 Concluding Remarks—Evolution versus Revolution in Computer Architecture 746 9.10 Historical Perspective and Further Reading 748, 9.11 Key Terms 756 9.42 Exercises 756 APPENDICES g Assemblers, Linkers, and the SPIM Simulator 4.2 by James R. Larus, University of Wisconsin A.1 Introduction A-3 A2 Assemblers A-10 A.3 Linkers A-17 A.4 Loading A-19 AS Memory Usage A-20 6 Procedure Call Convention A-22 A.7_ Exceptions and Interrupts A-3 A.8 Input and Output A-36 A9 SPIM A-38 A.10 MIPS R2000 Assembly Language A-i9 A.11 Concluding Remarks | A-75 A.12 Key Terms A-76 A.13 Exercises A-76 (eee aa nasanesaaaatae Gi The Basics of Logic Design 5-2 Introduction 5-3 Gates, Truth Tables, and Logic Equations B-4 Combinational Logic 8-8 Clocks 5-18 Momory Elements 8-21 Finite State Machines 5-35 Timing Methodologies 5-39 Concluding Remarks 5-14 Key Terms 5-45 B.10 Exercises 5-15 G Mapping Control to Hardware C2 Translating a Microprogram to Hardware | C-25 Concluding Remarks C-31 Key Terms C-32 Exercises C32 Glossary Gi Index xii Worked Examples Chapter 2: The Role of Performance ‘Throughput and Response Time 56 Relative Performance 57 Improving Performance 60 Using the Performance Equation 62 Comparing Code Segments 64 S as a Performance Measure 78 Chapter 3: Instructions: Language of the Machine Compiling Two C Assignment Statements into MIPS 108 ‘Compiling a Complex C Assignment into MIPS 109 ‘Compiling a C Assignment Using Registers 110 ‘Compiling an Assignment When an Operand Is in Memory 112 ‘Compiling Using Load and Store 113 Compiling Using a Variable Array Index 114 Translating a MIPS Assembly Instruction into a Machine Instruction 117 ‘Translating MIPS Assembly Language into Machine Language 119 ‘Compiling an IfStatement into a Conditional Branch 123 Compiling ifthen-else into Conditional Branches 124 ‘Compiling a Loop with Variable Array Index. 126 Compiling a while Loop 127 Compiling a Less Than Test 128 Compiling a switch Statement by Using a Jump Address Table 129 Compiling a Procedure that Doesn’t Call Another Procedure 134 ‘Compiling a Recursive Procedure, Showing Nested Procedure Linking 136 Compiling a String Copy Procedure, Showing How to Use C Strings 143 ‘Translating Assembly Constants into Machine Language 145 Loading a 32-Bit Constant 147 Showing Branch Offset in Machine Language 149 Branching Far Away 150 Decoding Machine Code 154 Linking Object Files 160 ‘Compiling an Assignment Statement into Accumulator Instructions 190 ‘Compiling an Assignment Statement into Memory-Memory Instructions 192 ‘Compiling an Assignment Statement into Stack Instructions 193 Chapter 4: Arithmetic for Computers. ASCII versus Binary Numbers 212 Binary to Decimal Conversion 214 Signed versus Unsigned Comparison 215 Negation Shortcut 216 Sign Extension Shortcut 217 Binary-to-Hexadecimal Shortcut 218 Qi Worked Examples Binary Addition and Subtraction 220 | CBit Fields 29 ‘Both Levels ofthe Propagate and Generate 247 Speed of Ripple Carry versus Carry Lookahead 248 First Multiply Algorithm 253 ‘Second Multiply Algorithm 256 ‘Third Multiply Algorithm 257 Booth’s Algorithm 261 ‘Multiply by 2i via Shift 262 First Divide Algorithm 268 ‘Third Divide Algorithm 271 Floating-Point Representation 279 Converting Binary to Decimal Floating Point 280 Decimal Floating-Point Addition 282 ‘Decimal Floating-Point Multiplication 287 1g Floating-Point C Program into MIPS Assembly Code 293 Floating-Point C Procedure with Two-Dimensional Matrices into MIPS 294 Rounding with Guard Digits 297 Chapter 5: The Processor: Datapath and Control ‘Composing Datapaths 351 Implementing Jumps 370 Performance of Single-Cycle Machines 373 Performance of a Single-Cycle CPU with Floating-Point Instructions 375. CPLin a Multicyele CPU 397 Chapter 6: Enhancing Performance with Pipelining ‘Single-Cycle versus Pipelined Performance 438 Stall on Branch Performance 442 Forwarding with Two Instructions 446 Reordering Code to Avoid Pipeline Stalls 447 Labeled Pipeline Execution, Including Control 471 Dependency Detection 479 Forwarding, 485 Pipelined Branch 498 ‘Loops and Prediction 501 ‘Comparing Performance of Several Control Schemes 504 Exception ina Pipelined Computer 507 Simple Superscalar Code Scheduling 513 Loop Unrolling for Superscalar Pipelines 513, Chapter 7: Large and Fast: Exploiting Memory Hierarchy Bits in a Cache 550 Mapping an Address to a Multiword Cache Block 556 CCaleulating Cache Performance 565, Cache Performance with Increased Clock Rate 567 Associativity in Caches 571 Size of Tags versus Set Associativity 575 Performance of Multilevel Caches 576 (Overall Operation of a Memory Hierarchy 595 Chapter 8: Interfacing Processors and Peripherals. Impact of /O on System Performance 639 Disk Read Time 648 Worked Examples xv Performance of Two Networks 654 FSM Control for 1/662 Performance Analysis of Synchronous versus Asynchronous Buses 662 Performance Analysis of Two Bus Schemes 665 ‘Overhead of Polling in an I/O System 676 Overhead of Interrupt-Driven I/O 679 Overhead of 1/0 Using DMA 681 1/O System Design 685 Chapter 9: Multiprocessors Speedup Challenge 715 Speedup Challenge, Bigger Problem 716 Parallel Program (Single Bus) 718 Parallel Program (Message Passing) 729 Appendix A: Assemblers, Linkers, and the SPIM Simulator Local and Global Labels. A-11 String Directive A-15 Macros A-15 Stack in Recursive Procedure A-28 Interrupt Handler A-34 Appendix B: The Basics of Logic Design Truth Tables BS Logic Equations B-6 Sum of Products B-I PLAs Ba3 Don’t Cares B-16 Appendix C: Mapping Control to Hardware Logic Equations for Next-State Outputs C-12 Control ROM Entries C- Computer Organization and Design Online [Allof the following resources are available at http://www.mkp.com|cod2e him. Web Extensions ‘These materials are extensions of the book's content. Web Extension I: Survey of RISC Architectures Supplies current detailed information for several RISC architectures. = Desktop RISC Architectures (Alpha, PA-RISC, MIPS, PowerPC, SPARC) m= Embedded RISC Architectures (ARM, Hitachi SH4, MIT M32R, MIPS 16, Thumb) Web Extension II: Introducing C to Pascal Programmers Provides Pascal programmers with a quick reference for understanding the C code in the text. Variable Declarations ™ Assignment Statements m Relational Expressions and Conditional Statements = Loops = Examples to Put It All Together @ Exercises Web Extension Architecture—VAX Another Approach to Instruction Set Presents an example of a CISC computer architecture for comparison with the MIPS architecture described in the text. ™ VAX Operands and Addressing Modes = Encoding VAX Instructions m= VAX Operations = An Example to Put it All Together: swap OES (,._ Computer Organization and Design Online xvii @ A Longer Example: sort i Fallacies and Pitfalls @ Historical Perspective and Further Reading m Exercises Supplements This electronic support package includes files that can be viewed and down- loaded in a number of formats. Lecture slides Electronic versions of text figures Instructors Manual Links to course home pages from selected schools Instructions for using new DOS and Windows versions of PCspim simu- lators Links to SPIM simulators (see page xviii) Resources Multiprocessors Page Extends Chapter 9's coverage of real machines by providing links to companies that manufacture current multiprocessor machines, Discussion Group Provides readers with the opportunity to exchange ideas and informa- tion related to the book. The SPIM Simulator Developed by James R. Larus, the SPIM $20 is a software simulator that runs assembly language programs for the MIPS R2000/R3000 RISC computers. It can read and run MIPS a.out files (when compiled and running on a system containing a MIPS processor). SPIM is a self-con- tained system that contains a debugger and an interface to the operating system. SPIMis portable; it has run on a DECStation 31000/51000, Sun 3, Sun 4, PC/RT, IBM RS/6000, HP Bobcat, HP Snake, and Sequent. Students can generate code for a simple, clean, orthogonal computer, regardless of the machine used. SPIM comes with complete source code and documenta- tion of all instructions. SPIM can be downloaded in versions for DOS, Windows, and UNIX, either from www.mkp.com/cod2e,htm or by direct ftp. i ee ea) Computer Organization and Design Online Retrieval of SPIM by ftp SPIM is available for anonymous ftp from pubjspim|spim.tar.Z (this is a compressed tar file). For those who are unfamiliar with command-line anonymous ftp, here are the steps to follow to get a copy of your preferred version of SPIM. 1. ftp to fip.cs.wiscedu from your computer: fipeswiscedu in the file % ftp ftp.cs.wisc.edu 2. The ftp server will respond and ask you to log in. Log in as anonymous and use your email address as a password: Name (ftp.cs.wisc.edu:larus): anonymous 331 Guest login ok, send login or email address as password Password: 3, The server will then print a welcome message. Change to the directory containing spim: ftp> cd pub/spim 4, Set binary mode for the transfer (since the file is compressed): ftp> binary 5. Choose the file appropriate for your machine and copy: ftp> get spim.tar.Z (UNDO) ftp> get PCspim.zip (Windows) ftp> get PCspim-dos.zip (DOS) 6. Exit the ftp program: ftp> quit 7. Uncompress and untar the file: % uncompress spim.tar.Z % tar xvf spim.tar If the uncompression fails, you probably for i got to set binary (step 4). Tr again. There are directions in the file README. Ro eiee Preface ‘The most beautiful thing we can experience is the mysterious. It is the source of all true art and science. Albert Einstein, What I Believe, 1930 About This Book We believe that learning in computer science and engineering should reflect the current state of the field, as well as introduce the principles that are shap- ing computing. We also feel that readers in every specialty of computing need to appreciate the organizational paradigms that determine the capabilities, performance, and, ultimately, the success of computer systems. Modern computer technology requires professionals of every computing specialty to understand both hardware and software. The interaction between hardware and software ata variety of levels also offers a framework for under- standing the fundamentals of computing. Whether your primary interest is computer science or electrical engineering, the central ideas in computer orga- nization and design are the same. Thus, our emphasis in this book is to show the relationship between hardware and software and to focus on the concepts that are the basis for current computers Traditionally, the competing influences of assembly language, organiza tion, and design have encouraged books that consider each area as a distinct subset. In our view, such distinctions have increasingly lost meaning as com- puter technology has advanced. To truly understand the breadth of our field, it is important to understand the interdependencies among these topics. The audience for this book includes those with little experience in assembly language or logic design who need to understand basic computer organization as well as readers with backgrounds in assembly language and /or logic design who want to learn how to design a computer or understand how a system works and why it performs as it does. Changes for the Second Edition We had six major goals for the second edition: tie the ideas from the book more closely to the real world; enhance how well the book works for begii ners; extend the book material using the World Wide Web; improve qualit improve pedagogy; and finally, update the technical content to reflect changes Ee 2S Preface in the industry since the publication of the first edition in 1994—the conven- tional reason for a new edition. Tirst, tomake the examples in the book even more concrete and connected with the real world, in each chapter we explained how the ideas were realized in the latest microprocessors from Intel or from IBM/Motorola. Hence you can learn how the mechanisms discussed are used in the computer on your desk- top. Each chapler has a new section called “Real Stuff” that ties the ideas you read about to the machine you probably use everyday. ; ’ Second, we wanted the book to work better for readers interested in an overview of computer organization. Each chapter now has a list of the key terms discussed in the chapter, and we added a glossary of more than 300 def- initions. We also rely on analogies from everyday life to explain subtleties of computers: = commercial airplanes to show how performance differs if measured as bandwidth or latency the stealth of spies to explain procedure invocation and nesting plumbing to show how carry-lookahead logic works the laundry room to explain pipeline execution and hazards a desk ina library to demonstrate principles of memory hierarchy the management overhead as committees grow to illustrate the diffi- culty of achieving high performance in large-scale multiprocessors ‘More specifically, we added more assembly language programming examples and more explanation in each example to help the beginner understand assembly language programming in Chapters 3 and 4. We also added an introductory section to the pipelining chapter (Chapter 6) that allows under- standing of the important ideas and issues in pipeline design without having to delve into the details of a pipelined datapath and control. Our third goal was to go beyond the limitations of a printed book by adding descriptions and links on the World Wide Web. Throughout this book, you will often see the “Web Enhanced” icon shown at the left. Wherever this icon appears, you can go to http://www.mkp.com|cod2e.htm to find materials related to the text. ‘The WWW lets us give examples of recent, relevant machines so that you can see the latest versions of the ideas in the book. For example, we've added anew online appendix (Web Extension 1) comparing RISC architectures. Other examples include links for specific references in the book to other sites; instruc- tions on how to use PCspim, the new DOS and Windows versions of the SPIM simulator, as well as links to all the versions of SPIM; access to all the figures from the book; lecture slides; links to instructors’ home pages; and an online Instructors Manual. We also included some appendices from the first edition (Web Extensions II and III) that you may find valuable. We intend to update these pages periodically to make new and better links. ee Protace aod Fourth, we wanted to significantly reduce the flaws that creep into a book during the revision process. The first edition of the book used beta testing to see which ideas worked well and which did not, and we were very happy with the improvements asa result. We did the same with the second edition. To fur- ther reduce the chances of bugs in the book, we gave ourselves a longer devel- ‘opment cycle and involved many more computer architects in its preparation, First, Tod Amon completely revised all exercises, in part based on suggestions of exercises by a dozen instructors. The book now has 30% new exercises and another 30% that have been reworked for a total of 400. We believe that they are much more clearly worded than before and that there is sufficient variety for a broader group of students. Second, Kent Wilken carefully read the beta edition, suggesting hundreds of improvements. After we revised the beta edi- tion, George Adams gave another very careful read of our revision, again mak- ing hundreds of useful suggestions. Finally, we reviewed the copyedit and ead the page proof to try to catch mistakes that can creep in during the book production process. Although we are sure there must still be bugs for which you can get rewards, we believe this edition is far cleaner than the first The fifth goal was to improve the exposition of the ideas in the book, based on difficulties mentioned by readers of the first edition. We expanded the sec- tion of Chapter 3 explaining procedures, showing the procedure infrastructure in a longer sequence of examples. Chapter 4 has a longer description of carry lookahead and carry save adders. We simplified the explanation of the multi- cycle datapath in Chapter 5 by adding several registers. Chapter 6 actually got a good deal shorter by adding an overview section, since it allowed us to re- duce the number of examples in the detailed pipelining sections. We also made numerous changes in the pipeline diagrams to make them easier to under- stand and more consistent. Chapter 7 was reorganized to putall caches togeth- er before moving to virtual memory and then translation buffers, coming back to the commonalities at the end. We also changed the emphasis from virtual memory as simply another level of the hierarchy to the hardware enforcer of protection. Chapter 8 was refocused to be more quantitative and design orient- ed. Chapter 9 was completely rewritten and retitled, reflecting the dramatic change in the parallel processing industry since 1994. Finally, in the interval since the first edition of this book, a computer has run a program at the rate of 1 teraFLOPS—a trillion floating-point operations per second ora million floating-point operations per microsecond, another com- puter has played better chess than the best human being, and the whole world is more closely connected thanks to the World Wide Web. These events oc- curred in part because computer designers have first improved performance of a single computer by a factor of 100 in the last 10 years and then harnessed together many of them to achieve even greater performance. We have includ- ed descriptions of new ideas that helped make these miracles occur, such as Preface branch prediction and out-of-order execution in Chapter 6, multilevel and onblocking caches in Chapter 7, switched networks and new buses in Chap- ter 8, and nonuniform-memory-access, shared-memory multiprocessors and clusters in Chapter 9. Supplements and Web Extensions {A directory of the Web supplements, extensions, and resources appears on page xvi. In it you'll find a complete electronic supplements package, as well yee variety of materials and resources designed to support this text that you ‘can access on the publisher's World Wide Web site at www.mkp.com/ cod2e htm. Included in the supplements package is an online Instructors Manual. The Instructors Manual contents are available from the Web site with the excep- tion of the solutions. Instructors should contact the publisher directly to obtain access to solutions. If they prefer, instructors may choose a printed Instructors Manual that in- cludes chapter objectives, teaching hints, and critical points for each chapter as (well as solutions to the exercises. Instructors should contact the publisher di- rectly to obtain the printed Instructors Manual. Relationship to CA:AQA Some readers may be familiar with Computer Architecture: A Quantitative ‘Approach, Our motivation in writing that book was to describe the principles of computer architecture using solid engineering fundamentals and quantita- tive cost performance trade-offs. We used an approach that combined exam- ples and measurements, based on commercial systems, to create realistic design experiences. Our goal was to demonstrate that computer architecture could be learned using, scientific methodologies instead of a descriptive approach. ‘A majority of the readers for Computer Organization and Design: The Hard- cware/Software Interface do not plan to become computer architects. The perfor- mance of future software systems will be dramatically affected, however, by how well software designers understand the basic hardware techniques at work in a system. Thus, compiler writers, operating system designers, data- base programmers, and most other software engineers need a firm grounding in the principles presented in this book. Similarly, hardware designers must understand clearly the effects of their work on software applications. Thus, we knew that this book had to be much more than a subset of the ma- terial in Computer Architecture. We've approached every topic in a new way. Topics shared between the books were written anew for this effort, while many other topics are presented here for the first time. To further ensure the uniqueness of Computer Organization and Design, we exchanged the writing re sponsibilities we assigned to ourselves for Computer Architecture. The topics Preface xl that Hennessy covered in the first book were written by Patterson in this one, and vice versa. Several of our reviewers suggested that we call this book “Computer Organization: A Conceptual Approach” to emphasize the signifi- cant differences from our other book. It is our hope that the reader will find new insights in every section, as well as a more tractable introduction to the abstractions and principles at work in a modern computer. We were so happy with Computer Organization and Design that the second edition of Computer Architecture was revised to remove most of the introducto- ty material, hence the there is much less overlap today than with the first edi- tions of both books. Learning by Evolution It is tempting for authors to present the latest version of a hardware concept and spend considerable time explaining how these often sophisticated ideas work. We decided instead to present each idea from its first principles, emphasizing the simplest version of an idea, how it works, and how it came to be. We believe that presenting the fundamental concepts first offers greater insight into why machines look the way they do today, as well as how they might evolve as technology changes. To facilitate this approach, we have based the book upon the MIPS proces- sor. It offers an easy-to-understand instruction set and can be implemented in a simple, straightforward manner. This allows readers to grasp an entire ma- chine organization and to follow exactly how the machine implements its in- structions. Throughout the text, we present the concepts before the details, building from simpler versions of ideas to more complex ones. Examples of this approach can be found in almost every chapter. Chapter 3 builds up to MIPS assembly language starting with one simple instruction type. The con- cepts and algorithms used in modern computer arithmeticare built up starting, from the familiar grade school algorithms in Chapter 4. Chapters 5 and 6 start from the simplest possible implementation of a MIPS subset and build to a ful- ly pipelined version. Chapter 7 illustrates the abstractions and concepts in memory hierarchies by starting with the simplest possible cache, then extend- ing it, and then covering virtual memory and TLBs using the same ideas. This evolutionary process is used extensively in Chapters 5 and 6, where the complete datapath and control for a processor are presented. Since learn- ing isa visual process, we have included sequences of figures that contain pro- gressively more detail or show a sequence of events within the machine. We have also used a second color to help readers follow the figures and sequences of figures. Learning from this Book Our objective of demonstrating first principles through the interrelationship of hardware and software is enhanced by several features found in each Preface chapter, The Hardware/Software Interface sections 4 used to highlight theve relationships. We've also included Big Picture sections for each chapter tow nind readers of the major insights. And esmentioned sbove, ssi ap- tor has a Real Stuff section to tie concepts to mechanisms found in current desktop computers. We hope that these elements reinforce our goal of making us bebic equally valuable as a foundation for further study in both hardware and software courses. ‘To illustrate the relationship between high-level language and machine language and to describe the hardware algorithms, we have chosen C.Itis widely used in compiler and operating system courses, itis widely used by computer professionals, and several facilities in the language make it suitable for describing hardware algorithms. For those who are familiar with Pascal ‘ther than Web Extension Il, found at roww.mkp.comcod2e htm provides a {quick introduction toC for Pascal programmers and should be sufficient toun- derstand the code sequences in the text. We have tried to manage the pace of the presentation for readers of varying experience, Ideas that are not essential to a newcomer, but which may be of Satorest to the more advanced reader, are set off from the main text and pre: sented as elaborations. When appropriate, advanced concepts have been saved for the exercise sets and enhanced with additional discussion as In More| Depth sections, In addition, we found that the extent of background that students have in logic design varies widely. Thus, Appendix B provides all the neces- sary background for those readers not versed in the basics of logic design, as ‘well as some slightly more sophisticated material for the more advanced stu dent. Within a course, this material can be used in supplementary lectures or incorporated into the mainstream of the course, depending on the background of the students and the goals of the instructor. We have also found that readers enjoy learning the history of the field, so the Historical Perspective sections include many photographs of important machines and little known stories about the ideas behind them. We hope that the perspective offered by these anecdotes and photographs will add a new di- mension for our readers. Course Syllabi and this Book One particularly difficult issue facing instructors is the balance of assembly language programming with computer organization. We have written this book so that readers will learn more about organization and design, whilestill providing a complete introduction to assembly language. By using a RISC architecture, students can learn the basics of an instruction set and assembly language programming in less time than is typically reserved in the curricu- lum for CISC-based assembly courses. Many instructors have also found that using a simulator, rather than running in native mode on a real machine, pre- eye de ee ee nn vides the experience of assembly language programming in substantially less time (and with less pain for the student). SPIM is the simulator of the MIPS processor developed by James R. Larus. The publisher's Web site at www.mkp.com/cod2e.htm has links to spim and xspim, which were developed by Larus to run on UNIX, and to PCspim (DOS) and PCspim (Windows), which were adapted from the Unix versions by David Carley. Although not identical to the Unix versions, the DOS and Win- dows versions offer the same general functionality. PCspim (Windows) will run under Windows 3.1, Windows 95, and Windows NT. We feel this wi enhance student opportunities for learning about computer organization (see Appendix A). Finally, stepwise derivation of assembly from a high-level language takes less study time than learning it from the ground up. Chapter 3 and Appendix A may be used together or separately, depending upon the reader's background. Chapter 3 provides the basics and can be supplemented with additional detail from Appendix A for a complete introduction to mod- ern assembly language programming. In the end, we hope this approach offers amore efficient treatment of assembly for most readers, while being sufficient ly broad to support detailed lecture or laboratory coverage if an instructor wants more emphasis on assembly language programming For those courses intended to expose students to the important principles of computer organization, the chapters from 4 to 9 explain the key ideas. Chapter 4 explains the idea of number representation for both integers and floating-point numbers and shows how arithmetic algorithms work. Chapters 5 and 6 introduce key ideas in control and pipelining and can be covered at several levels. Chapter 7 introduces the principles of memory hierarchies, uni- fying the ideas of caching and virtual memory. Chapter 8 shows how I/O sys- tems are organized and controlled, explaining the cooperative relationship between the hardware and the operating system. Finally, Chapter 9 uses exam- ples to introduce the key principles used in multiprocessors. For readers who want a greater emphasis on computer design, Chapters 4 through 8, together with Appendices B and C, provide that opportunity. For example, Chapter 4 explains a number of techniques used by computer design- ers to speed up addition and multiplication. Chapters 5 and 6 derive complete implementations of a MIPS subset using the arithmetic elements from Chapter 4nd a number of common datapath elements (such as register files and mem- ries) that are explained in detail in Appendix B. Chapter 5 starts with a very simple implementation; a complete datapath and control unit are constructed for this organization. The implementation is then modified to derive a faster version where each instruction can take differing numbers of clock cycles. The control for this multicycle implementation is designed using two different methods in Chapter 5. Appendix C shows in detail how the control specifica- tions are implemented using structured logic blocks. Chapter 6 builds on the single-clock cycle implementation created in Chapter 5 to show how pipelined machines are designed. The design is extended to show how hazards can be handled and how control for interrupts works. The student interested in com- puter design is not only exposed to three different designs for the same instruc- Pretace tion set, but can also see how these designs compare in terms of advantages and disadvantages. Chapter Organization and Overview Using these plans as the core, we developed the other chapters to introduce and support that core. , Many students remarked that they appreciated learning about the continu- ing rapid change in speed and capacity of hardware, as well as some of the is tory of computer development. This material is the focus of Chapter |. It provides a perspective on how software or hardware will need to scale during the coming decades. Chapter 1 also introduces topics to be covered in later chapters. Chapter 2 shows that time is the only safe measure of computer perfor- mance. It also relates common measurements used by hardware and software designers to the reliable measurement of time. The material in this chapter mo- tivates the techniques discussed in Chapters 5, 6, and 7 and provides a frame- ‘work for evaluating them. Chapter 3 builds on the knowledge of a programming language to derive an assembly language, offering several rules of thumb that guide the designer of the assembly language. We chose the instruction set of a real computer, in this case MIPS, so that real compilers could be used by students to see the code that would be generated. We hide the delayed branch and load until Chapter 6 for pedagogical reasons. Fortunately, the MIPS assembler schedules both de- layed branches and loads so the assembly language programmer can ignore these complexities without danger. Readers can see a very different approach to instruction set design in the Intel 80x86, which is covered in this chapter as, well. Although there is no consensus on what should be covered or what should be skipped in learning about computer arithmetic, we couldn't write Chapter 4 without reaching some conclusions of our own! Our solution is to introduce all the central ideas in the chapter and to provide some additional background for more advanced topics in the exercises. This allows one instructor to cover more advanced topics and assign exercises based on them, while another in- structor may skip the material. Chapters 5 and 6 show a realistic example of a processor in detail. Most readers appreciate having a real example to study, and a complete example provides the insight needed to see how all the pieces of a processor fit together for a pipelined and nonpipelined machine. To facilitate skipping some details on hardware implementation of control, we have included much of this mate- rial in Appendix C. _ Just as Chapters 2 through 6 provide important background for readers with an interest in compilers, Chapters 7 and 8 provide vital background to anyone pursuing further work in operating systems or databases. Chapter 7 describes the principles of memory hierarchies, focusing on the commonality between virtual memory and caching. Chapter 7 also emphasizes the role of the operating system and its interaction with the memory system. OME Sol RR ee Ee Preface po Topics as diverse as operating systems, databases, graphics, and network- ing require an understanding of I/O systems organization as well as the major technical characteristics of devices that influence this organization. Chapter 8 focuses on the topic of how I/O systems are organized starting with bus orga- nizations, working up to communication between the processor and I/O de- vice, and finally to the management role of the operating system. While we emphasize the interfacing issues, especially between hardware and software, several other important topics are introduced. Many of these topics are useful not only in computer organization but as background in other areas. For exam- ple, the handshaking protocol, used to interface asynchronous I/O devices, has applications in any distributed system. For some readers, this book may be their only overview of computer sys- tems, so we have included a survey of multiprocessing. Rather than the tradi- tional catalog of characteristics for many parallel machines, we have tried to describe the underlying principles that will drive the designs of parallel pro- cessors for the next decade. This section includes a small running example to show different versions of the same program for different parallel architec- ved tures. And as mentioned above, we have linked many example multiproces- ots sors from the real world on the book's WWW page at www.mkp.com/cod2e htm. Because the book is intended as an introduction for readers with a variety of interests, we tried to keep the presentation flexible. The appendices on as- sembly language and logic design are one of the principle vehicles to allow such flexibility, as these are easily skipped by more advanced readers, The presence of the appendices has made it possible to use this book in a course that mixes EE and CS majors with fairly different backgrounds in logic design and software. Assembly language programming is best learned by doing and in many cases will be done with the use of the simulator available with this book. Be- cause of this, we invited Jim Larus, the creator of the SPIM simulator, to join us as contributor of Appendix A. Appendix A describes the SPIM simulator and provides further details of the MIPS assembly language. In addition, it de- scribes assemblers and linkers, which handle the translation of assembly lan- guage programs to executable machine language. The logic design appendix is intended as a supplement to the material on computer organization rather than a comprehensive introduction to logic de- sign. While many EE students in a computer organization course will have al- ready had a course on logic design or digital electronics, we have found that CS majors in many institutions have not had much exposure to this area. The first few sections of Appendix B provide the necessary background. We in- clude some material, such as the organization of memories and finite state ma- chine control of a processor, in the mainstream material, since it is crucial to understanding computer organization, Selection of Material If you had no prior background and wanted to read from cover-to-cover, the following order makes sense: Chapters 1 and 2, Web Extension III (if needed), a xxvill Chapter 3, Chapter 4, Appendix A and Web Extension II (if interested), Appendix B, Chapter 5, Appendix C, Chapters 6, 7, 8, and 9. Clearly, most readers skip material. We have worked to provide readers with flexibility in their approach to the material, without making the discussions redundant. ‘The chapters have been written as self-contained units with cross-references to other chapters when related text or figures should be considered. The book has been used successfully in a variety of courses with different goals and stu- dent backgrounds. Concluding Remarks In Computer Architecture we alternated the gender of a pronoun chapter by chapter. In this book we believe we have removed all such pronouns, except of course for specific people. If you read the following acknowledgments section, you will see that we went to great lengths to correct mistakes. Since a book goes through many printings, we have the opportunity to make even more corrections. If you un- cover any remaining, resilient bugs, please contact the publisher by electronic mail at cod2bugs@mkp.comt or by low-tech mail using the address found on the copyright page. The first person to report a technical error will be awarded a $1.00 bounty upon its implementation in future printings of the book! Finally, like the last book, there is no strict ordering of the authors’ names. About half the time you will see Hennessy and Patterson, both in this book and in advertisements, and half the time you will see Patterson and Hennessy. ‘You'll even find it listed both ways in bibliographic publications such as Books in Print, This again reflects the true collaborative nature of this book: Together we brainstormed about the ideas and method of presentation, then individually wrote about one-half of the chapters and acted as reviewer for every draft of the other. The page count suggests we again wrote almost exactly the same number of pages. Thus, we equally share the blame for what you are about to read. Acknowledgments for the Second Edition We'd like to again express our appreciation to Jim Larus for his willingness in contributing his expertise on assembly language programming, as well as for ‘welcoming readers of this book to use the simulator he developed and main- tains at the University of Wisconsin. PCspim (DOS) and PCspim (Windows) versions of the simulator were developed by David Carley. Tod Amon of Southwest Texas State University was the exercise editor, creating and editing many new exercises. He also incorporated exercises donated by Doug Clark, Princeton; Richard Fateman, University of California, Berke- ley; Max Hailperin, Gustavus Adolphus College; Robert Kline, West Ches- ter University; Gandhi Puvvada, University of Southern California; Hamzeh Roumani, York University; Mike Smith, Harvard University; and Gregory Weber, Indiana University.

You might also like