PREFACE

The world has changed a great deal since the first edition of this book appeared in 1992. Computer networks and distributed systems of all kinds have become very common. Small children now roam the Internet, where previously only computer professionals went. As a consequence, this book has changed a great deal, too.

The most obvious change is that the first edition was about half on single-processor operating systems and half on distributed systems. I chose that format in 1991 because few universities then had courses on distributed systems and whatever students learned about distributed systems had to be put into the operating systems course, for which this book was intended. Now most universities have a separate course on distributed systems, so it is not necessary to try to combine the two subjects into one course and one book. This book is intended for a first course on operating systems, and as such focuses mostly on traditional single-processor systems.

I have coauthored two other books on operating systems. This leads to two possible course sequences.
Practically-oriented sequence:
1. Operating Systems Design and Implementation by Tanenbaum and Woodhull
2. Distributed Systems by Tanenbaum and Van Steen

Traditional sequence:
1. Modern Operating Systems by Tanenbaum
2. Distributed Systems by Tanenbaum and Van Steen

The former sequence uses MINIX and the students are expected to experiment with MINIX in an accompanying laboratory supplementing the first course. The latter sequence does not use MINIX. Instead, some small simulators are available that can be used for student exercises during a first course using this book. These can be found starting on the author's Web page: www.cs.vu.nl/~ast/ by clicking on Software and supplementary material for my books.

In addition to the major change of switching the emphasis to single-processor operating systems in this book, other major changes include the addition of entire chapters on computer security, multimedia operating systems, and Windows 2000, all important and timely topics. In addition, a new and unique chapter on operating system design has been added.

Another new feature is that many chapters now have a section on research about the topic of the chapter. This is intended to introduce the reader to modern work in processes, memory management, and so on. These sections have numerous references to the current research literature for the interested reader. In addition, Chapter 13 has many introductory and tutorial references.

Finally, numerous topics have been added to this book or heavily revised. These topics include: graphical user interfaces, multiprocessor operating systems, power management for laptops, trusted systems, viruses, network terminals, CD-ROM file systems, mutexes, RAID, soft timers, stable storage, fair-share scheduling, and new paging algorithms. Many new problems have been added and old ones updated. The total number of problems now exceeds 450. A solutions manual is available to professors using this book in a course. They can obtain a copy from their local Prentice Hall representative. In addition, over 250 new references to the current literature have been added to bring the book up to date.

Despite the removal of more than 400 pages of old material, the book has increased in size due to the large amount of new material added. While the book is still suitable for a one-semester or two-quarter course, it is probably too long for a one-quarter or one-trimester course at most universities. For this reason, the book has been designed in a modular way. Any course on operating systems should cover chapters 1 through 6. This is basic material that every student should know.
If additional time is available, additional chapters can be covered. Each of them assumes the reader has finished chapters 1 through 6, but Chaps. 7 through 12 are each self contained, so any desired subset can be used and in any order, depending on the interests of the instructor. In the author's opinion, Chaps. 7 through 12 are much more interesting than the earlier ones. Instructors should tell their students that they have to eat their broccoli before they can have the double chocolate fudge cake dessert.

I would like to thank the following people for their help in reviewing parts of the manuscript: Rida Bazzi, Riccardo Bettati, Felipe Cabrera, Richard Chapman, John Connely, John Dickinson, John Elliott, Deborah Frincke, Chandana Gamage, Robbert Geist, David Golds, Jim Griffioen, Gary Harkin, Frans Kaashoek, Mukkai Krishnamoorthy, Monica Lam, Jussi Leiwo, Herb Mayer, Kirk McKusick, Evi Nemeth, Bill Potvin, Prasant Shenoy, Thomas Skinner, Xian-He Sun, William Terry, Robbert Van Renesse, and Maarten van Steen. Jamie Hanrahan, Mark Russinovich, and Dave Solomon were enormously knowledgeable about Windows 2000 and very helpful. Special thanks go to Al Woodhull for valuable reviews and thinking of many new end-of-chapter problems.

My students were also helpful with comments and feedback, especially Staas de Jong, Jan de Vos, Niels Drost, David Fokkema, Auke Folkerts, Peter Groenewegen, Wilco Ibes, Stefan Jansen, Jeroen Ketema, Joeri Mulder, Irwin Oppenheim, Stef Post, Umar Rehman, Daniel Rijkhof, Maarten Sunder, Maurits van der Schee, Rik van der Stoel, Mark van Driel, Dennis van Veen, and Thomas Zeeman.
Barbara and Marvin are still wonderful, as usual, each in a unique way. Finally, last but not least, I would like to thank Suzanne for her love and patience, not to mention all the druiven and kersen, which have replaced the sinaasappelsap in recent times.

Andrew S. Tanenbaum


CONTENTS

1 INTRODUCTION

1.1. WHAT IS AN OPERATING SYSTEM?
    1.1.1. The Operating System as an Extended Machine
    1.1.2. The Operating System as a Resource Manager

1.2. HISTORY OF OPERATING SYSTEMS
    1.2.1. The First Generation (1945-55)
    1.2.2. The Second Generation (1955-65)
    1.2.3. The Third Generation (1965-1980)
    1.2.4. The Fourth Generation (1980-Present)
    1.2.5. Ontogeny Recapitulates Phylogeny

1.3. THE OPERATING SYSTEM ZOO
    1.3.1. Mainframe Operating Systems
    1.3.2. Server Operating Systems
    1.3.3. Multiprocessor Operating Systems
    1.3.4. Personal Computer Operating Systems
    1.3.5. Real-Time Operating Systems
    1.3.6. Embedded Operating Systems
    1.3.7. Smart Card Operating Systems

1.4. COMPUTER HARDWARE REVIEW
    1.4.1. Processors
    1.4.2. Memory
    1.4.3. I/O Devices
    1.4.4. Buses

1.5. OPERATING SYSTEM CONCEPTS
    1.5.1. Processes
    1.5.2. Deadlocks
    1.5.3. Memory Management
    1.5.4. Input/Output
    1.5.5. Files
    1.5.6. Security
    1.5.7. The Shell
    1.5.8. Recycling of Concepts

1.6. SYSTEM CALLS
    1.6.1. System Calls for Process Management
    1.6.2. System Calls for File Management
    1.6.3. System Calls for Directory Management
    1.6.4. Miscellaneous System Calls
    1.6.5. The Windows Win32 API

1.7. OPERATING SYSTEM STRUCTURE
    1.7.1. Monolithic Systems
    1.7.2. Layered Systems
    1.7.3. Virtual Machines
    1.7.4. Exokernels
    1.7.5. Client-Server Model

1.8. RESEARCH ON OPERATING SYSTEMS

1.9. OUTLINE OF THE REST OF THIS BOOK

1.10. METRIC UNITS

1.11. SUMMARY


2 PROCESSES AND THREADS

2.1. PROCESSES
    2.1.1. The Process Model
    2.1.2. Process Creation
    2.1.3. Process Termination
    2.1.4. Process Hierarchies
    2.1.5. Process States
    2.1.6. Implementation of Processes

2.2. THREADS
    2.2.1. The Thread Model
    2.2.2. Thread Usage
    2.2.3. Implementing Threads in User Space
    2.2.4. Implementing Threads in the Kernel
    2.2.5. Hybrid Implementations
    2.2.6. Scheduler Activations
    2.2.7. Pop-Up Threads
    2.2.8. Making Single-Threaded Code Multithreaded

2.3. INTERPROCESS COMMUNICATION
    2.3.1. Race Conditions
    2.3.2. Critical Regions
    2.3.3. Mutual Exclusion with Busy Waiting
    2.3.4. Sleep and Wakeup
    2.3.5. Semaphores
    2.3.6. Mutexes
    2.3.7. Monitors
    2.3.8. Message Passing
    2.3.9. Barriers

2.4. CLASSICAL IPC PROBLEMS
    2.4.1. The Dining Philosophers Problem
    2.4.2. The Readers and Writers Problem
    2.4.3. The Sleeping Barber Problem

2.5. SCHEDULING
    2.5.1. Introduction to Scheduling
    2.5.2. Scheduling in Batch Systems
    2.5.3. Scheduling in Interactive Systems
    2.5.4. Scheduling in Real-Time Systems
    2.5.5. Policy versus Mechanism
    2.5.6. Thread Scheduling

2.6. RESEARCH ON PROCESSES AND THREADS

2.7. SUMMARY

3 DEADLOCKS

3.1. RESOURCES
    3.1.1. Preemptable and Nonpreemptable Resources
    3.1.2. Resource Acquisition

3.2. INTRODUCTION TO DEADLOCKS
    3.2.1. Conditions for Deadlock
    3.2.2. Deadlock Modeling

3.3. THE OSTRICH ALGORITHM

3.4. DEADLOCK DETECTION AND RECOVERY
    3.4.1. Deadlock Detection with One Resource of Each Type
    3.4.2. Deadlock Detection with Multiple Resources of Each Type
    3.4.3. Recovery from Deadlock

3.5. DEADLOCK AVOIDANCE
    3.5.1. Resource Trajectories
    3.5.2. Safe and Unsafe States
    3.5.3. The Banker's Algorithm for a Single Resource
    3.5.4. The Banker's Algorithm for Multiple Resources

3.6. DEADLOCK PREVENTION
    3.6.1. Attacking the Mutual Exclusion Condition
    3.6.2. Attacking the Hold and Wait Condition
    3.6.3. Attacking the No Preemption Condition
    3.6.4. Attacking the Circular Wait Condition

3.7. OTHER ISSUES
    3.7.1. Two-Phase Locking
    3.7.2. Nonresource Deadlocks
    3.7.3. Starvation

3.8. RESEARCH ON DEADLOCKS

3.9. SUMMARY


4 MEMORY MANAGEMENT

4.1. BASIC MEMORY MANAGEMENT
    4.1.1. Monoprogramming without Swapping or Paging
    4.1.2. Multiprogramming with Fixed Partitions
    4.1.3. Modeling Multiprogramming
    4.1.4. Analysis of Multiprogramming System Performance
    4.1.5. Relocation and Protection

4.2. SWAPPING
    4.2.1. Memory Management with Bitmaps
    4.2.2. Memory Management with Linked Lists

4.3. VIRTUAL MEMORY
    4.3.1. Paging
    4.3.2. Page Tables
    4.3.3. TLBs-Translation Lookaside Buffers
    4.3.4. Inverted Page Tables

4.4. PAGE REPLACEMENT ALGORITHMS
    4.4.1. The Optimal Page Replacement Algorithm
    4.4.2. The Not Recently Used Page Replacement Algorithm
    4.4.3. The First-In, First-Out (FIFO) Page Replacement Algorithm
    4.4.4. The Second Chance Page Replacement Algorithm
    4.4.5. The Clock Page Replacement Algorithm
    4.4.6. The Least Recently Used (LRU) Page Replacement Algorithm
    4.4.7. Simulating LRU in Software
    4.4.8. The Working Set Page Replacement Algorithm
    4.4.9. The WSClock Page Replacement Algorithm
    4.4.10. Summary of Page Replacement Algorithms

4.5. MODELING PAGE REPLACEMENT ALGORITHMS
    4.5.1. Belady's Anomaly
    4.5.2. Stack Algorithms
    4.5.3. The Distance String
    4.5.4. Predicting Page Fault Rates

4.6. DESIGN ISSUES FOR PAGING SYSTEMS
    4.6.1. Local versus Global Allocation Policies
    4.6.2. Load Control
    4.6.3. Page Size
    4.6.4. Separate Instruction and Data Spaces
    4.6.5. Shared Pages
    4.6.6. Cleaning Policy
    4.6.7. Virtual Memory Interface

4.7. IMPLEMENTATION ISSUES
    4.7.1. Operating System Involvement with Paging
    4.7.2. Page Fault Handling
    4.7.3. Instruction Backup
    4.7.4. Locking Pages in Memory
    4.7.5. Backing Store
    4.7.6. Separation of Policy and Mechanism

4.8. SEGMENTATION
    4.8.1. Implementation of Pure Segmentation
    4.8.2. Segmentation with Paging: MULTICS
    4.8.3. Segmentation with Paging: The Intel Pentium

4.9. RESEARCH ON MEMORY MANAGEMENT

4.10. SUMMARY

5 INPUT/OUTPUT

5.1. PRINCIPLES OF I/O HARDWARE
    5.1.1. I/O Devices
    5.1.2. Device Controllers
    5.1.3. Memory-Mapped I/O
    5.1.4. Direct Memory Access
    5.1.5. Interrupts Revisited

5.2. PRINCIPLES OF I/O SOFTWARE
    5.2.1. Goals of the I/O Software
    5.2.2. Programmed I/O
    5.2.3. Interrupt-Driven I/O
    5.2.4. I/O Using DMA

5.3. I/O SOFTWARE LAYERS
    5.3.1. Interrupt Handlers
    5.3.2. Device Drivers
    5.3.3. Device-Independent I/O Software
    5.3.4. User-Space I/O Software

5.4. DISKS
    5.4.1. Disk Hardware
    5.4.2. Disk Formatting
    5.4.3. Disk Arm Scheduling Algorithms
    5.4.4. Error Handling
    5.4.5. Stable Storage

5.5. CLOCKS
    5.5.1. Clock Hardware
    5.5.2. Clock Software
    5.5.3. Soft Timers

5.6. CHARACTER-ORIENTED TERMINALS
    5.6.1. RS-232 Terminal Hardware
    5.6.2. Input Software
    5.6.3. Output Software

5.7. GRAPHICAL USER INTERFACES
    5.7.1. Personal Computer Keyboard, Mouse, and Display Hardware
    5.7.2. Input Software
    5.7.3. Output Software for Windows

5.8. NETWORK TERMINALS
    5.8.1. The X Window System
    5.8.2. The SLIM Network Terminal

5.9. POWER MANAGEMENT
    5.9.1. Hardware Issues
    5.9.2. Operating System Issues
    5.9.3. Degraded Operation

5.10. RESEARCH ON INPUT/OUTPUT

5.11. SUMMARY


6 FILE SYSTEMS

6.1. FILES
    6.1.1. File Naming
    6.1.2. File Structure
    6.1.3. File Types
    6.1.4. File Access
    6.1.5. File Attributes
    6.1.6. File Operations
    6.1.7. An Example Program Using File System Calls
    6.1.8. Memory-Mapped Files

6.2. DIRECTORIES
    6.2.1. Single-Level Directory Systems
    6.2.2. Two-level Directory Systems
    6.2.3. Hierarchical Directory Systems
    6.2.4. Path Names
    6.2.5. Directory Operations

6.3. FILE SYSTEM IMPLEMENTATION
    6.3.1. File System Layout
    6.3.2. Implementing Files
    6.3.3. Implementing Directories
    6.3.4. Shared Files
    6.3.5. Disk Space Management
    6.3.6. File System Reliability
    6.3.7. File System Performance
    6.3.8. Log-Structured File Systems

6.4. EXAMPLE FILE SYSTEMS
    6.4.1. CD-ROM File Systems
    6.4.2. The CP/M File System
    6.4.3. The MS-DOS File System
    6.4.4. The Windows 98 File System
    6.4.5. The UNIX V7 File System

6.5. RESEARCH ON FILE SYSTEMS

6.6. SUMMARY


7 MULTIMEDIA OPERATING SYSTEMS

7.1. INTRODUCTION TO MULTIMEDIA

7.2. MULTIMEDIA FILES
    7.2.1. Audio Encoding
    7.2.2. Video Encoding

7.3. VIDEO COMPRESSION
    7.3.1. The JPEG Standard
    7.3.2. The MPEG Standard

7.4. MULTIMEDIA PROCESS SCHEDULING
    7.4.1. Scheduling Homogeneous Processes
    7.4.2. General Real-Time Scheduling
    7.4.3. Rate Monotonic Scheduling
    7.4.4. Earliest Deadline First Scheduling

7.5. MULTIMEDIA FILE SYSTEM PARADIGMS
    7.5.1. VCR Control Functions
    7.5.2. Near Video on Demand
    7.5.3. Near Video on Demand with VCR Functions

7.6. FILE PLACEMENT
    7.6.1. Placing a File on a Single Disk
    7.6.2. Two Alternative File Organization Strategies
    7.6.3. Placing Files for Near Video on Demand
    7.6.4. Placing Multiple Files on a Single Disk
    7.6.5. Placing Files on Multiple Disks

7.7. CACHING
    7.7.1. Block Caching
    7.7.2. File Caching

7.8. DISK SCHEDULING FOR MULTIMEDIA
    7.8.1. Static Disk Scheduling
    7.8.2. Dynamic Disk Scheduling

7.9. RESEARCH ON MULTIMEDIA

7.10. SUMMARY

8 MULTIPLE PROCESSOR SYSTEMS

8.1. MULTIPROCESSORS
    8.1.1. Multiprocessor Hardware
    8.1.2. Multiprocessor Operating System Types
    8.1.3. Multiprocessor Synchronization
    8.1.4. Multiprocessor Scheduling

8.2. MULTICOMPUTERS
    8.2.1. Multicomputer Hardware
    8.2.2. Low-Level Communication Software
    8.2.3. User-Level Communication Software
    8.2.4. Remote Procedure Call
    8.2.5. Distributed Shared Memory
    8.2.6. Multicomputer Scheduling
    8.2.7. Load Balancing

8.3. DISTRIBUTED SYSTEMS
    8.3.1. Network Hardware
    8.3.2. Network Services and Protocols
    8.3.3. Document-Based Middleware
    8.3.4. File System-Based Middleware
    8.3.5. Shared Object-Based Middleware
    8.3.6. Coordination-Based Middleware

8.4. RESEARCH ON MULTIPLE PROCESSOR SYSTEMS

8.5. SUMMARY

9 SECURITY

9.1. THE SECURITY ENVIRONMENT
    9.1.1. Threats
    9.1.2. Intruders
    9.1.3. Accidental Data Loss

9.2. BASICS OF CRYPTOGRAPHY
    9.2.1. Secret-Key Cryptography
    9.2.2. Public-Key Cryptography
    9.2.3. One-Way Functions
    9.2.4. Digital Signatures

9.3. USER AUTHENTICATION
    9.3.1. Authentication Using Passwords
    9.3.2. Authentication Using a Physical Object
    9.3.3. Authentication Using Biometrics
    9.3.4. Countermeasures

9.4. ATTACKS FROM INSIDE THE SYSTEM
    9.4.1. Trojan Horses
    9.4.2. Login Spoofing
    9.4.3. Logic Bombs
    9.4.4. Trap Doors
    9.4.5. Buffer Overflow
    9.4.6. Generic Security Attacks
    9.4.7. Famous Security Flaws
    9.4.8. Design Principles for Security

9.5. ATTACKS FROM OUTSIDE THE SYSTEM
    9.5.1. Virus Damage Scenarios
    9.5.2. How Viruses Work
    9.5.3. How Viruses Spread
    9.5.4. Antivirus and Anti-Antivirus Techniques
    9.5.5. The Internet Worm
    9.5.6. Mobile Code
    9.5.7. Java Security

9.6. PROTECTION MECHANISMS
    9.6.1. Protection Domains
    9.6.2. Access Control Lists
    9.6.3. Capabilities

9.7. TRUSTED SYSTEMS
    9.7.1. Trusted Computing Base
    9.7.2. Formal Models of Secure Systems
    9.7.3. Multilevel Security
    9.7.4. Orange Book Security
    9.7.5. Covert Channels

9.8. RESEARCH ON SECURITY

9.9. SUMMARY

10 CASE STUDY 1: UNIX AND LINUX

10.1. HISTORY OF UNIX
    10.1.1. UNICS
    10.1.2. PDP-11 UNIX
    10.1.3. Portable UNIX
    10.1.4. Berkeley UNIX
    10.1.5. Standard UNIX
    10.1.6. MINIX
    10.1.7. Linux

10.2. OVERVIEW OF UNIX
    10.2.1. UNIX Goals
    10.2.2. Interfaces to UNIX
    10.2.3. The UNIX Shell
    10.2.4. UNIX Utility Programs
    10.2.5. Kernel Structure

10.3. PROCESSES IN UNIX
    10.3.1. Fundamental Concepts
    10.3.2. Process Management System Calls in UNIX
    10.3.3. Implementation of Processes in UNIX
    10.3.4. Booting UNIX

10.4. MEMORY MANAGEMENT IN UNIX
    10.4.1. Fundamental Concepts
    10.4.2. Memory Management System Calls in UNIX
    10.4.3. Implementation of Memory Management in UNIX

10.5. INPUT/OUTPUT IN UNIX
    10.5.1. Fundamental Concepts
    10.5.2. Input/Output System Calls in UNIX
    10.5.3. Implementation of Input/Output in UNIX
    10.5.4. Streams

10.6. THE UNIX FILE SYSTEM
    10.6.1. Fundamental Concepts
    10.6.2. File System Calls in UNIX
    10.6.3. Implementation of the UNIX File System
    10.6.4. NFS: The Network File System

10.7. SECURITY IN UNIX
    10.7.1. Fundamental Concepts
    10.7.2. Security System Calls in UNIX
    10.7.3. Implementation of Security in UNIX

10.8. SUMMARY

11 CASE STUDY 2: WINDOWS 2000

11.1. HISTORY OF WINDOWS 2000
    11.1.1. MS-DOS
    11.1.2. Windows 95/98/Me
    11.1.3. Windows NT
    11.1.4. Windows 2000

11.2. PROGRAMMING WINDOWS 2000
    11.2.1. The Win32 Application Programming Interface
    11.2.2. The Registry

11.3. SYSTEM STRUCTURE
    11.3.1. Operating System Structure
    11.3.2. Implementation of Objects
    11.3.3. Environment Subsystems

11.4. PROCESSES AND THREADS IN WINDOWS 2000
    11.4.1. Fundamental Concepts
    11.4.2. Job, Process, Thread and Fiber Management API Calls
    11.4.3. Implementation of Processes and Threads
    11.4.4. MS-DOS Emulation
    11.4.5. Booting Windows 2000

11.5. MEMORY MANAGEMENT
    11.5.1. Fundamental Concepts
    11.5.2. Memory Management System Calls
    11.5.3. Implementation of Memory Management

11.6. INPUT/OUTPUT IN WINDOWS 2000
    11.6.1. Fundamental Concepts
    11.6.2. Input/Output API Calls
    11.6.3. Implementation of I/O
    11.6.4. Device Drivers

11.7. THE WINDOWS 2000 FILE SYSTEM
    11.7.1. Fundamental Concepts
    11.7.2. File System API Calls in Windows 2000
    11.7.3. Implementation of the Windows 2000 File System

11.8. SECURITY IN WINDOWS 2000
    11.8.1. Fundamental Concepts
    11.8.2. Security API Calls
    11.8.3. Implementation of Security

11.9. CACHING IN WINDOWS 2000

12 OPERATING SYSTEM DESIGN

12.1. THE NATURE OF THE DESIGN PROBLEM
    12.1.1. Goals
    12.1.2. Why is it Hard to Design an Operating System?

12.2. INTERFACE DESIGN
    12.2.1. Guiding Principles
    12.2.2. Paradigms
    12.2.3. The System Call Interface

12.3. IMPLEMENTATION
    12.3.1. System Structure
    12.3.2. Mechanism versus Policy
    12.3.3. Orthogonality
    12.3.4. Naming
    12.3.5. Binding Time
    12.3.6. Static versus Dynamic Structures
    12.3.7. Top-Down versus Bottom-Up Implementation
    12.3.8. Useful Techniques

12.4. PERFORMANCE
    12.4.1. Why are Operating Systems Slow?
    12.4.2. What Should be Optimized?
    12.4.3. Space-Time Trade-offs
    12.4.4. Caching
    12.4.5. Hints
    12.4.6. Exploiting Locality
    12.4.7. Optimize the Common Case

12.5. PROJECT MANAGEMENT
    12.5.1. The Mythical Man Month
    12.5.2. Team Structure
    12.5.3. The Role of Experience
    12.5.4. No Silver Bullet

12.6. TRENDS IN OPERATING SYSTEM DESIGN
    12.6.1. Large Address Space Operating Systems
    12.6.2. Networking
    12.6.3. Parallel and Distributed Systems
    12.6.4. Multimedia
    12.6.5. Battery-Powered Computers
    12.6.6. Embedded Systems

12.7. SUMMARY

13 READING LIST AND BIBLIOGRAPHY

13.1. SUGGESTIONS FOR FURTHER READING
    13.1.1. Introduction and General Works
    13.1.2. Processes and Threads
    13.1.3. Deadlocks
    13.1.4. Memory Management
    13.1.5. Input/Output
    13.1.6. File Systems
    13.1.7. Multimedia Operating Systems
    13.1.8. Multiple Processor Systems
    13.1.9. Security
    13.1.10. UNIX and Linux
    13.1.11. Windows 2000
    13.1.12. Design Principles

13.2. ALPHABETICAL BIBLIOGRAPHY

INTRODUCTION

A modern computer system consists of one or more processors, some main memory, disks, printers, a keyboard, a display, network interfaces, and other input/output devices. All in all, a complex system. Writing programs that keep track of all these components and use them correctly, let alone optimally, is an extremely difficult job. For this reason, computers are equipped with a layer of software called the operating system, whose job is to manage all these devices and provide user programs with a simpler interface to the hardware. These systems are the subject of this book.

The placement of the operating system is shown in Fig. 1-1. At the bottom is the hardware, which, in many cases, is itself composed of two or more levels (or layers). The lowest level contains physical devices, consisting of integrated circuit chips, wires, power supplies, cathode ray tubes, and similar physical devices. How these are constructed and how they work are the provinces of the electrical engineer.

Next comes the microarchitecture level, in which the physical devices are grouped together to form functional units. Typically this level contains some registers internal to the CPU (Central Processing Unit) and a data path containing an arithmetic logic unit. In each clock cycle, one or two operands are fetched from the registers and combined in the arithmetic logic unit (for example, by addition or Boolean AND). The result is stored in one or more registers. On some machines, the operation of the data path is controlled by software, called the microprogram. On other machines, it is controlled directly by hardware circuits.
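The data path cycle just described (fetch operands from registers, combine them in the arithmetic logic unit, store the result) can be modeled in a few lines of Python. This is only a rough sketch, not a description of any real machine: the register names `r0` to `r3`, the functions `alu` and `data_path_cycle`, and the 16-bit word size are all assumptions made for illustration.

```python
# Toy model of one data path cycle: fetch two operands from registers,
# combine them in the ALU, store the result back in a register.
# Register names, word size, and operation set are illustrative assumptions.

WORD_MASK = 0xFFFF  # assume 16-bit registers


def alu(op, a, b):
    """Combine two operands the way a simple arithmetic logic unit might."""
    if op == "ADD":
        return (a + b) & WORD_MASK   # addition, wrapped to the word size
    if op == "AND":
        return a & b                 # Boolean AND
    raise ValueError(f"unsupported ALU operation: {op}")


def data_path_cycle(registers, op, src1, src2, dest):
    """Run one cycle: read src1 and src2, apply op, write the result to dest."""
    a = registers[src1]
    b = registers[src2]
    registers[dest] = alu(op, a, b)
    return registers[dest]


if __name__ == "__main__":
    regs = {"r0": 6, "r1": 3, "r2": 0, "r3": 0}
    data_path_cycle(regs, "ADD", "r0", "r1", "r2")  # r2 = 6 + 3
    data_path_cycle(regs, "AND", "r0", "r1", "r3")  # r3 = 6 & 3
    print(regs)
```

In this model the choice of operation and registers is passed in as arguments; on a real machine that choice would come from a microprogram or from hardwired control, as the text notes.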

[Figure 1-1: layered diagram. Application programs (banking system, airline reservation, Web browser) sit on top, above the operating system, which in turn sits above the hardware levels: machine language, microarchitecture, and physical devices.]

Figure 1-1. A computer system consists of hardware, system programs, and application programs.

The purpose of the data path is to execute some set of instructions. Some of these can be carried out in one data path cycle; others may require multiple data path cycles. These instructions may use registers or other hardware facilities. Together, the hardware and instructions visible to an assembly language programmer form the ISA (Instruction Set Architecture) level. This level is often called machine language.

The machine language typically has between 50 and 300 instructions, mostly for moving data around the machine, doing arithmetic, and comparing values. In this level, the input/output devices are controlled by loading values into special device registers. For example, a disk can be commanded to read by loading the values of the disk address, main memory address, byte count, and direction (read or write) into its registers. In practice, many more parameters are needed, and the status returned by the drive after an operation is highly complex. Furthermore, for many I/O (Input/Output) devices, timing plays an important role in the programming.

To hide this complexity, an operating system is provided. It consists of a layer of software that (partially) hides the hardware and gives the programmer a more convenient set of instructions to work with. For example, read block from file is conceptually simpler than having to worry about the details of moving disk heads, waiting for them to settle down, and so on.

On top of the operating system is the rest of the system software. Here we find the command interpreter (shell), window systems, compilers, editors, and similar application-independent programs. It is important to realize that these programs are definitely not part of the operating system, even though they are typically supplied by the computer manufacturer. This is a crucial, but subtle, point. The operating system is (usually) that portion of the software that runs in kernel mode or supervisor mode. It is protected from user tampering by the hardware (ignoring for the moment some older or low-end microprocessors that do not have hardware protection at all). Compilers and editors run in user mode. If a user does not like a particular compiler, he† is free to write his own if he so chooses; he is not free to write his own clock interrupt handler, which is part of the operating system and is normally protected by hardware against attempts by users to modify it.

This distinction, however, is sometimes blurred in embedded systems (which may not have kernel mode) or interpreted systems (such as Java-based operating systems that use interpretation, not hardware, to separate the components). Still, for traditional computers, the operating system is what runs in kernel mode.

That said, in many systems there are programs that run in user mode but which help the operating system or perform privileged functions. For example, there is often a program that allows users to change their passwords. This program is not part of the operating system and does not run in kernel mode, but it clearly carries out a sensitive function and has to be protected in a special way.

In some systems, this idea is carried to an extreme form, and pieces of what is traditionally considered to be the operating system (such as the file system) run in user space. In such systems, it is difficult to draw a clear boundary. Everything running in kernel mode is clearly part of the operating system, but some programs running outside it are arguably also part of it, or at least closely associated with it.

Finally, above the system programs come the application programs. These programs are purchased or written by the users to solve their particular problems, such as word processing, spreadsheets, engineering calculations, or storing information in a database.
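The device-register example given earlier in this introduction (loading a disk address, main memory address, byte count, and direction) can be contrasted with a single higher-level call in a small simulation. The sketch below is hypothetical throughout: `SimulatedDiskController`, its register names, `read_block`, and the 512-byte block size are assumptions made for illustration and do not describe any real controller.

```python
# Toy contrast between programming a disk through device registers and
# calling one operating system service. The controller below is simulated;
# its register names and the read_block helper are illustrative assumptions.

BLOCK_SIZE = 512
READ, WRITE = 0, 1


class SimulatedDiskController:
    """A pretend disk whose transfers are driven by loading registers."""

    def __init__(self, disk_image, memory):
        self.disk = disk_image      # bytearray standing in for the platters
        self.memory = memory        # bytearray standing in for main memory
        self.disk_address = 0       # register: where on the disk
        self.memory_address = 0     # register: where in main memory
        self.byte_count = 0         # register: how many bytes to move
        self.direction = READ       # register: read or write

    def start(self):
        """Perform the transfer described by the current register values."""
        d, m, n = self.disk_address, self.memory_address, self.byte_count
        if self.direction == READ:
            self.memory[m:m + n] = self.disk[d:d + n]
        else:
            self.disk[d:d + n] = self.memory[m:m + n]


def read_block(controller, block_number, buffer_address):
    """What an operating system might offer instead: one simple call."""
    controller.disk_address = block_number * BLOCK_SIZE
    controller.memory_address = buffer_address
    controller.byte_count = BLOCK_SIZE
    controller.direction = READ
    controller.start()
    return bytes(controller.memory[buffer_address:buffer_address + BLOCK_SIZE])
```

A caller of `read_block` supplies only a block number and a buffer address; the four register loads stay hidden inside the function, which is the kind of simplification the text attributes to the operating system.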

1.1 WHAT IS AN OPERATING SYSTEM?
Most computer users have had some experience with an operating system, but it is difficult to pin down precisely what an operating system is. Part of the problem is that operating systems perform two basically unrelated functions, extending the machine and managing resources, and depending on who is doing the talking, you hear mostly about one function or the other. Let us now look at both.

1.1.1 The Operating System as an Extended Machine
As mentioned earlier, the architecture (instruction set, memory organization, I/O, and bus structure) of most computers at the machine language level is primitive and awkward to program, especially for input/output. To make this point more concrete, let us briefly look at how floppy disk I/O is done using the NEC PD765 compatible controller chips used on most Intel-based personal computers. (Throughout this book we will use the terms "floppy disk" and "diskette" interchangeably.) The PD765 has 16 commands, each specified by loading between 1 and 9 bytes into a device register. These commands are for reading and writing data, moving the disk arm, and formatting tracks, as well as initializing, sensing, resetting, and recalibrating the controller and the drives.

† "He" should be read as "he or she" throughout the book.

The most basic commands are read and write, each of which requires 13 parameters, packed into 9 bytes. These parameters specify such items as the address of the disk block to be read, the number of sectors per track, the recording mode used on the physical medium, the intersector gap spacing, and what to do with a deleted-data-address-mark. If you do not understand this mumbo jumbo, do not worry; that is precisely the point: it is rather esoteric. When the operation is completed, the controller chip returns 23 status and error fields packed into 7 bytes. As if this were not enough, the floppy disk programmer must also be constantly aware of whether the motor is on or off. If the motor is off, it must be turned on (with a long startup delay) before data can be read or written. The motor cannot be left on too long, however, or the floppy disk will wear out. The programmer is thus forced to deal with the trade-off between long startup delays versus wearing out floppy disks (and losing the data on them).

Without going into the real details, it should be clear that the average programmer probably does not want to get too intimately involved with the programming of floppy disks (or hard disks, which are just as complex and quite different). Instead, what the programmer wants is a simple, high-level abstraction to deal with. In the case of disks, a typical abstraction would be that the disk contains a collection of named files. Each file can be opened for reading or writing, then read or written, and finally closed. Details such as whether or not recording should use modified frequency modulation and what the current state of the motor is should not appear in the abstraction presented to the user.

The program that hides the truth about the hardware from the programmer and presents a nice, simple view of named files that can be read and written is, of course, the operating system.
Just as the operating system shields the programmer from the disk hardware and presents a simple file-oriented interface, it also conceals a lot of unpleasant business concerning interrupts, timers, memory management, and other low-level features. In each case, the abstraction offered by the operating system is simpler and easier to use than that offered by the underlying hardware.
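The named-file abstraction described above (open a file, read or write it, close it) is what a programmer sees today through calls such as those in Python's standard `os` module. The following is a minimal sketch under stated assumptions: the helper names `save_note` and `load_note` and the file name `note.txt` are invented for illustration, and the example says nothing about how any particular operating system implements these calls.

```python
# The named-file abstraction: open, write or read, close. No motors, gap
# spacing, or recording modes appear anywhere. The helper names and the
# file name used below are illustrative assumptions.
import os
import tempfile


def save_note(path, text):
    """Open a named file for writing, write to it, and close it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, text.encode("utf-8"))
    finally:
        os.close(fd)


def load_note(path, max_bytes=4096):
    """Open a named file for reading, read from it, and close it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, max_bytes).decode("utf-8")
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        note = os.path.join(directory, "note.txt")
        save_note(note, "hello, disk")
        print(load_note(note))
```

Nothing in either helper mentions sectors, tracks, or motor state; those details are left entirely to the operating system and its drivers.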


In this view, the function of the operating system is to present the user with the equivalent of an extended machine or virtual machine that is easier to program than the underlying hardware. How the operating system achieves this goal is a long story, which we will study in detail throughout this book. To summarize it in a nutshell, the operating system provides a variety of services that programs can obtain using special instructions called system calls. We will examine some of the more common system calls later in this chapter.

... especially if it only needs a small fraction of the total. Of course, this raises issues of fairness, protection, and so on, and it is up to the operating system to solve them. Another resource that is space multiplexed is the (hard) disk. In many systems a single disk can hold files from many users at the same time. Allocating disk space and keeping track of who is using which disk blocks is a typical operating system resource management task.

Operating systems have been evolving through the years. In the following sections we will briefly look at a few of the highlights. Since operating systems have historically been closely tied to the architecture of the computers on which they run, we will look at successive generations of computers to see what their operating systems were like. This mapping of operating system generations to computer generations is crude, but it does provide some structure where there would otherwise be none.

The first true digital computer was designed by the English mathematician Charles Babbage (1792-1871). Although Babbage spent most of his life and fortune trying to build his "analytical engine," he never got it working properly because it was purely mechanical, and the technology of his day could not produce the required wheels, gears, and cogs to the high precision that he needed. Needless to say, the analytical engine did not have an operating system.

As an interesting historical aside, Babbage realized that he would need software for his analytical engine, so he hired a young woman named Ada Lovelace, who was the daughter of the famed British poet Lord Byron, as the world's first programmer. The programming language Ada is named after her.

1.2.1 The First Generation (1945-55) Vacuum Tubes and Plugboards
After Babbage's unsuccessful efforts, little progress was made in constructing digital computers until World War II. Around the mid-1940s, Howard Aiken at Harvard, John von Neumann at the Institute for Advanced Study in Princeton, J. Presper Eckert and William Mauchly at the University of Pennsylvania, and Konrad Zuse in Germany, among others, all succeeded in building calculating engines. The first ones used mechanical relays but were very slow, with cycle times measured in seconds. Relays were later replaced by vacuum tubes. These machines were enormous, filling up entire rooms with tens of thousands of vacuum tubes, but they were still millions of times slower than even the cheapest personal computers available today.

In these early days, a single group of people designed, built, programmed, operated, and maintained each machine. All programming was done in absolute machine language, often by wiring up plugboards to control the machine's basic functions. Programming languages were unknown (even assembly language was unknown). Operating systems were unheard of. The usual mode of operation was for the programmer to sign up for a block of time on the signup sheet on the wall, then come down to the machine room, insert his or her plugboard into the computer, and spend the next few hours hoping that none of the 20,000 or so vacuum tubes would burn out during the run. Virtually all the problems were straightforward numerical calculations, such as grinding out tables of sines, cosines, and logarithms.

By the early 1950s, the routine had improved somewhat with the introduction of punched cards. It was now possible to write programs on cards and read them in instead of using plugboards; otherwise, the procedure was the same.

1.2.2 The Second Generation (1955-65) Transistors and Batch Systems
The introduction of the transistor in the mid-1950s changed the picture radically. Computers became reliable enough that they could be manufactured and sold to paying customers with the expectation that they would continue to function long enough to get some useful work done. For the first time, there was a clear separation between designers, builders, operators, programmers, and maintenance personnel.

These machines, now called mainframes, were locked away in specially air conditioned computer rooms, with staffs of professional operators to run them. Only big corporations or major government agencies or universities could afford the multimillion dollar price tag. To run a job (i.e., a program or set of programs), a programmer would first write the program on paper (in FORTRAN or assembler), then punch it on cards. He would then bring the card deck down to the input room and hand it to one of the operators and go drink coffee until the output was ready.

When the computer finished whatever job it was currently running, an operator would go over to the printer and tear off the output and carry it over to the output room, so that the programmer could collect it later. Then he would take one of the card decks that had been brought from the input room and read it in. If the FORTRAN compiler was needed, the operator would have to get it from a file cabinet and read it in. Much computer time was wasted while operators were walking around the machine room.

Given the high cost of the equipment, it is not surprising that people quickly looked for ways to reduce the wasted time. The solution generally adopted was the batch system. The idea behind it was to collect a tray full of jobs in the input room and then read them onto a magnetic tape using a small (relatively) inexpensive computer, such as the IBM 1401, which was very good at reading cards, copying tapes, and printing output, but not at all good at numerical calculations.

CHAP. 1

Other, much more expensive machines, such as the IBM 7094, were used for the real computing. This situation is shown in Fig. 1-2.

The operator then loaded a special program (the ancestor of today's operating system), which read the first job from tape and ran it. The output was written onto a second tape, instead of being printed. After each job finished, the operating system automatically read the next job from the tape and began running it. When the whole batch was done, the operator removed the input and output tapes, replaced the input tape with the next batch, and brought the output tape to a 1401 for printing off line (i.e., not connected to the main computer).

The structure of a typical input job is shown in Fig. 1-3. It started out with a $JOB card, specifying the maximum run time in minutes, the account number to be charged, and the programmer's name. Then came a $FORTRAN card, telling the operating system to load the FORTRAN compiler from the system tape. It was followed by the program to be compiled, and then a $LOAD card, directing the operating system to load the object program just compiled. (Compiled programs were often written on scratch tapes and had to be loaded explicitly.) Next came the $RUN card, telling the operating system to run the program with the data following it. Finally, the $END card marked the end of the job. These primitive control cards were the forerunners of modern job control languages and command interpreters.

Large second-generation computers were used mostly for scientific and engineering calculations, such as solving the partial differential equations that often occur in physics and engineering. They were largely programmed in FORTRAN and assembly language. Typical operating systems were FMS (the Fortran Monitor System) and IBSYS, IBM's operating system for the 7094.
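The control-card sequence described above ($JOB, $FORTRAN, $LOAD, $RUN, $END) can be mimicked with a small sketch of how a monitor might walk through one job's deck. This is only an illustration, not a reconstruction of FMS: the function name `parse_job` and the comma-separated layout of the $JOB card are assumptions made for the example.

```python
def parse_job(cards):
    """Split one job's card deck into header, program cards, and data cards."""
    job = {"header": None, "program": [], "data": [], "loaded": False}
    section = None  # which kind of non-control card is being collected
    for card in cards:
        if card.startswith("$JOB"):
            # Assumed layout: "$JOB <max minutes>, <account>, <name>".
            minutes, account, name = [f.strip() for f in card[4:].split(",", 2)]
            job["header"] = {"max_minutes": int(minutes),
                             "account": account,
                             "name": name}
        elif card.startswith("$FORTRAN"):
            section = "program"      # following cards are source to compile
        elif card.startswith("$LOAD"):
            job["loaded"] = True     # object program is loaded explicitly
            section = None
        elif card.startswith("$RUN"):
            section = "data"         # following cards are the program's input
        elif card.startswith("$END"):
            return job
        elif section is not None:
            job[section].append(card)
    raise ValueError("deck ended without a $END card")

if __name__ == "__main__":
    deck = ["$JOB 10, 4297, A. Programmer", "$FORTRAN",
            "      PRINT *, 1", "      END", "$LOAD", "$RUN", "42", "$END"]
    print(parse_job(deck))
```

The point of the sketch is that the control cards act as a tiny command language. Each one switches the monitor into a new mode, which is the same job a modern command interpreter does.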

HISTORY OF OPERATING SYSTEMS

1.2.3 The Third Generation (1965-1980) ICs and Multiprogramming
By the early 1960s, most computer manufacturers had two distinct, and totally incompatible, product lines. On the one hand there were the word-oriented, large-scale scientific computers, such as the 7094, which were used for numerical calculations in science and engineering. On the other hand, there were the character-oriented, commercial computers, such as the 1401, which were widely used for tape sorting and printing by banks and insurance companies.

Developing and maintaining two completely different product lines was an expensive proposition for the manufacturers. In addition, many new computer customers initially needed a small machine but later outgrew it and wanted a bigger machine that would run all their old programs, but faster.

IBM attempted to solve both of these problems at a single stroke by introducing the System/360. The 360 was a series of software-compatible machines ranging from 1401-sized to much more powerful than the 7094. The machines differed only in price and performance (maximum memory, processor speed, number of I/O devices permitted, and so forth). Since all the machines had the same architecture and instruction set, programs written for one machine could run on all the others, at least in theory. Furthermore, the 360 was designed to handle both scientific (i.e., numerical) and commercial computing. Thus a single family of machines could satisfy the needs of all customers. In subsequent years, IBM has come out with compatible successors to the 360 line, using more modern technology, known as the 370, 4300, 3080, and 3090 series.

The 360 was the first major computer line to use (small-scale) Integrated Circuits (ICs), thus providing a major price/performance advantage over the second-generation machines, which were built up from individual transistors. It was an immediate success, and the idea of a family of compatible computers was soon adopted by all the other major manufacturers. The descendants of these machines are still in use at computer centers today. Nowadays they are often used for managing huge databases (e.g., for airline reservation systems) or as servers for World Wide Web sites that must process thousands of requests per second.

The greatest strength of the "one family" idea was simultaneously its greatest weakness. The intention was that all software, including the operating system, OS/360, had to work on all models. It had to run on small systems, which often just replaced 1401s for copying cards to tape, and on very large systems, which often replaced 7094s for doing weather forecasting and other heavy computing. It had to be good on systems with few peripherals and on systems with many peripherals. It had to work in commercial environments and in scientific environments. Above all, it had to be efficient for all of these different uses.

There was no way that IBM (or anybody else) could write a piece of software to meet all those conflicting requirements. The result was an enormous and extraordinarily complex operating system, probably two to three orders of magnitude larger than FMS. It consisted of millions of lines of assembly language written by thousands of programmers, and contained thousands upon thousands of bugs, which necessitated a continuous stream of new releases in an attempt to correct them. Each new release fixed some bugs and introduced new ones, so the number of bugs probably remained constant in time.

One of the designers of OS/360, Fred Brooks, subsequently wrote a witty and incisive book (Brooks, 1996) describing his experiences with OS/360. While it would be impossible to summarize the book here, suffice it to say that the cover shows a herd of prehistoric beasts stuck in a tar pit. The cover of Silberschatz et al. (2000) makes a similar point about operating systems being dinosaurs.

Despite its enormous size and problems, OS/360 and the similar third-generation operating systems produced by other computer manufacturers actually satisfied most of their customers reasonably well. They also popularized several key techniques absent in second-generation operating systems. Probably the most important of these was multiprogramming. On the 7094, when the current job paused to wait for a tape or other I/O operation to complete, the CPU simply sat idle until the I/O finished. With heavily CPU-bound scientific calculations, I/O is infrequent, so this wasted time is not significant. With commercial data processing, the I/O wait time can often be 80 or 90 percent of the total time, so something had to be done to avoid having the (expensive) CPU be idle so much.

The solution that evolved was to partition memory into several pieces, with a different job in each partition, as shown in Fig. 1-4. While one job was waiting for I/O to complete, another job could be using the CPU. If enough jobs could be held in main memory at once, the CPU could be kept busy nearly 100 percent of the time. Having multiple jobs safely in memory at once requires special hardware to protect each job against snooping and mischief by the other ones, but the 360 and other third-generation systems were equipped with this hardware.

Figure 1-4. A multiprogramming system with three jobs in memory. (Recoverable figure labels: Job 1; Memory partitions.)
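To get a feel for why holding several jobs in memory keeps the CPU busy, here is a rough sketch under a simplifying assumption that this section does not state: each job waits for I/O a fraction p of the time, independently of the others. The CPU is then idle only when all n jobs wait at once, which happens with probability p to the power n. The function names are hypothetical.

```python
def cpu_utilization(io_wait_fraction, jobs):
    """Estimated CPU utilization, assuming jobs wait for I/O independently."""
    # The CPU idles only when every job is waiting: probability p ** n.
    return 1.0 - io_wait_fraction ** jobs

def jobs_needed(io_wait_fraction, target):
    """Smallest number of resident jobs whose estimated utilization meets target."""
    if not (0.0 <= io_wait_fraction < 1.0 and 0.0 < target < 1.0):
        raise ValueError("fractions must lie in [0, 1) and (0, 1)")
    n = 1
    while cpu_utilization(io_wait_fraction, n) < target:
        n += 1
    return n

if __name__ == "__main__":
    # With 80 percent I/O wait, as in commercial data processing:
    for n in (1, 2, 4, 8):
        print(n, round(cpu_utilization(0.8, n), 3))
    print("jobs for 90% utilization:", jobs_needed(0.8, 0.9))
```

Real jobs are not independent, so these figures are only a first approximation. Even so, they show why one job at 80 percent I/O wait leaves the CPU mostly idle while a handful of resident jobs does not.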

Another major feature present in third-generation operating systems was the ability to read jobs from cards onto the disk as soon as they were brought to the computer room. Then, whenever a running job finished, the operating system could load a new job from the disk into the now-empty partition and run it. This technique is called spooling (from Simultaneous Peripheral Operation On Line) and was also used for output. With spooling, the 1401s were no longer needed, and much carrying of tapes disappeared.

Although third-generation operating systems were well suited for big scientific calculations and massive commercial data processing runs, they were still basically batch systems. Many programmers pined for the first-generation days when they had the machine all to themselves for a few hours, so they could debug their programs quickly. With third-generation systems, the time between submitting a job and getting back the output was often several hours, so a single misplaced comma could cause a compilation to fail, and the programmer to waste half a day.

This desire for quick response time paved the way for timesharing, a variant of multiprogramming, in which each user has an online terminal. In a timesharing system, if 20 users are logged in and 17 of them are thinking or talking or drinking coffee, the CPU can be allocated in turn to the three jobs that want service. Since people debugging programs usually issue short commands (e.g., compile a five-page procedure†) rather than long ones (e.g., sort a million-record file), the computer can provide fast, interactive service to a number of users and perhaps also work on big batch jobs in the background when the CPU is otherwise idle. The first serious timesharing system, CTSS (Compatible Time Sharing System), was developed at M.I.T. on a specially modified 7094 (Corbató et al., 1962). However, timesharing did not really become popular until the necessary protection hardware became widespread during the third generation.
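Allocating the CPU "in turn" to the jobs that want service can be sketched as a simple round-robin loop. This is an illustrative sketch, not a description of CTSS: the function name, the fixed time quantum, and the representation of each request as a number of time units are all assumptions.

```python
from collections import deque

def round_robin(requests, quantum):
    """Return the sequence of (user, time_used) slices for the given requests.

    requests maps each user who wants service to the CPU time still needed.
    Idle users (thinking, talking, drinking coffee) are simply not in the map.
    """
    ready = deque((user, need) for user, need in requests.items() if need > 0)
    schedule = []
    while ready:
        user, remaining = ready.popleft()
        used = min(quantum, remaining)
        schedule.append((user, used))
        if remaining > used:
            # Not finished: go to the back of the queue and wait for another turn.
            ready.append((user, remaining - used))
    return schedule

if __name__ == "__main__":
    # Three of twenty logged-in users want service; the other 17 cost nothing.
    print(round_robin({"ann": 3, "bob": 1, "cho": 2}, quantum=1))
```

Because short commands finish within a turn or two, interactive users see quick responses even while longer requests keep cycling through the queue.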
†We will use the terms "procedure," "subroutine," and "function" interchangeably in this book.

After the success of the CTSS system, MIT, Bell Labs, and General Electric (then a major computer manufacturer) decided to embark on the development of a "computer utility," a machine that would support hundreds of simultaneous timesharing users. Their model was the electricity distribution system: when you need electric power, you just stick a plug in the wall, and within reason, as much power as you need will be there. The designers of this system, known as MULTICS (MULTiplexed Information and Computing Service), envisioned one huge machine providing computing power for everyone in the Boston area. The idea that machines far more powerful than their GE-645 mainframe would be sold for a thousand dollars by the millions only 30 years later was pure science fiction. Sort of like the idea of supersonic trans-Atlantic undersea trains now.

MULTICS was a mixed success. It was designed to support hundreds of users on a machine only slightly more powerful than an Intel 386-based PC, although it had much more I/O capacity. This is not quite as crazy as it sounds, since people knew how to write small, efficient programs in those days, a skill that has subsequently been lost. There were many reasons that MULTICS did not take over the world, not the least of which is that it was written in PL/I, and the PL/I compiler was years late and barely worked at all when it finally arrived. In addition, MULTICS was enormously ambitious for its time, much like Charles Babbage's analytical engine in the nineteenth century.

To make a long story short, MULTICS introduced many seminal ideas into the computer literature, but turning it into a serious product and a major commercial success was a lot harder than anyone had expected. Bell Labs dropped out of the project, and General Electric quit the computer business altogether. However, M.I.T. persisted and eventually got MULTICS working. It was ultimately sold as a commercial product by the company that bought GE's computer business (Honeywell) and installed by about 80 major companies and universities worldwide. While their numbers were small, MULTICS users were fiercely loyal. General Motors, Ford, and the U.S. National Security Agency, for example, only shut down their MULTICS systems in the late 1990s, 30 years after MULTICS was released.

For the moment, the concept of a computer utility has fizzled out but it may well come back in the form of massive centralized Internet servers to which relatively dumb user machines are attached, with most of the work happening on the big servers. The motivation here is likely to be that most people do not want to administrate an increasingly complex and finicky computer system and would prefer to have that work done by a team of professionals working for the company running the server. E-commerce is already evolving in this direction, with various companies running e-malls on multiprocessor servers to which simple client machines connect, very much in the spirit of the MULTICS design.

Despite its lack of commercial success, MULTICS had a huge influence on subsequent operating systems. It is described in (Corbató et al., 1972; Corbató and Vyssotsky, 1965; Daley and Dennis, 1968; Organick, 1972; and Saltzer, 1974). It also has a still-active Web site, www.multicians.org, with a great deal of information about the system, its designers, and its users.

Another major development during the third generation was the phenomenal growth of minicomputers, starting with the DEC PDP-1 in 1961. The PDP-1 had only 4K of 18-bit words, but at $120,000 per machine (less than 5 percent of the price of a 7094), it sold like hotcakes. For certain kinds of nonnumerical work, it was almost as fast as the 7094 and gave birth to a whole new industry. It was quickly followed by a series of other PDPs (unlike IBM's family, all incompatible) culminating in the PDP-11.

One of the computer scientists at Bell Labs who had worked on the MULTICS project, Ken Thompson, subsequently found a small PDP-7 minicomputer that no one was using and set out to write a stripped-down, one-user version of MULTICS. This work later developed into the UNIX operating system, which became popular in the academic world, with government agencies, and with many companies.

The history of UNIX has been told elsewhere (e.g., Salus, 1994). Part of that story will be given in Chap. 10. For now, suffice it to say that because the source code was widely available, various organizations developed their own (incompatible) versions, which led to chaos. Two major versions developed, System V, from AT&T, and BSD (Berkeley Software Distribution) from the University of California at Berkeley. These had minor variants as well. To make it possible to write programs that could run on any UNIX system, IEEE developed a standard for UNIX, called POSIX, that most versions of UNIX now support. POSIX defines a minimal system call interface that conformant UNIX systems must support. In fact, some other operating systems now also support the POSIX interface.

As an aside, it is worth mentioning that in 1987, the author released a small clone of UNIX, called MINIX, for educational purposes. Functionally, MINIX is very similar to UNIX, including POSIX support. A book describing its internal operation and listing the source code in an appendix is also available (Tanenbaum and Woodhull, 1997). MINIX is available for free (including all the source code) over the Internet at URL www.cs.vu.nl/~ast/minix.html.

The desire for a free production (as opposed to educational) version of MINIX led a Finnish student, Linus Torvalds, to write Linux. This system was developed on MINIX and originally supported various MINIX features (e.g., the MINIX file system). It has since been extended in many ways but still retains a large amount of underlying structure common to MINIX, and to UNIX (upon which the former was based). Most of what will be said about UNIX in this book thus applies to System V, BSD, MINIX, Linux, and other versions and clones of UNIX as well.

1.2.4 The Fourth Generation (1980-Present) Personal Computers
With the development of LSI (Large Scale Integration) circuits, chips containing thousands of transistors on a square centimeter of silicon, the age of the personal computer dawned. In terms of architecture, personal computers (initially called microcomputers) were not all that different from minicomputers of the PDP-11 class, but in terms of price they certainly were different. Where the minicomputer made it possible for a department in a company or university to have its own computer, the microprocessor chip made it possible for a single individual to have his or her own personal computer.

In 1974, when Intel came out with the 8080, the first general-purpose 8-bit CPU, it wanted an operating system for the 8080, in part to be able to test it. Intel asked one of its consultants, Gary Kildall, to write one. Kildall and a friend first built a controller for the newly-released Shugart Associates 8-inch floppy disk and hooked the floppy disk up to the 8080, thus producing the first microcomputer with a disk. Kildall then wrote a disk-based operating system called CP/M (Control Program for Microcomputers) for it. Since Intel did not think that disk-based microcomputers had much of a future, when Kildall asked for the rights to CP/M, Intel granted his request. Kildall then formed a company, Digital Research, to further develop and sell CP/M.

In 1977, Digital Research rewrote CP/M to make it suitable for running on the many microcomputers using the 8080, Zilog Z80, and other CPU chips. Many application programs were written to run on CP/M, allowing it to completely dominate the world of microcomputing for about 5 years.

In the early 1980s, IBM designed the IBM PC and looked around for software to run on it. People from IBM contacted Bill Gates to license his BASIC interpreter. They also asked him if he knew of an operating system to run on the PC. Gates suggested that IBM contact Digital Research, then the world's dominant operating systems company. Making what was surely the worst business decision in recorded history, Kildall refused to meet with IBM, sending a subordinate instead. To make matters worse, his lawyer even refused to sign IBM's nondisclosure agreement covering the not-yet-announced PC. Consequently, IBM went back to Gates asking if he could provide them with an operating system.

When IBM came back, Gates realized that a local computer manufacturer, Seattle Computer Products, had a suitable operating system, DOS (Disk Operating System). He approached them and asked to buy it (allegedly for $50,000), which they readily accepted. Gates then offered IBM a DOS/BASIC package, which IBM accepted. IBM wanted certain modifications, so Gates hired the person who wrote DOS, Tim Paterson, as an employee of Gates' fledgling company, Microsoft, to make them. The revised system was renamed MS-DOS (MicroSoft Disk Operating System) and quickly came to dominate the IBM PC market. A key factor here was Gates' (in retrospect, extremely wise) decision to sell MS-DOS to computer companies for bundling with their hardware, compared to Kildall's attempt to sell CP/M to end users one at a time (at least initially).

By the time the IBM PC/AT came out in 1983 with the Intel 80286 CPU, MS-DOS was firmly entrenched and CP/M was on its last legs. MS-DOS was later widely used on the 80386 and 80486. Although the initial version of MS-DOS was fairly primitive, subsequent versions included more advanced features, including many taken from UNIX. (Microsoft was well aware of UNIX, even selling a microcomputer version of it called XENIX during the company's early years.)

CP/M, MS-DOS, and other operating systems for early microcomputers were all based on users typing in commands from the keyboard. That eventually changed due to research done by Doug Engelbart at Stanford Research Institute in the 1960s. Engelbart invented the GUI (Graphical User Interface), pronounced "gooey," complete with windows, icons, menus, and mouse. These ideas were adopted by researchers at Xerox PARC and incorporated into machines they built.

One day, Steve Jobs, who co-invented the Apple computer in his garage, visited PARC, saw a GUI, and instantly realized its potential value, something Xerox management famously did not (Smith and Alexander, 1988). Jobs then embarked on building an Apple with a GUI. This project led to the Lisa, which was too expensive and failed commercially. Jobs' second attempt, the Apple Macintosh, was a huge success, not only because it was much cheaper than the Lisa, but also because it was user friendly, meaning that it was intended for users who not only knew nothing about computers but furthermore had absolutely no intention whatsoever of learning.

When Microsoft decided to build a successor to MS-DOS, it was strongly influenced by the success of the Macintosh. It produced a GUI-based system called Windows, which originally ran on top of MS-DOS (i.e., it was more like a shell than a true operating system). For about 10 years, from 1985 to 1995, Windows was just a graphical environment on top of MS-DOS. However, starting in 1995 a freestanding version of Windows, Windows 95, was released that incorporated many operating system features into it, using the underlying MS-DOS system only for booting and running old MS-DOS programs. In 1998, a slightly modified version of this system, called Windows 98, was released. Nevertheless, both Windows 95 and Windows 98 still contain a large amount of 16-bit Intel assembly language.

Another Microsoft operating system is Windows NT (NT stands for New Technology), which is compatible with Windows 95 at a certain level, but a complete rewrite from scratch internally. It is a full 32-bit system. The lead designer for Windows NT was David Cutler, who was also one of the designers of the VAX VMS operating system, so some ideas from VMS are present in NT. Microsoft expected that the first version of NT would kill off MS-DOS and all other versions of Windows since it was a vastly superior system, but it fizzled. Only with Windows NT 4.0 did it finally catch on in a big way, especially on corporate networks. Version 5 of Windows NT was renamed Windows 2000 in early 1999. It was intended to be the successor to both Windows 98 and Windows NT 4.0. That did not quite work out either, so Microsoft came out with yet another version of Windows 98 called Windows Me (Millennium edition).

The other major contender in the personal computer world is UNIX (and its various derivatives). UNIX is strongest on workstations and other high-end computers, such as network servers. It is especially popular on machines powered by high-performance RISC chips. On Pentium-based computers, Linux is becoming a popular alternative to Windows for students and increasingly many corporate users. (As an aside, throughout this book we will use the term "Pentium" to mean the Pentium I, II, III, and 4.)

Although many UNIX users, especially experienced programmers, prefer a command-based interface to a GUI, nearly all UNIX systems support a windowing system called the X Windows system produced at M.I.T. This system handles the basic window management, allowing users to create, delete, move, and resize windows using a mouse. Often a complete GUI, such as Motif, is available to run on top of the X Windows system giving UNIX a look and feel something like the Macintosh or Microsoft Windows, for those UNIX users who want such a thing.

An interesting development that began taking place during the mid-1980s is the growth of networks of personal computers running network operating systems and distributed operating systems (Tanenbaum and Van Steen, 2002). In a network operating system, the users are aware of the existence of multiple computers and can log in to remote machines and copy files from one machine to another. Each machine runs its own local operating system and has its own local user (or users).

Network operating systems are not fundamentally different from single-processor operating systems. They obviously need a network interface controller and some low-level software to drive it, as well as programs to achieve remote login and remote file access, but these additions do not change the essential structure of the operating system.

A distributed operating system, in contrast, is one that appears to its users as a traditional uniprocessor system, even though it is actually composed of multiple processors.
The users should not be aware of where their programs are being run or where their files are located; that should all be handled automatically and efficiently by the operating system.

True distributed operating systems require more than just adding a little code to a uniprocessor operating system, because distributed and centralized systems differ in critical ways. Distributed systems, for example, often allow applications to run on several processors at the same time, thus requiring more complex processor scheduling algorithms in order to optimize the amount of parallelism.

Communication delays within the network often mean that these (and other) algorithms must run with incomplete, outdated, or even incorrect information. This situation is radically different from a single-processor system in which the operating system has complete information about the system state.

1.2.5 Ontogeny Recapitulates Phylogeny
After Charles Darwin's book The Origin of the Species was published, the German zoologist Ernst Haeckel stated that "Ontogeny Recapitulates Phylogeny." By this he meant that the development of an embryo (ontogeny) repeats (i.e., recapitulates) the evolution of the species (phylogeny). In other words, after fertilization, a human egg goes through stages of being a fish, a pig, and so on before turning into a human baby. Modern biologists regard this as a gross simplification, but it still has a kernel of truth in it.

Something analogous has happened in the computer industry. Each new species (mainframe, minicomputer, personal computer, embedded computer, smart card, etc.) seems to go through the development that its ancestors did. The first mainframes were programmed entirely in assembly language. Even complex programs, like compilers and operating systems, were written in assembler. By the time minicomputers appeared on the scene, FORTRAN, COBOL, and other high-level languages were common on mainframes, but the new minicomputers were nevertheless programmed in assembler (for lack of memory). When microcomputers (early personal computers) were invented, they, too, were programmed in assembler, even though by then minicomputers were also programmed in high-level languages. Palmtop computers also started with assembly code but quickly moved on to high-level languages (mostly because the development work was done on bigger machines). The same is true for smart cards.

Now let us look at operating systems. The first mainframes initially had no protection hardware and no support for multiprogramming, so they ran simple operating systems that handled one manually-loaded program at a time. Later they acquired the hardware and operating system support to handle multiple programs at once, and then full timesharing capabilities.

When minicomputers first appeared, they also had no protection hardware and ran one manually-loaded program at a time, even though multiprogramming was well established in the mainframe world by then. Gradually, they acquired protection hardware and the ability to run two or more programs at once.
The first microcomputers were also capable of running only one program at a time, but later acquired the ability to multiprogram. Palmtops and smart cards went the same route.

Disks first appeared on large mainframes, then on minicomputers, microcomputers, and so on down the line. Even now, smart cards do not have hard disks, but with the advent of flash ROM, they will soon have the equivalent of it. When disks first appeared, primitive file systems sprang up. On the CDC 6600, easily the most powerful mainframe in the world during much of the 1960s, the file system consisted of users having the ability to create a file and then declare it to be permanent, meaning it stayed on the disk even after the creating program exited. To access such a file later, a program had to attach it with a special command and give its password (supplied when the file was made permanent). In effect, there was a single directory shared by all users. It was up to the users to avoid file name conflicts. Early minicomputer file systems had a single directory shared by all users and so did early microcomputer file systems.

Virtual memory (the ability to run programs larger than the physical memory) had a similar development. It first appeared on mainframes, then on minicomputers and microcomputers, and gradually worked its way down to smaller and smaller systems. Networking had a similar history.

In all cases, the software development was dictated by the technology. The first microcomputers, for example, had something like 4 KB of memory and no protection hardware. High-level languages and multiprogramming were simply too much for such a tiny system to handle. As the microcomputers evolved into modern personal computers, they acquired the necessary hardware and then the necessary software to handle more advanced features. It is likely that this development will continue for years to come. Other fields may also have this wheel of reincarnation, but in the computer industry it seems to spin faster.

1.3 THE OPERATING SYSTEM ZOO
All of this history and development has left us with a wide variety of operating systems, not all of which are widely known. In this section we will briefly touch upon seven of them. We will come back to some of these different kinds of systems later in the book.

1.3.1 Mainframe Operating Systems
At the high end are the operating systems for the mainframes, those room-sized computers still found in major corporate data centers. These computers distinguish themselves from personal computers in terms of their I/O capacity. A mainframe with 1000 disks and thousands of gigabytes of data is not unusual; a personal computer with these specifications would be odd indeed. Mainframes are also making something of a comeback as high-end Web servers, servers for large-scale electronic commerce sites, and servers for business-to-business transactions.

The operating systems for mainframes are heavily oriented toward processing many jobs at once, most of which need prodigious amounts of I/O. They typically offer three kinds of services: batch, transaction processing, and timesharing. A batch system is one that processes routine jobs without any interactive user present. Claims processing in an insurance company or sales reporting for a chain of stores is typically done in batch mode. Transaction processing systems handle large numbers of small requests, for example, check processing at a bank or airline reservations. Each unit of work is small, but the system must handle hundreds or thousands per second. Timesharing systems allow multiple remote users to run jobs on the computer at once, such as querying a big database. These functions are closely related; mainframe operating systems often perform all of them. An example mainframe operating system is OS/390, a descendant of OS/360.

1.3.2 Server Operating Systems
One level down are the server operating systems. They run on servers, which are either very large personal computers, workstations, or even mainframes. They serve multiple users at once over a network and allow the users to share hardware and software resources. Servers can provide print service, file service, or Web service. Internet providers run many server machines to support their customers and Web sites use servers to store the Web pages and handle the incoming requests. Typical server operating systems are UNIX and Windows 2000. Linux is also gaining ground for servers.

1.3.3 Multiprocessor Operating Systems
An increasingly common way to get major-league computing power is to connect multiple CPUs into a single system. Depending on precisely how they are connected and what is shared, these systems are called parallel computers, multicomputers, or multiprocessors. They need special operating systems, but often these are variations on the server operating systems, with special features for communication and connectivity.

1.3.4 Personal Computer Operating Systems
The next category is the personal computer operating system. Its job is to provide a good interface to a single user. These systems are widely used for word processing, spreadsheets, and Internet access. Common examples are Windows 98, Windows 2000, the Macintosh operating system, and Linux. Personal computer operating systems are so widely known that probably little introduction is needed. In fact, many people are not even aware that other kinds exist.

1.3.5 Real-Time Operating Systems
Another type of operating system is the real-time system. These systems are characterized by having time as a key parameter. For example, in industrial process control systems, real-time computers have to collect data about the production process and use it to control machines in the factory. Often there are hard deadlines that must be met. For example, if a car is moving down an assembly line, certain actions must take place at certain instants of time. If a welding robot welds too early or too late, the car will be ruined. If the action absolutely must occur at a certain moment (or within a certain range), we have a hard real-time system.

Another kind of real-time system is a soft real-time system, in which missing an occasional deadline is acceptable. Digital audio or multimedia systems fall in this category. VxWorks and QNX are well-known real-time operating systems.


1.3.6 Embedded Operating Systems
Continuing on down to smaller and smaller systems, we come to palmtop computers and embedded systems. A palmtop computer or PDA (Personal Digital Assistant) is a small computer that fits in a shirt pocket and performs a small number of functions such as an electronic address book and memo pad. Embedded systems run on the computers that control devices that are not generally thought of as computers, such as TV sets, microwave ovens, and mobile telephones. These often have some characteristics of real-time systems but also have size, memory, and power restrictions that make them special. Examples of such operating systems are PalmOS and Windows CE (Consumer Electronics).

1.3.7 Smart Card Operating Systems
The smallest operating systems run on smart cards, which are credit card-sized devices containing a CPU chip. They have very severe processing power and memory constraints. Some of them can handle only a single function, such as electronic payments, but others can handle multiple functions on the same smart card. Often these are proprietary systems.

Some smart cards are Java oriented. What this means is that the ROM on the smart card holds an interpreter for the Java Virtual Machine (JVM). Java applets (small programs) are downloaded to the card and are interpreted by the JVM interpreter. Some of these cards can handle multiple Java applets at the same time, leading to multiprogramming and the need to schedule them. Resource management and protection also become an issue when two or more applets are present at the same time. These issues must be handled by the (usually extremely primitive) operating system present on the card.

1.4 COMPUTER HARDWARE REVIEW
An operating system is intimately tied to the hardware of the computer it runs on. It extends the computer's instruction set and manages its resources. To work, it must know a great deal about the hardware, or at least about how the hardware appears to the programmer.

Conceptually, a simple personal computer can be abstracted to a model resembling that of Fig. 1-5. The CPU, memory, and I/O devices are all connected by a system bus and communicate with one another over it. Modern personal computers have a more complicated structure, involving multiple buses, which we will look at later. For the time being, this model will be sufficient. In the following sections, we will briefly review these components and examine some of the hardware issues that are of concern to operating system designers.

[Figure 1-5. Components of a simple personal computer. Labels: Monitor, Keyboard, Floppy disk drive, Hard disk drive; CPU, Memory, Video controller, Keyboard controller, Floppy disk controller, Hard disk controller; Bus.]

1.4.1 Processors
The "brain" of the computer is the CPU. It fetches instructions from memory and executes them. The basic cycle of every CPU is to fetch the first instruction from memory, decode it to determine its type and operands, execute it, and then fetch, decode, and execute subsequent instructions. In this way, programs are carried out.

Each CPU has a specific set of instructions that it can execute. Thus a Pentium cannot execute SPARC programs and a SPARC cannot execute Pentium programs. Because accessing memory to get an instruction or data word takes much longer than executing an instruction, all CPUs contain some registers inside to hold key variables and temporary results. Thus the instruction set generally contains instructions to load a word from memory into a register, and store a word from a register into memory. Other instructions combine two operands from registers, memory, or both into a result, such as adding two words and storing the result in a register or in memory.

In addition to the general registers used to hold variables and temporary results, most computers have several special registers that are visible to the programmer. One of these is the program counter, which contains the memory address of the next instruction to be fetched. After that instruction has been fetched, the program counter is updated to point to its successor.

Another register is the stack pointer, which points to the top of the current stack in memory. The stack contains one frame for each procedure that has been entered but not yet exited. A procedure's stack frame holds those input parameters, local variables, and temporary variables that are not kept in registers.

Yet another register is the PSW (Program Status Word). This register contains the condition code bits, which are set by comparison instructions, the CPU priority, the mode (user or kernel), and various other control bits. User programs

may normally read the entire PSW but typically may write only some of its fields. The PSW plays an important role in system calls and I/O.

The operating system must be aware of all the registers. When time multiplexing the CPU, the operating system will often stop the running program to (re)start another one. Every time it stops a running program, the operating system must save all the registers so they can be restored when the program runs later.

To improve performance, CPU designers have long abandoned the simple model of fetching, decoding, and executing one instruction at a time. Many modern CPUs have facilities for executing more than one instruction at the same time. For example, a CPU might have separate fetch, decode, and execute units, so that while it was executing instruction n, it could also be decoding instruction n + 1 and fetching instruction n + 2. Such an organization is called a pipeline and is illustrated in Fig. 1-6(a) for a pipeline with three stages. Longer pipelines are common. In most pipeline designs, once an instruction has been fetched into the pipeline, it must be executed, even if the preceding instruction was a conditional branch that was taken. Pipelines cause compiler writers and operating system writers great headaches because they expose the complexities of the underlying machine to them.
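The basic fetch-decode-execute cycle described at the start of this section can be made concrete with a toy interpreter. The sketch below is only an illustration: the instruction names (LOAD, ADD, STORE, HALT), the four-register machine, and the function name run are all invented here and do not correspond to any real CPU.

```python
# Toy machine: a hypothetical instruction set, not any real CPU's.
def run(program, memory):
    """Execute `program` (a list of instruction tuples) until HALT."""
    registers = [0, 0, 0, 0]
    pc = 0                                 # program counter: next instruction
    while True:
        instruction = program[pc]          # fetch
        pc += 1                            # point to the successor
        opcode, *operands = instruction    # decode
        if opcode == "LOAD":               # execute: register <- memory word
            reg, addr = operands
            registers[reg] = memory[addr]
        elif opcode == "ADD":              # execute: dst <- src1 + src2
            dst, src1, src2 = operands
            registers[dst] = registers[src1] + registers[src2]
        elif opcode == "STORE":            # execute: memory word <- register
            reg, addr = operands
            memory[addr] = registers[reg]
        elif opcode == "HALT":
            return registers
        else:
            raise ValueError(f"illegal instruction: {opcode}")

memory = [7, 35, 0]
program = [
    ("LOAD", 0, 0),      # R0 <- memory[0]
    ("LOAD", 1, 1),      # R1 <- memory[1]
    ("ADD", 2, 0, 1),    # R2 <- R0 + R1
    ("STORE", 2, 2),     # memory[2] <- R2
    ("HALT",),
]
registers = run(program, memory)
print(registers[2], memory[2])   # prints: 42 42
```

Note that the program counter is advanced immediately after the fetch, so that by the time an instruction executes, the counter already names its successor, as described above.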

[Figure 1-6. (a) A three-stage pipeline: fetch unit, decode unit, execute unit. (b) A superscalar CPU.]
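To see how the three-stage pipeline overlaps work, it may help to tabulate which instruction sits in which unit on every clock cycle. The following sketch assumes an idealized pipeline with no stalls and no branches, which is a simplification; the function name pipeline_schedule is ours.

```python
STAGES = ["fetch", "decode", "execute"]

def pipeline_schedule(n_instructions, stages=STAGES):
    """For each clock cycle, report which instruction occupies each stage.

    Assumes an ideal pipeline: one instruction enters per cycle, no stalls.
    """
    total_cycles = n_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            instr = cycle - depth          # instruction currently in this stage
            if 0 <= instr < n_instructions:
                row[stage] = instr
        schedule.append(row)
    return schedule

schedule = pipeline_schedule(4)
for cycle, row in enumerate(schedule):
    print(cycle, row)
```

Under these assumptions, four instructions finish in 4 + 3 - 1 = 6 cycles rather than the 12 that strictly sequential execution would need. In cycle 2 the execute unit holds instruction 0 while the decode unit holds instruction 1 and the fetch unit holds instruction 2, which is exactly the n, n + 1, n + 2 overlap described in the text.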

Even more advanced than a pipeline design is a superscalar CPU, shown in Fig. 1-6(b). In this design, multiple execution units are present, for example, one for integer arithmetic, one for floating-point arithmetic, and one for Boolean operations. Two or more instructions are fetched at once, decoded, and dumped into a holding buffer until they can be executed. As soon as an execution unit is free, it looks in the holding buffer to see if there is an instruction it can handle, and if so, it removes the instruction from the buffer and executes it. An implication of this design is that program instructions are often executed out of order. For the most part, it is up to the hardware to make sure the result produced is the same one a sequential implementation would have produced, but an annoying amount of the complexity is foisted onto the operating system, as we shall see.

Most CPUs, except very simple ones used in embedded systems, have two modes, kernel mode and user mode, as mentioned earlier. Usually a bit in the PSW controls the mode. When running in kernel mode, the CPU can execute every instruction in its instruction set and use every feature of the hardware. The operating system runs in kernel mode, giving it access to the complete hardware.

In contrast, user programs run in user mode, which permits only a subset of the instructions to be executed and a subset of the features to be accessed. Generally, all instructions involving I/O and memory protection are disallowed in user mode. Setting the PSW mode bit to kernel mode is also forbidden, of course.

To obtain services from the operating system, a user program must make a system call, which traps into the kernel and invokes the operating system. The TRAP instruction switches from user mode to kernel mode and starts the operating system. When the work has been completed, control is returned to the user program at the instruction following the system call. We will explain the details of the system call process later in this chapter. As a note on typography, we will use the lower case Helvetica font to indicate system calls in running text, like this: read.

It is worth noting that computers have traps other than the instruction for executing a system call. Most of the other traps are caused by the hardware to warn of an exceptional situation such as an attempt to divide by 0 or a floating-point underflow. In all cases the operating system gets control and must decide what to do. Sometimes the program must be terminated with an error. Other times the error can be ignored (an underflowed number can be set to 0). Finally, when the program has announced in advance that it wants to handle certain kinds of conditions, control can be passed back to the program to let it deal with the problem.
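As a small illustration of a user program asking the operating system for service, the sketch below uses Python's standard os module, whose pipe, write, and read functions are thin wrappers around the system calls of the same names. The choice of a pipe as the thing to read from is ours, made only so that the example is self-contained.

```python
import os

# Each call below traps into the kernel, which does the work in kernel
# mode and then returns control to this user-mode program.
read_end, write_end = os.pipe()        # pipe system call: create a channel
os.write(write_end, b"hello")          # write system call
data = os.read(read_end, 5)            # read system call: at most 5 bytes
os.close(read_end)
os.close(write_end)
print(data)                            # prints: b'hello'
```

From the program's point of view each call looks like an ordinary procedure call; the switch to kernel mode and back is invisible to it.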

1.4.2 Memory
The second major component in any computer is the memory. Ideally, a memory should be extremely fast (faster than executing an instruction so the CPU is not held up by the memory), abundantly large, and dirt cheap. No current technology satisfies all of these goals, so a different approach is taken. The memory system is constructed as a hierarchy of layers, as shown in Fig. 1-7.

[Figure 1-7. The memory hierarchy, with typical access times and capacities.]

Typical access time    Layer            Typical capacity
1 nsec                 Registers        less than 1 KB
(illegible)            Cache            (illegible)
10 nsec                Main memory      64-512 MB
10 msec                Magnetic disk    5-50 GB
100 sec                Magnetic tape    20-100 GB

The top layer consists of the registers internal to the CPU. They are made of the same material as the CPU and are thus just as fast as the CPU. Consequently, there is no delay in accessing them. The storage capacity available in them is typically 32 x 32 bits on a 32-bit CPU and 64 x 64 bits on a 64-bit CPU, less than 1 KB in both cases. Programs must manage the registers (i.e., decide what to keep in them) themselves, in software.

Next comes the cache memory, which is mostly controlled by the hardware. Main memory is divided up into cache lines, typically 64 bytes, with addresses 0 to 63 in cache line 0, addresses 64 to 127 in cache line 1, and so on. The most heavily used cache lines are kept in a high-speed cache located inside or very close to the CPU. When the program needs to read a memory word, the cache hardware checks to see if the line needed is in the cache. If it is, called a cache hit, the request is satisfied from the cache and no memory request is sent over the bus to the main memory. Cache hits normally take about two clock cycles. Cache misses have to go to memory, with a substantial time penalty. Cache memory is limited in size due to its high cost. Some machines have two or even three levels of cache, each one slower and bigger than the one before it.

Main memory comes next. This is the workhorse of the memory system. Main memory is often called RAM (Random Access Memory). Old timers sometimes call it core memory, because computers in the 1950s and 1960s used tiny magnetizable ferrite cores for main memory. Currently, memories are tens to hundreds of megabytes and growing rapidly. All CPU requests that cannot be satisfied out of the cache go to main memory.

Next in the hierarchy is magnetic disk (hard disk). Disk storage is two orders of magnitude cheaper than RAM per bit and often two orders of magnitude larger as well. The only problem is that the time to randomly access data on it is close to three orders of magnitude slower. This low speed is due to the fact that a disk is a mechanical device, as shown in Fig. 1-8.

A disk consists of one or more metal platters that rotate at 5400, 7200, or 10,800 rpm. A mechanical arm pivots over the platters from the corner, similar to the pickup arm on an old 33 rpm phonograph for playing vinyl records. Information is written onto the disk in a series of concentric circles. At any given arm position, each of the heads can read an annular region called a track. Together, all the tracks for a given arm position form a cylinder.

Each track is divided into some number of sectors, typically 512 bytes per sector. On modern disks, the outer cylinders contain more sectors than the inner ones. Moving the arm from one cylinder to the next one takes about 1 msec. Moving it to a random cylinder typically takes 5 msec to 10 msec, depending on the drive. Once the arm is on the correct track, the drive must wait for the needed sector to rotate under the head, an additional delay of 5 msec to 10 msec, depending on the drive's rpm. Once the sector is under the head, reading or writing occurs at a rate of 5 MB/sec on low-end disks to 160 MB/sec on faster ones.
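Looking back at the cache description earlier in this section, the mapping from addresses to 64-byte cache lines and the notion of hits and misses can be sketched in a few lines. The text does not say how a full cache chooses a line to discard, so the least-recently-used policy below is an assumption of ours, as are the class name TinyCache and the two-line capacity.

```python
LINE_SIZE = 64   # bytes per cache line, as in the text

def cache_line(address):
    """Addresses 0-63 fall in line 0, 64-127 in line 1, and so on."""
    return address // LINE_SIZE

class TinyCache:
    """Holds a few cache lines; evicts the least recently used (assumed policy)."""
    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = []            # most recently used line is last
        self.hits = 0
        self.misses = 0

    def read(self, address):
        line = cache_line(address)
        if line in self.lines:
            self.hits += 1         # cache hit: no main memory request needed
            self.lines.remove(line)
        else:
            self.misses += 1       # cache miss: line must come from memory
            if len(self.lines) == self.capacity:
                self.lines.pop(0)  # make room by evicting the oldest line
        self.lines.append(line)

cache = TinyCache(capacity_lines=2)
for address in (0, 8, 63, 64, 127, 128, 0):
    cache.read(address)
print(cache.hits, cache.misses)    # prints: 3 4
```

Addresses 8 and 63 hit because they share line 0 with address 0, and 127 hits because it shares line 1 with 64. The final read of address 0 misses because line 0 was evicted when line 2 was brought in.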
[Figure 1-8. Structure of a disk drive. Labels: Read/write head (1 per surface); Surface 0 through Surface 7; direction of arm movement.]
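The disk timing figures quoted above (seek, rotational delay, and transfer rate) can be combined into a rough estimate of the time to read one random sector. The sketch below simply adds the three components. It assumes 1 MB means 1,000,000 bytes, and the function names are ours.

```python
SECTOR_BYTES = 512

def rotation_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000 / rpm

def access_time_ms(seek_ms, rotational_ms, n_bytes, transfer_mb_per_sec):
    """Seek + rotational delay + transfer time, in milliseconds.

    Assumes 1 MB = 1,000,000 bytes.
    """
    transfer_ms = n_bytes / (transfer_mb_per_sec * 1_000_000) * 1000
    return seek_ms + rotational_ms + transfer_ms

# Best and worst cases using the ranges given in the text.
best = access_time_ms(5, 5, SECTOR_BYTES, 160)
worst = access_time_ms(10, 10, SECTOR_BYTES, 5)
print(f"{best:.4f} msec to {worst:.4f} msec")
print(f"one revolution at 7200 rpm: {rotation_ms(7200):.2f} msec")
```

With these numbers a single-sector read costs roughly 10 to 20 msec, almost all of it mechanical delay. The transfer itself contributes about 0.1 msec at most, which is why the arm movement and rotation dominate random access to a disk.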

The final layer in the memory hierarchy is magnetic tape. This medium is often used as a backup for disk storage and for holding very large data sets. To access a tape, it must first be put into a tape reader, either by a person or a robot (automated tape handling is common at installations with huge databases). Then the tape may have to be spooled forward to get to the requested block. All in all, this could take minutes. The big plus of tape is that it is exceedingly cheap per bit and removable, which is important for backup tapes that must be stored off-site in order to survive fires, floods, earthquakes, etc.

The memory hierarchy we have discussed is typical, but some installations do not have all the layers or have a few different ones (such as optical disk). Still, in all of them, as one goes down the hierarchy, the random access time increases dramatically, the capacity increases equally dramatically, and the cost per bit drops enormously. Consequently, it is likely that memory hierarchies will be around for years to come.

In addition to the kinds of memory discussed above, many computers have a small amount of nonvolatile random access memory. Unlike RAM, nonvolatile memory does not lose its contents when the power is switched off. ROM (Read Only Memory) is programmed at the factory and cannot be changed afterward. It is fast and inexpensive. On some computers, the bootstrap loader used to start the computer is contained in ROM. Also, some I/O cards come with ROM for handling low-level device control.

EEPROM (Electrically Erasable ROM) and flash RAM are also nonvolatile, but in contrast to ROM can be erased and rewritten. However, writing them takes orders of magnitude more time than writing RAM, so they are used in the same way ROM is, only with the additional feature that it is now possible to correct bugs in programs they hold by rewriting them in the field.

Yet another kind of memory is CMOS, which is volatile. Many computers use CMOS memory to hold the current time and date. The CMOS memory and the clock circuit that increments the time in it are powered by a small battery, so the time is correctly updated, even when the computer is unplugged. The CMOS memory can also hold the configuration parameters, such as which disk to boot from. CMOS is used because it draws so little power that the original factory-installed battery often lasts for several years. However, when it begins to fail, the computer can appear to have Alzheimer's disease, forgetting things that it has known for years, like which hard disk to boot from.

Let us now focus on main memory for a little while. It is often desirable to hold multiple programs in memory at once. If one program is blocked waiting for a disk read to complete, another program can use the CPU, giving a better CPU utilization. However, with two or more programs in main memory at once, two problems must be solved:

1. How to protect the programs from one another and the kernel from them all.

2. How to handle relocation.

Many solutions are possible. However, all of them involve equipping the CPU with special hardware.

[...] impossible to reference any part of memory above itself. Thus this scheme solves both the protection and the relocation problem at the cost of two new registers and a slight increase in cycle time (to perform the limit check and addition).
[Figure 1-9. (a) One base-limit register pair. (b) Two base-limit register pairs. Labels: Address; Registers when program 1 is running; Registers when program 2 is running; User program and data; User-1 data; User-2 data; Base, Limit; Base-1, Limit-1; Base-2, Limit-2.]
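The limit check and addition performed with a base and a limit register can be sketched as follows. This is a software imitation of what the hardware does on every memory reference. The function name translate, the exception name ProtectionFault, and the register values in the example are hypothetical.

```python
class ProtectionFault(Exception):
    """Raised when a program references memory outside its own region."""

def translate(virtual_address, base, limit):
    """Check the address against the limit, then relocate it by the base."""
    if not 0 <= virtual_address < limit:
        raise ProtectionFault(f"address {virtual_address} outside limit {limit}")
    return base + virtual_address

# Hypothetical register settings: program loaded at 50000, 20000 bytes long.
physical = translate(10000, base=50000, limit=20000)
print(physical)                      # prints: 60000

try:
    translate(20000, base=50000, limit=20000)
except ProtectionFault as fault:
    print("fault:", fault)
```

The addition handles relocation, since the program can be written as if it started at address 0, and the comparison handles protection, since no address the program generates can reach past the end of its own region.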

The check and mapping result in converting an address generated by the program, called a virtual address, into an address used by the memory, called a physical address. The device that performs the check and mapping is called the MMU (Memory Management Unit). It is located on the CPU chip or close to it, but is logically between the CPU and the memory.

A more sophisticated MMU is illustrated in Fig. 1-9(b). Here we have an MMU with two pairs of base and limit registers, one for the program text and one for the data. The program counter and all other references to the program text use pair 1 and data references use pair 2. As a consequence, it is now possible to have multiple users share the same program with only one copy of it in memory, something not possible with the first scheme. When program 1 is running, the four registers are set as indicated by the arrows to the left of Fig. 1-9(b). When program 2 is running, they are set as indicated by the arrows to the right of the figure. Much more sophisticated MMUs exist. We will study some of them later in this [...]

1.4.3 I/O Devices
Memory is not the only resource that the operating system must manage. I/O devices also interact heavily with the operating system. As we saw in Fig. 1-5, I/O devices generally consist of two parts: a controller and the device itself. The controller is a chip or a set of chips on a plug-in board that physically controls the device. It accepts commands from the operating system, for example, to read data from the device, and carries them out.

In many cases, the actual control of the device is very complicated and detailed, so it is the job of the controller to present a simpler interface to the operating system. For example, a disk controller might accept a command to read sector 11,206 from disk 2. The controller then has to convert this linear sector number to a cylinder, sector, and head. This conversion may be complicated by the fact that outer cylinders have more sectors than inner ones and that some bad sectors have been remapped onto other ones. Then the controller has to determine which cylinder the disk arm is on and give it a sequence of pulses to move in or out the requisite number of cylinders. It has to wait until the proper sector has rotated under the head and then start reading and storing the bits as they come off the drive, removing the preamble and computing the checksum. Finally, it has to assemble the incoming bits into words and store them in memory. To do all this work, controllers often contain small embedded computers that are programmed to do their work.

The other piece is the actual device itself. Devices have fairly simple interfaces, both because they cannot do much and to make them standard. The latter is needed so that any IDE disk controller can handle any IDE disk, for example. IDE stands for Integrated Drive Electronics and is the standard type of disk on Pentiums and some other computers. Since the actual device interface is hidden behind the controller, all that the operating system sees is the interface to the controller, which may be quite different from the interface to the device.

Because each type of controller is different, different software is needed to control each one. The software that talks to a controller, giving it commands and accepting responses, is called a device driver. Each controller manufacturer has to supply a driver for each operating system it supports. Thus a scanner may come with drivers for Windows 98, Windows 2000, and UNIX, for example.

To be used, the driver has to be put into the operating system so it can run in kernel mode. Theoretically, drivers can run outside the kernel, but few current systems support this possibility because it requires the ability to allow a user-space driver to access the device in a controlled way, a feature rarely supported. There are three ways the driver can be put into the kernel. The first way is to relink the kernel with the new driver and then reboot the system. Many UNIX systems work like this. The second way is to make an entry in an operating system file telling it that it needs the driver and then reboot the system. At boot time, the operating system goes and finds the drivers it needs and loads them. Windows works this way. The third way is for the operating system to be able to accept new drivers while running and install them on-the-fly without the need to reboot. This way used to be rare but is becoming much more common now. Hot pluggable devices, such as USB and IEEE 1394 devices (discussed below), always need dynamically loaded drivers.

Every controller has a small number of registers that are used to communicate with it. For example, a minimal disk controller might have registers for specifying the disk address, memory address, sector count, and direction (read or write).
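The conversion of a linear sector number into a cylinder, head, and sector, mentioned above as one of the controller's jobs, is simple arithmetic when every track holds the same number of sectors. The sketch below makes exactly that simplifying assumption, so it ignores both complications named in the text (more sectors on outer cylinders and remapped bad sectors). The geometry of 16 heads and 63 sectors per track is hypothetical, and sectors are numbered from 1 within a track by convention.

```python
def to_chs(linear_sector, heads, sectors_per_track):
    """Convert a linear sector number to (cylinder, head, sector).

    Assumes a uniform geometry: every track has the same number of sectors.
    """
    sectors_per_cylinder = heads * sectors_per_track
    cylinder = linear_sector // sectors_per_cylinder
    head = (linear_sector // sectors_per_track) % heads
    sector = linear_sector % sectors_per_track + 1   # sectors count from 1
    return cylinder, head, sector

# Sector 11,206 from the example in the text, on a hypothetical geometry.
chs = to_chs(11206, heads=16, sectors_per_track=63)
print(chs)   # prints: (11, 1, 56)
```

A real controller has to consult tables describing the zones of the disk and the list of remapped sectors before it can do this calculation, which is part of why it is hidden behind a simpler interface.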
To activate the controller, the driver gets a command from the operating system, then translates it into the appropriate values to write into the device registers.

On some computers, the device registers are mapped into the operating system's address space, so they can be read and written like ordinary memory words. On such computers, no special I/O instructions are needed and user programs can be kept away from the hardware by not putting these memory addresses within their reach (e.g., by using base and limit registers). On other computers, the device registers are put in a special I/O port space, with each register having a port address. On these machines, special IN and OUT instructions are available in kernel mode to allow drivers to read and write the registers. The former scheme eliminates the need for special I/O instructions but uses up some of the address space. The latter uses no address space but requires special instructions. Both systems are widely used.

Input and output can be done in three different ways. In the simplest method, a user program issues a system call, which the kernel then translates into a procedure call to the appropriate driver. The driver then starts the I/O and sits in a tight loop continuously polling the device to see if it is done (usually there is some bit that indicates that the device is still busy). When the I/O has completed, the driver puts the data where they are needed (if any), and returns. The operating system then returns control to the caller. This method is called busy waiting and has the disadvantage of tying up the CPU polling the device until it is finished.

The second method is for the driver to start the device and ask it to give an interrupt when it is finished. At that point the driver returns. The operating system then blocks the caller if need be and looks for other work to do. When the controller detects the end of the transfer, it generates an interrupt to signal completion.

Interrupts are very important in operating systems, so let us examine the idea more closely. In Fig. 1-10(a) we see a four-step process for I/O. In step 1, the driver tells the controller what to do by writing into its device registers. The controller then starts the device and, when the transfer is finished, signals the interrupt controller chip using certain bus lines in step 2. If the interrupt controller is prepared to accept the interrupt (which it may not be if it is busy with a higher priority one), it asserts a pin on the CPU chip informing it, in step 3. In step 4, the interrupt controller puts the number of the device on the bus so the CPU can read it and know which device has just finished (many devices may be running at the same time).
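The first of these methods, busy waiting, can be sketched with a stand-in for the controller. Everything about FakeDevice below is invented for illustration: a real driver would read and write hardware registers, whereas here the "status register" simply reports busy for a fixed number of polls.

```python
class FakeDevice:
    """Stand-in for a controller: busy for a fixed number of status reads."""
    def __init__(self, polls_until_done, data):
        self._remaining = polls_until_done
        self._data = data
        self.command = None

    def write_command(self, command):    # driver writes a device register
        self.command = command

    def busy(self):                      # driver reads the status bit
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False

    def read_data(self):
        return self._data

def read_with_busy_waiting(device):
    """Start the I/O, then poll in a tight loop until the device is done."""
    device.write_command("READ")
    polls = 0
    while device.busy():                 # the CPU does nothing useful here
        polls += 1
    return device.read_data(), polls

data, polls = read_with_busy_waiting(FakeDevice(3, b"sector"))
print(data, polls)                       # prints: b'sector' 3
```

Every pass through the loop is CPU time that could have gone to another program, which is the disadvantage that the interrupt-driven method avoids.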
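[Figure 1-10. (a) The steps in starting an I/O device and getting an interrupt. (b) The steps in processing the interrupt. Labels recoverable from the figure: Disk drive, Disk controller, 2. Dispatch.]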

Once the CPU has decided to take the interrupt, the program counter and PSW are typically then pushed onto the current stack and the CPU switched into kernel mode. The device number may be used as an index into part of memory to find the address of the interrupt handler for this device. This part of memory is called the interrupt vector. Once the interrupt handler (part of the driver for the interrupting device) has started, it removes the stacked program counter and PSW and saves them, then queries the device to learn its status. When the handler is all finished, it returns to the previously-running user program to the first instruction that was not yet executed. These steps are shown in Fig. 1-10(b).
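The sequence just described (push the program counter and PSW, switch to kernel mode, index the interrupt vector with the device number, run the handler, resume the interrupted program) can be imitated in a few lines. The class ToyCPU and the two handlers are hypothetical. As a simplification, the saved state here stays on the stack until the return, rather than being removed and saved elsewhere by the handler.

```python
class ToyCPU:
    """A toy model of interrupt entry and return, not a real processor."""
    def __init__(self, interrupt_vector):
        self.pc = 0
        self.psw = {"mode": "user"}
        self.stack = []
        self.interrupt_vector = interrupt_vector

    def take_interrupt(self, device_number):
        # Hardware part: push PC and PSW, then switch to kernel mode.
        self.stack.append((self.pc, dict(self.psw)))
        self.psw["mode"] = "kernel"
        # The device number indexes the interrupt vector to find the handler.
        handler = self.interrupt_vector[device_number]
        result = handler(self)
        # Return from interrupt: restore the saved PC and PSW.
        self.pc, self.psw = self.stack.pop()
        return result

def keyboard_handler(cpu):
    return ("keyboard", cpu.psw["mode"])

def disk_handler(cpu):
    return ("disk", cpu.psw["mode"])

cpu = ToyCPU(interrupt_vector=[keyboard_handler, disk_handler])
cpu.pc = 1234                            # pretend a user program is running
result = cpu.take_interrupt(1)           # device 1 (the disk) has finished
print(result, cpu.pc, cpu.psw["mode"])   # prints: ('disk', 'kernel') 1234 user
```

The handler sees kernel mode, while the interrupted program finds its program counter and mode exactly as it left them.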

The third method for doing I/O makes use of a special DMA (Direct Memory Access) chip that can control the flow of bits between memory and some controller without constant CPU intervention. The CPU sets up the DMA chip, telling it how many bytes to transfer, the device and memory addresses involved, and the direction, and lets it go. When the DMA chip is done, it causes an interrupt, which is handled as described above. DMA and I/O hardware in general will be discussed in more detail in Chap. 5.

Interrupts can often happen at highly inconvenient moments, for example, while another interrupt handler is running. For this reason, the CPU has a way to disable interrupts and then reenable them later. While interrupts are disabled, any devices that finish continue to assert their interrupt signals, but the CPU is not interrupted until interrupts are enabled again. If multiple devices finish while interrupts are disabled, the interrupt controller decides which one to let through first, usually based on static priorities assigned to each device. The highest priority device wins.

1.4.4 Buses
The organization of Fig. 1-5 was used on minicomputers for years and also on the original IBM PC. However, as processors and memories got faster, the ability of a single bus (and certainly the IBM PC bus) to handle all the traffic was strained to the breaking point. Something had to give. As a result, additional buses were added, both for faster I/O devices and for CPU to memory traffic. As a consequence of this evolution, a large Pentium system currently looks something like Fig. 1-11.

[Figure 1-11: a large Pentium system. Figure not reproduced; surviving labels: cache bus, local bus, memory bus, level 2 cache, CPU, PCI bridge, PCI bus, ISA bridge, IDE disk, available PCI slot, ISA bus, printer, available ISA slot.]

This system has eight buses (cache, local, memory, PCI, SCSI, USB, IDE, and ISA), each with a different transfer rate and function. The operating system must be aware of all of them for configuration and management.

The two main buses are the original IBM PC ISA (Industry Standard Architecture) bus and its successor, the PCI (Peripheral Component Interconnect) bus. The ISA bus, which was originally the IBM PC/AT bus, runs at 8.33 MHz and can transfer 2 bytes at once, for a maximum speed of 16.67 MB/sec. It is included for backward compatibility with old and slow I/O cards. The PCI bus was invented by Intel as a successor to the ISA bus. It can run at 66 MHz and transfer 8 bytes at a time, for a data rate of 528 MB/sec. Most high-speed I/O devices use the PCI bus now. Even some non-Intel computers use the PCI bus due to the large number of I/O cards available for it.

In this configuration, the CPU talks to the PCI bridge chip over the local bus, and the PCI bridge chip talks to the memory over a dedicated memory bus, often running at 100 MHz. Pentium systems have a level-1 cache on chip and a much larger level-2 cache off chip, connected to the CPU by the cache bus.

In addition, this system contains three specialized buses: IDE, USB, and SCSI. The IDE bus is for attaching peripheral devices such as disks and CD-ROMs to the system. The IDE bus is an outgrowth of the disk controller interface on the PC/AT and is now standard on nearly all Pentium-based systems for the hard disk and often the CD-ROM.

The USB (Universal Serial Bus) was invented to attach all the slow I/O devices, such as the keyboard and mouse, to the computer. It uses a small connector with four wires, two of which supply electrical power to the USB devices. USB is a centralized bus in which a root device polls the I/O devices every 1 msec to see if they have any traffic. It can handle an aggregate load of 1.5 MB/sec. All the USB devices share a single USB device driver, making it unnecessary to install a new driver for each new USB device. Consequently, USB devices can be added to the computer without the need to reboot.

The SCSI (Small Computer System Interface) bus is a high-performance bus intended for fast disks, scanners, and other devices needing considerable bandwidth. It can run at up to 160 MB/sec. It has been present on Macintosh systems since they were invented and is also popular on UNIX and some Intel-based systems.

Yet another bus (not shown in Fig. 1-11) is IEEE 1394. Sometimes it is called FireWire, although strictly speaking, FireWire is the name Apple uses for its implementation of 1394. Like USB, IEEE 1394 is bit serial but is designed for

packet transfers at speeds up to 50 MB/sec, making it useful for connecting digital camcorders and similar multimedia devices to a computer. Unlike USB, IEEE 1394 does not have a central controller. SCSI and IEEE 1394 face competition from a faster version of USB being developed.

To work in an environment such as that of Fig. 1-11, the operating system has to know what is out there and configure it. This requirement led Intel and Microsoft to design a PC system called plug and play, based on a similar concept first implemented in the Apple Macintosh. Before plug and play, each I/O card had a fixed interrupt request level and fixed addresses for its I/O registers. For example, the keyboard was interrupt 1 and used I/O addresses 0x60 to 0x64, the floppy disk controller was interrupt 6 and used I/O addresses 0x3F0 to 0x3F7, and the printer was interrupt 7 and used I/O addresses 0x378 to 0x37A, and so on.

So far, so good. The trouble came when the user bought a sound card and a modem card and both happened to use, say, interrupt 4. They would conflict and would not work together. The solution was to include DIP switches or jumpers on every I/O card and instruct the user to please set them to select an interrupt level and I/O device addresses that did not conflict with any others in the user's system. Teenagers who devoted their lives to the intricacies of the PC hardware could sometimes do this without making errors. Unfortunately, nobody else could, leading to chaos.

What plug and play does is have the system automatically collect information about the I/O devices, centrally assign interrupt levels and I/O addresses, and then tell each card what its numbers are. Very briefly, that works as follows on the Pentium. Every Pentium contains a parentboard (formerly called a motherboard before political correctness hit the computer industry). On the parentboard is a program called the system BIOS (Basic Input Output System). The BIOS contains low-level I/O software, including procedures to read the keyboard, write to the screen, and do disk I/O, among other things. Nowadays, it is held in a flash RAM, which is nonvolatile but which can be updated by the operating system when bugs are found in the BIOS.

When the computer is booted, the BIOS is started. It first checks to see how much RAM is installed and whether the keyboard and other basic devices are installed and responding correctly. It starts out by scanning the ISA and PCI buses to detect all the devices attached to them. Some of these devices are typically legacy (i.e., designed before plug and play was invented) and have fixed interrupt levels and I/O addresses (possibly set by switches or jumpers on the I/O card, but not modifiable by the operating system). These devices are recorded. The plug and play devices are also recorded. If the devices present are different from when the system was last booted, the new devices are configured.

The BIOS then determines the boot device by trying a list of devices stored in the CMOS memory. The user can change this list by entering a BIOS configuration program just after booting. Typically, an attempt is made to boot from the floppy disk. If that fails the CD-ROM is tried. If neither a floppy nor a CD-ROM

1.5 OPERATING SYSTEM CONCEPTS
All operating systems have certain basic concepts such as processes, memory, and files that are central to understanding them. In the following sections, we will look at some of these basic concepts ever so briefly, as an introduction. We will come back to each of them in great detail later in this book. To illustrate these concepts we will use examples from time to time, generally drawn from UNIX. Similar examples typically exist in other systems as well, however.

1.5.1 Processes
A key concept in all operating systems is the process. A process is basically a program in execution. Associated with each process is its address space, a list of memory locations from some minimum (usually 0) to some maximum, which the process can read and write. The address space contains the executable program, the program's data, and its stack. Also associated with each process is some set of registers, including the program counter, stack pointer, and other hardware registers, and all the other information needed to run the program.

We will come back to the process concept in much more detail in Chap. 2, but for the time being, the easiest way to get a good intuitive feel for a process is to think about timesharing systems. Periodically, the operating system decides to stop running one process and start running another, for example, because the first one has had more than its share of CPU time in the past second.

When a process is suspended temporarily like this, it must later be restarted in exactly the same state it had when it was stopped. This means that all information about the process must be explicitly saved somewhere during the suspension. For example, the process may have several files open for reading at once. Associated with each of these files is a pointer giving the current position (i.e., the number of the byte or record to be read next). When a process is temporarily suspended, all these pointers must be saved so that a read call executed after the process is restarted will read the proper data. In many operating systems, all the information about each process, other than the contents of its own address space, is stored in an operating system table called the process table, which is an array (or linked list) of structures, one for each process currently in existence.

Thus, a (suspended) process consists of its address space, usually called the core image (in honor of the magnetic core memories used in days of yore), and its process table entry, which contains its registers, among other things.

The key process management system calls are those dealing with the creation and termination of processes. Consider a typical example. A process called the command interpreter or shell reads commands from a terminal. The user has just typed a command requesting that a program be compiled. The shell must now create a new process that will run the compiler. When that process has finished the compilation, it executes a system call to terminate itself.

If a process can create one or more other processes (referred to as child processes) and these processes in turn can create child processes, we quickly arrive at the process tree structure of Fig. 1-12. Related processes that are cooperating to get some job done often need to communicate with one another and synchronize their activities. This communication is called interprocess communication, and will be addressed in detail in Chap. 2.

Figure 1-12. A process tree. Process A created two child processes, B and C. Process B created three child processes, D, E, and F.

Other process system calls are available to request more memory (or release unused memory), wait for a child process to terminate, and overlay its program with a different one.

Occasionally, there is a need to convey information to a running process that is not sitting around waiting for this information. For example, a process that is communicating with another process on a different computer does so by sending messages to the remote process over a computer network. To guard against the possibility that a message or its reply is lost, the sender may request that its own operating system notify it after a specified number of seconds, so that it can retransmit the message if no acknowledgement has been received yet. After setting this timer, the program may continue doing other work.

When the specified number of seconds has elapsed, the operating system sends an alarm signal to the process. The signal causes the process to temporarily suspend whatever it was doing, save its registers on the stack, and start running a special signal handling procedure, for example, to retransmit a presumably lost message. When the signal handler is done, the running process is restarted in the state it was in just before the signal. Signals are the software analog of hardware interrupts and can be generated by a variety of causes in addition to timers expiring. Many traps detected by hardware, such as executing an illegal instruction or using an invalid address, are also converted into signals to the guilty process.

Each person authorized to use a system is assigned a UID (User IDentification) by the system administrator. Every process started has the UID of the person who started it. A child process has the same UID as its parent. Users can be members of groups, each of which has a GID (Group IDentification).

One UID, called the superuser (in UNIX), has special power and may violate many of the protection rules. In large installations, only the system administrator knows the password needed to become superuser, but many of the ordinary users (especially students) devote considerable effort to trying to find flaws in the system that allow them to become superuser without the password.

We will study processes, interprocess communication, and related issues in Chap. 2.

1.5.2 Deadlocks
When two or more processes are interacting, they can sometimes get themselves into a stalemate situation they cannot get out of. Such a situation is called a deadlock.

Deadlocks can best be introduced with a real-world example everyone is familiar with, deadlock in traffic. Consider the situation of Fig. 1-13(a). Here four buses are approaching an intersection. Behind each one are more buses (not shown). With a little bit of bad luck, the first four could all arrive at the intersection simultaneously, leading to the situation of Fig. 1-13(b), in which they are deadlocked because none of them can go forward. Each one is blocking one of the others. They cannot go backward due to other buses behind them. There is no easy way out.

Processes in a computer can experience an analogous situation in which they cannot make any progress. For example, imagine a computer with a tape drive and CD-recorder. Now imagine that two processes each need to produce a CD-ROM from data on a tape. Process 1 requests and is granted the tape drive. Next process 2 requests and is granted the CD-recorder. Then process 1 requests the CD-recorder and is suspended until process 2 returns it. Finally, process 2 requests the tape drive and is also suspended because process 1 already has it. Here we have a deadlock from which there is no escape. We will study deadlocks and what can be done about them in detail in Chap. 3.

1.5.3 Memory Management
Every computer has some main memory that it uses to hold executing programs. In a very simple operating system, only one program at a time is in memory. To run a second program, the first one has to be removed and the second one placed in memory.

More sophisticated operating systems allow multiple programs to be in memory at the same time. To keep them from interfering with one another (and with the operating system), some kind of protection mechanism is needed. While this mechanism has to be in the hardware, it is controlled by the operating system.

The above viewpoint is concerned with managing and protecting the computer's main memory. A different, but equally important memory-related issue, is managing the address space of the processes. Normally, each process has some set of addresses it can use, typically running from 0 up to some maximum. In the simplest case, the maximum amount of address space a process has is less than the main memory. In this way, a process can fill up its address space and there will be enough room in main memory to hold it all.

However, on many computers addresses are 32 or 64 bits, giving an address space of 2^32 or 2^64 bytes, respectively. What happens if a process has more address space than the computer has main memory and the process wants to use it all? In the first computers, such a process was just out of luck. Nowadays, a technique called virtual memory exists, in which the operating system keeps part of the address space in main memory and part on disk and shuttles pieces back and forth between them as needed. This important operating system function, and other memory management-related functions, will be covered in Chap. 4.

1.5.4 Input/Output

All computers have physical devices for acquiring input and producing output. After all, what good would a computer be if the users could not tell it what to do and could not get the results after it did the work requested? Many kinds of input and output devices exist, including keyboards, monitors, printers, and so on. It is up to the operating system to manage these devices.

Consequently, every operating system has an I/O subsystem for managing its I/O devices. Some of the I/O software is device independent, that is, applies to many or all devices equally well. Other parts of it, such as device drivers, are specific to particular I/O devices. In Chap. 5 we will have a look at I/O software.

1.5.5 Files
Another key concept supported by virtually all operating systems is the file system. As noted before, a major function of the operating system is to hide the peculiarities of the disks and other I/O devices and present the programmer with a nice, clean abstract model of device-independent files. System calls are obviously needed to create files, remove files, read files, and write files. Before a file can be read, it must be located on the disk and opened, and after it has been read it should be closed, so calls are provided to do these things.

To provide a place to keep files, most operating systems have the concept of a directory as a way of grouping files together. A student, for example, might have one directory for each course he is taking (for the programs needed for that course), another directory for his electronic mail, and still another directory for his World Wide Web home page. System calls are then needed to create and remove directories. Calls are also provided to put an existing file in a directory, and to remove a file from a directory. Directory entries may be either files or other directories. This model also gives rise to a hierarchy, the file system, as shown in Fig. 1-14.

The process and file hierarchies both are organized as trees, but the similarity stops there. Process hierarchies usually are not very deep (more than three levels is unusual), whereas file hierarchies are commonly four, five, or even more levels deep. Process hierarchies are typically short-lived, generally a few minutes at most, whereas the directory hierarchy may exist for years. Ownership and protection also differ for processes and files. Typically, only a parent process may control or even access a child process, but mechanisms nearly always exist to allow files and directories to be read by a wider group than just the owner.

Every file within the directory hierarchy can be specified by giving its path name from the top of the directory hierarchy, the root directory. Such absolute path names consist of the list of directories that must be traversed from the root directory to get to the file, with slashes separating the components. In Fig. 1-14,

Figure 1-14. A file system for a university department.

the path for file CS101 is /Faculty/Prof.Brown/Courses/CS101. The leading slash indicates that the path is absolute, that is, starting at the root directory. As an aside, in MS-DOS and Windows, the backslash (\) character is used as the separator instead of the slash (/) character, so the file path given above would be written as \Faculty\Prof.Brown\Courses\CS101. Throughout this book we will generally use the UNIX convention for paths.

At every instant, each process has a current working directory, in which path names not beginning with a slash are looked for. As an example, in Fig. 1-14, if /Faculty/Prof.Brown were the working directory, then use of the path name Courses/CS101 would yield the same file as the absolute path name given above. Processes can change their working directory by issuing a system call specifying the new working directory.

Before a file can be read or written, it must be opened, at which time the permissions are checked. If the access is permitted, the system returns a small integer called a file descriptor to use in subsequent operations. If the access is prohibited, an error code is returned.

Another important concept in UNIX is the mounted file system. Nearly all personal computers have one or more floppy disk drives into which floppy disks can be inserted and removed. To provide an elegant way to deal with removable

media (including CD-ROMs), UNIX allows the file system on a floppy disk to be attached to the main tree. Consider the situation of Fig. 1-15(a). Before the mount call, the root file system, on the hard disk, and a second file system, on a floppy disk, are separate and unrelated.

However, the file system on the floppy cannot be used, because there is no way to specify path names on it. UNIX does not allow path names to be prefixed by a drive name or number; that would be precisely the kind of device dependence that operating systems ought to eliminate. Instead, the mount system call allows the file system on the floppy to be attached to the root file system wherever the program wants it to be. In Fig. 1-15(b) the file system on the floppy has been mounted on directory b, thus allowing access to files /b/x and /b/y. If directory b had contained any files they would not be accessible while the floppy was mounted, since /b would refer to the root directory of the floppy. (Not being able to access these files is not as serious as it at first seems: file systems are nearly always mounted on empty directories.) If a system contains multiple hard disks, they can all be mounted into a single tree as well.

Another important concept in UNIX is the special file. Special files are provided in order to make I/O devices look like files. That way, they can be read and written using the same system calls as are used for reading and writing files. Two kinds of special files exist: block special files and character special files. Block special files are used to model devices that consist of a collection of randomly addressable blocks, such as disks. By opening a block special file and reading, say, block 4, a program can directly access the fourth block on the device, without regard to the structure of the file system contained on it. Similarly, character special files are used to model printers, modems, and other devices that accept or output a character stream. By convention, the special files are kept in the /dev directory. For example, /dev/lp might be the line printer.

The last feature we will discuss in this overview is one that relates to both processes and files: pipes. A pipe is a sort of pseudofile that can be used to connect two processes, as shown in Fig. 1-16. If processes A and B wish to talk using a pipe, they must set it up in advance. When process A wants to send data to process B, it writes on the pipe as though it were an output file. Process B can read the data by reading from the pipe as though it were an input file. Thus, communication between processes in UNIX looks very much like ordinary file reads and writes. Stronger yet, the only way a process can discover that the output file it is writing on is not really a file, but a pipe, is by making a special system call.

File systems are very important. We will have much more to say about them in Chap. 6 and also in Chaps. 10 and 11.

Figure 1-16. Two processes connected by a pipe.

1.5.6 Security
Computers contain large amounts of information that users often want to keep confidential. This information may include electronic mail, business plans, tax returns, and much more. It is up to the operating system to manage the system security so that files, for example, are only accessible to authorized users.

As a simple example, just to get an idea of how security can work, consider UNIX. Files in UNIX are protected by assigning each one a 9-bit binary protection code. The protection code consists of three 3-bit fields, one for the owner, one for other members of the owner's group (users are divided into groups by the system administrator), and one for everyone else. Each field has a bit for read access, a bit for write access, and a bit for execute access. These 3 bits are known as the rwx bits. For example, the protection code rwxr-x--x means that the owner can read, write, or execute the file, other group members can read or execute (but not write) the file, and everyone else can execute (but not read or write) the file. For a directory, x indicates search permission. A dash means that the corresponding permission is absent.

In addition to file protection, there are many other security issues. Protecting the system from unwanted intruders, both human and nonhuman (e.g., viruses) is one of them. We will look at various security issues in Chap. 9.

1.5.7 The Shell
The operating system is the code that carries out the system calls. Editors, compilers, assemblers, linkers, and command interpreters definitely are not part of the operating system, even though they are important and useful. At the risk of confusing things somewhat, in this section we will look briefly at the UNIX command interpreter, called the shell. Although it is not part of the operating system, it makes heavy use of many operating system features and thus serves as a good example of how the system calls can be used. It is also the primary interface between a user sitting at his terminal and the operating system, unless the user is using a graphical user interface. Many shells exist, including sh, csh, ksh, and bash. All of them support the functionality described below, which derives from the original shell (sh).

When any user logs in, a shell is started up. The shell has the terminal as standard input and standard output. It starts out by typing the prompt, a character such as a dollar sign, which tells the user that the shell is waiting to accept a command. If the user now types
date

for example, the shell creates a child process and runs the date program as the child. While the child process is running, the shell waits for it to terminate. When the child finishes, the shell types the prompt again and tries to read the next input line.

The user can specify that standard output be redirected to a file, for example,
date >file

Similarly, standard input can be redirected, as in

sort <file1 >file2

which invokes the sort program with input taken from file1 and output sent to file2.

The output of one program can be used as the input for another program by connecting them with a pipe. Thus

cat file1 file2 file3 | sort >/dev/lp

invokes the cat program to concatenate three files and send the output to sort to arrange all the lines in alphabetical order. The output of sort is redirected to the file /dev/lp, typically the printer.

If a user puts an ampersand after a command, the shell does not wait for it to complete. Instead it just gives a prompt immediately. Consequently,

cat file1 file2 file3 | sort >/dev/lp &

starts up the sort as a background job, allowing the user to continue working normally while the sort is going on.

The shell has a number of other interesting features, which we do not have space to discuss here. Most books on UNIX discuss the shell at some length (e.g., Kernighan and Pike, 1984; Kochan and Wood, 1990; Medinets, 1999; Newham and Rosenblatt, 1998; and Robbins, 1999).


1.5.8 Recycling of Concepts
Computer science, like many fields, is largely technology driven. The reason the ancient Romans lacked cars is not that they liked walking so much. It is because they did not know how to build cars. Personal computers exist not because millions of people had some long pent-up desire to own a computer, but because it is now possible to manufacture them cheaply. We often forget how much technology affects our view of systems and it is worth reflecting on this point from time to time.

In particular, it frequently happens that a change in technology renders some idea obsolete and it quickly vanishes. However, another change in technology could revive it again. This is especially true when the change has to do with the relative performance of different parts of the system. For example, when CPUs became much faster than memories, caches became important to speed up the "slow" memory. If new memory technology some day makes memories much faster than CPUs, caches will vanish. And if a new CPU technology makes them faster than memories again, caches will reappear. In biology, extinction is forever, but in computer science, it is sometimes only for a few years.

As a consequence of this impermanence, in this book we will from time to time look at "obsolete" concepts, that is, ideas that are not optimal with current technology. However, changes in the technology may bring back some of the so-called "obsolete concepts." For this reason, it is important to understand why a concept is obsolete and what changes in the environment might bring it back again.

To make this point clearer, let us consider a few examples. Early computers had hardwired instruction sets. The instructions were executed directly by hardware and could not be changed. Then came microprogramming, in which an underlying interpreter carried out the instructions in software. Hardwired execution became obsolete. Then RISC computers were invented, and microprogramming (i.e., interpreted execution) became obsolete because direct execution was faster. Now we are seeing the resurgence of interpretation in the form of Java applets that are sent over the Internet and interpreted upon arrival. Execution speed is not always crucial because network delays are so great that they tend to dominate. But that could change, too, some day.

Early operating systems allocated files on the disk by just placing them in contiguous sectors, one after another. Although this scheme was easy to implement, it was not flexible because when a file grew, there was not enough room to store it any more. Thus the concept of contiguously allocated files was discarded as obsolete. Until CD-ROMs came around. There the problem of growing files did not exist. All of a sudden, the simplicity of contiguous file allocation was seen as a great idea and CD-ROM file systems are now based on it.

As our final idea, consider dynamic linking. The MULTICS system was designed to run day and night without ever stopping. To fix bugs in software, it was necessary to have a way to replace library procedures while they were being used. The concept of dynamic linking was invented for this purpose. After MULTICS died, the concept was forgotten for a while. However, it was rediscovered when modern operating systems needed a way to allow many programs to share the same library procedures without having their own private copies (because graphics libraries had grown so large). Most systems now support some form of dynamic linking once again. The list goes on, but these examples should make the point: an idea that is obsolete today may be the star of the party tomorrow.

Technology is not the only factor that drives systems and software. Economics plays a big role too. In the 1960s and 1970s, most terminals were mechanical printing terminals or 25 x 80 character-oriented CRTs rather than bitmap graphics terminals. This choice was not a question of technology. Bitmap graphics terminals were in use before 1960. It is just that they cost many tens of thousands of dollars each. Only when the price came down enormously could people (other than the military) think of dedicating one terminal to an individual user.

1.6 SYSTEM CALLS
The interface between the operating system and the user programs is defined by the set of system calls that the operating system provides. To really understand what operating systems do, we must examine this interface closely. The system calls available in the interface vary from operating system to operating system (although the underlying concepts tend to be similar).

We are thus forced to make a choice between (1) vague generalities ("operating systems have system calls for reading files") and (2) some specific system ("UNIX has a read system call with three parameters: one to specify the file, one to tell where the data are to be put, and one to tell how many bytes to read").

We have chosen the latter approach. It's more work that way, but it gives more insight into what operating systems really do. Although this discussion specifically refers to POSIX (International Standard 9945-1), hence also to UNIX, System V, BSD, Linux, MINIX, etc., most other modern operating systems have system calls that perform the same functions, even if the details differ. Since the actual mechanics of issuing a system call are highly machine dependent and often must be expressed in assembly code, a procedure library is provided to make it possible to make system calls from C programs and often from other languages as well.

It is useful to keep the following in mind. Any single-CPU computer can execute only one instruction at a time. If a process is running a user program in user mode and needs a system service, such as reading data from a file, it has to execute a trap or system call instruction to transfer control to the operating system. The operating system then figures out what the calling process wants by inspecting the parameters. Then it carries out the system call and returns control to the instruction following the system call. In a sense, making a system call is like making a special kind of procedure call, only system calls enter the kernel and procedure calls do not.

To make the system call mechanism clearer, let us take a quick look at the read system call. As mentioned above, it has three parameters: the first one specifying the file, the second one pointing to the buffer, and the third one giving the number of bytes to read. Like nearly all system calls, it is invoked from C programs by calling a library procedure with the same name as the system call: read. A call from a C program might look like this:
count = read(fd, buffer, nbytes);

The system call (and the library procedure) return the number of bytes actually read in count. This value is normally the same as nbytes, but may be smaller, if, for example, end-of-file is encountered while reading. If the system call cannot be carried out, either due to an invalid parameter or a disk error, count is set to -1, and the error number is put in a global variable, errno. Programs should always check the results of a system call to see if an error occurred.

System calls are performed in a series of steps. To make this concept clearer, let us examine the read call discussed above. In preparation for calling the read library procedure, which actually makes the read system call, the calling program first pushes the parameters onto the stack, as shown in steps 1-3 in Fig. 1-17. C and C++ compilers push the parameters onto the stack in reverse order for historical reasons (having to do with making the first parameter to printf, the format string, appear on top of the stack). The first and third parameters are called by value, but the second parameter is passed by reference, meaning that the address of the buffer (indicated by &) is passed, not the contents of the buffer. Then comes the actual call to the library procedure (step 4). This instruction is the normal procedure call instruction used to call all procedures.

The library procedure, possibly written in assembly language, typically puts the system call number in a place where the operating system expects it, such as a register (step 5). Then it executes a TRAP instruction to switch from user mode to kernel mode and start execution at a fixed address within the kernel (step 6). The kernel code that starts examines the system call number and then dispatches to the correct system call handler, usually via a table of pointers to system call handlers indexed on system call number (step 7). At that point the system call handler runs (step 8). Once the system call handler has completed its work, control may be returned to the user-space library procedure at the instruction following the TRAP instruction (step 9). This procedure then returns to the user program in the usual way procedure calls return (step 10).

To finish the job, the user program has to clean up the stack, as it does after any procedure call (step 11). Assuming the stack grows downward, as it often does, the compiled code increments the stack pointer exactly enough to remove the parameters pushed before the call to read. The program is now free to do whatever it wants to do next.

Figure 1-17. The 11 steps in making the system call read(fd, buffer, nbytes).

In step 9 above, we said "may be returned to the user-space library procedure ..." for good reason. The system call may block the caller, preventing it from continuing. For example, if it is trying to read from the keyboard and nothing has been typed yet, the caller has to be blocked. In this case, the operating system will look around to see if some other process can be run next. Later, when the desired input is available, this process will get the attention of the system and steps 9-11 will occur.

In the following sections, we will examine some of the most heavily used POSIX system calls, or more specifically, the library procedures that make those system calls. POSIX has about 100 procedure calls. Some of the most important ones are listed in Fig. 1-18, grouped for convenience in four categories. In the text we will briefly examine each call to see what it does. To a large extent, the services offered by these calls determine most of what the operating system has to do, since the resource management on personal computers is minimal (at least compared to big machines with multiple users). The services include things like creating and terminating processes, creating, deleting, reading, and writing files, managing directories, and performing input and output.
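The return-value convention described earlier in this section for read (a byte count on success, -1 with the error number in errno on failure) can be sketched in C as follows. This is only an illustration under stated assumptions: the helper name read_some is hypothetical and does not come from the text, and a POSIX system providing open, read, and close is assumed.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from the text): read up to nbytes from the file
 * at path into buffer. Returns the count reported by read, or -1 with errno
 * describing the failure. */
ssize_t read_some(const char *path, char *buffer, size_t nbytes)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        int saved = errno;                 /* fprintf may change errno */
        fprintf(stderr, "open %s: %s\n", path, strerror(saved));
        errno = saved;
        return -1;
    }

    ssize_t count = read(fd, buffer, nbytes);
    if (count < 0) {
        int saved = errno;
        fprintf(stderr, "read %s: %s\n", path, strerror(saved));
        close(fd);
        errno = saved;
        return -1;
    }

    /* count may be smaller than nbytes, for example at end-of-file. */
    close(fd);
    return count;
}
```

A caller would compare the result against -1 before using the buffer, and treat a count smaller than nbytes as a normal outcome rather than an error.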

Process management

    Call                                     Description
    pid = fork()                             Create a child process identical to the parent
    pid = waitpid(pid, &statloc, options)    Wait for a child to terminate
    s = execve(name, argv, environp)         Replace a process' core image
    exit(status)                             Terminate process execution

File management

    Call                                     Description
    fd = open(file, how, ...)                Open a file for reading, writing or both
    s = close(fd)                            Close an open file
    n = read(fd, buffer, nbytes)             Read data from a file into a buffer
    n = write(fd, buffer, nbytes)            Write data from a buffer into a file
    position = lseek(fd, offset, whence)     Move the file pointer
    s = stat(name, &buf)                     Get a file's status information

Directory and file system management

    Call                                     Description
    s = mkdir(name, mode)                    Create a new directory
    s = rmdir(name)                          Remove an empty directory
    s = link(name1, name2)                   Create a new entry, name2, pointing to name1
    s = unlink(name)                         Remove a directory entry
    s = mount(special, name, flag)           Mount a file system
    s = umount(special)                      Unmount a file system

Miscellaneous

    Call                                     Description
    s = chdir(dirname)                       Change the working directory
    s = chmod(name, mode)                    Change a file's protection bits
    s = kill(pid, signal)                    Send a signal to a process
    seconds = time(&seconds)                 Get the elapsed time since Jan. 1, 1970

Figure 1-18. Some of the major POSIX system calls. The return code s is -1 if an error has occurred. The return codes are as follows: pid is a process id, fd is a file descriptor, n is a byte count, position is an offset within the file, and seconds is the elapsed time. The parameters are explained in the text.

As an aside, it is worth pointing out that the mapping of POSIX procedure calls onto system calls is not one-to-one. The POSIX standard specifies a number of procedures that a conformant system must supply, but it does not specify whether they are system calls, library calls, or something else. If a procedure can be carried out without invoking a system call (i.e., without trapping to the kernel), it will usually be done in user space for reasons of performance. However, most of the POSIX procedures do invoke system calls, usually with one procedure mapping directly onto one system call. In a few cases, especially where several required procedures are only minor variations of one another, one system call handles more than one library call.

1.6.1 System Calls for Process Management

The first group of calls in Fig. 1-18 deals with process management. Fork is a good place to start the discussion. Fork is the only way to create a new process in UNIX. It creates an exact duplicate of the original process, including all the file descriptors, registers, and everything else. After the fork, the original process and the copy (the parent and child) go their separate ways. All the variables have identical values at the time of the fork, but since the parent's data are copied to create the child, subsequent changes in one of them do not affect the other one. (The program text, which is unchangeable, is shared between parent and child.) The fork call returns a value, which is zero in the child and equal to the child's process identifier or PID in the parent. Using the returned PID, the two processes can see which one is the parent process and which one is the child process.

In most cases, after a fork, the child will need to execute different code from the parent. Consider the case of the shell. It reads a command from the terminal, forks off a child process, waits for the child to execute the command, and then reads the next command when the child terminates. To wait for the child to finish, the parent executes a waitpid system call, which just waits until the child terminates (any child if more than one exists). Waitpid can wait for a specific child, or for any old child by setting the first parameter to -1. When waitpid completes, the address pointed to by the second parameter, statloc, will be set to the child's exit status (normal or abnormal termination and exit value). Various options are also provided, specified by the third parameter.

Now consider how fork is used by the shell. When a command is typed, the shell forks off a new process. This child process must execute the user command. It does this by using the execve system call, which causes its entire core image to be replaced by the file named in its first parameter. (Actually, the system call itself is exec, but several different library procedures call it with different parameters and slightly different names. We will treat these as system calls here.) A highly simplified shell illustrating the use of fork, waitpid, and execve is shown in Fig. 1-19.

In the most general case, execve has three parameters: the name of the file to be executed, a pointer to the argument array, and a pointer to the environment array. These will be described shortly. Various library routines, including execl, execv, execle, and execve, are provided to allow the parameters to be omitted or specified in various ways. Throughout this book we will use the name exec to represent the system call invoked by all of these.

    #define TRUE 1

    while (TRUE) {                              /* repeat forever */
        type_prompt();                          /* display prompt on the screen */
        read_command(command, parameters);      /* read input from terminal */

        if (fork() != 0) {                      /* fork off child process */
            /* Parent code. */
            waitpid(-1, &status, 0);            /* wait for child to exit */
        } else {
            /* Child code. */
            execve(command, parameters, 0);     /* execute command */
        }
    }

Figure 1-19. A stripped-down shell. Throughout this book, TRUE is assumed to be defined as 1.

Let us consider the case of a command such as

    cp file1 file2

used to copy file1 to file2. After the shell has forked, the child process locates and executes the file cp and passes to it the names of the source and target files.

The main program of cp (and main program of most other C programs) contains the declaration

    main(argc, argv, envp)

where argc is a count of the number of items on the command line, including the program name. For the example above, argc is 3.

The second parameter, argv, is a pointer to an array. Element i of that array is a pointer to the i-th string on the command line. In our example, argv[0] would point to the string "cp", argv[1] would point to the string "file1" and argv[2] would point to the string "file2".

The third parameter of main, envp, is a pointer to the environment, an array of strings containing assignments of the form name = value used to pass information such as the terminal type and home directory name to a program. In Fig. 1-19, no environment is passed to the child, so the third parameter of execve is a zero.

If exec seems complicated, do not despair; it is (semantically) the most complex of all the POSIX system calls. All the other ones are much simpler. As an example of a simple one, consider exit, which processes should use when they are finished executing. It has one parameter, the exit status (0 to 255), which is returned to the parent via statloc in the waitpid system call.
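The interplay of fork, execve, waitpid, and the exit status can be sketched as a single C function. This is a minimal sketch rather than the shell of Fig. 1-19: the helper name run_and_wait and the value 127 used to report a failed execve are assumptions made for illustration, and unlike the figure it passes the parent's environment (the POSIX variable environ) to the child and waits for that specific child.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* environment of the calling process (POSIX) */

/* Hypothetical helper (not from the text): run the program at path with
 * the given argument array in a child process. Returns the child's exit
 * status (0 to 255), or -1 if the child could not be created or did not
 * terminate normally. */
int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                    /* fork failed, no child exists */

    if (pid == 0) {
        /* Child code: replace the core image with the requested program. */
        execve(path, argv, environ);
        _exit(127);                   /* reached only if execve failed */
    }

    /* Parent code: wait for this particular child to terminate. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);   /* value the child passed to exit */
    return -1;                        /* abnormal termination, e.g. a signal */
}
```

Calling run_and_wait("/bin/sh", args) with args holding "sh", "-c", "exit 7", and a terminating null pointer would be expected to return 7, since that is the status the child hands to exit.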


Processes in UNIX have their memory divided up into three segments: the text segment (i.e., the program code), the data segment (i.e., the variables), and the stack segment. The data segment grows upward and the stack grows downward, as shown in Fig. 1-20. Between them is a gap of unused address space. The stack grows into the gap automatically, as needed, but expansion of the data segment is done explicitly by using a system call, brk, which specifies the new address where the data segment is to end. This call, however, is not defined by the POSIX standard, since programmers are encouraged to use the malloc library procedure for dynamically allocating storage, and the underlying implementation of malloc was not thought to be a suitable subject for standardization since few programmers use it directly.

Figure 1-20. Processes have three segments: text, data, and stack.

1.6.2 System Calls for File Management

Many system calls relate to the file system. In this section we will look at calls that operate on individual files; in the next one we will examine those that involve directories or the file system as a whole.

To read or write a file, the file must first be opened using open. This call specifies the file name to be opened, either as an absolute path name or relative to the working directory, and a code of O_RDONLY, O_WRONLY, or O_RDWR, meaning open for reading, writing, or both. To create a new file, O_CREAT is used. The file descriptor returned can then be used for reading or writing. Afterward, the file can be closed by close, which makes the file descriptor available for reuse on a subsequent open.

The most heavily used calls are undoubtedly read and write. We saw read earlier. Write has the same parameters.

Although most programs read and write files sequentially, for some applications programs need to be able to access any part of a file at random. Associated with each file is a pointer that indicates the current position in the file. When reading (writing) sequentially, it normally points to the next byte to be read (written). The lseek call changes the value of the position pointer, so that subsequent calls to read or write can begin anywhere in the file.

Lseek has three parameters: the first is the file descriptor for the file, the second is a file position, and the third tells whether the file position is relative to the beginning of the file, the current position, or the end of the file. The value returned by lseek is the absolute position in the file after changing the pointer.

For each file, UNIX keeps track of the file mode (regular file, special file, directory, and so on), size, time of last modification, and other information. Programs can ask to see this information via the stat system call. The first parameter specifies the file to be inspected; the second one is a pointer to a structure where the information is to be put.
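The calls of this section can be combined in one short C sketch: open a file, write to it, reposition the file pointer with lseek, read from the new position, and ask stat for the size. The helper name write_seek_read and the permission bits 0644 are assumptions chosen for illustration and do not appear in the text.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the text): write text to the file at path,
 * then read back up to len bytes starting at byte offset into out, which
 * must have room for len + 1 bytes. Returns the file size reported by
 * stat, or -1 on any error. */
long write_seek_read(const char *path, const char *text,
                     off_t offset, char *out, size_t len)
{
    /* O_CREAT creates the file if needed; O_RDWR opens it for both uses. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t n = strlen(text);
    if (write(fd, text, n) != (ssize_t)n) {
        close(fd);
        return -1;
    }

    /* SEEK_SET: the position is relative to the beginning of the file. */
    if (lseek(fd, offset, SEEK_SET) != offset) {
        close(fd);
        return -1;
    }

    ssize_t got = read(fd, out, len);
    if (got < 0) {
        close(fd);
        return -1;
    }
    out[got] = '\0';

    if (close(fd) < 0)
        return -1;

    struct stat st;                  /* filled in by stat */
    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_size;
}
```

Writing "hello world" and reading 5 bytes from offset 6 should yield "world" and a reported size of 11 bytes. SEEK_CUR and SEEK_END select the other two interpretations of the position mentioned in the text.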

1.6.3 System Calls for Directory Management
In this section we will look at some system calls that relate more to directories or the file system as a whole, rather than just to one specific file as in the previous section. The first two calls, mkdir and rmdir, create and remove empty directories, respectively. The next call is link. Its purpose is to allow the same file to appear under two or more names, often in different directories. A typical use is to allow several members of the same programming team to share a common file, with each of them having the file appear in his own directory, possibly under different names. Sharing a file is not the same as giving every team member a private copy, because having a shared file means that changes that any member of the team makes are instantly visible to the other members: there is only one file. When copies are made of a file, subsequent changes made to one copy do not affect the other ones.

To see how link works, consider the situation of Fig. 1-21(a). Here are two users, ast and jim, each having their own directories with some files. If ast now executes a program containing the system call

    link("/usr/jim/memo", "/usr/ast/note");

the file memo in jim's directory is now entered into ast's directory under the name note. Thereafter, /usr/jim/memo and /usr/ast/note refer to the same file. As an aside, whether user directories are kept in /usr, /user, /home, or somewhere else is simply a decision made by the local system administrator.

Understanding how link works will probably make it clearer what it does. Every file in UNIX has a unique number, its i-number, that identifies it. This i-number is an index into a table of i-nodes, one per file, telling who owns the file, where its disk blocks are, and so on. A directory is simply a file containing a set of (i-number, ASCII name) pairs. In the first versions of UNIX, each directory entry was 16 bytes: 2 bytes for the i-number and 14 bytes for the name. Now a more complicated structure is needed to support long file names, but conceptually a directory is still a set of (i-number, ASCII name) pairs. In Fig. 1-21, mail has i-number 16, and so on. What link does is simply create a new directory entry with a (possibly new) name, using the i-number of an existing file. In Fig. 1-21(b), two entries have the same i-number (70) and thus refer to the same file. If either one is later removed, using the unlink system call, the other one remains. If both are removed, UNIX sees that no entries to the file exist (a field in the i-node keeps track of the number of directory entries pointing to the file), so the file is removed from the disk.

As we have mentioned earlier, the mount system call allows two file systems to be merged into one. A common situation is to have the root file system, containing the binary (executable) versions of the common commands and other heavily used files, on a hard disk. The user can then insert a floppy disk with files to be read into the floppy disk drive. By executing the mount system call, the floppy disk file system can be attached to the root file system, as shown in Fig. 1-22. A typical statement in C to perform the mount passes three parameters: the first is the name of a block special file for drive 0, the second is the place in the tree where it is to be mounted, and the third tells whether the file system is to be mounted read-write or read-only.

Figure 1-22. (a) File system before the mount. (b) File system after the mount.

After the mount call, a file on drive 0 can be accessed by just using its path from the root directory or the working directory, without regard to which drive it is on. In fact, second, third, and fourth drives can also be mounted anywhere in the tree. The mount call makes it possible to integrate removable media into a single integrated file hierarchy, without having to worry about which device a file is on. Although this example involves floppy disks, hard disks or portions of hard disks (often called partitions or minor devices) can also be mounted this way. When a file system is no longer needed, it can be unmounted with the umount system call.
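The behavior of link and unlink described in this section, two directory entries sharing one i-number and a link count kept in the i-node, can be observed from C through the st_ino and st_nlink fields that stat fills in. The sketch below is illustrative only: the helper names create_empty, link_count, same_inode, and link_walkthrough are hypothetical, and it assumes both names live on one file system that supports hard links.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an empty file; returns 0 or -1. */
int create_empty(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    return close(fd);
}

/* Hypothetical helper: number of directory entries pointing at the file's
 * i-node, or -1 if the name does not exist. */
long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_nlink;
}

/* Hypothetical helper: 1 if both names refer to the same i-node on the
 * same device, 0 if not, -1 on error. */
int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Walks through the sequence in the text. Returns 0 if every step behaves
 * as described, -1 otherwise. */
int link_walkthrough(const char *name1, const char *name2)
{
    if (create_empty(name1) < 0)
        return -1;
    if (link(name1, name2) < 0)          /* second entry, same i-number */
        return -1;
    if (same_inode(name1, name2) != 1 || link_count(name1) != 2)
        return -1;
    if (unlink(name1) < 0)               /* the file survives under name2 */
        return -1;
    if (link_count(name2) != 1)
        return -1;
    if (unlink(name2) < 0)               /* last entry gone, file removed */
        return -1;
    return link_count(name2) == -1 ? 0 : -1;
}
```

The walkthrough leaves no files behind when it succeeds; on failure it stops at the first unexpected step and does not clean up, which keeps the sketch short.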

1.6.4 Miscellaneous System Calls

A variety of other system calls exist as well. We will look at just four of them here. The chdir call changes the current working directory. After the call

    chdir("/usr/ast/test");

an open on the file xyz will open /usr/ast/test/xyz. The concept of a working directory eliminates the need for typing (long) absolute path names all the time.

In UNIX every file has a mode used for protection. The mode includes the read-write-execute bits for the owner, group, and others. The chmod system call makes it possible to change the mode of a file. For example, to make a file read-only by everyone except the owner, one could execute

    chmod("file", 0644);

The kill system call is the way users and user processes send signals. If a process is prepared to catch a particular signal, then when it arrives, a signal handler is run. If the process is not prepared to handle a signal, then its arrival kills the process (hence the name of the call).

POSIX defines several procedures for dealing with time. For example, time just returns the current time in seconds, with 0 corresponding to Jan. 1, 1970 at midnight (just as the day was starting, not ending). On computers with 32-bit words, the maximum value time can return is 2^32 - 1 seconds (assuming an unsigned integer is used). This value corresponds to a little over 136 years. Thus in the year 2106, 32-bit UNIX systems will go berserk, imitating the famous Y2K problem. If you currently have a 32-bit UNIX system, you are advised to trade it in for a 64-bit one sometime before the year 2106.

1.6.5 The Windows Win32 API
So far we have focused primarily on UNIX. Now it is time to look briefly at Windows. Windows and UNIX differ in a fundamental way in their respective programming models. A UNIX program consists of code that does something or other, making system calls to have certain services performed. In contrast, a Windows program is normally event driven. The main program waits for some event to happen, then calls a procedure to handle it. Typical events are keys being struck, the mouse being moved, a mouse button being pushed, or a floppy disk inserted. Handlers are then called to process the event, update the screen and update the internal program state. All in all, this leads to a somewhat different style of programming than with UNIX, but since the focus of this book is on operating system function and structure, these different programming models will not concern us much more.

Of course, Windows also has system calls. With UNIX, there is almost a 1-to-1 relationship between the system calls (e.g., read) and the library procedures (e.g., read) used to invoke the system calls. In other words, for each system call, there is roughly one library procedure that is called to invoke it, as indicated in Fig. 1-17. Furthermore, POSIX has only about 100 procedure calls.

With Windows, the situation is radically different. To start with, the library calls and the actual system calls are highly decoupled. Microsoft has defined a set of procedures, called the Win32 API (Application Program Interface), that programmers are expected to use to get operating system services. This interface is (partially) supported on all versions of Windows since Windows 95. By decoupling the interface from the actual system calls, Microsoft retains the ability to change the actual system calls in time (even from release to release) without invalidating existing programs. What actually constitutes Win32 is also slightly ambiguous since Windows 2000 has many new calls that were not previously available. In this section, Win32 means the interface supported by all versions of Windows.

The number of Win32 API calls is extremely large, numbering in the thousands. Furthermore, while many of them do invoke system calls, a substantial number are carried out entirely in user space. As a consequence, with Windows it is impossible to see what is a system call (i.e., performed by the kernel) and what is simply a user-space library call. In fact, what is a system call in one version of Windows may be done in user space in a different version, and vice versa. When we discuss the Windows system calls in this book, we will use the Win32 procedures (where appropriate) since Microsoft guarantees that these will be stable over time. But it is worth remembering that not all of them are true system calls (i.e., traps to the kernel).

Another complication is that in UNIX, the GUI (e.g., X Windows and Motif) runs entirely in user space, so the only system calls needed for writing on the screen are write and a few other minor ones. Of course, there are a large number of calls to X Windows and the GUI, but these are not system calls in any sense.

In contrast, the Win32 API has a huge number of calls for managing windows, geometric figures, text, fonts, scrollbars, dialog boxes, menus, and other features of the GUI. To the extent that the graphics subsystem runs in the kernel (true on some versions of Windows but not on all), these are system calls; otherwise they are just library calls. Should we discuss these calls in this book or not? Since they are not really related to the function of an operating system, we have decided not to, even though they may be carried out by the kernel. Readers interested in the Win32 API should consult one of the many books on the subject, for example (Hart, 1997; Rector and Newcomer, 1997; and Simon, 1997).

    UNIX       Win32                  Description
    fork       CreateProcess          Create a new process
    waitpid    WaitForSingleObject    Can wait for a process to exit
    execve     (none)                 CreateProcess = fork + execve
    exit       ExitProcess            Terminate execution
    open       CreateFile             Create a file or open an existing file
    close      CloseHandle            Close a file
    read       ReadFile               Read data from a file
    write      WriteFile              Write data to a file
    lseek      SetFilePointer         Move the file pointer
    stat       GetFileAttributesEx    Get various file attributes
    mkdir      CreateDirectory        Create a new directory
    rmdir      RemoveDirectory        Remove an empty directory
    link       (none)                 Win32 does not support links
    unlink     DeleteFile             Destroy an existing file
    mount      (none)                 Win32 does not support mount
    umount     (none)                 Win32 does not support mount
    chdir      SetCurrentDirectory    Change the current working directory
    chmod      (none)                 Win32 does not support security
    kill       (none)                 Win32 does not support signals
    time       GetLocalTime           Get the current time

Figure 1-23. The Win32 API calls that roughly correspond to the UNIX calls of Fig. 1-18.

Let us now briefly go through the list of Fig. 1-23. CreateProcess creates a new process. It does the combined work of fork and execve in UNIX. It has many parameters specifying the properties of the newly created process. Windows does not have a process hierarchy as UNIX does, so there is no concept of a parent process and a child process. After a process is created, the creator and createe are equals. WaitForSingleObject is used to wait for an event. Many possible events can be waited for. If the parameter specifies a process, then the caller waits for the specified process to exit, which is done using ExitProcess.

The next six calls operate on files and are functionally similar to their UNIX counterparts although they differ in the parameters and details. Still, files can be opened, closed, read, and written pretty much as in UNIX. The SetFilePointer and GetFileAttributesEx calls set the file position and get some of the file attributes.

Windows has directories and they are created and removed with CreateDirectory and RemoveDirectory, respectively. There is also a notion of a current directory, set by SetCurrentDirectory. The current time is acquired using GetLocalTime.

The Win32 interface does not have links to files, mounted file systems, security, or signals, so the calls corresponding to the UNIX ones do not exist. Of course, Win32 has a huge number of other calls that UNIX does not have, especially for managing the GUI. And Windows 2000 has an elaborate security system and also supports file links.

One last note about Win32 is perhaps worth making. Win32 is not a terribly uniform or consistent interface. The main culprit here was the need to be backward compatible with the previous 16-bit interface used in Windows 3.x.

1.7 OPERATING SYSTEM STRUCTURE

Now that we have seen what operating systems look like on the outside (i.e., the programmer's interface), it is time to take a look inside. In the following sections, we will examine five different structures that have been tried, in order to get some idea of the spectrum of possibilities. These are by no means exhaustive, but they give an idea of some designs that have been tried in practice. The five designs are monolithic systems, layered systems, virtual machines, exokernels, and client-server systems.

1.7.1 Monolithic Systerns

By far the most common organization, this approach might well be subtitled "The Big Mess." The structure is that there is no structure. The operating system is written as a collection of procedures, each of which can call any of the other ones whenever it needs to. When this technique is used, each procedure in the system has a well-defined interface in terms of parameters and results, and each one is free to call any other one, if the latter provides some useful computation that the former needs.

To construct the actual object program of the operating system when this approach is used, one first compiles all the individual procedures, or files containing the procedures, and then binds them all together into a single object file using the system linker. In terms of information hiding, there is essentially none: every procedure is visible to every other procedure (as opposed to a structure containing modules or packages, in which much of the information is hidden away inside modules, and only the officially designated entry points can be called from outside the module).

Even in monolithic systems, however, it is possible to have at least a little structure. The services (system calls) provided by the operating system are requested by putting the parameters in a well-defined place (e.g., on the stack) and then executing a trap instruction. This instruction switches the machine from user mode to kernel mode and transfers control to the operating system, shown as step 6 in Fig. 1-17. The operating system then fetches the parameters and determines which system call is to be carried out. After that, it indexes into a table that contains in slot k a pointer to the procedure that carries out system call k (step 7 in Fig. 1-17).

This organization suggests a basic structure for the operating system:
1. A main program that invokes the requested service procedure.
2. A set of service procedures that carry out the system calls.
3. A set of utility procedures that help the service procedures.

In this model, for each system call there is one service procedure that takes care of it. The utility procedures do things that are needed by several service procedures, such as fetching data from user programs. This division of the procedures into three layers is shown in Fig. 1-24.

Figure 1-24. A simple structuring model for a monolithic system.
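The three-layer model can be sketched in a few lines of C. This is only an illustration under stated assumptions: the call numbers, the two service procedures, and the utility procedure are hypothetical and do not come from any real kernel. What it shows is the table lookup described above, in which slot k holds a pointer to the procedure that carries out system call k.

```c
#include <stddef.h>

/* Hypothetical system call numbers; real systems define their own. */
enum { SYS_GETPID = 0, SYS_ADD = 1, NR_SYSCALLS = 2 };

typedef long (*service_proc)(long a, long b);

/* Utility procedure: shared by service procedures (illustrative only). */
static long clamp_nonnegative(long v)
{
    return v < 0 ? 0 : v;
}

/* Service procedures: one per system call. */
static long do_getpid(long a, long b)
{
    (void)a;
    (void)b;
    return 42;                      /* made-up process identifier */
}

static long do_add(long a, long b)
{
    return clamp_nonnegative(a) + clamp_nonnegative(b);
}

/* Slot k holds a pointer to the procedure that carries out call k. */
static const service_proc syscall_table[NR_SYSCALLS] = {
    [SYS_GETPID] = do_getpid,
    [SYS_ADD]    = do_add,
};

/* Main program: validate the call number, then index into the table. */
long dispatch(long k, long a, long b)
{
    if (k < 0 || k >= NR_SYSCALLS || syscall_table[k] == NULL)
        return -1;                  /* unknown system call */
    return syscall_table[k](a, b);
}
```

In a real system, dispatch would be entered from the trap handler after the parameters had been fetched from the user program; here it is an ordinary function so that the lookup can be tried in isolation.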

1.7.2 Layered Systems
A generalization of the approach of Fig. 1-24 is to organize the operating system as a hierarchy of layers, each one constructed upon the one below it. The first system constructed in this way was the THE system built at the Technische Hogeschool Eindhoven in the Netherlands by E. W. Dijkstra (1968) and his students. The THE system was a simple batch system for a Dutch computer, the Electrologica X8, which had 32K of 27-bit words (bits were expensive back then).

The system had 6 layers, as shown in Fig. 1-25. Layer 0 dealt with allocation of the processor, switching between processes when interrupts occurred or timers expired. Above layer 0, the system consisted of sequential processes, each of which could be programmed without having to worry about the fact that multiple processes were running on a single processor. In other words, layer 0 provided the basic multiprogramming of the CPU.

Figure 1-25. Structure of the THE operating system.

Layer 1 did the memory management. It allocated space for processes in main memory and on a 512K-word drum used for holding parts of processes (pages) for which there was no room in main memory. Above layer 1, processes did not have to worry about whether they were in memory or on the drum; the layer 1 software took care of making sure pages were brought into memory whenever they were needed.

Layer 2 handled communication between each process and the operator console. Above this layer each process effectively had its own operator console. Layer 3 took care of managing the I/O devices and buffering the information streams to and from them. Above layer 3 each process could deal with abstract I/O devices with nice properties, instead of real devices with many peculiarities. Layer 4 was where the user programs were found. They did not have to worry about process, memory, console, or I/O management. The system operator process was located in layer 5.

A further generalization of the layering concept was present in the MULTICS system. Instead of layers, MULTICS was described as having a series of concentric rings, with the inner ones being more privileged than the outer ones (which is effectively the same thing). When a procedure in an outer ring wanted to call a procedure in an inner ring, it had to make the equivalent of a system call, that is, a TRAP instruction whose parameters were carefully checked for validity before allowing the call to proceed. Although the entire operating system was part of the address space of each user process in MULTICS, the hardware made it possible to designate individual procedures (memory segments, actually) as protected against reading, writing, or executing.

Whereas the THE layering scheme was really only a design aid, because all the parts of the system were ultimately linked together into a single object program, in MULTICS, the ring mechanism was very much present at run time and enforced by the hardware. The advantage of the ring mechanism is that it can easily be extended to structure user subsystems. For example, a professor could write a program to test and grade student programs and run this program in ring n, with the student programs running in ring n + 1 so that they could not change their grades.
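The ring rule can be stated as two small predicates. The sketch below is a deliberate simplification and not the actual MULTICS hardware check: it assumes that rings are numbered from the inside out (ring 0 is the most privileged) and that a segment may be written only from its owner's ring or from a ring inside it. Both function names are invented for illustration.

```c
#include <stdbool.h>

/* Assumption: lower ring numbers are more privileged (ring 0 is innermost). */

/* A call from an outer ring into an inner ring must go through a
 * checked trap, the equivalent of a system call. */
bool needs_trap(int caller_ring, int callee_ring)
{
    return callee_ring < caller_ring;
}

/* Simplified protection rule (an assumption of this sketch): data owned
 * by ring r may be written only from ring r or from a ring inside it. */
bool may_write(int accessor_ring, int owner_ring)
{
    return accessor_ring <= owner_ring;
}
```

With the grading program in ring n and a student program in ring n + 1, may_write(n + 1, n) is false, so the student program cannot alter the grades, while needs_trap(n + 1, n) is true, so any call into the grader is checked.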

1.7.3 Virtual Machines
The initial releases of OS/360 were strictly batch systems. Nevertheless, many 360 users wanted to have timesharing, so various groups, both inside and outside IBM, decided to write timesharing systems for it. The official IBM timesharing system, TSS/360, was delivered late, and when it finally arrived it was so big and slow that few sites converted to it. It was eventually abandoned after its development had consumed some $50 million (Graham, 1970). But a group at IBM's Scientific Center in Cambridge, Massachusetts, produced a radically different system that IBM eventually accepted as a product, and which is now widely used on its remaining mainframes.

This system, originally called CP/CMS and later renamed VM/370 (Seawright and MacKinnon, 1979), was based on an astute observation: a timesharing system provides (1) multiprogramming and (2) an extended machine with a more convenient interface than the bare hardware. The essence of VM/370 is to completely separate these two functions.

The heart of the system, known as the virtual machine monitor, runs on the bare hardware and does the multiprogramming, providing not one, but several virtual machines to the next layer up, as shown in Fig. 1-26. However, unlike all other operating systems, these virtual machines are not extended machines, with files and other nice features. Instead, they are exact copies of the bare hardware, including kernel/user mode, I/O, interrupts, and everything else the real machine has.

Figure 1-26. The structure of VM/370 with CMS.

Because each virtual machine is identical to the true hardware, each one can run any operating system that will run directly on the bare hardware. Different virtual machines can, and frequently do, run different operating systems. Some run one of the descendants of OS/360 for batch or transaction processing, while other ones run a single-user, interactive system called CMS (Conversational Monitor System) for interactive timesharing users.

When a CMS program executes a system call, the call is trapped to the operating system in its own virtual machine, not to VM/370, just as it would if it were running on a real machine instead of a virtual one. CMS then issues the normal hardware I/O instructions for reading its virtual disk or whatever is needed to carry out the call. These I/O instructions are trapped by VM/370, which then performs them as part of its simulation of the real hardware. By completely separating the functions of multiprogramming and providing an extended machine, each of the pieces can be much simpler, more flexible, and easier to maintain.

The idea of a virtual machine is heavily used nowadays in a different context: running old MS-DOS programs on a Pentium (or other 32-bit Intel CPU). When designing the Pentium and its software, both Intel and Microsoft realized that there would be a big demand for running old software on new hardware. For this reason, Intel provided a virtual 8086 mode on the Pentium. In this mode, the machine acts like an 8086 (which is identical to an 8088 from a software point of view), including 16-bit addressing with a 1-MB limit.

This mode is used by Windows and other operating systems for running MS-DOS programs. These programs are started up in virtual 8086 mode. As long as they execute normal instructions, they run on the bare hardware. However, when a program tries to trap to the operating system to make a system call, or tries to do protected I/O directly, a trap to the virtual machine monitor occurs.

Two variants on this design are possible. In the first one, MS-DOS itself is loaded into the virtual 8086's address space, so the virtual machine monitor just reflects the trap back to MS-DOS, just as would happen on a real 8086. When MS-DOS later tries to do the I/O itself, that operation is caught and carried out by the virtual machine monitor.

In the other variant, the virtual machine monitor just catches the first trap and does the I/O itself, since it knows what all the MS-DOS system calls are and thus knows what each trap is supposed to do. This variant is less pure than the first one, since it only emulates MS-DOS correctly, and not other operating systems, as the first one does. On the other hand, it is much faster, since it saves the trouble of starting up MS-DOS to do the I/O. A further disadvantage of actually running MS-DOS in virtual 8086 mode is that MS-DOS fiddles around with the interrupt enable/disable bit quite a lot, all of which must be emulated at considerable cost.

It is worth noting that neither of these approaches is really the same as VM/370, since the machine being emulated is not a full Pentium, but only an 8086. With the VM/370 system, it is possible to run VM/370, itself, in the virtual machine. With the Pentium, it is not possible to run, say, Windows in the virtual 8086 because no version of Windows runs on an 8086; a 286 is the minimum for even the oldest version, and 286 emulation is not provided (let alone Pentium emulation). However, by modifying the Windows binary slightly, this emulation is possible and even available in commercial products.

Another area where virtual machines are used, but in a somewhat different way, is for running Java programs. When Sun Microsystems invented the Java programming language, it also invented a virtual machine (i.e., a computer architecture) called the JVM (Java Virtual Machine). The Java compiler produces code for JVM, which then typically is executed by a software JVM interpreter. The advantage of this approach is that the JVM code can be shipped over the Internet to any computer that has a JVM interpreter and run there. If the compiler had produced SPARC or Pentium binary programs, for example, they could not have been shipped and run anywhere as easily. (Of course, Sun could have produced a compiler that produced SPARC binaries and then distributed a SPARC interpreter, but JVM is a much simpler architecture to interpret.) Another advantage of using JVM is that if the interpreter is implemented properly, which is not completely trivial, incoming JVM programs can be checked for safety and then executed in a protected environment so they cannot steal data or do any damage.

1.7.4 Exokernels
With VM/370, each user process gets an exact copy of the actual computer. With virtual 8086 mode on the Pentium, each user process gets an exact copy of a different computer. Going one step further, researchers at M.I.T. have built a system that gives each user a clone of the actual computer, but with a subset of the resources (Engler et al., 1995). Thus one virtual machine might get disk blocks 0 to 1023, the next one might get blocks 1024 to 2047, and so on.

At the bottom layer, running in kernel mode, is a program called the exokernel. Its job is to allocate resources to virtual machines and then check attempts to use them to make sure no machine is trying to use somebody else's resources. Each user-level virtual machine can run its own operating system, as on VM/370 and the Pentium virtual 8086s, except that each one is restricted to using only the resources it has asked for and been allocated.

The advantage of the exokernel scheme is that it saves a layer of mapping. In the other designs, each virtual machine thinks it has its own disk, with blocks running from 0 to some maximum, so the virtual machine monitor must maintain tables to remap disk addresses (and all other resources). With the exokernel, this remapping is not needed. The exokernel need only keep track of which virtual machine has been assigned which resource. This method still has the advantage of separating the multiprogramming (in the exokernel) from the user operating system code (in user space), but with less overhead, since all the exokernel has to do is keep the virtual machines out of each other's hair.
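The difference between checking and remapping can be made concrete with the disk-block example. The following C sketch is an illustration only: the struct and function names are invented, and it assumes each virtual machine holds one contiguous range of blocks, whereas a real virtual machine monitor would keep general remapping tables.

```c
#include <stdbool.h>

/* Hypothetical record of the disk blocks granted to one virtual machine. */
struct disk_grant {
    unsigned first_block;   /* first real block owned by the VM */
    unsigned nblocks;       /* number of blocks owned           */
};

/* Exokernel style: the VM names real block numbers, so the kernel
 * only has to check ownership. No translation takes place. */
bool exo_may_access(const struct disk_grant *g, unsigned block)
{
    return block >= g->first_block && block - g->first_block < g->nblocks;
}

/* Monitor style: the VM believes its disk runs from block 0 upward,
 * so every access is translated. Returns -1 if out of range. */
long vmm_remap(const struct disk_grant *g, unsigned virtual_block)
{
    if (virtual_block >= g->nblocks)
        return -1;
    return (long)g->first_block + (long)virtual_block;
}
```

For a machine granted blocks 1024 to 2047, the exokernel accepts a request for block 1500 as it stands, whereas under the monitor scheme the machine would ask for its block 476 and the monitor would translate that to 1500.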

1.7.5 Client-Server Model
VM/370 gains much in simplicity by moving a large part of the traditional operating system code (implementing the extended machine) into a higher layer, CMS. Nevertheless, VM/370 itself is still a complex program because simulating a number of virtual 370s in their entirety is not that simple (especially if you want to do it reasonably efficiently).

A trend in modern operating systems is to take the idea of moving code up into higher layers even further and remove as much as possible from kernel mode, leaving a minimal microkernel. The usual approach is to implement most of the operating system in user processes. To request a service, such as reading a block of a file, a user process (now known as the client process) sends the request to a server process, which then does the work and sends back the answer.
Figure 1-27. The client-server model.

In this model, shown in Fig. 1-27, all the kernel does is handle the communication between clients and servers. By splitting the operating system up into parts, each of which only handles one facet of the system, such as file service, process service, terminal service, or memory service, each part becomes small and manageable. Furthermore, because all the servers run as user-mode processes, and not in kernel mode, they do not have direct access to the hardware. As a consequence, if a bug in the file server is triggered, the file service may crash, but this will not usually bring the whole machine down.

Another advantage of the client-server model is its adaptability to use in distributed systems (see Fig. 1-28). If a client communicates with a server by sending it messages, the client need not know whether the message is handled locally in its own machine, or whether it was sent across a network to a server on a remote machine. As far as the client is concerned, the same thing happens in both cases: a request was sent and a reply came back.

Figure 1-28. The client-server model in a distributed system.

The picture painted above of a kernel that handles only the transport of messages from clients to servers and back is not completely realistic. Some operating system functions (such as loading commands into the physical I/O device registers) are difficult, if not impossible, to do from user-space programs. There are two ways of dealing with this problem. One way is to have some critical server processes (e.g., I/O device drivers) actually run in kernel mode, with complete access to all the hardware, but still communicate with other processes using the normal message mechanism.

The other way is to build a minimal amount of mechanism into the kernel but leave the policy decisions up to servers in user space (Levin et al., 1975). For example, the kernel might recognize that a message sent to a certain special address means to take the contents of that message and load it into the I/O device registers for some disk, to start a disk read. In this example, the kernel would not even inspect the bytes in the message to see if they were valid or meaningful; it would just blindly copy them into the disk's device registers. (Obviously, some scheme for limiting such messages to authorized processes only must be used.) The split between mechanism and policy is an important concept; it occurs again and again in operating systems in various contexts.
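The request-reply pattern of the client-server model can be sketched as follows. Everything in this example is hypothetical: the message layout, the operation code, the error codes, and the single file server are invented for illustration, and an ordinary function call stands in for the kernel's message transport. The point it illustrates is that the transport routine only delivers the message, while the server alone interprets it.

```c
#include <stdio.h>
#include <string.h>

enum { FILE_SERVER = 0, NR_SERVERS = 1 };  /* hypothetical server addresses */
enum { OP_READ_BLOCK = 1 };                /* hypothetical request code     */
enum { E_OK = 0, E_BAD_DEST = -1, E_BAD_REQUEST = -2 };

struct message {
    int  op;        /* requested operation     */
    int  arg;       /* e.g., a block number    */
    int  status;    /* filled in by the server */
    char data[32];  /* reply payload           */
};

/* A user-mode server that handles one facet of the system. */
static void file_server(struct message *m)
{
    if (m->op != OP_READ_BLOCK || m->arg < 0) {
        m->status = E_BAD_REQUEST;
        return;
    }
    /* Stand-in for real block contents. */
    snprintf(m->data, sizeof m->data, "block %d", m->arg);
    m->status = E_OK;
}

static void (*const servers[NR_SERVERS])(struct message *) = { file_server };

/* The "kernel": it transports the message but does not interpret it. */
int send_receive(int dest, struct message *m)
{
    if (dest < 0 || dest >= NR_SERVERS)
        return E_BAD_DEST;
    servers[dest](m);   /* stands in for delivery and reply */
    return E_OK;
}

/* Client-side stub: build a request, send it, and unpack the reply. */
int read_block(int block, char *out, size_t outlen)
{
    struct message m = { .op = OP_READ_BLOCK, .arg = block };
    int r = send_receive(FILE_SERVER, &m);

    if (r != E_OK)
        return r;
    if (m.status == E_OK && outlen > 0) {
        strncpy(out, m.data, outlen - 1);
        out[outlen - 1] = '\0';
    }
    return m.status;
}
```

Because the client sees only read_block, the body of send_receive could in principle be replaced by code that ships the message across a network without the client noticing, which is the adaptability to distributed systems mentioned above.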

1.8 RESEARCH ON OPERATING SYSTEMS
Computer science is a rapidly advancing field and it is hard to predict where it is going. Researchers at universities and industrial research labs are constantly thinking up new ideas, some of which go nowhere but some of which become the cornerstone of future products and have massive impact on the industry and users. Telling which is which turns out to be easier to do in hindsight than in real time. Separating the wheat from the chaff is especially difficult because it often takes 20-30 years from idea to impact.

For example, when President Eisenhower set up the Dept. of Defense's Advanced Research Projects Agency (ARPA) in 1958, he was trying to keep the Army from killing the Navy and the Air Force over the Pentagon's research budget. He was not trying to invent the Internet. But one of the things ARPA did was fund some university research on the then-obscure concept of packet switching, which quickly led to the first experimental packet-switched network, the ARPANET. It went live in 1969. Before long, other ARPA-funded research networks were connected to the ARPANET, and the Internet was born. The Internet was then happily used by academic researchers for sending email to each other for 20 years. In the early 1990s, Tim Berners-Lee invented the World Wide Web at the CERN research lab in Geneva and Marc Andreessen wrote a graphical browser for it at the University of Illinois. All of a sudden the Internet was full of chatting teenagers. President Eisenhower is probably rolling over in his grave.


Research in operating systems has also led to dramatic changes in practical systems. As we discussed earlier, the first commercial computer systems were all batch systems, until M.I.T. invented interactive timesharing in the early 1960s. Computers were all text-based until Doug Engelbart invented the mouse and the graphical user interface at Stanford Research Institute in the late 1960s. Who knows what will come next?

In this section and in comparable sections throughout the book, we will take a brief look at some of the research in operating systems that has taken place during the past 5 to 10 years, just to give a flavor of what might be on the horizon. This introduction is certainly not comprehensive and is based largely on papers that have been published in the top research journals and conferences because these ideas have at least survived a rigorous peer review process in order to get published. Most of the papers cited in the research sections were published by either ACM, the IEEE Computer Society, or USENIX and are available over the Internet to (student) members of these organizations. For more information about these organizations and their digital libraries, see
ACM                     http://www.acm.org
IEEE Computer Society   http://www.computer.org
USENIX                  http://www.usenix.org

Virtually all operating systems researchers realize that current operating systems are massive, inflexible, unreliable, insecure, and loaded with bugs, certain ones more than others (names withheld here to protect the guilty). Consequently, there is a lot of research on how to build flexible and dependable systems. Much of the research concerns microkernel systems. These systems have a minimal kernel, so there is a reasonable chance they can be made reliable and be debugged. They are also flexible because much of the real operating system runs as user-mode processes, and can thus be replaced or adapted easily, possibly even during execution. Typically, all the microkernel does is handle low-level resource management and message passing between the user processes.

The first generation microkernels, such as Amoeba (Tanenbaum et al., 1990), Chorus (Rozier et al., 1988), Mach (Accetta et al., 1986), and V (Cheriton, 1988), proved that these systems could be built and made to work. The second generation is trying to prove that they can not only work, but also deliver high performance (Ford et al., 1996; Hartig et al., 1997; Liedtke 1995, 1996; Rawson 1997; and Zuberi et al., 1999). Based on published measurements, it appears that this goal has been achieved.

Much kernel research is focused nowadays on building extensible operating systems. These are typically microkernel systems with the ability to extend or customize them in some direction. Some examples are Fluke (Ford et al., 1997), Paramecium (Van Doorn et al., 1995), SPIN (Bershad et al., 1995b), and Vino (Seltzer et al., 1996). Some researchers are also looking at how to extend existing systems (Ghormley et al., 1998). Many of these systems allow users to add their own code to the kernel, which brings up the obvious problem of how to allow user extensions in a secure way. Techniques include interpreting the extensions, restricting them to code sandboxes, using type-safe languages, and code signing (Grimm and Bershad, 1997; and Small and Seltzer, 1998). Druschel et al. (1997) present a dissenting view, saying that too much effort is going into security for user-extendable systems. In their view, researchers should figure out which extensions are useful and then just make those a normal part of the kernel, without the ability to have users extend the kernel on the fly.

Although one approach to eliminating bloated, buggy, unreliable operating systems is to make them smaller, a more radical one is to eliminate the operating system altogether. This approach is being taken by the group of Kaashoek at M.I.T. in their Exokernel research. Here the idea is to have a thin layer of software running on the bare metal, whose only job is to securely allocate the hardware resources among the users. For example, it must decide who gets to use which part of the disk and where incoming network packets should be delivered. Everything else is up to user-level processes, making it possible to build both general-purpose and highly-specialized operating systems (Engler and Kaashoek, 1995; Engler et al., 1995; and Kaashoek et al., 1997).

1.9 OUTLINE OF THE REST OF THIS BOOK
We have now completed our introduction and bird's-eye view of the operating system. It is time to get down to the details. Chapter 2 is about processes. It discusses their properties and how they communicate with one another. It also gives a number of detailed examples of how interprocess communication works and how to avoid some of the pitfalls.

Chapter 3 is about deadlocks. We briefly showed what deadlocks are in this chapter, but there is much more to say. Ways to prevent or avoid them are discussed.

In Chap. 4 we will study memory management in detail. The important topic of virtual memory will be examined, along with closely related concepts such as paging and segmentation.

Input/Output is covered in Chap. 5. The concepts of device independence and device dependence will be looked at. Several important devices, including disks, keyboards, and displays, will be used as examples.

Then, in Chap. 6, we come to the all-important topic of file systems. To a considerable extent, what the user sees is largely the file system. We will look at both the file system interface and the file system implementation.

At this point we will have completed our study of the basic principles of single-CPU operating systems. However, there is more to say, especially about advanced topics. In Chap. 7, we examine multimedia systems, which have a number of properties and requirements that differ from conventional operating systems. Among other items, scheduling and the file system are affected by the nature of multimedia. Another advanced topic is multiple processor systems, including multiprocessors, parallel computers, and distributed systems. These subjects are covered in Chap. 8.

A hugely important subject is operating system security, which is covered in Chap. 9. Among the topics discussed in this chapter are threats (e.g., viruses and worms), protection mechanisms, and security models.

Next we have some case studies of real operating systems. These are UNIX (Chap. 10) and Windows 2000 (Chap. 11). The book concludes with some thoughts about operating system design in Chap. 12.

1.10 METRIC UNITS
To avoid any confusion, it is worth stating explicitly that in this book, as in computer science in general, metric units are used instead of traditional English units (the furlong-stone-fortnight system). The principal metric prefixes are listed in Fig. 1-29. The prefixes are typically abbreviated by their first letters, with the units greater than 1 capitalized. Thus a 1-TB database occupies 10^12 bytes of storage and a 100 psec (or 100 ps) clock ticks every 10^-10 seconds. Since milli and micro both begin with the letter "m," a choice had to be made. Normally, "m" is for milli and "µ" (the Greek letter mu) is for micro.

Figure 1-29. The principal metric prefixes.

It is also worth pointing out that for measuring memory sizes, in common industry practice, the units have slightly different meanings. There Kilo means 2^10 (1024) rather than 10^3 (1000) because memories are always a power of two. Thus a 1-KB memory contains 1024 bytes, not 1000 bytes. Similarly, a 1-MB memory contains 2^20 (1,048,576) bytes and a 1-GB memory contains 2^30 (1,073,741,824) bytes. However, a 1-Kbps communication line transmits 1000 bits per second and a 10-Mbps LAN runs at 10,000,000 bits/sec because these speeds are not powers of two. Unfortunately, many people tend to mix up these two systems, especially for disk sizes. To avoid ambiguity, in this book, we will use the symbols KB, MB, and GB for 2^10, 2^20, and 2^30 bytes respectively, and the symbols Kbps, Mbps, and Gbps for 10^3, 10^6, and 10^9 bits/sec, respectively.
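The two conventions can be written down as constants, which makes the mismatch easy to see in a calculation. The macro names and the helper function below are assumptions made for this sketch; only the numeric values come from the text.

```c
#include <stdint.h>

/* Memory sizes: powers of two, in bytes. */
#define KB (UINT64_C(1) << 10)   /* 1,024         */
#define MB (UINT64_C(1) << 20)   /* 1,048,576     */
#define GB (UINT64_C(1) << 30)   /* 1,073,741,824 */

/* Line speeds: powers of ten, in bits per second. */
#define KBPS UINT64_C(1000)
#define MBPS UINT64_C(1000000)
#define GBPS UINT64_C(1000000000)

/* Whole seconds needed to send `bytes` over a line running at
 * `bits_per_sec`, rounded up. The caller must pass a nonzero speed. */
uint64_t transfer_seconds(uint64_t bytes, uint64_t bits_per_sec)
{
    uint64_t bits = bytes * 8;
    return (bits + bits_per_sec - 1) / bits_per_sec;
}
```

Sending 10 MB over a 1-Mbps line, for example, takes 84 seconds rather than 80, because 10 MB is 83,886,080 bits while the line carries exactly 1,000,000 bits per second.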

1.11 SUMMARY
Operating systems can be viewed from two viewpoints: resource managers and extended machines. In the resource manager view, the operating system's job is to manage the different parts of the system efficiently. In the extended machine view, the job of the system is to provide the users with a virtual machine that is more convenient to use than the actual machine.

Operating systems have a long history, starting from the days when they replaced the operator, to modern multiprogramming systems. Highlights include early batch systems, multiprogramming systems, and personal computer systems.

Since operating systems interact closely with the hardware, some knowledge of computer hardware is useful to understanding them. Computers are built up of processors, memories, and I/O devices. These parts are connected by buses.

The basic concepts on which all operating systems are built are processes, memory management, I/O management, the file system, and security. Each of these will be treated in a subsequent chapter.

The heart of any operating system is the set of system calls that it can handle. These tell what the operating system really does. For UNIX, we have looked at four groups of system calls. The first group of system calls relates to process creation and termination. The second group is for reading and writing files. The third group is for directory management. The fourth group contains miscellaneous calls.

Operating systems can be structured in several ways. The most common ones are as a monolithic system, a hierarchy of layers, a virtual machine system, an exokernel, or using the client-server model.

PROBLEMS
1. What are the two main functions of an operating system?

2. What is multiprogramming?

3. What is spooling? Do you think that advanced personal computers will have spooling as a standard feature in the future?

4. On early computers, every byte of data read or written was directly handled by the CPU (i.e., there was no DMA). What implications does this organization have for multiprogramming?


16. Why is the process table needed in a timesharing system? Is it also needed in personal computer systems in which only one process exists, that process taking over the entire machine until it is finished?

17. Is there any reason why you might want to mount a file system on a nonempty directory? If so, what is it?

19. Can the

count = write(fd, buffer, nbytes);

20. A file whose file descriptor is fd contains the following sequence of bytes: 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5. The following system calls are made:

lseek(fd, 3, SEEK_SET);
read(fd, &buffer, 4);

21. What is the essential difference between a block special file and a character special file?

22. In the example given in Fig. 1-17, the library procedure is called read and the system call itself is called read. Is it essential that both of these have the same name? If not, which one is more important?

23. The client-server model is popular in distributed systems. Can it also be used in a single-computer system?


24. To a programmer, a system call looks like any other call to a library procedure. Is it important that a programmer know which library procedures result in system calls? Under what circumstances and why?

27. Write a shell that is similar to Fig. 1-19 but contains enough code that it actually works so you can test it. You might also add some features such as redirection of input and output, pipes, and background jobs.
unlimited number of child processes and observe what happens. Before running the experiment, type sync to the shell to flush the file system buffers to disk to avoid ruining the file system. Note: Do not try this on a shared system without first getting permission from the system administrator. The consequences will be instantly obvious so you are likely to be caught and sanctions may follow.

29. Examine and try to interpret the contents of a UNIX-like or Windows directory with a tool like the UNIX od program or the MS-DOS DEBUG program. Hint: How you do this will depend upon what the OS allows. One trick that may work is to create a directory on a floppy disk with one operating system and then read the raw disk data using a different operating system that allows such access.

PROCESSES AND THREADS

We are now about to embark on a detailed study of how operating systems are designed and constructed. The most central concept in any operating system is the process: an abstraction of a running program. Everything else hinges on this concept, and it is important that the operating system designer (and student) have a thorough understanding of what a process is as early as possible.

2.1 PROCESSES
All modern computers can do several things at the same time. While running a user program, a computer can also be reading from a disk and outputting text to a screen or printer. In a multiprogramming system, the CPU also switches from program to program, running each for tens or hundreds of milliseconds. While, strictly speaking, at any instant of time, the CPU is running only one program, in the course of 1 second, it may work on several programs, thus giving the users the illusion of parallelism. Sometimes people speak of pseudoparallelism in this context, to contrast it with the true hardware parallelism of multiprocessor systems (which have two or more CPUs sharing the same physical memory). Keeping track of multiple, parallel activities is hard for people to do. Therefore, operating system designers over the years have evolved a conceptual model (sequential processes) that makes parallelism easier to deal with. That model, its uses, and some of its consequences form the subject of this chapter.


2.1.1 The Process Model
In this model, all the runnable software on the computer, sometimes including the operating system, is organized into a number of sequential processes, or just processes for short. A process is just an executing program, including the current values of the program counter, registers, and variables. Conceptually, each process has its own virtual CPU. In reality, of course, the real CPU switches back and forth from process to process, but to understand the system, it is much easier to think about a collection of processes running in (pseudo) parallel, than to try to keep track of how the CPU switches from program to program. This rapid switching back and forth is called multiprogramming, as we saw in Chap. 1.

In Fig. 2-1(a) we see a computer multiprogramming four programs in memory. In Fig. 2-1(b) we see four processes, each with its own flow of control (i.e., its own logical program counter), and each one running independently of the other ones. Of course, there is only one physical program counter, so when each process runs, its logical program counter is loaded into the real program counter. When it is finished for the time being, the physical program counter is saved in the process' logical program counter in memory. In Fig. 2-1(c) we see that viewed over a long enough time interval, all the processes have made progress, but at any given instant only one process is actually running.
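Figure 2-1. (a) Multiprogramming of four programs, with one program counter and a process switch between programs. (b) Conceptual model of four independent, sequential processes, with four program counters. (c) Only one program is active at any instant.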
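The loading and saving of logical program counters described above can be imitated in a few lines of C. The following is only a sketch under simplifying assumptions: a "program" is just a count of instructions, the names struct proc, run_slice, and multiprogram are invented for this illustration, and the round-robin order is one arbitrary choice among many.

```c
#include <assert.h>
#include <stddef.h>

#define NPROC 4                 /* four processes, as in Fig. 2-1 */

/* Hypothetical per-process record: only the saved (logical) program counter. */
struct proc {
    int logical_pc;             /* where this process will resume */
    int length;                 /* number of "instructions" in its program */
};

static int physical_pc;         /* the single real program counter */

/* Run process p for at most `quantum` instructions, then switch away. */
static void run_slice(struct proc *p, int quantum)
{
    physical_pc = p->logical_pc;            /* load logical PC into the real one */
    while (quantum-- > 0 && physical_pc < p->length)
        physical_pc++;                      /* "execute" one instruction */
    p->logical_pc = physical_pc;            /* save the real PC on the switch */
}

/* Take turns among the processes until all have finished.
 * Returns the number of time slices that were handed out. */
int multiprogram(struct proc table[], size_t n, int quantum)
{
    int slices = 0;
    int unfinished;

    assert(quantum > 0);
    do {
        unfinished = 0;
        for (size_t i = 0; i < n; i++) {
            if (table[i].logical_pc < table[i].length) {
                run_slice(&table[i], quantum);
                slices++;
                if (table[i].logical_pc < table[i].length)
                    unfinished = 1;
            }
        }
    } while (unfinished);
    return slices;
}
```

Calling multiprogram on a table of four entries with a small quantum shows the behavior of Fig. 2-1(c): every process makes progress over time, yet physical_pc belongs to only one of them at any instant.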

With the CPU switching back and forth among the processes, the rate at which a process performs its computation will not be uniform and probably not even reproducible if the same processes are run again. Thus, processes must not be programmed with built-in assumptions about timing. Consider, for example, an I/O process that starts a streamer tape to restore backed up files, executes an idle loop 10,000 times to let it get up to speed, and then issues a command to read the first record. If the CPU decides to switch to another process during the idle loop, the tape process might not run again until after the first record was already past the read head. When a process has critical real-time requirements like this, that is, particular events must occur within a specified number of milliseconds, special measures must be taken to ensure that they do occur. Normally, however, most processes are not affected by the underlying multiprogramming of the CPU or the relative speeds of different processes.

The difference between a process and a program is subtle, but crucial. An analogy may help here. Consider a culinary-minded computer scientist who is baking a birthday cake for his daughter. He has a birthday cake recipe and a kitchen well stocked with all the input: flour, eggs, sugar, extract of vanilla, and so on. In this analogy, the recipe is the program (i.e., an algorithm expressed in some suitable notation), the computer scientist is the processor (CPU), and the cake ingredients are the input data. The process is the activity consisting of our baker reading the recipe, fetching the ingredients, and baking the cake.

Now imagine that the computer scientist's son comes running in crying, saying that he has been stung by a bee. The computer scientist records where he was in the recipe (the state of the current process is saved), gets out a first aid book, and begins following the directions in it. Here we see the processor being switched from one process (baking) to a higher-priority process (administering medical care), each having a different program (recipe versus first aid book). When the bee sting has been taken care of, the computer scientist goes back to his cake, continuing at the point where he left off.

The key idea here is that a process is an activity of some kind. It has a program, input, output, and a state. A single processor may be shared among several processes, with some scheduling algorithm being used to determine when to stop work on one process and service a different one.

2.1.2 Process Creation
Operating systems need some way to make sure all the necessary processes exist. In very simple systems, or in systems designed for running only a single application (e.g., the controller in a microwave oven), it may be possible to have all the processes that will ever be needed be present when the system comes up. In general-purpose systems, however, some way is needed to create and terminate processes as needed during operation. We will now look at some of the issues.

There are four principal events that cause processes to be created:

1. System initialization.
2. Execution of a process creation system call by a running process.
3. A user request to create a new process.
4. Initiation of a batch job.

When an operating system is booted, typically several processes are created. Some of these are foreground processes, that is, processes that interact with (human) users and perform work for them. Others are background processes, which are not associated with particular users, but instead have some specific function. For example, one background process may be designed to accept incoming email, sleeping most of the day but suddenly springing to life when email arrives. Another background process may be designed to accept incoming requests for Web pages hosted on that machine, waking up when a request arrives to service the request. Processes that stay in the background to handle some activity such as email, Web pages, news, printing, and so on are called daemons. Large systems commonly have dozens of them. In UNIX, the ps program can be used to list the running processes. In Windows 95/98/Me, typing CTRL-ALT-DEL once shows what's running. In Windows 2000, the task manager is used.

In addition to the processes created at boot time, new processes can be created afterward as well. Often a running process will issue system calls to create one or more new processes to help it do its job. Creating new processes is particularly useful when the work to be done can easily be formulated in terms of several related, but otherwise independent interacting processes. For example, if a large amount of data is being fetched over a network for subsequent processing, it may be convenient to create one process to fetch the data and a second one to process them.

In interactive systems, users can start a program by typing a command or (double) clicking an icon. Taking either of these actions starts a new process and runs the selected program in it. In command-based UNIX systems running X Windows, the new process takes over the window in which it was started. In Microsoft Windows, when a process is started it does not have a window, but it can create one (or more) and most do. In both systems, users may have multiple windows open at once, each running some process. Using the mouse, the user can select a window and interact with the process, for example, providing input when needed.

Technically, in all these cases, a new process is created by having an existing process execute a process creation system call. That process may be a running user process, a system process invoked from the keyboard or mouse, or a batch manager process. What that process does is execute a system call to create the new process. This system call tells the operating system to create a new process and indicates, directly or indirectly, which program to run in it.

In UNIX, there is only one system call to create a new process: fork. This call creates an exact clone of the calling process. After the fork, the two processes, the parent and the child, have the same memory image, the same environment strings, and the same open files. That is all there is. Usually, the child process then executes execve or a similar system call to change its memory image and run a new program. For example, when a user types a command, say, sort, to the shell, the shell forks off a child process and the child executes sort. The reason for this two-step process is to allow the child to manipulate its file descriptors after the fork but before the execve to accomplish redirection of standard input, standard output, and standard error.

In Windows, in contrast, a single Win32 function call, CreateProcess, handles both process creation and loading the correct program into the new process. This call has 10 parameters, which include the program to be executed, the command line parameters to feed that program, various security attributes, bits that control whether open files are inherited, priority information, a specification of the window to be created for the process (if any), and a pointer to a structure in which information about the newly created process is returned to the caller. In addition to CreateProcess, Win32 has about 100 other functions for managing and synchronizing processes and related topics.

In both UNIX and Windows, after a process is created, both the parent and child have their own distinct address spaces. If either process changes a word in its address space, the change is not visible to the other process. In UNIX, the child's initial address space is a copy of the parent's, but there are two distinct address spaces involved; no writable memory is shared (some UNIX implementations share the program text between the two since that cannot be modified). It is, however, possible for a newly created process to share some of its creator's other resources, such as open files. In Windows, the parent's and child's address spaces are different from the start.
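The two-step UNIX sequence can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the code of any particular shell: the helper names run_redirected and parent_value_after_child_write are invented here, execvp (a library wrapper that searches PATH) stands in for execve, and the exit codes 126 and 127 are merely borrowed from common shell convention.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with standard output redirected to the file `path`.
 * Returns the child's exit status, or -1 on failure. */
int run_redirected(char *const argv[], const char *path)
{
    assert(argv != NULL && argv[0] != NULL && path != NULL);

    pid_t pid = fork();                     /* clone the calling process */
    if (pid < 0)
        return -1;

    if (pid == 0) {                         /* child: after fork, before exec */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(126);
        if (dup2(fd, STDOUT_FILENO) < 0)    /* descriptor 1 now names the file */
            _exit(126);
        close(fd);
        execvp(argv[0], argv);              /* replace the memory image */
        _exit(127);                         /* reached only if the exec failed */
    }

    int status;                             /* parent: wait for the child */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* The child changes its own copy of `word`; the parent's copy is untouched.
 * Returns the parent's value after the child has exited. */
int parent_value_after_child_write(void)
{
    int word = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        word = 2;                           /* visible only in the child */
        _exit(word == 2 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return word;
}
```

The redirection happens entirely in the child, in the window between fork and exec, so the program being started needs no knowledge of it. The second function illustrates the separate address spaces: the parent still sees the value 1.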

2.1.3 Process Termination
After a process has been created, it starts running and does whatever its job is. However, nothing lasts forever, not even processes. Sooner or later the new process will terminate, usually due to one of the following conditions:

1. Normal exit (voluntary).
2. Error exit (voluntary).
3. Fatal error (involuntary).
4. Killed by another process (involuntary).
Most processes terminate because they have done their work. When a compiler has compiled the program given to it, the compiler executes a system call to tell the operating system that it is finished. This call is exit in UNIX and ExitProcess in Windows. Screen-oriented programs also support voluntary termination.


Word processors, Internet browsers and similar programs always have an icon or menu item that the user can click to tell the process to remove any temporary files it has open and then terminate.

The second reason for termination is that the process discovers a fatal error. For example, if a user types the command to compile the program foo.c and no such file exists, the compiler simply exits. Screen-oriented interactive processes generally do not exit when given bad parameters. Instead they pop up a dialog box and ask the user to try again.

The third reason for termination is an error caused by the process, often due to a program bug. Examples include executing an illegal instruction, referencing nonexistent memory, or dividing by zero. In some systems (e.g., UNIX), a process can tell the operating system that it wishes to handle certain errors itself, in which case the process is signaled (interrupted) instead of terminated when one of the errors occurs.

The fourth reason a process might terminate is that a process executes a system call telling the operating system to kill some other process. In UNIX this call is kill. The corresponding Win32 function is TerminateProcess. In both cases, the killer must have the necessary authorization to do in the killee. In some systems, when a process terminates, either voluntarily or otherwise, all processes it created are immediately killed as well. Neither UNIX nor Windows works this way, however.
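On a UNIX system, the difference between a voluntary exit and being killed shows up in the status that the parent collects. The sketch below assumes a POSIX environment. The function names child_exit_code and kill_child are invented for the illustration, and _exit is used in the child as the direct form of the exit call.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits voluntarily with `code`.
 * Returns the code as seen by the parent, or -1 on failure. */
int child_exit_code(int code)
{
    assert(code >= 0 && code <= 255);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                        /* voluntary termination */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Fork a child that only waits, then kill it with signal `sig`.
 * Returns the signal that ended the child, 0 if it exited normally,
 * or -1 on failure. */
int kill_child(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();                        /* block until a signal arrives */
    }
    if (kill(pid, sig) < 0)                 /* requires authority over the target */
        return -1;
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A call such as kill_child(SIGKILL) always ends the child, since that signal cannot be caught or ignored. With a catchable signal such as SIGTERM, the child could instead have installed a handler, which is the "signaled instead of terminated" case mentioned above.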

2.1.4 Process Hierarchies
In some systems, when a process creates another process, the parent process and child process continue to be associated in certain ways. The child process can itself create more processes, forming a process hierarchy. Note that unlike plants and animals that use sexual reproduction, a process has only one parent (but zero, one, two, or more children).

In UNIX, a process and all of its children and further descendants together form a process group. When a user sends a signal from the keyboard, the signal is delivered to all members of the process group currently associated with the keyboard (usually all active processes that were created in the current window). Individually, each process can catch the signal, ignore the signal, or take the default action, which is to be killed by the signal.

As another example of where the process hierarchy plays a role, let us look at how UNIX initializes itself when it is started. A special process, called init, is present in the boot image. When it starts running, it reads a file telling how many terminals there are. Then it forks off one new process per terminal. These processes wait for someone to log in. If a login is successful, the login process executes a shell to accept commands. These commands may start up more processes, and so forth. Thus, all the processes in the whole system belong to a single tree, with init at the root.

In contrast, Windows does not have any concept of a process hierarchy. All processes are equal. The only place where there is something like a process hierarchy is that when a process is created, the parent is given a special token (called a handle) that it can use to control the child. However, it is free to pass this token to some other process, thus invalidating the hierarchy. Processes in UNIX cannot disinherit their children.

2.1.5 Process States
Although each process is an independent entity, with its own program counter and internal state, processes often need to interact with other processes. One process may generate some output that another process uses as input. In the shell command

cat chapter1 chapter2 chapter3 | grep tree

the first process, running cat, concatenates and outputs three files. The second process, running grep, selects all lines containing the word "tree." Depending on the relative speeds of the two processes (which depends on both the relative complexity of the programs and how much CPU time each one has had), it may happen that grep is ready to run, but there is no input waiting for it. It must then block until some input is available.

When a process blocks, it does so because logically it cannot continue, typically because it is waiting for input that is not yet available. It is also possible for a process that is conceptually ready and able to run to be stopped because the operating system has decided to allocate the CPU to another process for a while. These two conditions are completely different. In the first case, the suspension is inherent in the problem (you cannot process the user's command line until it has been typed). In the second case, it is a technicality of the system (not enough CPUs to give each process its own private processor). In Fig. 2-2 we see a state diagram showing the three states a process may be in:
1. Running (actually using the CPU at that instant).
2. Ready (runnable; temporarily stopped to let another process run).
3. Blocked (unable to run until some external event happens).
Logically, the first two states are similar. In both cases the process is willing to run, only in the second one, there is temporarily no CPU available for it. The third state is different from the first two in that the process cannot run, even if the CPU has nothing else to do.

Four transitions are possible among these three states, as shown. Transition 1 occurs when a process discovers that it cannot continue. In some systems the process must execute a system call, such as block or pause, to get into blocked state. In other systems, including UNIX, when a process reads from a pipe or special file (e.g., a terminal) and there is no input available, the process is automatically blocked.

Figure 2-2. A process can be in running, blocked, or ready state. Transitions between these states are as shown:
1. Process blocks for input (running to blocked)
2. Scheduler picks another process (running to ready)
3. Scheduler picks this process (ready to running)
4. Input becomes available (blocked to ready)

Transitions 2 and 3 are caused by the process scheduler, a part of the operating system, without the process even knowing about them. Transition 2 occurs when the scheduler decides that the running process has run long enough, and it is time to let another process have some CPU time. Transition 3 occurs when all the other processes have had their fair share and it is time for the first process to get the CPU to run again. The subject of scheduling, that is, deciding which process should run when and for how long, is an important one; we will look at it later in this chapter. Many algorithms have been devised to try to balance the competing demands of efficiency for the system as a whole and fairness to individual processes. We will study some of them later in this chapter.

Transition 4 occurs when the external event for which a process was waiting (such as the arrival of some input) happens. If no other process is running at that instant, transition 3 will be triggered and the process will start running. Otherwise it may have to wait in ready state for a little while until the CPU is available and its turn comes.

Using the process model, it becomes much easier to think about what is going on inside the system. Some of the processes run programs that carry out commands typed in by a user. Other processes are part of the system and handle tasks such as carrying out requests for file services or managing the details of running a disk or a tape drive. When a disk interrupt occurs, the system makes a decision to stop running the current process and run the disk process, which was blocked waiting for that interrupt. Thus, instead of thinking about interrupts, we can think about user processes, disk processes, terminal processes, and so on, which block when they are waiting for something to happen. When the disk has been read or the character typed, the process waiting for it is unblocked and is eligible to run again.

This view gives rise to the model shown in Fig. 2-3. Here the lowest level of the operating system is the scheduler, with a variety of processes on top of it. All the interrupt handling and details of actually starting and stopping processes are hidden away in what is here called the scheduler, which is actually not much code. The rest of the operating system is nicely structured in process form. Few real systems are as nicely structured as this, however.
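Figure 2-3. The lowest level of a process-structured operating system is the scheduler, which handles interrupts and scheduling. Above that level are sequential processes.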
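The three process states and the four legal transitions among them are compact enough to write down as a table. The following C sketch is one possible encoding, with the names pstate, legal, and transition chosen for this illustration only. It rejects any move that the state diagram does not contain, such as going from blocked directly to running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pstate { RUNNING, READY, BLOCKED, NSTATES };

/* legal[from][to] is true for exactly the four transitions of Fig. 2-2. */
static const bool legal[NSTATES][NSTATES] = {
    [RUNNING] = { [BLOCKED] = true,     /* 1: process blocks for input       */
                  [READY]   = true },   /* 2: scheduler picks another process */
    [READY]   = { [RUNNING] = true },   /* 3: scheduler picks this process    */
    [BLOCKED] = { [READY]   = true },   /* 4: input becomes available         */
};

/* Move *s to state `to` if the diagram allows it.
 * Returns false and leaves *s unchanged otherwise. */
bool transition(enum pstate *s, enum pstate to)
{
    assert(s != NULL && *s < NSTATES && to < NSTATES);
    if (!legal[*s][to])
        return false;
    *s = to;
    return true;
}
```

Note that a blocked process can reach the running state only by way of ready. This matches the description of transition 4, after which the process may still have to wait for its turn.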

2.1.6 Implementation of Processes
To implement the process model, the operating system maintains a table (an array of structures), called the process table, with one entry per process. (Some authors call these entries process control blocks.) This entry contains information about the process' state, its program counter, stack pointer, memory allocation, the status of its open files, its accounting and scheduling information, and everything else about the process that must be saved when the process is switched from running to ready or blocked state so that it can be restarted later as if it had never been stopped.

Figure 2-4 shows some of the more important fields in a typical system. The fields in the first column relate to process management. The other two columns relate to memory management and file management, respectively. It should be noted that precisely which fields the process table has is highly system dependent, but this figure gives a general idea of the kinds of information needed.

Now that we have looked at the process table, it is possible to explain a little more about how the illusion of multiple sequential processes is maintained on a machine with one CPU and many I/O devices. Associated with each I/O device class (e.g., floppy disks, hard disks, timers, terminals) is a location (often near the bottom of memory) called the interrupt vector. It contains the address of the interrupt service procedure. Suppose that user process 3 is running when a disk interrupt occurs. User process 3's program counter, program status word, and possibly one or more registers are pushed onto the (current) stack by the interrupt hardware. The computer then jumps to the address specified in the disk interrupt vector. That is all the hardware does. From here on, it is up to the software, in particular, the interrupt service procedure.

All interrupts start by saving the registers, often in the process table entry for the current process.
Then the information pushed onto the stack by the interrupt is removed and the stack pointer is set to point to a temporary stack used by the process handler. Actions such as saving the registers and setting the stack pointer cannot even be expressed in high-level languages such as C, so they are performed by a small assembly language routine, usually the same one for all interrupts since the work of saving the registers is identical, no matter what the cause of the interrupt is.

Process management           Memory management           File management
Registers                    Pointer to text segment     Root directory
Program counter              Pointer to data segment     Working directory
Program status word          Pointer to stack segment    File descriptors
Stack pointer                                            User ID
Process state                                            Group ID
Priority
Scheduling parameters
Process ID
Parent process
Process group
Signals
Time when process started
CPU time used
Children's CPU time
Time of next alarm

Figure 2-4. Some of the more important fields of a typical process table entry.

When this routine is finished, it calls a C procedure to do the rest of the work for this specific interrupt type. (We assume the operating system is written in C, the usual choice for all real operating systems.) When it has done its job, possibly making some process now ready, the scheduler is called to see who to run next. After that, control is passed back to the assembly language code to load up the registers and memory map for the now-current process and start it running. Interrupt handling and scheduling are summarized in Fig. 2-5. It is worth noting that the details vary somewhat from system to system.
1. Hardware stacks program counter, etc.
2. Hardware loads new program counter from interrupt vector.
3. Assembly language procedure saves registers.
4. Assembly language procedure sets up new stack.
5. C interrupt service runs (typically reads and buffers input).
6. Scheduler decides which process is to run next.
7. C procedure returns to the assembly code.
8. Assembly language procedure starts up new current process.

Figure 2-5. Summary of interrupt handling and scheduling.
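To make the role of the process table more concrete, here is a small simulation in C of the software part of the sequence in Fig. 2-5. It is a sketch under heavy simplification, not how any real kernel is written. The structures context and proc, the variable cpu standing in for the hardware registers, and the functions pick_next and handle_interrupt are all invented for the illustration. The register save and restore, which a real system does in assembly language, are plain structure copies here, and the scheduler is a trivial round robin.

```c
#include <assert.h>

#define NPROC 8
#define NREGS 4

enum pstate { UNUSED, RUNNING, READY, BLOCKED };

/* Hypothetical machine context; real layouts depend on the hardware. */
struct context {
    unsigned long pc;               /* program counter */
    unsigned long psw;              /* program status word */
    unsigned long sp;               /* stack pointer */
    unsigned long regs[NREGS];
};

/* A few of the process management fields of Fig. 2-4. */
struct proc {
    struct context ctx;
    enum pstate state;
    int pid;
    int parent;
    int priority;
    unsigned long cpu_time;
};

static struct proc proc_table[NPROC];
static struct context cpu;          /* stands in for the real CPU registers */
static int current;                 /* slot of the running process */

static void save_context(struct proc *p)          { p->ctx = cpu; }
static void restore_context(const struct proc *p) { cpu = p->ctx; }

/* Round robin: the next READY slot after `from`, or `from` itself. */
static int pick_next(int from)
{
    for (int i = 1; i <= NPROC; i++) {
        int cand = (from + i) % NPROC;
        if (proc_table[cand].state == READY)
            return cand;
    }
    return from;
}

/* Simulated interrupt on behalf of the process in slot `woken`.
 * Returns the slot of the process that runs next. */
int handle_interrupt(int woken)
{
    assert(woken >= 0 && woken < NPROC);

    save_context(&proc_table[current]);         /* step 3: save registers */
    proc_table[current].state = READY;          /* interrupted, still runnable */
    if (proc_table[woken].state == BLOCKED)     /* step 5: its event happened */
        proc_table[woken].state = READY;

    current = pick_next(current);               /* step 6: scheduler */
    proc_table[current].state = RUNNING;
    restore_context(&proc_table[current]);      /* step 8: start new current */
    return current;
}
```

If slot 0 is running and slot 3 is blocked, handle_interrupt(3) saves the registers of process 0 in its table entry, makes process 3 ready, and resumes it with the context that was stored in its own entry. A later interrupt switches back, and process 0 continues with exactly the register values it had when it was stopped.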


2.2 THREADS

In traditional operating systems, each process has an address space and a single thread of control. In fact, that is almost the definition of a process. Nevertheless, there are frequently situations in which it is desirable to have multiple threads of control in the same address space running in quasi-parallel, as though they were separate processes (except for the shared address space). In the following sections we will discuss these situations and their implications.

2.2.1 The Thread Model
The process model as we have discussed it thus far is based on two independent concepts: resource grouping and execution. Sometimes it is useful to separate them; this is where threads come in.

One way of looking at a process is that it is a way to group related resources together. A process has an address space containing program text and data, as well as other resources. These resources may include open files, child processes, pending alarms, signal handlers, accounting information, and more. By putting them together in the form of a process, they can be managed more easily.

The other concept a process has is a thread of execution, usually shortened to just thread. The thread has a program counter that keeps track of which instruction to execute next. It has registers, which hold its current working variables. It has a stack, which contains the execution history, with one frame for each procedure called but not yet returned from. Although a thread must execute in some process, the thread and its process are different concepts and can be treated separately. Processes are used to group resources together; threads are the entities scheduled for execution on the CPU.

What threads add to the process model is to allow multiple executions to take place in the same process environment, to a large degree independent of one another. Having multiple threads running in parallel in one process is analogous to having multiple processes running in parallel in one computer. In the former case, the threads share an address space, open files, and other resources. In the latter case, processes share physical memory, disks, printers, and other resources. Because threads have some of the properties of processes, they are sometimes called lightweight processes. The term multithreading is also used to describe the situation of allowing multiple threads in the same process.

In Fig. 2-6(a) we see three traditional processes. Each process has its own address space and a single thread of control. In contrast, in Fig. 2-6(b) we see a single process with three threads of control. Although in both cases we have three threads, in Fig. 2-6(a) each of them operates in a different address space, whereas in Fig. 2-6(b) all three of them share the same address space.

When a multithreaded process is run on a single-CPU system, the threads take turns running. In Fig. 2-1, we saw how multiprogramming of processes works.

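Figure 2-6. (a) Three processes, each with one thread. (b) One process with three threads. In both parts the processes and threads run in user space, above the kernel in kernel space.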

By switching back and forth among multiple processes, the system gives the illusion of separate sequential processes running in parallel. Multithreading works the same way. The CPU switches rapidly back and forth among the threads providing the illusion that the threads are running in parallel, albeit on a slower CPU than the real one. With three compute-bound threads in a process, the threads would appear to be running in parallel, each one on a CPU with one-third the speed of the real CPU.

Different threads in a process are not quite as independent as different processes. All threads have exactly the same address space, which means that they also share the same global variables. Since every thread can access every memory address within the process' address space, one thread can read, write, or even completely wipe out another thread's stack. There is no protection between threads because (1) it is impossible, and (2) it should not be necessary. Unlike different processes, which may be from different users and which may be hostile to one another, a process is always owned by a single user, who has presumably created multiple threads so that they can cooperate, not fight. In addition to sharing an address space, all the threads share the same set of open files, child processes, alarms, and signals, etc. as shown in Fig. 2-7. Thus the organization of Fig. 2-6(a) would be used when the three processes are essentially unrelated, whereas Fig. 2-6(b) would be appropriate when the three threads are actually part of the same job and are actively and closely cooperating with each other.

The items in the first column are process properties, not thread properties. For example, if one thread opens a file, that file is visible to the other threads in the process and they can read and write it. This is logical since the process is the unit of resource management, not the thread.
If each thread had its own address space, open files, pending alarms, and so on, it would be a separate process. What we are trying to achieve with the thread concept is the ability for multiple threads of execution to share a set of resources so that they can work together closely to perform some task.

Per process items                Per thread items
Address space                    Program counter
Global variables                 Registers
Open files                       Stack
Child processes                  State
Pending alarms
Signals and signal handlers
Accounting information

Figure 2-7. The first column lists some items shared by all threads in a process. The second one lists some items private to each thread.

Like a traditional process (i.e., a process with only one thread), a thread can be in any one of several states: running, blocked, ready, or terminated. A running thread currently has the CPU and is active. A blocked thread is waiting for some event to unblock it. For example, when a thread performs a system call to read from the keyboard, it is blocked until input is typed. A thread can block waiting for some external event to happen or for some other thread to unblock it. A ready thread is scheduled to run and will as soon as its turn comes up. The transitions between thread states are the same as the transitions between process states and are illustrated in Fig. 2-2.

It is important to realize that each thread has its own stack, as shown in Fig. 2-8. Each thread's stack contains one frame for each procedure called but not yet returned from. This frame contains the procedure's local variables and the return address to use when the procedure call has finished. For example, if procedure X calls procedure Y and this one calls procedure Z, while Z is executing the frames for X, Y, and Z will all be on the stack. Each thread will generally call different procedures and thus have a different execution history. This is why a thread needs its own stack.

When multithreading is present, processes normally start with a single thread present. This thread has the ability to create new threads by calling a library procedure, for example, thread_create. A parameter to thread_create typically specifies the name of a procedure for the new thread to run. It is not necessary (or even possible) to specify anything about the new thread's address space since it automatically runs in the address space of the creating thread. Sometimes threads are hierarchical, with a parent-child relationship, but often no such relationship exists, with all threads being equal.
With or without a hierarchical relationship, the creating thread is usually returned a thread identifier that names the new thread.

Figure 2-8. Each thread has its own stack. (The figure shows one process containing threads 1, 2, and 3, each with its own stack, running above the kernel.)

When a thread has finished its work, it can exit by calling a library procedure, say, thread_exit. It then vanishes and is no longer schedulable. In some thread systems, one thread can wait for a (specific) thread to exit by calling a procedure, for example, thread_wait. This procedure blocks the calling thread until a (specific) thread has exited. In this regard, thread creation and termination is very much like process creation and termination, with approximately the same options as well.

Another common thread call is thread_yield, which allows a thread to voluntarily give up the CPU to let another thread run. Such a call is important because there is no clock interrupt to actually enforce timesharing as there is with processes. Thus it is important for threads to be polite and voluntarily surrender the CPU from time to time to give other threads a chance to run. Other calls allow one thread to wait for another thread to finish some work, for a thread to announce that it has finished some work, and so on.

While threads are often useful, they also introduce a number of complications into the programming model. To start with, consider the effects of the UNIX fork system call. If the parent process has multiple threads, should the child also have them? If not, the process may not function properly, since all of them may be essential.

However, if the child process gets as many threads as the parent, what happens if a thread in the parent was blocked on a read call, say, from the keyboard? Are two threads now blocked on the keyboard, one in the parent and one in the child? When a line is typed, do both threads get a copy of it? Only the parent? Only the child? The same problem exists with open network connections.

Another class of problems is related to the fact that threads share many data structures. What happens if one thread closes a file while another one is still reading from it? Suppose that one thread notices that there is too little memory and starts allocating more memory. Part way through, a thread switch occurs, and the new thread also notices that there is too little memory and also starts allocating more memory. Memory will probably be allocated twice. These problems can be solved with some effort, but careful thought and design are needed to make multithreaded programs work correctly.
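As one concrete illustration of the thread calls and of the double-allocation problem just described, the sketch below uses POSIX threads, which is only one of many thread packages and is not the package the generic names above come from. Roughly, pthread_create plays the role of thread_create, returning from the thread's procedure (or calling pthread_exit) that of thread_exit, pthread_join that of thread_wait, and sched_yield that of thread_yield. The variables pool_blocks and grow_calls and the functions ensure_pool and run_workers are invented for the example. The mutex is one common remedy for the race, not the only one.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define NTHREADS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int pool_blocks = 0;         /* shared: blocks currently in the pool */
static int grow_calls = 0;          /* shared: how often the pool was grown */

/* Grow the pool to at least `needed` blocks. The test and the update form
 * one critical section, so two threads cannot both decide to grow it. */
static void ensure_pool(int needed)
{
    pthread_mutex_lock(&lock);
    if (pool_blocks < needed) {
        sched_yield();              /* invite a thread switch part way through */
        pool_blocks = needed;
        grow_calls++;
    }
    pthread_mutex_unlock(&lock);
}

static void *worker(void *arg)
{
    ensure_pool(*(int *)arg);
    return NULL;                    /* returning ends this thread */
}

/* Start NTHREADS threads that all notice the same shortage.
 * Returns how many times the pool was grown, or -1 on failure. */
int run_workers(int needed)
{
    pthread_t tid[NTHREADS];
    int started = 0;
    int ok = 1;

    assert(needed > 0);
    pool_blocks = 0;
    grow_calls = 0;
    for (; started < NTHREADS; started++) {
        if (pthread_create(&tid[started], NULL, worker, &needed) != 0) {
            ok = 0;
            break;
        }
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);     /* wait for each specific thread */
    return ok ? grow_calls : -1;
}
```

With the lock in place the pool is grown exactly once, no matter how the threads are interleaved. Without it, a switch at the point marked by sched_yield could let a second thread pass the same test and grow the pool again.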

2.2.2 Thread Usage
Having described what threads are, it is now time to explain why anyone wants them. The main reason for having threads is that in many applications, multiple activities are going on at once. Some of these may block from time to time. By decomposing such an application into multiple sequential threads that run in quasi-parallel, the programming model becomes simpler.

We have seen this argument before. It is precisely the argument for having processes. Instead of thinking about interrupts, timers, and context switches, we can think about parallel processes. Only now with threads we add a new element: the ability for the parallel entities to share an address space and all of its data among themselves. This ability is essential for certain applications, which is why having multiple processes (with their separate address spaces) will not work.

A second argument for having threads is that since they do not have any resources attached to them, they are easier to create and destroy than processes. In many systems, creating a thread goes 100 times faster than creating a process. When the number of threads needed changes dynamically and rapidly, this property is useful.

A third reason for having threads is also a performance argument. Threads yield no performance gain when all of them are CPU bound, but when there is substantial computing and also substantial I/O, having threads allows these activities to overlap, thus speeding up the application.

Finally, threads are useful on systems with multiple CPUs, where real parallelism is possible. We will come back to this issue in Chap. 8.

It is probably easiest to see why threads are useful by giving some concrete examples. As a first example, consider a word processor. Most word processors display the document being created on the screen formatted exactly as it will appear on the printed page.
In particular, all the line breaks and page breaks are in their correct and final position so the user can inspect them and change the document if need be (e.g., to eliminate widows and orphans, the incomplete top and bottom lines on a page, which are considered esthetically unpleasing).

Suppose that the user is writing a book. From the author's point of view, it is easiest to keep the entire book as a single file to make it easier to search for topics, perform global substitutions, and so on. Alternatively, each chapter might be a separate file. However, having every section and subsection as a separate file is a real nuisance when global changes have to be made to the entire book since then hundreds of files have to be individually edited. For example, if proposed standard xxxx is approved just before the book goes to press, all occurrences of "Draft Standard xxxx" have to be changed to "Standard xxxx" at the last minute. If the entire book is one file, typically a single command can do all the substitutions. In contrast, if the book is spread over 300 files, each one must be edited separately.

Now consider what happens when the user suddenly deletes one sentence from page 1 of an 800-page document. After checking the changed page to make sure it is correct, the user now wants to make another change on page 600 and types in a command telling the word processor to go to that page (possibly by searching for a phrase occurring only there). The word processor is now forced to reformat the entire book up to page 600 on the spot because it does not know what the first line of page 600 will be until it has processed all the previous pages. There may be a substantial delay before page 600 can be displayed, leading to an unhappy user.

Threads can help here. Suppose that the word processor is written as a two-threaded program. One thread interacts with the user and the other handles reformatting in the background. As soon as the sentence is deleted from page 1, the interactive thread tells the reformatting thread to reformat the whole book. Meanwhile, the interactive thread continues to listen to the keyboard and mouse and responds to simple commands like scrolling page 1 while the other thread is computing madly in the background. With a little luck, the reformatting will be completed before the user asks to see page 600, so it can be displayed instantly.

While we are at it, why not add a third thread? Many word processors have a feature of automatically saving the entire file to disk every few minutes to protect the user against losing a day's work in the event of a program crash, system crash, or power failure. The third thread can handle the disk backups without interfering with the other two. The situation with three threads is shown in Fig. 2-9.

Figure 2-9. A word processor with three threads.
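To make the three-thread structure concrete, here is a minimal sketch using POSIX threads. It is only an illustration under stated assumptions, not how any real word processor is built: struct document, its fields, and the function names are hypothetical, the actual reformatting and disk writing are reduced to bookkeeping on version numbers, and a period of at most one second stands in for the "every few minutes" of a real backup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical shared document state; all three threads use the same struct. */
struct document {
    pthread_mutex_t lock;
    pthread_cond_t  edited;     /* wakes the reformatting thread */
    pthread_cond_t  stopping;   /* wakes the backup thread early at exit */
    int  version;               /* bumped by every edit */
    int  formatted;             /* last version that was reformatted */
    int  saved;                 /* last version copied to "disk" */
    bool quit;
};

/* Thread 2: sleeps until the text changes, then reformats in the background. */
static void *reformat_thread(void *arg)
{
    struct document *d = arg;
    pthread_mutex_lock(&d->lock);
    for (;;) {
        while (d->formatted == d->version && !d->quit)
            pthread_cond_wait(&d->edited, &d->lock);
        if (d->formatted == d->version)
            break;                        /* quit requested, nothing pending */
        int target = d->version;
        pthread_mutex_unlock(&d->lock);   /* let the user keep typing */
        /* ... lengthy repagination of version `target` would go here ... */
        pthread_mutex_lock(&d->lock);
        d->formatted = target;
    }
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Thread 3: saves periodically; the short period stands in for minutes. */
static void *backup_thread(void *arg)
{
    struct document *d = arg;
    pthread_mutex_lock(&d->lock);
    while (!d->quit) {
        struct timespec until = { .tv_sec = time(NULL) + 1 };
        pthread_cond_timedwait(&d->stopping, &d->lock, &until);
        d->saved = d->version;            /* stands in for writing the file */
    }
    d->saved = d->version;                /* final save on exit */
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Thread 1 (the caller): makes `edits` changes, then shuts everything down.
 * Returns 1 if both helper threads ended up in step with the final version. */
int run_word_processor(int edits)
{
    struct document d = { .version = 0 };
    pthread_t fmt, bak;
    pthread_mutex_init(&d.lock, NULL);
    pthread_cond_init(&d.edited, NULL);
    pthread_cond_init(&d.stopping, NULL);
    pthread_create(&fmt, NULL, reformat_thread, &d);
    pthread_create(&bak, NULL, backup_thread, &d);

    for (int i = 0; i < edits; i++) {     /* each edit returns immediately */
        pthread_mutex_lock(&d.lock);
        d.version++;
        pthread_cond_signal(&d.edited);
        pthread_mutex_unlock(&d.lock);
    }
    pthread_mutex_lock(&d.lock);
    d.quit = true;
    pthread_cond_signal(&d.edited);
    pthread_cond_signal(&d.stopping);
    pthread_mutex_unlock(&d.lock);
    pthread_join(fmt, NULL);
    pthread_join(bak, NULL);
    return d.formatted == edits && d.saved == edits;
}
```

The point of the sketch is that all three threads operate on the same struct document, which is exactly what separate processes could not do. The interactive thread never waits for reformatting or for the backup; it only takes the lock briefly to record an edit.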

If the program were single-threaded, then whenever a disk backup started, commands from the keyboard and mouse would be ignored until the backup was finished. The user would perceive this as sluggish performance. Alternatively, keyboard and mouse events could interrupt the disk backup, allowing good performance but leading to a complex interrupt-driven programming model. With three threads, the programming model is much simpler. The first thread just interacts with the user. The second thread reformats the document when told to. The third thread writes the contents of RAM to disk periodically.

It should be clear that having three separate processes would not work here because all three threads need to operate on the document. By having three threads instead of three processes, they share a common memory and thus all have access to the document being edited.

An analogous situation exists with many other interactive programs. For example, an electronic spreadsheet is a program that allows a user to maintain a matrix, some of whose elements are data provided by the user. Other elements are computed based on the input data using potentially complex formulas. When a user changes one element, many other elements may have to be recomputed. By having a background thread do the recomputation, the interactive thread can allow the user to make additional changes while the computation is going on. Similarly, a third thread can handle periodic backups to disk on its own.

Now consider yet another example of where threads are useful: a server for a World Wide Web site. Requests for pages come in and the requested page is sent back to the client. At most Web sites, some pages are more commonly accessed than other pages. For example, Sony's home page is accessed far more than a page deep in the tree containing the technical specifications of some particular camcorder.
Web servers use this fact to improve performance by maintaining a collection of heavily used pages in main memory to eliminate the need to go to disk to get them. Such a collection is called a cache and is used in many other contexts as well.

One way to organize the Web server is shown in Fig. 2-10(a). Here one thread, the dispatcher, reads incoming requests for work from the network. After examining the request, it chooses an idle (i.e., blocked) worker thread and hands it the request, possibly by writing a pointer to the message into a special word associated with each thread. The dispatcher then wakes up the sleeping worker, moving it from blocked state to ready state.

When the worker wakes up, it checks to see if the request can be satisfied from the Web page cache, to which all threads have access. If not, it starts a read operation to get the page from the disk and blocks until the disk operation completes. When the thread blocks on the disk operation, another thread is chosen to run, possibly the dispatcher, in order to acquire more work, or possibly another worker that is now ready to run.

Figure 2-10. A multithreaded Web server. (The Web server process, containing the dispatcher thread, the worker threads, and the Web page cache, runs in user space above the kernel.)

This model allows the server to be written as a collection of sequential threads. The dispatcher's program consists of an infinite loop for getting a work request and handing it off to a worker. Each worker's code consists of an infinite loop consisting of accepting a request from the dispatcher and checking the Web cache to see if the page is present. If so, it is returned to the client and the worker blocks waiting for a new request. If not, it gets the page from the disk, returns it to the client, and blocks waiting for a new request.

A rough outline of the code is given in Fig. 2-11. Here, as in the rest of this book, TRUE is assumed to be the constant 1. Also, buf and page are structures appropriate for holding a work request and a Web page, respectively.
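(a) Dispatcher thread:

    while (TRUE) {
        get_next_request(&buf);               /* read the next request from the network */
        handoff_work(&buf);                   /* give it to an idle worker */
    }

(b) Worker thread:

    while (TRUE) {
        wait_for_work(&buf);                  /* block until handed a request */
        look_for_page_in_cache(&buf, &page);
        if (page_not_in_cache(&page))
            read_page_from_disk(&buf, &page); /* blocks this worker only */
        return_page(&page);                   /* send the page to the client */
    }

Figure 2-11. A rough outline of the code for Fig. 2-10. (a) Dispatcher thread. (b) Worker thread.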
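The dispatcher and worker loops outlined above can be turned into a small runnable sketch with POSIX threads. What follows is one possible rendering under several assumptions, not the code of any actual Web server: requests are reduced to page numbers, the network and the disk are not modeled, and the names slot, cached, handoff_work, and run_server are hypothetical. The array slot plays the role of the "special word associated with each thread".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NWORKERS 3
#define NPAGES   8
#define EMPTY  (-1)             /* worker is idle, no request pending */
#define STOP   (-2)             /* tells a worker to exit */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_ready  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t worker_idle = PTHREAD_COND_INITIALIZER;
static int  slot[NWORKERS];     /* per-worker word holding the request */
static bool cached[NPAGES];     /* Web page cache shared by all threads */
static int  hits, disk_reads;

static void *worker(void *arg)
{
    int me = *(int *)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (slot[me] == EMPTY)                   /* wait_for_work */
            pthread_cond_wait(&work_ready, &lock);
        int page = slot[me];
        if (page == STOP) {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        bool hit = cached[page];                    /* look_for_page_in_cache */
        pthread_mutex_unlock(&lock);

        /* On a miss a real worker would block here in a disk read while
         * the dispatcher and the other workers keep running. */

        pthread_mutex_lock(&lock);
        if (hit) {
            hits++;
        } else {
            disk_reads++;                           /* read_page_from_disk */
            cached[page] = true;
        }
        slot[me] = EMPTY;                           /* return_page; idle again */
        pthread_cond_signal(&worker_idle);
        pthread_mutex_unlock(&lock);
    }
}

/* Give `request` to some idle worker, blocking until one is available. */
static void handoff_work(int request)
{
    pthread_mutex_lock(&lock);
    for (;;) {
        for (int w = 0; w < NWORKERS; w++) {
            if (slot[w] == EMPTY) {
                slot[w] = request;
                pthread_cond_broadcast(&work_ready); /* wake the sleeper */
                pthread_mutex_unlock(&lock);
                return;
            }
        }
        pthread_cond_wait(&worker_idle, &lock);
    }
}

/* Dispatcher: serves requests[0..n-1], then stops the workers.
 * Returns the number of requests answered. */
int run_server(const int *requests, int n)
{
    pthread_t tid[NWORKERS];
    int id[NWORKERS];
    hits = disk_reads = 0;
    for (int p = 0; p < NPAGES; p++)
        cached[p] = false;
    for (int w = 0; w < NWORKERS; w++) {
        slot[w] = EMPTY;
        id[w] = w;
        pthread_create(&tid[w], NULL, worker, &id[w]);
    }
    for (int i = 0; i < n; i++)                     /* get_next_request */
        handoff_work(requests[i]);
    for (int w = 0; w < NWORKERS; w++)
        handoff_work(STOP);
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(tid[w], NULL);
    return hits + disk_reads;
}
```

Each worker is still an ordinary sequential loop. Because the simulated disk read happens with the lock released, two workers may both miss on the same page, so the split between hits and disk reads can vary from run to run; only their sum is fixed.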

Consider how the Web server could be written in the absence of threads. One possibility is to have it operate as a single thread. The main loop of the Web server gets a request, examines it, and carries it out to completion before getting the next one. While waiting for the disk, the server is idle and does not process any other incoming requests. If the Web server is running on a dedicated machine, as is commonly the case, the CPU is simply idle while the Web server is waiting for the disk. The net result is that many fewer requests/sec can be processed. Thus threads gain considerable performance, but each thread is programmed sequentially, in the usual way.

So far we have seen two possible designs: a multithreaded Web server and a single-threaded Web server. Suppose that threads are not available but the system designers find the performance loss due to single threading unacceptable. If a nonblocking version of the read system call is available, a third approach is possible. When a request comes in, the one and only thread examines it. If it can be satisfied from the cache, fine, but if not, a nonblocking disk operation is started.

The server records the state of the current request in a table and then goes and gets the next event. The next event may either be a request for new work or a reply from the disk about a previous operation. If it is new work, that work is started. If it is a reply from the disk, the relevant information is fetched from the table and the reply processed. With nonblocking disk I/O, a reply probably will have to take the form of a signal or interrupt.

In this design, the "sequential process" model that we had in the first two cases is lost. The state of the computation must be explicitly saved and restored in the table every time the server switches from working on one request to another. In effect, we are simulating the threads and their stacks the hard way. A design like this in which each computation has a saved state and there exists some set of events that can occur to change the state is called a finite-state machine. This concept is widely used throughout computer science.

It should now be clear what threads have to offer. They make it possible to retain the idea of sequential processes that make blocking system calls (e.g., for disk I/O) and still achieve parallelism. Blocking system calls make programming easier and parallelism improves performance. The single-threaded server retains the ease of blocking system calls but gives up performance. The third approach achieves high performance through parallelism but uses nonblocking calls and interrupts and thus is hard to program. These models are summarized in Fig. 2-12.
Model                      Characteristics
Threads                    Parallelism, blocking system calls
Single-threaded process    No parallelism, blocking system calls
Finite-state machine       Parallelism, nonblocking system calls, interrupts

Figure 2-12. Three ways to construct a server.
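To see what "simulating the threads and their stacks the hard way" means in practice, consider the following sketch of the finite-state machine design. It is an illustration under stated assumptions only: the disk is simulated by a small FIFO, the event source get_next_event is hypothetical and simply delivers a disk reply once the table is full or no new requests remain, and requests are reduced to page numbers. A real server would obtain its events from nonblocking system calls and signals or interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES     8
#define TABLE_SIZE 2            /* max requests waiting for the disk */

enum kind { NEW_REQUEST, DISK_REPLY, NO_MORE_EVENTS };
struct event   { enum kind kind; int value; };  /* page number or table slot */
struct request { bool in_use; int page; };      /* saved state of one request */

static struct request table[TABLE_SIZE];
static bool cached[NPAGES];
static int  disk_fifo[TABLE_SIZE], started, finished;  /* simulated disk */

/* Simulated nonblocking read: returns at once; the reply comes as an event. */
static void start_disk_read(int slot)
{
    disk_fifo[started++ % TABLE_SIZE] = slot;
}

/* Simulated event source (an assumption of this sketch): the disk replies
 * once the table is full or when no new requests remain. */
static struct event get_next_event(const int *pages, int n, int *next)
{
    int pending = started - finished;
    struct event e = { NO_MORE_EVENTS, 0 };
    if (pending == TABLE_SIZE || (*next == n && pending > 0)) {
        e.kind = DISK_REPLY;
        e.value = disk_fifo[finished++ % TABLE_SIZE];
    } else if (*next < n) {
        e.kind = NEW_REQUEST;
        e.value = pages[(*next)++];
    }
    return e;
}

/* The one and only thread. Returns the number of replies sent to clients
 * and stores the number of cache hits in *hits. */
int run_fsm_server(const int *pages, int n, int *hits)
{
    int next = 0, replies = 0;
    memset(table, 0, sizeof table);
    memset(cached, 0, sizeof cached);
    started = finished = *hits = 0;

    for (;;) {
        struct event e = get_next_event(pages, n, &next);
        if (e.kind == NO_MORE_EVENTS)
            return replies;
        if (e.kind == NEW_REQUEST) {
            if (cached[e.value]) {          /* satisfied from the cache */
                (*hits)++;
                replies++;
                continue;
            }
            int slot = 0;                   /* a free entry always exists here */
            while (table[slot].in_use)
                slot++;
            table[slot].in_use = true;      /* save the request's state */
            table[slot].page = e.value;
            start_disk_read(slot);          /* do not wait for the disk */
        } else {                            /* DISK_REPLY */
            struct request *r = &table[e.value];  /* restore the saved state */
            cached[r->page] = true;
            r->in_use = false;
            replies++;
        }
    }
}
```

With the request sequence 0, 1, 0, 2, 1, this simulated server sends five replies, two of them straight from the cache. Note how the table entry does the job that a blocked worker thread's stack did in the multithreaded design: everything needed to finish the request later has to be stored there explicitly.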

A third example where threads are useful is in applications that must process very large amounts of data. The normal approach is to read in a block of data, process it, and then write it out again. The problem here is that if only blocking system calls are available, the process blocks while data are coming in and data