
Master of Science in Information Technology (MScIT-NEW) Semester 2 MT0041 : Computer Architecture

Assignment Set 1
Books ID: B0065

1. Explain briefly the functional units of a computer.
Answer:
FUNCTIONAL UNITS OF A COMPUTER SYSTEM

The following figure shows the functional units of a computer. It consists of five main parts:

- Input unit
- Output unit
- Memory
- Arithmetic and Logic unit
- Control unit

Fig. 1.1: Functional Units of a Computer

The input unit accepts coded information from the keyboard of a video terminal or from other computers over digital communication lines. The information received is either stored in memory for later use or used immediately by the ALU to perform the required operations. The result obtained is sent to the output unit. All these actions are controlled by the control unit. The ALU and the control unit together are called the CPU, or simply the processor.

1.2.1 Input Unit

The computer accepts coded information through the input unit, which reads the instructions and data to be processed. The most commonly used input device is the keyboard of a video terminal, which is electronically connected to the processing part of the computer. The keyboard is wired so that whenever a key is pressed, the corresponding letter or digit is automatically translated into its code and sent directly to either the memory or the processor.

1.2.2 Output Unit

The output unit displays the processed results. Examples are video terminals and graphic displays. Input and output units are usually combined under the term input-output (I/O) unit. For example, a video terminal consists of a keyboard for input and a cathode ray tube display for output. I/O devices do not alter the information content or the meaning of the data. Some devices can be used for output only, e.g. graphic displays. The organization of the input/output unit is studied in detail in unit 5.

1.2.3 Memory Unit

The memory unit is an integral part of a computer system. Its main function is to store the information needed by the system, typically programs and data. System performance depends largely on the organization, storage capacity and speed of operation of the memory system. Computer memory can be broadly classified into four groups.

Internal Memory

Internal memory refers to the set of CPU registers. These serve as working memory, storing temporary results during computation, and form a general-purpose register file for data as it is processed. Since these registers are very costly, only a few can be used in the CPU.

Primary Memory

Primary memory, also called main memory, operates at electronic speeds, and the CPU can directly access programs stored in it. Main memory consists of a large number of semiconductor storage cells, each capable of storing one bit of information. A group of these cells forms a word, and main memory is organized so that the contents of one word, containing n bits, can be stored or retrieved in one basic operation. Addresses are the numbers used to identify successive locations. A word can be accessed by specifying its address together with a command that performs the storage or retrieval.
The time required to access one word is called the memory access time.

The number of bits in each word is called the word length of the computer. Large computers usually have 32 or more bits in a word, while the word length of microcomputers ranges from 8 to 32 bits. The capacity of the memory is one factor that decides the size of the computer. Data are usually manipulated within the machine in units of words, multiples of words or parts of words. During execution, the program must reside in main memory; instructions and data are written into or read out of the memory under the control of the processor.

Secondary Memory

This memory type is much larger in capacity and also much slower than main memory. Secondary memory stores system programs, large data files and information which is not regularly used by the CPU. When the capacity of main memory is exceeded, the additional information is stored in secondary memory. Information from secondary memory is accessed indirectly, through I/O programs that transfer information between main memory and secondary memory. Examples of secondary memory devices are magnetic hard disks and CD-ROMs.

Cache Memory

The performance of a computer system is severely affected if the speed disparity between the processor and main memory is significant. Performance can be improved by placing a small, fast buffer memory between the processor and main memory. This buffer is called cache memory; its cost per bit is very high.

Access modes

A fundamental characteristic of a memory is the order in which information is accessed. If memory locations can be accessed in any order and the access time is independent of the location accessed, the memory is known as Random Access Memory (RAM). Semiconductor memories are of this type. Memories whose storage locations can be accessed only in a certain predetermined order are called Serial Access Memories. Examples are magnetic disks, tapes and optical memories such as CD-ROMs.
Each storage location in a RAM can be accessed independently of the other locations. There is a separate access mechanism (read-write head) for each location, which makes these memories costlier than the serial type.

Fig. 1.2: Conceptual model of a Random Access Memory

In serial access memories, the access mechanism (read-write head) is shared by the storage locations; it must be assigned to different locations at different times, by moving the storage medium, the read-write head, or both. Many serial-access memories operate by continuously moving the stored information around a closed path, known as a track. A particular location can be accessed only when it passes the fixed read-write head, so the time required to access a location depends upon its position relative to the read-write head.
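This position dependence can be quantified: for a rotating device, the requested location is on average half a revolution away from the head. A small illustrative calculation (the 7200 RPM figure is just an example, not taken from the text):

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for a location on a rotating serial-access device."""
    ms_per_revolution = 60_000 / rpm   # one full revolution, in milliseconds
    return ms_per_revolution / 2       # on average, wait half a revolution

# A 7200 RPM disk takes 60000/7200 ≈ 8.33 ms per revolution,
# so the average rotational wait is about 4.17 ms.
latency = avg_rotational_latency_ms(7200)
```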

Fig. 1.3: Conceptual model of a Serial Access Memory

Memory devices such as magnetic hard disks and CD-ROMs contain many rotating storage tracks. If each track has its own read-write head, the tracks can be accessed randomly, but access within a track is serial. In such cases the access is semirandom.

1.2.4 Arithmetic and Logic Unit

The Arithmetic and Logic Unit is the core of any processor: it is the unit that performs the calculations. A typical ALU has two input ports and a result port. It also has a control input telling it which operation (add, subtract, AND, OR, etc.) to perform, and additional outputs for condition codes (carry, overflow, negative, zero result). ALUs may be simple and perform only a few operations: integer arithmetic (add, subtract), Boolean logic (AND, OR, complement) and shifts (left, right, rotate). Such simple ALUs may be found in small 4- and 8-bit processors used in embedded systems. More complex ALUs support a wider range of integer operations (multiply and divide), floating-point operations (add, subtract, multiply, divide) and even mathematical functions (square root, sine, cosine, log, etc.). To perform arithmetic and logic operations, the necessary operands are transferred from memory to the ALU, where one of the operands is stored temporarily in a register called the temporary register. Each register stores one word of data. The various circuits used to execute data-processing instructions are usually combined in a single circuit called an Arithmetic-Logic Unit, or ALU. The complexity of the ALU is determined by the way its arithmetic instructions are realized. Simple ALUs that perform fixed-point addition and subtraction, as well as logical operations, can be realized by combinational circuits.

The ALU is a combination of two sub-units: the Arithmetic Unit and the Logic Unit.

a) Arithmetic Unit

The arithmetic unit performs typical arithmetic operations such as addition, subtraction, and increment and decrement (usually by one). This unit also includes the hardware required to operate on signed and unsigned integers, floating-point numbers, BCD numbers, etc. Consider a simple arithmetic unit which performs addition and subtraction, shown in the figure. It consists of a 4-bit parallel adder. X and Y are the two input operands, each of 4 bits. Four 1-of-2 multiplexers are associated with the Y operand of the parallel adder. Each multiplexer selects either Y or Y' (the complement of Y) depending on the selection line S0, which is also the carry-in input of the adder.

If S0 = 0, the output is F = X + Y.
If S0 = 1, the output is F = X + Y' + 1, where Y' + 1 is the 2's complement of Y; hence the output is F = X - Y.

Fig. 1.4: Arithmetic Unit

b) Logic Unit

Consider a simple logic unit which can perform two logic operations, AND and EX-OR. Here also a multiplexer selects either of these operations depending on the state of S0.

If S0 = 0, the output is G = X AND Y.
If S0 = 1, the output is G = X XOR Y.

Additional hardware can be added to this basic logic unit to perform other Boolean operations. For example:

Fig. 1.5: Logic Unit

The arithmetic and logic units are combined to form a complete ALU. Outputs F and G are multiplexed with a nibble multiplexer to get a single output, depending on the selection line S1 of the nibble multiplexer. This selection line is sometimes known as the mode-selection line, since it selects the desired mode, either arithmetic or logic. A truth table can be developed to illustrate the two arithmetic and two logic functions on 4-bit operands X and Y.

Nibble Multiplexer:
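The behaviour of this combined unit (arithmetic unit, logic unit and nibble multiplexer) can be sketched as a small function. This is a behavioural model only, not the gate-level circuit; the function name and the masking of results to 4 bits are my own:

```python
MASK = 0xF  # 4-bit operands, as in the unit described above

def alu(x, y, s1, s0):
    """Behavioural model of the 4-bit ALU sketched in the text.

    s1 is the mode-selection line of the nibble multiplexer
    (0 = arithmetic, 1 = logic); s0 selects the operation within a mode.
    """
    if s1 == 0:                                # arithmetic unit
        if s0 == 0:
            return (x + y) & MASK              # F = X + Y
        return (x + (~y & MASK) + 1) & MASK    # F = X + Y' + 1 = X - Y
    else:                                      # logic unit
        if s0 == 0:
            return x & y                       # G = X AND Y
        return x ^ y                           # G = X XOR Y
```

For example, `alu(5, 3, 0, 1)` exercises the subtraction path: 5 + (2's complement of 3) = 2, with the carry out of the 4-bit adder discarded by the mask.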

1.2.5 Control Unit

The purpose of the control unit is to control system operation by routing the selected data items to the selected processing hardware at the right time. The control unit acts as the nerve centre for the other units. It decodes and translates each instruction and generates the necessary enable signals for the ALU and other units. The control unit has two responsibilities: instruction interpretation and instruction sequencing. In instruction interpretation, the control unit reads an instruction from memory, recognizes the instruction type, gets the necessary operands and sends them to the appropriate functional unit. The signals necessary to perform the desired operation are sent to the processing unit, and the results obtained are sent to the specified destination. In instruction sequencing, the control unit determines the address of the next instruction to be executed and loads it into the program counter.

In general, the I/O transfers are controlled by software instructions that identify both the devices involved and the type of transfer, but the actual timing signals that govern the transfers are generated by the control circuits. Similarly, data transfer between a processor and the memory is controlled by the control circuits. The operation of the computer can be summarized as below:

- The computer accepts information through the input unit and transfers it to the memory.
- Information stored in the memory is fetched into the arithmetic and logic unit to perform the desired operations.
- Processed information is transferred to the output unit.
- All activities inside the machine are controlled by the control unit.

2. What is a subroutine? Explain.
Answer:
In a given program, it is often necessary to perform a particular subtask many times on different data values. Such a subtask is usually called a subroutine. For example, a subroutine may evaluate the sine function or sort a list of values into increasing or decreasing order. It is possible to include the block of instructions that constitute a subroutine at every place where it is needed in the program. However, to save space, only one copy of the subroutine's instructions is placed in memory, and any program that requires the subroutine simply branches to its starting location. When a program branches to a subroutine, we say that it is calling the subroutine; the instruction that performs this branch operation is called a Call instruction. After a subroutine has been executed, the calling program must resume execution, continuing immediately after the instruction that called the subroutine. The subroutine is said to return to the program that called it by executing a Return instruction. Since the subroutine may be called from different places in a calling program, provision must be made for returning to the appropriate location. The location where the calling program resumes execution is the location pointed to by the updated PC while the Call instruction is being executed. Hence, the contents of the PC must be saved by the Call instruction to enable a correct return to the calling program. The way in which a computer makes it possible to call and return from subroutines is referred to as its subroutine linkage method. The simplest subroutine linkage method is to save the return address in a specific location, which may be a register dedicated to this function, called the link register. When the subroutine completes its task, the Return instruction returns to the calling program by branching indirectly through the link register.
The Call instruction is just a special branch instruction that performs the following operations:

- Store the contents of the PC in the link register.
- Branch to the target address specified by the instruction.


The Return instruction is a special branch instruction that performs the operation: Branch to the address contained in the link register. Figure 12 illustrates this procedure.
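The link-register mechanism can be illustrated with a toy model. The class and the addresses below are invented for illustration; a real processor performs these transfers in hardware as part of the Call and Return instructions:

```python
# Minimal sketch of subroutine linkage through a link register.
class CPU:
    def __init__(self):
        self.pc = 0      # program counter
        self.link = 0    # link register, holds the return address

    def call(self, target):
        # By the time the Call executes, the PC has already been updated
        # to point past the Call instruction, so saving it captures the
        # correct return address.
        self.link = self.pc
        self.pc = target          # branch to the subroutine

    def ret(self):
        self.pc = self.link       # branch indirectly through the link register

cpu = CPU()
cpu.pc = 104                      # address of the instruction after the Call
cpu.call(500)                     # subroutine starts at (hypothetical) address 500
# ... subroutine body executes here ...
cpu.ret()                         # execution resumes at 104
```

Note that this simplest scheme supports only one level of call: a second Call before the Return would overwrite the link register, which is why real systems typically save return addresses on a stack for nested calls.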

3. Explain fetching a word from the memory.
Answer:
A typical computing task consists of a series of steps specified by a sequence of machine instructions that constitute a program. An instruction is executed by carrying out a sequence of more rudimentary operations.

Fundamental Concepts
The processor fetches one instruction at a time and performs the operation specified. Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered. The processor keeps track of the address of the memory location containing the next instruction to be fetched using the Program Counter (PC), and holds the instruction currently being executed in the Instruction Register (IR).

Executing an Instruction
- Fetch the contents of the memory location pointed to by the PC. The contents of this location are loaded into the IR (fetch phase): IR ← [[PC]]
- Assuming that the memory is byte-addressable, increment the contents of the PC by 4 (fetch phase): PC ← [PC] + 4
- Carry out the actions specified by the instruction in the IR (execution phase).
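The fetch and execute phases above can be sketched as a toy loop. The memory contents and the opcodes are invented for illustration; only the IR/PC bookkeeping follows the steps in the text:

```python
# Toy fetch/execute loop for a byte-addressable memory with 4-byte words.
# Each "word" here is a (opcode, operand) pair standing in for a real
# machine instruction.
memory = {0: ("ADD", 2), 4: ("ADD", 3), 8: ("HALT", 0)}

pc = 0       # program counter
acc = 0      # a single accumulator register, for simplicity
while True:
    ir = memory[pc]        # fetch phase: IR <- [[PC]]
    pc += 4                # fetch phase: PC <- [PC] + 4 (byte-addressable)
    op, operand = ir       # execution phase: act on the instruction in IR
    if op == "ADD":
        acc += operand
    elif op == "HALT":
        break
```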

Executing an Instruction

- Transfer a word of data from one processor register to another or to the ALU.
- Perform an arithmetic or a logic operation and store the result in a processor register.
- Fetch the contents of a given memory location and load them into a processor register.
- Store a word of data from a processor register into a given memory location.

Fetching a Word from Memory


- The response time of each memory access varies (cache miss, memory-mapped I/O, etc.).
- To accommodate this, the processor waits until it receives an indication that the requested operation has been completed (Memory-Function-Completed, MFC).
- Move (R1), R2:
  MAR ← [R1]
  Start a Read operation on the memory bus
  Wait for the MFC response from the memory
  Load MDR from the memory bus
  R2 ← [MDR]

To fetch a word of information from memory, the CPU must specify the address of the memory location where the word is located and request a read operation. This applies whether the information to be fetched represents an instruction in a program or an operand specified by an instruction. The CPU transfers the address of the required information to the Memory Address Register (MAR), from where it is transferred to main memory through the address lines of the memory bus. At the same time, the CPU uses the control lines of the memory bus to indicate that a read operation is required. After issuing this request, the CPU waits until it receives feedback from the memory indicating that the requested function has been completed. This is done using another control signal on the memory bus, referred to as Memory Function Completed (MFC). The memory sets this signal to 1 to indicate that the contents of the specified location have been read and are available on the data lines of the memory bus, and thus available for use inside the CPU. This completes the memory fetch operation. A transfer mechanism in which one device initiates the transfer and waits until the other device responds is called an asynchronous transfer. This mechanism enables the transfer of data between two independent devices that have different speeds of operation.
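The Move (R1), R2 sequence with its MFC handshake might be simulated as follows. The Bus class, the register values and the address are all illustrative; the busy-wait loop stands in for the real asynchronous handshake:

```python
# Sketch of the asynchronous Move (R1), R2 transfer: place the address in
# MAR, request a read, wait for MFC, then copy MDR into R2.
class Bus:
    def __init__(self, memory):
        self.memory = memory     # plain dict standing in for the real device
        self.mfc = False         # Memory-Function-Completed signal
        self.data = None         # data lines of the memory bus

    def read(self, address):
        # The memory raises MFC once the data is on the bus (simulated here
        # as happening immediately).
        self.data = self.memory[address]
        self.mfc = True

regs = {"R1": 0x10, "R2": 0}
bus = Bus({0x10: 42})

mar = regs["R1"]          # MAR <- [R1]
bus.read(mar)             # start a Read operation on the memory bus
while not bus.mfc:        # wait for the MFC response from the memory
    pass
mdr = bus.data            # load MDR from the memory bus
regs["R2"] = mdr          # R2 <- [MDR]
```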
4. Explain the evolution of the concept of computer architecture.
Answer:

Computer Evolution and Performance

The evolution of computers has been characterized by increasing processor speed, decreasing component size, increasing memory size, and increasing I/O capacity and speed. One factor responsible for the great increase in processor speed is the shrinking size of microprocessor components; this reduces the distance between components and hence increases speed. However, the true gains in speed in recent years have come from the organization of the processor, including heavy use of pipelining and parallel execution techniques and the use of speculative execution, which results in the tentative execution of instructions that might be needed later. All of these techniques are designed to keep the processor busy as much

of the time as possible. A critical issue in computer system design is balancing the performance of the various elements, so that gains in performance in one area are not handicapped by a lag in other areas. In particular, processor speed has increased more rapidly than memory access time. A variety of techniques are used to compensate for this mismatch, including caches, wider data paths from memory to processor, and more intelligent memory chips.
Evolution of the concept of computer architecture
Computer architecture (programmer's view): the structure of a computer that a machine-language programmer must understand to write a correct (timing-independent) program for the machine.

Computer organization (implementer's view): the actual hardware structure and realization.

The description is hierarchical and multilevel — electronic circuits, logic design, programming, processor-memory-switch — spanning functional specification and hardware implementation.

5. Explain the concept of cost-performance ratio.
Answer:
The cost-performance ratio is a good indicator of relative quality for small changes, but its usefulness breaks down when costs and performance vary by large factors. It would be very deceptive, for example, to measure the cost-performance ratio of a small computer, such as an 8-bit video-game system, and compare it to a much more powerful system, such as a workstation for computer-aided design. Although both systems are used to display images and interact with them in real time, the video game probably has a much better cost-performance ratio than the workstation, assuming we can find some way of measuring relative performance. The problem is that the relative costs of the systems vary by a factor of up to 1000 to 1, and similarly the relative performance factor is very large, although probably not as large as the relative cost. The video game cannot do the same job as the workstation. Moreover, if you put enough copies of the video game together to equal the performance of the workstation, the cost would be less than the workstation's, but the collection of video games still could not do the same job. So, to be sure that comparisons based on cost-performance ratios are valid, one should be careful to make comparisons between computers that are similar in function and relatively close in performance. This discussion points to two important ways to make architectural advances:

1.
Make small perturbations in cost and performance that yield lower cost-performance ratios; and

2. Boost absolute performance to make new computations feasible at reasonable cost.

By "small" changes, we mean roughly a factor of 10 or less. Changes larger than this are surely welcome, but the cost-performance ratio cannot be trusted as a measure to evaluate them. For the second point, the cost-performance ratio can actually increase, provided that the user can absorb the additional

cost, because the benefit of the greater capacity exceeds the cost of attaining it. We use both of these criteria throughout the unit as informal ways to evaluate ideas. Because absolute cost measured in currency changes every year, it is more useful to define cost in terms of other parameters that influence it. These include physical parameters, such as pin count, chip area, chip count, board area, and power consumption, derived from an implementation of the architecture. They also include factors associated with development, such as elapsed design time, the amount of associated software to be written, and the size of the development team required. This unit cannot easily account for all the factors that affect cost, but it can isolate the most important ones, especially when comparing two closely related architectures whose differences are limited to a few critical design choices. The intent is to focus on the differences and discuss the ways they affect the cost factors. Each approach has its own advantages and disadvantages, which in turn affect its cost. We cannot give absolute costs, but we can show the influence of a design decision on the cost parameters.

6. What are the characteristics of RISC and CISC processors?
Answer:
RISC and CISC are computing architectures developed for computers. The difference between RISC and CISC is critical to understanding how a computer follows your instructions. These are commonly misunderstood terms, and this article intends to clarify their meanings and the concepts behind the two acronyms.

RISC

Pronounced the same as "risk", RISC is an acronym for Reduced Instruction Set Computer. It is a type of microprocessor that has been designed to support a small set of simple instructions. Until the 1980s, hardware manufacturers were trying to build CPUs that could carry out a large number of complex instructions.
But the trend was reversed, and manufacturers decided to build computers capable of carrying out relatively few instructions. With the instructions simple and few, CPUs could execute them quickly. Another advantage of RISC is the use of fewer transistors, making the processors less expensive to produce.

CISC

CISC stands for Complex Instruction Set Computer. A CISC CPU is capable of executing many operations through a single instruction; these basic operations could include loading from memory, carrying out a mathematical operation, and so on.

Common RISC characteristics

- Load/store architecture (also called register-register or RR architecture), which fetches operands and results indirectly from main memory through a large set of scalar registers. The alternative is a storage-storage (SS) architecture, in which source operands and final results are retrieved directly from memory.
- Fixed-length instructions, which (a) are easier to decode than variable-length instructions, and (b) use fast, inexpensive memory to execute a larger piece of code.
- Hardwired control of instructions (as opposed to microcoded instructions). This is where RISC really shines, as hardware implementation of instructions is much faster and uses less silicon real estate than a microstore area.
- Fused or compound instructions which are heavily optimized for

the most commonly used functions.
- Pipelined implementations, with the goal of executing one instruction (or more) per machine cycle.
- A large, uniform register set.
- A minimal number of addressing modes.
- No (or minimal) support for misaligned accesses.
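One of these characteristics, fixed-length instructions, can be made concrete: when every field sits at a known bit position, decoding is just a few shifts and masks, with no need to examine earlier bytes to find where later fields start. The 32-bit format below is made up purely for illustration:

```python
# Hypothetical 32-bit fixed-length format:
# bits 31-26 opcode, 25-21 rd, 20-16 rs1, 15-11 rs2, low bits unused.
def decode(word):
    return {
        "opcode": (word >> 26) & 0x3F,   # 6-bit opcode field
        "rd":     (word >> 21) & 0x1F,   # 5-bit destination register
        "rs1":    (word >> 16) & 0x1F,   # 5-bit source register 1
        "rs2":    (word >> 11) & 0x1F,   # 5-bit source register 2
    }

# Encode a made-up "ADD r3, r1, r2" (opcode 1) and decode it back.
insn = (0x01 << 26) | (3 << 21) | (1 << 16) | (2 << 11)
fields = decode(insn)
```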

Master of Science in Information Technology (MScIT-NEW) Semester 2 MT0041 : Computer Architecture 4 Credits
Assignment Set 2
Books ID: B0065

1. Write a note on static and dynamic memories.
Answer:

The allocation of memory for the specific, fixed purposes of a program in a predetermined fashion controlled by the compiler is called static memory allocation. The allocation of memory (and possibly its later deallocation) during the running of a program, under the control of the program, is called dynamic memory allocation.

Static memories

Static RAM cells use 4 to 6 transistors to store a single bit of data. This provides faster access times at the expense of lower bit density. A processor's internal memory (registers and cache) is fabricated using static RAM. SRAM cells resemble the flip-flops used in processor design, differing primarily in the methods used to address the cells and transfer data to and from them. The six-transistor SRAM cell is shown in Fig. 4.3. A signal applied to the word line (also called the address line) by the address decoder selects the cell for either a Read or a Write operation. The two bit lines (also called data lines) are used to transfer the stored data and its complement between the cell and the data drivers.

Dynamic memories

The bulk of a modern processor's memory is composed of dynamic RAM (DRAM) chips. A DRAM memory cell uses a single transistor and a capacitor to store a bit of data: the 1 and 0 states correspond to the presence or absence of stored charge in a capacitor controlled by the transistor switching circuit. Since a DRAM cell can be constructed from a single transistor, the storage density is higher. However, since the charge stored in a DRAM cell leaks with time, the cell must be periodically refreshed. Fig. 4.4 illustrates a one-transistor DRAM cell. The transistor is a MOS transistor which acts as a switch, and the capacitor stores a data bit. To write information into the cell, a voltage signal is applied to the data line; the signal can be either high or low, representing 1 and 0 respectively. A signal is applied to the word line to switch on T, and the capacitor then charges if the data line is 1. When the transistor is off, the capacitor begins to discharge, due both to the capacitor's own leakage resistance and to the fact that the transistor continues to conduct a very small current after it is turned off. Hence the information stored in the cell can be retrieved correctly only if it is read before the charge on the capacitor drops below some threshold value; the memory cell is therefore refreshed every time its contents are read. While a DRAM is being refreshed, other accesses must be held off, which increases the complexity of DRAM controllers. To read the cell, the word line is activated and the charge stored in the capacitor is transferred to the bit line, where it is detected.
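The charge-leakage behaviour described above can be caricatured in a few lines of code. The threshold and leak amounts are arbitrary numbers chosen only to show why a refresh that comes too late loses the stored bit:

```python
# Toy model of a DRAM cell: the stored charge leaks over time, so the value
# is trustworthy only if read (and thereby refreshed) before the charge
# drops below a threshold.
class DramCell:
    THRESHOLD = 0.5            # illustrative sense threshold

    def __init__(self, bit):
        self.bit = bit
        self.charge = 1.0 if bit else 0.0

    def leak(self, amount):
        # charge drains through leakage paths while the cell sits idle
        self.charge = max(0.0, self.charge - amount)

    def read(self):
        value = 1 if self.charge > self.THRESHOLD else 0
        # reading restores (refreshes) the charge, as described in the text
        self.charge = 1.0 if value else 0.0
        return value

cell = DramCell(1)
cell.leak(0.3)
first = cell.read()    # charge 0.7 is above threshold: reads 1, refreshes
cell.leak(0.6)
cell.leak(0.6)         # refresh arrives too late this time...
second = cell.read()   # charge has drained away: the stored 1 is lost
```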

2. What do you mean by an interrupt? How does the CPU react to an interrupt?
Answer:

An interrupt is an exception condition in a computer system caused by an event external to the CPU. Interrupts are commonly used in I/O operations by a device interface (or controller) to notify the CPU that it has completed an I/O operation. An interrupt is indicated by a signal sent by the device interface to the CPU via an interrupt request line (on an external bus). This signal notifies the CPU that the signalling interface needs to be serviced. The signal is held until the CPU acknowledges or otherwise services the interface from which the interrupt originated.

The CPU checks periodically to determine whether an interrupt signal is pending. This check is usually done at the end of each instruction, although some modern machines allow interrupts to be checked several times during the execution of very long instructions. When the CPU detects an interrupt, it saves its current state (at least the PC and the processor status register containing the condition codes); this state information is usually saved in memory. After the interrupt has been serviced, the state information is restored in the CPU and the previously executing software resumes execution as if nothing had happened.
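The end-of-instruction check might be sketched like this. The state layout and names are invented, and the service routine is reduced to a log entry:

```python
# Sketch of the check made at the end of each instruction: if an interrupt
# is pending, save the state (PC and status register), run the service
# routine, then restore the state and resume.
state = {"pc": 100, "status": "N=0 Z=1"}
pending = [True]      # set by a device interface via the interrupt request line
serviced = []

def end_of_instruction_check():
    if pending[0]:
        saved = dict(state)          # save PC and processor status register
        serviced.append("ISR ran")   # the interrupt service routine runs here
        state.update(saved)          # restore the saved state
        pending[0] = False           # interrupt acknowledged and cleared

end_of_instruction_check()
# The interrupted program resumes exactly where it left off.
```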
3. Discuss Booth's Multiplication Algorithm.
Answer:
Booth's multiplication algorithm is a multiplication algorithm that multiplies two signed binary numbers in two's-complement notation. The algorithm was invented by Andrew Donald Booth in 1951 while doing research on crystallography at Birkbeck College in Bloomsbury, London. As in all multiplication schemes, Booth's algorithm requires examination of the multiplier bits and shifting of the partial product. Prior to the shifting, the multiplicand may be added to the partial product, subtracted from the partial product, or left unchanged, according to the following rules:

1. The multiplicand is subtracted from the partial product upon encountering the first least significant 1 in a string of 1s in the multiplier.
2. The multiplicand is added to the partial product upon encountering the first 0 (provided that there was a previous 1) in a string of 0s in the multiplier.
3. The partial product does not change when the current multiplier bit is identical to the previous multiplier bit.

BOOTH'S ALGORITHM

1. Booth's algorithm is a powerful direct algorithm for signed-number multiplication. It is based on the fact that any binary number can be represented by the sum and difference of other binary numbers. Using a signed binary notation, we can represent a multiplier in a unique scheme with the possibility of fewer add cycles for a given multiplier.

2. a. Examples of the scheme (a positive multiplier and a negative multiplier) show how runs of zeros may be skipped over for a faster implementation.
b. One figure illustrates how the Booth recodings are accomplished, and the next illustrates the recoding table.
c. A flow chart and a table show, respectively, the use of the algorithm for the multiplication of 2's-complement numbers.
d. The speed of the algorithm depends upon the bit savings, if any, that Booth recoding generates; there are worst, normal and good cases for a 16-bit number.
e.
The algorithm accomplishes the following: uniform treatment of positive and negative numbers, and, in some cases (data dependent), efficiency in the number of summands.

3. It would be desirable to use the Booth technique in some way that removes some of this data dependence.
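The recoding rules above (bit pair 10 → subtract, 01 → add, 00/11 → no change, then shift the partial product) can be turned into working code. This is one straightforward software rendering, not the hardware formulation with separate A/Q/Q-1 registers; the bit-width handling is my own:

```python
def booth_multiply(multiplicand, multiplier, bits):
    """Booth's algorithm for two's-complement multiplication.

    Examines multiplier bit pairs (current bit, previous bit):
    10 -> subtract multiplicand, 01 -> add multiplicand, 00/11 -> no change.
    After each step the partial product is arithmetically shifted right.
    """
    mask = (1 << (2 * bits)) - 1                 # width of the full product
    product = multiplier & ((1 << bits) - 1)     # low half holds the multiplier
    m = multiplicand << bits                     # multiplicand, aligned high
    prev = 0
    for _ in range(bits):
        cur = product & 1
        if cur == 1 and prev == 0:               # start of a string of 1s
            product = (product - m) & mask
        elif cur == 0 and prev == 1:             # end of a string of 1s
            product = (product + m) & mask
        prev = cur
        # arithmetic shift right: preserve the sign bit of the 2*bits result
        sign = product & (1 << (2 * bits - 1))
        product = (product >> 1) | sign
    # interpret the 2*bits result as a signed number
    if product & (1 << (2 * bits - 1)):
        product -= 1 << (2 * bits)
    return product
```

For example, `booth_multiply(3, -4, 4)` computes 3 x (-4) = -12 on 4-bit operands, with the negative multiplier handled uniformly, as point (1) above promises.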

Books ID: B0074

4. Explain the concept of virtual memory.
Answer:
Virtual memory is automatic address translation that provides:

- Decoupling of a program's name space from its physical location
- Access to a name space potentially greater in size than physical memory
- Expandability of the used name space without reallocation of existing memory
- Protection from interference with the name space by other tasks
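The address translation underlying these properties can be sketched with a page table and a small translation cache. The page size, mappings and addresses below are invented for illustration:

```python
# Sketch of virtual-to-physical translation with a page table and a TLB.
PAGE_SIZE = 4096

page_table = {0: 7, 1: 3, 2: 9}   # virtual page number -> physical frame
tlb = {}                          # small cache of recent translations

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                       # TLB hit: skip the page-table walk
        frame = tlb[vpn]
    else:                                # TLB miss: consult the page table
        frame = page_table[vpn]          # a real OS would page-fault if absent
        tlb[vpn] = frame                 # cache the translation
    return frame * PAGE_SIZE + offset

paddr = translate(1 * PAGE_SIZE + 100)   # virtual page 1, offset 100
```

The second access to the same page takes the TLB-hit path and produces the same physical address without touching the page table.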

The components that make virtual memory work include:

- Physical memory divided into pages
- A swap device (typically a disk) that holds pages not resident in physical memory (which is why it is also referred to as a backing store)
- Address translation
- Page tables to hold virtual-to-physical address mappings
- A translation lookaside buffer (TLB), a cache of translation information
- Management software in the operating system

5. Write a note on
a) Data parallelism

The term data parallelism refers to the concurrency obtained when the same operation is applied to some or all elements of a data ensemble. A data-parallel program is a sequence of such operations. A parallel algorithm is obtained from a data-parallel program by applying domain decomposition techniques to the data structures operated on. Operations are then partitioned, often according to the "owner computes" rule, in which the processor that owns a value is responsible for updating that value. Typically, the programmer is responsible for specifying the domain decomposition, while the compiler partitions the computation automatically. Data parallelism, a commonly used parallel programming model, calls for exploiting the concurrency that derives from applying the same operation to multiple elements of a data structure, for example, "add 2 to all elements of this array" or "increase the salary of all employees with 5 years' service". A data-parallel program consists of a sequence of such operations. As each operation on each data element can be thought of as an independent task, the natural granularity of a data-parallel computation is small, and the concept of locality does not arise naturally. Hence, data-parallel compilers often require the programmer to provide information about how data are to be distributed over processors, in other words, how data are to

be partitioned into tasks. The compiler can then translate the data-parallel program into the required form, generating the communication code automatically.
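The owner-computes rule described above can be sketched as follows. The block decomposition and the three simulated "processors" are illustrative assumptions; the point is that one operation ("add 2") is applied across the whole array, with each owner updating only its own block.

```python
# Sketch of data parallelism under an owner-computes decomposition.
# The same operation ("add 2") is applied to every element; each
# (simulated) processor owns a contiguous block and updates only it.

def decompose(data, num_procs):
    """Block decomposition: assign each processor a contiguous slice."""
    n = len(data)
    chunk = (n + num_procs - 1) // num_procs  # ceiling division
    return [(p * chunk, min((p + 1) * chunk, n)) for p in range(num_procs)]

def add_two_owner_computes(data, num_procs):
    for lo, hi in decompose(data, num_procs):
        # The owner of elements lo..hi-1 performs the update on them.
        for i in range(lo, hi):
            data[i] += 2

values = [1, 2, 3, 4, 5, 6, 7]
add_two_owner_computes(values, num_procs=3)
print(values)  # [3, 4, 5, 6, 7, 8, 9]
```

On a real machine the three blocks would be updated concurrently on separate processors; a data-parallel compiler would derive this decomposition from the programmer's distribution directives.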
b) Message passing

Message passing is probably the most widely used parallel programming model today. Message-passing programs, like task/channel programs, create multiple tasks, each encapsulating local data. Each task is identified by a unique name, and tasks interact by sending and receiving messages to and from named tasks. In this respect, message passing is really just a minor variation on the task/channel model, differing only in the mechanism used for data transfer: rather than sending a message on channel ch, we send a message to task 17. The message-passing model does not preclude the dynamic creation of tasks, the execution of multiple tasks per processor, or the execution of different programs by different tasks. In practice, however, most message-passing systems create a fixed number of identical tasks at program startup and do not allow tasks to be created or destroyed during program execution. Such systems are said to implement a single program multiple data (SPMD) programming model, because each task executes the same program but operates on different data. The SPMD model is sufficient for a wide range of parallel programming problems but does hinder some parallel algorithm developments.
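The SPMD pattern can be sketched with threads standing in for tasks and queues standing in for the message system; the four-task sum below is a hypothetical illustration, not a real message-passing library. Every task runs the same program, operates on its own slice of the data, and the non-zero tasks send their partial results to the task named 0.

```python
# Sketch of the SPMD message-passing model using threads and queues.
# Every task runs the same program on different data; tasks 1..N-1
# send partial sums to the task named 0, which combines them.
import threading
import queue

NUM_TASKS = 4
mailboxes = [queue.Queue() for _ in range(NUM_TASKS)]  # one inbox per named task
result = {}

def send(dest, msg):
    mailboxes[dest].put(msg)

def recv(me):
    return mailboxes[me].get()

def spmd_program(rank, data):
    """The single program executed by every task (SPMD)."""
    partial = sum(data[rank::NUM_TASKS])   # each task owns a strided slice
    if rank != 0:
        send(0, partial)                   # send a message to task 0
    else:
        total = partial
        for _ in range(NUM_TASKS - 1):
            total += recv(0)               # receive from any sender
        result["total"] = total

data = list(range(1, 101))  # the full sum is 5050
threads = [threading.Thread(target=spmd_program, args=(r, data))
           for r in range(NUM_TASKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result["total"])  # 5050
```

Note that the number of tasks is fixed at startup and none are created or destroyed during the run, exactly the restriction the SPMD model imposes.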
6. With the help of a neat diagram, explain I/O subsystems
Answer:

The performance of a computer system can be limited by compute-bound jobs or by input-output (I/O) bound jobs. The emphasis in the following discussion is on the I/O problem and the various techniques that can be used to manage I/O data transfer. An example I/O subsystem for a dual processor system is shown in Figure 7.2. The subsystem consists of I/O interfaces and peripheral devices; sometimes the distinction between a device and its associated interface is fuzzy. The I/O interface controls the operation of the peripheral device attached to it. The control operations are initiated by commands from the CPU, and the set of commands used to accomplish an I/O transaction is called the device driver software. The functions of the interface are to buffer data and convert it into the required format. It also detects transmission errors and requests that an I/O transaction be repeated in case of error. Moreover, the interface can interrogate, start, and stop the device according to commands issued by the CPU, and in some cases it can interrupt the CPU if the device requests urgent attention. Not all interfaces possess all these capabilities; many design options are available depending on the device characteristics. Below, we outline a few devices and their speed characteristics.

Fig. 7.2: I/O subsystem in a dual processor system

There are many different types of peripheral devices. Most of them are electromechanical devices and hence transfer data at a rate often limited by the speed of their electromechanical components. Table 7.1 shows some typical peripheral devices. Bubble memories, disks, drums, and tape devices are mass storage devices which store data cheaply for later retrieval. Typical capacities of mass storage devices are: fixed-head and moving-head disks, 512M bytes; floppy disks, 1M bytes; 9-track tape, 46M bytes; and cassette tape, from 64K to 512K bytes. Display terminals are input-output devices which consist of keyboards and cathode ray tubes (CRTs). The keyboard acts as input while the CRT is the output display. In some cases, where a printer replaces the CRT, the terminals are called teletypes. Since terminals are often used interactively and are relatively slow devices, a reliable technique for transmitting characters between the processor and the terminal is serial data transmission. This method is cheaper than parallel transmission of characters because only one signal path is required. Data communication over long distances is usually done serially; for this reason, remote communication can be carried over telephone lines by using a modem (modulator-demodulator) interface, with a modem at each end of the transmission line. A variety of character codes are used in the transmission of data, but one standard code often used is the American Standard Code for Information Interchange (ASCII), which uses seven-bit characters.

I/O subsystems may be classified according to the extent to which the CPU is involved in the I/O transaction. An I/O transaction can be the transfer of a single bit, byte, word, or block of bytes of information between the I/O device and the CPU, or between the I/O device and main memory. The simplest I/O architecture is one in which all processing is performed sequentially. In such systems, the CPU executes programs that initiate the I/O operation, test the status of the device, perform the data transfer, and terminate the I/O operation. In this case, the I/O transaction is performed using program-driven I/O. Most computers provide this option, as it requires minimal hardware. However, as the operation of program-driven I/O illustrated in Figure 7.6 shows, the CPU can spend a significant amount of time testing the status of the device. This busy-wait feature of the program-driven I/O scheme has the disadvantage that the time required to transfer a unit of information between main memory and an I/O device is typically several orders of magnitude greater than the average instruction cycle. Therefore, even a moderate I/O transfer rate will consume a significant fraction of the cycles the CPU could otherwise spend on actual computation, and system performance may be degraded significantly.
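The busy-wait behaviour of program-driven I/O can be sketched as below. The "device" is a hypothetical simulation that only becomes ready on every fourth status poll; counting the failed polls makes visible the CPU cycles wasted while waiting.

```python
# Sketch of program-driven (polled) I/O: the CPU busy-waits on a
# simulated device status register before each byte transfer. The
# "device" here is a hypothetical stand-in, not a real interface.

class SimulatedDevice:
    def __init__(self, data):
        self.data = list(data)
        self.polls_until_ready = 3   # device is slow: ready on every 4th poll

    def status_ready(self):
        """Status-register check; the CPU wastes a cycle on each False."""
        if self.polls_until_ready == 0:
            self.polls_until_ready = 3
            return bool(self.data)
        self.polls_until_ready -= 1
        return False

    def read_byte(self):
        return self.data.pop(0)

def programmed_io_read(device):
    """CPU-driven transfer loop: initiate, poll status, transfer, terminate."""
    buffer, wasted_polls = [], 0
    while True:
        while not device.status_ready():     # busy-wait loop
            wasted_polls += 1
            if not device.data:              # nothing left: terminate
                return bytes(buffer), wasted_polls
        buffer.append(device.read_byte())

dev = SimulatedDevice(b"IO")
data, wasted = programmed_io_read(dev)
print(data, wasted)
```

Even in this toy setting the wasted polls outnumber the bytes transferred; with a real electromechanical device the ratio is orders of magnitude worse, which is exactly why interrupt-driven I/O and DMA were introduced.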
