
Computer Organization and Architecture

Chapter four

Memory system organization and architecture

4.1. Storage systems

The memory unit is used for the storage and retrieval of data and instructions. A typical computer
system is equipped with a hierarchy of memory subsystems, some internal to the system and
some external. Internal memory systems are accessible by the CPU directly, and external memory
systems are accessible by the CPU through an I/O module.

Memory systems are classified according to their key characteristics. The most important are
listed below:

Location

The classification of memory is done according to the location of the memory as:
 Registers: The CPU requires its own local memory in the form of registers, and the
control unit also requires fast, locally accessible memory.
 Internal (main): is often equated with the main memory (RAM)
 External (secondary): consists of peripheral storage devices like Hard disks, magnetic
tapes, etc.

Capacity

Storage capacity is one of the most important aspects of memory. It is measured in bytes. Since the
capacity of a typical memory is very large, the prefixes kilo (K), mega (M), and giga (G) are used.
A kilobyte is 2^10 bytes, a megabyte is 2^20 bytes, and a gigabyte is 2^30 bytes.

Unit of Transfer

The unit of transfer is the number of bits read out of or written into memory at a time.
 Word: For internal memory, the unit of transfer is usually equal to the number of data lines into and
out of the memory module.
 Block: For external memory, data are often transferred in much larger units than a word,
and these are referred to as blocks.

Access Method


 Sequential: Tape units have sequential access. Data are generally stored in units called
“records”. Data is accessed sequentially; records must be passed over until the
record being searched for is found.
 Random: Each addressable location in memory has a unique addressing mechanism. The
time to access a given location is independent of the sequence of prior accesses and
constant. Any location can be selected at random and directly addressed and accessed.
Main memory and cache systems are random access.

Performance

 Access time: For random-access memory, this is the time it takes to perform a read or
write operation: that is, the time from the instant that an address is presented to the
memory to the instant that data have been stored or made available for use. For non-
random-access memory, access time is the time it takes to position the read-write
mechanism at the desired location.
 Transfer rate: This is the rate at which data can be transferred into or out of a memory
unit.

Physical Type

 Semiconductor: main memory, cache (RAM, ROM).


 Magnetic: Magnetic disks (hard disks), magnetic tape units.
 Optical: CD, DVD.

Physical Characteristics

 Volatile/nonvolatile: In a volatile memory, information decays naturally or is lost when
electrical power is switched off.
 Erasable/non-erasable: Non-erasable memory cannot be altered (except by destroying
the storage unit). ROMs are non-erasable.
4.2. Coding, data compression, and data integrity

Coding

In this context, coding means the transformation of data into a form suitable for processing by computer
software. The classification of information is an important step in preparing data for computer
processing with statistical software.
Some studies employ multiple coders working independently on the same data. This
minimizes the chance of coding errors and increases the reliability of the data.


Data compression
Data compression is a reduction in the number of bits needed to represent data. Compressing
data can save storage capacity, speed up file transfer, and decrease costs for storage hardware.
Compression is performed by a program that uses a formula or algorithm to determine how to
shrink the size of the data. For instance, an algorithm may represent a string of bits -- or 0s and
1s -- with a smaller string of 0s and 1s by using a dictionary for the conversion between them, or
the formula may insert a reference or pointer to a string of 0s and 1s that the program has already
seen. Text compression can be as simple as removing all unneeded characters, inserting a single
repeat character to indicate a string of repeated characters, and substituting a smaller bit string for
a frequently occurring bit string. Data compression can often reduce a text file to 50% of its
original size, or significantly less.
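
As a concrete illustration, the following sketch implements run-length encoding, one of the simplest forms of the repeated-character substitution described above; the function names and sample data are illustrative only.

# Minimal run-length encoding sketch: a run of a repeated character is
# replaced by a (count, character) pair, and decoding reverses the process.
def rle_encode(text):
    encoded = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        encoded.append((j - i, text[i]))   # (run length, character)
        i = j
    return encoded

def rle_decode(pairs):
    return "".join(ch * count for count, ch in pairs)

data = "aaaabbbccd"
packed = rle_encode(data)
print(packed)                      # [(4, 'a'), (3, 'b'), (2, 'c'), (1, 'd')]
assert rle_decode(packed) == data  # lossless: decoding restores the original

Real compressors, such as dictionary-based algorithms, are far more elaborate, but the principle of replacing redundancy with shorter references is the same.
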
Data integrity
Data integrity refers to the accuracy and consistency (validity) of data over its lifecycle.
Compromised data, after all, is of little use to enterprises, not to mention the dangers presented
by sensitive data loss. For this reason, maintaining data integrity is a core focus of many
enterprise security solutions.

Data integrity can be compromised in a number of ways. Each time data is replicated or
transferred, it should remain intact and unaltered between updates. Error checking methods and
validation procedures are typically relied on to ensure the integrity of data that is transferred or
reproduced without the intention of alteration.
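
A minimal sketch of one such error-checking method is comparing message digests of the original and the copy, as below. It uses Python's standard hashlib module; the data values are purely illustrative.

import hashlib

def digest(data: bytes) -> str:
    # a cryptographic hash acts as a fingerprint of the data
    return hashlib.sha256(data).hexdigest()

original = b"customer record 1042"
copy = b"customer record 1042"      # the replicated or transferred version

if digest(original) == digest(copy):
    print("integrity check passed: the copy matches the original")
else:
    print("integrity check failed: the data was altered")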

4.3 Memory hierarchy

The memory unit is an essential component in any digital computer since it is needed for storing
programs and data. A very small computer with a limited application may be able to fulfill its
intended task without the need of additional storage capacity. Most general purpose computers
would run more efficiently if they were equipped with additional storage beyond the capacity of
the main memory. There is just not enough space in one memory unit to accommodate all the
programs used in a typical computer. Moreover, most computer users accumulate and continue
to accumulate large amounts of data-processing software. Not all accumulated information is
needed by the processor at the same time. Therefore, it is more economical to use low-cost
storage devices to serve as a backup for storing the information that is not currently used by the
CPU. The memory unit that communicates directly with the CPU is called the main memory.
Devices that provide backup storage are called auxiliary memory. The most common auxiliary
memory devices used in computer systems are magnetic disks and tapes. They are used for
storing system programs, large data files, and other backup information. Only programs and data
currently needed by the processor reside in main memory. All other information is stored in
auxiliary memory and transferred to main memory when needed.


The total memory capacity of a computer can be visualized as being a hierarchy of components.
The memory hierarchy system consists of all storage devices employed in a computer system
from the slow but high-capacity auxiliary memory to a relatively faster main memory, to an even
smaller and faster cache memory accessible to the high-speed processing logic. The following
figure illustrates the components in a typical memory hierarchy. At the bottom of the hierarchy
are the relatively slow magnetic tapes used to store removable files. Next are the magnetic disks
used as backup storage. The main memory occupies a central position by being able to
communicate directly with the CPU and with auxiliary memory devices through an I/O
processor. When programs not residing in main memory are needed by the CPU, they are
brought in from auxiliary memory. Programs not currently needed in main memory are
transferred into auxiliary memory to provide space for currently used programs and data.

A special very high speed memory called a Cache is sometimes used to increase the speed of
processing by making current programs and data available to the CPU at a rapid rate. The cache
memory is employed in computer systems to compensate for the speed differential between main
memory access time and processor logic. CPU logic is usually faster than main memory access
time, with the result that processing speed is limited primarily by the speed of main memory. A
technique used to compensate for the mismatch in operating speeds is to employ an extremely
fast, small cache between the CPU and main memory whose access time is close to processor
logic clock cycle time. The cache is used for storing segments of programs currently being
executed in the CPU and temporary data frequently needed in the present calculations.

While the I/O processor manages data transfers between auxiliary memory and main memory,
the cache organization is concerned with the transfer of information between main memory and
CPU. Thus each is involved with a different level in the memory hierarchy system. The reason
for having two or three levels of memory hierarchy is economics. As the storage capacity of the
memory increases, the cost per bit for storing binary information decreases and the access time
of the memory becomes longer. The auxiliary memory has a large storage capacity, is relatively

inexpensive, but has low access speed compared to main memory. The cache memory is very
small, relatively expensive, and has very high access speed. Thus as the memory access speed
increases, so does its relative cost. The overall goal of using a memory hierarchy is to obtain the
highest-possible average access speed while minimizing the total cost of the entire memory
system.
Auxiliary and cache memories are used for different purposes. The cache holds those parts of the
program and data that are most heavily used, while the auxiliary memory holds those parts that
are not presently used by the CPU. Moreover, the CPU has direct access to both cache and main
memory but not to auxiliary memory. The transfer from auxiliary to main memory is usually
done by means of direct memory access of large blocks of data.

4.4. Main memory organization and operations

The main memory is the central storage unit in a computer system. It is a relatively large and fast
memory used to store programs and data during the computer operation. The principal
technology used for the main memory is based on semiconductor integrated circuits. Integrated
circuit RAM chips are available in two possible operating modes, static and dynamic. The static
RAM consists essentially of internal flip-flops that store the binary information. The stored
information remains valid as long as power is applied to the unit. The dynamic RAM stores the
binary information in the form of electric charges that are applied to capacitors. The capacitors
are provided inside the chip by MOS transistors. The stored charge on the capacitors tends to
discharge with time and the capacitors must be periodically recharged by refreshing the dynamic
memory. Refreshing is done by cycling through the words every few milliseconds to restore the
decaying charge. The dynamic RAM offers reduced power consumption and larger storage
capacity in a single memory chip. The static RAM is easier to use and has shorter read and write
cycles.

Most of the main memory in a general-purpose computer is made up of RAM integrated circuit
chips, but a portion of the memory may be constructed with ROM chips. Originally, RAM was
used to refer to a random-access memory, but now it is used to designate a read/write memory to
distinguish it from a read-only memory, although ROM is also random access. RAM is used for
storing the bulk of the programs and data that are subject to change. ROM is used for storing
programs that are permanently resident in the computer and for tables of constants that do not
change in value once the production of the computer is completed.

Among other things, the ROM portion of main memory is needed for storing an initial program
called a bootstrap loader. The bootstrap loader is a program whose function is to start the
computer software operating when power is turned on. Since RAM is volatile, its contents are
destroyed when power is turned off. The contents of ROM remain unchanged after power is


turned off and on again. The startup of a computer consists of turning the power on and starting
the execution of an initial program. Thus when power is turned on, the hardware of the computer
sets the program counter to the first address of the bootstrap loader. The bootstrap program loads
a portion of the operating system from disk to main memory and control is then transferred to the
operating system, which prepares the computer for general use. RAM and ROM chips are
available in a variety of sizes. If the memory needed for the computer is larger than the capacity
of one chip, it is necessary to combine a number of chips to form the required memory size. To
demonstrate the chip interconnection, we will show an example of a 1024 x 8 memory
constructed with 128 x 8 RAM chips and 512 x 8 ROM chips.
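
The figure for that example is not reproduced here, but the address decoding it implies can be sketched as follows. Assume the four 128 x 8 RAM chips occupy addresses 0-511 and the 512 x 8 ROM chip occupies addresses 512-1023; this particular address assignment is an assumption made for illustration.

def decode(address):
    assert 0 <= address < 1024         # 10-bit address space (1024 words)
    if address < 512:                  # RAM region
        chip = (address >> 7) & 0x3    # bits 8-7 select one of the 4 RAM chips
        offset = address & 0x7F        # bits 6-0 select a word inside the chip
        return ("RAM", chip, offset)
    else:                              # ROM region
        offset = address & 0x1FF       # bits 8-0 select a word inside the ROM
        return ("ROM", 0, offset)

print(decode(0))      # ('RAM', 0, 0)
print(decode(300))    # ('RAM', 2, 44)
print(decode(700))    # ('ROM', 0, 188)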

Latency, cycle time, bandwidth, and interleaving


Latency: is the combined delay between an input or command and the desired output. In a
computer system, latency is often used to mean any delay or waiting that increases real or
perceived response time beyond what is desired. Specific contributors to computer latency
include mismatches in data speed between the microprocessor and input/output devices,
inadequate data buffers and the performance of the hardware involved, as well as its drivers. The
processing load of the computer can also add significant latency.
Cycle time: is the time, usually measured in nanoseconds, between the start of one random
access memory (RAM) access to the time when the next access can be started. Access time is
sometimes used as a synonym (although IBM deprecates it). Cycle time consists of latency (the
overhead of finding the right place for the memory access and preparing to access it) and transfer
time. Cycle time should not be confused with processor clock cycles or clock speed, which have
to do with the number of cycles per second (in megahertz or MHz) to which a processor is paced.
Interleaving: It is a technique for compensating for the relatively slow speed of DRAM (Dynamic
RAM). In this technique, the main memory is divided into memory banks which can be accessed
individually without any dependency on the others (a sketch of this bank mapping appears after these definitions).
Bandwidth is the rate at which data can be read from or stored into a semiconductor memory by
a processor
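
A hedged sketch of low-order interleaving follows: consecutive addresses are spread across banks so that sequential accesses fall in different banks and their cycles can overlap. The choice of four banks is an assumption for illustration.

NUM_BANKS = 4

def bank_of(address):
    return address % NUM_BANKS        # low-order address bits select the bank

def offset_in_bank(address):
    return address // NUM_BANKS       # remaining bits select the word within the bank

for addr in range(8):
    print("address", addr, "-> bank", bank_of(addr), "offset", offset_in_bank(addr))
# addresses 0, 1, 2, 3 land in banks 0, 1, 2, 3, so a burst of sequential
# reads keeps all four banks busy instead of waiting on a single one.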

Cache Memory
In every instruction cycle, the CPU accesses memory at least once to fetch the instruction,
and sometimes again to fetch the operands. The rate at which the CPU can
execute instructions is limited by the memory cycle time. This limitation is due to the mismatch
between the memory cycle time and processor cycle time. Ideally the main memory should be
built with the same technology as that of CPU registers, giving memory cycle times comparable
to processor cycle times, but this would be too expensive a strategy.

The solution is to exploit the principle of locality by providing a small, fast memory between the
CPU and the main memory. This memory is known as cache memory. Thus the intention of
using cache memory is to give memory speed approaching the speed of fastest memories


available, and at the same time provide a large memory size at the price of less expensive types
of semiconductor memory.

Principles of Cache Memory

Figure: Cache and main memory

The cache memory contains copies of portions of main memory. When the CPU attempts to read a
word from main memory, a check is made to determine whether the word is in the cache. If so, the
word is delivered to the CPU; if not, the block of main memory containing that word is read into the
cache and the word is then delivered to the CPU. The reason for reading an entire block from main
memory is that future references are likely to be to other words in the same block.

Cache Read Operation

Now let us see how a cache read operation is executed. The flow chart below illustrates the steps of
a cache read operation. The processor generates the address RA of a word to be read. If the word
is contained in the cache, it is delivered to the processor. Otherwise, the block containing that
word is loaded into the cache and, simultaneously, the word is delivered to the processor from
main memory; these last two operations occur in parallel.


Transferring data in blocks between the main memory and the cache enables an interleaved
memory to operate at its maximum possible speed.

Figure Cache read operation

Elements of Cache Design

 Cache Size: A small cache leads to performance problems; a large cache is too expensive.
 Mapping Function: Discussed under Mapping Functions below.
 Write Policy: Before a block that is resident in the cache can be replaced, it is necessary
to consider whether it has been altered in the cache but not in the main memory.
 Write back: Writes are done only to the cache. An UPDATE bit is set whenever a write occurs.
Before a cache block (line) is replaced, if its UPDATE bit is set, its contents are written
to main memory. Access to main memory by I/O modules can only be allowed through
the cache.
 Line Size: A greater line size gives more hits (+) but also more line replacements (-); too
small a line size gives less chance of a hit for some parts of the block.


 Number of caches: Two levels of cache: Cache internal to processor is called Level-1
(L1) cache. External cache is called Level-2 (L2) cache.

Mapping Functions

The correspondence between main memory blocks and cache blocks is specified by a mapping function.
There are three standard mapping functions namely:
 Direct mapping

 Associative mapping
 Block set associative mapping

In order to discuss these methods consider a cache consisting of 128 blocks of 16 words each.
Assume that main memory is addressable by a 16-bit address. For mapping purposes, main
memory is viewed as composed of 4K blocks of 16 words each.

1. Direct Mapping Technique

This is the simplest mapping technique. In this case, block K of the main memory maps onto block
K modulo 128 of the cache. Since more than one main memory block is mapped onto a given
cache block position, contention may arise for that position even when the cache is not full. This
is overcome by allowing the new block to overwrite the currently resident block.

A main memory address can be divided into three fields: TAG, BLOCK, and WORD. The TAG
bits are required to identify a main memory block when it is resident in the cache. When a new
block enters the cache, the 7-bit BLOCK field determines the cache position in which this
block must be stored. The tag field of that block is compared with the tag field of the address. If they
match, then the desired word is present in that block of cache. If there is no match, then the block
containing the required word must be first read from the main memory and then loaded into the
cache.
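
For the example above (16 words per block and 128 cache blocks with a 16-bit address), the fields work out to a 4-bit WORD field, a 7-bit BLOCK field and a 5-bit TAG field. The following sketch simply extracts those fields; the sample address is arbitrary.

def split_address(addr):
    word = addr & 0xF                 # bits 3-0: word within the block
    block = (addr >> 4) & 0x7F        # bits 10-4: cache block position
    tag = (addr >> 11) & 0x1F         # bits 15-11: identifies the memory block
    return tag, block, word

addr = 0b10110_0000101_1100           # an arbitrary 16-bit address
print(split_address(addr))            # (22, 5, 12)
# The main memory block number is addr >> 4; it maps to cache block
# (addr >> 4) % 128, which is exactly the 7-bit BLOCK field.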


Figure: Direct mapping cache

2. Associative Mapping Technique

This is a much more flexible mapping technique. Here any main memory block can be loaded into
any cache block position. In this case 12 tag bits are required to identify a main memory block
when it is resident in the cache.


Figure Associative mapping cache

The tag bits of an address received from the CPU are compared with the tag bits of each cache
block to see if the desired block is present in the cache. Here we need to search all 128 tag
patterns to determine whether a given block is in the cache. This type of search is called
associative search. Therefore the cost of implementation is higher. Because of complete freedom
in positioning, a wide range of replacement algorithms can be used.

3. Block Set Associative Mapping

This is a combination of the two techniques discussed above. In this case the blocks of the cache are
grouped into sets, and the mapping allows a block of main memory to reside in any block of a
particular set.
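
For the two-blocks-per-set case shown in the figure that follows, the 128 cache blocks form 64 sets, so a 16-bit address splits into a 6-bit TAG, a 6-bit SET field and a 4-bit WORD field. The sketch below extracts these fields; a memory block may be placed in either of the two blocks of its set.

def split_set_associative(addr):
    word = addr & 0xF                 # bits 3-0: word within the block
    set_index = (addr >> 4) & 0x3F    # bits 9-4: which of the 64 sets
    tag = (addr >> 10) & 0x3F         # bits 15-10: compared against both blocks in the set
    return tag, set_index, word

print(split_set_associative(0xBEEF))  # (47, 46, 15)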


Figure: Block set associative mapping cache with two blocks per set

Replacement Algorithms of Cache Memory

For associative and set-associative mapping, a replacement algorithm is needed. The most common
algorithms are discussed here. When a new block is to be brought into the cache and all the
positions that it may occupy are full, the cache controller must decide which of the old
blocks to overwrite. Because programs usually stay in localized areas for a reasonable period
of time, there is a high probability that blocks that have been referenced recently will be
referenced again soon. Therefore, when a block is to be overwritten, the block that has not been
referenced for the longest time is overwritten. This block is called the least recently used (LRU)
block, and the technique is called the LRU replacement algorithm. To use the LRU
algorithm, the cache controller must track the LRU block as computation proceeds.
There are several replacement algorithms that require less overhead than the LRU method. One
method is to remove the oldest block from a full set when a new block must be brought in. This
method is referred to as FIFO. In this technique no updating is needed when a hit occurs.
However, because the algorithm does not consider the recent pattern of access to blocks in the
cache, it is not as effective as the LRU approach in choosing the best block to remove. Another
method, called least frequently used (LFU), replaces the block in the set that has
experienced the fewest references; it is implemented by associating a counter with each slot. Yet
another algorithm, the simplest, called random replacement, is to choose the block to be overwritten at random.
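
A minimal sketch of LRU replacement for a single cache set is given below; the two-way set size and the access sequence are assumptions for illustration.

from collections import OrderedDict

WAYS = 2                                     # blocks per set

class LRUSet:
    def __init__(self):
        self.blocks = OrderedDict()          # tag -> data, least recently used first

    def access(self, tag):
        if tag in self.blocks:               # hit: mark the block most recently used
            self.blocks.move_to_end(tag)
            return "hit"
        if len(self.blocks) >= WAYS:         # miss with a full set: evict the LRU block
            self.blocks.popitem(last=False)
        self.blocks[tag] = "data"            # bring the new block in
        return "miss"

s = LRUSet()
for tag in [1, 2, 1, 3, 2]:
    print(tag, s.access(tag))
# 1 miss, 2 miss, 1 hit, 3 miss (evicts 2), 2 miss (evicts 1)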

Virtual memory

Virtual (or logical) memory is a concept that, when implemented by a computer and its operating
system, allows programmers to use a very large range of memory or storage addresses for stored
data. The computing system maps the programmer's virtual addresses to real hardware storage
addresses. Usually, the programmer is freed from having to be concerned about the availability
of data storage.

In addition to managing the mapping of virtual storage addresses to real storage addresses, a
computer implementing virtual memory or storage also manages storage swapping between
active storage (RAM) and hard disk or other high volume storage devices. Data is read in units
called “pages” of sizes ranging from a thousand bytes (actually 1,024 decimal bytes) up to
several megabytes in size. This reduces the amount of physical storage access that is required
and speeds up overall system performance. The address specified by the program is called a
logical or virtual address; the memory control circuitry translates it into an address that can be
used to access the physical memory. A set of virtual addresses constitutes the
virtual address space.

The mechanism that translates virtual addresses into physical address is usually implemented by
a combination of hardware and software components. If a virtual address refers to a part of the
program or data space that is currently in the physical memory, then the contents of the
appropriate physical location in the main memory are accessed. Otherwise its contents must be
brought into a suitable location in the main memory. The mapping function is implemented by a
special memory control unit called the memory management unit. The mapping function can be
changed during program execution according to system requirements.

The simplest method of translation assumes that all programs and data are composed of fixed
length units called pages. Each page consists of a block of words that occupy contiguous
locations in the main memory or in the secondary storage. Pages normally range from 1K to 8K
bytes in length. They form the basic unit of information that is transmitted between main
memory and secondary storage devices.
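
A minimal sketch of page-based translation follows, assuming a 4K-byte page size and a toy page table; both are illustrative assumptions, not fixed by the text.

PAGE_SIZE = 4096

page_table = {0: 7, 1: 3, 2: None}     # virtual page -> physical frame (None = not resident)

def translate(virtual_address):
    vpn = virtual_address // PAGE_SIZE       # virtual page number
    offset = virtual_address % PAGE_SIZE     # position within the page
    frame = page_table.get(vpn)
    if frame is None:
        raise RuntimeError("page fault: page %d must be brought in from secondary storage" % vpn)
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1234)))          # virtual page 1 -> frame 3, giving 0x3234
try:
    translate(0x2000)                  # virtual page 2 is not in main memory
except RuntimeError as e:
    print(e)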

Virtual memory increases the effective size of the main memory. Only the active space of the
virtual address space is mapped onto locations in the physical main memory, whereas the


remaining virtual addresses are mapped onto the bulk storage devices used. During a memory
cycle the address mapping mechanism (hardware or software) determines whether the
addressed information is in the physical main memory unit. If it is, the proper information is
accessed and execution proceeds. If it is not, a contiguous block of words containing the
desired information is transferred from bulk storage to main memory, displacing some block
that is currently inactive.


Chapter five

Interfacing and communication

I/O fundamentals

Handshaking

Handshaking is an I/O control method used to synchronize I/O devices with the microprocessor. Because
many I/O devices accept or release information at a much slower rate than the microprocessor,
this method is used to make the microprocessor work with an I/O device at the I/O device's
data transfer rate. Suppose we have a printer connected to a system. The printer can print
100 characters/second, but the microprocessor can send information to the printer much faster.
Therefore, as soon as the printer has received enough data to print, it places a logic 1
signal on its busy pin, indicating that it is busy printing. The microprocessor then tests the busy
bit to decide whether the printer is busy or not. When the printer becomes free, it clears the
busy bit and the microprocessor again sends enough data to be printed. This
process of interrogating the printer is called handshaking.
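
The busy-bit polling described above can be sketched as follows; the PrinterPort class and chunk size are purely illustrative stand-ins for a real device interface.

import time

class PrinterPort:
    def __init__(self):
        self.busy = False                    # the printer's busy pin

    def send(self, chunk):
        self.busy = True                     # printer raises its busy signal
        print("printing:", chunk)
        self.busy = False                    # done printing, busy bit cleared

def print_document(port, text, chunk_size=8):
    for i in range(0, len(text), chunk_size):
        while port.busy:                     # handshake: wait until the device is free
            time.sleep(0.001)
        port.send(text[i:i + chunk_size])

print_document(PrinterPort(), "hello handshaking world")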

Buffering

I/O is the process of transferring data between a program and an external device. The process of
optimizing I/O consists primarily of making the best possible use of the slowest part of the path
between the program and the device.

The slowest part is usually the physical channel, which is often slower than the CPU or a
memory-to-memory data transfer. The time spent in I/O processing overhead can reduce the
amount of time that a channel can be used, thereby reducing the effective transfer rate. The
biggest factor in maximizing this channel speed is often the reduction of I/O processing
overhead.

A buffer is a temporary storage location for data while the data is being transferred. A buffer is
often used for the following purposes:

 Small I/O requests can be collected into a buffer, and the overhead of making many
relatively expensive system calls can be greatly reduced (a sketch of such a collection buffer appears after this list).

A collection buffer of this type can be sized and handled so that the actual physical I/O
requests made to the operating system match the physical characteristics of the device
being used.

 Many data file structures, such as the f77 and cos file structures, contain control words.
During the write process, a buffer can be used as a work area where control words can be


inserted into the data stream (a process called blocking). The blocked data is then written
to the device. During the read process, the same buffer work area can be used to examine
and remove these control words before passing the data on to the user (deblocking).
 When data access is random, the same data may be requested many times. A cache is a
buffer that keeps old requests in the buffer in case these requests are needed again. A
cache that is sufficiently large and/or efficient can avoid a large part of the physical I/O
by having the data ready in a buffer. When the data is often found in the cache buffer, it
is referred to as having a high hit rate. For example, if the entire file fits in the cache and
the file is present in the cache, no more physical requests are required to perform the I/O.
In this case, the hit rate is 100%.
 Running the disks and the CPU in parallel often improves performance; therefore, it is
useful to keep the CPU busy while data is being moved. To do this when writing, data
can be transferred to the buffer at memory-to-memory copy speed and an asynchronous
I/O request can be made. The control is then immediately returned to the program, which
continues to execute as if the I/O were complete (a process called write-behind). A
similar process can be used while reading; in this process, data is read into a buffer before
the actual request is issued for it. When it is needed, it is already in the buffer and can be
transferred to the user at very high speed. This is another form or use of a cache.
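
A minimal sketch of the collection buffer mentioned in the first item above is given below: many small writes are gathered in memory and handed to the (expensive) physical write as one large request. The class and sizes are illustrative assumptions.

class BufferedWriter:
    def __init__(self, device_write, buffer_size=4096):
        self.device_write = device_write     # the slow physical write operation
        self.buffer_size = buffer_size
        self.buffer = bytearray()

    def write(self, data):
        self.buffer += data                  # collect the small request in memory
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.device_write(bytes(self.buffer))   # one large physical request
            self.buffer.clear()

calls = []                                   # record each physical write
w = BufferedWriter(calls.append, buffer_size=16)
for _ in range(8):
    w.write(b"abcd")                         # eight 4-byte logical writes
w.flush()
print(len(calls), "physical writes issued")  # only 2 physical writes were needed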

Interrupt Driven I/O

Using Program-controlled I/O requires continuous involvement of the processor in the I/O
activities. It is desirable to avoid wasting processor execution time. An alternative is for the CPU
to issue an I/O command to a module and then go on with other work. The I/O module will then
interrupt the CPU, requesting service, when it is ready to exchange data with the CPU. The CPU
will then execute the data transfer and resume its former processing. Based on the use of
interrupts, this technique improves the utilization of the processor.

With interrupt-driven I/O, the CPU issues a command to the I/O module and does not wait until the
I/O operation is complete, but instead continues to execute other instructions. When the I/O module
has completed its work, it interrupts the CPU.

An interrupt is more than a simple mechanism for coordinating I/O transfers. In a general sense,
interrupts enable transfer of control from one program to another to be initiated by an event that
is external to a computer. Execution of the interrupted program resumes after completion of
execution of the interrupt service routine. The concept of interrupts is useful in operating systems
and in many control applications where processing of certain routines has to be accurately timed
relative to the external events.

With the interrupt-driven I/O technique, the CPU issues a read command, and the I/O module gets the
data from the peripheral while the CPU does other work. When the data is ready, the I/O module
interrupts the CPU. The CPU checks the status; if there is no error, that is, the device is ready, the
CPU requests the data and the I/O module transfers it. The CPU then reads the data and stores it in
main memory.

Programmed I/O

With Programmed I/O, data are exchanged between the CPU and the I/O module. The CPU
executes a program that gives it direct control of the I/O operation, including sensing device
status, sending a read or write command, and transferring data. When the CPU issues a command to
the I/O module, it must wait until the I/O operation is complete. If the CPU is faster than the I/O
module, CPU time is wasted.

The I/O module does not take any further action to alert the CPU; that is, it does not interrupt the CPU.
Hence it is the responsibility of the CPU to periodically check the status of the I/O module until
it finds that the operation is complete.

The sequences of actions that take place with programmed I/O are:
 CPU requests I/O operation

 I/O module performs operation


 I/O module sets status bits
 CPU checks status bits periodically
 I/O module does not inform CPU directly
 I/O module does not interrupt CPU
 CPU may wait or come back later
Interrupt structures
Vectored interrupt
In a computer, a vectored interrupt is an I/O interrupt that tells the part of the computer that
handles I/O interrupts at the hardware level that a request for attention from an I/O device has
been received and also identifies the device that sent the request. A vectored interrupt is an
alternative to a polled interrupt, which requires that the interrupt handler poll or send a signal to
each device in turn in order to find out which one sent the interrupt request.
Polled interrupt
In a computer, a polled interrupt is a specific type of I/O interrupt that notifies the part of the
computer containing the I/O interface that a device is ready to be read or otherwise handled but
does not indicate which device. The interrupt controller must poll (send a signal out to) each
device to determine which one made the request.


External storage
External storage is all addressable data storage that is not currently in the computer's main
storage or memory. Synonyms are auxiliary storage and secondary storage.
Primary storage (or main memory or internal memory), often referred to simply as memory,
is the only one directly accessible to the CPU. The CPU continuously reads instructions stored
there and executes them as required. Any data actively operated on is also stored there in a
uniform manner.
Main memory is directly or indirectly connected to the CPU via a memory bus, which is actually two
buses: an address bus and a data bus. The CPU first sends a number
through the address bus, called the memory address, that indicates the desired location of the
data. Then it reads or writes the data itself using the data bus. Additionally, a memory
management unit (MMU) is a small device between CPU and RAM recalculating the actual
memory address, for example to provide an abstraction of virtual memory or other tasks.
Secondary storage (or external memory) differs from primary storage in that it is not directly
accessible by the CPU. The computer usually uses its input/output channels to access secondary
storage and transfers the desired data using intermediate area in primary storage. Secondary
storage does not lose the data when the device is powered down—it is non-volatile. Per unit, it is
typically also an order of magnitude less expensive than primary storage. Consequently, modern
computer systems typically have an order of magnitude more secondary storage than primary
storage and data is kept for a longer time there.
In modern computers, hard disk drives are usually used as secondary storage. The time taken to
access a given byte of information stored on a hard disk is typically a few thousandths of a
second, or milliseconds. By contrast, the time taken to access a given byte of information stored
in random access memory is measured in billionths of a second, or nanoseconds. This illustrates
the very significant access-time difference which distinguishes solid-state memory from rotating
magnetic storage devices: hard disks are typically about a million times slower than memory.
Rotating optical storage devices, such as CD and DVD drives, have even longer access times.
With disk drives, once the disk read/write head reaches the proper placement and the data of
interest rotates under it, subsequent data on the track are very fast to access. As a result, in order
to hide the initial seek time and rotational latency, data are transferred to and from disks in large
contiguous blocks.

When data reside on disk, block access to hide latency offers a ray of hope in designing
efficient external memory algorithms. Sequential or block access on disks is orders of magnitude
faster than random access, and many sophisticated paradigms have been developed to design
efficient algorithms based upon sequential and block access. Another way to reduce the I/O
bottleneck is to use multiple disks in parallel in order to increase the bandwidth between primary
and secondary memory.
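
The advantage of large contiguous transfers can be seen with a back-of-the-envelope calculation; the seek time, rotational latency and transfer rate below are assumed, representative figures only.

seek_ms = 8.0                 # average seek time
rotation_ms = 4.0             # average rotational latency
transfer_mb_per_s = 100.0     # sustained transfer rate

def read_time_ms(bytes_requested):
    transfer_ms = bytes_requested / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotation_ms + transfer_ms

one_block = read_time_ms(4096)           # one 4 KB request
many_small = 8 * read_time_ms(512)       # eight separate 512-byte requests
print(round(one_block, 3), "ms for one block")            # about 12.04 ms
print(round(many_small, 3), "ms for eight small reads")   # about 96.04 ms
# Almost all of the time goes into seek and rotation, which is why data are
# moved in large contiguous blocks.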


Some other examples of secondary storage technologies are: flash memory (e.g. USB flash
drives or keys), floppy disks, magnetic tape, paper tape, punched cards, standalone RAM disks,
and Iomega Zip drives.

The secondary storage is often formatted according to a file system format, which provides the
abstraction necessary to organize data into files and directories, providing also additional
information (called metadata) describing the owner of a certain file, the access time, the access
permissions, and other information.

Most computer operating systems use the concept of virtual memory, allowing utilization of
more primary storage capacity than is physically available in the system. As the primary memory
fills up, the system moves the least-used chunks (pages) to secondary storage devices (to a swap
file or page file), retrieving them later when they are needed. As more of these retrievals from
slower secondary storage are necessary, the more the overall system performance is degraded.

Buses

Bus is a communication system that transfers data between components inside a computer, or
between computers.

This expression covers all related hardware components (wire, optical fiber, etc.) and software,
including communication protocols.
Early computer buses were parallel electrical wires with multiple connections, but the term is
now used for any physical arrangement that provides the same logical function as a
parallel electrical bus. Modern computer buses can use both parallel and bit serial connections,
and can be wired in either a multi drop (electrical parallel) or daisy chain topology, or connected
by switched hubs, as in the case of USB.
Computer bus types are as follows:

 System Bus: A parallel bus that simultaneously transfers data in 8-, 16-, or 32-bit
channels and is the primary pathway between the CPU and memory.
 Internal Bus: Connects a local device, like internal CPU memory.
 External Bus: Connects peripheral devices to the motherboard, such as scanners or disk
drives.
 Expansion Bus: Allows expansion boards to access the CPU and RAM.
 Front side Bus: Main computer bus that determines data transfer rate speed and is the
primary data transfer path between the CPU, RAM and other motherboard devices.
 Backside Bus: Transfers secondary cache (L2 cache) data at faster speeds, allowing more
efficient CPU operations.


Bus Protocol
The essence of any bus is the set of rules by which data moves between devices. This set of rules,
which governs how data is transferred across the bus, is called the bus protocol. Every bus transaction consists of
an address phase and one or more data phases. Both the initiator and the target of a transaction
can regulate the flow of data by controlling their respective “ready” signals.

Bus arbitration process

Bus arbitration refers to the process by which the current bus master (the controller that has access to
the bus at a given instant) relinquishes control of the bus and passes it to another bus-requesting
processor unit.

Three bus arbitration processes


1. Daisy Chain
2. Independent Bus Requests and Grant
3. Polling
Direct Memory Access

Direct Memory Access is a capability provided by some computer bus architectures that allows
data to be sent directly from an attached device (such as a disk drive) to the memory on the
computer's motherboard. The microprocessor (CPU) is freed from involvement with the data
transfer, thus speeding up overall computer operation.

When the CPU wishes to read or write a block of data, it issues a command to the DMA module
and gives it the following information:

CPU tells DMA controller:


 Whether to read or write

 Device address
 Starting address of memory block for data
 Amount of data to be transferred
The CPU then carries on with other work; in effect, the DMA controller takes over the I/O operation
from the CPU. The DMA module transfers the entire block of data, one word at a time, directly to or
from memory, without going through the CPU. When the transfer is complete, the DMA controller
sends an interrupt to the CPU.


Thus the CPU is involved only at the beginning and at the end of the transfer.
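
The division of labour just described can be sketched as follows: the CPU fills in a transfer descriptor and starts the controller, and the controller moves the whole block and signals completion. The class and callback are illustrative; a real controller would move the words over the bus while the CPU executes other instructions.

class DMAController:
    def __init__(self, memory):
        self.memory = memory

    def start(self, device_data, start_address, count, on_complete):
        # this loop stands in for the controller's word-by-word bus transfers,
        # which happen without CPU involvement
        for i in range(count):
            self.memory[start_address + i] = device_data[i]
        on_complete()                        # "interrupt" at the end of the transfer

memory = [0] * 32
dma = DMAController(memory)
dma.start(device_data=list(b"disk block data!"), start_address=8, count=16,
          on_complete=lambda: print("DMA transfer complete interrupt"))
print(bytes(memory[8:24]))                   # b'disk block data!'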

DMA Configurations

The DMA mechanism can be configured in variety of ways. Some of the common configurations
are discussed here.

1. Single Bus Detached DMA

In this configuration all modules share the same system bus. The block diagram of single bus
detached DMA is as shown in the following figure. The DMA module, mimicking the
CPU, uses programmed I/O to exchange data between the memory and the I/O module
through the DMA module. This scheme may be inexpensive but is clearly inefficient. The
features of this configuration are:
 Single Bus, Detached DMA controller

 Each transfer uses bus twice


 I/O to DMA then DMA to memory
 CPU is suspended twice

[Figure: single bus, detached DMA - CPU, DMA controller, I/O devices, and main memory all attached to one system bus]

2. Single Bus, integrated DMA

Here, there is a path between DMA module and one or more I/O modules that do not include the
system bus. The block diagram of single bus Integrated DMA is as shown in the following. The
DMA logic can actually be considered as a part of an I/O module or there may be a separate
module that controls one or more I/O modules.

The features of this configuration can be considered as:

[Figure: single bus, integrated DMA - CPU, main memory, and two DMA controllers on the system bus, with the I/O devices attached directly to the DMA controllers]

 Single Bus, Integrated DMA controller


 Controller may support >1 device
 Each transfer uses the bus once
 DMA to memory
 CPU is suspended once

DMA using an I/O bus

One step further from the concept of integrated DMA is to connect the I/O modules to the DMA controller
using a separate bus called the I/O bus. This reduces the number of I/O interfaces in the DMA
module to one and provides for an easily expandable configuration. The block diagram of DMA
using I/O bus is as shown in the following Figure. Here the system bus that the DMA shares with
CPU and main memory is used by DMA module only to exchange data with memory. And the
exchange of data between the DMA module and the I/O modules takes place off the system bus
that is through the I/O bus.

[Figure: DMA using an I/O bus - CPU, DMA controller, and main memory on the system bus; the I/O devices attach to the DMA controller through a separate I/O bus]

The features of this configuration are:


 Separate I/O Bus

 Bus supports all DMA enabled devices


 Each transfer uses bus once
 DMA to memory
 CPU is suspended once


Advantages of DMA

DMA has several advantages over polling and interrupts. DMA is fast because a dedicated piece
of hardware transfers data from one computer location to another and only one or two bus
read/write cycles are required per piece of data transferred. In addition, DMA is usually required
to achieve maximum data transfer speed, and thus is useful for high speed data acquisition
devices. DMA also minimizes latency in servicing a data acquisition device because the
dedicated hardware responds more quickly than interrupts and transfer time is short. Minimizing
latency reduces the amount of temporary storage (memory) required on an I/O device. DMA also
off-loads the processor, which means the processor does not have to execute any instructions to
transfer data. Therefore, the processor is not used for handling the data transfer activity and is
available for other processing activity. Also, in systems where the processor primarily operates
out of its cache, data transfer is actually occurring in parallel, thus increasing overall system
utilization.

Introduction to network
A computer network, or data network, is a digital telecommunications network which
allows nodes to share resources. In computer networks, computing devices exchange data with
each other using connections between nodes (data links). These data links are established
over cable media such as wires or optic cables, or wireless media such as Wi-Fi.
Network computer devices that originate, route and terminate the data are called network nodes.
Nodes can include hosts such as personal computers, phones, servers as well as networking
hardware. Two such devices can be said to be networked together when one device is able to
exchange information with the other device, whether or not they have a direct connection to each
other. In most cases, application-specific communications protocols are layered (i.e. carried
as payload) over other more general communications protocols. This formidable collection
of information technology requires skilled network management to keep it all running reliably.
Computer networks support an enormous number of applications and services such as access to
the World Wide Web, digital video, digital audio, shared use of application and storage servers,
printers, and fax machines, and use of email and instant messaging applications as well as many
others. Computer networks differ in the transmission medium used to carry their
signals, communications protocols to organize network traffic, the network's size, topology and
organizational intent. The best-known computer network is the Internet.
Multimedia support
Multimedia is more than one concurrent presentation medium (for example, on CD-ROM or a
Web site). Although still images are a different medium than text, multimedia is typically used to
mean the combination of text, sound, and/or motion video. It typically means one of the
following:


 Text and sound


 Text, sound, and still or animated graphic images
 Text, sound, and video images
 Video and sound
 Multiple display areas, images, or presentations presented concurrently
RAID architecture
RAID (redundant array of independent disks; originally redundant array of inexpensive disks) is
a way of storing the same data in different places on multiple hard disks to protect data in the
case of a drive failure. RAID works by placing data on multiple disks and allowing input/output
(I/O) operations to overlap in a balanced way, improving performance. Because the use of
multiple disks increases the mean time between failures, storing data redundantly also
increases fault tolerance.

RAID arrays appear to the operating system (OS) as a single logical hard disk. RAID employs
the techniques of disk mirroring or disk striping. Mirroring copies identical data onto more than
one drive. Striping partitions each drive's storage space into units ranging from
a sector (512 bytes) up to several megabytes. The stripes of all the disks are interleaved and
addressed in order.
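
Striping can be sketched with a simple mapping from logical block number to (disk, stripe) position, as below; the four-disk array is an assumption for illustration.

NUM_DISKS = 4

def locate(logical_block):
    disk = logical_block % NUM_DISKS         # consecutive blocks go to consecutive disks
    stripe = logical_block // NUM_DISKS      # position of the block on that disk
    return disk, stripe

for block in range(8):
    disk, stripe = locate(block)
    print("logical block", block, "-> disk", disk, "stripe", stripe)
# blocks 0-3 land on disks 0-3 and can be read in parallel;
# blocks 4-7 wrap around onto the same disks in the next stripe.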

RAID controllers

A RAID controller can be used in both hardware- and software-based
RAID arrays. In a hardware-based RAID product, a physical controller manages the array. When
in the form of a Peripheral Component Interconnect or PCI Express card, the controller can be
designed to support drive formats such as SATA and SCSI. A physical RAID controller can also
be part of the motherboard. With software-based RAID, the controller uses the resources of the
hardware system. While it performs the same functions as a hardware-based RAID controller,
software-based RAID controllers may not enable as much of a performance boost. If a software-
based RAID implementation isn't compatible with a system's boot-up process, and hardware-
based RAID controllers are too costly, firmware- or driver-based RAID is another
implementation option.

A firmware-based RAID controller chip is located on the motherboard, and all operations are
performed by the CPU, similar to software-based RAID. However, with firmware, the RAID
system is only implemented at the beginning of the boot process. Once the OS has loaded, the
controller driver takes over RAID functionality. A firmware RAID controller isn't as pricy as a
hardware option, but puts more strain on the computer's CPU. Firmware-based RAID is also
called hardware-assisted software RAID, hybrid model RAID and fake RAID.


Chapter six

Functional organization

Implementation of simple data paths

The data path is the "brawn" of a processor, since it implements the fetch-decode-execute cycle.
The general discipline for data path design is to:

 Determine the instruction classes and formats in the ISA.

 Design data path components and interconnections for each instruction class or format.

 Compose the data path segments designed in Step 2 to yield a composite data path.

Simple data path components include memory (stores the current instruction), PC or program
counter (stores the address of current instruction), and ALU (executes current instruction). The
interconnection of these simple components to form a basic data path is illustrated in the
following Figure. Note that the register file is written to by the output of the ALU.

Implementation of the data path for I- and J-format instructions requires two more components -
a data memory and a sign extender, illustrated in the following Figure. The data memory stores
ALU results and operands, including instructions, and has two enabling inputs (MemWrite and
MemRead) that cannot both be active (have a logical high value) at the same time. The data
memory accepts an address and either accepts data (WriteData port if MemWrite is enabled) or


outputs data (ReadData port if MemRead is enabled), at the indicated address. The sign extender
adds 16 leading digits to a 16-bit word with most significant bit b, to produce a 32-bit word. In
particular, the additional 16 digits have the same value as b, thus implementing sign extension in
two's complement representation.
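
The sign extender's behaviour can be sketched directly; the sample values are illustrative.

def sign_extend_16_to_32(value16):
    value16 &= 0xFFFF                        # keep exactly 16 bits
    if value16 & 0x8000:                     # most significant bit b is 1: negative value
        return value16 | 0xFFFF0000          # copy b into the 16 added high-order bits
    return value16                           # b is 0: the added bits stay 0

print(hex(sign_extend_16_to_32(0x0005)))     # 0x5
print(hex(sign_extend_16_to_32(0xFFFB)))     # 0xfffffffb  (-5 remains -5 in 32 bits)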

Control unit

A processor is composed of a data path and a control unit. The data path of a processor is the execution
unit, consisting of the ALU, shifter, registers and their interconnects. The control unit is considered to be the
most complex part of a processor. Its function is to control the various units in the data path. The control
unit realizes the behavior of a processor as specified by its micro-operations. The performance
of the control unit is crucial, as it determines the clock cycle time of the processor.

The control unit can be implemented by hardwired logic or by microprogram. A computer designer
strives to optimize three aspects of control unit design:

 The complexity (hence cost) of the control unit


 The speed of control unit
 The engineering cost of the design (time, correctness etc.)

Hardwired control unit

In the past, a hardwired control unit was very difficult to design, so its engineering cost was very
high. At present, the emphasis of computer design is on performance, so hardwired design
is the preferred choice. Also, the CAD tools for logic design have improved to the point that a complex
design can be mostly automated. Therefore almost all processors of today use a hardwired control
unit. Starting with a behavioral description of the control unit, the state diagram of micro-
operations is constructed. Most states are simply driven by the clock and only transition to the next
state. Some states branch to different states depending on conditions such as testing condition
codes or decoding the instruction.

Micro programmed control unit

A microprogrammed control unit is actually like a miniature computer which can be "programmed" to
sequence the patterns of control bits. Its "program" is called a "microprogram" to distinguish it
from an ordinary computer program. Using a microprogram, a control unit can be implemented
for a complex instruction set, which is impractical to do with hardwired logic.

The microprogram approach to control unit design has several advantages:

1. One computer model can be microprogrammed to "emulate" another model.
2. One instruction set can be used throughout different models of hardware.
3. One hardware design can realize many instruction sets, so it is possible to choose the
set that is most suitable for an application.

Instruction pipelining
Pipelining is a technique in which multiple instructions are overlapped during execution; it is also
known as pipeline processing. It allows instructions to be stored and executed in an orderly
fashion. The pipeline is divided into stages, and these stages are connected with one another to form
a pipe-like structure. Instructions enter from one end and exit from the other end. Pipelining
increases the overall instruction throughput.
In a pipelined system, each segment consists of an input register followed by a combinational
circuit. The register is used to hold data and the combinational circuit performs operations on it. The
output of the combinational circuit is applied to the input register of the next segment.

Types of Pipeline:
Pipelines are divided into two categories:
Arithmetic Pipeline
Arithmetic pipelines are found in most computers. They are used for floating point
operations, multiplication of fixed point numbers, etc. For example, the input to the Floating
Point Adder pipeline is:
X = A*2^a
Y = B*2^b
Here A and B are mantissas (the significant digits of the floating point numbers), while a and b are
exponents.
Floating point addition and subtraction are done in four parts:


1. Compare the exponents.


2. Align the mantissas.
3. Add or subtract mantissas
4. Produce the result.
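
The four steps can be sketched on toy operands; base 10 is used instead of base 2 purely to keep the numbers readable, and the operand values are illustrative.

def fp_add(A, a, B, b):
    # 1. Compare the exponents
    if a < b:
        A, a, B, b = B, b, A, a              # make (A, a) the operand with the larger exponent
    # 2. Align the mantissas
    B = B / (10 ** (a - b))
    # 3. Add the mantissas
    mantissa = A + B
    # 4. Normalize and produce the result
    exponent = a
    while abs(mantissa) >= 10:
        mantissa /= 10
        exponent += 1
    return mantissa, exponent

print(fp_add(7.5, 1, 5.0, 1))   # 7.5*10^1 + 5.0*10^1 = 125, printed as (1.25, 2)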

Instruction Pipeline
Here a stream of instructions is executed by overlapping the fetch, decode and execute phases
of the instruction cycle. This technique is used to increase the throughput of the computer
system.
An instruction pipeline reads instructions from memory while previous instructions are being
executed in other segments of the pipeline. Thus multiple instructions can be executed
simultaneously. The pipeline is most efficient when the instruction cycle is divided into
segments of equal duration.
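
With k stages of equal duration t, n instructions ideally take (k + n - 1) stage times instead of n*k, so the speedup approaches k for large n. The figures below are illustrative only.

k = 5            # pipeline stages (e.g. fetch, decode, execute, memory, write-back)
t = 2            # nanoseconds per stage
n = 100          # instructions executed

non_pipelined = n * k * t
pipelined = (k + n - 1) * t
print(non_pipelined, "ns without pipelining")                # 1000 ns
print(pipelined, "ns with pipelining")                       # 208 ns
print(round(non_pipelined / pipelined, 2), "times speedup")  # 4.81, close to k = 5
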
Pipeline Conflicts
There are some factors that cause the pipeline to deviate from its normal performance. Some of these
factors are given below:
Timing Variations
Not all stages take the same amount of time. This problem generally occurs in instruction
processing where different instructions have different operand requirements and thus different
processing times.
Data Hazards
When several instructions are in partial execution and they reference the same data, a problem
arises. We must ensure that a later instruction does not attempt to access data before the
current instruction has produced it, because this would lead to incorrect results.
Branching
In order to fetch and execute the next instruction, we must know what that instruction is. If the
present instruction is a conditional branch, and its result will lead us to the next instruction, then
the next instruction may not be known until the current one is processed.
Interrupts
Interrupts inject unplanned instructions into the instruction stream and affect the execution of
instructions.
Data Dependency
It arises when an instruction depends upon the result of a previous instruction but this result is
not yet available.
Advantages of Pipelining


1. The cycle time of the processor is reduced.


2. It increases the throughput of the system
3. It makes the system reliable.

Disadvantages of Pipelining

1. The design of a pipelined processor is complex and costly to manufacture.


2. Instruction latency is higher.

Introduction to instruction-level parallelism (ILP)

Instruction-level parallelism is a measure of how many of the instructions in a computer program
can be executed simultaneously.
There are two approaches to instruction level parallelism:

 Hardware
 Software
The hardware level works on dynamic parallelism, whereas the software level works on static
parallelism. Dynamic parallelism means the processor decides at run time which instructions to
execute in parallel, whereas static parallelism means the compiler decides which instructions to
execute in parallel. The Pentium processor, for example, exploits dynamic parallelism,
while the Itanium processor relies on static parallelism.
Consider the following program:

1. e = a + b
2. f = c + d
3. m = e * f
Operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of
them are completed. However, operations 1 and 2 do not depend on any other operation, so they
can be calculated simultaneously. If we assume that each operation can be completed in one unit
of time then these three instructions can be completed in a total of two units of time, giving an
ILP of 3/2.
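
The schedule implied by this example can be reproduced with a tiny list scheduler: in each time unit it issues every operation whose inputs are already available. The dictionary-based representation is an illustrative choice.

ops = {
    "e": ("a", "b"),   # e = a + b
    "f": ("c", "d"),   # f = c + d
    "m": ("e", "f"),   # m = e * f
}
available = {"a", "b", "c", "d"}   # operands ready before execution starts
remaining = dict(ops)
cycle = 0
while remaining:
    ready = [name for name, deps in remaining.items()
             if all(d in available for d in deps)]
    print("time unit", cycle, ":", ready)   # time unit 0: ['e', 'f'], time unit 1: ['m']
    for name in ready:
        del remaining[name]
        available.add(name)
    cycle += 1
print("ILP =", len(ops), "/", cycle)        # ILP = 3 / 2
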
A goal of compiler and processor designers is to identify and take advantage of as much ILP as
possible. Ordinary programs are typically written under a sequential execution model where
instructions execute one after the other and in the order specified by the programmer. ILP allows
the compiler and the processor to overlap the execution of multiple instructions or even to
change the order in which instructions are executed.
How much ILP exists in programs is very application specific. In certain fields, such as graphics
and scientific computing the amount can be very large. However, workloads such
as cryptography may exhibit much less parallelism.


Micro-architectural techniques that are used to exploit ILP include:

 Instruction pipelining, where the execution of multiple instructions can be partially
overlapped.
 Superscalar execution, VLIW, and the closely related explicitly parallel instruction
computing concepts, in which multiple execution units are used to execute multiple
instructions in parallel.
 Out-of-order execution where instructions execute in any order that does not violate data
dependencies. Note that this technique is independent of both pipelining and superscalar.
Current implementations of out-of-order execution dynamically (i.e., while the program is
executing and without any help from the compiler) extract ILP from ordinary programs. An
alternative is to extract this parallelism at compile time and somehow convey this
information to the hardware. Due to the complexity of scaling the out-of-order execution
technique, the industry has re-examined instruction sets which explicitly encode multiple
independent operations per instruction.
 Register renaming which refers to a technique used to avoid unnecessary serialization of
program operations imposed by the reuse of registers by those operations, used to enable out-
of-order execution.
 Speculative execution, which allows the execution of complete instructions or parts of
instructions before being certain whether this execution should take place. A commonly used
form of speculative execution is control flow speculation where instructions past a control
flow instruction (e.g., a branch) are executed before the target of the control flow instruction
is determined. Several other forms of speculative execution have been proposed and are in
use including speculative execution driven by value prediction, memory dependence
prediction and cache latency prediction.
 Branch prediction which is used to avoid stalling for control dependencies to be resolved.
Branch prediction is used with speculative execution.
ILP is exploited by both the compiler and hardware support, but the compiler also exposes
inherent and implicit ILP in programs to the hardware through compilation optimizations.
Some optimization techniques for extracting available ILP in programs would include
scheduling, register allocation/renaming, and memory access optimization.
