
ECPE13

Computer Architecture and Organization
Dr. P. Maheswaran
Assistant Professor, Dept. of ECE
NIT Trichy

mahes@nitt.edu
Time Table
Slot: C1
Credits: 3
Venue: G9, Orion

Day          Time
Monday       2:30 PM to 3:20 PM
Wednesday    3:20 PM to 4:10 PM
Thursday     4:10 PM to 5:00 PM


Syllabus
Unit 1: Introduction


Function and structure of a computer

Functional components of a computer

Interconnection of components

Performance of a computer
Syllabus
Unit 2: Representation of Instructions


Machine instructions

Memory locations & Addresses, Operands

Addressing modes

Instruction formats, Instruction sets

Instruction set architectures - CISC and RISC architectures

Superscalar Architectures

Fixed point and floating point operations
Syllabus
Unit 3: Basic Processing Unit


Fundamental concepts

ALU, Control unit

Multiple bus organization

Hardwired control, Micro programmed control

Pipelining

Data hazards, Instruction hazards

Influence on instruction sets

Data path and control considerations, Performance considerations
Syllabus
Unit 4: Memory organization


Basic concepts: Semiconductor RAM memories, ROM, Speed - Size and cost

Memory Interfacing circuits

Cache memory, Improving cache performance

Memory management unit

Shared/Distributed Memory

Cache coherency in multiprocessor

Segmentation, Paging

Concept of virtual memory, Address translation

Secondary storage devices.
Syllabus
Unit 5: I/O Organization


Accessing I/O devices

Input/output programming

Interrupts, Exception Handling

Direct Memory Access

Buses

I/O interfaces: Serial port, Parallel port

PCI bus, SCSI bus, USB bus

FireWire and InfiniBand

I/O peripherals.
Function of a computer:


A computer is a complex system.

The hierarchical nature of most complex systems is the key to describing them.

A hierarchical system is a set of interrelated subsystems.

The subsystems are themselves hierarchical in structure until we reach some lowest level of
elementary subsystem.

The hierarchical nature is essential to both design and description.

The designer need only deal with a particular level of the system.

At each level, the system consists of a set of components and their
interrelationships.
Function of a computer:


The behavior at each level depends only on a simplified, abstracted
characterization of the system at the next lower level.

At each level, the designer is concerned with structure and function.
Structure: The way in which the components are interrelated
Function: The operation of each individual component as part of the structure

Description of the computer system:

Top-down: Begin with a top view and decompose the system into
its subparts. This is the clearest and most effective approach.

Bottom-up: Starting at the bottom and building up to a complete description.
Function of a computer:

The basic functions that a computer can perform are shown in
Fig. 1.1.

There are only four functions:

Data processing

Data storage

Data movement

Control

The computer must be able to process data.

The computer may process data on the fly.

The computer must temporarily store at least those pieces of
data that are being processed. A short-term data storage
function is needed.

Fig. 1.1 A Functional View of the Computer
Function of a computer:


The computer may perform a long-term data storage function
also.

Files of data are stored on the computer for subsequent
retrieval and update.

The computer must be able to move data between itself and the
outside world.

Input–Output (I/O): Process when data are received from or
delivered to a device that is directly connected to the computer.

I/O devices are called peripherals.

When data are moved over longer distances, to or from a remote
device, the process is known as data communications.
Function of a computer:


There must be control of store, process, and move functions.

This control is exercised by the individual(s) who provides the
computer with instructions.

A control unit manages the computer’s resources and
orchestrates the performance of its functional parts in response
to those instructions.

Function of a computer:


Fig. 1.2 depicts four possible types of operations.

(a) Data movement: transferring data from one peripheral or communications line to another.

(b) Data storage: data transferred from the external environment to computer storage (read) and
vice versa (write).

(c) Data processing on data in storage.

(d) Data processing en route between storage and the external environment.

Fig. 1.2 Possible Computer Operations
Structure of a computer:


The simplest possible depiction of a computer.

The computer interacts with its external environment through:

Peripheral devices or communication lines.

Fig. 1.3: The Computer


Structure of a computer:

The internal structure of the computer is shown in Fig. 1.4.

The four main structural components are:

Central processing unit (CPU): Controls the operation of
the computer and performs its data processing functions;
often simply referred to as processor.

Main memory: Stores data.

I/O: Moves data between the computer and its external
environment.

System interconnection: Some mechanism that provides
for communication among CPU, main memory, and I/O.

System interconnection is commonly provided by a system bus.

The system bus consists of a number of conducting wires to
which all the other components attach.

Fig. 1.4 The Computer: Top-Level Structure
Structure of a computer:


There may be one or more of each of the aforementioned
components.

Traditionally, there has been just a single processor.

Multiple processors in a single computer are used in recent
times.

The most complex component is the CPU. Its major
structural components are:

Control unit: Controls the operation of the CPU and hence the computer.

Arithmetic and logic unit (ALU): Performs the computer’s data processing functions.

Registers: Provide storage internal to the CPU.

CPU interconnection: Some mechanism that provides for communication among the control unit,
ALU, and registers.
Functional components of a computer:

A computer consists of five functionally independent main parts:

Input

Memory

Arithmetic and logic

Output

Control

The input unit accepts coded information from human operators.

Examples: keyboard, digital communication lines from other computers.

Received information is stored in memory.

Information is processed by arithmetic and logic unit.

Processing steps also stored in memory.

Results are sent back to outside world using output unit.

All of these actions are coordinated by control unit.

An interconnection network provides the means for the functional units to exchange information and
coordinate their actions.
Functional components of a computer:

Processor: arithmetic and logic circuits + control circuits.

Input-output (I/O) unit: I/O equipment.


Information in a computer: Data or Instructions.

Instructions are commands that:

Govern the transfer of information within a computer or between computers.

Specify the arithmetic and logic operations to be performed.


Program: A list of instructions which performs a task.

Processor fetches the program instructions from the memory and performs the desired operations.

Data: Numbers and characters that are used as operands by the instructions.

Each instruction, number, or character is encoded as a string of binary digits called bits.
Functional components of a computer:
Input Unit:

Computers accept coded information through input units.

Some of the input devices are:

Keyboard

Touchpad, Mouse, Joystick, and Trackball.

Microphones

Cameras

Memory Unit:

Stores programs and data.

Two classes of storage: primary and secondary.
Functional components of a computer: Memory unit
Primary Memory: (aka main memory)

A fast memory that operates at electronic speeds.

Programs must be stored in this memory while they are being executed.

Consists of semiconductor storage cells.

Handled in groups of fixed size called words.

Number of bits in each word is word length: 16, 32, 64 bits.

A distinct address is associated with each word location.

Addresses are consecutive numbers, starting from 0.

A particular word is accessed by:

Specifying its address.

Issuing a control command to the memory.

Random access memory (RAM): A memory in which any location can be accessed in a short and fixed
amount of time by specifying its address.

Memory access time: The time required to access one word. 1 to 100 ns for RAMs.
Functional components of a computer: Memory unit
Cache Memory:

Used to hold sections of a program that are currently being executed.

Associated data are also stored.

The cache is tightly coupled with the processor.

Usually contained on the same integrated-circuit chip.

The cache is to facilitate high instruction execution rates.

Instructions are fetched into the processor and a copy is placed in the cache.

Data needed by instructions are also copied from main memory into the cache.

Repeatedly used instructions and data are directly fetched from cache.
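
The benefit of reuse can be illustrated with a small count of cache hits and misses. This is only a sketch: the function name `access_trace` is hypothetical, and the cache is assumed to be large enough that nothing is ever evicted.

```python
def access_trace(addresses):
    """Count (hits, misses) for a sequence of memory addresses.

    Assumption: the cache never evicts anything, so only the first
    access to each address goes to main memory.
    """
    cache = set()
    hits = misses = 0
    for address in addresses:
        if address in cache:
            hits += 1            # served directly from the cache
        else:
            misses += 1          # fetched from main memory, then cached
            cache.add(address)
    return hits, misses

# A three-instruction loop body executed four times.
loop = [100, 101, 102] * 4
print(access_trace(loop))        # (9, 3)
```

Only the first pass through the loop goes to main memory; the remaining passes are served from the cache.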
Functional components of a computer: Memory unit
Secondary Storage:

Primary memory is expensive and does not retain information when power is turned off.

Secondary storage is less expensive and retains information permanently.

Used when large amounts of data and many programs have to be stored.

Data that are accessed infrequently.

Access times for secondary storage are longer than for primary memory.

Secondary storage devices:

Magnetic disks.

Optical disks (CD, DVD).

Flash memory.
Functional components of a computer: ALU

Computer operations are executed in the arithmetic and logic unit (ALU) of the processor.

Arithmetic or logic operation: addition, subtraction, multiplication, division, or comparison of numbers.

Operations are performed by bringing operands into the processor’s ALU.

When two numbers are to be added:

They are brought into the processor from memory.

The addition is carried out by the ALU.

The sum may then be stored in the memory or retained in the processor for immediate use.

Operands brought into the processor are stored in high-speed storage elements called registers.

Each register can store one word of data.

Access times to registers are even shorter than access times to the cache unit on the processor chip.
Functional components of a computer: Control Unit

The memory, arithmetic and logic, and I/O units store and process information and perform input
and output operations.

Their operations need to be coordinated.

The control unit sends control signals to other units and senses their states.

I/O transfers are controlled by program instructions.

They identify the devices involved and the information to be transferred.

Control circuits are responsible for generating the timing signals that:

Govern the transfers.

Determine when a given action is to take place.

The control circuitry is physically distributed throughout the computer.
Functional components of a computer:

A configuration of logic components can be constructed for one
specific computation.

Connecting the various components in the desired
configuration is a form of ‘hardwired’ programming.

With general-purpose hardware, the system accepts data and
control signals.

Only a new set of control signals is needed, instead of rewiring the
hardware for each new program.

Data and instructions must be put into the system.

Input modules are needed.

An output module reports the results.
Functional components of a computer:

I/O devices are sequential.

Programs may require non-sequential execution.

Memory is used for non-sequential data access and
instruction execution.

Von Neumann architecture saves both instructions
and data in same memory.

MAR: specifies the address in memory for the next
read or write.

MBR: contains the data to be written into memory
or receives the data read from memory.

I/O AR: specifies a particular I/O device.

I/O BR: used for the exchange of data between an
I/O module and the CPU.
Computer function: In detail

Computer executes instructions.

Instruction processing consists of two steps:

The processor reads (fetches) instructions from memory one at a time (fetch cycle).

Executes each instruction (execute cycle).

Program execution consists of repeating the process of instruction fetch and instruction execution.

The processing required for a single instruction is called an instruction cycle.

Program execution halts only if:

The machine is turned off.

Unrecoverable error occurs.

A halt instruction is encountered.

Figure 3.3 Basic Instruction Cycle


Instruction Fetch and Execute:

The processor fetches an instruction from memory at the start of instruction cycle.

A register called the program counter (PC) holds the address of the instruction to be fetched next.

The processor increments the PC after each instruction fetch:

In order to fetch the next instruction in sequence.

If same instruction is to be executed, PC is not incremented.

Consider a computer in which each instruction occupies one 16-bit word of memory.

The program counter is set to location 300.

The processor will next fetch the instruction at location 300.

On succeeding instruction cycles, it will fetch instructions from locations 301, 302, 303, and so on.

The fetched instruction is loaded into the instruction register (IR).

Instruction bits specify the action to take.

The processor interprets the instruction and performs the required action.
Instruction Fetch and Execute:

In general, processor actions fall into four categories:

Processor-memory: Data may be transferred from processor to memory or from memory to
processor.

Processor-I/O: Data transferred to or from a peripheral device (I/O module).

Data processing: The processor may perform some arithmetic or logic operation on data.

Control: An instruction may specify that the sequence of execution be altered.

The processor may fetch an instruction from location 149.

It specifies the next instruction is from location 182.

PC is set to 182.

The instruction will be fetched from location 182 rather than 150.

An instruction’s execution may involve a combination of these actions.
Instruction Fetch and Execute:

Consider the characteristics of a hypothetical machine as in figure.

The processor contains a single data register, called an accumulator (AC).

Instructions and data are 16 bits long. Memory is organized using 16-bit words.

The instruction format provides:

4 bits for the opcode: 2^4 = 16 opcodes.

12 bits for the address: 2^12 = 4096 (4K) words of memory can be directly addressed.
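
The field split can be checked with a few bit operations. This is a minimal sketch, assuming the opcode occupies the four most significant bits of the word; the function name `decode` is illustrative.

```python
OPCODE_BITS = 4
ADDRESS_BITS = 12

def decode(word):
    """Split a 16-bit instruction word into (opcode, address)."""
    opcode = (word >> ADDRESS_BITS) & ((1 << OPCODE_BITS) - 1)
    address = word & ((1 << ADDRESS_BITS) - 1)
    return opcode, address

print(decode(0x1940))       # (1, 2368), i.e. opcode 0x1, address 0x940
print(2 ** OPCODE_BITS)     # 16 possible opcodes
print(2 ** ADDRESS_BITS)    # 4096 directly addressable words
```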
Instruction Fetch and Execute:

Figure illustrates a partial program with hexadecimal
notation (in memory and processor registers).

Program adds the contents of the memory word at
address 940 to the contents of the memory word at
address 941.

Stores the result in the 941 location.
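
The three instruction cycles of this program can be traced with a short simulation. This is a sketch under stated assumptions, not the figure itself: the opcode assignments (0x1 = load AC from memory, 0x2 = store AC to memory, 0x5 = add memory word to AC), the program location (hex 300 to 302), and the initial data values 3 and 2 are all assumed for illustration.

```python
def run(memory, pc, steps):
    """Run `steps` instruction cycles; return the final (pc, ac)."""
    ac = 0
    for _ in range(steps):
        ir = memory[pc]              # fetch cycle: IR <- memory[PC]
        pc += 1                      # PC is incremented after each fetch
        opcode, addr = ir >> 12, ir & 0xFFF
        if opcode == 0x1:            # assumed opcode: load AC from memory
            ac = memory[addr]
        elif opcode == 0x2:          # assumed opcode: store AC to memory
            memory[addr] = ac
        elif opcode == 0x5:          # assumed opcode: add memory word to AC
            ac = (ac + memory[addr]) & 0xFFFF
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
    return pc, ac

memory = {
    0x300: 0x1940,   # load AC from 940
    0x301: 0x5941,   # add contents of 941 to AC
    0x302: 0x2941,   # store AC into 941
    0x940: 0x0003,   # assumed data
    0x941: 0x0002,   # assumed data
}
pc, ac = run(memory, pc=0x300, steps=3)
print(hex(pc), ac, memory[0x941])    # 0x303 5 5
```

Each pass through the loop is one instruction cycle: a fetch from the address in the PC followed by the execute step selected by the opcode.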
Instruction Fetch and Execute:

Three instruction cycles are used in the example.

Each consisting of a fetch cycle and an execute cycle.

With a more complex set of instructions, fewer cycles would be needed.

Older processors included instructions that contain more than one memory address.

The execution cycle for a particular instruction on such processors could involve more than one
reference to memory.

Instead of memory references, an instruction may specify an I/O operation.
Instruction Fetch and Execute:

The PDP-11 processor includes an instruction, expressed symbolically as ADD B,A

Stores the sum of the contents of memory locations B and A into memory location A.

A single instruction cycle with the following steps occurs:

Fetch the ADD instruction.

Read the contents of memory location A into the processor.

Read the contents of memory location B into the processor.

At least two registers, rather than a single accumulator, are needed so that the contents of A are not lost when B is read.

Add the two values.

Write the result from the processor to memory location A.
Instruction Fetch and Execute:

A more detailed look at the basic instruction cycle of figure 3.3 is given in figure 3.6.

For any given instruction cycle:

Some states may be null.

Others may be visited more than once.
Instruction address calculation (iac):

Determine the address of the next instruction to be executed.

This involves adding a fixed number to the address of the previous instruction.

If each instruction is 16 bits long and memory is organized into 16-bit words, add 1 to the previous address.

If memory is instead organized as individually addressable 8-bit bytes, add 2 to the previous address.
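
The increment is simply the instruction length divided by the size of one addressable unit. This is a minimal sketch; the function and parameter names are illustrative, and the instruction is assumed to span a whole number of units.

```python
def next_instruction_address(prev_address, instruction_bits, unit_bits):
    """Address of the next sequential instruction."""
    if instruction_bits % unit_bits != 0:
        raise ValueError("instruction must span a whole number of units")
    return prev_address + instruction_bits // unit_bits

print(next_instruction_address(300, 16, 16))   # 301 (16-bit words)
print(next_instruction_address(300, 16, 8))    # 302 (8-bit bytes)
```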
Instruction Fetch and Execute:

Instruction fetch (if): Read the instruction from its memory location into the processor.

Instruction operation decoding (iod): Analyze the instruction to determine the type of operation to be
performed and the operand(s) to be used.

Operand address calculation (oac): If the operation involves reference to an operand in memory or
available via I/O, determine the address of the operand.

Operand fetch (of): Fetch the operand from memory or read it in from I/O.

Data operation (do): Perform the operation indicated in the instruction.

Operand store (os): Write the result into memory or out to I/O.
Instruction Fetch and Execute:

States in the upper part of figure:

Involve an exchange between the processor and either
memory or an I/O module.

States in the lower part of the figure:

Involve only internal processor operations.


The oac state appears twice: an instruction may involve a read, a write, or both.

The diagram allows for multiple operands and multiple results (some machines require this).

The PDP-11 instruction ADD B,A results in the following sequence of states:

iac, if, iod, oac, of, oac, of, do, oac, os.

On some machines, a single instruction can specify an operation to be performed on a vector:

One-dimensional array of numbers

A string (one-dimensional array) of characters

This would involve repetitive operand fetch and/or store operations.
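
The state sequence for an instruction can be generated from the number of operands it reads and the number of results it writes. This is a minimal sketch, assuming every operand and result lives in memory or I/O (so each one needs its own oac state); the function name is illustrative.

```python
def instruction_states(num_source_operands, num_results):
    """List the instruction cycle states visited by one instruction."""
    states = ["iac", "if", "iod"]
    for _ in range(num_source_operands):
        states += ["oac", "of"]      # address calculation, then fetch
    states.append("do")
    for _ in range(num_results):
        states += ["oac", "os"]      # address calculation, then store
    return states

# Two memory operands and one memory result, as in the PDP-11 ADD example.
print(", ".join(instruction_states(2, 1)))
# iac, if, iod, oac, of, oac, of, do, oac, os
```

A vector or string instruction corresponds to larger operand and result counts, which is why the operand states are visited repeatedly.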
Interrupts:

This is a mechanism by which other modules (I/O, memory) may interrupt the normal processing of the
processor.

The most common classes of interrupts are shown in the table.

Interrupts are provided primarily as a way to improve processing efficiency.

Most external devices are much slower than the processor.

Suppose the processor is transferring data to a printer using the instruction cycle scheme of Fig. 3.3.

After each write operation, the processor must pause and remain idle until the printer catches up.

The length of this pause may be on the order of many hundreds or even thousands of instruction
cycles that do not involve memory.

This is a very wasteful use of the processor.
Interrupts:

The user program performs a series of WRITE calls interleaved with
processing.

Code segments 1, 2, and 3 refer to sequences of instructions that do not
involve I/O.

The WRITE calls are to an I/O program that is a system utility.

System utility will perform the actual I/O operation.

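The I/O program consists of three sections:

Preparation code for the actual I/O operation.

The actual I/O command.

Completion code, which may include setting a flag indicating the success or failure of the operation.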

Program flow of control without and with interrupts: (a) No interrupts
Interrupts and instruction cycle:

With interrupts, the processor can be engaged in executing other instructions
while an I/O operation is in progress.

The user program makes a system call in the form of a WRITE call.

The I/O program invoked in this case consists only of the preparation code
and the actual I/O command.

After these few instructions have been executed, control returns to the user program.

Meanwhile, the external device is busy accepting data from computer
memory and printing it.

I/O operation is conducted concurrently with the execution of instructions in
the user program.

Program flow of control without and with interrupts: (b) Interrupts, short I/O waits


Interrupts and instruction cycle:

When the external device is ready to accept more data from the processor,
the I/O module for that external device sends an interrupt request signal to
the processor.

The processor responds by:

Suspending operation of the current program.

Branching off to interrupt handler to service that particular I/O device.

Resuming the original execution after the device is serviced.

Interrupts and instruction cycle:

An interrupt from the point of view of user program:

An interruption of the normal sequence of execution.

When the interrupt processing is completed, execution resumes.

The user program does not have to contain any special code to accommodate interrupts.

The processor and the operating system are responsible for:

Suspending the user program.

Resuming it at the same point after the interrupt has been processed.
Interrupts and instruction cycle:

To accommodate interrupts, an interrupt cycle is added to the instruction cycle.

In the interrupt cycle, the processor checks to see if any interrupts have occurred.

This is indicated by the presence of an interrupt signal.

If no interrupts are pending:

The processor proceeds to the fetch cycle and fetches the next instruction of the current program.

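If an interrupt is pending, the processor does the following:

It suspends execution of the current program and saves its context (the address of the next
instruction to be executed and any other relevant data).

It sets the program counter to the starting address of an interrupt handler routine.

The processor now proceeds to the fetch cycle and fetches the first instruction in the interrupt
handler program.

This will service the interrupt.

The interrupt handler program is generally part of the operating system.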
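
The instruction cycle with an added interrupt cycle can be sketched as a loop that checks for a pending request after every instruction. This is a simplified illustration under stated assumptions: the instruction labels and the `interrupt_after` parameter are hypothetical, the saved context is just the PC, and the handler is modeled as running to completion with further interrupts disabled.

```python
def run(program, handler, interrupt_after):
    """Return the order in which instructions execute.

    program, handler: lists of instruction labels (illustrative).
    interrupt_after: number of user instructions executed before a
    device raises an interrupt request.
    """
    trace = []
    pc = 0
    executed = 0
    interrupt_pending = False
    while pc < len(program):
        # Fetch cycle and execute cycle.
        trace.append(program[pc])
        pc += 1
        executed += 1
        if executed == interrupt_after:
            interrupt_pending = True     # request arrives during execution
        # Interrupt cycle: check for a pending request.
        if interrupt_pending:
            saved_pc = pc                # save context of the user program
            interrupt_pending = False
            for instruction in handler:  # handler runs to completion
                trace.append(instruction)
            pc = saved_pc                # restore context and resume
    return trace

print(run(["u1", "u2", "u3"], ["h1", "h2"], interrupt_after=2))
# ['u1', 'u2', 'h1', 'h2', 'u3']
```

The user program contains no interrupt-related code; the check and the context save and restore happen outside it, between instructions.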
Interrupts and instruction cycle:

The user program reaches the second WRITE call.

The I/O operation spawned by the first call is incomplete.

The user program is hung up at that point.

The new WRITE call can be processed only after the preceding I/O operation is completed.

Part of the I/O operation still overlaps with the execution of user instructions.

There is still a gain in efficiency.
Interrupts and instruction cycle:

A revised instruction cycle state diagram that includes interrupt cycle processing is shown in Fig. 3.12.
Multiple interrupts:

Multiple interrupts can occur in a system.

A program may be receiving data from a communications line and printing results.

The printer will generate an interrupt every time that it completes a print operation.

The communication line controller will generate an interrupt every time a unit of data arrives.

It is possible for a communications interrupt to occur while a printer interrupt is being processed.

Two approaches to deal with multiple interrupts:

The first approach is to disable interrupts while an interrupt is being processed. The processor can
and will ignore any new interrupt request signal.

If an interrupt occurs during this time, it remains pending. The processor checks for it after
interrupts have been re-enabled.

Second approach is to define priorities for interrupts.
Multiple interrupts:

When a user program is executing and an interrupt occurs, interrupts are disabled immediately.

After the interrupt handler routine completes, interrupts are enabled before resuming the user
program.

The processor checks to see if additional interrupts have occurred.

This approach does not take into account relative priority or time-critical needs.

Example:
When input arrives from the communications line,
it may need to be absorbed rapidly to make room for more input.
If the first batch of input has not been processed before
the second batch arrives, data may be lost.
Multiple interrupts:

Second approach is to define priorities for interrupts.

This allows an interrupt of higher priority to cause a lower-priority interrupt handler to be itself
interrupted (Fig. 3.13 (b)).

Consider a system with three I/O devices:

A printer (priority 2), a disk (priority 4), and a communications line (priority 5); a larger number means a higher priority.
Multiple interrupts:

A user program begins at t = 0.

At t = 10, a printer interrupt occurs.

User information is placed on the system stack and execution continues at the printer interrupt
service routine (ISR).

While this ISR is still executing, at t = 15, a communications interrupt occurs.

This interrupt is honored because the communications line has a higher priority (5) than the printer (2).

The printer ISR is interrupted.

Its state is pushed onto the stack.

Execution continues at the communications ISR.

A disk interrupt occurs at t = 20.

This interrupt is held because the disk has a lower priority (4) than the communications line (5).

The communications ISR runs to completion.
Multiple interrupts:

The communications ISR is complete (t = 25).

The previous processor state (printer ISR) is restored.

Before executing any instruction in printer ISR (P2), processor honors disk interrupt (P4).

Control transfers to the disk ISR.

When disk ISR is complete at t = 35, printer ISR is resumed.

When printer ISR completes at t = 40, control finally returns to the user program.
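
The time sequence above can be reproduced with a small simulation of nested, prioritized interrupts. This is a sketch under stated assumptions: printer, disk, and communications line have priorities 2, 4, and 5 (larger means higher), every ISR needs 10 time units of processor time (an assumption chosen to match the times quoted above), and the names `simulate` and `service_time` are illustrative.

```python
def simulate(arrivals, service_time=10, end=45):
    """Return [(time, what_is_running)] at each change of control.

    arrivals: list of (time, name, priority); larger priority wins.
    """
    stack = []        # interrupted ISRs, each as [name, priority, remaining]
    pending = []      # interrupts that are being held
    current = None    # running ISR, or None for the user program
    timeline = []
    for t in range(end):
        # An ISR that has finished returns to whatever it interrupted.
        if current is not None and current[2] == 0:
            current = stack.pop() if stack else None
        # Register interrupt requests arriving at this time.
        for at, name, priority in arrivals:
            if at == t:
                pending.append([name, priority, service_time])
        # Honor the best pending request if it outranks what is running.
        if pending:
            best = max(pending, key=lambda p: p[1])
            current_priority = current[1] if current is not None else 0
            if best[1] > current_priority:
                pending.remove(best)
                if current is not None:
                    stack.append(current)    # push interrupted ISR state
                current = best
        running = current[0] if current is not None else "user"
        if not timeline or timeline[-1][1] != running:
            timeline.append((t, running))
        if current is not None:
            current[2] -= 1                  # one time unit of ISR work
    return timeline

events = [(10, "printer", 2), (15, "comm", 5), (20, "disk", 4)]
for t, who in simulate(events):
    print(t, who)
# 0 user
# 10 printer
# 15 comm
# 25 disk
# 35 printer
# 40 user
```

The disk request at t = 20 is held while the communications ISR runs, and it is honored at t = 25 before the printer ISR resumes because it outranks the printer.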
I/O Function:

So far, the operation of the computer as controlled by the processor has been discussed.

The interaction between processor and memory has also been examined.

An I/O module (e.g., a disk controller) can exchange data directly with the processor.

The processor can read from or write to memory by specifying the address of a particular location.

The processor can also read data from or write data to an I/O module.

The processor identifies a specific device that is controlled by a particular I/O module.

I/O instructions rather than memory-referencing instructions are executed in this mode.

It is desirable to allow I/O exchanges to occur directly with memory.

The processor grants to an I/O module the authority to read from or write to memory.

That I/O-memory transfer can occur without tying up the processor.

During such a transfer, the I/O module issues read or write commands to memory.

Relieving the processor of responsibility for the exchange.

This operation is known as direct memory access (DMA).
Interconnection Structures:

A computer consists of a set of components or modules of three basic types:

Processor, memory, I/O.

The set of components communicate with each other.

A computer is a network of basic modules.

There must be paths for connecting the modules.

Interconnection Structure: The collection of paths connecting the various modules.

Fig. 3.15 suggests the types of exchanges that are needed by indicating the major forms of input and
output for each module type.

The wide arrows represent multiple signal lines carrying multiple bits of information in parallel.

Each narrow arrow represents a single signal line.
Interconnection Structures:
Memory:

A memory module will consist of N words of equal length.

Each word is assigned a unique numerical address (0, 1, . . . , N – 1).

A word of data can be read from or written into the memory.

The nature of the operation is indicated by read and write control signals.

The location for the operation is specified by an address.
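
The memory module's interface (address, data, and read/write control signals) can be modeled in a few lines. This is an illustrative sketch only: the class name, method name, and 16-bit default word length are assumptions.

```python
class MemoryModule:
    """N words of equal length, addressed 0 .. N-1."""

    def __init__(self, n_words, word_bits=16):
        self.words = [0] * n_words
        self.mask = (1 << word_bits) - 1

    def access(self, address, read=False, write=False, data=None):
        """One memory operation; exactly one control signal must be set."""
        if read == write:
            raise ValueError("assert exactly one of read or write")
        if not 0 <= address < len(self.words):
            raise IndexError("address outside 0 .. N-1")
        if write:
            if data is None:
                raise ValueError("write needs a data word")
            self.words[address] = data & self.mask
            return None
        return self.words[address]

mem = MemoryModule(n_words=8)
mem.access(3, write=True, data=0x1234)
print(hex(mem.access(3, read=True)))   # 0x1234
```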
I/O module:

From an internal (to the computer system) point of view, I/O is functionally similar to memory.

There are two operations, read and write.

An I/O module may control more than one external device.

Each of the interfaces to an external device is referred to as a port.

Each port is given a unique address (e.g., 0, 1, . . . , M – 1).

There are external data paths for the input and output of data with an external device.

An I/O module may be able to send interrupt signals to the processor.
Interconnection Structures:
Processor:

The processor reads in instructions and data.

Writes out data after processing.

Uses control signals to control the overall operation of the system.

It also receives interrupt signals.
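The interconnection structure must support the following types of transfers:

Memory to processor: The processor reads an instruction or a unit of data from memory.

Processor to memory: The processor writes a unit of data to memory.

I/O to processor: The processor reads data from an I/O device via an I/O module.

Processor to I/O: The processor sends data to the I/O device.

I/O to or from memory: An I/O module exchanges data directly with memory using DMA.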

The most common interconnection structure is the bus interconnection.


Bus Interconnection:
 A bus is a communication pathway connecting two or more devices.

A bus is a shared transmission medium.

Multiple devices connect to the bus.

A signal transmitted by any one device is available for reception by all other devices attached to the bus.

If two devices transmit during the same time period, their signals will overlap and become garbled.

Only one device at a time can successfully transmit.

A bus consists of multiple communication pathways, or lines.

Each line is capable of transmitting signals representing binary 1 and binary 0.

Several lines of a bus can be used to transmit binary digits simultaneously (in parallel).

An 8-bit unit of data can be transmitted over eight bus lines.

Computer systems contain a number of different buses.

Each provides pathways between components at various levels.

System bus: connects major computer components (processor, memory, I/O).
Bus Interconnection: Bus Structure
 A system bus consists of about 50 to hundreds of separate lines.

Each line is assigned a particular meaning or function.

The lines can be classified into three functional groups: data, address, and control lines.

In addition, there may be power distribution lines that supply power to the attached modules.
 The data lines provide a path for moving data among system modules.

These lines are collectively called the data bus. It may consist of 32, 64, 128, or even more separate lines.

The number of lines is referred to as the width of the data bus.

Each line can carry only 1 bit at a time, so the number of lines determines how many bits can be transferred at a time.

The width of the data bus is a key factor in determining overall system performance.

E.g., if the data bus is 32 bits wide and each instruction is 64 bits long, the processor must access the memory module twice during each instruction cycle.
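
The number of memory accesses needed to fetch one instruction follows from a ceiling division. This is a minimal sketch, assuming each access transfers exactly one bus width of data; the function name is illustrative.

```python
import math

def memory_accesses_per_instruction(instruction_bits, data_bus_bits):
    """Bus transfers needed to fetch one instruction."""
    return math.ceil(instruction_bits / data_bus_bits)

print(memory_accesses_per_instruction(64, 32))   # 2
print(memory_accesses_per_instruction(64, 64))   # 1
```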
Bus Interconnection: Bus Structure
 The address lines are used to designate the source or destination of the data on the data bus.

The processor wishes to read a word (8, 16, or 32 bits) of data from memory.

It puts the address of the desired word on the address lines.

The width of the address bus determines the maximum possible memory capacity of the system.
 The address lines are generally also used to address I/O ports.

The higher-order bits are used to select a particular module on the bus.

The lower-order bits select a memory location or I/O port within the module.

Eg. On an 8-bit address bus:

Address 01111111 and below: reference locations in a memory module (module 0) with 128 words of
memory.

Address 10000000 and above: refer to devices attached to an I/O module (module 1).
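
The 8-bit example can be expressed as a small decoder: the high-order bit selects the module and the remaining seven bits select the location or port within it. This is a sketch of this particular example only; the function name and the returned labels are illustrative.

```python
def decode_address(address):
    """Decode an 8-bit bus address into (module, offset)."""
    if not 0 <= address <= 0xFF:
        raise ValueError("address must fit in 8 bits")
    module = address >> 7          # high-order bit: 0 = memory, 1 = I/O
    offset = address & 0x7F        # low-order 7 bits: location or port
    return ("memory" if module == 0 else "io"), offset

print(decode_address(0b01111111))  # ('memory', 127)
print(decode_address(0b10000000))  # ('io', 0)
print(2 ** 8)                      # 256 addresses on an 8-bit address bus
```

In general, an n-bit address bus can distinguish 2^n addresses, which is why its width bounds the memory capacity of the system.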
Bus Interconnection: Bus Structure
 The control lines are used to control the access to and the use of the data and address lines.

The data and address lines are shared by all components. There must be a means of controlling their
use.

Control signals transmit both command and timing information among system modules.

Timing signals indicate the validity of data and address information.

Command signals specify operations to be performed.
 Typical control lines include:
Bus Interconnection: Bus Operation
 If one module wishes to send data to another, it must do two things:

Obtain the use of the bus – gain access to it.

Transfer data via the bus.
 If one module wishes to request data from another module:

Obtain the use of the bus.

Transfer a request to the other module over the appropriate control and address lines.

Wait for that second module to send the data.
Bus Interconnection: Bus Structure
 Physically, the system bus is actually a number of parallel electrical conductors.
 In the classic bus arrangement, these conductors are metal lines etched in a card or board (printed circuit
board).
 The bus extends across all of the system components, each of which taps into some or all of the bus lines.
 A bus with two vertical columns of conductors is shown.
 Attachment points in the form of slots are provided at regular intervals; each slot supports a printed
circuit board (PCB).
 Each major system component occupies one or more boards and plugs
into the bus at these slots.
Bus Interconnection: Bus Structure
 Modern systems tend to have all of the major components on the same board.

With more elements on the same chip as the processor.

An on-chip bus may connect the processor and cache memory.

An on-board bus may connect the processor to main memory and other components.
 A small computer system may be acquired and then expanded later.

More memory, more I/O by adding more boards.
 If a component on a board fails, that board can easily be removed and replaced.
Bus Interconnection: Multiple-Bus Hierarchies
 If the number of devices connected to the bus increases, performance will suffer.

The more devices attached to the bus, the greater the bus length and hence the greater the
propagation delay.

Delay determines the time it takes for devices to coordinate the use of the bus.

Propagation delays can noticeably affect performance when control of the bus passes from one device to
another frequently.

The bus may become a bottleneck as the aggregate data transfer demand approaches the
capacity of the bus.

Problem can be countered by increasing the data rate the bus can carry and by using wider buses.

Increasing the data bus from 32 to 64 bits.

Because the data rates generated by attached devices (e.g., graphics and video controllers, network interfaces)
are growing rapidly, a single bus is ultimately destined to lose this race.
 Most computer systems use multiple buses, generally laid out in a hierarchy.
Bus Interconnection: Multiple-Bus Hierarchies
 A local bus that connects the processor to a cache memory.

May support one or more local devices.

The cache memory controller connects the cache to:

The local bus, and a system bus to which all of the main memory modules are attached.

The use of a cache structure insulates the processor from a requirement to access main memory frequently.

Main memory can be moved off of the local bus onto a system bus.

I/O transfers to and from the main memory across the system bus do not interfere with the processor’s activity.
 It is possible to connect I/O controllers directly onto the system bus.

A more efficient solution is to use one or more expansion buses for this purpose.

An expansion bus interface buffers data transfers between:

The system bus and the I/O controllers on the expansion bus.

This allows the system to support a wide variety of I/O devices.

At the same time insulate memory-to-processor traffic from I/O traffic.
Bus Interconnection: Multiple-Bus Hierarchies
 Traditional bus architecture is reasonably efficient.
 Begins to break down as higher and higher performance is seen in the I/O devices.
 A common approach taken by industry is to build a high-speed bus that is closely integrated with the rest of the
system.
 This requires only a bridge between the processor’s bus and the high-speed bus.
 This arrangement is sometimes known as a mezzanine architecture.
 A local bus connects the processor to a cache controller.
 The cache controller is connected to a system bus that supports main memory.
 The cache controller is integrated into a bridge, or buffering device.

Connects to the high-speed bus.
 Lower-speed devices are supported by an expansion bus.

Interface for buffering traffic between:

The expansion bus and the high-speed bus.
The high-speed bus brings high-demand devices into
closer integration with the processor while remaining independent of the
processor. Differences in processor and high-speed bus speeds and signal line definitions are tolerated.
Bus Interconnection: Peripheral Component Interconnect (PCI)
 PCI is a popular high-bandwidth, processor-independent bus that can function as a mezzanine or peripheral
bus.
 The current standard allows the use of up to 64 data lines at 66 MHz.

A raw transfer rate of 528 MByte/s, or 4.224 Gbps.
 PCI is designed to meet economically the I/O requirements of modern systems.

It requires very few chips to implement.

Supports other buses attached to the PCI bus.
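As a quick check of the PCI figures above, the raw rate is simply bus width times clock rate. The sketch below assumes one transfer per clock cycle on every data line; `raw_transfer_rate` is a hypothetical helper name.

```python
def raw_transfer_rate(data_lines: int, clock_hz: float) -> tuple[float, float]:
    """Return (MByte/s, Gbps), assuming one bit per line per clock cycle."""
    bits_per_second = data_lines * clock_hz
    mbytes_per_second = bits_per_second / 8 / 1e6
    gbits_per_second = bits_per_second / 1e9
    return mbytes_per_second, gbits_per_second


mbytes, gbps = raw_transfer_rate(64, 66e6)
print(f"{mbytes:.0f} MByte/s, {gbps:.3f} Gbps")
```

Calling the same helper with 32 data lines shows why widening a bus from 32 to 64 bits doubles the raw rate at a fixed clock.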
Performance of a computer: The System Clock
 A system clock governs:

Fetching an instruction, decoding the instruction, performing an arithmetic operation, etc.
 All operations begin with the pulse of the clock.

The speed of a processor is dictated by the pulse frequency produced by the clock.

Measured in cycles per second, or Hertz (Hz).
 A 1-GHz processor receives 1 billion pulses per second.

The rate of pulses is known as the clock rate, or clock speed.

One increment, or pulse, of the clock is referred to as a clock cycle, or a clock tick.
 The clock rate must be appropriate for the physical layout of the processor.

Actions in the processor require signals to be sent from one processor element to another.

When a signal is placed on a line, it takes a finite amount of time for the voltage levels to settle down so that
an accurate value (1 or 0) is available.
Performance of a computer: Instruction Execution Rate
 A processor is driven by a clock with a constant frequency f or, equivalently, a constant cycle time τ = 1/f.
 The instruction count, Ic:

The number of machine instructions executed for that program until:

It runs to completion, or for some defined time interval.
 An important parameter is the average cycles per instruction (CPI) for a program.
 If all instructions required the same number of clock cycles, then CPI would be a constant value for a
processor.
 But, the number of clock cycles required varies for different types of instructions.

 Let CPI_i be the number of cycles required for instruction type i.

 Let I_i be the number of executed instructions of type i for a given program.

 The overall CPI is:

$CPI = \frac{\sum_{i=1}^{n} (CPI_i \times I_i)}{I_c}$
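The overall CPI is a weighted average: total clock cycles divided by total instructions executed. A minimal Python sketch follows; the function name `overall_cpi` and the two-type instruction mix are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def overall_cpi(mix: dict[str, tuple[float, int]]) -> float:
    """mix maps an instruction type to (CPI_i, I_i)."""
    total_cycles = sum(cpi_i * count_i for cpi_i, count_i in mix.values())
    instruction_count = sum(count_i for _, count_i in mix.values())  # I_c
    return total_cycles / instruction_count


# Hypothetical program: three 1-cycle instructions and one 5-cycle instruction.
example_mix = {"alu": (1, 3), "load": (5, 1)}
print(overall_cpi(example_mix))  # (1*3 + 5*1) / 4 = 2.0
```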
Performance of a computer: Instruction Execution Rate
 The processor time T needed to execute a given program can be expressed as:

$T = I_c \times CPI \times \tau$
 During the execution of an instruction:

Part of the work is done by the processor.

Part of the time a word is being transferred to or from memory.

The time to transfer depends on the memory cycle time, which may be greater than the processor
cycle time.
 The time equation is modified as:

$T = I_c \times [p + (m \times k)] \times \tau$

p - the number of processor cycles needed to decode and execute the instruction.

m - the number of memory references needed.

k - the ratio between memory cycle time and processor cycle time.
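The refined time model can be evaluated directly from the five factors Ic, p, m, k and τ. The sketch below assumes the usual form T = Ic × [p + (m × k)] × τ with τ = 1/f; the function name and the sample numbers are hypothetical.

```python
def processor_time(ic: int, p: float, m: float, k: float, clock_hz: float) -> float:
    """Execution time in seconds for ic instructions.

    p: processor cycles to decode and execute one instruction
    m: memory references per instruction
    k: ratio of memory cycle time to processor cycle time
    """
    tau = 1.0 / clock_hz                  # cycle time
    cycles_per_instruction = p + m * k    # effective CPI
    return ic * cycles_per_instruction * tau


# Hypothetical values: 1 million instructions on a 1-GHz processor.
t = processor_time(ic=1_000_000, p=2, m=1, k=4, clock_hz=1e9)
print(f"{t * 1e3:.1f} ms")
```

Setting m to 0 reduces the expression to Ic × p × τ, which is the simpler model with CPI equal to p.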
Performance of a computer: Instruction Execution Rate
 The five performance factors (Ic, p, m, k, τ) are influenced by four system attributes:

The design of the instruction set (known as instruction set architecture).

Compiler technology (how effective the compiler is in producing an efficient machine language program
from a high-level language program)

Processor implementation.

Cache and memory hierarchy.
 The rate at which instructions are executed is expressed as millions of instructions per second (MIPS):

Referred to as the MIPS rate.
 The MIPS rate in terms of the clock rate and CPI:

$\text{MIPS rate} = \frac{I_c}{T \times 10^6} = \frac{f}{CPI \times 10^6}$
Performance of a computer: Instruction Execution Rate
 Consider the execution of a program:

Execution of 2 million instructions on 400-MHz processor.

The program consists of four major types of instructions.
 The instruction mix and the CPI for each instruction type are given:
 The average CPI when the program is executed on a uniprocessor is

 The corresponding MIPS rate is


 Another common performance measure deals only with floating-point instructions.

Common in many scientific and game applications

Expressed as millions of floating-point operations per second (MFLOPS)
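The 2-million-instruction, 400-MHz example can be worked through in Python. The instruction mix below is an assumption made for illustration, since the table of instruction types is not reproduced in the text here; only the instruction count and clock rate come from the example itself.

```python
CLOCK_HZ = 400e6               # 400-MHz processor (from the example)
INSTRUCTION_COUNT = 2_000_000  # 2 million instructions (from the example)

# ASSUMED mix: instruction type -> (CPI_i, fraction of executed instructions)
mix = {
    "arithmetic and logic": (1, 0.60),
    "load/store, cache hit": (2, 0.18),
    "branch": (4, 0.12),
    "memory reference, cache miss": (8, 0.10),
}

average_cpi = sum(cpi * fraction for cpi, fraction in mix.values())
mips_rate = CLOCK_HZ / (average_cpi * 1e6)
exec_time = INSTRUCTION_COUNT * average_cpi / CLOCK_HZ  # T = Ic * CPI * tau

print(f"CPI = {average_cpi:.2f}, MIPS rate = {mips_rate:.1f}, T = {exec_time * 1e3:.1f} ms")
```

With this assumed mix the average CPI comes to 2.24 and the MIPS rate to roughly 178.6; a different mix would of course give different numbers.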
Performance of a computer: Benchmarks
 MIPS and MFLOPS are inadequate for evaluating the performance of processors.
 The instruction execution rate is not a valid means of comparing the performance.

Due to the differences in instruction set architectures.
 Consider this high-level language statement:
 With a complex instruction set computer (CISC), this statement can be compiled into one processor
instruction.
 On a RISC machine, the compilation would look like:

 Both machines may execute the original high-level language instruction in about the same time.
 In this example, if the CISC machine is rated at 1 MIPS, the RISC machine would be rated at 4 MIPS.

But both do the same amount of high-level language work in the same amount of time.
 The performance of a processor on a program may not be useful in determining its performance on a very
different type of application.
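The CISC versus RISC comparison can be made concrete with a small Python sketch. Both the high-level statement (taken here to be A = B + C with all operands in memory) and the instruction sequences are assumptions for illustration, as is the rate of one million statements per second; the mnemonics are pseudo-assembly, not a real instruction set.

```python
# ASSUMPTION: the statement is A = B + C, with all operands in main memory.
cisc_program = ["add mem(B), mem(C), mem(A)"]
risc_program = [
    "load mem(B), reg(1)",
    "load mem(C), reg(2)",
    "add reg(1), reg(2), reg(3)",
    "store reg(3), mem(A)",
]

# ASSUMPTION: both machines complete one million such statements per second.
STATEMENTS_PER_SECOND = 1_000_000


def mips_rating(program: list[str]) -> float:
    """Machine instructions executed per second, in millions."""
    return len(program) * STATEMENTS_PER_SECOND / 1e6


print(mips_rating(cisc_program), mips_rating(risc_program))
```

The two ratings differ by a factor of four even though both machines finish the same high-level work in the same time, which is why the raw instruction rate is a poor basis for comparison.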
Performance of a computer: Benchmarks
 A set of benchmark programs were developed for this purpose.

The same set of programs can be run on different machines and the execution times compared.
 The desirable characteristics of a benchmark program are:

It is written in a high-level language, making it portable across different machines.

It is representative of a particular kind of programming style, such as systems programming, numerical
programming, or commercial programming.

It can be measured easily.

It has wide distribution.
 A benchmark suite is a collection of programs.

Defined in a high-level language.

These programs together attempt to provide a representative test of a computer in a particular application
or system programming area.
 The best known collection of benchmark suites is defined and maintained by the Standard Performance
Evaluation Corporation (SPEC).
Performance of a computer: Benchmarks
 SPEC CPU2006 is for processor-intensive applications.

Appropriate for measuring performance for applications that spend most of their time doing computation
rather than I/O.

Consists of 17 floating-point programs written in C, C++, and Fortran.

12 integer programs written in C and C++.

Contains over 3 million lines of code.
 Older SPEC CPU: SPEC CPU2000, SPEC CPU95, SPEC CPU92, and SPEC CPU89
 Other SPEC suites:
Performance of a computer: Averaging Results
 Run a number of different benchmark programs on a machine and then average the results.
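 With m different benchmark programs, the arithmetic mean is:

$R_A = \frac{1}{m} \sum_{i=1}^{m} R_i$

 R_i is the high-level language instruction execution rate for the ith benchmark program.
 An alternative is to take the harmonic mean:

$R_H = \frac{m}{\sum_{i=1}^{m} \frac{1}{R_i}}$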

 The user is concerned with the execution time of a system, not its execution rate.

Arithmetic mean result is proportional to the sum of the inverses of execution times.

Not inversely proportional to the sum of execution times.
 The harmonic mean instruction rate is the inverse of the average execution time.
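The difference between the two means shows up clearly with a small numeric example. The sketch below assumes every benchmark executes the same number of instructions (the condition under which the harmonic mean rate corresponds exactly to the average execution time); the rates and instruction count are hypothetical.

```python
def arithmetic_mean(rates: list[float]) -> float:
    return sum(rates) / len(rates)


def harmonic_mean(rates: list[float]) -> float:
    return len(rates) / sum(1.0 / r for r in rates)


# ASSUMPTION: each benchmark executes 100 million instructions.
INSTRUCTIONS = 100e6
rates_mips = [10.0, 40.0]  # hypothetical rates for two benchmarks

times = [INSTRUCTIONS / (r * 1e6) for r in rates_mips]  # seconds per benchmark
average_time = sum(times) / len(times)
rate_from_average_time = INSTRUCTIONS / (average_time * 1e6)

print(arithmetic_mean(rates_mips))   # overstates the sustained rate
print(harmonic_mean(rates_mips))     # matches the rate implied by average time
print(rate_from_average_time)
```

Here the slow benchmark dominates the total run time, so the arithmetic mean (25 MIPS) is well above the rate actually sustained over both programs (16 MIPS).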
Performance of a computer: Averaging Results
 Two fundamental metrics are of interest from SPEC:

Speed metric: Measures the ability of a computer to complete a single task.

Rate metric: Measures the throughput or rate of a machine carrying out a number of tasks.
 SPEC defines a base runtime for each benchmark program using a reference machine.

Results are reported as the ratio of the reference run time to the system run time:

$r_i = Tref_i / Tsut_i$

Tref_i is the execution time of benchmark program i on the reference system.

Tsut_i is the execution time of benchmark program i on the system under test.
 Example: Sun Blade 6250

SPEC CPU2006 integer benchmark is 464.h264ref

Tsut_i = 934 sec, Tref_i = 22136 sec, so the ratio is 22136/934 = 23.7
 The overall performance measure for the system under test is the average of the ratios for all 12 integer
benchmarks.
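The ratio for a single benchmark is a one-line calculation. The sketch below reproduces the 464.h264ref figures quoted for the Sun Blade 6250; `spec_ratio` is a hypothetical helper name.

```python
def spec_ratio(t_ref: float, t_sut: float) -> float:
    """Reference run time divided by run time on the system under test."""
    return t_ref / t_sut


ratio = spec_ratio(t_ref=22136, t_sut=934)
print(round(ratio, 1))
```

A ratio above 1 means the system under test finished the benchmark faster than the reference machine.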
Performance of a computer: Averaging Results
 SPEC specifies the use of a geometric mean:

$r_G = \left( \prod_{i=1}^{n} r_i \right)^{1/n}$

r_i is the ratio for the ith benchmark program.
 For the Sun Blade 6250:

 The speed metric is:
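The geometric mean of a set of ratios can be computed as follows. This is a sketch with hypothetical ratios, not the measured Sun Blade 6250 results, which are not listed in the text here; the function name is also an assumption.

```python
import math


def geometric_mean(ratios: list[float]) -> float:
    """n-th root of the product of n positive ratios.

    Averaging the logarithms avoids overflow or underflow when
    many ratios are multiplied together.
    """
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


hypothetical_ratios = [2.0, 8.0, 4.0]  # product is 64, cube root is 4
print(round(geometric_mean(hypothetical_ratios), 6))
```

Unlike the arithmetic mean, the geometric mean is not dominated by a single unusually large ratio, which suits a summary over programs with very different speedups.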
