Professional Documents
Culture Documents
Pipelining is a technique of decomposing a sequential process into class of techniques that are used to provide simultaneous data-processing
suboperations, with each subprocess being executed in a special dedicated tasks for the purpose of increasing the computational speed of a computer
segment that operates concurrently with all other segments. The system. The purpose of parallel processing is to speed up the computer
overlapping of computation is made possible by associating a register with processing capability and increase its throughput, that is, the amount of
each segment in the pipeline. The registers provide isolation between processing that can be accomplished during a given interval of time. The
each segment so that each can operate on distinct data simultaneously. amount of hardware increases with parallel processing, and with it, the
Perhaps the simplest way of viewing the pipeline structure is to imagine cost of the system increases. Parallel processing can be viewed from
that each segment consists of an input register followed by a various levels of complexity. o At the lowest level, we distinguish between
combinational circuit. o The register holds the data. o The combinational parallel and serial operations by the type of registers used. e.g. shift
circuit performs the suboperation in the particular segment. A clock is registers and registers with parallel load o At a higher level, it can be
applied to all registers after enough time has elapsed to perform all achieved by having a multiplicity of functional units that perform identical
segment activity. The pipeline organization will be demonstrated by or different operations simultaneously. Fig. 4-5 shows one possible way
means of a simple example. o To perform the combined multiply and add of separating the execution unit into eight functional units operating in
operations with a stream of numbers Ai * Bi + Ci for i = 1, 2, 3, …, 7 parallel. o A multifunctional organization is usually associated with a
complex control unit to coordinate all the activities among the various
components.
RISC Pipeline
To use an efficient instruction pipeline o To implement an instruction
pipeline using a small number of suboperations, with each being executed
in one clock cycle. o Because of the fixed-length instruction format, the
decoding of the operation can occur at the same time as the register
selection. o Therefore, the instruction pipeline can be implemented with
two or three segments. One segment fetches the instruction from
program memory The other segment executes the instruction in the
ALU Third segment may be used to store the result of the ALU operation
in a destination register The data transfer instructions in RISC are limited
to load and store instructions. o These instructions use register indirect
addressing. They usually need three or four stages in the pipeline. o To Supercomputers A commercial computer with vector instructions and
prevent conflicts between a memory access to fetch an instruction and to pipelined floating-point arithmetic operations is referred to as a
load or store an operand, most RISC machines use two separate buses supercomputer. o To speed up the operation, the components are packed
with two memories. o Cache memory: operate at the same speed as the tightly together to minimize the distance that the electronic signals have
CPU clock One of the major advantages of RISC is its ability to execute to travel. This is augmented by instructions that process vectors and
instructions at the rate of one per clock cycle. o In effect, it is to start each combinations of scalars and vectors. A supercomputer is a computer
instruction with each clock cycle and to pipeline the processor to achieve system best known for its high computational speed, fast and large
the goal of single-cycle instruction execution. o RISC can achieve pipeline memory systems, and the extensive use of parallel processing. o It is
segments, requiring just one clock cycle. Compiler supported that equipped with multiple functional units and each unit has its own pipeline
translates the high-level language program into machine language configuration. It is specifically optimized for the type of numerical
program. o Instead of designing hardware to handle the difficulties calculations involving vectors and matrices of floating-point numbers.
associated with data conflicts and branch penalties. o RISC processors rely They are limited in their use to a number of scientific applications, such as
on the efficiency of the compiler to detect and minimize the delays numerical weather forecasting, seismic wave analysis, and space research.
encountered with these problems. A measure used to evaluate computers in their ability to perform a given
number of floating-point operations per second is referred to as flops. A
typical supercomputer has a basic cycle time of 4 to 20 ns. The examples
of supercomputer: Cray-1: it uses vector processing with 12 distinct
functional units in parallel; a large number of registers (over 150);
multiprocessor configuration (Cray X-MP and Cray Y-MP) o Fujitsu VP-200:
83 vector instructions and 195 scalar instructions; 300 megaflops
SIMD Array Processor An SIMD array processor is a computer with Input-Output Interface Input-output interface provides a method for
multiple processing units operating in parallel. A general block diagram transferring information between internal storage and external I/O
of an array processor is shown in Fig. 9-15. o It contains a set of identical devices. It resolves the differences between the computer and peripheral
processing elements (PEs), each having a local memory M. o Each PE devices. The major differences are: Peripherals are electromechanical
includes an ALU, a floating-point arithmetic unit, and working registers. o and electromagnetic devices and manner of operation is different from
Vector instructions are broadcast to all PEs simultaneously. Masking that of CPU which is electronic component. Data transfer rate of
schemes are used to control the status of each PE during the execution of peripherals is slower than that of CPU. So some synchronization
vector instructions. o Each PE has a flag that is set when the PE is active mechanism may be needed. Data codes and formats in peripherals
and reset when the PE is inactive. For example, the ILLIAC IV computer differ from the word format in CPU and memory. Operating modes of
developed at the University of Illinois and manufactured by the Burroughs peripherals are different from each other and each must be controlled so
Corp. o Are highly specialized computers. o They are suited primarily for as not to disturb other. To resolve these differences, computer system
numerical problems that can be expressed in vector or matrix form. usually include special hardware unit between CPU and peripherals to
supervise and synchronize I/O transfers, which are called Interface units
since they interface processor bus and peripherals
I/O subsystem The input-output subsystem of a computer, referred as I/O, I/O commands The function code provided by processor in control line is
provides an efficient mode of communication between the central system called I/O command. The interpretation of command depends on the
and outside environment. Data and programs must be entered into the peripheral that the processor is addressing. There are 4 types of
computer memory for processing and result of computations must be commands that an interface may receive:
must be recorded or displayed for the user. a) Control command: Issued to activate the peripheral and to inform it
Peripheral devices Input or output devices attached to the computer are what to do? E.g. a magnetic tape unit may be instructed to backspace tape
called peripherals. Keyboards, display units and printers are most common by one record.
peripheral devices. Magnetic disks, tapes are also peripherals which b) Status command: Used to check the various status conditions of the
provide auxiliary storage for the system. Not all input comes from people interface before a transfer is initiated.
and not all intended for people. In various real time processes as machine c) Data input command: Causes the interface to read the data from the
tooling, assembly line procedures and chemical & industrial processes, peripheral and places it into the interface buffer. [HEY! Processor checks if
various processes communicate with each other providing input and/or data are available using status command and then issues a data input
outputs to other processe command. The interface places the data on data lines, where they are
I/O organization of a computer is a function of size of the computer and accepted by the processor]
the devices connected to it. In other words, amount of hardware d) Data output command: Causes the interface to read the data from the
computer possesses to communicate with no. of peripheral units, bus and saves it into the interface buffer.
differentiate between small and large system. IO devices
communicating with people and computer usually transfer alphanumeric
I/O Bus versus Memory Bus In addition to communicating with I/O,
information using ASCII binary encoding.
processor also has to work with memory unit. Like I/O bus, memory bus
contains data, address and read/write control lines. 3 physical
organizations, the computer buses can be used to communicate with
memory and I/O: a) Use two separate buses, one for memory and other
for I/O: Computer has independent sets of data, address and control
buses, one for accessing memory and other for I/O. usually employed in a
computer that has separate IOP (Input Output Processor). b) Use one
common bus for both memory and I/O having separate control lines c) Use
one common bus for memory and I/O with common control lines
Isolated I/O versus Memory-Mapped I/O Question: I/O interface Unit (an example)
Differentiate between isolated I/O and memory-mapped I/O. Isolated I/O I/O interface unit is shown in the block diagram below, it consists:
Configuration Two data registers called ports. A control register A status register
Isolated I/O Configuration Two functionally and physically separate Bus buffers Timing and control circuits
buses o Memory bus o I/O bus Each consists of the same three main
groupings of wires o Address (example might be 6 wires, or up to 2 6 = 64
devices) o Data o Control
Each device interface has a port which consists of a minimum of: o Control
register o Status register o Data-in (Read) register (or buffer) o Data-out
(Write) register (or buffer)
CPU can execute instructions to manipulate the I/O bus separate from
the memory bus Now only used in very high performance systems
Interface communicates with CPU through data bus Chip select (CS)
and Register select (RS) inputs determine the address assigned to the
interface. I/O read and write are two control lines that specify input and
output. 4 registers directly communicates with the I/O device attached
to the interface.
Flowchart of the program that must be written to the CPU is shown here.
The transfer of each byte (assuming device is sending sequence of bytes)
requires 3 instructions: a) Read status register b) Check F bit. If not set
branch to a) and if set branch to c). c) Read data register
Interrupt-initiated I/O
Since polling (constantly monitoring the flag F) takes valuable CPU time,
alternative for CPU is to let the interface inform the computer when it is
ready to transfer data. This mode of transfer uses the interrupt facility.
While the CPU is running a program, it does not check the flag. However,
when the flag is set, the computer is momentarily interrupted from
proceeding with the current program and is informed of the fact that the
flag has been set. The CPU deviates from what it is doing to take care of
the input or output transfer. After the transfer is completed, the computer
returns to the previous program to continue what it was doing before the
interrupt. The CPU responds to the interrupt signal by storing the return
address from the program counter into a memory stack and then control
branches to a service routine that processes the required I/O transfer.
Input-Output Processor (IOP) IOP is a processor with direct memory Data Communication Processor (DCP)
access capability that communicates with I/O devices. In this Data communication processor (DCP) is an I/O processor that distributes
configuration, the computer system can be divided into a memory unit, and collects data from many remote terminals connected through
and a number of processors comprised of CPU and one or more IOPs. telephone and other communication lines. It is a specialized I/O processor
IOP is similar to CPU except that it is designed to handle the details of I/O designed to communicate directly with data communication networks
processing. Unlike DMA controller (which is set up completely by the (which may consists of wide variety of devices as printers, displays,
CPU), IOP can fetch and execute its own instructions. IOP instructions are sensors etc.). So DCP makes possible to operate efficiently in a time-
designed specifically to facilitate I/O transfers. Instructions that are read sharing environment.
form memory by an IOP are called commands to differ them form Difference between IOP and DCP: Is the way processor communicates
instructions read by CPU. The command words constitute the program for with I/O devices. • An I/O processor communicates with the peripherals
the IOP. The CPU informs the IOP where to find commands in memory through a common I/O bus i.e. all peripherals share common bus and use
when it is time to execute the I/O program. The memory occupies a to transfer information to and from I/O processor. • DCP communicates
central position and can communicate with each processor by means of with each terminal through a single pair of wires. Both data and control
DMA. CPU is usually assigned the task of initiating the I/O program, from information are transferred in serial fashion. DCP must also communicate
then on; IOP operates independent of the CPU and continues to transfer with the CPU and memory in the same manner as any I/O processor
data from external devices and memory.
Serial and parallel communication
Serial: Serial communication is the process of sending data one bit at a
time, sequentially, over a communication channel or computer bus. This is
in contrast to parallel communication. Parallel: Parallel communication is a
method of sending several data signals simultaneously over several
parallel channels. It contrasts with serial communication; this distinction is
one way of characterizing a communications link
.
The mapping from address space to memory space becomes easy if virtual
address is represented by two numbers: a page number address and a line Where Base address field gives the base of the page table address in
with in the page. In a computer with 2 p words per page, p bits are used to segmented-page organization Length field gives the segment size (in
specify a line address and remaining high-order bits of the virtual address number of pages) The protection field specifies access rights available to
specify the page number. a particular segment. The protection information is set into the descriptor
by the master control program of the OS. Some of the access rights are: o
Full read and write privileges o Read only (Write protection) o Execute
only (Program protection) o System only (OS protection