
UNIT- 5

I/O ORGANIZATION
 Peripheral Devices
 I/O subsystem or I/O Module
 I/O Interface
 Input-Output Processor
 Modes of Transfer or I/O Transfer
   1) Programmed I/O (program-initiated or program-controlled I/O)
   2) Interrupt-driven I/O (interrupt-initiated I/O)
   3) Direct Memory Access (DMA)
 Interrupts and exceptions
 I/O device interfaces – SCSI, USB

PIPELINING AND VECTOR PROCESSING
 Basic Concepts
 ILP – Instruction Level Parallelism
 Arithmetic & Instruction pipelining
 Throughput & Speed up
 Pipeline Hazards
 Case study – Introduction to x86 architecture
1.Peripheral Devices

• The I/O organization of a computer depends upon the size of the
computer and the peripherals connected to it.
• The I/O subsystem, also known as the I/O module of the
computer, provides an efficient mode of communication
between the central system and the outside environment.
• The most common input/output devices are:
i) Monitor
ii) Keyboard
iii) Mouse
iv) Printer
v) Magnetic tapes & disks
1.Peripheral Devices(cont’d)
• Devices that are under the direct control of the computer are
said to be "connected on-line".
• Input or output devices attached to the computer are also
called peripherals.
• There are three types of peripherals:
• Input peripherals
• Output peripherals
• Input-output peripherals
1. Peripheral Devices(cont’d)

• Input peripherals: Allow user input from the outside
world to the computer.
Example: Keyboard, Mouse etc.
• Output peripherals: Allow information output from
the computer to the outside world. Example: Printer,
Monitor etc.
• Input-Output peripherals: Allow both input (from the
outside world to the computer) as well as output (from the
computer to the outside world). Example: Touch
screen etc.
1. Peripheral Devices(cont’d)
Input Devices:
• Keyboard
• Optical input devices
  - Card Reader
  - Paper Tape Reader
  - Bar code reader
  - Optical Mark Reader
• Magnetic Input Devices
  - Magnetic Stripe Reader
• Screen Input Devices
  - Touch Screen
  - Light Pen
  - Mouse
Output Devices:
• CRT
• Printer (Daisy Wheel, Dot Matrix, Laser)
• Plotter
1. Peripheral Devices(cont’d)
• We can broadly classify peripheral devices into three
categories:
- Human Readable: Communicating with the computer
user, e.g. video display terminal, printers etc.
- Machine Readable: Communicating with equipment,
e.g. magnetic disk, magnetic tape, sensors,
actuators used in robotics etc.
- Communication: Communicating with remote
devices, i.e. exchanging data with them,
e.g. modem, NIC (Network Interface Card) etc.
2.Input Output Organization - IO Module

– I/O Subsystem / IO Module
• Provides an efficient mode of communication between
the central system and the outside environment.
• Programs and data must be entered into computer
memory for processing, and results obtained from the
computer must be recorded and displayed to the user.
2. I/O Modules(Cont’d)

• The I/O module is a special
hardware component that
interfaces the CPU with the
peripherals and supervises
and synchronizes all I/O
transfers.
• The I/O module interfaces to
the system bus on the central
side (CPU and memory) and
interfaces with and controls
one or more peripheral
devices on the other side.
2. I/O Modules(Cont’d)

• Peripherals are not directly connected to the system bus; instead an I/O module is used
which contains the logic for performing communication between the peripherals and the
system bus.
• The reasons why peripherals are not directly connected to the system bus are:
- There is a wide variety of peripherals with various methods of operation. It would be
impractical to incorporate the necessary logic within the processor to control a range of
devices.
- The data transfer rate of peripherals is often much slower than that of the memory or
processor. Thus, it is impractical to use the high-speed system bus to communicate directly
with a peripheral, and vice versa.
- Peripherals often use different data formats and word lengths than the computer to
which they are connected.
Thus an I/O module is required, which performs two major functions:
• Interface to the processor and memory via the system bus
• Interface to one or more peripherals by tailored data links
2. I/O Modules(Cont’d)
The detailed functions of I/O modules are:

1) Control & Timing: The I/O module includes control and timing to
coordinate the flow of traffic between internal resources and external
devices.
• The control of the transfer of data from an external device to the processor
consists of the following steps:
- The processor interrogates the I/O module to check the status of the
attached device.
- The I/O module returns the device status.
- If the device is operational and ready to transmit, the processor
requests the transfer of data by means of a command to the I/O module.
- The I/O module obtains the unit of data from the external device.
- The data are transferred from the I/O module to the processor.
2. I/O Modules(Cont’d)

2) Processor Communication:
I/O module communicates with the processor which
involves
- Command decoding: I/O module accepts commands
from the processor.
- Data: Data are exchanged between the processor and
I/O module over the bus.
- Status reporting: Peripherals are often too slow, so it is
important to know the status of the I/O module.
- Address recognition: I/O module must recognize one
unique address for each peripheral it controls.
2. I/O Modules(Cont’d)

3) Device Communication: It involves commands, status
information and data.
4) Data Buffering: The I/O module must be able to operate at both
device and memory speeds. If the I/O device operates at a rate
higher than the memory access rate, the I/O module
performs data buffering. If the I/O device's rate is slower than the
memory's, it buffers data so as not to tie up the memory in a
slow transfer operation.
5) Error Detection: The I/O module is responsible for error
detection, such as mechanical and electrical malfunctions
reported by the device (e.g. paper jam, bad ink track), unintentional
changes to the bit pattern, and transmission errors.
2. I/O Modules(Cont’d)

• The I/O bus from the processor is attached to all peripheral
interfaces.
• To communicate with a particular device, the processor places
a device address on the address bus.
• Each interface contains an address decoder that monitors the
address lines.
• When the interface detects its particular device address, it
activates the path between the data lines and the device that it
controls.
• At the same time that the address is made available on the
address lines, the processor provides a function code on the control
lines; the function code (I/O command) may be a control, status,
data output or data input command.
3. Input-Output interface

• The input-output interface provides a method for transferring
information between internal storage (such as memory and
CPU registers) and external I/O devices.
• Peripherals connected to a computer need special
communication links for interfacing with the central
processing unit.
• These links are special hardware components between the CPU
and the peripherals that supervise and synchronize all input
and output transfers.
3.Input-Output Interface(Cont’d)
• They are called interface units because they interface between the
processor bus and the peripheral device.
• The CPU is interfaced to the data bus and address bus through the
MDR and MAR registers, respectively. ...
• Then, the CPU waits for the memory to finish the requested transfer
operation.
• It is required that the CPU keeps the Read or Write signal set until
the memory finishes the requested operation.


I/O INTERFACE(CONT’D)

• Resolves the differences between the computer and peripheral devices:
(1) Peripherals – electromechanical or electromagnetic devices
  • CPU or Memory – electronic devices
  – Conversion of signal values is needed
(2) Data transfer rate
  • Peripherals – usually slower
  • CPU or Memory – usually faster than peripherals
  – Some kind of synchronization mechanism may be needed
(3) Data formats or unit of information
  • Peripherals – byte, block, …
  • CPU or Memory – word
(4) Operating modes of peripherals may differ
I/O Bus & Interface

• Interface
- Decodes the device address (device code)
- Decodes the commands (operation)
- Provides signals for the peripheral controller
- Synchronizes the data flow and supervises
the transfer rate between the peripheral and the CPU or
memory
• The interface can receive 4 types of commands: control, status,
data output and data input.
I/O COMMANDS

• The control lines are referred to as I/O commands. An I/O command is an
instruction that is executed in the interface and its attached peripheral
unit.
• Control command: issued to activate the peripheral and to
inform it what to do.
• Status command: used to test various status conditions
in the interface and the peripheral.
• Data output command: causes the interface to respond by
transferring data from the bus into one of its registers.
• Data input command: the interface receives an item of data
from the peripheral and places it in its buffer register.
I/O Bus Vs Memory Bus
• Functions of buses
  • The MEMORY BUS is for information transfers between the CPU and the main memory (MM).
  • The I/O BUS is for information transfers between the CPU and I/O devices through
    their I/O interfaces.
• There are three ways a bus can communicate with memory and I/O:
  (1) Use two separate buses, one to communicate with memory and
      the other with I/O interfaces.
  (2) Use one common bus for memory and I/O but separate control
      lines for each.
  (3) Use one common bus for memory and I/O with common control
      lines.
• The I/O bus consists of data lines, address lines and
control lines.
• The I/O bus from the processor is attached to all
peripheral interfaces.
• To communicate with a particular device, the processor
places a device address on the address lines.
• Each interface decodes the address and control received
from the I/O bus, interprets them for the peripheral and
provides signals for the peripheral controller.
• It also synchronizes the data flow and supervises the
transfer between peripheral and processor.
• Each peripheral has its own controller.
• For example, the printer controller controls the paper
motion and the print timing.
Isolated vs. Memory Mapped I/O

• Isolated I/O
- Many computers use a common bus to transfer
information between memory or I/O.
- Separate I/O read/write control lines in addition to memory
read/write control lines
- Separate (isolated) memory and I/O address spaces
- Distinct input and output instructions,
each associated with the address of an interface register

• Memory-mapped I/O
- A single set of read/write control lines
(no distinction between memory and I/O transfer)
- Memory and I/O addresses share the common address space
(reduces the memory address range available)
- No specific input or output instructions:
the same memory reference instructions can
be used for I/O transfers
- Considerable flexibility in handling I/O operations
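• The contrast can be illustrated with a minimal C sketch of memory-mapped I/O; the
device address 0x40000000 and the register layout are invented for illustration only,
not taken from any real machine.

#include <stdint.h>

/* Memory-mapped I/O: the device register shares the memory address space,
 * so an ordinary load/store (through a volatile pointer) performs the I/O.
 * The address below is a made-up example, not a real device. */
#define DEVICE_DATA_REG ((volatile uint8_t *)0x40000000u)

static void mmio_write_byte(uint8_t value)
{
    *DEVICE_DATA_REG = value;   /* same instruction form as a memory store */
}

static uint8_t mmio_read_byte(void)
{
    return *DEVICE_DATA_REG;    /* same instruction form as a memory load */
}

/* Isolated I/O would instead use distinct IN/OUT instructions addressing a
 * separate I/O address space; in C that usually appears as a compiler/port
 * intrinsic or inline assembly rather than a pointer dereference. */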
Differences Between Isolated I/O and Memory Mapped I/O
Example of I/O Interface
INPUT-OUTPUT PROCESSOR
• The block diagram of a computer shows the various I/O
processors (IOPs).
• The memory unit occupies the central position and can
communicate with each processor.
• The CPU processes the data required for solving the
computational tasks.
• The IOP provides a path for transfer of data between
peripherals and memory.
• The CPU initiates the I/O program; the IOP then operates
independently of the CPU and transfers data between the
peripherals and memory.
 The communication between the IOP and the devices is similar to the
program control method of transfer, and the communication with the
memory is similar to the direct memory access method.
 In large-scale computers, each processor is independent of the other
processors, and any processor can initiate an operation.
 The CPU acts as master and the IOP acts as slave processor.

 The CPU assigns the task of initiating operations, but it is the IOP that
executes the instructions, not the CPU.
 CPU instructions provide operations to start an I/O transfer. The IOP
requests the CPU's attention through an interrupt.
CPU-IOP Communication
I/O TRANSFERS

• Transfer of data is required between the CPU and peripherals or
memory, or sometimes between any two devices or units of the
computer system.
• Data transfer within the computer is an internal operation.
• All the internal operations in a digital system are synchronized
by means of clock pulses supplied by a common clock pulse
generator. The data transfer can be
i. Synchronous
ii. Asynchronous
• When both the transmitting and receiving units use the same clock
pulse, such a data transfer is called a synchronous transfer.
• On the other hand, if there is no common clock and
the sender operates at a different moment than the
receiver, such a data transfer is called an asynchronous data
transfer.
ASYNCHRONOUS DATA TRANSFER

• This scheme is used when the speed of the I/O devices does not match
the microprocessor, and the timing characteristics of the I/O devices are
not predictable.
• In this method, the processor initiates the device and checks its status.
• Two types of techniques are used, based on the control
signals exchanged before data transfer:
1. Strobe Control
2. Handshaking
STROBE CONTROL
* Employs a single control line to time each transfer
* The strobe may be activated by either the source or the destination unit

Two cases are shown in the block and timing diagrams:
- Source-initiated strobe for data transfer: the source unit drives the data
  bus and the strobe line to the destination unit.
- Destination-initiated strobe for data transfer: the destination unit drives
  the strobe line to the source unit, which places the data on the data bus.
In both timing diagrams, the strobe pulse frames the interval during which
valid data is present on the data bus.
HANDSHAKING
Problems with the strobe method

Source-initiated:
The source unit that initiates the transfer has no way of knowing
whether the destination unit has actually received the data.

Destination-initiated:
The destination unit that initiates the transfer has no way of knowing
whether the source has actually placed the data on the bus.

To solve this problem, the HANDSHAKE method introduces a
second control signal to provide a reply to the unit that initiates the
transfer.
SOURCE-INITIATED TRANSFER USING HANDSHAKE
Block diagram: the source and destination units are connected by the data bus
and two control lines, "data valid" (source to destination) and "data accepted"
(destination to source). The timing diagram shows data valid asserted while
valid data is on the bus, with data accepted asserted in reply.

Sequence of events:
- Source unit: places data on the bus and enables data valid.
- Destination unit: accepts data from the bus and enables data accepted.
- Source unit: disables data valid and invalidates the data on the bus.
- Destination unit: disables data accepted and is ready to accept data
  (initial state).

* Allows arbitrary delays from one state to the next
* Permits each unit to respond at its own data transfer rate
* The rate of transfer is determined by the slower unit
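• A minimal C sketch of this source-initiated handshake, modelling the two control
signals as shared flags; the flag names and busy-wait loops are illustrative
assumptions (the two functions would run on the two different units), not a real
bus implementation.

#include <stdbool.h>
#include <stdint.h>

/* Shared "bus" and control lines (illustrative flags, not real hardware). */
static volatile uint8_t data_bus;
static volatile bool data_valid;     /* driven by the source unit      */
static volatile bool data_accepted;  /* driven by the destination unit */

void source_send(uint8_t item)       /* runs on the source unit */
{
    data_bus = item;                 /* 1. place data on the bus               */
    data_valid = true;               /* 2. enable data valid                   */
    while (!data_accepted) ;         /* 3. wait for the reply signal           */
    data_valid = false;              /* 4. disable data valid;                 */
                                     /*    data on the bus is no longer valid  */
    while (data_accepted) ;          /* 5. wait until destination is ready again */
}

uint8_t destination_receive(void)    /* runs on the destination unit */
{
    while (!data_valid) ;            /* wait for valid data              */
    uint8_t item = data_bus;         /* accept data from the bus         */
    data_accepted = true;            /* enable data accepted             */
    while (data_valid) ;             /* wait for source to drop valid    */
    data_accepted = false;           /* return to initial (ready) state  */
    return item;
}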
DESTINATION-INITIATED TRANSFER USING HANDSHAKE
Block diagram: the same data bus, with control lines "ready for data"
(destination to source) and "data valid" (source to destination). The timing
diagram shows ready for data asserted first, followed by data valid while
valid data is on the bus.

Sequence of events:
- Destination unit: ready to accept data, enables ready for data.
- Source unit: places data on the bus and enables data valid.
- Destination unit: accepts data from the bus and disables ready for data.
- Source unit: disables data valid and invalidates the data on the bus
  (initial state).

* Handshaking provides a high degree of flexibility and reliability because the
successful completion of a data transfer relies on active participation by both units.
* If one unit is faulty, the data transfer will not be completed
-> this can be detected by means of a timeout mechanism.
Difference between serial and parallel transfer
ASYNCHRONOUS SERIAL TRANSFER
Four different types of transfer:
- Asynchronous serial transfer
- Synchronous serial transfer
- Asynchronous parallel transfer
- Synchronous parallel transfer

Asynchronous Serial Transfer
- Employs special bits which are inserted at both
ends of the character code.
- Each character consists of three parts: a start bit, the character (data) bits,
and stop bits. For example, a frame carrying the character bits 1 1 0 0 0 1 0 1 is:
  Start bit (1 bit, always 0) | character bits | stop bits (at least 1 bit, value 1)

A character can be detected by the receiver from the knowledge of 4 rules:
- When data are not being sent, the line is kept in the 1-state (idle state).
- The initiation of a character transmission is detected
by a start bit, which is always a 0.
- The character bits always follow the start bit.
- After the last character bit, a stop bit is detected
when the line returns to the 1-state for at least 1 bit
time.
The receiver knows in advance the transfer rate of the
bits and the number of information bits to be expected.
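• As a sketch of these framing rules, the following C function builds the bit
sequence for one frame, assuming 8 data bits, 1 stop bit, and least-significant
bit first (all configurable in a real UART; these are assumptions for illustration).

#include <stdint.h>
#include <stddef.h>

/* Fill 'line' with the bit values of one asynchronous serial frame:
 * start bit (0), 8 data bits (LSB first), 1 stop bit (1).
 * Returns the number of bit times written (10 here).
 * Between frames the line stays in the idle 1-state. */
size_t frame_character(uint8_t ch, uint8_t line[10])
{
    size_t i = 0;
    line[i++] = 0;                      /* start bit: always 0            */
    for (int b = 0; b < 8; b++)
        line[i++] = (ch >> b) & 1u;     /* character bits follow the start bit */
    line[i++] = 1;                      /* stop bit: line returns to 1    */
    return i;
}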
UNIVERSAL ASYNCHRONOUS RECEIVER-TRANSMITTER
- UART -
A typical asynchronous communication interface available as an IC.

Block diagram: a bidirectional data bus connects through bus buffers to the
internal bus. On the transmit side, the transmitter register feeds a shift
register (serial transmit data) under the transmitter control and clock. On the
receive side, a receiver shift register (serial receive data) feeds the receiver
register under the receiver control and clock. A control register, a status
register, and timing & control logic complete the interface. The chip select
(CS), register select (RS), I/O read (RD) and I/O write (WR) inputs select the
register to be accessed:

CS  RS  Oper.  Register selected
0   x   x      None
1   0   WR     Transmitter register
1   1   WR     Control register
1   0   RD     Receiver register
1   1   RD     Status register

Transmitter register
- Accepts a data byte (from the CPU) through the data bus
- The byte is transferred to a shift register for serial transmission
Receiver
- Receives serial information into another shift register
- The complete data byte is sent to the receiver register
Status register bits
- Used for I/O flags and for recording errors
Control register bits
- Define the baud rate, the number of bits in each character, whether
to generate and check parity, and the number of stop bits.
• First In First Out Buffer (FIFO):
• A First In First Out (FIFO) buffer is a memory unit that stores information in such a
manner that the item first in is the item first out. It comes with separate input and
output terminals.
• The important feature of this buffer is that it can input data and output data at two
different rates.
• When placed between two units, the FIFO can accept data from the source unit at one
rate of transfer and deliver the data to the destination unit at another rate.
• If the source is faster than the destination, the FIFO is useful when source data arrive in
bursts that fill the buffer. A FIFO is useful in some applications when data are
transferred asynchronously.
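• A minimal sketch of such a FIFO as a ring buffer in C; the fixed 16-byte capacity
and single-producer/single-consumer use are simplifying assumptions.

#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 16u                    /* illustrative capacity */

typedef struct {
    uint8_t  buf[FIFO_SIZE];
    unsigned head;                       /* next slot to write (input side)  */
    unsigned tail;                       /* next slot to read (output side)  */
    unsigned count;                      /* items currently stored           */
} fifo_t;                                /* initialise with: fifo_t f = {0}; */

bool fifo_put(fifo_t *f, uint8_t item)   /* source side, at its own rate */
{
    if (f->count == FIFO_SIZE) return false;      /* buffer full  */
    f->buf[f->head] = item;
    f->head = (f->head + 1u) % FIFO_SIZE;
    f->count++;
    return true;
}

bool fifo_get(fifo_t *f, uint8_t *item)  /* destination side, at its own rate */
{
    if (f->count == 0u) return false;             /* buffer empty */
    *item = f->buf[f->tail];
    f->tail = (f->tail + 1u) % FIFO_SIZE;
    f->count--;
    return true;
}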
MODES OF TRANSFER

• A computer system is made up
of processor, memory & I/O
devices, which are
interconnected through a bus
system to enable data transmission
among them.

Fig: Computer System interconnection


MODES OF TRANSFER(cont’d)

• Transfer of data is required between the CPU and peripherals or
memory, or sometimes between any two devices or units of the
computer system.
• To transfer data from one unit to another, one should be sure that
both units have a proper connection and that at the time of data transfer
the receiving unit is not busy.
• Data transfer within the CPU is called an internal operation;
otherwise it is called an external operation.
• Some modes use the CPU as an intermediate path; others
transfer the data directly to and from the memory unit.
• All the internal operations in a digital system are synchronized by
means of clock pulses supplied by a common clock pulse generator.
• The data transfer can be
1) Synchronous - uses the same or a common clock pulse
2) Asynchronous - uses different clocks, operating at different moments.
• When both the transmitting and receiving units use the same clock pulse,
such a data transfer is called a synchronous process.
• When the transmitting and receiving units do not use the same
clock pulse, such a data transfer is called an asynchronous process.
• Sometimes there may be no concept of clock pulses at all; the
sender operates at a different moment than the receiver.
MODES OF TRANSFER(Cont’d)

• Data transfer between the central computer and I/O devices


may be handled in a variety of modes.
• Some modes use the CPU as an intermediate path; others
transfer the data directly to and from the memory unit.
• Data transfer to and from peripherals may be handled in one of
three possible modes:
1) Programmed I/O
2) Interrupt-initiated I/O
3) Direct memory access (DMA)
Programmed I/O

• Programmed I/O operations are the result of I/O instructions written in the
computer program.
• Each data item transfer is initiated by an instruction in the program. Usually,
the transfer is to and from a CPU register and peripheral.
• Transferring data under program control requires constant monitoring of
the peripheral by the CPU.
• Once a data transfer is initiated, the CPU is required to monitor the
interface to see when a transfer can again be made.
• It is up to the programmed instructions executed in the CPU to keep close
tabs on everything that is taking place in the interface unit and the I/O
device.
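• A hedged C sketch of programmed I/O: the CPU repeatedly reads an interface
status flag before each data item is moved. The register addresses and the
meaning of the READY bit are assumptions made up for illustration.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical interface registers (addresses invented for illustration). */
#define IO_STATUS  (*(volatile uint8_t *)0x40001000u)
#define IO_DATA    (*(volatile uint8_t *)0x40001004u)
#define READY_BIT  0x01u   /* assumed: set when the device has a byte ready */

/* Read 'len' bytes from the device entirely under program control.
 * The CPU is tied up in the busy-wait loop for the whole transfer. */
void programmed_io_read(uint8_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        while ((IO_STATUS & READY_BIT) == 0u)
            ;                     /* poll the flag until the device is ready  */
        dst[i] = IO_DATA;         /* move one item through a CPU register     */
    }
}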
Mode 1 : Programmed I/O (Cont’d)
Example 1:
Programmed I/O
Ex 2:Data transfer from I/O to Memory:
Mode 2: Interrupt-Initiated I/O :

• In this method, interrupts are used. An interrupt command is used to
inform the device about the start and end of transfer. In the meantime
the CPU executes other programs.
• When the interface determines that the device is ready for data transfer,
it generates an interrupt request and sends it to the computer.
• When the CPU receives such a signal, it temporarily stops the execution
of the program, branches to a service program to process the I/O
transfer and, after completing it, returns to the task it was
originally performing.
• In this type of I/O, the computer does not keep checking the flag;
it continues to perform its other tasks.
Interrupt initiated I/O
• Whenever any device wants attention, it sends an interrupt signal to
the CPU.
• The CPU then deviates from what it was doing, stores the return address from the
PC and branches to the address of the service routine.
• There are two ways of choosing the branch address:

o Vectored Interrupt

o Non-vectored Interrupt
• In a vectored interrupt, the source that interrupts the CPU provides the branch
information. This information is called the interrupt vector.
• In a non-vectored interrupt, the branch address is assigned to a fixed
location in memory.
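• A minimal sketch of interrupt-initiated I/O in C. How the handler is attached to
the interrupt is platform-specific, so device_isr is shown only as an ordinary
function; the register names reuse the earlier made-up assumptions.

#include <stdint.h>

#define IO_STATUS  (*(volatile uint8_t *)0x40001000u)   /* assumed addresses */
#define IO_DATA    (*(volatile uint8_t *)0x40001004u)
#define READY_BIT  0x01u

static volatile uint8_t rx_byte;
static volatile int     rx_flag;      /* set by the ISR, cleared by the main task */

/* Service routine: the CPU branches here only when the device raises an
 * interrupt request, so no flag polling is done in the main program. */
void device_isr(void)
{
    if (IO_STATUS & READY_BIT) {      /* confirm the source of the interrupt */
        rx_byte = IO_DATA;            /* perform the transfer                */
        rx_flag = 1;
    }
    /* on return, the CPU resumes the task it was originally performing */
}

void main_task(void)
{
    for (;;) {
        /* ... do useful work; no busy-waiting on the device ... */
        if (rx_flag) {                /* consume data delivered by the ISR */
            rx_flag = 0;
            /* process rx_byte here */
        }
    }
}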
Direct Memory Access (DMA)
It transfers the block of data between the memory and peripheral
devices of the system, without the participation of the processor.
DMA can be controlled by DMA controller.
DMA controller:
The unit that controls the activity of accessing memory directly is
called a DMA controller.
The processor relinquishes(leaves) the system bus for a few clock
cycles. So, the DMA controller can accomplish the task of data
transfer via the system bus.
DMA Introduction
• Here, a DMA controller substitutes for the CPU and is
responsible for accessing the input-output devices and
memory for transferring data.
• A DMA controller is dedicated hardware that performs read
and write operations directly, without the involvement of the
CPU, and saves the time the CPU would otherwise spend on opcode
fetching, decoding, address incrementing, and testing
source/destination addresses.
• This leads to high data transfer rates between the
peripherals and the memory, and allows large blocks
of data to be communicated speedily.
Role of DMA
Why is it used?

• For transmission of large or bulk amounts of
information directly, without using CPU
time.
• We have two other methods of data transfer,
programmed I/O and interrupt-driven
I/O, but both are inefficient for bulk transfer
of data.
• In both, CPU time is wasted.
• In programmed I/O, the processor keeps on
scanning/checking/monitoring whether any
device is ready for data transfer. If an I/O device
is ready, the processor fully dedicates itself to
transferring the data between I/O and memory.
• It transfers data at a high rate, but it can't get
involved in any other activity during the data
transfer. This is the major drawback of
programmed I/O.
• In interrupt-driven I/O, whenever the device is
ready for data transfer, it raises an interrupt
to the processor. The processor completes executing its
ongoing instruction and saves its current state. It
then switches to the data transfer, which causes
a delay. Here, the processor doesn't keep scanning
for peripherals ready for data transfer, but it
is still fully involved in the data transfer itself.
• So, it is also not an effective way of transferring bulk data.
Key points:
• The Programmed IO & Interrupt driven IO are
not useful for transferring a large block of
data.
• But, the DMA controller completes this task at
a faster rate and is also effective for transfer
of large data block.
The DMA controller transfers the data in three modes:
• Burst Mode: Here, once the DMA controller gains control of
the system bus, it releases the system bus only
after completion of the data transfer. Till then the CPU has to wait for
the system bus.
• Cycle Stealing Mode: In this mode, the DMA controller forces the
CPU to stop its operation and relinquish control over the
bus for a short time to the DMA controller. After the transfer of
every byte, the DMA controller releases the bus and then again
requests the system bus. In this way, the DMA controller
steals clock cycles to transfer every byte.
• Transparent Mode: Here, the DMA controller takes control of the
system bus only when the processor does not require the system
bus.
Direct Memory Access Controller & it’s Working

• The data transfer is initiated by the start address, the


number of words(i.e word count) to be transferred
in a block, and the direction of transferring
data(control information).
• The DMA controller performs the requested function
as soon as it receives this information.
• When the entire block of data is transferred, an
interrupt signal is sent by the controller to inform the
microprocessor that the requested operation has
been completed.
DMA working
• The CPU may be placed in an idle state in a variety of ways. One
common method, extensively used in microprocessors, is to disable the
buses through special control signals such as:
 Bus Request (BR)
 Bus Grant (BG)
• These are the two control signals in the CPU that facilitate the DMA transfer.
• The Bus Request (BR) input is used by the DMA controller to request the
CPU to relinquish the buses.
• When this input is active, the CPU terminates the execution of the
current instruction and places the address bus, data bus and read/write
lines into a high-impedance state.
• High-impedance state means that the output is disconnected.
Registers used in the DMA Controller:
The DMA controller needs the usual circuits of an interface to
communicate with the CPU and the I/O device. The DMA controller has
three registers:
i. Address Register
ii. Word Count Register
iii. Control Register
Address Register :- The address register contains an address to specify
the desired location in memory.
Word Count Register :- The word count register holds the number of words to be
transferred. The register is incremented or decremented by one after each
word transfer and internally tested for zero.
Control Register :- The control register specifies the mode of transfer.
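• A sketch of how a processor might load these three registers before starting a
transfer. The register layout, bit positions, and base address are invented for
illustration; real controllers (e.g. the 8237) differ in detail.

#include <stdint.h>

/* Hypothetical memory-mapped DMA controller registers. */
typedef struct {
    volatile uint32_t address;     /* starting memory address of the block */
    volatile uint32_t word_count;  /* number of words to transfer          */
    volatile uint32_t control;     /* mode of transfer, direction, start   */
} dma_regs_t;

#define DMA  ((dma_regs_t *)0x40002000u)   /* assumed base address */

/* Assumed control-register bits (illustrative only). */
#define DMA_CTRL_READ   (1u << 0)  /* 1 = memory -> I/O device, 0 = I/O -> memory */
#define DMA_CTRL_START  (1u << 1)  /* begin the transfer                          */
#define DMA_CTRL_IEN    (1u << 2)  /* interrupt the CPU when word count reaches 0 */

void dma_start_block(uint32_t mem_addr, uint32_t words, int read_from_memory)
{
    DMA->address    = mem_addr;                       /* address register    */
    DMA->word_count = words;                          /* word count register */
    DMA->control    = DMA_CTRL_START | DMA_CTRL_IEN   /* control register    */
                    | (read_from_memory ? DMA_CTRL_READ : 0u);
    /* From here the controller transfers the block directly over the system
     * bus; the CPU is free until the completion interrupt arrives. */
}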
DMA Transfer:
•The CPU communicates with the DMA through the
address and data buses as with any interface unit.
• The DMA has its own address, which activates the DS
and RS lines.
•The CPU initializes the DMA through the data bus.
•Once the DMA receives the start control command, it
can transfer between the peripheral and the memory.
DMA transfer
• Whenever an I/O device wants to transfer the data to or from memory, it
sends the DMA request (DRQ) to the DMA controller. DMA controller
accepts this DRQ and asks the CPU to hold for a few clock cycles by
sending it the Hold request (HLD).
• CPU receives the Hold request (HLD) from DMA controller and
relinquishes the bus and sends the Hold acknowledgement (HLDA) to
DMA controller.
• After receiving the Hold acknowledgement (HLDA), DMA controller
acknowledges I/O device (DACK) that the data transfer can be performed
and DMA controller takes the charge of the system bus and transfers the
data to or from memory.
• When the data transfer is accomplished, the DMA controller raises an interrupt to
let the processor know that the task of data transfer is finished; the
processor can then take control over the bus again and resume processing where
it left off.
• Now the DMA controller can be a separate unit that is shared by various
I/O devices, or it can also be a part of the I/O device interface.
• When the controller is instructed by the program to read, i.e.
R/W is 1, data is transferred from the memory to
the I/O device, and when it is 0, data is written from
the peripheral to the main memory. When the
chunk of data has been entirely transferred, the DMA controller is
ready to take further commands. This is
represented by setting the Done flag to 1. The
IE flag, when set, enables the DMA controller to
interrupt the processor, and the IRQ bit goes to
1 when the DMA controller has requested an interrupt.
DMAC Controller Registers

• The controller has registers for the purpose of storing the
addresses, word count, and control signals. The
processor accesses the controller registers to start
data transfer operations. There are two registers, the
address register and the word count register, to store the
memory address where the data is going to be read or
stored and the word count respectively, and a control
register to keep the status and control flags. Along
with that, there is a Read/Write bit that determines
the direction of data communication.
• Whenever a processor is requested to read or write a block of data, i.e.
transfer a block of data, it instructs the DMA controller by sending the
following information.
• The first piece of information is whether the data has to be read from memory or
written to the memory. The processor passes this information
via the read or write control lines between it and the DMA
controller's control logic unit.
• The processor also provides the starting address of/ for the data block in
the memory, from where the data block in memory has to be read or
where the data block has to be written in memory. DMA controller stores
this in its address register. It is also called the starting address register.
• The processor also sends the word count, i.e. how many words are to be
read or written. It stores this information in the data count or the word
count register.
• Also important is the address of the I/O device that wants to read or
write data. This information is stored in the data register.
Direct Memory Access Advantages and Disadvantages

Advantages:
• Transferring the data without the involvement of the processor
will speed up the read-write task.
• DMA reduces the clock cycles required to read or write a block of
data.
• Implementing DMA also reduces the overhead of the processor.
Disadvantages:
• As it is a hardware unit, it would cost to implement a DMA
controller in the system.
• Cache coherence problem can occur while using DMA controller.
Key points
• DMA is an abbreviation of direct memory access.
• DMA is a method of data transfer between main memory and peripheral devices.
• The hardware unit that controls the DMA transfer is a DMA controller.
• DMA controller transfers the data to and from memory without the participation of the processor.
• The processor provides the DMA controller with the start address and the word count of the data block to be
transferred to or from memory, and frees the bus for the DMA controller to transfer the
block of data.
• The DMA controller transfers the data block at a faster rate, as data is directly accessed by I/O devices
and is not required to pass through the processor, which saves clock cycles.
• DMA controller transfers the block of data to and from memory in three modes burst mode, cycle
steal mode and transparent mode.
• DMA can be configured in various ways it can be a part of individual I/O devices, or all the
peripherals attached to the system may share the same DMA controller.
• Thus the DMA controller is a convenient mode of data transfer. It is preferred over the programmed
I/O and Interrupt-driven I/O mode of data transfer.
DMA Advantages

• Fast Data transfer between Memory and I/O


devices
• CPU and DMA can operate concurrently and
provides better performance
• More Efficient use of external interrupts
• Higher data throughput
• I/O devices and memory devices
communicate directly by DMA controller.
Interrupts

Data transfer between the CPU and the peripherals is initiated


by the CPU. But the CPU cannot start the transfer unless the
peripheral is ready to communicate with the CPU.
When a device is ready to communicate with the CPU, it
generates an interrupt signal.
A number of input-output devices are attached to the
computer and each device is able to generate an interrupt
request.
The main job of the interrupt system is to identify the source
of the interrupt.
There is also a possibility that several devices will request
simultaneously for CPU communication. Then, the interrupt
system has to decide which device is to be serviced first.
Priority Interrupt

 A priority interrupt is a system which decides the order in which
various devices that generate interrupt signals at the
same time will be serviced by the CPU.
 The system has authority to decide which conditions are allowed
to interrupt the CPU, while some other interrupt is being
serviced.
 Generally, devices with high speed transfer such as magnetic
disks are given high priority and slow devices such
as keyboards are given low priority.
 When two or more devices interrupt the computer
simultaneously, the computer services the device with the higher
priority first.
Daisy Chaining Priority
 This way of deciding the interrupt priority consists of serial connection
of all the devices which generates an interrupt signal.
 The device with the highest priority is placed at the first position
followed by lower priority devices and the device which has lowest
priority among all is placed at the last in the chain.
 In daisy chaining system all the devices are connected in a serial form.
 The interrupt line request is common to all devices.
 If any device has interrupt signal in low level state then interrupt line
goes to low level state and enables the interrupt input in the CPU.
 When there is no interrupt the interrupt line stays in high level state.
 The CPU responds to the interrupt by enabling the interrupt
acknowledge line.
 This signal is received by device 1 at its PI (priority in) input.
 The acknowledge signal passes to the next device through the PO (priority out) output only
if device 1 is not requesting an interrupt.
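• The PI/PO propagation rule can be summarised in a small C sketch; representing
each device as a struct and returning a vector address is an illustrative
simplification of the hardware logic, not a circuit description.

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    irq;        /* device is requesting an interrupt       */
    uint8_t vad;        /* vector address this device would supply */
} device_t;

/* Propagate the interrupt acknowledge down the daisy chain.
 * chain[0] is the device nearest the CPU (highest priority).
 * Returns the vector address of the highest-priority requester,
 * or -1 if no device is requesting (an illustrative convention). */
int daisy_chain_ack(const device_t chain[], int n)
{
    bool pi = true;                     /* CPU asserts acknowledge to device 1 */
    for (int i = 0; i < n; i++) {
        if (pi && chain[i].irq)
            return chain[i].vad;        /* this device is serviced first       */
        pi = pi && !chain[i].irq;       /* PO: pass ack on only if not requesting;
                                           the next device sees it as its PI   */
    }
    return -1;                          /* acknowledge fell off the chain      */
}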
Types of Interrupts - H/W & S/W

Hardware Interrupts
• When the signal for the processor comes from an external device or
hardware, the interrupt is known as a hardware interrupt.
• Maskable Interrupt
The hardware interrupts which can be delayed or ignored when a
much higher priority interrupt has occurred at the same time.
• Non-Maskable Interrupt
The hardware interrupts which cannot be delayed or ignored and
must be processed by the processor immediately.


Software Interrupts

• The interrupt that is caused by any internal system


of the computer system is known as a software
interrupt. It can also be of two types:
• Normal Interrupt
The interrupts that are caused by software
instructions are called normal software interrupts.
• Exception
Unplanned interrupts which are produced during
the execution of some program are
called exceptions, such as division by zero.
INTERRUPTS AND EXCEPTIONS

•Interrupts, Exceptions and Traps are asynchronous


changes in the control flow.
•Interrupts and Exceptions can be viewed as asynchronous
(unscheduled) procedure calls.
Exception: a change in execution caused by a condition that occurs
within the processor.
 segmentation fault (access outside program boundaries, illegal access, ...)
 bus error
 divide by 0
 overflow
 page fault(virtual memory…)
Interrupt: a change in execution caused by an external event
 devices: disk, network, keyboard, etc. clock for timesharing (multitasking)
These are useful events, must do something when they occur.
Trap: a user-requested exception
Operating system call (syscall)
Breakpoints (debugging mode)
What is a trap?
• In computing and operating systems, a trap, also known as an exception
or a fault, is typically a type of synchronous interrupt caused
by an exceptional condition (e.g., breakpoint, division by zero, invalid
memory access).
• OS moves from user mode to kernel mode.
• A trap usually results in a switch to kernel mode, wherein the operating
system performs some action before returning control to the
originating process.
• A trap in a system process is more serious than a trap in a user process,
and in some systems is fatal.
• In some usages, the term trap refers specifically to an interrupt
intended to initiate a context switch to a monitor program or debugger
What is a break point?
• A breakpoint is an intentional stopping or pausing place in a 
program, put in place for  debugging  purposes.
• It is also sometimes simply referred to as a pause.
• A breakpoint is a means of acquiring knowledge about a
program during its execution.
• During the interruption, the programmer inspects the test 
environment (general purpose registers, memory, logs, files,
etc.) to find out whether the program is functioning as
expected.
• In practice, a breakpoint consists of one or more conditions
that determine when a program's execution should be
interrupted.
I/O DEVICE INTERFACES

• The processor bus is the bus defined by the signals on the processor chip itself.
• Devices that require a very high-speed connection to the processor, such as
the main memory, may be connected directly to this bus.
• Some standards have been developed through industrial cooperative efforts,
even among competing companies driven by their common self-interest in
having compatible products.
• In some cases, organizations such as the IEEE (Institute of Electrical and
Electronics Engineers), ANSI (American National Standards Institute), or
international bodies such as ISO (International Standards Organization) have
blessed these standards and given them an official status.
SCSI(SMALL COMPUTER SYSTEM INTERFACE)

• It refers to a standard bus defined by the American


National Standards Institute (ANSI) under the
designation X3.131 .
• In the original specifications of the standard, devices
such as disks are connected to a computer via a 50-
wire cable, which can be up to 25 meters in length
and can transfer data at rates up to 5 megabytes/s
• SCSI (Small Computer System Interface) is a
processor-independent standard for a system-
level interface used between computers and
intelligent devices (including hard disks, floppy
drives, optical drives, printers, scanners, and so
on).
• It is an intelligent and universal interface
standard.
SCSI interface
Characteristics of SCSI

• SCSI is a universal interface. On the SCSI bus, the host adapter can
be connected with up to 8 SCSI peripheral controllers. Peripherals include
disk, tape, CD-ROM, rewriteable optical drive, printer, scanner,
communication equipment, etc.
• SCSI is a multi-task interface with a bus arbitration function. Multiple
peripherals hung on one SCSI bus can work simultaneously, and
SCSI devices have equal possession of the bus.
• The connecting cable can reach 6 meters when the SCSI interface is
connected to external devices.
• The SCSI interface can transmit data synchronously and asynchronously.
The synchronous transmission rate reaches 10 MB/s, and the
asynchronous transmission rate reaches 1.5 MB/s.
• The SCSI bus standard has undergone many revisions,
and its data transfer capability has increased very
rapidly, almost doubling every two years.
• A SCSI bus may have eight data lines, in which case it
is called a narrow bus and transfers data one byte at a
time.
• Alternatively, a wide SCSI bus has 16 data lines and
transfers data 16 bits at a time.
• There are also several options for the electrical signaling
scheme used.
USB (Universal Serial Bus)
• USB supports 3 speeds of operation:
1) Low speed (1.5 Mb/s)
2) Full speed (12 Mb/s)
3) High speed (480 Mb/s)

The USB has been designed to meet the following key objectives:
• It provides a simple, low-cost & easy-to-use interconnection system that
overcomes the difficulties due to the limited number of I/O ports
available on a computer.
• It accommodates a wide range of data transfer characteristics for I/O
devices, including telephone & Internet connections.
• It enhances user convenience through a 'Plug & Play' mode of operation.

USB
UNIT – 5

CHAPTER-2

PIPELINING AND VECTOR PROCESSING
Pipelining and Vector Processing:
•Basic concepts
•Instruction level Parallelism
• Throughput and Speedup
•Pipeline hazards
•Case Study - Introduction to x86 architecture, i.e. the 8086
(micro)processor architecture
Parallel processing
• Parallel processing is a term used for a large class of techniques
that are used to provide simultaneous data-processing tasks for
the purpose of increasing the computational speed of a
computer system.
• It refers to techniques that are used to provide simultaneous
data processing.
• The system may have two or more ALUs so as to be able to execute
two or more instructions at the same time.
• The system may have two or more processors operating
concurrently.
• It can be achieved by having multiple functional units that
perform same or different operation simultaneously.
Example of parallel Processing:
• Multiple Functional Unit: Separate the execution unit into eight
functional units operating in parallel.
Processor with Multiple Functional UNITS
• There are a variety of ways in which parallel
processing can be classified:
o Internal organization of the processor
o Interconnection structure between processors
o Flow of information through the system
• Michael J. Flynn's classification is
based on the multiplicity of instruction streams and data
streams.
Instruction Stream
Sequence of Instructions read from memory
Data Stream
Operations performed on the data in the processor
• Single Instruction, Single Data (SISD): This is just a standard non-parallel
processor. We usually refer to this as a scalar processor.
• Single Instruction, Multiple Data (SIMD): A single operation (task)
executes simultaneously on multiple elements of data. The number of
elements in a SIMD operation can vary from a small number, such as the
4 to 16 elements in short vector instructions, to thousands, as in
streaming vector processors.
• SIMD processors are also known as array/vector processors, since they
consist of an array of functional units with a shared controller.
• Multiple Instruction, Multiple Data (MIMD): Separate instruction
streams, each with its own flow of control, operate on separate data.
FLYNN'S 4 CLASSIFICATIONS

 SISD represents an organization containing a single
control unit, a processor unit and a memory unit.
 Instructions are executed sequentially and the system
may or may not have internal parallel processing
capabilities.
 SIMD represents an organization that includes many
processing units under the supervision of a common
control unit.
 MISD structure is of only theoretical interest since no
practical system has been constructed using this
organization.
 MIMD organization refers to a computer system capable
of processing several programs at the same time.
SIMD organization --- Array/Vector Processing

• Vector?
• A vector, in programming, is a type of array
that is one-dimensional.
• Vectors are logical elements that are used
for storing data.
• Vectors are similar to arrays, but their actual
implementation and operation differ.
• A vector processor or array processor is a 
central processing unit (CPU) that implements
an instruction set where its instructions are
designed to operate efficiently and effectively
on large one-dimensional arrays of data
called vectors.
• This is in contrast to scalar processors, whose
instructions operate on single data items only.
• SISD for Scalar processing.
SIMD
MISD
MIMD
Non-overlapped execution: sequential fashion, total = 8 clock cycles

Stage \ Cycle |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8
S1            | I1 |    |    |    | I2 |    |    |
S2            |    | I1 |    |    |    | I2 |    |
S3            |    |    | I1 |    |    |    | I2 |
S4            |    |    |    | I1 |    |    |    | I2
Overlapped execution: total time = 5 cycles

Stage \ Cycle |  1 |  2 |  3 |  4 |  5
S1            | I1 | I2 |    |    |
S2            |    | I1 | I2 |    |
S3            |    |    | I1 | I2 |
S4            |    |    |    | I1 | I2
PIPELINING
• Pipelining is the process of accumulating instructions
from the processor through a pipeline.

• It allows storing and executing instructions in an

orderly process. It is also known as pipeline processing.

• Pipelining is a technique where multiple instructions are

overlapped during execution.

• Pipeline is divided into stages and these stages are

connected with one another to form a pipe like

structure.
• Instructions enter from one end and exit from another

end. Pipelining increases the overall instruction

throughput.

• In a pipelined system, each segment consists of an input
register followed by a combinational circuit.
• The register is used to hold data, and the combinational
circuit performs operations on it.
• The output of the combinational circuit is applied to the
input register of the next segment.


S1,S2,S3 are Segments
R1,R2,R3 are registers
Instruction pipeline phases (example: a 5-phase pipeline):
- basic: 2 phases - fetch + execute
- 3 phases: fetch + execute + store
- 4 phases: fetch + decode + execute + store
- 5 phases: fetch + decode + operand fetch + execute + store
- 6 phases: fetch + decode + operand fetch + execute + store + write back
Example of Pipelining Process
• Performance of Pipelined Execution-

• The following parameters serve as criteria to
estimate the performance of pipelined
execution-
- Speed Up
- Efficiency
- Throughput
1. Speed Up-

• It gives an idea of "how much faster" the
pipelined execution is as compared to non-
pipelined execution.
• It is calculated as-
Speed up = Non-pipelined execution time /
           Pipelined execution time
2. Efficiency-

• The efficiency of pipelined execution is
calculated as-
Efficiency = Speed up / Number of pipeline stages (k)
           = n / (k + n – 1)
3. Throughput-

• Throughput is defined as the number of
instructions executed per unit time.
• It is calculated as-
Throughput = Number of instructions / Total time taken
           = n / (k + n – 1) clock cycles
• Calculating Non-Pipelined Execution Time-

• In non-pipelined architecture,
The instructions execute one after the other.
• The execution of a new instruction begins only after the previous
instruction has executed completely.
• So, number of clock cycles taken by each instruction = k clock cycles

Thus,
• Non-pipelined execution time
= Total number of instructions x Time taken to execute one
instruction
= n x k clock cycles
• Calculating Pipelined Execution Time-

• In pipelined architecture,
multiple instructions execute in parallel.
• Number of clock cycles taken by the first instruction = k clock cycles
• After first instruction has completely executed, one instruction comes out per clock cycle.
• So, number of clock cycles taken by each remaining instruction = 1 clock cycle

Thus,
• Pipelined execution time
= Time taken to execute first instruction + Time taken to execute remaining
instructions
= 1 x k clock cycles + (n-1) x 1 clock cycle
= (k + n – 1) clock cycles
• Calculating Speed Up-
• Speed up
= Non-pipelined execution time / Pipelined
execution time
= n x k clock cycles / (k + n – 1) clock cycles
= n x k / (k + n – 1)
= n x k / (n + (k – 1))
= k / { 1 + (k – 1)/n }
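• A small C helper that evaluates these formulas, useful for checking hand
calculations; it assumes each of the k stages takes one clock cycle, as the
derivation above does.

#include <stdio.h>

/* k = number of pipeline stages, n = number of instructions,
 * one clock cycle per stage (as assumed in the derivation above). */
static double speedup(unsigned k, unsigned n)
{
    double non_pipelined = (double)n * k;        /* n x k cycles       */
    double pipelined     = (double)k + n - 1;    /* (k + n - 1) cycles */
    return non_pipelined / pipelined;
}

int main(void)
{
    /* Example: a 4-stage pipeline executing 100 instructions. */
    printf("speedup = %.2f\n", speedup(4, 100));  /* 400 / 103 = 3.88 */
    /* For large n the speedup approaches k (here, 4). */
    return 0;
}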
Example for Pipelined, Non-Pipelined & Speed up.
Instruction Pipeline

• Pipeline processing can occur not only in the data stream but
in the instruction stream as well.
• Most digital computers with complex instructions
require an instruction pipeline to carry out operations like fetching,
decoding and executing instructions.
• In general, the computer needs to process each instruction
with the following sequence of steps.
1)Fetch instruction from memory.
2)Decode the instruction.
3)Calculate the effective address.
4)Fetch the operands from memory.
5)Execute the instruction.
6)Store the result in the proper place.
6-stage pipeline: 14 cycles instead of 54 (9 x 6) cycles
• Each step is executed in a particular segment, and there are
times when different segments may take different times to
operate on the incoming information.
• Moreover, there are times when two or more segments may
require memory access at the same time, causing one segment
to wait until another is finished with the memory.
• The organization of an instruction pipeline will be more efficient
if the instruction cycle is divided into segments of equal
duration.
• One of the most common examples of this type of organization
is a Four-segment instruction pipeline.
• A four-segment instruction pipeline combines two or more of the
different segments into a single one.
• For instance, the decoding of the instruction can be combined
with the calculation of the effective address into one segment.
• The instruction cycle is completed in four segments.
Segment 1:
• The instruction fetch segment can be implemented
using first in, first out (FIFO) buffer.
Segment 2:
• The instruction fetched from memory is decoded in the
second segment, and eventually, the effective address
is calculated in a separate arithmetic circuit.
Segment 3:
• An operand from memory is fetched in the third
segment.
Segment 4:
• The instructions are finally executed in the last
segment of the pipeline organization.
Instruction-Level Parallelism
Pipeline Hazards:
• Hazards are situations or problems in the instruction pipeline of CPU micro-
architectures when the next instruction cannot execute in the following clock
cycle; they can potentially lead to incorrect results.
• Hazards reduce the performance from the ideal speedup gained by
pipelining.
• A pipeline stall is also referred to as a pipeline bubble.
• There are three types of hazards:
- Resource (or Structural) hazards,
- Data hazards,
- Control hazards.
Data Hazards:
• A data hazard is any condition in which either the source or the
destination operands of an instruction are not available at the time
expected in the pipeline.
• As a result of which some operation has to be delayed and the pipeline
stalls.
• Whenever there are two instructions one of which depends on the data
obtained from the other.
A=3+A
B=A*4
• For the above sequence, the second instruction needs the value of ‘A’
computed in the first instruction.
• Thus the second instruction is said to depend on the first.
• If the execution is done in a pipelined processor, it is highly likely that
the interleaving of these two instructions can lead to incorrect results
due to data dependency between the instructions. Thus the pipeline
needs to be stalled as and when necessary to avoid errors.
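• A minimal sketch of how a pipeline control unit might detect such a
read-after-write (RAW) dependency between two adjacent instructions; the
instruction encoding used here is an invented simplification for illustration.

#include <stdbool.h>

/* Simplified instruction: one destination register and two source registers.
 * (Real encodings differ; this only shows the hazard check itself.) */
typedef struct {
    int dest;    /* register written by the instruction */
    int src1;    /* registers read by the instruction   */
    int src2;
} instr_t;

/* RAW hazard: the later instruction reads a register that the earlier
 * instruction has not yet written back. The pipeline must stall (insert
 * a bubble) or forward the result to avoid using a stale value. */
bool raw_hazard(const instr_t *earlier, const instr_t *later)
{
    return later->src1 == earlier->dest || later->src2 == earlier->dest;
}

/* Example from the slide: A = 3 + A; B = A * 4.
 * If A lives in register 1 and B in register 2 (and -1 marks an unused source):
 *   i1 = { .dest = 1, .src1 = 1, .src2 = -1 }
 *   i2 = { .dest = 2, .src1 = 1, .src2 = -1 }
 * raw_hazard(&i1, &i2) is true, so the second instruction must wait. */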
8086 Microprocessor

• Definition: 8086 is a 16-bit microprocessor and was designed


in 1978 by Intel.
• It is a 40 pin chip.
• The 8086 microprocessor has a 20-bit address bus. Thus, it is able to
access 2^20 addresses, i.e. 1 MB of memory.
• Purpose: It performs arithmetic and logic operations.
• It is able to perform these operations on 16-bit data in one
cycle. Hence it is a 16-bit microprocessor.
• Thus the size of the data bus is 16 bits, as it can carry 16-bit
data at a time. The architecture of the 8086 microprocessor is very
much different from that of the 8085 microprocessor.
Architecture of 8086:

The 8086 has two blocks: 1) the BIU (Bus Interface Unit) and 2) the EU (Execution Unit).
• 1) The BIU performs all bus operations such as instruction fetching, reading and
writing operands for memory, and calculating the addresses of the memory operands. It
has an instruction queue.
• The instruction bytes are transferred to the instruction queue.
• 2) The EU executes instructions from the instruction byte queue.
• Both units operate asynchronously to give the 8086 an overlapping instruction fetch and
execution mechanism, which is called pipelining.
• The BIU contains the instruction queue, segment registers, instruction pointer, and address
adder.
• The EU contains the control circuitry, instruction decoder, ALU, pointer and index registers, and flag
register.
Bus Interface Unit (BIU)

• The Bus Interface Unit (BIU) manages the data, address and control
buses.
• The BIU functions are..
1)Fetches the sequenced instruction from the memory.
2)Finds the physical address of that location in the memory where the
instruction is stored and
3)Manages the 6-byte pre-fetch queue where the pipelined instructions
are stored.
• An 8086 microprocessor exhibits a property of pipelining the instructions
in a queue while performing decoding and execution of the previous
instruction.
• This saves the processor time of operation by a large amount. This
pipelining is done in a 6-byte queue.
• Also, the BIU contains 4 segment registers. Each segment register is of
16-bit. The segments are present in the memory and these registers
hold the address of all the segments. These registers are as follows:
1.Code segment register: It is a 16-bit register and holds the address of
the instruction or program stored in the code segment of the memory.
Also, the IP in the block diagram is the instruction pointer which is a
default register that is used by the processor in order to get the desired
instruction.
2. Stack segment register: The stack segment register provides the
starting address of stack segment in the memory. Like in stack pointer,
PUSH and POP operations are used in this segment to give and take the
data to/from it.
3. Data segment register: It holds the address of the data segment. The
data segment stores the data in the memory whose address is present in
this 16-bit register.
4. Extra segment register: Here the starting address of the extra segment is
present. This register basically contains the address of the string data.
6-byte pre-fetch queue: This queue is used in 8086 in order to perform
pipelining.
As at the time of decoding and execution of the instruction in EU, the BIU
fetches the sequential upcoming instructions and stores it in this queue.
• The size of this queue is 6 bytes. This means that at most 6 bytes of
upcoming instructions can be stored in this queue.
• The queue exhibits FIFO (first in, first out) behaviour.
Execution Unit (EU)
• The Execution Unit (EU) performs the decoding and execution of the
instructions that are being fetched from the desired memory location.
• Control Unit:
The control unit in 8086 microprocessor produces control signal after
decoding the opcode to inform the general purpose register to release
the value stored in it. And it also signals the ALU to perform the desired
operation.
• ALU:
• The arithmetic and logic unit carries out the logical tasks according to
the signal generated by the CU. The result of the operation is stored in
the desired register.
• Flag:
• Like in 8085, here also the flag register holds the status
of the result generated by the ALU. It has several flags
that show the different conditions of the result.
• Operand:
• It is a temporary register and is used by the processor
to hold the temporary values at the time of operation.
• The reason behind two separate sections for BIU and
EU in the architecture of 8086 is to perform fetching
and decoding-executing simultaneously.
Working of 8086 Microprocessor

• When an instruction is to be fetched from
memory, firstly its physical address must be
calculated, and this is done in the BIU.
• The physical address (PA) of an instruction is
given as:
• PA = Segment address x 10H + Offset
• For example: suppose the segment address is
0A00H and the offset address is 0100H. Then the
generated physical address is 0A100H.
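• The calculation can be written as a one-line C helper; note that multiplying the
16-bit segment value by 10H is the same as shifting it left by 4 bits.

#include <stdint.h>
#include <stdio.h>

/* 8086 physical address: 16-bit segment x 10H + 16-bit offset -> 20-bit address. */
static uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;   /* segment x 10H + offset */
}

int main(void)
{
    /* Example from the slide: segment 0A00H, offset 0100H -> 0A100H. */
    printf("%05X\n", physical_address(0x0A00, 0x0100));   /* prints 0A100 */
    return 0;
}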
Physical address calculation
• Here, the code segment register provides the base address of the code
segment which is combined with the offset address.
• The code segment contains the instructions. Each time an instruction is
fetched the offset address inside the code segment gets incremented.
• So, once the physical address of an instruction is calculated by the BIU of
the processor, it sends the memory location by the address bus to the
memory.
• Further, the desired instruction at that memory location which is present in
the form of the opcode is fetched by the microprocessor through the data
bus.
• Suppose the instruction is ADD BL, CL. But, inside the memory, it will be in
the form of an opcode. So, this opcode is sent to the control unit.
• The control unit decodes the opcode and generates control signals that
inform the BL and CL register to release the value stored in it. Also, it
signals the ALU to perform the ADD operation on that particular data.
Decode &Execution by EU
Tutorial Questions
1)Explain about different modes of data transfer?
2)Explain about DMA controller with neat diagram?
3)What is a pipeline? Explain pipeline hazards?
4)Describe IO processor organization in detail.
5)Explain asynchronous communication interface with
neat diagram.
6)Explain a)Arithmetic pipeline b)Instruction pipeline.
7)Explain about 8086 architecture?
8)What is a Vector processing /array processing ?
