Modes of Transfer 3-7-21
• Peripherals are not connected directly to the system bus; instead, an I/O module is used,
which contains the logic for communication between the peripherals and the
system bus.
• The reasons why peripherals are not connected directly to the system bus are:
-There are a wide variety of peripherals with various methods of operation. It would be
impractical to incorporate the necessary logic within the processor to control a range of
devices.
-The data transfer rate of peripherals is often much slower than that of the memory or
processor. Thus, it is impractical to use the high-speed system bus to communicate directly
with a peripheral, and vice versa.
-Peripherals often use different data formats and word lengths than the computer to
which they are connected.
Thus, an I/O module is required, which performs two major functions:
• Interface to the processor and memory via the system bus
• Interface to one or more peripherals by tailored data links
I/O Modules (Cont'd)
The detailed functions of I/O modules are:
2) Processor Communication:
The I/O module communicates with the processor, which involves:
- Command decoding: The I/O module accepts commands
from the processor.
- Data: Data are exchanged between the processor and
the I/O module over the bus.
- Status reporting: Because peripherals are slow, it is
important to know the status of the I/O module.
- Address recognition: The I/O module must recognize one
unique address for each peripheral it controls.
I/O Interface
• The I/O bus is used for information transfers between the CPU and I/O devices through
their I/O interfaces. Three configurations are possible:
(1) Use two separate buses, one to communicate with memory and
the other with I/O interfaces.
(2) Use one common bus for memory and I/O but separate control
lines for each.
(3) Use one common bus for memory and I/O with common control lines.
• The I/O bus consists of data lines, address lines and
control lines.
• The I/O bus from the processor is attached to all
peripheral interfaces.
• To communicate with a particular device, the processor
places a device address on the address lines.
• Each interface decodes the address and control information received
from the I/O bus, interprets them for the peripheral and
provides signals for the peripheral controller.
• It also synchronizes the data flow and supervises the
transfer between the peripheral and the processor.
• Each peripheral has its own controller.
• For example, the printer controller controls the paper
motion, the print timing, etc.
Isolated vs. Memory Mapped I/O
Isolated I/O
The CPU assigns the task of initiating I/O operations, but it is the IOP that
executes the instructions, not the CPU.
CPU instructions provide operations to start an I/O transfer. The IOP
requests the attention of the CPU through an interrupt.
CPU-IOP Communication
I/O TRANSFERS
HANDSHAKING
Problems in Strobe Methods
Source-Initiated
The source unit that initiates the transfer has no way of knowing
whether the destination unit has actually received the data.
Destination-Initiated
The destination unit that initiates the transfer has no way of knowing
whether the source unit has actually placed the data on the bus.
[Timing diagrams: strobe and handshaking transfers, showing the data bus carrying valid data and the data-valid and data-accepted handshake signals]
[Asynchronous serial frame: Start bit (1 bit), character bits, Stop bit (at least 1 bit); example line levels: 1 1 0 0 0 1 0 1]
A character can be detected by the receiver from the knowledge
of 4 rules:
- When data are not being sent, the line is kept in the 1-state (idle state)
- The initiation of a character transmission is detected
by a Start Bit , which is always a 0
- The character bits always follow the Start Bit
- After the last character bit, a Stop Bit is detected
when the line returns to the 1-state for at least 1 bit
time
The receiver knows in advance the transfer rate of the
bits and the number of information bits to expect.
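The four framing rules above can be sketched in code. This is a minimal illustration, assuming the common 8-data-bit, 1-start-bit, 1-stop-bit format with no parity; the function names are not from the original slides.

```python
# Sketch of asynchronous serial framing: start bit (0), 8 data bits
# sent LSB first, stop bit (1). The idle line level is 1.

def frame_character(byte):
    """Return the line levels for one character, LSB first."""
    bits = [0]                                   # start bit: always 0
    bits += [(byte >> i) & 1 for i in range(8)]  # 8 data bits, LSB first
    bits += [1]                                  # stop bit: line returns to 1
    return bits

def deframe_character(bits):
    """Recover the byte, checking the start and stop bits."""
    assert bits[0] == 0, "missing start bit"
    assert bits[9] == 1, "missing stop bit"
    return sum(b << i for i, b in enumerate(bits[1:9]))

levels = frame_character(0x41)               # ASCII 'A'
print(levels)                                # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(hex(deframe_character(levels)))        # 0x41
```

Because the receiver knows the bit rate in advance, it only needs to detect the 1-to-0 transition of the start bit and then sample at bit-time intervals.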
UNIVERSAL ASYNCHRONOUS RECEIVER-TRANSMITTER
- UART -
A typical asynchronous communication interface available as an IC
[Block diagram: bidirectional data bus buffers connect to an internal bus; a transmitter register feeds a transmitter shift register (serial transmit data out), a receiver shift register feeds a receiver register (serial receive data in); control register, status register, and timing-and-control logic driven by the transmitter and receiver clocks. Chip select (CS), register select (RS), I/O read (RD) and I/O write (WR) select the register:]

CS  RS  Oper.  Register selected
0   x   x      None
1   0   WR     Transmitter register
1   1   WR     Control register
1   0   RD     Receiver register
1   1   RD     Status register
Transmitter Register
- Accepts a data byte (from the CPU) through the data bus
- The byte is transferred to a shift register for serial transmission
Receiver
- Receives serial information into another shift
register
- The complete data byte is sent to the receiver register
Status Register Bits
-Used for I/O flags and for recording errors
Control Register Bits
- Define the baud rate, the number of bits in each
character, whether to generate and check parity, and the
number of stop bits
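The chip-select/register-select decoding shown in the table can be sketched as a small lookup. This is an illustrative model only; the string names stand in for the hardware registers.

```python
# Sketch of the UART register-select logic: CS (chip select), RS
# (register select) and the operation (RD/WR) pick one register.

def select_register(cs, rs, oper):
    """Map CS, RS and the RD/WR operation to the selected register."""
    if cs == 0:
        return None                      # chip not selected: no register
    table = {
        (0, "WR"): "transmitter register",
        (1, "WR"): "control register",
        (0, "RD"): "receiver register",
        (1, "RD"): "status register",
    }
    return table[(rs, oper)]

print(select_register(1, 0, "WR"))       # transmitter register
print(select_register(0, 1, "RD"))       # None (chip deselected)
```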
• First In First Out Buffer (FIFO):
• A First In First Out (FIFO) buffer is a memory unit that stores information in such a
manner that the first item in is the first item out. It has separate input and
output terminals.
• The important feature of this buffer is that it can input data and output data at two
different rates.
• When placed between two units, the FIFO can accept data from the source unit at one
rate of transfer and deliver the data to the destination unit at another rate.
• If the source is faster than the destination, the FIFO is useful for source data that
arrive in bursts and fill the buffer. FIFO is useful in some applications when data are
transferred asynchronously.
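A minimal sketch of the FIFO idea described above: the source fills the buffer in a burst at its own rate, and the destination later drains it at another rate, with ordering preserved. The class and capacity here are illustrative, not from the slides.

```python
from collections import deque

# Sketch of a FIFO buffer between a fast source and a slow destination.

class FIFOBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def put(self, item):
        if len(self.items) >= self.capacity:
            raise OverflowError("buffer full")   # source must wait
        self.items.append(item)

    def get(self):
        return self.items.popleft()              # first in, first out

fifo = FIFOBuffer(capacity=4)
for word in ["A", "B", "C"]:                     # source burst (fast rate)
    fifo.put(word)
print(fifo.get(), fifo.get(), fifo.get())        # destination drains: A B C
```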
MODES OF TRANSFER
• Programmed I/O operations are the result of I/O instructions written in the
computer program.
• Each data item transfer is initiated by an instruction in the program. Usually,
the transfer is to and from a CPU register and peripheral.
• Transferring data under program control requires constant monitoring of
the peripheral by the CPU.
• Once a data transfer is initiated, the CPU is required to monitor the
interface to see when a transfer can again be made.
• It is up to the programmed instructions executed in the CPU to keep close
tabs on everything that is taking place in the interface unit and the I/O
device.
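The constant monitoring described above can be sketched as a busy-wait loop. The device model below is invented purely for illustration; real programmed I/O would read a status flag from an interface register.

```python
import random

# Sketch of programmed I/O: the CPU polls the interface's flag bit
# before every transfer, doing nothing useful while it waits.

class Interface:
    def __init__(self, data):
        self.data = iter(data)

    def ready(self):
        return random.random() < 0.5     # device becomes ready eventually

    def read(self):
        return next(self.data)

def programmed_io_read(interface, count):
    received = []
    for _ in range(count):
        while not interface.ready():     # CPU busy-waits on the flag
            pass
        received.append(interface.read())
    return received

print(programmed_io_read(Interface([1, 2, 3]), 3))   # [1, 2, 3]
```

The `while not interface.ready()` loop is exactly the "keeping close tabs" cost the slides mention: the CPU is occupied even though no data is moving.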
Mode 1: Programmed I/O (Cont'd)
Example 1: Programmed I/O
Example 2: Data transfer from I/O to memory
Mode 2: Interrupt-Initiated I/O :
o Vectored Interrupt
o Non-vectored Interrupt
• In a vectored interrupt, the source that interrupts the CPU provides the branch
information. This information is called the interrupt vector.
• In a non-vectored interrupt, the branch address is assigned to a fixed
address in memory.
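The distinction can be sketched as follows. This is a conceptual model only; the vector numbers and handler names are invented, and real hardware dispatches to addresses, not Python functions.

```python
# Sketch: vectored vs. non-vectored interrupt dispatch.

def keyboard_isr():
    return "keyboard handler"

def disk_isr():
    return "disk handler"

# Vectored: the interrupting device supplies a vector that directly
# indexes a table of service routines.
vector_table = {0x08: keyboard_isr, 0x0E: disk_isr}

def handle_vectored(vector):
    return vector_table[vector]()

# Non-vectored: every interrupt branches to one fixed routine, which
# must poll the devices to find the requester.
def handle_non_vectored(requesters):
    for device, isr in [("keyboard", keyboard_isr), ("disk", disk_isr)]:
        if device in requesters:
            return isr()

print(handle_vectored(0x0E))             # disk handler
print(handle_non_vectored({"keyboard"})) # keyboard handler
```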
Direct Memory Access (DMA)
DMA transfers blocks of data between the memory and the peripheral
devices of the system without the participation of the processor.
DMA is controlled by a DMA controller.
DMA controller:
The unit that controls the activity of accessing memory directly is
called a DMA controller.
The processor relinquishes (gives up) the system bus for a few clock
cycles, so the DMA controller can accomplish the task of data
transfer via the system bus.
DMA Introduction
• In this mode, a DMA controller substitutes for the CPU and is
responsible for accessing the input-output devices and
memory for transferring data.
• A DMA controller is dedicated hardware that performs read
and write operations directly, without the involvement of the
CPU, and saves the time that opcode fetching, decoding,
address incrementing and source/destination address tests
would otherwise cost the CPU.
• This leads to high data transfer rates between the
peripherals and the memory and communicates large blocks
of data speedily.
Role of DMA
Why it is used?
Advantages:
• Transferring the data without the involvement of the processor
will speed up the read-write task.
• DMA reduces the clock cycles required to read or write a block of
data.
• Implementing DMA also reduces the overhead of the processor.
Disadvantages:
• As it is a hardware unit, it would cost to implement a DMA
controller in the system.
• Cache coherence problem can occur while using DMA controller.
Key points
• DMA is an abbreviation of direct memory access.
• DMA is a method of data transfer between main memory and peripheral devices.
• The hardware unit that controls the DMA transfer is a DMA controller.
• DMA controller transfers the data to and from memory without the participation of the processor.
• The processor provides the start address and the word count of the data block which is transferred
to or from memory to the DMA controller and frees the bus for DMA controller to transfer the
block of data.
• The DMA controller transfers the data block at a faster rate, as data are accessed directly by the I/O
devices and are not required to pass through the processor, which saves clock cycles.
• The DMA controller transfers the block of data to and from memory in three modes: burst mode,
cycle-steal mode and transparent mode.
• DMA can be configured in various ways: it can be a part of individual I/O devices, or all the
peripherals attached to the system may share the same DMA controller.
• Thus the DMA controller is a convenient mode of data transfer. It is preferred over the programmed
I/O and interrupt-driven I/O modes of data transfer.
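The CPU-side setup described in the key points (providing the start address and word count, then freeing the bus) can be sketched as below. The register names, memory model and burst-mode behaviour are invented for illustration; real DMA controllers differ widely.

```python
# Hedged sketch of CPU/DMA-controller interaction: the CPU only
# programs the start address and word count; the controller then moves
# the whole block (burst mode) without further CPU involvement.

class DMAController:
    def __init__(self, memory):
        self.memory = memory

    def program(self, start_address, word_count):
        self.start_address = start_address   # where the block begins
        self.word_count = word_count         # how many words to move

    def burst_transfer(self, device_words):
        # Burst mode: seize the bus once and move the entire block.
        for i, word in enumerate(device_words[:self.word_count]):
            self.memory[self.start_address + i] = word

memory = [0] * 16
dma = DMAController(memory)
dma.program(start_address=4, word_count=3)   # the CPU's only job
dma.burst_transfer([10, 20, 30])             # controller does the transfer
print(memory[4:7])                           # [10, 20, 30]
```

In cycle-steal mode the same transfer would happen one word at a time, stealing single bus cycles between CPU accesses instead of holding the bus for the whole block.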
DMA Advantages
Hardware Interrupts
•Maskable Interrupt
•Non-maskable Interrupt
• The processor bus is the bus defined by the signals on the processor chip itself.
• Devices that require a very high-speed connection to the processor, such as
the main memory, may be connected directly to this bus.
• Some standards have been developed through industrial cooperative efforts,
even among competing companies driven by their common self-interest in
having compatible products.
• In some cases, organizations such as the IEEE (Institute of Electrical and
Electronics Engineers), ANSI (American National Standards Institute), or
international bodies such as the ISO (International Organization for Standardization) have
blessed these standards and given them official status.
SCSI(SMALL COMPUTER SYSTEM INTERFACE)
• SCSI is a universal interface. On the SCSI bus, the host adapter can
be connected to 8 SCSI peripheral controllers. Peripherals include
disks, tapes, CD-ROMs, rewritable optical drives, printers, scanners,
communication equipment, etc.
• SCSI is a multi-task interface with bus arbitration function. Multiple
peripherals hung on one SCSI bus can work simultaneously. And
SCSI devices have equal possession of the bus.
• The connecting cable can reach 6 meters when the SCSI interface is
connected to external devices.
• The SCSI interface can transmit data synchronously or asynchronously.
The synchronous transfer rate reaches 10 MB/s, and the
asynchronous transfer rate reaches 1.5 MB/s.
• The SCSI bus standard has undergone many revisions,
and its data transfer capability has increased very
rapidly, almost doubling every two years.
• A SCSI bus may have eight data lines, in which case it
is called a narrow bus and transfers data one byte at a
time.
• Alternatively, a wide SCSI bus has 16 data lines and
transfers data 16 bits at a time.
• There are also several options for the electrical signaling
scheme used.
USB (Universal Serial Bus)
• USB supports 3 speeds of operation: low speed (1.5 Mb/s), full speed (12 Mb/s) and high speed (480 Mb/s).
The USB has been designed to meet several key objectives:
• It provides a simple, low-cost and easy-to-use interconnection system that is
available on any computer.
CHAPTER-2
Pipelining and Vector Processing:
•Basic concepts
•Instruction level Parallelism
• Throughput and Speedup
•Pipeline hazards
•Case Study: Introduction to the x86 (8086)
microprocessor architecture
Parallel processing
• Parallel processing is a term used for a large class of techniques
that provide simultaneous data-processing tasks for
the purpose of increasing the computational speed of a
computer system.
• The system may have two or more ALUs so that it can execute
two or more instructions at the same time.
• The system may have two or more processors operating
concurrently.
• It can be achieved by having multiple functional units that
perform same or different operation simultaneously.
Example of parallel Processing:
• Multiple Functional Unit: Separate the execution unit into eight
functional units operating in parallel.
Processor with Multiple Functional UNITS
• There are a variety of ways in which parallel
processing can be classified:
o Internal organization of the processor
o Interconnection structure between processors
o Flow of information through the system
• Michael J. Flynn's classification
Based on the multiplicity of Instruction Streams and Data
Streams
Instruction Stream
Sequence of Instructions read from memory
Data Stream
Operations performed on the data in the processor
• Single Instruction, Single Data (SISD): This is just a standard non-parallel
processor. We usually refer to this as a scalar processor.
• Vector?
• A vector, in programming, is a type of array
that is one-dimensional.
• Vectors are a logical element used
for storing data.
• Vectors are similar to arrays, but their actual
implementation and operation differ.
• A vector processor or array processor is a
central processing unit (CPU) that implements
an instruction set where its instructions are
designed to operate efficiently and effectively
on large one-dimensional arrays of data
called vectors.
• This is in contrast to scalar processors, whose
instructions operate on single data items only.
• SISD for Scalar processing.
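The scalar-versus-vector contrast above can be sketched in code. This is a conceptual illustration only: Python has no vector instructions, so the "vector" version merely models one operation applied to whole one-dimensional arrays, as a vector processor's single vector-add instruction would.

```python
# Sketch: scalar (SISD) processing issues one add per element; a
# vector processor operates on whole vectors with one instruction.

def scalar_add(a, b):
    result = []
    for i in range(len(a)):          # one instruction per element
        result.append(a[i] + b[i])
    return result

def vector_add(a, b):
    # Conceptually a single instruction over the whole vector.
    return [x + y for x, y in zip(a, b)]

a, b = [1, 2, 3, 4], [10, 20, 30, 40]
print(scalar_add(a, b))              # [11, 22, 33, 44]
print(vector_add(a, b))              # same result, one vector operation
```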
SIMD
MISD
MIMD
Non-overlapped execution: sequential fashion, total = 8 clock cycles

Stage/Cycle   1    2    3    4    5    6    7    8
S1            I1                  I2
S2                 I1                  I2
S3                      I1                  I2
S4                           I1                  I2

Overlapped execution: total time = 5 cycles

Stage/Cycle   1    2    3    4    5
S1            I1   I2
S2                 I1   I2
S3                      I1   I2
S4                           I1   I2
PIPELINING
• Pipelining is the process of accumulating instructions and executing
them in an overlapped, assembly-line fashion.
• Instructions enter from one end and exit from the other, which
increases the overall instruction throughput.
• In non-pipelined architecture,
The instructions execute one after the other.
• The execution of a new instruction begins only after the previous
instruction has executed completely.
• So, number of clock cycles taken by each instruction = k clock cycles
Thus,
• Non-pipelined execution time
= Total number of instructions x Time taken to execute one
instruction
= n x k clock cycles
• Calculating Pipelined Execution Time-
• In pipelined architecture,
Multiple instructions execute in parallel.
• Number of clock cycles taken by the first instruction = k clock cycles
• After first instruction has completely executed, one instruction comes out per clock cycle.
• So, number of clock cycles taken by each remaining instruction = 1 clock cycle
Thus,
• Pipelined execution time
= Time taken to execute first instruction + Time taken to execute remaining
instructions
= 1 x k clock cycles + (n-1) x 1 clock cycle
= (k + n – 1) clock cycles
• Calculating Speed Up-
• Speed up
= Non-pipelined execution time / Pipelined
execution time
= n x k clock cycles / (k + n – 1) clock cycles
= n x k / (k + n – 1)
Dividing numerator and denominator by n:
= k / { 1 + (k – 1)/n }
which approaches k as n grows large.
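The formulas above can be checked numerically. The values of n and k below are an assumed example, not from the slides.

```python
# The pipelining formulas computed directly: k-stage pipeline, n
# instructions, one clock per stage.

def non_pipelined_cycles(n, k):
    return n * k                     # every instruction takes k cycles

def pipelined_cycles(n, k):
    return k + (n - 1)               # first instruction k cycles, then 1 each

def speedup(n, k):
    return non_pipelined_cycles(n, k) / pipelined_cycles(n, k)

n, k = 100, 4                        # example: 100 instructions, 4 stages
print(non_pipelined_cycles(n, k))    # 400
print(pipelined_cycles(n, k))        # 103
print(round(speedup(n, k), 2))       # 3.88, approaching k = 4 for large n
```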
Example for Pipelined, Non-Pipelined & Speed up.
Instruction Pipeline
• Pipeline processing can occur not only in the data stream but
in the instruction stream as well.
• Most of the digital computers with complex instructions
require instruction pipeline to carry out operations like fetch,
decode and execute instructions.
• In general, the computer needs to process each instruction
with the following sequence of steps.
1)Fetch instruction from memory.
2)Decode the instruction.
3)Calculate the effective address.
4)Fetch the operands from memory.
5)Execute the instruction.
6)Store the result in the proper place.
6 stage pipeline with 14 cycles instead of 54(9*6) cycles
• Each step is executed in a particular segment, and there are
times when different segments may take different times to
operate on the incoming information.
• Moreover, there are times when two or more segments may
require memory access at the same time, causing one segment
to wait until another is finished with the memory.
• The organization of an instruction pipeline will be more efficient
if the instruction cycle is divided into segments of equal
duration.
• One of the most common examples of this type of organization
is a Four-segment instruction pipeline.
• A four-segment instruction pipeline combines two or more
different segments and makes it as a single one.
• For instance, the decoding of the instruction can be combined
with the calculation of the effective address into one segment.
• The instruction cycle is completed in four segments.
Segment 1:
• The instruction fetch segment can be implemented
using first in, first out (FIFO) buffer.
Segment 2:
• The instruction fetched from memory is decoded in the
second segment, and eventually, the effective address
is calculated in a separate arithmetic circuit.
Segment 3:
• An operand from memory is fetched in the third
segment.
Segment 4:
• The instructions are finally executed in the last
segment of the pipeline organization.
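The four-segment organization above can be sketched as a small simulator that shows which instruction occupies each segment in every cycle. The segment abbreviations (FI = fetch instruction, DA = decode and calculate address, FO = fetch operand, EX = execute) are an assumed labeling, and the model ignores stalls and memory conflicts.

```python
# Sketch: an ideal (stall-free) four-segment instruction pipeline.

def simulate_pipeline(num_instructions, stages=("FI", "DA", "FO", "EX")):
    k = len(stages)
    total_cycles = k + num_instructions - 1      # k + (n - 1) cycles
    schedule = []
    for cycle in range(1, total_cycles + 1):
        row = {}
        for s, stage in enumerate(stages):
            instr = cycle - s                    # which instruction is here
            if 1 <= instr <= num_instructions:
                row[stage] = f"I{instr}"
        schedule.append((cycle, row))
    return schedule

for cycle, row in simulate_pipeline(3):
    print(cycle, row)
# cycle 1: {'FI': 'I1'} ... cycle 4: all four segments busy ... cycle 6: {'EX': 'I3'}
```

The output makes the overlap visible: from cycle 4 onward, one instruction completes per cycle, matching the (k + n - 1) total derived earlier.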
Instruction-Level Parallelism
Pipeline Hazards:
• Pipeline hazards are situations in CPU microarchitectures
where the next instruction cannot execute in the following clock
cycle, which stalls the pipeline.
The 8086 has two blocks: 1) the BIU (Bus Interface Unit) and 2) the EU (Execution Unit).
•1) The BIU performs all bus operations, such as instruction fetching, reading and
writing operands for memory, and calculating the addresses of the memory operands.
•Both units operate asynchronously to give the 8086 an overlapping instruction fetch and
execution.
•The BIU contains the instruction queue, segment registers, instruction pointer, and address
adder.
•The EU contains the control circuitry, instruction decoder, ALU, pointer and index registers, and flag
register.
Bus Interface Unit (BIU)
• The Bus Interface Unit (BIU) manages the data, address and control
buses.
• The BIU functions are..
1)Fetches the sequenced instruction from the memory.
2)Finds the physical address of that location in the memory where the
instruction is stored and
3)Manages the 6-byte pre-fetch queue where the pipelined instructions
are stored.
• The 8086 microprocessor pipelines instructions by queuing them
while the previous instruction is being decoded and executed.
• This saves a large amount of processor operation time. This
pipelining is done using a 6-byte queue.
• Also, the BIU contains 4 segment registers. Each segment register is of
16-bit. The segments are present in the memory and these registers
hold the address of all the segments. These registers are as follows:
1.Code segment register: It is a 16-bit register and holds the address of
the instruction or program stored in the code segment of the memory.
Also, the IP in the block diagram is the instruction pointer which is a
default register that is used by the processor in order to get the desired
instruction.
2. Stack segment register: The stack segment register provides the
starting address of stack segment in the memory. Like in stack pointer,
PUSH and POP operations are used in this segment to give and take the
data to/from it.
3. Data segment register: It holds the address of the data segment. The
data segment stores the data in the memory whose address is present in
this 16-bit register.
4. Extra segment register: Here the starting address of the extra segment is
present. This register basically contains the address of the string data.
6-byte pre-fetch queue: This queue is used in 8086 in order to perform
pipelining.
As at the time of decoding and execution of the instruction in EU, the BIU
fetches the sequential upcoming instructions and stores it in this queue.
• The size of this queue is 6 bytes, meaning that at most 6 bytes of
instructions can be stored in this queue.
• The queue exhibits FIFO (first in, first out) behaviour.
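The 6-byte pre-fetch queue described above can be sketched as follows. The class is a conceptual model only: byte values are illustrative, and a real BIU also flushes the queue on jumps, which this sketch omits.

```python
from collections import deque

# Sketch of the 8086's 6-byte pre-fetch queue: the BIU appends fetched
# instruction bytes while the EU consumes them in FIFO order.

class PrefetchQueue:
    SIZE = 6                                  # fixed 6-byte capacity

    def __init__(self):
        self.queue = deque()

    def biu_fetch(self, byte):
        if len(self.queue) < self.SIZE:       # fetch only while room remains
            self.queue.append(byte)
            return True
        return False                          # queue full: BIU idles

    def eu_consume(self):
        return self.queue.popleft()           # EU takes the oldest byte first

q = PrefetchQueue()
for b in range(8):
    q.biu_fetch(b)                            # only the first 6 bytes fit
print(len(q.queue))                           # 6
print(q.eu_consume())                         # 0 (first in, first out)
```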
Execution Unit (EU)
• The Execution Unit (EU) performs the decoding and execution of the
instructions that are being fetched from the desired memory location.
• Control Unit:
The control unit in 8086 microprocessor produces control signal after
decoding the opcode to inform the general purpose register to release
the value stored in it. And it also signals the ALU to perform the desired
operation.
• ALU:
• The arithmetic and logic unit carries out the logical tasks according to
the signal generated by the CU. The result of the operation is stored in
the desired register.
• Flag:
• Like in 8085, here also the flag register holds the status
of the result generated by the ALU. It has several flags
that show the different conditions of the result.
• Operand:
• It is a temporary register and is used by the processor
to hold the temporary values at the time of operation.
• The reason behind two separate sections for BIU and
EU in the architecture of 8086 is to perform fetching
and decoding-executing simultaneously.
Working of 8086 Microprocessor