DIGITAL NOTES
ON
COMPUTER ARCHITECTURE
TOPICS COVERED
I/O Organization
Data Transfer to Array Processing
INDEX
1. I/O Organization
2. Data Transfer to Array Processing
Peripheral Devices
A peripheral device is a device that provides input/output functions for a computer and serves as an auxiliary computer device without computing-intensive functionality. Peripheral devices are generally not essential for the computer to perform its basic tasks; they can be thought of as enhancements to the user’s experience. A peripheral device is connected to a computer system but is not part of the core computer system architecture. The term peripheral is often used more loosely to refer to any device external to the computer case.
Peripheral devices provide additional features that make the system easier to operate:
o They make it easy to provide input.
o They provide specific forms of output.
o They provide storage for data.
o They improve the efficiency of the system.
Some common ASCII control characters include:
NUL  Null
ENQ  Enquiry
ACK  Acknowledge
EM   End of medium
Format Effectors − These control the layout of printing. They include familiar typewriter controls such as Back Space (BS), Horizontal Tabulation (HT), and Carriage Return (CR).
Information Separators − These separate the information into divisions such as paragraphs and pages. They include the Record Separator (RS) and File Separator (FS).
Communication Control − These characters are used during the transmission of text between remote terminals.
I/O Interface
The I/O interface provides a method by which data is transferred between internal storage and external I/O devices. All peripherals connected to a computer require special communication links for interfacing them with the CPU.
The I/O bus includes data lines, address lines, and control lines. In any general-purpose computer, the magnetic disk, printer, keyboard, and display terminal are commonly employed. Each peripheral unit has an interface unit associated with it. Each interface decodes the address and control received from the I/O bus, interprets them for the peripheral, and supplies signals to the peripheral controller. It also conducts the transfer of data between peripheral and processor and synchronizes the data flow.
The I/O bus from the processor is linked to all peripheral interfaces. To communicate with a specific device, the processor places a device address on the address lines. Each interface contains an address decoder attached to the I/O bus that monitors the address lines.
When an interface recognizes its address, it activates the path between the bus lines and the device that it controls. The interface disables the peripherals whose address does not match the address on the bus.
An interface receives any of the following four commands −
Control − A control command is issued to activate the peripheral and to inform it of its next task. This command depends on the peripheral; each peripheral receives its own sequence of control commands, depending on its mode of operation.
Status − A status command tests various status conditions in the interface and the peripheral.
Data Output − A data output command causes the interface to respond by transferring data from the bus into one of its registers.
Data Input − The data input command is the opposite of the data output command. Here the interface receives an item of data from the peripheral and places it in its buffer register.
If the registers in the I/O interface share a common clock with the CPU registers, the transfer between the two units is said to be synchronous. In most cases, however, the internal timing of each unit is independent of the other, and each uses its own private clock for its internal registers. The two units are then said to be asynchronous to each other, and data transfer between them is called asynchronous data transfer.
Asynchronous data transfer between two independent units requires that control signals be transmitted between the communicating units to indicate the time at which data is being transferred. Two methods achieve this:
o Strobe control: A strobe pulse supplied by one unit indicates to the other unit when the transfer has to occur.
o Handshaking: Each data item being transferred is accompanied by a control signal that indicates the presence of data on the bus. The unit receiving the data item responds with another signal to acknowledge receipt of the data.
The strobe pulse and handshaking methods of asynchronous data transfer are not restricted to I/O transfers. They are used extensively wherever data must be transferred between two independent units. In what follows, the transmitting unit is called the source and the receiving unit the destination.
For example, the CPU is the source during output or write transfer and the destination unit during
input or read transfer.
The control sequence during an asynchronous transfer therefore depends on whether the transfer is initiated by the source or by the destination. Each data transfer method can accordingly be divided into two cases: source-initiated and destination-initiated.
1. Strobe Control Method
The strobe control method of asynchronous data transfer employs a single control line to time each transfer. This control line, known as the strobe, may be activated by either the source or the destination, depending on which one initiates the transfer.
a. Source-initiated strobe: In the block diagram below, the strobe is initiated by the source: as shown in the timing diagram, the source unit first places the data on the data bus.
After a brief delay to ensure that the data has settled to a stable value, the source activates the strobe pulse. The information on the data bus and the strobe control signal remain active long enough for the destination unit to receive the data.
The destination unit uses the falling edge of the strobe to transfer the contents of the data bus into one of its internal registers. The source removes the data from the bus after it disables its strobe pulse. Thus, new valid data will be available only after the strobe is enabled again.
In this case, the strobe may be a memory-write control signal from the CPU to a memory unit: the CPU places the word on the data bus and informs the memory unit, which is the destination.
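The source-initiated sequence can be sketched as a tiny Python event model (the Bus class and function names are illustrative, not any real hardware interface):

```python
# Minimal sketch of a source-initiated strobe transfer.
# The source places data on the bus, raises the strobe, and the
# destination latches the bus contents on the falling edge.

class Bus:
    def __init__(self):
        self.data = None     # contents of the data lines
        self.strobe = 0      # strobe control line

def source_transfer(bus, word, destination_register):
    bus.data = word          # 1. place data on the data bus
    bus.strobe = 1           # 2. activate the strobe pulse
    # ... destination sees valid data while the strobe is high ...
    bus.strobe = 0           # 3. disable the strobe (falling edge)
    destination_register.append(bus.data)  # destination latches on the falling edge
    bus.data = None          # 4. source removes data after disabling the strobe

bus = Bus()
received = []
source_transfer(bus, 0x5A, received)
print(received)  # [90]
```

Note that the destination plays no active role in timing here; it simply reacts to the strobe edge, which is exactly the weakness the handshaking method addresses later.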
b. Destination-initiated strobe: In the block diagram below, the strobe is initiated by the destination: in the timing diagram, the destination unit first activates the strobe pulse, informing the source to provide the data.
The source unit responds by placing the requested binary information on the data bus. The data must be valid and remain on the bus long enough for the destination unit to accept it.
The falling edge of the strobe pulse can again be used to trigger a destination register. The destination unit then disables the strobe, and finally the source removes the data from the data bus after a predetermined time interval.
In this case, the strobe may be a memory-read control signal from the CPU to a memory unit: the CPU initiates the read operation to inform the memory, which is the source unit, to place the selected word on the data bus.
2. Handshaking Method
The strobe method has the disadvantage that the source unit that initiates the transfer has no way
of knowing whether the destination has received the data that was placed in the bus. Similarly, a
destination unit that initiates the transfer has no way of knowing whether the source unit has placed
data on the bus.
The handshaking method solves this problem by introducing a second control signal line that provides a reply to the unit that initiates the transfer.
In this method, one control line runs in the same direction as the data flow on the bus, from source to destination. The source unit uses it to inform the destination unit whether there is valid data on the bus.
The other control line runs in the opposite direction, from destination to source. The destination unit uses it to inform the source whether it can accept data. Here too, the sequence of control depends on whether the transfer is initiated by the source or by the destination.
o Source initiated handshaking: In the below block diagram, you can see that two handshaking lines
are "data valid", which is generated by the source unit, and "data accepted", generated by the
destination unit.
The timing diagram shows the timing relationship of the exchange of signals between the two units.
The source initiates a transfer by placing data on the bus and enabling its data valid signal. The
destination unit then activates the data accepted signal after it accepts the data from the bus.
The source unit then disables its data valid signal, which invalidates the data on the bus.
After this, the destination unit disables its data accepted signal, and the system returns to its initial state. The source unit does not send the next data item until the destination unit shows its readiness to accept new data by disabling the data accepted signal.
This sequence of events is described in a sequence diagram, which shows the state the system is in at any given time.
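The exchange of "data valid" and "data accepted" signals can be sketched in Python; the two control variables stand in for the two handshaking lines (all names are illustrative):

```python
# Minimal sketch of source-initiated handshaking between two units.
# Two control lines: "data valid" (source -> destination) and
# "data accepted" (destination -> source).

def handshake_transfer(words):
    data_bus = None
    data_valid = 0
    data_accepted = 0
    received = []
    for w in words:
        # Source: place data on the bus and enable data valid.
        data_bus, data_valid = w, 1
        # Destination: accept the data, then enable data accepted.
        received.append(data_bus)
        data_accepted = 1
        # Source: disable data valid; data on the bus is no longer guaranteed.
        data_valid, data_bus = 0, None
        # Destination: disable data accepted -> ready for the next item.
        data_accepted = 0
    return received

print(handshake_transfer([1, 2, 3]))  # [1, 2, 3]
```

Because each item waits for the acknowledgement before the next is sent, neither unit can outrun the other, which is the key advantage over the plain strobe.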
o Destination initiated handshaking: In the below block diagram, you see that the two handshaking
lines are "data valid", generated by the source unit, and "ready for data" generated by the destination
unit.
Note that the name of signal data accepted generated by the destination unit has been changed to
ready for data to reflect its new meaning.
Here the transfer is initiated by the destination, so the source unit does not place data on the data bus until it receives a ready-for-data signal from the destination unit. After that, the handshaking process is the same as in the source-initiated case.
The sequence of events is shown in its sequence diagram, and the timing relationship between signals
is shown in its timing diagram. Therefore, the sequence of events in both cases would be identical.
Advantages of asynchronous data transfer:
o It is more flexible, and devices can exchange information at their own pace. In addition, individual data characters are complete in themselves, so that even if one packet is corrupted, its predecessors and successors are not affected.
o It does not require complex processing by the receiving device, and an inconsistency in data transfer does not result in a big crisis, since the device can keep up with the data stream. This also makes asynchronous transfer suitable for applications where character data is generated irregularly.
Disadvantages of asynchronous data transfer:
o The success of these transmissions depends on the start bits and their recognition, which are easily susceptible to line interference that can corrupt or distort them.
o A large portion of the transmitted data is used for control and identification (header bits) and thus carries no useful information related to the transmitted data, which invariably means that more data packets need to be sent.
Note: Both programmed I/O and interrupt-driven I/O require the active intervention of the processor to transfer data between memory and the I/O module, and any data transfer must traverse a path through the processor. Thus both these forms of I/O suffer from two inherent drawbacks:
o The I/O transfer rate is limited by the speed with which the processor can test and service a device.
o The processor is tied up in managing an I/O transfer; a number of instructions must be executed for each I/O transfer.
Bus Request: Used by the DMA controller to request that the CPU relinquish control of the buses.
Bus Grant: Activated by the CPU to inform the external DMA controller that the buses are in the high-impedance state and that the requesting DMA controller can take control of them. Once the DMA controller has taken control of the buses, it transfers the data. This transfer can take place in several ways.
Types of DMA transfer using a DMA controller:
Burst Transfer:
The DMA controller returns the bus only after the complete data transfer. A register is used as a byte count, decremented for each byte transferred; when the byte count reaches zero, the DMAC releases the bus. When the DMAC operates in burst mode, the CPU is halted for the duration of the data transfer.
Steps involved:
1. Bus grant request time.
2. Transfer the entire block of data at the transfer rate of the device, because the device is usually slower than the speed at which data can be transferred to the CPU.
3. Release control of the bus back to the CPU.
So, the total time taken to transfer N bytes
= Bus grant request time + N × (memory transfer rate) + Bus release control time.
Where,
X µsec = data transfer time or preparation time (words/block)
Y µsec = memory cycle time or transfer time (words/block)
% CPU idle (blocked) = (Y/(X+Y)) × 100
% CPU busy = (X/(X+Y)) × 100
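Plugging illustrative numbers into the burst-mode expressions above makes the arithmetic concrete (the timings below are hypothetical, chosen only to show the calculation):

```python
# Burst mode: total time = bus grant time + N * memory transfer time
#                          + bus release time (all values illustrative).

def burst_transfer_time(grant_us, n_bytes, mem_cycle_us, release_us):
    return grant_us + n_bytes * mem_cycle_us + release_us

def cpu_idle_percent(x_us, y_us):
    # Burst mode: the CPU is blocked for Y out of every X + Y microseconds.
    return y_us / (x_us + y_us) * 100

total = burst_transfer_time(grant_us=2, n_bytes=1000, mem_cycle_us=0.5, release_us=1)
print(total)                     # 503.0 microseconds
print(cpu_idle_percent(40, 10))  # 20.0 -> CPU idle 20% of the time
```

With X = 40 µs of preparation and Y = 10 µs of memory cycle time, the CPU is blocked 10/(40+10) = 20% of the time in burst mode.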
Cycle Stealing:
An alternative method in which the DMA controller transfers one word at a time, after which it must return control of the buses to the CPU. The CPU merely delays its operation for one memory cycle to allow the direct-memory I/O transfer to "steal" one memory cycle.
Steps involved:
1. Buffer the byte into the buffer.
2. Inform the CPU that the device has 1 byte to transfer (i.e., bus grant request).
3. Transfer the byte (at system bus speed).
4. Release control of the bus back to the CPU.
Before transferring the next byte of data, the device performs step 1 again, so that the bus isn't tied up and the transfer doesn't depend on the transfer rate of the device.
So, if the time taken to transfer 1 byte in cycle stealing mode is
T = time required for bus grant + 1 bus cycle to transfer data + time required to release the bus,
then N bytes take N × T.
In cycle stealing mode the transfer is pipelined: while one byte is being transferred, the device is preparing the next byte in parallel. When a question asks for "the fraction of CPU time relative to the data transfer time", cycle stealing mode is assumed. Where,
X µsec = data transfer time or preparation time (words/block)
Y µsec = memory cycle time or transfer time (words/block)
% CPU idle (blocked) = (Y/X) × 100
% CPU busy = (X/Y) × 100
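The cycle-stealing expressions above can be checked with a short calculation (the timings are hypothetical, used only to illustrate the formulas):

```python
# Cycle stealing: one word is transferred per stolen memory cycle,
# so N words take N * T, where T = grant + 1 bus cycle + release.
# The idle formula assumes preparation time X overlaps each transfer.

def cycle_steal_total(n_words, grant_us, bus_cycle_us, release_us):
    t = grant_us + bus_cycle_us + release_us
    return n_words * t

def cpu_idle_percent(x_us, y_us):
    # Cycle stealing: the CPU loses Y out of every X microseconds of preparation.
    return y_us / x_us * 100

print(cycle_steal_total(1000, grant_us=1, bus_cycle_us=1, release_us=1))  # 3000
print(cpu_idle_percent(40, 10))                                           # 25.0
```

Comparing with burst mode for the same X = 40, Y = 10: cycle stealing leaves the CPU idle 25% of the preparation time but never halts it for an entire block.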
Direct Memory Access (DMA):
A DMA controller is a hardware device that allows I/O devices to access memory directly, with minimal participation of the processor. The DMA controller needs the usual interface circuits to communicate with the CPU and the I/O devices.
Fig-1 below shows the block diagram of the DMA controller. The unit communicates with the CPU through the data bus and control lines. The CPU selects a register within the DMA controller through the address bus by enabling the DS (DMA select) and RS (register select) inputs. RD and WR are bidirectional inputs. When the BG (bus grant) input is 0, the CPU can communicate with the DMA registers. When BG is 1, the CPU has relinquished the buses and the DMA controller can communicate directly with the memory.
DMA controller registers :
The DMA controller has three registers as follows.
Address register – It contains the address to specify the desired location in memory.
Word count register – It contains the number of words to be transferred.
Control register – It specifies the transfer mode.
Note –
All registers in the DMA appear to the CPU as I/O interface registers. Therefore, the CPU
can both read and write into the DMA registers under program control via the data bus.
Explanation:
The CPU initializes the DMA controller by sending the following information through the data bus:
o The starting address of the memory block where the data is available (for a read) or where data is to be stored (for a write).
o The word count, which is the number of words in the memory block to be read or written.
o Control bits to specify the mode of transfer, such as read or write.
o A control bit to start the DMA transfer.
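The three registers and the word-count loop can be sketched as follows (memory is a plain Python list and the class is illustrative, not a real controller's programming model):

```python
# Sketch of the three DMA controller registers and a word-count
# transfer loop. "read" moves memory -> device, "write" the reverse.

class DMAController:
    def __init__(self):
        self.address = 0       # address register: current memory location
        self.word_count = 0    # word count register: words left to move
        self.control = "read"  # control register: transfer mode

    def transfer(self, memory, device_buffer):
        """Move `word_count` words between memory and the device."""
        while self.word_count > 0:
            if self.control == "read":            # memory -> device
                device_buffer.append(memory[self.address])
            else:                                  # device -> memory
                memory[self.address] = device_buffer.pop(0)
            self.address += 1                      # step through the block
            self.word_count -= 1                   # DMAC decrements the count

memory = [10, 20, 30, 40]
dma = DMAController()
dma.address, dma.word_count, dma.control = 1, 2, "read"
out = []
dma.transfer(memory, out)
print(out)  # [20, 30]
```

The loop ends exactly when the word count reaches zero, mirroring how a real DMAC signals completion after the count is exhausted.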
Input-Output Processor
The DMA mode of data transfer reduces the CPU's overhead in handling I/O operations. It also allows parallelism between CPU and I/O operations. Such parallelism is necessary to avoid wasting valuable CPU time while handling I/O devices whose speeds are much slower than the CPU's. The concept of DMA operation can be extended to relieve the CPU further from involvement in the execution of I/O operations. This gives rise to the development of a special-purpose processor called an Input-Output Processor (IOP), or I/O channel.
The Input-Output Processor (IOP) is just like a CPU that handles the details of I/O operations. It is equipped with more facilities than a typical DMA controller. The IOP can fetch and execute its own instructions, which are specifically designed for I/O transfers. In addition to I/O-related tasks, it can perform other processing tasks such as arithmetic, logic, branching, and code translation. The main memory unit takes the pivotal role: the IOP communicates with the processor by means of DMA.
The block diagram –
The Input-Output Processor is a specialized processor that loads and stores data in memory along with the execution of I/O instructions. It acts as an interface between the system and the devices. It carries out a sequence of events to execute I/O operations and then stores the results in memory.
Advantages –
o In IOP-based systems, I/O devices can access main memory directly, without intervention by the processor.
o It addresses the problems that arise in the direct memory access method.
Communication
Digital communication can be considered as communication between two (or more) devices in terms of bits. This transfer of data, either wirelessly or through wires, can happen either one bit at a time or several bits at once (depending on the width of the processor's data path, i.e., 8-bit, 16-bit, etc.). Based on this, we have the following classification: serial communication and parallel communication.
Serial Communication
Serial communication implies transferring data bit by bit, sequentially. This is the most common form of communication used in the digital world. Contrary to parallel communication, serial communication needs only one line for the data transfer. Thereby, the cost of the communication line as well as the space required is reduced.
Parallel Communication
Parallel communication implies transferring several bits at a time, in a parallel fashion. This form of communication comes to the rescue when speed, rather than space, is the main objective. The transfer of data is at high speed, owing to the fact that no bus buffer is present.
Parallel and Serial Communication(Interface)
MSB:Most Significant Bit
LSB:Least Significant Bit
Example:
For an 8-bit data transfer in serial communication, one bit is sent at a time. The entire data is first fed into the serial port buffer, and from this buffer one bit is sent at a time. Only after the last bit is received can the transferred data be forwarded for processing. In parallel communication, a serial port buffer is not required: a number of bus lines matching the length of the data is available, plus a synchronization line for synchronized transmission of the data.
Thus we can state that, for the same frequency of data transmission, serial communication is slower than parallel communication.
In serial communication, the data is sent sequentially and latched at the receiving end, so the entire data is recovered from the data bus using a USART/UART (Universal Synchronous/Asynchronous Receiver Transmitter) without any loss of synchronization; in parallel communication, if even one wire takes more time to recover, the received data will be faulty.
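A back-of-the-envelope comparison of the two schemes, assuming the same line clock (the numbers are illustrative, not from any real interface):

```python
import math

# Serial: one bit per clock on a single line.
def serial_time(n_bits, clock_hz):
    return n_bits / clock_hz

# Parallel: `lines` bits move per clock, assuming enough bus lines.
def parallel_time(n_bits, clock_hz, lines):
    return math.ceil(n_bits / lines) / clock_hz

print(serial_time(8, 1_000_000))       # 8e-06  -> 8 clocks for 8 bits
print(parallel_time(8, 1_000_000, 8))  # 1e-06  -> one clock, 8x faster
```

This is the idealized gap; in practice skew between the parallel wires (the "one wire takes more time" problem above) limits how fast the parallel clock can actually run.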
Typical Data Transfer Instructions –
Name      Mnemonic
Load      LD
Store     ST
Move      MOV
Exchange  XCH
Input     IN
Output    OUT
Push      PUSH
Pop       POP
The instructions can be described as follows −
Load − The load instruction is used to transfer data from the memory to a processor register,
which is usually an accumulator.
Store − The store instruction transfers data from processor registers to memory.
Move − The move instruction transfers data from a processor register to memory, from memory to a processor register, or between processor registers.
Exchange − The exchange instruction swaps information either between two registers or
between a register and a memory word.
Input − The input instruction transfers data between the processor register and the input
terminal.
Output − The output instruction transfers data between the processor register and the
output terminal.
Push and Pop − The push and pop instructions transfer data between a processor register
and memory stack.
All these instructions are associated with a variety of addressing modes. Some assembly language
instructions use different mnemonic symbols just to differentiate between the different addressing
modes.
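A toy interpreter makes the effect of these instructions concrete (the register names, memory addresses, and values are made up for illustration):

```python
# Toy model of the data transfer instructions above, using a small
# register file, a memory dictionary, and a stack.

regs = {"AC": 0, "R1": 5}
memory = {100: 42}
stack = []

def LD(reg, addr):  regs[reg] = memory[addr]               # memory -> register
def ST(reg, addr):  memory[addr] = regs[reg]               # register -> memory
def MOV(dst, src):  regs[dst] = regs[src]                  # register -> register
def XCH(a, b):      regs[a], regs[b] = regs[b], regs[a]    # swap two registers
def PUSH(reg):      stack.append(regs[reg])                # register -> stack
def POP(reg):       regs[reg] = stack.pop()                # stack -> register

LD("AC", 100)    # AC <- M[100] = 42
XCH("AC", "R1")  # AC = 5, R1 = 42
PUSH("R1")       # stack: [42]
ST("R1", 101)    # M[101] <- 42
POP("AC")        # AC <- 42

print(regs["AC"], regs["R1"], memory[101])  # 42 42 42
```

Note how PUSH and POP pair up around the stack, while LD/ST move data across the register-memory boundary; MOV and XCH never touch memory at all.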
Arithmetic instructions
The four basic arithmetic operations are addition, subtraction, multiplication, and division. Most
computers provide instructions for all four operations.
Typical Arithmetic Instructions –
Name       Mnemonic  Example  Explanation
Increment  INC       INC B    It will increment register B by 1 (B ← B + 1)
Logical instructions perform binary operations on strings of bits stored in registers. They are
useful for manipulating individual bits or a group of bits.
Typical Logical and Bit Manipulation Instructions –
Name   Mnemonic  Example  Explanation
Clear  CLR       CLR      It will set the accumulator to 0 (AC ← 0)
Set carry          –   –   It will set the carry flag to 1 (Carry ← 1)
Disable interrupt  DI  DI  It will disable the interrupt
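The clear, set, and complement operations from this family can be illustrated with Python's bitwise operators on an 8-bit "accumulator" (the values are arbitrary):

```python
# Clear, selective-clear, set, and complement operations expressed
# with bitwise operators on an 8-bit value.

AC = 0b1011_0110

cleared    = 0                   # CLR: AC <- 0
masked     = AC & 0b0000_1111    # AND clears selected bits (keep the low nibble)
set_bits   = AC | 0b1000_0000    # OR sets selected bits (here the MSB)
complement = AC ^ 0xFF           # XOR with all 1s complements every bit

print(bin(masked))      # 0b110
print(bin(set_bits))    # 0b10110110
print(bin(complement))  # 0b1001001
```

AND with a mask is how individual flag bits are cleared in practice, OR sets them, and XOR toggles them; CLR is just the degenerate case of ANDing with 0.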
Shift Instructions
Shifts are operations in which the bits of a word are moved to the left or right. Shift instructions may specify logical shifts, arithmetic shifts, or rotate-type operations.
Typical Shift Instructions –
Name Mnemonic
Logical shift right SHR
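The difference between a logical and an arithmetic right shift can be shown on an 8-bit value (a small sketch; real hardware operates on fixed-width registers, which the masking below imitates):

```python
# Logical vs arithmetic shift right on an 8-bit value.
# A logical shift inserts 0 into the MSB; an arithmetic shift
# replicates the sign bit, preserving a two's-complement sign.

def shr(x):                       # logical shift right (8-bit)
    return (x & 0xFF) >> 1

def shra(x):                      # arithmetic shift right (8-bit)
    sign = x & 0x80               # keep a copy of the sign bit
    return sign | ((x & 0xFF) >> 1)

v = 0b1001_0010                   # -110 in 8-bit two's complement
print(bin(shr(v)))                # 0b1001001  (MSB filled with 0)
print(bin(shra(v)))               # 0b11001001 (sign bit preserved)
```

The arithmetic variant is what makes "shift right" usable as a divide-by-two on signed values.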
Typical Program Control Instructions –
Name    Mnemonic
Branch  BR
Jump    JMP
Skip    SKP
Call    CALL
Return  RET
1. Compare Instruction:
A compare instruction is specifically provided; it is similar to a subtract instruction, except that the result is not stored anywhere. Instead, flags are set according to the result.
Example:
CMP R1, R2 ;
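What CMP does can be sketched as follows; the flag names (Z, S, C) follow common convention, and the exact flag set varies by processor:

```python
# Sketch of CMP R1, R2: compute R1 - R2, discard the result,
# and set condition flags based on it (8-bit two's complement).

def cmp_flags(r1, r2, bits=8):
    diff = (r1 - r2) % (1 << bits)          # wrap to the register width
    return {
        "Z": diff == 0,                     # zero flag: operands equal
        "S": bool(diff & (1 << (bits - 1))),# sign flag: MSB of the result
        "C": r1 < r2,                       # carry/borrow: unsigned r1 < r2
    }

print(cmp_flags(5, 5))  # {'Z': True, 'S': False, 'C': False}
print(cmp_flags(3, 7))  # {'Z': False, 'S': True, 'C': True}
```

A conditional branch placed after the compare then tests these flags, which is why CMP stores nothing: the subtraction exists only to produce them.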
4. Subroutines:
A subroutine is a program fragment that lives in user space and performs a well-defined task. It is invoked by another user program and returns control to the calling program when finished.
Example:
CALL and RET
5. Halting Instructions:
NOP Instruction – NOP means no operation. It causes no change in the processor state other than an advancement of the program counter. It can be used to synchronize timing.
HALT – It brings the processor to an orderly halt; the processor remains in an idle state until restarted by an interrupt, trace, reset, or external action.
6. Interrupt Instructions:
An interrupt is a mechanism by which an I/O device or an instruction can suspend the normal execution of the processor and get itself serviced.
RESET – It resets the processor. This may include setting any or all registers to an initial value or setting the program counter to a standard starting location.
TRAP – It is a non-maskable, edge- and level-triggered interrupt. TRAP has the highest priority and is a vectored interrupt.
INTR – It is a level-triggered, maskable interrupt. It has the lowest priority. It can be disabled by resetting the processor.
CISC: The CISC approach attempts to minimize the number of instructions per program but at
the cost of an increase in the number of cycles per instruction.
Earlier, when programming was done using assembly language, a need was felt to make instructions do more tasks, because programming in assembly was tedious and error-prone; this is how the CISC architecture evolved. With the rise of high-level languages, the dependency on assembly reduced, and the RISC architecture prevailed.
Characteristics of RISC –
1. Simpler instructions, hence simple instruction decoding.
2. Instructions fit within one word.
3. Each instruction takes a single clock cycle to execute.
4. More general-purpose registers.
5. Simple addressing modes.
6. Fewer data types.
7. Pipelining can be achieved.
Characteristics of CISC –
1. Complex instructions, hence complex instruction decoding.
2. Instructions are larger than one word.
3. An instruction may take more than a single clock cycle to execute.
4. Fewer general-purpose registers, as operations are performed in memory itself.
5. Complex addressing modes.
6. More data types.
RISC approach: Here the programmer first writes a load command to load data into registers, then uses a suitable operator, and then stores the result in the desired location.
The add operation is thus divided into parts (load, operate, store), due to which RISC programs are longer and require more memory, but the processor requires fewer transistors because of the less complex instructions.
RISC                                               CISC
Can perform only register-to-register              Can perform REG-to-REG, REG-to-MEM, or
arithmetic operations                              MEM-to-MEM operations
An instruction executes in a single clock cycle    An instruction takes more than one clock cycle
An instruction fits in one word                    Instructions are larger than one word
Pipelining
Pipelining is the temporal overlapping of processing. Pipelines are essentially assembly lines in computing that can be used either for instruction processing or, more generally, for executing any complex operation. A pipeline can be used efficiently only for a sequence of the same (or similar) tasks, much like an assembly line.
A basic pipeline processes a sequence of tasks, such as instructions, according to the following principle of operation:
Each task is subdivided into multiple successive subtasks, as shown in the figure. For instance, the execution of register-register instructions can be broken down into the stages instruction fetch, decode, execute, and writeback (or fetch, decode, execute, memory, writeback).
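The standard timing argument for such a pipeline can be written out directly: with k stages and n instructions, a pipelined processor needs k + (n − 1) cycles, versus n·k cycles without pipelining.

```python
# Classic pipeline timing: the first instruction takes k cycles to
# fill the pipeline; each subsequent one completes one cycle later.

def pipelined_cycles(k, n):
    return k + (n - 1)

def unpipelined_cycles(k, n):
    return n * k

k, n = 5, 100                     # 5-stage pipeline, 100 instructions
print(unpipelined_cycles(k, n))   # 500
print(pipelined_cycles(k, n))     # 104
print(unpipelined_cycles(k, n) / pipelined_cycles(k, n))  # ~4.8x speedup
```

As n grows, the speedup approaches the number of stages k, which is the "factor of the number of stages" claim made in the advantages below (hazards and stalls reduce it in practice).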
Advantages of Pipelining
The cycle time of the processor is decreased, which improves instruction throughput.
Pipelining doesn't lower the time it takes to complete a single instruction. Rather, it increases the number of instructions that can be processed together ("at once") and lowers the delay between completed instructions (known as 'throughput').
If pipelining is used, the CPU's arithmetic logic unit can be designed to be faster, but it will be more complex.
Pipelining increases execution speed over an un-pipelined core by roughly a factor of the number of stages (assuming the clock frequency also increases by a similar factor), provided the code is optimal for pipelined execution.
Pipelined CPUs frequently work at a higher clock frequency than the RAM clock frequency (as of 2008-era technology, RAM operates at a low frequency compared to CPU frequencies), increasing the computer's overall performance.
Vector/Array Processing
Array processors are also known as multiprocessors or vector processors. They perform computations on large arrays of data and are thus used to improve the performance of the computer.
Vector processing is performed by a central processing unit that can operate on an entire vector with a single instruction. It is a complete unit of hardware resources that processes a sequential set of similar data elements in memory using a single instruction.
Scientific and research computations involve many operations that require extensive, high-power computers; when run on a conventional computer, these computations may take days or weeks to complete. With vector processing, science and engineering problems can be specified in terms of vectors and matrices.
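The difference between scalar and vector processing can be sketched in plain Python, with a list comprehension standing in for a single vector instruction (illustrative only; real vector hardware does this in one instruction):

```python
# Scalar loop vs a "vector" operation over whole arrays.
# A vector instruction such as C = A + B replaces the per-element
# loop below; here plain Python stands in for the hardware.

A = [1, 2, 3, 4]
B = [10, 20, 30, 40]

# Scalar processing: explicit loop, one element per "instruction",
# plus loop-control overhead (index update, bounds test).
C_scalar = []
for i in range(len(A)):
    C_scalar.append(A[i] + B[i])

# Vector-style processing: one operation over the whole sequence,
# with no per-element loop-control bookkeeping in the program.
C_vector = [a + b for a, b in zip(A, B)]

print(C_scalar)  # [11, 22, 33, 44]
print(C_vector)  # [11, 22, 33, 44]
```

Eliminating the per-element index updates and bounds tests is exactly the loop-control overhead that the last bullet below credits vector processing with removing.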
o Chaining: when successive vector instructions use the result stream from one vector register as the operand of another operation in a different functional unit, the operations are said to be chained.
o A vector processor performs better with longer vectors because of the fixed start-up delay of a pipeline.
o Vector processing decreases the overhead related to maintaining loop-control variables, which makes it more efficient than scalar processing.