1. Memory Reference – These instructions refer to a memory address as one operand; the other
operand is always the accumulator. The instruction specifies a 12 bit address, a 3 bit opcode (other
than 111) and 1 bit of addressing mode for direct versus indirect addressing.
Example –
Suppose the IR register contains 0001XXXXXXXXXXXX, i.e. ADD. After fetching and decoding the
instruction we find that it is a memory reference instruction for the ADD operation.
Hence, DR ← M[AR]
AC ← AC + DR, SC ← 0
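The ADD micro-operation sequence above can be sketched in Python (a minimal illustration, not an exact simulator; the memory contents, addresses and initial register values are made up):

```python
# Sketch of the memory-reference ADD micro-operations:
# DR <- M[AR]; AC <- AC + DR; SC <- 0
memory = [0] * 4096      # 4096 words, addressed by the 12-bit address field
memory[100] = 7          # hypothetical operand stored at address 100

AR = 100                 # address register already holds the effective address
AC = 5                   # accumulator holds the other operand

DR = memory[AR]          # DR <- M[AR]
AC = AC + DR             # AC <- AC + DR
SC = 0                   # SC <- 0 (sequence counter cleared for the next fetch)

print(AC)                # 12
```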
2. Register Reference – These instructions perform operations on registers rather than memory
addresses. IR(14 – 12) is 111 (differentiating it from memory reference) and IR(15) is 0
(differentiating it from input/output instructions). The remaining 12 bits specify the register operation.
Example –
Suppose the IR register contains 0111001000000000, i.e. CMA. After the fetch and decode cycle we find
that it is a register reference instruction for complementing the accumulator.
Hence, AC ← ~AC
3. Input/Output – These instructions handle communication between the computer and its outside
environment. IR(14 – 12) is 111 (differentiating it from memory reference) and IR(15) is 1
(differentiating it from register reference instructions). The remaining 12 bits specify the I/O operation.
Example –
Suppose the IR register contains 1111100000000000, i.e. INP. After the fetch and decode cycle we find
that it is an input/output instruction for inputting a character. Hence, a character is INPUT from the
peripheral device.
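The decode step described in the three cases above can be sketched as a small Python function (the bit layout follows the text: bit 15 is the mode bit, bits 14-12 the opcode, bits 11-0 the address):

```python
# Classify a 16-bit IR value as memory-reference, register-reference, or I/O.
def classify(ir):
    opcode = (ir >> 12) & 0b111   # bits 14-12
    mode = (ir >> 15) & 1         # bit 15: addressing mode / instruction class
    if opcode != 0b111:
        return "memory-reference"   # direct if mode == 0, indirect if mode == 1
    return "input/output" if mode == 1 else "register-reference"

print(classify(0b0001000000001100))  # ADD  -> memory-reference
print(classify(0b0111001000000000))  # CMA  -> register-reference
print(classify(0b1111100000000000))  # INP  -> input/output
```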
A computer's instruction set is said to be complete if it includes instructions in each of the following
categories:
1. Arithmetic, logical and shift instructions (AND, ADD, complement, circulate left/right, etc.)
2. Instructions to move information to and from memory (store the accumulator, load the accumulator)
3. Program control instructions together with instructions that check status conditions
4. Input and output instructions
The addressing mode very much depends on the type of CPU organisation. There are three types of CPU
organisation:
1. Single accumulator organisation
2. General register organisation
3. Stack organisation
Addressing modes are used for one or both of the following purposes; these can also be seen as the
advantages of using addressing modes:
1. To give programming versatility to the user by providing such facilities as pointers to memory,
counters for loop control, indexing of data, and program relocation.
2. To reduce the number of bits in the addressing field of the instruction.
A number of addressing modes are available, and it depends on the architecture and CPU
organisation which of them can be applied.
MEMORY BASED ADDRESSING MODES: The content of a base register is added to the address part of
the instruction to obtain the effective address. The base register is assumed to hold a base address,
and the address field of the instruction gives a displacement relative to that base address, e.g., Base
Register Addressing Mode. Memory based addressing modes mostly rely on a memory address and the
content present at some memory location.
REGISTER BASED ADDRESSING MODES: If we have a table of data and our program needs to access all
the values one by one, we need something that decrements the program counter (or any register
holding the base address). Though in this case a register is simply decremented, it is a register based
addressing mode, e.g., Auto-decrement mode. Register based addressing modes mostly rely on
registers and the content present in some register, whether it is data or a memory address.
Addressing Modes
Addressing Modes – The term addressing mode refers to the way in which the operand of an
instruction is specified. The addressing mode specifies a rule for interpreting or modifying the address
field of the instruction before the operand is actually referenced.
Addressing modes for 8086 instructions are divided into two categories:
The 8086 memory addressing modes provide flexible access to memory, allowing you to easily access
variables, arrays, records, pointers, and other complex data types. The key to good assembly language
programming is the proper use of memory addressing modes.
IMPORTANT TERMS
Effective address or Offset: An offset is determined by adding any combination of three address
elements: displacement, base and index.
Implied mode: In implied addressing the operand is specified implicitly in the definition of the
instruction itself, so no separate address field is needed. Zero-address instructions are designed
with implied addressing mode.
Immediate addressing mode (symbol #): In this mode the data is present in the address field of the
instruction itself, designed like a one-address instruction format. The data is 8 bits or 16 bits long
and is part of the instruction.
Example: MOV AL, 35H (move the data 35H into the AL register)
Note: A limitation of the immediate mode is that the range of constants is restricted by the size of
the address field.
Register mode: In register addressing the operand is placed in one of the 8 bit or 16 bit general
purpose registers. The data is in the register that is specified by the instruction.
Here one register reference is required to access the data.
Register Indirect mode: In this addressing the operand’s offset is placed in any one of the
registers BX, BP, SI or DI, as specified in the instruction. The effective address of the data is in the
base register or an index register that is specified by the instruction.
Here one register reference and one memory reference are required to access the data.
The 8086 CPUs let you access memory indirectly through a register using the register indirect
addressing modes.
Auto Indexed (increment mode): The effective address of the operand is the contents of a register
specified in the instruction. After accessing the operand, the contents of this register are
automatically incremented to point to the next consecutive memory location; this is written (R1)+.
Here one register reference, one memory reference and one ALU operation are required to access
the data.
Example:
R1 = R1 + M[R2]
R2 = R2 + d
This is useful for stepping through arrays in a loop: R2 holds the start of the array and d is the size of
an element.
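The array-stepping loop above can be sketched in Python (a hedged illustration: word-addressed memory is assumed so the element size d is 1, and the array contents are made up):

```python
# Auto-increment stepping: R1 accumulates the sum while R2 walks the array.
memory = {200: 3, 201: 5, 202: 9}   # hypothetical 3-element array at address 200
R1, R2, d = 0, 200, 1               # d = element size (1 word here)

for _ in range(3):
    R1 = R1 + memory[R2]   # R1 <- R1 + M[R2]
    R2 = R2 + d            # R2 <- R2 + d  (auto-increment)

print(R1, R2)              # 17 203
```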
Auto Indexed (decrement mode): The effective address of the operand is the contents of a register
specified in the instruction. Before accessing the operand, the contents of this register are
automatically decremented to point to the previous consecutive memory location; this is written –(R1).
Here one register reference, one memory reference and one ALU operation are required to access
the data.
Example:
R2 = R2 - d
R1 = R1 + M[R2]
Auto-decrement mode works analogously to auto-increment mode. Both can be used to implement a
stack’s push and pop; auto-increment and auto-decrement modes are useful for implementing “Last-In-
First-Out” data structures.
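The push/pop idea can be sketched in Python (an illustration, not a real machine: the stack-pointer value and memory layout are assumptions, with the stack growing toward lower addresses):

```python
# A stack built from the auto-decrement (push) and auto-increment (pop) modes.
memory = {}
SP = 1000                  # hypothetical initial stack pointer

def push(value):
    global SP
    SP = SP - 1            # auto-decrement: -(SP)
    memory[SP] = value

def pop():
    global SP
    value = memory[SP]     # auto-increment: (SP)+
    SP = SP + 1
    return value

push(1); push(2); push(3)
print(pop(), pop(), pop())  # 3 2 1  (last in, first out)
```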
Direct addressing / Absolute addressing mode (symbol [ ]): The operand’s offset is given in the
instruction as an 8 bit or 16 bit displacement element. In this addressing mode the 16 bit
effective address of the data is part of the instruction.
Here only one memory reference operation is required to access the data.
Indirect addressing mode (symbol @ or ( )): In this mode the address field of the instruction contains
the address of the effective address. Here two references are required:
1st reference to get the effective address.
2nd reference to access the data.
1. Register Indirect: In this mode the effective address is in a register, and the corresponding
register name is held in the address field of the instruction.
Here one register reference and one memory reference are required to access the data.
2. Memory Indirect: In this mode the effective address is in memory, and the corresponding
memory address is held in the address field of the instruction.
Here two memory references are required to access the data.
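The difference in reference counts can be sketched in Python (addresses and values are made up for illustration):

```python
# Direct addressing: the address field IS the effective address (one memory
# reference). Memory-indirect: the address field points at the effective
# address, so the operand needs a second memory reference.
memory = {50: 300, 300: 42}

def load_direct(addr):
    return memory[addr]            # one memory reference

def load_memory_indirect(addr):
    ea = memory[addr]              # 1st reference: fetch the effective address
    return memory[ea]              # 2nd reference: fetch the operand

print(load_direct(300))            # 42
print(load_memory_indirect(50))    # 42
```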
Indexed addressing mode: The operand’s offset is the sum of the content of an index register SI
or DI and an 8 bit or 16 bit displacement.
Based Indexed Addressing: The operand’s offset is sum of the content of a base register BX or
BP and an index register SI or DI.
Example: ADD AX, [BX+SI]
Base register addressing mode: This mode is used to implement inter-segment transfer of
control. The effective address is obtained by adding the base register value to the address
field value.
Note:
1. PC-relative and base register addressing modes are both suitable for program
relocation at runtime.
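The offset arithmetic used by the indexed, based-indexed and base-register modes above can be sketched in Python (the register values are made-up; segment arithmetic is omitted for simplicity):

```python
# 8086-style offset: any combination of base, index and displacement,
# wrapping at 16 bits.
def effective_address(base=0, index=0, disp=0):
    return (base + index + disp) & 0xFFFF

BX, SI = 0x1000, 0x0020                                   # hypothetical values
print(hex(effective_address(base=BX, index=SI)))          # ADD AX, [BX+SI]
print(hex(effective_address(base=BX, index=SI, disp=4)))  # [BX+SI+4]
```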
Match each of the high level language statements given on the left hand side with the most natural
addressing mode from those listed on the right hand side.
List 1                  List 2
1) A[1] = B[J];         a) Indirect addressing
2) while [*A++];        b) Indexed addressing
3) int temp = *x;       c) Auto increment
Answer: (C)
Explanation: A[1] = B[J] (array access) maps to indexed addressing, while [*A++] maps to auto
increment, and int temp = *x (pointer dereference) maps to indirect addressing.
This article is contributed by Pooja Taneja.
1. Fixed Program Computers – Their function is very specific and they cannot be reprogrammed,
e.g. calculators.
2. Stored Program Computers – These can be programmed to carry out many different tasks;
applications are stored on them, hence the name.
Modern computers are based on the stored-program concept introduced by John von Neumann. In
this stored-program concept, programs and data are stored in the same memory unit and are treated
alike. This novel idea meant that a computer built with this architecture would be much easier to
reprogram.
1. The Central Processing Unit (CPU)
Control Unit –
A control unit (CU) handles all processor control signals. It directs all input and output flow, fetches
code for instructions, and controls how data moves around the system.
Arithmetic Logic Unit (ALU) –
The arithmetic logic unit is the part of the CPU that handles all the calculations the CPU may need, e.g.
addition, subtraction and comparisons. It performs logical operations, bit shifting operations, and
arithmetic operations.
2. Program Counter (PC): Keeps track of the memory location of the next instruction to
be dealt with. The PC then passes this address to the Memory Address Register (MAR).
3. Memory Address Register (MAR): Stores the memory locations of instructions that
need to be fetched from memory or stored into memory.
4. Memory Data Register (MDR): It stores instructions fetched from memory or any data
that is to be transferred to, and stored in, memory.
5. Current Instruction Register (CIR): Stores the most recently fetched instruction while
it is waiting to be decoded and executed.
6. Instruction Buffer Register (IBR): The instruction that is not to be executed immediately
is placed in the instruction buffer register IBR.
Input/Output Devices – Program or data is read into main memory from the input device or
secondary storage under the control of a CPU input instruction. Output devices are used to output
information from a computer. If results evaluated by the computer are stored in it, they can be
presented to the user with the help of output devices.
Buses – Data is transmitted from one part of a computer to another, connecting all major
internal components to the CPU and memory, by means of buses. Types:
1. Data Bus: It carries data among the memory unit, the I/O devices, and the processor.
2. Address Bus: It carries the address of data (not the actual data) between memory and
processor.
3. Control Bus: It carries control commands from the CPU (and status signals from other
devices) in order to control and coordinate all the activities within the computer.
This architecture is very important and is used in our PCs and even in supercomputers.
Let’s talk about it abstraction by abstraction, starting from writing code in any text editor.
1. We write code in a text editor using any language like C++, Java, Python, etc.
2. This code is given to the compiler, which converts it to assembly code; assembly is very close
to the machine hardware, as it depends on the instruction set. The assembly code is then converted
to binary, 0s and 1s, which actually represent the digital voltages fed to the transistors inside the chip.
3. Now we have the voltages that are actually required to run the hardware. These voltages
connect the correct circuitry inside the chip and perform a specific task, for example addition or
subtraction. At a low level, all these operations are done by combinations of little transistors:
flip-flops are combinations of gates, and gates are combinations of transistors. So, it all started
with the invention of the transistor.
4. The chip has a lot of circuits inside it to perform various tasks like arithmetic and logical
operations. The computer hardware also contains RAM, another chip which stores data
temporarily, and a hard disk, which stores data permanently.
5. The operating system is also responsible for feeding the software to the right hardware, like the
keyboard, mouse, screen, etc.
An object program for SIC can be properly executed on SIC/XE, which is known as upward compatibility.
1. Memory –
Memory is byte addressable; that is, words are addressed by the location of their lowest
numbered byte.
2. Registers –
There are 5 registers in SIC (A, X, L, PC and SW). Every register has an address associated with it,
known as the register number. The size of each register is 3 bytes (24 bits), and integer size depends
on the register size.
The CC bits refer to the condition code, i.e. they tell whether the device is ready or not. They
occupy 2 bits [6-7].
The mask bits refer to the interrupt mask. They occupy 4 bits [8-11].
ICode refers to the interrupt code, i.e. the Interrupt Service Routine. It occupies the remaining
bits [16-23].
3. Data Format –
4. Instruction Format –
All instructions in SIC have 24 bit format.
If x=1 it means indexed addressing mode.
5. Instruction Set –
Load and store instructions: move or store data from the accumulator to memory, or
vice-versa. For example LDA, STA, LDX, STX, etc.
Conditional jumps: compare the contents of the accumulator and memory and perform a task
based on the condition. For example JLT, JEQ, JGT.
Test Device (TD): tests whether a device is ready or not. The condition code in the Status Word
register is used for this purpose: if CC is ‘<’ then the device is ready, otherwise the device is busy.
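The 24-bit SIC instruction format described above (8-bit opcode, x bit, 15-bit address) can be sketched as a small decoder in Python (the example word is a made-up encoding for illustration):

```python
# Decode a 24-bit SIC instruction word into (opcode, x, address).
def decode_sic(word):
    opcode = (word >> 16) & 0xFF        # top 8 bits
    x = (word >> 15) & 1                # index bit: 1 means indexed addressing
    address = word & 0x7FFF             # low 15 bits
    return opcode, x, address

# LDA (opcode 00) with x = 0 and address 0x123
print(decode_sic(0x000123))             # (0, 0, 291)
```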
References:
Leland L. Beck: System Software – An Introduction to Systems Programming, 3rd Edition,
Addison-Wesley, 1997.
Here,
A stands for Accumulator
M stands for Memory
CC stands for Condition Code
PC stands for Program Counter
RMB stands for Right Most Byte
L stands for Linkage Register
MNEMONIC  OPERAND  OPCODE  EXPLANATION
ADD       M        18      A = A + M
AND       M        40      A = A AND M
DIV       M        24      A = A / M
J         M        3C      PC = M
JEQ       M        30      if CC set to =, PC = M
JSUB      M        48      L = PC ; PC = M
LDA       M        00      A = M
LDL       M        08      L = M
LDX       M        04      X = M
MUL       M        20      A = A * M
OR        M        44      A = A OR M
RSUB               4C      PC = L
STA       M        0C      M = A
STL       M        14      M = L
STSW      M        E8      M = SW
STX       M        10      M = X
SUB       M        1C      A = A - M
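A few rows of the opcode table can be exercised with a tiny interpreter sketch in Python (only LDA, ADD and STA are modeled; the memory contents are made up):

```python
# Interpret LDA (00), ADD (18) and STA (0C) from the SIC opcode table.
memory = {100: 6, 101: 4, 102: 0}
A = 0                                      # accumulator

def step(opcode, M):
    global A
    if opcode == 0x00:                     # LDA: A = M
        A = memory[M]
    elif opcode == 0x18:                   # ADD: A = A + M
        A = A + memory[M]
    elif opcode == 0x0C:                   # STA: M = A
        memory[M] = A

for op, m in [(0x00, 100), (0x18, 101), (0x0C, 102)]:
    step(op, m)

print(memory[102])   # 10
```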
RISC: Reduces the cycles per instruction, at the cost of the number of instructions per program.
CISC: The CISC approach attempts to minimize the number of instructions per program, at
the cost of an increase in the number of cycles per instruction.
Earlier, when programming was done using assembly language, a need was felt to make instructions do
more work, because programming in assembly was tedious and error prone; out of this, CISC
architecture evolved. With the rise of high-level languages, dependency on assembly reduced, and RISC
architecture prevailed.
Characteristics of RISC –
1. Simpler instructions, hence simple instruction decoding.
2. An instruction fits in one word.
3. An instruction takes a single clock cycle to get executed.
4. More general purpose registers.
5. Simple addressing modes.
Characteristics of CISC –
1. Complex instructions, hence complex instruction decoding.
2. Instructions are larger than one word in size.
3. An instruction may take more than a single clock cycle to get executed.
4. Fewer general purpose registers, as operations can be performed in memory itself.
5. Complex addressing modes.
Suppose we have to add two numbers held in memory.
CISC approach: There will be a single command or instruction for this, like ADD, which will
perform the whole task.
RISC approach: Here the programmer will first write a load command to bring the data into registers,
then use a suitable operator, and then store the result in the desired location.
So the add operation is divided into parts, i.e. load, operate, store, due to which RISC programs are
longer and require more memory to be stored, but they require fewer transistors due to the less
complex commands.
Difference –
RISC: An instruction executes in a single clock cycle. | CISC: An instruction takes more than one
clock cycle.
RISC: An instruction fits in one word. | CISC: Instructions are larger than the size of one word.
Instruction Set –
The set of complete instructions that the microprocessor executes is termed its instruction set.
Word Length –
The number of bits processed in a single instruction is called the word length or word size. The
greater the word size, the larger the processing power of the CPU.
Classification of Microprocessors:
Besides the classification based on the word length, the classification is also based on the architecture
i.e. Instruction Set of the microprocessor. These are categorised into RISC and CISC.
1. RISC:
It stands for Reduced Instruction Set Computer. It is a type of microprocessor architecture that
uses a small set of instructions of uniform length. These are simple instructions which are
generally executed in one clock cycle. RISC chips are relatively simple to design and
inexpensive. The setback of this design is that the computer has to repeatedly perform simple
operations to execute a larger program having a large number of processing operations.
Examples: SPARC, PowerPC, etc.
2. CISC:
It stands for Complex Instruction Set Computer. These processors offer the user hundreds of
instructions of variable sizes. CISC architecture includes a complete set of special purpose
circuits that carry out these instructions at a very high speed. These instructions interact with
memory by using complex addressing modes. CISC processors reduce the program size, and
hence fewer memory cycles are required to execute a program. This increases the
overall speed of execution.
Examples: Intel x86 architecture, AMD
3. EPIC:
It stands for Explicitly Parallel Instruction Computing. The best features of RISC and CISC
processors are combined in this architecture. It implements parallel processing of instructions
rather than using fixed-length instructions. The working of EPIC processors is supported by
a set of complex instructions that contain both basic instructions and information about the
execution of parallel instructions. This substantially increases the efficiency of these
processors.
CISC: A large number of instructions are present in the architecture.
RISC: Very few instructions are present; the number of instructions is generally less than 100.

CISC: Some instructions have long execution times. These include instructions that copy an entire
block from one part of memory to another, and others that copy multiple registers to and from
memory.
RISC: No instruction has a long execution time, due to the very simple instruction set. Some early RISC
machines did not even have an integer multiply instruction, requiring compilers to implement
multiplication as a sequence of additions.

CISC: Variable-length encodings of the instructions. Example: IA32 instruction size can range from 1
to 15 bytes.
RISC: Fixed-length encodings of the instructions are used. Example: generally all instructions are
encoded as 4 bytes.
Computer Organization | Single Accumulator based CPU Organization
In this organization, the accumulator register is used implicitly for processing all the instructions of a
program, and the results are stored into the accumulator. The instruction format used by this CPU
organization has one address field; due to this, the CPU is known as a One Address Machine.
The main points about Single Accumulator based CPU Organization are:
1. In this CPU organization, the first ALU operand is always stored in the accumulator and the
second operand is present either in registers or in memory.
2. The accumulator is the default address; thus after data manipulation the results are stored in the
accumulator.
Here LOAD is a memory read operation, that is, data is transferred from memory to the accumulator,
and STORE is a memory write operation, that is, data is transferred from the accumulator to memory.
2. ALU operation –
In this type of operation, arithmetic operations are performed on the data.
MULT X
where X is the address of the operand. The MULT instruction in this example performs the operation
AC <-- AC * M[X]
This type of CPU organization was first used in the PDP-8 processor and was used for process control
and laboratory applications. It has been totally replaced by the introduction of the new general register
based CPUs.
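The one-address machine described above can be sketched in Python (a hedged illustration: the opcode names follow the text, but the memory layout and the LOAD/STORE encodings are assumptions):

```python
# One Address Machine sketch: every ALU operation implicitly uses AC.
memory = {0: 5, 1: 4, 2: 0}
AC = 0                                    # the accumulator

def execute(op, X):
    global AC
    if op == "LOAD":                      # AC <- M[X]   (memory read)
        AC = memory[X]
    elif op == "MULT":                    # AC <- AC * M[X]
        AC = AC * memory[X]
    elif op == "STORE":                   # M[X] <- AC   (memory write)
        memory[X] = AC

# Compute M[2] = M[0] * M[1] with one-address instructions.
for instr in [("LOAD", 0), ("MULT", 1), ("STORE", 2)]:
    execute(*instr)

print(memory[2])   # 20
```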
Advantages –
One of the operands is always held by the accumulator register. This results in short instructions
and less memory space.
The instruction cycle takes less time because it saves time in instruction fetching from memory.
Disadvantages –
When complex expressions are computed, program size increases due to the usage of many
short instructions to execute them. Thus memory size increases.
As the number of instructions increases for a program, the execution time increases.
Computer Organization | Stack based CPU Organization
The computers which use Stack based CPU Organization are based on a data structure called a stack.
A stack is a list of data words. It uses the Last In First Out (LIFO) access method, which is the most
popular access method in most CPUs. A register known as the Stack Pointer (SP) is used to store the
address of the topmost element of the stack.
The two main operations that are performed on the operands of the stack are Push and Pop. These two
operations are performed from one end only.
1. Push –
This operation inserts one operand at the top of the stack and decrements the
stack pointer register. The format of the PUSH instruction is:
PUSH
It inserts the data word at the specified address onto the top of the stack. It can be implemented as:
SP <-- SP - 1      // decrement SP by 1
M[SP] <-- DR       // store the data word at the new top of the stack
2. Pop –
This operation deletes one operand from the top of the stack and increments the
stack pointer register. The format of the POP instruction is:
POP
It moves the data word at the top of the stack to the specified address. It can be implemented as:
DR <-- M[SP]       // read the data word at the top of the stack
SP <-- SP + 1      // increment SP by 1
Operation-type instructions do not need an address field in this CPU organization. This is because the
operation is performed on the two operands that are on the top of the stack. For example:
SUB
This instruction contains only the opcode, with no address field. It pops the two top data words from
the stack, subtracts them, and pushes the result back onto the top of the stack.
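The zero-address SUB can be sketched in Python (an illustration: the stack grows toward lower addresses to match the PUSH/POP descriptions above, and the operand values are made up):

```python
# Zero-address SUB: both operands come from the top of the stack.
memory = {}
SP = 100                        # hypothetical initial stack pointer

def push(v):
    global SP
    SP -= 1                     # SP <- SP - 1
    memory[SP] = v              # M[SP] <- v

def sub():
    global SP
    b = memory[SP]; SP += 1     # pop the top operand
    a = memory[SP]; SP += 1     # pop the next operand
    push(a - b)                 # push the result back

push(9); push(4)
sub()
print(memory[SP])               # 5
```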
Burroughs B5500 and HP 3000 are some examples of stack organized computers.
Execution of instructions is fast because operand data are stored in consecutive memory
locations.
Computer Organization | General Register based CPU Organization
MULT R1, R2, R3
This is an instruction for an arithmetic multiplication written in assembly language. It uses three address
fields, R1, R2 and R3. The meaning of this instruction is:
R1 <-- R2 * R3
This instruction can also be written using only two address fields as:
MULT R1, R2
In this instruction, the destination register is the same as one of the source registers, which means the
operation
R1 <-- R1 * R2
The use of a large number of registers results in short programs with limited instructions.
Some examples of General register based CPU Organization are IBM 360 and PDP-11.
Advantages –
Efficiency of the CPU increases, as a large number of registers are used in this organization.
Less memory space is used to store the program, since the instructions are written in a compact
way.
Disadvantages –
Care should be taken to avoid unnecessary usage of registers. Thus, compilers need to be more
intelligent in this aspect.
Since a large number of registers are used, extra cost is required in this organization.
The machine control instructions of the 8085 include:
1. NOP (No operation)
2. HLT (Halt)
3. DI (Disable interrupts)
4. EI (Enable interrupts)
5. SIM (Set interrupt mask)
6. RIM (Read interrupt mask)
NOP (No operation)
Opcode- NOP
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- 00
It is used when no operation is to be performed. No flags are affected during the execution of NOP. The
instruction is used to fill in time delays, or to delete and insert instructions while troubleshooting.
HLT (Halt)
Opcode- HLT
Operand- None
Length- 1 byte
M-Cycles- 2 or more
T-states- 5
Hex code- 76
The microprocessor finishes executing the current instruction and halts any further execution. The
contents of the registers are unaffected during the HLT state.
DI (Disable Interrupts)
Opcode- DI
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- F3
Disable interrupt is used when the execution of a code sequence cannot be interrupted. For example, in
critical time delays, this instruction is used at the beginning of the code, and the interrupts are enabled
at the end of the code. The 8085 TRAP cannot be disabled.
EI (Enable Interrupts)
Opcode- EI
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- FB
After a system reset or the acknowledgement of an interrupt, the Interrupt Enable flip-flop is reset,
thus disabling the interrupts. The EI instruction is used to enable them again.
SIM (Set Interrupt Mask)
Opcode- SIM
M-Cycles- 1
T-states- 4
Hex code- 30
The SIM instruction is used to implement the different interrupts of the 8085 microprocessor, like RST
7.5, 6.5 and 5.5, and also for serial data output. It does not affect the TRAP interrupt.
RIM (Read Interrupt Mask)
Opcode- RIM
M-Cycles- 1
T-states- 4
Hex code- 20
This is a multipurpose instruction used to read the status of the 8085 interrupts 7.5, 6.5 and 5.5, and
to read the serial data input bit.
Memory Address Register (MAR): It is connected to the address lines of the system bus. It
specifies the address in memory for a read or write operation.
Memory Buffer Register (MBR): It is connected to the data lines of the system bus. It contains
the value to be stored in memory or the last value read from memory.
Each phase of the Instruction Cycle can be decomposed into a sequence of elementary micro-operations.
In the above examples, there is one sequence each for the Fetch, Indirect, Execute and Interrupt Cycles.
The Indirect Cycle is always followed by the Execute Cycle, and the Interrupt Cycle is always followed by
the Fetch Cycle. For both the Fetch and Execute Cycles, the next cycle depends on the state of the system.
We assume a new 2-bit register called the Instruction Cycle Code (ICC). The ICC designates the state of
the processor in terms of which portion of the cycle it is in:
00 : Fetch Cycle
01 : Indirect Cycle
10 : Execute Cycle
11 : Interrupt Cycle
At the end of each cycle, the ICC is set appropriately. The above flowchart of the Instruction
Cycle describes the complete sequence of micro-operations, depending only on the instruction sequence
and the interrupt pattern (this is a simplified example). The operation of the processor is thus described
as the performance of a sequence of micro-operations.
Step 1: The address in the program counter is moved to the memory address register(MAR), as this is
the only register which is connected to address lines of the system bus.
Step 2: The address in the MAR is placed on the address bus, the control unit issues a READ command
on the control bus, and the result appears on the data bus and is then copied into the memory buffer
register (MBR). The program counter is incremented by one to get ready for the next instruction.
(These two actions can be performed simultaneously to save time.)
Thus, a simple Fetch Cycle consists of three steps and four micro-operations. Symbolically, we can write
this sequence of events as follows:
t1: MAR <-- (PC)
t2: MBR <-- Memory[MAR]
    PC <-- (PC) + I
t3: IR <-- (MBR)
Here ‘I’ is the instruction length. The notations t1, t2, t3 represent successive time units. We assume
that a clock is available for timing purposes and that it emits regularly spaced clock pulses. Each clock
pulse defines a time unit; thus, all time units are of equal duration. Each micro-operation can be
performed within the time of a single time unit.
First time unit: Move the contents of the PC to MAR.
Second time unit: Move contents of memory location specified by MAR to MBR. Increment content of
PC by I.
Third time unit: Move contents of MBR to IR.
Note: Second and third micro-operations both take place during the second time unit.
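The three time units of the fetch cycle can be sketched in Python (an illustration only: instruction length I = 1 and a small word-addressed memory are assumed, with made-up contents):

```python
# Fetch cycle: t1: MAR <- PC; t2: MBR <- M[MAR], PC <- PC + I; t3: IR <- MBR.
memory = {10: 0xABCD, 11: 0x1234}   # hypothetical instruction words
PC, I = 10, 1                       # I = instruction length

MAR = PC                   # t1: MAR <- (PC)
MBR = memory[MAR]          # t2: MBR <- Memory[MAR] ...
PC = PC + I                #     ... PC <- (PC) + I (same time unit)
IR = MBR                   # t3: IR <- (MBR)

print(hex(IR), PC)         # 0xabcd 11
```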
Once an instruction is fetched, the next step is to fetch the source operands. Here, suppose the source
operand is being fetched by indirect addressing (register-based operands need not be fetched). Once
the opcode is executed, a similar process may be needed to store the result in main memory. The
following micro-operations take place:
Step 1: The address field of the instruction is transferred to the MAR. This is used to fetch the address of
the operand.
Step 2: The address field of the IR is updated from the MBR (so that it now contains a direct address
rather than an indirect one).
Step 3: The IR is now in the same state as if indirect addressing had not been used.
Note: The IR is now ready for the Execute Cycle, but we skip that cycle for a moment to consider
the Interrupt Cycle.
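The indirect-cycle steps can be sketched in Python (the addresses and memory contents are made up for illustration):

```python
# Indirect cycle: replace the IR's address field with the direct address
# fetched from memory.
memory = {20: 30, 30: 99}  # M[20] holds the effective address 30
IR_address_field = 20      # indirect: points at the effective address

MAR = IR_address_field     # step 1: MAR <- IR(address field)
MBR = memory[MAR]          # step 2: fetch the effective address from memory
IR_address_field = MBR     # step 3: IR now holds a direct address

print(IR_address_field)    # 30
```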
3. The Execute Cycle
The other three cycles (Fetch, Indirect and Interrupt) are simple and predictable. Each of them requires
a simple, small and fixed sequence of micro-operations, and in each case the same micro-operations are
repeated each time around.
The Execute Cycle is different. For a machine with N different opcodes, there are N different
sequences of micro-operations that can occur.
Let’s take a hypothetical example. Consider an add instruction:
ADD R1, X
This instruction adds the content of location X to register R1. The corresponding micro-operations will
be:
t1: MAR <-- IR(address)
t2: MBR <-- Memory[MAR]
t3: R1 <-- (R1) + (MBR)
Now consider an increment-and-skip-if-zero instruction:
ISZ X
Here, the content of location X is incremented by 1. If the result is 0, the next instruction will be skipped.
The corresponding sequence of micro-operations will be:
t1: MAR <-- IR(address)
t2: MBR <-- Memory[MAR]
t3: MBR <-- (MBR) + 1
t4: Memory[MAR] <-- (MBR); if (MBR) = 0 then PC <-- (PC) + 1
Here, the PC is incremented if (MBR) = 0. This test (is MBR equal to zero or not) and action (PC is
incremented by 1) can be implemented as one micro-operation.
Note: This test-and-action micro-operation can be performed during the same time unit in which
the updated value of MBR is stored back to memory.
4. The Interrupt Cycle:
At the completion of the Execute Cycle, a test is made to determine whether any enabled
interrupt has occurred or not. If an enabled interrupt has occurred then Interrupt Cycle occurs.
The nature of this cycle varies greatly from one machine to another.
Let’s take a sequence of micro-operations:
Step 1: The contents of the PC are transferred to the MBR, so that they can be saved for return.
Step 2: The MAR is loaded with the address at which the contents of the PC are to be saved, and the
PC is loaded with the address of the start of the interrupt-processing routine.
Step 3: The MBR, containing the old value of the PC, is stored in memory.
Note: In step 2, two actions are implemented as one micro-operation. However, since most processors
provide multiple types of interrupts, it may take one or more additional micro-operations to obtain the
save_address and the routine_address before they can be transferred to the MAR and PC respectively.
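The three interrupt-cycle steps can be sketched in Python (save_address and routine_address are hypothetical fixed locations used purely for illustration):

```python
# Interrupt cycle: save the old PC to memory, then jump to the handler.
memory = {}
PC = 57                          # made-up return address
save_address, routine_address = 0, 100

MBR = PC                         # step 1: MBR <- (PC), to be saved for return
MAR = save_address               # step 2: MAR <- save_address ...
PC = routine_address             #         ... PC <- routine_address
memory[MAR] = MBR                # step 3: M[MAR] <- (MBR), old PC now saved

print(memory[save_address], PC)  # 57 100
```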
Machine Instructions
Machine Instructions are commands or programs written in machine code of a machine (computer) that
it can recognize and execute.
A machine instruction consists of several bytes in memory that tells the processor to perform
one machine operation.
The processor looks at machine instructions in main memory one after another, and performs
one machine operation for each machine instruction.
The collection of machine instructions in main memory is called a machine language program.
Machine code or machine language is a set of instructions executed directly by a computer’s central
processing unit (CPU). Each instruction performs a very specific task, such as a load, a jump, or an ALU
operation on a unit of data in a CPU register or memory. Every program directly executed by a CPU is
made up of a series of such instructions.
A label is an identifier that is assigned the address of the first byte of the instruction in which it
appears. It must be followed by “:”.
The inclusion of spaces is arbitrary, except that at least one space must be inserted; no space would
lead to an ambiguity.
Example:
1. Data transfer instructions – move data between registers, memory, the stack and I/O ports.
IN, OUT: Input byte or word from port, output word to port.
PUSH, POP: Push word onto stack, pop word off stack.
2. Arithmetic instructions – add, subtract, increment, decrement, convert byte/word and compare.
AAA, AAS, AAM,AAD: ASCII adjust for add, sub, mul, div .
3. Logical instructions – AND, OR, XOR, NOT, shift and rotate.
XOR: Logical exclusive-OR of byte or word.
Shift and rotate instructions – SHL, SHR: Logical shift left or right of byte or word, by 1 or by CL.
RCL, RCR: Rotate left or right through carry, byte or word, by 1 or by CL.
4. String manipulation instruction – load, store, move, compare and scan for byte/word
5. Control transfer instructions – conditional, unconditional, call subroutine and return from subroutine.
JMP: Unconditional jump. This group also includes loop, subroutine and interrupt instructions.
LOOPE (LOOPZ): Loop if equal (zero), count in CX, short jump to target address.
LOOPNE (LOOPNZ): Loop if not equal (not zero), count in CX, short jump to target address.
CALL, RET: Call, return from procedure (inside or outside current segment).
Flag manipulation:
STD, CLD: Set, clear direction flag.
STI, CLI: Set, clear interrupt enable flag.
PUSHF, POPF: Push flags onto stack, pop flags off stack.
MUL R5, R0, R1
In the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first
register stores the result of the operation performed on the second and the third registers. This
sequence of instructions is to be executed in a pipelined instruction processor with the following 4
stages: (1) Instruction Fetch and Decode (IF), (2) Operand Fetch (OF), (3) Perform Operation (PO) and (4)
Write back the Result (WB). The IF, OF and WB stages take 1 clock cycle each for any instruction. The PO
stage takes 1 clock cycle for ADD or SUB instruction, 3 clock cycles for MUL instruction and 5 clock cycles
for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage.
The number of clock cycles taken for the execution of the above sequence of instructions is
___________
(A) 11
(B) 12
(C) 13
(D) 14
Answer: (C)
Explanation:
Cycle: 1   2   3   4   5   6   7   8   9   10  11  12  13
I1     IF  OF  PO  PO  PO  WB
I2         IF  OF  --  --  PO  PO  PO  PO  PO  WB
I3             IF  --  --  --  --  --  --  OF  PO  WB
I4                 IF  --  --  --  --  --  --  OF  PO  WB
Here I1 is the MUL (3 PO cycles) and I2 is a DIV (5 PO cycles); I3 and I4 are single-cycle operations that depend on earlier results. I2's PO must wait until I1 leaves the PO stage; I3 stalls in OF until cycle 10, when I2's result can be forwarded from PO to OF, and I4 follows one cycle behind. The last WB completes in cycle 13.
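Under these assumptions the schedule can be reproduced with a short simulation. Only the MUL instruction survives in the text above, so the dependency pattern used here (MUL R5,R0,R1; DIV R6,R2,R3; ADD R7,R5,R6; SUB R8,R7,R4) is an assumption consistent with the explanation table:

```python
# In-order, single-issue pipeline IF/OF/PO/WB with an unpipelined,
# multi-cycle PO stage and operand forwarding from PO to OF: a consumer
# may perform OF in the same cycle its producer finishes PO.
def schedule(instrs):
    ready = {}        # register -> earliest cycle its value can reach OF
    po_free = 1       # first cycle in which the PO stage is available
    last_wb = 0
    for i, (dest, srcs, po_cycles) in enumerate(instrs):
        if_c = i + 1                      # one instruction fetched per cycle
        of_c = if_c + 1
        for r in srcs:
            of_c = max(of_c, ready.get(r, 0))
        po_start = max(of_c + 1, po_free)
        po_end = po_start + po_cycles - 1
        ready[dest] = po_end              # forwarded during the last PO cycle
        po_free = po_end + 1
        last_wb = max(last_wb, po_end + 1)
    return last_wb

# assumed sequence (only the MUL is shown in the text above)
program = [("R5", ["R0", "R1"], 3),   # MUL R5, R0, R1
           ("R6", ["R2", "R3"], 5),   # DIV R6, R2, R3
           ("R7", ["R5", "R6"], 1),   # ADD R7, R5, R6
           ("R8", ["R7", "R4"], 1)]   # SUB R8, R7, R4
cycles = schedule(program)            # 13
```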
Article contributed by Pooja Taneja.
Address field, which contains the location of an operand, i.e., a register or memory location.
An instruction varies in length depending upon the number of addresses it contains. Generally, CPU organizations are of three types on the basis of the number of address fields:
3. Stack organization
In the first organization, operations are done using a special register called the accumulator. In the second, multiple registers are used for computation. In the third, operations are performed on a stack, which is why such instructions contain no address field. A single organization is rarely applied in pure form; what we generally see is a blend of the various organizations.
1. Zero Address Instructions –
A stack-based computer does not use an address field in its instructions. To evaluate an expression, it is first converted to Reverse Polish Notation, i.e., postfix notation.
Expression: X = (A+B)*(C+D)
Postfixed : X = AB+CD+*
PUSH A     TOP = A
PUSH B     TOP = B
ADD        TOP = A + B
PUSH C     TOP = C
PUSH D     TOP = D
ADD        TOP = C + D
MUL        TOP = (A+B)*(C+D)
POP X      M[X] = TOP
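The stack evaluation above can be sketched as a tiny postfix interpreter; the operand values are illustrative assumptions:

```python
# Minimal stack machine evaluating the postfix form AB+CD+* of X = (A+B)*(C+D).
def eval_postfix(tokens, env):
    stack = []
    for t in tokens:
        if t == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)          # ADD: pop two, push sum
        elif t == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)          # MUL: pop two, push product
        else:
            stack.append(env[t])         # PUSH operand
    return stack.pop()                   # POP result

# assumed sample values for A, B, C, D
x = eval_postfix(["A", "B", "+", "C", "D", "+", "*"],
                 {"A": 1, "B": 2, "C": 3, "D": 4})   # (1+2)*(3+4) = 21
```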
2. One Address Instructions –
Expression: X = (A+B)*(C+D)
AC is the accumulator:
LOAD A AC = M[A]
ADD B AC = AC + M[B]
STORE T M[T] = AC
LOAD C AC = M[C]
ADD D AC = AC + M[D]
MUL T AC = AC * M[T]
STORE X M[X] = AC
3. Two Address Instructions –
This is common in commercial computers. Here two addresses can be specified in the instruction. Unlike one-address instructions, where the result was stored in the accumulator, here the result can be stored at a different location, but this requires more bits to represent the addresses.
Here the destination address can also contain an operand.
Expression: X = (A+B)*(C+D)
MOV R1, A     R1 = M[A]
ADD R1, B     R1 = R1 + M[B]
MOV R2, C     R2 = M[C]
ADD R2, D     R2 = R2 + M[D]
MUL R1, R2    R1 = R1 * R2
MOV X, R1     M[X] = R1
4. Three Address Instructions –
This has three address field to specify a register or a memory location. Program created are
much short in size but number of bits per instruction increase. These instructions make creation
of program much easier but it does not mean that program will run much faster because now
instruction only contain more information but each micro operation (changing content of
register, loading address in address bus etc.) will be performed in one cycle only.
Expression: X = (A+B)*(C+D)
ADD R1, A, B    R1 = M[A] + M[B]
ADD R2, C, D    R2 = M[C] + M[D]
MUL X, R1, R2   M[X] = R1 * R2
Basically, you are given a set of instructions and the initial content of the registers and flags of 8085
microprocessor. You have to find the content of the registers and flag status after each instruction.
Initially,
SUB A
MOV B, A
DCR B
INR B
SUI 01H
HLT
Assumption:
Each instruction will use the result of the previous instruction for registers. Following is the description
of each instruction with register content and flag status:
Instruction-1:
SUB A will subtract the content of the accumulator from itself. It is used to clear the content of the accumulator. After this operation the content of the registers and flags will be as in the figure given below.
Instruction-2:
MOV B, A will copy the content from source register (A) to the destination register (B). Since it is
the Data Transfer instruction so it will not affect any flag. After this operation the content of the
registers and flags will be like figure given below.
Instruction-3:
DCR B will decrease the content of the register B by 1. DCR operation doesn’t affect Carry
flag(CY).
B = 00H = 0000 0000
DCR B adds the 2's complement of 01H to B. 2's complement of 01H:

    0000 0001   (01H)
    1111 1110   (1's complement)
  +         1
  -----------
    1111 1111

Adding this to B:

    0000 0000   (B = 00H)
  + 1111 1111
  -----------
    1111 1111   (FFH)

FFH will be the content of B. So after this operation the content of the registers and flags will be as in the figure given below.
Instruction-4:
INR B will increase the content of the register B by 1. INR operation doesn’t affect Carry flag(CY).
B = FFH:

    1111 1111   (B = FFH)
  + 0000 0001   (01H)
  -----------
  1 0000 0000   (a carry is generated, but INR does not affect CY)

00H will be the content of register B. So after this operation the content of the registers and flags will be as in the figure given below.
Instruction-5:
SUI 01H will subtract 01H from the content of the accumulator and store the result in the
accumulator.
A = 00H = 0000 0000
SUI 01H adds the 2's complement of 01H to A. 2's complement of 01H:

    0000 0001   (01H)
    1111 1110   (1's complement)
  +         1
  -----------
    1111 1111

Adding this to A:

    0000 0000   (A = 00H)
  + 1111 1111
  -----------
    1111 1111   (FFH)

No carry is generated, so a borrow occurred and the carry flag is set (CY = 1). FFH will be stored in the accumulator. After this operation the content of the registers and flags will be as in the figure given below.
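The whole trace can be checked with a small model of 8-bit wraparound arithmetic; the initial accumulator value is arbitrary, since SUB A clears it regardless:

```python
# Step-by-step model of the 8085 trace above (8-bit wraparound arithmetic).
def trace(initial_a=0x27):          # initial value of A is an arbitrary assumption
    A, CY = initial_a, 0
    A = (A - A) & 0xFF              # SUB A : A = 00H, clears the accumulator
    B = A                           # MOV B, A : B = 00H, no flags affected
    B = (B - 1) & 0xFF              # DCR B : B = FFH (CY not affected)
    B = (B + 1) & 0xFF              # INR B : B = 00H (CY not affected)
    CY = 1 if A < 0x01 else CY      # SUI 01H borrows from 00H, so CY is set
    A = (A - 0x01) & 0xFF           # A = FFH
    return A, B, CY                 # HLT
```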
Microprogrammed Control
Computer Organization | Micro-Operation
In computer central processing units, micro-operations (also known as micro-ops) are the functional, or atomic, operations of a processor. These are low-level instructions used in some designs to implement complex machine instructions. They generally perform operations on data stored in one or more registers, transfer data between registers or between external buses of the CPU, and perform arithmetic and logical operations on registers.
In executing a program, the operation of a computer consists of a sequence of instruction cycles, with one machine instruction per cycle. Each instruction cycle is made up of a number of smaller units – the fetch, indirect, execute and interrupt cycles. Each of these cycles involves a series of steps, each of which involves the processor registers. These steps are referred to as micro-operations. The prefix micro refers to the fact that each step is very simple and accomplishes very little. The figure below depicts the concept discussed here.
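As a simplified illustration, the micro-operations of the fetch cycle can be modelled as individual register transfers; the register names and the one-word-per-address memory are simplifying assumptions:

```python
# A toy register-transfer view of the fetch cycle's micro-operations
# (registers and memory are plain Python values; widths are ignored).
def fetch_cycle(state, memory):
    state["MAR"] = state["PC"]           # t1: MAR <- PC
    state["MBR"] = memory[state["MAR"]]  # t2: MBR <- M[MAR]
    state["PC"] = state["PC"] + 1        # t2: PC  <- PC + 1 (next instruction)
    state["IR"] = state["MBR"]           # t3: IR  <- MBR
    return state
```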
This basically means that an ISA describes the design of a computer in terms of the basic operations it must support. The ISA is not concerned with the implementation-specific details of a computer; it is only concerned with the set, or collection, of basic operations the computer must support. For example, the AMD Athlon and the Core 2 Duo processors have entirely different implementations, but they support more or less the same set of basic operations as defined in the x86 instruction set.
Let us try to understand the Objectives of an ISA by taking the example of the MIPS ISA. MIPS is one of
the most widely used ISAs in education due to its simplicity.
Arithmetic/Logic Instructions:
These Instructions perform various Arithmetic & Logical operations on one or more
operands.
Data Transfer Instructions:
These instructions are responsible for the transfer of data from memory to the processor registers and vice versa.
Since MIPS is a 32-bit ISA, each instruction must be accommodated within 32 bits.
3. The ISA defines the Instruction Format of each type of instruction.
The Instruction Format determines how the entire instruction is encoded within 32 bits.
There are 3 types of Instruction Formats in the MIPS ISA:
R-Instruction Format
I-Instruction Format
J-Instruction Format
Each of the above Instruction Formats have different instruction encoding schemes, and hence need to
be interpreted differently by the processor.
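As a sketch of how one of these formats is encoded: the field widths below are those of the MIPS32 R-format, while the specific instruction chosen (add $t0, $t1, $t2) is an illustrative assumption:

```python
# MIPS R-format: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)
def encode_r(opcode, rs, rt, rd, shamt, funct):
    return ((opcode << 26) | (rs << 21) | (rt << 16) |
            (rd << 11) | (shamt << 6) | funct)

# add $t0, $t1, $t2  ->  rd = $t0 (8), rs = $t1 (9), rt = $t2 (10),
# opcode = 0, funct = 0x20 for add
word = encode_r(0, 9, 10, 8, 0, 0x20)   # 0x012A4020
```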
We note that the Microarchitectural level lies just below the ISA level and hence is concerned with the
implementation of the basic operations to be supported by the Computer as defined by the ISA.
Therefore we can say that the AMD Athlon and Core 2 Duo processors are based on the same ISA but
have different microarchitectures with different performance and efficiencies.
Now one may ask: why distinguish between the microarchitecture and the ISA?
The answer to this lies in the need to standardize and maintain the compatibility of programs across different hardware implementations based on the same ISA. Making different machines compatible with the same set of basic instructions (the ISA) allows the same program to run smoothly on many different machines, thereby making it easier for programmers to document and maintain code for many different machines simultaneously and efficiently.
This flexibility is the reason we first define an ISA and then design different microarchitectures complying with this ISA to implement the machine. The design of an ISA is one of the major tasks in the study of computer architecture.
The x86 ISA was developed by Intel, and almost every year Intel comes up with a new generation of i-series processors. The x86 architecture on which most Intel processors are based essentially remains the same across all these generations; where they differ is in the underlying microarchitecture. They differ in their implementation, and hence claim improved performance. These various microarchitectures developed by Intel are codenamed 'Nehalem', 'Sandy Bridge', 'Ivy Bridge' and so on.
Therefore in conclusion, we can say that different machines may be based on the same ISA, but have
different Microarchitectures.
JUMP vs CALL:
7. JUMP requires 10 T states to execute; CALL requires 18 T states.
Fixed logic circuits that correspond directly to the Boolean expressions are used to generate the
control signals.
Micro-programmed Control Unit –
The control signals associated with operations are stored in special memory units inaccessible
by the programmer as Control Words.
Control signals are generated by a program, in a manner similar to machine language programs.
Micro-programmed control unit is slower in speed because of the time it takes to fetch
microinstructions from the control memory.
1. Control Word : A control word is a word whose individual bits represent various control signals.
5. Control Store : the micro-routines for all instructions in the instruction set of a computer are
stored in a special memory called the Control Store.
Types of Micro-programmed Control Unit – Based on the type of Control Word stored in the Control
Memory (CM), it is classified into two types :
Hardwired Vs Micro-programmed Control unit | Set 2
1. Hardwired control units are generally faster than microprogrammed designs. In hardwired
control, we saw how all the control signals required inside the CPU can be generated using a
state counter and a PLA circuit.
2. A microprogrammed control unit is a relatively simple logic circuit that is capable of (1)
sequencing through microinstructions and (2) generating control signals to execute each
microinstruction.
Hardwired control unit: generates the control signals needed for the processor using logic circuits.
Microprogrammed control unit: generates the control signals with the help of microinstructions stored in control memory.

Hardwired: difficult to modify, as the control signals that need to be generated are hard-wired.
Microprogrammed: easy to modify, as the modification needs to be done only at the microinstruction level.

Hardwired: only a limited number of instructions can be supported, due to the hardware implementation.
Microprogrammed: control signals for many instructions can be generated.

Hardwired: used in computers based on Reduced Instruction Set Computer (RISC) designs.
Microprogrammed: used in computers based on Complex Instruction Set Computer (CISC) designs.
Response time is the time from the start to the completion of a task. It includes disk accesses, memory accesses, I/O activities and operating system overhead.
CPU execution time is the total time a CPU spends computing on a given task, excluding time for I/O or for running other programs. This is also referred to as simply CPU time.
Performance is the reciprocal of execution time:
Performance = 1 / Execution time
so that
Performance of A / Performance of B = Execution time of B / Execution time of A
If processor A is faster than processor B, the execution time of A is less than the execution time of B; therefore the performance of A is greater than the performance of B.
Example –
Machine A runs a program in 100 seconds, machine B runs the same program in 125 seconds:
Performance of A / Performance of B = 125 / 100 = 1.25
so machine A is 1.25 times faster than machine B.
Since clock cycle time and clock rate are reciprocals,
CPU time = Instruction count x CPI x Clock cycle time
which gives
CPU time = (Instruction count x CPI) / Clock rate
Decrease the CPI (clock cycles per instruction) by using new hardware.
Decrease the clock cycle time, or increase the clock rate, by reducing propagation delays or by using pipelining.
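These relations can be sketched numerically; the instruction count, CPI and clock rate below are made-up illustrative values:

```python
# CPU time = (instruction count * cycles per instruction) / clock rate
def cpu_time(instruction_count, cpi, clock_rate_hz):
    return instruction_count * cpi / clock_rate_hz

# Relative performance of the two machines from the example above:
t_a, t_b = 100.0, 125.0
speedup = t_b / t_a          # performance of A / performance of B = 1.25

# Illustrative values: 10^9 instructions, CPI = 2, 2 GHz clock -> 1 second
t = cpu_time(10**9, 2.0, 2 * 10**9)
```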
A control unit works by receiving input information, which it converts into control signals that are then sent to the central processor. The computer's processor then tells the attached hardware what
operations to perform. The functions that a control unit performs are dependent on the type of CPU
because the architecture of CPU varies from manufacturer to manufacturer. Examples of devices that
require a CU are:
Functions of the Control Unit –
1. It coordinates the sequence of data movements into, out of, and between a processor’s many
sub-units.
2. It interprets instructions.
5. It controls many execution units(i.e. ALU, data buffers and registers) contained within a CPU.
6. It also handles multiple tasks, such as fetching, decoding, execution handling and storing results.
As a result, a few of the output lines going out from the instruction decoder obtain active signal values. These output lines are connected to the inputs of the matrix that generates control signals for the executive units of the computer. This matrix implements logical combinations of the decoded signals from the instruction opcode with the outputs from the matrix that generates signals representing consecutive control unit states, and with signals coming from outside the processor, e.g. interrupt signals. The matrices are built in a similar way to programmable logic arrays.
Control signals for an instruction execution have to be generated not in a single time point but during
the entire time interval that corresponds to the instruction execution cycle. Following the structure of
this cycle, the suitable sequence of internal states is organized in the control unit.
A number of signals generated by the control signal generator matrix are sent back to inputs of the next
control state generator matrix. This matrix combines these signals with the timing signals, which are
generated by the timing unit based on the rectangular patterns usually supplied by the quartz generator.
When a new instruction arrives at the control unit, the control unit is in the initial state of new-instruction fetching. Instruction decoding allows the control unit to enter the first state relating to execution of the new instruction, which lasts as long as the timing signals and other input signals, such as flags and state information of the computer, remain unaltered. A change in any of these signals stimulates a change of the control unit state.
This causes a new respective input to be generated for the control signal generator matrix. When an external signal appears (e.g. an interrupt), the control unit enters a next control state that is
the state concerned with the reaction to this external signal (e.g. interrupt processing). The values of
flags and state variables of the computer are used to select suitable states for the instruction execution
cycle.
The last states in the cycle are control states that commence fetching the next instruction of the
program: sending the program counter content to the main memory address buffer register and next,
reading the instruction word to the instruction register of computer. When the ongoing instruction is
the stop instruction that ends program execution, the control unit enters an operating system state, in
which it waits for a next user directive.
2. Microprogrammable control unit –
The fundamental difference between these unit structures and the structure of the hardwired
control unit is the existence of the control store that is used for storing words containing
encoded control signals mandatory for instruction execution.
In microprogrammed control units, subsequent instruction words are fetched into the instruction register in the normal way. However, the operation code of each instruction is not directly decoded to enable immediate control signal generation; instead, it provides the initial address of a microprogram contained in the control store.
The last mentioned field decides the addressing mode (addressing operation) to be applied to the
address embedded in the ongoing microinstruction. In microinstructions along with conditional
addressing mode, this address is refined by using the processor condition flags that represent the status
of computations in the current program. The last microinstruction of the given microprogram is the one that fetches the next instruction from main memory into the instruction register.
With a two-level control store:
In this arrangement, besides the control memory for microinstructions, a nano-instruction memory is included. In such a control unit,
microinstructions do not contain encoded control signals. The operation part of
microinstructions contains the address of the word in the nano-instruction memory,
which contains encoded control signals. The nano-instruction memory contains all
combinations of control signals that appear in microprograms that interpret the
complete instruction set of a given computer, written once in the form of nano-
instructions.
In this way, unnecessary storing of the same operation parts of microinstructions is avoided. The microinstruction word can then be much shorter than with a single-level control store. This gives a much smaller microinstruction memory, in bits, and as a result a much smaller entire control memory. The microinstruction memory contains the control for selecting consecutive microinstructions, while the control signals themselves are generated on the basis of nano-instructions. In nano-instructions, control signals are frequently encoded using the 1 bit / 1 signal method, which eliminates decoding.
“microinstructions”. The sequences of microinstructions could be stored in an internal “control”
memory.
Micro-programmed control unit can be classified into two types based on the type of Control Word
stored in the Control Memory, viz., Horizontal micro-programmed control unit and Vertical micro-
programmed control unit.
In a Horizontal micro-programmed control unit, the control signals are represented in decoded binary format, i.e., 1 bit per control signal.
In a Vertical micro-programmed control unit, on the other hand, the control signals are represented in encoded binary format.
Horizontal micro-programmed control unit: less flexible than the vertical kind; uses horizontal microinstructions, where every bit in the control field attaches to a control line.
Vertical micro-programmed control unit: more flexible than the horizontal kind; uses vertical microinstructions, where a code is used for each action to be performed and a decoder translates this code into individual control signals.
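A minimal sketch of the difference, assuming a toy control unit with four control signals:

```python
# Horizontal control word: one bit per control signal, no decoding needed;
# the control word IS the signal vector, and several signals may be active.
def horizontal_signals(control_word_bits):
    return list(control_word_bits)

# Vertical control word: an encoded field that a decoder expands into
# exactly one active control signal.
def vertical_decode(code, num_signals):
    return [1 if i == code else 0 for i in range(num_signals)]
```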
Memory Organization
A storage element is called a cell. Each register is made up of storage elements, each of which stores one bit of data. Data in a memory are stored and retrieved by the processes called writing and reading, respectively.
A word is the group of bits that a memory unit stores or retrieves as a unit. A word made up of 8 bits is called a byte.
A memory unit consists of data lines, address selection lines, and control lines that specify the direction
of transfer. The block diagram of a memory unit is shown below:
Data lines provide the information to be stored in memory. The control inputs specify the direction of transfer. The k address lines specify which word is chosen.
Refer for RAM and ROM, different types of RAM, cache memory, and secondary memory
We can infer the following characteristics of Memory Hierarchy Design from above figure:
1. Capacity:
It is the global volume of information the memory can store. As we move from top to bottom in
the Hierarchy, the capacity increases.
2. Access Time:
It is the time interval between the read/write request and the availability of the data. As we
move from top to bottom in the Hierarchy, the access time increases.
3. Performance:
Earlier, when computer systems were designed without a memory hierarchy, the speed gap between the CPU registers and main memory grew due to the large difference in access time. This resulted in lower system performance, so an enhancement was required. This enhancement came in the form of the memory hierarchy design, which increases system performance. One of the most significant ways to increase system performance is minimizing how far down the memory hierarchy one has to go to manipulate data.
The following information can be obtained from the memory chip representation shown above:
Now we can clearly state the difference between Byte Addressable Memory & Word Addressable
Memory.
BYTE ADDRESSABLE MEMORY: when the data space in the cell = 8 bits, the corresponding address space is called a byte address. Based on this byte-wise storage, the memory chip configuration is named Byte Addressable Memory. E.g., a 64K x 8 chip has a 16-bit address and cell size = 8 bits (1 byte), which means that in this chip data is stored byte by byte.

WORD ADDRESSABLE MEMORY: when the data space in the cell = the word length of the CPU, the corresponding address space is called a word address. Based on this word-wise storage, the memory chip configuration is named Word Addressable Memory. E.g., for a 16-bit CPU, a 64K x 16 chip has a 16-bit address and cell size = 16 bits (the word length of the CPU), which means that in this chip data is stored word by word.
NOTE :
i) The most important point to note is that, for either a byte address or a word address, the address size can be any number of bits (it depends on the number of cells in the chip), but the cell size differs in each case.
ii) The default memory configuration in computer design is byte addressable.
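A small sketch of the addressing arithmetic, using the 64K examples above (the word size passed to the second helper is an assumption for illustration):

```python
# 64K x 8 chip: 2**16 one-byte cells; 64K x 16 chip: 2**16 two-byte cells.
# Both need 16 address bits, but the cell size differs.
def address_bits(num_cells):
    bits = 0
    while (1 << bits) < num_cells:
        bits += 1
    return bits

# On a byte-addressable machine, the word address of the word containing
# a given byte (word_bytes = word length in bytes).
def byte_to_word_address(byte_addr, word_bytes):
    return byte_addr // word_bytes
```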
These two types include Simultaneous Access Memory Organisation and Hierarchical Access Memory
Organisation. Let us understand the difference between the two from the following table:
Difference between Simultaneous and Hierarchical Access Memory Organisations:
NOTE:
1. By default, the memory structure of computer systems is designed with Hierarchical Access Memory Organisation. This is because in this type of memory organisation the average access time is reduced due to locality of reference.
2. Simultaneous access Memory organisation is used for the implementation of Write Through
Cache.
3. In both types of memory organisation, the Hit Ratio of last memory level is always 1.
complete problem. However, this problem can be reduced to graph coloring to achieve allocation and
assignment. Therefore a good register allocator computes an effective approximate solution to a hard
problem.
Figure – Input-Output
The register allocator determines which values will reside in the register and which register will hold
each of those values. It takes as its input a program with an arbitrary number of registers and produces a
program with finite register set that can fit into the target machine. (See image)
Allocation vs Assignment:
Allocation –
Maps an unlimited namespace onto the register set of the target machine.
Reg.-to-reg. model: maps virtual registers to physical registers, spilling the excess to memory.
Mem.-to-mem. model: maps some subset of the memory locations to a set of names that models the physical register set.
Allocation ensures that code will fit the target machine’s reg. set at each instruction.
Assignment –
Maps an allocated name set to physical register set of the target machine.
Assumes allocation has been done so that code will fit into the set of physical registers.
No more than ‘k’ values are designated into the registers, where ‘k’ is the no. of physical
registers.
Solved in polynomial time, when (no. of required registers) <= (no. of available physical
registers).
The top-down approach is a simple approach based on 'frequency count': identify the values which should be kept in registers and which should be kept in memory.
Algorithm:
2. Sort the registers into priority order.
Liveness and Live Ranges: a live range consists of a set of definitions and uses that are related to each other because they refer to the same value; no single register can be shared between two such unrelated ranges.
Following is a way to find live ranges in a block. A live range is represented as an interval [i, j], where i is the definition point and j is the last use.
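A sketch of interval discovery in a single block; representing each instruction as a (destination, sources) pair is an assumed simplification:

```python
# Compute live intervals [definition point, last use] for each virtual
# register in one basic block. Each instruction is (dest, sources);
# dest may be None for instructions that only use values.
def live_ranges(block):
    ranges = {}
    for point, (dest, srcs) in enumerate(block, start=1):
        for r in srcs:
            if r in ranges:
                ranges[r][1] = point      # extend range to this (latest) use
        if dest is not None:
            ranges[dest] = [point, point] # new definition opens a range
    return ranges
```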
2. Global allocation can’t guarantee an optimal solution for the execution time of spill code.
3. Prime differences between Local and Global Allocation:
Structure of a global live range is naturally more complex than the local one.
Within a global live range, distinct references may execute a different number of times. (When
basic blocks form a loop)
4. To make the decision about allocation and assignments, global allocator mostly uses graph coloring by
building an interference graph.
5. Register allocator then attempts to construct a k-coloring for that graph where ‘k’ is the no. of
physical registers.
In case, the compiler can’t directly construct a k-coloring for that graph, it modifies the
underlying code by spilling some values to memory and tries again.
Spilling actually simplifies that graph which ensures that the algorithm will halt.
6. The global allocator can use several approaches; here we'll see the top-down and bottom-up allocation strategies, and the subproblems associated with each.
Figure – Discovering live ranges in a single block
The above diagram illustrates this. Take the example of Rarp: it is initialized at program point 1 and its last use is at program point 11. Therefore the live range of Rarp, LRarp, is [1,11]. The others follow similarly.
Essential for making a spill decision, which includes address computation, memory operation cost and estimated execution frequency.
For performance benefits these spilled values are typically kept in the activation record.
Some embedded processors offers ScratchPad Memory to hold such spilled values.
Negative spill cost: consecutive load/store operations for a single address need to be removed, as they increase the burden; hence they incur a negative spill cost.
Infinite spill cost: a live range has infinite spill cost if no other live range ends between its definition and its use.
Interference and Interference Graph:
From the above diagram, it can be observed that the live range LRa starts in the first basic block and ends in the last basic block. Therefore it shares an edge with every other live range, i.e., LRb, LRc and LRd. However, LRb, LRc and LRd do not overlap with any live range except LRa, so each of them shares an edge only with LRa.
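A sketch of interference testing by interval overlap, assuming the interval representation above (the names and intervals are illustrative):

```python
# Two live ranges interfere when their intervals overlap; interfering
# ranges cannot share a physical register, so they get an edge in the
# interference graph.
def build_interference(ranges):
    names = list(ranges)
    edges = set()
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            (s1, e1), (s2, e2) = ranges[a], ranges[b]
            if s1 <= e2 and s2 <= e1:      # intervals overlap
                edges.add(frozenset((a, b)))
    return edges
```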
Building an Allocator:
Try with live range splitting into some non-trivial chunks (most used ones).
1. Tries to color live range in an order determined by some ranking functions i.e. priority based.
2. If no color is available for a live range, allocator invokes either spilling or splitting to handle
uncolored ones.
3. Live ranges having k or more neighbors are called constrained nodes and are difficult to handle.
5. Handling Spills: When no color found for some live ranges, spilling is needed to be done, but
this may not be a final/ultimate solution of course.
6. Live Range Splitting: For uncolored ones, split the live range into sub-ranges, those may have
fewer interference than the original one so that some of them can be colored at least.
Chaitin’s Idea:
Remove that node and all its edges from the graph. (This may decrease the degree of some other nodes.) If every remaining node has degree >= k, some node has to be spilled.
If no vertex needs to be spilled, successively pop vertices off the stack and color each one with a color not used by its neighbors (reusing colors wherever possible).
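The simplify-then-select steps above can be sketched as follows; this is a minimal illustration of Chaitin's approach, not a production allocator (spilling is reported rather than performed):

```python
# Chaitin-style simplify/select: repeatedly remove a node of degree < k
# and push it on a stack; then pop each node and give it the lowest
# color unused by its already-colored neighbors.
def color_graph(adj, k):
    adj = {n: set(ns) for n, ns in adj.items()}
    stack = []
    work = dict(adj)
    while work:
        # pick any node whose degree among remaining nodes is below k
        node = next((n for n in work if len(work[n] & work.keys()) < k), None)
        if node is None:
            return None          # graph not k-colorable this way: spill needed
        stack.append(node)
        del work[node]
    colors = {}
    while stack:
        n = stack.pop()
        used = {colors[m] for m in adj[n] if m in colors}
        colors[n] = min(c for c in range(k) if c not in used)
    return colors
```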
Top-down allocator could adopt the ‘spill and iterate’ philosophy used in bottom-up ones.
‘Spill and iterate’ trades additional compile time for an allocation that potentially, uses less spill
code.
Top-Down uses priority ranking to order all the constrained nodes. (However, it colors the
unconstrained nodes in an arbitrary order)
Bottom-up constructs an order in which most nodes are colored in a graph where they are
unconstrained.
Computer Organization | Cache Memory
Cache memory is a special, very high-speed memory. It is used to speed up and synchronize with the high-speed CPU. Cache memory is costlier than main memory or disk memory but more economical than CPU registers. Cache memory is an extremely fast memory type that acts as a buffer between RAM and the CPU. It holds frequently requested data and instructions so that they are immediately available to the CPU when needed.
Cache memory is used to reduce the average time to access data from main memory. The cache is a smaller and faster memory which stores copies of the data from frequently used main memory locations. There are various independent caches in a CPU, which store instructions and data.
Levels of memory:
Level 1 or Registers –
This is a type of memory in which data is stored and accepted immediately by the CPU. The most commonly used registers are the accumulator, program counter, address register, etc.
Cache Performance:
When the processor needs to read or write a location in main memory, it first checks for a
corresponding entry in the cache.
If the processor finds that the memory location is in the cache, a cache hit has occurred and data is read from the cache.
If the processor does not find the memory location in the cache, a cache miss has occurred. For
a cache miss, the cache allocates a new entry and copies in data from main memory, then the
request is fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called Hit ratio.
We can improve cache performance by using a larger cache block size, higher associativity, reducing the miss rate, reducing the miss penalty, and reducing the time to hit in the cache.
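The hit ratio and its effect on average access time can be sketched as follows; the access times used in the test are illustrative assumptions, and a simultaneous-access organisation (a miss goes straight to main memory) is assumed for the average:

```python
# Hit ratio = hits / total accesses.
def hit_ratio(hits, misses):
    return hits / (hits + misses)

# Average access time under a simultaneous-access organisation:
# a hit costs the cache time, a miss costs the main-memory time.
def avg_access_time(h, cache_time, memory_time):
    return h * cache_time + (1 - h) * memory_time
```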
Cache Mapping:
There are three different types of mapping used for cache memory: direct mapping, associative mapping and set-associative mapping. These are explained below.
1. Direct Mapping –
The simplest technique, known as direct mapping, maps each block of main memory into only one possible cache line.
In direct mapping, each memory block is assigned to a specific line in the cache. If a line is already occupied by a memory block when a new block needs to be loaded, the old block is trashed. An address is split into two parts, an index field and a tag field; the tag is stored in the cache along with the data block, and the index selects the cache line. Direct mapping's performance is directly proportional to the hit ratio.
The mapping is expressed as
i = j modulo m
where i = cache line number, j = main memory block number, and m = number of lines in the cache.
For purposes of cache access, each main memory address can be viewed as consisting of three fields. The least significant w bits identify a unique word or byte within a block of main memory; in most contemporary machines, the address is at the byte level. The remaining s bits specify one of the 2^s blocks of main memory. The cache logic interprets these s bits as a tag of s-r bits (the most significant portion) and a line field of r bits. This latter field identifies one of the m = 2^r lines of the cache.
2. Associative Mapping –
In this type of mapping, associative memory is used to store both the content and the addresses
of the memory words. Any block can go into any line of the cache. This means that the word ID
bits are used to identify which word in the block is needed, while the tag becomes all of the
remaining bits. This enables the placement of any word at any place in the cache memory. It is
considered to be the fastest and the most flexible mapping form.
3. Set-associative Mapping –
This form of mapping is an enhanced form of direct mapping in which the drawbacks of direct
mapping are removed. Set-associative mapping addresses the problem of possible thrashing in
the direct mapping method. It does this by saying that instead of having exactly one line that a
block can map to in the cache, we group a few lines together, creating a set. A block in memory
can then map to any one of the lines of a specific set. Set-associative mapping allows each
index address in the cache to hold two or more words from main memory. Set-associative cache
mapping combines the best of direct and associative cache mapping techniques.
In this case, the cache consists of a number of sets, each of which consists of a number of lines. The
relationships are
m = v * k
i = j mod v
where
i = cache set number
j = main memory block number
v = number of sets
m = number of lines in the cache
k = number of lines in each set
1. Usually, the cache memory can store a reasonable number of blocks at any given time,
but this number is small compared to the total number of blocks in the main memory.
2. The correspondence between the main memory blocks and those in the cache is
specified by a mapping function.
Types of Cache –
Primary Cache –
A primary cache is always located on the processor chip. This cache is small and its
access time is comparable to that of processor registers.
Secondary Cache –
Secondary cache is placed between the primary cache and the rest of the memory. It is
referred to as the level 2 (L2) cache. Often, the Level 2 cache is also housed on the
processor chip.
Locality of reference –
Since size of cache memory is less as compared to main memory. So to check which part of main
memory should be given priority and loaded in cache is decided based on locality of reference.
A Simple Solution:
One way to perform this mapping is to take the last few bits of the long memory address to form a small
cache address, and place the block at that cache address.
Solution is Tag:
To handle the above problem, more information is stored in the cache to tell which block of memory is
stored in a given cache line. We store this additional information as a tag.
The above arrangement is a direct-mapped cache, and it has the following problem.
We discussed above that the last few bits of the memory address are used to index into the cache and the
remaining bits are stored as the tag. Now imagine that the cache is very small and is addressed by 2 bits,
so we use the last two bits of the main memory address to select the cache line (as shown in the below
diagram). Then if a program accesses 2, 6, 2, 6, 2, …, every access would cause a miss, as 2 and 6 map to
the same location in the cache.
Here the CPU directly communicates with the main memory and no caches are involved.
In this case, the CPU needs to access the main memory 10 times to access the desired information.
Here the CPU first checks whether the desired data is present in the cache memory or not, i.e.
whether there is a “hit” or a “miss” in the cache. Suppose there are 3 misses in the cache memory; then
the main memory will be accessed only 3 times. We can see that here the miss penalty is reduced because
the main memory is accessed fewer times than in the previous case.
Here the cache performance is optimized further by introducing multilevel caches. As shown in the
above figure, we are considering a 2-level cache design. Suppose there are 3 misses in the L1 cache
and, out of these 3 misses, 2 also miss in the L2 cache; then the main memory will be accessed only
2 times. It is clear that here the miss penalty is reduced considerably compared with the previous
case, thereby improving the performance of the cache memory.
NOTE :
We can observe from the above 3 cases that we are trying to decrease the number of Main Memory
References and thus decreasing the Miss Penalty in order to improve the overall System Performance.
Also, it is important to note that in the multilevel cache design, the L1 cache is attached to the CPU and is
small but fast. The L2 cache, in contrast, is attached to the L1 cache; it is larger and slower, but still
faster than the main memory.
Caches are the faster memories that are built to deal with the Processor-Memory gap in data read
operation, i.e. the time difference in a data read operation in a CPU register and that in the main
memory. A data read in registers is generally around 100 times faster than in the main memory, and
this gap keeps increasing substantially as we go down the memory hierarchy.
Caches are installed in the middle of CPU registers and the main memory to bridge this time gap in data
reading. Caches serve as temporary staging area for a subset of data and instructions stored in relatively
slow main memory. Since the size of cache is small, only the data which is frequently used by the
processor during the execution of a program is stored in cache. Caching of this frequently used data by
CPU eliminates the need of bringing the data from the slower main memory again and again which takes
hundreds of CPU cycles.
The idea of caching the useful data centers around a fundamental property of computer programs
known as locality. Programs with good locality tend to access the same set of data items over and over
again from the upper levels of the memory hierarchy (i.e. cache) and thus run faster.
Example: The run time of different matrix multiplication kernels that perform the same number of
arithmetic operations, but have different degrees of locality, can vary by a factor of 20!
Types of Locality:
Temporal locality –
Temporal locality states that the same data objects are likely to be reused multiple times by the
CPU during the execution of a program. Once a data object has been written into the cache on
the first miss, a number of subsequent hits on that object can be expected. Since the cache is
faster than the storage at the next lower level like the main memory, these subsequent hits can
be served much faster than the original miss.
Spatial locality –
It states that if a data object is referenced once, then there is a high probability that its
neighboring data objects will also be referenced in the near future. Memory blocks usually contain
multiple data objects. Because of spatial locality, we can expect that the cost of copying a block
after a miss will be amortized by subsequent references to other objects within that block.
Importance of Locality –
Locality in programs has an enormous impact on the design and performance of hardware and software
systems. In modern computing systems, the locality based advantages are not only confined to the
architecture but also, operating systems and application programs are built in a manner that they can
exploit the locality to the full extent.
In operating systems, the principle of locality allows the system to use main memory as a cache of the
most recently referenced chunk of virtual address space and also in case of recently used disk blocks in
disk file systems.
Similarly, Applications programs like web browsers exploit temporal locality by caching recently
referenced documents on a local disk. High-volume web servers hold recently requested documents in
the front-end disk cache, satisfying requests for these documents without any intervention of the server.
Frequently used cases need to be faster: Programs often invest most of the time in a few core
functions and these functions in return have most to do with the loops. So, these loops should
be designed in a way that they possess a good locality.
Multiple loops: If a program consists of multiple loops, then minimize the cache misses in the
inner loop to improve the performance of the code.
Example-1: The above context can be understood through simple examples of multi-dimensional array
code. Consider the sum_array_rows() function, which sums the elements of a two-dimensional array in
row-major order:

int sum_array_rows(int a[8][4])
{
    int i, j, sum = 0;
    for (i = 0; i < 8; i++)
        for (j = 0; j < 4; j++)
            sum += a[i][j];
    return sum;
}
Assuming, the cache has a block size of 4 words each, word size being 4 bytes. It is initially empty and
since, C stores arrays in row-major order so the references will result in the following pattern of hits and
misses, independent of cache organization.
The block which contains a[0][0]–a[0][3] is loaded into the cache from memory; the reference to a[0][0]
is a miss, but the next three references are all hits. The reference to a[1][0] causes another miss as a new
block is loaded into the cache, the next three references are hits, and so on. In general, three out of four
references will hit, which is the best that can be done with a cold cache. Thus, the hit ratio is 3/4 * 100 =
75%.
Example-2: Now, the sum_array_cols() function sums the elements of a two-dimensional array in
column-major order:

int sum_array_cols(int a[8][8])
{
    int i, j, sum = 0;
    for (j = 0; j < 8; j++)
        for (i = 0; i < 8; i++)
            sum += a[i][j];
    return sum;
}
C stores arrays in row-major order, but in this case the array is being accessed in column-major order, so
locality is spoiled. The references will be made in the order a[0][0], a[1][0], a[2][0], and so on. As the
cache is small, each reference will be a miss due to the poor locality of the program.
Hence, the hit ratio will be 0. A poor hit ratio will eventually decrease the performance of a program and
lead to slower execution. In programming, this type of practice should be avoided.
Conclusion -
When talking about real life application programs and programming realms, optimized cache
performance gives a good speedup to a program, even if the runtime complexity of the program is high.
A good example is Quicksort. Though it has a worst-case complexity of O(n^2), it is the most popular
sorting algorithm, and one of the important factors is its better cache performance compared with many
other sorting algorithms. Code should be written in a way that exploits the cache to the best extent
for a faster execution.
1. In case of loops in program control processing unit repeatedly refers to the set of instructions
that constitute the loop.
2. In case of subroutine calls, the same set of instructions is fetched from memory every time the subroutine is called.
3. References to data items also get localized that means same data item is referenced again and
again.
In the above figure, you can see that the CPU wants to read or fetch data or an instruction. First it
accesses the cache memory, as it is nearby and provides very fast access. If the required data or
instruction is found, it is fetched. This situation is known as a cache hit. But if the required data or
instruction is not found in the cache memory, this situation is known as a cache miss. Then the main
memory is searched for the required data or instruction, and if it is found, the access will go through one
of two ways:
1. The first way is that the CPU fetches the required data or instruction, uses it, and that's it; but
then, when the same data or instruction is required again, the CPU again has to access the same
main memory location, and we already know that main memory is the slowest to access.
2. The second way is to store the data or instruction in the cache memory so that if it is needed
soon again in near future it could be fetched in a much faster way.
Cache Operation:
It is based on the principle of locality of reference. There are two ways with which data or instruction is
fetched from main memory and get stored in cache memory. These two ways are following:
1. Temporal Locality –
Temporal locality means current data or instruction that is being fetched may be needed soon.
So we should store that data or instruction in the cache memory so that we can avoid again
searching in main memory for the same data.
When the CPU accesses the current main memory location to read the required data or instruction, that
data is also stored in the cache memory, based on the fact that the same data or instruction may be needed
in the near future. This is known as temporal locality.
2. Spatial Locality –
Spatial locality means that an instruction or data near the current memory location being
fetched may be needed soon. This is slightly different from temporal locality: here we are
talking about nearby memory locations, while in temporal locality we were talking about the
actual memory location being fetched.
Cache Performance:
The performance of the cache is measured in terms of hit ratio. When the CPU refers to memory and finds
the data or instruction within the cache memory, it is known as a cache hit. If the desired data or
instruction is not found in cache memory and CPU refers to the main memory to find that data or
instruction, it is known as cache miss.
The memory system consists of two levels: cache and main memory. If Tc is the time to access cache
memory, Tm is the time to access main memory, and h is the hit ratio, then:
Tavg = h * Tc + (1 - h) * (Tc + Tm)
where Tavg is the average time to access memory.
First things first: a CPU cache is a fast memory used to reduce the latency of fetching information
from main memory (RAM) into CPU registers, so the CPU cache sits between main memory and the CPU. This
cache stores information temporarily so that the next access to the same information is faster. A CPU
cache used to store executable instructions is called an instruction cache (I-Cache); a CPU cache
used to store data is called a data cache (D-Cache). The I-Cache and D-Cache thus speed up fetch
times for instructions and data respectively. A modern processor contains both an I-Cache and a D-Cache.
For completeness, the D-Cache is typically organized in a hierarchy, i.e. a Level 1 data cache, a Level 2
data cache, etc. It should be noted that the L1 D-Cache is faster, smaller, and costlier than the L2
D-Cache. But the basic idea of a 'CPU cache' is to speed up instruction/data fetch time from main
memory to the CPU.
Translation Lookaside Buffer (i.e. TLB) is required only if Virtual Memory is used by a processor. In
short, TLB speeds up translation of virtual address to physical address by storing page-table in a faster
memory. In fact, the TLB also sits between the CPU and main memory. Precisely speaking, the TLB is
used by the MMU when a virtual address needs to be translated to a physical address. By keeping this
mapping of virtual-to-physical addresses in a fast memory, access to the page table improves. It should
be noted that the page table
(which itself is stored in RAM) keeps track of where virtual pages are stored in the physical memory. In
that sense, TLB also can be considered as a cache of the page-table.
But the scope of operation for TLB and CPU Cache is different. TLB is about ‘speeding up address
translation for Virtual memory’ so that page-table needn’t to be accessed for every address. CPU Cache
is about ‘speeding up main memory access latency’ so that RAM isn’t accessed always by CPU. TLB
operation comes at the time of address translation by MMU while CPU cache operation comes at the
time of memory access by CPU. In fact, any modern processor deploys all I-Cache, L1 & L2 D-Cache and
TLB.
Different Types of RAM (Random Access Memory )
RAM(Random Access Memory) is a part of computer’s Main Memory which is directly accessible by CPU.
RAM is used to read and write data, and it is accessed randomly by the CPU. RAM is volatile in
nature, meaning that if the power goes off, the stored information is lost. RAM is used to store the data that
is currently processed by the CPU. Most of the programs and data that are modifiable are stored in
RAM.
1. SRAM(Static RAM)
2. DRAM(Dynamic RAM)
SRAM
The SRAM memories consist of circuits capable of retaining the stored information as long as the power
is applied. That means this type of memory requires constant power. SRAM memories are used to build
Cache Memory.
SRAM Memory Cell: Static memories (SRAM) are memories that consist of circuits capable of retaining
their state as long as power is on; even so, this type of memory is still volatile. The below
figure shows a cell diagram of SRAM. A latch is formed by two inverters connected as shown in the
figure. Two transistors T1 and T2 are used for connecting the latch with two bit lines. The purpose of
these transistors is to act as switches that can be opened or closed under the control of the word line,
which is controlled by the address decoder. When the word line is at 0-level, the transistors are turned
off and the latch retains its information. For example, the cell is in state 1 if the logic value at point A is
1 and at point B is 0. This state is retained as long as the word line is not activated.
For Read operation, the word line is activated by the address input to the address decoder. The
activated word line closes both the transistors (switches) T1 and T2. Then the bit values at points A and
B can transmit to their respective bit lines. The sense/write circuit at the end of the bit lines sends the
output to the processor.
For Write operation, the address provided to the decoder activates the word line to close both the
switches. Then the bit value that to be written into the cell is provided through the sense/write circuit
and the signals in bit lines are then stored in the cell.
DRAM
DRAM stores binary information in the form of electric charges applied to capacitors. The stored
charge on the capacitors tends to leak away over time, and thus the capacitors must be periodically
refreshed to retain the data. The main memory is generally made up of DRAM chips.
DRAM Memory Cell: Though SRAM is very fast, it is expensive because every cell requires several
transistors. A relatively less expensive RAM is DRAM, due to the use of one transistor and one
capacitor in each cell, as shown in the below figure, where C is the capacitor and T is the transistor.
Information is stored in a DRAM cell in the form of a charge on a capacitor, and this charge needs to be
periodically refreshed.
For storing information in this cell, transistor T is turned on and an appropriate voltage is applied to the
bit line. This causes a known amount of charge to be stored in the capacitor. After the transistor is
turned off, due to the property of the capacitor, it starts to discharge. Hence, the information stored in
the cell can be read correctly only if it is read before the charge on the capacitors drops below some
threshold value.
Types of DRAM
1. Asynchronous DRAM (ADRAM): The DRAM described above is the asynchronous type DRAM.
The timing of the memory device is controlled asynchronously. A specialized memory controller
circuit generates the necessary control signals to control the timing. The CPU must take into
account the delay in the response of the memory.
2. Synchronous DRAM (SDRAM): These RAM chips’ access speed is directly synchronized with the
CPU’s clock. For this, the memory chips remain ready for operation when the CPU expects them
to be ready. These memories operate at the CPU-memory bus without imposing wait states.
SDRAM is commercially available as modules incorporating multiple SDRAM chips and forming
the required capacity for the modules.
3. Double-Data-Rate SDRAM (DDR SDRAM): This faster version of SDRAM performs its operations
on both edges of the clock signal; whereas a standard SDRAM performs its operations on the
rising edge of the clock signal. Since they transfer data on both edges of the clock, the data
transfer rate is doubled. To access the data at high rate, the memory cells are organized into
two groups. Each group is accessed separately.
4. Rambus DRAM (RDRAM): The RDRAM provides a very high data transfer rate over a narrow
CPU-memory bus. It uses various speedup mechanisms, like synchronous memory interface,
caching inside the DRAM chips and very fast signal timing. The Rambus data bus width is 8 or 9
bits.
5. Cache DRAM (CDRAM): This memory is a special type DRAM memory with an on-chip cache
memory (SRAM) that acts as a high-speed buffer for the main DRAM.
Below table lists some of the differences between SRAM and DRAM:
The disk is divided into tracks. Each track is further divided into sectors. The point to be noted here is
that outer tracks are bigger in size than the inner tracks but they contain the same number of sectors
and have equal storage capacity. This is because the storage density is high in the sectors of the inner
tracks, whereas the bits are sparsely arranged in the sectors of the outer tracks. Some space in every sector is used
for formatting. So, the actual capacity of a sector is less than the given capacity.
Read-Write(R-W) head moves over the rotating hard disk. It is this Read-Write head that performs all
the read and write operations on the disk and hence, position of the R-W head is a major concern. To
perform a read or write operation on a memory location, we need to place the R-W head over that
position. Some important terms must be noted here:
1. Seek time – The time taken by the R-W head to reach the desired track from its current
position.
2. Rotational latency – Time taken by the sector to come under the R-W head.
3. Data transfer time – Time taken to transfer the required amount of data. It depends upon the
rotational speed.
4. Average access time – seek time + average rotational latency + data transfer time + controller
time.
In questions, if the seek time and controller time is not mentioned, take them to be zero.
If the amount of data to be transferred is not given, assume that no data is being transferred.
Otherwise, calculate the time taken to transfer the given amount of data.
The average of rotational latency is taken when the current position of R-W head is not given. Because,
the R-W may be already present at the desired position or it might take a whole rotation to get the
desired sector under the R-W head. But, if the current position of the R-W head is given then the
rotational latency must be calculated.
Example –
Consider a hard disk with:
4 surfaces
64 tracks/surface
128 sectors/track
256 bytes/sector
2. The disk is rotating at 3600 RPM, what is the data transfer rate?
60 sec -> 3600 rotations
1 sec -> 60 rotations
Data transfer rate = number of rotations per second * track capacity * number of surfaces (since
1 R-W head is used for each surface)
Data transfer rate = 60 * 128 * 256 * 4
Data transfer rate = 7.5 MB/sec
3. The disk is rotating at 3600 RPM, what is the average access time?
Since, seek time, controller time and the amount of data to be transferred is not given, we
consider all the three terms as 0.
Therefore, Average Access time = Average rotational delay
Rotational latency => 60 sec -> 3600 rotations
1 sec -> 60 rotations
Rotational latency = (1/60) sec = 16.67 msec.
Average Rotational latency = (16.67)/2
= 8.33 msec.
Average Access time = 8.33 msec.
N = 2^l bytes
where
l is the number of address lines (the width of the address bus)
N is the addressable memory in bytes
For example, some storage sizes can be described in bytes using the above formula:
l = 16: N = 2^16 bytes = 64 KB
l = 32: N = 2^32 bytes = 4 GB
Memory Address Register (MAR) is the address register used to store the address of the memory
location where the operation is being performed. Memory Data Register (MDR) is the data register
used to store the data on which the operation is being performed.
In the above diagram, initially the MDR can contain any garbage value and the MAR contains the memory
address 2003. After the execution of the read instruction, the data at memory location 2003 will be read and
MDR will get updated by the value of the 2003 memory location (3D).
In the above diagram, the MAR contains 2003 and MDR contains 3D. After the execution of write
instruction 3D will be written at 2003 memory location.
In I/O Interface (Interrupt and DMA Mode), we have discussed concept behind the Interrupt-initiated
I/O.
To summarize, when I/O devices are ready for I/O transfer, they generate an interrupt request signal to
the computer. The CPU receives this signal, suspends the current instructions it is executing and then
moves forward to service that transfer request. But what if multiple devices generate interrupts
simultaneously? In that case, we have to have a way to decide which interrupt is to be serviced first. In
other words, we have to set a priority among all the devices for systemic interrupt servicing.
The concept of defining the priority among devices so as to know which one is to be serviced first in case
of simultaneous requests is called priority interrupt system. This could be done with either software or
hardware methods.
SOFTWARE METHOD – POLLING
In this method, all interrupts are serviced by branching to the same service program. This program then
checks with each device if it is the one generating the interrupt. The order of checking is determined by
the priority that has to be set. The device having the highest priority is checked first and then devices
are checked in descending order of priority. If the device is checked to be generating the interrupt,
another service program is called which works specifically for that particular device.
The structure will look something like this:

if (device[0].flag)
    device[0].service();
else if (device[1].flag)
    device[1].service();
...
else
    // raise error
The major disadvantage of this method is that it is quite slow. To overcome this, we can use a hardware
solution, one of which involves connecting the devices in series. This is called the daisy-chaining method.
The daisy-chaining method involves connecting all the devices that can request an interrupt in a serial
manner. This configuration is governed by the priority of the devices. The device with the highest
priority is placed first followed by the second highest priority device and so on. The given figure depicts
this arrangement.
WORKING:
There is an interrupt request line which is common to all the devices and goes into the CPU.
When no interrupts are pending, the line is in HIGH state. But if any of the devices raises an
interrupt, it places the interrupt request line in the LOW state.
The CPU acknowledges this interrupt request from the line and then enables the interrupt
acknowledge line in response to the request.
If the device has not requested the interrupt, it passes this signal to the next device through its
PO (priority out) output. (PI = 1 & PO = 1)
If the device has requested the interrupt, it consumes the acknowledge signal and blocks its
further use by placing 0 at its PO (priority out) output.
The device then proceeds to place its interrupt vector address(VAD) into the data bus of
CPU.
The device puts its interrupt request signal in HIGH state to indicate its interrupt has
been taken care of.
NOTE: VAD is the address of the service routine which services that device.
If a device gets 0 at its PI input, it generates 0 at the PO output to tell other devices that
acknowledge signal has been blocked. (PI = 0 & PO = 0)
Hence, the device having PI = 1 and PO = 0 is the highest priority device that is requesting an interrupt.
Therefore, by daisy chain arrangement we have ensured that the highest priority interrupt gets serviced
first and have established a hierarchy. The farther a device is from the first device, the lower its priority.
Mode of Transfer:
The binary information that is received from an external device is usually stored in the memory unit. The
information that is transferred from the CPU to the external device is originated from the memory unit.
The CPU merely processes the information, but the source and target are always the memory unit. Data
transfer between the CPU and the I/O devices may be done in different modes.
Data transfer to and from the peripherals may be done in any of three possible ways:
1. Programmed I/O
2. Interrupt-initiated I/O
3. Direct Memory Access (DMA)
1. Programmed I/O: This is the result of the I/O instructions written in the computer program.
Each data-item transfer is initiated by an instruction in the program. Usually the transfer is
between a CPU register and memory. In this case it requires constant monitoring of the
peripheral devices by the CPU.
Example of Programmed I/O: In this case, the I/O device does not have direct access to the memory
unit. A transfer from I/O device to memory requires the execution of several instructions by the CPU,
including an input instruction to transfer the data from device to the CPU and store instruction to
transfer the data from CPU to memory. In programmed I/O, the CPU stays in the program loop until the
I/O unit indicates that it is ready for data transfer. This is a time consuming process since it needlessly
keeps the CPU busy. This situation can be avoided by using an interrupt facility. This is discussed below.
2. Interrupt-initiated I/O: Since in the above case we saw that the CPU is kept busy unnecessarily,
this situation can very well be avoided by using an interrupt-driven method for data transfer,
using the interrupt facility and special commands to inform the interface to issue an interrupt
request signal whenever data is available from any device. In the meantime the CPU can
proceed for any other program execution. The interface meanwhile keeps monitoring the
device. Whenever it is determined that the device is ready for data transfer it initiates an
interrupt request signal to the computer. Upon detection of an external interrupt signal the CPU
stops momentarily the task that it was already performing, branches to the service program to
process the I/O transfer, and then return to the task it was originally performing.
Note: Both programmed I/O and interrupt-driven I/O require the active intervention of the processor to
transfer data between memory and the I/O module, and any data transfer must traverse a path through
the processor. Thus both these forms of I/O suffer from two inherent drawbacks:
1. The I/O transfer rate is limited by the speed with which the processor can test and service a
device.
2. The processor is tied up in managing the I/O transfer; a number of instructions must be
executed for each transfer.
3. Direct Memory Access: The data transfer between a fast storage medium such as a magnetic disk
and the memory unit is limited by the speed of the CPU. Thus we can let the peripherals
communicate directly with the memory using the memory buses, removing the intervention of the CPU.
This type of data transfer technique is known as DMA or direct memory access. During DMA the
CPU is idle and it has no control over the memory buses. The DMA controller takes over the
buses to manage the transfer directly between the I/O devices and the memory unit.
Bus Request: It is used by the DMA controller to request that the CPU relinquish control of the buses.
Bus Grant: It is activated by the CPU to inform the external DMA controller that the buses are in the high
impedance state and the requesting DMA can take control of the buses. Once the DMA has taken the
control of the buses it transfers the data. This transfer can take place in many ways.
2. Transfer the entire block of data at the transfer rate of the device, because the device is usually
slower than the speed at which the data can be transferred to the CPU.
% CPU busy = (X / (X + Y)) * 100
Cyclic Stealing:
In this mode, the DMA controller transfers one word at a time, after which it must return control of the
buses to the CPU. The CPU merely delays its operation for one memory cycle to allow the direct memory
I/O transfer to “steal” one memory cycle.
Steps involved are:
5. Inform the CPU that the device has 1 byte to transfer (i.e. bus grant request)
Before transferring the next byte of data, the device performs step 1 again so that the bus isn't
tied up and the transfer does not depend on the transfer rate of the device.
So, for the transfer of 1 byte of data, the time taken in cycle-stealing mode is
T = time required for bus grant + 1 bus cycle to transfer the data + time required to release the bus,
and for N bytes it will be N x T.
Cycle-stealing mode always follows a pipelining concept: while one byte is being transferred, the
device is preparing the next byte in parallel. If “the fraction of CPU time to the data transfer
time” is asked for, cycle-stealing mode is assumed.
% CPU busy = (X / Y) * 100
where X and Y are measured per block: X = transfer time per word x (words/block) and
Y = preparation time per word x (words/block).
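The per-byte timing T given earlier (bus grant + one bus cycle + bus release) and the N x T block cost can be checked with a couple of small helpers; the parameter names are illustrative:

```python
def per_byte_time(grant, cycle, release):
    # T = bus-grant time + 1 bus cycle to transfer the data + bus-release time
    return grant + cycle + release

def block_steal_time(n_bytes, grant, cycle, release):
    # An N-byte block steals N * T of bus time in cycle-stealing mode;
    # preparation of the next byte overlaps the current transfer.
    return n_bytes * per_byte_time(grant, cycle, release)

t = per_byte_time(grant=2, cycle=1, release=2)   # T = 5 time units
total = block_steal_time(100, 2, 1, 2)           # N * T = 500 time units
```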
Interleaved mode: In this technique, the DMA controller takes over the system bus when the
microprocessor is not using it, alternating half cycles: half a cycle for DMA and half a cycle for
the processor.
Computer Organization | Asynchronous input output synchronization
Asynchronous input/output is a form of input/output processing that allows other devices to do
processing before the transmission or data transfer is done. It can be achieved by two mechanisms:
1. Strobe
2. Handshaking
1. Strobe Mechanism:
1. Source initiated Strobe – The source initiates the process of data transfer. The strobe is just
a signal.
(i) First, the source puts data on the data bus and turns the strobe signal ON.
(ii) The destination, on seeing the strobe ON, reads the data from the data bus.
(iii) After the destination has read the data from the data bus, the strobe is turned OFF.
This shows that first the data is put on the data bus and then the strobe signal becomes active.
2. Destination initiated Strobe – The destination initiates the process of data transfer.
(i) First, the destination turns the strobe signal ON to ask the source to put fresh data on the data bus.
(ii) The source, on seeing the strobe ON, puts fresh data on the data bus.
(iii) The destination reads the data from the data bus and the strobe is turned OFF.
This shows that first the strobe signal becomes active and then the data is put on the data bus.
1. In source initiated strobe, it is assumed that the destination has read the data from the data
bus, but there is no guarantee.
2. In destination initiated strobe, it is assumed that the source has put the data on the data bus,
but there is no guarantee.
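A source-initiated strobe can be sketched as a sequential simulation (hypothetical names): data first, then strobe ON, the destination reads while the strobe is ON, and the strobe then goes OFF. It also shows the weakness just described: nothing tells the source whether the read really happened.

```python
class Bus:
    def __init__(self):
        self.data = None
        self.strobe = False

def source_send(bus, value):
    bus.data = value       # (i) put data on the data bus first
    bus.strobe = True      # then activate the strobe

def destination_read(bus):
    if bus.strobe:           # (ii) read only while the strobe is ON
        value = bus.data
        bus.strobe = False   # (iii) strobe goes OFF after the read
        return value
    return None              # no strobe, nothing to read

bus = Bus()
source_send(bus, 0x41)
print(destination_read(bus))  # 65
```

The source never sees the return value of `destination_read`, which is exactly why handshaking (below in the text) adds an acknowledgement signal.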
2. Handshaking Mechanism:
1. Source initiated Handshaking – The source initiates the data transfer process. It uses two
signals:
DATA VALID: when ON, indicates that the data on the data bus is valid; otherwise invalid.
DATA ACCEPTED: when ON, indicates that the data has been accepted; otherwise not accepted.
(i) The source places data on the data bus and enables the Data Valid signal.
(ii) The destination accepts the data from the data bus and enables the Data Accepted signal.
(iii) After this, the source disables the Data Valid signal, meaning the data on the data bus is no
longer valid.
(iv) The destination disables the Data Accepted signal and the process ends.
Now there is a guarantee, through the Data Accepted signal, that the destination has read the data
from the data bus.
This shows that first the data is put on the data bus, then the Data Valid signal becomes active,
and then the Data Accepted signal becomes active. After the data is accepted, first the Data Valid
signal goes off and then the Data Accepted signal goes off.
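The four steps of source-initiated handshaking can be sketched sequentially (all names are illustrative). Unlike the plain strobe, the Data Accepted signal gives the source certainty that the read happened:

```python
class HandshakeBus:
    def __init__(self):
        self.data = None
        self.data_valid = False
        self.data_accepted = False

def transfer(bus, value):
    log = []
    bus.data = value
    bus.data_valid = True       # (i) source: put data, enable Data Valid
    log.append("valid")
    received = bus.data         # (ii) destination: accept the data...
    bus.data_accepted = True    #      ...and enable Data Accepted
    log.append("accepted")
    bus.data_valid = False      # (iii) source: Data Valid OFF, data no longer valid
    bus.data_accepted = False   # (iv) destination: Data Accepted OFF, process ends
    return received, log

value, log = transfer(HandshakeBus(), 0x2A)
# value == 42 and the signal order matches the timing described above
```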
2. Destination initiated Handshaking – The destination initiates the process of data transfer. It
uses two signals:
REQUEST FOR DATA: when ON, requests that data be put on the data bus.
DATA VALID: when ON, indicates that the data on the data bus is valid; otherwise invalid.
(i) When the destination is ready to receive data, the Request for Data signal is activated.
(ii) The source, in response, puts data on the data bus and enables the Data Valid signal.
(iii) The destination then accepts the data from the data bus and, after accepting it, disables the
Request for Data signal.
(iv) Finally, the Data Valid signal is disabled, meaning the data on the data bus is no longer valid.
Now there is a guarantee, through the Data Valid signal, that the source has put the data on the
data bus.
This shows that first the Request for Data signal becomes active, then the data is put on the data
bus, and then the Data Valid signal becomes active. After the data is read, first the Request for
Data signal goes off and then the Data Valid signal.
Synchronous Data Transfer:
In synchronous data transfer, the master does not expect any acknowledgement signal from the slave
when data is sent by the master to the slave. Similarly, when data from the slave is read by the
master, neither does the slave indicate that the data has been placed on the data bus, nor does the
master acknowledge that the data has been read. Both the master and the slave perform their own
tasks of transferring data at the designed clock period. Since both devices know each other's
behaviour (response time), no difficulty arises.
Prior to transferring data, the master must logically select the slave, either by sending the
slave's address or by sending a “device select” signal to the slave. But there is no
acknowledgement signal from the slave to the master when the device is selected.
In this timing diagram, the master first places the slave's address on the address bus and the read
signal on the control line at the falling edge of the clock. The entire read operation is over in
one clock period.
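The synchronous read just described can be sketched as follows (the class and function names are assumptions): the slave responds within its known response time when selected, and no acknowledgement is ever exchanged.

```python
class Slave:
    def __init__(self, contents):
        self.contents = contents           # addresses this slave responds to

    def respond(self, addr, read):
        # Responds within its known response time if selected;
        # drives no acknowledgement signal either way.
        if read and addr in self.contents:
            return self.contents[addr]
        return None                        # not selected: bus left undriven

def master_read(slave, addr):
    # Falling clock edge: drive the address and read line, then simply sample
    # the data bus before the next edge; the read fits in one clock period.
    return slave.respond(addr, read=True)

rom = Slave({0x100: 7, 0x101: 9})
print(master_read(rom, 0x101))  # 9
```

Because the master assumes the slave's timing, a too-slow slave would silently return garbage here, which is the disadvantage the text lists next.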
Advantages –
1. The design procedure is easy. The master does not wait for any acknowledgement signal from the
slave, though it waits for a time equal to the slave's response time.
2. The slave does not generate an acknowledgement signal, though it obeys the timing rules of the
protocol set by the master or system designer.
Disadvantages –
1. If a slow-speed unit is connected to the common bus, it can degrade the overall transfer rate of
the system.
2. If the slave operates at a slow speed, the master will be idle for some time during the data
transfer, and vice versa.
The Input Output Processor (IOP) is just like a CPU that handles the details of I/O operations. It
is equipped with more facilities than a typical DMA controller. The IOP can fetch and execute its
own instructions, which are specifically designed to characterize I/O transfers. In addition to
I/O-related tasks, it can perform other processing tasks such as arithmetic, logic, branching and
code translation. The main memory unit takes the pivotal role; the IOP communicates with the
processor by means of DMA.
The Input Output Processor is a specialized processor which loads and stores data into memory along
with the execution of I/O instructions. It acts as an interface between the system and its devices.
It carries out a sequence of events to execute I/O operations and then stores the results in memory.
Advantages –
I/O devices can directly access main memory without the intervention of the processor in
IOP-based systems.
It is used to address the problems that arise in the direct memory access method.
There are three ways of organizing the buses between the CPU, memory and I/O:
1. A separate set of address, control and data buses for I/O and memory.
2. A common bus (data and address) for I/O and memory, but separate control lines.
3. A common bus (data, address and control) for I/O and memory.
The first case is simple because I/O and memory have different address spaces and instructions, but
it requires more buses.
Isolated I/O –
In isolated I/O there is a common bus (data and address) for I/O and memory, but separate read and
write control lines for I/O. When the CPU decodes an instruction whose data is for I/O, it places
the address on the address lines and activates the I/O read or write control line, causing a data
transfer between the CPU and the I/O device. Since the address spaces of memory and I/O are
isolated from each other, the scheme is so named. The I/O addresses here are called ports. There
are different read/write instructions for I/O and for memory.
Memory-mapped I/O –
In this case every bus is common, so the same set of instructions works for both memory and I/O.
Hence we manipulate I/O the same way as memory, and both share the same address space; as a result,
the addressing capability of memory becomes smaller because some part of the address space is
occupied by I/O.
Isolated I/O | Memory-mapped I/O
Separate instructions control read and write operations for I/O and for memory. | The same instructions control both I/O and memory.
I/O addresses are called ports. | Normal memory addresses serve both.
More complex, since separate logic is needed to control each. | Simpler logic, since I/O is treated just like memory.
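The contrast between the two schemes can be sketched with a toy address decoder (addresses and names are invented for illustration): in isolated I/O a separate control line selects the address space, while in memory-mapped I/O ports simply occupy part of the one address space.

```python
# Isolated I/O: memory and ports are separate spaces, so the same numeric
# address can name either a memory word or an I/O port.
memory = {0x20: 111}
ports = {0x20: 222}           # same address as a memory word, different space

def isolated_read(addr, io_line):
    # The separate I/O read/write control line selects which space is used.
    return ports[addr] if io_line else memory[addr]

# Memory-mapped I/O: a single address space, with ports carved out of it
# (here a hypothetical device register at 0xFF20).
mm_space = {0x20: 111, 0xFF20: 222}

def memory_mapped_read(addr):
    return mm_space[addr]     # an ordinary load instruction works for both

print(isolated_read(0x20, io_line=True))   # 222
print(memory_mapped_read(0x20))            # 111
```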
Types of Micro-programmed Control Unit – Based on the type of control word stored in the Control
Memory (CM), it is classified into two types: