
ADEKUNLE AJASIN UNIVERSITY, AKUNGBA-AKOKO

DEPARTMENT OF COMPUTER SCIENCE

CSC 301: Computer Architecture and Organisation

Basic Computer Instructions


Computer Organization | Basic Computer Instructions
The basic computer has a 16-bit instruction register (IR), which can hold a memory-reference, register-reference, or input/output instruction.

1. Memory Reference – These instructions use a memory address as one operand; the other
operand is always the accumulator. The format specifies a 12-bit address, a 3-bit opcode (any value
other than 111), and a 1-bit addressing mode selecting direct or indirect addressing.

Example –
The IR contains 0001XXXXXXXXXXXX, i.e. ADD. After fetching and decoding the instruction, we find
that it is a memory-reference instruction for the ADD operation.

Hence, DR ← M[AR]

AC ← AC + DR, SC ← 0
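The execute phase of the ADD micro-operations above can be sketched in a few lines of Python. This is an illustrative model, not Mano's actual hardware; the function name and the dictionary-as-memory are assumptions made for the sketch.

```python
# Sketch of the ADD execute phase above: DR <- M[AR], then AC <- AC + DR.
MASK16 = 0xFFFF  # registers are 16 bits wide, so sums wrap modulo 2^16

def execute_add(memory, ar, ac):
    """Return the new accumulator value; ar is the 12-bit address in AR."""
    dr = memory[ar]                # DR <- M[AR]: fetch the operand
    return (ac + dr) & MASK16      # AC <- AC + DR, with 16-bit wraparound

memory = {0x123: 7}
print(execute_add(memory, 0x123, 5))   # 12
```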

2. Register Reference – These instructions perform operations on registers rather than memory
addresses. IR(14–12) is 111 (differentiating it from memory-reference instructions) and IR(15) is 0
(differentiating it from input/output instructions). The remaining 12 bits specify the register operation.

Example –
The IR contains 0111001000000000, i.e. CMA. After the fetch and decode cycle we find that it is a
register-reference instruction for complementing the accumulator.

Hence, AC ← ~AC

3. Input/Output – These instructions are for communication between the computer and the outside
environment. IR(14–12) is 111 (differentiating it from memory-reference instructions) and IR(15) is 1
(differentiating it from register-reference instructions). The remaining 12 bits specify the I/O operation.

Example –
The IR contains 1111100000000000, i.e. INP. After the fetch and decode cycle we find that it is an
input/output instruction for inputting a character. Hence, a character is input from the peripheral device.
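The three decode rules above can be sketched as a small classifier. This is a minimal sketch of the decode logic only (the function name and string labels are invented for illustration); it does not execute anything.

```python
def classify(ir):
    """Classify a 16-bit basic-computer instruction word by the rules above."""
    opcode = (ir >> 12) & 0b111   # bits 14-12
    i_bit  = (ir >> 15) & 1       # bit 15
    if opcode != 0b111:
        # 12-bit address instruction; bit 15 selects direct/indirect addressing
        return "memory-reference"
    return "input/output" if i_bit else "register-reference"

print(classify(0b0001_0000_0000_0000))  # ADD -> memory-reference
print(classify(0b0111_0010_0000_0000))  # CMA -> register-reference
print(classify(0b1111_1000_0000_0000))  # INP -> input/output
```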

The set of instructions encoded in the 16-bit IR includes:

1. Arithmetic, logical and shift instructions (AND, add, complement, circulate left, circulate right, etc.)

2. To move information to and from memory (store the accumulator, load the accumulator)

3. Program control instructions with status conditions (branch, skip)

4. Input output instructions (input character, output character)

SYMBOL HEXADECIMAL CODE DESCRIPTION

AND 0xxx 8xxx AND memory word to AC
ADD 1xxx 9xxx Add memory word to AC
LDA 2xxx Axxx Load memory word to AC
STA 3xxx Bxxx Store AC content in memory
BUN 4xxx Cxxx Branch unconditionally
BSA 5xxx Dxxx Branch and save return address
ISZ 6xxx Exxx Increment and skip if 0
CLA 7800 Clear AC
CLE 7400 Clear E (extended accumulator bit)
CMA 7200 Complement AC
CME 7100 Complement E
CIR 7080 Circulate right AC and E
CIL 7040 Circulate left AC and E
INC 7020 Increment AC
SPA 7010 Skip next instruction if AC positive
SNA 7008 Skip next instruction if AC negative
SZA 7004 Skip next instruction if AC = 0
SZE 7002 Skip next instruction if E = 0
HLT 7001 Halt computer
INP F800 Input character to AC
OUT F400 Output character from AC
SKI F200 Skip on input flag
SKO F100 Skip on output flag
ION F080 Interrupt on

Memory-based vs Register-based Addressing Modes


The operation field of an instruction specifies the operation to be performed, and the operation must be
executed on data that is stored in computer registers or in memory. How operands are chosen during
program execution depends on the addressing mode of the instruction. "The addressing mode specifies a
rule for interpreting or modifying the address field of the instruction before the operand is actually
referenced." In short, the way we interpret the operand given in the instruction is known as the
addressing mode.

Addressing modes depend very much on the type of CPU organisation. There are three types of CPU
organisation:

1. Single Accumulator organisation

2. General register organisation

3. Stack organisation

Addressing modes serve one or both of the following purposes, which can also be seen as the advantages
of using addressing modes:

1. To give programming versatility to the user by providing such facilities as pointers to memory,
counter for loop control, indexing of data, and program relocation.

2. To reduce the number of bits in the addressing field of the instruction.

A number of addressing modes are available; which of them can be applied depends on the architecture
and CPU organisation.

3
MEMORY-BASED ADDRESSING MODES vs REGISTER-BASED ADDRESSING MODES

1. Memory: The operand is present in memory and its address is given in the instruction itself. This
mode takes direct advantage of the memory address, e.g., Direct addressing mode.
   Register: The operand is held in one of the registers, and the register number is given in the
instruction. Using that register number, the operand is fetched, e.g., Register mode.

2. Memory: The memory address specified in the instruction may give the address where the effective
address is stored. In this case the effective address is present at the memory address specified in the
instruction, e.g., Indirect addressing mode.
   Register: The register contains the address of the operand: the effective address is derived from the
content of the register specified in the instruction, so the register content need not be the operand
itself. This mode takes full advantage of registers, e.g., Register indirect mode.

3. Memory: The content of a base register is added to the address part of the instruction to obtain the
effective address. The base register is assumed to hold a base address, and the address field of the
instruction gives a displacement relative to that base address, e.g., Base register addressing mode.
   Register: If we have a table of data and our program needs to access all the values one by one, we
need something that decrements a register holding the address. Since a register is decremented, this
is a register-based mode, e.g., Auto-decrement mode.

4. Memory: The content of the index register is added to the address part given in the instruction to
obtain the effective address. Index mode is used to access an array whose elements are in successive
memory locations, e.g., Indexed addressing mode.
   Register: Similarly, if the table is stepped through by incrementing a register holding the address,
this is a register-based mode, e.g., Auto-increment mode.

5. Memory: The content of the program counter is added to the address part of the instruction to
obtain the effective address. The address part in this case is usually a signed number, either positive or
negative, e.g., Relative addressing mode.
   Register: Instructions used for initialising registers to a constant value are register-based, and this is
a very useful technique, e.g., Immediate mode.

Memory-based addressing modes rely mostly on a memory address and the content present at some
memory location. Register-based addressing modes rely mostly on registers and the content present in
some register, whether it is data or a memory address.
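The contrast above can be made concrete with a toy operand-fetch routine. This is a sketch over an invented mini-machine (the mode names, dictionary memory, and register file are all illustrative assumptions, not any real ISA).

```python
# Toy machine state: memory maps addresses to values, regs maps names to values.
memory = {100: 42, 42: 7}
regs   = {"R1": 100}

def operand(mode, field):
    """Fetch an operand under each addressing mode discussed above."""
    if mode == "direct":            # field is the operand's memory address
        return memory[field]
    if mode == "indirect":          # memory holds the effective address
        return memory[memory[field]]
    if mode == "register":          # field names a register holding the operand
        return regs[field]
    if mode == "register_indirect": # the register holds the effective address
        return memory[regs[field]]
    raise ValueError(mode)

print(operand("direct", 100))             # 42
print(operand("indirect", 100))           # memory[42] -> 7
print(operand("register", "R1"))          # 100
print(operand("register_indirect", "R1")) # memory[100] -> 42
```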

Addressing Modes
Addressing Modes – The term addressing mode refers to the way in which the operand of an
instruction is specified. The addressing mode specifies a rule for interpreting or modifying the address
field of the instruction before the operand is actually referenced.

Addressing modes for 8086 instructions are divided into two categories:

1) Addressing modes for data

2) Addressing modes for branch

The 8086 memory addressing modes provide flexible access to memory, allowing you to easily access
variables, arrays, records, pointers, and other complex data types. The key to good assembly language
programming is the proper use of memory addressing modes.

An assembly language program instruction consists of two parts: the operation code (opcode) and the
operand.

The memory address of an operand consists of two components:

IMPORTANT TERMS

• Starting address of the memory segment.

• Effective address or offset: an offset is determined by adding any combination of three address
elements: displacement, base, and index.

• Displacement: an 8-bit or 16-bit immediate value given in the instruction.

• Base: contents of the base register, BX or BP.

• Index: contents of the index register, SI or DI.
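The offset calculation above, and the combination of segment and offset into a physical address, can be sketched as follows. The register values in the example are purely illustrative assumptions.

```python
def effective_offset(displacement=0, base=0, index=0):
    """Offset = any combination of displacement + base (BX/BP) + index (SI/DI),
    with 16-bit wraparound as on the 8086."""
    return (displacement + base + index) & 0xFFFF

def physical_address(segment, offset):
    """Real-mode 20-bit physical address = segment * 16 + offset."""
    return ((segment << 4) + offset) & 0xFFFFF

# e.g. MOV AX, [BX + SI + 05]: base = BX, index = SI, displacement = 5
print(hex(effective_offset(displacement=0x05, base=0x1000, index=0x0020)))  # 0x1025
print(hex(physical_address(0x2000, 0x1025)))                                # 0x21025
```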

According to the different ways of specifying an operand, different addressing modes are used by the
8086 microprocessor.

Addressing modes used by 8086 microprocessor are discussed below:

• Implied mode: In implied addressing the operand is specified implicitly by the instruction itself;
no address field is needed. Zero-address instructions are designed with the implied addressing
mode.

Example: CLC (clear the carry flag; the operand, the carry flag, is implied by the opcode)

5
• Immediate addressing mode (symbol #): In this mode the data (8 or 16 bits long) is present in the
address field of the instruction itself, designed like a one-address instruction format.
Note: a limitation of immediate mode is that the range of constants is restricted by the size of the
address field.

Example: MOV AL, 35H (move the data 35H into the AL register)

• Register mode: In register addressing the operand is placed in one of the 8-bit or 16-bit general-
purpose registers. The data is in the register specified by the instruction.
Here one register reference is required to access the data.

Example: MOV AX, CX (move the contents of the CX register to the AX register)

• Register Indirect mode: In this addressing mode the operand's offset is placed in one of the
registers BX, BP, SI, or DI, as specified in the instruction. The effective address of the data is in the
base register or an index register specified by the instruction.
Here two register references are required to access the data.

The 8086 CPUs let you access memory indirectly through a register using the register indirect
addressing modes.

Example: MOV AX, [BX] (move the contents of the memory location addressed by register BX to
register AX)

• Auto-indexed (increment) mode: The effective address of the operand is the contents of a register
specified in the instruction. After accessing the operand, the contents of this register are
automatically incremented to point to the next consecutive memory location: (R1)+.
Here one register reference, one memory reference, and one ALU operation are required to access
the data.

Example:

Add R1, (R2)+   // R1 = R1 + M[R2]
                // R2 = R2 + d

Useful for stepping through arrays in a loop: R2 – start of the array, d – size of an element.
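The array-stepping pattern above can be simulated in a few lines. This is a sketch of the semantics only (the dictionary memory, register names, and element size are illustrative assumptions).

```python
def add_autoincrement(regs, memory, d):
    """Simulate Add R1, (R2)+ : use M[R2], then advance R2 by element size d."""
    regs["R1"] = regs["R1"] + memory[regs["R2"]]  # R1 = R1 + M[R2]
    regs["R2"] = regs["R2"] + d                   # R2 = R2 + d (post-increment)

memory = {200: 1, 204: 2, 208: 3}   # array of three 4-byte elements at 200
regs = {"R1": 0, "R2": 200}         # R1 accumulates, R2 points at the array
for _ in range(3):
    add_autoincrement(regs, memory, d=4)
print(regs["R1"], regs["R2"])       # 6 212
```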

• Auto-indexed (decrement) mode: The effective address of the operand is the contents of a register
specified in the instruction. Before accessing the operand, the contents of this register are
automatically decremented to point to the previous consecutive memory location: –(R1).
Here one register reference, one memory reference, and one ALU operation are required to access
the data.

Example:

Add R1, -(R2)   // R2 = R2 - d
                // R1 = R1 + M[R2]

Auto-decrement mode works like auto-increment mode, except that the register is decremented before
the access instead of incremented after it. The two together can be used to implement a stack with push
and pop; auto-increment and auto-decrement modes are useful for implementing "Last-In-First-Out"
data structures.
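The stack idea above can be sketched directly: push uses pre-decrement, pop uses post-increment. This models a downward-growing stack with an invented SP register and dictionary memory.

```python
def push(memory, regs, value, d=1):
    """-(SP): pre-decrement the stack pointer, then store."""
    regs["SP"] -= d
    memory[regs["SP"]] = value

def pop(memory, regs, d=1):
    """(SP)+: load, then post-increment the stack pointer."""
    value = memory[regs["SP"]]
    regs["SP"] += d
    return value

memory, regs = {}, {"SP": 100}
push(memory, regs, 7)
push(memory, regs, 8)
print(pop(memory, regs), pop(memory, regs))   # 8 7  (Last-In-First-Out)
```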

• Direct addressing / Absolute addressing mode (symbol [ ]): The operand's offset is given in the
instruction as an 8-bit or 16-bit displacement element. In this addressing mode the 16-bit
effective address of the data is part of the instruction.
Here only one memory reference operation is required to access the data.

Example: ADD AL, [0301] // add the contents of offset address 0301 to AL

• Indirect addressing mode (symbol @ or ( )): In this mode the address field of the instruction
contains the address of the effective address. Here two references are required:
1st reference to get the effective address.
2nd reference to access the data.

Based on where the effective address is held, indirect mode is of two kinds:

1. Register Indirect: In this mode the effective address is in a register, and the corresponding
register name is given in the address field of the instruction.
Here one register reference and one memory reference are required to access the data.

2. Memory Indirect: In this mode the effective address is in memory, and the corresponding
memory address is given in the address field of the instruction.
Here two memory references are required to access the data.

• Indexed addressing mode: The operand's offset is the sum of the content of an index register (SI
or DI) and an 8-bit or 16-bit displacement.

Example: MOV AX, [SI +05]

• Based indexed addressing mode: The operand's offset is the sum of the content of a base register
(BX or BP) and an index register (SI or DI).

Example: ADD AX, [BX+SI]

Based on Transfer of control, addressing modes are:

• PC relative addressing mode: PC relative addressing mode is used to implement intra-segment
transfer of control. In this mode the effective address is obtained by adding a displacement to
the PC.

EA = PC + address field value
PC = PC + relative value

• Base register addressing mode: Base register addressing mode is used to implement
inter-segment transfer of control. In this mode the effective address is obtained by adding the
base register value to the address field value.

EA = base register + address field value
PC = base register + relative value
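The two effective-address calculations above can be sketched as one-liners. The concrete PC and base values are illustrative assumptions.

```python
def ea_pc_relative(pc, displacement):
    """Intra-segment: EA = PC + signed displacement, with 16-bit wraparound."""
    return (pc + displacement) & 0xFFFF

def ea_base_register(base, address_field):
    """Inter-segment: EA = base register value + address field value."""
    return (base + address_field) & 0xFFFF

print(hex(ea_pc_relative(0x2000, -0x10)))      # 0x1ff0 (backward branch)
print(hex(ea_base_register(0x5000, 0x0040)))   # 0x5040
```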

Note:

1. Both PC relative and base register addressing modes are suitable for program relocation at
runtime.

2. Base register addressing mode is best suited to writing position-independent code.

Sample GATE Question

Match each of the high level language statements given on the left hand side with the most natural
addressing mode from those listed on the right hand side.

1. A[1] = B[J]; a. Indirect addressing

2. while (*A++); b. Indexed addressing

3. int temp = *x; c. Autoincrement

(A) (1, c), (2, b), (3, a)


(B) (1, a), (2, c), (3, b)
(C) (1, b), (2, c), (3, a)
(D) (1, a), (2, b), (3, c)

Answer: (C)

Explanation:

List 1                    List 2

1) A[1] = B[J];           b) Indexed addressing
Here indexing is used.

2) while (*A++);          c) Auto-increment
The memory location is automatically incremented.

3) int temp = *x;         a) Indirect addressing
Here temp is assigned the int value stored at the address contained in x.

Hence (C) is the correct solution.


Computer Organization | Von Neumann architecture


Historically, there have been two types of computers:

1. Fixed-program computers – their function is very specific and they could not be programmed,
e.g. calculators.

2. Stored-program computers – these can be programmed to carry out many different tasks;
applications are stored on them, hence the name.

Modern computers are based on the stored-program concept introduced by John von Neumann. In this
concept, programs and data are stored in the same memory unit and are treated the same way. This
novel idea meant that a computer built with this architecture would be much easier to reprogram.

The basic structure is shown below.

It is also known as the IAS computer and has three basic units:

1. The Central Processing Unit (CPU)

2. The Main Memory Unit

3. The Input/Output Device

Let us consider them in detail.

• Control Unit –

The control unit (CU) handles all processor control signals. It directs all input and output flow, fetches
code for instructions, and controls how data moves around the system.

• Arithmetic and Logic Unit (ALU) –

The arithmetic logic unit is the part of the CPU that handles all the calculations the CPU may need, e.g.
addition, subtraction, comparisons. It performs logical operations, bit-shifting operations, and
arithmetic operations.

Figure – Basic CPU structure, illustrating ALU

• Main Memory Unit (Registers) –

1. Accumulator: Stores the results of calculations made by ALU.

2. Program Counter (PC): Keeps track of the memory location of the next instructions to
be dealt with. The PC then passes this next address to Memory Address Register (MAR).

3. Memory Address Register (MAR): It stores the memory locations of instructions that
need to be fetched from memory or stored into memory.

4. Memory Data Register (MDR): It stores instructions fetched from memory or any data
that is to be transferred to, and stored in, memory.

5. Current Instruction Register (CIR): It stores the most recently fetched instruction while it
waits to be decoded and executed.

6. Instruction Buffer Register (IBR): The instruction that is not to be executed immediately
is placed in the instruction buffer register IBR.

• Input/Output Devices – Programs or data are read into main memory from the input device or
secondary storage under the control of a CPU input instruction. Output devices are used to output
information from a computer. If some result is evaluated by the computer and stored in it, then
with the help of output devices we can present it to the user.

• Buses – Data is transmitted from one part of a computer to another, connecting all major
internal components to the CPU and memory, by means of buses. Types:

1. Data Bus: It carries data among the memory unit, the I/O devices, and the processor.

2. Address Bus: It carries the address of data (not the actual data) between memory and
processor.

3. Control Bus: It carries control commands from the CPU (and status signals from other
devices) in order to control and coordinate all the activities within the computer.
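The flow of an instruction through the registers above can be sketched as a tiny fetch-decode-execute loop. This is a highly simplified model: the instruction encoding, mnemonics, and dictionary memory are all invented for illustration.

```python
# Program: load M[10] into AC, add M[11], halt. Data lives at addresses 10-11.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("HALT", None), 10: 2, 11: 3}
pc, ac = 0, 0

while True:
    mar = pc                # PC passes the next address to MAR
    mdr = memory[mar]       # memory read: contents arrive in MDR
    cir = mdr               # instruction moves to CIR for decoding
    pc += 1                 # PC now points at the next instruction
    op, addr = cir          # decode
    if op == "LOAD":
        ac = memory[addr]   # accumulator holds working data
    elif op == "ADD":
        ac += memory[addr]  # ALU result goes back to the accumulator
    elif op == "HALT":
        break

print(ac)   # 5
```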

Von Neumann bottleneck –


Whatever we do to enhance performance, we cannot get away from the fact that instructions can only
be executed one at a time and can only be carried out sequentially. Both of these factors hold back the
performance of the CPU. This is commonly referred to as the 'Von Neumann bottleneck'. We can provide
a Von Neumann processor with more cache, more RAM, or faster components, but if real gains are to be
made in CPU performance then a fundamental rethink of the CPU's design is needed.

This architecture is very important and is used in our PCs and even in supercomputers.

Interaction of a Program with Hardware


When a programmer writes a program, how is it fed to the computer and how does it actually work?
This section describes how the program code we write in a text editor is fed to the computer, given that
a computer works on only two values: 0 and 1.

Let us talk about it abstraction by abstraction, starting from writing code in a text editor.

1. We write code in text editor using any language like C++, JAVA, Python etc.

2. This code is given to the compiler, which converts it to assembly code that is very close to the
machine hardware, as it depends on the instruction set. This is then converted to binary (0s and
1s), which actually represents the digital voltages fed to transistors inside the chip.

3. Now we have the voltages that are actually required to run the hardware. These voltages
connect the correct circuitry inside the chip and perform the specific task, for example addition
or subtraction. All these operations are done by combinations of transistors; going a level lower,
flip-flops are combinations of gates, and gates are combinations of transistors.
So it all started with the invention of the transistor.

4. The chip has many circuits inside it to perform various tasks, like arithmetic and logical operations.
The computer hardware also contains RAM, another chip which stores data temporarily, and a
hard disk, which stores data permanently.

5. The operating system is also responsible for feeding the software to the right hardware, like the
keyboard, mouse, and screen.

The following picture depicts the whole process:

Simplified Instructional Computer (SIC)


Simplified Instructional Computer (SIC) is a hypothetical computer that has hardware features which are
often found in real machines. There are two versions of this machine:

1. SIC standard Model

2. SIC/XE(extra equipment or expensive)

Object programs for SIC can be properly executed on SIC/XE; this is known as upward compatibility.

SIC Machine Architecture/Components –

1. Memory –

• Memory is byte-addressable; words are addressed by the location of their lowest-numbered
byte.

• There are 2^15 bytes in computer memory (1 byte = 8 bits).

• 3 consecutive bytes = 1 word (24 bits = 1 word).

2. Registers –
There are 5 registers in SIC. Every register has an address associated with it, known as the register
number. Each register is 24 bits (3 bytes) wide, and the integer size depends on the register size.

I. A (Accumulator, number 0): used for arithmetic operations.
II. X (Index register, number 1): used for addressing.
III. L (Linkage register, number 2): stores the return address in the case of subroutines.
IV. PC (Program counter, number 8): holds the address of the next instruction to be executed.
V. SW (Status word, number 9): contains a variety of information.

Status Word Register:

• The mode bit indicates user mode (value 0) or supervisor mode (value 1). It occupies 1 bit [0].

• The state bit indicates whether the process is running (value 0) or idle (value 1). It also occupies
1 bit [1].

• The id field holds the process id (PID). It occupies 4 bits [2-5].

• The CC field holds the condition code, e.g. it tells whether a device is ready or not. It occupies 2
bits [6-7].

• The mask field holds the interrupt mask. It occupies 4 bits [8-11].

• X refers to unused bits. It occupies 4 bits [12-15].

• ICode holds the interrupt code, identifying the interrupt service routine. It occupies the
remaining bits [16-23].
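The bit layout above can be made concrete by unpacking a 24-bit status word with shifts and masks. This is a sketch assuming the bit numbering runs from the left (bit 0 is the most significant of the 24 bits); the function name is invented.

```python
def decode_sw(sw):
    """Unpack a 24-bit SIC status word per the field layout above."""
    return {
        "mode":  (sw >> 23) & 0x1,    # bit 0 (leftmost)
        "state": (sw >> 22) & 0x1,    # bit 1
        "pid":   (sw >> 18) & 0xF,    # bits 2-5
        "cc":    (sw >> 16) & 0x3,    # bits 6-7
        "mask":  (sw >> 12) & 0xF,    # bits 8-11
        "icode":  sw        & 0xFF,   # bits 16-23 (rightmost byte)
    }

# supervisor mode, condition code 2, interrupt code 0x0A
print(decode_sw((1 << 23) | (2 << 16) | 0x0A))
```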

3. Data Format –

• Integers are represented as 24-bit binary numbers.

• Negative numbers are represented in 2's complement.

• Characters are represented by 8-bit ASCII values.

• No floating-point representation is available.

4. Instruction Format –
All instructions in SIC have a 24-bit format: an 8-bit opcode, a 1-bit flag x, and a 15-bit address.

• If x = 0, direct addressing mode is used.

• If x = 1, indexed addressing mode is used.
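The role of the x bit can be sketched as a target-address calculation. This assumes the standard SIC layout of 8-bit opcode, x bit, and 15-bit address; the function name is invented.

```python
def target_address(instruction, x_register):
    """Target address of a 24-bit SIC instruction, honoring the x bit."""
    x    = (instruction >> 15) & 0x1     # x = 0 direct, x = 1 indexed
    addr =  instruction        & 0x7FFF  # 15-bit address field
    return addr + (x_register if x else 0)

# Same address field 0x100, with and without indexing (X register = 0x20)
print(hex(target_address((1 << 15) | 0x100, 0x20)))  # indexed: 0x120
print(hex(target_address(0x100, 0x20)))              # direct:  0x100
```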

5. Instruction Set –

• Load and store instructions: move data between memory and the registers, e.g. LDA, STA,
LDX, STX.

• Comparison instructions: compare data in memory with the contents of the accumulator,
e.g. COMP.

• Arithmetic instructions: perform operations on the accumulator and memory and store the
result in the accumulator, e.g. ADD, SUB, MUL, DIV.

• Conditional jumps: compare the contents of the accumulator and memory and branch based
on the condition, e.g. JLT, JEQ, JGT.

• Subroutine linkage: instructions related to subroutines, e.g. JSUB, RSUB.

6. Input and Output –

I/O is performed by transferring 1 byte at a time to or from the rightmost 8 bits of the accumulator.
Each device has a unique 8-bit code.
There are 3 I/O instructions:

• Test Device (TD) tests whether the device is ready. The condition code in the status word
register is used for this purpose: if CC is set to <, the device is ready; otherwise it is busy.

• Read Data (RD) reads a byte from the device and stores it in register A.

• Write Data (WD) writes a byte from register A to the device.
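The TD/RD pattern above amounts to polling until the device is ready, then moving one byte into A's rightmost 8 bits. The device model below (a dictionary with a ready flag and a buffer) is invented purely for illustration.

```python
def read_byte(device):
    """Poll with TD until ready, then RD one byte into A[RMB]."""
    while not device["ready"]:      # TD: loop until the condition code says ready
        pass
    return device["buffer"] & 0xFF  # RD: byte lands in A's rightmost 8 bits

device = {"ready": True, "buffer": 0x45}
print(hex(read_byte(device)))   # 0x45
```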

References:
Leland L. Beck: System Software: An Introduction to Systems Programming, 3rd Edition, Addison-Wesley, 1997.

Instruction Set used in simplified instructional Computer (SIC)


These are the instructions used in programming the Simplified Instructional Computer(SIC).

Here,
A stands for Accumulator
M stands for Memory
CC stands for Condition Code
PC stands for Program Counter
RMB stands for Right Most Byte
L stands for Linkage Register

MNEMONIC OPERAND OPCODE EXPLANATION

ADD M 18 A = A + M
AND M 40 A = A AND M
COMP M 28 compares A and M, sets CC
DIV M 24 A = A / M
J M 3C PC = M
JEQ M 30 if CC set to =, PC = M
JGT M 34 if CC set to >, PC = M
JLT M 38 if CC set to <, PC = M
JSUB M 48 L = PC ; PC = M
LDA M 00 A = M
LDCH M 50 A[RMB] = M[RMB]
LDL M 08 L = M
LDX M 04 X = M
MUL M 20 A = A * M
OR M 44 A = A OR M
RD M D8 A[RMB] = data from device specified by M
RSUB 4C PC = L
STA M 0C M = A
STCH M 54 M[RMB] = A[RMB]
STL M 14 M = L
STSW M E8 M = SW
STX M 10 M = X
SUB M 1C A = A – M
TD M E0 test device specified by M
TIX M 2C X = X + 1 ; compare X with M
WD M DC device specified by M = A[RMB]

Computer Organization | RISC and CISC


Reduced Instruction Set Computer (RISC) Architecture –
The main idea behind RISC is to make the hardware simpler by using an instruction set composed of a
few basic steps for loading, evaluating, and storing; for example, an addition is composed of separate
load, evaluate, and store instructions.

Complex Instruction Set Computer (CISC) Architecture –

The main idea is to let the hardware be complex, so that a single instruction performs the loading,
evaluating, and storing; for example, a single multiplication instruction loads the data, evaluates, and
stores the result.

Both approaches try to increase CPU performance:

• RISC: reduce the cycles per instruction, at the cost of the number of instructions per program.

• CISC: minimize the number of instructions per program, at the cost of an increase in the number
of cycles per instruction.

Earlier, when programming was done in assembly language, there was a need to make a single
instruction do more work, because programming in assembly was tedious and error-prone; this is how
the CISC architecture evolved. With the rise of high-level languages, dependency on assembly reduced
and the RISC architecture prevailed.

Characteristics of RISC –

1. Simpler instructions, hence simple instruction decoding.

2. Instructions fit within the size of one word.

3. Instructions take a single clock cycle to execute.

4. A larger number of general-purpose registers.

5. Simple addressing modes.

6. Fewer data types.

7. Pipelining can be achieved.

Characteristics of CISC –

1. Complex instructions, hence complex instruction decoding.

2. Instructions may be larger than one word.

3. Instructions may take more than a single clock cycle to execute.

4. A smaller number of general-purpose registers, as operations can be performed directly in memory.

5. Complex addressing modes.

6. More data types.

Example – Suppose we have to add two 8-bit numbers:

• CISC approach: there will be a single command or instruction for this, like ADD, which performs
the whole task.

• RISC approach: the programmer first writes load commands to move the data into registers, then
uses a suitable operation, and then stores the result in the desired location.

So the add operation is divided into parts (load, operate, store), due to which RISC programs are longer
and require more memory, but need fewer transistors because the commands are less complex.
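The contrast above can be sketched by simulating both styles over the same memory. The mini-ISAs here are invented for illustration; the comments show the instruction each step stands for.

```python
memory = {"X": 20, "Y": 22, "Z": 0}

def cisc_add(dst, a, b):
    """One complex ADD instruction: memory-to-memory in a single step."""
    memory[dst] = memory[a] + memory[b]

def risc_add(dst, a, b):
    """The same work split into load / operate / store steps."""
    r1 = memory[a]        # LOAD  r1, a
    r2 = memory[b]        # LOAD  r2, b
    r1 = r1 + r2          # ADD   r1, r2
    memory[dst] = r1      # STORE dst, r1

cisc_add("Z", "X", "Y")
print(memory["Z"])   # 42
risc_add("Z", "X", "Y")
print(memory["Z"])   # 42 (same result, more steps)
```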

Difference –

RISC | CISC

Focus on software | Focus on hardware
Transistors are used for more registers | Transistors are used for storing complex instructions
Code size is large | Code size is small
An instruction executes in a single clock cycle | An instruction may take more than one clock cycle
An instruction fits in one word | Instructions are larger than the size of one word

Microprocessors: RISC and CISC | Set 2


A microprocessor is a processing unit on a single chip. It is an integrated circuit which performs the core
functions of a computer CPU. It is a multipurpose programmable silicon chip, constructed using Metal
Oxide Semiconductor (MOS) technology, which is clock-driven and register-based. It accepts binary data
as input and provides output after processing it as per the instructions stored in memory. Modern
microprocessors can process 32 or 64 bits at a time, at speeds of billions of instructions per second.

Characteristics of a microprocessor:

• Instruction Set –
The set of complete instructions that the microprocessor executes is termed the instruction set.

• Word Length –
The number of bits processed in a single instruction is called the word length or word size. The
greater the word size, the larger the processing power of the CPU.

• System Clock Speed –
Clock speed determines how fast a single instruction can be executed by a processor. The
microprocessor's pace is controlled by the system clock. Clock speeds are generally measured in
millions of cycles per second (MHz) or thousands of millions of cycles per second (GHz). Clock
speed is considered a very important factor in predicting the performance of a processor.

Classification of Microprocessors:
Besides the classification based on word length, microprocessors are also classified by architecture,
i.e. by the instruction set. The main categories are RISC and CISC, with EPIC combining features of both.

1. RISC:
It stands for Reduced Instruction Set Computer. It is a type of microprocessor architecture that
uses a small set of instructions of uniform length. These are simple instructions which are
generally executed in one clock cycle. RISC chips are relatively simple to design and inexpensive.
The drawback of this design is that the computer has to repeatedly perform simple operations to
execute a larger program having a large number of processing operations.
Examples: SPARC, PowerPC, etc.

2. CISC:
It stands for Complex Instruction Set Computer. These processors offer users hundreds of
instructions of variable sizes. CISC architecture includes a complete set of special-purpose circuits
that carry out these instructions at very high speed. These instructions interact with memory
using complex addressing modes. CISC processors reduce program size, and hence fewer memory
cycles are required to execute a program. This increases the overall speed of execution.
Examples: Intel x86 architecture, AMD.

3. EPIC:
It stands for Explicitly Parallel Instruction Computing. This architecture combines the best features
of RISC and CISC processors and implements parallel processing of instructions. The working of
EPIC processors is supported by a set of complex instructions that contain both the basic
instructions and information for the parallel execution of instructions. This substantially increases
the efficiency of these processors.

Below are few differences between RISC and CISC:

CISC vs RISC:

1. CISC: A large number of instructions are present in the architecture.
   RISC: Very few instructions are present; the number of instructions is generally less than 100.

2. CISC: Some instructions have long execution times. These include instructions that copy an entire block from one part of memory to another and others that copy multiple registers to and from memory.
   RISC: No instruction has a long execution time, due to the very simple instruction set. Some early RISC machines did not even have an integer multiply instruction, requiring compilers to implement multiplication as a sequence of additions.

3. CISC: Variable-length encodings of the instructions. Example: IA32 instruction sizes can range from 1 to 15 bytes.
   RISC: Fixed-length encodings of the instructions are used. Example: in a typical RISC ISA such as MIPS, all instructions are encoded as 4 bytes.

4. CISC: Multiple formats are supported for specifying operands. A memory operand specifier can have many different combinations of displacement, base and index registers.
   RISC: Simple addressing formats are supported; only base-plus-displacement addressing is allowed.

5. CISC: Supports addressing modes that access array elements directly in memory.
   RISC: Does not; array accesses are built from explicit address arithmetic and loads/stores.

6. CISC: Arithmetic and logical operations can be applied to both memory and register operands.
   RISC: Arithmetic and logical operations only use register operands. Memory referencing is only allowed by load and store instructions, i.e. reading from memory into a register and writing from a register to memory, respectively.

7. CISC: Implementation details are hidden from machine-level programs; the ISA provides a clean abstraction between programs and how they get executed.
   RISC: Implementation details are exposed to machine-level programs; a few RISC machines do not allow specific instruction sequences.

8. CISC: Condition codes are used.
   RISC: No condition codes are used.

9. CISC: The stack is used for procedure arguments and return addresses.
   RISC: Registers are used for procedure arguments and return addresses; memory references can be avoided by some procedures.
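The table's note that some early RISC machines lacked an integer multiply instruction can be illustrated with a sketch of how a compiler might lower multiplication to additions and shifts. This is an illustrative sketch of the shift-and-add idea, not the output of any particular compiler:

```python
def mul_by_shift_add(a, b):
    """Multiply two non-negative integers using only additions and shifts,
    as a compiler for a RISC machine without MUL might lower the operation."""
    result = 0
    while b:
        if b & 1:            # lowest bit of b set: add the current partial product
            result = result + a
        a = a + a            # a = a * 2, expressed as an addition
        b >>= 1              # examine the next bit of b
    return result

print(mul_by_shift_add(13, 11))  # 143
```

Each iteration handles one bit of the multiplier, so the cost is a handful of simple single-cycle instructions per bit rather than one complex multi-cycle instruction.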

Single Accumulator based CPU organization


The computers present in the early days of computer history had accumulator-based CPUs. In this type of CPU organization, the accumulator register is used implicitly for processing all instructions of a program, and the results are stored into the accumulator. The instruction format used by this CPU organisation has one address field. Due to this, the CPU is known as a One Address Machine.

The main points about Single Accumulator based CPU Organisation are:

1. In this CPU Organization, the first ALU operand is always stored in the Accumulator and the second operand is present either in a register or in memory.

2. The Accumulator is the default (implicit) address; thus after data manipulation the result is stored into the accumulator.

3. One-address instructions are used in this type of organization.

The format of instruction is: Opcode + Address

Opcode indicates the type of operation to be performed.


Mainly two types of operation are performed in single accumulator based CPU organization:

1. Data transfer operation –


In this type of operation, the data is transferred from a source to a destination.

For ex: LOAD X, STORE Y

Here LOAD is a memory-read operation (data is transferred from memory to the accumulator) and STORE is a memory-write operation (data is transferred from the accumulator to memory).

2. ALU operation –
In this type of operation, arithmetic operations are performed on the data.

For ex: MULT X

where X is the address of the operand. The MULT instruction in this example performs the operation,

AC <-- AC * M[X]

AC is the Accumulator and M[X] is the memory word located at location X.
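The behaviour of a one-address machine can be sketched with a tiny simulator. The mnemonics (LOAD, ADD, MULT, STORE) and the memory contents below are illustrative assumptions, not any real machine's instruction set:

```python
# Minimal one-address (accumulator) machine sketch.
memory = {"X": 5, "Y": 7, "T": 0}
AC = 0  # the implicit accumulator operand

def execute(op, addr):
    global AC
    if op == "LOAD":    AC = memory[addr]         # AC <- M[addr]
    elif op == "ADD":   AC = AC + memory[addr]    # AC <- AC + M[addr]
    elif op == "MULT":  AC = AC * memory[addr]    # AC <- AC * M[addr]
    elif op == "STORE": memory[addr] = AC         # M[addr] <- AC

execute("LOAD", "X")
execute("MULT", "Y")    # AC <- AC * M[Y], i.e. 5 * 7
execute("STORE", "T")
print(memory["T"])      # 35
```

Note that every instruction names only one address; the accumulator is always the other operand and the destination.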

This type of CPU organization was first used in the PDP-8 processor, for process control and laboratory applications. It has since been almost totally replaced by general register based CPUs.

Advantages –

 One of the operands is always held in the accumulator register. This results in short instructions and less memory space.

 The instruction cycle takes less time because it saves time in fetching instructions from memory.

Disadvantages –

 When complex expressions are computed, program size increases due to the use of many short instructions to execute them. Thus memory usage increases.

 As the number of instructions in a program increases, the execution time increases.

Computer Organization | Stack based CPU Organization
The computers which use Stack based CPU Organization are based on a data structure called a stack. A stack is a list of data words that uses the Last In, First Out (LIFO) access method, the most popular access method in most CPUs. A register called the Stack Pointer (SP) stores the address of the topmost element of the stack.

The two main operations performed on the stack are Push and Pop. Both operations are performed at one end only.

1. Push –
This operation inserts one operand at the top of the stack and decrements the stack pointer register. The format of the PUSH instruction is:

PUSH

It inserts the data word at the specified address at the top of the stack. It can be implemented as:

// decrement SP by 1

SP <-- SP - 1

// store the content of the specified memory address

// at the top of the stack

M[SP] <-- (memory address)

2. Pop –
This operation deletes one operand from the top of the stack and increments the stack pointer register. The format of the POP instruction is:

POP

It copies the data word at the top of the stack to the specified address and removes it from the stack. It can be implemented as:

// transfer the content of M[SP] (the topmost data word)

// to the specified memory location

(memory address) <-- M[SP]

// increment SP by 1

SP <-- SP + 1

Operation-type instructions do not need an address field in this CPU organization, because the operation is performed on the two operands at the top of the stack. For example:

SUB

This instruction contains only the opcode, with no address field. It pops the top two data words from the stack, subtracts them, and pushes the result back onto the top of the stack.
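The PUSH, POP and zero-address SUB register transfers above can be sketched as follows. The stack grows downward through a small memory array; the memory size and the second-minus-top SUB convention are illustrative assumptions:

```python
M = [0] * 16          # toy memory
SP = len(M)           # stack grows downward; SP points at the top element

def push(value):
    global SP
    SP -= 1           # SP <- SP - 1
    M[SP] = value     # M[SP] <- value

def pop():
    global SP
    value = M[SP]     # value <- M[SP]
    SP += 1           # SP <- SP + 1
    return value

def sub():            # zero-address SUB: pop two operands, push the difference
    b, a = pop(), pop()
    push(a - b)

push(9); push(4)
sub()
print(pop())          # 5
```

Only PUSH and POP carry addresses; SUB finds both of its operands implicitly at the top of the stack.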

The PDP-11, Intel’s 8085 and the HP 3000 are some examples of stack-organized computers.

The advantages of Stack based CPU organization –

 Efficient computation of complex arithmetic expressions.

 Execution of instructions is fast because operand data are stored in consecutive memory
locations.

 The length of instructions is short, as they do not have an address field.

The disadvantages of Stack based CPU organization –

 The size of the program increases.

Computer Organization | General Register based CPU Organization


When multiple general-purpose registers are used instead of a single accumulator register, the organization is known as General register based CPU Organization. In this type of organization, the computer uses two or three address fields in its instruction format. Each address field may specify a general register or a memory word. For example:

MULT R1, R2, R3

This is an arithmetic multiplication instruction written in assembly language. It uses three address fields, R1, R2 and R3. The meaning of this instruction is:

R1 <-- R2 * R3

This instruction also can be written using only two address fields as:

MULT R1, R2

In this instruction, the destination register is the same as one of the source registers. This means the
operation

R1 <-- R1 * R2

The use of a large number of registers results in short programs with a limited number of instructions.

Some examples of General register based CPU Organization are IBM 360 and PDP- 11.

The advantages of General register based CPU organization –

 The efficiency of the CPU increases, as a large number of registers is used in this organization.

 Less memory space is used to store the program, since the instructions are written in a compact way.

The disadvantages of General register based CPU organization –

 Care should be taken to avoid unnecessary use of registers; compilers therefore need to be more intelligent in this respect.

 Since a large number of registers is used, extra cost is incurred in this organization.

Computer Organization | Machine Control Instruction


These types of instructions control machine functions, such as Halt, Interrupt, or do nothing. Such instructions alter the operations executed in the processor.

Following are the type of Machine control instructions:

1. NOP (No operation)

2. HLT (Halt)

3. DI (Disable interrupts)

4. EI (Enable interrupts)

5. SIM (Set interrupt mask)

6. RIM (Read interrupt mask)

1. NOP (No operation) –

Opcode: NOP
Operand: None
Length: 1 byte
M-Cycles: 1
T-states: 4
Hex code: 00

It is used when no operation is to be performed. No flags are affected during the execution of NOP. The instruction is used to introduce a time delay or to delete and insert instructions while troubleshooting.

2. HLT (Halt and enter wait state) –

Opcode: HLT
Operand: None
Length: 1 byte
M-Cycles: 2 or more
T-states: 5 or more
Hex code: 76

The microprocessor finishes executing the current instruction and halts any further execution. The contents of the registers are unaffected during the HLT state.

3. DI (Disable interrupts) –

Opcode: DI
Operand: None
Length: 1 byte
M-Cycles: 1
T-states: 4
Hex code: F3

Disable interrupt is used when the execution of a code sequence cannot be interrupted. For example, in
critical time delays, this instruction is used at the beginning of the code and the interrupts are enabled
at the end of the code. The 8085 TRAP cannot be disabled.

4. EI (Enable interrupts) –

Opcode: EI
Operand: None
Length: 1 byte
M-Cycles: 1
T-states: 4
Hex code: FB

After a system reset or the acknowledgement of an interrupt, the Interrupt Enable flip-flop is reset, thus disabling the interrupts; the EI instruction sets it again, re-enabling them.

5. SIM (Set interrupt mask) –

Opcode: SIM
Operand: None
Length: 1 byte
M-Cycles: 1
T-states: 4
Hex code: 30

The SIM instruction is used to mask or unmask the RST 7.5, 6.5 and 5.5 interrupts of the 8085 microprocessor, and also for serial data output. It does not affect the TRAP interrupt.

6. RIM (Read interrupt mask) –

Opcode: RIM
Operand: None
Length: 1 byte
M-Cycles: 1
T-states: 4
Hex code: 20

This is a multipurpose instruction used to read the status of the 8085 interrupts 7.5, 6.5 and 5.5 and to read the serial data input bit.

Instruction Design and Format


Computer Organization | Different Instruction Cycles
Prerequisite – Execution, Stages and Throughput

Registers Involved In Each Instruction Cycle:

 Memory Address Register (MAR): It is connected to the address lines of the system bus. It specifies the address in memory for a read or write operation.

 Memory Buffer Register (MBR): It is connected to the data lines of the system bus. It contains the value to be stored in memory or the last value read from memory.

 Program Counter (PC): Holds the address of the next instruction to be fetched.

 Instruction Register (IR): Holds the last instruction fetched.

The Instruction Cycle –

Each phase of the Instruction Cycle can be decomposed into a sequence of elementary micro-operations. In the above examples, there is one sequence each for the Fetch, Indirect, Execute and Interrupt Cycles.

The Indirect Cycle is always followed by the Execute Cycle. The Interrupt Cycle is always followed by
the Fetch Cycle. For both fetch and execute cycles, the next cycle depends on the state of the system.

We assume a new 2-bit register called the Instruction Cycle Code (ICC). The ICC designates the state of the processor in terms of which portion of the cycle it is in:

00 : Fetch Cycle
01 : Indirect Cycle
10 : Execute Cycle
11 : Interrupt Cycle

At the end of each cycle, the ICC is set appropriately. The above flowchart of the Instruction Cycle describes the complete sequence of micro-operations, depending only on the instruction sequence and the interrupt pattern (this is a simplified example). The operation of the processor is described as the performance of a sequence of micro-operations.
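The ICC transitions described above can be sketched as a small state function. The `indirect_needed` and `interrupt_pending` inputs stand in for the hardware conditions; this is a simplified model of the flowchart, not a real control unit:

```python
FETCH, INDIRECT, EXECUTE, INTERRUPT = 0b00, 0b01, 0b10, 0b11

def next_icc(icc, indirect_needed=False, interrupt_pending=False):
    """Return the next Instruction Cycle Code, following the flowchart."""
    if icc == FETCH:
        # after fetch, the next cycle depends on the addressing mode
        return INDIRECT if indirect_needed else EXECUTE
    if icc == INDIRECT:
        return EXECUTE        # the indirect cycle is always followed by execute
    if icc == EXECUTE:
        # after execute, the next cycle depends on pending interrupts
        return INTERRUPT if interrupt_pending else FETCH
    return FETCH              # the interrupt cycle is always followed by fetch
```

Only the fetch and execute states branch, matching the observation that for those two cycles the next cycle depends on the state of the system.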

Different Instruction Cycles:

1. The Fetch Cycle –


At the beginning of the fetch cycle, the address of the next instruction to be executed is in
the Program Counter(PC).

Step 1: The address in the program counter is moved to the memory address register(MAR), as this is
the only register which is connected to address lines of the system bus.

Step 2: The address in MAR is placed on the address bus, now the control unit issues a READ command
on the control bus, and the result appears on the data bus and is then copied into the memory buffer
register(MBR). Program counter is incremented by one, to get ready for the next instruction.(These two
action can be performed simultaneously to save time)

Step 3: The content of the MBR is moved to the instruction register(IR).

Thus, a simple Fetch Cycle consists of three steps and four micro-operations. Symbolically, we can write this sequence of events as follows:

Here ‘I’ is the instruction length. The notations (t1, t2, t3) represent successive time units. We assume that a clock is available for timing purposes and that it emits regularly spaced clock pulses. Each clock pulse defines a time unit; thus, all time units are of equal duration. Each micro-operation can be performed within the time of a single time unit.
First time unit: Move the contents of the PC to MAR.
Second time unit: Move contents of memory location specified by MAR to MBR. Increment content of
PC by I.
Third time unit: Move contents of MBR to IR.
Note: Second and third micro-operations both take place during the second time unit.
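The three-step fetch sequence can be sketched directly in terms of the registers. The toy memory holds one word-sized instruction per address, so the instruction length I is 1 word here by assumption:

```python
memory = {100: "ADD X", 101: "STORE Y"}   # toy instruction memory
PC, MAR, MBR, IR = 100, None, None, None
I = 1                                      # instruction length, in words

# t1: MAR <- PC
MAR = PC
# t2: MBR <- M[MAR]; PC <- PC + I (both performed in the same time unit)
MBR = memory[MAR]
PC = PC + I
# t3: IR <- MBR
IR = MBR

print(IR, PC)   # ADD X 101
```

After the third time unit the IR holds the fetched instruction and the PC already points at the next one.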

2. The Indirect Cycle –

Once an instruction is fetched, the next step is to fetch the source operands. Here, the source operand is fetched by indirect addressing (register-based operands need not be fetched). Once the opcode is executed, a similar process may be needed to store the result in main memory. The following micro-operations take place:

Step 1: The address field of the instruction is transferred to the MAR. This is used to fetch the address of the operand.
Step 2: The address field of the IR is updated from the MBR (so that it now contains a direct address rather than an indirect one).
Step 3: The IR is now in the same state as if indirect addressing had not been used.

Note: The IR is now ready for the execute cycle, but we skip that cycle for a moment to consider the Interrupt Cycle.

3. The Execute Cycle

The other three cycles (Fetch, Indirect and Interrupt) are simple and predictable. Each of them requires a simple, small and fixed sequence of micro-operations; in each case the same micro-operations are repeated each time around.
The Execute Cycle is different: for a machine with N different opcodes there are N different sequences of micro-operations that can occur.
Let's take a hypothetical example. Consider an add instruction:

ADD R, X

Here, this instruction adds the content of location X to register R. The corresponding micro-operations will be:

We begin with the IR containing the ADD instruction.

Step 1: The address portion of the IR is loaded into the MAR.
Step 2: The referenced memory location is read into the MBR.
Step 3: The contents of R and the MBR are added by the ALU.

Let's take a more complex example: an increment-and-skip-if-zero instruction, ISZ X.

Here, the content of location X is incremented by 1. If the result is 0, the next instruction is skipped. The corresponding sequence of micro-operations will be:

Here, the PC is incremented if (MBR) = 0. This test (is the MBR equal to zero?) and action (incrementing the PC by 1) can be implemented as one micro-operation.
Note: This test-and-action micro-operation can be performed during the same time unit in which the updated value of the MBR is stored back to memory.

4. The Interrupt Cycle:
At the completion of the Execute Cycle, a test is made to determine whether any enabled interrupt has occurred. If an enabled interrupt has occurred, the Interrupt Cycle takes place. The nature of this cycle varies greatly from one machine to another.
Let's take a sequence of micro-operations:

Step 1: The contents of the PC are transferred to the MBR, so that they can be saved for return.
Step 2: The MAR is loaded with the address at which the contents of the PC are to be saved, and the PC is loaded with the address of the start of the interrupt-processing routine.
Step 3: The MBR, containing the old value of the PC, is stored in memory.

Note: In step 2, two actions are implemented as one micro-operation. However, most processors provide multiple types of interrupts, so it may take one or more additional micro-operations to obtain the save_address and the routine_address before they are transferred to the MAR and PC respectively.

Machine Instructions
Machine Instructions are commands or programs written in machine code of a machine (computer) that
it can recognize and execute.

 A machine instruction consists of several bytes in memory that tell the processor to perform
one machine operation.

 The processor looks at machine instructions in main memory one after another, and performs
one machine operation for each machine instruction.

 The collection of machine instructions in main memory is called a machine language program.

Machine code or machine language is a set of instructions executed directly by a computer’s central
processing unit (CPU). Each instruction performs a very specific task, such as a load, a jump, or an ALU
operation on a unit of data in a CPU register or memory. Every program directly executed by a CPU is
made up of a series of such instructions.

The general format of a machine instruction is

[Label:] Mnemonic [Operand, Operand] [; Comments]

 Brackets indicate that a field is optional

 Label is an identifier that is assigned the address of the first byte of the instruction in which it
appears. It must be followed by “:”

 Inclusion of spaces is arbitrary, except that at least one space must be inserted; no space would lead to ambiguity.

 Comment field begins with a semicolon “ ; ”

Example:

Here: MOV R5,#25H ;load 25H into R5
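The general format above can be sketched as a small parser that splits a line into its optional label, mnemonic, operands and comment. The splitting rules here are only the simple ones stated in the text (label ends with ":", comment begins with ";", at least one space after the mnemonic), not a full assembler grammar:

```python
def parse_line(line):
    """Split '[Label:] Mnemonic [Operand, Operand] [; Comments]' into fields."""
    line, _, comment = line.partition(";")        # comment field begins with ';'
    label = None
    if ":" in line:
        label, _, line = line.partition(":")      # label must be followed by ':'
        label = label.strip()
    parts = line.split(None, 1)                   # at least one space after mnemonic
    mnemonic = parts[0] if parts else None
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return label, mnemonic, operands, comment.strip()

print(parse_line("Here: MOV R5,#25H ;load 25H into R5"))
# ('Here', 'MOV', ['R5', '#25H'], 'load 25H into R5')
```

Every bracketed field in the format maps to an optional element of the returned tuple.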

Machine instructions used in 8086 microprocessor

1. Data transfer instructions – move, load, exchange, input, output.

 MOV: Move byte or word to register or memory.

 IN, OUT: Input byte or word from port, output word to port.

 LEA: Load effective address.

 LDS, LES: Load pointer using data segment, extra segment.

 PUSH, POP: Push word onto stack, pop word off stack.

 XCHG: Exchange byte or word.

 XLAT: Translate byte using look-up table.

2. Arithmetic instructions – add, subtract, increment, decrement, convert byte/word and compare.

 ADD, SUB: Add, subtract byte or word

 ADC, SBB :Add, subtract byte or word and carry (borrow).

 INC, DEC: Increment, decrement byte or word.

 NEG: Negate byte or word (two’s complement).

 CMP: Compare byte or word (subtract without storing).

 MUL, DIV: Multiply, divide byte or word (unsigned).

 IMUL, IDIV: Integer multiply, divide byte or word (signed)

 CBW, CWD: Convert byte to word, word to double word

 AAA, AAS, AAM,AAD: ASCII adjust for add, sub, mul, div .

 DAA, DAS: Decimal adjust for addition, subtraction (BCD numbers)

3. Logic instructions – AND, OR, exclusive OR, shift/rotate and test

 NOT: Logical NOT of byte or word (one's complement).

 AND: Logical AND of byte or word

 OR: Logical OR of byte or word.

 XOR: Logical exclusive-OR of byte or word.

 TEST: Test byte or word (AND without storing).

Shift and rotate instructions:

 SHL, SHR: Logical shift left, right byte or word by 1 or CL.

 SAL, SAR: Arithmetic shift left, right byte or word by 1 or CL.

 ROL, ROR: Rotate left, right byte or word by 1 or CL.

 RCL, RCR: Rotate left, right through carry byte or word by 1 or CL.

4. String manipulation instructions – load, store, move, compare and scan for byte/word.

 MOVS: Move byte or word string

 MOVSB, MOVSW: Move byte, word string.

 CMPS: Compare byte or word string.

 SCAS: Scan byte or word string (comparing to AL or AX).

 LODS, STOS: Load, store byte or word string (AL or AX).

5. Control transfer instructions – conditional, unconditional, call subroutine and return from subroutine.

 JMP: Unconditional jump. This group also includes loop transfer, subroutine and interrupt instructions.

6. Loop control instructions-

 LOOP: Loop unconditional, count in CX, short jump to target address.

 LOOPE (LOOPZ): Loop if equal (zero), count in CX, short jump to target address.

 LOOPNE (LOOPNZ): Loop if not equal (not zero), count in CX, short jump to target address.

 JCXZ: Jump if CX equals zero (used to skip code in loop).

 Subroutine and interrupt instructions:

 CALL, RET: Call, return from procedure (inside or outside current segment).

 INT, INTO: Software interrupt, interrupt if overflow.

 IRET: Return from interrupt.

7. Processor control instructions-

Flag manipulation:

 STC, CLC, CMC: Set, clear, complement carry flag.

 STD, CLD: Set, clear direction flag.

 STI, CLI: Set, clear interrupt enable flag.

 PUSHF, POPF: Push flags onto stack, pop flags off stack.

Sample GATE Question

Consider the sequence of machine instructions given below:

MUL R5, R0, R1

DIV R6, R2, R3

ADD R7, R5, R6

SUB R8, R7, R4

In the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first
register stores the result of the operation performed on the second and the third registers. This
sequence of instructions is to be executed in a pipelined instruction processor with the following 4
stages: (1) Instruction Fetch and Decode (IF), (2) Operand Fetch (OF), (3) Perform Operation (PO) and (4)
Write back the Result (WB). The IF, OF and WB stages take 1 clock cycle each for any instruction. The PO
stage takes 1 clock cycle for ADD or SUB instruction, 3 clock cycles for MUL instruction and 5 clock cycles
for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage.
The number of clock cycles taken for the execution of the above sequence of instructions is
___________
(A) 11
(B) 12
(C) 13
(D) 14

Answer: (C)

Explanation:

Cycle:  1   2   3   4   5   6   7   8   9   10  11  12  13
MUL:    IF  OF  PO  PO  PO  WB
DIV:        IF  OF  -   -   PO  PO  PO  PO  PO  WB
ADD:            IF  -   -   -   -   -   -   OF  PO  WB
SUB:                IF  -   -   -   -   -   -   OF  PO  WB

(DIV stalls until the PO unit is free; ADD and SUB stall in OF until their operands are forwarded from the PO stage.)
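The cycle count in the answer can be checked with a small scheduling sketch: instructions fetch in successive cycles, an instruction's OF must wait for forwarded results (available in the producer's last PO cycle), and the single PO unit handles one instruction at a time. This is a simplified model built only for this question, not a general pipeline simulator:

```python
# (dest, sources, PO latency) for the four instructions, in program order
instrs = [("R5", ("R0", "R1"), 3),   # MUL R5, R0, R1
          ("R6", ("R2", "R3"), 5),   # DIV R6, R2, R3
          ("R7", ("R5", "R6"), 1),   # ADD R7, R5, R6
          ("R8", ("R7", "R4"), 1)]   # SUB R8, R7, R4

ready = {}          # register -> cycle in which its value can be forwarded to OF
po_free = 0         # last cycle in which the PO unit is busy
last_wb = 0
for i, (dest, srcs, lat) in enumerate(instrs):
    IF = i + 1                                           # one fetch per cycle
    OF = max([IF + 1] + [ready.get(s, 0) for s in srcs]) # wait for forwarded operands
    po_start = max(OF + 1, po_free + 1)                  # wait for the PO unit
    po_end = po_start + lat - 1
    po_free = po_end
    ready[dest] = po_end                                 # forwarded from PO to OF
    last_wb = po_end + 1

print(last_wb)  # 13
```

Running the model reproduces option (C): the SUB instruction writes back in cycle 13.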

Article contributed by Pooja Taneja.

Instruction Formats (Zero, One, Two and Three Address Instruction)


A computer performs tasks on the basis of the instructions provided. An instruction in a computer comprises groups called fields. These fields contain different pieces of information; since for computers everything is in 0s and 1s, each field has a different significance, on the basis of which the CPU decides what to perform. The most common fields are:

 Operation field, which specifies the operation to be performed, such as addition.

 Address field, which contains the location of an operand, i.e., a register or memory location.

 Mode field, which specifies how the operand is to be found.

An instruction has a varying length depending upon the number of addresses it contains. Generally, CPU organizations are of three types on the basis of the number of address fields:

1. Single Accumulator organization

2. General register organization

3. Stack organization

In the first organization, operations are done involving a special register called the accumulator. In the second, multiple registers are used for computation. In the third organization, work is done on a stack basis, due to which instructions do not contain an address field. It is not necessary that only a single organization is applied; a blend of the various organizations is mostly what we see in practice.

On the basis of the number of addresses, instructions are classified as below. Note that we will use the expression X = (A+B)*(C+D) to showcase the procedure.

1. Zero Address Instructions –

A stack-based computer does not use an address field in its instructions. To evaluate an expression, it is first converted to reverse Polish notation, i.e. postfix notation.

Expression: X = (A+B)*(C+D)

Postfix: X = AB+CD+*

TOP means top of stack

M[X] is any memory location

PUSH A TOP = A

PUSH B TOP = B

ADD TOP = A+B

PUSH C TOP = C

PUSH D TOP = D

ADD TOP = C+D

MUL TOP = (C+D)*(A+B)

POP X M[X] = TOP
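The zero-address trace above follows the usual postfix-evaluation algorithm, which can be sketched as follows (the `env` mapping stands in for the memory locations A–D; its values are illustrative):

```python
def eval_postfix(tokens, env):
    """Evaluate a postfix expression the way a stack machine would."""
    stack = []
    for tok in tokens:
        if tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)               # ADD: pop two, push the sum
        elif tok == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)               # MUL: pop two, push the product
        else:
            stack.append(env[tok])            # PUSH the operand's value
    return stack.pop()                        # POP the final result

env = {"A": 1, "B": 2, "C": 3, "D": 4}
print(eval_postfix(list("AB+CD+*"), env))     # (1+2)*(3+4) = 21
```

Each operand token becomes a PUSH and each operator becomes a zero-address instruction, exactly mirroring the trace.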


2. One Address Instructions –
These use an implied ACCUMULATOR register for data manipulation. One operand is in the accumulator and the other is in a register or memory location. Implied means that the CPU already knows that one operand is in the accumulator, so there is no need to specify it.

Expression: X = (A+B)*(C+D)

AC is accumulator

M[] is any memory location

M[T] is temporary location

LOAD A AC = M[A]

ADD B AC = AC + M[B]

STORE T M[T] = AC

LOAD C AC = M[C]

ADD D AC = AC + M[D]

MUL T AC = AC * M[T]

STORE X M[X] = AC
3. Two Address Instructions –
This is common in commercial computers. Here two addresses can be specified in the instruction. Unlike one-address instructions, where the result was stored in the accumulator, here the result can be stored at a location other than the accumulator, but this requires more bits to represent the addresses.

Here, the destination address can also contain an operand.

Expression: X = (A+B)*(C+D)

R1, R2 are registers

M[] is any memory location

MOV R1, A R1 = M[A]

ADD R1, B R1 = R1 + M[B]

MOV R2, C R2 = M[C]

ADD R2, D R2 = R2 + M[D]

MUL R1, R2 R1 = R1 * R2

MOV X, R1 M[X] = R1
4. Three Address Instructions –
These have three address fields to specify a register or a memory location. Programs created are much shorter, but the number of bits per instruction increases. These instructions make the creation of programs much easier, but this does not mean that programs will run much faster, because each instruction merely contains more information; each micro-operation (changing the content of a register, loading an address onto the address bus, etc.) is still performed in one cycle.

Expression: X = (A+B)*(C+D)

R1, R2 are registers

M[] is any memory location

ADD R1, A, B R1 = M[A] + M[B]

ADD R2, C, D R2 = M[C] + M[D]

MUL X, R1, R2 M[X] = R1 * R2

Register content and Flag status after Instructions

Basically, you are given a set of instructions and the initial contents of the registers and flags of an 8085 microprocessor. You have to find the contents of the registers and the flag status after each instruction.

Initially,

Below is the set of the instructions:


SUB A

MOV B, A

DCR B

INR B

SUI 01H

HLT

Assumption:
Each instruction will use the result of the previous instruction for the registers. Following is the description of each instruction, with register contents and flag status:

 Instruction-1:
SUB A will subtract the content of the accumulator from itself. It is used to clear the accumulator. After this operation the contents of the registers and flags will be as in the figure given below.

 Instruction-2:
MOV B, A will copy the content of the source register (A) to the destination register (B). Since it is a data transfer instruction, it will not affect any flag. After this operation the contents of the registers and flags will be as in the figure given below.

 Instruction-3:
DCR B will decrease the content of register B by 1. The DCR operation doesn't affect the carry flag (CY).

B = 00H (0000 0000)

DCR B is performed by adding the 2's complement of 01H:

  0000 0001
  1111 1110   (1's complement)
+         1
-----------
  1111 1111   (2's complement of 01H)

  1111 1111
+ 0000 0000   (content of B)
-----------
  1111 1111

(FFH) will be the content of B. So after this operation the contents of the registers and flags will be as in the figure given below.

 Instruction-4:
INR B will increase the content of register B by 1. The INR operation doesn't affect the carry flag (CY).

B = FFH (1111 1111)

  1111 1111
+ 0000 0001
-----------
1 0000 0000   (the carry out is discarded; CY is unaffected)

(00H) will be the content of register B. So after this operation the contents of the registers and flags will be as in the figure given below.

 Instruction-5:
SUI 01H will subtract 01H from the content of the accumulator and store the result in the accumulator.

A = 00H (0000 0000)

SUI 01H is performed by adding the 2's complement of 01H:

  0000 0001
  1111 1110   (1's complement)
+         1
-----------
  1111 1111   (2's complement of 01H)

  1111 1111
+ 0000 0000   (content of the accumulator)
-----------
  1111 1111

(FFH) will be stored in the accumulator. After this operation the contents of the registers and flags will be as in the figure given below.

HLT will terminate the execution of the program.
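The whole exercise can be replayed with 8-bit arithmetic masks. Only the register values are modelled here, not the full 8085 flag logic; the initial register contents are assumed arbitrary, since SUB A clears the accumulator regardless:

```python
MASK = 0xFF   # keep every result within 8 bits

A = 0x3E      # assumed arbitrary initial values; SUB A clears A regardless
B = 0x12

A = (A - A) & MASK      # SUB A   -> A = 00H, accumulator cleared
B = A                   # MOV B,A -> B = 00H (data transfer, no flags affected)
B = (B - 1) & MASK      # DCR B   -> B = FFH (wraps around; CY unaffected)
B = (B + 1) & MASK      # INR B   -> B = 00H (carry out discarded; CY unaffected)
A = (A - 0x01) & MASK   # SUI 01H -> A = FFH (a borrow occurs)

print(f"A={A:02X} B={B:02X}")   # A=FF B=00
```

The `& MASK` step plays the role of the 8-bit registers: it is what turns 00H - 1 into FFH, matching the 2's-complement working shown above.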

Microprogrammed Control
Computer Organization | Micro-Operation
In computer central processing units, micro-operations (also known as micro-ops) are the functional, or atomic, operations of a processor. These are low-level instructions used in some designs to implement complex machine instructions. They generally perform operations on data stored in one or more registers: they transfer data between registers or between the external buses of the CPU, and they perform arithmetic and logical operations on registers.
In executing a program, the operation of a computer consists of a sequence of instruction cycles, with one machine instruction per cycle. Each instruction cycle is made up of a number of smaller units – the Fetch, Indirect, Execute and Interrupt cycles. Each of these cycles involves a series of steps, each of which involves the processor registers. These steps are referred to as micro-operations; the prefix micro refers to the fact that each step is very simple and accomplishes very little. The figure below depicts the concept being discussed here.

Summary: Execution of a program consists of sequential execution of instructions. Each instruction is


executed during an instruction cycle made up of shorter sub-cycles(example – fetch, indirect, execute,
interrupt). The performance of each sub-cycle involves one or more shorter operations, that is, micro-
operations.

Microarchitecture and Instruction Set Architecture


In this article we look at what an Instruction Set Architecture (ISA) is and what the difference is between an ISA and a Microarchitecture. An ISA is defined as the design of a computer from the programmer's perspective.

This basically means that an ISA describes the design of a computer in terms of the basic operations it must support. The ISA is not concerned with the implementation-specific details of a computer; it is only concerned with the set, or collection, of basic operations the computer must support. For example, the AMD Athlon and the Core 2 Duo processors have entirely different implementations, but they support more or less the same set of basic operations as defined in the x86 Instruction Set.

Let us try to understand the Objectives of an ISA by taking the example of the MIPS ISA. MIPS is one of
the most widely used ISAs in education due to its simplicity.

1. The ISA defines the types of instructions to be supported by the processor.


Based on the type of operations they perform MIPS Instructions are classified into 3 types:

 Arithmetic/Logic Instructions:
These Instructions perform various Arithmetic & Logical operations on one or more
operands.

 Data Transfer Instructions:
These instructions are responsible for the transfer of data from memory to the
processor registers and vice versa.

 Branch and Jump Instructions:


These instructions are responsible for breaking the sequential flow of instructions and
jumping to instructions at various other locations, this is necessary for the
implementation of functions and conditional statements.

2. The ISA defines the maximum length of each type of instruction.

Since MIPS is a 32-bit ISA, each instruction must be accommodated within 32 bits.

3. The ISA defines the Instruction Format of each type of instruction.
The Instruction Format determines how the entire instruction is encoded within 32 bits.
There are 3 types of Instruction Formats in the MIPS ISA:

 R-Instruction Format

 I-Instruction Format

 J-Instruction Format

Each of the above Instruction Formats have different instruction encoding schemes, and hence need to
be interpreted differently by the processor.

If we look at the Abstraction Hierarchy:

Figure – The Abstraction Hierarchy

We note that the Microarchitectural level lies just below the ISA level and hence is concerned with the
implementation of the basic operations to be supported by the Computer as defined by the ISA.
Therefore we can say that the AMD Athlon and Core 2 Duo processors are based on the same ISA but
have different microarchitectures with different performance and efficiencies.

Now, one may ask: why do we need to distinguish between the Microarchitecture and the ISA?

The answer to this lies in the need to standardize and maintain the compatibility of programs across different hardware implementations based on the same ISA. Making different machines compatible with the same set of basic instructions (the ISA) allows the same program to run smoothly on many different machines, thereby making it easier for programmers to document and maintain code for many different machines simultaneously and efficiently.

This flexibility is the reason we first define an ISA and then design different microarchitectures complying with this ISA for implementing the machine. The design of an ISA is one of the major tasks in the study of Computer Architecture.

INSTRUCTION SET ARCHITECTURE vs MICROARCHITECTURE:

1. ISA: The ISA is responsible for defining the set of instructions to be supported by the processor. For example, some of the instructions defined by the ARMv7 ISA are given below. The branch of Computer Architecture is more inclined towards the analysis and design of Instruction Set Architectures. For example, Intel developed the x86 architecture, ARM developed the ARM architecture, and AMD developed the amd64 architecture. The RISC-V ISA developed by UC Berkeley is an example of an open-source ISA.

2. Microarchitecture: The microarchitecture is more concerned with the lower-level implementation of how the instructions are going to be executed, and deals with concepts like instruction pipelining, branch prediction and out-of-order execution. The branch of Computer Organization is concerned with the implementation of a particular ISA and deals with various hardware implementation techniques, i.e. the microarchitecture level. For example, ARM licenses other companies, like Qualcomm and Apple, to use its ARM ISA, but each of these companies has its own implementation of this ISA, thereby making them different in performance and power efficiency. The Krait cores developed by Qualcomm have one microarchitecture, and the Apple A-series processors have another.

The x86 ISA was developed by Intel, yet almost every year Intel comes up with a new generation of
i-series processors. The x86 architecture on which most Intel processors are based remains essentially
the same across all these generations; where they differ is in the underlying microarchitecture. They
differ in their implementation, and hence claim improved performance. These various
microarchitectures developed by Intel are codenamed ‘Nehalem’, ‘Sandy Bridge’, ‘Ivy Bridge’ and so on.

In conclusion, different machines may be based on the same ISA but have different microarchitectures.

Difference between CALL and JUMP instructions


The CALL instruction is used to call a subroutine. Subroutines are often used to perform tasks that
need to be performed frequently. The JUMP instruction is used to transfer program control
unconditionally to another memory location in the same program.

The differences Between CALL and JUMP instructions are:

The differences are listed pairwise below:

1. JUMP: Program control is transferred to a memory location which is in the main program.
CALL: Program control is transferred to a memory location which is not part of the main program.

2. JUMP: Uses the immediate addressing mode.
CALL: Uses the immediate + register indirect addressing mode.

3. JUMP: Initialisation of SP (Stack Pointer) is not mandatory.
CALL: Initialisation of SP is mandatory.

4. JUMP: The value of the Program Counter (PC) is not transferred to the stack.
CALL: The value of the PC is transferred to the stack.

5. JUMP: There is no return instruction after a JUMP.
CALL: There is a return instruction after a CALL.

6. JUMP: The value of SP does not change.
CALL: The value of SP is decremented by 2.

7. JUMP: 10 T-states are required to execute the instruction.
CALL: 18 T-states are required to execute the instruction.

8. JUMP: 3 machine cycles are required to execute the instruction.
CALL: 5 machine cycles are required to execute the instruction.
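The stack behaviour that distinguishes the two can be sketched in Python. This is a minimal model of an assumed 8085-style machine (16-bit addresses, stack growing downward), not a full simulator; the class and register values are illustrative:

```python
class CPU:
    def __init__(self):
        self.pc = 0x0100          # program counter
        self.sp = 0xFFFF          # stack pointer (grows downward)
        self.memory = {}          # sparse byte-addressable memory

    def jmp(self, target):
        # JMP: only the PC changes; the stack is untouched.
        self.pc = target

    def call(self, target):
        # CALL: push the return address (current PC) on the stack, then jump.
        ret = self.pc
        self.sp -= 2                                  # SP is decremented by 2
        self.memory[self.sp] = ret & 0xFF             # low byte
        self.memory[self.sp + 1] = (ret >> 8) & 0xFF  # high byte
        self.pc = target

    def ret(self):
        # RET: pop the return address back into the PC.
        lo = self.memory[self.sp]
        hi = self.memory[self.sp + 1]
        self.sp += 2
        self.pc = (hi << 8) | lo

cpu = CPU()
cpu.call(0x2000)   # enter subroutine; SP drops from 0xFFFF to 0xFFFD
cpu.ret()          # return; PC is restored to 0x0100
```

Note how `jmp` leaves SP alone while `call` pushes two bytes, which is exactly rows 3, 4 and 6 of the table above.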

Computer Organization | Hardwired v/s Micro-programmed Control Unit


To execute an instruction, the control unit of the CPU must generate the required control signals in the
proper sequence. There are two approaches to generating control signals in the proper sequence: the
hardwired control unit and the micro-programmed control unit.

Hardwired Control Unit –


The control hardware can be viewed as a state machine that changes from one state to another in every
clock cycle, depending on the contents of the instruction register, the condition codes and the external
inputs. The outputs of the state machine are the control signals. The sequence of operations carried
out by this machine is determined by the wiring of the logic elements, hence the name “hardwired”.

 Fixed logic circuits that correspond directly to the Boolean expressions are used to generate the
control signals.

 Hardwired control is faster than micro-programmed control.

 A controller that uses this approach can operate at high speed.

Micro-programmed Control Unit –

 The control signals associated with operations are stored in special memory units inaccessible
by the programmer as Control Words.

 Control signals are generated by a program, similar to machine language programs.

 Micro-programmed control unit is slower in speed because of the time it takes to fetch
microinstructions from the control memory.

Some Important Terms –

1. Control Word : A control word is a word whose individual bits represent various control signals.

2. Micro-routine : A sequence of control words corresponding to the control sequence of a


machine instruction constitutes the micro-routine for that instruction.

3. Micro-instruction : Individual control words in this micro-routine are referred to as


microinstructions.

4. Micro-program : A sequence of micro-instructions is called a micro-program, which is stored in a


ROM or RAM called a Control Memory (CM).

5. Control Store : The micro-routines for all instructions in the instruction set of a computer are
stored in a special memory called the Control Store.

Types of Micro-programmed Control Unit – Based on the type of Control Word stored in the Control
Memory (CM), it is classified into two types:

1. Horizontal Micro-programmed control Unit :


The control signals are represented in the decoded binary format, that is, 1 bit per control signal (CS).
Example: if 53 control signals are present in the processor, then 53 bits are required. More than one
control signal can be enabled at a time.

 It supports a longer control word.

 It is used in parallel processing applications.

 It allows a higher degree of parallelism. If the degree is n, then n control signals are enabled at a time.

 It requires no additional hardware (decoders), which means it is faster than the vertical
micro-programmed control unit.

2. Vertical Micro-programmed control Unit :


The control signals are represented in the encoded binary format. For N control signals, log2(N) bits are
required.

 It supports shorter control words.

 It supports easy implementation of new control signals, and is therefore more flexible.

 It allows a low degree of parallelism, i.e., the degree of parallelism is either 0 or 1.

 Requires additional hardware (decoders) to generate the control signals, which implies it is slower
than the horizontal micro-programmed control unit.
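The control-word width trade-off can be made concrete with a short sketch; the 53-signal count below is the figure from the example above, and the helper names are ours:

```python
import math

# Control-word width for a processor with N distinct control signals.
def horizontal_bits(n_signals):
    return n_signals                           # 1 bit per control signal

def vertical_bits(n_signals):
    # Signals are encoded; a decoder expands the code back into signals.
    return math.ceil(math.log2(n_signals))

print(horizontal_bits(53))   # 53 bits
print(vertical_bits(53))     # 6 bits, since 2^6 = 64 >= 53
```

The horizontal word is almost nine times wider here, which is the price paid for enabling many signals in parallel without a decoder.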

Hardwired Vs Micro-programmed Control unit | Set 2

Prerequisite – Hardwired v/s Micro-programmed Control Unit


To execute an instruction, there are two types of control units: the hardwired control unit and the
micro-programmed control unit.

1. Hardwired control units are generally faster than microprogrammed designs. In hardwired
control, we saw how all the control signals required inside the CPU can be generated using a
state counter and a PLA circuit.

2. A microprogrammed control unit is a relatively simple logic circuit that is capable of (1)
sequencing through microinstructions and (2) generating control signals to execute each
microinstruction.

The differences are listed pairwise below:

1. Hardwired: generates the control signals needed for the processor using logic circuits.
Microprogrammed: generates the control signals with the help of microinstructions stored in control
memory.

2. Hardwired: faster, as the required control signals are generated directly by hardware.
Microprogrammed: slower, as microinstructions must be fetched to generate the signals.

3. Hardwired: difficult to modify, as the control signals to be generated are hard-wired.
Microprogrammed: easy to modify, as modifications need to be made only at the microinstruction
level.

4. Hardwired: more costly, as everything has to be realised in terms of logic gates.
Microprogrammed: less costly, as only microinstructions are used for generating control signals.

5. Hardwired: cannot handle complex instructions, as the circuit design becomes too complex.
Microprogrammed: can handle complex instructions.

6. Hardwired: only a limited number of instructions can be supported, due to the hardware
implementation.
Microprogrammed: control signals for many instructions can be generated.

7. Hardwired: used in computers based on Reduced Instruction Set Computer (RISC) designs.
Microprogrammed: used in computers based on Complex Instruction Set Computer (CISC) designs.

Computer Organization | Performance of Computer


Computer performance is the amount of work accomplished by a computer system. The word
performance in computer performance means “How well is the computer doing the work it is supposed
to do?”. It basically depends on response time, throughput and execution time of a computer system.

Response time is the time from start to completion of a task. This also includes:

 Operating system overhead.

 Waiting for I/O and other processes

 Accessing disk and memory

 Time spent executing on the CPU or execution time.

Throughput is the total amount of work done in a given time.

CPU execution time is the total time a CPU spends computing on a given task. It excludes time spent on
I/O or running other programs. This is also referred to as simply CPU time.

Performance is determined by execution time as performance is inversely proportional to execution


time.

Performance = (1 / Execution time)

And,

(Performance of A / Performance of B)

= (Execution Time of B / Execution Time of A)

If Processor A is faster than Processor B, then the execution time of A is less than the execution time
of B. Therefore, the performance of A is greater than the performance of B.

Example –
Machine A runs a program in 100 seconds, Machine B runs the same program in 125 seconds

(Performance of A / Performance of B)

= (Execution Time of B / Execution Time of A)

= 125 / 100 = 1.25

That means machine A is 1.25 times faster than Machine B.

And, the time to execute a given program can be computed as:

Execution time = CPU clock cycles x clock cycle time

Since clock cycle time and clock rate are reciprocals, so,

Execution time = CPU clock cycles / clock rate

The number of CPU clock cycles can be determined by,

CPU clock cycles

= (No. of instructions / Program ) x (Clock cycles / Instruction)

= Instruction Count x CPI

Which gives,

Execution time

= Instruction Count x CPI x clock cycle time

= Instruction Count x CPI / clock rate

The units for CPU execution time work out to: (instructions / program) × (clock cycles / instruction) ×
(seconds / clock cycle) = seconds / program.
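The derivation above can be checked numerically. The function name and the 10-million-instruction program below are illustrative; the 100 s vs 125 s comparison is the machine A / machine B example from earlier in this section:

```python
# Execution time = Instruction Count x CPI / clock rate
def execution_time(instruction_count, cpi, clock_rate_hz):
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 10 million instructions, CPI of 2, 1 GHz clock.
t = execution_time(10_000_000, 2.0, 1_000_000_000)
print(t)            # 0.02 seconds

# Relative performance: machine A at 100 s vs machine B at 125 s.
speedup = 125 / 100
print(speedup)      # A is 1.25 times faster than B
```

Either lowering the CPI, raising the clock rate, or shrinking the instruction count reduces `t`, which is exactly the improvement list that follows.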

How to Improve Performance?


To improve performance you can either:

 Decrease the CPI (clock cycles per instruction) by using new hardware.

 Decrease the clock cycle time (i.e. increase the clock rate) by reducing propagation delays or by
using pipelining.

 Decrease the number of required cycles by improving the ISA or the compiler.

Computer Organization | Control Unit and design


Control Unit is the part of the computer’s central processing unit (CPU), which directs the operation of
the processor. It was included as part of the Von Neumann Architecture by John von Neumann. It is the
responsibility of the Control Unit to tell the computer’s memory, arithmetic/logic unit and input and
output devices how to respond to the instructions that have been sent to the processor. It fetches
internal instructions of the programs from the main memory to the processor instruction register, and
based on this register contents, the control unit generates a control signal that supervises the execution
of these instructions.

A control unit works by receiving input information, which it converts into control signals that are
then sent to the central processor. The computer’s processor then tells the attached hardware what
operations to perform. The functions that a control unit performs are dependent on the type of CPU
because the architecture of CPU varies from manufacturer to manufacturer. Examples of devices that
require a CU are:

 Central Processing Units (CPUs)

 Graphics Processing Units (GPUs)

Functions of the Control Unit –

1. It coordinates the sequence of data movements into, out of, and between a processor’s many
sub-units.

2. It interprets instructions.

3. It controls data flow inside the processor.

4. It receives external instructions or commands and converts them into a sequence of control
signals.

5. It controls many execution units(i.e. ALU, data buffers and registers) contained within a CPU.

6. It also handles multiple tasks, such as fetching, decoding, execution handling and storing results.

Types of Control Unit –


There are two types of control units: Hardwired control unit and Microprogrammable control unit.

1. Hardwired Control Unit –


In the Hardwired control unit, the control signals that are important for instruction execution
control are generated by specially designed hardware logical circuits, in which we can not
modify the signal generation method without physical change of the circuit structure. The
operation code of an instruction contains the basic data for control signal generation. In the
instruction decoder, the operation code is decoded. The instruction decoder constitutes a set of
many decoders that decode different fields of the instruction opcode.

As a result, a few output lines going out from the instruction decoder obtain active signal values. These
output lines are connected to the inputs of the matrix that generates control signals for executive units
of the computer. This matrix implements logical combinations of the decoded signals from the
instruction opcode with the outputs from the matrix that generates signals representing consecutive
control unit states and with signals coming from the outside of the processor, e.g. interrupt signals. The
matrices are built in a similar way to programmable logic arrays (PLAs).

Control signals for an instruction execution have to be generated not in a single time point but during
the entire time interval that corresponds to the instruction execution cycle. Following the structure of
this cycle, the suitable sequence of internal states is organized in the control unit.

A number of signals generated by the control signal generator matrix are sent back to inputs of the next
control state generator matrix. This matrix combines these signals with the timing signals, which are
generated by the timing unit based on the rectangular patterns usually supplied by the quartz generator.
When a new instruction arrives at the control unit, the control unit is in the initial state of
new-instruction fetching. Instruction decoding allows the control unit to enter the first state relating
to execution of the new instruction, which lasts as long as the timing signals and other input signals,
such as flags and state information of the computer, remain unaltered. A change in any of these
signals stimulates a change of the control unit state.

This causes a new respective input to be generated for the control signal generator matrix. When an
external signal appears (e.g. an interrupt), the control unit takes entry into the next control state,
which is the state concerned with the reaction to this external signal (e.g. interrupt processing). The
values of flags and state variables of the computer are used to select the suitable states for the
instruction execution cycle.

The last states in the cycle are control states that commence fetching the next instruction of the
program: sending the program counter content to the main memory address buffer register and next,
reading the instruction word to the instruction register of computer. When the ongoing instruction is
the stop instruction that ends program execution, the control unit enters an operating system state, in
which it waits for a next user directive.

2. Microprogrammable control unit –
The fundamental difference between these unit structures and the structure of the hardwired
control unit is the existence of the control store that is used for storing words containing
encoded control signals mandatory for instruction execution.

In microprogrammed control units, subsequent instruction words are fetched into the instruction
register in the normal way. However, the operation code of each instruction is not directly decoded to
enable immediate control signal generation; instead, it provides the initial address of a microprogram
contained in the control store.

 With a single-level control store:


In this, the instruction opcode from the instruction register is sent to the control store
address register. Based on this address, the first microinstruction of a microprogram
that interprets execution of this instruction is read to the microinstruction register. This
microinstruction contains, in its operation part, encoded control signals, normally as a few
bit fields. The fields are decoded in a set of microinstruction field decoders. The
microinstruction also contains the address of the next microinstruction of the given
instruction’s microprogram, and a control field used to control the activities of the
microinstruction address generator.

The last mentioned field decides the addressing mode (addressing operation) to be applied to the
address embedded in the ongoing microinstruction. In microinstructions with a conditional
addressing mode, this address is refined using the processor condition flags that represent the status
of computations in the current program. The last microinstruction in the microprogram of a given
instruction is the microinstruction that fetches the next instruction from the main memory into the
instruction register.
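The single-level scheme just described can be sketched as a lookup loop. The control store contents, the dispatch table and the signal names below are illustrative (they mimic the basic-computer ADD microroutine from the start of these notes), not an actual machine's microcode:

```python
# Single-level microprogrammed control: the opcode indexes the start of a
# microprogram, and each microinstruction carries control bits plus the
# address of the next microinstruction.

CONTROL_STORE = {
    # addr: (control_signals, next_addr); next_addr None ends the microroutine
    0x10: ({"AR<-PC"}, 0x11),
    0x11: ({"DR<-M[AR]"}, 0x12),
    0x12: ({"AC<-AC+DR"}, None),
}

DISPATCH = {0b0001: 0x10}     # opcode -> initial control-store address (assumed)

def run_microprogram(opcode):
    addr = DISPATCH[opcode]   # the opcode supplies the initial address
    trace = []
    while addr is not None:
        signals, addr = CONTROL_STORE[addr]
        trace.append(signals)  # in hardware these bits drive the datapath
    return trace

print(run_microprogram(0b0001))
```

Changing an instruction's behaviour here means editing table entries, not rewiring gates, which is the flexibility argument made throughout this section.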

 With a two-level control store:
In this, in a control unit with a two-level control store, besides the control memory for
microinstructions, a nano-instruction memory is included. In such a control unit,
microinstructions do not contain encoded control signals. The operation part of
microinstructions contains the address of the word in the nano-instruction memory,
which contains encoded control signals. The nano-instruction memory contains all
combinations of control signals that appear in microprograms that interpret the
complete instruction set of a given computer, written once in the form of nano-
instructions.

In this way, unnecessary storing of the same operation parts of microinstructions is avoided. In this case,
microinstruction word can be much shorter than with the single level control store. It gives a much
smaller size in bits of the microinstruction memory and, as a result, a much smaller size of the entire
control memory. The microinstruction memory contains the control for the selection of consecutive
microinstructions, while the control signals themselves are generated on the basis of nano-instructions.
In nano-instructions, control signals are frequently encoded using the 1 bit / 1 signal method, which
eliminates decoding.

Computer Organization | Horizontal micro-programmed Vs Vertical micro-programmed


control unit
Basically, the control unit (CU) is the engine that runs all the functions of a computer with the help of
control signals issued in the proper sequence. In the micro-programmed control unit approach, the control
signals that are associated with the operations are stored in special memory units. It is convenient to
think of sets of control signals that cause specific micro-operations to occur as being
“microinstructions”. The sequences of microinstructions can be stored in an internal “control”
memory.

Micro-programmed control unit can be classified into two types based on the type of Control Word
stored in the Control Memory, viz., Horizontal micro-programmed control unit and Vertical micro-
programmed control unit.

 In the Horizontal micro-programmed control unit, the control signals are represented in the
decoded binary format, i.e., 1 bit per control signal.

 In Vertical micro-programmed control unit, the control signals are represented in the encoded
binary format.

Comparison between Horizontal micro-programmed control unit and Vertical micro-programmed


control unit:

The differences are listed pairwise below:

1. Horizontal: supports a longer control word.
Vertical: supports a shorter control word.

2. Horizontal: allows a higher degree of parallelism; if the degree is n, then n control signals are
enabled at a time.
Vertical: allows a low degree of parallelism, i.e., the degree of parallelism is either 0 or 1.

3. Horizontal: no additional hardware is required.
Vertical: additional hardware in the form of decoders is required to generate the control signals.

4. Horizontal: faster than the vertical micro-programmed control unit.
Vertical: slower than the horizontal micro-programmed control unit.

5. Horizontal: less flexible than the vertical micro-programmed control unit.
Vertical: more flexible than the horizontal micro-programmed control unit.

6. Horizontal: uses horizontal microinstructions, where every bit in the control field attaches to a
control line.
Vertical: uses vertical microinstructions, where a code is used for each action to be performed and a
decoder translates this code into individual control signals.

7. Horizontal: makes less use of ROM encoding than the vertical micro-programmed control unit.
Vertical: makes more use of ROM encoding to reduce the length of the control word.

Memory Organization

Introduction to memory and memory units


Memories are made up of registers. Each register in the memory is one storage location, also called a
memory location. Memory locations are identified using addresses. The total number of bits a memory
can store is its capacity.

A storage element is called a cell. Each register is made up of storage elements, each of which stores
one bit of data. Data in a memory are stored and retrieved by the processes called writing and
reading, respectively.

A word is a group of bits that a memory unit stores and retrieves as a unit. A word consisting of 8 bits
is called a byte.
A memory unit consists of data lines, address selection lines, and control lines that specify the direction
of transfer. The block diagram of a memory unit is shown below:

Data lines provide the information to be stored in memory. The control inputs specify the direction of
transfer. The k address lines specify which word is chosen.

When there are k address lines, 2^k memory words can be accessed.
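The relation between address lines and addressable words can be sketched directly; the function names are ours and the 64K example matches the chip discussed in the next section:

```python
import math

# k address lines can select 2^k distinct words.
def words_addressable(k_lines):
    return 2 ** k_lines

# Conversely, n words require ceil(log2(n)) address lines.
def address_lines_needed(n_words):
    return math.ceil(math.log2(n_words))

print(words_addressable(10))            # 1024 words (1K)
print(address_lines_needed(64 * 1024))  # 16 lines for a 64K memory
```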


Memory Hierarchy Design and its Characteristics


In computer system design, the Memory Hierarchy is an enhancement that organises the memory so
as to minimise access time. The Memory Hierarchy was developed based on a program behaviour
known as locality of reference. The figure below demonstrates the different levels of the memory
hierarchy:

This Memory Hierarchy Design is divided into 2 main types:

1. External Memory or Secondary Memory –


Comprising magnetic disk, optical disk and magnetic tape, i.e. peripheral storage devices which
are accessible by the processor via an I/O module.

2. Internal Memory or Primary Memory –


Comprising main memory, cache memory and CPU registers. This is directly accessible by the
processor.

We can infer the following characteristics of the Memory Hierarchy Design from the figure above:

1. Capacity:
It is the global volume of information the memory can store. As we move from top to bottom in
the Hierarchy, the capacity increases.

2. Access Time:
It is the time interval between the read/write request and the availability of the data. As we
move from top to bottom in the Hierarchy, the access time increases.

3. Performance:
Earlier, when computer systems were designed without a memory hierarchy, the speed gap
between the CPU registers and main memory widened due to the large difference in access
time. This resulted in lower system performance, and an enhancement was therefore required.
This enhancement was made in the form of the Memory Hierarchy Design, which increases the
performance of the system. One of the most significant ways to increase system performance
is minimising how far down the memory hierarchy one has to go to manipulate data.

4. Cost per bit:


As we move from bottom to top in the Hierarchy, the cost per bit increases i.e. Internal Memory
is costlier than External Memory.

Difference between Byte Addressable Memory and Word Addressable Memory


Memory is a storage component in the computer used to store application programs. The memory chip
is divided into equal parts called “cells”. Each cell is uniquely identified by a binary number called its
“address”. For example, a memory chip configuration is represented as ’64K x 8′ as shown in the
figure below.

The following information can be obtained from the memory chip representation shown above:

1. Data space in the chip = 64K x 8
2. Data space in each cell = 8 bits
3. Address space in the chip = log2(64K) = 16 bits

Now we can clearly state the difference between Byte Addressable Memory & Word Addressable
Memory.

The differences are listed pairwise below:

1. Byte: when the data space in the cell = 8 bits, the corresponding address space is called a
Byte Address.
Word: when the data space in the cell = the word length of the CPU, the corresponding address space
is called a Word Address.

2. Byte: based on this byte-wise storage, the memory chip configuration is named Byte Addressable
Memory.
Word: based on this word-wise storage, the memory chip configuration is named Word Addressable
Memory.

3. Byte: e.g. a 64K x 8 chip has a 16-bit address and a cell size of 8 bits (1 byte), which means that in
this chip data is stored byte by byte.
Word: e.g. for a 16-bit CPU, a 64K x 16 chip has a 16-bit address and a cell size of 16 bits (the word
length of the CPU), which means that in this chip data is stored word by word.

NOTE :
i) The most important point to note is that, for either a Byte Address or a Word Address, the address
size can be any number of bits (it depends on the number of cells in the chip), but the cell size differs
in each case.

ii) The default memory configuration in computer design is byte addressable.
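Converting between the two addressing views for a 16-bit CPU (2 bytes per word) is a simple scaling; the constant and function names below are illustrative:

```python
# For a 16-bit CPU, one word spans 2 consecutive byte addresses.
BYTES_PER_WORD = 2

def byte_to_word_address(byte_addr):
    # The word containing a given byte address.
    return byte_addr // BYTES_PER_WORD

def word_to_byte_address(word_addr):
    # The byte address at which a given word starts.
    return word_addr * BYTES_PER_WORD

print(byte_to_word_address(4))   # byte address 4 lives in word 2
print(word_to_byte_address(3))   # word 3 starts at byte address 6
```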

Difference between Simultaneous and Hierarchical Access Memory


Organisations
In computer system design, memory organisation is primarily divided into two main types on the
basis of the manner in which the CPU accesses the different levels of memory.

These two types include Simultaneous Access Memory Organisation and Hierarchical Access Memory
Organisation. Let us understand the difference between the two from the following table:

Difference between Simultaneous and Hierarchical Access Memory Organisations:

SIMULTANEOUS ACCESS MEMORY ORGANISATION:

 In this organisation, the CPU is directly connected to all the levels of memory.

 The CPU accesses data from all levels of memory simultaneously.

 For any “miss” encountered in L1 memory, the CPU can directly access data from the higher
memory levels (i.e. L2, L3, ….., Ln).

 If H1 and H2 are the hit ratios and T1 and T2 the access times of the L1 and L2 memory levels
respectively, then the average memory access time for two levels can be calculated as:
Tavg = H1*T1 + (1 - H1)*H2*T2

HIERARCHICAL ACCESS MEMORY ORGANISATION:

 In this organisation, the CPU is always directly connected to Level-1 memory only.

 The CPU always accesses data from Level-1 memory.

 For any “miss” encountered in L1 memory, the CPU cannot directly access data from the higher
memory levels (i.e. L2, L3, ….., Ln). The desired data must first be transferred from the higher
memory level into L1 memory; only then can it be accessed by the CPU.

 With the same notation, the average memory access time for two levels can be calculated as:
Tavg = H1*T1 + (1 - H1)*H2*(T1 + T2)

NOTE:

1. By default, the memory structure of computer systems is designed with the Hierarchical Access
Memory Organisation. This is because in this type of memory organisation the average access
time is reduced due to locality of reference.

2. Simultaneous access Memory organisation is used for the implementation of Write Through
Cache.

3. In both types of memory organisation, the Hit Ratio of last memory level is always 1.
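The two average-access-time formulas can be compared numerically. This sketch assumes a two-level hierarchy with H2 = 1 at the last level (as note 3 states); the 90% hit ratio and the 1 ns / 10 ns access times are invented example figures:

```python
# Two-level average memory access time, with hit ratio 1 at the last level.
def simultaneous_amat(h1, t1, t2):
    return h1 * t1 + (1 - h1) * t2           # a miss goes straight to L2

def hierarchical_amat(h1, t1, t2):
    return h1 * t1 + (1 - h1) * (t1 + t2)    # a miss pays the L1 lookup, then L2

# Example: 90% L1 hit ratio, 1 ns L1, 10 ns L2.
print(simultaneous_amat(0.9, 1, 10))   # ~1.9 ns
print(hierarchical_amat(0.9, 1, 10))   # ~2.0 ns
```

The hierarchical organisation pays the extra T1 on every miss, yet it remains the default because locality of reference keeps H1 high.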

Computer Organization | Register Allocation


Registers are the fastest locations in the memory hierarchy. Unfortunately, this resource is limited:
registers are among the most constrained resources of the target processor. Register allocation is an
NP-complete problem. However, this problem can be reduced to graph coloring to achieve allocation
and assignment. A good register allocator therefore computes an effective approximate solution to a
hard problem.

Figure – Input-Output

The register allocator determines which values will reside in registers and which register will hold each
of those values. It takes as input a program with an arbitrary number of registers and produces a
program with a finite register set that can fit into the target machine. (See image.)

Allocation vs Assignment:

Allocation –
Maps an unlimited namespace onto the register set of the target machine.

 Reg. to Reg. Model: Maps virtual registers to physical registers but spills excess amount to
memory.

 Mem. to Mem. Model: Maps some subset of the memory locations to a set of names that
models the physical register set.

Allocation ensures that code will fit the target machine’s reg. set at each instruction.
Assignment –
Maps an allocated name set to physical register set of the target machine.

 Assumes allocation has been done so that code will fit into the set of physical registers.

 No more than ‘k’ values are assigned to registers, where ‘k’ is the number of physical
registers.

General register allocation is an NP-complete problem:

 Solved in polynomial time, when (no. of required registers) <= (no. of available physical
registers).

 An assignment can be produced in linear time using Interval-Graph Coloring.

Local Register Allocation And Assignment:


Allocation just inside a basic block is called local register allocation. There are two approaches to
local register allocation: the top-down approach and the bottom-up approach.

The Top-Down Approach is a simple approach based on ‘frequency count’: identify which values should
be kept in registers and which should be kept in memory.

Algorithm:

1. Compute a priority for each virtual register.

2. Sort the registers into priority order.

3. Assign registers in priority order.

4. Rewrite the code.
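Steps 1–3 of the algorithm above can be sketched as follows. The frequency counts, register names and the rewrite step are illustrative, with the priority simply taken to be the frequency count:

```python
def top_down_allocate(use_counts, k_physical):
    """use_counts: {virtual_register: frequency}; k_physical: machine registers.

    Returns a map from each virtual register to either a physical register
    ('r0', 'r1', ...) or 'memory' for values that must live in memory.
    """
    # Steps 1-2: priority = frequency count; sort virtual registers by priority.
    ranked = sorted(use_counts, key=use_counts.get, reverse=True)
    # Step 3: assign physical registers in priority order; the rest spill.
    assignment = {}
    for i, vreg in enumerate(ranked):
        assignment[vreg] = f"r{i}" if i < k_physical else "memory"
    return assignment

print(top_down_allocate({"v1": 9, "v2": 4, "v3": 7}, k_physical=2))
```

With only two physical registers, the least-used value (`v2`) is the one kept in memory, which is exactly the frequency-count heuristic at work.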

Moving beyond single Blocks:

 More complicated because the control flow enters the picture.

 Liveness and Live Ranges: A live range consists of a set of definitions and uses that are related
to each other; no single register can be shared by values whose live ranges overlap.

Following is a way to find live ranges in a block. A live range is represented as an interval [i, j], where
i is the definition and j is the last use.

Global Register Allocation and Assignment:


1. The main issue for a register allocator is minimizing the impact of spill code:

 Execution time for spill code.

 Code space for spill operation.

 Data space for spilled values.

2. Global allocation can’t guarantee an optimal solution for the execution time of spill code.
3. Prime differences between Local and Global Allocation:

 Structure of a global live range is naturally more complex than the local one.

 Within a global live range, distinct references may execute a different number of times. (When
basic blocks form a loop)

4. To make decisions about allocation and assignment, a global allocator mostly uses graph coloring by
building an interference graph.
5. The register allocator then attempts to construct a k-coloring for that graph, where ‘k’ is the number
of physical registers.

 If the compiler cannot directly construct a k-coloring for the graph, it modifies the underlying
code by spilling some values to memory and tries again.

 Spilling actually simplifies the graph, which ensures that the algorithm will halt.

6. A global allocator can use several approaches; here we will look at the top-down and bottom-up
allocation strategies. The subproblems associated with these approaches are:

 Discovering Global live ranges.

 Estimating Spilling Costs.

 Building an Interference graph.

Discovering Global Live Ranges:


How to discover Live range for a variable?

Figure – Discovering live ranges in a single block

Consider the register Rarp in the diagram above: it is initialised at program point 1 and its last use is at
program point 11. Therefore, the live range of Rarp, written LRarp, is [1, 11]. The other live ranges follow
similarly.

Figure – Discovering Live Ranges

Estimating Global Spill Cost:

 Essential for taking a spill decision, which includes – address computation, memory operation
cost, and estimated execution frequency.

 For performance benefits, spilled values are typically kept in the activation record.

 Some embedded processors offer scratchpad memory to hold such spilled values.

 Negative Spill Cost: A store followed immediately by a load from the same address can simply be
deleted; spilling such a value removes instructions rather than adding them, so it has a negative spill cost.

 Infinite Spill Cost: A live range has infinite spill cost if no other live range ends between
its definition and its use; spilling it would not free a register at any point where one is needed.

Interference and Interference Graph:

Figure – Building Interference Graph from Live Ranges

From the above diagram, it can be observed that the live range LRa starts in the first basic block and
ends in the last basic block. Therefore it shares an edge with every other live range, i.e. LRb, LRc, and LRd.
However, LRb, LRc, and LRd do not overlap with any live range except LRa, so each of them shares an
edge only with LRa.

Building an Allocator:

 Note that deciding whether a graph is k-colorable is an NP-complete problem, so we need an
approximation (a heuristic) for this.

 Try splitting live ranges into some non-trivial chunks (typically the most heavily used ones).

Top Down Colouring:

1. Tries to color live ranges in an order determined by some ranking function, i.e. priority based.

2. If no color is available for a live range, allocator invokes either spilling or splitting to handle
uncolored ones.

3. Live ranges having k or more neighbors are called constrained nodes and are difficult to handle.

4. The unconstrained nodes are comparatively easy to handle.

5. Handling Spills: When no color can be found for some live range, it must be spilled, though
spilling may not be the final solution; the allocator may have to iterate.

6. Live Range Splitting: For uncolored live ranges, split the live range into sub-ranges; these may have
fewer interferences than the original, so that at least some of them can be colored.

Chaitin’s Idea:

 Choose an arbitrary node of degree < k and push it onto a stack.

 Remove that node and all its edges from the graph. (This may decrease the degree of some
other nodes and cause more nodes to have degree < k.) If at some point no remaining node has
degree < k, some node has to be spilled.

 If no node needs to be spilled, successively pop nodes off the stack and color each with a color
not used by its neighbors (reusing colors as far as possible).

Coalescing copies to reduce degree:


The compiler can use the interference graph to coalesce two live ranges that are connected by a copy
operation and do not otherwise interfere. Coalescing eliminates the copy instruction and reduces the
degree of any node that interfered with both of the merged live ranges.

Figure – Coalescing Live Ranges

Comparing Top-Down and Bottom-Up allocators:

 Top-down allocator could adopt the ‘spill and iterate’ philosophy used in bottom-up ones.

 ‘Spill and iterate’ trades additional compile time for an allocation that potentially uses less spill
code.

 Top-Down uses priority ranking to order all the constrained nodes. (However, it colors the
unconstrained nodes in an arbitrary order)

 Bottom-up constructs an order in which most nodes are colored in a graph where they are
unconstrained.


Computer Organization | Cache Memory
Cache Memory is a special, very high-speed memory. It is used to speed up and synchronize with the high-
speed CPU. Cache memory costs more than main memory or disk memory but less than CPU
registers. Cache memory is an extremely fast memory type that acts as a buffer between RAM and the
CPU. It holds frequently requested data and instructions so that they are immediately available to the
CPU when needed.

Cache memory is used to reduce the average time to access data from main memory. The cache is a
smaller and faster memory which stores copies of the data from frequently used main memory
locations. There are various independent caches in a CPU, which store instructions and data.

Levels of memory:

 Level 1 or Register –
Registers hold the data and instructions that the CPU is operating on at that moment. The
most commonly used registers are the accumulator, program counter, address register, etc.

 Level 2 or Cache memory –
It is a very fast memory with a short access time, in which data is temporarily stored for
faster access.

 Level 3 or Main Memory –
It is the memory on which the computer currently works. It is small in size, and once the
power is off the data no longer stays in this memory.

 Level 4 or Secondary Memory –
It is external memory which is not as fast as main memory, but data stays permanently in
this memory.

Cache Performance:
When the processor needs to read or write a location in main memory, it first checks for a
corresponding entry in the cache.

 If the processor finds that the memory location is in the cache, a cache hit has occurred and the
data is read from the cache.

 If the processor does not find the memory location in the cache, a cache miss has occurred. For
a cache miss, the cache allocates a new entry and copies in data from main memory, then the
request is fulfilled from the contents of the cache.

The performance of cache memory is frequently measured in terms of a quantity called Hit ratio.

Hit ratio = hit / (hit + miss) = no. of hits/total accesses

We can improve cache performance by using a larger cache block size and higher associativity, and by
reducing the miss rate, the miss penalty, and the time to hit in the cache.

Cache Mapping:
There are three different types of mapping used for cache memory: direct mapping, associative
mapping, and set-associative mapping. These are explained below.

1. Direct Mapping –
The simplest technique, known as direct mapping, maps each block of main memory into only
one possible cache line. In direct mapping, each memory block is assigned to a specific line in
the cache. If a line is already occupied by a memory block when a new block needs to be loaded,
the old block is replaced. An address is split into two parts: an index field and a tag field. The
cache stores the tag field along with the data, while the index field selects the cache line. Direct
mapping's performance is directly proportional to the hit ratio.

i = j modulo m

where

i = cache line number

j = main memory block number

m = number of lines in the cache

For purposes of cache access, each main memory address can be viewed as consisting of three fields. The
least significant w bits identify a unique word or byte within a block of main memory. In most
contemporary machines, the address is at the byte level. The remaining s bits specify one of the 2^s blocks
of main memory. The cache logic interprets these s bits as a tag of s-r bits (most significant portion) and
a line field of r bits. This latter field identifies one of the m = 2^r lines of the cache.

2. Associative Mapping –
In this type of mapping, associative memory is used to store both the content and the address of
the memory word. Any block can go into any line of the cache. This means that the word id bits
are used to identify which word in the block is needed, and the tag becomes all of the remaining
bits. This enables the placement of any word at any place in the cache memory. It is
considered to be the fastest and the most flexible mapping form.

3. Set-associative Mapping –
This form of mapping is an enhanced form of direct mapping in which the drawbacks of direct
mapping are removed. Set-associative mapping addresses the problem of possible thrashing in the
direct mapping method. It does this by saying that instead of having exactly one line that a block
can map to in the cache, we will group a few lines together, creating a set. Then a block in memory
can map to any one of the lines of a specific set. Set-associative mapping allows each word
that is present in the cache to have two or more words in the main memory for the same index
address. Set-associative cache mapping combines the best of direct and associative cache
mapping techniques.

In this case, the cache consists of a number of sets, each of which consists of a number of lines. The
relationships are

m = v * k

i = j mod v

where

i = cache set number

j = main memory block number

v = number of sets

m = number of lines in the cache

k = number of lines in each set

Application of Cache Memory –

1. Usually, the cache memory can store a reasonable number of blocks at any given time,
but this number is small compared to the total number of blocks in the main memory.

2. The correspondence between the main memory blocks and those in the cache is
specified by a mapping function.

Types of Cache –

 Primary Cache –
A primary cache is always located on the processor chip. This cache is small and its
access time is comparable to that of processor registers.

 Secondary Cache –
Secondary cache is placed between the primary cache and the rest of the memory. It is
referred to as the level 2 (L2) cache. Often, the Level 2 cache is also housed on the
processor chip.

Locality of reference –
Since the size of cache memory is small compared to main memory, which part of main
memory should be given priority and loaded into the cache is decided based on locality of reference.

Types of Locality of reference

1. Spatial Locality of reference

This says that there is a good chance that the next element referenced will be in close
proximity to the current reference point, and the next time a reference is made it will be
in even closer proximity to the point of reference.

2. Temporal Locality of reference

Here a least recently used (LRU) policy is typically applied: recently referenced words
are kept in the cache because they are likely to be referenced again. Also, when a miss
occurs, not just the missing word but the complete block containing it is loaded, because
the spatial locality rule says that if you refer to a word, the neighbouring words are
likely to be referred to next.

Cache Organization | Set 1 (Introduction)


Cache is close to the CPU and faster than main memory, but at the same time smaller than main memory.
Cache organization is about mapping data in memory to a location in the cache.

A Simple Solution:
One way to go about this mapping is to use the last few bits of the long memory address as the small
cache address, and to place the block at the address so found.

Problems With Simple Solution:


The problem with this approach is that we lose the information carried by the high-order bits and have
no way to tell which higher-order bits the cached lower-order bits belong to.

Solution is Tag:
To handle the above problem, more information is stored in the cache to tell which block of memory is
stored in a cache location. We store this additional information as a Tag.

What is a Cache Block?


Programs have spatial locality (once a location is retrieved, it is highly probable that the nearby
locations will be retrieved in the near future), so a cache is organized in the form of blocks. Typical cache
block sizes are 32 bytes or 64 bytes.

The above arrangement is a Direct Mapped Cache and it has the following problem.
We have discussed above that the last few bits of a memory address are used as the address in the cache and
the remaining bits are stored as a tag. Now imagine that the cache is very small and cache addresses are just
2 bits. Suppose we use the last two bits of the main memory address to decide the cache line (as shown in the
below diagram). So if a program accesses 2, 6, 2, 6, 2, …, every access would cause a miss, because 2 and 6
map to the same location in the cache.

Solution to above problem – Associativity


What if we could store data at any place in the cache? The above problem would not arise, but a fully
associative search would slow the cache down, so we do something in between.

Multilevel Cache Organisation


Cache is a random access memory used by the CPU to reduce the average time taken to access memory.
Multilevel caching is one of the techniques to improve cache performance by reducing the “MISS
PENALTY”. Miss penalty refers to the extra time required to bring the data into the cache from main
memory whenever there is a “miss” in the cache.
For a clear understanding, let us consider an example where the CPU requires 10 memory references for
accessing the desired information, and consider this scenario in the following 3 cases of system design:

Case 1 : System Design without Cache Memory

Here the CPU directly communicates with the main memory and no caches are involved.
In this case, the CPU needs to access the main memory 10 times to access the desired information.

Case 2 : System Design with Cache Memory

Here the CPU first checks whether the desired data is present in the cache memory or not, i.e. whether
there is a “hit” or a “miss” in the cache. Suppose there are 3 misses in the cache memory; then main
memory will be accessed only 3 times. We can see that the miss penalty is reduced because main
memory is accessed fewer times than in the previous case.

Case 3 : System Design with Multilevel Cache Memory

Here the cache performance is optimized further by introducing multilevel caches. As shown in the
above figure, we are considering a 2-level cache design. Suppose there are 3 misses in the L1 cache
and, out of these 3 misses, 2 are also misses in the L2 cache; then main memory will be
accessed only 2 times. It is clear that the miss penalty is reduced considerably compared with the
previous case, thereby improving the performance of the cache memory.

NOTE :
We can observe from the above 3 cases that we are trying to decrease the number of Main Memory
References and thus decreasing the Miss Penalty in order to improve the overall System Performance.
Also, it is important to note that in the multilevel cache design, the L1 cache is attached to the CPU and it is
small in size but fast, while the L2 cache is attached to the primary cache, i.e. the L1 cache, and is larger in
size and slower, but still faster than the main memory.

Computer Organization | Locality and Cache friendly code

Caches are faster memories built to deal with the processor-memory gap in data read operations, i.e.
the time difference between a data read from a CPU register and one from main memory. A data read
from a register is generally around 100 times faster than one from main memory, and the gap keeps
increasing substantially as we go down the memory hierarchy.

Caches are installed in the middle of CPU registers and the main memory to bridge this time gap in data
reading. Caches serve as temporary staging area for a subset of data and instructions stored in relatively
slow main memory. Since the size of cache is small, only the data which is frequently used by the
processor during the execution of a program is stored in cache. Caching of this frequently used data by
the CPU eliminates the need to bring the data from the slower main memory again and again, which takes

The idea of caching the useful data centers around a fundamental property of computer programs
known as locality. Programs with good locality tend to access the same set of data items over and over
again from the upper levels of the memory hierarchy (i.e. cache) and thus run faster.

Example: The run time of different matrix multiplication kernels that perform the same number of
arithmetic operations, but have different degrees of locality, can vary by a factor of 20!

Types of Locality:

 Temporal locality –
Temporal locality states that the same data objects are likely to be reused multiple times by the
CPU during the execution of a program. Once a data object has been written into the cache on
the first miss, a number of subsequent hits on that object can be expected. Since the cache is
faster than the storage at the next lower level like the main memory, these subsequent hits can
be served much faster than the original miss.

 Spatial locality –
It states that if a data object is referenced once, then there is a high probability that its
neighboring data objects will also be referenced in the near future. Memory blocks usually contain
multiple data objects. Because of spatial locality, we can expect that the cost of copying a block
after a miss will be amortized by subsequent references to other objects within that block.

Importance of Locality –
Locality in programs has an enormous impact on the design and performance of hardware and software
systems. In modern computing systems, the advantages of locality are not confined to the
architecture; operating systems and application programs are also built in a manner that
exploits locality to the full extent.

In operating systems, the principle of locality allows the system to use main memory as a cache of the
most recently referenced chunk of virtual address space and also in case of recently used disk blocks in
disk file systems.

Similarly, application programs like web browsers exploit temporal locality by caching recently
referenced documents on a local disk. High-volume web servers hold recently requested documents in
a front-end disk cache that satisfies requests for these documents without any intervention of the server.

Cache Friendly Code –


Programs with good locality generally run faster, as they have a lower cache miss rate in comparison with
ones with bad locality. In good programming practice, cache performance is always counted as one
of the important factors when it comes to the analysis of the performance of a program. The basic
approach to making code cache friendly is:

 Frequently used cases need to be fast: Programs often spend most of their time in a few core
functions, and these functions in turn spend most of it in loops. So, these loops should
be designed in a way that they possess good locality.

 Multiple loops: If a program consists of multiple loops, then minimize the cache misses in the
innermost loop to improve the performance of the code.

Example-1: The above context can be understood by following simple examples of multi-
dimensional array code. Consider the sum_array_rows() function, which sums the elements of a
two-dimensional array in row-major order:

int sum_array_rows(int a[8][4])
{
    int i, j, sum = 0;

    for (i = 0; i < 8; i++)
        for (j = 0; j < 4; j++)
            sum += a[i][j];

    return sum;
}

Assume the cache has a block size of 4 words, each word being 4 bytes. The cache is initially empty and,
since C stores arrays in row-major order, the references will result in the following pattern of hits and
misses, independent of cache organization.

The block which contains a[0][0]–a[0][3] is loaded into the cache from memory, and the reference to a[0][0]
is a miss, but the next three references are all hits. The reference to a[1][0] causes another miss as a new
block is loaded into the cache, the next three references are hits, and so on. In general, three out of four
references will hit, which is the best that can be done with a cold cache. Thus, the hit ratio is 3/4 * 100 =
75%

Example-2: Now, the sum_array_cols() function sums the elements of a two-dimensional array in
column-major order.

int sum_array_cols(int a[8][8])
{
    int i, j, sum = 0;

    for (j = 0; j < 8; j++)
        for (i = 0; i < 8; i++)
            sum += a[i][j];

    return sum;
}

The cache layout of the program will be as shown in the figure:

C stores arrays in row-major order, but in this case the array is being accessed in column-major order, so
locality is spoiled. The references will be made in the order: a[0][0], a[1][0], a[2][0] and so on. As
the cache size is small, each reference will be a miss due to the poor locality of the program.
Hence, the hit ratio will be 0. A poor hit ratio will eventually decrease the performance of a program and
lead to slower execution. In programming, this type of practice should be avoided.

Conclusion –
When talking about real-life application programs and programming realms, optimized cache
performance gives a good speedup to a program, even if the runtime complexity of the program is high.
A good example is Quicksort. Though it has a worst-case complexity of O(n^2), it is one of the most
popular sorting algorithms, and one of the important factors is its better cache performance compared
with many other sorting algorithms. Code should be written in a way that exploits the cache to the best
extent for faster execution.

Computer Organization | Locality of Reference and Cache Operation


Locality of reference refers to a phenomenon in which a computer program tends to access the same set of
memory locations over a particular time period. In other words, locality of reference refers to the
tendency of a computer program to access instructions whose addresses are near one another. The
property of locality of reference is mainly shown by loops and subroutine calls in a program.

1. In the case of loops in a program, the control processing unit repeatedly refers to the set of
instructions that constitute the loop.

2. In the case of subroutine calls, the same set of instructions is fetched from memory every time
the subroutine is called.

3. References to data items also get localized, which means the same data item is referenced again
and again.

In the above figure, you can see that the CPU wants to read or fetch data or an instruction. First it accesses
the cache memory, as it is near to it and provides very fast access. If the required data or instruction is
found, it is fetched. This situation is known as a cache hit. But if the required data or instruction is
not found in the cache memory, this situation is known as a cache miss. Main memory is then
searched for the required data or instruction, and if it is found, one of two things happens:

1. The first way is that the CPU fetches the required data or instruction and uses it, and that is it; but
what happens when the same data or instruction is required again? The CPU has to access the same
main memory location again, and we already know that main memory is the slowest to access.

2. The second way is to store the data or instruction in the cache memory so that if it is needed
soon again in near future it could be fetched in a much faster way.

Cache Operation:
It is based on the principle of locality of reference. There are two ways with which data or instruction is
fetched from main memory and get stored in cache memory. These two ways are following:

1. Temporal Locality –
Temporal locality means current data or instruction that is being fetched may be needed soon.
So we should store that data or instruction in the cache memory so that we can avoid again
searching in main memory for the same data.

When the CPU accesses the current main memory location to read the required data or instruction, the
data is also stored in cache memory, based on the fact that the same data or instruction may be needed
in the near future. This is known as temporal locality.

2. Spatial Locality –
Spatial locality means that an instruction or data item near the current memory location being
fetched may be needed soon. This is slightly different from temporal locality: here
we are talking about nearby memory locations, while in temporal locality we were talking
about the actual memory location that was being fetched.

Cache Performance:
The performance of the cache is measured in terms of hit ratio. When the CPU refers to memory and finds
the data or instruction in the cache memory, it is known as a cache hit. If the desired data or
instruction is not found in the cache memory and the CPU refers to main memory to find it, it is
known as a cache miss.

Hit + Miss = Total CPU Reference

Hit Ratio(h) = Hit / (Hit+Miss)

Consider a memory system with two levels: cache and main memory. If Tc is the time
to access cache memory and Tm is the time to access main memory, then we can write:

Tavg = Average time to access memory

Tavg = h*Tc + (1-h)*(Tm+Tc)

What’s difference between CPU Cache and TLB?


Both CPU Cache and TLB are hardware used in microprocessors but what’s the difference, especially
when someone says that TLB is also a type of Cache?

First things first. The CPU cache is a fast memory which is used to improve the latency of fetching
information from main memory (RAM) to CPU registers. So the CPU cache sits between main memory
and the CPU. And this
cache stores information temporarily so that the next access to the same information is faster. A CPU
cache which used to store executable instructions, it’s called Instruction Cache (I-Cache). A CPU cache
which is used to store data, it’s called Data Cache (D-Cache). So I-Cache and D-Cache speeds up fetching
time for instructions and data respectively. A modern processor contains both I-Cache and D-Cache. For
completeness, let us discuss about D-cache hierarchy as well. D-Cache is typically organized in a
hierarchy i.e. Level 1 data cache, Level 2 data cache etc.. It should be noted that L1 D-Cache is
faster/smaller/costlier as compared to L2 D-Cache. But the basic idea of ‘CPU cache‘ is to speed up
instruction/data fetch time from Main memory to CPU.

Translation Lookaside Buffer (i.e. TLB) is required only if Virtual Memory is used by a processor. In
short, TLB speeds up translation of virtual address to physical address by storing page-table in a faster
memory. In fact, the TLB also sits between the CPU and main memory. Precisely speaking, the TLB is used
by the MMU when a virtual address needs to be translated to a physical address. By keeping this mapping
of virtual-to-physical addresses in a fast memory, access to the page-table improves. It should be noted
that the page-table
(which itself is stored in RAM) keeps track of where virtual pages are stored in the physical memory. In
that sense, TLB also can be considered as a cache of the page-table.

But the scope of operation for TLB and CPU Cache is different. TLB is about ‘speeding up address
translation for Virtual memory’ so that the page-table needn’t be accessed for every address. CPU Cache
is about ‘speeding up main memory access latency’ so that RAM isn’t accessed always by the CPU. TLB
operation comes at the time of address translation by MMU while CPU cache operation comes at the
time of memory access by CPU. In fact, any modern processor deploys all I-Cache, L1 & L2 D-Cache and
TLB.


Different Types of RAM (Random Access Memory )
RAM (Random Access Memory) is a part of the computer’s main memory which is directly accessible by the
CPU. RAM is used to read and write data, and it is accessed by the CPU randomly. RAM is volatile in
nature: if the power goes off, the stored information is lost. RAM is used to store the data that
is currently being processed by the CPU. Most programs and data that are modifiable are stored in
RAM.

Integrated RAM chips are available in two forms:

1. SRAM(Static RAM)

2. DRAM(Dynamic RAM)

The block diagram of RAM chip is given below.

SRAM

The SRAM memories consist of circuits capable of retaining the stored information as long as the power
is applied. That means this type of memory requires constant power. SRAM memories are used to build
Cache Memory.

SRAM Memory Cell: Static memories (SRAM) are memories that consist of circuits capable of retaining
their state as long as power is on; this type of memory is therefore volatile. The below
figure shows the cell diagram of an SRAM. A latch is formed by two inverters connected as shown in the
figure. Two transistors T1 and T2 are used for connecting the latch to two bit lines. The purpose of
these transistors is to act as switches that can be opened or closed under the control of the word line,
which is controlled by the address decoder. When the word line is at 0-level, the transistors are turned
off and the latch retains its information. For example, the cell is in state 1 if the logic value at point A is
1 and at point B is 0. This state is retained as long as the word line is not activated.

For Read operation, the word line is activated by the address input to the address decoder. The
activated word line closes both transistors (switches) T1 and T2. Then the bit values at points A and
B can be transmitted to their respective bit lines. The sense/write circuit at the end of the bit lines sends the
output to the processor.
For the Write operation, the address provided to the decoder activates the word line to close both
switches. Then the bit value that is to be written into the cell is provided through the sense/write circuit,
and the signals on the bit lines are then stored in the cell.

DRAM

DRAM stores binary information in the form of electric charge applied to capacitors. The charge
stored on the capacitors tends to leak away over a period of time, so the capacitors must be
periodically refreshed to retain the data. Main memory is generally made up of DRAM chips.

DRAM Memory Cell: Though SRAM is very fast, it is expensive because every cell requires
several transistors. A relatively less expensive RAM is DRAM, due to the use of one transistor and one
capacitor in each cell, as shown in the below figure, where C is the capacitor and T is the transistor.
Information is stored in a DRAM cell in the form of a charge on a capacitor, and this charge needs to be
periodically refreshed.
For storing information in this cell, transistor T is turned on and an appropriate voltage is applied to the
bit line. This causes a known amount of charge to be stored in the capacitor. After the transistor is
turned off, due to the property of the capacitor, it starts to discharge. Hence, the information stored in
the cell can be read correctly only if it is read before the charge on the capacitors drops below some
threshold value.

Types of DRAM

There are mainly 5 types of DRAM:

1. Asynchronous DRAM (ADRAM): The DRAM described above is the asynchronous type DRAM.
The timing of the memory device is controlled asynchronously. A specialized memory controller
circuit generates the necessary control signals to control the timing. The CPU must take into
account the delay in the response of the memory.

2. Synchronous DRAM (SDRAM): These RAM chips’ access speed is directly synchronized with the
CPU’s clock. For this, the memory chips remain ready for operation when the CPU expects them
to be ready. These memories operate at the CPU-memory bus without imposing wait states.
SDRAM is commercially available as modules incorporating multiple SDRAM chips and forming
the required capacity for the modules.

3. Double-Data-Rate SDRAM (DDR SDRAM): This faster version of SDRAM performs its operations
on both edges of the clock signal; whereas a standard SDRAM performs its operations on the
rising edge of the clock signal. Since they transfer data on both edges of the clock, the data
transfer rate is doubled. To access the data at high rate, the memory cells are organized into
two groups. Each group is accessed separately.

4. Rambus DRAM (RDRAM): The RDRAM provides a very high data transfer rate over a narrow
CPU-memory bus. It uses various speedup mechanisms, like synchronous memory interface,
caching inside the DRAM chips and very fast signal timing. The Rambus data bus width is 8 or 9
bits.

5. Cache DRAM (CDRAM): This memory is a special type DRAM memory with an on-chip cache
memory (SRAM) that acts as a high-speed buffer for the main DRAM.

Difference between SRAM and DRAM

Below table lists some of the differences between SRAM and DRAM:

Operating System | Secondary memory – Hard disk drive


A hard disk is a memory storage device which looks like this:

The disk is divided into tracks. Each track is further divided into sectors. The point to be noted here is
that the outer tracks are bigger in size than the inner tracks, but they contain the same number of sectors
and have equal storage capacity. This is because the storage density is high in the sectors of the inner
tracks, whereas the bits are sparsely arranged in the sectors of the outer tracks. Some space in every
sector is used for formatting, so the actual capacity of a sector is less than the stated capacity.

The Read-Write (R-W) head moves over the rotating hard disk. It is this head that performs all
read and write operations on the disk, so the position of the R-W head is a major concern. To
perform a read or write operation on a memory location, we need to place the R-W head over that
position. Some important terms must be noted here:

1. Seek time – The time taken by the R-W head to reach the desired track from its current
position.

2. Rotational latency – Time taken by the sector to come under the R-W head.

3. Data transfer time – Time taken to transfer the required amount of data. It depends upon the
rotational speed.

4. Controller time – The processing time taken by the controller.

5. Average Access time – seek time + Average Rotational latency + data transfer time + controller
time.

In questions, if the seek time and controller time are not mentioned, take them to be zero.

If the amount of data to be transferred is not given, assume that no data is being transferred.
Otherwise, calculate the time taken to transfer the given amount of data.

The average rotational latency is used when the current position of the R-W head is not given:
the head may already be at the desired sector, or it may take a whole rotation to bring the
desired sector under it. If the current position of the R-W head is given, the exact rotational
latency must be calculated instead.

Example –
Consider a hard disk with:
4 surfaces
64 tracks/surface
128 sectors/track
256 bytes/sector

1. What is the capacity of the hard disk?


Disk capacity = surfaces * tracks/surface * sectors/track * bytes/sector
Disk capacity = 4 * 64 * 128 * 256
Disk capacity = 8 MB

2. The disk is rotating at 3600 RPM, what is the data transfer rate?
60 sec -> 3600 rotations
1 sec -> 60 rotations
Data transfer rate = number of rotations per second * track capacity * number of surfaces (since
1 R-W head is used for each surface)
Data transfer rate = 60 * 128 * 256 * 4
Data transfer rate = 7.5 MB/sec

3. The disk is rotating at 3600 RPM, what is the average access time?
Since the seek time, controller time, and the amount of data to be transferred are not
given, we take all three terms as 0.
Therefore, Average access time = Average rotational latency

Rotational latency => 60 sec -> 3600 rotations, so 1 sec -> 60 rotations
Rotational latency = (1/60) sec = 16.67 msec.
Average rotational latency = 16.67/2
= 8.33 msec.
Average access time = 8.33 msec.
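The three calculations in the worked example above can be reproduced in a short script:

```python
# Hard-disk parameters from the example above.
surfaces, tracks, sectors, sector_bytes = 4, 64, 128, 256
rpm = 3600

# 1. Capacity = surfaces * tracks/surface * sectors/track * bytes/sector
capacity = surfaces * tracks * sectors * sector_bytes
print(capacity)                      # 8388608 bytes = 8 MB

# 2. Data transfer rate = rotations/sec * track capacity * number of surfaces
rotations_per_sec = rpm / 60         # 60 rotations per second
track_capacity = sectors * sector_bytes
transfer_rate = rotations_per_sec * track_capacity * surfaces
print(transfer_rate)                 # 7864320 bytes/sec = 7.5 MB/s

# 3. Average access time = average rotational latency (seek/controller time = 0)
rotational_latency_ms = 1000 / rotations_per_sec   # one full rotation = 16.67 ms
avg_access_time_ms = rotational_latency_ms / 2
print(round(avg_access_time_ms, 2))  # 8.33 ms
```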

Computer Organization | Read and Write operations in memory


A memory unit stores binary information in groups of bits called words. Data input lines provide
the information to be stored in the memory; data output lines carry the information out of the
memory. The control lines Read and Write specify the direction of data transfer. In the memory
organization there are 2^{l} memory locations, indexed from 0 to 2^{l}-1, where l is the number of
address lines. We can describe the memory size in bytes using the following formula:

N = 2^{l} Bytes

Where,
l is the number of address lines
N is the memory size in bytes

For example, some storage sizes can be expressed in bytes using the above formula:

1 kB = 2^10 Bytes

64 kB = 2^6 x 2^10 Bytes = 2^16 Bytes

4 GB = 2^2 x 2^10 x 2^10 x 2^10 Bytes = 2^32 Bytes
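The relation N = 2^l can be inverted to find how many address lines a given memory size requires; a small sketch:

```python
import math

def address_lines(n_bytes):
    # Number of address lines l such that N = 2^l bytes.
    # Assumes n_bytes is an exact power of two.
    return int(math.log2(n_bytes))

print(address_lines(1024))            # 1 kB  -> 10 lines
print(address_lines(64 * 1024))       # 64 kB -> 16 lines
print(address_lines(4 * 1024**3))     # 4 GB  -> 32 lines
```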

The Memory Address Register (MAR) stores the address of the memory location where the operation
is being performed. The Memory Data Register (MDR) stores the data on which the operation is
being performed.

1. Memory Read Operation:


A memory read operation transfers the desired address to the address lines and activates the
Read control line. A description of the memory read operation is given below:

In the above diagram, MDR initially contains a garbage value and MAR contains the memory address
2003. After the execution of the read instruction, the data at memory location 2003 is read and
MDR is updated with the value at that location (3D).

2. Memory Write Operation:


A memory write operation transfers the address of the desired word to the address lines,
transfers the data bits to be stored to the data input lines, and then activates the Write
control line. A description of the write operation is given below:

In the above diagram, MAR contains 2003 and MDR contains 3D. After the execution of the write
instruction, 3D is written at memory location 2003.
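The read and write cycles described above can be sketched with memory modelled as a Python dict and MAR/MDR as plain variables; the address 2003 and value 3D come from the diagrams, while the write to 2004 with value 0x7F is an illustrative assumption:

```python
# Memory with one known location, as in the read diagram: M[2003] = 3D (hex).
memory = {2003: 0x3D}

# Read cycle: MAR holds the address; after the read, MDR holds M[MAR].
mar = 2003
mdr = memory[mar]
print(hex(mdr))          # 0x3d

# Write cycle (illustrative address/value): MAR holds the address,
# MDR the data; M[MAR] <- MDR.
mar, mdr = 2004, 0x7F
memory[mar] = mdr
print(hex(memory[2004]))  # 0x7f
```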

Priority Interrupts | (S/W Polling and Daisy Chaining)

In I/O Interface (Interrupt and DMA Mode), we discussed the concept behind interrupt-initiated
I/O.

To summarize: when an I/O device is ready for transfer, it generates an interrupt request signal
to the computer. The CPU receives this signal, suspends the instructions it is currently
executing, and services the transfer request. But what if multiple devices generate interrupts
simultaneously? In that case, we need a way to decide which interrupt is serviced first. In
other words, we have to set a priority among all the devices for systematic interrupt servicing.

The concept of defining the priority among devices so as to know which one is to be serviced first in case
of simultaneous requests is called priority interrupt system. This could be done with either software or
hardware methods.

SOFTWARE METHOD – POLLING

In this method, all interrupts are serviced by branching to the same service program. This
program then checks each device to see whether it is the one generating the interrupt. The order
of checking is determined by the priority that is to be enforced: the device with the highest
priority is checked first, and the remaining devices are checked in descending order of priority.
If a device is found to be generating the interrupt, another service program is called that works
specifically for that particular device.
The structure will look something like this –
if (device[0].flag)
    device[0].service();
else if (device[1].flag)
    device[1].service();
...
else if (device[n-1].flag)
    device[n-1].service();
else
    /* error: no device claims the interrupt */
The major disadvantage of this method is that it is quite slow. To overcome this, we can use a
hardware solution, one of which involves connecting the devices in series. This is called the
daisy-chaining method.

HARDWARE METHOD – DAISY CHAINING

The daisy-chaining method involves connecting all the devices that can request an interrupt in a serial
manner. This configuration is governed by the priority of the devices. The device with the highest
priority is placed first followed by the second highest priority device and so on. The given figure depicts
this arrangement.

WORKING:
There is an interrupt request line common to all the devices, which goes into the CPU.

 When no interrupts are pending, the line is in the HIGH state. When any device raises an
interrupt, it pulls the interrupt request line to the LOW state.

 The CPU acknowledges this interrupt request from the line and then enables the interrupt
acknowledge line in response to the request.

 This signal is received at the PI (Priority In) input of device 1.

 If the device has not requested the interrupt, it passes this signal to the next device
through its PO (Priority Out) output. (PI = 1 & PO = 1)

 However, if the device has requested the interrupt (PI = 1 & PO = 0):

 The device consumes the acknowledge signal and blocks its further use by placing 0 at
its PO (Priority Out) output.

 The device then places its interrupt vector address (VAD) on the CPU data bus.

 The device returns its interrupt request signal to the HIGH state to indicate that its
interrupt has been taken care of.

NOTE: VAD is the address of the service routine which services that device.

 If a device gets 0 at its PI input, it generates 0 at its PO output to tell the devices
after it that the acknowledge signal has been blocked. (PI = 0 & PO = 0)

Hence, the device having PI = 1 and PO = 0 is the highest priority device that is requesting an interrupt.
Therefore, by daisy chain arrangement we have ensured that the highest priority interrupt gets serviced
first and have established a hierarchy. The farther a device is from the first device, the lower its priority.
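The PI/PO propagation described above can be sketched in code; `daisy_chain` is a hypothetical helper name, and device 0 is taken as the highest-priority device:

```python
def daisy_chain(requests):
    """requests: list of booleans, index 0 = highest priority.
    Returns the (PI, PO) pair of each device and the index of the
    device whose interrupt is serviced (None if no device requested)."""
    pi, serviced = 1, None
    states = []
    for i, requesting in enumerate(requests):
        if pi == 1 and requesting:
            po, serviced = 0, i   # consume the acknowledge; this device wins
        else:
            po = pi               # pass the acknowledge along unchanged
        states.append((pi, po))
        pi = po                   # PO of one device feeds PI of the next
    return states, serviced

# Devices 1 and 2 request simultaneously; device 1 has higher priority.
states, winner = daisy_chain([False, True, True])
print(states)   # [(1, 1), (1, 0), (0, 0)]
print(winner)   # 1
```

The device with PI = 1 and PO = 0 is exactly the highest-priority requester, matching the rule stated in the text.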


I/O Interface (Interrupt and DMA Mode)


The method used to transfer information between internal storage and external I/O devices is
known as the I/O interface. Peripherals are connected to the CPU through special communication
links that resolve the differences between the CPU and the peripherals. Special hardware
components, called interface units, sit between the CPU and the peripherals to supervise and
synchronize all input and output transfers.

Mode of Transfer:

The binary information received from an external device is usually stored in the memory unit,
and information transferred from the CPU to an external device originates from the memory unit.
The CPU merely processes the information; the source and destination are always the memory unit.
Data transfer between the CPU and the I/O devices may be done in different modes.

Data transfer to and from the peripherals may be done in any of three possible ways:

1. Programmed I/O.

2. Interrupt-initiated I/O.

3. Direct memory access (DMA).

Now let's discuss each mode one by one.

1. Programmed I/O: This results from the I/O instructions written in the computer program.
Each data-item transfer is initiated by an instruction in the program, and the transfer is
usually between a CPU register and the peripheral. This method requires constant
monitoring of the peripheral devices by the CPU.

Example of Programmed I/O: In this case, the I/O device does not have direct access to the memory
unit. A transfer from an I/O device to memory requires the execution of several instructions by
the CPU, including an input instruction to transfer the data from the device to the CPU and a
store instruction to transfer the data from the CPU to memory. In programmed I/O, the CPU stays
in a program loop until the I/O unit indicates that it is ready for data transfer. This is a
time-consuming process, since it needlessly keeps the CPU busy. This situation can be avoided by
using an interrupt facility, discussed below.

2. Interrupt-initiated I/O: As we saw above, programmed I/O keeps the CPU busy unnecessarily.
This can be avoided by using an interrupt-driven method for data transfer: special
commands inform the interface to issue an interrupt request signal whenever data is
available from the device. In the meantime the CPU can proceed with executing another
program, while the interface keeps monitoring the device. Whenever the interface
determines that the device is ready for data transfer, it issues an interrupt request
signal to the computer. Upon detecting the external interrupt signal, the CPU momentarily
stops the task it is performing, branches to the service program to process the I/O
transfer, and then returns to the task it was originally performing.

Note: Both programmed I/O and interrupt-driven I/O require the active intervention of the
processor to transfer data between memory and the I/O module, and any data transfer must
traverse a path through the processor. Thus both forms of I/O suffer from two inherent drawbacks:

 The I/O transfer rate is limited by the speed with which the processor can test and
service a device.

 The processor is tied up in managing an I/O transfer; a number of instructions must be
executed for each I/O transfer.

3. Direct Memory Access: The data transfer between fast storage media, such as a magnetic
disk, and the memory unit is limited by the speed of the CPU. We can instead allow the
peripherals to communicate with memory directly over the memory buses, removing the
intervention of the CPU. This data transfer technique is known as DMA, or direct memory
access. During DMA the CPU is idle and has no control over the memory buses; the DMA
controller takes over the buses to manage the transfer directly between the I/O devices
and the memory unit.

Bus Request: Used by the DMA controller to request that the CPU relinquish control of the buses.

Bus Grant: Activated by the CPU to inform the DMA controller that the buses are in the
high-impedance state and the requesting DMA controller may take control of them. Once the DMA
controller has taken control of the buses, it transfers the data. This transfer can take place in
several ways.

Types of DMA transfer using DMA controller:


Burst Transfer:
The DMA controller returns the bus only after the complete data transfer. A register is used as a
byte count; it is decremented for each byte transferred, and when the byte count reaches zero the
DMA controller (DMAC) releases the bus. When the DMAC operates in burst mode, the CPU is halted
for the duration of the data transfer.
The steps involved are:

1. Bus grant request time.

2. Transfer the entire block of data at the transfer rate of the device, because the device is
usually slower than the speed at which data can be transferred to the CPU.

3. Release control of the bus back to the CPU.

So, the total time taken to transfer N bytes
= Bus grant request time + N * (transfer time per byte) + bus release control time.

Where,

X µsec = data transfer time or preparation time (words/block)

Y µsec = memory cycle time or cycle time or transfer time (words/block)

% CPU idle (blocked) = (Y/(X+Y)) * 100

% CPU busy = (X/(X+Y)) * 100

Cycle Stealing:
In this mode the DMA controller transfers one word at a time, after which it must return control
of the buses to the CPU. The CPU merely delays its operation for one memory cycle to allow the
direct-memory I/O transfer to "steal" one memory cycle.
The steps involved are:

1. Buffer the byte into the buffer.

2. Inform the CPU that the device has 1 byte to transfer (i.e. bus grant request).

3. Transfer the byte (at system bus speed).

4. Release control of the bus back to the CPU.

Before moving on to transfer the next byte of data, the device performs step 1 again, so that the
bus isn't tied up and the transfer doesn't depend upon the transfer rate of the device.
So, if T is the time taken to transfer 1 byte of data in cycle stealing mode
(= time required for bus grant + 1 bus cycle to transfer data + time required to release the
bus), then transferring N bytes takes N x T.

In cycle stealing mode a pipelining concept is always followed: while one byte is being
transferred, the device is preparing the next byte in parallel. When a question asks for "the
fraction of CPU time relative to the data transfer time", cycle stealing mode is assumed.

Where,

X µsec = data transfer time or preparation time (words/block)

Y µsec = memory cycle time or cycle time or transfer time (words/block)

% CPU idle (blocked) = (Y/X) * 100

% CPU busy = (X/Y) * 100
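The burst-mode and cycle-stealing formulas above can be plugged into a short sketch; the timing values X and Y, the grant/release times, and the block size are illustrative assumptions, not figures from the text:

```python
# Illustrative timings in microseconds (assumed values, not from the text).
X, Y = 40.0, 10.0          # X = data preparation time, Y = memory cycle time

# Burst mode: CPU halted for the whole block transfer.
burst_idle = Y / (X + Y) * 100      # % CPU idle (blocked)
burst_busy = X / (X + Y) * 100      # % CPU busy
print(burst_idle, burst_busy)       # 20.0 80.0

# Burst-mode total time for N bytes:
# bus grant time + N * per-byte transfer time + bus release time
grant_us, release_us, per_byte_us, N = 2.0, 1.0, 0.5, 1000
burst_total_us = grant_us + N * per_byte_us + release_us
print(burst_total_us)               # 503.0

# Cycle stealing: one byte transferred per stolen memory cycle, using the
# % CPU idle formula as given in the text.
steal_idle = Y / X * 100
print(steal_idle)                   # 25.0
```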

Interleaved mode: In this technique, the DMA controller takes over the system bus when the
microprocessor is not using it: alternate half cycles, i.e. half a cycle for DMA and half a cycle
for the processor.

Computer Organization | Asynchronous input output synchronization
Asynchronous input/output is a form of I/O processing that allows other devices to continue
processing before the transmission or data transfer is done.

Problem faced in asynchronous input output synchronization –

There is no guarantee that the data on the data bus is fresh, because there is no fixed time slot
for sending or receiving data.

This problem is solved by the following mechanisms:

1. Strobe

2. Handshaking

Data is transferred from source to destination through data bus in between.

1. Strobe Mechanism:

1. Source initiated Strobe – When the source initiates the process of data transfer. The
strobe is just a signal.

(i) First, the source puts data on the data bus and turns the strobe signal ON.
(ii) The destination, on seeing the strobe ON, reads the data from the data bus.
(iii) After the destination has read the data, the strobe is turned OFF.

The signals can be seen as:

First the data is put on the data bus, and then the strobe signal becomes active.

2. Destination initiated Strobe – When the destination initiates the process of data transfer.

(i) First, the destination turns the strobe signal ON to ask the source to put fresh data on the
data bus.
(ii) The source, on seeing the strobe ON, puts fresh data on the data bus.
(iii) The destination reads the data from the data bus, and the strobe is turned OFF.

The signals can be seen as:

First the strobe signal becomes active, and then the data is put on the data bus.

Problems faced in strobe-based asynchronous input output –

1. In source-initiated strobe, it is assumed that the destination has read the data from the
data bus, but there is no guarantee.

2. In destination-initiated strobe, it is assumed that the source has put the data on the data
bus, but there is no guarantee.

These problems are overcome by handshaking.

2. Handshaking Mechanism:

1. Source initiated Handshaking – When the source initiates the data transfer process. It uses
two signals:
DATA VALID: if ON, the data on the data bus is valid; otherwise it is invalid.
DATA ACCEPTED: if ON, the data has been accepted; otherwise it has not.

(i) The source places data on the data bus and enables the Data Valid signal.
(ii) The destination accepts the data from the data bus and enables the Data Accepted signal.
(iii) After this, the Data Valid signal is disabled, meaning the data on the data bus is no
longer valid.
(iv) The Data Accepted signal is disabled and the process ends.

Now the Data Accepted signal guarantees that the destination has read the data from the data bus.

The signals can be seen as:

First the data is put on the data bus, then the Data Valid signal becomes active, and then the
Data Accepted signal becomes active. After the data is accepted, first the Data Valid signal goes
off, then the Data Accepted signal goes off.

2. Destination initiated Handshaking – When the destination initiates the process of data
transfer. It uses two signals:
REQUEST FOR DATA: if ON, requests that data be put on the data bus.
DATA VALID: if ON, the data on the data bus is valid; otherwise it is invalid.

(i) When the destination is ready to receive data, the Request for Data signal is activated.
(ii) The source, in response, puts data on the data bus and enables the Data Valid signal.
(iii) The destination then accepts the data from the data bus and, after accepting it, disables
the Request for Data signal.
(iv) Finally, the Data Valid signal is disabled, meaning the data on the data bus is no longer
valid.

Now the Data Valid signal guarantees that the source has put the data on the data bus.

The signals can be seen as:

First the Request for Data signal becomes active, then the data is put on the data bus, and then
the Data Valid signal becomes active. After the data is read, first the Request for Data signal
goes off, then the Data Valid signal.
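The four-step source-initiated handshake can be traced as an ordered list of signal events; this is a minimal symbolic sketch (signal names follow the text, timing is abstracted away):

```python
# Trace of the source-initiated handshake as (signal, level) events.
events = []

def set_signal(name, value):
    events.append((name, value))

# (i)   Source places data on the bus and raises DATA VALID.
set_signal("DATA_VALID", 1)
# (ii)  Destination accepts the data and raises DATA ACCEPTED.
set_signal("DATA_ACCEPTED", 1)
# (iii) Source drops DATA VALID: data on the bus is no longer valid.
set_signal("DATA_VALID", 0)
# (iv)  Destination drops DATA ACCEPTED, ending the transfer.
set_signal("DATA_ACCEPTED", 0)

print(events)
# [('DATA_VALID', 1), ('DATA_ACCEPTED', 1), ('DATA_VALID', 0), ('DATA_ACCEPTED', 0)]
```

The ordering captures the guarantee the text describes: DATA ACCEPTED only rises after DATA VALID, and DATA VALID falls before DATA ACCEPTED does.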

Computer Organization | Synchronous Data Transfer


In synchronous data transfer, the sending and receiving units are driven by the same clock
signal. Such transfer is possible between two units when each knows the behavior of the other.
The master performs a sequence of actions for data transfer in a predefined order, all
synchronized with the common clock. The master is designed to supply the data at a time when the
slave is definitely ready for it; usually, the master introduces sufficient delay to account for
the slow response of the slave, without any request from the slave.

The master does not expect any acknowledgement signal from the slave when data is sent to the
slave. Similarly, when data from the slave is read by the master, the slave does not signal that
the data has been placed on the data bus, nor does the master acknowledge that the data has been
read. Both master and slave perform their own part of the transfer at the designed clock period;
since both devices know each other's behavior (response time), no difficulty arises.
Prior to transferring data, the master must logically select the slave, either by sending the
slave's address or by sending a "device select" signal to the slave. But there is no
acknowledgement signal from the slave to the master when the device is selected.

The timing diagram of a synchronous read operation is given below:

In this timing diagram, the master first places the slave's address on the address bus and the
read signal on the control line at the falling edge of the clock. The entire read operation is
over in one clock period.

Advantages –

1. The design procedure is easy. The master does not wait for any acknowledgement signal from
the slave; it simply waits for a time equal to the slave's response time.

2. The slave does not generate an acknowledge signal, though it obeys the timing rules of the
protocol set by the master or system designer.

Disadvantages –

1. If a slow-speed unit is connected to a common bus, it can degrade the overall transfer rate
in the system.

2. If the slave operates at a slow speed, the master will be idle for some time during data
transfer, and vice versa.

Computer Organization | Input-Output Processor


The DMA mode of data transfer reduces the CPU's overhead in handling I/O operations. It also
allows parallelism between CPU and I/O operations. Such parallelism is necessary to avoid
wasting valuable CPU time while handling I/O devices whose speeds are much slower than the
CPU's. The concept of DMA operation can be extended to relieve the CPU further from involvement
in the execution of I/O operations. This gives rise to the development of a special-purpose
processor called the Input-Output Processor (IOP), or I/O channel.

The Input-Output Processor (IOP) is just like a CPU that handles the details of I/O operations.
It is equipped with more facilities than a typical DMA controller. The IOP can fetch and execute
its own instructions, which are specifically designed to characterize I/O transfers. In addition
to I/O-related tasks, it can perform other processing such as arithmetic, logic, branching, and
code translation. The main memory unit plays the pivotal role: the IOP communicates with it by
means of DMA.

The block diagram –

The Input-Output Processor is a specialized processor which loads and stores data into memory
along with the execution of I/O instructions. It acts as an interface between the system and its
devices, carrying out a sequence of events to execute I/O operations and then storing the
results in memory.

Advantages –

 The I/O devices can directly access main memory without the intervention of the processor
in IOP-based systems.

 It addresses the problems that arise in the direct memory access method.

Memory mapped I/O and Isolated I/O


The CPU needs to communicate with various memory and input-output (I/O) devices, and data
between the processor and these devices flows over the system bus. There are three ways in which
the system bus can be allotted to them:

1. Separate sets of address, control, and data buses for I/O and memory.

2. A common bus (data and address) for I/O and memory, but separate control lines.

3. A common bus (data, address, and control) for I/O and memory.

The first case is simple because memory and I/O have different address spaces and instructions,
but it requires more buses.

Isolated I/O –

In isolated I/O, we have a common bus (data and address) for I/O and memory, but separate read
and write control lines for I/O. When the CPU decodes an instruction and finds that the data is
for an I/O device, it places the address on the address lines and asserts the I/O read or write
control line, causing a data transfer between the CPU and the I/O device. Because the address
spaces of memory and I/O are isolated, the method gets its name; the I/O addresses here are
called ports. There are different read-write instructions for I/O and for memory.

Memory Mapped I/O –

In this case every bus is common, so the same set of instructions works for both memory and I/O.
Hence we manipulate I/O in the same way as memory, and both share the same address space;
consequently the addressing capability available to memory becomes smaller, because some part of
the address space is occupied by I/O.

Differences between memory mapped I/O and isolated I/O –

ISOLATED I/O                                       MEMORY MAPPED I/O

Memory and I/O have separate address spaces        Both share the same address space

All addresses can be used by the memory            Addressable memory becomes less, since
                                                   I/O occupies part of the address space

Separate instructions control read and write       The same instructions can control both
operations in I/O and memory                       I/O and memory

I/O addresses are called ports                     Normal memory addresses are used for both

More efficient due to separate buses               Less efficient

Larger in size due to more buses                   Smaller in size

More complex, since separate logic is              Simpler logic, since I/O is treated
needed to control both                             just like memory

Types of Micro-programmed Control Unit – Based on the type of control word stored in the control
memory (CM), it is classified into two types:

1. Horizontal Micro-programmed Control Unit:

The control signals are represented in decoded binary format, i.e. 1 bit per control signal. For
example, if 53 control signals are present in the processor, then 53 bits are required. More than
one control signal can be enabled at a time.

 It supports longer control words.

 It is used in parallel processing applications.

 It allows a higher degree of parallelism: if the degree is n, then n control signals are
enabled at a time.

 It requires no additional hardware (decoders), which means it is faster than the vertical
micro-programmed control unit.

2. Vertical Micro-programmed Control Unit:

The control signals are represented in encoded binary format; for N control signals, log2(N)
bits are required.

 It supports shorter control words.

 It supports easy implementation of new control signals, therefore it is more flexible.

 It allows a low degree of parallelism, i.e. the degree of parallelism is either 0 or 1.

 It requires additional hardware (decoders) to generate the control signals, which implies
it is slower than the horizontal micro-programmed control unit.
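The control-word widths of the two schemes can be compared directly; the 53-signal figure comes from the example above:

```python
import math

def horizontal_bits(n_signals):
    # Horizontal: decoded format, 1 bit per control signal.
    return n_signals

def vertical_bits(n_signals):
    # Vertical: encoded format, ceil(log2(N)) bits for N signals,
    # decoded back to individual signals by extra hardware.
    return math.ceil(math.log2(n_signals))

print(horizontal_bits(53))  # 53 bits; any subset of signals can be active
print(vertical_bits(53))    # 6 bits; only one signal active at a time
```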
