CSC303: Computer Architecture I Lecture Notes
Lecture note on Computer Architecture and Organization CSC303
MODULE ONE
Early counting devices, generations of computers (Vacuum tube, Transistor, Integrated Circuit,
Very Large Scale Integration, AI).
Classification
o Types of data: Digital, Analogue, Hybrid
o Purpose: General purpose, Special purpose
o Size: Notebook, microcomputer, minicomputer, mainframe, supercomputer
The ancestors of the modern-age computer were mechanical and electromechanical devices. These include Blaise Pascal's machine, the Difference Engine, the Analytical Engine, ENIAC, EDSAC, EDVAC, UNIVAC, the Harvard Mark I, II, and III, etc.
Computer technology has made incredible improvements in the past half century. In the early part of computer evolution there were no stored-program computers; computational power was low and the machines were very large. Nowadays a personal computer has more computational power, more memory, and more disk storage, is smaller in size, and is available at an affordable cost. This rapid improvement is a result of advances in the technology used to build computers and of innovation in computer design.
What is Computer Architecture?
o The structure and functional organization of a computer system.
o It specifies how data is processed and transferred between different parts of the
computer.
o It focuses on the design principles and how different components work together.
Computer architecture can be divided into two main types: Von Neumann architecture and
Harvard architecture.
1. Von Neumann architecture, also known as the Princeton architecture, was set out by John von Neumann and his collaborators in the 1945 First Draft of a Report on the EDVAC. This model of computer architecture proposes five components: a processor with connected registers; a control unit capable of storing instructions; a memory capable of storing data as well as instructions and communicating via buses; additional or external storage; and input and output mechanisms.
2. Harvard architecture, on the other hand, refers to a computer architecture with distinct data and instruction storage and signal pathways. In contrast to the von Neumann architecture, in which program instructions and data share the very same memory and pathways, this design separates the two. In practice, a modified Harvard architecture with two distinct caches (one for data and one for instructions) is employed; x86 and Advanced RISC Machine (ARM) systems frequently use this design.
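The contrast between the two designs can be sketched in Python. This is an illustrative model only (the class and method names are invented for this sketch, not part of any real machine description):

```python
# Conceptual sketch: a unified von Neumann memory vs. the split
# instruction/data memories of a Harvard design.

class VonNeumannMemory:
    """One address space holds both instructions and data."""
    def __init__(self, size):
        self.cells = [0] * size          # instructions and data share these cells

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        self.cells[addr] = value

class HarvardMemory:
    """Separate storage and pathways for instructions and data."""
    def __init__(self, isize, dsize):
        self.instructions = [0] * isize  # fetched over the instruction pathway
        self.data = [0] * dsize          # accessed over the data pathway

    def fetch_instruction(self, addr):
        return self.instructions[addr]

    def read_data(self, addr):
        return self.data[addr]

    def write_data(self, addr, value):
        self.data[addr] = value

# In the von Neumann model, code and data occupy the same cells, so a
# program could even overwrite its own instructions:
vn = VonNeumannMemory(16)
vn.write(0, 0xA1)        # could be an opcode...
vn.write(0, 42)          # ...or data -- same location either way
print(vn.read(0))        # -> 42
```

In the Harvard sketch, by contrast, `fetch_instruction` and `read_data` can never collide, which is why real designs use the split form for their caches.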
Overview of Microprocessor Architecture
For example, to send data to an output device, the CPU places the device address on the address bus, places the data on the data bus, and enables the output device.
System Buses
Buses are the wires connecting memory and I/O to the microprocessor. There are three main types of buses:
– Address Bus
• Unidirectional
• Identifying peripheral or memory location
– Data Bus
• Bidirectional
• Transferring data
– Control Bus
• Synchronization signals
• Timing signals
• Control signal
Changes in technology not only influence organization but also result in the introduction of more powerful and more complex architectures. However, because a computer organization must be designed to implement a particular architectural specification, a thorough treatment of organization requires a detailed examination of architecture as well. Computer architecture comes before computer organization.
Computer architecture and computer organization are related but distinct concepts.
Computer Architecture refers to the design of the internal workings of a computer system,
including the CPU, memory, and other hardware components. It involves decisions about the
organization of the hardware, such as the instruction set architecture, the data path design, and
the control unit design.
Computer Architecture is concerned with optimizing the performance of a computer system
and ensuring that it can execute instructions quickly and efficiently.
On the other hand,
Computer Organization refers to the operational units and their interconnections that
implement the architecture specification. It deals with how the components of a computer system
are arranged and how they interact to perform the required operations.
Computer Organization is concerned with the physical implementation of the architecture
design and includes decisions about the interconnection and communication between
components, such as the bus structure, memory hierarchy, and input/output systems.
Computer Organization comes after the Computer Architecture has been decided.
Computer Organization is how operational attributes are linked together and contribute to
realizing the architectural specification, hence Computer Organization deals with a structural
relationship.
A computer system, like any system, consists of an interrelated set of components. The system is
best characterized in terms of structure, the way in which components are interconnected, and
function, the operation of the individual components. Furthermore, a computer’s organization is
hierarchical.
Each major component can be further described by decomposing it into its major subcomponents
and describing their structure and function.
Function
Both the structure and functioning of a computer are, in essence, simple. In general terms, there
are only four basic functions that a computer can perform:
• Data processing: Data may take a wide variety of forms, and the range of processing
requirements is broad.
• Data storage: Even if the computer is processing data on the fly (i.e., data come in and get processed, and the results go out immediately), the computer must temporarily store at least those pieces of data that are being worked on at any given moment. Thus, there is at least a short-term data storage function. Equally important, the computer performs a long-term data storage function. Files of data are stored on the computer for subsequent retrieval and update.
• Data movement: The computer’s operating environment consists of devices that serve as
either sources or destinations of data. When data are received from or delivered to a device
that is directly connected to the computer, the process is known as input–output (I/O), and
the device is referred to as a peripheral. When data are moved over longer distances, to or
from a remote device, the process is known as data communications.
• Control: Within the computer, a control unit manages the computer’s resources and
orchestrates the performance of its functional parts in response to instructions.
Structure
There are four main structural components:
• Central processing unit (CPU): Controls the operation of the computer and performs its data processing functions; often simply referred to as the processor.
• Main memory: Stores data.
• I/O: Moves data between the computer and its external environment.
• System interconnection: Some mechanism that provides for communication among CPU,
main memory, and I/O. A common example of system interconnection is by means of a
system bus, consisting of a number of conducting wires to which all the other components
attach.
1. Bus
A bus is a bundle of wires grouped together to serve a single purpose. The main purpose of the
bus is to transfer data from one device to another. The processor's interface to the bus includes
connections used to pass data, connections to represent the address in which the processor is interested, and control lines to manage and synchronize the transaction. The three major
buses are Data, Address and Control buses. There are internal buses that the processor uses
to move data, instructions, configuration, and status between its subsystems.
a. The Data Bus provides a path for moving data among system modules. The data bus may
consist of 32, 64, 128, or even more separate lines, the number of lines being referred to as the width of
the data bus. Because each line can carry only 1 bit at a time, the number of lines determines how many
bits can be transferred at a time. The width of the data bus is a key factor in determining overall system
performance. A narrower bus width means that it will take more time to communicate a quantity of data as
compared to a wider bus. For example, if the data bus is 32 bits wide and each instruction is 64 bits long,
then the processor must access the memory module twice during each instruction cycle.
b. The Address Bus is used to designate the source or destination of the data on the data bus. For
example, if the processor wishes to read a word (8, 16, or 32 bits) of data from memory, it puts the address
of the desired word on the address lines. Clearly, the width of the address bus determines the maximum
possible memory capacity of the system. Address space refers to the maximum amount of memory and
I/O that a microprocessor can directly address.
If a microprocessor has a 16-bit address bus, it can address up to 2^16 = 65,536 bytes. Therefore it has a 64 kB address space, since:
1 byte = 8 bits
1,024 bytes = 1 kB
65,536 bytes = 64 kB
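The calculation above generalizes directly: an address bus of n lines can form 2^n distinct addresses. A quick sketch in Python:

```python
# Address space from address-bus width: a bus of n lines can form 2**n
# distinct addresses, each selecting one byte in a byte-addressable memory.
def address_space_bytes(bus_width_bits):
    return 2 ** bus_width_bits

print(address_space_bytes(16))           # -> 65536 bytes
print(address_space_bytes(16) // 1024)   # -> 64 (i.e. a 64 kB address space)
print(address_space_bytes(32) // 2**30)  # -> 4 (a 32-bit bus reaches 4 GB)
```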
Furthermore, the address lines are generally also used to address I/O ports. Note that the address bus is
unidirectional (the microprocessor asserts requested addresses to the various devices), and the data bus
is bidirectional (the microprocessor asserts data on a write and the devices assert data on reads).
c. The Control Bus is used to control the access to and the use of the data and address lines.
Because the data and address lines are shared by all components, there must be a means of controlling
their use. Control signals transmit both command and timing information among system modules.
Timing signals indicate the validity of data and address information. Command signals specify operations
to be performed. Typical control lines include:
• Memory write: Causes data on the bus to be written into the addressed location
• Memory read: Causes data from the addressed location to be placed on the bus
• I/O write: Causes data on the bus to be output to the addressed I/O port
• I/O read: Causes data from the addressed I/O port to be placed on the bus
• Transfer ACK: Indicates that data have been accepted from or placed on the bus
• Bus request: Indicates that a module needs to gain control of the bus
• Bus grant: Indicates that a requesting module has been granted control of the bus
• Interrupt request: Indicates that an interrupt is pending
• Interrupt ACK: Acknowledges that the pending interrupt has been recognized
• Clock: Is used to synchronize operations
• Reset: Initializes all modules
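The interplay of the three buses in a single transaction can be sketched as follows. This is an illustrative model only (the class `Memory` and its `bus_cycle` method are invented for this sketch), showing how the control lines select the operation, the address bus selects the location, and the data bus carries the data:

```python
# Illustrative sketch: a memory module reacting to simplified
# "memory read" / "memory write" control signals on a shared bus.
class Memory:
    def __init__(self, size):
        self.cells = [0] * size

    def bus_cycle(self, address, data, mem_read, mem_write):
        """One bus transaction: control lines pick the operation,
        the address selects the location, data rides the data bus."""
        if mem_write:
            self.cells[address] = data   # data bus -> addressed location
            return None
        if mem_read:
            return self.cells[address]   # addressed location -> data bus
        return None

mem = Memory(256)
# A write cycle, then a read cycle of the same address:
mem.bus_cycle(address=0x10, data=99, mem_read=False, mem_write=True)
print(mem.bus_cycle(address=0x10, data=None, mem_read=True, mem_write=False))  # -> 99
```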
2. Registers
Registers are temporary storage locations in the CPU. A register stores a binary value using a
group of latches. Although variables and pointers used in a program are all stored in memory,
they are moved to registers during periods in which they are the focus of operation. This is so
that they can be manipulated quickly. Once the processor shifts its focus, it stores the values it
doesn't need any longer back in memory. Registers may be used for several operations.
Discussion on types and usage of registers will follow in Module III of this document.
3. Buffers
A processor does not operate in isolation. Typically there are multiple processors supporting
the operation of the main processor. These include video processors, the keyboard and
mouse interface processor, and the processors providing data from hard drives and
CD-ROMs. There are also processors to control communication interfaces such as USB and
Ethernet networks. These processors all operate independently, and therefore one may
finish an operation before a second processor is ready to receive the results.
If one processor is faster than another or if one processor is tied up with a process prohibiting
it from receiving data from a second process, then there needs to be a mechanism in place so
that data is not lost. This mechanism takes the form of a block of memory that can hold data
until it is ready to be picked up.
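This holding mechanism is a first-in, first-out (FIFO) buffer: data is picked up in the order it arrived, and nothing is lost while the receiver is busy. A minimal sketch using Python's standard `collections.deque`:

```python
# Sketch of a buffer between a fast producer and a slower consumer:
# data accumulates in a FIFO until the receiver is ready to pick it up.
from collections import deque

buffer = deque()

# A fast device deposits several results before the receiver is ready:
for sample in [10, 20, 30]:
    buffer.append(sample)              # producer side: store into the buffer

# The slower processor drains them later, in arrival order, none lost:
received = []
while buffer:
    received.append(buffer.popleft())  # consumer side: oldest item first

print(received)  # -> [10, 20, 30]
```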
4. The Stack
During the course of normal operation, there will be a number of times when the processor
needs to use a temporary memory, a place where it can store a number for a while until it is
ready to use it again.
For example, every processor has a finite number of registers. If an application needs more
registers than are available, the register values that are not needed immediately can be stored
in this temporary memory. When a processor needs to jump to a subroutine or function, it
needs to remember the instruction it jumped from so that it can pick back up where it left off
when the subroutine is completed. The return address is stored in this temporary memory.
The stack is a block of memory locations reserved to function as temporary memory. It
operates much like the stack of plates at the start of a restaurant buffet line. When a plate is
put on top of an existing stack of plates, the plate that was on top is now hidden, one position
lower in the stack. It is not accessible until the top plate is removed. There are two main
operations that the processor can perform on the stack: it can either store the value of a
register to the top of the stack or remove the top piece of data from the stack and place it in a
register. Storing data to the stack is referred to as "pushing" while removing the top piece of
data is called "popping". The LIFO nature of the stack makes it so that applications must
remove data items in the opposite order from which they were placed on the stack.
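The push/pop discipline described above can be sketched with a Python list standing in for the reserved block of memory:

```python
# LIFO stack sketch: a list models the reserved block of memory.
stack = []

def push(value):
    stack.append(value)   # new value goes on top; the old top is now hidden

def pop():
    return stack.pop()    # only the top item is accessible

# Saving two register values before a subroutine call...
push(0xAA)   # register A's value (170)
push(0xBB)   # register B's value (187)

# ...they must be restored in the opposite (LIFO) order:
print(pop())  # -> 187 (0xBB, register B's value)
print(pop())  # -> 170 (0xAA, register A's value)
```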
5. I/O Ports
Input/output ports or I/O ports refer to any connections that exist between the processor
and its external devices. A USB printer or scanner, for example, is connected to the computer
system through an I/O port. The computer can issue commands and send data to be printed
through this port or receive the device's status or scanned images. Some I/O devices are
connected directly to the memory bus and act just like memory devices. Sending data to the
port is done by storing data to a memory address and retrieving data from the port is done by
reading from a memory address.
If the device is incorporated into the processor, then communication with the port is done by
reading and writing to registers. This is sometimes the case for simple serial and parallel
interfaces such as a printer port or keyboard and mouse interface.
One of the key features used to categorize a microprocessor is whether it supports reduced instruction set computing (RISC) or complex instruction set computing (CISC). The distinction lies in how complex the individual instructions are and in how many variants of the same basic instruction exist. In practical terms, this distinction directly relates to the complexity of a microprocessor's instruction decoding logic; a more complex instruction set requires more complex decoding logic. The differences are tabulated in Table 1.
Table 1: CISC vs RISC

CISC:
• Instructions and addressing modes are complex, hence the instruction decode logic is complex.
• Not all instructions in CISC microprocessors are used with the same frequency; only some (a core set) are called most of the time.
• The instructions that are used less often impose a burden on the entire system, because they increase the permutations of the decode logic in a given clock cycle.

RISC:
• Instruction decode logic is simple, since there are few instructions to decode and little operand complexity.
• Instructions that are not frequently used are removed so as to simplify the microprocessor control logic; the system can therefore perform faster, executing programs faster, improving throughput for the commonly used instructions and increasing overall performance.
• The permutations of the decode logic are reduced, since the instruction set is small and only a few memory read/write operations are needed.
CPU Architecture
A Central Processing Unit (CPU) is the brains of your computer. The main job of the CPU is to
carry out a diverse set of instructions through the fetch-decode-execute cycle to manage all parts
of your computer and run all kinds of computer programs.
A CPU is very fast at processing data in sequence, as it has a few heavyweight cores with high clock speeds. It is like a Swiss army knife that can handle diverse tasks pretty well. The CPU is latency-optimized and can switch between a number of tasks very quickly, which may create an impression of parallelism. Nevertheless, fundamentally it is designed to run one task at a time.
GPU Architecture
A Graphics Processing Unit (GPU) is a specialized processor whose job is to rapidly manipulate
memory and accelerate the computer for a number of specific tasks that require a high degree of
parallelism.
As the GPU uses thousands of lightweight cores whose instruction sets are optimized for multidimensional matrix arithmetic and floating-point calculations, it is extremely fast with linear algebra and similar tasks that require a high degree of parallelism.
As a rule of thumb, if your algorithm accepts vectorized data, the job is probably well-suited
for GPU computing.
Architecturally, GPU’s internal memory has a wide interface with a point-to-point connection which
accelerates memory throughput and increases the amount of data the GPU can work with in a
given moment. It is designed to rapidly manipulate huge chunks of data all at once.
CPU: performs fewer instructions per clock.
GPU: performs more instructions per clock.
When comparing the two, it is important to understand that GPUs were designed to complement
CPUs, not to replace them. The CPU and the GPU work together to increase the amount and
speed of processed data.
A GPU cannot replace a CPU in a computer system. The CPU is necessary to oversee the execution of tasks on the system. However, the CPU can delegate specific repetitive workloads to the GPU and free its own resources for maintaining the stability of the system and the programs that are running.
A GPU uses many lightweight processing cores, leverages data parallelism, and has high memory
throughput. While the specific components will vary by model, fundamentally most modern GPUs
use single instruction multiple data (SIMD) stream architecture.
FLYNN’S TAXONOMY
Two possible kinds of instruction stream (single or multiple), combined with two possible kinds of data stream, lead to the four different categories in Flynn's taxonomy. Let's take a look at each, as illustrated in Figure 3.
A SISD stream architecture is one in which a single instruction stream (e.g. a program) executes on one data stream. This architecture is used in older computers with a single-core processor, as well as in many simple compute devices.
A SIMD stream architecture has a single control processor and instruction memory, so only one instruction can be run at any given point in time. That single instruction is copied and run across each core at the same time. This is possible because each processor has its own dedicated memory, which allows for parallelism at the data level (a.k.a. "data parallelism").
The fundamental advantage of SIMD is that data parallelism allows it to execute computations quickly
(multiple processors doing the same thing) and efficiently (only one instruction unit).
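The idea of data parallelism can be sketched in plain Python. The helper name `simd_apply` is invented for this sketch; real SIMD hardware would run the lanes in lockstep, whereas here they are simulated sequentially:

```python
# Conceptual sketch of SIMD-style data parallelism: one "instruction"
# (here, a function) applied uniformly to every element of a data
# stream. Hardware lanes would all execute this step simultaneously.
def simd_apply(instruction, data_stream):
    # One instruction, many data items; the lanes are simulated
    # one after another here, but conceptually they are parallel.
    return [instruction(x) for x in data_stream]

scale_by_2 = lambda x: x * 2
print(simd_apply(scale_by_2, [1, 2, 3, 4]))  # -> [2, 4, 6, 8]
```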
MISD stream architecture is effectively the reverse of SIMD architecture. With MISD multiple instructions
are performed on the same data stream. The use cases for MISD are very limited today. Most practical
applications are better addressed by one of the other architectures.
MIMD stream architecture offers parallelism for both data and instruction streams. With MIMD, multiple
processors execute instruction streams independently against different data streams.
Emerging Trends in Computer Architecture
Several emerging trends in computer architecture are actively being researched and developed by scientists, engineers, and tech companies around the world.
While some trends, such as quantum computing, are still in the experimental stage, others like in-memory
computing and reconfigurable architecture are already making their way into practical applications to drive
transformative changes across various industries. Quantum computing could revolutionize fields like
cryptography and drug discovery, while neuromorphic architecture could lead to breakthroughs in artificial
intelligence. In-memory computing could accelerate data-driven insights, and photonic computing might
reshape communication networks. Reconfigurable architecture could optimize computing resources for
different tasks, improving overall efficiency.
1. Quantum computing
Quantum computing utilizes principles of quantum mechanics to process information using quantum bits or
qubits. Unlike classical bits, qubits can exist in multiple states simultaneously, enabling quantum computers
to perform complex calculations exponentially faster than classical computers. Quantum computing has the
potential to revolutionize fields like cryptography, optimization, and materials science.
Moreover, the number of potential states and interactions multiplies exponentially as the complexity of the problem rises. Although it is still in its initial phase, quantum computing has the potential to change industries including cryptography, banking, and drug discovery. A quantum computer can be built in several ways, such as using topological qubits, trapped ions, or superconducting circuits.
2. Neuromorphic architecture
Neuromorphic architecture is inspired by the human brain’s neural networks. It aims to create computer
systems that can process information and learn in ways similar to biological systems. By emulating the
brain’s efficiency and adaptability, neuromorphic architecture enhances machine learning and artificial
intelligence capabilities, enabling computers to perform tasks intuitively and efficiently.
Neuromorphic computing is motivated by the structure and operation of the human brain. It processes
information in a way that is fundamentally distinct from conventional computing by using specialised
hardware and software to replicate the brain's neuronal structure. Because neuromorphic computing relies on analogue rather than digital computations, it may be more energy-efficient; and because it can learn from and adjust to new information in real time, it can also be more versatile and adaptive. Several computing fields, such as artificial intelligence, robotics, and sensory processing, stand to benefit from it.
3. In-memory computing
In-memory computing challenges the traditional separation of processing and memory units by performing
computations directly within the memory. This approach eliminates the need to transfer data between
components, leading to faster and more efficient data processing. In-memory computing is particularly
beneficial for data-intensive tasks like big data analytics and machine learning. In-memory technologies address some of the major issues in computer architecture, including power consumption, performance, and scalability.
4. Reconfigurable architecture
Reconfigurable architecture is a computer architecture combining some of the flexibility of software with
the high performance of hardware.
Reconfigurable architecture allows computer systems to dynamically adjust their hardware configurations
to optimize performance for specific tasks. This adaptability is crucial in environments with rapidly changing
workloads and applications. Reconfigurable architecture offers versatility and efficiency, making it well-suited for diverse computing needs, including edge computing and scientific simulations; field-programmable gate arrays (FPGAs) are a common example.
5. Cloud-based computing
Cloud-based computing, commonly referred to as cloud computing, uses remote servers and networks in
place of just a local computer or server to store, administer, and process data and applications. Cloud
computing enables greater flexibility and scalability in computer resources because resources and
services are offered over the internet. Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) are the three primary divisions of cloud computing.
6. Edge computing
This is a distributed computing paradigm that processes data at the network’s edge, nearer to the data
source. Edge computing enables data to be processed and analysed locally, on devices or systems closer
to the source of data generation, rather than transferring all of the data to a centralised data center or
cloud for processing. This method is frequently applied to decrease latency and speed up data processing.
CPU Pipelining
Microprocessor designers, in an attempt to squeeze every last bit of performance from their
designs, try to make sure that every circuit of the CPU is doing something productive at all times.
The most common application of this practice applies to the execution of instructions. It is
based on the fact that there are steps to the execution of an instruction, each of which uses
entirely different components of the CPU.
Assuming that the execution of a machine code instruction can be broken into three stages:
• Fetch – get the next instruction to execute from its location in memory
• Decode – determine which circuits to energize in order to execute the fetched instruction
• Execute – use the ALU and the processor-to-memory interface to execute the instruction
By comparing the definitions of the different components of the CPU shown with the needs of
these three different stages or cycles, it can be seen that three different circuits are used for these
three tasks.
• The internal data bus and the instruction pointer perform the fetch.
• The instruction decoder performs the decode cycle.
• The ALU and CPU registers are responsible for the execute cycle.
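The three stages can be sketched as a tiny interpreter loop. The machine here is made up for illustration (the opcodes LOAD, ADD, and HALT and the single accumulator are invented, not any real instruction set):

```python
# Minimal sketch of the fetch-decode-execute cycle for an invented
# three-instruction accumulator machine.
memory = [("LOAD", 5), ("ADD", 3), ("HALT", 0)]  # the program, in memory
acc = 0                                          # accumulator register
ip = 0                                           # instruction pointer

while True:
    opcode, operand = memory[ip]   # FETCH: get the instruction at ip
    ip += 1
    if opcode == "LOAD":           # DECODE: pick the circuit to energize
        acc = operand              # EXECUTE: load operand into accumulator
    elif opcode == "ADD":
        acc += operand             # EXECUTE: ALU adds operand
    elif opcode == "HALT":
        break

print(acc)  # -> 8
```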
Once the logic that controls the internal data bus is done fetching the current instruction,
what's to keep it from fetching the next instruction? It may have to guess what the next instruction
is, but if it guesses right, then a new instruction will be available to the instruction decoder
immediately after it finishes decoding the previous one.
Once the instruction decoder has finished telling the ALU what to do to execute the current
instruction, what's to keep it from decoding the next instruction while it's waiting for the ALU to
finish? If the internal data bus logic guessed right about what the next instruction is, then the
ALU won't have to wait for a fetch and subsequent decode in order to execute the next
instruction.
This process of creating a queue of fetched, decoded, and executed instructions is called pipelining, and
it is a common method for improving the performance of a processor.
A fast processor can therefore be built by increasing the rate at which instructions are executed. This can be achieved by increasing the number of instructions that can be executed simultaneously. Some CPUs break the fetch-decode-execute cycle down into smaller steps, where some of these smaller steps can be performed in parallel. This overlapping speeds up execution: the CPU fetches and executes simultaneously.
This method, used by all current CPUs, is known as pipelining: a process whereby the CPU fetches and executes at the same time, achieved by splitting the microprocessor into two units, (1) the bus interface unit (BIU) and (2) the execution unit (EU). It is a way of improving the processing power of the CPU. The BIU accesses memory and peripherals while the EU executes instructions. The idea of pipelining is to have more than one instruction being processed by the processor at the same time.
Figure 4a shows the time-line sequence of the execution of five instructions on a non-pipelined processor. Notice how a full fetch-decode-execute cycle must be performed on instruction 1 before instruction 2 can be fetched.
This sequential execution of instructions allows for a very simple CPU hardware, but it leaves each portion
of the CPU idle for 2 out of every 3 cycles. During the fetch cycle, the instruction decoder and ALU are idle;
during the decode cycle, the bus interface and the ALU are idle; and during the execute cycle, the bus
interface and the instruction decoder are idle.
Figure 4b on the other hand shows the time-line sequence for the execution of five instructions using a
pipelined processor. Once the bus interface has fetched instruction 1 and passed it to the instruction
decoder for decoding, it can begin its fetch of instruction 2.
Notice that the first cycle in the figure only has the fetch operation. The second cycle has both the fetch and
the decode cycle happening at the same time. By the third cycle, all three operations are happening in
parallel. Without pipelining, five instructions take 15 cycles to execute. In a pipelined architecture, those
same five instructions take only 7 cycles to execute, a saving of over 50%. In general, the number of
cycles it takes for a non-pipelined architecture using three cycles to execute an instruction is equal to three
times the number of instructions.
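The cycle counts above follow a simple pattern: without pipelining, each instruction takes all three cycles in turn; with pipelining, once the pipe is full, one instruction completes per cycle. A quick sketch:

```python
# Cycle counts for a 3-stage machine, with and without pipelining.
def non_pipelined_cycles(n_instructions, stages=3):
    # Each instruction occupies all stages in sequence.
    return stages * n_instructions

def pipelined_cycles(n_instructions, stages=3):
    # The first instruction fills the pipe (stages cycles); every
    # later instruction completes one cycle after the previous one.
    return stages + (n_instructions - 1)

print(non_pipelined_cycles(5))  # -> 15
print(pipelined_cycles(5))      # -> 7
```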
Resource Hazards. When one instruction is storing a value to memory while another value is being fetched from memory, both need access to memory, and this results in a conflict. Resource hazards occur when two or more instructions that are already in the pipeline need the same resource. They can also occur when multiple instructions are ready to enter the execute phase and there is only a single ALU. This can be taken care of in two ways: (1) instruction execution continues while the instruction fetch waits, or (2) more resources are provided, such as multiple ports into main memory and multiple ALUs.
Data Hazards. These happen when the result of one instruction, not yet available, is to be used as an operand for a following instruction. This is a situation in which there is a conflict in the access of an operand location, i.e. two or more instructions access a particular register or memory operand. (NB: in sequential processing this is not a problem, but in parallel processing the values will differ.) This can be resolved by altering the flow of execution in the program: specialized hardware can detect the conflict and route data through special paths that exist between the various stages of the pipeline, thereby reducing the time the instruction needs to access the required operand.
Control Hazard: This occurs when the pipeline makes the wrong decision on a branch prediction and brings the wrong instruction into the pipe. A conditional branch instruction makes the address of the next instruction to be fetched unknown: after a conditional branch, predicting which instruction will be needed next becomes a problem. This may be overcome by (i) rearranging the machine code to cause a delayed branch, or (ii) fetching both the sequential and the branch-target instructions at the same time and saving the branch decision until it is actually needed, by which time the true execution path is known.
- better performance with reduced power
The first microprocessor, the 4004, was designed by Intel Corporation, which was founded by Moore and Noyce in 1968.
In its early years, Intel focused on developing semiconductor memories (DRAMs and EPROMs) for digital computers.
In 1969, a Japanese calculator manufacturer, Busicom, approached Intel with a design for a small calculator that needed 12 custom chips. Ted Hoff, an Intel engineer, thought that a general-purpose logic device could replace the multiple components.
This idea led to the development of the first so-called microprocessor. Microprocessors thus had a modest beginning as drivers for calculators.
With developments in integration technology, Intel was able to integrate additional chips such as the 8224 clock generator and the 8228 system controller along with the 8080 microprocessor within a single chip, and released the 8-bit 8085 microprocessor in 1976.
The 8085 microprocessor consisted of 6,500 MOS transistors and could work at clock frequencies of 3-5 MHz. Other improved 8-bit microprocessors include the Motorola MC6809, the Zilog Z-80 and the RCA COSMAC.
Intel introduced the 16-bit 8086 microprocessor in 1978 and the 8088 in 1979. IBM selected the Intel 8088 for its personal computer (the IBM PC). The 8086 is made up of 29,000 MOS transistors and could work at a clock speed of 5-10 MHz. It has a 16-bit ALU with a 16-bit data bus and a 20-bit address bus, so it can address up to 1 MB of address space.
The pipelining concept was used for the first time to improve the speed of the processor: the 8086 has a 6-byte prefetch queue in which upcoming instructions are fetched during the execution of the current instruction. In this sense the 8086 architecture supports parallel processing.
The 8088 microprocessor is similar to the 8086 in architecture; the basic difference is that it has only an 8-bit data bus, even though its ALU is 16-bit.
In 1982 Intel released another 16-bit processor, the 80186, designed by a team under the leadership of Dave Stamm. It had higher reliability and faster operational speed, at a lower cost. It had a 6-byte prefetch queue and was suitable for high-volume applications such as computer workstations, word processors and personal computers.
It is made up of 134,000 MOS transistors and could work at clock rates of 4-6 MHz.
Intel released another 16-bit microprocessor, the 80286, having 134,000 transistors, in 1982; it was later used as the CPU in PC-ATs. It is a second-generation microprocessor, more advanced than the 80186. It could run at clock speeds of 6 to 12.5 MHz. It has a 16-bit data bus and a 24-bit address bus, so it can address up to 16 MB of physical address space and 1 GB of virtual memory.
Intel introduced the concepts of protected mode and virtual mode to ensure proper operation, and the chip had an on-chip memory management unit (MMU). It was popularly called the Intel 286 in those days.
In 1985, Intel released its first 32-bit processor, the 80386, with 275,000 transistors. It has a 32-bit data bus and a 32-bit address bus, so it can address a total of 4 GB of memory and a virtual memory space of 64 TB. It could process five million instructions per second and could work with all popular operating systems, including Windows. It incorporates a concept called paging in addition to the segmentation technique, and it uses a math co-processor, the 80387.
Intel then introduced the 80486 microprocessor, with a built-in math co-processor and 1.2 million transistors. It could run at a clock speed of 50 MHz. This is also a 32-bit processor, but it is twice as fast as the 80386. The additional features of the 486 are the built-in cache and the built-in math co-processor. The address bus here is bidirectional because of the presence of cache memory.
On October 19, 1992, Intel released the Pentium I processor, with 3.1 million transistors. The Pentium thus began the fifth generation of the Intel x86 architecture. It was backward compatible while offering new features. Its revolutionary technology is that the CPU is able to execute two instructions at the same time; this is known as superscalar technology. The Pentium uses a 32-bit expansion bus; however, the data bus is 64 bits.
The 7.5-million-transistor Intel Pentium II processor was released in 1997. It works at a clock speed of 300 MHz. The Pentium II uses Dynamic Execution technology, which consists of three facilities: multiple branch prediction, data flow analysis, and speculative execution. Another important feature is a thermal sensor, located on the motherboard, which monitors the die temperature of the processor.
Following the Intel Celeron processors, the Pentium III processor, with 9.5 million transistors, was introduced in 1999. It uses the dynamic execution micro-architecture, a unique combination of multiple branch prediction, data flow analysis and speculative execution.
The Pentium III has improved MMX and a processor serial number feature. The improved MMX enables advanced imaging, 3D, streaming audio and video, speech recognition and enhanced Internet facilities.
The Pentium 4, with 42 million transistors and a 1.5 GHz clock speed, was released by Intel in November 2000. The Pentium 4 processor has a system bus with 3.2 GB per second of bandwidth, a key benefit for applications that stream data from memory. This bandwidth is achieved with a 64-bit-wide bus clocked at an effective 400 MHz. The Pentium 4 enables real-time MPEG2 video encoding and near-real-time MPEG4 encoding, allowing efficient video editing and video conferencing.
Intel, with partner Hewlett-Packard, developed the next-generation 64-bit processor architecture called IA-64. Its first implementation was named Itanium. The Itanium processor, the first in a family of 64-bit products, was introduced in 2001. It was specially designed to provide a very high level of parallel processing, enabling high performance without requiring very high clock frequencies. The Itanium can handle up to 6 simultaneous 64-bit instructions per clock cycle.
The Itanium II is an IA-64 microprocessor developed jointly by Hewlett-Packard (HP) and Intel and released on July 8, 2002. It is theoretically capable of performing nearly 8 times more work per clock cycle than other CISC and RISC architectures, owing to its parallel computing micro-architecture.
The Pentium 4EE was released by Intel in 2003, and the Pentium 4E in 2004.
The Pentium Dual-Core brand was used for mainstream x86-architecture microprocessors from Intel from 2006 to 2009. The 64-bit Intel Core 2 was released on July 27, 2006. In terms of features, price and performance at a given clock frequency, Pentium Dual-Core processors were positioned above the Celeron but below the Core and Core 2 microprocessors in Intel's product range.
The Pentium Dual-Core, which consists of 167 million transistors, was released on January 21, 2007. The Intel Core Duo consists of two cores on one die, a 2 MB L2 cache shared by both cores, and an arbiter bus that controls the L2 cache.
Core 2 Quad processors are multi-chip modules consisting of two dies similar to those used in
Core 2 Duo, forming a quad-core processor.
In September 2009, new Core i7 models based on the Lynnfield desktop quad-core processor and the Clarksfield quad-core mobile processor were added. The first six-core processor in the Core lineup was the Gulftown, launched on March 16, 2010. Both the regular Core i7 and the Extreme Edition are advertised as five stars in the Intel Processor Rating.
– It is a 16-bit microprocessor.
– The 8086 has a 20-bit address bus and can access up to 2^20 memory locations (1 MB).
– It can support up to 64K I/O ports.
– It provides fourteen 16-bit registers.
– It has a multiplexed address and data bus: AD0-AD15 and A16-A19.
– It requires a single-phase clock with a 33% duty cycle to provide internal timing.
– The 8086 is designed to operate in two modes, minimum and maximum.
– It can prefetch up to 6 instruction bytes from memory and put them in an instruction queue in order to speed up instruction execution.
– It requires a +5V power supply.
– It comes in a 40-pin dual in-line package.
The 8086 employs parallel processing: it has two parts which operate at the same time, the bus interface unit (BIU) and the execution unit (EU), as seen in Figure 8 below.
– The BIU performs all bus operations, such as fetching instructions, reading and writing operands from and to memory, and calculating the addresses of memory operands.
– The instruction bytes are transferred to the instruction queue.
– It provides a full 16-bit bidirectional data bus and a 20-bit address bus.
– The bus interface unit is responsible for performing all external bus operations. Specifically, it has the following functions: instruction fetch, instruction queuing, operand fetch and storage, address calculation and relocation, and bus control.
– The BIU uses a mechanism known as an instruction queue to implement a pipeline architecture.
The BIU contains the following registers:
The BIU fetches instructions using the CS and IP registers, written CS:IP, to construct the 20-bit physical address. Data is fetched using a segment register (usually DS) and an effective address (EA) computed by the EU, depending on the addressing mode.
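The CS:IP address calculation is worth making concrete. The standard 8086 rule is: physical address = segment x 16 + offset, truncated to 20 bits. A small Python sketch (illustrative only):

```python
def physical_address(segment, offset):
    """8086 20-bit physical address: (segment * 16 + offset) mod 2^20."""
    # Shifting left by 4 bits multiplies the 16-bit segment value by 16.
    return ((segment << 4) + offset) & 0xFFFFF  # wrap to 20 bits

# Example: CS = 0x1234, IP = 0x0100 gives 0x12340 + 0x0100 = 0x12440
print(hex(physical_address(0x1234, 0x0100)))

# Addresses past 1 MB wrap around to the bottom of memory.
print(hex(physical_address(0xFFFF, 0x0010)))
```

The same rule applies to data references, with DS (or another segment register) in place of CS and the EU-computed effective address in place of IP.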
– The EU extracts instructions from the top of the queue in the BIU, decodes them, generates operands if necessary, passes them to the BIU with a request to perform the read or write bus cycles to memory or I/O, and performs the operation specified by the instruction on the operands.
– During the execution of an instruction, the EU tests the status and control flags and updates them based on the result of the instruction.
– If the queue is empty, the EU waits for the next instruction byte to be fetched and shifted to the top of the queue.
– When the EU executes a branch or jump instruction, it transfers control to a location corresponding to another set of sequential instructions.
– Whenever this happens, the BIU automatically resets the queue and then begins to fetch instructions from the new location to refill the queue.
MODULE TWO
An instruction set architecture (ISA) is the part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O. The ISA also includes a specification of the set of opcodes (machine language), the native commands for a particular processor. The ISA is the hardware-software interface.
Instruction set architecture (ISA) describes the processor in terms of what the
programmer sees, i.e. the instructions and registers. Two machines may have the
same ISA, but different organizations. Organization is concerned with the internal design
of the processor, the design of the bus system and its interfaces, the design of memory
and so on. Two machines with the same organization may have different hardware
implementations.
ISA is the interface between software and hardware. It is an abstraction which hides
hardware complexity from software through a set of operations and devices. One of the
crucial features of any processor is its instruction set, i.e. the set of machine code
instructions that the processor can carry out.
Each processor has its own unique instruction set, specifically designed to make the best use of the capabilities of that processor. The actual number of instructions provided ranges from a few dozen for a simple 8-bit microprocessor to several hundred for a 32-bit VAX (virtual address extension) processor. However, it should be pointed out that a large instruction set does not necessarily imply a more powerful processor.
An instruction set is a list of all the instructions that a processor can execute.
iv. A reference to the next instruction to be fetched and executed. The next instruction to be executed is normally the instruction immediately following the current one in memory; therefore, no explicit reference to the next instruction is provided.
Where are those operands located? In memory, in the CPU registers, or in an I/O device.
If the operands are located in registers, an instruction can be executed faster than if the operands are located in memory. The main reason is that memory access time is higher than register access time.
ii. Data Storage/Retrieval Instructions: Since data processing operations are normally performed on data stored in CPU registers, we need instructions to move data between memory and registers. These are called data storage/retrieval instructions; examples are the load and store instructions.
iii. Data Movement Instructions: These are basically input/output instructions. They are required to bring programs and data from various devices into memory, or to communicate results to the input/output devices. Examples include start, halt and test instructions.
iv. Control Instructions: These instructions are used for testing the status of a computation through the Processor Status Word (PSW); branch instructions are an example.
Addresses: Addresses are treated as a form of data used in the calculation of the actual physical memory address of an operand. In most cases, the addresses provided in an instruction are operand references, not actual physical memory addresses.
Numbers: All machines provide numeric data types. One special feature of numbers used in computers is that they are limited in magnitude; hence, underflow and overflow may occur during arithmetic operations on them. The maximum and minimum magnitudes are fixed for an integer number, while a limit on precision and exponent exists for floating-point numbers. The three numeric data types common in computers are integer (fixed point), floating point, and (packed) decimal.
Characters: Another very common data type is the character, or string of characters. The most widely used character representation is ASCII (American Standard Code for Information Interchange). It uses 7 bits to code each character, which allows 128 different characters.
Some of these characters are control characters that may be used in data communication. The eighth bit of an ASCII byte may be used as a parity bit. One special feature of ASCII, which facilitates conversion between 7-bit ASCII and 4-bit packed decimal, is that the low four bits of the ASCII codes for the digits 0-9 are the binary equivalents of those digits.
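This property of the ASCII digit codes can be checked directly, for example in Python:

```python
# ASCII '0' is 0x30 and '9' is 0x39, so the low four bits of each
# digit's code equal the digit's numeric value.
for d in "0123456789":
    assert ord(d) & 0x0F == int(d)

print(ord('7') & 0x0F)  # 7
```

Masking off the high nibble is therefore all that is needed to convert an ASCII digit to its packed-decimal (BCD) form, and OR-ing 0x30 back in converts the other way.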
Logical Data: In general, a data word or any other addressable unit such as a byte or half-word is treated as a single unit of data. But can we consider an n-bit data unit as consisting of n items of 1 bit each? If we treat each bit of an n-bit datum as an item, it can be considered logical data, and each of the n items can have the value 0 or 1. The advantages of such a bit-oriented view of data are:
Instruction Format
Any instruction issued to the processor must carry at least two types of information.
These are the operation to be performed, encoded in what is called the op-code field,
and the address information of the operand on which the operation is to be performed,
encoded in what is called the address field.
Depending on the number of addresses an instruction carries, instruction formats are classified as:
i. three-address,
ii. two-address,
iii. one-and-half-address,
iv. one-address, and
v. zero-address.
Consider, for example, the three-address instruction ADD R1, R2, R3. This instruction indicates that the operation to be performed is addition, that the values to be added are those stored in registers R1 and R2, and that the result should be stored in register R3.
An example of a three-address instruction that refers to memory locations may take the
form ADD A,B,C. The instruction adds the contents of memory location A to the contents
of memory location B and stores the result in memory location C.
ADD R1, R2 is a two-address instruction. It adds the contents of register R1 to the contents of R2 and stores the result in register R2. The original contents of R2 are lost in this operation, while the original contents of R1 remain intact.
A similar instruction that uses memory locations instead of registers can take the form ADD A, B. In this case, the contents of memory location A are added to the contents of memory location B, and the result overwrites the original contents of memory location B.
Consider the one-address instruction ADD R1. In this case the instruction implicitly refers to a register called the accumulator, Racc: the contents of the accumulator are added to the contents of register R1 and the result is stored back into the accumulator Racc.
If a memory location is used instead of a register, then an instruction of the form ADD B is used. In this case, the instruction adds the content of the accumulator Racc to the content of memory location B and stores the result back into the accumulator Racc.
Between the two-address and the one-address instruction there is the one-and-half-address instruction.
Consider, for example, the instruction ADD B, R1. In this case, the instruction adds the contents of register R1 to the contents of memory location B and stores the result in register R1.
Because the instruction uses two types of addressing, a register and a memory location, it is called a one-and-half-address instruction: register addressing needs a smaller number of bits than memory addressing, so the register operand counts as only "half" an address.
Zero-address instructions.
These are the instructions that use stack operations. A stack is a data organization mechanism in which the last data item stored is the first data item retrieved. Two specific operations can be performed on a stack, push and pop, and a special register called the stack pointer (SP) is used to indicate the stack location that can be addressed. The classes of instructions are summarized in the table below.
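The address-format classes above can be made concrete by writing the same additions in each style. The Python below is a hypothetical mini-machine (the names mem, acc and stack are illustrative, not 8086 syntax); each commented instruction shows the semantics of one class:

```python
# Hypothetical machine state: three memory locations and an accumulator.
mem = {"A": 3, "B": 4, "C": 0}

# Three-address: ADD A, B, C   (C <- A + B; both sources preserved)
mem["C"] = mem["A"] + mem["B"]

# Two-address:   ADD A, B      (B <- A + B; B's old value is lost)
mem["B"] = mem["A"] + mem["B"]

# One-address:   ADD B         (Racc <- Racc + B; accumulator implicit)
acc = 10
acc = acc + mem["B"]

# Zero-address:  operands live on a stack; ADD names no addresses at all
stack = []
stack.append(mem["A"])                  # PUSH A
stack.append(mem["B"])                  # PUSH B
stack.append(stack.pop() + stack.pop()) # ADD (pop two, push sum)

print(mem["C"], acc, stack[-1])
```

Fewer address fields make each instruction shorter, but more instructions are usually needed to express the same computation.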
MEMORY OPERATIONS
The main memory can be modeled as an array of millions of adjacent cells, each capable of storing a binary digit (bit) having the value 1 or 0. These cells are organized into groups of a fixed number, say n, of cells that can be dealt with as an entity.
An entity consisting of 8 bits is called a byte. Each such group of cells is identified by a unique address. This address is used to determine the location in memory in which a given word is to be stored; this is called a memory WRITE operation. Similarly, the address is used to determine the memory location from which a word is to be retrieved; this is called a memory READ operation.
During a memory write operation a word is stored into a memory location whose address is specified. During a memory read operation a word is read from a memory location whose address is specified. Typically, memory read and memory write operations are performed by the central processing unit (CPU).
Three basic steps are needed in order to perform a memory READ operation:
1. The address of the location from which the word is to be read is loaded into the MAR.
2. A READ signal is issued by the CPU, indicating that the word whose address is in the MAR is to be read into the MDR.
3. The required word is loaded by the memory into the MDR, ready for use by the CPU.
Similarly, three basic steps are needed for the CPU to perform a WRITE operation into a specified memory location:
1. The word to be stored into the memory location is first loaded by the CPU into a specified register, called the memory data register (MDR).
2. The address of the location into which the word is to be stored is loaded by the CPU into a specified register, called the memory address register (MAR).
3. A WRITE signal is issued by the CPU, indicating that the word stored in the MDR is to be stored in the memory location whose address is loaded in the MAR.
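The read and write sequences can be modeled with two toy variables standing in for the hardware registers (purely illustrative; real MAR/MDR are registers inside the CPU and memory interface):

```python
# Toy model of memory READ/WRITE through MAR and MDR.
memory = [0] * 16   # a tiny 16-word memory
MAR = 0             # memory address register
MDR = 0             # memory data register

# WRITE: load the word into MDR, the address into MAR, then store.
MDR = 42
MAR = 5
memory[MAR] = MDR   # the WRITE signal commits MDR to memory[MAR]

# READ: load the address into MAR; memory places the word in MDR.
MAR = 5
MDR = memory[MAR]   # the READ signal fills MDR from memory[MAR]

print(MDR)  # 42
```

Note how the CPU never touches memory directly: every transfer in either direction goes through the MAR/MDR pair.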
FETCH-EXECUTE CYCLE
Fetch and execute are the fundamental operations of the processor. The fetch-decode-execute cycle represents the steps that a computer follows to run a program. The program to be executed is a set of instructions stored in memory; hence, the CPU executes the instructions that it finds in the computer's memory. In order to execute an instruction:
- the CPU must first fetch (transfer) the instruction from memory into one of its registers;
- the CPU then decodes the instruction, i.e. it determines which instruction has been fetched; and
- finally, it executes (carries out) the instruction.
The CPU then repeats this procedure: it fetches an instruction, decodes it, and executes it. This process is repeated continuously and is known as the fetch-execute cycle.
The cycle begins when the processor is switched on and continues until the CPU is halted (via a halt instruction, e.g. the 8086 HLT instruction, or when the machine is switched off). The fetch-execute cycle operates by first fetching an instruction.
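The cycle can be sketched as a loop over a made-up instruction list (the LOAD/ADD/HALT opcodes here are invented for illustration and are not 8086 opcodes):

```python
# A minimal fetch-decode-execute loop for a toy accumulator machine.
program = [("LOAD", 7), ("ADD", 3), ("HALT", None)]
pc, acc, running = 0, 0, True

while running:
    opcode, operand = program[pc]   # fetch: IR <- memory[PC]
    pc += 1                         # PC now points at the next instruction
    if opcode == "LOAD":            # decode + execute
        acc = operand
    elif opcode == "ADD":
        acc += operand
    elif opcode == "HALT":          # analogous to the 8086 HLT instruction
        running = False

print(acc)  # 10
```

Incrementing the PC immediately after the fetch is what makes "the next instruction" the default; a branch would simply overwrite the PC instead.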
Instruction Fetch
An instruction fetch involves reading an instruction from one or more memory locations into the CPU. The execution of the instruction may involve several operations, depending on the nature of the instruction. The processing needed for a single instruction (fetch and execution) is referred to as an instruction cycle.
- The Program Counter (PC) keeps track of the instruction that is to be executed next after the execution of the on-going instruction, i.e. the PC always contains the address of the next instruction to be executed. The program counter drives the fetch cycle in a typical CPU.
- Instructions are loaded into the Instruction Register (IR) before their execution, i.e. the IR holds the instruction to be executed.
Instruction Execution
The instruction execution takes place in the CPU registers. The following are CPU registers:
• Memory Address Register (MAR): It specifies the address of the memory location from which data or an instruction is to be accessed (for a read operation), or to which data is to be stored (for a write operation).
• Memory Buffer Register (MBR): It is a register which contains the data to be written into memory (for a write operation), or which receives the data from memory (for a read operation).
• Program Counter (PC): It keeps track of the instruction that is to be executed next,
after the execution of an on-going instruction.
• Instruction Register (IR): the instructions are loaded here before their execution.
Table 4 (a) to (d) shows the evolution of microprocessors and how they have grown faster and much more complex.
It is worthwhile to list some of the highlights of the evolution of the Intel product line:
- 8080: The world's first general-purpose microprocessor. This was an 8-bit machine, with an 8-bit data path to memory. The 8080 was used in the first personal computer, the Altair.
- 8086: A far more powerful, 16-bit machine. In addition to a wider data path and larger registers, the 8086 sported an instruction cache, or queue, that prefetches a few instructions before they are executed. A variant of this processor, the 8088, was used in IBM's first personal computer, securing the success of Intel.
- 80286: This extension of the 8086 enabled addressing a 16-MB memory instead of just 1 MB.
- 80386: Intel's first 32-bit machine, and a major overhaul of the product. With a 32-bit architecture, the 80386 rivaled the complexity and power of minicomputers and mainframes introduced just a few years earlier. This was the first Intel processor to support multitasking, meaning it could run multiple programs at the same time.
- 80486: The 80486 introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining. The 80486 also offered a built-in math coprocessor, offloading complex math operations from the main CPU.
- Pentium: With the Pentium, Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel.
- Pentium Pro: The Pentium Pro continued the move into superscalar organization begun with the Pentium, with aggressive use of register renaming, branch prediction, data flow analysis, and speculative execution.
- Pentium II: The Pentium II incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics data efficiently.
- Pentium III: The Pentium III incorporates additional floating-point instructions: the Streaming SIMD Extensions (SSE) instruction set extension added 70 new instructions designed to increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing.
- Pentium 4: The Pentium 4 includes additional floating-point and other enhancements for multimedia.
- Core: This is the first Intel x86 microprocessor with a dual core, referring to the implementation of two cores on a single chip.
- Core 2: The Core 2 extends the Core architecture to 64 bits. The Core 2 Quad provides
four cores on a single chip. More recent Core offerings have up to 10 cores per chip. An
important addition to the architecture was the Advanced Vector Extensions instruction set
that provided a set of 256-bit, and then 512 bit, instructions for efficient processing of vector
data.
Although the organization and technology of the x86 machines have changed dramatically
over the decades, the instruction set architecture has evolved to remain backward
compatible with earlier versions. Thus, any program written on an older version of the x86
architecture can execute on newer versions.
Registers are extremely fast memory locations within the CPU that are used to hold operands and store the results of CPU operations and other calculations. Computers differ in register sets, number of registers, register types, and the length of each register. They also differ in the usage of each register.
General-purpose registers can be used for multiple purposes and assigned to a variety of
functions by the programmer.
Special-purpose registers are restricted to only specific functions. In some cases, some
registers are used only to hold data and cannot be used in the calculations of operand
addresses.
The internal registers of the 8086 are shown in Figure 7 below.
• The 8086 has the following groups of user-accessible internal registers:
- Instruction Pointer (IP)
- Four general-purpose registers (AX, BX, CX, DX)
- Four pointer and index registers (SP, BP, SI, DI)
- Four segment registers (CS, DS, SS, ES)
- Flag Register (FR)
• The 8086 has a total of fourteen 16-bit registers, including a 16-bit status register (flag register) with 9 of its bits implemented as status and control flags.
Segment Registers
1) The Code Segment (CS) register is a 16-bit register containing the address of the 64 KB segment holding the processor instructions. The processor uses the CS segment for all accesses to instructions referenced by the instruction pointer (IP) register.
2) The Stack Segment (SS) register is a 16-bit register containing the address of the 64 KB segment holding the program stack. By default, the processor assumes that all data referenced by the stack pointer (SP) and base pointer (BP) registers is located in the stack segment. The SS register can be changed directly using the POP instruction.
3) The Data and Extra Segment registers (DS and ES) are 16-bit registers containing the addresses of 64 KB segments holding program data. By default, the processor assumes that all data referenced by the general registers (AX, BX, CX and DX) and the index registers (SI, DI) is located in the data or extra segment.
Data Registers
1) AX (Accumulator)
• It consists of two 8-bit registers, AL and AH, which can be combined and used as the 16-bit register AX. AL contains the low-order byte of the word, and AH contains the high-order byte.
• The accumulator is used for I/O operations and string manipulation.
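The pairing of the 8-bit halves into a 16-bit register is just bit concatenation, which a few lines of Python make concrete (the same pattern applies to BX, CX and DX below):

```python
AH, AL = 0x12, 0x34

# Combine the two 8-bit halves into the 16-bit AX:
# the high byte is shifted left 8 bits and OR-ed with the low byte.
AX = (AH << 8) | AL
print(hex(AX))  # 0x1234

# Splitting AX back into its halves recovers AH and AL.
assert AX >> 8 == AH and AX & 0xFF == AL
```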
2) BX (Base register)
• It consists of two 8-bit registers, BL and BH, which can be combined and used as the 16-bit register BX. BL contains the low-order byte of the word, and BH contains the high-order byte.
• The BX register usually contains an offset into the data segment.
3) CX (Count register)
• It consists of two 8-bit registers, CL and CH, which can be combined and used as the 16-bit register CX. When combined, CL contains the low-order byte of the word, and CH contains the high-order byte.
• The count register is used in loop and shift/rotate instructions, and as a counter in string manipulation.
4) DX (Data register)
• It consists of two 8-bit registers, DL and DH, which can be combined and used as the 16-bit register DX. When combined, DL contains the low-order byte of the word, and DH contains the high-order byte.
• DX can hold a port number in I/O operations.
• In 32-bit integer multiply and divide instructions, the DX register contains the high-order word of the initial or resulting number.
Pointer Registers
1. The Stack Pointer (SP) is a 16-bit register used to hold an offset address within the stack segment.
2. The Base Pointer (BP) is a 16-bit register also used to hold an offset address within the stack segment.
i. The BP register is usually used for based, based indexed or register indirect addressing.
ii. The difference between SP and BP is that SP is used internally to store the return address in the case of interrupts and the CALL instruction.
3. Source Index (SI) and Destination Index (DI). These two 16-bit registers are used to hold offset addresses for DS and ES in string manipulation instructions.
i. SI is used for indexed, based indexed and register indirect addressing, as well as the source data address in string manipulation instructions.
ii. DI is used for indexed, based indexed and register indirect addressing, as well as the destination data address in string manipulation instructions.
Flag Register
The flag register is a 16-bit register containing nine 1-bit flags, i.e. the flag register is addressable by bit, as shown in the figure below. Each of these bits depicts a status or control flag of the microprocessor.
i. Overflow Flag (OF): This flag is set if an overflow occurs, i.e. if the result of a signed operation is too large to be accommodated in the destination register.
ii. Direction Flag (DF): This is used by string manipulation instructions. If this flag bit is 0, the string is processed from the lowest address towards the highest address (auto-incrementing mode); otherwise, the string is processed from the highest address towards the lowest address (auto-decrementing mode).
iii. Interrupt-enable Flag (IF): If this flag is set, maskable interrupts are recognized by the CPU;
otherwise they are ignored.
iv. Single-step (Trap) Flag (TF): If this flag is set, the processor enters single-step execution
mode: a trap interrupt is generated after the execution of each instruction, i.e. the processor
executes the current instruction and control is then transferred to the trap interrupt service routine.
v. Sign Flag (SF): This flag is set when the result of a computation is negative. For signed
computations, the sign flag equals the MSB of the result.
vi. Zero Flag (ZF): Set if the result of an operation is zero; otherwise it is cleared.
vii. Auxiliary carry Flag (AF): Set if there was a carry out of, or a borrow into, bit 3 (i.e. between
the low nibble, bits 0-3, and bit 4) during an operation on the AL register.
viii. Parity Flag (PF): Set if the parity (the number of 1 bits) of the low-order byte of the result is
even.
ix. Carry Flag (CF): This flag is set when there is a carry out of the MSB in the case of addition,
or a borrow in the case of subtraction. For example, when two numbers are added, a carry may
be generated out of the most significant bit position; the carry flag is then set to 1. If no carry is
generated, it is 0.
- Segment Pointers. To support segmentation, the address issued by the processor
consists of a segment number (base) and a displacement (or an offset) within the
segment. A segment register holds the address of the base of the segment.
- Stack Pointer. A stack is a data organization mechanism in which the last data item stored
is the first data item retrieved. Two specific operations can be performed on a stack.
These are the Push and the Pop operations. The stack pointer (SP) is used to indicate
the stack location that can be addressed. In the stack push operation, the SP value is
used to indicate the location (called the top of the stack).
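The push and pop behaviour described above can be sketched in Python. This is a minimal illustration, not an 8086-accurate model: the initial SP value, the 2-byte word size and the dictionary used as memory are all assumptions made for the example.

```python
# Minimal sketch of a stack growing downward in memory, with SP
# pointing at the current top of stack (illustrative values only).

MEMORY = {}          # sparse "RAM": address -> 16-bit word
SP = 0x0100          # hypothetical initial stack pointer (empty stack)

def push(value):
    """PUSH: decrement SP, then store the word at the new top."""
    global SP
    SP -= 2                      # stack grows toward lower addresses
    MEMORY[SP] = value & 0xFFFF

def pop():
    """POP: read the word at the top, then increment SP."""
    global SP
    value = MEMORY[SP]
    SP += 2
    return value

push(0x1234)
push(0xABCD)
print(hex(pop()))    # last item stored is retrieved first
print(hex(pop()))
```

The two `print` calls show the last-in, first-out order: 0xABCD comes back before 0x1234.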
ADDRESSING MODES
This refers to the different ways in which operands can be addressed. Addressing modes differ in
the way the address information of an operand is specified. The basic addressing modes are:
i. IMMEDIATE ADDRESSING
- The operand is given explicitly as part of the instruction, so no memory access is required to
fetch it; the operand immediately follows the instruction. In this addressing mode, the
value of the operand is (immediately) available in the instruction itself. For example, loading the
decimal value 9000 into a register Ri can be performed using an instruction LOAD 9000, Ri.
In this instruction, the operation to be performed is to load a value into a register. The
source operand is (immediately) given as 9000, and the destination is the register Ri.
ii. INDEXED ADDRESSING
- The effective address (EA) of the operand is generated by adding an index register value (X) to
the direct address (DA), i.e. EA = X + DA.
In this addressing mode, the address of the operand is obtained by adding a constant to the
content of a register, called the index register. For example, the instruction LOAD 5+[DI], AX
loads register AX with the contents of the memory location whose address is the
sum of the contents of register DI and the value 5. Indexed addressing is indicated in the
instruction by writing the name of the index register in brackets, together with the constant to
be added.
iii. BASED ADDRESSING
The effective address of the operand is generated by adding a constant (displacement) to the
content of a base register indicated in the instruction. For example, the instruction
MOV DX, [BP] +10
moves into register DX the contents of the memory location whose effective address is obtained
by summing the constant 10 with the value in register BP.
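The based-addressing example can be sketched in Python as follows. The register value, the memory contents and the helper name `mov_dx_bp_disp` are all made-up assumptions for illustration.

```python
# Sketch of based addressing: EA = contents of base register + constant.
# Register and memory values are illustrative assumptions.

MEMORY = {0x020A: 0x7F3C}           # word stored at address BP + 10
REGS = {"BP": 0x0200, "DX": 0x0000}

def mov_dx_bp_disp(disp):
    """Emulate MOV DX, [BP]+disp: EA = BP + disp, load the word at EA into DX."""
    ea = REGS["BP"] + disp
    REGS["DX"] = MEMORY[ea]
    return ea

ea = mov_dx_bp_disp(10)
print(hex(ea), hex(REGS["DX"]))     # EA = 0x200 + 10 = 0x20A
```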
The Intel IA-32 architecture covers the 80x86/Pentium processors, where x ≥ 3. Its features
include:
- Data bus increased from 16 bits to 32 bits
- IA-32 processors are 32-bit integrated processors that can operate on integer and floating-
point data
- Backward compatibility with the 16-bit 8086 in real mode
- IA-32 operates in real mode by default, hence it has to be switched to protected mode
- Pentium II processors, as a family of IA-32, support MMX, i.e. multimedia data structures,
which are SIMD (single instruction, multiple data) in nature
(iv) Relative base addressing (v) Relative Index addressing (vi) Based Index addressing
(vii) Relative based index addressing (viii) Scaled index addressing
(iv) Register Indirect Addressing
▪ The memory address of the operand is held in a register: either a base register
(BX, BP), an index register (SI, DI), or any of the general purpose 32-bit registers (EAX, EBX,
ECX, EDX, EBP, ESI, EDI)
▪ Operand size: byte, word, double word
e.g MOV AL, [ECX]
(v) Base + Index Addressing
- This is a register indirect addressing mode in which the sum of a (base + index) register pair is
used as the operand's memory address pointer.
- Any pair of the general purpose 32-bit registers (EAX, EBX, ECX, EBP, EDI, ESI) can be used.
- Whichever registers are chosen, the first register is treated as the base and the second as the
index.
- e.g MOV [EAX + ECX], BL
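The base + index example can be sketched in Python. The register values, the dictionary memory model and the helper name `mov_mem_bl` are assumptions made for this illustration.

```python
# Sketch of base + index addressing: EA = base register + index register.
# All register values here are illustrative assumptions.

MEMORY = {}
REGS = {"EAX": 0x1000, "ECX": 0x0020, "BL": 0x5A}

def mov_mem_bl():
    """Emulate MOV [EAX + ECX], BL: store BL at EA = EAX + ECX."""
    ea = REGS["EAX"] + REGS["ECX"]   # first register is the base, second the index
    MEMORY[ea] = REGS["BL"]
    return ea

print(hex(mov_mem_bl()))             # EA = 0x1000 + 0x20 = 0x1020
```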
Example: Find the effective address in each of the following cases. Assume that ESI =
200h, ECX =100h, EBX = 50h and EDI = 100h.
1. MOV AX, [2000 + ESI *4] 3. MOV ECX, [2400 + EBX *4]
2. MOV AX, [5000 + ECX *2] 4. MOV DX, [100 + EDI*8]
Solution:
1. EA = 2000h + 200h x 4 = 2000h + 800h = 2800h. Therefore the address of the operand moved
into AX is DS:2800h.
2. EA = 5000h + 100h x 2 = 5000h + 200h = 5200h.
3. EA = 2400h + 50h x 4 = 2400h + 140h = 2540h.
4. EA = 100h + 100h x 8 = 100h + 800h = 900h.
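The four scaled-index effective addresses can be recomputed in Python, treating all displacements as hexadecimal as in case 1. Note in particular that 50h x 4 = 140h, so case 3 evaluates to 2540h.

```python
# Recompute the scaled-index effective addresses:
# EA = displacement + index register x scale (all values hexadecimal).

ESI, ECX, EBX, EDI = 0x200, 0x100, 0x50, 0x100

def ea(disp, index, scale):
    return disp + index * scale

print(hex(ea(0x2000, ESI, 4)))   # 0x2800
print(hex(ea(0x5000, ECX, 2)))   # 0x5200
print(hex(ea(0x2400, EBX, 4)))   # 0x2540  (50h x 4 = 140h)
print(hex(ea(0x100,  EDI, 8)))   # 0x900
```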
The flag bits affected by the ADD instruction are the carry flag (CF), parity flag (PF), auxiliary
carry flag (AF), zero flag (ZF), sign flag (SF) and overflow flag (OF).
CF - This flag is set whenever there is a carry out of bit d7 after an 8-bit operation, or out of bit
d15 after a 16-bit operation.
PF - After certain operations, the parity of the result's low-order byte is checked. If the byte has
an even number of 1s, the parity flag is set to 1; otherwise it is cleared to 0. In a 16-bit
operation, parity is checked for the lower 8 bits only.
AF - If there is a carry from d3 to d4 during an operation, this bit is set; otherwise it is cleared.
ZF - Set to 1 if the result of an arithmetic or logical operation is zero; otherwise it is cleared.
SF - The binary representation of signed numbers uses the most significant bit as the sign bit.
After arithmetic or logic operations, the status of this sign bit is copied into SF, thereby
indicating the sign of the result.
OF - Set whenever the result of a signed-number operation is too large, causing the high-order
bit to overflow into the sign bit.
Example: Show how the flag register is affected by the addition of 38h and 2Fh in the following
lines of code: MOV BH, 38h ; ADD BH, 2Fh
38h 0011 1000
2Fh 0010 1111
67h 0110 0111
The result 67h is non-zero and positive, so ZF = 0 and SF = 0. There is no carry out of bit 7
(CF = 0) and no signed overflow (OF = 0). There is a carry from d3 to d4 (8h + Fh = 17h), so
AF = 1, and 67h contains five 1s (odd parity), so PF = 0.
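The flag rules above can be sketched in Python for an 8-bit ADD. The helper name `add8_flags` is made up for this illustration; it is applied to the example ADD BH, 2Fh with BH = 38h.

```python
# Derive the six status flags affected by an 8-bit ADD (a sketch,
# following the CF/PF/AF/ZF/SF/OF rules described in the notes).

def add8_flags(a, b):
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                     # carry out of bit d7
        "PF": int(bin(result).count("1") % 2 == 0),  # even number of 1s in low byte
        "AF": int(((a & 0xF) + (b & 0xF)) > 0xF),    # carry from d3 to d4
        "ZF": int(result == 0),
        "SF": int(result >> 7),                      # copy of the sign bit
        # OF: operands share a sign but the result's sign differs
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

result, flags = add8_flags(0x38, 0x2F)
print(hex(result), flags)   # 0x67 with CF=0, PF=0, AF=1, ZF=0, SF=0, OF=0
```

The same helper can be used to check the exercise on the 8-bit addition of two binary values.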
Exercise: How would the status flags be set after the processor performed the 8-bit addition of
10110101 and 10010110 (binary)?
Logical Instructions
- AND destination, source
E.g. MOV BL, 35h
AND BL, 0Fh
35h 0011 0101
0Fh 0000 1111
05h 0000 0101
- OR destination, source
e.g. MOV AX, 0504h
OR AX, DA68h
0504h 0000 0101 0000 0100
DA68h 1101 1010 0110 1000
DF6Ch 1101 1111 0110 1100
- XOR destination, source. Each result bit is 1 when the corresponding input bits differ, and 0
when they are the same.
Logical Shift - Right and Left. E.g. show the result of SHR in the following instructions.
MOV AL, 9Ah
MOV CL, 3 ;move 3 into CL
SHR AL, CL ;shift AL right 3 times
9Ah = 10011010
01001101 1st shift
00100110 2nd shift
00010011 3rd shift AL = 13h
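The three shifts can be traced in Python. Each logical right shift inserts a 0 at the MSB and drops the old LSB; modelling the dropped bit as the carry flag here is an assumption for illustration.

```python
# Trace SHR AL, CL with AL = 9Ah and CL = 3, one shift at a time.

al = 0x9A
cf = 0
for step in range(3):              # CL = 3 shifts
    cf = al & 1                    # bit shifted out (modelled as CF)
    al >>= 1                       # logical right shift: 0 enters at the MSB
    print(f"shift {step + 1}: AL = {al:08b} ({al:02X}h), CF = {cf}")
```

After the third shift AL holds 0001 0011, i.e. 13h, matching the worked example.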
Exercise: Use operand 4FCAh and C237h to perform (a) AND (b) OR (c) XOR