CSC303: Computer Architecture I Lecture Notes
Lecture note on Computer Architecture and Organization CSC303
MODULE ONE
Early counting devices, generations of computers (Vacuum tube, Transistor, Integrated Circuit,
Very Large Scale Integration, AI).
Classification
o Types of data: Digital, Analogue, Hybrid
o Purpose: General purpose, Special purpose
o Size: Notebook, microcomputer, minicomputer, mainframe, supercomputer
The ancestors of the modern-age computer were mechanical and electromechanical devices. These include Blaise Pascal's machine, the Difference Engine, the Analytical Engine, ENIAC, EDSAC, EDVAC, UNIVAC, the Harvard Mark I, II, and III, etc.
Computer technology has made incredible improvements in the past half century. In the early part of computer evolution there were no stored-program computers; computational power was low and the machines were very large. Nowadays a personal computer has more computational power, more memory, and more disk storage, is smaller in size, and is available at an affordable cost. This rapid improvement is a result of advances in the technology used to build computers and of innovation in computer design.
What is Computer Architecture?
o The structure and functional organization of a computer system.
o It specifies how data is processed and transferred between different parts of the
computer.
o It focuses on the design principles and how different components work together.
Computer architecture can be divided into two main types: Von Neumann architecture and
Harvard architecture.
1. Von Neumann architecture, also known as the Princeton architecture, was set out by John von Neumann and his collaborators in the 1945 First Draft of a Report on the EDVAC. This model of computer architecture proposes five components: a processor with connected registers; a control unit capable of storing instructions; a memory capable of storing data as well as instructions and communicating via buses; additional or external storage; and input and output mechanisms.
2. Harvard architecture, on the other hand, refers to a computer architecture with distinct data and instruction storage and signal pathways. In contrast to the von Neumann architecture, in which program instructions and data share the very same memory and pathways, this design separates the two. In practice, a modified Harvard architecture with two distinct caches (one for data and one for instructions) is employed; x86 and Advanced RISC Machine (ARM) systems frequently use this design.
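The contrast between the two designs can be sketched in Python. This is an illustrative model only (the class and method names are invented for this sketch, not part of any real machine description):

```python
# Conceptual sketch: a unified von Neumann memory vs. the split
# instruction/data memories of a Harvard design.

class VonNeumannMemory:
    """One address space holds both instructions and data."""
    def __init__(self, size):
        self.cells = [0] * size          # instructions and data share these cells

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        self.cells[addr] = value

class HarvardMemory:
    """Separate storage and pathways for instructions and data."""
    def __init__(self, isize, dsize):
        self.instructions = [0] * isize  # fetched over the instruction pathway
        self.data = [0] * dsize          # accessed over the data pathway

    def fetch_instruction(self, addr):
        return self.instructions[addr]

    def read_data(self, addr):
        return self.data[addr]

    def write_data(self, addr, value):
        self.data[addr] = value

# In the von Neumann model, code and data occupy the same cells, so a
# program could even overwrite its own instructions:
vn = VonNeumannMemory(16)
vn.write(0, 0xA1)        # could be an opcode...
vn.write(0, 42)          # ...or data -- same location either way
print(vn.read(0))        # -> 42
```

In the Harvard sketch, by contrast, `fetch_instruction` and `read_data` can never collide, which is why real designs use the split form for their caches.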
Overview of Microprocessor Architecture
For example, to send data to an output device, the CPU places the device address on the address bus, places the data on the data bus, and enables the output device.
System Buses
Buses are the wires connecting memory and I/O to the microprocessor. There are three main types of buses:
– Address Bus
• Unidirectional
• Identifying peripheral or memory location
– Data Bus
• Bidirectional
• Transferring data
– Control Bus
• Synchronization signals
• Timing signals
• Control signal
Changes in technology not only influence organization but also result in the introduction of more powerful and more complex architectures. However, because a computer organization must be designed to implement a particular architectural specification, a thorough treatment of organization requires a detailed examination of architecture as well. Computer architecture comes before computer organization.
Computer architecture and computer organization are related but distinct concepts.
Computer Architecture refers to the design of the internal workings of a computer system,
including the CPU, memory, and other hardware components. It involves decisions about the
organization of the hardware, such as the instruction set architecture, the data path design, and
the control unit design.
Computer Architecture is concerned with optimizing the performance of a computer system
and ensuring that it can execute instructions quickly and efficiently.
On the other hand,
Computer Organization refers to the operational units and their interconnections that
implement the architecture specification. It deals with how the components of a computer system
are arranged and how they interact to perform the required operations.
Computer Organization is concerned with the physical implementation of the architecture
design and includes decisions about the interconnection and communication between
components, such as the bus structure, memory hierarchy, and input/output systems.
Computer Organization comes after the Computer Architecture has been decided.
Computer Organization is how operational attributes are linked together and contribute to
realizing the architectural specification, hence Computer Organization deals with a structural
relationship.
A computer system, like any system, consists of an interrelated set of components. The system is
best characterized in terms of structure, the way in which components are interconnected, and
function, the operation of the individual components. Furthermore, a computer’s organization is
hierarchical.
Each major component can be further described by decomposing it into its major subcomponents
and describing their structure and function.
Function
Both the structure and functioning of a computer are, in essence, simple. In general terms, there
are only four basic functions that a computer can perform:
• Data processing: Data may take a wide variety of forms, and the range of processing
requirements is broad.
• Data storage: Even if the computer is processing data on the fly (i.e., data come in and get processed, and the results go out immediately), the computer must temporarily store at least those pieces of data that are being worked on at any given moment. Thus, there is at least a short-term data storage function. Equally important, the computer performs a long-term data storage function. Files of data are stored on the computer for subsequent retrieval and update.
• Data movement: The computer’s operating environment consists of devices that serve as
either sources or destinations of data. When data are received from or delivered to a device
that is directly connected to the computer, the process is known as input–output (I/O), and
the device is referred to as a peripheral. When data are moved over longer distances, to or
from a remote device, the process is known as data communications.
• Control: Within the computer, a control unit manages the computer’s resources and
orchestrates the performance of its functional parts in response to instructions.
Structure
There are four main structural components:
• Central processing unit (CPU): Controls the operation of the computer and performs its data processing functions; often simply referred to as the processor.
• Main memory: Stores data.
• I/O: Moves data between the computer and its external environment.
• System interconnection: Some mechanism that provides for communication among CPU,
main memory, and I/O. A common example of system interconnection is by means of a
system bus, consisting of a number of conducting wires to which all the other components
attach.
1. Bus
A bus is a bundle of wires grouped together to serve a single purpose. The main purpose of the
bus is to transfer data from one device to another. The processor's interface to the bus includes
connections used to pass data, connections to represent the address in which the processor is interested, and control lines to manage and synchronize the transaction. The three major
buses are Data, Address and Control buses. There are internal buses that the processor uses
to move data, instructions, configuration, and status between its subsystems.
a. The Data Bus provides a path for moving data among system modules. The data bus may
consist of 32, 64, 128, or even more separate lines, the number of lines being referred to as the width of
the data bus. Because each line can carry only 1 bit at a time, the number of lines determines how many
bits can be transferred at a time. The width of the data bus is a key factor in determining overall system
performance. A narrower bus width means that it will take more time to communicate a quantity of data as
compared to a wider bus. For example, if the data bus is 32 bits wide and each instruction is 64 bits long,
then the processor must access the memory module twice during each instruction cycle.
b. The Address Bus is used to designate the source or destination of the data on the data bus. For
example, if the processor wishes to read a word (8, 16, or 32 bits) of data from memory, it puts the address
of the desired word on the address lines. Clearly, the width of the address bus determines the maximum
possible memory capacity of the system. Address space refers to the maximum amount of memory and
I/O that a microprocessor can directly address.
If a microprocessor has a 16-bit address bus, it can address up to 2^16 = 65,536 bytes. Therefore it has a 64 kB address space, since:
1 byte = 8 bits
1,024 bytes = 1 kB
65,536 bytes = 64 kB
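The calculation above generalizes directly: an address bus of n lines can form 2^n distinct addresses. A quick sketch in Python:

```python
# Address space from address-bus width: a bus of n lines can form 2**n
# distinct addresses, each selecting one byte in a byte-addressable memory.
def address_space_bytes(bus_width_bits):
    return 2 ** bus_width_bits

print(address_space_bytes(16))           # -> 65536 bytes
print(address_space_bytes(16) // 1024)   # -> 64 (i.e. a 64 kB address space)
print(address_space_bytes(32) // 2**30)  # -> 4 (a 32-bit bus reaches 4 GB)
```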
Furthermore, the address lines are generally also used to address I/O ports. Note that the address bus is
unidirectional (the microprocessor asserts requested addresses to the various devices), and the data bus
is bidirectional (the microprocessor asserts data on a write and the devices assert data on reads).
c. The Control Bus is used to control the access to and the use of the data and address lines.
Because the data and address lines are shared by all components, there must be a means of controlling
their use. Control signals transmit both command and timing information among system modules.
Timing signals indicate the validity of data and address information. Command signals specify operations
to be performed. Typical control lines include:
• Memory write: Causes data on the bus to be written into the addressed location
• Memory read: Causes data from the addressed location to be placed on the bus
• I/O write: Causes data on the bus to be output to the addressed I/O port
• I/O read: Causes data from the addressed I/O port to be placed on the bus
• Transfer ACK: Indicates that data have been accepted from or placed on the bus
• Bus request: Indicates that a module needs to gain control of the bus
• Bus grant: Indicates that a requesting module has been granted control of the bus
• Interrupt request: Indicates that an interrupt is pending
• Interrupt ACK: Acknowledges that the pending interrupt has been recognized
• Clock: Is used to synchronize operations
• Reset: Initializes all modules
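The interplay of the three buses in a single transaction can be sketched as follows. This is an illustrative model only (the class `Memory` and its `bus_cycle` method are invented for this sketch), showing how the control lines select the operation, the address bus selects the location, and the data bus carries the data:

```python
# Illustrative sketch: a memory module reacting to simplified
# "memory read" / "memory write" control signals on a shared bus.
class Memory:
    def __init__(self, size):
        self.cells = [0] * size

    def bus_cycle(self, address, data, mem_read, mem_write):
        """One bus transaction: control lines pick the operation,
        the address selects the location, data rides the data bus."""
        if mem_write:
            self.cells[address] = data   # data bus -> addressed location
            return None
        if mem_read:
            return self.cells[address]   # addressed location -> data bus
        return None

mem = Memory(256)
# A write cycle, then a read cycle of the same address:
mem.bus_cycle(address=0x10, data=99, mem_read=False, mem_write=True)
print(mem.bus_cycle(address=0x10, data=None, mem_read=True, mem_write=False))  # -> 99
```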
2. Registers
Registers are temporary storage locations in the CPU. A register stores a binary value using a
group of latches. Although variables and pointers used in a program are all stored in memory,
they are moved to registers during periods in which they are the focus of operation. This is so
that they can be manipulated quickly. Once the processor shifts its focus, it stores the values it
doesn't need any longer back in memory. Registers may be used for several operations.
Discussion on types and usage of registers will follow in Module III of this document.
3. Buffers
A processor does not operate in isolation. Typically there are multiple processors supporting
the operation of the main processor. These include video processors, the keyboard and
mouse interface processor, and the processors providing data from hard drives and
CD-ROMs. There are also processors to control communication interfaces such as USB and
Ethernet networks. These processors all operate independently, and therefore one may
finish an operation before a second processor is ready to receive the results.
If one processor is faster than another or if one processor is tied up with a process prohibiting
it from receiving data from a second process, then there needs to be a mechanism in place so
that data is not lost. This mechanism takes the form of a block of memory that can hold data
until it is ready to be picked up.
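This holding mechanism is a first-in, first-out (FIFO) buffer: data is picked up in the order it arrived, and nothing is lost while the receiver is busy. A minimal sketch using Python's standard `collections.deque`:

```python
# Sketch of a buffer between a fast producer and a slower consumer:
# data accumulates in a FIFO until the receiver is ready to pick it up.
from collections import deque

buffer = deque()

# A fast device deposits several results before the receiver is ready:
for sample in [10, 20, 30]:
    buffer.append(sample)              # producer side: store into the buffer

# The slower processor drains them later, in arrival order, none lost:
received = []
while buffer:
    received.append(buffer.popleft())  # consumer side: oldest item first

print(received)  # -> [10, 20, 30]
```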
4. The Stack
During the course of normal operation, there will be a number of times when the processor
needs to use a temporary memory, a place where it can store a number for a while until it is
ready to use it again.
For example, every processor has a finite number of registers. If an application needs more
registers than are available, the register values that are not needed immediately can be stored
in this temporary memory. When a processor needs to jump to a subroutine or function, it
needs to remember the instruction it jumped from so that it can pick back up where it left off
when the subroutine is completed. The return address is stored in this temporary memory.
The stack is a block of memory locations reserved to function as temporary memory. It
operates much like the stack of plates at the start of a restaurant buffet line. When a plate is
put on top of an existing stack of plates, the plate that was on top is now hidden, one position
lower in the stack. It is not accessible until the top plate is removed. There are two main
operations that the processor can perform on the stack: it can either store the value of a
register to the top of the stack or remove the top piece of data from the stack and place it in a
register. Storing data to the stack is referred to as "pushing" while removing the top piece of
data is called "popping". The LIFO nature of the stack makes it so that applications must
remove data items in the opposite order from which they were placed on the stack.
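The push/pop discipline described above can be sketched with a Python list standing in for the reserved block of memory:

```python
# LIFO stack sketch: a list models the reserved block of memory.
stack = []

def push(value):
    stack.append(value)   # new value goes on top; the old top is now hidden

def pop():
    return stack.pop()    # only the top item is accessible

# Saving two register values before a subroutine call...
push(0xAA)   # register A's value (170)
push(0xBB)   # register B's value (187)

# ...they must be restored in the opposite (LIFO) order:
print(pop())  # -> 187 (0xBB, register B's value)
print(pop())  # -> 170 (0xAA, register A's value)
```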
5. I/O Ports
Input/output ports or I/O ports refer to any connections that exist between the processor
and its external devices. A USB printer or scanner, for example, is connected to the computer
system through an I/O port. The computer can issue commands and send data to be printed
through this port or receive the device's status or scanned images. Some I/O devices are
connected directly to the memory bus and act just like memory devices. Sending data to the
port is done by storing data to a memory address and retrieving data from the port is done by
reading from a memory address.
If the device is incorporated into the processor, then communication with the port is done by
reading and writing to registers. This is sometimes the case for simple serial and parallel
interfaces such as a printer port or keyboard and mouse interface.
One of the key features used to categorize a microprocessor is whether it supports reduced instruction set computing (RISC) or complex instruction set computing (CISC). The distinction lies in how complex the individual instructions are and in how many variants of the same basic instruction exist. In practical terms, this distinction directly relates to the complexity of a microprocessor's instruction decoding logic; a more complex instruction set requires more complex decoding logic. The differences are tabulated in Table 1.
Table 1: CISC vs RISC

CISC:
• Instructions and addressing modes are complex, hence the instruction decode logic is complex.
• Not all instructions in CISC microprocessors are used with the same frequency; only some (a core set) are called most of the time.
• The instructions that are used less often impose a burden on the entire system, because they increase the permutations of the decode logic in a given clock cycle.

RISC:
• Instruction decode logic is simple, since there are few instructions to decode and little operand complexity.
• Instructions that are not frequently used are removed so as to simplify the microprocessor control logic; the system can therefore perform faster, executing programs faster, improving throughput for the commonly used instructions and increasing overall performance.
• The permutations of the decode logic are reduced, since the instruction set is small and only a few memory read/write operations are needed.
CPU Architecture
A Central Processing Unit (CPU) is the brains of your computer. The main job of the CPU is to
carry out a diverse set of instructions through the fetch-decode-execute cycle to manage all parts
of your computer and run all kinds of computer programs.
A CPU is very fast at processing data in sequence, as it has a few heavyweight cores with high clock speeds. It is like a Swiss army knife that can handle diverse tasks pretty well. The CPU is latency-optimized and can switch between a number of tasks very quickly, which may create an impression of parallelism. Nevertheless, fundamentally it is designed to run one task at a time.
GPU Architecture
A Graphics Processing Unit (GPU) is a specialized processor whose job is to rapidly manipulate
memory and accelerate the computer for a number of specific tasks that require a high degree of
parallelism.
As the GPU uses thousands of lightweight cores whose instruction sets are optimized for multidimensional matrix arithmetic and floating-point calculations, it is extremely fast with linear algebra and similar tasks that require a high degree of parallelism.
As a rule of thumb, if your algorithm accepts vectorized data, the job is probably well-suited
for GPU computing.
Architecturally, GPU’s internal memory has a wide interface with a point-to-point connection which
accelerates memory throughput and increases the amount of data the GPU can work with in a
given moment. It is designed to rapidly manipulate huge chunks of data all at once.
CPU: performs fewer instructions per clock.
GPU: performs more instructions per clock.
When comparing the two, it is important to understand that GPUs were designed to complement
CPUs, not to replace them. The CPU and the GPU work together to increase the amount and
speed of processed data.
A GPU cannot replace a CPU in a computer system. The CPU is necessary to oversee the execution of tasks on the system. However, the CPU can delegate specific repetitive workloads to the GPU and free its own resources for maintaining the stability of the system and the programs that are running.
A GPU uses many lightweight processing cores, leverages data parallelism, and has high memory
throughput. While the specific components will vary by model, fundamentally most modern GPUs
use single instruction multiple data (SIMD) stream architecture.
FLYNN’S TAXONOMY
Two possible kinds of instruction stream (single or multiple), combined with two possible kinds of data stream, lead to the four different categories in Flynn's taxonomy. Let's take a look at each, as illustrated in Figure 3.
A SISD stream architecture is one in which a single instruction stream (e.g. a program) executes on one data stream. This architecture is used in older computers with a single-core processor, as well as in many simple compute devices.
A SIMD stream architecture has a single control processor and instruction memory, so only one instruction can be run at any given point in time. That single instruction is copied and run across each core at the same time. This is possible because each processor has its own dedicated memory, which allows for parallelism at the data level (a.k.a. "data parallelism").
The fundamental advantage of SIMD is that data parallelism allows it to execute computations quickly
(multiple processors doing the same thing) and efficiently (only one instruction unit).
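The idea of data parallelism can be sketched in plain Python. The helper name `simd_apply` is invented for this sketch; real SIMD hardware would run the lanes in lockstep, whereas here they are simulated sequentially:

```python
# Conceptual sketch of SIMD-style data parallelism: one "instruction"
# (here, a function) applied uniformly to every element of a data
# stream. Hardware lanes would all execute this step simultaneously.
def simd_apply(instruction, data_stream):
    # One instruction, many data items; the lanes are simulated
    # one after another here, but conceptually they are parallel.
    return [instruction(x) for x in data_stream]

scale_by_2 = lambda x: x * 2
print(simd_apply(scale_by_2, [1, 2, 3, 4]))  # -> [2, 4, 6, 8]
```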
MISD stream architecture is effectively the reverse of SIMD architecture. With MISD multiple instructions
are performed on the same data stream. The use cases for MISD are very limited today. Most practical
applications are better addressed by one of the other architectures.
MIMD stream architecture offers parallelism for both data and instruction streams. With MIMD, multiple
processors execute instruction streams independently against different data streams.
Emerging Trends in Computer Architecture
Several emerging trends in computer architecture are actively being researched and developed by scientists, engineers, and tech companies around the world.
While some trends, such as quantum computing, are still in the experimental stage, others like in-memory
computing and reconfigurable architecture are already making their way into practical applications to drive
transformative changes across various industries. Quantum computing could revolutionize fields like
cryptography and drug discovery, while neuromorphic architecture could lead to breakthroughs in artificial
intelligence. In-memory computing could accelerate data-driven insights, and photonic computing might
reshape communication networks. Reconfigurable architecture could optimize computing resources for
different tasks, improving overall efficiency.
1. Quantum computing
Quantum computing utilizes principles of quantum mechanics to process information using quantum bits or
qubits. Unlike classical bits, qubits can exist in multiple states simultaneously, enabling quantum computers
to perform complex calculations exponentially faster than classical computers. Quantum computing has the
potential to revolutionize fields like cryptography, optimization, and materials science.
Moreover, the number of potential states and interactions multiplies exponentially as the complexity of the problem rises. Although it is still in its initial phase, quantum computing has the potential to change industries including cryptography, banking, and drug discovery. A quantum computer can be built in several ways, such as using topological qubits, trapped ions, or superconducting circuits.
2. Neuromorphic architecture
Neuromorphic architecture is inspired by the human brain’s neural networks. It aims to create computer
systems that can process information and learn in ways similar to biological systems. By emulating the
brain’s efficiency and adaptability, neuromorphic architecture enhances machine learning and artificial
intelligence capabilities, enabling computers to perform tasks intuitively and efficiently.
Neuromorphic computing is motivated by the structure and operation of the human brain. It processes
information in a way that is fundamentally distinct from conventional computing by using specialised
hardware and software to replicate the brain's neuronal structure. Because neuromorphic computing relies on analogue rather than digital computations, it may be more energy-efficient; and because it can learn from and adjust to new information in real time, it can also be more versatile and adaptive. Several computing fields, such as artificial intelligence, robotics, and sensory processing, stand to benefit from it.
3. In-memory computing
In-memory computing challenges the traditional separation of processing and memory units by performing
computations directly within the memory. This approach eliminates the need to transfer data between
components, leading to faster and more efficient data processing. In-memory computing is particularly
beneficial for data-intensive tasks like big data analytics and machine learning. In-memory technologies address some of the major issues in computer architecture, including power consumption, performance, and scalability.
4. Reconfigurable architecture
Reconfigurable architecture is a computer architecture combining some of the flexibility of software with
the high performance of hardware.
Reconfigurable architecture allows computer systems to dynamically adjust their hardware configurations
to optimize performance for specific tasks. This adaptability is crucial in environments with rapidly changing
workloads and applications. Reconfigurable architecture offers versatility and efficiency, making it well-suited for diverse computing needs, including edge computing and scientific simulations; field-programmable gate arrays (FPGAs) are a common example.
5. Cloud-based computing
Cloud-based computing, commonly referred to as cloud computing, uses remote servers and networks in
place of just a local computer or server to store, administer, and process data and applications. Cloud
computing enables greater flexibility and scalability in computer resources because resources and
services are offered over the internet. Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) are the three primary divisions of cloud computing.
6. Edge computing
This is a distributed computing paradigm that processes data at the network’s edge, nearer to the data
source. Edge computing enables data to be processed and analysed locally, on devices or systems closer
to the source of data generation, rather than transferring all of the data to a centralised data center or
cloud for processing. This method is frequently applied to decrease latency and speed up data processing.
CPU Pipelining
Microprocessor designers, in an attempt to squeeze every last bit of performance from their
designs, try to make sure that every circuit of the CPU is doing something productive at all times.
The most common application of this practice applies to the execution of instructions. It is
based on the fact that there are steps to the execution of an instruction, each of which uses
entirely different components of the CPU.
Assuming that the execution of a machine code instruction can be broken into three stages:
• Fetch – get the next instruction to execute from its location in memory
• Decode – determine which circuits to energize in order to execute the fetched instruction
• Execute – use the ALU and the processor-to-memory interface to execute the instruction
By comparing the definitions of the different components of the CPU shown with the needs of
these three different stages or cycles, it can be seen that three different circuits are used for these
three tasks.
• The internal data bus and the instruction pointer perform the fetch.
• The instruction decoder performs the decode cycle.
• The ALU and CPU registers are responsible for the execute cycle.
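The three stages can be sketched as a tiny interpreter loop. The machine here is made up for illustration (the opcodes LOAD, ADD, and HALT and the single accumulator are invented, not any real instruction set):

```python
# Minimal sketch of the fetch-decode-execute cycle for an invented
# three-instruction accumulator machine.
memory = [("LOAD", 5), ("ADD", 3), ("HALT", 0)]  # the program, in memory
acc = 0                                          # accumulator register
ip = 0                                           # instruction pointer

while True:
    opcode, operand = memory[ip]   # FETCH: get the instruction at ip
    ip += 1
    if opcode == "LOAD":           # DECODE: pick the circuit to energize
        acc = operand              # EXECUTE: load operand into accumulator
    elif opcode == "ADD":
        acc += operand             # EXECUTE: ALU adds operand
    elif opcode == "HALT":
        break

print(acc)  # -> 8
```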
Once the logic that controls the internal data bus is done fetching the current instruction,
what's to keep it from fetching the next instruction? It may have to guess what the next instruction
is, but if it guesses right, then a new instruction will be available to the instruction decoder
immediately after it finishes decoding the previous one.
Once the instruction decoder has finished telling the ALU what to do to execute the current
instruction, what's to keep it from decoding the next instruction while it's waiting for the ALU to
finish? If the internal data bus logic guessed right about what the next instruction is, then the
ALU won't have to wait for a fetch and subsequent decode in order to execute the next
instruction.
This process of creating a queue of fetched, decoded, and executed instructions is called pipelining, and
it is a common method for improving the performance of a processor.
A fast processor can therefore be built by increasing the rate at which instructions are executed. This can be achieved by increasing the number of instructions that can be executed simultaneously. Some CPUs break the fetch-decode-execute cycle down into smaller steps, where some of these smaller steps can be performed in parallel. This overlapping speeds up execution: the CPU fetches and executes simultaneously.
This method, used by all current CPUs, is known as pipelining: a process whereby the CPU fetches and executes at the same time, achieved by splitting the microprocessor into two units, (1) the bus interface unit (BIU) and (2) the execution unit (EU). It is a way of improving the processing power of the CPU. The BIU accesses memory and peripherals while the EU executes instructions. The idea of pipelining is to have more than one instruction being processed by the processor at the same time.
Figure 4a shows the time-line sequence of the execution of five instructions on a non-pipelined processor. Notice how a full fetch-decode-execute cycle must be performed on instruction 1 before instruction 2 can be fetched.
This sequential execution of instructions allows for a very simple CPU hardware, but it leaves each portion
of the CPU idle for 2 out of every 3 cycles. During the fetch cycle, the instruction decoder and ALU are idle;
during the decode cycle, the bus interface and the ALU are idle; and during the execute cycle, the bus
interface and the instruction decoder are idle.
Figure 4b on the other hand shows the time-line sequence for the execution of five instructions using a
pipelined processor. Once the bus interface has fetched instruction 1 and passed it to the instruction
decoder for decoding, it can begin its fetch of instruction 2.
Notice that the first cycle in the figure only has the fetch operation. The second cycle has both the fetch and
the decode cycle happening at the same time. By the third cycle, all three operations are happening in
parallel. Without pipelining, five instructions take 15 cycles to execute. In a pipelined architecture, those
same five instructions take only 7 cycles to execute, a saving of over 50%. In general, the number of
cycles it takes for a non-pipelined architecture using three cycles to execute an instruction is equal to three
times the number of instructions.
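The cycle counts above follow a simple pattern: without pipelining, each instruction takes all three cycles in turn; with pipelining, once the pipe is full, one instruction completes per cycle. A quick sketch:

```python
# Cycle counts for a 3-stage machine, with and without pipelining.
def non_pipelined_cycles(n_instructions, stages=3):
    # Each instruction occupies all stages in sequence.
    return stages * n_instructions

def pipelined_cycles(n_instructions, stages=3):
    # The first instruction fills the pipe (stages cycles); every
    # later instruction completes one cycle after the previous one.
    return stages + (n_instructions - 1)

print(non_pipelined_cycles(5))  # -> 15
print(pipelined_cycles(5))      # -> 7
```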
Resource Hazards. When one instruction is storing a value to memory while another value is being fetched from memory, both need access to memory, and this results in a conflict. Resource hazards occur when two or more instructions that are already in the pipeline need the same resource. They can also occur when multiple instructions are ready to enter the execute phase and there is only a single ALU. This can be taken care of in two ways: (1) instruction execution continues while the instruction fetch waits, or (2) more resources are provided, such as multiple ports into main memory and multiple ALUs.
Data Hazards. These happen when the result of one instruction, not yet available, is to be used as an operand for a following instruction. This is a situation in which there is a conflict in the access of an operand location, i.e. two or more instructions access a particular register or memory operand. (NB: in sequential processing this is not a problem, but in parallel processing the values will differ.) This can be resolved by altering the flow of execution in the program: specialized hardware can detect the conflict and route data through special paths that exist between the various stages of the pipeline, thereby reducing the time the instruction needs to access the required operand.
Control Hazard: This occurs when the pipeline makes the wrong decision on a branch prediction and brings the wrong instruction into the pipe. A conditional branch instruction makes the address of the next instruction to be fetched unknown: after a conditional branch, predicting which instruction will be needed next becomes a problem. This may be overcome by (i) rearranging the machine code to cause a delayed branch, or (ii) fetching both the sequential and the branch-target instructions at the same time and saving the branch decision until it is actually needed, by which time the true execution path is known.
- better performance with reduced power
The first microprocessor, the 4004, was designed by Intel Corporation, which was founded by Moore and Noyce in 1968.
In its early years, Intel focused on developing semiconductor memories (DRAMs and EPROMs) for digital computers.
In 1969, a Japanese calculator manufacturer, Busicom, approached Intel with a design for a small calculator that needed 12 custom chips. Ted Hoff, an Intel engineer, thought that a general-purpose logic device could replace the multiple components.
This idea led to the development of the first so-called microprocessor. Microprocessors thus had a modest beginning as drivers for calculators.
With developments in integration technology, Intel was able to integrate additional chips such as the 8224 clock generator and the 8228 system controller along with the 8080 microprocessor within a single chip, and released the 8-bit 8085 microprocessor in 1976.
The 8085 microprocessor consisted of 6,500 MOS transistors and could work at clock frequencies of 3-5 MHz. Other improved 8-bit microprocessors include the Motorola MC6809, the Zilog Z-80 and the RCA COSMAC.
Intel introduced the 16-bit 8086 microprocessor in 1978 and the 8088 in 1979. IBM selected the Intel 8088 for its personal computer (the IBM PC). The 8086 is made up of 29,000 MOS transistors and could work at a clock speed of 5-10 MHz. It has a 16-bit ALU with a 16-bit data bus and a 20-bit address bus, so it can address up to 1 MB of address space.
The pipelining concept was used for the first time to improve the speed of the processor: the 8086 has a 6-byte prefetch queue in which upcoming instructions are fetched during the execution of the current instruction. In this sense the 8086 architecture supports parallel processing.
The 8088 microprocessor is similar to the 8086 in architecture; the basic difference is that it has only an 8-bit data bus, even though its ALU is 16-bit.
In 1982 Intel released another 16-bit processor, the 80186, designed by a team under the leadership of Dave Stamm. It had higher reliability and faster operational speed, at a lower cost. It had a 6-byte prefetch queue and was suitable for high-volume applications such as computer workstations, word processors and personal computers.
It is made up of 134,000 MOS transistors and could work at clock rates of 4-6 MHz.
Intel released another 16-bit microprocessor, the 80286, having 134,000 transistors, in 1982; it was later used as the CPU in PC-ATs. It is a second-generation microprocessor, more advanced than the 80186. It could run at clock speeds of 6 to 12.5 MHz. It has a 16-bit data bus and a 24-bit address bus, so it can address up to 16 MB of physical address space and 1 GB of virtual memory.
Intel introduced the concepts of protected mode and virtual mode to ensure proper operation, and the chip had an on-chip memory management unit (MMU). It was popularly called the Intel 286 in those days.
In 1985, Intel released its first 32-bit processor, the 80386, with 275,000 transistors. It has a 32-bit data bus and a 32-bit address bus, so it can address a total of 4 GB of memory and a virtual memory space of 64 TB. It could process five million instructions per second and could work with all popular operating systems, including Windows. It incorporates a concept called paging in addition to the segmentation technique, and it uses a math co-processor, the 80387.
Intel then introduced the 80486 microprocessor, with a built-in math co-processor and 1.2 million transistors. It could run at a clock speed of 50 MHz. This is also a 32-bit processor, but it is twice as fast as the 80386. The additional features of the 486 are the built-in cache and the built-in math co-processor. The address bus here is bidirectional because of the presence of cache memory.
On October 19, 1992, Intel released the Pentium I processor, with 3.1 million transistors. The Pentium thus began the fifth generation of the Intel x86 architecture. It was backward compatible while offering new features. Its revolutionary technology is that the CPU is able to execute two instructions at the same time; this is known as superscalar technology. The Pentium uses a 32-bit expansion bus; however, the data bus is 64 bits.
The 7.5-million-transistor Intel Pentium II processor was released in 1997. It works at a clock speed of 300 MHz. The Pentium II uses Dynamic Execution technology, which consists of three facilities: multiple branch prediction, data flow analysis, and speculative execution. Another important feature is a thermal sensor, located on the motherboard, which monitors the die temperature of the processor.
Following the Intel Celeron processors, the Pentium III processor, with 9.5 million transistors, was introduced in 1999. It uses the dynamic execution micro-architecture, a unique combination of multiple branch prediction, data flow analysis and speculative execution.
The Pentium III has improved MMX and a processor serial number feature. The improved MMX enables advanced imaging, 3D, streaming audio and video, speech recognition and enhanced Internet facilities.
The Pentium 4, with 42 million transistors and a 1.5 GHz clock speed, was released by Intel in November 2000. The Pentium 4 processor has a system bus with 3.2 GB per second of bandwidth, a key benefit for applications that stream data from memory. This bandwidth is achieved with a 64-bit-wide bus clocked at an effective 400 MHz. The Pentium 4 enables real-time MPEG2 video encoding and near-real-time MPEG4 encoding, allowing efficient video editing and video conferencing.
Intel, with partner Hewlett-Packard, developed the next-generation 64-bit processor architecture called IA-64. Its first implementation was named Itanium. The Itanium processor, the first in a family of 64-bit products, was introduced in 2001. It was specially designed to provide a very high level of parallel processing, enabling high performance without requiring very high clock frequencies. The Itanium can handle up to 6 simultaneous 64-bit instructions per clock cycle.
The Itanium II is an IA-64 microprocessor developed jointly by Hewlett-Packard (HP) and Intel and released on July 8, 2002. It is theoretically capable of performing nearly 8 times more work per clock cycle than other CISC and RISC architectures, owing to its parallel computing micro-architecture.
The Pentium 4EE was released by Intel in 2003, and the Pentium 4E in 2004.
The Pentium Dual-Core brand was used for mainstream x86-architecture microprocessors from Intel from 2006 to 2009. The 64-bit Intel Core 2 was released on July 27, 2006. In terms of features, price and performance at a given clock frequency, Pentium Dual-Core processors were positioned above the Celeron but below the Core and Core 2 microprocessors in Intel's product range.
The Pentium Dual-Core, which consists of 167 million transistors, was released on January 21, 2007. The Intel Core Duo consists of two cores on one die, a 2 MB L2 cache shared by both cores, and an arbiter bus that controls the L2 cache.
Core 2 Quad processors are multi-chip modules consisting of two dies similar to those used in
Core 2 Duo, forming a quad-core processor.
In September 2009, new Core i7 models based on the Lynnfield desktop quad-core processor and the Clarksfield quad-core mobile processor were added. The first six-core processor in the Core lineup was the Gulftown, launched on March 16, 2010. Both the regular Core i7 and the Extreme Edition are advertised as five stars in the Intel Processor Rating.
– It is a 16-bit microprocessor.
– The 8086 has a 20-bit address bus and can access up to 2^20 memory locations (1 MB).
– It can support up to 64K I/O ports.
– It provides fourteen 16-bit registers.
– It has a multiplexed address and data bus: AD0-AD15 and A16-A19.
– It requires a single-phase clock with a 33% duty cycle to provide internal timing.
– The 8086 is designed to operate in two modes, minimum and maximum.
– It can prefetch up to 6 instruction bytes from memory and put them in an instruction queue in order to speed up instruction execution.
– It requires a +5V power supply.
– It comes in a 40-pin dual in-line package.
The 8086 employs parallel processing: it has two parts which operate at the same time, the bus interface unit (BIU) and the execution unit (EU), as seen in Figure 8 below.
– The BIU performs all bus operations, such as fetching instructions, reading and writing operands from and to memory, and calculating the addresses of memory operands.
– The instruction bytes are transferred to the instruction queue.
– It provides a full 16-bit bidirectional data bus and a 20-bit address bus.
– The bus interface unit is responsible for performing all external bus operations. Specifically, it has the following functions: instruction fetch, instruction queuing, operand fetch and storage, address calculation and relocation, and bus control.
– The BIU uses a mechanism known as an instruction queue to implement a pipeline architecture.
The BIU contains the following registers:
The BIU fetches instructions using the CS and IP registers, written CS:IP, to construct the 20-bit physical address. Data is fetched using a segment register (usually DS) and an effective address (EA) computed by the EU, depending on the addressing mode.
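The CS:IP address calculation is worth making concrete. The standard 8086 rule is: physical address = segment x 16 + offset, truncated to 20 bits. A small Python sketch (illustrative only):

```python
def physical_address(segment, offset):
    """8086 20-bit physical address: (segment * 16 + offset) mod 2^20."""
    # Shifting left by 4 bits multiplies the 16-bit segment value by 16.
    return ((segment << 4) + offset) & 0xFFFFF  # wrap to 20 bits

# Example: CS = 0x1234, IP = 0x0100 gives 0x12340 + 0x0100 = 0x12440
print(hex(physical_address(0x1234, 0x0100)))

# Addresses past 1 MB wrap around to the bottom of memory.
print(hex(physical_address(0xFFFF, 0x0010)))
```

The same rule applies to data references, with DS (or another segment register) in place of CS and the EU-computed effective address in place of IP.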
– The EU extracts instructions from the top of the queue in the BIU, decodes them, generates operands if necessary, passes them to the BIU with a request to perform the read or write bus cycles to memory or I/O, and performs the operation specified by the instruction on the operands.
– During the execution of an instruction, the EU tests the status and control flags and updates them based on the result of the instruction.
– If the queue is empty, the EU waits for the next instruction byte to be fetched and shifted to the top of the queue.
– When the EU executes a branch or jump instruction, it transfers control to a location corresponding to another set of sequential instructions.
– Whenever this happens, the BIU automatically resets the queue and then begins to fetch instructions from the new location to refill the queue.
MODULE TWO
An instruction set architecture (ISA) is the part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O. The ISA also includes a specification of the set of opcodes (machine language), the native commands for a particular processor. The ISA is the hardware-software interface.
Instruction set architecture (ISA) describes the processor in terms of what the
programmer sees, i.e. the instructions and registers. Two machines may have the
same ISA, but different organizations. Organization is concerned with the internal design
of the processor, the design of the bus system and its interfaces, the design of memory
and so on. Two machines with the same organization may have different hardware
implementations.
ISA is the interface between software and hardware. It is an abstraction which hides
hardware complexity from software through a set of operations and devices. One of the
crucial features of any processor is its instruction set, i.e. the set of machine code
instructions that the processor can carry out.
Each processor has its own unique instruction set, specifically designed to make the best use of the capabilities of that processor. The actual number of instructions provided ranges from a few dozen for a simple 8-bit microprocessor to several hundred for a 32-bit VAX (virtual address extension) processor. However, it should be pointed out that a large instruction set does not necessarily imply a more powerful processor.
An instruction set is a list of all the instructions that a processor can execute.
iv. A reference to the next instruction to be fetched and executed. The next instruction to be executed is normally the instruction immediately following the current one in memory; therefore, no explicit reference to the next instruction is provided.
Where are those operands located? In memory, in the CPU registers, or in an I/O device.
If the operands are located in registers, an instruction can be executed faster than if the operands are located in memory. The main reason is that memory access time is higher than register access time.
ii. Data Storage/Retrieval Instructions: Since data processing operations are normally performed on data stored in CPU registers, we need instructions to move data between memory and registers. These are called data storage/retrieval instructions; examples are the load and store instructions.
iii. Data Movement Instructions: These are basically input/output instructions. They are required to bring programs and data from various devices into memory, or to communicate results to the input/output devices. Examples include start, halt and test instructions.
iv. Control Instructions: These instructions are used for testing the status of a computation through the Processor Status Word (PSW); branch instructions are an example.
Addresses: Addresses are treated as a form of data used in the calculation of the actual physical memory address of an operand. In most cases, the addresses provided in an instruction are operand references, not actual physical memory addresses.
Numbers: All machines provide numeric data types. One special feature of numbers used in computers is that they are limited in magnitude; hence, underflow and overflow may occur during arithmetic operations on them. The maximum and minimum magnitudes are fixed for an integer number, while a limit on precision and exponent exists for floating-point numbers. The three numeric data types common in computers are integer (fixed point), floating point, and (packed) decimal.
Characters: Another very common data type is the character, or string of characters. The most widely used character representation is ASCII (American Standard Code for Information Interchange). It uses 7 bits to code each character, which allows 128 different characters.
Some of these characters are control characters that may be used in data communication. The eighth bit of an ASCII byte may be used as a parity bit. One special feature of ASCII, which facilitates conversion between 7-bit ASCII and 4-bit packed decimal, is that the low four bits of the ASCII codes for the digits 0-9 are the binary equivalents of those digits.
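This property of the ASCII digit codes can be checked directly, for example in Python:

```python
# ASCII '0' is 0x30 and '9' is 0x39, so the low four bits of each
# digit's code equal the digit's numeric value.
for d in "0123456789":
    assert ord(d) & 0x0F == int(d)

print(ord('7') & 0x0F)  # 7
```

Masking off the high nibble is therefore all that is needed to convert an ASCII digit to its packed-decimal (BCD) form, and OR-ing 0x30 back in converts the other way.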
Logical Data: In general, a data word or any other addressable unit such as a byte or half-word is treated as a single unit of data. But can we consider an n-bit data unit as consisting of n items of 1 bit each? If we treat each bit of an n-bit datum as an item, it can be considered logical data, and each of the n items can have the value 0 or 1. The advantages of such a bit-oriented view of data are:
Instruction Format
Any instruction issued to the processor must carry at least two types of information.
These are the operation to be performed, encoded in what is called the op-code field,
and the address information of the operand on which the operation is to be performed,
encoded in what is called the address field.
Depending on the number of addresses an instruction carries, instruction formats are classified as:
i. three-address,
ii. two-address,
iii. one-and-half-address,
iv. one-address, and
v. zero-address.
Consider, for example, the three-address instruction ADD R1, R2, R3. This instruction indicates that the operation to be performed is addition, that the values to be added are those stored in registers R1 and R2, and that the result should be stored in register R3.
An example of a three-address instruction that refers to memory locations may take the
form ADD A,B,C. The instruction adds the contents of memory location A to the contents
of memory location B and stores the result in memory location C.
ADD R1, R2 is a two-address instruction. It adds the contents of register R1 to the contents of R2 and stores the result in register R2. The original contents of R2 are lost in this operation, while the original contents of R1 remain intact.
A similar instruction that uses memory locations instead of registers can take the form ADD A, B. In this case, the contents of memory location A are added to the contents of memory location B, and the result overwrites the original contents of memory location B.
Consider the one-address instruction ADD R1. In this case the instruction implicitly refers to a register called the accumulator, Racc: the contents of the accumulator are added to the contents of register R1 and the result is stored back into the accumulator Racc.
If a memory location is used instead of a register, then an instruction of the form ADD B is used. In this case, the instruction adds the content of the accumulator Racc to the content of memory location B and stores the result back into the accumulator Racc.
Between the two-address and the one-address instruction there is the one-and-half-address instruction.
Consider, for example, the instruction ADD B, R1. In this case, the instruction adds the contents of register R1 to the contents of memory location B and stores the result in register R1.
Because the instruction uses two types of addressing, a register and a memory location, it is called a one-and-half-address instruction: register addressing needs a smaller number of bits than memory addressing, so the register operand counts as only "half" an address.
Zero-address instructions.
These are the instructions that use stack operations. A stack is a data organization mechanism in which the last data item stored is the first data item retrieved. Two specific operations can be performed on a stack, push and pop, and a special register called the stack pointer (SP) is used to indicate the stack location that can be addressed. The classes of instructions are summarized in the table below.
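The address-format classes above can be made concrete by writing the same additions in each style. The Python below is a hypothetical mini-machine (the names mem, acc and stack are illustrative, not 8086 syntax); each commented instruction shows the semantics of one class:

```python
# Hypothetical machine state: three memory locations and an accumulator.
mem = {"A": 3, "B": 4, "C": 0}

# Three-address: ADD A, B, C   (C <- A + B; both sources preserved)
mem["C"] = mem["A"] + mem["B"]

# Two-address:   ADD A, B      (B <- A + B; B's old value is lost)
mem["B"] = mem["A"] + mem["B"]

# One-address:   ADD B         (Racc <- Racc + B; accumulator implicit)
acc = 10
acc = acc + mem["B"]

# Zero-address:  operands live on a stack; ADD names no addresses at all
stack = []
stack.append(mem["A"])                  # PUSH A
stack.append(mem["B"])                  # PUSH B
stack.append(stack.pop() + stack.pop()) # ADD (pop two, push sum)

print(mem["C"], acc, stack[-1])
```

Fewer address fields make each instruction shorter, but more instructions are usually needed to express the same computation.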
MEMORY OPERATIONS
The main memory can be modeled as an array of millions of adjacent cells, each capable of storing a binary digit (bit) having the value 1 or 0. These cells are organized into groups of a fixed number, say n, of cells that can be dealt with as an entity.
An entity consisting of 8 bits is called a byte. Each such group of cells is identified by a unique address. This address is used to determine the location in memory in which a given word is to be stored; this is called a memory WRITE operation. Similarly, the address is used to determine the memory location from which a word is to be retrieved; this is called a memory READ operation.
During a memory write operation a word is stored into a memory location whose address is specified. During a memory read operation a word is read from a memory location whose address is specified. Typically, memory read and memory write operations are performed by the central processing unit (CPU).
Three basic steps are needed in order to perform a memory READ operation:
1. The address of the location from which the word is to be read is loaded into the MAR.
2. A READ signal is issued by the CPU, indicating that the word whose address is in the MAR is to be read into the MDR.
3. The required word is loaded by the memory into the MDR, ready for use by the CPU.
Similarly, three basic steps are needed for the CPU to perform a WRITE operation into a specified memory location:
1. The word to be stored into the memory location is first loaded by the CPU into a specified register, called the memory data register (MDR).
2. The address of the location into which the word is to be stored is loaded by the CPU into a specified register, called the memory address register (MAR).
3. A WRITE signal is issued by the CPU, indicating that the word stored in the MDR is to be stored in the memory location whose address is loaded in the MAR.
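The read and write sequences can be modeled with two toy variables standing in for the hardware registers (purely illustrative; real MAR/MDR are registers inside the CPU and memory interface):

```python
# Toy model of memory READ/WRITE through MAR and MDR.
memory = [0] * 16   # a tiny 16-word memory
MAR = 0             # memory address register
MDR = 0             # memory data register

# WRITE: load the word into MDR, the address into MAR, then store.
MDR = 42
MAR = 5
memory[MAR] = MDR   # the WRITE signal commits MDR to memory[MAR]

# READ: load the address into MAR; memory places the word in MDR.
MAR = 5
MDR = memory[MAR]   # the READ signal fills MDR from memory[MAR]

print(MDR)  # 42
```

Note how the CPU never touches memory directly: every transfer in either direction goes through the MAR/MDR pair.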
FETCH-EXECUTE CYCLE
Fetch and execute are the fundamental operations of the processor. The fetch-decode-execute cycle represents the steps that a computer follows to run a program. The program to be executed is a set of instructions stored in memory; hence, the CPU executes the instructions that it finds in the computer's memory. In order to execute an instruction:
- the CPU must first fetch (transfer) the instruction from memory into one of its registers;
- the CPU then decodes the instruction, i.e. it determines which instruction has been fetched; and
- finally, it executes (carries out) the instruction.
The CPU then repeats this procedure: it fetches an instruction, decodes it, and executes it. This process is repeated continuously and is known as the fetch-execute cycle.
The cycle begins when the processor is switched on and continues until the CPU is halted (via a halt instruction, e.g. the 8086 HLT instruction, or when the machine is switched off). The fetch-execute cycle operates by first fetching an instruction.
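The cycle can be sketched as a loop over a made-up instruction list (the LOAD/ADD/HALT opcodes here are invented for illustration and are not 8086 opcodes):

```python
# A minimal fetch-decode-execute loop for a toy accumulator machine.
program = [("LOAD", 7), ("ADD", 3), ("HALT", None)]
pc, acc, running = 0, 0, True

while running:
    opcode, operand = program[pc]   # fetch: IR <- memory[PC]
    pc += 1                         # PC now points at the next instruction
    if opcode == "LOAD":            # decode + execute
        acc = operand
    elif opcode == "ADD":
        acc += operand
    elif opcode == "HALT":          # analogous to the 8086 HLT instruction
        running = False

print(acc)  # 10
```

Incrementing the PC immediately after the fetch is what makes "the next instruction" the default; a branch would simply overwrite the PC instead.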
Instruction Fetch
An instruction fetch involves reading an instruction from one or more memory locations into the CPU. The execution of the instruction may involve several operations, depending on the nature of the instruction. The processing needed for a single instruction (fetch and execution) is referred to as an instruction cycle.
- The Program Counter (PC) keeps track of the instruction that is to be executed next after the execution of the on-going instruction, i.e. the PC always contains the address of the next instruction to be executed. The program counter drives the fetch cycle in a typical CPU.
- Instructions are loaded into the Instruction Register (IR) before their execution, i.e. the IR holds the instruction to be executed.
Instruction Execution
The instruction execution takes place in the CPU registers. The following are CPU registers:
• Memory Address Register (MAR): It specifies the address of the memory location from which data or an instruction is to be accessed (for a read operation), or to which data is to be stored (for a write operation).
• Memory Buffer Register (MBR): It is a register which contains the data to be written into memory (for a write operation), or which receives the data from memory (for a read operation).
• Program Counter (PC): It keeps track of the instruction that is to be executed next,
after the execution of an on-going instruction.
• Instruction Register (IR): the instructions are loaded here before their execution.
Table 4 (a) to (d) shows the evolution of microprocessors and how they have grown faster and much more complex.
It is worthwhile to list some of the highlights of the evolution of the Intel product line:
- 8080: The world's first general-purpose microprocessor. This was an 8-bit machine, with an 8-bit data path to memory. The 8080 was used in the first personal computer, the Altair.
- 8086: A far more powerful, 16-bit machine. In addition to a wider data path and larger registers, the 8086 sported an instruction cache, or queue, that prefetches a few instructions before they are executed. A variant of this processor, the 8088, was used in IBM's first personal computer, securing the success of Intel.
- 80286: This extension of the 8086 enabled addressing a 16-MB memory instead of just 1 MB.
- 80386: Intel's first 32-bit machine, and a major overhaul of the product. With a 32-bit architecture, the 80386 rivaled the complexity and power of minicomputers and mainframes introduced just a few years earlier. This was the first Intel processor to support multitasking, meaning it could run multiple programs at the same time.
- 80486: The 80486 introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining. The 80486 also offered a built-in math coprocessor, offloading complex math operations from the main CPU.
- Pentium: With the Pentium, Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel.
- Pentium Pro: The Pentium Pro continued the move into superscalar organization begun with the Pentium, with aggressive use of register renaming, branch prediction, data flow analysis, and speculative execution.
- Pentium II: The Pentium II incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics data efficiently.
- Pentium III: The Pentium III incorporates additional floating-point instructions: the Streaming SIMD Extensions (SSE) instruction set extension added 70 new instructions designed to increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing.
- Pentium 4: The Pentium 4 includes additional floating-point and other enhancements for multimedia.
- Core: This is the first Intel x86 microprocessor with a dual core, referring to the implementation of two cores on a single chip.
- Core 2: The Core 2 extends the Core architecture to 64 bits. The Core 2 Quad provides
four cores on a single chip. More recent Core offerings have up to 10 cores per chip. An
important addition to the architecture was the Advanced Vector Extensions instruction set
that provided a set of 256-bit, and then 512 bit, instructions for efficient processing of vector
data.
Although the organization and technology of the x86 machines have changed dramatically
over the decades, the instruction set architecture has evolved to remain backward
compatible with earlier versions. Thus, any program written on an older version of the x86
architecture can execute on newer versions.
Registers are extremely fast memory locations within the CPU that are used to hold operands and store the results of CPU operations and other calculations. Computers differ in register sets, number of registers, register types, and the length of each register. They also differ in the usage of each register.
General-purpose registers can be used for multiple purposes and assigned to a variety of
functions by the programmer.
Special-purpose registers are restricted to only specific functions. In some cases, some
registers are used only to hold data and cannot be used in the calculations of operand
addresses.
The internal registers of the 8086 are shown in Figure 7 below.
• The 8086 has the following groups of user-accessible internal registers:
- Instruction Pointer (IP)
- Four general-purpose registers (AX, BX, CX, DX)
- Four pointer and index registers (SP, BP, SI, DI)
- Four segment registers (CS, DS, SS, ES)
- Flag Register (FR)
• The 8086 has a total of fourteen 16-bit registers, including a 16-bit status register (flag register) with 9 of its bits implemented as status and control flags.
Segment Registers
1) The Code Segment (CS) register is a 16-bit register containing the address of the 64 KB segment holding the processor instructions. The processor uses the CS segment for all accesses to instructions referenced by the instruction pointer (IP) register.
2) The Stack Segment (SS) register is a 16-bit register containing the address of the 64 KB segment holding the program stack. By default, the processor assumes that all data referenced by the stack pointer (SP) and base pointer (BP) registers is located in the stack segment. The SS register can be changed directly using the POP instruction.
3) The Data and Extra Segment registers (DS and ES) are 16-bit registers containing the addresses of 64 KB segments holding program data. By default, the processor assumes that all data referenced by the general registers (AX, BX, CX and DX) and the index registers (SI, DI) is located in the data or extra segment.
Data Registers
1) AX (Accumulator)
• It consists of two 8-bit registers, AL and AH, which can be combined and used as the 16-bit register AX. AL contains the low-order byte of the word, and AH contains the high-order byte.
• The accumulator is used for I/O operations and string manipulation.
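The pairing of the 8-bit halves into a 16-bit register is just bit concatenation, which a few lines of Python make concrete (the same pattern applies to BX, CX and DX below):

```python
AH, AL = 0x12, 0x34

# Combine the two 8-bit halves into the 16-bit AX:
# the high byte is shifted left 8 bits and OR-ed with the low byte.
AX = (AH << 8) | AL
print(hex(AX))  # 0x1234

# Splitting AX back into its halves recovers AH and AL.
assert AX >> 8 == AH and AX & 0xFF == AL
```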
2) BX (Base register)
• It consists of two 8-bit registers, BL and BH, which can be combined and used as the 16-bit register BX. BL contains the low-order byte of the word, and BH contains the high-order byte.
• The BX register usually contains an offset into the data segment.
3) CX (Count register)
• It consists of two 8-bit registers, CL and CH, which can be combined and used as the 16-bit register CX. When combined, CL contains the low-order byte of the word, and CH contains the high-order byte.
• The count register is used in loop and shift/rotate instructions, and as a counter in string manipulation.
4) DX (Data register)
• It consists of two 8-bit registers, DL and DH, which can be combined and used as the 16-bit register DX. When combined, DL contains the low-order byte of the word, and DH contains the high-order byte.
• DX can hold a port number in I/O operations.
• In 32-bit integer multiply and divide instructions, the DX register contains the high-order word of the initial or resulting number.
Pointer Registers
1. The Stack Pointer (SP) is a 16-bit register used to hold an offset address within the stack segment.
2. The Base Pointer (BP) is a 16-bit register also used to hold an offset address within the stack segment.
i. The BP register is usually used for based, based indexed or register indirect addressing.
ii. The difference between SP and BP is that SP is used internally to store the return address in the case of interrupts and the CALL instruction.
3. Source Index (SI) and Destination Index (DI). These two 16-bit registers are used to hold offset addresses for DS and ES in string manipulation instructions.
i. SI is used for indexed, based indexed and register indirect addressing, as well as the source data address in string manipulation instructions.
ii. DI is used for indexed, based indexed and register indirect addressing, as well as the destination data address in string manipulation instructions.
Flag Register
The flag register is a 16-bit register containing nine 1-bit flags, i.e. the flag register is addressable by bit, as shown in the figure below. Each of these bits depicts a status or control flag of the microprocessor.
i. Overflow Flag (OF): This flag is set if an overflow occurs, i.e. if the result of a signed operation is too large to be accommodated in the destination register.
ii. Direction Flag (DF): This is used by string manipulation instructions. If this flag bit is 0, the string is processed from the lowest address towards the highest address (auto-incrementing mode); otherwise, the string is processed from the highest address towards the lowest address (auto-decrementing mode).
iii. Interrupt-enable Flag (IF): If this flag is set, maskable interrupts are recognized by the CPU;
otherwise they are ignored.
iv. Single-step (Trap) Flag (TF): If this flag is set, the processor enters single-step execution
mode: a trap interrupt is generated after the execution of each instruction, i.e. the processor
executes the current instruction and control is then transferred to the trap interrupt service routine.
v. Sign Flag (SF): This flag is set when the result of a computation is negative. For signed
computations, the sign flag equals the MSB of the result.
vi. Zero Flag (ZF): Set if the result of an operation is zero; otherwise it is cleared.
vii. Auxiliary carry Flag (AF): Set if there was a carry out of, or a borrow into, bit 3 (i.e. between
the low nibble, bits 0-3, and bit 4) during an operation on the AL register.
viii. Parity Flag (PF): Set if the parity (the number of 1 bits) of the low-order byte of the result is
even.
ix. Carry Flag (CF): This flag is set when there is a carry out of the MSB in the case of addition,
or a borrow in the case of subtraction. For example, when two numbers are added, a carry may
be generated out of the most significant bit position; the carry flag is then set to 1. If no carry is
generated, it is 0.
- Segment Pointers. To support segmentation, the address issued by the processor
consists of a segment number (base) and a displacement (or an offset) within the
segment. A segment register holds the address of the base of the segment.
- Stack Pointer. A stack is a data organization mechanism in which the last data item stored
is the first data item retrieved. Two specific operations can be performed on a stack.
These are the Push and the Pop operations. The stack pointer (SP) is used to indicate
the stack location that can be addressed. In the stack push operation, the SP value is
used to indicate the location (called the top of the stack).
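The push and pop behaviour described above can be sketched in Python. This is a minimal illustration, not an 8086-accurate model: the initial SP value, the 2-byte word size and the dictionary used as memory are all assumptions made for the example.

```python
# Minimal sketch of a stack growing downward in memory, with SP
# pointing at the current top of stack (illustrative values only).

MEMORY = {}          # sparse "RAM": address -> 16-bit word
SP = 0x0100          # hypothetical initial stack pointer (empty stack)

def push(value):
    """PUSH: decrement SP, then store the word at the new top."""
    global SP
    SP -= 2                      # stack grows toward lower addresses
    MEMORY[SP] = value & 0xFFFF

def pop():
    """POP: read the word at the top, then increment SP."""
    global SP
    value = MEMORY[SP]
    SP += 2
    return value

push(0x1234)
push(0xABCD)
print(hex(pop()))    # last item stored is retrieved first
print(hex(pop()))
```

The two `print` calls show the last-in, first-out order: 0xABCD comes back before 0x1234.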
ADDRESSING MODES
This refers to the different ways in which operands can be addressed. Addressing modes differ in
the way the address information of an operand is specified. The basic addressing modes are:
i. IMMEDIATE ADDRESSING
- The operand is given explicitly as part of the instruction, so no memory access is required to
fetch it; the operand immediately follows the instruction. In this addressing mode, the
value of the operand is (immediately) available in the instruction itself. For example, loading the
decimal value 9000 into a register Ri can be performed using an instruction LOAD 9000, Ri.
In this instruction, the operation to be performed is to load a value into a register. The
source operand is (immediately) given as 9000, and the destination is the register Ri.
ii. INDEXED ADDRESSING
- The effective address (EA) of the operand is generated by adding an index register value (X) to
the direct address (DA), i.e. EA = X + DA.
In this addressing mode, the address of the operand is obtained by adding a constant to the
content of a register, called the index register. For example, the instruction LOAD 5+[DI], AX
loads register AX with the contents of the memory location whose address is the
sum of the contents of register DI and the value 5. Indexed addressing is indicated in the
instruction by writing the name of the index register in brackets, together with the constant to
be added.
iii. BASED ADDRESSING
The effective address of the operand is generated by adding a constant (displacement) to the
content of a base register indicated in the instruction. For example, the instruction
MOV DX, [BP] +10
moves into register DX the contents of the memory location whose effective address is obtained
by summing the constant 10 with the value in register BP.
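The based-addressing example can be sketched in Python as follows. The register value, the memory contents and the helper name `mov_dx_bp_disp` are all made-up assumptions for illustration.

```python
# Sketch of based addressing: EA = contents of base register + constant.
# Register and memory values are illustrative assumptions.

MEMORY = {0x020A: 0x7F3C}           # word stored at address BP + 10
REGS = {"BP": 0x0200, "DX": 0x0000}

def mov_dx_bp_disp(disp):
    """Emulate MOV DX, [BP]+disp: EA = BP + disp, load the word at EA into DX."""
    ea = REGS["BP"] + disp
    REGS["DX"] = MEMORY[ea]
    return ea

ea = mov_dx_bp_disp(10)
print(hex(ea), hex(REGS["DX"]))     # EA = 0x200 + 10 = 0x20A
```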
The Intel IA-32 architecture covers the 80x86/Pentium processors, where x ≥ 3. Its features
include:
- Data bus increased from 16 bits to 32 bits
- IA-32 processors are 32-bit integrated processors that can operate on integer and floating-
point data
- Backward compatibility with the 16-bit 8086 in real mode
- IA-32 operates in real mode by default, hence it has to be switched to protected mode
- Pentium II processors, as a family of IA-32, support MMX, i.e. multimedia data structures,
which are SIMD (single instruction, multiple data) in nature
(iv) Relative base addressing (v) Relative Index addressing (vi) Based Index addressing
(vii) Relative based index addressing (viii) Scaled index addressing
(iv) Register Indirect Addressing
▪ The memory address of the operand is held in a register: either a base register
(BX, BP), an index register (SI, DI), or any of the general purpose 32-bit registers (EAX, EBX,
ECX, EDX, EBP, ESI, EDI)
▪ Operand size: byte, word, double word
e.g MOV AL, [ECX]
(v) Base + Index Addressing
- This is a register indirect addressing mode in which the sum of a (base + index) register pair is
used as the operand's memory address pointer.
- Any pair of the general purpose 32-bit registers (EAX, EBX, ECX, EBP, EDI, ESI) can be used.
- Whichever registers are chosen, the first register is treated as the base and the second as the
index.
- e.g MOV [EAX + ECX], BL
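The base + index example can be sketched in Python. The register values, the dictionary memory model and the helper name `mov_mem_bl` are assumptions made for this illustration.

```python
# Sketch of base + index addressing: EA = base register + index register.
# All register values here are illustrative assumptions.

MEMORY = {}
REGS = {"EAX": 0x1000, "ECX": 0x0020, "BL": 0x5A}

def mov_mem_bl():
    """Emulate MOV [EAX + ECX], BL: store BL at EA = EAX + ECX."""
    ea = REGS["EAX"] + REGS["ECX"]   # first register is the base, second the index
    MEMORY[ea] = REGS["BL"]
    return ea

print(hex(mov_mem_bl()))             # EA = 0x1000 + 0x20 = 0x1020
```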
Example: Find the effective address in each of the following cases. Assume that ESI =
200h, ECX =100h, EBX = 50h and EDI = 100h.
1. MOV AX, [2000 + ESI *4] 3. MOV ECX, [2400 + EBX *4]
2. MOV AX, [5000 + ECX *2] 4. MOV DX, [100 + EDI*8]
Solution:
1. EA = 2000h + 200h x 4 = 2000h + 800h = 2800h. Therefore the address of the operand moved
into AX is DS:2800h.
2. EA = 5000h + 100h x 2 = 5000h + 200h = 5200h.
3. EA = 2400h + 50h x 4 = 2400h + 140h = 2540h.
4. EA = 100h + 100h x 8 = 100h + 800h = 900h.
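The four scaled-index effective addresses can be recomputed in Python, treating all displacements as hexadecimal as in case 1. Note in particular that 50h x 4 = 140h, so case 3 evaluates to 2540h.

```python
# Recompute the scaled-index effective addresses:
# EA = displacement + index register x scale (all values hexadecimal).

ESI, ECX, EBX, EDI = 0x200, 0x100, 0x50, 0x100

def ea(disp, index, scale):
    return disp + index * scale

print(hex(ea(0x2000, ESI, 4)))   # 0x2800
print(hex(ea(0x5000, ECX, 2)))   # 0x5200
print(hex(ea(0x2400, EBX, 4)))   # 0x2540  (50h x 4 = 140h)
print(hex(ea(0x100,  EDI, 8)))   # 0x900
```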
The flag bits affected by the ADD instruction are the carry flag (CF), parity flag (PF), auxiliary
carry flag (AF), zero flag (ZF), sign flag (SF) and overflow flag (OF).
CF - This flag is set whenever there is a carry out of bit d7 after an 8-bit operation, or out of bit
d15 after a 16-bit operation.
PF - After certain operations, the parity of the result's low-order byte is checked. If the byte has
an even number of 1s, the parity flag is set to 1; otherwise it is cleared to 0. In a 16-bit
operation, parity is checked for the lower 8 bits only.
AF - If there is a carry from d3 to d4 during an operation, this bit is set; otherwise it is cleared.
ZF - Set to 1 if the result of an arithmetic or logical operation is zero; otherwise it is cleared.
SF - The binary representation of signed numbers uses the most significant bit as the sign bit.
After arithmetic or logic operations, the status of this sign bit is copied into SF, thereby
indicating the sign of the result.
OF - Set whenever the result of a signed-number operation is too large, causing the high-order
bit to overflow into the sign bit.
Example: Show how the flag register is affected by the addition of 38h and 2Fh in the following
lines of code: MOV BH, 38h ; ADD BH, 2Fh
38h 0011 1000
2Fh 0010 1111
67h 0110 0111
The result 67h is non-zero and positive, so ZF = 0 and SF = 0. There is no carry out of bit 7
(CF = 0) and no signed overflow (OF = 0). There is a carry from d3 to d4 (8h + Fh = 17h), so
AF = 1, and 67h contains five 1s (odd parity), so PF = 0.
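The flag rules above can be sketched in Python for an 8-bit ADD. The helper name `add8_flags` is made up for this illustration; it is applied to the example ADD BH, 2Fh with BH = 38h.

```python
# Derive the six status flags affected by an 8-bit ADD (a sketch,
# following the CF/PF/AF/ZF/SF/OF rules described in the notes).

def add8_flags(a, b):
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                     # carry out of bit d7
        "PF": int(bin(result).count("1") % 2 == 0),  # even number of 1s in low byte
        "AF": int(((a & 0xF) + (b & 0xF)) > 0xF),    # carry from d3 to d4
        "ZF": int(result == 0),
        "SF": int(result >> 7),                      # copy of the sign bit
        # OF: operands share a sign but the result's sign differs
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

result, flags = add8_flags(0x38, 0x2F)
print(hex(result), flags)   # 0x67 with CF=0, PF=0, AF=1, ZF=0, SF=0, OF=0
```

The same helper can be used to check the exercise on the 8-bit addition of two binary values.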
Exercise: How would the status flags be set after the processor performed the 8-bit addition of
10110101 and 10010110 (binary)?
Logical Instructions
- AND destination, source
E.g. MOV BL, 35h
AND BL, 0Fh
35h 0011 0101
0Fh 0000 1111
05h 0000 0101
- OR destination, source
e.g. MOV AX, 0504h
OR AX, DA68h
0504h 0000 0101 0000 0100
DA68h 1101 1010 0110 1000
DF6Ch 1101 1111 0110 1100
- XOR destination, source. Each result bit is 1 when the corresponding input bits differ, and 0
when they are the same.
Logical Shift - Right and Left. E.g. show the result of SHR in the following instructions.
MOV AL, 9Ah
MOV CL, 3 ;move 3 into CL
SHR AL, CL ;shift AL right 3 times
9Ah = 10011010
01001101 1st shift
00100110 2nd shift
00010011 3rd shift AL = 13h
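The three shifts can be traced in Python. Each logical right shift inserts a 0 at the MSB and drops the old LSB; modelling the dropped bit as the carry flag here is an assumption for illustration.

```python
# Trace SHR AL, CL with AL = 9Ah and CL = 3, one shift at a time.

al = 0x9A
cf = 0
for step in range(3):              # CL = 3 shifts
    cf = al & 1                    # bit shifted out (modelled as CF)
    al >>= 1                       # logical right shift: 0 enters at the MSB
    print(f"shift {step + 1}: AL = {al:08b} ({al:02X}h), CF = {cf}")
```

After the third shift AL holds 0001 0011, i.e. 13h, matching the worked example.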
Exercise: Use operand 4FCAh and C237h to perform (a) AND (b) OR (c) XOR