
Digital systems

Digital systems are used extensively in computation and data processing, control
systems, communications, and measurement. Because digital systems are capable of
greater accuracy and reliability than analog systems, many tasks formerly done by
analog systems are now being performed digitally.

In a digital system, the physical quantities or signals can assume only discrete
values, while in analog systems the physical quantities or signals may vary continuously
over a specified range. For example, the output voltage of a digital system might be
constrained to take on only two values such as 0 volts and 5 volts,

while the output voltage from an analog system might be allowed to assume any
value in the range -10 volts to +10 volts.

Because digital systems work with discrete quantities, in many cases they can be
designed so that for a given input, the output is exactly correct. For example, if we
multiply two 5-digit numbers using a digital multiplier, the 10-digit product will be
correct in all 10 digits. On the other hand, the output of an analog multiplier might
have an error ranging from a fraction of one percent to a few percent depending
on the accuracy of the components used in construction of the multiplier.
Switching circuit

Many subsystems of a digital system take the form of a switching circuit. A switching circuit has one
or more inputs and one or more outputs which take on discrete values.
Decimal Notation
953.78₁₀ = 9×10² + 5×10¹ + 3×10⁰ + 7×10⁻¹ + 8×10⁻²

Binary
1011.11₂ = 1×2³ + 0×2² + 1×2¹ + 1×2⁰ + 1×2⁻¹ + 1×2⁻²
         = 8 + 0 + 2 + 1 + 1/2 + 1/4
         = 11.75₁₀
EXAMPLE: Convert 53₁₀ to binary.

Conversion (a)
EXAMPLE: Convert .625₁₀ to binary.

Conversion (b)
EXAMPLE: Convert 0.7₁₀ to binary.

Conversion (c)
EXAMPLE: Convert 231.3₄ to base 7.

Conversion (d)
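The worked conversions above were figures in the original. As a rough illustration (function
names and structure are ours, not from the text), here is a minimal Python sketch of the two
standard procedures: repeated division for the integer part and repeated multiplication for
the fractional part.

    def int_to_base(n, base):
        # Convert a non-negative integer by repeated division;
        # the remainders, read in reverse order, are the digits.
        digits = "0123456789ABCDEF"
        if n == 0:
            return "0"
        out = []
        while n > 0:
            n, r = divmod(n, base)
            out.append(digits[r])
        return "".join(reversed(out))

    def frac_to_base(f, base, places=6):
        # Convert a fraction 0 <= f < 1 by repeated multiplication;
        # the integer parts produced at each step are the digits.
        # The expansion may not terminate (e.g., 0.7 in binary).
        digits = "0123456789ABCDEF"
        out = []
        for _ in range(places):
            f *= base
            d = int(f)
            out.append(digits[d])
            f -= d
            if f == 0:
                break
        return "." + "".join(out)

    print(int_to_base(53, 2))        # 110101
    print(frac_to_base(0.625, 2))    # .101
    print(frac_to_base(0.7, 2, 8))   # .10110011  (non-terminating)
    # 231.3 (base 4): int("231", 4) == 45 -> "63"; .3 (base 4) = 0.75
    print(int_to_base(int("231", 4), 7), frac_to_base(0.75, 7, 4))  # 63 .5151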
Binary ↔ Hexadecimal Conversion

Equation (1-1)
Conversion from binary to hexadecimal (and conversely) can
be done by inspection because each hexadecimal digit
corresponds to exactly four binary digits (bits). For example,
1001101.010111₂ = 0100 1101 . 0101 1100₂ = 4D.5C₁₆.
Add 13₁₀ and 11₁₀ in binary:

 1111       carries
  1101      13₁₀
+ 1011      11₁₀
 11000      24₁₀

Addition
Subtraction (a)
The subtraction table for binary numbers is

0 – 0 = 0
0 – 1 = 1, and borrow 1 from the next column
1 – 0 = 1
1 – 1 = 0

Borrowing 1 from a column is equivalent to subtracting 1 from that column.
Subtraction
EXAMPLES OF BINARY SUBTRACTION:
(b)
A detailed analysis of the borrowing process for this example, indicating first
a borrow of 1 from column 1 and then a borrow of 1 from column 2, is as
follows:

Subtraction (c)
Multiplication (a)
The multiplication table for binary numbers is

0x0=0
0x1=0
1x0=0
1x1=1
The following example illustrates multiplication of 13₁₀ by 11₁₀ in binary:

Multiplication (b)
When doing binary multiplication, a common way to avoid carries greater than 1
is to add in the partial products one at a time as illustrated by the following
example:

        1111      multiplicand
        1101      multiplier
        1111      1st partial product
       0000       2nd partial product
      (01111)     sum of first two partial products
      1111        3rd partial product
    (1001011)     sum after adding 3rd partial product
     1111         4th partial product
    11000011      final product (sum after adding 4th partial product)
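As an aside (not from the text), the same add-one-partial-product-at-a-time process can be
sketched in a few lines of Python; binary_multiply is an illustrative name.

    def binary_multiply(multiplicand, multiplier):
        # Shift-and-add: scan multiplier bits from the least significant
        # end and add the shifted multiplicand whenever the bit is 1,
        # accumulating one partial product at a time.
        product = 0
        shift = 0
        while multiplier:
            if multiplier & 1:
                product += multiplicand << shift
            multiplier >>= 1
            shift += 1
        return product

    p = binary_multiply(0b1111, 0b1101)
    print(bin(p), p)    # 0b11000011 195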

Multiplication (c)
Binary Division
Binary division is similar to decimal division, except it
is much easier because the only two possible quotient
digits are 0 and 1.
We start division by comparing the divisor with the
upper bits of the dividend.
If we cannot subtract without getting a negative result,
we move one place to the right and try again.
If we can subtract, we place a 1 for the quotient above
the number we subtracted from and append the next
dividend bit to the end of the difference and repeat this
process with this modified difference until we run out
of bits in the dividend.
The following example illustrates division of 145₁₀ by 11₁₀ in binary:
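The worked division was a figure in the original; as an illustrative sketch (the function name
and the 8-bit width are our choices), the compare-subtract-shift procedure described above
looks like this in Python:

    def binary_divide(dividend, divisor, width=8):
        # Bring down one dividend bit at a time; write a 1 in the
        # quotient when the divisor can be subtracted from the current
        # remainder, otherwise write a 0.
        quotient = 0
        remainder = 0
        for i in reversed(range(width)):
            remainder = (remainder << 1) | ((dividend >> i) & 1)
            quotient <<= 1
            if remainder >= divisor:
                remainder -= divisor
                quotient |= 1
        return quotient, remainder

    q, r = binary_divide(145, 11)    # 10010001 / 1011
    print(bin(q), bin(r))            # 0b1101 0b10  (13 remainder 2)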

Binary Division
3 Systems for representing negative numbers in binary

Sign & Magnitude: most significant bit is the sign
Ex: –5₁₀ = 1101₂

1’s Complement: N̄ = (2ⁿ – 1) – N
Ex: –5₁₀ = (2⁴ – 1) – 5 = 16 – 1 – 5 = 10₁₀ = 1010₂

2’s Complement: N* = 2ⁿ – N
Ex: –5₁₀ = 2⁴ – 5 = 16 – 5 = 11₁₀ = 1011₂
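A small Python sketch computing all three representations of –5 for word length n = 4,
following the formulas above (function names are illustrative):

    def sign_magnitude(n, bits):
        # Sign bit 1 followed by the magnitude.
        return (1 << (bits - 1)) | (-n)

    def ones_complement(n, bits):
        # (2^bits - 1) - |n|; n is negative, so adding n subtracts |n|.
        return ((1 << bits) - 1) + n

    def twos_complement(n, bits):
        # 2^bits - |n|.
        return (1 << bits) + n

    for f in (sign_magnitude, ones_complement, twos_complement):
        print(f.__name__, format(f(-5, 4), "04b"))
    # sign_magnitude 1101, ones_complement 1010, twos_complement 1011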

Section 1.4 (p. 16)


Table 1-1: Signed Binary Integers (word length n = 4)
[Worked examples (figures): 2’s complement addition (a)-(d); 1’s complement addition (b)-(d).]
Binary Codes

Although most large computers work internally with
binary numbers, the input-output equipment generally
uses decimal numbers. Because most logic circuits
only accept two-valued signals, the decimal numbers
must be coded in terms of binary signals. In the
simplest form of binary code, each decimal digit is
replaced by its binary equivalent. For example, 937.25
is represented by:

   9    3    7  .  2    5
 1001 0011 0111 . 0010 0101

Section 1.5 (p. 21)


Table 1-2. Binary Codes for Decimal Digits

Decimal   8-4-2-1      6-3-1-1   Excess-3   2-out-of-5   Gray
Digit     Code (BCD)   Code      Code       Code         Code
0         0000         0000      0011       00011        0000
1         0001         0001      0100       00101        0001
2         0010         0011      0101       00110        0011
3         0011         0100      0110       01001        0010
4         0100         0101      0111       01010        0110
5         0101         0111      1000       01100        1110
6         0110         1000      1001       10001        1010
7         0111         1001      1010       10010        1011
8         1000         1011      1011       10100        1001
9         1001         1100      1100       11000        1000
Table 1-3: ASCII code (incomplete)
Chapter 2: Boolean algebra
The electronic circuit which forms the
inverse of X is referred to as an inverter

X' = 1 if X = 0
X' = 0 if X = 1

Section 2.2, p. 35
AND Gate
Note that C = 1 if
and only if A and B
are both 1.

Section 2.2, p. 36
OR Gate
Note that C = 1 if
and only if A or B (or
both) are 1.

Section 2.2, p. 36
Switches
If switch X is open, then we will define the
value of X to be 0; if switch X is closed, then
we will define the value of X to be 1.

Section 2.2, p. 36
T = AB

Section 2.2, p. 36
T = A+B
Section 2.2, p. 37
F = AB’ + C

F = [A(C + D)]’ + BE
Figure 2-1: Circuits for Expressions (2-1) and (2-2)
Figure 2-2(b) shows a truth table which specifies the output of the
circuit in Figure 2-2(a) for all possible combinations of values of
the inputs A and B.

Figure 2-2: 2-Input Circuit


Since the expression (A + C)(B’ + C) has the same
value as AB’ + C for all eight combinations of values
of the variables A, B, and C, we conclude that:
AB’ + C = (A + C)(B’ + C) (2-3)

A B C   B’   AB’   AB’+C   A+C   B’+C   (A+C)(B’+C)
0 0 0   1    0     0       0     1      0
0 0 1   1    0     1       1     1      1
0 1 0   0    0     0       0     0      0
0 1 1   0    0     1       1     1      1
1 0 0   1    1     1       1     1      1
1 0 1   1    1     1       1     1      1
1 1 0   0    0     0       1     0      0
1 1 1   0    0     1       1     1      1

Table 2-1: Truth Table for 3 Variables
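Table 2-1 can also be checked programmatically; here is a short Python verification of the
identity over all eight rows (an illustration, not part of the text):

    from itertools import product

    for A, B, C in product((0, 1), repeat=3):
        lhs = (A and not B) or C
        rhs = (A or C) and ((not B) or C)
        assert bool(lhs) == bool(rhs)
    print("AB' + C = (A + C)(B' + C) holds for all 8 combinations")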
Basic Theorems
The following basic laws and theorems of boolean algebra involve
only a single variable:
Operations with 0 and 1:

Idempotent laws

Section 2.4 (p. 39)


Involution law

Laws of complementarity

Each of these theorems is easily proved by showing that it is valid
for both of the possible values of X. For example, to prove X + X′ = 1,
we observe that if X = 0, then X + X′ = 0 + 1 = 1, and if X = 1, then
X + X′ = 1 + 0 = 1.
If two switches are both labeled with the variable A, this
means that both switches are open when A = 0 and both
are closed when A = 1, thus the following circuits are
equivalent:

Section 2.4, p. 39
Section 2.4, p. 40
A in parallel with A’ can be replaced with a closed circuit
because one or the other of the two switches is always
closed.
Similarly, switch A in series with A’ can be replaced with
an open circuit because one or the other of the two
switches is always open.
Commutative and Associative Laws
Many of the laws of ordinary algebra, such as commutative and
associative laws, also apply to Boolean algebra. The commutative
laws for AND and OR, which follow directly from the definitions of
the AND and OR operations, are

This means that the order in which the variables are written will not
affect the result of applying the AND and OR operations. The
associative laws also apply to AND and OR:

Section 2.5 (p. 40-41)


Table 2-2: Proof of Associative Law
for AND
Two 2-input AND gates can be replaced with a single
3-input AND gate (a). Similarly, two 2-input OR gates
can be replaced with a single 3-input OR gate (b).

Figure 2-3: Associative Law for AND and OR
Distributive Law
Using a truth table, it is easy to show that the ordinary distributive
law, X(Y + Z) = XY + XZ, is valid.

In addition to the ordinary distributive law, a second distributive law
is valid for Boolean algebra, but not for ordinary algebra:

X + YZ = (X + Y)(X + Z)

Proof of the second distributive law follows:

Section 2.5 (p. 42)


Simplification Theorems
The following theorems are useful in simplifying Boolean expressions:

Section 2.6 (p. 42-43)


Illustration of
Theorem (2-14D): XY’ + Y = X + Y

Section 2.6, p. 43
F = A(A′ + B)
By Theorem (2-14), (X + Y′)Y = XY, the expression F
simplifies to AB.

Figure 2-4: Equivalent Gate Circuits


Example 1
Simplify Z = A′BC + A′

This expression has the same form as (2-13) if we let X = A′ and Y = BC.
Therefore, the expression simplifies to Z = X + XY = X = A′.

Example 2

Simplify (p. 43-44)


Example 3

Note that in this example we let Y = (AB + C)′ rather than (AB + C) in order
to match the form of (2-14D).

Simplify (p. 43-44)


Sum-Of-Products

An expression is said to be in sum-of-products (SOP)


form when all products are the products of single
variables. This form is the end result when an expression
is fully multiplied out.
For example: AB′ + CD′E + AC′E′

Section 2.7 (p. 44)


Product-Of-Sums

An expression is in product-of-sums (POS) form when all


sums are the sums of single variables. It is usually easy to
recognize a product-of-sums expression since it consists
of a product of sum terms.
For example: (A + B′)(C + D′ + E)(A + C′ + E′)

Section 2.7 (p. 45)


EXAMPLE 1: Factor A + B'CD. This is of the form X + YZ
where X = A, Y = B', and Z = CD, so
A + B'CD = (X + Y)(X + Z) = (A + B')(A + CD)

A + CD can be factored again using the second distributive law, so

A + B'CD = (A + B')(A + C)(A + D)
EXAMPLE 2: Factor AB' + C'D

AB′ + C’D = (AB′ + C′)(AB′ + D)


= (A + C′)(B′ + C′)(A + D)(B′ + D)

EXAMPLE 3: Factor C'D + C'E' + G'H


C′D + C′E′ + G′H = C′(D + E′) + G′H
2nd distributive law: x + yz = (x + y)(x + z)
= (C′ + G′H)(D + E′ + G′H)
= (C′ + G′)(C′ + H)(D + E′ + G′)(D + E′ + H)

Factor (p. 45-46)
(2-15):
AB′ + CD′E + AC′E′

(2-17):
A + B′ + C + D′E
Figure 2-5: Circuits for Equations (2-15) and (2-17)
(2-18):
(A +B′)(C + D′ + E)(A + C′ + E′)

(2-20):
AB′C(D′ + E)

Figure 2-6: Circuits for Equations (2-18) and (2-20)
DeMorgan’s Laws
Section 2.8 (p. 47)

DeMorgan’s laws are easily generalized to n variables:

(X₁ + X₂ + ⋯ + Xₙ)′ = X₁′X₂′⋯Xₙ′
(X₁X₂⋯Xₙ)′ = X₁′ + X₂′ + ⋯ + Xₙ′

For example, for n = 3,

(X₁ + X₂ + X₃)′ = X₁′X₂′X₃′
(X₁X₂X₃)′ = X₁′ + X₂′ + X₃′
Note that in the final expressions, the complement
operation is applied only to single variables.
Duality
Section 2.8 (p. 48)
Given an expression, the dual is formed by replacing
AND with OR, OR with AND, 0 with 1, and 1 with 0.
Variables and complements are left unchanged. The
dual of AND is OR and the dual of OR is AND:

(XYZ...)D = X + Y + Z + ...    (X + Y + Z + ...)D = XYZ...

The dual of an expression may be found by
complementing the entire expression and then
complementing each individual variable. For example:
(AB′ + C)D = (A + B′)C
LAWS AND
THEOREMS (a)
p. 55
Operations with 0 and 1:
1. X + 0 = X 1D. X • 1 = X
2. X + 1 = 1 2D. X • 0 = 0

Idempotent laws:
3. X + X = X 3D. X • X = X

Involution law:
4. (X')' = X

Laws of complementarity:
5. X + X' = 1 5D. X • X' = 0
LAWS AND THEOREMS
(b)
Commutative laws:
6. X + Y = Y + X 6D. XY = YX

Associative laws:
7. (X + Y) + Z = X + (Y + Z) = X + Y + Z    7D. (XY)Z = X(YZ) = XYZ

Distributive laws:
8. X(Y + Z) = XY + XZ 8D. X + YZ = (X + Y)(X + Z)

Simplification theorems:
9. XY + XY' = X 9D. (X + Y)(X + Y') = X
10. X + XY = X 10D. X(X + Y) = X
11. (X + Y')Y = XY 11D. XY' + Y = X + Y
LAWS AND THEOREMS (c)
p. 55
DeMorgan's laws:
12. (X + Y + Z +...)' = X'Y'Z'... 12D. (XYZ...)' = X' + Y' + Z' +...

Duality:
13. (X + Y + Z +...)D = XYZ... 13D. (XYZ...)D = X + Y + Z +...

Theorem for multiplying out and factoring:


14. (X + Y)(X' + Z) = XZ + X'Y 14D. XY + X'Z = (X + Z)(X' + Y)

Consensus theorem:
15. XY + YZ + X'Z = XY + X'Z
15D. (X + Y)(Y + Z)(X' + Z) = (X + Y)(X' + Z)
Distributive Laws
Given an expression in product-of-sums form, the
corresponding sum-of-products expression can be
obtained by multiplying out, using the two distributive laws:
X(Y + Z) = XY + XZ (3-1)
(X + Y)(X + Z) = X + YZ (3-2)

In addition, the following theorem is very useful for


factoring and multiplying out:
(X + Y)(X′ + Z) = XZ + X′Y (3-3)
In the following example, if we were to multiply out by brute
force, we would generate 162 terms, and 158 of these
terms would then have to be eliminated to simplify the
expression. Instead, we will use the distributive laws to
simplify the process.

Example (3-4), p. 63
The same theorems that are useful for multiplying out
expressions are useful for factoring. By repeatedly
applying (3-1), (3-2), and (3-3), any expression can be
converted to a product-of-sums form.
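As an illustrative aid (not from the text), a tiny Python checker can compare two Boolean
expressions over every combination of variable values; the lambdas below encode (3-2)
and (3-3):

    from itertools import product

    def equivalent(f, g, nvars):
        # Evaluate both sides for all 2**nvars combinations of values.
        return all(bool(f(*v)) == bool(g(*v))
                   for v in product((False, True), repeat=nvars))

    # (3-2): (X + Y)(X + Z) = X + YZ
    print(equivalent(lambda x, y, z: (x or y) and (x or z),
                     lambda x, y, z: x or (y and z), 3))        # True

    # (3-3): (X + Y)(X' + Z) = XZ + X'Y
    print(equivalent(lambda x, y, z: (x or y) and ((not x) or z),
                     lambda x, y, z: (x and z) or ((not x) and y), 3))  # True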
Exclusive-OR and Equivalence Operations

The exclusive-OR operation (⊕) is defined as follows:

0 ⊕ 0 = 0    0 ⊕ 1 = 1    1 ⊕ 0 = 1    1 ⊕ 1 = 0
X ⊕ Y = X′Y + XY′

The equivalence operation (≡) is defined by:

0 ≡ 0 = 1    0 ≡ 1 = 0    1 ≡ 0 = 0    1 ≡ 1 = 1
X ≡ Y = XY + X′Y′

We will use the following symbol for an exclusive-OR gate:

Section 3.2, p. 64
The following theorems apply to exclusive OR:

X ⊕ 0 = X         X ⊕ 1 = X′
X ⊕ X = 0         X ⊕ X′ = 1
X ⊕ Y = Y ⊕ X                      (commutative law)
(X ⊕ Y) ⊕ Z = X ⊕ (Y ⊕ Z)          (associative law)
X(Y ⊕ Z) = XY ⊕ XZ                 (distributive law)
(X ⊕ Y)′ = X ⊕ Y′ = X′ ⊕ Y = XY + X′Y′
We will use the following
symbol for an
equivalence gate:

Section 3.2, p. 65
Because equivalence is the complement of exclusive-OR,
an alternate symbol of the equivalence gate is an
exclusive-OR gate with a complemented output:

The equivalence gate is also called an


exclusive-NOR gate.

Section 3.2, p. 66
Example 1:

By (3-6) and (3-17),


F = [(A′B)C + (A′B)′C′ ] + [B′(AC′) + B(AC′)′ ]
= A′BC + (A + B′)C′ + AB′C′ + B(A′ + C)
= B(A′C + A′ + C) + C′(A + B′ + AB′) = B(A′ + C) + C′(A + B′)

Example 2:

= (A′B′ + AB)C′ + (A′B′ + AB) ′C (by (3-6))


= (A′B′ + AB)C′ + (A′B + AB′)C (by (3-19))
= A′B′C′ + ABC′ + A′BC + AB′C
Section 3.2 (p. 66)
The Consensus Theorem

The consensus theorem can be stated as follows:


XY + X'Z + YZ = XY + X'Z (3-20)

Dual Form:
(X + Y)(X’ + Z)(Y + Z) = (X + Y)(X’ + Z) (3-21)

Section 3.2 (p. 66-67)


Consensus Theorem
Proof

XY + X'Z + YZ = XY + X'Z + (X + X')YZ


= (XY + XYZ) + (X'Z + X'YZ)
= XY(1 + Z) + X'Z(1 + Y) = XY + X'Z

Section 3.3 (p. 67)


Basic methods for
simplifying functions

1. Combining terms. Use the theorem XY + XY′ = X to combine


two terms. For example,

abc′d′ + abcd′ = abd′ [X = abd′, Y = c] (3-24)

2. Eliminating terms. Use the theorem X + XY = X to eliminate


redundant terms if possible; then try to apply the consensus
theorem (XY + X′Z + YZ = XY + X′Z) to eliminate any consensus
terms. For example,
a′b + a′bc = a′b [X = a′b]
a′bc′ + bcd + a′bd = a′bc′ + bcd [X = c, Y = bd, Z = a′b] (3-25)

Section 3.4 (p. 68-69)


3. Eliminating literals. Use the theorem X + X’Y = X + Y to
eliminate redundant literals. Simple factoring may be necessary
before the theorem is applied.

A′B + A′B′C′D′ + ABCD′ = A′(B + B′C′D′) + ABCD′


= A′(B + C′D′) + ABCD′
= B(A′ + ACD′) + A′C′D′
= B(A′ + CD′) + A′C′D′
= A′B + BCD′ + A′C′D′ (3-26)
4. Adding redundant terms. Redundant terms can be
introduced in several ways such as adding xx′, multiplying
by (x + x′), adding yz to xy + x′z, or adding xy to x. When
possible, the added terms should be chosen so that they
will combine with or eliminate other terms.

WX + XY + X′Z′ + WY′Z′ (add WZ′ by consensus theorem)


= WX + XY + X′Z′ + WY′Z′ + WZ′ (eliminate WY′Z′)
= WX + XY + X′Z′ + WZ′ (eliminate WZ′)
= WX + XY + X′Z′ (3-27)
The following comprehensive example
illustrates use of all four methods:

Example (3-28), p 69-70


Proving Validity of an Equation
Often we will need to determine if an equation is valid for all
combinations of values of the variables. Several methods can be
used to determine if an equation is valid:
1. Construct a truth table and evaluate both sides of the
equation for all combinations of values of the variables.
(This method is rather tedious if the number of variables is
large, and it certainly is not very elegant.)
2. Manipulate one side of the equation by applying various
theorems until it is identical with the other side.
3. Reduce both sides of the equation independently to the
same expression.

Section 3.5 (p 70)


4. It is permissible to perform the same operation on both
sides of the equation provided that the operation is
reversible. For example, it is all right to complement both
sides of the equation, but it is not permissible to multiply
both sides of the equation by the same expression.
(Multiplication is not reversible because division is not
defined for Boolean algebra.) Similarly, it is not permissible
to add the same term to both sides of the equation because
subtraction is not defined for Boolean algebra.
To prove that an equation is not valid, it is sufficient to show one
combination of values of the variables for which the two sides of the
equation have different values. When using method 2 or 3 above to
prove that an equation is valid, a useful strategy is to
1. First reduce both sides to a sum of products (or a product of sums).
2. Compare the two sides of the equation to see how they differ.
3. Then try to add terms to one side of the equation that are present on
the other side.
4. Finally try to eliminate terms from one side that are not present on the
other.

Whatever method is used, frequently compare both sides of the
equation and let the difference between them serve as a guide for
what steps to take next.
Example: Show that
A'BD' + BCD + ABC' + AB'D = BC'D' + AD + A'BC

Solution: Starting with the left side,

Example 1 (p. 71)


Differences between Boolean algebra and
ordinary algebra
As we have previously observed, some of the theorems of
Boolean algebra are not true for ordinary algebra.
Similarly, some of the theorems of ordinary algebra are not
true for Boolean algebra. Consider, for example, the
cancellation law for ordinary algebra:
If x + y = x + z, then y=z (3-31)
The cancellation law is not true for Boolean algebra. We
will demonstrate this by constructing a counterexample in
which x + y = x + z but y ≠ z. Let x = 1, y = 0, z = 1. Then,
1 + 0 = 1 + 1 but 0 ≠ 1

Section 3.5 (p 72)


In ordinary algebra, the cancellation law for multiplication
is
If xy = xz, then y=z (3-32)
This law is valid provided x ≠ 0.

In Boolean algebra, the cancellation law for multiplication


is also not valid when x = 0. (Let x = 0, y = 0, z = 1; then
0 • 0 = 0 • 1, but 0 ≠ 1). Because x = 0 about half the time
in switching algebra, the cancellation law for multiplication
cannot be used.
Similarities between Boolean algebra and
ordinary algebra

Even though the statements in the previous 2 slides


(3-31 and 3-32) are generally false for Boolean algebra,
the converses are true:

If y = z, then x + y = x + z (3-33)
If y = z, then xy = xz (3-34)

Section 3.5 (p 72)


College of Computer Science
Computer Science Department

Chapter 3: Basic structure of Computers

0
Course Objective

- Describe the general organization and architecture of computers.


- Identify computers’ major components and study their functions.
- Introduce hardware design issues of modern computer architectures.
- Build the required skills to read and research the current literature in
computer architecture.

1
Textbooks

- Computer Organization and Design: The Hardware/Software Interface,
by David A. Patterson and John L. Hennessy. Fifth Edition, 2014.

2
Contents

 Introduction
 Functional units of a computer
 Basic Computer Organization
 Information in a computer: Instructions, Data, …
 Bus structures

3
Introduction: What is a computer?

 Simply put, a computer is a sophisticated electronic


calculating machine that:
 Accepts input information,
 Processes the information according to a list of internally
stored instructions and
 Produces the resulting output information.
 Functions performed by a computer are:
 Accepting information to be processed as input.
 Storing a list of instructions to process the information.
 Processing the information according to the list of
instructions.
 Providing the results of the processing as output.
 What are the functional units of a computer?

4
Functional units of a computer
Input unit accepts information from:
•Human operators,
•Electromechanical devices,
•Other computers.

Memory stores information: instructions (Instr1, Instr2, Instr3, ...)
and data (Data1, Data2, ...).

Arithmetic and logic unit (ALU) performs the desired operations on the
input information, as determined by instructions in the memory.

Control unit coordinates the various actions: instructions, input,
output, processing.

Output unit sends results of processing (data) to a monitor display or
to a printer.

The ALU and the control unit together form the processor.
5
Basic Computer Organization

6
Simple Computer Organization - Memory Details

7
Information in a computer -- Instructions

 Instructions specify commands to:


 Transfer information within a computer (e.g., from memory to
ALU)
 Transfer of information between the computer and I/O devices
(e.g., from keyboard to computer, or computer to printer)
 Perform arithmetic and logic operations (e.g., Add two
numbers, Perform a logical AND).
 A sequence of instructions to perform a task is called a
program, which is stored in the memory.
 Processor fetches instructions that make up a program from
the memory and performs the operations stated in those
instructions.
 What do the instructions operate upon?

8
Information in a computer -- Data

 Data are the “operands” upon which instructions operate.


 Data could be:
 Numbers,
 Encoded characters.
 Data, in a broad sense means any digital information.
 Computers use data that is encoded as a string of binary
digits called bits.

9
Input unit
Binary information must be presented to a computer in a specific format. This
task is performed by the input unit:
- Interfaces with input devices.
- Accepts binary information from the input devices.
- Presents this binary information in a format expected by the computer.
- Transfers this information to the memory or processor.
[Figure: real-world input devices (keyboard, audio input, ...) feed the
Input Unit, which transfers the information to the computer’s memory
and processor.]

10
Memory unit

 Memory unit stores instructions and data.


 Recall, data is represented as a series of bits.
 To store data, memory unit thus stores bits.
 Processor reads instructions and reads/writes data from/to
the memory during the execution of a program.
 In theory, instructions and data could be fetched one bit at a
time.
 In practice, a group of bits is fetched at a time.
 The group of bits stored or retrieved at a time is termed a “word”.
 The number of bits in a word is termed the “word length” of a
computer.
 In order to read/write to and from memory, a processor
should know where to look:
 “Address” is associated with each word location.

11
Memory unit (contd..)
 Processor reads/writes to/from memory based on the
memory address:
 Access any word location in a short and fixed amount of time
based on the address.
 Random Access Memory (RAM) provides fixed access time
independent of the location of the word.
 Access time is known as “Memory Access Time”.
 Memory and processor have to “communicate” with each
other in order to read/write information.
 In order to reduce “communication time”, a small amount of
RAM (known as Cache) is tightly coupled with the processor.
 Modern computers have three to four levels of RAM units with
different speeds and sizes:
 Fastest, smallest known as Cache
 Slowest, largest known as Main memory.

12
Memory unit (contd..)

 Primary storage of the computer consists of RAM units.


 Fastest, smallest unit is Cache.
 Slowest, largest unit is Main Memory.
 Primary storage is insufficient to store large amounts of
data and programs.
 Primary storage can be added, but it is expensive.
 Store large amounts of data on secondary storage devices:
 Magnetic disks and tapes,
 Optical disks (CD-ROMS).
 Access to the data stored in secondary storage is slower, but
secondary storage takes advantage of the fact that some
information may be accessed infrequently.
 Cost of a memory unit depends on its access time; a lower
access time implies a higher cost.

13
Arithmetic and logic unit (ALU)

 Operations are executed in the Arithmetic and Logic Unit


(ALU).
 Arithmetic operations such as addition, subtraction.
 Logic operations such as comparison of numbers.
 In order to execute an instruction, operands need to be
brought into the ALU from the memory.
 Operands are stored in general purpose registers available in
the ALU.
 General purpose registers have shorter access times than the
cache.
 Results of the operations are stored back in the memory or
retained in the processor for immediate use.

14
Output unit
•Computers represent information in a specific binary form. Output units:
- Interface with output devices.
- Accept processed results provided by the computer in specific binary form.
- Convert the information in binary form to a form understood by an
output device.

[Figure: the Output Unit takes information from the computer (memory
and processor) and drives real-world output devices: printer, graphics
display, speakers, ...]

15
Control unit

 Operation of a computer can be summarized as:


 Accepts information from the input units (Input unit).
 Stores the information (Memory).
 Processes the information (ALU).
 Provides processed results through the output units (Output
unit).
 Operations of Input unit, Memory, ALU and Output unit are
coordinated by Control unit.
 Instructions control “what” operations take place (e.g. data
transfer, processing).
 Control unit generates timing signals which determine
“when” a particular operation takes place.

16
How are the functional units connected?

•For a computer to achieve its operation, the functional units need to


communicate with each other.
•In order to communicate, they need to be connected.

Input Output Memory Processor

Bus

•Functional units may be connected by a group of parallel wires.


•The group of parallel wires is called a bus.
•Each wire in a bus can transfer one bit of information.
•The number of parallel wires in a bus is equal to the word length of
a computer

17
Organization of cache and main memory

[Figure: main memory, cache memory, and processor connected by a bus.]

Why is the access time of the cache memory less than the
access time of the main memory?

18
Bus structures

 A bus is a communication pathway connecting two


or more devices. A key characteristic of a bus is that
it is a shared transmission medium.

 Multiple devices connect to the bus, and a signal


transmitted by any one device is available for
reception by all other devices attached to the bus.
 If two devices transmit during the same time period,
their signals will overlap and become garbled. Thus,
only one device at a time can successfully transmit.

19
Data Bus

 The data lines provide a path for moving data


between system modules.
 These lines, collectively, are called the data bus;
the bus typically consists of 8, 16, or 32 separate
lines, the number of lines being referred to as the
width of the data bus.
 Since each line can carry only 1 bit at a time, the
number of lines determines how many bits can be
transferred at a time.

20
The address lines: Address Bus

 The address lines are used to designate the source or


destination of the data on the data bus.
 For example, if the CPU wishes to read a word (8, 16, or 32
bits) of data from memory, it puts the address of the
desired word on the address lines. Clearly, the width of the
address bus determines the maximum possible memory
capacity of the system.
 Furthermore, the address lines are generally also used to
address I/O ports.

21
The control lines: Control Bus

 The control lines are used to control the access to


and the use of the data and address lines.
 Since the data and address lines are shared by all
components, there must be a means of controlling
their use.
 Control signals transmit both command and timing
information between system modules.
 Timing signals indicate the validity of data and
address information. Command signals specify
operations to be performed.

22
College of Computer Science
Department of Computer Science

Chapter 4: Instruction Set Architecture (ISA)


Contents

 Introduction, ISA
 Data representations in Computer Systems
 Instruction Set
 Instruction formats
 Instruction types
 Addressing modes
 Assembly language

1
Instruction Set Architecture (ISA): Introduction

2
Data representations in Computer Systems :Data types

 The data types stored in digital computers may be classified as


being one of the following categories :
- Numbers used in arithmetic computations,
- Letters of the alphabet used in data processing, and other
discrete symbols used for specific purposes.
 All types of data are represented in computers in binary-coded
form.
- A bit is the most basic unit of information in a computer. It is a
state of “on” or “off” in a digital circuit.
- Sometimes they represent high or low voltage
- A byte is a group of eight bits. It is the smallest possible
addressable unit of computer storage.
- A word is a contiguous group of bytes.
Words can be any number of bits or bytes.
Word sizes of 16, 32, or 64 bits are most common.

3
Number systems
The binary system is also called the base-2 system. (101100.011)
Our decimal system is the base-10 system. It uses powers of 10 for
each position in a number. (975.3)
Any integer quantity can be represented exactly using any base (or
radix). (3077 octal or 2BAD hex)
- The decimal number 947 in powers of 10 is:
  947 = 9×10² + 4×10¹ + 7×10⁰

- The decimal number 5836.47 in powers of 10 is:
  5836.47 = 5×10³ + 8×10² + 3×10¹ + 6×10⁰ + 4×10⁻¹ + 7×10⁻²

4
Converting Numbers Between Bases

5
Converting Numbers Between Bases (contd..)

6
Converting between Base 16, 8 and Base 2

7
Examples

8
Signed Integer Representation

9
Two's Complement Representation

 Positive numbers
   Signed value = Unsigned value
 Negative numbers
   Signed value = Unsigned value – 2ⁿ  (n = number of bits)
 Negative weight for MSB
   Another way to obtain the signed value is to assign a negative
   weight to the most-significant bit:

    1     0    1    1    0    1    0    0
  -128   64   32   16    8    4    2    1

   = -128 + 32 + 16 + 4 = -76

8-bit Binary value   Unsigned value   Signed value
00000000             0                0
00000001             1                +1
00000010             2                +2
...                  ...              ...
01111110             126              +126
01111111             127              +127
10000000             128              -128
10000001             129              -127
...                  ...              ...
11111110             254              -2
11111111             255              -1
Forming the Two's Complement

starting value                                00100100 = +36
step 1: reverse the bits (1's complement)     11011011
step 2: add 1 to the value from step 1       +       1
sum = 2's complement representation           11011100 = -36

Sum of an integer and its 2's complement must be zero:
00100100 + 11011100 = 00000000 (8-bit sum)  Ignore Carry

Another way to obtain the 2's complement:
start at the least significant 1; leave it and all the 0s to its
right unchanged, and complement all the bits to its left.

Binary value    = 00100 1 00    (least significant 1)
2's complement  = 11011 1 00
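A minimal Python sketch of the two-step rule (invert the bits, then add 1), assuming an
8-bit word as in the slide:

    def twos_complement_8bit(value):
        # Step 1: invert the bits (1's complement); step 2: add 1.
        # Masking with 0xFF keeps the result in 8 bits, discarding the
        # carry out of the top bit.
        return ((~value) + 1) & 0xFF

    x = 0b00100100                           # +36
    neg = twos_complement_8bit(x)
    print(format(neg, "08b"))                # 11011100 = -36
    print(format((x + neg) & 0xFF, "08b"))   # 00000000: sum is zero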
Floating-point representations

12
Floating-point representations (contd.)

13
Instruction Set: Execution of an instruction

 The steps involved in the execution of an instruction by a


processor:
 Fetch an instruction from the memory.
 Fetch the operands.
 Execute the instruction.
 Store the results.
 Several issues:
 Where is the address of the memory location from which the
present instruction is to be fetched?
 Where is the present instruction stored while it is executed?
 Where and what is the address of the memory location from
which the data is fetched?
 ......
 Basic processor architecture has several registers to assist
in the execution of the instructions.

14
Basic processor architecture

[Figure: processor connected to memory. The processor contains the
control unit, the ALU, the registers MAR, MDR, PC, and IR, and n
general purpose registers R0 through R(n-1).]

Memory Address Register (MAR): address of the memory location to be accessed.
Memory Data Register (MDR): data to be read into or read out of the current location.
PC: address of the next instruction to be fetched and executed.
IR: instruction that is currently being executed.
R0 ... R(n-1): n general purpose registers.
15
Basic processor architecture (contd..)

[Figure: the processor comprises a control path and a data path,
connected to memory through MAR and MDR.]

Control path is responsible for:
•Instruction fetch and execution sequencing
•Operand fetch
•Saving results

Data path:
•Contains general purpose registers
•Contains ALU
•Executes instructions

16
Registers in the control path

 Instruction Register (IR):


 Instruction that is currently being executed.
 Program Counter (PC):
 Address of the next instruction to be fetched and executed.
 Memory Address Register (MAR):
 Address of the memory location to be accessed.
 Memory Data Register (MDR):
 Data to be read into or read out of the current memory
location, whose address is in the Memory Address Register
(MAR).

17
Fetch/Execute cycle

 Execution of an instruction takes place in two phases:


 Instruction fetch.
 Instruction execute.
 Instruction fetch:
 Fetch the instruction from the memory location whose address
is in the Program Counter (PC).
 Place the instruction in the Instruction Register (IR).
 Instruction execute:
 Instruction in the IR is examined (decoded) to determine which
operation is to be performed.
 Fetch the operands from the memory or registers.
 Execute the operation.
 Store the results in the destination location.
 Basic fetch/execute cycle repeats indefinitely.
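To make the cycle concrete, here is a toy fetch/execute loop in Python. The instruction set
(LOAD, ADD, STORE, HALT), the word-addressed memory holding (opcode, operand) pairs, and the
single accumulator are invented for illustration; they are not the architecture described in
these slides.

    memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 7, 8, 0]
    pc, acc = 0, 0

    while True:
        ir = memory[pc]          # fetch: instruction at the address in PC
        pc += 1                  # PC now points to the next instruction
        op, addr = ir            # decode the instruction in IR
        if op == "LOAD":         # execute: fetch operand, perform operation
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "STORE":      # store the result in the destination
            memory[addr] = acc
        elif op == "HALT":
            break

    print(memory[6])             # 15 (= 7 + 8)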

18
Memory organization

 Recall:
 Information is stored in the memory as a collection of bits.
 A collection of bits that is stored or retrieved simultaneously is
called a word.
 Number of bits in a word is called word length.
 Word length can be 16 to 64 bits.
 Another collection which is more basic than a word:
 Collection of 8 bits known as a “byte”
 Bytes are grouped into words, word length can also be
expressed as a number of bytes instead of the number of
bits:
Word length of 16 bits, is equivalent to word length of 2 bytes.
 Words may be 2 bytes (older architectures), 4 bytes (current
architectures), or 8+ bytes (modern architectures).

19
Memory organization (contd..)

 Accessing the memory to obtain information requires


specifying the “address” of the memory location.
 Recall that a memory has a sequence of bits:
 Assigning addresses to each bit is impractical and unnecessary.
 Typically, addresses are assigned to a single byte.
 “Byte addressable memory”
 Suppose k bits are used to hold the address of a memory
location:
Size of the memory in bytes is given by 2^k,
where k is the number of bits used to hold a memory address.
E.g., for a 16-bit address, the size of the memory is 2^16 = 65536 bytes.

What is the size of the memory for a 24-bit address?

20
Memory organization (contd..)

[Figure: memory viewed as a column of bytes, from Byte 0 at the top
down to Byte 2^k - 1.]

•Memory is viewed as a sequence of bytes.
•Address of the first byte is 0.
•Address of the last byte is 2^k - 1, where k is the number of bits
used to hold a memory address.
•E.g., when k = 16:
  Address of the first byte is 0
  Address of the last byte is 65535
•E.g., when k = 2:
  Address of the first byte is ?
  Address of the last byte is ?

21
Memory organization (contd..)

[Figure: memory organized as 4-byte words; Word #0 occupies Bytes 0-3,
Word #1 occupies Bytes 4-7, ..., and the last word occupies Bytes
65532-65535.]

Consider a memory organization:
  16-bit memory addresses; size of the memory is ?
  Word length is 4 bytes
  Number of words = Memory size (bytes) / Word length (bytes) = ?
  Word #0 starts at Byte #0.
  Word #1 starts at Byte #4.
  Last word (Word #?) starts at Byte #?

22
Memory organization (contd..)

[Figure: byte-addressable memory, Word #0 (Bytes 0-3) through
Word #16383 (Bytes 65532-65535), with MAR and MDR between the
processor and the memory.]

MAR contains the address of the memory location addressed
(here, address 65532).
MDR contains either the data to be written to that address or the
data read from that address.

23
Memory operations

 Memory read or load:


 Place address of the memory location to be read from into
MAR.
 Issue a Memory_read command to the memory.
 Data read from the memory is placed into MDR automatically
(by control logic).
 Memory write or store:
 Place address of the memory location to be written to into
MAR.
 Place data to be written into MDR.
 Issue Memory_write command to the memory.
 Data in MDR is written to the memory automatically (by control
logic).

24
Instruction types

 Computer instructions must be capable of performing 4


types of operations.
 Data transfer/movement between memory and processor
registers.
 E.g., memory read, memory write
 Arithmetic and logic operations:
 E.g., addition, subtraction, comparison between two numbers.
 Program sequencing and flow of control:
 Branch instructions
 Input/output transfers to transfer data to and from the
real world.

25
Instruction types (contd..)

 Examples of different types of instructions in assembly


language notation.
 Data transfers between processor and memory.
 Move A, B (B = A).
 Move A, R1 (R1 = A).
 Arithmetic and logic operation:
 Add A, B, C (C = A + B)
 Sequencing:
 Jump Label (Jump to the subroutine which starts at Label).
 Input/output data transfer:
 Input PORT, R5 (Read from i/o port “PORT” to register R5).

26
Specifying operands in instructions

 Operands are the entities operated upon by the instructions.


 Recall that operands may have to be fetched from a memory
location to execute an operation.
 Memory locations have addresses by which they can be
accessed.
 Operands may also be stored in the general purpose
registers.
 Intermediate value of some computation which will be required
immediately for subsequent computations.
 Registers also have addresses.
 Specifying the operands on which the instruction is to
operate involves specifying the addresses of the operands.
 Address can be of a memory location or a register.

27
Source and destination operands

 Operation may be specified as:


 Operation source1, source2, destination
 An operand is called a source operand if:
 It appears on the right-hand side of an expression
• E.g., Add A, B, C (C = A+ B)
– A and B are source operands.
 An operand is called a destination operand if:
 It appears on the left-hand side of an expression.
• E.g., Add A, B, C (C = A + B)
– C is a destination operand.

28
Instruction types

 Instructions can also be classified based on the number of


operand addresses they include.
 3, 2, 1, 0 operand addresses.
 3-address instructions are almost always instructions that
implement binary operations.
 E.g. Add A, B, C (C = A + B)
 If k bits are used to specify the address of a memory location,
then 3-address instructions need 3*k bits to specify the
operand addresses.
 3-address instructions, where operand addresses are memory
locations are too big to fit in one word.

29
Instruction types (contd..)

 In 2-address instructions, one operand serves as both a source
and a destination:
 E.g. Add A, B (B = A + B)
 2-address instructions need 2*k bits to specify the operand addresses.
 This may also be too big to fit into a word.
 2-address instructions, where at least one operand is a
processor register:
 E.g. Add A, R1 (R1 = A + R1)
 1-address instructions require only one operand.
 E.g. Clear A (A = 0)
 0-address instructions do not operate on operands.
 E.g. Halt (Halt the computer)
 How are addresses of operands specified in the
instructions?
30
Addressing modes

 The different ways in which the address of an operand is
specified in an instruction are referred to as addressing
modes.
 Register mode
 Operand is the contents of a processor register.
 Address of the register is given in the instruction.
 E.g. Clear R1
 Absolute mode
 Operand is in a memory location.
 Address of the memory location is given explicitly in the
instruction.
 E.g. Clear A
 Also called “Direct mode” in some assembly languages
 Register and absolute modes can be used to represent
variables
31
Addressing modes (contd..)

 Immediate mode
 Operand is given explicitly in the instruction.
 E.g. Move #200, R0
 Can be used to represent constants.
 Register, Absolute and Immediate modes contain either
the address of the operand or the operand itself.
 Some instructions provide information from which the
memory address of the operand can be determined
 That is, they provide the “Effective Address” of the operand.
 They do not provide the operand or the address of the operand
explicitly.
 Different ways in which “Effective Address” of the operand
can be generated.

32
Addressing modes (contd..)

The Effective Address of the operand is the contents of a register or
a memory location whose address appears in the instruction.

Add (R1),R0                            Add (A),R0
•Register R1 contains Address B        •Address A contains Address B
•Address B has the operand             •Address B has the operand

R1 and A are called “pointers”.
This is called the “Indirect Mode”.
33
Addressing modes (contd..)

The Effective Address of the operand is generated by adding a constant
value to the contents of a register.

Add 20(R1),R0
•Register R1 contains 1000.
•Offset 20 is added to the contents of R1 to generate the address 1020;
the operand is at address 1020.
•Contents of R1 do not change in the process of generating the address.
•R1 is called an “index register”.

What address would be generated by Add 1000(R1), R0 if R1 had 20?

This is the “Indexing Mode”


34
Addressing Modes (contd..)
Relative mode

•Effective Address of the operand is generated by adding a


constant value to the contents of the Program Counter (PC).
•Variation of the Indexing Mode, where the index register is the PC
instead of a general purpose register.
•When the instruction is being executed, the PC holds the address
of the next instruction in the program.
•Useful for specifying target addresses in branch instructions.

The addressed location is “relative” to the PC, so this is called “Relative Mode”

35
Instruction execution and sequencing

 Recall the fetch/execute cycle of instruction execution.


 In order to complete a meaningful task, a number of
instructions need to be executed.
 During the fetch/execution cycle of one instruction, the
Program Counter (PC) is updated with the address of the
next instruction:
 PC contains the address of the memory location from which the
next instruction is to be fetched.
 When the instruction completes its fetch/execution cycle,
the contents of the PC point to the next instruction.
 Thus, a sequence of instructions can be executed to
complete a task.

36
Instruction execution and sequencing (contd..)

 Simple processor model


 Processor has a number of general purpose registers.
 Word length is 32 bits (4 bytes).
 Memory is byte addressable.
 Each instruction is one word long.
 Instructions allow one memory operand per instruction.
 One register operand is allowed in addition to one memory
operand.
 Simple task:
 Add two numbers stored in memory locations A and B.
 Store the result of the addition in memory location C.

Move A, R0 (Move the contents of location A to register R0)


Add B, R0 (Add the contents of location B to register R0)
Move R0, C (Move the contents of register R0 to location C)

37
Instruction execution and sequencing (contd..)
Execution steps:

Program in memory:
0   Move A, R0
4   Add B, R0
8   Move R0, C
(Operands A, B, and C are in data locations elsewhere in memory.)

Step I:
-PC holds address 0.
-Fetches instruction at address 0.
-Fetches operand A.
-Executes the instruction.
-Increments PC to 4.
Step II:
-PC holds address 4.
-Fetches instruction at address 4.
-Fetches operand B.
-Executes the instruction.
-Increments PC to 8.
Step III:
-PC holds address 8.
-Fetches instruction at address 8.
-Executes the instruction.
-Stores the result in location C.

Instructions are executed one at a time in order of increasing addresses:
“Straight line sequencing”
38
Instruction execution and sequencing (contd..)

 Consider the following task:


 Add 10 numbers.
 Number of numbers to be added (in this case 10) is stored in
location N.
 Numbers are located in the memory at NUM1, .... NUM10
 Store the result in SUM.
Move NUM1, R0 (Move the contents of location NUM1 to register R0)
Add NUM2, R0 (Add the contents of location NUM2 to register R0)
Add NUM3, R0 (Add the contents of location NUM3 to register R0)
Add NUM4, R0 (Add the contents of location NUM4 to register R0)
Add NUM5, R0 (Add the contents of location NUM5 to register R0)
Add NUM6, R0 (Add the contents of location NUM6 to register R0)
Add NUM7, R0
Add NUM8, R0
Add NUM9, R0
Add NUM10, R0
Move R0, SUM (Move the contents of register R0 to location SUM)

39
Instruction sequencing and execution (contd..)

 Separate Add instruction to add each number in a list,


leading to a long list of Add instructions.
 Task can be accomplished in a compact way, by using the
Branch instruction.

Move N, R1 (Move the contents of location N, which is the number


of numbers to be added to register R1)
Clear R0 (This register holds the sum as the numbers are added)
LOOP Determine the address of the next number.
Add the next number to R0.
Decrement R1 (Counter which indicates how many numbers have been
added so far).
Branch>0 LOOP (If all the numbers haven’t been added, go to LOOP)
Move R0, SUM

40
Instruction execution and sequencing (contd..)

 Decrement R1:
 Initially holds the number of numbers that is to be added
(Move N, R1).
 Decrements the count each time a new number is added
(Decrement R1).
 Keeps a count of the number of the numbers added so far.
 Branch>0 LOOP:
 Checks if the count in register R1 is 0 (Branch > 0)
 If it is 0, then store the sum in register R0 at memory location
SUM (Move R0, SUM).
 If not, then get the next number, and repeat (go to LOOP). Go
to is specified implicitly.
 Note that the instruction (Branch > 0 LOOP) has no explicit
reference to register R1.

41
Instructions execution and sequencing (contd..)

 Processor keeps track of information about the results
of previous operations.
 Information is recorded in individual bits called “condition
code flags”. Common flags are:
 N (negative, set to 1 if result is negative, else cleared to 0)
 Z (zero, set to 1 if result is zero, else cleared to 0)
 V (overflow, set to 1 if arithmetic overflow occurs, else
cleared)
 C (carry, set to 1 if a carry-out results, else cleared)
 Flags are grouped together in a special purpose register
called “condition code register” or “status register”.
If the result of Decrement R1 is 0, then flag Z is set.
Branch> 0, tests the Z flag.
If Z is 1, then the sum is stored.
Else the next number is added.

42
Instruction execution and sequencing (contd..)

 Branch instructions alter the sequence of program execution


 Recall that the PC holds the address of the next instruction to
be executed.
 They do so by loading a new value into the PC.
 Processor fetches and executes instruction at this new
address, instead of the instruction located at the location that
follows the branch.
 New address is called a “branch target”.
 Conditional branch instructions cause a branch only if a
specified condition is satisfied
 Otherwise the PC is incremented in a normal way, and the next
sequential instruction is fetched and executed.
 Conditional branch instructions use condition code flags to
check if the various conditions are satisfied.

43
Instruction sequencing and execution (contd..)

 How to determine the address of the next number?


 Recall the addressing modes:
 Initialize register R2 with the address of the first number
using Immediate addressing.
 Use Indirect addressing mode to add the first number.
 Increment register R2 by 4, so that it points to the next
number.

Move N, R1
Move #NUM1, R2 (Initialize R2 with address of NUM1)
Clear R0
LOOP Add (R2), R0 (Indirect addressing)
Add #4, R2 (Increment R2 to point to the next number)
Decrement R1
Branch>0 LOOP
Move R0, SUM

44
Instruction execution and sequencing (contd..)

 Note that the same can be accomplished using


“autoincrement mode”:

Move N, R1
Move #NUM1, R2 (Initialize R2 with address of NUM1)
Clear R0
LOOP Add (R2)+, R0 (Autoincrement)
Decrement R1
Branch>0 LOOP
Move R0, SUM

45
Stacks

 A stack is a list of data elements, usually words or bytes


with the accessing restriction that elements can be added or
removed at one end of the stack.
 End from which elements are added and removed is called the
“top” of the stack.
 Other end is called the “bottom” of the stack.
 Also known as:
 Pushdown stack.
 Last in first out (LIFO) stack.
 Push - placing a new item on the stack.
 Pop - Removing the top item from the stack.

46
Stacks (contd..)

 Data stored in the memory of a computer can be organized


as a stack.
 Successive elements occupy successive memory locations.
 When new elements are pushed on to the stack they are
placed in successively lower address locations.
 Stack grows in direction of decreasing memory addresses.
 A processor register called as “Stack Pointer (SP)” is used
to keep track of the address of the element that is at the
top at any given time.
 A general purpose register could serve as a stack pointer.
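An illustrative Python sketch of such a stack; the 4-byte word size, the starting SP value,
and the dict-based memory are assumptions made for the example:

    memory = {}          # address -> word
    sp = 1000            # SP starts just past the stack area

    def push(word):
        global sp
        sp -= 4          # stack grows toward decreasing addresses
        memory[sp] = word

    def pop():
        global sp
        word = memory[sp]   # SP holds the address of the top element
        sp += 4
        return word

    push(10); push(20)
    print(pop(), pop())  # 20 10 (last in, first out)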

47
Subroutines

 In a program, subtasks that are repeated on different data
values are usually implemented as subroutines.
 When a program requires the use of a subroutine, it
branches to the subroutine.
 Branching to the subroutine is called “calling” the subroutine.
 Instruction that performs this branch operation is Call.
 After a subroutine completes execution, the calling program
continues with executing the instruction immediately after
the instruction that called the subroutine.
 Subroutine is said to “return” to the program.
 Instruction that performs this is called Return.
 Subroutine may be called from many places in the program.

48
Subroutines (contd..)

[Figure: calling program with Call SUB at address 200 and the next
instruction at address 204; subroutine SUB with its first instruction
at address 1000 and a Return at the end.]

•The calling program calls a subroutine whose first instruction is at
address 1000. The Call instruction is at address 200.
•While the Call instruction is being executed, the PC points to the
next instruction at address 204.
•The Call instruction stores address 204 in the Link register and
loads 1000 into the PC.
•The Return instruction loads the address 204 from the Link register
back into the PC.
49
Subroutines and stack

 For nested subroutines:


 After the last subroutine in the nested list completes
execution, the return address is needed to execute the return
instruction.
 This return address is the last one that was generated in the
nested call sequence.
 Return addresses are generated and used in a “Last-In-First-
Out” order.
 Push the return addresses onto a stack as they are
generated by subroutine calls.
 Pop the return addresses from the stack as they are needed
to execute return instructions.

50
Assembly language

 Recall that information is stored in a computer in a binary
form, as patterns of 0s and 1s.
 Such patterns are awkward when preparing programs.
 Symbolic names are used to represent patterns.
 So far we have used normal words such as Move, Add, Branch,
to represent corresponding binary patterns.
 When we write programs for a specific computer, the normal
words need to be replaced by acronyms called mnemonics.
 E.g., MOV, ADD, INC
 A complete set of symbolic names and rules for their use
constitute a programming language, referred to as the
assembly language.

51
Assembly language (contd..)

 Programs written in assembly language need to be translated


into a form understandable by the computer, namely, binary,
or machine language form.
 Translation from assembly language to machine language is
performed by an assembler.
 Original program in assembly language is called source program.
 Assembled machine language program is called object program.
 Each mnemonic represents the binary pattern, or OP code
for the operation performed by the instruction.
 Assembly language must also have a way to indicate the
addressing mode being used for operand addresses.
 Sometimes the addressing mode is indicated in the OP code
mnemonic.
 E.g., ADDI may be a mnemonic to indicate an addition operation
with an immediate operand.

52
Assembly language (contd..)

100   Move N,R1
104   Move #NUM1,R2
108   Clear R0
LOOP 112   Add (R2),R0
116   Add #4,R2
120   Decrement R1
124   Branch>0 LOOP
128   Move R0,SUM
132   ...

SUM      200
N        204   (contains 100)
NUM1     208
NUM2     212
...
NUM100   604

•What is the numeric value assigned to SUM?
•What is the address of the data NUM1 through NUM100?
•What is the address of the memory location represented by the label LOOP?
•How to place a data value into a memory location?
53
Assembly language (contd..)

(Listing columns: Label, Operation, Addressing or data information)

Assembler directives:
  SUM    EQU       200
         ORIGIN    204
  N      DATAWORD  100
  NUM1   RESERVE   400
         ORIGIN    100
Statements that generate machine instructions:
  START  MOVE      N,R1
         MOVE      #NUM1,R2
         CLR       R0
  LOOP   ADD       (R2),R0
         ADD       #4,R2
         DEC       R1
         BGTZ      LOOP
         MOVE      R0,SUM
         RETURN
Assembler directive:
         END       START

EQU: value of SUM is 200.
ORIGIN: place the data block at 204.
DATAWORD: place the value 100 at 204 and assign it the label N.
RESERVE: a memory block of 400 bytes is to be reserved for data;
NUM1 is associated with address 208.
ORIGIN: instructions of the object program are to be loaded in
memory starting at 100.
RETURN: terminate program execution.
END: end of the program source text.
54
Assembly language (contd..)
 Assembly language instructions have a generic form:
Label Operation Operand(s) Comment
 Four fields are separated by a delimiter, typically one or
more blank characters.
 Label is optionally associated with a memory address:
 May indicate the address of an instruction to be executed.
 May indicate the address of a data item.
 How does the assembler determine the values that
represent names?
 Value of a name may be specified by EQU directive.
• SUM EQU 100
 A name may be defined in the Label field of another
instruction, value represented by the name is determined by
the location of that instruction in the object program.
• E.g., BGTZ LOOP, the value of LOOP is the address of the
instruction ADD (R2),R0
55
Encoding of machine instructions

 Instructions specify the operation to be performed and the


operands to be used.
 Which operation is to be performed and the addressing
mode of the operands may be specified using an encoded
binary pattern referred to as the “OP code” for the
instruction.
 Consider a processor with:
 Word length 32 bits.
 16 general purpose registers, requiring 4 bits to specify the
register.
 8 bits are set aside to specify the OP code.
 256 instructions can be specified in 8 bits.

56
Encoding of machine instructions (contd..)
One-word instruction format:

OP code (8 bits) | Source (7 bits) | Dest (7 bits) | Other info (10 bits)

Opcode: 8 bits.
Source operand: 4 bits to specify a register,
  3 bits to specify the addressing mode.
Destination operand: 4 bits to specify a register,
  3 bits to specify the addressing mode.
Other information: 10 bits to specify other information, such as an index value.

57
Encoding of machine instructions (contd..)
What if the source operand is a memory location specified
using the absolute addressing mode?

OP code (8 bits) | Source (3 bits) | Dest (7 bits) | Memory address (14 bits)

Opcode: 8 bits.
Source operand: 3 bits to specify the addressing mode.
Destination operand: 4 bits to specify a register,
  3 bits to specify the addressing mode.

•Leaves us with 14 bits to specify the address of the memory location.


•Insufficient to give a complete 32 bit address in the instruction.
•Include second word as a part of this instruction, leading to a
two-word instruction.

58
Encoding of machine instructions (contd..)

Two-word instruction format.

OP code Source Dest Other info

Memory address/Immediate operand

•Second word specifies the address of a memory location.


•Second word may also be used to specify an immediate operand.

•Complex instructions can be implemented using multiple words.


•Complex Instruction Set Computer (CISC) refers to processors
using instruction sets of this type.

59
Encoding of machine instructions (contd..)

 Insist that all instructions must fit into a single 32 bit word:
 Instruction cannot specify a memory location or an immediate
operand.
 ADD R1, R2 can be specified.
 ADD LOC, R2 cannot be specified.
 Use indirect addressing mode: ADD (R3), R2
 R3 serves as a pointer to memory location LOC.
 How to load address of LOC into R3?
 Relative addressing mode.

60
Encoding of machine instructions (contd..)

 Restriction that an instruction must occupy only one word


has led to a style of computers that are known as Reduced
Instruction Set Computers (RISC).
 Manipulation of data must be performed on operands already in
processor registers.
 Restriction may require additional instructions for tasks.
 However, it is possible to encode three-operand instructions in a
single 32-bit word, where all three operands are in registers:

Three-operand instruction

OP code Ri Rj Rk Other info

61
College of Computer Science
Department of Computer Science

Chapter 4: Instruction Set Architecture (ISA)


Part B: Instruction Formats & MIPS
Instructions
Contents

 Instruction formats
 Instruction types
 MIPS instructions
 Examples

1
Instruction Set Architecture (ISA)
 Critical Interface between hardware and software
 An ISA includes the following …
 Instructions and Instruction Formats
• Data Types, Encodings, and Representations
• Addressing Modes: to address Instructions and Data
• Handling Exceptional Conditions (like division by zero)
 Programmable Storage: Registers and Memory
 Examples (Versions) First Introduced in
 Intel (8086, 80386, Pentium, ...) 1978
 MIPS (MIPS I, II, III, IV, V) 1986
 PowerPC (601, 604, …) 1993
Instructions

 Instructions are the language of the machine


 We will study the MIPS instruction set architecture
 Known as Reduced Instruction Set Computer (RISC)
 Elegant and relatively simple design
 Similar to RISC architectures developed in the mid-1980s and 1990s
 Very popular, used in many products
• Silicon Graphics, ATI, Cisco, Sony, etc.
 Comes next in sales after Intel IA-32 processors
• Almost 100 million MIPS processors sold in 2002 (and
increasing)
 Alternative design: Intel IA-32
 Known as Complex Instruction Set Computer (CISC)
Overview of the MIPS Processor
[Block diagram] The MIPS processor consists of:
• Memory: 4 bytes per word, up to 2^32 bytes = 2^30 words.
• EIU – Execution & Integer Unit (main processor): 32 general-purpose registers $0–$31, Arithmetic & Logic Unit (ALU), and an integer multiplier/divider with Hi and Lo registers.
• FPU – Floating-Point Unit (Coprocessor 1): 32 floating-point registers F0–F31 and a floating-point arithmetic unit.
• TMU – Trap & Memory Unit (Coprocessor 0): BadVaddr, Status, Cause, and EPC registers.
MIPS General-Purpose Registers
 32 General-Purpose Registers (GPRs)
  The assembler uses dollar notation to name registers
  • $0 is register 0, $1 is register 1, …, and $31 is register 31
  All registers are 32 bits wide in MIPS32
  Register $0 is always zero
  • Any value written to $0 is discarded
 Software conventions
  There are many registers (32)
  Software defines names for all registers
  • To standardize their use in programs
  Example: $8 – $15 are called $t0 – $t7
  • Used for temporary values

Register names:
$0 = $zero   $8 = $t0    $16 = $s0   $24 = $t8
$1 = $at     $9 = $t1    $17 = $s1   $25 = $t9
$2 = $v0     $10 = $t2   $18 = $s2   $26 = $k0
$3 = $v1     $11 = $t3   $19 = $s3   $27 = $k1
$4 = $a0     $12 = $t4   $20 = $s4   $28 = $gp
$5 = $a1     $13 = $t5   $21 = $s5   $29 = $sp
$6 = $a2     $14 = $t6   $22 = $s6   $30 = $fp
$7 = $a3     $15 = $t7   $23 = $s7   $31 = $ra
MIPS Register Conventions
 The assembler can refer to registers by name or by number
  It is easier to remember registers by name
  The assembler converts a register name to its corresponding number
Name Register Usage
$zero $0 Always 0 (forced by hardware)
$at $1 Reserved for assembler use
$v0 – $v1 $2 – $3 Result values of a function
$a0 – $a3 $4 – $7 Arguments of a function
$t0 – $t7 $8 – $15 Temporary Values
$s0 – $s7 $16 – $23 Saved registers (preserved across call)
$t8 – $t9 $24 – $25 More temporaries
$k0 – $k1 $26 – $27 Reserved for OS kernel
$gp $28 Global pointer (points to global data)
$sp $29 Stack pointer (points to top of stack)
$fp $30 Frame pointer (points to stack frame)
$ra $31 Return address (used by jal for function call)
Instruction Formats
 All instructions are 32-bit wide. Three instruction formats:
 Register (R-Type)
 Register-to-register instructions
 Op: operation code specifies the format of the instruction

Op6 Rs5 Rt5 Rd5 sa5 funct6

 Immediate (I-Type)
 16-bit immediate constant is part of the instruction
Op6 Rs5 Rt5 immediate16
 Jump (J-Type)
 Used by jump instructions

Op6 immediate26
Instruction Categories
 Integer Arithmetic
 Arithmetic, logical, and shift instructions
 Data Transfer
 Load and store instructions that access memory
 Data movement and conversions
 Jump and Branch
 Flow-control instructions that alter the sequential sequence
 Floating Point Arithmetic
 Instructions that operate on floating-point registers
R-Type Format

Op6 Rs5 Rt5 Rd5 sa5 funct6

 Op: operation code (opcode)


 Specifies the operation of the instruction
 Also specifies the format of the instruction
 funct: function code – extends the opcode
 Up to 2^6 = 64 functions can be defined for the same opcode
 MIPS uses opcode 0 to define R-type instructions
 Three Register Operands (common to many instructions)
 Rs, Rt: first and second source operands
 Rd: destination operand
 sa: the shift amount used by shift instructions
Integer Add /Subtract Instructions
Instruction Meaning R-Type Format
add $s1, $s2, $s3 $s1 = $s2 + $s3 op = 0 rs = $s2 rt = $s3 rd = $s1 sa = 0 f = 0x20
addu $s1, $s2, $s3 $s1 = $s2 + $s3 op = 0 rs = $s2 rt = $s3 rd = $s1 sa = 0 f = 0x21
sub $s1, $s2, $s3 $s1 = $s2 – $s3 op = 0 rs = $s2 rt = $s3 rd = $s1 sa = 0 f = 0x22
subu $s1, $s2, $s3 $s1 = $s2 – $s3 op = 0 rs = $s2 rt = $s3 rd = $s1 sa = 0 f = 0x23

 add & sub: overflow causes an arithmetic exception


 In case of overflow, result is not written to destination register
 addu & subu: same operation as add & sub
 However, no arithmetic exception can occur (Add Unsigned )
 Overflow is ignored
 Many programming languages ignore overflow
 The + operator is translated into addu
 The – operator is translated into subu
Addition/Subtraction Example
 Consider the translation of: f = (g+h) – (i+j)
 Compiler allocates registers to variables
 Assume that f, g, h, i, and j are allocated registers $s0 through $s4
 These are called the saved registers: $s0 = $16, $s1 = $17, …, $s7 = $23
 Translation of: f = (g+h) – (i+j)
addu $t0, $s1, $s2 # $t0 = g + h
addu $t1, $s3, $s4 # $t1 = i + j
subu $s0, $t0, $t1 # f = (g+h)–(i+j)
 Temporary results are stored in $t0 = $8 and $t1 = $9
 Translate: addu $t0,$s1,$s2 to binary code

 Solution: op rs = $s1 rt = $s2 rd = $t0 sa func


000000 10001 10010 01000 00000 100001
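This encoding can be double-checked by packing the fields programmatically. A minimal Python sketch (register numbers are from the convention table above; the helper name is ours):

def encode_rtype(op, rs, rt, rd, sa, funct):
    # Field widths: op 6 | rs 5 | rt 5 | rd 5 | sa 5 | funct 6 = 32 bits
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct

# addu $t0, $s1, $s2 : op = 0, rs = $s1 = 17, rt = $s2 = 18, rd = $t0 = 8, funct = 0x21
word = encode_rtype(0, 17, 18, 8, 0, 0x21)
print(f"{word:032b}")   # 00000010001100100100000000100001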
Logical Bitwise Operations
 Logical bitwise operations: and, or, xor, nor
x y | x and y | x or y | x xor y | x nor y
0 0 |    0    |   0    |    0    |    1
0 1 |    0    |   1    |    1    |    0
1 0 |    0    |   1    |    1    |    0
1 1 |    1    |   1    |    0    |    0
 AND instruction is used to clear bits: x and 0 = 0
 OR instruction is used to set bits: x or 1 = 1
 XOR instruction is used to toggle bits: x xor 1 = not x
 NOR instruction can be used as a NOT, how?
 nor $s1,$s2,$s2 is equivalent to not $s1,$s2
Logical Bitwise Instructions

Instruction Meaning R-Type Format


and $s1, $s2, $s3 $s1 = $s2 & $s3 op = 0 rs = $s2 rt = $s3 rd = $s1 sa = 0 f = 0x24
or $s1, $s2, $s3 $s1 = $s2 | $s3 op = 0 rs = $s2 rt = $s3 rd = $s1 sa = 0 f = 0x25
xor $s1, $s2, $s3 $s1 = $s2 ^ $s3 op = 0 rs = $s2 rt = $s3 rd = $s1 sa = 0 f = 0x26
nor $s1, $s2, $s3 $s1 = ~($s2|$s3) op = 0 rs = $s2 rt = $s3 rd = $s1 sa = 0 f = 0x27

 Examples:
Assume $s1 = 0xabcd1234 and $s2 = 0xffff0000

and $s0,$s1,$s2 # $s0 = 0xabcd0000


or $s0,$s1,$s2 # $s0 = 0xffff1234
xor $s0,$s1,$s2 # $s0 = 0x54321234
nor $s0,$s1,$s2 # $s0 = 0x0000edcb
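These results are easy to verify with Python's bitwise operators (a minimal sketch; the 32-bit mask is needed because Python integers are unbounded):

s1, s2 = 0xabcd1234, 0xffff0000
mask = 0xffffffff                  # keep results to 32 bits
print(hex(s1 & s2))                # 0xabcd0000
print(hex(s1 | s2))                # 0xffff1234
print(hex(s1 ^ s2))                # 0x54321234
print(hex(~(s1 | s2) & mask))      # 0xedcb, i.e. 0x0000edcb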
Examples

Multiply $s1 by 26, using shift and add instructions.
Hint: 26 = 2 + 8 + 16

sll  $t0, $s1, 1     ; $t0 = $s1 * 2
sll  $t1, $s1, 3     ; $t1 = $s1 * 8
addu $s2, $t0, $t1   ; $s2 = $t0 + $t1
sll  $t0, $s1, 4     ; $t0 = $s1 * 16
addu $s2, $s2, $t0   ; $s2 = $s2 + $t0

Multiply $s1 by 31. Hint: 31 = 32 – 1

sll  $s2, $s1, 5     ; $s2 = $s1 * 32
subu $s2, $s2, $s1   ; $s2 = $s1 * 31
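The decompositions can be checked quickly in Python (a minimal sketch; x is an arbitrary test value of ours):

x = 123
assert (x << 1) + (x << 3) + (x << 4) == 26 * x   # 26 = 2 + 8 + 16
assert (x << 5) - x == 31 * x                     # 31 = 32 - 1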
I-Type Format
 Constants are used quite frequently in programs
 The R-type shift instructions have a 5-bit shift amount
constant
 What about other instructions that need a constant?
 I-Type: Instructions with Immediate Operands
Op6 Rs5 Rt5 immediate16

 16-bit immediate constant is stored inside the instruction


 Rs is the source register number
 Rt is now the destination register number (for R-type it was Rd)
 Examples of I-Type ALU Instructions:
 Add immediate: addi $s1, $s2, 5 # $s1 = $s2 + 5
 OR immediate: ori $s1, $s2, 5 # $s1 = $s2 | 5
I-Type ALU Instructions

Instruction Meaning I-Type Format


addi $s1, $s2, 10 $s1 = $s2 + 10 op = 0x8 rs = $s2 rt = $s1 imm16 = 10
addiu $s1, $s2, 10 $s1 = $s2 + 10 op = 0x9 rs = $s2 rt = $s1 imm16 = 10
andi $s1, $s2, 10 $s1 = $s2 & 10 op = 0xc rs = $s2 rt = $s1 imm16 = 10
ori $s1, $s2, 10 $s1 = $s2 | 10 op = 0xd rs = $s2 rt = $s1 imm16 = 10
xori $s1, $s2, 10 $s1 = $s2 ^ 10 op = 0xe rs = $s2 rt = $s1 imm16 = 10
lui $s1, 10 $s1 = 10 << 16 op = 0xf rs = 0 rt = $s1 imm16 = 10

 addi: overflow causes an arithmetic exception


 In case of overflow, result is not written to destination register

 addiu: same operation as addi but overflow is ignored


 Immediate constant for addi and addiu is signed
 No need for subi or subiu instructions

 Immediate constant for andi, ori, xori is unsigned


Examples: I-Type ALU Instructions
 Examples: assume A, B, C are allocated $s0, $s1, $s2

A = B+5; translated as addiu $s0,$s1,5


C = B–1; translated as addiu $s2,$s1,-1
op=001001 rs=$s1=10001 rt=$s2=10010 imm = -1 = 1111111111111111

A = B&0xf; translated as andi $s0,$s1,0xf


C = B|0xf; translated as ori $s2,$s1,0xf
C = 5; translated as ori $s2,$zero,5
A = B; translated as ori $s0,$s1,0
 No need for subi, because addi has a signed immediate
 Register 0 ($zero) always has the value 0
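The encoding shown above for addiu $s2,$s1,-1 can be reproduced by masking the immediate to 16 bits. A minimal Python sketch (the helper name is ours):

def encode_itype(op, rs, rt, imm):
    # Field widths: op 6 | rs 5 | rt 5 | immediate 16 = 32 bits
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xffff)

# addiu $s2, $s1, -1 : op = 0x9, rs = $s1 = 17, rt = $s2 = 18
word = encode_itype(0x9, 17, 18, -1)
print(f"{word:032b}")   # 00100110001100101111111111111111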
32-bit Constants
 I-Type instructions can have only 16-bit constants

Op6 Rs5 Rt5 immediate16


 What if we want to load a 32-bit constant into a register?
 Can’t have a 32-bit constant in I-Type instructions 
 We have already fixed the sizes of all instructions to 32 bits
 Solution: use two instructions instead of one 
 Suppose we want: $s1=0xAC5165D9 (32-bit constant)
 lui: load upper immediate (loads the upper 16 bits and clears the lower 16 bits)

lui $s1, 0xAC51        # $s1 = $17 = 0xAC51 0x0000
ori $s1, $s1, 0x65D9   # $s1 = $17 = 0xAC51 0x65D9
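The same two-step construction in Python (a minimal sketch):

upper, lower = 0xAC51, 0x65D9
s1 = upper << 16        # lui: upper half loaded, lower half cleared
s1 = s1 | lower         # ori: merge in the lower half
print(hex(s1))          # 0xac5165d9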
Conditional Branch Instructions
 MIPS compare and branch instructions:
beq Rs,Rt,label branch to label if (Rs == Rt)
bne Rs,Rt,label branch to label if (Rs != Rt)
 MIPS compare to zero & branch instructions
Compare to zero is used frequently and implemented efficiently
bltz Rs,label branch to label if (Rs < 0)
bgtz Rs,label branch to label if (Rs > 0)
blez Rs,label branch to label if (Rs <= 0)
bgez Rs,label branch to label if (Rs >= 0)
 No need for beqz and bnez instructions. Why?
Set on Less Than Instructions

 MIPS also provides set on less than instructions


slt rd,rs,rt if (rs < rt) rd = 1 else rd = 0
sltu rd,rs,rt unsigned <
slti rt,rs,im16 if (rs < im16) rt = 1 else rt = 0
sltiu rt,rs,im16 unsigned <
 Signed / Unsigned Comparisons
  Can produce different results
  Assume $s0 = 1 and $s1 = -1 = 0xffffffff
  slt  $t0,$s0,$s1  results in $t0 = 0
  sltu $t0,$s0,$s1  results in $t0 = 1
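The difference comes from how the bit pattern 0xffffffff is interpreted. A minimal Python sketch (the signed() helper is ours):

def signed(v):                           # interpret a 32-bit pattern as signed
    return v - (1 << 32) if v & 0x80000000 else v

s0, s1 = 1, 0xffffffff                   # $s1 holds -1 as a 32-bit pattern
print(int(signed(s0) < signed(s1)))      # slt  -> 0 (1 < -1 is false)
print(int(s0 < s1))                      # sltu -> 1 (1 < 4294967295)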
More on Branch Instructions
 MIPS hardware does NOT provide instructions for …
blt, bltu branch if less than (signed/unsigned)
ble, bleu branch if less or equal (signed/unsigned)
bgt, bgtu branch if greater than (signed/unsigned)
bge, bgeu branch if greater or equal (signed/unsigned)
Can be achieved with a sequence of 2 instructions
 How to implement: blt $s0,$s1,label
 Solution: slt $at,$s0,$s1
bne $at,$zero,label
 How to implement: ble $s2,$s3,label
 Solution: slt $at,$s3,$s2
beq $at,$zero,label
Translating an IF Statement

 Consider the following IF statement:


if (a == b) c = d + e; else c = d – e;
Assume that a, b, c, d, e are in $s0, …, $s4 respectively

 How to translate the above IF statement?

      bne  $s0, $s1, else   # branch if (a != b)
      addu $s2, $s3, $s4    # c = d + e
      j    exit
else: subu $s2, $s3, $s4    # c = d – e
exit: . . .
Compound Expression with AND
 Programming languages use short-circuit evaluation
 If first expression is false, second expression is skipped

if (($s1 > 0) && ($s2 < 0)) {$s3++;}

# One Possible Implementation ...


bgtz $s1, L1 # first expression
j next # skip if false
L1: bltz $s2, L2 # second expression
j next # skip if false
L2: addiu $s3,$s3,1 # both are true
next:
Load and Store Instructions
 Instructions that transfer data between memory & registers

 Programs include variables such as arrays and objects

 Such variables are stored in memory

 Load Instruction:
  Transfers data from memory to a register
 Store Instruction:
  Transfers data from a register to memory
 The memory address must be specified by load and store


College of Computer Science

Central Processing Unit


Contents

> Processor Structure


> Basic CPU function
> Instruction cycle
> Control Unit operation
> Control unit design and implementation
> Timing & Control
> Micro operations
> Performance of the CPU
2
CPU basics

A typical CPU has three major components:

> Register set

> Arithmetic Logic Unit (ALU)

> Control Unit (CU)
3
Register Set

> Registers are essentially extremely fast memory locations within the CPU.

> Registers are used to create and store the results of CPU operations.

> Different computers have different register sets.

> They differ in the number of registers, register types, and the length of each register.
4
Register types

> General-purpose registers can be used for multiple purposes and assigned to a variety of functions by the programmer.

> Special-purpose registers are restricted to only specific functions.

— Address registers may be dedicated to a particular addressing mode.

— Status registers hold processor status bits, or flags. These bits are set by the CPU as the result of the execution of an operation.
5
Status register (Program Status Word)

The PSW contains bits that are set by the CPU to indicate the
current status of an executing program (arithmetic operations,
interrupts, processor status).

> Typical:
— Sign of last result
— Zero
— Carry
— Equal
— Overflow
— Interrupt enable/disable
— Supervisor mode (enable privileged instructions)

8
Example of Register Organizations (8086)

9
Arithmetic and Logic Unit (ALU)

> The component that performs the arithmetic and logical operations

> Typical operation flow: load the operands from memory, bring them to the processor, perform the operation in the ALU, then store the result back to memory or retain it in the processor.

10
CPU Instruction cycle

12
Fetch Cycle

The sequence of events in fetching an instruction can be summarized as follows:
1. The contents of the PC are loaded into the MAR.
2. The value in the PC is incremented.
3. As a result of a memory read operation, the instruction is loaded into
the MDR.
4. The contents of the MDR are loaded into the IR.

13
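These four register transfers can be mirrored in a few lines of Python. A minimal sketch (the memory contents and variable names are illustrative only):

memory = {100: "ADD R1, R2, R0"}   # address -> instruction (illustrative)
pc = 100
mar = pc                # 1. contents of PC loaded into MAR
pc = pc + 1             # 2. value in PC is incremented
mdr = memory[mar]       # 3. memory read loads the instruction into MDR
ir = mdr                # 4. contents of MDR loaded into IR
print(ir)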
Execution Cycle : Execute Simple Arithmetic
Operation
Add R1, R2, R0

This instruction adds the contents of source registers R1 and R2, and stores
the results in destination register R0. This addition can be executed as follows:
1. The register identifiers R0, R1, and R2 are extracted from the IR.
2. The contents of R1 and R2 are passed to the ALU for addition.

3. The output of the ALU is transferred to R0.

14
Instruction format

> Instruction typically has

— Opcode field: a set of bits telling which instruction this is

— Address fields: memory or register addresses where the operands will be found.

Opcode Address of operands

15
Example of Program Execution (1)

> Instruction form for hypothetical machine


— 4-bit opcode field - 16 instruction types
— 12-bit address field- 4096 memory cells
Opcode Address
0110 001101010011

Opcode Name Effect


1 (0001) Load Load AC from memory
2 (0010) Store Store AC to memory
5 (0101) Add Add to AC from memory

16
Control Unit operation

> All computer operations are controlled by the control unit.

> The timing signals that govern the I/O transfers are also
generated by the control unit.

> On a regular processor, the control unit performs the tasks


of fetching, decoding, managing execution and then storing
results.

[Diagram: the Control Unit drives both the Register File and the ALU]
18
Timing & Control (1)

> All sequential circuits in the Basic Computer CPU are driven by a
master clock.
> At each clock pulse, the control unit sends control signals to control
inputs of the bus, the registers, and the ALU.

[Timing diagram: each instruction cycle consists of a fetch cycle followed by an execution cycle, spanning several machine cycles]

19
Timing & Control (2)

20
Control unit design and implementation

> Control unit design and implementation can be done by two general methods:

— A hardwired control unit is designed from scratch using traditional digital logic design techniques to produce a minimal, optimized circuit. In other words, the control unit is like an ASIC (application-specific integrated circuit).

— A microprogrammed control unit is built from some sort of ROM. The desired control signals are simply stored in the ROM, and retrieved in sequence to drive the microoperations needed by a particular instruction.
21
Hardwired Control Organization

> This approach is to physically connect all of the control lines to the
actual machine instructions.

> The instructions are divided up into fields, and different bits in the
instruction are combined through various digital logic components to
drive the control-lines.

> The control unit is implemented using hardware (for example:


NAND gates, flip-flops, and counters).

22
Microprogrammed Control Organization

> A control unit whose binary control variables are stored in memory is called a microprogrammed control unit.

23
Example

24
CPU Performance

> There are various facets to the performance of a computer.

• A metric for assessing the performance of a computer helps in comparing alternative designs.
• Performance analysis should help answer questions such as: how fast can a given program be executed on a given computer?
• The clock cycle time is defined as the time between two consecutive rising edges of a periodic clock signal.
25
Performance Measures

> We denote the number of CPU clock cycles for executing a job to be
the Cycle Count (CC), the Cycle Time by CT, and the clock
frequency by f = 1 / CT.

> The time taken by the CPU to execute a job can be expressed as:

CPU time = CC * CT = CC / f

> It is easier to count the number of instructions executed in a given program than to count the number of CPU clock cycles needed to execute that program.

> Therefore, the average number of clock cycles per instruction (CPI) has been used as an alternative performance measure.
26
Performance Measures

> The following equation shows how to compute the CPI:

  CPI = Σi (CPIi × Ii) / Instruction count

  where Ii is the number of executed instructions of type i and CPIi is the cycle count for that type. When the mix is given as fractions Fi = Ii / Instruction count, this reduces to CPI = Σi (CPIi × Fi).
27
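These measures are straightforward to compute. A minimal Python sketch (the function names are ours; the sample mix is the one from Tutorial Question 2 later in this chapter):

def cpi(mix):                        # mix: list of (CPI_i, fraction_i) pairs
    return sum(c * f for c, f in mix)

def mips_rate(freq_hz, cpi_value):   # MIPS = f / (CPI * 10^6)
    return freq_hz / (cpi_value * 1e6)

mix = [(1, 0.60), (2, 0.18), (4, 0.12), (8, 0.10)]
c = cpi(mix)
print(c, mips_rate(400e6, c))        # 2.24, ~178.6 MIPS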
Performance Measures

A different performance measure that has been given a lot of attention in recent years is MIPS – million instructions per second (the rate of instruction execution per unit time) – defined as follows:

  MIPS = Instruction count / (Execution time × 10^6) = f / (CPI × 10^6)
28
Performance Measures

MFLOPS – million floating-point instructions per second (the rate of floating-point instruction execution per unit time) – has also been used as a measure of machine performance. MFLOPS is defined as follows:

  MFLOPS = Number of floating-point operations / (Execution time × 10^6)
29
Performance Measures

Speedup is a measure of how a machine performs after some enhancement relative to its original performance:

  Speedup = Execution time before enhancement / Execution time after enhancement

The following relationship formulates Amdahl’s law, where fraction is the portion of the computation that benefits from the enhancement and s is the speedup of that portion:

  Overall speedup = 1 / ((1 – fraction) + fraction / s)
30
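Amdahl’s law in Python (a minimal sketch; the example numbers are illustrative, not from the slides):

def amdahl(fraction, s):
    # fraction: portion of execution time enhanced; s: speedup of that portion
    return 1 / ((1 - fraction) + fraction / s)

print(amdahl(0.4, 10))   # 40% of the time sped up 10x -> overall ~1.56x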
Computer Organization & Design, CNE211_ 2018 Tutorial 2 Dr. Jamel Baili

Question1: Assume the execution of the following three consecutive instructions on a CPU:
Load [520]
ADD [521]
Store [521]
If the load instruction is in memory location [400], fill each of the Program Counter (PC), Memory,
Accumulator (AC), and Instruction Register (IR) in each step.

1/5
Ans.

Question 2:
Consider the execution of a program which results in the execution of 2 million instructions on a 400MHz
processor. The program consists of four major types of instructions. The instruction mix and the CPI for each
instruction type are given below based on the result of a program trace experiment:
Instruction Type                   CPI   Instruction Mix
Arithmetic and logic                1    60%
Load/store with cache hit           2    18%
Branch                              4    12%
Memory reference with cache miss    8    10%

What is the MIPS rate on this processor?

2/5
Ans.
CPI = Σ (CPIi × Fi) = 1 × 0.6 + 2 × 0.18 + 4 × 0.12 + 8 × 0.1 = 2.24

MIPS rate = f / (CPI × 10^6) = (400 × 10^6) / (2.24 × 10^6) ≈ 178.6 MIPS

Question3:
A benchmark program is run on a 40 MHz processor. The executed program consists of 100,000 instruction
executions, with the following instruction mix and clock cycle count:
Instruction Type Instruction Count Cycles per Instruction
Integer 45000 1
Data transfer 32000 2
Floating point 15000 2
Control transfer 8000 2
Determine the effective CPI, MIPS rate, and execution time for this program.
Ans.

CPI = Σ (CPIi × Ii) / Instruction count
    = (45000 × 1 + 32000 × 2 + 15000 × 2 + 8000 × 2) / 100000 = 1.55

MIPS rate = f / (CPI × 10^6) = (40 × 10^6) / (1.55 × 10^6) ≈ 25.8 MIPS

Execution time = (Instruction count × CPI) / f = (100000 × 1.55) / (40 × 10^6) ≈ 3.9 ms

Question 4:
Consider two different machines, with two different instruction sets, both of which have a clock rate of 200
MHz. The following measurements are recorded on the two machines running a given set of benchmark
programs:
Instruction Type       Instruction Count (millions)       Cycles per Instruction
Machine A
Arithmetic and logic 4 1
Load and store 8 3
Branch 2 4
Others 4 3
Machine B
Arithmetic and logic 10 1
Load and store 8 3
Branch 2 4
Others 4 3
a) Determine the effective CPI, MIPS rate, and execution time for each machine.
b) Comment on the results.
Ans. (a)

3/5
Machine A:
CPIA = Σ (CPIi × Ii) / Instruction count = (4×1 + 8×3 + 2×4 + 4×3) / (4 + 8 + 2 + 4) = 48/18 ≈ 2.67
MIPSA = f / (CPIA × 10^6) = (200 × 10^6) / (2.67 × 10^6) ≈ 75
CPU timeA = (18 × 10^6 × 2.67) / (200 × 10^6) = 0.24 s

Machine B:
CPIB = Σ (CPIi × Ii) / Instruction count = (10×1 + 8×3 + 2×4 + 4×3) / (10 + 8 + 2 + 4) = 54/24 = 2.25
MIPSB = f / (CPIB × 10^6) = (200 × 10^6) / (2.25 × 10^6) ≈ 88.9
CPU timeB = (24 × 10^6 × 2.25) / (200 × 10^6) = 0.27 s

(b) Although machine B has a higher MIPS than machine A, it requires a longer CPU time to execute the
same set of benchmark programs. (Different instruction count and instruction sets).

Question 5:
Consider the execution of a program which results in the execution of 2 million instructions on a 400-MHz
processor. The program consists of four major types of instructions. The instruction mix and the CPI for each
instruction type are given below based on the result of a program trace experiment:
Instruction Type CPI Instruction Mix

Arithmetic and logic 1 50%


Load/store with cache hit 5 20%
Branch 3 10%
Memory reference with cache miss 2 20%
a) What is the MIPS rate on this processor?
b) If a CPU design enhancement improves the CPI of (Load/store with cache hit instructions) from 5 to
2, what is the resulting performance improvement from this enhancement (speedup)?
Ans.
Before improvement:
CPI = Σ (CPIi × Fi) = 1 × 0.5 + 5 × 0.2 + 3 × 0.1 + 2 × 0.2 = 2.2

MIPS rate = f / (CPI × 10^6) = (400 × 10^6) / (2.2 × 10^6) ≈ 181.8 MIPS

CPU time = (Instruction count × CPI) / f = (2 × 10^6 × 2.2) / (400 × 10^6) = 0.011 s

After improvement:

4/5
CPI = Σ (CPIi × Fi) = 1 × 0.5 + 2 × 0.2 + 3 × 0.1 + 2 × 0.2 = 1.6

CPU time = (2 × 10^6 × 1.6) / (400 × 10^6) = 0.008 s

Speedup = CPU time before / CPU time after = 0.011 / 0.008 = 1.375

Question 6:
Computer A has an overall CPI of 1.5 and can be run at a clock rate of 800MHz. Computer B has a CPI of 3
and can be run at a clock rate of 900 Mhz. We have a particular program we wish to run. When compiled for
computer A, this program has exactly 150,000 instructions.
a) How many instructions would the program need to have when compiled for Computer B, in order for the
two computers to have exactly the same execution time for this program?
b) If a CPU design enhancement improves the CPI of Computer A so that the speed up is 1.5, what is the new
CPI?
c) If a CPU design enhancement improves the CPI of Computer B to be 2.5, what is the speed up?

Answer Key

a) (CPU time)A = (Instruction count)A × (CPI)A × (Clock cycle time)A
              = (150,000 × 1.5) / (800 × 10^6)
   (CPU time)B = (Instruction count)B × (CPI)B × (Clock cycle time)B
              = (Instruction count)B × 3 / (900 × 10^6)
   Since (CPU time)A = (CPU time)B, we solve for (Instruction count)B and get 84,375.


b) Speedup = (CPU time)old / (CPU time)new = (CPI)old / (CPI)new (same clock rate and instruction count)
   1.5 = 1.5 / (CPI)new ⇨ (CPI)new = 1.5 / 1.5 = 1.0

c) Speedup = (CPI)old / (CPI)new = 3 / 2.5 = 1.2

5/5
College of Computer Science
Department of Computer Science

Chapter 4: Memory System

Course CSM344 : Computer Organization


Computer Organization

Contents

> Computer memory system overview


> Types of memory
> Hierarchical memory systems
> Cache memory
> Memory Management Hardware

2
Computer Organization

Main memory

> Main memory is the second major subsystem in a computer (after the CPU).

> Holds instructions and data needed for programs that are currently running.

> Consists of a collection of storage locations, each with a unique identifier, called an address.

> Data is transferred to and from memory in groups of bits called words. A word can be a group of 8 bits, 16 bits, 32 bits or 64 bits (and growing). If the word is 8 bits, it is referred to as a byte.

3
Computer Organization

Main memory (2)

4
Computer Organization

Memory units

Unit        Number of bytes
---------   ---------------
kilobyte    2^10 bytes
megabyte    2^20 bytes
gigabyte    2^30 bytes
terabyte    2^40 bytes
petabyte    2^50 bytes
exabyte     2^60 bytes
5
Computer Organization

Example 1

A computer has 32 MB (megabytes) of memory. How many bits are needed to address any single byte in memory?

Solution
The memory address space is 32 MB, or 2^25 (2^5 × 2^20) bytes. This means that we need log2 2^25, or 25 bits, to address each byte.

6
Computer Organization

Example 2

A computer has 128 MB of memory. Each word in this computer is eight bytes. How many bits are needed to address any single word in memory?

Solution
The memory address space is 128 MB, which means 2^27 bytes. However, each word is eight (2^3) bytes, which means that we have 2^24 words. This means that we need log2 2^24, or 24 bits, to address each word.

7
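Both examples follow the same pattern: the number of address bits is log2 of the number of addressable units. A minimal Python sketch (the function name is ours):

import math

def address_bits(mem_bytes, word_bytes=1):
    return int(math.log2(mem_bytes // word_bytes))

print(address_bits(32 * 2**20))        # Example 1: 25 bits per byte address
print(address_bits(128 * 2**20, 8))    # Example 2: 24 bits per word address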
Memory types

> Most of the main memory in a general-purpose computer is made up of RAM integrated circuit chips, but a portion of the memory may be constructed with ROM chips.

> RAM – Random Access Memory
— Integrated RAM chips are available in two possible operating modes, static and dynamic.

> ROM – Read Only Memory
Random-Access Memory (RAM)

> Static RAM (SRAM)
— Each cell stores a bit with a six-transistor circuit.
— Retains its value indefinitely, as long as it is kept powered.
— Relatively insensitive to disturbances such as electrical noise.
— Faster (8-16 times faster) and more expensive (8-16 times more expensive as well) than DRAM.

> Dynamic RAM (DRAM)
— Each cell stores a bit with a capacitor and a transistor.
— Value must be refreshed every 10-100 ms.
— Sensitive to disturbances.
— Slower and cheaper than SRAM.
SRAM vs DRAM Summary

        Tran.     Access
        per bit   time     Persist?   Sensitive?   Cost   Applications
SRAM    6         1X       Yes        No           100X   Cache memories
DRAM    1         10X      No         Yes          1X     Main memories, frame buffers

> Virtually all desktop or server computers since 1975 have used DRAM for main memory and SRAM for cache.
ROM (Read Only Memory)

> ROM is used for storing programs that are PERMANENTLY resident in the computer, and for tables of constants whose values do not change once production of the computer is completed.

> The ROM portion of main memory is needed for storing an initial program called the bootstrap loader, which starts the computer operating software when power is turned on.
Main Memory

> A RAM chip is better suited for communication with the CPU if it has one or more control inputs that select the chip when needed.

> The block diagram of a RAM chip is shown on the next slide; the capacity of the memory is 128 words of 8 bits (one byte) per word.

[Block diagrams of the RAM and ROM chips]
Memory Address Map

> A Memory Address Map is a pictorial representation of the assigned address space for each chip in the system.

> To demonstrate an example, assume that a computer system needs 512 bytes of RAM and 512 bytes of ROM.

> Each RAM chip has 128 bytes and needs seven address lines; the ROM has 512 bytes and needs nine address lines.
Memory Address Map
Memory Address Map

> The hexadecimal address column assigns a range of hexadecimal equivalent addresses to each chip.

> Lines 8 and 9 represent four distinct binary combinations that specify which of the four RAM chips is selected.

> When line 10 is 0, the CPU selects a RAM chip; when it is 1, it selects the ROM.
Computer Organization

Memory Hierarchy

> Generally speaking, faster memory is more expensive than slower memory.

> To provide the best performance at the lowest cost, memory is organized in a hierarchical fashion.

> Small, fast storage elements are kept in the CPU; larger, slower main memory is accessed through the data bus.

> Larger, (almost) permanent storage in the form of disk and tape drives is still further from the CPU.

19
Computer Organization

Memory Hierarchy

> This storage organization can be thought of as a pyramid:

20
Computer Organization

Memory Hierarchy

> To access a particular piece of data, the CPU first sends a request to its nearest memory, usually cache.

> If the data is not in cache, then main memory is queried. If the data is not in main memory, then the request goes to disk.

> Once the data is located, the data and a number of its nearby data elements are fetched into cache memory.

21
Computer Organization

Memory cache

> The purpose of cache memory is to speed up accesses by storing recently used data closer to the CPU, instead of storing it in main memory.

> Although cache is much smaller than main memory, its access time is a fraction of that of main memory.

> Unlike main memory, which is accessed by address, cache is typically accessed by content; hence, it is often called content-addressable memory.

> Because of this, a single large cache memory isn't always desirable: it takes longer to search.
22
Cache memory

> If the active portions of the program and data are placed in a fast small memory, the average memory access time can be reduced, thus reducing the total execution time of the program. Such a fast small memory is referred to as cache memory.

> The cache is the fastest component in the memory hierarchy and approaches the speed of the CPU.
Cache memory

> When the CPU needs to access memory, the cache is examined.

> If the word is found in the cache, it is read from the fast memory.

> If the word addressed by the CPU is not found in the cache, the main memory is accessed to read the word.
Cache memory

> When the CPU refers to memory and finds the word in cache, it is said to produce a hit. Otherwise, it is a miss.

> The performance of cache memory is frequently measured in terms of a quantity called the hit ratio:

> Hit ratio = hits / (hits + misses)

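A minimal Python sketch of the hit ratio and its effect on average access time (the access counts and the 2 ns / 50 ns timings below are illustrative assumptions, not from the slides):

hits, misses = 950, 50
hit_ratio = hits / (hits + misses)            # 0.95
t_avg = hit_ratio * 2 + (1 - hit_ratio) * 50  # assumed cache 2 ns, memory 50 ns
print(hit_ratio, t_avg)                       # 0.95, 4.4 ns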
Cache memory

> The basic characteristic of cache memory is its fast access time.

> Therefore, very little or no time must be wasted when searching for words in the cache.

> The transfer of data from main memory to cache memory is referred to as a mapping process; there are three types of mapping:
— Associative mapping
— Direct mapping
— Set-associative mapping
Mapping memory to cache

Cache mapping options – 8 cache slots (0–7), main memory blocks 0–31.

Direct mapped: memory block 19 maps to slot 19 mod 8 = slot 3.


Mapping memory to cache

Cache mapping options – 8 cache slots.

Direct mapped: memory blocks 3, 11, 19, and 27 all map to cache slot 3.


Mapping memory to cache

Cache mapping options – 8 cache slots, organized as 4 sets (0–3) of 2 slots each.

2-way set associative: memory block 19 maps to set 19 mod 4 = set 3 (slots 6 or 7). Memory blocks 3, 7, 11, 15, 19, 23, 27, 31 all map to set 3.


Mapping memory to cache

Cache mapping options – 8 cache slots.

Fully associative: there is a single set containing all slots (19 mod 1 = 0), so any memory block can map anywhere in the cache.
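The three placement rules differ only in the divisor of the modulo. A minimal Python sketch (variable names are ours):

block, slots, ways = 19, 8, 2
print(block % slots)             # direct mapped: slot 3
print(block % (slots // ways))   # 2-way set associative: set 3 (slots 6 or 7)
print(block % 1)                 # fully associative: one set (0), any slot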
Mapping Function
• Cache of 64 KByte
• Cache block of 4 bytes
— i.e. cache is 16K (2^14) lines of 4 bytes
• 16 MBytes main memory
• 24-bit address (2^24 = 16M)
Direct Mapping
• Each block of main memory maps to only one cache line
— i.e. if a block is in cache, it must be in one specific place
• The address is in two parts
• The least significant w bits identify a unique word
• The most significant s bits specify one memory block
• The MSBs are split into a cache line field r and a tag of s – r bits (most significant)
Direct Mapping
Address Structure

Tag (s – r = 8 bits) | Line or Slot (r = 14 bits) | Word (w = 2 bits)

• 24 bit address
• 2 bit word identifier (4 byte block)
• 22 bit block identifier
— 8 bit tag (=22-14)
— 14 bit slot or line
• No two blocks in the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
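Splitting a 24-bit address into these fields is a matter of shifts and masks. A minimal Python sketch (the sample address is arbitrary):

addr = 0x16339C              # an arbitrary 24-bit address
word = addr & 0x3            # low 2 bits: word within the block
line = (addr >> 2) & 0x3FFF  # next 14 bits: cache line
tag  = addr >> 16            # top 8 bits: tag
print(tag, line, word)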
Direct Mapping from Cache to Main Memory
Direct Mapping Cache Organization
Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of tag = (s – r) bits
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
—If a program accesses 2 blocks that map to
the same line repeatedly, cache misses are
very high
Associative Mapping
• A main memory block can load into any
line of cache
• Memory address is interpreted as tag and
word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive
Associative Mapping from
Cache to Main Memory
College of Computer Science

Department of Computer Science

Chapter 5: Input/output Organization

Course CSM 344 : Computer Organization

1439/1440
Computer Organization

Contents

> I/O Architectures


> I/O Controller function
> Data Transfer Techniques
> Programmed I/O
> Interrupt,
> DMA channel
> I/O port addresses
> UART Communications
2
External Devices

> Human readable


— Screen, printer, keyboard
> Machine readable
— Monitoring and control
> Communication
— Modem
— Network Interface Card (NIC)

3
Accessing Input/Output Devices

> Wide variety of peripherals


— Delivering different amounts of data
— At different speeds
— In different formats
> All slower than CPU and RAM
> Need I/O modules

4
I/O Module Function

> Control & Timing


> CPU Communication
> Device Communication
> Data Buffering
> Error Detection

5
I/O Module Diagram

6
Example of I/O interface for Input device

7
Data Transfer Techniques

> Several ways of transferring data


— Programmed I/O
– Program uses a busy-wait loop
– Anticipated transfer
— Interrupt-driven I/O
– Interrupts are used to initiate and/or terminate data transfers
– Powerful technique
– Handles unanticipated transfers
— Direct memory access (DMA)
– Special controller (DMA controller) handles data transfers
– Typically used for bulk data transfer
8
Three Techniques for transfer of a Block of Data

9
Programmed I/O

> CPU has direct control over I/O


— Sensing status
— Read/write commands
— Transferring data
> CPU waits for I/O module to complete operation
> Wastes CPU time

10
Programmed I/O - detail

1. CPU requests I/O operation


2. I/O module performs operation
3. I/O module sets status bits
4. CPU checks status bits periodically
5. I/O module does not inform CPU directly
6. I/O module does not interrupt CPU
7. CPU may wait or come back later

11
I/O Commands

> CPU issues address


— Identifies module (& device if >1 per module)
> CPU issues command
— Control - telling module what to do
– e.g. spin up disk
— Test - check status
– e.g. power? Error?
— Read/Write
– Module transfers data via buffer from/to device

12
Computer Organization

I/O port addresses

> Under programmed I/O data transfer is very like memory access
(CPU viewpoint)

> Each I/O device connected to your computer is mapped to a unique


I/O (Input/Output) address. These addresses are assigned to every
I/O port on your computer, including USB, Ethernet, VGA,
and DVI ports, as well as any other ports your computer might have.

> Having a unique address assigned to each port allows your


computer to easily recognize and locate devices attached to your
computer.

> Whether it is a keyboard, mouse, monitor, printer, or any other


device, the computer can locate it by its I/O address.

13
I/O Mapping

> Memory mapped I/O


— Devices and memory share an address space
— I/O looks just like memory read/write
— No special commands for I/O
– Large selection of memory access commands available
> Isolated I/O
— Separate address spaces
— Need I/O or memory select lines
— Special commands for I/O
– Limited set

14
Memory Mapped and Isolated I/O

15
Interrupt Driven I/O

> Overcomes CPU waiting


> No repeated CPU checking of device
> I/O module interrupts when ready

16
Interrupt Driven I/O Basic Operation

> CPU issues read command


> I/O module gets data from peripheral whilst CPU does
other work
> I/O module interrupts CPU
> CPU requests data
> I/O module transfers data

17
Simple Interrupt
Processing

18
CPU Viewpoint

> Issue read command


> Do other work
> Check for interrupt at end of each instruction cycle
> If interrupted:-
— Save context (registers)
— Process interrupt
– Fetch data & store
> See Operating Systems notes

19
Changes in Memory and Registers
for an Interrupt

20
Direct Memory Access

> Interrupt driven and programmed I/O require active CPU


intervention
— Transfer rate is limited
— CPU is tied up
> DMA (Direct Memory Access) is the solution.

21
DMA Function

> Additional Module (hardware) on bus


> DMA controller takes over from CPU for I/O

22
Typical DMA Module Diagram

23
DMA Operation

1. CPU tells DMA controller:-


— Read/Write
— Device address
— Starting address of memory block for data
— Amount of data to be transferred
2. CPU carries on with other work
3. DMA controller deals with transfer
4. DMA controller sends interrupt when finished

24
DMA Transfer Cycle Stealing

> DMA controller takes over bus for a cycle


> Transfer of one word of data
> Not an interrupt
— CPU does not switch context
> CPU suspended just before it accesses bus
— i.e. before an operand or data fetch or a data write
> Slows down CPU but not as much as CPU doing
transfer

25
DMA and Interrupt Breakpoints During an
Instruction Cycle

26
Computer Organization

Serial interface – universal asynchronous receiver transmitter (UART)

RS232 standard:
• 3 wires (GND, TX, RX)
• +10 V = '0' = SPACE
• –10 V = '1' = MARK

[Diagram: two RS232 ports (UARTs) cross-connected – pin 2 to pin 3, pin 3 to pin 2, pin 5 to pin 5]
27
Computer Organization

UART for data communication

> Serial data transmission means sending data bits one by one over a single wire.
> Asynchronous transmission means a frame (including one start bit, 8 data bits, and stop bits) can be sent at any time.
> RS232 is a serial communication standard.
> Since it is asynchronous, no external clock is needed; only 3 wires are required for the simplest RS232 connection: {GND, tx (transmit), rx (receive)}.

28
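The framing described above can be sketched in a few lines of Python (a minimal sketch assuming one stop bit and LSB-first data order, which is the usual UART convention):

def uart_frame(byte):
    data = [(byte >> i) & 1 for i in range(8)]   # 8 data bits, LSB first
    return [0] + data + [1]                      # start bit = 0, stop bit = 1

print(uart_frame(0x41))   # framing of ASCII 'A'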
College of Computer Science
Department of Computer Science

Chapter 6: Advanced Computers

2018-2019
Contents

 Parallel Computers
 Pipelining Concepts
 Characteristics of RISC and CISC machines

1
What is Parallel Computing? (1)

 Traditionally, software has been written for serial


computation:
 To be run on a single computer having a single Central
Processing Unit (CPU);
 A problem is broken into a discrete series of
instructions.
 Instructions are executed one after another.
 Only one instruction may execute at any moment in time.

2
What is Parallel Computing? (2)

 In the simplest sense, parallel computing is the simultaneous use of


multiple compute resources to solve a computational problem.
 To be run using multiple CPUs
 A problem is broken into discrete parts that can be solved concurrently
 Each part is further broken down to a series of instructions
 Instructions from each part execute simultaneously on different
CPUs

3
Parallel Computing: Resources

 The compute resources can include:


 A single computer with multiple processors;
 A single computer with (multiple) processor(s) and some
specialized computer resources (GPU, FPGA …)
 An arbitrary number of computers connected by a
network;
 A combination of both.

4
Parallel Computing: The computational problem

 The computational problem usually demonstrates


characteristics such as the ability to be:
 Broken apart into discrete pieces of work that can be
solved simultaneously;
 Execute multiple program instructions at any moment in
time;
 Solved in less time with multiple compute resources than
with a single compute resource.

5
Parallel Computing: what for? (1)

 Parallel computing is an evolution of serial computing that


attempts to emulate what has always been the state of
affairs in the natural world: many complex, interrelated
events happening at the same time, yet within a sequence.
 Some examples:
 Weather and ocean patterns
 Tectonic plate drift
 Automobile assembly line
 ……

6
Pipelining versus Serial Execution: Pipelining Example

 Laundry Example: Three Stages

1. Wash dirty load of clothes

2. Dry wet clothes

3. Fold and put clothes into drawers

 Each stage takes 30 minutes to complete

 Four loads of clothes to wash, dry, and fold



7
Sequential Laundry

[Timeline: 6 PM to midnight in 30-minute intervals]

 Sequential laundry takes 6 hours for 4 loads


 Intuitively, we can use pipelining to speed up laundry

8
Pipelined Laundry: Start Load ASAP

[Timeline: 6 PM to 9 PM – loads A, B, C, D overlap in 30-minute stages]

 Pipelined laundry takes 3 hours for 4 loads
 Speedup factor is 2 for 4 loads
 Time to wash, dry, and fold one load is still the same (90 minutes)

9
Serial Execution versus Pipelining
 Consider a task that can be divided into k subtasks
 The k subtasks are executed on k different stages
 Each subtask requires one time unit
 The total execution time of the task is k time units
 Pipelining is to overlap the execution
 The k stages work in parallel on k different tasks
 Tasks enter/leave pipeline at the rate of one task per
time unit

Without pipelining: [1 2 … k][1 2 … k][1 2 … k] – tasks run back to back, one completion every k time units.
With pipelining: the [1 2 … k] executions overlap, staggered by one time unit – one completion every time unit.

10
Synchronous Pipeline
 Uses clocked registers between stages
 Upon arrival of a clock edge …
 All registers hold the results of previous stages
simultaneously
 The pipeline stages are combinational logic circuits
 It is desirable to have balanced stages
 Approximately equal delay in all stages
 Clock period is determined by the maximum stage delay
[Diagram: Input → S1 → register → S2 → register → … → Sk → Output, with all inter-stage registers driven by a common clock]
11
Pipeline Performance

 Let τi = time delay in stage Si
 Clock cycle τ = max(τi) is the maximum stage delay
 Clock frequency f = 1/τ = 1/max(τi)
 A pipeline can process n tasks in k + n – 1 cycles
  k cycles are needed to complete the first task
  n – 1 cycles are needed to complete the remaining n – 1 tasks
 Ideal speedup of a k-stage pipeline over serial execution:

  Sk = Serial execution in cycles / Pipelined execution in cycles = nk / (k + n – 1)

  Sk → k for large n

12
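The speedup formula is easy to check numerically. A minimal Python sketch (the function name is ours; the 3-stage, 4-task case is the laundry example from earlier):

def pipeline_speedup(k, n):          # k stages, n tasks
    return (n * k) / (k + n - 1)

print(pipeline_speedup(3, 4))        # laundry example: 2.0
print(pipeline_speedup(3, 1000))     # approaches k = 3 for large n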
MIPS Processor Pipeline

 Five stages, one cycle per stage


1. IF: Instruction Fetch from instruction memory

2. ID: Instruction Decode, register read

3. EX: Execute operation, calculate load/store address or


J/Br address

4. MEM: Memory access for load and store

5. WB: Write Back result to register

13
Single-Cycle vs Pipelined Performance

 Consider a 5-stage instruction execution in which …


 Instruction fetch = ALU operation = Data memory access =
200 ps
 Register read = register write = 150 ps
 What is the clock cycle of the single-cycle processor?
 What is the clock cycle of the pipelined processor?
 What is the speedup factor of pipelined execution?
 Solution
  Single-cycle clock = 200 + 150 + 200 + 200 + 150 = 900 ps

[Diagram: each instruction (IF, Reg, ALU, MEM, Reg) takes 900 ps, and the next one starts only after the previous finishes]
14
Single-Cycle versus Pipelined – cont’d
 Pipelined clock cycle = max(200, 150) = 200 ps

[Diagram: instructions overlap in the pipeline, a new one starting every 200 ps]

 CPI for pipelined execution = 1


 One instruction completes each cycle (ignoring pipeline
fill)
 Speedup of pipelined execution = 900 ps / 200 ps = 4.5
 Instruction count and CPI are equal in both cases
 Speedup factor is less than 5 (number of pipeline stage)
 Because the pipeline stages are not balanced

15
Pipeline Performance Summary
 Pipelining doesn’t improve latency of a single instruction
 However, it improves throughput of entire workload
 Instructions are initiated and completed at a higher
rate
 In a k-stage pipeline, k instructions operate in parallel
 Overlapped execution using multiple hardware resources
 Potential speedup = number of pipeline stages k
  Unbalanced lengths of pipeline stages reduce speedup
  Pipeline rate is limited by the slowest pipeline stage
  Also, time to fill and drain the pipeline reduces speedup

16
Single-Cycle Datapath
 Shown below is the single-cycle datapath
 How to pipeline this single-cycle datapath?
Answer: Introduce pipeline registers at end of each stage

IF = Instruction Fetch | ID = Instruction Decode & Register Read | EX = Execute | MEM = Memory Access | WB = Write Back

[Figure: single-cycle datapath – PC and next-PC logic (branch target address, jump target = PC[31:28] ǁ Imm26), instruction memory, register file, immediate extender, ALU, and data memory, with control signals PCSrc, RegDst, RegWr, ExtOp, ALUSrc, ALUOp, MemRd, MemWr, WBdata]
17
Pipelined Datapath
 Pipeline registers are shown in green, including the PC
 Same clock edge updates all pipeline registers and PC
 In addition to updating register file and data memory (for
store)
IF = Instruction Fetch | ID = Instruction Decode & Register Read | EX = Execute | MEM = Memory Access | WB = Write Back

[Figure: the same datapath with pipeline registers (shown in green) between stages – the PC, the fetched instruction (Inst), the ID/EX registers A, B, Imm, and NPC, the EX/MEM registers (ALU result, B, BTA), and the MEM/WB register D – with the same control signals as before]
18
Characteristics of RISC and CISC machines

19