Digital systems are used extensively in computation and data processing, control
systems, communications, and measurement. Because digital systems are capable of
greater accuracy and reliability than analog systems, many tasks formerly done by
analog systems are now being performed digitally.
In a digital system, the physical quantities or signals can assume only discrete
values, while in analog systems the physical quantities or signals may vary continuously
over a specified range. For example, the output voltage of a digital system might be
constrained to take on only two values such as 0 volts and 5 volts,
while the output voltage from an analog system might be allowed to assume any
value in the range -10 volts to +10 volts.
Because digital systems work with discrete quantities, in many cases they can be
designed so that for a given input, the output is exactly correct. For example, if we
multiply two 5-digit numbers using a digital multiplier, the 10-digit product will be
correct in all 10 digits. On the other hand, the output of an analog multiplier might
have an error ranging from a fraction of one percent to a few percent depending
on the accuracy of the components used in construction of the multiplier.
Switching circuit
Many of a digital system’s subsystems take the form of a switching circuit. A switching circuit has one
or more inputs and one or more outputs which take on discrete values.
Decimal Notation
$953.78_{10} = 9 \times 10^{2} + 5 \times 10^{1} + 3 \times 10^{0} + 7 \times 10^{-1} + 8 \times 10^{-2}$
Binary
$1011.11_{2} = 1 \times 2^{3} + 0 \times 2^{2} + 1 \times 2^{1} + 1 \times 2^{0} + 1 \times 2^{-1} + 1 \times 2^{-2}$
$= 8 + 0 + 2 + 1 + 1/2 + 1/4$
$= 11.75_{10}$
EXAMPLE: Convert $53_{10}$ to binary.
Conversion (a)
EXAMPLE: Convert $0.625_{10}$ to binary.
Conversion (b)
EXAMPLE: Convert $0.7_{10}$ to binary.
Conversion (c)
EXAMPLE: Convert $231.3_{4}$ to base 7.
Conversion (d)
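The worked conversions for these four examples did not survive in the slides, so the following is only a minimal sketch of the usual method: repeated division for the integer part and repeated multiplication for the fraction. The helper names `int_to_base`, `frac_to_base`, and `parse` are illustrative and not taken from the slides.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"

def int_to_base(n, base):
    """Convert a non-negative integer by repeated division, collecting remainders."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))

def frac_to_base(f, base, max_digits=8):
    """Convert a fraction 0 <= f < 1 by repeated multiplication, collecting integer parts."""
    f = Fraction(f)
    out = []
    while f != 0 and len(out) < max_digits:
        f *= base
        d = int(f)
        out.append(DIGITS[d])
        f -= d
    return "".join(out)

def parse(text, base):
    """Read a string such as '231.3' in the given base into an exact Fraction."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, base)) if whole else Fraction(0)
    for i, ch in enumerate(frac, start=1):
        value += Fraction(DIGITS.index(ch.upper()), base ** i)
    return value

print(int_to_base(53, 2))                # 110101
print(frac_to_base(Fraction(5, 8), 2))   # 101, i.e. 0.101
print(frac_to_base(Fraction(7, 10), 2))  # 10110011, the pattern 0110 repeats forever
v = parse("231.3", 4)                    # 45 + 3/4 in decimal
print(int_to_base(int(v), 7) + "." + frac_to_base(v - int(v), 7, 4))  # 63.5151
```

Exact fractions are used instead of floats so that a repeating expansion such as 0.7 is truncated rather than polluted by rounding error.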
Binary Hexadecimal
Conversion
Equation (1-1)
Conversion from binary to hexadecimal (and conversely) can
be done by inspection because each hexadecimal digit
corresponds to exactly four binary digits (bits).
Add $13_{10}$ and $11_{10}$ in binary.
Addition
Subtraction (a)
The subtraction table for binary numbers is
0–0=0
0–1=1 and borrow 1 from the next column
1–0=1
1–1=0
Subtraction (c)
Multiplication (a)
The multiplication table for binary numbers is
0x0=0
0x1=0
1x0=0
1x1=1
The following example illustrates
multiplication of $13_{10}$ by $11_{10}$ in binary:
Multiplication (b)
When doing binary multiplication, a common way to avoid carries greater than 1
is to add in the partial products one at a time as illustrated by the following
example:
     1111    multiplicand
     1101    multiplier
     1111    1st partial product
    0000     2nd partial product
   (01111)   sum of first two partial products
   1111      3rd partial product
 (1001011)   sum after adding 3rd partial product
  1111       4th partial product
 11000011    final product (sum after adding 4th partial product)
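The partial-product scheme can be sketched in a few lines. This is an illustration only: `multiply_binary` is a hypothetical helper, and it leans on Python integers for the additions rather than modelling carries bit by bit.

```python
def multiply_binary(multiplicand, multiplier):
    """Multiply two binary strings, adding one shifted partial product at a time."""
    total = 0
    running_sums = []
    # Walk the multiplier from its least significant bit upward.
    for shift, bit in enumerate(reversed(multiplier)):
        partial = (int(multiplicand, 2) << shift) if bit == "1" else 0
        total += partial
        running_sums.append(format(total, "b"))
    return format(total, "b"), running_sums

product, running_sums = multiply_binary("1111", "1101")
print(product)        # 11000011
print(running_sums)   # ['1111', '1111', '1001011', '11000011']
```

The running sums match the intermediate rows of the worked example, apart from leading zeros.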
Multiplication (c)
Binary Division
Binary division is similar to decimal division, except it
is much easier because the only two possible quotient
digits are 0 and 1.
We start division by comparing the divisor with the
upper bits of the dividend.
If we cannot subtract without getting a negative result,
we move one place to the right and try again.
If we can subtract, we place a 1 for the quotient above
the number we subtracted from and append the next
dividend bit to the end of the difference and repeat this
process with this modified difference until we run out
of bits in the dividend.
The following example illustrates
division of $145_{10}$ by $11_{10}$ in binary:
Binary Division
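The worked division is not reproduced in the slide text. As a rough sketch of the procedure described above (bring down one dividend bit at a time, subtract the divisor whenever it fits), assuming a hypothetical helper `divide_binary`:

```python
def divide_binary(dividend, divisor):
    """Binary long division on bit strings; returns (quotient, remainder) as bit strings."""
    d = int(divisor, 2)
    if d == 0:
        raise ZeroDivisionError("divisor is zero")
    remainder = 0
    quotient_bits = []
    for bit in dividend:
        # Append the next dividend bit to the current difference.
        remainder = (remainder << 1) | int(bit)
        if remainder >= d:
            remainder -= d              # we can subtract: quotient bit is 1
            quotient_bits.append("1")
        else:
            quotient_bits.append("0")   # cannot subtract: quotient bit is 0
    quotient = "".join(quotient_bits).lstrip("0") or "0"
    return quotient, format(remainder, "b")

# 145 = 10010001 and 11 = 1011 in binary
print(divide_binary("10010001", "1011"))   # ('1101', '10'), i.e. quotient 13, remainder 2
```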
3 Systems for representing negative
numbers in binary
Sign & Magnitude: Most significant bit is the sign
Ex: $-5_{10} = 1101_{2}$
2’s Complement: $N^{*} = 2^{n} - N$
Ex: $-5_{10} = 2^{4} - 5 = 16 - 5 = 11_{10} = 1011_{2}$
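Both examples can be reproduced with a short sketch. The function names are illustrative, and range checking (for instance, that the magnitude fits in n - 1 bits) is deliberately left out.

```python
def sign_magnitude(value, n):
    """n-bit sign-and-magnitude pattern: sign bit followed by the magnitude."""
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), "b").zfill(n - 1)

def twos_complement(value, n):
    """n-bit 2's complement pattern; for a negative value this is 2**n - |value|."""
    return format(value % (1 << n), "b").zfill(n)

print(sign_magnitude(-5, 4))    # 1101
print(twos_complement(-5, 4))   # 1011
print(twos_complement(5, 4))    # 0101
```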
X' = 1 if X = 0
X' = 0 if X = 1
Section 2.2, p. 35
AND Gate
Note that C = 1 if
and only if A and B
are both 1.
Section 2.2, p. 36
OR Gate
Note that C = 1 if
and only if A or B (or
both) are 1.
Section 2.2, p. 36
Switches
If switch X is open, then we will define the
value of X to be 0; if switch X is closed, then
we will define the value of X to be 1.
Section 2.2, p. 36
T = AB
Section 2.2, p. 36
T = A+B
Section 2.2, p. 37
F = AB’ + C
F = [A(C + D)]’ + BE
Figure 2-1: Circuits for Expressions (2-1) and (2-2)
Figure 2-2(b) shows a truth table which specifies the output of the circuit in
Figure 2-2(a) for all possible combinations of values of the inputs A and B.
Idempotent laws
Laws of complementarity
Section 2.4, p. 39
Section 2.4, p. 40
A in parallel with A’ can be replaced with a closed circuit
because one or the other of the two switches is always
closed.
Similarly, switch A in series with A’ can be replaced with
an open circuit because one or the other of the two
switches is always open.
Commutative and Associative Laws
Many of the laws of ordinary algebra, such as commutative and
associative laws, also apply to Boolean algebra. The commutative
laws for AND and OR, which follow directly from the definitions of
the AND and OR operations, are
This means that the order in which the variables are written will not
affect the result of applying the AND and OR operations. The
associative laws also apply to AND and OR:
Section 2.6, p. 43
F = A(A’ + B)
By Theorem (2-14), (X + Y’)Y = XY, the expression F
simplifies to AB.
This expression has the same form as (2-13) if we let X = A′ and Y = BC.
Therefore, the expression simplifies to Z = X + XY = X =A′.
Example 2
Note that in this example we let Y = (AB + C)′ rather than (AB + C) in order
to match the form of (2-14D).
(2-17):
A + B′ + C + D′E
Figure 2-5: Circuits for Equations (2-15) and (2-17)
(2-18):
(A +B′)(C + D′ + E)(A + C′ + E′)
(2-20):
AB′C(D′ + E)
Idempotent laws:
3. X + X = X 3D. X • X = X
Involution law:
4. (X')' = X
Laws of complementarity:
5. X + X' = 1 5D. X • X' = 0
LAWS AND THEOREMS (b)
p. 55
Commutative laws:
6. X + Y = Y + X 6D. XY = YX
Associative laws:
7. (X + Y) + Z = X + (Y + Z) = X + Y + Z 7D. (XY)Z = X(YZ) = XYZ
Distributive laws:
8. X(Y + Z) = XY + XZ 8D. X + YZ = (X + Y)(X + Z)
Simplification theorems:
9. XY + XY' = X 9D. (X + Y)(X + Y') = X
10. X + XY = X 10D. X(X + Y) = X
11. (X + Y')Y = XY 11D. XY' + Y = X + Y
LAWS AND THEOREMS (c)
p. 55
DeMorgan's laws:
12. (X + Y + Z +...)' = X'Y'Z'... 12D. (XYZ...)' = X' + Y' + Z' +...
Duality:
13. (X + Y + Z +...)^D = XYZ... 13D. (XYZ...)^D = X + Y + Z +...
Consensus theorem:
15. XY + YZ + X'Z = XY + X'Z
15D. (X + Y)(Y + Z)(X' + Z) = (X + Y)(X' + Z)
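Because each variable takes only the values 0 and 1, any of these laws can be checked by exhaustive enumeration. The sketch below does this for a few of them; the `equivalent` helper and the dictionary of checks are illustrative and not part of the course material.

```python
from itertools import product

def equivalent(f, g, nvars):
    """True when f and g agree on every combination of 0/1 inputs."""
    return all(f(*v) == g(*v) for v in product((0, 1), repeat=nvars))

def NOT(x):
    return 1 - x

checks = {
    "8D distributive": (lambda x, y, z: x | (y & z),
                        lambda x, y, z: (x | y) & (x | z), 3),
    "11 simplification": (lambda x, y: (x | NOT(y)) & y,
                          lambda x, y: x & y, 2),
    "12 DeMorgan": (lambda x, y, z: NOT(x | y | z),
                    lambda x, y, z: NOT(x) & NOT(y) & NOT(z), 3),
    "15 consensus": (lambda x, y, z: (x & y) | (y & z) | (NOT(x) & z),
                     lambda x, y, z: (x & y) | (NOT(x) & z), 3),
}

for name, (lhs, rhs, n) in checks.items():
    print(name, equivalent(lhs, rhs, n))   # each line prints True
```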
Distributive Laws
Given an expression in product-of-sums form, the
corresponding sum-of-products expression can be
obtained by multiplying out, using the two distributive laws:
X(Y + Z) = XY + XZ (3-1)
(X + Y)(X + Z) = X + YZ (3-2)
Example (3-4), p. 63
The same theorems that are useful for multiplying out
expressions are useful for factoring. By repeatedly
applying (3-1), (3-2), and (3-3), any expression can be
converted to a product-of-sums form.
Exclusive-OR and Equivalence Operations
Section 3.2, p. 64
The following theorems apply to exclusive OR:
We will use the following
symbol for an
equivalence gate:
Section 3.2, p. 65
Because equivalence is the complement of exclusive-OR,
an alternate symbol of the equivalence gate is an
exclusive-OR gate with a complemented output:
Section 3.2, p. 66
Example 1:
Example 2:
Dual Form:
(X + Y)(X’ + Z)(Y + Z) = (X + Y)(X’ + Z) (3-21)
If y = z, then x + y = x + z (3-33)
If y = z, then xy = xz (3-34)
Course Objective
Textbooks
Contents
Introduction
Functional units of a computer
Basic Computer Organization
Information in a computer :Instructions, Data,…
Bus structures
Introduction: What is a computer?
Functional units of a computer
Input unit accepts information from:
•Human operators
•Electromechanical devices
•Other computers
Arithmetic and logic unit (ALU):
•Performs the desired operations on the input information as determined by instructions in the memory
[Figure: Input, Output, Memory (holding Instr1, Instr2, Instr3, Data1, Data2), Arithmetic & Logic, and Control units]
Basic Computer Organization
Simple Computer Organization - Memory Details
Information in a computer -- Instructions
Information in a computer -- Data
Input unit
Binary information must be presented to a computer in a specific format. This
task is performed by the input unit:
- Interfaces with input devices.
- Accepts binary information from the input devices.
- Presents this binary information in a format expected by the computer.
- Transfers this information to the memory or processor.
[Figure: real-world input devices such as a keyboard and audio input feed the Input Unit, which passes information to the computer's Memory and Processor]
Memory unit
Memory unit (contd..)
Processor reads/writes to/from memory based on the
memory address:
Access any word location in a short and fixed amount of time
based on the address.
Random Access Memory (RAM) provides fixed access time
independent of the location of the word.
Access time is known as “Memory Access Time”.
Memory and processor have to “communicate” with each
other in order to read/write information.
In order to reduce “communication time”, a small amount of
RAM (known as Cache) is tightly coupled with the processor.
Modern computers have three to four levels of RAM units with
different speeds and sizes:
Fastest, smallest known as Cache
Slowest, largest known as Main memory.
Memory unit (contd..)
Arithmetic and logic unit (ALU)
Output unit
•Computers represent information in a specific binary form. Output units:
- Interface with output devices.
- Accept processed results provided by the computer in specific binary form.
- Convert the information in binary form to a form understood by an
output device.
[Figure: the Output Unit sits between the computer's Memory and Processor and output devices such as a printer, graphics display, and speakers]
Control unit
How are the functional units connected?
[Figure: functional units connected by a Bus]
Organization of cache and main memory
[Figure: cache and main memory connected by a Bus]
Why is the access time of the cache memory less than the
access time of the main memory?
Bus structures
Data Bus
The address lines: Address Bus
The control lines: Control Bus
College of Computer Science
Department of Computer Science
Introduction, ISA
Data representations in Computer Systems
Instruction Set
Instruction formats
Instruction types
Addressing modes
Assembly language
Instruction Set Architecture (ISA): Introduction
Data representations in Computer Systems :Data types
Number systems
The binary system is also called the base-2 system. (101100.011)
Our decimal system is the base-10 system. It uses powers of 10 for
each position in a number. (975.3)
Any integer quantity can be represented exactly using any base (or
radix). (3077 octal or 2BAD hex)
-The decimal number 947 in powers of 10 is: $947 = 9 \times 10^{2} + 4 \times 10^{1} + 7 \times 10^{0}$
Converting Numbers Between Bases
Converting Numbers Between Bases (contd..)
Converting between Base 16, 8 and Base 2
Examples
Signed Integer Representation
Two's Complement Representation
Floating-point representations (contd.)
Instruction Set: Execution of an instruction
Basic processor architecture
[Figure: the processor contains the MAR and MDR (its interface to Memory), the Control unit, the PC, the IR (holding the instruction that is currently being executed), n general purpose registers R0 through R(n-1), and the ALU]
Basic processor architecture (contd..)
[Figure: the processor split into a Control Path and a Data Path]
Registers in the control path
Fetch/Execute cycle
Memory organization
Recall:
Information is stored in the memory as a collection of bits.
A collection of bits that is stored or retrieved simultaneously is
called a word.
Number of bits in a word is called word length.
Word length can be 16 to 64 bits.
Another collection which is more basic than a word:
Collection of 8 bits known as a “byte”
Bytes are grouped into words, word length can also be
expressed as a number of bytes instead of the number of
bits:
Word length of 16 bits, is equivalent to word length of 2 bytes.
Words may be 2 bytes (older architectures), 4 bytes (current
architectures), or 8+ bytes (modern architectures).
Memory organization (contd..)
Memory organization (contd..)
•Memory is viewed as a sequence of bytes, from Byte 0 to Byte $2^{k} - 1$.
•Address of the first byte is 0
•Address of the last byte is $2^{k} - 1$,
where k is the number of bits used
to hold memory address
•E.g. when k = 16,
Address of the first byte is 0
Address of the last byte is 65535
•E.g. when k = 2,
Address of the first byte is ?
Address of the last byte is ?
Memory organization (contd..)
Consider a memory organization:
16-bit memory addresses
Size of the memory is ?
Word length is 4 bytes
Number of words = Memory size (bytes) / Word length (bytes) = ?
Word #0 starts at Byte #0 and occupies Bytes 0 to 3.
Word #1 starts at Byte #4.
Last word (Word #?) starts at Byte #?
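One way to check answers to these questions is to compute them directly from the number of address bits and the word length. This is a small sketch; `memory_layout` is a hypothetical helper, and it assumes a byte-addressable memory with words aligned on multiples of the word length.

```python
def memory_layout(address_bits, word_bytes):
    """Derive memory size and word numbering for a byte-addressable memory."""
    size = 2 ** address_bits            # bytes, addressed 0 .. size - 1
    words = size // word_bytes
    last_word = words - 1
    return {
        "size_bytes": size,
        "last_byte": size - 1,
        "words": words,
        "last_word": last_word,
        "last_word_start": last_word * word_bytes,
    }

print(memory_layout(16, 4))
# {'size_bytes': 65536, 'last_byte': 65535, 'words': 16384,
#  'last_word': 16383, 'last_word_start': 65532}
print(memory_layout(2, 1)["last_byte"])   # 3
```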
Memory organization (contd..)
[Figure: Byte 0 through Byte 3 form Word #0 and Word #1 starts at Byte 4; the MAR and MDR connect the processor to this memory]
The MAR register contains the address of the memory location addressed.
Memory operations
Instruction types
Instruction types (contd..)
Specifying operands in instructions
Source and destination operands
Instruction types
Instruction types (contd..)
Immediate mode
Operand is given explicitly in the instruction.
E.g. Move #200, R0
Can be used to represent constants.
Register, Absolute and Immediate modes contained either
the address of the operand or the operand itself.
Some instructions provide information from which the
memory address of the operand can be determined
That is, they provide the “Effective Address” of the operand.
They do not provide the operand or the address of the operand
explicitly.
Different ways in which “Effective Address” of the operand
can be generated.
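Addressing modes (contd..)
[Figure: indirect addressing; register R1 or memory location A holds the address B, and the operand is stored at address B in main memory. Another panel shows R1 holding 1000.]
Instruction execution and sequencing
Instruction execution and sequencing (contd..)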
Instruction execution and sequencing (contd..)
Program in memory (data locations A, B, and C follow the instructions):
0 Move A, R0
4 Add B, R0
8 Move R0, C
Execution steps:
Step I:
-PC holds address 0.
-Fetches instruction at address 0.
-Fetches operand A.
-Executes the instruction.
-Increments PC to 4.
Step II:
-PC holds address 4.
-Fetches instruction at address 4.
-Fetches operand B.
-Executes the instruction.
-Increments PC to 8.
Step III:
-PC holds address 8.
-Fetches instruction at address 8.
-Executes the instruction.
-Stores the result in location C.
Instructions are executed one at a time in order of increasing addresses.
“Straight line sequencing”
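The three steps can be mimicked with a toy interpreter. This is only a sketch under stated assumptions: the tuple encoding of instructions, the `run` function, and the values stored in A and B are invented for illustration, and every instruction is assumed to occupy 4 bytes.

```python
def run(program, memory):
    """Execute (op, source, destination) instructions in order of increasing address."""
    regs = {"R0": 0}

    def read(name):
        return regs[name] if name in regs else memory[name]

    def write(name, value):
        if name in regs:
            regs[name] = value
        else:
            memory[name] = value

    pc, trace = 0, []
    while pc in program:
        op, src, dst = program[pc]      # fetch the instruction the PC points to
        trace.append(pc)
        operand = read(src)             # fetch the source operand
        if op == "Move":
            write(dst, operand)
        elif op == "Add":
            write(dst, read(dst) + operand)
        else:
            raise ValueError(f"unknown operation {op}")
        pc += 4                         # straight-line sequencing: next instruction
    return trace

memory = {"A": 7, "B": 5, "C": 0}       # example values, not from the slides
program = {0: ("Move", "A", "R0"), 4: ("Add", "B", "R0"), 8: ("Move", "R0", "C")}
print(run(program, memory), memory["C"])   # [0, 4, 8] 12
```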
Instruction execution and sequencing (contd..)
Instruction sequencing and execution (contd..)
Instruction execution and sequencing (contd..)
Decrement R1:
R1 initially holds the number of numbers that are to be added
(Move N, R1).
The count is decremented each time a new number is added
(Decrement R1).
R1 therefore keeps a count of the numbers that remain to be added.
Branch>0 LOOP:
Checks if the count in register R1 is 0 (Branch > 0)
If it is 0, then store the sum in register R0 at memory location
SUM (Move R0, SUM).
If not, then get the next number, and repeat (go to LOOP). Go
to is specified implicitly.
Note that the instruction (Branch > 0 LOOP) has no explicit
reference to register R1.
Instructions execution and sequencing (contd..)
Instruction execution and sequencing (contd..)
Instruction sequencing and execution (contd..)
Move N, R1
Move #NUM1, R2 (Initialize R2 with address of NUM1)
Clear R0
LOOP Add (R2), R0 (Indirect addressing)
Add #4, R2 (Increment R2 to point to the next number)
Decrement R1
Branch>0 LOOP
Move R0, SUM
Instruction execution and sequencing (contd..)
Move N, R1
Move #NUM1, R2 (Initialize R2 with address of NUM1)
Clear R0
LOOP Add (R2)+, R0 (Autoincrement)
Decrement R1
Branch>0 LOOP
Move R0, SUM
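The autoincrement loop can be traced in ordinary code, with each statement annotated by the instruction it stands for. This is a sketch: the dictionary-based memory, the `sum_list` name, and the sample data are assumptions, and words are taken to be 4 bytes as in the Add #4, R2 version of the loop. The accumulated sum lives in R0, which is the register stored to SUM at the end.

```python
def sum_list(memory, n_addr, num1_addr, sum_addr):
    """Add N consecutive words starting at NUM1 and store the total at SUM."""
    r1 = memory[n_addr]         # Move N, R1
    r2 = num1_addr              # Move #NUM1, R2
    r0 = 0                      # Clear R0
    while True:
        r0 += memory[r2]        # LOOP: Add (R2)+, R0   (use R2 as a pointer ...)
        r2 += 4                 #       ... then autoincrement it by the word size
        r1 -= 1                 # Decrement R1
        if not r1 > 0:          # Branch>0 LOOP
            break
    memory[sum_addr] = r0       # Move R0, SUM
    return r0

# Example layout: SUM at 200, N at 204, the numbers from 208 onward.
memory = {204: 3, 208: 10, 212: 20, 216: 30}
print(sum_list(memory, n_addr=204, num1_addr=208, sum_addr=200))   # 60
```

Like the assembly loop, this sketch tests the count only after the first addition, so N is assumed to be at least 1.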
Stacks
Stacks (contd..)
Subroutines
Subroutines (contd..)
Memory location / Calling program:
200 Call SUB
204 next instruction
Memory location / Subroutine SUB:
1000 first instruction
...
Return
•Calling program calls a subroutine, whose first instruction is at address 1000.
•The Call instruction is at address 200.
•While the Call instruction is being executed, the PC points to the next instruction at address 204.
•The Call instruction stores address 204 in the Link register, and loads 1000 into the PC.
•The Return instruction loads back the address 204 from the link register into the PC.
[Figure: on Call, the Link register receives 204 and the PC receives 1000; on Return, the PC receives 204 from the Link register]
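The link-register mechanism amounts to two register transfers. In the sketch below the `Processor` class is invented for illustration; `pc` is modelled as the address of the instruction being executed, so the return address is computed as `pc + 4`, an assumed 4-byte instruction size.

```python
class Processor:
    """Toy model of subroutine linkage through a link register."""

    def __init__(self, pc):
        self.pc = pc
        self.link = None

    def call(self, target, instr_size=4):
        self.link = self.pc + instr_size   # save the address of the next instruction
        self.pc = target                   # branch to the subroutine

    def ret(self):
        self.pc = self.link                # resume the calling program

cpu = Processor(pc=200)     # the Call instruction sits at address 200
cpu.call(1000)
print(cpu.pc, cpu.link)     # 1000 204
cpu.ret()
print(cpu.pc)               # 204
```

A single link register holds only one return address, so nested calls need the stack-based scheme covered under "Subroutines and stack".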
Subroutines and stack
Assembly language
Assembly language (contd..)
Assembly language (contd..)
100 Move N,R1
104 Move #NUM1,R2
108 Clear R0
LOOP 112 Add (R2),R0
116 Add #4,R2
120 Decrement R1
124 Branch>0 LOOP
128 Move R0,SUM
132
...
SUM 200
N 204 100
NUM1 208
NUM2 212
...
NUM100 604
•What is the numeric value assigned to SUM?
•What is the address of the data NUM1 through NUM100?
•What is the address of the memory location represented by the label LOOP?
•How to place a data value into a memory location?
Assembly language (contd..)
Columns: Memory address label | Operation | Addressing or data information
Assembler directives:
SUM EQU 200
ORIGIN 204
N DATAWORD 100
NUM1 RESERVE 400
ORIGIN 100
Statements that generate machine instructions:
START MOVE N,R1
MOVE #NUM1,R2
CLR R0
LOOP ADD (R2),R0
ADD #4,R2
DEC R1
BGTZ LOOP
MOVE R0,SUM
Assembler directives:
RETURN
END START
EQU:
•Value of SUM is 200.
ORIGIN:
•Place the datablock at 204.
DATAWORD:
•Place the value 100 at 204
•Assign it label N.
•N EQU 100
RESERVE:
•Memory block of 400 bytes (100 words of 4 bytes) is to be reserved for data.
•Associate NUM1 with address 208
ORIGIN:
•Instructions of the object program to be loaded in memory starting at 100.
RETURN:
•Terminate program execution.
END:
•End of the program source text
Assembly language (contd..)
Assembly language instructions have a generic form:
Label Operation Operand(s) Comment
Four fields are separated by a delimiter, typically one or
more blank characters.
Label is optionally associated with a memory address:
May indicate the address of an instruction to be executed.
May indicate the address of a data item.
How does the assembler determine the values that
represent names?
Value of a name may be specified by EQU directive.
• SUM EQU 200
A name may be defined in the Label field of another
instruction, value represented by the name is determined by
the location of that instruction in the object program.
• E.g., BGTZ LOOP, the value of LOOP is the address of the
instruction ADD (R2),R0
Encoding of machine instructions
Encoding of machine instructions (contd..)
One-word instruction format.
Field widths in bits: Opcode (8) | Source (7) | Destination (7) | Other info (10)
Opcode : 8 bits.
Source operand : 4 bits to specify a register
3 bits to specify the addressing mode.
Destination operand : 4 bits to specify a register.
3 bits to specify the addressing mode.
Other information : 10 bits to specify other information
such as index value.
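The field widths add up to 32 bits, so an instruction in this format can be packed into one word with shifts and masks. The sketch below assumes the fields are laid out from the most significant bit in the order listed, with the register number ahead of the addressing mode inside each 7-bit operand field; the slides give only the widths, so that ordering and the numeric codes in the example are assumptions.

```python
# Field widths in bits: opcode, source register, source mode,
# destination register, destination mode, other information.
WIDTHS = (8, 4, 3, 4, 3, 10)

def encode(*fields):
    """Pack the six field values into one 32-bit word, most significant field first."""
    if len(fields) != len(WIDTHS):
        raise ValueError("expected six field values")
    word = 0
    for value, width in zip(fields, WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def decode(word):
    """Unpack a 32-bit word back into its six field values."""
    values = []
    for width in reversed(WIDTHS):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return tuple(reversed(values))

word = encode(0x01, 2, 0, 3, 1, 5)    # hypothetical opcode and mode codes
print(hex(word))                      # 0x1206405
print(decode(word))                   # (1, 2, 0, 3, 1, 5)
```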
Encoding of machine instructions (contd..)
What if the source operand is a memory location specified
using the absolute addressing mode?
Field widths in bits: Opcode (8) | Source (3) | Destination (7) | Other info (14)
Opcode : 8 bits.
Source operand : 3 bits to specify the addressing mode.
Destination operand : 4 bits to specify a register.
3 bits to specify the addressing mode.
Encoding of machine instructions (contd..)
Encoding of machine instructions (contd..)
Insist that all instructions must fit into a single 32 bit word:
Instruction cannot specify a memory location or an immediate
operand.
ADD R1, R2 can be specified.
ADD LOC, R2 cannot be specified.
Use indirect addressing mode: ADD (R3), R2
R3 serves as a pointer to memory location LOC.
How to load address of LOC into R3?
Relative addressing mode.
Encoding of machine instructions (contd..)
Three-operand instruction
Instruction formats
Instruction types
MIPS instructions
Examples
Instruction Set Architecture (ISA)
Critical Interface between hardware and software
An ISA includes the following …
Instructions and Instruction Formats
• Data Types, Encodings, and Representations
• Addressing Modes: to address Instructions and Data
• Handling Exceptional Conditions (like division by zero)
Programmable Storage: Registers and Memory
Examples (Versions) First Introduced in
Intel (8086, 80386, Pentium, ...) 1978
MIPS (MIPS I, II, III, IV, V) 1986
PowerPC (601, 604, …) 1993
Instructions
...
Example: $8 - $15 are called $t0 - $t7 $10 = $t2 $26 = $k0
Immediate (I-Type)
16-bit immediate constant is part of the instruction
Op (6 bits) | Rs (5 bits) | Rt (5 bits) | immediate (16 bits)
Jump (J-Type)
Used by jump instructions
Op (6 bits) | immediate (26 bits)
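The two layouts map directly onto shift-and-mask code. In this sketch the function names are illustrative; the opcode values used in the examples (8 for `addi`, 2 for `j`) are the standard MIPS encodings, and the register numbers follow the $t0 = $8 convention noted above.

```python
def encode_i(op, rs, rt, imm):
    """I-type: Op(6) | Rs(5) | Rt(5) | immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_j(op, target):
    """J-type: Op(6) | immediate(26)."""
    return (op << 26) | (target & 0x3FFFFFF)

# addi $t0, $t1, 5 : opcode 8, Rs = $t1 = 9, Rt = $t0 = 8
print(hex(encode_i(8, 9, 8, 5)))    # 0x21280005
# j with a 26-bit immediate of 0x100 : opcode 2
print(hex(encode_j(2, 0x100)))      # 0x8000100
```

Masking the immediate with 0xFFFF stores a negative constant in its 16-bit 2's complement form.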
Instruction Categories
Integer Arithmetic
Arithmetic, logical, and shift instructions
Data Transfer
Load and store instructions that access memory
Data movement and conversions
Jump and Branch
Flow-control instructions that alter the sequential sequence
Floating Point Arithmetic
Instructions that operate on floating-point registers
R-Type Format
Examples:
Assume $s1 = 0xabcd1234 and $s2 = 0xffff0000
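Load Instruction:
Transfers data from memory to a register
Store Instruction:
Transfers data from a register to memory
[Figure: load moves data from Memory to Registers; store moves data from Registers to Memory]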
Register Set
> Registers are used to create and store the results of CPU
operations.
Register types
Status register (Program Status Word)
The PSW contains bits that are set by the CPU to indicate the
current status of an executing program (arithmetic operations,
interrupts, processor status).
> Typical:
— Sign of last result
— Zero
— Carry
— Equal
— Overflow
— Interrupt enable/disable
— Supervisor mode (enable privileged instructions)
Example of Register Organizations (8086)
Arithmetic and Logic Unit (ALU)
> The component that performs the arithmetic and logical operations
> Load the operands into memory – bring them to the processor –
perform operation in ALU – store the result back to memory or
retain in the processor.
CPU Instruction cycle
Fetch Cycle
Execution Cycle : Execute Simple Arithmetic
Operation
Add R1, R2, R0
This instruction adds the contents of source registers R1 and R2, and stores
the results in destination register R0. This addition can be executed as follows:
1. The registers R0 , R1 , R2 , are extracted from the IR.
2. The contents of R1 and R2 are passed to the ALU for addition.
Instruction format
Example of Program Execution (1)
Control Unit operation
> The timing signals that govern the I/O transfers are also
generated by the control unit.
[Figure: the Control Unit connected to the Register File and the ALU]
Timing & Control (1)
> All sequential circuits in the Basic Computer CPU are driven by a
master clock.
> At each clock pulse, the control unit sends control signals to control
inputs of the bus, the registers, and the ALU.
[Figure: timing diagram of a machine cycle, with clock pulses numbered 1 to 7]
Timing & Control (2)
Control unit design and implementation
> Control unit design and implementation can be done by two general
methods:
Hardwired Control Organization
> This approach is to physically connect all of the control lines to the
actual machine instructions.
> The instructions are divided up into fields, and different bits in the
instruction are combined through various digital logic components to
drive the control-lines.
Microprogrammed Control Organization
Example
CPU Performance
Performance Measures
> We denote the number of CPU clock cycles for executing a job to be
the Cycle Count (CC), the Cycle Time by CT, and the clock
frequency by f = 1 / CT.
> The time taken by the CPU to execute a job can be expressed as:
CPU time = CC * CT = CC / f
Computer Organization & Design, CNE211_ 2018 Tutorial 2 Dr. Jamel Baili
Question1: Assume the execution of the following three consecutive instructions on a CPU:
Load [520]
ADD [521]
Store [521]
If the load instruction is in memory location [400], fill each of the Program Counter (PC), Memory,
Accumulator (AC), and Instruction Register (IR) in each step.
Ans.
Question 2:
Consider the execution of a program which results in the execution of 2 million instructions on a 400MHz
processor. The program consists of four major types of instructions. The instruction mix and the CPI for each
instruction type are given below based on the result of a program trace experiment:
Instruction Type | CPI | Instruction Mix
Arithmetic and logic | 1 | 60%
Load/store with cache hit | 2 | 18%
Branch | 4 | 12%
Memory reference with cache miss | 8 | 10%
Ans.
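$\text{CPI} = \sum_i (\text{CPI}_i \times \text{fraction}_i) = (1 \times 0.6) + (2 \times 0.18) + (4 \times 0.12) + (8 \times 0.1) = 2.24$
$\text{MIPS rate} = \dfrac{f}{\text{CPI} \times 10^{6}} = \dfrac{400 \times 10^{6}}{2.24 \times 10^{6}} \approx 178.6$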
Question3:
A benchmark program is run on a 40 MHz processor. The executed program consists of 100,000 instruction
executions, with the following instruction mix and clock cycle count:
Instruction Type | Instruction Count | Cycles per Instruction
Integer | 45000 | 1
Data transfer | 32000 | 2
Floating point | 15000 | 2
Control transfer | 8000 | 2
Determine the effective CPI, MIPS rate, and execution time for this program.
Ans.
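$\text{CPI} = \dfrac{\sum_i (\text{IC}_i \times \text{CPI}_i)}{\text{IC}} = \dfrac{45000 \times 1 + 32000 \times 2 + 15000 \times 2 + 8000 \times 2}{100000} = 1.55$
$\text{MIPS rate} = \dfrac{f}{\text{CPI} \times 10^{6}} = \dfrac{40 \times 10^{6}}{1.55 \times 10^{6}} \approx 25.8$
$T = \dfrac{\text{IC} \times \text{CPI}}{f} = \dfrac{100000 \times 1.55}{40 \times 10^{6}} = 3.875 \text{ ms}$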
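The three quantities asked for in Question 3 follow mechanically from the instruction mix and the clock rate, so they can be checked with a short script. The `performance` helper is illustrative; it takes the mix as (instruction count, cycles per instruction) pairs.

```python
def performance(mix, clock_hz):
    """Return (effective CPI, MIPS rate, execution time in seconds)."""
    instructions = sum(count for count, _ in mix)
    cycles = sum(count * cpi for count, cpi in mix)
    cpi = cycles / instructions          # effective CPI
    mips = clock_hz / (cpi * 1e6)        # MIPS rate = f / (CPI * 10^6)
    time_s = cycles / clock_hz           # CPU time = cycle count / f
    return cpi, mips, time_s

# Question 3: 40 MHz processor, 100,000 instruction executions
q3 = [(45000, 1), (32000, 2), (15000, 2), (8000, 2)]
cpi, mips, time_s = performance(q3, 40e6)
print(round(cpi, 2), round(mips, 1), time_s)   # 1.55 25.8 0.003875
```

The same function applies to Question 4 by passing each machine's instruction counts (in units of instructions, not millions) with a 200 MHz clock.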
Question 4:
Consider two different machines, with two different instruction sets, both of which have a clock rate of 200
MHz. The following measurements are recorded on the two machines running a given set of benchmark
programs:
Instruction Type | Instruction Count (millions) | Cycles per Instruction
Machine A
Arithmetic and logic | 4 | 1
Load and store | 8 | 3
Branch | 2 | 4
Others | 4 | 3
Machine B
Arithmetic and logic | 10 | 1
Load and store | 8 | 3
Branch | 2 | 4
Others | 4 | 3
a) Determine the effective CPI, MIPS rate, and execution time for each machine.
b) Comment on the results.
Ans. (a)
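Machine A:
$\text{CPI}_A = \dfrac{(4 \times 1 + 8 \times 3 + 2 \times 4 + 4 \times 3) \times 10^{6}}{(4 + 8 + 2 + 4) \times 10^{6}} = \dfrac{48}{18} \approx 2.67$
$\text{MIPS}_A = \dfrac{200 \times 10^{6}}{2.67 \times 10^{6}} \approx 75$
$T_A = \dfrac{18 \times 10^{6} \times 2.67}{200 \times 10^{6}} = 0.24 \text{ s}$
Machine B:
$\text{CPI}_B = \dfrac{(10 \times 1 + 8 \times 3 + 2 \times 4 + 4 \times 3) \times 10^{6}}{(10 + 8 + 2 + 4) \times 10^{6}} = \dfrac{54}{24} = 2.25$
$\text{MIPS}_B = \dfrac{200 \times 10^{6}}{2.25 \times 10^{6}} \approx 88.9$
$T_B = \dfrac{24 \times 10^{6} \times 2.25}{200 \times 10^{6}} = 0.27 \text{ s}$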
(b) Although machine B has a higher MIPS than machine A, it requires a longer CPU time to execute the
same set of benchmark programs. (Different instruction count and instruction sets).
Question 5:
Consider the execution of a program which results in the execution of 2 million instructions on a 400-MHz
processor. The program consists of four major types of instructions. The instruction mix and the CPI for each
instruction type are given below based on the result of a program trace experiment:
Instruction Type CPI Instruction Mix
∗
.
∗ . ∗
∗ ∗ ∗ .
.
∗
After improvement:
∑ ∗
∗ ∗ . ∗ . ∗ . ∗ . .
∗ ∗ ∗ .
.
∗
.
.
.
Question 6:
Computer A has an overall CPI of 1.5 and can be run at a clock rate of 800MHz. Computer B has a CPI of 3
and can be run at a clock rate of 900 Mhz. We have a particular program we wish to run. When compiled for
computer A, this program has exactly 150,000 instructions.
a) How many instructions would the program need to have when compiled for Computer B, in order for the
two computers to have exactly the same execution time for this program?
b) If a CPU design enhancement improves the CPI of Computer A so that the speed up is 1.5, what is the new
CPI?
c) If a CPU design enhancement improves the CPI of Computer B to be 2.5, what is the speed up?
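Answer Key
a) (CPUTime)A = (Instruction count)A * (CPI)A * (Clock cycle Time)A
= (150,000)*(1.5)/(800*10^6)
(CPUTime)B = (Instruction count)B * (CPI)B * (Clock cycle Time)B
= (Instruction count)B*(3)/(900*10^6)
Since (CPUTime)A = (CPUTime)B,
(Instruction count)B = (150,000)*(1.5)*(900*10^6) / ((3)*(800*10^6)) = 84,375
b) Speed up = (CPI)old / (CPI)new ⇨ (CPI)new = 1.5 / 1.5 = 1.0
c) Speed up = (CPI)old / (CPI)new = 3 / 2.5 = 1.2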
Contents
Computer Organization
Main memory
Memory units
Example 1
Solution
The memory address space is 32 MB, or $2^{25}$ bytes ($2^{5} \times 2^{20}$).
This means that we need $\log_2 2^{25}$, or 25 bits, to address
each byte.
Example 2
Solution
The memory address space is 128 MB, which means $2^{27}$ bytes.
However, each word is eight ($2^{3}$) bytes, which means that
we have $2^{24}$ words. This means that we need $\log_2 2^{24}$, or 24
bits, to address each word.
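Both solutions reduce to taking a base-2 logarithm of the number of addressable units. A minimal sketch, using a hypothetical `address_bits` helper and 1 MB = 2^20 bytes:

```python
def address_bits(memory_bytes, unit_bytes=1):
    """Bits needed to give every addressable unit (byte or word) its own address."""
    units = memory_bytes // unit_bytes
    # For a power of two this equals log2(units); otherwise it rounds up.
    return (units - 1).bit_length()

MB = 2 ** 20
print(address_bits(32 * MB))        # 25 bits, byte-addressable
print(address_bits(128 * MB, 8))    # 24 bits, addressing 8-byte words
```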
Memory types
Tran. per bit | Access time | Persist? | Sensitive? | Cost | Applications
> The RAM has 128 bytes and needs seven address lines, whereas
the ROM has 512 bytes and needs 9 address lines
Memory Address Map
> When line 10 is 0, the CPU selects a RAM chip; when it is 1, it selects
the ROM
Memory Hierarchy
> Small, fast storage elements are kept in the CPU, larger,
slower main memory is accessed through the data bus.
Memory cache
> If the active portions of the program and data are placed
in a fast small memory, the average memory access
time can be reduced.
> When the CPU refers to memory and finds the word in
cache, it is said to produce a hit. Otherwise, it is a miss
Direct mapped
19 mod 8 = slot 3
Memory blocks 3, 11, 19, and 27 all
map to cache slot 3
2-way set associative (sets 0 to 3)
19 mod 4 = set 3 (slots 6 or 7)
Mem blocks 3, 7, 11, 15, 19, 23, 27, 31 all map to set 3
Fully associative
19 mod 1 = set 0 (all slots)
All mem blocks map anywhere in cache
[Figure: main memory blocks 0 to 31 with block 19 highlighted, shown above an 8-slot cache for each organization]
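All three organizations are the same calculation with a different number of lines per set. The sketch below assumes an 8-slot cache, as in the example, and a hypothetical `placement` helper that returns the set index together with the slots the block may occupy.

```python
def placement(block, num_slots, ways):
    """Return (set index, candidate slots) for a cache of num_slots lines
    organised as sets of `ways` lines each."""
    num_sets = num_slots // ways
    s = block % num_sets
    return s, list(range(s * ways, (s + 1) * ways))

print(placement(19, 8, 1))   # direct mapped:      (3, [3])
print(placement(19, 8, 2))   # 2-way associative:  (3, [6, 7])
print(placement(19, 8, 8))   # fully associative:  (0, [0, 1, 2, 3, 4, 5, 6, 7])
```

Direct mapping is the 1-way case and full associativity is the case with a single set, which is why the set index for block 19 there is 19 mod 1 = 0.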
Mapping Function
• Cache of 64kByte
• Cache block of 4 bytes
—i.e. cache is 16k ($2^{14}$) lines of 4 bytes
• 16MBytes main memory
• 24 bit address
—($2^{24}$ = 16M)
Direct Mapping
• Each block of main memory maps to only
one cache line
—i.e. if a block is in cache, it must be in one
specific place
• Address is in two parts
• Least Significant w bits identify unique
word
• Most Significant s bits specify one
memory block
• The MSBs are split into a cache line field r
and a tag of s-r (most significant)
Direct Mapping
Address Structure
• 24 bit address
• 2 bit word identifier (4 byte block)
• 22 bit block identifier
— 8 bit tag (=22-14)
— 14 bit slot or line
• No two blocks in the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
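Splitting an address into these three fields is a matter of shifting and masking. In the sketch below the default field widths are the ones listed above (8-bit tag, 14-bit line, 2-bit word); the function name and the sample address 0x16339C are chosen only for illustration.

```python
def split_address(addr, tag_bits=8, line_bits=14, word_bits=2):
    """Split a main-memory address into (tag, line, word) for a direct-mapped cache."""
    word = addr & ((1 << word_bits) - 1)
    line = (addr >> word_bits) & ((1 << line_bits) - 1)
    tag = (addr >> (word_bits + line_bits)) & ((1 << tag_bits) - 1)
    return tag, line, word

tag, line, word = split_address(0x16339C)
print(hex(tag), hex(line), word)   # 0x16 0xce7 0
```

A lookup would read cache line 0xCE7 and compare its stored tag with 0x16; a match is a hit, anything else a miss.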
Direct Mapping from Cache to Main Memory
Direct Mapping Cache Organization
Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = $2^{s+w}$ words or bytes
• Block size = line size = $2^{w}$ words or bytes
• Number of blocks in main memory = $2^{s+w}/2^{w} = 2^{s}$
• Number of lines in cache = m = $2^{r}$
• Size of tag = (s – r) bits
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
—If a program accesses 2 blocks that map to
the same line repeatedly, cache misses are
very high
Associative Mapping
• A main memory block can load into any
line of cache
• Memory address is interpreted as tag and
word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive
Associative Mapping from
Cache to Main Memory
1439/1440
Computer Organization
Contents
Accessing Input/Output Devices
I/O Module Function
I/O Module Diagram
Example of I/O interface for Input device
Data Transfer Techniques
Programmed I/O
Programmed I/O - detail
I/O Commands
> Under programmed I/O, data transfer is very much like memory access
(from the CPU viewpoint)
I/O Mapping
Memory Mapped and Isolated I/O
Interrupt Driven I/O
Interrupt Driven I/O Basic Operation
Simple Interrupt Processing
CPU Viewpoint
Changes in Memory and Registers for an Interrupt
Direct Memory Access
DMA Function
Typical DMA Module Diagram
DMA Operation
DMA Transfer Cycle Stealing
DMA and Interrupt Breakpoints During an Instruction Cycle
> Serial data transmission means sending data bits one by one using
one wire.
> Asynchronous transmission means a data frame (one start bit,
8 data bits, and stop bits) can be sent at any time.
> RS232 is a serial communication standard.
> Since it is asynchronous, no external clock is needed, only 3 wires
are required for the simplest RS232 connection {GND, tx(transmit),
rx(receive)}
2018-2019
Contents
Parallel Computers
Pipelining Concepts
Characteristics of RISC and CISC machines
What is Parallel Computing? (1)
What is Parallel Computing? (2)
Parallel Computing: Resources
Parallel Computing: The computational problem
Parallel Computing: what for? (1)
Pipelining versus Serial Execution: Pipelining Example
Sequential Laundry
[Figure: timeline from 6 PM to 12 AM; twelve 30-minute tasks run one after another]
Pipelined Laundry: Start Load ASAP
[Figure: timeline from 6 PM to 9 PM; four loads of three 30-minute tasks each, overlapped]
Serial Execution versus Pipelining
Consider a task that can be divided into k subtasks
The k subtasks are executed on k different stages
Each subtask requires one time unit
The total execution time of the task is k time units
Pipelining is to overlap the execution
The k stages work in parallel on k different tasks
Tasks enter/leave pipeline at the rate of one task per
time unit
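[Figure: without pipelining, tasks run back to back, each passing through stages 1, 2, …, k; with pipelining, successive tasks overlap, each starting one time unit after the previous one]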
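Under the stated assumptions (k stages, one time unit per stage), n tasks take n × k time units serially, while a pipeline needs k units to fill and then completes one task per unit. A small sketch with illustrative function names:

```python
def serial_time(n, k):
    """Time units to run n tasks one after another, k units each."""
    return n * k

def pipelined_time(n, k):
    """Time units with a k-stage pipeline: k for the first task, then one per task."""
    return k + (n - 1)

def speedup(n, k):
    return serial_time(n, k) / pipelined_time(n, k)

# The laundry example: 4 loads, 3 stages of 30 minutes each
print(serial_time(4, 3) * 30, pipelined_time(4, 3) * 30)   # 360 180 (minutes)
print(speedup(4, 3))                                       # 2.0
print(round(speedup(1000, 5), 2))                          # 4.98, approaching k = 5
```

The speedup approaches k only for a long stream of tasks, because the time spent filling and draining the pipeline is amortised over n.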
Synchronous Pipeline
Uses clocked registers between stages
Upon arrival of a clock edge …
All registers hold the results of previous stages
simultaneously
The pipeline stages are combinational logic circuits
It is desirable to have balanced stages
Approximately equal delay in all stages
Clock period is determined by the maximum stage delay
[Figure: Input, stages S1, S2, ..., Sk, and Output in a row, with a Register between stages and a common Clock driving all registers]
Pipeline Performance
MIPS Processor Pipeline
Single-Cycle vs Pipelined Performance
Single-Cycle versus Pipelined – cont’d
Pipelined clock cycle = max(200, 150) = 200 ps
[Figure: three instructions, each passing through IF, Reg, ALU, MEM, Reg; a new instruction starts every 200 ps and each stage takes one 200 ps clock cycle]
Pipeline Performance Summary
Pipelining doesn’t improve latency of a single instruction
However, it improves throughput of entire workload
Instructions are initiated and completed at a higher
rate
In a k-stage pipeline, k instructions operate in parallel
Overlapped execution using multiple hardware resources
Potential speedup = number of pipeline stages k
Unbalanced lengths of pipeline stages reduce speedup
Pipeline rate is limited by slowest pipeline stage
Also, time to fill and drain pipeline reduces speedup
Single-Cycle Datapath
Shown below is the single-cycle datapath
How to pipeline this single-cycle datapath?
Answer: Introduce pipeline registers at end of each stage
[Figure: single-cycle datapath with PC, Instruction Memory, Registers (RA, RB, RW, BusA, BusB, BusW), Ext (ExtOp, Imm16), ALU (Zero, ALU result), Data Memory (Address, Data_in, Data_out), Next PC Address, and Branch Target Address; WB = Write Back & Register Read]
[Figure: the same datapath with pipeline registers (NPC, Inst, Imm, A, B, BTA, ALU Result, D, Data) added at the end of each stage; WB = Write Back]