Instruction Set Architecture Basics

Advanced Computer Architecture
Bahria Summer 2010

Instructor: Shaftab Ahmed
Lecture # 3
Instruction Set Architecture
06/23/20 ACA Spring 2011 Bahria Shaftab Ahmed 1

 Instruction set architecture is based on the structure of a computer, i.e. the description of the CPU in terms of registers, addressability, and the various arithmetic, control, and store operations.
 An assembly / machine language programmer must understand the ISA of the target processor to program for it.

 Programs written in any higher-level language are eventually converted to assembly level, containing instructions in the mnemonics of the instruction set.
 The assembler converts these into machine language before execution.

High-level language code: C, C++, Java, Fortran
    | compiler
Assembly language code: architecture-specific statements
    | assembler
Machine language code: architecture-specific bit patterns

software
---- instruction set ----
hardware

ISA Metrics
 Orthogonality
 All operand modes are available with any data type or
instruction type.
 Completeness
 Support for a wide range of operations and target
applications
 Regularity
 No overloading for the meanings of instruction fields
 Streamlined
 Resource needs easily determined
 Ease of assembly language programming
 Ease of implementation

Instruction Set Design Issues
 Instruction set design issues include:

 Where are operands stored?
 registers, memory, stack, accumulator

 How many explicit operands are there?
 0, 1, 2, or 3
 How is the operand location specified?
 register, immediate, indirect, . . .
 What type & size of operands are supported?
 byte, int, float, double, string, vector. . .
 What operations are supported?
 add, sub, mul, move, compare . . .

Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
    |
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
    |
Separation of Programming Model from Implementation
    High-level Language Based (B5000 1963)
    Concept of a Family (IBM 360 1964)
    |
General Purpose Register Machines
    Complex Instruction Sets (VAX, Intel 8086 1977-80)
    Load/Store Architecture (CDC 6600, Cray 1 1963-76)
    |
RISC (MIPS, SPARC, 88000, IBM RS6000, . . . 1987+)

Classifying ISAs
Accumulator (before 1960):
    1 address    add A             acc <- acc + mem[A]
Stack (1960s to 1970s):
    0 address    add               tos <- tos + next
Memory-Memory (1970s to 1980s):
    2 address    add A, B          mem[A] <- mem[A] + mem[B]
    3 address    add A, B, C       mem[A] <- mem[B] + mem[C]
Register-Memory (1970s to present):
    2 address    add R1, A         R1 <- R1 + mem[A]
                 load R1, A        R1 <- mem[A]
Register-Register (Load/Store) (1960s to present):
    3 address    add R1, R2, R3    R1 <- R2 + R3
                 load R1, R2       R1 <- mem[R2]
                 store R1, R2      mem[R1] <- R2
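
The classes above can be compared on one small task. The following is a minimal sketch, not a model of any real machine: it computes C = A + B on four toy machines and counts the instructions each needs. The tuple-based instruction encoding, the function names, and the use of symbolic addresses in the load/store variant (instead of the register-held addresses shown above) are all assumptions made for illustration.

```python
def run_accumulator(mem):
    """1-address machine: every ALU operation goes through the accumulator."""
    acc = 0
    program = [("load", "A"), ("add", "B"), ("store", "C")]
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc = acc + mem[addr]          # acc <- acc + mem[A]
        elif op == "store":
            mem[addr] = acc
    return len(program)


def run_stack(mem):
    """0-address machine: add takes both operands from the stack."""
    stack = []
    program = [("push", "A"), ("push", "B"), ("add", None), ("pop", "C")]
    for op, addr in program:
        if op == "push":
            stack.append(mem[addr])
        elif op == "add":
            top = stack.pop()
            stack[-1] = stack[-1] + top    # tos <- tos + next
        elif op == "pop":
            mem[addr] = stack.pop()
    return len(program)


def run_register_memory(mem):
    """2-address machine: one operand may come straight from memory."""
    regs = {"R1": 0}
    program = [("load", "R1", "A"), ("add", "R1", "B"), ("store", "R1", "C")]
    for op, reg, addr in program:
        if op == "load":
            regs[reg] = mem[addr]
        elif op == "add":
            regs[reg] = regs[reg] + mem[addr]   # R1 <- R1 + mem[A]
        elif op == "store":
            mem[addr] = regs[reg]
    return len(program)


def run_load_store(mem):
    """3-address machine: ALU operations use registers only."""
    regs = {"R1": 0, "R2": 0, "R3": 0}
    program = [("load", "R1", "A"), ("load", "R2", "B"),
               ("add", "R3", "R1", "R2"), ("store", "R3", "C")]
    for instr in program:
        if instr[0] == "load":
            regs[instr[1]] = mem[instr[2]]
        elif instr[0] == "add":
            regs[instr[1]] = regs[instr[2]] + regs[instr[3]]   # R3 <- R1 + R2
        elif instr[0] == "store":
            mem[instr[2]] = regs[instr[1]]
    return len(program)


def demo():
    """Run C = A + B on each toy machine; return {name: (C, instruction count)}."""
    machines = [("accumulator", run_accumulator), ("stack", run_stack),
                ("register-memory", run_register_memory),
                ("load-store", run_load_store)]
    results = {}
    for name, run in machines:
        mem = {"A": 5, "B": 7, "C": 0}
        count = run(mem)
        results[name] = (mem["C"], count)
    return results
```

All four machines produce the same result; they differ in how many instructions are needed and in how many of those instructions touch memory.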
Types of Addressing Modes (VAX)
    Addressing Mode       Example                 Action
 1. Register direct       Add R4, R3              R4 <- R4 + R3
 2. Immediate             Add R4, #3              R4 <- R4 + 3
 3. Displacement          Add R4, 100(R1)         R4 <- R4 + M[100 + R1]
 4. Register indirect     Add R4, (R1)            R4 <- R4 + M[R1]
 5. Indexed               Add R4, (R1 + R2)       R4 <- R4 + M[R1 + R2]
 6. Direct                Add R4, (1000)          R4 <- R4 + M[1000]
 7. Memory indirect       Add R4, @(R3)           R4 <- R4 + M[M[R3]]
 8. Autoincrement         Add R4, (R2)+           R4 <- R4 + M[R2]; R2 <- R2 + d
 9. Autodecrement         Add R4, -(R2)           R2 <- R2 - d; R4 <- R4 + M[R2]
10. Scaled                Add R4, 100(R2)[R3]     R4 <- R4 + M[100 + R2 + R3*d]
(d is the size in bytes of the operand being accessed.)
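
The effective-address arithmetic behind the memory-referencing modes can be sketched as follows. This is an illustrative helper only: the function name, the mode strings, and the dictionary model of registers and memory are assumptions, and autodecrement is modelled as decrement-before-use, as on the VAX. Register direct and immediate are left out because they do not form a memory address.

```python
def effective_address(mode, regs, mem, base=None, index=None, disp=0, d=4):
    """Return the memory address an operand refers to.

    regs maps register names to values, mem maps addresses to values,
    and d is the operand size in bytes. Autoincrement and autodecrement
    update regs in place as a side effect.
    """
    if mode == "displacement":          # 100(R1)      -> 100 + R1
        return disp + regs[base]
    if mode == "register_indirect":     # (R1)         -> R1
        return regs[base]
    if mode == "indexed":               # (R1 + R2)    -> R1 + R2
        return regs[base] + regs[index]
    if mode == "direct":                # (1000)       -> 1000
        return disp
    if mode == "memory_indirect":       # @(R3)        -> M[R3]
        return mem[regs[base]]
    if mode == "autoincrement":         # (R2)+        -> use R2, then R2 += d
        address = regs[base]
        regs[base] += d
        return address
    if mode == "autodecrement":         # -(R2)        -> R2 -= d, then use R2
        regs[base] -= d
        return regs[base]
    if mode == "scaled":                # 100(R2)[R3]  -> 100 + R2 + R3*d
        return disp + regs[base] + regs[index] * d
    raise ValueError(f"unknown addressing mode: {mode}")
```

For example, with R2 = 8, R3 = 200 and d = 4, the scaled operand 100(R2)[R3] refers to address 100 + 8 + 200*4 = 908.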

Types of Addressing Modes Intel Instruction Set
Register                    Instructions involving data manipulation through registers
Immediate                   Involves immediate values contained within the instruction
Direct                      Transfers data between a register and a memory location whose address is given in the instruction
Register Indirect           Transfers a data byte/word to or from a location whose address is specified in a register, e.g. [BX].
                            BYTE PTR, WORD PTR, or DWORD PTR specifies the size of the data when it is otherwise ambiguous.
Base + Index Indirect       MOV AX, [BX+SI]
Relative                    MOV AX, [BX+4]
Base relative plus index    MOV AX, [BX+SI+4]
Scaled Index                MOV AX, [EAX+4*EBX] (needs 32-bit address registers, 80386 and later)

Instruction Encoding
 Variable Size
 Instruction length varies based on opcode and address
specifiers
 For example, VAX instructions vary between 1 and 53 bytes, while x86 instructions vary between 1 and 17 bytes.
 Good code density, but difficult to decode and pipeline
 Fixed Size
 Only a single size for all instructions
 For example, DLX, MIPS, Power PC, Sparc all have 32 bit
instructions
 Not as good code density, but easier to decode and pipeline
 Hybrid Size
 Have multiple format lengths specified by the opcode
 For example, IBM 360/370
 Compromise between code density and ease in decoding

DLX Architecture
 Introduced by Hennessy and Patterson in 1990
 Derived from many different instruction set architectures from MIPS, Sun, IBM, Intel, HP, AMD, etc.
 DLX is a typical RISC architecture.
 32-bit fixed length instructions
 3 instruction formats
 Load/store architecture
 Simple branch conditions (no condition codes)
 DLX registers
 32 32-bit general-purpose registers (R0 = 0)
 32 32-bit (or 16 64-bit) floating point registers
 Special purpose registers (e.g., FP Status and PC)

DLX Design Decisions
 DLX is based on the following design decisions
 Use general purpose registers with a load-store architecture
 Support commonly used addressing modes
    displacement, immediate, and register deferred
 Support simple instructions that occur frequently
    load, store, add, subtract, move, and, shift, compare equal, branch, jump, call, and return
 Support commonly required data sizes
    8 (byte), 16 (half word), and 32-bit (word) integers
    32 (float) and 64-bit (double) floating point
 Use fixed length instructions that are easy to decode
 Provide plenty of general purpose registers and separate floating point registers

DLX Instruction Formats
(a) Register-Register (R-type), e.g. ADD R1, R2, R3
    Bits:    31-26   25-21   20-16   15-11   10-0
    Field:   Op      rs1     rs2     rd      function
    (Register-register ALU operations, read/write special registers and moves)
(b) Register-Immediate (I-type), e.g. SUB R1, R2, #3
    Bits:    31-26   25-21   20-16   15-0
    Field:   Op      rs1     rd      immediate
    (ALU immediate operations, loads and stores, conditional branch, jump register)
(c) Jump / Call (J-type), e.g. JUMP end
    Bits:    31-26   25-0
    Field:   Op      offset added to PC
    (jump, jump and link, trap and return from exception)
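
The three formats amount to shifting and masking fields into a 32-bit word. The sketch below packs and unpacks them under a few assumptions: everything below bit 11 of the R-type is treated as one function field, the immediate and offset are taken to be two's-complement values, and the opcode and function numbers used in the examples are made-up placeholders, not the real DLX encodings.

```python
def check_field(value, bits):
    """Raise if an unsigned field value does not fit in the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value


def sign_extend(value, bits):
    """Interpret a bits-wide field as a two's-complement integer."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def encode_r(op, rs1, rs2, rd, func):
    """R-type: Op[31:26] rs1[25:21] rs2[20:16] rd[15:11] function[10:0]."""
    return (check_field(op, 6) << 26 | check_field(rs1, 5) << 21
            | check_field(rs2, 5) << 16 | check_field(rd, 5) << 11
            | check_field(func, 11))


def encode_i(op, rs1, rd, imm):
    """I-type: Op[31:26] rs1[25:21] rd[20:16] immediate[15:0]."""
    return (check_field(op, 6) << 26 | check_field(rs1, 5) << 21
            | check_field(rd, 5) << 16 | (imm & 0xFFFF))


def encode_j(op, offset):
    """J-type: Op[31:26] offset[25:0]."""
    return check_field(op, 6) << 26 | (offset & 0x3FFFFFF)


def decode_i(word):
    """Split an I-type word into (op, rs1, rd, signed immediate)."""
    return (word >> 26, (word >> 21) & 0x1F, (word >> 16) & 0x1F,
            sign_extend(word & 0xFFFF, 16))


def decode_j(word):
    """Split a J-type word into (op, signed offset)."""
    return word >> 26, sign_extend(word & 0x3FFFFFF, 26)
```

With the placeholder opcode 0x0A, `encode_i(0x0A, 2, 1, 3)` gives the word 0x28410003, and `decode_i` recovers the four fields from it. Because every instruction is one 32-bit word with the opcode in the same place, decoding needs no length calculation.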

Intel 80x86 Integer Registers

X86 Operand Types
 x86 instructions typically have two operands, where one operand is both a source and a destination operand.
 Possible combinations include
    Source/destination type    Second source type
    Register                   Register
    Register                   Immediate
    Register                   Memory
    Memory                     Register
    Memory                     Immediate
 No memory-memory or immediate-immediate
 Immediate can be 8, 16, or 32 bits

Intel 80x86 Floating Point Registers

80x86 Instructions
 Data movement
(move, push, pop)
 Arithmetic and logic
(logic ops, tests CCs, shifts, integer and decimal arithmetic)
 Control flow
(branches, jumps, calls, returns)
 String instructions
(move and compare)
 FP data movement
(load, load const., store)
 Arithmetic instructions
(add, subtract, multiply, divide, square root, absolute value)
 Comparisons
(Result to Flag)
 Transcendental functions
(sin, cos, log, etc.)
80x86 Instruction Format
 Instruction sizes vary from 1 to 17 bytes

Instruction Set 8088 / 8086 CPU
FORMATS
1. One Byte
The instructions have implied data or register operands. The least significant three bits specify the register, if any.
2. Register to Register
Two-byte instruction. The first byte contains the opcode followed by the direction (D) and width (W) bits; the second byte contains the MOD, REG (second register) and R/M fields. The MOD field is 11.
3. Register to / from Memory without displacement
NOTE: Bit 1 of the first byte (D) gives the direction: 0 means the REG field of byte 2 is the source, 1 means the REG field of byte 2 is the destination.
Bit 0 of the first byte (W) specifies whether the data is 8-bit (0) or 16-bit (1).
The R/M field specifies one of 8 registers when MOD is 11, and otherwise selects how the memory address is formed. The MOD field is 11 for register, 00 for memory without displacement, 01 for memory with 8-bit displacement and 10 for memory with 16-bit displacement.
4. Register to / from Memory with Displacement
One or two additional bytes specify the displacement.
5. Immediate operand to Register
In this instruction, 7 bits of the first byte and bits 3-5 of the second byte specify the opcode. The last two bytes specify the data.
6. Immediate Operand to Memory with 16-bit Displacement
The first two bytes specify the opcode, MOD and R/M as before, followed by two bytes of displacement and two bytes of data.
Significance of OPCODE fields
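
The role of the D, W, MOD, REG and R/M fields can be illustrated by decoding one instruction family. The sketch below handles only the 8086 register/memory MOV whose first byte is 100010dw; the function and table names are hypothetical, and segment overrides and all other opcodes are ignored.

```python
# Names selected by the 3-bit REG and R/M fields.
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
RM_BASE = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def signed(value, bits):
    """Interpret an unsigned field of the given width as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def decode_mov(code):
    """Decode one MOV with opcode 100010dw; return (text, length in bytes)."""
    first, second = code[0], code[1]
    if first >> 2 != 0b100010:
        raise ValueError("not a 100010dw MOV")
    d = (first >> 1) & 1        # direction: 1 means REG is the destination
    w = first & 1               # width: 0 = 8-bit, 1 = 16-bit
    mod = second >> 6
    reg = (second >> 3) & 0b111
    rm = second & 0b111
    names = REG16 if w else REG8

    if mod == 0b11:                          # register operand
        operand, length = names[rm], 2
    elif mod == 0b00 and rm == 0b110:        # special case: direct address
        operand, length = f"[{code[2] | (code[3] << 8):04X}h]", 4
    elif mod == 0b00:                        # memory, no displacement
        operand, length = f"[{RM_BASE[rm]}]", 2
    elif mod == 0b01:                        # memory, 8-bit displacement
        operand, length = f"[{RM_BASE[rm]}{signed(code[2], 8):+d}]", 3
    else:                                    # memory, 16-bit displacement
        disp = signed(code[2] | (code[3] << 8), 16)
        operand, length = f"[{RM_BASE[rm]}{disp:+d}]", 4

    dst, src = (names[reg], operand) if d else (operand, names[reg])
    return f"MOV {dst}, {src}", length
```

For instance, the bytes 89 D8 have D = 0, W = 1, MOD = 11, REG = 011 (BX) and R/M = 000 (AX), so they decode to MOV AX, BX. The instruction length is only known after the MOD field has been examined, which is what makes variable-size encodings harder to decode.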

Graphics and Multimedia Instruction Set Extensions
 Several companies have extended their computers’ instruction sets to support graphics and multimedia applications.
 Intel’s MMX Technology
 Intel’s Internet Streaming SIMD Extensions
 AMD’s 3DNow! Technology
 Sun’s Visual Instruction Set
 Motorola’s and IBM’s AltiVec Technology
 These extensions improve the performance of
 Computer-aided design
 Internet applications
 Computer visualization
 Video games
 Speech recognition

MMX Instructions
 MMX Technology adds 57 new instructions to
the x86 architecture (Reference article on PII MMX)
 Some of these instructions include
 PADD(b, w, d) Packed addition
 PSUB(b, w, d) Packed subtraction
 PCMPEQ(b, w, d) Packed compare equal
 PMULLw Packed word multiply low
 PMULHw Packed word multiply high
 PMADDwd Packed word multiply-add
 PSRL(w, d, q) Packed shift right logical
 PACKSS(wb, dw) Pack data
 PUNPCK(h, l)(bw, wd, dq) Unpack data
 PAND, POR, PXOR Packed logical operations

MMX Data Types
 MMX Technology supports operations on the following 64-bit integer data types.
 Packed byte (eight 8-bit elements)
 Packed word (four 16-bit elements)
 Packed double word (two 32-bit elements)
 Packed quad word (one 64-bit element)

SIMD Operations
MMX Technology allows a Single Instruction to work on Multiple pieces of Data (SIMD).
Example: PADD[W]: Packed add word
       A3       A2       A1       A0
  +    B3       B2       B1       B0
  =  A3+B3    A2+B2    A1+B1    A0+B0
 4 parallel adds are performed on 16-bit elements.
 Most MMX instructions only require a single cycle.
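
The lane-wise behaviour of a packed word add can be modelled in software. This is only a behavioural sketch of the wrap-around form: a 64-bit MMX register is represented as a Python integer, lane 0 is taken to be the least significant 16 bits, and the helper names are assumptions.

```python
MASK16 = 0xFFFF


def unpack_words(quad):
    """Split a 64-bit value into four 16-bit lanes, lowest lane first."""
    return [(quad >> (16 * i)) & MASK16 for i in range(4)]


def pack_words(words):
    """Combine four 16-bit lanes (lowest lane first) into a 64-bit value."""
    quad = 0
    for i, word in enumerate(words):
        quad |= (word & MASK16) << (16 * i)
    return quad


def paddw(a, b):
    """Packed add word: four independent 16-bit adds with wrap-around."""
    sums = [x + y for x, y in zip(unpack_words(a), unpack_words(b))]
    return pack_words(sums)   # pack_words masks each lane, so carries are dropped
```

The key point is that a carry out of one lane never propagates into the next: adding 1 to a lane holding 0xFFFF gives 0 in that lane and leaves its neighbour unchanged.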

Saturating Arithmetic
 Both wrap-around and saturating ADD instructions are supported.
 With saturating arithmetic, results that overflow are clamped to the largest representable value (and results that underflow to the smallest) instead of wrapping around.
Examples of both types:
PADD[W]: Packed wrap-around add
PADDUS[W]: Packed unsigned saturating add
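
The difference between the two kinds of add shows up only when a lane overflows. A minimal sketch on single 16-bit lanes, with hypothetical helper names and lists standing in for packed registers:

```python
def add_wrap16(a, b):
    """Wrap-around add: keep only the low 16 bits of the sum."""
    return (a + b) & 0xFFFF


def add_sat_u16(a, b):
    """Unsigned saturating add: clamp the sum at 0xFFFF."""
    return min(a + b, 0xFFFF)


def add_sat_s16(a, b):
    """Signed saturating add: clamp the sum to the range -32768..32767."""
    return max(-32768, min(32767, a + b))


def packed(op, xs, ys):
    """Apply a lane operation to each pair of lanes independently."""
    return [op(x, y) for x, y in zip(xs, ys)]
```

For a lane holding 0xFFFF plus 1, the wrap-around add gives 0 while the unsigned saturating add stays at 0xFFFF. In pixel arithmetic this is the difference between a bright pixel turning black and a bright pixel staying at full brightness.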

Pack and Unpack Instructions
 Pack and unpack instructions provide conversion between standard data types and packed data types
PACKSS[DW]: Pack with signed saturation, double words to words
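
A behavioural sketch of the pack operation, assuming lists of signed integers stand in for the packed registers (lowest lane first) and that the destination's elements fill the low half of the result; the function names are illustrative.

```python
def saturate_s16(value):
    """Clamp a signed value to the 16-bit range -32768..32767."""
    return max(-32768, min(32767, value))


def packssdw(dest_dwords, src_dwords):
    """Pack two pairs of signed 32-bit values into four signed 16-bit values.

    Values that do not fit in 16 bits are saturated rather than truncated.
    """
    return [saturate_s16(v) for v in list(dest_dwords) + list(src_dwords)]
```

Saturation matters here because narrowing by simple truncation would turn a large positive value such as 100000 into an unrelated 16-bit pattern, whereas the pack instruction pins it at 32767.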

Multiply-Add Operations
 Many graphics applications require multiply-accumulate operations
 Vector Dot Products
 Matrix Multiplication
 Fast Fourier Transforms (FFTs)
 Filter implementations
PMADDWD: Packed multiply-add word to double

Vector Dot Product of two 8-element vectors
 A dot product of two 8-element vectors can be performed using 9 MMX instructions
    Partial sums:   [ 0 | a0*c0+..+a3*c3 ]   [ 0 | a4*c4+..+a7*c7 ]
    Final sum:      a0*c0+..+a7*c7
With MMX: 9 instructions
    2 loads for one of the vectors
    (the other vector is loaded by PMADD)
    2 PMADDs
    2 PADDs
    2 shifts (if required to fix precision)
    1 store
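
How the multiply-add and packed-add steps combine into a dot product can be traced in a short simulation. This is a sketch under assumptions: lists of signed integers stand in for MMX registers (lowest lane first), the helper names are made up, and the single shift shown is used to bring the upper partial sum down for the final add, so the sequence illustrates the idea rather than reproducing the exact nine-instruction listing.

```python
def to_s32(value):
    """Wrap an integer to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def pmaddwd(a_words, b_words):
    """Multiply four pairs of signed words, add adjacent products.

    Returns two signed 32-bit sums: [a0*b0 + a1*b1, a2*b2 + a3*b3].
    """
    p = [x * y for x, y in zip(a_words, b_words)]
    return [to_s32(p[0] + p[1]), to_s32(p[2] + p[3])]


def paddd(x, y):
    """Packed add of two 32-bit lanes with wrap-around."""
    return [to_s32(u + v) for u, v in zip(x, y)]


def dot8(a, c):
    """Dot product of two 8-element vectors of 16-bit values."""
    low = pmaddwd(a[0:4], c[0:4])     # first PMADD: elements 0..3
    high = pmaddwd(a[4:8], c[4:8])    # second PMADD: elements 4..7
    partial = paddd(low, high)        # first PADD: two partial sums
    shifted = [partial[1], 0]         # shift the upper lane down by 32 bits
    total = paddd(partial, shifted)   # second PADD: lane 0 holds the full sum
    return total[0]
```

Each PMADD replaces four multiplies and two adds of the scalar version, which is where most of the reduction from 40 instructions comes from.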

Vector Dot Product of two 8-element vectors
Without MMX: 40 instructions
    16 loads
    8 multiplies
    8 shifts
    7 adds
    1 store

MMX Technology Summary
 MMX technology extends the Intel x86 architecture to improve the performance of multimedia and graphics applications.
 Most MMX instructions can be executed in one clock cycle, so the performance improvement will be more dramatic than the simple ratio of instruction counts.
 It provides a speedup of 1.5 to 2.0 for certain applications.
 It increases the chip area by only about 5%.

MMX Technology Summary
 MMX instructions are hand-coded in assembly or implemented as libraries to achieve high performance.
 MMX data types use the x86 floating point registers
 Makes it easy to handle context switches
 Makes it hard to perform MMX and floating point instructions at the same time

Internet Streaming SIMD Extensions
 ISSE introduced eight 128-bit data registers (called XMM registers)
 In 64-bit mode, sixteen 128-bit XMM registers (XMM0 to XMM15) are available
 The 128-bit packed single-precision floating-point data type allows four single-precision operations to be performed simultaneously
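
The packed single-precision type can be pictured as four 32-bit floats processed in lockstep. The sketch below models one packed add in software, assuming a list of four Python floats stands in for an XMM register; `to_f32` and `addps` are illustrative names, and rounding to single precision is emulated with the standard `struct` module.

```python
import struct


def to_f32(x):
    """Round a Python float (double precision) to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def addps(a, b):
    """Packed single-precision add: four independent 32-bit float adds."""
    return [to_f32(x + y) for x, y in zip(a, b)]
```

Each lane is rounded to 32 bits on its own, so a lane holding 16777216.0 plus 1.0 stays at 16777216.0 (the exact sum is not representable in single precision) without affecting the other three lanes.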

ISSE Data Type
 ISSE extensions introduced one new data type
 128-Bit Packed Single-Precision Floating-Point Data Type
 SSE 2 introduced five data types

Internet Streaming SIMD Extensions
 Intel’s Internet Streaming SIMD Extensions (ISSE)

 Help improve the performance of video and 3D applications
 70 new instructions beyond MMX Technology
 Adds new 128-bit registers
 Provide the ability to perform parallel floating point operations
 Four parallel operations on 32-bit numbers
 Reciprocal and reciprocal square root instructions (normalization)
 Packed average instruction – Motion compensation
 Provide data pre-fetch instructions
 Make certain applications 1.5 to 2.0 times faster.
