
Advanced Computer Architecture

Bahria Summer 2010


Instructor: Shaftab Ahmed

Lecture # 3
Instruction Set Architecture

06/23/20 ACA Spring 2011 Bahria Shaftab Ahmed 1


Instruction Set Architecture

• The instruction set architecture (ISA) is based on the structure of a computer, i.e. the description of the CPU in terms of its registers, addressability, and arithmetic, control, and store operations.
• An assembly or machine language programmer must understand the ISA of the target processor to program for it.



Instruction Set Architecture

• Programs written in any high-level language are eventually converted to assembly language, whose instructions are written in the mnemonics of the instruction set.
• The assembler converts these into machine language before execution.

High-level language code: C, C++, Java, Fortran
        |  compiler
        v
Assembly language code: architecture-specific statements
        |  assembler
        v
Machine language code: architecture-specific bit patterns

software
---------- instruction set ----------
hardware



ISA Metrics
• Orthogonality
   - All operand modes are available with any data type or instruction type
• Completeness
   - Support for a wide range of operations and target applications
• Regularity
   - No overloading of the meanings of instruction fields
• Streamlined
   - Resource needs are easily determined
• Ease of assembly language programming
• Ease of implementation



Instruction Set Design Issues

• Instruction set design issues include:
   - Where are operands stored?
        registers, memory, stack, accumulator
   - How many explicit operands are there?
        0, 1, 2, or 3
   - How is the operand location specified?
        register, immediate, indirect, . . .
   - What type and size of operands are supported?
        byte, int, float, double, string, vector, . . .
   - What operations are supported?
        add, sub, mul, move, compare, . . .



Evolution of Instruction Sets

Single Accumulator (EDSAC 1950)
        |
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
        |
Separation of Programming Model from Implementation
        |-- High-level Language Based (B5000 1963)
        |-- Concept of a Family (IBM 360 1964)
        |
General Purpose Register Machines
        |-- Complex Instruction Sets (VAX, Intel 8086 1977-80)
        |-- Load/Store Architecture (CDC 6600, Cray 1 1963-76)
                |
                RISC (MIPS, SPARC, 88000, IBM RS6000, . . . 1987+)
Classifying ISAs

Accumulator (before 1960):
    1 address    add A             acc <- acc + mem[A]

Stack (1960s to 1970s):
    0 address    add               tos <- tos + next

Memory-Memory (1970s to 1980s):
    2 address    add A, B          mem[A] <- mem[A] + mem[B]
    3 address    add A, B, C       mem[A] <- mem[B] + mem[C]

Register-Memory (1970s to present):
    2 address    add R1, A         R1 <- R1 + mem[A]
                 load R1, A        R1 <- mem[A]

Register-Register (Load/Store) (1960s to present):
    3 address    add R1, R2, R3    R1 <- R2 + R3
                 load R1, R2       R1 <- mem[R2]
                 store R1, R2      mem[R1] <- R2
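
One way to compare these classes is to compute C = A + B on each of them and count instructions. The following is a minimal Python sketch of toy machines, not of any real instruction set: the function names, the dictionary-based memory, and the use of direct addresses in load and store are all assumptions made for illustration.

```python
def run_accumulator(mem):
    """1-address machine: one operand is implicit in the accumulator."""
    acc = mem["A"]                 # load A
    acc = acc + mem["B"]           # add B      acc <- acc + mem[B]
    mem["C"] = acc                 # store C
    return 3                       # instructions executed


def run_stack(mem):
    """0-address machine: add pops two values and pushes their sum."""
    stack = []
    stack.append(mem["A"])                     # push A
    stack.append(mem["B"])                     # push B
    stack.append(stack.pop() + stack.pop())    # add      tos <- tos + next
    mem["C"] = stack.pop()                     # pop C
    return 4


def run_memory_memory(mem):
    """3-address memory-memory machine: every operand is a memory address."""
    mem["C"] = mem["A"] + mem["B"]             # add C, A, B
    return 1


def run_load_store(mem):
    """Register-register machine: only load and store touch memory."""
    regs = {}
    regs["R1"] = mem["A"]                      # load R1, A
    regs["R2"] = mem["B"]                      # load R2, B
    regs["R3"] = regs["R1"] + regs["R2"]       # add R3, R1, R2
    mem["C"] = regs["R3"]                      # store R3, C
    return 4


def compare_models(a, b):
    """Return {model name: (value of C, instruction count)} for C = A + B."""
    models = [("accumulator", run_accumulator), ("stack", run_stack),
              ("memory-memory", run_memory_memory), ("load/store", run_load_store)]
    results = {}
    for name, run in models:
        mem = {"A": a, "B": b, "C": 0}
        count = run(mem)
        results[name] = (mem["C"], count)
    return results
```

Calling `compare_models(2, 3)` gives the same sum on every model with different instruction counts. Instruction count alone says nothing about instruction size or memory traffic, which is where the load/store design makes its trade.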
Types of Addressing Modes (VAX)

Addressing Mode         Example                 Action
1. Register direct      Add R4, R3              R4 <- R4 + R3
2. Immediate            Add R4, #3              R4 <- R4 + 3
3. Displacement         Add R4, 100(R1)         R4 <- R4 + M[100 + R1]
4. Register indirect    Add R4, (R1)            R4 <- R4 + M[R1]
5. Indexed              Add R4, (R1 + R2)       R4 <- R4 + M[R1 + R2]
6. Direct               Add R4, (1000)          R4 <- R4 + M[1000]
7. Memory indirect      Add R4, @(R3)           R4 <- R4 + M[M[R3]]
8. Autoincrement        Add R4, (R2)+           R4 <- R4 + M[R2]
                                                R2 <- R2 + d
9. Autodecrement        Add R4, -(R2)           R2 <- R2 - d
                                                R4 <- R4 + M[R2]
10. Scaled              Add R4, 100(R2)[R3]     R4 <- R4 + M[100 + R2 + R3*d]

(d is the size of the operand in bytes)
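
The Action column can be read as a recipe for fetching the second operand. The Python sketch below follows that recipe for each of the ten modes; the function name `effective_operand`, the keyword field names, and the dictionary model of registers and memory are assumptions for illustration. Autodecrement is modeled as decrement-before-use, which is the VAX convention.

```python
def effective_operand(mode, regs, mem, d=4, **f):
    """Return the second operand of `Add R4, <operand>` for the given mode.

    regs maps register names to values, mem maps addresses to values,
    d is the operand size in bytes, and f carries the mode-specific
    fields (reg, base, index, disp, imm, addr). The autoincrement and
    autodecrement modes update regs in place.
    """
    if mode == "register":
        return regs[f["reg"]]
    if mode == "immediate":
        return f["imm"]
    if mode == "displacement":
        return mem[f["disp"] + regs[f["base"]]]
    if mode == "register_indirect":
        return mem[regs[f["base"]]]
    if mode == "indexed":
        return mem[regs[f["base"]] + regs[f["index"]]]
    if mode == "direct":
        return mem[f["addr"]]
    if mode == "memory_indirect":
        return mem[mem[regs[f["base"]]]]
    if mode == "autoincrement":
        value = mem[regs[f["base"]]]     # use the address, then advance it
        regs[f["base"]] += d
        return value
    if mode == "autodecrement":
        regs[f["base"]] -= d             # step back first, then use the address
        return mem[regs[f["base"]]]
    if mode == "scaled":
        return mem[f["disp"] + regs[f["base"]] + regs[f["index"]] * d]
    raise ValueError(f"unknown addressing mode: {mode}")
```

For example, with `regs = {"R2": 300, "R3": 2}` and `mem = {408: 9}`, the scaled mode `100(R2)[R3]` reads address 100 + 300 + 2*4 = 408.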



Types of Addressing Modes (Intel Instruction Set)

Register                     Instructions that manipulate data held in registers

Immediate                    Uses an immediate value contained within the instruction

Direct                       Transfers data between a register and a memory location whose address is given in the instruction

Register Indirect            Transfers a data byte/word to or from a location whose address is specified in a register, e.g. [BX].
                             BYTE PTR, WORD PTR, or DWORD PTR specifies the size of the data.

Base + Index Indirect        MOV AX, [BX+SI]
Relative                     MOV AX, [BX+4]
Base relative plus index     MOV AX, [BX+SI+4]
Scaled Index                 MOV EAX, [EBX+4*ECX]   (32-bit addressing, 80386 and later)



Instruction Encoding

• Variable Size
   - Instruction length varies based on the opcode and address specifiers
   - For example, VAX instructions vary between 1 and 53 bytes, while x86 instructions vary between 1 and 17 bytes
   - Good code density, but difficult to decode and pipeline
• Fixed Size
   - Only a single size for all instructions
   - For example, DLX, MIPS, PowerPC, and SPARC all have 32-bit instructions
   - Not as good code density, but easier to decode and pipeline
• Hybrid Size
   - Multiple format lengths, specified by the opcode
   - For example, IBM 360/370
   - Compromise between code density and ease of decoding



DLX Architecture

• Introduced by Hennessy and Patterson in 1990
• Derived from many different instruction set architectures from MIPS, Sun, IBM, Intel, HP, AMD, etc.
• DLX is a typical RISC architecture
   - 32-bit fixed-length instructions
   - 3 instruction formats
   - Load/store architecture
   - Simple branch conditions (no condition codes)
• DLX registers
   - 32 32-bit general-purpose registers (R0 = 0)
   - 32 32-bit (or 16 64-bit) floating-point registers
   - Special-purpose registers (e.g., FP Status and PC)



DLX Design Decisions

• DLX is based on the following design decisions:
   - Use general-purpose registers with a load-store architecture
   - Support commonly used addressing modes
        displacement, immediate, and register deferred
   - Support simple instructions that occur frequently
        load, store, add, subtract, move, and, shift, compare equal, branch, jump, call, and return
   - Support commonly required data sizes
        8-bit (byte), 16-bit (half word), and 32-bit (word) integers
        32-bit (float) and 64-bit (double) floating point
   - Use fixed-length instructions that are easy to decode
   - Provide plenty of general-purpose registers and separate floating-point registers



DLX Instruction Formats

(a) Register-Register (R-type)      ADD R1, R2, R3

    Bits:    31-26    25-21    20-16    15-11    10-0
    Field:   Op       rs1      rs2      rd       function

    (ALU register-register operations, read/write of special registers, and moves)

(b) Register-Immediate (I-type)     SUB R1, R2, #3

    Bits:    31-26    25-21    20-16    15-0
    Field:   Op       rs1      rd       immediate

    (ALU immediate operations, loads and stores, conditional branch, jump register)

(c) Jump / Call (J-type)            JUMP end

    Bits:    31-26    25-0
    Field:   Op       offset added to PC

    (jump, jump and link, trap, and return from exception)
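
The three formats can be packed into 32-bit words with shifts and masks. The Python sketch below follows the field boundaries shown above and treats the R-type function field as bits 10 to 0. The function names and the opcode and function values used in the example (`OP_RTYPE`, `FUNC_ADD`, `OP_SUBI`) are placeholders for illustration, not values taken from this lecture.

```python
OP_RTYPE = 0x00   # placeholder opcode for R-type instructions (assumption)
FUNC_ADD = 0x20   # placeholder function code for ADD (assumption)
OP_SUBI = 0x0A    # placeholder opcode for subtract-immediate (assumption)


def encode_r(op, rs1, rs2, rd, func):
    """R-type: op[31:26] rs1[25:21] rs2[20:16] rd[15:11] function[10:0]."""
    return ((op & 0x3F) << 26 | (rs1 & 0x1F) << 21 | (rs2 & 0x1F) << 16
            | (rd & 0x1F) << 11 | (func & 0x7FF))


def encode_i(op, rs1, rd, imm):
    """I-type: op[31:26] rs1[25:21] rd[20:16] immediate[15:0]."""
    return ((op & 0x3F) << 26 | (rs1 & 0x1F) << 21 | (rd & 0x1F) << 16
            | (imm & 0xFFFF))


def encode_j(op, offset):
    """J-type: op[31:26] offset[25:0]; the offset is added to the PC."""
    return (op & 0x3F) << 26 | (offset & 0x3FFFFFF)


def decode_i(word):
    """Recover the I-type fields, sign-extending the 16-bit immediate."""
    imm = word & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return {"op": (word >> 26) & 0x3F, "rs1": (word >> 21) & 0x1F,
            "rd": (word >> 16) & 0x1F, "imm": imm}


# ADD R1, R2, R3: rd = 1, rs1 = 2, rs2 = 3
add_word = encode_r(OP_RTYPE, rs1=2, rs2=3, rd=1, func=FUNC_ADD)
# SUB R1, R2, #3: rd = 1, rs1 = 2, immediate = 3
sub_word = encode_i(OP_SUBI, rs1=2, rd=1, imm=3)
```

Because every field sits at a fixed bit position, decoding is a handful of shifts and masks, which is the practical meaning of "fixed length instructions that are easy to decode".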


Intel 80x86 Integer Registers



X86 Operand Types

• x86 instructions typically have two operands, where one operand is both a source and a destination.
• Possible combinations include:

    Source/destination type    Second source type
    Register                   Register
    Register                   Immediate
    Register                   Memory
    Memory                     Register
    Memory                     Immediate

• No memory-memory or immediate-immediate combinations
• Immediates can be 8, 16, or 32 bits
Intel 80x86 Floating Point Registers



80x86 Instructions
• Data movement (move, push, pop)
• Arithmetic and logic (logic ops, tests of condition codes, shifts, integer and decimal arithmetic)
• Control flow (branches, jumps, calls, returns)
• String instructions (move and compare)
• FP data movement (load, load constant, store)
• FP arithmetic instructions (add, subtract, multiply, divide, square root, absolute value)
• FP comparisons (result to flags)
• Transcendental functions (sin, cos, log, etc.)
80x86 Instruction Format

• Instruction sizes vary from 1 to 17 bytes



Instruction Set 8088 / 8086 CPU

FORMATS
1. One Byte
   The instruction has implied data or register operands. The least significant three bits specify the register, if any.
2. Register to Register
   A two-byte instruction. The first byte contains the opcode followed by the direction (D) and width (W) bits; the second byte contains the MOD, REG, and R/M fields. The MOD field is 11.

3. Register to / from Memory without displacement

NOTE: Bit D1 of the first byte gives the direction: 0 means the REG field of byte 2 is the source, 1 means it is the destination.
Bit D0 of the first byte (the W bit) specifies whether the data is 8-bit (0) or 16-bit (1).
The R/M field specifies one of 8 registers. The MOD field is 11 for register, 00 for memory without displacement, 01 for memory with 8-bit displacement, and 10 for memory with 16-bit displacement.
4. Register to / from Memory with Displacement
   One or two additional bytes specify the displacement.

5. Immediate operand to Register

   In this instruction, seven bits of the first byte and bits 3-5 of the second byte specify the opcode. The last two bytes specify the data.



6. Immediate Operand to Memory with 16-bit Displacement
   The first two bytes specify the opcode, MOD, and R/M fields as before, followed by two bytes of displacement and two bytes of data.
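
The D, W, MOD, REG, and R/M fields described in these formats can be pulled out of the first two instruction bytes with shifts and masks. The Python sketch below does this for the register and register/memory forms; the function name `decode_two_bytes` and the returned dictionary layout are assumptions for illustration. The register numbering is the standard 8086 one (000 = AX/AL through 111 = DI/BH).

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
MOD_MEANING = {
    0b00: "memory, no displacement",
    0b01: "memory, 8-bit displacement",
    0b10: "memory, 16-bit displacement",
    0b11: "register",
}


def decode_two_bytes(byte1, byte2):
    """Split opcode/D/W (byte 1) and MOD/REG/R/M (byte 2) into named fields."""
    d = (byte1 >> 1) & 1            # direction: 1 means REG is the destination
    w = byte1 & 1                   # width: 0 = 8-bit data, 1 = 16-bit data
    mod = (byte2 >> 6) & 0b11
    reg = (byte2 >> 3) & 0b111
    rm = byte2 & 0b111
    names = REG16 if w else REG8
    fields = {"opcode": byte1 >> 2, "d": d, "w": w, "mod": mod,
              "reg": names[reg], "rm": rm, "mode": MOD_MEANING[mod]}
    if mod == 0b00 and rm == 0b110:
        # 8086 special case: MOD = 00 with R/M = 110 is a direct 16-bit address
        fields["mode"] = "memory, direct 16-bit address"
    if mod == 0b11:
        # Register to register: R/M names the second register
        other = names[rm]
        dest, src = (names[reg], other) if d else (other, names[reg])
        fields["dest"], fields["src"] = dest, src
    return fields
```

For instance, the bytes 89 D8 decode to opcode 100010, D = 0, W = 1, MOD = 11, REG = BX, R/M = AX, which is `MOV AX, BX`.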

Significance of OPCODE fields



Graphics and Multimedia Instruction Set Extensions

• Several companies have extended their computers' instruction sets to support graphics and multimedia applications.
   - Intel's MMX Technology
   - Intel's Internet Streaming SIMD Extensions
   - AMD's 3DNow! Technology
   - Sun's Visual Instruction Set
   - Motorola's and IBM's AltiVec Technology
• These extensions improve the performance of
   - Computer-aided design
   - Internet applications
   - Computer visualization
   - Video games
   - Speech recognition



MMX Instructions
• MMX Technology adds 57 new instructions to the x86 architecture (reference: article on PII MMX)
• Some of these instructions include:

    PADD(b, w, d)         Packed addition
    PSUB(b, w, d)         Packed subtraction
    PCMPEQ(b, w, d)       Packed compare equal
    PMULLw                Packed word multiply low
    PMULHw                Packed word multiply high
    PMADDwd               Packed word multiply-add
    PSRL(w, d, q)         Packed shift right logical
    PACKSS(wb, dw)        Pack data
    PUNPCK(bw, wd, dq)    Unpack data
    PAND, POR, PXOR       Packed logical operations



MMX Data Types

• MMX Technology supports operations on the following 64-bit integer data types:

    Packed byte (eight 8-bit elements)
    Packed word (four 16-bit elements)
    Packed double word (two 32-bit elements)
    Packed quad word (one 64-bit element)



SIMD Operations

MMX Technology allows a Single Instruction to work on Multiple pieces of Data (SIMD).

Example: PADD[W]: Packed add word

      A3       A2       A1       A0
  +   B3       B2       B1       B0
  -------------------------------------
    A3+B3    A2+B2    A1+B1    A0+B0

• 4 parallel adds are performed on 16-bit elements.
• Most MMX instructions require only a single cycle.



Saturating Arithmetic
• Both wrap-around and saturating ADD instructions are supported.
• With saturating arithmetic, a result that overflows is set to the largest representable value, and a result that underflows is set to the smallest.

Below are examples of both types:

PADD[W]: Packed wrap-around add        PADDUS[W]: Packed unsigned saturating add
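
The difference between the two adds shows up only in lanes whose sum does not fit in 16 bits. The Python sketch below emulates both operations on a 64-bit integer that holds four 16-bit lanes. It models the arithmetic only; the helper names `lanes` and `pack_lanes` are assumptions for illustration.

```python
MASK16 = 0xFFFF


def lanes(q):
    """Split a 64-bit value into four 16-bit lanes, lowest lane first."""
    return [(q >> (16 * i)) & MASK16 for i in range(4)]


def pack_lanes(words):
    """Reassemble four 16-bit lanes (lowest first) into one 64-bit value."""
    q = 0
    for i, w in enumerate(words):
        q |= (w & MASK16) << (16 * i)
    return q


def paddw(a, b):
    """Wrap-around add: each lane keeps only the low 16 bits of its sum."""
    return pack_lanes([(x + y) & MASK16 for x, y in zip(lanes(a), lanes(b))])


def paddusw(a, b):
    """Unsigned saturating add: each lane is clamped at 0xFFFF."""
    return pack_lanes([min(x + y, MASK16) for x, y in zip(lanes(a), lanes(b))])


a = 0x0001_0002_0003_FFFF
b = 0x0001_0001_0001_0001
wrapped = paddw(a, b)       # lowest lane wraps from 0xFFFF to 0x0000
clamped = paddusw(a, b)     # lowest lane stays at 0xFFFF
```

In both cases no carry leaves a lane, so the overflow in the lowest lane does not disturb its neighbor. For pixel data the saturating form is usually what is wanted, since a bright value that overflows should stay bright rather than wrap to dark.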



Pack and Unpack Instructions

• Pack and unpack instructions provide conversion between standard data types and packed data types

PACKSS[DW]: Pack signed double words into packed words with signed saturation
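
A pack with signed saturation narrows each 32-bit element to 16 bits and clamps any value outside the signed 16-bit range. The Python sketch below models this on plain lists of signed integers, lowest element first; the list representation and the helper name `saturate_i16` are assumptions for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate_i16(value):
    """Clamp a signed integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def packssdw(dst_dwords, src_dwords):
    """Narrow two + two signed 32-bit values to four signed 16-bit values.

    The two destination elements fill the low half of the result and the
    two source elements fill the high half.
    """
    assert len(dst_dwords) == 2 and len(src_dwords) == 2
    return [saturate_i16(v) for v in dst_dwords + src_dwords]


packed = packssdw([70000, -5], [-70000, 12])
```

Here 70000 and -70000 do not fit in 16 bits, so they become 32767 and -32768, while -5 and 12 pass through unchanged.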



Multiply-Add Operations
• Many graphics applications require multiply-accumulate operations
   - Vector dot products
   - Matrix multiplication
   - Fast Fourier Transforms (FFTs)
   - Filter implementations

PMADDWD: Packed multiply-add word to double


Vector Dot Product of Two 8-Element Vectors
• A dot product of two 8-element vectors can be performed using 9 MMX instructions

    0    a0*c0+..+a3*c3        0    a4*c4+..+a7*c7

                a0*c0+..+a7*c7

With MMX: 9 instructions
    2 loads for one of the vectors (the other vector is loaded by PMADD)
    2 PMADDs
    2 PADDs
    2 shifts (if required to fix precision)
    1 store
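
The data flow behind this count can be traced with a small model of PMADDWD, which multiplies four pairs of 16-bit words and adds adjacent products into two 32-bit sums. The Python sketch below uses plain lists of signed integers, lowest element first; it does not model 32-bit overflow or the optional precision shifts, and the function names are assumptions for illustration.

```python
def pmaddwd(a_words, c_words):
    """Multiply four word pairs and add adjacent products into two sums."""
    assert len(a_words) == 4 and len(c_words) == 4
    p = [x * y for x, y in zip(a_words, c_words)]
    return [p[0] + p[1], p[2] + p[3]]


def dot8(a, c):
    """Dot product of two 8-element vectors built from two pmaddwd steps."""
    assert len(a) == 8 and len(c) == 8
    low = pmaddwd(a[:4], c[:4])      # first PMADD:  a0*c0+a1*c1, a2*c2+a3*c3
    high = pmaddwd(a[4:], c[4:])     # second PMADD: a4*c4+a5*c5, a6*c6+a7*c7
    partial = [low[0] + high[0], low[1] + high[1]]   # first packed add
    return partial[0] + partial[1]   # second add, after moving the high sum down


a = [1, 2, 3, 4, 5, 6, 7, 8]
c = [8, 7, 6, 5, 4, 3, 2, 1]
result = dot8(a, c)
```

Each `pmaddwd` call replaces four multiplies and two adds of the scalar version, which is where most of the instruction savings come from.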



Vector Dot Product of Two 8-Element Vectors
Without MMX: 40 instructions
    16 loads
    8 multiplies
    8 shifts
    7 adds
    1 store

    0    a0*c0+..+a3*c3        0    a4*c4+..+a7*c7

                a0*c0+..+a7*c7



MMX Technology Summary

• MMX technology extends the Intel x86 architecture to improve the performance of multimedia and graphics applications.
• Most MMX instructions can be executed in one clock cycle, so the performance improvement will be more dramatic than the simple ratio of instruction counts.
• It provides a speedup of 1.5 to 2.0 for certain applications.
• It increases the chip area by only about 5%.



MMX Technology Summary

• MMX instructions are hand-coded in assembly or implemented as libraries to achieve high performance.
• MMX data types use the x86 floating-point registers
   - Makes it easy to handle context switches
   - Makes it hard to perform MMX and floating-point instructions at the same time



Internet Streaming SIMD Extensions

• ISSE introduced eight 128-bit data registers (called XMM registers)
   - In 64-bit mode, sixteen 128-bit XMM registers are available
• The 128-bit packed single-precision floating-point data type allows four single-precision operations to be performed simultaneously



ISSE Data Type
• ISSE introduced one new data type
   - 128-bit packed single-precision floating-point data type
• SSE2 introduced five data types



Internet Streaming SIMD Extensions

• Intel's Internet Streaming SIMD Extensions (ISSE)
   - Help improve the performance of video and 3D applications
   - 70 new instructions beyond MMX Technology
   - Add new 128-bit registers
   - Provide the ability to perform parallel floating-point operations
        Four parallel operations on 32-bit numbers
        Reciprocal and reciprocal square root instructions (for normalization)
        Packed average instruction (for motion compensation)
   - Provide data prefetch instructions
   - Make certain applications 1.5 to 2.0 times faster
