09 Intelx86Architecture

4/27/2016
EEL 4768: Computer Architecture
Intel x86 Architecture
Department of Electrical and Computer Engineering
University of Central Florida
Instructor: Zakhia (Zak) Abichar
Intel x86
• Intel x86 refers to a family of architectures
• It started with the Intel 8086 CPU that was introduced in 1978
• Later CPUs evolved the architecture in a backward‐compatible way
• A big concept in Intel x86 architecture is binary compatibility
• This means a customer can buy a new Intel CPU and continue to run the
old software (executable files)
• Eg: if we buy a new computer with a new Windows, we can still run
previous software
“Object code created for processors released as early as 1978 still
executes on the latest processors in the Intel 64 and IA‐32
architecture families”
‐ Intel software developer’s manual
Intel x86
• Nowadays, Intel x86 is prevalent and it’s used in Intel CPUs (laptop,
desktop, servers)
License to build x86 CPUs
• Intel x86 is licensed to two other companies that can build x86 CPUs
• AMD offers x86 CPUs for the PC market
• Another company, called VIA, is licensed to build x86 CPUs but isn’t a
major player in the market
Timeline of Intel x86
1978:
• Intel offers the 8086 CPU
• It’s assembly‐language compatible with the previous 8080 (8‐bit)
CPU; code may need to be re‐assembled
• The 8086 is 16‐bit architecture (registers are 16‐bit)
• It supports 8‐bit and 16‐bit operations
• The registers have dedicated use (not general‐purpose
registers)
• Address is 20‐bit (supports up to 1 MB of RAM)
• External data bus is 16‐bit
• Intel 8088 CPU is similar to 8086 but with 8‐bit data bus
• Introduces the segmented memory model
1980:
• Intel announces 8087 floating‐point coprocessor
• Extends 8086 with 60 floating‐point (FP) instructions
• 8087 is implemented off the CPU on a different chip
• 8087 is based on a stack implementation model
• The two operands are the two topmost stack positions
• A stack model is used likely since FP operations are
mathematical expressions (can be done well on a stack)
• Even though the registers are modeled as a stack, any register
can be accessed (read/written) if need be
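The stack model described above can be sketched in a few lines of Python. This is only an illustration, not 8087 code: the class `FpStack` and its methods are hypothetical names made up for the example. The only idea taken from the slide is that an operation uses the two topmost stack positions, while any position can still be read.

```python
class FpStack:
    """Toy model of a stack-based FP unit (hypothetical, for illustration)."""

    def __init__(self):
        self.regs = []                  # the top of the stack is the last element

    def push(self, value):
        self.regs.append(float(value))

    def _binary(self, op):
        # The two operands are the two topmost stack positions
        b = self.regs.pop()
        a = self.regs.pop()
        self.regs.append(op(a, b))

    def add(self):
        self._binary(lambda a, b: a + b)

    def mul(self):
        self._binary(lambda a, b: a * b)

    def peek(self, i=0):
        # Any position can still be read directly; 0 is the top
        return self.regs[-1 - i]


def evaluate_example():
    # Evaluate (1.5 + 2.5) * 3.0 in stack order
    s = FpStack()
    s.push(1.5)
    s.push(2.5)
    s.add()
    s.push(3.0)
    s.mul()
    return s.peek()
```

A mathematical expression maps to a sequence of pushes and operations with no register names, which is likely why the stack model suits FP code.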
1982:
• 80286 extends the address to 24‐bit
• It introduces an elaborate memory mapping and protection
model called ‘Protected Mode’
• It adds a few instructions to support the memory protection
model
• It has a mode to execute 8086 programs without change
• Some features of protected mode: segment‐limit checking,
read‐only, execute‐only options
• The 24‐bit address gives a physical address space of up to 16 MB
(each segment is still at most 64 KB)
1985:
• 80386 introduced as a 32‐bit architecture
• It has 32‐bit registers and 32‐bit address (up to 4 GB of RAM)
• New addressing modes are introduced
• New instructions are introduced to make the CPU nearly a
general‐purpose register machine
• Memory paging support is added in addition to segmented
memory
• It has a mode to execute 8086 programs without change
called Virtual 8086
• The pages have a fixed size of 4 KB
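With fixed 4 KB pages, the low 12 bits of an address are the offset inside the page and the remaining bits select the page. The sketch below shows that split in Python. The flat `page_table` dictionary is an assumption made for brevity (the real 80386 walks a multi‐level table), and the function names are hypothetical.

```python
PAGE_SIZE = 4 * 1024        # fixed 4 KB pages
OFFSET_BITS = 12            # log2(4096) = 12


def split_linear_address(addr):
    """Split an address into (page number, offset inside the page)."""
    page_number = addr >> OFFSET_BITS
    offset = addr & (PAGE_SIZE - 1)
    return page_number, offset


def translate(addr, page_table):
    """Translate an address using a hypothetical flat table:
    page number -> physical frame number."""
    page, offset = split_linear_address(addr)
    frame = page_table[page]
    return (frame << OFFSET_BITS) | offset
```

For example, address 0x12345 is page 0x12 with offset 0x345; if that page is mapped to frame 0x7, the physical address is 0x7345.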
1989‐1995:
(80486 in 1989)… (Pentium in 1993)… (Pentium Pro in 1995)…
• These machines aimed at improving the performance
• This is done by adding mechanisms in the CPU without
changing the architecture
• Only 4 user‐visible assembly instructions were added: three to
help with multiprocessing and a conditional move instruction
• Improvements are: kernel instructions, pipeline
implementation, branch prediction, pre‐fetching
• These make the CPU faster but the architecture (assembly
language environment) can remain the same
• These mechanisms are called the microarchitecture
1997:
• Updating Pentium and Pentium Pro with MMX (Multimedia
Extension)
• MMX allows parallel processing of data by one instruction
• The benefits are speed and fetching fewer instructions from the
memory
• MMX introduced 8 new registers, 64‐bit each
• An MMX register can be used as (2x32‐bit), (4x16‐bit) or (8x8‐bit)
• MMX introduced 57 new instructions
• Pentium II did not introduce any new instruction
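The lane layouts listed above (2x32‐bit, 4x16‐bit, 8x8‐bit) can be illustrated by treating a 64‐bit value as a list of lanes. This is a minimal Python sketch; the helper names are hypothetical, and it assumes each lane wraps around on overflow with no carry into the neighbouring lane.

```python
def split_lanes(reg64, lane_bits):
    """View a 64-bit value as 64 // lane_bits lanes; lane 0 is the lowest."""
    mask = (1 << lane_bits) - 1
    return [(reg64 >> (i * lane_bits)) & mask for i in range(64 // lane_bits)]


def join_lanes(lanes, lane_bits):
    """Pack a list of lanes back into one 64-bit value."""
    mask = (1 << lane_bits) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & mask) << (i * lane_bits)
    return value


def packed_add(a, b, lane_bits):
    """Add two 64-bit values lane by lane (each lane wraps on overflow)."""
    mask = (1 << lane_bits) - 1
    sums = [(x + y) & mask
            for x, y in zip(split_lanes(a, lane_bits), split_lanes(b, lane_bits))]
    return join_lanes(sums, lane_bits)
```

One call to `packed_add` with `lane_bits=16` stands for a single instruction that performs four 16‐bit additions.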
1999:
• Pentium III introduced an update to MMX called SSE (Streaming
SIMD Extensions), which adds 70 new instructions
• SSE introduces 8 registers, 128‐bit each, therefore, the ability
to process more data in parallel
• SSE introduces support for single‐precision FP
• SSE has cache pre‐fetch instructions
• SSE has a streaming store instruction that writes to memory
and bypasses the cache
SIMD (Single Instruction Multiple Data) refers to fetching one
instruction and applying it on multiple sets of operands. Some CPUs
are highly specialized SIMD machines. Intel provides MMX and SSE to
provide SIMD capability in its machines.
2001:
• Pentium 4 introduces SSE2 with 144 new instructions
• It adds the double‐precision floating‐point
• Most of the new 144 instructions are versions of SSE for the
double data type
• The compiler can use either the stack of the 8087 coprocessor
or the registers of MMX or SSE, which are treated as general‐
purpose
2003:
• AMD announced AMD64 to extend x86 to 64‐bit
• Registers and addresses support 64‐bit
• General‐purpose registers increase from 8 to 16
• Instruction prefix used to access the extra registers
• A new mode called Long Mode redefines the instructions to use
64‐bit data and addresses
• Long mode adds PC‐relative addressing mode
• AMD64 has Legacy Mode to run x86 programs
• AMD64 has Compatibility Mode to run OS in 64‐bit and
programs in x86 32‐bit
How come AMD improved the x86? More on this later…
2004:
• Intel adopts AMD64 and renames it Extended Memory 64
Technology (EM64T)
• SSE3 introduced with 13 new instructions that support
(complex arithmetic), (graphics operations on arrays of
structures), (video encoding), (floating‐point conversion)
• AMD adds SSE3 in its CPU
2006:
• SSE4 introduced and adds 54 new instructions
• They perform (sum of absolute differences), (dot product for
arrays of structures), (sign or zero extension of narrow data to
wider sizes), (population count)
2007:
• AMD announces SSE5 with 170 new instructions
• Among these, 46 instructions are three‐operand versions of base
instructions (eg: ADD R1, R2, R3)
2008:
• Intel announced the Advanced Vector Extensions (AVX), which expand
the SSE registers from 128‐bit to 256‐bit
• This introduces ~128 new instructions and redefines ~250
existing instructions
AMD, Intel and 64‐bit x86
AMD, Intel and x86
Why does Intel license AMD to build x86 CPUs?
• The PC started in the 1970s with Apple; the IBM PC (the
Windows PC we have today) came later
• Intel was interested in providing the CPU for the IBM PC
• IBM didn’t want to tie itself to one CPU supplier, but rather
preferred to have multiple suppliers
• The solution was Intel would license its CPUs to AMD so that
both can supply CPUs for the PC
• The executives of Intel and AMD knew each other since they
both worked at Fairchild Semiconductor which was a major
company at the time
AMD, Intel and x86
Intel and AMD’s cross license
• Intel and AMD have a cross‐license; each company can use
the other’s designs
• Therefore, AMD was in a position to extend Intel x86
Intel’s 64‐bit architecture
• Intel developed a 64‐bit architecture with HP, called IA‐64
• This materialized in the Itanium line of server CPUs
• Itanium is a VLIW CPU, where the compiler packages the
instructions that issue simultaneously
• The designers were looking to the far future of 64‐bit code;
Itanium doesn’t run 32‐bit code natively
• Instead, it translates 32‐bit x86 code at run time to Itanium
assembly instructions
• Therefore, the 32‐bit performance could suffer
• The compiler was a challenge since a VLIW compiler wasn’t
widespread (more research was needed) and it affects the
performance significantly in a VLIW CPU
• As a result Itanium’s real performance was lower than what
was projected
AMD, Intel and x86
AMD’s 64‐bit architecture
• AMD’s approach to 64‐bit was evolutionary (called AMD64)
• It supported the 32‐bit natively and provided 64‐bit extension
• This was good for the customers since most of the code at
that time was 32‐bit
• Microsoft supported both Itanium and AMD64
• Eventually, Microsoft dropped support for Itanium
• Intel then adopted AMD64
• Based on the cross‐license with AMD, this was feasible
AMD, Intel and x86
• This is the common terminology
Intel’s approach     AMD’s approach (adopted by Intel)
IA‐64                AMD64
Itanium              EM64T
                     x86‐64
                     x64
                     Intel 64
References
• AMD brought 64‐bit to x86 10 years ago today
http://www.theinquirer.net/inquirer/feature/2262881/amd-brought-64bit-to-x86-10-years-ago-today
• How AMD Beat Intel To 64‐Bits, And Intel Struck Back
http://www.forbes.com/sites/briancaulfield/2012/02/22/how-amd-beat-intel-to-64-bits-and-intel-bounced-back/#6522a96418e9
• What is Windows 64‐bit Itanium?
http://www.thewindowsclub.com/windows-64-bit-itanium
Registers
Registers
• In these slides, we’ll explore the 32‐bit x86 called IA‐32
• This is the register file of IA‐32
[Figure: the IA‐32 register file. AH and AL are 8‐bit registers, BX is a 16‐bit register, EDX is a 32‐bit register]
• There are multiple 8‐bit registers: AH, AL, BH, BL, CH, CL, DH, DL
• Terminology: AH stands for ‘A High’ and AL stands for ‘A Low’
• These are 16‐bit registers: AX, BX, CX, DX
• These are 32‐bit registers: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
• ESP is the ‘Stack Pointer’
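The register names above are different views of the same 32 bits: AX is the low half of EAX, and AH and AL are the two bytes of AX. The Python sketch below shows the overlap with bit masks. The function names are hypothetical and only mirror the register names.

```python
def ax(eax):
    """AX is the low 16 bits of EAX."""
    return eax & 0xFFFF


def ah(eax):
    """AH ('A High') is bits 8..15 of EAX."""
    return (eax >> 8) & 0xFF


def al(eax):
    """AL ('A Low') is bits 0..7 of EAX."""
    return eax & 0xFF


def write_al(eax, value):
    """Writing AL changes only the lowest byte of EAX."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)
```

With EAX = 0x12345678, AX is 0x5678, AH is 0x56 and AL is 0x78; writing 0xFF into AL gives EAX = 0x123456FF.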
Why are the Registers Overlapping?
• The registers are overlapping since the architecture was originally 16‐bit
• Later, it was extended to 32‐bit
• Code written for a 16‐bit architecture will use 8‐bit and 16‐bit registers
only (it doesn’t know the 32‐bit registers exist)
• Newer code written for IA‐32 will use the 32‐bit registers
• Therefore, the 32‐bit architecture is able to run 16‐bit Intel x86 code
[Figure: the overlapping registers; originally, only the 16‐bit part existed]
Why are the Registers Overlapping?
• In Intel 64, the registers become 64‐bit
• The number of general‐purpose registers increases to 16
Single Instruction Multiple Data
(SIMD)
Single Instruction Multiple Data (SIMD)
• SIMD is a concept where one instruction is applied on
multiple sets of operands
• Let’s say we want to add two arrays of four elements each
• The straightforward approach is to loop over the arrays and
add them element by element, using 4 additions
• Alternatively, a SIMD instruction can be applied on four
pairs of operands; therefore, the two arrays are added with one
instruction (without the need for a loop)
• SIMD provides large registers of size 64‐bit, 128‐bit or 256‐bit
• Therefore, we can load multiple variables in a register
• For the previous example, we can load an array of 4 variables
in one register
• Once the two arrays are loaded, one SIMD instruction adds
the two arrays and produces a third array
• One benefit of SIMD is reducing the number of instructions
executed at run time
• Fewer instruction fetches also means fewer instruction cache misses
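The difference in instruction count can be sketched by counting how many add operations each approach issues. This Python model is only an illustration: `scalar_add` and `simd_add` are hypothetical names, and the default of 4 lanes is an assumption matching the four‐element example.

```python
def scalar_add(a, b):
    """Add element by element; one add operation per element."""
    result, ops = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        ops += 1
    return result, ops


def simd_add(a, b, lanes=4):
    """Add 'lanes' elements per operation, as one SIMD instruction would."""
    result, ops = [], 0
    for i in range(0, len(a), lanes):
        result.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        ops += 1
    return result, ops
```

For two 4‐element arrays, `scalar_add` reports 4 operations and `simd_add` reports 1, with the same result array.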
• SIMD is a general concept
• Each computer company/CPU implements SIMD in a slightly
different way
MMX SSE SSE2 SSE3 SSSE3 SSE4 AVX
• Intel’s implementations of SIMD are listed above
(chronologically from left to right)
• MMX: Multimedia Extension
• SSE: Streaming SIMD Extension (SSSE3: Supplemental SSE3)
• AVX: Advanced Vector Extension (a vector is similar to a large
register)
• The SIMD extensions perform operations on integer and on
floating‐point data types
• The next few slides show the possible operations and what
each extension can use
• SIMD extensions typically introduce large registers in the CPU
in order to hold the operands
• The MMX registers (64‐bit) are used in the extensions MMX
through SSE3
Fig. source: Intel Manual Fig. 2‐4
• The XMM registers (128‐bit) are used in the extensions SSE
through AVX (all of them)
• The YMM registers (256‐bit) are used in the new AVX
extension
Basic execution environment for non‐64‐bit mode
Fig. source: Intel Manual Fig. 3‐1
64‐bit mode execution environment
Fig. source: Intel Manual Fig. 3‐2
Memory Model
Memory Model Terminology
• The linear address is the address requested by the program
• It’s usually known as the virtual address
• The physical address is the address that’s actually used on
the bus; it’s the address inside the RAM
• The linear address is translated to a physical address
• The logical address is used in the segmented model
• The memory is divided into segments
• A logical address consists of (segment number : offset)
• The logical address is converted into a linear address (and
then into a physical address when paging is used)
Segmented Memory (16‐bit)
• The 8086 CPU uses a segmented memory model
• The physical address is 20‐bit
• This gives the ability to address 2^20 B = 1 MB
• However, the designers wanted to use a 16‐bit address to keep
the architecture simpler (the registers are 16‐bit)
• A 16‐bit address allows addressing 2^16 B = 64 KB (too small)
The solution…
• Divide the memory into segments; each segment is 64 KB and
is accessed using a 16‐bit address
• The physical address is 20‐bit
• A segment starts at any address that’s a multiple of 16, as the figure shows
• Why a multiple of 16?
• A 20‐bit address that’s a multiple of 16 looks like
<16-bit><0000>
• Conveniently, the start address of a segment is a 16‐bit value (followed by 4 zeroes)
• The 16‐bit value is preferable
[Figure: segmented memory model; segments can start at addresses 0, 16, 32, 48, …; size = 64 KB]
• The maximum segment size is 64 KB
• Segments are usually laid out so that they don’t overlap (the hardware
itself allows overlap, since a segment can start every 16 bytes)
• This means if a segment starts at address 0 and is of size 64
KB, the next segment starts at address 64K
Why is the maximum segment size 64KB?
• Because: log2(64K) = 16
• Therefore, we can use a 16‐bit offset to traverse the segment
• Again, this is convenient since we can use a 16‐bit offset
• Therefore a memory address is given as: (16‐bit segment start address)
and (16‐bit offset in the segment)
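The address computation described above can be written as a one‐line formula: the 16‐bit segment value is shifted left by 4 bits (multiplied by 16) and the 16‐bit offset is added. The Python sketch below shows it; the function name is hypothetical, and the wrap at 1 MB models the 20‐bit address of the 8086.

```python
def physical_address(segment, offset):
    """8086-style translation: (segment * 16 + offset), kept to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shifting left by 4 appends the four zero bits of the segment start address
    return ((segment << 4) + offset) & 0xFFFFF
```

For example, segment 0x1234 with offset 0x0010 gives the physical address 0x12350.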
Use of segments
• The data in the memory is split among segments based on its
type
• The segments are:
Code Segment (CS)
Data Segment (DS)
Stack Segment (SS)
Three other segments (ES), (FS), (GS)
• The segments ES, FS, GS are optional to use and can be used
for program data
• Each segment has a segment descriptor register on the CPU
• This data structure contains:
Start address of the segment
Size of the segment
Fig. source: Intel Manual Fig. 3‐3
Assembly Instructions
x86 Assembly Code Syntax
ADD EAX, EBX ; This instruction makes EAX=EAX+EBX
• ADD is the name of the instruction. It’s called the mnemonic
• EAX and EBX are called the operands
• The text after the ‘;’ is a comment in the assembly language code
• Use a semicolon ‘;’ to make a comment in the remainder of the line
• In MIPS, use a number sign ‘#’ to make a comment in the remainder of
the line
• Intel x86 code is not case sensitive; the above instruction can be written
as: add eax, ebx
• ADD AL, BL ; uses 8‐bit registers
• ADD AX, BX ; uses 16‐bit registers
• ADD EAX, EBX ; uses 32‐bit registers
Add Instruction
• This is a C language statement. Let’s assume that ‘a’ and ‘b’
are mapped to EAX and EBX, respectively
a = a + b;
• This is the translation of the C code into x86 assembly
language:
ADD EAX, EBX
• The two operands are added; the answer is saved in the
leftmost operand
• In contrast, MIPS has three operands
Add Instruction
• When the operand is in the memory, square brackets [ ] are used
• This instruction adds the word from memory at address 100 to EAX
• The result of the addition goes in EAX
• The address here is ‘immediate’ (a constant encoded in the instruction)
• A 32‐bit value is loaded from the memory since EAX is 32‐bit
ADD EAX, [100]
• The instruction below loads a 16‐bit value from the memory since AX is
16‐bit
ADD AX, [100]
Add Instruction
• This instruction adds a value from memory to EAX
• The memory address is in EBX; this is ‘register indirect addressing’
• The result of the addition is stored in EAX
ADD EAX, [EBX]
Mov Instruction
• ‘Mov’ instruction copies data from one location to another
• The 32‐bit data from memory at address 100 is copied into EAX
MOV EAX, [100]
• The data from memory is copied into EAX
• The memory address is given in EBX; ‘register indirect
addressing’
MOV EAX, [EBX]
One Instruction has Multiple Variants in x86
• All of the instructions below are ‘ADD’
• Note these instructions have different opcodes even though all of
them are ‘ADD’ in the assembly code
Variant             Meaning                                  Example
ADD <reg>, <reg>    Add a register to another                ADD EAX, EBX
ADD <reg>, <mem>    Add a memory location to a register      ADD EAX, [124]
ADD <mem>, <reg>    Add a register to a memory location      ADD [124], EAX
ADD <reg>, <con>    Add a constant to a register             ADD EAX, 35
ADD <mem>, <con>    Add a constant to a memory location      ADD [124], 35
Move Instruction
• What are the operands of the MOV instruction?
• Similar to ADD, they can be the following:
mov <reg>,<reg>
mov <reg>,<mem>
mov <mem>,<reg>
mov <reg>,<const>
mov <mem>,<const>
An Intel x86 instruction has at most one memory operand
Move Instruction
• This instruction copies one register into another
MOV ECX, EBX ; This will do ECX = EBX
• We can also do:
MOV EAX, 100H ; This will do EAX = 100 (hexadecimal) = 256
• We can also do:
MOV EAX, 1101B ; This will do EAX = 1101 (binary) = 13
• We can also write in decimal:
MOV EAX, 24 ; This will put 24 in register EAX
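The three ways of writing a constant (H suffix for hexadecimal, B suffix for binary, no suffix for decimal) can be checked with a small Python helper. `parse_literal` is a hypothetical name, and the sketch handles only the three forms used on this slide.

```python
def parse_literal(text):
    """Convert an assembler-style constant to an integer.

    Handles only the forms shown on the slide:
    '100H' (hexadecimal), '1101B' (binary), '24' (decimal).
    """
    t = text.strip().upper()
    if t.endswith("H"):
        return int(t[:-1], 16)
    if t.endswith("B"):
        return int(t[:-1], 2)
    return int(t, 10)
```

`parse_literal("100H")` is 256, `parse_literal("1101B")` is 13 and `parse_literal("24")` is 24, matching the comments in the instructions above.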
Move Instruction
• This instruction stores AX in the memory
MOV [100], AX ; The content of AX is moved to memory
; location 100
• Eg: to write the number 3801 into memory address 124, we do
the following:
MOV AX, 3801 ; Now, AX=3801 (We’re using 16 bits)
MOV [124], AX ; Memory location @ address 124 = 3801
or
MOV [124], 3801 ; a real assembler also needs the operand size stated here
x86 vs. MIPS
• MIPS cannot load from memory and do an ‘add’ operation in one
instruction
• This is the equivalent code in MIPS ($a and $b stand for the MIPS
registers that hold the values of EAX and EBX)
x86 code:            MIPS code:
ADD EAX, [100]       lw  $t0, 100($zero)
                     add $a, $t0, $a
x86 code:            MIPS code:
ADD EAX, [EBX]       lw  $t0, 0($b)
                     add $a, $t0, $a
x86 vs. MIPS
• The x86 instruction below accesses the memory and does an ‘add’
operation
• In fact, this instruction accesses the memory twice (one read and one write)
• There’s no MIPS instruction that can access the memory and ‘add’
ADD [124], 35 ; add 35 to word at memory address 124
• This is the equivalent code in MIPS:
lw $t0, 124($zero) # load the word in register $t0
addi $t0, $t0, 35 # add 35 to it
sw $t0, 124($zero) # store the result at address 124
Number of Operands
• In MIPS, the R‐type instructions have three operands
• In Intel x86, the instructions usually have two operands
• This makes the programming in assembly language a bit
different
Example
• This is a C code:
a = a + b;
b = b - c;
c = c + b - d;
• Assume that a, b, c, d are mapped to registers EAX, EBX, ECX, EDX
• The translation of the C code into x86 assembly language is:
ADD EAX, EBX ; a = a + b

SUB EBX, ECX ; b = b – c
ADD ECX, EBX ; c = c + b
SUB ECX, EDX ; c = c – d
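The translation above can be checked by simulating the two‐operand instructions, where the destination is also the first source. This Python interpreter is a minimal sketch: `run` and `PROGRAM` are hypothetical names, only ADD, SUB and MOV between registers are modeled, and results are kept to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def run(program, regs):
    """Execute (op, dst, src) tuples; dst is both a source and the destination."""
    for op, dst, src in program:
        if op == "ADD":
            regs[dst] = (regs[dst] + regs[src]) & MASK32
        elif op == "SUB":
            regs[dst] = (regs[dst] - regs[src]) & MASK32
        elif op == "MOV":
            regs[dst] = regs[src]
        else:
            raise ValueError("unsupported instruction: " + op)
    return regs


# The four instructions from the example (a, b, c, d in EAX, EBX, ECX, EDX)
PROGRAM = [
    ("ADD", "EAX", "EBX"),   # a = a + b
    ("SUB", "EBX", "ECX"),   # b = b - c
    ("ADD", "ECX", "EBX"),   # c = c + b
    ("SUB", "ECX", "EDX"),   # c = c - d
]
```

Starting with a=5, b=9, c=2, d=1, the program ends with a=14, b=7 and c=8, the same values the C code produces.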
Example
• This is a C program
a = b + c;
b = a + c - d;
• Since we can’t have 3 operands in the instruction, we start by moving ‘b’ into ‘a’
• Translate this code into x86 assembly language (a,b,c,d in eax,ebx,ecx,edx)
MOV EAX, EBX ; a = b

ADD EAX, ECX ; a = a + c
MOV EBX, EAX ; b = a
ADD EBX, ECX ; b = a + c
SUB EBX, EDX ; b = (a+c) – d
Jumping and Branching
In MIPS:
• Branch instructions (beq, bne…) compare two entities
and branch
• Jump: unconditional jump
In Intel x86:
• The ‘branch’ term is not used; all are called jump (whether
conditional or unconditional)
• The conditional jump is done in two instructions
Jumping and Branching
Conditional Jump in Intel x86:
• The conditional jump is done by two instructions in x86
• The compare (CMP) instruction compares two entities
• The ‘JE’ (jump‐if‐equal) instruction jumps if the two entities are
equal
• How does ‘JE’ know the result of ‘CMP’?
• ‘CMP’ puts the result of the comparison in a register called
‘FLAGS’ and ‘JE’ looks at the FLAGS register
Example:
CMP EAX, EBX ; compare
JE Loop ; jump‐if‐equal
Flags
• In Intel x86, there is a register called ‘FLAGS’
• It has multiple flags; each flag is a 1‐bit field that describes the
result of the last operation done by the ALU
• Below are two of the flags in x86
• Zero Flag (Z) is ‘1’ when the ALU answer is 0
• Sign Flag (S) is ‘1’ when the ALU answer is negative
Compare (CMP) Instruction
• The compare (CMP) instruction compares two words
• The comparison is done by subtraction at the ALU
CMP EAX, EBX ; compare EAX and EBX

CMP EAX, 100 ; compare EAX and 100
• The first instruction above does (EAX‐EBX) at the ALU; the
ALU operation sets the flags
• If EAX is equal to EBX, Z flag is 1
• The second instruction does (EAX‐100) at the ALU
• If S flag is 1, it means EAX‐100<0, and therefore: EAX<100 (as long as the subtraction doesn’t overflow)
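The effect of CMP on the Z and S flags can be sketched in Python by doing the 32‐bit subtraction and inspecting the result. `cmp_flags` is a hypothetical helper name, and only the two flags discussed here are modeled.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Return (Z, S) after computing a - b on 32 bits, as CMP does."""
    result = (a - b) & MASK32
    zf = int(result == 0)        # Z = 1 when the ALU answer is 0
    sf = (result >> 31) & 1      # S = 1 when the leftmost bit is 1 (negative)
    return zf, sf
```

`cmp_flags(7, 7)` gives Z=1, and `cmp_flags(5, 100)` gives S=1 because 5‐100 is negative.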
Example
• This C program finds the sum of the numbers from 1 to 10
sum=0;
for(i=1; i<=10; i++)
sum = sum + i;
• Translate this code into x86 assembly language
MOV EAX, 1 ; i=1
MOV EBX, 0 ; sum=0
START: ADD EBX, EAX ; sum = sum + i
INC EAX ; Increment EAX by 1
CMP EAX, 11 ; Compare EAX and 11
JNE START ; If they’re not equal go to START
; (Jump Not Equal)
MOV EAX, 1 ; i=1
MOV EBX, 0 ; sum=0
START: ADD EBX, EAX ; sum = sum + i
INC EAX ; Increment EAX by 1
CMP EAX, 11 ; Compare EAX and 11
JNE START ; If they’re not equal go to START
; (Jump Not Equal)
• The Compare instruction ‘CMP’ compares EAX to 11
• The Jump‐on‐Not‐Equal instruction ‘JNE’ jumps if EAX is not equal to 11
• The compare and jump decisions are done in two instructions
• How does this work?
• CMP compares EAX and 11 by doing the subtraction ‘EAX – 11’
• CMP discards the difference but records its outcome (eg: zero or negative) in a special register called ‘FLAGS’
• JNE looks at the FLAGS to decide whether to jump or not
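The loop can be traced step by step in Python, with one line per assembly instruction. This is a sketch rather than an emulator: the variable `zf` stands for the Z flag, and the function name is hypothetical.

```python
def sum_1_to_10():
    eax = 1                          # MOV EAX, 1    ; i = 1
    ebx = 0                          # MOV EBX, 0    ; sum = 0
    iterations = 0
    while True:
        ebx += eax                   # START: ADD EBX, EAX
        eax += 1                     # INC EAX
        zf = int(eax - 11 == 0)      # CMP EAX, 11   ; sets Z when EAX == 11
        iterations += 1
        if zf == 1:                  # JNE START     ; falls through when Z = 1
            break
    return ebx, iterations
```

The loop body runs 10 times and leaves 55 in EBX.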
Conditional Jump
• This is a conditional jump in Intel x86 assembly
CMP EAX, 11
JNE START
• CMP sets the flags and JNE makes the jump decision based on
the flags
• Therefore, there should be no other instruction between
‘CMP’ and ‘JNE’ that alters the flags
x86 Flags
• “C” flag is also written as “CF”
Example
• In this C code, if a<b, we get c=0, else, we get c=1
if (a<b)
c = 0;
else c = 1;
• Translate this code into x86 assembly language (a in EAX, b in EBX, c in
ECX)
CMP EAX, EBX ; Compare EAX and EBX
JL ANSWER0 ; Jump‐if‐Less (jump if EAX < EBX)
MOV ECX, 1 ; c = 1
JMP END ; always jump (no condition)
ANSWER0: MOV ECX, 0 ; c = 0
END:
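For signed numbers, JL does not look at the S flag alone: it jumps when the sign flag differs from the overflow flag (O), which keeps the comparison correct even when the subtraction overflows. The O flag is not covered on these slides, so it is introduced here as background. The Python sketch below models the CMP/JL pair; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def jl_taken(a, b):
    """Model 'CMP a, b' followed by 'JL': jump when S != O."""
    a &= MASK32
    b &= MASK32
    r = (a - b) & MASK32
    sf = r >> 31                               # sign of the result
    # Subtraction overflows when a and b have different signs
    # and the sign of the result differs from the sign of a
    of = ((a ^ b) & (a ^ r)) >> 31
    return sf != of


def select_c(a, b):
    """if (a < b) c = 0; else c = 1;"""
    return 0 if jl_taken(a, b) else 1
```

`select_c(3, 5)` returns 0 and `select_c(5, 3)` returns 1, as in the C code.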
Jump and Compare Instructions
• Allows us to go to another place in the code
• jmp <label> (jump without condition)
• je <label> (jump when equal)
jne <label> (jump when not equal)
jz <label> (jump when last result was zero)
jg <label> (jump when greater than)
jge <label> (jump when greater than or equal to)
jl <label> (jump when less than)
jle <label> (jump when less than or equal to)
• When we use jump with a condition, before it, we compare two values:
• cmp <reg>,<reg>
cmp <reg>,<mem>
cmp <mem>,<reg>
cmp <reg>,<con>
Increment and Decrement Instructions
• They add 1 to the operand or subtract 1 from it
• The operand types are:
inc <reg>
inc <mem>
dec <reg>
dec <mem>
Resources
• x86 Assembly Guide,
http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
Intel x86 vs. MIPS
• Intel uses only 2 operands in an instruction
• MIPS uses 3 operands in an R‐type
• What is the implication of this design decision?
Example
• High‐level language statement:
Y = (A – B)/[C + (D × E)]
• The assembly code with the different architecture styles:
[Figure: assembly code for this statement in several operand styles; one is labeled “This is like MIPS” and another “This is like Intel x86”]
Code Size
• With fewer operands in the instruction, we need more
instructions to write a code
• Therefore, based on this factor only, x86 needs more
instructions than MIPS to write the same code
[Figure labels: the three versions of the code take 8 lines, 6 lines and 4 lines]
Code Size
• In reality, given an algorithm, the MIPS code is usually larger
than the Intel x86 code
Reason:
• An Intel x86 instruction can do multiple tasks (load from
memory and arithmetic in one instruction)
• MIPS needs multiple instructions to do this (one instruction to
load, another instruction to do arithmetic)
x86 Data Types
An unsigned data type is 0 or positive (can’t be negative)
In Intel x86:
Word is 16‐bit
Doubleword is 32‐bit
Quadword is 64‐bit
Signed numbers are represented in two’s complement
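Two’s complement means the same bit pattern can be read as an unsigned or as a signed number. The Python helpers below show the conversion for any width; the names `to_signed` and `to_unsigned` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low 'bits' bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):            # leftmost bit set: the number is negative
        return value - (1 << bits)
    return value


def to_unsigned(value, bits):
    """Return the bit pattern of value as an unsigned number of 'bits' bits."""
    return value & ((1 << bits) - 1)
```

For a 16‐bit word, the pattern 0xFFFF is 65535 when unsigned and ‐1 when signed.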
x86 Data Types
• Floating‐point numbers have a fractional part (eg: 12.333)
Skip Instruction
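• A skip instruction skips the next instruction if a condition is met
• Let’s translate the C code below into assembly:
if(a!=0)
b = b+1;
• On a machine that has a “skip” instruction, we can write:
…
ISZ A ; Instruction‐Skip‐if‐Zero
INC B
…
• When A=0, we skip the line that increments B
• Note: ISZ is not part of the x86 instruction set; in x86, the same effect
needs a CMP followed by a conditional jump (eg: JE) over the INC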
Intel x86 vs. MIPS
• In MIPS, to push a word on the stack, decrement the stack
pointer ($sp) by 4 and then use store word ‘sw’
• Intel x86 provides ‘push’ and ‘pop’ instructions; this makes it
easy to save and retrieve data from the stack
MIPS:
addi $sp, $sp, ‐4
sw $t0, 0($sp) # Save $t0 on the stack
…
lw $t0, 0($sp) # Pop a word from the stack and save it in $t0
addi $sp, $sp, 4
Intel x86 stack:
PUSH EAX ; Push EAX on the stack
…
POP EAX ; Pop from the stack and save in EAX
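What PUSH and POP do to the stack pointer can be modeled in Python: a push decrements ESP by 4 and stores the word, a pop loads the word and increments ESP by 4, just like the MIPS sequence above. `Stack32` is a hypothetical class, the memory is a plain dictionary, and the starting value of ESP is an arbitrary assumption.

```python
class Stack32:
    """Toy model of the 32-bit x86 stack (hypothetical, for illustration)."""

    def __init__(self, esp=0x1000):
        self.esp = esp          # stack pointer; the start value is arbitrary
        self.memory = {}        # address -> 32-bit word

    def push(self, value):
        self.esp -= 4                           # make room for one word
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory[self.esp]           # read the word on top
        self.esp += 4                           # release the space
        return value
```

After two pushes and two pops, the values come back in reverse order and ESP returns to its starting value.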
Bit Test (BT) Instruction
• The Bit Test ‘BT’ instruction checks if a bit within a word is ‘1’
or ‘0’
• If the bit is ‘1’, the Carry Flag (CF) is set to ‘1’
• Otherwise, CF is set to ‘0’
• Syntax (the first operand is a 16‐ or 32‐bit register or memory location):
BT r/m,reg
BT r/m,imm8
• MIPS doesn’t have such an instruction; there, we did an AND with
a mask
Bit Test (BT) Instruction
• This code checks if the value in register AL is even or odd
• Pseudocode:
If(AL is even)
BL=0
Else BL=1
• x86 Code:
BT AX, 0 ; Test bit 0 (rightmost bit) of AX, which is also bit 0 of AL (BT needs a 16‐ or 32‐bit operand)
JC Else ; Jump‐if‐Carry (jump if CF=1)
MOV BL, 0
JMP End
Else: MOV BL, 1
End:
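The even/odd test can be written in Python with a shift and a mask, which is what BT does when it copies the selected bit into the Carry Flag. The function names are hypothetical; `bt` returns the value that CF would take.

```python
def bt(value, bit):
    """Return the value CF takes after 'BT value, bit'."""
    return (value >> bit) & 1


def even_odd_flag(al):
    """Return 0 if AL is even and 1 if AL is odd, as the code above puts in BL."""
    cf = bt(al, 0)          # bit 0 is 1 for odd numbers
    if cf:                  # JC Else
        return 1            # Else: MOV BL, 1
    return 0                # MOV BL, 0
```

`even_odd_flag(4)` returns 0 and `even_odd_flag(7)` returns 1.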
The Sign ‘S’ Flag
• The sign flag is set if the result is negative (leftmost bit is ‘1’)
• It is used with the Jump‐if‐sign (JS) instruction which jumps if
S=1
Code to print the numbers from 1 to 100 to the display (at address: 2400)
MOV EAX, 1
Loop: MOV [2400], EAX ; Print EAX to the display
INC EAX ; Go to the next number
CMP EAX, 101 ; S=1 as long as EAX‐101 is negative
JS Loop
Exercise
• Write an Intel x86 assembly code that counts the number of
bits in EAX that are equal to 1; the answer goes in EBX
Addressing Modes
Intel x86 Addressing Modes
• Addressing mode (definition): the addressing mode is how
the instruction gets its operands
• An addressing mode does not necessarily specify a memory
address
– Eg: the ‘immediate addressing mode’ doesn’t deal with a memory
address
• The operand could be:
– An immediate value encoded in the instruction
– A register
– Data from the memory (multiple ways of specifying the address)
Intel x86 Addressing Modes
• These addressing modes don’t use a memory address
• Immediate addressing:
– The operand is a constant value in the instruction
– Ex: MOV AL, 13
• Register addressing:
– The operand is a register; the address of the register (3 bits) is in the
instruction
– Ex: MOV AL, BL
• Direct addressing:
– The full address is encoded in the instruction
– Ex: JMP Label
– The address of the instruction at ‘Label’ is encoded in the instruction
Other Intel x86 addressing modes use a memory address.
This figure shows the addressing modes that use a memory address.
x86 Addressing Modes that Reference the Memory
The address going to the memory is the sum of the ‘Base Address’ and the ‘Effective Address’.
The ‘Base Address’ is where the segment (eg: the code segment) starts in the memory. The
‘Effective Address’ is computed from the instruction.
What are the Segment Registers?
• When a program runs, it’s loaded in the memory and it’s divided
into multiple segments (code, data, …) as in the figure
• Each segment of the program has a corresponding segment
register that’s located in the CPU
• A segment register points to a data structure that
indicates where the segment starts
[Figure: memory allocated for a program, containing a Data Segment and a Code Segment]
• The address computed from the assembly code is called
the ‘effective address’
• Eg: In the instruction ‘ADD AL, [100]’, 100 is the effective
address
• The effective address is added to the base address to
obtain the final address in the memory, called linear
address
• This allows the program to be loaded anywhere in the
memory
Effective Address & Linear Address
• When this code is written, we don’t know where it’ll be loaded in the
memory
• The code starts with address 0 (every instruction takes 2 bytes)
• This is the memory allocated to the program. The Code Segment starts at address 200
Linear Address = Base Address + Effective Address
(Code to swap AL and BL)
Effective Address:   Instruction:   Linear Address:
0                    MOV DL, AL     200
2                    MOV AL, BL     202
4                    MOV BL, DL     204
[Figure: memory allocated to the program, showing the Data Segment and the Code Segment, which starts at address 200]
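The relocation shown above is a simple addition, sketched here in Python. The list `SWAP_CODE` and the function names are hypothetical; the effective addresses and the base address 200 are the ones from the slide.

```python
def linear_address(base, effective):
    """Linear Address = Base Address + Effective Address."""
    return base + effective


# (effective address, instruction) pairs for the code that swaps AL and BL
SWAP_CODE = [(0, "MOV DL, AL"), (2, "MOV AL, BL"), (4, "MOV BL, DL")]


def relocate(program, base):
    """Return the program with every effective address turned into a linear one."""
    return [(linear_address(base, ea), ins) for ea, ins in program]
```

Loading the same code at a different base only changes the `base` argument, which is why the program can be placed anywhere in the memory.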
Displacement Addressing
• The full address is encoded in the instruction
• Eg: MOV AL, [100]
• The displacement is 8, 16 or 32 bits
• This mode leads to long instructions since the full address is in the instruction
Displacement Addressing
• Example: MOV AL, [100]
• The displacement addressing is a simple addressing mode
• Every architecture usually has this addressing mode
• This mode cannot be used to step through an array in a loop
• On the other hand, if we do: ‘MOV AL, [EBX]’, we can increment EBX in the
loop and access the next elements in an array
• This mode makes the instruction long (uses too many bits)
MOV AL, [100]: the address is encoded on 8, 16 or 32 bits
MOV AL, [EBX]: we need 3 bits only to specify a register in Intel x86
Intel x86 vs. MIPS (Immediate Values)
• In MIPS, the immediate values are of constant size, usually 16‐bit
• For example, ‘addi $t0, $t0, 33’
• The number ‘33’ is encoded on 16 bits even if it can be done on 8 bits
• So MIPS wastes a bit of space
• In Intel x86, on the other hand, the constant in the instruction
can be 8, 16 or 32 bits
• So if the number we’re using is small, the instruction becomes shorter,
and memory space is saved since the code is stored in the memory
• MOV EAX, [EBX+100] ; constant field ‘100’ fits on 8 bits
• MOV EAX, [EBX+400000] ; constant field ‘400000’ doesn’t fit on 16 bits; it needs 32 bits
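The size needed for a constant can be worked out by checking which signed range it fits in. The Python sketch below follows the slide’s list of sizes (8, 16 or 32 bits) and assumes the field is signed, as x86 displacements are; the function name is hypothetical.

```python
def displacement_bits(value):
    """Return the smallest field size (8, 16 or 32 bits) that holds value,
    treating the field as a signed two's complement number."""
    if -128 <= value <= 127:
        return 8
    if -32768 <= value <= 32767:
        return 16
    if -2**31 <= value <= 2**31 - 1:
        return 32
    raise ValueError("value does not fit on 32 bits")
```

`displacement_bits(100)` is 8, while `displacement_bits(400000)` is 32 because 400000 is larger than 32767.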
Base Addressing
• The memory address is inside a register
• In 32‐bit code, any of the 32‐bit general‐purpose registers can hold the
address (eg: EBX); 16‐bit code uses BX, BP, SI or DI; an 8‐bit register
can’t hold an address
• Ex: MOV AL, [EBX]
• Ex: MOV AL, [BX]
Base Addressing
• Example: MOV AL, [EBX]
• This mode is also quite simple
• Most computer architectures support this mode
• This mode allows us to access consecutive memory locations (array) in a
loop
• In MIPS, the addressing mode for ‘load word’ and ‘store word’ is base
with offset
• But effectively, we can make the offset zero if we have only a base
• Eg: lw $t0, 0($s0) # the base address is in $s0
Base with Displacement Addressing
• This addressing mode is similar to base addressing, but we also add a
constant number to the base register
• The displacement can be 8, 16 or 32 bits
• Ex: MOV AL, [EBX+10]
• This can also be written as: MOV AL, [EBX][10]
Base with Displacement Addressing
• This addressing mode is identical to the MIPS addressing mode for ‘load
word’ and ‘store word’
• The difference is that in MIPS, the displacement (we call it offset in MIPS) is
16 bits; this is the only supported size
• On the other hand, in Intel x86, the displacement can be 8, 16 or 32 bits
Scaled Index with Displacement
• The address is made of an index that’s multiplied (scaled) by 1, 2, 4 or 8
and then added to a displacement
• We multiply the index because typical data types take 1, 2, 4 or 8 bytes
• This allows us to access data in an array
• Like before, the displacement is 8, 16 or 32 bits
Scaled Index with Displacement
• Example: MOV AX, [EBX*2+100]
• Here, the scale is 2 and the displacement is 100
• The figure shows an array where every element is 2 bytes
• The array starts at address 100
• So the elements are at addresses: 100, 102, 104, …
• We can use this instruction to access the elements in a loop
• EBX will be the index of the element (to access Array[i], we have EBX=i)
MOV EBX, 0 ; Initialize the index
Loop: MOV AX, [EBX*2+100] ; load the 2‐byte element Array[EBX]
INC EBX
…
(Figure: memory bytes at addresses 100 to 105; each array element takes 2 bytes)
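The address arithmetic of this mode can be checked with a small Python sketch. The function `effective_address` and its parameter names are hypothetical; the optional `base` argument is an assumption added so the same formula also covers the modes that use a base register.

```python
def effective_address(index: int, scale: int, disp: int, base: int = 0) -> int:
    """Compute base + index*scale + displacement, as the hardware would."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return base + index * scale + disp


# Array of 2-byte elements starting at address 100
addresses = [effective_address(i, 2, 100) for i in range(3)]
print(addresses)  # [100, 102, 104]
```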
Scaled Index with Displacement
Index Register × Scale (1, 2, 4 or 8) + Displacement = Effective Address
• Index register: the index of the element. Put “i” here to get A[i]
• Displacement: put the base address here
* We want to access the elements of an array that starts at address 300; make the displacement 300.
* Every element is 32 bits (4 bytes), so we scale by 4
* In a loop, increment the index register
MOV ESI, 0 ; Reset the index register to zero
Loop:
MOV EAX, [ESI*4+300] ; load the 4‐byte element A[ESI]
INC ESI
…
• This is the memory organization. Every box is 1 byte (= 8 bits) and has its own address.
• Every 4 boxes (or 4 bytes) are one word
• The valid word addresses are multiples of 4
• Scaling: relative to the start of the array, A[0] is at offset 0, A[1] is at offset 4 and A[i] is at offset 4*i
Word @ 0: bytes 0 1 2 3
Word @ 4: bytes 4 5 6 7
Word @ 8: bytes 8 9 10 11
Word @ 12: bytes 12 13 14 15
…
Base with Index and Displacement
• The address is the sum of two registers and an immediate value
• Ex: MOV AL, [EBX+ECX+200]
• In this addressing mode, there is no scaling
• The displacement, as usual, can be 8, 16 or 32 bits
• Other syntax: MOV AL, [EBX][ESI][300]
Base with Index and Displacement
• Example: MOV AL, [EBX+ECX+200]
• We can use this addressing mode to access a data structure in the memory that’s made of multiple arrays (or records)
• Let’s access Record B of the data structure
• EBX has the start address of the data structure
• ECX has the index of the element in Record B we’re accessing
• The displacement is the offset of Record B from the start of the data structure, which is 200
MOV EBX, 100 ; start address of the data structure
MOV ECX, 0 ; index inside Record B
Loop: MOV AL, [EBX+ECX+200]
INC ECX ; increment ECX
…
(Figure: the data structure starts at address 100 and holds Record A (size = 200 bytes), Record B and Record C, in this order)
Base with Scaled Index and Displacement Addressing
• This is the most powerful addressing mode
• The address is the sum of a base, a scaled index and an immediate number
• Ex: MOV AL, [EBX+ECX*4+100]
Base with Scaled Index and Displacement Addressing
• Example: MOV AL, [EBX+ECX*4+100]
• This addressing mode can be used to access an array
• The element in the array can be 1, 2, 4 or 8 bytes
Summary of x86 Addressing Modes
Addressing Modes Link
• This link provides more details on the x86 addressing modes
http://www.ic.unicamp.br/~celio/mc404s2‐03/addr_modes/intel_addr.html
Example: Using Various Addressing Modes
• Write code that reads the 1000 elements of an array and
prints them to the screen
• The array starts at address 400
• To print something to the screen, send it to memory address
7000
• Every element in the array is 4 bytes, so the elements, A[0],
A[1], A[2], …, A[i] are at addresses 400, 404, 408, …, 400+4*i,
respectively
• We’ll write this loop in 3 ways, using:
– Register indirect addressing
– Base and displacement addressing
– Base with scaled index addressing
Using Various Addressing Modes
• Register indirect addressing
MOV EBX, 400
Start: MOV EDX, [EBX] ; Register indirect addressing
MOV [7000], EDX
ADD EBX, 4
CMP EBX, 4400
JL Start
• The disadvantage: the end value 4400 (= 400 + 1000*4) has to be
computed by hand, which is not intuitive
• Base and displacement addressing
MOV EBX, 0
Start: MOV EDX, [EBX+400]
MOV [7000], EDX
ADD EBX, 4
CMP EBX, 4000
JL Start
• The disadvantage: the base address, 400, is put in the
instruction as a displacement. If the code has a lot of instructions
like this, the code size becomes larger.
• Base with scaled index addressing
MOV ECX, 0 ; use as index
MOV EBX, 400 ; base address of the array
Start: MOV EDX, [EBX+ECX*4]
MOV [7000], EDX
INC ECX
CMP ECX, 1000
JL Start
• The advantage: everything is simple and transparent. ECX is
the index, it goes from 0 to 999. The element is 4 bytes, so
we multiply by 4 (easy for the user to see what’s happening).
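To confirm that the three loops read the same elements, here is a small Python model of each one. It is a sketch only: it records the addresses each loop would read instead of touching memory, and the function names are made up for this illustration.

```python
BASE, COUNT, SIZE = 400, 1000, 4  # array start, element count, bytes per element


def register_indirect():
    visited, ebx = [], BASE                 # MOV EBX, 400
    while True:
        visited.append(ebx)                 # MOV EDX, [EBX]
        ebx += SIZE                         # ADD EBX, 4
        if not ebx < BASE + COUNT * SIZE:   # CMP EBX, 4400 / JL Start
            return visited


def base_displacement():
    visited, ebx = [], 0                    # MOV EBX, 0
    while True:
        visited.append(ebx + BASE)          # MOV EDX, [EBX+400]
        ebx += SIZE                         # ADD EBX, 4
        if not ebx < COUNT * SIZE:          # CMP EBX, 4000 / JL Start
            return visited


def base_scaled_index():
    visited, ebx, ecx = [], BASE, 0         # MOV EBX, 400 / MOV ECX, 0
    while True:
        visited.append(ebx + ecx * SIZE)    # MOV EDX, [EBX+ECX*4]
        ecx += 1                            # INC ECX
        if not ecx < COUNT:                 # CMP ECX, 1000 / JL Start
            return visited


print(register_indirect() == base_displacement() == base_scaled_index())
```

Only the third version keeps the loop bound equal to the element count, which is the readability advantage noted above.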
Addressing Modes Summary (Intel x86 vs. MIPS)
• Intel x86 has more addressing modes especially the addressing modes
that access the memory
• Intel x86 has almost every conceivable addressing mode possible
– Hence it’s a powerful tool and provides flexibility
– It also allows using the CPU registers in many ways
– The compilers could try to use all the addressing modes to squeeze some
performance gains
• MIPS, on the other hand, provides one way to access the memory (base
register + offset)
– It’s a good and versatile way but it’s not as diverse as the Intel modes
• Another characteristic of Intel x86 is the variable‐size immediate field
• The immediate field can be 8, 16 or 32 bits in order to minimize the size
of the instruction, and therefore, reduce the size of the code
Instruction Format & Encoding
Instruction Format – Intel x86 vs. MIPS
• MIPS uses ‘fixed‐length instructions’; all the instructions are 32 bits
– Note: there is a 64‐bit MIPS architecture
• The instruction format is quite simple and ‘predictable’
– The first 6 bits are the opcode
– The immediate field is usually the rightmost 16 bits
– In the encoding, registers that we read are before the register that we write
• Intel x86 uses ‘variable‐length instructions’
– The smallest instruction is 1 byte, the largest is 15 bytes (120 bits)
• The instruction format is not as simple as MIPS
• The opcode can be 1, 2 or 3 bytes
• There can be more than one way to encode an instruction!
Intel x86 Instruction Format
• This is the instruction format of Intel x86
• The opcode field is used in every instruction
• All the other fields are optional
There are 4 optional prefixes that may be used before the opcode
• “Instruction Prefix” can be either “LOCK” or a repeat prefix
• The “LOCK” prefix is used in a multi‐processor environment
• Multiple processors are sharing memory
• When one processor is using a memory location, it locks it, so other
processors can’t use this memory location until the instruction is done
• The other use of the “Instruction Prefix” is one of the repeat prefixes
• When there’s a repeat prefix, the instruction is repeated a number of
times; so we can have a loop using one instruction only!
• REP prefix: repeat multiple times; the number of times is in register CX (ECX with 32‐bit addressing)
• REPE (repeat while equal) and REPZ (repeat while zero) repeat as long as the zero flag is set; REPNE (repeat while
not equal) and REPNZ (repeat while not zero) repeat as long as it’s clear. In both cases, the count in CX is still the upper limit
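As an illustration of what a repeat prefix does, the sketch below models REP applied to a byte‐copy string instruction (MOVSB) in Python. MOVSB is not covered in these slides, so treat it as an assumed example; the function name `rep_movsb` and the memory layout are hypothetical, and the model assumes the direction flag is clear so both pointers count upward.

```python
def rep_movsb(memory: bytearray, esi: int, edi: int, ecx: int):
    """Model of REP MOVSB: copy ECX bytes from [ESI] to [EDI]."""
    while ecx != 0:
        memory[edi] = memory[esi]  # the repeated instruction: copy one byte
        esi += 1                   # source index advances
        edi += 1                   # destination index advances
        ecx -= 1                   # the prefix decrements the counter
    return esi, edi, ecx


memory = bytearray(b"hello.....")
final_registers = rep_movsb(memory, esi=0, edi=5, ecx=5)
print(memory)           # bytearray(b'hellohello')
print(final_registers)  # (5, 10, 0)
```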
One‐Instruction Loop
• One instruction can do a loop in x86
• Why was this mode supported?
• Because in earlier years, programming was done in assembly language
• Compilers weren’t advanced
• So the hardware was becoming more and more complicated (and more
complex instructions are supported) to make it easier on the
programmer
• MIPS’ design came when compilers were becoming advanced
• In fact, the inventors of MIPS were working on compilers before
inventing MIPS
• They thought the software should be complex (eg compilers) and the
hardware should be simple
• “Segment Override”: specifies which segment this instruction should use
• Overrides the default segment register
• Remember, the program is loaded into allocated memory
• The allocated memory is divided into segments: Code Segment (CS), Data
Segment (DS), Stack Segment (SS), …
• This allows the programmer to reference data from any part of the
memory allocated for the program
• “Operand size override”: the default operand size is either 16 or 32 bits
depending on the specific x86 architecture
• Using the “Operand size override” prefix allows us to switch between 16‐
bit operands and 32‐bit operands
• The “Address size override” prefix allows us to switch between using 16‐
bit addresses and 32‐bit addresses
• The opcode is 1, 2, or 3 bytes
• How are the opcodes chosen in x86?
• They’re not totally random
• The opcode might indicate:
– If the data is 1 byte or full size (16‐bit or 32‐bit)
– Direction of data operation (to/from memory)
– If an immediate field should be sign‐extended
• “ModR/M” and “SIB” fields provide addressing information
– “Mod” (mode) and “R/M” (register/memory) specify if the operand is a register or a
memory location
– “Reg/Opcode” specifies a register, or it can be used to supplement the opcode
• “SIB”
– “Scale” field specifies the scale factor (multiply by 1, 2, 4, or 8)
– “Index” field specifies the index register
– “Base” field specifies the base register
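Both bytes share the same 2‐3‐3 bit layout, so one pair of helpers can pack and unpack them. The sketch below uses hypothetical function names (`pack_233`, `unpack_233`) chosen for this illustration.

```python
def pack_233(top2: int, mid3: int, low3: int) -> int:
    """Pack Mod/Reg/R/M (or Scale/Index/Base) into one byte."""
    if not (0 <= top2 < 4 and 0 <= mid3 < 8 and 0 <= low3 < 8):
        raise ValueError("field out of range")
    return (top2 << 6) | (mid3 << 3) | low3


def unpack_233(byte: int):
    """Split a ModR/M or SIB byte back into its three fields."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111


modrm = pack_233(0b11, 0b000, 0b001)  # Mod=11, Reg=000, R/M=001
print(hex(modrm))                     # 0xc1
print(unpack_233(0xBB))               # (2, 7, 3): Scale=10, Index=111, Base=011
```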
• ‘Displacement’
– The displacement is a constant number that’s part of the memory address; it can be
8, 16 or 32 bits
• ‘Immediate’
– The immediate is a constant number that’s used in a logic or arithmetic operation
(it’s not part of a memory address); it can be 8, 16 or 32 bits
• ‘Displacement’ is used for memory address while ‘Immediate’ is used for
arithmetic and logic instructions
Intel x86 Registers
• Let’s take another look at the x86 registers
• ESI, EDI, EBP, ESP are additional registers
(Figure: the x86 register set, showing the general‐purpose registers, ESI (source index), EDI (destination index), EBP (base pointer), ESP (stack pointer), the segment registers and the flags register)
Register Use
• The registers ‘AX’, ‘BX’, ‘CX’ and ‘DX’ are usually called general‐
purpose registers
• However, the registers are typically used for certain tasks
– AX: accumulator (store intermediary results in it)
– BX: base register (when computing a memory address)
– CX: counter register
– DX: data register
When a code with a loop is
compiled, the counter goes in CX,
the data in DX, the result of a
computation in AX, and the start
address of an array can be saved
in BX.
Flags Register
(Figure: layout of the flags register; the callout marks the flags we’ve used)
Recommended Register Use
• ‘ESI’ is used for Source Index
‐ We put in it the memory address of a source
• ‘EDI’ is used as Destination Index
‐ We put in it the memory address where we’ll write
• ‘EBP’ and ‘ESP’ are used to support procedure calling
Stack Pointer ‘SP’ and Base Pointer ‘BP’
• They are used to support procedure calling
• When we call a ‘procedure’ (or function) in assembly code, we often
need to pass parameters (or arguments) to it
• We push the arguments on the stack
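(Figure: two stack diagrams. Before the call, SP points to the start of the stack. After the call, the stack holds, from top to bottom: Param 2, Param 1, Return address; SP points above Param 2 and BP points to the return address.)
• Before calling the procedure, SP is pointing to the start of the stack
• When the procedure is called, BP is made equal to SP (+1 element)
• What’s the reason for using BP? During the procedure, SP might move up and down, so it would be difficult to access the parameters relative to SP. BP doesn’t move.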
• Anytime in the procedure:
• The return address is at address BP
• Param1 is at address BP+1
• Param2 is at address BP +2
(Figure: the stack from top to bottom: SP, Param 2, Param 1, Return address (BP), …)
• When the procedure is called, BP is made equal to SP+1
• If the procedure inserts something on the stack, SP will move
• During the procedure SP moves but BP doesn’t
• So the parameters have the same address with respect to BP
– Param1 is at address BP+1 and Param2 is at address BP+2
• But when SP moves, the parameters’ addresses will change with
respect to SP; that’s why we use BP to reference the parameters
(Figure: the stack from top to bottom: SP, …, Param 2, Param 1, Return address (BP), …)
• When the procedure is called, BP is made equal to SP+1
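The point about BP can be tried out with a toy model in Python. It follows the simplified picture used on these slides (positions counted in elements, stack drawn growing upward) and is not a model of the real x86 stack layout; all names are hypothetical.

```python
# Stack contents in the slide's order, bottom first
stack = ["return address", "param 1", "param 2"]
bp = 0            # BP marks the return address and never moves
sp = len(stack)   # SP points just above the top element

distance_before = sp - (bp + 1)   # how far param 1 is below SP

stack.append("local variable")    # the procedure pushes something
sp = len(stack)                   # SP moves up

distance_after = sp - (bp + 1)

print(stack[bp + 1], stack[bp + 2])     # param 1 param 2 (unchanged w.r.t. BP)
print(distance_before, distance_after)  # 2 3 (changed w.r.t. SP)
```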
Addressing Modes
• This figure represents the addressing modes of the x86 architecture
• The stack pointer, ESP, can be used as ‘base’ but not as ‘index’
Figure source: "Intel® 64 and IA‐32 Architectures Software Developer’s Manual"
Addressing Modes
• Examples:
– Base register MOV EAX, [EBX]
– Base + displacement MOV EAX, [EBX+20]
– Base + scaled index MOV EAX, [EBX+ECX*4]
– Base + scaled index + displacement MOV EAX, [EBX+ECX*4+100]
Mod and R/M Fields
To find the values of these fields, we
look up the addressing mode in a
look‐up table.
Mod and R/M fields
• The ‘Mod’ and ‘R/M’ fields specify the addressing mode
For “scaled index with displacement” (no base), use Mod=00, R/M=100.
Use SIB field. For Base, use 101.
Eg: “ADD EAX, [EBX*4 + 1400]” (there is a scaled index & displacement but no base register)
Register Numbers
• These are the numbers used to encode the registers
Reg Value   8‐bit operand   16‐bit operand   32‐bit operand
000         AL              AX               EAX
001         CL              CX               ECX
010         DL              DX               EDX
011         BL              BX               EBX
100         AH              SP               ESP
101         CH              BP               EBP
110         DH              SI               ESI
111         BH              DI               EDI
Instruction Encoding
• ADD CL, AL
• ADD ECX, EAX
• ADD EDX, [disp32]
• ADD EDI, [EBX]
• ADD EAX, [ESI + disp8]
• ADD EBX, [EBP + disp32]
• ADD [EAX + disp8], ECX
• ADD ECX, [EBX + EDI*4]
• ADD EAX, [ECX + EDI*8 + disp32]
• ADD ECX, 21
• POP EAX
• POP [1200h]
• INC EAX
• INC [420]
• INC [EAX*4]
ADD CL, AL
• The opcode for ADD is: 000000ds
– d: direction; (If ‘Reg’ is destination, d=1), (Else, d=0)
– s: size of operand; (if operand is 8‐bit, s=0), (Else, s=1)
• In this case, we will put AL in ‘Reg’, so d=0
• Our operands, ‘CL’ and ‘AL’ are 8‐bit, so s=0
• Therefore, our opcode is: 00000000
8‐bit 2‐bit 3‐bit 3‐bit

opcode Mod Reg R/M
00000000 ‐ ‐ ‐
ADD CL, AL (cont’d)
• Since we used d=0, ‘Reg’ field will be source
• So ‘Reg’ field is AL, which is register 000
• The R/M field is used as the second register
• It will have CL, which is register 001
• When the operand in R/M is a register, we use Mod=11

opcode Mod Reg R/M
00000000 11 000 001
ADD CL, AL (cont’d)
• Therefore, the instruction “ADD CL, AL” is
0000 0000 1100 0001 in binary
or
00 C1 in hexadecimal

opcode Mod Reg R/M
00000000 11 000 001
ADD CL, AL (another way)
• For some instructions, there is more than one way to do the
encoding
• Now, we’ll use d=1 so we’ll put the destination register “CL”
in the “Reg” field
– The opcode is now: 00000010
– Reg field is: 001
• We put the other register in R/M field, which is register AL
(‘000’)
• The ‘Mod’ field stays the same since R/M is a register
(Mod=11)
opcode Mod Reg R/M
00000010 11 001 000
ADD CL, AL (another way)
• So, the instruction “ADD CL, AL” can also be encoded as:
0000 0010 1100 1000 in binary
or
02 C8 in hexadecimal

opcode Mod Reg R/M
00000010 11 001 000
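Both encodings can be reproduced with a short Python sketch. The helper `encode_add_reg8` is hypothetical and only handles ADD between two 8‐bit registers; the `d` argument selects which operand goes in the ‘Reg’ field.

```python
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}


def encode_add_reg8(dest: str, src: str, d: int) -> bytes:
    """Encode ADD dest, src for 8-bit registers using opcode 000000ds, s=0."""
    opcode = (d << 1) | 0                 # s=0 because the operands are 8-bit
    if d == 1:
        reg, rm = REG8[dest], REG8[src]   # 'Reg' holds the destination
    else:
        reg, rm = REG8[src], REG8[dest]   # 'Reg' holds the source
    modrm = (0b11 << 6) | (reg << 3) | rm  # Mod=11: R/M is a register
    return bytes([opcode, modrm])


print(encode_add_reg8("CL", "AL", d=0).hex(" "))  # 00 c1
print(encode_add_reg8("CL", "AL", d=1).hex(" "))  # 02 c8
```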
ADD ECX, EAX
• This instruction will use the 32‐bit registers
– While “ADD CL, AL” uses the 8‐bit registers
• The opcode is the same: 000000ds
– We’ll put EAX in ‘Reg’, so: d=0
– The size here is 32‐bit, so: s=1 (before s was 0 for 8‐bit operands AL,CL)
• So the instruction is: 0000 0001 1100 0001 or 01 C1 (hex)

opcode Mod Reg R/M
00000001 11 000 001
ADD EDX, [<displacement>]
• The ‘displacement’ is a direct memory address
– So it could be “ADD EDX, [24]” or any other address
• The opcode for ADD is the same: 000000ds
– Now, we will put EDX in ‘Reg’ field (we have only 1 register operand,
so it should be in ‘Reg’), so: d=1
– Also, EDX is a 32‐bit register, so: s=1
– Our opcode is 00000011
8‐bit 2‐bit 3‐bit 3‐bit 32‐bit

opcode Mod Reg R/M Displacement
00000011 ‐ EDX ‐ ‐
• Now, we need to fill the ‘Mod’ and ‘R/M’ fields
• We look up the addressing mode in the table
• Here, we only have a displacement
– We can use the ‘disp32’ mode
– So: Mod=00 and R/M=101
• The ‘Reg’ field should be EDX, we look it up in the table
– EDX is register number 010

00000011 00 010 101 ‐
• Finally, we set the displacement
• Assume we have: “ADD EDX, [0122ABC0H]”
• The displacement is a hex number: 01 22 AB C0
• When we put the displacement in the instruction, we put the
low‐order byte first
– So it’s stored as: C0 AB 22 01 (hex)

00000011 00 010 101 C0 AB 22 01 (hex)
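The byte order of the displacement is easy to get wrong, so here is a minimal Python sketch of this encoding. The function name `encode_add_r32_disp32` is hypothetical; it only covers ADD with a 32‐bit register destination and a direct address.

```python
def encode_add_r32_disp32(reg_number: int, address: int) -> bytes:
    """Encode ADD r32, [disp32]: opcode 00000011, Mod=00, R/M=101."""
    opcode = 0b00000011                            # d=1, s=1
    modrm = (0b00 << 6) | (reg_number << 3) | 0b101
    # int.to_bytes with "little" puts the low-order byte first
    return bytes([opcode, modrm]) + address.to_bytes(4, "little")


encoded = encode_add_r32_disp32(0b010, 0x0122ABC0)  # ADD EDX, [0122ABC0H]
print(encoded.hex(" "))  # 03 15 c0 ab 22 01
```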
ADD EDI, [EBX]
• We start with the opcode. For ADD, it’s 000000ds
– Let’s put the destination, ‘EDI’, in the ‘Reg’ field, so: d=1
– The size of the operands here is 32 bits, so: s=1
– So, the opcode=00000011
• The ‘Reg’ field will contain ‘EDI’, so it is: 111
• Now, we need to look up the register indirect addressing mode
• It’s ‘Mod’=00, and ‘R/M’=011
• We don’t need the displacement field here

opcode Mod Reg R/M
00000011 00 111 011
ADD EAX, [ESI + disp8]
• The <disp8> is a displacement (constant number) of 8 bits
• The opcode here is: 00000011
– We’ll put ‘EAX’ in ‘Reg’, so we made d=1
– The operand is 32‐bit, so s=1
• For the displacement of 8‐bit, ‘Mod’=01
• ‘R/M’ is register ESI, which is number 110
• Eg: displacement = 44 = 0010 1100 (binary) =2C (hex)

00000011 01 000 110 2C (hex)
ADD EBX, [EBP + disp32]
• <disp32> is a displacement of 32 bits (we can put a full
address here)
• The opcode we’ll use here is: 00000011
– Destination in ‘Reg’ (which is EBX) and operand is 32‐bit
• Addressing mode:
– ‘Mod’ = 10 (means displacement is 32 bits)
– ‘R/M’ = 101 (register EBP)
• Example:
– Displacement = 00 00 A2 30 (hex) (base address of an array)
– It’s stored as: 30 A2 00 00 (hex)
00000011 10 011 101 30 A2 00 00 (hex)
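The last two examples differ only in the ‘Mod’ value and the displacement width, which a small Python sketch can make explicit. The helper `encode_add_r32_base_disp` is hypothetical; it assumes the base is not ESP (which would need a SIB byte) and that the displacement fits in a signed 32‐bit value.

```python
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}


def encode_add_r32_base_disp(dest: str, base: str, disp: int) -> bytes:
    """Encode ADD dest, [base + disp] with opcode 00000011 (d=1, s=1)."""
    if base == "ESP":
        raise ValueError("ESP as base needs a SIB byte (not handled here)")
    if -128 <= disp <= 127:
        mod, width = 0b01, 1   # Mod=01: 8-bit displacement
    else:
        mod, width = 0b10, 4   # Mod=10: 32-bit displacement
    modrm = (mod << 6) | (REG32[dest] << 3) | REG32[base]
    return bytes([0b00000011, modrm]) + disp.to_bytes(width, "little", signed=True)


print(encode_add_r32_base_disp("EAX", "ESI", 44).hex(" "))      # 03 46 2c
print(encode_add_r32_base_disp("EBX", "EBP", 0xA230).hex(" "))  # 03 9d 30 a2 00 00
```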
ADD [EAX + disp8], ECX
• The opcode is: 000000ds
• Here, we have to put ECX in ‘Reg’ field. Therefore, d=0 since
ECX is not a destination
– s=1 since the size is 32‐bit. Therefore, the opcode is: 00000001
• Reg field:
– It indicates ECX, therefore, it’s: 001
• Mod and R/M:
– They correspond to [EAX+disp8]. From the lookup table, we get
Mod=01 and R/M=000

00000001 01 001 000 ‐
x86 Instruction Format
• So far, we’ve been using the ‘opcode’ field, the ‘ModR/m’
field and the ‘displacement’ field
• Remember, all the fields are optional except the opcode
• Now, we need to use the ‘SIB’ field to specify a scaled index
SIB Field
Scale   Value
00      Index*1
01      Index*2
10      Index*4
11      Index*8

Index   Register
000     EAX
001     ECX
010     EDX
011     EBX
100     Illegal (the stack pointer cannot be used as index register)
101     EBP
110     ESI
111     EDI

Base    Register
000     EAX
001     ECX
010     EDX
011     EBX
100     ESP
101     ‘No Base’ if Mod=00, EBP if Mod=01 or 10
110     ESI
111     EDI
ADD ECX, [EBX + EDI*4]
• We’ll use the following fields: opcode, ModR/M, SIB
• We’ll put ‘ECX’ in ‘Reg’, so d=1; size is 32‐bit, so s=1
– Opcode: 00000011
• Now, we can put ‘ECX’ (reg # 001) in ‘Reg’
• Next, we look up the ‘Mod’ and ‘R/M’; we have base with
scaled index
– ‘Mod’ = 00 and ‘R/M’=100
8‐bit 2‐bit 3‐bit 3‐bit 2‐bit 3‐bit 3‐bit

opcode Mod Reg R/M Scale Index Base
00000011 00 001 100 ‐ ‐ ‐
ADD ECX, [EBX + EDI*4]
• Now, we fill the ‘SIB’ field:
• Scale is 4, so we use: Scale=10
• Index is ‘EDI’, it’s number: 111
• Base is ‘EBX’, it’s number: 011
• In hexadecimal
ADD ECX, [EBX + EDI * 4] is 03 0C BB (hex)

00000011 00 001 100 10 111 011
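Here is a Python sketch that assembles the three bytes from the field values. The helper `encode_add_r32_base_index` is hypothetical and covers only the base‐plus‐scaled‐index form without displacement.

```python
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def encode_add_r32_base_index(dest: str, base: str, index: str, scale: int) -> bytes:
    """Encode ADD dest, [base + index*scale] using ModR/M and SIB."""
    if index == "ESP":
        raise ValueError("ESP cannot be used as index register")
    if base == "EBP":
        raise ValueError("with Mod=00, base 101 means 'no base'")
    opcode = 0b00000011                              # d=1, s=1
    modrm = (0b00 << 6) | (REG32[dest] << 3) | 0b100  # R/M=100: SIB follows
    sib = (SCALE_BITS[scale] << 6) | (REG32[index] << 3) | REG32[base]
    return bytes([opcode, modrm, sib])


print(encode_add_r32_base_index("ECX", "EBX", "EDI", 4).hex(" "))  # 03 0c bb
```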
ADD EAX, [ECX + EDI*8 + disp32]
• Here, we need the opcode, ModR/M, SIB and displacement
• The opcode is 00000011
• We put ‘EAX’ in ‘Reg’, it’s number: 000
• We look up ‘Mod’ and ‘R/M’ for this addressing mode (base
with scaled index and displacement)
– ‘Mod’=10 and ‘R/M’=100
• Scale is 8, so we use ‘Scale’=11
• Index is ‘EDI’, it’s register number: 111
• The base is ‘ECX’, it’s register number: 001
8‐bit 2‐bit 3‐bit 3‐bit 2‐bit 3‐bit 3‐bit 32‐bit

opcode Mod Reg R/M Scale Index Base Displacement
00000011 10 000 100 11 111 001 <disp32>
ADD ECX, 21
• This instruction uses the immediate addressing mode (immediate = 21)
• In the “Lookup tables”, there’s no immediate addressing mode
– The ‘disp32’ field in the first table indicates a memory address (not an immediate)
• There is a different opcode when the 2nd operand is an immediate number
• ADD opcode (with immediate operand) is: 100000xs
– Operand is 8‐bit → s=0; Operand is 16‐bit or 32‐bit → s=1
– Constant is the same size as the register → x=0; otherwise → x=1
• ECX is 32 bits, so s=1
• The immediate number 21 fits on 8 bits, so it’s different from the register
ECX in size, so x=1
– Our opcode is: 1000 0011
opcode Mod Reg R/M Immediate
10000011 ‐ ‐ ‐ ‐
ADD ECX, 21
• The ‘Reg’ field in ModR/M is not used, so it’s 000
• The ‘Mod’ and ‘R/M’ fields are used to specify the destination
• The destination is ECX, this is a register
– So Mod=11 and R/M=001 to specify ECX
• The immediate number 21 on 8 bits is: 0001 0101
• The instruction is in hex: 83 C1 15

opcode Mod Reg R/M Immediate
10000011 11 000 001 00010101
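The choice between the short and long immediate can be sketched in Python as follows. The function name `encode_add_r32_imm` is hypothetical, and the sketch assumes a 32‐bit destination register and an immediate in the signed 32‐bit range.

```python
def encode_add_r32_imm(dest_number: int, imm: int) -> bytes:
    """Encode ADD r32, imm using opcode 100000xs with s=1."""
    if -128 <= imm <= 127:
        opcode, width = 0b10000011, 1  # x=1: 8-bit immediate, sign-extended
    else:
        opcode, width = 0b10000001, 4  # x=0: immediate is as wide as the register
    modrm = (0b11 << 6) | (0b000 << 3) | dest_number  # Mod=11, Reg unused (000)
    return bytes([opcode, modrm]) + imm.to_bytes(width, "little", signed=True)


print(encode_add_r32_imm(0b001, 21).hex(" "))    # 83 c1 15
print(encode_add_r32_imm(0b001, 1000).hex(" "))  # 81 c1 e8 03 00 00
```

The second call shows the cost of a larger constant: the same instruction grows from 3 to 6 bytes.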
POP EAX
• The POP instruction has opcode: 8F(hex) or 10001111
• The ‘Reg’ field in ModR/M byte is unused (filled with 000)
• The addressing mode here is register (reg: EAX)
– So, Mod=11
– We also use: R/M=000 to specify register EAX
• The instruction is: 10001111 11000000 or 8F C0 (hex)

opcode Mod Reg R/M
1000 1111 11 000 000
POP [1200h]
• This instruction pops the word from the stack and saves in
memory at address 1200 (hex)
• The opcode is the same as before: 8F(hex) or 10001111
• The ‘Reg’ field in ModR/M byte is unused (filled with 000)
• The addressing mode here is direct address (or displacement)
– So, Mod=00 and R/M=101
• The displacement is on 32 bits: 00 00 12 00
– Start writing from low‐order byte in the instruction

1000 1111 00 000 101 00 12 00 00 (h)
INC EAX
• The opcode for increment is: 1111111w
– The operand can be either a register or a memory location
• The operand is 32 bits, so w=1
– The opcode is: 11111111
• The ‘Reg’ field is not used and is set to: 000
• The addressing mode is register addressing
– So Mod=11 and R/M=000 to specify EAX
• In hex, the instruction is: FF C0

opcode Mod Reg R/M
11111111 11 000 000
INC [420]
– The operand is 32‐bit so, w=1. The opcode is: 11111111
• The addressing mode is direct address or ‘displacement’
– Looking up this mode in the ‘tables’ we use Mod=00 and R/M=101
• The displacement is 420
– In binary, it is: 0… 0001 1010 0100
– In hex, it is: 00 00 01 A4
– It’s written in the instruction as: A4 01 00 00

11111111 00 000 101 A4 01 00 00 (h)
INC [EAX*4]
– The operand is 32‐bit so, w=1. The opcode is: 11111111
• With this opcode, the field ‘Reg’ is not used (set to 000)
• The addressing mode is ‘scaled index’ (there’s no base
register)
– Looking up this mode in the ‘tables’ we use Mod=00 and R/M=100

11111111 00 000 100 ‐ ‐ 101
INC [EAX*4]
• The index register is EAX, so Index=000
• The scale is 4, so Scale=10
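• Since there’s no base, we use the value ‘101’ which means
there’s no base when Mod=00
– In this case, we’re using Mod=00
– With no base, the SIB byte must be followed by a 32‐bit displacement; here it’s zero

11111111 00 000 100 10 000 101 00 00 00 00 (h)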
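A Python sketch of this last encoding is given below. The helper `encode_inc_scaled_index` is hypothetical; it always emits the 32‐bit displacement that the ‘no base’ form (Mod=00, Base=101) requires, defaulting to zero.

```python
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def encode_inc_scaled_index(index_number: int, scale: int, disp: int = 0) -> bytes:
    """Encode INC [index*scale + disp32] for a 32-bit operand (w=1)."""
    if index_number == 0b100:
        raise ValueError("ESP cannot be used as index register")
    opcode = 0b11111111                        # 1111111w with w=1
    modrm = (0b00 << 6) | (0b000 << 3) | 0b100  # Reg unused, R/M=100: SIB follows
    sib = (SCALE_BITS[scale] << 6) | (index_number << 3) | 0b101  # Base=101: no base
    return bytes([opcode, modrm, sib]) + disp.to_bytes(4, "little", signed=True)


print(encode_inc_scaled_index(0b000, 4).hex(" "))  # ff 04 85 00 00 00 00
```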
Resources
• Opcodes for x86 instructions
http://ref.x86asm.net/coder32.html
References
• “Computer Organization and Design”, Patterson & Hennessy,
4th ed. revised (Section 2.17)
• “Intel 64 and IA‐32 Architectures Software Developer’s
Manual”, Volume 1 (Chapters 1, 2, 3)
• “Essentials of 80x86 Assembly Language”, by Richard Detmer
• “Assembly Language for x86 Processors”, by Kip Irvine