lock add word ptr [eax], bx

  lock              Prefix
  add               Mnemonic
  word ptr [eax]    Operand #1
  bx                Operand #2
Mem32(DS,Mb,Eax,Ebx,4,0x18)
byte ptr ds:[bx+si+1234h]

  byte ptr    Operand Size
  ds          Segment
  bx          Base Register
  si          Index Register
  1234h       Displacement
Mem16(DS,Mb,Bx,Si,0x1234)
Prefix      Non-Prefixed Instruction    Prefixed Instruction
Prefix Group #1
LOCK        add ecx, [eax]              lock add ecx, [eax]
REP         movsb                       rep movsb
REPNZ       scasb                       repnz scasb
Prefix Group #2
SEGMENT     mov edx, ds:[0]             mov edx, fs:[0]
Prefix Group #3
OPSIZE      xor eax, eax                xor ax, ax
Prefix Group #4
ADDRSIZE    add ecx, [eax]              add ecx, [bx+si]
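The four prefix groups can be illustrated with a small prefix consumer. This is only a sketch: the byte values are the standard X86 legacy prefix bytes, but the names `PREFIX_GROUPS` and `consume_prefixes` are hypothetical, and letting a later prefix replace an earlier one from the same group is a simplifying assumption rather than something the slides state.

```python
# Standard X86 legacy prefix bytes, keyed to the four prefix groups.
# PREFIX_GROUPS and consume_prefixes are hypothetical names for this sketch.
PREFIX_GROUPS = {
    0xF0: (1, "lock"), 0xF2: (1, "repnz"), 0xF3: (1, "rep"),
    0x26: (2, "es"), 0x2E: (2, "cs"), 0x36: (2, "ss"),
    0x3E: (2, "ds"), 0x64: (2, "fs"), 0x65: (2, "gs"),
    0x66: (3, "opsize"),
    0x67: (4, "addrsize"),
}


def consume_prefixes(code):
    """Return ({group: prefix_name}, index_of_first_non_prefix_byte)."""
    seen = {}
    pos = 0
    while pos < len(code) and code[pos] in PREFIX_GROUPS:
        group, name = PREFIX_GROUPS[code[pos]]
        # Assumption: a later prefix from the same group replaces an earlier one.
        seen[group] = name
        pos += 1
    return seen, pos
```

For example, `consume_prefixes(bytes.fromhex("666B84"))` reports an OPSIZE prefix and leaves the stem `6B` at index 1.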
X86 Assembler
66 F7 84 1F 78 56 34 12 BC 9A
test word ptr [edi+ebx+12345678h], 9ABCh

  66             Prefix
  F7             Stem
  84             MOD R/M         (Memory Expression)
  1F             SIB             (Memory Expression)
  78 56 34 12    Displacement    (Memory Expression)
  BC 9A          Immediate

  MOD R/M  84 = 10 000 100    (MOD GGG RM)
  SIB      1F = 00 011 111    (SCALE INDEX BASE)
80 7F bx si 7Fh
Encoded Memory Expression Base Index Displacement
MOD R/M  C1 = 11 000 001    (MOD GGG RM)
MOD    MOD R/M    Displacement    Memory Expression
01     45         80              [di+0FF80h]
10     81         34 12           [bx+di+1234h]
X86 MOD R/M-16 Encodings
8-Bit Displacements
An 8-bit encoding for displacement d may be used when sign-extending its low byte reproduces d:
- Take the low 8 bits of the displacement: low = d & 0xFF.
- Sign-extend low to 16 bits: ext = sign_extend_8_16(low).
- If d == ext, the 8-bit encoding yields the same value as the original displacement.
- I.e., the 8-bit encoding can be used when d <= 7Fh or d >= 0FF80h (as a signed value, -80h <= d <= 7Fh).
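The check above can be sketched in a few lines. `sign_extend_8_16` is the name the slide uses; `fits_disp8` is a hypothetical helper name introduced here for illustration.

```python
def sign_extend_8_16(low):
    """Sign-extend an 8-bit value to an unsigned 16-bit value."""
    low &= 0xFF
    return (low | 0xFF00) if (low & 0x80) else low


def fits_disp8(d):
    """True if the 16-bit displacement d can use the 8-bit encoding.

    fits_disp8 is a hypothetical name; it follows the three steps on the slide.
    """
    d &= 0xFFFF
    low = d & 0xFF
    ext = sign_extend_8_16(low)
    return d == ext
```

Under this sketch, `7Fh` and `0FF80h` both fit in 8 bits, while `80h` and `1234h` need the 16-bit encoding.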
X86 MOD R/M-16 Encodings
Special Case
MOD R/M  C1 = 11 000 001    (MOD GGG RM)
As before:
- MOD R/M bytes are divided into three fields.
- They can specify either a register or a memory location.
- If MOD is 11 (i.e., 3), RM is used as a register number.
- The register family is specified by the instruction stem.
- If MOD is not 11, a memory expression is specified.
- We assume in the following material that MOD is not 11.
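Splitting a MOD R/M byte into its three fields is plain bit manipulation. The sketch below uses hypothetical helper names (`split_modrm`, `modrm_is_register`); only the 2-3-3 field layout comes from the slides.

```python
def split_modrm(byte):
    """Split a MOD R/M byte into (MOD, GGG, RM): 2 bits, 3 bits, 3 bits."""
    mod = (byte >> 6) & 0b11
    ggg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, ggg, rm


def modrm_is_register(byte):
    """True if MOD is 11, i.e. RM names a register rather than memory."""
    mod, _, _ = split_modrm(byte)
    return mod == 0b11
```

For the byte `C1` this gives MOD = 11, GGG = 000, RM = 001, so RM is a register number.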
X86 MOD R/M-32 Encodings
MOD R/M Memory Encodings
As before:
- MOD R/M-32.RM selects the base register, as in:

MOD    MOD R/M    Displacement    Memory Expression
01     42         80              [edx+0FFFFFF80h]
10     83         78 56 34 12     [ebx+12345678h]
X86 MOD R/M-32 Encodings
Special Case #1: Displacement Only
SIB  1F = 00 011 111    (SCALE INDEX BASE)
- Like MOD R/M, SIB bytes are divided into three fields.
- SCALE specifies the scale factor, i.e., the 4 in ebx*4.
  - 00: 1, 01: 2, 10: 4, 11: 8
- INDEX specifies the index register number.
- BASE specifies the base register number.
- A displacement may follow, depending upon MOD R/M.MOD.
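A minimal sketch of splitting a SIB byte follows. The names `REG32` and `split_sib` are hypothetical, the register numbering is the standard 32-bit one, and the special cases (such as "no index register") are deliberately ignored here.

```python
# Standard 32-bit general-purpose register numbering.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_sib(byte):
    """Split a SIB byte into (scale_factor, index_number, base_number).

    The 2-bit SCALE field encodes 1, 2, 4, or 8 as a power of two.
    Special cases are not handled in this sketch.
    """
    scale = 1 << ((byte >> 6) & 0b11)
    index = (byte >> 3) & 0b111
    base = byte & 0b111
    return scale, index, base
```

For the SIB byte `1F` this yields scale 1, index register 3 (`ebx`), and base register 7 (`edi`).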
X86 MOD R/M-32/SIB Encodings
Special Case #1: No Index Register
- The Intel opcode maps describe the X86 instruction set, i.e., the legal encodings for each instruction stem.
- There may be multiple instructions for a given stem; see the group at stem 80.
- Legal operands are specified by abstract operand types (AOTs).
- E.g., OEb, OGb, OEv, OGv, OAL, OIb, OIz, OrAX, OES, OSS, OeAX, . . .
X86 Instruction Encodings
Encoding Table
66 60    pushaw
  66    OPSIZE Prefix
  60    Stem

Bytes       Stem    MOD R/M (MOD GGG RM)    Immediate (OIb)
80 C0 7F    80      C0 = 11 000 000         7F
80 C8 7F    80      C8 = 11 001 000         7F
- The Intel opcode maps use over 100 abstract operand types.
- Encoding and decoding operands is the primary source of complexity in X86 assemblers and disassemblers.
- We simplify our X86 library dramatically by:
  - Dividing the AOTs into eight categories.
  - Representing them with a simple abstract operand type description language, AOTDL.
X86 Abstract Operand Types
Exact Operand
EE    out dx, al    (EE: Stem)
OAL Exact(Gb(Al))
OCL Exact(Gb(Cl))
OAX Exact(Gw(Ax))
ODX Exact(Gw(Dx))
OYb Exact(Mem32(ES,Mb,Edi,None,0,None))
OYw Exact(Mem32(ES,Mw,Edi,None,0,None))
AC       lodsb ds:[esi]    (AC: Stem)
64 AC    lodsb fs:[esi]    (64: fs Prefix, AC: Stem)
OXb ExactSeg(Mem32(DS,Mb,Esi,None,0,None))
OXw ExactSeg(Mem32(DS,Mw,Esi,None,0,None))
OXd ExactSeg(Mem32(DS,Md,Esi,None,0,None))
04 7F    add al, 7Fh    (04: Stem, 7F: Immediate OIb)
X86 Abstract Operand Types
Operand Description Language: ImmEnc
OIb ImmEnc(Ib)
OIw ImmEnc(Iw)
OOb ImmEnc(MemExpr(DS,Mb))
OJz ImmEnc(JccTarget)
.text:00401024    EB FE    jmp loc_401024    (EB: Stem, FE: Immediate OJb)
OJb SignedImm(JccTarget)
32 C3    xor al, bl
  32    Stem
  C3    MOD R/M = 11 000 011    (MOD GGG RM)

OGb is taken from GGG (000 selects al).
X86 Abstract Operand Types
Operand Description Language: GPart
OGb GPart(Gb)
OGw GPart(Gw)
OGd GPart(Gd)
OSw GPart(SegReg)
OCd GPart(ControlReg)
ODd GPart(DebugReg)
OPd GPart(MMXReg)
OVq GPart(XMMReg)
Operand Type    Register Type    Memory Access Size
OEb             8-bit            8-bit
OEw             16-bit           16-bit
OEd             32-bit           32-bit
OQq             MMX              64-bit
OWdq            XMM              128-bit
ORw             16-bit           ILLEGAL
D7
|{z}
Encoding Stem Abstract MOD R/M #1
Operand Type    OPSIZE                              No OPSIZE
OeAX            ax                                  eax
OGv             16-bit register                     32-bit register
OIv             16-bit immediate                    32-bit immediate
OEv             16-bit register or memory access    32-bit register or memory access

40       inc eax    (40: Stem)
66 40    inc ax     (66: OPSIZE Prefix, 40: Stem)
Operand Type    AOTDL
OeAX SizePrefix(Exact(Gw(Ax)),Exact(Gd(Eax)))
OGv SizePrefix(GPart(Gw),GPart(Gd))
OIv SizePrefix(ImmEnc(Iw),ImmEnc(Id))
OEv SizePrefix(RegOrMem(Gw,Mw),RegOrMem(Gd,Md))
Operand Type    ADDRSIZE                    No ADDRSIZE
OM              word-sized memory access    dword-sized memory access

8D 00       lea eax, dword ptr [eax]     (8D: Stem, 00: MOD R/M for OM)
67 8D 00    lea eax, word ptr [bx+si]    (67: ADDRSIZE Prefix, 8D: Stem, 00: MOD R/M for OM)
Operand Type    AOTDL
OM AddrPrefix(RegOrMem(None,Mw),RegOrMem(None,Md))
NO MATCH Required . . .
. . . OPSIZE ?
. . . ADDRSIZE ?
. . . Segment prefix ?
Operand Type    AOTDL
OGv             SizePrefix(GPart(Gw),GPart(Gd))
OM              AddrPrefix(RegOrMem(None,Mw),RegOrMem(None,Md))

[Animation frames: a cursor steps through the AOTDL terms in order: OGv's SizePrefix, then GPart(Gw), then GPart(Gd); then OM's AddrPrefix, then RegOrMem(None,Mw), then None, then Mw.]
typecheck_x86(a,x), Part 1
X86 Type Checking by Variety
Memory Expressions
typecheck_x86(a,x), Continued
return NOMATCH
X86 Instruction
Type-Checker
OPSIZE Required
66 81 C3 34 12
encode_x86(a,x), Part 1
Encoding X86 Operands by Variety, Continued
SizePrefix, AddrPrefix
a                  x    Encoder Logic
SizePrefix(y,n)    _    if sizepfx: encode_x86(y,x)
                        else: encode_x86(n,x)
AddrPrefix(y,n)    _    if addrpfx: encode_x86(y,x)
                        else: encode_x86(n,x)
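The prefix-dependent dispatch in the table can be sketched as follows. This is not the library's actual representation: the `namedtuple` classes, the string stand-ins for the inner AOTDL terms, and the function name `resolve_prefix_aot` are all assumptions made for illustration. Only the rule "take y when the prefix is present, otherwise n" comes from the table.

```python
from collections import namedtuple

# Hypothetical stand-ins for the AOTDL constructors in the table.
SizePrefix = namedtuple("SizePrefix", "y n")
AddrPrefix = namedtuple("AddrPrefix", "y n")


def resolve_prefix_aot(a, sizepfx, addrpfx):
    """Strip SizePrefix/AddrPrefix wrappers according to the active prefixes."""
    while True:
        if isinstance(a, SizePrefix):
            a = a.y if sizepfx else a.n
        elif isinstance(a, AddrPrefix):
            a = a.y if addrpfx else a.n
        else:
            return a


# Inner terms are plain strings here purely for illustration.
OGv = SizePrefix("GPart(Gw)", "GPart(Gd)")
OM = AddrPrefix("RegOrMem(None,Mw)", "RegOrMem(None,Md)")
```

With an OPSIZE prefix, `OGv` resolves to `GPart(Gw)`; without an ADDRSIZE prefix, `OM` resolves to `RegOrMem(None,Md)`.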
encode_x86(a,x), Continued
a                            x               Encoder Logic
ImmEnc(JccTarget(t,f))       JccTarget(_)    immediates.append(t - next_addr,4)
SignedImm(JccTarget(t,f))    JccTarget(_)    immediates.append(t - next_addr,1)
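The jump-target rows compute a displacement relative to the address of the next instruction. A minimal sketch, assuming the instruction's address and length are known up front (the function name `rel_displacement` and its parameters are hypothetical; `next_addr` is the name used in the table):

```python
def rel_displacement(target, insn_addr, insn_len, size):
    """Encode target as a little-endian signed displacement of `size` bytes.

    The displacement is measured from next_addr, the address just past
    the jump instruction itself.
    """
    next_addr = insn_addr + insn_len
    delta = target - next_addr
    lo = -(1 << (8 * size - 1))
    hi = (1 << (8 * size - 1)) - 1
    if not lo <= delta <= hi:
        raise ValueError("target out of range for a %d-byte displacement" % size)
    return delta.to_bytes(size, "little", signed=True)
```

For the two-byte `jmp` at `00401024` that targets itself, `next_addr` is `00401026`, so the displacement is -2, which encodes as the single byte `FE`.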
encode_x86(a,x), Concluded
     0         1         2         3         4         5          6       7
0    ADD       ADD       ADD       ADD       ADD       ADD        PUSH    POP
     Eb, Gb    Ev, Gv    Gb, Eb    Gv, Ev    AL, Ib    rAX, Iz    ES      ES

Direct(Add,[OEb,OGb])
Direct(Add,[OEv,OGv])
Direct(Add,[OGb,OEb])
Direct(Add,[OGv,OEv])
Direct(Add,[OAL,OIb])
Direct(Add,[OrAX,OIz])
Direct(Push,[OES])
Direct(Pop,[OES])

- Most decoder entries are of type Direct(mnem,oplist).
  - mnem: Mnemonic
  - oplist: Abstract operand types
X86 Decoder Entries
Invalid
9C PredOpSize(Direct(Pushfw,[]),Direct(Pushfd,[]))
9D PredOpSize(Direct(Popfw,[]), Direct(Popfd,[]))
A5 PredOpSize(Direct(Movsw,[OXv,OYv]),Direct(Movsd,[OXv,OYv]))
A7 PredOpSize(Direct(Cmpsw,[OXv,OYv]),Direct(Cmpsd,[OXv,OYv]))
AB PredOpSize(Direct(Stosw,[OYv]),Direct(Stosd,[OYv]))
AD PredOpSize(Direct(Lodsw,[OXv]),Direct(Lodsd,[OXv]))
AF PredOpSize(Direct(Scasw,[OYv]),Direct(Scasd,[OYv]))
X86 Decoder Entries
PredAddrSize
E3 PredAddrSize(Direct(Jcxz,[OJb]),Direct(Jecxz,[OJb]))
Instruction Groups
                                    GGG
Opcode    Group    MOD    prefix    000    001    010    011    100    101    110    111
80-83     1                         ADD    OR     ADC    SBB    AND    SUB    XOR    CMP

Bytes       Stem    MOD R/M (MOD GGG RM)    Immediate
80 D0 7F    80      D0 = 11 010 000         7F
80 E9 7F    80      E9 = 11 101 001         7F
X86 Decoder Entries
Group
                                    GGG
Opcode    Group    MOD    prefix    000    001    010    011    100    101    110    111
80        1                         ADD    OR     ADC    SBB    AND    SUB    XOR    CMP

Bytes       Stem     MOD R/M (MOD GGG RM)
0F 18 00    0F 18    00 = 00 000 000
0F 18 C0    0F 18    C0 = 11 000 000
X86 Decoder Entries
PredMOD
                                    GGG
Opcode    Group    MOD    prefix    000            001           010           011           100    101    110    111
0F 18     16       mem              prefetchNTA    prefetchT0    prefetchT1    prefetchT2
                   11b

- The element PredMOD(m,r) has two parameters:
  - m: the decoder entry to use if MOD R/M.MOD specifies memory.
  - r: the decoder entry to use for registers.
- This group is represented by PredMOD(g,Invalid).
- g is shown below.

g = Group(...)

GGG    Group() Parameter #GGG
0      Direct(Prefetchnta,[OEv])
1      Direct(Prefetcht0, [OEv])
2      Direct(Prefetcht1, [OEv])
3      Direct(Prefetcht2, [OEv])
4      Invalid
5      Invalid
6      Invalid
7      Invalid
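Resolving a decoder entry down to a Direct entry could look like the sketch below. The `namedtuple` classes, the string `Invalid`, and `resolve_entry` are hypothetical stand-ins for the library's real types; the selection rules (PredOpSize by OPSIZE prefix, PredMOD by MOD R/M.MOD, Group by GGG) follow the slides.

```python
from collections import namedtuple

# Hypothetical stand-ins for the decoder entry constructors.
Direct = namedtuple("Direct", "mnem oplist")
PredOpSize = namedtuple("PredOpSize", "y n")
PredMOD = namedtuple("PredMOD", "m r")
Group = namedtuple("Group", "entries")
Invalid = "Invalid"


def resolve_entry(entry, modrm=None, sizepfx=False):
    """Walk a decoder entry until a Direct entry (or None for Invalid) remains."""
    while not isinstance(entry, Direct):
        if entry == Invalid:
            return None
        if isinstance(entry, PredOpSize):
            entry = entry.y if sizepfx else entry.n
        elif isinstance(entry, PredMOD):
            # MOD == 11 means a register operand; anything else is memory.
            entry = entry.r if (modrm >> 6) == 0b11 else entry.m
        elif isinstance(entry, Group):
            entry = entry.entries[(modrm >> 3) & 0b111]
        else:
            raise TypeError("unknown decoder entry: %r" % (entry,))
    return entry


g = Group([
    Direct("Prefetchnta", ["OEv"]), Direct("Prefetcht0", ["OEv"]),
    Direct("Prefetcht1", ["OEv"]), Direct("Prefetcht2", ["OEv"]),
    Invalid, Invalid, Invalid, Invalid,
])
prefetch = PredMOD(g, Invalid)
```

With this sketch, the bytes `0F 18 00` (MOD = 00) resolve to `Prefetchnta`, while `0F 18 C0` (MOD = 11) resolve to an invalid instruction.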
Instruction Groups
Distinguishing Based on RM
                                    GGG
Opcode    Group    MOD    prefix    000           001    010    011    100    101    110    111
C6        11       mem              MOV Eb, Ib
                   11b                                                                      XABORT (000) Ib

Bytes       Stem    MOD R/M (MOD GGG RM)    Immediate
C6 F8 7F    C6      F8 = 11 111 000         7F
C6 F9 7F    C6      F9 = 11 111 001         7F
X86 Decoder Entries
RMGroup
                                    GGG
Opcode    Group    MOD    prefix    000           001    010    011    100    101    110    111
C6        11       mem              MOV Eb, Ib
                   11b                                                                      XABORT (000) Ib
l = RMGroup(...) m = Group(...)
g = Group(...)
Bytes          Prefix    Stem     MOD R/M (MOD GGG RM)
0F AE F8                 0F AE    F8 = 11 111 000
F3 0F AE C8    F3        0F AE    C8 = 11 001 000
Stem     No prefix          66                         F3                 F2
0F D0                       vaddsubpd Vpd, Hpd, Wpd                       vaddsubps Vps, Hps, Wps
0F D1    psrlw Pq, Qq       vpsrlw Vx, Hx, Wx
0F D2    psrld Pq, Qq       vpsrld Vx, Hx, Wx
0F D3    psrlq Pq, Qq       vpsrlq Vx, Hx, Wx
0F D4    paddq Pq, Qq       vpaddq Vx, Hx, Wx
0F D5    pmullw Pq, Qq      vpmullw Vx, Hx, Wx
0F D6                       vmovq Wq, Vq               movq2dq Vdq, Nq    movdq2q Pq, Uq
0F D7    pmovmskb Gd, Nq    vpmovmskb Gd, Ux
Decoding X86 Instructions
Overview: Decode Prefixes and Stem

66 6B 84 1F 78 56 34 12 9A

  66    Prefix
  6B    Stem

Decoding X86 Instructions
Overview: Decode Operands

Stem 6B: imul OGv OEv OIb

Operand Type    AOTDL
OGv             SizePrefix(GPart(Gw),GPart(Gd))
OEv             SizePrefix(RegOrMem(Gw,Mw),RegOrMem(Gd,Md))
OIb             ImmEnc(Ib)

66 6B 84 1F 78 56 34 12 9A

  66             Prefix
  6B             Stem
  84             MOD R/M         (Memory Expression)
  1F             SIB             (Memory Expression)
  78 56 34 12    Displacement    (Memory Expression)
  9A             Immediate (OIb)

  MOD R/M  84 = 10 000 100    (MOD GGG RM)
  SIB      1F = 00 011 111    (SCALE INDEX BASE)

[Animation frames: a cursor steps through the operands in order: OGv, then GPart(Gw); OEv, then RegOrMem(Gw,Mw), then Mw; and finally OIb.]
Instruction is imul ax, word ptr [edi+ebx+12345678h], 9Ah
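The memory-expression step of this walk-through can be reproduced with a short sketch. It is deliberately simplified and not the library's decoder: `decode_mem32`, `REG32`, and `fmt_hex` are hypothetical names, MOD = 11 is rejected, and the special cases (displacement only, no index register) are not handled.

```python
# Standard 32-bit general-purpose register numbering.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def fmt_hex(value):
    """Format as assembler-style hex, e.g. 12345678h or 0FFFFFF80h."""
    text = "%X" % value
    return ("0" + text if text[0] in "ABCDEF" else text) + "h"


def decode_mem32(code, pos):
    """Decode a MOD R/M-32 memory expression at code[pos].

    Returns (ggg, text, new_pos). Special cases are not handled.
    """
    modrm = code[pos]
    pos += 1
    mod, ggg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 0b11:
        raise ValueError("MOD is 11: register operand, not memory")
    if rm == 0b100:
        # RM == 100 means a SIB byte follows.
        sib = code[pos]
        pos += 1
        scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
        if index == 0b100 or (base == 0b101 and mod == 0):
            raise NotImplementedError("special case not handled in this sketch")
        idx = REG32[index] if scale == 1 else "%s*%d" % (REG32[index], scale)
        parts = [REG32[base], idx]
    else:
        if rm == 0b101 and mod == 0:
            raise NotImplementedError("special case not handled in this sketch")
        parts = [REG32[rm]]
    disp_size = {0: 0, 1: 1, 2: 4}[mod]
    if disp_size:
        raw = code[pos:pos + disp_size]
        pos += disp_size
        # Sign-extend, then show as an unsigned 32-bit value.
        disp = int.from_bytes(raw, "little", signed=True) & 0xFFFFFFFF
        parts.append(fmt_hex(disp))
    return ggg, "[" + "+".join(parts) + "]", pos
```

Calling `decode_mem32(bytes.fromhex("666B841F785634129A"), 2)` starts at the MOD R/M byte `84`, consumes the SIB byte and the four displacement bytes, and stops at index 8, where the immediate `9A` begins.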
Decoding X86 Instructions
Consume Prefixes
Consume Stem
Process Decoder Description
Decode Operands
Create Flow Information
Decoding X86 Instructions
Consume Prefixes
Stem     No prefix          66                         F3                 F2
0F D0                       vaddsubpd Vpd, Hpd, Wpd                       vaddsubps Vps, Hps, Wps
0F D1    psrlw Pq, Qq       vpsrlw Vx, Hx, Wx
0F D2    psrld Pq, Qq       vpsrld Vx, Hx, Wx
0F D3    psrlq Pq, Qq       vpsrlq Vx, Hx, Wx
0F D4    paddq Pq, Qq       vpaddq Vx, Hx, Wx
0F D5    pmullw Pq, Qq      vpmullw Vx, Hx, Wx
0F D6                       vmovq Wq, Vq               movq2dq Vdq, Nq    movdq2q Pq, Uq
0F D7    pmovmskb Gd, Nq    vpmovmskb Gd, Ux
decode(a), Part 1
Decoding X86 Operands, by Variety
decode(a), Part 1
Decoding X86 Instructions
Create Flow Information