x86 Instruction Listings

3/14/23, 12:35 AM x86 instruction listings - Wikipedia
x86 instruction listings

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable
program, often stored as a computer file and executed on the processor.
The x86 instruction set has been extended several times, introducing wider registers and datatypes as well as new functionality.[1]
x86 integer instructions
https://en.wikipedia.org/wiki/X86_instruction_listings 1/53
Below is the full 8086/8088 instruction set of Intel (81 instructions total). Most if not all of these instructions are available in 32-bit mode; they just
operate on 32-bit registers (eax, ebx, etc.) and values instead of their 16-bit (ax, bx, etc.) counterparts. The updated instruction set is also grouped
according to architecture (i386, i486, i686) and more generally is referred to as (32-bit) x86 and (64-bit) x86-64 (also known as AMD64).
Original 8086/8088 instructions
Original 8086/8088 instruction set

Instruction Meaning Notes Opcode
AAA ASCII adjust AL after addition used with unpacked binary-coded decimal 0x37
8086/8088 datasheet documents only base 10 version of the AAD instruction
(opcode 0xD5 0x0A), but any other base will work. Later Intel's documentation has
AAD ASCII adjust AX before division the generic form too. NEC V20 and V30 (and possibly other NEC V-series CPUs) 0xD5
always use base 10, and ignore the argument, causing a number of
incompatibilities
AAM ASCII adjust AX after multiplication Only base 10 version (Operand is 0xA) is documented, see notes for AAD 0xD4
AAS ASCII adjust AL after subtraction 0x3F

0x10…0x15, 0x80…0x81/2,
ADC Add with carry destination = destination + source + carry_flag
0x82…0x83/2 (since 80186)
0x00…0x05, 0x80/0…
ADD Add (1) r/m += r/imm; (2) r += m/imm; 0x81/0, 0x82/0…0x83/0
(since 80186)
0x20…0x25, 0x80…0x81/4,
AND Logical AND (1) r/m &= r/imm; (2) r &= m/imm;
0x82…0x83/4 (since 80186)
CALL Call procedure push eip; eip points to the instruction directly after the call 0x9A, 0xE8, 0xFF/2, 0xFF/3
CBW Convert byte to word 0x98

CLC Clear carry flag CF = 0; 0xF8
CLD Clear direction flag DF = 0; 0xFC
CLI Clear interrupt flag IF = 0; 0xFA

CMC Complement carry flag 0xF5
0x38…0x3D, 0x80…0x81/7,
CMP Compare operands
0x82…0x83/7 (since 80186)
Compare bytes in memory. May be used with a

CMPSB 0xA6
REP prefix to repeat the instruction CX times.
Compare words. May be used with a REP prefix
CMPSW 0xA7
to repeat the instruction CX times.
CWD Convert word to doubleword 0x99
DAA Decimal adjust AL after addition (used with packed binary-coded decimal) 0x27
DAS Decimal adjust AL after subtraction 0x2F
0x48…0x4F, 0xFE/1,
DEC Decrement by 1
0xFF/1
(1) AX = DX:AX / r/m; resulting DX = remainder (2) AL = AX / r/m; resulting

DIV Unsigned divide 0xF7/6, 0xF6/6
AH = remainder
ESC Used with floating-point unit 0xD8..0xDF
HLT Enter halt state 0xF4
(1) AX = DX:AX / r/m; resulting DX = remainder (2) AL = AX / r/m; resulting

IDIV Signed divide 0xF7/7, 0xF6/7
AH = remainder
0x69, 0x6B (both since
IMUL Signed multiply in One-operand form (1) DX:AX = AX * r/m; (2) AX = AL * r/m 80186), 0xF7/5, 0xF6/5,
0x0FAF (since 80386)
(1) AL = port[imm]; (2) AL = port[DX]; (3) AX = port[imm]; (4) AX =

IN Input from port 0xE4, 0xE5, 0xEC, 0xED
port[DX];
0x40…0x47, 0xFE/0,
INC Increment by 1
0xFF/0
INT Call to interrupt 0xCC, 0xCD
INTO Call to interrupt if overflow 0xCE
IRET Return from interrupt 0xCF

(JA, JAE, JB, JBE, JC, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE,
0x70…0x7F, 0x0F80…
Jcc Jump if condition JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE,
0x0F8F (since 80386)
JPO, JS, JZ)
JCXZ Jump if CX is zero 0xE3
0xE9…0xEB, 0xFF/4,
JMP Jump
0xFF/5
LAHF Load FLAGS into AH register 0x9F
LDS Load DS:r with far pointer 0xC5
LEA Load Effective Address 0x8D
LES Load ES:r with far pointer 0xC4
LOCK Assert BUS LOCK# signal (for multiprocessing) 0xF0

Load string byte. May be used with a REP prefix
LODSB if (DF==0) AL = *SI++; else AL = *SI--; 0xAC
to repeat the instruction CX times.
Load string word. May be used with a REP

LODSW if (DF==0) AX = *SI++; else AX = *SI--; 0xAD
prefix to repeat the instruction CX times.
LOOP/LOOPx Loop control (LOOPE, LOOPNE, LOOPNZ, LOOPZ) if (x && --CX) goto lbl; 0xE0…0xE2
MOV Move copies data from one location to another, (1) r/m = r; (2) r = r/m; 0xA0...0xA3
if (DF==0)
Move byte from string to string. May be used *(byte*)DI++ = *(byte*)SI++;
MOVSB with a REP prefix to repeat the instruction CX else 0xA4
times. *(byte*)DI-- = *(byte*)SI--;
if (DF==0)
Move word from string to string. May be used
*(word*)DI++ = *(word*)SI++;
MOVSW with a REP prefix to repeat the instruction CX else 0xA5
times. *(word*)DI-- = *(word*)SI--;
MUL Unsigned multiply (1) DX:AX = AX * r/m; (2) AX = AL * r/m; 0xF7/4, 0xF6/4
NEG Two's complement negation r/m *= -1; 0xF6/3…0xF7/3

NOP No operation opcode equivalent to XCHG EAX, EAX 0x90
NOT Negate the operand, logical NOT r/m ^= -1; 0xF6/2…0xF7/2
0x08…0x0D, 0x80…0x81/1,
OR Logical OR (1) r/m |= r/imm; (2) r |= m/imm;
0x82…0x83/1 (since 80186)
(1) port[imm] = AL; (2) port[DX] = AL; (3) port[imm] = AX; (4) port[DX]
OUT Output to port 0xE6, 0xE7, 0xEE, 0xEF
= AX;
0x07, 0x0F(8086/8088 only),

r/m = *SP++; POP CS (opcode 0x0F) works only on 8086/8088. Later CPUs use
POP Pop data from stack 0x17, 0x1F, 0x58…0x5F,
0x0F as a prefix for newer instructions.
0x8F/0
POPF Pop FLAGS register from stack FLAGS = *SP++; 0x9D

0x06, 0x0E, 0x16, 0x1E,
PUSH Push data onto stack *--SP = r/m; 0x50…0x57, 0x68, 0x6A
(both since 80186), 0xFF/6
PUSHF Push FLAGS onto stack *--SP = FLAGS; 0x9C
0xC0…0xC1/2 (since
RCL Rotate left (with carry)
80186), 0xD0…0xD3/2
RCR Rotate right (with carry)
80186), 0xD0…0xD3/3
REPxx Repeat MOVS/STOS/CMPS/LODS/SCAS (REP, REPE, REPNE, REPNZ, REPZ) 0xF2, 0xF3
Not a real instruction. The assembler will translate these to a RETN or a RETF
RET Return from procedure
depending on the memory model of the target system.
RETN Return from near procedure 0xC2, 0xC3
RETF Return from far procedure 0xCA, 0xCB
ROL Rotate left
80186), 0xD0…0xD3/0
ROR Rotate right
80186), 0xD0…0xD3/1
SAHF Store AH into FLAGS 0x9E
SAL Shift Arithmetically left (signed shift left) (1) r/m <<= 1; (2) r/m <<= CL;
80186), 0xD0…0xD3/4
SAR Shift Arithmetically right (signed shift right) (1) (signed) r/m >>= 1; (2) (signed) r/m >>= CL;
80186), 0xD0…0xD3/7
alternative 1-byte encoding of SBB AL, AL is available via undocumented SALC 0x18…0x1D, 0x80…0x81/3,
SBB Subtraction with borrow
instruction 0x82…0x83/3 (since 80186)
Compare byte string. May be used with a REP

SCASB 0xAE
Compare word string. May be used with a REP
SCASW 0xAF
SHL Shift left (unsigned shift left)
80186), 0xD0…0xD3/4
SHR Shift right (unsigned shift right)
80186), 0xD0…0xD3/5
STC Set carry flag CF = 1; 0xF9
STD Set direction flag DF = 1; 0xFD
STI Set interrupt flag IF = 1; 0xFB
Store byte in string. May be used with a REP

STOSB if (DF==0) *ES:DI++ = AL; else *ES:DI-- = AL; 0xAA
Store word in string. May be used with a REP
STOSW if (DF==0) *ES:DI++ = AX; else *ES:DI-- = AX; 0xAB
0x28…0x2D, 0x80…0x81/5,
SUB Subtraction (1) r/m -= r/imm; (2) r -= m/imm;
0x82…0x83/5 (since 80186)
0x84, 0x85, 0xA8, 0xA9,

TEST Logical compare (AND) (1) r/m & r/imm; (2) r & m/imm;
0xF6/0, 0xF7/0
WAIT Wait until not busy Waits until BUSY# pin is inactive (used with floating-point unit) 0x9B
XCHG Exchange data r :=: r/m; A spinlock typically uses xchg as an atomic operation. (coma bug). 0x86, 0x87, 0x91…0x97
XLAT Table look-up translation behaves like MOV AL, [BX+AL] 0xD7
0x30…0x35, 0x80…0x81/6,
XOR Exclusive OR (1) r/m ^= r/imm; (2) r ^= m/imm;
0x82…0x83/6 (since 80186)
Added in specific processors
Added with 80186/80188
Instruction Opcode Meaning Notes
Check array index against

BOUND 62 /r raises software interrupt 5 if test fails
bounds
Modifies stack for entry to procedure for high level language. Takes two operands: the
ENTER C8 iw ib Enter stack frame
amount of storage to be allocated on the stack and the nesting level of the procedure.
equivalent to:
6C
IN AX, DX
INSB/INSW Input from port to string
MOV ES:[DI], AX
; adjust DI according to operand size and DF
6D
LEAVE C9 Leave stack frame Releases the local stack storage created by the previous ENTER instruction.
equivalent to:
6E
MOV AX, DS:[SI]
OUTSB/OUTSW Output string to port
OUT DX, AX
; adjust SI according to operand size and DF
6F
equivalent to:
POP DI
POP SI
POP BP
Pop all general purpose POP AX ; no POP SP here, all it does is ADD SP, 2 (since AX will be overwritten
POPA 61
registers from stack later)
POP BX
POP DX
POP CX
POP AX
equivalent to:
PUSH AX
PUSH CX
PUSH DX
Push all general purpose
PUSHA 60 PUSH BX
registers onto stack PUSH SP ; The value stored is the initial SP value
PUSH BP
PUSH SI
PUSH DI
example:
6A ib
Push an immediate byte/word
PUSH immediate PUSH 12h
value onto the stack PUSH 1200h
68 iw
IMUL immediate Signed and unsigned example:

multiplication of immediate
byte/word value IMUL BX,12h
6B /r ib IMUL DX,1200h
IMUL CX, DX, 12h
IMUL BX, SI, 1200h
IMUL DI, word ptr [BX+SI], 12h
IMUL SI, word ptr [BP-4], 1200h
69 /r iw
Note that since the lower half is the same for unsigned and signed
multiplication, this version of the instruction can be used for unsigned
multiplication as well.
example:
C0
SHL/SHR/SAL/SAR/ROL/ROR/RCL/RCR Rotate/shift bits with an
ROL AX,3
immediate immediate value greater than 1 SHR BL,3
C1
Added with 80286
Available in 16/32-bit protected mode only.
Adjust RPL field of Causes #UD in Real mode and Virtual 8086 Mode - Windows 95 and OS/2 2.x are known to make
ARPL r/m16, r16 63 /r
selector extensive use of this #UD to use the 63 opcode as a one-byte breakpoint to transition from Virtual 8086
Mode to kernel mode.[2][3]
Clear task-switched
CLTS 0F 06 flag in Machine
Status Word.
Sets ZF=1 if the descriptor could be loaded, ZF=0 otherwise.

Load access rights
byte from the 32-bit variant of LAR instruction is documented to load undefined data into bits 19:16 of destination
LAR r,r/m16 0F 02 /r
specified segment
descriptor register on Intel CPUs.
Load segment limit

LSL r,r/m16 0F 03 /r from the specified Sets ZF=1 if the descriptor could be loaded, ZF=0 otherwise.
segment descriptor
Each of these instructions loads a 2-part table descriptor. The first part is a 16-bit value, specifying table size in bytes minus
Load Global
1. The second part is a 32-bit value (64-bit value in 64-bit mode), specifying the linear start address for the table. This
LGDT m16&32 0F 01 /2 Descriptor Table
address is ANDed with 00FFFFFFh for the 16-bit variants of these instructions.
Register
LIDT can relocate the Interrupt Vector Table in Real Mode as well.
Load Interrupt
LIDT m16&32 0F 01 /3 Descriptor Table LGDT and LIDT are serializing instructions.
Register
Load Local
LLDT r/m16 0F 00 /2 Descriptor Table
Register LLDT and LTR are serializing instructions.
LTR r/m16 0F 00 /3 Load Task Register
On 80386 and later, the "Machine Status Word" is the same as the CR0 register, however LMSW can only modify the
bottom 4 bits of this register.
LMSW can be used to enter but not leave x86 Protected Mode. On the 80286, it is not possible to leave
Load Machine Protected Mode at all without a CPU reset - on 80386 and later, it is possible to leave Protected Mode,
LMSW r/m16 0F 01 /6
Status Word
but this requires the use of the 80386-and-later MOV to CR0 instruction.
LMSW is a serializing instruction.
Store Global The SGDT,SIDT,SLDT,SMSW,STR were unprivileged on all x86 CPUs from 80286 onwards until the introduction of UMIP in
SGDT m16&32 0F 01 /0 Descriptor Table 2017.[4]
Register
This has been a significant security problem for software-based virtualization, since it enables these
Store Interrupt instructions to be used by a VM guest to detect that it is running inside a VM.[5][6]
SIDT m16&32 0F 01 /1 Descriptor Table
Register The 16-bit variants of the SGDT and SIDT instructions also show a difference between Intel
documentation and actual behavior observed on Intel CPUs: as of Intel SDM revision 076, December
Store Local 2021 (https://kib.kiev.ua/x86docs/Intel/SDMs/325462-076.pdf), the last 8 bits of the descriptor is
SLDT r/m16 0F 00 /0 Descriptor Table
Register documented as being written as 0, however observed behavior is that bits 31:24 of the descriptor table
address are written instead.[7]
Store Machine
SMSW r/m16 0F 01 /4
Status Word SLDT and SMSW (but not STR) with a 32-bit register argument are documented to set the top 16 bits
of the specified register to an undefined value on Intel CPUs.
STR r/m16 0F 00 /1 Store Task Register
Verify a segment for

VERR r/m16 0F 00 /4 Sets ZF=1 if segment can be read, ZF=0 otherwise.
reading
Sets ZF=1 if segment can be written, ZF=0 otherwise.
Verify a segment for On some Intel CPU/microcode combinations from 2019 onwards, the VERW instruction also flushes
VERW r/m16 0F 00 /5
writing microarchitectural data buffers. This enables it to be used as part of workarounds for Microarchitectural
Data Sampling security vulnerabilities.[8][9]
LOADALL 0F 05 Load all CPU Undocumented, 80286 only. (A different variant of LOADALL with a different opcode and memory layout exists on 80386.)
registers, including
internal ones such
as GDT
Added with 80386
Instruction Meaning Notes
BSF Bit scan forward

BSF and BSR produce undefined results if the source argument is all-0s.
BSR Bit scan reverse
BT Bit test
BTC Bit test and complement
BTR Bit test and reset Instructions atomic only if LOCK prefix present.
BTS Bit test and set
Sign-extends EAX into EDX, forming the quad-word EDX:EAX. Since (I)DIV uses EDX:EAX as its input, CDQ must
CDQ Convert double-word to quad-word
be called after setting EAX if EDX is not manually initialized (as in 64/32 division) before (I)DIV.
Compares ES:[(E)DI] with DS:[(E)SI] and increments or decrements both (E)DI and (E)SI, depending on DF; can be
CMPSD Compare string double-word
prefixed with REP
CWDE Convert word to double-word Unlike CWD, CWDE sign-extends AX to EAX instead of AX to DX:AX
IBTS Insert Bit String Discontinued with B1 step of 80386.
Two-operand form of IMUL: Signed and Allows to multiply two registers directly, storing the partial (truncated) lower bit result. Since the lower half is the same
IMUL
Unsigned for unsigned and signed multiplication, this version of the instruction can be used for unsigned multiplication as well
INSD Input from port to string double-word *(long)ES:EDI±± = port[DX]; (±± depends on DF, ES: cannot be overridden). Can be prefixed with REP.
Interrupt return; D suffix means 32-bit

IRETx return, F suffix means do not generate Use IRETD rather than IRET in 32-bit situations
epilogue code (i.e. LEAVE instruction)
Jxx (near) Jump conditionally Conditional near jump instructions for all 8086 Jxx short jump instructions
JECXZ Jump if ECX is zero
LFS, LGS Load far pointer

LSS Load stack segment and register Normally used to update both SS and SP at the same time.
LODSD Load string double-word EAX = *DS:(E)SI±±; (±± depends on DF, DS: can be overridden); can be prefixed with REP
LOOPW,
Loop, conditional loop Same as LOOP, LOOPcc for earlier processors
LOOPccW
LOOPD,
Loop while equal if (cc && --ECX) goto lbl;, cc = Z(ero), E(qual), NonZero, N(on)E(qual)
LOOPccD
MOV to/from
Move to/from special registers CR=control registers, DR=debug registers, TR=test registers (up to 80486)
CR/DR/TR
MOVSD Move string double-word *(dword*)ES:EDI±± = *(dword*)ESI±±; (±± depends on DF); can be prefixed with REP
MOVSX Move with sign-extension (long)r = (signed char) r/m; and similar
MOVZX Move with zero-extension (long)r = (unsigned char) r/m; and similar
OUTSD Output to port from string double-word port[DX] = *(long*)DS:ESI±±; (±± depends on DF, DS: can be overridden); can be prefixed with REP.
Pop all double-word (32-bit) registers from
POPAD Does not pop register ESP off of stack
stack
POPFD Pop data into EFLAGS register
Push all double-word (32-bit) registers onto

PUSHAD
stack
PUSHFD Push EFLAGS register onto stack
Push a double-word (32-bit) value onto

PUSHD
stack
SCASD Scan string data double-word Compares ES:[(E)DI] with EAX and increments or decrements (E)DI, depending on DF; can be prefixed with REP
(SETA, SETAE, SETB, SETBE, SETC, SETE, SETG, SETGE, SETL, SETLE, SETNA, SETNAE, SETNB, SETNBE,
SETcc Set byte to one on condition, zero otherwise SETNC, SETNE, SETNG, SETNGE, SETNL, SETNLE, SETNO, SETNP, SETNS, SETNZ, SETO, SETP, SETPE,
SETPO, SETS, SETZ)
SHLD Shift left double r1 = r1<<CL ∣ r2>>(register_width - CL); Instead of CL, 8-bit immediate can be used.
r1 = r1>>CL ∣ r2<<(register_width - CL); Instead of CL, 8-bit immediate can be used.
SHLD and SHRD with 16-bit arguments and a shift-amount greater than 16 produce undefined
SHRD Shift right double
results. (Actual results differ between different Intel CPUs, with at least three different behaviors
known.[10])
STOSD Store string double-word *ES:EDI±± = EAX; (±± depends on DF, ES cannot be overridden); can be prefixed with REP
Discontinued with B1 step of 80386.
Used by software mainly for detection of the buggy[11] B0 stepping of the 80386. Microsoft
XBTS Extract Bit String
Windows (v2.01 and later) will attempt to run the XBTS instruction as part of its CPU detection if
CPUID is not present, and will refuse to boot if XBTS is found to be working.[12]
Compared to earlier sets, the 80386 instruction set also adds opcodes with different parameter combinations for the following instructions: BOUND,
IMUL, LDS, LES, MOV, POP, PUSH and prefix opcodes for FS and GS segment overrides.
Added with 80486
r = r<<24 | r<<8&0x00FF0000 | r>>8&0x0000FF00 | r>>24; Only defined for 32-bit registers. Usually used to
BSWAP r32 0F C8+r Byte Swap change between little endian and big endian representations. When used with 16-bit registers produces various different
results on 486,[13] 586, and Bochs/QEMU.[14]
0F A6 /r[15] 0F A6/A7 encodings only available on 80486 stepping A.[16]

CMPXCHG r/m8, r8
0F B0 /r[17] Compare and 0F B0/B1 encodings available on 80486 stepping B and later x86 CPUs.
Exchange
0F A7 /r
CMPXCHG r/m, r16/32 Instruction atomic only if used with LOCK prefix.
0F B1 /r
Invalidate Internal
INVD 0F 08 Flush internal caches. Modified data present in the cache are not written back to memory, potentially causing data loss.
Caches
Invalidate TLB
INVLPG m8 0F 01 /7 Invalidate TLB Entry for page that contains data specified.
Entry
Write Back and Writes back all modified cache lines in the processor's internal cache to main memory and invalidates the internal
WBINVD 0F 09
Invalidate Cache caches.
XADD r/m,r8 0F C0 /r Exchanges the first operand with the second operand, then loads the sum of the two values into the destination operand.
eXchange and
ADD Instruction atomic only if used with LOCK prefix.
XADD r/m,r16/32 0F C1 /r
Added in P5/P6-class processors
Integer/system instructions that were not present in the basic 80486 instruction set, but were added in processors prior to the introduction of SSE.
(Discontinued instructions are not included.)
Instruction Opcode Description Ring Added in
Read Model-specific register. The MSR to read is specified in ECX. The value of the MSR is then
RDMSR 0F 32
returned as a 64-bit value in EDX:EAX. IBM 386SLC,[18]
Intel Pentium,
Write Model-specific register. The MSR to write is specified in ECX, and the data to write is given in
0 AMD K5,
EDX:EAX.[a] Cyrix 6x86MX,MediaGXm,
WRMSR 0F 30 IDT WinChip C6,
Instruction is, with some exceptions, serializing.[b] Transmeta Crusoe
Intel 386SL, 486SL,[c]

Intel Pentium,
AMD 5x86,
-2
RSM[20] 0F AA Resume from System Management Mode.
(SMM) Cyrix 486SLC/e,[21]
IDT WinChip C6,
Transmeta Crusoe,
Rise mP6
CPU Identification and feature information. Takes as input a CPUID leaf index in EAX and, depending on Intel Pentium,[d]
leaf, a sub-index in ECX. Result is returned in EAX,EBX,ECX,EDX.
AMD 5x86,[d]
Cyrix 5x86,[e]
Instruction is serializing, and causes a mandatory #VMEXIT under virtualization. IDT WinChip C6,
CPUID 0F A2 3
Transmeta Crusoe,
Support for CPUID can be checked by toggling bit 21 of EFLAGS (EFLAGS.ID) - if this Rise mP6,
bit can be toggled, CPUID is present. NexGen Nx586,[f]
UMC Green CPU
Intel Pentium,
Compare and Exchange 8 bytes. Compares EDX:EAX with m64. If equal, set ZF and load ECX:EBX into
AMD K5,
m64. Else, clear ZF and load m64 into EDX:EAX.
Cyrix 6x86L,MediaGXm,
CMPXCHG8B m64 0F C7 /1 3 IDT WinChip C6,[h]
Instruction atomic only if used with LOCK prefix.[g] Transmeta Crusoe,[h]
Rise mP6[h]
Read 64-bit Time Stamp Counter (TSC) into EDX:EAX.[i]

Intel Pentium,
In early processors, the TSC was a cycle counter, incrementing by 1 for each clock AMD K5,
Cyrix 6x86MX,MediaGXm,
RDTSC 0F 31 cycle (which could cause its rate to vary on processors that could change clock speed 3[k] IDT WinChip C6,
at runtime) - in later processors, it increments at a fixed rate that doesn't necessarily Transmeta Crusoe,
match the CPU clock speed.[j] Rise mP6
Intel Pentium MMX,

Intel Pentium Pro,
Read Performance Monitoring Counter. The counter to read is specified by ECX and its value is returned AMD K7,
RDPMC 0F 33 3[l]
in EDX:EAX.[i] Cyrix 6x86MX,
IDT WinChip C6,
VIA Nano[m]
Intel Pentium Pro,

AMD K7,
CMOVcc reg,r/m [n] [o] 3 Cyrix 6x86MX,MediaGXm,
0F 4x /r Conditional move to register. The source operand may be either register or memory.
Transmeta Crusoe,
VIA C3 "Nehemiah"
Undefined Instruction. Generates an invalid opcode (#UD) exception in all operation modes.
UD2 0F 0B This instruction is provided for software testing to explicitly generate an invalid opcode. (3) (80186),[p]
Intel Pentium Pro
The opcode for this instruction is reserved for this purpose.
Official long NOP.
Other than AMD K7/K8, broadly unsupported in non-Intel processors released before Intel Pentium Pro,[r]
NOP r/m NFx 0F 1F /0 3
AMD K7, x86-64[s]
2006.[q][28]
SYSCALL 0F 05 Fast System call. 3

AMD K6[t],
SYSRET 0F 07[w] Fast Return from System Call. Designed to be used together with SYSCALL. 0 [x] x86-64[u][v]
Intel Pentium II,[y]

SYSENTER 0F 34 Fast System call. 3[x]
AMD K7,[36][z]
Transmeta Crusoe,[aa]
NatSemi Geode GX2,
SYSEXIT 0F 35[w] Fast Return from System Call. Designed to be used together with SYSENTER. 0[x]
VIA C3 "Nehemiah"[ab]
a. On Intel and AMD CPUs, the WRMSR instruction is also used to update
the CPU microcode. This is done by writing the virtual address of the
new microcode to upload to MSR 79h on Intel CPUs and MSR
C001_0020h on AMD CPUs.
b. Writes to the following MSRs are not serializing:[19] n. The condition codes supported for CMOVcc instruction (opcode 0F 4x
/r, with the x nibble specifying the condition) are:
Number Name
x cc Condition (EFLAGS)
48h SPEC_CTRL
0 O OF=1: "Overflow"
49h PRED_CMD
1 NO OF=0: "Not Overflow"
122h TSX_CTRL
2 C,B,NAE CF=1: "Carry", "Below", "Not Above or Equal"
6E0h TSC_DEADLINE
3 NC,NB,AE CF=0: "Not Carry", "Not Below", "Above or Equal"
6E1h PKRS
4 Z,E ZF=1: "Zero", "Equal"
774h HWP_REQUEST
5 NZ,NE ZF=0: "Not Zero", "Not Equal"
802h to 83Fh (x2APIC MSRs)
6 NA,BE (CF=1 or ZF=1): "Not Above", "Below or Equal"
c. System Management Mode and the RSM instruction were made 7 A,NBE (CF=0 and ZF=0): "Above", "Not Below or Equal"
available on non-SL variants of the Intel 486 only after the initial release
of the Intel Pentium in 1993. 8 S SF=1: "Sign"
d. CPUID is also available on some Intel and AMD 486 processor variants 9 NS SF=0: "Not Sign"
that were released after the initial release of the Intel Pentium.
A P,PE PF=1: "Parity", "Parity Even"
e. On the Cyrix 5x86 and 6x86 CPUs, CPUID is not enabled by default and
must be enabled through a Cyrix configuration register. B NP,PO PF=0: "Not Parity", "Parity Odd"
f. On NexGen CPUs, CPUID is only supported with some system BIOSes.
On some NexGen CPUs that do support CPUID, EFLAGS.ID is not C L,NGE SF≠OF: "Less", "Not Greater Or Equal"
supported but EFLAGS.AC is, complicating CPU detection.[22] D NL,GE SF=OF: "Not Less", "Greater Or Equal"
g. LOCK CMPXCHG8B with a register operand (which is an invalid encoding)
E LE,NG (ZF=1 or SF≠OF): "Less or Equal", "Not Greater"
can cause hangs on some Intel Pentium CPUs (Pentium F00F bug).
h. On IDT WinChip, Transmeta Crusoe and Rise mP6 processors, the F NLE,GE (ZF=0 and SF=OF): "Not Less or Equal", "Greater"
CMPXCHG8B instruction is always supported, however its CPUID bit may
be missing. This is a workaround for a bug in Windows NT.[23] o. For CMOVcc with a memory source operand, the CPU will always read
i. The RDTSC and RDPMC instructions are not ordered with respect to other the operand from memory (potentially causing memory exceptions and
instructions, and may sample their respective counters before earlier cache line-fills) even if the condition for the move is not satisfied.
instructions are executed or after later instructions have executed. p. UD2 will cause an #UD exception on all x86 processors from the 80186
Invocations of RDPMC (but not RDTSC) may be reordered relative to each onwards, but is only documented to do so from Pentium Pro onwards.
other even for reads of the same counter. q. Unlike other instructions added in Pentium Pro, long NOP does not
have a CPUID feature bit.
In order to impose ordering with respect to other instructions, LFENCE or
serializing instructions (e.g. CPUID) are needed.[24] r. 0F 1F /0 as long-NOP was introduced in the Pentium Pro, but
remained undocumented until 2006.[29] The whole 0F 18..1F opcode
j. Fixed-rate TSC was introduced in two stages: range was NOP in Pentium Pro. However, except for 0F 1F /0, Intel
does not guarantee that these opcodes will remain NOP in future
Constant TSC processors, and have indeed assigned some of these opcodes to other
TSC running at a fixed rate as long as the processor core is not in instructions in at least some processors.[30]
a deep-sleep (C2 or deeper) mode, but not synchronized between s. Documented for AMD x86-64 since 2002.[31]
CPU cores. Introduced in Intel Prescott, Yonah and Bonnell. Also
t. On K6, the SYSCALL/SYSRET instructions were available on Model 7
present in all Transmeta and VIA Nano[25] CPUs. Does not have a
(250nm "Little Foot") and later, not on the earlier Model 6.[32]
CPUID bit.
u. SYSCALL and SYSRET were made an integral part of x86-64 - as a result,
Invariant TSC the instructions are available in 64-bit mode on all x86-64 processors
TSC running at a fixed rate synchronized between CPU cores in from AMD, Intel, VIA and Zhaoxin (except Xeon Phi "Knights Corner").
all P-,C- and T-states (but not necessarily S-states).
Outside 64-bit mode, the instructions are available on AMD processors
Present in AMD K10 and later; Intel Nehalem/Saltwell[26] and later; only.
Zhaoxin LuJiaZui[27]. Indicated with a CPUID bit (leaf
8000_0007:EDX[8]). v. The exact semantics of SYSRET differs slightly between AMD and Intel
processors - this has been known to cause security issues.[33]
k. RDTSC can be run outside Ring 0 only if CR4.TSD=0.
w. For the SYSRET and SYSEXIT instructions under x86-64, it is necessary
l. RDPMC can be run outside Ring 0 only if CR4.PCE=1. to add the REX.W prefix for variants that will return to 64-bit user-mode
m. The RDPMC instruction is not present in VIA processors prior to the Nano. code.
Encodings of these instructions without the REX.W prefix are used to

return to 32-bit user-mode code. (Neither of these instructions can be
used to return 16-bit user-mode code.)
x. The SYSRET, SYSENTER and SYSEXIT instructions are unavailable in

Real mode. (SYSENTER is, however, available in Virtual 8086 mode.)
y. The CPUID flags that indicate support for SYSENTER/SYSEXIT are set on
the Pentium Pro, even though the processor does not officially support
these instructions.[34]
Third party testing indicates that the opcodes are present on the
Pentium Pro but too buggy to be usable.[35]
z. On AMD CPUs, the SYSENTER and SYSEXIT are not available in x86-64
long mode (#UD).
aa. On Transmeta CPUs, the SYSENTER and SYSEXIT instructions are only ab. On Nehemiah, SYSENTER and SYSEXIT are available only on stepping 8
available with version 4.2 or higher of the Transmeta Code Morphing and later.[38]
software.[37]
Added as instruction set extensions
Added with x86-64
These instructions can only be encoded in 64 bit mode. They fall in four groups:
original instructions that reuse existing opcodes for a different purpose (MOVSXD replacing ARPL)
original instructions with new opcodes (SWAPGS)
existing instructions extended to a 64 bit address size (JRCXZ)
existing instructions extended to a 64 bit operand size (remaining instructions)
Most instructions with a 64 bit operand size encode this using a REX.W prefix; in the absence of the REX.W prefix, the corresponding instruction with 32 bit
operand size is encoded. This mechanism also applies to most other instructions with 32 bit operand size. These are not listed here as they do not gain a
new mnemonic in Intel syntax when used with a 64 bit operand size.
Instruction Encoding Meaning
CDQE REX.W 98 Sign extend EAX into RAX
CQO REX.W 99 Sign extend RAX into RDX:RAX

CMPSQ REX.W A7 CoMPare String Quadword
CoMPare and eXCHanGe 16 Bytes.
CMPXCHG16B m128[a][b] REX.W 0F C7 /1
Atomic only if used with LOCK prefix.
IRETQ REX.W CF 64-bit Return from Interrupt

JRCXZ rel8 E3 cb Jump if RCX is zero
LODSQ REX.W AD LoaD String Quadword
MOVSXD r64,r/m32 REX.W 63 /r[c] MOV with Sign Extend 32-bit to 64-bit
MOVSQ REX.W A5 Move String Quadword
POPFQ 9D POP RFLAGS Register

PUSHFQ 9C PUSH RFLAGS Register
SCASQ REX.W AF SCAn String Quadword
STOSQ REX.W AB STOre String Quadword

SWAPGS 0F 01 F8 Exchange GS base with KernelGSBase MSR
a. The memory operand to CMPXCHG16B must be 16-byte aligned.

b. The CMPXCHG16B instruction was absent from a few of the earliest Intel/AMD x86-64 processors. On Intel processors, the instruction was missing from
Xeon "Nocona" stepping D,[39] but added in stepping E.[40] On AMD K8 family processors, it was added in stepping F, at the same time as DDR2
support was introduced.[41]
For this reason, CMPXCHG16B has its own CPUID flag, separate from the rest of x86-64.
c. Encodings of MOVSXD without REX.W prefix are permitted but discouraged[42] - such encodings behave identically to 16/32-bit MOV (8B /r).
Bit manipulation extensions
Bit manipulation instructions. For all of the VEX-encoded instructions defined by BMI1 and BMI2, the operand size may be 32 or 64 bits, controlled by
the VEX.W bit - none of these instructions are available in 16-bit variants.
Bit Manipulation Instruction

Opcode Instruction description Added in
Extension mnemonics
POPCNT r16,r/m16
F3 0F B8 /r
POPCNT r32,r/m32 Population Count. Counts the number of bits that are set to 1 in its source argument.
POPCNT r64,r/m64 F3 REX.W 0F B8 /r K10,
ABM (LZCNT)[a] Bobcat,
[b] Haswell,
Advanced Bit LZCNT r16,r/m16 Count Leading zeroes.
Manipulation F3 0F BD /r ZhangJiang,
LZCNT r32,r/m32
If source operand is all-0s, then LZCNT will return operand size in bits Gracemont
(16/32/64) and set CF=1.
LZCNT r64,r/m64 F3 REX.W 0F BD /r
Count Trailing zeroes.[c]

TZCNT r16,r/m16
F3 0F BC /r
TZCNT r32,r/m32
If source operand is all-0s, then TZCNT will return operand size in bits
(16/32/64) and set CF=1.
TZCNT r64,r/m64 F3 REX.W 0F BC /r
ANDN ra,rb,r/m VEX.LZ.0F38 F2 /r Bitwise AND-NOT: ra = r/m AND NOT(rb)

Bitfield extract. Bitfield start position is specified in bits [7:0] of rb, length in bits[15:8] of
rb. The bitfield is then extracted from the r/m value with zero-extension, then stored in
ra. Equivalent to[d]
BEXTR ra,r/m,rb VEX.LZ.0F38 F7 /r
mask = (1 << rb[15:8]) - 1
ra = (r/m >> rb[7:0]) AND mask
Haswell,
BMI1 Piledriver,
Bit Manipulation Jaguar,
Instruction Set 1 Extract lowest set bit in source argument. Returns 0 if source argument is 0. Equivalent ZhangJiang,
to Gracemont
BLSI reg,r/m VEX.LZ.0F38 F3 /3
dst = (-src) AND src
Generate a bitmask of all-1s bits up to the lowest bit position with a 1 in the source
argument. Returns all-1s if source argument is 0. Equivalent to
BLSMSK reg,r/m VEX.LZ.0F38 F3 /2
dst = (src-1) XOR src
Copy all bits of the source argument, then clear the lowest set bit. Equivalent to
BLSR reg,r/m VEX.LZ.0F38 F3 /1
dst = (src-1) AND src
Zero out high-order bits in r/m starting from the bit position specified in rb, then write
result to rd. Equivalent to
BZHI ra,r/m,rb VEX.LZ.0F38 F5 /r
ra = r/m AND NOT(-1 << rb[7:0])
Widening unsigned integer multiply without setting flags. Multiplies EDX/RDX with r/m,
MULX ra,rb,r/m VEX.LZ.F2.0F38 F6 /r then stores the low half of the multiplication result in ra and the high half in rb. If ra
and rb specify the same register, only the high half of the result is stored.
Parallel Bit Deposit. Scatters contiguous bits from rb to the bit positions set in r/m,
then stores result to ra. Operation performed is:
ra=0; k=0; mask=r/m

PDEP ra,rb,r/m VEX.LZ.F2.0F38 F5 /r
for i=0 to opsize-1 do
if (mask[i] == 1) then
ra[i]=rb[k]; k=k+1
Haswell,
BMI2
Bit Manipulation Parallel Bit Extract. Uses r/m argument as a bit mask to select bits in rb, then Excavator,[e]
Instruction Set 2 compacts the selected bits into a contiguous bit-vector. Operation performed is: ZhangJiang,
Gracemont
ra=0; k=0; mask=r/m
PEXT ra,rb,r/m VEX.LZ.F3.0F38 F5 /r
for i=0 to opsize-1 do
if (mask[i] == 1) then
ra[k]=rb[i]; k=k+1
RORX reg,r/m,imm8 VEX.LZ.F2.0F3A F0 /r ib Rotate right by immediate without affecting flags.
Arithmetic shift right without updating flags.
SARX ra,r/m,rb VEX.LZ.F3.0F38 F7 /r For SARX, SHRX and SHLX, the shift-amount specified in rb is masked to
5 bits for 32-bit operand size and 6 bits for 64-bit operand size.
SHRX ra,r/m,rb VEX.LZ.F2.0F38 F7 /r Logical shift right without updating flags.
SHLX ra,r/m,rb VEX.LZ.66.0F38 F7 /r Shift left without updating flags.
a. On AMD CPUs, the "ABM" extension provides both POPCNT and LZCNT. On Intel CPUs, however, the CPUID bit for "ABM" is only documented to
indicate the presence of the LZCNT instruction and is listed as "LZCNT", while POPCNT has its own separate CPUID feature bit.
However, all known processors that implement the "ABM"/"LZCNT" extensions also implement POPCNT and set the CPUID feature bit for POPCNT, so
the distinction is theoretical only.
(The converse is not true - there exist processors that support POPCNT but not ABM, such as Intel Nehalem and VIA Nano 3000.)
b. The LZCNT instruction will execute as BSR on systems that do not support the LZCNT or ABM extensions. BSR computes the index of the highest set bit
in the source operand, producing a different result from LZCNT for most input values.
c. The TZCNT instruction will execute as BSF on systems that do not support the BMI1 extension. BSF produces the same result as TZCNT for all input
operand values except zero - for which TZCNT returns input operand size, but BSF produces undefined behavior (leaves destination unmodified on
most modern CPUs).
d. For BEXTR, the start position and length are not masked and can take values from 0 to 255. If the selected bits extend beyond the end of the r/m
argument (which has the usual 32/64-bit operand size), then the excess bits are read out as 0.
e. On AMD processors before Zen 3, the PEXT and PDEP instructions are quite slow[43] and exhibit data-dependent timing due to the use of a microcoded
implementation (about 18 to 300 cycles, depending on the number of bits set in the mask argment). As a result, it is often faster to use other
instruction sequences on these processors.[44][45]
Added with Intel TSX
TSX Subset Instruction Opcode Description Added in
XBEGIN rel16 C7 F8 cw Start transaction. If transaction fails, perform a branch to Haswell

XBEGIN rel32 C7 F8 cd the given relative offset.
(Deprecated on desktop/laptop CPUs from 10th
RTM XABORT imm8 C6 F8 ib Abort transaction with 8-bit immediate as error code.
Restricted generation (Ice Lake, Comet Lake) onwards,
Transactional XEND NP 0F 01 D5 End transaction. but continues to be available on Xeon-branded
memory server parts (e.g. Ice Lake-SP, Sapphire
Test if in transactional execution. Sets EFLAGS.ZF to 0 if
XTEST NP 0F 01 D6 executed inside a transaction (RTM or HLE), 1 Rapids))
otherwise.
Instruction prefix to indicate start of hardware lock

elision, used with memory atomic instructions only (for
other instructions, the F2 prefix may have other
XACQUIRE F2
meanings). When used with such instructions, may start
Haswell
a transaction instead of performing the memory atomic
HLE operation.
Hardware Lock (Discontinued - the last processors to support
Elision Instruction prefix to indicate end of hardware lock elision, HLE were Coffee Lake and Cascade Lake)
used with memory atomic/store instructions only (for
other instructions, the F3 prefix may have other
XRELEASE F3
meanings). When used with such instructions during
hardware lock elision, will end the associated transaction
instead of performing the store/atomic.
TSXLDTRK XSUSLDTRK F2 0F 01 E8 Suspend Tracking Load Addresses

Load Address
Sapphire Rapids
Tracking
suspend/resume XRESLDTRK F2 0F 01 E9 Resume Tracking Load Addresses
Added with Intel CET
Intel CET (Control-Flow Enforcement Technology) adds two distinct features to help protect against security exploits such as return-oriented
programming: a shadow stack (CET_SS), and indirect branch tracking (CET_IBT).
CET Subset Instruction Opcode Description Ring Added in
INCSSPD r32 F3 0F AE /5
Increment shadow stack pointer
INCSSPQ r64 F3 REX.W 0F AE /5
CET_SS
Shadow stack. Read shadow stack pointer into
RDSSPD r32 F3 0F 1E /1
register (low 32 bits)[a]
When shadow stacks are enabled, return addresses Read shadow stack pointer into
are pushed on both the regular stack and the RDSSPQ r64 F3 REX.W 0F 1E /1 3
register (full 64 bits)[a]
shadow stack when a function call is made. They
SAVEPREVSSP F3 0F 01 EA Save previous shadow stack pointer
are then both popped on return from the function Tiger Lake,
call - if they do not match, then the stack is assumed RSTORSSP m64 F3 0F 01 /5 Restore saved shadow stack pointer Zen 3
to be corrupted, and a #CP exception is issued. WRSSD m32,r32 0F 38 F6 /r Write 4 bytes to shadow stack
The shadow stack is additionally required to be WRSSQ m64,r64 REX.W 0F 38 F6 /r Write 8 bytes to shadow stack
stored in specially marked memory pages which WRUSSD m32,r32 66 0F 38 F5 /r Write 4 bytes to user shadow stack
cannot be modified by normal memory store
WRUSSQ m64,r64 66 REX.W 0F 38 F5 /r Write 8 bytes to user shadow stack
instructions. 0
SETSSBSY F3 0F 01 E8 Mark shadow stack busy
CLRSSBSY m64 F3 0F AE /6 Clear shadow stack busy flag
Terminate indirect branch in 32-bit

ENDBR32 F3 0F 1E FB
mode[b]
CET_IBT Terminate indirect branch in 64-bit
ENDBR64 F3 0F 1E FA
Indirect Branch Tracking. mode[b]
When IBT is enabled, an indirect branch (jump, call, Prefix used with indirect CALL/JMP near 3 Tiger Lake
instructions (opcodes FF /2 and
return) to any instruction that is not an ENDBR32/64 FF /4) to indicate that the branch
instruction will cause a #CP exception. NOTRACK 3E[c] target is not required to start with an
ENDBR32/64 instruction. Prefix only
honored when NO_TRACK_EN flag is
set.
a. The RDSSPD and RDSSPQ instructions act as NOPs on processors where shadow stacks are disabled or CET is not supported.
b. ENDBR32 and ENDBR64 act as NOPs on processors that don't support CET_IBT or where IBT is disabled.
c. This prefix has the same encoding as the DS: segment override prefix - as of April 2022, Intel documentation does not appear to specify whether this
prefix also retains its old segment-override function when used as a no-track prefix, nor does it provide an official mnemonic for this prefix.[46][47] (GNU
binutils use "notrack"[48])
Added with other cross-vendor extensions
Instruction
Instruction Set Extension Opcode Instruction description Ring Added in
mnemonics
Prefetch with Non-Temporal Access.
Prefetch data under the assumption that

the data will be used only once, and
PREFETCHNTA m8 0F 18 /0 attempt to minimize cache pollution from
said data. The methods used to minimize
cache pollution are implementation-
dependent.[b] Pentium III,
SSE[a] 3 K7,
(non-SIMD) Nehemiah
PREFETCHT0 m8 0F 18 /1 Prefetch data to all levels of the cache hierarchy.[b]
Prefetch data to all levels of the cache hierarchy
PREFETCHT1 m8 0F 18 /2
except L1 cache.[b]
Prefetch data to all levels of the cache hierarchy
PREFETCHT2 m8 0F 18 /3
except L1 and L2 caches.[b]
SFENCE NP 0F AE F8+x[c] Store Fence.[d]
LFENCE NP 0F AE E8+x[c] Load Fence and Dispatch Serialization.[e]
MFENCE NP 0F AE F0+x[c] Memory Fence.[f]

MOVNTI m32,r32 NP 0F C3 /r
Non-Temporal Memory Store. Pentium 4,
SSE2 MOVNTI m64,r64 NP REX.W 0F C3 /r
3 K8,
(non-SIMD)
Pauses CPU thread for a short time period.[g] C7 Esther
PAUSE F3 90
Intended for use in spinlocks.[h]
Flush one cache line to memory.
In a system with multiple cache hierarchy

CLFSH[i] levels and/or multiple processors each (SSE2),
CLFLUSH m8 NP 0F AE /7 3
Cache Line Flush. Geode LX
with their own caches, the line is flushed
from all of them.
Start monitoring a memory location for memory

writes. The memory address to monitor is given by
MONITOR[k] DS:AX/EAX/RAX.[l] ECX and EDX are used to
NP 0F 01 C8
MONITOR EAX,ECX,EDX provide extra extension and hint flags.
Prescott,
MONITOR[j] Yonah,
Monitor a memory location for Wait for a write to a monitored memory location 0[m] Bonnell,
memory writes. previously specified with MONITOR.[n] K10,
Nano
MWAIT[k] NP 0F 01 C9
MWAIT EAX,ECX
EAX and ECX are used to provide extra
extension and hint flags.
SMX
Safer Mode Extensions.
Conroe/Merom,
Load, authenticate and execute Perform an SMX function. The function to perform
GETSEC NP 0F 37 0 WuDaoKou,[58]
a digitally signed "Authenticated is given in EAX.[o]
Tremont
Code Module" as part of Intel
Trusted Execution Technology.
XSAVE mem NP 0F AE /4 Save state components specified by EDX:EAX to 3 Penryn,[p]

XSAVE XSAVE64 mem NP REX.W 0F AE /4 memory. Bulldozer,
Processor Extended State Jaguar,
Save/Restore. XRSTOR mem NP 0F AE /5 Restore state components specified by EDX:EAX Goldmont,
XRSTOR64 mem NP REX.W 0F AE /5 from memory. ZhangJiang
XGETBV NP 0F 01 D0 Get value of Extended Control Register.
Reads an XCR specified by ECX into

EDX:EAX.
Set Extended Control Register.
XSETBV NP 0F 01 D1 Write the value in EDX:EAX to the XCR 0

specified by ECX.
Read Time Stamp Counter and processor core

ID.[q] K10,
RDTSCP
Nehalem,
Read Time Stamp Counter and RDTSCP 0F 01 F9
The TSC value is placed in EDX:EAX and 3[s] Silvermont,
Processor ID.
the core ID in ECX.[r] Nano
POPCNT r16,r/m16 K10,

F3 0F B8 /r
POPCNT[t] POPCNT r32,r/m32 Count the number of bits that are set to 1 in its
3 Nehalem,
Population Count. source argument.
Nano 3000
POPCNT r64,r/m64 F3 REX.W 0F B8 /r
CRC32 r32,r/m8 F2 0F 38 F0 /r Accumulate CRC value using the CRC-32C

(Castagnoli) polynomial 0x11EDC6F41 (normal
Nehalem,
SSE4.2 CRC32 r32,r/m16 form 0x1EDC6F41). This is the polynomial used in
F2 0F 38 F1 /r 3 Bulldozer,
(non-SIMD) CRC32 r32,r/m32 iSCSI. In contrast to the more popular one used in
ZhangJiang
Ethernet, its parity is even, and it can thus detect
CRC32 r64,r/m64 F2 REX.W 0F 38 F1 /r any error with an odd number of changed bits.
Save state components specified by EDX:EAX to

memory.
Unlike the older XSAVE instruction, Sandy Bridge,

XSAVEOPT
XSAVEOPT mem NP 0F AE /6 XSAVEOPT may abstain from writing Steamroller,
Processor Extended State processor state items to memory when the 3 Puma,
XSAVEOPT64 mem NP REX.W 0F AE /6
Save/Restore Optimized Goldmont,
CPU can determine that they haven't been ZhangJiang
modified since the most recent
corresponding XRSTOR.
RDFSBASE r32 F3 0F AE /0
Read base address of FS: segment.
RDFSBASE r64 F3 REX.W 0F AE /0
FSGSBASE RDGSBASE r32 F3 0F AE /1 Ivy Bridge,

Read base address of GS: segment.
Read/write base address of FS and RDGSBASE r64 F3 REX.W 0F AE /1 Steamroller,
3
GS segments from user-mode. WRFSBASE r32 F3 0F AE /2 Goldmont,
Available in 64-bit mode only. Write base address of FS: segment. ZhangJiang
WRFSBASE r64 F3 REX.W 0F AE /2
WRGSBASE r32 F3 0F AE /3
Write base address of GS: segment.
WRGSBASE r64 F3 REX.W 0F AE /3
MOVBE r16,m16
NFx 0F 38 F0 /r Load from memory to register with byte-order
MOVBE r32,m32
swap. Bonnell,
MOVBE MOVBE r64,m64 NFx REX.W 0F 38 F0 /r Haswell,
Move to/from memory with byte order 3 Jaguar,
swap. MOVBE m16,r16 Steamroller,
NFx 0F 38 F1 /r Store to memory from register with byte-order
MOVBE m32,r32 ZhangJiang
swap.
MOVBE m64,r64 NFx REX.W 0F 38 F1 /r
Invalidate entries in TLB and paging-structure Haswell,

INVPCID caches based on invalidation type in register[u] and ZhangJiang,
INVPCID reg,m128 66 0F 38 82 /r 0
Invalidate Process-context identifier. descriptor in m128. The descriptor contains a Zen 3,
memory address and a PCID.[v] Gracemont
K6-2,
PREFETCHW m8 0F 0D /1 Prefetch cache line with intent to write.[b]
PREFETCHW[w] (Cedar Mill),[x]
Cache-line prefetch with intent to 3 Silvermont,
write. Broadwell,
PREFETCH m8[y] 0F 0D /0 Prefetch cache line.[b]
ZhangJiang
Add-with-carry. Differs from the older ADC

ADCX r32,r/m32 66 0F 38 F6 /r
instruction in that it leaves flags other than Broadwell,
ADCX r64,r/m64 66 REX.W 0F 38 F6 /r
ADX EFLAGS.CF unchanged. Zen 1,
3
Enhanced variants of add-with-carry. Add-with-carry, with the overflow-flag EFLAGS.OF ZhangJiang,
ADOX r32,r/m32 F3 0F 38 F6 /r Gracemont
serving as carry input and output, with other flags
ADOX r64,r/m64 F3 REX.W 0F 38 F6 /r
left unchanged.
SMAP CLAC NP 0F 01 CA Clear EFLAGS.AC.

Supervisor Mode Access Prevention.
Broadwell,
Repurposes the EFLAGS.AC
0 Goldmont,
(alignment check) flag to a flag that
Zen 1
prevents access to user-mode
memory while in ring 0, 1 or 2. STAC NP 0F 01 CB Set EFLAGS.AC.
CLFLUSHOPT m8 NFx 66 0F AE /7 Flush cache line. 3 Skylake,

CLFLUSHOPT Goldmont,
Optimized Cache Line Flush. Zen 1
Differs from the older CLFLUSH instruction
in that it has more relaxed ordering rules
with respect to memory stores and other
cache line flushes, enabling improved
performance.
XSAVEC Skylake,
XSAVEC mem NP 0F C7 /4 Save processor extended state components
Processor Extended State 3 Goldmont,
XSAVEC64 mem NP REX.W 0F C7 /4 specified by EDX:EAX to memory with compaction.
save/restore with compaction. Zen 1
Save processor extended state components

XSAVES mem NP 0F C7 /5
XSS specified by EDX:EAX to memory with compaction Skylake,
XSAVES64 mem NP REX.W 0F C7 /5
Processor Extended State and optimization if possible. 0 Goldmont,
save/restore, supervisor state. XRSTORS mem NP 0F C7 /3 Restore state components specified by EDX:EAX Zen 1
XRSTORS64 mem NP REX.W 0F C7 /3 from memory.
RDPKRU NP 0F 01 EE Read User Page Key register into EAX. Comet Lake,
PKU
3 Gracemont,
Protection Keys for user pages.
WRPKRU NP 0F 01 EF Write data from EAX into User Page Key Register. Zen 3
Goldmont Plus,
RDPID
Read processor core ID.
RDPID r32 F3 0F C7 /7 Read processor core ID into register.[q] 3[z] Zen 2,
Ice Lake
Zen 2,
CLWB Write one cache line back to memory without
CLWB m8 NFx 66 0F AE /6 3 Tiger Lake,
Cache Line Writeback to memory. invalidating the cache line.
Tremont
WBNOINVD
Write back all dirty cache lines to memory without Zen 2,
Whole Cache Writeback without WBNOINVD F3 0F 09 0
invalidate. invalidation.[aa] Ice Lake-SP
a. On AMD Athlon processors prior to the Athlon XP, the non-SIMD e. The LFENCE instruction ensures that all memory loads after the LFENCE
instructions of SSE were introduced as part of "MMX Extensions".[49] instruction are made globally observable after all memory loads before
b. All of the PREFETCH* instructions are hint instructions with effects only the LFENCE.
on performance, not program semantics. Providing an invalid address
On all Intel CPUs that support SSE2, the LFENCE instruction provides a
(e.g. address of an unmapped page or a non-canonical address) will
cause the instruction to act as a NOP without any exceptions generated. stronger ordering guarantee:[52] it is dispatch-serializing, meaning that
instructions after the LFENCE instruction are allowed to start executing
c. For the SFENCE, LFENCE and MFENCE instructions, the bottom 3 bits of
only after all instructions before it have retired (which will ensure that all
the ModR/M byte are ignored, and any value of x in the range 0..7 will
preceding loads but not necessarily stores have completed). The effect
result in a valid instruction.
of dispatch-serialization is that LFENCE also acts as a speculation barrier
d. The SFENCE instruction ensures that all memory stores after the SFENCE and a reordering barrier for accesses to non-memory resources such as
instruction are made globally observable after all memory stores before performance counters (accessed through e.g. RDTSC or RDPMC) and
the SFENCE. This imposes ordering on stores that can otherwise be
x2apic MSRs.
reordered, such as non-temporal stores and stores to WC (Write-
Combining) memory regions.[50] On AMD CPUs, LFENCE is not necessarily dispatch-serializing by default
- however, on all AMD CPUs that support any form of non-dispatch-
On Intel CPUs, as well as AMD CPUs from Zen1 onwards (but not older serializing LFENCE, it can be made dispatch-serializing by setting bit 1 of
AMD CPUs), SFENCE also acts as a reordering barrier on cache
MSR C001_1029.[53]
flushes/writebacks performed with the CLFLUSH, CLFLUSHOPT and CLWB
instructions. (Older AMD CPUs require MFENCE to order CLFLUSH.) f. The MFENCE instruction ensures that all memory loads, stores and
cacheline-flushes after the MFENCE instruction are made globally
SFENCE is not ordered with respect to LFENCE, and an SFENCE+LFENCE
observable after all memory loads, stores and cacheline-flushes before
sequence is not sufficient to prevent a load from being reordered past a
the MFENCE.
previous store.[51] To prevent such reordering, it is necessary to execute
an MFENCE, LOCK or a serializing instruction. On Intel CPUs, MFENCE is not dispatch-serializing, and therefore cannot
be used to enforce ordering on accesses to non-memory resources
such as performance counters and x2apic MSRs. MFENCE is still ordered
with respect to LFENCE, so if a memory barrier with dispatch serialization
is needed, then it can be obtained by issuing an MFENCE followed by an
LFENCE.[24]
On AMD CPUs, MFENCE is serializing.
g. The actual length of the pause performed by the PAUSE instruction is

implementation-dependent.
On systems without SSE2, PAUSE will execute as NOP.
h. Under VT-x or AMD-V virtualization, executing PAUSE many times in a

short time interval may cause a #VMEXIT. The number of PAUSE
executions and interval length that can trigger #VMEXIT are platform-
specific.
i. While the CLFLUSH instruction was introduced together with SSE2, it has r. Unlike the older RDTSC instruction, RDTSCP will delay the TSC read until
its own CPUID flag and may be present on processors not otherwise all previous instructions have retired, guaranteeing ordering with respect
implementing SSE2 and/or absent from processors that otherwise to preceding memory loads (but not stores). RDTSCP is not ordered with
implement SSE2. (E.g. AMD Geode LX supports CLFLUSH but not respect to subsequent instructions, though.
SSE2.) s. RDTSCP can be run outside Ring 0 only if CR4.TSD=0.
j. While the MONITOR and MWAIT instructions were introduced at the same t. While the POPCNT instruction was introduced at the same time as
time as SSE3, they have their own CPUID flag that needs to be SSE4.2, it is not considered to be a part of SSE4.2, but instead a
checked separately from the SSE3 CPUID flag (e.g. Athlon64 X2 and separate extension with its own CPUID flag.
VIA C7 supported SSE3 but not MONITOR.)
k. For the MONITOR and MWAIT instructions, older Intel documentation[54] On AMD processors, it is considered to be a part of the ABM extension,
lists instruction mnemonics with explicit operands (MONITOR but still has its own CPUID flag.
EAX,ECX,EDX and MWAIT EAX,ECX), while newer documentation omits
these operands. Assemblers/disassemblers may support one or both of u. The invalidation types defined for INVPCID (selected by register
these variants.[55] argument) are:
l. For MONITOR, the DS: segment can be overridden with a segment prefix.
Value Function
The memory area that will be monitored will be not just the single byte Invalidate TLB entries matching PCID and virtual memory address in
specified by DS:rAX, but a linear memory region containing the byte - 0
descriptor, excluding global entries
the size and alignment of this memory region is implementation-
dependent and can be queried through CPUID. Invalidate TLB entries matching PCID in descriptor, excluding global
1
entries
The memory location to monitor should have memory type WB (write-
2 Invalidate all TLB entries, including global entries
back cacheable), or else monitoring may fail.
3 Invalidate all TLB entries, excluding global entries
m. On some processors, such as Intel Xeon Phi x200[56] and AMD K10[57]
and later, there exist documented MSRs that can be used to enable Any other value in register argument causes a #GP exception.
MONITOR and MWAIT to run in Ring 3.
n. The wait performed by MWAITmay be ended by system events other v. Unlike the older INVLPG instruction, INVPCID will cause a #GP
than a memory write (e.g. cacheline evictions, interrupts) - the exact set exception if the provided memory address is non-canonical. This
of events that can cause the wait to end is implementation-specific. discrepancy has been known to cause security issues.[59]
w. The PREFETCH and PREFETCHW instructions are mandatory parts of the
Regardless of whether the wait was ended by a memory write or some
3DNow! instruction set extension, but are also available as a standalone
other event, monitoring will have ended and it will be necessary to set
extension on systems that do not support 3DNow!
up monitoring again with MONITOR before performing any further
MWAITs. x. The opcodes for PREFETCH and PREFETCHW (0F 0D /r) execute as
NOPs on Intel CPUs from Cedar Mill (65nm Pentium 4) onwards, with
PREFETCHW gaining prefetch functionality from Broadwell onwards.
o. The leaf functions defined for GETSEC (selected by EAX) are:
y. The PREFETCH (0F 0D /0) instruction is a 3DNow! instruction, present
EAX Function on all processors with 3DNow! but not necessarily on processors with
the PREFETCHW extension.
0 (CAPABILITIES) Report SMX capabilities
On AMD CPUs with PREFETCHW, opcode 0F 0D /0 as well as
2 (ENTERACCES) Enter execution of authenticated code module
opcodes 0F 0D /2../7 are all documented to be performing prefetch.
3 (ENTERACCES) Exit execution of authenticated code module
On Intel processors with PREFETCHW, these opcodes are documented
4 (SENTER) Enter measured environment as performing reserved-NOPs[60] (except 0F 0D /2 being
5 (SENTER) Exit measured environment PREFETCHWT1 m8 on Xeon Phi only) - third party testing[61] indicates that
some or all of these opcodes may be performing prefetch on at least
6 (PARAMETERS) Report SMX parameters some Intel Core CPUs.
7 (SMCTRL) SMX Mode Control
z. Unlike the older RDTSCP instruction which can also be used to read the
8 (WAKEUP) Wake up sleeping processors in measured environment processor ID, user-mode RDPID is not disabled by CR4.TSD=1.
aa. The WBNOINVD instruction will execute as WBINVD if run on a system that
Any other value in EAX causes an #UD exception. doesn't support the WBNOINVD extension.
p. XSAVE was added in steppings E0/R0 of Penryn and is not available in WBINVD differs from WBNOINVD in that WBINVD will invalidate all cache
earlier steppings. lines after writeback.
q. The "core ID" value read by RDTSCP and RDPID is actually the TSC_AUX
MSR (MSR C000_0103h). Whether this value actually corresponds to a
processor ID is a matter of operating system convention.
Added with other Intel-specific extensions
Instruction
mnemonics
SGX Perform an SGX Supervisor function. The function SGX1

ENCLS NP 0F 01 CF 0
Software Guard Extensions. to perform is given in EAX.[a] Skylake,[b]
Goldmont Plus
Set up an encrypted SGX2
enclave in which a guest Perform an SGX User function. The function to Goldmont Plus,
can execute code that a ENCLU NP 0F 01 D7 3[d] Ice Lake-SP[63]
perform is given in EAX.[c]
compromised or malicious
host cannot inspect or
tamper with. ENCLV NP 0F 01 C0 Perform an SGX virtualization function. The 0 Ice Lake-SP,
function to perform is given in EAX.[e] Tremont
PTWRITE
PTWRITE r/m32 F3 0F AE /4 Read data from register or memory to encode into Kaby Lake,
Write data to a Processor Trace 3
Packet.
PTWRITE r/m64 F3 REX.W 0F AE /4 a PTW packet.[f] Goldmont Plus
Store to memory using Direct Store (memory store

MOVDIRI MOVDIRI m32,r32 NP 0F 38 F9 /r Ice Lake,
that is not cached or write-combined with other 3
Move to memory as Direct Store. MOVDIRI m64,r64 NP REX.W 0F 38 F9 /r Tremont
stores).
Move 64 bytes of data from m512 to address

MOVDIR64B Ice Lake,
MOVDIR64B reg,m512 66 0F 38 F8 /r given by ES:reg. The 64-byte write is done 3
Move 64 bytes as Direct Store. Tremont
atomically with Direct Store.[g]
Perform a platform feature configuration function.

PCONFIG
Platform Configuration, including The function to perform is specified in
PCONFIG NP 0F 01 C5 0 Ice Lake-SP
MKTME ("Multi-Key Total Memory
Encryption"). EAX.[h]
Move cache line containing m8 from CPU L1

CLDEMOTE (Tremont),
CLDEMOTE m8 NP 0F 1C /0 cache to a more distant level of the cache 3
Cache Line Demotion Hint. (Alder Lake)[j]
hierarchy.[i]
Wait until the Time Stamp Counter reaches the

value specified in EDX:EAX.
TPAUSE r32 The register argument to the instruction

66 0F AE /6
TPAUSE r32,EDX,EAX
specifies extra flags to control the
operation of the instruction.
WAITPKG
Start monitoring a memory location for memory Tremont,
User-mode memory monitoring 3[k] Alder Lake
and waiting. UMONITOR r16/32/64 F3 0F AE /6 writes. The memory address to monitor is given by
the register argument.[l]
Timed wait for a write to a monitored memory
location previously specified with UMONITOR. In the
UMWAIT r32 absence of a memory write, the wait will end when
F2 0F AE /6 either the TSC reaches the value specified by
UMWAIT r32,EAX,EDX
EDX:EAX or the wait has been going on for an
OS-controlled maximum amount of time.[m]
SERIALIZE
Instruction Execution SERIALIZE NP 0F 01 E8 Serialize instruction fetch and execution.[n] 3 Alder Lake
Serialization.
Request that the processor reset selected

components of hardware-maintained prediction
HRESET
HRESET imm8 F3 0F 3A F0 C0 ib history. A bitmap of which components of the 0 Alder Lake
Processor History Reset.
CPU's prediction history to reset is given in EAX
(the imm8 argument is ignored).
Send Interprocessor User Interrupt. [o]

SENDUIPI reg F3 0F C7 /6
UIRET F3 0F 01 EC User Interrupt Return

UINTR
Test User Interrupt Flag.
User Interprocessor interrupt. 3 Sapphire Rapids
Available in 64-bit mode only. TESTUI F3 0F 01 ED
Copies UIF to EFLAGS.CF .
CLUI F3 0F 01 EE Clear User Interrupt Flag

STUI F3 0F 01 EF Set User Interrupt Flag
Enqueue Command. Reads a 64-byte "command

data" structure from memory (m512 argument)
and writes atomically to a memory-mapped
Enqueue Store device (register argument provides
the memory address of this device, using ES
ENQCMD r32/64,m512 F2 0F 38 F8 /r 3
segment and requiring 64-byte alignment.) Sets
ENQCMD ZF=0 to indicate that device accepted the
command, or ZF=1 to indicate that command was Sapphire Rapids
Enqueue Store.
not accepted (e.g. queue full or the memory
location was not an Enqueue Store device.)
Enqueue Command Supervisor. Differs from
ENQCMD in that it can place an arbitrary PASID
ENQCMDS r32/64,m512 F3 0F 38 F8 /r 0
(process address-space identifier) and a privilege-
bit in the "command data" to enqueue.
a. The leaf functions defined for ENCLS (selected by EAX) are: e. The leaf functions defined for ENCLV (selected by EAX) are:
EAX Function EAX Function
0 (ECREATE) Create an enclave 0 (EDECVIRTCHILD) Decrement VIRTCHILDCNT in SECS
1 (EADD) Add a page 1 (EINCVIRTCHILD) Increment VIRTCHILDCNT in SECS
2 (EINIT) Initialize an enclave 2 (ESETCONTEXT) Set ENCLAVECONTEXT field in SECS
3 (EREMOVE) Remove a page from EPC

Any unsupported value in EAX causes a #GP exception.
4 (EDBGRD) Read data by debugger
f. For PTWRITE, the write to the Processor Trace Packet will only happen if
5 (EDBGWR) Write data by debugger
a set of enable-bits (the "TriggerEn", "ContextEn", "FilterEn" bits of the
6 (EEXTEND) Extend EPC page measurement RTIT_STATUS MSR and the "PTWEn" bit of the RTIT_CTL MSR) are all
set to 1.
7 (ELDB) Load an EPC page as blocked
The PTWRITE instruction is indicated in the SDM to cause an #UD
8 (ELDU) Load an EPC page as unblocked
exception if the 66h instruction prefix is used, regardless of other
9 (EBLOCK) Block an EPC page prefixes.
A (EPA) Add version array

g. For MOVDIR64, the destination address given by ES:reg must be 64-byte
B (EWB) Writeback/invalidate EPC page aligned.
C (ETRACK) Activate EBLOCK checks The operand size for the register argument is given by the address size,
which may be overridden by the 67h prefix.
Added with SGX2
D (EAUG) Add page to initialized enclave The 64-byte memory source argument does not need to be 64-byte
aligned, and is not guaranteed to be read atomically.
E (EMODPTR) Restrict permissions of EPC page
h. The leaf functions defined for PCONFIG (selected by EAX) are:
F (EMODT) Change type of EPC page
10 (ERDINFO) Read EPC page type/status info EAX Function
11 (ETRACKC) Activate EBLOCK checks Program key and encryption mode to use with an
0 (MKTME_KEY_PROGRAM)
MKTME Key ID.
12 (ELDBC) Load EPC page as blocked with enhanced error reporting
13 (ELDUC) Load EPC page as unblocked with enhanced error reporting Any other value in EAX causes a #GP(0) exception.
Any unsupported value in EAX causes a #GP exception. i. For CLDEMOTE, the cache level that it will demote a cache line to is
implementation-dependent.
b. SGX is deprecated on desktop/laptop processors from 11th generation
Since the instruction is considered a hint, it will execute as a NOP
(Rocket Lake, Tiger Lake) onwards, but continues to be available on
without any exceptions if the provided memory address is invalid or not
Xeon-branded server parts.[62]
in the L1 cache. It may also execute as a NOP under other
c. The leaf functions defined for ENCLU (selected by EAX) are: implementation-dependent circumstances as well.
EAX Function On systems that do not support the CLDEMOTE extension, it executes
as a NOP.
0 (EREPORT) Create a cryptographic report
1 (EGETKEY) Create a cryptographic key j. As of May 2022, no Tremont or Alder Lake models have been observed
to have the CPUID feature bit for CLDEMOTE set, while several of them
2 (EENTER) Enter an Enclave
have the CPUID bit cleared.[64]
3 (ERESUME) Re-enter an Enclave k. TPAUSE and UMWAIT can be run outside Ring 0 only if CR4.TSD=0.
4 (EEXIT) Exit an Enclave l. For UMONITOR, the operand size of the address argument is given by the
address size, which may be overridden by the 67h prefix. The default
Added with SGX2 segment used is DS:, which can be overridden with a segment prefix.
5 (EACCEPT) Accept changes to EPC page
m. For UMWAIT, the operating system can use the IA32_UMWAIT_CONTROL
MSR to limit the maximum amount of time that a single UMWAIT
6 (EMODPE) Extend EPC page permissions invocation is permitted to wait. The UMWAIT instruction will set
RFLAGS.CF to 1 if it reached the IA32_UMWAIT_CONTROL-defined time
7 (EACCEPTCOPY) Initialize pending page
limit and 0 otherwise.
Added with TDX n. While serialization can be performed with older instructions such as e.g.
CPUID and IRET, these instructions perform additional functions,
8 (EVERIFYREPORT2) Verify a cryptographic report of a trust domain
causing side-effects and reduced performance when stand-alone
instruction serialization is needed. (CPUID additionally has the issue that
Any unsupported value in EAX causes a #GP exception. it causes a mandatory #VMEXIT when executed under virtualization,
which causes a very large overhead.) The SERIALIZE instruction
d. ENCLU can only be executed in ring 3, not rings 0/1/2. performs serialization only, avoiding these added costs.
o. The register argument to SENDUIPI is an index to pick an entry from the
UITT (User-Interrupt Target Table, a table specified by the new
UINTR_TT and UINT_MISC MSRs.)
Added with other AMD-specific extensions
Instruction
mnemonics
AltMovCr8 MOV reg,CR8 F0 0F 20 /0[b] Read the CR8 register.

Alternative mechanism
to access the CR8 0 K8[c]
control register.[a] MOV CR8,reg F0 0F 22 /0[b] Write to the CR8 register.
Start monitoring a memory location for memory writes. Similar to older MONITOR, except
MONITORX NP 0F 01 FA
available in user mode.
MONITORX Wait for a write to a monitored memory location previously specified with MONITORX.
Monitor a memory
3 Excavator
location for writes in MWAITX differs from the older MWAIT instruction mainly in that it runs in user
user mode. MWAITX NP 0F 01 FB
mode and that it can accept an optional timeout argument (given in TSC
time units) in EBX (enabled by setting bit[1] of ECX to 1.)
CLZERO Write zeroes to all bytes in a memory region that has the size and alignment of a CPU
CLZERO rAX NP 0F 01 FC 3 Zen 1
Zero out full cache line. cache line and contains the byte addressed by DS:rAX.[d]
Read selected MSRs (mainly performance counters) in user mode. ECX specifies which
RDPRU register to read.[e]
Read processor register RDPRU NP 0F 01 FD 3[f] Zen 2
in user mode. The value of the MSR is returned in EDX:EAX.
Ensure that all preceding stores in thread have been committed to memory, and that any
errors encountered by these stores have been signalled to any associated error logging
MCOMMIT resources. The set of errors that can be reported and the logging mechanism are platform-
Commit Stores To MCOMMIT F3 0F 01 FA specific. 3 Zen 2
Memory.
Sets EFLAGS.CF to 0 if any errors occurred, 1 otherwise.
Invalidate TLB Entries for a range of pages, with broadcast. The invalidation is performed
on the processor executing the instruction, and also broadcast to all other processors in the
system.
INVLPGB NP 0F 01 FE rAX takes the virtual address to invalidate and some additional flags, ECX
takes the number of pages to invalidate, and EDX specifies ASID and PCID
INVLPGB to perform TLB invalidation for.
Invalidate TLB Entries 0 Zen 3
with broadcast.
Synchronize TLB invalidations.
Wait until all TLB invalidations signalled by preceding invocations of the

TLBSYNC NP 0F 01 FF
INVLPGB instruction on the same logical processor have been responded to
by all processors in the system. Instruction is serializing.
a. The standard way to access the CR8 register is to use an encoding that makes use of the REX.R prefix, e.g. 44 0F 20 07 (MOV RDI,CR8). However,
the REX.R prefix is only available in 64-bit mode.
The AltMovCr8 extension adds an additional method to access CR8, using the F0 (LOCK) prefix instead of REX.R - this provides access to CR8 outside
64-bit mode.
b. Like other variants of MOV to/from the CRx registers, the AltMovCr8 encodings ignore the top 2 bits of the instruction's ModR/M byte, and always
execute as if these two bits are set to 11b.
The AltMovCr8 encodings are available in 64-bit mode. However, using both the LOCK prefix and the REX.R is not permitted and will cause an #UD
exception.
c. Support for AltMovCR8 was added in stepping F of the AMD K8, and is not available on earlier steppings.
d. For CLZERO, the address size and 67h prefix control whether to use AX, EAX or RAX as address. The default segment DS: can be overridden by a
segment-override prefix. The provided address does not need to be aligned - hardware will align it as necessary.
The CLZERO instruction is intended for recovery from otherwise-fatal Machine Check errors. It is non-cacheable, cannot be used to allocate a cache
line without a memory access, and should not be used for fast memory clears.[65]
e. The register numbering used by RDPRU does not necessarily match that of RDMSR/WRMSR.
The registers supported by RDPRU as of December 2022 are:
ECX Register
0 MPERF (MSR 0E7h: Maximum Performance Frequency Clock Count)
1 APERF (MSR 0E8h: Actual Performance Frequency Clock Count)
Unsupported values in ECX return 0.
f. If CR4.TSD=1, then the RDPRU instruction can only run in ring 0.
x87 floating-point instructions

The x87 coprocessor, if present, provides support for floating-point arithmetic. The coprocessor provides eight data registers, each holding one 80-bit
floating-point value (1 sign bit, 15 exponent bits, 64 mantissa bits) - these registers are organized as a stack, with the top-of-stack register referred to as
"st" or "st(0)", and the other registers referred to as st(1),st(2),...st(7). It additionally provides a number of control and status registers, including "PC"
(precision control, to control whether floating-point operations should be rounded to 24, 53 or 64 mantissa bits) and "RC" (rounding control, to pick
rounding-mode: round-to-zero, round-to-positive-infinity, round-to-negative-infinity, round-to-nearest-even) and a 4-bit condition code register "CC",
whose four bits are individually referred to as C0,C1,C2 and C3). Not all of the arithmetic instructions provided by x87 obey PC and RC.
Original 8087 instructions
Instruction description Mnemonic Opcode Additional items
Waiting
x87 Non-Waiting[a] FPU Control Instructions
mnemonic[b]
Initialize x87 FPU FNINIT DB E3 FINIT
Load x87 Control Word FLDCW m16 D9 /5 (none)
Store x87 Control Word FNSTCW m16 D9 /7 FSTCW

Store x87 Status Word FNSTSW m16 DD /7 FSTSW
Clear x87 Exception Flags FNCLEX DB E2 FCLEX
Load x87 FPU Environment FLDENV m112/m224[c] D9 /4 (none)
Store x87 FPU Environment [c] D9 /6 FSTENV

FNSTENV m112/m224
Save x87 FPU State, then initialize x87 FPU [c] DD /6 FSAVE
FNSAVE m752/m864
Restore x87 FPU State FRSTOR m752/m864[c] DD /4 (none)

[d] FNENI DB E0 FENI
Enable Interrupts (8087 only)
Disable Interrupts (8087 only)[d] FNDISI DB E1 FDISI
precision rounding
x87 Floating-point Load/Store/Move Instructions
control control
FLD m32 D9 /0
FLD m64 DD /0
Load floating-point value onto stack No —
FLD m80 DB /5
FLD st(i) D9 C0+i
FST m32 D9 /2
No Yes
Store top-of-stack floating-point value to memory or stack register FST m64 DD /2
FST st(i)[e] DD D0+i No —
FSTP m32 D9 /3
No Yes
FSTP m64 DD /3
FSTP m80[e] DB /7
Store top-of-stack floating-point value to memory or stack register, then pop
DD D8+i
No —
FSTP st(i)[e][f] DF D0+i[g]
DF D8+i[g]
Push +0.0 onto stack FLDZ D9 EE
No —
Push +1.0 onto stack FLD1 D9 E8
Push π (approximately 3.14159) onto stack FLDPI D9 EB
Push (approximately 3.32193) onto stack FLDL2T D9 E9
Push (approximately 1.44269) onto stack FLDL2E D9 EA No 387[h]

Push (approximately 0.30103) onto stack FLDLG2 D9 EC
Push (approximately 0.69315) onto stack FLDLN2 D9 ED
D9 C8+i
Exchange top-of-stack register with other stack register FXCH st(i) [i][j] DD C8+i[g] No —
DF C8+i[g]
precision rounding
x87 Integer Load/Store Instructions
control control
FILD m16 DF /0
Load signed integer value onto stack from memory, with conversion to floating-point FILD m32 DB /0 No —
FILD m64 DF /5
FIST m16 DF /2
Store top-of-stack value to memory, with conversion to signed integer No Yes
FIST m32 DB /2
FISTP m16 DF /3
Store top-of-stack value to memory, with conversion to signed integer, then pop stack FISTP m32 DB /3 No Yes
FISTP m64 DF /7
Load 18-digit Binary-Coded-Decimal integer value onto stack from memory, with conversion to floating-point FBLD m80[k] DF /4 No —
Store top-of-stack value to memory, with conversion to 18-digit Binary-Coded-Decimal integer, then pop stack FBSTP m80 DF /6 No 387[h]
precision rounding
x87 Basic Arithmetic Instructions
control control
FADD m32 D8 /0
Floating-point add
FADD m64 DC /0
Yes Yes
dst <- dst + src FADD st,st(i) D8 C0+i
FADD st(i),st DC C0+i

FMUL m32 D8 /1
Floating-point multiply
FMUL m64 DC /1
Yes Yes
dst <- dst * src FMUL st,st(i) D8 C8+i
FMUL st(i),st DC C8+i
FSUB m32 D8 /4
Floating-point subtract
FSUB m64 DC /4
Yes Yes
dst <- dst - src FSUB st,st(i) D8 E0+i
FSUB st(i),st DC E8+i
FSUBR m32 D8 /5
Floating-point reverse subtract
FSUBR m64 DC /5
Yes Yes
dst <- src - dst FSUBR st,st(i) D8 E8+i
FSUBR st(i),st DC E0+i

FDIV m32 D8 /6
Floating-point divide[l] FDIV m64 DC /6
Yes Yes
dst <- dst / src FDIV st,st(i) D8 F0+i
FDIV st(i),st DC F8+i
FDIVR m32 D8 /7
Floating-point reverse divide
FDIVR m64 DC /7
Yes Yes
dst <- src / dst FDIVR st,st(i) D8 F8+i
FDIVR st(i),st DC F0+i
Floating-point compare FCOM m32 D8 /2
FCOM m64 DC /2
CC <- result_of( st(0) - src )
No —
Same operation as subtract, except that it updates the x87 CC status register instead of any of the FPU D8 D0+i
stack registers FCOM st(i)[i]
DC D0+i[g]
precision rounding
x87 Basic Arithmetic Instructions with Stack Pop
control control
Floating-point add and pop FADDP st(i),st[i] DE C0+i Yes Yes
Floating-point multiply and pop FMULP st(i),st[i] DE C8+i Yes Yes
Floating-point subtract and pop [i] DE E8+i Yes Yes

FSUBP st(i),st
Floating-point reverse-subtract and pop FSUBRP st(i),st[i] DE E0+i Yes Yes
Floating-point divide and pop FDIVP st(i),st[i] DE F8+i Yes Yes
Floating-point reverse-divide and pop [i] DE F0+i Yes Yes

FDIVRP st(i),st
FCOMP m32 D8 /3
FCOMP m64 DC /3
Floating-point compare and pop D8 D8+i No —
FCOMP st(i) [i] DC D8+i[g]
DE D0+i[g]
Floating-point compare to st(1), then pop twice FCOMPP DE D9 No —
precision rounding
x87 Basic Arithmetic Instructions with Integer Source Argument
control control
FIADD m16 DA /0
Floating-point add by integer Yes Yes
FIADD m32 DE /0
FIMUL m16 DA /1
Floating-point multiply by integer Yes Yes
FIMUL m32 DE /1
FISUB m16 DA /4
Floating-point subtract by integer Yes Yes
FISUB m32 DE /4
Floating-point reverse-subtract by integer FISUBR m16 DA /5 Yes Yes
FISUBR m32 DE /5
FIDIV m16 DA /6
Floating-point divide by integer Yes Yes
FIDIV m32 DE /6
FIDIVR m16 DA /7
Floating-point reverse-divide by integer Yes Yes
FIDIVR m32 DE /7
FICOM m16 DA /2
Floating-point compare to integer No —
FICOM m32 DE /2
FICOMP m16 DA /3
Floating-point compare to integer, and stack pop No —
FICOMP m32 DE /3
precision rounding
x87 Additional Arithmetic Instructions
control control
Floating-point change sign FCHS D9 E0 No —
Floating-point absolute value FABS D9 E1 No —
Floating-point compare top-of-stack value to 0 FTST D9 E4 No —

Classify top-of-stack st(0) register value.
FXAM D9 E5 No —
The classification result is stored in the x87 CC register.[m]
Split the st(0) value into two values E and M representing the exponent and mantissa of st(0).
The split is done such that , where E is an integer and M is a number whose absolute value is
FXTRACT D9 F4 No —
within the range . [n]
st(0) is then replaced with E, after which M is pushed onto the stack.
Floating-point partial[o] remainder (not IEEE 754 compliant):
FPREM D9 F8 No —[p]
Floating-point square root FSQRT D9 FA Yes Yes
Floating-point round to integer FRNDINT D9 FC No Yes
Floating-point power-of-2 scaling. Rounds the value of st(1) to integer with round-to-zero, then uses it as a scale
factor for st(0):[q]
FSCALE D9 FD No Yes[r]
Source operand
x87 Transcendental Instructions[s] range restriction
Base-2 exponential minus 1, with extra precision for st(0) close to 0:

8087:
F2XM1 D9 F0
80387:
Base-2 Logarithm:
FYL2X[t] D9 F1 no restrictions
followed by stack pop
Partial Tangent: Computes from st(0) a pair of values X and Y, such that
8087:
FPTAN D9 F2
80387:
The Y value replaces the top-of-stack value, and then X is pushed onto the stack.
On 80387 and later x87, but not original 8087, X is always 1.0
Two-argument arctangent with quadrant adjustment:[u]
8087:
FPATAN D9 F3
80387: no restrictions
Base-2 Logarithm plus 1, with extra precision for st(0) close to 0: Intel:
FYL2XP1[t] D9 F9 AMD:
Other x87 Instructions

[v] FNOP D9 D0
No operation
Decrement x87 FPU Register Stack Pointer FDECSTP D9 F6
Increment x87 FPU Register Stack Pointer FINCSTP D9 F7
Free x87 FPU Register FFREE st(i) DD C0+i

WAIT,
Check and handle pending unmasked x87 FPU exceptions 9B
FWAIT
Floating-point store and pop, without stack underflow exception FSTPNCE st(i) D9 D8+i[g]
Free x87 register, then stack pop FFREEP st(i) DF C0+i[g]
a. x87 coprocessors (other than the 8087) handle exceptions in a fairly i. For the FADDP, FSUBP, FSUBRP, FMULP, FDIVP, FDIVRP, FCOM, FCOMP and
unusual way. When an x87 instruction generates an unmasked FXCH instructions, x86 assemblers/disassemblers may recognize
arithmetic exception, it will still complete without causing a CPU fault - variants of the instructions with no arguments. Such variants are
instead of causing a fault, it will record within the coprocessor equivalent to variants using st(1) as their first argument.
information needed to handle the exception (instruction pointer, opcode, j. On Intel Pentium and later processors, FXCH is implemented as a
data pointer if the instruction had a memory operand) and set FPU register renaming rather than a true data move. This has no semantic
status-word flag to indicate that a pending exception is present. This effect, but enables zero-cycle-latency operation. It also allows the
pending exception will then cause a CPU fault when the next x87, MMX instruction to break data dependencies for the x87 top-of-stack value,
or WAIT instruction is executed. improving attainable performance for code optimized for these
The exception to this is x87's "Non-Waiting" instructions, which will processors.
execute without causing such a fault even if a pending exception is
k. The result of executing the FBLD instruction on non-BCD data is
present (with some caveats, see application note AP-578[66]). These undefined.
instructions are mostly control instructions that can inspect and/or
l. On early Intel Pentium processors, floating-point divide was subject to
modify the pending-exception state of the x87 FPU.
the Pentium FDIV bug. This also affected instructions that perform
b. For each non-waiting x87 instruction whose mnemonic begins with FN,
divide as part of their operations, such as FPREM and FPATAN.[75]
there exists a pseudo-instruction that has the same mnemonic except
without the N. These pseudo-instructions consist of a WAIT instruction m. The FXAM instruction will set C0, C2 and C3 based on value type in st(0)
(opcode 9B) followed by the corresponding non-waiting x87 instruction. as follows:
For example:
C3 C2 C0 Classification
FNCLEX is an instruction with the opcode DB E2. The corresponding
0 0 0 Unsupported (unnormal or pseudo-NaN)
pseudo-instruction FCLEX is then encoded as 9B DB E2.
FNSAVE ES:[BX+6] is an instruction with the opcode 26 DD 77 06. 0 0 1 NaN
The corresponding pseudo-instruction FSAVE ES:[BX+6] is then
encoded as 9B 26 DD 77 06 0 1 0 Normal finite number
These pseudo-instructions are commonly recognized by x86 0 1 1 Infinity

assemblers and disassemblers and treated as single instructions, even
though all x86 CPUs with x87 coprocessors execute them as a 1 0 0 Zero
sequence of two instructions.
1 0 1 Empty
c. On 80387 and later x87 FPUs, FLDENV, F(N)STENV, FRSTOR and
F(N)SAVE exist in 16-bit and 32-bit variants. The 16-bit variants will 1 1 0 Denormal number
load/store a 14-byte floating-point environment data structure to/from
1 1 1 Empty (may occur on 8087/80287 only)
memory - the 32-bit variants will load/store a 28-byte data structure
instead. (F(N)SAVE/FRSTOR will additionally load/store an additional 80
bytes of FPU data register content after the FPU environment, for a total C1 is set to the sign-bit of st(0), regardless of whether st(0) is Empty or
of 94 or 108 bytes). The choice between the 16-bit and 32-bit variants is not.
based on the CS.D bit and the presence of the 66h instruction prefix. On
8087 and 80287, only the 16-bit variants are available. n. For FXTRACT, if st(0) is zero or ±∞, then M is set equal to st(0). If st(0) is
64-bit variants of these instructions do not exist - using REX.W under zero, E is set to 0 on 8087/80287 but -∞ on 80387 and later. If st(0) is
x86-64 will cause the 32-bit variants to be used. Since these can only ±∞, then E is set to +∞.
load/store the bottom 32 bits of FIP and FDP, it is recommended to use
o. For FPREM, if the quotient Q is larger than , then the remainder
FXSAVE/FXRSTOR instead if 64-bit operation is desired.
calculation may have been done only partially - in this case, the FPREM
d. In the case of an x87 instruction producing an unmasked FPU instruction will need to be run again in order to complete the remainder
exception, the 8087 FPU will signal an IRQ some indeterminate time calculation. This is indicated by the instruction setting C2 to 1.
after the instruction was issued. This may not always be possible to If the instruction did complete the remainder calculation, it will set C2 to
handle,[67] and so the FPU offers the F(N)DISI and F(N)ENI 0 and set the three bits {C0,C3,C1} to the bottom three bits of the
instructions to set/clear the Interrupt Mask bit (bit 7) of the x87 Control quotient Q.
Word,[68] to control the interrupt.
Later x87 FPUs, from 80287 onwards, changed the FPU exception On 80387 and later, if the instruction didn't complete the remainder
mechanism to instead produce a CPU exception on the next x87 calculation, then the computed remainder Q used for argument
instruction. This made the Interrupt Mask bit unnecessary, so it was reduction will have been rounded to a multiple of 8 (or larger power-of-
removed.[69] In later Intel x87 FPUs, the F(N)ENI and F(N)DISI 2), so that the bottom 3 bits of the quotient can still be correctly retrieved
instructions were kept for backwards compatibility, executing as NOPs in a later pass that does complete the remainder calculation.
that do not modify any x87 state.
e. FST/FSTP with an 80-bit destination (m80 or st(i)) and an sNaN source p. The remainder computation done by the FPREM instruction is always
value will produce exceptions on AMD but not Intel FPUs. exact with no roundoff errors.
f. FSTP ST(0) is a commonly used idiom for popping a single register off q. For the FSCALE instruction on 8087 and 80287, st(1) is required to be in
the x87 register stack. the range . Also, its absolute value must be either 0
g. Intel x87 alias opcode. Use of this opcode is not recommended. or at least 1. If these requirements are not satisfied, the result is
undefined.
On the Intel 8087 coprocessor, several reserved opcodes would perform These restrictions were removed in the 80387.
operations behaving similarly to existing defined x87 instructions. These r. For FSCALE, rounding is only applied in the case of overflow, underflow
opcodes were documented for the 8087[70] and 80287,[71] but then or subnormal result.
omitted from later manuals until the October 2017 update of the Intel s. The x87 transcendental instructions do not obey PC or RC, but instead
SDM.[72] compute full 80-bit results. These results are not necessarily correctly
rounded (see Table-maker's dilemma) - they may have an error of up to
They are present on all known Intel x87 FPUs but unavailable on some ±1 ulp on Pentium or later, or up to ±1.5 ulps on earlier x87
older non-Intel FPUs, such as AMD Geode GX/LX, DM&P Vortex86[73] coprocessors.
and NexGen 586PF.[74] t. For the FYL2X and FYL2XP1, the maximum error bound of ±1 ulp only
holds for st(1)=1.0 - for other values of st(1), the error bound is
h. On the 8087 and 80287, FBSTP and the load-constant instructions increased to ±1.35 ulps.
always use the round-to-nearest rounding mode. On the 80387 and
later x87 FPUs, these instructions will use the rounding mode specified
in the x87 RC register.
u. For FPATAN, the following adjustments are done as compared to just v. While FNOP is a no-op in the sense that will leave the x87 FPU register
stack unmodified, it may still modify FIP and CC, and it may fault if a
computing a one-argument arctangent of the ratio : pending x87 FPU exception is present.
If both st(0) and st(1) are ±∞, then the arctangent is computed as if
each of st(0) and st(1) had been replaced with ±1 of the same sign.
This produces a result that is an odd multiple of .
If both st(0) and st(1) are ±0, then the arctangent is computed as if
st(0) but not st(1) had been replaced with ±1 of the same sign,
producing a result of ±0 or .
If st(0) is negative (has sign bit set), then an addend of with the
same sign as st(1) is added to the result.
x87 instructions added in later processors
Instruction description Mnemonic Opcode Additional items
Waiting
x87 Non-Waiting Control Instructions added in 80287
mnemonic
Notify FPU of entry into Protected Mode FNSETPM[a] DB E4 FSETPM
Store x87 Status Word to AX FNSTSW AX DF E0 FSTSW AX
Source operand
x87 Instructions added in 80387
range restriction
Floating-point unordered compare.

Similar to the regular floating-point compare instruction FCOM, except will not produce an exception in response FUCOM st(i)[b] DD E0+i
to any qNaN operands.
Floating-point unordered compare and pop FUCOMP st(i)[b] DD E8+i no restrictions
Floating-point unordered compare to st(1), then pop twice FUCOMPP DA E9
IEEE 754 compliant floating-point partial remainder.[c] FPREM1 D9 F5
Floating-point sine and cosine.

Computes two values and [d] FSINCOS D9 FB
Top-of-stack st(0) is replaced with S, after which C is pushed onto the stack.
Floating-point sine.[d]
FSIN D9 FE
Floating-point cosine.[d]
FCOS D9 FF
Condition for
x87 Instructions added in Pentium Pro
conditional moves
FCMOVB st(0),st(i) DA C0+i below (CF=1)
FCMOVE st(0),st(i) DA C8+i equal (ZF=1)

below or equal
FCMOVBE st(0),st(i) DA D0+i
(CF=1 or ZF=1)
FCMOVU st(0),st(i) DA D8+i unordered (PF=1)

Floating-point conditional move to st(0) based on EFLAGS
FCMOVNB st(0),st(i) DB C0+i not below (CF=0)
FCMOVNE st(0),st(i) DB C8+i not equal (ZF=0)
not below or equal

FCMOVNBE st(0),st(i) DB D0+i
(CF=0 and ZF=0)
FCMOVNU st(0),st(i) DB D8+i not unordered (PF=0)

Floating-point compare and set EFLAGS.
Differs from the older FCOM floating-point compare instruction in that it puts its result in the integer EFLAGS FCOMI st(0),st(i) DB F0+i
register rather than the x87 CC register.[e]
Floating-point compare and set EFLAGS, then pop FCOMIP st(0),st(i) DF F0+i
Floating-point unordered compare and set EFLAGS FUCOMI st(0),st(i) DB E8+i
Floating-point unordered compare and set EFLAGS, then pop FUCOMIP st(0),st(i) DF E8+i
64-bit mnemonic
x87 Non-Waiting Instructions added in Pentium II[f] (REX.W prefix)
Save x87, MMX and SSE state to 512-byte data structure[g][h][i] FXSAVE m512byte NP 0F AE /0 FXSAVE64 m512byte
[g][h] FXRSTOR m512byte NP 0F AE /1 FXRSTOR64 m512byte
Restore x87, MMX and SSE state from 512-byte data structure
x87 Instructions added as part of SSE3
Floating-point store integer and pop, with round-to-zero FISTTP m16 DF /1
FISTTP m32 DB /1
FISTTP m64 DD /1
a. The x87 FPU needs to know whether it is operating in Real Mode or Protected Mode because the floating-point environment accessed by the
F(N)SAVE, FRSTOR, FLDENV and F(N)STENV instructions has different formats in Real Mode and Protected Mode. On 80287, the F(N)SETPM instruction
is required to communicate the real-to-protected mode transition to the FPU. On 80387 and later x87 FPUs, real↔protected mode transitions are
communicated automatically to the FPU without the need for any dedicated instructions - therefore, on these FPUs, FNSETPM executes as a NOP that
does not modify any FPU state.
b. For the FUCOM and FUCOMP instructions, x86 assemblers/disassemblers may recognize variants of the instructions with no arguments. Such variants
are equivalent to variants using st(1) as their first argument.
c. The 80387 FPREM1 instruction differs from the older FPREM (D9 F8) instruction in that the quotient Q is rounded to integer with round-to-nearest-even
rounding rather than the round-to-zero rounding used by FPREM. Like FPREM, FPREM1 always computes an exact result with no roundoff errors. Like
FPREM, it may also perform a partial computation if the quotient is too large, in which case it must be run again.
d. Due to the x87 FPU performing argument reduction for sin/cos with only about 68 bits of precision, the value of k used in the calculation of FSIN, FCOS
and FSINCOS is not precisely 1.0, but instead given by[76][77]
This argument reduction inaccuracy also affects the FPTAN instruction.

e. The FCOMI, FCOMIP, FUCOMI and FUCOMIP instructions write their results to the ZF, CF and PF bits of the EFLAGS register. On Intel but not AMD
processors, the SF, AF and OF bits of EFLAGS are also zeroed out by these instructions.
f. The FXSAVE and FXRSTOR instructions were added in the "Deschutes" revision of Pentium II, and are not present in earlier "Klamath" revision.
g. The FXSAVE and FXRSTOR instructions will save/restore SSE state only on processors that support SSE. Otherwise, they will only save/restore x87 and
MMX state.
The x87 section of the state saved/restored by FXSAVE/FXRSTOR has a completely different layout than the data structure of the older
F(N)SAVE/FRSTOR instructions, enabling faster save/restore by avoiding misaligned loads and stores.
h. When floating-point emulation is enabled with CR0.EM=1, FXSAVE(64) and FXRSTOR(64) are considered to be x87 instructions and will accordingly
produce an #NM (device-not-available) exception. Other than WAIT, these are the only opcodes outside the D8..DF ESC opcode space that exhibit this
behavior. (All opcodes in D8..DF will produce #NM if CR0.EM=1, even for undefined opcodes that would produce #UD otherwise.
i. Unlike the older F(N)SAVE instruction, FXSAVE will not initialize the FPU after saving its state to memory, but instead leave the x87 coprocessor state
unmodified.
SIMD instructions
MMX instructions
MMX instructions operate on the mm registers, which are 64 bits wide. They are shared with the FPU registers.
Original MMX instructions
Added with Pentium MMX
EMMS 0F 77 Empty MMX Technology State Marks all x87 FPU registers for use by FPU
MOVD mm, r/m32 0F 6E /r Move doubleword

MOVD r/m32, mm 0F 7E /r Move doubleword
MOVQ mm/m64, mm 0F 7F /r Move quadword
MOVQ mm, mm/m64 0F 6F /r Move quadword

MOVQ mm, r/m64 REX.W + 0F 6E /r Move quadword
MOVQ r/m64, mm REX.W + 0F 7E /r Move quadword
PACKSSDW mm1, mm2/m64 0F 6B /r Pack doublewords to words (signed with saturation)

PACKSSWB mm1, mm2/m64 0F 63 /r Pack words to bytes (signed with saturation)
PACKUSWB mm, mm/m64 0F 67 /r Pack words to bytes (unsigned with saturation)
PADDB mm, mm/m64 0F FC /r Add packed byte integers

PADDW mm, mm/m64 0F FD /r Add packed word integers
PADDD mm, mm/m64 0F FE /r Add packed doubleword integers
PADDQ mm, mm/m64 0F D4 /r Add packed quadword integers

PADDSB mm, mm/m64 0F EC /r Add packed signed byte integers and saturate
PADDSW mm, mm/m64 0F ED /r Add packed signed word integers and saturate
PADDUSB mm, mm/m64 0F DC /r Add packed unsigned byte integers and saturate
PADDUSW mm, mm/m64 0F DD /r Add packed unsigned word integers and saturate
PAND mm, mm/m64 0F DB /r Bitwise AND

PANDN mm, mm/m64 0F DF /r Bitwise AND NOT
POR mm, mm/m64 0F EB /r Bitwise OR
PXOR mm, mm/m64 0F EF /r Bitwise XOR

PCMPEQB mm, mm/m64 0F 74 /r Compare packed bytes for equality
PCMPEQW mm, mm/m64 0F 75 /r Compare packed words for equality
PCMPEQD mm, mm/m64 0F 76 /r Compare packed doublewords for equality

PCMPGTB mm, mm/m64 0F 64 /r Compare packed signed byte integers for greater than
PCMPGTW mm, mm/m64 0F 65 /r Compare packed signed word integers for greater than
PCMPGTD mm, mm/m64 0F 66 /r Compare packed signed doubleword integers for greater than
PMADDWD mm, mm/m64 0F F5 /r Multiply packed words, add adjacent doubleword results
PMULHW mm, mm/m64 0F E5 /r Multiply packed signed word integers, store high 16 bits of results
PMULLW mm, mm/m64 0F D5 /r Multiply packed signed word integers, store low 16 bits of results
PSLLW mm1, imm8 0F 71 /6 ib Shift left words, shift in zeros
PSLLW mm, mm/m64 0F F1 /r Shift left words, shift in zeros
PSLLD mm, imm8 0F 72 /6 ib Shift left doublewords, shift in zeros

PSLLD mm, mm/m64 0F F2 /r Shift left doublewords, shift in zeros
PSLLQ mm, imm8 0F 73 /6 ib Shift left quadword, shift in zeros
PSLLQ mm, mm/m64 0F F3 /r Shift left quadword, shift in zeros

PSRAD mm, imm8 0F 72 /4 ib Shift right doublewords, shift in sign bits
PSRAD mm, mm/m64 0F E2 /r Shift right doublewords, shift in sign bits
PSRAW mm, imm8 0F 71 /4 ib Shift right words, shift in sign bits

PSRAW mm, mm/m64 0F E1 /r Shift right words, shift in sign bits
PSRLW mm, imm8 0F 71 /2 ib Shift right words, shift in zeros
PSRLW mm, mm/m64 0F D1 /r Shift right words, shift in zeros

PSRLD mm, imm8 0F 72 /2 ib Shift right doublewords, shift in zeros
PSRLD mm, mm/m64 0F D2 /r Shift right doublewords, shift in zeros
PSRLQ mm, imm8 0F 73 /2 ib Shift right quadword, shift in zeros

PSRLQ mm, mm/m64 0F D3 /r Shift right quadword, shift in zeros
PSUBB mm, mm/m64 0F F8 /r Subtract packed byte integers

PSUBW mm, mm/m64 0F F9 /r Subtract packed word integers
PSUBD mm, mm/m64 0F FA /r Subtract packed doubleword integers
PSUBSB mm, mm/m64 0F E8 /r Subtract signed packed bytes with saturation

PSUBSW mm, mm/m64 0F E9 /r Subtract signed packed words with saturation
PSUBUSB mm, mm/m64 0F D8 /r Subtract unsigned packed bytes with saturation
PSUBUSW mm, mm/m64 0F D9 /r Subtract unsigned packed words with saturation
PUNPCKHBW mm, mm/m64 0F 68 /r Unpack and interleave high-order bytes

PUNPCKHWD mm, mm/m64 0F 69 /r Unpack and interleave high-order words
PUNPCKHDQ mm, mm/m64 0F 6A /r Unpack and interleave high-order doublewords
PUNPCKLBW mm, mm/m32 0F 60 /r Unpack and interleave low-order bytes

PUNPCKLWD mm, mm/m32 0F 61 /r Unpack and interleave low-order words
PUNPCKLDQ mm, mm/m32 0F 62 /r Unpack and interleave low-order doublewords
MMX instructions added in specific processors
MMX instructions added with MMX+ and SSE
The following MMX instruction were added with SSE. They are also available on the Athlon under the name MMX+.
Instruction Opcode Meaning

MASKMOVQ mm1, mm2 0F F7 /r Masked Move of Quadword
MOVNTQ m64, mm 0F E7 /r Move Quadword Using Non-Temporal Hint
PSHUFW mm1, mm2/m64, imm8 0F 70 /r ib Shuffle Packed Words

PINSRW mm, r32/m16, imm8 0F C4 /r Insert Word
PEXTRW reg, mm, imm8 0F C5 /r Extract Word
PMOVMSKB reg, mm 0F D7 /r Move Byte Mask

PMINUB mm1, mm2/m64 0F DA /r Minimum of Packed Unsigned Byte Integers
PMAXUB mm1, mm2/m64 0F DE /r Maximum of Packed Unsigned Byte Integers
PAVGB mm1, mm2/m64 0F E0 /r Average Packed Integers

PAVGW mm1, mm2/m64 0F E3 /r Average Packed Integers
PMULHUW mm1, mm2/m64 0F E4 /r Multiply Packed Unsigned Integers and Store High Result
PMINSW mm1, mm2/m64 0F EA /r Minimum of Packed Signed Word Integers
PMAXSW mm1, mm2/m64 0F EE /r Maximum of Packed Signed Word Integers
PSADBW mm1, mm2/m64 0F F6 /r Compute Sum of Absolute Differences
MMX instructions added with SSE2
The following MMX instructions were added with SSE2:
PSUBQ mm1, mm2/m64 0F FB /r Subtract quadword integer

PMULUDQ mm1, mm2/m64 0F F4 /r Multiply unsigned doubleword integer
MMX instructions added with SSSE3

PSIGNB mm1, mm2/m64 0F 38 08 /r Negate/zero/preserve packed byte integers depending on corresponding sign
PSIGNW mm1, mm2/m64 0F 38 09 /r Negate/zero/preserve packed word integers depending on corresponding sign
PSIGND mm1, mm2/m64 0F 38 0A /r Negate/zero/preserve packed doubleword integers depending on corresponding sign
PSHUFB mm1, mm2/m64 0F 38 00 /r Shuffle bytes
PMULHRSW mm1, mm2/m64 0F 38 0B /r Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits
PMADDUBSW mm1, mm2/m64 0F 38 04 /r Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words
PHSUBW mm1, mm2/m64 0F 38 05 /r Subtract and pack 16-bit signed integers horizontally
PHSUBSW mm1, mm2/m64 0F 38 07 /r Subtract and pack 16-bit signed integer horizontally with saturation
PHSUBD mm1, mm2/m64 0F 38 06 /r Subtract and pack 32-bit signed integers horizontally
PHADDSW mm1, mm2/m64 0F 38 03 /r Add and pack 16-bit signed integers horizontally, pack saturated integers to mm1.
PHADDW mm1, mm2/m64 0F 38 01 /r Add and pack 16-bit integers horizontally

PHADDD mm1, mm2/m64 0F 38 02 /r Add and pack 32-bit integers horizontally
PALIGNR mm1, mm2/m64, imm8 0F 3A 0F /r ib Concatenate destination and source operands, extract byte-aligned result shifted to the right
PABSB mm1, mm2/m64 0F 38 1C /r Compute the absolute value of bytes and store unsigned result
PABSW mm1, mm2/m64 0F 38 1D /r Compute the absolute value of 16-bit integers and store unsigned result
PABSD mm1, mm2/m64 0F 38 1E /r Compute the absolute value of 32-bit integers and store unsigned result
SSE instructions
Added with Pentium III
SSE instructions operate on xmm registers, which are 128 bit wide.
SSE consists of the following SSE SIMD floating-point instructions:
ANDPS* xmm1, xmm2/m128 0F 54 /r Bitwise Logical AND of Packed Single-Precision Floating-Point Values
ANDNPS* xmm1, xmm2/m128 0F 55 /r Bitwise Logical AND NOT of Packed Single-Precision Floating-Point Values
ORPS* xmm1, xmm2/m128 0F 56 /r Bitwise Logical OR of Single-Precision Floating-Point Values
XORPS* xmm1, xmm2/m128 0F 57 /r Bitwise Logical XOR for Single-Precision Floating-Point Values
MOVUPS xmm1, xmm2/m128 0F 10 /r Move Unaligned Packed Single-Precision Floating-Point Values

MOVSS xmm1, xmm2/m32 F3 0F 10 /r Move Scalar Single-Precision Floating-Point Values
MOVUPS xmm2/m128, xmm1 0F 11 /r Move Unaligned Packed Single-Precision Floating-Point Values
MOVSS xmm2/m32, xmm1 F3 0F 11 /r Move Scalar Single-Precision Floating-Point Values

MOVLPS xmm, m64 0F 12 /r Move Low Packed Single-Precision Floating-Point Values
MOVHLPS xmm1, xmm2 0F 12 /r Move Packed Single-Precision Floating-Point Values High to Low
MOVLPS m64, xmm 0F 13 /r Move Low Packed Single-Precision Floating-Point Values

UNPCKLPS xmm1, xmm2/m128 0F 14 /r Unpack and Interleave Low Packed Single-Precision Floating-Point Values
UNPCKHPS xmm1, xmm2/m128 0F 15 /r Unpack and Interleave High Packed Single-Precision Floating-Point Values
MOVHPS xmm, m64 0F 16 /r Move High Packed Single-Precision Floating-Point Values

MOVLHPS xmm1, xmm2 0F 16 /r Move Packed Single-Precision Floating-Point Values Low to High
MOVHPS m64, xmm 0F 17 /r Move High Packed Single-Precision Floating-Point Values

MOVAPS xmm1, xmm2/m128 0F 28 /r Move Aligned Packed Single-Precision Floating-Point Values
MOVAPS xmm2/m128, xmm1 0F 29 /r Move Aligned Packed Single-Precision Floating-Point Values
MOVNTPS m128, xmm1 0F 2B /r Move Aligned Four Packed Single-FP Non Temporal
MOVMSKPS reg, xmm 0F 50 /r Extract Packed Single-Precision Floating-Point 4-bit Sign Mask. The upper bits of the register are filled with zeros.
CVTPI2PS xmm, mm/m64 0F 2A /r Convert Packed Dword Integers to Packed Single-Precision FP Values
CVTSI2SS xmm, r/m32 F3 0F 2A /r Convert Dword Integer to Scalar Single-Precision FP Value

CVTSI2SS xmm, r/m64 F3 REX.W 0F 2A /r Convert Qword Integer to Scalar Single-Precision FP Value
CVTTPS2PI mm, xmm/m64 0F 2C /r Convert with Truncation Packed Single-Precision FP Values to Packed Dword Integers
CVTTSS2SI r32, xmm/m32 F3 0F 2C /r Convert with Truncation Scalar Single-Precision FP Value to Dword Integer
CVTTSS2SI r64, xmm1/m32 F3 REX.W 0F 2C /r Convert with Truncation Scalar Single-Precision FP Value to Qword Integer
CVTPS2PI mm, xmm/m64 0F 2D /r Convert Packed Single-Precision FP Values to Packed Dword Integers
CVTSS2SI r32, xmm/m32 F3 0F 2D /r Convert Scalar Single-Precision FP Value to Dword Integer

CVTSS2SI r64, xmm1/m32 F3 REX.W 0F 2D /r Convert Scalar Single-Precision FP Value to Qword Integer
UCOMISS xmm1, xmm2/m32 0F 2E /r Unordered Compare Scalar Single-Precision Floating-Point Values and Set EFLAGS
COMISS xmm1, xmm2/m32 0F 2F /r Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS
SQRTPS xmm1, xmm2/m128 0F 51 /r Compute Square Roots of Packed Single-Precision Floating-Point Values
SQRTSS xmm1, xmm2/m32 F3 0F 51 /r Compute Square Root of Scalar Single-Precision Floating-Point Value
RSQRTPS xmm1, xmm2/m128 0F 52 /r Compute Reciprocal of Square Root of Packed Single-Precision Floating-Point Value
RSQRTSS xmm1, xmm2/m32 F3 0F 52 /r Compute Reciprocal of Square Root of Scalar Single-Precision Floating-Point Value
RCPPS xmm1, xmm2/m128 0F 53 /r Compute Reciprocal of Packed Single-Precision Floating-Point Values
RCPSS xmm1, xmm2/m32 F3 0F 53 /r Compute Reciprocal of Scalar Single-Precision Floating-Point Values

ADDPS xmm1, xmm2/m128 0F 58 /r Add Packed Single-Precision Floating-Point Values
ADDSS xmm1, xmm2/m32 F3 0F 58 /r Add Scalar Single-Precision Floating-Point Values
MULPS xmm1, xmm2/m128 0F 59 /r Multiply Packed Single-Precision Floating-Point Values

MULSS xmm1, xmm2/m32 F3 0F 59 /r Multiply Scalar Single-Precision Floating-Point Values
SUBPS xmm1, xmm2/m128 0F 5C /r Subtract Packed Single-Precision Floating-Point Values
SUBSS xmm1, xmm2/m32 F3 0F 5C /r Subtract Scalar Single-Precision Floating-Point Values

MINPS xmm1, xmm2/m128 0F 5D /r Return Minimum Packed Single-Precision Floating-Point Values
MINSS xmm1, xmm2/m32 F3 0F 5D /r Return Minimum Scalar Single-Precision Floating-Point Values
DIVPS xmm1, xmm2/m128 0F 5E /r Divide Packed Single-Precision Floating-Point Values

DIVSS xmm1, xmm2/m32 F3 0F 5E /r Divide Scalar Single-Precision Floating-Point Values
MAXPS xmm1, xmm2/m128 0F 5F /r Return Maximum Packed Single-Precision Floating-Point Values

MAXSS xmm1, xmm2/m32 F3 0F 5F /r Return Maximum Scalar Single-Precision Floating-Point Values
LDMXCSR m32 0F AE /2 Load MXCSR Register State
STMXCSR m32 0F AE /3 Store MXCSR Register State

CMPPS xmm1, xmm2/m128, imm8 0F C2 /r ib Compare Packed Single-Precision Floating-Point Values
CMPSS xmm1, xmm2/m32, imm8 F3 0F C2 /r ib Compare Scalar Single-Precision Floating-Point Values
SHUFPS xmm1, xmm2/m128, imm8 0F C6 /r ib Shuffle Packed Single-Precision Floating-Point Values
The floating point single bitwise operations ANDPS, ANDNPS, ORPS and XORPS produce the same result as the SSE2 integer (PAND, PANDN,
POR, PXOR) and double ones (ANDPD, ANDNPD, ORPD, XORPD), but can introduce extra latency for domain changes when applied values of the
wrong type.[78]
SSE2 instructions
Added with Pentium 4
SSE2 SIMD floating-point instructions
SSE2 data movement instructions
MOVAPD xmm1, xmm2/m128 66 0F 28 /r Move Aligned Packed Double-Precision Floating-Point Values
MOVAPD xmm2/m128, xmm1 66 0F 29 /r Move Aligned Packed Double-Precision Floating-Point Values

MOVNTPD m128, xmm1 66 0F 2B /r Store Packed Double-Precision Floating-Point Values Using Non-Temporal Hint
MOVHPD xmm1, m64 66 0F 16 /r Move High Packed Double-Precision Floating-Point Value
MOVHPD m64, xmm1 66 0F 17 /r Move High Packed Double-Precision Floating-Point Value

MOVLPD xmm1, m64 66 0F 12 /r Move Low Packed Double-Precision Floating-Point Value
MOVLPD m64, xmm1 66 0F 13/r Move Low Packed Double-Precision Floating-Point Value
MOVUPD xmm1, xmm2/m128 66 0F 10 /r Move Unaligned Packed Double-Precision Floating-Point Values

MOVUPD xmm2/m128, xmm1 66 0F 11 /r Move Unaligned Packed Double-Precision Floating-Point Values
MOVMSKPD reg, xmm 66 0F 50 /r Extract Packed Double-Precision Floating-Point Sign Mask
MOVSD* xmm1, xmm2/m64 F2 0F 10 /r Move or Merge Scalar Double-Precision Floating-Point Value

MOVSD xmm1/m64, xmm2 F2 0F 11 /r Move or Merge Scalar Double-Precision Floating-Point Value
SSE2 packed arithmetic instructions

ADDPD xmm1, xmm2/m128 66 0F 58 /r Add Packed Double-Precision Floating-Point Values
ADDSD xmm1, xmm2/m64 F2 0F 58 /r Add Low Double-Precision Floating-Point Value
DIVPD xmm1, xmm2/m128 66 0F 5E /r Divide Packed Double-Precision Floating-Point Values

DIVSD xmm1, xmm2/m64 F2 0F 5E /r Divide Scalar Double-Precision Floating-Point Value
MAXPD xmm1, xmm2/m128 66 0F 5F /r Maximum of Packed Double-Precision Floating-Point Values
MAXSD xmm1, xmm2/m64 F2 0F 5F /r Return Maximum Scalar Double-Precision Floating-Point Value

MINPD xmm1, xmm2/m128 66 0F 5D /r Minimum of Packed Double-Precision Floating-Point Values
MINSD xmm1, xmm2/m64 F2 0F 5D /r Return Minimum Scalar Double-Precision Floating-Point Value
MULPD xmm1, xmm2/m128 66 0F 59 /r Multiply Packed Double-Precision Floating-Point Values

MULSD xmm1,xmm2/m64 F2 0F 59 /r Multiply Scalar Double-Precision Floating-Point Value
SQRTPD xmm1, xmm2/m128 66 0F 51 /r Square Root of Double-Precision Floating-Point Values
SQRTSD xmm1,xmm2/m64 F2 0F 51/r Compute Square Root of Scalar Double-Precision Floating-Point Value
SUBPD xmm1, xmm2/m128 66 0F 5C /r Subtract Packed Double-Precision Floating-Point Values
SUBSD xmm1, xmm2/m64 F2 0F 5C /r Subtract Scalar Double-Precision Floating-Point Value
SSE2 logical instructions
ANDPD xmm1, xmm2/m128 66 0F 54 /r Bitwise Logical AND of Packed Double Precision Floating-Point Values
ANDNPD xmm1, xmm2/m128 66 0F 55 /r Bitwise Logical AND NOT of Packed Double Precision Floating-Point Values
ORPD xmm1, xmm2/m128 66 0F 56/r Bitwise Logical OR of Packed Double Precision Floating-Point Values
XORPD xmm1, xmm2/m128 66 0F 57/r Bitwise Logical XOR of Packed Double Precision Floating-Point Values
SSE2 compare instructions
CMPPD xmm1, xmm2/m128, imm8 66 0F C2 /r ib Compare Packed Double-Precision Floating-Point Values

CMPSD* xmm1, xmm2/m64, imm8 F2 0F C2 /r ib Compare Low Double-Precision Floating-Point Values
COMISD xmm1, xmm2/m64 66 0F 2F /r Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS
UCOMISD xmm1, xmm2/m64 66 0F 2E /r Unordered Compare Scalar Double-Precision Floating-Point Values and Set EFLAGS
SSE2 shuffle and unpack instructions
SHUFPD xmm1, xmm2/m128, imm8 66 0F C6 /r ib Packed Interleave Shuffle of Pairs of Double-Precision Floating-Point Values
UNPCKHPD xmm1, xmm2/m128 66 0F 15 /r Unpack and Interleave High Packed Double-Precision Floating-Point Values
UNPCKLPD xmm1, xmm2/m128 66 0F 14 /r Unpack and Interleave Low Packed Double-Precision Floating-Point Values
SSE2 conversion instructions
CVTDQ2PD xmm1, xmm2/m64 F3 0F E6 /r Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point Values
CVTDQ2PS xmm1, xmm2/m128 0F 5B /r Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point Values
CVTPD2DQ xmm1, xmm2/m128 F2 0F E6 /r Convert Packed Double-Precision Floating-Point Values to Packed Doubleword Integers
CVTPD2PI mm, xmm/m128 66 0F 2D /r Convert Packed Double-Precision FP Values to Packed Dword Integers
CVTPD2PS xmm1, xmm2/m128 66 0F 5A /r Convert Packed Double-Precision Floating-Point Values to Packed Single-Precision Floating-Point Values
CVTPI2PD xmm, mm/m64 66 0F 2A /r Convert Packed Dword Integers to Packed Double-Precision FP Values
CVTPS2DQ xmm1, xmm2/m128 66 0F 5B /r Convert Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values
CVTPS2PD xmm1, xmm2/m64 0F 5A /r Convert Packed Single-Precision Floating-Point Values to Packed Double-Precision Floating-Point Values
CVTSD2SI r32, xmm1/m64 F2 0F 2D /r Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
CVTSD2SI r64, xmm1/m64 F2 REX.W 0F 2D /r Convert Scalar Double-Precision Floating-Point Value to Quadword Integer With Sign Extension
CVTSD2SS xmm1, xmm2/m64 F2 0F 5A /r Convert Scalar Double-Precision Floating-Point Value to Scalar Single-Precision Floating-Point Value
CVTSI2SD xmm1, r32/m32 F2 0F 2A /r Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value
CVTSI2SD xmm1, r/m64 F2 REX.W 0F 2A /r Convert Quadword Integer to Scalar Double-Precision Floating-Point value
CVTSS2SD xmm1, xmm2/m32 F3 0F 5A /r Convert Scalar Single-Precision Floating-Point Value to Scalar Double-Precision Floating-Point Value
CVTTPD2DQ xmm1, xmm2/m128 66 0F E6 /r Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Doubleword Integers
CVTTPD2PI mm, xmm/m128 66 0F 2C /r Convert with Truncation Packed Double-Precision FP Values to Packed Dword Integers
CVTTPS2DQ xmm1, xmm2/m128 F3 0F 5B /r Convert with Truncation Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values
CVTTSD2SI r32, xmm1/m64 F2 0F 2C /r Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Dword Integer
CVTTSD2SI r64, xmm1/m64 F2 REX.W 0F 2C /r Convert with Truncation Scalar Double-Precision Floating-Point Value To Signed Qword Integer
CMPSD and MOVSD have the same name as the string instruction mnemonics CMPSD (CMPS) and MOVSD (MOVS); however, the former refer to
scalar double-precision floating-points whereas the latters refer to doubleword strings.
SSE2 SIMD integer instructions
SSE2 MMX-like instructions extended to SSE registers
SSE2 allows execution of MMX instructions on SSE registers, processing twice the amount of data at once.
MOVD xmm, r/m32 66 0F 6E /r Move doubleword
MOVD r/m32, xmm 66 0F 7E /r Move doubleword

MOVQ xmm1, xmm2/m64 F3 0F 7E /r Move quadword
MOVQ xmm2/m64, xmm1 66 0F D6 /r Move quadword
MOVQ r/m64, xmm 66 REX.W 0F 7E /r Move quadword

MOVQ xmm, r/m64 66 REX.W 0F 6E /r Move quadword
PMOVMSKB reg, xmm 66 0F D7 /r Move a byte mask, zeroing the upper bits of the register
PEXTRW reg, xmm, imm8 66 0F C5 /r ib Extract specified word and move it to reg, setting bits 15-0 and zeroing the rest
PINSRW xmm, r32/m16, imm8 66 0F C4 /r ib Move low word at the specified word position
PACKSSDW xmm1, xmm2/m128 66 0F 6B /r Converts 4 packed signed doubleword integers into 8 packed signed word integers with saturation
PACKSSWB xmm1, xmm2/m128 66 0F 63 /r Converts 8 packed signed word integers into 16 packed signed byte integers with saturation
PACKUSWB xmm1, xmm2/m128 66 0F 67 /r Converts 8 signed word integers into 16 unsigned byte integers with saturation
PADDB xmm1, xmm2/m128 66 0F FC /r Add packed byte integers
PADDW xmm1, xmm2/m128 66 0F FD /r Add packed word integers

PADDD xmm1, xmm2/m128 66 0F FE /r Add packed doubleword integers
PADDQ xmm1, xmm2/m128 66 0F D4 /r Add packed quadword integers.

PADDSB xmm1, xmm2/m128 66 0F EC /r Add packed signed byte integers with saturation
PADDSW xmm1, xmm2/m128 66 0F ED /r Add packed signed word integers with saturation
PADDUSB xmm1, xmm2/m128 66 0F DC /r Add packed unsigned byte integers with saturation
PADDUSW xmm1, xmm2/m128 66 0F DD /r Add packed unsigned word integers with saturation
PAND xmm1, xmm2/m128 66 0F DB /r Bitwise AND
PANDN xmm1, xmm2/m128 66 0F DF /r Bitwise AND NOT

POR xmm1, xmm2/m128 66 0F EB /r Bitwise OR
PXOR xmm1, xmm2/m128 66 0F EF /r Bitwise XOR
PCMPEQB xmm1, xmm2/m128 66 0F 74 /r Compare packed bytes for equality.

PCMPEQW xmm1, xmm2/m128 66 0F 75 /r Compare packed words for equality.
PCMPEQD xmm1, xmm2/m128 66 0F 76 /r Compare packed doublewords for equality.
PCMPGTB xmm1, xmm2/m128 66 0F 64 /r Compare packed signed byte integers for greater than
PCMPGTW xmm1, xmm2/m128 66 0F 65 /r Compare packed signed word integers for greater than
PCMPGTD xmm1, xmm2/m128 66 0F 66 /r Compare packed signed doubleword integers for greater than
PMULLW xmm1, xmm2/m128 66 0F D5 /r Multiply packed signed word integers with saturation
PMULHW xmm1, xmm2/m128 66 0F E5 /r Multiply the packed signed word integers, store the high 16 bits of the results
PMULHUW xmm1, xmm2/m128 66 0F E4 /r Multiply packed unsigned word integers, store the high 16 bits of the results
PMULUDQ xmm1, xmm2/m128 66 0F F4 /r Multiply packed unsigned doubleword integers

PSLLW xmm1, xmm2/m128 66 0F F1 /r Shift words left while shifting in 0s
PSLLW xmm1, imm8 66 0F 71 /6 ib Shift words left while shifting in 0s
PSLLD xmm1, xmm2/m128 66 0F F2 /r Shift doublewords left while shifting in 0s

PSLLD xmm1, imm8 66 0F 72 /6 ib Shift doublewords left while shifting in 0s
PSLLQ xmm1, xmm2/m128 66 0F F3 /r Shift quadwords left while shifting in 0s
PSLLQ xmm1, imm8 66 0F 73 /6 ib Shift quadwords left while shifting in 0s

PSRAD xmm1, xmm2/m128 66 0F E2 /r Shift doubleword right while shifting in sign bits
PSRAD xmm1, imm8 66 0F 72 /4 ib Shift doublewords right while shifting in sign bits
PSRAW xmm1, xmm2/m128 66 0F E1 /r Shift words right while shifting in sign bits
PSRAW xmm1, imm8 66 0F 71 /4 ib Shift words right while shifting in sign bits
PSRLW xmm1, xmm2/m128 66 0F D1 /r Shift words right while shifting in 0s
PSRLW xmm1, imm8 66 0F 71 /2 ib Shift words right while shifting in 0s

PSRLD xmm1, xmm2/m128 66 0F D2 /r Shift doublewords right while shifting in 0s
PSRLD xmm1, imm8 66 0F 72 /2 ib Shift doublewords right while shifting in 0s

PSRLQ xmm1, xmm2/m128 66 0F D3 /r Shift quadwords right while shifting in 0s
PSRLQ xmm1, imm8 66 0F 73 /2 ib Shift quadwords right while shifting in 0s
PSUBB xmm1, xmm2/m128 66 0F F8 /r Subtract packed byte integers

PSUBW xmm1, xmm2/m128 66 0F F9 /r Subtract packed word integers
PSUBD xmm1, xmm2/m128 66 0F FA /r Subtract packed doubleword integers
PSUBQ xmm1, xmm2/m128 66 0F FB /r Subtract packed quadword integers.
PSUBSB xmm1, xmm2/m128 66 0F E8 /r Subtract packed signed byte integers with saturation
PSUBSW xmm1, xmm2/m128 66 0F E9 /r Subtract packed signed word integers with saturation
PMADDWD xmm1, xmm2/m128 66 0F F5 /r Multiply the packed word integers, add adjacent doubleword results
PSUBUSB xmm1, xmm2/m128 66 0F D8 /r Subtract packed unsigned byte integers with saturation
PSUBUSW xmm1, xmm2/m128 66 0F D9 /r Subtract packed unsigned word integers with saturation
PUNPCKHBW xmm1, xmm2/m128 66 0F 68 /r Unpack and interleave high-order bytes
PUNPCKHWD xmm1, xmm2/m128 66 0F 69 /r Unpack and interleave high-order words

PUNPCKHDQ xmm1, xmm2/m128 66 0F 6A /r Unpack and interleave high-order doublewords
PUNPCKLBW xmm1, xmm2/m128 66 0F 60 /r Interleave low-order bytes
PUNPCKLWD xmm1, xmm2/m128 66 0F 61 /r Interleave low-order words

PUNPCKLDQ xmm1, xmm2/m128 66 0F 62 /r Interleave low-order doublewords
PAVGB xmm1, xmm2/m128 66 0F E0, /r Average packed unsigned byte integers with rounding
PAVGW xmm1, xmm2/m128 66 0F E3 /r Average packed unsigned word integers with rounding
PMINUB xmm1, xmm2/m128 66 0F DA /r Compare packed unsigned byte integers and store packed minimum values
PMINSW xmm1, xmm2/m128 66 0F EA /r Compare packed signed word integers and store packed minimum values
PMAXSW xmm1, xmm2/m128 66 0F EE /r Compare packed signed word integers and store maximum packed values
PMAXUB xmm1, xmm2/m128 66 0F DE /r Compare packed unsigned byte integers and store packed maximum values
Computes the absolute differences of the packed unsigned byte integers; the 8 low differences and 8 high differences are
PSADBW xmm1, xmm2/m128 66 0F F6 /r
then summed separately to produce two unsigned word integer results
SSE2 integer instructions for SSE registers only
The following instructions can be used only on SSE registers, since by their nature they do not work on MMX registers
MASKMOVDQU xmm1, xmm2 66 0F F7 /r Non-Temporal Store of Selected Bytes from an XMM Register into Memory
MOVDQ2Q mm, xmm F2 0F D6 /r Move low quadword from XMM to MMX register.
MOVDQA xmm1, xmm2/m128 66 0F 6F /r Move aligned double quadword
MOVDQA xmm2/m128, xmm1 66 0F 7F /r Move aligned double quadword

MOVDQU xmm1, xmm2/m128 F3 0F 6F /r Move unaligned double quadword
MOVDQU xmm2/m128, xmm1 F3 0F 7F /r Move unaligned double quadword
MOVQ2DQ xmm, mm F3 0F D6 /r Move quadword from MMX register to low quadword of XMM register
MOVNTDQ m128, xmm1 66 0F E7 /r Store Packed Integers Using Non-Temporal Hint
PSHUFHW xmm1, xmm2/m128, imm8 F3 0F 70 /r ib Shuffle packed high words.
PSHUFLW xmm1, xmm2/m128, imm8 F2 0F 70 /r ib Shuffle packed low words.

PSHUFD xmm1, xmm2/m128, imm8 66 0F 70 /r ib Shuffle packed doublewords.
PSLLDQ xmm1, imm8 66 0F 73 /7 ib Packed shift left logical double quadwords.
PSRLDQ xmm1, imm8 66 0F 73 /3 ib Packed shift right logical double quadwords.

PUNPCKHQDQ xmm1, xmm2/m128 66 0F 6D /r Unpack and interleave high-order quadwords,
PUNPCKLQDQ xmm1, xmm2/m128 66 0F 6C /r Interleave low quadwords,
SSE3 instructions
Added with Pentium 4 supporting SSE3
SSE3 SIMD floating-point instructions
ADDSUBPS xmm1, xmm2/m128 F2 0F D0 /r Add/subtract single-precision floating-point values

ADDSUBPD xmm1, xmm2/m128 66 0F D0 /r Add/subtract double-precision floating-point values
MOVDDUP xmm1, xmm2/m64 F2 0F 12 /r Move double-precision floating-point value and duplicate for Complex Arithmetic
MOVSLDUP xmm1, xmm2/m128 F3 0F 12 /r Move and duplicate even index single-precision floating-point values
MOVSHDUP xmm1, xmm2/m128 F3 0F 16 /r Move and duplicate odd index single-precision floating-point values
HADDPS xmm1, xmm2/m128 F2 0F 7C /r Horizontal add packed single-precision floating-point values
HADDPD xmm1, xmm2/m128 66 0F 7C /r Horizontal add packed double-precision floating-point values

for Graphics
HSUBPS xmm1, xmm2/m128 F2 0F 7D /r Horizontal subtract packed single-precision floating-point values
HSUBPD xmm1, xmm2/m128 66 0F 7D /r Horizontal subtract packed double-precision floating-point values
SSE3 SIMD integer instructions
LDDQU xmm1, mem F2 0F F0 /r Load unaligned data and return double quadword Instructionally equivalent to MOVDQU. For video encoding
SSSE3 instructions
Added with Xeon 5100 series and initial Core 2
The following MMX-like instructions extended to SSE registers were added with SSSE3
PSIGNB xmm1, xmm2/m128 66 0F 38 08 /r Negate/zero/preserve packed byte integers depending on corresponding sign
PSIGNW xmm1, xmm2/m128 66 0F 38 09 /r Negate/zero/preserve packed word integers depending on corresponding sign
PSIGND xmm1, xmm2/m128 66 0F 38 0A /r Negate/zero/preserve packed doubleword integers depending on corresponding
PSHUFB xmm1, xmm2/m128 66 0F 38 00 /r Shuffle bytes
PMULHRSW xmm1, xmm2/m128 66 0F 38 0B /r Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits
PMADDUBSW xmm1, xmm2/m128 66 0F 38 04 /r Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words
PHSUBW xmm1, xmm2/m128 66 0F 38 05 /r Subtract and pack 16-bit signed integers horizontally
PHSUBSW xmm1, xmm2/m128 66 0F 38 07 /r Subtract and pack 16-bit signed integer horizontally with saturation
PHSUBD xmm1, xmm2/m128 66 0F 38 06 /r Subtract and pack 32-bit signed integers horizontally
PHADDSW xmm1, xmm2/m128 66 0F 38 03 /r Add and pack 16-bit signed integers horizontally with saturation
PHADDW xmm1, xmm2/m128 66 0F 38 01 /r Add and pack 16-bit integers horizontally

PHADDD xmm1, xmm2/m128 66 0F 38 02 /r Add and pack 32-bit integers horizontally
PALIGNR xmm1, xmm2/m128, imm8 66 0F 3A 0F /r ib Concatenate destination and source operands, extract byte-aligned result shifted to the right
PABSB xmm1, xmm2/m128 66 0F 38 1C /r Compute the absolute value of bytes and store unsigned result
PABSW xmm1, xmm2/m128 66 0F 38 1D /r Compute the absolute value of 16-bit integers and store unsigned result
PABSD xmm1, xmm2/m128 66 0F 38 1E /r Compute the absolute value of 32-bit integers and store unsigned result
SSE4 instructions
SSE4.1
Added with Core 2 manufactured in 45nm
SSE4.1 SIMD floating-point instructions
66 0F 3A 40 /r
DPPS xmm1, xmm2/m128, imm8 Selectively multiply packed SP floating-point values, add and selectively store
ib
66 0F 3A 41 /r
DPPD xmm1, xmm2/m128, imm8 Selectively multiply packed DP floating-point values, add and selectively store
ib
66 0F 3A 0C /r
BLENDPS xmm1, xmm2/m128, imm8 Select packed single precision floating-point values from specified mask
ib
BLENDVPS xmm1, xmm2/m128,

66 0F 38 14 /r Select packed single precision floating-point values from specified mask
<XMM0>
66 0F 3A 0D /r
BLENDPD xmm1, xmm2/m128, imm8 Select packed DP-FP values from specified mask
ib
BLENDVPD xmm1, xmm2/m128,
66 0F 38 15 /r Select packed DP FP values from specified mask
<XMM0>
66 0F 3A 08 /r
ROUNDPS xmm1, xmm2/m128, imm8 Round packed single precision floating-point values
ib
66 0F 3A 0A /r
ROUNDSS xmm1, xmm2/m32, imm8 Round the low packed single precision floating-point value
ib
66 0F 3A 09 /r
ROUNDPD xmm1, xmm2/m128, imm8 Round packed double precision floating-point values
ib
66 0F 3A 0B /r
ROUNDSD xmm1, xmm2/m64, imm8 Round the low packed double precision floating-point value
ib
66 0F 3A 21 /r Insert a selected single-precision floating-point value at the specified destination element and zero out destination
INSERTPS xmm1, xmm2/m32, imm8
ib elements
66 0F 3A 17 /r
EXTRACTPS reg/m32, xmm1, imm8 Extract one single-precision floating-point value at specified offset and store the result (zero-extended, if applicable)
ib
SSE4.1 SIMD integer instructions
MPSADBW xmm1, xmm2/m128, imm8 66 0F 3A 42 /r ib Sums absolute 8-bit integer difference of adjacent groups of 4 byte integers with starting offset
PHMINPOSUW xmm1, xmm2/m128 66 0F 38 41 /r Find the minimum unsigned word
PMULLD xmm1, xmm2/m128 66 0F 38 40 /r Multiply the packed dword signed integers and store the low 32 bits
PMULDQ xmm1, xmm2/m128 66 0F 38 28 /r Multiply packed signed doubleword integers and store quadword result
PBLENDVB xmm1, xmm2/m128, <XMM0> 66 0F 38 10 /r Select byte values from specified mask
PBLENDW xmm1, xmm2/m128, imm8 66 0F 3A 0E /r ib Select words from specified mask
PMINSB xmm1, xmm2/m128 66 0F 38 38 /r Compare packed signed byte integers

PMINUW xmm1, xmm2/m128 66 0F 38 3A/r Compare packed unsigned word integers
PMINSD xmm1, xmm2/m128 66 0F 38 39 /r Compare packed signed dword integers
PMINUD xmm1, xmm2/m128 66 0F 38 3B /r Compare packed unsigned dword integers

PMAXSB xmm1, xmm2/m128 66 0F 38 3C /r Compare packed signed byte integers
PMAXUW xmm1, xmm2/m128 66 0F 38 3E/r Compare packed unsigned word integers
PMAXSD xmm1, xmm2/m128 66 0F 38 3D /r Compare packed signed dword integers

PMAXUD xmm1, xmm2/m128 66 0F 38 3F /r Compare packed unsigned dword integers
PINSRB xmm1, r32/m8, imm8 66 0F 3A 20 /r ib Insert a byte integer value at specified destination element
PINSRD xmm1, r/m32, imm8 66 0F 3A 22 /r ib Insert a dword integer value at specified destination element
PINSRQ xmm1, r/m64, imm8 66 REX.W 0F 3A 22 /r ib Insert a qword integer value at specified destination element
PEXTRB reg/m8, xmm2, imm8 66 0F 3A 14 /r ib Extract a byte integer value at source byte offset, upper bits are zeroed.
PEXTRW reg/m16, xmm, imm8 66 0F 3A 15 /r ib Extract word and copy to lowest 16 bits, zero-extended
PEXTRD r/m32, xmm2, imm8 66 0F 3A 16 /r ib Extract a dword integer value at source dword offset
PEXTRQ r/m64, xmm2, imm8 66 REX.W 0F 3A 16 /r ib Extract a qword integer value at source qword offset
PMOVSXBW xmm1, xmm2/m64 66 0f 38 20 /r Sign extend 8 packed 8-bit integers to 8 packed 16-bit integers
PMOVZXBW xmm1, xmm2/m64 66 0f 38 30 /r Zero extend 8 packed 8-bit integers to 8 packed 16-bit integers
PMOVSXBD xmm1, xmm2/m32 66 0f 38 21 /r Sign extend 4 packed 8-bit integers to 4 packed 32-bit integers
PMOVZXBD xmm1, xmm2/m32 66 0f 38 31 /r Zero extend 4 packed 8-bit integers to 4 packed 32-bit integers
PMOVSXBQ xmm1, xmm2/m16 66 0f 38 22 /r Sign extend 2 packed 8-bit integers to 2 packed 64-bit integers
PMOVZXBQ xmm1, xmm2/m16 66 0f 38 32 /r Zero extend 2 packed 8-bit integers to 2 packed 64-bit integers
PMOVSXWD xmm1, xmm2/m64 66 0f 38 23/r Sign extend 4 packed 16-bit integers to 4 packed 32-bit integers
PMOVZXWD xmm1, xmm2/m64 66 0f 38 33 /r Zero extend 4 packed 16-bit integers to 4 packed 32-bit integers
PMOVSXWQ xmm1, xmm2/m32 66 0f 38 24 /r Sign extend 2 packed 16-bit integers to 2 packed 64-bit integers
PMOVZXWQ xmm1, xmm2/m32 66 0f 38 34 /r Zero extend 2 packed 16-bit integers to 2 packed 64-bit integers
PMOVSXDQ xmm1, xmm2/m64 66 0f 38 25 /r Sign extend 2 packed 32-bit integers to 2 packed 64-bit integers
PMOVZXDQ xmm1, xmm2/m64 66 0f 38 35 /r Zero extend 2 packed 32-bit integers to 2 packed 64-bit integers
PTEST xmm1, xmm2/m128 66 0F 38 17 /r Set ZF if AND result is all 0s, set CF if AND NOT result is all 0s
PCMPEQQ xmm1, xmm2/m128 66 0F 38 29 /r Compare packed qwords for equality
PACKUSDW xmm1, xmm2/m128 66 0F 38 2B /r Convert 2 × 4 packed signed doubleword integers into 8 packed unsigned word integers with saturation
MOVNTDQA xmm1, m128 66 0F 38 2A /r Move double quadword using non-temporal hint if WC memory type
SSE4a
Added with Phenom processors
66 0F 78 /0 ib ib
EXTRQ Extract Field From Register
66 0F 79 /r
F2 0F 78 /r ib ib
INSERTQ Insert Field
F2 0F 79 /r
MOVNTSD F2 0F 2B /r Move Non-Temporal Scalar Double-Precision Floating-Point

MOVNTSS F3 0F 2B /r Move Non-Temporal Scalar Single-Precision Floating-Point
SSE4.2
Added with Nehalem processors
PCMPESTRI xmm1, xmm2/m128, imm8 66 0F 3A 61 /r imm8 Packed comparison of string data with explicit lengths, generating an index
PCMPESTRM xmm1, xmm2/m128, imm8 66 0F 3A 60 /r imm8 Packed comparison of string data with explicit lengths, generating a mask
PCMPISTRI xmm1, xmm2/m128, imm8 66 0F 3A 63 /r imm8 Packed comparison of string data with implicit lengths, generating an index
PCMPISTRM xmm1, xmm2/m128, imm8 66 0F 3A 62 /r imm8 Packed comparison of string data with implicit lengths, generating a mask
PCMPGTQ xmm1,xmm2/m128 66 0F 38 37 /r Compare packed signed qwords for greater than.
F16C
Half-precision floating-point conversion.
Instruction Meaning
Convert four half-precision floating point values in memory or the bottom half of an XMM register to four single-precision floating-point values
VCVTPH2PS xmmreg,xmmrm64
in an XMM register
Convert eight half-precision floating point values in memory or an XMM register (the bottom half of a YMM register) to eight single-precision
VCVTPH2PS ymmreg,xmmrm128
floating-point values in a YMM register
Convert four single-precision floating point values in an XMM register to half-precision floating-point values in memory or the bottom half an
VCVTPS2PH xmmrm64,xmmreg,imm8
XMM register
VCVTPS2PH xmmrm128,ymmreg,imm8 Convert eight single-precision floating point values in a YMM register to half-precision floating-point values in memory or an XMM register
FMA3
Supported in AMD processors starting with the Piledriver architecture and Intel starting with Haswell processors and Broadwell processors since 2014.
Fused multiply-add (floating-point vector multiply–accumulate) with three operands.
Instruction Meaning
VFMADD132PD
VFMADD213PD Fused Multiply-Add of Packed Double-Precision Floating-Point Values

VFMADD231PD
VFMADD132PS
VFMADD213PS Fused Multiply-Add of Packed Single-Precision Floating-Point Values

VFMADD231PS
VFMADD132SD
VFMADD213SD Fused Multiply-Add of Scalar Double-Precision Floating-Point Values

VFMADD231SD
VFMADD132SS
VFMADD213SS Fused Multiply-Add of Scalar Single-Precision Floating-Point Values

VFMADD231SS
VFMADDSUB132PD
VFMADDSUB213PD Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values

VFMADDSUB231PD
VFMADDSUB132PS
VFMADDSUB213PS Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values
VFMADDSUB231PS
VFMSUB132PD
VFMSUB213PD Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFMSUB231PD
VFMSUB132PS
VFMSUB213PS Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFMSUB231PS
VFMSUB132SD
VFMSUB213SD Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFMSUB231SD
VFMSUB132SS
VFMSUB213SS Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFMSUB231SS
VFMSUBADD132PD
VFMSUBADD213PD Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values
VFMSUBADD231PD
VFMSUBADD132PS
VFMSUBADD213PS Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values
VFMSUBADD231PS
VFNMADD132PD
VFNMADD213PD Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values
VFNMADD231PD
VFNMADD132PS
VFNMADD213PS Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values
VFNMADD231PS
VFNMADD132SD
VFNMADD213SD Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values
VFNMADD231SD
VFNMADD132SS
VFNMADD213SS Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values
VFNMADD231SS
VFNMSUB132PD
VFNMSUB213PD Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFNMSUB231PD
VFNMSUB132PS
VFNMSUB213PS Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFNMSUB231PS
VFNMSUB132SD
VFNMSUB213SD Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFNMSUB231SD
VFNMSUB132SS
VFNMSUB213SS Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFNMSUB231SS
AVX
AVX were first supported by Intel with Sandy Bridge and by AMD with Bulldozer.
Vector operations on 256 bit registers.
Instruction Description
VBROADCASTSS
VBROADCASTSD Copy a 32-bit, 64-bit or 128-bit memory operand to all elements of a XMM or YMM vector register.
VBROADCASTF128
Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is
VINSERTF128
unchanged.
VEXTRACTF128 Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread
VMASKMOVPS and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector
register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged. On the AMD Jaguar processor
architecture, this instruction with a memory source operand takes more than 300 clock cycles when the mask is zero, in which case the instruction should do
VMASKMOVPD
nothing. This appears to be a design flaw.[79]
VPERMILPS Permute In-Lane. Shuffle the 32-bit or 64-bit vector elements of one input operand. These are in-lane 256-bit instructions, meaning that they operate on all
VPERMILPD 256 bits with two separate 128-bit shuffles, so they can not shuffle across the 128-bit lanes.[80]
VPERM2F128 Shuffle the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
VZEROALL Set all YMM registers to zero and tag them as unused. Used when switching between 128-bit use and 256-bit use.
VZEROUPPER Set the upper half of all YMM registers to zero. Used when switching between 128-bit use and 256-bit use.
AVX2
Introduced in Intel's Haswell microarchitecture and AMD's Excavator.
Expansion of most vector integer SSE and AVX instructions to 256 bits
Instruction Description
VBROADCASTSS Copy a 32-bit or 64-bit register operand to all elements of a XMM or YMM vector register. These are register versions of the same instructions in AVX1. There
VBROADCASTSD is no 128-bit version however, but the same effect can be simply achieved using VINSERTF128.
VPBROADCASTB
VPBROADCASTW
Copy an 8, 16, 32 or 64-bit integer register or memory operand to all elements of a XMM or YMM vector register.
VPBROADCASTD
VPBROADCASTQ
VBROADCASTI128 Copy a 128-bit memory operand to all elements of a YMM vector register.
Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is
VINSERTI128
unchanged.
VEXTRACTI128 Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
VGATHERDPD
VGATHERQPD
Gathers single or double precision floating point values using either 32 or 64-bit indices and scale.
VGATHERDPS
VGATHERQPS
VPGATHERDD
VPGATHERDQ
Gathers 32 or 64-bit integer values using either 32 or 64-bit indices and scale.
VPGATHERQD
VPGATHERQQ
VPMASKMOVD Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread
and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector
VPMASKMOVQ register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged.
VPERMPS
Shuffle the eight 32-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with a register or memory operand as selector.
VPERMD
VPERMPD
Shuffle the four 64-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with a register or memory operand as selector.
VPERMQ
VPERM2I128 Shuffle (two of) the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
VPBLENDD Doubleword immediate version of the PBLEND instructions from SSE4.
VPSLLVD
Shift left logical. Allows variable shifts where each element is shifted according to the packed input.
VPSLLVQ
VPSRLVD
Shift right logical. Allows variable shifts where each element is shifted according to the packed input.
VPSRLVQ
VPSRAVD Shift right arithmetically. Allows variable shifts where each element is shifted according to the packed input.
AVX-512
AVX-512, introduced in 2014, adds 512-bit wide vector registers (extending the 256-bit registers, which become the new registers' lower halves) and
doubles their count to 32; the new registers are thus named zmm0 through zmm31. It adds eight mask registers, named k0 through k7, which may be used
to restrict operations to specific parts of a vector register. Unlike previous instruction set extensions, AVX-512 is implemented in several groups; only the
foundation ("AVX-512F") extension is mandatory.[81] Most of the added instructions may also be used with the 256- and 128-bit registers.
Cryptographic instructions
Intel AES instructions
6 new instructions.
Instruction Encoding Description
AESENC 66 0F 38 DC /r Perform one round of an AES encryption flow
AESENCLAST 66 0F 38 DD /r Perform the last round of an AES encryption flow

AESDEC 66 0F 38 DE /r Perform one round of an AES decryption flow
AESDECLAST 66 0F 38 DF /r Perform the last round of an AES decryption flow
AESKEYGENASSIST 66 0F 3A DF /r ib Assist in AES round key generation

AESIMC 66 0F 38 DB /r Assist in AES Inverse Mix Columns
CLMUL instructions
Instruction Opcode Description
PCLMULQDQ xmmreg,xmmrm,imm 66 0f 3a 44 /r ib Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2k).
PCLMULLQLQDQ xmmreg,xmmrm 66 0f 3a 44 /r 00 Multiply the low halves of the two registers.
PCLMULHQLQDQ xmmreg,xmmrm 66 0f 3a 44 /r 01 Multiply the high half of the destination register by the low half of the source register.
PCLMULLQHQDQ xmmreg,xmmrm 66 0f 3a 44 /r 10 Multiply the low half of the destination register by the high half of the source register.
PCLMULHQHQDQ xmmreg,xmmrm 66 0f 3a 44 /r 11 Multiply the high halves of the two registers.
RDRAND and RDSEED
I
RDRAND r16 E
NFx 0F C7 /6
RDRAND r32 Return a random number that has been generated with a CSPRNG (Cryptographically Secure Pseudo-Random Number Generator) P
compliant with NIST SP 800-90A (https://csrc.nist.gov/publications/detail/sp/800-90a/rev-1/final).[a] Z
K
RDRAND r64 NFx REX.W 0F C7 /6
G
RDSEED r16 B
NFx 0F C7 /7 Z
RDSEED r32 Return a random number that has been generated with a HRNG/TRNG (Hardware/"True" Random Number Generator) compliant with
K
NIST SP 800-90B (https://csrc.nist.gov/publications/detail/sp/800-90b/final) and C (https://csrc.nist.gov/publications/detail/sp/800-90c/draft).[a]
Z
RDSEED r64 NFx REX.W 0F C7 /7 G
a. The RDRAND and RDSEED instructions may fail to obtain and return a random number if the CPU's random number generators cannot keep up with the
issuing of these instructions - if this happens, then the instructions may be retried (although the number of retries should be limited, in order to ensure
forward progress). The instructions set EFLAGS.CF to 1 if a random number was successfully obtained and 0 otherwise. Failure to obtain a random
number will also set the instruction's destination register to 0.
Intel SHA instructions
7 new instructions.
SHA1RNDS4 0F 3A CC /r ib Perform Four Rounds of SHA1 Operation

SHA1NEXTE 0F 38 C8 /r Calculate SHA1 State Variable E after Four Rounds
SHA1MSG1 0F 38 C9 /r Perform an Intermediate Calculation for the Next Four SHA1 Message Dwords
SHA1MSG2 0F 38 CA /r Perform a Final Calculation for the Next Four SHA1 Message Dwords
SHA256RNDS2 0F 38 CB /r Perform Two Rounds of SHA256 Operation
SHA256MSG1 0F 38 CC /r Perform an Intermediate Calculation for the Next Four SHA256 Message Dwords
SHA256MSG2 0F 38 CD /r Perform a Final Calculation for the Next Four SHA256 Message Dwords
Intel AES Key Locker instructions
These instructions, available in Tiger Lake and later Intel processors, are designed to enable encryption/decryption with an AES key without having access
to any unencrypted copies of the key during the actual encryption/decryption process.
Instruction Encoding Description Notes
The two explicit operands (which must be register operands)

specify a 256-bit encryption key.
The implicit operand in XMM0 specifies a 128-bit integrity key.
EAX contains flags controlling operation of instruction.
Load internal wrapping key ("IWKey") from xmm1, xmm2 and
LOADIWKEY xmm1,xmm2 F3 0F 38 DC /r After being loaded, the IWKey cannot be directly read from
XMM0.
software, but is used for the key wrapping done by
ENCODEKEY128/256 and checked by the Key Locker
encode/decode instructions.
LOADIWKEY is privileged and can run in Ring 0 only.
Wrap a 128-bit AES key from XMM0 into a 384-bit key handle Source operand specifies handle restrictions to build into the
ENCODEKEY128 r32,r32 F3 0F 38 FA /r handle.
and output handle in XMM0-2.
Destination operand is initialized with information about the
source and attributes of the key.
Wrap a 256-bit AES key from XMM1:XMM0 into a 512-bit key The instruction also modifies XMM4-6 (zeroed out in existing
ENCODEKEY256 r32,32 F3 0F 3A FB /r
handle and output handle in XMM0-3. implementations, but this should not be relied on).
Encrypt xmm using 128-bit AES key indicated by handle at m384
AESENC128KL xmm,m384 F3 0F 38 DC /r
and store result in xmm.
Decrypt xmm using 128-bit AES key indicated by handle at m384

AESDEC128KL xmm,m384 F3 0F 38 DD /r
Encrypt xmm using 256-bit AES key indicated by handle at m512

AESENC256KL xmm,m512 F3 0F 38 DE /r
Decrypt xmm using 256-bit AES key indicated by handle at m512
AESDEC256KL xmm,m512 F3 0F 38 DF /r
All of the Key Locker encode/decode instructions will check
Encrypt XMM0-7 using 128-bit AES key indicated by handle at whether the handle is valid for the current IWKey
AESENCWIDE128KL m384 F3 0F 38 D8 /0 m384 and store each resultant block back to its corresponding and encode/decode data only if the handle is valid.
register. The ZF flag is used to indicate whether the provided handle
was valid (ZF=0) or not (ZF=1).
Decrypt XMM0-7 using 128-bit AES key indicated by handle at
AESDECWIDE128KL m384 F3 0F 38 D8 /1 m384 and store each resultant block back to its corresponding
register.
Encrypt XMM0-7 using 256-bit AES key indicated by handle at
AESENCWIDE256KL m512 F3 0F 38 D8 /2 m512 and store each resultant block back to its corresponding
register.
Decrypt XMM0-7 using 256-bit AES key indicated by handle at

AESDECWIDE256KL m512 F3 0F 38 D8 /3 m512 and store each resultant block back to its corresponding
register.
VIA PadLock instructions
The VIA PadLock instructions are instructions designed to apply cryptographic primitives in bulk, similar to the 8086 repeated string instructions. As
such, unless otherwise specified, they take, as applicable, pointers to source data in ES:rSI and destination data in ES:rDI, and a data-size or count in rCX.
Like the old string instructions, they are all designed to be interruptible.
Padlock subset Instruction Encoding Description Added in
XSTORE 0F A7 C0 Store random bytes to ES:[rDI], and increment ES:rDI accordingly. XSTORE will store currently-
RNG
available bytes, which may be from 0 to 8 bytes. REP XSTORE will write the number of random Nehemiah
Random Number
bytes specified by rCX, waiting for the random number generator when needed. EDX (stepping 3)
Generation. REP XSTORE F3 0F A7 C0 specifies a "quality factor".
REP XCRYPTECB F3 0F A7 C8
ACE REP XCRYPTCBC F3 0F A7 D0
Encrypt/Decrypt data, using the AES cipher in various block modes (ECB, CBC, CTR, CFB Nehemiah
Advanced Cryptography
REP XCRYPTCFB F3 0F A7 E0 and OFB, respectively). rCX contains the number of 16-byte blocks to encrypt/decrypt, rBX (stepping 8)
Engine.
contains a pointer to an encryption key, rAX a pointer to an initialization vector for block
REP XCRYPTOFB F3 0F A7 E8 modes that need it, and rDX a pointer to a control word.[a]
ACE2[b] REP XCRYPTCTR F3 0F A7 D8 C7 "Esther"[82]
REP XSHA1 F3 0F A6 C8 Compute a cryptographic hash (using the SHA-1 and SHA-256 functions, respectively).
PHE
ES:rSI points to data to compute a hash for, ES:rDI points to a message digest and rCX Esther
Hash Engine.
REP XSHA256 F3 0F A6 D0 specifies the number of bytes. rAX should be set to 0 at the start of a calculation.[c]
Perform Montgomery Multiplication. Takes an operand width in ECX (given as a number of

PMM
REP MONTMUL F3 0F A6 C0 bits - must be in range 256..32768 and divisble by 128) and pointer to a data structure in Esther
Montgomery Multiplier.
ES:ESI.[d]
Compute SM3 hash, similar to the REP XSHA* instructions. The rBX register is used to specify
CCS_HASH F3 0F A6 E8
GMI[83][84] hash function (20h for SM3 being the only documented value).
Chinese national
cryptographic Encrypt/Decrypt data, using the SM4 cipher in various block modes. rCX contains the number ZhangJiang
algorithms. (Zhaoxin of 16-byte blocks to encrypt/decrypt, rBX contains a pointer to an encryption key, rDX a
CCS_ENCRYPT F3 0F A7 F0
only.) pointer to an initialization vector for block modes that need it, and rAX contains a control
word.[e]
a. The control word for REP XCRYPT* is a 128-bit c. On VIA Nano and later processors, setting e. The CCS_ENCRYPT control word in rAX has the
data structure with the following layout: rAX to an all-1s value for the REP XSHA* following format:
instructions will enable an alternate operation
Bits Usage mode, where rCX specifies the number of 64- Bits Usage
byte blocks, and where the standard FIPS-
3:0 AES round count 0 0=Encrypt, 1=Decrypt
180-2 length extension procedure at the end
4 Digest mode enable (ACE2 only) of the hash calculation is omitted. This makes 5:1 Must be 10000b for SM4.
for a variant more suitable for data streaming
5
1=allow data that is not 16-byte aligned than the original EAX=0 variant. This 6 ECB block mode
(ACE2 only) functionality also exists for CCS_HASH.
7 CBC block mode

6 Cipher: 0=AES, 1=undefined
d. The data structure to REP MONTMUL contains 8 CFB block mode
Key schedule: 0=compute, 1=load from six 32-bit elements, where the first one is a
7 9 OFB block mode
memory negated modular inverse of the bottom 32 bits
of the modulus and the remaining 5 are 10 CTR block mode
8 0=normal, 1=intermediate-result
pointers to various memory buffers:
11 Digest enable
9 0=encrypt, 1=decrypt
Offset Data item
Key size: 00=128bit, 01=192bit, Remaining bits in rAX must be set to all-0s.
11:10 0 Negated modular inverse
10=256bit, 11=reserved
127:12 Reserved, set to 0 4 Pointer to first multiplicand
8 Pointer to second multiplicand

b. ACE2 also adds extra features to the other
REP XCRYPT instructions: a digest mode for 12 Pointer to result buffer
the CBC and CFB instructions, and the ability
16 Pointer to modulus
to use input/output data that are not 16-byte
aligned for the non-ECB instructions. 20 Pointer to 32-byte scratchpad
Other instructions
x86 also includes discontinued instruction sets which are no longer supported by Intel and AMD, and undocumented instructions which execute but are
not officially documented.
Virtualization instructions
AMD-V instructions
Instruction Opcode Meaning Notes Added in
Basic SVM instructions
VMRUN rAX[a] 0F 01 D8 Run virtual machine Performs a switch to the guest OS.
VMMCALL 0F 01 D9 Call VMM Used exclusively to communicate with VMM.

Load state From Loads a subset of processor state from the VMCB specified by the physical address in the rAX
VMLOAD rAX[a] 0F 01 DA
VMCB register.
VMSAVE rAX[a] 0F 01 DB Save state To VMCB Saves additional guest state to VMCB.
K8[b]
Set Global Interrupt Normally used by the VMM.
STGI 0F 01 DC
Flag
Clear Global Interrupt Available to the VM guest if VGIF feature is supported.

CLGI 0F 01 DD
Flag
Invalidate TLB entry
INVLPGA rAX,ECX[a] 0F 01 DF
in a specified ASID
Invalidates the TLB mapping for the virtual page specified in RAX and the ASID specified in ECX.
Secure Init and Jump Shanghai,

SKINIT EAX 0F 01 DE Verifiable startup of trusted software based on secure hash comparison
with Attestation Phenom II
Virtualization Encrypted State (SEV-ES) instructions

Explicit communication with the VMM for SEV-ES VMs.
VMGEXIT F2/F3 0F 01 D9 SEV-ES Exit to VMM Zen 1
Executed as VMMCALL if not executed by a SEV-ES guest.
Secure Nested Paging (SEV-SNP) Reverse-Map Table instructions
Expands a 2MB-page RMP entry into a corresponding set of contiguous 4KB-page RMP entries.
PSMASH F3 0F 01 FF Page Smash
The 2MB page's system physical address is specified in the RAX register.
Validates or rescinds validation of a guest page's RMP entry. The guest virtual address is specified
PVALIDATE F2 0F 01 FF Page Validate
in the register operand rAX.
Modifies RMP permissions for a guest page. The guest virtual address is specified in the RAX Zen 3
Adjust RMP
RMPADJUST F3 0F 01 FE register. The page size is specified in RCX[0]. The target VMPL and its permissions are specified in
Permissions
the RDX register.
Writes a new RMP entry. The system physical address of a page whose RMP entry is modified is
RMPUPDATE F2 0F 01 FE Write RMP Entry specified in the RAX register. The RCX register provides the effective address of a 16-byte data
structure which contains the new RMP state.
Reads an RMP permission mask for a guest page. The guest virtual address is specified in the RAX
Read RMP
RMPQUERY F3 0F 01 FD register. The target VMPL is specified in RDX[7:0]. RMP permissions for the specified VMPL are Zen 4
Permissions
returned in RDX[63:8] and the RCX register.
a. For the rAX argument to the VMRUN, VMLOAD, VMSAVE and INVLPGA instructions, the choice of AX/EAX/RAX depends on address-size, which can be
overridden with the 67h prefix.
b. Support for AMD-V was added in stepping F of the AMD K8, and is not available on earlier steppings.
Intel VT-x instructions
Instruction Opcode Description Added in
Basic VMX (Virtual Machine Extensions) instructions
VMFUNC NP 0F 01 D4 Invoke VM function specified in EAX.
VMPTRLD m64[a] NP 0F C7 /6 Load pointer to Virtual-Machine Control Structure (VMCS) from memory and mark it valid.
VMPTRST m64[a] NP 0F C7 /7 Store pointer to current VMCS to memory.
Flush VMCS data from CPU to VMCS region in memory. If the specified VMCS is the current VMCS, then the
VMCLEAR m64[a] 66 0F C7 /6
current-VMCS is marked as invalid.
Read a specified field from the current-VMCS. The reg argument specifies which field to read - the result is stored
VMREAD r/m,reg NP 0F 78 /r Prescott 2M,
to r/m.
Yonah,
Write to specified field of current-VMCS. The reg argument specifies which field to write, and the r/m argument Centerton,
VMWRITE reg,r/m NP 0F 79 /r Nano 3000
provides the data item to write to the field.
VMCALL NP 0F 01 C1 Call to VM monitor by causing a VMEXIT.

VMLAUNCH NP 0F 01 C2 Launch virtual machine managed by current VMCS.
VMRESUME NP 0F 01 C3 Resume virtual machine managed by current VMCS.
VMXOFF NP 0F 01 C4 Leave VMX Operation - stops hardware supported virtualisation environment.

[a]
VMXON m64 F3 0F C7 /6 Enter VMX Operation - enters hardware supported virtualisation environment.[b]
Extended Page Tables (EPT) instructions
INVEPT reg,m128 66 0F 38 80 /r Invalidates EPT-derived entries in the TLBs and paging-structure caches. Nehalem,
Centerton,[85]
INVVPID reg,m128 66 0F 38 81 /r Invalidates entries in the TLBs and paging-structure caches based on VPID. ZhangJiang
Trust Domain Extensions (TDX): Secure Arbitration Mode (SEAM) instructions
SEAMCALL 66 0F 01 CF Call to SEAM VMX root operation.
SEAMOPS 66 0F 01 CE Invoke SEAM specific operations.

Sapphire Rapids[86]
SEAMRET 66 0F 01 CD Return to legacy VMX root operation from SEAM VMX root operation.
TDCALL 66 0F 01 CC Call to VM monitor by causing a VMEXIT.
a. The m64 argument to VMPTRLD, VMPTRST, VMCLEAR and VMXON is a 64-bit physical address.
b. The m64 argument to VMXON is the 64-bit physical address to a "VMXON region", which is a 4Kbyte region that must be 4 Kbyte aligned. This region
may be used by the processor to support VMX operation in an implementation-dependent manner and should never be accessed by software until the
processor has left VMX operation through the VMXOFF instruction.
Undocumented instructions
Undocumented x86 instructions
The x86 CPUs contain undocumented instructions which are implemented on the chips but not listed in some official documents. They can be found in
various sources across the Internet, such as Ralf Brown's Interrupt List and at sandpile.org (https://www.sandpile.org/)
Some of these instructions are widely available across many/most x86 CPUs, while others are specific to a narrow range of CPUs.
Undocumented instructions that are widely available across many x86 CPUs include
Mnemonics Opcodes Description Status
ASCII-Adjust-after-Multiply. On the 8086, documented for imm8=0Ah only,

which is used to convert a binary multiplication result to BCD.
AAM imm8 D4 imm8 The actual operation is AH ← AL/imm8; AL ← AL mod imm8 for
any imm8 value (except zero, which produces a divide-by-zero
exception).[87] Available beginning with 8086,
documented for imm8 values other
than 0Ah since Pentium (earlier
ASCII-Adjust-Before-Division. On the 8086, documented for imm8=0Ah only,
documentation lists no arguments).
which is used to convert a BCD value to binary for a following division
instruction.
AAD imm8 D5 imm8
The actual operation is AL ← (AL+(AH*imm8)) & 0FFh; AH ← 0
for any imm8 value.
SALC,
Set AL depending on the value of the Carry Flag (a 1-byte alternative of Available beginning with 8086, but only
D6
SETALC SBB AL, AL) documented since Pentium Pro.
Available since the 8086.

F6 /1 imm8,
[88]
TEST Undocumented variants of the TEST instruction. Performs the same Unavailable on some 80486
F7 /1 imm16/32 operation as the documented F6 /0 and F7 /0 variants, respectively.
steppings.[89][90]
SHL, (D0..D3) /6,
Undocumented variants of the SHL instruction.[88] Performs the same Available since the 80186 (performs
operation as the documented (D0..D3) /4 and (C0..C1) /4 variants,
SAL (C0..C1) /6 imm8 different operation on the 8086)[91]
respectively.
Alias of opcode 80, which provides variants of 8-bit integer instructions (ADD, Available since the 8086.[92] Explicitly
(multiple) 82 /(0..7) imm8 unavailable in 64-bit mode but kept and
OR, ADC, SBB, AND, SUB, XOR, CMP) with an 8-bit immediate argument.[92]
reserved for compatibility.[93]
Available on 8086, but only
OR,AND,XOR 83 /(1,4,6) imm8 16-bit OR/AND/XOR with a sign-extended 8-bit immediate. documented from 80386
onwards.[94][95]
The behavior of the F2 prefix (REPNZ, REPNE) when used with string
REPNZ MOVS F2 (A4..A5) instructions other than CMPS/SCAS is officially undefined, but there exists
commercial software (e.g. the version of FDISK distributed with MS-DOS Available since the 8086.
REPNZ STOS F2 (AA..AB) versions 3.30 to 6.22[96]) that rely on it to behave in the same way as the
documented F3 (REP) prefix.
The use of the REP prefix with the RET instruction is not listed as supported in
either the Intel SDM or the AMD APM. However, AMD's optimization guide for
the AMD-K8 describes the F3 C3 encoding as a way to encode a two-byte
RET instruction - this is the recommended workaround for an issue in the Executes as RET on all known x86
REP RET F3 C3
AMD-K8's branch predictor that can cause branch prediction to fail for some 1- CPUs.
byte RET instructions.[97] At least some versions of gcc are known to use this
encoding.[98]
NOP with address-size override prefix. The use of the 67 prefix for instructions
without memory operands is listed by the Intel SDM (vol 2, section 2.1.1) as
NOP 67 90 Executes as NOP on 80386 and later.
"reserved", but it is used in Microsoft Windows 95 as a workaround for a bug in
the B1 stepping of Intel 80386.[99][100]
ICEBP, Available beginning with 80386,

documented (as INT1) since Pentium
F1 Single byte single-step exception / Invoke ICE Pro. Treated as undocumented
INT1 instruction prefix on 8086 and
80286.[101]
Available on Pentium Pro and AMD
Official long NOP.
K7[104] and later.
NOP r/m 0F 1F /0 Introduced in the Pentium Pro in 1995, but remained Unavailable on AMD K6, AMD
undocumented until March 2006.[29][102][103]
Geode LX, VIA Nehemiah.[105]
Reserved-NOP. Introduced in 65 nm Pentium 4. Intel documentation lists this

opcode as NOP in opcode tables but not instruction listings since June
2005.[106][107] From Broadwell onwards, 0F 0D /1 has been documented as
PREFETCHW.
Available on Intel CPUs since
NOP r/m 0F 0D /r On AMD CPUs, 0F 0D with a memory argument is documented 65 nm Pentium 4.
as PREFETCH/PREFETCHW since K6-2 - originally as part of
3dnow!, but has been kept in later AMD CPUs even after the rest
of 3dnow! was dropped.
UD1 0F B9 /r Intentionally undefined instructions, but unlike UD2 (0F 0B) these instructions All of these opcodes produce #UD
were left unpublished until December 2016.[108][109] exceptions on 80186 and later (except
Microsoft Windows 95 Setup is known to depend on 0F FF being on NEC V20/V30, which assign at least
0F FF to the BRKEM instruction.)
invalid[110][111] - it is used as a self check to test that its #UD
exception handler is working properly.
Other invalid opcodes that are being relied on by commercial

software to produce #UD exceptions include FF FF (DIF-2,[112]
UD0 0F FF LaserLok[113]) and C4 C4 ("BOP"[114][115]), however as of January
2022 they are not published as intentionally invalid opcodes.
Undocumented instructions that appear only in a limited subset of x86 CPUs include

A REP or REPNZ prefix on an IMUL instruction causes the result to be
REP IMUL F3 F6 /5, F3 F7 /5 negated. This is due to the microcode using the “REP prefix present” bit to 8086/8088 only.[116]
store the sign of the result.
A REP or REPNZ prefix on an IDIV instruction causes the quotient to be

REP IDIV F3 F6 /7, F3 F7 /7 negated. This is due to the microcode using the “REP prefix present” bit to 8086/8088 only.[116]
store the sign of the quotient.
Exact purpose unknown, causes CPU hang (HCF). The only way out is CPU
reset.[117]
In some implementations, emulated through BIOS as a halting

SAVEALL, sequence.[118]
0F 04 Only available on 80286
STOREALL In a forum post at the Vintage Computing Federation (https://foru
m.vcfed.org/index.php?threads/i-found-the-saveall-opcode.7151
9/), this instruction is explained as SAVEALL. It interacts with ICE
mode.
Only available on 80286.
LOADALL 0F 05 Loads All Registers from Memory Address 0x000800H Opcode reused for SYSCALL
in AMD K6-2 and later CPUs.
Only available on 80386.
LOADALLD 0F 07 Loads All Registers from Memory Address ES:EDI Opcode reused for SYSRET in
AMD K6-2 and later CPUs.
On the Intel SCC (Single-chip Cloud Computer), invalidate all message buffers.
CL1INVMB 0F 0A[119] The menmonic and operation of the instruction, but not its opcode, are Available on the SCC only.
described in Intel's SCC architecture specification.[120]
On AMD K6 and later maps to FEMMS operation (fast clear of MMX state) but Only available in Red unlock state
PATCH2 0F 0E
on Intel identified as uarch data read on Intel[121] (0F 0F too)
Can change RAM part of microcode

PATCH3 0F 0F Write uarch
on Intel
Available on some 386 and 486

UMOV r,r/m processors only.
0F (10..13) /r Moves data to/from user memory when operating in ICE HALT mode.[122] Acts
UMOV r/m,r as regular MOV otherwise. Opcodes reused for SSE
instructions in later CPUs.
NXOP 0F 55 NexGen hypercode interface.[123] Available on NexGen Nx586 only.
NexGen Nx586 "hyper mode" instructions.
The NexGen Nx586 CPU uses "hyper code"[125] (x86 code

sequences unpacked at boot time and only accessible in a special
"hyper mode" operation mode, similar to DEC Alpha's PALcode)
(multiple) 0F (E0..FB)[124] Available in Nx586 hyper mode only.
for many complicated operations that are implemented with
microcode in most other x86 CPUs. The Nx586 provides a large
number of undocumented instructions to assist hyper mode
operation.
Available on K6-2 and K6-3 only.

Undocumented AMD 3DNow! instruction on K6-2 and K6-3. Swaps 16-bit
words within 64-bit MMX register.[126][127] Opcode reused for
PSWAPW mm,mm/m64 0F 0F /r BB documented PSWAPD
Instruction known to be recognized by MASM 6.13 and 6.14. instruction from AMD K7
onwards.
Using the 64h (FS: segment) prefix with the undocumented D6 Available on the UMC Green CPU
Unknown mnemonic 64 D6 (SALC/SETALC) instruction will, on UMC CPUs only, cause EAX to be set to only. Executes as SALC on non-
0xAB6B1B07.[128][129] UMC CPUs.
Available on NetBurst CPUs only.
64 (70..7F) rel8, On Intel "NetBurst" (Pentium 4) CPUs, the 64h (FS: segment) instruction prefix
will, when used with conditional branch instructions, act as a branch hint to Segment prefixes on
FS: Jcc indicate that the branch will be alternating between taken and not-taken.[130] conditional branches are
64 0F (80..8F) rel16/32 Unlike other NetBurst branch hints (CS: and DS: segment prefixes), this hint is accepted but ignored by non-
not documented. NetBurst CPUs.
Only available on some x86

ALTINST 0F 3F Jump and execute instructions in the undocumented Alternate Instruction Set. processors made by VIA
Technologies.
VEX.66.0F38 On AMD Zen1, FMA4 instructions are present but undocumented (missing
(FMA4) (5C..5F,68..6F,78..7F) /r CPUID flag). The reason for leaving the feature undocumented may or may not Removed from Zen2 onwards.
imm8 have been due to a buggy implementation.[131]
Perform SHA-512 hashing.
Supported by OpenSSL [132] as part of its VIA PadLock support,

REP XSHA512 F3 0F A6 E0 but not documented by the VIA PadLock Programming Guide (htt
p://linux.via.com.tw/support/beginDownload.action?eleid=181&fid
=261).
Instructions to perform modular exponentiation and random number generation,

REP XMODEXP F3 0F A6 F8 respectively.
Only available on some x86
processors made by VIA
Listed in a VIA-supplied patch to add support for VIA Nano- Technologies and Zhaoxin.
specific PadLock instructions to OpenSSL,[133] but not
XRNG2 F3 0F A7 F8 documented by the VIA PadLock Programming Guide.
Detected by CPU fuzzing tools such as SandSifter[134] and UISFuzz[135] as executing without
causing #UD on several different VIA and Zhaoxin CPUs.
Unknown mnemonic 0F A7 (C1..C7) Unknown operation, may be related to the documented XSTORE
(0F A7 C0) instruction.
The whitepapers for SandSifter[134] and UISFuzz[135] report the detection of

large numbers of undocumented instructions in the 3DNow! opcode range on
several different AMD CPUs (at least Geode NX and C-50). Their operation is
not known.
Present on some AMD CPUs with
(unknown, multiple) 0F 0F /r ?? On at least AMD K6-2, all of the unassigned 3DNow! opcodes 3DNow!.
(other than the undocumented PF2IW, PI2FW and PSWAPW
instructions) execute as equivalents of POR (MMX bitwise-OR
instruction).[127]
Zhaoxin RSA/"xmodx" instructions. Mnemonics and CPUID flags are listed in a

Unknown. Some Zhaoxin CPUs[137] have the
MONTMUL2 Unknown Linux kernel patch for OpenEuler,[136] but opcodes and instruction descriptions CPUID flags for these instructions set.
are not available.
Microprocessor Report's article "MediaGX Targets Low-Cost PCs" from 1997, Unknown.
MOVDB, covering the introduction of the Cyrix MediaGX processor, lists several new
instructions that are said to have been added to this processor in order to
Unknown
support its new "Virtual System Architecture" features, including MOVDB and
No specification known to have
GP2MEM been published.
GP2MEM - and also mentions that Cyrix did not intend to publish specifications
for these instructions.[138]
Undocumented x87 instructions

FENI, Documented for the Intel 80287. [71]
FPU Enable
DB E0
FENI8087_NOP Interrupts (8087) Present on all Intel x87 FPUs from 80287 onwards. For FPUs other than the ones where they
were introduced on (8087 for FENI/FDISI and 80287 for FSETPM), they act as NOPs.
FDISI,
FPU Disable These instructions and their operation on modern CPUs are commonly mentioned in later
DB E1
FDISI8087_NOP Interrupts (8087) Intel documentation, but with opcodes omitted and opcode table entries left blank (e.g. Intel
SDM 325462-077, April 2022 (https://cdrdv2-public.intel.com/671200/325462-sdm-vol-1-2abc
FSETPM, d-3abcd.pdf) mentions them twice without opcodes).
FPU Set
FSETPM287_NOP
DB E4 Protected Mode The opcodes are, however, recognized by Intel XED.[139]
(80287)
These opcodes are listed as reserved opcodes that will produce "unpredictable results" without generating
D9 D7, D9 E2, exceptions on at least Cyrix 6x86,[140] 6x86MX, MII, MediaGX, and AMD Geode GX/LX.[141] (The
D9 E7, DD FC, documentation for these CPUs all list the same ten opcodes.)
"Reserved by
(no mnemonic) DE D8, DE DA,
Cyrix" opcodes
DE DC, DE DD, Their actual operation is not known, nor is it known whether their operation is the same on all
DE DE, DF FC of these CPUs.
See also
CLMUL
RDRAND
Larrabee extensions
Advanced Vector Extensions 2
Bit Manipulation Instruction Sets
CPUID
List of discontinued x86 instructions
References
1. "Re: Intel Processor Identification and the CPUID Instruction" (http://ww 21. Cyrix 486SLC/e Data Sheet (1992) (http://www.bitsavers.org/component
w.intel.com/content/www/us/en/processors/processor-identification-cpui s/cyrix/Cyrix_Cx486SLCe_Data_Sheet_1992.pdf), section 2.6.4
d-instruction-note.html?wapkw=processor-identification-cpuid-instructio 22. Robert Collins, CPUID Algorithm Wars (https://web.archive.org/web/200
n). Retrieved 2013-04-21. 01218003500/http://www.rcollins.org/ddj/Nov96/Nov96.html), nov 1996.
2. Andrew Schulman, "Unauthorized Windows 95" (ISBN 1-56884-169-8), Archived from the original (http://www.rcollins.org/ddj/Nov96/Nov96.htm
chapter 8, p.249,257. l) on dec 18, 2000.
3. US Patent 4974159 (https://patents.google.com/patent/US4974159A/), 23. Geoff Chappell, CMPXCHG8B Support in the 32-Bit Windows Kernel (ht
"Method of transferring control in a multitasking computer system" tps://www.geoffchappell.com/studies/windows/km/cpu/cx8.htm), 23 jan
mentions 63h/ARPL. 2008
4. WikiChip, UMIP - x86 (https://en.wikichip.org/wiki/x86/umip) 24. Intel, Software Developer's Manual (https://kib.kiev.ua/x86docs/Intel/SD
5. Oracle Corp, Oracle® VM VirtualBox Administrator's Guide for Release Ms/325462-077.pdf), order no. 325426-077, Nov 2022 - the entry on the
6.0, section 3.5: Details About Software Virtualization (https://docs.oracl RDTSC instruction on p.1739 describes the instruction sequences
e.com/en/virtualization/virtualbox/6.0/admin/swvirt-details.html) required to order the RDTSC instruction with respect to earlier and later
6. MBC Project,Virtual Machine Detection (https://github.com/MBCProject/ instructions.
mbc-markdown/blob/master/anti-behavioral-analysis/detect-vm.md) 25. Linux kernel 5.4.12, /arch/x86/kernel/cpu/centaur.c (https://elixir.bootlin.c
om/linux/v5.4.12/source/arch/x86/kernel/cpu/centaur.c#L110)
7. Michal Necasek, SGDT/SIDT Fiction and Reality (https://www.os2muse
um.com/wp/sgdtsidt-fiction-and-reality/) 26. Stack Overflow, Can constant non-invariant tsc change frequency
8. Intel, How Microarchitectural Data Sampling works (https://www.intel.co across cpu states? (https://stackoverflow.com/questions/62492053/can-
m/content/www/us/en/developer/articles/technical/software-security-guid constant-non-invariant-tsc-change-frequency-across-cpu-states)
ance/technical-documentation/intel-analysis-microarchitectural-data-sa Accessed 24 Jan 2023.
mpling.html), see mitigations section. Archived (https://archive.ph/dRI0 27. CPU-World, CPUID for IDT ZHAOXIN KaiXian KX-U6780 2.7 GHz (by
B) on Apr 22,2022 KZ) (https://www.cpu-world.com/cgi-bin/CPUID.pl?CPUID=78468).
9. Linux kernel documentation, Microarchitectural Data Sampling (MDS) Accessed 24 Jan 2023.
mitigation (https://www.kernel.org/doc/html/latest/x86/mds.html) 28. JookWiki, "nopl" (https://www.jookia.org/wiki/Nopl), sep 24, 2022 -
provides a lengthy account of the history of the long NOP and the
10. sandpile.org, x86 architecture rFLAGS register (https://www.sandpile.or
issues around it. Archived (https://web.archive.org/web/2022102823322
g/x86/flags.htm), see note #7
5/https://www.jookia.org/wiki/Nopl) on oct 28, 2022.
11. "Intel 80386 CPU Information | PCJS Machines" (https://www.pcjs.org/d
ocuments/manuals/intel/80386/#b0-stepping). 29. Intel Community: Multibyte NOP Made Official (https://community.intel.c
om/t5/Software-Archive/Multi-byte-NOP-opcode-made-official/td-p/9325
12. Geoff Chappell, CPU Identification before CPUID (https://www.geoffcha 80). Archived (https://web.archive.org/web/20220407203915/https://com
ppell.com/studies/windows/km/cpu/precpuid.htm?tx=245) munity.intel.com/t5/Software-Archive/Multi-byte-NOP-opcode-made-offic
13. Toth, Ervin (1998-03-16). "BSWAP with 16-bit registers" (https://web.arc ial/td-p/932580) on 7 Apr 2022.
hive.org/web/19991103025640/http://www.df.lth.se/~john_e/gems/gem0 30. Intel Software Developers Manual, vol 3B (https://www.intel.com/conten
00c.html). Archived from the original (http://www.df.lth.se/~john_e/gems/ t/www/us/en/developer/articles/technical/intel-sdm.html) (order no
gem000c.html) on 1999-11-03. "The instruction brings down the upper 253669-076us, December 2021), section 22.15 "Reserved NOP"
word of the doubleword register without affecting its upper 16 bits."
31. AMD, AMD 64-bit Technology - AMD x86-64 Architecture Programmer’s
14. Coldwin, Gynvael (2009-12-29). "BSWAP + 66h prefix" (https://gynvael. Manual Volume 3 (https://kib.kiev.ua/x86docs/AMD/AMD64/24594_APM
coldwind.pl/?id=268). Retrieved 2018-10-03. "internal (zero-)extending _v3-r3.02.pdf), publication no. 24594, rev 3.02, aug 2002, page 379.
the value of a smaller (16-bit) register … applying the bswap to a 32-bit
32. AMD, AMD-K6 Processor Data Sheet (https://www.ardent-tool.com/CP
value "00 00 AH AL", … truncated to lower 16-bits, which are "00 00". …
U/docs/AMD/K6/20695.pdf), order no. 20695H/0, March 1998, section
Bochs … bswap reg16 acts just like the bswap reg32 … QEMU …
ignores the 66h prefix" 24.2, page 283.
15. Intel "i486 Microprocessor" (https://www.ardent-tool.com/CPU/docs/Inte 33. George Dunlap, The Intel SYSRET Privilege Escalation (https://xenproje
l/486/datasheets/240440-001.pdf) (April 1989, order no. 240440-001) ct.org/2012/06/13/the-intel-sysret-privilege-escalation/), The Xen
Project., 13 june 2012
p.142 lists CMPXCHG with 0F A6/A7 encodings.
34. Intel, AP-485: Intel® Processor Identification and the CPUID Instruction
16. "Intel 486 & 486 POD CPUID, S-spec, & Steppings" (http://datasheets.c
hipdb.org/Intel/x86/486/Intel486.htm). (http://kib.kiev.ua/x86docs/Intel/AppNote485/241618-039.pdf), order no.
241618-039, may 2012, section 5.1.2.5, page 32
17. Intel "i486 Microprocessor" (https://www.ardent-tool.com/CPU/docs/Inte
l/486/datasheets/240440-002.pdf) (November 1989, order no. 240440- 35. Michal Necasek, "SYSENTER, Where Are You?" (http://www.os2museu
m.com/wp/sysenter-where-are-you/)
002) p.135 lists CMPXCHG with 0F B0/B1 encodings.
36. AMD, Athlon Processor x86 Code Optimization Guide (https://pdf.datash
18. Frank van Gilluwe, "The Undocumented PC, second edition", 1997,
ISBN 0-201-47950-8, page 55 eetcatalog.com/datasheet/AdvancedMicroDevices/mXvyvs.pdf),
publication no. 22007, rev K, feb 2002, appendix F, page 284
19. Intel, Software Developer’s Manual, vol 3A (http://kib.kiev.ua/x86docs/Int
el/SDMs/253668-078.pdf), order no. 253668-078, Dec 2022, section 37. Transmeta, Processor Recognition (http://datasheets.chipdb.org/Transm
eta/Crusoe/Crusoe_CPUID_5-7-02.pdf), May 7, 2002.
9.3, page 299
20. "RSM—Resume from System Management Mode" (https://web.archive. 38. VIA, VIA C3 Nehemiah Processor Datasheet (http://datasheets.chipdb.o
rg/VIA/Nehemiah/VIA%20C3%20Nehemiah%20Datasheet%20R113.pd
org/web/20120312224625/http://www.softeng.rl.ac.uk/st/archive/SoftEn
g/SESP/html/SoftwareTools/vtune/users_guide/mergedProjects/analyze f), rev 1.13, sep 29, 2004, page 17
r_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct 39. CPU-World, CPUID for Intel Xeon 3.40 GHz (https://www.cpu-world.co
32_hh/vc279.htm). Archived from the original on 2012-03-12. m/cgi-bin/CPUID.pl?CPUID=75151) - Nocona stepping D CPUID
without CMPXCHG16B
40. CPU-World, CPUID for Intel Xeon 3.60 GHz (https://www.cpu-world.co 62. Intel, 11th Generation Intel® Core™ Processor Desktop Datasheet,
m/cgi-bin/CPUID.pl?CPUID=75154) - Nocona stepping E CPUID with Volume 1 (https://cdrdv2.intel.com/v1/dl/getContent/634648), may 2022,
CMPXCHG16B order no. 634648-004, section 3.5, page 65
41. SuperUser StackExchange, How prevalent are old x64 processors 63. Intel, Which Platforms Support Intel® Software Guard Extensions
lacking the cmpxchg16b instruction? (https://superuser.com/questions/1 (Intel® SGX) SGX2? (https://www.intel.com/content/www/us/en/support/
87254/how-prevalent-are-old-x64-processors-lacking-the-cmpxchg16b-i articles/000058764/software/intel-security-products.html) Archived (http
nstruction) s://archive.ph/nQHXO) on 5 May 2022.
42. Intel SDM order no. 325462-077 (https://www.intel.com/content/www/us/ 64. "The CLDEMOTE Story" (https://twitter.com/InstLatX64/status/1521562
en/developer/articles/technical/intel-sdm.html), apr 2022, vol 2B, p.4- 151848132609). Twitter. Retrieved 2023-01-23.
130 "MOVSX/MOVSXD-Move with Sign-Extension" lists MOVSXD 65. Wikichip, CLZERO - x86 (https://en.wikichip.org/w/index.php?title=x86/c
without REX.W as "discouraged" lzero&oldid=94738)
43. Anandtech, AMD Zen 3 Ryzen Deep Dive Review (https://www.anandte 66. Intel, Application note AP-578: Software and Hardware Considerations
ch.com/show/16214/amd-zen-3-ryzen-deep-dive-review-5950x-5900x-5 for FPU Exception Handlers for Intel Architecture Processors (https://ard
800x-and-5700x-tested/6), nov 5, 2020, page 6 ent-tool.com/CPU/docs/Intel/IA/243291-002.pdf), order no. 243291-002,
44. "Saving Private Ryzen: PEXT/PDEP 32/64b replacement functions for February 1997
#AMD CPUs (BR/#Zen/Zen+/#Zen2) based on @zwegner's zp7" (http 67. Intel, Application Note AP-113: Getting Started With The Numeric Data
s://twitter.com/instlatx64/status/1322503571288559617). Twitter. Processor (https://ardent-tool.com/CPU/docs/Intel/808x/8087/appnotes/
Retrieved 2023-01-20. AP-113.pdf), feb 1981, pages 24-25
45. Wegner, Zach (4 November 2020). "zwegner/zp7" (https://github.com/z 68. Intel, 8087 Math Coprocessor (http://www.datasheetcatalog.com/datash
wegner/zp7). GitHub. eets_pdf/8/0/8/7/8087.shtml), oct 1989, order no. 285385-007, page 3-
46. Intel, Control-flow Enforcement Technology Specification (https://kib.kie 100, fig 9
v.ua/x86docs/Intel/CET/334525-003.pdf) (v3.0, order no. 334525-003, 69. Intel, 80287 80-bit HMOS Numeric Processor Extension (http://www.bits
March 2019) avers.org/components/intel/_dataSheets/80287_Data_Sheet_Feb83.pd
47. Intel SDM, rev 076, December 2021 (https://kib.kiev.ua/x86docs/Intel/S f), feb 1983, order no. 201920-001, page 14
DMs/253665-076.pdf), volume 1, section 18.3.1 70. Intel, iAPX86, 88 User's Manual (https://ardent-tool.com/CPU/docs/Intel/
48. Binutils mailing list: x86: CET v2.0: Update NOTRACK prefix (https://sou 808x/manuals/210201-001.pdf), 1981 (order no. 210201-001), p. 797
rceware.org/pipermail/binutils/2017-June/098516.html) 71. Intel 80286 and 80287 Programmers Reference Manual (http://bitsaver
49. AMD, Extensions to the 3DNow! and MMX Instruction Sets (https://refsp s.trailing-edge.com/components/intel/80286/210498-005_80286_and_8
ecs.linuxfoundation.org/AMD-extensions.pdf), ref no. 22466D/0, March 0287_Programmers_Reference_Manual_1987.pdf), 1987 (order no.
2000, p.11 210498-005), p. 485
50. Hadi Brais, The Significance of the x86 SFENCE instruction (https://hadi 72. Intel Software Developer's Manual (https://kib.kiev.ua/x86docs/Intel/SD
brais.wordpress.com/2019/02/26/the-significance-of-the-x86-sfence-inst Ms/253669-064.pdf) volume 3B, revision 064, section 22.18.9
ruction/), 26 Feb 2019. 73. "GCC Bugzilla – 37179 – GCC emits bad opcode 'ffreep' " (https://gcc.g
51. Intel, Software Developer's Manual (https://kib.kiev.ua/x86docs/Intel/SD nu.org/bugzilla/show_bug.cgi?id=37179).
Ms/325462-077.pdf), order no. 325426-077, Nov 2022, Volume 1, 74. Michael Steil, FFREEP – the assembly instruction that never existed (htt
section 11.4.4.3, page 276. ps://www.pagetable.com/?p=16)
52. Hadi Brais, The Significance of the LFENCE instruction (https://hadibrai 75. Dusko Koncaliev, Pentium FDIV Bug (https://www.cs.earlham.edu/~dus
s.wordpress.com/2018/05/14/the-significance-of-the-x86-lfence-instructi ko/cs63/fdiv.html)
on/), 14 May 2018
76. Bruce Dawson, Intel Underestimates Error Bounds by 1.3 quintillion (htt
53. AMD, Software techniques for managing speculation on AMD processor ps://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-
(https://www.amd.com/system/files/documents/software-techniques-for- bounds-by-1-3-quintillion/)
managing-speculation.pdf), rev 3.8.22, 8 March 2022, page 4. Archived 77. Intel SDM, rev 053 (https://kib.kiev.ua/x86docs/Intel/SDMs/253665-053.
(https://web.archive.org/web/20220313090311/https://www.amd.com/sy
pdf) and later, describes the exact argument reduction procedure used
stem/files/documents/software-techniques-for-managing-speculation.pd
for FSIN, FCOS, FSINCOS and FPTAN in volume 1, section 8.3.8
f) on 13 March 2022.
78. Intel, Intel® 64 and IA-32 Architectures Optimization Reference Manual
54. Intel, Prescott New Instructions Software Developer’s Guide (https://mat (https://kib.kiev.ua/x86docs/Intel/Intel-OptimGuide/248966-029.pdf)
h-atlas.sourceforge.net/devel/assembly/sse3.pdf), order no. 252490- (order no. 248966-044, June 2021) section 3.5.2.3
003, june 2003, pages 3-26 and 3-38 list MONITOR and MWAIT with
explicit operands. 79. "The microarchitecture of Intel, AMD and VIA CPUs: An optimization
guide for assembly programmers and compiler makers" (http://www.agn
55. Flat Assembler messageboard, "BLENDVPS/BLENDVPD/PBLENDVB er.org/optimize/microarchitecture.pdf) (PDF). Retrieved October 17,
syntax" (https://board.flatassembler.net/topic.php?p=98558#98558), 2016.
also covers MONITOR/MWAIT mnemonics
80. "Chess programming AVX2" (https://chessprogramming.wikispaces.co
56. Intel, Intel® Xeon Phi™ Product Family x200 (KNL) User mode (ring 3)
m/AVX2). Retrieved October 17, 2016.
MONITOR and MWAIT (https://web.archive.org/web/20170305002312/h
ttps://software.intel.com/en-us/blogs/2016/10/06/intel-xeon-phi-product-f 81. "Intel AVX-512 Instructions" (https://www.intel.com/content/www/us/en/d
amily-x200-knl-user-mode-ring-3-monitor-and-mwait) (archived 5 mar eveloper/articles/technical/intel-avx-512-instructions.html). Intel.
2017) Retrieved 21 June 2022.
57. AMD, BIOS and Kernel Developer’s Guide (BKDG) For AMD Family 82. Michal Ludvig, VIA PadLock—Wicked Fast Encryption (https://www.linux
10h Processors (https://www.amd.com/system/files/TechDocs/31116.pd journal.com/article/8042), 6 Apr 2005
f), order no. 31116, rev 3.62, page 419 83. Zhaoxin, Core Technology | Instructions for the use of accelerated
58. Guru3D, VIA Zhaoxin x86 4 and 8-core SoC processors launch (https:// instructions for national encryption algorithm based on Zhaoxin
www.guru3d.com/news-story/via-zhaoxin-x86-4-and-8-core-processors-l processor (https://www.zhaoxin.com/prodshow_view.aspx?id=483&nid=
aunched.html), Jan 22, 2018 31&typeid=63&IsActiveTarget=True) (in Chinese). Archived (https://web.
archive.org/web/20220105165938/https://www.zhaoxin.com/prodshow_
59. Vulners, x86: DoS from attempting to use INVPCID with a non- view.aspx?id=483&nid=31&typeid=63&IsActiveTarget=True) on Jan 5,
canonical addresses (https://vulners.com/xen/XSA-279), 20 nov 2018 2022
60. Intel, Intel® 64 and IA-32 Architectures Software Developer’s Manual (ht 84. Zhaoxin, GMI User Manual v1.0 (https://github.com/ZXOpenSource/Ope
tp://kib.kiev.ua/x86docs/Intel/SDMs/325384-078.pdf) volume 3, order
nSSL-ZX-GMI/blob/63e8835e1854042ac949a04eb87c604cb1d7d5ea/G
no. 325384-078, december 2022, chapter 23.15 MI%20User%20Manual%20V1.0.pdf) (in Chinese). Archived (https://we
61. Catherine Easdon, Undocumented CPU Behaviour on x86 and RISC-V b.archive.org/web/20220228071045/https://raw.githubusercontent.com/
Microarchitectures: A Security Perspective (https://www.cattius.com/ima ZXOpenSource/OpenSSL-ZX-GMI/master/GMI%20User%20Manual%2
ges/thesis-unsigned.pdf), 10 May 2019, page 39 0V1.0.pdf) on Feb 28, 2022
85. Intel, Intel® Atom™ Processor S1200 Product Family for Microserver 112. "Invalid opcode handling \ VOGONS" (https://www.vogons.org/viewtopi
Datasheet, Volume 1 of 2 (https://www.intel.com/content/dam/www/publi c.php?t=13379). Archived (https://web.archive.org/web/2022040907115
c/us/en/documents/datasheets/atom-processor-s1200-datasheet-vol-1.p 9/https://www.vogons.org/viewtopic.php?t=13379) from the original on 9
df), order no. 328194-001, dec 2012, page 44 Apr 2022.
86. SecurityWeek, Intel Adds TDX to Confidential Computing Portfolio With 113. "Invalid instructions cause exit even if Int 6 is hooked \ VOGONS" (http
Launch of 4th Gen Xeon Processors (https://www.securityweek.com/inte s://www.vogons.org/viewtopic.php?t=21418). Archived (https://web.archi
l-adds-tdx-confidential-computing-portfolio-launch-4th-gen-xeon-process ve.org/web/20220409071155/https://www.vogons.org/viewtopic.php?t=2
ors), 10 jan 2023 1418) from the original on 9 Apr 2022.
87. Robert Collins, Undocumented OpCodes: AAM (http://www.rcollins.org/s 114. "Tutorial - Calling Win32 from DOS" (https://www.ragestorm.net/tutorial?
ecrets/opcodes/AAM.html) id=27). Ragestorm. 17 Sep 2005. Archived (https://web.archive.org/web/
88. Frank van Gilluwe, "The Undocumented PC - Second Edition", p. 93-95 20220409071159/https://www.ragestorm.net/tutorial?id=27) from the
89. Michal Necasek, Intel 486 Errata? (http://www.os2museum.com/wp/intel original on 4 Apr 2022.
-486-errata/) 115. "Accessing Windows device drivers from DOS programs" (https://sta.c6
4.org/blog/dosvddaccess.html).
90. Robert Hummel, "PC Magazine Programmer's Technical Reference"
(ISBN 1-56276-016-5) p.728 116. "8086 microcode disassembled" (https://www.reenigne.org/blog/8086-mi
91. Raúl Gutiérrez Sanz, Undocumented 8086 Opcodes, Part I (http://www. crocode-disassembled/). Reengine blog. 2020-09-03. Retrieved
2022-07-26. "Using the REP or REPNE prefix with a MUL or IMUL
os2museum.com/wp/undocumented-8086-opcodes-part-i/)
instruction negates the product. Using the REP or REPNE prefix with an
92. "Asm, opcode 82h" (http://computer-programming-forum.com/46-asm/1 IDIV instruction negates the quotient."
43edbd28ae1a091.htm).
117. "Re: Undocumented opcodes (HINT_NOP)" (https://web.archive.org/we
93. Intel Corporation 2022, p. 3698. b/20041106070621/http://www.sandpile.org/post/msgs/20004129.htm).
94. Intel, The 8086 Family User's Manual, October 1979 (http://bitsavers.or Archived from the original (http://www.sandpile.org/post/msgs/2000412
g/components/intel/8086/9800722-03_The_8086_Family_Users_Manua 9.htm) on 2004-11-06. Retrieved 2010-11-07.
l_Oct79.pdf), opcodes omitted on pages 4-25 and 4-31 118. "Re: Also some undocumented 0Fh opcodes" (https://web.archive.org/w
95. Retrocomputing StackExchange, Undocumented instructions in x86 eb/20030626044017/http://www.sandpile.org/post/msgs/20003986.htm).
CPU prior to 80386? (https://retrocomputing.stackexchange.com/questi Archived from the original (http://www.sandpile.org/post/msgs/2000398
ons/20031/undocumented-instructions-in-x86-cpu-prior-to-80386) 6.htm) on 2003-06-26. Retrieved 2010-11-07.
96. Daniel B. Sedory, An Examination of the Standard MBR (https://thestar 119. Intel's RCCE library (https://github.com/Intel-SCC/RCCE/blob/master/sr
man.pcministry.com/asm/mbr/STDMBR.htm#REP) c/RCCE_admin.c#L87) for the SCC uses opcode 0F 0A for SCC's
97. AMD, Software Optimization Guide for AMD64 Processors (https://www. message invalidation instruction.
amd.com/system/files/TechDocs/25112.PDF) (publication 25112, 120. Intel Labs, SCC External Architecture Specification (EAS), Revision
revision 3.06, sep 2005), section 6.2, p.128 0.94 (https://www.intel.com/content/dam/www/public/us/en/documents/t
98. Bug 48227 - "rep ret" generated for -march=core2 (https://gcc.gnu.org/b echnology-briefs/intel-labs-single-chip-cloud-architecture-brief.pdf), p.29
ugzilla/show_bug.cgi?id=48227) 121. "Undocumented x86 instructions to control the CPU at the
99. Raymond Chen, My, what strange NOPs you have! (https://devblogs.mi microarchitecture level in modern Intel processors" (https://raw.githubus
crosoft.com/oldnewthing/20110112-00/?p=11773) ercontent.com/chip-red-pill/udbgInstr/main/paper/undocumented_x86_in
100. Jeff Parsons, Intel 80386 CPU information (https://www.pcjs.org/docum sts_for_uarch_control.pdf) (PDF). 9 July 2021.
ents/manuals/intel/80386/#b1-errata) (B1 errata section, item #7) 122. Robert R. Collins, Undocumented OpCodes: UMOV (http://www.rcollins.
101. Retrocomputing StackExchange, 0F1h opcode-prefix on i80286 (https:// org/secrets/opcodes/UMOV.html)
retrocomputing.stackexchange.com/questions/12004/0f1h-opcode-prefi 123. Herbert Oppmann, NXOP (Opcode 0Fh 55h) (https://www.memotech.fra
x-on-i80286) nken.de/NexGen/Opcode0F55.html)
102. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86do 124. Herbert Oppmann, NexGen Nx586 Hypercode Source (https://www.me
cs/Intel/SDMs/253667-018.pdf) (Jan 2006, order no 235667-018, does motech.franken.de/NexGen/Source/index.html), see COMMON.INC
not have long NOP) 125. Herbert Oppmann, Inside the NexGen Nx586 System BIOS (https://ww
103. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86do w.memotech.franken.de/NexGen/Bios.html)
cs/Intel/SDMs/253667-019.pdf) (March 2006, order no 235667-019, has 126. Grzegorz Mazur, AMD 3DNow! undocumented instructions (https://web.
long NOP) archive.org/web/20000121143428/http://x86.ddj.com/articles/3dnow/am
104. Agner Fog, Instruction Tables (https://www.agner.org/optimize/instructio d_3dnow.htm)
n_tables.pdf), AMD K7 section. 127. "Undocumented 3DNow! Instructions" (https://web.archive.org/web/200
105. "579838 – glibc not compatible with AMD Geode LX" (https://bugzilla.re 30130030723/http://grafi.ii.pw.edu.pl/gbm/x86/3dundoc.html).
dhat.com/show_bug.cgi?id=579838#c46). grafi.ii.pw.edu.pl. Archived from the original (http://grafi.ii.pw.edu.pl/gbm/
106. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86do x86/3dundoc.html) on 30 January 2003. Retrieved 22 February 2022.
cs/Intel/SDMs/253667-015.pdf) (April 2005, order no 235667-015, does 128. Potemkin's Hacker Group's OPCODE.LST, v4.51 (http://phg.chat.ru/opc
not list 0F0D-nop) ode.txt)
107. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86do 129. "[UCA CPU Analysis] Prototype UMC Green CPU U5S-SUPER33" (http
cs/Intel/SDMs/253667-016.pdf) (June 2005, order no 235667-016, lists s://x86.fr/uca-cpu-analysis-prototype-umc-green-cpu-u5s-super33). 25
0F0D-nop in opcode table but not under NOP instruction description.) May 2020.
108. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86do 130. Agner Fog, The Microarchitecture of Intel, AMD and VIA CPUs (https://w
cs/Intel/SDMs/253667-060.pdf) (order no. 253667-060, September ww.agner.org/optimize/microarchitecture.pdf), section 3.4 "Branch
2016) does not list UD0 and UD1. Prediction in P4 and P4E".
109. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86do 131. Reddit /r/Amd discussion thread: Ryzen has undocumented support for
cs/Intel/SDMs/253667-061.pdf) (order no. 253667-061, December FMA4 (https://www.reddit.com/r/Amd/comments/68s4bj/ryzen_has_und
2016) lists UD0 and UD1. ocumented_support_for_fma4/dh0y353/)
110. "PCJS : pcjs/x86op0F.js (two-byte x86 opcode handlers), lines 1647- 132. "Welcome to the OpenSSL Project" (https://github.com/openssl/openssl/
1651" (https://github.com/jeffpar/pcjs/blob/e565ffa65d8ee5d600ec04e62 blob/1aa89a7a3afb053d0c0b7fad8d3ea1b0a5447289/engines/asm/e_p
c6651dabb4894cb/machines/pcx86/lib/x86op0f.js#L1647). GitHub. 17 adlock-x86.pl#L597). GitHub. 21 April 2022.
April 2022. 133. PATCH: Update PadLock engine for VIA C7 and Nano CPUs (https://ma
111. "80486 paging protection faults? \ VOGONS" (https://www.vogons.org/vi rc.info/?l=openssl-dev&m=130767391615291&w=2)
ewtopic.php?t=62949). Archived (https://web.archive.org/web/20220409 134. Christopher Domas, Breaking the x86 ISA (https://github.com/xoreaxeax
071156/https://www.vogons.org/viewtopic.php?t=62949) from the eax/sandsifter/blob/master/references/domas_breaking_the_x86_isa_w
original on 9 Apr 2022. p.pdf)
135. Xixing Li et al, UISFuzz: An Efficient Fuzzing Method for CPU 138. Microprocessor Report, MediaGX Targets Low-Cost PCs (http://www.ce
Undocumented Instruction Searching (https://ieeexplore.ieee.org/abstra cs.uci.edu/~papers/mpr/MPR/ARTICLES/110301.PDF) (vol 11, no. 3,
ct/document/8863327) mar 10, 1997)
136. OpenEuler mailing list, PATCH kernel-4.19 v2 5/6 : x86/cpufeatures: 139. ISA datafile for Intel XED (https://github.com/intelxed/xed/blob/ef19f00d
Add Zhaoxin feature bits (https://mailweb.openeuler.org/hyperkitty/list/ke e14a9c2c253c1c9b1119e1617280e3f2/datafiles/xed-isa.txt#L916) (April
rnel@openeuler.org/thread/W6GXBRRO6OKNHVJ3WDDUXSLQGI2G 17, 2022), lines 916-944
FU4X/). Archived (https://web.archive.org/web/20220409071314/https:// 140. Cyrix 6x86 processor data book (https://www.ardent-tool.com/CPU/doc
mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/thread/W6G s/Cyrix/6x86/94175.pdf), page 6-34
XBRRO6OKNHVJ3WDDUXSLQGI2GFU4X/) on 9 Apr 2022. 141. AMD Geode LX Processors Data Book (https://www.amd.com/system/fil
137. CPUID dump for Zhaoxin KaiXian-U6870 (http://users.atw.hu/instlatx64/ es/TechDocs/33234H_LX_databook.pdf), publication 33234H, p.670
CentaurHauls/CentaurHauls00307B1_U6780A_CPUID2.txt), see
C0000001 line. Archived (https://web.archive.org/web/2022040907131
4/http://users.atw.hu/instlatx64/CentaurHauls/CentaurHauls00307B1_U
6780A_CPUID2.txt) on 9 Apr 2022.
Intel Corporation (April 2022). "Intel 64 and IA-32 Architectures Software Developer's Manual, Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D
and 4" (https://cdrdv2.intel.com/v1/dl/getContent/671200). Intel. Retrieved 21 June 2022.
External links
Free IA-32 and x86-64 documentation (https://software.intel.com/en-us/articles/intel-sdm), provided by Intel
x86 Opcode and Instruction Reference (http://ref.x86asm.net)
x86 and amd64 instruction reference (https://www.felixcloutier.com/x86/index.html)
Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs (https://www.agner.org/opti
mize/instruction_tables.pdf)
Netwide Assembler Instruction List (https://www.nasm.us/doc/nasmdocb.html) (from Netwide Assembler)
Retrieved from "https://en.wikipedia.org/w/index.php?title=X86_instruction_listings&oldid=1143743363"

x86 Instruction Listings

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

x86 Instruction Listings

Uploaded by

Copyright:

Available Formats

3/14/23, 12:35 AM x86 instruction listings - Wikipedia

x86 instruction listings

x86 integer instructions

Original 8086/8088 instructions

Original 8086/8088 instruction set

AAS ASCII adjust AL after subtraction 0x3F

CBW Convert byte to word 0x98

CLD Clear direction flag DF = 0; 0xFC

CLI Clear interrupt flag IF = 0; 0xFA

Compare bytes in memory. May be used with a

CWD Convert word to doubleword 0x99

(1) AX = DX:AX / r/m; resulting DX = remainder (2) AL = AX / r/m; resulting

HLT Enter halt state 0xF4

(1) AX = DX:AX / r/m; resulting DX = remainder (2) AL = AX / r/m; resulting

(1) AL = port[imm]; (2) AL = port[DX]; (3) AX = port[imm]; (4) AX =

INTO Call to interrupt if overflow 0xCE

IRET Return from interrupt 0xCF

JCXZ Jump if CX is zero 0xE3

LDS Load DS:r with far pointer 0xC5

LEA Load Effective Address 0x8D

Instruction Meaning Notes Opcode

LES Load ES:r with far pointer 0xC4

LOCK Assert BUS LOCK# signal (for multiprocessing) 0xF0

Load string word. May be used with a REP

NEG Two's complement negation r/m *= -1; 0xF6/3…0xF7/3

NOT Negate the operand, logical NOT r/m ^= -1; 0xF6/2…0xF7/2

0x07, 0x0F(8086/8088 only),

POPF Pop FLAGS register from stack FLAGS = *SP++; 0x9D

PUSHF Push FLAGS onto stack *--SP = FLAGS; 0x9C

RETF Return from far procedure 0xCA, 0xCB

SAHF Store AH into FLAGS 0x9E

Compare byte string. May be used with a REP

STD Set direction flag DF = 1; 0xFD

Instruction Meaning Notes Opcode

STI Set interrupt flag IF = 1; 0xFB

Store byte in string. May be used with a REP

0x84, 0x85, 0xA8, 0xA9,

Added in specific processors

Added with 80186/80188

Instruction Opcode Meaning Notes

Check array index against

IMUL immediate Signed and unsigned example:

Added with 80286

Instruction Opcode Meaning Notes

Available in 16/32-bit protected mode only.

Sets ZF=1 if the descriptor could be loaded, ZF=0 otherwise.

Load segment limit

LTR r/m16 0F 00 /3 Load Task Register

LMSW is a serializing instruction.

Verify a segment for

Sets ZF=1 if segment can be written, ZF=0 otherwise.

Added with 80386

Instruction Meaning Notes

BSF Bit scan forward

BTC Bit test and complement

IBTS Insert Bit String Discontinued with B1 step of 80386.

Interrupt return; D suffix means 32-bit

LFS, LGS Load far pointer

POPFD Pop data into EFLAGS register

Push all double-word (32-bit) registers onto

Push a double-word (32-bit) value onto

r1 = r1>>CL ∣ r2<<(register_width - CL); Instead of CL, 8-bit immediate can be used.

Discontinued with B1 step of 80386.