Professional Documents
Culture Documents
The x86 instruction set has been extended several times, introducing wider registers and datatypes as well as new functionality.[1]
https://en.wikipedia.org/wiki/X86_instruction_listings 1/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
Below is the full 8086/8088 instruction set of Intel (81 instructions total). Most if not all of these instructions are available in 32-bit mode; they just
operate on 32-bit registers (eax, ebx, etc.) and values instead of their 16-bit (ax, bx, etc.) counterparts. The updated instruction set is also grouped
according to architecture (i386, i486, i686) and more generally is referred to as (32-bit) x86 and (64-bit) x86-64 (also known as AMD64).
AAA ASCII adjust AL after addition used with unpacked binary-coded decimal 0x37
8086/8088 datasheet documents only base 10 version of the AAD instruction
(opcode 0xD5 0x0A), but any other base will work. Later Intel's documentation has
AAD ASCII adjust AX before division the generic form too. NEC V20 and V30 (and possibly other NEC V-series CPUs) 0xD5
always use base 10, and ignore the argument, causing a number of
incompatibilities
AAM ASCII adjust AX after multiplication Only base 10 version (Operand is 0xA) is documented, see notes for AAD 0xD4
0x00…0x05, 0x80/0…
ADD Add (1) r/m += r/imm; (2) r += m/imm; 0x81/0, 0x82/0…0x83/0
(since 80186)
0x20…0x25, 0x80…0x81/4,
AND Logical AND (1) r/m &= r/imm; (2) r &= m/imm;
0x82…0x83/4 (since 80186)
CALL Call procedure push eip; eip points to the instruction directly after the call 0x9A, 0xE8, 0xFF/2, 0xFF/3
0x38…0x3D, 0x80…0x81/7,
CMP Compare operands
0x82…0x83/7 (since 80186)
DAA Decimal adjust AL after addition (used with packed binary-coded decimal) 0x27
DAS Decimal adjust AL after subtraction 0x2F
0x48…0x4F, 0xFE/1,
DEC Decrement by 1
0xFF/1
0x40…0x47, 0xFE/0,
INC Increment by 1
0xFF/0
INT Call to interrupt 0xCC, 0xCD
0xE9…0xEB, 0xFF/4,
JMP Jump
0xFF/5
LAHF Load FLAGS into AH register 0x9F
https://en.wikipedia.org/wiki/X86_instruction_listings 2/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
LOOP/LOOPx Loop control (LOOPE, LOOPNE, LOOPNZ, LOOPZ) if (x && --CX) goto lbl; 0xE0…0xE2
MOV Move copies data from one location to another, (1) r/m = r; (2) r = r/m; 0xA0...0xA3
if (DF==0)
Move byte from string to string. May be used *(byte*)DI++ = *(byte*)SI++;
MOVSB with a REP prefix to repeat the instruction CX else 0xA4
times. *(byte*)DI-- = *(byte*)SI--;
if (DF==0)
Move word from string to string. May be used
*(word*)DI++ = *(word*)SI++;
MOVSW with a REP prefix to repeat the instruction CX else 0xA5
times. *(word*)DI-- = *(word*)SI--;
MUL Unsigned multiply (1) DX:AX = AX * r/m; (2) AX = AL * r/m; 0xF7/4, 0xF6/4
0x08…0x0D, 0x80…0x81/1,
OR Logical OR (1) r/m |= r/imm; (2) r |= m/imm;
0x82…0x83/1 (since 80186)
(1) port[imm] = AL; (2) port[DX] = AL; (3) port[imm] = AX; (4) port[DX]
OUT Output to port 0xE6, 0xE7, 0xEE, 0xEF
= AX;
0xC0…0xC1/2 (since
RCL Rotate left (with carry)
80186), 0xD0…0xD3/2
0xC0…0xC1/3 (since
RCR Rotate right (with carry)
80186), 0xD0…0xD3/3
REPxx Repeat MOVS/STOS/CMPS/LODS/SCAS (REP, REPE, REPNE, REPNZ, REPZ) 0xF2, 0xF3
Not a real instruction. The assembler will translate these to a RETN or a RETF
RET Return from procedure
depending on the memory model of the target system.
RETN Return from near procedure 0xC2, 0xC3
0xC0…0xC1/0 (since
ROL Rotate left
80186), 0xD0…0xD3/0
0xC0…0xC1/1 (since
ROR Rotate right
80186), 0xD0…0xD3/1
0xC0…0xC1/4 (since
SAL Shift Arithmetically left (signed shift left) (1) r/m <<= 1; (2) r/m <<= CL;
80186), 0xD0…0xD3/4
0xC0…0xC1/7 (since
SAR Shift Arithmetically right (signed shift right) (1) (signed) r/m >>= 1; (2) (signed) r/m >>= CL;
80186), 0xD0…0xD3/7
alternative 1-byte encoding of SBB AL, AL is available via undocumented SALC 0x18…0x1D, 0x80…0x81/3,
SBB Subtraction with borrow
instruction 0x82…0x83/3 (since 80186)
0xC0…0xC1/4 (since
SHL Shift left (unsigned shift left)
80186), 0xD0…0xD3/4
0xC0…0xC1/5 (since
SHR Shift right (unsigned shift right)
80186), 0xD0…0xD3/5
STC Set carry flag CF = 1; 0xF9
https://en.wikipedia.org/wiki/X86_instruction_listings 3/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
0x28…0x2D, 0x80…0x81/5,
SUB Subtraction (1) r/m -= r/imm; (2) r -= m/imm;
0x82…0x83/5 (since 80186)
XCHG Exchange data r :=: r/m; A spinlock typically uses xchg as an atomic operation. (coma bug). 0x86, 0x87, 0x91…0x97
XLAT Table look-up translation behaves like MOV AL, [BX+AL] 0xD7
0x30…0x35, 0x80…0x81/6,
XOR Exclusive OR (1) r/m ^= r/imm; (2) r ^= m/imm;
0x82…0x83/6 (since 80186)
Modifies stack for entry to procedure for high level language. Takes two operands: the
ENTER C8 iw ib Enter stack frame
amount of storage to be allocated on the stack and the nesting level of the procedure.
equivalent to:
6C
IN AX, DX
INSB/INSW Input from port to string
MOV ES:[DI], AX
; adjust DI according to operand size and DF
6D
LEAVE C9 Leave stack frame Releases the local stack storage created by the previous ENTER instruction.
equivalent to:
6E
MOV AX, DS:[SI]
OUTSB/OUTSW Output string to port
OUT DX, AX
; adjust SI according to operand size and DF
6F
equivalent to:
POP DI
POP SI
POP BP
Pop all general purpose POP AX ; no POP SP here, all it does is ADD SP, 2 (since AX will be overwritten
POPA 61
registers from stack later)
POP BX
POP DX
POP CX
POP AX
equivalent to:
PUSH AX
PUSH CX
PUSH DX
Push all general purpose
PUSHA 60 PUSH BX
registers onto stack PUSH SP ; The value stored is the initial SP value
PUSH BP
PUSH SI
PUSH DI
example:
6A ib
Push an immediate byte/word
PUSH immediate PUSH 12h
value onto the stack PUSH 1200h
68 iw
https://en.wikipedia.org/wiki/X86_instruction_listings 4/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
Note that since the lower half is the same for unsigned and signed
multiplication, this version of the instruction can be used for unsigned
multiplication as well.
example:
C0
SHL/SHR/SAL/SAR/ROL/ROR/RCL/RCR Rotate/shift bits with an
ROL AX,3
immediate immediate value greater than 1 SHR BL,3
C1
Adjust RPL field of Causes #UD in Real mode and Virtual 8086 Mode - Windows 95 and OS/2 2.x are known to make
ARPL r/m16, r16 63 /r
selector extensive use of this #UD to use the 63 opcode as a one-byte breakpoint to transition from Virtual 8086
Mode to kernel mode.[2][3]
Clear task-switched
CLTS 0F 06 flag in Machine
Status Word.
Each of these instructions loads a 2-part table descriptor. The first part is a 16-bit value, specifying table size in bytes minus
Load Global
1. The second part is a 32-bit value (64-bit value in 64-bit mode), specifying the linear start address for the table. This
LGDT m16&32 0F 01 /2 Descriptor Table
address is ANDed with 00FFFFFFh for the 16-bit variants of these instructions.
Register
LIDT can relocate the Interrupt Vector Table in Real Mode as well.
Load Interrupt
LIDT m16&32 0F 01 /3 Descriptor Table LGDT and LIDT are serializing instructions.
Register
Load Local
LLDT r/m16 0F 00 /2 Descriptor Table
Register LLDT and LTR are serializing instructions.
On 80386 and later, the "Machine Status Word" is the same as the CR0 register, however LMSW can only modify the
bottom 4 bits of this register.
LMSW can be used to enter but not leave x86 Protected Mode. On the 80286, it is not possible to leave
Load Machine Protected Mode at all without a CPU reset - on 80386 and later, it is possible to leave Protected Mode,
LMSW r/m16 0F 01 /6
Status Word
but this requires the use of the 80386-and-later MOV to CR0 instruction.
Store Global The SGDT,SIDT,SLDT,SMSW,STR were unprivileged on all x86 CPUs from 80286 onwards until the introduction of UMIP in
SGDT m16&32 0F 01 /0 Descriptor Table 2017.[4]
Register
This has been a significant security problem for software-based virtualization, since it enables these
Store Interrupt instructions to be used by a VM guest to detect that it is running inside a VM.[5][6]
SIDT m16&32 0F 01 /1 Descriptor Table
Register The 16-bit variants of the SGDT and SIDT instructions also show a difference between Intel
documentation and actual behavior observed on Intel CPUs: as of Intel SDM revision 076, December
Store Local 2021 (https://kib.kiev.ua/x86docs/Intel/SDMs/325462-076.pdf), the last 8 bits of the descriptor is
SLDT r/m16 0F 00 /0 Descriptor Table
Register documented as being written as 0, however observed behavior is that bits 31:24 of the descriptor table
address are written instead.[7]
Store Machine
SMSW r/m16 0F 01 /4
Status Word SLDT and SMSW (but not STR) with a 32-bit register argument are documented to set the top 16 bits
of the specified register to an undefined value on Intel CPUs.
STR r/m16 0F 00 /1 Store Task Register
Verify a segment for On some Intel CPU/microcode combinations from 2019 onwards, the VERW instruction also flushes
VERW r/m16 0F 00 /5
writing microarchitectural data buffers. This enables it to be used as part of workarounds for Microarchitectural
Data Sampling security vulnerabilities.[8][9]
LOADALL 0F 05 Load all CPU Undocumented, 80286 only. (A different variant of LOADALL with a different opcode and memory layout exists on 80386.)
registers, including
https://en.wikipedia.org/wiki/X86_instruction_listings 5/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
internal ones such
as GDT
BTR Bit test and reset Instructions atomic only if LOCK prefix present.
BTS Bit test and set
Sign-extends EAX into EDX, forming the quad-word EDX:EAX. Since (I)DIV uses EDX:EAX as its input, CDQ must
CDQ Convert double-word to quad-word
be called after setting EAX if EDX is not manually initialized (as in 64/32 division) before (I)DIV.
Compares ES:[(E)DI] with DS:[(E)SI] and increments or decrements both (E)DI and (E)SI, depending on DF; can be
CMPSD Compare string double-word
prefixed with REP
CWDE Convert word to double-word Unlike CWD, CWDE sign-extends AX to EAX instead of AX to DX:AX
Two-operand form of IMUL: Signed and Allows to multiply two registers directly, storing the partial (truncated) lower bit result. Since the lower half is the same
IMUL
Unsigned for unsigned and signed multiplication, this version of the instruction can be used for unsigned multiplication as well
INSD Input from port to string double-word *(long)ES:EDI±± = port[DX]; (±± depends on DF, ES: cannot be overridden). Can be prefixed with REP.
Jxx (near) Jump conditionally Conditional near jump instructions for all 8086 Jxx short jump instructions
JECXZ Jump if ECX is zero
LODSD Load string double-word EAX = *DS:(E)SI±±; (±± depends on DF, DS: can be overridden); can be prefixed with REP
LOOPW,
Loop, conditional loop Same as LOOP, LOOPcc for earlier processors
LOOPccW
LOOPD,
Loop while equal if (cc && --ECX) goto lbl;, cc = Z(ero), E(qual), NonZero, N(on)E(qual)
LOOPccD
MOV to/from
Move to/from special registers CR=control registers, DR=debug registers, TR=test registers (up to 80486)
CR/DR/TR
MOVSD Move string double-word *(dword*)ES:EDI±± = *(dword*)ESI±±; (±± depends on DF); can be prefixed with REP
MOVSX Move with sign-extension (long)r = (signed char) r/m; and similar
MOVZX Move with zero-extension (long)r = (unsigned char) r/m; and similar
OUTSD Output to port from string double-word port[DX] = *(long*)DS:ESI±±; (±± depends on DF, DS: can be overridden); can be prefixed with REP.
Pop all double-word (32-bit) registers from
POPAD Does not pop register ESP off of stack
stack
SCASD Scan string data double-word Compares ES:[(E)DI] with EAX and increments or decrements (E)DI, depending on DF; can be prefixed with REP
(SETA, SETAE, SETB, SETBE, SETC, SETE, SETG, SETGE, SETL, SETLE, SETNA, SETNAE, SETNB, SETNBE,
SETcc Set byte to one on condition, zero otherwise SETNC, SETNE, SETNG, SETNGE, SETNL, SETNLE, SETNO, SETNP, SETNS, SETNZ, SETO, SETP, SETPE,
SETPO, SETS, SETZ)
SHLD Shift left double r1 = r1<<CL ∣ r2>>(register_width - CL); Instead of CL, 8-bit immediate can be used.
SHLD and SHRD with 16-bit arguments and a shift-amount greater than 16 produce undefined
SHRD Shift right double
results. (Actual results differ between different Intel CPUs, with at least three different behaviors
known.[10])
STOSD Store string double-word *ES:EDI±± = EAX; (±± depends on DF, ES cannot be overridden); can be prefixed with REP
Used by software mainly for detection of the buggy[11] B0 stepping of the 80386. Microsoft
XBTS Extract Bit String
Windows (v2.01 and later) will attempt to run the XBTS instruction as part of its CPU detection if
CPUID is not present, and will refuse to boot if XBTS is found to be working.[12]
https://en.wikipedia.org/wiki/X86_instruction_listings 6/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
Compared to earlier sets, the 80386 instruction set also adds opcodes with different parameter combinations for the following instructions: BOUND,
IMUL, LDS, LES, MOV, POP, PUSH and prefix opcodes for FS and GS segment overrides.
r = r<<24 | r<<8&0x00FF0000 | r>>8&0x0000FF00 | r>>24; Only defined for 32-bit registers. Usually used to
BSWAP r32 0F C8+r Byte Swap change between little endian and big endian representations. When used with 16-bit registers produces various different
results on 486,[13] 586, and Bochs/QEMU.[14]
Invalidate Internal
INVD 0F 08 Flush internal caches. Modified data present in the cache are not written back to memory, potentially causing data loss.
Caches
Invalidate TLB
INVLPG m8 0F 01 /7 Invalidate TLB Entry for page that contains data specified.
Entry
Write Back and Writes back all modified cache lines in the processor's internal cache to main memory and invalidates the internal
WBINVD 0F 09
Invalidate Cache caches.
XADD r/m,r8 0F C0 /r Exchanges the first operand with the second operand, then loads the sum of the two values into the destination operand.
eXchange and
ADD Instruction atomic only if used with LOCK prefix.
XADD r/m,r16/32 0F C1 /r
Integer/system instructions that were not present in the basic 80486 instruction set, but were added in processors prior to the introduction of SSE.
(Discontinued instructions are not included.)
https://en.wikipedia.org/wiki/X86_instruction_listings 7/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
Read Model-specific register. The MSR to read is specified in ECX. The value of the MSR is then
RDMSR 0F 32
returned as a 64-bit value in EDX:EAX. IBM 386SLC,[18]
Intel Pentium,
Write Model-specific register. The MSR to write is specified in ECX, and the data to write is given in
0 AMD K5,
EDX:EAX.[a] Cyrix 6x86MX,MediaGXm,
WRMSR 0F 30 IDT WinChip C6,
Instruction is, with some exceptions, serializing.[b] Transmeta Crusoe
Intel Pentium,
Compare and Exchange 8 bytes. Compares EDX:EAX with m64. If equal, set ZF and load ECX:EBX into
AMD K5,
m64. Else, clear ZF and load m64 into EDX:EAX.
Cyrix 6x86L,MediaGXm,
CMPXCHG8B m64 0F C7 /1 3 IDT WinChip C6,[h]
Instruction atomic only if used with LOCK prefix.[g] Transmeta Crusoe,[h]
Rise mP6[h]
UD2 0F 0B This instruction is provided for software testing to explicitly generate an invalid opcode. (3) (80186),[p]
Intel Pentium Pro
The opcode for this instruction is reserved for this purpose.
Other than AMD K7/K8, broadly unsupported in non-Intel processors released before Intel Pentium Pro,[r]
NOP r/m NFx 0F 1F /0 3
AMD K7, x86-64[s]
2006.[q][28]
a. On Intel and AMD CPUs, the WRMSR instruction is also used to update
the CPU microcode. This is done by writing the virtual address of the
new microcode to upload to MSR 79h on Intel CPUs and MSR
C001_0020h on AMD CPUs.
https://en.wikipedia.org/wiki/X86_instruction_listings 8/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
b. Writes to the following MSRs are not serializing:[19] n. The condition codes supported for CMOVcc instruction (opcode 0F 4x
/r, with the x nibble specifying the condition) are:
Number Name
x cc Condition (EFLAGS)
48h SPEC_CTRL
0 O OF=1: "Overflow"
49h PRED_CMD
1 NO OF=0: "Not Overflow"
122h TSX_CTRL
2 C,B,NAE CF=1: "Carry", "Below", "Not Above or Equal"
6E0h TSC_DEADLINE
3 NC,NB,AE CF=0: "Not Carry", "Not Below", "Above or Equal"
6E1h PKRS
4 Z,E ZF=1: "Zero", "Equal"
774h HWP_REQUEST
5 NZ,NE ZF=0: "Not Zero", "Not Equal"
802h to 83Fh (x2APIC MSRs)
6 NA,BE (CF=1 or ZF=1): "Not Above", "Below or Equal"
c. System Management Mode and the RSM instruction were made 7 A,NBE (CF=0 and ZF=0): "Above", "Not Below or Equal"
available on non-SL variants of the Intel 486 only after the initial release
of the Intel Pentium in 1993. 8 S SF=1: "Sign"
d. CPUID is also available on some Intel and AMD 486 processor variants 9 NS SF=0: "Not Sign"
that were released after the initial release of the Intel Pentium.
A P,PE PF=1: "Parity", "Parity Even"
e. On the Cyrix 5x86 and 6x86 CPUs, CPUID is not enabled by default and
must be enabled through a Cyrix configuration register. B NP,PO PF=0: "Not Parity", "Parity Odd"
f. On NexGen CPUs, CPUID is only supported with some system BIOSes.
On some NexGen CPUs that do support CPUID, EFLAGS.ID is not C L,NGE SF≠OF: "Less", "Not Greater Or Equal"
supported but EFLAGS.AC is, complicating CPU detection.[22] D NL,GE SF=OF: "Not Less", "Greater Or Equal"
g. LOCK CMPXCHG8B with a register operand (which is an invalid encoding)
E LE,NG (ZF=1 or SF≠OF): "Less or Equal", "Not Greater"
can cause hangs on some Intel Pentium CPUs (Pentium F00F bug).
h. On IDT WinChip, Transmeta Crusoe and Rise mP6 processors, the F NLE,GE (ZF=0 and SF=OF): "Not Less or Equal", "Greater"
CMPXCHG8B instruction is always supported, however its CPUID bit may
be missing. This is a workaround for a bug in Windows NT.[23] o. For CMOVcc with a memory source operand, the CPU will always read
i. The RDTSC and RDPMC instructions are not ordered with respect to other the operand from memory (potentially causing memory exceptions and
instructions, and may sample their respective counters before earlier cache line-fills) even if the condition for the move is not satisfied.
instructions are executed or after later instructions have executed. p. UD2 will cause an #UD exception on all x86 processors from the 80186
Invocations of RDPMC (but not RDTSC) may be reordered relative to each onwards, but is only documented to do so from Pentium Pro onwards.
other even for reads of the same counter. q. Unlike other instructions added in Pentium Pro, long NOP does not
have a CPUID feature bit.
In order to impose ordering with respect to other instructions, LFENCE or
serializing instructions (e.g. CPUID) are needed.[24] r. 0F 1F /0 as long-NOP was introduced in the Pentium Pro, but
remained undocumented until 2006.[29] The whole 0F 18..1F opcode
j. Fixed-rate TSC was introduced in two stages: range was NOP in Pentium Pro. However, except for 0F 1F /0, Intel
does not guarantee that these opcodes will remain NOP in future
Constant TSC processors, and have indeed assigned some of these opcodes to other
TSC running at a fixed rate as long as the processor core is not in instructions in at least some processors.[30]
a deep-sleep (C2 or deeper) mode, but not synchronized between s. Documented for AMD x86-64 since 2002.[31]
CPU cores. Introduced in Intel Prescott, Yonah and Bonnell. Also
t. On K6, the SYSCALL/SYSRET instructions were available on Model 7
present in all Transmeta and VIA Nano[25] CPUs. Does not have a
(250nm "Little Foot") and later, not on the earlier Model 6.[32]
CPUID bit.
u. SYSCALL and SYSRET were made an integral part of x86-64 - as a result,
Invariant TSC the instructions are available in 64-bit mode on all x86-64 processors
TSC running at a fixed rate synchronized between CPU cores in from AMD, Intel, VIA and Zhaoxin (except Xeon Phi "Knights Corner").
all P-,C- and T-states (but not necessarily S-states).
Outside 64-bit mode, the instructions are available on AMD processors
Present in AMD K10 and later; Intel Nehalem/Saltwell[26] and later; only.
Zhaoxin LuJiaZui[27]. Indicated with a CPUID bit (leaf
8000_0007:EDX[8]). v. The exact semantics of SYSRET differs slightly between AMD and Intel
processors - this has been known to cause security issues.[33]
k. RDTSC can be run outside Ring 0 only if CR4.TSD=0.
w. For the SYSRET and SYSEXIT instructions under x86-64, it is necessary
l. RDPMC can be run outside Ring 0 only if CR4.PCE=1. to add the REX.W prefix for variants that will return to 64-bit user-mode
m. The RDPMC instruction is not present in VIA processors prior to the Nano. code.
Third party testing indicates that the opcodes are present on the
Pentium Pro but too buggy to be usable.[35]
z. On AMD CPUs, the SYSENTER and SYSEXIT are not available in x86-64
long mode (#UD).
https://en.wikipedia.org/wiki/X86_instruction_listings 9/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
aa. On Transmeta CPUs, the SYSENTER and SYSEXIT instructions are only ab. On Nehemiah, SYSENTER and SYSEXIT are available only on stepping 8
available with version 4.2 or higher of the Transmeta Code Morphing and later.[38]
software.[37]
These instructions can only be encoded in 64 bit mode. They fall in four groups:
original instructions that reuse existing opcodes for a different purpose (MOVSXD replacing ARPL)
original instructions with new opcodes (SWAPGS)
existing instructions extended to a 64 bit address size (JRCXZ)
existing instructions extended to a 64 bit operand size (remaining instructions)
Most instructions with a 64 bit operand size encode this using a REX.W prefix; in the absence of the REX.W prefix, the corresponding instruction with 32 bit
operand size is encoded. This mechanism also applies to most other instructions with 32 bit operand size. These are not listed here as they do not gain a
new mnemonic in Intel syntax when used with a 64 bit operand size.
CMPXCHG16B m128[a][b] REX.W 0F C7 /1
Atomic only if used with LOCK prefix.
For this reason, CMPXCHG16B has its own CPUID flag, separate from the rest of x86-64.
c. Encodings of MOVSXD without REX.W prefix are permitted but discouraged[42] - such encodings behave identically to 16/32-bit MOV (8B /r).
Bit manipulation instructions. For all of the VEX-encoded instructions defined by BMI1 and BMI2, the operand size may be 32 or 64 bits, controlled by
the VEX.W bit - none of these instructions are available in 16-bit variants.
https://en.wikipedia.org/wiki/X86_instruction_listings 10/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
POPCNT r16,r/m16
F3 0F B8 /r
POPCNT r32,r/m32 Population Count. Counts the number of bits that are set to 1 in its source argument.
POPCNT r64,r/m64 F3 REX.W 0F B8 /r K10,
ABM (LZCNT)[a] Bobcat,
[b] Haswell,
Advanced Bit LZCNT r16,r/m16 Count Leading zeroes.
Manipulation F3 0F BD /r ZhangJiang,
LZCNT r32,r/m32
If source operand is all-0s, then LZCNT will return operand size in bits Gracemont
(16/32/64) and set CF=1.
LZCNT r64,r/m64 F3 REX.W 0F BD /r
Generate a bitmask of all-1s bits up to the lowest bit position with a 1 in the source
argument. Returns all-1s if source argument is 0. Equivalent to
BLSMSK reg,r/m VEX.LZ.0F38 F3 /2
dst = (src-1) XOR src
Copy all bits of the source argument, then clear the lowest set bit. Equivalent to
BLSR reg,r/m VEX.LZ.0F38 F3 /1
dst = (src-1) AND src
Zero out high-order bits in r/m starting from the bit position specified in rb, then write
result to rd. Equivalent to
BZHI ra,r/m,rb VEX.LZ.0F38 F5 /r
ra = r/m AND NOT(-1 << rb[7:0])
Widening unsigned integer multiply without setting flags. Multiplies EDX/RDX with r/m,
MULX ra,rb,r/m VEX.LZ.F2.0F38 F6 /r then stores the low half of the multiplication result in ra and the high half in rb. If ra
and rb specify the same register, only the high half of the result is stored.
Parallel Bit Deposit. Scatters contiguous bits from rb to the bit positions set in r/m,
then stores result to ra. Operation performed is:
Haswell,
BMI2
Bit Manipulation Parallel Bit Extract. Uses r/m argument as a bit mask to select bits in rb, then Excavator,[e]
Instruction Set 2 compacts the selected bits into a contiguous bit-vector. Operation performed is: ZhangJiang,
Gracemont
ra=0; k=0; mask=r/m
PEXT ra,rb,r/m VEX.LZ.F3.0F38 F5 /r
for i=0 to opsize-1 do
if (mask[i] == 1) then
ra[k]=rb[i]; k=k+1
SARX ra,r/m,rb VEX.LZ.F3.0F38 F7 /r For SARX, SHRX and SHLX, the shift-amount specified in rb is masked to
5 bits for 32-bit operand size and 6 bits for 64-bit operand size.
a. On AMD CPUs, the "ABM" extension provides both POPCNT and LZCNT. On Intel CPUs, however, the CPUID bit for "ABM" is only documented to
indicate the presence of the LZCNT instruction and is listed as "LZCNT", while POPCNT has its own separate CPUID feature bit.
https://en.wikipedia.org/wiki/X86_instruction_listings 11/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
However, all known processors that implement the "ABM"/"LZCNT" extensions also implement POPCNT and set the CPUID feature bit for POPCNT, so
the distinction is theoretical only.
(The converse is not true - there exist processors that support POPCNT but not ABM, such as Intel Nehalem and VIA Nano 3000.)
b. The LZCNT instruction will execute as BSR on systems that do not support the LZCNT or ABM extensions. BSR computes the index of the highest set bit
in the source operand, producing a different result from LZCNT for most input values.
c. The TZCNT instruction will execute as BSF on systems that do not support the BMI1 extension. BSF produces the same result as TZCNT for all input
operand values except zero - for which TZCNT returns input operand size, but BSF produces undefined behavior (leaves destination unmodified on
most modern CPUs).
d. For BEXTR, the start position and length are not masked and can take values from 0 to 255. If the selected bits extend beyond the end of the r/m
argument (which has the usual 32/64-bit operand size), then the excess bits are read out as 0.
e. On AMD processors before Zen 3, the PEXT and PDEP instructions are quite slow[43] and exhibit data-dependent timing due to the use of a microcoded
implementation (about 18 to 300 cycles, depending on the number of bits set in the mask argment). As a result, it is often faster to use other
instruction sequences on these processors.[44][45]
Intel CET (Control-Flow Enforcement Technology) adds two distinct features to help protect against security exploits such as return-oriented
programming: a shadow stack (CET_SS), and indirect branch tracking (CET_IBT).
https://en.wikipedia.org/wiki/X86_instruction_listings 12/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
INCSSPD r32 F3 0F AE /5
Increment shadow stack pointer
INCSSPQ r64 F3 REX.W 0F AE /5
CET_SS
Shadow stack. Read shadow stack pointer into
RDSSPD r32 F3 0F 1E /1
register (low 32 bits)[a]
When shadow stacks are enabled, return addresses Read shadow stack pointer into
are pushed on both the regular stack and the RDSSPQ r64 F3 REX.W 0F 1E /1 3
register (full 64 bits)[a]
shadow stack when a function call is made. They
SAVEPREVSSP F3 0F 01 EA Save previous shadow stack pointer
are then both popped on return from the function Tiger Lake,
call - if they do not match, then the stack is assumed RSTORSSP m64 F3 0F 01 /5 Restore saved shadow stack pointer Zen 3
to be corrupted, and a #CP exception is issued. WRSSD m32,r32 0F 38 F6 /r Write 4 bytes to shadow stack
The shadow stack is additionally required to be WRSSQ m64,r64 REX.W 0F 38 F6 /r Write 8 bytes to shadow stack
stored in specially marked memory pages which WRUSSD m32,r32 66 0F 38 F5 /r Write 4 bytes to user shadow stack
cannot be modified by normal memory store
WRUSSQ m64,r64 66 REX.W 0F 38 F5 /r Write 8 bytes to user shadow stack
instructions. 0
SETSSBSY F3 0F 01 E8 Mark shadow stack busy
When IBT is enabled, an indirect branch (jump, call, Prefix used with indirect CALL/JMP near 3 Tiger Lake
instructions (opcodes FF /2 and
return) to any instruction that is not an ENDBR32/64 FF /4) to indicate that the branch
instruction will cause a #CP exception. NOTRACK 3E[c] target is not required to start with an
ENDBR32/64 instruction. Prefix only
honored when NO_TRACK_EN flag is
set.
a. The RDSSPD and RDSSPQ instructions act as NOPs on processors where shadow stacks are disabled or CET is not supported.
b. ENDBR32 and ENDBR64 act as NOPs on processors that don't support CET_IBT or where IBT is disabled.
https://en.wikipedia.org/wiki/X86_instruction_listings 13/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
c. This prefix has the same encoding as the DS: segment override prefix - as of April 2022, Intel documentation does not appear to specify whether this
prefix also retains its old segment-override function when used as a no-track prefix, nor does it provide an official mnemonic for this prefix.[46][47] (GNU
binutils use "notrack"[48])
Instruction
Instruction Set Extension Opcode Instruction description Ring Added in
mnemonics
PAUSE F3 90
Intended for use in spinlocks.[h]
Prescott,
MONITOR[j] Yonah,
Monitor a memory location for Wait for a write to a monitored memory location 0[m] Bonnell,
memory writes. previously specified with MONITOR.[n] K10,
Nano
MWAIT[k] NP 0F 01 C9
MWAIT EAX,ECX
EAX and ECX are used to provide extra
extension and hint flags.
SMX
Safer Mode Extensions.
Conroe/Merom,
Load, authenticate and execute Perform an SMX function. The function to perform
GETSEC NP 0F 37 0 WuDaoKou,[58]
a digitally signed "Authenticated is given in EAX.[o]
Tremont
Code Module" as part of Intel
Trusted Execution Technology.
https://en.wikipedia.org/wiki/X86_instruction_listings 14/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
RDFSBASE r32 F3 0F AE /0
Read base address of FS: segment.
RDFSBASE r64 F3 REX.W 0F AE /0
MOVBE r16,m16
NFx 0F 38 F0 /r Load from memory to register with byte-order
MOVBE r32,m32
swap. Bonnell,
MOVBE MOVBE r64,m64 NFx REX.W 0F 38 F0 /r Haswell,
Move to/from memory with byte order 3 Jaguar,
swap. MOVBE m16,r16 Steamroller,
NFx 0F 38 F1 /r Store to memory from register with byte-order
MOVBE m32,r32 ZhangJiang
swap.
MOVBE m64,r64 NFx REX.W 0F 38 F1 /r
K6-2,
PREFETCHW m8 0F 0D /1 Prefetch cache line with intent to write.[b]
PREFETCHW[w] (Cedar Mill),[x]
Cache-line prefetch with intent to 3 Silvermont,
write. Broadwell,
PREFETCH m8[y] 0F 0D /0 Prefetch cache line.[b]
ZhangJiang
XSAVEC Skylake,
XSAVEC mem NP 0F C7 /4 Save processor extended state components
Processor Extended State 3 Goldmont,
XSAVEC64 mem NP REX.W 0F C7 /4 specified by EDX:EAX to memory with compaction.
save/restore with compaction. Zen 1
RDPKRU NP 0F 01 EE Read User Page Key register into EAX. Comet Lake,
PKU
3 Gracemont,
Protection Keys for user pages.
WRPKRU NP 0F 01 EF Write data from EAX into User Page Key Register. Zen 3
Goldmont Plus,
RDPID
Read processor core ID.
RDPID r32 F3 0F C7 /7 Read processor core ID into register.[q] 3[z] Zen 2,
Ice Lake
Zen 2,
CLWB Write one cache line back to memory without
CLWB m8 NFx 66 0F AE /6 3 Tiger Lake,
Cache Line Writeback to memory. invalidating the cache line.
Tremont
WBNOINVD
Write back all dirty cache lines to memory without Zen 2,
Whole Cache Writeback without WBNOINVD F3 0F 09 0
invalidate. invalidation.[aa] Ice Lake-SP
a. On AMD Athlon processors prior to the Athlon XP, the non-SIMD e. The LFENCE instruction ensures that all memory loads after the LFENCE
instructions of SSE were introduced as part of "MMX Extensions".[49] instruction are made globally observable after all memory loads before
b. All of the PREFETCH* instructions are hint instructions with effects only the LFENCE.
on performance, not program semantics. Providing an invalid address
On all Intel CPUs that support SSE2, the LFENCE instruction provides a
(e.g. address of an unmapped page or a non-canonical address) will
cause the instruction to act as a NOP without any exceptions generated. stronger ordering guarantee:[52] it is dispatch-serializing, meaning that
instructions after the LFENCE instruction are allowed to start executing
c. For the SFENCE, LFENCE and MFENCE instructions, the bottom 3 bits of
only after all instructions before it have retired (which will ensure that all
the ModR/M byte are ignored, and any value of x in the range 0..7 will
preceding loads but not necessarily stores have completed). The effect
result in a valid instruction.
of dispatch-serialization is that LFENCE also acts as a speculation barrier
d. The SFENCE instruction ensures that all memory stores after the SFENCE and a reordering barrier for accesses to non-memory resources such as
instruction are made globally observable after all memory stores before performance counters (accessed through e.g. RDTSC or RDPMC) and
the SFENCE. This imposes ordering on stores that can otherwise be
x2apic MSRs.
reordered, such as non-temporal stores and stores to WC (Write-
Combining) memory regions.[50] On AMD CPUs, LFENCE is not necessarily dispatch-serializing by default
- however, on all AMD CPUs that support any form of non-dispatch-
On Intel CPUs, as well as AMD CPUs from Zen1 onwards (but not older serializing LFENCE, it can be made dispatch-serializing by setting bit 1 of
AMD CPUs), SFENCE also acts as a reordering barrier on cache
MSR C001_1029.[53]
flushes/writebacks performed with the CLFLUSH, CLFLUSHOPT and CLWB
instructions. (Older AMD CPUs require MFENCE to order CLFLUSH.) f. The MFENCE instruction ensures that all memory loads, stores and
cacheline-flushes after the MFENCE instruction are made globally
SFENCE is not ordered with respect to LFENCE, and an SFENCE+LFENCE
observable after all memory loads, stores and cacheline-flushes before
sequence is not sufficient to prevent a load from being reordered past a
the MFENCE.
previous store.[51] To prevent such reordering, it is necessary to execute
an MFENCE, LOCK or a serializing instruction. On Intel CPUs, MFENCE is not dispatch-serializing, and therefore cannot
be used to enforce ordering on accesses to non-memory resources
such as performance counters and x2apic MSRs. MFENCE is still ordered
with respect to LFENCE, so if a memory barrier with dispatch serialization
is needed, then it can be obtained by issuing an MFENCE followed by an
LFENCE.[24]
https://en.wikipedia.org/wiki/X86_instruction_listings 16/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
i. While the CLFLUSH instruction was introduced together with SSE2, it has r. Unlike the older RDTSC instruction, RDTSCP will delay the TSC read until
its own CPUID flag and may be present on processors not otherwise all previous instructions have retired, guaranteeing ordering with respect
implementing SSE2 and/or absent from processors that otherwise to preceding memory loads (but not stores). RDTSCP is not ordered with
implement SSE2. (E.g. AMD Geode LX supports CLFLUSH but not respect to subsequent instructions, though.
SSE2.) s. RDTSCP can be run outside Ring 0 only if CR4.TSD=0.
j. While the MONITOR and MWAIT instructions were introduced at the same t. While the POPCNT instruction was introduced at the same time as
time as SSE3, they have their own CPUID flag that needs to be SSE4.2, it is not considered to be a part of SSE4.2, but instead a
checked separately from the SSE3 CPUID flag (e.g. Athlon64 X2 and separate extension with its own CPUID flag.
VIA C7 supported SSE3 but not MONITOR.)
k. For the MONITOR and MWAIT instructions, older Intel documentation[54] On AMD processors, it is considered to be a part of the ABM extension,
lists instruction mnemonics with explicit operands (MONITOR but still has its own CPUID flag.
EAX,ECX,EDX and MWAIT EAX,ECX), while newer documentation omits
these operands. Assemblers/disassemblers may support one or both of u. The invalidation types defined for INVPCID (selected by register
these variants.[55] argument) are:
l. For MONITOR, the DS: segment can be overridden with a segment prefix.
Value Function
The memory area that will be monitored will be not just the single byte Invalidate TLB entries matching PCID and virtual memory address in
specified by DS:rAX, but a linear memory region containing the byte - 0
descriptor, excluding global entries
the size and alignment of this memory region is implementation-
dependent and can be queried through CPUID. Invalidate TLB entries matching PCID in descriptor, excluding global
1
entries
The memory location to monitor should have memory type WB (write-
2 Invalidate all TLB entries, including global entries
back cacheable), or else monitoring may fail.
3 Invalidate all TLB entries, excluding global entries
m. On some processors, such as Intel Xeon Phi x200[56] and AMD K10[57]
and later, there exist documented MSRs that can be used to enable Any other value in register argument causes a #GP exception.
MONITOR and MWAIT to run in Ring 3.
n. The wait performed by MWAITmay be ended by system events other v. Unlike the older INVLPG instruction, INVPCID will cause a #GP
than a memory write (e.g. cacheline evictions, interrupts) - the exact set exception if the provided memory address is non-canonical. This
of events that can cause the wait to end is implementation-specific. discrepancy has been known to cause security issues.[59]
w. The PREFETCH and PREFETCHW instructions are mandatory parts of the
Regardless of whether the wait was ended by a memory write or some
3DNow! instruction set extension, but are also available as a standalone
other event, monitoring will have ended and it will be necessary to set
extension on systems that do not support 3DNow!
up monitoring again with MONITOR before performing any further
MWAITs. x. The opcodes for PREFETCH and PREFETCHW (0F 0D /r) execute as
NOPs on Intel CPUs from Cedar Mill (65nm Pentium 4) onwards, with
PREFETCHW gaining prefetch functionality from Broadwell onwards.
o. The leaf functions defined for GETSEC (selected by EAX) are:
y. The PREFETCH (0F 0D /0) instruction is a 3DNow! instruction, present
EAX Function on all processors with 3DNow! but not necessarily on processors with
the PREFETCHW extension.
0 (CAPABILITIES) Report SMX capabilities
On AMD CPUs with PREFETCHW, opcode 0F 0D /0 as well as
2 (ENTERACCES) Enter execution of authenticated code module
opcodes 0F 0D /2../7 are all documented to be performing prefetch.
3 (ENTERACCES) Exit execution of authenticated code module
On Intel processors with PREFETCHW, these opcodes are documented
4 (SENTER) Enter measured environment as performing reserved-NOPs[60] (except 0F 0D /2 being
5 (SENTER) Exit measured environment PREFETCHWT1 m8 on Xeon Phi only) - third party testing[61] indicates that
some or all of these opcodes may be performing prefetch on at least
6 (PARAMETERS) Report SMX parameters some Intel Core CPUs.
7 (SMCTRL) SMX Mode Control
z. Unlike the older RDTSCP instruction which can also be used to read the
8 (WAKEUP) Wake up sleeping processors in measured environment processor ID, user-mode RDPID is not disabled by CR4.TSD=1.
aa. The WBNOINVD instruction will execute as WBINVD if run on a system that
Any other value in EAX causes an #UD exception. doesn't support the WBNOINVD extension.
p. XSAVE was added in steppings E0/R0 of Penryn and is not available in WBINVD differs from WBNOINVD in that WBINVD will invalidate all cache
earlier steppings. lines after writeback.
q. The "core ID" value read by RDTSCP and RDPID is actually the TSC_AUX
MSR (MSR C000_0103h). Whether this value actually corresponds to a
processor ID is a matter of operating system convention.
Instruction
Instruction Set Extension Opcode Instruction description Ring Added in
mnemonics
https://en.wikipedia.org/wiki/X86_instruction_listings 17/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
PTWRITE
PTWRITE r/m32 F3 0F AE /4 Read data from register or memory to encode into Kaby Lake,
Write data to a Processor Trace 3
Packet.
PTWRITE r/m64 F3 REX.W 0F AE /4 a PTW packet.[f] Goldmont Plus
WAITPKG
Start monitoring a memory location for memory Tremont,
User-mode memory monitoring 3[k] Alder Lake
and waiting. UMONITOR r16/32/64 F3 0F AE /6 writes. The memory address to monitor is given by
the register argument.[l]
Timed wait for a write to a monitored memory
location previously specified with UMONITOR. In the
UMWAIT r32 absence of a memory write, the wait will end when
F2 0F AE /6 either the TSC reaches the value specified by
UMWAIT r32,EAX,EDX
EDX:EAX or the wait has been going on for an
OS-controlled maximum amount of time.[m]
SERIALIZE
Instruction Execution SERIALIZE NP 0F 01 E8 Serialize instruction fetch and execution.[n] 3 Alder Lake
Serialization.
https://en.wikipedia.org/wiki/X86_instruction_listings 18/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
a. The leaf functions defined for ENCLS (selected by EAX) are: e. The leaf functions defined for ENCLV (selected by EAX) are:
C (ETRACK) Activate EBLOCK checks The operand size for the register argument is given by the address size,
which may be overridden by the 67h prefix.
Added with SGX2
D (EAUG) Add page to initialized enclave The 64-byte memory source argument does not need to be 64-byte
aligned, and is not guaranteed to be read atomically.
E (EMODPTR) Restrict permissions of EPC page
h. The leaf functions defined for PCONFIG (selected by EAX) are:
F (EMODT) Change type of EPC page
11 (ETRACKC) Activate EBLOCK checks Program key and encryption mode to use with an
0 (MKTME_KEY_PROGRAM)
MKTME Key ID.
12 (ELDBC) Load EPC page as blocked with enhanced error reporting
13 (ELDUC) Load EPC page as unblocked with enhanced error reporting Any other value in EAX causes a #GP(0) exception.
Any unsupported value in EAX causes a #GP exception. i. For CLDEMOTE, the cache level that it will demote a cache line to is
implementation-dependent.
b. SGX is deprecated on desktop/laptop processors from 11th generation
Since the instruction is considered a hint, it will execute as a NOP
(Rocket Lake, Tiger Lake) onwards, but continues to be available on
without any exceptions if the provided memory address is invalid or not
Xeon-branded server parts.[62]
in the L1 cache. It may also execute as a NOP under other
c. The leaf functions defined for ENCLU (selected by EAX) are: implementation-dependent circumstances as well.
EAX Function On systems that do not support the CLDEMOTE extension, it executes
as a NOP.
0 (EREPORT) Create a cryptographic report
1 (EGETKEY) Create a cryptographic key j. As of May 2022, no Tremont or Alder Lake models have been observed
to have the CPUID feature bit for CLDEMOTE set, while several of them
2 (EENTER) Enter an Enclave
have the CPUID bit cleared.[64]
3 (ERESUME) Re-enter an Enclave k. TPAUSE and UMWAIT can be run outside Ring 0 only if CR4.TSD=0.
4 (EEXIT) Exit an Enclave l. For UMONITOR, the operand size of the address argument is given by the
address size, which may be overridden by the 67h prefix. The default
Added with SGX2 segment used is DS:, which can be overridden with a segment prefix.
5 (EACCEPT) Accept changes to EPC page
m. For UMWAIT, the operating system can use the IA32_UMWAIT_CONTROL
MSR to limit the maximum amount of time that a single UMWAIT
6 (EMODPE) Extend EPC page permissions invocation is permitted to wait. The UMWAIT instruction will set
RFLAGS.CF to 1 if it reached the IA32_UMWAIT_CONTROL-defined time
7 (EACCEPTCOPY) Initialize pending page
limit and 0 otherwise.
Added with TDX n. While serialization can be performed with older instructions such as e.g.
CPUID and IRET, these instructions perform additional functions,
8 (EVERIFYREPORT2) Verify a cryptographic report of a trust domain
causing side-effects and reduced performance when stand-alone
instruction serialization is needed. (CPUID additionally has the issue that
Any unsupported value in EAX causes a #GP exception. it causes a mandatory #VMEXIT when executed under virtualization,
which causes a very large overhead.) The SERIALIZE instruction
d. ENCLU can only be executed in ring 3, not rings 0/1/2. performs serialization only, avoiding these added costs.
o. The register argument to SENDUIPI is an index to pick an entry from the
UITT (User-Interrupt Target Table, a table specified by the new
UINTR_TT and UINT_MISC MSRs.)
Instruction
Instruction Set Extension Opcode Instruction description Ring Added in
mnemonics
https://en.wikipedia.org/wiki/X86_instruction_listings 19/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
Start monitoring a memory location for memory writes. Similar to older MONITOR, except
MONITORX NP 0F 01 FA
available in user mode.
MONITORX Wait for a write to a monitored memory location previously specified with MONITORX.
Monitor a memory
3 Excavator
location for writes in MWAITX differs from the older MWAIT instruction mainly in that it runs in user
user mode. MWAITX NP 0F 01 FB
mode and that it can accept an optional timeout argument (given in TSC
time units) in EBX (enabled by setting bit[1] of ECX to 1.)
CLZERO Write zeroes to all bytes in a memory region that has the size and alignment of a CPU
CLZERO rAX NP 0F 01 FC 3 Zen 1
Zero out full cache line. cache line and contains the byte addressed by DS:rAX.[d]
Read selected MSRs (mainly performance counters) in user mode. ECX specifies which
RDPRU register to read.[e]
Read processor register RDPRU NP 0F 01 FD 3[f] Zen 2
in user mode. The value of the MSR is returned in EDX:EAX.
Ensure that all preceding stores in thread have been committed to memory, and that any
errors encountered by these stores have been signalled to any associated error logging
MCOMMIT resources. The set of errors that can be reported and the logging mechanism are platform-
Commit Stores To MCOMMIT F3 0F 01 FA specific. 3 Zen 2
Memory.
Sets EFLAGS.CF to 0 if any errors occurred, 1 otherwise.
Invalidate TLB Entries for a range of pages, with broadcast. The invalidation is performed
on the processor executing the instruction, and also broadcast to all other processors in the
system.
INVLPGB NP 0F 01 FE rAX takes the virtual address to invalidate and some additional flags, ECX
takes the number of pages to invalidate, and EDX specifies ASID and PCID
INVLPGB to perform TLB invalidation for.
Invalidate TLB Entries 0 Zen 3
with broadcast.
Synchronize TLB invalidations.
a. The standard way to access the CR8 register is to use an encoding that makes use of the REX.R prefix, e.g. 44 0F 20 07 (MOV RDI,CR8). However,
the REX.R prefix is only available in 64-bit mode.
The AltMovCr8 extension adds an additional method to access CR8, using the F0 (LOCK) prefix instead of REX.R - this provides access to CR8 outside
64-bit mode.
b. Like other variants of MOV to/from the CRx registers, the AltMovCr8 encodings ignore the top 2 bits of the instruction's ModR/M byte, and always
execute as if these two bits are set to 11b.
The AltMovCr8 encodings are available in 64-bit mode. However, using both the LOCK prefix and the REX.R is not permitted and will cause an #UD
exception.
c. Support for AltMovCR8 was added in stepping F of the AMD K8, and is not available on earlier steppings.
d. For CLZERO, the address size and 67h prefix control whether to use AX, EAX or RAX as address. The default segment DS: can be overridden by a
segment-override prefix. The provided address does not need to be aligned - hardware will align it as necessary.
The CLZERO instruction is intended for recovery from otherwise-fatal Machine Check errors. It is non-cacheable, cannot be used to allocate a cache
line without a memory access, and should not be used for fast memory clears.[65]
e. The register numbering used by RDPRU does not necessarily match that of RDMSR/WRMSR.
ECX Register
https://en.wikipedia.org/wiki/X86_instruction_listings 20/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
f. If CR4.TSD=1, then the RDPRU instruction can only run in ring 0.
https://en.wikipedia.org/wiki/X86_instruction_listings 21/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
(precision control, to control whether floating-point operations should be rounded to 24, 53 or 64 mantissa bits) and "RC" (rounding control, to pick
rounding-mode: round-to-zero, round-to-positive-infinity, round-to-negative-infinity, round-to-nearest-even) and a 4-bit condition code register "CC",
whose four bits are individually referred to as C0,C1,C2 and C3). Not all of the arithmetic instructions provided by x87 obey PC and RC.
Waiting
x87 Non-Waiting[a] FPU Control Instructions
mnemonic[b]
Initialize x87 FPU FNINIT DB E3 FINIT
Save x87 FPU State, then initialize x87 FPU [c] DD /6 FSAVE
FNSAVE m752/m864
precision rounding
x87 Floating-point Load/Store/Move Instructions
control control
FLD m32 D9 /0
FLD m64 DD /0
Load floating-point value onto stack No —
FLD m80 DB /5
FLD st(i) D9 C0+i
FST m32 D9 /2
No Yes
Store top-of-stack floating-point value to memory or stack register FST m64 DD /2
FSTP m32 D9 /3
No Yes
FSTP m64 DD /3
FSTP m80[e] DB /7
Store top-of-stack floating-point value to memory or stack register, then pop
DD D8+i
No —
FSTP st(i)[e][f] DF D0+i[g]
DF D8+i[g]
Push +0.0 onto stack FLDZ D9 EE
No —
Push +1.0 onto stack FLD1 D9 E8
D9 C8+i
Exchange top-of-stack register with other stack register FXCH st(i) [i][j] DD C8+i[g] No —
DF C8+i[g]
precision rounding
x87 Integer Load/Store Instructions
control control
FILD m16 DF /0
Load signed integer value onto stack from memory, with conversion to floating-point FILD m32 DB /0 No —
FILD m64 DF /5
FIST m16 DF /2
Store top-of-stack value to memory, with conversion to signed integer No Yes
FIST m32 DB /2
FISTP m16 DF /3
Store top-of-stack value to memory, with conversion to signed integer, then pop stack FISTP m32 DB /3 No Yes
FISTP m64 DF /7
https://en.wikipedia.org/wiki/X86_instruction_listings 22/53
3/14/23, 12:35 AM x86 instruction listings - Wikipedia
Load 18-digit Binary-Coded-Decimal integer value onto stack from memory, with conversion to floating-point FBLD m80[k] DF /4 No —
Store top-of-stack value to memory, with conversion to 18-digit Binary-Coded-Decimal integer, then pop stack FBSTP m80 DF /6 No 387[h]
precision rounding
x87 Basic Arithmetic Instructions
control control
FADD m32 D8 /0
Floating-point add
FADD m64 DC /0
Yes Yes
dst <- dst + src FADD st,st(i) D8 C0+i
FSUB m32 D8 /4
Floating-point subtract
FSUB m64 DC /4
Yes Yes
dst <- dst - src FSUB st,st(i) D8 E0+i
FSUBR m32 D8 /5
Floating-point reverse subtract
FSUBR m64 DC /5
Yes Yes
dst <- src - dst FSUBR st,st(i) D8 E8+i
FDIVR m32 D8 /7
Floating-point reverse divide
FDIVR m64 DC /7
Yes Yes
dst <- src / dst FDIVR st,st(i) D8 F8+i
FCOM m64 DC /2
CC <- result_of( st(0) - src )
No —
Same operation as subtract, except that it updates the x87 CC status register instead of any of the FPU D8 D0+i
stack registers FCOM st(i)[i]
DC D0+i[g]
precision rounding
x87 Basic Arithmetic Instructions with Stack Pop
control control
DE D0+i[g]
Floating-point compare to st(1), then pop twice FCOMPP DE D9 No —
precision rounding
x87 Basic Arithmetic Instructions with Integer Source Argument
control control
FIADD m16 DA /0
Floating-point add by integer Yes Yes
FIADD m32 DE /0
FIMUL m16 DA /1
Floating-point multiply by integer Yes Yes
FIMUL m32 DE /1
FISUB m16 DA /4
Floating-point subtract by integer Yes Yes
FISUB m32 DE /4
https://en.wikipedia.org/wiki/X86_instruction_listings 23/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
FISUBR m32 DE /5
FIDIV m16 DA /6
Floating-point divide by integer Yes Yes
FIDIV m32 DE /6
FIDIVR m16 DA /7
Floating-point reverse-divide by integer Yes Yes
FIDIVR m32 DE /7
FICOM m16 DA /2
Floating-point compare to integer No —
FICOM m32 DE /2
FICOMP m16 DA /3
Floating-point compare to integer, and stack pop No —
FICOMP m32 DE /3
precision rounding
x87 Additional Arithmetic Instructions
control control
Floating-point change sign FCHS D9 E0 No —
Split the st(0) value into two values E and M representing the exponent and mantissa of st(0).
The split is done such that , where E is an integer and M is a number whose absolute value is
FXTRACT D9 F4 No —
within the range . [n]
st(0) is then replaced with E, after which M is pushed onto the stack.
FPREM D9 F8 No —[p]
Floating-point power-of-2 scaling. Rounds the value of st(1) to integer with round-to-zero, then uses it as a scale
factor for st(0):[q]
FSCALE D9 FD No Yes[r]
Source operand
x87 Transcendental Instructions[s] range restriction
Base-2 Logarithm:
FYL2X[t] D9 F1 no restrictions
followed by stack pop
Partial Tangent: Computes from st(0) a pair of values X and Y, such that
8087:
FPTAN D9 F2
80387:
The Y value replaces the top-of-stack value, and then X is pushed onto the stack.
On 80387 and later x87, but not original 8087, X is always 1.0
8087:
FPATAN D9 F3
80387: no restrictions
Base-2 Logarithm plus 1, with extra precision for st(0) close to 0: Intel:
FYL2XP1[t] D9 F9 AMD:
Floating-point store and pop, without stack underflow exception FSTPNCE st(i) D9 D8+i[g]
https://en.wikipedia.org/wiki/X86_instruction_listings 24/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
a. x87 coprocessors (other than the 8087) handle exceptions in a fairly i. For the FADDP, FSUBP, FSUBRP, FMULP, FDIVP, FDIVRP, FCOM, FCOMP and
unusual way. When an x87 instruction generates an unmasked FXCH instructions, x86 assemblers/disassemblers may recognize
arithmetic exception, it will still complete without causing a CPU fault - variants of the instructions with no arguments. Such variants are
instead of causing a fault, it will record within the coprocessor equivalent to variants using st(1) as their first argument.
information needed to handle the exception (instruction pointer, opcode, j. On Intel Pentium and later processors, FXCH is implemented as a
data pointer if the instruction had a memory operand) and set FPU register renaming rather than a true data move. This has no semantic
status-word flag to indicate that a pending exception is present. This effect, but enables zero-cycle-latency operation. It also allows the
pending exception will then cause a CPU fault when the next x87, MMX instruction to break data dependencies for the x87 top-of-stack value,
or WAIT instruction is executed. improving attainable performance for code optimized for these
The exception to this is x87's "Non-Waiting" instructions, which will processors.
execute without causing such a fault even if a pending exception is
k. The result of executing the FBLD instruction on non-BCD data is
present (with some caveats, see application note AP-578[66]). These undefined.
instructions are mostly control instructions that can inspect and/or
l. On early Intel Pentium processors, floating-point divide was subject to
modify the pending-exception state of the x87 FPU.
the Pentium FDIV bug. This also affected instructions that perform
b. For each non-waiting x87 instruction whose mnemonic begins with FN,
divide as part of their operations, such as FPREM and FPATAN.[75]
there exists a pseudo-instruction that has the same mnemonic except
without the N. These pseudo-instructions consist of a WAIT instruction m. The FXAM instruction will set C0, C2 and C3 based on value type in st(0)
(opcode 9B) followed by the corresponding non-waiting x87 instruction. as follows:
For example:
C3 C2 C0 Classification
FNCLEX is an instruction with the opcode DB E2. The corresponding
0 0 0 Unsupported (unnormal or pseudo-NaN)
pseudo-instruction FCLEX is then encoded as 9B DB E2.
FNSAVE ES:[BX+6] is an instruction with the opcode 26 DD 77 06. 0 0 1 NaN
The corresponding pseudo-instruction FSAVE ES:[BX+6] is then
encoded as 9B 26 DD 77 06 0 1 0 Normal finite number
https://en.wikipedia.org/wiki/X86_instruction_listings 25/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
u. For FPATAN, the following adjustments are done as compared to just v. While FNOP is a no-op in the sense that will leave the x87 FPU register
stack unmodified, it may still modify FIP and CC, and it may fault if a
computing a one-argument arctangent of the ratio : pending x87 FPU exception is present.
If both st(0) and st(1) are ±∞, then the arctangent is computed as if
each of st(0) and st(1) had been replaced with ±1 of the same sign.
This produces a result that is an odd multiple of .
If both st(0) and st(1) are ±0, then the arctangent is computed as if
st(0) but not st(1) had been replaced with ±1 of the same sign,
producing a result of ±0 or .
If st(0) is negative (has sign bit set), then an addend of with the
same sign as st(1) is added to the result.
Waiting
x87 Non-Waiting Control Instructions added in 80287
mnemonic
Source operand
x87 Instructions added in 80387
range restriction
Floating-point sine.[d]
FSIN D9 FE
Floating-point cosine.[d]
FCOS D9 FF
Condition for
x87 Instructions added in Pentium Pro
conditional moves
Floating-point unordered compare and set EFLAGS, then pop FUCOMIP st(0),st(i) DF E8+i
64-bit mnemonic
x87 Non-Waiting Instructions added in Pentium II[f] (REX.W prefix)
Save x87, MMX and SSE state to 512-byte data structure[g][h][i] FXSAVE m512byte NP 0F AE /0 FXSAVE64 m512byte
[g][h] FXRSTOR m512byte NP 0F AE /1 FXRSTOR64 m512byte
Restore x87, MMX and SSE state from 512-byte data structure
https://en.wikipedia.org/wiki/X86_instruction_listings 26/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
FISTTP m32 DB /1
FISTTP m64 DD /1
a. The x87 FPU needs to know whether it is operating in Real Mode or Protected Mode because the floating-point environment accessed by the
F(N)SAVE, FRSTOR, FLDENV and F(N)STENV instructions has different formats in Real Mode and Protected Mode. On 80287, the F(N)SETPM instruction
is required to communicate the real-to-protected mode transition to the FPU. On 80387 and later x87 FPUs, real↔protected mode transitions are
communicated automatically to the FPU without the need for any dedicated instructions - therefore, on these FPUs, FNSETPM executes as a NOP that
does not modify any FPU state.
b. For the FUCOM and FUCOMP instructions, x86 assemblers/disassemblers may recognize variants of the instructions with no arguments. Such variants
are equivalent to variants using st(1) as their first argument.
c. The 80387 FPREM1 instruction differs from the older FPREM (D9 F8) instruction in that the quotient Q is rounded to integer with round-to-nearest-even
rounding rather than the round-to-zero rounding used by FPREM. Like FPREM, FPREM1 always computes an exact result with no roundoff errors. Like
FPREM, it may also perform a partial computation if the quotient is too large, in which case it must be run again.
d. Due to the x87 FPU performing argument reduction for sin/cos with only about 68 bits of precision, the value of k used in the calculation of FSIN, FCOS
and FSINCOS is not precisely 1.0, but instead given by[76][77]
SIMD instructions
MMX instructions
MMX instructions operate on the mm registers, which are 64 bits wide. They are shared with the FPU registers.
https://en.wikipedia.org/wiki/X86_instruction_listings 27/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
EMMS 0F 77 Empty MMX Technology State Marks all x87 FPU registers for use by FPU
PADDSW mm, mm/m64 0F ED /r Add packed signed word integers and saturate
PADDUSB mm, mm/m64 0F DC /r Add packed unsigned byte integers and saturate
PADDUSW mm, mm/m64 0F DD /r Add packed unsigned word integers and saturate
PCMPGTW mm, mm/m64 0F 65 /r Compare packed signed word integers for greater than
PCMPGTD mm, mm/m64 0F 66 /r Compare packed signed doubleword integers for greater than
PMADDWD mm, mm/m64 0F F5 /r Multiply packed words, add adjacent doubleword results
PMULHW mm, mm/m64 0F E5 /r Multiply packed signed word integers, store high 16 bits of results
PMULLW mm, mm/m64 0F D5 /r Multiply packed signed word integers, store low 16 bits of results
PSLLW mm1, imm8 0F 71 /6 ib Shift left words, shift in zeros
https://en.wikipedia.org/wiki/X86_instruction_listings 28/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
The following MMX instruction were added with SSE. They are also available on the Athlon under the name MMX+.
PMULHUW mm1, mm2/m64 0F E4 /r Multiply Packed Unsigned Integers and Store High Result
PMINSW mm1, mm2/m64 0F EA /r Minimum of Packed Signed Word Integers
PSIGNW mm1, mm2/m64 0F 38 09 /r Negate/zero/preserve packed word integers depending on corresponding sign
PSIGND mm1, mm2/m64 0F 38 0A /r Negate/zero/preserve packed doubleword integers depending on corresponding sign
PSHUFB mm1, mm2/m64 0F 38 00 /r Shuffle bytes
PMULHRSW mm1, mm2/m64 0F 38 0B /r Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits
PMADDUBSW mm1, mm2/m64 0F 38 04 /r Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words
PHSUBW mm1, mm2/m64 0F 38 05 /r Subtract and pack 16-bit signed integers horizontally
PHSUBSW mm1, mm2/m64 0F 38 07 /r Subtract and pack 16-bit signed integer horizontally with saturation
PHSUBD mm1, mm2/m64 0F 38 06 /r Subtract and pack 32-bit signed integers horizontally
PHADDSW mm1, mm2/m64 0F 38 03 /r Add and pack 16-bit signed integers horizontally, pack saturated integers to mm1.
PALIGNR mm1, mm2/m64, imm8 0F 3A 0F /r ib Concatenate destination and source operands, extract byte-aligned result shifted to the right
PABSB mm1, mm2/m64 0F 38 1C /r Compute the absolute value of bytes and store unsigned result
PABSW mm1, mm2/m64 0F 38 1D /r Compute the absolute value of 16-bit integers and store unsigned result
PABSD mm1, mm2/m64 0F 38 1E /r Compute the absolute value of 32-bit integers and store unsigned result
https://en.wikipedia.org/wiki/X86_instruction_listings 29/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
SSE instructions
SSE instructions operate on xmm registers, which are 128 bit wide.
https://en.wikipedia.org/wiki/X86_instruction_listings 30/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
ANDPS* xmm1, xmm2/m128 0F 54 /r Bitwise Logical AND of Packed Single-Precision Floating-Point Values
ANDNPS* xmm1, xmm2/m128 0F 55 /r Bitwise Logical AND NOT of Packed Single-Precision Floating-Point Values
ORPS* xmm1, xmm2/m128 0F 56 /r Bitwise Logical OR of Single-Precision Floating-Point Values
XORPS* xmm1, xmm2/m128 0F 57 /r Bitwise Logical XOR for Single-Precision Floating-Point Values
MOVHLPS xmm1, xmm2 0F 12 /r Move Packed Single-Precision Floating-Point Values High to Low
UNPCKHPS xmm1, xmm2/m128 0F 15 /r Unpack and Interleave High Packed Single-Precision Floating-Point Values
MOVNTPS m128, xmm1 0F 2B /r Move Aligned Four Packed Single-FP Non Temporal
MOVMSKPS reg, xmm 0F 50 /r Extract Packed Single-Precision Floating-Point 4-bit Sign Mask. The upper bits of the register are filled with zeros.
CVTPI2PS xmm, mm/m64 0F 2A /r Convert Packed Dword Integers to Packed Single-Precision FP Values
CVTTPS2PI mm, xmm/m64 0F 2C /r Convert with Truncation Packed Single-Precision FP Values to Packed Dword Integers
CVTTSS2SI r32, xmm/m32 F3 0F 2C /r Convert with Truncation Scalar Single-Precision FP Value to Dword Integer
CVTTSS2SI r64, xmm1/m32 F3 REX.W 0F 2C /r Convert with Truncation Scalar Single-Precision FP Value to Qword Integer
CVTPS2PI mm, xmm/m64 0F 2D /r Convert Packed Single-Precision FP Values to Packed Dword Integers
UCOMISS xmm1, xmm2/m32 0F 2E /r Unordered Compare Scalar Single-Precision Floating-Point Values and Set EFLAGS
COMISS xmm1, xmm2/m32 0F 2F /r Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS
SQRTPS xmm1, xmm2/m128 0F 51 /r Compute Square Roots of Packed Single-Precision Floating-Point Values
SQRTSS xmm1, xmm2/m32 F3 0F 51 /r Compute Square Root of Scalar Single-Precision Floating-Point Value
RSQRTPS xmm1, xmm2/m128 0F 52 /r Compute Reciprocal of Square Root of Packed Single-Precision Floating-Point Value
RSQRTSS xmm1, xmm2/m32 F3 0F 52 /r Compute Reciprocal of Square Root of Scalar Single-Precision Floating-Point Value
https://en.wikipedia.org/wiki/X86_instruction_listings 31/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
The floating point single bitwise operations ANDPS, ANDNPS, ORPS and XORPS produce the same result as the SSE2 integer (PAND, PANDN,
POR, PXOR) and double ones (ANDPD, ANDNPD, ORPD, XORPD), but can introduce extra latency for domain changes when applied values of the
wrong type.[78]
SSE2 instructions
MOVLPD m64, xmm1 66 0F 13/r Move Low Packed Double-Precision Floating-Point Value
SQRTSD xmm1,xmm2/m64 F2 0F 51/r Compute Square Root of Scalar Double-Precision Floating-Point Value
SUBPD xmm1, xmm2/m128 66 0F 5C /r Subtract Packed Double-Precision Floating-Point Values
ANDPD xmm1, xmm2/m128 66 0F 54 /r Bitwise Logical AND of Packed Double Precision Floating-Point Values
ANDNPD xmm1, xmm2/m128 66 0F 55 /r Bitwise Logical AND NOT of Packed Double Precision Floating-Point Values
ORPD xmm1, xmm2/m128 66 0F 56/r Bitwise Logical OR of Packed Double Precision Floating-Point Values
XORPD xmm1, xmm2/m128 66 0F 57/r Bitwise Logical XOR of Packed Double Precision Floating-Point Values
COMISD xmm1, xmm2/m64 66 0F 2F /r Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS
https://en.wikipedia.org/wiki/X86_instruction_listings 32/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
UCOMISD xmm1, xmm2/m64 66 0F 2E /r Unordered Compare Scalar Double-Precision Floating-Point Values and Set EFLAGS
SHUFPD xmm1, xmm2/m128, imm8 66 0F C6 /r ib Packed Interleave Shuffle of Pairs of Double-Precision Floating-Point Values
UNPCKHPD xmm1, xmm2/m128 66 0F 15 /r Unpack and Interleave High Packed Double-Precision Floating-Point Values
UNPCKLPD xmm1, xmm2/m128 66 0F 14 /r Unpack and Interleave Low Packed Double-Precision Floating-Point Values
CVTDQ2PD xmm1, xmm2/m64 F3 0F E6 /r Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point Values
CVTDQ2PS xmm1, xmm2/m128 0F 5B /r Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point Values
CVTPD2DQ xmm1, xmm2/m128 F2 0F E6 /r Convert Packed Double-Precision Floating-Point Values to Packed Doubleword Integers
CVTPD2PI mm, xmm/m128 66 0F 2D /r Convert Packed Double-Precision FP Values to Packed Dword Integers
CVTPD2PS xmm1, xmm2/m128 66 0F 5A /r Convert Packed Double-Precision Floating-Point Values to Packed Single-Precision Floating-Point Values
CVTPI2PD xmm, mm/m64 66 0F 2A /r Convert Packed Dword Integers to Packed Double-Precision FP Values
CVTPS2DQ xmm1, xmm2/m128 66 0F 5B /r Convert Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values
CVTPS2PD xmm1, xmm2/m64 0F 5A /r Convert Packed Single-Precision Floating-Point Values to Packed Double-Precision Floating-Point Values
CVTSD2SI r32, xmm1/m64 F2 0F 2D /r Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
CVTSD2SI r64, xmm1/m64 F2 REX.W 0F 2D /r Convert Scalar Double-Precision Floating-Point Value to Quadword Integer With Sign Extension
CVTSD2SS xmm1, xmm2/m64 F2 0F 5A /r Convert Scalar Double-Precision Floating-Point Value to Scalar Single-Precision Floating-Point Value
CVTSI2SD xmm1, r32/m32 F2 0F 2A /r Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value
CVTSI2SD xmm1, r/m64 F2 REX.W 0F 2A /r Convert Quadword Integer to Scalar Double-Precision Floating-Point value
CVTSS2SD xmm1, xmm2/m32 F3 0F 5A /r Convert Scalar Single-Precision Floating-Point Value to Scalar Double-Precision Floating-Point Value
CVTTPD2DQ xmm1, xmm2/m128 66 0F E6 /r Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Doubleword Integers
CVTTPD2PI mm, xmm/m128 66 0F 2C /r Convert with Truncation Packed Double-Precision FP Values to Packed Dword Integers
CVTTPS2DQ xmm1, xmm2/m128 F3 0F 5B /r Convert with Truncation Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values
CVTTSD2SI r32, xmm1/m64 F2 0F 2C /r Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Dword Integer
CVTTSD2SI r64, xmm1/m64 F2 REX.W 0F 2C /r Convert with Truncation Scalar Double-Precision Floating-Point Value To Signed Qword Integer
CMPSD and MOVSD have the same name as the string instruction mnemonics CMPSD (CMPS) and MOVSD (MOVS); however, the former refer to
scalar double-precision floating-points whereas the latters refer to doubleword strings.
SSE2 allows execution of MMX instructions on SSE registers, processing twice the amount of data at once.
https://en.wikipedia.org/wiki/X86_instruction_listings 33/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
PMOVMSKB reg, xmm 66 0F D7 /r Move a byte mask, zeroing the upper bits of the register
PEXTRW reg, xmm, imm8 66 0F C5 /r ib Extract specified word and move it to reg, setting bits 15-0 and zeroing the rest
PINSRW xmm, r32/m16, imm8 66 0F C4 /r ib Move low word at the specified word position
PACKSSDW xmm1, xmm2/m128 66 0F 6B /r Converts 4 packed signed doubleword integers into 8 packed signed word integers with saturation
PACKSSWB xmm1, xmm2/m128 66 0F 63 /r Converts 8 packed signed word integers into 16 packed signed byte integers with saturation
PACKUSWB xmm1, xmm2/m128 66 0F 67 /r Converts 8 signed word integers into 16 unsigned byte integers with saturation
PADDSW xmm1, xmm2/m128 66 0F ED /r Add packed signed word integers with saturation
PADDUSB xmm1, xmm2/m128 66 0F DC /r Add packed unsigned byte integers with saturation
PADDUSW xmm1, xmm2/m128 66 0F DD /r Add packed unsigned word integers with saturation
PCMPGTB xmm1, xmm2/m128 66 0F 64 /r Compare packed signed byte integers for greater than
PCMPGTW xmm1, xmm2/m128 66 0F 65 /r Compare packed signed word integers for greater than
PCMPGTD xmm1, xmm2/m128 66 0F 66 /r Compare packed signed doubleword integers for greater than
PMULLW xmm1, xmm2/m128 66 0F D5 /r Multiply packed signed word integers with saturation
PMULHW xmm1, xmm2/m128 66 0F E5 /r Multiply the packed signed word integers, store the high 16 bits of the results
PMULHUW xmm1, xmm2/m128 66 0F E4 /r Multiply packed unsigned word integers, store the high 16 bits of the results
PSRAD xmm1, imm8 66 0F 72 /4 ib Shift doublewords right while shifting in sign bits
PSRAW xmm1, xmm2/m128 66 0F E1 /r Shift words right while shifting in sign bits
PSRAW xmm1, imm8 66 0F 71 /4 ib Shift words right while shifting in sign bits
https://en.wikipedia.org/wiki/X86_instruction_listings 34/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
PSUBSB xmm1, xmm2/m128 66 0F E8 /r Subtract packed signed byte integers with saturation
PSUBSW xmm1, xmm2/m128 66 0F E9 /r Subtract packed signed word integers with saturation
PMADDWD xmm1, xmm2/m128 66 0F F5 /r Multiply the packed word integers, add adjacent doubleword results
PSUBUSB xmm1, xmm2/m128 66 0F D8 /r Subtract packed unsigned byte integers with saturation
PSUBUSW xmm1, xmm2/m128 66 0F D9 /r Subtract packed unsigned word integers with saturation
PAVGB xmm1, xmm2/m128 66 0F E0, /r Average packed unsigned byte integers with rounding
PAVGW xmm1, xmm2/m128 66 0F E3 /r Average packed unsigned word integers with rounding
PMINUB xmm1, xmm2/m128 66 0F DA /r Compare packed unsigned byte integers and store packed minimum values
PMINSW xmm1, xmm2/m128 66 0F EA /r Compare packed signed word integers and store packed minimum values
PMAXSW xmm1, xmm2/m128 66 0F EE /r Compare packed signed word integers and store maximum packed values
PMAXUB xmm1, xmm2/m128 66 0F DE /r Compare packed unsigned byte integers and store packed maximum values
Computes the absolute differences of the packed unsigned byte integers; the 8 low differences and 8 high differences are
PSADBW xmm1, xmm2/m128 66 0F F6 /r
then summed separately to produce two unsigned word integer results
The following instructions can be used only on SSE registers, since by their nature they do not work on MMX registers
MASKMOVDQU xmm1, xmm2 66 0F F7 /r Non-Temporal Store of Selected Bytes from an XMM Register into Memory
MOVDQ2Q mm, xmm F2 0F D6 /r Move low quadword from XMM to MMX register.
MOVQ2DQ xmm, mm F3 0F D6 /r Move quadword from MMX register to low quadword of XMM register
MOVNTDQ m128, xmm1 66 0F E7 /r Store Packed Integers Using Non-Temporal Hint
SSE3 instructions
MOVDDUP xmm1, xmm2/m64 F2 0F 12 /r Move double-precision floating-point value and duplicate for Complex Arithmetic
MOVSLDUP xmm1, xmm2/m128 F3 0F 12 /r Move and duplicate even index single-precision floating-point values
MOVSHDUP xmm1, xmm2/m128 F3 0F 16 /r Move and duplicate odd index single-precision floating-point values
https://en.wikipedia.org/wiki/X86_instruction_listings 35/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
SSE3 SIMD integer instructions
LDDQU xmm1, mem F2 0F F0 /r Load unaligned data and return double quadword Instructionally equivalent to MOVDQU. For video encoding
SSSE3 instructions
The following MMX-like instructions extended to SSE registers were added with SSSE3
PSIGNB xmm1, xmm2/m128 66 0F 38 08 /r Negate/zero/preserve packed byte integers depending on corresponding sign
PSIGNW xmm1, xmm2/m128 66 0F 38 09 /r Negate/zero/preserve packed word integers depending on corresponding sign
PSIGND xmm1, xmm2/m128 66 0F 38 0A /r Negate/zero/preserve packed doubleword integers depending on corresponding
PMULHRSW xmm1, xmm2/m128 66 0F 38 0B /r Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits
PMADDUBSW xmm1, xmm2/m128 66 0F 38 04 /r Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words
PHSUBW xmm1, xmm2/m128 66 0F 38 05 /r Subtract and pack 16-bit signed integers horizontally
PHSUBSW xmm1, xmm2/m128 66 0F 38 07 /r Subtract and pack 16-bit signed integer horizontally with saturation
PHSUBD xmm1, xmm2/m128 66 0F 38 06 /r Subtract and pack 32-bit signed integers horizontally
PHADDSW xmm1, xmm2/m128 66 0F 38 03 /r Add and pack 16-bit signed integers horizontally with saturation
PALIGNR xmm1, xmm2/m128, imm8 66 0F 3A 0F /r ib Concatenate destination and source operands, extract byte-aligned result shifted to the right
PABSB xmm1, xmm2/m128 66 0F 38 1C /r Compute the absolute value of bytes and store unsigned result
PABSW xmm1, xmm2/m128 66 0F 38 1D /r Compute the absolute value of 16-bit integers and store unsigned result
PABSD xmm1, xmm2/m128 66 0F 38 1E /r Compute the absolute value of 32-bit integers and store unsigned result
SSE4 instructions
SSE4.1
66 0F 3A 40 /r
DPPS xmm1, xmm2/m128, imm8 Selectively multiply packed SP floating-point values, add and selectively store
ib
66 0F 3A 41 /r
DPPD xmm1, xmm2/m128, imm8 Selectively multiply packed DP floating-point values, add and selectively store
ib
66 0F 3A 0C /r
BLENDPS xmm1, xmm2/m128, imm8 Select packed single precision floating-point values from specified mask
ib
66 0F 3A 0D /r
BLENDPD xmm1, xmm2/m128, imm8 Select packed DP-FP values from specified mask
ib
BLENDVPD xmm1, xmm2/m128,
66 0F 38 15 /r Select packed DP FP values from specified mask
<XMM0>
66 0F 3A 08 /r
ROUNDPS xmm1, xmm2/m128, imm8 Round packed single precision floating-point values
ib
66 0F 3A 0A /r
ROUNDSS xmm1, xmm2/m32, imm8 Round the low packed single precision floating-point value
ib
66 0F 3A 09 /r
ROUNDPD xmm1, xmm2/m128, imm8 Round packed double precision floating-point values
ib
66 0F 3A 0B /r
ROUNDSD xmm1, xmm2/m64, imm8 Round the low packed double precision floating-point value
ib
66 0F 3A 21 /r Insert a selected single-precision floating-point value at the specified destination element and zero out destination
INSERTPS xmm1, xmm2/m32, imm8
ib elements
66 0F 3A 17 /r
EXTRACTPS reg/m32, xmm1, imm8 Extract one single-precision floating-point value at specified offset and store the result (zero-extended, if applicable)
ib
https://en.wikipedia.org/wiki/X86_instruction_listings 36/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
SSE4.1 SIMD integer instructions
MPSADBW xmm1, xmm2/m128, imm8 66 0F 3A 42 /r ib Sums absolute 8-bit integer difference of adjacent groups of 4 byte integers with starting offset
PHMINPOSUW xmm1, xmm2/m128 66 0F 38 41 /r Find the minimum unsigned word
PMULLD xmm1, xmm2/m128 66 0F 38 40 /r Multiply the packed dword signed integers and store the low 32 bits
PMULDQ xmm1, xmm2/m128 66 0F 38 28 /r Multiply packed signed doubleword integers and store quadword result
PBLENDVB xmm1, xmm2/m128, <XMM0> 66 0F 38 10 /r Select byte values from specified mask
PINSRB xmm1, r32/m8, imm8 66 0F 3A 20 /r ib Insert a byte integer value at specified destination element
PINSRD xmm1, r/m32, imm8 66 0F 3A 22 /r ib Insert a dword integer value at specified destination element
PINSRQ xmm1, r/m64, imm8 66 REX.W 0F 3A 22 /r ib Insert a qword integer value at specified destination element
PEXTRB reg/m8, xmm2, imm8 66 0F 3A 14 /r ib Extract a byte integer value at source byte offset, upper bits are zeroed.
PEXTRW reg/m16, xmm, imm8 66 0F 3A 15 /r ib Extract word and copy to lowest 16 bits, zero-extended
PEXTRD r/m32, xmm2, imm8 66 0F 3A 16 /r ib Extract a dword integer value at source dword offset
PEXTRQ r/m64, xmm2, imm8 66 REX.W 0F 3A 16 /r ib Extract a qword integer value at source qword offset
PMOVSXBW xmm1, xmm2/m64 66 0f 38 20 /r Sign extend 8 packed 8-bit integers to 8 packed 16-bit integers
PMOVZXBW xmm1, xmm2/m64 66 0f 38 30 /r Zero extend 8 packed 8-bit integers to 8 packed 16-bit integers
PMOVSXBD xmm1, xmm2/m32 66 0f 38 21 /r Sign extend 4 packed 8-bit integers to 4 packed 32-bit integers
PMOVZXBD xmm1, xmm2/m32 66 0f 38 31 /r Zero extend 4 packed 8-bit integers to 4 packed 32-bit integers
PMOVSXBQ xmm1, xmm2/m16 66 0f 38 22 /r Sign extend 2 packed 8-bit integers to 2 packed 64-bit integers
PMOVZXBQ xmm1, xmm2/m16 66 0f 38 32 /r Zero extend 2 packed 8-bit integers to 2 packed 64-bit integers
PMOVSXWD xmm1, xmm2/m64 66 0f 38 23/r Sign extend 4 packed 16-bit integers to 4 packed 32-bit integers
PMOVZXWD xmm1, xmm2/m64 66 0f 38 33 /r Zero extend 4 packed 16-bit integers to 4 packed 32-bit integers
PMOVSXWQ xmm1, xmm2/m32 66 0f 38 24 /r Sign extend 2 packed 16-bit integers to 2 packed 64-bit integers
PMOVZXWQ xmm1, xmm2/m32 66 0f 38 34 /r Zero extend 2 packed 16-bit integers to 2 packed 64-bit integers
PMOVSXDQ xmm1, xmm2/m64 66 0f 38 25 /r Sign extend 2 packed 32-bit integers to 2 packed 64-bit integers
PMOVZXDQ xmm1, xmm2/m64 66 0f 38 35 /r Zero extend 2 packed 32-bit integers to 2 packed 64-bit integers
PTEST xmm1, xmm2/m128 66 0F 38 17 /r Set ZF if AND result is all 0s, set CF if AND NOT result is all 0s
PACKUSDW xmm1, xmm2/m128 66 0F 38 2B /r Convert 2 × 4 packed signed doubleword integers into 8 packed unsigned word integers with saturation
MOVNTDQA xmm1, m128 66 0F 38 2A /r Move double quadword using non-temporal hint if WC memory type
SSE4a
66 0F 78 /0 ib ib
EXTRQ Extract Field From Register
66 0F 79 /r
F2 0F 78 /r ib ib
INSERTQ Insert Field
F2 0F 79 /r
SSE4.2
https://en.wikipedia.org/wiki/X86_instruction_listings 37/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
PCMPESTRI xmm1, xmm2/m128, imm8 66 0F 3A 61 /r imm8 Packed comparison of string data with explicit lengths, generating an index
PCMPESTRM xmm1, xmm2/m128, imm8 66 0F 3A 60 /r imm8 Packed comparison of string data with explicit lengths, generating a mask
PCMPISTRI xmm1, xmm2/m128, imm8 66 0F 3A 63 /r imm8 Packed comparison of string data with implicit lengths, generating an index
PCMPISTRM xmm1, xmm2/m128, imm8 66 0F 3A 62 /r imm8 Packed comparison of string data with implicit lengths, generating a mask
F16C
Instruction Meaning
Convert four half-precision floating point values in memory or the bottom half of an XMM register to four single-precision floating-point values
VCVTPH2PS xmmreg,xmmrm64
in an XMM register
Convert eight half-precision floating point values in memory or an XMM register (the bottom half of a YMM register) to eight single-precision
VCVTPH2PS ymmreg,xmmrm128
floating-point values in a YMM register
Convert four single-precision floating point values in an XMM register to half-precision floating-point values in memory or the bottom half an
VCVTPS2PH xmmrm64,xmmreg,imm8
XMM register
VCVTPS2PH xmmrm128,ymmreg,imm8 Convert eight single-precision floating point values in a YMM register to half-precision floating-point values in memory or an XMM register
FMA3
Supported in AMD processors starting with the Piledriver architecture and Intel starting with Haswell processors and Broadwell processors since 2014.
https://en.wikipedia.org/wiki/X86_instruction_listings 38/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
Instruction Meaning
VFMADD132PD
VFMADD132PS
VFMADD132SD
VFMADD132SS
VFMADDSUB132PD
VFMADDSUB132PS
VFMADDSUB213PS Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values
VFMADDSUB231PS
VFMSUB132PD
VFMSUB213PD Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFMSUB231PD
VFMSUB132PS
VFMSUB213PS Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFMSUB231PS
VFMSUB132SD
VFMSUB213SD Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFMSUB231SD
VFMSUB132SS
VFMSUB213SS Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFMSUB231SS
VFMSUBADD132PD
VFMSUBADD213PD Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values
VFMSUBADD231PD
VFMSUBADD132PS
VFMSUBADD213PS Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values
VFMSUBADD231PS
VFNMADD132PD
VFNMADD213PD Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values
VFNMADD231PD
VFNMADD132PS
VFNMADD213PS Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values
VFNMADD231PS
VFNMADD132SD
VFNMADD213SD Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values
VFNMADD231SD
VFNMADD132SS
VFNMADD213SS Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values
VFNMADD231SS
VFNMSUB132PD
VFNMSUB231PD
VFNMSUB132PS
VFNMSUB231PS
https://en.wikipedia.org/wiki/X86_instruction_listings 39/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
VFNMSUB132SD
VFNMSUB213SD Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFNMSUB231SD
VFNMSUB132SS
VFNMSUB213SS Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFNMSUB231SS
AVX
AVX were first supported by Intel with Sandy Bridge and by AMD with Bulldozer.
Instruction Description
VBROADCASTSS
VBROADCASTSD Copy a 32-bit, 64-bit or 128-bit memory operand to all elements of a XMM or YMM vector register.
VBROADCASTF128
Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is
VINSERTF128
unchanged.
VEXTRACTF128 Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread
VMASKMOVPS and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector
register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged. On the AMD Jaguar processor
architecture, this instruction with a memory source operand takes more than 300 clock cycles when the mask is zero, in which case the instruction should do
VMASKMOVPD
nothing. This appears to be a design flaw.[79]
VPERMILPS Permute In-Lane. Shuffle the 32-bit or 64-bit vector elements of one input operand. These are in-lane 256-bit instructions, meaning that they operate on all
VPERMILPD 256 bits with two separate 128-bit shuffles, so they can not shuffle across the 128-bit lanes.[80]
VPERM2F128 Shuffle the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
VZEROALL Set all YMM registers to zero and tag them as unused. Used when switching between 128-bit use and 256-bit use.
VZEROUPPER Set the upper half of all YMM registers to zero. Used when switching between 128-bit use and 256-bit use.
AVX2
Expansion of most vector integer SSE and AVX instructions to 256 bits
https://en.wikipedia.org/wiki/X86_instruction_listings 40/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
Instruction Description
VBROADCASTSS Copy a 32-bit or 64-bit register operand to all elements of a XMM or YMM vector register. These are register versions of the same instructions in AVX1. There
VBROADCASTSD is no 128-bit version however, but the same effect can be simply achieved using VINSERTF128.
VPBROADCASTB
VPBROADCASTW
Copy an 8, 16, 32 or 64-bit integer register or memory operand to all elements of a XMM or YMM vector register.
VPBROADCASTD
VPBROADCASTQ
VBROADCASTI128 Copy a 128-bit memory operand to all elements of a YMM vector register.
Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is
VINSERTI128
unchanged.
VEXTRACTI128 Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
VGATHERDPD
VGATHERQPD
Gathers single or double precision floating point values using either 32 or 64-bit indices and scale.
VGATHERDPS
VGATHERQPS
VPGATHERDD
VPGATHERDQ
Gathers 32 or 64-bit integer values using either 32 or 64-bit indices and scale.
VPGATHERQD
VPGATHERQQ
VPMASKMOVD Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread
and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector
VPMASKMOVQ register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged.
VPERMPS
Shuffle the eight 32-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with a register or memory operand as selector.
VPERMD
VPERMPD
Shuffle the four 64-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with a register or memory operand as selector.
VPERMQ
VPERM2I128 Shuffle (two of) the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
VPSLLVD
Shift left logical. Allows variable shifts where each element is shifted according to the packed input.
VPSLLVQ
VPSRLVD
Shift right logical. Allows variable shifts where each element is shifted according to the packed input.
VPSRLVQ
VPSRAVD Shift right arithmetically. Allows variable shifts where each element is shifted according to the packed input.
AVX-512
AVX-512, introduced in 2014, adds 512-bit wide vector registers (extending the 256-bit registers, which become the new registers' lower halves) and
doubles their count to 32; the new registers are thus named zmm0 through zmm31. It adds eight mask registers, named k0 through k7, which may be used
to restrict operations to specific parts of a vector register. Unlike previous instruction set extensions, AVX-512 is implemented in several groups; only the
foundation ("AVX-512F") extension is mandatory.[81] Most of the added instructions may also be used with the 256- and 128-bit registers.
Cryptographic instructions
6 new instructions.
https://en.wikipedia.org/wiki/X86_instruction_listings 41/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
CLMUL instructions
PCLMULQDQ xmmreg,xmmrm,imm 66 0f 3a 44 /r ib Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2k).
PCLMULLQLQDQ xmmreg,xmmrm 66 0f 3a 44 /r 00 Multiply the low halves of the two registers.
PCLMULHQLQDQ xmmreg,xmmrm 66 0f 3a 44 /r 01 Multiply the high half of the destination register by the low half of the source register.
PCLMULLQHQDQ xmmreg,xmmrm 66 0f 3a 44 /r 10 Multiply the low half of the destination register by the high half of the source register.
I
RDRAND r16 E
NFx 0F C7 /6
RDRAND r32 Return a random number that has been generated with a CSPRNG (Cryptographically Secure Pseudo-Random Number Generator) P
compliant with NIST SP 800-90A (https://csrc.nist.gov/publications/detail/sp/800-90a/rev-1/final).[a] Z
K
RDRAND r64 NFx REX.W 0F C7 /6
G
RDSEED r16 B
NFx 0F C7 /7 Z
RDSEED r32 Return a random number that has been generated with a HRNG/TRNG (Hardware/"True" Random Number Generator) compliant with
K
NIST SP 800-90B (https://csrc.nist.gov/publications/detail/sp/800-90b/final) and C (https://csrc.nist.gov/publications/detail/sp/800-90c/draft).[a]
Z
RDSEED r64 NFx REX.W 0F C7 /7 G
a. The RDRAND and RDSEED instructions may fail to obtain and return a random number if the CPU's random number generators cannot keep up with the
issuing of these instructions - if this happens, then the instructions may be retried (although the number of retries should be limited, in order to ensure
forward progress). The instructions set EFLAGS.CF to 1 if a random number was successfully obtained and 0 otherwise. Failure to obtain a random
number will also set the instruction's destination register to 0.
7 new instructions.
SHA1MSG1 0F 38 C9 /r Perform an Intermediate Calculation for the Next Four SHA1 Message Dwords
SHA1MSG2 0F 38 CA /r Perform a Final Calculation for the Next Four SHA1 Message Dwords
SHA256RNDS2 0F 38 CB /r Perform Two Rounds of SHA256 Operation
SHA256MSG1 0F 38 CC /r Perform an Intermediate Calculation for the Next Four SHA256 Message Dwords
SHA256MSG2 0F 38 CD /r Perform a Final Calculation for the Next Four SHA256 Message Dwords
These instructions, available in Tiger Lake and later Intel processors, are designed to enable encryption/decryption with an AES key without having access
to any unencrypted copies of the key during the actual encryption/decryption process.
https://en.wikipedia.org/wiki/X86_instruction_listings 42/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
Wrap a 128-bit AES key from XMM0 into a 384-bit key handle Source operand specifies handle restrictions to build into the
ENCODEKEY128 r32,r32 F3 0F 38 FA /r handle.
and output handle in XMM0-2.
Destination operand is initialized with information about the
source and attributes of the key.
Wrap a 256-bit AES key from XMM1:XMM0 into a 512-bit key The instruction also modifies XMM4-6 (zeroed out in existing
ENCODEKEY256 r32,32 F3 0F 3A FB /r
handle and output handle in XMM0-3. implementations, but this should not be relied on).
Encrypt xmm using 128-bit AES key indicated by handle at m384
AESENC128KL xmm,m384 F3 0F 38 DC /r
and store result in xmm.
The VIA PadLock instructions are instructions designed to apply cryptographic primitives in bulk, similar to the 8086 repeated string instructions. As
such, unless otherwise specified, they take, as applicable, pointers to source data in ES:rSI and destination data in ES:rDI, and a data-size or count in rCX.
Like the old string instructions, they are all designed to be interruptible.
XSTORE 0F A7 C0 Store random bytes to ES:[rDI], and increment ES:rDI accordingly. XSTORE will store currently-
RNG
available bytes, which may be from 0 to 8 bytes. REP XSTORE will write the number of random Nehemiah
Random Number
bytes specified by rCX, waiting for the random number generator when needed. EDX (stepping 3)
Generation. REP XSTORE F3 0F A7 C0 specifies a "quality factor".
REP XCRYPTECB F3 0F A7 C8
ACE REP XCRYPTCBC F3 0F A7 D0
Encrypt/Decrypt data, using the AES cipher in various block modes (ECB, CBC, CTR, CFB Nehemiah
Advanced Cryptography
REP XCRYPTCFB F3 0F A7 E0 and OFB, respectively). rCX contains the number of 16-byte blocks to encrypt/decrypt, rBX (stepping 8)
Engine.
contains a pointer to an encryption key, rAX a pointer to an initialization vector for block
REP XCRYPTOFB F3 0F A7 E8 modes that need it, and rDX a pointer to a control word.[a]
REP XSHA1 F3 0F A6 C8 Compute a cryptographic hash (using the SHA-1 and SHA-256 functions, respectively).
PHE
ES:rSI points to data to compute a hash for, ES:rDI points to a message digest and rCX Esther
Hash Engine.
REP XSHA256 F3 0F A6 D0 specifies the number of bytes. rAX should be set to 0 at the start of a calculation.[c]
Compute SM3 hash, similar to the REP XSHA* instructions. The rBX register is used to specify
CCS_HASH F3 0F A6 E8
GMI[83][84] hash function (20h for SM3 being the only documented value).
Chinese national
cryptographic Encrypt/Decrypt data, using the SM4 cipher in various block modes. rCX contains the number ZhangJiang
algorithms. (Zhaoxin of 16-byte blocks to encrypt/decrypt, rBX contains a pointer to an encryption key, rDX a
CCS_ENCRYPT F3 0F A7 F0
only.) pointer to an initialization vector for block modes that need it, and rAX contains a control
word.[e]
https://en.wikipedia.org/wiki/X86_instruction_listings 43/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
a. The control word for REP XCRYPT* is a 128-bit c. On VIA Nano and later processors, setting e. The CCS_ENCRYPT control word in rAX has the
data structure with the following layout: rAX to an all-1s value for the REP XSHA* following format:
instructions will enable an alternate operation
Bits Usage mode, where rCX specifies the number of 64- Bits Usage
byte blocks, and where the standard FIPS-
3:0 AES round count 0 0=Encrypt, 1=Decrypt
180-2 length extension procedure at the end
4 Digest mode enable (ACE2 only) of the hash calculation is omitted. This makes 5:1 Must be 10000b for SM4.
for a variant more suitable for data streaming
5
1=allow data that is not 16-byte aligned than the original EAX=0 variant. This 6 ECB block mode
(ACE2 only) functionality also exists for CCS_HASH.
7 CBC block mode
6 Cipher: 0=AES, 1=undefined
d. The data structure to REP MONTMUL contains 8 CFB block mode
Key schedule: 0=compute, 1=load from six 32-bit elements, where the first one is a
7 9 OFB block mode
memory negated modular inverse of the bottom 32 bits
of the modulus and the remaining 5 are 10 CTR block mode
8 0=normal, 1=intermediate-result
pointers to various memory buffers:
11 Digest enable
9 0=encrypt, 1=decrypt
Offset Data item
Key size: 00=128bit, 01=192bit, Remaining bits in rAX must be set to all-0s.
11:10 0 Negated modular inverse
10=256bit, 11=reserved
Other instructions
https://en.wikipedia.org/wiki/X86_instruction_listings 44/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
x86 also includes discontinued instruction sets which are no longer supported by Intel and AMD, and undocumented instructions which execute but are
not officially documented.
Virtualization instructions
AMD-V instructions
VMRUN rAX[a] 0F 01 D8 Run virtual machine Performs a switch to the guest OS.
VMSAVE rAX[a] 0F 01 DB Save state To VMCB Saves additional guest state to VMCB.
K8[b]
Set Global Interrupt Normally used by the VMM.
STGI 0F 01 DC
Flag
Expands a 2MB-page RMP entry into a corresponding set of contiguous 4KB-page RMP entries.
PSMASH F3 0F 01 FF Page Smash
The 2MB page's system physical address is specified in the RAX register.
Validates or rescinds validation of a guest page's RMP entry. The guest virtual address is specified
PVALIDATE F2 0F 01 FF Page Validate
in the register operand rAX.
Modifies RMP permissions for a guest page. The guest virtual address is specified in the RAX Zen 3
Adjust RMP
RMPADJUST F3 0F 01 FE register. The page size is specified in RCX[0]. The target VMPL and its permissions are specified in
Permissions
the RDX register.
Writes a new RMP entry. The system physical address of a page whose RMP entry is modified is
RMPUPDATE F2 0F 01 FE Write RMP Entry specified in the RAX register. The RCX register provides the effective address of a 16-byte data
structure which contains the new RMP state.
Reads an RMP permission mask for a guest page. The guest virtual address is specified in the RAX
Read RMP
RMPQUERY F3 0F 01 FD register. The target VMPL is specified in RDX[7:0]. RMP permissions for the specified VMPL are Zen 4
Permissions
returned in RDX[63:8] and the RCX register.
a. For the rAX argument to the VMRUN, VMLOAD, VMSAVE and INVLPGA instructions, the choice of AX/EAX/RAX depends on address-size, which can be
overridden with the 67h prefix.
https://en.wikipedia.org/wiki/X86_instruction_listings 45/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
b. Support for AMD-V was added in stepping F of the AMD K8, and is not available on earlier steppings.
VMPTRLD m64[a] NP 0F C7 /6 Load pointer to Virtual-Machine Control Structure (VMCS) from memory and mark it valid.
Flush VMCS data from CPU to VMCS region in memory. If the specified VMCS is the current VMCS, then the
VMCLEAR m64[a] 66 0F C7 /6
current-VMCS is marked as invalid.
Read a specified field from the current-VMCS. The reg argument specifies which field to read - the result is stored
VMREAD r/m,reg NP 0F 78 /r Prescott 2M,
to r/m.
Yonah,
Write to specified field of current-VMCS. The reg argument specifies which field to write, and the r/m argument Centerton,
VMWRITE reg,r/m NP 0F 79 /r Nano 3000
provides the data item to write to the field.
INVEPT reg,m128 66 0F 38 80 /r Invalidates EPT-derived entries in the TLBs and paging-structure caches. Nehalem,
Centerton,[85]
INVVPID reg,m128 66 0F 38 81 /r Invalidates entries in the TLBs and paging-structure caches based on VPID. ZhangJiang
Trust Domain Extensions (TDX): Secure Arbitration Mode (SEAM) instructions
a. The m64 argument to VMPTRLD, VMPTRST, VMCLEAR and VMXON is a 64-bit physical address.
b. The m64 argument to VMXON is the 64-bit physical address to a "VMXON region", which is a 4Kbyte region that must be 4 Kbyte aligned. This region
may be used by the processor to support VMX operation in an implementation-dependent manner and should never be accessed by software until the
processor has left VMX operation through the VMXOFF instruction.
Undocumented instructions
The x86 CPUs contain undocumented instructions which are implemented on the chips but not listed in some official documents. They can be found in
various sources across the Internet, such as Ralf Brown's Interrupt List and at sandpile.org (https://www.sandpile.org/)
https://en.wikipedia.org/wiki/X86_instruction_listings 46/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
Some of these instructions are widely available across many/most x86 CPUs, while others are specific to a narrow range of CPUs.
Undocumented instructions that are widely available across many x86 CPUs include
AAM imm8 D4 imm8 The actual operation is AH ← AL/imm8; AL ← AL mod imm8 for
any imm8 value (except zero, which produces a divide-by-zero
exception).[87] Available beginning with 8086,
documented for imm8 values other
than 0Ah since Pentium (earlier
ASCII-Adjust-Before-Division. On the 8086, documented for imm8=0Ah only,
documentation lists no arguments).
which is used to convert a BCD value to binary for a following division
instruction.
AAD imm8 D5 imm8
The actual operation is AL ← (AL+(AH*imm8)) & 0FFh; AH ← 0
for any imm8 value.
SALC,
Set AL depending on the value of the Carry Flag (a 1-byte alternative of Available beginning with 8086, but only
D6
SETALC SBB AL, AL) documented since Pentium Pro.
SHL, (D0..D3) /6,
Undocumented variants of the SHL instruction.[88] Performs the same Available since the 80186 (performs
operation as the documented (D0..D3) /4 and (C0..C1) /4 variants,
SAL (C0..C1) /6 imm8 different operation on the 8086)[91]
respectively.
Alias of opcode 80, which provides variants of 8-bit integer instructions (ADD, Available since the 8086.[92] Explicitly
(multiple) 82 /(0..7) imm8 unavailable in 64-bit mode but kept and
OR, ADC, SBB, AND, SUB, XOR, CMP) with an 8-bit immediate argument.[92]
reserved for compatibility.[93]
Available on 8086, but only
OR,AND,XOR 83 /(1,4,6) imm8 16-bit OR/AND/XOR with a sign-extended 8-bit immediate. documented from 80386
onwards.[94][95]
The behavior of the F2 prefix (REPNZ, REPNE) when used with string
REPNZ MOVS F2 (A4..A5) instructions other than CMPS/SCAS is officially undefined, but there exists
commercial software (e.g. the version of FDISK distributed with MS-DOS Available since the 8086.
REPNZ STOS F2 (AA..AB) versions 3.30 to 6.22[96]) that rely on it to behave in the same way as the
documented F3 (REP) prefix.
The use of the REP prefix with the RET instruction is not listed as supported in
either the Intel SDM or the AMD APM. However, AMD's optimization guide for
the AMD-K8 describes the F3 C3 encoding as a way to encode a two-byte
RET instruction - this is the recommended workaround for an issue in the Executes as RET on all known x86
REP RET F3 C3
AMD-K8's branch predictor that can cause branch prediction to fail for some 1- CPUs.
byte RET instructions.[97] At least some versions of gcc are known to use this
encoding.[98]
NOP with address-size override prefix. The use of the 67 prefix for instructions
without memory operands is listed by the Intel SDM (vol 2, section 2.1.1) as
NOP 67 90 Executes as NOP on 80386 and later.
"reserved", but it is used in Microsoft Windows 95 as a workaround for a bug in
the B1 stepping of Intel 80386.[99][100]
UD1 0F B9 /r Intentionally undefined instructions, but unlike UD2 (0F 0B) these instructions All of these opcodes produce #UD
were left unpublished until December 2016.[108][109] exceptions on 80186 and later (except
https://en.wikipedia.org/wiki/X86_instruction_listings 47/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
Microsoft Windows 95 Setup is known to depend on 0F FF being on NEC V20/V30, which assign at least
0F FF to the BRKEM instruction.)
invalid[110][111] - it is used as a self check to test that its #UD
exception handler is working properly.
Undocumented instructions that appear only in a limited subset of x86 CPUs include
Exact purpose unknown, causes CPU hang (HCF). The only way out is CPU
reset.[117]
LOADALL 0F 05 Loads All Registers from Memory Address 0x000800H Opcode reused for SYSCALL
in AMD K6-2 and later CPUs.
LOADALLD 0F 07 Loads All Registers from Memory Address ES:EDI Opcode reused for SYSRET in
AMD K6-2 and later CPUs.
On the Intel SCC (Single-chip Cloud Computer), invalidate all message buffers.
CL1INVMB 0F 0A[119] The menmonic and operation of the instruction, but not its opcode, are Available on the SCC only.
described in Intel's SCC architecture specification.[120]
On AMD K6 and later maps to FEMMS operation (fast clear of MMX state) but Only available in Red unlock state
PATCH2 0F 0E
on Intel identified as uarch data read on Intel[121] (0F 0F too)
0F (10..13) /r Moves data to/from user memory when operating in ICE HALT mode.[122] Acts
UMOV r/m,r as regular MOV otherwise. Opcodes reused for SSE
instructions in later CPUs.
Using the 64h (FS: segment) prefix with the undocumented D6 Available on the UMC Green CPU
Unknown mnemonic 64 D6 (SALC/SETALC) instruction will, on UMC CPUs only, cause EAX to be set to only. Executes as SALC on non-
0xAB6B1B07.[128][129] UMC CPUs.
https://en.wikipedia.org/wiki/X86_instruction_listings 48/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
64 (70..7F) rel8, On Intel "NetBurst" (Pentium 4) CPUs, the 64h (FS: segment) instruction prefix
will, when used with conditional branch instructions, act as a branch hint to Segment prefixes on
FS: Jcc indicate that the branch will be alternating between taken and not-taken.[130] conditional branches are
64 0F (80..8F) rel16/32 Unlike other NetBurst branch hints (CS: and DS: segment prefixes), this hint is accepted but ignored by non-
not documented. NetBurst CPUs.
VEX.66.0F38 On AMD Zen1, FMA4 instructions are present but undocumented (missing
(FMA4) (5C..5F,68..6F,78..7F) /r CPUID flag). The reason for leaving the feature undocumented may or may not Removed from Zen2 onwards.
imm8 have been due to a buggy implementation.[131]
Perform SHA-512 hashing.
Detected by CPU fuzzing tools such as SandSifter[134] and UISFuzz[135] as executing without
causing #UD on several different VIA and Zhaoxin CPUs.
Unknown mnemonic 0F A7 (C1..C7) Unknown operation, may be related to the documented XSTORE
(0F A7 C0) instruction.
These opcodes are listed as reserved opcodes that will produce "unpredictable results" without generating
D9 D7, D9 E2, exceptions on at least Cyrix 6x86,[140] 6x86MX, MII, MediaGX, and AMD Geode GX/LX.[141] (The
D9 E7, DD FC, documentation for these CPUs all list the same ten opcodes.)
"Reserved by
(no mnemonic) DE D8, DE DA,
Cyrix" opcodes
DE DC, DE DD, Their actual operation is not known, nor is it known whether their operation is the same on all
DE DE, DF FC of these CPUs.
https://en.wikipedia.org/wiki/X86_instruction_listings 49/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
See also
CLMUL
RDRAND
Larrabee extensions
Advanced Vector Extensions 2
Bit Manipulation Instruction Sets
CPUID
List of discontinued x86 instructions
References
1. "Re: Intel Processor Identification and the CPUID Instruction" (http://ww 21. Cyrix 486SLC/e Data Sheet (1992) (http://www.bitsavers.org/component
w.intel.com/content/www/us/en/processors/processor-identification-cpui s/cyrix/Cyrix_Cx486SLCe_Data_Sheet_1992.pdf), section 2.6.4
d-instruction-note.html?wapkw=processor-identification-cpuid-instructio 22. Robert Collins, CPUID Algorithm Wars (https://web.archive.org/web/200
n). Retrieved 2013-04-21. 01218003500/http://www.rcollins.org/ddj/Nov96/Nov96.html), nov 1996.
2. Andrew Schulman, "Unauthorized Windows 95" (ISBN 1-56884-169-8), Archived from the original (http://www.rcollins.org/ddj/Nov96/Nov96.htm
chapter 8, p.249,257. l) on dec 18, 2000.
3. US Patent 4974159 (https://patents.google.com/patent/US4974159A/), 23. Geoff Chappell, CMPXCHG8B Support in the 32-Bit Windows Kernel (ht
"Method of transferring control in a multitasking computer system" tps://www.geoffchappell.com/studies/windows/km/cpu/cx8.htm), 23 jan
mentions 63h/ARPL. 2008
4. WikiChip, UMIP - x86 (https://en.wikichip.org/wiki/x86/umip) 24. Intel, Software Developer's Manual (https://kib.kiev.ua/x86docs/Intel/SD
5. Oracle Corp, Oracle® VM VirtualBox Administrator's Guide for Release Ms/325462-077.pdf), order no. 325426-077, Nov 2022 - the entry on the
6.0, section 3.5: Details About Software Virtualization (https://docs.oracl RDTSC instruction on p.1739 describes the instruction sequences
e.com/en/virtualization/virtualbox/6.0/admin/swvirt-details.html) required to order the RDTSC instruction with respect to earlier and later
6. MBC Project,Virtual Machine Detection (https://github.com/MBCProject/ instructions.
mbc-markdown/blob/master/anti-behavioral-analysis/detect-vm.md) 25. Linux kernel 5.4.12, /arch/x86/kernel/cpu/centaur.c (https://elixir.bootlin.c
om/linux/v5.4.12/source/arch/x86/kernel/cpu/centaur.c#L110)
7. Michal Necasek, SGDT/SIDT Fiction and Reality (https://www.os2muse
um.com/wp/sgdtsidt-fiction-and-reality/) 26. Stack Overflow, Can constant non-invariant tsc change frequency
8. Intel, How Microarchitectural Data Sampling works (https://www.intel.co across cpu states? (https://stackoverflow.com/questions/62492053/can-
m/content/www/us/en/developer/articles/technical/software-security-guid constant-non-invariant-tsc-change-frequency-across-cpu-states)
ance/technical-documentation/intel-analysis-microarchitectural-data-sa Accessed 24 Jan 2023.
mpling.html), see mitigations section. Archived (https://archive.ph/dRI0 27. CPU-World, CPUID for IDT ZHAOXIN KaiXian KX-U6780 2.7 GHz (by
B) on Apr 22,2022 KZ) (https://www.cpu-world.com/cgi-bin/CPUID.pl?CPUID=78468).
9. Linux kernel documentation, Microarchitectural Data Sampling (MDS) Accessed 24 Jan 2023.
mitigation (https://www.kernel.org/doc/html/latest/x86/mds.html) 28. JookWiki, "nopl" (https://www.jookia.org/wiki/Nopl), sep 24, 2022 -
provides a lengthy account of the history of the long NOP and the
10. sandpile.org, x86 architecture rFLAGS register (https://www.sandpile.or
issues around it. Archived (https://web.archive.org/web/2022102823322
g/x86/flags.htm), see note #7
5/https://www.jookia.org/wiki/Nopl) on oct 28, 2022.
11. "Intel 80386 CPU Information | PCJS Machines" (https://www.pcjs.org/d
ocuments/manuals/intel/80386/#b0-stepping). 29. Intel Community: Multibyte NOP Made Official (https://community.intel.c
om/t5/Software-Archive/Multi-byte-NOP-opcode-made-official/td-p/9325
12. Geoff Chappell, CPU Identification before CPUID (https://www.geoffcha 80). Archived (https://web.archive.org/web/20220407203915/https://com
ppell.com/studies/windows/km/cpu/precpuid.htm?tx=245) munity.intel.com/t5/Software-Archive/Multi-byte-NOP-opcode-made-offic
13. Toth, Ervin (1998-03-16). "BSWAP with 16-bit registers" (https://web.arc ial/td-p/932580) on 7 Apr 2022.
hive.org/web/19991103025640/http://www.df.lth.se/~john_e/gems/gem0 30. Intel Software Developers Manual, vol 3B (https://www.intel.com/conten
00c.html). Archived from the original (http://www.df.lth.se/~john_e/gems/ t/www/us/en/developer/articles/technical/intel-sdm.html) (order no
gem000c.html) on 1999-11-03. "The instruction brings down the upper 253669-076us, December 2021), section 22.15 "Reserved NOP"
word of the doubleword register without affecting its upper 16 bits."
31. AMD, AMD 64-bit Technology - AMD x86-64 Architecture Programmer’s
14. Coldwin, Gynvael (2009-12-29). "BSWAP + 66h prefix" (https://gynvael. Manual Volume 3 (https://kib.kiev.ua/x86docs/AMD/AMD64/24594_APM
coldwind.pl/?id=268). Retrieved 2018-10-03. "internal (zero-)extending _v3-r3.02.pdf), publication no. 24594, rev 3.02, aug 2002, page 379.
the value of a smaller (16-bit) register … applying the bswap to a 32-bit
32. AMD, AMD-K6 Processor Data Sheet (https://www.ardent-tool.com/CP
value "00 00 AH AL", … truncated to lower 16-bits, which are "00 00". …
U/docs/AMD/K6/20695.pdf), order no. 20695H/0, March 1998, section
Bochs … bswap reg16 acts just like the bswap reg32 … QEMU …
ignores the 66h prefix" 24.2, page 283.
15. Intel "i486 Microprocessor" (https://www.ardent-tool.com/CPU/docs/Inte 33. George Dunlap, The Intel SYSRET Privilege Escalation (https://xenproje
l/486/datasheets/240440-001.pdf) (April 1989, order no. 240440-001) ct.org/2012/06/13/the-intel-sysret-privilege-escalation/), The Xen
Project., 13 june 2012
p.142 lists CMPXCHG with 0F A6/A7 encodings.
34. Intel, AP-485: Intel® Processor Identification and the CPUID Instruction
16. "Intel 486 & 486 POD CPUID, S-spec, & Steppings" (http://datasheets.c
hipdb.org/Intel/x86/486/Intel486.htm). (http://kib.kiev.ua/x86docs/Intel/AppNote485/241618-039.pdf), order no.
241618-039, may 2012, section 5.1.2.5, page 32
17. Intel "i486 Microprocessor" (https://www.ardent-tool.com/CPU/docs/Inte
l/486/datasheets/240440-002.pdf) (November 1989, order no. 240440- 35. Michal Necasek, "SYSENTER, Where Are You?" (http://www.os2museu
m.com/wp/sysenter-where-are-you/)
002) p.135 lists CMPXCHG with 0F B0/B1 encodings.
36. AMD, Athlon Processor x86 Code Optimization Guide (https://pdf.datash
18. Frank van Gilluwe, "The Undocumented PC, second edition", 1997,
ISBN 0-201-47950-8, page 55 eetcatalog.com/datasheet/AdvancedMicroDevices/mXvyvs.pdf),
publication no. 22007, rev K, feb 2002, appendix F, page 284
19. Intel, Software Developer’s Manual, vol 3A (http://kib.kiev.ua/x86docs/Int
el/SDMs/253668-078.pdf), order no. 253668-078, Dec 2022, section 37. Transmeta, Processor Recognition (http://datasheets.chipdb.org/Transm
eta/Crusoe/Crusoe_CPUID_5-7-02.pdf), May 7, 2002.
9.3, page 299
20. "RSM—Resume from System Management Mode" (https://web.archive. 38. VIA, VIA C3 Nehemiah Processor Datasheet (http://datasheets.chipdb.o
rg/VIA/Nehemiah/VIA%20C3%20Nehemiah%20Datasheet%20R113.pd
org/web/20120312224625/http://www.softeng.rl.ac.uk/st/archive/SoftEn
g/SESP/html/SoftwareTools/vtune/users_guide/mergedProjects/analyze f), rev 1.13, sep 29, 2004, page 17
r_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct 39. CPU-World, CPUID for Intel Xeon 3.40 GHz (https://www.cpu-world.co
32_hh/vc279.htm). Archived from the original on 2012-03-12. m/cgi-bin/CPUID.pl?CPUID=75151) - Nocona stepping D CPUID
without CMPXCHG16B
https://en.wikipedia.org/wiki/X86_instruction_listings 50/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
40. CPU-World, CPUID for Intel Xeon 3.60 GHz (https://www.cpu-world.co 62. Intel, 11th Generation Intel® Core™ Processor Desktop Datasheet,
m/cgi-bin/CPUID.pl?CPUID=75154) - Nocona stepping E CPUID with Volume 1 (https://cdrdv2.intel.com/v1/dl/getContent/634648), may 2022,
CMPXCHG16B order no. 634648-004, section 3.5, page 65
41. SuperUser StackExchange, How prevalent are old x64 processors 63. Intel, Which Platforms Support Intel® Software Guard Extensions
lacking the cmpxchg16b instruction? (https://superuser.com/questions/1 (Intel® SGX) SGX2? (https://www.intel.com/content/www/us/en/support/
87254/how-prevalent-are-old-x64-processors-lacking-the-cmpxchg16b-i articles/000058764/software/intel-security-products.html) Archived (http
nstruction) s://archive.ph/nQHXO) on 5 May 2022.
42. Intel SDM order no. 325462-077 (https://www.intel.com/content/www/us/ 64. "The CLDEMOTE Story" (https://twitter.com/InstLatX64/status/1521562
en/developer/articles/technical/intel-sdm.html), apr 2022, vol 2B, p.4- 151848132609). Twitter. Retrieved 2023-01-23.
130 "MOVSX/MOVSXD-Move with Sign-Extension" lists MOVSXD 65. Wikichip, CLZERO - x86 (https://en.wikichip.org/w/index.php?title=x86/c
without REX.W as "discouraged" lzero&oldid=94738)
43. Anandtech, AMD Zen 3 Ryzen Deep Dive Review (https://www.anandte 66. Intel, Application note AP-578: Software and Hardware Considerations
ch.com/show/16214/amd-zen-3-ryzen-deep-dive-review-5950x-5900x-5 for FPU Exception Handlers for Intel Architecture Processors (https://ard
800x-and-5700x-tested/6), nov 5, 2020, page 6 ent-tool.com/CPU/docs/Intel/IA/243291-002.pdf), order no. 243291-002,
44. "Saving Private Ryzen: PEXT/PDEP 32/64b replacement functions for February 1997
#AMD CPUs (BR/#Zen/Zen+/#Zen2) based on @zwegner's zp7" (http 67. Intel, Application Note AP-113: Getting Started With The Numeric Data
s://twitter.com/instlatx64/status/1322503571288559617). Twitter. Processor (https://ardent-tool.com/CPU/docs/Intel/808x/8087/appnotes/
Retrieved 2023-01-20. AP-113.pdf), feb 1981, pages 24-25
45. Wegner, Zach (4 November 2020). "zwegner/zp7" (https://github.com/z 68. Intel, 8087 Math Coprocessor (http://www.datasheetcatalog.com/datash
wegner/zp7). GitHub. eets_pdf/8/0/8/7/8087.shtml), oct 1989, order no. 285385-007, page 3-
46. Intel, Control-flow Enforcement Technology Specification (https://kib.kie 100, fig 9
v.ua/x86docs/Intel/CET/334525-003.pdf) (v3.0, order no. 334525-003, 69. Intel, 80287 80-bit HMOS Numeric Processor Extension (http://www.bits
March 2019) avers.org/components/intel/_dataSheets/80287_Data_Sheet_Feb83.pd
47. Intel SDM, rev 076, December 2021 (https://kib.kiev.ua/x86docs/Intel/S f), feb 1983, order no. 201920-001, page 14
DMs/253665-076.pdf), volume 1, section 18.3.1 70. Intel, iAPX86, 88 User's Manual (https://ardent-tool.com/CPU/docs/Intel/
48. Binutils mailing list: x86: CET v2.0: Update NOTRACK prefix (https://sou 808x/manuals/210201-001.pdf), 1981 (order no. 210201-001), p. 797
rceware.org/pipermail/binutils/2017-June/098516.html) 71. Intel 80286 and 80287 Programmers Reference Manual (http://bitsaver
49. AMD, Extensions to the 3DNow! and MMX Instruction Sets (https://refsp s.trailing-edge.com/components/intel/80286/210498-005_80286_and_8
ecs.linuxfoundation.org/AMD-extensions.pdf), ref no. 22466D/0, March 0287_Programmers_Reference_Manual_1987.pdf), 1987 (order no.
2000, p.11 210498-005), p. 485
50. Hadi Brais, The Significance of the x86 SFENCE instruction (https://hadi 72. Intel Software Developer's Manual (https://kib.kiev.ua/x86docs/Intel/SD
brais.wordpress.com/2019/02/26/the-significance-of-the-x86-sfence-inst Ms/253669-064.pdf) volume 3B, revision 064, section 22.18.9
ruction/), 26 Feb 2019. 73. "GCC Bugzilla – 37179 – GCC emits bad opcode 'ffreep' " (https://gcc.g
51. Intel, Software Developer's Manual (https://kib.kiev.ua/x86docs/Intel/SD nu.org/bugzilla/show_bug.cgi?id=37179).
Ms/325462-077.pdf), order no. 325426-077, Nov 2022, Volume 1, 74. Michael Steil, FFREEP – the assembly instruction that never existed (htt
section 11.4.4.3, page 276. ps://www.pagetable.com/?p=16)
52. Hadi Brais, The Significance of the LFENCE instruction (https://hadibrai 75. Dusko Koncaliev, Pentium FDIV Bug (https://www.cs.earlham.edu/~dus
s.wordpress.com/2018/05/14/the-significance-of-the-x86-lfence-instructi ko/cs63/fdiv.html)
on/), 14 May 2018
76. Bruce Dawson, Intel Underestimates Error Bounds by 1.3 quintillion (htt
53. AMD, Software techniques for managing speculation on AMD processor ps://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-
(https://www.amd.com/system/files/documents/software-techniques-for- bounds-by-1-3-quintillion/)
managing-speculation.pdf), rev 3.8.22, 8 March 2022, page 4. Archived 77. Intel SDM, rev 053 (https://kib.kiev.ua/x86docs/Intel/SDMs/253665-053.
(https://web.archive.org/web/20220313090311/https://www.amd.com/sy
pdf) and later, describes the exact argument reduction procedure used
stem/files/documents/software-techniques-for-managing-speculation.pd
for FSIN, FCOS, FSINCOS and FPTAN in volume 1, section 8.3.8
f) on 13 March 2022.
78. Intel, Intel® 64 and IA-32 Architectures Optimization Reference Manual
54. Intel, Prescott New Instructions Software Developer’s Guide (https://mat (https://kib.kiev.ua/x86docs/Intel/Intel-OptimGuide/248966-029.pdf)
h-atlas.sourceforge.net/devel/assembly/sse3.pdf), order no. 252490- (order no. 248966-044, June 2021) section 3.5.2.3
003, june 2003, pages 3-26 and 3-38 list MONITOR and MWAIT with
explicit operands. 79. "The microarchitecture of Intel, AMD and VIA CPUs: An optimization
guide for assembly programmers and compiler makers" (http://www.agn
55. Flat Assembler messageboard, "BLENDVPS/BLENDVPD/PBLENDVB er.org/optimize/microarchitecture.pdf) (PDF). Retrieved October 17,
syntax" (https://board.flatassembler.net/topic.php?p=98558#98558), 2016.
also covers MONITOR/MWAIT mnemonics
80. "Chess programming AVX2" (https://chessprogramming.wikispaces.co
56. Intel, Intel® Xeon Phi™ Product Family x200 (KNL) User mode (ring 3)
m/AVX2). Retrieved October 17, 2016.
MONITOR and MWAIT (https://web.archive.org/web/20170305002312/h
ttps://software.intel.com/en-us/blogs/2016/10/06/intel-xeon-phi-product-f 81. "Intel AVX-512 Instructions" (https://www.intel.com/content/www/us/en/d
amily-x200-knl-user-mode-ring-3-monitor-and-mwait) (archived 5 mar eveloper/articles/technical/intel-avx-512-instructions.html). Intel.
2017) Retrieved 21 June 2022.
57. AMD, BIOS and Kernel Developer’s Guide (BKDG) For AMD Family 82. Michal Ludvig, VIA PadLock—Wicked Fast Encryption (https://www.linux
10h Processors (https://www.amd.com/system/files/TechDocs/31116.pd journal.com/article/8042), 6 Apr 2005
f), order no. 31116, rev 3.62, page 419 83. Zhaoxin, Core Technology | Instructions for the use of accelerated
58. Guru3D, VIA Zhaoxin x86 4 and 8-core SoC processors launch (https:// instructions for national encryption algorithm based on Zhaoxin
www.guru3d.com/news-story/via-zhaoxin-x86-4-and-8-core-processors-l processor (https://www.zhaoxin.com/prodshow_view.aspx?id=483&nid=
aunched.html), Jan 22, 2018 31&typeid=63&IsActiveTarget=True) (in Chinese). Archived (https://web.
archive.org/web/20220105165938/https://www.zhaoxin.com/prodshow_
59. Vulners, x86: DoS from attempting to use INVPCID with a non- view.aspx?id=483&nid=31&typeid=63&IsActiveTarget=True) on Jan 5,
canonical addresses (https://vulners.com/xen/XSA-279), 20 nov 2018 2022
60. Intel, Intel® 64 and IA-32 Architectures Software Developer’s Manual (ht 84. Zhaoxin, GMI User Manual v1.0 (https://github.com/ZXOpenSource/Ope
tp://kib.kiev.ua/x86docs/Intel/SDMs/325384-078.pdf) volume 3, order
nSSL-ZX-GMI/blob/63e8835e1854042ac949a04eb87c604cb1d7d5ea/G
no. 325384-078, december 2022, chapter 23.15 MI%20User%20Manual%20V1.0.pdf) (in Chinese). Archived (https://we
61. Catherine Easdon, Undocumented CPU Behaviour on x86 and RISC-V b.archive.org/web/20220228071045/https://raw.githubusercontent.com/
Microarchitectures: A Security Perspective (https://www.cattius.com/ima ZXOpenSource/OpenSSL-ZX-GMI/master/GMI%20User%20Manual%2
ges/thesis-unsigned.pdf), 10 May 2019, page 39 0V1.0.pdf) on Feb 28, 2022
https://en.wikipedia.org/wiki/X86_instruction_listings 51/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
85. Intel, Intel® Atom™ Processor S1200 Product Family for Microserver 112. "Invalid opcode handling \ VOGONS" (https://www.vogons.org/viewtopi
Datasheet, Volume 1 of 2 (https://www.intel.com/content/dam/www/publi c.php?t=13379). Archived (https://web.archive.org/web/2022040907115
c/us/en/documents/datasheets/atom-processor-s1200-datasheet-vol-1.p 9/https://www.vogons.org/viewtopic.php?t=13379) from the original on 9
df), order no. 328194-001, dec 2012, page 44 Apr 2022.
86. SecurityWeek, Intel Adds TDX to Confidential Computing Portfolio With 113. "Invalid instructions cause exit even if Int 6 is hooked \ VOGONS" (http
Launch of 4th Gen Xeon Processors (https://www.securityweek.com/inte s://www.vogons.org/viewtopic.php?t=21418). Archived (https://web.archi
l-adds-tdx-confidential-computing-portfolio-launch-4th-gen-xeon-process ve.org/web/20220409071155/https://www.vogons.org/viewtopic.php?t=2
ors), 10 jan 2023 1418) from the original on 9 Apr 2022.
87. Robert Collins, Undocumented OpCodes: AAM (http://www.rcollins.org/s 114. "Tutorial - Calling Win32 from DOS" (https://www.ragestorm.net/tutorial?
ecrets/opcodes/AAM.html) id=27). Ragestorm. 17 Sep 2005. Archived (https://web.archive.org/web/
88. Frank van Gilluwe, "The Undocumented PC - Second Edition", p. 93-95 20220409071159/https://www.ragestorm.net/tutorial?id=27) from the
89. Michal Necasek, Intel 486 Errata? (http://www.os2museum.com/wp/intel original on 4 Apr 2022.
-486-errata/) 115. "Accessing Windows device drivers from DOS programs" (https://sta.c6
4.org/blog/dosvddaccess.html).
90. Robert Hummel, "PC Magazine Programmer's Technical Reference"
(ISBN 1-56276-016-5) p.728 116. "8086 microcode disassembled" (https://www.reenigne.org/blog/8086-mi
91. Raúl Gutiérrez Sanz, Undocumented 8086 Opcodes, Part I (http://www. crocode-disassembled/). Reengine blog. 2020-09-03. Retrieved
2022-07-26. "Using the REP or REPNE prefix with a MUL or IMUL
os2museum.com/wp/undocumented-8086-opcodes-part-i/)
instruction negates the product. Using the REP or REPNE prefix with an
92. "Asm, opcode 82h" (http://computer-programming-forum.com/46-asm/1 IDIV instruction negates the quotient."
43edbd28ae1a091.htm).
117. "Re: Undocumented opcodes (HINT_NOP)" (https://web.archive.org/we
93. Intel Corporation 2022, p. 3698. b/20041106070621/http://www.sandpile.org/post/msgs/20004129.htm).
94. Intel, The 8086 Family User's Manual, October 1979 (http://bitsavers.or Archived from the original (http://www.sandpile.org/post/msgs/2000412
g/components/intel/8086/9800722-03_The_8086_Family_Users_Manua 9.htm) on 2004-11-06. Retrieved 2010-11-07.
l_Oct79.pdf), opcodes omitted on pages 4-25 and 4-31 118. "Re: Also some undocumented 0Fh opcodes" (https://web.archive.org/w
95. Retrocomputing StackExchange, Undocumented instructions in x86 eb/20030626044017/http://www.sandpile.org/post/msgs/20003986.htm).
CPU prior to 80386? (https://retrocomputing.stackexchange.com/questi Archived from the original (http://www.sandpile.org/post/msgs/2000398
ons/20031/undocumented-instructions-in-x86-cpu-prior-to-80386) 6.htm) on 2003-06-26. Retrieved 2010-11-07.
96. Daniel B. Sedory, An Examination of the Standard MBR (https://thestar 119. Intel's RCCE library (https://github.com/Intel-SCC/RCCE/blob/master/sr
man.pcministry.com/asm/mbr/STDMBR.htm#REP) c/RCCE_admin.c#L87) for the SCC uses opcode 0F 0A for SCC's
97. AMD, Software Optimization Guide for AMD64 Processors (https://www. message invalidation instruction.
amd.com/system/files/TechDocs/25112.PDF) (publication 25112, 120. Intel Labs, SCC External Architecture Specification (EAS), Revision
revision 3.06, sep 2005), section 6.2, p.128 0.94 (https://www.intel.com/content/dam/www/public/us/en/documents/t
98. Bug 48227 - "rep ret" generated for -march=core2 (https://gcc.gnu.org/b echnology-briefs/intel-labs-single-chip-cloud-architecture-brief.pdf), p.29
ugzilla/show_bug.cgi?id=48227) 121. "Undocumented x86 instructions to control the CPU at the
99. Raymond Chen, My, what strange NOPs you have! (https://devblogs.mi microarchitecture level in modern Intel processors" (https://raw.githubus
crosoft.com/oldnewthing/20110112-00/?p=11773) ercontent.com/chip-red-pill/udbgInstr/main/paper/undocumented_x86_in
100. Jeff Parsons, Intel 80386 CPU information (https://www.pcjs.org/docum sts_for_uarch_control.pdf) (PDF). 9 July 2021.
ents/manuals/intel/80386/#b1-errata) (B1 errata section, item #7) 122. Robert R. Collins, Undocumented OpCodes: UMOV (http://www.rcollins.
101. Retrocomputing StackExchange, 0F1h opcode-prefix on i80286 (https:// org/secrets/opcodes/UMOV.html)
retrocomputing.stackexchange.com/questions/12004/0f1h-opcode-prefi 123. Herbert Oppmann, NXOP (Opcode 0Fh 55h) (https://www.memotech.fra
x-on-i80286) nken.de/NexGen/Opcode0F55.html)
102. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86do 124. Herbert Oppmann, NexGen Nx586 Hypercode Source (https://www.me
cs/Intel/SDMs/253667-018.pdf) (Jan 2006, order no 235667-018, does motech.franken.de/NexGen/Source/index.html), see COMMON.INC
not have long NOP) 125. Herbert Oppmann, Inside the NexGen Nx586 System BIOS (https://ww
103. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86do w.memotech.franken.de/NexGen/Bios.html)
cs/Intel/SDMs/253667-019.pdf) (March 2006, order no 235667-019, has 126. Grzegorz Mazur, AMD 3DNow! undocumented instructions (https://web.
long NOP) archive.org/web/20000121143428/http://x86.ddj.com/articles/3dnow/am
104. Agner Fog, Instruction Tables (https://www.agner.org/optimize/instructio d_3dnow.htm)
n_tables.pdf), AMD K7 section. 127. "Undocumented 3DNow! Instructions" (https://web.archive.org/web/200
105. "579838 – glibc not compatible with AMD Geode LX" (https://bugzilla.re 30130030723/http://grafi.ii.pw.edu.pl/gbm/x86/3dundoc.html).
dhat.com/show_bug.cgi?id=579838#c46). grafi.ii.pw.edu.pl. Archived from the original (http://grafi.ii.pw.edu.pl/gbm/
106. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86do x86/3dundoc.html) on 30 January 2003. Retrieved 22 February 2022.
cs/Intel/SDMs/253667-015.pdf) (April 2005, order no 235667-015, does 128. Potemkin's Hacker Group's OPCODE.LST, v4.51 (http://phg.chat.ru/opc
not list 0F0D-nop) ode.txt)
107. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86do 129. "[UCA CPU Analysis] Prototype UMC Green CPU U5S-SUPER33" (http
cs/Intel/SDMs/253667-016.pdf) (June 2005, order no 235667-016, lists s://x86.fr/uca-cpu-analysis-prototype-umc-green-cpu-u5s-super33). 25
0F0D-nop in opcode table but not under NOP instruction description.) May 2020.
108. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86do 130. Agner Fog, The Microarchitecture of Intel, AMD and VIA CPUs (https://w
cs/Intel/SDMs/253667-060.pdf) (order no. 253667-060, September ww.agner.org/optimize/microarchitecture.pdf), section 3.4 "Branch
2016) does not list UD0 and UD1. Prediction in P4 and P4E".
109. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86do 131. Reddit /r/Amd discussion thread: Ryzen has undocumented support for
cs/Intel/SDMs/253667-061.pdf) (order no. 253667-061, December FMA4 (https://www.reddit.com/r/Amd/comments/68s4bj/ryzen_has_und
2016) lists UD0 and UD1. ocumented_support_for_fma4/dh0y353/)
110. "PCJS : pcjs/x86op0F.js (two-byte x86 opcode handlers), lines 1647- 132. "Welcome to the OpenSSL Project" (https://github.com/openssl/openssl/
1651" (https://github.com/jeffpar/pcjs/blob/e565ffa65d8ee5d600ec04e62 blob/1aa89a7a3afb053d0c0b7fad8d3ea1b0a5447289/engines/asm/e_p
c6651dabb4894cb/machines/pcx86/lib/x86op0f.js#L1647). GitHub. 17 adlock-x86.pl#L597). GitHub. 21 April 2022.
April 2022. 133. PATCH: Update PadLock engine for VIA C7 and Nano CPUs (https://ma
111. "80486 paging protection faults? \ VOGONS" (https://www.vogons.org/vi rc.info/?l=openssl-dev&m=130767391615291&w=2)
ewtopic.php?t=62949). Archived (https://web.archive.org/web/20220409 134. Christopher Domas, Breaking the x86 ISA (https://github.com/xoreaxeax
071156/https://www.vogons.org/viewtopic.php?t=62949) from the eax/sandsifter/blob/master/references/domas_breaking_the_x86_isa_w
original on 9 Apr 2022. p.pdf)
https://en.wikipedia.org/wiki/X86_instruction_listings 52/53
3/14/23, 12:36 AM x86 instruction listings - Wikipedia
135. Xixing Li et al, UISFuzz: An Efficient Fuzzing Method for CPU 138. Microprocessor Report, MediaGX Targets Low-Cost PCs (http://www.ce
Undocumented Instruction Searching (https://ieeexplore.ieee.org/abstra cs.uci.edu/~papers/mpr/MPR/ARTICLES/110301.PDF) (vol 11, no. 3,
ct/document/8863327) mar 10, 1997)
136. OpenEuler mailing list, PATCH kernel-4.19 v2 5/6 : x86/cpufeatures: 139. ISA datafile for Intel XED (https://github.com/intelxed/xed/blob/ef19f00d
Add Zhaoxin feature bits (https://mailweb.openeuler.org/hyperkitty/list/ke e14a9c2c253c1c9b1119e1617280e3f2/datafiles/xed-isa.txt#L916) (April
rnel@openeuler.org/thread/W6GXBRRO6OKNHVJ3WDDUXSLQGI2G 17, 2022), lines 916-944
FU4X/). Archived (https://web.archive.org/web/20220409071314/https:// 140. Cyrix 6x86 processor data book (https://www.ardent-tool.com/CPU/doc
mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/thread/W6G s/Cyrix/6x86/94175.pdf), page 6-34
XBRRO6OKNHVJ3WDDUXSLQGI2GFU4X/) on 9 Apr 2022. 141. AMD Geode LX Processors Data Book (https://www.amd.com/system/fil
137. CPUID dump for Zhaoxin KaiXian-U6870 (http://users.atw.hu/instlatx64/ es/TechDocs/33234H_LX_databook.pdf), publication 33234H, p.670
CentaurHauls/CentaurHauls00307B1_U6780A_CPUID2.txt), see
C0000001 line. Archived (https://web.archive.org/web/2022040907131
4/http://users.atw.hu/instlatx64/CentaurHauls/CentaurHauls00307B1_U
6780A_CPUID2.txt) on 9 Apr 2022.
Intel Corporation (April 2022). "Intel 64 and IA-32 Architectures Software Developer's Manual, Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D
and 4" (https://cdrdv2.intel.com/v1/dl/getContent/671200). Intel. Retrieved 21 June 2022.
External links
Free IA-32 and x86-64 documentation (https://software.intel.com/en-us/articles/intel-sdm), provided by Intel
x86 Opcode and Instruction Reference (http://ref.x86asm.net)
x86 and amd64 instruction reference (https://www.felixcloutier.com/x86/index.html)
Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs (https://www.agner.org/opti
mize/instruction_tables.pdf)
Netwide Assembler Instruction List (https://www.nasm.us/doc/nasmdocb.html) (from Netwide Assembler)
https://en.wikipedia.org/wiki/X86_instruction_listings 53/53