
3/14/23, 12:35 AM x86 instruction listings - Wikipedia

x86 instruction listings


The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable
program, often stored as a computer file and executed on the processor.

The x86 instruction set has been extended several times, introducing wider registers and datatypes as well as new functionality.[1]

x86 integer instructions

https://en.wikipedia.org/wiki/X86_instruction_listings 1/53
Below is the full 8086/8088 instruction set of Intel (81 instructions total). Most if not all of these instructions are available in 32-bit mode; they just
operate on 32-bit registers (eax, ebx, etc.) and values instead of their 16-bit (ax, bx, etc.) counterparts. The updated instruction set is also grouped
according to architecture (i386, i486, i686) and more generally is referred to as (32-bit) x86 and (64-bit) x86-64 (also known as AMD64).

Original 8086/8088 instructions

Original 8086/8088 instruction set


Instruction Meaning Notes Opcode

AAA ASCII adjust AL after addition used with unpacked binary-coded decimal 0x37
AAD ASCII adjust AX before division 8086/8088 datasheet documents only base 10 version of the AAD instruction (opcode 0xD5 0x0A), but any other base will work. Later Intel's documentation has the generic form too. NEC V20 and V30 (and possibly other NEC V-series CPUs) always use base 10, and ignore the argument, causing a number of incompatibilities 0xD5

AAM ASCII adjust AX after multiplication Only base 10 version (Operand is 0xA) is documented, see notes for AAD 0xD4

AAS ASCII adjust AL after subtraction 0x3F


ADC Add with carry destination = destination + source + carry_flag 0x10…0x15, 0x80…0x81/2, 0x82…0x83/2 (since 80186)

ADD Add (1) r/m += r/imm; (2) r += m/imm; 0x00…0x05, 0x80/0…0x81/0, 0x82/0…0x83/0 (since 80186)

AND Logical AND (1) r/m &= r/imm; (2) r &= m/imm; 0x20…0x25, 0x80…0x81/4, 0x82…0x83/4 (since 80186)

CALL Call procedure push eip; eip points to the instruction directly after the call 0x9A, 0xE8, 0xFF/2, 0xFF/3

CBW Convert byte to word 0x98


CLC Clear carry flag CF = 0; 0xF8

CLD Clear direction flag DF = 0; 0xFC

CLI Clear interrupt flag IF = 0; 0xFA


CMC Complement carry flag 0xF5

CMP Compare operands 0x38…0x3D, 0x80…0x81/7, 0x82…0x83/7 (since 80186)

CMPSB Compare bytes in memory. May be used with a REP prefix to repeat the instruction CX times. 0xA6

CMPSW Compare words. May be used with a REP prefix to repeat the instruction CX times. 0xA7

CWD Convert word to doubleword 0x99

DAA Decimal adjust AL after addition (used with packed binary-coded decimal) 0x27
DAS Decimal adjust AL after subtraction 0x2F

DEC Decrement by 1 0x48…0x4F, 0xFE/1, 0xFF/1

DIV Unsigned divide (1) AX = DX:AX / r/m; resulting DX = remainder (2) AL = AX / r/m; resulting AH = remainder 0xF7/6, 0xF6/6
ESC Used with floating-point unit 0xD8..0xDF

HLT Enter halt state 0xF4

IDIV Signed divide (1) AX = DX:AX / r/m; resulting DX = remainder (2) AL = AX / r/m; resulting AH = remainder 0xF7/7, 0xF6/7

IMUL Signed multiply in One-operand form (1) DX:AX = AX * r/m; (2) AX = AL * r/m 0x69, 0x6B (both since 80186), 0xF7/5, 0xF6/5, 0x0FAF (since 80386)

IN Input from port (1) AL = port[imm]; (2) AL = port[DX]; (3) AX = port[imm]; (4) AX = port[DX]; 0xE4, 0xE5, 0xEC, 0xED

INC Increment by 1 0x40…0x47, 0xFE/0, 0xFF/0
INT Call to interrupt 0xCC, 0xCD

INTO Call to interrupt if overflow 0xCE

IRET Return from interrupt 0xCF


Jcc Jump if condition (JA, JAE, JB, JBE, JC, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE, JPO, JS, JZ) 0x70…0x7F, 0x0F80…0x0F8F (since 80386)

JCXZ Jump if CX is zero 0xE3

JMP Jump 0xE9…0xEB, 0xFF/4, 0xFF/5
LAHF Load FLAGS into AH register 0x9F

LDS Load DS:r with far pointer 0xC5

LEA Load Effective Address 0x8D


LES Load ES:r with far pointer 0xC4

LOCK Assert BUS LOCK# signal (for multiprocessing) 0xF0


LODSB Load string byte. May be used with a REP prefix to repeat the instruction CX times. if (DF==0) AL = *SI++; else AL = *SI--; 0xAC

LODSW Load string word. May be used with a REP prefix to repeat the instruction CX times. if (DF==0) AX = *SI++; else AX = *SI--; 0xAD

LOOP/LOOPx Loop control (LOOPE, LOOPNE, LOOPNZ, LOOPZ) if (--CX && x) goto lbl; (CX is decremented unconditionally; x is the ZF condition and is always true for plain LOOP) 0xE0…0xE2
MOV Move copies data from one location to another, (1) r/m = r; (2) r = r/m; 0xA0...0xA3

MOVSB Move byte from string to string. May be used with a REP prefix to repeat the instruction CX times. if (DF==0) *(byte*)DI++ = *(byte*)SI++; else *(byte*)DI-- = *(byte*)SI--; 0xA4

MOVSW Move word from string to string. May be used with a REP prefix to repeat the instruction CX times. if (DF==0) *(word*)DI++ = *(word*)SI++; else *(word*)DI-- = *(word*)SI--; 0xA5

MUL Unsigned multiply (1) DX:AX = AX * r/m; (2) AX = AL * r/m; 0xF7/4, 0xF6/4

NEG Two's complement negation r/m *= -1; 0xF6/3…0xF7/3


NOP No operation opcode equivalent to XCHG EAX, EAX 0x90

NOT Negate the operand, logical NOT r/m ^= -1; 0xF6/2…0xF7/2

OR Logical OR (1) r/m |= r/imm; (2) r |= m/imm; 0x08…0x0D, 0x80…0x81/1, 0x82…0x83/1 (since 80186)

OUT Output to port (1) port[imm] = AL; (2) port[DX] = AL; (3) port[imm] = AX; (4) port[DX] = AX; 0xE6, 0xE7, 0xEE, 0xEF

POP Pop data from stack r/m = *SP++; POP CS (opcode 0x0F) works only on 8086/8088. Later CPUs use 0x0F as a prefix for newer instructions. 0x07, 0x0F (8086/8088 only), 0x17, 0x1F, 0x58…0x5F, 0x8F/0

POPF Pop FLAGS register from stack FLAGS = *SP++; 0x9D


PUSH Push data onto stack *--SP = r/m; 0x06, 0x0E, 0x16, 0x1E, 0x50…0x57, 0x68, 0x6A (both since 80186), 0xFF/6

PUSHF Push FLAGS onto stack *--SP = FLAGS; 0x9C

RCL Rotate left (with carry) 0xC0…0xC1/2 (since 80186), 0xD0…0xD3/2

RCR Rotate right (with carry) 0xC0…0xC1/3 (since 80186), 0xD0…0xD3/3

REPxx Repeat MOVS/STOS/CMPS/LODS/SCAS (REP, REPE, REPNE, REPNZ, REPZ) 0xF2, 0xF3

RET Return from procedure Not a real instruction. The assembler will translate these to a RETN or a RETF depending on the memory model of the target system.
RETN Return from near procedure 0xC2, 0xC3

RETF Return from far procedure 0xCA, 0xCB

ROL Rotate left 0xC0…0xC1/0 (since 80186), 0xD0…0xD3/0

ROR Rotate right 0xC0…0xC1/1 (since 80186), 0xD0…0xD3/1

SAHF Store AH into FLAGS 0x9E

SAL Shift Arithmetically left (signed shift left) (1) r/m <<= 1; (2) r/m <<= CL; 0xC0…0xC1/4 (since 80186), 0xD0…0xD3/4

SAR Shift Arithmetically right (signed shift right) (1) (signed) r/m >>= 1; (2) (signed) r/m >>= CL; 0xC0…0xC1/7 (since 80186), 0xD0…0xD3/7

SBB Subtraction with borrow alternative 1-byte encoding of SBB AL, AL is available via undocumented SALC instruction 0x18…0x1D, 0x80…0x81/3, 0x82…0x83/3 (since 80186)

SCASB Compare byte string. May be used with a REP prefix to repeat the instruction CX times. 0xAE

SCASW Compare word string. May be used with a REP prefix to repeat the instruction CX times. 0xAF

SHL Shift left (unsigned shift left) 0xC0…0xC1/4 (since 80186), 0xD0…0xD3/4

SHR Shift right (unsigned shift right) 0xC0…0xC1/5 (since 80186), 0xD0…0xD3/5
STC Set carry flag CF = 1; 0xF9

STD Set direction flag DF = 1; 0xFD


STI Set interrupt flag IF = 1; 0xFB

STOSB Store byte in string. May be used with a REP prefix to repeat the instruction CX times. if (DF==0) *ES:DI++ = AL; else *ES:DI-- = AL; 0xAA

STOSW Store word in string. May be used with a REP prefix to repeat the instruction CX times. if (DF==0) *ES:DI++ = AX; else *ES:DI-- = AX; 0xAB

SUB Subtraction (1) r/m -= r/imm; (2) r -= m/imm; 0x28…0x2D, 0x80…0x81/5, 0x82…0x83/5 (since 80186)

TEST Logical compare (AND) (1) r/m & r/imm; (2) r & m/imm; 0x84, 0x85, 0xA8, 0xA9, 0xF6/0, 0xF7/0
WAIT Wait until not busy Waits until BUSY# pin is inactive (used with floating-point unit) 0x9B

XCHG Exchange data r :=: r/m; A spinlock typically uses xchg as an atomic operation. (coma bug). 0x86, 0x87, 0x91…0x97

XLAT Table look-up translation behaves like MOV AL, [BX+AL] 0xD7

XOR Exclusive OR (1) r/m ^= r/imm; (2) r ^= m/imm; 0x30…0x35, 0x80…0x81/6, 0x82…0x83/6 (since 80186)
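
The AAD and AAM notes in the table above say that only the base 10 forms were originally documented, while other bases also work on Intel parts. The following is a minimal Python sketch of that arithmetic, assuming the usual register semantics (AAM: AH = AL / base, AL = AL mod base; AAD: AL = AH * base + AL, AH = 0). Flag updates are not modelled, and the function names `aam` and `aad` are illustrative only.

```python
def aam(al: int, base: int = 10) -> tuple[int, int]:
    """Model AAM: split AL into a quotient (new AH) and a remainder (new AL)."""
    if base == 0:
        # Real hardware raises a divide error (#DE) for an immediate of 0.
        raise ZeroDivisionError("AAM with an immediate operand of 0")
    al &= 0xFF
    return al // base, al % base  # (AH, AL)


def aad(ah: int, al: int, base: int = 10) -> tuple[int, int]:
    """Model AAD: fold AH into AL as AH * base + AL (8-bit), then clear AH."""
    new_al = ((ah & 0xFF) * base + (al & 0xFF)) & 0xFF
    return 0, new_al  # (AH, AL)


# Base 10: binary 43 becomes the unpacked BCD digits 4 and 3, and back again.
ah, al = aam(43)
restored = aad(ah, al)

# Base 16 (undocumented on the 8086): split a byte into its two nibbles.
nibbles = aam(0xAB, 16)
```

With a base of 16 the pair acts as a nibble split and join, which is the kind of use that breaks on NEC V20/V30 parts that ignore the operand.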

Added in specific processors

Added with 80186/80188

Instruction Opcode Meaning Notes

BOUND 62 /r Check array index against bounds raises software interrupt 5 if test fails

ENTER C8 iw ib Enter stack frame Modifies stack for entry to procedure for high level language. Takes two operands: the amount of storage to be allocated on the stack and the nesting level of the procedure.

INSB/INSW 6C, 6D Input from port to string equivalent to:
    IN AX, DX
    MOV ES:[DI], AX
    ; adjust DI according to operand size and DF

LEAVE C9 Leave stack frame Releases the local stack storage created by the previous ENTER instruction.

OUTSB/OUTSW 6E, 6F Output string to port equivalent to:
    MOV AX, DS:[SI]
    OUT DX, AX
    ; adjust SI according to operand size and DF

POPA 61 Pop all general purpose registers from stack equivalent to:
    POP DI
    POP SI
    POP BP
    POP AX ; no POP SP here, all it does is ADD SP, 2 (since AX will be overwritten later)
    POP BX
    POP DX
    POP CX
    POP AX

PUSHA 60 Push all general purpose registers onto stack equivalent to:
    PUSH AX
    PUSH CX
    PUSH DX
    PUSH BX
    PUSH SP ; The value stored is the initial SP value
    PUSH BP
    PUSH SI
    PUSH DI

PUSH immediate 6A ib, 68 iw Push an immediate byte/word value onto the stack example:
    PUSH 12h
    PUSH 1200h

IMUL immediate 6B /r ib, 69 /r iw Signed and unsigned multiplication of immediate byte/word value example:
    IMUL BX,12h
    IMUL DX,1200h
    IMUL CX, DX, 12h
    IMUL BX, SI, 1200h
    IMUL DI, word ptr [BX+SI], 12h
    IMUL SI, word ptr [BP-4], 1200h
  Note that since the lower half is the same for unsigned and signed multiplication, this version of the instruction can be used for unsigned multiplication as well.

SHL/SHR/SAL/SAR/ROL/ROR/RCL/RCR immediate C0, C1 Rotate/shift bits with an immediate value greater than 1 example:
    ROL AX,3
    SHR BL,3
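
The IMUL note above states that the lower half of a product is the same for signed and unsigned multiplication. A short Python sketch that checks this claim for 16-bit operands; the helper names are hypothetical and the sketch models only the truncated result, not the CF/OF flags that IMUL also sets.

```python
def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's-complement signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def low16_signed_product(a: int, b: int) -> int:
    """Low 16 bits of the product when both operands are read as signed."""
    return (to_signed16(a) * to_signed16(b)) & 0xFFFF


def low16_unsigned_product(a: int, b: int) -> int:
    """Low 16 bits of the product when both operands are read as unsigned."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFF


# A few boundary values; the two interpretations agree on every pair.
samples = [0x0000, 0x0001, 0x0012, 0x1200, 0x7FFF, 0x8000, 0xFFFF]
all_equal = all(
    low16_signed_product(a, b) == low16_unsigned_product(a, b)
    for a in samples
    for b in samples
)
```

The agreement follows from modular arithmetic: a signed and an unsigned reading of the same 16-bit pattern differ by a multiple of 2^16, which vanishes once the product is truncated to 16 bits.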

Added with 80286

Instruction Opcode Meaning Notes

ARPL r/m16, r16 63 /r Adjust RPL field of selector Available in 16/32-bit protected mode only. Causes #UD in Real mode and Virtual 8086 Mode - Windows 95 and OS/2 2.x are known to make extensive use of this #UD to use the 63 opcode as a one-byte breakpoint to transition from Virtual 8086 Mode to kernel mode.[2][3]

CLTS 0F 06 Clear task-switched flag in Machine Status Word.

LAR r,r/m16 0F 02 /r Load access rights byte from the specified segment descriptor Sets ZF=1 if the descriptor could be loaded, ZF=0 otherwise. The 32-bit variant of the LAR instruction is documented to load undefined data into bits 19:16 of the destination register on Intel CPUs.

LSL r,r/m16 0F 03 /r Load segment limit from the specified segment descriptor Sets ZF=1 if the descriptor could be loaded, ZF=0 otherwise.

LGDT m16&32 0F 01 /2 Load Global Descriptor Table Register
LIDT m16&32 0F 01 /3 Load Interrupt Descriptor Table Register
  Notes (LGDT, LIDT): Each of these instructions loads a 2-part table descriptor. The first part is a 16-bit value, specifying table size in bytes minus 1. The second part is a 32-bit value (64-bit value in 64-bit mode), specifying the linear start address for the table. This address is ANDed with 00FFFFFFh for the 16-bit variants of these instructions. LIDT can relocate the Interrupt Vector Table in Real Mode as well. LGDT and LIDT are serializing instructions.

LLDT r/m16 0F 00 /2 Load Local Descriptor Table Register
LTR r/m16 0F 00 /3 Load Task Register
  Notes (LLDT, LTR): LLDT and LTR are serializing instructions.

LMSW r/m16 0F 01 /6 Load Machine Status Word On 80386 and later, the "Machine Status Word" is the same as the CR0 register, however LMSW can only modify the bottom 4 bits of this register. LMSW can be used to enter but not leave x86 Protected Mode. On the 80286, it is not possible to leave Protected Mode at all without a CPU reset - on 80386 and later, it is possible to leave Protected Mode, but this requires the use of the 80386-and-later MOV to CR0 instruction. LMSW is a serializing instruction.

SGDT m16&32 0F 01 /0 Store Global Descriptor Table Register
SIDT m16&32 0F 01 /1 Store Interrupt Descriptor Table Register
SLDT r/m16 0F 00 /0 Store Local Descriptor Table Register
SMSW r/m16 0F 01 /4 Store Machine Status Word
STR r/m16 0F 00 /1 Store Task Register
  Notes (SGDT, SIDT, SLDT, SMSW, STR): The SGDT, SIDT, SLDT, SMSW and STR instructions were unprivileged on all x86 CPUs from 80286 onwards until the introduction of UMIP in 2017.[4] This has been a significant security problem for software-based virtualization, since it enables these instructions to be used by a VM guest to detect that it is running inside a VM.[5][6] The 16-bit variants of the SGDT and SIDT instructions also show a difference between Intel documentation and actual behavior observed on Intel CPUs: as of Intel SDM revision 076, December 2021 (https://kib.kiev.ua/x86docs/Intel/SDMs/325462-076.pdf), the last 8 bits of the descriptor are documented as being written as 0, however observed behavior is that bits 31:24 of the descriptor table address are written instead.[7] SLDT and SMSW (but not STR) with a 32-bit register argument are documented to set the top 16 bits of the specified register to an undefined value on Intel CPUs.

VERR r/m16 0F 00 /4 Verify a segment for reading Sets ZF=1 if segment can be read, ZF=0 otherwise.

VERW r/m16 0F 00 /5 Verify a segment for writing Sets ZF=1 if segment can be written, ZF=0 otherwise. On some Intel CPU/microcode combinations from 2019 onwards, the VERW instruction also flushes microarchitectural data buffers. This enables it to be used as part of workarounds for Microarchitectural Data Sampling security vulnerabilities.[8][9]

LOADALL 0F 05 Load all CPU registers, including internal ones such as GDT Undocumented, 80286 only. (A different variant of LOADALL with a different opcode and memory layout exists on 80386.)
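
The LGDT/LIDT notes above describe a 2-part table descriptor: a 16-bit "size in bytes minus 1" value followed by a 32-bit linear base address, with the base ANDed with 00FFFFFFh by the 16-bit variants. A minimal Python sketch of that layout outside 64-bit mode, assuming the usual little-endian x86 byte order; the function names are hypothetical and the sketch only models the memory image, not the privileged instruction itself.

```python
import struct


def pack_table_descriptor(base: int, size_in_bytes: int) -> bytes:
    """Build the 6-byte memory operand: 16-bit limit, then 32-bit base."""
    limit = size_in_bytes - 1  # the first field is the table size minus 1
    if not 0 <= limit <= 0xFFFF:
        raise ValueError("table size must be between 1 and 65536 bytes")
    # "<" selects little-endian with no padding: H = 16 bits, I = 32 bits.
    return struct.pack("<HI", limit, base & 0xFFFFFFFF)


def load_table_descriptor(image: bytes, operand_size: int = 32) -> tuple[int, int]:
    """Return (limit, base) as the CPU would use them after LGDT/LIDT."""
    limit, base = struct.unpack("<HI", image[:6])
    if operand_size == 16:
        base &= 0x00FFFFFF  # 16-bit variants keep only 24 address bits
    return limit, base


image = pack_table_descriptor(base=0x12345678, size_in_bytes=0x800)
loaded_32 = load_table_descriptor(image)
loaded_16 = load_table_descriptor(image, operand_size=16)
```

Keeping the mask in the load step rather than the pack step mirrors the text: the bytes in memory are the same for both variants, and only the interpretation differs.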

Added with 80386

Instruction Meaning Notes

BSF Bit scan forward
BSR Bit scan reverse
  Notes (BSF, BSR): BSF and BSR produce undefined results if the source argument is all-0s.

BT Bit test

BTC Bit test and complement
BTR Bit test and reset
BTS Bit test and set
  Notes (BTC, BTR, BTS): Instructions atomic only if LOCK prefix present.

CDQ Convert double-word to quad-word Sign-extends EAX into EDX, forming the quad-word EDX:EAX. Since (I)DIV uses EDX:EAX as its input, CDQ must be called after setting EAX if EDX is not manually initialized (as in 64/32 division) before (I)DIV.

CMPSD Compare string double-word Compares ES:[(E)DI] with DS:[(E)SI] and increments or decrements both (E)DI and (E)SI, depending on DF; can be prefixed with REP
CWDE Convert word to double-word Unlike CWD, CWDE sign-extends AX to EAX instead of AX to DX:AX

IBTS Insert Bit String Discontinued with B1 step of 80386.

IMUL Two-operand form of IMUL: Signed and Unsigned Multiplies two registers directly, storing only the truncated lower half of the result. Since the lower half is the same for unsigned and signed multiplication, this version of the instruction can be used for unsigned multiplication as well.
INSD Input from port to string double-word *(long)ES:EDI±± = port[DX]; (±± depends on DF, ES: cannot be overridden). Can be prefixed with REP.

IRETx Interrupt return; D suffix means 32-bit return, F suffix means do not generate epilogue code (i.e. LEAVE instruction) Use IRETD rather than IRET in 32-bit situations

Jxx (near) Jump conditionally Conditional near jump instructions for all 8086 Jxx short jump instructions
JECXZ Jump if ECX is zero

LFS, LGS Load far pointer


LSS Load stack segment and register Normally used to update both SS and SP at the same time.

LODSD Load string double-word EAX = *DS:(E)SI±±; (±± depends on DF, DS: can be overridden); can be prefixed with REP

LOOPW, LOOPccW Loop, conditional loop Same as LOOP, LOOPcc for earlier processors

LOOPD, LOOPccD Loop while equal if (--ECX && cc) goto lbl; (ECX is decremented unconditionally), cc = Z(ero), E(qual), NonZero, N(on)E(qual)

MOV to/from
Move to/from special registers CR=control registers, DR=debug registers, TR=test registers (up to 80486)
CR/DR/TR

MOVSD Move string double-word *(dword*)ES:EDI±± = *(dword*)ESI±±; (±± depends on DF); can be prefixed with REP
MOVSX Move with sign-extension (long)r = (signed char) r/m; and similar

MOVZX Move with zero-extension (long)r = (unsigned char) r/m; and similar

OUTSD Output to port from string double-word port[DX] = *(long*)DS:ESI±±; (±± depends on DF, DS: can be overridden); can be prefixed with REP.
POPAD Pop all double-word (32-bit) registers from stack Does not pop register ESP off of stack

POPFD Pop data into EFLAGS register

PUSHAD Push all double-word (32-bit) registers onto stack

PUSHFD Push EFLAGS register onto stack

PUSHD Push a double-word (32-bit) value onto stack

SCASD Scan string data double-word Compares ES:[(E)DI] with EAX and increments or decrements (E)DI, depending on DF; can be prefixed with REP
SETcc Set byte to one on condition, zero otherwise (SETA, SETAE, SETB, SETBE, SETC, SETE, SETG, SETGE, SETL, SETLE, SETNA, SETNAE, SETNB, SETNBE, SETNC, SETNE, SETNG, SETNGE, SETNL, SETNLE, SETNO, SETNP, SETNS, SETNZ, SETO, SETP, SETPE, SETPO, SETS, SETZ)

SHLD Shift left double r1 = r1<<CL ∣ r2>>(register_width - CL); Instead of CL, 8-bit immediate can be used.

SHRD Shift right double r1 = r1>>CL ∣ r2<<(register_width - CL); Instead of CL, 8-bit immediate can be used. SHLD and SHRD with 16-bit arguments and a shift-amount greater than 16 produce undefined results. (Actual results differ between different Intel CPUs, with at least three different behaviors known.[10])

STOSD Store string double-word *ES:EDI±± = EAX; (±± depends on DF, ES cannot be overridden); can be prefixed with REP

XBTS Extract Bit String Discontinued with B1 step of 80386. Used by software mainly for detection of the buggy[11] B0 stepping of the 80386. Microsoft Windows (v2.01 and later) will attempt to run the XBTS instruction as part of its CPU detection if CPUID is not present, and will refuse to boot if XBTS is found to be working.[12]

Compared to earlier sets, the 80386 instruction set also adds opcodes with different parameter combinations for the following instructions: BOUND,
IMUL, LDS, LES, MOV, POP, PUSH and prefix opcodes for FS and GS segment overrides.
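
The SHLD and SHRD formulas in the 80386 table can be sketched in Python as follows. This is an illustrative model under stated assumptions, not a reference implementation: the shift count is masked to 5 bits (the behaviour of 80386 and later for 16- and 32-bit operands), a count of 0 leaves the destination unchanged, counts larger than the register width (the case the table calls undefined) are rejected, and flags are not modelled.

```python
def _prepare(count: int, width: int) -> int:
    """Mask the count to 5 bits and refuse the architecturally undefined range."""
    count &= 31
    if count > width:
        raise ValueError("shift count greater than operand width is undefined")
    return count


def shld(r1: int, r2: int, count: int, width: int = 32) -> int:
    """r1 = r1 << count | r2 >> (width - count), truncated to width bits."""
    mask = (1 << width) - 1
    count = _prepare(count, width)
    if count == 0:
        return r1 & mask
    return ((r1 << count) | ((r2 & mask) >> (width - count))) & mask


def shrd(r1: int, r2: int, count: int, width: int = 32) -> int:
    """r1 = r1 >> count | r2 << (width - count), truncated to width bits."""
    mask = (1 << width) - 1
    count = _prepare(count, width)
    if count == 0:
        return r1 & mask
    return (((r1 & mask) >> count) | (r2 << (width - count))) & mask


left = shld(0x12345678, 0x9ABCDEF0, 8)   # top byte of r2 enters from the right
right = shrd(0x12345678, 0x9ABCDEF0, 8)  # low byte of r2 enters from the left
```

The explicit count == 0 branch matters because the raw formula would otherwise shift r2 by the full register width.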

Added with 80486

Instruction Opcode Meaning Notes

BSWAP r32 0F C8+r Byte Swap r = r<<24 | r<<8&0x00FF0000 | r>>8&0x0000FF00 | r>>24; Only defined for 32-bit registers. Usually used to change between little endian and big endian representations. When used with 16-bit registers produces various different results on 486,[13] 586, and Bochs/QEMU.[14]

CMPXCHG r/m8, r8 0F A6 /r[15], 0F B0 /r[17] Compare and Exchange
CMPXCHG r/m, r16/32 0F A7 /r, 0F B1 /r Compare and Exchange
  Notes (CMPXCHG): 0F A6/A7 encodings only available on 80486 stepping A.[16] 0F B0/B1 encodings available on 80486 stepping B and later x86 CPUs. Instruction atomic only if used with LOCK prefix.

INVD 0F 08 Invalidate Internal Caches Flush internal caches. Modified data present in the cache are not written back to memory, potentially causing data loss.

INVLPG m8 0F 01 /7 Invalidate TLB Entry Invalidate TLB Entry for page that contains data specified.

WBINVD 0F 09 Write Back and Invalidate Cache Writes back all modified cache lines in the processor's internal cache to main memory and invalidates the internal caches.

XADD r/m,r8 0F C0 /r eXchange and ADD
XADD r/m,r16/32 0F C1 /r eXchange and ADD
  Notes (XADD): Exchanges the first operand with the second operand, then loads the sum of the two values into the destination operand. Instruction atomic only if used with LOCK prefix.
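
The BSWAP expression and the XADD description in the table above translate directly into Python. The sketch below assumes 32-bit operands and adds explicit masking, because Python integers are unbounded; the function names are illustrative, and atomicity (the LOCK prefix) is outside what a pure function can model.

```python
MASK32 = 0xFFFFFFFF


def bswap32(r: int) -> int:
    """Reverse the byte order of a 32-bit value, following the table's formula."""
    r &= MASK32
    return (
        ((r << 24) & 0xFF000000)
        | ((r << 8) & 0x00FF0000)
        | ((r >> 8) & 0x0000FF00)
        | (r >> 24)
    )


def xadd32(dest: int, src: int) -> tuple[int, int]:
    """Return (new_dest, new_src): dest receives the sum, src the old dest."""
    dest &= MASK32
    src &= MASK32
    return (dest + src) & MASK32, dest


swapped = bswap32(0x12345678)
new_dest, new_src = xadd32(5, 3)
```

Applying `bswap32` twice returns the original value, which is why the same instruction converts in both directions between little-endian and big-endian representations.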

Added in P5/P6-class processors

Integer/system instructions that were not present in the basic 80486 instruction set, but were added in processors prior to the introduction of SSE.
(Discontinued instructions are not included.)


Instruction Opcode Description Ring Added in

RDMSR 0F 32 Read Model-specific register. The MSR to read is specified in ECX. The value of the MSR is then returned as a 64-bit value in EDX:EAX. Ring: 0.
WRMSR 0F 30 Write Model-specific register. The MSR to write is specified in ECX, and the data to write is given in EDX:EAX.[a] Instruction is, with some exceptions, serializing.[b] Ring: 0.
  Added in (RDMSR, WRMSR): IBM 386SLC,[18] Intel Pentium, AMD K5, Cyrix 6x86MX,MediaGXm, IDT WinChip C6, Transmeta Crusoe

RSM[20] 0F AA Resume from System Management Mode. Ring: -2 (SMM). Added in: Intel 386SL, 486SL,[c] Intel Pentium, AMD 5x86, Cyrix 486SLC/e,[21] IDT WinChip C6, Transmeta Crusoe, Rise mP6

CPUID 0F A2 CPU Identification and feature information. Takes as input a CPUID leaf index in EAX and, depending on leaf, a sub-index in ECX. Result is returned in EAX,EBX,ECX,EDX. Instruction is serializing, and causes a mandatory #VMEXIT under virtualization. Support for CPUID can be checked by toggling bit 21 of EFLAGS (EFLAGS.ID) - if this bit can be toggled, CPUID is present. Ring: 3. Added in: Intel Pentium,[d] AMD 5x86,[d] Cyrix 5x86,[e] IDT WinChip C6, Transmeta Crusoe, Rise mP6, NexGen Nx586,[f] UMC Green CPU

CMPXCHG8B m64 0F C7 /1 Compare and Exchange 8 bytes. Compares EDX:EAX with m64. If equal, set ZF and load ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX. Instruction atomic only if used with LOCK prefix.[g] Ring: 3. Added in: Intel Pentium, AMD K5, Cyrix 6x86L,MediaGXm, IDT WinChip C6,[h] Transmeta Crusoe,[h] Rise mP6[h]

RDTSC 0F 31 Read 64-bit Time Stamp Counter (TSC) into EDX:EAX.[i] In early processors, the TSC was a cycle counter, incrementing by 1 for each clock cycle (which could cause its rate to vary on processors that could change clock speed at runtime) - in later processors, it increments at a fixed rate that doesn't necessarily match the CPU clock speed.[j] Ring: 3[k]. Added in: Intel Pentium, AMD K5, Cyrix 6x86MX,MediaGXm, IDT WinChip C6, Transmeta Crusoe, Rise mP6

RDPMC 0F 33 Read Performance Monitoring Counter. The counter to read is specified by ECX and its value is returned in EDX:EAX.[i] Ring: 3[l]. Added in: Intel Pentium MMX, Intel Pentium Pro, AMD K7, Cyrix 6x86MX, IDT WinChip C6, VIA Nano[m]

CMOVcc reg,r/m 0F 4x /r[n] Conditional move to register. The source operand may be either register or memory.[o] Ring: 3. Added in: Intel Pentium Pro, AMD K7, Cyrix 6x86MX,MediaGXm, Transmeta Crusoe, VIA C3 "Nehemiah"

UD2 0F 0B Undefined Instruction. Generates an invalid opcode (#UD) exception in all operation modes. This instruction is provided for software testing to explicitly generate an invalid opcode. The opcode for this instruction is reserved for this purpose. Ring: (3). Added in: (80186),[p] Intel Pentium Pro

NOP r/m NFx 0F 1F /0 Official long NOP. Other than AMD K7/K8, broadly unsupported in non-Intel processors released before 2006.[q][28] Ring: 3. Added in: Intel Pentium Pro,[r] AMD K7, x86-64[s]

SYSCALL 0F 05 Fast System call. Ring: 3.
SYSRET 0F 07[w] Fast Return from System Call. Designed to be used together with SYSCALL. Ring: 0[x].
  Added in (SYSCALL, SYSRET): AMD K6[t], x86-64[u][v]

SYSENTER 0F 34 Fast System call. Ring: 3[x].
SYSEXIT 0F 35[w] Fast Return from System Call. Designed to be used together with SYSENTER. Ring: 0[x].
  Added in (SYSENTER, SYSEXIT): Intel Pentium II,[y] AMD K7,[36][z] Transmeta Crusoe,[aa] NatSemi Geode GX2, VIA C3 "Nehemiah"[ab]

a. On Intel and AMD CPUs, the WRMSR instruction is also used to update
the CPU microcode. This is done by writing the virtual address of the
new microcode to upload to MSR 79h on Intel CPUs and MSR
C001_0020h on AMD CPUs.


b. Writes to the following MSRs are not serializing:[19]

   Number Name
   48h SPEC_CTRL
   49h PRED_CMD
   122h TSX_CTRL
   6E0h TSC_DEADLINE
   6E1h PKRS
   774h HWP_REQUEST
   802h to 83Fh (x2APIC MSRs)

c. System Management Mode and the RSM instruction were made available on non-SL variants of the Intel 486 only after the initial release of the Intel Pentium in 1993.
d. CPUID is also available on some Intel and AMD 486 processor variants that were released after the initial release of the Intel Pentium.
e. On the Cyrix 5x86 and 6x86 CPUs, CPUID is not enabled by default and must be enabled through a Cyrix configuration register.
f. On NexGen CPUs, CPUID is only supported with some system BIOSes. On some NexGen CPUs that do support CPUID, EFLAGS.ID is not supported but EFLAGS.AC is, complicating CPU detection.[22]
g. LOCK CMPXCHG8B with a register operand (which is an invalid encoding) can cause hangs on some Intel Pentium CPUs (Pentium F00F bug).
h. On IDT WinChip, Transmeta Crusoe and Rise mP6 processors, the CMPXCHG8B instruction is always supported, however its CPUID bit may be missing. This is a workaround for a bug in Windows NT.[23]
i. The RDTSC and RDPMC instructions are not ordered with respect to other instructions, and may sample their respective counters before earlier instructions are executed or after later instructions have executed. Invocations of RDPMC (but not RDTSC) may be reordered relative to each other even for reads of the same counter.

   In order to impose ordering with respect to other instructions, LFENCE or serializing instructions (e.g. CPUID) are needed.[24]

j. Fixed-rate TSC was introduced in two stages:

   Constant TSC
      TSC running at a fixed rate as long as the processor core is not in a deep-sleep (C2 or deeper) mode, but not synchronized between CPU cores. Introduced in Intel Prescott, Yonah and Bonnell. Also present in all Transmeta and VIA Nano[25] CPUs. Does not have a CPUID bit.
   Invariant TSC
      TSC running at a fixed rate synchronized between CPU cores in all P-,C- and T-states (but not necessarily S-states). Present in AMD K10 and later; Intel Nehalem/Saltwell[26] and later; Zhaoxin LuJiaZui[27]. Indicated with a CPUID bit (leaf 8000_0007:EDX[8]).

k. RDTSC can be run outside Ring 0 only if CR4.TSD=0.
l. RDPMC can be run outside Ring 0 only if CR4.PCE=1.
m. The RDPMC instruction is not present in VIA processors prior to the Nano.
n. The condition codes supported for CMOVcc instruction (opcode 0F 4x /r, with the x nibble specifying the condition) are:

   x cc Condition (EFLAGS)
   0 O OF=1: "Overflow"
   1 NO OF=0: "Not Overflow"
   2 C,B,NAE CF=1: "Carry", "Below", "Not Above or Equal"
   3 NC,NB,AE CF=0: "Not Carry", "Not Below", "Above or Equal"
   4 Z,E ZF=1: "Zero", "Equal"
   5 NZ,NE ZF=0: "Not Zero", "Not Equal"
   6 NA,BE (CF=1 or ZF=1): "Not Above", "Below or Equal"
   7 A,NBE (CF=0 and ZF=0): "Above", "Not Below or Equal"
   8 S SF=1: "Sign"
   9 NS SF=0: "Not Sign"
   A P,PE PF=1: "Parity", "Parity Even"
   B NP,PO PF=0: "Not Parity", "Parity Odd"
   C L,NGE SF≠OF: "Less", "Not Greater Or Equal"
   D NL,GE SF=OF: "Not Less", "Greater Or Equal"
   E LE,NG (ZF=1 or SF≠OF): "Less or Equal", "Not Greater"
   F NLE,G (ZF=0 and SF=OF): "Not Less or Equal", "Greater"

o. For CMOVcc with a memory source operand, the CPU will always read the operand from memory (potentially causing memory exceptions and cache line-fills) even if the condition for the move is not satisfied.
p. UD2 will cause an #UD exception on all x86 processors from the 80186 onwards, but is only documented to do so from Pentium Pro onwards.
q. Unlike other instructions added in Pentium Pro, long NOP does not have a CPUID feature bit.
r. 0F 1F /0 as long-NOP was introduced in the Pentium Pro, but remained undocumented until 2006.[29] The whole 0F 18..1F opcode range was NOP in Pentium Pro. However, except for 0F 1F /0, Intel does not guarantee that these opcodes will remain NOP in future processors, and has indeed assigned some of these opcodes to other instructions in at least some processors.[30]
s. Documented for AMD x86-64 since 2002.[31]
t. On K6, the SYSCALL/SYSRET instructions were available on Model 7 (250nm "Little Foot") and later, not on the earlier Model 6.[32]
u. SYSCALL and SYSRET were made an integral part of x86-64 - as a result, the instructions are available in 64-bit mode on all x86-64 processors from AMD, Intel, VIA and Zhaoxin (except Xeon Phi "Knights Corner").

   Outside 64-bit mode, the instructions are available on AMD processors only.

v. The exact semantics of SYSRET differs slightly between AMD and Intel processors - this has been known to cause security issues.[33]
w. For the SYSRET and SYSEXIT instructions under x86-64, it is necessary to add the REX.W prefix for variants that will return to 64-bit user-mode code.

   Encodings of these instructions without the REX.W prefix are used to return to 32-bit user-mode code. (Neither of these instructions can be used to return to 16-bit user-mode code.)

x. The SYSRET, SYSENTER and SYSEXIT instructions are unavailable in Real mode. (SYSENTER is, however, available in Virtual 8086 mode.)
y. The CPUID flags that indicate support for SYSENTER/SYSEXIT are set on the Pentium Pro, even though the processor does not officially support these instructions.[34]

   Third party testing indicates that the opcodes are present on the Pentium Pro but too buggy to be usable.[35]

z. On AMD CPUs, the SYSENTER and SYSEXIT are not available in x86-64 long mode (#UD).
aa. On Transmeta CPUs, the SYSENTER and SYSEXIT instructions are only available with version 4.2 or higher of the Transmeta Code Morphing software.[37]
ab. On Nehemiah, SYSENTER and SYSEXIT are available only on stepping 8 and later.[38]
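
The CMOVcc condition-code note (x nibble of opcode 0F 4x /r) can be expressed compactly, because each odd code is the negation of the even code before it. The Python sketch below is an illustrative evaluator under that reading of the note; the function name and the keyword-argument flag interface are assumptions made for the example.

```python
def condition_met(x: int, cf: int = 0, zf: int = 0, sf: int = 0,
                  of: int = 0, pf: int = 0) -> bool:
    """Evaluate condition nibble x (0x0..0xF) against the given EFLAGS bits."""
    if not 0 <= x <= 0xF:
        raise ValueError("condition code must be a 4-bit value")
    # The upper three bits select one of eight base predicates.
    base = (
        of == 1,                # 0: O
        cf == 1,                # 2: C / B / NAE
        zf == 1,                # 4: Z / E
        cf == 1 or zf == 1,     # 6: NA / BE
        sf == 1,                # 8: S
        pf == 1,                # A: P / PE
        sf != of,               # C: L / NGE
        zf == 1 or sf != of,    # E: LE / NG
    )[x >> 1]
    # The lowest bit inverts the predicate (NO, NC, NZ, A, NS, NP, GE, G).
    return base != bool(x & 1)


greater = condition_met(0xF, zf=0, sf=0, of=0)        # "Greater"
below_or_equal = condition_met(0x6, cf=1)              # "Below or Equal"
less = condition_met(0xC, sf=1, of=0)                  # "Less"
```

The same nibble encoding is shared by Jcc (0x70…0x7F) and SETcc, so an evaluator of this shape would apply to those families as well.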

Added as instruction set extensions

Added with x86-64

These instructions can only be encoded in 64 bit mode. They fall in four groups:

original instructions that reuse existing opcodes for a different purpose (MOVSXD replacing ARPL)
original instructions with new opcodes (SWAPGS)
existing instructions extended to a 64 bit address size (JRCXZ)
existing instructions extended to a 64 bit operand size (remaining instructions)

Most instructions with a 64 bit operand size encode this using a REX.W prefix; in the absence of the REX.W prefix, the corresponding instruction with 32 bit
operand size is encoded. This mechanism also applies to most other instructions with 32 bit operand size. These are not listed here as they do not gain a
new mnemonic in Intel syntax when used with a 64 bit operand size.
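
The REX.W mechanism described above can be illustrated with a few bytes. The Python sketch below assumes the standard REX prefix layout (binary 0100WRXB, so REX.W alone is 48h) and shows how the same opcode byte selects a 32-bit or 64-bit instruction; the helper names are hypothetical and no operand (ModR/M) encoding is attempted.

```python
def rex_prefix(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX prefix byte: fixed high nibble 0100, then the W, R, X, B bits."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def encode(opcode: bytes, rex_w: bool) -> bytes:
    """Prepend REX.W to an operand-less opcode when 64-bit operand size is wanted."""
    prefix = bytes([rex_prefix(w=1)]) if rex_w else b""
    return prefix + opcode


cwde = encode(b"\x98", rex_w=False)  # 98: sign-extend AX into EAX
cdqe = encode(b"\x98", rex_w=True)   # REX.W 98: sign-extend EAX into RAX
cqo = encode(b"\x99", rex_w=True)    # REX.W 99: sign-extend RAX into RDX:RAX
```

This is why the table that follows lists encodings such as "REX.W 98": the mnemonic changes, but only the prefix byte differs from the 32-bit form.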

Instruction Encoding Meaning

CDQE REX.W 98 Sign extend EAX into RAX

CQO REX.W 99 Sign extend RAX into RDX:RAX


CMPSQ REX.W A7 CoMPare String Quadword

CMPXCHG16B m128[a][b] REX.W 0F C7 /1 CoMPare and eXCHanGe 16 Bytes. Atomic only if used with LOCK prefix.

IRETQ REX.W CF 64-bit Return from Interrupt


JRCXZ rel8 E3 cb Jump if RCX is zero

LODSQ REX.W AD LoaD String Quadword

MOVSXD r64,r/m32 REX.W 63 /r[c] MOV with Sign Extend 32-bit to 64-bit

MOVSQ REX.W A5 Move String Quadword

POPFQ 9D POP RFLAGS Register


PUSHFQ 9C PUSH RFLAGS Register

SCASQ REX.W AF SCAn String Quadword

STOSQ REX.W AB STOre String Quadword


SWAPGS 0F 01 F8 Exchange GS base with KernelGSBase MSR

a. The memory operand to CMPXCHG16B must be 16-byte aligned.


b. The CMPXCHG16B instruction was absent from a few of the earliest Intel/AMD x86-64 processors. On Intel processors, the instruction was missing from
Xeon "Nocona" stepping D,[39] but added in stepping E.[40] On AMD K8 family processors, it was added in stepping F, at the same time as DDR2
support was introduced.[41]

For this reason, CMPXCHG16B has its own CPUID flag, separate from the rest of x86-64.

c. Encodings of MOVSXD without REX.W prefix are permitted but discouraged[42] - such encodings behave identically to 16/32-bit MOV (8B /r).
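
Several entries in the x86-64 table above (CDQE, CQO, MOVSXD) are sign extensions to a wider register. A minimal Python sketch of those three operations on unsigned bit patterns; the function names mirror the mnemonics but are otherwise illustrative, and register pairs are returned as tuples.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits of value to a to_bits-wide bit pattern."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):      # top bit set: the value is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def cdqe(eax: int) -> int:
    """CDQE: sign-extend EAX into RAX."""
    return sign_extend(eax, 32, 64)


def cqo(rax: int) -> tuple[int, int]:
    """CQO: sign-extend RAX into RDX:RAX, returned as (RDX, RAX)."""
    full = sign_extend(rax, 64, 128)
    return full >> 64, full & ((1 << 64) - 1)


def movsxd(src32: int) -> int:
    """MOVSXD r64, r/m32 with REX.W: sign-extend a 32-bit source to 64 bits."""
    return sign_extend(src32, 32, 64)


rax = cdqe(0x80000000)
rdx_rax = cqo(0x8000000000000000)
```

As with CDQ before a 32-bit IDIV, CQO is the natural way to prepare RDX:RAX for a 64-bit signed division.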

Bit manipulation extensions

Bit manipulation instructions. For all of the VEX-encoded instructions defined by BMI1 and BMI2, the operand size may be 32 or 64 bits, controlled by
the VEX.W bit - none of these instructions are available in 16-bit variants.


Instruction mnemonics     Opcode                    (instruction description on the following indented lines)

ABM (LZCNT)[a] - Advanced Bit Manipulation
Added in: K10, Bobcat, Haswell, ZhangJiang, Gracemont

POPCNT r16,r/m16          F3 0F B8 /r
POPCNT r32,r/m32          F3 0F B8 /r
POPCNT r64,r/m64          F3 REX.W 0F B8 /r
    Population Count. Counts the number of bits that are set to 1 in its source argument.

LZCNT r16,r/m16           F3 0F BD /r
LZCNT r32,r/m32           F3 0F BD /r
LZCNT r64,r/m64           F3 REX.W 0F BD /r
    Count Leading zeroes.[b]
    If source operand is all-0s, then LZCNT will return operand size in bits (16/32/64) and set CF=1.

BMI1 - Bit Manipulation Instruction Set 1
Added in: Haswell, Piledriver, Jaguar, ZhangJiang, Gracemont

TZCNT r16,r/m16           F3 0F BC /r
TZCNT r32,r/m32           F3 0F BC /r
TZCNT r64,r/m64           F3 REX.W 0F BC /r
    Count Trailing zeroes.[c]
    If source operand is all-0s, then TZCNT will return operand size in bits (16/32/64) and set CF=1.

ANDN ra,rb,r/m            VEX.LZ.0F38 F2 /r
    Bitwise AND-NOT: ra = r/m AND NOT(rb)

BEXTR ra,r/m,rb           VEX.LZ.0F38 F7 /r
    Bitfield extract. Bitfield start position is specified in bits [7:0] of rb, length in bits [15:8] of rb. The bitfield is then
    extracted from the r/m value with zero-extension, then stored in ra. Equivalent to[d]
        mask = (1 << rb[15:8]) - 1
        ra = (r/m >> rb[7:0]) AND mask

BLSI reg,r/m              VEX.LZ.0F38 F3 /3
    Extract lowest set bit in source argument. Returns 0 if source argument is 0. Equivalent to
        dst = (-src) AND src

BLSMSK reg,r/m            VEX.LZ.0F38 F3 /2
    Generate a bitmask of all-1s bits up to the lowest bit position with a 1 in the source argument. Returns all-1s if source
    argument is 0. Equivalent to
        dst = (src-1) XOR src

BLSR reg,r/m              VEX.LZ.0F38 F3 /1
    Copy all bits of the source argument, then clear the lowest set bit. Equivalent to
        dst = (src-1) AND src

BMI2 - Bit Manipulation Instruction Set 2
Added in: Haswell, Excavator,[e] ZhangJiang, Gracemont

BZHI ra,r/m,rb            VEX.LZ.0F38 F5 /r
    Zero out high-order bits in r/m starting from the bit position specified in rb, then write result to ra. Equivalent to
        ra = r/m AND NOT(-1 << rb[7:0])

MULX ra,rb,r/m            VEX.LZ.F2.0F38 F6 /r
    Widening unsigned integer multiply without setting flags. Multiplies EDX/RDX with r/m, then stores the high half of the
    multiplication result in ra and the low half in rb. If ra and rb specify the same register, only the high half of the
    result is stored.

PDEP ra,rb,r/m            VEX.LZ.F2.0F38 F5 /r
    Parallel Bit Deposit. Scatters contiguous bits from rb to the bit positions set in r/m, then stores result to ra.
    Operation performed is:
        ra=0; k=0; mask=r/m
        for i=0 to opsize-1 do
            if (mask[i] == 1) then
                ra[i]=rb[k]; k=k+1

PEXT ra,rb,r/m            VEX.LZ.F3.0F38 F5 /r
    Parallel Bit Extract. Uses r/m argument as a bit mask to select bits in rb, then compacts the selected bits into a
    contiguous bit-vector. Operation performed is:
        ra=0; k=0; mask=r/m
        for i=0 to opsize-1 do
            if (mask[i] == 1) then
                ra[k]=rb[i]; k=k+1

RORX reg,r/m,imm8         VEX.LZ.F2.0F3A F0 /r ib
    Rotate right by immediate without affecting flags.

SARX ra,r/m,rb            VEX.LZ.F3.0F38 F7 /r
    Arithmetic shift right without updating flags.

SHRX ra,r/m,rb            VEX.LZ.F2.0F38 F7 /r
    Logical shift right without updating flags.

SHLX ra,r/m,rb            VEX.LZ.66.0F38 F7 /r
    Shift left without updating flags.

For SARX, SHRX and SHLX, the shift-amount specified in rb is masked to 5 bits for 32-bit operand size and 6 bits for 64-bit
operand size.
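
The "Equivalent to" formulas and the PDEP/PEXT loops given in the table above can be checked with a small software model. The following is a minimal sketch, not a definitive reference: the function names and the opsize parameter are assumptions made for illustration, registers are modelled as unsigned Python integers, and flag results are not modelled.

```python
def _mask(opsize):
    """All-ones value for the given operand size (32 or 64 bits)."""
    return (1 << opsize) - 1


def andn(rb, rm, opsize=32):
    return rm & ~rb & _mask(opsize)


def bextr(rm, rb, opsize=32):
    start = rb & 0xFF            # rb[7:0]
    length = (rb >> 8) & 0xFF    # rb[15:8]
    return ((rm & _mask(opsize)) >> start) & ((1 << length) - 1)


def blsi(src, opsize=32):
    return (-src) & src & _mask(opsize)


def blsmsk(src, opsize=32):
    return ((src - 1) ^ src) & _mask(opsize)


def blsr(src, opsize=32):
    return (src - 1) & src & _mask(opsize)


def bzhi(rm, rb, opsize=32):
    return rm & ~(-1 << (rb & 0xFF)) & _mask(opsize)


def pdep(src, mask, opsize=32):
    """Scatter the low bits of src to the positions of the set bits in mask."""
    result, k = 0, 0
    for i in range(opsize):
        if (mask >> i) & 1:
            result |= ((src >> k) & 1) << i
            k += 1
    return result


def pext(src, mask, opsize=32):
    """Gather the bits of src selected by mask into a contiguous low bit-vector."""
    result, k = 0, 0
    for i in range(opsize):
        if (mask >> i) & 1:
            result |= ((src >> i) & 1) << k
            k += 1
    return result


if __name__ == "__main__":
    print(bin(pdep(0b101, 0b11010)), bin(pext(0b10010, 0b11010)))
```

In this model PEXT undoes PDEP when both use the same mask, provided the deposited value fits in the number of bits set in the mask.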

a. On AMD CPUs, the "ABM" extension provides both POPCNT and LZCNT. On Intel CPUs, however, the CPUID bit for "ABM" is only documented to
indicate the presence of the LZCNT instruction and is listed as "LZCNT", while POPCNT has its own separate CPUID feature bit.

However, all known processors that implement the "ABM"/"LZCNT" extensions also implement POPCNT and set the CPUID feature bit for POPCNT, so
the distinction is theoretical only.

(The converse is not true - there exist processors that support POPCNT but not ABM, such as Intel Nehalem and VIA Nano 3000.)
b. The LZCNT instruction will execute as BSR on systems that do not support the LZCNT or ABM extensions. BSR computes the index of the highest set bit
in the source operand, producing a different result from LZCNT for most input values.
c. The TZCNT instruction will execute as BSF on systems that do not support the BMI1 extension. BSF produces the same result as TZCNT for all input
operand values except zero - for which TZCNT returns input operand size, but BSF produces undefined behavior (leaves destination unmodified on
most modern CPUs).
d. For BEXTR, the start position and length are not masked and can take values from 0 to 255. If the selected bits extend beyond the end of the r/m
argument (which has the usual 32/64-bit operand size), then the excess bits are read out as 0.
e. On AMD processors before Zen 3, the PEXT and PDEP instructions are quite slow[43] and exhibit data-dependent timing due to the use of a microcoded
implementation (about 18 to 300 cycles, depending on the number of bits set in the mask argument). As a result, it is often faster to use other
instruction sequences on these processors.[44][45]
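
Notes b and c above describe how LZCNT and TZCNT relate to the older BSR and BSF instructions. The following sketch models the four results in software, assuming made-up function names and using None to stand for the undefined destination that BSR and BSF leave for a zero input. The CF/ZF flag results are not modelled.

```python
def lzcnt(value, opsize=32):
    """Number of leading zero bits; returns opsize for a zero input."""
    value &= (1 << opsize) - 1
    return opsize - value.bit_length()


def tzcnt(value, opsize=32):
    """Number of trailing zero bits; returns opsize for a zero input."""
    value &= (1 << opsize) - 1
    if value == 0:
        return opsize
    return (value & -value).bit_length() - 1


def bsr(value, opsize=32):
    """Index of the highest set bit; None models the undefined result for zero."""
    value &= (1 << opsize) - 1
    if value == 0:
        return None
    return value.bit_length() - 1


def bsf(value, opsize=32):
    """Index of the lowest set bit; None models the undefined result for zero."""
    value &= (1 << opsize) - 1
    if value == 0:
        return None
    return (value & -value).bit_length() - 1


if __name__ == "__main__":
    for x in (1, 8, 0x80000000):
        print(x, lzcnt(x), bsr(x), tzcnt(x), bsf(x))
```

For non-zero inputs the model gives lzcnt(x) = opsize - 1 - bsr(x), so code that falls back to BSR on a processor without LZCNT sees a different value for every non-zero input except those where the two happen to coincide, while TZCNT and BSF agree for every non-zero input.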

Added with Intel TSX

Instruction       Opcode          (description on the following indented lines)

RTM - Restricted Transactional memory
Added in: Haswell (Deprecated on desktop/laptop CPUs from 10th generation (Ice Lake, Comet Lake) onwards, but continues to be
available on Xeon-branded server parts (e.g. Ice Lake-SP, Sapphire Rapids))

XBEGIN rel16      C7 F8 cw
XBEGIN rel32      C7 F8 cd
    Start transaction. If transaction fails, perform a branch to the given relative offset.

XABORT imm8       C6 F8 ib
    Abort transaction with 8-bit immediate as error code.

XEND              NP 0F 01 D5
    End transaction.

XTEST             NP 0F 01 D6
    Test if in transactional execution. Sets EFLAGS.ZF to 0 if executed inside a transaction (RTM or HLE), 1 otherwise.

HLE - Hardware Lock Elision
Added in: Haswell (Discontinued - the last processors to support HLE were Coffee Lake and Cascade Lake)

XACQUIRE          F2
    Instruction prefix to indicate start of hardware lock elision, used with memory atomic instructions only (for other
    instructions, the F2 prefix may have other meanings). When used with such instructions, may start a transaction instead of
    performing the memory atomic operation.

XRELEASE          F3
    Instruction prefix to indicate end of hardware lock elision, used with memory atomic/store instructions only (for other
    instructions, the F3 prefix may have other meanings). When used with such instructions during hardware lock elision, will
    end the associated transaction instead of performing the store/atomic.

TSXLDTRK - Load Address Tracking suspend/resume
Added in: Sapphire Rapids

XSUSLDTRK         F2 0F 01 E8
    Suspend Tracking Load Addresses

XRESLDTRK         F2 0F 01 E9
    Resume Tracking Load Addresses

Added with Intel CET

Intel CET (Control-Flow Enforcement Technology) adds two distinct features to help protect against security exploits such as return-oriented
programming: a shadow stack (CET_SS), and indirect branch tracking (CET_IBT).


Instruction         Opcode                   Ring    (description on the following indented lines)

CET_SS - Shadow stack
Added in: Tiger Lake, Zen 3

When shadow stacks are enabled, return addresses are pushed on both the regular stack and the shadow stack when a function call
is made. They are then both popped on return from the function call - if they do not match, then the stack is assumed to be
corrupted, and a #CP exception is issued.

The shadow stack is additionally required to be stored in specially marked memory pages which cannot be modified by normal
memory store instructions.

INCSSPD r32         F3 0F AE /5              3
INCSSPQ r64         F3 REX.W 0F AE /5        3
    Increment shadow stack pointer

RDSSPD r32          F3 0F 1E /1              3
    Read shadow stack pointer into register (low 32 bits)[a]

RDSSPQ r64          F3 REX.W 0F 1E /1        3
    Read shadow stack pointer into register (full 64 bits)[a]

SAVEPREVSSP         F3 0F 01 EA              3
    Save previous shadow stack pointer

RSTORSSP m64        F3 0F 01 /5              3
    Restore saved shadow stack pointer

WRSSD m32,r32       0F 38 F6 /r              3
    Write 4 bytes to shadow stack

WRSSQ m64,r64       REX.W 0F 38 F6 /r        3
    Write 8 bytes to shadow stack

WRUSSD m32,r32      66 0F 38 F5 /r           0
    Write 4 bytes to user shadow stack

WRUSSQ m64,r64      66 REX.W 0F 38 F5 /r     0
    Write 8 bytes to user shadow stack

SETSSBSY            F3 0F 01 E8              0
    Mark shadow stack busy

CLRSSBSY m64        F3 0F AE /6              0
    Clear shadow stack busy flag

CET_IBT - Indirect Branch Tracking
Added in: Tiger Lake

When IBT is enabled, an indirect branch (jump, call, return) to any instruction that is not an ENDBR32/64 instruction will cause
a #CP exception.

ENDBR32             F3 0F 1E FB              3
    Terminate indirect branch in 32-bit mode[b]

ENDBR64             F3 0F 1E FA              3
    Terminate indirect branch in 64-bit mode[b]

NOTRACK             3E[c]                    3
    Prefix used with indirect CALL/JMP near instructions (opcodes FF /2 and FF /4) to indicate that the branch target is not
    required to start with an ENDBR32/64 instruction. Prefix only honored when NO_TRACK_EN flag is set.

a. The RDSSPD and RDSSPQ instructions act as NOPs on processors where shadow stacks are disabled or CET is not supported.
b. ENDBR32 and ENDBR64 act as NOPs on processors that don't support CET_IBT or where IBT is disabled.

c. This prefix has the same encoding as the DS: segment override prefix - as of April 2022, Intel documentation does not appear to specify whether this
prefix also retains its old segment-override function when used as a no-track prefix, nor does it provide an official mnemonic for this prefix.[46][47] (GNU
binutils use "notrack"[48])
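
The return-address check described for CET_SS above can be illustrated with a toy software model. This is only a sketch of the idea: the class and method names are hypothetical, the two stacks are plain Python lists, and none of the real mechanisms (shadow stack pointer, page protection, tokens, busy flag) are modelled.

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP exception raised on a return-address mismatch."""


class ToyShadowStackCpu:
    def __init__(self):
        self.stack = []         # regular stack, writable by ordinary stores
        self.shadow_stack = []  # shadow stack, not writable by ordinary stores

    def call(self, return_address):
        # A call pushes the return address on both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # A return pops both copies and compares them.
        address = self.stack.pop()
        shadow_address = self.shadow_stack.pop()
        if address != shadow_address:
            raise ControlProtectionFault(
                f"return address {address:#x} != shadow copy {shadow_address:#x}"
            )
        return address


if __name__ == "__main__":
    cpu = ToyShadowStackCpu()
    cpu.call(0x401000)
    cpu.stack[-1] = 0xDEADBEEF  # simulated overwrite of the regular stack
    try:
        cpu.ret()
    except ControlProtectionFault as fault:
        print("fault:", fault)
```

In the model, an overwrite of the regular stack is detected because the attacker-controlled store cannot reach the shadow copy.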

Added with other cross-vendor extensions

Instruction
Instruction Set Extension Opcode Instruction description Ring Added in
mnemonics

Prefetch with Non-Temporal Access.

Prefetch data under the assumption that


the data will be used only once, and
PREFETCHNTA m8 0F 18 /0 attempt to minimize cache pollution from
said data. The methods used to minimize
cache pollution are implementation-
dependent.[b] Pentium III,
SSE[a] 3 K7,
(non-SIMD) Nehemiah
PREFETCHT0 m8 0F 18 /1 Prefetch data to all levels of the cache hierarchy.[b]
Prefetch data to all levels of the cache hierarchy
PREFETCHT1 m8 0F 18 /2
except L1 cache.[b]
Prefetch data to all levels of the cache hierarchy
PREFETCHT2 m8 0F 18 /3
except L1 and L2 caches.[b]

SFENCE NP 0F AE F8+x[c] Store Fence.[d]

LFENCE NP 0F AE E8+x[c] Load Fence and Dispatch Serialization.[e]

MFENCE NP 0F AE F0+x[c] Memory Fence.[f]


MOVNTI m32,r32 NP 0F C3 /r
Non-Temporal Memory Store. Pentium 4,
SSE2 MOVNTI m64,r64 NP REX.W 0F C3 /r
3 K8,
(non-SIMD)
Pauses CPU thread for a short time period.[g] C7 Esther

PAUSE F3 90
Intended for use in spinlocks.[h]

Flush one cache line to memory.

In a system with multiple cache hierarchy


CLFSH[i] levels and/or multiple processors each (SSE2),
CLFLUSH m8 NP 0F AE /7 3
Cache Line Flush. Geode LX
with their own caches, the line is flushed
from all of them.

Start monitoring a memory location for memory


writes. The memory address to monitor is given by
MONITOR[k] DS:AX/EAX/RAX.[l] ECX and EDX are used to
NP 0F 01 C8
MONITOR EAX,ECX,EDX provide extra extension and hint flags.

Prescott,
MONITOR[j] Yonah,
Monitor a memory location for Wait for a write to a monitored memory location 0[m] Bonnell,
memory writes. previously specified with MONITOR.[n] K10,
Nano
MWAIT[k] NP 0F 01 C9
MWAIT EAX,ECX
EAX and ECX are used to provide extra
extension and hint flags.

SMX
Safer Mode Extensions.
Conroe/Merom,
Load, authenticate and execute Perform an SMX function. The function to perform
GETSEC NP 0F 37 0 WuDaoKou,[58]
a digitally signed "Authenticated is given in EAX.[o]
Tremont
Code Module" as part of Intel
Trusted Execution Technology.

XSAVE mem NP 0F AE /4 Save state components specified by EDX:EAX to 3 Penryn,[p]


XSAVE XSAVE64 mem NP REX.W 0F AE /4 memory. Bulldozer,
Processor Extended State Jaguar,
Save/Restore. XRSTOR mem NP 0F AE /5 Restore state components specified by EDX:EAX Goldmont,
XRSTOR64 mem NP REX.W 0F AE /5 from memory. ZhangJiang
XGETBV NP 0F 01 D0 Get value of Extended Control Register.

Reads an XCR specified by ECX into


EDX:EAX.


Set Extended Control Register.

XSETBV NP 0F 01 D1 Write the value in EDX:EAX to the XCR 0


specified by ECX.

Read Time Stamp Counter and processor core


ID.[q] K10,
RDTSCP
Nehalem,
Read Time Stamp Counter and RDTSCP 0F 01 F9
The TSC value is placed in EDX:EAX and 3[s] Silvermont,
Processor ID.
the core ID in ECX.[r] Nano

POPCNT r16,r/m16 K10,


F3 0F B8 /r
POPCNT[t] POPCNT r32,r/m32 Count the number of bits that are set to 1 in its
3 Nehalem,
Population Count. source argument.
Nano 3000
POPCNT r64,r/m64 F3 REX.W 0F B8 /r

CRC32 r32,r/m8 F2 0F 38 F0 /r Accumulate CRC value using the CRC-32C


(Castagnoli) polynomial 0x11EDC6F41 (normal
Nehalem,
SSE4.2 CRC32 r32,r/m16 form 0x1EDC6F41). This is the polynomial used in
F2 0F 38 F1 /r 3 Bulldozer,
(non-SIMD) CRC32 r32,r/m32 iSCSI. In contrast to the more popular one used in
ZhangJiang
Ethernet, its parity is even, and it can thus detect
CRC32 r64,r/m64 F2 REX.W 0F 38 F1 /r any error with an odd number of changed bits.

Save state components specified by EDX:EAX to


memory.

Unlike the older XSAVE instruction, Sandy Bridge,


XSAVEOPT
XSAVEOPT mem NP 0F AE /6 XSAVEOPT may abstain from writing Steamroller,
Processor Extended State processor state items to memory when the 3 Puma,
XSAVEOPT64 mem NP REX.W 0F AE /6
Save/Restore Optimized Goldmont,
CPU can determine that they haven't been ZhangJiang
modified since the most recent
corresponding XRSTOR.

RDFSBASE r32 F3 0F AE /0
Read base address of FS: segment.
RDFSBASE r64 F3 REX.W 0F AE /0

FSGSBASE RDGSBASE r32 F3 0F AE /1 Ivy Bridge,


Read base address of GS: segment.
Read/write base address of FS and RDGSBASE r64 F3 REX.W 0F AE /1 Steamroller,
3
GS segments from user-mode. WRFSBASE r32 F3 0F AE /2 Goldmont,
Available in 64-bit mode only. Write base address of FS: segment. ZhangJiang
WRFSBASE r64 F3 REX.W 0F AE /2
WRGSBASE r32 F3 0F AE /3
Write base address of GS: segment.
WRGSBASE r64 F3 REX.W 0F AE /3

MOVBE r16,m16
NFx 0F 38 F0 /r Load from memory to register with byte-order
MOVBE r32,m32
swap. Bonnell,
MOVBE MOVBE r64,m64 NFx REX.W 0F 38 F0 /r Haswell,
Move to/from memory with byte order 3 Jaguar,
swap. MOVBE m16,r16 Steamroller,
NFx 0F 38 F1 /r Store to memory from register with byte-order
MOVBE m32,r32 ZhangJiang
swap.
MOVBE m64,r64 NFx REX.W 0F 38 F1 /r

Invalidate entries in TLB and paging-structure Haswell,


INVPCID caches based on invalidation type in register[u] and ZhangJiang,
INVPCID reg,m128 66 0F 38 82 /r 0
Invalidate Process-context identifier. descriptor in m128. The descriptor contains a Zen 3,
memory address and a PCID.[v] Gracemont

K6-2,
PREFETCHW m8 0F 0D /1 Prefetch cache line with intent to write.[b]
PREFETCHW[w] (Cedar Mill),[x]
Cache-line prefetch with intent to 3 Silvermont,
write. Broadwell,
PREFETCH m8[y] 0F 0D /0 Prefetch cache line.[b]
ZhangJiang

Add-with-carry. Differs from the older ADC


ADCX r32,r/m32 66 0F 38 F6 /r
instruction in that it leaves flags other than Broadwell,
ADCX r64,r/m64 66 REX.W 0F 38 F6 /r
ADX EFLAGS.CF unchanged. Zen 1,
3
Enhanced variants of add-with-carry. Add-with-carry, with the overflow-flag EFLAGS.OF ZhangJiang,
ADOX r32,r/m32 F3 0F 38 F6 /r Gracemont
serving as carry input and output, with other flags
ADOX r64,r/m64 F3 REX.W 0F 38 F6 /r
left unchanged.

SMAP CLAC NP 0F 01 CA Clear EFLAGS.AC.


Supervisor Mode Access Prevention.
Broadwell,
Repurposes the EFLAGS.AC
0 Goldmont,
(alignment check) flag to a flag that
Zen 1
prevents access to user-mode
memory while in ring 0, 1 or 2. STAC NP 0F 01 CB Set EFLAGS.AC.

CLFLUSHOPT m8 NFx 66 0F AE /7 Flush cache line. 3 Skylake,


CLFLUSHOPT Goldmont,
Optimized Cache Line Flush. Zen 1
Differs from the older CLFLUSH instruction
in that it has more relaxed ordering rules
with respect to memory stores and other
cache line flushes, enabling improved
performance.

XSAVEC Skylake,
XSAVEC mem NP 0F C7 /4 Save processor extended state components
Processor Extended State 3 Goldmont,
XSAVEC64 mem NP REX.W 0F C7 /4 specified by EDX:EAX to memory with compaction.
save/restore with compaction. Zen 1

Save processor extended state components


XSAVES mem NP 0F C7 /5
XSS specified by EDX:EAX to memory with compaction Skylake,
XSAVES64 mem NP REX.W 0F C7 /5
Processor Extended State and optimization if possible. 0 Goldmont,
save/restore, supervisor state. XRSTORS mem NP 0F C7 /3 Restore state components specified by EDX:EAX Zen 1
XRSTORS64 mem NP REX.W 0F C7 /3 from memory.

RDPKRU NP 0F 01 EE Read User Page Key register into EAX. Comet Lake,
PKU
3 Gracemont,
Protection Keys for user pages.
WRPKRU NP 0F 01 EF Write data from EAX into User Page Key Register. Zen 3

Goldmont Plus,
RDPID
Read processor core ID.
RDPID r32 F3 0F C7 /7 Read processor core ID into register.[q] 3[z] Zen 2,
Ice Lake

Zen 2,
CLWB Write one cache line back to memory without
CLWB m8 NFx 66 0F AE /6 3 Tiger Lake,
Cache Line Writeback to memory. invalidating the cache line.
Tremont

WBNOINVD
Write back all dirty cache lines to memory without Zen 2,
Whole Cache Writeback without WBNOINVD F3 0F 09 0
invalidate. invalidation.[aa] Ice Lake-SP

a. On AMD Athlon processors prior to the Athlon XP, the non-SIMD instructions of SSE were introduced as part of "MMX Extensions".[49]

b. All of the PREFETCH* instructions are hint instructions with effects only on performance, not program semantics. Providing an invalid address
(e.g. address of an unmapped page or a non-canonical address) will cause the instruction to act as a NOP without any exceptions generated.

c. For the SFENCE, LFENCE and MFENCE instructions, the bottom 3 bits of the ModR/M byte are ignored, and any value of x in the range 0..7 will
result in a valid instruction.

d. The SFENCE instruction ensures that all memory stores after the SFENCE instruction are made globally observable after all memory stores before
the SFENCE. This imposes ordering on stores that can otherwise be reordered, such as non-temporal stores and stores to WC (Write-Combining)
memory regions.[50]

On Intel CPUs, as well as AMD CPUs from Zen1 onwards (but not older AMD CPUs), SFENCE also acts as a reordering barrier on cache
flushes/writebacks performed with the CLFLUSH, CLFLUSHOPT and CLWB instructions. (Older AMD CPUs require MFENCE to order CLFLUSH.)

SFENCE is not ordered with respect to LFENCE, and an SFENCE+LFENCE sequence is not sufficient to prevent a load from being reordered past a
previous store.[51] To prevent such reordering, it is necessary to execute an MFENCE, LOCK or a serializing instruction.

e. The LFENCE instruction ensures that all memory loads after the LFENCE instruction are made globally observable after all memory loads before
the LFENCE.

On all Intel CPUs that support SSE2, the LFENCE instruction provides a stronger ordering guarantee:[52] it is dispatch-serializing, meaning that
instructions after the LFENCE instruction are allowed to start executing only after all instructions before it have retired (which will ensure that all
preceding loads but not necessarily stores have completed). The effect of dispatch-serialization is that LFENCE also acts as a speculation barrier
and a reordering barrier for accesses to non-memory resources such as performance counters (accessed through e.g. RDTSC or RDPMC) and
x2apic MSRs.

On AMD CPUs, LFENCE is not necessarily dispatch-serializing by default - however, on all AMD CPUs that support any form of non-dispatch-
serializing LFENCE, it can be made dispatch-serializing by setting bit 1 of MSR C001_1029.[53]

f. The MFENCE instruction ensures that all memory loads, stores and cacheline-flushes after the MFENCE instruction are made globally
observable after all memory loads, stores and cacheline-flushes before the MFENCE.

On Intel CPUs, MFENCE is not dispatch-serializing, and therefore cannot be used to enforce ordering on accesses to non-memory resources
such as performance counters and x2apic MSRs. MFENCE is still ordered with respect to LFENCE, so if a memory barrier with dispatch serialization
is needed, then it can be obtained by issuing an MFENCE followed by an LFENCE.[24]

On AMD CPUs, MFENCE is serializing.

g. The actual length of the pause performed by the PAUSE instruction is implementation-dependent.

On systems without SSE2, PAUSE will execute as NOP.

h. Under VT-x or AMD-V virtualization, executing PAUSE many times in a short time interval may cause a #VMEXIT. The number of PAUSE
executions and interval length that can trigger #VMEXIT are platform-specific.

i. While the CLFLUSH instruction was introduced together with SSE2, it has its own CPUID flag and may be present on processors not otherwise
implementing SSE2 and/or absent from processors that otherwise implement SSE2. (E.g. AMD Geode LX supports CLFLUSH but not SSE2.)

j. While the MONITOR and MWAIT instructions were introduced at the same time as SSE3, they have their own CPUID flag that needs to be
checked separately from the SSE3 CPUID flag (e.g. Athlon64 X2 and VIA C7 supported SSE3 but not MONITOR.)

k. For the MONITOR and MWAIT instructions, older Intel documentation[54] lists instruction mnemonics with explicit operands (MONITOR
EAX,ECX,EDX and MWAIT EAX,ECX), while newer documentation omits these operands. Assemblers/disassemblers may support one or both of
these variants.[55]

l. For MONITOR, the DS: segment can be overridden with a segment prefix.

The memory area that will be monitored will be not just the single byte specified by DS:rAX, but a linear memory region containing the byte -
the size and alignment of this memory region is implementation-dependent and can be queried through CPUID.

The memory location to monitor should have memory type WB (write-back cacheable), or else monitoring may fail.

m. On some processors, such as Intel Xeon Phi x200[56] and AMD K10[57] and later, there exist documented MSRs that can be used to enable
MONITOR and MWAIT to run in Ring 3.

n. The wait performed by MWAIT may be ended by system events other than a memory write (e.g. cacheline evictions, interrupts) - the exact set
of events that can cause the wait to end is implementation-specific.

Regardless of whether the wait was ended by a memory write or some other event, monitoring will have ended and it will be necessary to set
up monitoring again with MONITOR before performing any further MWAITs.

o. The leaf functions defined for GETSEC (selected by EAX) are:

EAX                 Function
0 (CAPABILITIES)    Report SMX capabilities
2 (ENTERACCS)       Enter execution of authenticated code module
3 (EXITAC)          Exit execution of authenticated code module
4 (SENTER)          Enter measured environment
5 (SEXIT)           Exit measured environment
6 (PARAMETERS)      Report SMX parameters
7 (SMCTRL)          SMX Mode Control
8 (WAKEUP)          Wake up sleeping processors in measured environment

Any other value in EAX causes an #UD exception.

p. XSAVE was added in steppings E0/R0 of Penryn and is not available in earlier steppings.

q. The "core ID" value read by RDTSCP and RDPID is actually the TSC_AUX MSR (MSR C000_0103h). Whether this value actually corresponds to a
processor ID is a matter of operating system convention.

r. Unlike the older RDTSC instruction, RDTSCP will delay the TSC read until all previous instructions have retired, guaranteeing ordering with respect
to preceding memory loads (but not stores). RDTSCP is not ordered with respect to subsequent instructions, though.

s. RDTSCP can be run outside Ring 0 only if CR4.TSD=0.

t. While the POPCNT instruction was introduced at the same time as SSE4.2, it is not considered to be a part of SSE4.2, but instead a
separate extension with its own CPUID flag.

On AMD processors, it is considered to be a part of the ABM extension, but still has its own CPUID flag.

u. The invalidation types defined for INVPCID (selected by register argument) are:

Value    Function
0        Invalidate TLB entries matching PCID and virtual memory address in descriptor, excluding global entries
1        Invalidate TLB entries matching PCID in descriptor, excluding global entries
2        Invalidate all TLB entries, including global entries
3        Invalidate all TLB entries, excluding global entries

Any other value in register argument causes a #GP exception.

v. Unlike the older INVLPG instruction, INVPCID will cause a #GP exception if the provided memory address is non-canonical. This
discrepancy has been known to cause security issues.[59]

w. The PREFETCH and PREFETCHW instructions are mandatory parts of the 3DNow! instruction set extension, but are also available as a standalone
extension on systems that do not support 3DNow!

x. The opcodes for PREFETCH and PREFETCHW (0F 0D /r) execute as NOPs on Intel CPUs from Cedar Mill (65nm Pentium 4) onwards, with
PREFETCHW gaining prefetch functionality from Broadwell onwards.

y. The PREFETCH (0F 0D /0) instruction is a 3DNow! instruction, present on all processors with 3DNow! but not necessarily on processors with
the PREFETCHW extension.

On AMD CPUs with PREFETCHW, opcode 0F 0D /0 as well as opcodes 0F 0D /2../7 are all documented to be performing prefetch.

On Intel processors with PREFETCHW, these opcodes are documented as performing reserved-NOPs[60] (except 0F 0D /2 being
PREFETCHWT1 m8 on Xeon Phi only) - third party testing[61] indicates that some or all of these opcodes may be performing prefetch on at least
some Intel Core CPUs.

z. Unlike the older RDTSCP instruction which can also be used to read the processor ID, user-mode RDPID is not disabled by CR4.TSD=1.

aa. The WBNOINVD instruction will execute as WBINVD if run on a system that doesn't support the WBNOINVD extension.

WBINVD differs from WBNOINVD in that WBINVD will invalidate all cache lines after writeback.
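
The CRC-32C (Castagnoli) accumulation performed by the CRC32 instruction in the SSE4.2 row of the table above can be sketched in software. The following is a bit-at-a-time model under stated assumptions: the function names are made up for illustration, the constant 0x82F63B78 is the bit-reflected form of the polynomial 0x1EDC6F41, and crc32c_update is intended to model only the per-byte accumulate step, with the initial value and final inversion applied by the caller as is conventional for CRC-32C.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # bit-reflected form of 0x1EDC6F41


def crc32c_update(crc, data):
    """Accumulate bytes into a running CRC-32C value, one bit at a time."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc


def crc32c(data):
    """Conventional CRC-32C: all-ones initial value and final inversion."""
    return crc32c_update(0xFFFFFFFF, data) ^ 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(crc32c(b"123456789")))
```

The string "123456789" is the customary check input for CRC algorithms; its CRC-32C value is 0xE3069283.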

Added with other Intel-specific extensions

Instruction
Instruction Set Extension Opcode Instruction description Ring Added in
mnemonics

SGX Perform an SGX Supervisor function. The function SGX1


ENCLS NP 0F 01 CF 0
Software Guard Extensions. to perform is given in EAX.[a] Skylake,[b]
Goldmont Plus
Set up an encrypted SGX2
enclave in which a guest Perform an SGX User function. The function to Goldmont Plus,
can execute code that a ENCLU NP 0F 01 D7 3[d] Ice Lake-SP[63]
perform is given in EAX.[c]
compromised or malicious
host cannot inspect or
tamper with. ENCLV NP 0F 01 C0 Perform an SGX virtualization function. The 0 Ice Lake-SP,
function to perform is given in EAX.[e] Tremont


PTWRITE
PTWRITE r/m32 F3 0F AE /4 Read data from register or memory to encode into Kaby Lake,
Write data to a Processor Trace 3
Packet.
PTWRITE r/m64 F3 REX.W 0F AE /4 a PTW packet.[f] Goldmont Plus

Store to memory using Direct Store (memory store


MOVDIRI MOVDIRI m32,r32 NP 0F 38 F9 /r Ice Lake,
that is not cached or write-combined with other 3
Move to memory as Direct Store. MOVDIRI m64,r64 NP REX.W 0F 38 F9 /r Tremont
stores).

Move 64 bytes of data from m512 to address


MOVDIR64B Ice Lake,
MOVDIR64B reg,m512 66 0F 38 F8 /r given by ES:reg. The 64-byte write is done 3
Move 64 bytes as Direct Store. Tremont
atomically with Direct Store.[g]

Perform a platform feature configuration function.


PCONFIG
Platform Configuration, including The function to perform is specified in
PCONFIG NP 0F 01 C5 0 Ice Lake-SP
MKTME ("Multi-Key Total Memory
Encryption"). EAX.[h]

Move cache line containing m8 from CPU L1


CLDEMOTE (Tremont),
CLDEMOTE m8 NP 0F 1C /0 cache to a more distant level of the cache 3
Cache Line Demotion Hint. (Alder Lake)[j]
hierarchy.[i]

Wait until the Time Stamp Counter reaches the


value specified in EDX:EAX.

TPAUSE r32 The register argument to the instruction


66 0F AE /6
TPAUSE r32,EDX,EAX
specifies extra flags to control the
operation of the instruction.

WAITPKG
Start monitoring a memory location for memory Tremont,
User-mode memory monitoring 3[k] Alder Lake
and waiting. UMONITOR r16/32/64 F3 0F AE /6 writes. The memory address to monitor is given by
the register argument.[l]
Timed wait for a write to a monitored memory
location previously specified with UMONITOR. In the
UMWAIT r32 absence of a memory write, the wait will end when
F2 0F AE /6 either the TSC reaches the value specified by
UMWAIT r32,EAX,EDX
EDX:EAX or the wait has been going on for an
OS-controlled maximum amount of time.[m]

SERIALIZE
Instruction Execution SERIALIZE NP 0F 01 E8 Serialize instruction fetch and execution.[n] 3 Alder Lake
Serialization.

Request that the processor reset selected


components of hardware-maintained prediction
HRESET
HRESET imm8 F3 0F 3A F0 C0 ib history. A bitmap of which components of the 0 Alder Lake
Processor History Reset.
CPU's prediction history to reset is given in EAX
(the imm8 argument is ignored).

Send Interprocessor User Interrupt. [o]


SENDUIPI reg F3 0F C7 /6

UIRET F3 0F 01 EC User Interrupt Return


UINTR
Test User Interrupt Flag.
User Interprocessor interrupt. 3 Sapphire Rapids
Available in 64-bit mode only. TESTUI F3 0F 01 ED
Copies UIF to EFLAGS.CF .

CLUI F3 0F 01 EE Clear User Interrupt Flag


STUI F3 0F 01 EF Set User Interrupt Flag

Enqueue Command. Reads a 64-byte "command


data" structure from memory (m512 argument)
and writes atomically to a memory-mapped
Enqueue Store device (register argument provides
the memory address of this device, using ES
ENQCMD r32/64,m512 F2 0F 38 F8 /r 3
segment and requiring 64-byte alignment.) Sets
ENQCMD ZF=0 to indicate that device accepted the
command, or ZF=1 to indicate that command was Sapphire Rapids
Enqueue Store.
not accepted (e.g. queue full or the memory
location was not an Enqueue Store device.)
Enqueue Command Supervisor. Differs from
ENQCMD in that it can place an arbitrary PASID
ENQCMDS r32/64,m512 F3 0F 38 F8 /r 0
(process address-space identifier) and a privilege-
bit in the "command data" to enqueue.

a. The leaf functions defined for ENCLS (selected by EAX) are:

    EAX             Function
    0 (ECREATE)     Create an enclave
    1 (EADD)        Add a page
    2 (EINIT)       Initialize an enclave
    3 (EREMOVE)     Remove a page from EPC
    4 (EDBGRD)      Read data by debugger
    5 (EDBGWR)      Write data by debugger
    6 (EEXTEND)     Extend EPC page measurement
    7 (ELDB)        Load an EPC page as blocked
    8 (ELDU)        Load an EPC page as unblocked
    9 (EBLOCK)      Block an EPC page
    A (EPA)         Add version array
    B (EWB)         Writeback/invalidate EPC page
    C (ETRACK)      Activate EBLOCK checks
    Added with SGX2
    D (EAUG)        Add page to initialized enclave
    E (EMODPR)      Restrict permissions of EPC page
    F (EMODT)       Change type of EPC page
    10 (ERDINFO)    Read EPC page type/status info
    11 (ETRACKC)    Activate EBLOCK checks
    12 (ELDBC)      Load EPC page as blocked with enhanced error reporting
    13 (ELDUC)      Load EPC page as unblocked with enhanced error reporting

Any unsupported value in EAX causes a #GP exception.

b. SGX is deprecated on desktop/laptop processors from 11th generation (Rocket Lake, Tiger Lake) onwards, but continues to be available on Xeon-branded server parts.[62]

c. The leaf functions defined for ENCLU (selected by EAX) are:

    EAX                   Function
    0 (EREPORT)           Create a cryptographic report
    1 (EGETKEY)           Create a cryptographic key
    2 (EENTER)            Enter an Enclave
    3 (ERESUME)           Re-enter an Enclave
    4 (EEXIT)             Exit an Enclave
    Added with SGX2
    5 (EACCEPT)           Accept changes to EPC page
    6 (EMODPE)            Extend EPC page permissions
    7 (EACCEPTCOPY)       Initialize pending page
    Added with TDX
    8 (EVERIFYREPORT2)    Verify a cryptographic report of a trust domain

Any unsupported value in EAX causes a #GP exception.

d. ENCLU can only be executed in ring 3, not rings 0/1/2.

e. The leaf functions defined for ENCLV (selected by EAX) are:

    EAX                  Function
    0 (EDECVIRTCHILD)    Decrement VIRTCHILDCNT in SECS
    1 (EINCVIRTCHILD)    Increment VIRTCHILDCNT in SECS
    2 (ESETCONTEXT)      Set ENCLAVECONTEXT field in SECS

Any unsupported value in EAX causes a #GP exception.

f. For PTWRITE, the write to the Processor Trace Packet will only happen if a set of enable-bits (the "TriggerEn", "ContextEn", "FilterEn" bits of the RTIT_STATUS MSR and the "PTWEn" bit of the RTIT_CTL MSR) are all set to 1.

The PTWRITE instruction is indicated in the SDM to cause an #UD exception if the 66h instruction prefix is used, regardless of other prefixes.

g. For MOVDIR64, the destination address given by ES:reg must be 64-byte aligned.

The operand size for the register argument is given by the address size, which may be overridden by the 67h prefix.

The 64-byte memory source argument does not need to be 64-byte aligned, and is not guaranteed to be read atomically.

h. The leaf functions defined for PCONFIG (selected by EAX) are:

    EAX                      Function
    0 (MKTME_KEY_PROGRAM)    Program key and encryption mode to use with an MKTME Key ID.

Any other value in EAX causes a #GP(0) exception.

i. For CLDEMOTE, the cache level that it will demote a cache line to is implementation-dependent.

Since the instruction is considered a hint, it will execute as a NOP without any exceptions if the provided memory address is invalid or not in the L1 cache. It may also execute as a NOP under other implementation-dependent circumstances.

On systems that do not support the CLDEMOTE extension, it executes as a NOP.

j. As of May 2022, no Tremont or Alder Lake models have been observed to have the CPUID feature bit for CLDEMOTE set, while several of them have the CPUID bit cleared.[64]

k. TPAUSE and UMWAIT can be run outside Ring 0 only if CR4.TSD=0.

l. For UMONITOR, the operand size of the address argument is given by the address size, which may be overridden by the 67h prefix. The default segment used is DS:, which can be overridden with a segment prefix.

m. For UMWAIT, the operating system can use the IA32_UMWAIT_CONTROL MSR to limit the maximum amount of time that a single UMWAIT invocation is permitted to wait. The UMWAIT instruction will set RFLAGS.CF to 1 if it reached the IA32_UMWAIT_CONTROL-defined time limit and 0 otherwise.

n. While serialization can be performed with older instructions such as CPUID and IRET, these instructions perform additional functions, causing side-effects and reduced performance when stand-alone instruction serialization is needed. (CPUID additionally has the issue that it causes a mandatory #VMEXIT when executed under virtualization, which causes a very large overhead.) The SERIALIZE instruction performs serialization only, avoiding these added costs.

o. The register argument to SENDUIPI is an index to pick an entry from the UITT (User-Interrupt Target Table, a table specified by the new UINTR_TT and UINTR_MISC MSRs.)

Added with other AMD-specific extensions

Instruction Set Extension (Ring; Added in)
    Instruction mnemonics    Opcode    Instruction description

AltMovCr8: Alternative mechanism to access the CR8 control register.[a] (Ring 0; added in K8[c])
    MOV reg,CR8    F0 0F 20 /0[b]    Read the CR8 register.
    MOV CR8,reg    F0 0F 22 /0[b]    Write to the CR8 register.

MONITORX: Monitor a memory location for writes in user mode. (Ring 3; added in Excavator)
    MONITORX       NP 0F 01 FA       Start monitoring a memory location for memory writes. Similar to older MONITOR, except available in user mode.
    MWAITX         NP 0F 01 FB       Wait for a write to a monitored memory location previously specified with MONITORX. MWAITX differs from the older MWAIT instruction mainly in that it runs in user mode and that it can accept an optional timeout argument (given in TSC time units) in EBX (enabled by setting bit[1] of ECX to 1.)

CLZERO: Zero out full cache line. (Ring 3; added in Zen 1)
    CLZERO rAX     NP 0F 01 FC       Write zeroes to all bytes in a memory region that has the size and alignment of a CPU cache line and contains the byte addressed by DS:rAX.[d]

RDPRU: Read processor register in user mode. (Ring 3[f]; added in Zen 2)
    RDPRU          NP 0F 01 FD       Read selected MSRs (mainly performance counters) in user mode. ECX specifies which register to read.[e] The value of the MSR is returned in EDX:EAX.

MCOMMIT: Commit Stores To Memory. (Ring 3; added in Zen 2)
    MCOMMIT        F3 0F 01 FA       Ensure that all preceding stores in thread have been committed to memory, and that any errors encountered by these stores have been signalled to any associated error logging resources. The set of errors that can be reported and the logging mechanism are platform-specific. Sets EFLAGS.CF to 0 if any errors occurred, 1 otherwise.

INVLPGB: Invalidate TLB Entries with broadcast. (Ring 0; added in Zen 3)
    INVLPGB        NP 0F 01 FE       Invalidate TLB Entries for a range of pages, with broadcast. The invalidation is performed on the processor executing the instruction, and also broadcast to all other processors in the system. rAX takes the virtual address to invalidate and some additional flags, ECX takes the number of pages to invalidate, and EDX specifies ASID and PCID to perform TLB invalidation for.
    TLBSYNC        NP 0F 01 FF       Synchronize TLB invalidations. Wait until all TLB invalidations signalled by preceding invocations of the INVLPGB instruction on the same logical processor have been responded to by all processors in the system. Instruction is serializing.

a. The standard way to access the CR8 register is to use an encoding that makes use of the REX.R prefix, e.g. 44 0F 20 07 (MOV RDI,CR8). However,
the REX.R prefix is only available in 64-bit mode.

The AltMovCr8 extension adds an additional method to access CR8, using the F0 (LOCK) prefix instead of REX.R - this provides access to CR8 outside
64-bit mode.

b. Like other variants of MOV to/from the CRx registers, the AltMovCr8 encodings ignore the top 2 bits of the instruction's ModR/M byte, and always
execute as if these two bits are set to 11b.

The AltMovCr8 encodings are available in 64-bit mode. However, using both the LOCK prefix and the REX.R prefix is not permitted and will cause an #UD
exception.

c. Support for AltMovCr8 was added in stepping F of the AMD K8, and is not available on earlier steppings.
d. For CLZERO, the address size and 67h prefix control whether to use AX, EAX or RAX as address. The default segment DS: can be overridden by a
segment-override prefix. The provided address does not need to be aligned - hardware will align it as necessary.

The CLZERO instruction is intended for recovery from otherwise-fatal Machine Check errors. It is non-cacheable, cannot be used to allocate a cache
line without a memory access, and should not be used for fast memory clears.[65]

e. The register numbering used by RDPRU does not necessarily match that of RDMSR/WRMSR.

The registers supported by RDPRU as of December 2022 are:

ECX Register

0 MPERF (MSR 0E7h: Maximum Performance Frequency Clock Count)

1 APERF (MSR 0E8h: Actual Performance Frequency Clock Count)

Unsupported values in ECX return 0.

f. If CR4.TSD=1, then the RDPRU instruction can only run in ring 0.
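
The encoding rules in footnotes a and b can be sketched as a small decoder. This is an illustrative Python sketch only, not a full x86 decoder: the function name `control_register_number` is hypothetical, it assumes 64-bit-mode decoding whenever a byte in the range 40..4F appears as a REX prefix, and rejecting AltMovCr8 encodings other than /0 is an assumption made here because the table lists only the /0 forms.

```python
def control_register_number(encoding: bytes) -> int:
    """Return the CRx number selected by a MOV to/from control-register encoding.

    Handles only the forms discussed in the footnotes: an optional F0 (LOCK)
    prefix, an optional REX prefix (64-bit mode only), then 0F 20 /r (read)
    or 0F 22 /r (write).
    """
    pos = 0
    lock = encoding[pos] == 0xF0
    if lock:
        pos += 1
    rex_r = False
    if 0x40 <= encoding[pos] <= 0x4F:       # REX prefix byte: 0100WRXB
        rex_r = bool(encoding[pos] & 0x04)  # bit 2 is REX.R
        pos += 1
    if encoding[pos] != 0x0F or encoding[pos + 1] not in (0x20, 0x22):
        raise ValueError("not a MOV to/from CRx encoding")
    if lock and rex_r:
        # Footnote b: combining LOCK and REX.R causes #UD.
        raise ValueError("#UD: LOCK and REX.R may not be combined")
    modrm = encoding[pos + 2]
    reg = (modrm >> 3) & 0b111              # the top two (mod) bits are ignored
    if lock:
        if reg != 0:
            # Assumption of this sketch: only the listed /0 form is accepted.
            raise ValueError("AltMovCr8 is only listed for /0")
        return 8
    return reg + 8 if rex_r else reg
```

For example, `control_register_number(bytes.fromhex("440F2007"))` and `control_register_number(bytes.fromhex("F00F20C0"))` both select CR8, the first through REX.R and the second through the LOCK prefix.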

x87 floating-point instructions


The x87 coprocessor, if present, provides support for floating-point arithmetic. The coprocessor provides eight data registers, each holding one 80-bit
floating-point value (1 sign bit, 15 exponent bits, 64 mantissa bits) - these registers are organized as a stack, with the top-of-stack register referred to as
"st" or "st(0)", and the other registers referred to as st(1),st(2),...st(7). It additionally provides a number of control and status registers, including "PC"
(precision control, to control whether floating-point operations should be rounded to 24, 53 or 64 mantissa bits), "RC" (rounding control, to pick
rounding-mode: round-to-zero, round-to-positive-infinity, round-to-negative-infinity, round-to-nearest-even) and a 4-bit condition code register "CC"
(whose four bits are individually referred to as C0, C1, C2 and C3). Not all of the arithmetic instructions provided by x87 obey PC and RC.
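
As an illustration of the 80-bit register format described above (1 sign bit, 15 exponent bits, 64 mantissa bits), the following Python sketch decodes a 10-byte little-endian memory image such as one written by FSTP m80. The function name `decode_x87_extended` is hypothetical. The exponent bias of 16383 and the explicit integer bit at the top of the mantissa are not stated in the text above and are taken here from the standard x87 extended-precision format. Infinities and NaNs are rejected rather than modelled.

```python
from fractions import Fraction


def decode_x87_extended(raw: bytes) -> Fraction:
    """Decode a 10-byte little-endian x87 extended value into an exact Fraction."""
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")
    value = int.from_bytes(raw, "little")
    mantissa = value & ((1 << 64) - 1)   # 64 mantissa bits, integer bit included
    exponent = (value >> 64) & 0x7FFF    # 15 exponent bits
    sign = value >> 79                   # 1 sign bit
    if exponent == 0x7FFF:
        raise ValueError("infinity or NaN is not handled by this sketch")
    # Biased exponent 0 (denormal) uses the same scale as biased exponent 1.
    effective = exponent if exponent != 0 else 1
    # The mantissa carries 63 fraction bits below its integer bit.
    magnitude = Fraction(mantissa) * Fraction(2) ** (effective - 16383 - 63)
    return -magnitude if sign else magnitude
```

Returning a `Fraction` keeps the full 64-bit mantissa, which a Python `float` (53 mantissa bits) could not represent exactly.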

Original 8087 instructions

Instruction description Mnemonic Opcode Additional items

x87 Non-Waiting[a] FPU Control Instructions (additional item: waiting mnemonic[b])

Initialize x87 FPU                             FNINIT                  DB E3    FINIT
Load x87 Control Word                          FLDCW m16               D9 /5    (none)
Store x87 Control Word                         FNSTCW m16              D9 /7    FSTCW
Store x87 Status Word                          FNSTSW m16              DD /7    FSTSW
Clear x87 Exception Flags                      FNCLEX                  DB E2    FCLEX
Load x87 FPU Environment                       FLDENV m112/m224[c]     D9 /4    (none)
Store x87 FPU Environment                      FNSTENV m112/m224[c]    D9 /6    FSTENV
Save x87 FPU State, then initialize x87 FPU    FNSAVE m752/m864[c]     DD /6    FSAVE
Restore x87 FPU State                          FRSTOR m752/m864[c]     DD /4    (none)
Enable Interrupts (8087 only)[d]               FNENI                   DB E0    FENI
Disable Interrupts (8087 only)[d]              FNDISI                  DB E1    FDISI

x87 Floating-point Load/Store/Move Instructions (additional items: precision control, rounding control)

    Mnemonic            Opcode        Precision control    Rounding control

Load floating-point value onto stack
    FLD m32             D9 /0         No                   n/a
    FLD m64             DD /0         No                   n/a
    FLD m80             DB /5         No                   n/a
    FLD st(i)           D9 C0+i       No                   n/a

Store top-of-stack floating-point value to memory or stack register
    FST m32             D9 /2         No                   Yes
    FST m64             DD /2         No                   Yes
    FST st(i)[e]        DD D0+i       No                   n/a

Store top-of-stack floating-point value to memory or stack register, then pop
    FSTP m32            D9 /3         No                   Yes
    FSTP m64            DD /3         No                   Yes
    FSTP m80[e]         DB /7         No                   n/a
    FSTP st(i)[e][f]    DD D8+i       No                   n/a
                        DF D0+i[g]
                        DF D8+i[g]

Push +0.0 onto stack
    FLDZ                D9 EE         No                   n/a
Push +1.0 onto stack
    FLD1                D9 E8         No                   n/a
Push π (approximately 3.14159) onto stack
    FLDPI               D9 EB         No                   387[h]
Push log2(10) (approximately 3.32193) onto stack
    FLDL2T              D9 E9         No                   387[h]
Push log2(e) (approximately 1.44269) onto stack
    FLDL2E              D9 EA         No                   387[h]
Push log10(2) (approximately 0.30103) onto stack
    FLDLG2              D9 EC         No                   387[h]
Push ln(2) (approximately 0.69315) onto stack
    FLDLN2              D9 ED         No                   387[h]

Exchange top-of-stack register with other stack register
    FXCH st(i)[i][j]    D9 C8+i       No                   n/a
                        DD C8+i[g]
                        DF C8+i[g]
x87 Integer Load/Store Instructions (additional items: precision control, rounding control)

    Mnemonic        Opcode    Precision control    Rounding control

Load signed integer value onto stack from memory, with conversion to floating-point
    FILD m16        DF /0     No                   n/a
    FILD m32        DB /0     No                   n/a
    FILD m64        DF /5     No                   n/a

Store top-of-stack value to memory, with conversion to signed integer
    FIST m16        DF /2     No                   Yes
    FIST m32        DB /2     No                   Yes

Store top-of-stack value to memory, with conversion to signed integer, then pop stack
    FISTP m16       DF /3     No                   Yes
    FISTP m32       DB /3     No                   Yes
    FISTP m64       DF /7     No                   Yes

Load 18-digit Binary-Coded-Decimal integer value onto stack from memory, with conversion to floating-point
    FBLD m80[k]     DF /4     No                   n/a

Store top-of-stack value to memory, with conversion to 18-digit Binary-Coded-Decimal integer, then pop stack
    FBSTP m80       DF /6     No                   387[h]
x87 Basic Arithmetic Instructions (additional items: precision control, rounding control)

Floating-point add: dst <- dst + src (precision control: Yes; rounding control: Yes)
    FADD m32          D8 /0
    FADD m64          DC /0
    FADD st,st(i)     D8 C0+i
    FADD st(i),st     DC C0+i

Floating-point multiply: dst <- dst * src (precision control: Yes; rounding control: Yes)
    FMUL m32          D8 /1
    FMUL m64          DC /1
    FMUL st,st(i)     D8 C8+i
    FMUL st(i),st     DC C8+i

Floating-point subtract: dst <- dst - src (precision control: Yes; rounding control: Yes)
    FSUB m32          D8 /4
    FSUB m64          DC /4
    FSUB st,st(i)     D8 E0+i
    FSUB st(i),st     DC E8+i

Floating-point reverse subtract: dst <- src - dst (precision control: Yes; rounding control: Yes)
    FSUBR m32         D8 /5
    FSUBR m64         DC /5
    FSUBR st,st(i)    D8 E8+i
    FSUBR st(i),st    DC E0+i

Floating-point divide[l]: dst <- dst / src (precision control: Yes; rounding control: Yes)
    FDIV m32          D8 /6
    FDIV m64          DC /6
    FDIV st,st(i)     D8 F0+i
    FDIV st(i),st     DC F8+i

Floating-point reverse divide: dst <- src / dst (precision control: Yes; rounding control: Yes)
    FDIVR m32         D8 /7
    FDIVR m64         DC /7
    FDIVR st,st(i)    D8 F8+i
    FDIVR st(i),st    DC F0+i

Floating-point compare: CC <- result_of( st(0) - src ) (precision control: No; rounding control: n/a)
Same operation as subtract, except that it updates the x87 CC status register instead of any of the FPU stack registers.
    FCOM m32          D8 /2
    FCOM m64          DC /2
    FCOM st(i)[i]     D8 D0+i
                      DC D0+i[g]
x87 Basic Arithmetic Instructions with Stack Pop (additional items: precision control, rounding control)

    Mnemonic              Opcode        Precision control    Rounding control

Floating-point add and pop
    FADDP st(i),st[i]     DE C0+i       Yes                  Yes
Floating-point multiply and pop
    FMULP st(i),st[i]     DE C8+i       Yes                  Yes
Floating-point subtract and pop
    FSUBP st(i),st[i]     DE E8+i       Yes                  Yes
Floating-point reverse-subtract and pop
    FSUBRP st(i),st[i]    DE E0+i       Yes                  Yes
Floating-point divide and pop
    FDIVP st(i),st[i]     DE F8+i       Yes                  Yes
Floating-point reverse-divide and pop
    FDIVRP st(i),st[i]    DE F0+i       Yes                  Yes

Floating-point compare and pop
    FCOMP m32             D8 /3         No                   n/a
    FCOMP m64             DC /3         No                   n/a
    FCOMP st(i)[i]        D8 D8+i       No                   n/a
                          DC D8+i[g]
                          DE D0+i[g]

Floating-point compare to st(1), then pop twice
    FCOMPP                DE D9         No                   n/a

x87 Basic Arithmetic Instructions with Integer Source Argument (additional items: precision control, rounding control)

    Mnemonic       Opcode    Precision control    Rounding control

Floating-point add by integer
    FIADD m16      DA /0     Yes                  Yes
    FIADD m32      DE /0     Yes                  Yes
Floating-point multiply by integer
    FIMUL m16      DA /1     Yes                  Yes
    FIMUL m32      DE /1     Yes                  Yes
Floating-point subtract by integer
    FISUB m16      DA /4     Yes                  Yes
    FISUB m32      DE /4     Yes                  Yes
Floating-point reverse-subtract by integer
    FISUBR m16     DA /5     Yes                  Yes
    FISUBR m32     DE /5     Yes                  Yes
Floating-point divide by integer
    FIDIV m16      DA /6     Yes                  Yes
    FIDIV m32      DE /6     Yes                  Yes
Floating-point reverse-divide by integer
    FIDIVR m16     DA /7     Yes                  Yes
    FIDIVR m32     DE /7     Yes                  Yes
Floating-point compare to integer
    FICOM m16      DA /2     No                   n/a
    FICOM m32      DE /2     No                   n/a
Floating-point compare to integer, and stack pop
    FICOMP m16     DA /3     No                   n/a
    FICOMP m32     DE /3     No                   n/a

x87 Additional Arithmetic Instructions (additional items: precision control, rounding control)

    Mnemonic    Opcode    Precision control    Rounding control

Floating-point change sign
    FCHS        D9 E0     No                   n/a
Floating-point absolute value
    FABS        D9 E1     No                   n/a
Floating-point compare top-of-stack value to 0
    FTST        D9 E4     No                   n/a
Classify top-of-stack st(0) register value. The classification result is stored in the x87 CC register.[m]
    FXAM        D9 E5     No                   n/a
Split the st(0) value into two values E and M representing the exponent and mantissa of st(0). The split is done such that M * 2^E = st(0), where E is an integer and M is a number whose absolute value is within the range 1 <= |M| < 2.[n] st(0) is then replaced with E, after which M is pushed onto the stack.
    FXTRACT     D9 F4     No                   n/a
Floating-point partial[o] remainder (not IEEE 754 compliant): Q <- IntegerRoundToZero( st(0) / st(1) ); st(0) <- st(0) - st(1) * Q
    FPREM       D9 F8     No                   n/a[p]
Floating-point square root
    FSQRT       D9 FA     Yes                  Yes
Floating-point round to integer
    FRNDINT     D9 FC     No                   Yes
Floating-point power-of-2 scaling. Rounds the value of st(1) to integer with round-to-zero, then uses it as a scale factor for st(0):[q] st(0) <- st(0) * 2^IntegerRoundToZero( st(1) )
    FSCALE      D9 FD     No                   Yes[r]

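x87 Transcendental Instructions[s] (additional item: source operand range restriction)

    Mnemonic      Opcode    Source operand range restriction

Base-2 exponential minus 1, with extra precision for st(0) close to 0: st(0) <- 2^st(0) - 1
    F2XM1         D9 F0     8087: [range not preserved]; 80387: [range not preserved]

Base-2 Logarithm: st(1) <- st(1) * log2( st(0) ), followed by stack pop
    FYL2X[t]      D9 F1     no restrictions

Partial Tangent: Computes from st(0) a pair of values X and Y, such that Y / X = tan( st(0) ). The Y value replaces the top-of-stack value, and then X is pushed onto the stack. On 80387 and later x87, but not original 8087, X is always 1.0
    FPTAN         D9 F2     8087: [range not preserved]; 80387: [range not preserved]

Two-argument arctangent with quadrant adjustment:[u] st(1) <- arctan( st(1) / st(0) ), followed by stack pop
    FPATAN        D9 F3     8087: [range not preserved]; 80387: no restrictions

Base-2 Logarithm plus 1, with extra precision for st(0) close to 0: st(1) <- st(1) * log2( st(0) + 1 ), followed by stack pop
    FYL2XP1[t]    D9 F9     Intel: [range not preserved]; AMD: [range not preserved]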

Other x87 Instructions

No operation[v]                                                    FNOP             D9 D0
Decrement x87 FPU Register Stack Pointer                           FDECSTP          D9 F6
Increment x87 FPU Register Stack Pointer                           FINCSTP          D9 F7
Free x87 FPU Register                                              FFREE st(i)      DD C0+i
Check and handle pending unmasked x87 FPU exceptions               WAIT, FWAIT      9B
Floating-point store and pop, without stack underflow exception    FSTPNCE st(i)    D9 D8+i[g]
Free x87 register, then stack pop                                  FFREEP st(i)     DF C0+i[g]
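
The exponent/mantissa split performed by FXTRACT and the power-of-2 scaling performed by FSCALE can be approximated in Python as a rough sketch. The function names `fxtract` and `fscale` are hypothetical. The sketch works on 53-bit Python floats rather than 80-bit x87 values, assumes the usual normalisation 1 <= |M| < 2 for the mantissa, and does not model zero, infinity or NaN inputs.

```python
import math


def fxtract(x: float) -> tuple:
    """Return (E, M) with M * 2**E == x and 1 <= |M| < 2, for finite nonzero x."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        raise ValueError("special values are not modelled by this sketch")
    m, e = math.frexp(x)        # x == m * 2**e with 0.5 <= |m| < 1
    return float(e - 1), m * 2.0


def fscale(x: float, scale: float) -> float:
    """Scale x by 2 raised to scale, after rounding scale toward zero."""
    return math.ldexp(x, math.trunc(scale))
```

With these definitions, `fxtract(10.0)` gives E = 3.0 and M = 1.25, and `fscale(1.25, 3.0)` reassembles 10.0, which shows that the two operations are inverses of each other for finite nonzero values.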

a. x87 coprocessors (other than the 8087) handle exceptions in a fairly unusual way. When an x87 instruction generates an unmasked arithmetic exception, it will still complete without causing a CPU fault - instead of causing a fault, it will record within the coprocessor information needed to handle the exception (instruction pointer, opcode, data pointer if the instruction had a memory operand) and set an FPU status-word flag to indicate that a pending exception is present. This pending exception will then cause a CPU fault when the next x87, MMX or WAIT instruction is executed.

The exception to this is x87's "Non-Waiting" instructions, which will execute without causing such a fault even if a pending exception is present (with some caveats, see application note AP-578[66]). These instructions are mostly control instructions that can inspect and/or modify the pending-exception state of the x87 FPU.

b. For each non-waiting x87 instruction whose mnemonic begins with FN, there exists a pseudo-instruction that has the same mnemonic except without the N. These pseudo-instructions consist of a WAIT instruction (opcode 9B) followed by the corresponding non-waiting x87 instruction. For example:

FNCLEX is an instruction with the opcode DB E2. The corresponding pseudo-instruction FCLEX is then encoded as 9B DB E2.
FNSAVE ES:[BX+6] is an instruction with the opcode 26 DD 77 06. The corresponding pseudo-instruction FSAVE ES:[BX+6] is then encoded as 9B 26 DD 77 06.

These pseudo-instructions are commonly recognized by x86 assemblers and disassemblers and treated as single instructions, even though all x86 CPUs with x87 coprocessors execute them as a sequence of two instructions.

c. On 80387 and later x87 FPUs, FLDENV, F(N)STENV, FRSTOR and F(N)SAVE exist in 16-bit and 32-bit variants. The 16-bit variants will load/store a 14-byte floating-point environment data structure to/from memory - the 32-bit variants will load/store a 28-byte data structure instead. (F(N)SAVE/FRSTOR will additionally load/store an additional 80 bytes of FPU data register content after the FPU environment, for a total of 94 or 108 bytes). The choice between the 16-bit and 32-bit variants is based on the CS.D bit and the presence of the 66h instruction prefix. On 8087 and 80287, only the 16-bit variants are available.

64-bit variants of these instructions do not exist - using REX.W under x86-64 will cause the 32-bit variants to be used. Since these can only load/store the bottom 32 bits of FIP and FDP, it is recommended to use FXSAVE/FXRSTOR instead if 64-bit operation is desired.

d. In the case of an x87 instruction producing an unmasked FPU exception, the 8087 FPU will signal an IRQ some indeterminate time after the instruction was issued. This may not always be possible to handle,[67] and so the FPU offers the F(N)DISI and F(N)ENI instructions to set/clear the Interrupt Mask bit (bit 7) of the x87 Control Word,[68] to control the interrupt.

Later x87 FPUs, from 80287 onwards, changed the FPU exception mechanism to instead produce a CPU exception on the next x87 instruction. This made the Interrupt Mask bit unnecessary, so it was removed.[69] In later Intel x87 FPUs, the F(N)ENI and F(N)DISI instructions were kept for backwards compatibility, executing as NOPs that do not modify any x87 state.

e. FST/FSTP with an 80-bit destination (m80 or st(i)) and an sNaN source value will produce exceptions on AMD but not Intel FPUs.

f. FSTP ST(0) is a commonly used idiom for popping a single register off the x87 register stack.

g. Intel x87 alias opcode. Use of this opcode is not recommended.

On the Intel 8087 coprocessor, several reserved opcodes would perform operations behaving similarly to existing defined x87 instructions. These opcodes were documented for the 8087[70] and 80287,[71] but then omitted from later manuals until the October 2017 update of the Intel SDM.[72]

They are present on all known Intel x87 FPUs but unavailable on some older non-Intel FPUs, such as AMD Geode GX/LX, DM&P Vortex86[73] and NexGen 586PF.[74]

h. On the 8087 and 80287, FBSTP and the load-constant instructions always use the round-to-nearest rounding mode. On the 80387 and later x87 FPUs, these instructions will use the rounding mode specified in the x87 RC register.

i. For the FADDP, FSUBP, FSUBRP, FMULP, FDIVP, FDIVRP, FCOM, FCOMP and FXCH instructions, x86 assemblers/disassemblers may recognize variants of the instructions with no arguments. Such variants are equivalent to variants using st(1) as their first argument.

j. On Intel Pentium and later processors, FXCH is implemented as a register renaming rather than a true data move. This has no semantic effect, but enables zero-cycle-latency operation. It also allows the instruction to break data dependencies for the x87 top-of-stack value, improving attainable performance for code optimized for these processors.

k. The result of executing the FBLD instruction on non-BCD data is undefined.

l. On early Intel Pentium processors, floating-point divide was subject to the Pentium FDIV bug. This also affected instructions that perform divide as part of their operations, such as FPREM and FPATAN.[75]

m. The FXAM instruction will set C0, C2 and C3 based on value type in st(0) as follows:

    C3    C2    C0    Classification
    0     0     0     Unsupported (unnormal or pseudo-NaN)
    0     0     1     NaN
    0     1     0     Normal finite number
    0     1     1     Infinity
    1     0     0     Zero
    1     0     1     Empty
    1     1     0     Denormal number
    1     1     1     Empty (may occur on 8087/80287 only)

C1 is set to the sign-bit of st(0), regardless of whether st(0) is Empty or not.

n. For FXTRACT, if st(0) is zero or ±∞, then M is set equal to st(0). If st(0) is zero, E is set to 0 on 8087/80287 but -∞ on 80387 and later. If st(0) is ±∞, then E is set to +∞.

o. For FPREM, if the quotient Q is larger than [bound not preserved], then the remainder calculation may have been done only partially - in this case, the FPREM instruction will need to be run again in order to complete the remainder calculation. This is indicated by the instruction setting C2 to 1.

If the instruction did complete the remainder calculation, it will set C2 to 0 and set the three bits {C0,C3,C1} to the bottom three bits of the quotient Q.

On 80387 and later, if the instruction didn't complete the remainder calculation, then the computed quotient Q used for argument reduction will have been rounded to a multiple of 8 (or larger power-of-2), so that the bottom 3 bits of the quotient can still be correctly retrieved in a later pass that does complete the remainder calculation.

p. The remainder computation done by the FPREM instruction is always exact with no roundoff errors.

q. For the FSCALE instruction on 8087 and 80287, st(1) is required to be in the range [range not preserved]. Also, its absolute value must be either 0 or at least 1. If these requirements are not satisfied, the result is undefined.

These restrictions were removed in the 80387.

r. For FSCALE, rounding is only applied in the case of overflow, underflow or subnormal result.

s. The x87 transcendental instructions do not obey PC or RC, but instead compute full 80-bit results. These results are not necessarily correctly rounded (see Table-maker's dilemma) - they may have an error of up to ±1 ulp on Pentium or later, or up to ±1.5 ulps on earlier x87 coprocessors.

t. For the FYL2X and FYL2XP1 instructions, the maximum error bound of ±1 ulp only holds for st(1)=1.0 - for other values of st(1), the error bound is increased to ±1.35 ulps.

u. For FPATAN, the following adjustments are done as compared to just computing a one-argument arctangent of the ratio st(1)/st(0):

If both st(0) and st(1) are ±∞, then the arctangent is computed as if each of st(0) and st(1) had been replaced with ±1 of the same sign. This produces a result that is an odd multiple of π/4.

If both st(0) and st(1) are ±0, then the arctangent is computed as if st(0) but not st(1) had been replaced with ±1 of the same sign, producing a result of ±0 or ±π.

If st(0) is negative (has sign bit set), then an addend of π with the same sign as st(1) is added to the result.

v. While FNOP is a no-op in the sense that it will leave the x87 FPU register stack unmodified, it may still modify FIP and CC, and it may fault if a pending x87 FPU exception is present.
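
The FXAM classification table in footnote m can be applied to a status word read back with FNSTSW, as in the Python sketch below. The names `FXAM_CLASSES` and `decode_fxam` are hypothetical. The bit positions of the condition codes in the status word (C0 = bit 8, C1 = bit 9, C2 = bit 10, C3 = bit 14) are not given in this section and are taken here from the standard x87 status word layout.

```python
# Classification keyed by (C3, C2, C0), following the FXAM footnote table.
FXAM_CLASSES = {
    (0, 0, 0): "unsupported",
    (0, 0, 1): "nan",
    (0, 1, 0): "normal",
    (0, 1, 1): "infinity",
    (1, 0, 0): "zero",
    (1, 0, 1): "empty",
    (1, 1, 0): "denormal",
    (1, 1, 1): "empty",      # may occur on 8087/80287 only
}


def decode_fxam(status_word: int) -> tuple:
    """Return (classification, sign_bit_set) from an x87 status word after FXAM."""
    c0 = (status_word >> 8) & 1
    c1 = (status_word >> 9) & 1    # sign bit of st(0)
    c2 = (status_word >> 10) & 1
    c3 = (status_word >> 14) & 1
    return FXAM_CLASSES[(c3, c2, c0)], bool(c1)
```

For instance, a status word of 0x4200 (C3 and C1 set) decodes as a zero with the sign bit set, that is, negative zero.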

x87 instructions added in later processors

Instruction description Mnemonic Opcode Additional items

x87 Non-Waiting Control Instructions added in 80287 (additional item: waiting mnemonic)

Notify FPU of entry into Protected Mode    FNSETPM[a]    DB E4    FSETPM
Store x87 Status Word to AX                FNSTSW AX     DF E0    FSTSW AX

x87 Instructions added in 80387 (additional item: source operand range restriction)

Floating-point unordered compare. Similar to the regular floating-point compare instruction FCOM, except will not produce an exception in response to any qNaN operands.
    FUCOM st(i)[b]     DD E0+i    no restrictions
Floating-point unordered compare and pop
    FUCOMP st(i)[b]    DD E8+i    no restrictions
Floating-point unordered compare to st(1), then pop twice
    FUCOMPP            DA E9      no restrictions
IEEE 754 compliant floating-point partial remainder.[c]
    FPREM1             D9 F5      no restrictions
Floating-point sine and cosine. Computes two values S = sin( k * st(0) ) and C = cos( k * st(0) ).[d] Top-of-stack st(0) is replaced with S, after which C is pushed onto the stack.
    FSINCOS            D9 FB      [range not preserved]
Floating-point sine.[d]
    FSIN               D9 FE      [range not preserved]
Floating-point cosine.[d]
    FCOS               D9 FF      [range not preserved]

x87 Instructions added in Pentium Pro (additional item: condition for conditional moves)

Floating-point conditional move to st(0) based on EFLAGS
    FCMOVB st(0),st(i)      DA C0+i    below (CF=1)
    FCMOVE st(0),st(i)      DA C8+i    equal (ZF=1)
    FCMOVBE st(0),st(i)     DA D0+i    below or equal (CF=1 or ZF=1)
    FCMOVU st(0),st(i)      DA D8+i    unordered (PF=1)
    FCMOVNB st(0),st(i)     DB C0+i    not below (CF=0)
    FCMOVNE st(0),st(i)     DB C8+i    not equal (ZF=0)
    FCMOVNBE st(0),st(i)    DB D0+i    not below or equal (CF=0 and ZF=0)
    FCMOVNU st(0),st(i)     DB D8+i    not unordered (PF=0)

Floating-point compare and set EFLAGS. Differs from the older FCOM floating-point compare instruction in that it puts its result in the integer EFLAGS register rather than the x87 CC register.[e]
    FCOMI st(0),st(i)       DB F0+i
Floating-point compare and set EFLAGS, then pop
    FCOMIP st(0),st(i)      DF F0+i
Floating-point unordered compare and set EFLAGS
    FUCOMI st(0),st(i)      DB E8+i
Floating-point unordered compare and set EFLAGS, then pop
    FUCOMIP st(0),st(i)     DF E8+i

x87 Non-Waiting Instructions added in Pentium II[f] (additional item: 64-bit mnemonic, REX.W prefix)

Save x87, MMX and SSE state to 512-byte data structure[g][h][i]
    FXSAVE m512byte      NP 0F AE /0    FXSAVE64 m512byte
Restore x87, MMX and SSE state from 512-byte data structure[g][h]
    FXRSTOR m512byte     NP 0F AE /1    FXRSTOR64 m512byte

x87 Instructions added as part of SSE3

Floating-point store integer and pop, with round-to-zero
    FISTTP m16    DF /1
    FISTTP m32    DB /1
    FISTTP m64    DD /1
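
The interaction between FCOMI and the FCMOVcc conditions listed in the table can be sketched in Python. The function names `fcomi_flags` and `fcmov_taken` are hypothetical. The table gives the condition tested by each FCMOVcc; the flag values produced by the compare (all three flags set for unordered, CF set for "below", ZF set for "equal") are not spelled out in this section and are assumed here to follow the usual FCOMI result encoding. Exception behaviour (FCOMI versus FUCOMI on NaN operands) is not modelled.

```python
import math


def fcomi_flags(st0: float, sti: float) -> dict:
    """Return the ZF/PF/CF values assumed to result from comparing st(0) with st(i)."""
    if math.isnan(st0) or math.isnan(sti):
        return {"ZF": 1, "PF": 1, "CF": 1}    # unordered
    if st0 > sti:
        return {"ZF": 0, "PF": 0, "CF": 0}
    if st0 < sti:
        return {"ZF": 0, "PF": 0, "CF": 1}    # below
    return {"ZF": 1, "PF": 0, "CF": 0}        # equal


# Conditions exactly as listed in the Pentium Pro table.
FCMOV_CONDITIONS = {
    "B": lambda f: f["CF"] == 1,
    "E": lambda f: f["ZF"] == 1,
    "BE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "U": lambda f: f["PF"] == 1,
    "NB": lambda f: f["CF"] == 0,
    "NE": lambda f: f["ZF"] == 0,
    "NBE": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "NU": lambda f: f["PF"] == 0,
}


def fcmov_taken(suffix: str, flags: dict) -> bool:
    """Report whether FCMOV<suffix> would perform its move for the given flags."""
    return FCMOV_CONDITIONS[suffix](flags)
```

One consequence visible in the sketch is that an unordered compare satisfies the "below" and "equal" conditions as well as "unordered", so code that must distinguish NaN operands has to test the U/NU condition first.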

a. The x87 FPU needs to know whether it is operating in Real Mode or Protected Mode because the floating-point environment accessed by the
F(N)SAVE, FRSTOR, FLDENV and F(N)STENV instructions has different formats in Real Mode and Protected Mode. On 80287, the F(N)SETPM instruction
is required to communicate the real-to-protected mode transition to the FPU. On 80387 and later x87 FPUs, real↔protected mode transitions are
communicated automatically to the FPU without the need for any dedicated instructions - therefore, on these FPUs, FNSETPM executes as a NOP that
does not modify any FPU state.
b. For the FUCOM and FUCOMP instructions, x86 assemblers/disassemblers may recognize variants of the instructions with no arguments. Such variants
are equivalent to variants using st(1) as their first argument.
c. The 80387 FPREM1 instruction differs from the older FPREM (D9 F8) instruction in that the quotient Q is rounded to integer with round-to-nearest-even
rounding rather than the round-to-zero rounding used by FPREM. Like FPREM, FPREM1 always computes an exact result with no roundoff errors. Like
FPREM, it may also perform a partial computation if the quotient is too large, in which case it must be run again.
d. Due to the x87 FPU performing argument reduction for sin/cos with only about 68 bits of precision, the value of k used in the calculation of FSIN, FCOS
and FSINCOS is not precisely 1.0, but instead given by[76][77]

This argument reduction inaccuracy also affects the FPTAN instruction.


e. The FCOMI, FCOMIP, FUCOMI and FUCOMIP instructions write their results to the ZF, CF and PF bits of the EFLAGS register. On Intel but not AMD
processors, the SF, AF and OF bits of EFLAGS are also zeroed out by these instructions.
f. The FXSAVE and FXRSTOR instructions were added in the "Deschutes" revision of Pentium II, and are not present in earlier "Klamath" revision.
g. The FXSAVE and FXRSTOR instructions will save/restore SSE state only on processors that support SSE. Otherwise, they will only save/restore x87 and
MMX state.
The x87 section of the state saved/restored by FXSAVE/FXRSTOR has a completely different layout than the data structure of the older
F(N)SAVE/FRSTOR instructions, enabling faster save/restore by avoiding misaligned loads and stores.
h. When floating-point emulation is enabled with CR0.EM=1, FXSAVE(64) and FXRSTOR(64) are considered to be x87 instructions and will accordingly
produce an #NM (device-not-available) exception. Other than WAIT, these are the only opcodes outside the D8..DF ESC opcode space that exhibit this
behavior. (All opcodes in D8..DF will produce #NM if CR0.EM=1, even for undefined opcodes that would produce #UD otherwise.)
i. Unlike the older F(N)SAVE instruction, FXSAVE will not initialize the FPU after saving its state to memory, but instead leave the x87 coprocessor state
unmodified.
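
The difference between FPREM and FPREM1 described in footnote c (quotient rounded toward zero versus rounded to nearest-even) corresponds to the difference between two functions in Python's standard `math` module, which the following sketch uses as stand-ins. The wrapper names `fprem` and `fprem1` are hypothetical. The sketch operates on 53-bit floats and always completes in one step, so the partial-computation behaviour and the C2 flag are not modelled.

```python
import math


def fprem(st0: float, st1: float) -> float:
    """Remainder with the quotient rounded toward zero, as described for FPREM."""
    return math.fmod(st0, st1)


def fprem1(st0: float, st1: float) -> float:
    """Remainder with the quotient rounded to nearest-even, as described for FPREM1."""
    return math.remainder(st0, st1)
```

For st(0) = 5 and st(1) = 3 the quotient 1.67 truncates to 1 for `fprem`, leaving 2.0, but rounds to 2 for `fprem1`, leaving -1.0.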

SIMD instructions

MMX instructions

MMX instructions operate on the mm registers, which are 64 bits wide. They are shared with the FPU registers.
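
As a rough illustration of how a 64-bit mm register is treated as a vector of independent lanes, the Python sketch below models two of the instructions listed in the table that follows: PADDB (packed byte add with wraparound) and PADDUSB (packed byte add with unsigned saturation). The function names and the representation of a register as a plain Python integer are assumptions of this sketch, not part of any real API.

```python
def _byte_lanes(value: int) -> bytes:
    """Split a 64-bit register value into its eight byte lanes (lane 0 first)."""
    return value.to_bytes(8, "little")


def paddb(a: int, b: int) -> int:
    """Add the eight byte lanes independently, wrapping around modulo 256."""
    lanes = bytes((x + y) & 0xFF for x, y in zip(_byte_lanes(a), _byte_lanes(b)))
    return int.from_bytes(lanes, "little")


def paddusb(a: int, b: int) -> int:
    """Add the eight byte lanes independently, saturating each lane at 255."""
    lanes = bytes(min(x + y, 0xFF) for x, y in zip(_byte_lanes(a), _byte_lanes(b)))
    return int.from_bytes(lanes, "little")
```

Adding 0x01 to a lane holding 0xFF yields 0x00 with `paddb` but stays at 0xFF with `paddusb`, and in neither case does a carry spill into the neighbouring lane.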

Original MMX instructions

Added with Pentium MMX


Instruction Opcode Meaning Notes

EMMS 0F 77 Empty MMX Technology State Marks all x87 FPU registers for use by FPU

MOVD mm, r/m32 0F 6E /r Move doubleword


MOVD r/m32, mm 0F 7E /r Move doubleword

MOVQ mm/m64, mm 0F 7F /r Move quadword

MOVQ mm, mm/m64 0F 6F /r Move quadword


MOVQ mm, r/m64 REX.W + 0F 6E /r Move quadword

MOVQ r/m64, mm REX.W + 0F 7E /r Move quadword

PACKSSDW mm1, mm2/m64 0F 6B /r Pack doublewords to words (signed with saturation)


PACKSSWB mm1, mm2/m64 0F 63 /r Pack words to bytes (signed with saturation)

PACKUSWB mm, mm/m64 0F 67 /r Pack words to bytes (unsigned with saturation)

PADDB mm, mm/m64 0F FC /r Add packed byte integers


PADDW mm, mm/m64 0F FD /r Add packed word integers

PADDD mm, mm/m64 0F FE /r Add packed doubleword integers

PADDQ mm, mm/m64 0F D4 /r Add packed quadword integers


PADDSB mm, mm/m64 0F EC /r Add packed signed byte integers and saturate

PADDSW mm, mm/m64 0F ED /r Add packed signed word integers and saturate
PADDUSB mm, mm/m64 0F DC /r Add packed unsigned byte integers and saturate

PADDUSW mm, mm/m64 0F DD /r Add packed unsigned word integers and saturate

PAND mm, mm/m64 0F DB /r Bitwise AND


PANDN mm, mm/m64 0F DF /r Bitwise AND NOT

POR mm, mm/m64 0F EB /r Bitwise OR

PXOR mm, mm/m64 0F EF /r Bitwise XOR


PCMPEQB mm, mm/m64 0F 74 /r Compare packed bytes for equality

PCMPEQW mm, mm/m64 0F 75 /r Compare packed words for equality

PCMPEQD mm, mm/m64 0F 76 /r Compare packed doublewords for equality


PCMPGTB mm, mm/m64 0F 64 /r Compare packed signed byte integers for greater than

PCMPGTW mm, mm/m64 0F 65 /r Compare packed signed word integers for greater than

PCMPGTD mm, mm/m64 0F 66 /r Compare packed signed doubleword integers for greater than
PMADDWD mm, mm/m64 0F F5 /r Multiply packed words, add adjacent doubleword results

PMULHW mm, mm/m64 0F E5 /r Multiply packed signed word integers, store high 16 bits of results

PMULLW mm, mm/m64 0F D5 /r Multiply packed signed word integers, store low 16 bits of results
PSLLW mm1, imm8 0F 71 /6 ib Shift left words, shift in zeros

PSLLW mm, mm/m64 0F F1 /r Shift left words, shift in zeros

PSLLD mm, imm8 0F 72 /6 ib Shift left doublewords, shift in zeros


PSLLD mm, mm/m64 0F F2 /r Shift left doublewords, shift in zeros

PSLLQ mm, imm8 0F 73 /6 ib Shift left quadword, shift in zeros

PSLLQ mm, mm/m64 0F F3 /r Shift left quadword, shift in zeros


PSRAD mm, imm8 0F 72 /4 ib Shift right doublewords, shift in sign bits

PSRAD mm, mm/m64 0F E2 /r Shift right doublewords, shift in sign bits

PSRAW mm, imm8 0F 71 /4 ib Shift right words, shift in sign bits


PSRAW mm, mm/m64 0F E1 /r Shift right words, shift in sign bits

PSRLW mm, imm8 0F 71 /2 ib Shift right words, shift in zeros

PSRLW mm, mm/m64 0F D1 /r Shift right words, shift in zeros


PSRLD mm, imm8 0F 72 /2 ib Shift right doublewords, shift in zeros

PSRLD mm, mm/m64 0F D2 /r Shift right doublewords, shift in zeros

PSRLQ mm, imm8 0F 73 /2 ib Shift right quadword, shift in zeros


PSRLQ mm, mm/m64 0F D3 /r Shift right quadword, shift in zeros

PSUBB mm, mm/m64 0F F8 /r Subtract packed byte integers


PSUBW mm, mm/m64 0F F9 /r Subtract packed word integers

PSUBD mm, mm/m64 0F FA /r Subtract packed doubleword integers

PSUBSB mm, mm/m64 0F E8 /r Subtract signed packed bytes with saturation


PSUBSW mm, mm/m64 0F E9 /r Subtract signed packed words with saturation

PSUBUSB mm, mm/m64 0F D8 /r Subtract unsigned packed bytes with saturation

PSUBUSW mm, mm/m64 0F D9 /r Subtract unsigned packed words with saturation


PUNPCKHBW mm, mm/m64 0F 68 /r Unpack and interleave high-order bytes


PUNPCKHWD mm, mm/m64 0F 69 /r Unpack and interleave high-order words

PUNPCKHDQ mm, mm/m64 0F 6A /r Unpack and interleave high-order doublewords

PUNPCKLBW mm, mm/m32 0F 60 /r Unpack and interleave low-order bytes


PUNPCKLWD mm, mm/m32 0F 61 /r Unpack and interleave low-order words

PUNPCKLDQ mm, mm/m32 0F 62 /r Unpack and interleave low-order doublewords
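The difference between the wrap-around additions (PADDB) and the saturating ones (PADDSB, PADDUSB) listed above can be illustrated with a small emulation. This is only a sketch of the lane arithmetic, not of the instructions themselves: lanes are modelled as Python integers, and the function names are hypothetical helpers chosen to mirror the mnemonics.

```python
def paddb(a, b):
    """Wrap-around add of packed bytes (PADDB-style): keep only the low 8 bits."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def paddusb(a, b):
    """Unsigned saturating add (PADDUSB-style): clamp each sum to 0..255."""
    return [min(x + y, 0xFF) for x, y in zip(a, b)]


def paddsb(a, b):
    """Signed saturating add (PADDSB-style): clamp each sum to -128..127."""
    return [max(-128, min(127, x + y)) for x, y in zip(a, b)]


wrapped = paddb([250, 1], [10, 2])             # 260 wraps around to 4
clamped = paddusb([250, 1], [10, 2])           # 260 is clamped to 255
signed = paddsb([100, -100], [100, -100])      # clamps at both ends
```

Saturation is typically preferred for pixel or audio samples, where wrapping from the maximum value back to a small one would produce visible or audible artifacts.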

MMX instructions added in specific processors

MMX instructions added with MMX+ and SSE

The following MMX instructions were added with SSE. They are also available on the Athlon under the name MMX+.

Instruction Opcode Meaning


MASKMOVQ mm1, mm2 0F F7 /r Masked Move of Quadword

MOVNTQ m64, mm 0F E7 /r Move Quadword Using Non-Temporal Hint

PSHUFW mm1, mm2/m64, imm8 0F 70 /r ib Shuffle Packed Words


PINSRW mm, r32/m16, imm8 0F C4 /r ib Insert Word

PEXTRW reg, mm, imm8 0F C5 /r ib Extract Word

PMOVMSKB reg, mm 0F D7 /r Move Byte Mask


PMINUB mm1, mm2/m64 0F DA /r Minimum of Packed Unsigned Byte Integers

PMAXUB mm1, mm2/m64 0F DE /r Maximum of Packed Unsigned Byte Integers

PAVGB mm1, mm2/m64 0F E0 /r Average Packed Unsigned Byte Integers with Rounding

PAVGW mm1, mm2/m64 0F E3 /r Average Packed Unsigned Word Integers with Rounding

PMULHUW mm1, mm2/m64 0F E4 /r Multiply Packed Unsigned Integers and Store High Result
PMINSW mm1, mm2/m64 0F EA /r Minimum of Packed Signed Word Integers

PMAXSW mm1, mm2/m64 0F EE /r Maximum of Packed Signed Word Integers

PSADBW mm1, mm2/m64 0F F6 /r Compute Sum of Absolute Differences
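As an illustration of how the imm8 operand of PSHUFW selects words, the following sketch emulates the shuffle on a list of four 16-bit words. It assumes index 0 is the least significant word; `pshufw` is a hypothetical helper name, not an existing library function.

```python
def pshufw(src, imm8):
    """Emulate a PSHUFW-style shuffle.

    src holds four 16-bit words, index 0 being the least significant.
    Each 2-bit field of imm8 picks the source word for one destination word:
    bits 1..0 for word 0, bits 3..2 for word 1, and so on.
    """
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]


words = [10, 20, 30, 40]
reversed_words = pshufw(words, 0x1B)   # fields 3,2,1,0: reverse the order
identity = pshufw(words, 0xE4)         # fields 0,1,2,3: leave unchanged
broadcast = pshufw(words, 0x00)        # every field 0: copy word 0 everywhere
```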

MMX instructions added with SSE2

The following MMX instructions were added with SSE2:

Instruction Opcode Meaning

PSUBQ mm1, mm2/m64 0F FB /r Subtract quadword integer


PMULUDQ mm1, mm2/m64 0F F4 /r Multiply unsigned doubleword integer

MMX instructions added with SSSE3

Instruction Opcode Meaning


PSIGNB mm1, mm2/m64 0F 38 08 /r Negate/zero/preserve packed byte integers depending on corresponding sign

PSIGNW mm1, mm2/m64 0F 38 09 /r Negate/zero/preserve packed word integers depending on corresponding sign

PSIGND mm1, mm2/m64 0F 38 0A /r Negate/zero/preserve packed doubleword integers depending on corresponding sign
PSHUFB mm1, mm2/m64 0F 38 00 /r Shuffle bytes

PMULHRSW mm1, mm2/m64 0F 38 0B /r Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits

PMADDUBSW mm1, mm2/m64 0F 38 04 /r Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words
PHSUBW mm1, mm2/m64 0F 38 05 /r Subtract and pack 16-bit signed integers horizontally

PHSUBSW mm1, mm2/m64 0F 38 07 /r Subtract and pack 16-bit signed integer horizontally with saturation

PHSUBD mm1, mm2/m64 0F 38 06 /r Subtract and pack 32-bit signed integers horizontally
PHADDSW mm1, mm2/m64 0F 38 03 /r Add and pack 16-bit signed integers horizontally, pack saturated integers to mm1.

PHADDW mm1, mm2/m64 0F 38 01 /r Add and pack 16-bit integers horizontally


PHADDD mm1, mm2/m64 0F 38 02 /r Add and pack 32-bit integers horizontally

PALIGNR mm1, mm2/m64, imm8 0F 3A 0F /r ib Concatenate destination and source operands, extract byte-aligned result shifted to the right

PABSB mm1, mm2/m64 0F 38 1C /r Compute the absolute value of bytes and store unsigned result
PABSW mm1, mm2/m64 0F 38 1D /r Compute the absolute value of 16-bit integers and store unsigned result

PABSD mm1, mm2/m64 0F 38 1E /r Compute the absolute value of 32-bit integers and store unsigned result
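The "negate/zero/preserve" behaviour of the PSIGN instructions can be sketched as follows. The emulation works on lists of signed byte values and is an illustrative model only; `psignb` is a hypothetical helper name.

```python
def psignb(dst, src):
    """Emulate a PSIGNB-style operation on signed byte lanes.

    For each lane: negate dst if src is negative, zero it if src is zero,
    and keep it unchanged if src is positive.
    """
    out = []
    for d, s in zip(dst, src):
        if s < 0:
            r = -d
        elif s == 0:
            r = 0
        else:
            r = d
        # Wrap to 8-bit two's complement, so negating -128 yields -128 again.
        out.append(((r + 128) & 0xFF) - 128)
    return out


result = psignb([5, 5, 5, -128], [-1, 0, 7, -3])
```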


SSE instructions

Added with Pentium III

SSE instructions operate on xmm registers, which are 128 bits wide.

SSE consists of the following SSE SIMD floating-point instructions:


Instruction Opcode Meaning

ANDPS* xmm1, xmm2/m128 0F 54 /r Bitwise Logical AND of Packed Single-Precision Floating-Point Values

ANDNPS* xmm1, xmm2/m128 0F 55 /r Bitwise Logical AND NOT of Packed Single-Precision Floating-Point Values
ORPS* xmm1, xmm2/m128 0F 56 /r Bitwise Logical OR of Single-Precision Floating-Point Values

XORPS* xmm1, xmm2/m128 0F 57 /r Bitwise Logical XOR for Single-Precision Floating-Point Values

MOVUPS xmm1, xmm2/m128 0F 10 /r Move Unaligned Packed Single-Precision Floating-Point Values


MOVSS xmm1, xmm2/m32 F3 0F 10 /r Move Scalar Single-Precision Floating-Point Values

MOVUPS xmm2/m128, xmm1 0F 11 /r Move Unaligned Packed Single-Precision Floating-Point Values

MOVSS xmm2/m32, xmm1 F3 0F 11 /r Move Scalar Single-Precision Floating-Point Values


MOVLPS xmm, m64 0F 12 /r Move Low Packed Single-Precision Floating-Point Values

MOVHLPS xmm1, xmm2 0F 12 /r Move Packed Single-Precision Floating-Point Values High to Low

MOVLPS m64, xmm 0F 13 /r Move Low Packed Single-Precision Floating-Point Values


UNPCKLPS xmm1, xmm2/m128 0F 14 /r Unpack and Interleave Low Packed Single-Precision Floating-Point Values

UNPCKHPS xmm1, xmm2/m128 0F 15 /r Unpack and Interleave High Packed Single-Precision Floating-Point Values

MOVHPS xmm, m64 0F 16 /r Move High Packed Single-Precision Floating-Point Values


MOVLHPS xmm1, xmm2 0F 16 /r Move Packed Single-Precision Floating-Point Values Low to High

MOVHPS m64, xmm 0F 17 /r Move High Packed Single-Precision Floating-Point Values


MOVAPS xmm1, xmm2/m128 0F 28 /r Move Aligned Packed Single-Precision Floating-Point Values

MOVAPS xmm2/m128, xmm1 0F 29 /r Move Aligned Packed Single-Precision Floating-Point Values

MOVNTPS m128, xmm1 0F 2B /r Move Aligned Four Packed Single-FP Non Temporal
MOVMSKPS reg, xmm 0F 50 /r Extract Packed Single-Precision Floating-Point 4-bit Sign Mask. The upper bits of the register are filled with zeros.

CVTPI2PS xmm, mm/m64 0F 2A /r Convert Packed Dword Integers to Packed Single-Precision FP Values

CVTSI2SS xmm, r/m32 F3 0F 2A /r Convert Dword Integer to Scalar Single-Precision FP Value


CVTSI2SS xmm, r/m64 F3 REX.W 0F 2A /r Convert Qword Integer to Scalar Single-Precision FP Value

CVTTPS2PI mm, xmm/m64 0F 2C /r Convert with Truncation Packed Single-Precision FP Values to Packed Dword Integers

CVTTSS2SI r32, xmm/m32 F3 0F 2C /r Convert with Truncation Scalar Single-Precision FP Value to Dword Integer
CVTTSS2SI r64, xmm1/m32 F3 REX.W 0F 2C /r Convert with Truncation Scalar Single-Precision FP Value to Qword Integer

CVTPS2PI mm, xmm/m64 0F 2D /r Convert Packed Single-Precision FP Values to Packed Dword Integers

CVTSS2SI r32, xmm/m32 F3 0F 2D /r Convert Scalar Single-Precision FP Value to Dword Integer


CVTSS2SI r64, xmm1/m32 F3 REX.W 0F 2D /r Convert Scalar Single-Precision FP Value to Qword Integer

UCOMISS xmm1, xmm2/m32 0F 2E /r Unordered Compare Scalar Single-Precision Floating-Point Values and Set EFLAGS

COMISS xmm1, xmm2/m32 0F 2F /r Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS
SQRTPS xmm1, xmm2/m128 0F 51 /r Compute Square Roots of Packed Single-Precision Floating-Point Values

SQRTSS xmm1, xmm2/m32 F3 0F 51 /r Compute Square Root of Scalar Single-Precision Floating-Point Value

RSQRTPS xmm1, xmm2/m128 0F 52 /r Compute Reciprocal of Square Root of Packed Single-Precision Floating-Point Value
RSQRTSS xmm1, xmm2/m32 F3 0F 52 /r Compute Reciprocal of Square Root of Scalar Single-Precision Floating-Point Value

RCPPS xmm1, xmm2/m128 0F 53 /r Compute Reciprocal of Packed Single-Precision Floating-Point Values

RCPSS xmm1, xmm2/m32 F3 0F 53 /r Compute Reciprocal of Scalar Single-Precision Floating-Point Values


ADDPS xmm1, xmm2/m128 0F 58 /r Add Packed Single-Precision Floating-Point Values

ADDSS xmm1, xmm2/m32 F3 0F 58 /r Add Scalar Single-Precision Floating-Point Values

MULPS xmm1, xmm2/m128 0F 59 /r Multiply Packed Single-Precision Floating-Point Values


MULSS xmm1, xmm2/m32 F3 0F 59 /r Multiply Scalar Single-Precision Floating-Point Values

SUBPS xmm1, xmm2/m128 0F 5C /r Subtract Packed Single-Precision Floating-Point Values

SUBSS xmm1, xmm2/m32 F3 0F 5C /r Subtract Scalar Single-Precision Floating-Point Values


MINPS xmm1, xmm2/m128 0F 5D /r Return Minimum Packed Single-Precision Floating-Point Values

MINSS xmm1, xmm2/m32 F3 0F 5D /r Return Minimum Scalar Single-Precision Floating-Point Values

DIVPS xmm1, xmm2/m128 0F 5E /r Divide Packed Single-Precision Floating-Point Values


DIVSS xmm1, xmm2/m32 F3 0F 5E /r Divide Scalar Single-Precision Floating-Point Values

MAXPS xmm1, xmm2/m128 0F 5F /r Return Maximum Packed Single-Precision Floating-Point Values


MAXSS xmm1, xmm2/m32 F3 0F 5F /r Return Maximum Scalar Single-Precision Floating-Point Values

LDMXCSR m32 0F AE /2 Load MXCSR Register State

STMXCSR m32 0F AE /3 Store MXCSR Register State


CMPPS xmm1, xmm2/m128, imm8 0F C2 /r ib Compare Packed Single-Precision Floating-Point Values

CMPSS xmm1, xmm2/m32, imm8 F3 0F C2 /r ib Compare Scalar Single-Precision Floating-Point Values

SHUFPS xmm1, xmm2/m128, imm8 0F C6 /r ib Shuffle Packed Single-Precision Floating-Point Values
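The sign mask produced by MOVMSKPS can be modelled by collecting the top bit of each single-precision lane. The sketch below uses Python's `struct` module to obtain the IEEE 754 bit pattern; `movmskps` is a hypothetical helper name, and lane 0 is assumed to be the lowest lane.

```python
import struct


def movmskps(lanes):
    """Return the 4-bit sign mask of four single-precision values.

    Bit i of the result is the sign bit of lanes[i]; all higher bits are zero.
    """
    mask = 0
    for i, value in enumerate(lanes):
        # Reinterpret the float as its 32-bit IEEE 754 pattern.
        bits = struct.unpack("<I", struct.pack("<f", value))[0]
        mask |= (bits >> 31) << i
    return mask


mask = movmskps([1.0, -2.0, 0.0, -0.0])   # negative zero also has its sign bit set
```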

The single-precision floating-point bitwise operations ANDPS, ANDNPS, ORPS and XORPS produce the same result as the SSE2 integer operations (PAND, PANDN,
POR, PXOR) and the double-precision ones (ANDPD, ANDNPD, ORPD, XORPD), but can introduce extra latency for domain changes when applied to values of the
wrong type.[78]
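That the bitwise operations give the same bits regardless of the nominal data type can be seen in a small model: a plain integer AND over 32-bit lanes, applied to the bit patterns of floating-point values, clears their sign bits and thus computes absolute values. This is a sketch only; the helper names are hypothetical and the latency effects mentioned above are not modelled.

```python
import struct


def float_to_bits(x):
    """Return the 32-bit IEEE 754 pattern of a single-precision value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def and_lanes(a_bits, b_bits):
    """Bitwise AND of 32-bit lanes; the result does not depend on how the bits are typed."""
    return [x & y for x, y in zip(a_bits, b_bits)]


values = [-1.5, 2.0, -0.0, 3.25]
abs_mask = [0x7FFFFFFF] * 4          # all bits set except the sign bit
masked = and_lanes([float_to_bits(v) for v in values], abs_mask)
result = [bits_to_float(b) for b in masked]
```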

SSE2 instructions

Added with Pentium 4

SSE2 SIMD floating-point instructions

SSE2 data movement instructions

Instruction Opcode Meaning

MOVAPD xmm1, xmm2/m128 66 0F 28 /r Move Aligned Packed Double-Precision Floating-Point Values

MOVAPD xmm2/m128, xmm1 66 0F 29 /r Move Aligned Packed Double-Precision Floating-Point Values


MOVNTPD m128, xmm1 66 0F 2B /r Store Packed Double-Precision Floating-Point Values Using Non-Temporal Hint

MOVHPD xmm1, m64 66 0F 16 /r Move High Packed Double-Precision Floating-Point Value

MOVHPD m64, xmm1 66 0F 17 /r Move High Packed Double-Precision Floating-Point Value


MOVLPD xmm1, m64 66 0F 12 /r Move Low Packed Double-Precision Floating-Point Value

MOVLPD m64, xmm1 66 0F 13 /r Move Low Packed Double-Precision Floating-Point Value

MOVUPD xmm1, xmm2/m128 66 0F 10 /r Move Unaligned Packed Double-Precision Floating-Point Values


MOVUPD xmm2/m128, xmm1 66 0F 11 /r Move Unaligned Packed Double-Precision Floating-Point Values

MOVMSKPD reg, xmm 66 0F 50 /r Extract Packed Double-Precision Floating-Point Sign Mask

MOVSD* xmm1, xmm2/m64 F2 0F 10 /r Move or Merge Scalar Double-Precision Floating-Point Value


MOVSD xmm1/m64, xmm2 F2 0F 11 /r Move or Merge Scalar Double-Precision Floating-Point Value

SSE2 packed arithmetic instructions

Instruction Opcode Meaning


ADDPD xmm1, xmm2/m128 66 0F 58 /r Add Packed Double-Precision Floating-Point Values

ADDSD xmm1, xmm2/m64 F2 0F 58 /r Add Low Double-Precision Floating-Point Value

DIVPD xmm1, xmm2/m128 66 0F 5E /r Divide Packed Double-Precision Floating-Point Values


DIVSD xmm1, xmm2/m64 F2 0F 5E /r Divide Scalar Double-Precision Floating-Point Value

MAXPD xmm1, xmm2/m128 66 0F 5F /r Maximum of Packed Double-Precision Floating-Point Values

MAXSD xmm1, xmm2/m64 F2 0F 5F /r Return Maximum Scalar Double-Precision Floating-Point Value


MINPD xmm1, xmm2/m128 66 0F 5D /r Minimum of Packed Double-Precision Floating-Point Values

MINSD xmm1, xmm2/m64 F2 0F 5D /r Return Minimum Scalar Double-Precision Floating-Point Value

MULPD xmm1, xmm2/m128 66 0F 59 /r Multiply Packed Double-Precision Floating-Point Values


MULSD xmm1, xmm2/m64 F2 0F 59 /r Multiply Scalar Double-Precision Floating-Point Value

SQRTPD xmm1, xmm2/m128 66 0F 51 /r Square Root of Double-Precision Floating-Point Values

SQRTSD xmm1, xmm2/m64 F2 0F 51 /r Compute Square Root of Scalar Double-Precision Floating-Point Value
SUBPD xmm1, xmm2/m128 66 0F 5C /r Subtract Packed Double-Precision Floating-Point Values

SUBSD xmm1, xmm2/m64 F2 0F 5C /r Subtract Scalar Double-Precision Floating-Point Value

SSE2 logical instructions

Instruction Opcode Meaning

ANDPD xmm1, xmm2/m128 66 0F 54 /r Bitwise Logical AND of Packed Double Precision Floating-Point Values
ANDNPD xmm1, xmm2/m128 66 0F 55 /r Bitwise Logical AND NOT of Packed Double Precision Floating-Point Values

ORPD xmm1, xmm2/m128 66 0F 56 /r Bitwise Logical OR of Packed Double Precision Floating-Point Values

XORPD xmm1, xmm2/m128 66 0F 57 /r Bitwise Logical XOR of Packed Double Precision Floating-Point Values

SSE2 compare instructions

Instruction Opcode Meaning

CMPPD xmm1, xmm2/m128, imm8 66 0F C2 /r ib Compare Packed Double-Precision Floating-Point Values


CMPSD* xmm1, xmm2/m64, imm8 F2 0F C2 /r ib Compare Low Double-Precision Floating-Point Values

COMISD xmm1, xmm2/m64 66 0F 2F /r Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS


UCOMISD xmm1, xmm2/m64 66 0F 2E /r Unordered Compare Scalar Double-Precision Floating-Point Values and Set EFLAGS

SSE2 shuffle and unpack instructions

Instruction Opcode Meaning

SHUFPD xmm1, xmm2/m128, imm8 66 0F C6 /r ib Packed Interleave Shuffle of Pairs of Double-Precision Floating-Point Values
UNPCKHPD xmm1, xmm2/m128 66 0F 15 /r Unpack and Interleave High Packed Double-Precision Floating-Point Values

UNPCKLPD xmm1, xmm2/m128 66 0F 14 /r Unpack and Interleave Low Packed Double-Precision Floating-Point Values

SSE2 conversion instructions

Instruction Opcode Meaning

CVTDQ2PD xmm1, xmm2/m64 F3 0F E6 /r Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point Values

CVTDQ2PS xmm1, xmm2/m128 0F 5B /r Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point Values
CVTPD2DQ xmm1, xmm2/m128 F2 0F E6 /r Convert Packed Double-Precision Floating-Point Values to Packed Doubleword Integers

CVTPD2PI mm, xmm/m128 66 0F 2D /r Convert Packed Double-Precision FP Values to Packed Dword Integers

CVTPD2PS xmm1, xmm2/m128 66 0F 5A /r Convert Packed Double-Precision Floating-Point Values to Packed Single-Precision Floating-Point Values
CVTPI2PD xmm, mm/m64 66 0F 2A /r Convert Packed Dword Integers to Packed Double-Precision FP Values

CVTPS2DQ xmm1, xmm2/m128 66 0F 5B /r Convert Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values

CVTPS2PD xmm1, xmm2/m64 0F 5A /r Convert Packed Single-Precision Floating-Point Values to Packed Double-Precision Floating-Point Values
CVTSD2SI r32, xmm1/m64 F2 0F 2D /r Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer

CVTSD2SI r64, xmm1/m64 F2 REX.W 0F 2D /r Convert Scalar Double-Precision Floating-Point Value to Quadword Integer With Sign Extension

CVTSD2SS xmm1, xmm2/m64 F2 0F 5A /r Convert Scalar Double-Precision Floating-Point Value to Scalar Single-Precision Floating-Point Value
CVTSI2SD xmm1, r32/m32 F2 0F 2A /r Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value

CVTSI2SD xmm1, r/m64 F2 REX.W 0F 2A /r Convert Quadword Integer to Scalar Double-Precision Floating-Point value

CVTSS2SD xmm1, xmm2/m32 F3 0F 5A /r Convert Scalar Single-Precision Floating-Point Value to Scalar Double-Precision Floating-Point Value
CVTTPD2DQ xmm1, xmm2/m128 66 0F E6 /r Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Doubleword Integers

CVTTPD2PI mm, xmm/m128 66 0F 2C /r Convert with Truncation Packed Double-Precision FP Values to Packed Dword Integers

CVTTPS2DQ xmm1, xmm2/m128 F3 0F 5B /r Convert with Truncation Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values
CVTTSD2SI r32, xmm1/m64 F2 0F 2C /r Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Dword Integer

CVTTSD2SI r64, xmm1/m64 F2 REX.W 0F 2C /r Convert with Truncation Scalar Double-Precision Floating-Point Value To Signed Qword Integer
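The difference between the plain conversions (such as CVTSD2SI) and the "with truncation" variants (such as CVTTSD2SI) can be sketched as follows. The sketch assumes the default rounding mode (round to nearest, ties to even) and ignores out-of-range inputs and NaNs; the function names are hypothetical helpers mirroring the mnemonics.

```python
def cvttsd2si(x):
    """Truncating conversion: discard the fractional part (round toward zero)."""
    return int(x)


def cvtsd2si(x):
    """Rounding conversion under the assumed default mode: nearest, ties to even."""
    return round(x)


truncated = [cvttsd2si(v) for v in (2.7, -2.7, 2.5, 3.5)]
rounded = [cvtsd2si(v) for v in (2.7, -2.7, 2.5, 3.5)]
```

The truncating forms match the semantics of a C-style cast from floating point to integer, which is one reason compilers commonly use them for such casts.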

CMPSD and MOVSD have the same names as the string instruction mnemonics CMPSD (CMPS) and MOVSD (MOVS); however, the former refer to
scalar double-precision floating-point values whereas the latter refer to doubleword strings.

SSE2 SIMD integer instructions

SSE2 MMX-like instructions extended to SSE registers

SSE2 allows execution of MMX instructions on SSE registers, processing twice the amount of data at once.


Instruction Opcode Meaning

MOVD xmm, r/m32 66 0F 6E /r Move doubleword

MOVD r/m32, xmm 66 0F 7E /r Move doubleword


MOVQ xmm1, xmm2/m64 F3 0F 7E /r Move quadword

MOVQ xmm2/m64, xmm1 66 0F D6 /r Move quadword

MOVQ r/m64, xmm 66 REX.W 0F 7E /r Move quadword


MOVQ xmm, r/m64 66 REX.W 0F 6E /r Move quadword

PMOVMSKB reg, xmm 66 0F D7 /r Move a byte mask, zeroing the upper bits of the register

PEXTRW reg, xmm, imm8 66 0F C5 /r ib Extract specified word and move it to reg, setting bits 15-0 and zeroing the rest
PINSRW xmm, r32/m16, imm8 66 0F C4 /r ib Move low word at the specified word position

PACKSSDW xmm1, xmm2/m128 66 0F 6B /r Converts 4 packed signed doubleword integers into 8 packed signed word integers with saturation

PACKSSWB xmm1, xmm2/m128 66 0F 63 /r Converts 8 packed signed word integers into 16 packed signed byte integers with saturation
PACKUSWB xmm1, xmm2/m128 66 0F 67 /r Converts 8 signed word integers into 16 unsigned byte integers with saturation

PADDB xmm1, xmm2/m128 66 0F FC /r Add packed byte integers

PADDW xmm1, xmm2/m128 66 0F FD /r Add packed word integers


PADDD xmm1, xmm2/m128 66 0F FE /r Add packed doubleword integers

PADDQ xmm1, xmm2/m128 66 0F D4 /r Add packed quadword integers.


PADDSB xmm1, xmm2/m128 66 0F EC /r Add packed signed byte integers with saturation

PADDSW xmm1, xmm2/m128 66 0F ED /r Add packed signed word integers with saturation

PADDUSB xmm1, xmm2/m128 66 0F DC /r Add packed unsigned byte integers with saturation
PADDUSW xmm1, xmm2/m128 66 0F DD /r Add packed unsigned word integers with saturation

PAND xmm1, xmm2/m128 66 0F DB /r Bitwise AND

PANDN xmm1, xmm2/m128 66 0F DF /r Bitwise AND NOT


POR xmm1, xmm2/m128 66 0F EB /r Bitwise OR

PXOR xmm1, xmm2/m128 66 0F EF /r Bitwise XOR

PCMPEQB xmm1, xmm2/m128 66 0F 74 /r Compare packed bytes for equality.


PCMPEQW xmm1, xmm2/m128 66 0F 75 /r Compare packed words for equality.

PCMPEQD xmm1, xmm2/m128 66 0F 76 /r Compare packed doublewords for equality.

PCMPGTB xmm1, xmm2/m128 66 0F 64 /r Compare packed signed byte integers for greater than
PCMPGTW xmm1, xmm2/m128 66 0F 65 /r Compare packed signed word integers for greater than

PCMPGTD xmm1, xmm2/m128 66 0F 66 /r Compare packed signed doubleword integers for greater than

PMULLW xmm1, xmm2/m128 66 0F D5 /r Multiply packed signed word integers, store the low 16 bits of the results
PMULHW xmm1, xmm2/m128 66 0F E5 /r Multiply the packed signed word integers, store the high 16 bits of the results

PMULHUW xmm1, xmm2/m128 66 0F E4 /r Multiply packed unsigned word integers, store the high 16 bits of the results

PMULUDQ xmm1, xmm2/m128 66 0F F4 /r Multiply packed unsigned doubleword integers


PSLLW xmm1, xmm2/m128 66 0F F1 /r Shift words left while shifting in 0s

PSLLW xmm1, imm8 66 0F 71 /6 ib Shift words left while shifting in 0s

PSLLD xmm1, xmm2/m128 66 0F F2 /r Shift doublewords left while shifting in 0s


PSLLD xmm1, imm8 66 0F 72 /6 ib Shift doublewords left while shifting in 0s

PSLLQ xmm1, xmm2/m128 66 0F F3 /r Shift quadwords left while shifting in 0s

PSLLQ xmm1, imm8 66 0F 73 /6 ib Shift quadwords left while shifting in 0s


PSRAD xmm1, xmm2/m128 66 0F E2 /r Shift doubleword right while shifting in sign bits

PSRAD xmm1, imm8 66 0F 72 /4 ib Shift doublewords right while shifting in sign bits

PSRAW xmm1, xmm2/m128 66 0F E1 /r Shift words right while shifting in sign bits
PSRAW xmm1, imm8 66 0F 71 /4 ib Shift words right while shifting in sign bits

PSRLW xmm1, xmm2/m128 66 0F D1 /r Shift words right while shifting in 0s

PSRLW xmm1, imm8 66 0F 71 /2 ib Shift words right while shifting in 0s


PSRLD xmm1, xmm2/m128 66 0F D2 /r Shift doublewords right while shifting in 0s

PSRLD xmm1, imm8 66 0F 72 /2 ib Shift doublewords right while shifting in 0s


PSRLQ xmm1, xmm2/m128 66 0F D3 /r Shift quadwords right while shifting in 0s

PSRLQ xmm1, imm8 66 0F 73 /2 ib Shift quadwords right while shifting in 0s

PSUBB xmm1, xmm2/m128 66 0F F8 /r Subtract packed byte integers


PSUBW xmm1, xmm2/m128 66 0F F9 /r Subtract packed word integers

PSUBD xmm1, xmm2/m128 66 0F FA /r Subtract packed doubleword integers

PSUBQ xmm1, xmm2/m128 66 0F FB /r Subtract packed quadword integers.


PSUBSB xmm1, xmm2/m128 66 0F E8 /r Subtract packed signed byte integers with saturation
PSUBSW xmm1, xmm2/m128 66 0F E9 /r Subtract packed signed word integers with saturation

PMADDWD xmm1, xmm2/m128 66 0F F5 /r Multiply the packed word integers, add adjacent doubleword results

PSUBUSB xmm1, xmm2/m128 66 0F D8 /r Subtract packed unsigned byte integers with saturation
PSUBUSW xmm1, xmm2/m128 66 0F D9 /r Subtract packed unsigned word integers with saturation

PUNPCKHBW xmm1, xmm2/m128 66 0F 68 /r Unpack and interleave high-order bytes

PUNPCKHWD xmm1, xmm2/m128 66 0F 69 /r Unpack and interleave high-order words


PUNPCKHDQ xmm1, xmm2/m128 66 0F 6A /r Unpack and interleave high-order doublewords

PUNPCKLBW xmm1, xmm2/m128 66 0F 60 /r Interleave low-order bytes

PUNPCKLWD xmm1, xmm2/m128 66 0F 61 /r Interleave low-order words


PUNPCKLDQ xmm1, xmm2/m128 66 0F 62 /r Interleave low-order doublewords

PAVGB xmm1, xmm2/m128 66 0F E0 /r Average packed unsigned byte integers with rounding

PAVGW xmm1, xmm2/m128 66 0F E3 /r Average packed unsigned word integers with rounding
PMINUB xmm1, xmm2/m128 66 0F DA /r Compare packed unsigned byte integers and store packed minimum values

PMINSW xmm1, xmm2/m128 66 0F EA /r Compare packed signed word integers and store packed minimum values

PMAXSW xmm1, xmm2/m128 66 0F EE /r Compare packed signed word integers and store maximum packed values
PMAXUB xmm1, xmm2/m128 66 0F DE /r Compare packed unsigned byte integers and store packed maximum values

PSADBW xmm1, xmm2/m128 66 0F F6 /r Computes the absolute differences of the packed unsigned byte integers; the 8 low differences and 8 high differences are then summed separately to produce two unsigned word integer results
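The PSADBW computation on a 128-bit register can be sketched on two lists of sixteen unsigned bytes. The model assumes index 0 is the lowest byte; `psadbw` is a hypothetical helper name.

```python
def psadbw(a, b):
    """Emulate a 128-bit PSADBW-style sum of absolute differences.

    a and b each hold sixteen unsigned bytes, index 0 being the lowest.
    Returns (low_sum, high_sum): one sum per 8-byte half.
    """
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return sum(diffs[:8]), sum(diffs[8:])


low, high = psadbw(list(range(16)), [0] * 16)
```

Each sum is at most 8 × 255 = 2040, so it always fits in an unsigned word.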

SSE2 integer instructions for SSE registers only

The following instructions can be used only on SSE registers, since by their nature they do not work on MMX registers:

Instruction Opcode Meaning

MASKMOVDQU xmm1, xmm2 66 0F F7 /r Non-Temporal Store of Selected Bytes from an XMM Register into Memory
MOVDQ2Q mm, xmm F2 0F D6 /r Move low quadword from XMM to MMX register.

MOVDQA xmm1, xmm2/m128 66 0F 6F /r Move aligned double quadword

MOVDQA xmm2/m128, xmm1 66 0F 7F /r Move aligned double quadword


MOVDQU xmm1, xmm2/m128 F3 0F 6F /r Move unaligned double quadword

MOVDQU xmm2/m128, xmm1 F3 0F 7F /r Move unaligned double quadword

MOVQ2DQ xmm, mm F3 0F D6 /r Move quadword from MMX register to low quadword of XMM register
MOVNTDQ m128, xmm1 66 0F E7 /r Store Packed Integers Using Non-Temporal Hint

PSHUFHW xmm1, xmm2/m128, imm8 F3 0F 70 /r ib Shuffle packed high words.

PSHUFLW xmm1, xmm2/m128, imm8 F2 0F 70 /r ib Shuffle packed low words.


PSHUFD xmm1, xmm2/m128, imm8 66 0F 70 /r ib Shuffle packed doublewords.

PSLLDQ xmm1, imm8 66 0F 73 /7 ib Packed shift left logical double quadwords.

PSRLDQ xmm1, imm8 66 0F 73 /3 ib Packed shift right logical double quadwords.


PUNPCKHQDQ xmm1, xmm2/m128 66 0F 6D /r Unpack and interleave high-order quadwords

PUNPCKLQDQ xmm1, xmm2/m128 66 0F 6C /r Interleave low quadwords
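Unlike the per-lane shifts, PSLLDQ and PSRLDQ shift the whole 128-bit register by a number of bytes. The following sketch models the register as a Python integer; the helper names are hypothetical, and counts above 15 are treated as clearing the register.

```python
MASK128 = (1 << 128) - 1


def pslldq(x, imm8):
    """Shift a 128-bit value left by imm8 bytes, shifting in zeros."""
    count = min(imm8, 16)              # counts above 15 clear the register
    return (x << (8 * count)) & MASK128


def psrldq(x, imm8):
    """Shift a 128-bit value right by imm8 bytes, shifting in zeros."""
    count = min(imm8, 16)
    return x >> (8 * count)


left = pslldq(0x1122, 1)       # moves every byte one position up
right = psrldq(0x112233, 2)    # drops the two lowest bytes
```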

SSE3 instructions

Added with Pentium 4 supporting SSE3

SSE3 SIMD floating-point instructions

Instruction Opcode Meaning Notes

ADDSUBPS xmm1, xmm2/m128 F2 0F D0 /r Add/subtract single-precision floating-point values for Complex Arithmetic

ADDSUBPD xmm1, xmm2/m128 66 0F D0 /r Add/subtract double-precision floating-point values for Complex Arithmetic

MOVDDUP xmm1, xmm2/m64 F2 0F 12 /r Move double-precision floating-point value and duplicate for Complex Arithmetic

MOVSLDUP xmm1, xmm2/m128 F3 0F 12 /r Move and duplicate even index single-precision floating-point values for Complex Arithmetic

MOVSHDUP xmm1, xmm2/m128 F3 0F 16 /r Move and duplicate odd index single-precision floating-point values for Complex Arithmetic

HADDPS xmm1, xmm2/m128 F2 0F 7C /r Horizontal add packed single-precision floating-point values for Graphics

HADDPD xmm1, xmm2/m128 66 0F 7C /r Horizontal add packed double-precision floating-point values for Graphics

HSUBPS xmm1, xmm2/m128 F2 0F 7D /r Horizontal subtract packed single-precision floating-point values for Graphics

HSUBPD xmm1, xmm2/m128 66 0F 7D /r Horizontal subtract packed double-precision floating-point values for Graphics
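The lane patterns of ADDSUBPS and HADDPS can be sketched on lists of four values, with index 0 as the lowest lane. The helper names are hypothetical, and Python floats (double precision) stand in for the single-precision lanes.

```python
def addsubps(a, b):
    """Subtract in even-indexed lanes, add in odd-indexed lanes."""
    return [x - y if i % 2 == 0 else x + y
            for i, (x, y) in enumerate(zip(a, b))]


def haddps(a, b):
    """Add adjacent pairs: the low half of the result comes from a, the high half from b."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
mixed = addsubps(a, b)
horizontal = haddps(a, b)
```

The alternating subtract/add pattern matches the real and imaginary parts of a complex product, (ac − bd) + (ad + bc)i, when complex numbers are stored as interleaved pairs.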

SSE3 SIMD integer instructions

Instruction Opcode Meaning Notes

LDDQU xmm1, mem F2 0F F0 /r Load unaligned data and return double quadword Instructionally equivalent to MOVDQU. For video encoding

SSSE3 instructions

Added with Xeon 5100 series and initial Core 2

The following MMX-like instructions extended to SSE registers were added with SSSE3

Instruction Opcode Meaning

PSIGNB xmm1, xmm2/m128 66 0F 38 08 /r Negate/zero/preserve packed byte integers depending on corresponding sign

PSIGNW xmm1, xmm2/m128 66 0F 38 09 /r Negate/zero/preserve packed word integers depending on corresponding sign
PSIGND xmm1, xmm2/m128 66 0F 38 0A /r Negate/zero/preserve packed doubleword integers depending on corresponding sign

PSHUFB xmm1, xmm2/m128 66 0F 38 00 /r Shuffle bytes

PMULHRSW xmm1, xmm2/m128 66 0F 38 0B /r Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits
PMADDUBSW xmm1, xmm2/m128 66 0F 38 04 /r Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words

PHSUBW xmm1, xmm2/m128 66 0F 38 05 /r Subtract and pack 16-bit signed integers horizontally

PHSUBSW xmm1, xmm2/m128 66 0F 38 07 /r Subtract and pack 16-bit signed integer horizontally with saturation
PHSUBD xmm1, xmm2/m128 66 0F 38 06 /r Subtract and pack 32-bit signed integers horizontally

PHADDSW xmm1, xmm2/m128 66 0F 38 03 /r Add and pack 16-bit signed integers horizontally with saturation

PHADDW xmm1, xmm2/m128 66 0F 38 01 /r Add and pack 16-bit integers horizontally


PHADDD xmm1, xmm2/m128 66 0F 38 02 /r Add and pack 32-bit integers horizontally

PALIGNR xmm1, xmm2/m128, imm8 66 0F 3A 0F /r ib Concatenate destination and source operands, extract byte-aligned result shifted to the right

PABSB xmm1, xmm2/m128 66 0F 38 1C /r Compute the absolute value of bytes and store unsigned result
PABSW xmm1, xmm2/m128 66 0F 38 1D /r Compute the absolute value of 16-bit integers and store unsigned result

PABSD xmm1, xmm2/m128 66 0F 38 1E /r Compute the absolute value of 32-bit integers and store unsigned result
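The concatenate-and-shift behaviour of PALIGNR can be sketched on byte lists, with index 0 as the lowest byte. `palignr` is a hypothetical helper name; shift counts of 32 or more are modelled as producing all zeros.

```python
def palignr(dst, src, imm8):
    """Emulate a 128-bit PALIGNR-style operation.

    Conceptually forms the 32-byte value dst:src (dst in the high half),
    shifts it right by imm8 bytes, and keeps the low 16 bytes.
    """
    # Low bytes first; the trailing zeros are what gets shifted in from the top.
    combined = src + dst + [0] * 16
    shift = min(imm8, 32)
    return combined[shift:shift + 16]


src = list(range(16))          # bytes 0..15
dst = list(range(16, 32))      # bytes 16..31
window = palignr(dst, src, 4)  # bytes 4..19 of the concatenation
```

This makes the instruction useful for reading a 16-byte window that straddles two adjacent aligned blocks.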

SSE4 instructions

SSE4.1

Added with Core 2 manufactured in 45nm

SSE4.1 SIMD floating-point instructions

Instruction Opcode Meaning

DPPS xmm1, xmm2/m128, imm8 66 0F 3A 40 /r ib Selectively multiply packed SP floating-point values, add and selectively store

DPPD xmm1, xmm2/m128, imm8 66 0F 3A 41 /r ib Selectively multiply packed DP floating-point values, add and selectively store

BLENDPS xmm1, xmm2/m128, imm8 66 0F 3A 0C /r ib Select packed single precision floating-point values from specified mask

BLENDVPS xmm1, xmm2/m128, <XMM0> 66 0F 38 14 /r Select packed single precision floating-point values from specified mask

BLENDPD xmm1, xmm2/m128, imm8 66 0F 3A 0D /r ib Select packed DP-FP values from specified mask

BLENDVPD xmm1, xmm2/m128, <XMM0> 66 0F 38 15 /r Select packed DP FP values from specified mask

ROUNDPS xmm1, xmm2/m128, imm8 66 0F 3A 08 /r ib Round packed single precision floating-point values

ROUNDSS xmm1, xmm2/m32, imm8 66 0F 3A 0A /r ib Round the low packed single precision floating-point value

ROUNDPD xmm1, xmm2/m128, imm8 66 0F 3A 09 /r ib Round packed double precision floating-point values

ROUNDSD xmm1, xmm2/m64, imm8 66 0F 3A 0B /r ib Round the low packed double precision floating-point value

INSERTPS xmm1, xmm2/m32, imm8 66 0F 3A 21 /r ib Insert a selected single-precision floating-point value at the specified destination element and zero out destination elements

EXTRACTPS reg/m32, xmm1, imm8 66 0F 3A 17 /r ib Extract one single-precision floating-point value at specified offset and store the result (zero-extended, if applicable)
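The "selectively multiply, add and selectively store" behaviour of DPPS is controlled by the two halves of imm8, which the following sketch illustrates. It is a simplified model: Python floats are double precision, whereas the instruction works in single precision with its own summation order, so results may differ in the last bits for values that are not exactly representable. `dpps` is a hypothetical helper name.

```python
def dpps(a, b, imm8):
    """Emulate a DPPS-style dot product on four lanes (index 0 = lowest lane).

    Bits 7..4 of imm8 choose which lane products enter the sum;
    bits 3..0 choose which result lanes receive the sum (the others become 0.0).
    """
    total = sum((a[i] * b[i] for i in range(4) if imm8 & (1 << (4 + i))), 0.0)
    return [total if imm8 & (1 << i) else 0.0 for i in range(4)]


a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
full_dot = dpps(a, b, 0xF1)      # all four products, stored in lane 0 only
partial = dpps(a, b, 0x3F)       # products of lanes 0 and 1, broadcast to all lanes
```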

SSE4.1 SIMD integer instructions

Instruction Opcode Meaning

MPSADBW xmm1, xmm2/m128, imm8 66 0F 3A 42 /r ib Sums absolute 8-bit integer difference of adjacent groups of 4 byte integers with starting offset
PHMINPOSUW xmm1, xmm2/m128 66 0F 38 41 /r Find the minimum unsigned word

PMULLD xmm1, xmm2/m128 66 0F 38 40 /r Multiply the packed dword signed integers and store the low 32 bits

PMULDQ xmm1, xmm2/m128 66 0F 38 28 /r Multiply packed signed doubleword integers and store quadword result
PBLENDVB xmm1, xmm2/m128, <XMM0> 66 0F 38 10 /r Select byte values from specified mask

PBLENDW xmm1, xmm2/m128, imm8 66 0F 3A 0E /r ib Select words from specified mask

PMINSB xmm1, xmm2/m128 66 0F 38 38 /r Compare packed signed byte integers and store packed minimum values

PMINUW xmm1, xmm2/m128 66 0F 38 3A /r Compare packed unsigned word integers and store packed minimum values

PMINSD xmm1, xmm2/m128 66 0F 38 39 /r Compare packed signed dword integers and store packed minimum values

PMINUD xmm1, xmm2/m128 66 0F 38 3B /r Compare packed unsigned dword integers and store packed minimum values

PMAXSB xmm1, xmm2/m128 66 0F 38 3C /r Compare packed signed byte integers and store packed maximum values

PMAXUW xmm1, xmm2/m128 66 0F 38 3E /r Compare packed unsigned word integers and store packed maximum values

PMAXSD xmm1, xmm2/m128 66 0F 38 3D /r Compare packed signed dword integers and store packed maximum values

PMAXUD xmm1, xmm2/m128 66 0F 38 3F /r Compare packed unsigned dword integers and store packed maximum values

PINSRB xmm1, r32/m8, imm8 66 0F 3A 20 /r ib Insert a byte integer value at specified destination element
PINSRD xmm1, r/m32, imm8 66 0F 3A 22 /r ib Insert a dword integer value at specified destination element

PINSRQ xmm1, r/m64, imm8 66 REX.W 0F 3A 22 /r ib Insert a qword integer value at specified destination element

PEXTRB reg/m8, xmm2, imm8 66 0F 3A 14 /r ib Extract a byte integer value at source byte offset, upper bits are zeroed.
PEXTRW reg/m16, xmm, imm8 66 0F 3A 15 /r ib Extract word and copy to lowest 16 bits, zero-extended

PEXTRD r/m32, xmm2, imm8 66 0F 3A 16 /r ib Extract a dword integer value at source dword offset

PEXTRQ r/m64, xmm2, imm8 66 REX.W 0F 3A 16 /r ib Extract a qword integer value at source qword offset
PMOVSXBW xmm1, xmm2/m64 66 0f 38 20 /r Sign extend 8 packed 8-bit integers to 8 packed 16-bit integers

PMOVZXBW xmm1, xmm2/m64 66 0f 38 30 /r Zero extend 8 packed 8-bit integers to 8 packed 16-bit integers

PMOVSXBD xmm1, xmm2/m32 66 0f 38 21 /r Sign extend 4 packed 8-bit integers to 4 packed 32-bit integers
PMOVZXBD xmm1, xmm2/m32 66 0f 38 31 /r Zero extend 4 packed 8-bit integers to 4 packed 32-bit integers

PMOVSXBQ xmm1, xmm2/m16 66 0f 38 22 /r Sign extend 2 packed 8-bit integers to 2 packed 64-bit integers

PMOVZXBQ xmm1, xmm2/m16 66 0f 38 32 /r Zero extend 2 packed 8-bit integers to 2 packed 64-bit integers
PMOVSXWD xmm1, xmm2/m64 66 0f 38 23 /r Sign extend 4 packed 16-bit integers to 4 packed 32-bit integers

PMOVZXWD xmm1, xmm2/m64 66 0f 38 33 /r Zero extend 4 packed 16-bit integers to 4 packed 32-bit integers

PMOVSXWQ xmm1, xmm2/m32 66 0f 38 24 /r Sign extend 2 packed 16-bit integers to 2 packed 64-bit integers
PMOVZXWQ xmm1, xmm2/m32 66 0f 38 34 /r Zero extend 2 packed 16-bit integers to 2 packed 64-bit integers

PMOVSXDQ xmm1, xmm2/m64 66 0f 38 25 /r Sign extend 2 packed 32-bit integers to 2 packed 64-bit integers

PMOVZXDQ xmm1, xmm2/m64 66 0f 38 35 /r Zero extend 2 packed 32-bit integers to 2 packed 64-bit integers
PTEST xmm1, xmm2/m128 66 0F 38 17 /r Set ZF if AND result is all 0s, set CF if AND NOT result is all 0s

PCMPEQQ xmm1, xmm2/m128 66 0F 38 29 /r Compare packed qwords for equality

PACKUSDW xmm1, xmm2/m128 66 0F 38 2B /r Convert 2 × 4 packed signed doubleword integers into 8 packed unsigned word integers with saturation
MOVNTDQA xmm1, m128 66 0F 38 2A /r Move double quadword using non-temporal hint if WC memory type
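
The PTEST row above describes two flag results computed from the same pair of operands. As a rough illustration only, the following Python sketch models that flag logic on plain integers standing in for 128-bit registers; the function name `ptest` and the integer representation are assumptions made for this example, not part of any real API.

```python
MASK128 = (1 << 128) - 1  # keep values within 128 bits, like an XMM register


def ptest(dst, src):
    """Return (ZF, CF) as described for PTEST xmm1 (dst), xmm2/m128 (src)."""
    dst &= MASK128
    src &= MASK128
    zf = 1 if (dst & src) == 0 else 0             # AND result is all 0s
    cf = 1 if (~dst & src & MASK128) == 0 else 0  # AND NOT result is all 0s
    return zf, cf


# No common bits: ZF is set, but src has bits outside dst, so CF stays clear.
print(ptest(0xF0, 0x0F))
# src is a subset of dst: ZF clear, CF set.
print(ptest(0xFF, 0x0F))
```

A common use of this pattern is testing whether a vector is entirely zero (both operands the same register, then branch on ZF) without moving data to general-purpose registers.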

SSE4a

Added with Phenom processors

Instruction Opcode Meaning

EXTRQ 66 0F 78 /0 ib ib, 66 0F 79 /r Extract Field From Register
INSERTQ F2 0F 78 /r ib ib, F2 0F 79 /r Insert Field

MOVNTSD F2 0F 2B /r Move Non-Temporal Scalar Double-Precision Floating-Point


MOVNTSS F3 0F 2B /r Move Non-Temporal Scalar Single-Precision Floating-Point

SSE4.2

Added with Nehalem processors


Instruction Opcode Meaning

PCMPESTRI xmm1, xmm2/m128, imm8 66 0F 3A 61 /r imm8 Packed comparison of string data with explicit lengths, generating an index

PCMPESTRM xmm1, xmm2/m128, imm8 66 0F 3A 60 /r imm8 Packed comparison of string data with explicit lengths, generating a mask
PCMPISTRI xmm1, xmm2/m128, imm8 66 0F 3A 63 /r imm8 Packed comparison of string data with implicit lengths, generating an index

PCMPISTRM xmm1, xmm2/m128, imm8 66 0F 3A 62 /r imm8 Packed comparison of string data with implicit lengths, generating a mask

PCMPGTQ xmm1,xmm2/m128 66 0F 38 37 /r Compare packed signed qwords for greater than.

F16C

Half-precision floating-point conversion.

Instruction Meaning
VCVTPH2PS xmmreg,xmmrm64 Convert four half-precision floating point values in memory or the bottom half of an XMM register to four single-precision floating-point values in an XMM register
VCVTPH2PS ymmreg,xmmrm128 Convert eight half-precision floating point values in memory or an XMM register (the bottom half of a YMM register) to eight single-precision floating-point values in a YMM register
VCVTPS2PH xmmrm64,xmmreg,imm8 Convert four single-precision floating point values in an XMM register to half-precision floating-point values in memory or the bottom half of an XMM register
VCVTPS2PH xmmrm128,ymmreg,imm8 Convert eight single-precision floating point values in a YMM register to half-precision floating-point values in memory or an XMM register
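
To make the half-precision conversions concrete, here is a small Python sketch that converts between 16-bit half-precision bit patterns and floating-point values using the standard `struct` module's `e` format. The helper names `cvtph2ps` and `cvtps2ph` are made up for this example. It is only an approximation of the instructions: Python floats are double precision, the imm8 rounding control of VCVTPS2PH is not modeled (the `struct` module rounds to nearest even), and out-of-range inputs raise `OverflowError` instead of producing an infinity.

```python
import struct


def cvtph2ps(halves):
    """Convert 16-bit half-precision bit patterns to Python floats."""
    return [struct.unpack("<e", struct.pack("<H", h))[0] for h in halves]


def cvtps2ph(values):
    """Convert floats to half-precision bit patterns (round to nearest even)."""
    return [struct.unpack("<H", struct.pack("<e", v))[0] for v in values]


# Four packed half-precision values, as in the 64-bit source operand form.
print(cvtph2ps([0x3C00, 0xC000, 0x0000, 0x7BFF]))
print([hex(h) for h in cvtps2ph([1.0, -2.0, 0.5, 65504.0])])
```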

FMA3

Supported by AMD processors starting with the Piledriver architecture and by Intel processors starting with Haswell, and with Broadwell since 2014.

Fused multiply-add (floating-point vector multiply–accumulate) with three operands.


Instruction Meaning

VFMADD132PD, VFMADD213PD, VFMADD231PD Fused Multiply-Add of Packed Double-Precision Floating-Point Values
VFMADD132PS, VFMADD213PS, VFMADD231PS Fused Multiply-Add of Packed Single-Precision Floating-Point Values
VFMADD132SD, VFMADD213SD, VFMADD231SD Fused Multiply-Add of Scalar Double-Precision Floating-Point Values
VFMADD132SS, VFMADD213SS, VFMADD231SS Fused Multiply-Add of Scalar Single-Precision Floating-Point Values
VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values
VFMADDSUB132PS, VFMADDSUB213PS, VFMADDSUB231PS Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values
VFMSUB132PD, VFMSUB213PD, VFMSUB231PD Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFMSUB132PS, VFMSUB213PS, VFMSUB231PS Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFMSUB132SD, VFMSUB213SD, VFMSUB231SD Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFMSUB132SS, VFMSUB213SS, VFMSUB231SS Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFMSUBADD132PD, VFMSUBADD213PD, VFMSUBADD231PD Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values
VFMSUBADD132PS, VFMSUBADD213PS, VFMSUBADD231PS Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values
VFNMADD132PD, VFNMADD213PD, VFNMADD231PD Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values
VFNMADD132PS, VFNMADD213PS, VFNMADD231PS Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values
VFNMADD132SD, VFNMADD213SD, VFNMADD231SD Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values
VFNMADD132SS, VFNMADD213SS, VFNMADD231SS Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values
VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFNMSUB132PS, VFNMSUB213PS, VFNMSUB231PS Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFNMSUB132SD, VFNMSUB213SD, VFNMSUB231SD Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFNMSUB132SS, VFNMSUB213SS, VFNMSUB231SS Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values
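
The 132/213/231 digits in the mnemonics indicate which of the three operands are multiplied and which is added, and "fused" means the product is not rounded before the addition. The Python sketch below illustrates both points for scalar doubles. It emulates the single rounding with exact rational arithmetic from the standard `fractions` module, which is an illustration technique chosen for this example and not how hardware works; it handles finite inputs only and does not model signed zeros. The function names are hypothetical.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Compute a*b + c exactly, then round once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Operand roles for "OP dst, src2, src3" (dst is also the first source).
def vfmadd132(dst, src2, src3):
    return fused_multiply_add(dst, src3, src2)   # dst = dst*src3 + src2


def vfmadd213(dst, src2, src3):
    return fused_multiply_add(src2, dst, src3)   # dst = src2*dst + src3


def vfmadd231(dst, src2, src3):
    return fused_multiply_add(src2, src3, dst)   # dst = src2*src3 + dst


a = 1.0 + 2.0 ** -52
b = 1.0 - 2.0 ** -52
print(a * b + (-1.0))                    # separate multiply and add: product rounds to 1.0 first
print(fused_multiply_add(a, b, -1.0))    # single rounding keeps the tiny residual
```

The three orderings exist so that a compiler can choose which input register gets overwritten by the result.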

AVX

AVX was first supported by Intel with Sandy Bridge and by AMD with Bulldozer.

Vector operations on 256 bit registers.

Instruction Description
VBROADCASTSS, VBROADCASTSD, VBROADCASTF128 Copy a 32-bit, 64-bit or 128-bit memory operand to all elements of an XMM or YMM vector register.
VINSERTF128 Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is unchanged.
VEXTRACTF128 Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
VMASKMOVPS, VMASKMOVPD Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged. On the AMD Jaguar processor architecture, this instruction with a memory source operand takes more than 300 clock cycles when the mask is zero, in which case the instruction should do nothing. This appears to be a design flaw.[79]
VPERMILPS, VPERMILPD Permute In-Lane. Shuffle the 32-bit or 64-bit vector elements of one input operand. These are in-lane 256-bit instructions, meaning that they operate on all 256 bits with two separate 128-bit shuffles, so they cannot shuffle across the 128-bit lanes.[80]
VPERM2F128 Shuffle the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
VZEROALL Set all YMM registers to zero and tag them as unused. Used when switching between 128-bit use and 256-bit use.
VZEROUPPER Set the upper half of all YMM registers to zero. Used when switching between 128-bit use and 256-bit use.
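
The "in-lane" restriction described for VPERMILPS can be illustrated with a short Python model of its immediate form on a 256-bit vector, represented here as a list of eight elements. This is a sketch for illustration only; the function name `vpermilps_imm` and the list representation are assumptions of this example.

```python
def vpermilps_imm(src, imm8):
    """Permute eight 32-bit elements within each 128-bit lane using imm8."""
    if len(src) != 8:
        raise ValueError("expected eight 32-bit elements (one 256-bit vector)")
    dst = []
    for i in range(8):
        lane_base = (i // 4) * 4            # 0 for the low lane, 4 for the high lane
        sel = (imm8 >> (2 * (i % 4))) & 3   # 2-bit selector, shared by both lanes
        dst.append(src[lane_base + sel])    # never reads from the other lane
    return dst


# 0x1B reverses the four elements inside each lane; the lanes stay in place.
print(vpermilps_imm([0, 1, 2, 3, 4, 5, 6, 7], 0x1B))
```

Because the selector can only address positions 0 to 3 relative to the lane base, an element from the upper 128 bits can never land in the lower 128 bits, which is the limitation the table entry refers to.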

AVX2

Introduced in Intel's Haswell microarchitecture and AMD's Excavator.

Expansion of most vector integer SSE and AVX instructions to 256 bits


Instruction Description

VBROADCASTSS, VBROADCASTSD Copy a 32-bit or 64-bit register operand to all elements of an XMM or YMM vector register. These are register versions of the same instructions in AVX1. There is no 128-bit version, but the same effect can be achieved using VINSERTF128.
VPBROADCASTB, VPBROADCASTW, VPBROADCASTD, VPBROADCASTQ Copy an 8, 16, 32 or 64-bit integer register or memory operand to all elements of an XMM or YMM vector register.
VBROADCASTI128 Copy a 128-bit memory operand to all elements of a YMM vector register.
VINSERTI128 Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is unchanged.
VEXTRACTI128 Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
VGATHERDPD, VGATHERQPD, VGATHERDPS, VGATHERQPS Gathers single or double precision floating point values using either 32 or 64-bit indices and scale.
VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ Gathers 32 or 64-bit integer values using either 32 or 64-bit indices and scale.
VPMASKMOVD, VPMASKMOVQ Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged.
VPERMPS, VPERMD Shuffle the eight 32-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with a register or memory operand as selector.
VPERMPD, VPERMQ Shuffle the four 64-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with a register or memory operand as selector.
VPERM2I128 Shuffle (two of) the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
VPBLENDD Doubleword immediate version of the PBLEND instructions from SSE4.
VPSLLVD, VPSLLVQ Shift left logical. Allows variable shifts where each element is shifted according to the packed input.
VPSRLVD, VPSRLVQ Shift right logical. Allows variable shifts where each element is shifted according to the packed input.
VPSRAVD Shift right arithmetically. Allows variable shifts where each element is shifted according to the packed input.
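
The variable shifts at the end of the table differ from the older packed shifts in that every element has its own shift count. The following Python sketch models the 32-bit forms on lists of unsigned integers, including the behaviour for counts of 32 or more (logical shifts produce zero, the arithmetic shift fills the element with the sign bit). The list representation and lower-case function names are assumptions of this example.

```python
MASK32 = 0xFFFFFFFF


def vpsllvd(values, counts):
    """Per-element logical left shift; counts above 31 give 0."""
    return [((v << c) & MASK32) if c < 32 else 0 for v, c in zip(values, counts)]


def vpsrlvd(values, counts):
    """Per-element logical right shift; counts above 31 give 0."""
    return [((v & MASK32) >> c) if c < 32 else 0 for v, c in zip(values, counts)]


def vpsravd(values, counts):
    """Per-element arithmetic right shift; counts above 31 replicate the sign bit."""
    out = []
    for v, c in zip(values, counts):
        v &= MASK32
        signed = v - (1 << 32) if v & 0x80000000 else v  # reinterpret as signed
        out.append((signed >> min(c, 31)) & MASK32)
    return out


print(vpsllvd([1, 1, 0x80000000], [4, 32, 1]))
print([hex(x) for x in vpsravd([0x80000000, 0x80000000, 0x40], [4, 99, 2])])
```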

AVX-512

AVX-512, introduced in 2014, adds 512-bit wide vector registers (extending the 256-bit registers, which become the new registers' lower halves) and
doubles their count to 32; the new registers are thus named zmm0 through zmm31. It adds eight mask registers, named k0 through k7, which may be used
to restrict operations to specific parts of a vector register. Unlike previous instruction set extensions, AVX-512 is implemented in several groups; only the
foundation ("AVX-512F") extension is mandatory.[81] Most of the added instructions may also be used with the 256- and 128-bit registers.
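
As a rough illustration of how a mask register restricts an operation to parts of a vector, the Python sketch below models a masked 32-bit element-wise addition. Bit i of the mask `k` enables element i. Elements whose mask bit is clear either keep the old destination value (merge-masking) or are set to zero (zero-masking). The function `masked_add` and its list-based vectors are hypothetical and exist only for this example.

```python
def masked_add(dst, a, b, k, zeroing=False):
    """Add a and b element-wise where the mask bit is set.

    Elements with a clear mask bit keep dst (merge-masking) or become 0
    (zero-masking), depending on the `zeroing` argument.
    """
    out = []
    for i, (d, x, y) in enumerate(zip(dst, a, b)):
        if (k >> i) & 1:
            out.append((x + y) & 0xFFFFFFFF)   # wrap like a 32-bit lane
        else:
            out.append(0 if zeroing else d)
    return out


old = [9, 9, 9, 9]
print(masked_add(old, [1, 2, 3, 4], [10, 20, 30, 40], 0b0101))
print(masked_add(old, [1, 2, 3, 4], [10, 20, 30, 40], 0b0101, zeroing=True))
```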

Cryptographic instructions

Intel AES instructions

6 new instructions.


Instruction Encoding Description

AESENC 66 0F 38 DC /r Perform one round of an AES encryption flow

AESENCLAST 66 0F 38 DD /r Perform the last round of an AES encryption flow


AESDEC 66 0F 38 DE /r Perform one round of an AES decryption flow

AESDECLAST 66 0F 38 DF /r Perform the last round of an AES decryption flow

AESKEYGENASSIST 66 0F 3A DF /r ib Assist in AES round key generation


AESIMC 66 0F 38 DB /r Assist in AES Inverse Mix Columns

CLMUL instructions

Instruction Opcode Description

PCLMULQDQ xmmreg,xmmrm,imm 66 0f 3a 44 /r ib Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2^k).
PCLMULLQLQDQ xmmreg,xmmrm 66 0f 3a 44 /r 00 Multiply the low halves of the two registers.
PCLMULHQLQDQ xmmreg,xmmrm 66 0f 3a 44 /r 01 Multiply the high half of the destination register by the low half of the source register.

PCLMULLQHQDQ xmmreg,xmmrm 66 0f 3a 44 /r 10 Multiply the low half of the destination register by the high half of the source register.

PCLMULHQHQDQ xmmreg,xmmrm 66 0f 3a 44 /r 11 Multiply the high halves of the two registers.
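
A carry-less multiplication is an ordinary shift-and-add multiplication in which the additions are replaced by XOR. The Python sketch below shows that operation and how the immediate byte picks the 64-bit halves, reading the immediates in the table as hexadecimal bytes (bit 0 selects the half of the destination, bit 4 the half of the source). The names `clmul64` and `pclmulqdq` and the use of Python integers as 128-bit registers are assumptions of this example.

```python
MASK64 = (1 << 64) - 1


def clmul64(a, b):
    """Carry-less (XOR) product of two 64-bit values; result fits in 127 bits."""
    a &= MASK64
    b &= MASK64
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i   # XOR instead of add: no carries propagate
    return result


def pclmulqdq(dst, src, imm8):
    """Select one 64-bit half of each 128-bit operand, then multiply."""
    x = (dst >> 64) & MASK64 if imm8 & 0x01 else dst & MASK64
    y = (src >> 64) & MASK64 if imm8 & 0x10 else src & MASK64
    return clmul64(x, y)


# (x + 1) * (x + 1) = x^2 + 1 over GF(2): the middle term cancels.
print(bin(clmul64(0b11, 0b11)))
```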

RDRAND and RDSEED

Instruction Encoding Description

RDRAND r16, RDRAND r32 NFx 0F C7 /6 Return a random number that has been generated with a CSPRNG (Cryptographically Secure Pseudo-Random Number Generator) compliant with NIST SP 800-90A (https://csrc.nist.gov/publications/detail/sp/800-90a/rev-1/final).[a]
RDRAND r64 NFx REX.W 0F C7 /6 Same as above, with a 64-bit destination register.
RDSEED r16, RDSEED r32 NFx 0F C7 /7 Return a random number that has been generated with a HRNG/TRNG (Hardware/"True" Random Number Generator) compliant with NIST SP 800-90B (https://csrc.nist.gov/publications/detail/sp/800-90b/final) and C (https://csrc.nist.gov/publications/detail/sp/800-90c/draft).[a]
RDSEED r64 NFx REX.W 0F C7 /7 Same as above, with a 64-bit destination register.

a. The RDRAND and RDSEED instructions may fail to obtain and return a random number if the CPU's random number generators cannot keep up with the
issuing of these instructions - if this happens, then the instructions may be retried (although the number of retries should be limited, in order to ensure
forward progress). The instructions set EFLAGS.CF to 1 if a random number was successfully obtained and 0 otherwise. Failure to obtain a random
number will also set the instruction's destination register to 0.
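
The bounded-retry pattern described in note a can be sketched as follows. Python cannot execute RDRAND directly, so `rdrand_step` below is a hypothetical stand-in that mimics the documented outcome: a carry flag of 1 with a value on success, or a carry flag of 0 with a zero value on failure. The 25% simulated failure rate and the default of 10 retries are arbitrary choices for this example.

```python
import random


def rdrand_step():
    """Hypothetical stand-in for one RDRAND execution: returns (CF, value)."""
    if random.random() < 0.25:        # simulated "generator could not keep up"
        return 0, 0                   # CF=0, destination register set to 0
    return 1, random.getrandbits(64)  # CF=1, random value delivered


def rdrand_with_retry(step=rdrand_step, max_retries=10):
    """Retry a limited number of times; return None if every attempt fails."""
    for _ in range(max_retries):
        cf, value = step()
        if cf == 1:
            return value
    return None  # caller must handle failure instead of spinning forever


value = rdrand_with_retry()
print("no random number available" if value is None else hex(value))
```

Checking the flag rather than the value matters because 0 is also a legitimate random result; only CF distinguishes the two cases.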

Intel SHA instructions

7 new instructions.

Instruction Encoding Description

SHA1RNDS4 0F 3A CC /r ib Perform Four Rounds of SHA1 Operation


SHA1NEXTE 0F 38 C8 /r Calculate SHA1 State Variable E after Four Rounds

SHA1MSG1 0F 38 C9 /r Perform an Intermediate Calculation for the Next Four SHA1 Message Dwords

SHA1MSG2 0F 38 CA /r Perform a Final Calculation for the Next Four SHA1 Message Dwords
SHA256RNDS2 0F 38 CB /r Perform Two Rounds of SHA256 Operation

SHA256MSG1 0F 38 CC /r Perform an Intermediate Calculation for the Next Four SHA256 Message Dwords

SHA256MSG2 0F 38 CD /r Perform a Final Calculation for the Next Four SHA256 Message Dwords

Intel AES Key Locker instructions

These instructions, available in Tiger Lake and later Intel processors, are designed to enable encryption/decryption with an AES key without having access
to any unencrypted copies of the key during the actual encryption/decryption process.


Instruction Encoding Description Notes

LOADIWKEY xmm1,xmm2 F3 0F 38 DC /r Load internal wrapping key ("IWKey") from xmm1, xmm2 and XMM0.
Notes for LOADIWKEY: The two explicit operands (which must be register operands) specify a 256-bit encryption key. The implicit operand in XMM0 specifies a 128-bit integrity key. EAX contains flags controlling operation of instruction. After being loaded, the IWKey cannot be directly read from software, but is used for the key wrapping done by ENCODEKEY128/256 and checked by the Key Locker encode/decode instructions. LOADIWKEY is privileged and can run in Ring 0 only.

ENCODEKEY128 r32,r32 F3 0F 38 FA /r Wrap a 128-bit AES key from XMM0 into a 384-bit key handle and output handle in XMM0-2.
ENCODEKEY256 r32,r32 F3 0F 38 FB /r Wrap a 256-bit AES key from XMM1:XMM0 into a 512-bit key handle and output handle in XMM0-3.
Notes for ENCODEKEY128 and ENCODEKEY256: Source operand specifies handle restrictions to build into the handle. Destination operand is initialized with information about the source and attributes of the key. The instruction also modifies XMM4-6 (zeroed out in existing implementations, but this should not be relied on).

AESENC128KL xmm,m384 F3 0F 38 DC /r Encrypt xmm using 128-bit AES key indicated by handle at m384 and store result in xmm.
AESDEC128KL xmm,m384 F3 0F 38 DD /r Decrypt xmm using 128-bit AES key indicated by handle at m384 and store result in xmm.
AESENC256KL xmm,m512 F3 0F 38 DE /r Encrypt xmm using 256-bit AES key indicated by handle at m512 and store result in xmm.
AESDEC256KL xmm,m512 F3 0F 38 DF /r Decrypt xmm using 256-bit AES key indicated by handle at m512 and store result in xmm.
AESENCWIDE128KL m384 F3 0F 38 D8 /0 Encrypt XMM0-7 using 128-bit AES key indicated by handle at m384 and store each resultant block back to its corresponding register.
AESDECWIDE128KL m384 F3 0F 38 D8 /1 Decrypt XMM0-7 using 128-bit AES key indicated by handle at m384 and store each resultant block back to its corresponding register.
AESENCWIDE256KL m512 F3 0F 38 D8 /2 Encrypt XMM0-7 using 256-bit AES key indicated by handle at m512 and store each resultant block back to its corresponding register.
AESDECWIDE256KL m512 F3 0F 38 D8 /3 Decrypt XMM0-7 using 256-bit AES key indicated by handle at m512 and store each resultant block back to its corresponding register.
Notes for the eight AES*KL instructions: All of the Key Locker encode/decode instructions will check whether the handle is valid for the current IWKey and encode/decode data only if the handle is valid. The ZF flag is used to indicate whether the provided handle was valid (ZF=0) or not (ZF=1).

VIA PadLock instructions

The VIA PadLock instructions are instructions designed to apply cryptographic primitives in bulk, similar to the 8086 repeated string instructions. As
such, unless otherwise specified, they take, as applicable, pointers to source data in ES:rSI and destination data in ES:rDI, and a data-size or count in rCX.
Like the old string instructions, they are all designed to be interruptible.

Padlock subset Instruction Encoding Description Added in

RNG (Random Number Generation) XSTORE (0F A7 C0), REP XSTORE (F3 0F A7 C0) Store random bytes to ES:[rDI], and increment ES:rDI accordingly. XSTORE will store currently-available bytes, which may be from 0 to 8 bytes. REP XSTORE will write the number of random bytes specified by rCX, waiting for the random number generator when needed. EDX specifies a "quality factor". Nehemiah (stepping 3)
ACE (Advanced Cryptography Engine) REP XCRYPTECB (F3 0F A7 C8), REP XCRYPTCBC (F3 0F A7 D0), REP XCRYPTCFB (F3 0F A7 E0), REP XCRYPTOFB (F3 0F A7 E8) Encrypt/Decrypt data, using the AES cipher in various block modes (ECB, CBC, CTR, CFB and OFB, respectively). rCX contains the number of 16-byte blocks to encrypt/decrypt, rBX contains a pointer to an encryption key, rAX a pointer to an initialization vector for block modes that need it, and rDX a pointer to a control word.[a] Nehemiah (stepping 8)
ACE2[b] REP XCRYPTCTR (F3 0F A7 D8) Description shared with the ACE row above (CTR block mode). C7 "Esther"[82]
PHE (Hash Engine) REP XSHA1 (F3 0F A6 C8), REP XSHA256 (F3 0F A6 D0) Compute a cryptographic hash (using the SHA-1 and SHA-256 functions, respectively). ES:rSI points to data to compute a hash for, ES:rDI points to a message digest and rCX specifies the number of bytes. rAX should be set to 0 at the start of a calculation.[c] Esther
PMM (Montgomery Multiplier) REP MONTMUL (F3 0F A6 C0) Perform Montgomery Multiplication. Takes an operand width in ECX (given as a number of bits - must be in range 256..32768 and divisible by 128) and pointer to a data structure in ES:ESI.[d] Esther
GMI[83][84] (Chinese national cryptographic algorithms. Zhaoxin only.) CCS_HASH (F3 0F A6 E8) Compute SM3 hash, similar to the REP XSHA* instructions. The rBX register is used to specify hash function (20h for SM3 being the only documented value). ZhangJiang
GMI[83][84] (Chinese national cryptographic algorithms. Zhaoxin only.) CCS_ENCRYPT (F3 0F A7 F0) Encrypt/Decrypt data, using the SM4 cipher in various block modes. rCX contains the number of 16-byte blocks to encrypt/decrypt, rBX contains a pointer to an encryption key, rDX a pointer to an initialization vector for block modes that need it, and rAX contains a control word.[e] ZhangJiang

a. The control word for REP XCRYPT* is a 128-bit data structure with the following layout:

Bits Usage
3:0 AES round count
4 Digest mode enable (ACE2 only)
5 1=allow data that is not 16-byte aligned (ACE2 only)
6 Cipher: 0=AES, 1=undefined
7 Key schedule: 0=compute, 1=load from memory
8 0=normal, 1=intermediate-result
9 0=encrypt, 1=decrypt
11:10 Key size: 00=128bit, 01=192bit, 10=256bit, 11=reserved
127:12 Reserved, set to 0

b. ACE2 also adds extra features to the other REP XCRYPT instructions: a digest mode for the CBC and CFB instructions, and the ability to use input/output data that are not 16-byte aligned for the non-ECB instructions.

c. On VIA Nano and later processors, setting rAX to an all-1s value for the REP XSHA* instructions will enable an alternate operation mode, where rCX specifies the number of 64-byte blocks, and where the standard FIPS-180-2 length extension procedure at the end of the hash calculation is omitted. This makes for a variant more suitable for data streaming than the original EAX=0 variant. This functionality also exists for CCS_HASH.

d. The data structure to REP MONTMUL contains six 32-bit elements, where the first one is a negated modular inverse of the bottom 32 bits of the modulus and the remaining 5 are pointers to various memory buffers:

Offset Data item
0 Negated modular inverse
4 Pointer to first multiplicand
8 Pointer to second multiplicand
12 Pointer to result buffer
16 Pointer to modulus
20 Pointer to 32-byte scratchpad

e. The CCS_ENCRYPT control word in rAX has the following format:

Bits Usage
0 0=Encrypt, 1=Decrypt
5:1 Must be 10000b for SM4.
6 ECB block mode
7 CBC block mode
8 CFB block mode
9 OFB block mode
10 CTR block mode
11 Digest enable

Remaining bits in rAX must be set to all-0s.
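
As an illustration of the bit layout given in note a for the REP XCRYPT* control word, the following Python sketch packs the fields into a 128-bit value. The function name and parameters are invented for this example, the ACE2-only bits 4 and 5 are left at zero, and the little-endian byte order is an assumption based on x86 conventions. Passing 10 rounds for a 128-bit key follows the standard AES round count and is likewise an assumption about what the hardware expects, not something stated above.

```python
def xcrypt_control_word(rounds, key_bits, decrypt=False,
                        load_key_schedule=False, intermediate=False):
    """Pack the REP XCRYPT* control word fields described in note a."""
    key_size_code = {128: 0b00, 192: 0b01, 256: 0b10}[key_bits]
    if not 0 <= rounds <= 0xF:
        raise ValueError("round count must fit in bits 3:0")
    word = rounds                          # bits 3:0: AES round count
    # bit 6 (cipher) stays 0 = AES; bits 4 and 5 (ACE2 only) stay 0
    word |= int(load_key_schedule) << 7    # bit 7: 0=compute, 1=load from memory
    word |= int(intermediate) << 8         # bit 8: 0=normal, 1=intermediate-result
    word |= int(decrypt) << 9              # bit 9: 0=encrypt, 1=decrypt
    word |= key_size_code << 10            # bits 11:10: key size
    return word.to_bytes(16, "little")     # bits 127:12 reserved, left as 0


print(xcrypt_control_word(10, 128).hex())
print(xcrypt_control_word(14, 256, decrypt=True).hex())
```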

Other instructions

x86 also includes discontinued instruction sets which are no longer supported by Intel and AMD, and undocumented instructions which execute but are
not officially documented.

Virtualization instructions

AMD-V instructions

Instruction Opcode Meaning Notes Added in

Basic SVM instructions

VMRUN rAX[a] 0F 01 D8 Run virtual machine Performs a switch to the guest OS. K8[b]
VMMCALL 0F 01 D9 Call VMM Used exclusively to communicate with VMM. K8[b]
VMLOAD rAX[a] 0F 01 DA Load state From VMCB Loads a subset of processor state from the VMCB specified by the physical address in the rAX register. K8[b]
VMSAVE rAX[a] 0F 01 DB Save state To VMCB Saves additional guest state to VMCB. K8[b]
STGI 0F 01 DC Set Global Interrupt Flag Normally used by the VMM. Available to the VM guest if VGIF feature is supported. K8[b]
CLGI 0F 01 DD Clear Global Interrupt Flag Normally used by the VMM. Available to the VM guest if VGIF feature is supported. K8[b]
INVLPGA rAX,ECX[a] 0F 01 DF Invalidate TLB entry in a specified ASID Invalidates the TLB mapping for the virtual page specified in RAX and the ASID specified in ECX. K8[b]
SKINIT EAX 0F 01 DE Secure Init and Jump with Attestation Verifiable startup of trusted software based on secure hash comparison Shanghai, Phenom II

Virtualization Encrypted State (SEV-ES) instructions

VMGEXIT F2/F3 0F 01 D9 SEV-ES Exit to VMM Explicit communication with the VMM for SEV-ES VMs. Executed as VMMCALL if not executed by a SEV-ES guest. Zen 1

Secure Nested Paging (SEV-SNP) Reverse-Map Table instructions

PSMASH F3 0F 01 FF Page Smash Expands a 2MB-page RMP entry into a corresponding set of contiguous 4KB-page RMP entries. The 2MB page's system physical address is specified in the RAX register. Zen 3
PVALIDATE F2 0F 01 FF Page Validate Validates or rescinds validation of a guest page's RMP entry. The guest virtual address is specified in the register operand rAX. Zen 3
RMPADJUST F3 0F 01 FE Adjust RMP Permissions Modifies RMP permissions for a guest page. The guest virtual address is specified in the RAX register. The page size is specified in RCX[0]. The target VMPL and its permissions are specified in the RDX register. Zen 3
RMPUPDATE F2 0F 01 FE Write RMP Entry Writes a new RMP entry. The system physical address of a page whose RMP entry is modified is specified in the RAX register. The RCX register provides the effective address of a 16-byte data structure which contains the new RMP state. Zen 3
RMPQUERY F3 0F 01 FD Read RMP Permissions Reads an RMP permission mask for a guest page. The guest virtual address is specified in the RAX register. The target VMPL is specified in RDX[7:0]. RMP permissions for the specified VMPL are returned in RDX[63:8] and the RCX register. Zen 4

a. For the rAX argument to the VMRUN, VMLOAD, VMSAVE and INVLPGA instructions, the choice of AX/EAX/RAX depends on address-size, which can be
overridden with the 67h prefix.

b. Support for AMD-V was added in stepping F of the AMD K8, and is not available on earlier steppings.

Intel VT-x instructions

Instruction Opcode Description Added in

Basic VMX (Virtual Machine Extensions) instructions (added in Prescott 2M, Yonah, Centerton, Nano 3000)

VMFUNC NP 0F 01 D4 Invoke VM function specified in EAX.
VMPTRLD m64[a] NP 0F C7 /6 Load pointer to Virtual-Machine Control Structure (VMCS) from memory and mark it valid.
VMPTRST m64[a] NP 0F C7 /7 Store pointer to current VMCS to memory.
VMCLEAR m64[a] 66 0F C7 /6 Flush VMCS data from CPU to VMCS region in memory. If the specified VMCS is the current VMCS, then the current-VMCS is marked as invalid.
VMREAD r/m,reg NP 0F 78 /r Read a specified field from the current-VMCS. The reg argument specifies which field to read - the result is stored to r/m.
VMWRITE reg,r/m NP 0F 79 /r Write to specified field of current-VMCS. The reg argument specifies which field to write, and the r/m argument provides the data item to write to the field.
VMCALL NP 0F 01 C1 Call to VM monitor by causing a VMEXIT.
VMLAUNCH NP 0F 01 C2 Launch virtual machine managed by current VMCS.
VMRESUME NP 0F 01 C3 Resume virtual machine managed by current VMCS.
VMXOFF NP 0F 01 C4 Leave VMX Operation - stops hardware supported virtualisation environment.
VMXON m64[a] F3 0F C7 /6 Enter VMX Operation - enters hardware supported virtualisation environment.[b]

Extended Page Tables (EPT) instructions (added in Nehalem, Centerton,[85] ZhangJiang)

INVEPT reg,m128 66 0F 38 80 /r Invalidates EPT-derived entries in the TLBs and paging-structure caches.
INVVPID reg,m128 66 0F 38 81 /r Invalidates entries in the TLBs and paging-structure caches based on VPID.

Trust Domain Extensions (TDX): Secure Arbitration Mode (SEAM) instructions (added in Sapphire Rapids[86])

SEAMCALL 66 0F 01 CF Call to SEAM VMX root operation.
SEAMOPS 66 0F 01 CE Invoke SEAM specific operations.
SEAMRET 66 0F 01 CD Return to legacy VMX root operation from SEAM VMX root operation.
TDCALL 66 0F 01 CC Call to VM monitor by causing a VMEXIT.

a. The m64 argument to VMPTRLD, VMPTRST, VMCLEAR and VMXON is a 64-bit physical address.
b. The m64 argument to VMXON is the 64-bit physical address to a "VMXON region", which is a 4Kbyte region that must be 4 Kbyte aligned. This region
may be used by the processor to support VMX operation in an implementation-dependent manner and should never be accessed by software until the
processor has left VMX operation through the VMXOFF instruction.

Undocumented instructions

Undocumented x86 instructions

The x86 CPUs contain undocumented instructions which are implemented on the chips but not listed in some official documents. They can be found in
various sources across the Internet, such as Ralf Brown's Interrupt List and sandpile.org (https://www.sandpile.org/).

Some of these instructions are widely available across many/most x86 CPUs, while others are specific to a narrow range of CPUs.

Undocumented instructions that are widely available across many x86 CPUs include

Mnemonics Opcodes Description Status

ASCII-Adjust-after-Multiply. On the 8086, documented for imm8=0Ah only,


which is used to convert a binary multiplication result to BCD.

AAM imm8 D4 imm8 The actual operation is AH ← AL/imm8; AL ← AL mod imm8 for
any imm8 value (except zero, which produces a divide-by-zero
exception).[87] Available beginning with 8086,
documented for imm8 values other
than 0Ah since Pentium (earlier
ASCII-Adjust-Before-Division. On the 8086, documented for imm8=0Ah only,
documentation lists no arguments).
which is used to convert a BCD value to binary for a following division
instruction.
AAD imm8 D5 imm8
The actual operation is AL ← (AL+(AH*imm8)) & 0FFh; AH ← 0
for any imm8 value.

SALC,
Set AL depending on the value of the Carry Flag (a 1-byte alternative of Available beginning with 8086, but only
D6
SETALC SBB AL, AL) documented since Pentium Pro.

Available since the 8086.


F6 /1 imm8,
[88]
TEST Undocumented variants of the TEST instruction. Performs the same Unavailable on some 80486
F7 /1 imm16/32 operation as the documented F6 /0 and F7 /0 variants, respectively.
steppings.[89][90]

SHL, (D0..D3) /6,
Undocumented variants of the SHL instruction.[88] Performs the same Available since the 80186 (performs
operation as the documented (D0..D3) /4 and (C0..C1) /4 variants,
SAL (C0..C1) /6 imm8 different operation on the 8086)[91]
respectively.

Alias of opcode 80, which provides variants of 8-bit integer instructions (ADD, Available since the 8086.[92] Explicitly
 (multiple) 82 /(0..7) imm8 unavailable in 64-bit mode but kept and
OR, ADC, SBB, AND, SUB, XOR, CMP) with an 8-bit immediate argument.[92]
reserved for compatibility.[93]
Available on 8086, but only
OR,AND,XOR 83 /(1,4,6) imm8 16-bit OR/AND/XOR with a sign-extended 8-bit immediate. documented from 80386
onwards.[94][95]
The behavior of the F2 prefix (REPNZ, REPNE) when used with string
REPNZ MOVS F2 (A4..A5) instructions other than CMPS/SCAS is officially undefined, but there exists
commercial software (e.g. the version of FDISK distributed with MS-DOS Available since the 8086.
REPNZ STOS F2 (AA..AB) versions 3.30 to 6.22[96]) that rely on it to behave in the same way as the
documented F3 (REP) prefix.
The use of the REP prefix with the RET instruction is not listed as supported in
either the Intel SDM or the AMD APM. However, AMD's optimization guide for
the AMD-K8 describes the F3 C3 encoding as a way to encode a two-byte
RET instruction - this is the recommended workaround for an issue in the Executes as RET on all known x86
REP RET F3 C3
AMD-K8's branch predictor that can cause branch prediction to fail for some 1- CPUs.
byte RET instructions.[97] At least some versions of gcc are known to use this
encoding.[98]
NOP with address-size override prefix. The use of the 67 prefix for instructions
without memory operands is listed by the Intel SDM (vol 2, section 2.1.1) as
NOP 67 90 Executes as NOP on 80386 and later.
"reserved", but it is used in Microsoft Windows 95 as a workaround for a bug in
the B1 stepping of Intel 80386.[99][100]

ICEBP, INT1 F1
Single byte single-step exception / Invoke ICE
Available beginning with 80386, documented (as INT1) since Pentium Pro. Treated as undocumented instruction prefix on 8086 and 80286.[101]

NOP r/m 0F 1F /0
Official long NOP. Introduced in the Pentium Pro in 1995, but remained undocumented until March 2006.[29][102][103]
Available on Pentium Pro and AMD K7[104] and later. Unavailable on AMD K6, AMD Geode LX, VIA Nehemiah.[105]

NOP r/m 0F 0D /r
Reserved-NOP. Introduced in 65 nm Pentium 4. Intel documentation lists this opcode as NOP in opcode tables but not instruction listings since June 2005.[106][107] From Broadwell onwards, 0F 0D /1 has been documented as PREFETCHW.
On AMD CPUs, 0F 0D with a memory argument is documented as PREFETCH/PREFETCHW since K6-2. It was originally part of 3DNow!, but has been kept in later AMD CPUs even after the rest of 3DNow! was dropped.
Available on Intel CPUs since 65 nm Pentium 4.

UD1 0F B9 /r
UD0 0F FF
Intentionally undefined instructions, but unlike UD2 (0F 0B) these instructions were left unpublished until December 2016.[108][109]
Microsoft Windows 95 Setup is known to depend on 0F FF being invalid;[110][111] it is used as a self check to test that its #UD exception handler is working properly.
Other invalid opcodes that are being relied on by commercial software to produce #UD exceptions include FF FF (DIF-2,[112] LaserLok[113]) and C4 C4 ("BOP"[114][115]); however, as of January 2022 they are not published as intentionally invalid opcodes.
All of these opcodes produce #UD exceptions on 80186 and later (except on NEC V20/V30, which assign at least 0F FF to the BRKEM instruction).
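
The 82 /(0..7) and 83 /(1,4,6) rows in the table above use the /digit notation, where the digit is the "reg" field (bits 5..3) of the ModRM byte and selects the operation. The following is a minimal sketch of that selection, assuming 16-bit operand size and register-direct operands only (ModRM mod = 3). The function and table names are hypothetical and chosen for illustration; this is not taken from any reference decoder.

```python
# Illustrative sketch only: decodes the register-direct forms of the
# 80 / 82 / 83 immediate-group opcodes, assuming 16-bit operand size.
GROUP1_OPS = ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"]  # /0../7
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def sign_extend_imm8(imm8: int) -> int:
    """Sign-extend an 8-bit immediate to a 16-bit value."""
    return (imm8 | 0xFF00) if imm8 & 0x80 else imm8


def decode_group1(code: bytes) -> str:
    """Return a textual disassembly of a 3-byte 80/82/83 instruction."""
    opcode, modrm, imm8 = code[0], code[1], code[2]
    if opcode not in (0x80, 0x82, 0x83):
        raise ValueError("not an 80/82/83 opcode")
    mod, digit, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3:
        raise ValueError("this sketch handles register operands only")
    op = GROUP1_OPS[digit]  # the /digit from the Opcodes column
    if opcode == 0x83:
        # 16-bit operation; the 8-bit immediate is sign-extended.
        return f"{op} {REG16[rm]}, 0x{sign_extend_imm8(imm8):04X}"
    # Opcode 80 and its alias 82: 8-bit operation, immediate used as-is.
    return f"{op} {REG8[rm]}, 0x{imm8:02X}"
```

For example, under these assumptions `decode_group1(bytes([0x82, 0xC8, 0x7F]))` and `decode_group1(bytes([0x80, 0xC8, 0x7F]))` give the same text, which reflects 82 being an alias of 80.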

Undocumented instructions that appear only in a limited subset of x86 CPUs include

Mnemonics Opcodes Description Status


REP IMUL F3 F6 /5, F3 F7 /5
A REP or REPNZ prefix on an IMUL instruction causes the result to be negated. This is due to the microcode using the “REP prefix present” bit to store the sign of the result.
8086/8088 only.[116]

REP IDIV F3 F6 /7, F3 F7 /7
A REP or REPNZ prefix on an IDIV instruction causes the quotient to be negated. This is due to the microcode using the “REP prefix present” bit to store the sign of the quotient.
8086/8088 only.[116]

SAVEALL, STOREALL 0F 04
Exact purpose unknown, causes CPU hang (HCF). The only way out is CPU reset.[117]
In some implementations, emulated through BIOS as a halting sequence.[118]
In a forum post at the Vintage Computing Federation (https://forum.vcfed.org/index.php?threads/i-found-the-saveall-opcode.71519/), this instruction is explained as SAVEALL. It interacts with ICE mode.
Only available on 80286.

LOADALL 0F 05
Loads all registers from memory address 0x000800.
Only available on 80286. Opcode reused for SYSCALL in AMD K6-2 and later CPUs.

LOADALLD 0F 07
Loads all registers from memory address ES:EDI.
Only available on 80386. Opcode reused for SYSRET in AMD K6-2 and later CPUs.

CL1INVMB 0F 0A[119]
On the Intel SCC (Single-chip Cloud Computer), invalidate all message buffers. The mnemonic and operation of the instruction, but not its opcode, are described in Intel's SCC architecture specification.[120]
Available on the SCC only.

PATCH2 0F 0E
On AMD K6 and later, maps to the FEMMS operation (fast clear of MMX state), but on Intel it is identified as a uarch data read.[121]
Only available in Red unlock state (0F 0F too).

PATCH3 0F 0F
Write uarch.
Can change RAM part of microcode on Intel.

UMOV r,r/m
UMOV r/m,r 0F (10..13) /r
Moves data to/from user memory when operating in ICE HALT mode.[122] Acts as regular MOV otherwise.
Available on some 386 and 486 processors only. Opcodes reused for SSE instructions in later CPUs.

NXOP 0F 55 NexGen hypercode interface.[123] Available on NexGen Nx586 only.

(multiple) 0F (E0..FB)[124]
NexGen Nx586 "hyper mode" instructions.
The NexGen Nx586 CPU uses "hyper code"[125] (x86 code sequences unpacked at boot time and only accessible in a special "hyper mode" operation mode, similar to DEC Alpha's PALcode) for many complicated operations that are implemented with microcode in most other x86 CPUs. The Nx586 provides a large number of undocumented instructions to assist hyper mode operation.
Available in Nx586 hyper mode only.

PSWAPW mm,mm/m64 0F 0F /r BB
Undocumented AMD 3DNow! instruction on K6-2 and K6-3. Swaps 16-bit words within 64-bit MMX register.[126][127]
Instruction known to be recognized by MASM 6.13 and 6.14.
Available on K6-2 and K6-3 only. Opcode reused for documented PSWAPD instruction from AMD K7 onwards.

Unknown mnemonic 64 D6
Using the 64h (FS: segment) prefix with the undocumented D6 (SALC/SETALC) instruction will, on UMC CPUs only, cause EAX to be set to 0xAB6B1B07.[128][129]
Available on the UMC Green CPU only. Executes as SALC on non-UMC CPUs.


FS: Jcc 64 (70..7F) rel8, 64 0F (80..8F) rel16/32
On Intel "NetBurst" (Pentium 4) CPUs, the 64h (FS: segment) instruction prefix will, when used with conditional branch instructions, act as a branch hint to indicate that the branch will be alternating between taken and not-taken.[130] Unlike other NetBurst branch hints (CS: and DS: segment prefixes), this hint is not documented.
Available on NetBurst CPUs only. Segment prefixes on conditional branches are accepted but ignored by non-NetBurst CPUs.

ALTINST 0F 3F
Jump and execute instructions in the undocumented Alternate Instruction Set.
Only available on some x86 processors made by VIA Technologies.

(FMA4) VEX.66.0F38 (5C..5F,68..6F,78..7F) /r imm8
On AMD Zen1, FMA4 instructions are present but undocumented (missing CPUID flag). The reason for leaving the feature undocumented may or may not have been due to a buggy implementation.[131]
Removed from Zen2 onwards.

REP XSHA512 F3 0F A6 E0
Perform SHA-512 hashing.
Supported by OpenSSL[132] as part of its VIA PadLock support, but not documented by the VIA PadLock Programming Guide (http://linux.via.com.tw/support/beginDownload.action?eleid=181&fid=261).
Only available on some x86 processors made by VIA Technologies and Zhaoxin.

REP XMODEXP F3 0F A6 F8
XRNG2 F3 0F A7 F8
Instructions to perform modular exponentiation and random number generation, respectively.
Listed in a VIA-supplied patch to add support for VIA Nano-specific PadLock instructions to OpenSSL,[133] but not documented by the VIA PadLock Programming Guide.
Only available on some x86 processors made by VIA Technologies and Zhaoxin.

Unknown mnemonic 0F A7 (C1..C7)
Detected by CPU fuzzing tools such as SandSifter[134] and UISFuzz[135] as executing without causing #UD on several different VIA and Zhaoxin CPUs.
Unknown operation, may be related to the documented XSTORE (0F A7 C0) instruction.
Only available on some x86 processors made by VIA Technologies and Zhaoxin.

(unknown, multiple) 0F 0F /r ??
The whitepapers for SandSifter[134] and UISFuzz[135] report the detection of large numbers of undocumented instructions in the 3DNow! opcode range on several different AMD CPUs (at least Geode NX and C-50). Their operation is not known.
On at least AMD K6-2, all of the unassigned 3DNow! opcodes (other than the undocumented PF2IW, PI2FW and PSWAPW instructions) execute as equivalents of POR (MMX bitwise-OR instruction).[127]
Present on some AMD CPUs with 3DNow!.

MONTMUL2 Unknown
Zhaoxin RSA/"xmodx" instructions. Mnemonics and CPUID flags are listed in a Linux kernel patch for OpenEuler,[136] but opcodes and instruction descriptions are not available.
Unknown. Some Zhaoxin CPUs[137] have the CPUID flags for these instructions set.

MOVDB, GP2MEM Unknown
Microprocessor Report's article "MediaGX Targets Low-Cost PCs" from 1997, covering the introduction of the Cyrix MediaGX processor, lists several new instructions that are said to have been added to this processor in order to support its new "Virtual System Architecture" features, including MOVDB and GP2MEM. It also mentions that Cyrix did not intend to publish specifications for these instructions.[138]
Unknown. No specification known to have been published.
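
The REP IMUL and REP IDIV rows in the table above describe a sign flip of the product or quotient on the 8086/8088. The following is a rough behavioural model of the 8-bit forms, written as a sketch under stated assumptions: the function names are hypothetical, flags and the divide-overflow exception are not modelled, and the remainder is left unchanged under the REP prefix because the table only mentions the quotient.

```python
# Illustrative model only; not derived from the 8086 microcode itself.
def _to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def imul8_8086(al: int, src: int, rep_prefix: bool = False) -> int:
    """8-bit IMUL: returns the 16-bit AX result (AL * src, signed)."""
    product = _to_signed(al, 8) * _to_signed(src, 8)
    if rep_prefix:
        product = -product  # REP/REPNZ prefix negates the result
    return product & 0xFFFF


def idiv8_8086(ax: int, src: int, rep_prefix: bool = False) -> tuple[int, int]:
    """8-bit IDIV: returns (AL quotient, AH remainder) for AX / src."""
    dividend, divisor = _to_signed(ax, 16), _to_signed(src, 8)
    if divisor == 0:
        raise ZeroDivisionError("divide error")
    # x86 signed division truncates toward zero.
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    if rep_prefix:
        quotient = -quotient  # REP/REPNZ prefix negates the quotient
    return quotient & 0xFF, remainder & 0xFF
```

In this model, `imul8_8086(3, 5)` yields 0x000F while `imul8_8086(3, 5, rep_prefix=True)` yields 0xFFF1, the two's-complement encoding of -15.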

Undocumented x87 instructions

Mnemonics Opcodes Description Status


FENI, FENI8087_NOP DB E0 FPU Enable Interrupts (8087)
FDISI, FDISI8087_NOP DB E1 FPU Disable Interrupts (8087)
FSETPM, FSETPM287_NOP DB E4 FPU Set Protected Mode (80287)
Documented for the Intel 80287.[71]
Present on all Intel x87 FPUs from 80287 onwards. On FPUs other than the ones on which they were introduced (8087 for FENI/FDISI and 80287 for FSETPM), they act as NOPs.
These instructions and their operation on modern CPUs are commonly mentioned in later Intel documentation, but with opcodes omitted and opcode table entries left blank (e.g. Intel SDM 325462-077, April 2022 (https://cdrdv2-public.intel.com/671200/325462-sdm-vol-1-2abcd-3abcd.pdf) mentions them twice without opcodes).
The opcodes are, however, recognized by Intel XED.[139]

(no mnemonic) D9 D7, D9 E2, D9 E7, DD FC, DE D8, DE DA, DE DC, DE DD, DE DE, DF FC
"Reserved by Cyrix" opcodes
These opcodes are listed as reserved opcodes that will produce "unpredictable results" without generating exceptions on at least Cyrix 6x86,[140] 6x86MX, MII, MediaGX, and AMD Geode GX/LX.[141] (The documentation for each of these CPUs lists the same ten opcodes.)
Their actual operation is not known, nor is it known whether their operation is the same on all of these CPUs.


See also
CLMUL
RDRAND
Larrabee extensions
Advanced Vector Extensions 2
Bit Manipulation Instruction Sets
CPUID
List of discontinued x86 instructions

References
1. "Re: Intel Processor Identification and the CPUID Instruction" (http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html?wapkw=processor-identification-cpuid-instruction). Retrieved 2013-04-21.
2. Andrew Schulman, "Unauthorized Windows 95" (ISBN 1-56884-169-8), chapter 8, p.249,257.
3. US Patent 4974159 (https://patents.google.com/patent/US4974159A/), "Method of transferring control in a multitasking computer system" mentions 63h/ARPL.
4. WikiChip, UMIP - x86 (https://en.wikichip.org/wiki/x86/umip)
5. Oracle Corp, Oracle® VM VirtualBox Administrator's Guide for Release 6.0, section 3.5: Details About Software Virtualization (https://docs.oracle.com/en/virtualization/virtualbox/6.0/admin/swvirt-details.html)
6. MBC Project, Virtual Machine Detection (https://github.com/MBCProject/mbc-markdown/blob/master/anti-behavioral-analysis/detect-vm.md)
7. Michal Necasek, SGDT/SIDT Fiction and Reality (https://www.os2museum.com/wp/sgdtsidt-fiction-and-reality/)
8. Intel, How Microarchitectural Data Sampling works (https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/intel-analysis-microarchitectural-data-sampling.html), see mitigations section. Archived (https://archive.ph/dRI0B) on Apr 22, 2022
9. Linux kernel documentation, Microarchitectural Data Sampling (MDS) mitigation (https://www.kernel.org/doc/html/latest/x86/mds.html)
10. sandpile.org, x86 architecture rFLAGS register (https://www.sandpile.org/x86/flags.htm), see note #7
11. "Intel 80386 CPU Information | PCJS Machines" (https://www.pcjs.org/documents/manuals/intel/80386/#b0-stepping).
12. Geoff Chappell, CPU Identification before CPUID (https://www.geoffchappell.com/studies/windows/km/cpu/precpuid.htm?tx=245)
13. Toth, Ervin (1998-03-16). "BSWAP with 16-bit registers" (https://web.archive.org/web/19991103025640/http://www.df.lth.se/~john_e/gems/gem000c.html). Archived from the original (http://www.df.lth.se/~john_e/gems/gem000c.html) on 1999-11-03. "The instruction brings down the upper word of the doubleword register without affecting its upper 16 bits."
14. Coldwind, Gynvael (2009-12-29). "BSWAP + 66h prefix" (https://gynvael.coldwind.pl/?id=268). Retrieved 2018-10-03. "internal (zero-)extending the value of a smaller (16-bit) register … applying the bswap to a 32-bit value "00 00 AH AL", … truncated to lower 16-bits, which are "00 00". … Bochs … bswap reg16 acts just like the bswap reg32 … QEMU … ignores the 66h prefix"
15. Intel "i486 Microprocessor" (https://www.ardent-tool.com/CPU/docs/Intel/486/datasheets/240440-001.pdf) (April 1989, order no. 240440-001) p.142 lists CMPXCHG with 0F A6/A7 encodings.
16. "Intel 486 & 486 POD CPUID, S-spec, & Steppings" (http://datasheets.chipdb.org/Intel/x86/486/Intel486.htm).
17. Intel "i486 Microprocessor" (https://www.ardent-tool.com/CPU/docs/Intel/486/datasheets/240440-002.pdf) (November 1989, order no. 240440-002) p.135 lists CMPXCHG with 0F B0/B1 encodings.
18. Frank van Gilluwe, "The Undocumented PC, second edition", 1997, ISBN 0-201-47950-8, page 55
19. Intel, Software Developer’s Manual, vol 3A (http://kib.kiev.ua/x86docs/Intel/SDMs/253668-078.pdf), order no. 253668-078, Dec 2022, section 9.3, page 299
20. "RSM—Resume from System Management Mode" (https://web.archive.org/web/20120312224625/http://www.softeng.rl.ac.uk/st/archive/SoftEng/SESP/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc279.htm). Archived from the original on 2012-03-12.
21. Cyrix 486SLC/e Data Sheet (1992) (http://www.bitsavers.org/components/cyrix/Cyrix_Cx486SLCe_Data_Sheet_1992.pdf), section 2.6.4
22. Robert Collins, CPUID Algorithm Wars (https://web.archive.org/web/20001218003500/http://www.rcollins.org/ddj/Nov96/Nov96.html), Nov 1996. Archived from the original (http://www.rcollins.org/ddj/Nov96/Nov96.html) on Dec 18, 2000.
23. Geoff Chappell, CMPXCHG8B Support in the 32-Bit Windows Kernel (https://www.geoffchappell.com/studies/windows/km/cpu/cx8.htm), 23 Jan 2008
24. Intel, Software Developer's Manual (https://kib.kiev.ua/x86docs/Intel/SDMs/325462-077.pdf), order no. 325462-077, Nov 2022 - the entry on the RDTSC instruction on p.1739 describes the instruction sequences required to order the RDTSC instruction with respect to earlier and later instructions.
25. Linux kernel 5.4.12, /arch/x86/kernel/cpu/centaur.c (https://elixir.bootlin.com/linux/v5.4.12/source/arch/x86/kernel/cpu/centaur.c#L110)
26. Stack Overflow, Can constant non-invariant tsc change frequency across cpu states? (https://stackoverflow.com/questions/62492053/can-constant-non-invariant-tsc-change-frequency-across-cpu-states) Accessed 24 Jan 2023.
27. CPU-World, CPUID for IDT ZHAOXIN KaiXian KX-U6780 2.7 GHz (by KZ) (https://www.cpu-world.com/cgi-bin/CPUID.pl?CPUID=78468). Accessed 24 Jan 2023.
28. JookWiki, "nopl" (https://www.jookia.org/wiki/Nopl), Sep 24, 2022 - provides a lengthy account of the history of the long NOP and the issues around it. Archived (https://web.archive.org/web/20221028233225/https://www.jookia.org/wiki/Nopl) on Oct 28, 2022.
29. Intel Community: Multibyte NOP Made Official (https://community.intel.com/t5/Software-Archive/Multi-byte-NOP-opcode-made-official/td-p/932580). Archived (https://web.archive.org/web/20220407203915/https://community.intel.com/t5/Software-Archive/Multi-byte-NOP-opcode-made-official/td-p/932580) on 7 Apr 2022.
30. Intel Software Developers Manual, vol 3B (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html) (order no 253669-076us, December 2021), section 22.15 "Reserved NOP"
31. AMD, AMD 64-bit Technology - AMD x86-64 Architecture Programmer’s Manual Volume 3 (https://kib.kiev.ua/x86docs/AMD/AMD64/24594_APM_v3-r3.02.pdf), publication no. 24594, rev 3.02, Aug 2002, page 379.
32. AMD, AMD-K6 Processor Data Sheet (https://www.ardent-tool.com/CPU/docs/AMD/K6/20695.pdf), order no. 20695H/0, March 1998, section 24.2, page 283.
33. George Dunlap, The Intel SYSRET Privilege Escalation (https://xenproject.org/2012/06/13/the-intel-sysret-privilege-escalation/), The Xen Project, 13 June 2012
34. Intel, AP-485: Intel® Processor Identification and the CPUID Instruction (http://kib.kiev.ua/x86docs/Intel/AppNote485/241618-039.pdf), order no. 241618-039, May 2012, section 5.1.2.5, page 32
35. Michal Necasek, "SYSENTER, Where Are You?" (http://www.os2museum.com/wp/sysenter-where-are-you/)
36. AMD, Athlon Processor x86 Code Optimization Guide (https://pdf.datasheetcatalog.com/datasheet/AdvancedMicroDevices/mXvyvs.pdf), publication no. 22007, rev K, Feb 2002, appendix F, page 284
37. Transmeta, Processor Recognition (http://datasheets.chipdb.org/Transmeta/Crusoe/Crusoe_CPUID_5-7-02.pdf), May 7, 2002.
38. VIA, VIA C3 Nehemiah Processor Datasheet (http://datasheets.chipdb.org/VIA/Nehemiah/VIA%20C3%20Nehemiah%20Datasheet%20R113.pdf), rev 1.13, Sep 29, 2004, page 17
39. CPU-World, CPUID for Intel Xeon 3.40 GHz (https://www.cpu-world.com/cgi-bin/CPUID.pl?CPUID=75151) - Nocona stepping D CPUID without CMPXCHG16B

40. CPU-World, CPUID for Intel Xeon 3.60 GHz (https://www.cpu-world.com/cgi-bin/CPUID.pl?CPUID=75154) - Nocona stepping E CPUID with CMPXCHG16B
41. SuperUser StackExchange, How prevalent are old x64 processors lacking the cmpxchg16b instruction? (https://superuser.com/questions/187254/how-prevalent-are-old-x64-processors-lacking-the-cmpxchg16b-instruction)
42. Intel SDM order no. 325462-077 (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html), Apr 2022, vol 2B, p.4-130 "MOVSX/MOVSXD-Move with Sign-Extension" lists MOVSXD without REX.W as "discouraged"
43. Anandtech, AMD Zen 3 Ryzen Deep Dive Review (https://www.anandtech.com/show/16214/amd-zen-3-ryzen-deep-dive-review-5950x-5900x-5800x-and-5700x-tested/6), Nov 5, 2020, page 6
44. "Saving Private Ryzen: PEXT/PDEP 32/64b replacement functions for #AMD CPUs (BR/#Zen/Zen+/#Zen2) based on @zwegner's zp7" (https://twitter.com/instlatx64/status/1322503571288559617). Twitter. Retrieved 2023-01-20.
45. Wegner, Zach (4 November 2020). "zwegner/zp7" (https://github.com/zwegner/zp7). GitHub.
46. Intel, Control-flow Enforcement Technology Specification (https://kib.kiev.ua/x86docs/Intel/CET/334525-003.pdf) (v3.0, order no. 334525-003, March 2019)
47. Intel SDM, rev 076, December 2021 (https://kib.kiev.ua/x86docs/Intel/SDMs/253665-076.pdf), volume 1, section 18.3.1
48. Binutils mailing list: x86: CET v2.0: Update NOTRACK prefix (https://sourceware.org/pipermail/binutils/2017-June/098516.html)
49. AMD, Extensions to the 3DNow! and MMX Instruction Sets (https://refspecs.linuxfoundation.org/AMD-extensions.pdf), ref no. 22466D/0, March 2000, p.11
50. Hadi Brais, The Significance of the x86 SFENCE instruction (https://hadibrais.wordpress.com/2019/02/26/the-significance-of-the-x86-sfence-instruction/), 26 Feb 2019.
51. Intel, Software Developer's Manual (https://kib.kiev.ua/x86docs/Intel/SDMs/325462-077.pdf), order no. 325462-077, Nov 2022, Volume 1, section 11.4.4.3, page 276.
52. Hadi Brais, The Significance of the LFENCE instruction (https://hadibrais.wordpress.com/2018/05/14/the-significance-of-the-x86-lfence-instruction/), 14 May 2018
53. AMD, Software techniques for managing speculation on AMD processor (https://www.amd.com/system/files/documents/software-techniques-for-managing-speculation.pdf), rev 3.8.22, 8 March 2022, page 4. Archived (https://web.archive.org/web/20220313090311/https://www.amd.com/system/files/documents/software-techniques-for-managing-speculation.pdf) on 13 March 2022.
54. Intel, Prescott New Instructions Software Developer’s Guide (https://math-atlas.sourceforge.net/devel/assembly/sse3.pdf), order no. 252490-003, June 2003, pages 3-26 and 3-38 list MONITOR and MWAIT with explicit operands.
55. Flat Assembler messageboard, "BLENDVPS/BLENDVPD/PBLENDVB syntax" (https://board.flatassembler.net/topic.php?p=98558#98558), also covers MONITOR/MWAIT mnemonics
56. Intel, Intel® Xeon Phi™ Product Family x200 (KNL) User mode (ring 3) MONITOR and MWAIT (https://web.archive.org/web/20170305002312/https://software.intel.com/en-us/blogs/2016/10/06/intel-xeon-phi-product-family-x200-knl-user-mode-ring-3-monitor-and-mwait) (archived 5 Mar 2017)
57. AMD, BIOS and Kernel Developer’s Guide (BKDG) For AMD Family 10h Processors (https://www.amd.com/system/files/TechDocs/31116.pdf), order no. 31116, rev 3.62, page 419
58. Guru3D, VIA Zhaoxin x86 4 and 8-core SoC processors launch (https://www.guru3d.com/news-story/via-zhaoxin-x86-4-and-8-core-processors-launched.html), Jan 22, 2018
59. Vulners, x86: DoS from attempting to use INVPCID with a non-canonical addresses (https://vulners.com/xen/XSA-279), 20 Nov 2018
60. Intel, Intel® 64 and IA-32 Architectures Software Developer’s Manual (http://kib.kiev.ua/x86docs/Intel/SDMs/325384-078.pdf) volume 3, order no. 325384-078, December 2022, chapter 23.15
61. Catherine Easdon, Undocumented CPU Behaviour on x86 and RISC-V Microarchitectures: A Security Perspective (https://www.cattius.com/images/thesis-unsigned.pdf), 10 May 2019, page 39
62. Intel, 11th Generation Intel® Core™ Processor Desktop Datasheet, Volume 1 (https://cdrdv2.intel.com/v1/dl/getContent/634648), May 2022, order no. 634648-004, section 3.5, page 65
63. Intel, Which Platforms Support Intel® Software Guard Extensions (Intel® SGX) SGX2? (https://www.intel.com/content/www/us/en/support/articles/000058764/software/intel-security-products.html) Archived (https://archive.ph/nQHXO) on 5 May 2022.
64. "The CLDEMOTE Story" (https://twitter.com/InstLatX64/status/1521562151848132609). Twitter. Retrieved 2023-01-23.
65. Wikichip, CLZERO - x86 (https://en.wikichip.org/w/index.php?title=x86/clzero&oldid=94738)
66. Intel, Application note AP-578: Software and Hardware Considerations for FPU Exception Handlers for Intel Architecture Processors (https://ardent-tool.com/CPU/docs/Intel/IA/243291-002.pdf), order no. 243291-002, February 1997
67. Intel, Application Note AP-113: Getting Started With The Numeric Data Processor (https://ardent-tool.com/CPU/docs/Intel/808x/8087/appnotes/AP-113.pdf), Feb 1981, pages 24-25
68. Intel, 8087 Math Coprocessor (http://www.datasheetcatalog.com/datasheets_pdf/8/0/8/7/8087.shtml), Oct 1989, order no. 285385-007, page 3-100, fig 9
69. Intel, 80287 80-bit HMOS Numeric Processor Extension (http://www.bitsavers.org/components/intel/_dataSheets/80287_Data_Sheet_Feb83.pdf), Feb 1983, order no. 201920-001, page 14
70. Intel, iAPX86, 88 User's Manual (https://ardent-tool.com/CPU/docs/Intel/808x/manuals/210201-001.pdf), 1981 (order no. 210201-001), p. 797
71. Intel 80286 and 80287 Programmers Reference Manual (http://bitsavers.trailing-edge.com/components/intel/80286/210498-005_80286_and_80287_Programmers_Reference_Manual_1987.pdf), 1987 (order no. 210498-005), p. 485
72. Intel Software Developer's Manual (https://kib.kiev.ua/x86docs/Intel/SDMs/253669-064.pdf) volume 3B, revision 064, section 22.18.9
73. "GCC Bugzilla – 37179 – GCC emits bad opcode 'ffreep' " (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37179).
74. Michael Steil, FFREEP – the assembly instruction that never existed (https://www.pagetable.com/?p=16)
75. Dusko Koncaliev, Pentium FDIV Bug (https://www.cs.earlham.edu/~dusko/cs63/fdiv.html)
76. Bruce Dawson, Intel Underestimates Error Bounds by 1.3 quintillion (https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/)
77. Intel SDM, rev 053 (https://kib.kiev.ua/x86docs/Intel/SDMs/253665-053.pdf) and later, describes the exact argument reduction procedure used for FSIN, FCOS, FSINCOS and FPTAN in volume 1, section 8.3.8
78. Intel, Intel® 64 and IA-32 Architectures Optimization Reference Manual (https://kib.kiev.ua/x86docs/Intel/Intel-OptimGuide/248966-029.pdf) (order no. 248966-044, June 2021) section 3.5.2.3
79. "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers" (http://www.agner.org/optimize/microarchitecture.pdf) (PDF). Retrieved October 17, 2016.
80. "Chess programming AVX2" (https://chessprogramming.wikispaces.com/AVX2). Retrieved October 17, 2016.
81. "Intel AVX-512 Instructions" (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-avx-512-instructions.html). Intel. Retrieved 21 June 2022.
82. Michal Ludvig, VIA PadLock—Wicked Fast Encryption (https://www.linuxjournal.com/article/8042), 6 Apr 2005
83. Zhaoxin, Core Technology | Instructions for the use of accelerated instructions for national encryption algorithm based on Zhaoxin processor (https://www.zhaoxin.com/prodshow_view.aspx?id=483&nid=31&typeid=63&IsActiveTarget=True) (in Chinese). Archived (https://web.archive.org/web/20220105165938/https://www.zhaoxin.com/prodshow_view.aspx?id=483&nid=31&typeid=63&IsActiveTarget=True) on Jan 5, 2022
84. Zhaoxin, GMI User Manual v1.0 (https://github.com/ZXOpenSource/OpenSSL-ZX-GMI/blob/63e8835e1854042ac949a04eb87c604cb1d7d5ea/GMI%20User%20Manual%20V1.0.pdf) (in Chinese). Archived (https://web.archive.org/web/20220228071045/https://raw.githubusercontent.com/ZXOpenSource/OpenSSL-ZX-GMI/master/GMI%20User%20Manual%20V1.0.pdf) on Feb 28, 2022

85. Intel, Intel® Atom™ Processor S1200 Product Family for Microserver Datasheet, Volume 1 of 2 (https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/atom-processor-s1200-datasheet-vol-1.pdf), order no. 328194-001, Dec 2012, page 44
86. SecurityWeek, Intel Adds TDX to Confidential Computing Portfolio With Launch of 4th Gen Xeon Processors (https://www.securityweek.com/intel-adds-tdx-confidential-computing-portfolio-launch-4th-gen-xeon-processors), 10 Jan 2023
87. Robert Collins, Undocumented OpCodes: AAM (http://www.rcollins.org/secrets/opcodes/AAM.html)
88. Frank van Gilluwe, "The Undocumented PC - Second Edition", p. 93-95
89. Michal Necasek, Intel 486 Errata? (http://www.os2museum.com/wp/intel-486-errata/)
90. Robert Hummel, "PC Magazine Programmer's Technical Reference" (ISBN 1-56276-016-5) p.728
91. Raúl Gutiérrez Sanz, Undocumented 8086 Opcodes, Part I (http://www.os2museum.com/wp/undocumented-8086-opcodes-part-i/)
92. "Asm, opcode 82h" (http://computer-programming-forum.com/46-asm/143edbd28ae1a091.htm).
93. Intel Corporation 2022, p. 3698.
94. Intel, The 8086 Family User's Manual, October 1979 (http://bitsavers.org/components/intel/8086/9800722-03_The_8086_Family_Users_Manual_Oct79.pdf), opcodes omitted on pages 4-25 and 4-31
95. Retrocomputing StackExchange, Undocumented instructions in x86 CPU prior to 80386? (https://retrocomputing.stackexchange.com/questions/20031/undocumented-instructions-in-x86-cpu-prior-to-80386)
96. Daniel B. Sedory, An Examination of the Standard MBR (https://thestarman.pcministry.com/asm/mbr/STDMBR.htm#REP)
97. AMD, Software Optimization Guide for AMD64 Processors (https://www.amd.com/system/files/TechDocs/25112.PDF) (publication 25112, revision 3.06, Sep 2005), section 6.2, p.128
98. Bug 48227 - "rep ret" generated for -march=core2 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48227)
99. Raymond Chen, My, what strange NOPs you have! (https://devblogs.microsoft.com/oldnewthing/20110112-00/?p=11773)
100. Jeff Parsons, Intel 80386 CPU information (https://www.pcjs.org/documents/manuals/intel/80386/#b1-errata) (B1 errata section, item #7)
101. Retrocomputing StackExchange, 0F1h opcode-prefix on i80286 (https://retrocomputing.stackexchange.com/questions/12004/0f1h-opcode-prefix-on-i80286)
102. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86docs/Intel/SDMs/253667-018.pdf) (Jan 2006, order no 253667-018, does not have long NOP)
103. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86docs/Intel/SDMs/253667-019.pdf) (March 2006, order no 253667-019, has long NOP)
104. Agner Fog, Instruction Tables (https://www.agner.org/optimize/instruction_tables.pdf), AMD K7 section.
105. "579838 – glibc not compatible with AMD Geode LX" (https://bugzilla.redhat.com/show_bug.cgi?id=579838#c46).
106. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86docs/Intel/SDMs/253667-015.pdf) (April 2005, order no 253667-015, does not list 0F0D-nop)
107. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86docs/Intel/SDMs/253667-016.pdf) (June 2005, order no 253667-016, lists 0F0D-nop in opcode table but not under NOP instruction description.)
108. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86docs/Intel/SDMs/253667-060.pdf) (order no. 253667-060, September 2016) does not list UD0 and UD1.
109. Intel Software Developers Manual, volume 2B (https://kib.kiev.ua/x86docs/Intel/SDMs/253667-061.pdf) (order no. 253667-061, December 2016) lists UD0 and UD1.
110. "PCJS : pcjs/x86op0F.js (two-byte x86 opcode handlers), lines 1647-1651" (https://github.com/jeffpar/pcjs/blob/e565ffa65d8ee5d600ec04e62c6651dabb4894cb/machines/pcx86/lib/x86op0f.js#L1647). GitHub. 17 April 2022.
111. "80486 paging protection faults? \ VOGONS" (https://www.vogons.org/viewtopic.php?t=62949). Archived (https://web.archive.org/web/20220409071156/https://www.vogons.org/viewtopic.php?t=62949) from the original on 9 Apr 2022.
112. "Invalid opcode handling \ VOGONS" (https://www.vogons.org/viewtopic.php?t=13379). Archived (https://web.archive.org/web/20220409071159/https://www.vogons.org/viewtopic.php?t=13379) from the original on 9 Apr 2022.
113. "Invalid instructions cause exit even if Int 6 is hooked \ VOGONS" (https://www.vogons.org/viewtopic.php?t=21418). Archived (https://web.archive.org/web/20220409071155/https://www.vogons.org/viewtopic.php?t=21418) from the original on 9 Apr 2022.
114. "Tutorial - Calling Win32 from DOS" (https://www.ragestorm.net/tutorial?id=27). Ragestorm. 17 Sep 2005. Archived (https://web.archive.org/web/20220409071159/https://www.ragestorm.net/tutorial?id=27) from the original on 4 Apr 2022.
115. "Accessing Windows device drivers from DOS programs" (https://sta.c64.org/blog/dosvddaccess.html).
116. "8086 microcode disassembled" (https://www.reenigne.org/blog/8086-microcode-disassembled/). Reenigne blog. 2020-09-03. Retrieved 2022-07-26. "Using the REP or REPNE prefix with a MUL or IMUL instruction negates the product. Using the REP or REPNE prefix with an IDIV instruction negates the quotient."
117. "Re: Undocumented opcodes (HINT_NOP)" (https://web.archive.org/web/20041106070621/http://www.sandpile.org/post/msgs/20004129.htm). Archived from the original (http://www.sandpile.org/post/msgs/20004129.htm) on 2004-11-06. Retrieved 2010-11-07.
118. "Re: Also some undocumented 0Fh opcodes" (https://web.archive.org/web/20030626044017/http://www.sandpile.org/post/msgs/20003986.htm). Archived from the original (http://www.sandpile.org/post/msgs/20003986.htm) on 2003-06-26. Retrieved 2010-11-07.
119. Intel's RCCE library (https://github.com/Intel-SCC/RCCE/blob/master/src/RCCE_admin.c#L87) for the SCC uses opcode 0F 0A for SCC's message invalidation instruction.
120. Intel Labs, SCC External Architecture Specification (EAS), Revision 0.94 (https://www.intel.com/content/dam/www/public/us/en/documents/technology-briefs/intel-labs-single-chip-cloud-architecture-brief.pdf), p.29
121. "Undocumented x86 instructions to control the CPU at the microarchitecture level in modern Intel processors" (https://raw.githubusercontent.com/chip-red-pill/udbgInstr/main/paper/undocumented_x86_insts_for_uarch_control.pdf) (PDF). 9 July 2021.
122. Robert R. Collins, Undocumented OpCodes: UMOV (http://www.rcollins.org/secrets/opcodes/UMOV.html)
123. Herbert Oppmann, NXOP (Opcode 0Fh 55h) (https://www.memotech.franken.de/NexGen/Opcode0F55.html)
124. Herbert Oppmann, NexGen Nx586 Hypercode Source (https://www.memotech.franken.de/NexGen/Source/index.html), see COMMON.INC
125. Herbert Oppmann, Inside the NexGen Nx586 System BIOS (https://www.memotech.franken.de/NexGen/Bios.html)
126. Grzegorz Mazur, AMD 3DNow! undocumented instructions (https://web.archive.org/web/20000121143428/http://x86.ddj.com/articles/3dnow/amd_3dnow.htm)
127. "Undocumented 3DNow! Instructions" (https://web.archive.org/web/20030130030723/http://grafi.ii.pw.edu.pl/gbm/x86/3dundoc.html). grafi.ii.pw.edu.pl. Archived from the original (http://grafi.ii.pw.edu.pl/gbm/x86/3dundoc.html) on 30 January 2003. Retrieved 22 February 2022.
128. Potemkin's Hacker Group's OPCODE.LST, v4.51 (http://phg.chat.ru/opcode.txt)
129. "[UCA CPU Analysis] Prototype UMC Green CPU U5S-SUPER33" (https://x86.fr/uca-cpu-analysis-prototype-umc-green-cpu-u5s-super33). 25 May 2020.
130. Agner Fog, The Microarchitecture of Intel, AMD and VIA CPUs (https://www.agner.org/optimize/microarchitecture.pdf), section 3.4 "Branch Prediction in P4 and P4E".
131. Reddit /r/Amd discussion thread: Ryzen has undocumented support for FMA4 (https://www.reddit.com/r/Amd/comments/68s4bj/ryzen_has_undocumented_support_for_fma4/dh0y353/)
132. "Welcome to the OpenSSL Project" (https://github.com/openssl/openssl/blob/1aa89a7a3afb053d0c0b7fad8d3ea1b0a5447289/engines/asm/e_padlock-x86.pl#L597). GitHub. 21 April 2022.
133. PATCH: Update PadLock engine for VIA C7 and Nano CPUs (https://marc.info/?l=openssl-dev&m=130767391615291&w=2)
134. Christopher Domas, Breaking the x86 ISA (https://github.com/xoreaxeaxeax/sandsifter/blob/master/references/domas_breaking_the_x86_isa_wp.pdf)

135. Xixing Li et al, UISFuzz: An Efficient Fuzzing Method for CPU Undocumented Instruction Searching (https://ieeexplore.ieee.org/abstract/document/8863327)
136. OpenEuler mailing list, PATCH kernel-4.19 v2 5/6 : x86/cpufeatures: Add Zhaoxin feature bits (https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/thread/W6GXBRRO6OKNHVJ3WDDUXSLQGI2GFU4X/). Archived (https://web.archive.org/web/20220409071314/https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/thread/W6GXBRRO6OKNHVJ3WDDUXSLQGI2GFU4X/) on 9 Apr 2022.
137. CPUID dump for Zhaoxin KaiXian-U6870 (http://users.atw.hu/instlatx64/CentaurHauls/CentaurHauls00307B1_U6780A_CPUID2.txt), see C0000001 line. Archived (https://web.archive.org/web/20220409071314/http://users.atw.hu/instlatx64/CentaurHauls/CentaurHauls00307B1_U6780A_CPUID2.txt) on 9 Apr 2022.
138. Microprocessor Report, MediaGX Targets Low-Cost PCs (http://www.cecs.uci.edu/~papers/mpr/MPR/ARTICLES/110301.PDF) (vol 11, no. 3, Mar 10, 1997)
139. ISA datafile for Intel XED (https://github.com/intelxed/xed/blob/ef19f00de14a9c2c253c1c9b1119e1617280e3f2/datafiles/xed-isa.txt#L916) (April 17, 2022), lines 916-944
140. Cyrix 6x86 processor data book (https://www.ardent-tool.com/CPU/docs/Cyrix/6x86/94175.pdf), page 6-34
141. AMD Geode LX Processors Data Book (https://www.amd.com/system/files/TechDocs/33234H_LX_databook.pdf), publication 33234H, p.670

Intel Corporation (April 2022). "Intel 64 and IA-32 Architectures Software Developer's Manual, Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D
and 4" (https://cdrdv2.intel.com/v1/dl/getContent/671200). Intel. Retrieved 21 June 2022.

External links
Free IA-32 and x86-64 documentation (https://software.intel.com/en-us/articles/intel-sdm), provided by Intel
x86 Opcode and Instruction Reference (http://ref.x86asm.net)
x86 and amd64 instruction reference (https://www.felixcloutier.com/x86/index.html)
Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs (https://www.agner.org/optimize/instruction_tables.pdf)
Netwide Assembler Instruction List (https://www.nasm.us/doc/nasmdocb.html) (from Netwide Assembler)

Retrieved from "https://en.wikipedia.org/w/index.php?title=X86_instruction_listings&oldid=1143743363"

