


CPU Architecture - Bus Operations - Pipelining - Branch prediction - Floating point unit - Operating Modes -
Paging - Multitasking - Exceptions and Interrupts - Instruction set - Addressing modes - Programming the Pentium processor.


Acorn RISC Machine - Architectural Inheritance - Core & Architectures - Registers - Pipeline - Interrupts -
ARM organization - ARM processor family - Co-processors - ARM instruction set - Thumb Instruction set - Instruction
cycle timings - The ARM Programmer's model - ARM Development tools - ARM Assembly Language Programming - C
programming - Optimizing ARM Assembly Code - Optimized Primitives.

Introduction to DSP on ARM - FIR filter - IIR filter - Discrete Fourier transform - Exception handling -
Interrupts - Interrupt handling schemes - Firmware and bootloader - Embedded Operating systems - Integrated
Development Environment - STDIO Libraries - Peripheral Interface - Application of ARM Processor - Caches - Memory
protection Units - Memory Management units - Future ARM Technologies.
Instruction set - Addressing modes - Operating modes - Interrupt system - RTC - Serial Communication Interface -
A/D Converter - PWM and UART.
CPU Architecture - Instruction set - Interrupts - Timers - I2C Interfacing - UART - A/D Converter - PWM and
introduction to C-Compilers.

1. Andrew N. Sloss, Dominic Symes and Chris Wright, ARM System Devel Guide : Designing and Optimizing System
1. Steve Furber, ARM System On Chip Architecture, Addison Wesley, 2000.
2. Daniel Tabak, Advanced Microprocessors, McGraw Hill Inc., 1995.
3. James L. Antonakos, The Pentium Microprocessor, Pearson Education, 1997.
4. Gene H. Miller, Micro Computer Engineering, Pearson Education, 2003.
5. John B. Peatman, Design with PIC Microcontroller, Prentice Hall, 1997.
6. James L. Antonakos, An Introduction to the Intel family of Microproces
7. Barry B. Brey, The Intel Microprocessors Architecture, Programming and
8. Valvano, "Embedded Microcomputer Systems", Thomson Asia PVT LTD, first reprint 2001.
Readings: Web links

UNIT - I

High Performance CISC

Architecture Pentium
32- bit address bus
64- bit data bus
8 KB instruction cache
Branch target buffer and prefetch buffer
Branch prediction
Branch target
8 KB data cache
Superscalar architecture: parallel execution of instructions.
U and V pipelines
Floating point unit with 80 bit precision

Control flags - IF (interrupt enable flag), DF (direction flag) and TF (trap flag).
Status flags - CF (carry flag), PF (parity flag), AF (auxiliary carry flag), ZF (zero flag), SF (sign flag) and OF (overflow flag).
System flags - NT (nested task) and IOPL (input/output privilege level).


Bus Operations - Decoding a Bus Cycle

Memory/Input Output
Cacheability
Cache Enable
Special Bus Cycles
Requires additional decoding
Use byte enable outputs for selection

Bus Cycle States
Six possible states: Ti, T1, T2, T12, T2P and TD
Single Transfer Cycle

A single transfer cycle moves up to 8 bytes of non-cacheable data between the processor and memory

Burst Cycles

Burst reads and writes transfer 32 bytes.

The processor supplies the starting address of the first group of 8 bytes at the beginning of the cycle.
The next three groups are transferred according to the burst order.

Locked Operations
An atomic operation cannot be broken down into smaller suboperations
The data accessed is typically a semaphore, a special type of counter variable that must be read, updated and stored in one single uninterruptible operation
No other device can access the processor buses while the bus is locked
Used in multiprocessor systems
Back Off
An active (low) BOFF input causes the processor to terminate any bus cycle currently in progress and tri-state its buses
Execution of the interrupted bus cycle is restarted when BOFF goes high.
Bus Hold
Completes the current bus cycle and then tri states its busses.
HLDA output indicates when the Pentium is in the HOLD state.
Interrupt Acknowledge
-Two interrupt acknowledge cycles are run in response to an INTR request, and both are locked
-External circuitry supplies an 8 bit interrupt vector number on D0-D7
-Data in the first cycle is ignored; the vector is accepted during the second cycle.
-BE4 is low during the first interrupt acknowledge cycle and BE0 is low during the second
Cache Flush

When the FLUSH input is low, the Pentium flushes its internal code and data caches, performing writebacks for modified lines.
-When the writebacks are complete, the processor runs a flush acknowledge cycle to inform external circuitry
Shutdown
-A shutdown cycle runs when an internal parity error is detected
-Execution is suspended until the processor receives an NMI, INIT or RESET request
Halt
-A halt cycle runs when the HLT instruction is encountered
-Execution is suspended until the processor receives an NMI, INIT, INTR or RESET request

Pipelined Cycles
-Read and write bus cycles are pipelined in response to a request on the NA input.
-Read cycles may be pipelined into write cycles and vice versa.
-Writeback and locked bus cycles are not pipelined
Inquire Cycles
Inquire cycles are used to maintain cache coherency in a multiprocessor system
-Each Pentium processor watches the system bus (address, data and control) in a multiprocessor system; this is called bus snooping
-If the Pentium detects a memory read/write operation performed by another CPU, it runs an internal inquire cycle to determine
whether the address on the bus is stored in one of its internal caches


Pipelining
Technique used to enable one instruction to complete with each clock cycle
Example: three instructions take nine clock cycles non-pipelined and five clock cycles pipelined
For 1000 instructions: 3000 clock cycles non-pipelined and 1002 clock cycles pipelined
Two types of pipeline: the instruction pipeline and a pipeline that performs special types of bus cycles
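
The cycle counts quoted above are consistent with a simple model, sketched here under the assumption of an ideal three-stage pipeline with no stalls. The stage count is inferred from the 3000 and 1002 figures and is not stated in the notes, and the function names are illustrative only.

```python
def non_pipelined_cycles(instructions: int, stages: int) -> int:
    """Each instruction occupies the whole datapath for `stages` clocks."""
    return instructions * stages


def pipelined_cycles(instructions: int, stages: int) -> int:
    """The first instruction needs `stages` clocks to fill the pipeline;
    every later instruction completes one clock after the previous one."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


if __name__ == "__main__":
    for count in (3, 1000):
        print(count, non_pipelined_cycles(count, 3), pipelined_cycles(count, 3))
```

With a long instruction stream the pipelined count approaches one clock per instruction, which is the point made in the first line of this section.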

The instruction pipeline has 5 stages, each capable of independent operation.

PF - Prefetch
D1 - Instruction Decode
D2 - Address Generate
EX - Execute, Cache and ALU Access
WB - Writeback

The U pipeline executes any processor instruction, but the V pipeline executes only simple instructions
-Instructions executing in parallel cannot have data dependencies between them
-Instructions are fed into PF after being prefetched from cache or memory.
-Decoders in the D1 stage determine if the current pair of instructions can execute together
-Instructions with a prefix byte require an additional clock cycle and can execute in the U pipeline only
-In the D2 stage, addresses for operands that reside in memory are calculated
-In the EX stage, operands are read from the data cache (or memory), ALU operations are performed and branch predictions for instructions
are verified


Branch Prediction
Target address - the address to which the program transfers on a branch instruction
Flushed - instructions in the pipeline are discarded and new instructions are loaded
Bubbles - no work is done while the pipeline is reloaded, which causes bubbles
-Dynamic branch prediction - if the prediction is correct, the pipeline is not flushed and no clock cycles are lost
-Branch target buffer (BTB) - a special cache that stores the instruction address and target address of any branch instruction
-The BTB stores two history bits that indicate the execution history of the last two branch instructions
-History bits are initially set to 00 when a new target address is placed into the BTB and are updated whenever the instruction is executed again
-Failures to take the branch cause the history bits to become 00
-The history bits are used to predict whether the branch is taken or not
-Two 32 byte prefetch buffers work with the BTB
-One buffer prefetches instructions from the current program address
-The other buffer is activated to prefetch instructions from the target address
-The BTB is accessed with the address of the branch instruction during the D1 stage
-An incorrect prediction, or a correct prediction with the wrong target address, causes the pipeline to be flushed.
-This wastes 3 clock cycles in the U pipeline and 4 clock cycles in the V pipeline
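
To make the role of the two history bits concrete, here is a minimal sketch of a generic two-bit saturating-counter predictor. It is a textbook scheme, not a model of the Pentium's exact BTB state machine: the class name, the "predict taken when the counter is 2 or 3" rule and the configurable initial state are assumptions made for illustration. Only the flush penalties (3 clocks in U, 4 in V) come from the notes.

```python
FLUSH_PENALTY = {"U": 3, "V": 4}  # clocks lost on a misprediction (from the notes)


class TwoBitPredictor:
    """Generic two-bit saturating counter per branch address (illustrative)."""

    def __init__(self, initial_state: int = 0):
        self.initial_state = initial_state
        self.history = {}  # branch address -> counter value 0..3

    def predict(self, branch_addr: int) -> bool:
        # Assumption: states 2 and 3 predict "taken", 0 and 1 "not taken".
        return self.history.get(branch_addr, self.initial_state) >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        state = self.history.get(branch_addr, self.initial_state)
        state = min(state + 1, 3) if taken else max(state - 1, 0)
        self.history[branch_addr] = state


def wasted_clocks(outcomes, pipe: str = "U", branch_addr: int = 0x1000) -> int:
    """Total clocks lost to flushes for one branch with the given outcomes."""
    predictor = TwoBitPredictor()
    lost = 0
    for taken in outcomes:
        if predictor.predict(branch_addr) != taken:
            lost += FLUSH_PENALTY[pipe]
        predictor.update(branch_addr, taken)
    return lost
```

For a loop branch that is taken many times in a row, a counter like this mispredicts only while it warms up and once at loop exit, which is why dynamic prediction avoids most pipeline flushes.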


Floating Point Unit

8 stage pipeline (the first 5 stages are shared with the U pipeline, which processes integer instructions)
-PF - Prefetch
-D1 - Instruction Decode
-D2 - Address Generate
-EX - Memory and register read, floating point data converted into memory format, memory write
-X1 - Floating point execute, stage one. Memory data converted into floating point format, write operand to floating point register
file, bypass 1.
-X2 - Floating point execute, stage two
-WF - Round floating point result and write to floating point register file, bypass 2
-ER - Error reporting, update status word
Bypass 1 connects the output of the X1 stage to the input of the EX stage
It allows a floating point register write operation in the X1 stage to bypass the floating point register file and send the result to the
instruction performing a floating point register read in the EX stage
This technique is called forwarding
Bypass 2 takes place between the WF and EX stages
The result of an arithmetic instruction is made available to the next instruction fetching operands in the EX stage
The floating point register file contains the eight 80 bit floating point registers ST(0) to ST(7)
Read and write sections are dual ported to allow 2 reads or 2 writes simultaneously
Two read ports send data into the EX stage
Two write ports receive data from the X1 and WF stages
Operating Modes
Real Mode
-Selected on power-up
-DOS operating system
-1 MB memory limit
-Same base architecture as the 8086 but allows access to the 32 bit register set
-Segment size is 64 KB and segments always start on 16-byte boundaries
-The 20 bit physical address is formed by shifting the 16 bit segment register left by four bits and adding the 16 bit offset
-Uses a 1 KB interrupt vector table beginning at address 00000H
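
The real mode address calculation can be sketched as follows. This is a minimal illustration; the function name is hypothetical, and the sketch does not model what happens when the sum carries past 20 bits.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address = (16 bit segment shifted left 4 bits) + 16 bit offset."""
    if not 0 <= segment <= 0xFFFF or not 0 <= offset <= 0xFFFF:
        raise ValueError("segment and offset must be 16 bit values")
    # Shifting left by four bits is a multiply by 16, which is why
    # segments always start on 16-byte boundaries.
    return (segment << 4) + offset


if __name__ == "__main__":
    print(f"{real_mode_address(0x1234, 0x0010):05X}H")
```

Different segment:offset pairs can name the same physical byte, for example 1234:0010 and 1235:0000.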
Protected Mode
-4 GB of memory
-Windows operating system
-Supports multitasking, virtual memory addressing, memory management and protection
-A segment selector points to a segment descriptor, which contains address and control information. The resulting 32 bit linear address is further
translated into a physical address by paging
-Relies on an interrupt descriptor table to support interrupts and exceptions.
Additional registers in protected mode:
Five control registers control how memory and cache are used and how the FPU is handled, and provide information on the
current execution state
CR0 contains control and status bits.
-PG: Paging. Enables paging when set
-CD: Cache Disable. Disables cache writes when set
-NW: Not Writethrough. Disables cache writethrough operations when set
-WP: Write Protect. Enforces supervisor level write protection when set
-NE: Numeric Error. Allows floating point errors to be reported
-ET: Extension Type. Reserved
-TS: Task Switch. Set when a task switch occurs
-EM: Emulation. When set, indicates that no coprocessor is present and floating point instructions must be emulated
-MP: Monitor Coprocessor. Must be set to run 80286 and 80386 programs on the Pentium
CR2 contains the 32 bit linear address that generated the most recent page fault
CR3 contains the base address of the current page directory
CR4 has six bits
-VME: Virtual 8086 Mode Extensions. When set, enables emulation of a virtual interrupt flag
-PVI: Protected Mode Virtual Interrupts. When set, allows a virtual interrupt flag to be maintained in protected mode
-TSD: Time Stamp Disable. Used to make the RDTSC instruction privileged
-DE: Debugging Extensions. Enables I/O breakpoints when set
-PSE: Page Size Extensions. Allows 4 MB pages when set
-MCE: Machine Check Enable. Enables the machine check exception
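
As an illustration of how software might inspect the CR0 flags listed above, here is a small decoding sketch. The bit positions are not given in these notes; they are taken from Intel's published register layout and should be treated as an assumption to verify against the processor manual. The function name is hypothetical.

```python
# Bit positions within CR0 (assumed from Intel's register layout,
# not stated in the notes). Only flags named in the notes are listed.
CR0_BITS = {
    "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "NW": 29, "CD": 30, "PG": 31,
}


def decode_cr0(value: int) -> set:
    """Return the names of the CR0 flags that are set in `value`."""
    return {name for name, bit in CR0_BITS.items() if (value >> bit) & 1}


if __name__ == "__main__":
    print(sorted(decode_cr0(0x80010000)))
```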
Eight debug registers
Used to support program debugging by indicating:
-The address at which a program breakpoint was generated
-The size of the breakpoint data or instruction
-Whether it is a read or write request
-What kind of bus cycle to generate breakpoints on
Segment Selectors
Contains a 13 bit index field that selects one of 8,192 segment descriptors residing in the global descriptor table (GDT) or a local descriptor table (LDT)
The TI bit selects the appropriate descriptor table during translation
The GDT is located in memory through the GDTR, which is initialized with the LGDT instruction
LGDT loads six bytes of data from a source memory operand into the GDTR
LDTs are referenced through the LDTR, which is initialized with the LLDT instruction
LLDT requires a word size register or memory operand which represents the index of the LDT in the GDT
To obtain a copy of the current GDTR or LDTR, use the SGDT or SLDT instruction
SGDT requires a six byte destination operand in memory
SLDT requires a word size register or memory operand as its destination
Two requestor privilege level (RPL) bits are used in protection checks to determine if access to the segment is allowed
A selector that has an index value of zero and points to the GDT is called a null selector
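
The selector fields described above can be unpacked as in the following sketch. The field order (RPL in bits 0-1, TI in bit 2, index in bits 3-15) is the standard protected mode layout; the function names and the returned dictionary are illustrative choices.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16 bit segment selector into index, TI and RPL fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,        # 13 bits: one of 8,192 descriptors
        "ti": (selector >> 2) & 1,     # 0 = GDT, 1 = LDT
        "rpl": selector & 0x3,         # requestor privilege level
    }


def is_null_selector(selector: int) -> bool:
    """A null selector has index 0 and points to the GDT (TI = 0)."""
    fields = decode_selector(selector)
    return fields["index"] == 0 and fields["ti"] == 0


if __name__ == "__main__":
    print(decode_selector(0x001B))
```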
Segment Descriptor
A selector can point to any one of 8,192 descriptors stored in the GDT or an LDT
The 32 bit base address specifies the beginning of the segment of memory controlled by the descriptor
The 20 bit limit field indicates the size of the segment
When the granularity bit G is clear, the limit is measured in bytes and the segment size is up to 1 MB
When G is set, the limit bits represent the number of 4 KB pages contained in the segment (4 KB to 4 GB)
Two descriptor privilege level (DPL) bits specify the privilege level required to access the segment
P: Present. Indicates whether the segment is present in memory (set)
S: System. When clear, indicates a system segment; when set, the segment is a code or data segment
D/B: For code segments it controls the default operand and address size (16 bits when D/B is clear and 32 bits when set).
For data segments used as a stack it controls how the stack is manipulated (via SP or ESP)
AVL: Available to the programmer
Type: 4 bit field that determines what kind of segment descriptor is being used
Segment Translation- Generating a linear address
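
A minimal sketch of segment translation: the descriptor's base is added to the offset after the offset has been checked against the limit. It assumes an ordinary expand-up segment and ignores type and privilege checks; the function names are illustrative.

```python
def effective_limit(limit: int, granularity: bool) -> int:
    """Highest valid offset in the segment.

    G clear: the 20 bit limit counts bytes (up to 1 MB).
    G set:   the 20 bit limit counts 4 KB pages (up to 4 GB).
    """
    limit &= 0xFFFFF
    return ((limit << 12) | 0xFFF) if granularity else limit


def linear_address(base: int, limit: int, granularity: bool, offset: int) -> int:
    """Return base + offset, or raise if the offset is outside the segment."""
    if offset > effective_limit(limit, granularity):
        raise ValueError("offset exceeds segment limit")
    return (base + offset) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(linear_address(0x00100000, 0xFFFF, False, 0x1234)))
```

A flat 4 GB segment corresponds to base 0, limit FFFFFH and G set, in which case the linear address equals the offset.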

Privilege Levels
RPL and DPL bits are used to perform protection checks each time an address is generated
Four level privilege hierarchy, with level 0 the highest (most privileged) and level 3 the lowest
The privilege level of the currently executing program is called the current privilege level (CPL)
The lower 2 bits of the CS register specify the CPL
CPL is compared with RPL and DPL during address generation to enforce protection
A less privileged program may not access higher privileged segments
The four privilege levels are referred to as rings of protection
Privilege level 0 is for private OS functions, level 1 for OS services available to applications, level 2 for device drivers and level 3
for application programs
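
The comparison of CPL, RPL and DPL can be sketched for the data segment case. The specific rule used here (access is allowed when the numerically larger of CPL and RPL does not exceed DPL) is the usual protected mode rule for data segments and is an assumption beyond what the notes state; code segments and gates follow different rules that this sketch does not cover.

```python
def data_access_allowed(cpl: int, rpl: int, dpl: int) -> bool:
    """Data segment privilege check (sketch).

    Lower numbers mean more privilege, so the effective privilege of the
    request is the weaker (numerically larger) of CPL and RPL.
    """
    for level in (cpl, rpl, dpl):
        if not 0 <= level <= 3:
            raise ValueError("privilege levels are 0 to 3")
    return max(cpl, rpl) <= dpl


if __name__ == "__main__":
    # An application (ring 3) trying to touch an OS data segment (DPL 0).
    print(data_access_allowed(cpl=3, rpl=3, dpl=0))
```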

Paging
Fixed size page frames (4 KB each); paging is enabled when the PG bit in CR0 is set
The upper 20 bits of the virtual address are translated into the physical address where the page frame is located.
The lower 12 bits of the virtual address are not translated and point to one of 4096 byte locations within a page frame.


Page Directories and Page Tables

The upper 10 bits of the virtual address select one of 1024 entries in the page directory.
The base address of the page directory is stored in the page directory base register.
Each entry in the page directory is four bytes wide and contains the base address of a page table.
The next 10 bits of the virtual address select one of 1024 entries in the page table.
This entry is also four bytes wide and contains the base address of the actual physical memory page frame.
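
The two-level lookup described above can be sketched as follows. Memory is modelled by a caller-supplied `read_dword` function, which is a hypothetical stand-in for a 32 bit physical memory read. The sketch handles 4 KB pages only and checks just the present bit (bit 0) of each entry.

```python
PRESENT = 0x1
BASE_MASK = 0xFFFFF000  # upper 20 bits of a directory or table entry


def split_address(addr: int):
    """Return (directory index, table index, byte offset) of a 32 bit address."""
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF


def translate(addr: int, page_dir_base: int, read_dword) -> int:
    """Walk the page directory and page table to form a physical address."""
    dir_index, table_index, offset = split_address(addr)

    pde = read_dword(page_dir_base + 4 * dir_index)   # entries are 4 bytes wide
    if not pde & PRESENT:
        raise LookupError("page fault: page table not present")

    pte = read_dword((pde & BASE_MASK) + 4 * table_index)
    if not pte & PRESENT:
        raise LookupError("page fault: page not present")

    return (pte & BASE_MASK) | offset


if __name__ == "__main__":
    # Tiny fake memory: directory at 1000H, one page table at 2000H.
    memory = {0x1000 + 4 * 1: 0x2000 | PRESENT,
              0x2000 + 4 * 2: 0x5000 | PRESENT}
    print(hex(translate(0x00402345, 0x1000, lambda a: memory.get(a, 0))))
```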
Translation Lookaside Buffers
TLBs automatically translate the upper 20 bits of the virtual address into the upper 20 physical memory address bits.
TLBs are needed because the cache must be accessed with physical, not virtual, addresses
The TLB can translate the virtual address and access the cache in a single clock cycle
TLBs hold translations for a few of the most recently used pages
If the required information is not found in the TLB, the processor accesses the page directory and page table entries stored in memory
The new translation information is then loaded into the TLB; the OS is responsible for invalidating entries that have become stale (for example with INVLPG)
Page Directory Entry and Page Table Entry Formats
The upper 20 bits of each entry specify the base address.
For a PDE, it is the base address of a page table
For a PTE, it is the base address of a physical memory page frame
Three bits are available to the programmer
D: Dirty. This bit is set if a write has been performed to the page pointed to by the PTE. Dirty bits are used to determine if the
page should be written back to hard disk when the page is swapped out
A: Accessed. This bit is set if a read or write was performed to the page selected by the PDE and PTE
PCD: Cache Disable. This bit determines whether the current memory access is cached
PWT: Writethrough. This bit enables writethrough operations between the cache and memory
U: User. This bit is used when performing protection checks on the current memory address.
W: Writeable. This bit determines whether the page may be written to and is also used in protection checks
P: Present. This bit indicates whether the page is actually stored in memory.
The Pentium uses the P bit to generate a page fault when attempting to access a page that is not in memory
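
For reference, the entry fields above can be decoded as in this sketch. The bit positions (P in bit 0 up to D in bit 6, the three programmer-available bits in bits 9-11) are not given in the notes and are taken from Intel's published entry format, so treat them as an assumption. The function name is illustrative.

```python
# Assumed bit positions within a 32 bit PDE/PTE (from Intel's entry format).
ENTRY_FLAGS = {"P": 0, "W": 1, "U": 2, "PWT": 3, "PCD": 4, "A": 5, "D": 6}


def decode_entry(entry: int) -> dict:
    """Split a page directory or page table entry into its fields."""
    fields = {name: bool((entry >> bit) & 1) for name, bit in ENTRY_FLAGS.items()}
    fields["AVL"] = (entry >> 9) & 0x7      # three bits available to the programmer
    fields["base"] = entry & 0xFFFFF000     # upper 20 bits: table or frame base
    return fields


if __name__ == "__main__":
    print(decode_entry(0x00005067))
```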
Pentium provides protection for segmented and paged memory access
It is accomplished by comparing the privilege levels during address translation
Protecting Segmented Accesses
The Pentium performs five different checks prior to any memory access using segment selectors
Type check - used to determine whether the current memory access is allowed
Limit check - uses the 20 limit bits stored in the segment descriptor to guarantee that addresses outside the range of the segment are
not generated.
Addressable domain check - the addressable domain of a task is a function of the task's CPL.
Procedure entry point check - performed through the use of a call gate, which is used to control the transfer of execution
between procedures of different privilege levels.
The DWORD count specifies the number of double words to be transferred from the caller's stack to the stack in the new procedure.
Call gates are used by JMP and CALL instructions and reside in the GDT or LDT
Privileged instruction check - some instructions are privileged and may only be executed when the CPL is 0

Page Level Protection

Protection for paged memory is performed after the protection checks for segmented address generation
Type check (reads and writes)
When R/W is zero, the page is read only. When R/W is one, the page is available for
read and write
Addressable domain check (via privilege levels)
If CPL is 0, 1 or 2, the task is a supervisor level task and can access any page
If CPL is 3, the task is a user level task and can access only user pages
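
The two page-level checks can be combined into one decision, sketched below. The treatment of supervisor writes to read-only pages (allowed unless the WP bit in CR0 is set) is an assumption based on the description of WP earlier in these notes. The function and parameter names are illustrative.

```python
def page_access_allowed(cpl: int, is_write: bool,
                        user_bit: bool, writable_bit: bool,
                        wp: bool = False) -> bool:
    """Page-level protection check (sketch).

    cpl          current privilege level, 0 to 3
    user_bit     U bit of the page (True = user page)
    writable_bit R/W bit of the page (True = read and write)
    wp           WP bit in CR0 (assumed to make supervisors honour R/W)
    """
    supervisor = cpl in (0, 1, 2)
    if supervisor:
        # Supervisor level tasks can access any page; writes to read-only
        # pages are refused only when write protection is enforced.
        return not (is_write and wp and not writable_bit)
    if not user_bit:
        return False                      # user task, supervisor page
    return writable_bit or not is_write   # user task, type check


if __name__ == "__main__":
    print(page_access_allowed(cpl=3, is_write=True, user_bit=True, writable_bit=False))
```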

Multitasking
Execution of multiple programs simultaneously
Each task executes for a period of time called a time slice, and then a task switch is used to switch from one task to the next


Task State Segment

During a task switch, the contents of all processor registers are saved for the task being suspended and new information is loaded
for the next task, using a special memory structure called the Task State Segment (TSS)


TSS Descriptors
A TSS utilizes a descriptor that defines the various characteristics the segment will exhibit.
The individual bit fields are:
Base address: 32 bit segment base address
Segment limit: 20 bit segment size limit
G: Granularity
AVL: Segment available
P: Segment present
DPL: Descriptor privilege level
B: Busy
Task Register
When multiple TSS descriptors exist in the GDT, the TSS currently in use is accessed through the use of the task register.
It is used as an index pointer into GDT to locate a TSS descriptor.
Task register contains two parts: a visible portion accessible by the programmer and an invisible portion that is automatically
loaded from associated TSS descriptor
The task register may be loaded with a new TSS selector with the Load Task Register (LTR) instruction.
The visible portion is read with the Store Task Register (STR) instruction.

Task Gates
A task gate is stored in the LDT of a task or in the IDT.
Task gates allow a single busy bit to be used for a segment (available in TSS descriptor).
Even though many different tasks might have access to a segment through their respective task gates, only one TSS
descriptor is required for the segment.

Task Switching
Four ways:
The current task JMPs or CALLs a TSS descriptor
The current task JMPs or CALLs a task gate
The current task executes an IRET when the NT flag is set
An interrupt or exception selects a task gate
Steps for task switch:
The new TSS descriptor or task gate must have sufficient privilege to allow a task switch
The new TSS descriptor must have its P bit set and have a valid limit field
The state of the current task is saved
The task register is loaded with the selector of the new TSS descriptor
The busy bit in the new TSS descriptor is set as is TS bit in CR0
The state of the new task is loaded from its TSS and execution is resumed


Exceptions and Interrupts

Interrupt Descriptor Table
Real mode uses a 1 KB interrupt vector table beginning at address 00000H
Each 4-byte entry in the IVT consists of a CS:IP pair that specifies the address of the first instruction in the interrupt service routine
An 8 bit vector number is shifted two bits to the left to form an index into the IVT
Protected mode relies on an interrupt descriptor table (IDT) to support interrupts and exceptions.
The IDT is composed of 8 byte descriptors for task, trap or interrupt gates
The IDT has a maximum size of 256 descriptors.
The size of the IDT is controlled by a 16 bit limit value stored in the interrupt descriptor table register (IDTR).
This 48 bit register contains the 32 bit base address of the IDT and the 16 bit size limit.
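
The two table lookups described above differ only in entry size, as this sketch shows. The interpretation of the IDTR limit as the offset of the last valid byte of the table is an assumption not spelled out in the notes. The function names are illustrative.

```python
def ivt_entry_offset(vector: int) -> int:
    """Real mode: 4-byte entries, so the vector is shifted left two bits."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector number is 8 bits")
    return vector << 2


def idt_descriptor_address(idt_base: int, idt_limit: int, vector: int) -> int:
    """Protected mode: 8-byte descriptors starting at the IDTR base."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector number is 8 bits")
    offset = vector * 8
    # Assumption: the limit is the offset of the last valid byte in the table.
    if offset + 7 > idt_limit:
        raise ValueError("vector lies outside the IDT limit")
    return idt_base + offset


if __name__ == "__main__":
    print(hex(ivt_entry_offset(0x21)), hex(idt_descriptor_address(0x1000, 0x7FF, 14)))
```

With 256 entries the IVT occupies 256 x 4 = 1 KB and a full IDT occupies 256 x 8 = 2 KB (limit 7FFH under the assumption above).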

IDT Descriptors
There are 3 descriptor types used within the IDT: task gates, trap gates and interrupt gates.
Interrupt and Exception Descriptions
Classification of Interrupt/Exceptions


Vector 8: Double Fault
Vector 14: Page Fault

Addressing Modes
Immediate Addressing

Register Addressing
Direct Addressing

Register Indirect Addressing
Based Addressing
Indexed Addressing


Based Indexed Addressing

Based Indexed with Displacement Addressing
String Addressing

Port Addressing

Instruction Set
Data Transfer Instruction
MOV Destination, source
MOVSX Destination, source (Move with sign extended)
MOVZX Destination, source (Move with zero extended)
PUSH Source (Push word onto stack)
PUSHW/PUSHD Source (Push word/Double word onto stack)
PUSHA/PUSHAD (Push All register/All double register onto stack)
POP Destination (Pop data off stack)
POPA/POPAD (Pop All register/All double register)
IN Accumulator, Port (Input Byte or word from port)
INS Destination, DX (Input String from port)
OUT Port, Accumulator (Output Byte or word to port)
OUTS DX, Source (Output String to port)
LEA Destination, source (Load effective address)
PUSHF/PUSHFD (Push flags onto stack)
POPF/POPFD (Pop flag off stack)
XCHG Destination, source (Exchange data)
BSWAP Destination (byte swap)
XLAT (Translate Table)
LDS Destination, source (Load pointer using DS)
LES/LFS/LGS/LSS Destination, source (Load pointer using ES/FS/GS/SS)
LAHF (Load AH register from flags)
SAHF (Store AH register into flags)
Arithmetic Instruction
ADD Destination, source (Add byte, word, double word)
ADC Destination, source (Add byte, word, double word with carry)
INC Destination (Increment byte, word, double word by 1)
SUB Destination, source (Subtract byte, word, double word)
SBB Destination, source (Subtract byte, word, double word with borrow)
DEC Destination (Decrement byte, word, double word by 1)
CMP Destination, source (Compare byte, word, double word)
CMPXCHG Destination, source (Compare and exchange)
CMPXCHG8B Destination (Compare and exchange 8 bytes)
MUL source (Multiply byte, word, double word unsigned)
IMUL source (Integer multiply byte, word, double word)
DIV source (Divide byte, word, double word unsigned)
IDIV source (Integer divide byte, word, double word)
NEG Destination (Negate byte, word, double word)
CBW (Convert byte to word)
CWD/CWDE (Convert word to double word)
CDQ (Convert double word to quad word)
DAA (Decimal Adjust for addition)
DAS (Decimal Adjust for subtraction)
AAA (ASCII Adjust for addition)
AAS (ASCII Adjust for subtraction)
AAM (ASCII Adjust for multiply)
AAD (ASCII Adjust for divide)
XADD (Exchange and Add byte, word, double word)

Logical Instructions
NOT Destination (NOT byte, word, double word)
AND Destination, Source (AND byte, word, double word)
OR Destination, Source (OR byte, word, double word)
XOR Destination, Source (Exclusive-OR byte, word, double word)
TEST Destination, Source (TEST byte, word, double word)
BSF/BSR Destination, Source (Bit scan Forward/Reverse)
BT/BTC/BTS/BTR Destination, Source (Bit Test / Bit Test and Complement/Set/Reset)
SETcc Destination (Set Byte on Condition)
Bit Manipulation Instruction
SHL/SAL Destination, Count (Shift Logical/ Arithmetic Left Byte, Word or Double Word)
SHR Destination, Count (Shift Logical Right Byte, Word or Double Word)
SAR Destination, Count (Shift Arithmetic Right Byte, Word or Double Word)
SHLD/SHRD Destination, Source, Count (Shift Left/Right Double Precision)
ROL Destination, Count (Rotate Left Byte, Word or Double Word)
ROR Destination, Count (Rotate Right Byte, Word or Double Word)
RCL Destination, Count (Rotate Left through Carry Byte, Word or Double Word)
RCR Destination, Count (Rotate Right through Carry Byte, Word or Double Word)
Program Transfer Instructions
JMP Target
LOOP Short Label (Loop)
LOOPE/LOOPZ Short Label (Loop if Equal, Loop if Zero)
LOOPNE/LOOPNZ Short Label (Loop if not Equal, Loop if not Zero)
JCXZ/JECXZ Short Label (Jump if CX/ECX=0)
CALL Procedure Name (Call Procedure)
RET Optional Pop Value (Return from Procedure)
INT Interrupt Type (Interrupt)
IRET/IRETD (Interrupt Return)
INTO (Interrupt on Overflow)
BOUND Index, Range (Check Array Index Against Bounds)
ENTER/LEAVE (Enter /Leave a Procedure)
JA (Jump if Above) ( CF or ZF) = 0
JAE (Jump if Above or Equal) CF = 0
JB (Jump if Below) CF = 1
JBE (Jump if Below or Equal) ( CF or ZF) = 1
JNBE (Jump if not Below nor Equal) ( CF or ZF) = 0
JC (Jump if Carry) CF = 1
JE (Jump if Equal) ZF = 1
JG (Jump if Greater) (( SF xor OF) or ZF) = 0
JGE (Jump if Greater or Equal) ( SF xor OF) = 0
JL (Jump if Less) ( SF xor OF) = 1
JLE (Jump if Less or Equal) (( SF xor OF) or ZF) = 1
JNA (Jump if not Above) ( CF or ZF) = 1
JNAE (Jump if not Above nor Equal) CF = 1
JNB (Jump if not below) CF = 0
JNC (Jump if no carry) CF = 0
JNE (Jump if not Equal) ZF = 0
JNG (Jump if not Greater) (( SF xor OF) or ZF) = 1
JNGE (Jump if not Greater nor Equal) ( SF xor OF) = 1
JNL (Jump if not Less) ( SF xor OF) = 0
JNLE (Jump if not Less nor Equal) (( SF xor OF) or ZF)= 0
JNO (Jump if no Overflow) OF = 0
JNP (Jump if no Parity) PF = 0
JNS (Jump if no Sign) SF = 0
JNZ (Jump if not Zero) ZF = 0
JO (Jump if Overflow) OF = 1
JP (Jump if Parity) PF = 1
JPE (Jump if Parity Even) PF = 1
JPO (Jump if Parity Odd) PF = 0
JS (Jump if Sign) SF = 1
JZ (Jump if Zero) ZF = 1
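
To show how the flag conditions in the table above decide a jump, here is a small sketch that computes the flags a CMP instruction would produce and then evaluates a few of the listed conditions. The helper names and the choice of a 32 bit operand size are illustrative assumptions; only a subset of the mnemonics is included.

```python
def cmp_flags(a: int, b: int, bits: int = 32) -> dict:
    """Flags after CMP a, b (computes a - b without storing the result)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "CF": int(a < b),                                  # unsigned borrow
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        "OF": int(bool((a ^ b) & (a ^ result) & sign)),    # signed overflow
    }


# Conditions copied from the table above (subset).
CONDITIONS = {
    "JA":  lambda f: (f["CF"] | f["ZF"]) == 0,
    "JAE": lambda f: f["CF"] == 0,
    "JB":  lambda f: f["CF"] == 1,
    "JBE": lambda f: (f["CF"] | f["ZF"]) == 1,
    "JE":  lambda f: f["ZF"] == 1,
    "JNE": lambda f: f["ZF"] == 0,
    "JG":  lambda f: ((f["SF"] ^ f["OF"]) | f["ZF"]) == 0,
    "JGE": lambda f: (f["SF"] ^ f["OF"]) == 0,
    "JL":  lambda f: (f["SF"] ^ f["OF"]) == 1,
    "JLE": lambda f: ((f["SF"] ^ f["OF"]) | f["ZF"]) == 1,
}


def jump_taken(mnemonic: str, flags: dict) -> bool:
    return CONDITIONS[mnemonic](flags)
```

The same bit pattern can compare differently under the unsigned (above/below) and signed (greater/less) conditions: FFFFFFFFH is above 1 when treated as unsigned but less than 1 when treated as the signed value -1.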
Process Control Instructions
CLC (Clear Carry Flag)
STC (Set Carry Flag)
CMC (Complement Carry Flag)
CLD (Clear Direction Flag)
STD (Set Direction Flag)
CLI (Clear Interrupt Enable Flag)
STI (Set Interrupt Enable Flag)
HLT (Halt until Interrupt or Reset)
NOP (No Operation)
LOCK (Lock Bus During Next Instruction)
ESC (Escape to external Processor)
WAIT (Wait for TEST pin activity)
ARPL (Adjust Requested Privilege Level)
CLTS (Clear Task Switched Flag)
LAR (Load Access Rights)
LGDT (Load Global Descriptor Table)
LIDT (Load Interrupt Descriptor Table)
LLDT (Load Local Descriptor Table)
LMSW (Load Machine Status Word)
LSL (Load Segment Limit)
LTR (Load Task Register)
SGDT (Store Global Descriptor Table)
SIDT (Store Interrupt Descriptor Table)
SLDT (Store Local Descriptor Table)
SMSW (Store Machine Status Word)
STR (Store Task Register)
VERR (Verify Segment for Reading)
VERW (Verify Segment for Writing)
INVD (Invalidate Cache)
INVLPG (Invalidate TLB Entry)
WBINVD (Writeback and Invalidate Cache)
CPUID (CPU Identification)
RDMSR (Read from Model-Specific Register)
RDTSC (Read from Time Stamp Counter)
RSM (Resume from System Management Mode)
WRMSR (Write to Model-Specific Register)
String Instructions
MOVS Destination string, Source string (Move String)
CMPS Destination string, Source string (Compare String)
SCAS Destination string (Scan String)
LODS Source string (Load String)
STOS Destination string (Store String)