
Arithmetic Coprocessor

Coprocessor basics:

• The 80x87 can multiply, divide, add, subtract, find square roots, and calculate transcendental functions and logarithms.

• Data types include


- 16-, 32- and 64-bit signed integers
- 18-digit BCD data and
- 32-,64- and 80-bit (extended precision) floating-point
numbers.

• Operations performed by the 80x87 generally execute much faster than equivalent operations written with the microprocessor's normal instruction set.

Advanced Microprocessor 1

Data Formats for the Arithmetic Coprocessor:

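Signed Integers:-
• 16 bit ( word ) – range -32768 to +32767
• 32 bit ( short integer ) – range approximately -2 x 10^9 to +2 x 10^9
• 64 bit ( long integer ) – range approximately -9 x 10^18 to +9 x 10^18

3 forms of signed integers (S = sign bit; negative values are held in two's complement form):

Word: bit 15 = S, bits 14-0 = magnitude
Short integer: bit 31 = S, bits 30-0 = magnitude
Long integer: bit 63 = S, bits 62-0 = magnitude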


• The directives dw, dd and dq are used for declaring signed integer storage

- dw to define word
- dd to define short integer
- dq to define long integer

• For every microprocessor there is a companion coprocessor:

8086 8087
8088 8087
80186 80187

& so on

Binary Coded Decimal ( BCD ):-

• BCD form requires 80 bits of memory.

• Each number is stored as an 18-digit packed integer in 9 bytes of memory at 2 digits per byte; the 10th byte holds only the sign bit.

Digit layout (most significant byte first):
S | 17 16 | 15 14 | 13 12 | 11 10 | 9 8 | 7 6 | 5 4 | 3 2 | 1 0

• Both positive & negative numbers are stored in true form; only the sign bit differs.

ex :
DATA1 DT 20 ; 20 as bcd
00 00 00 00 00 00 00 00 00 20
DATA2 DT -220 ; -220 as bcd
80 00 00 00 00 00 00 00 02 20
DATA3 DT 50000 ; 50000 as bcd
00 00 00 00 00 00 00 05 00 00
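
The three DT examples can be reproduced with a short Python sketch. It is only an illustration of the packing rule described above; the function names `to_packed_bcd` and `show` are hypothetical and not part of any assembler. Bytes are printed most significant first, as on the slide.

```python
def to_packed_bcd(value):
    """Encode an integer as 10-byte packed BCD, most significant byte first."""
    if abs(value) >= 10**18:
        raise ValueError("packed BCD holds at most 18 decimal digits")
    sign = 0x80 if value < 0 else 0x00          # 10th byte carries only the sign
    digits = f"{abs(value):018d}"               # 18 decimal digits, zero padded
    # Two digits per byte: high nibble = first digit, low nibble = second digit.
    body = bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, 18, 2))
    return bytes([sign]) + body


def show(value):
    """Format the encoding as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in to_packed_bcd(value))


print(show(20))      # 00 00 00 00 00 00 00 00 00 20
print(show(-220))    # 80 00 00 00 00 00 00 00 02 20
print(show(50000))   # 00 00 00 00 00 00 00 05 00 00
```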

Floating point:-

• Floating-point numbers hold signed integers, fractions & mixed numbers.

• A floating-point number has 3 parts

- Sign bit
- Biased exponent
- Significand

• The Intel family of arithmetic coprocessors supports 3 types of floating-point numbers

- Short (32 bit) : single precision, with a bias of 7FH
- Long (64 bit) : double precision, with a bias of 3FFH
- Temporary (80 bit) : extended precision, with a bias of 3FFFH
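
Bit layout of the 3 formats:

Short: bit 31 = s, bits 30-23 = exp, bits 22-0 = fraction
Long: bit 63 = s, bits 62-52 = exp, bits 51-0 = fraction
Temporary: bit 79 = s, bits 78-64 = exp, bit 63 = explicit integer bit (1), bits 62-0 = fraction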

Converting Decimal to Floating-point form:

- Convert the decimal number into binary.


- Normalize the binary number.
- Calculate the biased exponent.
- Store the number in the floating-point format.

Ex : convert decimal to floating-point

• 100.25 (decimal)
1 - convert to binary
100 -> 1100100
.25 -> .01
1100100.01
2 - normalize binary
1100100.01 = 1.10010001 x 2^6
3 - calculate the biased exponent
the bias is 7FH (127) for single precision
add the true exponent to the bias
110 + 01111111 ( 6 + 127 = 133 )
10000101
4 - floating-point number
sign -> 0
expo -> 10000101
significand -> 10010001000000000000000

• -100.25 (decimal)
1 - convert to binary
100 -> 1100100
.25 -> .01
-1100100.01
2 - normalize binary
-1100100.01 = -1.10010001 x 2^6
3 - calculate the biased exponent
the bias is 7FH (127) for single precision
add the true exponent to the bias
110 + 01111111 ( 6 + 127 = 133 )
10000101
4 - floating-point number
sign -> 1
expo -> 10000101
significand -> 10010001000000000000000 (unchanged; the sign is carried only by the sign bit)
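
The hand conversion can be cross-checked with Python's standard `struct` module, which packs a value as an IEEE single-precision number. This is a verification sketch only; the helper name `single_fields` is hypothetical.

```python
import struct


def single_fields(x):
    """Return (sign, exponent, fraction) bit strings of x as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))   # raw 32-bit pattern
    word = f"{bits:032b}"
    return word[0], word[1:9], word[9:]                    # 1 + 8 + 23 bits


print(single_fields(100.25))    # ('0', '10000101', '10010001000000000000000')
print(single_fields(-100.25))   # ('1', '10000101', '10010001000000000000000')
```

Only the sign field differs between the two results, which matches the worked example.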
Special Rules:

- The number 0 is stored as all 0s (except for the sign bit).
- +/- infinity is stored as all 1s in the exponent with a significand of all 0s. The sign bit selects + or - infinity.
- A NAN (not-a-number) is an invalid floating-point result that has all 1s in the exponent with a significand that is NOT all zeros.
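
A minimal sketch of these rules for the 32-bit short format, assuming the raw bit pattern is available as an integer. The function name `classify_single` is hypothetical; every pattern not covered by the special rules is simply reported as "other".

```python
def classify_single(bits):
    """Classify a 32-bit single-precision pattern by the special rules."""
    sign = "-" if (bits >> 31) & 1 else "+"
    exponent = (bits >> 23) & 0xFF       # 8-bit biased exponent
    fraction = bits & 0x7FFFFF           # 23-bit significand field
    if exponent == 0 and fraction == 0:
        return sign + "zero"
    if exponent == 0xFF:
        if fraction == 0:
            return sign + "infinity"
        return "NaN"
    return "other"                       # normalized or denormalized number


print(classify_single(0x80000000))   # -zero
print(classify_single(0x7F800000))   # +infinity
print(classify_single(0x7FC00000))   # NaN
```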

Converting Floating-point to Decimal:

- Separate the sign-bit, biased exponent and significand.


- Convert the biased exponent into a true exponent by
subtracting the bias.
- Write the number as a normalized binary number.
- Convert it to a de-normalized binary number.
- Convert the de-normalized binary number to decimal.

Ex: convert floating-point to decimal:

• 1 - separate the three fields
sign = 0
expo = 10000011
significand = 10010010000000000000000
2 - convert the biased exponent to the true exponent
10000011 – 01111111 = 100 ( 131 – 127 = 4; 7FH = 127 is the single-precision bias )
3 - normalized binary number
1.1001001 x 2^4
4 - convert to de-normalized binary number
11001.001
5 - convert into decimal
+25.125


• 1 - separate the three fields
sign = 1
expo = 10000011
significand = 10010010000000000000000
2 - convert the biased exponent to the true exponent
10000011 – 01111111 = 100 ( 131 – 127 = 4; 7FH = 127 is the single-precision bias )
3 - normalized binary number
-1.1001001 x 2^4
4 - convert to de-normalized binary number
-11001.001
5 - convert into decimal
-25.125
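
The five decoding steps can be sketched in Python as follows. The sketch covers only normalized single-precision numbers (it ignores zero, infinity, NaN and denormals), and the name `decode_single` is hypothetical.

```python
def decode_single(sign, exponent, significand):
    """Decode sign/exponent/significand bit strings of a normalized short real."""
    true_exp = int(exponent, 2) - 0x7F              # step 2: subtract the bias
    mantissa = 1 + int(significand, 2) / 2**23      # step 3: restore the implied 1
    value = mantissa * 2**true_exp                  # steps 4-5: scale and convert
    return -value if sign == "1" else value


print(decode_single("0", "10000011", "10010010000000000000000"))   # 25.125
print(decode_single("1", "10000011", "10010010000000000000000"))   # -25.125
```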


The 8087 Architecture:

• The 8087 is designed to operate concurrently with the microprocessor

• The 8087 executes 68 different instructions

• The microprocessor & coprocessor can execute their respective instructions simultaneously (concurrently)

• The numeric or arithmetic coprocessor is a special-purpose microprocessor, designed specifically to execute arithmetic & transcendental operations

• The microprocessor intercepts & executes the normal instruction set; the coprocessor intercepts & executes only the coprocessor instructions

Internal Structure of the 80x87:

[Figure: block diagram of the 80x87 internal structure; only the labels Status and Address survive from the original drawing]

• Control unit ( CU ) :- interfaces the coprocessor to the microprocessor data bus. If an instruction is an ESC (coprocessor) instruction, the coprocessor executes it; if not, the microprocessor executes it.

• Numeric execution unit ( NEU ) :-

- Responsible for executing all coprocessor instructions

- Has an 8-register stack that holds the operands for arithmetic instructions & their results

- Other registers: status, tag, control & exception pointers

- Each of the 8 stack registers is 80 bits wide and holds an 80-bit extended-precision floating-point number

- Data are converted to or from this format as they move between memory & the coprocessor register stack.

Status register:
• Reflects the overall operation of the coprocessor.

• The status register is accessed by executing the FSTSW instruction, which stores the contents of the status register into a word of memory

• The coprocessor/microprocessor communications are carried out through I/O ports

• B – busy bit : indicates that the coprocessor is busy; it can be checked by testing the status register or by using the FWAIT instruction

• C3 to C0 – condition code bits : indicate conditions about the coprocessor, such as the outcome of a comparison

• TOP – top of stack (ST) : indicates which register is currently addressed as the top of the stack

• ES – error summary : set if any unmasked error bit (PE, UE, OE, ZE, DE, IE) is set. In the 8087 the error summary also causes a coprocessor interrupt

• PE – precision error : the result exceeds the selected precision

• UE – underflow : a non-zero result that is too small to represent with the currently selected precision

• OE – overflow : the result is too large to be represented. If this error is masked, the coprocessor generates infinity
• ZE – zero error : the divisor is zero while the dividend is a non-infinity, non-zero number.

• DE – denormalized error : at least one of the operands is denormalized

• IE – invalid error : indicates stack underflow/overflow, an indeterminate form, or the use of a NAN as an operand; for example, the square root of a negative number

Control register:
- selects precision, rounding control & infinity control

- masks & unmasks the exception bits that correspond to the rightmost 6 bits of the status register

- the FLDCW instruction is used to load a value into the control register

Control register fields:

Infinity control
0 – projective
1 – affine

Rounding control
00 – round nearest or even
01 – round down towards minus infinity
10 – round up towards plus infinity
11 – chop or truncate towards zero

Precision control
00 – single
01 – reserved
10 – double
11 – extended

Exception masks labelled in the figure: invalid operation mask, denormalized operand mask, division by zero mask

• IC – infinity control : affine allows +ve or –ve infinity; projective assumes infinity is unsigned

• RC – rounding control : determines the type of rounding

• PC – precision control : sets the precision of the results

• Exception masks : determine whether the error indicated by an exception affects the error bit in the status register. If a logic 1 is placed in one of the exception control bits, the corresponding bit in the status register is masked off
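
The control-register fields can be decoded with a few shifts and masks. The sketch below assumes the standard x87 bit positions (IC = bit 12, RC = bits 11-10, PC = bits 9-8, exception masks = bits 5-0), which the flattened figure no longer shows; the names `decode_control_word`, `ROUNDING`, `PRECISION` and `MASK_NAMES` are hypothetical.

```python
ROUNDING = {0: "nearest or even", 1: "down (toward -infinity)",
            2: "up (toward +infinity)", 3: "chop (toward zero)"}
PRECISION = {0: "single", 1: "reserved", 2: "double", 3: "extended"}
MASK_NAMES = ["IM", "DM", "ZM", "OM", "UM", "PM"]    # assumed bits 0..5


def decode_control_word(cw):
    """Split a 16-bit control word into its documented fields."""
    return {
        "infinity": "affine" if (cw >> 12) & 1 else "projective",
        "rounding": ROUNDING[(cw >> 10) & 3],
        "precision": PRECISION[(cw >> 8) & 3],
        # A 1 in a mask bit means the corresponding exception is masked off.
        "masked": [name for i, name in enumerate(MASK_NAMES) if (cw >> i) & 1],
    }


print(decode_control_word(0x037F))
```

For the value 037FH the sketch reports projective infinity, round to nearest, extended precision, and all six exceptions masked.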


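; Example 1: test for a divide-by-zero error
fdiv DATA1
fstsw ax ;Copy status reg to AX (not available on the 8087)
test ax, 4 ;Test bit position 2 (ZE)
jnz DIVIDE_ERROR

; Example 2: compare and branch
fcom DATA1 ;Compare ST with DATA1 and set the condition codes
fstsw ax
sahf ;Copy C3, C2 and C0 into ZF, PF and CF
je ST_EQUAL
jb ST_BELOW
ja ST_ABOVE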
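
What the two sequences test can be modelled on a status word held as an integer. The sketch assumes the standard x87 status-word positions (ZE = bit 2, C0 = bit 8, C2 = bit 10, C3 = bit 14), which are not spelled out on the slides; the names `has_zero_divide` and `compare_result` are hypothetical.

```python
def has_zero_divide(status):
    """Mirror of TEST AX,4: true when the ZE bit (bit 2) is set."""
    return bool(status & 4)


def compare_result(status):
    """Interpret C3, C2, C0 after FCOM, as JE/JB/JA would after SAHF."""
    c0 = (status >> 8) & 1      # lands in the carry flag
    c2 = (status >> 10) & 1     # lands in the parity flag
    c3 = (status >> 14) & 1     # lands in the zero flag
    if c3 and c2 and c0:
        return "unordered"      # e.g. one operand is a NAN
    if c3:
        return "ST equal"
    if c0:
        return "ST below"
    return "ST above"


print(compare_result(0x4000))   # ST equal
print(compare_result(0x0100))   # ST below
print(compare_result(0x0000))   # ST above
```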


Tag register :

- Indicates the contents of each location in the coprocessor stack

- A program can view the tag register by storing the coprocessor environment using FSTENV or FSAVE (FRSTOR reloads a saved state)

- 00 – VALID, 01 – ZERO, 10 – INVALID or INFINITY, 11 – EMPTY

TAG 7 TAG 6 TAG 5 TAG 4 TAG 3 TAG 2 TAG 1 TAG 0
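
A stored tag word can be unpacked two bits at a time. The sketch assumes TAG 0 occupies the two least significant bits and TAG 7 the two most significant, as the field order above suggests; `decode_tags` and `TAG_MEANING` are hypothetical names.

```python
TAG_MEANING = {0b00: "valid", 0b01: "zero",
               0b10: "invalid or infinity", 0b11: "empty"}


def decode_tags(tag_word):
    """Return the meaning of TAG 0 .. TAG 7 from a 16-bit tag word."""
    return [TAG_MEANING[(tag_word >> (2 * i)) & 3] for i in range(8)]


print(decode_tags(0xFFFF))   # all eight registers empty
print(decode_tags(0xFFF4))   # TAG 0 valid, TAG 1 zero, the rest empty
```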

Instruction Set:

• executes over 68 different instructions

• the coprocessor uses the data bus for data transfers during coprocessor instructions; the microprocessor uses it during normal instructions

Types of instruction :-
- data transfer instructions
- arithmetic instructions
- comparison instructions
- transcendental operations
- constant operation
- coprocessor control instructions


i) Data transfer instruction:

- floating-point
- signed-integer
- BCD
- FCMOV instructions (Pentium Pro through Pentium 4)

Internally the coprocessor stores all data as 80-bit extended-precision floating-point numbers

Floating-point data transfer

FLD (Load Real) :

- Loads floating-point data to Stack Top (ST).
- The stack pointer is first decremented by 1, so the loaded value becomes the new ST.
- Data can be retrieved from memory, or another stack position.


Ex :
FLD st2 ;copies the contents of register 2 to ST
;(the top of the stack is register 0 when the coprocessor is reset or initialized)

FLD data7 ;copies the contents of memory location data7 to the top of the stack
;(the size of the transfer is determined automatically by the assembler from the directive that defines data7)

FST ( store real) :

- Stores a copy of the top of the stack into memory or another coprocessor register.
- Rounding occurs when the storage operation completes, according to the rounding control in the control register
- A copy instruction: the stack is not popped
FSTP ( floating point store and pop)
- Stores a copy of the top of the stack into memory or
another coprocessor register
- pop the data from the top of stack
- a removal instruction

FXCH ( exchange )
- exchanges the content of register with top of stack

ex : FXCH st2 ; exchanges top of the stack with register 2

Integer data transfer instruction

- FILD ( load integer)
- FIST ( store integer)
- FISTP ( store integer and pop)
While transferring the data, the coprocessor automatically converts between integer data and its internal extended floating-point format.
BCD data transfer instruction

- FBLD – loads the top of stack with BCD memory data
- FBSTP – stores the top of the stack as BCD and does a pop

Pentium Pro through Pentium 4 instruction

FCMOV
- contains a condition
- if the condition is true, copies the source to the destination
- conditions are checked for either an ordered or unordered result
- NANs and denormalized numbers are not tested for
FCMOVB - move if below, FCMOVE - move if equal
FCMOVBE - move if below or equal, FCMOVU - move if unordered
FCMOVNB - move if not below, FCMOVNE - move if not equal
FCMOVNBE - move if not below or equal, FCMOVNU - move if not unordered
ii) Arithmetic instruction:

- addition, subtraction, multiplication, division, calculating square roots
- arithmetic related – scaling, rounding, absolute value, changing sign

Addressing modes:

Mode            Form        Example
Stack           ST(1),ST    FADD
Register        ST,ST(n)    FADD ST,ST(2)
                ST(n),ST    FADD ST(2),ST
Register pop    ST(n),ST    FADDP ST(3),ST
Memory          operand     FADD data2

- Stack addressing mode is restricted to use ST (stack top) and ST1.
- The source operand is ST while the destination operand is ST1.
- After the operation, the source is popped, leaving the destination at ST.

Stack addressing mode,

• uses the top of the stack as the source operand & the register next to the top as the destination.

• afterwards the top is popped, so the result ends up at the top of the stack

ex :
FADD – adds ST and ST1; the result is stored in ST1 and, after the pop, is found at ST
FSUB – subtracts ST from ST1; after the pop the result is at ST
FSUBR – reverse subtraction: subtracts ST1 from ST; after the pop the result is at ST
FDIVR – reverse division: divides ST by ST1 (useful for computing a reciprocal); after the pop the result is at ST
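
The pop behaviour of the stack forms is easy to lose track of, so here is a toy Python model. It is only a sketch of the operand order described above, using ordinary Python floats rather than 80-bit registers; the class `FpuStack` and its method names are hypothetical.

```python
class FpuStack:
    """Toy model: regs[0] is ST, regs[1] is ST1, and so on."""

    def __init__(self):
        self.regs = []

    def fld(self, value):
        if len(self.regs) == 8:
            raise OverflowError("coprocessor stack holds only 8 registers")
        self.regs.insert(0, float(value))      # pushed value becomes the new ST

    def _stack_op(self, op):
        st = self.regs.pop(0)                  # source ST is popped ...
        self.regs[0] = op(self.regs[0], st)    # ... old ST1 (dest) becomes ST

    def fadd(self):
        self._stack_op(lambda st1, st: st1 + st)

    def fsub(self):
        self._stack_op(lambda st1, st: st1 - st)    # subtract ST from ST1

    def fsubr(self):
        self._stack_op(lambda st1, st: st - st1)    # subtract ST1 from ST

    def fdivr(self):
        self._stack_op(lambda st1, st: st / st1)    # divide ST by ST1


fpu = FpuStack()
fpu.fld(4.0)       # the value whose reciprocal is wanted
fpu.fld(1.0)       # the role FLD1 plays on the coprocessor
fpu.fdivr()        # ST / ST1 = 1 / 4
print(fpu.regs)    # [0.25]
```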


Register addressing mode,

• MUST use ST as one of the operands.

• The other operand can be any register, including ST0, which is ST itself. The destination can be either ST or STn.

• unlike stack addressing, non-popping versions can be used.

Memory addressing mode,

• always uses ST as the destination, because the coprocessor is a stack-oriented machine

Arithmetic operation,

The following letters are used to additionally qualify the


operation:

• P: Perform a register pop after the operation, e.g., FADDP as opposed to FADD.

• R: Reverse mode for subtraction and division.

• I: Indicates that the memory operand is an integer. I appears as the second letter in the instruction, e.g., FIADD, FISUB, FIMUL, FIDIV.


Arithmetic related operations,

• FSQRT: Finds the square root of the operand at ST and leaves the result there. Check the IE bit for an invalid result (e.g., the operand was negative) using FSTSW AX and TEST AX, 1.

• FSCALE: Adds the contents of ST1 (interpreted as an integer) to the exponent of ST. The value in ST1 must be between -2^15 and +2^15

• FPREM1: Performs modulo division of ST by ST1. The resultant remainder is found at ST.

• FRNDINT: Rounds ST to an integer.

• FXTRACT: Decomposes ST into an unbiased exponent and a significand. The extracted significand is at ST and the unbiased exponent at ST1.

• FABS: Change sign of ST to positive.

• FCHS: Invert sign of ST.

iii) Comparison instruction:

- These instructions compare the data at the top of the stack with another operand and return the result of the comparison in the status register condition code bits C3 to C0.

• FCOM: Compares ST with a memory or register operand. FCOM by itself compares ST and ST1.

• FCOMP/FCOMPP: Compare and pop once or twice.

• FICOM/FICOMP: Compare ST with an integer memory operand and optionally pop the stack.

• FTST: Compare ST with 0.0.

• FXAM: Examines ST and modifies the condition code bits to indicate whether the contents are positive, negative, normalized, etc.

• FCOMI/FUCOMI: Pentium Pro and later; same as FCOM, with one additional feature: the result is moved directly into the microprocessor flag register, which replaces the FNSTSW AX and SAHF sequence.

iv) Transcendental operations

• FPTAN – finds the partial tangent, Y/X = tan θ. The value of θ on top of the stack must be between 0 and π/4 for the 8087 & 80287, and must be less than 2^63 for the 80387 through Pentium 4

• FPATAN – finds the partial arctangent θ


•F2XM1: Compute 2^x - 1

Function    Equation
10^y        2^(y × log2 10)
e^y         2^(y × log2 e)
x^y         2^(y × log2 x)
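
The three identities in the table can be checked numerically. In this Python sketch, `pow2` is a hypothetical stand-in for the coprocessor sequence that raises 2 to a power (built around F2XM1), and the other function names are likewise illustrative only.

```python
import math


def pow2(t):
    """Stand-in for the coprocessor's 2^t computation."""
    return 2.0 ** t


def ten_to(y):
    return pow2(y * math.log2(10))        # 10^y = 2^(y * log2 10)


def e_to(y):
    return pow2(y * math.log2(math.e))    # e^y = 2^(y * log2 e)


def x_to(x, y):
    return pow2(y * math.log2(x))         # x^y = 2^(y * log2 x)


print(math.isclose(ten_to(3), 1000.0))    # True
print(math.isclose(e_to(1), math.e))      # True
print(math.isclose(x_to(2.5, 2), 6.25))   # True
```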

•FSIN/FCOS : sine or cosine of ST; the result is found in ST

•FSINCOS : sine & cosine of ST; the sine replaces ST and the cosine is then pushed, so afterwards ST – cosine & ST1 – sine

•FYL2X: Compute Y log2 X, with X in ST & Y in ST1; the result is left on top of the stack. X must range between 0 and +infinity & Y between -infinity and +infinity

•FYL2XP1: Compute Y log2(X + 1)


V - Constant operation

• the coprocessor instruction set includes opcodes that push constants onto the top of the stack.
- FLDZ: Load +0.0 to ST.
- FLD1: Load +1.0 to ST.
- FLDPI: Load pi to ST.
- FLDL2T: Load log2(10) to ST.
- FLDL2E: Load log2(e) to ST.
- FLDLG2: Load log10(2) to ST.
- FLDLN2: Load loge(2) to ST.



VI . Coprocessor Control instruction

- Control instructions for initialization, exception handling & task switching

• FINIT/FNINIT : performs a reset operation and sets register 0 as the top of the stack; fields such as rounding control and the busy bit are also initialized

• FSETPM : changes the addressing mode of the coprocessor to the protected addressing mode

• FLDCW : loads the control register with the word addressed by the operand

• FSTCW/FNSTCW : stores the control register into the word-sized memory operand
• FSTSW AX / FNSTSW AX : copies the contents of the status register to AX (not available on the 8087)

• FCLEX/FNCLEX : clears the error flags in the status register and also the busy flag

• FSAVE/FNSAVE : writes the entire state of the machine to memory

• FRSTOR : restores the state of the machine from memory

• FSTENV/FNSTENV : stores the environment of the coprocessor (real mode or protected mode)

• FLDENV : reloads the environment

• FINCSTP : increments the stack pointer

• FDECSTP : decrements the stack pointer

• FFREE : frees a register by marking its tag as empty

• FNOP : floating point coprocessor NOP

• FWAIT : causes the microprocessor to wait for the coprocessor to finish an operation. It should be used before the microprocessor accesses memory data that are affected by the coprocessor


Coprocessor instruction:

- The following lists cover the instructions for all coprocessors from the 8087 through the Pentium 4, with the number of clock periods required to execute each instruction.

General:

reg = floating point register, st(0), st(1) ... st(7)
mem = memory address
mem32 = memory address of 32-bit item
mem64 = memory address of 64-bit item
mem80 = memory address of 80-bit item
EA = additional clocks for the effective address calculation


FX = pairs with FXCH


NP = no pairing

Instruction clock cycles

• F2XM1 Compute 2^x - 1

8087 287 387 486 Pentium


310-630 310 -630 211-476 140-279 13-57 NP

• FABS Absolute value

8087 287 387 486 Pentium


10-17 10-17 22 3 1 FX


• FADD Floating point add
• FADDP Floating point add and pop
variations/
operand 8087 287 387 486 Pentium
fadd 70-100 70-100 23-34 8-20 3/1 FX
fadd mem32 (90-120)+EA 90-120 24-32 8-20 3/1 FX
fadd mem64 (95-125)+EA 95-125 29-37 8-20 3/1 FX
faddp 75-105 75-105 23-31 8-20 3/1 FX

• FBLD Load BCD
operand 8087 287 387 486 Pentium
mem (290-310)+EA 290-310 266-275 70-103 48-58 NP


• FBSTP Store BCD and pop


8087 287 387 486 Pentium
(520-540)+EA 520-540 512-534 172-176 148-154 NP

• FCHS Change sign


8087 287 387 486 Pentium
10-17 10-17 24-25 6 1 FX

• FNCLEX Clear exceptions, no wait


variations 8087 287 387 486 Pentium
fclex 2-8 2-8 11 7 9 NP
fnclex 2-8 2-8 11 7 9 NP
The wait version may take additional cycles


• FCOM Floating point compare
• FCOMP Floating point compare and pop
• FCOMPP Floating point compare and pop twice
variations/
operand 8087 287 387 486 Pentium
fcom reg 40-50 40-50 24 4 4/1 FX
fcom mem32 (60-70)+EA 60-70 26 4 4/1 FX
fcom mem64 (65-75)+EA 65-75 31 4 4/1 FX
fcomp 42-52 42-52 26 4 4/1 FX
fcompp 45-55 45-55 26 5 4/1 FX

FCOS Floating point cosine (387+)


8087 287 387 486 Pentium
- - 123-772 257-354 18-124 NP
Additional cycles required if operand > pi/4 (~3.141/4 =~.785)

•FDISI Disable interrupts (8087 only, others do fnop)


•FNDISI Disable interrupts, no wait (8087 only, others do fnop)
variations 8087 287 387 486 Pentium
fdisi 2-8 2 2 3 1 NP
fndisi 2-8 2 2 3 1 NP
The wait version may take additional cycles

•FDIV Floating divide
•FDIVP Floating divide and pop
variations/
operand 8087 287 387 486 Pentium
fdiv reg 193-203 193-203 88-91 73 39 FX
fdiv mem32 (215-225)+EA 215-225 89 73 39 FX
fdiv mem64 (220-230)+EA 220-230 94 73 39 FX
fdivp 197-207 197-207 91 73 39 FX

•FDIVR Floating divide reversed
•FDIVRP Floating divide reversed and pop
variations/
operand 8087 287 387 486 Pentium
fdivr reg 194-204 194-204 88-91 73 39 FX
fdivr mem32 (216-226)+EA 216-226 89 73 39 FX
fdivr mem64 (221-231)+EA 221-231 94 73 39 FX
fdivrp 198-208 198-208 91 73 39 FX

•FENI Enable interrupts (8087 only, others do fnop)


•FNENI Enable interrupts, nowait (8087 only, others do fnop)
Variations 8087 287 387 486 Pentium
feni 2-8 2 2 3 1 NP
fneni 2-8 2 2 3 1 NP


• FFREE Free register


8087 287 387 486 Pentium
9-16 9-16 18 3 1 NP

• FIADD Integer add
operand 8087 287 387 486 Pentium
mem16 (102-137)+EA 102-137 71-85 20-35 7/4 NP
mem32 (108-143)+EA 108-143 57-72 19-32 7/4 NP

•FINIT Initialize floating point processor


•FNINIT Initialize floating point processor, no wait
variations 8087 287 387 486 Pentium
finit 2-8 2-8 33 17 16 NP
fninit 2-8 2-8 33 17 12 NP
The wait version may take additional cycles

•FICOM Integer compare
•FICOMP Integer compare and pop
variations/
operand 8087 287 387 486 Pentium
ficom mem16 (72-86)+EA 72-86 71-75 16-20 8/4 NP
ficom mem32 (78-91)+EA 78-91 56-63 15-17 8/4 NP
ficomp mem16 (74-88)+EA 74-88 71-75 16-20 8/4 NP
ficomp mem32 (80-93)+EA 80-93 56-63 15-17 8/4 NP

• FIMUL Integer multiply
operand 8087 287 387 486 Pentium
mem16 (124-138)+EA 124-138 76-87 23-27 7/4 NP
mem32 (130-144)+EA 130-144 61-82 22-24 7/4 NP

•FIDIV Integer divide
•FIDIVR Integer divide reversed
variations/
operand 8087 287 387 486 Pentium
fidiv mem16 (224-238)+EA 224-238 136-140 85-89 42 NP
fidiv mem32 (230-243)+EA 230-243 120-127 84-86 42 NP
fidivr mem16 (225-239)+EA 225-239 135-141 85-89 42 NP
fidivr mem32 (231-245)+EA 231-245 121-128 84-86 42 NP

• FILD Load integer
operand 8087 287 387 486 Pentium
mem16 (46-54)+EA 46-54 61-65 13-16 3/1 NP
mem32 (52-60)+EA 52-60 45-52 9-12 3/1 NP
mem64 (60-68)+EA 60-68 56-67 10-18 3/1 NP

•FIST Store integer
•FISTP Store integer and pop
variations/
operand 8087 287 387 486 Pentium
fist mem16 (80-90)+EA 80-90 82-95 29-34 6 NP
fist mem32 (82-92)+EA 82-92 79-93 28-34 6 NP
fistp mem16 (82-92)+EA 82-92 82-95 29-34 6 NP
fistp mem32 (84-94)+EA 84-94 79-93 28-34 6 NP
fistp mem64 (94-105)+EA 94-105 80-97 28-34 6 NP

•FISUB Integer subtract


•FISUBR Integer subtract reversed
variations/
Operand 8087 287 387 486 Pentium
fisub mem16 (102-137)+EA 102-137 71-85 20-35 7/4 NP
fisubr mem32 (108-143)+EA 108-143 57-82 19-32 7/4 NP

• FINCSTP Increment floating point stack pointer
8087 287 387 486 Pentium
6-12 6-12 21 3 1 NP

• FLD Floating point load


operand 8087 287 387 486 Pentium
reg 17-22 17-22 14 4 1 FX
mem32 (38-56)+EA 38-56 20 3 1 FX
mem64 (40-60)+EA 40-60 25 3 1 FX
mem80 (53-65)+EA 53-65 44 6 3 NP
Load floating point constants

• FLDCW Load control word


operand 8087 287 387 486 Pentium
mem16 (7-14)+EA 7-14 19 4 7 NP


•FLDZ Load constant onto stack, 0.0


•FLD1 Load constant onto stack, 1.0
•FLDL2E Load constant onto stack, logarithm base 2 (e)
•FLDL2T Load constant onto stack, logarithm base 2 (10)
•FLDLG2 Load constant onto stack, logarithm base 10 (2)
•FLDLN2 Load constant onto stack, natural logarithm (2)
•FLDPI Load constant onto stack, pi (3.14159...)

variations 8087 287 387 486 Pentium


fldz 11-17 11-17 20 4 2 NP
fld1 15-21 15-21 24 4 2 NP
fldl2e 15-21 15-21 40 8 5/3 NP
fldl2t 16-22 16-22 40 8 5/3 NP
fldlg2 18-24 18-24 41 8 5/3 NP
fldln2 17-23 17-23 41 8 5/3 NP
fldpi 16-22 16-22 40 8 5/3 NP


•FLDENV Load environment state


operand 8087 287 387 486 Pentium
mem (35-45)+EA 35-45 71 44/34 37/32-33 NP
cycles for real mode/protected mode

•FMUL Floating point multiply


•FMULP Floating point multiply and pop
variations/
operand 8087 287 387 486 Pentium
fmul reg s 90-105 90-105 29-52 16 3/1 FX
fmul reg 130-145 130-145 46-57 16 3/1 FX
fmul mem32 (110-125)+EA 110-125 27-35 11 3/1 FX
fmul mem64 (154-168)+EA 154-168 32-57 14 3/1 FX
fmulp reg s 94-108 94-108 29-52 16 3/1 FX
fmulp reg 134-148 134-148 29-57 16 3/1 FX
s = register with 40 trailing zeros in fraction
•FNOP no operation
8087 287 387 486 Pentium
10-16 10-16 12 3 1 NP

•FPATAN Partial arctangent


8087 287 387 486 Pentium
250-800 250-800 314-487 218-303 17-173

•FPREM Partial remainder


•FPREM1 Partial remainder (IEEE compatible, 387+)
Variations 8087 287 387 486 Pentium
fprem 15-190 15-190 74-155 70-138 16-64 NP
fprem1 - - 95-185 72-167 20-70 NP

•FPTAN Partial tangent


8087 287 387 486 Pentium
30-540 30-540 191-497 200-273 17-173 NP
Additional cycles required if operand > pi/4 (~3.141/4 =~.785)

•FRNDINT Round to integer


8087 287 387 486 Pentium
16-50 16-50 66-80 21-30 9-20 NP

•FRSTOR Restore saved state


variations/
Operand 8087 287 387 486 Pentium
frstor mem (197-207)+EA 197-207 308 131/120 75-95/70 NP
frstorw mem - - 308 131/120 75-95/70 NP
frstord mem - - 308 131/120 75-95/70 NP

cycles for real mode/protected mode

•FSAVE Save FPU state
•FSAVEW Save FPU state, 16-bit format (387+)
•FSAVED Save FPU state, 32-bit format (387+)
•FNSAVE Save FPU state, no wait
•FNSAVEW Save FPU state, no wait, 16-bit format (387+)
•FNSAVED Save FPU state, no wait, 32-bit format (387+)

variations 8087 287 387 486 Pentium


fsave (197-207)+EA 197-207 375-376 154/143 127-151/124 NP
fsavew 375-376 154/143 127-151/124 NP
fsaved 375-376 154/143 127-151/124 NP
fnsave (197-207)+EA 197-207 375-376 154/143 127-151/124 NP
fnsavew 375-376 154/143 127-151/124 NP
fnsaved 375-376 154/143 127-151/124 NP
Cycles for real mode/protected mode
The wait version may take additional cycles


•FSCALE Scale by factor of 2


8087 287 387 486 Pentium
32-38 32-38 67-86 30-32 20-31 NP

FSETPM Set protected mode (287 only, 387+ = fnop)


8087 287 387 486 Pentium
- 2-8 12 3 1 NP

•FSIN Sine (387+)


•FSINCOS Sine and cosine (387+)
variations 8087 287 387 486 Pentium
fsin - - 122-771 257-354 16-126 NP
fsincos - - 194-809 292-365 17-137 NP
Additional cycles required if operand > pi/4 (~3.141/4 = ~.785)
•FSQRT Square root
8087 287 387 486 Pentium
180-186 180-186 122-129 83-87 70 NP

•FST Floating point store


•FSTP Floating point store and pop

variations/
Operand 8087 287 387 486 Pentium
fst reg 15-22 15-22 11 3 1 NP
fst mem32 (84-90)+EA 84-90 44 7 2 NP
fst mem64 (96-104)+EA 96-104 45 8 2 NP
fstp reg 17-24 17-24 12 3 1 NP
fstp mem32 (86-92)+EA 86-92 44 7 2 NP
fstp mem64 (98-106)+EA 98-106 45 8 2 NP
fstp mem80 (52-58)+EA 52-58 53 6 3 NP


•FSTCW Store control word


•FNSTCW Store control word, no wait
variations/
operand 8087 287 387 486 Pentium
fstcw mem 12-18 12-18 15 3 2 NP
fnstcw mem 12-18 12-18 15 3 2 NP
The wait version may take additional cycles


•FSTENV Store FPU environment


•FSTENVW Store FPU environment, 16-bit format (387+)
•FSTENVD Store FPU environment, 32-bit format (387+)
•FNSTENV Store FPU environment, no wait
•FNSTENVW Store FPU environment, no wait, 16-bit format (387+)
•FNSTENVD Store FPU environment, no wait, 32-bit format (387+)
variations/
operand 8087 287 387 486 Pentium
fstenv mem (40-50)+EA 40-50 103-104 67/56 48-50 NP
fstenvw mem 103-104 67/56 48-50 NP
fstenvd mem 103-104 67/56 48-50 NP
fnstenv mem (40-50)+EA 40-50 103-104 67/56 48-50 NP
fnstenvw mem 103-104 67/56 48-50 NP
fnstenvd mem 103-104 67/56 48-50 NP
Cycles for real mode/protected mode
The wait version may take additional cycles

•FSTSW Store status word
•FNSTSW Store status word, no wait
variations/
operand 8087 287 387 486 Pentium
fstsw mem 12-18 12-18 15 3 2 NP
fstsw ax - 10-16 13 3 2 NP
fnstsw mem 12-18 12-18 15 3 2 NP
fnstsw ax - 10-16 13 3 2 NP
The wait version may take additional cycles

•FSUB Floating point subtract


•FSUBP Floating point subtract and pop
variations/
operand 8087 287 387 486 Pentium
fsub reg 70-100 70-100 26-37 8-20 3/1 FX
fsub mem32 (90-120)+EA 90-120 24-32 8-20 3/1 FX
fsub mem64 (95-125)+EA 95-125 28-36 8-20 3/1 FX
fsubp reg 75-105 75-105 26-34 8-20 3/1 FX
•FSUBR Floating point reverse subtract
•FSUBRP Floating point reverse subtract and pop
variations/
operand 8087 287 387 486 Pentium
fsubr reg 70-100 70-100 26-37 8-20 3/1 FX
fsubr mem32 (90-120)+EA 90-120 24-32 8-20 3/1 FX
fsubr mem64 (95-125)+EA 95-125 28-36 8-20 3/1 FX
fsubrp reg 75-105 75-105 26-34 8-20 3/1 FX

FTST Floating point test for zero


8087 287 387 486 Pentium
38-48 38-48 28 4 4/1 FX

FWAIT Wait while FPU is executing


8087 287 387 486 Pentium
4 3 6 1-3 1-3 NP

•FXAM Examine condition flags
8087 287 387 486 Pentium
12-23 12-23 30-38 8 21 NP

FXCH Exchange floating point registers


8087 287 387 486 Pentium
10-15 10-15 18 4 0-1 *
* FXCH is pairable in the V pipe with all FX pairable instructions

•FXTRACT Extract exponent and significand


8087 287 387 486 Pentium
27-55 27-55 70-76 16-20 13 NP

•FYL2X Compute Y * log2(x)


•FYL2XP1 Compute Y * log2(x+1)
variations 8087 287 387 486 Pentium
fyl2x 900-1100 900-1100 120-538 196-329 22-111 NP
fyl2xp1 700-1000 700-1000 257-547 171-326 22-103 NP
MMX Technology
• Multi Media eXtensions ( MMX )

• Designed to accelerate multimedia and communication


applications
- motion video, image processing, audio synthesis,
speech synthesis and compression, video conferencing, 2D
and 3D graphics

• Includes new instructions and data types to significantly


improve application performance

• Exploits the parallelism inherent in many multimedia and


communications algorithms

• Maintains full compatibility with existing operating systems


and applications

Data Types:-

• packed data types
- 8 packed, consecutive 8-bit bytes
- 4 packed, consecutive 16-bit words
- 2 packed, consecutive 32-bit double words
- all formats occupy consecutive memory addresses & use little-endian form

Bit fields of the 64-bit formats:
Packed bytes: 63-56, 55-48, 47-40, 39-32, 31-24, 23-16, 15-8, 7-0
Packed words: 63-48, 47-32, 31-16, 15-0
Packed double words: 63-32, 31-0
Quad word: 63-0

• MMX Technology registers have the same format as a 64-bit quantity in memory

• has 2 data access modes
- 64-bit access mode, used for 64-bit memory accesses & for transfers between MMX registers, which share the floating-point coprocessor registers
- 32-bit access mode, used for 32-bit memory accesses & for transfers between MMX registers and microprocessor registers

MM7
MM6
MM5
MM4
MM3
MM2
MM1
MM0
TAGs

• adds 57 new instructions to the instruction set of the Pentium through Pentium 4

Instruction Set :-

- arithmetic
- comparison
- conversion
- logical
- shift
- data transfer

• instruction types are similar to those of the microprocessor, but MMX instructions operate on packed data types

Arithmetic instruction:
addition, subtraction, multiplication & a special multiplication with an addition.

• additions are performed on
- packed bytes ( B ), signed or unsigned
- packed words ( W )
- packed double word data ( D )

• any carry or borrow that is generated is dropped
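
The effect of dropping the carry can be shown by modelling a packed byte addition on 64-bit integers. This Python sketch imitates the wraparound form of a byte add; the function name `paddb` is borrowed from the mnemonic for readability, but the code is an illustration, not the instruction itself.

```python
def paddb(a, b):
    """Add eight packed bytes lane by lane, discarding each lane's carry."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        total = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (total & 0xFF) << shift     # keep 8 bits, drop the carry
    return result


print(hex(paddb(0x0102030405060708, 0x1010101010101010)))   # 0x1112131415161718
print(hex(paddb(0x00000000000000FF, 0x0000000000000001)))   # 0x0
```

In the second call FFH + 01H wraps to 00H and the carry does not spill into the neighbouring byte.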

Comparison instruction:

• 2 comparison instructions: PCMPEQ ( equal ) & PCMPGT ( greater than )

• compare packed bytes, words or double words

• do not change the microprocessor flag bits; each result element is set to all 1's for true & all 0's for false

• e.g. if MM2 is compared with MM1 by bytes and the least significant bytes are equal, the least significant byte of MM2 contains FFH, otherwise 00H
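
A minimal Python model of the comparison result, assuming the usual behaviour in which every element position becomes all 1's or all 0's. The function name `pcmp` and its `greater` flag are hypothetical and only serve this illustration; the greater-than form is modelled as a signed comparison.

```python
def pcmp(a, b, width, greater=False):
    """Compare 64-bit values element by element and build a mask result."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    result = 0
    for shift in range(0, 64, width):
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        if greater:
            # Greater-than treats the elements as signed numbers.
            x = x - (1 << width) if x & sign else x
            y = y - (1 << width) if y & sign else y
            is_true = x > y
        else:
            is_true = x == y
        if is_true:
            result |= mask << shift   # all 1's in this element
    return result

equal_mask = pcmp(0x1122334455667788, 0x1100330055007788, 8)
greater_mask = pcmp(0x0000000000000001, 0x00000000000000FF, 8, greater=True)
print(hex(equal_mask), hex(greater_mask))
```

Because the result is a mask rather than a flag, it can be combined with the logical instructions to select elements without branching.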

Conversion Instruction:

• 2 conversion instructions: PACK, with signed or unsigned saturation, & PUNPCK, as unpack high data and unpack low data

• operate on

- packed bytes ( B )
- packed words ( W )
- packed double word data ( D )

• B, W & D must be used in combination

- WB word to byte
- DW double word to word

• in an unsigned conversion, if the word is too large to fit, the destination byte becomes FFH
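
The saturation rule can be sketched in Python as below. This models the word to byte pack with unsigned saturation (PACKUSWB), under the assumption that the source words are read as signed values, so that negative words clamp to 00H and words above 255 clamp to FFH. The helper names are made up for this illustration.

```python
def words(value):
    """Return the four 16-bit words of a 64-bit value, lowest first."""
    return [(value >> shift) & 0xFFFF for shift in range(0, 64, 16)]

def signed16(word):
    """Reinterpret a 16-bit pattern as a signed number."""
    return word - 0x10000 if word & 0x8000 else word

def packuswb(dest, src):
    """Pack 4 words of dest (low half) and 4 words of src (high half) to bytes."""
    result = 0
    for index, word in enumerate(words(dest) + words(src)):
        byte = min(max(signed16(word), 0), 0xFF)  # clamp to 00H..FFH
        result |= byte << (8 * index)
    return result

# Words, lowest first: FFFFH (-1), 0100H (256), 00FFH (255), 0001H (1).
packed = packuswb(0x000100FF0100FFFF, 0x0000000000000000)
print(hex(packed))
```

Only the 256 word is too large and becomes FFH; 255 and 1 fit unchanged, and the negative word is clamped to 00H.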


Logical instruction:

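• AND, AND NOT, OR & XOR ( PAND, PANDN, POR & PXOR )

• instructions do not have a size extension

• perform bitwise operations on all 64 bits of the data

Shift instruction:

• logical shift left, logical shift right & arithmetic shift right instructions

• logical shifts are performed on word (W), double word (D) & quad word (Q); arithmetic shift right is performed on word (W) & double word (D) only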

Data transfer instruction:

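• data transfer is done register to register or between register and memory

• a 32 bit transfer copies only the rightmost 32 bits of the MMX register; there is no instruction to transfer the leftmost 32 bits directly

• to transfer the leftmost 32 bits, first shift the register right by 32 bits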
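
The shift-then-transfer idea can be modelled in a few lines of Python. The functions below are hypothetical stand-ins for a 32 bit transfer out of an MMX register and a 64 bit logical shift right; they are a sketch of the behaviour, not the instructions themselves.

```python
MASK64 = (1 << 64) - 1

def transfer_low32(mm):
    """Model of a 32 bit transfer: only the rightmost 32 bits are copied."""
    return mm & 0xFFFFFFFF

def shift_right_quad(mm, count):
    """Model of a logical right shift of the whole 64-bit register."""
    return (mm & MASK64) >> count if count < 64 else 0

mm0 = 0x1122334455667788
low_half = transfer_low32(mm0)                         # rightmost 32 bits
high_half = transfer_low32(shift_right_quad(mm0, 32))  # leftmost 32 bits
print(hex(low_half), hex(high_half))
```

The shift destroys the original low half of the register, so a copy should be kept in another MMX register if both halves are needed.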

EMMS instruction:

• empties the MMX state: all the tags in the floating point unit are set so that the floating point registers are listed as empty

• this instruction should be executed before the return instruction at the end of an MMX procedure; otherwise a subsequent floating point operation will cause a floating point interrupt error, crashing Windows or the application
• EMMS – empty MMX state

Ex : EMMS

• MOVD – move double word

Ex :
MOVD MM3, EAX ; reg to xreg
MOVD EAX, MM4 ; xreg to reg
MOVD MM3, DATA ; mem to xreg
MOVD DATA1, MM3 ; xreg to mem

• MOVQ – move quadword

Ex :
MOVQ MM3, MM1 ; xreg to xreg
MOVQ MM3, DATA ; mem to xreg
MOVQ DATA1, MM3 ; xreg to mem


• PACKSSDW – pack signed doubleword to word

Ex :
PACKSSDW MM1,MM2 ; xreg to xreg
PACKSSDW MM1,DATA ; mem to xreg

• PACKSSWB – pack signed word to byte

Ex :
PACKSSWB MM1,MM2 ; xreg to xreg
PACKSSWB MM1,DATA ; mem to xreg

• PACKUSWB – pack unsigned word to byte

Ex :
PACKUSWB MM1,MM2 ; xreg to xreg
PACKUSWB MM1,DATA ; mem to xreg

• PADD – add with truncation : byte, word & doubleword

Ex :
PADDB MM1,MM3 ; xreg to xreg
PADDW MM1,MM3
PADDD MM1,MM3

PADDB MM1,DATA ; mem to xreg
PADDW MM1,DATA
PADDD MM1,DATA

• PADDS – add with signed saturation : byte & word

Ex :
PADDSB MM1,MM3 ; xreg to xreg
PADDSW MM1,MM3

PADDSB MM1,DATA ; mem to xreg
PADDSW MM1,DATA
• PADDUS – add with unsigned saturation : byte & word

Ex :
PADDUSB MM1,MM3 ; xreg to xreg
PADDUSW MM1,MM3

PADDUSB MM1,DATA ; mem to xreg
PADDUSW MM1,DATA
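
Saturation means that a sum which overflows is clamped to the largest or smallest value the element can hold instead of wrapping around. The Python sketch below models this for both the signed and the unsigned forms; the function name `padd_saturate` and its parameters are made up for this illustration.

```python
def padd_saturate(a, b, width, signed):
    """Add element by element, clamping each sum to the element's range."""
    mask = (1 << width) - 1
    if signed:
        low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        low, high = 0, mask
    result = 0
    for shift in range(0, 64, width):
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        if signed:
            # Reinterpret the bit patterns as signed numbers.
            x = x - (1 << width) if x > high else x
            y = y - (1 << width) if y > high else y
        total = min(max(x + y, low), high)   # clamp instead of wrapping
        result |= (total & mask) << shift
    return result

signed_max = padd_saturate(0x7F, 0x01, 8, signed=True)     # 127 + 1
signed_min = padd_saturate(0x80, 0xFF, 8, signed=True)     # -128 + (-1)
unsigned_max = padd_saturate(0xFF, 0x01, 8, signed=False)  # 255 + 1
print(hex(signed_max), hex(signed_min), hex(unsigned_max))
```

With truncating addition the third example would wrap to 00H; with unsigned saturation it stays at FFH, which is usually the wanted behaviour for pixel or audio sample values.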

• PAND – AND

Ex :
PAND MM1,MM2 ; xreg to xreg
PAND MM1,DATA ; mem to xreg

• PANDN – AND NOT ( inverts the destination, then ANDs it with the source )

Ex :
PANDN MM1,MM2 ; xreg to xreg
PANDN MM1,DATA ; mem to xreg
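
PANDN is easily mistaken for a NAND. It inverts the destination and then ANDs the result with the source, which is not the same as inverting the AND of both operands. A small Python comparison, with function names chosen only for this illustration:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def pandn(dest, src):
    """AND NOT: (NOT dest) AND src, over all 64 bits."""
    return (~dest & MASK64) & src

def nand(dest, src):
    """True NAND, shown only for contrast: NOT (dest AND src)."""
    return ~(dest & src) & MASK64

and_not_result = pandn(0x00FF, 0x0F0F)
nand_result = nand(0x00FF, 0x0F0F)
print(hex(and_not_result), hex(nand_result))
```

The AND NOT form is convenient for clearing, in the source, every bit that is set in a mask held in the destination.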

• PCMPEQ – compare for equality

Ex :
PCMPEQB MM1,MM2 ; xreg to xreg
PCMPEQW MM1,MM2
PCMPEQD MM1,MM2
PCMPEQB MM1,DATA ; mem to xreg
PCMPEQW MM1,DATA
PCMPEQD MM1,DATA

• PCMPGT – compare for greater than

Ex :
PCMPGTB MM1,MM2 ; xreg to xreg
PCMPGTW MM1,MM2
PCMPGTD MM1,MM2
PCMPGTB MM1,DATA ; mem to xreg
PCMPGTW MM1,DATA
PCMPGTD MM1,DATA
• PMADDWD – multiply words and add the products into doublewords

Ex :
PMADDWD MM1,MM4 ; xreg to xreg
PMADDWD MM1,DATA ; mem to xreg

• PMULHW – multiplication of words, high half of each product

Ex :
PMULHW MM1,MM4 ; xreg to xreg
PMULHW MM1,DATA ; mem to xreg

• PMULLW – multiplication of words, low half of each product

Ex :
PMULLW MM1,MM4 ; xreg to xreg
PMULLW MM1,DATA ; mem to xreg
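
The three multiplications all start from the same four signed 16-bit products and differ only in which part they keep. The Python sketch below models the word forms (full mnemonics PMULLW, PMULHW and PMADDWD); the helper names are made up for this illustration.

```python
def signed_words(value):
    """Return the four 16-bit words of a 64-bit value as signed numbers."""
    out = []
    for shift in range(0, 64, 16):
        word = (value >> shift) & 0xFFFF
        out.append(word - 0x10000 if word & 0x8000 else word)
    return out

def products(a, b):
    """The four full (32-bit) signed products, lowest word pair first."""
    return [x * y for x, y in zip(signed_words(a), signed_words(b))]

def pmullw(a, b):
    """Keep the low 16 bits of each product."""
    return sum((p & 0xFFFF) << (16 * i) for i, p in enumerate(products(a, b)))

def pmulhw(a, b):
    """Keep the high 16 bits of each product."""
    return sum(((p >> 16) & 0xFFFF) << (16 * i) for i, p in enumerate(products(a, b)))

def pmaddwd(a, b):
    """Add adjacent products, giving two 32-bit doubleword sums."""
    p = products(a, b)
    sums = [p[0] + p[1], p[2] + p[3]]
    return sum((s & 0xFFFFFFFF) << (32 * i) for i, s in enumerate(sums))

a = 0x0004000300020001   # words 1, 2, 3, 4 (lowest first)
b = 0x0008000700060005   # words 5, 6, 7, 8 (lowest first)
print(hex(pmullw(a, b)), hex(pmulhw(a, b)), hex(pmaddwd(a, b)))
```

The multiply and add form is the building block for dot products: here it yields 1x5 + 2x6 = 17 in the low doubleword and 3x7 + 4x8 = 53 in the high doubleword.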

• POR – OR

Ex :
POR MM1,MM4 ; xreg to xreg
POR MM1,DATA ; mem to xreg
• PSLL – shift left : word, doubleword and quadword

Ex :
PSLLW MM1,MM3 ; xreg to xreg
PSLLD MM1,MM3
PSLLQ MM1,MM3

PSLLW MM1,DATA ; mem to xreg
PSLLD MM1,DATA
PSLLQ MM1,DATA

PSLLW MM1,5 ; xreg by count
PSLLD MM1,4
PSLLQ MM1,7


• PSRA – shift arithmetic right : word and doubleword ( there is no quadword form )

Ex :
PSRAW MM1,MM3 ; xreg to xreg
PSRAD MM1,MM3

PSRAW MM1,DATA ; mem to xreg
PSRAD MM1,DATA

PSRAW MM1,5 ; xreg by count
PSRAD MM1,4


• PSRL – shift logical right : word, doubleword and quadword

Ex :
PSRLW MM1,MM3 ; xreg to xreg
PSRLD MM1,MM3
PSRLQ MM1,MM3

PSRLW MM1,DATA ; mem to xreg
PSRLD MM1,DATA
PSRLQ MM1,DATA

PSRLW MM1,5 ; xreg by count
PSRLD MM1,4
PSRLQ MM1,7
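
The difference between the logical and the arithmetic right shift is what gets shifted in at the top of each element: zeros for the logical form, copies of the sign bit for the arithmetic form. A Python model for word elements, with a function name and parameters made up for this illustration:

```python
def shift_right_words(value, count, arithmetic):
    """Shift each 16-bit word right; fill with sign bits if arithmetic."""
    result = 0
    for shift in range(0, 64, 16):
        word = (value >> shift) & 0xFFFF
        if arithmetic and word & 0x8000:
            word -= 0x10000          # treat as a negative number
        # Python's >> on a negative int fills with 1's, like a sign fill.
        # A count above 15 gives all zeros or all sign bits.
        word = (word >> min(count, 16)) & 0xFFFF
        result |= word << shift
    return result

value = 0x80004000FFF00010
arithmetic_result = shift_right_words(value, 4, arithmetic=True)
logical_result = shift_right_words(value, 4, arithmetic=False)
print(hex(arithmetic_result), hex(logical_result))
```

The arithmetic form keeps negative words negative (8000H becomes F800H), so it behaves like a signed division by a power of two, while the logical form treats every word as unsigned (8000H becomes 0800H).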

• PSUB – subtraction with truncation : byte, word & doubleword

Ex :
PSUBB MM1,MM3 ; xreg to xreg
PSUBW MM1,MM3
PSUBD MM1,MM3

PSUBB MM1,DATA ; mem to xreg
PSUBW MM1,DATA
PSUBD MM1,DATA

• PSUBS – subtraction with signed saturation : byte & word

Ex :
PSUBSB MM1,MM3 ; xreg to xreg
PSUBSW MM1,MM3

PSUBSB MM1,DATA ; mem to xreg
PSUBSW MM1,DATA

• PSUBUS – subtraction with unsigned saturation : byte & word

Ex :
PSUBUSB MM1,MM3 ; xreg to xreg
PSUBUSW MM1,MM3

PSUBUSB MM1,DATA ; mem to xreg
PSUBUSW MM1,DATA

• PXOR – exclusive OR

Ex :
PXOR MM1,MM3 ; xreg to xreg
PXOR MM4,DATA ; mem to xreg


• PUNPCKH – unpack high : byte to word, word to doubleword & doubleword to quadword

Ex :
PUNPCKHBW MM1,MM3 ; xreg to xreg
PUNPCKHWD MM1,MM3
PUNPCKHDQ MM1,MM3
PUNPCKHBW MM1,DATA ; mem to xreg
PUNPCKHWD MM1,DATA
PUNPCKHDQ MM1,DATA

• PUNPCKL – unpack low : byte to word, word to doubleword & doubleword to quadword

Ex :
PUNPCKLBW MM1,MM3 ; xreg to xreg
PUNPCKLWD MM1,MM3
PUNPCKLDQ MM1,MM3
PUNPCKLBW MM1,DATA ; mem to xreg
PUNPCKLWD MM1,DATA
PUNPCKLDQ MM1,DATA
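
Unpacking interleaves elements from the destination and the source: the low form takes the lower half of each operand, the high form the upper half. The Python sketch below models the byte to word case (full mnemonics PUNPCKLBW and PUNPCKHBW); the function name `unpack_bytes` and its `high` flag are made up for this illustration.

```python
def unpack_bytes(dest, src, high):
    """Interleave 4 bytes of dest with 4 bytes of src into 4 words."""
    dest_bytes = [(dest >> shift) & 0xFF for shift in range(0, 64, 8)]
    src_bytes = [(src >> shift) & 0xFF for shift in range(0, 64, 8)]
    start = 4 if high else 0           # upper or lower half of the operands
    result = 0
    for i in range(4):
        result |= dest_bytes[start + i] << (16 * i)      # low byte of word i
        result |= src_bytes[start + i] << (16 * i + 8)   # high byte of word i
    return result

dest = 0x0706050403020100
low_zero_extended = unpack_bytes(dest, 0, high=False)
high_zero_extended = unpack_bytes(dest, 0, high=True)
interleaved = unpack_bytes(dest, 0xF7F6F5F4F3F2F1F0, high=False)
print(hex(low_zero_extended), hex(high_zero_extended), hex(interleaved))
```

Unpacking against a source of all zeros is a common way to widen unsigned bytes to words before doing arithmetic that would overflow a byte.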