You are on page 1of 14

CS220

April 23, 2007


Some tips to lab 7
#include<stdio.h> #include<stdio.h>
main() main()
{ {
float f1=1.1,f2=2.2; float f1=1.1,f2=2.2;
float result; float result;
__asm__ ( __asm__ (
"flds %1\n\t" "faddp\n\t"
"fadds %2\n\t" : "=t"(result)
"fsts %0" : "0"(f1), "u"(f2)
: "=m"(result) : "st(1)"
: "m"(f1), "m"(f2) );
); printf("f1 + f2 = %f\n",result);
printf("f1 + f2 = %f\n",result); }
}
Single Instruction Multiple Data
(SIMD)
• Data level parallelism
• Multimedia Extensions (MMX)
– Integers
– Reuse FP registers
• Streaming SIMD Extensions (SSE)
– expanded with 32-bit floating point support
– Additional registers
SSE/SSE2/SSE3
• Perform SIMD operations on floating-point data.
– 128-bit, packed, single-precision floating-point data type
• contain four single-precision floating-point values
– Eight 128-bit registers (XMM0 through XMM7)
• SSE2
– 128-bit packed double-precision floating-point value
• contains two double-precision values
– 128-bit packed byte integer value
• contains 16 single-byte integer values
– 128-bit packed word integer value
• contains eight word integer values
– 128-bit packed double word integer value
• contains four double word integer values
– 128-bit packed quad word integer value
• contains two quad word integer values
• SSE3
– No new data type
MMX Registers
• MMX utilizes the 80-bit
FPU registers
– MM0 through MM7 are
directly mapped to FPU
registers R0 through R7
– Random access contrast
to register stack in FPU
– Only use 64 bits, upper 16
bits are set to all ones
(NaNs or infinities in FP
view)
Two New principles
1. Operations on packed data
– four new 64-bit data types:
• Packed byte
– Eight bytes packed into one 64-bit quantity
• Packed word
– Four words packed into one 64-bit quantity
• Packed doubleword
– Two doublewords packed into one 64-bit quantity
• Quadword
– One 64-bit quantity
2. Saturation Arithmetic
MMX Data Types

Note that the values in one same register can have different interpretations
Saturation and Wraparound
• Wraparound: truncating any overflow, only the lower bits are
returned. The carry is ignored.
– add two eight-bit values 0x02 and 0xFF
– The actual sum is 0x101, but the ninth bit is truncated, and the
result is 0x01
• Saturation: Results are clipped (saturated) to some maximum or
minimum value, 8-bit example:

Decimal Hexadecimal
Data Type
Lower Limit Upper Limit Lower Limit Upper Limit
Signed Byte -128 +127 0x80 0x7F
Unsigned Byte 0 255 0x0 0xfF
Signed Word -32768 +32767 0x8000 0x7FFF
Unsigned Word 0 65535 0x0 0xFFFF
• Cannot mix FPU and MMX instructions
• Begin MMX instructions at any time
• EMMS (Exit MMX Machine State) to reset FP state. After any MMX instruction, the
entire floating-point tag word is set to Valid (00s). EMMS sets the entire floating-point
tag word to Empty (11s).
• Register states (both FP and MMX) can be saved and restored by FNSAVE and
FRSTR instructions.
• Do not rely on register contents across transitions.
FP_code:
…...
……
MMX_code:
…...
EMMS (*mark the FP tag word as empty*)
FP_code 1:
…...
…...
Instruction Group
• Fifty-seven MMX instructions:
– Arithmetic Instructions
– Comparison Instructions
– Conversion Instructions
– Logical Instructions
– Shift Instructions
– Data Transfer Instructions
– Empty MMX State (EMMS) Instruction
MMX Instruction Set
Category Mnemonic Different Opcodes Description
Arithmetic PADD[B,W,D] 3 Add with wrap-around on [byte, word, doubleword]
PADDS[B,W] 2 Add signed with saturation on [byte, word]
PADDUS[B,W] 2 Add unsigned with saturation on [byte, word]
PSUB[B,W,D] 3 Subtract with wrap-around on [byte, word, doubleword]
PSUBS[B,W] 2 Subtract signed with saturation on [byte, word]
PSUBUS[B,W] 2 Subtract unsigned with saturation on [byte, word]
PMULHW 1 Packed multiply high on words
PMULLW 1 Packed multiply low on words
PMADDWD 1 Packed multiply on words and add resulting pairs
Comparison PCMPEQ[B,W,D] 3 Packed compare for equality [byte, word,doubleword]
PCMPGT[B,W,D] 3 Packed compare greater than [byte, word, doubleword]
Conversion PACKUSWB 1 Pack words into bytes (unsigned with saturation)
PACKSS[WB,DW] 2 Pack [words into bytes, doublewords into words] (signed with
saturation)
PUNPCKH [BW,WD,DQ] 3 Unpack (interleave) high-order [bytes, words, doublewords] from
MMXTM register
PUNPCKL [BW,WD,DQ] 3 Unpack (interleave) low-order [bytes, words, doublewords] from
MMX register
Logical PAND 1 Bitwise AND
PANDN 1 Bitwise AND NOT
POR 1 Bitwise OR
PXOR 1 Bitwise XOR
Shift PSLL[W,D,Q] 6 Packed shift left logical [word, doubleword, quadword] by amount
specified in MMX register or by immediate value
PSRL[W,D,Q] 6 Packed shift right logical [word, doubleword, quadword] by amount
specified in MMX register or by immediate value
PSRA[W,D] 4 Packed shift right arithmetic [word, doubleword] by amount
specified in MMX register or by immediate value
Data Transfer MOV[D,Q] 4 Move [doubleword, quadword] to MMX register or from MMX
register
State Mgmt EMMS 1 Empty MMX state
Data Transfer Instructions
• The MOVD (Move 32 Bits) instruction transfers 32 bits of packed
data from memory to MMX registers and visa versa, or from integer
registers to MMX registers and visa versa.
– Examples:
• movd %eax, %mm0
• movd my32bits, %mm0
• movd %mm0, my32bits
• movd %mm0, %mm1 (WRONG!)
• The MOVQ (Move 64 Bits) instruction transfers 64-bits of packed
data from memory to MMX registers and vise versa, or transfers
data between MMX registers.
– Examples:
• movq %mm0, my64bits
• movq my64bits, %mm0
• can’t move between mmx regs, like load/store.
Instruction format
• OPERATION SRC, DEST (AT&T syntax)
would be decoded as:
DEST = DEST OPERATION SRC
• A typical MMX instruction has this syntax:
Prefix: P for Packed
Instruction operation: for example - ADD,CMP,or XOR
Suffix:
– US for Unsigned Saturation
– S for Signed saturation
– B, W, D, Q for the data type: packed byte, packed word, packed
doubleword, or quadword.
The rest of today’s class, explain MMX instructions on this page:
http://www.tommesani.com/MMXPrimer.html
Note the difference of Intel syntax and AT&T syntax
http://www.imada.sdu.dk/~kslarsen/dm18/Litteratur/IntelnATT.htm
This page uses Intel syntax, and the position of source and destination
in instructions are exchanged compared to AT&T syntax.
The pseudo-code explanation of each instruction is the same.

You may also want to refer to Intel official MMX reference manual for
better explanation (also Intel syntax):
ftp://download.intel.com/ids/mmx/MMX_Manual_%20Prog_Ref.pdf

Examples and applications of MMX instructions will be on next class.

You might also like