Single Instruction Multiple Data (SIMD) and MMX Registers

CS220
April 23, 2007

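Some tips for Lab 7

/* Version 1: operands in memory */
#include <stdio.h>
int main(void)
{
    float f1 = 1.1, f2 = 2.2;
    float result;
    __asm__ (
        "flds %1\n\t"      /* push f1 onto the FPU stack */
        "fadds %2\n\t"     /* st(0) = st(0) + f2 */
        "fstps %0"         /* store to result and pop, so the FPU stack stays balanced */
        : "=m"(result)
        : "m"(f1), "m"(f2)
    );
    printf("f1 + f2 = %f\n", result);
    return 0;
}

/* Version 2: operands passed on the FPU register stack */
#include <stdio.h>
int main(void)
{
    float f1 = 1.1, f2 = 2.2;
    float result;
    __asm__ (
        "faddp\n\t"          /* st(1) = st(1) + st(0), then pop */
        : "=t"(result)       /* result is read from st(0) */
        : "0"(f1), "u"(f2)   /* f1 in st(0), f2 in st(1) */
        : "st(1)"            /* st(1) is consumed by the pop */
    );
    printf("f1 + f2 = %f\n", result);
    return 0;
}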
Single Instruction Multiple Data (SIMD)
• Data level parallelism
• Multimedia Extensions (MMX)
– Integers only
– Reuses the FPU registers
• Streaming SIMD Extensions (SSE)
– Adds 32-bit (single-precision) floating-point support
– Adds new registers (XMM)
SSE/SSE2/SSE3
• Perform SIMD operations on floating-point data.
– 128-bit, packed, single-precision floating-point data type
• contain four single-precision floating-point values
– Eight 128-bit registers (XMM0 through XMM7)
• SSE2
– 128-bit packed double-precision floating-point value
• contains two double-precision values
– 128-bit packed byte integer value
• contains 16 single-byte integer values
– 128-bit packed word integer value
• contains eight word integer values
– 128-bit packed double word integer value
• contains four double word integer values
– 128-bit packed quad word integer value
• contains two quad word integer values
• SSE3
– No new data type
MMX Registers
• MMX utilizes the 80-bit FPU registers
– MM0 through MM7 are directly mapped to FPU registers R0 through R7
– Random access, in contrast to the register stack in the FPU
– Only the low 64 bits are used; the upper 16 bits are set to all ones (NaNs or infinities in the FP view)
Two New Principles
1. Operations on packed data
– Four new 64-bit data types:
• Packed byte
– Eight bytes packed into one 64-bit quantity
• Packed word
– Four words packed into one 64-bit quantity
• Packed doubleword
– Two doublewords packed into one 64-bit quantity
• Quadword
– One 64-bit quantity
2. Saturation Arithmetic
MMX Data Types
Note that the same register contents can be interpreted in different ways, depending on the instruction that operates on them.
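The different interpretations of one 64-bit value can be sketched in portable C. This is only a model of the data layout, not MMX code; the function names are hypothetical, and lane 0 is taken to be the least significant element, as on x86.

```c
#include <stdint.h>

/* Byte lane i (0..7) of a 64-bit value viewed as eight packed bytes. */
static uint8_t packed_byte(uint64_t q, unsigned i)
{
    return (uint8_t)(q >> (8u * i));
}

/* Word lane i (0..3) of the same value viewed as four packed words. */
static uint16_t packed_word(uint64_t q, unsigned i)
{
    return (uint16_t)(q >> (16u * i));
}

/* Doubleword lane i (0..1) of the same value viewed as two packed doublewords. */
static uint32_t packed_dword(uint64_t q, unsigned i)
{
    return (uint32_t)(q >> (32u * i));
}
```

For the value 0x0807060504030201, byte lane 0 is 0x01, word lane 1 is 0x0403, and doubleword lane 1 is 0x08070605: the same bits, read three ways.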
Saturation and Wraparound
• Wraparound: any overflow is truncated and only the lower bits are returned. The carry is ignored.
– Add two eight-bit values, 0x02 and 0xFF
– The actual sum is 0x101, but the ninth bit is truncated, and the result is 0x01
• Saturation: results are clipped (saturated) to a maximum or minimum value. Limits by data type:

Data Type        Decimal                     Hexadecimal
                 Lower Limit   Upper Limit   Lower Limit   Upper Limit
Signed Byte      -128          +127          0x80          0x7F
Unsigned Byte    0             255           0x00          0xFF
Signed Word      -32768        +32767        0x8000        0x7FFF
Unsigned Word    0             65535         0x0000        0xFFFF
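The difference between wraparound and saturation can be sketched in portable C. This models what the byte-sized add instructions do in each lane; it is not MMX code, and all function names are hypothetical.

```c
#include <stdint.h>

/* Wraparound add on one byte lane (the per-lane behavior of PADDB). */
static uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);            /* the carry out of bit 7 is discarded */
}

/* Unsigned saturating add on one byte lane (per-lane behavior of PADDUSB). */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 0xFF ? 0xFF : sum);   /* clip at the upper limit */
}

/* Signed saturating add on one byte lane (per-lane behavior of PADDSB). */
static int8_t add_sat_s8(int8_t a, int8_t b)
{
    int sum = (int)a + b;
    if (sum > 127)  return 127;         /* clip at the upper limit 0x7F */
    if (sum < -128) return -128;        /* clip at the lower limit 0x80 */
    return (int8_t)sum;
}

/* Apply the unsigned saturating add to all eight byte lanes of two quadwords. */
static uint64_t paddusb_model(uint64_t x, uint64_t y)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < 8; i++) {
        uint8_t a = (uint8_t)(x >> (8 * i));
        uint8_t b = (uint8_t)(y >> (8 * i));
        r |= (uint64_t)add_sat_u8(a, b) << (8 * i);
    }
    return r;
}
```

With the values from the wraparound example, add_wrap_u8(0x02, 0xFF) gives 0x01, while add_sat_u8(0x02, 0xFF) gives 0xFF.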
• FPU and MMX instructions cannot be mixed, because they share the same registers
• A block of MMX instructions can begin at any time
• Use EMMS (Empty MMX State) to reset the FP state. After any other MMX instruction, the entire floating-point tag word is set to Valid (00s). EMMS sets the entire floating-point tag word to Empty (11s).
• Register states (both FP and MMX) can be saved and restored by the FNSAVE and FRSTOR instructions.
• Do not rely on register contents across transitions.
FP_code:
…...
……
MMX_code:
…...
EMMS (*mark the FP tag word as empty*)
FP_code 1:
…...
…...
Instruction Group
• Fifty-seven MMX instructions:
– Arithmetic Instructions
– Comparison Instructions
– Conversion Instructions
– Logical Instructions
– Shift Instructions
– Data Transfer Instructions
– Empty MMX State (EMMS) Instruction
MMX Instruction Set
Category       Mnemonic             No. of Opcodes  Description
Arithmetic     PADD[B,W,D]          3               Add with wrap-around on [byte, word, doubleword]
               PADDS[B,W]           2               Add signed with saturation on [byte, word]
               PADDUS[B,W]          2               Add unsigned with saturation on [byte, word]
               PSUB[B,W,D]          3               Subtract with wrap-around on [byte, word, doubleword]
               PSUBS[B,W]           2               Subtract signed with saturation on [byte, word]
               PSUBUS[B,W]          2               Subtract unsigned with saturation on [byte, word]
               PMULHW               1               Packed multiply high on words
               PMULLW               1               Packed multiply low on words
               PMADDWD              1               Packed multiply on words and add resulting pairs
Comparison     PCMPEQ[B,W,D]        3               Packed compare for equality [byte, word, doubleword]
               PCMPGT[B,W,D]        3               Packed compare greater than [byte, word, doubleword]
Conversion     PACKUSWB             1               Pack words into bytes (unsigned with saturation)
               PACKSS[WB,DW]        2               Pack [words into bytes, doublewords into words] (signed with saturation)
               PUNPCKH[BW,WD,DQ]    3               Unpack (interleave) high-order [bytes, words, doublewords] from MMX register
               PUNPCKL[BW,WD,DQ]    3               Unpack (interleave) low-order [bytes, words, doublewords] from MMX register
Logical        PAND                 1               Bitwise AND
               PANDN                1               Bitwise AND NOT
               POR                  1               Bitwise OR
               PXOR                 1               Bitwise XOR
Shift          PSLL[W,D,Q]          6               Packed shift left logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRL[W,D,Q]          6               Packed shift right logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRA[W,D]            4               Packed shift right arithmetic [word, doubleword] by amount specified in MMX register or by immediate value
Data Transfer  MOV[D,Q]             4               Move [doubleword, quadword] to MMX register or from MMX register
State Mgmt     EMMS                 1               Empty MMX state
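As an illustration of one row of the table, here is a portable C model of PMADDWD ("packed multiply on words and add resulting pairs"). It is a sketch of the arithmetic only, not MMX code; pack_words and pmaddwd_model are hypothetical helper names, and the casts to int16_t assume the usual two's-complement conversion.

```c
#include <stdint.h>

/* Build a 64-bit packed-word value; w0 is the least significant word. */
static uint64_t pack_words(int16_t w0, int16_t w1, int16_t w2, int16_t w3)
{
    return  (uint64_t)(uint16_t)w0
         | ((uint64_t)(uint16_t)w1 << 16)
         | ((uint64_t)(uint16_t)w2 << 32)
         | ((uint64_t)(uint16_t)w3 << 48);
}

/* Model of PMADDWD: multiply the four signed words of src and dest
   lane by lane, then add adjacent products into two 32-bit results. */
static uint64_t pmaddwd_model(uint64_t src, uint64_t dest)
{
    int32_t prod[4];
    for (unsigned i = 0; i < 4; i++) {
        int16_t a = (int16_t)(uint16_t)(dest >> (16 * i));
        int16_t b = (int16_t)(uint16_t)(src  >> (16 * i));
        prod[i] = (int32_t)a * b;
    }
    /* Add in unsigned arithmetic so the sum wraps instead of overflowing. */
    uint32_t lo = (uint32_t)prod[0] + (uint32_t)prod[1];
    uint32_t hi = (uint32_t)prod[2] + (uint32_t)prod[3];
    return ((uint64_t)hi << 32) | lo;
}
```

For the words (1, 2, 3, 4) and (5, 6, 7, 8), the low doubleword is 1*5 + 2*6 = 17 and the high doubleword is 3*7 + 4*8 = 53, which is the core step of a dot product.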
Data Transfer Instructions
• The MOVD (Move 32 Bits) instruction transfers 32 bits of packed data from memory to MMX registers and vice versa, or from integer registers to MMX registers and vice versa.
– Examples:
• movd %eax, %mm0
• movd my32bits, %mm0
• movd %mm0, my32bits
• movd %mm0, %mm1 (WRONG! MOVD cannot move between two MMX registers)
• The MOVQ (Move 64 Bits) instruction transfers 64 bits of packed data from memory to MMX registers and vice versa, or transfers data between MMX registers.
– Examples:
• movq %mm0, my64bits
• movq my64bits, %mm0
• movq %mm0, %mm1 (allowed; MOVD, by contrast, cannot move between MMX registers and works only like a load/store)
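The effect of MOVD on the 64-bit register can be modeled in portable C. This is a sketch of the data movement only, with hypothetical function names: a MOVD into an MMX register fills the low 32 bits and zeroes the upper 32 bits, and a MOVD out of an MMX register copies only the low 32 bits.

```c
#include <stdint.h>

/* Model of "movd src, %mmN": low 32 bits loaded, upper 32 bits zeroed. */
static uint64_t movd_to_mmx(uint32_t src)
{
    return (uint64_t)src;
}

/* Model of "movd %mmN, dest": only the low 32 bits are transferred. */
static uint32_t movd_from_mmx(uint64_t mm)
{
    return (uint32_t)mm;
}
```

Moving a full 64-bit value through MOVD therefore loses its upper half, which is why MOVQ is used for whole-register transfers.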
Instruction format
• OPERATION SRC, DEST (AT&T syntax)
would be decoded as:
DEST = DEST OPERATION SRC
• A typical MMX instruction has this syntax:
Prefix: P for Packed
Instruction operation: for example, ADD, CMP, or XOR
Suffix:
– US for Unsigned Saturation
– S for Signed Saturation
– B, W, D, Q for the data type: packed byte, packed word, packed doubleword, or quadword
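The naming scheme above can be sketched as a small C decoder. This is an illustration under stated assumptions, not an assembler: the type and function names are hypothetical, only a handful of operations are recognized, and the code does not check that a given combination (for example, saturation with doublewords) actually exists in the instruction set.

```c
#include <stddef.h>
#include <string.h>

enum saturation { SAT_NONE, SAT_SIGNED, SAT_UNSIGNED };

struct mmx_name {
    char op[8];            /* operation part, e.g. "ADD" */
    enum saturation sat;   /* from the optional S / US suffix */
    int elem_bits;         /* 8, 16, 32 or 64, from the B/W/D/Q suffix */
};

/* Decode names of the form P<op>[S|US]<B|W|D|Q>.
   Returns 1 on success and 0 if the name does not fit the pattern. */
static int decode_mmx_name(const char *name, struct mmx_name *out)
{
    static const char *const known_ops[] = {
        "ADD", "SUB", "CMPEQ", "CMPGT", "SLL", "SRL", "SRA"
    };
    size_t n = strlen(name);
    if (n < 3 || name[0] != 'P')
        return 0;

    switch (name[n - 1]) {             /* data-type suffix */
    case 'B': out->elem_bits = 8;  break;
    case 'W': out->elem_bits = 16; break;
    case 'D': out->elem_bits = 32; break;
    case 'Q': out->elem_bits = 64; break;
    default:  return 0;
    }
    n--;

    out->sat = SAT_NONE;               /* optional saturation suffix */
    if (n >= 3 && name[n - 2] == 'U' && name[n - 1] == 'S') {
        out->sat = SAT_UNSIGNED;
        n -= 2;
    } else if (name[n - 1] == 'S') {
        out->sat = SAT_SIGNED;
        n -= 1;
    }

    size_t op_len = n - 1;             /* what remains after the P prefix */
    if (op_len == 0 || op_len >= sizeof out->op)
        return 0;
    memcpy(out->op, name + 1, op_len);
    out->op[op_len] = '\0';

    for (size_t i = 0; i < sizeof known_ops / sizeof known_ops[0]; i++)
        if (strcmp(out->op, known_ops[i]) == 0)
            return 1;
    return 0;                          /* e.g. PAND, which has no size suffix */
}
```

For example, PADDUSB decodes to the operation ADD with unsigned saturation on 8-bit elements, and PSUBSW to SUB with signed saturation on 16-bit elements. Names that do not follow the pattern, such as PAND or PACKSSWB, are rejected.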
For the rest of today’s class, we will go through the MMX instructions on this page:
http://www.tommesani.com/MMXPrimer.html
Note the difference between Intel syntax and AT&T syntax:
http://www.imada.sdu.dk/~kslarsen/dm18/Litteratur/IntelnATT.htm
The MMX primer page uses Intel syntax, in which the positions of the source and destination operands are swapped compared to AT&T syntax.
The pseudo-code explanation of each instruction is the same in both syntaxes.
You may also want to refer to Intel's official MMX reference manual for a more detailed explanation (it also uses Intel syntax):
ftp://download.intel.com/ids/mmx/MMX_Manual_%20Prog_Ref.pdf
Examples and applications of MMX instructions will be covered in the next class.
