
          MOV     EAX, OFFSET COORDS         EAX points to list of coordinates.
          FLD     QWORD PTR [EAX + 24]       Push y1 on register stack.
          FLD     QWORD PTR [EAX + 16]       Push x1 on register stack.
          FLD     QWORD PTR [EAX + 8]        Push y0 on register stack.
          FLD     QWORD PTR [EAX]            Push x0 on register stack.
          FSUBP   ST(2), ST(0)               Compute x1 − x0; pop x0.
          FLDZ                               Push 0.0 on stack.
          FUCOMIP ST(0), ST(2)               Determine whether denominator is zero.
          JE      NO_SLOPE                   If so, slope m is undefined.
          FSUBP   ST(2), ST(0)               Compute y1 − y0; pop y0.
          FDIVP   ST(1), ST(0)               Compute m = (y1 − y0)/(x1 − x0).
          MOV     EBX, OFFSET SLOPE          EBX points to memory location SLOPE.
          FST     QWORD PTR [EBX]            Store the slope to memory.
          FLD     QWORD PTR [EAX + 8]        Push y0 on register stack.
          FLD     QWORD PTR [EAX]            Push x0 on register stack.
          FMULP   ST(2), ST(0)               Compute m × x0; pop x0.
          FSUBRP  ST(1), ST(0)               Compute b = y0 − m × x0; pop y0.
          MOV     EBX, OFFSET INTERCEPT      EBX points to memory location INTERCEPT.
          FSTP    QWORD PTR [EBX]            Store the intercept to memory; pop top of stack.
          MOV     EBX, 0                     Indicate that line is not vertical.
          JMP     DONE
NO_SLOPE: MOV     EBX, 1                     Indicate that line is vertical.
DONE:     MOV     DWORD PTR VERT_LINE, EBX

Figure E.18   Floating-point program to compute the slope and intercept of a line.
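As a cross-check on the register-stack manipulations in Figure E.18, the following Python sketch performs the same arithmetic in high-level form. It is a minimal model, not part of the original program. It assumes, as the offsets 0, 8, 16, and 24 suggest, that COORDS holds x0, y0, x1, and y1 as consecutive 64-bit floating-point numbers. The names line_params and LineParams are hypothetical. For a vertical line the figure simply leaves SLOPE and INTERCEPT unwritten, and returning NaN for them here is an illustrative choice. Python floats are 64-bit doubles, whereas the x87 registers hold 80-bit values, so in general the last bits of a result can differ.

```python
from typing import NamedTuple, Sequence


class LineParams(NamedTuple):
    """Hypothetical container for the three results written by Figure E.18."""
    slope: float       # value stored at SLOPE
    intercept: float   # value stored at INTERCEPT
    vert_line: int     # value stored at VERT_LINE (1 = vertical line)


def line_params(coords: Sequence[float]) -> LineParams:
    """Model of Figure E.18; coords = (x0, y0, x1, y1), assumed to be the
    quadwords at offsets 0, 8, 16, and 24 from COORDS."""
    x0, y0, x1, y1 = coords
    dx = x1 - x0                     # FSUBP ST(2), ST(0)
    if dx == 0.0:                    # FLDZ, FUCOMIP ST(0), ST(2), JE NO_SLOPE
        # The figure leaves SLOPE and INTERCEPT unwritten here; NaN marks them.
        return LineParams(float("nan"), float("nan"), 1)
    m = (y1 - y0) / dx               # FSUBP ST(2), ST(0); FDIVP ST(1), ST(0)
    b = y0 - m * x0                  # FMULP ST(2), ST(0); FSUBRP ST(1), ST(0)
    return LineParams(m, b, 0)


print(line_params((1.0, 3.0, 3.0, 7.0)))
# LineParams(slope=2.0, intercept=1.0, vert_line=0)
print(line_params((2.0, 0.0, 2.0, 5.0)).vert_line)
# 1
```

The line through (1, 3) and (3, 7) has slope 4/2 = 2 and intercept 3 − 2 × 1 = 1. The second call uses two points with equal x coordinates, which takes the NO_SLOPE path.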

E.10 Multimedia Extension (MMX) Operations

A two-dimensional graphic or video image can be represented by a large array of sampled image points, called pixels. The color and brightness of each point can be encoded into an 8-bit data item. Processing such data has two main characteristics. The first is that manipulations of individual pixels often involve very simple arithmetic or logic operations. The second is that some real-time display applications need very high computational performance. The same characteristics apply to sampled audio signals and speech processing, where a sequence of signed numbers represents samples of a continuous analog signal taken at periodic intervals. In such applications, processing efficiency improves if the individual data items, which are usually bytes or 16-bit words, are packed into small groups whose elements can be processed in parallel. Vector or single-instruction multiple-data (SIMD) instructions for this form of parallel processing are described in Chapter 12. The IA-32 instruction set
includes a number of SIMD instructions, which are called multimedia extension (MMX) instructions. They perform the same operation simultaneously on multiple data elements packed into 64-bit quadwords. The operands for MMX instructions can be in memory or in the eight floating-point registers. Thus, these registers serve a dual purpose: they can hold either floating-point numbers or MMX operands. When used by MMX instructions, the registers are referred to as MM0 through MM7, and only the lowermost 64 bits of each 80-bit register are relevant for MMX operations. Unlike the floating-point instructions in Section E.9, the MMX instructions do not manage this shared register set as a stack. The MOVQ instruction is provided for transferring 64-bit quadword operands between memory and the MMX registers. For example, the instruction

    MOVQ MM0, [EAX]

loads the quadword from the memory location whose address is in register EAX into register MM0. The MOVQ instruction can also be used to transfer data between MMX registers. For example, the instruction

    MOVQ MM3, MM4

transfers the contents of register MM4 to register MM3.

Instructions are provided to perform arithmetic and logic operations in parallel on multiple elements of a packed quadword operand. The source operand can be in memory or in an MMX register, but the destination must be an MMX register. For most MMX instructions, a suffix indicates the size (and hence the number) of the data elements within a packed quadword: B for byte (8 elements), W for word (4 elements), D for doubleword (2 elements), and Q for quadword (1 element). For example, the instruction

    PADDB MM2, [EBX]

adds the eight bytes of the quadword in register MM2 to the corresponding eight bytes of the quadword in the memory location pointed to by register EBX. The eight sums are computed in parallel, and the results are placed in register MM2. Other instructions are provided for subtraction (PSUB), multiplication (PMUL), combined multiplication and addition (PMADD), logic operations (PAND, POR, and PXOR), and a large number of other operations on packed quadword operands.
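You can model packed addition in software by treating a quadword as an integer and adding each element separately. The Python sketch below is illustrative only. The names movq_load, padd, and unpack are hypothetical helpers, not Intel mnemonics or a library API. The sketch relies on two properties of IA-32. First, memory is little-endian, so the byte at the lowest address ends up in bits 7 through 0 of the register. Second, PADDB, PADDW, PADDD, and PADDQ use wraparound arithmetic: any carry out of an element is discarded rather than passed to its neighbor. The saturating forms, such as PADDUSB, are separate instructions and are not modeled here.

```python
# Element width in bits selected by the MMX suffix, as listed in the text.
SUFFIX_BITS = {"B": 8, "W": 16, "D": 32, "Q": 64}


def movq_load(memory: bytes, addr: int) -> int:
    """Model of MOVQ MMn, [addr]: read 8 bytes. IA-32 is little-endian,
    so the byte at addr becomes bits 7..0 of the quadword."""
    return int.from_bytes(memory[addr:addr + 8], "little")


def padd(a: int, b: int, suffix: str) -> int:
    """Model of PADDB/PADDW/PADDD/PADDQ on two 64-bit quadwords.
    Each element is added independently with wraparound: the carry out
    of an element is dropped, so it never spills into the next element."""
    width = SUFFIX_BITS[suffix]
    mask = (1 << width) - 1
    result = 0
    for shift in range(0, 64, width):   # 8, 4, 2, or 1 elements
        s = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (s & mask) << shift
    return result


def unpack(q: int, suffix: str) -> list:
    """Split a quadword into unsigned elements, lowest-addressed first."""
    width = SUFFIX_BITS[suffix]
    mask = (1 << width) - 1
    return [(q >> shift) & mask for shift in range(0, 64, width)]


# Hypothetical data: eight pixel bytes loaded into MM2, eight more at [EBX].
mm2 = movq_load(bytes([10, 20, 30, 40, 50, 60, 70, 250]), 0)
ebx_quad = movq_load(bytes([1, 2, 3, 4, 5, 6, 7, 10]), 0)
mm2 = padd(mm2, ebx_quad, "B")          # PADDB MM2, [EBX]
print(unpack(mm2, "B"))
# [11, 22, 33, 44, 55, 66, 77, 4]
```

The last pair, 250 + 10 = 260, wraps around to 4 because the ninth bit of the sum is discarded. The other seven bytes are unaffected. Using a loop over element positions keeps the correspondence with the B, W, D, and Q suffixes explicit. Real hardware computes all the elements at once.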

E.11 Vector (SIMD) Floating-Point Operations

Section E.9 described instructions for operating on individual floating-point numbers. Vector (SIMD) instructions are also provided to perform operations simultaneously on multiple floating-point numbers. In Intel terminology, these instructions are called streaming SIMD extension (SSE) instructions. They handle packed 128-bit double quadwords, each consisting of four 32-bit floating-point numbers. Eight additional 128-bit registers, XMM0 to XMM7, are available for holding these operands. The MOVAPS and MOVUPS instructions transfer a packed double quadword between memory and the XMM registers, or between XMM registers. The PS suffix indicates packed single-precision floating-point values in the double quadword. The A or U designation
