Assembly Language - SSE and SSE2. Floating Point Registers and x86_64

Virgil Benson
6 years ago

Assembly Language - SSE and SSE2
Introduction to Scalar Floating-Point Operations via SSE

Floating Point Registers and x86_64
Two sets of registers, in addition to the General-Purpose Registers, and three additional groups of instructions.
Floating-Point Registers, ST0-ST7:
- 80-bit, a superset of the IEEE-754 format
- Used by the original floating-point coprocessors: 8087, 80387, i486 FPU, etc.
- Only addressable as a stack, i.e. every operation involves the top of the stack, ST0
- Useful for trig, log functions, etc.


x86_64 Registers: MMX
MMX instructions provide SIMD (Single-Instruction, Multiple-Data) processing
- Integer operations on multiple data values in parallel
MMX registers overlap the Floating-Point Registers
- MMX0-MMX7 are only 64 bits wide
- Each MMX register overlaps the lower 64 bits of an 80-bit ST register
- MMX registers are directly addressable, unlike the ST registers
Difficult to use MMX and floating-point instructions at the same time
- Slow context switches (the EMMS instruction must be executed before returning to x87 floating-point code)

SSE, SSE2, and beyond
"Streaming SIMD Extensions"
SSE offers single-precision floating-point SIMD instructions
- XMM0-XMM15 each hold multiple IEEE-754 operands
- 128 bits wide: four single-precision operands per register
SSE2 adds integer and double-precision floating-point SIMD instructions
- Two double-precision operands per register
Many compilers use XMM registers and SSE/SSE2 instructions instead of the x87 floating-point or MMX instructions

Programming Usage
C function ABI (Application Binary Interface):
- Floating-point arguments ("float", "double") are passed in XMM registers
- When calling a variadic function such as printf(), the AL register passes the count of XMM registers used
- Floating-point return values are in XMM0 (and XMM1 if needed)
Floating-point operations use SSE/SSE2 instructions on XMM registers
- Instead of the more awkward x87 floating-point instructions
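
To make "four single-precision operands per register" concrete, here is a minimal C sketch using the SSE intrinsics from <xmmintrin.h>. It is not from the slides, and the function names add4_floats and add1_float are hypothetical. Each intrinsic typically compiles to the instruction named in its comment, although the compiler is free to pick equivalent encodings. It assumes an x86-64 target.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, addps, addss, movups */

/* Packed (SIMD) form: one addps adds four pairs of floats at once,
   because a 128-bit XMM register holds four 32-bit floats. */
void add4_floats(const float *a, const float *b, float *out)
{
    __m128 va  = _mm_loadu_ps(a);      /* movups: load 4 floats, alignment not required */
    __m128 vb  = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(va, vb);   /* addps: 4 additions in parallel */
    _mm_storeu_ps(out, sum);           /* movups: store 4 floats */
}

/* Scalar form: addss operates only on the lowest float lane. */
float add1_float(float a, float b)
{
    __m128 va = _mm_set_ss(a);         /* a in lane 0, other lanes zeroed */
    __m128 vb = _mm_set_ss(b);
    return _mm_cvtss_f32(_mm_add_ss(va, vb));  /* read back lane 0 */
}
```

The rest of these slides use only the scalar ("ss"/"sd") forms, which treat an XMM register as holding one operand in its low lane.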

First example
Just pass a couple of f.p. (double-type) arguments to a C function: printf()
Note that scanf() doesn't receive, or return, f.p. arguments (it takes only pointers to where the values are stored)

Basic SSE Instructions
movsd: move a double-precision operand to/from an XMM register
addsd, mulsd, divsd, sqrtsd, etc.: operate on scalar double-precision operands in XMM registers
movss: move a single-precision operand to/from an XMM register
addss, mulss, divss, sqrtss, etc.: operate on scalar single-precision operands in XMM registers
Many other instructions
- Also see the F.P. instructions
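
The scalar double-precision instructions above can be chained in the same way a hand-written routine would use them. This is a minimal C sketch with SSE2 intrinsics, not the slide's nasm code. The names hypot_sse2 and format_pair are hypothetical, and snprintf stands in for printf so the output can be checked. It assumes an x86-64 System V target such as Linux.

```c
#include <emmintrin.h>  /* SSE2 intrinsics for scalar double operations */
#include <stdio.h>

/* sqrt(a*a + b*b) using only scalar double-precision SSE2 instructions. */
double hypot_sse2(double a, double b)
{
    __m128d va = _mm_load_sd(&a);      /* movsd xmm, [mem]: low lane = a, high lane = 0 */
    __m128d vb = _mm_load_sd(&b);
    __m128d aa = _mm_mul_sd(va, va);   /* mulsd */
    __m128d bb = _mm_mul_sd(vb, vb);   /* mulsd */
    __m128d s  = _mm_add_sd(aa, bb);   /* addsd */
    __m128d r  = _mm_sqrt_sd(s, s);    /* sqrtsd: square root of the low lane */
    double out;
    _mm_store_sd(&out, r);             /* movsd [mem], xmm */
    return out;
}

/* Passes two doubles to a variadic C function. Under the System V AMD64 ABI
   they travel in XMM0 and XMM1, and the caller sets AL to (an upper bound on)
   the number of XMM registers holding arguments. */
int format_pair(char *buf, size_t n, double x, double y)
{
    return snprintf(buf, n, "%.2f %.2f", x, y);
}
```

For example, hypot_sse2(3.0, 4.0) computes 5.0, and format_pair produces "1.50 2.25" for the arguments 1.5 and 2.25.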

Example 2: 64-bit Division

SSE and Constants
No SSE instructions load immediate constants into XMM registers
- Constants must be created in memory, using assembler (nasm) features
- Load constants from memory locations into XMM registers for use
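
In nasm, a constant such as 2.0 is typically declared in a data section (for example with `dq 2.0`), and the assembler stores its IEEE-754 bit pattern. The C sketch below mirrors that idea. The constant sits in memory, movsd loads it, and divsd divides by it. The names two, half, and bits_of are hypothetical and do not come from the slide.

```c
#include <emmintrin.h>  /* SSE2: movsd, divsd */
#include <stdint.h>
#include <string.h>

/* The constant lives in memory, much as a nasm "dq 2.0" in .data would. */
static const double two = 2.0;

/* Divide x by the in-memory constant: movsd from memory, then divsd. */
double half(double x)
{
    __m128d vx   = _mm_load_sd(&x);
    __m128d vtwo = _mm_load_sd(&two);             /* movsd xmm, [two] */
    return _mm_cvtsd_f64(_mm_div_sd(vx, vtwo));   /* divsd, read low lane */
}

/* Returns the 64-bit IEEE-754 pattern that is stored for a double constant. */
uint64_t bits_of(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);   /* reinterpret the bytes without aliasing issues */
    return u;
}
```

For instance, bits_of(2.0) is 0x4000000000000000 and bits_of(1.0) is 0x3FF0000000000000. These are the bytes the data section ends up holding.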

64-bit example: product
This function calculates the product of an array of double-type numbers. It uses an optimized while loop. It returns 1.0 for an empty array.

64-bit example 2
This code sets up a main() routine to read in a set of doubles. It is easily adjusted for 32-bit floats.
(continued on the next slide)
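
The product() function's listing is not available here, so the following is a rough C stand-in written with SSE2 intrinsics. The signature (pointer plus element count) is an assumption. The "optimized while loop" is modeled as one common optimization, an entry check followed by a bottom-tested loop, but the slide's actual loop may differ. The accumulator starts from a 1.0 constant kept in memory, which is what makes an empty array return 1.0.

```c
#include <emmintrin.h>  /* SSE2: movsd, mulsd */
#include <stddef.h>

static const double one = 1.0;   /* constant in memory, loaded with movsd */

/* Product of n doubles; returns 1.0 when n == 0. */
double product(const double *a, size_t n)
{
    __m128d acc = _mm_load_sd(&one);   /* acc = 1.0 */
    size_t i = 0;
    if (n != 0) {                      /* test once on entry... */
        do {
            acc = _mm_mul_sd(acc, _mm_load_sd(&a[i]));   /* mulsd */
            ++i;
        } while (i < n);               /* ...then only at the bottom of the loop */
    }
    return _mm_cvtsd_f64(acc);         /* a double result is returned in XMM0 */
}
```

Changing the element type to float, along with the "sd" intrinsics to their "ss" counterparts, gives the 32-bit version the slide mentions.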

64-bit example 3
(continued from the previous slide)
The input loop; calling product() and producing the outputs

SSE and Converting Numeric Types
Convert integers to floats/doubles
- cvtsi2ss: convert integer to 32-bit float in an XMM register
- cvtsi2sd: convert integer to 64-bit double in an XMM register
Convert floats/doubles to integers
- cvtss2si: convert 32-bit float to integer, rounded according to the MXCSR rounding mode (round to nearest, ties to even, by default)
- cvttss2si: convert 32-bit float to integer, truncating the result
- cvtsd2si: convert 64-bit double to integer, rounded according to the MXCSR rounding mode
- cvttsd2si: convert 64-bit double to integer, truncating the result
Convert between floats and doubles
- cvtss2sd: convert 32-bit float to 64-bit double
- cvtsd2ss: convert 64-bit double to 32-bit float
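
The difference between the rounding and truncating conversions is easiest to see with a few values. In this minimal C sketch, each hypothetical helper wraps one SSE2 intrinsic that corresponds to one of the instructions listed above. It assumes the default MXCSR rounding mode (round to nearest, ties to even), which is what an ordinary C program starts with.

```c
#include <emmintrin.h>  /* SSE2 conversion intrinsics (includes SSE's xmmintrin.h) */

/* int -> double (cvtsi2sd) */
double int_to_double(int i)
{
    return _mm_cvtsd_f64(_mm_cvtsi32_sd(_mm_setzero_pd(), i));
}

/* double -> int using the MXCSR rounding mode (cvtsd2si) */
int double_to_int_round(double d)
{
    return _mm_cvtsd_si32(_mm_set_sd(d));
}

/* double -> int, truncated toward zero (cvttsd2si) */
int double_to_int_trunc(double d)
{
    return _mm_cvttsd_si32(_mm_set_sd(d));
}

/* float -> double (cvtss2sd) */
double float_to_double(float f)
{
    return _mm_cvtsd_f64(_mm_cvtss_sd(_mm_setzero_pd(), _mm_set_ss(f)));
}

/* double -> float (cvtsd2ss) */
float double_to_float(double d)
{
    return _mm_cvtss_f32(_mm_cvtsd_ss(_mm_setzero_ps(), _mm_set_sd(d)));
}
```

With the default mode, double_to_int_round(2.5) gives 2 and double_to_int_round(3.5) gives 4, because ties go to the even integer. double_to_int_trunc(-2.7) gives -2, because truncation always moves toward zero.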

32-bit example: mean
This function sums up the elements of an array, then divides the sum by the array length. The divisor must be converted from an integer to a float; instruction cvtsi2ss does this.

32-bit example 2
This code sets up a main() routine to read in a set of floats.
(continued on the next slide)
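
Here is a C sketch of mean() in the same spirit, using single-precision SSE intrinsics. The signature (pointer plus int count) is an assumption, and the sketch assumes n > 0 (an empty array would divide 0 by 0 and produce NaN). The key step is the one the slide names: converting the integer length with cvtsi2ss so that divss can use it as the divisor.

```c
#include <xmmintrin.h>  /* SSE: movss, addss, divss, cvtsi2ss */

/* Mean of n floats; n > 0 is assumed. */
float mean(const float *a, int n)
{
    __m128 sum = _mm_setzero_ps();
    for (int i = 0; i < n; ++i)
        sum = _mm_add_ss(sum, _mm_load_ss(&a[i]));      /* movss + addss */
    __m128 count = _mm_cvtsi32_ss(_mm_setzero_ps(), n);  /* cvtsi2ss: int -> float */
    return _mm_cvtss_f32(_mm_div_ss(sum, count));        /* divss; float returned in XMM0 */
}
```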

32-bit example 3
(continued from the previous slide)
The input loop; calling mean(); producing the outputs
The output must be promoted to a double, by cvtss2sd, for printf(): a variadic C function never receives a float argument, only a double.
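
In C, the compiler promotes a float passed to printf() automatically. In assembly, the programmer has to do it. This C sketch makes the promotion explicit with the cvtss2sd intrinsic before handing the value to a printf-family function. The name format_mean and the output format are hypothetical, and snprintf stands in for printf so the result can be checked.

```c
#include <emmintrin.h>  /* SSE2: cvtss2sd (also pulls in SSE's xmmintrin.h) */
#include <stdio.h>

/* Promote a float result to double explicitly, as assembly code must,
   then pass the double to a variadic printf-family function. */
int format_mean(char *buf, size_t n, float m)
{
    __m128d d = _mm_cvtss_sd(_mm_setzero_pd(), _mm_set_ss(m));  /* cvtss2sd */
    return snprintf(buf, n, "mean = %.3f", _mm_cvtsd_f64(d));
}
```

Passing the raw 32-bit float bits in XMM0 without this conversion would make printf() misread the value, since it always reads a 64-bit double for %f.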
