Assembly Language - SSE and SSE2. Introduction to Scalar Floating-Point Operations via SSE

Letitia Rose
6 years ago
1 Assembly Language - SSE and SSE2: Introduction to Scalar Floating-Point Operations via SSE

2 Floating Point Registers and x86_64
Two sets of registers, in addition to the General-Purpose Registers
Three additional groups of instructions use the registers
Floating-Point Registers, ST0-ST7
  - 80-bit registers, a superset of IEEE-754 format
  - used by the original Floating-Point coprocessors: 8087, 80387, i486 FPU, etc.
  - Only addressable as a stack, i.e. all operations are applied to register ST0
Useful for trig, log functions, etc.

3 x86_64 Registers

4 MMX
MMX instructions provide SIMD (Single-Instruction, Multiple-Data) processing
  - Integer operations on multiple data values in parallel
MMX registers overlap the Floating-Point registers
  - MMX registers MM0-MM7 are only 64 bits wide
  - Each register overlaps the lower 64 bits of one of the 80-bit ST registers
  - MMX registers are directly addressable, unlike ST
Difficult to use MMX and Floating-Point at the same time
  - slow context switches

5 SSE, SSE2, and beyond
"Streaming SIMD Extensions"
SSE offers single-precision floating-point SIMD instructions
XMM0-XMM15 registers each hold multiple IEEE-754 operands
  - 128 bits wide, four single-precision operands per register
SSE2 adds integer and double-precision floating-point SIMD instructions
Many compilers use XMM registers and SSE/SSE2 instructions instead of x87 floating-point or MMX

6 Programming Usage
C function ABI (Application Binary Interface):
Floating-point arguments ("float", "double") are passed in XMM registers
  - When calling variadic functions such as printf(), the AL register passes the count of XMM registers used
  - Floating-point return values in XMM0 (and XMM1 if needed)
Floating-point operations use SSE, SSE2 instructions on XMM registers
  - Instead of the more awkward x87 floating-point instructions
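The calling convention above can be seen from the C side. The sketch below is only an illustration: the helper name format_pair is hypothetical, and the register assignments in the comments describe what a compiler targeting the x86-64 System V ABI would typically generate, which the C source itself cannot guarantee.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats two doubles through the variadic snprintf().
 * On an x86-64 System V target the call below would typically be set up as:
 *   rdi = buf, rsi = size, rdx = format string   (integer/pointer arguments)
 *   xmm0 = a, xmm1 = b                           (floating-point arguments)
 *   al   = 2                                     (count of XMM registers used)
 */
int format_pair(char *buf, size_t size, double a, double b)
{
    return snprintf(buf, size, "%.2f %.2f", a, b);
}
```

Calling format_pair(buf, sizeof buf, 1.5, 2.25) fills buf with the text 1.50 2.25. In hand-written assembly the same setup (loading XMM0 and XMM1, then setting AL) has to be done explicitly before the call.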

7 First example
Just pass a couple of floating-point (double-type) arguments to a C function, printf()
Note that the scanf() call neither receives nor returns floating-point arguments, only pointers
The printf() call does, however

8 Basic SSE Instructions
movsd: move a double-precision operand to/from an XMM register
addsd, mulsd, divsd, sqrtsd, etc.
  - operate on scalar, double-precision operands in XMM registers
movss: move a single-precision operand to/from an XMM register
addss, mulss, divss, sqrtss, etc.
  - operate on scalar, single-precision operands in XMM registers
Many other instructions exist; also see the F.P. instructions

9 Example 2 64-bit Division

10 SSE and Constants
No SSE instructions load immediate constants into XMM registers
Constants must be created in memory, using assembler (nasm) features
Load constants from memory locations into registers for use
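A small C sketch shows where such constants come from in practice. The function name celsius_to_fahrenheit is hypothetical and is not one of the slide examples. The comments describe typical x86-64 compiler output and are an assumption about code generation, not something the C standard promises.

```c
/* Hypothetical example. The literals 9.0, 5.0 and 32.0 cannot be encoded
 * as immediates in SSE instructions, so a compiler typically stores them
 * in a read-only data section and reads them from memory, e.g. as the
 * memory operand of mulsd, divsd and addsd. */
double celsius_to_fahrenheit(double c)
{
    return c * 9.0 / 5.0 + 32.0;
}
```

For example, celsius_to_fahrenheit(100.0) evaluates to 212.0. In nasm, the equivalent constants would be declared by hand in a data section (for instance with the dq directive) and then loaded with movsd.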

11 64-bit example product (a) This function calculates the product of an array of double-type numbers. It uses an optimized while loop. It returns 1.0 for an empty array.
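The assembly listing for this slide is not reproduced here, so the following is only a C sketch of the behaviour described, not the author's code. The parameter names are assumptions. A compiler would typically implement the loop body with a single mulsd per element.

```c
#include <stddef.h>

/* Returns the product of count doubles starting at values.
 * The accumulator starts at 1.0, so an empty array yields 1.0. */
double product(const double *values, size_t count)
{
    double result = 1.0;
    while (count > 0) {
        result *= *values++;   /* typically one mulsd per element */
        count--;
    }
    return result;
}
```

With the array {2.0, 3.0, 4.0} and a count of 3 the function returns 24.0; with a count of 0 it returns 1.0 without reading the array.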

12 64-bit product (b) This code sets up a main() routine to read in a set of doubles. Easily adjusted for 32-bit floats. continued on the next slide

13 64-bit product (c) continued from the previous slide The input loop; calling product() and producing the outputs

14 SSE and Converting Numeric Types
Convert integers to floats/doubles
  - cvtsi2ss: convert integer to 32-bit float in XMM register
  - cvtsi2sd: convert integer to 64-bit double in XMM register
Convert floats/doubles to integers
  - cvtss2si: convert 32-bit float to integer, rounding per the current rounding mode (nearest by default)
  - cvttss2si: convert 32-bit float to integer, truncate result
  - cvtsd2si: convert 64-bit double to integer, round
  - cvttsd2si: convert 64-bit double to integer, truncate
Convert between floats and doubles
  - cvtss2sd: convert 32-bit float to 64-bit double
  - cvtsd2ss: convert 64-bit double to 32-bit float
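The C casts below correspond to several of these conversions. The function names are hypothetical, and the instruction named in each comment is what an x86-64 compiler would typically emit, stated here as an assumption. C casts from floating point to integer always truncate, so only the truncating forms (cvttss2si, cvttsd2si) are illustrated. The rounding forms are left out of this sketch.

```c
/* int -> float, typically cvtsi2ss */
float int_to_float(int n)
{
    return (float)n;
}

/* float -> int with truncation toward zero, typically cvttss2si */
int float_to_int_truncated(float x)
{
    return (int)x;
}

/* float -> double, typically cvtss2sd */
double float_to_double(float x)
{
    return (double)x;
}

/* double -> float, typically cvtsd2ss */
float double_to_float(double x)
{
    return (float)x;
}
```

Truncation discards the fractional part in both directions: float_to_int_truncated(2.7f) gives 2 and float_to_int_truncated(-2.7f) gives -2.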

15 32-bit example mean (a) This function sums up the elements of an array, then divides the sum by the array length. The divisor must be converted from an integer to a float. Instruction cvtsi2ss does this.
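The slide's assembly listing is not reproduced here; as a rough C equivalent of the description (an illustrative sketch with assumed parameter names, not the author's code), the integer length is converted to float just before the division, which is where a compiler would typically place cvtsi2ss.

```c
/* Returns the arithmetic mean of count floats starting at values.
 * The caller is assumed to pass count > 0; a count of 0 would divide
 * zero by zero and produce NaN. */
float mean(const float *values, int count)
{
    float sum = 0.0f;
    for (int i = 0; i < count; i++) {
        sum += values[i];          /* typically addss */
    }
    return sum / (float)count;     /* int -> float (cvtsi2ss), then divss */
}
```

For the array {1.0f, 2.0f, 3.0f, 4.0f} with a count of 4, the sum is 10.0f and the result is 2.5f.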

16 32-bit mean (b) This code sets up a main() routine to read in a set of floats. continued on the next slide

17 32-bit mean (c)
continued from the previous slide
The input loop; calling mean(); producing the outputs
The output must be promoted to a double, by cvtss2sd, for printf()
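The promotion requirement comes from C's rules for variadic functions: a float passed through the "..." part of an argument list is always widened to double. The sketch below uses a hypothetical helper, format_mean, and writes the cast explicitly to mark the spot where assembly code needs cvtss2sd; in C the cast is optional because the compiler performs the promotion anyway.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: prints a float result with a %f conversion.
 * %f expects a double, so the float is widened first. In hand-written
 * assembly this widening is an explicit cvtss2sd before the call. */
int format_mean(char *buf, size_t size, float m)
{
    return snprintf(buf, size, "%.3f", (double)m);
}
```

Calling format_mean(buf, sizeof buf, 2.5f) writes the text 2.500 into buf.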
