Enhancing LLVM's Floating-Point Exception and Rounding Mode Support. Andy Kaylor, David Kreitzer, Intel Corporation



2 What's Needed?

- User-controlled rounding mode needs to be respected by the optimizer
- FP exception status flags need to be correctly maintained
  - No exceptions hidden by optimizations
  - No false exceptions introduced by optimizations
- FP instruction side effects (in existing intrinsics) need to be modeled
- Need extra support for masked vector operations
  - Masked-off lanes shouldn't raise exceptions
- Other issues?

3 Proposed Solution

    @… = thread_local global i8, section "llvm.metadata"

    declare double @llvm.constrained.fadd.…(double %lhs, double %rhs,
                                            metadata %rounding_behavior,
                                            metadata %exception_behavior,
                                            i8* %fp_env)

- %rounding_behavior: what passes can assume about rounding (DYNAMIC, TONEAREST, DOWNWARD, UPWARD, …)
- %exception_behavior: what passes can assume about exceptions (IGNORE, RETURN, MAYTRAP)
- %fp_env: opaque reference to the FP environment. Must …

Hat tip to Chandler Carruth.

4 Rounding Behavior

The rounding behavior argument is information for the optimizer. It is not equivalent to the actual runtime rounding mode.

    define … @…(double %a, double %b) {
      %rm = call … @…(…)        ; We can't use this value at compile time.
      %add1 = call … @…(double %a, double %b,
                        metadata !LLVM_ROUND_DYNAMIC,
                        metadata !LLVM_FPEXCEPT_RETURN, …)
      %result = call … @…(…)    ; FE_TOWARDZERO -- Now we know.
      %add2 = call … @…(double %a, double %b,
                        metadata !LLVM_ROUND_TOWARDZERO,
                        metadata !LLVM_FPEXCEPT_RETURN, …)
      …
    }

5 What happens in CodeGen?

The proposal to this point centers on making things work in IR. Do we also need a way to make sure CodeGen behaves?

Possible solutions:

- Defer the lowering of the intrinsic as late as possible.
- Correctly model the registers used by FP operations. For x86 targets, at least, the implicit uses of MXCSR and of the x87 control and status registers are not currently modeled.
- Add attributes describing the rounding and exception behavior.

6 Masked Vector Operations

- We're teaching the vectorizer to create masked vector operations.
- We need to avoid false exceptions when the target hardware does not support masking.

    declare <2 x double> @…(<2 x double> %a, <2 x double> %b,
                            <2 x i1> %mask, <2 x double> %passthru,
                            metadata %rounding_behavior,
                            metadata %exception_behavior,
                            i8* %fp_env)

7 Other concerns?

8 Backup

9 Why?

- #pragma STDC FENV_ACCESS ON
- More pragmas on the way
- -ftrapping-math
- New masked vector operations

10 Implementation Goals

- No changes required to maintain the current handling when the default modes are used.
- Limit the scope of changes needed for conservatively correct behavior.
- Allow a path to optimize constrained FP code.
- Try not to limit potential vectorization.

11 Example: Constant Folding

    define double @f() {
      %div = fdiv double 1.000000e+00, 1.000000e+01
      ret double %div
    }

Sparse Conditional Constant Propagation produces:

    define double @f() {
      ret double 1.000000e-01
    }

Looks right, but the folded constant must be rounded!
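The folding hazard can be reproduced outside LLVM. The sketch below is only an analogy: it uses Python's `decimal` module, which takes an explicit rounding mode, rather than binary floating point or LLVM IR. The helper name `fold_div`, the 5-digit precision, and the operands 1 and 3 are illustrative assumptions (1/10 is exact in decimal, so a quotient that needs rounding is used instead).

```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_HALF_EVEN


def fold_div(num, den, rounding, prec=5):
    """Divide num by den under an explicit rounding mode (hypothetical helper)."""
    ctx = Context(prec=prec, rounding=rounding)
    return ctx.divide(Decimal(num), Decimal(den))


# A constant folder that assumes round-to-nearest bakes in this value...
folded = fold_div(1, 3, ROUND_HALF_EVEN)
# ...while the program may actually be running with upward rounding.
at_runtime = fold_div(1, 3, ROUND_CEILING)

print(folded, at_runtime, folded == at_runtime)
```

The two quotients differ in the last digit, which is the decimal counterpart of folding a division under the wrong rounding assumption.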

12 Example: Rounding (a - 0)

    define double @f(double %a) {
      %sub = fsub double %a, 0.000000e+00
      ret double %sub
    }

Early CSE produces:

    define double @f(double %a) {
      ret double %a
    }

This is incorrect if %a is +0.0 and the rounding mode is FE_DOWNWARD: the subtraction then yields -0.0, not %a.
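The zero-sign effect behind this example can be observed with Python's `decimal` module, which applies the same sign rule for a zero sum when rounding toward negative infinity. This is an analogy in decimal arithmetic, not the binary `fsub` from the slide, and `sub_zero` is a hypothetical helper name.

```python
from decimal import Context, Decimal, ROUND_FLOOR, ROUND_HALF_EVEN


def sub_zero(a, rounding):
    """Compute a - 0 under the given rounding mode (hypothetical helper)."""
    ctx = Context(rounding=rounding)
    return ctx.subtract(a, Decimal("0"))


nearest = sub_zero(Decimal("0"), ROUND_HALF_EVEN)   # result keeps a positive sign
downward = sub_zero(Decimal("0"), ROUND_FLOOR)      # result is a negative zero

print(nearest.is_signed(), downward.is_signed())
```

For any nonzero `a` the two modes agree, so replacing `a - 0` with `a` is wrong only in the one case called out on the slide.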

13 Example: Rounding (-(-a)*b)

    define double @f(double %a, double %b) {
      %sub = fsub double -0.000000e+00, %a
      %mul = fmul double %sub, %b
      %sub1 = fsub double -0.000000e+00, %mul
      ret double %sub1
    }

Combine Redundant Instructions produces:

    define double @f(double %a, double %b) {
      %mul = fmul double %a, %b
      ret double %mul
    }

Optimized code produces a different result for some rounding modes!
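A small numeric case makes the difference concrete. The sketch below again uses Python's `decimal` module as a stand-in for a directed rounding mode. The 2-digit precision, the operands, and the helper names are assumptions chosen so that the product needs rounding, and negation is modeled as a plain sign flip rather than the `fsub` from `-0.0` used on the slide.

```python
from decimal import Context, Decimal, ROUND_FLOOR, ROUND_HALF_EVEN


def neg_neg_mul(a, b, ctx):
    """-((-a) * b): the product is rounded while it is negative."""
    return ctx.multiply(a.copy_negate(), b).copy_negate()


def plain_mul(a, b, ctx):
    """a * b: what the combined form computes."""
    return ctx.multiply(a, b)


a = b = Decimal("1.5")  # the exact product 2.25 does not fit in 2 digits
down = Context(prec=2, rounding=ROUND_FLOOR)
near = Context(prec=2, rounding=ROUND_HALF_EVEN)

print(neg_neg_mul(a, b, down), plain_mul(a, b, down))
print(neg_neg_mul(a, b, near), plain_mul(a, b, near))
```

Under round-to-nearest both forms agree because that mode is symmetric about zero. Rounding toward negative infinity is not, so rounding the negated product and then flipping the sign lands on the other neighbor.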

14 Example: Speculative Execution

    define double @f(i32 %n, double %d) {
      %cmp = icmp sgt i32 %n, 0
      br i1 %cmp, label %if.then, label %if.end
    if.then:
      %add = fadd double 1.000000e+00, %d
      br label %if.end
    if.end:
      %d.0 = phi double [ %add, %if.then ], [ %d, %entry ]
      ret double %d.0
    }

Simplify CFG produces:

    define double @f(i32 %n, double %d) {
      %cmp = icmp sgt i32 %n, 0
      %add = fadd double 1.000000e+00, %d
      %0 = select i1 %cmp, double %add, double %d
      ret double %0
    }

Speculative execution may set exception status flags!
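The same effect can be shown with explicit status flags. In the sketch below, a Python `decimal.Context` stands in for the FP environment and its `Inexact` flag stands in for a hardware status flag. The function names, the 3-digit precision, and the input values are assumptions made for illustration.

```python
from decimal import Context, Decimal, Inexact


def branchy(n, d, ctx):
    """Original form: the add runs only when the branch is taken."""
    if n > 0:
        return ctx.add(Decimal(1), d)
    return d


def speculated(n, d, ctx):
    """Select form: the add runs unconditionally, then a value is chosen."""
    total = ctx.add(Decimal(1), d)
    return total if n > 0 else d


d = Decimal("0.0001")      # 1 + d cannot be represented in 3 digits
ctx_a = Context(prec=3)    # each context starts with all flags clear
ctx_b = Context(prec=3)
r_a = branchy(0, d, ctx_a)
r_b = speculated(0, d, ctx_b)

print(r_a == r_b, ctx_a.flags[Inexact], ctx_b.flags[Inexact])
```

Both forms return the same value, but only the speculated one leaves the inexact flag set, which a later flag test would observe.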

15 Example: Side Effects

    define <4 x float> @…(<4 x float> %v) {
      %tmp = alloca i32, align 4
      %tmp1 = alloca i32, align 4
      %0 = bitcast i32* %tmp to i8*
      call … @…(i8* %0)
      %stmxcsr = load i32, i32* %tmp, align 4
      %or = or i32 %stmxcsr, …
      store i32 %or, i32* %tmp1, align 4
      %1 = bitcast i32* %tmp1 to i8*
      call … @…(i8* %1)
      %floorint = call <4 x i32> @…(<4 x float> %0) readnone
      %result = sitofp <4 x i32> %floorint to <4 x float>
      call … @…(i8* %0)
      ret <4 x float> %result
    }

The instructions in bold all implicitly use MXCSR!

16 The FP Environment

The FP environment is kind of an abstract idea. On Intel64 targets, for instance, it consists of:

- The x87 FPU status register
- The x87 FPU control register
- The MXCSR register

How should we be modeling its use? Implicit state, SSA value, or intrinsic global?

If an explicit value is used, it cannot be used outside the new intrinsics.

17 Example: Implicit State

    define double @…(double %a, double %b, double %c) {
      %add1 = call double @…(double %a, double %b,
                             metadata !LLVM_ROUND_DYNAMIC,
                             metadata !LLVM_FEEXCEPT_RETURN)
      call … @…(double %add1)
      %add2 = call double @…(double %a, double %b,
                             metadata !LLVM_ROUND_DYNAMIC,
                             metadata !LLVM_FEEXCEPT_RETURN)
      %add3 = call double @…(double %add2, double %c,
                             metadata !LLVM_ROUND_DYNAMIC,
                             metadata !LLVM_FEEXCEPT_RETURN)
      ret double %add3
    }

Problem: The intrinsic properties must be very restrictive.

18 Example: SSA Value

    define double @…(double %a, double %b, double %c) {
      %fenv = call token @…(…)
      %add1.ret = call { double, token } @…(double %a, double %b,
                                            metadata !LLVM_ROUND_DYNAMIC,
                                            metadata !LLVM_FEEXCEPT_RETURN,
                                            token %fenv)
      %fenv.1 = extractvalue { double, token } %add1.ret, 1
      call … @…(token %fenv.1)
      %add1 = extractvalue { double, token } %add1.ret, 0
      call … @…(double %add1) [ fp_env (token %fenv.2) ]
      %fenv.2 = call token @…(…)
      %add2.ret = call { double, token } @…(double %a, double %b,
                                            metadata !LLVM_ROUND_DYNAMIC,
                                            metadata !LLVM_FEEXCEPT_RETURN,
                                            token %fenv.2)
      %fenv.3 = extractvalue { double, token } %add2.ret, 1
      %add2 = extractvalue { double, token } %add2.ret, 0
      %add3.ret = call { double, token } @…(double %add2, double %c,
                                            metadata !LLVM_ROUND_DYNAMIC,
                                            metadata !LLVM_FEEXCEPT_RETURN,
                                            token %fenv.3)
      %fenv.4 = extractvalue { double, token } %add3.ret, 1
      call … @…(token %fenv.4)
      %add3 = extractvalue { double, token } %add3.ret, 0
      ret double %add3
    }

Problem: We aren't allowed to use tokens this way.

19 Example: Intrinsic

    @… = thread_local global i8, section "llvm.metadata"

    define double @…(double %a, double %b, double %c) {
      %add1 = call double @…(double %a, double %b,
                             metadata !LLVM_ROUND_DYNAMIC,
                             metadata !LLVM_FEEXCEPT_RETURN,
                             i8* @…)
      call … @…(…)
      %add2 = call double @…(double %a, double %b,
                             metadata !LLVM_ROUND_DYNAMIC,
                             metadata !LLVM_FEEXCEPT_RETURN,
                             i8* @…)
      %add3 = call double @…(double %add2, double %c,
                             metadata !LLVM_ROUND_DYNAMIC,
                             metadata !LLVM_FEEXCEPT_RETURN,
                             i8* @…)
      ret double %add3
    }

20 Bad Masked Vector Lowering (hypothetical)

    %div.v = call <8 x double> @…(<8 x double> %a, <8 x double> %b,
                                  <8 x i1> %mask, <8 x double> %passthru,
                                  i32 1, i32 -1, …)

AVX-512F:

    %vreg4 = VDIVPDZrrk %vreg0, %vreg3, %vreg1, %vreg2

AVX (bad code):

    %vreg4 = V_SET0
    %vreg5 = VPCMPEQDrr %vreg3, %vreg4
    %vreg6 = V_SETALLONES
    %vreg7 = VPXORrr %vreg5, %vreg6
    %vreg8 = VPMOVSXDQYrr %vreg7
    %vreg9 = VDIVPDYrr %vreg1, %vreg2
    %vreg10 = VBLENDVPDYrr %vreg0, %vreg9, %vreg8

This may raise false divide-by-zero exceptions!
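The difference between a true masked divide and a divide-then-blend lowering can be sketched in a few lines. The code below models vector lanes as Python lists and uses a `decimal.Context` with traps disabled as a stand-in for the status flags. The function names and lane values are assumptions for illustration, not the actual lowering code.

```python
from decimal import Context, Decimal, DivisionByZero


def div_active_lanes(a, b, mask, passthru, ctx):
    """Masked divide: inactive lanes are never computed."""
    return [ctx.divide(x, y) if m else p
            for x, y, m, p in zip(a, b, mask, passthru)]


def div_then_blend(a, b, mask, passthru, ctx):
    """Unmasked divide over every lane, followed by a blend with passthru."""
    full = [ctx.divide(x, y) for x, y in zip(a, b)]
    return [q if m else p for q, m, p in zip(full, mask, passthru)]


a = [Decimal(1), Decimal(2)]
b = [Decimal(4), Decimal(0)]      # the second lane would divide by zero
mask = [True, False]              # ...but that lane is masked off
passthru = [Decimal(9), Decimal(9)]

ctx_masked = Context(traps=[])    # traps off: exceptions only set flags
ctx_blend = Context(traps=[])
r_masked = div_active_lanes(a, b, mask, passthru, ctx_masked)
r_blend = div_then_blend(a, b, mask, passthru, ctx_blend)

print(r_masked == r_blend,
      ctx_masked.flags[DivisionByZero], ctx_blend.flags[DivisionByZero])
```

The lane results match, yet the blend version records a divide-by-zero that the masked semantics say never happened.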

21 Command Line Options

clang currently recognizes the following command line options:

- -ffast-math : Allow aggressive, lossy floating-point optimizations
- -ffinite-math-only : Assume no NaNs or infinities are generated
- -ffloat-store : Don't allocate floats and doubles in extended precision registers
- -frounding-math : Disable optimizations that assume default FP rounding behavior
- -fsignaling-nans : Disable optimizations observable by IEEE signaling NaNs
- -fsigned-zeros : Disable floating point optimizations that ignore the IEEE signedness of zero
- -fsingle-precision-constants : Convert floating point constants to single precision constants
- -ftrapping-math : Assume floating point operations can trap
- -funsafe-math-optimizations : Allow math optimizations that may violate IEEE or ISO standards

Some of these are ignored. Others are implemented using the fast math flags.

22 FP-related pragmas

C99

- FENV_ACCESS (on/off)
- FP_CONTRACT (on/off)
- CX_LIMITED_RANGE (on/off)

ISO/IEC TS :2014

- FE_ROUND (dynamic/direction)

ISO/IEC TS :2015

- FE_DEC_ROUND (dynamic/direction)

ISO/IEC TS :2016

- FENV_FLT_EVAL_METHOD (width)
- FENV_DEC_EVAL_METHOD (width)
- FENV_ALLOW_VALUE_CHANGING_OPTIMIZATION (on/off)
- FENV_ALLOW_ASSOCIATIVE_LAW (on/off)
- FENV_ALLOW_DISTRIBUTIVE_LAW (on/off)
- FENV_ALLOW_MULTIPLY_BY_RECIPROCAL (on/off)
- FENV_ALLOW_ZERO_SUBNORMAL (on/off)
- FENV_ALLOW_CONTRACT_FMA (on/off)
- FENV_ALLOW_CONTRACT_OPERATION_CONVERSION (on/off)
- FENV_ALLOW_CONTRACT (on/off)
- FENV_REPRODUCIBLE (on/off)
- FENV_EXCEPT (action except-list)

23 <fenv.h> Functions

    int feclearexcept(int except);
    int fegetexceptflag(fexcept_t *pflag, int except);
    int feraiseexcept(int except);
    int fesetexceptflag(const fexcept_t *pflag, int except);
    int fetestexcept(int except);
    int fegetround(void);
    int fesetround(int mode);
    int fegetenv(fenv_t *penv);
    int feholdexcept(fenv_t *penv);
    int fesetenv(const fenv_t *penv);
    int feupdateenv(const fenv_t *penv);
