Efficient implementation of elementary functions in the medium-precision range
Efficient implementation of elementary functions in the medium-precision range
Fredrik Johansson (LFANT, INRIA Bordeaux)
22nd IEEE Symposium on Computer Arithmetic (ARITH 22), Lyon, France, June 2015
Elementary functions

Functions: exp, log, sin, cos, atan

My goal is to do interval arithmetic with arbitrary precision.

Input: a floating-point number x = a * 2^b and a precision p >= 2
Output: m, r with f(x) ∈ [m - r, m + r] and r ≈ 2^-p |f(x)|
Precision ranges

Hardware precision (n <= 53 bits)
Extensively studied - elsewhere!

Medium precision (n up to a few thousand bits)
Multiplication costs M(n) = O(n^2) or O(n^1.6)
Argument reduction + rectangular splitting: O(n^(1/3) M(n))
In the lower range, software overhead is significant

Very high precision (n beyond a few thousand bits)
Multiplication costs M(n) = O(n log n log log n)
Asymptotically fast algorithms: binary splitting, arithmetic-geometric mean (AGM) iteration: O(M(n) log n)
Improvements in this work

1. The low-level mpn layer of GMP is used exclusively
Most libraries (e.g. MPFR) use more convenient types, e.g. mpz and mpfr, to implement elementary functions

2. Argument reduction based on lookup tables
An old idea, not well explored in high precision

3. Faster evaluation of Taylor series
Optimized version of Smith's rectangular splitting algorithm
Takes advantage of mpn-level functions
Recipe for elementary functions

exp(x), sin(x), cos(x), log(1+x), atan(x)

Domain reduction using π and log(2):
exp: x ∈ [0, log(2));  sin, cos: x ∈ [0, π/4);  log, atan: x ∈ [0, 1)

Argument-halving, r ≈ 8 times:
exp(x) = [exp(x/2)]^2
log(1 + x) = 2 log(sqrt(1 + x))
leaves x ∈ [0, 2^-r)

Taylor series
Better recipe at medium precision

exp(x), sin(x), cos(x), log(1+x), atan(x)

Domain reduction using π and log(2):
exp: x ∈ [0, log(2));  sin, cos: x ∈ [0, π/4);  log, atan: x ∈ [0, 1)

Lookup table with 2^r ≈ 2^8 entries:
exp(t + x) = exp(t) exp(x)
log(1 + t + x) = log(1 + t) + log(1 + x/(1 + t))
leaves x ∈ [0, 2^-r)

Taylor series
Argument reduction formulas

What we want to compute: f(x), x ∈ [0, 1)
Table size: q = 2^r
Precomputed value: f(t), t = i/q, i = floor(2^r x)
Remaining value to compute: f(y), y ∈ [0, 2^-r)

exp(x) = exp(t) exp(y),                  y = x - i/q
sin(x) = sin(t) cos(y) + cos(t) sin(y),  y = x - i/q
cos(x) = cos(t) cos(y) - sin(t) sin(y),  y = x - i/q

log(1 + x) = log(1 + t) + log(1 + y),    y = (qx - i)/(i + q)
atan(x) = atan(t) + atan(y),             y = (qx - i)/(ix + q)
Optimizing lookup tables

m = 2 tables with 2^5 entries each gives the same reduction as m = 1 table with 2^10 entries

[Table of lookup-table configurations per function and precision range, with columns m, r, number of entries, and size in KiB; the numeric values, including the totals, were lost in transcription.]
Taylor series

Logarithmic series:
atan(x) = x - x^3/3 + x^5/5 - x^7/7 + ...
log(1 + x) = 2 atanh(x/(x + 2))
With x < 2^-10, need 230 terms for 4600-bit precision

Exponential series:
exp(x) = 1 + x + x^2/2! + x^3/3! + x^4/4! + ...
sin(x) = x - x^3/3! + x^5/5! - ...,  cos(x) = 1 - x^2/2! + x^4/4! - ...
With x < 2^-10, need 280 terms for 4600-bit precision

Above 300 bits: cos(x) = sqrt(1 - sin^2(x))
Above 800 bits: exp(x) = sinh(x) + sqrt(1 + sinh^2(x))
Evaluating Taylor series using rectangular splitting

Paterson and Stockmeyer, 1973: evaluate sum_{i=0}^{n} x^i in O(n) cheap steps + O(n^(1/2)) expensive steps

(1 + x + x^2 + x^3)
+ (1 + x + x^2 + x^3) x^4
+ (1 + x + x^2 + x^3) x^8
+ (1 + x + x^2 + x^3) x^12

Smith, 1989: elementary and hypergeometric functions
Brent & Zimmermann, 2010: improvements to Smith
FJ, 2014: generalization to D-finite functions
New: optimized algorithm for elementary functions
Logarithmic series

Rectangular splitting:
x - (1/2)x^2 + x^3{(1/3) - (1/4)x + (1/5)x^2 + x^3{(1/6) - (1/7)x + (1/8)x^2}}

Improved algorithm with fewer divisions:
x - (1/60)[30x^2 - x^3{20 - 15x + 12x^2 - x^3{10 - (60/56)[8x - 7x^2]}}]
Exponential series

Rectangular splitting:
1 + x + (1/2)[x^2 + (1/3)x^3{1 + (1/4)[x + (1/5)[x^2 + (1/6)x^3{1 + (1/7)[x + (1/8)x^2]}]]}]

Improved algorithm with fewer divisions:
1 + x + (1/24)[12x^2 + x^3{4 + x + (1/30)[6x^2 + x^3{1 + (1/56)[8x + x^2]}]}]
Coefficients for exp series (on a 64-bit machine)

Numerators 0-20, denominator 20!/0!
Numerators 21-33, denominator 33!/20!
...
Numerators 288-294, denominator 294!/287!

Each group of terms shares one denominator that still fits in a single 64-bit word.
Taylor series evaluation using mpn arithmetic

We use n-word fixed-point numbers (ulp = 2^-64n)
Negative numbers implicitly or using two's complement!

Example:
// sum = sum + term * coeff
sum[n] += mpn_addmul_1(sum, term, n, coeff)

term is n words: real number in [0, 1)
sum is n + 1 words: real number in [0, 2^64)
coeff is 1 word: integer in [0, 2^64)
Taylor series summation

c_0 + c_1 x + c_2 x^2 + c_3 x^3 + x^4 [c_4 + c_5 x + c_6 x^2 + c_7 x^3]

sum[n] += mpn_addmul_1(sum, xpowers[3], n, c[7])
sum[n] += mpn_addmul_1(sum, xpowers[2], n, c[6])
sum[n] += mpn_addmul_1(sum, xpowers[1], n, c[5])
sum[n] += c[4]

mpn_mul(tmp, sum, n+1, xpowers[4], n)
mpn_copyi(sum, tmp+n, n+1)

sum[n] += mpn_addmul_1(sum, xpowers[3], n, c[3])
sum[n] += mpn_addmul_1(sum, xpowers[2], n, c[2])
sum[n] += mpn_addmul_1(sum, xpowers[1], n, c[1])
sum[n] += c[0]
Alternating signs

c_0 - c_1 x + c_2 x^2 - c_3 x^3 + x^4 [c_4 - c_5 x + c_6 x^2 - c_7 x^3]

sum[n] -= mpn_submul_1(sum, xpowers[3], n, c[7])
sum[n] += mpn_addmul_1(sum, xpowers[2], n, c[6])
sum[n] -= mpn_submul_1(sum, xpowers[1], n, c[5])
sum[n] += c[4]

mpn_mul(tmp, sum, n+1, xpowers[4], n)
mpn_copyi(sum, tmp+n, n+1)

sum[n] -= mpn_submul_1(sum, xpowers[3], n, c[3])
sum[n] += mpn_addmul_1(sum, xpowers[2], n, c[2])
sum[n] -= mpn_submul_1(sum, xpowers[1], n, c[1])
sum[n] += c[0]
Including divisions (exponential series)

(1/q_0)[c_0 + c_1 x + c_2 x^2 + c_3 x^3 + (1/q_4)[c_4 x^4 + c_5 x^5]]

sum[n] += mpn_addmul_1(sum, xpowers[5], n, c[5])
sum[n] += mpn_addmul_1(sum, xpowers[4], n, c[4])

mpn_divrem_1(sum, 0, sum, n+1, q[4])

sum[n] += mpn_addmul_1(sum, xpowers[3], n, c[3])
sum[n] += mpn_addmul_1(sum, xpowers[2], n, c[2])
sum[n] += mpn_addmul_1(sum, xpowers[1], n, c[1])
sum[n] += c[0]

mpn_divrem_1(sum, 0, sum, n+1, q[0])
Including divisions (logarithmic series)

(1/q_0)[c_0 + c_1 x + c_2 x^2 + c_3 x^3] + (1/q_4)[c_4 x^4 + c_5 x^5]
Rewritten with a single outer division:

(1/q_0)[c_0 + c_1 x + c_2 x^2 + c_3 x^3 + (q_0/q_4)[c_4 x^4 + c_5 x^5]]

sum[n] += mpn_addmul_1(sum, xpowers[5], n, c[5])
sum[n] += mpn_addmul_1(sum, xpowers[4], n, c[4])

sum[n+1] = mpn_mul_1(sum, sum, n+1, q[0])
mpn_divrem_1(sum, 0, sum, n+2, q[4])

sum[n] += mpn_addmul_1(sum, xpowers[3], n, c[3])
sum[n] += mpn_addmul_1(sum, xpowers[2], n, c[2])
sum[n] += mpn_addmul_1(sum, xpowers[1], n, c[1])
sum[n] += c[0]

mpn_divrem_1(sum, 0, sum, n+1, q[0])
Error bounds and correctness

The algorithm evaluates each truncated Taylor series with at most 2 fixed-point ulp error

How does that work?
mpn_addmul_1-type operations are exact
Clearing denominators gives up to 64 guard bits
Evaluation is done backwards: multiplications and divisions remove error

Must also show: no overflow, correct handling of signs

The proof depends on the precise sequence of numerators and denominators used.
Proof by exhaustive side computation!
Benchmarks

The code is part of the Arb library
Open source (GPL version 2 or later)
Timings (microseconds / function evaluation)

[Table with columns Bits, exp, sin, cos, log, atan; the numeric values were lost in transcription.]

Measurements done on an Intel i7-2600S CPU.
Speedup vs MPFR

[Table with columns Bits, exp, sin, cos, log, atan; the numeric values were lost in transcription.]

Measurements done on an Intel i7-2600S CPU.
Comparison to MPFR

[Plots: time in seconds per evaluation (MPFR dashed) and speedup of Arb vs MPFR, for exp(x), sin(x), cos(x), log(x), atan(x), as functions of precision in bits.]

Measurements done on an Intel i7-2600S CPU.
Double (53-bit) precision, microseconds/eval, Intel i7-2600S:
[Table with columns exp, sin, cos, log, atan; rows EGLIBC, MPFR, Arb; values lost in transcription.]

Quad (113-bit) precision, microseconds/eval, Intel T4400:
[Rows MPFR, libquadmath, QD, Arb; values lost in transcription.]

Quad-double (212-bit) precision, microseconds/eval, Intel T4400:
[Rows MPFR, QD, Arb; values lost in transcription.]
Summary

Elementary functions with error bounds
Variable precision up to 4600 bits

mpn arithmetic + lookup tables + an efficient algorithm to evaluate Taylor series (rectangular splitting, optimized denominator sequence)
Similar algorithm for all functions (no Newton iteration, etc.)

Improvement over MPFR: up to 3-4x for cos, 8-10x for sin/exp/log, 30x for atan
Gap to double-precision libm (EGLIBC): 4-7x
Future work

Optimizations:
Gradually change precision in Taylor series summation
Use short multiplications (no GMP support)
Use precomputed inverses (no GMP support)
Assembly for low precision (1-3 words?)
Further tuning of lookup tables

What if floating-point expansions or carry-save bignums are used instead of GMP/64-bit full words? Vectorization?

Formally verified implementation?

Port to other libraries (e.g. MPFR)
Thank you!
arXiv:1410.7176v2 [cs.ms], 9 Jun 2015
Algorithms 1 algorithm: a finite set of instructions that specify a sequence of operations to be carried out in order to solve a specific problem or class of problems An algorithm must possess the following
More informationIntroduction to Computer Programming with MATLAB Calculation and Programming Errors. Selis Önel, PhD
Introduction to Computer Programming with MATLAB Calculation and Programming Errors Selis Önel, PhD Today you will learn Numbers, Significant figures Error analysis Absolute error Relative error Chopping
More informationSecond Order Function Approximation Using a Single Multiplication on FPGAs
Second Order Function Approximation Using a Single Multiplication on FPGAs Jérémie Detrey and Florent de Dinechin Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon 46, allée
More informationMachine Computation of the Sine Function
Machine Computation of the Sine Function Chuck Allison CNS 3320 Numerical Software Engineering Utah Valley State College February 2007 Abstract This note shows how to determine the number of terms to use
More informationPerformance Evaluation of a Novel Direct Table Lookup Method and Architecture With Application to 16-bit Integer Functions
Performance Evaluation of a Novel Direct Table Lookup Method and Architecture With Application to 16-bit nteger Functions L. Li, Alex Fit-Florea, M. A. Thornton, D. W. Matula Southern Methodist University,
More informationFloating-point representation
Lecture 3-4: Floating-point representation and arithmetic Floating-point representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However,
More informationLECTURE 0: Introduction and Background
1 LECTURE 0: Introduction and Background September 10, 2012 1 Computational science The role of computational science has become increasingly significant during the last few decades. It has become the
More information2-3 Graphing Rational Functions
2-3 Graphing Rational Functions Factor What are the end behaviors of the Graph? Sketch a graph How to identify the intercepts, asymptotes and end behavior of a rational function. How to sketch the graph
More informationMATLAB QUICK START TUTORIAL
MATLAB QUICK START TUTORIAL This tutorial is a brief introduction to MATLAB which is considered one of the most powerful languages of technical computing. In the following sections, the basic knowledge