Efficient implementation of elementary functions in the medium-precision range
Efficient implementation of elementary functions in the medium-precision range
Fredrik Johansson (LFANT, INRIA Bordeaux)
22nd IEEE Symposium on Computer Arithmetic (ARITH 22), Lyon, France, June 2015
Elementary functions

Functions: exp, log, sin, cos, atan

My goal is to do interval arithmetic with arbitrary precision.

Input: a floating-point number x = a * 2^b and a precision p >= 2
Output: m, r with f(x) ∈ [m - r, m + r] and r ≈ 2^-p |f(x)|
Precision ranges

Hardware precision (n <= 53 bits)
Extensively studied - elsewhere!

Medium precision (n up to a few thousand bits)
Multiplication costs M(n) = O(n^2) or O(n^1.6)
Argument reduction + rectangular splitting: O(n^(1/3) M(n))
In the lower range, software overhead is significant

Very high precision (n beyond a few thousand bits)
Multiplication costs M(n) = O(n log n log log n)
Asymptotically fast algorithms: binary splitting, arithmetic-geometric mean (AGM) iteration: O(M(n) log n)
Improvements in this work

1. The low-level mpn layer of GMP is used exclusively
Most libraries (e.g. MPFR) use more convenient types, e.g. mpz and mpfr, to implement elementary functions

2. Argument reduction based on lookup tables
An old idea, not well explored in high precision

3. Faster evaluation of Taylor series
Optimized version of Smith's rectangular splitting algorithm
Takes advantage of mpn-level functions
Recipe for elementary functions

exp(x), sin(x), cos(x), log(1+x), atan(x)

Domain reduction using π and log(2):
exp: x ∈ [0, log(2));  sin, cos: x ∈ [0, π/4);  log, atan: x ∈ [0, 1)

Argument-halving, r ≈ 8 times:
exp(x) = [exp(x/2)]^2
log(1 + x) = 2 log(sqrt(1 + x))
leaves x ∈ [0, 2^-r)

Taylor series
Better recipe at medium precision

exp(x), sin(x), cos(x), log(1+x), atan(x)

Domain reduction using π and log(2):
exp: x ∈ [0, log(2));  sin, cos: x ∈ [0, π/4);  log, atan: x ∈ [0, 1)

Lookup table with 2^r ≈ 2^8 entries:
exp(t + x) = exp(t) exp(x)
log(1 + t + x) = log(1 + t) + log(1 + x/(1 + t))
leaves x ∈ [0, 2^-r)

Taylor series
Argument reduction formulas

What we want to compute: f(x), x ∈ [0, 1)
Table size: q = 2^r
Precomputed value: f(t), t = i/q, i = floor(2^r x)
Remaining value to compute: f(y), y ∈ [0, 2^-r)

exp(x) = exp(t) exp(y),                  y = x - i/q
sin(x) = sin(t) cos(y) + cos(t) sin(y),  y = x - i/q
cos(x) = cos(t) cos(y) - sin(t) sin(y),  y = x - i/q

log(1 + x) = log(1 + t) + log(1 + y),    y = (qx - i)/(i + q)
atan(x) = atan(t) + atan(y),             y = (qx - i)/(ix + q)
Optimizing lookup tables

m = 2 tables with 2^5 entries each gives the same reduction as m = 1 table with 2^10 entries

[Table of lookup-table configurations per function and precision range, with columns m, r, number of entries, and size in KiB; the numeric values, including the totals, were lost in transcription.]
Taylor series

Logarithmic series:
atan(x) = x - x^3/3 + x^5/5 - x^7/7 + ...
log(1 + x) = 2 atanh(x/(x + 2))
With x < 2^-10, need 230 terms for 4600-bit precision

Exponential series:
exp(x) = 1 + x + x^2/2! + x^3/3! + x^4/4! + ...
sin(x) = x - x^3/3! + x^5/5! - ...,  cos(x) = 1 - x^2/2! + x^4/4! - ...
With x < 2^-10, need 280 terms for 4600-bit precision

Above 300 bits: cos(x) = sqrt(1 - sin^2(x))
Above 800 bits: exp(x) = sinh(x) + sqrt(1 + sinh^2(x))
Evaluating Taylor series using rectangular splitting

Paterson and Stockmeyer, 1973: evaluate sum_{i=0}^{n} x^i in O(n) cheap steps + O(n^(1/2)) expensive steps

(1 + x + x^2 + x^3)
+ (1 + x + x^2 + x^3) x^4
+ (1 + x + x^2 + x^3) x^8
+ (1 + x + x^2 + x^3) x^12

Smith, 1989: elementary and hypergeometric functions
Brent & Zimmermann, 2010: improvements to Smith
FJ, 2014: generalization to D-finite functions
New: optimized algorithm for elementary functions
Logarithmic series

Rectangular splitting:
x - (1/2)x^2 + x^3{(1/3) - (1/4)x + (1/5)x^2 + x^3{(1/6) - (1/7)x + (1/8)x^2}}

Improved algorithm with fewer divisions:
x - (1/60)[30x^2 - x^3{20 - 15x + 12x^2 - x^3{10 - (60/56)[8x - 7x^2]}}]
Exponential series

Rectangular splitting:
1 + x + (1/2)[x^2 + (1/3)x^3{1 + (1/4)[x + (1/5)[x^2 + (1/6)x^3{1 + (1/7)[x + (1/8)x^2]}]]}]

Improved algorithm with fewer divisions:
1 + x + (1/24)[12x^2 + x^3{4 + x + (1/30)[6x^2 + x^3{1 + (1/56)[8x + x^2]}]}]
Coefficients for exp series (on a 64-bit machine)

Numerators 0-20, denominator 20!/0!
Numerators 21-33, denominator 33!/20!
...
Numerators 288-294, denominator 294!/287!

Each group of terms shares one denominator that still fits in a single 64-bit word.
Taylor series evaluation using mpn arithmetic

We use n-word fixed-point numbers (ulp = 2^-64n)
Negative numbers implicitly or using two's complement!

Example:
// sum = sum + term * coeff
sum[n] += mpn_addmul_1(sum, term, n, coeff)

term is n words: real number in [0, 1)
sum is n + 1 words: real number in [0, 2^64)
coeff is 1 word: integer in [0, 2^64)
Taylor series summation

c_0 + c_1 x + c_2 x^2 + c_3 x^3 + x^4 [c_4 + c_5 x + c_6 x^2 + c_7 x^3]

sum[n] += mpn_addmul_1(sum, xpowers[3], n, c[7])
sum[n] += mpn_addmul_1(sum, xpowers[2], n, c[6])
sum[n] += mpn_addmul_1(sum, xpowers[1], n, c[5])
sum[n] += c[4]

mpn_mul(tmp, sum, n+1, xpowers[4], n)
mpn_copyi(sum, tmp+n, n+1)

sum[n] += mpn_addmul_1(sum, xpowers[3], n, c[3])
sum[n] += mpn_addmul_1(sum, xpowers[2], n, c[2])
sum[n] += mpn_addmul_1(sum, xpowers[1], n, c[1])
sum[n] += c[0]
Alternating signs

c_0 - c_1 x + c_2 x^2 - c_3 x^3 + x^4 [c_4 - c_5 x + c_6 x^2 - c_7 x^3]

sum[n] -= mpn_submul_1(sum, xpowers[3], n, c[7])
sum[n] += mpn_addmul_1(sum, xpowers[2], n, c[6])
sum[n] -= mpn_submul_1(sum, xpowers[1], n, c[5])
sum[n] += c[4]

mpn_mul(tmp, sum, n+1, xpowers[4], n)
mpn_copyi(sum, tmp+n, n+1)

sum[n] -= mpn_submul_1(sum, xpowers[3], n, c[3])
sum[n] += mpn_addmul_1(sum, xpowers[2], n, c[2])
sum[n] -= mpn_submul_1(sum, xpowers[1], n, c[1])
sum[n] += c[0]
Including divisions (exponential series)

(1/q_0)[c_0 + c_1 x + c_2 x^2 + c_3 x^3 + (1/q_4)[c_4 x^4 + c_5 x^5]]

sum[n] += mpn_addmul_1(sum, xpowers[5], n, c[5])
sum[n] += mpn_addmul_1(sum, xpowers[4], n, c[4])

mpn_divrem_1(sum, 0, sum, n+1, q[4])

sum[n] += mpn_addmul_1(sum, xpowers[3], n, c[3])
sum[n] += mpn_addmul_1(sum, xpowers[2], n, c[2])
sum[n] += mpn_addmul_1(sum, xpowers[1], n, c[1])
sum[n] += c[0]

mpn_divrem_1(sum, 0, sum, n+1, q[0])
Including divisions (logarithmic series)

(1/q_0)[c_0 + c_1 x + c_2 x^2 + c_3 x^3] + (1/q_4)[c_4 x^4 + c_5 x^5]
Rewritten with a single outer division:

(1/q_0)[c_0 + c_1 x + c_2 x^2 + c_3 x^3 + (q_0/q_4)[c_4 x^4 + c_5 x^5]]

sum[n] += mpn_addmul_1(sum, xpowers[5], n, c[5])
sum[n] += mpn_addmul_1(sum, xpowers[4], n, c[4])

sum[n+1] = mpn_mul_1(sum, sum, n+1, q[0])
mpn_divrem_1(sum, 0, sum, n+2, q[4])

sum[n] += mpn_addmul_1(sum, xpowers[3], n, c[3])
sum[n] += mpn_addmul_1(sum, xpowers[2], n, c[2])
sum[n] += mpn_addmul_1(sum, xpowers[1], n, c[1])
sum[n] += c[0]

mpn_divrem_1(sum, 0, sum, n+1, q[0])
Error bounds and correctness

The algorithm evaluates each truncated Taylor series with at most 2 fixed-point ulp error

How does that work?
mpn_addmul_1-type operations are exact
Clearing denominators gives up to 64 guard bits
Evaluation is done backwards: multiplications and divisions remove error

Must also show: no overflow, correct handling of signs

The proof depends on the precise sequence of numerators and denominators used.
Proof by exhaustive side computation!
Benchmarks

The code is part of the Arb library
Open source (GPL version 2 or later)
Timings (microseconds / function evaluation)

[Table with columns Bits, exp, sin, cos, log, atan; the numeric values were lost in transcription.]

Measurements done on an Intel i7-2600S CPU.
Speedup vs MPFR

[Table with columns Bits, exp, sin, cos, log, atan; the numeric values were lost in transcription.]

Measurements done on an Intel i7-2600S CPU.
Comparison to MPFR

[Plots: time in seconds per evaluation (MPFR dashed) and speedup of Arb vs MPFR, for exp(x), sin(x), cos(x), log(x), atan(x), as functions of precision in bits.]

Measurements done on an Intel i7-2600S CPU.
Double (53-bit) precision, microseconds/eval, Intel i7-2600S:
[Table with columns exp, sin, cos, log, atan; rows EGLIBC, MPFR, Arb; values lost in transcription.]

Quad (113-bit) precision, microseconds/eval, Intel T4400:
[Rows MPFR, libquadmath, QD, Arb; values lost in transcription.]

Quad-double (212-bit) precision, microseconds/eval, Intel T4400:
[Rows MPFR, QD, Arb; values lost in transcription.]
Summary

Elementary functions with error bounds
Variable precision up to 4600 bits

mpn arithmetic + lookup tables + an efficient algorithm to evaluate Taylor series (rectangular splitting, optimized denominator sequence)
Similar algorithm for all functions (no Newton iteration, etc.)

Improvement over MPFR: up to 3-4x for cos, 8-10x for sin/exp/log, 30x for atan
Gap to double-precision libm (EGLIBC): 4-7x
Future work

Optimizations:
Gradually change precision in Taylor series summation
Use short multiplications (no GMP support)
Use precomputed inverses (no GMP support)
Assembly for low precision (1-3 words?)
Further tuning of lookup tables

What if floating-point expansions or carry-save bignums are used instead of GMP/64-bit full words? Vectorization?

Formally verified implementation?

Port to other libraries (e.g. MPFR)
Thank you!
arXiv:1410.7176v2 [cs.ms], 9 Jun 2015
Algorithms 1 algorithm: a finite set of instructions that specify a sequence of operations to be carried out in order to solve a specific problem or class of problems An algorithm must possess the following
More informationIntroduction to Computer Programming with MATLAB Calculation and Programming Errors. Selis Önel, PhD
Introduction to Computer Programming with MATLAB Calculation and Programming Errors Selis Önel, PhD Today you will learn Numbers, Significant figures Error analysis Absolute error Relative error Chopping
More informationSecond Order Function Approximation Using a Single Multiplication on FPGAs
Second Order Function Approximation Using a Single Multiplication on FPGAs Jérémie Detrey and Florent de Dinechin Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon 46, allée
More informationMachine Computation of the Sine Function
Machine Computation of the Sine Function Chuck Allison CNS 3320 Numerical Software Engineering Utah Valley State College February 2007 Abstract This note shows how to determine the number of terms to use
More informationPerformance Evaluation of a Novel Direct Table Lookup Method and Architecture With Application to 16-bit Integer Functions
Performance Evaluation of a Novel Direct Table Lookup Method and Architecture With Application to 16-bit nteger Functions L. Li, Alex Fit-Florea, M. A. Thornton, D. W. Matula Southern Methodist University,
More informationFloating-point representation
Lecture 3-4: Floating-point representation and arithmetic Floating-point representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However,
More informationLECTURE 0: Introduction and Background
1 LECTURE 0: Introduction and Background September 10, 2012 1 Computational science The role of computational science has become increasingly significant during the last few decades. It has become the
More information2-3 Graphing Rational Functions
2-3 Graphing Rational Functions Factor What are the end behaviors of the Graph? Sketch a graph How to identify the intercepts, asymptotes and end behavior of a rational function. How to sketch the graph
More informationMATLAB QUICK START TUTORIAL
MATLAB QUICK START TUTORIAL This tutorial is a brief introduction to MATLAB which is considered one of the most powerful languages of technical computing. In the following sections, the basic knowledge