Recovering Numerical Reproducibility in Hydrodynamics Simulations


1 23rd IEEE Symposium on Computer Arithmetic, July, Santa Clara, USA. Recovering Numerical Reproducibility in Hydrodynamics Simulations. Philippe Langlois, Rafife Nheili, Christophe Denis. DALI, Université de Perpignan Via Domitia; LIRMM, UMR 5506, CNRS, Université de Montpellier; CMLA, ENS Cachan, France. 1 / 31

2-8 Recovering numerical reproducibility
Reproducibility: bitwise identical results for every p-parallel run, p ≥ 1. Reproducibility vs. accuracy.
Failures reported in numerical simulation for energy [10], weather science [2], molecular dynamics [9] and fluid dynamics [8].
How to debug? How to test? How to validate? How to obtain legal agreements?
Execution: sequential, 2 procs, 4 procs, ..., p procs. The original code is non-reproducible; the goal is a reproducible code. 2 / 31

9 Hydrodynamics simulation
One industrial-scale simulation code: simulation of free-surface flows in 1D-2D-3D hydrodynamics; ... lines of open-source Fortran; ... years of development, 4000 registered users; EDF R&D + international consortium.
Telemac2D [3]: 2D hydrodynamics, Saint-Venant equations. Finite element method, triangular element mesh, sub-domain decomposition for parallel resolution. Mesh node unknowns: water depth (H) and velocity (U, V). 3 / 31

10 Telemac2D: the simplest gouttedo simulation
The gouttedo test case: 2D simulation of a water drop falling in a square basin. Unknown: the water depth, for a 0.2 s time step. Triangular mesh: 8978 elements and 4624 nodes. Expected: numerical reproducibility (time step = 1, 2, ...). Sequential vs. parallel p = 2. 4 / 31

11-25 Numerical reproducibility? A white plot displays a non-reproducible value. Sequential vs. parallel p = 2, animated over time steps 1 to 15. At time step 15: NO numerical reproducibility! 5 / 31

26 Telemac2D: gouttedo NO numerical reproducibility! Sequential Parallel p = 2 6 / 31

27-29 Today's issues
Case study. Industrial-scale software: opentelemac-mascaret. Finite element simulation, domain decomposition, linear system solving. 2 modules: Tomawac, Telemac2D.
Feasibility. How to recover reproducibility? What are the sources of non-reproducibility? Do existing techniques apply, and how easily? Compensation yields reproducibility here!
Efficiency. How much to pay for reproducibility? An extra cost which decreases as the problem size increases: OK to debug, to validate and even to simulate! 7 / 31

30 Outline
1. Motivation
2. Reproducibility failure in a finite element simulation: sequential and parallel FE assembly; sources of non-reproducibility in Telemac2D
3. Recovering reproducibility: reproducible parallel FE assembly; reproducible algebraic operations; reproducible conjugate gradient; reproducible Telemac2D
4. Efficiency
5. Conclusion
8 / 31

31-33 Parallel reduction and compensation techniques
Non-associative floating-point addition: the computed value depends on the operation order. A parallel reduction of undefined order therefore generates reproducibility failures: summing a, b, c, d as (a ⊕ b) ⊕ (c ⊕ d) or as ((a ⊕ b) ⊕ c) ⊕ d may give different results.
Compensate the rounding errors with error-free transformations: with e_1, e_2, e_3 the rounding errors of the reduction-tree additions and f_1, f_2, f_3 those of the sequential sum,
((a ⊕ b) ⊕ (c ⊕ d)) + ((e_1 + e_2) + e_3) = (((a ⊕ b) ⊕ c) ⊕ d) + ((f_1 + f_2) + f_3).
This should be repeated for too ill-conditioned sums. 9 / 31
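The error-free transformation behind this compensation is Knuth's TwoSum. A minimal Python sketch (illustrative function names, not the deck's actual routines): two summation orders give two different floating-point results, while the compensated results coincide here because the error terms capture the lost bits exactly.

```python
def two_sum(a, b):
    """Error-free transformation (Knuth): returns (s, e) with
    s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    t = s - a
    return s, (a - (s - t)) + (b - t)

def comp_sum(values):
    """Compensated sum: accumulate the running sum with TwoSum and keep
    the rounding errors in a separate accumulator, added back at the end."""
    s, err = 0.0, 0.0
    for v in values:
        s, e = two_sum(s, v)
        err += e
    return s + err

xs = [1e16, 1.0, -1e16, 1.0]   # exact sum is 2.0
plain_fwd = sum(xs)            # left-to-right order: 1.0
plain_rev = sum(xs[::-1])      # reversed order: 0.0
comp_fwd = comp_sum(xs)        # 2.0
comp_rev = comp_sum(xs[::-1])  # 2.0: both orders agree
```

As the slide notes, one level of compensation is not always enough: for too ill-conditioned sums the transformation must be repeated.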

34-35 Finite element assembly: the sequential case
The assembly step V(i) = Σ_elements W_el(i): compute the inner node values V(i) by accumulating the local contribution W_el for every element el that contains node i. [mesh figure]
The assembly loop:
for p = 1, np          // p: triangular local number (np = 3)
  for el = 1, nel
    i = IKLE(el, p)    // loop index indirection; i: domain global number
    V(i) = V(i) + W(el, p)
10 / 31
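The same loop can be rendered in Python to make the indirection explicit (a toy sketch; `IKLE` mirrors the Fortran connectivity table and the data values are invented):

```python
def fe_assembly(W, IKLE, n_nodes):
    """V(i) = sum of W(el, p) over every element el that contains node i;
    IKLE maps (element, local vertex) to the global node number."""
    V = [0.0] * n_nodes
    for p in range(3):             # np = 3 local vertices per triangle
        for el in range(len(IKLE)):
            i = IKLE[el][p]        # loop index indirection
            V[i] += W[el][p]
    return V

# two triangles sharing nodes 1 and 2
IKLE = [(0, 1, 2), (1, 3, 2)]
W = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
V = fe_assembly(W, IKLE, 4)        # node 1 accumulates 2.0 + 4.0
```

The indirection through `IKLE` is what makes the accumulation order of each `V(i)` depend on how the elements are distributed, which matters once the mesh is split across sub-domains.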

36-37 Finite element assembly: the parallel case
Interface point assembly: communications and reductions. V(i) = Σ_{D_k} V_{D_k}(i) over the sub-domains D_k, k = 1..p. Inner nodes stay local; interface points are shared between sub-domains.
Exact arithmetic: V_{D_1}(i) = b, V_{D_2}(i) = c, and the interface point assembly yields V(i) = b + c = a, the sequential value.
Floating-point arithmetic: V_{D_1}(i) = b̂, V_{D_2}(i) = ĉ, and the interface point assembly yields V(i) = b̂ ⊕ ĉ, which may differ from the sequential value â. 11 / 31

38 Basic ingredients (outline recap) 11 / 31

39-40 Sources of non-reproducibility in Telemac2D
Culprits in theory: 1. the building step: interface point assembly; 2. the solving step (conjugate gradient): parallel matrix-vector and dot products.
Culprits in practice (= optimization): interface point assembly and linear system solving are merged. Element-by-element (EBE) storage of the FE matrix; the EBE parallel matrix-vector product uses no BLAS: everything is a vector, no matrix! Wave equation, mass-lumping and the associated algebraic transformations; system decoupling and many diagonal matrices: everything is a vector, no matrix! 12 / 31

41-42 Sources of non-reproducibility in Telemac2D: the Telemac2D FE steps
From the mesh (elements, nodes), discretisation and FE assembly + algebraic computation build the system AX = C, with blocks A_hh, A_hu, A_hv, A_uh, A_uu, A_vh, A_vv (the (u,v) cross blocks are 0) and right-hand sides B_h, B_u, B_v, for the unknowns H, U, V.
Wave equation: A_2, A_3 are diagonal matrices and
A_1 = A_hh − A_hu A_2⁻¹ A_uh − A_hv A_3⁻¹ A_vh,
C_1 = B_h − A_hu A_2⁻¹ B_u − A_hv A_3⁻¹ B_v.
Interface point assembly of A_2, A_3, C_1.
Solution H: conjugate gradient on A_1 H = C_1, with an interface point assembly of A_1 d in each iteration.
Solution U, V: C_2 = B_u − A_uh H, C_3 = B_v − A_vh H, interface point assembly of C_2, C_3, then diagonal resolutions with the diagonal matrices A_2, A_3.
Objective: correct the sources of non-reproducibility to compute a reproducible system and reproducible solutions. 13 / 31

43 Recovering reproducibility in a finite element resolution (outline recap) 13 / 31

44 Recovering reproducibility in Telemac2D
Sources: the FE assembly (diagonal of the matrices and second members); the resolution (EBE matrix-vector and dot products); the wave equation (algebraic transformations and diagonal resolutions).
Reproducible resolution, principles: replace every vector V by the pair [V, E_V], with V + E_V the compensated value. Compute E_V in the FE assembly of V; propagate E_V through each operation on V; compensate all nodes while assembling the interface points; compensate the MPI parallel dot products that include an MPI reduction. 14 / 31

45 Reproducible FE assembly
i: any node; W_el(i): the contribution of every element el that contains i.
Original FE assembly: V(i) = Σ_elements W_el(i), computed as V(i) = W_el1(i) ⊕ W_el2(i) ⊕ ... ⊕ W_elni(i), with rounding errors e_1, e_2, ..., e_{ni−1}.
Modified FE assembly: [V(i), E_V(i)] = ReprodAss(Σ_elements W_el(i)), where V(i) is the same floating-point sum and E_V(i) = e_1 + e_2 + ... + e_{ni−1} accumulates its rounding errors. 15 / 31

46 Reproducible interface point assembly
i: one interface point shared by D_1, D_2, ..., D_{k−1}, D_k.
Original IP assembly: V(i) = Σ_{D_k} V_{D_k}(i). Communicate the V_{D_i} to D_1, D_2, ... and compute V(i) = V_{D_1}(i) ⊕ V_{D_2}(i) ⊕ ... ⊕ V_{D_{k−1}}(i) ⊕ V_{D_k}(i), with rounding errors δ_1, δ_2, ..., δ_{k−1}.
Modified IP assembly: [V(i), E_V(i)] = ReprodAss(Σ_{D_k} [V_{D_k}(i), E_{V_{D_k}}(i)]). Communicate the pairs [V_{D_i}, E_{V_{D_i}}] to D_1, D_2, ..., compute the same floating-point sum for V(i), and accumulate E_V(i) = (E_{V_{D_1}}(i) + E_{V_{D_2}}(i) + δ_1) + ... + (E_{V_{D_{k−1}}}(i) + E_{V_{D_k}}(i) + δ_{k−1}).
Reproducibility: at the IP assembly step, [V, E_V] → V + E_V compensates every node value. 16 / 31
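A Python sketch of both assemblies (hypothetical helper names, not Telemac's actual routines): each sub-domain accumulates its local pair (V_Dk, E_VDk) with TwoSum, and the interface-point assembly merges the pairs, folding the new rounding errors δ_k into E_V. Two different decompositions of the same contributions then yield the same compensated value.

```python
def two_sum(a, b):
    """Error-free transformation: a + b = s + e exactly."""
    s = a + b
    t = s - a
    return s, (a - (s - t)) + (b - t)

def reprod_assembly(contribs):
    """Local FE assembly for one node: returns the pair (V, E_V)."""
    V, E = 0.0, 0.0
    for w in contribs:
        V, e = two_sum(V, w)
        E += e
    return V, E

def ip_assembly(parts):
    """Interface-point assembly: merge the per-sub-domain pairs
    (V_Dk, E_VDk); the deltas are the rounding errors of the merge."""
    V, E = parts[0]
    for v_k, e_k in parts[1:]:
        V, delta = two_sum(V, v_k)
        E += e_k + delta
    return V + E          # compensation at the interface point

contribs = [1e16, 1.0, -1e16, 1.0]   # exact sum is 2.0
# the same contributions split over two sub-domains, two different ways
split_a = [reprod_assembly([1e16, 1.0]), reprod_assembly([-1e16, 1.0])]
split_b = [reprod_assembly([1e16, 1.0, -1e16]), reprod_assembly([1.0])]
```

Both `ip_assembly(split_a)` and `ip_assembly(split_b)` return 2.0, matching the compensated sequential assembly: the decomposition no longer changes the result.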

47 Reproducible algebraic operations (outline recap) 16 / 31

48 Algebraic operations: the vector case
Reproducible algebraic vector operations in opentelemac's included library BIEF: entry-wise vector operations (copy, opposite/inverse, add/sub, Hadamard product, ...), which also apply to diagonal matrices. Propagate the rounding errors so they can be compensated while assembling the interface points.
Example: the Hadamard product. Original version: X, Y → V = X.Y with V(i) = X(i) ⊗ Y(i). Modified version: [X, E_X], [Y, E_Y] → [V, E_V] with (V, e_V) = 2Prod(X, Y) and E_V = X.E_Y + Y.E_X + e_V. 17 / 31

49 What is reproducible now?
Most of the linear system, except the conjugate gradient: the FE assembly, the algebraic vector operations and the interface point assembly are reproducible, hence so are the matrix of the H system and its dependencies, the second members of the U and V systems. Next step: the conjugate gradient.
Partially reproducible Telemac2D: the wave-equation step (A_2, A_3 diagonal matrices; A_1, C_1 and their interface point assembly) and the diagonal resolutions C_2 = B_u − A_uh H, C_3 = B_v − A_vh H with the interface point assembly of C_2, C_3; the conjugate gradient for H, with its interface point assembly of A_1 d in each iteration, remains. 18 / 31

50 Recovering reproducibility in a finite element resolution (outline recap) 18 / 31

51-52 Towards a reproducible conjugate gradient
Solve A X = B with A = [A_1, E_{A_1}], B = C_1, X = H.
Initialization, with a given d_0: r_0 = A X_0 − B; ρ_0 = (r_0, d_0) / (A d_0, d_0); X_1 = X_0 − ρ_0 d_0.
Iterations until the stopping criteria:
r_m = r_{m−1} − ρ_{m−1} A d_{m−1};
d_m = r_m + ((r_m, r_m) / (r_{m−1}, r_{m−1})) d_{m−1};
ρ_m = (r_m, d_m) / (d_m, A d_m);
X_{m+1} = X_m − ρ_m d_m.
Non-reproducibility sources: the EBE matrix-vector product; the dot-product MPI reduction; the weighted dot products for interface points shared by p sub-domains. 19 / 31
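The dot products in these iterations are where the MPI reduction order leaks into the result. The compensated dot product named later in the deck is Ogita, Rump and Oishi's dot2 [6], which is as accurate as a classic dot product evaluated in twice the working precision. A self-contained Python sketch:

```python
def two_sum(a, b):
    """a + b = s + e exactly (Knuth)."""
    s = a + b
    t = s - a
    return s, (a - (s - t)) + (b - t)

def two_prod(a, b):
    """a * b = p + e exactly (Dekker/Veltkamp splitting, no FMA)."""
    p = a * b
    c = 134217729.0 * a               # 2**27 + 1
    ah = c - (c - a); al = a - ah
    c = 134217729.0 * b
    bh = c - (c - b); bl = b - bh
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e

def dot2(x, y):
    """Compensated dot product (Ogita-Rump-Oishi dot2): the products and
    running sums are transformed error-free; errors go to accumulator c."""
    s, c = 0.0, 0.0
    for a, b in zip(x, y):
        p, ep = two_prod(a, b)
        s, es = two_sum(s, p)
        c += ep + es
    return s + c

x, y = [1e16, 2.0, -1e16], [1.0, 0.5, 1.0]   # exact dot product: 1.0
naive = sum(a * b for a, b in zip(x, y))     # cancellation loses it: 0.0
accurate = dot2(x, y)                        # 1.0
```

In the parallel setting, each process would presumably run dot2 on its local slice and the reduction would combine the partial (sum, error) pairs; the slide's "parallel compensated dot2" points at this pattern.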

53 The EBE storage
M = D + Σ_{el=1}^{nel} X_el, for nodes i ∈ [1, np] and elements el ∈ [1, nel], with j, k, l the vertices of element el. M is decomposed into:
1. the assembled diagonal D[np]: D = [D(1), ..., D(np)];
2. the elementary extra-diagonal terms X_el[6]:
X_el = [ X_jk(el), X_jl(el); X_kj(el), X_kl(el); X_lj(el), X_lk(el) ] = [X_el(1), ..., X_el(6)]. 20 / 31

54 The EBE matrix-vector product
Steps of the EBE matrix-vector product R = M V = D V + Σ_{el=1}^{nel} X_el V_el:
1. R_1(i) = D(i) × V(i), i ∈ [1, np];
2. X_el.V_el = [X_el(1) V(k), X_el(2) V(l), ..., X_el(6) V(k)], el ∈ [1, nel];
3. FE assembly of R_2[np]: R_2 = Σ_{el=1}^{nel} X_el V_el;
4. R = R_1 + R_2;
5. IP assembly: R(i) = Σ_{D_k} R(i) for every interface point i. 21 / 31
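Steps 1-4 can be sketched in Python as follows (a toy rendering; `IKLE` holds the vertices (j, k, l) of each element, and the values are invented so the result can be checked against a dense product):

```python
def ebe_matvec(D, X, IKLE, V):
    """EBE matrix-vector product, steps 1-4: diagonal part R1, then the
    FE assembly of the six extra-diagonal elementary contributions R2."""
    R = [D[i] * V[i] for i in range(len(D))]        # step 1: R1 = D.V
    for (j, k, l), x in zip(IKLE, X):
        # steps 2-3: x = [Xjk, Xjl, Xkj, Xkl, Xlj, Xlk] for this element
        R[j] += x[0] * V[k] + x[1] * V[l]
        R[k] += x[2] * V[j] + x[3] * V[l]
        R[l] += x[4] * V[j] + x[5] * V[k]
    return R                                        # step 4: R = R1 + R2

# one triangle (0, 1, 2); numbers chosen to be exact in binary64
D = [1.0, 2.0, 3.0]
X = [[0.5, 0.25, 0.125, 1.5, 2.5, 0.75]]
IKLE = [(0, 1, 2)]
V = [1.0, 2.0, 4.0]
R = ebe_matvec(D, X, IKLE, V)
```

Step 5, the interface-point assembly, then sums the per-sub-domain R(i) for every interface point i, which is where the order-dependence enters in parallel.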

55 Reproducible EBE matrix-vector product
Original EBE matrix-vector product: R = D V + Σ_{el=1}^{nel} X_el V_el; R(i) = Σ_{D_k} R(i).
Reproducible EBE matrix-vector product: [R, E_R] = [D, E_D] V + ReprodAss(Σ_{el=1}^{nel} X_el V_el); R(i) = ReprodAss_{D_k}([R(i), E_R(i)]). Compensation: R + E_R. 22 / 31

56-57 A reproducible conjugate gradient
Same algorithm as above, with A = [A_1, E_{A_1}], B = C_1, X = H.
Non-reproducibility: sources and solutions. The EBE matrix-vector product → the reproducible EBE matrix-vector product. The dot-product MPI reduction → a parallel compensated dot2. The interface-point weights (1/k, 1/k, ..., 1/k) → (1, 0, ..., 0). Reproducible operations → reproducible results: the same errors for both sequential and parallel executions. 23 / 31
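The weight change (1/k, ..., 1/k) → (1, 0, ..., 0) means that an interface point shared by k sub-domains is counted once, by a single owning sub-domain, instead of fractionally by all of them (1/k is itself inexact for k = 3, for instance). A toy Python sketch of the resulting partial dot products (hypothetical helper, not Telemac's code):

```python
def subdomain_dot(x, y, owned):
    """Partial dot product on one sub-domain: a node contributes only if
    this sub-domain owns it (weight 1 here, weight 0 on the other
    sub-domains that merely share the interface point)."""
    return sum(a * b for a, b, o in zip(x, y, owned) if o)

# global vectors on nodes 0..2; node 1 is the interface point
x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]             # global dot = 32.0
d1 = subdomain_dot(x[0:2], y[0:2], [True, True])    # owns node 1: 14.0
d2 = subdomain_dot(x[1:3], y[1:3], [False, True])   # node 1 not owned: 18.0
total = d1 + d2                                     # the reduction: 32.0
```

Each interface term now appears exactly once in the global sum; combined with the compensated reduction, the partial results no longer depend on how the mesh was cut.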

58 Recovering reproducibility in a finite element resolution (outline recap) 23 / 31

59 Reproducible Telemac2D
Execution: sequential, 2 procs, 4 procs, ..., p procs: the previously non-reproducible original code is replaced by the reproducible code, with all the FE steps (wave equation, interface point assemblies, conjugate gradient, diagonal resolutions) compensated. [Plot: maximum relative error vs. the sequential computation over the time steps, gouttedo test case, for the compensated runs COMP P=2, COMP P=4, COMP P=...] 24 / 31

60-75 Reproducible gouttedo! Time steps 1 to 15, p = 1, 2, 4, 8: the sequential and parallel results now match at every time step. 25-26 / 31

76 Efficiency (outline recap) 26 / 31

77 Runtime extra-cost for reproducible simulations
Measures, test cases and mesh sizes: hardware cycle counter (rdtsc); gouttedo mesh sizes 4624, 18225, ... nodes (1, 4, 16); numbers of interface points per #nodes and #proc [table elided].
Hardware and software environment: opentelemac v7.2; Intel Xeon E..., ... GHz socket (L3 cache = 20 MB), 2 sockets of 8 cores each; GNU Fortran 4.6.3, -O3; OpenMPI; generic Linux kernel. 27 / 31

78 The core runtime extra-cost for reproducible gouttedo
gouttedo core: no input/output steps, just building + solving. [Plot: #cycles vs. #processors, Telemac v7, original vs. reproducible versions for #nodes = 4624, 18225 and ...; the measured extra-cost factors range from ×1.16 up to about ×2.4, decreasing as the mesh size grows.] 28 / 31

79 Time to conclude (outline recap) 28 / 31

80-81 Conclusion: recovering numerical reproducibility
Industrial-scale software: opentelemac-mascaret. Finite element simulation, domain decomposition, linear system solving. 2 reproducible modules: Tomawac, Telemac2D. Integration in the next opentelemac version: current work.
Feasibility. How to recover reproducibility? What are the sources of non-reproducibility? Do existing techniques apply, and how easily? A hand-made analysis of the computing workflow; compensation yields reproducibility here and fits well with opentelemac's vector library. Other existing techniques also apply, more or less easily [4]. 29 / 31

82-83 Conclusion: efficiency
How much to pay for reproducibility? An extra cost which decreases as the problem size increases: OK to debug, to validate and even to simulate!
Reproducibility at a larger scale: the whole opentelemac software suite. Does it still work for complex, large, real-life simulations? The two FE test cases are significant enough to validate the methodology. Localizing the failure sources is difficult to automate, but applying the methodology is easy for the software developers. 30 / 31

84 Acknowledgment
Jean-Michel Hervouet, LNHE, EDF R&D, Chatou. Chemseddine Chohra, DALI/LIRMM, UPVD, Perpignan. 30 / 31

85 References I
[1] J. W. Demmel and H. D. Nguyen. Fast reproducible floating-point summation. In Proc. 21st IEEE Symposium on Computer Arithmetic, Austin, Texas, USA.
[2] Y. He and C. Ding. Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications. J. Supercomput., 18.
[3] J.-M. Hervouet. Hydrodynamics of free surface flows: Modelling with the finite element method. John Wiley & Sons.
[4] P. Langlois, R. Nheili, and C. Denis. Numerical reproducibility: feasibility issues. In NTMS 2015: 7th IFIP International Conference on New Technologies, Mobility and Security, pages 1-5, Paris, France, July. IEEE, IEEE COMSOC & IFIP TC6.5 WG.
[5] P. Langlois, R. Nheili, and C. Denis. Recovering numerical reproducibility in hydrodynamic simulations. In J. Hormigo, S. Oberman, and N. Revol, editors, 23rd IEEE International Symposium on Computer Arithmetic. IEEE Computer Society, July (Silicon Valley, USA). 30 / 31

86 References II
[6] T. Ogita, S. M. Rump, and S. Oishi. Accurate sum and dot product. SIAM J. Sci. Comput., 26(6).
[7] Open TELEMAC-MASCARET. v.7.0, release notes.
[8] R. W. Robey, J. M. Robey, and R. Aulwes. In search of numerical consistency in parallel programming. Parallel Comput., 37(4-5).
[9] M. Taufer, O. Padron, P. Saponaro, and S. Patel. Improving numerical reproducibility and stability in large-scale numerical simulations on GPUs. In IPDPS, pages 1-9. IEEE.
[10] O. Villa, D. G. Chavarría-Miranda, V. Gurumoorthi, A. Márquez, and S. Krishnamoorthy. Effects of floating-point non-associativity on numerical computations on massively multithreaded systems. In CUG 2009 Proceedings, pages 1-11. 31 / 31


Optimizing TELEMAC-2D for Large-scale Flood Simulations Available on-line at www.prace-ri.eu Partnership for Advanced Computing in Europe Optimizing TELEMAC-2D for Large-scale Flood Simulations Charles Moulinec a,, Yoann Audouin a, Andrew Sunderland a a STFC

More information

Effects of Floating-Point non-associativity on Numerical Computations on Massively Multithreaded Systems

Effects of Floating-Point non-associativity on Numerical Computations on Massively Multithreaded Systems Effects of Floating-Point non-associativity on Numerical Computations on Massively Multithreaded Systems Oreste Villa 1, Daniel Chavarría-Miranda 1, Vidhya Gurumoorthi 2, Andrés Márquez 1, and Sriram Krishnamoorthy

More information

Architecture, Programming and Performance of MIC Phi Coprocessor

Architecture, Programming and Performance of MIC Phi Coprocessor Architecture, Programming and Performance of MIC Phi Coprocessor JanuszKowalik, Piotr Arłukowicz Professor (ret), The Boeing Company, Washington, USA Assistant professor, Faculty of Mathematics, Physics

More information

Estimation d arrondis, analyse de stabilité des grands codes de calcul numérique

Estimation d arrondis, analyse de stabilité des grands codes de calcul numérique Estimation d arrondis, analyse de stabilité des grands codes de calcul numérique Jean-Marie Chesneaux, Fabienne Jézéquel, Jean-Luc Lamotte, Jean Vignes Laboratoire d Informatique de Paris 6, P. and M.

More information

Modelling and implementation of algorithms in applied mathematics using MPI

Modelling and implementation of algorithms in applied mathematics using MPI Modelling and implementation of algorithms in applied mathematics using MPI Lecture 1: Basics of Parallel Computing G. Rapin Brazil March 2011 Outline 1 Structure of Lecture 2 Introduction 3 Parallel Performance

More information

Lecture 27: Fast Laplacian Solvers

Lecture 27: Fast Laplacian Solvers Lecture 27: Fast Laplacian Solvers Scribed by Eric Lee, Eston Schweickart, Chengrun Yang November 21, 2017 1 How Fast Laplacian Solvers Work We want to solve Lx = b with L being a Laplacian matrix. Recall

More information

3D Finite Difference Time-Domain Modeling of Acoustic Wave Propagation based on Domain Decomposition

3D Finite Difference Time-Domain Modeling of Acoustic Wave Propagation based on Domain Decomposition 3D Finite Difference Time-Domain Modeling of Acoustic Wave Propagation based on Domain Decomposition UMR Géosciences Azur CNRS-IRD-UNSA-OCA Villefranche-sur-mer Supervised by: Dr. Stéphane Operto Jade

More information

VeriTracer: Context-enriched tracer for floating-point arithmetic analysis

VeriTracer: Context-enriched tracer for floating-point arithmetic analysis VeriTracer: Context-enriched tracer for floating-point arithmetic analysis ARITH 25, Amherst MA USA, June 2018 Yohan Chatelain 1,5 Pablo de Oliveira Castro 1,5 Eric Petit 2,5 David Defour 3 Jordan Bieder

More information

Augmented Arithmetic Operations Proposed for IEEE

Augmented Arithmetic Operations Proposed for IEEE Augmented Arithmetic Operations Proposed for IEEE 754-2018 E. Jason Riedy Georgia Institute of Technology 25th IEEE Symposium on Computer Arithmetic, 26 June 2018 Outline History and Definitions Decisions

More information

INTRODUCTION TO OPENACC. Analyzing and Parallelizing with OpenACC, Feb 22, 2017

INTRODUCTION TO OPENACC. Analyzing and Parallelizing with OpenACC, Feb 22, 2017 INTRODUCTION TO OPENACC Analyzing and Parallelizing with OpenACC, Feb 22, 2017 Objective: Enable you to to accelerate your applications with OpenACC. 2 Today s Objectives Understand what OpenACC is and

More information

Thermomechanical and hydraulic industrial simulations using MUMPS at EDF

Thermomechanical and hydraulic industrial simulations using MUMPS at EDF K u f Thermomechanical and hydraulic industrial simulations using MUMPS at EDF MUMPS User group meeting 15 april 2010 O.Boiteau, C.Denis, F.Zaoui MUltifrontal Massively Parallel sparse direct Solver 1

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview; Matrix Multiplication Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 21 Outline 1 Course

More information

2.7 Numerical Linear Algebra Software

2.7 Numerical Linear Algebra Software 2.7 Numerical Linear Algebra Software In this section we will discuss three software packages for linear algebra operations: (i) (ii) (iii) Matlab, Basic Linear Algebra Subroutines (BLAS) and LAPACK. There

More information

ESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report

ESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report ESPRESO ExaScale PaRallel FETI Solver Hybrid FETI Solver Report Lubomir Riha, Tomas Brzobohaty IT4Innovations Outline HFETI theory from FETI to HFETI communication hiding and avoiding techniques our new

More information

Dense Matrix Algorithms

Dense Matrix Algorithms Dense Matrix Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003. Topic Overview Matrix-Vector Multiplication

More information

ReproBLAS: Reproducible BLAS

ReproBLAS: Reproducible BLAS ReproBLAS: Reproducible BLAS http://bebop.cs.berkeley.edu/reproblas/ James Demmel, Nguyen Hong Diep SC 13 - Denver, CO Nov 22, 2013 1 / 15 Reproducibility Reproducibility: obtaining bit-wise identical

More information

Performance Analysis of BLAS Libraries in SuperLU_DIST for SuperLU_MCDT (Multi Core Distributed) Development

Performance Analysis of BLAS Libraries in SuperLU_DIST for SuperLU_MCDT (Multi Core Distributed) Development Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Performance Analysis of BLAS Libraries in SuperLU_DIST for SuperLU_MCDT (Multi Core Distributed) Development M. Serdar Celebi

More information

System Demonstration of Spiral: Generator for High-Performance Linear Transform Libraries

System Demonstration of Spiral: Generator for High-Performance Linear Transform Libraries System Demonstration of Spiral: Generator for High-Performance Linear Transform Libraries Yevgen Voronenko, Franz Franchetti, Frédéric de Mesmay, and Markus Püschel Department of Electrical and Computer

More information

Modelling, migration, and inversion using Linear Algebra

Modelling, migration, and inversion using Linear Algebra Modelling, mid., and inv. using Linear Algebra Modelling, migration, and inversion using Linear Algebra John C. Bancroft and Rod Blais ABSRAC Modelling migration and inversion can all be accomplished using

More information

Analysis of Matrix Multiplication Computational Methods

Analysis of Matrix Multiplication Computational Methods European Journal of Scientific Research ISSN 1450-216X / 1450-202X Vol.121 No.3, 2014, pp.258-266 http://www.europeanjournalofscientificresearch.com Analysis of Matrix Multiplication Computational Methods

More information

High-Performance Scientific Computing

High-Performance Scientific Computing High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org

More information

Figure 6.1: Truss topology optimization diagram.

Figure 6.1: Truss topology optimization diagram. 6 Implementation 6.1 Outline This chapter shows the implementation details to optimize the truss, obtained in the ground structure approach, according to the formulation presented in previous chapters.

More information

Approaches to Parallel Implementation of the BDDC Method

Approaches to Parallel Implementation of the BDDC Method Approaches to Parallel Implementation of the BDDC Method Jakub Šístek Includes joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík. Institute of Mathematics of the AS CR, Prague

More information

Efficient implementation of interval matrix multiplication

Efficient implementation of interval matrix multiplication Efficient implementation of interval matrix multiplication Hong Diep Nguyen To cite this version: Hong Diep Nguyen. Efficient implementation of interval matrix multiplication. Para 2010: State of the Art

More information

Q. Wang National Key Laboratory of Antenna and Microwave Technology Xidian University No. 2 South Taiba Road, Xi an, Shaanxi , P. R.

Q. Wang National Key Laboratory of Antenna and Microwave Technology Xidian University No. 2 South Taiba Road, Xi an, Shaanxi , P. R. Progress In Electromagnetics Research Letters, Vol. 9, 29 38, 2009 AN IMPROVED ALGORITHM FOR MATRIX BANDWIDTH AND PROFILE REDUCTION IN FINITE ELEMENT ANALYSIS Q. Wang National Key Laboratory of Antenna

More information

Lecture 15: More Iterative Ideas

Lecture 15: More Iterative Ideas Lecture 15: More Iterative Ideas David Bindel 15 Mar 2010 Logistics HW 2 due! Some notes on HW 2. Where we are / where we re going More iterative ideas. Intro to HW 3. More HW 2 notes See solution code!

More information

Optimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor

Optimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor Optimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor Intel K. K. E-mail: hirokazu.kobayashi@intel.com Yoshifumi Nakamura RIKEN AICS E-mail: nakamura@riken.jp Shinji Takeda

More information

Matching. Compare region of image to region of image. Today, simplest kind of matching. Intensities similar.

Matching. Compare region of image to region of image. Today, simplest kind of matching. Intensities similar. Matching Compare region of image to region of image. We talked about this for stereo. Important for motion. Epipolar constraint unknown. But motion small. Recognition Find object in image. Recognize object.

More information

ExBLAS: Reproducible and Accurate BLAS Library

ExBLAS: Reproducible and Accurate BLAS Library ExBLAS: Reproducible and Accurate BLAS Library Roman Iakymchuk, Sylvain Collange, David Defour, Stef Graillat To cite this version: Roman Iakymchuk, Sylvain Collange, David Defour, Stef Graillat. ExBLAS:

More information

Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC

Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC Fourth Workshop on Accelerator Programming Using Directives (WACCPD), Nov. 13, 2017 Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC Takuma

More information

MAT 275 Laboratory 2 Matrix Computations and Programming in MATLAB

MAT 275 Laboratory 2 Matrix Computations and Programming in MATLAB MATLAB sessions: Laboratory MAT 75 Laboratory Matrix Computations and Programming in MATLAB In this laboratory session we will learn how to. Create and manipulate matrices and vectors.. Write simple programs

More information

An Improved Measurement Placement Algorithm for Network Observability

An Improved Measurement Placement Algorithm for Network Observability IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 16, NO. 4, NOVEMBER 2001 819 An Improved Measurement Placement Algorithm for Network Observability Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper

More information

GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS. Kyle Spagnoli. Research EM Photonics 3/20/2013

GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS. Kyle Spagnoli. Research EM Photonics 3/20/2013 GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS Kyle Spagnoli Research Engineer @ EM Photonics 3/20/2013 INTRODUCTION» Sparse systems» Iterative solvers» High level benchmarks»

More information

Parallel Implementations of Gaussian Elimination

Parallel Implementations of Gaussian Elimination s of Western Michigan University vasilije.perovic@wmich.edu January 27, 2012 CS 6260: in Parallel Linear systems of equations General form of a linear system of equations is given by a 11 x 1 + + a 1n

More information

Solving Large Regression Problems using an Ensemble of GPU-accelerated ELMs

Solving Large Regression Problems using an Ensemble of GPU-accelerated ELMs Solving Large Regression Problems using an Ensemble of GPU-accelerated ELMs Mark van Heeswijk 1 and Yoan Miche 2 and Erkki Oja 1 and Amaury Lendasse 1 1 Helsinki University of Technology - Dept. of Information

More information

Two-Phase flows on massively parallel multi-gpu clusters

Two-Phase flows on massively parallel multi-gpu clusters Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous

More information

Domain Decomposition: Computational Fluid Dynamics

Domain Decomposition: Computational Fluid Dynamics Domain Decomposition: Computational Fluid Dynamics July 11, 2016 1 Introduction and Aims This exercise takes an example from one of the most common applications of HPC resources: Fluid Dynamics. We will

More information

Evaluation of sparse LU factorization and triangular solution on multicore architectures. X. Sherry Li

Evaluation of sparse LU factorization and triangular solution on multicore architectures. X. Sherry Li Evaluation of sparse LU factorization and triangular solution on multicore architectures X. Sherry Li Lawrence Berkeley National Laboratory ParLab, April 29, 28 Acknowledgement: John Shalf, LBNL Rich Vuduc,

More information

Exam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3

Exam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3 UMEÅ UNIVERSITET Institutionen för datavetenskap Lars Karlsson, Bo Kågström och Mikael Rännar Design and Analysis of Algorithms for Parallel Computer Systems VT2009 June 2, 2009 Exam Design and Analysis

More information

Chapter 14: Matrix Iterative Methods

Chapter 14: Matrix Iterative Methods Chapter 14: Matrix Iterative Methods 14.1INTRODUCTION AND OBJECTIVES This chapter discusses how to solve linear systems of equations using iterative methods and it may be skipped on a first reading of

More information

Understanding the performances of sparse compression formats using data parallel programming model

Understanding the performances of sparse compression formats using data parallel programming model 2017 International Conference on High Performance Computing & Simulation Understanding the performances of sparse compression formats using data parallel programming model Ichrak MEHREZ, Olfa HAMDI-LARBI,

More information

Chapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition

Chapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition Chapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition Nicholas J. Higham Pythagoras Papadimitriou Abstract A new method is described for computing the singular value decomposition

More information

Scanning Real World Objects without Worries 3D Reconstruction

Scanning Real World Objects without Worries 3D Reconstruction Scanning Real World Objects without Worries 3D Reconstruction 1. Overview Feng Li 308262 Kuan Tian 308263 This document is written for the 3D reconstruction part in the course Scanning real world objects

More information

SPH: Why and what for?

SPH: Why and what for? SPH: Why and what for? 4 th SPHERIC training day David Le Touzé, Fluid Mechanics Laboratory, Ecole Centrale de Nantes / CNRS SPH What for and why? How it works? Why not for everything? Duality of SPH SPH

More information

Fast Primitives for Irregular Computations on the NEC SX-4

Fast Primitives for Irregular Computations on the NEC SX-4 To appear: Crosscuts 6 (4) Dec 1997 (http://www.cscs.ch/official/pubcrosscuts6-4.pdf) Fast Primitives for Irregular Computations on the NEC SX-4 J.F. Prins, University of North Carolina at Chapel Hill,

More information

3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs

3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs 3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs H. Knibbe, C. W. Oosterlee, C. Vuik Abstract We are focusing on an iterative solver for the three-dimensional

More information

Blocked Schur Algorithms for Computing the Matrix Square Root. Deadman, Edvin and Higham, Nicholas J. and Ralha, Rui. MIMS EPrint: 2012.

Blocked Schur Algorithms for Computing the Matrix Square Root. Deadman, Edvin and Higham, Nicholas J. and Ralha, Rui. MIMS EPrint: 2012. Blocked Schur Algorithms for Computing the Matrix Square Root Deadman, Edvin and Higham, Nicholas J. and Ralha, Rui 2013 MIMS EPrint: 2012.26 Manchester Institute for Mathematical Sciences School of Mathematics

More information

Domain Decomposition: Computational Fluid Dynamics

Domain Decomposition: Computational Fluid Dynamics Domain Decomposition: Computational Fluid Dynamics December 0, 0 Introduction and Aims This exercise takes an example from one of the most common applications of HPC resources: Fluid Dynamics. We will

More information

Algorithms, System and Data Centre Optimisation for Energy Efficient HPC

Algorithms, System and Data Centre Optimisation for Energy Efficient HPC 2015-09-14 Algorithms, System and Data Centre Optimisation for Energy Efficient HPC Vincent Heuveline URZ Computing Centre of Heidelberg University EMCL Engineering Mathematics and Computing Lab 1 Energy

More information

Acknowledgements. Prof. Dan Negrut Prof. Darryl Thelen Prof. Michael Zinn. SBEL Colleagues: Hammad Mazar, Toby Heyn, Manoj Kumar

Acknowledgements. Prof. Dan Negrut Prof. Darryl Thelen Prof. Michael Zinn. SBEL Colleagues: Hammad Mazar, Toby Heyn, Manoj Kumar Philipp Hahn Acknowledgements Prof. Dan Negrut Prof. Darryl Thelen Prof. Michael Zinn SBEL Colleagues: Hammad Mazar, Toby Heyn, Manoj Kumar 2 Outline Motivation Lumped Mass Model Model properties Simulation

More information

Corrected/Updated References

Corrected/Updated References K. Kashiyama, H. Ito, M. Behr and T. Tezduyar, "Massively Parallel Finite Element Strategies for Large-Scale Computation of Shallow Water Flows and Contaminant Transport", Extended Abstracts of the Second

More information

Exercise Set Decide whether each matrix below is an elementary matrix. (a) (b) (c) (d) Answer:

Exercise Set Decide whether each matrix below is an elementary matrix. (a) (b) (c) (d) Answer: Understand the relationships between statements that are equivalent to the invertibility of a square matrix (Theorem 1.5.3). Use the inversion algorithm to find the inverse of an invertible matrix. Express

More information

x = 12 x = 12 1x = 16

x = 12 x = 12 1x = 16 2.2 - The Inverse of a Matrix We've seen how to add matrices, multiply them by scalars, subtract them, and multiply one matrix by another. The question naturally arises: Can we divide one matrix by another?

More information

Estimation de la reproductibilité numérique dans les environnements hybrides CPU-GPU

Estimation de la reproductibilité numérique dans les environnements hybrides CPU-GPU Estimation de la reproductibilité numérique dans les environnements hybrides CPU-GPU Fabienne Jézéquel 1, Jean-Luc Lamotte 1 & Issam Said 2 1 LIP6, Université Pierre et Marie Curie 2 Total & LIP6, Université

More information

Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur

Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur Lecture - 05 Classification with Perceptron Model So, welcome to today

More information

First Steps of YALES2 Code Towards GPU Acceleration on Standard and Prototype Cluster

First Steps of YALES2 Code Towards GPU Acceleration on Standard and Prototype Cluster First Steps of YALES2 Code Towards GPU Acceleration on Standard and Prototype Cluster YALES2: Semi-industrial code for turbulent combustion and flows Jean-Matthieu Etancelin, ROMEO, NVIDIA GPU Application

More information

computational Fluid Dynamics - Prof. V. Esfahanian

computational Fluid Dynamics - Prof. V. Esfahanian Three boards categories: Experimental Theoretical Computational Crucial to know all three: Each has their advantages and disadvantages. Require validation and verification. School of Mechanical Engineering

More information

Development of a Maxwell Equation Solver for Application to Two Fluid Plasma Models. C. Aberle, A. Hakim, and U. Shumlak

Development of a Maxwell Equation Solver for Application to Two Fluid Plasma Models. C. Aberle, A. Hakim, and U. Shumlak Development of a Maxwell Equation Solver for Application to Two Fluid Plasma Models C. Aberle, A. Hakim, and U. Shumlak Aerospace and Astronautics University of Washington, Seattle American Physical Society

More information

A parallel computing framework and a modular collaborative cfd workbench in Java

A parallel computing framework and a modular collaborative cfd workbench in Java Advances in Fluid Mechanics VI 21 A parallel computing framework and a modular collaborative cfd workbench in Java S. Sengupta & K. P. Sinhamahapatra Department of Aerospace Engineering, IIT Kharagpur,

More information

High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs

High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs Gordon Erlebacher Department of Scientific Computing Sept. 28, 2012 with Dimitri Komatitsch (Pau,France) David Michea

More information

Parallel Implementation of QRD Algorithms on the Fujitsu AP1000

Parallel Implementation of QRD Algorithms on the Fujitsu AP1000 Parallel Implementation of QRD Algorithms on the Fujitsu AP1000 Zhou, B. B. and Brent, R. P. Computer Sciences Laboratory Australian National University Canberra, ACT 0200 Abstract This paper addresses

More information

A Scalable, Numerically Stable, High- Performance Tridiagonal Solver for GPUs. Li-Wen Chang, Wen-mei Hwu University of Illinois

A Scalable, Numerically Stable, High- Performance Tridiagonal Solver for GPUs. Li-Wen Chang, Wen-mei Hwu University of Illinois A Scalable, Numerically Stable, High- Performance Tridiagonal Solver for GPUs Li-Wen Chang, Wen-mei Hwu University of Illinois A Scalable, Numerically Stable, High- How to Build a gtsv for Performance

More information

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can

More information

NOISE PROPAGATION FROM VIBRATING STRUCTURES

NOISE PROPAGATION FROM VIBRATING STRUCTURES NOISE PROPAGATION FROM VIBRATING STRUCTURES Abstract R. Helfrich, M. Spriegel (INTES GmbH, Germany) Noise and noise exposure are becoming more important in product development due to environmental legislation.

More information

Multicore Challenge in Vector Pascal. P Cockshott, Y Gdura

Multicore Challenge in Vector Pascal. P Cockshott, Y Gdura Multicore Challenge in Vector Pascal P Cockshott, Y Gdura N-body Problem Part 1 (Performance on Intel Nehalem ) Introduction Data Structures (1D and 2D layouts) Performance of single thread code Performance

More information

ME964 High Performance Computing for Engineering Applications

ME964 High Performance Computing for Engineering Applications ME964 High Performance Computing for Engineering Applications Outlining Midterm Projects Topic 3: GPU-based FEA Topic 4: GPU Direct Solver for Sparse Linear Algebra March 01, 2011 Dan Negrut, 2011 ME964

More information

S0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS

S0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS S0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS John R Appleyard Jeremy D Appleyard Polyhedron Software with acknowledgements to Mark A Wakefield Garf Bowen Schlumberger Outline of Talk Reservoir

More information

Parallel edge-based implementation of the finite element method for shallow water equations

Parallel edge-based implementation of the finite element method for shallow water equations Parallel edge-based implementation of the finite element method for shallow water equations I. Slobodcicov, F.L.B. Ribeiro, & A.L.G.A. Coutinho Programa de Engenharia Civil, COPPE / Universidade Federal

More information

SIPE: Small Integer Plus Exponent

SIPE: Small Integer Plus Exponent SIPE: Small Integer Plus Exponent Vincent LEFÈVRE AriC, INRIA Grenoble Rhône-Alpes / LIP, ENS-Lyon Arith 21, Austin, Texas, USA, 2013-04-09 Introduction: Why SIPE? All started with floating-point algorithms

More information

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet Contents 2 F10: Parallel Sparse Matrix Computations Figures mainly from Kumar et. al. Introduction to Parallel Computing, 1st ed Chap. 11 Bo Kågström et al (RG, EE, MR) 2011-05-10 Sparse matrices and storage

More information

Solving Sparse Linear Systems. Forward and backward substitution for solving lower or upper triangular systems

Solving Sparse Linear Systems. Forward and backward substitution for solving lower or upper triangular systems AMSC 6 /CMSC 76 Advanced Linear Numerical Analysis Fall 7 Direct Solution of Sparse Linear Systems and Eigenproblems Dianne P. O Leary c 7 Solving Sparse Linear Systems Assumed background: Gauss elimination

More information

A Parallel Implementation of the BDDC Method for Linear Elasticity

A Parallel Implementation of the BDDC Method for Linear Elasticity A Parallel Implementation of the BDDC Method for Linear Elasticity Jakub Šístek joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík Institute of Mathematics of the AS CR, Prague

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular Linear Systems Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign

More information

Exploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology

Exploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology Exploiting GPU Caches in Sparse Matrix Vector Multiplication Yusuke Nagasaka Tokyo Institute of Technology Sparse Matrix Generated by FEM, being as the graph data Often require solving sparse linear equation

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms Chapter 5 Vector and Matrix Products Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath Parallel

More information