Recovering Numerical Reproducibility in Hydrodynamics Simulations
1 23rd IEEE Symposium on Computer Arithmetic, July 2016, Santa Clara, USA. Recovering Numerical Reproducibility in Hydrodynamics Simulations. Philippe Langlois, Rafife Nheili, Christophe Denis. DALI, Université de Perpignan Via Domitia; LIRMM, UMR 5506 CNRS, Université de Montpellier; CMLA, ENS Cachan, France. 1 / 31
2-8 Recovering numerical reproducibility. Reproducibility: bitwise identical results for every p-parallel run, p ≥ 1. Reproducibility is not accuracy. Failures reported in numerical simulation for energy [10], weather science [2], molecular dynamics [9], fluid dynamics [8]. How to debug? How to test? How to validate? How to receive legal agreements? Execution: sequential, 2 procs, 4 procs, ..., p procs; the non-reproducible original code is turned into a reproducible code. 2 / 31
9 Hydrodynamics simulation. One industrial scale simulation code: simulation of free-surface flows in 1D-2D-3D hydrodynamics; open source Fortran, 4000 registered users, EDF R&D + an international consortium. Telemac 2D [3]: 2D hydrodynamics, Saint-Venant equations; finite element method, triangular element mesh, sub-domain decomposition for the parallel resolution. Mesh node unknowns: water depth (H) and velocity (U, V). 3 / 31
10 Telemac2D: the simplest gouttedo simulation. The gouttedo simulation test case: 2D simulation of a water drop falling in a square basin. Unknown: water depth, for a 0.2 s time step. Triangular mesh: 8978 elements and 4624 nodes. Expected: numerical reproducibility (time step = 1, 2, ...). Sequential versus parallel p = 2. 4 / 31
11-25 A white plot displays a non-reproducible value. Numerical reproducibility? Successive frames compare the sequential and parallel (p = 2) runs for time steps 1 to 15; white plots mark non-reproducible values, and by time step 15: NO numerical reproducibility! 5 / 31
26 Telemac2D: gouttedo NO numerical reproducibility! Sequential Parallel p = 2 6 / 31
27-29 Today's issues. Case study: industrial scale software opentelemac-mascaret; finite element simulation, domain decomposition, linear system solving; 2 modules: Tomawac, Telemac2D. Feasibility: how to recover reproducibility? What are the sources of non-reproducibility? Do existing techniques apply, and how easily? Compensation yields reproducibility here! Efficiency: how much to pay for reproducibility? An extra cost which decreases as the problem size increases: OK to debug, to validate and even to simulate! 7 / 31
30 Outline 1 Motivation 2 Reproducibility failure in a finite element simulation Sequential and parallel FE assembly Sources of non reproducibility in Telemac2D 3 Recovering reproducibility Reproducible parallel FE assembly Reproducible algebraic operations Reproducible conjugate gradient Reproducible Telemac2D 4 Efficiency 5 Conclusion 8 / 31
31-32 Parallel reduction and compensation techniques. Non-associative floating-point addition: the computed value depends on the operation order. A parallel reduction of undefined order generates a reproducibility failure: reducing a, b, c, d as ((a ⊕ b) ⊕ c) ⊕ d or as (a ⊕ b) ⊕ (c ⊕ d) may give different results. 9 / 31
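The non-associativity above can be checked directly, since Python floats are IEEE-754 doubles. A minimal illustration (not from the talk), with a left-to-right reduction next to a pairwise one:

```python
def sum_left(xs):
    """Left-to-right reduction: ((x0 + x1) + x2) + ..."""
    s = 0.0
    for x in xs:
        s = s + x
    return s

def sum_pairwise(xs):
    """Pairwise reduction: (x0 + x1) + (x2 + x3) + ..."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])

# Two orders, two results:
a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)   # 0.6
assert a != b
# The same happens for reduction trees of different shapes:
data = [1.0, 1e-16, 1e-16, -1.0]
assert sum_left(data) != sum_pairwise(data)
```

A parallel reduction whose tree shape depends on the number of processes behaves like switching between these two functions, which is exactly the reproducibility failure of the slide.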
33 Parallel reduction and compensation techniques. Non-associative floating-point addition: the computed value depends on the operation order, and a parallel reduction of undefined order generates a reproducibility failure. Compensate the rounding errors with error-free transformations: each floating-point addition also yields its exact rounding error (e_1, e_2, e_3 in one reduction tree, f_1, f_2, f_3 in the other), and adding the accumulated errors back restores one common value: ((a ⊕ b) ⊕ (c ⊕ d)) ⊕ ((e_1 ⊕ e_2) ⊕ e_3) = (((a ⊕ b) ⊕ c) ⊕ d) ⊕ ((f_1 ⊕ f_2) ⊕ f_3). This should be repeated for too ill-conditioned sums. 9 / 31
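The error-free transformation behind this is Knuth's TwoSum, which returns both the rounded sum and its exact rounding error. A minimal sketch of a compensated sum built on it (illustrative Python, not the Telemac code):

```python
def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and
    a + b = s + e exactly, for any IEEE-754 doubles a, b."""
    s = a + b
    bv = s - a          # part of b actually absorbed into s
    av = s - bv         # part of a actually absorbed into s
    e = (a - av) + (b - bv)
    return s, e

def compensated_sum(xs):
    """Sum the xs while accumulating the rounding errors e_i,
    then add the error term back at the end."""
    s, err = 0.0, 0.0
    for x in xs:
        s, e = two_sum(s, x)
        err += e
    return s + err

data = [1.0, 1e-16, -1.0]          # exact sum: 1e-16
assert sum(data) == 0.0            # naive sum loses the tiny term
assert compensated_sum(data) == 1e-16
```

As the slide notes, one level of compensation suffices only for mildly ill-conditioned sums; for worse conditioning the transformation must be applied repeatedly.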
34-35 Finite element assembly: the sequential case. The assembly step: V(i) = Σ_el W_el(i); compute the inner node value V(i) by accumulating the local W_el for every element el that contains node i.

The assembly loop (note the loop index indirection through IKLE):
for p = 1,np        // p: triangular local number (np = 3)
  for el = 1,nel
    i = IKLE(el,p)  // i: domain global number  <- loop index indirection
    V(i) = V(i) + W(el,p)
10 / 31
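The indirection through IKLE is what makes the accumulation order at a shared node depend on the element numbering. A small Python sketch of the same loop (the node and element numbering below are made up for illustration):

```python
def fe_assembly(nel, nnodes, IKLE, W):
    """Sequential FE assembly: V(i) = sum of W(el, p) over every
    element el whose p-th local vertex maps to global node i."""
    V = [0.0] * nnodes
    for p in range(3):              # np = 3 local vertices per triangle
        for el in range(nel):
            i = IKLE[el][p]         # loop index indirection
            V[i] += W[el][p]
    return V

# Two triangles sharing the edge (1, 2):
IKLE = [[0, 1, 2], [1, 3, 2]]
W = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
V = fe_assembly(2, 4, IKLE, W)
assert V == [1.0, 2.0, 2.0, 1.0]   # shared nodes 1 and 2 get two contributions
```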
36-37 Finite element assembly: the parallel case. Interface point assembly: communications and reductions; V(i) = Σ_{D_k} V(i) over the sub-domains D_k, k = 1...p, combining inner nodes and interface points. In exact arithmetic: V_{D1}(i) = b, V_{D2}(i) = c, and the interface point assembly yields V(i) = b + c = a, the sequential value. In floating-point arithmetic: V_{D1}(i) = b̂, V_{D2}(i) = ĉ, and the interface point assembly computes V(i) = b̂ ⊕ ĉ, which may differ from the sequential â. 11 / 31
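The gap between the exact picture and the floating-point one is easy to reproduce: split a node's contributions between two subdomains, sum each locally, then combine. A hedged sketch with values chosen to expose the effect:

```python
# One interface node receives four elementary contributions.
contribs = [1.0, 1e-16, 1e-16, -1.0]

# Sequential assembly: one left-to-right accumulation.
a_hat = 0.0
for w in contribs:
    a_hat += w                      # -> 0.0 (the 1e-16 terms are absorbed)

# Parallel assembly over two subdomains D1, D2, then one exchange.
b_hat = contribs[0] + contribs[1]   # V_D1(i)
c_hat = contribs[2] + contribs[3]   # V_D2(i)
v_par = b_hat + c_hat               # interface point assembly

assert a_hat == 0.0
assert v_par == 2.0 ** -53          # a different (here even more accurate) value
assert v_par != a_hat               # the reproducibility failure
```

Note that neither result is "wrong": both are valid floating-point evaluations of the same exact sum, which is precisely why reproducibility cannot be recovered by accuracy arguments alone.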
38 Basic ingredients 1 Motivation 2 Reproducibility failure in a finite element simulation Sequential and parallel FE assembly Sources of non reproducibility in Telemac2D 3 Recovering reproducibility Reproducible parallel FE assembly Reproducible algebraic operations Reproducible conjugate gradient Reproducible Telemac2D 4 Efficiency 5 Conclusion 11 / 31
39-40 Sources of non reproducibility in Telemac2D. Culprits, in theory: (1) the building step: interface point assembly; (2) the solving step (conjugate gradient): parallel matrix-vector and dot products. Culprits, in practice (optimization): the interface point assembly and the linear system solving are merged; element-by-element storage (EBE) of the FE matrix; EBE parallel matrix-vector product, no BLAS: everything is a vector, no matrix! Wave equation, mass-lumping and the associated algebraic transformations; system decoupling and many diagonal matrices: everything is a vector, no matrix! 12 / 31
41-42 Sources of non reproducibility in Telemac2D: the Telemac2D FE steps. Discretisation: mesh (elements, nodes), then FE assembly + algebraic computation. System equation A·X = C with the block matrix A = [A_hh A_hu A_hv; A_uh A_uu 0; A_vh 0 A_vv] acting on (H, U, V). Wave equation: A_2, A_3 are diagonal matrices, A_1 = A_hh − A_hu·A_2^{-1}·A_uh − A_hv·A_3^{-1}·A_vh and C_1 = B_h − A_hu·A_2^{-1}·B_u − A_hv·A_3^{-1}·B_v; interface point assembly of A_2, A_3, C_1. Conjugate gradient, with an interface point assembly of A_1·d in each iteration, yields the solution H. Diagonal resolutions C_2 = B_u − A_uh·H and C_3 = B_v − A_vh·H, with interface point assembly of C_2, C_3, yield the solutions U, V. Objective: correct the sources of non-reproducibility to compute a reproducible system and reproducible solutions. 13 / 31
43 Recovering reproducibility in a finite element resolution 1 Motivation 2 Reproducibility failure in a finite element simulation Sequential and parallel FE assembly Sources of non reproducibility in Telemac2D 3 Recovering reproducibility Reproducible parallel FE assembly Reproducible algebraic operations Reproducible conjugate gradient Reproducible Telemac2D 4 Efficiency 5 Conclusion 13 / 31
44 Recovering reproducibility in Telemac2D. Sources: the FE assembly (diagonal of the matrices and second members); the resolution (EBE matrix-vector and dot products); the wave equation (algebraic transformations and diagonal resolutions). Reproducible resolution, principles: replace each vector V by the pair [V, E_V], with V + E_V the compensated value; compute E_V in the FE assembly of V; propagate E_V over each operation on V; compensate all nodes while assembling the interface points; compensate the MPI parallel dot products that include an MPI reduction. 14 / 31
45 Reproducible FE assembly. i: any node; W_el(i): the contribution of every element el that contains i. Original FE assembly: V(i) = Σ_el W_el(i), computed as V(i) = W_el1(i) ⊕ W_el2(i) ⊕ ... ⊕ W_elni(i). Modified FE assembly: [V(i), E_V(i)] = ReprodAss(Σ_el W_el(i)): the same accumulation also yields its rounding errors e_1, e_2, ..., e_{ni−1}, and E_V(i) = e_1 + e_2 + ... + e_{ni−1}. 15 / 31
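A sketch of what ReprodAss computes per node: the usual accumulated value V(i) plus the sum E_V(i) of the rounding errors, obtained with TwoSum. (Illustrative Python; only the function name mirrors the slide.)

```python
def two_sum(a, b):
    """Error-free transformation: a + b = s + e exactly."""
    s = a + b
    bv = s - a
    av = s - bv
    return s, (a - av) + (b - bv)

def reprod_ass(contribs):
    """Accumulate the element contributions of one node, returning
    [V(i), E_V(i)]: the rounded value and its accumulated error."""
    v, ev = 0.0, 0.0
    for w in contribs:
        v, e = two_sum(v, w)
        ev += e
    return v, ev

# The compensated value v + ev no longer depends on the accumulation
# order (for this mildly conditioned sum):
order_a = [1.0, 1e-16, -1.0]
order_b = [1.0, -1.0, 1e-16]
va, ea = reprod_ass(order_a)
vb, eb = reprod_ass(order_b)
assert sum(order_a) != sum(order_b)     # naive results differ
assert va + ea == vb + eb == 1e-16      # compensated results are identical
```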
46 Reproducible interface point assembly. i: one interface point of D_1, D_2, ..., D_{k−1}, D_k. Original IP assembly: V(i) = Σ_{D_k} V(i); communicate the V_{D_i} to D_1, D_2, ... and compute V(i) = V_{D1}(i) ⊕ V_{D2}(i) ⊕ ... ⊕ V_{Dk−1}(i) ⊕ V_{Dk}(i), which generates the rounding errors δ_1, δ_2, ..., δ_{k−1}. Modified IP assembly: [V(i), E_V(i)] = ReprodAss_{D_k}([V(i), E_{V_{D_k}}(i)]); communicate the pairs [V_{D_i}, E_{V_{D_i}}] to D_1, D_2, ... and compute the same V(i) together with E_V(i) = (E_{V_{D1}}(i) + E_{V_{D2}}(i) + δ_1) + ... + (E_{V_{Dk−1}}(i) + E_{V_{Dk}}(i) + δ_{k−1}). Reproducibility: at the IP assembly step, [V, E_V] → V + E_V compensates every node value. 16 / 31
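Putting the two pieces together: each subdomain sends its pair (V_Dk, E_VDk), the values are combined with TwoSum (producing the δ terms), and the compensated result matches the sequential one bitwise. A hedged end-to-end sketch on toy data (a mildly conditioned sum, where one compensation level suffices):

```python
def two_sum(a, b):
    s = a + b
    bv = s - a
    av = s - bv
    return s, (a - av) + (b - bv)

def local_assembly(contribs):
    """Per-subdomain compensated assembly: returns (V_Dk, E_VDk)."""
    v, ev = 0.0, 0.0
    for w in contribs:
        v, e = two_sum(v, w)
        ev += e
    return v, ev

# Sequential reference over all four contributions of one interface node:
seq_v, seq_e = local_assembly([1.0, 1e-16, 1e-16, -1.0])

# Parallel: two subdomains, then the interface point assembly exchanges
# the (value, error) pairs and combines the values with TwoSum.
v1, e1 = local_assembly([1.0, 1e-16])       # subdomain D1
v2, e2 = local_assembly([1e-16, -1.0])      # subdomain D2
v, delta = two_sum(v1, v2)                  # delta_1 of the slide
ev = e1 + e2 + delta

assert v1 + v2 != seq_v                     # uncompensated values differ
assert v + ev == seq_v + seq_e == 2e-16     # compensated values agree bitwise
```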
47 Reproducible algebraic operations 1 Motivation 2 Reproducibility failure in a finite element simulation Sequential and parallel FE assembly Sources of non reproducibility in Telemac2D 3 Recovering reproducibility Reproducible parallel FE assembly Reproducible algebraic operations Reproducible conjugate gradient Reproducible Telemac2D 4 Efficiency 5 Conclusion 16 / 31
48 Algebraic operations: the vector case. Reproducible algebraic vector operations in opentelemac's included library BIEF: entry-wise vector ops (copy, opposite/inverse, add/sub, Hadamard product, ...), which also apply to diagonal matrices. Propagate the rounding errors so they can be compensated while assembling the interface points. Example, the Hadamard product: original version X, Y → V with V(i) = X(i)·Y(i); modified version [X, E_X], [Y, E_Y] → [V, E_V] with (V, e_V) = 2Prod(X, Y) and E_V = X·E_Y + Y·E_X + e_V, entry-wise. 17 / 31
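2Prod is the multiplicative error-free transformation. Without an FMA it can be written with Dekker/Veltkamp splitting, and the propagation rule of the slide then reads, per entry, E_V = X·E_Y + Y·E_X + e_V. A sketch (assuming no overflow in the splitting; illustrative names):

```python
def split(a):
    """Veltkamp splitting: a = hi + lo with half-width halves."""
    c = 134217729.0 * a        # splitter 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    """Dekker's 2Prod: a * b = p + e exactly (barring over/underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e

def hadamard(x, ex, y, ey):
    """Entry-wise product of [X, E_X] and [Y, E_Y]:
    V = fl(X * Y), E_V = X*E_Y + Y*E_X + e_V."""
    v, ev = [], []
    for xi, exi, yi, eyi in zip(x, ex, y, ey):
        p, e = two_prod(xi, yi)
        v.append(p)
        ev.append(xi * eyi + yi * exi + e)
    return v, ev

p, e = two_prod(0.1, 0.1)
assert p == 0.1 * 0.1 and e != 0.0   # 0.1 * 0.1 is inexact; e is its residue
```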
49 What is reproducible now? Most of the linear system: the FE assembly, the algebraic vector operations and the interface point assembly; except the matrix of the H system and its dependencies (the second members of the U and V systems). Next step: the conjugate gradient. Partially reproducible Telemac2D: diagonal resolutions C_2 = B_u − A_uh·H, C_3 = B_v − A_vh·H, with interface point assembly of C_2, C_3, yielding U, V; conjugate gradient with an interface point assembly of A_1·d in each iteration, yielding H; wave equation with A_2, A_3 diagonal matrices, A_1 = A_hh − A_hu·A_2^{-1}·A_uh − A_hv·A_3^{-1}·A_vh, C_1 = B_h − A_hu·A_2^{-1}·B_u − A_hv·A_3^{-1}·B_v, and interface point assembly of A_2, A_3, C_1. 18 / 31
50 Recovering reproducibility in a finite element resolution 1 Motivation 2 Reproducibility failure in a finite element simulation Sequential and parallel FE assembly Sources of non reproducibility in Telemac2D 3 Recovering reproducibility Reproducible parallel FE assembly Reproducible algebraic operations Reproducible conjugate gradient Reproducible Telemac2D 4 Efficiency 5 Conclusion 18 / 31
51-52 Towards a reproducible conjugate gradient, with A = [A_1, E_{A_1}] and B = C_1. Initialization: a given d_0; r_0 = A·X_0 − B; ρ_0 = (r_0, d_0) / (A·d_0, d_0); X_1 = X_0 − ρ_0·d_0. Iterations until the stopping criteria: r_m = r_{m−1} − ρ_{m−1}·A·d_{m−1}; d_m = r_m + ((r_m, r_m) / (r_{m−1}, r_{m−1}))·d_{m−1}; ρ_m = (r_m, d_m) / (d_m, A·d_m); X_{m+1} = X_m − ρ_m·d_m; finally X = H. Non-reproducibility sources: the EBE matrix-vector product; the dot product MPI reduction; the weighted dot products for interface points shared by p sub-domains. 19 / 31
53 The EBE storage: M = D + Σ_{el=1}^{nel} X_el, for nodes i ∈ [1, np], elements el ∈ [1, nel] and element vertices j, k, l. M is decomposed as (1) the assembled diagonal D[np] = [D(1), ..., D(np)] and (2) the elementary extra-diagonals X_el[6] = [X_jk(el), X_jl(el), X_kj(el), X_kl(el), X_lj(el), X_lk(el)] = [X_el(1), ..., X_el(6)]. 20 / 31
54 The EBE matrix-vector product R = M·V = D·V + Σ_{el=1}^{nel} X_el·V_el, in five steps: (1) R_1(i) = D(i)·V(i), i ∈ [1, np]; (2) X_el·V_el = [X_el(1)·V(k), X_el(2)·V(l), ..., X_el(6)·V(k)], el ∈ [1, nel]; (3) FE assembly of R_2[np]: R_2 = Σ_{el=1}^{nel} X_el·V_el; (4) R = R_1 + R_2; (5) IP assembly: R(i) = Σ_{D_k} R(i) for all interface points i. 21 / 31
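These steps can be checked against a dense product on a one-element toy mesh; with small integer-valued entries every operation is exact, so the two must agree. (The X_el entry ordering below is an assumption for illustration.)

```python
def ebe_matvec(D, elems, V):
    """R = D*V + sum over elements of X_el * V_el, followed by the
    FE assembly of the elementary contributions (steps 1-4)."""
    n = len(D)
    R = [D[i] * V[i] for i in range(n)]          # step 1: diagonal part
    for (j, k, l), X in elems:                   # X: 6 extra-diagonal entries
        R[j] += X[0] * V[k] + X[1] * V[l]        # steps 2-3: elementary
        R[k] += X[2] * V[j] + X[3] * V[l]        #   products, assembled
        R[l] += X[4] * V[j] + X[5] * V[k]
    return R

# One triangle (j, k, l) = (0, 1, 2); dense equivalent of M = D + X_el:
D = [2.0, 3.0, 4.0]
X = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
M = [[2.0, 1.0, 2.0],
     [3.0, 3.0, 5.0],
     [6.0, 7.0, 4.0]]
V = [1.0, 2.0, 3.0]
dense = [sum(M[i][j] * V[j] for j in range(3)) for i in range(3)]
assert ebe_matvec(D, [((0, 1, 2), X)], V) == dense == [10.0, 24.0, 32.0]
```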
55 Reproducible EBE matrix-vector product. Original: R = D·V + Σ_{el=1}^{nel} X_el·V_el, then R(i) = Σ_{D_k} R(i). Reproducible: [R, E_R] = [D, E_D]·V + ReprodAss(Σ_{el=1}^{nel} X_el·V_el), then R(i) = ReprodAss_{D_k}([R(i), E_R(i)]). Compensation: R + E_R. 22 / 31
56-57 A reproducible conjugate gradient, with A = [A_1, E_{A_1}] and B = C_1: the same conjugate gradient iteration, now built on reproducible operations. Non-reproducibility sources and solutions: a reproducible EBE matrix-vector product; the dot product MPI reduction replaced by a parallel compensated dot2; the interface point weights (1/k, 1/k, ..., 1/k) replaced by (1, 0, ..., 0). Reproducible operations yield reproducible results: the same errors for both the sequential and the parallel executions. 23 / 31
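The compensated dot product follows Ogita, Rump and Oishi's dot2: 2Prod for each product, TwoSum for the accumulation, and the errors folded in at the end. A sequential sketch (in the parallel version, the MPI reduction would combine the per-process (s, c) pairs the same way):

```python
def two_sum(a, b):
    s = a + b
    bv = s - a
    av = s - bv
    return s, (a - av) + (b - bv)

def two_prod(a, b):
    """Dekker's 2Prod via Veltkamp splitting (no FMA needed)."""
    p = a * b
    c = 134217729.0 * a
    ah = c - (c - a); al = a - ah
    c = 134217729.0 * b
    bh = c - (c - b); bl = b - bh
    return p, ((ah * bh - p) + ah * bl + al * bh) + al * bl

def dot2(x, y):
    """Compensated dot product (Ogita-Rump-Oishi dot2)."""
    s, c = 0.0, 0.0
    for xi, yi in zip(x, y):
        p, ep = two_prod(xi, yi)
        s, es = two_sum(s, p)
        c += es + ep
    return s + c

x = [1e8, 1.0, -1e8]
y = [1e8, 1.0, 1e8]                              # exact dot product: 1.0
assert sum(a * b for a, b in zip(x, y)) == 0.0   # naive result loses the 1.0
assert dot2(x, y) == 1.0
```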
58 Recovering reproducibility in a finite element resolution 1 Motivation 2 Reproducibility failure in a finite element simulation Sequential and parallel FE assembly Sources of non reproducibility in Telemac2D 3 Recovering reproducibility Reproducible parallel FE assembly Reproducible algebraic operations Reproducible conjugate gradient Reproducible Telemac2D 4 Efficiency 5 Conclusion 23 / 31
59 Reproducible Telemac2D. Execution: sequential, 2 procs, 4 procs, ..., p procs; the non-reproducible original code versus the reproducible code. [Figure: maximum relative error versus the sequential computation, per time step, for the gouttedo test case; compensated runs COMP P=2, COMP P=4, ...] 24 / 31
60-75 Reproducible gouttedo! Frames for time steps 1 to 15: the sequential and parallel runs p = 1, 2, 4, 8 are now bitwise identical. 25-26 / 31
76 Efficiency 1 Motivation 2 Reproducibility failure in a finite element simulation Sequential and parallel FE assembly Sources of non reproducibility in Telemac2D 3 Recovering reproducibility Reproducible parallel FE assembly Reproducible algebraic operations Reproducible conjugate gradient Reproducible Telemac2D 4 Efficiency 5 Conclusion 26 / 31
77 Runtime extra-cost for reproducible simulations. Measures, test cases and mesh sizes: hardware cycle counter (rdtsc); gouttedo mesh sizes 4624, 18225, ... nodes (scale factors 1, 4, 16); [table: #nodes, #proc, number of interface points]. Hardware and software environment: opentelemac v7.2; socket: Intel Xeon E (L3 cache = 20 MB), 2 sockets of 8 cores each; GNU Fortran 4.6.3, -O3; OpenMPI; generic Linux. 27 / 31
78 The core runtime extra-cost for reproducible gouttedo: no input/output steps, just building + solving. [Figure: #cycles versus #processors for the original and reproducible codes at the three mesh sizes, Telemac v7; the measured extra-cost factors range from about ×1.16 to ×2.2 and decrease as the mesh size increases.] 28 / 31
79 Time to conclude 1 Motivation 2 Reproducibility failure in a finite element simulation Sequential and parallel FE assembly Sources of non reproducibility in Telemac2D 3 Recovering reproducibility Reproducible parallel FE assembly Reproducible algebraic operations Reproducible conjugate gradient Reproducible Telemac2D 4 Efficiency 5 Conclusion 28 / 31
80-81 Conclusion: recovering numerical reproducibility. Industrial scale software opentelemac-mascaret: finite element simulation, domain decomposition, linear system solving; 2 reproducible modules: Tomawac, Telemac2D; integration in the next opentelemac version is current work. Feasibility: how to recover reproducibility? What are the sources of non-reproducibility? Do existing techniques apply, and how easily? A hand-made analysis of the computing workflow; compensation yields reproducibility here, and fits well with opentelemac's vector library; other existing techniques also apply, more or less easily [4]. 29 / 31
82-83 Conclusion: efficiency. How much to pay for reproducibility? An extra cost which decreases as the problem size increases: OK to debug, to validate and even to simulate! Reproducibility at a larger scale: the whole opentelemac software suite. Does it still work for complex, large and real-life simulations? The two FE test cases are significant enough to validate the methodology. Localizing the failure sources is difficult to automate, but applying the methodology is easy for the software developers. 30 / 31
84 Acknowledgment. Jean-Michel Hervouet, LNHE, EDF R&D, Chatou. Chemseddine Chohra, DALI/LIRMM, UPVD, Perpignan. 30 / 31
85 Références I
[1] J. W. Demmel and H. D. Nguyen. Fast reproducible floating-point summation. In Proc. 21st IEEE Symposium on Computer Arithmetic, Austin, Texas, USA, 2013.
[2] Y. He and C. Ding. Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications. J. Supercomput., 18, 2001.
[3] J.-M. Hervouet. Hydrodynamics of Free Surface Flows: Modelling with the Finite Element Method. John Wiley & Sons, 2007.
[4] P. Langlois, R. Nheili, and C. Denis. Numerical reproducibility: feasibility issues. In NTMS 2015: 7th IFIP International Conference on New Technologies, Mobility and Security, pages 1-5, Paris, France, July 2015. IEEE, IEEE COMSOC & IFIP TC6.5 WG.
[5] P. Langlois, R. Nheili, and C. Denis. Recovering numerical reproducibility in hydrodynamic simulations. In J. Hormigo, S. Oberman, and N. Revol, editors, 23rd IEEE International Symposium on Computer Arithmetic. IEEE Computer Society, Silicon Valley, USA, July 2016.
30 / 31
86 Références II
[6] T. Ogita, S. M. Rump, and S. Oishi. Accurate sum and dot product. SIAM J. Sci. Comput., 26(6):1955-1988, 2005.
[7] Open TELEMAC-MASCARET. v.7.0, release notes.
[8] R. W. Robey, J. M. Robey, and R. Aulwes. In search of numerical consistency in parallel programming. Parallel Comput., 37(4-5), 2011.
[9] M. Taufer, O. Padron, P. Saponaro, and S. Patel. Improving numerical reproducibility and stability in large-scale numerical simulations on GPUs. In IPDPS, pages 1-9. IEEE, 2010.
[10] O. Villa, D. G. Chavarría-Miranda, V. Gurumoorthi, A. Márquez, and S. Krishnamoorthy. Effects of floating-point non-associativity on numerical computations on massively multithreaded systems. In CUG 2009 Proceedings, pages 1-11, 2009.
30 / 31
VeriTracer: Context-enriched tracer for floating-point arithmetic analysis ARITH 25, Amherst MA USA, June 2018 Yohan Chatelain 1,5 Pablo de Oliveira Castro 1,5 Eric Petit 2,5 David Defour 3 Jordan Bieder
More informationAugmented Arithmetic Operations Proposed for IEEE
Augmented Arithmetic Operations Proposed for IEEE 754-2018 E. Jason Riedy Georgia Institute of Technology 25th IEEE Symposium on Computer Arithmetic, 26 June 2018 Outline History and Definitions Decisions
More informationINTRODUCTION TO OPENACC. Analyzing and Parallelizing with OpenACC, Feb 22, 2017
INTRODUCTION TO OPENACC Analyzing and Parallelizing with OpenACC, Feb 22, 2017 Objective: Enable you to to accelerate your applications with OpenACC. 2 Today s Objectives Understand what OpenACC is and
More informationThermomechanical and hydraulic industrial simulations using MUMPS at EDF
K u f Thermomechanical and hydraulic industrial simulations using MUMPS at EDF MUMPS User group meeting 15 april 2010 O.Boiteau, C.Denis, F.Zaoui MUltifrontal Massively Parallel sparse direct Solver 1
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview; Matrix Multiplication Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 21 Outline 1 Course
More information2.7 Numerical Linear Algebra Software
2.7 Numerical Linear Algebra Software In this section we will discuss three software packages for linear algebra operations: (i) (ii) (iii) Matlab, Basic Linear Algebra Subroutines (BLAS) and LAPACK. There
More informationESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report
ESPRESO ExaScale PaRallel FETI Solver Hybrid FETI Solver Report Lubomir Riha, Tomas Brzobohaty IT4Innovations Outline HFETI theory from FETI to HFETI communication hiding and avoiding techniques our new
More informationDense Matrix Algorithms
Dense Matrix Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003. Topic Overview Matrix-Vector Multiplication
More informationReproBLAS: Reproducible BLAS
ReproBLAS: Reproducible BLAS http://bebop.cs.berkeley.edu/reproblas/ James Demmel, Nguyen Hong Diep SC 13 - Denver, CO Nov 22, 2013 1 / 15 Reproducibility Reproducibility: obtaining bit-wise identical
More informationPerformance Analysis of BLAS Libraries in SuperLU_DIST for SuperLU_MCDT (Multi Core Distributed) Development
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Performance Analysis of BLAS Libraries in SuperLU_DIST for SuperLU_MCDT (Multi Core Distributed) Development M. Serdar Celebi
More informationSystem Demonstration of Spiral: Generator for High-Performance Linear Transform Libraries
System Demonstration of Spiral: Generator for High-Performance Linear Transform Libraries Yevgen Voronenko, Franz Franchetti, Frédéric de Mesmay, and Markus Püschel Department of Electrical and Computer
More informationModelling, migration, and inversion using Linear Algebra
Modelling, mid., and inv. using Linear Algebra Modelling, migration, and inversion using Linear Algebra John C. Bancroft and Rod Blais ABSRAC Modelling migration and inversion can all be accomplished using
More informationAnalysis of Matrix Multiplication Computational Methods
European Journal of Scientific Research ISSN 1450-216X / 1450-202X Vol.121 No.3, 2014, pp.258-266 http://www.europeanjournalofscientificresearch.com Analysis of Matrix Multiplication Computational Methods
More informationHigh-Performance Scientific Computing
High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org
More informationFigure 6.1: Truss topology optimization diagram.
6 Implementation 6.1 Outline This chapter shows the implementation details to optimize the truss, obtained in the ground structure approach, according to the formulation presented in previous chapters.
More informationApproaches to Parallel Implementation of the BDDC Method
Approaches to Parallel Implementation of the BDDC Method Jakub Šístek Includes joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík. Institute of Mathematics of the AS CR, Prague
More informationEfficient implementation of interval matrix multiplication
Efficient implementation of interval matrix multiplication Hong Diep Nguyen To cite this version: Hong Diep Nguyen. Efficient implementation of interval matrix multiplication. Para 2010: State of the Art
More informationQ. Wang National Key Laboratory of Antenna and Microwave Technology Xidian University No. 2 South Taiba Road, Xi an, Shaanxi , P. R.
Progress In Electromagnetics Research Letters, Vol. 9, 29 38, 2009 AN IMPROVED ALGORITHM FOR MATRIX BANDWIDTH AND PROFILE REDUCTION IN FINITE ELEMENT ANALYSIS Q. Wang National Key Laboratory of Antenna
More informationLecture 15: More Iterative Ideas
Lecture 15: More Iterative Ideas David Bindel 15 Mar 2010 Logistics HW 2 due! Some notes on HW 2. Where we are / where we re going More iterative ideas. Intro to HW 3. More HW 2 notes See solution code!
More informationOptimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor
Optimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor Intel K. K. E-mail: hirokazu.kobayashi@intel.com Yoshifumi Nakamura RIKEN AICS E-mail: nakamura@riken.jp Shinji Takeda
More informationMatching. Compare region of image to region of image. Today, simplest kind of matching. Intensities similar.
Matching Compare region of image to region of image. We talked about this for stereo. Important for motion. Epipolar constraint unknown. But motion small. Recognition Find object in image. Recognize object.
More informationExBLAS: Reproducible and Accurate BLAS Library
ExBLAS: Reproducible and Accurate BLAS Library Roman Iakymchuk, Sylvain Collange, David Defour, Stef Graillat To cite this version: Roman Iakymchuk, Sylvain Collange, David Defour, Stef Graillat. ExBLAS:
More informationImplicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC
Fourth Workshop on Accelerator Programming Using Directives (WACCPD), Nov. 13, 2017 Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC Takuma
More informationMAT 275 Laboratory 2 Matrix Computations and Programming in MATLAB
MATLAB sessions: Laboratory MAT 75 Laboratory Matrix Computations and Programming in MATLAB In this laboratory session we will learn how to. Create and manipulate matrices and vectors.. Write simple programs
More informationAn Improved Measurement Placement Algorithm for Network Observability
IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 16, NO. 4, NOVEMBER 2001 819 An Improved Measurement Placement Algorithm for Network Observability Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper
More informationGTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS. Kyle Spagnoli. Research EM Photonics 3/20/2013
GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS Kyle Spagnoli Research Engineer @ EM Photonics 3/20/2013 INTRODUCTION» Sparse systems» Iterative solvers» High level benchmarks»
More informationParallel Implementations of Gaussian Elimination
s of Western Michigan University vasilije.perovic@wmich.edu January 27, 2012 CS 6260: in Parallel Linear systems of equations General form of a linear system of equations is given by a 11 x 1 + + a 1n
More informationSolving Large Regression Problems using an Ensemble of GPU-accelerated ELMs
Solving Large Regression Problems using an Ensemble of GPU-accelerated ELMs Mark van Heeswijk 1 and Yoan Miche 2 and Erkki Oja 1 and Amaury Lendasse 1 1 Helsinki University of Technology - Dept. of Information
More informationTwo-Phase flows on massively parallel multi-gpu clusters
Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous
More informationDomain Decomposition: Computational Fluid Dynamics
Domain Decomposition: Computational Fluid Dynamics July 11, 2016 1 Introduction and Aims This exercise takes an example from one of the most common applications of HPC resources: Fluid Dynamics. We will
More informationEvaluation of sparse LU factorization and triangular solution on multicore architectures. X. Sherry Li
Evaluation of sparse LU factorization and triangular solution on multicore architectures X. Sherry Li Lawrence Berkeley National Laboratory ParLab, April 29, 28 Acknowledgement: John Shalf, LBNL Rich Vuduc,
More informationExam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3
UMEÅ UNIVERSITET Institutionen för datavetenskap Lars Karlsson, Bo Kågström och Mikael Rännar Design and Analysis of Algorithms for Parallel Computer Systems VT2009 June 2, 2009 Exam Design and Analysis
More informationChapter 14: Matrix Iterative Methods
Chapter 14: Matrix Iterative Methods 14.1INTRODUCTION AND OBJECTIVES This chapter discusses how to solve linear systems of equations using iterative methods and it may be skipped on a first reading of
More informationUnderstanding the performances of sparse compression formats using data parallel programming model
2017 International Conference on High Performance Computing & Simulation Understanding the performances of sparse compression formats using data parallel programming model Ichrak MEHREZ, Olfa HAMDI-LARBI,
More informationChapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition
Chapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition Nicholas J. Higham Pythagoras Papadimitriou Abstract A new method is described for computing the singular value decomposition
More informationScanning Real World Objects without Worries 3D Reconstruction
Scanning Real World Objects without Worries 3D Reconstruction 1. Overview Feng Li 308262 Kuan Tian 308263 This document is written for the 3D reconstruction part in the course Scanning real world objects
More informationSPH: Why and what for?
SPH: Why and what for? 4 th SPHERIC training day David Le Touzé, Fluid Mechanics Laboratory, Ecole Centrale de Nantes / CNRS SPH What for and why? How it works? Why not for everything? Duality of SPH SPH
More informationFast Primitives for Irregular Computations on the NEC SX-4
To appear: Crosscuts 6 (4) Dec 1997 (http://www.cscs.ch/official/pubcrosscuts6-4.pdf) Fast Primitives for Irregular Computations on the NEC SX-4 J.F. Prins, University of North Carolina at Chapel Hill,
More information3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs
3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs H. Knibbe, C. W. Oosterlee, C. Vuik Abstract We are focusing on an iterative solver for the three-dimensional
More informationBlocked Schur Algorithms for Computing the Matrix Square Root. Deadman, Edvin and Higham, Nicholas J. and Ralha, Rui. MIMS EPrint: 2012.
Blocked Schur Algorithms for Computing the Matrix Square Root Deadman, Edvin and Higham, Nicholas J. and Ralha, Rui 2013 MIMS EPrint: 2012.26 Manchester Institute for Mathematical Sciences School of Mathematics
More informationDomain Decomposition: Computational Fluid Dynamics
Domain Decomposition: Computational Fluid Dynamics December 0, 0 Introduction and Aims This exercise takes an example from one of the most common applications of HPC resources: Fluid Dynamics. We will
More informationAlgorithms, System and Data Centre Optimisation for Energy Efficient HPC
2015-09-14 Algorithms, System and Data Centre Optimisation for Energy Efficient HPC Vincent Heuveline URZ Computing Centre of Heidelberg University EMCL Engineering Mathematics and Computing Lab 1 Energy
More informationAcknowledgements. Prof. Dan Negrut Prof. Darryl Thelen Prof. Michael Zinn. SBEL Colleagues: Hammad Mazar, Toby Heyn, Manoj Kumar
Philipp Hahn Acknowledgements Prof. Dan Negrut Prof. Darryl Thelen Prof. Michael Zinn SBEL Colleagues: Hammad Mazar, Toby Heyn, Manoj Kumar 2 Outline Motivation Lumped Mass Model Model properties Simulation
More informationCorrected/Updated References
K. Kashiyama, H. Ito, M. Behr and T. Tezduyar, "Massively Parallel Finite Element Strategies for Large-Scale Computation of Shallow Water Flows and Contaminant Transport", Extended Abstracts of the Second
More informationExercise Set Decide whether each matrix below is an elementary matrix. (a) (b) (c) (d) Answer:
Understand the relationships between statements that are equivalent to the invertibility of a square matrix (Theorem 1.5.3). Use the inversion algorithm to find the inverse of an invertible matrix. Express
More informationx = 12 x = 12 1x = 16
2.2 - The Inverse of a Matrix We've seen how to add matrices, multiply them by scalars, subtract them, and multiply one matrix by another. The question naturally arises: Can we divide one matrix by another?
More informationEstimation de la reproductibilité numérique dans les environnements hybrides CPU-GPU
Estimation de la reproductibilité numérique dans les environnements hybrides CPU-GPU Fabienne Jézéquel 1, Jean-Luc Lamotte 1 & Issam Said 2 1 LIP6, Université Pierre et Marie Curie 2 Total & LIP6, Université
More informationDeep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur
Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur Lecture - 05 Classification with Perceptron Model So, welcome to today
More informationFirst Steps of YALES2 Code Towards GPU Acceleration on Standard and Prototype Cluster
First Steps of YALES2 Code Towards GPU Acceleration on Standard and Prototype Cluster YALES2: Semi-industrial code for turbulent combustion and flows Jean-Matthieu Etancelin, ROMEO, NVIDIA GPU Application
More informationcomputational Fluid Dynamics - Prof. V. Esfahanian
Three boards categories: Experimental Theoretical Computational Crucial to know all three: Each has their advantages and disadvantages. Require validation and verification. School of Mechanical Engineering
More informationDevelopment of a Maxwell Equation Solver for Application to Two Fluid Plasma Models. C. Aberle, A. Hakim, and U. Shumlak
Development of a Maxwell Equation Solver for Application to Two Fluid Plasma Models C. Aberle, A. Hakim, and U. Shumlak Aerospace and Astronautics University of Washington, Seattle American Physical Society
More informationA parallel computing framework and a modular collaborative cfd workbench in Java
Advances in Fluid Mechanics VI 21 A parallel computing framework and a modular collaborative cfd workbench in Java S. Sengupta & K. P. Sinhamahapatra Department of Aerospace Engineering, IIT Kharagpur,
More informationHigh-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs
High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs Gordon Erlebacher Department of Scientific Computing Sept. 28, 2012 with Dimitri Komatitsch (Pau,France) David Michea
More informationParallel Implementation of QRD Algorithms on the Fujitsu AP1000
Parallel Implementation of QRD Algorithms on the Fujitsu AP1000 Zhou, B. B. and Brent, R. P. Computer Sciences Laboratory Australian National University Canberra, ACT 0200 Abstract This paper addresses
More informationA Scalable, Numerically Stable, High- Performance Tridiagonal Solver for GPUs. Li-Wen Chang, Wen-mei Hwu University of Illinois
A Scalable, Numerically Stable, High- Performance Tridiagonal Solver for GPUs Li-Wen Chang, Wen-mei Hwu University of Illinois A Scalable, Numerically Stable, High- How to Build a gtsv for Performance
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More informationNOISE PROPAGATION FROM VIBRATING STRUCTURES
NOISE PROPAGATION FROM VIBRATING STRUCTURES Abstract R. Helfrich, M. Spriegel (INTES GmbH, Germany) Noise and noise exposure are becoming more important in product development due to environmental legislation.
More informationMulticore Challenge in Vector Pascal. P Cockshott, Y Gdura
Multicore Challenge in Vector Pascal P Cockshott, Y Gdura N-body Problem Part 1 (Performance on Intel Nehalem ) Introduction Data Structures (1D and 2D layouts) Performance of single thread code Performance
More informationME964 High Performance Computing for Engineering Applications
ME964 High Performance Computing for Engineering Applications Outlining Midterm Projects Topic 3: GPU-based FEA Topic 4: GPU Direct Solver for Sparse Linear Algebra March 01, 2011 Dan Negrut, 2011 ME964
More informationS0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS
S0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS John R Appleyard Jeremy D Appleyard Polyhedron Software with acknowledgements to Mark A Wakefield Garf Bowen Schlumberger Outline of Talk Reservoir
More informationParallel edge-based implementation of the finite element method for shallow water equations
Parallel edge-based implementation of the finite element method for shallow water equations I. Slobodcicov, F.L.B. Ribeiro, & A.L.G.A. Coutinho Programa de Engenharia Civil, COPPE / Universidade Federal
More informationSIPE: Small Integer Plus Exponent
SIPE: Small Integer Plus Exponent Vincent LEFÈVRE AriC, INRIA Grenoble Rhône-Alpes / LIP, ENS-Lyon Arith 21, Austin, Texas, USA, 2013-04-09 Introduction: Why SIPE? All started with floating-point algorithms
More informationContents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet
Contents 2 F10: Parallel Sparse Matrix Computations Figures mainly from Kumar et. al. Introduction to Parallel Computing, 1st ed Chap. 11 Bo Kågström et al (RG, EE, MR) 2011-05-10 Sparse matrices and storage
More informationSolving Sparse Linear Systems. Forward and backward substitution for solving lower or upper triangular systems
AMSC 6 /CMSC 76 Advanced Linear Numerical Analysis Fall 7 Direct Solution of Sparse Linear Systems and Eigenproblems Dianne P. O Leary c 7 Solving Sparse Linear Systems Assumed background: Gauss elimination
More informationA Parallel Implementation of the BDDC Method for Linear Elasticity
A Parallel Implementation of the BDDC Method for Linear Elasticity Jakub Šístek joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík Institute of Mathematics of the AS CR, Prague
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular Linear Systems Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign
More informationExploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology
Exploiting GPU Caches in Sparse Matrix Vector Multiplication Yusuke Nagasaka Tokyo Institute of Technology Sparse Matrix Generated by FEM, being as the graph data Often require solving sparse linear equation
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 5 Vector and Matrix Products Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath Parallel
More information