Simulieren geht über Probieren
1 Simulieren geht über Probieren Ulrich Rüde Lehrstuhl für Informatik 10 (Systemsimulation) Universität Erlangen-Nürnberg www10.informatik.uni-erlangen.de Ulm, 17. Mai
2 Overview Motivation Three examples Material science and process technology: Metal Foams Nano Technology Biomedical Technology: The inverse EEG problem High Performance Computing Conclusions 2
3 The Two (now Three) Principles of Science: Experiment (observation and prototypes: the empirical sciences), Theory (mathematical models, differential equations: Newton), and Computational Science (simulation, optimization: a quantitative virtual reality) 3
4 Part IIa Metal Foams In collaboration with the Institut für Werkstoffwissenschaften Lehrstuhl Werkstoffkunde und Technologie der Metalle WTM (R.F. Singer, C. Körner) 4
5 Examples of Foams: glass, ceramics, metals, polymers. Structural properties: stiffness, energy absorption, damping. Functional properties: burners, shock absorbers, heat exchangers, batteries; large, dynamic surface expansion 5
6 Towards Simulating Metal Foams Bubble growth, coalescence, collapse, drainage, rheology, etc. are still poorly understood Simulation as a tool to better understand, control and optimize the process 6
7 The Lattice-Boltzmann Method Real valued representation of particles Discrete velocities and positions Algorithm consists of two steps: Stream Collide 7
8 The Stream Step Move particle distribution functions along corresponding velocity vector Normalized time step, cell size and particle speed 8
9 The Collide Step Accounts for collisions of particles during movement Weigh equilibrium velocities and velocities from streaming depending on fluid viscosity 9
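The two steps above can be sketched as a minimal D2Q9 toy kernel in C. This is an illustration only: the grid size, data layout, periodic boundaries, and the BGK collision with relaxation parameter omega are assumptions made here, not the code discussed in the talk.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

#define NX 8
#define NY 8
#define Q 9

/* D2Q9 discrete velocities and equilibrium weights */
static const int cx[Q] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
static const int cy[Q] = {0, 0, 1, 0, -1, 1, 1, -1, -1};
static const double w[Q] = {4.0/9, 1.0/9, 1.0/9, 1.0/9, 1.0/9,
                            1.0/36, 1.0/36, 1.0/36, 1.0/36};

/* Collide: relax every distribution toward the local equilibrium.
   omega = 1/tau is determined by the fluid viscosity. */
void collide(double f[NY][NX][Q], double omega) {
    for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x) {
            double rho = 0.0, ux = 0.0, uy = 0.0;
            for (int i = 0; i < Q; ++i) {
                rho += f[y][x][i];
                ux  += cx[i] * f[y][x][i];
                uy  += cy[i] * f[y][x][i];
            }
            ux /= rho; uy /= rho;
            double usq = ux * ux + uy * uy;
            for (int i = 0; i < Q; ++i) {
                double cu  = cx[i] * ux + cy[i] * uy;
                double feq = w[i] * rho * (1.0 + 3.0*cu + 4.5*cu*cu - 1.5*usq);
                f[y][x][i] += omega * (feq - f[y][x][i]);
            }
        }
}

/* Stream: move each distribution one cell along its velocity vector
   (normalized time step and cell size; periodic boundaries for brevity). */
void stream(double f[NY][NX][Q]) {
    static double tmp[NY][NX][Q];
    for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x)
            for (int i = 0; i < Q; ++i)
                tmp[(y + cy[i] + NY) % NY][(x + cx[i] + NX) % NX][i] = f[y][x][i];
    memcpy(f, tmp, sizeof tmp);
}
```

A fluid initialized at the resting equilibrium (f_i = w_i everywhere) is a fixed point of both steps, which makes a convenient sanity check.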
10 Free surfaces with LBM Metal foams contain huge gas volumes Only simulate and track the fluid motion Compute boundary conditions at the free surface 10
11 Boundary Conditions Problem: Missing distribution functions at interface cells after streaming! Liquid Gas Reconstruction such that macroscopic boundary conditions are satisfied. Körner et al. Lattice Boltzmann Model for Free Surface Flow, to be published in Journal of Computational Physics 11
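A sketch of the reconstruction idea: the missing incoming distribution is rebuilt from equilibria evaluated at the gas density and the interface velocity. The concrete formula below follows the spirit of the Körner et al. scheme but is an illustrative assumption, not a transcription of their code.

```c
#include <assert.h>
#include <math.h>

/* D2Q9 equilibrium distribution for direction (cx_i, cy_i) with weight w_i */
static double feq(double w_i, double cx_i, double cy_i,
                  double rho, double ux, double uy) {
    double cu  = cx_i * ux + cy_i * uy;
    double usq = ux * ux + uy * uy;
    return w_i * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq);
}

/* After streaming, the distribution arriving from a gas cell along opp(i)
   is unknown. Reconstruct it so the interface sees the gas density rho_g
   and the interface velocity (ux, uy); f_i is the known outgoing value.
   Note opp(i) has the same weight w_i and velocity (-cx_i, -cy_i). */
double reconstruct_missing(double f_i, double w_i,
                           double cx_i, double cy_i,
                           double rho_g, double ux, double uy) {
    return feq(w_i,  cx_i,  cy_i, rho_g, ux, uy)
         + feq(w_i, -cx_i, -cy_i, rho_g, ux, uy)
         - f_i;
}
```

At rest (u = 0, rho_g = 1) the reconstruction returns the equilibrium value w_i, as it should.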
12 Curvature calculation (version I) Alternative approaches: Integrate normals over surface (weighted triangles) Level set methods (track surface as implicit function) 12
13 Free surface flow: Breaking Dam [Video] 13
14 Visualization Ray-tracing Refraction Reflection Caustics About 15 Min per frame = 1 day for 4 secs About same compute time as flow simulation 14
15 Rising Bubbles [Video] 15
16 More Rising Bubbles [Video] 16
17 Simulation Verification by Experiment [Video] Simulation and Experiment: Diplomarbeit N. Thürey 17
18 Verification for bubble dynamics (C. Körner). Stokes law: climbing rate of a bubble exposed to gravity (ideal bubble, no boundaries, equilibrium state). [Video and plot of climb rate vs. distance in l.u.; velocity axis 0.000 to 0.005] Parameters: R = 8, τ = 0.74, g = 10⁻⁴, σ = 2·10⁻². The relative error is a function of the system size. 18
19 True Foams with Disjoining Pressure [Videos] 19
20 Data Set: Pulsating Blood Flow at an Aneurysm. CE Elite Master Thesis: Jan Götz [Video] 20
21 Data Set: Pulsating Blood Flow at an Aneurysm. Master Thesis: Jan Götz. In collaboration with Neuroradiology (Prof. Dörfler): image processing, simulation, fluid mechanics [Video] 21
22 Part III High Performance Computing 22
23 A little quiz... 1 Kflops = 10³, 1 Mflops = 10⁶, 1 Gflops = 10⁹, 1 Tflops = 10¹², 1 Pflops = 10¹⁵ floating point operations per second. What is the speed of your PC? What is the speed of the fastest computer currently available (and where is it located)? What was the speed of the fastest computer in 1995? 2000? 2005? 23
24 A little quiz... 1 Kflops = 10³, 1 Mflops = 10⁶, 1 Gflops = 10⁹, 1 Tflops = 10¹², 1 Pflops = 10¹⁵ floating point operations per second. What is the speed of your PC? Probably between 1 and 6.5 GFlops 24
25 A little quiz... 1 Kflops = 10³, 1 Mflops = 10⁶, 1 Gflops = 10⁹, 1 Tflops = 10¹², 1 Pflops = 10¹⁵ floating point operations per second. What is the speed of your PC? What is the speed of the fastest computer currently available (and where is it located)? What was the speed of the fastest computer in 1995? 2000? 2005? 25
26 A little quiz... 1 Kflops = 10³, 1 Mflops = 10⁶, 1 Gflops = 10⁹, 1 Tflops = 10¹², 1 Pflops = 10¹⁵ floating point operations per second. What is the speed of your PC? What is the speed of the fastest computer currently available (and where is it located)? 367 Tflops; it is a Blue Gene/L in Livermore, California, with > processors 26
27 A little quiz... 1 Kflops = 10³, 1 Mflops = 10⁶, 1 Gflops = 10⁹, 1 Tflops = 10¹², 1 Pflops = 10¹⁵ floating point operations per second. What is the speed of your PC? What is the speed of the fastest computer currently available (and where is it located)? What was the speed of the fastest computer in 1995? 2000? 2005? 27
28 A little quiz... 1 Kflops = 10³, 1 Mflops = 10⁶, 1 Gflops = 10⁹, 1 Tflops = 10¹², 1 Pflops = 10¹⁵ floating point operations per second. What is the speed of your PC? What is the speed of the fastest computer currently available (and where is it located)? What was the speed of the fastest computer in 1995? 2000? 2005? … TFlops, 12.3 TFlops, 367 TFlops. ... and how much has the speed of cars/airplanes/... improved in the same time? Additional question: when do you expect computers to exceed 1 PFlops? 28
29 LSS-Cluster (Fujitsu-Siemens): Compute nodes (8×4 CPUs); CPU: AMD Opteron GHz, max. 4.4 GFlops; RAM: 16 GByte. Interactive nodes (9×2 CPUs); CPU: AMD Opteron 248. High-speed network: InfiniBand, 10 GBit/s 29
30 Architecture example: Our Pet Dinosaur. Hitachi SR 8000 at the Leibniz-Rechenzentrum der Bayerischen Akademie der Wissenschaften; 8 processors and 8 GB per node. Performance: 1344 CPUs (168×8), 12 GFlop/node, 2016 GFlop total; Linpack: 1645 GFlop (82% of theoretical peak). Very sensitive to data structures. (No. 5 at time of installation in 2000.) To be replaced by a 6000-processor SGI in 1H 2006; upgrade to >70 TFlop in 2007 30
33 Supercomputer Performance TOP 500 [Chart] 33
34 Moore's Law in Semiconductor Technology (F. Hossfeld). [Chart: transistors per die vs. year, from 1K to 1G, for DRAM and Intel microprocessors (Pentium, Pentium Pro, Merced); growth rates of 52% and 42% per year]
35 Semiconductor Technology. [Chart: information density (atoms per bit) and energy dissipation (energy per logic operation, in pico-joules, down to kT) vs. year] Information Density & Energy Dissipation (adapted by F. Hossfeld from C. P. Williams et al., 1998) 35
36 Parallelization of the LBM Code. Standard LBM code in C (1-D partitioning): excellent performance on a single SR8000 node; almost linear speed-up; large partitions favorable. Performance on the SR8000: ca. 30% of peak performance 36
37 Parallelization, Standard LBM code: Scalability. Largest simulation: 1.08×10⁹ cells, 370 GByte memory. Communication cost is high because of the large data volume (64 MByte). Efficiency ~75%. Dissertation T. Pohl (2006) 37
38 Parallelization Free surface LBM-Code Standard LBM 1 sweep through grid Free surface LBM 5 sweeps through grid Cell type changes, Closed boundary for bubbles, Initialization of modified cells, Mass balance correction 38
39 Parallelization Free surface LBM-Code: Standard LBM 1 sweep through grid 1 row of ghost nodes Free surface LBM 5 sweeps through grid 4 rows of ghost nodes 39
40 Performance: Standard LBM code vs. free-surface LBM code. Performance is lousy on a single node! Conditionals: 2.9 (standard LBM) vs. 51 (free-surface LBM). Pentium 4: almost no degradation (~10%). SR 8000: enormous degradation (pseudo-vectorization requires predictable jumps) 40
41 Structured vs. Unstructured Grids (on Hitachi SR 8000). [Chart: MFlops rates for matrix-vector multiplication vs. number of unknowns (up to 2,146,689), comparing stencil-based gridlib/hhg on one node with highly tuned JDS results for sparse matrices] (courtesy of G. Wellein, RRZE Erlangen) 41
42 Refinement example Input Grid 42
43 Refinement example Refinement Level one 43
44 Refinement example Refinement Level Two 44
45 Refinement example Structured Interior 45
46 Refinement example Structured Interior 46
47 Refinement example Edge Interior 47
48 Refinement example Edge Interior 48
49 HHG: Parallel Scalability. [Table: #Procs (from 64), #DOFs ×10⁶, #elements ×10⁶, #input elements, GFLOP/s, and time in seconds; numeric columns garbled in transcription] Parallel scalability of a Poisson problem discretized by tetrahedral finite elements. Machine: SGI Altix (Itanium). B. Bergen, F. Hülsemann, U. Rüde: Is 1.7·10¹⁰ unknowns the largest finite element system that can be solved today? In SuperComputing, Nov
50 Conclusions (1): High performance simulation still requires heroic programming, but we are on the way to making supercomputers more generally usable. Parallel programming is easy, node performance is difficult (B. Gropp). Which architecture? ASCI-type: custom CPU, massively parallel cluster of SMPs; nobody has been able to show that these machines scale efficiently, except on a few very special applications and with enormous human effort. Earth-Simulator-type: vector CPU, as many CPUs as affordable; impressive performance on vectorizable code, but this needs checking with more demanding data and algorithm structures. Hitachi class: modified custom CPU, cluster of SMPs; excellent performance on some codes, but unexpected slowdowns on others; too exotic to have a sufficiently large software base. Others: BlueGene, Cray X1, multithreading, PIM, reconfigurable, quantum computing, ... 50
51 Conclusions (2): Which data structures? Structured (inflexible), unstructured (slow), HHG (high development effort: even the prototype is 50 K lines of code), meshless (useful in niches). Where are we going? The end of Moore's law; nobody builds CPUs with HPC-specific requirements high on the list of priorities; petaflops means 100,000 processors, and we can hardly handle 1000. It's the locality, stupid! The memory wall: latency and bandwidth. Distinguish between algorithms where the control flow is data independent (latency-hiding techniques such as pipelining and prefetching can help) and those where it is data dependent. 51
52 In the Future? What's beyond Moore's Law? 52
53 Part VI Outlook: Other applications 3D-Animation Computational Steering Real-Time Simulation 53
54 Near-Real-Time Free-Surface LBM (N. Thürey) [Video] 54
55 Free-Surface LBM with Adaptive Refinement (N. Thürey) [Video] High-resolution animations. Adaptive refinement/coarsening. Visualization with a raytracer. Fluid simulation in Blender 2.4 ( ). Blender: 3D modeling program, freely available: 55
56 Acknowledgements. Collaborators in Erlangen: WTM, LSE, LSTM, LGDV, RRZE, Neurozentrum, Radiologie, etc.; especially for foams: C. Körner (WTM). International: Utah, Technion, Constanta, Ghent, Boulder, ... Dissertation projects: U. Fabricius (AMG methods and software engineering for parallelization), C. Freundl (parallel expression templates for PDE solvers), J. Härtlein (expression templates for FE applications), N. Thürey (LBM, free surfaces), T. Pohl (parallel LBM), ... and 6 more; 19 Diplom/Master theses; Studien/Bachelor theses. Especially for performance analysis and optimization of the LBM: J. Wilke, K. Iglberger, S. Donath ... and 23 more. Funding: KONWIHR, DFG, NATO, BMBF, Elitenetzwerk Bayern. Bavarian Graduate School in Computational Engineering (with TUM, since 2004). Special international PhD program: Identifikation, Optimierung und Steuerung für technische Anwendungen (with Bayreuth and Würzburg), to start Jan 56
57 Talk is Over [Video] Please wake up! 57
58 The Lattice-Boltzmann Method Based on cellular automata Introduced by von Neumann around 1940 Famous: Conway s Game of Life Complex system with simple rules Regular grid Local rules specifying time evolution Intrinsically parallel for model & simulation, similar to elliptic PDE solvers 58
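Conway's Game of Life, named above, makes the cellular-automaton pattern concrete: a regular grid, simple local rules for the time evolution, and intrinsic parallelism. A minimal C sketch (the grid size and periodic boundaries are choices made here for brevity):

```c
#include <assert.h>
#include <string.h>

#define N 8   /* small periodic grid for illustration */

/* One time step of Conway's Game of Life: a cell is alive in the next
   generation if it has exactly 3 live neighbors, or if it is alive and
   has exactly 2. Every cell is updated from purely local information. */
void life_step(int g[N][N]) {
    int next[N][N];
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < N; ++x) {
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if (dx || dy)
                        n += g[(y + dy + N) % N][(x + dx + N) % N];
            next[y][x] = (n == 3) || (g[y][x] && n == 2);
        }
    memcpy(g, next, sizeof next);
}
```

As with the LBM, the whole grid can be updated in parallel, since each cell reads only its neighbors' old states.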
59 The Lattice-Boltzmann Method Weakly compressible approximation of the Navier-Stokes equations Easy implementation Applicable for small Mach numbers (< 0.1) Easy to adapt, e.g. for Complicated or time-varying geometries Free surfaces Additional physical and chemical effects 59
60 LBM Demonstration (Java applet) file:///users/ruede/doc/lehr/vorles/ws03/hppt/lbm/jlb-comp/start.html 60
61 Free surface implementation Before stream step, compute mass exchange across cell boundaries for interface cells Calculate bubble volumes and pressure Surface curvature for surface tension Change topology if interface cells become full or empty keep layer of interface cells closed 61
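The mass-exchange computation above can be illustrated for a single link between an interface cell and one neighbor. The fill-level weighting shown is a plausible sketch of the idea (partially filled cells exchange proportionally less mass), not necessarily the exact scheme used in the talk's code.

```c
#include <assert.h>

/* Per-link mass exchange for an interface cell (hedged sketch):
   f_in_from_neighbor : neighbor's distribution streaming toward this cell
   f_out_to_neighbor  : this cell's distribution streaming toward the neighbor
   fill_here, fill_neighbor : fill levels in [0, 1] of the two cells
   Returns the signed mass gained by this cell along this link. */
double mass_exchange(double f_in_from_neighbor, double f_out_to_neighbor,
                     double fill_here, double fill_neighbor) {
    double k = 0.5 * (fill_here + fill_neighbor);
    return k * (f_in_from_neighbor - f_out_to_neighbor);
}
```

Summing this quantity over all links of an interface cell before the stream step tracks its mass; when the accumulated fill level reaches 1 or 0, the cell converts to fluid or gas, which is exactly the topology change described above.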
62 Surface Tension (Vers. 2). Marching-cube surface triangulation; compute a curvature for each triangle. [Equation and figure garbled in transcription: curvature obtained from the triangle normals n₁, n₂, n₃ and the area change δA/δV over the triangulated surface] Associate with each LBM cell the average curvature of its triangles. Complicated, but beats level sets for our applications (mass conservation). 62
63 Nano Technology. Curved boundaries: particles approximated with spheres. Improve the accuracy of LBM simulations by using curved boundary conditions. Standard no-slip: reflect DFs at the cell boundary. More accurate: take the distance to the boundary surface into account, then interpolate DFs accordingly 63
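The "take the distance into account" idea can be sketched as a linear interpolated bounce-back in the spirit of Bouzidi et al.; the slide does not name a concrete scheme, so the formula below is an assumption for illustration.

```c
#include <assert.h>
#include <math.h>

/* Linear interpolated bounce-back (Bouzidi-style sketch).
   q in (0, 1] is the fraction of the link between the cell center and the
   wall along direction i.
   f_c   : post-collision distribution at the boundary cell, direction i
   f_up  : direction i at the upstream neighbor (x - c_i)
   f_opp : post-collision distribution at the boundary cell, direction opp(i)
   Returns the reflected distribution f_opp at the boundary cell. */
double bounce_back_q(double q, double f_c, double f_up, double f_opp) {
    if (q < 0.5)
        return 2.0 * q * f_c + (1.0 - 2.0 * q) * f_up;
    return f_c / (2.0 * q) + (2.0 * q - 1.0) / (2.0 * q) * f_opp;
}
```

Both branches agree at q = 0.5, where the scheme reduces to the standard mid-link bounce-back that reflects the distribution unchanged.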
64 What are hierarchical hybrid grids? Standard geometric multigrid approach: Purely unstructured input grid resolves geometry of problem domain Patch-wise regular refinement applied repeatedly to every cell of the coarse grid generates nested grid hierarchies naturally suitable for geometric multigrid algorithms New: Modify storage formats and operations on the grid to reflect the generated regular substructures 64
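The patch-wise regular refinement has a simple consequence for problem sizes: each level splits every tetrahedron into 8 children, so element counts grow by a factor of 8 per level over the unstructured input grid. A trivial C helper makes this concrete (the function name is ours):

```c
#include <assert.h>

/* Number of tetrahedra after `levels` rounds of patch-wise regular
   refinement of an input grid: each level multiplies the count by 8. */
long long elements_after_refinement(long long input_elements, int levels) {
    long long e = input_elements;
    for (int k = 0; k < levels; ++k)
        e *= 8;   /* regular refinement: 1 tet -> 8 tets */
    return e;
}
```

This is why a modest unstructured input grid reaches billions of elements after only a handful of levels, while the regular substructure inside each coarse cell keeps the storage and operations structured.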
65 What's the new thing here? Hierarchical hybrid grids (HHG) are not yet another block-structured grid: HHG are more flexible (unstructured, hybrid input grids). They are not yet another unstructured geometric multigrid package: HHG achieve better performance; an unstructured treatment of regular regions does not improve performance. 65
66 Simulation is performance-hungry and memory-intensive: parallel supercomputing is required 66
67 Current Challenge: Parallelism on all levels and The Memory Wall Parallel computing is easy, good (single) processor performance is difficult (B. Gropp, Argonne) There has been no significant progress in High Performance Computing over the past 5 years (H. Simon, NERSC) Instruction level parallelism Memory bandwidth and latency are the limiting factors Cache-aware algorithms Conventional complexity measures (based on operation count) are becoming increasingly unrealistic 67
68 LSS Cluster-Computer: Fujitsu-Siemens HPC Line. Programming methods: cache optimization, C++ expression templates, (parallel) algorithms. Cooperations in material sciences; mechanical, electrical, and chemical engineering; medical technology ... 68
SIAM PP 2014 High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters C. Riesinger, A. Bakhtiari, M. Schreiber Technische Universität München February 20, 2014
More informationOptimization of HOM Couplers using Time Domain Schemes
Optimization of HOM Couplers using Time Domain Schemes Workshop on HOM Damping in Superconducting RF Cavities Carsten Potratz Universität Rostock October 11, 2010 10/11/2010 2009 UNIVERSITÄT ROSTOCK FAKULTÄT
More informationGPU Cluster Computing for FEM
GPU Cluster Computing for FEM Dominik Göddeke Sven H.M. Buijssen, Hilmar Wobker and Stefan Turek Angewandte Mathematik und Numerik TU Dortmund, Germany dominik.goeddeke@math.tu-dortmund.de GPU Computing
More informationFree Surface Flow Simulations
Free Surface Flow Simulations Hrvoje Jasak h.jasak@wikki.co.uk Wikki Ltd. United Kingdom 11/Jan/2005 Free Surface Flow Simulations p.1/26 Outline Objective Present two numerical modelling approaches for
More informationShape of Things to Come: Next-Gen Physics Deep Dive
Shape of Things to Come: Next-Gen Physics Deep Dive Jean Pierre Bordes NVIDIA Corporation Free PhysX on CUDA PhysX by NVIDIA since March 2008 PhysX on CUDA available: August 2008 GPU PhysX in Games Physical
More informationNVIDIA. Interacting with Particle Simulation in Maya using CUDA & Maximus. Wil Braithwaite NVIDIA Applied Engineering Digital Film
NVIDIA Interacting with Particle Simulation in Maya using CUDA & Maximus Wil Braithwaite NVIDIA Applied Engineering Digital Film Some particle milestones FX Rendering Physics 1982 - First CG particle FX
More informationHigh Performance Computing for PDE Towards Petascale Computing
High Performance Computing for PDE Towards Petascale Computing S. Turek, D. Göddeke with support by: Chr. Becker, S. Buijssen, M. Grajewski, H. Wobker Institut für Angewandte Mathematik, Univ. Dortmund
More informationNetwork Bandwidth & Minimum Efficient Problem Size
Network Bandwidth & Minimum Efficient Problem Size Paul R. Woodward Laboratory for Computational Science & Engineering (LCSE), University of Minnesota April 21, 2004 Build 3 virtual computers with Intel
More informationEfficient multigrid solvers for strongly anisotropic PDEs in atmospheric modelling
Iterative Solvers Numerical Results Conclusion and outlook 1/22 Efficient multigrid solvers for strongly anisotropic PDEs in atmospheric modelling Part II: GPU Implementation and Scaling on Titan Eike
More informationPerformances and Tuning for Designing a Fast Parallel Hemodynamic Simulator. Bilel Hadri
Performances and Tuning for Designing a Fast Parallel Hemodynamic Simulator Bilel Hadri University of Tennessee Innovative Computing Laboratory Collaboration: Dr Marc Garbey, University of Houston, Department
More informationHigh Performance Computing
High Performance Computing ADVANCED SCIENTIFIC COMPUTING Dr. Ing. Morris Riedel Adjunct Associated Professor School of Engineering and Natural Sciences, University of Iceland Research Group Leader, Juelich
More informationFRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG. Lehrstuhl für Informatik 10 (Systemsimulation)
FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG INSTITUT FÜR INFORMATIK (MATHEMATISCHE MASCHINEN UND DATENVERARBEITUNG) Lehrstuhl für Informatik 10 (Systemsimulation) walberla: Visualization of Fluid
More informationIntroduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29
Introduction CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Introduction Spring 2018 1 / 29 Outline 1 Preface Course Details Course Requirements 2 Background Definitions
More informationTowards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers
Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,
More information3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA
3D ADI Method for Fluid Simulation on Multiple GPUs Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA Introduction Fluid simulation using direct numerical methods Gives the most accurate result Requires
More informationPrinciples and Goals
Simulations of foam rheology Simon Cox Principles and Goals Use knowledge of foam structure to predict response when experiments are not easy to isolate causes of certain effects to save time in designing
More informationWebinar #3 Lattice Boltzmann method for CompBioMed (incl. Palabos)
Webinar series A Centre of Excellence in Computational Biomedicine Webinar #3 Lattice Boltzmann method for CompBioMed (incl. Palabos) 19 March 2018 The webinar will start at 12pm CET / 11am GMT Dr Jonas
More informationcomputational Fluid Dynamics - Prof. V. Esfahanian
Three boards categories: Experimental Theoretical Computational Crucial to know all three: Each has their advantages and disadvantages. Require validation and verification. School of Mechanical Engineering
More informationReal Application Performance and Beyond
Real Application Performance and Beyond Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400 Fax: 408-970-3403 http://www.mellanox.com Scientists, engineers and analysts
More information1.2 Numerical Solutions of Flow Problems
1.2 Numerical Solutions of Flow Problems DIFFERENTIAL EQUATIONS OF MOTION FOR A SIMPLIFIED FLOW PROBLEM Continuity equation for incompressible flow: 0 Momentum (Navier-Stokes) equations for a Newtonian
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More informationAdarsh Krishnamurthy (cs184-bb) Bela Stepanova (cs184-bs)
OBJECTIVE FLUID SIMULATIONS Adarsh Krishnamurthy (cs184-bb) Bela Stepanova (cs184-bs) The basic objective of the project is the implementation of the paper Stable Fluids (Jos Stam, SIGGRAPH 99). The final
More informationKommunikations- und Optimierungsaspekte paralleler Programmiermodelle auf hybriden HPC-Plattformen
Kommunikations- und Optimierungsaspekte paralleler Programmiermodelle auf hybriden HPC-Plattformen Rolf Rabenseifner rabenseifner@hlrs.de Universität Stuttgart, Höchstleistungsrechenzentrum Stuttgart (HLRS)
More informationHPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)
HPC and IT Issues Session Agenda Deployment of Simulation (Trends and Issues Impacting IT) Discussion Mapping HPC to Performance (Scaling, Technology Advances) Discussion Optimizing IT for Remote Access
More informationLattice Boltzmann with CUDA
Lattice Boltzmann with CUDA Lan Shi, Li Yi & Liyuan Zhang Hauptseminar: Multicore Architectures and Programming Page 1 Outline Overview of LBM An usage of LBM Algorithm Implementation in CUDA and Optimization
More informationDIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka
USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods
More informationFlux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters
Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,
More informationHigh-Performance Computational Electromagnetic Modeling Using Low-Cost Parallel Computers
High-Performance Computational Electromagnetic Modeling Using Low-Cost Parallel Computers July 14, 1997 J Daniel S. Katz (Daniel.S.Katz@jpl.nasa.gov) Jet Propulsion Laboratory California Institute of Technology
More informationA Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids
A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids Patrice Castonguay and Antony Jameson Aerospace Computing Lab, Stanford University GTC Asia, Beijing, China December 15 th, 2011
More informationAnimation of Fluids. Animating Fluid is Hard
Animation of Fluids Animating Fluid is Hard Too complex to animate by hand Surface is changing very quickly Lots of small details In short, a nightmare! Need automatic simulations AdHoc Methods Some simple
More informationCommunication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures
Communication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures Rolf Rabenseifner rabenseifner@hlrs.de Gerhard Wellein gerhard.wellein@rrze.uni-erlangen.de University of Stuttgart
More informationUnstructured Mesh Generation for Implicit Moving Geometries and Level Set Applications
Unstructured Mesh Generation for Implicit Moving Geometries and Level Set Applications Per-Olof Persson (persson@mit.edu) Department of Mathematics Massachusetts Institute of Technology http://www.mit.edu/
More informationPeta-Scale Simulations with the HPC Software Framework walberla:
Peta-Scale Simulations with the HPC Software Framework walberla: Massively Parallel AMR for the Lattice Boltzmann Method SIAM PP 2016, Paris April 15, 2016 Florian Schornbaum, Christian Godenschwager,
More informationPhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea.
Abdulrahman Manea PhD Student Hamdi Tchelepi Associate Professor, Co-Director, Center for Computational Earth and Environmental Science Energy Resources Engineering Department School of Earth Sciences
More information