Towards Exa-Scale: Computing with Millions of Cores

Size: px

Start display at page:

Download "Towards Exa-Scale: Computing with Millions of Cores"

Steven Cummings
5 years ago
Views:

Towards Exa-Scale: Computing with Millions of Cores U. Rüde (LSS Erlangen, ruede@cs.

Engineering of Advanced Materials Universität Erlangen-Nürnberg www10.informatik.

1 Towards Exa-Scale: Computing with Millions of Cores U. Rüde (LSS Erlangen, Lehrstuhl für Informatik 10 (Systemsimulation) Excellence Cluster Engineering of Advanced Materials Universität Erlangen-Nürnberg www10.informatik.uni-erlangen.de Einweihung des neuen Erlanger Höchstleistungsrechners LiMa RRZE 7. April

2 Overview Motivation: How fast are computers today (and next decade) Rigid Body Dynamics for Granular Media Flow Simulation with Lattice Boltzmann Methods free surfaces bubbly flows Fluid-Structure Interaction with Moving Rigid Objects particle ladden flows Conclusions ZISC 2

3 Motivation 3

Example Peta-Scale System: Jugene @ Jülich Extreme Scaling Workshop 2010 at Jülich Supercomputing Center PetaFlops = 10 15 operations/second IBM Blue Gene Theoretical peak

4 Example Peta-Scale System: Jülich Extreme Scaling Workshop 2010 at Jülich Supercomputing Center PetaFlops = operations/second IBM Blue Gene Theoretical peak capability Petaflop/s cores #5 on TOP 500 List in June 2010 For comparison: Current fast desktop PC is! times slower > cores expected

5 What can we do with Exa-Scale Computers? Even if we want to simulate a billion objects (particles): we can do a billion operations for each of them in each second a trillion finite elements (finite volumes) to resolve a PDE, we can do a million operations for each of them in each second Most existing software dramatically underperforms on contemporary HPC architectures This will get more dramatic on future exa-scale systems Fluidized Bed (movie: thanks to K.E. Wirth) 5

6 What s the problem? with four strong jet engines Would you want to propel a Super Jumbo (not those of Rolls-Royce of course) or with 300,000 blow dryer fans? 6

7 Granular Media Simulations 7

8 Silo Scenario randomly generated, non-spherical particles, 256 CPUs, time steps, runtime: 16.4 h (including data output), s per time step 8

9 Hourglass Simulation (1) spherical particles, 256 CPUs, time steps, runtime: 48 h (including data output) 9

10 Flow Simulation with Lattice Boltzmann Methods 10

Larger-Scale Computation: 1000 Bubbles Best Paper Award for Stefan Donath (LSS Erlangen) at ParCFD, May 2009 (Moffett Field, USA) Simulation 1000 Bubbles

11 Larger-Scale Computation: 1000 Bubbles Best Paper Award for Stefan Donath (LSS Erlangen) at ParCFD, May 2009 (Moffett Field, USA) Simulation 1000 Bubbles 510x510x530 = 1.4"10 8 lattice cells 70,000 time steps 77 GB 64 processes 72 hours 4,608 core hours Visualization 770 images Approx. 12,000 core hours for rendering 11

12 Fluid-Structure Interaction with Moving Rigid Bodies 12

13 Fluid-Structure Interaction 13

14 Virtual Fluidized Bed 512 processors Simulation Domain Size: 180x198x360 cells of LBM 900 capsules and 1008 spheres = 1908 objects Number time steps: Run Time: 252,000 07h 12 min 14

per time step (LBM alone) 50 TByte 0.6 0.

80x80x80 lattice cells per core 100 1000

15 Weak Scaling Efficiency Jugene Blue Gene/P Jülich Supercomputer Center Largest simulation to date : 8 Trillion (10 12 ) variables per time step (LBM alone) 50 TByte x40x40 lattice cells per core 80x80x80 lattice cells per core Number of Cores Scaling 64 to cores Densely packed particles lattice cells rigid spherical objects 15

16 Conclusions 16

17 Zur Konzeption eines Zentralinstituts Scientific Computing ZISC an der FAU Erlangen-Nürnberg The Third Pillar for 21st Century Science 17

Observation and prototypes empirical Sciences

18 The Two Principles of Science Three Theory Mathematical Models, Differential Equations, Newton Experiments Observation and prototypes empirical Sciences Computational Science Simulation, Optimization (quantitative) virtual Reality 18

19 Zitat aus dem PITAC Report (Presidents Information Technology Advisory Committee) Computational Science: Ensuring Americas Competitiveness (2005) Finding: Traditional disciplinary boundaries... severely inhibit the development of effective research and education in Computational Science. Recommendation: Universities must significantly change their organizational structures... for computational scientists... to remain at the forefront of scientific discovery. 19

20 Aufgaben des ZISC Kooperationsplattform: Ansprechpartner für komplexe Simulationsaufgaben nach innen und aussen Einwerbung koordinierter Forschungsförderung Uni-Interne überfachliche Vernetzung e.g. gestern: Workshop GPU Computing Weiterbildung: Sommerschulen Kompaktkurse Gästeprogramm Vernetzung und Technologietransfer zur Industrie 20

21 ZISC Beteiligte Institutionen: CCC, RRZE, EAM, ECN,... Individuelle Forscher aus: Chemie, Physik, Mathematik, Informatik, Medizin, MB, CBI, EEI, WW,... 21

22 Conclusions Exa-Scale will be Easy! If parallel efficiency is bad, choose a slower serial algorithm it is probably easier to parallelize and will make your speedups look much more impressive Introduce the CrunchMe variable for getting high Flops rates advanced method: disguise CrunchMe by using an inefficient (but compute-intensive!) algorithm from the start Introduce the HitMe variable to get good cache hit rates advanced version: Implement HitMe in the Hash-Brown Lookaside Table for the Multi- Threatened Cash-Filling Clouded Tree data structure... impressing yourself and others Never cite time-to-solution who cares whether you solve a real-life problem anyway it is the MachoFlops that interest the people who pay for your research Never waste your time by trying to use a complicated algorithm in parallel Use Primitive Algorithm => Easy to Maximize your MachoFlops A few million CPU hours can easily save you days of reading in boring math books 22

23 Acknowledgements Collaborators In Erlangen: WTM, LSE, LSTM, LGDV, RRZE, LME, Neurozentrum, Radiologie, Applied Mathematics, Theoretical Physics, etc. Especially for foams: C. Körner (WTM) International: Utah, Technion, Constanta, Ghent, Boulder, München, CAS, Zürich, Delhi,... Dissertationen Projects N. Thürey, T. Pohl, S. Donath, S. Bogner (LBM, free surfaces, 2-phase flows) M. Mohr, B. Bergen, U. Fabricius, H. Köstler, C. Freundl, T. Gradl, B. Gmeiner (Massively parallel PDE-solvers) M. Kowarschik, J. Treibig, M. Stürmer, J. Habich (architecture aware algorithms) K. Iglberger, T. Preclik, K. Pickel (rigid body dynamics) J. Götz, C. Feichtinger (Massively parallel LBM software, suspensions) C. Mihoubi, D. Bartuschat (Complex geometries, parallel LBM) (Long Term) Guests in summer/fall 2009/10: Dr. S. Ganguly, IIT Kharagpur (Humboldt) - Electroosmotic Flows Prof. V. Buwa, IIT Delhi (Humboldt) - Gas-Fluid-Solid flows Felipe Aristizabal, McGill Univ., Canada (LBM with Brownian Motion) Prof. Popa, Constanta, Romania (DAAD) Numerical Linear Algebra Prof. N. Zakaria, Universiti Petronas, Malaysia Prof. Hanke, Prof. Oppelstrup, KTH Stockholm (DAAD), Mathematical Modelling ~25 Diplom- /Master- Thesis, ~30 Bachelor Thesis Funding by KONWIHR, DFG, BMBF, EU, Elitenetzwerk Bayern 23

24 Thank you for your attention! Questions? Slides, reports, thesis, animations available for download at: www10.informatik.uni-erlangen.de 24

Adaptive Hierarchical Grids with a Trillion Tetrahedra

Adaptive Hierarchical Grids with a Trillion Tetrahedra Tobias Gradl, Björn Gmeiner and U. Rüde (LSS Erlangen, ruede@cs.fau.de) in collaboration with many more Lehrstuhl für Informatik 10 (Systemsimulation)