Parallel Simulation of Dendritic Growth On Unstructured Grids

1 Parallel Simulation of Dendritic Growth On Unstructured Grids, Julian Hammer, Dietmar Fey Friedrich-Alexander-Universität Erlangen-Nürnberg IA 3 Nov. 13th, 2011

2 Outline
1. What and why?
2. Specialized Parallelization
3. Stencilized Parallelization

4 Simulation of Dendritic Growth in Al/Cu
[Figures: microscope image; simulation]
two classes of models:
1. cellular automata (our approach)
2. phase field method (Peta-scale Phase-Field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer)
meshfree (no regular grid)

5 Simulation Model
black: solid cells
green: liquid cells
squares: particles on phase boundary

7
1. decompose graph via ParMETIS
2. loop
   1. sync ghost zones
   2. update
   3. (output)
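
The decompose / sync / update cycle can be sketched in a few lines. This is a minimal single-process illustration, not the authors' code: `partition_blocks` is a naive stand-in for ParMETIS, the "sync" is a plain dictionary copy where the real code would exchange data via MPI, and the averaging rule in `step` is a placeholder, not the dendritic growth model. All function names are hypothetical.

```python
from collections import defaultdict

def partition_blocks(num_cells, num_parts):
    """Naive stand-in for ParMETIS: contiguous blocks of cell IDs."""
    return [c * num_parts // num_cells for c in range(num_cells)]

def ghost_zones(neighbors, owner):
    """For each partition, the set of remote cells that its own cells read."""
    ghosts = defaultdict(set)
    for cell, nbrs in neighbors.items():
        for n in nbrs:
            if owner[n] != owner[cell]:
                ghosts[owner[cell]].add(n)
    return ghosts

def step(values, neighbors, owner, num_parts):
    """One simulation step: sync ghost zones, then update every owned cell."""
    ghosts = ghost_zones(neighbors, owner)
    new_values = dict(values)
    for part in range(num_parts):
        # 1. sync ghost zones: copy the remote values this partition needs
        #    (assumption: an MPI exchange in a real distributed run)
        local = {c: v for c, v in values.items() if owner[c] == part}
        local.update({g: values[g] for g in ghosts[part]})
        # 2. update: placeholder rule, average of a cell and its neighbors
        for c in values:
            if owner[c] == part:
                nbrs = neighbors[c]
                new_values[c] = (local[c] + sum(local[n] for n in nbrs)) / (1 + len(nbrs))
    return new_values

# Usage: a ring of 6 cells split across 2 partitions.
ring = {c: [(c - 1) % 6, (c + 1) % 6] for c in range(6)}
owner = partition_blocks(6, 2)
values = {c: 1.0 for c in range(6)}
values[0] = 7.0
values = step(values, ring, owner, 2)
```

The update only ever reads from `local`, so a cell whose neighbor is missing from the ghost zone fails loudly instead of silently using stale data.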

8 Communication Graph
cells, 10 MPI processes
74k ghost cells
2 GB/step

9 Communication Graph
cells, 100 MPI processes
475k ghost cells
11 GB/step
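
Assuming both slides describe the same cell graph (the cell count did not survive in this transcript), the two data points can be put side by side:

```latex
\[
\frac{475\,\text{k ghost cells}}{74\,\text{k ghost cells}} \approx 6.4,
\qquad
\frac{11\ \text{GB/step}}{2\ \text{GB/step}} = 5.5
\]
```

Under that assumption, ten times as many MPI processes cost roughly six times the ghost cells and communication volume per step.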

11 Stencilization
superimpose grid on irregular graph
place cells into container cells
physically equivalent
reuse existing library: LibGeoDecomp
   overlapping communication and calculation
   hybrid parallelization
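
The binning step can be sketched as follows. This is a minimal 2D illustration under stated assumptions, not LibGeoDecomp's API: the names `bin_cells` and `candidate_neighbors` are hypothetical, and the 3x3 lookup only finds all neighbors if the container edge length is at least the largest neighbor distance in the irregular graph.

```python
from collections import defaultdict
from math import floor

def bin_cells(positions, container_size):
    """Map each irregular cell to the grid container covering its position."""
    containers = defaultdict(list)
    for cell_id, (x, y) in positions.items():
        key = (floor(x / container_size), floor(y / container_size))
        containers[key].append(cell_id)
    return containers

def candidate_neighbors(containers, key):
    """Cells in container `key` and its 8 adjacent containers (3x3 stencil)."""
    kx, ky = key
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            # .get() avoids creating empty containers during lookup
            found.extend(containers.get((kx + dx, ky + dy), []))
    return found

# Usage: four cells at irregular positions, containers of edge length 1.0.
positions = {0: (0.2, 0.3), 1: (0.9, 0.1), 2: (1.5, 0.5), 3: (3.2, 0.1)}
containers = bin_cells(positions, 1.0)
near_origin = sorted(candidate_neighbors(containers, (0, 0)))
```

Once cells live in containers, each container only needs data from a fixed set of adjacent containers, which is the access pattern a stencil library expects.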


13 Evaluation: Speedup
[Plot: speedup over number of cores; curves: Ideal, LibGeoDecomp, Specialized Parallelization]
testbed: 28 IBM LS21 blades (Opteron dual-cores), 10 Gb InfiniBand

14 Overlapping Communication and Calculation
Myth #1: It's as easy as calling MPI_Isend()
Myth #2: It's not possible at all
1. MPI_Isend()
2. loop
   1. MPI_Test()
   2. work()
3. MPI_Wait()
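
The control flow of the three steps above can be sketched without an MPI installation. `MockRequest` is a hypothetical stand-in for an `MPI_Request`, not part of any MPI binding: it completes after a fixed number of polls, mimicking a library that only makes progress when it is called. In real code, step 1 would be `MPI_Isend()`, `test()` would be `MPI_Test()`, and `wait()` would be `MPI_Wait()`.

```python
class MockRequest:
    """Stand-in for an MPI_Request: completes after a fixed number of polls."""

    def __init__(self, polls_needed):
        self.polls_left = polls_needed

    def test(self):
        # Plays the role of MPI_Test(): lets the "library" make progress
        # and reports whether the transfer has finished.
        if self.polls_left > 0:
            self.polls_left -= 1
        return self.polls_left == 0

    def wait(self):
        # Plays the role of MPI_Wait(): block until the transfer is done.
        while not self.test():
            pass

def overlapped_update(request, chunks, work):
    """Interleave polling of a pending transfer with slices of computation."""
    results = []
    polls = 0
    for chunk in chunks:             # 2. loop
        request.test()               # 2.1 poke the library
        polls += 1
        results.append(work(chunk))  # 2.2 do one slice of the computation
    request.wait()                   # 3. ensure completion before reusing buffers
    return results, polls

# Usage: step 1 (the non-blocking send) is modeled by creating the request.
request = MockRequest(polls_needed=5)
results, polls = overlapped_update(request, [[1, 2], [3, 4], [5]], sum)
```

Splitting the work into chunks is what makes the overlap possible: without the repeated `test()` calls, the transfer in this model would not advance until `wait()`.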


17 Overlapping Micro Benchmark
[Plot: time t over message size (comsize, axis up to 1G); series labeled "overlap" and "send"]
Open MPI + InfiniBand

18 Conclusion
communication-bound model
stencilization surprisingly efficient
   reduces number of neighbors
   but model changes may be substantial
use MPI+OpenMP to reduce memory traffic
asynchronous communication by repeatedly poking MPI
LibGeoDecomp: Self-Adapting Stencil Codes for the Grid

19 Backup

20 Improving Data Locality for Communication

21 Efficient Memory Layout
Original Layout:
   SimSpace
   Grid
   Cell: ID, neighborids
   SimObject: position, state, concentration
   Particle: IDsource, IDtarget, velocity
Optimized Layout:
   ContainerCell: position, dimensions
   Particle: IDsource, IDtarget, velocity
   Cell: ID, neighborids
   SimObject: position, state, concentration
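
One possible reading of the optimized layout, sketched as data classes. The nesting is an assumption, since the slide's diagram did not survive extraction: here each `ContainerCell` owns its cells and particles, and each `Cell` holds a `SimObject`. The field types and the snake_case names are also illustrative guesses.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SimObject:
    position: Tuple[float, float]
    state: str             # assumption: e.g. "solid" or "liquid"
    concentration: float

@dataclass
class Cell:
    id: int
    neighbor_ids: List[int]
    sim_object: SimObject  # assumption: composition rather than inheritance

@dataclass
class Particle:
    id_source: int
    id_target: int
    velocity: Tuple[float, float]

@dataclass
class ContainerCell:
    position: Tuple[int, int]        # grid coordinate of the container
    dimensions: Tuple[float, float]  # physical extent of the container
    cells: List[Cell] = field(default_factory=list)
    particles: List[Particle] = field(default_factory=list)

# Usage: one container holding a single liquid cell and one particle.
container = ContainerCell(position=(0, 0), dimensions=(1.0, 1.0))
container.cells.append(Cell(0, [1], SimObject((0.2, 0.3), "liquid", 0.05)))
container.particles.append(Particle(id_source=0, id_target=1, velocity=(0.1, 0.0)))
```

Grouping cells and particles by container keeps everything a ghost-zone exchange needs for one grid block in one place.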

22 Simulation of Dendritic Growth in Al/Cu
[Figures: microscope image; simulation]
model courtesy of Department of Metallic Materials, FSU Jena, Germany
name derived from Greek δένδρον (dendron)
not supercooled
