Damaris: Using Dedicated I/O Cores for Scalable Post-petascale HPC Simulations

Matthieu Dorier, ENS Cachan Brittany extension
Advised by Gabriel Antoniu
SRC

Context: HPC simulations on Blue Waters
- INRIA/UIUC Joint Lab for Petascale Computing
- Targeting large-scale simulations of unprecedented accuracy
- Our concern: I/O performance scalability

Motivation: data management in HPC

Motivation: data management in HPC
(Figure: many processes writing PetaBytes of data to ~100 data servers)
- Problem:
  - All processes enter I/O phases at the same time
  - File system contention: lack of scalability
  - High I/O overhead, high performance variability

I/O variability: an example
- CM1 tornado simulation: 672 processes sorted by write time

The Damaris approach: dedicated I/O cores
- Use the SMP node's intra-node shared memory to hand data from the compute cores to a dedicated I/O core
- Leave a core, go faster!
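
The hand-off described above can be sketched in Python for a single node. This is a minimal illustration under stated assumptions, not Damaris's implementation or API: threads stand in for cores, a `queue.Queue` stands in for the intra-node shared-memory buffer, and a dictionary stands in for the file system. The names `compute_core`, `io_core` and `run_node` are hypothetical.

```python
import queue
import threading

# Hypothetical sketch, not Damaris's API: threads model the cores of one SMP
# node, a queue.Queue models the intra-node shared-memory buffer, and a dict
# models the parallel file system.

STOP = None  # sentinel: a compute core has no more data to hand off


def compute_core(rank, iterations, shared_buffer):
    """Compute, hand the data off locally, and go back to computing."""
    for it in range(iterations):
        data = [rank * 1000 + it]  # placeholder for a simulation variable
        # A local, in-memory hand-off: the compute core never waits on the
        # file system.
        shared_buffer.put((it, rank, data))
    shared_buffer.put(STOP)


def io_core(n_compute, shared_buffer, storage):
    """Dedicated I/O core: gather each iteration's chunks from all compute
    cores, then issue a single aggregated write for the whole node."""
    pending = {}
    finished = 0
    while finished < n_compute:
        item = shared_buffer.get()
        if item is STOP:
            finished += 1
            continue
        it, rank, data = item
        chunks = pending.setdefault(it, {})
        chunks[rank] = data
        if len(chunks) == n_compute:
            # Every compute core delivered this iteration: one file per node.
            storage[f"node0_it{it}.dat"] = [chunks[r] for r in sorted(chunks)]
            del pending[it]


def run_node(cores_per_node=4, iterations=3):
    n_compute = cores_per_node - 1  # one core is left for I/O
    shared_buffer = queue.Queue()
    storage = {}
    io = threading.Thread(target=io_core, args=(n_compute, shared_buffer, storage))
    io.start()
    workers = [
        threading.Thread(target=compute_core, args=(r, iterations, shared_buffer))
        for r in range(n_compute)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    io.join()
    return storage


if __name__ == "__main__":
    files = run_node()
    for name in sorted(files):
        print(name, files[name])
```

With four cores per node, three compute cores hand off their data and the dedicated core produces one aggregated file per iteration, so the file system sees one writer per node rather than one per process.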

Integration with the CM1 tornado simulation
- Less than an hour to write an I/O backend with Damaris
- The I/O core spends 25% of its time writing → 75% spare time!
How to use the spare time?
- Custom plugin system:
  - Data post-processing, indexing, analysis
  - End-to-end scientific process
- Connect visualization/analysis tools → inline visualization
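
A plugin system of this kind might look roughly like the following sketch. It assumes a simple in-process registry: after each write, the I/O core runs every registered plugin on the same data. The registry, the `compress` and `index_min_max` plugins, `run_iteration` and `spare_fraction` are all hypothetical illustrations, not Damaris's plugin interface.

```python
import zlib

# Hypothetical plugin registry for a dedicated I/O core; all names here are
# illustrative assumptions, not Damaris's actual plugin interface.
PLUGINS = []


def register_plugin(func):
    """Decorator adding a step that runs after each write on the I/O core."""
    PLUGINS.append(func)
    return func


@register_plugin
def compress(name, payload, results):
    # Intended to run on the I/O core, so compute cores would not pay for it.
    results[name + ".z"] = zlib.compress(payload)


@register_plugin
def index_min_max(name, payload, results):
    # Minimal index: the byte-value range, usable later to skip files.
    results[name + ".idx"] = (min(payload), max(payload))


def run_iteration(name, payload):
    """Store the raw data, then let every registered plugin process it."""
    results = {name: payload}
    for plugin in PLUGINS:
        plugin(name, payload, results)
    return results


def spare_fraction(write_time, iteration_time):
    """Fraction of an iteration during which the I/O core is not writing."""
    return 1.0 - write_time / iteration_time


if __name__ == "__main__":
    out = run_iteration("it0", bytes(range(10)) * 100)
    print(sorted(out), out["it0.idx"], spare_fraction(25.0, 100.0))
```

Calling `spare_fraction(25.0, 100.0)` reproduces the 75% spare time quoted above; the plugins are one possible way to fill that time.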

Results with the CM1 tornado simulation
- On Grid'5000, the French national testbed (24 cores/node, 672 cores), with PVFS, compared with collective I/O:
  - Collective I/O incurs communication overhead → leaving a core is more efficient
  - No synchronization
  - 6 times higher write throughput
- On BluePrint, the Power5 Blue Waters interim system at NCSA (16 cores/node, 1024 cores), with GPFS, compared with a file-per-process approach:
  - On 64 nodes → 64 files instead of 1024
- Overall benefits:
  - Spare time usage
  - Data layout adaptation for subsequent analysis
  - Overhead-free compression (600%)
  - No more I/O jitter
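
The file counts above follow from simple arithmetic on the node layout. The sketch below assumes one dedicated I/O core per node, that this core writes one file per node per write phase, and that the simulation runs on the remaining cores; the function `layout` is hypothetical.

```python
from fractions import Fraction


def layout(total_cores, cores_per_node):
    """Hypothetical helper: node layout when one core per node is dedicated
    to I/O and writes one file per node per write phase (an assumption)."""
    nodes = total_cores // cores_per_node
    return {
        "nodes": nodes,
        # Baseline: every process writes its own file.
        "files_file_per_process": total_cores,
        # Dedicated cores: one writer, hence one file, per node.
        "files_dedicated_core": nodes,
        # Cores left for the simulation itself.
        "compute_cores": nodes * (cores_per_node - 1),
        # Share of the machine given up for I/O ("leave a core").
        "core_fraction_given_up": Fraction(1, cores_per_node),
    }


if __name__ == "__main__":
    print(layout(1024, 16))  # BluePrint configuration
    print(layout(672, 24))   # Grid'5000 configuration
```

On BluePrint this gives 64 nodes, 64 files instead of 1024, and 1/16 (6.25%) of the cores set aside for I/O; on the 24-core Grid'5000 nodes the share drops to 1/24 (about 4.2%).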

Results with the CM1 tornado simulation

Conclusion
- Damaris: dedicated I/O core in multicore SMP nodes
  1. Better I/O and overall performance
  2. No more variability in write phases
  3. Easy integration and configuration
- Targeting Blue Waters and future post-petascale machines
- Very promising prospects in many directions:
  - Integration with other simulations: Enzo (AMR), GTC
  - Leverage spare time for efficient inline visualization
  - Data-aware self-configuration, scheduled data movements, multi-simulation coupling
Thank you, questions?
