Using Charm++ to Support Multiscale Multiphysics

Size: px

Start display at page:

Download "Using Charm++ to Support Multiscale Multiphysics"

Harvey Godwin Flynn
5 years ago
Views:

1 LA-UR Using Charm++ to Support Multiscale Multiphysics On the Trinity Supercomputer Robert Pavel, Christoph Junghans, Susan M. Mniszewski, Timothy C. Germann April, 18 th 2017 Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA

Exascale Co-Design Center for Materials in

*Others are: CESAR (ANL/reactors), ExaCT

requirements for the exascale ecosystem that

materials science simulations (both single-

2 Exascale Co-Design Center for Materials in Extreme Environments (ExMatEx) ExMatEx was one of three* DOE Office of Science application co-design centers ( ) *Others are: CESAR (ANL/reactors), ExaCT (SNL-CA/combustion) Large scale collaborations between national labs, academia, vendors Coordinated with related DOE NNSA co-design efforts Goal: to establish a relationships between algorithms, software stack, and architectures needed to enable exascale-ready science applications Two ultimate objectives: Identify the requirements for the exascale ecosystem that are necessary to perform computational materials science simulations (both single- and multiscale) Demonstrate and deliver a prototype scale-bridging materials science application based upon adaptive physics refinement 04/18/17 2

Tabasco Test Problem: Modeling a Taylor cylinder impact test The simple Taylor model cannot account for the twinning and anisotropy of a tantalum sample used in a LANL experiments (MST-8), and thus

3 Tabasco Test Problem: Modeling a Taylor cylinder impact test The simple Taylor model cannot account for the twinning and anisotropy of a tantalum sample used in a LANL experiments (MST-8), and thus the final shapes do not match. The physics goal of this demonstration is to show that the more accurate VPSC fine-scale model with an appropriate reduceddimensionality (~60 degrees of freedom) model of texture can (qualitatively or quantitatively?) reproduce the experimental shape. Ta P.J. Maudlin, J.F. Bingert, J.W. House, and S.R. Chen, On the modeling of the Taylor cylinder impact test for orthotropic textured materials: experiments and simulations, Int. J. Plasticity 15(2), (1999). CoEVP 04/18/17 3

database DB Subdomain N-1 Subdomain N Adaptive Sampler Node

4 Workflow Overview Coarse-scale model Subdomain 1 Subdomain 2 Adaptive Sampler Node 1 DB Eventually consistent distributed database DB Subdomain N-1 Subdomain N Adaptive Sampler Node N/2 DB DB DB On-demand fine scale models FSM FSM FSM 04/18/17 4

5 Task-Based Scale-bridging Code (TaBaSCo) Demonstrate the feasibility of at-scale heterogeneous computations composed of: Coarse-scale Lagrangian hydrodynamics Dynamically launched constitutive model calculations Results of fine-scale evaluations stored for reuse in databases Use of Taylor fine scale model evaluation VPSC fine scale model evaluation Adaptive sampling which queries the database, interpolates results, and decides when to spawn fine-scale evaluations Combines an asynchronous task-based runtime environment with persistent database storage Provides load-balancer and checkpoints for fault tolerant modes 04/18/17 5

6 Tabasco Framework Asynchronous Task-based runtimes explored CHARM++ (built/ran on Trinity) libcircle (built/ran on Darwin, but not Trinity) MPI Task Pool (dual binary version built/ran on Darwin, single binary version ran for small examples on Trinity) Nearest neighbor search Mtree vs. FLANN (both worked on Trinity) Database storage In memory HashMap (was limited for long runs) Posix (became our reliable database option for Trinity) Posix/Data Warp (only worked for short runs on Trinity) REDIS (never ran on Trinity) 04/18/17 6

7 Chare Wrapper mapping Chare Dim Class Lib Resolution Migrate CoarseScaleModel 1D Lulesh MPI Rank N FineScaleModel 2D Constitutive/ ElastoPlasticity CM Element Y Evaluate 1D Taylor/VPSC CM Element Y NearestNeighbor Search 1D Approx NearestNeighbors DBInterface 1D KrigingDataBase/ SingletonDB CM/FLANN /Mtree CM/Redis/ libhio/ POSIX Request (Service) Read/Write (Service) N N 04/18/17 7

8 Trinity: Advanced Technology System (a mixture of Intel Haswell and Knights Landing (KNL) processors) 04/18/17 8

9 Haswell nodes Coarse Scale (LULESH) Haswell nodes Open Science Trinity Port of TaBaSCo Burst Buffer nodes Neighbor Search NoSQL Database (Prior Fine Scale Results) Haswell nodes Neighbor Interpolation Haswell (Phase 1) or KNL (Phase 2) nodes Fine Scale and Evaluate (Taylor/VPSC) 04/18/17 9

10 Tabasco Weak and Strong Scaling Weak scaling Brute force (w/o AS) and Adaptive Sampling (w AS) Edge=64, height=26-13,312, 46,592-23,855,104 elements Good till 128 nodes Communication overhead for >=256 Strong scaling Brute force (w/o AS) and Adaptive Sampling (w AS) Edge=128, height=208, 1,490,944 elements Runtime(s) Runtime(s) Tabasco Weak Scaling Number of nodes Tabasco Strong Scaling (128x208) Number of nodes w AS w/o AS w/o AS w AS 04/18/17 10

11 Tabasco Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) step 0 04/18/17 11

12 Tabasco Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) step /18/17 12

13 Tabasco Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) step /18/17 13

14 Tabasco Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) step 10,000 04/18/17 14

15 Tabasco Brute Force (w/o Adaptive Samp.) on Trinity (512 nodes) step 20,000 04/18/17 15

16 Tabasco Brute Force (w/o Adaptive Samp.) on Trinity (512 nodes) step 22,000 T a 04/18/17 16

17 Early Work in Hybrid Runs on Trinity Trinity is a machine that was installed in two stages 9408 compute nodes with Intel Haswell Processors 32 CPU Cores per node, each with 2x hyperthreads 9500 compute nodes with Intel Knights Landing processors 68 cores per node While similar, nodes have different strengths Goal of Tabasco is to perform hybrid run in which both stages are utilized Coarse scale solver and less compute intensive work on Haswell nodes Fine grain solver on KNLs Used Charm++ s Logical Machine Entities to identify KNLs and Haswells And then used custom mapper to assign chares based on physical node type 04/18/17 17

18 Haswell nodes Coarse Scale (LULESH) Haswell nodes Open Science Trinity Port of TaBaSCo Burst Buffer nodes Neighbor Search NoSQL Database (Prior Fine Scale Results) Haswell nodes Neighbor Interpolation Haswell (Phase 1) or KNL (Phase 2) nodes Fine Scale and Evaluate (Taylor/VPSC) 04/18/17 18

19 Proof of Concept Hybrid Run: Host Platform Current Proof of Concept implementation running on Trinitite Run with three types of node Dual Socket Haswell ``Solver Node 32 MPI ranks per node KNL ``Solver Node 64 MPI ranks per node Dedicated Haswell ``Organizer Node 4 MPI ranks per node Run on a subset of Trinitite Goal was to work with the system stack and get initial performance results Larger Runs Planned following unification of Trinity phases 04/18/17 19

20 Proof of Concept Hybrid Run: Simulation Configuration Restricted to ``Taylor Solver No Adaptive Sampling Maximize Work, Minimize Communication Re-used a run from Open Science Phase 1 48 Edge Elements 104 Height Elements 4 Domains (Coarse Scale Chares) Fine Scale Evaluations 100 Time Steps 04/18/17 20

21 Proof of Concept Hybrid Run Results: Raw Execution Time 04/18/17 21

22 Approximation of Energy Savings Used very rough approximates to estimate energy savings of hybrid run Execution times of Proof of Concept runs TDP from spec sheets for each processor Don t do this Assumed all else the same E = TDP ExecutionTime 04/18/17 22

23 Very Early TDP-Based Energy Results 04/18/17 23

24 Energy Savings Through Use of KNL Solvers 04/18/17 24

25 Questions? 04/18/17 25

libhio: Optimizing IO on Cray XC Systems With DataWarp

libhio: Optimizing IO on Cray XC Systems With DataWarp May 9, 2017 Nathan Hjelm Cray Users Group May 9, 2017 Los Alamos National Laboratory LA-UR-17-23841 5/8/2017 1 Outline Background HIO Design Functionality