1 Philip C. Roth, Computer Science and Mathematics Division, Oak Ridge National Laboratory

2 A Tree-Based Overlay Network (TBON) like MRNet provides scalable infrastructure for tools and applications. MRNet's process topology and placement support is extremely flexible (on most platforms): any tree topology, with internal processes on the same nodes as application processes or on distinct nodes.
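
For concreteness, a minimal MRNet front-end can instantiate such a tree from a topology file. The sketch below is illustrative only: it uses MRNet's C++ front-end interface as in recent MRNet releases, but the topology, file names, and back-end path are assumptions, not details from the talk.

// Minimal MRNet front-end sketch (hypothetical file names and hosts).
// The topology file encodes the TBON shape and process placement, e.g.:
//
//   fe-node:0 =>
//     int-node0:1
//     int-node1:2 ;
//
#include <cstdio>
#include "mrnet/MRNet.h"

int main(int argc, char** argv)
{
    const char* dummy_argv = NULL;
    // Build the overlay described in the topology file, starting the
    // given back-end executable on the tree's leaves.
    MRN::Network* net = MRN::Network::CreateNetworkFE(
        "tbon_topology.top",  // tree shape and placement (assumed name)
        "./tool_backend",     // back-end executable (assumed path)
        &dummy_argv);
    if( net == NULL || net->has_Error() ) {
        fprintf(stderr, "failed to instantiate MRNet overlay\n");
        return 1;
    }
    // ... create streams and perform multicast/reduction traffic ...
    delete net;  // tears down the overlay processes
    return 0;
}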

4 Flexibility leads to questions: how do we identify the best process topology and placement? Several interacting factors determine what is best: performance (of both tool and application); system hardware and software; the tool's purpose; even economics (e.g., can I afford to request extra nodes for MRNet processes given my allocation budget?). The decision process is often not rigorous, relying on rules of thumb.

5 Goal: given a node allocation on a leadership-class system, identify the best MRNet process placement and topology, subject to several constraints: the tool's multicast and reduction requirements; the behavior of the application under study; other activity on the system; system software and hardware.

6 The Cray XT is our target platform: the Jaguar XT4 and XT5 systems at Oak Ridge National Laboratory (ORNL), the Hopper XT5 at NERSC, and the Kraken XT5 at ORNL. Opteron-based nodes are arranged in a 3D mesh with the possibility of torus links, running the Cray Linux Environment.

7 Goal: understand Cray XT allocation characteristics and their impact on MRNet-based tool process placement. We used a simple MPI/Portals program to collect each node's number and position within the XT mesh, on an earlier-generation ORNL Jaguar with dual-core Opterons. A batch job launched two independent instances of the program: 512 application nodes (1024 processes) and 72 tool nodes (enough for a balanced 8-way TBON topology, assuming the front-end runs on the batch script service node).
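
The probe program itself is not shown in the slides; the sketch below illustrates the idea. It assumes the node ID can be read from /proc/cray_xt/nid, as on Cray XT compute nodes, and omits the system-specific translation from node ID to (x,y,z) mesh position.

// Gather each MPI rank's Cray XT node ID to rank 0 (sketch).
#include <cstdio>
#include <vector>
#include <mpi.h>

static int read_nid()
{
    int nid = -1;
    FILE* f = fopen("/proc/cray_xt/nid", "r");  // Cray XT node ID
    if( f != NULL ) {
        fscanf(f, "%d", &nid);
        fclose(f);
    }
    return nid;
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int nid = read_nid();
    std::vector<int> nids(size);
    MPI_Gather(&nid, 1, MPI_INT, nids.data(), 1, MPI_INT, 0, MPI_COMM_WORLD);

    if( rank == 0 )
        for( int r = 0; r < size; r++ )
            printf("rank %d -> nid %d\n", r, nids[r]);

    MPI_Finalize();
    return 0;
}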

9 Our approach: discrete event simulation of XT system nodes running application and MRNet processes, a component of the MAST framework (Modeling Assertions, Simulation, and Tuning).

10 Node modules are connected in a 3D torus, implemented using OMNeT++.

11 [Figure: XTNode module containing an Application Process and an MRNet Process, with +/- X, +/- Y, and +/- Z links to neighboring nodes]
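
The module source is not included in the slides; below is a minimal OMNeT++ (4.x) simple-module sketch of such a node. The class name XTNode matches the slide, but the gate naming and the trivial forwarding logic are assumptions.

// Hypothetical OMNeT++ 4.x simple module for an XT node (sketch).
// A real model would route by destination coordinates and model
// SeaStar link latency and bandwidth.
#include <omnetpp.h>

class XTNode : public cSimpleModule
{
  protected:
    virtual void initialize()
    {
        // e.g., read this node's (x,y,z) mesh position from NED parameters
    }
    virtual void handleMessage(cMessage* msg)
    {
        // Sketch only: push every arriving message out the +X gate;
        // dimension-order routing would choose the gate per message.
        send(msg, "xPlus$o");  // assumes an inout gate named "xPlus"
    }
};

Define_Module(XTNode);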

14 Configuration via an XML file: multiple parallel programs per file, including each program's type and associated attributes (like its input), plus the mapping of processes to system nodes.
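
The XML schema itself is not shown in the slides; the sketch below is purely hypothetical, with every element and attribute name invented to illustrate the two pieces of information described above.

<!-- Hypothetical input sketch: names are invented, not the MAST schema. -->
<simulation>
  <program name="app" type="trace-replay" input="app_trace.otf">
    <mapping>                              <!-- process-to-node mapping -->
      <process rank="0" node="0"/>
      <process rank="1" node="1"/>
    </mapping>
  </program>
  <program name="tool" type="mrnet" input="tbon_topology.top">
    <mapping>
      <process rank="0" node="512"/>
    </mapping>
  </program>
</simulation>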

15 Measuring process-to-process latency and bandwidth: over MPI and over sockets; with fully populated nodes and with one process per node; between pairs of processes, each even rank pairing first left, then right.
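
As an illustration of this measurement style, here is a minimal MPI ping-pong latency sketch for the "right" pairing (even rank r with rank r+1); the "left" pairing is the same pattern shifted by one rank. This is a generic sketch, not the actual benchmark used for the measurements.

// Pairwise one-way latency via ping-pong (sketch).
#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int reps = 1000;
    char byte = 0;
    int partner = (rank % 2 == 0) ? rank + 1 : rank - 1;

    if( partner >= 0 && partner < size ) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for( int i = 0; i < reps; i++ ) {
            if( rank % 2 == 0 ) {
                MPI_Send(&byte, 1, MPI_CHAR, partner, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, partner, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(&byte, 1, MPI_CHAR, partner, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, partner, 0, MPI_COMM_WORLD);
            }
        }
        double latency = (MPI_Wtime() - t0) / (2.0 * reps);  // one-way
        if( rank % 2 == 0 )
            printf("ranks %d<->%d: %.3e s\n", rank, partner, latency);
    }
    MPI_Finalize();
    return 0;
}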

16 [Figure: per-rank latency for MPI and socket pairs, left and right pairings; y-axis Latency (s), x-axis Rank]

17 [Figure: per-rank bandwidth for MPI and socket pairs, left and right pairings; y-axis Bandwidth (Bytes/s), x-axis Rank]

18 [Figure: per-rank latency for MPI and socket pairs, left and right pairings; y-axis Latency (s), x-axis Rank]

19 [Figure: per-rank bandwidth for MPI and socket pairs, left and right pairings; y-axis Bandwidth (Bytes/s), x-axis Rank]

20 [Figure: MAST framework overview, with components: Process/Processor Mapping; MA-Instrumented MPI Program Run on Parallel System; MA Model; MA Control Flow Graph; Simulator; System Simulator; Behavior/Performance Prediction(s); Automated Code Tuning Framework; ScalaTrace Trace File Replayer; Open Trace Format Trace File Replayer; Sequoia Trace File Replayer; MRNet Workload Driver + Trace File Replayer + Stochastic Workload Generator]

21 Status: the basic XTNode with SeaStar router is implemented; parameterization is still in progress, as described earlier. Simple MPI-based workloads are supported through hardcoded behaviors (hot potato, 1D exchange); the OTF and Sequoia trace readers were implemented for a previous version and must be resurrected. Support for TBON processes is designed and partially implemented. The model was recently adapted to OMNeT++ 4.1 (changes in simulation time).

22 This research is sponsored by the Office of Advanced Scientific Computing Research, U.S. Department of Energy. The work was performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725. This research used resources of the Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

23 Summary: predicting TBON performance on the Cray XT is highly desirable, both for matching TBON process topology and placement to tool needs subject to application and system constraints, and potentially for supporting online reconfiguration of the TBON topology. We are developing a simulation-based TBON prediction capability and expect predictions of realistic scenarios soon; it is easily adaptable to expected future architectures (e.g., GPU-enabled nodes, InfiniBand clusters) and embeddable (in theory).
