Characterizing the I/O Behavior of Scientific Applications on the Cray XT

Philip C. Roth
Computer Science and Mathematics Division
Oak Ridge National Laboratory
Oak Ridge, TN

ABSTRACT

Scientific applications use input/output (I/O) for obtaining initial conditions and execution parameters, as a persistent way of saving program output, and for safeguarding against system unreliability. Although system sizes are expected to continue increasing, I/O performance is not expected to keep pace with system computation and communication performance. Understanding application I/O demands and system I/O capabilities is the first step toward bridging the gap between them. In this paper, we present our approach for characterizing the I/O demands of applications on the Cray XT. We also present preliminary case studies showing the use of our I/O characterization infrastructure with climate studies and combustion simulation programs.

Categories and Subject Descriptors: C.4 [Performance of Systems]: performance attributes, measurement techniques.

General Terms: Performance, Measurement.

Keywords: Performance data collection, instrumentation, Cray XT.

1. INTRODUCTION

Scientific applications from areas like climate studies, fusion, and molecular dynamics use input/output (I/O) for several purposes, such as to obtain initial conditions and execution parameters, as a persistent way of saving program output, and to safeguard against system unreliability. This last purpose is becoming increasingly important: the desire to reach ever-increasing computational targets with high-performance computing (HPC) systems has produced a trend toward systems with an increasing number of components, and system reliability is expected to decrease as the number of components increases. Current HPC systems have several tens to hundreds of thousands of processor cores, but researchers are already bracing themselves for systems with millions of cores. Even with expected technological advances, this trend is expected to continue many years into the future. Because I/O in the form of checkpointing is the technique most often used to guard against system failures, and because I/O technology is not expected to keep pace with processor technology, I/O is an area of increasing concern for both producers and consumers of HPC systems. Understanding application I/O demands and system I/O capabilities is the first step toward bridging the gap between the two. In response to the need for tools that provide insight into this gap, performance analysis tools like Paradyn [10] and the Tuning and Analysis Utilities (TAU) [15] support measurement and problem diagnosis of I/O performance.

This research is sponsored by the Office of Advanced Scientific Computing Research, U.S. Department of Energy. The work was performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR. ACM acknowledges that this contribution was authored or co-authored by a contractor or affiliate of the U.S. Government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only. Supercomputing'07, Nov. 2007, Reno, NV. Copyright 2007 Association for Computing Machinery.
To support our investigation into the I/O behavior of scientific applications on leadership-class systems, we have designed an I/O event tracing system and produced a prototype implementation for applications running on the Cray XT, the primary platform of the U.S. Department of Energy (DOE) Office of Science's Leadership Computing Facility at Oak Ridge National Laboratory (ORNL). This paper makes two primary contributions. First, we present our preliminary I/O characterization approach and its prototype implementation. Second, we present preliminary case studies showing the use of our I/O characterization approach with the Parallel Ocean Program (POP) [7] and the S3D combustion simulation program [4].

2. THE CRAY XT

The Cray XT is a parallel computing platform that features massive parallelism and high performance [1]. An XT system consists of processing elements (PEs) connected in a three-dimensional mesh or torus topology. Each PE contains an AMD Opteron processor, memory, and a proprietary Cray router Application-Specific Integrated Circuit (ASIC) called SeaStar (see Figure 1). Single- and multi-core processors are supported. The initial Cray XT systems (the XT3 and XT4) use only Opteron-based PEs, but the Cray XT5 also supports heterogeneous systems containing vector processors and Field Programmable Gate Arrays [5].

Cray XT PEs are partitioned into compute PEs and service PEs. Compute PEs run application processes and use either a lightweight operating system kernel called Catamount [8] or Cray's Compute Node Linux (CNL). Service PEs provide login and I/O services with a traditional Linux installation. For this work, we used the Cray XT system at the DOE Leadership Computing Facility at ORNL. At the time of our experimentation, this system contained a combination of XT3 and XT4 cabinets, used the Catamount kernel on its compute PEs, and used Lustre as its parallel file system; the system has since been converted from Catamount to CNL on its compute nodes.

Figure 1: Cray XT4 Processing Elements (Image courtesy Cray, Inc.)

3. THE IOT EVENT TRACING INFRASTRUCTURE

Event tracing is a well-established technique for performance data collection, and tools like TAU [15], Paraver [13], SCALASCA [2], and SvPablo [6] have long supported collection and analysis of program event data, often including I/O events. For our application I/O characterization activity, we adopt a traditional event-based performance data collection approach that produces event trace files for ease of repeated post-mortem analysis and sharing with other researchers. We developed a prototype implementation of our event tracing infrastructure for MPI applications on the Cray XT; we call this prototype IOT. We intend IOT to be the first component of a more comprehensive performance data collection and analysis infrastructure for programs running on DOE leadership-class computing platforms.

Our data collection approach uses two components. The first component is a collection of functions that replace I/O and other interesting function calls (e.g., open() and write()) with instrumented wrapper functions. Each wrapper generates an event trace record for function entry, calls the real function that implements the desired functionality, generates an event trace record that captures the relevant details of the I/O operation, and then generates an event trace record for the function exit. At a minimum, each event trace record includes a timestamp of when the operation occurred and the type of operation. Figure 2 shows where instrumented functions are interposed between an application process and the default runtime software stack. The second component is an event tracing support library that implements the functionality needed to produce event trace files in the Open Trace Format (OTF) [12], a file format for expressing event traces that is supported by performance tools such as TAU, SCALASCA, and Vampir [11].

On many traditional UNIX and UNIX-like systems (e.g., Linux), we could use shared libraries to interpose our instrumented file I/O wrapper functions into the control flow between an application function that calls a file I/O function and the system's implementation of that I/O function. In fact, Cray's CNL supports shared libraries in order to ease the use of scripting languages such as Python in scientific applications. However, the Cray XT running Catamount does not support shared libraries, and the default linking mode for XT systems running CNL is to produce statically linked executable files. Thus, we chose to use link-time function wrapping, facilitated by the GNU linker's strong support for function wrapping.
Using the --wrap command-line switch, the GNU linker redirects application calls to a function like read() to a function named __wrap_read() and makes the original function available under the name __real_read(). Our instrumented version of __wrap_read() uses the __real_read() symbol to access the system's implementation of the read() function. In addition to collecting event data for system file I/O functions, our event tracing software can collect event data for MPI [9] functions, including MPI-IO functions. Because a compliant MPI implementation includes support for the PMPI profiling interface, we use the PMPI interface rather than the linker's function wrapping facility to interpose our instrumented MPI functions. To generate these MPI wrapper functions, we use an automated wrapper generator script based on the generator used by the mpiP [16] lightweight MPI profiling tool.
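To make the wrapping mechanism concrete, the sketch below shows what an IOT-style wrapper for write() could look like under the GNU linker's --wrap convention. Only the __wrap_/__real_ naming and the -Wl,--wrap= link option are standard GNU linker behavior; the helper functions iot_record_enter(), iot_record_io(), and iot_record_exit() are hypothetical stand-ins for IOT's trace-record generation, not its actual interface.

    /* Minimal sketch of a link-time wrapper for write(), assuming
     * hypothetical IOT helpers for emitting event records; these
     * helper names are illustrative, not IOT's published API. */
    #include <sys/types.h>
    #include <unistd.h>

    /* Resolved by the GNU linker to the system's real write() when the
     * executable is linked with -Wl,--wrap=write. */
    ssize_t __real_write(int fd, const void *buf, size_t count);

    void iot_record_enter(const char *op);                    /* entry event   */
    void iot_record_io(const char *op, int fd,
                       size_t requested, ssize_t completed);  /* I/O details   */
    void iot_record_exit(const char *op);                     /* exit event    */

    /* All application calls to write() are redirected here by the linker. */
    ssize_t __wrap_write(int fd, const void *buf, size_t count)
    {
        iot_record_enter("write");
        ssize_t n = __real_write(fd, buf, count);  /* call the real write() */
        iot_record_io("write", fd, count, n);
        iot_record_exit("write");
        return n;
    }

An analogous MPI-IO wrapper would define MPI_File_write() and forward to PMPI_File_write(), which is why the PMPI profiling interface removes the need for linker wrapping of MPI functions. A link line along the lines of "cc app.o iot_wrappers.o -Wl,--wrap=write,--wrap=read,--wrap=open -liot" (library name assumed) would activate the redirection; the custom compiler scripts described below hide such switches from the user.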

Figure 2: IOT Interposition of instrumented functions between an application process and the runtime software stack

To simplify the use of our event trace capture software, we use custom versions of the Cray Fortran, C, and C++ compiler scripts that automatically include the correct linker switches and libraries to interpose our infrastructure libraries. For basic event tracing scenarios, the user need not modify the application source code to use our infrastructure; instead, the user modifies the makefile to use the command iot_ftn instead of ftn to link a Fortran program.

Performance data collection using event tracing has the potential to generate massive volumes of performance data, e.g., if the events being traced occur frequently, are traced in a large number of processes, or generate a large amount of performance data each time they occur. Several techniques exist to manage the performance data volume produced by detailed event tracing. Dynamic control of the event tracing infrastructure can be used to enable and disable event tracing while a program runs. Such control may be explicitly specified using API functions provided by the tracing infrastructure, or implicitly enabled by the tracing infrastructure in response to excessive performance data volume. Sophisticated performance data collection tools like Paraver use pattern recognition to identify repetitive behavior and keep event data only for a limited, representative sequence of program events. Our IOT event tracing infrastructure currently manages performance data volume using simple event tracing control via explicit API functions, coupled with OTF's compressed short format.

4. CASE STUDIES

As preliminary test cases for the prototype implementation of our I/O characterization approach, we have studied the I/O behavior of two scientific applications on the Cray XT at ORNL. At the time of our experimentation, the ORNL Cray XT used the Catamount lightweight kernel on its compute nodes.

4.1 STATISTICS: THE PARALLEL OCEAN PROGRAM

The Parallel Ocean Program (POP) [7] is an ocean simulation program produced by researchers at Los Alamos National Laboratory. It serves as the ocean model in the Community Climate System Model (CCSM) [3]. It is implemented in Fortran 90 and uses MPI message passing for communicating data between parallel processes. It can use either netCDF or Fortran I/O functions for its output. The program performs I/O for reasons common to many scientific applications:

- to obtain simulation control parameters and initial conditions, such as topography grid data and forcing data used when POP is run in standalone mode (i.e., outside of the full CCSM);
- to save time-varying results such as movie frames and calculation history; and
- to save periodic checkpoint files.

In our experimentation, we used POP and the X1 benchmark problem with a grid spacing of one degree. To limit the time required for our program runs, we limited each run to forty simulation timesteps; production runs involve many more. We also configured the program to output checkpoint files every ten timesteps, movie files every five timesteps, and no calculation history files. POP implements its own collective I/O instead of using existing parallel I/O software like MPI-IO or Parallel netCDF. Because performing I/O from too many processes can overwhelm the I/O capabilities of many systems, POP can be configured to limit the number of writer processes. For this study, we configured POP to use four output tasks.
After we completed our study, climate community experts notified us that the parallel I/O feature of the POP version we used was suspected to be defective and that only one output task was usually used for production runs. For our experiments, POP's I/O data volume was modest. The input activity consisted of reading approximately 7MB of horizontal and vertical grid data and approximately 490KB of topography data during program initialization. Output activity involved writing checkpoint files consisting of a 10KB text metadata file and a 346MB binary data file, and writing 3.9MB movie files using netCDF.
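POP's movie output uses serial netCDF: as the trace analysis below shows, only the MPI rank 0 process writes each movie file. The sketch below is not POP's code (POP is written in Fortran); it is a C illustration, against the standard netCDF C library, of this common gather-and-write pattern whose serialization the analysis quantifies. The gather step, variable and dimension names, and sizes are invented for the example, and error checking is omitted.

    /* Illustrative rank-0-only netCDF frame write (not POP's actual code).
     * Assumes an exact block decomposition so local_n * nprocs == nx * ny. */
    #include <mpi.h>
    #include <netcdf.h>
    #include <stdio.h>
    #include <stdlib.h>

    void write_movie_frame(MPI_Comm comm, const float *local_field,
                           int local_n, int nx, int ny, int step)
    {
        int rank, nprocs;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nprocs);

        /* Gather the distributed field onto rank 0. */
        float *global = NULL;
        if (rank == 0)
            global = malloc((size_t)local_n * nprocs * sizeof(float));
        MPI_Gather((void *)local_field, local_n, MPI_FLOAT,
                   global, local_n, MPI_FLOAT, 0, comm);

        /* Only rank 0 touches the file system; the other ranks wait. */
        if (rank == 0) {
            char name[64];
            int ncid, dimids[2], varid;
            snprintf(name, sizeof(name), "movie_%05d.nc", step);
            nc_create(name, NC_CLOBBER, &ncid);
            nc_def_dim(ncid, "y", (size_t)ny, &dimids[0]);
            nc_def_dim(ncid, "x", (size_t)nx, &dimids[1]);
            nc_def_var(ncid, "field", NC_FLOAT, 2, dimids, &varid);
            nc_enddef(ncid);
            nc_put_var_float(ncid, varid, global);  /* serial write by rank 0 */
            nc_close(ncid);
            free(global);
        }
    }

Replacing the gather-and-write with a parallel I/O library call, as suggested in the analysis below, would let all writer ranks participate in the movie output instead of funneling it through one process.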

Table 1: POP OTF trace file characteristics (all values in bytes)

POP output activity varied between the MPI rank 0 process and the other writer processes. The rank 0 process alone writes the checkpoint metadata file; our IOT traces showed this process used seven write function calls to write this 10KB file. The rank 0 process also writes the entire netCDF movie file itself; the IOT event traces showed two write function calls each time a movie file was saved. All writer processes, regardless of rank, made write function calls to contribute to the checkpoint file. Each writer performed eighty write operations, each of approximately 980KB.

Our analysis of the IOT event trace files suggests several potential optimizations to improve POP I/O performance. First, the program could be modified to use a parallel I/O library for writing movie files to avoid serializing this activity. Second, the program's checkpoint output activity could be adapted to use a parallel I/O library instead of its own collective communication and Fortran I/O operations. We stress, however, that any such changes must take into account any differences between the layout of data in memory and the desired layout on disk, which is intended to support post-mortem analysis of the program's results.

Table 1 describes the OTF event trace files produced when tracing the I/O activity of our four POP writer processes. The table shows the total performance data volume and the per-timestep data volumes for both the MPI rank 0 process and non-rank-0 processes. These data volumes reflect the data volume of all OTF local (per-process) metadata files but not the global metadata file. The event trace files include events for Fortran I/O operations but not the MPI communication performed by POP to gather program data to the writer processes. I/O operation event data in the generated trace files includes the operation timestamp, duration, and number of bytes read or written, but not the number of bytes requested to be read or written. As expected, the OTF compressed short format is the most desirable output format; that it produced only 67 bytes per timestep is encouraging.

4.2 VISUALIZATION: S3D

For another preliminary I/O characterization case study we used the combustion simulation program S3D [4]. Because our initial goal was to test the functionality of our approach, we applied our software to a small test case with eight application processes running for fifty simulation timesteps. Production S3D runs use thousands of processes and run for hundreds or thousands of timesteps. In contrast to the POP case study, where we analyzed IOT event trace files to obtain I/O operation statistics, for our S3D study we focused on event data visualization. A portion of the event trace corresponding to the writing of one checkpoint is shown in the Vampir timeline display in Figure 3. Although MPI events are shown in the timeline display, the lines indicating communication between processes are omitted for clarity. Although our analysis of S3D's I/O behavior is in its early stages, the Vampir timeline display reveals the general checkpoint I/O strategy used by the version of S3D we obtained. The MPI rank 0 process opens and reads data from a control file, then broadcasts a message describing the parameters to use for the checkpoint operation.
The other MPI tasks, waiting for the broadcast, open a file for their checkpoint data (an event not clearly visible in the timeline visualization due to the display's zoom factor). Once each process opens its checkpoint file, it writes its checkpoint data in several small write operations, at least as far as the system is concerned. The checkpoint finishes with a barrier operation, but a slow writer process forces most processes to wait before proceeding with the computation. For our S3D test problem, these checkpoint files are each relatively small: only 16MB. However, because each process writes its own checkpoint file, the timeline visualization hints that runs of the version of S3D we used would present the metadata server with many nearly simultaneous file create operations. This activity could be overwhelming for production runs with tens of thousands of processes. Spreading these file open operations across a longer time interval and performing fewer, larger writes rather than many small writes are two potential strategies for improving the I/O performance of the S3D version we used, based on this preliminary event trace analysis. Although our analysis of S3D's I/O behavior has just begun, an early visualization of detailed event trace data has already enhanced our understanding of the S3D I/O strategy and suggested potential approaches for improving the I/O performance of this application.

Figure 3: Vampir event trace timeline visualization showing one S3D checkpoint operation

5. SUMMARY

Understanding application I/O behavior is critical to overcoming gaps between the I/O demands of an application and the I/O capabilities of a system. We are developing an event tracing infrastructure for characterizing the I/O behavior of applications running on the Cray XT, a primary computing platform in the DOE Office of Science leadership computing efforts. We have begun to apply our prototype implementation to characterize the I/O behavior of two scientific applications of interest to the Office of Science, obtaining insight into possible optimizations for improving their I/O performance.

In the future, we plan to use our I/O characterization software as the foundation for a suite of simple tools for I/O performance analysis, automated performance problem diagnosis, and automated performance tuning of application I/O behavior. We also plan to improve the scalability of our I/O performance data collection and analysis functionality using our MRNet [14] scalable tool infrastructure. Finally, we plan to continue our characterization work with applications beyond POP and S3D.

6. ACKNOWLEDGMENTS

Our thanks to Jeffrey S. Vetter, Weikuan Yu, and the other members of the ORNL Future Technologies Group for their constructive criticism of our work. Thanks also to Pat Worley for providing a mechanism for access to ORNL Leadership Computing Facility systems via the Performance Evaluation and Analysis Consortium End Station. This research used resources of the National Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR.

REFERENCES

[1] S.R. Alam, R.F. Barrett et al., An Evaluation of the ORNL Cray XT3, International Journal of High Performance Computing Applications, 21(4), 2007 (to appear).
[2] D. Becker, F. Wolf et al., Automated Trace-Based Performance Analysis of Metacomputing Applications, Proc. IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[3] M.B. Blackmon, B. Boville et al., The Community Climate System Model, BAMS, 82(11).
[4] J.H. Chen and H.G. Im, Stretch Effects on the Burning Velocity of Turbulent Premixed Hydrogen-Air Flames, Proc. Combust. Inst., 2000.
[5] Cray Inc., Cray XT5 Family of Supercomputers.
[6] L. DeRose and D.A. Reed, SvPablo: A Multi-Language Architecture-Independent Performance Analysis System, Proc. International Conference on Parallel Processing (ICPP'99), 1999.
[7] P.W. Jones, P.H. Worley et al., Practical Performance Portability in the Parallel Ocean Program (POP), Concurrency and Computation: Practice and Experience, 17(10).
[8] S.M. Kelly and R. Brightwell, Software Architecture of the Light Weight Kernel, Catamount, Proc. Cray User Group Technical Conference, Albuquerque, NM, 2005.
[9] Message Passing Interface Forum, MPI-2: A Message Passing Interface Standard, International Journal of Supercomputer Applications and High Performance Computing, 12(1-2):1-299, 1998.

[10] B.P. Miller, M.D. Callaghan et al., The Paradyn Parallel Performance Measurement Tools, IEEE Computer, 28(11):37-46.
[11] W.E. Nagel, A. Arnold et al., VAMPIR: Visualization and Analysis of MPI Resources, Supercomputer 63, 12(1):69-80.
[12] ParaTools, Inc., Open Trace Format.
[13] V. Pillet, J. Labarta et al., PARAVER: A Tool to Visualize and Analyze Parallel Code, Proc. WoTUG-18: Transputer and Occam Developments, 1995.
[14] P.C. Roth, D.C. Arnold, and B.P. Miller, MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools, Proc. SC2003.
[15] S. Shende and A.D. Malony, The TAU Parallel Performance System, International Journal of High Performance Computing Applications, 20(2).
[16] J.S. Vetter and M.O. McCracken, Statistical Scalability Analysis of Communication Operations in Distributed Applications, Principles and Practice of Parallel Programming, 36(7):123-132, 2001.
