1 5th Annual Workshop on Charm++ and Applications: Welcome and Introduction, State of Charm++. Laxmikant Kale, Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign. April 23, 2007.

2 A Glance at History. 1987: Chare Kernel arose from parallel Prolog work; dynamic load balancing for state-space search, Prolog. Charm. Position paper: application-oriented yet CS-centered research. NAMD: 1994, 1996. Charm++ in almost its current form: chare arrays, measurement-based dynamic load balancing. 1997: Rocket Center: a trigger for AMPI. 2001: era of ITRs: Quantum Chemistry collaboration; Computational Astronomy collaboration: ChaNGa.

3 Outline: What is Charm++ and why is it good. Overview of recent results. Language work: raising the level of abstraction. Domain-specific frameworks: ParFUM; Geubelle: crack propagation; Haber: space-time meshing. Applications: NAMD (picked by NSF, new scaling results to 32K procs.); ChaNGa: released, gravity performance; LeanCP: use at national centers. BigSim. Scalable performance tools. Scalable load balancers. Fault tolerance. Cell, GPGPUs, ... Upcoming challenges and opportunities: multicore. Funding.

4 PPL Mission and Approach. To enhance performance and productivity in programming complex parallel applications. Performance: scalable to thousands of processors. Productivity: of human programmers. Complex: irregular structure, dynamic variations. Approach: application-oriented yet CS-centered research; develop enabling technology for a wide collection of apps; develop, use, and test it in the context of real applications. How? Develop novel parallel programming techniques and embody them into easy-to-use abstractions, so application scientists can use advanced techniques with ease. Enabling technology: reused across many apps.

5 Migratable Objects (aka Processor Virtualization). Programmer: [over]decomposition into virtual processors. Runtime: assigns VPs to processors; enables adaptive runtime strategies. Implementations: Charm++, AMPI. [figure: user view vs. system implementation] Benefits: software engineering (number of virtual processors can be independently controlled; separate VPs for different modules); message-driven execution (adaptive overlap of communication; predictability: automatic out-of-core; asynchronous reductions); dynamic mapping (heterogeneous clusters: vacate, adjust to speed, share; automatic checkpointing; change set of processors used; automatic dynamic load balancing; communication optimization).

6 Adaptive Overlap and Modules. SPMD and message-driven modules (from A. Gursoy, Simplified Expression of Message-Driven Programs and Quantification of Their Impact on Performance, Ph.D. thesis, Apr 1994). Modularity, Reuse, and Efficiency with Message-Driven Libraries: Proc. of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, San Francisco.

7 Realization: Charm++'s Object Arrays. A collection of data-driven objects with a single global name for the collection. Each member addressed by an index: [sparse] 1D, 2D, 3D, tree, string, ... Mapping of element objects to processors handled by the system. [figure: user's view of elements A[0], A[1], A[2], A[3], A[..]]

8 Realization: Charm++'s Object Arrays. A collection of data-driven objects with a single global name for the collection. Each member addressed by an index: [sparse] 1D, 2D, 3D, tree, string, ... Mapping of element objects to processors handled by the system. [figure: user's view of A[0]..A[..] alongside the system view, in which a given processor holds, e.g., A[0] and A[3]]

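To make the object-array idea concrete, here is a minimal sketch of how a 1D chare array might be declared and driven in Charm++. It is an illustration rather than code from the talk: the module, class, and method names (hello, Hello, Main, sayHi) are invented for the example, and the interface-file text is shown as a comment.

// Charm++ interface file (hello.ci), shown as a comment; charmc translates
// it into hello.decl.h and hello.def.h:
//
//   mainmodule hello {
//     mainchare Main {
//       entry Main(CkArgMsg *m);
//     };
//     array [1D] Hello {
//       entry Hello();
//       entry void sayHi(int from);
//     };
//   };

#include "hello.decl.h"

// Each array element is a migratable, message-driven object; the runtime
// decides (and may later change) which processor it lives on.
class Hello : public CBase_Hello {
public:
  Hello() {}
  Hello(CkMigrateMessage *m) {}                    // needed so elements can migrate
  void sayHi(int from) {
    CkPrintf("Element %d greeted by %d\n", thisIndex, from);
    if (thisIndex + 1 < 8)
      thisProxy[thisIndex + 1].sayHi(thisIndex);   // asynchronous entry-method invocation
    else
      CkExit();
  }
};

class Main : public CBase_Main {
public:
  Main(CkArgMsg *m) {
    // One global name (the proxy) for the whole collection of 8 elements;
    // the system maps the elements to processors.
    CProxy_Hello arr = CProxy_Hello::ckNew(8);
    arr[0].sayHi(-1);
  }
};

#include "hello.def.h"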

10 AMPI: Adaptive MPI. [figure: 7 MPI processes]

11 AMPI: Adaptive MPI. [figure: 7 MPI processes] implemented as virtual processors (user-level migratable threads), mapped onto real processors.
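
As a rough illustration of the AMPI model (not from the slides), the ordinary MPI program below can be run with more MPI ranks than physical processors; AMPI turns each rank into a user-level migratable thread. The build and run commands in the comments reflect typical AMPI usage (the ampicc wrapper, a +vp option to charmrun) and should be treated as assumptions to check against the AMPI manual.

// Plain MPI code; under AMPI each rank becomes a user-level migratable thread.
//
// Assumed (typical) AMPI usage:
//   ampicc -o hello hello.cpp            # compile with the AMPI wrapper
//   ./charmrun +p4 ./hello +vp16         # 16 virtual MPI ranks on 4 processors
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // virtual rank, 0..15 in the run above
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // ... per-rank work; the runtime may migrate this rank between processors ...
  std::printf("virtual rank %d of %d\n", rank, size);

  MPI_Finalize();
  return 0;
}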

12 Load Balancing. [charts: processor utilization against time on 128 and 1024 processors, showing aggressive load balancing and refinement load balancing] On 128 processors a single load balancing step suffices, but on 1024 processors we need a refinement step.

13 Shrink/Expand. Problem: availability of the computing platform may change. Fit applications to the platform by object migration. [chart: time per step for the million-row CG solver on a 16-node cluster; an additional 16 nodes become available at step 600]

14 So, What's New?

15 New Higher Level Abstractions. Previously: Multiphase Shared Arrays. Provides a disciplined use of global address space; each array can be accessed only in one of the following modes: ReadOnly, Write-by-One-Thread, Accumulate-only; the access mode can change from phase to phase; phases are delineated by per-array sync. Charisma++: global view of control. Allows expressing global control flow in a Charm program; separate expression of parallel and sequential code; functional implementation (Chao Huang, PhD thesis); LCR'04, HPDC'07.

16 Multiparadigm Interoperability. Charm++ supports concurrent composition, allowing multiple modules written in multiple paradigms to cooperate in a single application. Some recent paradigms implemented: ARMCI (for Global Arrays). Use of multiparadigm programming: you heard yesterday how ParFUM made use of multiple paradigms effectively.

17 Blue Gene Provided a Showcase. Cooperation with the Blue Gene team; Sameer Kumar joins the BlueGene team. BGW days competition: 2006, Computer Science day; 2007, computational cosmology: ChaNGa. LeanCP collaboration with Glenn Martyna, IBM.

18 Cray and PSC Warm Up. 4000 fast processors at PSC; 12,500 processors at ORNL; Cray support via a gift grant.

19 IBM Power7 Team. Collaborations begun with the NSF Track 1 proposal.

20 Our Applications Achieved Unprecedented Speedups

21 Applications and Charm++. [diagram: applications, issues, techniques & libraries, Charm++, other applications] Synergy between computer science research and biophysics has been beneficial to both.

22 Charm++ and Applications. Synergy between computer science research and biophysics has been beneficial to both. [diagram: NAMD, LeanCP, ChaNGa, space-time meshing, and rocket simulation connected to Charm++ and other applications through issues and techniques & libraries]

23 Develop abstractions in the context of full-scale applications. [diagram: parallel objects, adaptive runtime system, libraries and tools at the center, surrounded by quantum chemistry (LeanCP), NAMD: molecular dynamics, protein folding, computational cosmology, STM virus simulation, crack propagation, rocket simulation, dendritic growth, and space-time meshes] The enabling CS technology of parallel objects and intelligent runtime systems has led to several collaborative applications in CSE.

24 Molecular Dynamics in NAMD. Collection of [charged] atoms, with bonds; Newtonian mechanics; thousands to millions of atoms; 1 femtosecond time-step, millions of steps needed! At each time-step: calculate forces on each atom (bonds; non-bonded: electrostatic and van der Waals, with short-distance terms every timestep and long-distance terms every 4 timesteps using PME (3D FFT), i.e., multiple time stepping), then calculate velocities and advance positions. Collaboration with K. Schulten, R. Skeel, and coworkers.
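
The multiple-time-stepping scheme described above can be sketched in a few lines of C++; this is illustrative pseudocode of the loop structure, not NAMD's code, and every type and function in it (System, Forces, computeBonded, computePMELongRange, integrate) is a placeholder.

#include <array>
#include <cstddef>
#include <vector>

// Minimal illustrative types; a real MD code is far richer.
using Vec3 = std::array<double, 3>;
struct Atom   { Vec3 pos{}, vel{}; double charge = 0, mass = 1; };
struct System { std::vector<Atom> atoms; };
using Forces = std::vector<Vec3>;    // one force vector per atom

// Placeholder force kernels for the terms named on the slide (physics omitted).
Forces computeBonded(const System &s)              { return Forces(s.atoms.size()); }
Forces computeShortRangeNonbonded(const System &s) { return Forces(s.atoms.size()); }
Forces computePMELongRange(const System &s)        { return Forces(s.atoms.size()); }

void addInPlace(Forces &a, const Forces &b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    for (int d = 0; d < 3; ++d) a[i][d] += b[i][d];
}

// Simplified integrator: update velocities, then advance positions.
void integrate(System &s, const Forces &f, double dt) {
  for (std::size_t i = 0; i < s.atoms.size(); ++i)
    for (int d = 0; d < 3; ++d) {
      s.atoms[i].vel[d] += dt * f[i][d] / s.atoms[i].mass;
      s.atoms[i].pos[d] += dt * s.atoms[i].vel[d];
    }
}

void runSimulation(System &sys, long numSteps) {
  const int    pmeInterval = 4;         // long-range (PME) forces every 4 timesteps
  const double dt          = 1.0e-15;   // 1 femtosecond time-step

  Forces longRange(sys.atoms.size());   // reused between PME evaluations
  for (long step = 0; step < numSteps; ++step) {
    if (step % pmeInterval == 0)
      longRange = computePMELongRange(sys);             // refreshed every 4th step

    Forces total = computeBonded(sys);                  // bonded terms, every timestep
    addInPlace(total, computeShortRangeNonbonded(sys)); // cutoff electrostatics and van der Waals
    addInPlace(total, longRange);                       // multiple time stepping

    integrate(sys, total, dt);
  }
}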

25 NAMD: A Production MD Program. Fully featured program; NIH-funded development; distributed free of charge (~20,000 registered users), binaries and source code; installed at NSF centers; user training and support; large published simulations.


27 NAMD Design. Designed from the beginning as a parallel program. Uses the Charm++ idea: decompose the computation into a large number of objects; have an intelligent runtime system (of Charm++) assign objects to processors for dynamic load balancing. Hybrid of spatial and force decomposition: spatial decomposition of atoms into cubes (called patches); for every pair of interacting patches, create one object for calculating electrostatic interactions. Recent: Blue Matter, Desmond, etc. use this idea in some form.
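
As a rough sketch of the decomposition just described (my illustration, not NAMD code), atoms are binned into cubes on a 3D grid and one compute object is created for every pair of neighboring cubes; the names PatchIndex, PairCompute, and buildPairComputes are invented for the example.

#include <tuple>
#include <vector>

struct PatchIndex  { int x, y, z; };
struct PairCompute { PatchIndex a, b; };   // would become a migratable Charm++ object

// Patches interact if they are within one cube of each other (cube edge >= cutoff).
// Each unordered pair is created once; self-pairs cover intra-patch interactions.
std::vector<PairCompute> buildPairComputes(int nx, int ny, int nz) {
  std::vector<PairCompute> computes;
  for (int x = 0; x < nx; ++x)
    for (int y = 0; y < ny; ++y)
      for (int z = 0; z < nz; ++z)
        for (int dx = -1; dx <= 1; ++dx)
          for (int dy = -1; dy <= 1; ++dy)
            for (int dz = -1; dz <= 1; ++dz) {
              int x2 = x + dx, y2 = y + dy, z2 = z + dz;
              if (x2 < 0 || y2 < 0 || z2 < 0 || x2 >= nx || y2 >= ny || z2 >= nz)
                continue;
              if (std::make_tuple(x2, y2, z2) < std::make_tuple(x, y, z))
                continue;                  // skip the mirror of an already-created pair
              computes.push_back({{x, y, z}, {x2, y2, z2}});
            }
  return computes;
}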

28 NAMD Parallelization using Charm++. [figure: example configuration with 847 VPs, 100,000 VPs, and 108 VPs] These 100,000 objects (virtual processors, or VPs) are assigned to real processors by the Charm++ runtime system.

29 Performance on BlueGene/L. [chart: simulation rate in nanoseconds per day vs. processors for IAPP (5.5K atoms), Lysozyme (40K atoms), ApoA1 (92K atoms), ATPase (327K atoms), STMV (1M atoms), and BAR domain (1.3M atoms)] IAPP simulation (Rivera, Straub, BU) at 20 ns per day on 256 processors: 1 us in 50 days. STMV simulation at 6.65 ns per day on 20,000 processors.

30 Comparison with Blue Matter. [table: ms/step for ApoLipoprotein-A1 (92K atoms) by node count, comparing Blue Matter (SC'06), NAMD, and NAMD in virtual-node mode] NAMD is about 1.8 times faster than Blue Matter on 1024 nodes (and 2.4 times faster with VN mode, where NAMD can use both processors on a node effectively). However, note that NAMD does PME every 4 steps.

31 Performance on Cray XT3. [chart: simulation rate in nanoseconds per day vs. processors for IAPP (5.5K atoms), Lysozyme (40K atoms), ApoA1 (92K atoms), ATPase (327K atoms), STMV (1M atoms), BAR domain (1.3M atoms), and Ribosome (2.8M atoms)]

32 Computational Cosmology. N-body simulation (NSF): N particles (1 million to 1 billion) in a periodic box, moving under gravitation, organized in a tree (oct, binary (k-d), ...). Output data analysis in parallel (NASA): particles are read in parallel; interactive analysis. Issues: load balancing, fine-grained communication, tolerating communication latencies, multiple time stepping. New code released: ChaNGa. Collaboration with T. Quinn (Univ. of Washington). UofI team: Filippo Gioachin, Pritish Jetley, Celso Mendes.

33 ChaNGa Load Balancing. Challenge: trade-off between communication and balance.

34 Recent Successes in Scaling ChaNGa. [charts: execution time and (number of processors x execution time) versus number of processors for several datasets, including drgas, lambb, dwf, hrwh_lcdms, and dwarf]

35 Quantum Chemistry: Car-Parrinello MD (LeanCP). Illustrates the utility of separating decomposition and mapping; very complex set of objects and interactions; excellent scaling achieved. Collaboration with Glenn Martyna (IBM), Mark Tuckerman (NYU). UofI team: Eric Bohm, Abhinav Bhatele.

36 LeanCP Decomposition

37 LeanCP Scaling

38 Space-Time Meshing. Discontinuous Galerkin method; tent-pitcher algorithm. Collaboration with Bob Haber, Jeff Erickson, Michael Garland. PPL team: Aaron Becker, Sayantan Chakravorty, Terry Wilmarth.

39 [figure only]

40 Rocket Simulation. Dynamic, coupled physics simulation in 3D: finite-element solids on an unstructured tet mesh; finite-volume fluids on a structured hex mesh; coupling every timestep via a least-squares data transfer. Challenges: multiple modules; dynamic behavior: burning surface, mesh adaptation. Robert Fiedler, Center for Simulation of Advanced Rockets. Collaboration with M. Heath, P. Geubelle, others.

41 Dynamic load balancing in Crack Propagation

42 Colony: FAST-OS Project. DOE-funded collaboration: Terry Jones (LLNL); Jose Moreira et al. (IBM). At Illinois: supports scalable dynamic load balancing and fault tolerance.

43 Colony Project Overview. Collaborators: Lawrence Livermore National Laboratory (Terry Jones); University of Illinois at Urbana-Champaign (Laxmikant Kale, Celso Mendes, Sayantan Chakravorty); International Business Machines (Jose Moreira, Andrew Tauferner, Todd Inglett). Title: Services and Interfaces to Support Systems with Very Large Numbers of Processors. Topics: parallel resource instrumentation framework; scalable load balancing; OS mechanisms for migration; processor virtualization for fault tolerance; single system management space; parallel awareness and coordinated scheduling of services; Linux OS for cellular architecture.

44 Load Balancing on Very Large Machines. Existing load balancing strategies don't scale on extremely large machines; consider an application with 1M objects on 64K processors. Centralized: object load data are sent to processor 0 and integrated into a complete object graph; migration decisions are broadcast from processor 0; global barrier. Distributed: load balancing among neighboring processors; build a partial object graph; migration decisions are sent to neighbors; no global barrier.
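
For intuition, here is a toy sketch of the kind of decision a centralized strategy makes once all object load data has been gathered on one processor: greedily place the heaviest remaining object on the currently least-loaded processor. This is my own illustration, not one of the PPL load balancers.

#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns assignment[objectId] = processor, balancing total load greedily.
std::vector<int> greedyAssign(const std::vector<double> &objectLoad, int numProcs) {
  // Object ids sorted by decreasing load.
  std::vector<int> order(objectLoad.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return objectLoad[a] > objectLoad[b]; });

  // Min-heap of (current processor load, processor id).
  typedef std::pair<double, int> Entry;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > procs;
  for (int p = 0; p < numProcs; ++p) procs.push(Entry(0.0, p));

  std::vector<int> assignment(objectLoad.size(), -1);
  for (int obj : order) {
    Entry least = procs.top();             // least-loaded processor so far
    procs.pop();
    assignment[obj] = least.second;        // the migration decision for this object
    procs.push(Entry(least.first + objectLoad[obj], least.second));
  }
  return assignment;
}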

45 A Hybrid Load Balancing Strategy. Divide processors into independent groups, and organize the groups in a hierarchy (decentralizing the strategy). Each group has a leader (the central node) which performs centralized load balancing within the group. A particular hybrid strategy that works well: Gengbin Zheng, PhD thesis.
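
A minimal sketch of the bookkeeping such a hierarchical scheme needs (which group a processor belongs to and who its leader is), assuming fixed-size groups; this is an illustration, not the strategy from the thesis.

// Two-level grouping for hierarchical load balancing: processors are split
// into fixed-size groups, each led by its first member; the leader balances
// load within the group, and a higher level balances across leaders.
struct GroupInfo {
  int groupId;      // which group this processor belongs to
  int leader;       // processor id of the group's central node
  int numInGroup;   // number of processors actually in this group
};

GroupInfo makeGroup(int myPe, int numPes, int pesPerGroup) {
  GroupInfo g;
  g.groupId    = myPe / pesPerGroup;
  g.leader     = g.groupId * pesPerGroup;                  // first PE of the group
  int end      = g.leader + pesPerGroup;                   // one past the last PE
  g.numInGroup = (end > numPes ? numPes : end) - g.leader; // last group may be smaller
  return g;
}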

46 Fault Tolerance. Automatic checkpointing: migrate objects to disk; in-memory checkpointing as an option; automatic fault detection and restart. Proactive fault tolerance (impending-fault response): migrate objects to other processors; adjust processor-level parallel data structures. Scalable fault tolerance: when a processor out of 100,000 fails, all 99,999 others shouldn't have to run back to their checkpoints! Sender-side message logging; latency tolerance helps mitigate costs; restart can be sped up by spreading out objects from the failed processor.
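
The mechanism underlying both migration and checkpointing is the same serialization hook: each migratable object provides a PUP routine that packs and unpacks its state. A minimal sketch for a hypothetical object follows; the runtime can then write that state to disk or to a buddy processor's memory when a checkpoint is requested (in Charm++ this is triggered by calls such as CkStartCheckpoint / CkStartMemCheckpoint, whose exact signatures should be checked against the manual).

#include "pup.h"
#include "pup_stl.h"   // operator| overloads for STL containers
#include <vector>

// Hypothetical chare-array element (it would derive from CBase_Patch in a
// real Charm++ program); only the serialization hook is shown.
class Patch {
  int step;
  std::vector<double> coords;

public:
  // Called when packing for migration, writing a checkpoint, or restoring
  // from one (the same routine also unpacks).
  void pup(PUP::er &p) {
    p | step;
    p | coords;
  }
};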

47 BigSim. Simulating very large parallel machines using smaller parallel machines. Reasons: predict performance on future machines; predict performance obstacles for future machines; do performance tuning on existing machines that are difficult to get allocations on. Idea: emulation run using virtual processors (AMPI); get traces; detailed machine simulation using the traces.

48 Objectives and Simulation Model. Objectives: develop techniques to facilitate the development of efficient peta-scale applications, based on performance prediction of applications on large simulated parallel machines. Simulation-based performance prediction: focus on Charm++ and AMPI programming models; performance prediction based on PDES; supports varying levels of fidelity (processor prediction, network prediction); modes of execution: online and post-mortem.

49 Big Network Simulation. Simulate network behavior: packetization, routing, contention, etc.; incorporate with post-mortem simulation; switches are connected in a torus network. [diagram: the BigSim emulator produces BG log files (tasks & dependencies), which BigNetSim processes with POSE timestamp correction to produce timestamp-corrected tasks]

50 Projections: Performance visualization

51 Architecture of BigNetSim

52 Performance Prediction (contd.). Predicting time of sequential code: user-supplied time for every code block; wall-clock measurements on the simulating machine, scaled by a suitable multiplier; hardware performance counters to count floating point, integer, branch instructions, etc. (cache performance and memory footprint are approximated by the percentage of memory accesses and the cache hit/miss ratio); instruction-level simulation (not implemented). Predicting network performance: no contention, time based on topology and other network parameters; back-patching, which modifies communication time using the amount of communication activity; network simulation, modeling the network entirely.
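
The wall-clock-multiplier option mentioned above amounts to measuring a sequential block on the host and scaling it by an assumed host-to-target speed ratio; a toy sketch (my illustration, not BigSim code):

#include <chrono>

// Predict the time a sequential code block would take on the simulated target
// machine. targetOverHostRatio is an assumed calibration factor: 0.5 means the
// target is predicted to run this block in half the host's time.
template <typename Fn>
double predictBlockSeconds(Fn &&block, double targetOverHostRatio) {
  auto t0 = std::chrono::steady_clock::now();
  block();                                    // run the block once on the host
  auto t1 = std::chrono::steady_clock::now();
  double hostSeconds = std::chrono::duration<double>(t1 - t0).count();
  return hostSeconds * targetOverHostRatio;   // predicted target time
}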

53 Multi-Cluster Co-Scheduling. [diagram: Cluster A and Cluster B, with intra-cluster latencies of microseconds and an inter-cluster latency of milliseconds] A job is co-scheduled to run across two clusters to provide access to large numbers of processors, but cross-cluster latencies are large! Virtualization within Charm++ masks the high inter-cluster latency by allowing overlap of communication with computation.

54 Multi-Cluster Co-Scheduling

55 Faucets: Optimizing Utilization Within/Across Clusters. [diagram: job submission flow across clusters, involving job specs, file upload, job IDs, bids, and a job monitor]

56 Other Ongoing Projects. Parallel debugger; automatic out-of-core execution; parallel algorithms (current: Prim's spanning tree algorithm, sorting, ...). New collaborations being explored: Prof. Paulino, Prof. Pantano, ...

57 Domain Specific Frameworks. Motivation: reduce the tedium of parallel programming for commonly used paradigms and parallel data structures; encapsulate parallel data structures and algorithms; provide an easy-to-use interface; used to build concurrently composable parallel modules. Frameworks: unstructured meshes: ParFUM (generalized ghost regions; used in Rocfrac and Rocflu at the rocket center and outside CSAR; fast collision detection); structured meshes: multiblock framework (automates communication) and AMR (common to both of the above); particles: multiphase flows, MD, tree codes.

58 Summary and Messages. We at PPL have advanced migratable objects technology. We are committed to supporting applications; we grow our base of reusable techniques via such collaborations. Try using our technology: AMPI, Charm++, Faucets, ParFUM, ... Available via the web: charm.cs.uiuc.edu

59 Parallel Programming Laboratory. [chart: senior staff, grants, and enabling projects] Grants: IBM PERCS (High Productivity); NSF ITR Chemistry (Car-Parrinello MD, QM/MM); NCSA Faculty Fellows Program; NSF ITR, NASA (Computational Cosmology and Visualization); DOE HPC-Colony (Services and Interfaces for Large Computers); NIH Biophysics (NAMD); DOE CSAR (Rocket Simulation); NSF ITR CPSD (Space/Time Meshing); NSF Next Generation Software (BlueGene). Enabling projects: Faucets (dynamic resource management for grids); Charm++ and Converse; load balancing (centralized, distributed, hybrid); AMPI (Adaptive MPI); fault tolerance (checkpointing, fault recovery, processor evacuation); Projections (performance analysis); ParFUM (supporting unstructured meshes, computational geometry); orchestration and parallel languages; BigSim (simulating big machines and networks).

60 Over the Next Two Days. Keynote: Kathy Yelick, PGAS Languages and Beyond. System progress talks: Adaptive MPI; BigSim: performance prediction; scalable performance analysis; fault tolerance. Applications: molecular dynamics; quantum chemistry (LeanCP); computational cosmology; rocket simulation; Cell processor; Grid and multi-cluster applications. Tutorials: Charm++, AMPI, Projections, BigSim.
