Experiences with ENZO on the Intel® Many Integrated Core (Intel® MIC) Architecture

Robert Harkness
National Institute for Computational Sciences
Oak Ridge National Laboratory

1 Introduction

The National Institute for Computational Sciences (NICS) has deployed several Intel® Many Integrated Core (MIC) Knights Ferry (KNF) platforms in cooperation with Intel® and Cray Inc. The initial focus at NICS has been on demonstrating several large-scale physics and chemistry codes on the Intel® KNF architecture and exploring different models of execution. Here we describe the porting and testing process using ENZO-R, a version of the ENZO astrophysics code. ENZO applications running on petascale systems are approaching the limits of weak scaling, mainly because of the limits of reasonable cost and the need to complete simulations within the time scale of a funding cycle. The Intel® MIC architecture opens up new possibilities for improving strong scaling by increasing internal parallelism without the need to rewrite the entire application code.

2 Programming Models for the ENZO Code

ENZO is a complex astrophysics code for multi-scale and multi-physics applications using large (Eulerian) fixed meshes or multi-level adaptive mesh refinement (AMR) (5; 6; 4; 3). Many of the largest ENZO applications are in cosmology, where dark matter is treated as collisionless but gravitating particles. ENZO thus contains particle-in-mesh computations as well as standard hydrodynamical methods (the Piecewise Parabolic Method, PPM) (1) for normal-matter gas dynamics. Cosmology simulations also require calculation of the global gravitational field due to the normal matter and dark matter. This requires a 3D FFT on the top-level mesh in both AMR and non-AMR simulations. ENZO-R incorporates 3D flux-limited radiation diffusion coupled to ENZO hydrodynamics and chemistry. The radiation package depends on solvers and preconditioners from the LLNL HYPRE package (2).
The complete ENZO-R source code consists of approximately 250,000 lines of C, C++ and Fortran90, excluding MPI, OpenMP, HDF5, HYPRE and other third-party libraries. A full-scale cosmology simulation involves local computations, mainly of atomic physics processes, and global computations that need a considerable amount of long-range MPI communication. Per-node memory requirements can be badly imbalanced (since the dark matter particles move freely in the domain under gravity), and local CPU load can be imbalanced by AMR or nonlinear local interactions.

In a cosmology application, ENZO-R requires approximately 25 physical fields defined on each mesh point, plus 12 attributes per dark matter or star particle. At the largest scales and mesh sizes (usually with one dark matter particle per mesh point) these state data amount to tens of TBytes, and the entire working set can be four times larger still, resulting in aggregate memory requirements of more than 100 TB. Additional physics capabilities such as MHD or frequency-dependent RHD increase the aggregate memory requirement further. This is the defining issue for multi-dimensional, multi-scale, multi-physics codes in general: at present, no accelerator technology provides enough local memory to conduct a full-scale calculation without relying on the host system to hold as much as 90 per cent of the working set.

In principle, parts of the calculation can be offloaded from the host system, but the efficiency of this depends on the granularity of the tasks, the performance of the PCIe connection between the Intel® Xeon® host and the KNF card, and the extent to which part of the working set can reside in local KNF memory. In practice, this is difficult to achieve. With a memory-bound code like ENZO, the clear preference is to make the entire working set resident on the Intel® MIC co-processor and to use the host merely for inter-node communication and for I/O to disk. The aggregate memory bandwidth available on accelerators is generally greater than that on the host, which is a very strong argument for keeping the entire working set resident on the accelerator. The Knights Ferry cards in NICS systems have approximately 1 GByte of usable memory each, which severely restricts the scale of ENZO test problems compared to production systems, which typically have 16 to 64 GBytes of memory per node.

The vector model of computation is extremely well suited to codes like ENZO. Most of the core physics components of ENZO are derived from codes originally developed for true vector multiprocessors such as the Cray YMP/C90/T90. Consequently, most of the computational work is already cleanly vectorized with moderate vector lengths (up to 256 elements). The essentially 3D data structures provide multiple opportunities for simultaneous vectorization and threading at more than one level. Strided and indexed memory loads and stores are unavoidable, however, and these operations can sometimes interact poorly with caching mechanisms.
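The state-data estimate given earlier in this section can be turned into a small back-of-the-envelope calculation. The sketch below assumes one particle per mesh point and a uniform value width. The function names, the 64-bit width and the 1024-point mesh edge are illustrative assumptions, not figures from the ENZO-R runs described here; only the 25 fields, the 12 particle attributes and the factor of four for the working set are taken from the text.

```python
def state_bytes(n, fields=25, attrs=12, bytes_per_value=8):
    """Bytes of state for an n^3 mesh with one particle per mesh point.

    fields and attrs are the approximate counts quoted in the text;
    bytes_per_value=8 (64-bit) is an assumption made for illustration.
    """
    points = n ** 3
    return points * (fields + attrs) * bytes_per_value


def working_set_bytes(n, factor=4, **kwargs):
    """Working set, taken as `factor` times the state data (as in the text)."""
    return factor * state_bytes(n, **kwargs)


if __name__ == "__main__":
    n = 1024  # hypothetical mesh edge, chosen only to show the arithmetic
    gib = 1024 ** 3
    print(f"state:       {state_bytes(n) / gib:.0f} GiB")
    print(f"working set: {working_set_bytes(n) / gib:.0f} GiB")
```

Even for this hypothetical mesh the state data alone comes to hundreds of GiB, which is why a card with roughly 1 GByte of usable memory can only hold small test problems.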
One of the many advantages of the Intel® MIC co-processor is that it can perform all of these strided and indexed memory operations.

3 Migrating ENZO-R to the Intel® MIC Architecture

ENZO-R and one of its supporting codes have been migrated to the Intel® MIC architecture in both native and offload modes. The main emphasis has been on getting the codes and supporting libraries operational and generating correct results. ENZO-R is used for some of the largest simulations run on NSF and DOE supercomputers today (Cray XT5, XE6), so production-scale models are impossible on the KNF test systems. The goal is to gain insight into how to implement ENZO on future very large-scale systems that use Intel® MIC components.

The fundamental choice is between the outside-in method, where the ENZO state is mainly resident on the Intel® Xeon® host and parallel regions are offloaded to the Intel® MIC using directives, and the inside-out method, where the bulk of the ENZO state is resident on the Intel® MIC and the host is only peripherally involved, communicating between (single or multiple) Intel® MICs on multiple nodes. ENZO is a large code with over 1,000 routines, and it depends on several major third-party libraries for numerical methods, communications and data handling. No single component of ENZO dominates the execution profile. Furthermore, in real applications, ENZO spends a significant fraction of the execution time in MPI communication and I/O to disk. These factors suggest that the standard CPU-plus-accelerator offload model stands little chance of success.

The Intel® MIC software distribution supports the Intel® compilers for native and offload mode but currently does not provide any of the essential third-party library components needed to support large-scale scientific applications. Large-scale scientific applications are almost all critically dependent on MPI for inter-node communication on distributed-memory systems. Any truly large-scale application is likely to be designed to scale to tens of thousands of MPI tasks, and MPI communication often accounts for a major fraction of the overall simulation run time. Compiling MPI for the Intel® Xeon® host for use with the offload mechanism is trivial, but compiling MPI for a single Intel® MIC, multiple Intel® MICs per node, and single or multiple Intel® MICs on multiple nodes requires cross-compilation. MPI uses configure scripts, which require manual modification to achieve this. MPICH was used for the initial porting effort because, even though it is now considered obsolete, it is relatively simple to build and presented a lower risk.

ENZO is also dependent on the Hierarchical Data Format library (HDF5), and again the main challenge is manual modification of the necessary configure scripts to enable cross-compilation. Other essential third-party libraries (SPRNG 2.0 and HYPRE 2.8.0) require similar manual modifications to their configure scripts. With all these components in place for offload and native builds, the compilation of ENZO-R itself is straightforward. No ENZO-R source code modifications are required to build a functional ENZO-R binary in native mode although, clearly, some source code changes will be necessary to achieve full optimization on the Intel® MIC co-processor. Although ENZO-R is already a hybrid code using OpenMP directives throughout, extending all of these parallel regions to offload regions is far from trivial in such a large code, and the conversion to offload mode had not been completed at the time of writing.

ENZO requires initial conditions for a simulation, and for cosmology cases these are generated with ENZO-Inits. Like ENZO-R, this code contains a mix of C, C++ and Fortran90 but contains only 12,000 lines of code.
ENZO-Inits has been implemented as a shared-memory OpenMP-threaded code as well as a threaded hybrid MPI/OpenMP code. The shared-memory OpenMP-threaded variant provides a reasonably complex test case for offload compilation. In comparison to native mode, the offload mode has proven to be quite difficult to use. Small errors in syntax or placement of offload directives tended to result in a system hang and occasionally even a crash of the host system. As expected, simple use of offload directives is relatively ineffective compared to native mode execution on a single KNF card because of the costs of startup and data transfer between the KNF and the host, even though the OpenMP parallel code is quite efficient on a multicore Intel® Xeon® when running at the same scale.

4 NICS Test Systems and Preliminary Results

NICS has three KNF systems covering most of the choices in basic cluster configurations:

- An Intel® Development Workstation with 2 Intel® Xeon® processor 5600s and 2 KNF cards.
- A Cray CX-1 system consisting of a single head node with 2 Intel® Xeon® processor 5600s and two compute nodes, each containing 2 Intel® Xeon® processor 5600s plus one KNF card.
- An Appro cluster consisting of 4 compute nodes, each of which contains 2 Intel® Xeon® processor 5600s and 2 KNF cards, for a total of 8 KNF cards.

Each of these systems is configured to support outside-in and inside-out programming models and can support native mode MPI running across the nodes within each cluster. Every KNF card has a 32-core processor with 2 GBytes of on-board GDDR5 memory. Preliminary results are available for small-scale tests run on the various NICS systems.

4.1 ENZO-R in KNF Native Mode

Figure 1: Unoptimized ENZO-R scaling in native mode on KNF. (Plot of relative performance against number of MPI tasks on a single KNF card, with ideal and actual curves.)

Figure 1 shows the strong scaling of a non-AMR cosmology model using MPI in native mode on a single KNF card. This non-RHD model uses 32-bit arithmetic for the physical fields, with the exception of dark matter particle positions, which require 64-bit precision. The scaling behavior is remarkable given that decomposing such a small model results in parallel tasks far smaller than would be used in a full-scale simulation (i.e. for 32 tasks the model is decomposed into 32 regions with 4x4x2 tiles of size 32x32x64, compared to the much larger tiles used in production simulations).

AMR and RHD models require full 64-bit precision throughout. Consequently, for RHD a single KNF can run only an 80^3 model. AMR models require an increasing amount of memory as the refinement progresses, and the test case with a 64^3 root grid and three levels of refinement exhausts all available memory (about 1 GByte) long before the AMR is fully developed. Larger models using up to 8 times as much memory can run using multiple KNF cards on the NICS cluster, although the cost of inter-node communication becomes excessive given the present method of indirect routing.
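The decomposition quoted for the 32-task run can be checked with a few lines of arithmetic. The sketch below assumes that the tile counts and tile sizes multiply axis by axis to give the global mesh, and it counts only the roughly 25 physical fields at 32-bit precision. The helper names are hypothetical and particle storage is ignored.

```python
from math import prod


def global_mesh(tiles, tile_shape):
    """Global mesh shape, assuming tiles and tile sizes multiply per axis."""
    return tuple(t * s for t, s in zip(tiles, tile_shape))


def field_bytes(shape, fields=25, bytes_per_value=4):
    """Bytes for `fields` 32-bit fields on a mesh of the given shape."""
    return prod(shape) * fields * bytes_per_value


if __name__ == "__main__":
    tiles = (4, 4, 2)           # tile counts quoted in the text (32 tasks)
    tile_shape = (32, 32, 64)   # tile size quoted in the text
    mesh = global_mesh(tiles, tile_shape)
    mib = 1024 ** 2
    print("tasks:", prod(tiles))
    print("global mesh:", mesh)
    print("fields per tile: %.2f MiB" % (field_bytes(tile_shape) / mib))
    print("fields, whole mesh: %.0f MiB" % (field_bytes(mesh) / mib))
```

Under these assumptions the 32 tiles cover a 128x128x128 mesh and each task holds only a few MiB of field data, which illustrates how little work each task has at this scale.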

4.2 ENZO-Inits in KNF Offload Mode

Figure 2: Comparison of scaling of main parallel region in ENZO-Inits in native mode and offload mode on KNF and execution on the Intel® Xeon® host. (Plot of relative performance against number of OpenMP threads on a single KNF card, with curves labeled Ideal, MMIC mode, OFFLOAD mode and XEON mode.)

Figure 2 shows the scaling of part of the ENZO-Inits code running in native mode, in offload mode and purely on the Intel® Xeon® processor 5600 front-end. The code is identical in all three cases and all arithmetic uses 64-bit precision. ENZO-Inits reads a file containing a double precision pseudo-random number sequence. For the test problem this file is 85 MBytes in size. This random number data is the input to the major OpenMP parallel region, and the timing data is given for this region only. In native Intel® Xeon® and native KNF modes this data is already resident in memory at the start of the parallel region. For the offload case it is automatically copied in to the KNF card together with a double precision complex field (32 MBytes in size), which is also returned to the host on completion.

The results for native mode show reasonable efficiency, and the behavior of the pure Intel® Xeon® run is comparable up to 8 OpenMP threads. The results for offload presumably demonstrate the impact of the data transfers to and from the host over PCIe. The results for two cores show almost no improvement over using a single core, even on the Intel® Xeon® processor. The reason for this is unknown at the time of writing. Beyond two cores the scaling of the Intel® Xeon® and KNF-resident cases is close to ideal. Although this parallel region represents most of the workload, ENZO-Inits must also write several fields to disk. Since all I/O uses KNF resources, the need to write these files restricts the use of the KNF to small test cases only.
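The effect of a fixed transfer cost on offload scaling can be illustrated with a simple model. The sketch below is not a fit to Figure 2: it assumes that the parallel region scales perfectly with thread count and that the transfer time is a fixed overhead, and the compute time and PCIe bandwidth values are hypothetical placeholders. Only the data sizes (85 MBytes of random numbers copied in, and a 32 MByte complex field copied in and back out) come from the text.

```python
def transfer_mbytes(random_in=85, field_in=32, field_out=32):
    """Total MBytes moved over PCIe for one offloaded region (sizes from the text)."""
    return random_in + field_in + field_out


def transfer_seconds(mbytes, bandwidth_mb_per_s):
    """Transfer time at a given sustained bandwidth (the bandwidth is an assumption)."""
    return mbytes / bandwidth_mb_per_s


def relative_performance(threads, compute_1thread_s, overhead_s=0.0):
    """Speedup over one overhead-free thread, assuming ideal compute scaling."""
    return compute_1thread_s / (compute_1thread_s / threads + overhead_s)


if __name__ == "__main__":
    compute_s = 10.0     # hypothetical single-thread time for the region
    bandwidth = 1000.0   # hypothetical sustained PCIe bandwidth in MB/s
    overhead = transfer_seconds(transfer_mbytes(), bandwidth)
    for threads in (1, 2, 4, 8, 16, 32):
        native = relative_performance(threads, compute_s)
        offload = relative_performance(threads, compute_s, overhead)
        print(f"{threads:2d} threads: native {native:5.2f}  offload {offload:5.2f}")
```

With a fixed overhead the modeled offload curve falls further below the native curve as the thread count grows, because the transfer time does not shrink when more threads are used. The model says nothing about the unexplained two-core result.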

5 Future Developments

The next steps in development will be to complete the migration of ENZO-R to offload mode and to investigate the use of two separate but communicating MPI domains, resident on the cooperating Intel® Xeon® and KNF components respectively, to perform reverse offload of communication and I/O while the ENZO state remains KNF-resident. The forward offload model is close to the standard hybrid CPU/GPGPU model and similar constraints apply. Reverse offload preserves a model which can be used on any MPP platform.

References

[1] P. Colella and P. R. Woodward. The Piecewise Parabolic Method (PPM) for Gas-Dynamical Simulations. Journal of Computational Physics, 54, September.
[2] HYPRE project site.
[3] ENZO project site.
[4] M. L. Norman, J. Bordner, D. Reynolds, R. Wagner, G. L. Bryan, R. Harkness and B. W. O'Shea. Simulating Cosmological Evolution with Enzo. In Petascale Computing: Algorithms and Applications, Ed. D. A. Bader. Chapman & Hall/CRC.
[5] M. L. Norman, G. L. Bryan, et al. Simulating Cosmological Evolution with Enzo. In Petascale Computing: Algorithms and Applications, Ed. D. Bader. CRC Press LLC, 2007.
[6] B. W. O'Shea, G. Bryan, J. Bordner, M. L. Norman, T. Abel, R. Harkness and A. Kritsuk. Introducing Enzo, an AMR Cosmology Application. In Adaptive Mesh Refinement - Theory and Applications, Eds. T. Plewa, T. Linde and V. G. Weirs. Springer Lecture Notes in Computational Science and Engineering, 2004.


Scientific Computing at Million-way Parallelism - Blue Gene/Q Early Science Program

Scientific Computing at Million-way Parallelism - Blue Gene/Q Early Science Program Scientific Computing at Million-way Parallelism - Blue Gene/Q Early Science Program Implementing Hybrid Parallelism in FLASH Christopher Daley 1 2 Vitali Morozov 1 Dongwook Lee 2 Anshu Dubey 1 2 Jonathon

More information

Top-Down System Design Approach Hans-Christian Hoppe, Intel Deutschland GmbH

Top-Down System Design Approach Hans-Christian Hoppe, Intel Deutschland GmbH Exploiting the Potential of European HPC Stakeholders in Extreme-Scale Demonstrators Top-Down System Design Approach Hans-Christian Hoppe, Intel Deutschland GmbH Motivation & Introduction Computer system

More information

GAMER : a GPU-accelerated Adaptive-MEsh-Refinement Code for Astrophysics GPU 與自適性網格於天文模擬之應用與效能

GAMER : a GPU-accelerated Adaptive-MEsh-Refinement Code for Astrophysics GPU 與自適性網格於天文模擬之應用與效能 GAMER : a GPU-accelerated Adaptive-MEsh-Refinement Code for Astrophysics GPU 與自適性網格於天文模擬之應用與效能 Hsi-Yu Schive ( 薛熙于 ), Tzihong Chiueh ( 闕志鴻 ), Yu-Chih Tsai ( 蔡御之 ), Ui-Han Zhang ( 張瑋瀚 ) Graduate Institute

More information

Issues In Implementing The Primal-Dual Method for SDP. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM

Issues In Implementing The Primal-Dual Method for SDP. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM Issues In Implementing The Primal-Dual Method for SDP Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.edu Outline 1. Cache and shared memory parallel computing concepts.

More information

High performance Computing and O&G Challenges

High performance Computing and O&G Challenges High performance Computing and O&G Challenges 2 Seismic exploration challenges High Performance Computing and O&G challenges Worldwide Context Seismic,sub-surface imaging Computing Power needs Accelerating

More information

Large Scale Simulations of the Non-Thermal Universe

Large Scale Simulations of the Non-Thermal Universe Available on-line at www.prace-ri.eu Partnership for Advanced Computing in Europe Large Scale Simulations of the Non-Thermal Universe Claudio Gheller a,, Graziella Ferini a, Maciej Cytowski b, Franco Vazza

More information

Scalasca support for Intel Xeon Phi. Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany

Scalasca support for Intel Xeon Phi. Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany Scalasca support for Intel Xeon Phi Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany Overview Scalasca performance analysis toolset support for MPI & OpenMP

More information

The Use of Cloud Computing Resources in an HPC Environment

The Use of Cloud Computing Resources in an HPC Environment The Use of Cloud Computing Resources in an HPC Environment Bill, Labate, UCLA Office of Information Technology Prakashan Korambath, UCLA Institute for Digital Research & Education Cloud computing becomes

More information

Peta-Scale Simulations with the HPC Software Framework walberla:

Peta-Scale Simulations with the HPC Software Framework walberla: Peta-Scale Simulations with the HPC Software Framework walberla: Massively Parallel AMR for the Lattice Boltzmann Method SIAM PP 2016, Paris April 15, 2016 Florian Schornbaum, Christian Godenschwager,

More information

Fourteen years of Cactus Community

Fourteen years of Cactus Community Fourteen years of Cactus Community Frank Löffler Center for Computation and Technology Louisiana State University, Baton Rouge, LA September 6th 2012 Outline Motivation scenario from Astrophysics Cactus

More information

OpenFOAM Scaling on Cray Supercomputers Dr. Stephen Sachs GOFUN 2017

OpenFOAM Scaling on Cray Supercomputers Dr. Stephen Sachs GOFUN 2017 OpenFOAM Scaling on Cray Supercomputers Dr. Stephen Sachs GOFUN 2017 Safe Harbor Statement This presentation may contain forward-looking statements that are based on our current expectations. Forward looking

More information

Performance optimization of the Smoothed Particle Hydrodynamics code Gadget3 on 2nd generation Intel Xeon Phi

Performance optimization of the Smoothed Particle Hydrodynamics code Gadget3 on 2nd generation Intel Xeon Phi Performance optimization of the Smoothed Particle Hydrodynamics code Gadget3 on 2nd generation Intel Xeon Phi Dr. Luigi Iapichino Leibniz Supercomputing Centre Supercomputing 2017 Intel booth, Nerve Center

More information

Scientific Computing with Intel Xeon Phi Coprocessors

Scientific Computing with Intel Xeon Phi Coprocessors Scientific Computing with Intel Xeon Phi Coprocessors Andrey Vladimirov Colfax International HPC Advisory Council Stanford Conference 2015 Compututing with Xeon Phi Welcome Colfax International, 2014 Contents

More information

Assessment of LS-DYNA Scalability Performance on Cray XD1

Assessment of LS-DYNA Scalability Performance on Cray XD1 5 th European LS-DYNA Users Conference Computing Technology (2) Assessment of LS-DYNA Scalability Performance on Cray Author: Ting-Ting Zhu, Cray Inc. Correspondence: Telephone: 651-65-987 Fax: 651-65-9123

More information

P. K. Yeung Georgia Tech,

P. K. Yeung Georgia Tech, Progress in Petascale Computations of Turbulence at high Reynolds number P. K. Yeung Georgia Tech, pk.yeung@ae.gatech.edu Blue Waters Symposium NCSA, Urbana-Champaign, May 2014 Thanks... NSF: PRAC and

More information

Developing PIC Codes for the Next Generation Supercomputer using GPUs. Viktor K. Decyk UCLA

Developing PIC Codes for the Next Generation Supercomputer using GPUs. Viktor K. Decyk UCLA Developing PIC Codes for the Next Generation Supercomputer using GPUs Viktor K. Decyk UCLA Abstract The current generation of supercomputer (petaflops scale) cannot be scaled up to exaflops (1000 petaflops),

More information

Introduction to Parallel Computing. CPS 5401 Fall 2014 Shirley Moore, Instructor October 13, 2014

Introduction to Parallel Computing. CPS 5401 Fall 2014 Shirley Moore, Instructor October 13, 2014 Introduction to Parallel Computing CPS 5401 Fall 2014 Shirley Moore, Instructor October 13, 2014 1 Definition of Parallel Computing Simultaneous use of multiple compute resources to solve a computational

More information

IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning

IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning September 22 nd 2015 Tommaso Cecchi 2 What is IME? This breakthrough, software defined storage application

More information

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D.

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D. Resources Current and Future Systems Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Most likely talk to be out of date History of Top 500 Issues with building bigger machines Current and near future academic

More information

Tutorial. Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers

Tutorial. Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers Tutorial Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers Dan Stanzione, Lars Koesterke, Bill Barth, Kent Milfeld dan/lars/bbarth/milfeld@tacc.utexas.edu XSEDE 12 July 16, 2012

More information

What does Heterogeneity bring?

What does Heterogeneity bring? What does Heterogeneity bring? Ken Koch Scientific Advisor, CCS-DO, LANL LACSI 2006 Conference October 18, 2006 Some Terminology Homogeneous Of the same or similar nature or kind Uniform in structure or

More information

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 13 th CALL (T ier-0)

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 13 th CALL (T ier-0) TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 13 th CALL (T ier-0) Contributing sites and the corresponding computer systems for this call are: BSC, Spain IBM System x idataplex CINECA, Italy Lenovo System

More information

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0)

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0) TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx

More information

BlueGene/L. Computer Science, University of Warwick. Source: IBM

BlueGene/L. Computer Science, University of Warwick. Source: IBM BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours

More information

White Paper. Technical Advances in the SGI. UV Architecture

White Paper. Technical Advances in the SGI. UV Architecture White Paper Technical Advances in the SGI UV Architecture TABLE OF CONTENTS 1. Introduction 1 2. The SGI UV Architecture 2 2.1. SGI UV Compute Blade 3 2.1.1. UV_Hub ASIC Functionality 4 2.1.1.1. Global

More information

IBM Spectrum Scale IO performance

IBM Spectrum Scale IO performance IBM Spectrum Scale 5.0.0 IO performance Silverton Consulting, Inc. StorInt Briefing 2 Introduction High-performance computing (HPC) and scientific computing are in a constant state of transition. Artificial

More information

Network Bandwidth & Minimum Efficient Problem Size

Network Bandwidth & Minimum Efficient Problem Size Network Bandwidth & Minimum Efficient Problem Size Paul R. Woodward Laboratory for Computational Science & Engineering (LCSE), University of Minnesota April 21, 2004 Build 3 virtual computers with Intel

More information

Intel Performance Libraries

Intel Performance Libraries Intel Performance Libraries Powerful Mathematical Library Intel Math Kernel Library (Intel MKL) Energy Science & Research Engineering Design Financial Analytics Signal Processing Digital Content Creation

More information

INTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER. Adrian

INTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER. Adrian INTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER Adrian Jackson adrianj@epcc.ed.ac.uk @adrianjhpc Processors The power used by a CPU core is proportional to Clock Frequency x Voltage 2 In the past, computers

More information

Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA

Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Pak Lui, Gilad Shainer, Brian Klaff Mellanox Technologies Abstract From concept to

More information

Path to Exascale? Intel in Research and HPC 2012

Path to Exascale? Intel in Research and HPC 2012 Path to Exascale? Intel in Research and HPC 2012 Intel s Investment in Manufacturing New Capacity for 14nm and Beyond D1X Oregon Development Fab Fab 42 Arizona High Volume Fab 22nm Fab Upgrades D1D Oregon

More information

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Office of Science Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Buddy Bland Project Director Oak Ridge Leadership Computing Facility November 13, 2012 ORNL s Titan Hybrid

More information

I/O Analysis and Optimization for an AMR Cosmology Application

I/O Analysis and Optimization for an AMR Cosmology Application I/O Analysis and Optimization for an AMR Cosmology Application Jianwei Li Wei-keng Liao Alok Choudhary Valerie Taylor ECE Department, Northwestern University {jianwei, wkliao, choudhar, taylor}@ece.northwestern.edu

More information

A TIMING AND SCALABILITY ANALYSIS OF THE PARALLEL PERFORMANCE OF CMAQ v4.5 ON A BEOWULF LINUX CLUSTER

A TIMING AND SCALABILITY ANALYSIS OF THE PARALLEL PERFORMANCE OF CMAQ v4.5 ON A BEOWULF LINUX CLUSTER A TIMING AND SCALABILITY ANALYSIS OF THE PARALLEL PERFORMANCE OF CMAQ v4.5 ON A BEOWULF LINUX CLUSTER Shaheen R. Tonse* Lawrence Berkeley National Lab., Berkeley, CA, USA 1. INTRODUCTION The goal of this

More information

arxiv: v1 [hep-lat] 1 Dec 2017

arxiv: v1 [hep-lat] 1 Dec 2017 arxiv:1712.00143v1 [hep-lat] 1 Dec 2017 MILC Code Performance on High End CPU and GPU Supercomputer Clusters Carleton DeTar 1, Steven Gottlieb 2,, Ruizi Li 2,, and Doug Toussaint 3 1 Department of Physics

More information

Bring your application to a new era:

Bring your application to a new era: Bring your application to a new era: learning by example how to parallelize and optimize for Intel Xeon processor and Intel Xeon Phi TM coprocessor Manel Fernández, Roger Philp, Richard Paul Bayncore Ltd.

More information

Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2

Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2 Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2 This release of the Intel C++ Compiler 16.0 product is a Pre-Release, and as such is 64 architecture processor supporting

More information

Trends in HPC (hardware complexity and software challenges)

Trends in HPC (hardware complexity and software challenges) Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18

More information

Thread and Data parallelism in CPUs - will GPUs become obsolete?

Thread and Data parallelism in CPUs - will GPUs become obsolete? Thread and Data parallelism in CPUs - will GPUs become obsolete? USP, Sao Paulo 25/03/11 Carsten Trinitis Carsten.Trinitis@tum.de Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR) Institut für

More information

Accelerating Implicit LS-DYNA with GPU

Accelerating Implicit LS-DYNA with GPU Accelerating Implicit LS-DYNA with GPU Yih-Yih Lin Hewlett-Packard Company Abstract A major hindrance to the widespread use of Implicit LS-DYNA is its high compute cost. This paper will show modern GPU,

More information

Productive Performance on the Cray XK System Using OpenACC Compilers and Tools

Productive Performance on the Cray XK System Using OpenACC Compilers and Tools Productive Performance on the Cray XK System Using OpenACC Compilers and Tools Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc. 1 The New Generation of Supercomputers Hybrid

More information

ICON for HD(CP) 2. High Definition Clouds and Precipitation for Advancing Climate Prediction

ICON for HD(CP) 2. High Definition Clouds and Precipitation for Advancing Climate Prediction ICON for HD(CP) 2 High Definition Clouds and Precipitation for Advancing Climate Prediction High Definition Clouds and Precipitation for Advancing Climate Prediction ICON 2 years ago Parameterize shallow

More information

High Performance Computing Course Notes HPC Fundamentals

High Performance Computing Course Notes HPC Fundamentals High Performance Computing Course Notes 2008-2009 2009 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs

More information

Topology and affinity aware hierarchical and distributed load-balancing in Charm++

Topology and affinity aware hierarchical and distributed load-balancing in Charm++ Topology and affinity aware hierarchical and distributed load-balancing in Charm++ Emmanuel Jeannot, Guillaume Mercier, François Tessier Inria - IPB - LaBRI - University of Bordeaux - Argonne National

More information

Chapter 3 Parallel Software

Chapter 3 Parallel Software Chapter 3 Parallel Software Part I. Preliminaries Chapter 1. What Is Parallel Computing? Chapter 2. Parallel Hardware Chapter 3. Parallel Software Chapter 4. Parallel Applications Chapter 5. Supercomputers

More information

OP2 FOR MANY-CORE ARCHITECTURES

OP2 FOR MANY-CORE ARCHITECTURES OP2 FOR MANY-CORE ARCHITECTURES G.R. Mudalige, M.B. Giles, Oxford e-research Centre, University of Oxford gihan.mudalige@oerc.ox.ac.uk 27 th Jan 2012 1 AGENDA OP2 Current Progress Future work for OP2 EPSRC

More information

6.1 Multiprocessor Computing Environment

6.1 Multiprocessor Computing Environment 6 Parallel Computing 6.1 Multiprocessor Computing Environment The high-performance computing environment used in this book for optimization of very large building structures is the Origin 2000 multiprocessor,

More information

Parallel Applications on Distributed Memory Systems. Le Yan HPC User LSU

Parallel Applications on Distributed Memory Systems. Le Yan HPC User LSU Parallel Applications on Distributed Memory Systems Le Yan HPC User Services @ LSU Outline Distributed memory systems Message Passing Interface (MPI) Parallel applications 6/3/2015 LONI Parallel Programming

More information

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Alan Humphrey, Qingyu Meng, Martin Berzins Scientific Computing and Imaging Institute & University of Utah I. Uintah Overview

More information

Overview. Idea: Reduce CPU clock frequency This idea is well suited specifically for visualization

Overview. Idea: Reduce CPU clock frequency This idea is well suited specifically for visualization Exploring Tradeoffs Between Power and Performance for a Scientific Visualization Algorithm Stephanie Labasan & Matt Larsen (University of Oregon), Hank Childs (Lawrence Berkeley National Laboratory) 26

More information

From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation Erik Schnetter, Perimeter Institute with M. Blazewicz, I. Hinder, D. Koppelman, S. Brandt, M. Ciznicki, M.

More information

It s not my fault! Finding errors in parallel codes 找並行程序的錯誤

It s not my fault! Finding errors in parallel codes 找並行程序的錯誤 It s not my fault! Finding errors in parallel codes 找並行程序的錯誤 David Abramson Minh Dinh (UQ) Chao Jin (UQ) Research Computing Centre, University of Queensland, Brisbane Australia Luiz DeRose (Cray) Bob Moench

More information