A Coarray Fortran Implementation to Support Data-Intensive Application Development
1 A Coarray Fortran Implementation to Support Data-Intensive Application Development
Deepak Eachempati (1), Alan Richardson (2), Terrence Liao (3), Henri Calandra (3), Barbara Chapman (1)
Data-Intensive Scalable Computing Systems 2012 (DISCS '12) Workshop, November 16
(1) Department of Computer Science, University of Houston
(2) Department of Earth, Atmospheric, and Planetary Sciences, MIT
(3) Total E&P
2 Oil and Gas Industry: Compute Needs
Industry is looking for faster and more cost-effective ways to process massive amounts of data:
- more powerful hardware
- more productive programming models
- innovative software techniques
3 Outline
- Fortran 2008 parallel processing additions (CAF)
- CAF implementation in the OpenUH Fortran compiler
- Application port to CAF and results
- Further extensions for parallel I/O
- Closing remarks
5 Coarray Model in Fortran 2008
- Derives from Co-Array Fortran (CAF)
- SPMD execution model, PGAS memory model
- execution entities called images
- coarrays: globally accessible, symmetric data objects
- additional intrinsic subroutines/functions for querying process and data information
- additional statements in the language for synchronization
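The model above can be illustrated with a minimal, self-contained program (standard Fortran 2008, written for this transcript rather than taken from the talk):

```fortran
! Minimal coarray sketch: each image writes its own copy of a scalar
! coarray, then image 1 reads every other image's copy.
program caf_hello
  implicit none
  integer :: total[*]     ! scalar coarray: one copy per image
  integer :: i

  total = this_image()    ! each image writes its local copy
  sync all                ! barrier: make all writes visible

  if (this_image() == 1) then
     ! cosubscripts in [] select which image's copy to read
     do i = 1, num_images()
        print *, 'image', i, 'holds', total[i]
     end do
  end if
end program caf_hello
```

Any coarray-capable compiler (e.g. OpenUH, Cray, Intel, or gfortran with OpenCoarrays) should accept this; run with several images to see one line per image.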
6 Working with Distributed Data using Coarrays
real :: B[M,*]
- B references the local B
- B[3,4] references the local B (on the image whose cosubscripts are [3,4])
- B[3,3] references B on the left neighbor
7 Working with Distributed Data using Coarrays
real :: B(10,10)[M,*]
- B(2:4,2:4) references a local subarray of B
- B(2:4,2:4)[3,4] references the local subarray of B (on the image whose cosubscripts are [3,4])
- B(2:4,2:4)[3,3] references the subarray of B on the left neighbor
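A sketch of how such sub-array accesses might appear in a complete program (the codimension extent M, the neighbor logic, and all variable names are illustrative assumptions, not from the slides):

```fortran
! Sketch: one-sided "get" of a sub-array from the left neighbor,
! guarded so edge images do nothing. Run with a multiple of M images.
program caf_subarray
  implicit none
  integer, parameter :: M = 2        ! first codimension extent (assumed)
  real    :: B(10,10)[M,*]
  integer :: me(2), left(2)

  B = real(this_image())
  sync all

  me = this_image(B)                 ! my cosubscripts for coarray B
  if (me(2) > 1) then
     left = [me(1), me(2)-1]         ! left neighbor: second cosubscript - 1
     ! one-sided copy of the neighbor's sub-array into my own
     B(2:4,2:4) = B(2:4,2:4)[left(1), left(2)]
  end if
  sync all
end program caf_subarray
```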
8 2D Halo Exchange with MPI
real :: a(0:r+1, 0:c+1)
call mpi_isend( a(1,1:c),   c, mpi_real, top(myp),    TAG, ...)
call mpi_irecv( a(r+1,1:c), c, mpi_real, bottom(myp), TAG, ...)
call mpi_isend( a(r,1:c),   c, mpi_real, bottom(myp), TAG, ...)
call mpi_irecv( a(0,1:c),   c, mpi_real, top(myp),    TAG, ...)
call mpi_isend( a(1:r,c),   r, mpi_real, right(myp),  TAG, ...)
call mpi_irecv( a(1:r,0),   r, mpi_real, left(myp),   TAG, ...)
call mpi_isend( a(1:r,1),   r, mpi_real, left(myp),   TAG, ...)
call mpi_irecv( a(1:r,c+1), r, mpi_real, right(myp),  TAG, ...)
call mpi_waitall( 8, ...)
9 2D Halo Exchange Example with CAF
real :: a(0:r+1, 0:c+1)[pR,*]
a(r+1,1:c)[top(1),top(2)]     = a(1,1:c)
a(0,1:c)[bottom(1),bottom(2)] = a(r,1:c)
a(1:r,0)[right(1),right(2)]   = a(1:r,c)
a(1:r,c+1)[left(1),left(2)]   = a(1:r,1)
sync all
11 Implementation of CAF
OpenUH compiler: an industry-quality, optimizing compiler based on Open64
- features: dependence and data-flow analysis, interprocedural analysis, OpenMP
- backend supports multiple targets (x86_64, IA64, IA32, MIPS, PTX)
Compilation pipeline: CAF source code -> Fortran front-end with coarray support -> coarray translation phase -> loop optimizer -> global optimizer -> code generation -> executable, linked against the OpenUH CAF runtime library
12 Runtime Support for CAF
Runtime interface (libcaf) provides:
- collectives support (e.g. reductions)
- PGAS memory allocation
- one-sided communication
- synchronization
- atomics
Portable communication substrate: GASNet or ARMCI
13 Comparison with other Implementations

Compiler      | Commercial/Free                     | Fortran 2008 coarray support?
OpenUH        | Free                                | Yes
G95           | Partially free; no longer supported | Missing locks support
gfortran      | Free                                | In progress
Rice CAF 2.0  | Free                                | Partially, but adds different features
Cray Fortran  | Commercial                          | Yes
Intel Fortran | Commercial                          | Yes
15 Seismic Subsurface Imaging: Reverse Time Migration
- A source wave is emitted per shot
- Reflected waves are captured by an array of sensors
- RTM (in the time domain) uses a finite difference method to numerically solve the wave equation and reconstruct the subsurface image (in parallel, with domain decomposition)
16 RTM Implementations
Isotropic:
- simplest model; assumes reflected waves propagate at the same speed in every direction from a point
- only swaps faces (6 swaps in the halo exchange)
Tilted Transverse Isotropy (TTI):
- assumes waves may propagate at different speeds
- swaps faces and edges (18 swaps in the halo exchange)
17 Typical Data Usage
- 82 thousand shots; a data-parallel problem, where each shot can be processed independently in parallel
- each shot may handle ~80 MB of data, so the total data to analyze is ~6 TB
Handling I/O:
- C I/O reads in the velocity and coefficient models
- shot headers are read by the master and distributed
- each processor writes to a distinct file, and the files are merged in a post-processing step
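The "each processor writes to a distinct file" step might look like the following sketch (the file-naming scheme and variable names are illustrative, not the application's actual code):

```fortran
! Sketch of the one-file-per-image output pattern: each image builds a
! unique file name from its image index and writes its partial result.
program per_image_output
  implicit none
  character(len=32) :: fname
  real :: partial_result(1000)

  partial_result = 0.0   ! ... computed RTM partial result would go here ...

  ! e.g. image 3 writes rtm_out_3.dat; files are merged in post-processing
  write (fname, '(a,i0,a)') 'rtm_out_', this_image(), '.dat'
  open (unit=20, file=fname, form='unformatted', action='write')
  write (20) partial_result
  close (20)
end program per_image_output
```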
18 Results for CAF RTM Port: Forward Shot
Total domain size: 1024 x 768 x 512
- Isotropic case: up to 32% faster than the corresponding MPI implementation
- TTI case: performance competitive with MPI
19 Results for CAF RTM Port: Backward Shot
Total domain size: 1024 x 768 x 512
- Isotropic case: performance hit at 256 processes
- TTI case: lagging a bit behind MPI
21 Extending Fortran for Parallel I/O
- We are currently designing a prototype implementation of a parallel I/O language extension
- Fortran I/O has not yet been extended to facilitate cooperative I/O to shared files
- the original Co-Array Fortran specified a simple extension to Fortran I/O
- parallel I/O may be added in a future version of the standard
22 Fortran I/O
Fortran provides interfaces for formatted and unformatted I/O:
open( 10, file='fn', action='write', access='direct', recl=k )
write( 10, rec=3 ) A
(diagram: file fn is connected to unit 10; the write places A into record 3 of the file's sequence of records)
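A self-contained sketch of standard direct-access I/O as described above (the array size and unit number are illustrative):

```fortran
! Direct-access (record-oriented) I/O: write an array into record 3,
! then read it back. INQUIRE(IOLENGTH=) computes the needed record length.
program direct_access_demo
  implicit none
  integer, parameter :: n = 100
  real    :: A(n), B(n)
  integer :: k

  inquire (iolength=k) A            ! record length that can hold A
  A = 42.0

  open (10, file='fn', action='readwrite', access='direct', recl=k)
  write (10, rec=3) A               ! place A in record 3
  read  (10, rec=3) B               ! read it back into B
  close (10, status='delete')       ! clean up the scratch file
end program direct_access_demo
```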
23 Current limitations of I/O
Issues:
1. no defined, legal way for multiple images to access the same file
2. a file is a one-dimensional sequence of records
3. records are read/written one at a time
4. no mechanism for collective accesses to a shared file amongst multiple images
24 Proposed Extension for Parallel I/O
Allow a file to be share-opened, e.g.
OPEN( 10, file='fn', TEAM='yes', ... )
- all images form a team with shared access to the same file
- implicit synchronization
- recommended only for direct access mode
- the FLUSH statement is used to ensure changes made by one image are visible to other images in the team
- the CLOSE statement has implicit image synchronization
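Assuming the proposed extension, a shared write might be sketched as follows; note that TEAM= is part of the proposal, not Fortran 2008, so this would compile only with the prototype (the record layout and names are illustrative):

```fortran
! Hedged sketch of the *proposed* shared-file extension: all images
! open one file, each writes its own record, FLUSH publishes the writes,
! and CLOSE carries the proposed implicit image synchronization.
program shared_open_sketch
  implicit none
  real    :: buf(100)
  integer :: k

  inquire (iolength=k) buf
  buf = real(this_image())

  ! collective open: all images form a team sharing the file (proposed)
  open (10, file='fn', team='yes', action='write', &
        access='direct', recl=k)

  write (10, rec=this_image()) buf  ! each image owns one record
  flush (10)                        ! make this image's changes visible
  close (10)                        ! implicit image synchronization (proposed)
end program shared_open_sketch
```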
25 Further extensions we're exploring
- multi-dimensional view of records
- read/write multiple records at a time
- collective read/write operations on shared files
open( 10, file='fn', action='write', access='direct', ndim=2, dims=(/m/), team='yes', recl=k )
(diagram: file fn connected to unit 10, viewed as an M x 3 array of records, indexed (1,1) through (M,3))
26 Further extensions we're exploring (continued)
write( 10, rec_lb=(/ 2,2 /), rec_ub=(/ 4,3 /) ) A(1:4, 1:2)
(diagram: A(1:4,1:2) is written to the block of records with lower bound (2,2) and upper bound (4,3) in the M x 3 record array of file fn on unit 10)
27 Further extensions we're exploring (continued)
type(t) :: A(2,2)[3,*]
my_rec_lbs = get_rec_lbs( this_image() )
my_rec_ubs = get_rec_ubs( this_image() )
write_team( 10, rec_lb=my_rec_lbs, rec_ub=my_rec_ubs ) A(:,:)
(diagram: the file on unit 10 is viewed as a 6 x 4 array of records; each image's A(1:2,1:2) is written collectively into its own 2 x 2 block of records)
28 Leverage Global Arrays as memory buffers for I/O
An implementation is in progress which utilizes Global Arrays (GA) as I/O buffers in memory.
(diagram: compute nodes issue I/O requests to I/O nodes, which apply asynchronous disk updates)
30 In Summary
- The Fortran coarray model may be used for processing large data sets
- We developed an implementation that's freely available and used it to develop an RTM application
- Fortran's I/O model doesn't support parallel I/O for large-scale, multi-dimensional array data sets, and we are working on addressing this
31 Thanks
Understanding Communication and MPI on Cray XC40 Features of the Cray MPI library Cray MPI uses MPICH3 distribution from Argonne Provides a good, robust and feature rich MPI Well tested code for high level
More informationLLVM-based Communication Optimizations for PGAS Programs
LLVM-based Communication Optimizations for PGAS Programs nd Workshop on the LLVM Compiler Infrastructure in HPC @ SC15 Akihiro Hayashi (Rice University) Jisheng Zhao (Rice University) Michael Ferguson
More informationReverse time migration with random boundaries
Reverse time migration with random boundaries Robert G. Clapp ABSTRACT Reading wavefield checkpoints from disk is quickly becoming the bottleneck in Reverse Time Migration. We eliminate the need to write
More informationComments on wavefield propagation using Reverse-time and Downward continuation
Comments on wavefield propagation using Reverse-time and Downward continuation John C. Bancroft ABSTRACT Each iteration a of Full-waveform inversion requires the migration of the difference between the
More informationA brief introduction to OpenMP
A brief introduction to OpenMP Alejandro Duran Barcelona Supercomputing Center Outline 1 Introduction 2 Writing OpenMP programs 3 Data-sharing attributes 4 Synchronization 5 Worksharings 6 Task parallelism
More informationMemory allocation and sample API calls. Preliminary Gemini performance measurements
DMAPP in context Basic features of the API Memory allocation and sample API calls Preliminary Gemini performance measurements 2 The Distributed Memory Application (DMAPP) API Supports features of the Gemini
More informationHigh Performance Fortran. James Curry
High Performance Fortran James Curry Wikipedia! New Fortran statements, such as FORALL, and the ability to create PURE (side effect free) procedures Compiler directives for recommended distributions of
More informationMPI Casestudy: Parallel Image Processing
MPI Casestudy: Parallel Image Processing David Henty 1 Introduction The aim of this exercise is to write a complete MPI parallel program that does a very basic form of image processing. We will start by
More informationA Heat-Transfer Example with MPI Rolf Rabenseifner
A Heat-Transfer Example with MPI (short version) Rolf Rabenseifner rabenseifner@hlrs.de University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de A Heat-Transfer Example with
More informationExploring XcalableMP. Shun Liang. August 24, 2012
Exploring XcalableMP Shun Liang August 24, 2012 MSc in High Performance Computing The University of Edinburgh Year of Presentation: 2012 Abstract This project has implemented synthetic and application
More informationExploration seismology and the return of the supercomputer
Exploration seismology and the return of the supercomputer Exploring scalability for speed in development and delivery Sverre Brandsberg-Dahl Chief Geophysicist, Imaging and Engineering Marine seismic
More informationPROGRAMMING MODEL EXAMPLES
( Cray Inc 2015) PROGRAMMING MODEL EXAMPLES DEMONSTRATION EXAMPLES OF VARIOUS PROGRAMMING MODELS OVERVIEW Building an application to use multiple processors (cores, cpus, nodes) can be done in various
More information