A Coarray Fortran Implementation to Support Data-Intensive Application Development

1 A Coarray Fortran Implementation to Support Data-Intensive Application Development
Deepak Eachempati (1), Alan Richardson (2), Terrence Liao (3), Henri Calandra (3), Barbara Chapman (1)
Data-Intensive Scalable Computing Systems 2012 (DISCS 12) Workshop, November 16, 2012
(1) Department of Computer Science, University of Houston
(2) Department of Earth, Atmospheric, and Planetary Sciences, MIT
(3) Total E&P

2 Oil and Gas Industry: Compute Needs
Industry is looking for faster and more cost-effective ways to process massive amounts of data:
- more powerful hardware
- more productive programming models
- innovative software techniques

3 Outline
- Fortran 2008 parallel processing additions (CAF)
- CAF Implementation in OpenUH Fortran compiler
- Application port to CAF and Results
- Further extensions for Parallel I/O
- Closing Remarks

5 Coarray Model in Fortran 2008
- Derives from Co-Array Fortran (CAF)
- SPMD execution model, PGAS memory model
- execution entities called images
- coarrays: globally accessible, symmetric data objects
- additional intrinsic subroutines/functions for querying process and data information
- additional statements in language for synchronization

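6 Working with Distributed Data using Coarrays
    real :: B[M, *]
(Figure: images arranged in a grid with M rows and an unbounded (*) number of columns.)
- B references local B
- B[3,4] references local B (for the image with cosubscripts [3,4])
- B[3,3] references B in left neighbor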

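7 Working with Distributed Data using Coarrays
    real :: B(10,10)[M, *]
(Figure: images arranged in a grid with M rows and an unbounded (*) number of columns.)
- B(2:4,2:4) references local subarray of B
- B(2:4,2:4)[3,4] references local subarray of B (for the image with cosubscripts [3,4])
- B(2:4,2:4)[3,3] references subarray of B in left neighbor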
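The mapping from cosubscripts such as [3,4] to an image number can be sketched as follows. This is plain Python rather than coarray Fortran, and it assumes lower cobounds of 1 and the column-major ordering that Fortran 2008 uses for image indices. The helper name `image_index` mirrors the Fortran intrinsic but is written here only for illustration, and the value `M = 4` is an assumption.

```python
def image_index(cosubs, coshape):
    """Return the 1-based image index for 1-based cosubscripts.

    coshape holds the extents of every codimension except the last,
    which is '*' in the declaration and therefore has no fixed extent.
    """
    if len(cosubs) != len(coshape) + 1:
        raise ValueError("need one more cosubscript than fixed coextents")
    index, stride = 1, 1
    for sub, extent in zip(cosubs, list(coshape) + [None]):
        if sub < 1 or (extent is not None and sub > extent):
            raise ValueError("cosubscript out of range")
        index += (sub - 1) * stride      # column-major: first cosubscript varies fastest
        if extent is not None:
            stride *= extent
    return index


M = 4                                    # assumed first coextent in B[M, *]
me = image_index((3, 4), (M,))           # image that owns B[3,4]
left = image_index((3, 3), (M,))         # its left neighbor, B[3,3]
print(me, left)                          # 15 11
```

With this ordering, moving one column to the left always lowers the image index by M.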

8 2D Halo Exchange with MPI
    real :: a(0:r+1, 0:C+1)
    call mpi_isend( a(1,1:c), C, mpi_real, &
                    top(myp), TAG,...)
    call mpi_irecv( a(r+1,1:c), C, mpi_real, &
                    bottom(myp), TAG,...)
    call mpi_isend( a(r,1:c), C, mpi_real, &
                    bottom(myp), TAG,...)
    call mpi_irecv( a(0,1:c), C, mpi_real, &
                    top(myp), TAG,...)
    call mpi_isend( a(1:r,c), R, mpi_real, &
                    right(myp), TAG,...)
    call mpi_irecv( a(1:r,0), R, mpi_real, &
                    left(myp), TAG,...)
    call mpi_isend( a(1:r,1), R, mpi_real, &
                    left(myp), TAG,...)
    call mpi_irecv( a(1:r,c+1), R, mpi_real, &
                    right(myp), TAG,...)
    call mpi_waitall( 8,...)

9 2D Halo Exchange Example with CAF
    real :: a(0:r+1, 0:C+1)[pR,*]
    a(r+1,1:c)[top(1),top(2)] = a(1,1:c)
    a(0,1:c)[bottom(1),bottom(2)] = a(r,1:c)
    a(1:r,0)[right(1),right(2)] = a(1:r,c)
    a(1:r,c+1)[left(1),left(2)] = a(1:r,1)
    sync all

11 Implementation of CAF
OpenUH compiler
- an industry-quality, optimizing compiler based on Open64
- features: dependence and data-flow analysis, interprocedural analysis, OpenMP
- backend supports multiple targets (x86_64, IA64, IA32, MIPS, PTX)
(Figure: OpenUH compiler pipeline. CAF Source Code, Fortran Front-End with coarray support, Coarray Translation Phase, Loop Optimizer, Global Optimizer, Code Gen, then the executable, together with the OpenUH CAF Runtime Library.)

12 Runtime Support for CAF
(Figure: layered runtime stack.)
- Runtime Interface (libcaf)
- Collectives Support (e.g. reductions), PGAS Memory Allocation, 1-sided Communication, Synchronization, Atomics
- Portable Communication Substrate: GASNet or ARMCI

13 Comparison with other Implementations
Compiler        Commercial/Free                       Fortran 2008 Coarray Support?
OpenUH          Free                                  Yes
G95             Partially Free, No longer supported   Missing Locks Support
Gfortran        Free                                  In progress
Rice CAF 2.0    Free                                  Partially, but adds different features
Cray Fortran    Commercial                            Yes
Intel Fortran   Commercial                            Yes

15 Seismic Subsurface Imaging: Reverse Time Migration
- A source wave is emitted per shot
- Reflected waves captured by array of sensors
- RTM (in time domain) uses finite difference method to numerically solve wave equation and reconstruct subsurface image (in parallel, with domain decomposition)

16 RTM Implementations
Isotropic
- simplest model
- assumes reflected waves propagate at same speed in every direction from a point
- only swaps faces (6 swaps in halo exchange)
Tilted Transverse Isotropy (TTI)
- assumes waves may propagate at different speeds depending on direction
- swaps faces and edges (18 swaps in halo exchange)
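The swap counts of 6 and 18 are consistent with counting the neighbors of a subdomain in a 3-D decomposition. The Python sketch below makes that assumption explicit: each subdomain is a box, a face neighbor differs in exactly one coordinate, and an edge neighbor differs in exactly two. The function name `halo_neighbors` is hypothetical and is not taken from the RTM code.

```python
from itertools import product


def halo_neighbors(include_edges):
    """List 3-D neighbor offsets that share a face (and optionally an edge)."""
    offsets = []
    for off in product((-1, 0, 1), repeat=3):
        nonzero = sum(1 for d in off if d != 0)
        # one nonzero component: face neighbor; two: edge neighbor
        if nonzero == 1 or (include_edges and nonzero == 2):
            offsets.append(off)
    return offsets


isotropic = halo_neighbors(include_edges=False)   # faces only
tti = halo_neighbors(include_edges=True)          # faces and edges
print(len(isotropic), len(tti))                   # 6 18
```

The 12 extra swaps in the TTI case are the 12 edges of the box; corner neighbors (three nonzero components) are not exchanged in either case under this reading.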

17 Typical Data Usage
- 82 thousand shots
- data parallel problem, where each shot can be processed independently in parallel
- each shot may handle ~80 MB of data, so total data to analyze is ~6 TB
Handling I/O
- C I/O reads in velocity and coefficient models
- Shot headers read by master and distributed
- Each processor writes to a distinct file, and the files are merged in a post-processing step
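The ~6 TB total follows directly from the shot count and the per-shot size. A quick check in Python, assuming 82,000 shots of exactly 80 MB each with decimal megabytes (the slide gives both figures only approximately):

```python
shots = 82_000
bytes_per_shot = 80 * 10**6          # ~80 MB, assuming decimal megabytes
total_bytes = shots * bytes_per_shot

total_tb = total_bytes / 10**12      # decimal terabytes
total_tib = total_bytes / 2**40      # binary tebibytes
print(f"{total_tb:.2f} TB, {total_tib:.2f} TiB")   # 6.56 TB, 5.97 TiB
```

Either unit convention rounds to the "~6 TB" quoted above.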

18 Results for CAF RTM port
Total Domain Size: 1024 x 768 x 512
Forward Shot
- Isotropic case: up to 32% faster compared to corresponding MPI implementation
- TTI case: competitive performance with MPI

19 Results for CAF RTM port
Total Domain Size: 1024 x 768 x 512
Backward Shot
- Isotropic case: performance hit at 256 procs
- TTI case: lagging a bit behind MPI

21 Extending Fortran for Parallel I/O
We are currently designing a prototype implementation for a parallel I/O language extension
- Fortran I/O has not yet been extended to facilitate cooperative I/O to shared files
- original Co-Array Fortran specified a simple extension to Fortran I/O
- parallel I/O may be added in a future version of the standard

22 Fortran I/O
Fortran provides interfaces for formatted and unformatted I/O
    open( 10, file='fn', action='write', &
          access='direct', recl=k )
    write (10, rec=3) A
(Figure: file fn connected to unit 10, shown as records 1 to 4; the write places A in record 3.)
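Direct access works because every record has the same length, so record n starts at a fixed offset. The Python sketch below emulates that layout on an in-memory file. It assumes the record length is measured in bytes (in Fortran the unit of RECL is processor-dependent) and that unwritten records read back as zeros; `write_record` and `read_record` are hypothetical helpers, not part of any Fortran runtime.

```python
import io
import struct


def write_record(f, rec, payload, recl):
    """Write payload into 1-based record rec of a fixed-record-length file."""
    if len(payload) > recl:
        raise ValueError("payload longer than record length")
    f.seek((rec - 1) * recl)               # record n starts at byte (n-1)*recl
    f.write(payload.ljust(recl, b"\0"))    # pad short payloads to a full record


def read_record(f, rec, recl):
    """Read back the raw bytes of 1-based record rec."""
    f.seek((rec - 1) * recl)
    return f.read(recl)


recl = 16                                      # assumed record length in bytes
A = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)     # four 4-byte reals
f = io.BytesIO()
write_record(f, 3, A, recl)                    # analogous to: write (10, rec=3) A
print(struct.unpack("<4f", read_record(f, 3, recl)))
```

Writing record 3 first leaves records 1 and 2 present but empty, which is why the file can be filled in any order.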

23 Current limitations of I/O
Issues:
1. no defined, legal way for multiple images to access the same file
2. a file is a 1-dimensional sequence of records
3. records are read/written one at a time
4. no mechanism for collective accesses to a shared file amongst multiple images

24 Proposed Extension for Parallel I/O
Allow a file to be share-opened, e.g.
    OPEN( 10, file='fn', TEAM='yes', ... )
- all images form a team with shared access to the same file
- implicit synchronization
- recommended only for direct access mode
- FLUSH statement used to ensure changes by one image are visible to other images in team
- CLOSE statement has implicit image synchronization

25 Further extensions we're exploring
- Multi-dimensional view of records
- Read/write multiple records at a time
- Collective read/write operations on shared files
    open( 10, file='fn', action='write', &
          access='direct', ndim=2, &
          dims=(/m/), team='yes', recl=k )
(Figure: file fn connected to unit 10, viewed as a 2-D grid of records indexed (1,1) to (M,3).)

26 Further extensions we're exploring
- Multi-dimensional view of records
- Read/write multiple records at a time
- Collective read/write operations on shared files
    write (10, rec_lb=(/ 2,2 /), rec_ub=(/ 4,3 /) ) &
          A(1:4, 1:2)
(Figure: the write places A(1:4,1:2) in file fn connected to unit 10, viewed as a 2-D grid of records indexed (1,1) to (M,3).)

27 Further extensions we're exploring
- Multi-dimensional view of records
- Read/write multiple records at a time
- Collective read/write operations on shared files
    type(t) :: A(2,2)[3,*]
    my_rec_lbs = get_rec_lbs( this_image() )
    my_rec_ubs = get_rec_ubs( this_image() )
    write_team( 10, rec_lb=my_rec_lbs, &
                rec_ub=my_rec_ubs ) &
          A(:,:)
(Figure: file fn connected to unit 10, viewed as a 6 x 4 grid of records indexed (1,1) to (6,4); write_team writes the blocks A(1:2,1:2)[1,1], A(1:2,1:2)[1,2], A(1:2,1:2)[2,1], A(1:2,1:2)[2,2], A(1:2,1:2)[3,1] and A(1:2,1:2)[3,2] to the file.)
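The slide does not show how get_rec_lbs and get_rec_ubs are defined. One plausible reading, sketched here in Python, is a block distribution: with A(2,2)[3,*] on six images, the image with cosubscripts [p,q] owns the 2 x 2 block of records starting at row 2(p-1)+1 and column 2(q-1)+1. The function bodies, the default block size and the helper `cosubscripts` are all assumptions made for illustration.

```python
def cosubscripts(image, rows):
    """1-based cosubscripts [p, q] of an image in a [rows, *] layout."""
    return (image - 1) % rows + 1, (image - 1) // rows + 1


def get_rec_lbs(image, block=(2, 2), rows=3):
    """Lower record bounds of the block owned by this image (assumed layout)."""
    p, q = cosubscripts(image, rows)
    return ((p - 1) * block[0] + 1, (q - 1) * block[1] + 1)


def get_rec_ubs(image, block=(2, 2), rows=3):
    """Upper record bounds of the block owned by this image (assumed layout)."""
    p, q = cosubscripts(image, rows)
    return (p * block[0], q * block[1])


for image in range(1, 7):
    print(image, cosubscripts(image, 3), get_rec_lbs(image), get_rec_ubs(image))
```

Under this layout the six blocks tile the 6 x 4 record grid without overlap, so a collective write needs no coordination beyond each image knowing its own bounds.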

28 Leverage Global Arrays as memory buffers for I/O
Implementation in progress which utilizes global arrays (GA) as I/O buffers in memory
(Figure: compute nodes send I/O requests to I/O nodes, which perform asynchronous disk updates.)

30 In Summary
- Fortran coarray model may be used for processing large data sets
- Developed implementation that's freely available and used it to develop RTM application
- Fortran's I/O model doesn't support parallel I/O for large-scale, multi-dimensional array data sets, and we are working on addressing this

31 Thanks

A Coarray Fortran Implementation to Support Data-Intensive Application Development

A Coarray Fortran Implementation to Support Data-Intensive Application Development A Coarray Fortran Implementation to Support Data-Intensive Application Development Deepak Eachempati, Alan Richardson, Terrence Liao, Henri Calandra and Barbara Chapman Department of Computer Science,

More information

A Coarray Fortran Implementation to Support Data-Intensive Application Development

A Coarray Fortran Implementation to Support Data-Intensive Application Development A Coarray Fortran Implementation to Support Data-Intensive Application Development Deepak Eachempati, Alan Richardson, Terrence Liao, Henri Calandra and Barbara Chapman Department of Computer Science,

More information

An Open64-based Compiler and Runtime Implementation for Coarray Fortran

An Open64-based Compiler and Runtime Implementation for Coarray Fortran An Open64-based Compiler and Runtime Implementation for Coarray Fortran talk by Deepak Eachempati Presented at: Open64 Developer Forum 2010 8/25/2010 1 Outline Motivation Implementation Overview Evaluation

More information

Bringing a scientific application to the distributed world using PGAS

Bringing a scientific application to the distributed world using PGAS Bringing a scientific application to the distributed world using PGAS Performance, Portability and Usability of Fortran Coarrays Jeffrey Salmond August 15, 2017 Research Software Engineering University

More information

Implementation and Evaluation of Coarray Fortran Translator Based on OMNI XcalableMP. October 29, 2015 Hidetoshi Iwashita, RIKEN AICS

Implementation and Evaluation of Coarray Fortran Translator Based on OMNI XcalableMP. October 29, 2015 Hidetoshi Iwashita, RIKEN AICS Implementation and Evaluation of Coarray Fortran Translator Based on OMNI XcalableMP October 29, 2015 Hidetoshi Iwashita, RIKEN AICS Background XMP Contains Coarray Features XcalableMP (XMP) A PGAS language,

More information

Exploring XMP programming model applied to Seismic Imaging application. Laurence BEAUDE

Exploring XMP programming model applied to Seismic Imaging application. Laurence BEAUDE Exploring XMP programming model applied to Seismic Imaging application Introduction Total at a glance: 96 000 employees in more than 130 countries Upstream operations (oil and gas exploration, development

More information

Fortran Coarrays John Reid, ISO Fortran Convener, JKR Associates and Rutherford Appleton Laboratory

Fortran Coarrays John Reid, ISO Fortran Convener, JKR Associates and Rutherford Appleton Laboratory Fortran Coarrays John Reid, ISO Fortran Convener, JKR Associates and Rutherford Appleton Laboratory This talk will explain the objectives of coarrays, give a quick summary of their history, describe the

More information

Lecture V: Introduction to parallel programming with Fortran coarrays

Lecture V: Introduction to parallel programming with Fortran coarrays Lecture V: Introduction to parallel programming with Fortran coarrays What is parallel computing? Serial computing Single processing unit (core) is used for solving a problem One task processed at a time

More information

IMPLEMENTATION AND EVALUATION OF ADDITIONAL PARALLEL FEATURES IN COARRAY FORTRAN

IMPLEMENTATION AND EVALUATION OF ADDITIONAL PARALLEL FEATURES IN COARRAY FORTRAN IMPLEMENTATION AND EVALUATION OF ADDITIONAL PARALLEL FEATURES IN COARRAY FORTRAN A Thesis Presented to the Faculty of the Department of Computer Science University of Houston In Partial Fulfillment of

More information

Parallel Programming in Fortran with Coarrays

Parallel Programming in Fortran with Coarrays Parallel Programming in Fortran with Coarrays John Reid, ISO Fortran Convener, JKR Associates and Rutherford Appleton Laboratory Fortran 2008 is now in FDIS ballot: only typos permitted at this stage.

More information

Fortran 2008: what s in it for high-performance computing

Fortran 2008: what s in it for high-performance computing Fortran 2008: what s in it for high-performance computing John Reid, ISO Fortran Convener, JKR Associates and Rutherford Appleton Laboratory Fortran 2008 has been completed and is about to be published.

More information

Towards Exascale Computing with Fortran 2015

Towards Exascale Computing with Fortran 2015 Towards Exascale Computing with Fortran 2015 Alessandro Fanfarillo National Center for Atmospheric Research Damian Rouson Sourcery Institute Outline Parallelism in Fortran 2008 SPMD PGAS Exascale challenges

More information

First Experiences with Application Development with Fortran Damian Rouson

First Experiences with Application Development with Fortran Damian Rouson First Experiences with Application Development with Fortran 2018 Damian Rouson Overview Fortran 2018 in a Nutshell ICAR & Coarray ICAR WRF-Hydro Results Conclusions www.yourwebsite.com Overview Fortran

More information

6.1 Multiprocessor Computing Environment

6.1 Multiprocessor Computing Environment 6 Parallel Computing 6.1 Multiprocessor Computing Environment The high-performance computing environment used in this book for optimization of very large building structures is the Origin 2000 multiprocessor,

More information

OPENSHMEM AS AN EFFECTIVE COMMUNICATION LAYER FOR PGAS MODELS

OPENSHMEM AS AN EFFECTIVE COMMUNICATION LAYER FOR PGAS MODELS OPENSHMEM AS AN EFFECTIVE COMMUNICATION LAYER FOR PGAS MODELS A Thesis Presented to the Faculty of the Department of Computer Science University of Houston In Partial Fulfillment of the Requirements for

More information

Co-array Fortran Performance and Potential: an NPB Experimental Study. Department of Computer Science Rice University

Co-array Fortran Performance and Potential: an NPB Experimental Study. Department of Computer Science Rice University Co-array Fortran Performance and Potential: an NPB Experimental Study Cristian Coarfa Jason Lee Eckhardt Yuri Dotsenko John Mellor-Crummey Department of Computer Science Rice University Parallel Programming

More information

An Open-Source Compiler and Runtime Implementation for Coarray Fortran

An Open-Source Compiler and Runtime Implementation for Coarray Fortran An Open-Source Compiler and Runtime Implementation for Coarray Fortran Deepak Eachempati Hyoung Joon Jun Barbara Chapman Computer Science Department University of Houston Houston, TX, 77004, USA {dreachem,

More information

Advanced Features. SC10 Tutorial, November 15 th Parallel Programming with Coarray Fortran

Advanced Features. SC10 Tutorial, November 15 th Parallel Programming with Coarray Fortran Advanced Features SC10 Tutorial, November 15 th 2010 Parallel Programming with Coarray Fortran Advanced Features: Overview Execution segments and Synchronisation Non-global Synchronisation Critical Sections

More information

Migrating A Scientific Application from MPI to Coarrays. John Ashby and John Reid HPCx Consortium Rutherford Appleton Laboratory STFC UK

Migrating A Scientific Application from MPI to Coarrays. John Ashby and John Reid HPCx Consortium Rutherford Appleton Laboratory STFC UK Migrating A Scientific Application from MPI to Coarrays John Ashby and John Reid HPCx Consortium Rutherford Appleton Laboratory STFC UK Why and Why Not? +MPI programming is arcane +New emerging paradigms

More information

Parallel Programming Features in the Fortran Standard. Steve Lionel 12/4/2012

Parallel Programming Features in the Fortran Standard. Steve Lionel 12/4/2012 Parallel Programming Features in the Fortran Standard Steve Lionel 12/4/2012 Agenda Overview of popular parallelism methodologies FORALL a look back DO CONCURRENT Coarrays Fortran 2015 Q+A 12/5/2012 2

More information

High performance Computing and O&G Challenges

High performance Computing and O&G Challenges High performance Computing and O&G Challenges 2 Seismic exploration challenges High Performance Computing and O&G challenges Worldwide Context Seismic,sub-surface imaging Computing Power needs Accelerating

More information

Analyzing the Performance of IWAVE on a Cluster using HPCToolkit

Analyzing the Performance of IWAVE on a Cluster using HPCToolkit Analyzing the Performance of IWAVE on a Cluster using HPCToolkit John Mellor-Crummey and Laksono Adhianto Department of Computer Science Rice University {johnmc,laksono}@rice.edu TRIP Meeting March 30,

More information

Parallel Programming with Coarray Fortran

Parallel Programming with Coarray Fortran Parallel Programming with Coarray Fortran SC10 Tutorial, November 15 th 2010 David Henty, Alan Simpson (EPCC) Harvey Richardson, Bill Long, Nathan Wichmann (Cray) Tutorial Overview The Fortran Programming

More information

Leveraging OpenCoarrays to Support Coarray Fortran on IBM Power8E

Leveraging OpenCoarrays to Support Coarray Fortran on IBM Power8E Executive Summary Leveraging OpenCoarrays to Support Coarray Fortran on IBM Power8E Alessandro Fanfarillo, Damian Rouson Sourcery Inc. www.sourceryinstitue.org We report on the experience of installing

More information

Implementation of an integrated efficient parallel multiblock Flow solver

Implementation of an integrated efficient parallel multiblock Flow solver Implementation of an integrated efficient parallel multiblock Flow solver Thomas Bönisch, Panagiotis Adamidis and Roland Rühle adamidis@hlrs.de Outline Introduction to URANUS Why using Multiblock meshes

More information

Portable, MPI-Interoperable! Coarray Fortran

Portable, MPI-Interoperable! Coarray Fortran Portable, MPI-Interoperable! Coarray Fortran Chaoran Yang, 1 Wesley Bland, 2! John Mellor-Crummey, 1 Pavan Balaji 2 1 Department of Computer Science! Rice University! Houston, TX 2 Mathematics and Computer

More information

SUMMARY INTRODUCTION NEW METHOD

SUMMARY INTRODUCTION NEW METHOD Reverse Time Migration in the presence of known sharp interfaces Alan Richardson and Alison E. Malcolm, Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology SUMMARY

More information

Portable, MPI-Interoperable! Coarray Fortran

Portable, MPI-Interoperable! Coarray Fortran Portable, MPI-Interoperable! Coarray Fortran Chaoran Yang, 1 Wesley Bland, 2! John Mellor-Crummey, 1 Pavan Balaji 2 1 Department of Computer Science! Rice University! Houston, TX 2 Mathematics and Computer

More information

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can

More information

Addressing Heterogeneity in Manycore Applications

Addressing Heterogeneity in Manycore Applications Addressing Heterogeneity in Manycore Applications RTM Simulation Use Case stephane.bihan@caps-entreprise.com Oil&Gas HPC Workshop Rice University, Houston, March 2008 www.caps-entreprise.com Introduction

More information

Appendix D. Fortran quick reference

Appendix D. Fortran quick reference Appendix D Fortran quick reference D.1 Fortran syntax... 315 D.2 Coarrays... 318 D.3 Fortran intrisic functions... D.4 History... 322 323 D.5 Further information... 324 Fortran 1 is the oldest high-level

More information

Scalasca performance properties The metrics tour

Scalasca performance properties The metrics tour Scalasca performance properties The metrics tour Markus Geimer m.geimer@fz-juelich.de Scalasca analysis result Generic metrics Generic metrics Time Total CPU allocation time Execution Overhead Visits Hardware

More information

Co-arrays to be included in the Fortran 2008 Standard

Co-arrays to be included in the Fortran 2008 Standard Co-arrays to be included in the Fortran 2008 Standard John Reid, ISO Fortran Convener The ISO Fortran Committee has decided to include co-arrays in the next revision of the Standard. Aim of this talk:

More information

Ghost Cell Pattern. Fredrik Berg Kjolstad. January 26, 2010

Ghost Cell Pattern. Fredrik Berg Kjolstad. January 26, 2010 Ghost Cell Pattern Fredrik Berg Kjolstad University of Illinois Urbana-Champaign, USA kjolsta1@illinois.edu Marc Snir University of Illinois Urbana-Champaign, USA snir@illinois.edu January 26, 2010 Problem

More information

Lecture 32: Partitioned Global Address Space (PGAS) programming models

Lecture 32: Partitioned Global Address Space (PGAS) programming models COMP 322: Fundamentals of Parallel Programming Lecture 32: Partitioned Global Address Space (PGAS) programming models Zoran Budimlić and Mack Joyner {zoran, mjoyner}@rice.edu http://comp322.rice.edu COMP

More information

Barbara Chapman, Gabriele Jost, Ruud van der Pas

Barbara Chapman, Gabriele Jost, Ruud van der Pas Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology

More information

Experiences Developing the OpenUH Compiler and Runtime Infrastructure

Experiences Developing the OpenUH Compiler and Runtime Infrastructure Experiences Developing the OpenUH Compiler and Runtime Infrastructure Barbara Chapman and Deepak Eachempati University of Houston Oscar Hernandez Oak Ridge National Laboratory Abstract The OpenUH compiler

More information

Parallel Programming without MPI Using Coarrays in Fortran SUMMERSCHOOL

Parallel Programming without MPI Using Coarrays in Fortran SUMMERSCHOOL Parallel Programming without MPI Using Coarrays in Fortran SUMMERSCHOOL 2007 2015 August 5, 2015 Ge Baolai SHARCNET Western University Outline What is coarray How to write: Terms, syntax How to compile

More information

Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing

Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming Overview Parallel programming allows the user to use multiple cpus concurrently Reasons for parallel execution: shorten execution time by spreading the computational

More information

Lubuntu Linux Virtual Machine

Lubuntu Linux Virtual Machine Lubuntu Linux 18.04 Virtual Machine About Us Slide / 01 Founded in 2015, Sourcery Institute is a California nonprofit public-benefit corporation engaged in research, education, and advisory services in

More information

Performance Comparison between Two Programming Models of XcalableMP

Performance Comparison between Two Programming Models of XcalableMP Performance Comparison between Two Programming Models of XcalableMP H. Sakagami Fund. Phys. Sim. Div., National Institute for Fusion Science XcalableMP specification Working Group (XMP-WG) Dilemma in Parallel

More information

CAF versus MPI Applicability of Coarray Fortran to a Flow Solver

CAF versus MPI Applicability of Coarray Fortran to a Flow Solver CAF versus MPI Applicability of Coarray Fortran to a Flow Solver Manuel Hasert, Harald Klimach, Sabine Roller m.hasert@grs-sim.de Applied Supercomputing in Engineering Motivation We develop several CFD

More information

Friday, May 25, User Experiments with PGAS Languages, or

Friday, May 25, User Experiments with PGAS Languages, or User Experiments with PGAS Languages, or User Experiments with PGAS Languages, or It s the Performance, Stupid! User Experiments with PGAS Languages, or It s the Performance, Stupid! Will Sawyer, Sergei

More information

The Cray Programming Environment. An Introduction

The Cray Programming Environment. An Introduction The Cray Programming Environment An Introduction Vision Cray systems are designed to be High Productivity as well as High Performance Computers The Cray Programming Environment (PE) provides a simple consistent

More information

Coarray Fortran: Past, Present, and Future. John Mellor-Crummey Department of Computer Science Rice University

Coarray Fortran: Past, Present, and Future. John Mellor-Crummey Department of Computer Science Rice University Coarray Fortran: Past, Present, and Future John Mellor-Crummey Department of Computer Science Rice University johnmc@cs.rice.edu CScADS Workshop on Leadership Computing July 19-22, 2010 1 Staff Bill Scherer

More information

Compilers and Compiler-based Tools for HPC

Compilers and Compiler-based Tools for HPC Compilers and Compiler-based Tools for HPC John Mellor-Crummey Department of Computer Science Rice University http://lacsi.rice.edu/review/2004/slides/compilers-tools.pdf High Performance Computing Algorithms

More information

3D Finite Difference Time-Domain Modeling of Acoustic Wave Propagation based on Domain Decomposition

3D Finite Difference Time-Domain Modeling of Acoustic Wave Propagation based on Domain Decomposition 3D Finite Difference Time-Domain Modeling of Acoustic Wave Propagation based on Domain Decomposition UMR Géosciences Azur CNRS-IRD-UNSA-OCA Villefranche-sur-mer Supervised by: Dr. Stéphane Operto Jade

More information

OpenSHMEM as a Portable Communication Layer for PGAS Models: A Case Study with Coarray Fortran

OpenSHMEM as a Portable Communication Layer for PGAS Models: A Case Study with Coarray Fortran OpenSHMEM as a Portable Communication Layer for PGAS Models: A Case Study with Coarray Fortran Naveen Namashivayam, Deepak Eachempati, Dounia Khaldi and Barbara Chapman Department of Computer Science University

More information

Reducing the Computational Complexity of Adjoint Computations

Reducing the Computational Complexity of Adjoint Computations Reducing the Computational Complexity of Adjoint Computations William W. Symes CAAM, Rice University, 2007 Agenda Discrete simulation, objective definition Adjoint state method Checkpointing Griewank s

More information

SINGLE-SIDED PGAS COMMUNICATIONS LIBRARIES. Basic usage of OpenSHMEM

SINGLE-SIDED PGAS COMMUNICATIONS LIBRARIES. Basic usage of OpenSHMEM SINGLE-SIDED PGAS COMMUNICATIONS LIBRARIES Basic usage of OpenSHMEM 2 Outline Concept and Motivation Remote Read and Write Synchronisation Implementations OpenSHMEM Summary 3 Philosophy of the talks In

More information

Introduction to parallel computing concepts and technics

Introduction to parallel computing concepts and technics Introduction to parallel computing concepts and technics Paschalis Korosoglou (support@grid.auth.gr) User and Application Support Unit Scientific Computing Center @ AUTH Overview of Parallel computing

More information

Compute Node Linux: Overview, Progress to Date & Roadmap

Compute Node Linux: Overview, Progress to Date & Roadmap Compute Node Linux: Overview, Progress to Date & Roadmap David Wallace Cray Inc ABSTRACT: : This presentation will provide an overview of Compute Node Linux(CNL) for the CRAY XT machine series. Compute

More information

Techniques to improve the scalability of Checkpoint-Restart

Techniques to improve the scalability of Checkpoint-Restart Techniques to improve the scalability of Checkpoint-Restart Bogdan Nicolae Exascale Systems Group IBM Research Ireland 1 Outline A few words about the lab and team Challenges of Exascale A case for Checkpoint-Restart

More information

Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI

Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI Huansong Fu*, Manjunath Gorentla Venkata, Neena Imam, Weikuan Yu* *Florida State University Oak Ridge National Laboratory Outline

More information

MPI Programming. Henrik R. Nagel Scientific Computing IT Division

MPI Programming. Henrik R. Nagel Scientific Computing IT Division 1 MPI Programming Henrik R. Nagel Scientific Computing IT Division 2 Outline Introduction Finite Difference Method Finite Element Method LU Factorization SOR Method Monte Carlo Method Molecular Dynamics

More information

Experiences Developing the OpenUH Compiler and Runtime Infrastructure

Experiences Developing the OpenUH Compiler and Runtime Infrastructure Noname manuscript No. (will be inserted by the editor) Experiences Developing the OpenUH Compiler and Runtime Infrastructure Barbara Chapman Deepak Eachempati Oscar Hernandez Received: date / Accepted:

More information

PGAS: Partitioned Global Address Space

PGAS: Partitioned Global Address Space .... PGAS: Partitioned Global Address Space presenter: Qingpeng Niu January 26, 2012 presenter: Qingpeng Niu : PGAS: Partitioned Global Address Space 1 Outline presenter: Qingpeng Niu : PGAS: Partitioned

More information

Scalability issues : HPC Applications & Performance Tools
