An Open64-based Compiler and Runtime Implementation for Coarray Fortran
1 An Open64-based Compiler and Runtime Implementation for Coarray Fortran
Talk by Deepak Eachempati
Presented at: Open64 Developer Forum, /25/2010
2 Outline
- Motivation
- Implementation Overview
- Evaluation
- Future Work
3 Motivation: Goals
- Provide an open, portable implementation of Coarray Fortran (CAF)
- Explore potential optimizations in the compiler and runtime for reducing parallelization costs
- Prototype new language features
- Encourage support for CAF: adding it to Open64 encourages other vendors to pick it up
4 Motivation: What is Coarray Fortran (CAF)?
- Simple language extension to Fortran for explicit parallel programming
- Principle: the smallest change required to make Fortran an effective parallel language
- Features:
  - SPMD model similar to MPI
  - fixed number of asynchronously executing copies of the program, called images
  - the coarray abstraction allows indexing into other images, using cosubscripts, for remote data access
  - new intrinsics and synchronization statements added to the language
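As a minimal sketch of these features (not taken from the talk), the following standard Fortran 2008 program declares a scalar coarray, lets every image store its own index, and then reads a neighbour's copy through a cosubscript. The program and variable names are illustrative assumptions, and a coarray-enabled compiler is assumed.

```fortran
program caf_hello
  implicit none
  integer :: x[*]            ! scalar coarray: one copy of x on every image
  integer :: me, n, right

  me = this_image()          ! index of this image, 1..num_images()
  n  = num_images()          ! fixed number of images for the whole run
  x  = me                    ! purely local assignment (no cosubscript)

  sync all                   ! every image has stored x before anyone reads it

  ! Neighbour to the "right", wrapping around at the last image.
  right = merge(1, me + 1, me == n)

  ! x[right] is a remote read of the copy of x that lives on image "right".
  print '(a,i0,a,i0,a,i0,a,i0)', 'image ', me, ' of ', n, &
        ' read ', x[right], ' from image ', right
end program caf_hello
```

How the program is built and how many images it runs with depends on the compiler. With gfortran, for example, the option -fcoarray=single builds a single-image executable, in which case the neighbour is image 1 itself.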
5 Motivation: Example, Matrix Multiply
    real, allocatable :: a(:,:)[:,:], b(:,:)[:,:], c(:,:)[:,:]
    allocate(a(n,n)[p,*], b(n,n)[p,*], c(n,n)[p,*])
    myp = this_image(a,1)
    myq = this_image(a,2)
    a = 1.0
    b = 1.0
    c = 0.0
    sync all
    do i=1,n
      do j=1,n
        do l=1,p
          c(i,j) = c(i,j) + sum(a(i,:)[myp,l]*b(:,j)[l,myq])
        end do
      end do
    end do
- a, b and c are allocatable coarrays
- this_image(a,1) and this_image(a,2) get the cosubscripts for this image based on the codimensions; on the image holding a(:,:)[3,4], myp = 3 and myq = 4
- the sum performs remote access to array sections; communication costs may be reduced by an optimizing compiler
6 Implementation: UH CAF Implementation
- Basic translation strategy:
  - associate two dope vectors with each declared coarray
  - replace coarray references (those with cosubscripts) with GET or PUT runtime calls
  - identify and translate intrinsic functions
- What about optimizations?
  - Open64 is a robust optimizing compiler
  - we defer translation until after the front-end so that we can explore compiler optimizations based on CAF syntax
7 Implementation: Comparison to other CAF compilers
- UH CAF
  - currently a baseline implementation (no optimization)
  - portable, open-source runtime support; current implementations use ARMCI or GASNet
- G95
  - can generate optimized programs for multi-core
  - can support images running on a cluster, using the G95 Coarray Console (closed source)
- gfortran
  - CAF implementation is in progress
  - focus is on shared memory (single node)
  - multi-image cluster execution is not yet supported
8 Implementation: OpenUH-CAF Compiler
[Figure: compiler structure. Labels: CAF Source Code; CRAYF90 Fortran Frontend with CAF Support; VH-WHIRL Emitter; OpenUH Communication Library (ARMCI, GASNet); coarray analysis phase; coarray lowering; OpenUH Middle-end and Backend (VHO, LNO, WOPT, CG)]
9 Implementation: UH CAF Coarray Lowering
Coarray lowering responsibilities:
- replace coarray variables with low-level data structures that describe the shape and location of data
- modify the program to create coarray data and track it throughout the lifetime of the program
- replace coarray references to remote data with 1-sided communication
- leverage analyses from the compiler to optimize:
  - synchronizations
  - communication
  - buffering
10 Implementation: Data structure for describing coarrays
[Figure: Dope Vector; Codope Vector (introduced)]
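11 Implementation: Coarray Lowering, Communication Generation (Example)
Source statement:
    A(i, j, 1:n)[q] = B(1, j, 1:n)[p] + C(1, j, 1:n)[p] + D[p]
Lowered form, with each remote operand fetched into a temporary:
    t1 <- B(1, j, 1:n)[p]
    t2 <- C(1, j, 1:n)[p]
    t3 <- D[p]
    t4 = t1 + t2 + t3
    A(i, j, 1:n)[q] <- t4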
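The use of temporaries t1 to t4 in this example can be mimicked at the source level. The sketch below is an illustration in standard coarray Fortran, not the code the compiler emits: the arrays are reduced to one dimension, and the values of p and q and the initial data are assumptions chosen so that the program is self-contained.

```fortran
program lowering_sketch
  implicit none
  integer, parameter :: n = 4
  real :: a(n)[*], b(n)[*], c(n)[*], d[*]
  real :: t1(n), t2(n), t3, t4(n)     ! local communication temporaries
  integer :: p, q

  ! Assumed test data: the same values on every image.
  a = 0.0
  b = 1.0
  c = 2.0
  d = 3.0
  p = 1                 ! image the operands are read from (assumption)
  q = num_images()      ! image the result is written to (assumption)
  sync all              ! all images have initialized their coarrays

  if (this_image() == 1) then
    t1 = b(1:n)[p]      ! GET: remote section into a local buffer
    t2 = c(1:n)[p]      ! GET
    t3 = d[p]           ! GET of a scalar
    t4 = t1 + t2 + t3   ! purely local computation
    a(1:n)[q] = t4      ! PUT: local buffer to the remote section
  end if

  sync all              ! the PUT is complete before image q reads a
  if (this_image() == q) print '(4f6.1)', a   ! every element is 6.0
end program lowering_sketch
```

Writing the statement this way makes the cost visible: three GETs, one local addition and one PUT, each of which needs a buffer.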
12 Implementation
GET: t1 <- B(1, i:j)[p]
    ! allocate t1
    t1%base = 0
    t1%flds%el_len = b_dope%flds%el_len
    t1%flds%assoc = 0
    t1%flds%type_code = IOR(ISHFT(2_8,32), ISHFT(32_8,44))
    t1%flds%num_dims = ISHFT(2,29)
    t1%dims(1)%lb = 1
    t1%dims(1)%ext = 1
    t1%dims(1)%str_m = 1
    t1%dims(2)%lb = 1
    t1%dims(2)%ext = j-i+1
    t1%dims(2)%str_m = 1
    call caf_alloc_comm_buf(t1)
    ! set up b_dope and b_codope for transfer
    xfer_dope = b_dope
    xfer_dope%base = b_dope%base + ISHFT(b_dope%flds%el_len,-3)* &
        ( (1-b_dope%dims(1)%lb)*b_dope%dims(1)%str_m + &
          (i-b_dope%dims(2)%lb)*b_dope%dims(2)%str_m )
    xfer_codope = b_codope
    xfer_codope%base_img = b_codope%base_img + &
        (p-b_codope%codims(1)%lb)*b_codope%codims(1)%str_m
    call caf_rma_get(t1, xfer_dope, xfer_codope)
PUT: B(3:5, i:j)[p] <- t2
    ! allocate t2
    t2%base = 0
    t2%flds%el_len = b_dope%flds%el_len
    t2%flds%assoc = 0
    t2%flds%type_code = IOR(ISHFT(2_8,32), ISHFT(32_8,44))
    t2%flds%num_dims = ISHFT(2,29)
    t2%dims(1)%lb = 1
    t2%dims(1)%ext = 3
    t2%dims(1)%str_m = 1
    t2%dims(2)%lb = 1
    t2%dims(2)%ext = j-i+1
    t2%dims(2)%str_m = 3
    call caf_alloc_comm_buf(t2)
    t2 = ...
    ! set up b_dope and b_codope for transfer
    xfer_dope = b_dope
    xfer_dope%base = b_dope%base + ISHFT(b_dope%flds%el_len,-3)* &
        ( (3-b_dope%dims(1)%lb)*b_dope%dims(1)%str_m + &
          (i-b_dope%dims(2)%lb)*b_dope%dims(2)%str_m )
    xfer_codope = b_codope
    xfer_codope%base_img = b_codope%base_img + &
        (p-b_codope%codims(1)%lb)*b_codope%codims(1)%str_m
    call caf_rma_put(t2, xfer_dope, xfer_codope)
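The address arithmetic used for xfer_dope%base can be tried in isolation. The following is a simplified sketch under stated assumptions, not the runtime's actual data structure: the dope vector is flattened (no packed flds bit fields), el_len is taken to be the element length in bits so that ISHFT(el_len,-3) yields bytes, str_m is taken to be a stride in elements, and the names dope_t, dim_t and section_base are hypothetical.

```fortran
module dope_sketch
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: max_rank = 7

  type :: dim_t
    integer(int64) :: lb = 1, ext = 0, str_m = 1   ! lower bound, extent, stride
  end type dim_t

  type :: dope_t
    integer(int64) :: base = 0       ! byte address of the first element
    integer(int64) :: el_len = 0     ! element length in bits (assumption)
    integer        :: num_dims = 0
    type(dim_t)    :: dims(max_rank)
  end type dope_t

contains

  ! Byte address of the element whose subscripts are first(1:num_dims),
  ! i.e. the start of an array section beginning at that element.
  pure function section_base(d, first) result(addr)
    type(dope_t),   intent(in) :: d
    integer(int64), intent(in) :: first(:)
    integer(int64) :: addr, elems
    integer :: k

    elems = 0
    do k = 1, d%num_dims
      elems = elems + (first(k) - d%dims(k)%lb) * d%dims(k)%str_m
    end do
    addr = d%base + ishft(d%el_len, -3) * elems    ! bits -> bytes
  end function section_base

end module dope_sketch

program dope_demo
  use dope_sketch
  implicit none
  type(dope_t) :: b

  ! Describe a 4 x 10 array of 32-bit reals stored at byte address 1000.
  b%base     = 1000_int64
  b%el_len   = 32_int64
  b%num_dims = 2
  b%dims(1)  = dim_t(1_int64, 4_int64, 1_int64)
  b%dims(2)  = dim_t(1_int64, 10_int64, 4_int64)

  ! Start of the section B(1, 3:...): 8 elements * 4 bytes past the base.
  print *, section_base(b, [1_int64, 3_int64])     ! prints 1032
end program dope_demo
```

The same formula covers both transfers on the slide: only the subscripts of the first element of the section change.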
13 Implementation: Opportunities for Optimization
- reduce costs due to allocation of temporary buffers
- optimize buffer sizes
- static analysis to generate non-blocking communication as appropriate
- static analysis to reduce synchronization costs
- adapt WOPT and CG optimization phases for coarrays
14 Implementation: UH CAF Runtime Support
A communication runtime library provides:
- process management
- management of coarray data
- 1-sided communication
- synchronizations
- collectives
- communication buffer allocation/deallocation
The runtime:
- uses ARMCI or GASNet for 1-sided communication
- assumes the CRAY dope vector format
- launches images with mpirun
15 Implementation: Managing Coarray Data
- Coarrays may be non-allocatable or allocatable.
- Remote Memory Descriptor (rm_desc):
  - defines a memory region allocated collectively on each executing process (addr i is the base address on image i)
  - the region is size bytes on all images
- Coarrays are created collectively within these memory regions at next_offset.
[Figure: descriptors rm_desc 0, rm_desc 1, ..., rm_desc n; each holds addr 1, addr 2, addr 3, addr 4, ..., size and next_offset]
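One way to read "created collectively at next_offset" is as a bump allocator: every image performs the same sequence of allocations, so a coarray ends up at the same offset in every image's region. The sketch below illustrates that idea only. The type rm_desc_t, the function rm_alloc, the 8-byte alignment and the -1 failure value are assumptions for illustration, and the per-image base addresses are left out.

```fortran
module rm_desc_sketch
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none

  type :: rm_desc_t
    integer(int64) :: size = 0          ! bytes in the region on every image
    integer(int64) :: next_offset = 0   ! first unused byte in the region
  end type rm_desc_t

contains

  ! Reserve nbytes in the region and return the offset of the new coarray,
  ! or -1 if the region is too small. Called with the same arguments on
  ! every image, it returns the same offset everywhere.
  function rm_alloc(desc, nbytes) result(offset)
    type(rm_desc_t), intent(inout) :: desc
    integer(int64),  intent(in)    :: nbytes
    integer(int64) :: offset, padded
    integer(int64), parameter :: align = 8   ! assumed alignment

    padded = ((nbytes + align - 1) / align) * align
    if (desc%next_offset + padded > desc%size) then
      offset = -1
    else
      offset = desc%next_offset
      desc%next_offset = desc%next_offset + padded
    end if
  end function rm_alloc

end module rm_desc_sketch

program rm_demo
  use rm_desc_sketch
  implicit none
  type(rm_desc_t) :: desc

  desc%size = 64_int64
  print *, rm_alloc(desc, 20_int64)   ! prints 0  (20 bytes padded to 24)
  print *, rm_alloc(desc, 8_int64)    ! prints 24
  print *, rm_alloc(desc, 40_int64)   ! prints -1 (32 + 40 exceeds 64)
end program rm_demo
```

With identical offsets on all images, the remote address of a coarray element is the target image's base address plus the offset plus the element displacement, which is the information the descriptor in the figure carries.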
16 Evaluation: Parallelizing using CAF
Serial version:
    real :: u(xmin-lx:xmax+lx, zmin-lz:zmin+lz)
    real :: v(xmin-lx:xmax+lx, zmin-lz:zmin+lz)

    do it=1,nt,2
      print *, maxval(u), minval(u)
      u(xsource,zsource) = u(xsource,zsource) + source(it)
      call cg_fwd_2d(u, v, ...)
      v(xsource,zsource) = v(xsource,zsource) + source(it+1)
      call cg_fwd_2d(v, u, ...)
    end do
CAF version:
    real :: u(xmin-lx:xmax+lx, &
              zmin-lz:zmin+lz)[npx,*]
    real :: v(xmin-lx:xmax+lx, &
              zmin-lz:zmin+lz)[npx,*]

    do it=1,nt,2
      call co_maxval(maxval(u), u_max)
      call co_minval(minval(u), u_min)
      if (this_image()==1) print *, u_max, u_min
      if (is_center_image()) &
        u(xsource,zsource) = u(xsource,zsource) + source(it)
      call cg_fwd_2d(u, v, ...)
      ! get bottom rows from top neighbor
      if (px>1) u(xmin-lx:xmin-1,:) = &
        u(xmax-lx+1:xmax,:)[px-1,pz]
      if (is_center_image()) &
        v(xsource,zsource) = v(xsource,zsource) + source(it+1)
      ! get bottom rows from top neighbor
      if (px>1) v(xmin-lx:xmin-1,:) = &
        v(xmax-lx+1:xmax,:)[px-1,pz]
      call cg_fwd_2d(v, u, ...)
    end do
17 Evaluation: Results on IB cluster
18 Future Work: On-going and future work
- optimize the implementation for execution on an SMP node
- generate non-blocking communication and wait primitives to reduce overheads
- explore language extensions
- tune buffering: consider user-, compiler-, and runtime-driven strategies
19 Current Status: Supported Features
- coarrays of basic data types
- allocatable coarrays
- intrinsics: this_image, num_images, image_index, lcobound, ucobound
- synchronizations: sync all, sync images, notify/query, critical sections
- collectives: comax, comin, cosum
- performance tracing can be enabled in the runtime
For more information:
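Two of the listed synchronization features, sync all and critical sections, are part of standard Fortran 2008 and can be sketched as follows. This is an illustrative example rather than code from the talk; the program and variable names are assumptions, and the non-standard features in the list (notify/query and the collectives) are deliberately not used.

```fortran
program sync_sketch
  implicit none
  integer :: counter[*]     ! one counter per image; only image 1's is used

  counter = 0
  sync all                  ! counter is initialized everywhere

  ! Only one image at a time executes the critical construct, so the
  ! remote read-modify-write of counter[1] cannot be interleaved.
  critical
    counter[1] = counter[1] + 1
  end critical

  sync all                  ! all increments are complete

  ! On image 1 the counter now equals the number of images.
  if (this_image() == 1) print '(i0,a,i0)', counter, ' of ', num_images()
end program sync_sketch
```

Without the critical construct, two images could read the same old value of counter[1] and one increment would be lost.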
20 Thank you. Questions?
More informationThe Parallel Boost Graph Library spawn(active Pebbles)
The Parallel Boost Graph Library spawn(active Pebbles) Nicholas Edmonds and Andrew Lumsdaine Center for Research in Extreme Scale Technologies Indiana University Origins Boost Graph Library (1999) Generic
More informationPGAS Languages (Par//oned Global Address Space) Marc Snir
PGAS Languages (Par//oned Global Address Space) Marc Snir Goal Global address space is more convenient to users: OpenMP programs are simpler than MPI programs Languages such as OpenMP do not provide mechanisms
More informationA Case for High Performance Computing with Virtual Machines
A Case for High Performance Computing with Virtual Machines Wei Huang*, Jiuxing Liu +, Bulent Abali +, and Dhabaleswar K. Panda* *The Ohio State University +IBM T. J. Waston Research Center Presentation
More informationMPI: A Message-Passing Interface Standard
MPI: A Message-Passing Interface Standard Version 2.1 Message Passing Interface Forum June 23, 2008 Contents Acknowledgments xvl1 1 Introduction to MPI 1 1.1 Overview and Goals 1 1.2 Background of MPI-1.0
More informationCompiling Techniques
Lecture 2: The view from 35000 feet 19 September 2017 Table of contents 1 2 Passes Representations 3 Instruction Selection Register Allocation Instruction Scheduling 4 of a compiler Source Compiler Machine
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming Section 5. Victor Gergel, Professor, D.Sc. Lobachevsky State University of Nizhni Novgorod (UNN) Contents (CAF) Approaches to parallel programs development Parallel
More informationIntroduction to parallel computing with MPI
Introduction to parallel computing with MPI Sergiy Bubin Department of Physics Nazarbayev University Distributed Memory Environment image credit: LLNL Hybrid Memory Environment Most modern clusters and
More informationProgramming Models for Supercomputing in the Era of Multicore
Programming Models for Supercomputing in the Era of Multicore Marc Snir MULTI-CORE CHALLENGES 1 Moore s Law Reinterpreted Number of cores per chip doubles every two years, while clock speed decreases Need
More information