Parallel Programming Features in the Fortran Standard. Steve Lionel 12/4/2012

Similar documents
GAP Guided Auto Parallelism A Tool Providing Vectorization Guidance

C Language Constructs for Parallel Programming

Intel MKL Data Fitting component. Overview

Intel Parallel Amplifier Sample Code Guide

What's new in VTune Amplifier XE

Using Intel Inspector XE 2011 with Fortran Applications

Enabling DDR2 16-Bit Mode on Intel IXP43X Product Line of Network Processors

Getting Compiler Advice from the Optimization Reports

Overview of Intel Parallel Studio XE

Software Tools for Software Developers and Programming Models

Intel Direct Sparse Solver for Clusters, a research project for solving large sparse systems of linear algebraic equation

Techniques for Lowering Power Consumption in Design Utilizing the Intel EP80579 Integrated Processor Product Line

Intel IT Director 1.7 Release Notes

Intel MKL Sparse Solvers. Software Solutions Group - Developer Products Division

Intel MPI Library for Windows* OS

Intel C++ Compiler Documentation

Intel(R) Threading Building Blocks

Open FCoE for ESX*-based Intel Ethernet Server X520 Family Adapters

How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on SuSE*Enterprise Linux Server* using Xen*

Parallel Programming Models

Product Change Notification

Beyond Threads: Scalable, Composable, Parallelism with Intel Cilk Plus and TBB

Intel Software Development Products Licensing & Programs Channel EMEA

Повышение энергоэффективности мобильных приложений путем их распараллеливания. Примеры. Владимир Полин

Intel(R) Threading Building Blocks

Intel Fortran Composer XE 2011 Getting Started Tutorials

Third Party Hardware TDM Bus Administration

Cilk Plus in GCC. GNU Tools Cauldron Balaji V. Iyer Robert Geva and Pablo Halpern Intel Corporation

Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor: Boot-Up Options

Continuous Speech Processing API for Host Media Processing

Introduction to Intel Fortran Compiler Documentation. Document Number: US

Product Change Notification

Intel Platform Controller Hub EG20T

Product Change Notification

Overview of Intel MKL Sparse BLAS. Software and Services Group Intel Corporation

Installation Guide and Release Notes

Intel Platform Controller Hub EG20T

Product Change Notification

ECC Handling Issues on Intel XScale I/O Processors

Product Change Notification

Product Change Notification

Product Change Notification

Product Change Notification

Product Change Notification

Product Change Notification

Product Change Notification

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes

Product Change Notification

Installation Guide and Release Notes

Agenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.

Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor PCI 16-Bit Read Implementation

Product Change Notification

Product Change Notification

VTune(TM) Performance Analyzer for Linux

Product Change Notification

Product Change Notification

Product Change Notification

MayLoon User Manual. Copyright 2013 Intel Corporation. Document Number: xxxxxx-xxxus. World Wide Web:

Product Change Notification

Intel IXP400 Software: Integrating STMicroelectronics* ADSL MTK20170* Chipset Firmware

Product Change Notification

Using the Intel VTune Amplifier 2013 on Embedded Platforms

Product Change Notification

Intel Parallel Studio XE 2011 SP1 for Linux* Installation Guide and Release Notes

Product Change Notification

Intel Platform Controller Hub EG20T

Intel Thread Profiler

Product Change Notification

Getting Started with Intel SDK for OpenCL Applications

Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes

Intel EP80579 Software Drivers for Embedded Applications

Product Change Notification

Product Change Notification

Vectorization Advisor: getting started

Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant

Product Change Notification

Product Change Notification

Intel Parallel Studio XE 2011 for Linux* Installation Guide and Release Notes

Installation Guide and Release Notes

Overview

Enabling Hardware Accelerated Playback for Intel Atom /Intel US15W Platform and IEGD

Continuous Speech Processing API for Linux and Windows Operating Systems

Virtual PLATFORMS for complex IP within system context

Using Intel VTune Amplifier XE and Inspector XE in.net environment

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor

Using Intel VTune Amplifier XE for High Performance Computing

Product Change Notification

What s New August 2015

Global Call API for Host Media Processing on Linux

A Simple Path to Parallelism with Intel Cilk Plus

Product Change Notification

Recommended JTAG Circuitry for Debug with Intel Xscale Microarchitecture

This guide will show you how to use Intel Inspector XE to identify and fix resource leak errors in your programs before they start causing problems.

IXPUG 16. Dmitry Durnov, Intel MPI team

OpenMP * 4 Support in Clang * / LLVM * Andrey Bokhanko, Intel

Overview

Product Change Notification

Intel EP80579 Software for IP Telephony Applications on Intel QuickAssist Technology Linux* Device Driver API Reference Manual

C Language Constructs for Parallel Programming

Intel Dialogic Global Call Protocols Version 4.1 for Linux and Windows

Transcription:

Parallel Programming Features in the Fortran Standard Steve Lionel 12/4/2012

Agenda Overview of popular parallelism methodologies FORALL a look back DO CONCURRENT Coarrays Fortran 2015 Q+A 12/5/2012 2

Fortran on the Side 12/5/2012 3

Popular Parallelism Methodologies Defined by multi-vendor consortiums OpenMP* Threading on shared-memory systems Directive-based Single program execution, fork-join parallelism Requires compiler support OpenMP Architecture Review Board OpenMP.org Message Passing Interface (MPI) Shared or distributed memory API (procedure call) based Multiple copies of program run in parallel No explicit compiler support required, but MPI Forum mpi-forum.org 12/5/2012 4

Popular Parallelism Methodologies Implementation-specific OS threads (Windows threads, pthreads, etc.) Defined by OS vendor API based Single copy of program, typically worker threads No explicit compiler support required Auto-Parallel Feature of Intel (and some other) compilers Directives needed for best performance Loops and array operations only Compiler support required 12/5/2012 5

The Fortran Way 12/5/2012 6

FORALL (1/2) Provides array assignments controlled by a tripletspec and, optionally, a mask Originally part of High-Performance Fortran, a dialect extending Fortran 90 Adopted in Fortran 95 Example: FORALL (I=1:10,J=1:10,B(I,J)/=0) A(I,J) = REAL(I+J+2) B(I,J) = A(I,J) + B(I,J) * REAL(I*J) END FORALL 12/5/2012 7

FORALL (2/2) Not a loop construct! Each array assignment evaluated completely in turn Inefficient to parallelize 12/5/2012 8

DO CONCURRENT (1/3) New in Fortran 2008 Uses FORALL header, but no mask Iterations can execute in any order and to any degree of parallelism Programmer is responsible for making sure there are no loop-carried dependencies Intel Fortran will attempt to parallelize with autoparallel enabled Helps vectorization Fork-join model 12/5/2012 9

DO CONCURRENT (2/3) Example: DO CONCURRENT (I=1:N) A(I) = T + (B(I) * C(I)) END DO Limitations Must use BLOCK to create iteration-private variables Intel Fortran doesn t yet support BLOCK Not suitable for reductions I/O allowed, but no dependence on order 12/5/2012 10

DO CONCURRENT (3/3) Example using BLOCK: DO CONCURRENT (I=1:N) BLOCK REAL :: T T = A(I) + B(I) C(I) = T + SQRT(I) END BLOCK END DO Scatter/Gather example: DO CONCURRENT (I=1:M) A(IND(I)) = I END DO 12/5/2012 11

Coarrays New in Fortran 2008 Derived from F-- specification in early 2000s Cray implementation on T3E and X1 supercomputers Partitioned Global Address Space (PGAS) model PGAS also used by UPC, X10, Chapel Multiple copies of program run in parallel Each copy called an image 12/5/2012 12

Coarrays What is a coarray? Variable declared to have CODIMENSIONs with [] REAL, DIMENSION(1000), CODIMENSION [*] :: X Last codimension must be * - computed at runtime from number of images Can be array or scalar Each image has its own piece of the coarray Images can reference other image s pieces by using coindices enclosed in [] X[3] Images can reference their own piece by omitting the [] 12/5/2012 13

Coarrays REAL :: X(2,2) [*]! Assume four images X(1,1)[1] X(1,2)[1] X(2,1)[1] X(2,2)[1] X(2,1) on image 2 X(1,1)[2] X(1,2)[2] X(2,1)[2] X(2,2)[2] X(1,1)[3] X(1,2)[3] X(2,1)[3] X(2,2)[3] X(1,2)[3] on image 4 X(1,1)[4] X(1,2)[4] X(2,1)[4] X(2,2)[4] 12/5/2012 14

Coarrays Mapping of objects with codimensions: 2D real, codimension[2,*] :: x Mapping of X if program run with 6 images: Image 1 X[1,1] Image 3 X[1,2] Image 5 X[1,3] Image 2 X[2,1] Image 4 X[2,2] Image 6 X[2,3] Mapping of X if program run with 9 images Image 1 X[1,1] Image 3 X[1,2] Image 5 X[1,3] Image 7 X[1,4] Image 9 X[1,5] Image 2 X[2,1] Image 4 X[2,2] Image 6 X[2,3] Image 8 X[2,4] 12/5/2012 15

Coarrays Coarrays can be ALLOCATABLE Polymorphic Used in assignments and expressions Used in READ and WRITE statements Passed as arguments to procedures Used as dummy arguments in procedures (explicit interface required) Coarrays can t be Allocated differently in different images Interoperable with C 12/5/2012 16

Coarrays and Synchronization Implicit synchronization points At image start When a coarray is allocated At image end Explicit synchronization SYNC ALL SYNC IMAGES (image-list) SYNC MEMORY (image-list) Critical sections (CRITICAL END CRITICAL) LOCK and UNLOCK statements ATOMIC_DEFINE and ATOMIC_REF intrinsics ERROR STOP 12/5/2012 17

Intrinsics for coarrays NUM_IMAGES() IMAGE_INDEX(varname) LCOBOUND(varname) UCOBOUND(varname) THIS_IMAGE() 12/5/2012 18

Input and Output with coarrays Standard output (unit 6) preconnected in all images Intel Fortran will merge the streams and display in image 1 Order of output not guaranteed Standard input (unit 5) preconnected for image 1 only For all other units, each image has their own independent set 12/5/2012 19

Example coarray code my_subgrid( 0, 1:my_M) = my_subgrid( my_n, 1:my_M)[my_north_P,me_Q] my_subgrid( my_n+1, 1:my_M) = my_subgrid( 1, 1:my_M)[my_south_P,me_Q] my_subgrid( 1:my_N, my_m+1) = my_subgrid( 1:my_N, 1 my_subgrid( 1:my_N, 0 )[me_p, my_east_q] ) = my_subgrid( 1:my_N, my_m )[me_p, my_west_q] max_global = MAXVAL( ABS( my_subgrid_new_values(1:my_n,1:my_m) - & my_subgrid(1:my_n,1:my_m) ) ) SYNC ALL! protects both max_global and my_subgrid IF (me == 1) THEN DO I= 2,NUM_IMAGES() max_local = max_global[i] max_global = MAX( max_global, max_local ) END DO END IF 12/5/2012 20

Coarrays in Intel Fortran (1/2) New in Intel [Visual] Fortran Composer XE 2011 Supported on Linux and Windows only Underlying transport is Intel MPI Run-time libraries provided Shared memory model included with Composer XE Distributed memory model (cluster) requires Intel Cluster Studio XE license Windows cluster support added in Composer XE 2011 Update 6 12/5/2012 21

Coarrays in Intel Fortran (2/2) Compile with coarray (/Qcoarray) to get coarray syntax and features enabled For shared memory, just run executable For distributed memory, use coarray=distributed (/Qcoarray:distributed) and define MPI ring Whole programs only can t create coarray-using library for use by non-coarray programs 12/5/2012 22

Fortran 2015 12/5/2012 23

Fortran 2015 Name for next revision of Fortran Standard Technical work to be completed in 2014 Enhancements to C Interoperability (TS29113) Enhancements to coarray features Draft under development Teams Events Collective procedures Additional atomic procedures Corrections and Clarifications 12/5/2012 24

Q+A 12/5/2012 25

12/5/2012 26

Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products. BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Cilk, Core Inside, FlashFile, i960, InstantIP, Intel, the Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vpro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vpro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright 2012. Intel Corporation. http://intel.com/software/products 12/5/2012 27

Intel s compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. revision #20110804 12/5/2012 28