A Coarray Fortran Implementation to Support Data-Intensive Application Development

1 A Coarray Fortran Implementation to Support Data-Intensive Application Development
Deepak Eachempati (1), Alan Richardson (2), Terrence Liao (3), Henri Calandra (3), Barbara Chapman (1)
Data-Intensive Scalable Computing Systems 2012 (DISCS'12) Workshop, November 16
(1) Department of Computer Science, University of Houston
(2) Department of Earth, Atmospheric, and Planetary Sciences, MIT
(3) Total E&P

2 Oil and Gas Industry: Compute Needs
The industry is looking for faster and more cost-effective ways to process massive amounts of data:
  more powerful hardware
  more productive programming models
  innovative software techniques

3 Outline
  Fortran 2008 parallel processing additions (CAF)
  CAF Implementation in OpenUH Fortran compiler
  Application port to CAF and Results
  Further extensions for Parallel I/O
  Closing Remarks

5 Coarray Model in Fortran 2008
Derives from Co-Array Fortran (CAF):
  SPMD execution model, PGAS memory model
  execution entities are called images
  coarrays: globally accessible, symmetric data objects
  additional intrinsic subroutines/functions for querying process and data information
  additional statements in the language for synchronization
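The elements of the model listed above can be seen together in a very small program. This is a minimal sketch, not taken from the slides: the program name and the variable `counter` are made up for illustration. It should build with any Fortran 2008 compiler that supports coarrays (for example, gfortran with `-fcoarray=single` for a one-image run).

```fortran
program hello_images
  implicit none
  integer :: counter[*]   ! coarray: one copy per image, remotely addressable
  integer :: me, n

  me = this_image()       ! index of this image, 1..num_images()
  n  = num_images()       ! total number of images

  counter = me            ! each image defines its own copy
  sync all                ! barrier: all copies are defined before any remote read

  if (me == 1) then
    ! image 1 reads the copy held by the last image (a one-sided get)
    print '(a,i0,a,i0)', 'image ', n, ' holds counter = ', counter[n]
  end if
end program hello_images
```

Every image runs the same program (SPMD), `this_image()` and `num_images()` are the query intrinsics, and `sync all` is one of the added synchronization statements.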

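6 Working with Distributed Data using Coarrays
    real :: B[M, *]
Figure: the images are arranged in a grid with M rows and as many columns as needed (*). Seen from the image with cosubscripts [3,4]:
  B references the local B
  B[3,4] also references the local B
  B[3,3] references B in the left neighbor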

7 Working with Distributed Data using Coarrays
    real :: B(10,10)[M, *]
Figure: the same M x * image grid. Seen from the image with cosubscripts [3,4]:
  B(2:4,2:4) references a local subarray of B
  B(2:4,2:4)[3,4] also references the local subarray of B
  B(2:4,2:4)[3,3] references the subarray of B in the left neighbor
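The cosubscript addressing on the two slides above can be tried out with the standard intrinsics `this_image(coarray)` and `image_index`. The following is a sketch under assumptions: the extent `m = 2` of the first codimension, the array `patch`, and the program name are illustrative and not from the slides.

```fortran
program cosubscripts
  implicit none
  integer, parameter :: m = 2          ! assumed extent of the first codimension
  real    :: b(10,10)[m,*]
  real    :: patch(3,3)
  integer :: co(2), left

  co = this_image(b)                   ! this image's cosubscripts, e.g. [3,4]
  b  = real(this_image())
  sync all                             ! every image has defined its b

  if (co(2) > 1) then
    ! image_index returns 0 when the cosubscripts name no existing image
    left = image_index(b, [co(1), co(2)-1])
    if (left > 0) then
      patch = b(2:4,2:4)[co(1), co(2)-1]   ! one-sided get from the left neighbor
      print '(a,i0,a,i0)', 'image ', this_image(), ' read from image ', left
    end if
  end if
end program cosubscripts
```

Images in the first column of the grid have no left neighbor and skip the remote access.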

8 2D Halo Exchange with MPI
    real :: a(0:r+1, 0:c+1)
    call mpi_isend( a(1,1:c),   c, mpi_real, top(myp),    TAG,...)
    call mpi_irecv( a(r+1,1:c), c, mpi_real, bottom(myp), TAG,...)
    call mpi_isend( a(r,1:c),   c, mpi_real, bottom(myp), TAG,...)
    call mpi_irecv( a(0,1:c),   c, mpi_real, top(myp),    TAG,...)
    call mpi_isend( a(1:r,c),   r, mpi_real, right(myp),  TAG,...)
    call mpi_irecv( a(1:r,0),   r, mpi_real, left(myp),   TAG,...)
    call mpi_isend( a(1:r,1),   r, mpi_real, left(myp),   TAG,...)
    call mpi_irecv( a(1:r,c+1), r, mpi_real, right(myp),  TAG,...)
    call mpi_waitall( 8,...)

9 2D Halo Exchange Example with CAF
    real :: a(0:r+1, 0:c+1)[pR,*]
    a(r+1,1:c)[top(1),top(2)]     = a(1,1:c)
    a(0,1:c)[bottom(1),bottom(2)] = a(r,1:c)
    a(1:r,0)[right(1),right(2)]   = a(1:r,c)
    a(1:r,c+1)[left(1),left(2)]   = a(1:r,1)
    sync all
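The same put-then-synchronize pattern can be run end to end in one dimension. This is a simplified sketch, not the slide's 2D code: it assumes a periodic ring of images, `n = 8` interior points per image, and illustrative names (`u`, `left`, `right`). It adds a `sync all` before the puts so that no neighbor's halo is overwritten by its own initialization.

```fortran
program halo_1d
  implicit none
  integer, parameter :: n = 8          ! interior points per image (assumed)
  real    :: u(0:n+1)[*]               ! u(0) and u(n+1) are halo cells
  integer :: me, np, left, right

  me = this_image()
  np = num_images()
  left  = merge(np, me-1, me == 1)     ! periodic neighbors on a ring
  right = merge(1, me+1, me == np)

  u = 0.0
  u(1:n) = real(me)                    ! fill the interior with the image index

  sync all                             ! every image has initialized u
  u(n+1)[left] = u(1)                  ! my first cell -> left neighbor's right halo
  u(0)[right]  = u(n)                  ! my last cell  -> right neighbor's left halo
  sync all                             ! all puts have completed

  ! each halo cell must now hold the index of the neighbor on that side
  if (u(0) /= real(left) .or. u(n+1) /= real(right)) error stop 'halo mismatch'
  if (me == 1) print '(a)', 'halo exchange ok'
end program halo_1d
```

Compared with the MPI version, there are no matching receives and no request handles: each image writes directly into its neighbor's halo, and the closing `sync all` plays the role of `mpi_waitall`.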

11 Implementation of CAF
OpenUH compiler:
  an industry-quality, optimizing compiler based on Open64
  features: dependence and data-flow analysis, interprocedural analysis, OpenMP
  backend supports multiple targets (x86_64, IA64, IA32, MIPS, PTX)
Compilation flow (figure):
  CAF source code -> Fortran front-end with coarray support -> coarray translation phase -> loop optimizer -> global optimizer -> code generation, which together with the OpenUH CAF runtime library produces the executable

12 Runtime Support for CAF
Layers of the runtime (figure), from top to bottom:
  Runtime Interface (libcaf)
  Collectives Support (e.g. reductions), PGAS Memory Allocation, 1-sided Communication, Synchronization, Atomics
  Portable Communication Substrate: GASNet or ARMCI
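Fortran 2008 itself defines no collective intrinsics (`co_sum` and its relatives were added later, in Fortran 2018), so at the language level a reduction has to be built from one-sided accesses and synchronization. The following is a naive sketch of a sum reduction onto image 1. It is not how the OpenUH runtime implements its collectives, and the names are illustrative.

```fortran
program manual_reduce
  implicit none
  integer :: x[*]
  integer :: i, n, total

  n = num_images()
  x = this_image()          ! local contribution of each image
  sync all                  ! every image has defined x

  if (this_image() == 1) then
    total = 0
    do i = 1, n
      total = total + x[i]  ! one-sided get from image i
    end do
    ! the contributions are 1..n, so the sum must be n(n+1)/2
    if (total /= n*(n+1)/2) error stop 'unexpected sum'
    print '(a,i0)', 'sum over images = ', total
  end if
end program manual_reduce
```

This linear gather costs one remote access per image; a runtime-provided collective can use a tree or other scalable schedule instead, which is one reason to place collectives in the runtime layer.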

13 Comparison with other Implementations

Compiler        Commercial/Free                        Fortran 2008 Coarray Support?
OpenUH          Free                                   Yes
G95             Partially free, no longer supported    Missing locks support
Gfortran        Free                                   In progress
Rice CAF 2.0    Free                                   Partially, but adds different features
Cray Fortran    Commercial                             Yes
Intel Fortran   Commercial                             Yes

15 Seismic Subsurface Imaging: Reverse Time Migration
A source wave is emitted per shot.
Reflected waves are captured by an array of sensors.
RTM (in the time domain) uses the finite difference method to numerically solve the wave equation and reconstruct the subsurface image (in parallel, with domain decomposition).

16 RTM Implementations
Isotropic:
  simplest model
  assumes reflected waves propagate at the same speed in every direction from a point
  only swaps faces (6 swaps in halo exchange)
Tilted Transverse Isotropy (TTI):
  assumes waves may propagate at different speeds
  swaps faces and edges (18 swaps in halo exchange)
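The swap counts above match the geometry of a 3D block decomposition, assuming each subdomain is a box that exchanges with the neighbors it shares a face or an edge with. A short sketch that counts those neighbors by enumerating offsets in {-1, 0, 1}^3 (the program and variable names are illustrative):

```fortran
program count_halo_neighbors
  implicit none
  integer :: dx, dy, dz, nonzero, faces, edges

  faces = 0
  edges = 0
  do dz = -1, 1
    do dy = -1, 1
      do dx = -1, 1
        nonzero = abs(dx) + abs(dy) + abs(dz)
        if (nonzero == 1) faces = faces + 1   ! offset along exactly one axis
        if (nonzero == 2) edges = edges + 1   ! offset along exactly two axes
      end do
    end do
  end do

  print '(a,i0)', 'face neighbors: ', faces
  print '(a,i0)', 'edge neighbors: ', edges
  print '(a,i0)', 'faces + edges:  ', faces + edges
end program count_halo_neighbors
```

The counts are 6 faces and 12 edges, 18 in total; corner neighbors (offsets along all three axes) are not exchanged in either model.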

17 Typical Data Usage
82 thousand shots:
  a data-parallel problem, where each shot can be processed independently in parallel
  each shot may handle ~80 MB of data
  so the total data to analyze is ~6 TB
Handling I/O:
  C I/O reads in the velocity and coefficient models
  shot headers are read by the master and distributed
  each processor writes to a distinct file, and the files are merged in a post-processing step
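The total follows directly from the two figures on the slide, taking 1 TB as 10^6 MB:

```latex
82{,}000 \ \text{shots} \times 80 \ \text{MB/shot}
  = 6.56 \times 10^{6} \ \text{MB}
  \approx 6.6 \ \text{TB}
```

This is consistent with the slide's rounded figure of ~6 TB.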

18 Results for CAF RTM port
Total domain size: 1024 x 768 x 512
Forward shot:
  Isotropic case: up to 32% faster than the corresponding MPI implementation
  TTI case: competitive performance with MPI

19 Results for CAF RTM port
Total domain size: 1024 x 768 x 512
Backward shot:
  Isotropic case: performance hit at 256 procs
  TTI case: lagging a bit behind MPI

21 Extending Fortran for Parallel I/O
We are currently designing a prototype implementation of a parallel I/O language extension.
  Fortran I/O has not yet been extended to facilitate cooperative I/O to shared files
  the original Co-Array Fortran specified a simple extension to Fortran I/O
  parallel I/O may be added in a future version of the standard

22 Fortran I/O
Fortran provides interfaces for formatted and unformatted I/O.
    open( 10, file=fn, action='write', &
          access='direct', recl=k )
    write (10, rec=3) A
Figure: file fn is connected to unit 10 and holds records 1 to 4; the write places A in record 3.
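A complete, standard-conforming version of the direct-access write above might look as follows. This is a sketch with assumed details: the file name `demo.dat`, the array size, and the read-back check are illustrative, and the record length is obtained with `inquire(iolength=...)` rather than a preset `k`.

```fortran
program direct_access_demo
  implicit none
  integer, parameter :: n = 4
  real    :: a(n), b(n)
  integer :: k, u

  a = [1.0, 2.0, 3.0, 4.0]
  inquire(iolength=k) a                 ! record length needed to hold a

  open(newunit=u, file='demo.dat', action='readwrite', &
       access='direct', form='unformatted', recl=k, status='replace')
  write(u, rec=3) a                     ! write a into record 3
  read(u, rec=3) b                      ! read the same record back
  close(u, status='delete')             ! remove the scratch file

  if (any(a /= b)) error stop 'record 3 did not round-trip'
  print '(a)', 'record 3 round-trip ok'
end program direct_access_demo
```

Direct access lets a program address any record by number without touching the ones before it, which is what makes it the natural starting point for a shared-file extension.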

23 Current limitations of I/O
Issues:
  1. no defined, legal way for multiple images to access the same file
  2. a file is a 1-dimensional sequence of records
  3. records are read/written one at a time
  4. no mechanism for collective accesses to a shared file among multiple images
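Given the first limitation, the portable option in standard Fortran is the one the RTM port uses: each image writes its own file, and the files are merged afterwards. A minimal sketch of that pattern follows; the naming scheme `out_NNNN.dat` and the `result` array are hypothetical.

```fortran
program file_per_image
  implicit none
  character(len=32) :: fname
  real    :: result(4)
  integer :: u

  result = real(this_image())
  ! hypothetical naming scheme: out_0001.dat, out_0002.dat, ...
  write(fname, '(a,i4.4,a)') 'out_', this_image(), '.dat'

  open(newunit=u, file=trim(fname), action='write', &
       form='unformatted', access='stream', status='replace')
  write(u) result
  close(u)

  sync all            ! all files are closed before any merge step starts
end program file_per_image
```

The approach is simple and legal, but it produces one file per image and pushes the assembly of the global data set into a separate post-processing step.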

24 Proposed Extension for Parallel I/O
Allow a file to be share-opened, e.g.
    OPEN( 10, file=fn, TEAM='yes', ... )
  all images form a team with shared access to the same file
  implicit synchronization
  recommended only for direct access mode
  the FLUSH statement is used to ensure that changes by one image are visible to the other images in the team
  the CLOSE statement has implicit image synchronization

25 Further extensions we're exploring
  Multi-dimensional view of records
  Read/write multiple records at a time
  Collective read/write operations on shared files
    open( 10, file=fn, action='write', &
          access='direct', ndim=2, &
          dims=(/m/), team='yes', recl=k )
Figure: file fn connected to unit 10, with its records indexed as a two-dimensional grid from (1,1) to (M,3).

26 Further extensions we're exploring
Read/write multiple records at a time:
    write (10, rec_lb=(/ 2,2 /), rec_ub=(/ 4,3 /) ) &
          A(1:4, 1:2)
Figure: file fn connected to unit 10, records indexed (1,1) to (M,3); the single write transfers A(1:4,1:2) to the block of records from (2,2) to (4,3).

27 Further extensions we're exploring
Collective read/write operations on shared files:
    type(t) :: A(2,2)[3,*]
    my_rec_lbs = get_rec_lbs( this_image() )
    my_rec_ubs = get_rec_ubs( this_image() )
    write_team( 10, rec_lb=my_rec_lbs, &
                rec_ub=my_rec_ubs ) &
                A(:,:)
Figure: file fn connected to unit 10, records indexed (1,1) to (6,4). write_team places each image's 2 x 2 block in its own region of the file: A(1:2,1:2)[1,1] and A(1:2,1:2)[1,2] in record rows 1-2, A(1:2,1:2)[2,1] and A(1:2,1:2)[2,2] in rows 3-4, A(1:2,1:2)[3,1] and A(1:2,1:2)[3,2] in rows 5-6.
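The slide does not define `get_rec_lbs` and `get_rec_ubs`. One plausible reading, assuming each image owns a contiguous 2 x 2 block of records laid out by cosubscript, is sketched below. The signatures here are hypothetical and differ from the slide: these versions take the image's cosubscripts rather than the image index, and the module name and `blk` constant are made up for illustration. Only the bounds computation is shown, since `write_team` is a proposed statement and not part of any existing compiler.

```fortran
module rec_bounds
  implicit none
  integer, parameter :: blk(2) = [2, 2]   ! block shape per image, from A(2,2)
contains
  ! hypothetical helper: lowest record indices owned by the image at co
  pure function get_rec_lbs(co) result(lb)
    integer, intent(in) :: co(2)          ! cosubscripts of the image
    integer :: lb(2)
    lb = (co - 1) * blk + 1
  end function get_rec_lbs

  ! hypothetical helper: highest record indices owned by the image at co
  pure function get_rec_ubs(co) result(ub)
    integer, intent(in) :: co(2)
    integer :: ub(2)
    ub = co * blk
  end function get_rec_ubs
end module rec_bounds

program show_bounds
  use rec_bounds
  implicit none
  integer :: i, j
  ! tabulate the bounds for the 3 x 2 image grid shown in the figure
  do i = 1, 3
    do j = 1, 2
      print '(a,i0,a,i0,a,2(i0,1x),a,2(i0,1x))', 'image [', i, ',', j, &
            ']: lb = ', get_rec_lbs([i, j]), ' ub = ', get_rec_ubs([i, j])
    end do
  end do
end program show_bounds
```

Under this mapping the image at [2,1] owns records (3,1) through (4,2) and the image at [3,2] owns (5,3) through (6,4), so the six blocks tile the 6 x 4 record grid without overlap.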

28 Leverage Global Arrays as memory buffers for I/O
Implementation in progress that uses Global Arrays (GA) as in-memory I/O buffers.
Figure: compute nodes send I/O requests to I/O nodes, which perform asynchronous disk updates.

30 In Summary
The Fortran coarray model may be used for processing large data sets.
We developed an implementation that's freely available and used it to develop an RTM application.
Fortran's I/O model doesn't support parallel I/O for large-scale, multi-dimensional array data sets, and we are working on addressing this.

31 Thanks
