Compilers for High Performance Computer Systems: Do They Have a Future? Ken Kennedy Rice University

Size: px

Start display at page:

Download "Compilers for High Performance Computer Systems: Do They Have a Future? Ken Kennedy Rice University"

Brianne McDonald
5 years ago
Views:

1 Compilers for High Performance Computer Systems: Do They Have a Future? Ken Kennedy Rice University

2 Collaborators Raj Bandypadhyay Zoran Budimlic Arun Chauhan Daniel Chavarria-Miranda Keith Cooper Jack Dongarra Rob Fowler John Garvin Lennart Johnsson Chuck Koelbel Cheryl McCosh John Mellor-Crummey Dan Sorensen Linda Torczon

3 A Bit of History In the beginning, there was machine language (or assembly language) 1957: Fortran (John Backus,et.al., IBM) made it possible for every scientist to develop (high-performance) applications Today: programming high end machines is once again the near-exclusive domain of experts.

4 Making Languages Usable It was our belief that if FORTRAN, during its first months, were to translate any reasonable scientific source program into an object program only half as fast as its hand-coded counterpart, then acceptance of our system would be in serious danger... I believe that had we failed to produce efficient programs, the widespread use of languages like FORTRAN would have been seriously delayed. John Backus

5 Compilers for HPC They have not delivered needed increases in productivity over the past decade Why? Platform complexity Application complexity Lack of investment (and patience!)

6 HPF

7 HPF Goals Programming Support for Scalable Parallel Systems Focus on data parallelism Accessible, Machine-Independent Programming Model shared memory single thread of control implicit generation of communication Performance Comparable to Hand-Coded MPI

8 HPF Strategy Fortran 90 Sequential Machine Data Distribution Directives HPF Free CM-2 IBM SP-2 HP/Convex SPP2000

9 HPF Details Fortran 90 Plus Distribution Directives!HPF$ TEMPLATE D(256,256)!HPF$ ALIGN A(I,J) WITH D(I+1,J)!HPF$ DISTRIBUTE D(BLOCK,CYCLIC) Explicit and Implicit Parallelism owner computes extended looping and aggregate assignment HPF Library scatter, gather, reduction, segmented scan Extrinsic Interface

10 HPF Problems Compilers slow to mature poor performance Needed features are missing support for irregular distributions task parallelism Library support lacking no CMSSL equivalent Better compiler strategies are now available Complex relationship between program and performance Developers could not write complex applications with high performance

11 dhpf Performance Efficiency SP class 'C' 1.2 Parallel Efficiency MPI dhpf # procs.

12 IMPACT-3D HPF application: Simulate 3D Rayleigh-Taylor instabilities in plasma using TVD (1334 lines, 45 HPF directives) Problem size: 1024 x 1024 x 2048 Compiled with HPF/ES compiler 7.3 TFLOPS on 2048 ES processors ~ 45% peak Compiled with dhpf on PSC s Lemieux (Alpha+Quadrics) # procs relative speedup GFLOPS % peak

13 Rethinking HPF Language Complexity Adopt the OpenMP directives for SMP parallelism Performance Issues Embrace the HPF/JA extensions Open-source HPF Library Optimize the extrinsic interface Usability Make it possible to extend the notion of distribution

14 Encapsulated Distributions HPF s Fundamental Idea: Separate distribution from data structure Problems: Built-in distributions are not enough for some problems Expert user wants more control over distribution and performance Solution: Make it possible to add new distributions DISTRIBUTE A(Hilbert2D) where Hilbert2D is a distribution library

15 What is a Distribution? Mapping from arrays to storage Must provide a minimum set of methods Get(A,I,J), Put(A,I,J) Get(A,iteratorIJ), Put(A,iteratorIJ) where iteratorij = (1:N,J) or (1:N,1:M:2) or ((I:I), I = 1:N) Owner(A,I,J), Owners(A,iteratorIJ) Reflect (fill overlap regions) Global operators: shift, global sum Rebalance, Redistribute(Distlib2) Must do what compilers need to achieve performance

16 Advantages New distributions can be added as needed Current distributions are special cases Simplifies view of interesting new technologies Out-of-core data distribution Expert users retain more control over performance Manage own distributions Provide communication primitives (shift, global sum) as needed Can include and manage ghost regions Can design adaptivity strategy

17 Challenge: Performance Current compilers get mileage from knowing the details of the distribution Rice dhpf uses integer set framework to reason about regions requiring communication What do we do if the distribution is encapsulated in a collection of methods? Owner(A(I,J)) is a case in point Solution Strategy: Extensive preliminary processing of distribution library

18 Telescoping Languages

19 The Programming Problem: A Strategy Make it possible for end users to become application developers: users integrate software components using scripting languages (e.g., Matlab, Python, Visual Basic, S+) professional programmers develop software components

20 The Programming Problem: An Obstacle Achieving High Performance: translate scripts and components to common intermediate language optimize the resulting program using whole-program compilation

21 Whole-Program Compilation Component Library Script Translator Global Optimizing Compiler Code Generator Problem: long compilation times, even for short scripts! Problem: expert knowledge on specialization lost

22 Telescoping Languages L1 Component Library Compiler Generator Could run for many hours Script Translator L1 Compiler understands library calls as primitives Code Generator Optimized Application

23 Telescoping Languages: Advantages Compile times can be reasonable High-level optimizations can be included User retains substantive control over language performance Generate a new language with userproduced libraries Reliability can be improved

24 Applications

25 Applications Matlab SP (Signal Processing) Optimizations applied to functions Statistical Calculations in Science and Medicine in S (R or S-PLUS) Hundred-fold performance improvements Library Generation and Maintenance Component Integration Systems Parallelism in Matlab

26 Library Generator (LibGen) Prof Dan Sorensen (Rice CAAM) maintains ARPACK, a large-scale eigenvalue solver He prototypes the algorithms in Matlab, then generates 8 variants in Fortran by hand: ({Real, Complex} x {Single, Double} x {Symmetric, Nonsymmetric} Special handling for {Dense, Sparse} Could this hand generation step be avoided?

27 LibGen Performance MATLAB 6.1 MATLAB 6.5 ARPACK LibGen ,000,000x1,000,000

28 LibGen Scaling MATLAB 6.1 MATLAB 6.5 ARPACK LibGen x x x10000

29 Key Technology: Type Inference Translation of Matlab to C or Fortran requires type inference At each point in a function, translator must know types as functions of input parameter types ( type jump functions ) Constraint-based type inference algorithm Polynomial time Can be used to produce different specialized variants based on input types

30 Value of Specialization sparse-symmetric-real dense-symmetric-complex 25 dense-symmetric-real dense-nonsymmetric-complex sec

31 Parallelism in Matlab Add distributions to Matlab arrays Distributions can cross multiple processors A(1:100) A(101:200) A(201:300) A(301:400) Use distributions to determine parallelism Hide parallelism in library component operations Specialize for specific distributions

32 Parallel Matlab D = distributed(n,m,block,block); A = B + C.* D Where is each operation performed? A = D(1:n-2,1:m) + D(3:n,1:m) Some data must be communicated!

33 Parallel Matlab Implementation Matlab Interpreter Invokes Computation and Data Movement Routines on Cluster A(1:100) A(201:300) A(101) A(100) A(201) A(200) A(301) A(300) A(101:200) A(301:400) A(2:399) = (A(1:398) + A(3:400))/2

34 Parallel Matlab Implementation Produce parallel versions of functional components for each supported distribution Produce data movement components for each distribution pair This sounds like a lot of work!

35 Specialization: HPF Legacy Library written with template distribution Useful distributions specified by library designer (standard + user-defined) Examples: multipartitioning, generalized block, adaptive mesh Library specialized using HPF compilation technology to exploit parallelism Performed at language generation time

36 Implementation

37 Parallel Matlab No expensive HPF compile

38 Summary Do compilers for HPC have a future? Yes, but they will focus on productivity of end users as application developers Challenge: achieve high productivity without sacrificing application performance One strategy: Telescoping Languages

39 The End

Generation of High Performance Domain- Specific Languages from Component Libraries. Ken Kennedy Rice University

Generation of High Performance Domain- Specific Languages from Component Libraries Ken Kennedy Rice University Collaborators Raj Bandypadhyay Zoran Budimlic Arun Chauhan Daniel Chavarria-Miranda Keith