Compiler Technology for Problem Solving on Computational Grids


Compiler Technology for Problem Solving on Computational Grids
An Overview of Programming Support Research in the GrADS Project
Ken Kennedy, Rice University
http://www.cs.rice.edu/~ken/presentations/gridcompilers.pdf

[Figure: National Distributed Computing, showing supercomputers and databases]

What Is a Grid?
- Collection of computing resources
  - Varying in power or architecture
  - Potentially dynamically varying in load
  - Unreliable?
  - No hardware shared memory
- Interconnected by network
  - Links may vary in bandwidth
  - Load may vary dynamically
- Distribution
  - Across room, campus, state, nation, globe
- Inclusiveness
  - Distributed-memory parallel computer is a degenerate case

A Software Grand Challenge
- Application Development and Performance Management for Grids
- Problems:
  - Reliable performance on heterogeneous platforms
  - Varying load
    - On computation nodes and on communications links
- Challenges:
  - Presenting a high-level programming interface
    - If programming is hard, it's useless
  - Designing applications for adaptability
  - Mapping applications to dynamically changing architectures
  - Determining when to interrupt execution and remap
    - Application monitors
    - Performance estimators

Globus
- Developed by Ian Foster and Carl Kesselman
  - Originally to support the I-Way (SC-96)
- Basic Services for distributed computing
  - Accounting
  - Resource directory
  - User authentication
  - Job initiation
  - Communication services (Nexus and MPI)
- Applications are programmed by hand
  - User responsible for resource mapping and all communication
  - Many applications, most developed with Globus team
  - Even Globus developers acknowledge how hard this is

What Is Needed
- Compiler and language support for reliable performance
  - dynamic reconfiguration, optimization for distributed targets
- Development of abstract Grid programming models
  - design of an implementation strategy for those models
- Development of easy-to-use programming interfaces
  - problem-solving environments
- Robust reliable numerical and data-structure libraries
  - predictability and robustness of accuracy and performance
  - reproducibility, fault tolerance, and auditability
- Performance monitoring and control strategies
  - deep integration across compilers, tools, and runtime systems
  - performance contracts and dynamic reconfiguration

Programming Models
- Distributed Collection of Objects (for serious experts)
  - message passing for communication
- Problem-Solving Environment (for non-experts)
  - packaged components
  - graphical or scripting language for glue
- Distribution of Shared-Memory Programs (for experts)
  - language-based decomposition specification from programmer
  - parametrizable for reconfiguration
  - example: reconfigurable distributed arrays (DAGH)
  - implemented as distributed object collection
  - implicit or explicit communications

Grid Compilation Architecture
Goal: reliable performance under varying load
[Figure: architecture diagram. Components: Source Application, Software Components, Libraries, Whole-Program Compiler, Configurable Object Program, Service Negotiator, Scheduler, Grid Runtime System, Dynamic Optimizer, Real-time Performance Monitor. Labeled flows: Negotiation, Performance Feedback, Performance Problem.]
GrADS Project (NSF NGS): Berman, Chien, Cooper, Dongarra, Foster, Gannon, Johnsson, Kennedy, Kesselman, Reed, Torczon, Wolski

Grid Compilation Architecture
[Figure: the Grid Compilation Architecture diagram repeated, with the Execution Environment labeled.]

Performance Contracts
- At the Heart of the GrADS Model
  - Fundamental mechanism for managing mapping and execution
- What are they?
  - Mappings from resources to performance
  - Mechanisms for determining when to interrupt and reschedule
- Abstract Definition
  - Random variable: ρ(a, i, c, t_0) with a probability distribution
    - a = application, i = input, c = configuration, t_0 = time of initiation
  - Important statistics: lower and upper bounds (95% confidence)
- Issue: Is ρ a derivative at t_0? (Wolski)
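
The idea of a contract as lower and upper bounds on a random performance variable can be illustrated with a small sketch. This is not the GrADS implementation: the class name `PerformanceContract`, the helper `contract_from_samples`, and the use of a normal approximation (mean plus or minus 1.96 standard deviations) to get roughly 95% bounds are all assumptions made for illustration.

```python
import statistics
from dataclasses import dataclass


@dataclass
class PerformanceContract:
    """Hypothetical contract: an interval the observed performance should stay in."""
    lower: float
    upper: float

    def violated(self, observed: float) -> bool:
        # A violation is the signal that might trigger interruption and rescheduling.
        return not (self.lower <= observed <= self.upper)


def contract_from_samples(samples, z=1.96):
    """Build approximate 95% bounds from past measurements.

    Assumes the measurements are roughly normally distributed; z=1.96 is the
    usual two-sided 95% quantile of the standard normal distribution.
    """
    mean = statistics.mean(samples)
    spread = z * statistics.stdev(samples)
    return PerformanceContract(lower=mean - spread, upper=mean + spread)


# Example: iteration times (seconds) from earlier runs on one configuration.
history = [10.0, 11.0, 9.0, 10.0, 10.0]
contract = contract_from_samples(history)
needs_remap = contract.violated(14.0)
```

A monitor would call `violated` on each new measurement; what happens after a violation (renegotiation, remapping) is outside this sketch.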

Grid Compilation Architecture
[Figure: the Grid Compilation Architecture diagram repeated, with the Program Preparation System and the Execution Environment labeled.]

Configurable Object Program
- Representation of the Application
  - Supporting dynamic reconfiguration and optimization for distributed targets
- Includes
  - Program intermediate code
  - Annotations from the compiler
  - Reconfiguration strategy
  - Historical information (run profile to now)
- Reconfiguration strategies
  - Aggregation of data regions (submeshes)
  - Aggregation of tasks
  - Definition of parameters
    - Used for algorithm selection

Testbeds
- MicroGrid (Andrew Chien)
  - Cluster of Intel PCs
  - Runs standard Grid software (Globus, Nexus)
  - Permits simulation of varying loads
    - Network and processor
  - Extensive performance modeling
- MacroGrid (Carl Kesselman)
  - Collection of processors running Globus
    - At all 8 GrADS sites
  - Permits experimentation with real applications
    - Cactus (Ed Seidel)

Research Strategy
- Begin Modestly
  - application experience to identify opportunities
  - prototype reconfiguration system
    - with performance monitoring, without dynamic optimization
  - prototype reconfigurable library
- Move from Simple to Complex Systems
  - begin with heterogeneous clusters
  - refinements of reconfiguration system and performance contract mechanism
  - use an artificial testbed to test performance under varying conditions
- Experiment with Real Applications
  - someone cares about the answers

Part II: Support for High-Level Domain-Specific Programming
Telescoping Languages: Generating Problem-Solving Systems from Annotated Libraries

Programming Productivity
- Challenges
  - programming is hard
  - professional programmers are in short supply
  - high performance will continue to be important
- One Strategy: Make the End User a Programmer
  - professional programmers develop components
  - users integrate components using:
    - problem-solving environments (PSEs)
    - scripting languages (possibly graphical)
    - examples: Visual Basic, Tcl/Tk, AVS, Khoros
- Compilation for High Performance
  - translate scripts and components to common intermediate language
  - optimize the resulting program using interprocedural methods

Whole-Program Optimization
- Scientific Programming in Java
  - Goal: make it possible to use the full object-oriented power for scientific applications
  - Many scientific implementations mimic Fortran style
- OwlPack Benchmark Suite
  - Three versions of LINPACK in Java
    - Fortran style
    - Lite object-oriented style
    - Full polymorphism
      - No differences for type
- Experiment
  - Compare running times for different styles on same Java VM
  - Evaluate potential for compiler optimization

Performance Results
[Figure: bar chart of run time in seconds (axis from 0 to 35) for the routines dgefa, dgesl, and dgedi. Results using JDK 1.2 JIT on Sun Ultra 5. Series: Fortran Style, Lite OO Style, OO Style, Optimized OO, Native F90.]

Script-Based Programming
[Figure: compilation pipeline diagram. Components: Script, Component Library, User Library, Translator, Intermediate Code, Global Optimizer, Code Generator.]
Problem: long compilation times, even for short scripts!

Telescoping Languages
[Figure: two paths. Compiler generation: L1 Class Library, Compiler Generator (could run for hours), L1 Compiler. Script compilation: Script, Script Translator, L1 Compiler (understands library calls as primitives), Vendor Compiler, Optimized Application.]

Telescoping Languages: Advantages
- Compile times can be reasonable
  - More compilation time can be spent on libraries
    - Amortized over many uses
  - Script compilations can be fast
    - Components reused from scripts may be included in libraries
- High-level optimizations can be included
  - Based on specifications of the library designer
    - Properties often cannot be determined by compilers
    - Properties may be hidden after low-level code generation
- User retains substantive control over language performance
  - Mature code can be built into a library and incorporated into language

Applications
- Matlab Compiler
  - Automatically generated from LAPACK or ScaLAPACK
    - With help via annotations from the designer
- Flexible Data Distributions
  - Failing of HPF: inflexible distributions
  - Data distribution == collection of interfaces that meet specs
  - Compiler applies standard transformations
- Automatic Generation of POOMA
  - Data structure library implemented via template expansion in C++
  - Long compile times, missed optimizations
- Generator for Grid Computations
  - GrADS: automatic generation of NetSolve

Requirements of Script Compilation
- Scripts must generate efficient programs
  - Comparable to those generated from standard interprocedural methods
  - Avoid need to recode in standard language
- Script compile times should be proportional to length of script
  - Not a function of the complexity of the library
  - Principle of least astonishment

Telescoping Languages
[Figure repeated, script compilation path only: Script, Script Translator, L1 Compiler (understands library calls as primitives), Vendor Compiler, Optimized Application.]

Script Compilation Algorithm
- Propagate variable property information throughout the program
  - Use jump functions to propagate through calls to library
- Apply high-level transformations
  - Driven by information about properties
  - Ensure that process applies to expanded code
- Perform low-level code specialization
  - At each call site, determine the best estimate of the parameter properties that is reflected by a specialized fragment in the code database
    - Use a method similar to unification
  - Substitute fragment from database for call
    - This could contain a call to a lower-level library routine.
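
The low-level specialization step can be pictured as a lookup in a database of specialized fragments. The sketch below is only an illustration under stated assumptions: the routine name `solve`, the property names, the fragment names, and the rule "pick the matching entry with the most constrained pattern" are all hypothetical, and the matching is simple one-way pattern matching rather than full unification.

```python
# Hypothetical code database: for each library routine, a list of
# (required parameter properties, name of specialized fragment).
CODE_DB = {
    "solve": [
        ({"shape": "triangular", "rank": 2}, "solve_triangular_2d"),
        ({"shape": "triangular"}, "solve_triangular"),
        ({}, "solve_general"),  # fallback that assumes nothing
    ],
}


def matches(pattern, known):
    """A pattern matches when every property it requires is known to hold."""
    return all(known.get(name) == value for name, value in pattern.items())


def select_fragment(routine, known):
    """Return the most specific fragment consistent with the known properties."""
    candidates = [(p, f) for p, f in CODE_DB[routine] if matches(p, known)]
    # More required properties means a more specialized fragment.
    _pattern, fragment = max(candidates, key=lambda entry: len(entry[0]))
    return fragment


# At a call site where propagation established only the shape:
chosen = select_fragment("solve", {"shape": "triangular"})
```

Because every routine has an empty-pattern fallback, a call site with no known properties still resolves to the general routine.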

Telescoping Languages
[Figure repeated, compiler generation path only: L1 Class Library, Compiler Generator (could run for hours), L1 Compiler.]

Library Analysis and Preparation
- Discovery of Critical Properties
  - Which properties of parameters affect optimization
  - Examples: value, type, rank and size of matrix
- Construction of jump functions for the library calls
  - With respect to critical properties

Discovery of Critical Properties
- From specifications by the library designer
  - If the matrix is triangular, then
- From examining the code itself
  - Look at a promising optimization point
  - Determine conditions under which we can make significant optimizations
  - See if any of these conditions can be mapped back to parameter properties
- From sample calling programs provided by the designer
  - call average(shift(a,-1), shift(a,+1))
  - Can save on memory accesses

Propagation of Properties
- Jump Functions (Transfer Functions)
  - Tell the effect on properties of a call to a library routine
    - Whether it preserves the property or changes it
  - Computed during library preparation (compiler generation) phase
    - Can use lots of time
  - Can be designed to be fast
    - Tradeoff between accuracy and performance
- Advantage of Jump Functions
  - Avoid necessity for analysis of library source
  - Prevent blowup of compilation times
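
A jump function can be thought of as a small, precomputed summary that maps the properties of a call's arguments to the properties of its result, so the script compiler never has to look at library source. The sketch below assumes hypothetical routines `transpose` and `add`, a property record with `shape` and `rank`, and a script represented as a list of `(target, routine, argument names)` triples; none of these come from the slides.

```python
def jf_transpose(args):
    """Transposing flips upper and lower triangular shape and keeps the rank."""
    (a,) = args
    flipped = {"upper_triangular": "lower_triangular",
               "lower_triangular": "upper_triangular"}
    shape = a.get("shape")
    return {"shape": flipped.get(shape, shape), "rank": a.get("rank")}


def jf_add(args):
    """Adding preserves a shape only when both operands share it."""
    a, b = args
    shape = a.get("shape") if a.get("shape") == b.get("shape") else "unknown"
    return {"shape": shape, "rank": a.get("rank")}


# Hypothetical table produced once, during library preparation.
JUMP_FUNCTIONS = {"transpose": jf_transpose, "add": jf_add}


def propagate(script, initial_properties):
    """Push property records through a straight-line script of library calls."""
    env = dict(initial_properties)
    for target, routine, arg_names in script:
        args = [env.get(name, {}) for name in arg_names]
        env[target] = JUMP_FUNCTIONS[routine](args)
    return env


script = [
    ("B", "transpose", ["A"]),
    ("C", "add", ["A", "B"]),
    ("D", "add", ["B", "B"]),
]
props = propagate(script, {"A": {"shape": "upper_triangular", "rank": 2}})
```

Each step costs one table lookup and one small function call, which is what keeps script compile time proportional to script length in this toy model.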

Library Analysis and Preparation (continued)
- Analysis of Transformation Specifications
  - Construction of a specification-driven translator for use in compiling scripts

High-level Identities
- Often library developer knows high-level identities
  - Difficult for the compiler to discern
  - Optimization should be performed on sequences of calls rather than code remaining after expansion
- Example: Push and Pop
  - Designer: Push(x) followed by y = Pop() becomes y = x
  - Ignore possibility of overflow in Push
- Example: Trigonometric Functions
  - Sin and Cos used in same loop
    - both computed using expensive calls to the trig library
  - Recognize that cos^2(x) = 1 - sin^2(x), replace and simplify.
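
The Push/Pop identity can be applied as a rewrite over the sequence of library calls, before any expansion into low-level code. The sketch below is a minimal illustration, assuming a hypothetical tuple encoding of calls (`("push", x)`, `("pop", y)`, and the rewritten form `("assign", y, x)`) and handling only a Pop that immediately follows a Push.

```python
def apply_push_pop_identity(calls):
    """Rewrite an adjacent Push(x); y = Pop() pair into the assignment y = x.

    As the designer's identity states, the possibility of overflow in Push
    is ignored.
    """
    rewritten = []
    for call in calls:
        is_pop = call[0] == "pop"
        follows_push = bool(rewritten) and rewritten[-1][0] == "push"
        if is_pop and follows_push:
            _, pushed_value = rewritten.pop()  # drop the Push
            rewritten.append(("assign", call[1], pushed_value))
        else:
            rewritten.append(call)
    return rewritten


before = [("push", "x"), ("pop", "y"), ("push", "z")]
after = apply_push_pop_identity(before)
```

A compiler working on the expanded stack code would have to rediscover this from array stores and index updates, which is why the slide argues for optimizing at the level of call sequences.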

Contextual Expansions
- Out of Core Arrays
  - Operations Get(I,J) and GetRow(I,Lo,N)
- Get in a loop
    Do I
      Do J
        Get(I,J)
      Enddo
    Enddo
- When can we vectorize?
  - Turn into GetRow
  - Answer: if Get is not involved in a recurrence.
  - How can we know?
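
To make the Get versus GetRow trade-off concrete, here is a small sketch with an in-memory stand-in for an out-of-core array. The class `OutOfCoreArray`, its access counter, and the two summing functions are hypothetical; indices are 0-based here, unlike the Fortran-style loop on the slide. In this example no Get depends on the result of an earlier Get, which is the condition the slide gives for the rewrite.

```python
class OutOfCoreArray:
    """In-memory stand-in that counts how many library calls are made."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]
        self.accesses = 0

    def get(self, i, j):
        self.accesses += 1
        return self._rows[i][j]

    def get_row(self, i, lo, n):
        self.accesses += 1
        return self._rows[i][lo:lo + n]


def total_elementwise(arr, n_rows, n_cols):
    """Original form: one Get call per element."""
    total = 0
    for i in range(n_rows):
        for j in range(n_cols):
            total += arr.get(i, j)
    return total


def total_by_rows(arr, n_rows, n_cols):
    """Rewritten form: the inner loop is replaced by a single GetRow call."""
    total = 0
    for i in range(n_rows):
        total += sum(arr.get_row(i, 0, n_cols))
    return total


data = [[1, 2, 3], [4, 5, 6]]
slow, fast = OutOfCoreArray(data), OutOfCoreArray(data)
same_result = total_elementwise(slow, 2, 3) == total_by_rows(fast, 2, 3)
```

For an array that really lives on disk, the drop from one call per element to one call per row is where the saving would come from.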

Library Analysis and Preparation (continued)
- Code specialization for different sets of parameter properties
  - For each set, assume and optimize to produce specialized code

Code Selection Example
Library compiler develops inlining tables

    subroutine VMP(C, A, B, m, n, s)
      integer m, n, s
      real A(n), B(n), C(m)
      i = 1
      do j = 1, n
        C(i) = C(i) + A(j)*B(j)
        i = i + s
      enddo
    end VMP

Inlining Table

    case on s:
      ==0:     C(1) = sum(a(1:n)*b(1:n))
      !=0:     C(1:n) = C(1:n) + A(1:n)*B(1:n)    (vector)
      default: call VMP(C,A,B,m,n,s)
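
The selection logic can be checked with a small sketch that derives each specialized form directly from the VMP loop. The function names and the property labels `"zero"`, `"nonzero"`, and `"unknown"` are hypothetical, indexing is 0-based, and the stride is assumed to be non-negative. Because the forms are derived from the loop, the zero-stride case accumulates into the existing value of `C[0]` and the nonzero case steps through `C` with stride `s`.

```python
def vmp_reference(C, A, B, s):
    """Direct transcription of the VMP loop (0-based indexing)."""
    C = list(C)
    i = 0
    for j in range(len(A)):
        C[i] += A[j] * B[j]
        i += s
    return C


def vmp_stride_zero(C, A, B):
    """Specialization for s == 0: the loop is a sum reduction into one element."""
    C = list(C)
    C[0] += sum(a * b for a, b in zip(A, B))
    return C


def vmp_stride_nonzero(C, A, B, s):
    """Specialization for s != 0: iterations are independent, so vectorizable."""
    C = list(C)
    for k, (a, b) in enumerate(zip(A, B)):
        C[k * s] += a * b
    return C


def specialize_vmp(s_property):
    """Pick a fragment from what is known about s at compile time."""
    if s_property == "zero":
        return lambda C, A, B, s: vmp_stride_zero(C, A, B)
    if s_property == "nonzero":
        return vmp_stride_nonzero
    return vmp_reference  # default: nothing known, call the general routine


A, B, C = [1, 2, 3], [4, 5, 6], [1, 1, 1, 1, 1, 1]
reduced = specialize_vmp("zero")(C, A, B, 0)
strided = specialize_vmp("nonzero")(C, A, B, 2)
```

With integer data the specialized results match the reference loop exactly; with floating-point data the reduction may differ in rounding because the additions happen in a different order.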

Flexible Compiler Architecture
- Flexible Definition of Computation
  - Parameters
    - program scheme
    - base library sequence (l1, l2, ..., lp)
    - subprogram source files (s1, s2, ..., sn)
    - run history (r1, r2, ..., rk)
    - data sets (d1, d2, ..., dm)
    - target configuration
- Compilation = Partial Evaluation
  - several compilation steps as information becomes available
- Program Management
  - When to back out of previous compilation decisions due to change
  - When to invalidate certain inputs
    - Examples: change in library or run history
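
One way to picture "several compilation steps as information becomes available", together with backing out when an input changes, is the toy model below. It is a sketch under stated assumptions, not the GrADS design: the class `StagedCompiler`, the stage names, and the rule that stages run in a fixed order and are redone from the first stage that depends on a changed input are all hypothetical.

```python
class StagedCompiler:
    """Toy model: run each stage once its inputs exist; redo stages on change."""

    def __init__(self, stages):
        self.stages = stages    # ordered list of (stage_name, required_input_names)
        self.inputs = {}
        self.completed = 0      # number of leading stages currently valid
        self.log = []           # every stage execution, in order

    def provide(self, name, value):
        changed = name in self.inputs and self.inputs[name] != value
        if changed:
            # Back out to the first stage that consumed the changed input.
            for index, (_stage, needs) in enumerate(self.stages):
                if name in needs:
                    self.completed = min(self.completed, index)
                    break
        self.inputs[name] = value
        self._advance()

    def _advance(self):
        while self.completed < len(self.stages):
            stage, needs = self.stages[self.completed]
            if not all(n in self.inputs for n in needs):
                break  # wait until more information becomes available
            self.log.append(stage)
            self.completed += 1


compiler = StagedCompiler([
    ("library_prep", {"library"}),
    ("script_compile", {"library", "script"}),
    ("tune", {"run_history"}),
])
compiler.provide("library", "v1")
compiler.provide("script", "s1")
compiler.provide("run_history", "r1")
compiler.provide("run_history", "r2")   # only the tuning stage is redone
compiler.provide("library", "v2")       # everything is redone
```

A new run history invalidates only the last stage, while a changed library invalidates every decision built on it, which mirrors the two examples named on the slide.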

Summary
- The scalable infrastructure should be a scalable problem-solver
  - Access to information is not enough
  - Linked computation is not enough
- Infrastructure is complex
  - Seamless integration of access to remote resources in response to need
  - Execution of applications must be adaptive
    - In response to changing loads and resources
- Programming support is hard
  - Must manage execution to reliable completion
  - Must prepare a program for execution
  - Ideally, should support high-level domain-specific programming
    - Telescoping languages
http://www.cs.rice.edu/~ken/presentations/gridcompilers.pdf