A Class-Specific Optimizing Compiler


MICHAEL D. SHARP AND STEVE W. OTTO
Oregon Graduate Institute of Science and Technology, Beaverton, OR 97006-1999

ABSTRACT

Class-specific optimizations are compiler optimizations specified by the class implementor to the compiler. They allow the compiler to take advantage of the semantics of the particular class so as to produce better code. Optimizations of interest include the strength reduction of class::array address calculations, elimination of large temporaries, and the placement of asynchronous send/recv calls so as to achieve computation/communication overlap. We will outline our progress towards the implementation of a C++ compiler capable of incorporating class-specific optimizations. 1994 by John Wiley & Sons, Inc.

1 INTRODUCTION

During the implementation of complex systems in C++, particularly numerical ones, the implementor typically encounters performance problems of varying difficulty. These difficulties usually relate to the lack of semantic understanding the C++ compiler has of the user-defined classes. This problem was recently studied [1], where the potential solution of class-based optimizations was put forth. A class-based optimization makes use of semantic information normally not known to the compiler. These optimization rules are specified by the user as part of the class description, and they are dynamically linked to the compiler's standard optimizer. Although the notion of a rule-directed optimizer is not new [2], it is not widespread. The authors believe this is the first time the optimization rules have been user specified for the C++ language.

Received April 1993; Revised June 1993
Scientific Programming, Vol. 2, pp. 235-238 (1993)
CCC 1058-9244/94/040235-04

After introducing two example optimizations, this article will focus on some of the issues relating to the construction of a system implementing class-based optimizations.
The issues discussed relate mainly to optimization specification, detection of applicability, and application.

2 SAMPLE OPTIMIZATIONS

Throughout this article two optimizations will be used as examples. The first is the temporary variable elimination optimization, and the second is strength reduction combined with induction variable analysis in a general array iterator. The latter construct is an extension the authors have made to the C++ language.

2.1 Temporary Variable Elimination

In numerical computations it is often advantageous to optimize a program for the amount of memory used. One of the easiest ways to optimize a program for minimal memory usage is to eliminate large temporary variables. We will use the example of matrix calculations to demonstrate the point. The two code fragments appearing in Figure 1 show the value of this optimization.

    class Matrix {
    public:
        Matrix();
        Matrix& operator=(const Matrix&);
        Matrix& operator+=(const Matrix&);
        friend Matrix& operator+(const Matrix&, const Matrix&);
    };

    main() {
        Matrix A, B, C, D;
        // ...
        A = B + C + D;           // Fragment 1
        // ...
        A = B; A += C; A += D;   // Fragment 2
        // ...
    }

FIGURE 1 Matrix code fragments.

The first fragment requires a temporary matrix whereas the second avoids this by performing the calculation in place. Ideally the transform from the first to the second fragment would be handled by a matrix class-specific optimization.

The above optimization applies only if the "+" operator is at the root of the expression tree. The same type of optimization will also apply for other overloaded operators such as *, /, and -. This is not true in general, but if a user overloads an operator then the operation is usually one that has similar characteristics to the corresponding integer operator.

We now consider what happens in the optimization of a general expression containing several operators. The optimization rule is continually applied to the expression tree starting at the root. If the operator at the root of this tree is TVE (the ability to express the expression without temporary usage at this level) the single statement is split into two statements (the = and the +=) as was done in Fragment 2 of Figure 1. The optimization is then applied to each statement in turn. The statements continue to split into more statements as long as the root operator has the TVE property. If the root operator for a statement is not TVE then a temporary (of potentially large size) must be created.
2.2 Optimizing Abstract Array Iterators

Consider a partitioned array container class as described by Otto [3]. The partition types that are supported are block and block cyclic. In Fortran, access to array elements inside of do loops is very efficient. This is possible because Fortran does not have the pointer aliasing problems of C and C++, and the semantics of the do loop are simpler than those of for. As a result, Fortran compilers are able to perform induction variable analysis and strength reduction so that array address calculations are done efficiently. Although there are C++ compilers, g++ for example, capable of such optimizations, this is not the norm.

    $X.iterate i over [0:100:1]$ {
        X.elem(i) = a * X.elem(i) + y.elem(i);
    }

FIGURE 2 One-dimensional array iterator.

One of our goals is to provide such a strength reduction optimization on a class-by-class basis. Using this approach it is possible to avoid illegal applications and to guarantee the optimization will be applied without relying on the underlying compiler to implement it.

Consider the simple example of an iterator for a one-dimensional array in Figure 2. If X is a block partitioned array this iterator might be implemented along the lines of Figure 3, and if X is block-cyclic partitioned, the iterator might be implemented as in Figure 4. Clearly the situation becomes complex for multidimensional, block-cyclic partitioned arrays. With proper optimizations for array iterators the coding complexity of multidimensional computations can be reduced. General iterators also expose opportunities for additional optimization due to the less restrictive nature of the control structure. That is, because a precise ordering of the iteration space is not specified by the programmer, the optimizer has more flexibility in loop restructuring.

3 OPTIMIZATION SPECIFICATION

Two of the most difficult technical problems in the implementation of class-based optimizations are defining a language in which to describe general optimizations, and the implementation of the pattern matching routine that detects when to apply optimizations.
What is presented in this and in the next section are not complete answers to these difficult problems, but rather the current direction of the authors' research.

    for (i = X.start(0); i < X.end(0); i++) {
        *(X.base + i) = ...
    }

FIGURE 3 One-dimensional block partitioned array iterator.

    for (I = 0; I < X.numBlocks(0); ++I) {
        for (i = X.start(0, I); i < X.end(0, I); ++i) {
            *(X.base[I] + i) = ...
        }
    }

FIGURE 4 One-dimensional block-cyclic partitioned array iterator.

In attempting to define a language to describe general optimizations there are a number of issues to be considered. It must be possible not only to describe the syntactic pattern to match, but also to specify the semantics and dependencies of this code. Any optimization triggering heuristics must also be specifiable in this language. The syntactic patterns to be matched may not necessarily be contiguous. It is quite reasonable to expect user-defined optimizations to require the ability to skip past statements searching for some matching condition, or to require a certain set of conditions for an arbitrarily long list of commands. For example, the iteration optimizations discussed earlier require the examination of the entire loop body.

A language for the specification of optimizations called GOSPEL was presented by Whitfield and Soffa [4]. This language expresses optimizations in terms of both the general program structure to be matched as well as the data dependencies necessary for the optimization to result in semantically correct code.

    AssignmentPtr = Find(Assignment);
    while (AssignmentPtr) {
        // check the type of this assignment
        if (type(AssignmentPtr) == MatrixClass) {
            // check for the pattern B+C on the right-hand side of
            // the assignment, where B and C are any subexpression.
            ExprPtr = RightHandSide(AssignmentPtr);
            if (Operator(ExprPtr) == OP_Plus) {
                // break A=B+C into A=B; A+=C
                // since a match was found this statement should be
                // re-processed in hopes of finding another.
                // AssignmentPtr should not be changed before the
                // next loop iteration. Tree construction code omitted
                // for brevity; the newly constructed tree is pointed
                // to by APlusCTree.
                RightHandSide(AssignmentPtr) = LeftOperand(ExprPtr);
                InsertStatementAfter(AssignmentPtr, APlusCTree);
            } else {
                // didn't find a pattern match;
                // move to the next assignment statement
                AssignmentPtr = FindNext(Assignment, AssignmentPtr);
            }
        } else {
            // didn't find a pattern match because
            // the class type was wrong
            AssignmentPtr = FindNext(Assignment, AssignmentPtr);
        }
    }

FIGURE 5 Current specification form for temporary elimination.

Currently we are implementing optimizations at a much lower, mechanical level (Fig. 5). Future research includes the definition of a language similar to GOSPEL, but more closely tied to C++. An ideal form of specification would be C++ extended with inspiration from programming logics. In such a language the general syntactic form of the optimization could be specified by fragments of C++, whereas the data dependence and any heuristics could appear in embedded assertions. It would surely be the most useful representation because optimizations would then be specified more by partial code examples and a few language extensions than by another language altogether. Figure 6 shows a possible form of a loop interchange optimization.

The current design of our optimizer is quite similar to what one would use to implement an optimization in a traditional compiler. It is very dependent on the internal representation of the code and the writer of such an optimization must
    $pattern
    for ($1, $2, $3) {
        for ($4, $5, $6) {
            $A;
        }
    }
    ${
        /* check for dependence between invariants */
        /* ($1->Statement is the statement containing */
        /*  the fragment represented by $1) */
        if (Dependence($1->Statement, $4->Statement, any)) fail;
        /* check for (<,>) dependence between */
        /* two statements in the loop body */
        forallstmt($A, $7) {  /* for all individual statements $7 in $A */
            forallstmt($A, $8) {
                /* check dependence for legality of optimization */
                if (Dependence($7, $8, "(<,>)")) fail;
            }
        }
    $}
    $optimization
    for ($4, $5, $6) {
        for ($1, $2, $3) {
            $A;
        }
    }

    where:
        $A is a meta variable representing zero or more statements
        $1-$n are meta variables representing components of a statement

FIGURE 6 Loop interchange specification.

have knowledge of this representation. Although neither of the authors is satisfied with this as a final goal, it is felt to be a good intermediate step to prove the concept of class-based optimizations.

4 OPTIMIZER IMPLEMENTATION

The optimizer's implementation is greatly complicated by the fact that before an optimization can be applied, the associated pattern of unoptimized code must be located in the internal representation of the program. In the past various code generator and peephole optimizers [5-7] have done this, but either always on small contiguous patterns, or, if an attributed grammar is used, with restrictions on the use of attributes. In the case of this optimizer it must be possible to match noncontiguous patterns, as discussed in the previous section, and to add additional attributes (derived from operations on the compiler-supplied ones) based on the needs of the optimization.

Another complicating factor for the implementation of the optimizer is the determination of the optimization ordering. Usually this is determined by the compiler architect; however, because the actual optimizations are now being supplied by the class designers, it is quite conceivable that the ordering of optimizations will play a role in the efficiency of the optimized code. Ordering problems will hopefully be minimal because optimizations are triggered by class occurrences, but an ordering mechanism should still be explored. Certainly such a mechanism will depend heavily on the user's application and should be specified by the user if the default ordering is not acceptable.

5 CURRENT STATUS

At the time of writing, the authors have a working C++-to-C++ optimizer that was custom built for this project. This is felt to be of great worth due to the avoided additional complexity of layering such an optimizer on top of a public domain compiler that was not designed with such capabilities in mind.
The optimizer was implemented using a tool, developed by one of the authors, to describe complex attribute relationships and structures. The tool allowed a relatively quick implementation of a very memory-efficient internal representation of C++. Because all of the attributes of this representation are managed and mapped by this tool, there is great flexibility in our optimizer when it comes to adding to the internal program representation.

Work is currently under way to define the optimization specification language, as well as to implement the pattern matcher. As was stated earlier, the current approach is very mechanical and strongly dependent on the internal representation of the program being optimized. It is the authors' goal to evolve this into a much higher form in the hopes of hiding many of the details of the compiler implementation.

REFERENCES

[1] I. G. Angus, "Applications demand class-specific optimizations: The C++ compiler can do more," 1993 Object-Oriented Numerics Conference. Corvallis, OR: Rogue Wave Software, 1993, pp. 25-27.
[2] J. W. Davidson and D. B. Whalley, "Quick compilers using peephole optimization," Software-Practice and Experience, vol. 19, pp. 79-97, 1989.
[3] S. W. Otto, 1993 Object-Oriented Numerics Conference. Corvallis, OR: Rogue Wave Software, 1993.
[4] D. Whitfield and M. L. Soffa, Proceedings of the 1991 SIGPLAN Conference on Programming Language Design and Implementation. New York: ACM Press, pp. 120-129.
[5] A. V. Aho, M. Ganapathi, and S. W. K. Tjiang, "Code generation using tree matching and dynamic programming," ACM Transactions on Programming Languages and Systems, vol. 11, pp. 491-516, 1989.
[6] M. Ganapathi and C. N. Fischer, "Affix grammar driven code generation," ACM Transactions on Programming Languages and Systems, vol. 7, pp. 560-599, 1985.
[7] R. S. Glanville and S. L. Graham, Proceedings of the Fifth Annual ACM Symposium on Principles of Programming Languages. New York: ACM Press, pp. 231-240.
