TR ON OPERATOR STRENGTH REDUCTION. John H. Crawford Mehd i Jazayer i

Similar documents
Machine-Independent Optimizations

CSE P 501 Compilers. Loops Hal Perkins Spring UW CSE P 501 Spring 2018 U-1

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

Compiler Construction 2010/2011 Loop Optimizations

Compiler Construction 2016/2017 Loop Optimizations

Induction Variable Identification (cont)

A Linear-Time Heuristic for Improving Network Partitions

Compiler Design. Fall Control-Flow Analysis. Prof. Pedro C. Diniz

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

7. Optimization! Prof. O. Nierstrasz! Lecture notes by Marcus Denker!

Introduction to Optimization Local Value Numbering

Loops. Lather, Rinse, Repeat. CS4410: Spring 2013

Compiler Optimization and Code Generation

Lecture Notes on Loop Optimizations

Control flow graphs and loop optimizations. Thursday, October 24, 13

Languages and Compiler Design II IR Code Optimization

Goals of Program Optimization (1 of 2)

Group B Assignment 8. Title of Assignment: Problem Definition: Code optimization using DAG Perquisite: Lex, Yacc, Compiler Construction

Tour of common optimizations

Loop Invariant Code Motion. Background: ud- and du-chains. Upward Exposed Uses. Identifying Loop Invariant Code. Last Time Control flow analysis

Database Optimization

Lecture 3 Local Optimizations, Intro to SSA

Programming Language Processor Theory

An Improved Measurement Placement Algorithm for Network Observability

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

COMS W4115 Programming Languages and Translators Lecture 21: Code Optimization April 15, 2013

Why Global Dataflow Analysis?

Topic 9: Control Flow

Calvin Lin The University of Texas at Austin

COMPILER DESIGN - CODE OPTIMIZATION

CSC D70: Compiler Optimization LICM: Loop Invariant Code Motion

Lecture 8: Induction Variable Optimizations

Iterative Languages. Scoping

CS 403 Compiler Construction Lecture 10 Code Optimization [Based on Chapter 8.5, 9.1 of Aho2]

Register Allocation via Hierarchical Graph Coloring

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

Simone Campanoni Loop transformations

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

A main goal is to achieve a better performance. Code Optimization. Chapter 9

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space

Code Merge. Flow Analysis. bookkeeping

What Do Compilers Do? How Can the Compiler Improve Performance? What Do We Mean By Optimization?

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Optimization Prof. James L. Frankel Harvard University

Optimizations. Optimization Safety. Optimization Safety CS412/CS413. Introduction to Compilers Tim Teitelbaum

Language Extensions for Vector loop level parallelism

Fitting Fragility Functions to Structural Analysis Data Using Maximum Likelihood Estimation

Assignment statement optimizations

A Project Network Protocol Based on Reliability Theory

8 Optimisation. 8.2 Machine-Independent Optimisation

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #17. Loops: Break Statement

Lesson 7: Code optimisation. Compilers. Code optimisation. Code optimisation. Kinds of optimisations. Introductory concepts

Predictive Interpolation for Registration

Intermediate Representation (IR)

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE

CS202 Compiler Construction

Lecture notes on Transportation and Assignment Problem (BBE (H) QTM paper of Delhi University)

UNIT-V. Symbol Table & Run-Time Environments Symbol Table

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Automatic Translation of Fortran Programs to Vector Form. Randy Allen and Ken Kennedy

Lecture Notes on Loop-Invariant Code Motion

Control flow and loop detection. TDT4205 Lecture 29

Fast Fuzzy Clustering of Infrared Images. 2. brfcm

ECE 375: Computer Organization and Assembly Language Programming

LESSON 6 FLOW OF CONTROL

Calvin Lin The University of Texas at Austin

Lecture 7. Instruction Scheduling. I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code)

ISA[k] Trees: a Class of Binary Search Trees with Minimal or Near Minimal Internal Path Length

OPTIMIZING PRODUCTION WORK FLOW USING OPEMCSS. John R. Clymer

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

Lecture 18 List Scheduling & Global Scheduling Carnegie Mellon

A Comparative study on Algorithms for Shortest-Route Problem and Some Extensions

Lecture Notes on Liveness Analysis

Lambda Calculus. Gunnar Gotshalks LC-1

Bandwidth Efficient Distant Vector Routing for Ad Hoc Networks

Incremental Tree Height Reduction For High Level Synthesis *

Compiler Optimization Intermediate Representation

Using Self-Protecting Storage to Lower Backup TCO

REDUCTION IN RUN TIME USING TRAP ANALYSIS

CSE P 501 Compilers. SSA Hal Perkins Spring UW CSE P 501 Spring 2018 V-1

Reading 1 : Introduction

A CELLULAR, LANGUAGE DIRECTED COMPUTER ARCHITECTURE. (Extended Abstract) Gyula A. Mag6. University of North Carolina at Chapel Hill

CSC D70: Compiler Optimization

CS 170 Spring 2000 Solutions and grading standards for exam 1 Clancy

CPSC 510 (Fall 2007, Winter 2008) Compiler Construction II

Rate Distortion Optimization in Video Compression

Code Optimization Using Graph Mining

EECS 583 Class 8 Classic Optimization

THE RELATIVE EFFICIENCY OF DATA COMPRESSION BY LZW AND LZSS

DFA&:OPT-METAFrame: A Tool Kit for Program Analysis and Optimization

Seminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm

Application of Bounded Variable Simplex Algorithm in Solving Maximal Flow Model

Using Static Single Assignment Form

CSE 501 Midterm Exam: Sketch of Some Plausible Solutions Winter 1997

On the Adequacy of Program Dependence Graphs for Representing Programs

EFFICIENT ATTACKS ON HOMOPHONIC SUBSTITUTION CIPHERS

CS 406/534 Compiler Construction Putting It All Together

Programming Languages

Transcription:

TR 80-003 ON OPERATOR STRENGTH REDUCTION John H. Crawford Mehd i Jazayer i

l. ON OPERATOR STRENGTH REDUCTION John H. Crawford Intel Corporation 3065 Bowers Avenue Santa Clara, California Mehdi Jazayeri Department of Computer Science University of North Carolina Chapel Hill, North Carolina ABSTRACT Strength reduction is a program optimization technique which attempts to replace expensive operations in a program by a series of equivalet, cheaper operations. We discjss the global application of this technique and present an approach to its implementation. This approach is based on well-known optimization techniques of live variabl~ analysis and redundant computation elimination. A discussion of the safety and profitability of the technique is also included. Key Words and Phrases: Program optimization; strength reduction; safety and profitability of optimization; time vs. space tradeoffs. 1. Introduction Strength reduction is the name given to a class of program improving techniques that replace expensive operations by a series of equivalent, cheaper operation's, if such a replacement results in an overall performance gain. For example, if two additions cost less than one multiply, then 3*A can be replaced by A+A+A. Similarly, X**2 can be replaced by X*X. This form of strength reduction is highly machine dependent; it is also local in that it encompasses only one statement. This kind of improvement is best left to the code generator.

2. The kind of strength reduction considered here is the kind traditionally associated with loops. It replaces multiplies of the form l*lc where I is an inductive variable and LC is a loop constant, by: a. initializing a temporary T to l*lc prior to entrance to the loop, and b. adding statements T=T+ (LC*LC') after statements of the form I=I+LC'. The value of LC*LC' can be computed outside the loop. The temporary Twill then hold the current value of I*LC throughout the loop. The major targets of strength reduction are the multiplications used to calculate the addresses of array elements. In this paper we present a slight generalization of strength reduction, formulate a strategy for its implementation, and discuss the safety and profitability of the operation. Schaefer (1) discusses the problem of strength reduction. Algorithms for strength reduction appear in (2), (3) and (4). The major characteristic of the approach presented here is that it is entirely based on other, simpler optimization techniques. We assume the reader is familiar with (very) live variable analysis and redundant computation elimination. 2. Terminology A loop initialization node or loop init node is the only immediate predecessor of the loop entry node that is not in the loop. If no such node exists for a loop, one may be created by adding a dummy node whose successor is the loop entry node, and whose predecessors are the predecessors of the loop entry node not in the loop. The loop init node dominates every node inside the loop.

3. J With respect to a particular loop, L: A variable, v, is a loop constant if no assignments are made to it inside the loop; An expression, e, is a loop constant if its constituent operands are loop constants; A variable i, is an inductive variable if every statement assigning a value to i is of the form i :=j:_i:lc where LC is a loop constant and j is either i or an inductive variable. Control variables of iterative DO loops are almost always inductive variables. 3. Generalization and Applications Strength reduction need not be limited to loops, inductive variables, and loop constants. We will show that no special consideration of inauctive variables is required. We will present a method that attempts strength reduction of multiplies of all variables. This method can easily be extended to other operations, as well as to smaller or larger regions of the program. This generalization only requires that the variables corresponding to loop constlnts be constant throughout the scope of the strength reduction. This generalized form of strength reduction may be implemented using the well-known redundant computation elimination and live analysis techniques.

4. As shown in the introduction, strength reduction inserts multiplies of the form (LC*LC') into the program. Unless these multiplies can be folded or removed from the loop, the technique may actually degrade program performance. Even if they can be moved out of the loop, these multiplies will increase the program's size. It is expected that mose of the "loop constants" encountered by strength reduction wi 11 be true constants. One consequence of the application of strength reduction is the removal of references to the iterative variable. If all references to the iterative variable can be removed, then- the assignments to the iterative variable can be eliminated as dead. Test replacement is used to replace more uses of the iterative variable. Since the temporaries involved in strength reduction are just multiples of the iterative variable, tests of the form "I relation Ct" can be replaced by a test of the form "Ti relation Ct*Ci". Test replacement has no beneficial effect on the progr am by itself, but it improves the effect of applying strength reduction by helping to allow the elimination of assignments to the iterative variable. 4. Examples I : = 1 ; LOOP: A:=I*Cl; B:=I*C2; I:=I+C3; IF I ( C4 THEN GOTO LOOP; This loop can be improved by the application of strength reduction and test replacement by:

5. a. initializing Tl in the loop init node to I*Cl=l*Cl=Cl, and initializing T2 to C2. b. inserting the statements Tl:= Tl + (C3*Cl); T2:= T2 + (C3*C2); after the statement I:=I+C3; c. replacing the test I<C4 by Tl(Cl*C4. The expressions I*Cl and l*c2 are now available throughout the loop, and the program can be optimized by redundant computation elimination. Note that the expressions C3*Cl and C3*C2 can be folded, since Cl, C2, and C3 are all constants. Also, if I is not live on exit from the loop, the two assignments to I can be eliminated by dead assignment elimination, yielding: Tl :=Cl; T2:=C2; LOOP: A:=Tl; B:=TZ; Tl :=Tl+(C3*Cl); T2:=T2+(C3*C2); IF Tl<'.(C4*Cl) THEN GOTO LOOP; This has eliminated two multiplies and one add from the loop body, and inserted three adds. Two assignments were inserted in the loop init node, and one assignment was deleted. The loop will execute somewhat faster, but the program size has increased by one statement. If n1ultiplies take considerably longer than adds to execute, and the loop is traversed many

6. times, this transformation will substantially reduce the execution time of the program. An exa~ple of the application of strength reduction outside of loops is: A:= I*Cl; I:=I+C2; B:= I*Cl; which if I is dead after the last statement can be transformed to: A:= I*Cl; B:= A + (C2*Cl); Note that the expression CZ*c; can be folded, since Cl and C2 are both constants. 5. Strategy The approach is heavily based on other optimization techniques. We assume that the reader is familiar with the basic optimization techniques. a. If expression l*ci is available inti immediately before a statement of the form I :=I:!:Cj, make it also available after S by tentatively inserting Ti:=Ti:!:(Cj*Ci) after S. b. Insert computations of the form I*Ci in the appropriate loop init nodes. c. Apply (tentative) redundant computation elimination. d. Retain all computations tentatively inserted in step (a) that are either very live after the computation, or (if this is a loop strength reduction) very live at entrance to the loop. Delete all other inserted computations. e. Redo (this time permanent) redundant computation elimination to reflect the effects of the inserted computations.

7. 6. Safety and Profitability The standard code motion safety requirements will be met if the expression temporary of a strength reduced multiply is very live at all points where a computation of the corresponding exp~ession was inserted. The profitability of this transformation cannot be guaranteed unless the multiplies removed by strength reduction would have been executed at least as frequently as the inserted additions. As with the other code motion techniques, this can be assured if the expression temporary for the multiply is very live at each instruction inserted by the technique, both in the loop init node and after the increment s ta temen ts. This technique tends to trade space for time. When used outside of loops, the usual case is to break about even on space, since the usual case is that one add is inserted for each multiply removed. l4hen used in loops, a space increase usually results, since two statements must usually be inserted (one in the loop, one in the init node) to remove one multiply. If the assignments to the iterative variable can be removed, some of this space loss can be recouped. One special case where this technique is extremely profitable is where the inserted additions can be done with the auto-increment hardware found on machines like the PDP-11. Unfortunately, global optimizations such as this are usually too far from the machine representation of the object code to perform such an optimization effectively. However, may be possible to set things up properly so that a later improvement phase could do the adds with the autoincrement hardware.

8. Because this technique tends to increase the size of the program, it seems unreasonable to apply it indiscriminately throughout the program. It can be profitably applied to the few criti ca 1 inner loops of the program. In these loops with high execution frequencies, this technique would yield a large improvement in execution speed for a small increase in program size, In the less frequently executed parts of the program, however, the slight improvement in execution time that could be realized may not be worth the increase in program size incurred. It seems reasonable to strength reduce wherever it does not require initializations, i.e., where it does n~:: rro11;rc in:,crtirl0 computations in the loop init node. Due to the space/tiliic tradeoff involved with strength reduction in loops, some restrictions are needed on that form of the technique. It seems reasonable to apply that technique to only the innermost, most deeply nested loops in the program which permit the elimination of their iterative variable. With this restriction, most of the expected time imorovement is obtained, with a minimal increase in program size. Acknowledgements The algorithm and analyses were developed for the PLCD crosscompiler at the University of North Carolina. Drs. Steve l1eiss and Dave Parnas of UNC read and made valuable comments on the earlier versions of this paper.

9. References ( 1 ) {2) (3) (4) M. Schaefer, A Mathematical Theory of Global Program p_p_t:._i~ti_f}ation_, Prentice Hall, Englewood Cliffs, N.J., 1973. F. E. Allen, "Program Optimization," Annual Review of Automatic Programming, Vol. 5, Pergamon, Elmsford, 11.Y., pp. 239-307, 1969. J. Cocke and K. Kennedy, "An Algorithm for Reductionof Operator Strength, '' Technical Report 476-093-2, Dept. of Mathematical Sciences, Rice University, Houston, Texas. 1974. F. E. Allen, J. Cocke and K. Kennedy, "Reduction of Operator Strength, Technical Report 476-093-6, Dept. of Mathematical Sciences, Rice University, Houston, Texas, 1974.