Programming Language Processor Theory
Munehiro Takimoto

Course Description
Method of Evaluation: based on your technical reports
Purposes: understanding various theories and implementations of modern compiler construction

Reference Books
1. Andrew W. Appel, Modern Compiler Implementation in ML, Cambridge University Press
2. A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools, Second Edition, Addison Wesley

Preface
Over the past decade, there have been several shifts in the way compilers are built:
1. New kinds of programming languages are being used: object-oriented languages with dynamic methods, and functional languages with nested scope and first-class function closures. Many of these languages require garbage collection.
2. New machines have large register sets and a high penalty for memory access, and can often run much faster with compiler assistance: scheduling instructions and managing data for cache locality.

1 Introduction
1. A modern compiler is often organized into many phases, each operating on a different abstract language.
2. Figure 1 shows the phases in a typical compiler. Each phase is implemented as one or more software modules.
3. The interfaces between modules of the compiler are almost as important as the algorithms inside the modules.

1.1 Modules and interfaces
Any large software system is much easier to understand and implement if the designer takes care with the fundamental abstractions and interfaces. Breaking the compiler into this many pieces allows for reuse of the components.

Modules
1. The modules up through Translate are called the front end, and the other modules are called the back end.
2. To change the source language being compiled, only the front end needs to be changed.
Figure 1: Phases of a compiler, and interfaces between them. (The figure shows the pipeline Source Program, Lex, Parse, Parsing Actions, Semantic Analysis, Translate, Canonicalize, Instruction Selection, Control Flow Analysis, Dataflow Analysis, Register Allocation, Code Emission, Assembler, Linker, connected by the interfaces Tokens, Reductions, Abstract Syntax, IR Trees, Assem, Flow Graph, Interference Graph, Assembly Language, Relocatable Object Code, and Machine Language, with Environments, Tables, and Frame Layout as auxiliary modules.)
3. To change the target machine for which the compiler produces machine language, it suffices to replace just the Frame Layout and Instruction Selection modules of the back end.
The compiler can be attached to a language-oriented syntax editor at the Abstract Syntax interface.

Interfaces
1. Abstract Syntax, IR Trees, and Assem take the form of data structures. Ex. the Parsing Actions phase builds an Abstract Syntax data structure and passes it to the Semantic phase.
2. Other interfaces, up through Instruction Selection, are abstract data types. Ex. the Translate interface is a set of functions that the Semantic phase can call, and the Tokens interface takes the form of a function that the Parser calls to get the next token of the input program.
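As an illustration of the second kind of interface, here is a minimal sketch of a Tokens interface exposed as a function the parser calls. All names and the token categories are hypothetical, chosen only for the example; this is not the interface of any particular compiler.

```python
import re

# A hypothetical Tokens interface: the lexer exposes a single function,
# next_token(), which the parser calls to get the next token of the input.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\w+)|(\S))")

def make_lexer(source):
    pos = 0
    def next_token():
        nonlocal pos
        m = TOKEN_RE.match(source, pos)
        if m is None:                  # nothing but whitespace remains
            return ("EOF", None)
        pos = m.end()
        if m.group(1):
            return ("NUM", int(m.group(1)))
        if m.group(2):
            return ("ID", m.group(2))
        return ("PUNCT", m.group(3))
    return next_token

next_token = make_lexer("x + 42")
# Calling next_token() repeatedly yields:
#   ("ID", "x"), ("PUNCT", "+"), ("NUM", 42), then ("EOF", None)
```

The point of this shape is that the parser never sees the whole token list; it pulls tokens one at a time through a single function, which is exactly an abstract-data-type interface rather than a data-structure interface.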
Lex: Break the source file into individual words, or tokens.
Parse: Analyze the phrase structure of the program.
Semantic Actions: Build a piece of abstract syntax tree corresponding to each phrase.
Semantic: Determine what each phrase means, relate uses of variables to their definitions, check types of expressions, request translation of each phrase.
Frame Layout: Place variables, function parameters, etc. into activation records (stack frames) in a machine-dependent way.
Translate: Produce intermediate representation trees (IR trees), a notation that is not tied to any particular source language or target-machine architecture.
Canonicalize: Hoist side effects out of expressions, and clean up conditional branches, for the convenience of the next phases.
Instruction Selection: Group the IR-tree nodes into clumps that correspond to the actions of target-machine instructions.
Control Flow: Analyze the sequence of instructions into a control flow graph that shows all the possible flows of control the program might follow when it executes.
Dataflow: Gather information about the flow of information through variables of the program; for example, liveness analysis calculates the places where each program variable holds a still-needed value (is live).
Register Allocation: Choose a register to hold each of the variables and temporary values used by the program; variables not live at the same time can share the same register.
Code Emission: Replace the temporary names in each machine instruction with machine registers.
Figure 2: Description of compiler phases.
1.2 Overview
In the first half, I will give some basic theory of control-flow analysis and dataflow analysis, and talk about the use of the information collected by dataflow analysis. Then, I will explain some techniques for making dataflow analysis efficient. In the second half, I will talk about other data structures for code optimization, e.g. the dominator tree and SSA form. Also, as techniques for extracting instruction-level parallelism, I will introduce instruction scheduling and software pipelining. If we have extra time, I will mention important techniques of the front end, e.g. garbage collection and function closures.
I will use a compiler infrastructure, COINS, to illustrate the implementations of the static analyses. For details of COINS, see the following Web site:
http://coins-compiler.sourceforge.jp/international/
For information about this class, refer to the following:
http://www.cs.is.noda.tus.ac.jp/ mune/master/14/

2 Dataflow
An optimizing compiler transforms programs to improve their efficiency without changing their semantics. There are many transformations that improve efficiency:
Common-subexpression elimination: If an expression is computed more than once, eliminate the redundant computations.
Dead-code elimination: Delete a computation whose result will never be used.
Register allocation: Keep two nonoverlapping temporaries in the same register.
Constant folding: If the operands of an expression are constants, the expression can be replaced with a constant value by computing it at compile time.
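To make the last transformation concrete, here is a minimal sketch of constant folding on a tiny expression tree. The tuple-based expression representation is illustrative only, not the IR of COINS or any real compiler.

```python
import operator

# An expression is an int (constant), a str (variable name),
# or a tuple (op, left, right) for a binary operation.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(e):
    """Recursively replace constant subexpressions with their values."""
    if not isinstance(e, tuple):
        return e                      # leaf: constant or variable
    op, l, r = e
    l, r = fold(l), fold(r)
    if isinstance(l, int) and isinstance(r, int):
        return OPS[op](l, r)          # both operands constant: compute now
    return (op, l, r)

# x * (2 + 3) folds to x * 5; the variable x blocks further folding.
print(fold(("*", "x", ("+", 2, 3))))  # ('*', 'x', 5)
```

Note that a real constant folder must also respect the target machine's arithmetic (overflow, division by zero); this sketch ignores those issues.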
Figure 3: The structure of COINS
This is not a complete list of optimizations. In fact, there can never be a complete list.

2.1 No magic bullet
Computability theory shows that it will always be possible to invent new optimizing transformations. To simplify the discussion, assume that we are optimizing for program size instead of speed.
1. Define a fully optimizing compiler as one that transforms each program P into a program Opt(P) that is the smallest program with the same input/output behavior as P.
2. For any program Q that produces no output and never halts, Opt(Q) is short and easily recognizable:
L1: goto L1
3. Therefore, if we had a fully optimizing compiler, we could use it to solve the halting problem: to see whether there exists an input on which P halts, just see whether Opt(P) is the one-line infinite loop. But we know that no computable algorithm can always tell whether programs halt, so a fully optimizing compiler cannot be written either.
Instead of a fully optimizing compiler, we can build optimizing compilers. An optimizing compiler transforms P into a program P' that always has the same input/output behavior as P, and might be smaller or faster. We hope that P' runs faster than the optimized programs produced by our competitors' compilers.
No matter what optimizing compiler we consider, there must always exist another (usually bigger) optimizing compiler that does a better job. Ex. suppose we have an optimizing compiler A.
1. There must be some program Px which does not halt, such that A(Px) ≠ Opt(Px).
2. There exists a better compiler B:
B(P) = if P = Px then [L: goto L] else A(P)
This theorem, which says that for any optimizing compiler there exists a better one, is known as the full employment theorem for compiler writers.
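The construction of B above can be written out directly. In this sketch, A is just a stand-in (the identity function) for an arbitrary existing optimizer, and P_x is a hypothetical non-halting program; both names are placeholders for the argument, not real tools.

```python
def A(program):
    """Stand-in for an arbitrary existing optimizing compiler."""
    return program

# A hypothetical program that never halts (its text is what matters here).
P_x = "L1: goto L1"

def B(program):
    """The 'better' compiler from the full employment theorem:
    special-case P_x with the one-line infinite loop, and otherwise
    behave exactly like A."""
    if program == P_x:
        return "L: goto L"
    return A(program)

print(B(P_x))        # the one-line infinite loop
print(B("x := 1"))   # all other programs are handled by A unchanged
```

Of course B is "better" only in the technical sense of the theorem: it beats A on exactly one program, which is why the theorem guarantees an endless supply of ever-bigger optimizers rather than a best one.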
2.2 Intermediate representation for flow analysis
We will consider intraprocedural global optimization. Intraprocedural means that the analysis stays within a single procedure or function; global means that the analysis spans all the statements within that procedure. By contrast, interprocedural optimization operates on several procedures and functions at once, and peephole optimization operates only on pairs of adjacent instructions.
Each of the optimizing transformations listed at the beginning of the section can be applied using the following generic recipe:
1. Dataflow analysis: Traverse the flow graph, gathering information about what may happen at run time.
2. Transformation: Modify the program to make it faster in some way; the information gathered by analysis will guarantee that the program's semantics is unchanged.
There are many dataflow analyses that can provide useful information for optimizing transformations. Most can be described by dataflow equations.

Quadruples
We will use an intermediate representation simplified by ensuring that each expression includes only a single mem or binop. We can easily turn ordinary expressions into simplified ones.
1. Whenever there is a nested expression of one binop or mem inside another, or a binop or mem inside a jump or cjump, we split it by introducing a new temporary:
v ← e1 ⊕ (e2 ⊗ e3)   ⇒   t ← e2 ⊗ e3; v ← e1 ⊕ t
a ← b ⊕ c
a ← b
a ← M[b]
M[a] ← b
goto L
L:
if a relop b goto L1 else goto L2
f(a1, ..., an)
b ← f(a1, ..., an)
Figure 4: Statements in quadruples. Occurrences of a, b, c, f, L denote a temp, const, or label only.
2. We also introduce new temporaries to ensure that any store statement (that is, a move whose left-hand side is a mem) has only a temp or a const on its right-hand side, and only a temp or const under the mem:
mem[e1 ⊗ e2] ← e3 + e4   ⇒   t1 ← e1 ⊗ e2; t2 ← e3 + e4; mem[t1] ← t2
These simple statements are often called quadruples, because the typical statement is a ← b ⊕ c, with four components (a, b, c, ⊕). We use ⊕ to stand for an arbitrary binop. They are also called three-address code. The statements take one of the forms shown in Figure 4.
The optimizer may move, insert, delete, and modify the quadruples. After the optimizations are completed, there will be many move statements that define temporaries that are used only once. It will be necessary to find these and turn them back into nested expressions.

Control-Flow Graph
To perform analyses on a program, it is often useful to make a control-flow graph.
1. Each statement in the program is a node in the flow graph.
2. If statement x can be followed by statement y, there is an edge from x to y.
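The splitting with fresh temporaries described under Quadruples above can be sketched as follows. The tuple representation of expressions and the quadruple format are illustrative only (a simplification that introduces a temporary for every binop and then a final move), not the actual IR of COINS.

```python
# An expression is an int/str atom or a tuple (op, left, right).
counter = 0

def new_temp():
    """Generate a fresh temporary name t1, t2, ..."""
    global counter
    counter += 1
    return f"t{counter}"

def linearize(e, out):
    """Emit quadruples for e into out; return an atom (temp, const, or name)."""
    if not isinstance(e, tuple):
        return e
    op, l, r = e
    la, ra = linearize(l, out), linearize(r, out)
    t = new_temp()
    out.append((t, la, op, ra))       # quadruple: t <- la op ra
    return t

quads = []
# v <- e1 + (e2 * e3), with e1, e2, e3 standing for variables:
result = linearize(("+", "e1", ("*", "e2", "e3")), quads)
quads.append(("v", result))           # final move: v <- result
print(quads)
# [('t1', 'e2', '*', 'e3'), ('t2', 'e1', '+', 't1'), ('v', 't2')]
```

After this pass, every statement contains at most one binop, which is exactly the property the dataflow analyses below rely on.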
Figure 5: Control-flow graph of a program. The program is:
    a ← 0
L1: b ← a + 1
    c ← c + b
    a ← b × 2
    if a < N goto L1
    return c
Its flow graph has one node per statement, numbered 1 through 6 in order.
Figure 5 shows the flow graph for a simple loop.
Flow graph terminology.
1. A flow-graph node has out-edges that lead to successor nodes, and in-edges that come from predecessor nodes.
2. The set pred[n] is the set of all predecessors of node n, and succ[n] is the set of all successors.
In Figure 5, the out-edges of node 5 are 5 → 6 and 5 → 2, and succ[5] = {2, 6}. The in-edges of node 2 are 5 → 2 and 1 → 2, and pred[2] = {1, 5}.

2.3 Various dataflow analyses
A dataflow analysis of a control-flow graph of quadruples collects information about the execution of the program. The results of these analyses can be used to make optimizing transformations of the program.
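As a concrete check of the flow-graph terminology above, the pred and succ sets for the graph of Figure 5 can be computed directly from its edge list. This is a direct transcription of that one graph, not a general CFG-construction algorithm:

```python
# Nodes 1..6 and the edges of the flow graph in Figure 5.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 2), (5, 6)]

succ = {n: set() for n in range(1, 7)}
pred = {n: set() for n in range(1, 7)}
for x, y in edges:
    succ[x].add(y)    # edge x -> y makes y a successor of x
    pred[y].add(x)    # ... and x a predecessor of y

print(succ[5])  # {2, 6}
print(pred[2])  # {1, 5}
```

These two dictionaries are exactly the interface the dataflow analyses of the next section iterate over.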