Intermediate Representations (CS2210, Lecture 11)


Reading & Topics
- Reading: Muchnick, chapter 6
- Topics today:
  - Intermediate representations
  - Automatic code generation with pattern matching
  - Optimization overview
  - Control flow analysis (start)

Intermediate Representations
- Make the optimizer independent of the source and target language
- Usually multiple levels:
  - HIR = high-level IR; encodes source-language semantics; can express language-specific optimizations
  - MIR = medium-level representation shared by multiple source and target languages; can express source/target-independent optimizations
  - LIR = low-level representation with many target specifics; can express target-specific optimizations

IR Goals
- Primary goals:
  - Easy & effective analysis
  - Few cases
  - Support for things of interest
  - Easy transformations
  - General across source/target languages
- Secondary goals:
  - Compact in memory
  - Easy to translate from/to
  - Debugging support
  - Extensible & displayable

High-Level IRs
- Abstract syntax tree + symbol table: most common
- LISP S-expressions

Medium-Level IRs
- Represent source variables plus temporaries and registers
- Reduce control flow to conditional + unconditional branches
- Explicit operations for procedure calls and block structure
- Most popular: three-address code, in which each instruction references at most three addresses (operands):

    t1 := t2 op t3
    if t goto L
    t1 := t2 < t3
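To make the lowering concrete, here is a minimal sketch (not from the slides) of flattening an expression tree into three-address code; the tuple node format and the t1, t2, ... temporary-naming scheme are assumptions for illustration.

    import itertools

    _temps = itertools.count(1)

    def lower(node, code):
        """Flatten an expression tree into three-address statements.
        A tree is either a variable name (str) or (op, left, right)."""
        if isinstance(node, str):
            return node                      # a variable is already an address
        op, left, right = node
        a = lower(left, code)
        b = lower(right, code)
        t = f"t{next(_temps)}"
        code.append(f"{t} := {a} {op} {b}")
        return t

    code = []
    lower(('+', ('*', 'b', 'c'), 'd'), code)
    print("\n".join(code))                   # t1 := b * c
                                             # t2 := t1 + d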

Important MIRs
- SSA = static single assignment form
  - Like 3-address code, but every variable has exactly one reaching definition
  - Makes variables independent of the locations they are in
  - Makes many optimization algorithms more effective

SSA Example

    Before:          After (SSA):
    x := u           x1 := u
    ... x ...        ... x1 ...
    x := v           x2 := v
    ... x ...        ... x2 ...

Other Representations
- Triples: each operand may name the result of an earlier triple

    (1) i + 1
    (2) i := (1)
    (3) i + 1
    (4) p + 4
    (5) *(4)
    (6) p := (4)

- Trees: like the AST but at a lower level
- Directed acyclic graphs (DAGs): more compact than trees through node sharing
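As a concrete illustration of the renaming step, here is a minimal sketch, assuming straight-line code only (no control flow, hence no phi-functions) and a hypothetical (dest, op, arg1, arg2) instruction format:

    def to_ssa(instrs):
        """Rename a straight-line block into SSA form; instrs are
        (dest, op, arg1, arg2) tuples, args either names or constants."""
        version = {}                            # current SSA version per variable
        def use(v):
            if not isinstance(v, str):
                return v                        # constants and None pass through
            return f"{v}{version.get(v, 0)}"    # version 0 = live on entry
        out = []
        for dest, op, a, b in instrs:
            a, b = use(a), use(b)               # rename uses first
            version[dest] = version.get(dest, 0) + 1
            out.append((f"{dest}{version[dest]}", op, a, b))
        return out

    block = [("x", "copy", "u", None), ("y", "+", "x", 2),
             ("x", "copy", "v", None), ("z", "+", "x", 2)]
    for instr in to_ssa(block):
        print(instr)    # x1 := u0, y1 := x1+2, x2 := v0, z1 := x2+2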

Three-Address Code Example

Source:

    for i := 1 to 10 do
        a[i] := b[i] + 5;
    end

Three-address code:

    i := 1
    i <= 10?
    t1 := i*4
    t2 := &b
    t3 := *(t2+t1)
    t4 := t3 + 5
    t5 := i*4
    t6 := &a
    *(t6+t5) := t4
    i := i+1

Representation Components
- Operations
- Dependences between operations:
  - Control dependences: sequencing of operations
    - Evaluation of the then & else branches depends on the result of the test
    - Side effects of statements occur in the right order
  - Data dependences: flow of values from definitions to uses
    - Operands are computed before the operation
    - Values are read from a variable before it is overwritten
- Want to represent only the relevant dependences: dependences constrain operations, so the fewer the better

Representing Control Dependence
- Implicit in the AST
- Explicit as control flow graphs (CFGs)
  - Nodes are basic blocks
  - Instructions within a block are sequenced, ordering side effects
  - Edges represent branches (control flow between blocks)
- Fancier: control dependence graph, part of the PDG (program dependence graph)
- Value dependence graph (VDG): control dependence converted to data dependence

Data Dependence Kinds
- True (flow) dependence (read after write, RAW): reflects real data flow, operands to operation
- Anti-dependence (write after read, WAR)
- Output dependence (write after write, WAW)
  - WAR and WAW reflect overwriting of memory, not real data flow, and can often be eliminated

Data Dependence Example

    x := 3                   (1) x := 3
    if q != NULL then        (2) q != 0?
        y := x + 2           (3) y := x+2
        w := *q              (4) w := *q
        x := z * 10          (5) x := z*10
    else
        x := 4               (6) x := 4
    endif

Representing Data Dependences (within basic blocks)
- Sequence of instructions
  - Simple, easy analysis
  - But: may overconstrain operation order
- Expression tree / DAG
  - Directly captures dependences in the block
  - Supports local CSE (common subexpression elimination)
  - Can be compact
  - Harder to analyze & transform; eventually has to be linearized
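A minimal sketch of classifying the dependence between two instructions in program order, assuming each instruction is reduced to a (defined-variable, used-variables) pair; this hypothetical format ignores memory aliasing:

    def dependence(first, second):
        """Classify the dependence of `second` on `first` (program order)."""
        d1, uses1 = first
        d2, uses2 = second
        kinds = []
        if d1 in uses2: kinds.append("true/flow (RAW)")
        if d2 in uses1: kinds.append("anti (WAR)")
        if d1 == d2:    kinds.append("output (WAW)")
        return kinds or ["independent"]

    # (3) y := x+2 followed by (5) x := z*10: x is read, then overwritten
    print(dependence(("y", {"x"}), ("x", {"z"})))   # ['anti (WAR)']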

Representing Data Dependences (across blocks)
- Implicit (recompute def-use information on demand)
  - Simple
  - Makes analysis slow: dependences have to be recomputed each time
- Explicit: def-use chains
  - Fast
  - Space-consuming; has to be updated after transformations
- Advanced options: SSA, VDGs, dependence flow graphs (DFGs)

Manual vs. Automatic Code Generation
- Manual
  - Flexible; can handcraft highly optimized code
  - Laborious to adapt to a new target
- Automatic
  - Easy, fast adaptation to new targets
  - Less flexible; stylized code generation can result in worse code quality

Approaches
- Graham-Glanville code generators: pattern matching and rules similar to SLR(1) parsing rules; reductions generate target code
- Attribute grammars: semantic actions generate target code
- Generalized tree pattern matching: match expression trees and generate corresponding code templates

Graham-Glanville Code Generation
- Uses context-free-grammar-like rules to represent operations, with corresponding instruction templates
- Semantic actions generate the code
- Components:
  - IR transformations: transform the IR to a representation closer to the target and appropriate for the pattern matcher
  - Pattern matcher: uses pattern matching to find the appropriate reduction
  - Instruction generator: generates code sequences

Machine Descriptions

    LIR                 Rule                Template
    r.2 <- r.1          r.2 => r.1          or r.1,0,r.2
    r.3 <- r.1 + r.2    r.3 => + r.1 r.2    add r.1,r.2,r.3
    r.3 <- r.1 + k.2    r.3 => + r.1 k.2    add r.1,k.2,r.3
    r.2 <- [r.1]        r.2 => ^ r.1        ld [r.1],r.2
    [r.2] <- r.1        ε => <- r.2 r.1     st r.1,[r.2]

Matching example (prefix notation for *(r1+2) := *r3 + 3):

    <- + r1 2 + ^ r3 3      (start)
    <- r2 + ^ r3 3          add r1,2,r2
    <- r2 + r4 3            ld [r3],r4
    <- r2 r5                add r4,3,r5
    ε                       st r5,[r2]

Tree Pattern Matching
- AKA BURS-style code generation (BURS = bottom-up rewrite system)
- Replaces (sub)trees by other trees (nodes)
- Each pattern-match rule has an associated (code generation) action and cost
- Try to find a minimal-cost tree cover to generate locally optimal code
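Below is a minimal sketch in the spirit of tree-pattern-matching code generation, using a greedy maximal-munch walk rather than the full cost-based BURS dynamic program; the node shapes, opcode names, and the immediate-add pattern are assumptions for illustration.

    import itertools

    _regs = itertools.count(1)

    def gen(node, code):
        """Emit code for an expression tree bottom-up; returns the register
        holding the result. Nodes: ('CONST', k), ('REG', name),
        ('LOAD', addr_tree), ('ADD', left, right)."""
        op = node[0]
        if op == 'REG':
            return node[1]
        if op == 'CONST':
            r = f"r{next(_regs)}"
            code.append(f"mov {node[1]},{r}")
            return r
        if op == 'LOAD':
            a = gen(node[1], code)
            r = f"r{next(_regs)}"
            code.append(f"ld [{a}],{r}")
            return r
        if op == 'ADD':
            left, right = node[1], node[2]
            if right[0] == 'CONST':            # pattern ADD(x, k): immediate form
                a = gen(left, code)
                r = f"r{next(_regs)}"
                code.append(f"add {a},{right[1]},{r}")
                return r
            a, b = gen(left, code), gen(right, code)
            r = f"r{next(_regs)}"
            code.append(f"add {a},{b},{r}")
            return r
        raise ValueError(op)

    # *r3 + 3, the right-hand side of the matching example above
    code = []
    gen(('ADD', ('LOAD', ('REG', 'r3')), ('CONST', 3)), code)
    print("\n".join(code))    # ld [r3],r1 ; add r1,3,r2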

Optimization Overview
- Two-step process:
  1. Analyze the program to learn things about it (program analysis); determine when transformations are legal & profitable
  2. Transform the program, based on that information, into a semantically equivalent but better output program
- "Optimization" is a misnomer:
  - Almost never optimal
  - Sometimes slows some programs down on some inputs (the aim is to speed up most programs on most inputs)

Semantics Subtleties
- Evaluation order
- Arithmetic properties (e.g., associativity)
- Behavior in error cases
- Some languages are very precise (e.g., Ada); some are weaker, which potentially gives more optimization opportunity

Analysis Scope
- Peephole: across a small number of adjacent instructions; trivial
- Local: within a basic block; simple
- Intraprocedural (aka global): across basic blocks within a procedure; more complex (branches, merges, loops)
- Interprocedural: across procedures, within a whole program; even more complex (calls, returns); more useful for higher-level languages; hard with separate compilation
- Whole-program: useful for safety properties; most complex

Catalog of Optimizations
- Arithmetic simplification
- Constant folding:

    x := 3+4        =>  x := 7

- Strength reduction:

    x := y*4        =>  x := y<<2

- Constant propagation:

    x := 5              x := 5
    y := x+2        =>  y := 5+2  =>  y := 7

- Copy propagation:

    x := y              x := y
    w := w+x        =>  w := w+y

Catalog (2)
- Common subexpression elimination (CSE):

    x := a+b            x := a+b
    y := a+b        =>  y := x

  - Can also eliminate redundant memory references and branch tests
- Partial redundancy elimination (PRE): like CSE, but the earlier expression is available only along some paths

    if ... then             if ... then
        x := a+b      =>        t := a+b; x := t
    end                     else
    y := a+b                    t := a+b
                            end
                            y := t

Catalog (3)
- Pointer analysis:

    p := &x             p := &x
    *p := 5         =>  *p := 5
    y := x+1            y := 6

- Dead assignment elimination:

    x := y * z      /* x unused before the next write: assignment is dead */
    x := 6

- Dead code elimination: e.g., if (false) then ... (the branch can be removed)
- Integer range analysis:

    for (i=0; i<10; i++) {
        if (i>=10) goto error   /* provably never taken: can be removed */
        a[i] := 0;
    }
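To show how the first few entries combine, here is a minimal sketch of a local constant-propagation-plus-folding pass over one basic block; the (dest, op, arg1, arg2) tuple format and the 'copy' opcode are assumptions for illustration, and memory operations are ignored:

    import operator

    OPS = {'+': operator.add, '*': operator.mul, '<<': operator.lshift}

    def fold_block(block):
        consts = {}                        # var -> known constant value
        out = []
        for dest, op, a, b in block:
            a = consts.get(a, a)           # constant propagation into uses
            b = consts.get(b, b)
            if op == 'copy' and isinstance(a, int):
                consts[dest] = a           # remember x := 5
                out.append((dest, 'copy', a, None))
            elif op in OPS and isinstance(a, int) and isinstance(b, int):
                v = OPS[op](a, b)          # constant folding
                consts[dest] = v
                out.append((dest, 'copy', v, None))
            else:
                consts.pop(dest, None)     # dest no longer a known constant
                out.append((dest, op, a, b))
        return out

    # x := 5 ; y := x+2   becomes   x := 5 ; y := 7
    print(fold_block([('x', 'copy', 5, None), ('y', '+', 'x', 2)]))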

Loop Optimizations (1)
- Loop-invariant code motion (see the source-level sketch below):

    for j := 1 to 10                for j := 1 to 10
        for i := 1 to 10      =>        t := b[j]
            a[i] := a[i] + b[j]         for i := 1 to 10
                                            a[i] := a[i] + t

- Induction variable elimination:

    for i := 1 to 10          =>    for p := &a[1] to &a[10]
        a[i] := a[i] + 1                *p := *p + 1

  - a[i] takes several instructions; *p is one

Loop Optimizations (2)
- Loop unrolling:

    for i := 1 to N           =>    for i := 1 to N by 4
        a[i] := a[i] + 1                a[i]   := a[i]   + 1
                                        a[i+1] := a[i+1] + 1
                                        a[i+2] := a[i+2] + 1
                                        a[i+3] := a[i+3] + 1

  - Creates more optimization opportunities in the loop body (a cleanup loop is needed when N is not a multiple of 4)
- Parallelization, interchange, reversal, fusion
- Blocking / tiling: data cache locality optimization

Call Optimizations
- Inlining:

    l := w := 4             l := w := 4         l := w := 4
    a := area(l,w)    =>    a := l*w      =>    a := l<<2

- Many simple optimizations become important after inlining
- Interprocedural constant propagation
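The same transformations can be viewed at the source level; a minimal before/after sketch (function and variable names are illustrative):

    def before(a, b, j, N):
        for i in range(N):
            t = b[j] * 4         # loop-invariant: recomputed every iteration
            a[i] += t

    def after_licm(a, b, j, N):
        t = b[j] * 4             # hoisted once into the loop preheader
        for i in range(N):
            a[i] += t

    def after_unroll4(a, N):     # assumes N % 4 == 0; else add a cleanup loop
        for i in range(0, N, 4):
            a[i] += 1
            a[i + 1] += 1
            a[i + 2] += 1
            a[i + 3] += 1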

More Call Optimizations
- Static binding of dynamic calls
  - Calls through function pointers in imperative languages
  - Calls of computed functions in functional languages
  - OO dispatch in OO languages (e.g., COOL): if the receiver class can be deduced, the call can be replaced with a direct call
  - Other optimizations are possible even with multiple targets (e.g., using PICs = polymorphic inline caches)
- Procedure specialization
- Partial evaluation

Machine-Dependent Optimizations
- Register allocation
- Instruction selection: important for CISCs
- Instruction scheduling: particularly important with long-delay instructions and on wide-issue machines (superscalar + VLIW)

The Phase Ordering Problem
- In what order should optimizations be performed?
- Some optimizations create opportunities for others
  - Order according to this dependence
  - Can make some optimizations simple: a later optimization will clean up
- What about adverse interactions?
  - Common subexpression elimination vs. register allocation (CSE lengthens live ranges, increasing register pressure)
  - Register allocation vs. instruction scheduling
- What about cyclic dependences? E.g., constant folding and constant propagation enable each other

Control Flow Analysis Approaches
- Dominator-based: control flow graph with the dominator relation to identify loops; most commonly used
- Interval-based: nested regions (= intervals) forming a control tree
  - Special case: structural analysis; most sophisticated; classifies control structures (not just loops)

Basic Blocks and Control Flow Graphs (CFGs)
- Basic block = maximal sequence of instructions entered only at the first and exited only at the last
- A block entry can be:
  - The procedure entry point
  - A branch target
  - The instruction immediately following a branch or return
- Entry instructions are called leaders (see the sketch below)
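A minimal sketch of leader-based basic-block formation, assuming a hypothetical instruction list of (opcode, target) pairs where 'br'/'cbr' branch to an instruction index, 'ret' returns, and target is None for non-branches:

    def find_leaders(instrs):
        """Indices that start a basic block, per the three rules above."""
        leaders = {0}                              # procedure entry point
        for i, (op, target) in enumerate(instrs):
            if op in ('br', 'cbr'):
                leaders.add(target)                # branch target
                if i + 1 < len(instrs):
                    leaders.add(i + 1)             # instruction after a branch
            elif op == 'ret' and i + 1 < len(instrs):
                leaders.add(i + 1)                 # instruction after a return
        return sorted(leaders)

    def basic_blocks(instrs):
        """Split the instruction list at its leaders."""
        cuts = find_leaders(instrs) + [len(instrs)]
        return [instrs[a:b] for a, b in zip(cuts, cuts[1:])]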

Example

        receive m
        f0 <- 0
        f1 <- 1
        if m <= 1 goto L3
        i <- 2
    L1: if i <= m goto L2
        return f2
    L2: f2 <- f0 + f1
        f0 <- f1
        f1 <- f2
        i <- i + 1
        goto L1
    L3: return m

[Figure: the corresponding CFG, with blocks entry, B1-B6, exit; Y/N edges mark the outcomes of the two conditional branches m <= 1 and i <= m]

Dominators & Postdominators
- A binary relation useful to determine loops
- Node d dom i: every possible execution path from entry to i includes d
- The dominance relation is:
  - Reflexive: d dom d
  - Transitive: a dom b and b dom c implies a dom c
  - Antisymmetric: if a dom b and b dom a then a = b
- Immediate dominance: for a ≠ b, a idom b iff a dom b and there is no c outside {a, b} with a dom c and c dom b

Dominators & Postdominators (cont.)
- p pdom i: every possible execution path from node i to exit includes p
- Dual relation: i dom p in the CFG with edges reversed and entry and exit switched
- a strictly dominates b: a dom b and a ≠ b

Dominator Example

[Figure: the example CFG again, annotated with its dominator relation]

Computing Dominators (easy but slow)
- Initialize dom(i) = set of all nodes for i ≠ entry; dom(entry) = {entry}
- While changes occur, do:

    dom(i) = {i} ∪ (∩ dom(p) over all predecessors p of i)

- Works fastest if nodes are processed in a DFS-based (reverse postorder) order
- O(n² e) complexity, with n the number of nodes and e the number of edges
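A direct, runnable rendering of this iterative algorithm, assuming the CFG is given as a predecessor map {node: [preds]} (a representation chosen for illustration):

    def dominators(preds, entry):
        """Iterative dominator computation: dom(i) = {i} | intersection of
        dom(p) over predecessors p, iterated to a fixed point."""
        nodes = set(preds)
        dom = {n: set(nodes) for n in nodes}        # start with all nodes
        dom[entry] = {entry}
        changed = True
        while changed:
            changed = False
            for n in nodes - {entry}:               # ideally reverse postorder
                ps = [dom[p] for p in preds[n]]
                new = {n} | (set.intersection(*ps) if ps else set())
                if new != dom[n]:
                    dom[n], changed = new, True
        return dom

    # Diamond CFG: entry -> a, entry -> b, a -> exit, b -> exit
    preds = {'entry': [], 'a': ['entry'], 'b': ['entry'], 'exit': ['a', 'b']}
    print(dominators(preds, 'entry')['exit'])       # {'entry', 'exit'}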

Computing IDOM
- Compute dominators first
- tmp(i) := dom(i) - {i}
- Remove from tmp(i) every node m for which some other n ∈ tmp(i) has m ∈ dom(n) - {n}; the single remaining node is idom(i)

Computing Dominators Faster
- Lengauer & Tarjan's algorithm, described in the book
- O(e α(e,n)) running time, where α is the inverse of Ackermann's function

Loops & SCCs
- An edge e = (m,n) is called a back edge iff n dom m (the head dominates the tail)
- The natural loop of a back edge m -> n is the subgraph containing n, all nodes from which m can be reached without passing through n, and the edges connecting those nodes
- n is called the loop header
- Preheader: a node inserted immediately before the loop header; useful in many loop optimizations as a landing pad for code moved out of the loop body
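These two definitions translate almost directly into code; a minimal sketch reusing the dom map from the previous sketch, plus a successor map succs:

    def back_edges(succs, dom):
        """Edges (m, n) whose head n dominates their tail m."""
        return [(m, n) for m in succs for n in succs[m] if n in dom[m]]

    def natural_loop(preds, m, n):
        """Nodes of the natural loop of back edge m -> n: the header n plus
        every node that reaches m without passing through n."""
        loop, work = {n, m}, [m]
        while work:
            x = work.pop()
            for p in preds[x]:
                if p not in loop:      # header n is already in loop: stop there
                    loop.add(p)
                    work.append(p)
        return loop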

Headers and Preheaders

[Figure: a loop before and after preheader insertion; edges that entered the header from outside the loop are redirected to the new preheader, which falls through to the header]

Natural Loop Properties
- Natural loops with different headers are either nested or disjoint
- What about natural loops with the same header? (They cannot be distinguished, so they are commonly merged and treated as a single loop.)

Strongly Connected Components
- A generalization of loops
- An SCC is a subgraph G_S = <N_S, E_S> in which every node in N_S is reachable from every other node in N_S by a path using only edges from E_S
- An SCC S is maximal iff every SCC containing it is S itself
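For completeness, a compact sketch of finding maximal SCCs using Kosaraju's algorithm, one standard method (the slides do not prescribe an algorithm); the successor-map representation is an assumption:

    def sccs(succs):
        """Maximal SCCs via Kosaraju: order nodes by DFS finish time, then
        collect components along reversed edges in reverse finish order."""
        order, seen = [], set()
        def dfs(n):
            seen.add(n)
            for s in succs[n]:
                if s not in seen:
                    dfs(s)
            order.append(n)                    # record finish time
        for n in succs:
            if n not in seen:
                dfs(n)
        preds = {n: [] for n in succs}         # transpose the graph
        for n in succs:
            for s in succs[n]:
                preds[s].append(n)
        assigned, comps = set(), []
        for n in reversed(order):
            if n in assigned:
                continue
            comp, work = set(), [n]
            while work:
                x = work.pop()
                if x in assigned:
                    continue
                assigned.add(x)
                comp.add(x)
                work.extend(preds[x])
            comps.append(comp)
        return comps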

Reducible Flow Graphs
- A flow graph G = (N, E) is reducible (aka well-structured) if E can be partitioned into E = E_B ∪ E_F, where E_B is the back edge set, such that (N, E_F) forms a DAG in which all nodes are reachable from the entry node
- Patterns that make CFGs irreducible are called improper regions
- Irreducibility is impossible in some languages (e.g., Modula-2)

Dealing with Irreducibility
- Cannot use structural analysis directly
- Use iterative data flow analysis instead
- Make the graph well-structured using node splitting
- Induced iteration on the lattice of monotone functions from the lattice to itself (more on this later)

Control Trees & Interval Analysis
- Idea: divide the CFG into regions of various kinds
- Combine each region into a new node (abstract node), obtaining an abstract flow graph
- The final result is called the control tree:
  - The root of the control tree is the abstract flow graph representing the original flowgraph
  - The leaves of the control tree are basic blocks
  - Nodes between root and leaves represent regions
  - Edges represent relationships between an abstract node and its descendants

Example: T1-T2 Analysis
- T1 collapses a self-loop: a node B1 with an edge to itself is reduced to a node B1a
- T2 merges a node B2 that has a unique predecessor B1 into a combined node B1a

Interval Analysis
- A (maximal) interval I_M(h) with header h is the maximal single-entry subgraph with h as the only entry node and with all closed subpaths in the subgraph containing h
  - Like a natural loop, but with acyclic code dangling off the loop exits
- A (minimal) interval is either:
  - A natural loop
  - A maximal acyclic subgraph
  - A minimal irreducible region

Interval Analysis Steps
Iterate until done:
1. Postorder traversal of the CFG looking for loop headers
2. Construct the natural loop for each loop header and reduce the loop to an abstract region ("natural loop")
3. For each set of entries of an improper region, construct the minimal SCC and reduce it to an improper region
4. For the entry node and each immediate descendant of a node in a natural loop or irreducible region, construct the maximal acyclic graph with that node as root; if more than one node results, reduce it to an acyclic region
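A minimal sketch of T1-T2 reduction used as a reducibility test (a flowgraph is reducible iff repeated T1/T2 application collapses it to a single node); the successor-map representation is an assumption:

    def t1_t2_reduce(succs, entry):
        """Apply T1 (remove self-loops) and T2 (merge a node into its unique
        predecessor) until no rule fires; reducible iff one node remains."""
        g = {n: set(s) - {n} for n, s in succs.items()}   # T1 everywhere
        changed = True
        while changed:
            changed = False
            for n in list(g):
                preds = [m for m in g if n in g[m]]
                if n != entry and len(preds) == 1:        # T2 applies to n
                    p = preds[0]
                    g[p] |= g.pop(n) - {p}                # merge, then T1 at p
                    for m in g:
                        g[m].discard(n)
                    changed = True
        return len(g) == 1

    # A simple loop: entry -> a, a -> b, b -> a, b -> exit  (reducible)
    print(t1_t2_reduce({'entry': {'a'}, 'a': {'b'},
                        'b': {'a', 'exit'}, 'exit': set()}, 'entry'))   # True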

Structural Analysis
- A refinement of interval analysis
- Advantages compared to standard iterative data flow analysis:
  - Uses specialized flow functions for recognized structures, which are much faster
  - Data flow equations are determined by the syntax and semantics of the (source) language
- Recognizes more structures than standard interval analysis

Region Types
- Blocks
- If-then
- If-then-else
- Case/switch
- Self loop
- While loop
- Natural loop
- Improper interval
- Proper interval