Loops! while a.runs() loop { while b.runs() loop c.foo() pool; b.reset(); } pool

Similar documents
More Loop Unrolling and Vectorization

Flow Graph Theory. Depth-First Ordering Efficiency of Iterative Algorithms Reducible Flow Graphs

Lecture 8: Induction Variable Optimizations

Lecture 4. More on Data Flow: Constant Propagation, Speed, Loops

Statements or Basic Blocks (Maximal sequence of code with branching only allowed at end) Possible transfer of control

Lecture 2: Control Flow Analysis

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

Module 14: Approaches to Control Flow Analysis Lecture 27: Algorithm and Interval. The Lecture Contains: Algorithm to Find Dominators.

Compiler Structure. Data Flow Analysis. Control-Flow Graph. Available Expressions. Data Flow Facts

Intermediate representation

ECE 5775 High-Level Digital Design Automation Fall More CFG Static Single Assignment

Control Flow Analysis

Control Flow Graphs. (From Matthew S. Hetch. Flow Analysis of Computer Programs. North Holland ).

Introduction to Machine-Independent Optimizations - 4

Lecture 9: Loop Invariant Computation and Code Motion

HW/SW Codesign. WCET Analysis

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection

Compiler Construction 2016/2017 Loop Optimizations

Control-Flow Analysis

EECS 213 Introduction to Computer Systems Dinda, Spring Homework 3. Memory and Cache

Control Flow Analysis

CS521 \ Notes for the Final Exam

Example (cont.): Control Flow Graph. 2. Interval analysis (recursive) 3. Structural analysis (recursive)

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210

Code Genera*on for Control Flow Constructs

Processors, Performance, and Profiling

Graph Algorithms Using Depth First Search

Compiler Construction 2010/2011 Loop Optimizations

MIPS function continued

1. O(log n), because at worst you will only need to downheap the height of the heap.

Loop Invariant Code Motion. Background: ud- and du-chains. Upward Exposed Uses. Identifying Loop Invariant Code. Last Time Control flow analysis

COL351: Analysis and Design of Algorithms (CSE, IITD, Semester-I ) Name: Entry number:

Translating JVM Code to MIPS Code 1 / 43

Induction Variable Identification (cont)

CSC D70: Compiler Optimization Register Allocation

Compiler Construction 2009/2010 SSA Static Single Assignment Form

Register Allocation. Stanford University CS243 Winter 2006 Wei Li 1

CSE P 501 Compilers. Loops Hal Perkins Spring UW CSE P 501 Spring 2018 U-1

Review; questions Basic Analyses (2) Assign (see Schedule for links)

Tail Recursion. ;; a recursive program for factorial (define fact (lambda (m) ;; m is non-negative (if (= m 0) 1 (* m (fact (- m 1))))))

Algorithms (IX) Guoqiang Li. School of Software, Shanghai Jiao Tong University

Acknowledgement. Flow Graph Theory. Depth First Search. Speeding up DFA 8/16/2016

Registers. Registers

Why Global Dataflow Analysis?

CIS 121 Data Structures and Algorithms with Java Spring Code Snippets and Recurrences Monday, January 29/Tuesday, January 30

Code Quality Analyzer (CQA)

ECE 5775 (Fall 17) High-Level Digital Design Automation. Static Single Assignment

Contents of Lecture 2

Elementary Graph Algorithms CSE 6331

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

(Refer Slide Time: 05:25)

CS781 Lecture 2 January 13, Graph Traversals, Search, and Ordering

Stackless BVH Collision Detection for Physical Simulation

Searching in Graphs (cut points)

Chapter 3. Graphs. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Control flow graphs and loop optimizations. Thursday, October 24, 13

Simone Campanoni Loop transformations

Lecture 23 CIS 341: COMPILERS

COMP 250 Fall recurrences 2 Oct. 13, 2017

8 Optimisation. 8.2 Machine-Independent Optimisation

R13. II B. Tech I Semester Supplementary Examinations, May/June DATA STRUCTURES (Com. to ECE, CSE, EIE, IT, ECC)

Chapter 3. Graphs CLRS Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Control Flow Analysis & Def-Use. Hwansoo Han

3.1 Basic Definitions and Applications. Chapter 3. Graphs. Undirected Graphs. Some Graph Applications

Overview. Constructors and destructors Virtual functions Single inheritance Multiple inheritance RTTI Templates Exceptions Operator Overloading

INSTITUTE OF AERONAUTICAL ENGINEERING

Tree. A path is a connected sequence of edges. A tree topology is acyclic there is no loop.

mywbut.com Informed Search Strategies-I

Principles of Computer Science

EECS 583 Class 3 More on loops, Region Formation

Graph. Vertex. edge. Directed Graph. Undirected Graph

Topic I (d): Static Single Assignment Form (SSA)

M-ary Search Tree. B-Trees. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. Maximum branching factor of M Complete tree has height =

Assignment 3 Suggested solutions

Control Flow Analysis. Reading & Topics. Optimization Overview CS2210. Muchnick: chapter 7

SSA. Stanford University CS243 Winter 2006 Wei Li 1

CSC D70: Compiler Optimization LICM: Loop Invariant Code Motion

3.1 Basic Definitions and Applications

How do we mark reachable objects?

Fall 2011 Prof. Hyesoon Kim

Architecture and components of Computer System Execution of program instructions

ECE 571 Advanced Microprocessor-Based Design Lecture 7

Advanced Set Representation Methods

Solutions. Proof of halting: Denote the size of the search set by n. By definition of traversable set, the loop header runs at most n + 1 times.

Data Structures. 1. Each entry in a linked list is a called a (a)link (b)node (c)data structure (d)none of the above

Week - 04 Lecture - 02 Merge Sort, Analysis

An example of optimization in LLVM. Compiler construction Step 1: Naive translation to LLVM. Step 2: Translating to SSA form (opt -mem2reg)

Lecture 18 List Scheduling & Global Scheduling Carnegie Mellon

CMSC430 Spring 2014 Midterm 2 Solutions

We assume uniform hashing (UH):

UCS-406 (Data Structure) Lab Assignment-1 (2 weeks)

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss)

Loop Optimizations. Outline. Loop Invariant Code Motion. Induction Variables. Loop Invariant Code Motion. Loop Invariant Code Motion

Finding the Depth of a Flow Graph*

Data Dependence Analysis

Chapter 04: Instruction Sets and the Processor organizations. Lesson 18: Stack-based processor Organisation

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Week 02 Module 06 Lecture - 14 Merge Sort: Analysis

Trees Algorhyme by Radia Perlman

Spring 2009 Prof. Hyesoon Kim

Liveness Analysis and Register Allocation

Transcription:

Loops

Loops! while a.runs() loop { while b.runs() loop c.foo() pool; b.reset(); } pool

Not a Loop! if a.iseven() then { Even: b.foo(); goto Odd; } else { Odd: b.bar(); goto Even; }

Optimizing Loops Most program time is spent in loops. Otherwise, run time would be roughly proportional to program length. Not-a-loop cycles are rare in practice, even with goto. Programmers tend not to think that way. How would you normally write the code on the previous slide?

Detecting Loops: Overview Natural Loops : Entry node ( header ) that dominates all nodes in loop. Back edge from within loop body to header. Independent of how loop is written syntactically. Same handling for for-loops, while-loops, etc. In practice, many uses of goto form natural loops. Loop detection largely cribbed shamelessly from Jeffrey Ullman s slides at: http://infolab.stanford.edu/~ullman/dragon/w06/lectures/dfa3.pdf

Dominators Revisited X dominates Y (X Y) Every path to Y goes through X. Note: X X X strictly dominates Y (X > Y) X Y, but X Y. Direction: Forward Values: Sets of CFG nodes. v ENTRY = {ENTRY} Initial value = N Meet operator: Transfer function: f B x = x {B}

Kinds of Edges Defined relative to Depth-First Spanning Tree of CFG. 1. Tree edges. 2. Advancing edges: Node to proper descendent (includes tree edges). 3. Retreating edges: Node to ancestor (including self). 4. Cross edges: No ancestor relationship between nodes.

DFS Tree Edges Example

DFS Tree Edges Example Tree Edges 4 1 2 Label nodes from N to 1 after processing all children. 5 3

DFS Tree Edges Example Retreating Edges 1 Advancing Edge 4 2 5 3 Cross Edge

DFS Tree Edges Example 4 Retreating Edges 1 2 Retreating edges always go from high to low in DFS order. 5 3

Back Edges, Reducibility, and Depth An edge is a back edge if its head dominates its tail. Back edges are retreating edges. A graph is reducible iff all retreating edges are back edges. The depth of a CFG is the maximum number of retreating edges on any acyclic path. For reducible graphs, depth is fixed regardless of order of visiting children.

Depth Example 1 Depth = 2 2 3 4 5

Loop Detection Algorithm For each back edge n d: loop {n, d} Mark d as visited. For each node n in DFS of reverse edges from n: loop n loop 1 2 3 4 5

Loop Detection Algorithm For each back edge n d: loop {n, d} Mark d as visited. For each node n in DFS of reverse edges from n: loop n loop 1 2 3 4 5 Edge: 4 1 loop = {1, 4}

Loop Detection Algorithm For each back edge n d: loop {n, d} Mark d as visited. For each node n in DFS of reverse edges from n: loop n loop 1 2 3 4 5 Edge: 4 1 loop = {1, 4}

Loop Detection Algorithm For each back edge n d: loop {n, d} Mark d as visited. For each node n in DFS of reverse edges from n: loop n loop 1 2 3 4 5 Edge: 4 1 loop = {1, 2, 4}

Loop Detection Algorithm For each back edge n d: loop {n, d} Mark d as visited. For each node n in DFS of reverse edges from n: loop n loop 1 2 3 4 5 Edge: 4 1 loop = {1, 2, 3, 4}

Loop Detection Algorithm For each back edge n d: loop {n, d} Mark d as visited. For each node n in DFS of reverse edges from n: loop n loop 1 2 3 4 5 Edge: 3 2 loop = {2, 3}

Loop Detection Algorithm For each back edge n d: loop {n, d} Mark d as visited. For each node n in DFS of reverse edges from n: loop n loop 1 2 3 4 5 Edge: 3 2 loop = {2, 3}

Loop Detection Algorithm For each back edge n d: loop {n, d} Mark d as visited. For each node n in DFS of reverse edges from n: loop n loop 1 2 3 4 5 Found 2 loops: A: {1, 2, 3, 4} B: {2, 3} Since B A, we know A contains B. B is innermost loop.

Overlapping Loops 1 2 4 3 Loops: A: {1, 2} B: {1, 2, 3} C: {1, 2, 4} Merge B and C: BC: {1, 2, 3, 4} BC contains A.

Loop Unrolling

Loop Example li r0 <- 0 syscall IO.in_int li r2 <- 0 li r3 <- 1 L1: ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 jmp L1 L2: mov r1 <- r2 syscall IO.out_int >./cool --profile test.cl-asm 8191 33542145 PROFILE: instructions = 32774 PROFILE: pushes and pops = 0 PROFILE: cache hits = 0 PROFILE: cache misses = 15 PROFILE: branch predictions = 16382 PROFILE: branch mispredictions = 1 PROFILE: multiplications = 0 PROFILE: divisions = 0 PROFILE: system calls = 4 CYCLES: 38294

Loop Example li r3 <- 1 L1: ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 > jmp L1 L2: mov r1 <- r2 syscall IO.out_int

Loop Example li r3 <- 1 L1: ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 jmp L1 L2: mov r1 <- r2 syscall IO.out_int >./cool --profile test.cl-asm 8191 33542145 PROFILE: instructions = 28678 PROFILE: pushes and pops = 0 PROFILE: cache hits = 0 PROFILE: cache misses = 18 PROFILE: branch predictions = 12286 PROFILE: branch mispredictions = 1 PROFILE: multiplications = 0 PROFILE: divisions = 0 PROFILE: system calls = 4 CYCLES: 34498

Loop Example li r3 <- 1 L1: ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 jmp L1 L2: mov r1 <- r2 syscall IO.out_int >./cool --profile test.cl-asm 8191 33542145 12% fewer PROFILE: instructions instructions = 28678 PROFILE: pushes and pops = 0 PROFILE: cache hits = 0 PROFILE: cache misses = 18 PROFILE: branch predictions = 12286 PROFILE: branch mispredictions = 1 PROFILE: multiplications = 0 PROFILE: 10% divisions fewer = 0 PROFILE: system cycles calls = 4 CYCLES: 34498

Loop Example li r3 <- 1 L1: ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 jmp L1 L2: mov r1 <- r2 syscall IO.out_int Can we remove this? >./cool --profile test.cl-asm 8191 33542145 PROFILE: instructions = 28678 PROFILE: pushes and pops = 0 PROFILE: cache hits = 0 PROFILE: cache misses = 18 PROFILE: branch predictions = 12286 PROFILE: branch mispredictions = 1 PROFILE: multiplications = 0 PROFILE: divisions = 0 PROFILE: system calls = 4 CYCLES: 34498

Loop Example li r3 <- 1 L1: ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 add r2 <- r2 r0 add r0 <- r0 r3 jmp L1 L2: mov r1 <- r2 syscall IO.out_int >./cool --profile test.cl-asm 8191 33550336 PROFILE: instructions = 24586 PROFILE: pushes and pops = 0 PROFILE: cache hits = 0 PROFILE: cache misses = 17 PROFILE: branch predictions = 8192 PROFILE: branch mispredictions = 1 PROFILE: multiplications = 0 PROFILE: divisions = 0 PROFILE: system calls = 4 CYCLES: 30306

Loop Example li r3 <- 1 L1: ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 add r2 <- r2 r0 add r0 <- r0 r3 jmp L1 L2: mov r1 <- r2 syscall IO.out_int >./cool --profile test.cl-asm 8191 33550336 PROFILE: instructions = 24586 PROFILE: pushes and pops = 0 PROFILE: cache hits = 0 PROFILE: cache misses = 17 PROFILE: branch predictions = 8192 PROFILE: branch mispredictions = 1 PROFILE: multiplications = 0 PROFILE: 21% divisions fewer = 0 PROFILE: system cycles calls = 4 CYCLES: 30306

Loop Example li r3 <- 1 L1: ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 add r2 <- r2 r0 add r0 <- r0 r3 jmp L1 L2: mov r1 <- r2 syscall IO.out_int >./cool --profile test.cl-asm 8191 33550336 PROFILE: Should instructions be = 24586 PROFILE: pushes and pops = 0 33542145! PROFILE: cache hits = 0 PROFILE: cache misses = 17 PROFILE: branch predictions = 8192 PROFILE: branch mispredictions = 1 PROFILE: multiplications = 0 PROFILE: divisions = 0 PROFILE: system calls = 4 CYCLES: 30306

Loop Example li r3 <- 1 li r4 <- 2 div r5 <- r1 r4 mul r6 <- r5 r4 beq r6 r1 L1 add r2 <- r2 r0 add r0 <- r0 r3 L1: ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 > Handle odd number of iterations. On x86 use mod instruction.

Loop Example li r3 <- 1 li r4 <- 2 div r5 <- r1 r4 mul r6 <- r5 r4 beq r6 r1 L1 add r2 <- r2 r0 add r0 <- r0 r3 L1: ble r1 r0 L2 add r2 <- r2 r0 add r0 <- r0 r3 >./cool test.cl-asm --profile 8191 33542145 PROFILE: Right instructions = 24586 PROFILE: pushes and pops = 0 answer! PROFILE: cache hits = 0 PROFILE: cache misses = 23 PROFILE: branch predictions = 8191 PROFILE: branch mispredictions = 1 PROFILE: multiplications = 1 PROFILE: 19% divisions fewer = 1 PROFILE: system cycles calls = 4 CYCLES: 30956

Bonus Material

Induction Variables Knowing loop bounds would help remove loop instructions. Many loop indices are affine expressions of program variables. E.g., c 0 + c 1 v 1 + c 2 v 2 Induction variables: affine expressions of number of iterations. I.e., c 0 + c 1 i Symbolic analysis can learn induction variables.

Affine Expression Example for (int m = 10; m < 20; m++) { x = m * 3; a = foo(x); y = a + 10; } m =? x =? a =? y =?

Affine Expression Example for (int m = 10; m < 20; m++) { x = m * 3; a = foo(x); y = a + 10; } m = i + 10 x = 3i + 30 a =? y = a + 10

Data-Flow Analysis for Affine Expressions Values: (unknown), affine expression, or (not affine). Let f(m) be a function to look up variables in the current data-flow value m. Meet operator: f 1 f 2 m v = f 1 m v, f 1 m v = f 2 m (v), otherwise

Data-Flow Analysis for Affine Expressions Transfer functions: For assignment statements to x when (c 1 = 0 or y = ) and (c 2 = 0 or z = ): m v, f s m x = v x c 0 + c 1 m y + c 2 m z, x c 0 + c 1 y + c 2 z, otherwise Otherwise, f s = I Composition: f 2 f 1 = substitute values from f 1 into f 2.

Handling Iteration Let f i denote composing f with itself i times. Basic induction variables: If f m x = m x + c, f i m x = m x + ci Symbolic constants: If f m x = m x, f i m x = m(x)

Handling Iteration Let f i denote composing f with itself i times. Induction variables (if x 1 are basic induction variables or symbolic constants): If f m x = c 0 + c 1 m x 1 +, f i m x = c 0 + c 1 f i m x 1 Otherwise, f i m x =.

Symbolic Analysis for Affine Expressions Start with innermost loops and work outward.