CS 406/534 Compiler Construction Instruction Scheduling
|
|
- Gavin Conley
- 5 years ago
- Views:
Transcription
1 CS 406/534 Compiler Construction Instruction Scheduling Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy and Dr. Linda Torczon s teaching materials at Rice University. All rights reserved. 1
2 Administravia Lab2 due Class presentation by student today Lab3 has been posted on the web Due on 12/5 in 3 weeks Instruction scheduling for single basic block Implement 2 algorithms and compare results Lecture today covers the necessary background simulator and test files in ~cs406/lab3 CS406/534 Fall 2004, Prof. Li Xu 2 2
3 What We Did Last Time Procedure activation record and run-time structures for OO languages Change of schedule today to jump start lab3 Back to code generation in the next class CS406/534 Fall 2004, Prof. Li Xu 3 3
4 Today s Goals Instruction scheduling Motivation List scheduling Scheduling for larger regions CS406/534 Fall 2004, Prof. Li Xu 4 4
5 The Big Picture Source Code Front End IR Middle IR Back End Machine code Errors So far, we have covered the front-end Parsing, intermediate representation Middle end Data flow analysis and optimization, later Back end Generate machine code and deal with processor architecture Instruction scheduling today CS406/534 Fall 2004, Prof. Li Xu 5 5
6 The Big Picture IR Instruction Selection IR Register Allocation IR Instruction Scheduling Machine code Errors Back end has become increasingly important due to hardware architectures Multiple-issue, deep pipelining in modern micro-architecture Memory hierarchy to deal with CPU-memory gap Critical for application performance on modern machines CS406/534 Fall 2004, Prof. Li Xu 6 6
7 Recap from Computer Architecture Instruction-level parallelism Execute multiple instructions in parallel Employ multiple functional units and pipelining Typical paradigms Out-of-order superscalar Hardware dynamically reorders instructions Compiler scheduling can boost performance Examples: alpha, sparc, mips and other RISCs, even Pentium VLIW Compiler statically schedule parallel instructions into a long instruction word Depends on compiler to generate correct code Not just performance issues Examples: Intel Itanium IA-64, TI C6xxx CS406/534 Fall 2004, Prof. Li Xu 7 7
8 Motivation Instruction Scheduling Operations have non-uniform latencies due to pipelining Multiple-issue arch can issue several operations per cycle As a result, execution time is order-dependent Instruction Scheduling: decide on instruction execution order to minimize execution cycles Assumed latencies (conservative) Operation Cycles Loads & stores may or may not block load 3 > Non-blocking fill those issue slots store 3 Branch costs vary with path taken loadi 1 Branches typically have delay slots add 1 > Fill slots with unrelated operations mult 2 > Percolates branch upward fadd 1 Scheduler should hide the latencies fmult 2 shift 1 branch 0 to 8 Lab 3 will build a local scheduler CS406/534 Fall 2004, Prof. Li Xu 8 8
9 Example w w * 2 * x * y * z Simple schedule Schedule loads early 1 loadai r0,@w r1 4 add r1,r1 r1 5 loadai r0,@x r2 8 mult r1,r2 r1 9 loadai r0,@y r2 12 mult r1,r2 r1 13 loadai r0,@z r2 16 mult r1,r2 r1 18 storeai r1 r0,@w 21 r1 is free 1 loadai r0,@w r1 2 loadai r0,@x r2 3 loadai r0,@y r3 4 add r1,r1 r1 5 mult r1,r2 r1 6 loadai r0,@z r2 7 mult r1,r3 r1 9 mult r1,r2 r1 11 storeai r1 r0,@w 14 r1 is free 2 registers, 20 cycles 3 registers, 13 cycles Reordering operations for speed is called instruction scheduling CS406/534 Fall 2004, Prof. Li Xu 9 9
10 The Problem Instruction Scheduling Given a code fragment for some target machine and the latencies for each individual operation, reorder the operations to minimize execution time The Concept Machine description The task Produce correct code Minimize execution cycles slow code Scheduler fast code Avoid spilling registers Operate efficiently CS406/534 Fall 2004, Prof. Li Xu 10 10
11 Instruction Scheduling To capture properties of the code, build a precedence graph G Node n G are operations with type(n) and delay(n) An edge e = (n 1,n 2 ) G if & only if n 2 uses the result of n 1 a: loadai r0,@w r1 b: add r1,r1 r1 c: loadai r0,@x r2 d: mult r1,r2 r1 e: loadai r0,@y r2 f: mult r1,r2 r1 g: loadai r0,@z r2 h: mult r1,r2 r1 i: storeai r1 r0,@w a b d f c h i e g The Code The Precedence Graph CS406/534 Fall 2004, Prof. Li Xu 11 11
12 Instruction Scheduling A correct schedule S maps each n N into a non-negative integer representing its cycle number, and 1. S(n) 0, for all n N 2. If (n 1,n 2 ) E, S(n 1 ) + delay(n 1 ) S(n 2 ) 3. For each type t, there are no more operations of type t in any cycle than the target machine can issue The length of a schedule S, denoted L(S), is L(S) = max n N (S(n) + delay(n)) The goal is to find the shortest possible correct schedule. S is time-optimal if L(S) L(S 1 ), for all other schedules S 1 A schedule might also be optimal in terms of registers, power, or space. CS406/534 Fall 2004, Prof. Li Xu 12 12
13 Critical Points Instruction Scheduling All operands must be available Multiple operations can be ready Moving operations can lengthen register lifetimes Placing uses near definitions can shorten register lifetimes Operands can have multiple predecessors Together, these issues make scheduling hard (NP-Complete) Local scheduling is the simple case Restricted to straight-line code Consistent and predictable latencies CS406/534 Fall 2004, Prof. Li Xu 13 13
14 Instruction Scheduling The big picture 1. Build a precedence graph, P 2. Compute a priority function over the nodes in P 3. Use list scheduling to construct a schedule, one cycle at a time a. Use a queue of operations that are ready b. At each cycle I. Choose a ready operation and schedule it II. Update the ready queue Local list scheduling The dominant algorithm for twenty years A greedy, heuristic, local technique CS406/534 Fall 2004, Prof. Li Xu 14 14
15 Local List Scheduling Cycle 1 Ready leaves of P Active Ø while (Ready Active Ø) if (Ready Ø) then remove an op from Ready S(op) Cycle Active Active op Cycle Cycle + 1 for each op Active if (S(op) + delay(op) Cycle) then remove op from Active for each successor s of op in P if (s is ready) then Ready Ready s Removal in priority order op has completed execution If successor s operands are ready, put it on Ready CS406/534 Fall 2004, Prof. Li Xu 15 15
16 Scheduling Example 1. Build the precedence graph a: loadai r1 b: add r1,r1 r1 c: loadai r2 d: mult r1,r2 r1 e: loadai r2 f: mult r1,r2 r1 g: loadai r2 h: mult r1,r2 r1 i: storeai r1 a b d f c h i e g The Code The Precedence Graph CS406/534 Fall 2004, Prof. Li Xu 16 16
17 Scheduling Example 1. Build the precedence graph 2. Determine priorities: longest latency-weighted path a: loadai r1 b: add r1,r1 r1 c: loadai r2 d: mult r1,r2 r1 e: loadai r2 f: mult r1,r2 r1 g: loadai r2 h: mult r1,r2 r1 i: storeai r a b 9 d 7 12 c 10 e 8 g f 5 h 3 i The Code The Precedence Graph CS406/534 Fall 2004, Prof. Li Xu 17 17
18 Scheduling Example 1. Build the precedence graph 2. Determine priorities: longest latency-weighted path 3. Perform list scheduling New register name used 1) a: loadai r1 2) c: loadai r2 3) e: loadai r3 4) b: add r1,r1 r1 5) d: mult r1,r2 r1 6) g: loadai r2 7) f: mult r1,r3 r1 9) h: mult r1,r2 r1 11) i: storeai r a b 9 d 7 12 c 10 e 8 g f 5 h 3 i The Code The Precedence Graph CS406/534 Fall 2004, Prof. Li Xu 18 18
19 Detailed Scheduling Algorithm Idea: Keep a collection of worklists W[c], one per cycle We need MaxC = max delay + 1 such worklists Code: for each n N do begin count[n] := 0; earliest[n] = 0 end for each (n1,n2) E do begin count[n2] := count[n2] + 1; successors[n1] := successors[n1] {n2}; end for i := 0 to MaxC 1 do W[i] := ; Wcount := 0; for each n N do if count[n] = 0 then begin W[0] := W[0] {n}; Wcount := Wcount + 1; end c := 0; // c is the cycle number cw := 0;// cw is the number of the worklist for cycle c instr[c] := ; CS406/534 Fall 2004, Prof. Li Xu 19 19
20 Detailed Scheduling Algorithm while Wcount > 0 do begin while W[cW] = do begin c := c + 1; instr[c] := ; cw := mod(cw+1,maxc); end nextc := mod(c+1,maxc); while W[cW] do begin Priority select and remove an arbitrary instruction x from W[cW]; if free issue units of type(x) on cycle c then begin instr[c] := instr[c] {x}; Wcount := Wcount - 1; for each y successors[x] do begin count[y] := count[y] 1; earliest[y] := max(earliest[y], c+delay(x)); if count[y] = 0 then begin loc := mod(earliest[y],maxc); W[loc] := W[loc] {y}; Wcount := Wcount + 1; end end else W[nextc] := W[nextc] {x}; end end CS406/534 Fall 2004, Prof. Li Xu 20 20
21 More List Scheduling List scheduling breaks down into two distinct classes Forward list scheduling Start with available operations Work forward in time Ready all operands available Backward list scheduling Start with no successors Work backward in time Ready latency covers uses Variations on list scheduling Prioritize critical path(s) Schedule last use as soon as possible Depth first in precedence graph (minimize registers) Breadth first in precedence graph (minimize interlocks) Prefer operation with most successors CS406/534 Fall 2004, Prof. Li Xu 21 21
22 Local Scheduling As long as we stay within a single block List scheduling does well Problem is hard, so tie-breaking matters More descendants in dependence graph Prefer operation with a last use over one with none Breadth first makes progress on all paths Tends toward more ILP & fewer interlocks Depth first tries to complete uses of a value Tends to use fewer registers Classic work on this is Gibbons & Muchnick CS406/534 Fall 2004, Prof. Li Xu 22 22
23 Local Scheduling Forward and backward can produce different results loadi 1 lshift loadi 2 loadi 3 loadi add 1 add 2 add 3 add 4 addi Latency to the cbr Subscript to identify cmp store 1 store 2 store 3 store 4 store 5 1 cbr Block from SPEC benchmark go Operation load loadi add addi store cmp Latency CS406/534 Fall 2004, Prof. Li Xu 23 23
24 F o r w a r d S c h e d u l e Int loadi 1 loadi 2 loadi 4 add 2 add 4 cmp cbr Local Scheduling Int Mem Int B lshift a 1 loadi 4 loadi 3 c 2 addi add k 1 3 add w 4 add 3 a 4 add 3 addi store 1 r 5 add 2 d store 2 6 add 1 store 3 S 7 store 4 c 8 h store 5 9 e d 10 u 11 cmp l e 12 cbr 13 Using latency to root as the priority Mem CS406/534 Fall 2004, Prof. Li Xu 24 Int lshift loadi 3 loadi 2 loadi 1 store 5 store 4 store 3 store 2 store 1 24
25 Schielke s RBF algorithm Local Scheduling Run 5 passes of forward list scheduling and 5 passes of backward list scheduling Break each tie randomly Keep the best schedule Shortest time to completion Other metrics are possible (shortest time + fewest registers) Randomized Backward & Forward In practice, this does very well CS406/534 Fall 2004, Prof. Li Xu 25 25
26 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs B 1 a b c d B 2 e f B 3 g B 4 h i B 5 j k B 6 l CS406/534 Fall 2004, Prof. Li Xu 26 26
27 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts Moving an op out of B 1 causes problems B 2 e f B 1 a b c d B 3 g B 4 h i B 5 j k B 6 l CS406/534 Fall 2004, Prof. Li Xu 27 27
28 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts Moving an op out of B 1 causes problems B 2 c,e f B 1 a b c d B 3 no c here! g B 4 h i B 5 j k B 6 l CS406/534 Fall 2004, Prof. Li Xu 28 28
29 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts Moving an op out of B 1 causes problems Must insert compensation code in B 3 Increases code space B 4 h i B 2 c,e f B 1 B 5 a b c d j k B 3 This one wasn t done for speed! c g B 6 l CS406/534 Fall 2004, Prof. Li Xu 29 29
30 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts Moving an op into B 1 causes problems B 4 h i B 2 e f B 1 B 5 a b c d j k B 3 g B 6 l CS406/534 Fall 2004, Prof. Li Xu 30 30
31 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts Moving an op into B 1 causes problems Lengthens {B 1,B 3 } B 4 Adds computation to {B 1,B 3 } May need compensation code, too Renaming may avoid undof h i B 2 e f B 1 B 5 a b c d,f j k B 3 This makes the path even longer! undo f g B 6 l CS406/534 Fall 2004, Prof. Li Xu 31 31
32 Scheduling Larger Regions Superlocal Scheduling How much can we get? Schielke saw 11 to 12% speed ups Constrained away compensation code B 2 e f B 1 a b c d B 3 g B 4 h i B 5 j k B 6 l Value numbering is the best case for superlocal scope CS406/534 Fall 2004, Prof. Li Xu 32 32
33 Scheduling Larger Regions More Aggressive Superlocal Scheduling Clone blocks to create more context B 1 a b c d B 2 e f B 3 g Join points create blocks that must work in multiple contexts B 4 h i B 5 j k 2 paths B 6 l 3 paths CS406/534 Fall 2004, Prof. Li Xu 33 33
34 Scheduling Larger Regions More Aggressive Superlocal Scheduling Clone blocks to create more context Some blocks can combine Single successor, single predecessor B 1 a b c d B 2 e f B 3 g B 4 h i B 5a j k B 5b j k B 6a l B 6b l B 6c l CS406/534 Fall 2004, Prof. Li Xu 34 34
35 Scheduling Larger Regions More Aggressive Superlocal Scheduling Clone blocks to create more context Some blocks can combine Single successor, single predecessor B 1 a b c d B 2 e f B 3 g B 4 h i B 5a j k B 5b j k B 6a l B 6b l B 6c l CS406/534 Fall 2004, Prof. Li Xu 35 35
36 Scheduling Larger Regions More Aggressive Superlocal Scheduling Clone blocks to create more context Some blocks can combine Single successor, single predecessor Now schedule EBBs {B 1,B 2,B 4 }, {B 1,B 2,B 5q }, {B 1,B 3,B 5b } Pay heed to compensation code B 2 e f B 1 a b c d B 3 g Works well for forward motion Backward motion still has off-path problems Speeding up one path can slow down others (undo) B 4 h i l B 5a j k l B 5b j k l CS406/534 Fall 2004, Prof. Li Xu 36 36
37 Scheduling Larger Regions Trace Scheduling Start with execution counts for edges Obtained by profiling B 1 a b c d B 2 e f B 3 g B 4 h i B 5 j k B 6 l CS406/534 Fall 2004, Prof. Li Xu 37 37
38 Scheduling Larger Regions Trace Scheduling Start with execution counts for edges Obtained by profiling Pick the hot path B 1 7 a b c d 10 3 B 2 e f B 3 g B 4 h i B 5 j k 5 B 6 l 5 Block counts could mislead us see B 5 CS406/534 Fall 2004, Prof. Li Xu 38 38
39 Scheduling Larger Regions Trace Scheduling Start with execution counts for edges Obtained by profiling Pick the hot path B 1,B 2,B 4,B 6 Schedule it Compensation code in B 3,B 5 if needed Get the hot path right! B 4 If we picked the right path, the other blocks do not matter as much Places a premium on quality profiles h i B B 6 e f l B 1 7 B a b c d j k 10 B g CS406/534 Fall 2004, Prof. Li Xu 39 39
40 Scheduling Larger Regions Trace Scheduling Entire CFG Pick & schedule hot path Insert compensation code Remove hot path from CFG Repeat the process until CFG is empty B 1 7 a b c d 10 3 B 2 e f B 3 g Idea Hot paths matter Farther off hot path, less it matters B 4 h i 5 5 B 6 l B j k 3 CS406/534 Fall 2004, Prof. Li Xu 40 40
41 Interaction with Register Allocation Dependence constraint due to register names True dependence (a: =>r1 b:,r1 =>) Anti dependence (a: r1 => b: => r1) Output dependence (a: =>r1 b: => r1) Renaming can remove anti and output dependence Usually increase register pressure Solution: pre- and post-scheduling CS406/534 Fall 2004, Prof. Li Xu 41 41
42 Lab 3 Static schedule machine model Run iloc_sim s 3 <file.i for non-scheduled code Implement two schedulers for basic blocks in ILOC One must be list scheduling with specified priority Other can be a second priority, or a different algorithm Same ILOC subset as in lab 1 (plus NOP) Different execution model Two asymmetric functional units Latencies different from lab 1 Simulator different from lab 1 CS406/534 Fall 2004, Prof. Li Xu 42 42
43 Lab 3 Specific questions 1. Are we in over our heads? No. The hard part of this lab should be trying to get good code. The programming is not bad; you can reuse some stuff from lab 1. You have all the specifications & tools you need. Jump in and start programming. 2. How many registers can we use? Assume that you have as many registers as you need. If the simulator runs out, we ll get more. Rename registers to avoid false dependences & conflicts. 3. What about a store followed by a load? If you can show that the two operations must refer to different memory locations, the scheduler can overlap their execution. Otherwise, the store must complete before the load issues. 4. What about test files? We will put some test files online. We will test your lab on files you do not see. We have had problems (in the past) with people optimizing their labs for the test data. (Not that you would do that!) CS406/534 Fall 2004, Prof. Li Xu 43 43
44 Lab 3 - Hints Begin by renaming registers to eliminate false dependencies Pay attention to loads and stores When can they be reordered? Understand the simulator Inserting NOPs may be necessary to get correct code Tiebreakers can make a difference CS406/534 Fall 2004, Prof. Li Xu 44 44
45 Summary Instruction scheduling List scheduling Scheduling larger regions CS406/534 Fall 2004, Prof. Li Xu 45 45
46 Next Class Code shape Code generation CS406/534 Fall 2004, Prof. Li Xu 46 46
CS415 Compilers. Instruction Scheduling and Lexical Analysis
CS415 Compilers Instruction Scheduling and Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Instruction Scheduling (Engineer
More informationIntroduction to Optimization, Instruction Selection and Scheduling, and Register Allocation
Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Traditional Three-pass Compiler
More informationInstruction Selection and Scheduling
Instruction Selection and Scheduling The Problem Writing a compiler is a lot of work Would like to reuse components whenever possible Would like to automate construction of components Front End Middle
More informationfast code (preserve flow of data)
Instruction scheduling: The engineer s view The problem Given a code fragment for some target machine and the latencies for each individual instruction, reorder the instructions to minimize execution time
More informationInstruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators.
Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators Comp 412 COMP 412 FALL 2016 source code IR Front End Optimizer Back
More informationk register IR Register Allocation IR Instruction Scheduling n), maybe O(n 2 ), but not O(2 n ) k register code
Register Allocation Part of the compiler s back end IR Instruction Selection m register IR Register Allocation k register IR Instruction Scheduling Machine code Errors Critical properties Produce correct
More informationCS 406/534 Compiler Construction Putting It All Together
CS 406/534 Compiler Construction Putting It All Together Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy
More informationInstruction Scheduling
Instruction Scheduling Michael O Boyle February, 2014 1 Course Structure Introduction and Recap Course Work Scalar optimisation and dataflow L5 Code generation L6 Instruction scheduling Next register allocation
More informationCS 406/534 Compiler Construction Instruction Scheduling beyond Basic Block and Code Generation
CS 406/534 Compiler Construction Instruction Scheduling beyond Basic Block and Code Generation Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on
More informationCS415 Compilers Register Allocation. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University
CS415 Compilers Register Allocation These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Review: The Back End IR Instruction Selection IR Register
More informationLab 3, Tutorial 1 Comp 412
COMP 412 FALL 2018 Lab 3, Tutorial 1 Comp 412 source code IR IR Front End Optimizer Back End target code Copyright 2018, Keith D. Cooper, Linda Torczon & Zoran Budimlić, all rights reserved. Students enrolled
More informationLecture 18 List Scheduling & Global Scheduling Carnegie Mellon
Lecture 18 List Scheduling & Global Scheduling Reading: Chapter 10.3-10.4 1 Review: The Ideal Scheduling Outcome What prevents us from achieving this ideal? Before After Time 1 cycle N cycles 2 Review:
More informationCS5363 Final Review. cs5363 1
CS5363 Final Review cs5363 1 Programming language implementation Programming languages Tools for describing data and algorithms Instructing machines what to do Communicate between computers and programmers
More informationReview: The Ideal Scheduling Outcome. List Scheduling. Scheduling Roadmap. Review: Scheduling Constraints. Todd C. Mowry CS745: Optimizing Compilers
List Scheduling Review: The Ideal Scheduling Outcome Before After Time cycle CS: Optimizing Compilers N cycles What prevents us from achieving this ideal? CS: Instruction Scheduling -- Review: Scheduling
More informationInstruction Selection: Preliminaries. Comp 412
COMP 412 FALL 2017 Instruction Selection: Preliminaries Comp 412 source code Front End Optimizer Back End target code Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled
More informationAdministration CS 412/413. Instruction ordering issues. Simplified architecture model. Examples. Impact of instruction ordering
dministration CS 1/13 Introduction to Compilers and Translators ndrew Myers Cornell University P due in 1 week Optional reading: Muchnick 17 Lecture 30: Instruction scheduling 1 pril 00 1 Impact of instruction
More informationInstruction Selection: Peephole Matching. Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Instruction Selection: Peephole Matching Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. The Problem Writing a compiler is a lot of work Would like to reuse components
More informationCS415 Compilers. Intermediate Represeation & Code Generation
CS415 Compilers Intermediate Represeation & Code Generation These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Review - Types of Intermediate Representations
More informationThe ILOC Simulator User Documentation
The ILOC Simulator User Documentation COMP 412, Fall 2015 Documentation for Lab 1 The ILOC instruction set is taken from the book, Engineering A Compiler, published by the Elsevier Morgan-Kaufmann [1].
More informationCS 406/534 Compiler Construction Instruction Selection and Global Register Allocation
CS 406/534 Compiler Construction Instruction Selection and Global Register Allocation Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith
More informationGlobal Register Allocation via Graph Coloring
Global Register Allocation via Graph Coloring Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission
More informationLecture 7. Instruction Scheduling. I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code)
Lecture 7 Instruction Scheduling I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code) Reading: Chapter 10.3 10.4 CS243: Instruction Scheduling 1 Scheduling Constraints Data dependences
More informationLecture Compiler Backend
Lecture 19-23 Compiler Backend Jianwen Zhu Electrical and Computer Engineering University of Toronto Jianwen Zhu 2009 - P. 1 Backend Tasks Instruction selection Map virtual instructions To machine instructions
More informationThe ILOC Simulator User Documentation
The ILOC Simulator User Documentation Spring 2015 Semester The ILOC instruction set is taken from the book, Engineering A Compiler, published by the Morgan- Kaufmann imprint of Elsevier [1]. The simulator
More informationThe ILOC Simulator User Documentation
The ILOC Simulator User Documentation Comp 506, Spring 2017 The ILOC instruction set is taken from the book, Engineering A Compiler, published by the Morgan- Kaufmann imprint of Elsevier [1]. The simulator
More informationThe ILOC Virtual Machine (Lab 1 Background Material) Comp 412
COMP 12 FALL 20 The ILOC Virtual Machine (Lab 1 Background Material) Comp 12 source code IR Front End OpMmizer Back End IR target code Copyright 20, Keith D. Cooper & Linda Torczon, all rights reserved.
More informationLecture 12: Compiler Backend
CS 515 Programming Language and Compilers I Lecture 1: Compiler Backend (The lectures are based on the slides copyrighted by Keith Cooper and Linda Torczon from Rice University.) Zheng (Eddy) Zhang Rutgers
More informationInstruction Selection, II Tree-pattern matching
Instruction Selection, II Tree-pattern matching Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 4 at Rice University have explicit permission
More informationImplementing Control Flow Constructs Comp 412
COMP 412 FALL 2018 Implementing Control Flow Constructs Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students
More informationLocal Register Allocation (critical content for Lab 2) Comp 412
Updated After Tutorial COMP 412 FALL 2018 Local Register Allocation (critical content for Lab 2) Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda
More informationCS415 Compilers. Lexical Analysis
CS415 Compilers Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 7 1 Announcements First project and second homework
More informationCode Shape Comp 412 COMP 412 FALL Chapters 4, 5, 6 & 7 in EaC2e. source code. IR IR target. code. Front End Optimizer Back End
COMP 412 FALL 2017 Code Shape Comp 412 source code IR IR target Front End Optimizer Back End code Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at
More informationAdvanced Computer Architecture
ECE 563 Advanced Computer Architecture Fall 2010 Lecture 6: VLIW 563 L06.1 Fall 2010 Little s Law Number of Instructions in the pipeline (parallelism) = Throughput * Latency or N T L Throughput per Cycle
More informationLocal Optimization: Value Numbering The Desert Island Optimization. Comp 412 COMP 412 FALL Chapter 8 in EaC2e. target code
COMP 412 FALL 2017 Local Optimization: Value Numbering The Desert Island Optimization Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2017, Keith D. Cooper & Linda Torczon,
More informationData Structures and Algorithms in Compiler Optimization. Comp314 Lecture Dave Peixotto
Data Structures and Algorithms in Compiler Optimization Comp314 Lecture Dave Peixotto 1 What is a compiler Compilers translate between program representations Interpreters evaluate their input to produce
More informationCopyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit
Intermediate Representations Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies
More informationSimple Machine Model. Lectures 14 & 15: Instruction Scheduling. Simple Execution Model. Simple Execution Model
Simple Machine Model Fall 005 Lectures & 5: Instruction Scheduling Instructions are executed in sequence Fetch, decode, execute, store results One instruction at a time For branch instructions, start fetching
More information: Advanced Compiler Design. 8.0 Instruc?on scheduling
6-80: Advanced Compiler Design 8.0 Instruc?on scheduling Thomas R. Gross Computer Science Department ETH Zurich, Switzerland Overview 8. Instruc?on scheduling basics 8. Scheduling for ILP processors 8.
More informationCode Generation. CS 540 George Mason University
Code Generation CS 540 George Mason University Compiler Architecture Intermediate Language Intermediate Language Source language Scanner (lexical analysis) tokens Parser (syntax analysis) Syntactic structure
More informationBackground: Pipelining Basics. Instruction Scheduling. Pipelining Details. Idealized Instruction Data-Path. Last week Register allocation
Instruction Scheduling Last week Register allocation Background: Pipelining Basics Idea Begin executing an instruction before completing the previous one Today Instruction scheduling The problem: Pipelined
More informationHardware Speculation Support
Hardware Speculation Support Conditional instructions Most common form is conditional move BNEZ R1, L ;if MOV R2, R3 ;then CMOVZ R2,R3, R1 L: ;else Other variants conditional loads and stores nullification
More informationCS311 Lecture: Pipelining, Superscalar, and VLIW Architectures revised 10/18/07
CS311 Lecture: Pipelining, Superscalar, and VLIW Architectures revised 10/18/07 Objectives ---------- 1. To introduce the basic concept of CPU speedup 2. To explain how data and branch hazards arise as
More informationTopic 14: Scheduling COS 320. Compiling Techniques. Princeton University Spring Lennart Beringer
Topic 14: Scheduling COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer 1 The Back End Well, let s see Motivating example Starting point Motivating example Starting point Multiplication
More informationECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 15 Very Long Instruction Word Machines
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 15 Very Long Instruction Word Machines Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html
More informationLecture 13 - VLIW Machines and Statically Scheduled ILP
CS 152 Computer Architecture and Engineering Lecture 13 - VLIW Machines and Statically Scheduled ILP John Wawrzynek Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~johnw
More informationCSE P 501 Compilers. Register Allocation Hal Perkins Autumn /22/ Hal Perkins & UW CSE P-1
CSE P 501 Compilers Register Allocation Hal Perkins Autumn 2011 11/22/2011 2002-11 Hal Perkins & UW CSE P-1 Agenda Register allocation constraints Local methods Faster compile, slower code, but good enough
More informationCompiler Design. Register Allocation. Hwansoo Han
Compiler Design Register Allocation Hwansoo Han Big Picture of Code Generation Register allocation Decides which values will reside in registers Changes the storage mapping Concerns about placement of
More informationSoftware Tools for Lab 1
Software Tools for Lab 1 We have built a number of tools to help you build and debug Lab 1 An ILOC Simulator Essentially, an interpreter for Lab 1 ILOC Matches the Lab 1 specs Single functional unit, latencies
More informationVLIW/EPIC: Statically Scheduled ILP
6.823, L21-1 VLIW/EPIC: Statically Scheduled ILP Computer Science & Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind
More informationCompiler Architecture
Code Generation 1 Compiler Architecture Source language Scanner (lexical analysis) Tokens Parser (syntax analysis) Syntactic structure Semantic Analysis (IC generator) Intermediate Language Code Optimizer
More informationECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 14 Very Long Instruction Word Machines
ECE 252 / CPS 220 Advanced Computer Architecture I Lecture 14 Very Long Instruction Word Machines Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html
More informationRISC & Superscalar. COMP 212 Computer Organization & Architecture. COMP 212 Fall Lecture 12. Instruction Pipeline no hazard.
COMP 212 Computer Organization & Architecture Pipeline Re-Cap Pipeline is ILP -Instruction Level Parallelism COMP 212 Fall 2008 Lecture 12 RISC & Superscalar Divide instruction cycles into stages, overlapped
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationPipelining and Exploiting Instruction-Level Parallelism (ILP)
Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &
More informationUNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation.
UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation. July 14) (June 2013) (June 2015)(Jan 2016)(June 2016) H/W Support : Conditional Execution Also known
More informationCS 406/534 Compiler Construction Parsing Part I
CS 406/534 Compiler Construction Parsing Part I Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy and Dr.
More informationCS 432 Fall Mike Lam, Professor. Code Generation
CS 432 Fall 2015 Mike Lam, Professor Code Generation Compilers "Back end" Source code Tokens Syntax tree Machine code char data[20]; int main() { float x = 42.0; return 7; } 7f 45 4c 46 01 01 01 00 00
More informationComputer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley
Computer Systems Architecture I CSE 560M Lecture 10 Prof. Patrick Crowley Plan for Today Questions Dynamic Execution III discussion Multiple Issue Static multiple issue (+ examples) Dynamic multiple issue
More informationComputer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović
Computer Architecture and Engineering CS52 Quiz #3 March 22nd, 202 Professor Krste Asanović Name: This is a closed book, closed notes exam. 80 Minutes 0 Pages Notes: Not all questions are
More informationRegister Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.
Register Allocation Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP at Rice. Copyright 00, Keith D. Cooper & Linda Torczon, all rights reserved.
More informationReal Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationMultiple Instruction Issue. Superscalars
Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths
More informationMulti-cycle Instructions in the Pipeline (Floating Point)
Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining
More informationUG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects
Announcements UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Inf3 Computer Architecture - 2017-2018 1 Last time: Tomasulo s Algorithm Inf3 Computer
More informationThe Processor: Instruction-Level Parallelism
The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy
More information15-740/ Computer Architecture Lecture 23: Superscalar Processing (III) Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 23: Superscalar Processing (III) Prof. Onur Mutlu Carnegie Mellon University Announcements Homework 4 Out today Due November 15 Midterm II November 22 Project
More informationCode Shape II Expressions & Assignment
Code Shape II Expressions & Assignment Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make
More informationAgenda. What is the Itanium Architecture? Terminology What is the Itanium Architecture? Thomas Siebold Technology Consultant Alpha Systems Division
What is the Itanium Architecture? Thomas Siebold Technology Consultant Alpha Systems Division thomas.siebold@hp.com Agenda Terminology What is the Itanium Architecture? 1 Terminology Processor Architectures
More informationCode generation for modern processors
Code generation for modern processors Definitions (1 of 2) What are the dominant performance issues for a superscalar RISC processor? Refs: AS&U, Chapter 9 + Notes. Optional: Muchnick, 16.3 & 17.1 Instruction
More informationCode generation for modern processors
Code generation for modern processors What are the dominant performance issues for a superscalar RISC processor? Refs: AS&U, Chapter 9 + Notes. Optional: Muchnick, 16.3 & 17.1 Strategy il il il il asm
More informationIntermediate Representations
Most of the material in this lecture comes from Chapter 5 of EaC2 Intermediate Representations Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP
More informationCS 152 Computer Architecture and Engineering. Lecture 13 - VLIW Machines and Statically Scheduled ILP
CS 152 Computer Architecture and Engineering Lecture 13 - VLIW Machines and Statically Scheduled ILP Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!
More informationCS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University
CS415 Compilers Overview of the Course These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Critical Facts Welcome to CS415 Compilers Topics in the
More informationLecture 7: Pipelining Contd. More pipelining complications: Interrupts and Exceptions
Lecture 7: Pipelining Contd. Kunle Olukotun Gates 302 kunle@ogun.stanford.edu http://www-leland.stanford.edu/class/ee282h/ 1 More pipelining complications: Interrupts and Exceptions Hard to handle in pipelined
More informationProcessor (IV) - advanced ILP. Hwansoo Han
Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle
More informationPipelined Processor Design. EE/ECE 4305: Computer Architecture University of Minnesota Duluth By Dr. Taek M. Kwon
Pipelined Processor Design EE/ECE 4305: Computer Architecture University of Minnesota Duluth By Dr. Taek M. Kwon Concept Identification of Pipeline Segments Add Pipeline Registers Pipeline Stage Control
More informationThe IA-64 Architecture. Salient Points
The IA-64 Architecture Department of Electrical Engineering at College Park OUTLINE: Architecture overview Background Architecture Specifics UNIVERSITY OF MARYLAND AT COLLEGE PARK Salient Points 128 Registers
More informationComputing Inside The Parser Syntax-Directed Translation, II. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.
COMP 412 FALL 20167 Computing Inside The Parser Syntax-Directed Translation, II Comp 412 source code IR IR target Front End Optimizer Back End code Copyright 2017, Keith D. Cooper & Linda Torczon, all
More informationGetting CPI under 1: Outline
CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationCS 432 Fall Mike Lam, Professor. Data-Flow Analysis
CS 432 Fall 2018 Mike Lam, Professor Data-Flow Analysis Compilers "Back end" Source code Tokens Syntax tree Machine code char data[20]; int main() { float x = 42.0; return 7; } 7f 45 4c 46 01 01 01 00
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #22 CPU Design: Pipelining to Improve Performance II 2007-8-1 Scott Beamer, Instructor CS61C L22 CPU Design : Pipelining to Improve Performance
More informationCS 406/534 Compiler Construction Intermediate Representation and Procedure Abstraction
CS 406/534 Compiler Construction Intermediate Representation and Procedure Abstraction Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith
More informationInstruction Level Parallelism
Instruction Level Parallelism Software View of Computer Architecture COMP2 Godfrey van der Linden 200-0-0 Introduction Definition of Instruction Level Parallelism(ILP) Pipelining Hazards & Solutions Dynamic
More informationInstruction scheduling. Advanced Compiler Construction Michel Schinz
Instruction scheduling Advanced Compiler Construction Michel Schinz 2015 05 21 Instruction ordering When a compiler emits the instructions corresponding to a program, it imposes a total order on them.
More informationPhoto David Wright STEVEN R. BAGLEY PIPELINES AND ILP
Photo David Wright https://www.flickr.com/photos/dhwright/3312563248 STEVEN R. BAGLEY PIPELINES AND ILP INTRODUCTION Been considering what makes the CPU run at a particular speed Spent the last two weeks
More informationA Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013
A Bad Name Optimization is the process by which we turn a program into a better one, for some definition of better. CS 2210: Optimization This is impossible in the general case. For instance, a fully optimizing
More informationCMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)
CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer
More informationCS425 Computer Systems Architecture
CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order
More informationIntermediate Representations
COMP 506 Rice University Spring 2018 Intermediate Representations source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students
More informationPreventing Stalls: 1
Preventing Stalls: 1 2 PipeLine Pipeline efficiency Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls Ideal pipeline CPI: best possible (1 as n ) Structural hazards:
More informationParallelism. Execution Cycle. Dual Bus Simple CPU. Pipelining COMP375 1
Pipelining COMP375 Computer Architecture and dorganization Parallelism The most common method of making computers faster is to increase parallelism. There are many levels of parallelism Macro Multiple
More informationAn Experimental Evaluation of List Scheduling
An Experimental Evaluation of List Scheduling Keith D. Cooper, Philip J. Schielke, and Devika Subramanian Department of Computer Science Rice University Houston, Texas, USA Rice Computer Science Techcnical
More informationCS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines
CS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines Sreepathi Pai UTCS September 14, 2015 Outline 1 Introduction 2 Out-of-order Scheduling 3 The Intel Haswell
More informationCSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1
CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level
More informationData Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard
Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:
More informationWhat Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009
What Compilers Can and Cannot Do Saman Amarasinghe Fall 009 Optimization Continuum Many examples across the compilation pipeline Static Dynamic Program Compiler Linker Loader Runtime System Optimization
More informationLecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )
Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections 3.8-3.14) 1 ILP Limits The perfect processor: Infinite registers (no WAW or WAR hazards) Perfect branch direction and target
More information