CS 406/534 Compiler Construction Instruction Scheduling


1 CS 406/534 Compiler Construction Instruction Scheduling Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy and Dr. Linda Torczon's teaching materials at Rice University. All rights reserved. 1

2 Administrivia Lab 2 due; class presentation by student today. Lab 3 has been posted on the web; due on 12/5, in 3 weeks. Instruction scheduling for a single basic block: implement 2 algorithms and compare results. Lecture today covers the necessary background. Simulator and test files are in ~cs406/lab3. CS406/534 Fall 2004, Prof. Li Xu 2 2

3 What We Did Last Time Procedure activation records and run-time structures for OO languages. Change of schedule today to jump-start Lab 3; back to code generation in the next class. CS406/534 Fall 2004, Prof. Li Xu 3 3

4 Today's Goals Instruction scheduling Motivation List scheduling Scheduling for larger regions CS406/534 Fall 2004, Prof. Li Xu 4 4

5 The Big Picture Source Code → Front End → IR → Middle → IR → Back End → Machine code (errors reported along the way). So far, we have covered the front end: parsing, intermediate representation. The middle end (data flow analysis and optimization) comes later. The back end generates machine code and deals with the processor architecture; instruction scheduling is today's topic. CS406/534 Fall 2004, Prof. Li Xu 5 5

6 The Big Picture IR Instruction Selection IR Register Allocation IR Instruction Scheduling Machine code Errors Back end has become increasingly important due to hardware architectures Multiple-issue, deep pipelining in modern micro-architecture Memory hierarchy to deal with CPU-memory gap Critical for application performance on modern machines CS406/534 Fall 2004, Prof. Li Xu 6 6

7 Recap from Computer Architecture Instruction-level parallelism: execute multiple instructions in parallel, employing multiple functional units and pipelining. Typical paradigms: Out-of-order superscalar: hardware dynamically reorders instructions; compiler scheduling can boost performance. Examples: Alpha, SPARC, MIPS and other RISCs, even Pentium. VLIW: the compiler statically schedules parallel instructions into a long instruction word; correctness, not just performance, depends on the compiler. Examples: Intel Itanium IA-64, TI C6xxx. CS406/534 Fall 2004, Prof. Li Xu 7 7

8 Motivation Instruction Scheduling
Operations have non-uniform latencies due to pipelining, and multiple-issue architectures can issue several operations per cycle. As a result, execution time is order-dependent. Instruction scheduling: decide on the instruction execution order to minimize execution cycles.

Assumed latencies (conservative):
  Operation   Cycles
  load        3
  store       3
  loadi       1
  add         1
  mult        2
  fadd        1
  fmult       2
  shift       1
  branch      0 to 8

Loads & stores may or may not block; non-blocking ones let the scheduler fill those issue slots. Branch costs vary with the path taken; branches typically have delay slots, so fill the slots with unrelated operations, which percolates the branch upward. The scheduler should hide the latencies. Lab 3 will build a local scheduler. CS406/534 Fall 2004, Prof. Li Xu 8 8

9 Example: w ← w * 2 * x * y * z

Simple schedule (2 registers, 20 cycles):
  1  loadai  r0,@w => r1
  4  add     r1,r1 => r1
  5  loadai  r0,@x => r2
  8  mult    r1,r2 => r1
  9  loadai  r0,@y => r2
  12 mult    r1,r2 => r1
  13 loadai  r0,@z => r2
  16 mult    r1,r2 => r1
  18 storeai r1    => r0,@w
  21 r1 is free

Schedule loads early (3 registers, 13 cycles):
  1  loadai  r0,@w => r1
  2  loadai  r0,@x => r2
  3  loadai  r0,@y => r3
  4  add     r1,r1 => r1
  5  mult    r1,r2 => r1
  6  loadai  r0,@z => r2
  7  mult    r1,r3 => r1
  9  mult    r1,r2 => r1
  11 storeai r1    => r0,@w
  14 r1 is free

Reordering operations for speed is called instruction scheduling. CS406/534 Fall 2004, Prof. Li Xu 9 9

10 The Problem Instruction Scheduling
Given a code fragment for some target machine and the latencies for each individual operation, reorder the operations to minimize execution time.
The concept: slow code and a machine description go into the scheduler, which produces fast code.
The task: produce correct code, minimize execution cycles, avoid spilling registers, and operate efficiently.
CS406/534 Fall 2004, Prof. Li Xu 10 10

11 Instruction Scheduling
To capture properties of the code, build a precedence graph G:
Nodes n ∈ G are operations, with type(n) and delay(n).
An edge e = (n1,n2) ∈ G if and only if n2 uses the result of n1.

The code:
  a: loadai  r0,@w => r1
  b: add     r1,r1 => r1
  c: loadai  r0,@x => r2
  d: mult    r1,r2 => r1
  e: loadai  r0,@y => r2
  f: mult    r1,r2 => r1
  g: loadai  r0,@z => r2
  h: mult    r1,r2 => r1
  i: storeai r1    => r0,@w

The precedence graph (edges run from a definition to its uses): a→b, b→d, c→d, d→f, e→f, f→h, g→h, h→i.
CS406/534 Fall 2004, Prof. Li Xu 11 11
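A small Python sketch of this construction follows (illustrative only, not the lab's required code): it tracks the most recent definition of each register and adds an edge from that definition to every later use, which yields exactly the true-dependence edges listed above. Anti and output dependences on the reused register r2 are ignored here, on the assumption that renaming (discussed near the end of the lecture) removes them.

from collections import defaultdict

def build_precedence_graph(block):
    """Return (successors, predecessors) maps keyed by op name."""
    succs = defaultdict(set)
    preds = defaultdict(set)
    last_def = {}                          # register -> name of the op that currently defines it
    for name, opcode, uses, defs in block:
        for r in uses:
            if r in last_def:              # true dependence: this op reads r's current value
                succs[last_def[r]].add(name)
                preds[name].add(last_def[r])
        for r in defs:
            last_def[r] = name             # later uses of r now depend on this op
        # anti/output dependences are not recorded: renaming is assumed to remove them
    return succs, preds

# The block from the slide, with uses and defs spelled out per op.
block = [
    ("a", "loadai",  ["r0"],       ["r1"]),
    ("b", "add",     ["r1", "r1"], ["r1"]),
    ("c", "loadai",  ["r0"],       ["r2"]),
    ("d", "mult",    ["r1", "r2"], ["r1"]),
    ("e", "loadai",  ["r0"],       ["r2"]),
    ("f", "mult",    ["r1", "r2"], ["r1"]),
    ("g", "loadai",  ["r0"],       ["r2"]),
    ("h", "mult",    ["r1", "r2"], ["r1"]),
    ("i", "storeai", ["r1", "r0"], []),
]
succs, preds = build_precedence_graph(block)   # succs["a"] == {"b"}, succs["h"] == {"i"}, ...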

12 Instruction Scheduling
A correct schedule S maps each n ∈ N into a non-negative integer representing its cycle number, such that:
1. S(n) ≥ 0, for all n ∈ N
2. If (n1,n2) ∈ E, then S(n1) + delay(n1) ≤ S(n2)
3. For each type t, there are no more operations of type t in any cycle than the target machine can issue

The length of a schedule S, denoted L(S), is L(S) = max_{n ∈ N} (S(n) + delay(n)).
The goal is to find the shortest possible correct schedule. S is time-optimal if L(S) ≤ L(S1) for all other schedules S1. A schedule might also be optimal in terms of registers, power, or space.
CS406/534 Fall 2004, Prof. Li Xu 12 12
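These conditions translate directly into a checker. The sketch below uses illustrative names (delay, optype, and a per-type issue_width dict are assumptions about how the machine description might be represented, not the lab's interface):

from collections import Counter

def is_correct(S, edges, delay, optype, issue_width):
    """S maps each op to its start cycle; edges is a list of (n1, n2) pairs."""
    # 1. Every operation starts at a non-negative cycle.
    if any(S[n] < 0 for n in S):
        return False
    # 2. S(n1) + delay(n1) <= S(n2) for every edge: results are available in time.
    if any(S[n1] + delay[n1] > S[n2] for n1, n2 in edges):
        return False
    # 3. No cycle issues more operations of a type than the machine can.
    per_cycle = Counter((S[n], optype[n]) for n in S)
    return all(cnt <= issue_width[t] for (_, t), cnt in per_cycle.items())

def schedule_length(S, delay):
    """L(S) = max over n of S(n) + delay(n)."""
    return max(S[n] + delay[n] for n in S)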

13 Critical Points Instruction Scheduling All operands must be available Multiple operations can be ready Moving operations can lengthen register lifetimes Placing uses near definitions can shorten register lifetimes Operands can have multiple predecessors Together, these issues make scheduling hard (NP-Complete) Local scheduling is the simple case Restricted to straight-line code Consistent and predictable latencies CS406/534 Fall 2004, Prof. Li Xu 13 13

14 Instruction Scheduling The big picture 1. Build a precedence graph, P 2. Compute a priority function over the nodes in P 3. Use list scheduling to construct a schedule, one cycle at a time a. Use a queue of operations that are ready b. At each cycle I. Choose a ready operation and schedule it II. Update the ready queue Local list scheduling The dominant algorithm for twenty years A greedy, heuristic, local technique CS406/534 Fall 2004, Prof. Li Xu 14 14

15 Local List Scheduling

  Cycle ← 1
  Ready ← leaves of P
  Active ← Ø
  while (Ready ∪ Active ≠ Ø)
    if (Ready ≠ Ø) then
      remove an op from Ready            (removal in priority order)
      S(op) ← Cycle
      Active ← Active ∪ op
    Cycle ← Cycle + 1
    for each op ∈ Active
      if (S(op) + delay(op) ≤ Cycle) then    (op has completed execution)
        remove op from Active
        for each successor s of op in P
          if (s is ready) then               (if successor s's operands are ready, put it on Ready)
            Ready ← Ready ∪ s

CS406/534 Fall 2004, Prof. Li Xu 15 15
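The loop above is compact enough to render directly in Python. This sketch assumes a single issue slot per cycle and a precomputed priority for each operation; the names and data structures are illustrative, not the lab's required interface.

def list_schedule(preds, succs, delay, priority):
    """preds/succs map every op to a (possibly empty) set; higher priority schedules first."""
    S = {}
    remaining = {n: len(preds[n]) for n in preds}           # unscheduled-predecessor counts
    ready = [n for n, c in remaining.items() if c == 0]     # leaves of P
    active = []                                             # (finish cycle, op) pairs
    cycle = 1
    while ready or active:
        if ready:
            ready.sort(key=lambda n: priority[n])           # removal in priority order
            op = ready.pop()                                # take the highest-priority op
            S[op] = cycle
            active.append((cycle + delay[op], op))
        cycle += 1
        for finish, op in list(active):                     # ops that have completed execution
            if finish <= cycle:
                active.remove((finish, op))
                for s in succs[op]:
                    remaining[s] -= 1
                    if remaining[s] == 0:                   # all operands now available
                        ready.append(s)
    return S

A real multiple-issue scheduler would also check functional-unit availability before issuing, as the detailed algorithm a few slides later does.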

16 Scheduling Example
1. Build the precedence graph

The code:
  a: loadai => r1
  b: add r1,r1 => r1
  c: loadai => r2
  d: mult r1,r2 => r1
  e: loadai => r2
  f: mult r1,r2 => r1
  g: loadai => r2
  h: mult r1,r2 => r1
  i: storeai r1

The precedence graph: a→b, b→d, c→d, d→f, e→f, f→h, g→h, h→i.
CS406/534 Fall 2004, Prof. Li Xu 16 16

17 Scheduling Example
1. Build the precedence graph
2. Determine priorities: longest latency-weighted path

The code:
  a: loadai => r1
  b: add r1,r1 => r1
  c: loadai => r2
  d: mult r1,r2 => r1
  e: loadai => r2
  f: mult r1,r2 => r1
  g: loadai => r2
  h: mult r1,r2 => r1
  i: storeai r1

The precedence graph, annotated with priorities (latency-weighted path lengths): a=13, b=10, c=12, d=9, e=10, f=7, g=8, h=5, i=3.
CS406/534 Fall 2004, Prof. Li Xu 17 17
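A recursive sketch of that priority computation (illustrative; succs must map every node to a possibly empty set of successors, and delay gives each op's latency):

def latency_weighted_priorities(succs, delay):
    """Each op's priority is its own delay plus the longest latency-weighted path below it."""
    priority = {}
    def visit(n):
        if n not in priority:
            tail = max((visit(s) for s in succs[n]), default=0)   # longest path through any successor
            priority[n] = delay[n] + tail
        return priority[n]
    for n in succs:
        visit(n)
    return priority

# With the latencies assumed earlier (loads and stores 3 cycles, mult 2, add 1),
# this yields a=13, b=10, c=12, d=9, e=10, f=7, g=8, h=5, i=3 for the example graph.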

18 Scheduling Example
1. Build the precedence graph
2. Determine priorities: longest latency-weighted path
3. Perform list scheduling

  1)  a: loadai => r1
  2)  c: loadai => r2
  3)  e: loadai => r3      (new register name used)
  4)  b: add r1,r1 => r1
  5)  d: mult r1,r2 => r1
  6)  g: loadai => r2
  7)  f: mult r1,r3 => r1
  9)  h: mult r1,r2 => r1
  11) i: storeai r1

(The precedence graph and priorities are as on the previous slide.)
CS406/534 Fall 2004, Prof. Li Xu 18 18

19 Detailed Scheduling Algorithm
Idea: keep a collection of worklists W[c], one per cycle. We need MaxC = max delay + 1 such worklists.
Code:
  for each n ∈ N do begin
    count[n] := 0; earliest[n] := 0
  end
  for each (n1,n2) ∈ E do begin
    count[n2] := count[n2] + 1;
    successors[n1] := successors[n1] ∪ {n2};
  end
  for i := 0 to MaxC - 1 do W[i] := Ø;
  Wcount := 0;
  for each n ∈ N do
    if count[n] = 0 then begin
      W[0] := W[0] ∪ {n}; Wcount := Wcount + 1;
    end
  c := 0;         // c is the cycle number
  cW := 0;        // cW is the number of the worklist for cycle c
  instr[c] := Ø;
CS406/534 Fall 2004, Prof. Li Xu 19 19

20 Detailed Scheduling Algorithm
  while Wcount > 0 do begin
    while W[cW] = Ø do begin
      c := c + 1; instr[c] := Ø;
      cW := mod(cW+1, MaxC);
    end
    nextc := mod(c+1, MaxC);
    while W[cW] ≠ Ø do begin
      select and remove an arbitrary instruction x from W[cW];    // priority choice goes here
      if there are free issue units of type(x) on cycle c then begin
        instr[c] := instr[c] ∪ {x}; Wcount := Wcount - 1;
        for each y ∈ successors[x] do begin
          count[y] := count[y] - 1;
          earliest[y] := max(earliest[y], c + delay(x));
          if count[y] = 0 then begin
            loc := mod(earliest[y], MaxC);
            W[loc] := W[loc] ∪ {y}; Wcount := Wcount + 1;
          end
        end
      end
      else W[nextc] := W[nextc] ∪ {x};
    end
  end
CS406/534 Fall 2004, Prof. Li Xu 20 20
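For concreteness, here is the same idea transcribed into Python. It is a sketch with illustrative names: issue_width maps each operation type to the number of ops of that type the machine can issue per cycle, and sorted() stands in for the priority choice.

def worklist_schedule(nodes, edges, delay, optype, issue_width):
    """Cycle-indexed worklist scheduler mirroring the pseudocode above (illustrative)."""
    succs = {n: [] for n in nodes}
    count = {n: 0 for n in nodes}             # number of unscheduled predecessors
    earliest = {n: 0 for n in nodes}
    for n1, n2 in edges:
        succs[n1].append(n2)
        count[n2] += 1
    MaxC = max(delay.values()) + 1            # one worklist per cycle, reused mod MaxC
    W = [set() for _ in range(MaxC)]
    W[0] = {n for n in nodes if count[n] == 0}
    wcount = len(W[0])
    instr = {}                                # cycle -> ops issued in that cycle
    c = cW = 0
    while wcount > 0:
        while not W[cW]:                      # advance to the next non-empty worklist
            c += 1
            cW = (cW + 1) % MaxC
        nextc = (c + 1) % MaxC
        issued = instr.setdefault(c, [])
        for x in sorted(W[cW]):               # stands in for "priority select"
            W[cW].discard(x)
            if sum(optype[y] == optype[x] for y in issued) < issue_width[optype[x]]:
                issued.append(x)
                wcount -= 1
                for y in succs[x]:            # successors may now become ready
                    count[y] -= 1
                    earliest[y] = max(earliest[y], c + delay[x])
                    if count[y] == 0:
                        W[earliest[y] % MaxC].add(y)
                        wcount += 1
            else:
                W[nextc].add(x)               # no free issue unit: retry in the next cycle
    return instr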

21 More List Scheduling List scheduling breaks down into two distinct classes. Forward list scheduling: start with the available operations and work forward in time; ready = all operands available. Backward list scheduling: start with the operations that have no successors and work backward in time; ready = latency covers all uses. Variations on list scheduling: prioritize critical path(s); schedule the last use of a value as soon as possible; depth first in the precedence graph (minimize registers); breadth first in the precedence graph (minimize interlocks); prefer the operation with the most successors. CS406/534 Fall 2004, Prof. Li Xu 21 21

22 Local Scheduling As long as we stay within a single block, list scheduling does well. The problem is hard, so tie-breaking matters: prefer the op with more descendants in the dependence graph; prefer an operation with a last use over one with none. Breadth first makes progress on all paths and tends toward more ILP & fewer interlocks; depth first tries to complete the uses of a value and tends to use fewer registers. The classic work on this is Gibbons & Muchnick. CS406/534 Fall 2004, Prof. Li Xu 22 22
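One way to realize such tie-breakers is a composite sort key over the ready list. The sketch below is illustrative only; the metric dictionaries and their ordering are assumptions, not the course's prescription.

def ready_key(op, critical_path, n_descendants, has_last_use):
    """Larger tuples win; use with max() or as a sort key over the ready list."""
    return (
        critical_path[op],               # primary: longest latency-weighted path to a root
        n_descendants[op],               # then: more descendants in the dependence graph
        1 if has_last_use[op] else 0,    # then: prefer an op that contains a last use
    )

# e.g.  op = max(ready, key=lambda n: ready_key(n, cp, nd, lu))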

23 Local Scheduling Forward and backward list scheduling can produce different results. (Figure: a block from the SPEC benchmark go: loadi and lshift operations feed chains of add and addi operations, which feed five stores, a cmp, and a cbr; subscripts identify the individual operations, and each operation is annotated with its latency to the cbr. The slide also gave a latency table for load, loadi, add, addi, store, and cmp.) CS406/534 Fall 2004, Prof. Li Xu 23 23

24 Local Scheduling (Figure: the forward and backward schedules for the go block, each filling an integer unit and a memory unit cycle by cycle and using latency to root as the priority; the two schedules issue the loadi, lshift, add, addi, store, cmp, and cbr operations in noticeably different orders.) CS406/534 Fall 2004, Prof. Li Xu 24 24

25 Local Scheduling Schielke's RBF algorithm (Randomized Backward & Forward): run 5 passes of forward list scheduling and 5 passes of backward list scheduling, breaking each tie randomly, and keep the best schedule, i.e., the shortest time to completion (other metrics are possible, such as shortest time + fewest registers). In practice, this does very well. CS406/534 Fall 2004, Prof. Li Xu 25 25
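A sketch of the RBF loop (schedule_once is a stand-in for a list scheduler that takes a direction and a random tie-break source and returns a map from op to start cycle; it is not a real API from the course tools):

import random

def rbf(schedule_once, delay, passes=5):
    """Run several randomized forward and backward passes; keep the shortest schedule."""
    best, best_len = None, float("inf")
    for direction in ("forward", "backward"):
        for _ in range(passes):
            S = schedule_once(direction, rng=random.Random())
            L = max(S[n] + delay[n] for n in S)      # schedule length L(S)
            if L < best_len:
                best, best_len = S, L
    return best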

26 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs B 1 a b c d B 2 e f B 3 g B 4 h i B 5 j k B 6 l CS406/534 Fall 2004, Prof. Li Xu 26 26

27 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts Moving an op out of B 1 causes problems B 2 e f B 1 a b c d B 3 g B 4 h i B 5 j k B 6 l CS406/534 Fall 2004, Prof. Li Xu 27 27

28 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts Moving an op out of B 1 causes problems B 2 c,e f B 1 a b c d B 3 no c here! g B 4 h i B 5 j k B 6 l CS406/534 Fall 2004, Prof. Li Xu 28 28

29 Scheduling Larger Regions Superlocal Scheduling Work an EBB at a time. Example has four EBBs; only two have nontrivial paths, {B 1,B 2,B 4 } & {B 1,B 3 }. Having B 1 in both causes conflicts: moving an op out of B 1 causes problems, since we must insert compensation code in B 3, which increases code space. (Figure: c has moved into B 2, so a copy of c is inserted into B 3 as compensation code; this one wasn't done for speed!) CS406/534 Fall 2004, Prof. Li Xu 29 29

30 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts Moving an op into B 1 causes problems B 4 h i B 2 e f B 1 B 5 a b c d j k B 3 g B 6 l CS406/534 Fall 2004, Prof. Li Xu 30 30

31 Scheduling Larger Regions Superlocal Scheduling Work an EBB at a time. Example has four EBBs; only two have nontrivial paths, {B 1,B 2,B 4 } & {B 1,B 3 }. Having B 1 in both causes conflicts: moving an op into B 1 causes problems. It lengthens {B 1,B 3 } and adds computation to {B 1,B 3 }, and may need compensation code, too; renaming may avoid the undo. (Figure: f has moved from B 4 up into B 1, so B 3 needs an "undo f"; this makes the path even longer!) CS406/534 Fall 2004, Prof. Li Xu 31 31

32 Scheduling Larger Regions Superlocal Scheduling How much can we get? Schielke saw 11 to 12% speed ups Constrained away compensation code B 2 e f B 1 a b c d B 3 g B 4 h i B 5 j k B 6 l Value numbering is the best case for superlocal scope CS406/534 Fall 2004, Prof. Li Xu 32 32

33 Scheduling Larger Regions More Aggressive Superlocal Scheduling Clone blocks to create more context B 1 a b c d B 2 e f B 3 g Join points create blocks that must work in multiple contexts B 4 h i B 5 j k 2 paths B 6 l 3 paths CS406/534 Fall 2004, Prof. Li Xu 33 33

34 Scheduling Larger Regions More Aggressive Superlocal Scheduling Clone blocks to create more context Some blocks can combine Single successor, single predecessor B 1 a b c d B 2 e f B 3 g B 4 h i B 5a j k B 5b j k B 6a l B 6b l B 6c l CS406/534 Fall 2004, Prof. Li Xu 34 34

35 Scheduling Larger Regions More Aggressive Superlocal Scheduling Clone blocks to create more context Some blocks can combine Single successor, single predecessor B 1 a b c d B 2 e f B 3 g B 4 h i B 5a j k B 5b j k B 6a l B 6b l B 6c l CS406/534 Fall 2004, Prof. Li Xu 35 35

36 Scheduling Larger Regions More Aggressive Superlocal Scheduling Clone blocks to create more context. Some blocks can combine (single successor, single predecessor). Now schedule the EBBs {B 1,B 2,B 4 }, {B 1,B 2,B 5a }, {B 1,B 3,B 5b }, and pay heed to compensation code. This works well for forward motion; backward motion still has off-path problems, since speeding up one path can slow down others (undo). (Figure: the cloned CFG, where the copies of B 6 have been merged into B 4, B 5a, and B 5b, so each of those blocks now ends with l.) CS406/534 Fall 2004, Prof. Li Xu 36 36

37 Scheduling Larger Regions Trace Scheduling Start with execution counts for edges Obtained by profiling B 1 a b c d B 2 e f B 3 g B 4 h i B 5 j k B 6 l CS406/534 Fall 2004, Prof. Li Xu 37 37

38 Scheduling Larger Regions Trace Scheduling Start with execution counts for edges Obtained by profiling Pick the hot path B 1 7 a b c d 10 3 B 2 e f B 3 g B 4 h i B 5 j k 5 B 6 l 5 Block counts could mislead us see B 5 CS406/534 Fall 2004, Prof. Li Xu 38 38

39 Scheduling Larger Regions Trace Scheduling Start with execution counts for edges, obtained by profiling. Pick the hot path, B 1, B 2, B 4, B 6, and schedule it, inserting compensation code in B 3 and B 5 if needed. Get the hot path right! If we picked the right path, the other blocks do not matter as much; this places a premium on quality profiles. (Figure: the CFG with its edge counts and the hot path B 1 → B 2 → B 4 → B 6 highlighted.) CS406/534 Fall 2004, Prof. Li Xu 39 39

40 Scheduling Larger Regions Trace Scheduling For the entire CFG: pick & schedule the hot path, insert compensation code, remove the hot path from the CFG, and repeat the process until the CFG is empty. Idea: hot paths matter; the farther off the hot path, the less it matters. (Figure: the example CFG with its edge execution counts.) CS406/534 Fall 2004, Prof. Li Xu 40 40
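A sketch of that outer loop (illustrative names; seeding the trace by block count is a simplification, the extension follows edge counts, and compensation-code insertion and the details of schedule_trace are elided):

def trace_schedule(cfg_succs, edge_counts, block_counts, schedule_trace):
    """cfg_succs: block -> list of successor blocks; counts come from profiling."""
    remaining = set(cfg_succs)
    while remaining:
        b = max(remaining, key=lambda x: block_counts[x])    # seed: hottest remaining block
        trace = [b]
        while True:
            hot = [(edge_counts[(b, s)], s) for s in cfg_succs[b]
                   if s in remaining and s not in trace]
            if not hot:
                break
            _, b = max(hot)                                  # follow the hottest remaining edge
            trace.append(b)
        schedule_trace(trace)        # compensation code at side entries/exits goes here
        remaining -= set(trace)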

41 Interaction with Register Allocation Dependence constraints arise from register names: true dependence (a defines r1, b uses r1), anti dependence (a uses r1, b redefines r1), output dependence (a defines r1, b redefines r1). Renaming can remove anti and output dependences, but usually increases register pressure. Solution: pre- and post-allocation scheduling (schedule both before and after register allocation). CS406/534 Fall 2004, Prof. Li Xu 41 41
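A sketch of the renaming idea for a single block (illustrative, not ILOC-specific): giving every definition a fresh register name removes anti and output dependences, leaving only true dependences for the scheduler to respect.

def rename_block(block, next_reg=1000):
    """block is a list of (opcode, uses, defs) triples; returns a renamed copy."""
    current = {}                                        # original register -> its current new name
    renamed = []
    for opcode, uses, defs in block:
        new_uses = [current.get(r, r) for r in uses]    # live-in values keep their old names
        new_defs = []
        for r in defs:
            fresh = "r{}".format(next_reg)              # every definition gets a fresh name
            next_reg += 1
            current[r] = fresh
            new_defs.append(fresh)
        renamed.append((opcode, new_uses, new_defs))
    return renamed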

42 Lab 3 Static schedule machine model Run iloc_sim s 3 <file.i for non-scheduled code Implement two schedulers for basic blocks in ILOC One must be list scheduling with specified priority Other can be a second priority, or a different algorithm Same ILOC subset as in lab 1 (plus NOP) Different execution model Two asymmetric functional units Latencies different from lab 1 Simulator different from lab 1 CS406/534 Fall 2004, Prof. Li Xu 42 42

43 Lab 3 Specific questions
1. Are we in over our heads? No. The hard part of this lab should be trying to get good code. The programming is not bad; you can reuse some stuff from lab 1. You have all the specifications & tools you need. Jump in and start programming.
2. How many registers can we use? Assume that you have as many registers as you need. If the simulator runs out, we'll get more. Rename registers to avoid false dependences & conflicts.
3. What about a store followed by a load? If you can show that the two operations must refer to different memory locations, the scheduler can overlap their execution. Otherwise, the store must complete before the load issues.
4. What about test files? We will put some test files online. We will test your lab on files you do not see. We have had problems (in the past) with people optimizing their labs for the test data. (Not that you would do that!)
CS406/534 Fall 2004, Prof. Li Xu 43 43

44 Lab 3 - Hints Begin by renaming registers to eliminate false dependencies. Pay attention to loads and stores: when can they be reordered? (See the sketch below.) Understand the simulator. Inserting NOPs may be necessary to get correct code. Tiebreakers can make a difference. CS406/534 Fall 2004, Prof. Li Xu 44 44
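The sketch below illustrates one conservative way to answer the load/store question, under the simplifying assumption that addresses are a base register plus a constant offset naming a distinct variable (as with loadAI/storeAI); anything less certain is treated as a conflict. This is an assumption for illustration, not the lab's specification.

def may_conflict(store_addr, load_addr):
    """Addresses are (base_register, constant_offset) pairs, e.g. ('r0', '@w')."""
    (sb, so), (lb, lo) = store_addr, load_addr
    if sb == lb and so != lo:
        return False   # same base, different named offsets: provably distinct locations
    return True        # otherwise assume the store and load may refer to the same location

# Example: a storeAI to ('r0', '@w') and a loadAI from ('r0', '@x') can be reordered;
# a load whose address is computed into a register cannot be hoisted above a store
# without more information.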

45 Summary Instruction scheduling List scheduling Scheduling larger regions CS406/534 Fall 2004, Prof. Li Xu 45 45

46 Next Class Code shape Code generation CS406/534 Fall 2004, Prof. Li Xu 46 46
