
COMP 412, Fall 2018
Lab 3, Tutorial 1

[Compiler diagram: source code -> Front End -> IR -> Optimizer -> IR -> Back End -> target code]

Copyright 2018, Keith D. Cooper, Linda Torczon & Zoran Budimlić, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Lab 3 Schedule

[Calendar figure, Oct 21 through Nov 24. The recoverable milestones, in order:]
- Week of Oct 21 ("you are here"): docs available; lab lecture
- Oct 27: start work
- Tutorial, 4 PM
- Milestone: complete & correct dependence graph
- Milestone: correct schedule
- Milestone ("the goal is here"): improve schedules
- Nov 20: Lab 3 due date
- Nov 21: no class (Walk); a late day
- Nov 22-24: Thanksgiving Break (not late days)

Lab 3 Builds On Lab 2

There are a number of similarities between Lab 2 and Lab 3:
- The same set of ILOC operations, with some new twists (and, this time, a purpose)
- The same set of tools: the Lab 3 reference implementation (lab3_ref) and the Lab 3 ILOC simulator
  - It seems obvious, but you must use the Lab 3 simulator, not the Lab 2 simulator
- The same kind of testing scripts
  - Make a copy, and change the defined variable SCHED in SchedHelper
  - By default, the scripts run lab3_ref
- A large library of ILOC test blocks
  - Poke around in ~comp412/students/lab3 on CLEAR

Lab 3 Differs From Lab 2

The Lab 3 tools are different from the Lab 2 tools:
- ILOC now has a notation for starting two operations in a single cycle
  - [ op1 ; op2 ] tells the simulator to start op1 and op2 in the same cycle
  - They start at the same time; they finish when they are done
- ILOC used as input to your scheduler has only one operation per instruction
  - At most one load or store, and at most one mult, in a given instruction
  - The simulator dispatches the operations to the appropriate functional unit
- The simulator has a flag that sets the interlock mode
  - Use -s 3 to determine the number of cycles for the original code; it tells the simulator to enforce all interlocks (the default setting)
  - Use -s 1 when testing your scheduler's effectiveness and the output code's speed; it tells the simulator to ignore interlocks based on values (must be specified explicitly)

Lab 3: The Big Picture

The key steps in your scheduler should be:
1. Scan, parse, & rename the input ILOC code (in order)
2. Build a dependence graph for the block being scheduled
3. Compute priorities for each operation
4. Schedule the code in a way that preserves the dependences

Algorithms:
- Front end: you implemented one for the Lab 2 code check. Best practice: reuse that code (& wish you had commented it).
- Dependence graph: see the lecture notes from the Lab 3 primer (e.g., Lecture 27, slide 14)
- Priorities: lots of variation is possible (see Gibbons & Muchnick, or slide 17); common schemes are latency-weighted depth, breadth first, and depth first
- Scheduler: most people build greedy list schedulers (e.g., slide 19)

Read § 12.3 in EaC2e.

Scary Admonition

Warning: Your scheduler must not perform optimizations other than instruction scheduling and register renaming. It must not remove useless (dead) instructions; it must not fold constants. Any instruction left in the code by the optimizer is presumed to have a purpose; assume that the optimizer had a firm basis for not performing the optimization. Your lab should schedule the instructions, not search for other opportunities for improvement. This lab is an investigation of instruction scheduling, not a project in general optimization algorithms.

Additionally, your scheduler must not rely on input constants specified in comments. When the TAs grade your scheduler, they are not required to use the input specified in the comments. Your scheduler is expected to generate correct code whether or not the TAs use that input.

If the result of an op is not used, you cannot delete the op.

Dependence

For operations other than load, store, & output, dependence is straightforward:
- Each use has an edge to the operation that defines it; the definition is the edge's sink, the use is the edge's source
- Each edge has a latency equal to that of its sink operation

Original code:
  1  loadi 8       => r1
  2  loadi 12      => r2
  3  mult  r1, r2  => r3
  4  add   r1, r3  => r4

[Dependence graph: op 4 (add vr1, vr2 => vr0, prio <0,0,0>) is the root, with edges to op 1 labeled "sr1, 1" and to op 3 labeled "sr2, 3"; op 3 (mult vr1, vr3 => vr2, prio <3,3,1>) has edges to op 1 ("sr1, 1") and op 2 ("sr3, 1"); the two loadi ops (prio <4,4,2>) are the leaves.]

Scheduled code (lab3_ref):
  [ loadi 12 => r3   ; loadi 8 => r1     ]
  [                  ; mult r1, r3 => r2 ]
  [                  ;                   ]
  [                  ;                   ]
  [ add r1, r2 => r4 ;                   ]

Credits: schedule and graph produced by lab3_ref -s -g on the original code. Drawing by graphviz.

Dependence Graph Construction

  Create an empty map, M
  Walk the block, top to bottom; at each operation o:
    create a node for o
    if o defines VRi: set M(VRi) to o
    for each VRj used in o, add an edge from o to the node in M(VRj)
    if o is a load, store, or output operation, add edges to ensure
    serialization of memory ops

Example (slides 7 through 12 build the graph step by step):
  1  loadi 8      => r1
  2  loadi 12     => r2
  3  mult  r1, r2 => r3
  4  add   r1, r3 => r4

Building the graph:
1. M[*] ← invalid
2. Create node for op 1; set M(r1) to node 1
3. Create node for op 2; set M(r2) to node 2
4. Create node for op 3; set M(r3) to node 3.
   Op 3 uses r1 & r2, so add edges to M(r1) and M(r2).
5. Create node for op 4; set M(r4) to node 4.
   Op 4 uses r1 & r3, so add edges to M(r1) and M(r3).

A concrete sketch of this walk appears below.
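To make the walk concrete, here is a minimal, self-contained sketch in C that builds the register edges for Example 1. It is a sketch under assumptions, not lab3_ref's code: the struct Op layout, the fixed bound MAXVR, and the name add_edge are all illustrative, and the serialization edges for memory ops are deferred to the sketch after slide 15.

#include <stdio.h>

#define MAXVR 64            /* assumed bound on virtual register names */

struct Op {                 /* one ILOC operation, already parsed */
    const char *opcode;
    int uses[2];            /* VRs read; -1 if a slot is unused   */
    int def;                /* VR written; -1 if none             */
};

/* In a real lab this would record an edge in the graph;
 * here it just reports one. */
static void add_edge(int from, int to, int vr) {
    printf("  op%d -> op%d (vr%d)\n", from + 1, to + 1, vr);
}

int main(void) {
    /* Example 1 from the slides */
    struct Op block[] = {
        { "loadi", { -1, -1 }, 1 },    /* 1: loadi 8      => r1 */
        { "loadi", { -1, -1 }, 2 },    /* 2: loadi 12     => r2 */
        { "mult",  {  1,  2 }, 3 },    /* 3: mult  r1, r2 => r3 */
        { "add",   {  1,  3 }, 4 },    /* 4: add   r1, r3 => r4 */
    };
    int n = sizeof block / sizeof block[0];

    int M[MAXVR];                          /* M(vr) = defining op, or -1 */
    for (int vr = 0; vr < MAXVR; vr++) M[vr] = -1;

    for (int o = 0; o < n; o++) {          /* walk the block, top to bottom */
        for (int u = 0; u < 2; u++) {      /* edge from each use to its def */
            int vr = block[o].uses[u];
            if (vr >= 0 && M[vr] >= 0) add_edge(o, M[vr], vr);
        }
        if (block[o].def >= 0) M[block[o].def] = o;   /* record the new def */
    }
    return 0;
}

Run on Example 1, this emits exactly the four edges of the slide's graph: op3 -> op1, op3 -> op2, op4 -> op1, and op4 -> op3.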

The Scheduler Needs Some Additional Edges

To ensure the correct flow of values, the scheduler needs edges that specify the relative ordering of loads, stores, and outputs:

  First op        | Second op      | Same address | Distinct addresses | Unknown address(es) | Acronym
  load            | load           | no conflict (after renaming)                            |
  load or output  | store          | conflict     | no conflict        | conflict            | WAR
  store           | store          | conflict     | no conflict        | conflict            | WAW
  store           | load or output | conflict     | no conflict        | conflict            | RAW
  output          | output         | need an edge to serialize the outputs                   |

A conflict implies that the 1st op must complete before the 2nd op issues, i.e., an edge in the dependence graph.

Dependence

Dependence relations among memory operations are more complex.

Original code:
  1  loadi 8      => r1
  2  loadi 12     => r2
  3  add   r1, r2 => r3
  4  load  r3     => r4
  5  load  r1     => r5
  6  store r4     => r3
  7  output 12

[Dependence graph, after renaming (r1 -> vr3, r2 -> vr4, r3 -> vr0, r4 -> vr1, r5 -> vr2): op 3 (add, prio <11,11,3>) has edges to ops 1 & 2, the loadi leaves (prio <12,12,4>); op 4 (load, prio <10,10,2>) has an edge to op 3 ("vr0, 1"); op 5 (load, prio <10,10,2>) has an edge to op 1 ("vr3, 1"); op 6 (store, prio <5,5,1>) has edges to op 4 ("vr1, 5"), to op 3 ("vr0, 1"), and to op 5 ("IO Edge, 1"); op 7 (output, prio <0,0,0>) has an edge to op 6 ("IO Edge, 5"). The callout "What is this edge?" points at the store's IO edge to the load.]

Scheduled code (lab3_ref):
  loadi 12 => r4
  loadi 8  => r3
  load  r3 => r2
  add   r3, r4 => r0
  load  r0 => r1
  store r1 => r0
  output 12

Dependence Graph Construction (from the local scheduling lecture)

Build a dependence graph to capture the critical relationships in the code:

  Create an empty map, M
  Walk the block, top to bottom; at each operation o that defines VRi:
    create a node for o (see note 1)
    set M(VRi) to o
    for each VRj used in o, add an edge from o to the node in M(VRj)
    if o is a load, store, or output operation, add edges to ensure
    serialization of memory ops (see note 2)

Building the graph is O(n + m²), where m is the number of memory ops.

Explanatory notes:
1. o refers both to the operation in the original block and to the node that represents it in the graph. The meaning should be clear from context.
2. At a minimum, in the absence of other information:
   - load & output need an edge to the most recent store (full latency)
   - output needs an edge to the most recent output (serialization)
   - store needs an edge to the most recent store & output, as well as to each previous load (serialization)
   If your lab tries to simplify the graph, it may need more edges for store & output nodes.

A sketch of these rules in code follows.
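Note 2's rules translate almost directly into code. Here is a minimal, runnable sketch in C of that conservative serialization pass, driven by slide 14's block; the names (enum Kind, serial_edge, add_memory_edges) and the fixed bound on pending loads are illustrative assumptions, not lab3_ref's.

#include <stdio.h>

enum Kind { OTHER, LOAD, STORE, OUTPUT };

/* Stand-in for the real edge constructor: just report the edge. */
static void serial_edge(int from, int to) {
    printf("  op%d -> op%d (serialization)\n", from + 1, to + 1);
}

/* Conservative rules from note 2:
 *   load & output take an edge to the most recent store;
 *   output takes an edge to the most recent output;
 *   store takes edges to the most recent store & output and to each
 *   load since that store (earlier loads are already ordered
 *   transitively through the store chain).
 */
static void add_memory_edges(const enum Kind *kind, int n) {
    int last_store = -1, last_output = -1;
    int loads[64], nloads = 0;   /* loads since the last store (assumed bound) */

    for (int o = 0; o < n; o++) {
        switch (kind[o]) {
        case LOAD:
            if (last_store >= 0) serial_edge(o, last_store);
            loads[nloads++] = o;
            break;
        case OUTPUT:
            if (last_store >= 0)  serial_edge(o, last_store);
            if (last_output >= 0) serial_edge(o, last_output);
            last_output = o;
            break;
        case STORE:
            if (last_store >= 0)  serial_edge(o, last_store);
            if (last_output >= 0) serial_edge(o, last_output);
            for (int i = 0; i < nloads; i++) serial_edge(o, loads[i]);
            nloads = 0;
            last_store = o;
            break;
        default:
            break;
        }
    }
}

int main(void) {
    /* Slide 14's block: loadi, loadi, add, load, load, store, output */
    enum Kind block[] = { OTHER, OTHER, OTHER, LOAD, LOAD, STORE, OUTPUT };
    add_memory_edges(block, sizeof block / sizeof block[0]);
    return 0;
}

On slide 14's block this emits op6 -> op4, op6 -> op5, and op7 -> op6. The op6 -> op4 edge duplicates the value edge ("vr1, 5") that the register pass already adds; a real lab can suppress such duplicates.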

Dependence (continued)

The same example, with one more annotation on the drawing: "Why no IO edge here? It would be redundant." One potential serialization edge is omitted because the ordering it would enforce already follows transitively from the graph's other edges.

(The original code, dependence graph, and scheduled code repeat slide 14.)

Dependence Among Memory Operations

The key point is simple:
- The original code is the specification for the computation
- The order of operations in the original code is correct
- Any legal schedule must produce equivalent results

Consequences:
- Loads & outputs that precede a store in the original code must precede it in the scheduled code, unless the scheduler can prove that the operations are disjoint, i.e., that the store never touches the same memory location as the load or output
- Stores that precede a load or output in the original code must precede them in the scheduled code, unless the scheduler can prove that the operations are disjoint
- Outputs must be strictly ordered, according to the original code; the order of execution of the outputs defines correctness

Dependence Among Memory Operations (continued)

To preserve these relationships, you will add a lot of edges to the graph. Those edges constrain the schedule, of course; if they constrain the schedule needlessly, that hurts performance.

Many students implement some kind of graph-simplification phase. There are limits on what you can do (arbitrary, dictatorial limits) and limits on what the analysis can detect.

First: get the dependence graph correct & the scheduler working.
Second: simplify the graph.

Next week's tutorial will look at graph simplification.

Representing the Graph

Appendix B.3 in EaC2e describes several graph representations. lab3_ref uses the tabular representation on page 747:
- It allows easy traversals in either direction
- It requires no actual pointers

For the dependence graph of slide 6 (edges e1: op 4 -> op 1, e2: op 4 -> op 3, e3: op 3 -> op 1, e4: op 3 -> op 2):

  Node Table                Edge Table
  Name  Succ  Pred          Name  Source  Sink  Next Succ  Next Pred
  n1          e1            e1    n4      n1    e2         e3
  n2          e4            e2    n4      n3
  n3    e3    e2            e3    n3      n1    e4
  n4    e1                  e4    n3      n2

A sketch of this layout in code appears below.
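Here is a self-contained sketch in C of that tabular layout, with array indices standing in for the table's names and NIL marking an empty entry. It populates the node and edge tables from the slide, then traverses n4's successors and n1's predecessors; the field names are modeled on the slide's column headings, not on lab3_ref.

#include <stdio.h>

#define NIL (-1)

struct NodeRow { int first_succ, first_pred; };            /* Succ, Pred  */
struct EdgeRow { int source, sink, next_succ, next_pred; };

int main(void) {
    /* n1..n4 are indices 0..3; e1..e4 are indices 0..3. */
    struct NodeRow node[4] = {
        { NIL, 0   },     /* n1: pred list starts at e1 */
        { NIL, 3   },     /* n2: pred list starts at e4 */
        { 2,   1   },     /* n3: succ e3, pred e2       */
        { 0,   NIL },     /* n4: succ list starts at e1 */
    };
    struct EdgeRow edge[4] = {
        { 3, 0, 1,   2   },   /* e1: n4 -> n1 */
        { 3, 2, NIL, NIL },   /* e2: n4 -> n3 */
        { 2, 0, 3,   NIL },   /* e3: n3 -> n1 */
        { 2, 1, NIL, NIL },   /* e4: n3 -> n2 */
    };

    printf("successors of n4:");       /* forward traversal, no pointers */
    for (int e = node[3].first_succ; e != NIL; e = edge[e].next_succ)
        printf(" n%d", edge[e].sink + 1);

    printf("\npredecessors of n1:");   /* backward traversal */
    for (int e = node[0].first_pred; e != NIL; e = edge[e].next_pred)
        printf(" n%d", edge[e].source + 1);
    printf("\n");
    return 0;
}

Printing "successors of n4: n1 n3" and "predecessors of n1: n4 n3" exercises both directions of traversal without a single pointer.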

Implementation Plan

Three key steps:
- Rename to virtual registers, to avoid anti-dependences and constraints based on source register names (reuse code from Lab 2)
- Build the dependence graph; the specifics of the serialization edges may depend on whether or not your scheduler tries to simplify the graph
- Schedule the operations

One significant optimization: simplify the graph by eliminating IO serialization edges.
- Prove that the memory addresses of IO operations are distinct
- Track loadi values, as in rematerialization (a sketch follows this slide)
- Some students will go so far as to track values computed from rematerializable values
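As an illustration of the loadi-tracking idea, here is a minimal sketch in C. It assumes renamed code, where each virtual register has at most one definition, so a VR's constant can be recorded once; the names (known, see_loadi, may_conflict) are hypothetical, and a real lab would use a separate flag rather than overloading -1 to mean "unknown".

#include <stdio.h>

#define MAXVR   64
#define UNKNOWN (-1L)   /* sketch overloads -1; fine for non-negative addresses */

static long known[MAXVR];   /* known[vr] = constant from "loadi c => vr" */

static void see_loadi(int vr, long c) { known[vr] = c; }

/* Two memory ops may conflict unless both addresses are provably
 * constant and the constants differ. */
static int may_conflict(int vr_a, int vr_b) {
    if (known[vr_a] == UNKNOWN || known[vr_b] == UNKNOWN) return 1;
    return known[vr_a] == known[vr_b];
}

int main(void) {
    for (int vr = 0; vr < MAXVR; vr++) known[vr] = UNKNOWN;
    /* Constants from the dot-file example on the next slides: */
    see_loadi(2, 1024);   /* 18: loadi 1024 => vr2 */
    see_loadi(0, 1024);   /* 19: loadi 1024 => vr0 */
    see_loadi(4, 100);    /* 20: loadi 100  => vr4 (used here only as a
                             stand-in for a second, distinct address) */
    printf("vr2 vs vr0: %s\n", may_conflict(2, 0) ? "conflict" : "disjoint");
    printf("vr2 vs vr4: %s\n", may_conflict(2, 4) ? "conflict" : "disjoint");
    return 0;
}

The first comparison reports a conflict (both addresses are 1024, which is why the dot-file graph keeps its IO edges); the second is provably disjoint, so its serialization edge could be dropped.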

Implementing the Scheduler

Straightforward to implement, hard to get performance.

  Cycle ← 1
  Ready ← leaves of D
  Active ← ∅
  while (Ready ∪ Active ≠ ∅)
    for each functional unit, f, do
      if 1 or more ops in Ready for f then
        remove the highest-priority op for f from Ready
        S(op) ← Cycle
        Active ← Active ∪ { op }
    Cycle ← Cycle + 1
    for each op in Active
      if (S(op) + delay(op) ≤ Cycle) then
        remove op from Active
        add the ready successors of op to Ready
      if (op is a load or store) & (S(op) = Cycle - 1) then
        add the ready store successors of op to Ready

Implementation choices:
- Is the Ready queue a priority queue, a list, or a set of lists? A priority queue is asymptotically faster; it is not clear the queue gets long enough to see the difference.
- Separate queues for loads & stores, for multiplies, and for all other operations?
- Is Active one queue or many lists?
- There is a fundamental tradeoff in time, space, & ease of programming.
- How does your lab break ties? Many, many options.
- How does your lab check for ready successors?

A compact sketch of this loop appears below.
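To show the loop's shape in code, here is a compact, runnable sketch in C of a greedy list scheduler for Example 1 on a two-unit machine. For simplicity it recomputes readiness every cycle instead of maintaining Ready and Active incrementally, and it omits the store-successor special case; the latencies, priorities, and unit restrictions are assumed values, not the lab's specification.

#include <stdio.h>

#define NOPS   4
#define NUNITS 2

static const char *text[NOPS] = {
    "loadi 8 => r1", "loadi 12 => r2", "mult r1,r2 => r3", "add r1,r3 => r4"
};
static const int delay[NOPS] = { 1, 1, 3, 1 };   /* assumed op latencies    */
static const int prio[NOPS]  = { 4, 4, 3, 0 };   /* latency-weighted depth  */
static const int dep[NOPS][NOPS] = {             /* dep[i][j]: op i needs j */
    {0,0,0,0}, {0,0,0,0}, {1,1,0,0}, {1,0,1,0},
};
/* unit_ok[u][i]: unit u can start op i (assumed restrictions) */
static const int unit_ok[NUNITS][NOPS] = { {1,1,0,1}, {1,1,1,1} };

int main(void) {
    int S[NOPS] = {0}, scheduled[NOPS] = {0}, remaining = NOPS;

    for (int cycle = 1; remaining > 0; cycle++) {
        for (int u = 0; u < NUNITS; u++) {
            int best = -1;
            for (int i = 0; i < NOPS; i++) {      /* find ready ops for u  */
                int ready = !scheduled[i] && unit_ok[u][i];
                for (int j = 0; ready && j < NOPS; j++)
                    if (dep[i][j] && (!scheduled[j] || S[j] + delay[j] > cycle))
                        ready = 0;                /* an operand isn't done */
                if (ready && (best < 0 || prio[i] > prio[best]))
                    best = i;                     /* keep highest priority */
            }
            if (best >= 0) {
                S[best] = cycle; scheduled[best] = 1; remaining--;
                printf("cycle %d, unit %d: %s\n", cycle, u, text[best]);
            }
        }
    }
    return 0;
}

On this input it reproduces the shape of slide 6's schedule: both loadi ops in cycle 1, the mult in cycle 2, and the add in cycle 5, after the mult's three-cycle latency.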

Debugging

The instruction scheduler is complex.
- Use graphviz to look at the dependence graph
  - Only useful on small graphs
  - Most of the report blocks are small enough for graphviz

Dot File

  digraph DG {
    18 [label="18: loadi 1024 => vr2 prio: <21,21,5>"];
    19 [label="19: loadi 1024 => vr0 prio: <16,16,4>"];
    20 [label="20: loadi 100 => vr4 prio: <21,21,5>"];
    21 [label="21: loadi 200 => vr3 prio: <16,16,4>"];
    23 [label="23: store vr4 => vr2 prio: <20,20,4>"];
    24 [label="24: store vr3 => vr0 prio: <15,15,3>"];
    26 [label="26: load vr2 => vr1 prio: <10,10,2>"];
    27 [label="27: store vr1 => vr0 prio: <5,5,1>"];
    30 [label="30: output 1024 prio: <0,0,0>"];
    23 -> 20 [ label=" vr4, 1"];
    23 -> 18 [ label=" vr2, 1"];
    24 -> 21 [ label=" vr3, 1"];
    24 -> 19 [ label=" vr0, 1"];
    24 -> 23 [ label=" IO Edge, 1"];
    26 -> 18 [ label=" vr2, 1"];
    26 -> 24 [ label=" IO Edge, 5"];
    27 -> 26 [ label=" vr1, 5"];
    27 -> 19 [ label=" vr0, 1"];
    27 -> 24 [ label=" IO Edge, 1"];
    30 -> 27 [ label=" IO Edge, 5"];
  }

Generating A Dot File

That seems like a lot of work! It is actually quite simple: 41 lines of C code in lab3_ref.

static void DGNodeWalker(struct Node *np);
static void DGEdgeWalker(struct Edge *ep);

void DGDump() {
    fprintf(dotfile, "digraph DG {\n");
    DGNodeWalker(ListOfNodes);
    DGEdgeWalker(ListOfEdges);
    fprintf(dotfile, "}\n");
    fflush(dotfile);
}

static void DGNodeWalker(struct Node *np) {   /* DGDump helper fn */
    if (np != DGNullNode) {
        DGNodeWalker(np->NextNode);           /* recurse first: emit in block order */
        fprintf(dotfile, " %d [label=\"%d: ", np->op->line, np->op->line);
        PrintOp(np->op);
        if (PrintPriorities != 0)
            fprintf(dotfile, "\nprio: <%d,%d,%d>",
                    np->priority, np->prio2, np->prio3);
        fprintf(dotfile, "\"];\n");
    }
}

static void DGEdgeWalker(struct Edge *ep) {   /* DGDump helper fn */
    int latency;
    if (ep != DGNullEdge) {
        DGEdgeWalker(ep->NextEdge);
        if (ep->vreg != DELETED_EDGE) {
            fprintf(dotfile, " %d -> %d", ep->from->op->line, ep->to->op->line);
            if (ep->vreg > -1)                /* register (value) edge */
                fprintf(dotfile, " [ label=\" sr%d, %d\"];\n",
                        ep->vreg, ep->to->latency);
            else {                            /* IO serialization edge */
                if (ep->from->op->opcode == STORE)
                    latency = 1;
                else
                    latency = ep->to->latency;
                fprintf(dotfile, " [ label=\" IO Edge, %d\"];\n", latency);
            }
        }
        fflush(dotfile);
    }
}

Debugging

The instruction scheduler is complex.
- Use graphviz to look at the dependence graph
  - Only useful on small graphs; most of the report blocks are small enough for graphviz
- Write a script that tests it, routinely, against the available blocks
  - The report blocks are pretty good; the contributed blocks are also good
- Checkpoint it regularly
  - GitHub is your friend: create branches for different heuristics
- Use the trace option on the simulator to understand executions
  - Compare the values that reach specific operations
- Program defensively: write code that checks your internal state
  - An exhaustive graph-checker that verified the graph was well formed tracked down two subtle bugs in the reference implementation
  - Dump dot files both before & after running the simplifier