Compiler: Control Flow Optimization

1 Compiler: Control Flow Optimization
Virendra Singh
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering, Indian Institute of Technology Bombay
Advanced Topics in MNIT, Lecture 4 (02 Oct 2015)

2 Compiler Backend Structure
Improve code quality (machine-independent optimization)
- Control flow analysis, control flow optimization: branching structure
- Dataflow analysis, dataflow optimization: computation instructions
Virtual-to-physical mapping and machine-dependent optimization
- Instruction selection: bind instrs to physical realizations
- Instruction scheduling: bind instrs to physical resources
- Register allocation: bind virtual regs to physical regs
- Machine code emission/optimization

3 Control Flow
- Control transfer = branch (taken or fall-through)
- Control flow
  - Branching behavior of an application
  - What sequences of instructions can be executed
- Execution → dynamic control flow
  - Direction of a particular instance of a branch
  - Predict, speculate, squash, etc.
- Compiler → static control flow
  - Not executing the program
  - Input not known, so what could happen, worst case
02 Oct 2015 virendra@mnit

4 Regions
- Region: a collection of operations that are treated as a single unit by the compiler
- Examples
  - Basic block
  - Procedure
  - Body of a loop
- Properties
  - Connected subgraph of operations
  - Control flow is the key parameter that defines regions
  - Hierarchically organized
- Problem
  - Basic blocks are too small (3-5 operations), so it is hard to extract sufficient parallelism
  - Procedure control flow is too complex for many compiler transforms
  - Plus, only parts of a procedure are important (90/10 rule)

5 Regions
- Want
  - Intermediate-sized regions with simple control flow
  - Bigger basic blocks would be ideal!!
  - Separate important code from less important
  - Optimize frequently executed code at the expense of the rest
- Solution
  - Define new region types that consist of multiple BBs
  - Profile information used in the identification
  - Sequential control flow
  - Pretend the regions are basic blocks

6 Region Type 1 - Trace
- Trace: linear collection of basic blocks that tend to execute in sequence
  - Likely control flow path
  - Acyclic (outer backedge ok)
- Side entrance: branch into the middle of a trace
- Side exit: branch out of the middle of a trace
- Compilation strategy
  - Compile assuming the path occurs 100% of the time
  - Patch up side entrances and exits afterwards
- Motivated by scheduling (i.e., trace scheduling)
[Figure: control flow graph with edge counts, showing the selected trace]

7 Linearizing a Trace
[Figure: the trace laid out as a straight line of basic blocks, annotated with the entry count, per-block entry/exit counts, side exits, side entrances, and the exit count]

8 Intelligent Trace Layout for Better I-cache Performance
- Intra-procedural code placement
- Procedure positioning
- Procedure splitting
[Figure: trace view (trace 1, trace 2, trace 3, the rest) next to the procedure view]

9 Issues With Selecting Traces
- Acyclic
  - Cannot go past a backedge
- Trace length
  - Longer = better?
  - Not always!
- On-trace
  - Maximize on-trace
  - Compile assuming on-trace is 100% (i.e., a single BB)
- Tradeoff (heuristic)
  - Length
  - Likelihood of remaining within the trace
[Figure: control flow graph with edge counts]

10 Trace Selection Algorithm

    i = 0
    mark all BBs unvisited
    while (there are unvisited nodes) do
        seed = unvisited BB with largest execution freq
        trace[i] += seed
        mark seed visited
        current = seed
        /* Grow trace forward */
        while (1) do
            next = best_successor_of(current)
            if (next == 0) then break
            trace[i] += next
            mark next visited
            current = next
        endwhile
        /* Grow trace backward analogously */
        i++
    endwhile

11 Best Successor/Predecessor
- Node weight vs edge weight
  - Edge weight is more accurate
- THRESHOLD
  - Controls trace probability
  - 60-70% found best
- Notes on this algorithm
  - A BB is only allowed in 1 trace
  - Cumulative probability is ignored
  - A minimum weight is required for a BB to be chosen as a seed (i.e., a minimum execution count)

    best_successor_of(bb)
        e = control flow edge with highest probability leaving bb
        if (e is a backedge) then
            return 0
        endif
        if (probability(e) <= THRESHOLD) then
            return 0
        endif
        d = destination of e
        if (d is visited) then
            return 0
        endif
        return d
    end procedure
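The two pseudocode procedures above can be sketched in Python as follows. This is a minimal illustration, not the lecturer's implementation. The CFG encoding (a dict of block counts and a dict of edge counts), the name `select_traces`, and the example graph are all assumptions made for this sketch. An edge's probability is taken as its count divided by the total count leaving the block. The slides only say backward growth is "analogous", so this sketch assumes the mirror-image rule over incoming edges.

```python
def select_traces(freq, edges, backedges=frozenset(), threshold=0.60):
    """Partition basic blocks into traces.

    freq:      {bb: execution count}
    edges:     {(src, dst): execution count of that CFG edge}
    backedges: set of (src, dst) edges that a trace must not cross
    """
    visited, traces = set(), []

    def best_neighbor(bb, forward):
        # Edges leaving bb (forward growth) or entering bb (backward growth).
        end = 0 if forward else 1
        cands = [(e, n) for e, n in edges.items() if e[end] == bb]
        total = sum(n for _, n in cands)
        if total == 0:
            return None
        edge, count = max(cands, key=lambda en: en[1])
        other = edge[1] if forward else edge[0]
        # Same three rejection tests as best_successor_of in the pseudocode.
        if edge in backedges or count / total <= threshold or other in visited:
            return None
        return other

    while len(visited) < len(freq):
        # Seed = unvisited block with the largest execution frequency.
        seed = max((b for b in freq if b not in visited), key=freq.get)
        visited.add(seed)
        trace = [seed]
        for forward in (True, False):
            current = seed
            while (nxt := best_neighbor(current, forward)) is not None:
                visited.add(nxt)
                if forward:
                    trace.append(nxt)
                else:
                    trace.insert(0, nxt)
                current = nxt
        traces.append(trace)
    return traces


# Hypothetical diamond-shaped CFG with a loop backedge BB4 -> BB1.
freq = {"BB1": 100, "BB2": 90, "BB3": 10, "BB4": 100}
edges = {("BB1", "BB2"): 90, ("BB1", "BB3"): 10,
         ("BB2", "BB4"): 90, ("BB3", "BB4"): 10, ("BB4", "BB1"): 99}
traces = select_traces(freq, edges, backedges={("BB4", "BB1")})
```

With this input the hot path BB1, BB2, BB4 forms one trace. BB3 ends up alone because its only successor has already been claimed, which reflects the rule that a BB is allowed in only one trace.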

12 Trace Selection Example
Find the traces. Assume a threshold probability of 60%.
[Figure: control flow graph with block and edge counts]

13 Traces are Nice, But
- Treat the trace as a big BB
  - Transform the trace ignoring side entrances/exits
  - Insert fixup code (aka bookkeeping)
  - Side entrance fixup is more painful
  - Sometimes not possible, so the transform is not allowed
- Solution
  - Eliminate side entrances
  - The superblock is born
[Figure: control flow graph with edge counts]

14 Region Type 2 - Superblock
- Superblock: linear collection of basic blocks that tend to execute in sequence, in which control flow may only enter at the first BB
  - Likely control flow path
  - Acyclic (outer backedge ok)
  - Trace with no side entrances
  - Side exits still exist
- Superblock formation
  - Trace selection
  - Eliminate side entrances
[Figure: control flow graph with edge counts]

15 Tail Duplication
- To eliminate all side entrances, replicate the tail portion of the trace
  - Identify the first side entrance
  - Replicate all BBs from the target to the bottom
  - Redirect all side entrances to the duplicated BBs
  - Copy each BB only once
- Max code expansion = 2x - 1, where x is the number of BBs in the trace
- Adjust profile information
[Figure: control flow graph with edge counts]
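The tail duplication steps listed above can be sketched as a small graph rewrite. This is an illustrative sketch under stated assumptions: the CFG is a dict mapping each block to its successor list, copies are named by appending an apostrophe, and `tail_duplicate` is a hypothetical helper name. Profile adjustment, the last step on the slide, is left out.

```python
def tail_duplicate(trace, succs):
    """Remove side entrances from `trace` by duplicating its tail.

    trace: list of block names in trace order
    succs: {block: [successor block names]}
    Returns (new successor map, {original tail block: name of its copy}).
    """
    # The only legal way into a non-head trace block is from the block before it.
    on_trace_pred = {trace[k]: trace[k - 1] for k in range(1, len(trace))}

    def is_side_entrance(src, dst):
        return dst in on_trace_pred and src != on_trace_pred[dst]

    targets = {d for s, ds in succs.items() for d in ds if is_side_entrance(s, d)}
    if not targets:
        return {b: list(ds) for b, ds in succs.items()}, {}

    # Replicate every block from the first side-entrance target to the bottom.
    first = min(trace.index(b) for b in targets)
    copy = {b: b + "'" for b in trace[first:]}

    # Redirect side entrances to the copies; on-trace edges are left alone.
    new_succs = {s: [copy[d] if is_side_entrance(s, d) else d for d in ds]
                 for s, ds in succs.items()}
    # Each copy keeps its original's successors, staying inside the copied tail.
    for b, dup in copy.items():
        new_succs[dup] = [copy.get(d, d) for d in succs.get(b, [])]
    return new_succs, copy


# Hypothetical CFG: BB3 and BB5 are off-trace blocks that re-enter the trace.
trace = ["BB1", "BB2", "BB4", "BB6"]
succs = {"BB1": ["BB2", "BB3"], "BB2": ["BB4"], "BB3": ["BB4"],
         "BB4": ["BB6", "BB5"], "BB5": ["BB6"], "BB6": []}
new_succs, copies = tail_duplicate(trace, succs)
```

Here BB4 and BB6 are each copied once, giving 6 blocks derived from the 4-block trace, within the 2x - 1 = 7 bound. After the rewrite, BB2, BB4, and BB6 are each reachable only from their on-trace predecessor.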

16 Superblock Formation
[Figure: control flow graph before and after tail duplication, with edge counts]

17 Issues with Superblocks
- Central tradeoff
  - Side entrance elimination
    - Compiler complexity
    - Compiler effectiveness
  - Code size increase
- Apply intelligently
  - Most frequently executed BBs are converted to SBs
  - Set an upper limit on code expansion
  - x are typical code expansion ratios from SB formation

18 Predicated Execution
- Hardware mechanism that allows operations to be conditionally executed
- Add an additional boolean source operand (predicate)
  - ADD r1, r2, r3 if p1
    - if (p1 is True), r1 = r2 + r3
    - else if (p1 is False), do nothing (the ADD is treated like a NOP)
    - p1 is referred to as the guarding predicate
    - Predicated on True means always executed
    - An omitted predicate also means always executed
- Provides the compiler with an alternative to using branches to selectively execute operations
  - If statements in the source
  - Realize with branches in the assembly code
  - Could also realize with conditional instructions
  - Or use a combination of both

19 Predicated Execution Example

Source:

    a = b + c
    if (a > 0)
        e = f + g
    else
        e = f / g
    h = i - j

Traditional branching code:

        add a, b, c
        bgt a, 0, L1
        div e, f, g
        jump L2
    L1: add e, f, g
    L2: sub h, i, j

Predicated code (p2 guards the then-block, p3 guards the else-block):

    add a, b, c if T
    p2 = a > 0 if T
    p3 = a <= 0 if T
    div e, f, g if p3
    add e, f, g if p2
    sub h, i, j if T
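To make the equivalence concrete, the sketch below runs the same computation twice in Python, once with real control flow and once as a straight-line sequence of guarded operations. The helper `op` and the register dict are assumptions of this sketch, standing in for hardware that turns an operation into a NOP when its guarding predicate is False.

```python
def branching(b, c, f, g, i, j):
    # Direct translation of the source-level if-then-else.
    a = b + c
    if a > 0:
        e = f + g
    else:
        e = f / g
    h = i - j
    return a, e, h


def predicated(b, c, f, g, i, j):
    regs = {}

    def op(dst, compute, pred=True):
        # Predicated operation: writes dst only when its guard is True,
        # otherwise it behaves like a NOP (compute is never evaluated).
        if pred:
            regs[dst] = compute()

    op("a", lambda: b + c)                # add a, b, c  if T
    op("p2", lambda: regs["a"] > 0)       # p2 = a > 0   if T
    op("p3", lambda: regs["a"] <= 0)      # p3 = a <= 0  if T
    op("e", lambda: f / g, regs["p3"])    # div e, f, g  if p3
    op("e", lambda: f + g, regs["p2"])    # add e, f, g  if p2
    op("h", lambda: i - j)                # sub h, i, j  if T
    return regs["a"], regs["e"], regs["h"]
```

Because a nullified operation has no effect, calling `predicated(1, 2, 3, 0, 1, 1)` does not raise a division error even though the `div` appears in the instruction stream with a zero divisor.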

20 What About Nested If-then-elses?

Source:

    a = b + c
    if (a > 0)
        if (a > 25)
            e = f + g
        else
            e = f * g
    else
        e = f / g
    h = i - j

Traditional branching code:

        add a, b, c
        bgt a, 0, L1
        div e, f, g
        jump L2
    L1: bgt a, 25, L3
        mpy e, f, g
        jump L2
    L3: add e, f, g
    L2: sub h, i, j

21 Nested If-then-elses: No Problem

Source:

    a = b + c
    if (a > 0)
        if (a > 25)
            e = f + g
        else
            e = f * g
    else
        e = f / g
    h = i - j

Predicated code:

    add a, b, c if T
    p2 = a > 0 if T
    p3 = a <= 0 if T
    div e, f, g if p3
    p5 = a > 25 if p2
    p6 = a <= 25 if p2
    mpy e, f, g if p6
    add e, f, g if p5
    sub h, i, j if T

What do we assume to make this work?
- If p2 is False, both p5 and p6 are False
- So a predicate-setting instruction should set its result to False if its guarding predicate is False!
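The assumption stated above (a predicate-setting instruction writes False when its guard is False) can be checked with a short simulation. This is a sketch only: `pred_define` is a hypothetical helper modelling that rule, and Python `if` statements stand in for the hardware guard on each arithmetic operation.

```python
def pred_define(regs, dst, cond, guard=True):
    # Always writes dst; the result is forced to False when the guard is False.
    regs[dst] = bool(guard and cond)


def nested_predicated(b, c, f, g):
    r = {"a": b + c}                                   # add a, b, c  if T
    pred_define(r, "p2", r["a"] > 0)                   # p2 = a > 0   if T
    pred_define(r, "p3", r["a"] <= 0)                  # p3 = a <= 0  if T
    if r["p3"]:
        r["e"] = f / g                                 # div e, f, g  if p3
    pred_define(r, "p5", r["a"] > 25, guard=r["p2"])   # p5 = a > 25  if p2
    pred_define(r, "p6", r["a"] <= 25, guard=r["p2"])  # p6 = a <= 25 if p2
    if r["p6"]:
        r["e"] = f * g                                 # mpy e, f, g  if p6
    if r["p5"]:
        r["e"] = f + g                                 # add e, f, g  if p5
    return r["e"], (r["p3"], r["p5"], r["p6"])
```

For a negative `a`, the comparison `a <= 25` is true, yet p6 comes out False because its guard p2 is False. Without that rule the multiply would overwrite the result of the divide.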

22 Benefits/Costs of Predicated Execution
- Benefits
  - Remove branches (both conditional and unconditional)
  - Remove branch mispredictions
  - Overlap execution of if-then-else statements
    - Branches tend to sequentialize operations
    - Predicates can be computed/used in parallel
- Costs
  - Useless instructions executed
  - Code size (extra operand, can't fit into 32 bits)
  - Possibly longer schedule lengths
- The real story
  - Must be applied selectively, or you get worse performance than not using it at all

23 Benefits/Costs of Predicated Execution
[Figure: control flow graph of basic blocks next to the single predicated block formed from them]
- Benefits
  - No branches, no mispredicts
  - Can freely reorder independent operations in the predicated block
  - Overlap blocks from different paths (e.g., BB2 with BB5)
- Costs (execute all paths)
  - Worst-case schedule length
  - Worst-case resources required

24 Compare-to-Predicate Operations (CMPPs)
- How do we compute predicates?
  - Compare registers/literals like a branch would do
  - Efficiency, code size, nested conditionals, etc.
- 2 targets for computing taken/fall-through conditions with 1 operation

    p1, p2 = CMPP.cond.D1a.D2a (r1, r2) if p3

  - p1 = first destination predicate
  - p2 = second destination predicate
  - cond = compare condition (e.g., EQ, LT, GE)
  - D1a = action specifier for first destination
  - D2a = action specifier for second destination
  - (r1, r2) = data inputs to be compared (e.g., r1 < r2)
  - p3 = guarding predicate

25 CMPP Action Specifiers
[Table: value written to the destination predicate for each action specifier (UN, UC, ON, OC, AN, AC) as a function of the guarding predicate and the compare result]
- UN/UC = unconditional normal/complement
  - This is what we used in the earlier examples
  - guard = 0: both outputs are 0
  - guard = 1: UN = compare result, UC = opposite
- ON/OC = OR-type normal/complement
- AN/AC = AND-type normal/complement
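The action specifiers can be modelled as a small function that returns the new value of one destination predicate. The UN/UC behavior follows the rules stated above. The OR-type and AND-type rows are an assumption of this sketch, following the usual HPL-PD-style convention: OR-type can only set a predicate to 1, AND-type can only clear it to 0, and otherwise the old value is left untouched. The function name `cmpp_dest` and its signature are hypothetical.

```python
def cmpp_dest(compare, guard, action, old):
    """New value of one CMPP destination predicate.

    compare: result of the comparison (bool)
    guard:   guarding predicate (bool)
    action:  one of "UN", "UC", "ON", "OC", "AN", "AC"
    old:     current value of the destination predicate
    """
    kind, mode = action[0], action[1]
    if kind not in "UOA" or mode not in "NC" or len(action) != 2:
        raise ValueError(f"unknown action specifier: {action}")
    # Complement modes use the opposite of the compare result.
    result = (not compare) if mode == "C" else compare
    if kind == "U":
        # Unconditional: always writes; 0 when the guard is 0.
        return bool(guard and result)
    if kind == "O":
        # OR-type (assumed): may only set the predicate to 1.
        return True if (guard and result) else old
    # AND-type (assumed): may only clear the predicate to 0.
    return False if (guard and not result) else old


def cmpp(compare, guard, d1a, d2a, old1=False, old2=False):
    # One operation producing both destination predicates.
    return (cmpp_dest(compare, guard, d1a, old1),
            cmpp_dest(compare, guard, d2a, old2))
```

Under these assumptions, `cmpp(a > 0, True, "UN", "UC")` yields the taken and fall-through predicates in one operation. Several OR-type compares writing the same predicate, initialized to False, accumulate a logical OR of their conditions.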

26 Thank You
