Advanced Compiler Design: 8.0 Instruction scheduling


1 6-80: Advanced Compiler Design 8.0 Instruction scheduling Thomas R. Gross Computer Science Department ETH Zurich, Switzerland

2 Overview 8. Instruction scheduling basics 8. Scheduling for ILP processors

3 8. Instruction scheduling basics Motivation Problem formulation The data dependence graph Instruction scheduling techniques List scheduling Overview Algorithm Priority functions Example

4 Instruction scheduling Input Output Source code Frontend IR Optimizer IR Code Generator Machine program Input Output HIR Instruction Selection LLIR Instruction Scheduling reordered LLIR Register Allocation LLIR HIR: High-level IR LLIR: Low-level IR

5

6 Motivation Or: why don't we just hand the code as-is to the processor? Processor interprets instructions load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4

7 Motivation Or: why don't we just hand the code as-is to the processor? Processor interprets instructions load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 LLIR Instruction Scheduler load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 reordered LLIR

8 Motivation Or: why don't we just hand the code as-is to the processor? Processor interprets instructions load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 LLIR Copy load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 reordered LLIR

9 Motivation Assume a target machine with the following properties Pipelined with forwarding, single issue, in-order Operation latencies: Examples add, sub: cycle mul, load: cycles store: cycle load r ← MEM[r0] add r4 ← r, # add r ← r, #4 mul r ← r, r4 add r5 ← r, r4 store MEM[r], r5

10 Motivation Executing the example code load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4

11 Motivation Executing the example code load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 cycle operation load r ← MEM[r0] 4 add r4 ← r, # 5 load r ← MEM[r] add r ← r, # 9 add r0 ← r0, #4 0 add r ← r, #4 mul r ← r, r4 4 store MEM[r], r 5 add r ← r, #4

12 Motivation Can we do better? load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 cycle operation Target machine properties pipelined with forwarding, single issue, in-order operation latencies: add, sub: cycle; mul, load: cycles; store: cycle

13 Motivation Can we do better? load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 cycle operation load r ← MEM[r0] add r0 ← r0, #4 4 add r4 ← r, # 5 load r ← MEM[r] 6 add r ← r, #4 7 8 add r ← r, # 9 mul r ← r, r4 0 store MEM[r], r add r ← r, #4 Target machine properties pipelined with forwarding, single issue, in-order operation latencies: add, sub: cycle; mul, load: cycles; store: cycle

14 Motivation Can we do even better? Registers constrain the schedule load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] If the 1st instruction loads into r4, the 2nd load could start earlier load r4 ← MEM[r0] add r4 ← r4, # load r ← MEM[r]

15 Motivation Can we do even better? Registers constrain the schedule load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] If the 1st instruction loads into r4, the 2nd load could start earlier load r4 ← MEM[r0] add r4 ← r4, # load r ← MEM[r] cycle operation load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, #

16 Motivation Can we do even better? Registers constrain the schedule load r4 ← MEM[r0] add r4 ← r4, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 cycle operation Target machine properties pipelined with forwarding, single issue, in-order operation latencies: add, sub: cycle; mul, load: cycles; store: cycle

17 Motivation Can we do even better? Registers constrain the schedule load r4 ← MEM[r0] add r4 ← r4, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 cycle operation load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, # 5 add r ← r, # 6 mul r ← r, r4 7 add r ← r, #4 8 9 store MEM[r], r 0 add r ← r, #4 Target machine properties pipelined with forwarding, single issue, in-order operation latencies: add, sub: cycle; mul, load: cycles; store: cycle

18 Comparison cycle operation load r ← MEM[r0] 4 add r4 ← r, # 5 load r ← MEM[r] add r ← r, # 9 add r0 ← r0, #4 0 add r ← r, #4 mul r ← r, r4 4 store MEM[r], r 5 add r ← r, #4 versus cycle operation load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, # 5 add r ← r, # 6 mul r ← r, r4 7 add r ← r, #4 8 9 store MEM[r], r 0 add r ← r, #4

19 Comparison cycle operation load r ← MEM[r0] 4 add r4 ← r, # 5 load r ← MEM[r] add r ← r, # 9 add r0 ← r0, #4 0 add r ← r, #4 mul r ← r, r4 4 store MEM[r], r 5 add r ← r, #4 versus cycle operation load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, # 5 add r ← r, # 6 mul r ← r, r4 7 add r ← r, #4 8 9 store MEM[r], r 0 add r ← r, #4 % Improvement (Code size)

20 Food for thought Is this schedule the best we can attain (if we are willing to reconsider other register assignments)? If another register rx is free, use it instead of r cycle operation load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, # 5 add r ← r, # 6 mul r ← r, r4 7 add r ← r, #4 8 9 store MEM[r], r 0 add r ← r, #4

21 Instruction scheduling Scheduling: define the order of interpretation (Instruction) scheduling: define the order in which instructions are presented to the CPU for processing CPU may decide to execute instructions in a different order out-of-order execution Beyond control of compiler Read the fine print! Some instructions specify multiple operations Or the CPU may fetch multiple instructions ILP: Instruction-Level Parallelism Scheduling of operations → instructions

22 Why do processors change the order of execution relative to the order of fetch/interpretation? Why should the compiler bother to implement scheduling if the processor re-orders instructions? Can the compiler handle all cases?

23 Instruction scheduling Input Output Source code Frontend IR Optimizer IR Code Generator Machine program Input Output HIR Instruction Selection LLIR Instruction Scheduling reordered LLIR Register Allocation LLIR HIR: High-level IR LLIR: Low-level IR

24 Code generation is easy

25 Code generation is easy as long as the code generator includes only two tasks from {instruction selection, scheduling, register allocation}

26 Instruction scheduling Schedule: partially ordered list of instructions Partial order determined by resource usage What is a good schedule? Main constraint Preserve meaning of the code (control flow, data flow) Metrics Typical: shortest in terms of execution time Desired: shortest execution time Sometimes: conserve energy

27 Good schedules Metrics Often can't predict execution time (data or context dependent) Varies for different processor implementations May not be published Hidden conflicts (in decoding, address translation, ...) Memory system performance difficult to model Additional constraints imposed by H/W properties Operation latencies Processor pipeline # of functional units (FU) available # of registers Memory hierarchy (e.g., pre-fetching)

28 Data Dependence Graph DDG = (V, E) Nodes V represent each operation Augmented with Operation type Operation latency (delay) Edges E represent data dependences between operations Forward (def-use) Anti (use-def) Output (def-def) Root nodes = no successors Leaf nodes = no predecessors Latencies on nodes or edges? Latency of anti/output dependences?
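The three dependence kinds on this slide can be found by a pairwise scan over straight-line code. A minimal Python sketch, assuming a toy three-address representation (the `Instr` tuple and the latency values are illustrative, not from the lecture):

```python
# Minimal DDG builder over a toy three-address IR. Each instruction lists
# the registers it defines and uses, plus an assumed latency (delay).
from collections import namedtuple

Instr = namedtuple("Instr", ["name", "defs", "uses", "delay"])

def build_ddg(instrs):
    """Return edges (i, j, kind) for true, anti, and output dependences,
    scanning each earlier instruction i against each later instruction j."""
    edges = []
    for j in range(len(instrs)):
        for i in range(j):
            a, b = instrs[i], instrs[j]
            if set(a.defs) & set(b.uses):
                edges.append((i, j, "true"))    # def-use (forward)
            if set(a.uses) & set(b.defs):
                edges.append((i, j, "anti"))    # use-def
            if set(a.defs) & set(b.defs):
                edges.append((i, j, "output"))  # def-def
    return edges

prog = [
    Instr("load r1<-MEM[r0]", ["r1"], ["r0"], 3),
    Instr("add  r4<-r1,1",    ["r4"], ["r1"], 1),
    Instr("add  r0<-r0,4",    ["r0"], ["r0"], 1),
]
print(build_ddg(prog))   # [(0, 1, 'true'), (0, 2, 'anti')]
```

A transitive-reduction pass would normally prune redundant edges; it is omitted here for brevity.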

29

30 Root nodes Root nodes on bottom Successor: uses result generated

31 load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4

32 Root nodes Root node on top Produces result that flows to consumer node

33 load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4

34 r ← MEM[r0] r4 ← r + load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 r ← MEM[r] r ← r + r ← r4 * r MEM[r] ← r r0 ← r0 + 4 r ← r + 4 r ← r + 4

35 Renaming Dealing with anti/output dependences Anti/output dependences are artificial dependences that constrain the scheduler Can be eliminated by renaming Effect on register pressure? Can we eliminate all anti/output dependences? r ← MEM[r0] r4 ← r + r ← MEM[r]
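Renaming as described here can be sketched as one pass that gives every definition a fresh name and rewrites later uses of the old name. A toy Python version for straight-line code (the register names and the `(dest, srcs)` encoding are made up for illustration):

```python
# Sketch of register renaming to remove anti/output dependences: every
# definition gets a fresh versioned name, and later uses are rewritten.
def rename(instrs):
    """instrs: list of (dest, srcs) pairs for straight-line code."""
    version = {}   # current (renamed) name for each original register
    counter = {}   # number of definitions seen per original register
    out = []
    for dest, srcs in instrs:
        new_srcs = [version.get(s, s) for s in srcs]   # rewrite uses first
        counter[dest] = counter.get(dest, 0) + 1
        new_dest = f"{dest}_{counter[dest]}"           # fresh definition name
        version[dest] = new_dest
        out.append((new_dest, new_srcs))
    return out

code = [("r1", ["r0"]),   # r1 <- MEM[r0]
        ("r4", ["r1"]),   # r4 <- r1 + const
        ("r1", ["r2"])]   # r1 <- MEM[r2], output dependence on the first load
print(rename(code))
# the second definition of r1 becomes r1_2, so the output dependence is gone
```

Note the trade-off raised on the slide: each fresh name is a new simultaneously-live value, so renaming can raise register pressure.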

36 Instruction schedule S(n): n ∈ V → t ∈ N+ mapping from an operation n to a non-negative integer t t denotes cycle when operation is processed by CPU t denotes instruction that contains operation n Constraints: S(n) ≥ 1 (and at least one operation O with S(O) = 1) If (n1, n2) ∈ E then S(n1) + delay(n1) ≤ S(n2) For each t, there are no more operations with S(n) = t than the H/W (resp. the instruction format) can support cycle t operation n load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, # 5 add r ← r, # 6 mul r ← r, r4 7 add r ← r, #4 8 9 store MEM[r], r 0 add r ← r, #4
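The three constraints above can be checked mechanically for a candidate schedule. A small sketch, assuming a single-issue machine so the resource constraint reduces to "at most one operation per cycle" (the schedule, latencies, and edge shown are illustrative):

```python
# Checks the schedule constraints from the slide for a candidate S.
def is_valid(S, delay, edges, issue_width=1):
    if min(S.values()) != 1:                 # some operation starts at cycle 1
        return False
    for n1, n2 in edges:                     # dependence constraint:
        if S[n1] + delay[n1] > S[n2]:        #   S(n1) + delay(n1) <= S(n2)
            return False
    cycles = list(S.values())                # resource constraint per cycle
    return all(cycles.count(t) <= issue_width for t in cycles)

S     = {"load": 1, "add": 4}
delay = {"load": 3, "add": 1}
print(is_valid(S, delay, [("load", "add")]))   # True: add waits 3 cycles
```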

37 Length of a schedule Length of the schedule L(S) = max over n ∈ V of (S(n) + delay(n)) L(S) = cycle t operation n load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, # 5 add r ← r, # 6 mul r ← r, r4 7 add r ← r, #4 8 9 store MEM[r], r 0 add r ← r, #4

38 Instruction schedule Path length: starting at the roots, annotate each node with its accumulated delay 5 r4 ← r4 + 8 r4 ← MEM[r0] Critical path: longest path over all paths in the data dependence graph Shortest (minimal) schedule cannot be shorter than the length of the critical path 4 8 r ← MEM[r] r ← r + r ← r4 * r MEM[r] ← r 5 0 r ← r r0 ← r0 + 4 r ← r + 4
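The critical-path lower bound can be computed as the longest accumulated-delay path through the DDG. A sketch, assuming the graph is given as per-node delays plus successor lists (the chain and latencies below are illustrative, not the slide's example):

```python
# Longest accumulated-delay path in the DDG, by memoized recursion over
# successors; no schedule can be shorter than this bound.
from functools import lru_cache

def critical_path(delay, succs):
    @lru_cache(maxsize=None)
    def length(n):
        # accumulated delay of n plus the heaviest path below it
        return delay[n] + max((length(s) for s in succs.get(n, [])), default=0)
    return max(length(n) for n in delay)

delay = {"load": 3, "add": 1, "mul": 3, "store": 1}
succs = {"load": ["add"], "add": ["mul"], "mul": ["store"]}
print(critical_path(delay, succs))   # 3 + 1 + 3 + 1 = 8
```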

39 Finding a mapping Given a DDG, a mapping can be found Forward: start at leaves (producers) Backward: start at root (consumers)

40 Instruction scheduling techniques Local instruction scheduling: scheduling of one DDG Local scheduling is an NP-complete problem (scheduling → job shop scheduling → TSP)

41 Cycle vs. operation scheduling add mul? sub load store add sub mul load store Operation scheduling is more powerful than cycle-based scheduling in the presence of long-latency operations However, it is much more complicated to implement

42 Linear vs. graph-based techniques Linear techniques Runtime O(n) Produce the schedule by one or more passes over the input LLIR Most common technique: critical-path scheduling Three passes: ASAP, ALAP, non-critical operations Limitation: unable to consider global properties of operations Graph-based techniques Runtime: O(n²) for DAG creation plus scheduling Prevalent technique: list scheduling (O(n log n)) Greedy: select one operation and schedule it

43 List scheduling Prevalent scheduling heuristics are based on list scheduling Method Rename (optional) Build data dependence graph Assign priorities to operations Iteratively select and schedule an operation

44 List scheduling

t := 1
ready := { leaves of DDG }
active := {}
while (ready ∪ active ≠ {}) do
    for each operation o in active do
        if (S(o) + delay(o) < t) then
            active := active \ {o}
            for each successor s of o in DDG do
                if (s is ready) then
                    ready := ready ∪ {s}
    if (ready ≠ {}) then
        o := pick the operation from ready with the highest priority
        if (o can be scheduled on the H/W units) then
            ready := ready \ {o}
            active := active ∪ {o}
            S(o) := t
    t := t + 1
end

45 List scheduling

t := 1
ready := { leaves of DDG }
active := {}
while (ready ∪ active ≠ {}) do
    for each operation o in active do
        if (S(o) + delay(o) < t) then        (< or ≤ ?)
            active := active \ {o}
            for each successor s of o in DDG do
                if (s is ready) then
                    ready := ready ∪ {s}
    if (ready ≠ {}) then
        o := pick the operation from ready with the highest priority
        if (o can be scheduled on the H/W units) then
            ready := ready \ {o}
            active := active ∪ {o}
            S(o) := t
    t := t + 1
end
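The pseudocode above can be turned into a runnable sketch. This version answers the "< or ≤ ?" question with ≤, consistent with a forwarded result being usable in cycle S(o) + delay(o), and takes the priority function as a parameter; the graph and latencies are illustrative assumptions:

```python
# Runnable sketch of list scheduling for a single-issue machine.
def list_schedule(delay, preds, succs, priority):
    nodes = set(delay)
    ready = {n for n in nodes if not preds.get(n)}   # leaves of the DDG
    active, S, t = set(), {}, 1
    while ready or active:
        for o in list(active):                       # retire finished ops
            if S[o] + delay[o] <= t:
                active.discard(o)
                for s in succs.get(o, []):
                    # s becomes ready once all its predecessors have finished
                    if all(p in S and S[p] + delay[p] <= t for p in preds[s]):
                        ready.add(s)
        if ready:                                    # issue one op per cycle
            o = max(ready, key=priority)
            ready.discard(o)
            active.add(o)
            S[o] = t
        t += 1
    return S

delay = {"a": 3, "b": 1, "c": 1}
preds = {"a": [], "b": ["a"], "c": []}
succs = {"a": ["b"]}
# With latency as priority, "a" issues first; "b" must wait for a's result.
print(list_schedule(delay, preds, succs, priority=lambda n: delay[n]))
# {'a': 1, 'c': 2, 'b': 4}
```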

46 List scheduling Picking an operation from ready If ready never contains more than one operation, the generated schedule is optimal If more than one operation is ready, the choice of the next-to-be-scheduled operation is critical to the performance of the algorithm Pick the operation with the highest priority Most algorithms use several priorities to break ties How do we compute these priorities?

47

48 Priority functions in list scheduling procedure ListScheduling( ); begin o := pick one operation from ready using some priority function; end; Common priority functions: Height: distance from exit node gives priority to amount of work left Slackness: inverse of slack gives priority to operations on the critical path Register use: number of source operands reduces the number of live registers Uncover: fanout (number of children) frees up nodes quickly Original instruction order

49 List scheduling Priorities based on the DDG Estart: earliest start time (ASAP, as soon as possible) Lstart: latest start time (ALAP, as late as possible) slack: scheduling freedom slack(op) = Lstart(op) - Estart(op)

50 List scheduling slack(op) = Lstart(op) - Estart(op)
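Estart, Lstart, and slack follow from one forward (ASAP) and one backward (ALAP) pass over the DDG. A sketch, assuming the node list is already topologically sorted; the example graph and latencies are made up for illustration:

```python
# ASAP/ALAP computation: Estart by forward propagation from the leaves,
# Lstart by backward propagation from the exit, slack = Lstart - Estart.
def estart_lstart_slack(delay, preds, succs):
    order = list(delay)          # assume keys are in topological order
    E, L = {}, {}
    for n in order:              # forward pass: earliest start times
        E[n] = max((E[p] + delay[p] for p in preds.get(n, [])), default=0)
    total = max(E[n] + delay[n] for n in order)   # schedule-length bound
    for n in reversed(order):    # backward pass: latest start times
        L[n] = min((L[s] - delay[n] for s in succs.get(n, [])),
                   default=total - delay[n])
    return {n: (E[n], L[n], L[n] - E[n]) for n in order}

delay = {"load": 3, "addi": 1, "mul": 3}
preds = {"load": [], "addi": [], "mul": ["load", "addi"]}
succs = {"load": ["mul"], "addi": ["mul"]}
print(estart_lstart_slack(delay, preds, succs))
# load and mul have slack 0 (critical path); addi has slack 2
```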

51 List scheduling Computing Estart, Lstart, slack [DDG figure: each node annotated with its Estart, latency, and Lstart]

52 Estart [DDG figure: Estart values filled in by forward propagation from the leaves]

53 Lstart [DDG figure: Lstart values filled in by backward propagation from the exit node]

54 Slack slack(op) = Lstart(op) - Estart(op) [DDG figure: slack annotated per node; critical-path nodes have slack = 0]

55 List scheduling Another way to look at the critical path Sequence of critical operations Critical operation: slack(op) = 0 [DDG figure: the nodes with slack = 0 form the critical path]

56 Priority function: height-based Height-based priority function Gives priority to amount of work left priority(op) = Lstart(exit) - Lstart(op) + 1 [DDG figure with per-node Estart/latency/Lstart and a per-op priority table]

57 Priority function: height-based Height-based priority function Gives priority to amount of work left priority(op) = Lstart(exit) - Lstart(op) + 1 [DDG figure with per-node Estart/latency/Lstart and the completed per-op priority table]
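A minimal sketch of the height-based priority, assuming the formula priority(op) = Lstart(exit) - Lstart(op) + 1 and illustrative Lstart values (not the slide's table):

```python
# Height-based priority: operations furthest from the exit node, i.e. with
# the most work left below them, get the highest priority.
def height_priority(lstart, exit_node="exit"):
    return {op: lstart[exit_node] - t + 1 for op, t in lstart.items()}

lstart = {"load": 0, "addi": 2, "mul": 3, "exit": 6}
print(height_priority(lstart))
# load gets the highest priority (7); the exit node gets the lowest (1)
```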

58 Example height-based
1 load r4 ← MEM[r0]
2 add r4 ← r4, #
3 load r ← MEM[r]
4 add r ← r, #
5 add r0 ← r0, #4
6 add r ← r, #4
7 mul r ← r, r4
8 store MEM[r], r
9 add r ← r, #4
[DDG figure: each operation annotated with its accumulated path length]
Target machine properties pipelined with forwarding, single issue, in-order operation latencies: add, sub: cycle; mul, load: cycles; store: cycle

59 Example height-based
1 load r4 ← MEM[r0]
2 add r4 ← r4, #
3 load r ← MEM[r]
4 add r ← r, #
5 add r0 ← r0, #4
6 add r ← r, #4
7 mul r ← r, r4
8 store MEM[r], r
9 add r ← r, #4
[DDG figure: each operation annotated with its Estart and Lstart]
Target machine properties pipelined with forwarding, single issue, in-order operation latencies: add, sub: cycle; mul, load: cycles; store: cycle

60 [DDG figure: operations annotated with Estart and Lstart]

61 [DDG figure: operations annotated with Estart and Lstart]

62 [DDG figure: operations annotated with Estart and Lstart, plus the per-op priority table]

63 [DDG figure: operations annotated with Estart and Lstart, with the per-op priority table completed]

64 Example continued Initialization t := 1 ready := {,, 5, 6 } active := {} [priority table] Iteration: t = 1, ready = {,,5,6}, active = {} active is empty → pass ready is not empty o := (with priority 9) S() = 1 [schedule table] t = 2, ready = {,5,6}, active = {} active is not empty o = : S()+delay() ≤ t? no ready is not empty o := (with priority 9) ready := {5,6}, active = {,} S() = 2 [schedule table]

65 Example continued Iteration: t = 3, ready = {5,6}, active = {,} active is not empty o = : S()+delay() ≤ 3? no o = : S()+delay() ≤ 3? no ready is not empty o := 5 (with priority ) ready := {6}, active = {,,5} S(5) = 3 [schedule table] [priority table] t = 4, ready = {6}, active = {,,5} active is not empty o = : S()+delay() ≤ 4? yes active := active \ {} ready := ready ∪ {} o = : S()+delay() ≤ 4? no ready is not empty o := (with priority 6) ready := {6}, active = {,5,} S() = 4 [schedule table]

66 Example continued [priority table] Iteration: t = 5, ready = {6}, active = {,5,} active is not empty o = : S()+delay() ≤ 5? yes active := active \ {} ready := ready ∪ {4} o = 5: S(5)+delay(5) ≤ 5? yes active := active \ {5} no data-dependent successors o = : S()+delay() ≤ 5? yes active := active \ {} successor (7) not ready due to 4 [schedule table] ready is not empty o := 4 (with priority 6) ready := {6}, active = {4} S(4) = 5

67 Example -- continued [priority table] Iteration: t = 6, ready = {6}, active = {4} active is not empty o = 4: S(4)+delay(4) ≤ 6? yes active := active \ {4} ready := ready ∪ {7} ready is not empty o := 7 (with priority 5) ready := {6}, active = {7} S(7) = 6 t = 7, ready = {6}, active = {7} active is not empty o = 7: S(7)+delay(7) ≤ 7? no ready is not empty o := 6 (with priority ) ready := {}, active = {6,7} S(6) = 7 [schedule table] [schedule table]

68 Example continued Iteration: t = 8, ready = {}, active = {6,7} active is not empty o = 6: S(6)+delay(6) ≤ 8? yes active := active \ {6} no data-dependent successors o = 7: S(7)+delay(7) ≤ 8? no [schedule table] [priority table] ready is empty t = 9, ready = {}, active = {7} active is not empty o = 7: S(7)+delay(7) ≤ 9? yes active := active \ {7} ready := ready ∪ {8,9} ready is not empty o := 8 (with priority ) ready := {9}, active = {8} S(8) = 9 [schedule table]

69 Example continued Iteration: t = 10, ready = {9}, active = {8} active is not empty o = 8: S(8)+delay(8) ≤ 10? yes active := active \ {8} no data-dependent successors ready is not empty o := 9 (with priority ) ready := {}, active = {9} S(9) = 10 t = 11, ready = {}, active = {9} active is not empty o = 9: S(9)+delay(9) ≤ 11? yes active := active \ {9} no data-dependent successors done, ready is empty [schedule tables] [priority table]

70 Classification of scheduling techniques Direction Scheduling Flow analysis Search Scheduling unit Forward Backward Cycle Operations Linear Graph Greedy Backtrack Acyclic Cyclic Basic block Trace Superblock

71 Scheduling & register allocation Phase ordering problem between instruction scheduling and register allocation (RA) Effects of the scheduler on RA The scheduler can use renaming to get rid of anti dependences to obtain more freedom in scheduling The resulting overlap of previously constrained operations may increase register pressure, which, in turn, may force the register allocator to spill one more variable And vice versa (RA constrains the scheduler in an RA-first compiler) Combining scheduling and register allocation Potential to produce better solutions Typically not done due to complexity

72 with thanks to Bernhard Egger for slide material


More information

Parallel Programming Pa,erns

Parallel Programming Pa,erns Parallel Programming Pa,erns Bryan Mills, PhD Spring 2017 What is a programming pa,erns? Repeatable solu@on to commonly occurring problem It isn t a solu@on that you can t simply apply, the engineer has

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures)

CS 61C: Great Ideas in Computer Architecture (Machine Structures) CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instructors: Randy H. Katz David A. PaGerson hgp://inst.eecs.berkeley.edu/~cs61c/fa10 1 2 Cache Field Sizes Number of bits in a cache includes

More information

CS 61C: Great Ideas in Computer Architecture Strings and Func.ons. Anything can be represented as a number, i.e., data or instruc\ons

CS 61C: Great Ideas in Computer Architecture Strings and Func.ons. Anything can be represented as a number, i.e., data or instruc\ons CS 61C: Great Ideas in Computer Architecture Strings and Func.ons Instructor: Krste Asanovic, Randy H. Katz hdp://inst.eecs.berkeley.edu/~cs61c/sp12 Fall 2012 - - Lecture #7 1 New- School Machine Structures

More information

Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators.

Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators. Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators Comp 412 COMP 412 FALL 2016 source code IR Front End Optimizer Back

More information

The ILOC Virtual Machine (Lab 1 Background Material) Comp 412

The ILOC Virtual Machine (Lab 1 Background Material) Comp 412 COMP 12 FALL 20 The ILOC Virtual Machine (Lab 1 Background Material) Comp 12 source code IR Front End OpMmizer Back End IR target code Copyright 20, Keith D. Cooper & Linda Torczon, all rights reserved.

More information

Punctual Coalescing. Fernando Magno Quintão Pereira

Punctual Coalescing. Fernando Magno Quintão Pereira Punctual Coalescing Fernando Magno Quintão Pereira Register Coalescing Register coalescing is an op7miza7on on top of register alloca7on. The objec7ve is to map both variables used in a copy instruc7on

More information

Principles of Programming Languages

Principles of Programming Languages Principles of Programming Languages h"p://www.di.unipi.it/~andrea/dida2ca/plp- 14/ Prof. Andrea Corradini Department of Computer Science, Pisa Lesson 18! Bootstrapping Names in programming languages Binding

More information

CS 33. Architecture and Optimization (2) CS33 Intro to Computer Systems XV 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.

CS 33. Architecture and Optimization (2) CS33 Intro to Computer Systems XV 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. CS 33 Architecture and Optimization (2) CS33 Intro to Computer Systems XV 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. Modern CPU Design Instruc&on Control Re%rement Unit Register File Fetch

More information

CS5363 Final Review. cs5363 1

CS5363 Final Review. cs5363 1 CS5363 Final Review cs5363 1 Programming language implementation Programming languages Tools for describing data and algorithms Instructing machines what to do Communicate between computers and programmers

More information

Virtual Memory: Concepts

Virtual Memory: Concepts Virtual Memory: Concepts Instructor: Dr. Hyunyoung Lee Based on slides provided by Randy Bryant and Dave O Hallaron Today Address spaces VM as a tool for caching VM as a tool for memory management VM as

More information

What is Search For? CS 188: Ar)ficial Intelligence. Constraint Sa)sfac)on Problems Sep 14, 2015

What is Search For? CS 188: Ar)ficial Intelligence. Constraint Sa)sfac)on Problems Sep 14, 2015 CS 188: Ar)ficial Intelligence Constraint Sa)sfac)on Problems Sep 14, 2015 What is Search For? Assump)ons about the world: a single agent, determinis)c ac)ons, fully observed state, discrete state space

More information

ECE 468, Fall Midterm 2

ECE 468, Fall Midterm 2 ECE 468, Fall 08. Midterm INSTRUCTIONS (read carefully) Fill in your name and PUID. NAME: PUID: Please sign the following: I affirm that the answers given on this test are mine and mine alone. I did not

More information

Research opportuni/es with me

Research opportuni/es with me Research opportuni/es with me Independent study for credit - Build PL tools (parsers, editors) e.g., JDial - Build educa/on tools (e.g., Automata Tutor) - Automata theory problems e.g., AutomatArk - Research

More information

: Compiler Design

: Compiler Design 252-210: Compiler Design 9.0 Data- Flow Analysis Thomas R. Gross Computer Science Department ETH Zurich, Switzerland Global program analysis is a crucial part of all real compilers. Global : beyond a statement

More information

Special Topics on Algorithms Fall 2017 Dynamic Programming. Vangelis Markakis, Ioannis Milis and George Zois

Special Topics on Algorithms Fall 2017 Dynamic Programming. Vangelis Markakis, Ioannis Milis and George Zois Special Topics on Algorithms Fall 2017 Dynamic Programming Vangelis Markakis, Ioannis Milis and George Zois Basic Algorithmic Techniques Content Dynamic Programming Introduc

More information

Instruc=on Set Architecture

Instruc=on Set Architecture ECPE 170 Jeff Shafer University of the Pacific Instruc=on Set Architecture 2 Schedule Today Closer look at instruc=on sets Thursday Brief discussion of real ISAs Quiz 4 (over Chapter 5, i.e. HW #10 and

More information

Chapter. Out of order Execution

Chapter. Out of order Execution Chapter Long EX Instruction stages We have assumed that all stages. There is a problem with the EX stage multiply (MUL) takes more time than ADD MUL ADD We can clearly delay the execution of the ADD until

More information

CE431 Parallel Computer Architecture Spring Compile-time ILP extraction Modulo Scheduling

CE431 Parallel Computer Architecture Spring Compile-time ILP extraction Modulo Scheduling CE431 Parallel Computer Architecture Spring 2018 Compile-time ILP extraction Modulo Scheduling Nikos Bellas Electrical and Computer Engineering University of Thessaly Parallel Computer Architecture 1 Readings

More information

CS 465 Final Review. Fall 2017 Prof. Daniel Menasce

CS 465 Final Review. Fall 2017 Prof. Daniel Menasce CS 465 Final Review Fall 2017 Prof. Daniel Menasce Ques@ons What are the types of hazards in a datapath and how each of them can be mi@gated? State and explain some of the methods used to deal with branch

More information

Data Flow Analysis. Suman Jana. Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006)

Data Flow Analysis. Suman Jana. Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006) Data Flow Analysis Suman Jana Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006) Data flow analysis Derives informa=on about the dynamic behavior of a program by only

More information

CS 61C: Great Ideas in Computer Architecture Func%ons and Numbers

CS 61C: Great Ideas in Computer Architecture Func%ons and Numbers CS 61C: Great Ideas in Computer Architecture Func%ons and Numbers 9/11/12 Instructor: Krste Asanovic, Randy H. Katz hcp://inst.eecs.berkeley.edu/~cs61c/sp12 Fall 2012 - - Lecture #8 1 New- School Machine

More information

Instruction scheduling

Instruction scheduling Instruction scheduling iaokang Qiu Purdue University ECE 468 October 12, 2018 What is instruction scheduling? Code generation has created a sequence of assembly instructions But that is not the only valid

More information

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009 What Compilers Can and Cannot Do Saman Amarasinghe Fall 009 Optimization Continuum Many examples across the compilation pipeline Static Dynamic Program Compiler Linker Loader Runtime System Optimization

More information

CS 188: Ar)ficial Intelligence

CS 188: Ar)ficial Intelligence CS 188: Ar)ficial Intelligence Search Instructors: Pieter Abbeel & Anca Dragan University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley

More information

Lecture 2: Memory in C

Lecture 2: Memory in C CIS 330:! / / / / (_) / / / / _/_/ / / / / / \/ / /_/ / `/ \/ / / / _/_// / / / / /_ / /_/ / / / / /> < / /_/ / / / / /_/ / / / /_/ / / / / / \ /_/ /_/_/_/ _ \,_/_/ /_/\,_/ \ /_/ \ //_/ /_/ Lecture 2:

More information

Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra<on

Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra<on Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra

More information

Lecture 7. Instruction Scheduling. I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code)

Lecture 7. Instruction Scheduling. I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code) Lecture 7 Instruction Scheduling I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code) Reading: Chapter 10.3 10.4 CS243: Instruction Scheduling 1 Scheduling Constraints Data dependences

More information

Topics. Computer Organization CS Improving Performance. Opportunity for (Easy) Points. Three Generic Data Hazards

Topics. Computer Organization CS Improving Performance. Opportunity for (Easy) Points. Three Generic Data Hazards Computer Organization CS 231-01 Improving Performance Dr. William H. Robinson November 8, 2004 Topics Money's only important when you don't have any. Sting Cache Scoreboarding http://eecs.vanderbilt.edu/courses/cs231/

More information

Deformable Part Models

Deformable Part Models Deformable Part Models References: Felzenszwalb, Girshick, McAllester and Ramanan, Object Detec@on with Discrimina@vely Trained Part Based Models, PAMI 2010 Code available at hkp://www.cs.berkeley.edu/~rbg/latent/

More information

: Advanced Compiler Design

: Advanced Compiler Design 263-2810: Advanced Compiler Design Thomas R. Gross Computer Science Department ETH Zurich, Switzerland Topics Program opgmizagon Op%miza%on Op%mize for (execu%on) speed Op%mize for (code) size Op%mize

More information

Superscalar Processors Ch 14

Superscalar Processors Ch 14 Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on. Instructor: Wei-Min Shen

CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on. Instructor: Wei-Min Shen CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on Instructor: Wei-Min Shen Status Check and Review Status check Have you registered in Piazza? Have you run the Project-1?

More information

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency?

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency? Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

CSE 473: Ar+ficial Intelligence

CSE 473: Ar+ficial Intelligence CSE 473: Ar+ficial Intelligence Search Instructor: Luke Ze=lemoyer University of Washington [These slides were adapted from Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials

More information

BIL 682 Ar+ficial Intelligence Week #2: Solving problems by searching. Asst. Prof. Aykut Erdem Dept. of Computer Engineering HaceDepe University

BIL 682 Ar+ficial Intelligence Week #2: Solving problems by searching. Asst. Prof. Aykut Erdem Dept. of Computer Engineering HaceDepe University BIL 682 Ar+ficial Intelligence Week #2: Solving problems by searching Asst. Prof. Aykut Erdem Dept. of Computer Engineering HaceDepe University Today Search problems Uninformed search Informed (heuris+c)

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) More Cache: Set Associa0vity. Smart Phone. Today s Lecture. Core.

CS 61C: Great Ideas in Computer Architecture (Machine Structures) More Cache: Set Associa0vity. Smart Phone. Today s Lecture. Core. CS 6C: Great Ideas in Computer Architecture (Machine Structures) More Cache: Set Associavity Instructors: Randy H Katz David A PaGerson Guest Lecture: Krste Asanovic hgp://insteecsberkeleyedu/~cs6c/fa

More information

CSE Opera,ng System Principles

CSE Opera,ng System Principles CSE 30341 Opera,ng System Principles Synchroniza2on Overview Background The Cri,cal-Sec,on Problem Peterson s Solu,on Synchroniza,on Hardware Mutex Locks Semaphores Classic Problems of Synchroniza,on Monitors

More information

Sta$c Single Assignment (SSA) Form

Sta$c Single Assignment (SSA) Form Sta$c Single Assignment (SSA) Form SSA form Sta$c single assignment form Intermediate representa$on of program in which every use of a variable is reached by exactly one defini$on Most programs do not

More information

CSE P 501 Compilers. Instruc7on Selec7on Hal Perkins Winter UW CSE P 501 Winter 2016 N-1

CSE P 501 Compilers. Instruc7on Selec7on Hal Perkins Winter UW CSE P 501 Winter 2016 N-1 CSE P 501 Compilers Instruc7on Selec7on Hal Perkins Winter 2016 UW CSE P 501 Winter 2016 N-1 Agenda Compiler back- end organiza7on Instruc7on selec7on tree padern matching Credits: Slides by Keith Cooper

More information

hashfs Applying Hashing to Op2mize File Systems for Small File Reads

hashfs Applying Hashing to Op2mize File Systems for Small File Reads hashfs Applying Hashing to Op2mize File Systems for Small File Reads Paul Lensing, Dirk Meister, André Brinkmann Paderborn Center for Parallel Compu2ng University of Paderborn Mo2va2on and Problem Design

More information

Compiler Architecture

Compiler Architecture Code Generation 1 Compiler Architecture Source language Scanner (lexical analysis) Tokens Parser (syntax analysis) Syntactic structure Semantic Analysis (IC generator) Intermediate Language Code Optimizer

More information

Instruction-Level Parallelism (ILP)

Instruction-Level Parallelism (ILP) Instruction Level Parallelism Instruction-Level Parallelism (ILP): overlap the execution of instructions to improve performance 2 approaches to exploit ILP: 1. Rely on hardware to help discover and exploit

More information

Opera&ng Systems ECE344

Opera&ng Systems ECE344 Opera&ng Systems ECE344 Lecture 8: Paging Ding Yuan Lecture Overview Today we ll cover more paging mechanisms: Op&miza&ons Managing page tables (space) Efficient transla&ons (TLBs) (&me) Demand paged virtual

More information

Recursive Helper functions

Recursive Helper functions 11/16/16 Page 22 11/16/16 Page 23 11/16/16 Page 24 Recursive Helper functions Some%mes it is easier to find a recursive solu%on if you make a slight change to the original problem. Consider the palindrome

More information

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations Register Allocation Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class

More information

hnp://

hnp:// The bots face off in a tournament against one another and about an equal number of humans, with each player trying to score points by elimina&ng its opponents. Each player also has a "judging gun" in addi&on

More information

Networks and Opera/ng Systems Chapter 13: Scheduling

Networks and Opera/ng Systems Chapter 13: Scheduling Networks and Opera/ng Systems Chapter 13: Scheduling (252 0062 00) Donald Kossmann & Torsten Hoefler Frühjahrssemester 2013 Systems Group Department of Computer Science ETH Zürich Last /me Process concepts

More information

Compiler Optimization Techniques

Compiler Optimization Techniques Compiler Optimization Techniques Department of Computer Science, Faculty of ICT February 5, 2014 Introduction Code optimisations usually involve the replacement (transformation) of code from one sequence

More information

Superscalar Processors Ch 13. Superscalar Processing (5) Computer Organization II 10/10/2001. New dependency for superscalar case? (8) Name dependency

Superscalar Processors Ch 13. Superscalar Processing (5) Computer Organization II 10/10/2001. New dependency for superscalar case? (8) Name dependency Superscalar Processors Ch 13 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction 1 New dependency for superscalar case? (8) Name dependency (nimiriippuvuus) two use the same

More information

W1005 Intro to CS and Programming in MATLAB. Brief History of Compu?ng. Fall 2014 Instructor: Ilia Vovsha. hip://www.cs.columbia.

W1005 Intro to CS and Programming in MATLAB. Brief History of Compu?ng. Fall 2014 Instructor: Ilia Vovsha. hip://www.cs.columbia. W1005 Intro to CS and Programming in MATLAB Brief History of Compu?ng Fall 2014 Instructor: Ilia Vovsha hip://www.cs.columbia.edu/~vovsha/w1005 Computer Philosophy Computer is a (electronic digital) device

More information

Virtual Memory B: Objec5ves

Virtual Memory B: Objec5ves Virtual Memory B: Objec5ves Benefits of a virtual memory system" Demand paging, page-replacement algorithms, and allocation of page frames" The working-set model" Relationship between shared memory and

More information

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted

More information

Program Op*miza*on and Analysis. Chenyang Lu CSE 467S

Program Op*miza*on and Analysis. Chenyang Lu CSE 467S Program Op*miza*on and Analysis Chenyang Lu CSE 467S 1 Program Transforma*on op#mize Analyze HLL compile assembly assemble Physical Address Rela5ve Address assembly object load executable link Absolute

More information

Virtual Memory: Concepts

Virtual Memory: Concepts Virtual Memory: Concepts 5-23 / 8-23: Introduc=on to Computer Systems 6 th Lecture, Mar. 8, 24 Instructors: Anthony Rowe, Seth Goldstein, and Gregory Kesden Today VM Movaon and Address spaces ) VM as a

More information

Page # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two?

Page # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two? Exploiting ILP through Software Approaches Venkatesh Akella EEC 270 Winter 2005 Based on Slides from Prof. Al. Davis @ cs.utah.edu Let the Compiler Do it Pros and Cons Pros No window size limitation, the

More information

Applied Algorithm Design Lecture 3

Applied Algorithm Design Lecture 3 Applied Algorithm Design Lecture 3 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 3 1 / 75 PART I : GREEDY ALGORITHMS Pietro Michiardi (Eurecom) Applied Algorithm

More information

Computer Architecture

Computer Architecture Computer Architecture An Introduction Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/

More information

Dynamic Scheduling. CSE471 Susan Eggers 1

Dynamic Scheduling. CSE471 Susan Eggers 1 Dynamic Scheduling Why go out of style? expensive hardware for the time (actually, still is, relatively) register files grew so less register pressure early RISCs had lower CPIs Why come back? higher chip

More information

Pipelining and Exploiting Instruction-Level Parallelism (ILP)

Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &

More information

CS252 Graduate Computer Architecture Spring 2014 Lecture 13: Mul>threading

CS252 Graduate Computer Architecture Spring 2014 Lecture 13: Mul>threading CS252 Graduate Computer Architecture Spring 2014 Lecture 13: Mul>threading Krste Asanovic krste@eecs.berkeley.edu http://inst.eecs.berkeley.edu/~cs252/sp14 Last Time in Lecture 12 Synchroniza?on and Memory

More information

Preventing Stalls: 1

Preventing Stalls: 1 Preventing Stalls: 1 2 PipeLine Pipeline efficiency Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls Ideal pipeline CPI: best possible (1 as n ) Structural hazards:

More information

Unit 2: High-Level Synthesis

Unit 2: High-Level Synthesis Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

CS152 Computer Architecture and Engineering. Complex Pipelines

CS152 Computer Architecture and Engineering. Complex Pipelines CS152 Computer Architecture and Engineering Complex Pipelines Assigned March 6 Problem Set #3 Due March 20 http://inst.eecs.berkeley.edu/~cs152/sp12 The problem sets are intended to help you learn the

More information

Instruction-Level Parallelism and Its Exploitation

Instruction-Level Parallelism and Its Exploitation Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic

More information