Advanced Compiler Design: 8.0 Instruction scheduling


1 6-80: Advanced Compiler Design 8.0 Instruction scheduling Thomas R. Gross Computer Science Department ETH Zurich, Switzerland

2 Overview 8. Instruction scheduling basics 8. Scheduling for ILP processors

3 8. Instruction scheduling basics Motivation Problem formulation The data dependence graph Instruction scheduling techniques List scheduling Overview Algorithm Priority functions Example

4 Instruction scheduling Input Output Source code Frontend IR Optimizer IR Code Generator Machine program Input Output HIR Instruction Selection LLIR Instruction Scheduling reordered LLIR Register Allocation LLIR HIR: High-level IR LLIR: Low-level IR

5

6 Motivation Or: why don't we just hand the code as-is to the processor? Processor interprets instructions load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4

7 Motivation Or: why don't we just hand the code as-is to the processor? Processor interprets instructions load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 LLIR Instruction Scheduler load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 reordered LLIR

8 Motivation Or: why don't we just hand the code as-is to the processor? Processor interprets instructions load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 LLIR Copy load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 reordered LLIR

9 Motivation Assume a target machine with the following properties Pipelined with forwarding, single issue, in-order Operation latencies: Examples add, sub: cycle mul, load: cycles store: cycle load r ← MEM[r0] add r4 ← r, # add r ← r, #4 mul r ← r, r4 add r5 ← r, r4 store MEM[r], r5

10 Motivation Executing the example code load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4

11 Motivation Executing the example code load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 cycle operation load r ← MEM[r0] 4 add r4 ← r, # 5 load r ← MEM[r] add r ← r, # 9 add r0 ← r0, #4 0 add r ← r, #4 mul r ← r, r4 4 store MEM[r], r 5 add r ← r, #4

12 Motivation Can we do better? load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 cycle operation Target machine properties pipelined with forwarding, single issue, in-order operation latencies: add, sub: cycle; mul, load: cycles; store: cycle

13 Motivation Can we do better? load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 cycle operation load r ← MEM[r0] add r0 ← r0, #4 4 add r4 ← r, # 5 load r ← MEM[r] 6 add r ← r, #4 7 8 add r ← r, # 9 mul r ← r, r4 0 store MEM[r], r add r ← r, #4 Target machine properties pipelined with forwarding, single issue, in-order operation latencies: add, sub: cycle; mul, load: cycles; store: cycle

14 Motivation Can we do even better? Registers constrain the schedule load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] If the 1st instruction loads into r4, the 2nd load could start earlier load r4 ← MEM[r0] add r4 ← r4, # load r ← MEM[r]

15 Motivation Can we do even better? Registers constrain the schedule load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] If the 1st instruction loads into r4, the 2nd load could start earlier load r4 ← MEM[r0] add r4 ← r4, # load r ← MEM[r] cycle operation load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, #

16 Motivation Can we do even better? Registers constrain the schedule load r4 ← MEM[r0] add r4 ← r4, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 cycle operation Target machine properties pipelined with forwarding, single issue, in-order operation latencies: add, sub: cycle; mul, load: cycles; store: cycle

17 Motivation Can we do even better? Registers constrain the schedule load r4 ← MEM[r0] add r4 ← r4, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 cycle operation load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, # 5 add r ← r, # 6 mul r ← r, r4 7 add r ← r, #4 8 9 store MEM[r], r 0 add r ← r, #4 Target machine properties pipelined with forwarding, single issue, in-order operation latencies: add, sub: cycle; mul, load: cycles; store: cycle

18 Comparison cycle operation load r ← MEM[r0] 4 add r4 ← r, # 5 load r ← MEM[r] add r ← r, # 9 add r0 ← r0, #4 0 add r ← r, #4 mul r ← r, r4 4 store MEM[r], r 5 add r ← r, #4 versus cycle operation load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, # 5 add r ← r, # 6 mul r ← r, r4 7 add r ← r, #4 8 9 store MEM[r], r 0 add r ← r, #4

19 Comparison cycle operation load r ← MEM[r0] 4 add r4 ← r, # 5 load r ← MEM[r] add r ← r, # 9 add r0 ← r0, #4 0 add r ← r, #4 mul r ← r, r4 4 store MEM[r], r 5 add r ← r, #4 versus cycle operation load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, # 5 add r ← r, # 6 mul r ← r, r4 7 add r ← r, #4 8 9 store MEM[r], r 0 add r ← r, #4 % Improvement (Code size)

20 Food for thought Is this schedule the best we can attain (if we are willing to reconsider other register assignments)? If another register rx is free, use it instead of r cycle operation load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, # 5 add r ← r, # 6 mul r ← r, r4 7 add r ← r, #4 8 9 store MEM[r], r 0 add r ← r, #4

21 Instruction scheduling Scheduling: define the order of interpretation (Instruction) scheduling: define the order in which instructions are presented to the CPU for processing CPU may decide to execute instructions in a different order out-of-order execution Beyond control of compiler Read the fine print! Some instructions specify multiple operations Or the CPU may fetch multiple instructions ILP: Instruction-Level Parallelism Scheduling of operations → instructions

22 Why do processors change the order of execution relative to the order of fetch/interpretation? Why should the compiler bother to implement scheduling if the processor re-orders instructions? Can the compiler handle all cases?

23 Instruction scheduling Input Output Source code Frontend IR Optimizer IR Code Generator Machine program Input Output HIR Instruction Selection LLIR Instruction Scheduling reordered LLIR Register Allocation LLIR HIR: High-level IR LLIR: Low-level IR

24 Code generation is easy

25 Code generation is easy as long as the code generator includes only two tasks from {instruction selection, scheduling, register allocation}

26 Instruction scheduling Schedule: partially ordered list of instructions Partial order determined by resource usage What is a good schedule? Main constraint Preserve meaning of the code (control flow, data flow) Metrics Typical: shortest in terms of execution time Desired: shortest execution time Sometimes: conserve energy

27 Good schedules Metrics Often can't predict execution time (data or context dependent) Varies for different processor implementations May not be published Hidden conflicts (in decoding, address translation, ...) Memory system performance difficult to model Additional constraints imposed by H/W properties Operation latencies Processor pipeline # of functional units (FU) available # of registers Memory hierarchy (e.g., pre-fetching)

28 Data Dependence Graph DDG = (V, E) Nodes V represent each operation Augmented with Operation type Operation latency (delay) Edges E represent data dependences between operations Forward (def-use) Anti (use-def) Output (def-def) Root nodes = no successors Leaf nodes = no predecessors Latencies on nodes or edges? Latency of anti/output dependences?
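The three dependence kinds on this slide can be found by a pairwise scan over straight-line code. A minimal Python sketch, assuming a toy three-address representation (the `Instr` tuple and the latency values are illustrative, not from the lecture):

```python
# Minimal DDG builder over a toy three-address IR. Each instruction lists
# the registers it defines and uses, plus an assumed latency (delay).
from collections import namedtuple

Instr = namedtuple("Instr", ["name", "defs", "uses", "delay"])

def build_ddg(instrs):
    """Return edges (i, j, kind) for true, anti, and output dependences,
    scanning each earlier instruction i against each later instruction j."""
    edges = []
    for j in range(len(instrs)):
        for i in range(j):
            a, b = instrs[i], instrs[j]
            if set(a.defs) & set(b.uses):
                edges.append((i, j, "true"))    # def-use (forward)
            if set(a.uses) & set(b.defs):
                edges.append((i, j, "anti"))    # use-def
            if set(a.defs) & set(b.defs):
                edges.append((i, j, "output"))  # def-def
    return edges

prog = [
    Instr("load r1<-MEM[r0]", ["r1"], ["r0"], 3),
    Instr("add  r4<-r1,1",    ["r4"], ["r1"], 1),
    Instr("add  r0<-r0,4",    ["r0"], ["r0"], 1),
]
print(build_ddg(prog))   # [(0, 1, 'true'), (0, 2, 'anti')]
```

A transitive-reduction pass would normally prune redundant edges; it is omitted here for brevity.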

29

30 Root nodes Root nodes on bottom Successor: uses result generated

31 load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4

32 Root nodes Root node on top Produces result that flows to consumer node

33 load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4

34 r ← MEM[r0] r4 ← r + load r ← MEM[r0] add r4 ← r, # load r ← MEM[r] add r ← r, # add r0 ← r0, #4 add r ← r, #4 mul r ← r, r4 store MEM[r], r add r ← r, #4 r ← MEM[r] r ← r + r ← r4 * r MEM[r] ← r r0 ← r0 + 4 r ← r + 4 r ← r + 4

35 Renaming Dealing with anti/output dependences Anti/output dependences are artificial dependences that constrain the scheduler Can be eliminated by renaming Effect on register pressure? Can we eliminate all anti/output dependences? r ← MEM[r0] r4 ← r + r ← MEM[r]
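Renaming as described here can be sketched as one pass that gives every definition a fresh name and rewrites later uses of the old name. A toy Python version for straight-line code (the register names and the `(dest, srcs)` encoding are made up for illustration):

```python
# Sketch of register renaming to remove anti/output dependences: every
# definition gets a fresh versioned name, and later uses are rewritten.
def rename(instrs):
    """instrs: list of (dest, srcs) pairs for straight-line code."""
    version = {}   # current (renamed) name for each original register
    counter = {}   # number of definitions seen per original register
    out = []
    for dest, srcs in instrs:
        new_srcs = [version.get(s, s) for s in srcs]   # rewrite uses first
        counter[dest] = counter.get(dest, 0) + 1
        new_dest = f"{dest}_{counter[dest]}"           # fresh definition name
        version[dest] = new_dest
        out.append((new_dest, new_srcs))
    return out

code = [("r1", ["r0"]),   # r1 <- MEM[r0]
        ("r4", ["r1"]),   # r4 <- r1 + const
        ("r1", ["r2"])]   # r1 <- MEM[r2], output dependence on the first load
print(rename(code))
# the second definition of r1 becomes r1_2, so the output dependence is gone
```

Note the trade-off raised on the slide: each fresh name is a new simultaneously-live value, so renaming can raise register pressure.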

36 Instruction schedule S(n): n ∈ V → t ∈ N+ mapping from an operation n to a non-negative integer t t denotes cycle when operation is processed by CPU t denotes instruction that contains operation n Constraints: S(n) ≥ 1 (and at least one operation O with S(O) = 1) If (n1, n2) ∈ E then S(n1) + delay(n1) ≤ S(n2) For each t, there are no more operations with S(n) = t than the H/W (resp. the instruction format) can support cycle t operation n load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, # 5 add r ← r, # 6 mul r ← r, r4 7 add r ← r, #4 8 9 store MEM[r], r 0 add r ← r, #4
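The three constraints above can be checked mechanically for a candidate schedule. A small sketch, assuming a single-issue machine so the resource constraint reduces to "at most one operation per cycle" (the schedule, latencies, and edge shown are illustrative):

```python
# Checks the schedule constraints from the slide for a candidate S.
def is_valid(S, delay, edges, issue_width=1):
    if min(S.values()) != 1:                 # some operation starts at cycle 1
        return False
    for n1, n2 in edges:                     # dependence constraint:
        if S[n1] + delay[n1] > S[n2]:        #   S(n1) + delay(n1) <= S(n2)
            return False
    cycles = list(S.values())                # resource constraint per cycle
    return all(cycles.count(t) <= issue_width for t in cycles)

S     = {"load": 1, "add": 4}
delay = {"load": 3, "add": 1}
print(is_valid(S, delay, [("load", "add")]))   # True: add waits 3 cycles
```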

37 Length of a schedule Length of the schedule L(S) = max over n ∈ V of (S(n) + delay(n)) L(S) = cycle t operation n load r4 ← MEM[r0] load r ← MEM[r] add r0 ← r0, #4 4 add r4 ← r4, # 5 add r ← r, # 6 mul r ← r, r4 7 add r ← r, #4 8 9 store MEM[r], r 0 add r ← r, #4

38 Instruction schedule Path length: starting at the roots, annotate each node with its accumulated delay 5 r4 ← r4 + 8 r4 ← MEM[r0] Critical path: longest path over all paths in the data dependence graph Shortest (minimal) schedule cannot be shorter than the length of the critical path 4 8 r ← MEM[r] r ← r + r ← r4 * r MEM[r] ← r 5 0 r ← r r0 ← r0 + 4 r ← r + 4
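The critical-path lower bound can be computed as the longest accumulated-delay path through the DDG. A sketch, assuming the graph is given as per-node delays plus successor lists (the chain and latencies below are illustrative, not the slide's example):

```python
# Longest accumulated-delay path in the DDG, by memoized recursion over
# successors; no schedule can be shorter than this bound.
from functools import lru_cache

def critical_path(delay, succs):
    @lru_cache(maxsize=None)
    def length(n):
        # accumulated delay of n plus the heaviest path below it
        return delay[n] + max((length(s) for s in succs.get(n, [])), default=0)
    return max(length(n) for n in delay)

delay = {"load": 3, "add": 1, "mul": 3, "store": 1}
succs = {"load": ["add"], "add": ["mul"], "mul": ["store"]}
print(critical_path(delay, succs))   # 3 + 1 + 3 + 1 = 8
```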

39 Finding a mapping Given a DDG, a mapping can be found Forward: start at leaves (producers) Backward: start at root (consumers)

40 Instruction scheduling techniques Local instruction scheduling: scheduling of one DDG Local scheduling is an NP-complete problem (scheduling → job shop scheduling → TSP)

41 Cycle vs. operation scheduling add mul? sub load store add sub mul load store Operation scheduling is more powerful than cycle-based scheduling in the presence of long-latency operations However, it is much more complicated to implement

42 Linear vs. graph-based techniques Linear techniques Runtime O(n) Produce the schedule by one or more passes over the input LLIR Most common technique: critical-path scheduling Three passes: ASAP, ALAP, non-critical operations Limitation: unable to consider global properties of operations Graph-based techniques Runtime: O(n²) for DAG creation plus scheduling Prevalent technique: list scheduling (O(n log n)) Greedy: select one operation and schedule it

43 List scheduling Prevalent scheduling heuristics are based on list scheduling Method Rename (optional) Build data dependence graph Assign priorities to operations Iteratively select and schedule an operation

44 List scheduling

t := 1
ready := { leaves of DDG }
active := {}
while (ready ∪ active ≠ {}) do
    for each operation o in active do
        if (S(o) + delay(o) < t) then
            active := active \ {o}
            for each successor s of o in DDG do
                if (s is ready) then
                    ready := ready ∪ {s}
    if (ready ≠ {}) then
        o := pick the operation from ready with the highest priority
        if (o can be scheduled on the H/W units) then
            ready := ready \ {o}
            active := active ∪ {o}
            S(o) := t
    t := t + 1
end

45 List scheduling

t := 1
ready := { leaves of DDG }
active := {}
while (ready ∪ active ≠ {}) do
    for each operation o in active do
        if (S(o) + delay(o) < t) then        (< or ≤ ?)
            active := active \ {o}
            for each successor s of o in DDG do
                if (s is ready) then
                    ready := ready ∪ {s}
    if (ready ≠ {}) then
        o := pick the operation from ready with the highest priority
        if (o can be scheduled on the H/W units) then
            ready := ready \ {o}
            active := active ∪ {o}
            S(o) := t
    t := t + 1
end
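The pseudocode above can be turned into a runnable sketch. This version answers the "< or ≤ ?" question with ≤, consistent with a forwarded result being usable in cycle S(o) + delay(o), and takes the priority function as a parameter; the graph and latencies are illustrative assumptions:

```python
# Runnable sketch of list scheduling for a single-issue machine.
def list_schedule(delay, preds, succs, priority):
    nodes = set(delay)
    ready = {n for n in nodes if not preds.get(n)}   # leaves of the DDG
    active, S, t = set(), {}, 1
    while ready or active:
        for o in list(active):                       # retire finished ops
            if S[o] + delay[o] <= t:
                active.discard(o)
                for s in succs.get(o, []):
                    # s becomes ready once all its predecessors have finished
                    if all(p in S and S[p] + delay[p] <= t for p in preds[s]):
                        ready.add(s)
        if ready:                                    # issue one op per cycle
            o = max(ready, key=priority)
            ready.discard(o)
            active.add(o)
            S[o] = t
        t += 1
    return S

delay = {"a": 3, "b": 1, "c": 1}
preds = {"a": [], "b": ["a"], "c": []}
succs = {"a": ["b"]}
# With latency as priority, "a" issues first; "b" must wait for a's result.
print(list_schedule(delay, preds, succs, priority=lambda n: delay[n]))
# {'a': 1, 'c': 2, 'b': 4}
```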

46 List scheduling Picking an operation from ready If ready never contains more than one operation, the generated schedule is optimal If more than one operation is ready, the choice of the next-to-be-scheduled operation is critical to the performance of the algorithm Pick the operation with the highest priority Most algorithms use several priorities to break ties How do we compute these priorities?

47

48 Priority functions in list scheduling procedure ListScheduling( ); begin o := pick one operation from ready using some priority function; end; Common priority functions: Height: distance from exit node gives priority to amount of work left Slackness: inverse of slack gives priority to operations on the critical path Register use: number of source operands reduces the number of live registers Uncover: fanout (number of children) frees up nodes quickly Original instruction order

49 List scheduling Priorities based on the DDG Estart: earliest start time (ASAP, as soon as possible) Lstart: latest start time (ALAP, as late as possible) slack: scheduling freedom slack(op) = Lstart(op) - Estart(op)

50 List scheduling slack(op) = Lstart(op) - Estart(op)
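Estart, Lstart, and slack follow from one forward (ASAP) and one backward (ALAP) pass over the DDG. A sketch, assuming the node list is already topologically sorted; the example graph and latencies are made up for illustration:

```python
# ASAP/ALAP computation: Estart by forward propagation from the leaves,
# Lstart by backward propagation from the exit, slack = Lstart - Estart.
def estart_lstart_slack(delay, preds, succs):
    order = list(delay)          # assume keys are in topological order
    E, L = {}, {}
    for n in order:              # forward pass: earliest start times
        E[n] = max((E[p] + delay[p] for p in preds.get(n, [])), default=0)
    total = max(E[n] + delay[n] for n in order)   # schedule-length bound
    for n in reversed(order):    # backward pass: latest start times
        L[n] = min((L[s] - delay[n] for s in succs.get(n, [])),
                   default=total - delay[n])
    return {n: (E[n], L[n], L[n] - E[n]) for n in order}

delay = {"load": 3, "addi": 1, "mul": 3}
preds = {"load": [], "addi": [], "mul": ["load", "addi"]}
succs = {"load": ["mul"], "addi": ["mul"]}
print(estart_lstart_slack(delay, preds, succs))
# load and mul have slack 0 (critical path); addi has slack 2
```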

51 List scheduling Computing Estart, Lstart, slack [DDG figure: each node annotated with its Estart, latency, and Lstart]

52 Estart [DDG figure: Estart values filled in by forward propagation from the leaves]

53 Lstart [DDG figure: Lstart values filled in by backward propagation from the exit node]

54 Slack slack(op) = Lstart(op) - Estart(op) [DDG figure: slack annotated per node; critical-path nodes have slack = 0]

55 List scheduling Another way to look at the critical path Sequence of critical operations Critical operation: slack(op) = 0 [DDG figure: the nodes with slack = 0 form the critical path]

56 Priority function: height-based Height-based priority function Gives priority to amount of work left priority(op) = Lstart(exit) - Lstart(op) + 1 [DDG figure with per-node Estart/latency/Lstart and a per-op priority table]

57 Priority function: height-based Height-based priority function Gives priority to amount of work left priority(op) = Lstart(exit) - Lstart(op) + 1 [DDG figure with per-node Estart/latency/Lstart and the completed per-op priority table]
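A minimal sketch of the height-based priority, assuming the formula priority(op) = Lstart(exit) - Lstart(op) + 1 and illustrative Lstart values (not the slide's table):

```python
# Height-based priority: operations furthest from the exit node, i.e. with
# the most work left below them, get the highest priority.
def height_priority(lstart, exit_node="exit"):
    return {op: lstart[exit_node] - t + 1 for op, t in lstart.items()}

lstart = {"load": 0, "addi": 2, "mul": 3, "exit": 6}
print(height_priority(lstart))
# load gets the highest priority (7); the exit node gets the lowest (1)
```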

58 Example height-based
1 load r4 ← MEM[r0]
2 add r4 ← r4, #
3 load r ← MEM[r]
4 add r ← r, #
5 add r0 ← r0, #4
6 add r ← r, #4
7 mul r ← r, r4
8 store MEM[r], r
9 add r ← r, #4
[DDG figure: each operation annotated with its accumulated path length]
Target machine properties pipelined with forwarding, single issue, in-order operation latencies: add, sub: cycle; mul, load: cycles; store: cycle

59 Example height-based
1 load r4 ← MEM[r0]
2 add r4 ← r4, #
3 load r ← MEM[r]
4 add r ← r, #
5 add r0 ← r0, #4
6 add r ← r, #4
7 mul r ← r, r4
8 store MEM[r], r
9 add r ← r, #4
[DDG figure: each operation annotated with its Estart and Lstart]
Target machine properties pipelined with forwarding, single issue, in-order operation latencies: add, sub: cycle; mul, load: cycles; store: cycle

60 [DDG figure: operations annotated with Estart and Lstart]

61 [DDG figure: operations annotated with Estart and Lstart]

62 [DDG figure: operations annotated with Estart and Lstart, plus the per-op priority table]

63 [DDG figure: operations annotated with Estart and Lstart, with the per-op priority table completed]

64 Example continued Initialization t := 1 ready := {,, 5, 6 } active := {} [priority table] Iteration: t = 1, ready = {,,5,6}, active = {} active is empty → pass ready is not empty o := (with priority 9) S() = 1 [schedule table] t = 2, ready = {,5,6}, active = {} active is not empty o = : S()+delay() ≤ t? no ready is not empty o := (with priority 9) ready := {5,6}, active = {,} S() = 2 [schedule table]

65 Example continued Iteration: t = 3, ready = {5,6}, active = {,} active is not empty o = : S()+delay() ≤ 3? no o = : S()+delay() ≤ 3? no ready is not empty o := 5 (with priority ) ready := {6}, active = {,,5} S(5) = 3 [schedule table] [priority table] t = 4, ready = {6}, active = {,,5} active is not empty o = : S()+delay() ≤ 4? yes active := active \ {} ready := ready ∪ {} o = : S()+delay() ≤ 4? no ready is not empty o := (with priority 6) ready := {6}, active = {,5,} S() = 4 [schedule table]

66 Example continued [priority table] Iteration: t = 5, ready = {6}, active = {,5,} active is not empty o = : S()+delay() ≤ 5? yes active := active \ {} ready := ready ∪ {4} o = 5: S(5)+delay(5) ≤ 5? yes active := active \ {5} no data-dependent successors o = : S()+delay() ≤ 5? yes active := active \ {} successor (7) not ready due to 4 [schedule table] ready is not empty o := 4 (with priority 6) ready := {6}, active = {4} S(4) = 5

67 Example -- continued [priority table] Iteration: t = 6, ready = {6}, active = {4} active is not empty o = 4: S(4)+delay(4) ≤ 6? yes active := active \ {4} ready := ready ∪ {7} ready is not empty o := 7 (with priority 5) ready := {6}, active = {7} S(7) = 6 t = 7, ready = {6}, active = {7} active is not empty o = 7: S(7)+delay(7) ≤ 7? no ready is not empty o := 6 (with priority ) ready := {}, active = {6,7} S(6) = 7 [schedule table] [schedule table]

68 Example continued Iteration: t = 8, ready = {}, active = {6,7} active is not empty o = 6: S(6)+delay(6) ≤ 8? yes active := active \ {6} no data-dependent successors o = 7: S(7)+delay(7) ≤ 8? no [schedule table] [priority table] ready is empty t = 9, ready = {}, active = {7} active is not empty o = 7: S(7)+delay(7) ≤ 9? yes active := active \ {7} ready := ready ∪ {8,9} ready is not empty o := 8 (with priority ) ready := {9}, active = {8} S(8) = 9 [schedule table]

69 Example continued Iteration: t = 10, ready = {9}, active = {8} active is not empty o = 8: S(8)+delay(8) ≤ 10? yes active := active \ {8} no data-dependent successors ready is not empty o := 9 (with priority ) ready := {}, active = {9} S(9) = 10 t = 11, ready = {}, active = {9} active is not empty o = 9: S(9)+delay(9) ≤ 11? yes active := active \ {9} no data-dependent successors done, ready is empty [schedule tables] [priority table]

70 Classification of scheduling techniques Direction Scheduling Flow analysis Search Scheduling unit Forward Backward Cycle Operations Linear Graph Greedy Backtrack Acyclic Cyclic Basic block Trace Superblock

71 Scheduling & register allocation Phase ordering problem between instruction scheduling and register allocation (RA) Effects of the scheduler on RA The scheduler can use renaming to get rid of anti dependences to obtain more freedom in scheduling The resulting overlap of previously constrained operations may increase register pressure, which, in turn, may force the register allocator to spill one more variable And vice versa (RA constrains the scheduler in an RA-first compiler) Combining scheduling and register allocation Potential to produce better solutions Typically not done due to complexity

72 with thanks to Bernhard Egger for slide material


More information

Parallel Programming Pa,erns

Parallel Programming Pa,erns Parallel Programming Pa,erns Bryan Mills, PhD Spring 2017 What is a programming pa,erns? Repeatable solu@on to commonly occurring problem It isn t a solu@on that you can t simply apply, the engineer has

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures)

CS 61C: Great Ideas in Computer Architecture (Machine Structures) CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instructors: Randy H. Katz David A. PaGerson hgp://inst.eecs.berkeley.edu/~cs61c/fa10 1 2 Cache Field Sizes Number of bits in a cache includes

More information

CS 61C: Great Ideas in Computer Architecture Strings and Func.ons. Anything can be represented as a number, i.e., data or instruc\ons

CS 61C: Great Ideas in Computer Architecture Strings and Func.ons. Anything can be represented as a number, i.e., data or instruc\ons CS 61C: Great Ideas in Computer Architecture Strings and Func.ons Instructor: Krste Asanovic, Randy H. Katz hdp://inst.eecs.berkeley.edu/~cs61c/sp12 Fall 2012 - - Lecture #7 1 New- School Machine Structures

More information

Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators.

Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators. Instruction Scheduling Beyond Basic Blocks Extended Basic Blocks, Superblock Cloning, & Traces, with a quick introduction to Dominators Comp 412 COMP 412 FALL 2016 source code IR Front End Optimizer Back

More information

The ILOC Virtual Machine (Lab 1 Background Material) Comp 412

The ILOC Virtual Machine (Lab 1 Background Material) Comp 412 COMP 12 FALL 20 The ILOC Virtual Machine (Lab 1 Background Material) Comp 12 source code IR Front End OpMmizer Back End IR target code Copyright 20, Keith D. Cooper & Linda Torczon, all rights reserved.

More information

Punctual Coalescing. Fernando Magno Quintão Pereira

Punctual Coalescing. Fernando Magno Quintão Pereira Punctual Coalescing Fernando Magno Quintão Pereira Register Coalescing Register coalescing is an op7miza7on on top of register alloca7on. The objec7ve is to map both variables used in a copy instruc7on

More information

Principles of Programming Languages

Principles of Programming Languages Principles of Programming Languages h"p://www.di.unipi.it/~andrea/dida2ca/plp- 14/ Prof. Andrea Corradini Department of Computer Science, Pisa Lesson 18! Bootstrapping Names in programming languages Binding

More information

CS 33. Architecture and Optimization (2) CS33 Intro to Computer Systems XV 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.

CS 33. Architecture and Optimization (2) CS33 Intro to Computer Systems XV 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. CS 33 Architecture and Optimization (2) CS33 Intro to Computer Systems XV 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. Modern CPU Design Instruc&on Control Re%rement Unit Register File Fetch

More information

CS5363 Final Review. cs5363 1

CS5363 Final Review. cs5363 1 CS5363 Final Review cs5363 1 Programming language implementation Programming languages Tools for describing data and algorithms Instructing machines what to do Communicate between computers and programmers

More information

Virtual Memory: Concepts

Virtual Memory: Concepts Virtual Memory: Concepts Instructor: Dr. Hyunyoung Lee Based on slides provided by Randy Bryant and Dave O Hallaron Today Address spaces VM as a tool for caching VM as a tool for memory management VM as

More information

What is Search For? CS 188: Ar)ficial Intelligence. Constraint Sa)sfac)on Problems Sep 14, 2015

What is Search For? CS 188: Ar)ficial Intelligence. Constraint Sa)sfac)on Problems Sep 14, 2015 CS 188: Ar)ficial Intelligence Constraint Sa)sfac)on Problems Sep 14, 2015 What is Search For? Assump)ons about the world: a single agent, determinis)c ac)ons, fully observed state, discrete state space

More information

ECE 468, Fall Midterm 2

ECE 468, Fall Midterm 2 ECE 468, Fall 08. Midterm INSTRUCTIONS (read carefully) Fill in your name and PUID. NAME: PUID: Please sign the following: I affirm that the answers given on this test are mine and mine alone. I did not

More information

Research opportuni/es with me

Research opportuni/es with me Research opportuni/es with me Independent study for credit - Build PL tools (parsers, editors) e.g., JDial - Build educa/on tools (e.g., Automata Tutor) - Automata theory problems e.g., AutomatArk - Research

More information

: Compiler Design

: Compiler Design 252-210: Compiler Design 9.0 Data- Flow Analysis Thomas R. Gross Computer Science Department ETH Zurich, Switzerland Global program analysis is a crucial part of all real compilers. Global : beyond a statement

More information

Special Topics on Algorithms Fall 2017 Dynamic Programming. Vangelis Markakis, Ioannis Milis and George Zois

Special Topics on Algorithms Fall 2017 Dynamic Programming. Vangelis Markakis, Ioannis Milis and George Zois Special Topics on Algorithms Fall 2017 Dynamic Programming Vangelis Markakis, Ioannis Milis and George Zois Basic Algorithmic Techniques Content Dynamic Programming Introduc

More information

Instruc=on Set Architecture

Instruc=on Set Architecture ECPE 170 Jeff Shafer University of the Pacific Instruc=on Set Architecture 2 Schedule Today Closer look at instruc=on sets Thursday Brief discussion of real ISAs Quiz 4 (over Chapter 5, i.e. HW #10 and

More information

Chapter. Out of order Execution

Chapter. Out of order Execution Chapter Long EX Instruction stages We have assumed that all stages. There is a problem with the EX stage multiply (MUL) takes more time than ADD MUL ADD We can clearly delay the execution of the ADD until

More information

CE431 Parallel Computer Architecture Spring Compile-time ILP extraction Modulo Scheduling

CE431 Parallel Computer Architecture Spring Compile-time ILP extraction Modulo Scheduling CE431 Parallel Computer Architecture Spring 2018 Compile-time ILP extraction Modulo Scheduling Nikos Bellas Electrical and Computer Engineering University of Thessaly Parallel Computer Architecture 1 Readings

More information

CS 465 Final Review. Fall 2017 Prof. Daniel Menasce

CS 465 Final Review. Fall 2017 Prof. Daniel Menasce CS 465 Final Review Fall 2017 Prof. Daniel Menasce Ques@ons What are the types of hazards in a datapath and how each of them can be mi@gated? State and explain some of the methods used to deal with branch

More information

Data Flow Analysis. Suman Jana. Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006)

Data Flow Analysis. Suman Jana. Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006) Data Flow Analysis Suman Jana Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006) Data flow analysis Derives informa=on about the dynamic behavior of a program by only

More information

CS 61C: Great Ideas in Computer Architecture Func%ons and Numbers

CS 61C: Great Ideas in Computer Architecture Func%ons and Numbers CS 61C: Great Ideas in Computer Architecture Func%ons and Numbers 9/11/12 Instructor: Krste Asanovic, Randy H. Katz hcp://inst.eecs.berkeley.edu/~cs61c/sp12 Fall 2012 - - Lecture #8 1 New- School Machine

More information

Instruction scheduling

Instruction scheduling Instruction scheduling iaokang Qiu Purdue University ECE 468 October 12, 2018 What is instruction scheduling? Code generation has created a sequence of assembly instructions But that is not the only valid

More information

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009

What Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009 What Compilers Can and Cannot Do Saman Amarasinghe Fall 009 Optimization Continuum Many examples across the compilation pipeline Static Dynamic Program Compiler Linker Loader Runtime System Optimization

More information

CS 188: Ar)ficial Intelligence

CS 188: Ar)ficial Intelligence CS 188: Ar)ficial Intelligence Search Instructors: Pieter Abbeel & Anca Dragan University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley

More information

Lecture 2: Memory in C

Lecture 2: Memory in C CIS 330:! / / / / (_) / / / / _/_/ / / / / / \/ / /_/ / `/ \/ / / / _/_// / / / / /_ / /_/ / / / / /> < / /_/ / / / / /_/ / / / /_/ / / / / / \ /_/ /_/_/_/ _ \,_/_/ /_/\,_/ \ /_/ \ //_/ /_/ Lecture 2:

More information

Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra<on

Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra<on Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra

More information

Lecture 7. Instruction Scheduling. I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code)

Lecture 7. Instruction Scheduling. I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code) Lecture 7 Instruction Scheduling I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code) Reading: Chapter 10.3 10.4 CS243: Instruction Scheduling 1 Scheduling Constraints Data dependences

More information

Topics. Computer Organization CS Improving Performance. Opportunity for (Easy) Points. Three Generic Data Hazards

Topics. Computer Organization CS Improving Performance. Opportunity for (Easy) Points. Three Generic Data Hazards Computer Organization CS 231-01 Improving Performance Dr. William H. Robinson November 8, 2004 Topics Money's only important when you don't have any. Sting Cache Scoreboarding http://eecs.vanderbilt.edu/courses/cs231/

More information

Deformable Part Models

Deformable Part Models Deformable Part Models References: Felzenszwalb, Girshick, McAllester and Ramanan, Object Detec@on with Discrimina@vely Trained Part Based Models, PAMI 2010 Code available at hkp://www.cs.berkeley.edu/~rbg/latent/

More information

: Advanced Compiler Design

: Advanced Compiler Design 263-2810: Advanced Compiler Design Thomas R. Gross Computer Science Department ETH Zurich, Switzerland Topics Program opgmizagon Op%miza%on Op%mize for (execu%on) speed Op%mize for (code) size Op%mize

More information

Superscalar Processors Ch 14

Superscalar Processors Ch 14 Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on. Instructor: Wei-Min Shen

CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on. Instructor: Wei-Min Shen CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on Instructor: Wei-Min Shen Status Check and Review Status check Have you registered in Piazza? Have you run the Project-1?

More information

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency?

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency? Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

CSE 473: Ar+ficial Intelligence

CSE 473: Ar+ficial Intelligence CSE 473: Ar+ficial Intelligence Search Instructor: Luke Ze=lemoyer University of Washington [These slides were adapted from Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials

More information

BIL 682 Ar+ficial Intelligence Week #2: Solving problems by searching. Asst. Prof. Aykut Erdem Dept. of Computer Engineering HaceDepe University

BIL 682 Ar+ficial Intelligence Week #2: Solving problems by searching. Asst. Prof. Aykut Erdem Dept. of Computer Engineering HaceDepe University BIL 682 Ar+ficial Intelligence Week #2: Solving problems by searching Asst. Prof. Aykut Erdem Dept. of Computer Engineering HaceDepe University Today Search problems Uninformed search Informed (heuris+c)

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) More Cache: Set Associa0vity. Smart Phone. Today s Lecture. Core.

CS 61C: Great Ideas in Computer Architecture (Machine Structures) More Cache: Set Associa0vity. Smart Phone. Today s Lecture. Core. CS 6C: Great Ideas in Computer Architecture (Machine Structures) More Cache: Set Associavity Instructors: Randy H Katz David A PaGerson Guest Lecture: Krste Asanovic hgp://insteecsberkeleyedu/~cs6c/fa

More information

CSE Opera,ng System Principles

CSE Opera,ng System Principles CSE 30341 Opera,ng System Principles Synchroniza2on Overview Background The Cri,cal-Sec,on Problem Peterson s Solu,on Synchroniza,on Hardware Mutex Locks Semaphores Classic Problems of Synchroniza,on Monitors

More information

Sta$c Single Assignment (SSA) Form

Sta$c Single Assignment (SSA) Form Sta$c Single Assignment (SSA) Form SSA form Sta$c single assignment form Intermediate representa$on of program in which every use of a variable is reached by exactly one defini$on Most programs do not

More information

CSE P 501 Compilers. Instruc7on Selec7on Hal Perkins Winter UW CSE P 501 Winter 2016 N-1

CSE P 501 Compilers. Instruc7on Selec7on Hal Perkins Winter UW CSE P 501 Winter 2016 N-1 CSE P 501 Compilers Instruc7on Selec7on Hal Perkins Winter 2016 UW CSE P 501 Winter 2016 N-1 Agenda Compiler back- end organiza7on Instruc7on selec7on tree padern matching Credits: Slides by Keith Cooper

More information

hashfs Applying Hashing to Op2mize File Systems for Small File Reads

hashfs Applying Hashing to Op2mize File Systems for Small File Reads hashfs Applying Hashing to Op2mize File Systems for Small File Reads Paul Lensing, Dirk Meister, André Brinkmann Paderborn Center for Parallel Compu2ng University of Paderborn Mo2va2on and Problem Design

More information

Compiler Architecture

Compiler Architecture Code Generation 1 Compiler Architecture Source language Scanner (lexical analysis) Tokens Parser (syntax analysis) Syntactic structure Semantic Analysis (IC generator) Intermediate Language Code Optimizer

More information

Instruction-Level Parallelism (ILP)

Instruction-Level Parallelism (ILP) Instruction Level Parallelism Instruction-Level Parallelism (ILP): overlap the execution of instructions to improve performance 2 approaches to exploit ILP: 1. Rely on hardware to help discover and exploit

More information

Opera&ng Systems ECE344

Opera&ng Systems ECE344 Opera&ng Systems ECE344 Lecture 8: Paging Ding Yuan Lecture Overview Today we ll cover more paging mechanisms: Op&miza&ons Managing page tables (space) Efficient transla&ons (TLBs) (&me) Demand paged virtual

More information

Recursive Helper functions

Recursive Helper functions 11/16/16 Page 22 11/16/16 Page 23 11/16/16 Page 24 Recursive Helper functions Some%mes it is easier to find a recursive solu%on if you make a slight change to the original problem. Consider the palindrome

More information

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations Register Allocation Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class

More information

hnp://

hnp:// The bots face off in a tournament against one another and about an equal number of humans, with each player trying to score points by elimina&ng its opponents. Each player also has a "judging gun" in addi&on

More information

Networks and Opera/ng Systems Chapter 13: Scheduling

Networks and Opera/ng Systems Chapter 13: Scheduling Networks and Opera/ng Systems Chapter 13: Scheduling (252 0062 00) Donald Kossmann & Torsten Hoefler Frühjahrssemester 2013 Systems Group Department of Computer Science ETH Zürich Last /me Process concepts

More information

Compiler Optimization Techniques

Compiler Optimization Techniques Compiler Optimization Techniques Department of Computer Science, Faculty of ICT February 5, 2014 Introduction Code optimisations usually involve the replacement (transformation) of code from one sequence

More information

Superscalar Processors Ch 13. Superscalar Processing (5) Computer Organization II 10/10/2001. New dependency for superscalar case? (8) Name dependency

Superscalar Processors Ch 13. Superscalar Processing (5) Computer Organization II 10/10/2001. New dependency for superscalar case? (8) Name dependency Superscalar Processors Ch 13 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction 1 New dependency for superscalar case? (8) Name dependency (nimiriippuvuus) two use the same

More information

W1005 Intro to CS and Programming in MATLAB. Brief History of Compu?ng. Fall 2014 Instructor: Ilia Vovsha. hip://www.cs.columbia.

W1005 Intro to CS and Programming in MATLAB. Brief History of Compu?ng. Fall 2014 Instructor: Ilia Vovsha. hip://www.cs.columbia. W1005 Intro to CS and Programming in MATLAB Brief History of Compu?ng Fall 2014 Instructor: Ilia Vovsha hip://www.cs.columbia.edu/~vovsha/w1005 Computer Philosophy Computer is a (electronic digital) device

More information

Virtual Memory B: Objec5ves

Virtual Memory B: Objec5ves Virtual Memory B: Objec5ves Benefits of a virtual memory system" Demand paging, page-replacement algorithms, and allocation of page frames" The working-set model" Relationship between shared memory and

More information

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted

More information

Program Op*miza*on and Analysis. Chenyang Lu CSE 467S

Program Op*miza*on and Analysis. Chenyang Lu CSE 467S Program Op*miza*on and Analysis Chenyang Lu CSE 467S 1 Program Transforma*on op#mize Analyze HLL compile assembly assemble Physical Address Rela5ve Address assembly object load executable link Absolute

More information

Virtual Memory: Concepts

Virtual Memory: Concepts Virtual Memory: Concepts 5-23 / 8-23: Introduc=on to Computer Systems 6 th Lecture, Mar. 8, 24 Instructors: Anthony Rowe, Seth Goldstein, and Gregory Kesden Today VM Movaon and Address spaces ) VM as a

More information

Page # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two?

Page # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two? Exploiting ILP through Software Approaches Venkatesh Akella EEC 270 Winter 2005 Based on Slides from Prof. Al. Davis @ cs.utah.edu Let the Compiler Do it Pros and Cons Pros No window size limitation, the

More information

Applied Algorithm Design Lecture 3

Applied Algorithm Design Lecture 3 Applied Algorithm Design Lecture 3 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 3 1 / 75 PART I : GREEDY ALGORITHMS Pietro Michiardi (Eurecom) Applied Algorithm

More information

Computer Architecture

Computer Architecture Computer Architecture An Introduction Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/

More information

Dynamic Scheduling. CSE471 Susan Eggers 1

Dynamic Scheduling. CSE471 Susan Eggers 1 Dynamic Scheduling Why go out of style? expensive hardware for the time (actually, still is, relatively) register files grew so less register pressure early RISCs had lower CPIs Why come back? higher chip

More information

Pipelining and Exploiting Instruction-Level Parallelism (ILP)

Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &

More information

CS252 Graduate Computer Architecture Spring 2014 Lecture 13: Mul>threading

CS252 Graduate Computer Architecture Spring 2014 Lecture 13: Mul>threading CS252 Graduate Computer Architecture Spring 2014 Lecture 13: Mul>threading Krste Asanovic krste@eecs.berkeley.edu http://inst.eecs.berkeley.edu/~cs252/sp14 Last Time in Lecture 12 Synchroniza?on and Memory

More information

Preventing Stalls: 1

Preventing Stalls: 1 Preventing Stalls: 1 2 PipeLine Pipeline efficiency Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls Ideal pipeline CPI: best possible (1 as n ) Structural hazards:

More information

Unit 2: High-Level Synthesis

Unit 2: High-Level Synthesis Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

CS152 Computer Architecture and Engineering. Complex Pipelines

CS152 Computer Architecture and Engineering. Complex Pipelines CS152 Computer Architecture and Engineering Complex Pipelines Assigned March 6 Problem Set #3 Due March 20 http://inst.eecs.berkeley.edu/~cs152/sp12 The problem sets are intended to help you learn the

More information

Instruction-Level Parallelism and Its Exploitation

Instruction-Level Parallelism and Its Exploitation Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic

More information