Advanced Compiler Design: 8.0 Instruction scheduling
1 263-2810: Advanced Compiler Design, 8.0 Instruction scheduling. Thomas R. Gross, Computer Science Department, ETH Zurich, Switzerland
2 Overview: 8.1 Instruction scheduling basics; 8.2 Scheduling for ILP processors
3 8.1 Instruction scheduling basics: Motivation; Problem formulation; The data dependence graph; Instruction scheduling techniques; List scheduling: Overview, Algorithm, Priority functions, Example
4 Instruction scheduling. Input: Source code → Frontend → IR → Optimizer → IR → Code Generator → Output: Machine program. Within the code generator: HIR → Instruction Selection → LLIR → Instruction Scheduling → reordered LLIR → Register Allocation → LLIR. (HIR: high-level IR; LLIR: low-level IR)
5
6 Motivation. Or: why don't we just hand the code as-is to the processor? The processor interprets instructions: load r ← MEM[r0]; add r4 ← r, #; load r ← MEM[r]; add r ← r, #; add r0 ← r0, #4; add r ← r, #4; mul r ← r, r4; store MEM[r], r; add r ← r, #4
7 Motivation. Or: why don't we just hand the code as-is to the processor? The processor interprets instructions. LLIR (load r ← MEM[r0]; add r4 ← r, #; load r ← MEM[r]; add r ← r, #; add r0 ← r0, #4; add r ← r, #4; mul r ← r, r4; store MEM[r], r; add r ← r, #4) → Instruction Scheduler → reordered LLIR (the same nine instructions)
8 Motivation. Or: why don't we just hand the code as-is to the processor? The processor interprets instructions. LLIR → Copy → reordered LLIR: the same nine instructions, unchanged
9 Motivation. Assume a target machine with the following properties: pipelined with forwarding, single issue, in-order. Operation latencies: add, sub: 1 cycle; mul, load: 3 cycles; store: 1 cycle. Example: load r ← MEM[r0]; add r4 ← r, #; add r ← r, #4; mul r ← r, r4; add r5 ← r, r4; store MEM[r], r5
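The stalls shown on the following slides fall out of these latencies mechanically. Below is a minimal sketch (not from the slides) that computes issue cycles on an in-order, single-issue pipeline with forwarding; the `(dest, srcs, kind)` instruction encoding and the function name are illustrative assumptions.

```python
# Latencies as stated on the slide: add/sub 1, mul/load 3, store 1.
LATENCY = {'add': 1, 'sub': 1, 'mul': 3, 'load': 3, 'store': 1}

def issue_cycles(program):
    """program: list of (dest_reg_or_None, source_regs, kind) in order.
    Returns the cycle at which each instruction issues."""
    avail = {}                      # register -> cycle its value is ready
    cycles = []
    t = 0
    for dest, srcs, kind in program:
        # Earliest issue: one cycle after the previous instruction,
        # and not before every source operand has been produced.
        t = max([t + 1] + [avail.get(r, 1) for r in srcs])
        cycles.append(t)
        if dest is not None:
            avail[dest] = t + LATENCY[kind]
    return cycles
```

For a load followed immediately by an add that consumes its result, this yields issue cycles 1 and 4: the add waits two extra cycles, which is the stall pattern the next slides illustrate.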
10 Motivation. Executing the example code: load r ← MEM[r0]; add r4 ← r, #; load r ← MEM[r]; add r ← r, #; add r0 ← r0, #4; add r ← r, #4; mul r ← r, r4; store MEM[r], r; add r ← r, #4
11 Motivation. Executing the example code (cycle: operation): 1: load r ← MEM[r0]; 4: add r4 ← r, #; 5: load r ← MEM[r]; 8: add r ← r, #; 9: add r0 ← r0, #4; 10: add r ← r, #4; 11: mul r ← r, r4; 14: store MEM[r], r; 15: add r ← r, #4
12 Motivation. Can we do better? load r ← MEM[r0]; add r4 ← r, #; load r ← MEM[r]; add r ← r, #; add r0 ← r0, #4; add r ← r, #4; mul r ← r, r4; store MEM[r], r; add r ← r, #4. Target machine properties: pipelined with forwarding, single issue, in-order; operation latencies: add, sub: 1 cycle; mul, load: 3 cycles; store: 1 cycle
13 Motivation. Can we do better? (cycle: operation) 1: load r ← MEM[r0]; 2: add r0 ← r0, #4; 3: (stall); 4: add r4 ← r, #; 5: load r ← MEM[r]; 6: add r ← r, #4; 7: (stall); 8: add r ← r, #; 9: mul r ← r, r4; 10: (stall); 11: (stall); 12: store MEM[r], r; 13: add r ← r, #4. Target machine properties: pipelined with forwarding, single issue, in-order; operation latencies: add, sub: 1 cycle; mul, load: 3 cycles; store: 1 cycle
14 Motivation. Can we do even better? Registers constrain the schedule: load r ← MEM[r0]; add r4 ← r, #; load r ← MEM[r]. If the 1st instruction loads into r4, the 2nd load could start earlier: load r4 ← MEM[r0]; add r4 ← r4, #; load r ← MEM[r]
15 Motivation. Can we do even better? Registers constrain the schedule: load r ← MEM[r0]; add r4 ← r, #; load r ← MEM[r]. If the 1st instruction loads into r4, the 2nd load could start earlier: load r4 ← MEM[r0]; add r4 ← r4, #; load r ← MEM[r]. (cycle: operation) 1: load r4 ← MEM[r0]; 2: load r ← MEM[r]; 3: add r0 ← r0, #4; 4: add r4 ← r4, #
16 Motivation. Can we do even better? Registers constrain the schedule. load r4 ← MEM[r0]; add r4 ← r4, #; load r ← MEM[r]; add r ← r, #; add r0 ← r0, #4; add r ← r, #4; mul r ← r, r4; store MEM[r], r; add r ← r, #4. Target machine properties: pipelined with forwarding, single issue, in-order; operation latencies: add, sub: 1 cycle; mul, load: 3 cycles; store: 1 cycle
17 Motivation. Can we do even better? Registers constrain the schedule. (cycle: operation) 1: load r4 ← MEM[r0]; 2: load r ← MEM[r]; 3: add r0 ← r0, #4; 4: add r4 ← r4, #; 5: add r ← r, #; 6: mul r ← r, r4; 7: add r ← r, #4; 8: (stall); 9: store MEM[r], r; 10: add r ← r, #4. Target machine properties: pipelined with forwarding, single issue, in-order; operation latencies: add, sub: 1 cycle; mul, load: 3 cycles; store: 1 cycle
18 Comparison. (cycle: operation) 1: load r ← MEM[r0]; 4: add r4 ← r, #; 5: load r ← MEM[r]; 8: add r ← r, #; 9: add r0 ← r0, #4; 10: add r ← r, #4; 11: mul r ← r, r4; 14: store MEM[r], r; 15: add r ← r, #4 versus 1: load r4 ← MEM[r0]; 2: load r ← MEM[r]; 3: add r0 ← r0, #4; 4: add r4 ← r4, #; 5: add r ← r, #; 6: mul r ← r, r4; 7: add r ← r, #4; 8: (stall); 9: store MEM[r], r; 10: add r ← r, #4
19 Comparison. The original order issues its last operation in cycle 15; the scheduled version in cycle 10, with the same code size. (cycle: operation) 1: load r ← MEM[r0]; 4: add r4 ← r, #; 5: load r ← MEM[r]; 8: add r ← r, #; 9: add r0 ← r0, #4; 10: add r ← r, #4; 11: mul r ← r, r4; 14: store MEM[r], r; 15: add r ← r, #4 versus 1: load r4 ← MEM[r0]; 2: load r ← MEM[r]; 3: add r0 ← r0, #4; 4: add r4 ← r4, #; 5: add r ← r, #; 6: mul r ← r, r4; 7: add r ← r, #4; 8: (stall); 9: store MEM[r], r; 10: add r ← r, #4. % improvement (code size)
20 Food for thought. Is this schedule the best we can attain (if we are willing to reconsider other register assignments)? If another register rx is free, use it instead of r. (cycle: operation) 1: load r4 ← MEM[r0]; 2: load r ← MEM[r]; 3: add r0 ← r0, #4; 4: add r4 ← r4, #; 5: add r ← r, #; 6: mul r ← r, r4; 7: add r ← r, #4; 8: (stall); 9: store MEM[r], r; 10: add r ← r, #4
21 Instruction scheduling. Scheduling: define the order of interpretation. (Instruction) scheduling: define the order in which instructions are presented to the CPU for processing. The CPU may decide to execute instructions in a different order (out-of-order execution), which is beyond the control of the compiler. Read the fine print! Some instructions specify multiple operations, or the CPU may fetch multiple instructions: ILP (Instruction-Level Parallelism). Scheduling of operations → instructions
22 Why do processors change the order of execution relative to the order of fetch/interpretation? Why should the compiler bother to implement scheduling if the processor re-orders instructions? Can the compiler handle all cases?
23 Instruction scheduling. Input: Source code → Frontend → IR → Optimizer → IR → Code Generator → Output: Machine program. Within the code generator: HIR → Instruction Selection → LLIR → Instruction Scheduling → reordered LLIR → Register Allocation → LLIR. (HIR: high-level IR; LLIR: low-level IR)
24 Code generation is easy
25 Code generation is easy as long as the code generator includes only two tasks from {instruction selection, scheduling, register allocation}
26 Instruction scheduling. Schedule: a partially ordered list of instructions; the partial order is determined by resource usage. What is a good schedule? Main constraint: preserve the meaning of the code (control flow, data flow). Metrics: typically, shortest in terms of execution time; desired: shortest execution time; sometimes: conserve energy
27 Good schedules. Metrics: often we can't predict execution time (it is data- or context-dependent); it varies for different processor implementations; it may not be published; there are hidden conflicts (in decoding, address translation, ...); memory system performance is difficult to model. Additional constraints imposed by H/W properties: operation latencies; processor pipeline; number of functional units (FUs) available; number of registers; memory hierarchy (e.g., pre-fetching)
28 Data Dependence Graph. DDG = (V, E). Nodes V represent each operation, augmented with the operation type and the operation latency (delay). Edges E represent data dependences between operations: forward (def-use), anti (use-def), output (def-def). Root nodes = no successors; leaf nodes = no predecessors. Latencies on nodes or edges? Latency of anti/output dependences?
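A sketch of DDG edge construction from these definitions. The pairwise scan below is a deliberate simplification (a real builder tracks the most recent definition of each register and would not emit redundant transitive edges), and the `(defs, uses)` encoding is an assumption for illustration.

```python
def build_ddg_edges(ops):
    """ops: list of (defs, uses) register-name sets in program order.
    Emits edges labeled 'flow' (def-use), 'anti' (use-def), and
    'output' (def-def), matching the three dependence kinds above."""
    edges = []
    for i, (defs_i, uses_i) in enumerate(ops):
        for j in range(i + 1, len(ops)):
            defs_j, uses_j = ops[j]
            if defs_i & uses_j:      # op j reads what op i defined
                edges.append((i, j, 'flow'))
            if uses_i & defs_j:      # op j overwrites what op i read
                edges.append((i, j, 'anti'))
            if defs_i & defs_j:      # op j overwrites what op i defined
                edges.append((i, j, 'output'))
    return edges
```

On the slides' opening pattern (a load into a register, an add reading it, then a second load reusing the same register), this yields one flow, one output, and one anti edge.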
29
30 Root nodes. Root nodes on the bottom: a successor uses the result generated
31 load r ← MEM[r0]; add r4 ← r, #; load r ← MEM[r]; add r ← r, #; add r0 ← r0, #4; add r ← r, #4; mul r ← r, r4; store MEM[r], r; add r ← r, #4
32 Root nodes. Root node on top: produces a result that flows to the consumer node
33 load r ← MEM[r0]; add r4 ← r, #; load r ← MEM[r]; add r ← r, #; add r0 ← r0, #4; add r ← r, #4; mul r ← r, r4; store MEM[r], r; add r ← r, #4
34 DDG nodes for the example code: r ← MEM[r0]; r4 ← r + #; r ← MEM[r]; r ← r + #; r ← r4 * r; MEM[r] ← r; r0 ← r0 + 4; r ← r + 4; r ← r + 4
35 Renaming: dealing with anti/output dependences. Anti/output dependences are artificial dependences that constrain the scheduler. They can be eliminated by renaming. Effect on register pressure? Can we eliminate all anti/output dependences? Example: r ← MEM[r0]; r4 ← r + #; r ← MEM[r]
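A sketch of the renaming idea, under the assumption of an unbounded supply of fresh names (which is exactly the register-pressure caveat the slide raises); the function and its encoding are illustrative, not the slides' algorithm.

```python
def rename(ops, fresh_names):
    """ops: list of (dest, srcs) in program order. Every definition
    gets a fresh name and later uses are rewritten, so no register is
    written twice: anti and output dependences disappear."""
    current = {}                         # old name -> current fresh name
    renamed = []
    for dest, srcs in ops:
        srcs = [current.get(s, s) for s in srcs]   # rewrite uses first
        new = next(fresh_names)
        current[dest] = new
        renamed.append((new, srcs))
    return renamed
```

Flow dependences are untouched, since each use is rewritten to its defining fresh name. As for the slide's closing question: renaming of this kind removes only register anti/output dependences; dependences through memory (between loads and stores) generally cannot be renamed away.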
36 Instruction schedule. S(n): n ∈ V → t, a mapping from an operation n to a non-negative integer t; t denotes the cycle when operation n is processed by the CPU (or: t denotes the instruction that contains operation n). Constraints: S(n) ≥ 1 (and at least one operation O with S(O) = 1); if (n1, n2) ∈ E then S(n1) + delay(n1) ≤ S(n2); for each t, there are no more operations with S(n) = t than the H/W (resp. the instruction format) can support. (cycle t: operation n) 1: load r4 ← MEM[r0]; 2: load r ← MEM[r]; 3: add r0 ← r0, #4; 4: add r4 ← r4, #; 5: add r ← r, #; 6: mul r ← r, r4; 7: add r ← r, #4; 8: (stall); 9: store MEM[r], r; 10: add r ← r, #4
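The three constraints can be checked directly. A minimal sketch (names and encoding are assumptions): `S` maps operations to cycles, `delay` maps them to latencies, `edges` are the DDG dependences, and `issue_width` stands in for "what the H/W can support".

```python
from collections import Counter

def is_valid_schedule(S, delay, edges, issue_width=1):
    # Constraint 1: cycles start at 1, and some operation issues at cycle 1.
    if min(S.values()) != 1:
        return False
    # Constraint 2: every dependence (n1, n2) delivers n2's operand in time.
    if any(S[n1] + delay[n1] > S[n2] for n1, n2 in edges):
        return False
    # Constraint 3: no cycle holds more operations than the H/W supports.
    return max(Counter(S.values()).values()) <= issue_width
```

A schedule that issues a dependent operation too early, or that leaves cycle 1 empty, is rejected.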
37 Length of a schedule. L(S) = max over n ∈ V of (S(n) + delay(n)). For the example schedule (1: load r4 ← MEM[r0]; 2: load r ← MEM[r]; 3: add r0 ← r0, #4; 4: add r4 ← r4, #; 5: add r ← r, #; 6: mul r ← r, r4; 7: add r ← r, #4; 8: (stall); 9: store MEM[r], r; 10: add r ← r, #4), L(S) = 11
38 Instruction schedule. Path length: starting at the roots, annotate each node with its accumulated delay. Critical path: the longest path over all paths in the data dependence graph. The shortest (minimal) schedule cannot be shorter than the length of the critical path. [DDG annotated with accumulated delays, e.g. 8 for the loads, 5 for the dependent adds, 4 for the mul]
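Accumulated delays and the critical-path bound can be computed by a longest-path walk over the DDG. This sketch (names assumed, latencies on nodes as in the slides) memoizes a recursive traversal and requires the graph to be acyclic.

```python
import collections
import functools

def critical_path(delay, edges):
    """Returns each node's accumulated delay (the slide's path-length
    annotation) and the critical-path length, a lower bound on the
    length of any schedule."""
    succ = collections.defaultdict(list)
    for a, b in edges:
        succ[a].append(b)

    @functools.lru_cache(maxsize=None)
    def path_len(n):
        # delay of n plus the longest path among its successors
        return delay[n] + max((path_len(s) for s in succ[n]), default=0)

    lengths = {n: path_len(n) for n in delay}
    return lengths, max(lengths.values())
```

With the slide's latencies (load 3, add 1, mul 3, store 1) on a chain load → add → mul → store, the load accumulates delay 8, matching the annotation shown for the loads in the example DDG.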
39 Finding a mapping. Given a DDG, a mapping can be found forward, starting at the leaves (producers), or backward, starting at the roots (consumers)
40 Instruction scheduling techniques. Local instruction scheduling: scheduling of one DDG. Local scheduling is an NP-complete problem (scheduling → job-shop scheduling → TSP)
41 Cycle vs. operation scheduling. [figure: the same operations (add, sub, mul, load, store) placed cycle-by-cycle vs. operation-by-operation] Operation scheduling is more powerful than cycle-based scheduling in the presence of long-latency operations; however, it is much more complicated to implement
42 Linear vs. graph-based techniques. Linear techniques: runtime O(n); produce the schedule in one or more passes over the input LLIR; most common technique: critical-path scheduling (three passes: ASAP, ALAP, non-critical operations); limitation: unable to consider global properties of operations. Graph-based techniques: runtime O(n²) for DAG creation plus scheduling; prevalent technique: list scheduling (O(n log n)); greedy: select one operation and schedule it
43 List scheduling. Prevalent scheduling heuristics are based on list scheduling. Method: rename (optional); build the data dependence graph; assign priorities to operations; iteratively select and schedule an operation
44 List scheduling
t := 1
ready := { leaves of DDG }
active := {}
while (ready ∪ active ≠ {}) do
    for each operation o in active do
        if (S(o) + delay(o) < t) then
            active := active \ {o}
            for each successor s of o in DDG do
                if (s is ready) then ready := ready ∪ {s}
    if (ready ≠ {}) then
        o := pick the operation from ready with the highest priority
        if (o can be scheduled on the H/W units) then
            ready := ready \ {o}
            active := active ∪ {o}
            S(o) := t
    t := t + 1
end
45 List scheduling. The same algorithm, annotated at the retirement test S(o) + delay(o) < t with the question: < or ≤?
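The pseudocode above translates almost line for line into a runnable sketch. Everything below is illustrative: the hardware-unit check is reduced to a single issue width, retirement uses ≤ (one answer to the slide's "< or ≤?" question: an operation's result is available in cycle S(o) + delay(o)), and the priority function is supplied as a map.

```python
import collections

def list_schedule(nodes, delay, edges, priority, issue_width=1):
    preds = collections.defaultdict(set)
    succs = collections.defaultdict(set)
    for a, b in edges:
        preds[b].add(a)
        succs[a].add(b)
    S = {}
    ready = {n for n in nodes if not preds[n]}   # leaves of the DDG
    active = set()
    t = 1
    while ready or active:
        for o in list(active):                   # retire finished operations
            if S[o] + delay[o] <= t:
                active.discard(o)
                for s in succs[o]:               # successor ready once all
                    if all(p in S and S[p] + delay[p] <= t  # preds finished
                           for p in preds[s]):
                        ready.add(s)
        issued = 0
        while ready and issued < issue_width:
            o = max(ready, key=priority.get)     # highest priority wins
            ready.discard(o)
            active.add(o)
            S[o] = t
            issued += 1
        t += 1
    return S
```

On a small diamond-shaped DDG (two loads feeding two adds feeding a mul and a store, with the slide's latencies), the scheduler reproduces the expected stall between the mul and the store.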
46 List scheduling. Picking an operation from ready: if ready never contains more than one operation, the generated schedule is optimal. If more than one operation is ready, the choice of the next-to-be-scheduled operation is critical to the performance of the algorithm: pick the operation with the highest priority. Most algorithms use several priorities to break ties. How do we compute these priorities?
47
48 Priority functions in list scheduling. procedure ListScheduling( ); begin ... o := pick one operation from ready using some priority function; ... end; Common priority functions: Height: distance from the exit node; gives priority to the amount of work left. Slackness: inverse of slack; gives priority to operations on the critical path. Register use: number of source operands; reduces the number of live registers. Uncover: fanout (number of children); frees up nodes quickly. Original instruction order
49 List scheduling. Priorities based on the DDG: Estart: earliest start time (ASAP, as soon as possible); Lstart: latest start time (ALAP, as late as possible); slack: scheduling freedom, slack(op) = Lstart(op) − Estart(op)
50 List scheduling. slack(op) = Lstart(op) − Estart(op)
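Estart, Lstart, and slack fall out of one forward (ASAP) pass and one backward (ALAP) pass over the DDG. A sketch under stated assumptions: the nodes are given in topological order, latencies sit on nodes, and cycles are numbered from 0 here (the slides number them from 1, so values are shifted by one).

```python
import collections

def estart_lstart_slack(nodes, delay, edges):
    """nodes: operations in topological order; edges (a, b): b consumes
    a's result. Returns Estart (ASAP), Lstart (ALAP), slack per node."""
    preds = collections.defaultdict(list)
    succs = collections.defaultdict(list)
    for a, b in edges:
        preds[b].append(a)
        succs[a].append(b)
    Estart = {}
    for n in nodes:                       # forward pass: as soon as possible
        Estart[n] = max((Estart[p] + delay[p] for p in preds[n]), default=0)
    length = max(Estart[n] + delay[n] for n in nodes)
    Lstart = {}
    for n in reversed(nodes):             # backward pass: as late as possible
        Lstart[n] = min((Lstart[s] - delay[n] for s in succs[n]),
                        default=length - delay[n])
    return Estart, Lstart, {n: Lstart[n] - Estart[n] for n in nodes}
```

Critical-path operations come out with slack 0, which is exactly the characterization used a few slides later; operations off the critical path get positive slack.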
51 List scheduling. Computing Estart, Lstart, slack. [DDG figure: every node, including the exit node, annotated with its Estart, latency, and Lstart]
52 Estart. [DDG figure: nodes annotated with the Estart values from the forward (ASAP) pass]
53 Lstart. [DDG figure: nodes annotated with the Lstart values from the backward (ALAP) pass]
54 Slack. slack(op) = Lstart(op) − Estart(op). [DDG figure: nodes annotated with Estart, Lstart, and slack; critical-path nodes have slack = 0]
55 List scheduling. Another way to look at the critical path: a sequence of critical operations. Critical operation: slack(op) = 0. [DDG figure with slack annotations; the slack-0 nodes form the critical path]
56 Priority function: height-based. Gives priority to the amount of work left: priority(op) = Lstart(exit) − Lstart(op) + 1. [annotated DDG and an empty priority table]
57 Priority function: height-based. priority(op) = Lstart(exit) − Lstart(op) + 1. [the same DDG, with the priority table filled in]
58 Example, height-based. 1 load r4 ← MEM[r0]; 2 add r4 ← r4, #; 3 load r ← MEM[r]; 4 add r ← r, #; 5 add r0 ← r0, #4; 6 add r ← r, #4; 7 mul r ← r, r4; 8 store MEM[r], r; 9 add r ← r, #4. [DDG annotated with accumulated delays: 8 for the loads, 5 for the dependent adds, 4 for the mul] Target machine properties: pipelined with forwarding, single issue, in-order; operation latencies: add, sub: 1 cycle; mul, load: 3 cycles; store: 1 cycle
59 Example, height-based. 1 load r4 ← MEM[r0]; 2 add r4 ← r4, #; 3 load r ← MEM[r]; 4 add r ← r, #; 5 add r0 ← r0, #4; 6 add r ← r, #4; 7 mul r ← r, r4; 8 store MEM[r], r; 9 add r ← r, #4. [DDG annotated with Estart and Lstart for each operation] Target machine properties: pipelined with forwarding, single issue, in-order; operation latencies: add, sub: 1 cycle; mul, load: 3 cycles; store: 1 cycle
60 [DDG for the example, annotated with Estart and Lstart values; exit node: Estart = Lstart = 8]
61 [the same DDG, annotated with Estart and Lstart values; exit node: Estart = Lstart = 8]
62 [height-based priority table alongside the DDG annotated with Estart and Lstart]
63 [height-based priority table alongside the DDG annotated with Estart and Lstart]
64 Example, continued. Initialization: t := 1; ready := {1, 3, 5, 6}; active := {}. Iteration t = 1, ready = {1,3,5,6}, active = {}: active is empty → pass; ready is not empty: o := 1 (with priority 9); S(1) = 1. Iteration t = 2, ready = {3,5,6}, active = {1}: active is not empty; o = 1: S(1)+delay(1) ≤ 2? no; ready is not empty: o := 3 (with priority 9); ready := {5,6}, active := {1,3}; S(3) = 2
65 Example, continued. Iteration t = 3, ready = {5,6}, active = {1,3}: o = 1: S(1)+delay(1) ≤ 3? no; o = 3: S(3)+delay(3) ≤ 3? no; ready is not empty: o := 5 (with priority 2); ready := {6}, active := {1,3,5}; S(5) = 3. Iteration t = 4, ready = {6}, active = {1,3,5}: o = 1: S(1)+delay(1) ≤ 4? yes → active := active \ {1}, ready := ready ∪ {2}; o = 3: S(3)+delay(3) ≤ 4? no; ready is not empty: o := 2 (with priority 6); ready := {6}, active := {2,3,5}; S(2) = 4
66 Example, continued. Iteration t = 5, ready = {6}, active = {2,3,5}: o = 3: S(3)+delay(3) ≤ 5? yes → active := active \ {3}, ready := ready ∪ {4}; o = 5: S(5)+delay(5) ≤ 5? yes → active := active \ {5}, no data-dependent successors; o = 2: S(2)+delay(2) ≤ 5? yes → active := active \ {2}, successor (7) not ready due to 4; ready is not empty: o := 4 (with priority 6); ready := {6}, active := {4}; S(4) = 5
67 Example, continued. Iteration t = 6, ready = {6}, active = {4}: o = 4: S(4)+delay(4) ≤ 6? yes → active := active \ {4}, ready := ready ∪ {7}; ready is not empty: o := 7 (with priority 5); ready := {6}, active := {7}; S(7) = 6. Iteration t = 7, ready = {6}, active = {7}: o = 7: S(7)+delay(7) ≤ 7? no; ready is not empty: o := 6 (with priority 2); ready := {}, active := {6,7}; S(6) = 7
68 Example, continued. Iteration t = 8, ready = {}, active = {6,7}: o = 6: S(6)+delay(6) ≤ 8? yes → active := active \ {6}, no data-dependent successors; o = 7: S(7)+delay(7) ≤ 8? no; ready is empty. Iteration t = 9, ready = {}, active = {7}: o = 7: S(7)+delay(7) ≤ 9? yes → active := active \ {7}, ready := ready ∪ {8,9}; ready is not empty: o := 8 (with priority 2); ready := {9}, active := {8}; S(8) = 9
69 Example, continued. Iteration t = 10, ready = {9}, active = {8}: o = 8: S(8)+delay(8) ≤ 10? yes → active := active \ {8}, no data-dependent successors; ready is not empty: o := 9 (with priority 2); ready := {}, active := {9}; S(9) = 10. Iteration t = 11, ready = {}, active = {9}: o = 9: S(9)+delay(9) ≤ 11? yes → active := active \ {9}, no data-dependent successors; ready is empty: done
70 Classification of scheduling techniques. Direction: forward, backward. Scheduling: cycle-based, operation-based; linear, graph-based. Search: greedy, backtracking. Flow analysis: acyclic, cyclic. Scheduling unit: basic block, trace, superblock
71 Scheduling & register allocation. Phase-ordering problem between instruction scheduling and register allocation (RA). Effects of the scheduler on RA: the scheduler can use renaming to get rid of anti dependences and obtain more freedom in scheduling; the resulting overlap of previously constrained operations may increase register pressure, which, in turn, may force the register allocator to spill one more variable. And vice versa: RA constrains the scheduler in an RA-first compiler. Combining scheduling and register allocation has the potential to produce better solutions but is typically not done due to complexity
72 with thanks to Bernhard Egger for slide material
Virtual Memory: Concepts Instructor: Dr. Hyunyoung Lee Based on slides provided by Randy Bryant and Dave O Hallaron Today Address spaces VM as a tool for caching VM as a tool for memory management VM as
More informationWhat is Search For? CS 188: Ar)ficial Intelligence. Constraint Sa)sfac)on Problems Sep 14, 2015
CS 188: Ar)ficial Intelligence Constraint Sa)sfac)on Problems Sep 14, 2015 What is Search For? Assump)ons about the world: a single agent, determinis)c ac)ons, fully observed state, discrete state space
More informationECE 468, Fall Midterm 2
ECE 468, Fall 08. Midterm INSTRUCTIONS (read carefully) Fill in your name and PUID. NAME: PUID: Please sign the following: I affirm that the answers given on this test are mine and mine alone. I did not
More informationResearch opportuni/es with me
Research opportuni/es with me Independent study for credit - Build PL tools (parsers, editors) e.g., JDial - Build educa/on tools (e.g., Automata Tutor) - Automata theory problems e.g., AutomatArk - Research
More information: Compiler Design
252-210: Compiler Design 9.0 Data- Flow Analysis Thomas R. Gross Computer Science Department ETH Zurich, Switzerland Global program analysis is a crucial part of all real compilers. Global : beyond a statement
More informationSpecial Topics on Algorithms Fall 2017 Dynamic Programming. Vangelis Markakis, Ioannis Milis and George Zois
Special Topics on Algorithms Fall 2017 Dynamic Programming Vangelis Markakis, Ioannis Milis and George Zois Basic Algorithmic Techniques Content Dynamic Programming Introduc
More informationInstruc=on Set Architecture
ECPE 170 Jeff Shafer University of the Pacific Instruc=on Set Architecture 2 Schedule Today Closer look at instruc=on sets Thursday Brief discussion of real ISAs Quiz 4 (over Chapter 5, i.e. HW #10 and
More informationChapter. Out of order Execution
Chapter Long EX Instruction stages We have assumed that all stages. There is a problem with the EX stage multiply (MUL) takes more time than ADD MUL ADD We can clearly delay the execution of the ADD until
More informationCE431 Parallel Computer Architecture Spring Compile-time ILP extraction Modulo Scheduling
CE431 Parallel Computer Architecture Spring 2018 Compile-time ILP extraction Modulo Scheduling Nikos Bellas Electrical and Computer Engineering University of Thessaly Parallel Computer Architecture 1 Readings
More informationCS 465 Final Review. Fall 2017 Prof. Daniel Menasce
CS 465 Final Review Fall 2017 Prof. Daniel Menasce Ques@ons What are the types of hazards in a datapath and how each of them can be mi@gated? State and explain some of the methods used to deal with branch
More informationData Flow Analysis. Suman Jana. Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006)
Data Flow Analysis Suman Jana Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006) Data flow analysis Derives informa=on about the dynamic behavior of a program by only
More informationCS 61C: Great Ideas in Computer Architecture Func%ons and Numbers
CS 61C: Great Ideas in Computer Architecture Func%ons and Numbers 9/11/12 Instructor: Krste Asanovic, Randy H. Katz hcp://inst.eecs.berkeley.edu/~cs61c/sp12 Fall 2012 - - Lecture #8 1 New- School Machine
More informationInstruction scheduling
Instruction scheduling iaokang Qiu Purdue University ECE 468 October 12, 2018 What is instruction scheduling? Code generation has created a sequence of assembly instructions But that is not the only valid
More informationWhat Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009
What Compilers Can and Cannot Do Saman Amarasinghe Fall 009 Optimization Continuum Many examples across the compilation pipeline Static Dynamic Program Compiler Linker Loader Runtime System Optimization
More informationCS 188: Ar)ficial Intelligence
CS 188: Ar)ficial Intelligence Search Instructors: Pieter Abbeel & Anca Dragan University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley
More informationLecture 2: Memory in C
CIS 330:! / / / / (_) / / / / _/_/ / / / / / \/ / /_/ / `/ \/ / / / _/_// / / / / /_ / /_/ / / / / /> < / /_/ / / / / /_/ / / / /_/ / / / / / \ /_/ /_/_/_/ _ \,_/_/ /_/\,_/ \ /_/ \ //_/ /_/ Lecture 2:
More informationEfficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra<on
Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra
More informationLecture 7. Instruction Scheduling. I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code)
Lecture 7 Instruction Scheduling I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code) Reading: Chapter 10.3 10.4 CS243: Instruction Scheduling 1 Scheduling Constraints Data dependences
More informationTopics. Computer Organization CS Improving Performance. Opportunity for (Easy) Points. Three Generic Data Hazards
Computer Organization CS 231-01 Improving Performance Dr. William H. Robinson November 8, 2004 Topics Money's only important when you don't have any. Sting Cache Scoreboarding http://eecs.vanderbilt.edu/courses/cs231/
More informationDeformable Part Models
Deformable Part Models References: Felzenszwalb, Girshick, McAllester and Ramanan, Object Detec@on with Discrimina@vely Trained Part Based Models, PAMI 2010 Code available at hkp://www.cs.berkeley.edu/~rbg/latent/
More information: Advanced Compiler Design
263-2810: Advanced Compiler Design Thomas R. Gross Computer Science Department ETH Zurich, Switzerland Topics Program opgmizagon Op%miza%on Op%mize for (execu%on) speed Op%mize for (code) size Op%mize
More informationSuperscalar Processors Ch 14
Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion
More informationCSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on. Instructor: Wei-Min Shen
CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on Instructor: Wei-Min Shen Status Check and Review Status check Have you registered in Piazza? Have you run the Project-1?
More informationSuperscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency?
Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion
More informationCSE 473: Ar+ficial Intelligence
CSE 473: Ar+ficial Intelligence Search Instructor: Luke Ze=lemoyer University of Washington [These slides were adapted from Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials
More informationBIL 682 Ar+ficial Intelligence Week #2: Solving problems by searching. Asst. Prof. Aykut Erdem Dept. of Computer Engineering HaceDepe University
BIL 682 Ar+ficial Intelligence Week #2: Solving problems by searching Asst. Prof. Aykut Erdem Dept. of Computer Engineering HaceDepe University Today Search problems Uninformed search Informed (heuris+c)
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) More Cache: Set Associa0vity. Smart Phone. Today s Lecture. Core.
CS 6C: Great Ideas in Computer Architecture (Machine Structures) More Cache: Set Associavity Instructors: Randy H Katz David A PaGerson Guest Lecture: Krste Asanovic hgp://insteecsberkeleyedu/~cs6c/fa
More informationCSE Opera,ng System Principles
CSE 30341 Opera,ng System Principles Synchroniza2on Overview Background The Cri,cal-Sec,on Problem Peterson s Solu,on Synchroniza,on Hardware Mutex Locks Semaphores Classic Problems of Synchroniza,on Monitors
More informationSta$c Single Assignment (SSA) Form
Sta$c Single Assignment (SSA) Form SSA form Sta$c single assignment form Intermediate representa$on of program in which every use of a variable is reached by exactly one defini$on Most programs do not
More informationCSE P 501 Compilers. Instruc7on Selec7on Hal Perkins Winter UW CSE P 501 Winter 2016 N-1
CSE P 501 Compilers Instruc7on Selec7on Hal Perkins Winter 2016 UW CSE P 501 Winter 2016 N-1 Agenda Compiler back- end organiza7on Instruc7on selec7on tree padern matching Credits: Slides by Keith Cooper
More informationhashfs Applying Hashing to Op2mize File Systems for Small File Reads
hashfs Applying Hashing to Op2mize File Systems for Small File Reads Paul Lensing, Dirk Meister, André Brinkmann Paderborn Center for Parallel Compu2ng University of Paderborn Mo2va2on and Problem Design
More informationCompiler Architecture
Code Generation 1 Compiler Architecture Source language Scanner (lexical analysis) Tokens Parser (syntax analysis) Syntactic structure Semantic Analysis (IC generator) Intermediate Language Code Optimizer
More informationInstruction-Level Parallelism (ILP)
Instruction Level Parallelism Instruction-Level Parallelism (ILP): overlap the execution of instructions to improve performance 2 approaches to exploit ILP: 1. Rely on hardware to help discover and exploit
More informationOpera&ng Systems ECE344
Opera&ng Systems ECE344 Lecture 8: Paging Ding Yuan Lecture Overview Today we ll cover more paging mechanisms: Op&miza&ons Managing page tables (space) Efficient transla&ons (TLBs) (&me) Demand paged virtual
More informationRecursive Helper functions
11/16/16 Page 22 11/16/16 Page 23 11/16/16 Page 24 Recursive Helper functions Some%mes it is easier to find a recursive solu%on if you make a slight change to the original problem. Consider the palindrome
More informationRegister Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations
Register Allocation Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class
More informationhnp://
The bots face off in a tournament against one another and about an equal number of humans, with each player trying to score points by elimina&ng its opponents. Each player also has a "judging gun" in addi&on
More informationNetworks and Opera/ng Systems Chapter 13: Scheduling
Networks and Opera/ng Systems Chapter 13: Scheduling (252 0062 00) Donald Kossmann & Torsten Hoefler Frühjahrssemester 2013 Systems Group Department of Computer Science ETH Zürich Last /me Process concepts
More informationCompiler Optimization Techniques
Compiler Optimization Techniques Department of Computer Science, Faculty of ICT February 5, 2014 Introduction Code optimisations usually involve the replacement (transformation) of code from one sequence
More informationSuperscalar Processors Ch 13. Superscalar Processing (5) Computer Organization II 10/10/2001. New dependency for superscalar case? (8) Name dependency
Superscalar Processors Ch 13 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction 1 New dependency for superscalar case? (8) Name dependency (nimiriippuvuus) two use the same
More informationW1005 Intro to CS and Programming in MATLAB. Brief History of Compu?ng. Fall 2014 Instructor: Ilia Vovsha. hip://www.cs.columbia.
W1005 Intro to CS and Programming in MATLAB Brief History of Compu?ng Fall 2014 Instructor: Ilia Vovsha hip://www.cs.columbia.edu/~vovsha/w1005 Computer Philosophy Computer is a (electronic digital) device
More informationVirtual Memory B: Objec5ves
Virtual Memory B: Objec5ves Benefits of a virtual memory system" Demand paging, page-replacement algorithms, and allocation of page frames" The working-set model" Relationship between shared memory and
More informationEE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University
EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted
More informationProgram Op*miza*on and Analysis. Chenyang Lu CSE 467S
Program Op*miza*on and Analysis Chenyang Lu CSE 467S 1 Program Transforma*on op#mize Analyze HLL compile assembly assemble Physical Address Rela5ve Address assembly object load executable link Absolute
More informationVirtual Memory: Concepts
Virtual Memory: Concepts 5-23 / 8-23: Introduc=on to Computer Systems 6 th Lecture, Mar. 8, 24 Instructors: Anthony Rowe, Seth Goldstein, and Gregory Kesden Today VM Movaon and Address spaces ) VM as a
More informationPage # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two?
Exploiting ILP through Software Approaches Venkatesh Akella EEC 270 Winter 2005 Based on Slides from Prof. Al. Davis @ cs.utah.edu Let the Compiler Do it Pros and Cons Pros No window size limitation, the
More informationApplied Algorithm Design Lecture 3
Applied Algorithm Design Lecture 3 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 3 1 / 75 PART I : GREEDY ALGORITHMS Pietro Michiardi (Eurecom) Applied Algorithm
More informationComputer Architecture
Computer Architecture An Introduction Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/
More informationDynamic Scheduling. CSE471 Susan Eggers 1
Dynamic Scheduling Why go out of style? expensive hardware for the time (actually, still is, relatively) register files grew so less register pressure early RISCs had lower CPIs Why come back? higher chip
More informationPipelining and Exploiting Instruction-Level Parallelism (ILP)
Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &
More informationCS252 Graduate Computer Architecture Spring 2014 Lecture 13: Mul>threading
CS252 Graduate Computer Architecture Spring 2014 Lecture 13: Mul>threading Krste Asanovic krste@eecs.berkeley.edu http://inst.eecs.berkeley.edu/~cs252/sp14 Last Time in Lecture 12 Synchroniza?on and Memory
More informationPreventing Stalls: 1
Preventing Stalls: 1 2 PipeLine Pipeline efficiency Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls Ideal pipeline CPI: best possible (1 as n ) Structural hazards:
More informationUnit 2: High-Level Synthesis
Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis
More informationCS152 Computer Architecture and Engineering. Complex Pipelines
CS152 Computer Architecture and Engineering Complex Pipelines Assigned March 6 Problem Set #3 Due March 20 http://inst.eecs.berkeley.edu/~cs152/sp12 The problem sets are intended to help you learn the
More informationInstruction-Level Parallelism and Its Exploitation
Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic
More information