Processor Architecture
1 Processor Architecture Advanced Dynamic Scheduling Techniques M. Schölzel
2 Content Tomasulo with speculative execution Introducing superscalarity into the instruction pipeline Multithreading
3 Content Tomasulo with speculative execution Introducing superscalarity into the instruction pipeline Multithreading
4 Control Flow Dependencies Let b be a conditional branch at address a with branch target z. An operation c is control-flow dependent on b if the execution of c depends on the outcome of b. Otherwise c is not control-flow dependent. Examples: a: c b a: b a+: z: a: b a+: z: b2 c a+: z: c d c is not control flow dependent on b c is not control flow dependent on b c is control flow dependent on b and b2 What about d?
5 Scheduling restrictions imposed by control flow dependencies For control-flow dependent operations: cannot be moved before the branch For operations that are not control-flow dependent: cannot be moved behind the branch b c b c b b c c Program order Speculative execution of c Program order c may not be executed
6 Performance Problem due to Control Hazards Problem: The branch target of an operation is only known after execution Long pipeline stalls are required in processors with deep pipelines Instruction Queue Memory b PC Branch operation? Address for next instruction fetch is not known Solution: to the reservation stations Branch prediction helps, but is limited Tomasulo supports speculative fetch and issue, but not speculative execution of operations
7 Drawbacks of Speculative Execution What happens if an operation is executed speculatively and speculation was wrong? May affect the data flow May affect the exception behavior block to be executed b speculatively executed block c b c after dynamic scheduling executed program control flow graph of a program
8 Example Affected Data Flow c is executed speculatively before b The mul-operation now receives the value in r from the sub- instead of from the add-operation Affected Exception Behavior c is executed speculatively before b Division by 0 possible a c b add r <- r2,r3 sub r <- r4,r5 a c b div r <- r4,r5 if r5 = 0 then x else y does not write in r c x: No division y: executed c mul r <- r,r6
9 Solution Divide the WB-phase of the Tomasulo algorithm into two phases: Forwarding results from the EU to the reservation stations (WB-phase) Writing results into architectural registers/memory (Commit-phase) Implemented by: Reorder buffer for buffering results from the WB-phase Committing buffered results from the reorder buffer in-order By this: Usage of speculative results is possible without modifying architectural registers/memory locations
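The WB/commit split can be sketched as a minimal Python model (field names such as `dest`, `res`, `valid` are illustrative assumptions, not the lecture's exact hardware): a younger result may be written back out of order, but it cannot commit past an unfinished older entry.

```python
class ReorderBuffer:
    """Minimal sketch of the ROB: issue reserves in order,
    write-back fills out of order, commit drains in order."""
    def __init__(self, size):
        self.entries = [None] * size
        self.first = 0          # oldest entry (commit pointer)
        self.last = 0           # next free entry (issue pointer)
        self.count = 0

    def issue(self, dest):
        """Reserve an entry in program order; its index acts as a virtual register."""
        assert self.count < len(self.entries), "ROB full: stall issue"
        idx = self.last
        self.entries[idx] = {"dest": dest, "res": None, "valid": False}
        self.last = (self.last + 1) % len(self.entries)
        self.count += 1
        return idx

    def write_back(self, idx, result):
        """Out of order: results arrive whenever an EU finishes."""
        self.entries[idx].update(res=result, valid=True)

    def commit(self, regfile):
        """In order: only the oldest valid entry may touch architectural state."""
        e = self.entries[self.first]
        if e is None or not e["valid"]:
            return False        # head unfinished: younger valid results must wait
        regfile[e["dest"]] = e["res"]
        self.entries[self.first] = None
        self.first = (self.first + 1) % len(self.entries)
        self.count -= 1
        return True

regs = {}
rob = ReorderBuffer(4)
a = rob.issue("r1"); b = rob.issue("r2")
rob.write_back(b, 7)            # younger op finishes first
assert not rob.commit(regs)     # but cannot commit past the unfinished head
rob.write_back(a, 3)
assert rob.commit(regs) and rob.commit(regs)
assert regs == {"r1": 3, "r2": 7}
```

The circular first/last pointers mirror the queue implementation on the next slide.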
10 Architecture for Tomasulo with speculative Execution Operand Bus A Program Memory Instruction Queue PC Operation Bus Reg Reg Reg 2 Reg r Architecture Register Reorder Buffer Operand Bus B EU-Bus EU-Bus EU-Bus Memory Unit Execute Execute m Result Bus
11 Reorder Buffer (ROB) Implemented as a queue: When issuing an operation, an entry is reserved During WB, the result is written back to the reserved entry Commit is done in-order and writes results back to the architectural registers Speculatively executed operations are committed after the preceding branches have been committed ROB-entries now have the meaning of virtual registers Bypass to the Result Bus DeMux entry entry 2 entry n Mux To the architectural registers busy to issue-phase (bypass) Reserved entry from first last
12 Fields of the ROB Structure of a ROB-entry res addr type valid busy Meaning of the fields depends on the operation type Operation types: Branch operation Memory operation ALU operation field/meaning res addr type busy valid Branch operation computed target address (will be stored in the PC) c = speculation was correct w = speculation was wrong 3 entry reserved = result has not been computed yet Memory operation Value to be stored in the memory Address at which the res-value should be stored ALU operation Result of the operation - 2 = result was computed and is available in the res-field
13 Reservation Station Fields The reservation station has the same functionality as in ordinary Tomasulo: Buffers operations Buffers operands But ROB-entries are used for determining the operand source (virtual register) Operation to be executed (e.g. add, sub, mul, ) Qj = x, if ROB-entry x will store the value for operand A, otherwise 0 Qk = x, if ROB-entry x will store the value for operand B, otherwise 0 Value for operand A Value for operand B Miscellaneous Type of operation (see table on the previous slide) Reserved ROB-entry Status in pipeline (RO, EX, WB) is occupied/free Operand Bus A Operation Bus Operand Bus B DeMux opc Qj Qk Vj Vk misc type rob stat busy opc Qj Qk Vj Vk misc type rob stat busy opc Qj Qk Vj Vk misc type rob stat busy Mux Reservation Station EU-Bus
14 Register File Extensions Mapping of architectural registers to virtual registers (ROB-entries) Architectural register n stores the ROB-entry of the latest operation that is computing the value for n (register renaming) Result Bus Reg Reg Reg 2 Reg r rob rob rob rob Operand Bus A Operand Bus B Example: Reg Reg Reg 2 5 ROB-entry 5 contains the result of the latest operation with destination register Register is not computed by any operation in the pipeline ROB-entry contains the result of the latest operation with destination register 2 Reg r
15 Overview Pipeline Phases Issue Schedule operation from instruction queue to a reservation station Read operand values or rename registers (solving WAR- and WAW-hazards) Reserve ROB-entry Issue is in-order Execute Wait for operands to be ready Execute operation as soon as operands are ready and an EU is available (solve RAW-hazards) Execute is out-of-order Write-Back Write result through the result bus into the reserved ROB-entry WB is out-of-order Commit Write results from the ROB in order into destination registers/memory Commit is in-order
16 Overview Pipeline Phases (Issue) Issue operation from instruction queue to a reservation station, if: a reservation station is free and the ROB is not full Otherwise: Stall issue stage Allocate reservation-station and ROB-entry Read operands, if present in the register file or present in the ROB A ROB-entry corresponds to a virtual register Program Memory Op A PC Reg Reg Reg 2 Reg r reserved entry for Op A Reorder Buffer Op A Memory Unit Execute Execute m
17 Overview Pipeline Phases (Execute) Operation is waiting in the reservation station for operands and a free EU Execute operation as soon as all operands are available and an EU is free The reservation station can store the state of the operation during execution Program Memory PC Reg Reg Reg 2 reserved entry for Op A Reorder Buffer Reg r Op A Memory Unit Execute Op A Execute m
18 Overview Pipeline Phases (Write-Back) Write result into the reserved ROB-entry The ROB-entry ID has been stored in the rob-field of the reservation station The result is forwarded to all waiting reservation stations through the result bus (the value is identified by its ROB ID) Free the reservation station Program Memory PC Reg Reg Reg 2 reserved entry for result of Op A Reorder Buffer Reg r Memory Unit Execute Op A Execute m
19 Overview Pipeline Phases (Commit) Write results from the first entry in the ROB into the corresponding destination register Free the ROB-entry Program Memory PC Reg result Reg Reg 2 reserved entry for Op A Reorder Buffer Reg r Memory Unit Execute Execute m
20 Issue-Phase Details (for ALU-operations) For the operation that will be issued let denote: opc operation type (add, sub, mul, ) src, src2 source registers dst destination register The operation can be issued if there exists an x where [x].busy = 0 and ROB[last].busy = 0 Update after issue: if Reg[src].rob = 0 then // determine value of left operand [x].qj := 0; [x].vj := Reg[src] // read left operand from the register file else // left operand is still under computation or in ROB if ROB[Reg[src].rob].valid = 1 then [x].qj := 0; [x].vj := ROB[Reg[src].rob].res // read operand from ROB else [x].qj := Reg[src].rob // wait for operand fi if Reg[src2].rob = 0 then // the same for the right operand [x].busy := 1; [x].rob := tail [x].opc := opc; [x].type := ; [x].status := RO
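The three operand-lookup cases above can be sketched in Python (illustrative names; Reg[src].rob = 0 is modeled as a missing map entry — an assumption, not the lecture's encoding):

```python
def read_operand(src, regs, rob_map, rob):
    """Sketch of the operand lookup at issue, mirroring the slide's cases.
    regs: architectural values; rob_map: Reg[src].rob (absent = no producer
    in flight); rob: dict of {"res": ..., "valid": ...} entries.
    Returns (Q, V): Q = 0 means V holds the value, Q = x means wait for ROB x."""
    producer = rob_map.get(src, 0)
    if producer == 0:
        return 0, regs[src]             # read from the register file
    if rob[producer]["valid"]:
        return 0, rob[producer]["res"]  # already computed: read from the ROB
    return producer, None               # wait in the RS for entry `producer`

regs = {"r1": 4, "r2": 89}
rob = {4: {"res": 120, "valid": True}, 5: {"res": None, "valid": False}}
assert read_operand("r1", regs, {}, rob) == (0, 4)           # register file
assert read_operand("r2", regs, {"r2": 4}, rob) == (0, 120)  # ROB value
assert read_operand("r2", regs, {"r2": 5}, rob) == (5, None) # must wait
```

The same helper is used twice per operation, once for src (Qj/Vj) and once for src2 (Qk/Vk).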
21 Issue-Phase Details (Example ) Situation: Op A can be issued Value for r is taken from the register file Value for r2 is taken from the ROB res add r <- r, r2 sub r3 <- r, r // Op A // Op B addr type valid busy Update if Reg[srcy].rob = then [x].qj/k := ; [x].vj/k := Reg[srcy] Programmspeicher else if ROB[Reg[srcy].rob].valid = then [x].qj/k := ; [x].vj/k := ROB[Reg[srcy].rob].res else [x].qj/k := Reg[srcy].Qj/k fi OP A PC R: 5 R: 4 R2: 89 R3: 7 4 : 2: 3: 4: [56,-,,,] 5: [-,-,-,,] Memory Unit Execute Execute m
22 Issue-Phase Details (Example ) Situation: Op A was issued and ROB-entry 5 was allocated add r <- r, r2 sub r3 <- r, r // Op A // Op B res addr type valid busy Update after issue if Reg[srcy].rob = then [x].qj/k := ; [x].vj/k := Reg[srcy] Programmspeicher else if ROB[Reg[srcy].rob].valid = then [x].qj/k := ; [x].vj/k := ROB[Reg[srcy].rob].res else [x].qj/k := Reg[srcy].Qj/k fi PC R: 5 R: 4 R2: 89 R3: : 2: 3: 4: [56,-,,,] 5: [-,-,,,] [add,,,4,56,-,,5,ro,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
23 Issue-Phase Details (Example 2) Situation: issue of Op A Value of r is read from the register file Value in r2 is computed by 4 res add r <- r, r2 sub r3 <- r, r // Op A // Op B addr type valid busy Update after issue: if Reg[srcy].rob = then [x].qj/k := ; [x].vj/k := Reg[srcy] Programmspeicher else if ROB[Reg[srcy].rob].valid = then [x].qj/k := ; [x].vj/k := ROB[Reg[srcy].rob].res else [x].qj/k := Reg[src].Qj/k fi OP A PC R: 5 R: 4 R2: 89 R3: 7 4 : 2: 3: 4: [-,-,,,] 5: [-,-,-,,] [ld,,,,-,-,2,4,ro,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
24 Issue-Phase Details (Example 2) Situation: Op A was issued Uses ROB-entry 5 Has to wait for the value from r res add r <- r, r2 sub r3 <- r, r // Op A // Op B addr type valid busy Update after issue: if Reg[srcy].rob = then [x].qj/k := ; [x].vj/k := Reg[srcy] Programmspeicher else if ROB[Reg[srcy].rob].valid = then [x].qj/k := ; [x].vj/k := ROB[Reg[srcy].rob].res else [x].qj/k := Reg[src].Qj/k fi PC R: 5 R: 4 R2: 89 R3: 7 4 : 2: 3: 4: [-,-,,,] 5: [-,-,,,] [add,,4,4,-,-,,5,ro,] [ld,,,,-,-,2,4,ro,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
25 Execute Details Executing an operation from a reservation station is possible, if [x].status = RO and [x].qj = 0 and [x].qk = 0 Update after start of execution: Perform computation with [x].vj and [x].vk [x].status := EX Update after end of execution: [x].vj := res // Store result temporarily in the reservation station [x].status := WB
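The RO → EX → WB progression above can be sketched as a small state machine (a sketch with assumed dict fields, not the lecture's exact hardware):

```python
def step_station(rs, eu_free, compute):
    """One cycle of a reservation station: RO -> EX once both operands are
    present (Qj = Qk = 0) and an EU is free; EX -> WB with the result
    buffered temporarily in the Vj field, as on the slide."""
    if rs["stat"] == "RO" and rs["qj"] == 0 and rs["qk"] == 0 and eu_free:
        rs["stat"] = "EX"
    elif rs["stat"] == "EX":
        rs["vj"] = compute(rs["vj"], rs["vk"])  # store result in the RS
        rs["stat"] = "WB"
    return rs

rs = {"stat": "RO", "qj": 0, "qk": 0, "vj": 4, "vk": 5}
step_station(rs, True, lambda a, b: a + b)
assert rs["stat"] == "EX"
step_station(rs, True, lambda a, b: a + b)
assert rs["stat"] == "WB" and rs["vj"] == 9   # result waits for the result bus
```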
26 Execute-Phase Details (Example 3) Both operands are ready: [x].status = RO and [x].qj = und [x].qk = res add r <- r, r2 sub r3 <- r, r // Op A // Op B addr type valid busy Programmspeicher OP A PC Reg Reg Reg 2 Reg r 4 : 2: 3: 4: [-,-,,,] 5: [ld,,,,-,-,2,4,ro,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
27 Execute-Phase Details (Example 3) Operation is executed: [x].status = EX add r <- r, r2 sub r3 <- r, r // Op A // Op B res addr type valid busy Programmspeicher OP A PC Reg Reg Reg 2 Reg r 4 : 2: 3: 4: [-,-,,,] 5: [ld,,,,-,-,2,4,ex,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
28 Execute-Phase Details (Example 3) Result is computed: Result will be stored temporarily in the field Vj [x].status = WB Result is ready for WB Programmspeicher OP A PC res add r <- r, r2 sub r3 <- r, r Reg Reg Reg 2 Reg r 4 // Op A // Op B addr type valid busy : 2: 3: 4: [-,-,,,] 5: [ld,,,89,-,-,2,4,wb,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
29 Write-Back Details (ALU-Operation) Write-back of the result res from x is possible, if [x].status = WB and the result bus is available Update after WB: ROB[[x].rob].res := [x].vj // Write result to allocated ROB-entry [x].busy := 0 // free the reservation station ROB[[x].rob].valid := 1 // Declare ROB-entry as valid for all reservation stations y ≠ x: // Forwarding of the result if [y].qj = [x].rob then [y].vj := [x].vj; [y].qj := 0 if [y].qk = [x].rob then [y].vk := [x].vj; [y].qk := 0
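The WB broadcast can be sketched as follows (a sketch, assuming dict-based ROB and station records with the field names from the slides):

```python
def write_back(rob_idx, result, rob, stations):
    """Sketch of the WB step: fill the allocated ROB entry, mark it valid,
    and forward the value to every reservation station whose Qj/Qk tag
    matches this ROB index."""
    rob[rob_idx]["res"] = result
    rob[rob_idx]["valid"] = True
    for rs in stations:
        if rs.get("qj") == rob_idx:
            rs["vj"], rs["qj"] = result, 0
        if rs.get("qk") == rob_idx:
            rs["vk"], rs["qk"] = result, 0

rob = {4: {"res": None, "valid": False}}
rs = [{"qj": 4, "vj": None, "qk": 0, "vk": 7}]
write_back(4, 89, rob, rs)                    # the (4, 89) broadcast of Example 4
assert rob[4] == {"res": 89, "valid": True}
assert rs[0] == {"qj": 0, "vj": 89, "qk": 0, "vk": 7}
```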
30 Write-Back-Phase Details (Example 4) Situation: Result of the ld-operation is written back Result bus contains: ROB-entry ID, e.g. 4 ROB-value, e.g. 89 add-operation waits for the right-hand operand Programmspeicher OP B PC res add r <- r, r2 sub r3 <- r, r R: 5 R: 4 R2: 89 R3: 7 // Op A // Op B addr type valid busy 5 4 : 2: 3: 4: [-,-,,,] 5: [-,-,,,] [add,,4,4,-,-,,5,ro,] [ld,,,2,-,-,2,4,wb,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy (4,89)
31 Write-Back-Phase Details (Example 4) Situation: Result was stored in ROB-entry 4 containing add-operation has also stored the result was freed Programmspeicher OP B PC res add r <- r, r2 sub r3 <- r, r R: 5 R: 4 R2: 89 R3: 7 // Op A // Op B addr type valid busy 5 4 : 2: 3: 4: [2,-,,,] 5: [-,-,,,] [add,,,4,2,-,,5,ro,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
32 Commit Details (ALU-Operation) It must be checked: ROB[first].valid = 1 Update by commit: for all architectural registers r with Reg[r].rob = first do Reg[r] := ROB[first].res Reg[r].rob := 0
33 Commit-Phase Details (Example 5) Situation: Let be head = 4 for the ROB-head R2 waits for result from ROB-entry 4 res add r <- r, r2 sub r3 <- r, r // Op A // Op B addr type valid busy (4,89) Programmspeicher OP B PC R: 5 R: 4 R2: 89 R3: : 2: 3: 4: [2,-,,,] 5: [-,-,,,] [add,,,56,89,-,,5,ro,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
34 Commit-Phase Details (Example 5) Situation: R2 has received result from ROB res add r <- r, r2 sub r3 <- r, r // Op A // Op B addr type valid busy (4,89) Programmspeicher OP B PC R: 5 R: 4 R2: 2 R3: 7 5 : 2: 3: 4: [2,-,,,] 5: [-,-,,,] [add,,,56,89,-,,5,ro,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
35 Executing Branch Operations Issue: Vk-field: stores branch target z misc-field: remembers address a of the branch operation misc-field: also remembers which address ( z or a+1 ) was predicted Execute: The computed target address is stored in the Vk-field of the reservation station: Vk := z, if the branch is taken Vk := a+1, if the branch is not taken The misc-field stores whether or not the prediction was correct ( c = correct; w = wrong) Write-Back: The res-field of the ROB receives the branch target (Vk-field of the reservation station) The addr-field receives the value of the misc-field from the reservation station: c or w Commit: If addr-field = c, nothing must be done (operations were fetched from the correct address) If addr-field = w, then copy the res-field into the PC and flush the whole pipeline: all subsequent ROB-entries, all reservation-station entries, and the instruction queue
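The commit step for branches can be sketched as follows — a sketch over an assumed pipeline-state dict (`pc`, `rob`, `stations`, `iq` are illustrative names):

```python
def commit_branch(rob_entry, pipeline):
    """Branch commit per the slide: addr = 'c' means the prediction was
    correct and nothing happens; addr = 'w' redirects the PC to the res
    field and flushes all speculative state."""
    if rob_entry["addr"] == "c":
        return                       # correct prediction: no action needed
    pipeline["pc"] = rob_entry["res"]
    pipeline["rob"].clear()          # all younger ROB entries are squashed
    pipeline["stations"].clear()     # all reservation-station entries
    pipeline["iq"].clear()           # and the instruction queue

state = {"pc": 3, "rob": [1, 2], "stations": ["add"], "iq": ["sub"]}
commit_branch({"addr": "w", "res": 23}, state)   # mispredicted bz of Example 7
assert state == {"pc": 23, "rob": [], "stations": [], "iq": []}
```

Because the flush happens only at commit, all older operations have already retired and no architectural state has to be undone.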
36 Branch Details (Example 6 correct prediction) Situation: Branch-operation was issued to 2 Branch depends on ld-operation Op A, Op B, will be executed speculatively : ld r2 <- (2) 2: bz r2, #23 3: add r <- r, r // Op A 4: sub r3 <- r, r // Op B 5: res addr type valid busy Program Memory OP C OP B OP A PC 2 R: 5 R: 4 R2: 89 R3: : [-,-,2,,] 2: [-,-,3,,] 3: [-,-,-,,] 4: [-,-,-,,] 5: [-,-,-,,] [ld,,,-,2,-,2,,ex,] [bz,,,-,23,2a,3,2,ro,] Memory ld Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
37 Branch Details (Example 6 correct prediction) Situation: Op A is executed speculatively Op B waits for result of Op A : ld r2 <- (2) 2: bz r2, #23 3: add r <- r, r // Op A 4: sub r3 <- r, r // Op B 5: res addr type valid busy Programmspeicher OP E OP D OP C PC R: 5 R: 4 R2: 89 R3: 3 4 : [-,-,2,,] 2: [-,-,3,,] 3: [-,-,,,] 4: [-,-,,,] 5: [-,-,-,,] [ld,,,-,2,-,2,,ex,] [bz,,,-,23,2a,3,2,ro,] OP A OP B Memory ld Unit OP A Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
38 Branch Details (Example 6 correct prediction) Situation: Op A wrote result to ROB, but not to R Op B is executed speculatively : ld r2 <- (2) 2: bz r2, #23 3: add r <- r, r // Op A 4: sub r3 <- r, r // Op B 5: res addr type valid busy Programmspeicher OP E OP D OP C PC R: 5 R: 4 R2: 89 R3: 3 4 : [-,-,2,,] 2: [-,-,3,,] 3: [9,-,,,] 4: [-,-,,,] 5: [-,-,-,,] [ld,,,-,2,-,2,,ex,] [bz,,,-,23,2a,3,2,ro,] OP B Memory ld Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
39 Branch Details (Example 6 correct prediction) Situation: Op B wrote result to ROB : ld r2 <- (2) 2: bz r2, #23 3: add r <- r, r // Op A 4: sub r3 <- r, r // Op B 5: ld write result to ROB res addr type valid busy bz can be executed Programmspeicher OP E OP D OP C PC R: 5 R: 4 R2: 89 R3: 3 4 : [6,-,2,,] 2: [-,-,3,,] 3: [9,-,,,] 4: [-4,-,,,] 5: [-,-,-,,] [bz,,,6,23,2a,3,2,ro,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
40 Branch Details (Example 6 correct prediction) Situation: bz will be executed: Branch is not taken Commit for ld-operation is done : ld r2 <- (2) 2: bz r2, #23 3: add r <- r, r // Op A 4: sub r3 <- r, r // Op B 5: res addr type valid busy Programmspeicher OP E OP D OP C PC R: 5 R: 4 R2: 6 R3: 3 4 : [-,-,-,,] 2: [-,-,3,,] 3: [9,-,,,] 4: [-4,-,,,] 5: [-,-,-,,] [bz,,,6,23,2a,3,2,ex,] Memory Unit BZ Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
41 Branch Details (Example 6 correct prediction) Situation: bz was executed : ld r2 <- (2) 2: bz r2, #23 3: add r <- r, r // Op A 4: sub r3 <- r, r // Op B 5: WB for bz was done res addr type valid busy Prediction was correct (ROB[2].addr := c) Programmspeicher OP E OP D OP C PC R: 5 R: 4 R2: 6 R3: 3 4 : [-,-,-,,] 2: [3,c,3,,] 3: [9,-,,,] 4: [-4,-,,,] 5: [-,-,-,,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
42 Branch Details (Example 6 correct prediction) Situation: Commit of the branch operation does not require any action, because prediction was correct : ld r2 <- (2) 2: bz r2, #23 3: add r <- r, r // Op A 4: sub r3 <- r, r // Op B 5: res addr type valid busy Now Commit can be done for speculatively executed operations A and B Programmspeicher OP E OP D OP C PC R: 5 R: 4 R2: 6 R3: 3 4 : [-,-,-,,] 2: [-,-,-,,] 3: [9,-,,,] 4: [-4,-,,,] 5: [-,-,-,,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
43 Branch Details (Example 7 wrong prediction) Situation: Same situation as in example 6 : ld r2 <- (2) 2: bz r2, #23 3: add r <- r, r // Op A 4: sub r3 <- r, r // Op B 5: But, ld-operation has stored in R2 res addr type valid busy I.e., branch is taken Programmspeicher OP F OP E OP D PC R: 5 R: 4 R2: R3: 3 4 : [-,-,-,,] 2: [-,-,3,,] 3: [9,-,,,] 4: [-4,-,,,] 5: [-,-,-,,] [bz,,,,23,2a,3,2,ex,] OP C Memory Unit BZ Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
44 Branch Details (Example 7 wrong prediction) Situation: bz-operation was executed : ld r2 <- (2) 2: bz r2, #23 3: add r <- r, r // Op A 4: sub r3 <- r, r // Op B 5: Prediction was wrong (ROB[2].addr := w) res addr type valid busy Correct target can be found in the res-field Programmspeicher OP G OP F OP E PC R: 5 R: 4 R2: 6 R3: 3 4 : [-,-,-,,] 2: [23,w,3,,] 3: [9,-,,,] 4: [-4,-,,,] 5: [-,-,-,,] OP C OP D Memory Unit OP C Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
45 Branch Details (Example 7 wrong prediction) Situation: Commit of the branch moves correct address into PC : ld r2 <- (2) 2: bz r2, #23 3: add r <- r, r // Op A 4: sub r3 <- r, r // Op B 5: res addr type valid busy Flushing the pipeline Programmspeicher OP G OP F OP E PC 23 R: 5 R: 4 R2: 6 R3: 3 4 : [-,-,-,,] 2: [-,-,-,,] 3: [9,-,,,] 4: [-4,-,,,] 5: [-,-,-,,] OP C OP D Memory Unit OP C Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
46 Executing Memory Operations For out-of-order execution of memory operations the following holds: The ordering of load-operations among each other does not matter The ordering of load- and store-operations as well as of store- and store-operations must be maintained Example: ld r2 <- (r) ld r <- (r4) st r4 -> (r) ld r5 <- (r6) st r7 -> (r8) Strategy: Writing to memory takes place during the commit-phase (in-order) Reading from memory takes place during the execute-phase (out-of-order) But only if the valid-field of all preceding store-operations in the ROB is set
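The load-execution rule above can be sketched directly (a sketch; the ROB is modeled as a program-ordered list of dicts with assumed `type`/`valid` fields):

```python
def load_may_execute(load_idx, rob):
    """Slide's rule: a load may read memory out of order only if every
    older store in the ROB already has its address and value (valid set).
    Otherwise the load could read a stale memory location."""
    for i in range(load_idx):
        e = rob[i]
        if e and e["type"] == "store" and not e["valid"]:
            return False    # older store address/value still unknown: wait
    return True

rob = [{"type": "store", "valid": False}, {"type": "load", "valid": False}]
assert not load_may_execute(1, rob)   # preceding store not yet written back
rob[0]["valid"] = True
assert load_may_execute(1, rob)       # now the load may issue to memory
```

A refinement on the following slides: if a preceding store's addr-field matches the load address, the value can be forwarded from the ROB instead of read from memory.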
47 Example (store-operation) Issue-Phase: issue the first st-operation st r3 -> (r) ld r3 <- (r) st r-> (r2) Execute-Phase: Execution of the store-operation can start if both source operands are available Execution has no effect; rather, WB of the st-operation starts immediately Program Memory M PC res R: 5 R: 4 R2: 2 R3: 7 addr type valid busy : [-,-,2,,] 2: 3: 4: 5: [st,,,7,5,-,2-,,ro,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
48 Example (store-operation) Updates during WB of the st-operation ROB[x].res := [y].vj ROB[x].addr := [y].vk st r3 -> (r) ld r3 <- (r) st r-> (r2) res addr type valid busy M Programmspeicher PC R: 5 R: 4 R2: 2 R3: 7 : [-,-,2,,] 2: 3: 4: 5: [st,,,7,5,-,2-,,wb,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
49 Example (store-operation) Commit for st-operation MEM[ROB[first].addr] := ROB[first].res st r3 -> (r) ld r3 <- (r) st r-> (r2) res addr type valid busy M Programmspeicher PC R: 5 R: 4 R2: 2 R3: 7 : [7,5,2,,] 2: 3: 4: 5: Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
50 Suppose first st-operation was issued and waits for execution Then ld-operation was issued, and its source operands are available Example (load-operation) Programmspeicher M PC st r3 -> (r) ld r3 <- (r) st r-> (r2) res R: 5 R: 4 R2: 2 R3: 7 addr type valid busy 2 : [-,-,2,,] 2: [-,-,2,,] 3: 4: 5: OP C [st,5,,-,5,-,2,,ro,] [ld,,,4,-,-,2,2,ro,] OP C Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
51 Example (load-operation) Situation : ld-operation is not executed, because valid-bit of first st-operation is st r3 -> (r) ld r3 <- (r) st r-> (r2) res addr type valid busy M Programmspeicher PC R: 5 R: 4 R2: 2 R3: 7 2 : [-,-,2,,] 2: [-,-,2,,] 3: 4: 5: OP C [st,5,,-,5,-,2,,ro,] [ld,,,4,-,-,2,2,ro,] OP C Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
52 Example (load-operation) Situation: Now the ld-operation can be executed (see valid-bit of the first st-operation) st r3 -> (r) ld r3 <- (r) st r-> (r2) res addr type valid busy The ld-operation can read the value either from memory or from the ROB (if the addr-field of a preceding st-operation matches the Vj-field of the ld-operation) Program Memory M PC R: 5 R: 4 R2: 2 R3: 7 2 : [7,5,2,,] 2: [-,-,2,,] 3: 4: 5: [ld,,,4,-,-,2,2,ro,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
53 Example (load-operation) WB for ld-operation complete Commit-phase for ld-operations is the same as for alu-operations st r3 -> (r) ld r3 <- (r) st r-> (r2) res addr type valid busy M Programmspeicher PC R: 5 R: 4 R2: 2 R3: 7 2 : [7,5,2,,] 2: [2,-,2,,] 3: 4: 5: [ld,,,4,-,-,2,2,ro,] Memory Unit Execute Execute m opc Qj Qk Vj Vk misc type rob stat busy
54 Tomasulo with Speculation (Example 9: Loop Iteration, Cycle ) Loop: ld r <- (r) mul r4 <- r, r2 add r3 <- r3, r4 add r <- r, bne loop, r,2 PC add r3,r3,r4 mul r4,r,r2 ld r,(r) R: R: R2: 3 R3: R4: : 2: 3: 4: 5: Memory Unit Execute Execute 2
55 Tomasulo with Speculation (Example 9: Loop Iteration, Cycle ) Loop: ld r <- (r) mul r4 <- r, r2 add r3 <- r3, r4 add r <- r, bne loop, r,2 PC add r,r, R: R: R2: 3 : 2: 3: ld r,() add r3,r3,r4 mul r4,r,r2 R3: R4: 4: 5: ld r,() Memory Unit Execute Execute 2
56 Tomasulo with Speculation (Example 9: Loop Iteration, Cycle 2) Loop: ld r <- (r) mul r4 <- r, r2 add r3 <- r3, r4 add r <- r, bne loop, r,2 PC bne loop,r,2 R: R: R2: 3 : 2: 3: ld r,() mul r4 add r,r, add r3,r3,r4 R3: R4: 2 4: 5: ld r,() 2 mul r4,rob, Memory ld r,() Unit Execute Execute 2
57 Tomasulo with Speculation (Example 9: Loop Iteration, Cycle 3) Loop: ld r <- (r) mul r4 <- r, r2 add r3 <- r3, r4 add r <- r, bne loop, r,2 PC ld r,(r) R: R: R2: 3 : 2: 3: ld r,() mul r4 add r3 bne loop,r,2 add r,r, R3: 3 R4: 2 4: 5: ld r,() 2 mul r4,rob,3 add r3,,rob Memory ld r,() Unit Execute Execute 2
58 Tomasulo with Speculation (Example 9: Loop Iteration, Cycle 4) Loop: ld r <- (r) mul r4 <- r, r2 add r3 <- r3, r4 add r <- r, bne loop, r,2 PC mul r4,r,r2 R: R: R2: 3 4 : 2: 3: 2 mul r4 add r3 ld r,(r) bne loop,r,2 R3: 3 R4: 2 4: 5: add r ld r,() 2 mul r4,2,3 add r3,,rob2 3 4 add r,, 5 6 Memory Unit Execute Execute 2
59 Tomasulo with Speculation (Example 9: Loop Iteration, Cycle 5) Loop: ld r <- (r) mul r4 <- r, r2 add r3 <- r3, r4 add r <- r, bne loop, r,2 PC 5 add r3,r3,r4 R: 2 R: R2: 3 4 : 2: 3: mul r4 add r3 mul r4,r,r2 ld r,(r) R3: 3 R4: 2 4: 5: add r bne 2 mul r4,2,3 add r3,,rob2 3 4 add r,, 5 bne loop,rob4,2 6 Memory Unit EU-Bus mul Execute r4,2,3 add Execute r,, 2
60 Tomasulo with Speculation (Example 9: Loop Iteration, Cycle 6) Loop: ld r <- (r) mul r4 <- r, r2 add r3 <- r3, r4 add r <- r, bne loop, r,2 PC 5 Operations are fetched and issued speculatively add r,r, add r3,r3,r4 mul r4,r,r2 R: 2 R: R2: 3 4 R3: 3 R4: 2 : 2: 3: 4: 5: ld r 26 add r3 bne Operations from different loop iterations are in the pipeline ld r,() 2 mul r4,2,3 add r3,, add r,, bne loop,,2 5 6 ld-operation is no longer dependent on the branch operation Memory Unit Execute Execute 2
61 Tomasulo with Speculation (Example 9: Loop Iteration, Cycle 7) Loop: ld r <- (r) mul r4 <- r, r2 add r3 <- r3, r4 add r <- r, bne loop, r,2 PC 5 bne loop,r,2 R: 2 R: R2: 3 4 : 2: 3: ld r mul r4 add r3 add r,r, add r3,r3,r4 R3: 3 R4: : 5: bne Now speculative execution possible ld r,() 2 mul r4,rob,3 add r3,, bne loop,,2 5 6 Memory ld r,() Unit add Execute r3,,26 bne loop,,2 Execute 2
62 Tomasulo with Speculation (Example 9: Loop Iteration, Cycle 8) Loop: ld r <- (r) mul r4 <- r, r2 add r3 <- r3, r4 add r <- r, bne loop, r,2 PC 5 bne loop,r,2 add r,r, R: 2 R: R2: 3 R3: 4 3 : 2: 3: 4: ld r mul r4 26 add r3,r3,r4 R4: : bne c ld r,() 2 mul r4,rob,3 add r3,, bne loop,,2 5 6 Memory ld r,() Unit Execute Execute 2
63 Tomasulo with Speculation (Example 9: Loop Iteration, Cycle 9) Loop: ld r <- (r) mul r4 <- r, r2 add r3 <- r3, r4 add r <- r, bne loop, r,2 PC 5 ld r,(r) bne loop,r,2 R: 2 R: R2: 3 R3: : 2: 3: 4: 8 mul r4 add r3 add r,r, R4: : bne c ld r,() 2 mul r4,8,3 add r,, 3 4 add r3,26,rob2 5 6 Memory Unit Execute Execute 2
64 Summary We have seen the Tomasulo algorithm with speculation Importance of the reorder buffer Execution of ALU-operations Branch-operations Memory-operations But: Issue- and Commit-phase are limited to processing a single operation per clock cycle
65 Content Tomasulo with speculative execution Introducing superscalarity into the instruction pipeline Multithreading
66 Superscalar Instruction Pipeline So far: Only the data path is superscalar Parallel execution of operations in the data path, but CPI < 1 is not possible Required: superscalar Fetch-, Issue-, WB-, and Commit-phases Program Memory PC R: 5 R: 4 R2: 2 R3: 7 2 ROB Memory Unit Execute Execute m
67 Superscalar Fetch-Phase Fetch: Fetching n operations simultaneously from code cache/memory Requires wider busses Cache/Memory n operation bus n operation bus Instruction queue Register File operand bus A A n operand bus B operand bus B n
68 Superscalar Issue-Phase Issue: Issue the first n operations from the instruction queue (n operation busses required) n operand busses for the left operand required (A) n operand busses for the right operand required (B) Checking for a free reservation station and a free ROB-entry must be done simultaneously for up to n operations! Cache/Memory n operation bus n operation bus Instruction queue Register File operand bus A A n operand bus B operand bus B n
69 Implementing simultaneous checking in the issue-phase For a single operation For two operations Old ROB Status Old Status Old ROB Status Old Status RF Control for operand busses A and B New state for ROB Issue-Logic ( Operation) New state for RF Control for operand busses A and B Issue-Logic (. Op) RF control for operand busses A2 and B2 Issue-Logic (2. Op) Combine New state for ROB New state for
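The cascaded issue logic above can be sketched sequentially (a sketch; real hardware computes this combinationally in one cycle, and the resource counts are illustrative): each later operation in the group must see the reservation-station and ROB state as already updated by the earlier ones, and in-order issue stops at the first operation that cannot be placed.

```python
def issue_group(ops, free_rs, free_rob):
    """n-way issue check: walk the group in program order, consuming one
    RS entry and one ROB entry per operation; stop at the first stall
    (in-order issue means no later op may slip past a stalled one)."""
    issued = []
    for op in ops:
        if free_rs <= 0 or free_rob <= 0:
            break               # first stall ends the issue group
        issued.append(op)
        free_rs -= 1
        free_rob -= 1
    return issued

assert issue_group(["a", "b", "c"], free_rs=2, free_rob=3) == ["a", "b"]
```

The "Combine" box on the slide corresponds to merging the per-operation updates into one new ROB/RS state per cycle.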
70 Superscalar WB-Phase Every EU has its own result bus E i All EUs may write simultaneously to the ROB This also makes the bypass for the reservation stations more complex A A n B B n E E m R R k ROB Bypass opc Qj Qk Vj Vk misc type rob stat busy Bypasses to Memory Unit Execute Execute m E E E m result busses
71 Superscalar Commit-Phase For up to n ROB-entries starting at the head: check if the valid-bit is set Then write their results to the register file The register file needs n write-ports A A n B B n E E m R R k Bypass Register File ROB opc Qj Qk Vj Vk misc type rob stat busy Bypasses to Memory Unit Execute Execute m E E E m result busses
72 Example PowerPC Source: PowerPC e5 Core Family Reference Manual
73 Limitations for ILP Memory bandwidth limits the number of simultaneously fetched operations (typically 4 to 6 operations) HW overhead and delay for: Control logic of the issue-phase Bypasses for the reservation stations Number of read-/write-ports in the register file Branches Possible solution: Branch prediction Available parallelism in the application Possible solution: Multithreading
74 Content Tomasulo with speculative execution Introducing superscalarity into the instruction pipeline Multithreading
75 Motivation for Multithreading True dependencies prevent the EUs from being used in parallel (horizontal performance loss) Operations with a very long delay during execution create vertical performance loss E.g. a memory access of an operation A in a Pentium 4 (3-way superscalar) can take 38 clock cycles (cache misses) I.e. 4 operations would have to bypass Op A in order to utilize the EUs fully during this time But: the reorder buffer has only 26 entries Hence, 339 execution cycles are wasted Solution Multithreading: Execute multiple threads that share the same execution units but have no dependencies on each other OP OP 2 OP n OP OP 2 OP 3 OP A OP 4 OP 5 OP 6 after 38 cycles WB of A only 4 cycles EU usage EU usage 2 3 A
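The vertical-loss arithmetic can be sketched generically (the cycle counts below are illustrative assumptions chosen for the example, not the slide's exact figures):

```python
def wasted_issue_slots(issue_width, stall_cycles, independent_ops):
    """While one operation stalls for stall_cycles, an issue_width-way
    pipeline offers issue_width * stall_cycles issue slots; every slot
    not coverable by an independent in-flight operation is wasted."""
    slots = issue_width * stall_cycles
    return max(0, slots - independent_ops)

# Illustrative: a 3-way machine stalling 100 cycles, with a window
# (e.g. ROB size) of only 126 independent operations to fill the gap
assert wasted_issue_slots(3, 100, 126) == 174
```

Since the window (ROB size) bounds `independent_ops`, a single thread cannot hide long stalls, which motivates feeding the EUs from a second thread.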
76 Process vs. Thread Each process has its own context: address space (code, data, heap, stack) TLB Switching between processes takes tens of thousands of clock cycles (context switch); the OS is involved Threads share the same context Switching between two threads only requires changing the values in the architectural registers Code Section Code Section Code Section Code Section Data Section Data Section Data Section Data Section Heap Heap Heap Heap Stack 2 Stack Stack 2 Stack Stack Stack Thread in Process 2 Thread 2 in Process 2 Process Process 2
77 Multithreading Program Memory PC PC 2 OP E OP C OP A Instruction queue OP F OP D OP B Instruction queue RF RF 2 ROB ROB 2 EU-Bus Memory Unit Execute Execute m Multithreading: A fixed number of n threads can share the same execution units Hardware supports fast switching between n threads: n copies of some resources, e.g. architectural registers (including PC) fixed partitioning of some resources, e.g. reservation stations (or limited sharing) shared usage of some resources, e.g. EUs
78 Multithreading Types of Multithreading: no MT Coarse Grained MT Fine Grained MT Simultaneous MT
79 Coarse Grained Multithreading A single thread runs for many clock cycles before the hardware switches to another thread The hardware switches between threads only if a long-running operation is detected (e.g. a cache miss) or a fixed time slice has passed A processor with n-way MT appears to an operating system like n processors The OS schedules n threads of the same process to these processors
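The switch policy above can be sketched as a tiny decision function (a sketch with assumed flag names for the two trigger events):

```python
def next_thread(current, n_threads, cache_miss, slice_expired):
    """Coarse-grained MT switching: stay on the current thread until a
    long-latency event (e.g. a cache miss) is detected or the fixed time
    slice ends; then rotate to the next hardware thread."""
    if cache_miss or slice_expired:
        return (current + 1) % n_threads
    return current

assert next_thread(0, 2, cache_miss=False, slice_expired=False) == 0  # keep running
assert next_thread(0, 2, cache_miss=True, slice_expired=False) == 1   # miss: switch
assert next_thread(1, 2, cache_miss=False, slice_expired=True) == 0   # slice over
```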
80 Example Two threads are scheduled to the processor Reservation stations and EUs are shared resources Hardware switches between both PCs and IQ (e.g. by multiplexors) Fetched operations are tagged with thread number Situation: Thread is running Instruction Queue thread Programmspeicher OP F. OP C.2 OP E. OP B.2 OP D. OP A.2 Instruction Queue thread 2 PC PC 2 RF RF 2 OP A. OP B. ROB OP C. ROB 2 OP C. OP A. OP B. Memory Unit OP A. Execute OP B. Execute m
81 Example Memory operation D. of thread was issued Thread is still running Instruction Queue thread Programmspeicher OP G. OP F. OP E. OP C.2 OP B.2 OP A.2 Instruction Queue thread 2 PC PC 2 RF RF 2 OP A. OP B. ROB OP C. OP D. ROB 2 OP C. OP D. OP A. OP B. Memory Unit OP A. Execute OP B. Execute m
82 Example Memory operation is executed and cache miss is detected Processor has switched to thread 2 another PC is used another instruction queue is used Instruction Queue thread Programmspeicher OP H. OP G. OP F. OP C.2 OP B.2 OP A.2 2 Instruction Queue thread 2 PC PC 2 RF RF 2 OP A. OP B. ROB OP C. OP D. OP E. ROB 2 OP C. OP E. OP D. OP A. OP D. Memory Unit OP A. Execute Execute m
83 Example Issued operations of thread are further processed But issue now takes place from instruction queue 2 2 Instruction Queue thread Programmspeicher OP H. OP D.2 OP G. OP F. OP C.2 OP B.2 Instruction Queue thread 2 PC PC 2 RF RF 2 OP A. OP B. ROB OP C. OP D. OP E. OP A.2 ROB 2 OP C. OP E. OP D. OP A. OP A.2 OP D. Memory Unit OP A. Execute Execute m
84 Example
Issue continues from instruction queue 2 while the remaining thread 1 operations are processed.
[Diagram: pipeline snapshot; B.2 issued, A.2 executing]
85 Example
Operations of thread 1 are further processed but not committed, while operations of thread 2 are processed simultaneously. If operation E.1 has a true dependency on D.1 (the memory operation that missed the cache), it keeps blocking its reservation station, which is then unavailable for thread 2 operations. Balancing the shared resources between threads is therefore important.
[Diagram: pipeline snapshot; E.1 waiting in a reservation station while thread 2 operations execute]
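The balancing problem in this example can be illustrated with a small allocation sketch. The reservation-station size and the per-thread cap are assumed values; the model only shows allocation, since entries of a stalled thread never free up.

```python
# Why balancing shared reservation stations matters (illustrative sketch):
# without a per-thread cap, a stalled thread can occupy all RS entries and
# starve the other thread; a cap keeps some entries available.

RS_SIZE = 4  # assumed number of shared reservation-station entries

def allocate_rs(ops, per_thread_cap):
    """ops: thread ids of operations, in issue order. Each op tries to
    take a reservation-station entry. Returns the owners of the
    occupied entries."""
    rs, used = [], {0: 0, 1: 0}
    for t in ops:
        if len(rs) < RS_SIZE and used[t] < per_thread_cap:
            rs.append(t)
            used[t] += 1
    return rs

# Thread 0 is stalled on a dependency chain and presents four ops first.
ops = [0, 0, 0, 0, 1, 1]
# No cap: thread 0 fills every entry, thread 1 is starved.
# allocate_rs(ops, 4) -> [0, 0, 0, 0]
# Cap of 2 entries per thread: thread 1 still gets entries.
# allocate_rs(ops, 2) -> [0, 0, 1, 1]
```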
86 Coarse-Grained MT - Limitations
Does not help to overcome horizontal performance loss (a single thread may not have enough ILP). Only right after a thread switch are operations of both threads processed simultaneously. Switching between threads may have a negative impact on each thread's cache hit rate and thus affects performance negatively.
87 Fine-Grained Multithreading
The processor switches to another thread in every clock cycle, e.g. in a round-robin manner. This helps to overcome horizontal performance loss. A single instruction queue and a single reorder buffer are sufficient (shared), but operations must be tagged with the corresponding thread number.
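The round-robin fetch described above can be sketched in a few lines; the function name and the instruction-index encoding are illustrative assumptions.

```python
def fetch_trace(n_threads, n_cycles):
    """Fine-grained MT fetch: each cycle takes one operation from the
    next thread's PC, round robin. Fetched ops carry a thread tag
    because the instruction queue and the ROB are shared."""
    pcs = [0] * n_threads
    fetched = []
    for cycle in range(n_cycles):
        t = cycle % n_threads          # switch thread every clock cycle
        fetched.append((t, pcs[t]))    # (thread tag, instruction index)
        pcs[t] += 1
    return fetched

# Two threads, four cycles: A.1, A.2, B.1, B.2 in the slides' notation.
# fetch_trace(2, 4) -> [(0, 0), (1, 0), (0, 1), (1, 1)]
```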
88-96 Example
Cycle-by-cycle snapshots of fine-grained multithreading: fetch alternates between PC 1 and PC 2 every cycle (A.1, A.2, B.1, B.2, C.1, C.2, ...), all fetched operations are tagged with their thread number and placed in the single shared instruction queue, and they are issued to the shared ROB and the execution units in the same alternating order.
[Diagrams: pipeline snapshots for nine consecutive cycles]
97 Fine-Grained Multithreading - Limitations
Vertical performance loss cannot be avoided: a long-running operation prevents other operations of the same thread from being executed, and due to the shared IQ and ROB the other thread is blocked after a while as well. Improvement: stop fetching for a blocked thread. The performance of a single thread is reduced (even if there is no second thread with pending operations), because issue takes place only every second cycle. MT also reduces cache performance.
98 Simultaneous Multithreading
Mixes coarse- and fine-grained MT: in every clock cycle, operations from n threads are fetched and issued (Intel calls this Hyper-Threading). Operations must be tagged with the corresponding thread number. This solves the problem of having either horizontal or vertical performance loss: if both threads are not blocked, the available ILP is utilized and horizontal performance loss is avoided; if one thread is blocked, the other thread still uses the resources and vertical performance loss is avoided (but not the horizontal one). Even if one thread is blocked, the other one can run at full speed (issue in every clock cycle).
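The SMT issue behavior can be sketched as a per-cycle loop over the threads. The issue width and the one-op-per-thread-per-cycle simplification are assumptions for illustration.

```python
ISSUE_WIDTH = 2  # assumed issue width, for illustration

def smt_issue(queues, blocked, n_cycles):
    """Each cycle, issue up to ISSUE_WIDTH ops, drawing at most one op
    from every thread that is not blocked. queues: per-thread lists of
    op names (consumed front to back)."""
    issued = []
    for _ in range(n_cycles):
        slot = []
        for t, q in enumerate(queues):
            if t not in blocked and q and len(slot) < ISSUE_WIDTH:
                slot.append(q.pop(0))
        issued.append(slot)
    return issued

# Both threads ready: slots are filled from both -> no horizontal loss.
# smt_issue([["A", "C"], ["B", "D"]], set(), 2) -> [["A", "B"], ["C", "D"]]
# Thread 0 blocked: thread 1 still issues every cycle -> no vertical loss.
# smt_issue([["A", "C"], ["B", "D"]], {0}, 2) -> [["B"], ["D"]]
```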
99 Example
Fetch and issue take place simultaneously for both threads. Each thread has its own IQ, RF, PC and ROB; the reservation stations are partitioned (one part used for thread 1, the other for thread 2).
[Diagram: pipeline snapshot with partitioned reservation stations]
100 Example
Both threads are executed, which avoids horizontal performance loss.
[Diagram: pipeline snapshot; A and B issued]
101 Example
Both threads are executed, which avoids horizontal performance loss.
[Diagram: pipeline snapshot; A and B executing, C and D issued]
102 Example
Both threads are executed, avoiding horizontal performance loss. But now the long-running operation E is issued.
[Diagram: pipeline snapshot; C and D executing, E issued]
103 Example
Assume G has a true dependency on E.
[Diagram: pipeline snapshot; E in the memory unit, G and H waiting in reservation stations, F executing]
104 Example
Assume G has a true dependency on E, and so does I. Thread 1 is now blocked.
[Diagram: pipeline snapshot; thread 1's reservation stations filled with waiting operations]
105 Example
... but thread 2 can continue.
[Diagram: pipeline snapshot; thread 2 keeps issuing while thread 1 is blocked]
106 Summary - Multithreading
Allows filling the pipeline with operations from different threads; since there are no data dependencies between operations of different threads, this allows a higher resource utilization. Coarse-grained MT suffers from horizontal performance loss; fine-grained MT suffers from vertical performance loss; SMT solves both problems. Improvement: balancing between partitioned resources. All MT approaches have an impact on cache performance. Fine-grained MT in particular can also be used in statically scheduled processor pipelines to avoid hazards: in a pipeline with n stages, operations from n threads are issued, and no data or control hazards occur because the operations in the pipeline have no dependencies on each other.
More informationInstruction Level Parallelism
Instruction Level Parallelism Software View of Computer Architecture COMP2 Godfrey van der Linden 200-0-0 Introduction Definition of Instruction Level Parallelism(ILP) Pipelining Hazards & Solutions Dynamic
More informationECE 571 Advanced Microprocessor-Based Design Lecture 4
ECE 571 Advanced Microprocessor-Based Design Lecture 4 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 28 January 2016 Homework #1 was due Announcements Homework #2 will be posted
More informationAdvanced Computer Architecture
Advanced Computer Architecture 1 L E C T U R E 4: D A T A S T R E A M S I N S T R U C T I O N E X E C U T I O N I N S T R U C T I O N C O M P L E T I O N & R E T I R E M E N T D A T A F L O W & R E G I
More informationComplex Pipelines and Branch Prediction
Complex Pipelines and Branch Prediction Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L22-1 Processor Performance Time Program Instructions Program Cycles Instruction CPI Time Cycle
More informationCS 2410 Mid term (fall 2018)
CS 2410 Mid term (fall 2018) Name: Question 1 (6+6+3=15 points): Consider two machines, the first being a 5-stage operating at 1ns clock and the second is a 12-stage operating at 0.7ns clock. Due to data
More informationEE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University
EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted
More information