Slide Set 8 for ENCM 501 in Winter 2018, Steve Norman, PhD, PEng


Slide Set 8 for ENCM 501 in Winter 2018
Steve Norman, PhD, PEng
Electrical & Computer Engineering, Schulich School of Engineering, University of Calgary
March 2018

Contents

Pipelines with long-latency instructions
What does program order mean?
Out-of-order execution, WAW and WAR hazards
Data hazards related to memory locations
Managing hazards in OoO processor designs

Outline of Slide Set 8

Pipelines with long-latency instructions
What does program order mean?
Out-of-order execution, WAW and WAR hazards
Data hazards related to memory locations
Managing hazards in OoO processor designs

Pipelines with long-latency instructions

Trying to execute instructions with long latencies in order using a single pipeline can have very bad effects when data hazards arise. The costs in lost cycles are much worse than they might appear to be from study of a simple pipeline that can always do the EX step in one clock cycle.

Let's look at some examples using the pipeline of textbook Figure C.35 as a reference. That pipeline is fairly realistic about variable latency in EX. The Figure C.35 pipeline model is unreasonably simplistic about instruction fetch and data memory access, but problems waiting for EX results will make the point that needs to be made.

[Figure: the Figure C.35 pipeline, in which IF and ID feed four execution paths: an integer unit (EX), a pipelined FP/integer multiplier (stages M1-M7), a pipelined FP adder (stages A1-A4), and an FP/integer divider (DIV); all paths rejoin at MEM and WB.]

Image is Figure C.35 from Hennessy J. L. and Patterson D. A., Computer Architecture: A Quantitative Approach, 5th ed., © 2012, Elsevier, Inc.

Let's look at the effects of RAW data hazards concerning use of FPRs (floating-point registers). Such a hazard involving two instructions can be detected in the ID stage of the later instruction: a source register number of the later instruction will match the destination register number of the earlier instruction.

The later instruction will not be allowed to enter EX until the result of the earlier instruction can be forwarded. If this requires one or more stall cycles, that blocks other instructions from even entering the ID stage.
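
To make the comparison concrete, here is a minimal C sketch of that ID-stage check. The struct fields and the list of in-flight instructions are illustrative assumptions, not the Figure C.35 hardware.

/* Minimal sketch of RAW hazard detection in ID: the source register
 * numbers of the instruction in ID are compared against the destination
 * register numbers of older instructions whose results are not yet
 * available for forwarding. */
#include <stdbool.h>
#include <stdio.h>

#define NO_REG (-1)   /* marker for "this register field is unused" */

typedef struct {
    int dest;          /* destination FPR number, or NO_REG          */
    int src1, src2;    /* source FPR numbers, or NO_REG               */
    bool result_ready; /* true once the result can be forwarded       */
} Instr;

/* Returns true if 'in_id' must stall in ID because some earlier,
 * still-unfinished instruction writes one of its source registers. */
bool raw_stall_needed(const Instr *in_id,
                      const Instr *in_flight, int n_in_flight)
{
    for (int i = 0; i < n_in_flight; i++) {
        const Instr *older = &in_flight[i];
        if (older->dest == NO_REG || older->result_ready)
            continue;  /* no pending write from this instruction */
        if (in_id->src1 == older->dest || in_id->src2 == older->dest)
            return true;  /* RAW hazard: wait until forwarding is possible */
    }
    return false;
}

int main(void)
{
    /* MUL.D F0, F20, F22 is still in the multiplier, result not ready. */
    Instr mul_d = { .dest = 0, .src1 = 20, .src2 = 22, .result_ready = false };
    /* S.D F0, (R8) is in ID and reads F0. */
    Instr s_d   = { .dest = NO_REG, .src1 = 0, .src2 = NO_REG,
                    .result_ready = false };

    printf("S.D must stall: %s\n",
           raw_stall_needed(&s_d, &mul_d, 1) ? "yes" : "no");
    return 0;
}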

Example:

    MUL.D  F0, F20, F22
    S.D    F0, (R8)
    L.D    F2, (R9)
    DADDIU R9, R9, 8
    ADD.D  F24, F24, F2

S.D can't leave ID until MUL.D leaves M7...

    MUL.D:  IF-ID-M1-M2-M3-M4-M5-M6-M7-Me-W
    S.D:    IF-ID-ID-ID-ID-ID-ID-ID-EX-Me-W
    L.D:    IF-IF-IF-IF-IF-IF-IF-ID-EX-Me-W
    DADDIU: IF-ID-EX-Me-W

L.D, DADDIU and ADD.D don't depend on the MUL.D result, but are all seriously delayed because in an in-order system, they all have to wait for S.D to leave ID.

A worse example:

    MUL.D F0, F20, F22
    ADD.D F24, F0, F24
    L.D   F2, (R9)
    SUB.D F26, F26, F2

ADD.D can't leave ID until M7 of MUL.D is done, and then L.D can't enter Mem until Mem of ADD.D is done...

    MUL.D: IF-ID-M1-M2-M3-M4-M5-M6-M7-Me-W
    ADD.D: IF-ID-ID-ID-ID-ID-ID-ID-A1-A2-A3-A4-Me-W
    L.D:   IF-IF-IF-IF-IF-IF-IF-ID-EX-EX-EX-EX-Me-W
    SUB.D: IF-ID-ID-ID-ID-ID-A1-

An even worse example:

    DIV.D F0, F20, F22
    ADD.D F24, F0, F24
    L.D   F2, (R9)

Mitigation of long stalls due to RAW hazards

In-order pipelined execution used to be common even in reasonably high-end processors, and is still common in embedded processors.

What criticism could be made of a compiler that emitted the following sequence of instructions?

    MUL.D  F0, F20, F22
    S.D    F0, (R8)
    L.D    F2, (R9)
    DADDIU R9, R9, 8
    ADD.D  F24, F24, F2

[Figure: the Figure C.35 pipeline again: IF and ID feeding the integer unit (EX), the FP/integer multiplier (M1-M7), the FP adder (A1-A4), and the FP/integer divider (DIV), then MEM and WB.]

The above model is simplistic about memory accesses. With a more realistic model for memory access, let's come up with another source of stalls due to RAW hazards with long-latency instructions, not involving complicated arithmetic.

Outline of Slide Set 8

Pipelines with long-latency instructions
What does program order mean?
Out-of-order execution, WAW and WAR hazards
Data hazards related to memory locations
Managing hazards in OoO processor designs

What does program order mean?

Most (maybe all?) current mainstream ISAs guarantee that results (register and memory writes, branch decisions, etc.) produced by a single stream of instructions are what you would predict if each instruction completed before the next instruction was fetched.

Program order refers to the order in which instructions would be processed in a hypothetical computer with no ILP. ILP schemes aim to get the effects of execution in program order, without the massive performance penalty of actually waiting to finish one instruction before starting the next.

If a scalar in-order pipeline has proper hazard detection, it is pretty much guaranteed to generate correct program order results...

Forwarding, preceded if necessary by stalling, ensures that instructions don't work with stale versions of source operands.

A later instruction can't pass an earlier instruction in the pipeline: key stages such as decode and data memory access can be occupied by only one instruction. That means that the earlier instruction will always write its result before the later instruction does.

Warning: The above material isn't perfectly correct! But it does provide a decent explanation of the main benefits of in-order instruction processing.

Outline of Slide Set 8

Pipelines with long-latency instructions
What does program order mean?
Out-of-order execution, WAW and WAR hazards
Data hazards related to memory locations
Managing hazards in OoO processor designs

Out-of-order execution, WAW and WAR hazards

The next slide shows a hypothetical variation of textbook Figure C.35. As in Figure C.35, instructions are sent one per clock cycle to one of the four paths that lead to instruction completion. Transfer of an instruction from the fetch-and-decode unit to one of the four functional units is called instruction issue.

Let's assume that in each cycle the fetch-and-decode unit can inspect a window of four or so instructions that need to be issued soon, and picks one, not necessarily in program order, but instead with the goal of optimizing instruction throughput.

[Figure: hypothetical variation of Figure C.35, with an instruction fetch and decode unit issuing to four functional units: an integer unit (EX), an FP/integer multiplier (M1-M7), an FP adder (A1-A4), and an FP/integer divider, all leading to MEM and WB.]

Further, as the paths through the execution units have different latencies, our hypothetical system will allow not only out-of-order issue, but also out-of-order completion: instructions may arrive at the MEM stage not in program order. (And, two or more instructions could arrive at MEM in the same cycle, which would require some sort of arbitration and queueing system.)

OoO (out-of-order) issue and OoO completion may create new kinds of hazards, ones that would be impossible with in-order issue and in-order completion.
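
Here is a small C sketch of one plausible version of the issue decision described above: each cycle, scan the window and pick the oldest instruction whose operands are ready and whose functional unit is free. The WindowEntry fields, the Unit enum, and the oldest-first tie-break are assumptions made for illustration, not part of the hypothetical machine's specification.

/* Hypothetical issue-selection sketch for a 4-entry window. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { INT_UNIT, FP_MUL, FP_ADD, FP_DIV } Unit;

typedef struct {
    Unit unit;           /* which of the four paths this instruction needs */
    bool src_ready;      /* all source operands available or forwardable   */
    bool valid;          /* window slot actually holds an instruction      */
} WindowEntry;

#define WINDOW_SIZE 4

/* Returns the index of the entry to issue this cycle, or -1 if none can
 * issue.  unit_busy[u] means functional unit u can't accept a new
 * instruction this cycle. */
int pick_issue(const WindowEntry win[WINDOW_SIZE], const bool unit_busy[4])
{
    for (int i = 0; i < WINDOW_SIZE; i++) {   /* index 0 = oldest */
        if (!win[i].valid)
            continue;
        if (win[i].src_ready && !unit_busy[win[i].unit])
            return i;   /* may be out of program order if older entries wait */
    }
    return -1;          /* nothing can issue; insert a bubble */
}

int main(void)
{
    /* Oldest entry waits on a long-latency result; entry 1 is independent. */
    WindowEntry win[WINDOW_SIZE] = {
        { FP_ADD,   false, true },   /* e.g. ADD.D waiting for F0  */
        { INT_UNIT, true,  true },   /* e.g. independent DADDIU    */
        { INT_UNIT, false, false },
        { INT_UNIT, false, false }
    };
    bool busy[4] = { false, false, false, false };
    printf("issue window index %d this cycle\n", pick_issue(win, busy));
    return 0;
}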

WAW (Write-After-Write) hazard example

    MUL.D F0, F20, F20
    ADD.D F2, F2, F0
    L.D   F0, (R8)
    ADD.D F4, F4, F0
    ... several more instructions, but no more writes to F0 ...
    SUB.D F6, F6, F0

The first ADD.D has to wait for the MUL.D result, so it makes sense to issue L.D and the second ADD.D before the first ADD.D.

What is the potential bad consequence for the SUB.D instruction?

Suppose the instruction sequence was produced by a compiler. How could the compiler have avoided the WAW hazard?

WAR (Write-After-Read) hazard example

    MUL.D F2, F22, F22
    S.D   F2, 40(R29)
    L.D   F2, 32(R29)
    ADD.D F20, F20, F2

In program order, L.D writes to F2 after S.D reads from F2. But if L.D is issued out of order, S.D could store the L.D result to memory instead of storing the MUL.D result.
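
The register-field patterns behind RAW, WAW and WAR can be summarized in a few lines of C. This is only an illustration of the definitions used in this slide set, with made-up struct names; it is not a hardware design, and it reports only one hazard type even if a pair of instructions exhibits more than one.

/* Classify the data hazard between an earlier and a later instruction
 * from their register fields alone. */
#include <stdio.h>

#define NO_REG (-1)

typedef struct {
    int dest;        /* destination register, or NO_REG */
    int src1, src2;  /* source registers, or NO_REG     */
} Regs;

static int reads(const Regs *r, int reg)
{
    return reg != NO_REG && (r->src1 == reg || r->src2 == reg);
}

/* Returns "RAW", "WAW", "WAR", or "none" for the pair (earlier, later).
 * If several apply, only the first match is reported. */
const char *hazard(const Regs *earlier, const Regs *later)
{
    if (reads(later, earlier->dest))                           return "RAW";
    if (later->dest != NO_REG && later->dest == earlier->dest) return "WAW";
    if (later->dest != NO_REG && reads(earlier, later->dest))  return "WAR";
    return "none";
}

int main(void)
{
    /* S.D F2, 40(R29) followed by L.D F2, 32(R29): a WAR hazard. */
    Regs s_d = { .dest = NO_REG, .src1 = 2,      .src2 = NO_REG };
    Regs l_d = { .dest = 2,      .src1 = NO_REG, .src2 = NO_REG };
    printf("%s\n", hazard(&s_d, &l_d));   /* prints WAR */
    return 0;
}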

Data dependencies and name dependencies

A RAW hazard is a data dependency, sometimes also called a true dependency. A later instruction has a source that was a destination of an earlier instruction. The earlier instruction may not have written its result to a register or to memory when the later instruction tries to read that result.

In contrast, WAW and WAR hazards are sometimes called name dependencies. Generally, the situation is like this...

Some instruction B receives information from an earlier instruction A, in some register or memory location. Some instruction D receives unrelated information from an earlier instruction C, in the same register or memory location.

In program order, there is no problem: the first use of the storage location is over before the second use starts. But in an out-of-order environment, communication between one pair of instructions may interfere with communication between another pair.

The term name dependency comes from the idea that two or more pairs of instructions are making conflicting use of a name (register number or memory address) used for inter-instruction communication.

WAW name dependencies are also called output dependencies: The correctness of a write depends on the write overwriting any writes to the same location that should occur earlier in program order.

WAR name dependencies are also called anti-dependencies: The correctness of a read depends on the read result not being contaminated by a write that should occur later in program order.

Name dependencies are a real problem

The example WAW and WAR hazards given earlier in this slide set are easy to fix: the hazards go away if better choices are made about FPRs used for intermediate results.

But what if a compiler has very few registers to allocate? This could happen with a register-poor ISA (like x86!), or with code that for some reason needs to use many registers.

A more common and important problem arises from short loops with long-latency instructions...

Here's a loop to multiply each element in a vector by a factor in register F12...

    L1: L.D    F2, (R8)
        MUL.D  F4, F2, F12
        DADDIU R8, R8, 8
        S.D    F4, (R9)
        DADDIU R9, R9, 8
        BNE    R8, R10, L1

MUL.D instructions have long latencies, but they can be pipelined. It would be good to start the second MUL.D before the first MUL.D finishes, the third MUL.D before the second MUL.D finishes, and so on.

Where is there a name dependency in this sequence? (By the way, there are several RAW hazards in this example, too!)
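
One way to see why reuse of registers such as F2 and F4 across iterations is not a fundamental limit: register renaming, which the hardware approach later in this slide set performs, gives every write a fresh physical register, and readers use whatever physical register is currently mapped to the architectural name. The C sketch below is an assumed, deliberately simplified renamer with a trivial never-recycling allocator, shown only to illustrate the mapping idea.

/* Toy register renamer: architectural register numbers are remapped to
 * fresh physical registers on every write, removing WAW/WAR reuse. */
#include <stdio.h>

#define NUM_ARCH_REGS 32

static int map_table[NUM_ARCH_REGS]; /* architectural -> physical         */
static int next_free;                /* trivial allocator: never recycles */

static void rename_init(void)
{
    for (int a = 0; a < NUM_ARCH_REGS; a++)
        map_table[a] = a;            /* identity mapping at start */
    next_free = NUM_ARCH_REGS;
}

static int rename_read(int arch)     /* source operand: current mapping  */
{
    return map_table[arch];
}

static int rename_write(int arch)    /* destination: allocate a new name */
{
    map_table[arch] = next_free++;
    return map_table[arch];
}

int main(void)
{
    rename_init();
    /* Two iterations of: L.D F2,(R8); MUL.D F4,F2,F12; S.D F4,(R9) */
    for (int iter = 0; iter < 2; iter++) {
        int p2  = rename_write(2);   /* L.D   writes F2  */
        int s1  = rename_read(2);    /* MUL.D reads  F2  */
        int s2  = rename_read(12);   /* MUL.D reads  F12 */
        int p4  = rename_write(4);   /* MUL.D writes F4  */
        int sst = rename_read(4);    /* S.D   reads  F4  */
        printf("iter %d: L.D->p%d  MUL.D p%d <- p%d,p%d  S.D reads p%d\n",
               iter, p2, p4, s1, s2, sst);
    }
    return 0;
}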

Outline of Slide Set 8

Pipelines with long-latency instructions
What does program order mean?
Out-of-order execution, WAW and WAR hazards
Data hazards related to memory locations
Managing hazards in OoO processor designs

Data hazards related to memory locations

Example RAW, WAW and WAR hazards given earlier in this slide set have been related to sequences of writes and reads to registers. Similar hazards can arise related to stores to and loads from memory locations.

Is there any kind of data hazard that could arise from OoO issue or completion of the two load instructions here? If so, what kind of hazard is it?

    L.D F0, (R8)
    ... instructions, but no loads or stores ...
    L.D F2, (R9)

What kind of data hazard is possible involving the store instruction and the load instruction, in an out-of-order system?

    MUL.D F2, F0, F0
    S.D   F2, 40(R29)
    ... instructions, but no loads or stores ...
    L.D   F4, (R9)

What kind of data hazard is possible involving the two store instructions, in an out-of-order system?

    MUL.D F0, F2, F2
    S.D   F0, (R8)
    ... instructions, but no loads or stores ...
    S.D   F4, (R9)

What kind of data hazard is possible involving the L.D instruction and the S.D instruction, in an out-of-order system?

    LD  R9, (R8)    # read address from memory
    L.D F0, (R9)    # use just-read address
    ... instructions, but no loads or stores ...
    S.D F2, (R10)
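
A common way for an out-of-order machine to stay safe in situations like this is conservative memory disambiguation: a memory access is allowed to execute ahead of an earlier one only when both addresses are known and known to differ. The C sketch below illustrates that rule; the MemOp structure and the example addresses are assumptions for illustration, not the textbook's mechanism.

/* Conservative memory-disambiguation check. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    bool     is_store;    /* store (write) or load (read)           */
    bool     addr_known;  /* effective address already computed?    */
    uint64_t addr;        /* effective address, valid if addr_known */
} MemOp;

/* May 'younger' (later in program order) execute before 'older'? */
bool may_execute_early(const MemOp *younger, const MemOp *older)
{
    if (!younger->is_store && !older->is_store)
        return true;    /* two loads never conflict through memory */
    if (!younger->addr_known || !older->addr_known)
        return false;   /* an unknown address might match: be conservative */
    return younger->addr != older->addr;
}

int main(void)
{
    /* Slide example: the L.D's address (in R9) comes from memory and is
     * not known yet, so the later S.D must not be allowed to go first.
     * The store address 0x1000 is an arbitrary illustrative value. */
    MemOp l_d = { .is_store = false, .addr_known = false, .addr = 0 };
    MemOp s_d = { .is_store = true,  .addr_known = true,  .addr = 0x1000 };
    printf("S.D may execute before the L.D: %s\n",
           may_execute_early(&s_d, &l_d) ? "yes" : "no");
    return 0;
}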

Outline of Slide Set 8

Pipelines with long-latency instructions
What does program order mean?
Out-of-order execution, WAW and WAR hazards
Data hazards related to memory locations
Managing hazards in OoO processor designs

Managing hazards in OoO processor designs

Several different design approaches have been used. Because of time constraints in ENCM 501, we'll focus on one design approach, called Tomasulo's algorithm.

Textbook Section 3.4 introduces the algorithm. Textbook Section 3.5 provides detailed examples of how the algorithm works. Textbook Section 3.6 shows how the algorithm can be extended to ensure correct processing in the face of problems such as exceptions and branch mispredictions.
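
As a preview of what Tomasulo's algorithm keeps track of, here is a C outline of one reservation-station entry, using the field names from the textbook's description (Busy, Op, Vj, Vk, Qj, Qk, A). It is only a data-structure sketch, not a working simulator; the FpOp enum and the example values in main are illustrative assumptions.

/* Per-entry state of a Tomasulo-style reservation station. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { OP_NONE, OP_ADD_D, OP_SUB_D, OP_MUL_D, OP_DIV_D,
               OP_L_D, OP_S_D } FpOp;

typedef struct {
    bool   busy;   /* entry is in use                                      */
    FpOp   op;     /* operation to perform on the operands                 */
    double vj, vk; /* operand values, valid only when qj/qk are 0          */
    int    qj, qk; /* tags of the reservation stations that will produce
                      the operands; 0 means the value is already in vj/vk  */
    int64_t a;     /* effective-address / immediate field for loads/stores */
} ReservationStation;

/* An instruction can begin execution once both operands are values. */
static bool ready_to_execute(const ReservationStation *rs)
{
    return rs->busy && rs->qj == 0 && rs->qk == 0;
}

int main(void)
{
    /* A MUL.D waiting for its first operand from the station with tag 2. */
    ReservationStation mult1 = { .busy = true, .op = OP_MUL_D,
                                 .vj = 0.0, .vk = 3.5,
                                 .qj = 2, .qk = 0, .a = 0 };
    printf("Mult1 ready to execute: %s\n",
           ready_to_execute(&mult1) ? "yes" : "no");
    return 0;
}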

The details in Sections 3.4-3.6 focus mainly on data hazards related to writes to and reads from floating-point registers. There is some discussion, less detailed, of data hazards related to writes to and reads from memory locations.

To keep things as simple as possible (and "as simple as possible" is still quite complicated, as we'll see), details are left out regarding writes to and reads from general-purpose registers. In a practical design, hazards related to GPR reads and writes would be handled in a similar fashion to FPR read and write hazards.
