Slide Set 7 for Lecture Section 01

Slide 1/86: Slide Set 7 for Lecture Section 01 for ENCM 369 Winter 2017
Steve Norman, PhD, PEng
Electrical & Computer Engineering, Schulich School of Engineering, University of Calgary
February 2017

Slide 2/86: Contents
- The multicycle processor (textbook Section 7.4)
- Introduction to Pipelining
- 5 pipeline stages for our MIPS subset
- Pipeline Hazards
- Making pipelining work in hardware
- Hardware features to manage data hazards
- Hardware changes to manage control hazards
- Exceptions

Slide 4/86: The multicycle processor (textbook Section 7.4)
ENCM 369 will not cover Section 7.4 in detail, because terms at Canadian universities are short! That's too bad, because the multicycle design has some interesting aspects...
- It shows how a computer can use a single memory array for both instructions and data.
- It makes very efficient use of the ALU: the ALU gets used to compute three different results for every instruction.
- The control unit is sequential; it's a really nice and practical example of a finite state machine (FSM).

Slide 6/86: Introduction to Pipelining
Before we start to learn about pipelining, let's review a model we will call the one-instruction-at-a-time model:
Step 1: Processor reads instruction from memory and updates PC.
Step 2: Processor executes the instruction.
The processor performs Step 1, Step 2, Step 1, Step 2, ..., forever (or until the power is turned off).
This model correctly predicts the results produced by sequences of instructions in assembly language code. Also, the model accurately describes the organization of the processors of textbook Sections 7.3 and 7.4.

Slide 7/86: The one-instruction-at-a-time model and modern processors
The model DOES NOT accurately describe the organization of modern processors!
At a given moment in time, a modern processor will be working on many different instructions; this allows much greater speed than one-instruction-at-a-time processing.
However, the processor must produce results as if instructions were being handled one at a time.

Slide 8/86: Remark
Your instructor thinks that "as if" is a very short and very useful summary of many of the important ideas related to modern computer system designs.
Modern processor chips often process instructions in ways that are hard for humans to understand, but nevertheless do what skilled coders want in time- and energy-efficient ways.

Slide 9/86: The Laundry Analogy
This analogy is taken from Computer Organization and Design, by David Patterson and John Hennessy, which was the ENCM 369 textbook for many years.
You have many loads of laundry to do, with these four resources:
- a washing machine
- a dryer
- a folding unit (you)
- a putting-away unit (your roommate)
(In real life not very many students would ask their roommates to put away laundry for them, but let's just follow Patterson and Hennessy here.)

Slide 10/86: The Laundry Analogy, continued
Let's assume that each step in processing laundry takes 30 minutes. (In real life, this is close to correct for washers but unfortunately not at all correct for dryers.)
Suppose you have four loads of dirty laundry. If you process each load completely before starting the next, how long does it take to finish all four loads?

Slide 11/86: Processing four loads of laundry, one at a time...
[Figure: timeline from 6:00pm to 2:00am. Each of the four loads goes through W, D, F, PA in turn, and no load starts until the previous load is completely finished.]
The work takes EIGHT HOURS in total! And each resource (washer, dryer, etc.) is IDLE for three-quarters of the time.
There is an obvious way to speed this up...

Slide 12/86: Processing four loads of laundry, making better use of resources...
As soon as one load is out of the washer, the washer is free for the next load. The same is true for all of the other resources. So we can schedule the work this way (each column is a 30-minute step, labelled with its start time)...
    Load   6:00pm  6:30pm  7:00pm  7:30pm  8:00pm  8:30pm  9:00pm
    1st    W       D       F       PA
    2nd            W       D       F       PA
    3rd                    W       D       F       PA
    4th                            W       D       F       PA
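
The two laundry schedules can be compared with a little arithmetic. The sketch below is only an illustration under the slide's assumptions (every step takes the same time, and no resource is ever blocked); the function names are made up for this example.

```python
def sequential_minutes(loads, stages, step_minutes):
    """Total time when each load finishes every stage before the next load starts."""
    return loads * stages * step_minutes


def pipelined_minutes(loads, stages, step_minutes):
    """Total time when a new load enters the first stage every step.

    The first load needs `stages` steps; each later load finishes one step
    after the load ahead of it.
    """
    return (stages + loads - 1) * step_minutes


# Four loads, four resources (washer, dryer, folder, putter-away), 30 minutes each.
print(sequential_minutes(4, 4, 30))  # 480 minutes, the eight hours from the earlier slide
print(pipelined_minutes(4, 4, 30))   # 210 minutes, so the work ends at 9:30pm
```

With many loads the pipelined time approaches one step per load, which is the whole point of the analogy.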

Slide 13/86: The concept of pipelining in digital logic design
A pipelined system is a collection of stages, each with a simple role to perform. When a stage is finished producing its current output, it can pass that output to the next stage and receive new input.
In the laundry analogy, the washer stage receives a load of dirty clothes as input, and produces a load of wet, clean clothes as output, which gets passed as input to the dryer stage.
In Harris and Harris, this year's textbook, pipelining is introduced in Section 3.6, along with an analogy to baking cookies. That section is short and worth reading.

Slide 14/86: Pipelined execution of instructions
In a pipelined processor, an instruction is like a single load of laundry. Processing an instruction can start long before processing of the preceding instruction is finished.
To divide the work of processing an instruction across a number of pipeline stages, that work has to be broken down into simple steps that take roughly equal amounts of time. Each step must fit into a single clock cycle.

Slide 16/86: 5 pipeline stages for our MIPS subset
The subset is: ADD, SUB, SLT, AND, OR, LW, SW, BEQ.
The stages are:
- Fetch: Read instruction from I-Mem and update PC.
- Decode: Determine outputs of Control Unit and read GPRs from R-File.
- Execute: Get a result from the ALU.
- Memory: D-Mem access for loads and stores.
- Writeback: Write to a GPR at the end of a load or an R-type instruction.
In some stages, for some instructions, nothing happens. What are some examples of that?

Slide 17/86: Example sequence of instructions in our 5-stage MIPS subset pipeline
    # This sequence is not practical code,
    # but it makes for a simple example.
    ADD $t2, $t0, $t1
    LW  $t4, ($t3)
    SW  $t5, ($t6)
    SUB $t9, $t7, $t8
Let's suppose we have a 1 GHz clock, so the clock period is 1 ns. How long will it take from the beginning of the ADD instruction to the end of the SUB instruction?

Slide 18/86: Pipelined processing for example instruction sequence
    time   0-1 ns  1-2 ns  2-3 ns  3-4 ns  4-5 ns  5-6 ns  6-7 ns  7-8 ns
    ADD    IF      ID      EX      MEM     WB
    LW             IF      ID      EX      MEM     WB
    SW                     IF      ID      EX      MEM     WB
    SUB                            IF      ID      EX      MEM     WB
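
The diagram for the four-instruction example can be regenerated, and its length checked, with a short script. This is a sketch that assumes an ideal pipeline with no hazards or stalls; `chart` and `total_cycles` are names invented for the illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(instructions, stages=STAGES):
    """Clock cycles from the first fetch to the last writeback, with no stalls."""
    return len(stages) + len(instructions) - 1


def chart(instructions, stages=STAGES):
    """Build a text diagram: one row per instruction, one 4-character column per cycle."""
    rows = []
    for start, name in enumerate(instructions):
        # Instruction number `start` enters the first stage in cycle `start`.
        cells = ["    "] * start + [f"{stage:<4}" for stage in stages]
        rows.append(f"{name:<4}" + "".join(cells).rstrip())
    return "\n".join(rows)


program = ["ADD", "LW", "SW", "SUB"]
print(chart(program))
print(total_cycles(program), "cycles")  # 8 cycles, i.e. 8 ns with a 1 ns clock period
```

A processor that finished each instruction before fetching the next would need 4 x 5 = 20 of the same clock cycles for this sequence.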

Slide 19/86
The single-cycle processor starts one instruction per clock cycle. A pipelined processor also starts one instruction per clock cycle. Why will a pipelined design allow much greater instruction throughput? (The diagram below provides a hint at the answer!)
[Figure: timing diagram with traces for CLK, PC output, Instruction, main decoder outputs, R-File outputs, ALU decoder outputs, ALU result, D-Mem RD output, and $s1 contents.]

Slide 20/86: An example 3-instruction sequence in a pipelined processor
The sequence is...
    lw $t0, 20($t1)
    or $t2, $t3, $t4
    sw $t5, 40($t6)
Let's use the Pipeline Basics handout to track all the steps in processing these instructions.

Slide 22/86: Pipeline Hazards
These can be defined as situations that prevent throughput of one instruction per clock cycle.
There are three main kinds: structural hazards, data hazards, and control hazards.

Slide 23/86: Structural Hazards
A structural hazard occurs when a unit within a computer is asked to do two (or more) incompatible things at the same time.
Example: In a computer with a single memory unit, the processor can't do the Fetch step of one instruction while also doing the Memory step of an earlier LW or SW instruction.

Slide 24/86: Solution to Structural Hazards
Design the instruction set and hardware so that this kind of hazard does not occur.
Example: Have separate Instruction and Data Memories, so Fetch can be simultaneous with Memory of an earlier instruction.
(Note: When we get to textbook Chapter 8, we'll see that for modern processors, separation of instructions and data really means having separate caches for instructions and data.)

Slide 25/86: Data Hazards: Are inputs to instructions up-to-date?
Example:
    add $t0, $t1, $t2
    sub $t4, $t3, $t0
The destination of ADD is a source for SUB. The Writeback step of ADD will happen later than the Decode step of SUB, so there is a risk that SUB will use old, wrong data from $t0.
Remember, the processor must produce results as if one instruction completes before the next instruction starts!
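
Spotting this kind of dependence is mechanical, as the sketch below tries to show. It is not the textbook's hardware: the tuple format and the `window` parameter are assumptions made for the illustration. A window of 2 reflects a 5-stage pipeline whose register file can be written and then read within one clock cycle, so only the two instructions right after a producer are at risk. The sketch also ignores the case where an instruction in between overwrites the same register.

```python
def raw_hazards(program, window=2):
    """Find (producer index, consumer index, register) triples.

    Each instruction is a tuple (mnemonic, destination or None, sources).
    A pair is reported when a later instruction reads a register that an
    instruction at most `window` positions earlier writes.
    """
    hazards = []
    for j, (_, _, sources) in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i][1]
            # $zero can never really be changed, so it never causes a hazard.
            if dest is not None and dest != "$zero" and dest in sources:
                hazards.append((i, j, dest))
    return hazards


program = [
    ("add", "$t0", ("$t1", "$t2")),
    ("sub", "$t4", ("$t3", "$t0")),
]
print(raw_hazards(program))  # [(0, 1, '$t0')]
```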

Slide 26/86: Control Hazards: What instruction address should be used in the next Fetch?
Example...
        beq $t0, $t1, L1
        and $t4, $t2, $t3
        ... more instructions ...
    L1: lw  $t5, ($t6)
Which instruction should be fetched after BEQ is fetched? AND or LW?
The processor will not know until the $t0 == $t1 comparison is done!

Slide 27/86: Assumption about Register File in textbook Section 7.5, related to data hazards
Writes to the Register File occur in the first half of a clock cycle, and reads from the Register File occur in the second half.
To enable this behaviour, what choices can be made about flip-flops, Data Memory, and other clocked components?
What are the consequences regarding GPR reads and writes that happen within the same clock cycle?

Slide 28/86: Edge-triggering for pipelined computers in Section 7.5
[Figure: system clock waveform alternating between 1 and 0, with the negative and positive edges annotated.]
Updates to GPRs in the Register File happen in response to negative clock edges.
Updates to PC, Data Memory, and pipeline registers happen in response to positive clock edges.
This is NOT applicable to the single-cycle design of Section 7.3 and the multicycle design of Section 7.4!

Slide 29/86: Review: 5 pipeline stages for our MIPS subset
- Fetch: Read instruction from Instruction Memory; do PC = PC + 4.
- Decode: Determine Control Unit outputs appropriate for instruction opcode; copy two GPR values out of Register File.
- Execute: Do computation in ALU.
- Memory: Read or write Data Memory.
- Writeback: Update a GPR in the Register File.

Slide 30/86: Solutions to data hazards, first of three: stalling the pipeline
Example:
    add $t0, $t1, $t2
    sub $t4, $t3, $t0

Slide 31/86: Solutions to data hazards, second of three: forwarding
Example A, from previous slide:
    add $t0, $t1, $t2
    sub $t4, $t3, $t0
Example B:
    lw  $t0, ($t1)
    add $t2, $t2, $t3
    slt $t6, $t0, $t5

Slide 32/86: Solutions to data hazards, third of three: combine stalling and forwarding
Example:
    lw  $t0, ($t1)
    add $t3, $t0, $t2
Can forwarding by itself solve this data hazard?

Slide 33/86: Control Hazards (repeat of earlier example)
What instruction address should be used in the next Fetch step after the Fetch step of a branch instruction?
Example...
        beq $t0, $t1, L1
        and $t4, $t2, $t3
        ... more instructions ...
    L1: lw  $t5, ($t6)
Which instruction should be fetched after BEQ is fetched? AND or LW?
The processor will not know until the $t0 == $t1 comparison is done!

Slide 34/86: Control hazard illustration
    BEQ                F  D  E  M  W
    next instruction      F  D  E  M  W
Here "next" means next in time, not necessarily next location in Instruction Memory.
Why will it be difficult to do the Fetch step for the next instruction just one clock cycle after the Fetch step for BEQ? (There are multiple reasons.)

Slide 35/86: Four kinds of solutions for control hazards
1. Stall: Delay the Fetch step for the next instruction until the address of the next instruction is known.
2. Predict: Guess what the address of the next instruction will be, and act on the guess without delay. Check that the guess was correct; if not, cancel instructions that have incorrectly entered the pipeline.
3. Delayed branch and jump rules
4. Conditional instructions

Slide 36/86: Dynamic branch prediction
This is widely used in modern processors (but mostly not used in low-power embedded processors).
A large and complex branch prediction circuit is dedicated to recording information about recently-encountered branch instructions.
For each branch instruction, its target address is recorded along with a prediction about whether the branch will be taken.

Slide 37/86: Dynamic branch prediction, continued
When a branch instruction is encountered, the branch prediction circuit can quickly supply a guess for the next PC value, and instruction fetch can occur without delay.
If a guess is wrong, some instructions will have to be cancelled, and clock cycles will be lost.
This system is called dynamic because a taken/not-taken prediction will be changed if it has recently been more often wrong than right.

Slide 38/86: Branch prediction code example
p and past_last are of type int*. count is an int.
    do {
        if (*p < 0)
            count++;
        p++;
    } while (p != past_last);
p walks through an array of int elements, and count records how many of those elements are negative.

Slide 39/86: Branch prediction code example, continued
Let's suppose that there are a lot of array elements, and most of them are negative...
    L1: lw    $t0, ($a0)
        slt   $t1, $t0, $zero     # $t1 = (*p < 0)
        beq   $t1, $zero, L2      # branch if !(*p < 0)
        addiu $t9, $t9, 1         # count++
    L2: addiu $a0, $a0, 4         # p++
        bne   $a0, $t8, L1        # branch if p != past_last
As the processor runs the loop, what predictions will it make about the BEQ and BNE instructions?
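
One way to explore that question is to simulate a predictor on the loop. The sketch below uses a 2-bit saturating counter per branch, which is one common textbook scheme and not necessarily what any particular processor does; the starting state and the class and function names are assumptions made for the illustration.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self):
        self.state = 2  # assumed starting point: weakly "taken"

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


def run_loop(values):
    """Count mispredictions for the BEQ and BNE of the array-scanning loop."""
    beq, bne = TwoBitPredictor(), TwoBitPredictor()
    misses = {"beq": 0, "bne": 0}
    for index, element in enumerate(values):
        beq_taken = not (element < 0)            # BEQ skips count++ when !(*p < 0)
        bne_taken = index != len(values) - 1     # BNE loops back until p == past_last
        for name, predictor, taken in (("beq", beq, beq_taken), ("bne", bne, bne_taken)):
            if predictor.predict() != taken:
                misses[name] += 1
            predictor.update(taken)
    return misses


# Seventeen elements, all negative except one in the middle.
print(run_loop([-1] * 8 + [5] + [-1] * 8))  # {'beq': 2, 'bne': 1}
```

In this run the BEQ predictor settles on "not taken" and the BNE predictor on "taken"; the only misses are the initial BEQ guess, the lone non-negative element, and the final loop exit.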

Slide 40/86: Delayed branch and jump rules
This kind of solution to control hazards is older and less sophisticated than branch prediction.
This is a feature of the real MIPS instruction set, but is NOT enabled by default in MARS and other MIPS simulators used for education!
The idea is that one instruction of useful work can get started in the clock cycle needed to make a branch decision and compute a branch or jump target address.
Details are in the two paragraphs at the bottom of the page in the Control Hazard Solutions document.

Slide 41/86: MIPS delayed branch example
What will the flow of instructions be if $t0 != 0? What will it be if $t0 == 0?
        slt operands
        beq $t0, $zero, L1
        add operands
        lw  operands
    L1: or  operands
        sub operands

Slide 42/86: Examples of MIPS delayed jumps
    # C code: i = foo(17); ... suppose i is in $s0.
    jal   foo
    addiu $a0, $zero, 17      # Argument set up after call starts!
    addu  $s0, $v0, $zero     # $ra points to this instruction.

    # Example return from nonleaf procedure...
    jr    $ra
    addiu $sp, $sp, 32        # Deallocate stack after return starts!
If you ever do A.L. programming for real MIPS processors, or need to read MIPS compiler output, be aware of delayed branches and jumps!

Slide 43/86: Conditional instructions
    if (a < b)
        c = a;
    else
        c = b;
Suppose this if-else code is inside a loop. Translating this with a branch and a jump could cause a lot of lost clock cycles, especially if branch prediction does a poor job.
Suppose that a, b, and c are ints in $s0, $s1, $s2. Let's see how this can be coded with MIPS move conditional instructions movn and movz.
By the way, ARM instruction sets have very rich collections of conditional instructions.
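
The branch-free idea can be modelled in a few lines. The sketch below shows one possible coding, not necessarily the one worked out in lecture: each helper mimics a MIPS instruction as a plain function on integers, the matching assembly is given in the comments, and the use of $t0 as a temporary is an assumption.

```python
def slt(rs, rt):
    """SLT: result is 1 if rs < rt, else 0."""
    return 1 if rs < rt else 0


def movn(rd, rs, rt):
    """MOVN rd, rs, rt: rd gets rs if rt != 0; otherwise rd keeps its old value."""
    return rs if rt != 0 else rd


def movz(rd, rs, rt):
    """MOVZ rd, rs, rt: rd gets rs if rt == 0; otherwise rd keeps its old value."""
    return rs if rt == 0 else rd


def select_smaller(a, b, c=0):
    """Model of: if (a < b) c = a; else c = b;  with a, b, c in $s0, $s1, $s2."""
    t0 = slt(a, b)       # slt  $t0, $s0, $s1
    c = movn(c, a, t0)   # movn $s2, $s0, $t0   # c = a when a < b
    c = movz(c, b, t0)   # movz $s2, $s1, $t0   # c = b otherwise
    return c


print(select_smaller(3, 7), select_smaller(7, 3))  # 3 3
```

All three instructions always execute, so there is no branch for the processor to predict.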

Slide 45/86: Making pipelining work in hardware
The textbook presents a sequence of designs, from Figure 7.45 to Figure [...]. The earliest designs are incomplete and incorrect in many ways. Later designs get closer and closer to being complete and correct.
Recommendation: Read Sections [...] through [...] carefully and observe how new features get added and existing features get modified.

Slide 46/86: Remarks on Textbook Figure 7.47
This computer handles R-type, LW, and SW instructions correctly, except when there are data hazards.
It makes an attempt to handle BEQ, but doesn't get it right. This computer works as if three delay-slot instructions should be processed before a branch is taken.

Slide 47/86: D flip-flops: What's the point? (Repeat slide from Slide Set 6)
This is important! Knowing what a D flip-flop does is as important as knowing the truth tables for NOT, AND, and OR.
A clock cycle is a span of time from one active edge of a clock to the next active edge.
A D flip-flop captures the value of the input bit D at the end of a clock cycle, and makes that captured bit value available on Q throughout the next clock cycle.

Slide 48/86: Pipeline registers
Prominent in all of the Section 7.5 designs are pipeline registers made of D flip-flops.
The pipeline registers are not 32 bits wide; they're much wider than that.
They have clock inputs; the register outputs change only on active clock edges.
At the end of each clock cycle, each pipeline register collects information from one pipeline stage, and makes that information available to the next stage throughout the next clock cycle.
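
The capture-then-hold behaviour of pipeline registers can be mimicked in software. The sketch below is a simplified model, not the textbook's circuit: real stages transform the information between registers, whereas here each stage just passes an instruction name along. The class and function names are invented for the illustration.

```python
class PipelineRegister:
    """A register whose output Q changes only on an active clock edge."""

    def __init__(self):
        self.d = None  # value driven onto the input during the current cycle
        self.q = None  # value the next stage sees during the current cycle

    def clock_edge(self):
        self.q = self.d


def step(registers, new_input):
    """Advance a chain of pipeline registers by one clock cycle."""
    # During the cycle, each register's D input is driven by the stage before it.
    inputs = [new_input] + [reg.q for reg in registers[:-1]]
    for reg, value in zip(registers, inputs):
        reg.d = value
    # At the active edge, every register captures its D input at the same moment.
    for reg in registers:
        reg.clock_edge()
    return [reg.q for reg in registers]


# Four registers standing in for F/D, D/E, E/M, and M/W.
chain = [PipelineRegister() for _ in range(4)]
for name in ["lw", "or", "sw"]:
    print(step(chain, name))
# ['lw', None, None, None]
# ['or', 'lw', None, None]
# ['sw', 'or', 'lw', None]
```

Setting every D input before any register is clocked matters: it models all registers sampling on the same edge, so an instruction advances exactly one stage per cycle.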

Slide 49/86: A sketch of a pipelined datapath
This is essentially textbook Figure 7.46 with the wiring removed to reduce clutter. Note the highlighted pipeline registers!
[Figure: pipelined datapath sketch. F stage: PC, I-Mem, and the +4 adder. D stage: R-File and SignExt. E stage: <<2, ALU, and an adder. M stage: D-Mem. W stage follows. The stages are separated by the F/D, D/E, E/M, and M/W pipeline registers, and seven CLK inputs are marked.]

Slide 50/86: Review: Edge-triggering for pipelined computers in Section 7.5
[Figure: system clock waveform alternating between 1 and 0, with the negative and positive edges annotated.]
Updates to GPRs in the Register File happen in response to negative clock edges.
Updates to PC, Data Memory, and pipeline registers happen in response to positive clock edges.

Slide 51/86: Tracing an instruction through the datapath of Figure 7.46
Let's trace an R-type instruction: SLT $2, $4, $5. We'll assume that this instruction is located at address 0x0040_0030 in Instruction Memory.
For now, we'll look at the datapath only. We'll consider control later, after we have seen the whole datapath.

Slide 52/86: SLT $2, $4, $5 located at 0x0040_0030: F stage
[Figure: F stage datapath. Labels shown: CLK, PC, a multiplexer with inputs 0 and 1, PCPlus4F, PCF, the constant 4 and an adder, I-Mem, the F/D pipeline reg., and PCBranchM (from M stage).]
How many DFFs are there in the F/D register?
What values get written into the F/D register at the end of the Fetch clock cycle of the SLT?

Slide 53/86: SLT $2, $4, $5 located at 0x0040_0030: D stage
[Figure: D stage datapath. Labels shown: CLK, InstrD, instruction bit fields 25:21, 20:16, 15:11, and 15:0, R-File with WE3, SignExt, PCPlus4D, WriteRegW, ResultW, the F/D pipeline reg., and the D/E pipeline reg.]
How many DFFs are there in the D/E register?
What gets into the D/E register at the end of the Decode clock cycle?
What is going on with WriteRegW and ResultW?

Slide 54/86: SLT $2, $4, $5 located at 0x0040_0030: E stage
[Figure: E stage datapath. Labels shown: CLK, D/E pipeline reg., RtE, RdE, SignImmE, PCPlus4E, two multiplexers with inputs 0 and 1, SrcAE, SrcBE, <<2, ALU, an adder, WriteDataE, WriteRegE, and the E/M pipeline reg.]
How many DFFs are there in the E/M register?
For the SLT instruction, what useful information gets written into the E/M register at the end of the Execute clock cycle?
What useful information gets written into the E/M register in the cases of LW, SW and BEQ?

Slide 55/86: SLT $2, $4, $5 located at 0x0040_0030: M stage
[Figure: M stage datapath. Labels shown: CLK, ZeroM, E/M pipeline reg., ALUOutM, WriteDataM, WriteRegM, PCBranchM, D-Mem with WE, and the M/W pipeline reg.]
How many DFFs are there in the M/W register?
For the SLT instruction, what useful information gets written into the M/W register at the end of the Memory clock cycle?
What happens in the M stage for LW, SW, and BEQ?

Slide 56/86: SLT $2, $4, $5 located at 0x0040_0030: W stage
[Figure: W stage datapath. Labels shown: CLK, M/W pipeline reg., ALUOutW, ReadDataW, a multiplexer with inputs 1 and 0, WriteRegW, and ResultW.]
For the SLT instruction, what happens in the Writeback stage? Let's draw part of a schematic to help explain it.
What would be the same and what would be different for an LW instruction in the W stage?

Slide 57/86: Pipelined control for the Figure 7.46 datapath
Perhaps surprisingly, we can use exactly the same control unit that was designed for the single-cycle machine. We can drop the Control Unit into the Decode stage.
However, now we must organize the control signals so that each one arrives at the correct time wherever it is needed on the datapath! For example...
Q1: RegWrite = 1 is generated for LW. When should that value of RegWrite arrive at the R-File?
Q2: MemWrite = 1 is generated for SW. When should that value of MemWrite arrive at D-Mem?
Q3: What general method can we use to get the timing correct for all of the control signals?

Slide 58/86: Control circuit for pipelined datapath of Figure [...]
[Figure: control circuit. The Control Unit takes opcode (Instr 31:26) and funct (Instr 5:0) as inputs and produces RegWriteD, MemtoRegD, MemWriteD, BranchD, ALUControlD, ALUSrcD, and RegDstD. The D/E pipeline register outputs RegWriteE, MemtoRegE, MemWriteE, BranchE, ALUControlE, ALUSrcE, and RegDstE. The E/M pipeline register outputs RegWriteM, MemtoRegM, MemWriteM, and BranchM; BranchM and ZeroM (from ALU) produce PCSrcM. The M/W pipeline register outputs MemtoRegW and RegWriteW, and RegWriteW goes to the R-File. All three pipeline registers have CLK inputs.]
Let's make a few notes about how this circuit works.

Slide 59/86: How much progress have we made so far?
Reminder: processor designs near the beginning of Section 7.5 are incomplete and partly incorrect. Processor designs get better and better as corrections and improvements are made.
The datapath and control system we have just looked at in detail are combined in the textbook in the computer of Figure [...]. That computer can't deal with data hazards and handles BEQ incorrectly.

Slide 61/86: Hardware features to manage data hazards
Let's start by reviewing two of the more complicated kinds of data hazard.
For example #2 of the Hazard Examples document...
    first ADD     F  D  E  M  W
    second ADD       F  D  E  M  W
    SUB                 F  D  E  M  W
Let's illustrate why forwarding by itself won't work for example #4 in Hazard Examples...

Slide 62/86: Hardware for forwarding
This incomplete sketch of an upgraded Execute stage allows a lot of choice for ALU A and B inputs!
[Figure: Execute stage sketch. Labels shown: CLK, ID/EX pipeline register (holding two GPR values and the LW/SW offset), Hazard Unit, ForwardAE, ForwardBE, ALUSrcE, a multiplexer with inputs 0 and 1, ALU inputs A and B, ALU, WriteDataE, ALUOutM, and ResultW.]

Slide 63/86: Hardware for forwarding, continued
Q1: What should the values of ForwardAE and ForwardBE be in the case where no forwarding is needed?
Consider this sequence:
    LW  R8, 0(R4)
    AND R9, R10, R11
    SUB R12, R8, R9
Q2: What should the values of ForwardAE and ForwardBE be when SUB is in the EX stage?
Q3: What inputs does the Hazard Unit need in order to decide correctly on the values of ForwardAE and ForwardBE?

Slide 64/86: Hazard Unit for computer of textbook Figure 7.50
[Figure: Hazard Unit with inputs RsE, RtE, WriteRegM, RegWriteM, WriteRegW, RegWriteW and outputs ForwardAE, ForwardBE.]
What are RsE and RtE, and how are they used by the Hazard Unit?
A complete description of the logic in this version of the Hazard Unit can be found on pages 416 and 418 in the textbook.
Note: The computer of Figure 7.50 properly handles data hazards that can be solved using forwarding only. It is not capable of solving data hazards that require stalls.
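
The forwarding decision made from those inputs can be sketched as a small function. This is an illustration of the usual forwarding rule, not a copy of the textbook's logic equations. The 2-bit encoding (00 = value from the register file, 01 = ResultW, 10 = ALUOutM) is assumed here and should be checked against the textbook figure, and the function name is made up.

```python
def forward_select(source_reg, write_reg_m, reg_write_m, write_reg_w, reg_write_w):
    """Choose the ALU operand source for one source register (RsE or RtE).

    Returns 0b10 to forward ALUOutM, 0b01 to forward ResultW, or 0b00 to use
    the value that was read from the register file. Register 0 is never
    forwarded because it always reads as zero.
    """
    # The instruction in the Memory stage is the most recent producer, so it wins.
    if source_reg != 0 and reg_write_m and source_reg == write_reg_m:
        return 0b10
    if source_reg != 0 and reg_write_w and source_reg == write_reg_w:
        return 0b01
    return 0b00


# LW R8, 0(R4); AND R9, R10, R11; SUB R12, R8, R9, at the moment SUB is in EX:
# AND is in the Memory stage (writes R9) and LW is in the Writeback stage (writes R8).
forward_ae = forward_select(8, write_reg_m=9, reg_write_m=True, write_reg_w=8, reg_write_w=True)
forward_be = forward_select(9, write_reg_m=9, reg_write_m=True, write_reg_w=8, reg_write_w=True)
print(format(forward_ae, "02b"), format(forward_be, "02b"))  # 01 10
```

The same function serves both ALU inputs: call it with RsE to get ForwardAE and with RtE to get ForwardBE.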

Slide 65/86: Hardware for data hazard stalls
This is an example of what is called a load-use data hazard:
    LW  $8, 0($9)
    ADD $16, $17, $8
    SUB $18, $4, $5
We've already seen that a one-cycle stall is needed so that the M stage result of LW can be forwarded to the E stage of ADD.
The need for a stall can be detected in the D stage of ADD.
Let's draw a diagram to show how LW, ADD, and SUB will be processed.

Slide 66/86
To make this work in hardware, we must enhance some of the registers in the system...
- Add an EN (enable) input to the PC. If EN is turned off the PC is frozen and does not update on a positive clock edge.
- Add a similar EN input to the F/D pipeline register.
- Add a CLR (clear) input to the D/E pipeline register. If CLR is turned on, the instruction arriving in the register is converted to a harmless NOP.
These changes are sketched in an incomplete schematic on the next slide...

Slide 67/86
[Figure: incomplete schematic. PC with EN input driven by StallF; F/D register with EN input driven by StallD; D/E register with CLR input driven by FlushE; the extension to the Hazard Unit has inputs RsD, RtD, RtE, and MemtoRegE. PC and both registers have CLK inputs.]
For clarity, the schematic above only shows Hazard Unit inputs and outputs that are used to effect the stall for LW instructions. See textbook Figure 7.53 for a complete schematic.

Slide 68/86
For a complete description of all of the logic used to effect the stall for LW instructions, see pages [...] in the textbook. In lecture, it's really only possible to present a sketch of that logic.
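
The stall decision itself is small enough to sketch. The function below follows the usual load-use rule using the signal names from the schematic (RsD, RtD, RtE, MemtoRegE, StallF, StallD, FlushE). It is an illustration under the assumption that MemtoRegE is 1 only for LW, not a substitute for the textbook's complete logic.

```python
def load_use_stall(rs_d, rt_d, rt_e, mem_to_reg_e):
    """Decide whether the instruction in Decode must wait for a load in Execute.

    A stall is needed when the Execute-stage instruction is a load
    (mem_to_reg_e is true) and its destination register rt_e is one of the
    source registers of the Decode-stage instruction.
    """
    lw_stall = bool(mem_to_reg_e) and (rs_d == rt_e or rt_d == rt_e)
    # Freeze PC and the F/D register, and turn the D/E contents into a NOP.
    return {"StallF": lw_stall, "StallD": lw_stall, "FlushE": lw_stall}


# LW $8, 0($9) is in Execute while ADD $16, $17, $8 is in Decode.
print(load_use_stall(rs_d=17, rt_d=8, rt_e=8, mem_to_reg_e=1))
# One cycle later the Execute stage holds a NOP, so ADD can move on.
print(load_use_stall(rs_d=17, rt_d=8, rt_e=0, mem_to_reg_e=0))
```

The first call reports all three signals as True and the second reports them all as False, which gives the one-cycle stall described for the load-use example.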

Slide 70/86: Hardware changes to manage control hazards
ENCM 369 will NOT cover this material in depth, and there will be NO lab exercises or midterm or final exam questions on it!
The Figure 7.53 processor is excellent regarding data hazards, but handles BEQ instructions poorly: three instructions follow a BEQ into the pipeline before the branch decision gets made.
Why does that happen? The Figure 7.53 processor makes the branch decision in the Memory stage. (Check the location of the AND gate...)

Slide 71/86: Redesign to make branch instructions work better
The processor of Figure 7.56 moves the branch decision from the Memory stage to the Decode stage, and the branch target address generation from the Execute stage to the Decode stage.
So, only one instruction follows BEQ into the pipeline before a branch is taken, which is better, but making the decision in the Decode stage causes new data hazards!

Slide 72/86: Redesign to make branch instructions work better, continued
Example #6 from the Hazard Examples document, with an extra instruction...
    LW  $17, 0($4)
    BEQ $17, $0, some_other_label
    ADD $2, $5, $6
What is needed to get the LW result into the Decode step of BEQ?
If the branch is taken, what should happen to ADD? (Assume that we're designing a computer that does NOT have a delayed branch rule.)

Slide 73/86: Redesign to make branch instructions work better: Remarks
It's hard to process branches with perfect accuracy without losing lots of cycles due to hazards!
Therefore, dynamic branch prediction can save a lot of cycles if most guesses are correct.
Also, conditional instructions such as MIPS movn and movz are sometimes better choices than branch instructions.

Slide 75/86: Exceptions: General Concepts
An exception is an event that changes flow of instructions in a way that is quite different from a branch or jump.
So, obviously, an exception causes a special kind of PC update.
But an exception can also cause a change in privilege: a switch from a user program to operating system kernel software.

Slide 76/86: Privilege: user program vs. kernel

A user program has rights to read and write memory allocated to that program and to read and write registers. That's all it can do by itself, but it can also ask for help from the kernel.

The kernel controls hardware like disks and network interfaces. The kernel has power over all memory in the computer and can start and stop all other programs.

Slide 77/86: Two meanings for "exception"

The concept of an exception in discussion of hardware or assembly language code is NOT THE SAME as the concept of an exception in a high-level language like C++, Java, or Python!

Exception-related keywords in C++: try, catch, throw
Exception-related keywords in Java: try, catch, finally, throw, throws
Exception-related keywords in Python: try, except, raise

Slide 78/86: Two meanings for "exception", continued

High-level language exception: a special kind of jump (possibly involving a return through one or more procedure calls) to code that is set up to handle an error condition.

Do NOT try to connect the above concept to hardware exceptions. If you do, your brain will hurt and your understanding of both kinds of exceptions will be damaged.

Slide 79/86: Exceptions in Hardware and Assembly Language: 3 Main Categories

1. The processor notices that a program has tried to do a bad thing.
2. A program intentionally generates the exception.
3. Interrupts: hardware external to the processor sends a signal to the processor asking for attention.

Slide 80/86: Examples of program-trying-a-bad-thing exceptions

Instruction fetched with opcode that does not make sense to processor ("undefined instruction").
Addition or subtraction of integers resulted in overflow (e.g., MIPS ADD, SUB, ADDI, but not ADDU, SUBU, ADDIU).
Attempt to access memory a program is not permitted to access.
Attempt to access memory with invalid address (e.g., LW data address is not a multiple of 4).

(Note: Memory units in Chapter 7 computers don't have the capability to report memory access errors, but memory systems in real computers usually do.)
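Two of the conditions above are simple enough to state as checks. The sketch below only models the conditions themselves, not how a processor raises the exception; the function names `add_overflows` and `lw_address_ok` are invented for this illustration.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def add_overflows(a, b):
    """True if a + b does not fit in a 32-bit two's-complement integer.

    a and b are assumed to already be valid 32-bit signed values. This is
    the condition under which ADD or ADDI would cause an exception; ADDU
    and ADDIU simply keep the low 32 bits of the sum instead.
    """
    return not (INT32_MIN <= a + b <= INT32_MAX)


def lw_address_ok(address):
    """True if the address is a multiple of 4, as LW requires."""
    return address % 4 == 0


if __name__ == "__main__":
    print(add_overflows(INT32_MAX, 1))   # largest int plus one
    print(add_overflows(-1, 1))          # ordinary addition
    print(lw_address_ok(0x00400090))     # word-aligned
    print(lw_address_ok(0x10010002))     # not word-aligned
```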

Slide 81/86: Programs intentionally causing exceptions

This mainly happens with system calls.

Examples: MIPS syscall instruction, similar instructions in other instruction sets.

A user program asks the operating system kernel to provide a service.

Slide 82/86: Examples of Interrupts

Laptop user presses key on keyboard.
Desktop user moves a mouse.
Smartphone or tablet user taps finger on a touchscreen.
A data packet arrives on a network interface.
A disk controller reports that a write operation on a disk has completed.

Slide 83/86: What happens when an exception occurs?

The processor will start executing instructions that form an exception handler (like a procedure, but not exactly the same).

Before starting the exception handler, the processor must record some essential information in some special-purpose registers...

Let's make some notes about this essential information.

Slide 84/86: Exceptions and pipelines

Due to time limitations and lack of textbook support, we will not look in detail at this topic, just give a quick sketch.

Useful terms: exception victim and flushing.

The victim of an exception is either the instruction that caused the exception or, when there is an interrupt, the first instruction in the pipeline that will not be allowed to complete.

To flush an instruction in a pipeline means ensuring that the instruction does not update system state, such as register file or memory contents.

Slide 85/86: Exceptions and pipelines: key challenges

Instructions that enter a pipeline before the victim must be allowed to complete.

The victim and the instructions that followed the victim in the pipeline must be flushed.

The address of the victim must be identified. This is NOT easy, because in a pipelined system, the PC probably will NOT be pointing to the victim.

Slide 86/86: Example of MIPS exception processing

Suppose there is an exception when the LW instruction (at address 0x0040_0090) is in the Memory stage. What should happen?

Scenario 1: Exception is caused by $t0 not being a multiple of 4.
Scenario 2: Exception is an interrupt, unrelated to this program.

    # Code running in a
    # 5-stage pipeline, an
    # actual MIPS computer,
    # not a Ch. 7 machine!
    andi   $t2, $s4, 0xFF
    sll    $t3, $t2, 8
    or     $s2, $s2, $t3
    lw     $t1, ($t0)
    addiu  $t0, $t0, 4
    sw     $t1, ($s0)
    addiu  $s0, $s0, 4
    slt    $t4, $t0, $s7
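As a starting point for Scenario 1, the following sketch works out which instruction is in which stage and applies the "complete before the victim, flush from the victim on" rule from the previous slide. It assumes the instructions sit at consecutive 4-byte addresses around the stated LW address and that nothing in this stretch of code has stalled; the function name `snapshot` and the tuple layout are made up for this illustration.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]

# Addresses are an assumption derived from LW being at 0x00400090.
PROGRAM = [
    (0x00400084, "andi  $t2, $s4, 0xFF"),
    (0x00400088, "sll   $t3, $t2, 8"),
    (0x0040008C, "or    $s2, $s2, $t3"),
    (0x00400090, "lw    $t1, ($t0)"),
    (0x00400094, "addiu $t0, $t0, 4"),
    (0x00400098, "sw    $t1, ($s0)"),
    (0x0040009C, "addiu $s0, $s0, 4"),
    (0x004000A0, "slt   $t4, $t0, $s7"),
]


def snapshot(program, victim_addr, victim_stage="Memory"):
    """Return (in_flight, complete, flushed) for a given victim.

    Assumes no stalls, so consecutive instructions occupy consecutive
    stages, with younger instructions in earlier stages.
    """
    v = [addr for addr, _ in program].index(victim_addr)
    s = STAGES.index(victim_stage)
    in_flight = {}
    for i, stage in enumerate(STAGES):
        idx = v + (s - i)
        if 0 <= idx < len(program):
            in_flight[stage] = program[idx]
    # Older than the victim: allowed to complete.
    complete = [in_flight[st] for st in STAGES[s + 1:] if st in in_flight]
    # The victim first, then everything younger: flushed.
    flushed = [in_flight[st] for st in reversed(STAGES[:s + 1])
               if st in in_flight]
    return in_flight, complete, flushed


if __name__ == "__main__":
    in_flight, complete, flushed = snapshot(PROGRAM, 0x00400090)
    for stage in STAGES:
        addr, text = in_flight[stage]
        print(f"{stage:<10} 0x{addr:08X}  {text}")
    print("complete:", [hex(a) for a, _ in complete])
    print("flushed: ", [hex(a) for a, _ in flushed])
```

Under these assumptions the instruction in Fetch is at 0x0040009C, not 0x00400090, which illustrates why the PC does not point to the victim. For Scenario 2, which instruction is treated as the victim is a design decision, so a different `victim_addr` and `victim_stage` can be passed in to explore the alternatives.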
