Slide Set 7 for Lecture Section 01
1 Slide Set 7 for Lecture Section 01 for ENCM 369 Winter 2017 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary February 2017
2 ENCM 369 Winter 2017 Slide Set 7 for Lecture Section 01 slide 2/86
Contents
- The multicycle processor (textbook Section 7.4)
- Introduction to Pipelining
- 5 pipeline stages for our MIPS subset
- Pipeline Hazards
- Making pipelining work in hardware
- Hardware features to manage data hazards
- Hardware changes to manage control hazards
- Exceptions
4 The multicycle processor (textbook Section 7.4)
ENCM 369 will not cover Section 7.4 in detail, because terms at Canadian universities are short! That's too bad, because the multicycle design has some interesting aspects...
- It shows how a computer can use a single memory array for both instructions and data.
- It makes very efficient use of the ALU: the ALU gets used to compute three different results for every instruction.
- The control unit is sequential: it's a really nice and practical example of a finite state machine (FSM).
6 Introduction to Pipelining
Before we start to learn about pipelining, let's review a model we will call the one-instruction-at-a-time model:
Step 1: Processor reads instruction from memory and updates PC.
Step 2: Processor executes the instruction.
The processor performs Step 1, Step 2, Step 1, Step 2, ..., forever (or until the power is turned off).
This model correctly predicts the results produced by sequences of instructions in assembly language code. Also, the model accurately describes the organization of the processors of textbook Sections 7.3 and 7.4.
7 The one-instruction-at-a-time model and modern processors
The model DOES NOT accurately describe the organization of modern processors! At a given moment in time, a modern processor will be working on many different instructions; this allows much greater speed than one-instruction-at-a-time processing. However, the processor must produce results as if instructions were being handled one at a time.
8 Remark: Your instructor thinks that "as if" is a very short and very useful summary of many of the important ideas related to modern computer system designs. Modern processor chips often process instructions in ways that are hard for humans to understand, but nevertheless do what skilled coders want in time- and energy-efficient ways.
9 The Laundry Analogy
This analogy is taken from Computer Organization and Design, by David Patterson and John Hennessy, which was the ENCM 369 textbook for many years. You have many loads of laundry to do, with these four resources:
- a washing machine
- a dryer
- a folding unit (you)
- a putting-away unit (your roommate)
(In real life not very many students would ask their roommates to put away laundry for them, but let's just follow Patterson and Hennessy here.)
10 The Laundry Analogy, continued
Let's assume that each step in processing laundry takes 30 minutes. (In real life, this is close to correct for washers but unfortunately not at all correct for dryers.) Suppose you have four loads of dirty laundry. If you process each load completely before starting the next, how long does it take to finish all four loads?
11 Processing four loads of laundry, one at a time...
[Timing chart, 6:00pm to 2:00am. W = wash, D = dry, F = fold, PA = put away.]
Load  Steps      Start     Finish
1st   W D F PA   6:00pm    8:00pm
2nd   W D F PA   8:00pm    10:00pm
3rd   W D F PA   10:00pm   midnight
4th   W D F PA   midnight  2:00am
The work takes EIGHT HOURS in total! And each resource (washer, dryer, etc.) is IDLE for three-quarters of the time. There is an obvious way to speed this up...
12 Processing four loads of laundry, making better use of resources...
As soon as one load is out of the washer, the washer is free for the next load. The same is true for all of the other resources. So we can schedule the work this way...
[Timing chart on the same 6:00pm to 2:00am axis; each column is a 30-minute slot.]
Load  6:00  6:30  7:00  7:30  8:00  8:30  9:00
1st   W     D     F     PA
2nd         W     D     F     PA
3rd               W     D     F     PA
4th                     W     D     F     PA
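The two laundry schedules can be compared with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes every step takes the same time and that nothing ever has to wait.

```python
def sequential_minutes(loads, stages, stage_minutes):
    """Total time when each load finishes every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    """Total time when a new load enters the first stage every stage_minutes."""
    if loads == 0:
        return 0
    # The first load needs `stages` slots; each later load adds one more slot.
    return (stages + loads - 1) * stage_minutes


if __name__ == "__main__":
    print("one at a time:", sequential_minutes(4, 4, 30), "minutes")
    print("pipelined:    ", pipelined_minutes(4, 4, 30), "minutes")
```

With four loads, four stages, and 30-minute steps, the one-at-a-time schedule takes 480 minutes and the pipelined schedule takes 210 minutes, so the last load is put away at 9:30pm instead of 2:00am.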
13 The concept of pipelining in digital logic design
A pipelined system is a collection of stages, each with a simple role to perform. When a stage is finished producing its current output, it can pass that output to the next stage and receive new input. In the laundry analogy, the washer stage receives a load of dirty clothes as input, and produces a load of wet, clean clothes as output, which gets passed as input to the dryer stage. In Harris and Harris, this year's textbook, pipelining is introduced in Section 3.6, along with an analogy to baking cookies. That section is short and worth reading.
14 Pipelined execution of instructions
In a pipelined processor, an instruction is like a single load of laundry. Processing an instruction can start long before processing of the preceding instruction is finished. To divide the work of processing an instruction across a number of pipeline stages, that work has to be broken down into simple steps that take roughly equal amounts of time. Each step must fit into a single clock cycle.
16 5 pipeline stages for our MIPS subset
The subset is: ADD, SUB, SLT, AND, OR, LW, SW, BEQ. The stages are:
- Fetch: Read instruction from I-Mem and update PC.
- Decode: Determine outputs of Control Unit and read GPRs from R-File.
- Execute: Get a result from the ALU.
- Memory: D-Mem access for loads and stores.
- Writeback: Write to a GPR at the end of a load or an R-type instruction.
In some stages, for some instructions, nothing happens. What are some examples of that?
17 Example sequence of instructions in our 5-stage MIPS subset pipeline
    # This sequence is not practical code,
    # but it makes for a simple example.
    ADD $t2, $t0, $t1
    LW  $t4, ($t3)
    SW  $t5, ($t6)
    SUB $t9, $t7, $t8
Let's suppose we have a 1 GHz clock, so the clock period is 1 ns. How long will it take from the beginning of the ADD instruction to the end of the SUB instruction?
18 Pipelined processing for example instruction sequence
[Pipeline diagram; the time axis runs from 0 ns to 8 ns, and each column is one 1 ns clock cycle starting at the time shown.]
time  0 ns  1 ns  2 ns  3 ns  4 ns  5 ns  6 ns  7 ns
ADD   IF    ID    EX    MEM   WB
LW          IF    ID    EX    MEM   WB
SW                IF    ID    EX    MEM   WB
SUB                     IF    ID    EX    MEM   WB
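The staggered pattern in the pipeline diagram follows a simple rule: instruction number i (counting from 0) is in stage number s during clock cycle i + s. The sketch below, with names invented for this example, builds that schedule and counts the cycles needed. It assumes there are no hazards and no stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def schedule(instructions):
    """Return (name, [cycle index of each stage]) rows, assuming no hazards."""
    return [
        (name, [start + offset for offset in range(len(STAGES))])
        for start, name in enumerate(instructions)
    ]


def total_cycles(n_instructions):
    """Cycles from the first Fetch to the last Writeback, assuming no stalls."""
    if n_instructions == 0:
        return 0
    return n_instructions + len(STAGES) - 1


if __name__ == "__main__":
    for name, cycles in schedule(["ADD", "LW", "SW", "SUB"]):
        cells = ", ".join(f"{stage}@{c} ns" for stage, c in zip(STAGES, cycles))
        print(f"{name:4s} {cells}")
    print(total_cycles(4), "cycles in total")
```

For the four-instruction example and a 1 ns clock period, this gives 8 cycles, matching the 0 ns to 8 ns axis of the diagram.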
19 The single-cycle processor starts one instruction per clock cycle. A pipelined processor also starts one instruction per clock cycle. Why will a pipelined design allow much greater instruction throughput? (The diagram below provides a hint at the answer!)
[Timing diagram with these signals: CLK, PC output, Instruction, main decoder outputs, R-File outputs, ALU decoder outputs, ALU result, D-Mem RD output, $s1 contents.]
20 An example 3-instruction sequence in a pipelined processor
The sequence is...
    lw $t0, 20($t1)
    or $t2, $t3, $t4
    sw $t5, 40($t6)
Let's use the Pipeline Basics handout to track all the steps in processing these instructions.
22 Pipeline Hazards
These can be defined as situations that prevent throughput of one instruction per clock cycle. There are three main kinds: structural hazards, data hazards, and control hazards.
23 Structural Hazards
A structural hazard occurs when a unit within a computer is asked to do two (or more) incompatible things at the same time. Example: In a computer with a single memory unit, the processor can't do the Fetch step of one instruction while also doing the Memory step of an earlier LW or SW instruction.
24 Solution to Structural Hazards
Design the instruction set and hardware so that this kind of hazard does not occur. Example: Have separate Instruction and Data Memories, so Fetch can be simultaneous with Memory of an earlier instruction. (Note: When we get to textbook Chapter 8, we'll see that for modern processors, separation of instructions and data really means having separate caches for instructions and data.)
25 Data Hazards: Are inputs to instructions up-to-date?
Example:
    add $t0, $t1, $t2
    sub $t4, $t3, $t0
The destination of ADD is a source for SUB. The Writeback step of ADD will happen later than the Decode step of SUB, so there is a risk that SUB will use old, wrong data from $t0. Remember, the processor must produce results as if one instruction completes before the next instruction starts!
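A data hazard of this kind can be spotted by comparing each instruction's destination register with the source registers of the instructions right behind it. The sketch below is a rough illustration, not a description of any real hardware: the tuple format and the function name are invented here. The default window of 2 assumes that a register written in Writeback can be read by a Decode step in the same clock cycle; without that assumption the window would be 3. The check also ignores the case where an instruction in between overwrites the same register.

```python
def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) triples.

    program is a list of (dest, sources) tuples; dest is None for
    instructions such as SW and BEQ that write no register.
    """
    found = []
    for i, (dest, _sources) in enumerate(program):
        if dest is None or dest == "$zero":
            continue  # nothing written, or a write to $zero that has no effect
        last = min(i + window, len(program) - 1)
        for j in range(i + 1, last + 1):
            if dest in program[j][1]:
                found.append((i, j, dest))
    return found


if __name__ == "__main__":
    example = [
        ("$t0", ("$t1", "$t2")),  # add $t0, $t1, $t2
        ("$t4", ("$t3", "$t0")),  # sub $t4, $t3, $t0
    ]
    print(raw_hazards(example))
```

For the ADD/SUB pair on this slide, the function reports one hazard on $t0 between instruction 0 and instruction 1.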
26 Control Hazards: What instruction address should be used in the next Fetch?
Example...
        beq $t0, $t1, L1
        and $t4, $t2, $t3
        ... more instructions ...
    L1: lw  $t5, ($t6)
Which instruction should be fetched after BEQ is fetched? AND or LW? The processor will not know until the $t0 == $t1 comparison is done!
27 Assumption about Register File in textbook Section 7.5, related to data hazards
Writes to the Register File occur in the first half of a clock cycle, and reads from the Register File occur in the second half. To enable this behaviour, what choices can be made about flip-flops, Data Memory, and other clocked components? What are the consequences regarding GPR reads and writes that happen within the same clock cycle?
28 Edge-triggering for pipelined computers in Section 7.5
Updates to GPRs in the Register File happen in response to negative clock edges. Updates to PC, Data Memory, and pipeline registers happen in response to positive clock edges.
[Waveform: the system clock alternating between 1 and 0.]
This is NOT applicable to the single-cycle design of Section 7.3 and the multicycle design of Section 7.4!
29 Review: 5 pipeline stages for our MIPS subset
- Fetch: Read instruction from Instruction Memory; do PC = PC + 4.
- Decode: Determine Control Unit outputs appropriate for instruction opcode; copy two GPR values out of Register File.
- Execute: Do computation in ALU.
- Memory: Read or write Data Memory.
- Writeback: Update a GPR in the Register File.
30 Solutions to data hazards, first of three: stalling the pipeline
Example:
    add $t0, $t1, $t2
    sub $t4, $t3, $t0
31 Solutions to data hazards, second of three: forwarding
Example A, from previous slide:
    add $t0, $t1, $t2
    sub $t4, $t3, $t0
Example B:
    lw  $t0, ($t1)
    add $t2, $t2, $t3
    slt $t6, $t0, $t5
32 Solutions to data hazards, third of three: combine stalling and forwarding
Example:
    lw  $t0, ($t1)
    add $t3, $t0, $t2
Can forwarding by itself solve this data hazard?
33 Control Hazards (repeat of earlier example)
What instruction address should be used in the next Fetch step after the Fetch step of a branch instruction? Example...
        beq $t0, $t1, L1
        and $t4, $t2, $t3
        ... more instructions ...
    L1: lw  $t5, ($t6)
Which instruction should be fetched after BEQ is fetched? AND or LW? The processor will not know until the $t0 == $t1 comparison is done!
34 Control hazard illustration
    BEQ               F  D  E  M  W
    next instruction     F  D  E  M  W
Here "next" means next in time, not necessarily next location in Instruction Memory. Why will it be difficult to do the Fetch step for the next instruction just one clock cycle after the Fetch step for BEQ? (There are multiple reasons.)
35 Four kinds of solutions for control hazards
1. Stall: Delay the Fetch step for the next instruction until the address of the next instruction is known.
2. Predict: Guess what the address of the next instruction will be, and act on the guess without delay. Check that the guess was correct; if not, cancel instructions that have incorrectly entered the pipeline.
3. Delayed branch and jump rules
4. Conditional instructions
36 Dynamic branch prediction
This is widely used in modern processors (but mostly not used in low-power embedded processors). A large and complex branch prediction circuit is dedicated to recording information about recently-encountered branch instructions. For each branch instruction, its target address is recorded along with a prediction about whether the branch will be taken.
37 Dynamic branch prediction, continued
When a branch instruction is encountered, the branch prediction circuit can quickly supply a guess for the next PC value, and instruction fetch can occur without delay. If a guess is wrong, some instructions will have to be cancelled, and clock cycles will be lost. This system is called dynamic because a taken/not-taken prediction will be changed if it has recently been more often wrong than right.
38 Branch prediction code example
p and past_last are of type int*. count is an int.
    do {
        if (*p < 0) count++;
        p++;
    } while (p != past_last);
p walks through an array of int elements, and count records how many of those elements are negative.
39 Branch prediction code example, continued
Let's suppose that there are a lot of array elements, and most of them are negative...
    L1: lw    $t0, ($a0)
        slt   $t1, $t0, $zero    # $t1 = (*p < 0)
        beq   $t1, $zero, L2     # branch if !(*p < 0)
        addiu $t9, $t9, 1        # count++
    L2: addiu $a0, $a0, 4        # p++
        bne   $a0, $t8, L1       # branch if p != past_last
As the processor runs the loop, what predictions will it make about the BEQ and BNE instructions?
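One way to experiment with the question about BEQ and BNE is to simulate a very simple predictor. The sketch below uses a 2-bit saturating counter per branch. That is an assumption made for illustration (the slides do not say what scheme a real processor uses), and the class and function names are invented for this example.

```python
class TwoBitPredictor:
    """Hypothetical 2-bit saturating counter: 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, counter=1):
        self.counter = counter

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes):
    """Run one predictor over a list of taken/not-taken outcomes."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    data = [-1, -2, 3, -4, -5, -6]  # mostly negative, as on the slide
    beq_outcomes = [not (x < 0) for x in data]  # BEQ is taken when !(*p < 0)
    bne_outcomes = [True] * (len(data) - 1) + [False]  # BNE is taken until the loop ends
    print("BEQ mispredictions:", count_mispredictions(beq_outcomes))
    print("BNE mispredictions:", count_mispredictions(bne_outcomes))
```

With mostly negative data, this toy predictor quickly settles on "not taken" for BEQ and "taken" for BNE, and the remaining mispredictions come from the occasional non-negative element and from the last pass through the loop.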
40 Delayed branch and jump rules
This kind of solution to control hazards is older and less sophisticated than branch prediction. This is a feature of the real MIPS instruction set, but is NOT enabled by default in MARS and other MIPS simulators used for education! The idea is that one instruction of useful work can get started in the clock cycle needed to make a branch decision and compute a branch or jump target address. Details are in the two paragraphs at the bottom of the page in the Control Hazard Solutions document.
41 MIPS delayed branch example
What will the flow of instructions be if $t0 != 0? What will it be if $t0 == 0?
        slt  operands
        beq  $t0, $zero, L1
        add  operands
        lw   operands
    L1: or   operands
        sub  operands
42 Examples of MIPS delayed jumps
    # C code: i = foo(17); ... suppose i is in $s0.
    jal   foo
    addiu $a0, $zero, 17    # Argument set up after call starts!
    addu  $s0, $v0, $zero   # $ra points to this instruction.
    # Example return from nonleaf procedure...
    jr    $ra
    addiu $sp, $sp, 32      # Deallocate stack after return starts!
If you ever do A.L. programming for real MIPS processors, or need to read MIPS compiler output, be aware of delayed branches and jumps!
43 Conditional instructions
    if (a < b) c = a; else c = b;
Suppose this if-else code is inside a loop. Translating this with a branch and a jump could cause a lot of lost clock cycles, especially if branch prediction does a poor job. Suppose that a, b, and c are ints in $s0, $s1, $s2. Let's see how this can be coded with MIPS move conditional instructions movn and movz. By the way, ARM instruction sets have very rich collections of conditional instructions.
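To see how the if-else can be done without any branch, it helps to model what movn and movz do: movn copies its source to its destination only when the test register is nonzero, and movz copies only when the test register is zero. The sketch below models one possible three-instruction sequence. It is not necessarily the sequence shown in lecture, and the Python function names are invented for this example.

```python
def slt(rs, rt):
    """Model of slt rd, rs, rt: result is 1 if rs < rt, otherwise 0."""
    return 1 if rs < rt else 0


def movn(rd, rs, rt):
    """Model of movn rd, rs, rt: new rd is rs if rt != 0, otherwise rd is unchanged."""
    return rs if rt != 0 else rd


def movz(rd, rs, rt):
    """Model of movz rd, rs, rt: new rd is rs if rt == 0, otherwise rd is unchanged."""
    return rs if rt == 0 else rd


def branch_free_select(a, b, c=0):
    """if (a < b) c = a; else c = b; with a in $s0, b in $s1, c in $s2."""
    t0 = slt(a, b)      # slt  $t0, $s0, $s1
    c = movn(c, a, t0)  # movn $s2, $s0, $t0   (c = a when a < b)
    c = movz(c, b, t0)  # movz $s2, $s1, $t0   (c = b otherwise)
    return c


if __name__ == "__main__":
    print(branch_free_select(3, 7), branch_free_select(7, 3))
```

Every instruction in the sequence is always executed, so there is no branch for the processor to predict.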
45 Making pipelining work in hardware
The textbook presents a sequence of designs, from Figure 7.45 to Figure [number missing in source]. The earliest designs are incomplete and incorrect in many ways. Later designs get closer and closer to being complete and correct. Recommendation: Read Sections [numbers missing in source] carefully and observe how new features get added and existing features get modified.
46 Remarks on Textbook Figure 7.47
This computer handles R-type, LW, and SW instructions correctly, except when there are data hazards. It makes an attempt to handle BEQ, but doesn't get it right. This computer works as if three delay-slot instructions should be processed before a branch is taken.
47 D flip-flops: What's the point? (Repeat slide from Slide Set 6)
This is important! Knowing what a D flip-flop does is as important as knowing the truth tables for NOT, AND, and OR. A clock cycle is a span of time from one active edge of a clock to the next active edge. A D flip-flop captures the value of the input bit D at the end of a clock cycle, and makes that captured bit value available on Q throughout the next clock cycle.
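The capture-then-hold behaviour of a D flip-flop can be modelled in a few lines. The sketch below is a software model only, with invented names. The second function shows why this matters for pipelining: when several flip-flops share a clock, every D input is sampled before any Q output changes, so each value moves exactly one position per clock edge.

```python
class DFlipFlop:
    """Software model of a D flip-flop: Q changes only on an active clock edge."""

    def __init__(self, initial=0):
        self.q = initial

    def clock_edge(self, d):
        # Capture D at the edge; Q then holds this value for the whole next cycle.
        self.q = d


def clock_chain(flops, d_in):
    """Apply one active edge to a chain in which each D input is the previous Q."""
    # Sample every D input first, so all flip-flops see the values from before the edge.
    inputs = [d_in] + [ff.q for ff in flops[:-1]]
    for ff, d in zip(flops, inputs):
        ff.clock_edge(d)


if __name__ == "__main__":
    chain = [DFlipFlop() for _ in range(3)]
    for bit in (1, 0, 0):
        clock_chain(chain, bit)
        print([ff.q for ff in chain])
```

In this run, the 1 that is clocked in first appears one flip-flop further along after each edge.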
48 Pipeline registers
Prominent in all of the Section 7.5 designs are pipeline registers made of D flip-flops. The pipeline registers are not 32 bits wide; they're much wider than that. They have clock inputs; the register outputs change only on active clock edges. At the end of each clock cycle, each pipeline register collects information from one pipeline stage, and makes that information available to the next stage throughout the next clock cycle.
49 A sketch of a pipelined datapath
This is essentially textbook Figure 7.46 with the wiring removed to reduce clutter. Note the highlighted pipeline registers!
[Datapath sketch. Labels, left to right: PC, I-Mem, + 4, F/D pipeline register, R-File, SignExt, D/E pipeline register, <<2, ALU, +, E/M pipeline register, D-Mem, M/W pipeline register. The clocked components have CLK inputs. Stages: F stage, D stage, E stage, M stage, W stage.]
50 Review: Edge-triggering for pipelined computers in Section 7.5
Updates to GPRs in the Register File happen in response to negative clock edges. Updates to PC, Data Memory, and pipeline registers happen in response to positive clock edges.
[Waveform: the system clock alternating between 1 and 0.]
51 Tracing an instruction through the datapath of Figure 7.46
Let's trace an R-type instruction: SLT $2, $4, $5. We'll assume that this instruction is located at address 0x0040_0030 in Instruction Memory. For now, we'll look at the datapath only. We'll consider control later, after we have seen the whole datapath.
52 SLT $2, $4, $5 located at 0x0040_0030: F stage
[Schematic of the F stage. Labels: CLK, a mux with inputs 0 and 1, PC, PCF, PCPlus4F, constant 4, adder, I-Mem, F/D pipeline reg., PCBranchM (from M stage).]
How many DFFs are there in the F/D register? What values get written into the F/D register at the end of the Fetch clock cycle of the SLT?
53 SLT $2, $4, $5 located at 0x0040_0030: D stage
[Schematic of the D stage. Labels: CLK, F/D pipeline reg., InstrD, instruction fields 25:21, 20:16, 15:11, 15:0, R-File with WE3, SignExt, PCPlus4D, WriteRegW, ResultW, D/E pipeline reg.]
How many DFFs are there in the D/E register? What gets into the D/E register at the end of the Decode clock cycle? What is going on with WriteRegW and ResultW?
54 SLT $2, $4, $5 located at 0x0040_0030: E stage
[Schematic of the E stage. Labels: CLK, D/E pipeline reg., RtE, RdE, SignImmE, PCPlus4E, two muxes with inputs 0 and 1, SrcAE, SrcBE, <<2, ALU, adder, WriteDataE, WriteRegE, E/M pipeline reg.]
How many DFFs are there in the E/M register? For the SLT instruction, what useful information gets written into the E/M register at the end of the Execute clock cycle? What useful information gets written into the E/M register in the cases of LW, SW and BEQ?
55 SLT $2, $4, $5 located at 0x0040_0030: M stage
[Schematic of the M stage. Labels: CLK, E/M pipeline reg., ZeroM, ALUOutM, WriteDataM, WriteRegM, PCBranchM, D-Mem with WE, M/W pipeline reg.]
How many DFFs are there in the M/W register? For the SLT instruction, what useful information gets written into the M/W register at the end of the Memory clock cycle? What happens in the M stage for LW, SW, and BEQ?
56 SLT $2, $4, $5 located at 0x0040_0030: W stage
[Schematic of the W stage. Labels: CLK, M/W pipeline reg., ALUOutW, ReadDataW, a mux with inputs 1 and 0, WriteRegW, ResultW.]
For the SLT instruction, what happens in the Writeback stage? Let's draw part of a schematic to help explain it. What would be the same and what would be different for an LW instruction in the W stage?
57 Pipelined control for the Figure 7.46 datapath
Perhaps surprisingly, we can use exactly the same control unit that was designed for the single-cycle machine. We can drop the Control Unit into the Decode stage. However, now we must organize the control signals so that each one arrives at the correct time wherever it is needed on the datapath! For example...
Q1: RegWrite = 1 is generated for LW. When should that value of RegWrite arrive at the R-File?
Q2: MemWrite = 1 is generated for SW. When should that value of MemWrite arrive at D-Mem?
Q3: What general method can we use to get the timing correct for all of the control signals?
58 Control circuit for pipelined datapath of Figure 7.46
[Schematic. Instr 31:26 (opcode) and Instr 5:0 (funct) are inputs to the Control Unit. Its outputs pass through the clocked pipeline registers as follows.]
- Control Unit outputs: RegWriteD, MemtoRegD, MemWriteD, BranchD, ALUControlD, ALUSrcD, RegDstD
- After the D/E pipeline register: RegWriteE, MemtoRegE, MemWriteE, BranchE, ALUControlE, ALUSrcE, RegDstE
- After the E/M pipeline register: RegWriteM, MemtoRegM, MemWriteM, BranchM; together with ZeroM (from ALU) these produce PCSrcM
- After the M/W pipeline register: MemtoRegW, RegWriteW (to R-File)
Let's make a few notes about how this circuit works.
59 How much progress have we made so far?
Reminder: processor designs near the beginning of Section 7.5 are incomplete and partly incorrect. Processor designs get better and better as corrections and improvements are made. The datapath and control system we have just looked at in detail are combined in the textbook in the computer of Figure [number missing in source]. That computer can't deal with data hazards and handles BEQ incorrectly.
61 Hardware features to manage data hazards
Let's start by reviewing two of the more complicated kinds of data hazard. For example #2 of the Hazard Examples document...
    first ADD    F  D  E  M  W
    second ADD      F  D  E  M  W
    SUB                F  D  E  M  W
Let's illustrate why forwarding by itself won't work for example #4 in Hazard Examples...
62 Hardware for forwarding: This incomplete sketch of an upgraded Execute stage allows a lot of choice for ALU A and B inputs!
[Schematic. Labels: CLK, ID/EX pipeline register with two GPR outputs and the LW/SW offset, ForwardAE and ForwardBE from the Hazard Unit, ALUSrcE mux with inputs 0 and 1, ALU inputs A and B, ALU, WriteDataE, ALUOutM, ResultW.]
63 Hardware for forwarding, continued:
Q1: What should the values of ForwardAE and ForwardBE be in the case where no forwarding is needed?
Consider this sequence:
    LW  R8, 0(R4)
    AND R9, R10, R11
    SUB R12, R8, R9
Q2: What should the values of ForwardAE and ForwardBE be when SUB is in the EX stage?
Q3: What inputs does the Hazard Unit need in order to decide correctly on the values of ForwardAE and ForwardBE?
64 Hazard Unit for computer of textbook Figure 7.50
[Block diagram of the Hazard Unit. Signals: RsE, RtE, ForwardAE, ForwardBE, WriteRegM, RegWriteM, WriteRegW, RegWriteW.]
What are RsE and RtE, and how are they used by the Hazard Unit? A complete description of the logic in this version of the Hazard Unit can be found on pages 416 and 418 in the textbook. Note: The computer of Figure 7.50 properly handles data hazards that can be solved using forwarding only. It is not capable of solving data hazards that require stalls.
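The forwarding decision can be sketched as a small function of the Hazard Unit signals named on this slide. This is a hedged sketch rather than a copy of the textbook logic: the Python function name and the 2-bit encoding (0b00 for the register file value, 0b01 for ResultW, 0b10 for ALUOutM) are assumptions, so check them against the textbook pages cited above.

```python
def forward_select(src_e, write_reg_m, reg_write_m, write_reg_w, reg_write_w):
    """Choose the source for one ALU input (use RsE for ForwardAE, RtE for ForwardBE).

    Assumed encoding: 0b00 = value read from the register file,
    0b01 = ResultW (Writeback stage), 0b10 = ALUOutM (Memory stage).
    """
    # Register 0 is never forwarded: it always reads as zero.
    if src_e != 0 and reg_write_m and src_e == write_reg_m:
        return 0b10  # the newest value is in the Memory stage, so it takes priority
    if src_e != 0 and reg_write_w and src_e == write_reg_w:
        return 0b01
    return 0b00


if __name__ == "__main__":
    # Cycle in which SUB R12, R8, R9 is in Execute:
    # AND R9, ... is in Memory (WriteRegM = 9) and LW R8, ... is in Writeback (WriteRegW = 8).
    forward_ae = forward_select(8, 9, True, 8, True)
    forward_be = forward_select(9, 9, True, 8, True)
    print(format(forward_ae, "02b"), format(forward_be, "02b"))
```

Checking the Memory stage before the Writeback stage matters when both hold a result for the same register, because the Memory-stage instruction is the more recent one.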
65 Hardware for data hazard stalls
This is an example of what is called a load-use data hazard:
    LW  $8, 0($9)
    ADD $16, $17, $8
    SUB $18, $4, $5
We've already seen that a one-cycle stall is needed so that the M stage result of LW can be forwarded to the E stage of ADD. The need for a stall can be detected in the D stage of ADD. Let's draw a diagram to show how LW, ADD, and SUB will be processed.
66 To make this work in hardware, we must enhance some of the registers in the system...
- Add an EN (enable) input to the PC. If EN is turned off, the PC is frozen and does not update on a positive clock edge.
- Add a similar EN input to the F/D pipeline register.
- Add a CLR (clear) input to the D/E pipeline register. If CLR is turned on, the instruction arriving in the register is converted to a harmless NOP.
These changes are sketched in an incomplete schematic on the next slide...
67 [Schematic. StallF controls the EN input of the PC; StallD controls the EN input of the F/D register; FlushE controls the CLR input of the D/E register. RsD, RtD, RtE, and MemtoRegE are inputs to the extension to the Hazard Unit. PC, F/D register, and D/E register all have CLK inputs.]
For clarity, the schematic above only shows Hazard Unit inputs and outputs that are used to effect the stall for LW instructions. See textbook Figure 7.53 for a complete schematic.
68 For a complete description of all of the logic used to effect the stall for LW instructions, see pages [numbers missing in source] in the textbook. In lecture, it's really only possible to present a sketch of that logic.
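As a rough companion to that sketch, the stall decision for a load-use hazard can be written as one Boolean expression over the Hazard Unit inputs RsD, RtD, RtE, and MemtoRegE. The Python below is an illustration with an invented function name. It assumes MemtoRegE is 1 only when the instruction in Execute is an LW, and it drives StallF, StallD, and FlushE from the same condition.

```python
def load_use_stall(rs_d, rt_d, rt_e, mem_to_reg_e):
    """Return the StallF, StallD, and FlushE values for the current cycle.

    A stall is needed when the instruction in Execute is a load (MemtoRegE = 1)
    whose destination RtE matches a source register of the instruction in Decode.
    """
    lw_stall = bool(mem_to_reg_e) and (rs_d == rt_e or rt_d == rt_e)
    return {"StallF": lw_stall, "StallD": lw_stall, "FlushE": lw_stall}


if __name__ == "__main__":
    # LW $8, 0($9) is in Execute (RtE = 8) while ADD $16, $17, $8 is in Decode.
    print(load_use_stall(rs_d=17, rt_d=8, rt_e=8, mem_to_reg_e=1))
    # One cycle later, SUB $18, $4, $5 does not depend on the load.
    print(load_use_stall(rs_d=4, rt_d=5, rt_e=8, mem_to_reg_e=1))
```

Freezing the PC and the F/D register keeps ADD in Decode for one extra cycle, while clearing the D/E register sends a NOP down the pipeline in its place.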
70 Hardware changes to manage control hazards
ENCM 369 will NOT cover this material in depth, and there will be NO lab exercises or midterm or final exam questions on it! The Figure 7.53 processor is excellent regarding data hazards, but handles BEQ instructions poorly: three instructions follow a BEQ into the pipeline before the branch decision gets made. Why does that happen? The Figure 7.53 processor makes the branch decision in the Memory stage. (Check the location of the AND gate...)
71 Redesign to make branch instructions work better
The processor of Figure 7.56 moves the branch decision from the Memory stage to the Decode stage, and the branch target address generation from the Execute stage to the Decode stage. So, only one instruction follows BEQ into the pipeline before a branch is taken, which is better, but making the decision in the Decode stage causes new data hazards!
72 Redesign to make branch instructions work better, continued
Example #6 from the Hazard Examples document, with an extra instruction...
    LW  $17, 0($4)
    BEQ $17, $0, some_other_label
    ADD $2, $5, $6
What is needed to get the LW result into the Decode step of BEQ? If the branch is taken, what should happen to ADD? (Assume that we're designing a computer that does NOT have a delayed branch rule.)
73 Redesign to make branch instructions work better: Remarks
It's hard to process branches with perfect accuracy without losing lots of cycles due to hazards! Therefore, dynamic branch prediction can save a lot of cycles if most guesses are correct. Also, conditional instructions such as MIPS movn and movz are sometimes better choices than branch instructions.
75 ENCM 369 Winter 2017 Slide Set 7 for Lecture Section 01 slide 75/86 Exceptions: General Concepts An exception is an event that changes flow of instructions in a way that is quite different from a branch or jump. So, obviously, an exception causes a special kind of PC update. But an exception can also cause a change in privilege a switch from a user program to operating system kernel software.
76 ENCM 369 Winter 2017 Slide Set 7 for Lecture Section 01 slide 76/86 Privilege: user program vs. kernel A user program has rights to read and write memory allocated to that program and to read and write registers. That s all it can do by itself, but it can also ask for help from the kernel. The kernel controls hardware like disks and network interfaces. The kernel has power over all memory in the computer and can start and stop all other programs.
Two meanings for exception
The concept of an exception in discussion of hardware or assembly language code is NOT THE SAME as the concept of an exception in a high-level language like C++, Java, or Python!
Exception-related keywords in C++: try, catch, throw
Exception-related keywords in Java: try, catch, finally, throw, throws
Exception-related keywords in Python: try, except, raise
Two meanings for exception, continued
High-level language exception: a special kind of jump (possibly involving a return through one or more procedure calls) to code that is set up to handle an error condition.
Do NOT try to connect the above concept to hardware exceptions. If you do, your brain will hurt and your understanding of both kinds of exceptions will be damaged.
Exceptions in Hardware and Assembly Language: 3 Main Categories
1. The processor notices that a program has tried to do a bad thing.
2. A program intentionally generates the exception.
3. Interrupts: hardware external to the processor sends a signal to the processor asking for attention.
Examples of program-trying-a-bad-thing exceptions
Instruction fetched with opcode that does not make sense to processor ("undefined instruction").
Addition or subtraction of integers resulted in overflow (e.g., MIPS ADD, SUB, ADDI, but not ADDU, SUBU, ADDIU).
Attempt to access memory a program is not permitted to access.
Attempt to access memory with invalid address (e.g., LW data address is not a multiple of 4).
(Note: Memory units in Chapter 7 computers don't have the capability to report memory access errors, but memory systems in real computers usually do.)
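Two of the checks listed above are simple enough to write down. The following is a rough software model, not a description of any particular hardware: it shows the condition under which a 32-bit signed addition overflows (the case where MIPS ADD raises an exception and ADDU does not) and the alignment test for an LW data address. The function names are invented for this sketch.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def add_overflows(a, b):
    """True if the signed 32-bit sum a + b does not fit in 32 bits."""
    total = to_signed32(a) + to_signed32(b)
    return not (-2**31 <= total <= 2**31 - 1)


def lw_address_misaligned(address):
    """True if a word load from this address is not on a 4-byte boundary."""
    return address % 4 != 0


if __name__ == "__main__":
    # Largest positive 32-bit integer plus one: signed overflow.
    print(add_overflows(0x7FFFFFFF, 1))
    # -1 plus 1 is 0: no signed overflow, even though the bit patterns carry out.
    print(add_overflows(0xFFFFFFFF, 1))
    print(lw_address_misaligned(0x10010002))
```

Note that the second case produces a carry out of bit 31 but is not a signed overflow, which is why overflow and carry have to be treated as different conditions.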
Programs intentionally causing exceptions
This mainly happens with system calls. Examples: MIPS syscall instruction, similar instructions in other instruction sets.
A user program asks the operating system kernel to provide a service.
Examples of Interrupts
Laptop user presses key on keyboard.
Desktop user moves a mouse.
Smartphone or tablet user taps finger on a touchscreen.
A data packet arrives on a network interface.
A disk controller reports that a write operation on a disk has completed.
What happens when an exception occurs?
The processor will start executing instructions that form an exception handler (like a procedure, but not exactly the same).
Before starting the exception handler, the processor must record some essential information in some special-purpose registers...
Let's make some notes about this essential information.
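The slide leaves the details of the recorded information for in-class notes, so the sketch below should be read as one plausible picture rather than the lecture's answer. It assumes MIPS32 naming: an EPC register holding the address of the victim instruction, a Cause register whose ExcCode field (bits 6..2) says why the exception happened, and a general exception handler at 0x80000180. The Cpu class, its kernel_mode flag, and take_exception are hypothetical names for illustration.

```python
from dataclasses import dataclass

# Assumed handler location (the MIPS32 default general exception vector).
HANDLER_ADDRESS = 0x80000180

# A few MIPS32 ExcCode values (assumption: the slides do not list these).
EXC_INTERRUPT = 0
EXC_ADDRESS_ERROR_LOAD = 4
EXC_SYSCALL = 8
EXC_RESERVED_INSTRUCTION = 10
EXC_OVERFLOW = 12


@dataclass
class Cpu:
    pc: int = 0
    epc: int = 0               # where to resume, or which instruction failed
    cause: int = 0             # why the exception happened
    kernel_mode: bool = False  # stand-in for the privilege level


def take_exception(cpu, victim_address, exc_code):
    """Record the essential information, then redirect the PC to the handler."""
    cpu.epc = victim_address
    cpu.cause = exc_code << 2  # ExcCode occupies bits 6..2 of Cause
    cpu.kernel_mode = True     # the handler runs as kernel software
    cpu.pc = HANDLER_ADDRESS


if __name__ == "__main__":
    cpu = Cpu(pc=0x00400094)
    take_exception(cpu, 0x00400090, EXC_ADDRESS_ERROR_LOAD)
    print(hex(cpu.epc), hex(cpu.cause), hex(cpu.pc), cpu.kernel_mode)
```

The handler can later read the saved cause to decide what to do and use the saved address either to report the faulting instruction or to resume the interrupted program.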
Exceptions and pipelines
Due to time limitations and lack of textbook support, we will not look in detail at this topic, just give a quick sketch.
Useful terms: exception victim and flushing.
The victim of an exception is either the instruction that caused the exception or, when there is an interrupt, the first instruction in the pipeline that will not be allowed to complete.
To flush an instruction in a pipeline means ensuring that the instruction does not update system state, such as register file or memory contents.
Exceptions and pipelines: key challenges
Instructions that enter a pipeline before the victim must be allowed to complete.
The victim and the instructions that followed the victim in the pipeline must be flushed.
The address of the victim must be identified (NOT easy, because in a pipelined system, the PC probably will NOT be pointing to the victim).
Example of MIPS exception processing
Suppose there is an exception when the LW instruction (at address 0x0040_0090) is in the Memory stage. What should happen?
Scenario 1: Exception is caused by $t0 not being a multiple of 4.
Scenario 2: Exception is an interrupt, unrelated to this program.
# Code running in a 5-stage pipeline, an actual MIPS computer, not a Ch. 7 machine!
    andi  $t2, $s4, 0xFF
    sll   $t3, $t2, 8
    or    $s2, $s2, $t3
    lw    $t1, ($t0)
    addiu $t0, $t0, 4
    sw    $t1, ($s0)
    addiu $s0, $s0, 4
    slt   $t4, $t0, $s7
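The rules about the victim, completing, and flushing can be tried out on this example with a small model. This is a sketch under assumptions: one instruction per stage, 4-byte instructions (so the other addresses follow from the LW address), and a snapshot taken in the cycle when LW is in the Memory stage. SNAPSHOT and handle_exception are names invented here.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]

# Pipeline contents in the cycle when LW is in the Memory stage.
# Addresses other than LW's are derived by assuming 4-byte instructions.
SNAPSHOT = {
    "Fetch": (0x0040009C, "addiu $s0, $s0, 4"),
    "Decode": (0x00400098, "sw $t1, ($s0)"),
    "Execute": (0x00400094, "addiu $t0, $t0, 4"),
    "Memory": (0x00400090, "lw $t1, ($t0)"),
    "Writeback": (0x0040008C, "or $s2, $s2, $t3"),
}


def handle_exception(pipeline, victim_stage):
    """Split the pipeline into instructions that complete and ones flushed.

    Returns (victim_address, completes, flushed). Instructions in stages
    later than the victim's entered the pipeline earlier, so they complete;
    the victim and everything behind it must not update system state.
    """
    victim_index = STAGES.index(victim_stage)
    completes, flushed = [], []
    for index, stage in enumerate(STAGES):
        instruction = pipeline.get(stage)
        if instruction is None:
            continue  # empty stage (bubble)
        if index > victim_index:
            completes.append(instruction)
        else:
            flushed.append(instruction)
    victim_address = pipeline[victim_stage][0]
    return victim_address, completes, flushed


if __name__ == "__main__":
    # Scenario 1: the misaligned LW itself is the victim.
    address, completes, flushed = handle_exception(SNAPSHOT, "Memory")
    print(hex(address), [text for _, text in completes], len(flushed))
```

For Scenario 1 the victim has to be LW, so only the OR in Writeback completes. For Scenario 2 the choice of victim is a design decision: calling handle_exception(SNAPSHOT, "Execute"), for instance, would let LW finish and make the first ADDIU the victim. Either way, the victim address comes from the victim's own stage, not from the PC, which by then refers to a later instruction.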