Slide Set 7 for ENCM 501 in Winter Term, 2017
Steve Norman, PhD, PEng
Electrical & Computer Engineering
Schulich School of Engineering
University of Calgary
Winter Term, 2017
[Slide 2/56] ENCM 501 W17 Lectures: Slide Set 7
Contents
- ILP: Instruction-Level Parallelism
- Review of simple pipelining
- Pipeline Hazards
- Dynamic branch prediction
- Review of floating-point numbers
- Fitting FP operations into the 5-stage pipeline
- In-order versus out-of-order
[Slide 4/56] ILP: Instruction-Level Parallelism
ILP is a general term for enhancing instruction throughput within a single processor core by having multiple instructions in flight at any given time.
Two important forms of ILP are:
- pipelining: each instruction takes several clock cycles to complete, but instructions are started one per clock cycle;
- multiple issue: two or more instructions are started in the same clock cycle.
Modern processors use both pipelining and multiple issue, and use complex sets of related features to try to maximize instruction throughput.
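To see why having several instructions in flight pays off, here is a rough C sketch that counts clock cycles for n instructions in a k-stage design. It assumes an ideal case (no hazards and no stalls of any kind), and the function names are hypothetical, not from the course.

```c
#include <assert.h>

/* Hypothetical sketch: ideal-case cycle counts for n instructions in a
 * k-stage design. Assumption: no hazards and no stalls of any kind. */

/* No overlap: each instruction takes k cycles, one after another. */
long cycles_unpipelined(long n, long k)
{
    assert(n >= 0 && k >= 1);
    return n * k;
}

/* Pipelining: the first instruction needs k cycles to get through,
 * then one more instruction finishes in each following cycle. */
long cycles_pipelined(long n, long k)
{
    assert(n >= 0 && k >= 1);
    return n == 0 ? 0 : k + (n - 1);
}

/* Pipelining plus w-way multiple issue: instructions move through the
 * pipeline in groups of up to w, so ceil(n / w) groups are needed. */
long cycles_multiple_issue(long n, long k, long w)
{
    assert(n >= 0 && k >= 1 && w >= 1);
    long groups = (n + w - 1) / w;   /* ceil(n / w) for n >= 0 */
    return groups == 0 ? 0 : k + (groups - 1);
}
```

For n = 100 and k = 5, these give 500, 104, and (with w = 2) 54 cycles. Real cores fall short of the ideal numbers because of the hazards discussed later in this slide set.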
[Slide 6/56] Review of simple pipelining
Before diving into microarchitectures with multiple pipelines, let's review the design challenges of getting a single pipeline to work fast and correctly.
The basic organization of a pipeline involves:
- pipeline stages: A stage performs some small, simple step as part of handling an instruction. For example, one stage might be responsible for reading GPR values used in an instruction, and another stage might compute memory addresses to be used in loads and stores.
- pipeline registers: At the end of each clock cycle, a pipeline register captures the results produced by a stage, making those results available to the next stage in the next cycle.
[Slide 7/56] First stage of a simple pipeline: IF (instruction fetch)
We'll look at an example pipeline that can handle a few different kinds of MIPS instructions.
The IF stage is responsible for:
- updating the PC register as appropriate;
- reading an instruction from memory;
- copying the instruction into a pipeline register, so the instruction is available to the next stage, called the ID stage.
Despite what we've just learned about memory, we'll pretend that instruction memory is a simple functional unit that can be read within a single clock cycle!
[Slide 8/56] The IF stage
(Figure: the IF stage, up to the IF/ID pipeline register that feeds the ID stage. Labels include the PC, the instruction memory with its address input and instruction output, an adder for the usual PC update, 32-bit connections, and the branch target address and branch decision inputs used for branches.)
In every single clock cycle, the IF stage will dump a new instruction into the IF/ID pipeline register.
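The PC logic in the figure boils down to a two-way choice. A minimal C sketch, assuming 4-byte MIPS32 instructions and treating the branch decision and branch target as inputs computed by a later stage; the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of next-PC selection in the IF stage.
 * branch_taken and branch_target are assumed to come from a later stage,
 * as in the textbook Figure C.22 design described on the next slides. */
uint32_t next_pc(uint32_t pc, bool branch_taken, uint32_t branch_target)
{
    assert(pc % 4 == 0);          /* MIPS32 instructions are 4 bytes, word-aligned */
    uint32_t usual = pc + 4;      /* the "usual PC update" */
    return branch_taken ? branch_target : usual;
}
```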
[Slide 9/56] More stages
This lecture will follow the 5-stage design presented in Section C.3 of the course textbook. The stages are:
- IF, which we've just seen;
- ID: instruction decode and GPR read;
- EX: execute, that is, perform a computation in the ALU (arithmetic/logic unit);
- MEM: access data memory for a load or store;
- WB: writeback, that is, write the result of a load or of an instruction like DADD to a GPR.
Let's sketch out what each of these stages does...
[Slide 10/56] Pipeline stages: IF, ID, EX, MEM, WB
Attention: This slide and others like it will not attempt to describe every detail of a pipeline stage. Instead, each will just explain the general role of a stage.
The ID stage:
- decodes the instruction: finds out what kind of instruction it is, and what its operands are;
- copies two GPR values into the ID/EX register;
- copies an offset into the ID/EX register, in case the offset is needed for a load, store, or branch;
- copies some instruction address information into the ID/EX register, in case that is needed to generate a branch target address.
[Slide 11/56] R-type instructions
R-type is MIPS jargon for instructions such as DADDU, DSUBU, OR, AND, etc.
An R-type instruction performs a simple ALU computation on two GPR values and writes the result to a GPR.
[Slide 12/56] Pipeline stages: IF, ID, EX, MEM, WB
The EX stage performs a computation in the ALU.
- For an R-type instruction, the ALU performs whatever operation is appropriate (add, subtract, AND, OR, etc.), and the result is written into the EX/MEM register.
- For a load or store, the ALU computes a memory address, and the address is written into the EX/MEM register.
- For a branch, the ALU computes a branch target address and makes a branch decision. Both of those results get written into the EX/MEM register.
Attention: The branch instruction handling described on this slide is specific to textbook Figure C.22! We'll look at problems related to that design in the next lecture.
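For loads, stores, and branches, the EX-stage work above is ordinary integer arithmetic. Here is a C sketch assuming the standard MIPS32 conventions: a 16-bit signed offset field, and a branch offset counted in instructions (words) relative to the instruction after the branch. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of EX-stage address and branch computations.
 * Sign-extend a 16-bit immediate without implementation-defined casts. */
static int32_t sign_extend16(uint16_t imm)
{
    return (imm & 0x8000u) ? (int32_t)imm - 0x10000 : (int32_t)imm;
}

/* Load or store: address = base GPR value + sign-extended offset. */
uint32_t effective_address(uint32_t base, uint16_t offset)
{
    return base + (uint32_t)sign_extend16(offset);   /* wraps mod 2^32 */
}

/* Branch target: the offset counts words, measured from the address of
 * the instruction that follows the branch. */
uint32_t branch_target(uint32_t branch_pc, uint16_t offset)
{
    assert(branch_pc % 4 == 0);
    return branch_pc + 4 + ((uint32_t)sign_extend16(offset) << 2);
}

/* Branch decision for BEQ: are the two GPR values equal? */
bool beq_taken(uint32_t rs_value, uint32_t rt_value)
{
    return rs_value == rt_value;
}
```

For example, an offset field of 0xFFFB (that is, -5) in a branch at 0x00400010 gives a target of 0x00400000.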
[Slide 13/56] Pipeline stages: IF, ID, EX, MEM, WB
The MEM stage is mostly for data memory access by loads and stores. Again, we pretend that memory is really simple!
- For an R-type instruction, not much happens. Results are copied from the EX/MEM register to the MEM/WB register.
- For a load, data read from memory gets copied into the MEM/WB register.
- For a store, data memory is updated using an address and data found in the EX/MEM register.
- For a branch, if the decision in EX was to take the branch, the PC gets updated with the branch target address.
Attention, again: The branch instruction handling described on this slide is specific to textbook Figure C.22!
[Slide 14/56] Pipeline stages: IF, ID, EX, MEM, WB
The WB stage is used to update a GPR with the result of an R-type or load instruction.
- For an R-type or load instruction, a GPR is updated, using the appropriate result from the MEM/WB register. It wasn't mentioned before, but the 5-bit number specifying the destination register had to be passed from ID through EX and MEM to get to WB at the same time as the ALU or load result.
- For a store or a branch, nothing happens in WB. Those instructions finish in MEM.
[Slide 15/56] A rough sketch of the 5-stage pipeline
(Figure: stages IF, ID, EX, MEM, and WB separated by the clocked pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB. Labeled components: PC, adder, I-mem, instruction decode, GPRs, ALU, D-mem.)
A lot of detail has been left out, but there's enough here for us to trace processing of LW followed by DSUBU followed by SW.
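A trace like that is usually drawn as a pipeline diagram: one row per instruction, one column per clock cycle. The C sketch below prints such a diagram under the simplifying assumption that there are no stalls, so it ignores any data dependence between the three instructions. All names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch: ideal pipeline diagram for the 5-stage pipeline. */
enum { NUM_STAGES = 5 };
static const char *const STAGE_NAMES[NUM_STAGES] = { "IF", "ID", "EX", "MEM", "WB" };

/* Stage index (0 = IF ... 4 = WB) of instruction i (0-based) in clock
 * cycle c (1-based), or -1 if it is not in the pipeline in that cycle.
 * Assumes no stalls, so instruction i does IF in cycle i + 1. */
int stage_index(int i, int c)
{
    assert(i >= 0 && c >= 1);
    int s = c - 1 - i;
    return (s >= 0 && s < NUM_STAGES) ? s : -1;
}

/* Print one row per instruction and one column per clock cycle. */
void print_pipeline_diagram(const char *const names[], int n)
{
    int last_cycle = n + NUM_STAGES - 1;
    printf("%-8s", "cycle");
    for (int c = 1; c <= last_cycle; c++)
        printf("%-5d", c);
    printf("\n");
    for (int i = 0; i < n; i++) {
        printf("%-8s", names[i]);
        for (int c = 1; c <= last_cycle; c++) {
            int s = stage_index(i, c);
            printf("%-5s", s >= 0 ? STAGE_NAMES[s] : "");
        }
        printf("\n");
    }
}
```

Called with the names LW, DSUBU, and SW, it prints a 7-cycle diagram in which SW reaches WB in cycle 7 (even though a store does nothing useful in WB).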
[Slide 17/56] Pipeline Hazards
If a certain sequence of instructions prevents the usual throughput of one instruction per clock cycle in a simple pipeline, the situation is called a pipeline hazard.
Hazards can be categorized into three main types: structural hazards, data hazards, and control hazards.
[Slide 18/56] Structural hazards
These occur when two instructions want to use the same physical resource at the same time, in incompatible ways.
For example, if the simple 5-stage pipeline had a single memory unit, instead of split instruction and data memories, MEM of an LW or SW instruction would interfere with IF of a later instruction.
Why is access to three GPRs by two different instructions, one in WB and a later one in ID, not a structural hazard?
[Slide 19/56] Structural hazards: solutions
The best solution is to design hardware to avoid structural hazards wherever possible. For example:
- in the simple 5-stage pipeline, use separate instruction and data memories;
- in real pipelines, have separate I-TLBs and D-TLBs, and separate L1 I-caches and D-caches.
For complex pipelines, it may be practically impossible to avoid all structural hazards, so stalls may be required: if two instructions are contending for a resource, one or the other will be delayed by one or more clock cycles.
[Slide 20/56] Data hazards
(We'll use MIPS32 instructions as examples, to match the 32-bit system depicted in textbook Figures C.21 and C.22.)
The most common kind of data hazard is called a RAW hazard; RAW stands for Read-After-Write.

    ADD R8, R9, R10
    SUB R11, R12, R8

For correct processing, SUB must work as if R8 is read by SUB after R8 is written by ADD. (This is where the term RAW comes from.)
Let's draw a pipeline diagram to get a precise understanding of the problem.
[Slide 21/56] More examples of RAW hazards
For the simple 5-stage pipeline, let's find all the RAW hazards in this sequence...

    LW  R8, 0(R4)
    AND R9, R8, R5
    OR  R10, R6, R8
    SLT R11, R8, R7

Remark: The deeper a pipeline is (the more stages it has), the greater the number and complexity of potential RAW hazards.
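Finding RAW hazards by hand gets tedious as sequences grow. The C sketch below automates the search under an assumption the slides have not stated yet: with no forwarding, a consumer conflicts with a producer only if it is 1 or 2 instructions later, because the textbook register file is written in the first half of a cycle and read in the second half. That distance is the `window` parameter; a deeper pipeline would need a larger window, which matches the remark above. The `instr` type and `find_raw_hazards` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical sketch: registers only, no opcodes. */
typedef struct {
    int dest;     /* destination GPR, or -1 if none (e.g. SW, BEQ) */
    int src[2];   /* source GPRs, -1 if unused */
} instr;

/* Report RAW hazards as (producer, consumer) index pairs, where the
 * consumer reads a GPR written by a producer at most `window`
 * instructions earlier. Only the nearest earlier writer of each register
 * counts. Stores up to max_pairs pairs; returns the total number found. */
int find_raw_hazards(const instr *code, int n, int window,
                     int pairs[][2], int max_pairs)
{
    assert(n >= 0 && window >= 1 && max_pairs >= 0);
    int count = 0;
    for (int c = 0; c < n; c++) {
        int producer[2] = { -1, -1 };
        for (int k = 0; k < 2; k++) {
            int r = code[c].src[k];
            if (r <= 0)               /* unused, or R0, which is always 0 */
                continue;
            for (int d = 1; d <= window && c - d >= 0; d++) {
                if (code[c - d].dest == r) {
                    producer[k] = c - d;
                    break;
                }
            }
        }
        for (int k = 0; k < 2; k++) {
            if (producer[k] < 0)
                continue;
            if (k == 1 && producer[1] == producer[0])
                continue;             /* same pair already reported */
            if (count < max_pairs) {
                pairs[count][0] = producer[k];
                pairs[count][1] = c;
            }
            count++;
        }
    }
    return count;
}
```

With window 2, the sequence on this slide yields two hazards (LW with AND, and LW with OR); SLT is far enough behind LW to read R8 from the register file.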
[Slide 22/56] Forwarding
Forwarding is the name given to a technique that can often solve RAW data hazards without losing clock cycles to stalls. (Another name for forwarding is bypassing.)
The essential idea is that if Instruction B depends on the result of Instruction A, Instruction B should not wait for Instruction A to write that result to its destination, but should instead grab that result as soon as it is available.
Let's look at how forwarding helps with this sequence...

    LW  R8, 0(R4)
    AND R9, R8, R5
    OR  R10, R6, R8
    SLT R11, R8, R7
[Slide 23/56] Sketch of forwarding hardware for 5-stage MIPS32
Here is an incomplete schematic for the EX stage...
(Figure: labels include CLK; the ID/EX pipeline register supplying two GPR values and an LW/SW offset; a forward control block with outputs FwdA and FwdB; the two ALU inputs, A and B; data for SW; the ALU result from the EX/MEM register; and the LW or ALU result from the MEM/WB register.)
[Slide 24/56] Forwarding questions
Q1: What should the values of the forward control outputs be in the case where no forwarding is needed?
Consider this sequence:

    LW  R8, 0(R4)
    AND R9, R10, R11
    SUB R12, R8, R9

Q2: What should the values of the forward control outputs be when SUB is in the EX stage?
Q3: What are the inputs to forward control, and how does the forwarding logic work? (We'll give an example or two, not completely specify the logic!)
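One possible partial answer to Q3, written as C: forward control compares each source register number of the instruction in EX with the destination register numbers carried along in the EX/MEM and MEM/WB registers. This is a sketch under simplifying assumptions (only the signals shown here; the type and function names are hypothetical), not a complete specification of the logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical settings of one forwarding mux (FwdA or FwdB). */
typedef enum {
    FWD_NONE,         /* use the GPR value from the ID/EX register */
    FWD_FROM_EX_MEM,  /* use the ALU result in the EX/MEM register */
    FWD_FROM_MEM_WB   /* use the LW or ALU result in the MEM/WB register */
} fwd_sel;

/* What forward control needs to know about an older instruction. */
typedef struct {
    bool writes_gpr;  /* will it write a GPR in WB? */
    int  dest;        /* the GPR it will write */
} older_instr;

/* Select the source for one ALU operand of the instruction in EX, which
 * reads GPR src. The instruction in MEM (EX/MEM) is newer than the one in
 * WB (MEM/WB), so its result takes priority. Assumes a load-use hazard
 * has already been handled by a stall, so a match in EX/MEM is never LW. */
fwd_sel forward_control(int src, older_instr ex_mem, older_instr mem_wb)
{
    assert(src >= 0 && src < 32);
    if (src == 0)
        return FWD_NONE;              /* R0 always reads as zero */
    if (ex_mem.writes_gpr && ex_mem.dest == src)
        return FWD_FROM_EX_MEM;
    if (mem_wb.writes_gpr && mem_wb.dest == src)
        return FWD_FROM_MEM_WB;
    return FWD_NONE;
}
```

For the Q2 sequence, when SUB is in EX, AND is in MEM and LW is in WB, so the sketch selects MEM/WB for R8 (FwdA) and EX/MEM for R9 (FwdB).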
[Slide 25/56] Can forwarding solve all RAW hazards?
Consider this sequence:

    LW  R15, 0(R14)
    ADD R16, R17, R15

Is it possible to solve the hazard by forwarding? If not, what is the most time-efficient way to solve the hazard?
Let's make some general remarks about optimal solutions of RAW data hazards.
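Forwarding alone cannot handle this pair: the loaded value exists only at the end of LW's MEM stage, which is the same cycle as ADD's EX stage. A common fix is a one-cycle stall, detected while ADD is in ID, after which the value can be forwarded from MEM/WB. A C sketch of that check, with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of a load-use check made while an instruction is
 * in ID. ex_is_load / ex_dest describe the instruction now in EX;
 * id_src1 / id_src2 are the GPRs read by the instruction in ID
 * (-1 for an unused operand). Returns true if ID must stall one cycle. */
bool load_use_stall(bool ex_is_load, int ex_dest, int id_src1, int id_src2)
{
    assert(ex_dest >= 0 && ex_dest < 32);
    if (!ex_is_load || ex_dest == 0)  /* a load into R0 gives nothing to wait for */
        return false;
    return id_src1 == ex_dest || id_src2 == ex_dest;
}
```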
[Slide 26/56] Control hazards: Introduction
In a simple pipeline, a control hazard is a difficulty in determining the address to use for the next instruction fetch.
Look at this example, and assume a version of MIPS32 in which the delay slot instruction is not supposed to be completed if the branch is taken:

    L1: LW  R9, 0(R5)
        (instructions in loop body)
        BEQ R8, R0, L1
        OR  R16, R10, R0

In the clock cycle after IF for the BEQ instruction, why is doing IF difficult? (There is more than one reason.)
[Slide 27/56] Control hazards: Not just for conditional branches!
In a conditional branch, there is an obvious motivation to wait for the decision about whether or not to take the branch.
But consider the following unconditional updates to the PC:
- jump within a procedure;
- procedure call;
- procedure return.
Why do these kinds of instructions generate control hazards? How many cycles might be lost due to such a hazard in a 5-stage pipeline like the one we've been looking at?
[Slide 28/56] Old-school solutions to control hazards (1)
Stall as long as necessary to ensure that instruction results are correct.
This obviously makes CPI worse (higher) if programs have lots of conditional branches and unconditional jumps.
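To put a rough number on "makes CPI worse", here is a back-of-the-envelope model in C. The inputs are illustrative assumptions, not data from the slides: if, say, 20% of instructions are branches or jumps and each costs 3 stall cycles (about what it would cost to wait for the Figure C.22 design to update the PC in MEM), CPI rises from 1 to about 1.6.

```c
#include <assert.h>

/* Hypothetical model: a scalar pipeline that stalls on every control
 * hazard and has no other stalls. CPI = ideal CPI of 1 plus the average
 * number of stall cycles per instruction. */
double cpi_with_control_stalls(double control_fraction, double stall_cycles)
{
    assert(control_fraction >= 0.0 && control_fraction <= 1.0);
    assert(stall_cycles >= 0.0);
    return 1.0 + control_fraction * stall_cycles;
}
```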
[Slide 29/56] Old-school solutions to control hazards (2)
Delayed jumps and branches. Because it is very difficult to do IF properly in the cycle immediately following a jump or a taken branch, many ISA designs decreed that the successor to a jump or branch would always be completed before the jump or branch target instruction...

         BEQ R12, R0, L99
         ADD R13, R14, R15    # successor
         (more instructions)
    L99: SUB R8, R9, R10      # branch target
         OR  R16, R8, R0

Real MIPS ISAs (as opposed to some hypothetical MIPS-like ISAs in textbooks and lecture slides) have delayed branches and jumps.
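The rule "the successor always completes before the target" can be modelled with two program counters: one for the instruction being executed now, and one for the instruction after it. This is a hypothetical C sketch, not the course's or any simulator's code; branch outcomes are given as fixed inputs instead of being computed from register values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy instruction: only control flow is modelled. */
typedef struct {
    const char *text;
    bool is_branch;
    bool taken;       /* outcome fixed in advance for this sketch */
    int  target;      /* index of the branch target instruction */
} toy_instr;

/* Run a program with delayed branches and record, in trace, the order in
 * which instruction indices complete. pc is the instruction to execute
 * now; npc is the one after it, so a taken branch only changes what comes
 * after its successor (the delay slot). Returns the trace length. */
int run_delayed(const toy_instr *prog, int n, int *trace, int max_trace)
{
    int pc = 0, npc = 1, count = 0;
    while (pc >= 0 && pc < n && count < max_trace) {
        trace[count++] = pc;
        int after = (prog[pc].is_branch && prog[pc].taken)
                        ? prog[pc].target : npc + 1;
        pc = npc;
        npc = after;
    }
    return count;
}
```

For the example above with BEQ taken, the completion order is BEQ, ADD (the successor), SUB, OR; the skipped "more instructions" never run.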
[Slide 31/56] Dynamic branch prediction
Dynamic branch prediction is the most important current technology for management of control hazards.
A branch prediction circuit is a memory array comparable in size to an L1 I-cache, and somewhat more complex.
A branch prediction circuit records the locations of thousands of recently encountered branches and jumps, along with the addresses of their targets.
For each conditional branch, a branch prediction circuit maintains a few bits of information that can be used to predict whether the branch will be taken or untaken.
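As a much simplified picture of a table that "records the locations of ... branches and jumps, along with the addresses of their targets", here is a direct-mapped table in C, sometimes called a branch target buffer. The table size, indexing, and tag scheme are assumptions for illustration, and the taken/untaken bits are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BTB_ENTRIES 1024u   /* hypothetical size; real designs vary */

typedef struct {
    bool     valid;
    uint32_t branch_pc;     /* full address kept as the tag, for simplicity */
    uint32_t target;
} btb_entry;

static btb_entry btb[BTB_ENTRIES];

/* Low bits of the word address pick the entry (direct-mapped). */
static unsigned btb_index(uint32_t pc)
{
    return (pc >> 2) % BTB_ENTRIES;
}

/* Remember that the branch or jump at pc went to target. */
void btb_record(uint32_t pc, uint32_t target)
{
    btb_entry *e = &btb[btb_index(pc)];
    e->valid = true;
    e->branch_pc = pc;
    e->target = target;
}

/* Address to fetch next: the recorded target if pc is a known branch or
 * jump, otherwise pc + 4. (A real predictor would also consult its
 * taken/untaken bits before using the target; this sketch does not.) */
uint32_t predicted_next_pc(uint32_t pc)
{
    const btb_entry *e = &btb[btb_index(pc)];
    return (e->valid && e->branch_pc == pc) ? e->target : pc + 4;
}
```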
[Slide 32/56] Branch prediction code example
p and past_last are of type int*. count is an int.

    do {
        if (*p < 0)
            count++;
        p++;
    } while (p != past_last);

p walks through an array of int elements, and count records how many of those elements are negative.
[Slide 33/56] Branch prediction code example, continued
Assembly language for a MIPS32-like ISA that does not have delayed branches...

    L1: LW    R8, 0(R4)
        SLT   R9, R8, R0      # R9 = (*p < 0)
        BEQ   R9, R0, L2      # branch if !(*p < 0)
        ADDIU R25, R25, 1     # count++
    L2: ADDIU R4, R4, 4       # p++
        BNE   R4, R24, L1     # branch if p != past_last

Let's suppose that there are a lot of array elements, and most of them are negative. As the processor runs the loop, what predictions will it learn to make about the BEQ and BNE instructions?
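One common way to use "a few bits" per branch is a 2-bit saturating counter. The slides do not say which scheme a real predictor uses, so treat this C sketch as an illustrative assumption; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 2-bit saturating counter: states 0 and 1 predict
 * untaken; states 2 and 3 predict taken. */
typedef struct { unsigned state; } counter2;

bool predict_taken(const counter2 *c)
{
    return c->state >= 2;
}

/* Move one step toward the actual outcome, saturating at 0 and 3. */
void train(counter2 *c, bool taken)
{
    if (taken && c->state < 3)
        c->state++;
    else if (!taken && c->state > 0)
        c->state--;
}

/* Feed one branch's outcome history through a fresh counter and return
 * how many predictions were correct. */
int correct_predictions(const bool *outcomes, int n, unsigned initial_state)
{
    assert(initial_state <= 3 && n >= 0);
    counter2 c = { initial_state };
    int correct = 0;
    for (int i = 0; i < n; i++) {
        if (predict_taken(&c) == outcomes[i])
            correct++;
        train(&c, outcomes[i]);
    }
    return correct;
}
```

With mostly negative data, BEQ (which branches when an element is not negative) settles toward predicting untaken, and the loop-closing BNE toward predicting taken. Starting from state 0, a lone non-negative element costs one BEQ misprediction but only moves the counter to state 1, so the next prediction is still untaken.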
[Slide 34/56] Scalar versus Superscalar
It seems like the right moment to introduce these terms.
A scalar processor core starts no more than one instruction per clock cycle. In some cycles it can't start an instruction, due to a stall caused by a pipeline hazard. All of the pipeline examples so far have been for scalar cores.
A superscalar processor core tries to start two or more instructions per clock cycle. When I start talking about superscalar cores, I will let you know.
[Slide 35/56] A 5-stage pipeline with dynamic branch prediction
Let's review our previous sketch of the 5-stage pipeline, then show how it would be modified to support dynamic branch prediction.
An instruction fetch unit encapsulates a PC, an L1 I-cache, and a branch prediction circuit.
Both sketches are for scalar systems.
[Slide 36/56] A rough sketch of the 5-stage pipeline
These are the pieces we saw previously...
(Figure: the same sketch as slide 15: stages IF, ID, EX, MEM, and WB separated by pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB, with the PC, adder, I-mem, instruction decode, GPRs, ALU, and D-mem.)
[Slide 37/56] 5-stage pipeline with dynamic branch prediction
Note that a monster has moved into the IF stage...
(Figure: the same 5-stage sketch, except that the PC, adder, and I-mem in the IF stage have been replaced by a single instruction fetch unit.)
[Slide 38/56] Scalar performance with dynamic branch prediction
If the branch predictor does a good job, CPI will be very close to 1.
What are two reasons why, for most programs, CPI will be somewhat greater than 1?
[Slide 40/56] A quick, incomplete review of floating-point numbers
A lot of textbook examples use floating-point instructions, so a brief review might be a good idea.
Essentially, floating-point is a base-two version of scientific notation.
Here's an example of scientific notation: The mass of the earth is about 5 970 000 000 000 000 000 000 000 kg, more conveniently written as $5.97 \times 10^{24}$ kg.
[Slide 41/56] Floating-point numbers
Any nonzero real number can be written as
$\text{sign} \times 2^{\text{exponent}} \times (1 + \text{fraction})$,
where sign is $+1$ or $-1$, the exponent is an integer, and $0 \le \text{fraction} < 1.0$.
If we have a finite number of exponent bits, that will limit the magnitude range of the numbers we can represent.
With a finite number of fraction bits, most real numbers can only be approximated, so floating-point representation involves rounding error.
For a computer to work with floating-point numbers, we need a way to organize sign, exponent, and fraction bits into fixed-size chunks...
[Slide 42/56] Bit fields in 64-bit floating-point
(Figure: a 64-bit word divided into a sign bit, 11 exponent bits, and 52 fraction bits.)
Sign bit: 0 for positive, 1 for negative.
Exponent: Uses a bias of $01111111111_{\text{two}} = 1023_{\text{ten}}$. Example bit patterns:
- 01111111111 means the exponent is $0$;
- 01111111110 means the exponent is $-1$;
- 10000000000 means the exponent is $+1$.
[Slide 43/56] Fraction bits
(Figure: the same layout as slide 42: sign bit, 11 exponent bits, 52 fraction bits.)
Fraction bits: Only bits to the right of the binary point are recorded. It is assumed that there is a single 1 bit to the left of the binary point, so that bit need not be recorded.
Example: How is … (base ten) represented? … = … (base two), so the sign, exponent, and fraction are: …
[Slide 44/56] Special bit patterns
In IEEE 754 floating-point formats there are some special bit patterns:
- $\pm 0$ (zero);
- $\pm\infty$;
- NaN: not a number.
For example, in IEEE 754 the result of 1.0/0.0 is $+\infty$, but the result of 0.0/0.0 is NaN.
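The three bit fields and the special patterns can be inspected directly from C. This sketch assumes that `double` is IEEE 754 binary64 with the same byte order as `uint64_t` (true on mainstream platforms); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == 8, "this sketch assumes a 64-bit double");

typedef struct {
    unsigned sign;        /* 1 bit: 0 positive, 1 negative */
    unsigned biased_exp;  /* 11 bits: true exponent + 1023 */
    uint64_t fraction;    /* 52 bits to the right of the binary point */
} fp64_fields;

typedef enum { FPC_ZERO, FPC_INFINITY, FPC_NAN, FPC_OTHER } fp64_class;

/* Hypothetical helper: split a double into its three bit fields. */
fp64_fields fp64_split(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy the raw bits; no type punning */
    fp64_fields f;
    f.sign = (unsigned)(bits >> 63);
    f.biased_exp = (unsigned)((bits >> 52) & 0x7FFu);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}

/* Hypothetical helper: recognize the special bit patterns. */
fp64_class fp64_classify(double x)
{
    fp64_fields f = fp64_split(x);
    if (f.biased_exp == 0x7FFu)                 /* exponent bits all 1s */
        return f.fraction == 0 ? FPC_INFINITY : FPC_NAN;
    if (f.biased_exp == 0 && f.fraction == 0)   /* all 0s, except maybe the sign */
        return FPC_ZERO;
    return FPC_OTHER;   /* ordinary numbers (plus the exponent-all-0s,
                           nonzero-fraction case, which these slides skip) */
}
```

For example, 6.0 is $+1 \times 2^{2} \times (1 + 0.5)$, so the sketch reports sign 0, biased exponent 1025, and a fraction whose only 1 bit is the leftmost of the 52.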
[Slide 45/56] FP multiplication
If A and B are nonzero, then
$A \times B = (\text{sign}_A \times \text{sign}_B) \times 2^{\text{exponent}_A + \text{exponent}_B} \times (1 + \text{fraction}_A) \times (1 + \text{fraction}_B)$
To do an FP multiplication, a logic circuit first has to check that the operands are not zero or other special bit patterns.
If the operands aren't special, the step that costs the most time (and energy) is the 53-bit-by-53-bit integer multiplication for $(1 + \text{fraction}_A) \times (1 + \text{fraction}_B)$.
At the end, there must be rounding, exponent adjustment, and a check for underflow or overflow.
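A small worked example of these steps, with operands chosen so that no rounding is needed: $3 \times 3$. It shows why exponent adjustment is needed: the product of two values of $1 + \text{fraction}$, each in $[1, 2)$, can land in $[2, 4)$.

```latex
\begin{aligned}
3 &= (+1) \times 2^{1} \times (1 + 0.5) \\
3 \times 3 &= (+1)(+1) \times 2^{1+1} \times (1 + 0.5)(1 + 0.5) \\
           &= 2^{2} \times 2.25 \\
           &= 2^{3} \times 1.125 \quad \text{(product} \ge 2\text{, so halve it and add 1 to the exponent)} \\
           &= 2^{3} \times (1 + 0.125) = 9
\end{aligned}
```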
[Slide 46/56] Will FP multiplication fit into a single clock cycle?
No! An example in textbook Section C.5 suggests a latency of 7 clock cycles for FP multiplication.
The same example suggests a latency of 4 clock cycles for FP addition or subtraction, which are easier than FP multiplication, but much more complicated than integer addition or subtraction.
Those numbers are examples. Together, Moore's Law and the ingenuity of circuit designers imply that the latencies of FP arithmetic operations vary from year to year and from one design to another.
[Slide 48/56] Fitting FP operations into the 5-stage pipeline
Actually, this applies to fitting in integer multiplication and integer division as well.
Let's follow the textbook example:
- 7-cycle latency for FP or integer multiplication;
- 4-cycle latency for FP addition;
- 24-cycle latency for FP or integer division. (Note: Division is notoriously hard to do fast in digital logic!)
We are going to have to give up on our nice, easy 1-cycle EX stage in the middle of the 5-stage pipeline!
[Slide 49/56] Let's make some notes about this picture...
(Figure: after IF and ID, an instruction is sent to one of four execution paths before MEM and WB: the integer unit (EX); the FP/integer multiplier (stages M1 through M7); the FP adder (stages A1 through A4); or the FP/integer divider (DIV).)
Image is Figure C.35 from Hennessy, J. L., and Patterson, D. A., Computer Architecture: A Quantitative Approach, 5th ed., © 2012, Elsevier, Inc.
[Slide 50/56] Attention...
The picture on slide 49 makes it clear that the simple 5-stage model used to introduce pipelining hides some important real-world difficulties!
But the picture is still hiding one of the major difficulties in modern computer design. (Well, actually, it's hiding more than one such difficulty.)
What is the most glaring oversimplification in the picture?
[Slide 51/56] Quick overview of MIPS FP instructions
Many versions of the MIPS ISA have …-bit floating-point registers: F0, F2, F4, ..., F30 (note the use of even numbers only for FPRs). (Newer ISA versions have …-bit FPRs.)
F0 is not special. Unlike the GPR R0, F0 is not hard-wired to have a value of 0.0.
[Slide 52/56] Loads, stores, and arithmetic
Loads, stores, and arithmetic are easy to understand. Here is a very short example:

    L.D   F2, 0(R4)     # load
    L.D   F4, 0(R5)     # load
    MUL.D F6, F2, F4    # multiply
    S.D   F6, 0(R7)     # store

Note the use of GPRs for addresses. Remember, memory addresses are integers!
The suffix .D is for double precision. Use .S instead to work with 32-bit single-precision FP numbers.
To understand examples in ENCM 501, we do not need to know the details of instructions for FP comparison, branching on FP comparison results, or converting between integer and FP formats.
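For comparison with the C-to-assembly pairing on slides 32 and 33, here is C source that a compiler for a MIPS-like ISA could turn into this sequence, assuming R4, R5, and R7 hold the addresses a, b, and c. The function name is hypothetical, and actual register choices are up to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: two FP loads, one FP multiply, one FP store. */
void multiply_and_store(const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    *c = *a * *b;
}
```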
[Slide 54/56] In-order versus out-of-order
In-order execution of instructions implies that instructions are processed in the same order that they would be in a hypothetical computer that always completes one instruction before starting the next.
The simple 5-stage pipeline is in-order, even though there are usually 5 instructions in flight within the pipeline. (What about instructions that get into the 5-stage pipeline but get cancelled due to a branch?)
Out-of-order execution implies that the start and completion of instructions are often, but not always, in order.
[Slide 55/56] 5-stage pipeline with variable-length EX stage
This pipeline always starts instructions in order. This is known as in-order issue of instructions.
However, there is a design choice to be made: Should we allow instructions to complete out of order?
- What are the advantages and disadvantages of forcing instruction completion to be in order?
- What are some challenges created by allowing out-of-order completion?
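A quick way to see why the question arises: with in-order issue and the example latencies from slide 48, a later, shorter operation can reach WB before an earlier, longer one. This C sketch assumes one instruction issued per cycle, no stalls of any kind, and no structural conflicts (it ignores, for instance, two instructions reaching WB in the same cycle, or a second divide arriving while the unpipelined divider is busy). The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Example EX latencies from slide 48 (cycles in the execution unit). */
enum { LAT_INT = 1, LAT_FP_ADD = 4, LAT_MUL = 7, LAT_DIV = 24 };

/* Clock cycle (1-based) in which instruction i (0-based) does WB.
 * Assumed timing: IF in cycle i + 1, ID in i + 2, EX from i + 3 through
 * i + 2 + latency, then one cycle each of MEM and WB. */
int wb_cycle(int i, int latency)
{
    assert(i >= 0 && latency >= 1);
    return i + latency + 4;
}

/* True if some instruction would write back before an older one. */
bool completes_out_of_order(const int latencies[], int n)
{
    for (int i = 1; i < n; i++)
        for (int j = 0; j < i; j++)
            if (wb_cycle(i, latencies[i]) < wb_cycle(j, latencies[j]))
                return true;
    return false;
}
```

For an FP multiply, an FP add, and an integer instruction issued in that order, the sketch gives WB cycles 11, 9, and 7: issue is in order, but completion is reversed.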
[Slide 56/56] More about out-of-order execution...
In the next slide set, we'll look in detail at hazards related to out-of-order execution.
Then we'll look at an organization for out-of-order execution called Tomasulo's algorithm, which:
- solves RAW hazards, in a way that is interesting to compare to forwarding;
- solves so-called WAW and WAR hazards, which can't happen with in-order execution;
- deals effectively with variable latencies related to different kinds of arithmetic, and with variable latencies in memory access due to TLB and cache misses.
Slide Set 11 for ENCM 369 Winter 2015 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2015 ENCM 369 W15 Section
More informationCOSC 6385 Computer Architecture - Pipelining
COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage
More informationWhat is Pipelining? Time per instruction on unpipelined machine Number of pipe stages
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationPipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.
Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview
More informationCMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions
CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study Complications With Long Instructions So far, all MIPS instructions take 5 cycles But haven't talked
More informationMinimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline
Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 35: Final Exam Review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Material from Earlier in the Semester Throughput and latency
More informationContents Slide Set 9. Final Notes on Textbook Chapter 7. Outline of Slide Set 9. More about skipped sections in Chapter 7. Outline of Slide Set 9
slide 2/41 Contents Slide Set 9 for ENCM 369 Winter 2014 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2014
More informationComputer Architecture
Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in
More informationData Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard
Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:
More informationSlides for Lecture 6
Slides for Lecture 6 ENCM 501: Principles of Computer Architecture Winter 2014 Term Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary 28 January,
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationAdvanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017
Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation
More informationHY425 Lecture 05: Branch Prediction
HY425 Lecture 05: Branch Prediction Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS October 19, 2011 Dimitrios S. Nikolopoulos HY425 Lecture 05: Branch Prediction 1 / 45 Exploiting ILP in hardware
More informationPipelined Processors. Ideal Pipelining. Example: FP Multiplier. 55:132/22C:160 Spring Jon Kuhl 1
55:3/C:60 Spring 00 Pipelined Design Motivation: Increase processor throughput with modest increase in hardware. Bandwidth or Throughput = Performance Pipelined Processors Chapter Bandwidth (BW) = no.
More informationENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design
ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University
More informationBasic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?
Basic Instruction Timings Pipelining 1 Making some assumptions regarding the operation times for some of the basic hardware units in our datapath, we have the following timings: Instruction class Instruction
More informationCMSC 411 Practice Exam 1 w/answers. 1. CPU performance Suppose we have the following instruction mix and clock cycles per instruction.
CMSC 4 Practice Exam w/answers General instructions. Be complete, yet concise. You may leave arithmetic expressions in any form that a calculator could evaluate.. CPU performance Suppose we have the following
More informationCS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007
CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007 Name: Solutions (please print) 1-3. 11 points 4. 7 points 5. 7 points 6. 20 points 7. 30 points 8. 25 points Total (105 pts):
More informationLECTURE 10. Pipelining: Advanced ILP
LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri
Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many
More informationOutline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception
Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add
More informationPipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction
More informationControl Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.
Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions Stage Instruction Fetch Instruction Decode Execution / Effective addr Memory access Write-back Abbreviation
More informationLaboratory Pipeline MIPS CPU Design (2): 16-bits version
Laboratory 10 10. Pipeline MIPS CPU Design (2): 16-bits version 10.1. Objectives Study, design, implement and test MIPS 16 CPU, pipeline version with the modified program without hazards Familiarize the
More informationECE260: Fundamentals of Computer Engineering
ECE260: Fundamentals of Computer Engineering Pipelined Datapath and Control James Moscola Dept. of Engineering & Computer Science York College of Pennsylvania ECE260: Fundamentals of Computer Engineering
More informationHardware-Based Speculation
Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor
More informationPipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science
Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and
More informationLecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University
Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will
More informationILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)
Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case
More informationCISC 662 Graduate Computer Architecture Lecture 6 - Hazards
CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationCISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation Introduction Pipelining become universal technique in 1985 Overlaps execution of
More informationPipelining. Pipeline performance
Pipelining Basic concept of assembly line Split a job A into n sequential subjobs (A 1,A 2,,A n ) with each A i taking approximately the same time Each subjob is processed by a different substation (or
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 6 Pipelining Part 1 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html
More informationLecture 7 Pipelining. Peng Liu.
Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationEIE/ENE 334 Microprocessors
EIE/ENE 334 Microprocessors Lecture 6: The Processor Week #06/07 : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2009, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/
More informationLecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)
Lecture Topics Today: Data and Control Hazards (P&H 4.7-4.8) Next: continued 1 Announcements Exam #1 returned Milestone #5 (due 2/27) Milestone #6 (due 3/13) 2 1 Review: Pipelined Implementations Pipelining
More informationInstruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4
PROBLEM 1: An application running on a 1GHz pipelined processor has the following instruction mix: Instruction Frequency CPI Load-store 55% 5 Arithmetic 30% 4 Branch 15% 4 a) Determine the overall CPI
More informationENCM 369 Winter 2018 Lab 9 for the Week of March 19
page 1 of 9 ENCM 369 Winter 2018 Lab 9 for the Week of March 19 Steve Norman Department of Electrical & Computer Engineering University of Calgary March 2018 Lab instructions and other documents for ENCM
More informationDEE 1053 Computer Organization Lecture 6: Pipelining
Dept. Electronics Engineering, National Chiao Tung University DEE 1053 Computer Organization Lecture 6: Pipelining Dr. Tian-Sheuan Chang tschang@twins.ee.nctu.edu.tw Dept. Electronics Engineering National
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationAppendix C: Pipelining: Basic and Intermediate Concepts
Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)
More informationEI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)
EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building
More informationComplications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow?
Complications with long instructions CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study So far, all MIPS instructions take 5 cycles But haven't talked
More informationPipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...
CHAPTER 6 1 Pipelining Instruction class Instruction memory ister read ALU Data memory ister write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not
More informationDetermined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version
MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationLECTURE 9. Pipeline Hazards
LECTURE 9 Pipeline Hazards PIPELINED DATAPATH AND CONTROL In the previous lecture, we finalized the pipelined datapath for instruction sequences which do not include hazards of any kind. Remember that
More information14:332:331 Pipelined Datapath
14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate
More informationInteger Multiplication and Division
Integer Multiplication and Division for ENCM 369: Computer Organization Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 208 Integer
More informationInstruction-Level Parallelism and Its Exploitation
Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic
More informationEE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes
NAME: STUDENT NUMBER: EE557--FALL 1999 MAKE-UP MIDTERM 1 Closed books, closed notes Q1: /1 Q2: /1 Q3: /1 Q4: /1 Q5: /15 Q6: /1 TOTAL: /65 Grade: /25 1 QUESTION 1(Performance evaluation) 1 points We are
More informationComputer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationCOMP2611: Computer Organization. The Pipelined Processor
COMP2611: Computer Organization The 1 2 Background 2 High-Performance Processors 3 Two techniques for designing high-performance processors by exploiting parallelism: Multiprocessing: parallelism among
More informationPage # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer
CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationSlide Set 5. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng
Slide Set 5 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary February 2018 ENCM 369 Winter 2018 Section
More informationPipelining. CSC Friday, November 6, 2015
Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the
More informationECE232: Hardware Organization and Design
ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More information