Slide Set 7. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng

Size: px
Start display at page:

Download "Slide Set 7. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng"

Transcription

1 Slide Set 7 for ENCM 501 in Winter Term, 2017 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2017

2 ENCM 501 W17 Lectures: Slide Set 7 slide 2/56 Contents ILP: Instruction-Level Parallelism Review of simple pipelining Pipeline Hazards Dynamic branch prediction Review of floating-point numbers Fitting FP operations into the 5-stage pipeline In-order versus out-of-order

3 ENCM 501 W17 Lectures: Slide Set 7 slide 3/56 Outline of Slide Set 7 ILP: Instruction-Level Parallelism Review of simple pipelining Pipeline Hazards Dynamic branch prediction Review of floating-point numbers Fitting FP operations into the 5-stage pipeline In-order versus out-of-order

4 ENCM 501 W17 Lectures: Slide Set 7 slide 4/56 ILP: Instruction-Level Parallelism ILP is a general term for enhancing instruction throughput within a single processor core by having multiple instructions in flight at any given time. Two important forms of ILP are pipelining: each instruction takes several clock cycles to complete, but instructions are started one per clock cycle multiple issue: two or more instructions are started in the same clock cycle Modern processors use both pipelining and multiple issue, and use complex sets of related features to try to maximize instruction throughput.

5 ENCM 501 W17 Lectures: Slide Set 7 slide 5/56 Outline of Slide Set 7 ILP: Instruction-Level Parallelism Review of simple pipelining Pipeline Hazards Dynamic branch prediction Review of floating-point numbers Fitting FP operations into the 5-stage pipeline In-order versus out-of-order

6 ENCM 501 W17 Lectures: Slide Set 7 slide 6/56 Review of simple pipelining Before diving into microarchitectures with multiple pipelines, let s review the design challenges of getting a single pipeline to work fast and correctly. The basic organization of a pipeline involves pipeline stages: A stage performs some small simple step as part of handling an instruction. For example, one stage might be responsible for reading GPR values used in an instruction, and another stage might compute memory addresses to be used in loads and stores. pipeline registers: At the end of each clock cycle, a pipeline register captures the results produced by a stage, making those results available for the next stage in the next cycle.

7 ENCM 501 W17 Lectures: Slide Set 7 slide 7/56 First stage of a simple pipeline: IF (instruction fetch) We ll look at an example pipeline that can handle a few different kinds of MIPS instructions. The IF stage is responsible for updating the PC register as appropriate reading an instruction from memory and copying the instruction in a pipeline register so the instruction is available to the next stage, called the ID stage. Despite what we ve just learned about memory, we ll pretend that instruction memory is a simple functional unit that can be read within a single clock cycle!

8 ENCM 501 W17 Lectures: Slide Set 7 slide 8/56 branch target address branch decision IF stage CLK ID stage CLK address instruction instruction memory PC 0x add 32 IF/ID 32 usual PC update In every single clock cycle, the IF stage will dump a new instruction into the IF/ID pipeline register.

9 ENCM 501 W17 Lectures: Slide Set 7 slide 9/56 More stages This lecture will follow the 5-stage design presented in Section C.3 of the course textbook. The stages are: IF, which we ve just seen ID: instruction decode and GPR read EX: execute perform computation in ALU (arithmetic/logic unit) MEM: access to data memory for load or store WB: writeback write result of a load or an instruction like DADD to a GPR Let s sketch out what each of these stages do...

10 ENCM 501 W17 Lectures: Slide Set 7 slide 10/56 IF ID EX MEM WB Attention: This slide and others like it will not attempt to describe every detail of a pipeline stage. Instead it will just explain the general role of a stage. The ID stage: decodes the instruction finds out what kind of instruction it is, and what its operands are copies two GPR values into the ID/EX register copies an offset into the ID/EX register, in case the offset is needed for load, store, or branch copies some instruction address information into the ID/EX register, in case that is needed to generate a branch target address

11 ENCM 501 W17 Lectures: Slide Set 7 slide 11/56 R-type instructions R-type is MIPS jargon for instructions such as DADDU, DSUBU, OR, AND, etc. An R-type instruction involves performing some simple ALU computation involving two GPR values, and writing the result to a GPR.

12 ENCM 501 W17 Lectures: Slide Set 7 slide 12/56 IF ID EX MEM WB The EX stage performs a computation in the ALU. For an R-type instruction, the ALU performs whatever operation is appropriate (add, subtract, AND, OR, etc.), and writes the result into the EX/MEM register. For a load or store, the ALU computes a memory address, and writes the address into the EX/MEM register. For a branch, the ALU computes a branch target address and makes a branch decision. Both of those results get written into the EX/MEM register. Attention: The branch instruction handling described on this slide is specific to textbook Figure C.22! We ll look at problems related to that design in the next lecture.

13 ENCM 501 W17 Lectures: Slide Set 7 slide 13/56 IF ID EX MEM WB The MEM stage is mostly for data memory access by loads and stores. Again we pretend that memory is really simple! For an R-type instruction, not much happens. Results are copied from the EX/MEM register to the MEM/WB register. For a load, data read from memory gets copied into the MEM/WB register. For a store, data memory is updated using an address and data found in the EX/MEM register. For a branch, if the decision in EX was to take the branch, the PC gets updated with the branch target address. Attention, again: The branch instruction handling described on this slide is specific to textbook Figure C.22!

14 ENCM 501 W17 Lectures: Slide Set 7 slide 14/56 IF ID EX MEM WB The WB stage is used to update a GPR with the result of an R-type or load instruction. For an R-type or load instruction, a GPR is updated, using the appropriate result from the MEM/WB register. It wasn t mentioned before, but the 5-bit number specifying the destination register had to be passed from ID through EX and MEM to get to WB at the same time as the ALU or load result. For a store or a branch, nothing happens in WB. Those instructions finish in MEM.

15 ENCM 501 W17 Lectures: Slide Set 7 slide 15/56 A rough sketch of the 5-stage pipeline IF ID EX MEM WB CLK CLK CLK CLK CLK I-mem instr. decode CLK ALU D-mem? PC add GPRs IF/ID ID/EX EX/MEM MEM/WB A lot of detail has been left out, but there s enough here for us to trace processing of LW followed by DSUBU followed by SW.

16 ENCM 501 W17 Lectures: Slide Set 7 slide 16/56 Outline of Slide Set 7 ILP: Instruction-Level Parallelism Review of simple pipelining Pipeline Hazards Dynamic branch prediction Review of floating-point numbers Fitting FP operations into the 5-stage pipeline In-order versus out-of-order

17 ENCM 501 W17 Lectures: Slide Set 7 slide 17/56 Pipeline Hazards If a certain sequence of instructions prevents the usual throughput of one instruction for clock cycle in a simple pipeline, the situation is called a pipeline hazard. Hazards can be categorized into three main types: structural hazards, data hazards, and control hazards.

18 ENCM 501 W17 Lectures: Slide Set 7 slide 18/56 Structural hazards These occur when two instructions want to use the same physical resource at the same time, in incompatible ways. For example, if the simple 5-stage pipeline had a single memory unit, instead of split instruction and data memories, MEM of an LW or SW instruction would interfere with IF of a later instruction. Why is access to three GPRs by two different instructions, one in WB and a later one in ID, not a structural hazard?

19 ENCM 501 W17 Lectures: Slide Set 7 slide 19/56 Structural hazards: solutions The best solution is to design hardware to avoid structural hazards wherever possible. For example: in the simple, 5-stage pipeline, use separate instruction and data memories; in real pipelines, have separate I-TLBs and D-TLBs, and separate L1 I-caches and D-caches. For complex pipelines, it may be practically impossible to avoid all structural hazards, so stalls may be required if two instructions are contending for a resource, one or the other will be delayed one or more clock cycles.

20 ENCM 501 W17 Lectures: Slide Set 7 slide 20/56 Data hazards (We ll use MIPS32 instructions as examples, to match the 32-bit system depicted in textbook Figures C.21 and C.22.) The most common kind of data hazard is called a RAW hazard: RAW stands for Read-After-Write. ADD SUB R8, R9, R10 R11, R12, R8 For correct processing, SUB must work as if R8 is read by SUB after R8 is written by ADD. (This is where the term RAW comes from.) Let s draw a pipeline diagram to get a precise understanding of the problem.

21 ENCM 501 W17 Lectures: Slide Set 7 slide 21/56 More examples of RAW hazards For the simple 5-stage pipeline, let s find all the RAW hazards in this sequence... LW AND OR SLT R8, 0(R4) R9, R8, R5 R10, R6, R8 R11, R8, R7 Remark: The deeper a pipeline is (the more stages it has), the greater will be the number and complexity of potential RAW hazards.

22 ENCM 501 W17 Lectures: Slide Set 7 slide 22/56 Forwarding Forwarding is the name given to a technique that can often solve RAW data hazards without loss of clock cycles to stalls. (Another name for forwarding is bypassing.) The essential idea is that if Instruction B depends on the result of Instruction A, Instruction B should not wait for Instruction A to write that result to its destination, but instead grab that result as soon as it is available. Let s look at how forwarding helps with this sequence... LW AND OR SLT R8, 0(R4) R9, R8, R5 R10, R6, R8 R11, R8, R7

23 ENCM 501 W17 Lectures: Slide Set 7 slide 23/56 Sketch of forwarding hardware for 5-stage MIPS32 Here is an incomplete schematic for the EX stage... CLK ID/EX pipeline register GPR GPR LW/SW offset forward control FwdA FwdB ALU data for SW ALU result from EX/MEM reg. LW or ALU result from MEM/WB reg. A B

24 ENCM 501 W17 Lectures: Slide Set 7 slide 24/56 Q1: What should the values of the forward control outputs be in the case where no forwarding is needed? Consider this sequence: LW AND SUB R8, 0(R4) R9, R10, R11 R12, R8, R9 Q2: What should the values of the forward control outputs be when SUB is in the EX stage? Q3: What are the inputs to forward control and how does the forwarding logic work? (We ll give an example or two, not completely specify the logic!)

25 ENCM 501 W17 Lectures: Slide Set 7 slide 25/56 Can forwarding solve all RAW hazards? Consider this sequence: LW ADD R15, 0(R14) R16, R17, R15 Is it possible to solve the hazard by forwarding? If not, what is the most time-efficient way to solve the hazard? Let s make some general remarks about optimal solutions of RAW data hazards.

26 ENCM 501 W17 Lectures: Slide Set 7 slide 26/56 Control hazards: Introduction In a simple pipeline, a control hazard is a difficulty in determining the address to use for the next Instruction Fetch. Look at this example, and assume a version of MIPS32 in which the delay slot instruction is not supposed to be completed if the branch is taken: L1: LW R9, 0(R5) instructions in loop body BEQ OR R8, R0, L1 R16, R10, R0 In the clock cycle after IF for the BEQ instruction, why is doing IF difficult? (There is more than one reason.)

27 ENCM 501 W17 Lectures: Slide Set 7 slide 27/56 Control hazards: Not just for conditional branches! In a conditional branch, there is an obvious motivation to wait for the decision about whether or not to take the branch. But consider the following unconditional updates to the PC: jump within a procedure; procedure call; procedure return. Why do these kinds of instructions generate control hazards? How many cycles might be lost due to such a hazard in a 5-stage pipeline like the one we ve been looking at?

28 ENCM 501 W17 Lectures: Slide Set 7 slide 28/56 Old school solutions to control hazards (1) Stall as long as necessary to ensure that instruction results are correct. This obviously makes CPI worse (higher) if programs have lots of conditional branches and unconditional jumps.

29 ENCM 501 W17 Lectures: Slide Set 7 slide 29/56 Old school solutions to control hazards (2) Delayed jumps and branches. Because it is very difficult to do IF properly in the cycle immediately following a jump or a taken branch, many ISA designs decreed that the successor to a jump or branch would always be completed before the jump or branch target instruction... BEQ R12, R0, L99 ADD R13, R14, R15 # successor more instructions L99: SUB R8, R9, R10 # branch target OR R16, R8, R0 Real MIPS ISAs (as opposed to some hypothetical MIPS-like ISAs in textbooks and lecture slides) have delayed branches and jumps.

30 ENCM 501 W17 Lectures: Slide Set 7 slide 30/56 Outline of Slide Set 7 ILP: Instruction-Level Parallelism Review of simple pipelining Pipeline Hazards Dynamic branch prediction Review of floating-point numbers Fitting FP operations into the 5-stage pipeline In-order versus out-of-order

31 ENCM 501 W17 Lectures: Slide Set 7 slide 31/56 Dynamic branch prediction Dynamic branch prediction is the most important current technology for management of control hazards. A branch prediction circuit is a memory array comparable in size to an L1 I-cache, and somewhat more complex. A branch prediction circuit records the locations of thousands of recently-encountered branches and jumps, along with the addresses of their targets. For each conditional branch, a branch prediction circuit maintains a few bits of information that can be used to predict whether the branch will be taken or untaken.

32 ENCM 501 W17 Lectures: Slide Set 7 slide 32/56 Branch prediction code example p and past_last are of type int*. count is an int. do { if (*p < 0) count++; p++; } while (p!= past_last); p walks through an array of int elements, and count records how many of those elements are negative.

33 ENCM 501 W17 Lectures: Slide Set 7 slide 33/56 Branch prediction code example, continued Assembly language for a MIPS32-like ISA that does not have delayed branch... L1: LW R8, (R4) SLT R9, R0, R8 BEQ R9, R0, L2 # branch if!(*p < 0) ADDIU R25, R25, 1 # count++ L2: ADDIU R4, R4, 4 # p++ BNE R4, R24, L1 # branch if p!= past_last Let s suppose that there are a lot of array elements, and most of them are negative. As the processor runs the loop, what predictions will it learn to make about the BEQ and BNE instructions?

34 ENCM 501 W17 Lectures: Slide Set 7 slide 34/56 Scalar versus Superscalar It seems like the right moment to introduce these terms. A scalar processor core starts no more than one instruction per clock cycle. In some cycles it can t start an instruction, due to a stall caused by a pipeline hazard. All of the pipeline examples so far have been for scalar cores. A superscalar processor core tries to start two or more instructions per clock cycle. When I start talking about superscalar cores, I will let you know.

35 ENCM 501 W17 Lectures: Slide Set 7 slide 35/56 A 5-stage pipeline with dynamic branch prediction Let s review our previous sketch of the 5-stage pipeline, then show how it would be modified to support dynamic branch prediction. An instruction fetch unit encapsulates a PC, an L1 I-cache, and a branch prediction circuit. Both sketches are for scalar systems.

36 ENCM 501 W17 Lectures: Slide Set 7 slide 36/56 A rough sketch of the 5-stage pipeline These are the pieces we saw previously... IF ID EX MEM WB CLK CLK CLK CLK CLK I-mem instr. decode CLK ALU D-mem? PC add GPRs IF/ID ID/EX EX/MEM MEM/WB

37 ENCM 501 W17 Lectures: Slide Set 7 slide 37/56 5-stage pipeline with dynamic branch prediction Note that a monster has moved into the IF stage... IF ID EX MEM WB CLK CLK CLK CLK CLK instr. decode instruction fetch unit CLK ALU D-mem? GPRs IF/ID ID/EX EX/MEM MEM/WB

38 ENCM 501 W17 Lectures: Slide Set 7 slide 38/56 Scalar performance with dynamic branch prediction If the branch predictor does a good job, CPI will be very close to 1. What are two reasons why, for most programs, CPI will be somewhat greater than 1?

39 ENCM 501 W17 Lectures: Slide Set 7 slide 39/56 Outline of Slide Set 7 ILP: Instruction-Level Parallelism Review of simple pipelining Pipeline Hazards Dynamic branch prediction Review of floating-point numbers Fitting FP operations into the 5-stage pipeline In-order versus out-of-order

40 ENCM 501 W17 Lectures: Slide Set 7 slide 40/56 A quick, incomplete review of floating-point numbers A lot of textbook examples use floating-point instructions, so a brief review might be a good idea. Essentially, floating-point is a base two version of scientific notation. Here s an example of scientific notation: The mass of the earth is about kg, more conveniently written as kg.

41 ENCM 501 W17 Lectures: Slide Set 7 slide 41/56 Any nonzero real number can be written as sign 2 exponent (1 + fraction), where the exponent is an integer and 0 fraction < 1.0. If we have a finite number of exponent bits, that will limit the magnitude range of the numbers we can represent. With a finite number of fraction bits, most real numbers can only be approximated floating-point representation involves rounding error. For a computer to work with floating-point numbers, we need a way to organize sign, exponent, and fraction bits into fixed-size chunks...

42 ENCM 501 W17 Lectures: Slide Set 7 slide 42/56 Bit fields in 64-bit floating-point exponent bits sign bit 52 fraction bits Sign bit: 0 for positive, 1 for negative. Exponent: Uses a bias of two = 1023 ten. Example bit patterns: means the exponent is zero; means the exponent is 1; means the exponent is +1.

43 ENCM 501 W17 Lectures: Slide Set 7 slide 43/ exponent bits sign bit 52 fraction bits Fraction bits: Only bits from the right side of the binary point are recorded. It is assumed that there is a single 1 bit to the left of the binary point, so that bit need not be recorded. Example: How is ten represented? = = two sign, exponent, and fraction are:

44 ENCM 501 W17 Lectures: Slide Set 7 slide 44/56 In IEEE 754 floating-point formats there are some special bit patterns: zero + NaN not a number. For example in IEEE 754, the result of 1.0/0.0 is +, but the result of 0.0/0.0 is NaN.

45 ENCM 501 W17 Lectures: Slide Set 7 slide 45/56 FP multiplication If A and B are nonzero, then A B is signa signb 2 (exponenta + exponentb) (1 + fractiona) (1 + fractionb) To do an FP multiplication, a logic circuit first has to check that operands are not zero or other special bit patterns. If the operands aren t special, the step that costs the most time (and energy) is the 53-bit-by-53-bit integer multiplication for (1 + fractiona) (1 + fractionb). At the end, there must be rounding, exponent adjustment, and a check for underflow or overflow.

46 ENCM 501 W17 Lectures: Slide Set 7 slide 46/56 Will FP multiplication fit into a single clock cycle? No! An example in textbook Section C.5 suggests a latency of 7 clock cycles for FP multiplication. The same example suggests a latency of 4 clock cycles for FP addition or subtraction, which are easier than FP multiplication, but much more complicated than integer addition or subtraction. Those numbers are examples. Together, Moore s Law and the ingenuity of circuit designers imply that the latencies of FP arithmetic operations vary from year to year and from one design to another.

47 ENCM 501 W17 Lectures: Slide Set 7 slide 47/56 Outline of Slide Set 7 ILP: Instruction-Level Parallelism Review of simple pipelining Pipeline Hazards Dynamic branch prediction Review of floating-point numbers Fitting FP operations into the 5-stage pipeline In-order versus out-of-order

48 ENCM 501 W17 Lectures: Slide Set 7 slide 48/56 Fitting FP operations into the 5-stage pipeline Actually, this applies to fitting in integer multiplication and integer division as well. Let s follow the textbook example: 7-cycle latency for FP or integer multiplication 4-cycle latency for FP addition 24-cycle latency for FP or integer division (Note: Division is notoriously hard to do fast in digital logic!) We are going to have to give up on our nice, easy 1-cycle EX stage in the middle of the 5-stage pipeline!

49 ENCM 501 W17 Lectures: Slide Set 7 slide 49/56 Let s make some notes about this picture... Integer unit EX M1 FP/integer multiply M2 M3 M4 M5 M6 M7 IF ID MEM WB FP adder A1 A2 A3 A4 FP/integer divider DIV Image is Figure C.35 from Hennessy J. L. and Patterson D. A., Computer Architecture: A Quantitative Approach, 5nd ed., c 2012, Elsevier, Inc.

50 ENCM 501 W17 Lectures: Slide Set 7 slide 50/56 Attention... The picture on slide 49 makes it clear that the simple 5-stage model used to introduce pipelining hides some important real-world difficulties! But the picture is still hiding one of the major difficulties in modern computer design. (Well, actually, it s hiding more than one such difficulty.) What is the most glaring oversimplification in the picture?

51 ENCM 501 W17 Lectures: Slide Set 7 slide 51/56 Quick overview of MIPS FP instructions Many versions of the MIPS ISA have bit floating-point registers: F0, F2, F4,..., F30 note use of even numbers only for FPRs. (Newer ISA versions have bit FPRs.) F0 is not special. Unlike the GPR R0, F0 is not hard-wired to have a value of 0.0.

52 ENCM 501 W17 Lectures: Slide Set 7 slide 52/56 Loads, stores and arithmetic are easy to understand. Here is a very short example: L.D F2, 0(R4) # load L.D F4, 0(R5) # load MUL.D F6, F2, F4 # multiply S.D F6, 0(R7) # store Note the use of GPRs for addresses. Remember, memory addresses are integers! The suffix.d is for double precision. Use.S instead to work with with 32-bit single precision FP numbers. To understand examples in ENCM 501, we do not need to know the details of instructions for FP comparison, branching on FP comparison results, or converting between integer and FP formats.

53 ENCM 501 W17 Lectures: Slide Set 7 slide 53/56 Outline of Slide Set 7 ILP: Instruction-Level Parallelism Review of simple pipelining Pipeline Hazards Dynamic branch prediction Review of floating-point numbers Fitting FP operations into the 5-stage pipeline In-order versus out-of-order

54 ENCM 501 W17 Lectures: Slide Set 7 slide 54/56 In-order versus out-of-order In-order execution of instructions implies that instructions are processed in the same order that they would be in a hypothetical computer that always completes one instruction before starting the next. The simple 5-stage pipeline is in-order, even though there are usually 5 instructions in flight within the pipeline. (What about instructions that get into the 5-stage pipeline but get cancelled due to a branch?) Out-of-order execution implies that start and completion of instructions is often but not always in-order.

55 ENCM 501 W17 Lectures: Slide Set 7 slide 55/56 5-stage pipeline with variable-length EX stage This pipeline always starts instructions in-order. This is known as in-order issue of instructions. However, there is a design choice to be made: Should we allow instructions to complete out-of-order? What are the advantages and disadvantages of forcing instruction completion to be in-order? What are some challenges created by allowing out-of-order completion?

56 ENCM 501 W17 Lectures: Slide Set 7 slide 56/56 More about out-of-order execution... In the next slide set, we ll look in detail about hazards related to out-of-order execution. Then we ll look at an organization for out-of-order execution called Tomasulo s algorithm, which solves RAW hazards, in a way that is interesting to compare to forwarding solves so-called WAW and WAR hazards, which can t happen with in-order execution deals effectively with variable latencies related to different kinds of arithmetic and variable latencies in memory access due TLB and cache misses

Slides for Lecture 15

Slides for Lecture 15 Slides for Lecture 15 ENCM 501: Principles of Computer Architecture Winter 2014 Term Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary 6 March,

More information

Slide Set 8. for ENCM 501 in Winter Steve Norman, PhD, PEng

Slide Set 8. for ENCM 501 in Winter Steve Norman, PhD, PEng Slide Set 8 for ENCM 501 in Winter 2018 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 501 Winter 2018 Slide Set 8 slide

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

Contents. Slide Set 1. About these slides. Outline of Slide Set 1. Typographical conventions: Italics. Typographical conventions. About these slides

Contents. Slide Set 1. About these slides. Outline of Slide Set 1. Typographical conventions: Italics. Typographical conventions. About these slides Slide Set 1 for ENCM 369 Winter 2014 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2014 ENCM 369 W14 Section

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. The Processor Pipeline Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. Pipeline A Basic MIPS Implementation Memory-reference instructions Load Word (lw) and Store Word (sw) ALU instructions

More information

Multi-cycle Instructions in the Pipeline (Floating Point)

Multi-cycle Instructions in the Pipeline (Floating Point) Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Instruction Level Parallelism ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Basic Block A straight line code sequence with no branches in except to the entry and no branches

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

ECE154A Introduction to Computer Architecture. Homework 4 solution

ECE154A Introduction to Computer Architecture. Homework 4 solution ECE154A Introduction to Computer Architecture Homework 4 solution 4.16.1 According to Figure 4.65 on the textbook, each register located between two pipeline stages keeps data shown below. Register IF/ID

More information

Instruction Pipelining Review

Instruction Pipelining Review Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Slide Set 9. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng

Slide Set 9. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng Slide Set 9 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 369 Winter 2018 Section 01

More information

Slide Set 1 (corrected)

Slide Set 1 (corrected) Slide Set 1 (corrected) for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary January 2018 ENCM 369 Winter 2018

More information

Slide Set 8. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng

Slide Set 8. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng Slide Set 8 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 369 Winter 2018 Section 01

More information

Slide Set 5. for ENCM 369 Winter 2014 Lecture Section 01. Steve Norman, PhD, PEng

Slide Set 5. for ENCM 369 Winter 2014 Lecture Section 01. Steve Norman, PhD, PEng Slide Set 5 for ENCM 369 Winter 2014 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2014 ENCM 369 W14 Section

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science Cases that affect instruction execution semantics

More information

#1 #2 with corrections Monday, March 12 7:00pm to 8:30pm. Please do not write your U of C ID number on this cover page.

#1 #2 with corrections Monday, March 12 7:00pm to 8:30pm. Please do not write your U of C ID number on this cover page. page 1 of 6 University of Calgary Department of Electrical and Computer Engineering ENCM 369: Computer Organization Lecture Instructors: Steve Norman and Norm Bartley Winter 2018 MIDTERM TEST #1 #2 with

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4 IC220 Set #9: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life Return to Chapter 4 Midnight Laundry Task order A B C D 6 PM 7 8 9 0 2 2 AM 2 Smarty Laundry Task order A B C D 6 PM

More information

Slide Set 7 for Lecture Section 01

Slide Set 7 for Lecture Section 01 Slide Set 7 for Lecture Section 01 for ENCM 369 Winter 2017 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary February 2017 ENCM 369 Winter

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Chapter 4 The Processor 1. Chapter 4B. The Processor

Chapter 4 The Processor 1. Chapter 4B. The Processor Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always

More information

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31 4.16 Exercises 419 Exercise 4.11 In this exercise we examine in detail how an instruction is executed in a single-cycle datapath. Problems in this exercise refer to a clock cycle in which the processor

More information

Slide Set 11. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

Slide Set 11. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng Slide Set 11 for ENCM 369 Winter 2015 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2015 ENCM 369 W15 Section

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3. Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview

More information

CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions

CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study Complications With Long Instructions So far, all MIPS instructions take 5 cycles But haven't talked

More information

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 35: Final Exam Review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Material from Earlier in the Semester Throughput and latency

More information

Contents Slide Set 9. Final Notes on Textbook Chapter 7. Outline of Slide Set 9. More about skipped sections in Chapter 7. Outline of Slide Set 9

Contents Slide Set 9. Final Notes on Textbook Chapter 7. Outline of Slide Set 9. More about skipped sections in Chapter 7. Outline of Slide Set 9 slide 2/41 Contents Slide Set 9 for ENCM 369 Winter 2014 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2014

More information

Computer Architecture

Computer Architecture Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in

More information

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:

More information

Slides for Lecture 6

Slides for Lecture 6 Slides for Lecture 6 ENCM 501: Principles of Computer Architecture Winter 2014 Term Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary 28 January,

More information

Instruction Pipelining

Instruction Pipelining Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages

More information

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017 Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation

More information

HY425 Lecture 05: Branch Prediction

HY425 Lecture 05: Branch Prediction HY425 Lecture 05: Branch Prediction Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS October 19, 2011 Dimitrios S. Nikolopoulos HY425 Lecture 05: Branch Prediction 1 / 45 Exploiting ILP in hardware

More information

Pipelined Processors. Ideal Pipelining. Example: FP Multiplier. 55:132/22C:160 Spring Jon Kuhl 1

Pipelined Processors. Ideal Pipelining. Example: FP Multiplier. 55:132/22C:160 Spring Jon Kuhl 1 55:3/C:60 Spring 00 Pipelined Design Motivation: Increase processor throughput with modest increase in hardware. Bandwidth or Throughput = Performance Pipelined Processors Chapter Bandwidth (BW) = no.

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions? Basic Instruction Timings Pipelining 1 Making some assumptions regarding the operation times for some of the basic hardware units in our datapath, we have the following timings: Instruction class Instruction

More information

CMSC 411 Practice Exam 1 w/answers. 1. CPU performance Suppose we have the following instruction mix and clock cycles per instruction.

CMSC 411 Practice Exam 1 w/answers. 1. CPU performance Suppose we have the following instruction mix and clock cycles per instruction. CMSC 4 Practice Exam w/answers General instructions. Be complete, yet concise. You may leave arithmetic expressions in any form that a calculator could evaluate.. CPU performance Suppose we have the following

More information

CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007

CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007 CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007 Name: Solutions (please print) 1-3. 11 points 4. 7 points 5. 7 points 6. 20 points 7. 30 points 8. 25 points Total (105 pts):

More information

LECTURE 10. Pipelining: Advanced ILP

LECTURE 10. Pipelining: Advanced ILP LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add

More information

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction

More information

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions. Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions Stage Instruction Fetch Instruction Decode Execution / Effective addr Memory access Write-back Abbreviation

More information

Laboratory Pipeline MIPS CPU Design (2): 16-bits version

Laboratory Pipeline MIPS CPU Design (2): 16-bits version Laboratory 10 10. Pipeline MIPS CPU Design (2): 16-bits version 10.1. Objectives Study, design, implement and test MIPS 16 CPU, pipeline version with the modified program without hazards Familiarize the

More information

ECE260: Fundamentals of Computer Engineering

ECE260: Fundamentals of Computer Engineering ECE260: Fundamentals of Computer Engineering Pipelined Datapath and Control James Moscola Dept. of Engineering & Computer Science York College of Pennsylvania ECE260: Fundamentals of Computer Engineering

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor

More information

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and

More information

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will

More information

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case

More information

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation Introduction Pipelining become universal technique in 1985 Overlaps execution of

More information

Pipelining. Pipeline performance

Pipelining. Pipeline performance Pipelining Basic concept of assembly line Split a job A into n sequential subjobs (A 1,A 2,,A n ) with each A i taking approximately the same time Each subjob is processed by a different substation (or

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1 ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 6 Pipelining Part 1 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

EIE/ENE 334 Microprocessors

EIE/ENE 334 Microprocessors EIE/ENE 334 Microprocessors Lecture 6: The Processor Week #06/07 : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2009, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/

More information

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27) Lecture Topics Today: Data and Control Hazards (P&H 4.7-4.8) Next: continued 1 Announcements Exam #1 returned Milestone #5 (due 2/27) Milestone #6 (due 3/13) 2 1 Review: Pipelined Implementations Pipelining

More information

Instruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4

Instruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4 PROBLEM 1: An application running on a 1GHz pipelined processor has the following instruction mix: Instruction Frequency CPI Load-store 55% 5 Arithmetic 30% 4 Branch 15% 4 a) Determine the overall CPI

More information

ENCM 369 Winter 2018 Lab 9 for the Week of March 19

ENCM 369 Winter 2018 Lab 9 for the Week of March 19 page 1 of 9 ENCM 369 Winter 2018 Lab 9 for the Week of March 19 Steve Norman Department of Electrical & Computer Engineering University of Calgary March 2018 Lab instructions and other documents for ENCM

More information

DEE 1053 Computer Organization Lecture 6: Pipelining

DEE 1053 Computer Organization Lecture 6: Pipelining Dept. Electronics Engineering, National Chiao Tung University DEE 1053 Computer Organization Lecture 6: Pipelining Dr. Tian-Sheuan Chang tschang@twins.ee.nctu.edu.tw Dept. Electronics Engineering National

More information

Instruction Pipelining

Instruction Pipelining Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages

More information

Appendix C: Pipelining: Basic and Intermediate Concepts

Appendix C: Pipelining: Basic and Intermediate Concepts Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)

More information

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building

More information

Complications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow?

Complications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow? Complications with long instructions CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study So far, all MIPS instructions take 5 cycles But haven't talked

More information

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ... CHAPTER 6 1 Pipelining Instruction class Instruction memory ister read ALU Data memory ister write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not

More information

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

LECTURE 9. Pipeline Hazards

LECTURE 9. Pipeline Hazards LECTURE 9 Pipeline Hazards PIPELINED DATAPATH AND CONTROL In the previous lecture, we finalized the pipelined datapath for instruction sequences which do not include hazards of any kind. Remember that

More information

14:332:331 Pipelined Datapath

14:332:331 Pipelined Datapath 14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate

More information

Integer Multiplication and Division

Integer Multiplication and Division Integer Multiplication and Division for ENCM 369: Computer Organization Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 208 Integer

More information

Instruction-Level Parallelism and Its Exploitation

Instruction-Level Parallelism and Its Exploitation Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic

More information

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes NAME: STUDENT NUMBER: EE557--FALL 1999 MAKE-UP MIDTERM 1 Closed books, closed notes Q1: /1 Q2: /1 Q3: /1 Q4: /1 Q5: /15 Q6: /1 TOTAL: /65 Grade: /25 1 QUESTION 1(Performance evaluation) 1 points We are

More information

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

COMP2611: Computer Organization. The Pipelined Processor

COMP2611: Computer Organization. The Pipelined Processor COMP2611: Computer Organization The 1 2 Background 2 High-Performance Processors 3 Two techniques for designing high-performance processors by exploiting parallelism: Multiprocessing: parallelism among

More information

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,

More information

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one

More information

Slide Set 5. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng

Slide Set 5. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng Slide Set 5 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary February 2018 ENCM 369 Winter 2018 Section

More information

Pipelining. CSC Friday, November 6, 2015

Pipelining. CSC Friday, November 6, 2015 Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information