CS 152 Computer Architecture and Engineering Lecture 4 Pipelining

1 CS 152 Computer Architecture and Engineering Lecture 4 Pipelining John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/

2 Motorola. Next week we will return to the microcode story... Today is the anti-microcode story - pipelining!

3 RISC CPU, Caches, Data Path and Control

4 Today: Pipelining. Pipelining: an idea from assembly line production applied to CPU design. Why pipelining is hard: data hazards, control hazards, structural hazards. Visualizing pipelines to evaluate hazard detection and resolution. Short Break. A tool kit for hazard resolution.

5 Starting Point: Performance Equation. $\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$. Goal is to optimize execution time, not individual equation terms. Instructions/Program: Machines are optimized with respect to program workloads. Cycles/Instruction: The CPI of the program. Reflects the program's instruction mix. Seconds/Cycle: Clock period. Optimize jointly with machine CPI.
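As a minimal sketch of the performance equation above, the three terms can be multiplied directly. The instruction count, CPI values, and clock periods below are hypothetical numbers chosen only for illustration; they are not measurements from the lecture.

```python
def execution_time(instructions, cpi, clock_period_s):
    """Seconds/Program = Instructions/Program * Cycles/Instruction * Seconds/Cycle."""
    return instructions * cpi * clock_period_s


# Hypothetical machines running the same 1,000,000-instruction program.
single_cycle = execution_time(1_000_000, cpi=1.0, clock_period_s=10e-9)
pipelined = execution_time(1_000_000, cpi=1.2, clock_period_s=2.5e-9)

# The pipelined machine has a worse CPI but still wins on execution time.
speedup = single_cycle / pipelined
print(f"single-cycle: {single_cycle:.4f} s, pipelined: {pipelined:.4f} s, speedup: {speedup:.2f}x")
```

The point of the sketch is the slide's own: the product is what matters, so a design that raises one term can still be the faster machine.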

6 Pipelining

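7 Recall: Our single-cycle processor. Challenge: Speed up clock while keeping CPI == 1. $\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$. Cycles/Instruction: CPI == 1. This is good. Seconds/Cycle: Slow. This is bad. [Datapath diagram: PC, +0x4 adder, Instr Mem, RegFile (rs1, rs2, ws, wd, rd1, rd2), Ext, ALU (op), Data Memory (Din, Dout), MemToReg mux]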

8 Recall: An R-format CPU design. Decode fields to get: ADD $8 $9 $10. Instruction fields: opcode | rs | rt | rd | shamt | funct. [Datapath diagram: decode logic, RegFile (rs1, rs2, ws, wd, rd1, rd2), ALU (op)]

9 Reminder: How data flows after posedge. [Datapath diagram: PC, +0x4 adder, Instr Mem, decode logic, RegFile (rs1, rs2, ws, wd, rd1, rd2), ALU (op)]

10 Next posedge: Update state and repeat. [Datapath diagram: the state elements, PC and RegFile]

11 Observation: Logic idle most of cycle. For most of the cycle, the ALU is either waiting for its inputs or holding its output. Ideal: a CPU architecture where each part is always working. [Datapath diagram: PC, +0x4 adder, Instr Mem, RegFile, Ext, ALU, Data Memory, MemToReg mux]

12 Inspiration: Automobile assembly line. Assembly line moves on a steady clock. Each station does the same task on each car. [Photo labels: The clock, Car body shell, Merge station, Bolting station, Car chassis]

13 Inspiration: Automobile assembly line. Simpler station tasks mean more cars per hour. Simple tasks take less time, so the clock is faster.

14 Inspiration: Automobile assembly line. Line speed limited by slowest task. Most efficient if all tasks take the same time to do.

15 Inspiration: Automobile assembly line. Simpler tasks plus a complex car mean a long line! These lines go 24 x 7, and rarely shut down.

16 Lessons from car assembly lines. Faster line movement yields more cars per hour off the line. Faster line movement requires more stages, each doing simpler tasks. To maximize efficiency, all stages should take the same amount of time (if not, workers in fast stages are idle). Filling, flushing, and stalling the assembly line are all bad news.
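The balance lesson can be sketched numerically. This is an illustration only: the stage delays are hypothetical, both lists do the same total work, and register overhead between stages is ignored.

```python
def line_clock_period(stage_delays_ns):
    """The line advances only as fast as its slowest station."""
    return max(stage_delays_ns)


def idle_fraction(stage_delays_ns):
    """Fraction of total station-time spent waiting on the slowest stage."""
    period = line_clock_period(stage_delays_ns)
    busy = sum(stage_delays_ns)
    return 1.0 - busy / (period * len(stage_delays_ns))


# Hypothetical delays: 10 ns of total work split two different ways.
balanced = [2.0, 2.0, 2.0, 2.0, 2.0]
unbalanced = [1.0, 1.0, 4.0, 2.0, 2.0]

for name, delays in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name, line_clock_period(delays), "ns clock,",
          f"{idle_fraction(delays):.0%} idle")
```

With the same total work, the unbalanced split doubles the clock period and leaves half of the station-time idle.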

17 Key analogy: The instruction is the car. Pipeline: Stage #1 (Instruction Fetch), Stage #2, Stage #3, Stage #4, Stage #5. As it moves down the pipeline, the instruction controls hardware in stage 2, then stage 3, then stage 4, then stage 5. This is data-stationary control. [Datapath diagram: PC, +0x4 adder, Instr Mem]

18 Example: Decode & Register Fetch stage. Sample program: ADD R4,R3,R2; OR R7,R6,R5; SUB R10,R9,R8. Pipeline snapshot: Stage #1 (Instr Fetch): SUB R10,R9,R8. Stage #2 (Decode & Reg Fetch): OR R7,R6,R5. Stage #3: ADD R4,R3,R2. R's chosen so that instructions are independent - like cars on the line. [Datapath diagram: PC, +0x4 adder, Instr Mem, RegFile (rs1, rs2, ws, wd, rd1, rd2), Ext, B register]

19 Performance Equation and Pipelining. $\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$. Cycles/Instruction: CPI == 1. Once the pipe is full, one instruction completes per cycle. Seconds/Cycle: Clock period is shorter. Less work to do in each cycle. To get the shortest clock period, balance the work to do in each pipeline stage. [Datapath diagram: Instr Fetch, Decode & Reg Fetch, Stage #3]
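A short sketch of why CPI approaches 1 only once the pipe is full: the first instruction needs one cycle per stage, and each later instruction completes one cycle after its predecessor. This assumes an idealized pipeline with no hazards or stalls; the function names are illustrative.

```python
def pipelined_cycles(n_instructions, n_stages=5):
    """Cycles to finish n instructions: fill the pipe, then one completion per cycle."""
    return (n_stages - 1) + n_instructions


def effective_cpi(n_instructions, n_stages=5):
    """Average cycles per instruction, including the pipeline fill."""
    return pipelined_cycles(n_instructions, n_stages) / n_instructions


for n in (5, 100, 1000):
    print(n, "instructions:", pipelined_cycles(n), "cycles, CPI =", effective_cpi(n))
```

For short programs the fill cost dominates; for long instruction streams it is amortized and the effective CPI tends toward 1.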

20 Hazards: An instruction is not a car... New sample program: ADD R4,R3,R2; OR R5,R4,R2. Stage #2 (Decode & Reg Fetch): OR R5,R4,R2... wrong value of R4 fetched from RegFile, contract with programmer broken! Oops! Stage #3: ADD R4,R3,R2. R4 not written yet... An example of a hazard: we must (1) detect and (2) resolve all hazards to make a CPU that matches the ISA. [Datapath diagram: PC, +0x4 adder, Instr Mem, RegFile, Ext, B register]
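The broken contract can be reproduced in a few lines. This is a toy model, not the lecture's datapath: the register file is a dictionary, the initial register values are hypothetical, and the ordering of the statements stands in for the pipeline timing.

```python
# Hypothetical initial register contents.
regfile = {"R2": 3, "R3": 4, "R4": 0}

# Stage 3: ADD R4,R3,R2 has computed its result but not yet written the RegFile.
add_result = regfile["R3"] + regfile["R2"]

# Stage 2, same cycle: OR R5,R4,R2 reads its operands from the RegFile.
value_or_reads = regfile["R4"]              # stale: the ADD has not written R4
wrong_r5 = value_or_reads | regfile["R2"]

# What the programmer's sequential model promises instead.
regfile["R4"] = add_result
correct_r5 = regfile["R4"] | regfile["R2"]

print("pipelined (no hazard handling):", wrong_r5, "sequential contract:", correct_r5)
```

The two results differ, which is exactly the hazard the hardware must detect and resolve.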

21 Performance Equation and Hazards. $\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$. Cycles/Instruction: Some ways to cope with hazards make CPI > 1 (stalling the pipeline). Seconds/Cycle: Added logic to detect and resolve hazards increases the clock period. "Software slows the machine down" - Seymour Cray. [Datapath diagram: Instr Fetch, Decode & Reg Fetch, Stage #3]

22 A (simplified) 5-stage pipelined CPU. Stages: 1 IF Stage (Instr Fetch), 2 ID/RF Stage (Decode & Reg Fetch), 3 EX Stage (Execution), 4 MEM Stage (Memory), 5 WB (Write Back). [Datapath diagram: PC, +0x4 adder, Instr Mem, Mux and logic, RegFile (rs1, rs2, ws, wd, rd1, rd2), Ext, ALU (op), Data Memory (Din, Dout), MemToReg mux; pipeline registers A, B, Y, R]

23 Sometimes, contract is a challenge. Sample Program: LW R4,0(R0); OR R5,R4,R2. ID/RF Stage (Decode & Reg Fetch): OR R5,R4,R2... but we haven't even started the load yet! EX Stage (Execution): LW R4,0(R0). One approach: change the contract! [Datapath diagram: the 5-stage pipeline, IF, ID/RF, EX, MEM, WB]

24 From Lecture 1: Delayed Loads... I-Format fields: opcode | rs | rt | offset. Instruction Fetch: Fetch the load inst from memory. Instruction Decode: Decode fields to get: LW $1, offset($2). Operand Fetch: Retrieve register value: $2. Execute: Compute memory address: offset + $2. Result Store: Load memory address contents into: $1. Next Instruction: Prepare to fetch instr that follows the LW in the program. Depending on load semantics, new $1 is visible to that instr, or not until the following instr ("delayed loads").

25 After we change the contract... Sample Program: LW R4,0(R0); OR R5,R4,R2. ID/RF Stage (Decode & Reg Fetch): OR R5,R4,R2... delayed load contract does not guarantee new R4 is seen. EX Stage (Execution): LW R4,0(R0). Only partially solves problem... soon, we finish the story. [Datapath diagram: the 5-stage pipeline, IF, ID/RF, EX, MEM, WB]

26 Visualizing Pipelines

27 Pipeline Representation #1: Timeline. Good for visualizing pipeline fills. Stages: IF (Fetch), ID (Decode), EX (ALU), MEM, WB.
Sample Program:
I1: ADD R4,R3,R2
I2: AND R6,R5,R4
I3: SUB R1,R9,R8
I4: XOR R3,R2,R1
I5: OR R7,R6,R5
Timeline (pipeline is full at t5):
Inst  t1   t2   t3   t4   t5   t6   t7   t8
I1:   IF   ID   EX   MEM  WB
I2:        IF   ID   EX   MEM  WB
I3:             IF   ID   EX   MEM  WB
I4:                  IF   ID   EX   MEM  WB
I5:                       IF   ID   EX   MEM
I6:                            IF   ID   EX
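The timeline view is regular enough to generate. A minimal sketch, assuming an ideal 5-stage pipeline with no stalls, where instruction i enters IF in cycle i; the function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def timeline(n_instructions, n_cycles, stages=STAGES):
    """Row i holds, for each cycle, the stage instruction i occupies ('' if none)."""
    rows = []
    for i in range(n_instructions):
        row = []
        for t in range(n_cycles):
            k = t - i  # how many stages instruction i has advanced by cycle t
            row.append(stages[k] if 0 <= k < len(stages) else "")
        rows.append(row)
    return rows


def render(rows):
    """Format the rows as an aligned text table, one instruction per line."""
    header = "Inst " + " ".join(f"t{t + 1:<3}" for t in range(len(rows[0])))
    lines = [header]
    for i, row in enumerate(rows):
        lines.append(f"I{i + 1}:  " + " ".join(f"{cell:<4}" for cell in row))
    return "\n".join(line.rstrip() for line in lines)


print(render(timeline(6, 8)))
```

Reading down a column shows which instruction is in which stage during that cycle; the first column with all five stages present is where the pipeline becomes full.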

28 Representation #2: Resource Usage. Good for visualizing pipeline stalls. Stages: IF (Fetch), ID (Decode), EX (ALU), MEM, WB.
Sample Program:
I1: ADD R4,R3,R2
I2: AND R6,R5,R4
I3: SUB R1,R9,R8
I4: XOR R3,R2,R1
I5: OR R7,R6,R5
Resource usage (pipeline is full at t5):
Stage t1   t2   t3   t4   t5   t6   t7   t8
IF:   I1   I2   I3   I4   I5   I6   I7   I8
ID:        I1   I2   I3   I4   I5   I6   I7
EX:             I1   I2   I3   I4   I5   I6
MEM:                 I1   I2   I3   I4   I5
WB:                       I1   I2   I3   I4
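The resource-usage view is the same schedule indexed by stage instead of by instruction. A small sketch, assuming an ideal pipeline with no stalls and an unbounded instruction stream I1, I2, ...; the function name is illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def resource_usage(n_cycles, stages=STAGES):
    """Map each stage to the instruction label it holds in every cycle ('' if empty)."""
    usage = {}
    for depth, stage in enumerate(stages):
        # A stage at depth d sees instruction I(t - d + 1) in cycle index t.
        usage[stage] = [f"I{t - depth + 1}" if t >= depth else ""
                        for t in range(n_cycles)]
    return usage


for stage, occupants in resource_usage(8).items():
    print(f"{stage + ':':<5}" + " ".join(f"{label:<4}" for label in occupants).rstrip())
```

In this view a stall would show up as the same instruction label repeated in a stage across consecutive cycles, which is why the representation suits stall analysis.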

29 Hazard Taxonomy

30 Structural Hazards. Several pipeline stages need to use the same hardware resource at the same time. Solution #1: Add extra copies of the resource (only works sometimes). Solution #2: Change resource so that it can handle concurrent use. Solution #3: Stages take turns by stalling parts of the pipeline.

31 Structural Hazard Example: One Memory. The single memory is used by both the IF stage and the MEM stage. [Datapath diagram: IF Stage, ID/RF Stage, EX Stage, MEM Stage, WB; one shared Data Memory (Din, Dout), RegFile, ALU, output to branch logic]

32 A solution: Extra copies of memory. [Datapath diagram: the 5-stage pipeline with a separate Instr Mem in the IF stage and Data Memory in the MEM stage] I and D caches are a hybrid solution.

33 Alternatively: Concurrent use. ID and WB stages use register file in same clock cycle. [Datapath diagram: the 5-stage pipeline, IF, ID/RF, EX, MEM, WB]

34 Data Hazards: 3 Types (RAW, WAR, WAW). Several pipeline stages read or write the same data location in an incompatible way. Read After Write (RAW) hazards. Instruction I2 expects to read a data value written by an earlier instruction, but I2 executes too early and reads the wrong copy of the data. Note "data value", not "register". Data hazards are possible for any architected state (such as main memory). In practice, main memory hazard avoidance is the job of the memory system.

35 Recall: RAW example. Sample program: ADD R4,R3,R2; OR R5,R4,R2. Stage #2 (Decode & Reg Fetch): OR R5,R4,R2... wrong value of R4 fetched from RegFile, contract with programmer broken! Oops! Stage #3: ADD R4,R3,R2. R4 not written yet... This is what we mean when we say Read After Write (RAW) Hazard. [Datapath diagram: PC, +0x4 adder, Instr Mem, RegFile, Ext, B register]

36 Data Hazards: 3 Types (RAW, WAR, WAW). Write After Read (WAR) hazards. Instruction I2 expects to write over a data value after an earlier instruction I1 reads it. But instead, I2 writes too early, and I1 sees the new value. Write After Write (WAW) hazards. Instruction I2 writes over data an earlier instruction I1 also writes. But instead, I1 writes after I2, and the final data value is incorrect. WAR and WAW are not possible in our 5-stage pipeline, but are possible in other pipeline designs.
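The three definitions can be turned into a small register-dependence classifier. This is a sketch under simplifying assumptions: only register operands are considered (not memory), the `Instr` tuple is a hypothetical representation, and the function reports potential hazards; whether one actually bites depends on the pipeline, as the slide notes for WAR and WAW.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(older, younger):
    """Return the kinds of data dependence the younger instruction has on the older."""
    kinds = set()
    if older.dest in younger.srcs:
        kinds.add("RAW")  # younger reads what older writes
    if younger.dest in older.srcs:
        kinds.add("WAR")  # younger overwrites what older still reads
    if younger.dest == older.dest:
        kinds.add("WAW")  # both write the same location
    return kinds


add = Instr("ADD", "R4", ("R3", "R2"))
print(hazards(add, Instr("OR", "R5", ("R4", "R2"))))   # the lecture's RAW pair
print(hazards(add, Instr("OR", "R3", ("R1", "R1"))))   # hypothetical WAR pair
print(hazards(add, Instr("SUB", "R4", ("R9", "R8"))))  # hypothetical WAW pair
```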

37 Control Hazards: taken branch/jump. Stages: IF (Fetch), ID (Decode), EX (ALU), MEM, WB.
Sample Program (ISA w/o branch delay slot):
I1: BEQ R4,R3,25
I2: AND R6,R5,R4
I3: SUB R1,R9,R8
Inst  t1   t2   t3   t4   t5   t6   t7   t8
I1:   IF   ID   EX   MEM  WB
I2:        IF   ID
I3:             IF
EX stage computes if branch is taken. If branch is taken, these instructions (I2, I3) MUST NOT complete! Note: with branch delay slot, I2 MUST complete, I3 MUST NOT complete.

38 Hazards Recap. Structural Hazards. Data Hazards (RAW, WAR, WAW). Control Hazards (taken branches and jumps). On each clock cycle, we must detect the presence of all of these hazards, and resolve them before they break the contract with the programmer.

39 Break

40 Hazard Resolution Tools

41 The Hazard Resolution Toolkit. Stall earlier instructions in pipeline. Forward results computed in later pipeline stages to earlier stages. Add new hardware or rearrange hardware design to eliminate hazard. Change ISA to eliminate hazard. Kill earlier instructions in pipeline. Make hardware handle concurrent requests to eliminate hazard.

42 Resolving a RAW hazard by stalling. Sample program: ADD R4,R3,R2; OR R5,R4,R2. Stage #2 (Decode & Reg Fetch): OR R5,R4,R2. Keep executing OR instruction until R4 is ready. Until then, send NOPs to the 2/3 pipeline register. Stage #3: ADD R4,R3,R2. Let ADD proceed to WB stage, so that R4 is written to regfile. New datapath hardware: (1) Mux into 2/3 to feed in NOP. (2) Write enable on PC and 1/2. Freeze PC and the 1/2 pipeline register until stall is over. [Datapath diagram: PC, +0x4 adder, Instr Mem, RegFile, Ext, B register]
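The two pieces of new datapath hardware on this slide map onto three control outputs. A hedged sketch of the interlock decision: the signal names are hypothetical, and `pending_dests` stands for the destination registers of older instructions that have not yet written the register file.

```python
def stall_controls(id_srcs, pending_dests):
    """Decide the stall controls for the instruction in Decode & Reg Fetch.

    id_srcs: source registers the decode-stage instruction wants to read.
    pending_dests: destinations of older in-flight instructions not yet written back.
    """
    stall = any(src in pending_dests for src in id_srcs)
    return {
        "pc_write_enable": not stall,       # freeze the PC while stalling
        "reg_1_2_write_enable": not stall,  # hold the stalled instruction in 1/2
        "mux_nop_into_2_3": stall,          # send a NOP (bubble) down the pipe
    }


# OR R5,R4,R2 in decode while ADD R4,R3,R2 has not yet written R4.
print(stall_controls(("R4", "R2"), {"R4"}))
# An independent instruction, OR R7,R6,R5, proceeds normally.
print(stall_controls(("R6", "R5"), {"R4"}))
```

Freezing the PC and the 1/2 register keeps the OR in place; the NOP mux ensures the stages behind the ADD receive harmless work in the meantime.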

43 The Hazard Resolution Toolkit. Stall earlier instructions in pipeline. Forward results computed in later pipeline stages to earlier stages. Add new hardware or rearrange hardware design to eliminate hazard. Change ISA to eliminate hazard. Kill earlier instructions in pipeline. Make hardware handle concurrent requests to eliminate hazard.

44 Resolving a RAW hazard by forwarding. Sample program: ADD R4,R3,R2; OR R5,R4,R2. ID/RF Stage (Decode & Reg Fetch): OR R5,R4,R2. EX Stage (Execution): ADD R4,R3,R2. ALU computes R4 in the EX stage, so... Just forward it back! Unlike stalling, does not change CPI. May hurt cycle time. [Datapath diagram: PC, +0x4 adder, Instr Mem, RegFile, Ext, B register, ALU (op), Y register]

45 The Hazard Resolution Toolkit. Stall earlier instructions in pipeline. Forward results computed in later pipeline stages to earlier stages. Add new hardware or rearrange hardware design to eliminate hazard. Change ISA to eliminate hazard. Kill earlier instructions in pipeline. Make hardware handle concurrent requests to eliminate hazard.

46 Control Hazards: Fix with more hardware. Stages: IF (Fetch), ID (Decode), EX (ALU), MEM, WB.
Sample Program (ISA w/o branch delay slot):
I1: BEQ R4,R3,25
I2: AND R6,R5,R4
I3: SUB R1,R9,R8
Inst  t1   t2   t3   t4   t5   t6   t7   t8
I1:   IF   ID   EX   MEM  WB
I2:        IF   ID
I3:             IF
EX stage computes if branch is taken. If branch is taken, these instructions (I2, I3) MUST NOT complete! If we add hardware, can we move the branch decision earlier in the pipeline?

47 Resolving control hazard with hardware. [Datapath diagram: Stage #1 (Instr Fetch), Stage #2 (Decode & Reg Fetch), Stage #3; an equality comparator (==) on the RegFile outputs in Stage #2 feeds the branch control logic; PC, +0x4 adder, Instr Mem, RegFile, Ext, B register]

48 Control Hazards: After more hardware. Stages: IF (Fetch), ID (Decode), EX (ALU), MEM, WB.
Sample Program (ISA w/o branch delay slot):
I1: BEQ R4,R3,25
I2: AND R6,R5,R4
I3: SUB R1,R9,R8
Inst  t1   t2   t3   t4   t5   t6   t7   t8
I1:   IF   ID   EX   MEM  WB
I2:        IF
ID stage computes if branch is taken. If branch is taken, this instruction (I2) MUST NOT complete! If we change the ISA, we can always let I2 complete ("branch delay slot") and eliminate the control hazard.
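The effect of moving the branch decision from EX to ID can be counted. A minimal sketch under stated assumptions: one instruction is fetched per cycle, the decision takes effect at the end of the resolving stage, and the CPI formula charges a penalty only for taken branches with no other stalls. The branch statistics are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def wrong_path_fetches(resolve_stage, delay_slots=0):
    """Instructions fetched behind a taken branch that must not complete."""
    fetched_behind_branch = STAGES.index(resolve_stage)
    return max(0, fetched_behind_branch - delay_slots)


def cpi_with_branches(branch_fraction, taken_fraction, penalty_cycles):
    """CPI = 1 + (taken branches per instruction) * (cycles lost per taken branch)."""
    return 1.0 + branch_fraction * taken_fraction * penalty_cycles


for stage in ("EX", "ID"):
    for slots in (0, 1):
        print(f"resolve in {stage}, {slots} delay slot(s):",
              wrong_path_fetches(stage, slots), "instruction(s) to kill")

# Hypothetical mix: 20% branches, half of them taken.
print(cpi_with_branches(0.2, 0.5, wrong_path_fetches("EX")))
print(cpi_with_branches(0.2, 0.5, wrong_path_fetches("ID")))
```

Resolving in ID with one delay slot leaves nothing to kill, which is the combination the slide arrives at.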

49 From Lecture 1: BEQ $1,$2,25. I-Format fields: opcode | rs | rt | offset. Instruction Fetch: Fetch branch inst from memory. Instruction Decode: Decode fields to get: BEQ $1, $2, 25. Operand Fetch: Retrieve register values: $1, $2. Execute: Compute if we take branch: $1 == $2? Result Store / Next Instruction: ALWAYS prepare to fetch the instr that follows the BEQ in the program ("delayed branch"). IF we take the branch, the instr we fetch AFTER that instruction is at the branch target, computed from the PC (PC == Program Counter).

50 The Hazard Resolution Toolkit. Stall earlier instructions in pipeline. Forward results computed in later pipeline stages to earlier stages. Add new hardware or rearrange hardware design to eliminate hazard. Change ISA to eliminate hazard. Kill earlier instructions in pipeline. Make hardware handle concurrent requests to eliminate hazard.

51 Resolve control hazard by killing instr. Sample program (no delay slot): J 200; OR R5,R4,R2. Stage #2 (Decode & Reg Fetch): J 200. Detect J instruction, mux a NOP into the 1/2 pipeline register. Compute new PC using hardware not shown... This hurts CPI. Can we do better? [Datapath diagram: PC, +0x4 adder, Instr Mem, RegFile, Ext, B register]

52 The Hazard Resolution Toolkit. Stall earlier instructions in pipeline. Forward results computed in later pipeline stages to earlier stages. Add new hardware or rearrange hardware design to eliminate hazard. Change ISA to eliminate hazard. Kill earlier instructions in pipeline. Make hardware handle concurrent requests to eliminate hazard.

53 Structural hazard solution: concurrent use. ID and WB stages use register file in same clock cycle. Does not come for free. [Datapath diagram: the 5-stage pipeline, IF, ID/RF, EX, MEM, WB]

54 Hazard Diagnosis

55 Data Hazards: Read After Write. Read After Write (RAW) hazards. Instruction I2 expects to read a data value written by an earlier instruction, but I2 executes too early and reads the wrong copy of the data. Classic solution: use forwarding heavily, fall back on stalling when forwarding won't work or slows down the critical path too much.

56 Full bypass network... [Datapath diagram: ID (Decode), EX, MEM, WB stages; bypass muxes and logic in ID select each operand from the RegFile or from results forwarded from later stages, including WB]

57 Common bug: Multiple forwards... In flight: ID (Decode): ADD R4,R3,R2. EX: OR R2,R3,R1. MEM: AND R2,R2,R1. Which do we forward from? [Datapath diagram: full bypass network across ID, EX, MEM, WB]

58 Common bug: Multiple forwards II... In flight: ID (Decode): ADD R4,R0,R2. EX: OR R0,R3,R1. MEM: AND R0,R2,R1. Which do we forward from? [Datapath diagram: full bypass network across ID, EX, MEM, WB]
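One way to reason about both "multiple forwards" questions is a priority rule in the bypass select logic. This is a sketch, not the lecture's stated answer: it assumes the youngest in-flight producer (the stage closest to decode) holds the value program order requires, and that R0 is hardwired to zero as in MIPS, so writes to it are never forwarded. The function and return labels are hypothetical.

```python
def forward_source(src_reg, ex_dest, mem_dest, wb_dest):
    """Pick where the decode-stage operand should come from.

    ex_dest / mem_dest / wb_dest: destination register of the instruction in that
    stage, or None if it writes no register.
    """
    if src_reg == "R0":
        return "REGFILE"  # R0 always reads as zero; ignore in-flight writes to it
    # Check the youngest producer first: EX, then MEM, then WB.
    for stage, dest in (("EX", ex_dest), ("MEM", mem_dest), ("WB", wb_dest)):
        if dest == src_reg:
            return stage
    return "REGFILE"


# Slide 57: ADD reads R2; OR (EX) and AND (MEM) both write R2.
print(forward_source("R2", ex_dest="R2", mem_dest="R2", wb_dest=None))
# Slide 58: ADD reads R0; OR (EX) and AND (MEM) both name R0 as destination.
print(forward_source("R0", ex_dest="R0", mem_dest="R0", wb_dest=None))
```

Getting the priority order backwards, or forgetting the R0 special case, produces a CPU that passes simple tests and fails on exactly these instruction sequences.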

59 LW and Hazards: No load delay slot

60 Questions about LW and forwarding. In flight: ID (Decode): ADDIU R1 R1 24. EX: OR R3,R3,R2. MEM: LW R1 128(R29). Do we need to stall? [Datapath diagram: full bypass network across ID, EX, MEM, WB]

61 Questions about LW and forwarding. In flight: ID (Decode): ADDIU R1 R1 24. EX: LW R1 128(R29). MEM: OR R1,R3,R1. Do we need to stall? [Datapath diagram: full bypass network across ID, EX, MEM, WB]
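Whether a stall is needed depends on the earliest stage from which load data can be forwarded, which is a design decision the slides leave open. The sketch below makes that decision an explicit parameter (`forward_from`, a hypothetical name) rather than asserting an answer; the stage placements follow one reading of the two slides.

```python
STAGE_ORDER = ["EX", "MEM", "WB"]


def load_use_stall(consumer_srcs, load_dest, load_stage, forward_from="WB"):
    """True if the decode-stage consumer must stall behind an in-flight LW.

    load_stage: stage the LW currently occupies.
    forward_from: earliest stage whose load data the bypass network can deliver
                  to decode (an assumption about the design, not a given).
    """
    if load_dest == "R0" or load_dest not in consumer_srcs:
        return False
    return STAGE_ORDER.index(load_stage) < STAGE_ORDER.index(forward_from)


# ADDIU R1,R1,24 in decode with LW R1,128(R29) in MEM (slide 60 as read here).
print(load_use_stall(("R1",), "R1", "MEM", forward_from="WB"))
print(load_use_stall(("R1",), "R1", "MEM", forward_from="MEM"))
# ADDIU R1,R1,24 in decode with LW R1,128(R29) still in EX (slide 61 as read here).
print(load_use_stall(("R1",), "R1", "EX", forward_from="MEM"))
```

A LW in EX has only computed its address, so no forwarding choice avoids the stall in that case; for a LW in MEM the answer turns on whether the memory output can reach decode within the cycle without lengthening the critical path.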

62 Resolving a RAW hazard by stalling. Sample program: ADD R4,R3,R2; OR R5,R4,R2. Stage #2 (Decode & Reg Fetch): OR R5,R4,R2. Keep executing OR instruction until R4 is ready. Until then, send NOPs to the 2/3 pipeline register. Stage #3: ADD R4,R3,R2. Let ADD proceed to WB stage, so that R4 is written to regfile. New datapath hardware: (1) Mux into 2/3 to feed in NOP. (2) Write enable on PC and 1/2. Freeze PC and the 1/2 pipeline register until stall is over. [Datapath diagram: PC, +0x4 adder, Instr Mem, RegFile, Ext, B register]

63 Branches and Hazards: Single delay slot

64 Recall: Control hazard and hardware. [Datapath diagram: Stage #1 (Instr Fetch), Stage #2 (Decode & Reg Fetch), Stage #3; an equality comparator (==) on the RegFile outputs in Stage #2 feeds the branch control logic; PC, +0x4 adder, Instr Mem, RegFile, Ext, B register]

65 Recall: After more hardware, change ISA. Stages: IF (Fetch), ID (Decode), EX (ALU), MEM, WB.
Sample Program (ISA w/o branch delay slot):
I1: BEQ R4,R3,25
I2: AND R6,R5,R4
I3: SUB R1,R9,R8
Inst  t1   t2   t3   t4   t5   t6   t7   t8
I1:   IF   ID   EX   MEM  WB
I2:        IF
ID stage computes if branch is taken. If branch is taken, this instruction (I2) MUST NOT complete! If we change the ISA, we can always let I2 complete ("branch delay slot") and eliminate the control hazard.

66 Question about branch and forwards. In flight: ID (Decode): BEQ R1 R3 label. EX: OR R3,R3,R1. Will this work as shown? [Datapath diagram: == comparator in ID feeding the branch control logic, with the bypass network across ID, EX, MEM, WB]

67 Lessons learned. Pipelining is hard. Study every instruction. Write test code in advance. Think about interactions...

68 Lessons learned. Pipelining is hard. Study every instruction. Write test code in advance. Think about interactions... between forwarding, branch and jump delay slots, R0 issues, LW issues... a long list!

69 Control Implementation

70 Recall: What is single cycle control? Combinational Logic (Only Gates, No Flip Flops). Just specify logic functions! Signals shown: Equal, RegDest, RegWr, ExtOp, ALUsrc, MemWr, MemToReg, PCSrc, ALUctr. [Datapath diagram: Instr Mem, RegFile (rs1, rs2, ws, wd, rd1, rd2), Ext, ALU (op), Data Memory (Din, Dout)]

71 In pipelines, all registers are used. Stages: ID (Decode), EX, MEM, WB. Combinational Logic (Only Gates, No Flip Flops) (add extra state outside!). Signals shown: Equal, RegDest, PCSrc, RegWr, ExtOp, MemToReg. A conceptual design: for the shortest critical path, registers may hold decoded info, not the complete instruction word.

72 On Tuesday: Quantitative instruction set architecture... Also, we will revisit the CPU design, and the topic of microcode. Have a good weekend!


More information

EE 457 Unit 6a. Basic Pipelining Techniques

EE 457 Unit 6a. Basic Pipelining Techniques EE 47 Unit 6a Basic Pipelining Techniques 2 Pipelining Introduction Consider a drink bottling plant Filling the bottle = 3 sec. Placing the cap = 3 sec. Labeling = 3 sec. Would you want Machine = Does

More information

PS Midterm 2. Pipelining

PS Midterm 2. Pipelining PS idterm 2 Pipelining Seqential Landry 6 P 7 8 9 idnight Time T a s k O r d e r A B C D 3 4 2 3 4 2 3 4 2 3 4 2 Seqential landry takes 6 hors for 4 loads If they learned pipelining, how long wold landry

More information

Improving Performance: Pipelining

Improving Performance: Pipelining Improving Performance: Pipelining Memory General registers Memory ID EXE MEM WB Instruction Fetch (includes PC increment) ID Instruction Decode + fetching values from general purpose registers EXE EXEcute

More information

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing Note: Appendices A-E in the hardcopy text correspond to chapters 7- in the online

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 4 Testing Processors 2005-1-27 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 20 Advanced Processors I 2005-4-5 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last

More information

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)

More information

The overall datapath for RT, lw,sw beq instrucution

The overall datapath for RT, lw,sw beq instrucution Designing The Main Control Unit: Remember the three instruction classes {R-type, Memory, Branch}: a) R-type : Op rs rt rd shamt funct 1.src 2.src dest. 31-26 25-21 20-16 15-11 10-6 5-0 a) Memory : Op rs

More information

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S. Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing (2) Pipeline Performance Assume time for stages is ps for register read or write

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science 6 PM 7 8 9 10 11 Midnight Time 30 40 20 30 40 20

More information

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy

More information

CPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner

CPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner CPS104 Computer Organization and Programming Lecture 19: Pipelining Robert Wagner cps 104 Pipelining..1 RW Fall 2000 Lecture Overview A Pipelined Processor : Introduction to the concept of pipelined processor.

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 17 Advanced Processors I 2005-10-27 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: David Marquardt and Udam Saini www-inst.eecs.berkeley.edu/~cs152/

More information

Computer Architecture. Lecture 6.1: Fundamentals of

Computer Architecture. Lecture 6.1: Fundamentals of CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 52 Computer Architecture and Engineering Lecture 26 Mid-Term II Review 26--3 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs52/ CS 52 L26: Mid-Term

More information

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions COP33 - Computer Architecture Lecture ulti-cycle Design & Exceptions Single Cycle Datapath We designed a processor that requires one cycle per instruction RegDst busw 32 Clk RegWr Rd ux imm6 Rt 5 5 Rs

More information

Pipelined Processor Design

Pipelined Processor Design Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Computer Design and Test Lab. Indian Institute of Science (IISc) Bangalore virendra@computer.org Advance Computer Architecture http://www.serc.iisc.ernet.in/~viren/courses/aca/aca.htm

More information

Lecture 4 - Pipelining

Lecture 4 - Pipelining CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining John Wawrzynek Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~johnw

More information

1 Hazards COMP2611 Fall 2015 Pipelined Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor 1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add

More information

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction CS 61C: Great Ideas in Computer Architecture MIPS CPU Datapath, Control Introduction Instructor: Alan Christopher 7/28/214 Summer 214 -- Lecture #2 1 Review of Last Lecture Critical path constrains clock

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Pipeline design. Mehran Rezaei

Pipeline design. Mehran Rezaei Pipeline design Mehran Rezaei How Can We Improve the Performance? Exec Time = IC * CPI * CCT Optimization IC CPI CCT Source Level * Compiler * * ISA * * Organization * * Technology * With Pipelining We

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

Processor Architecture

Processor Architecture Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong (jinkyu@skku.edu)

More information

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu CENG 342 Computer Organization and Design Lecture 6: MIPS Processor - I Bei Yu CEG342 L6. Spring 26 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Advanced Computer Architecture Pipelining

Advanced Computer Architecture Pipelining Advanced Computer Architecture Pipelining Dr. Shadrokh Samavi Some slides are from the instructors resources which accompany the 6 th and previous editions of the textbook. Some slides are from David Patterson,

More information

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and

More information

Improve performance by increasing instruction throughput

Improve performance by increasing instruction throughput Improve performance by increasing instruction throughput Program execution order Time (in instructions) lw $1, 100($0) fetch 2 4 6 8 10 12 14 16 18 ALU Data access lw $2, 200($0) 8ns fetch ALU Data access

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 18 Advanced Processors II 2006-10-31 John Lazzaro (www.cs.berkeley.edu/~lazzaro) Thanks to Krste Asanovic... TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #19 Designing a Single-Cycle CPU 27-7-26 Scott Beamer Instructor AI Focuses on Poker CS61C L19 CPU Design : Designing a Single-Cycle CPU

More information

Major CPU Design Steps

Major CPU Design Steps Datapath Major CPU Design Steps. Analyze instruction set operations using independent RTN ISA => RTN => datapath requirements. This provides the the required datapath components and how they are connected

More information

EC 413 Computer Organization - Fall 2017 Problem Set 3 Problem Set 3 Solution

EC 413 Computer Organization - Fall 2017 Problem Set 3 Problem Set 3 Solution EC 413 Computer Organization - Fall 2017 Problem Set 3 Problem Set 3 Solution Important guidelines: Always state your assumptions and clearly explain your answers. Please upload your solution document

More information

CS61C : Machine Structures

CS61C : Machine Structures CS 61C L path (1) insteecsberkeleyedu/~cs61c/su6 CS61C : Machine Structures Lecture # path natomy: 5 components of any Computer Personal Computer -7-25 This week Computer Processor ( brain ) path ( brawn

More information

Ch 5: Designing a Single Cycle Datapath

Ch 5: Designing a Single Cycle Datapath Ch 5: esigning a Single Cycle path Computer Systems Architecture CS 365 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Memory path Input Output Today s Topic:

More information

Computer and Information Sciences College / Computer Science Department The Processor: Datapath and Control

Computer and Information Sciences College / Computer Science Department The Processor: Datapath and Control Computer and Information Sciences College / Computer Science Department The Processor: Datapath and Control Chapter 5 The Processor: Datapath and Control Big Picture: Where are We Now? Performance of a

More information

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath CPE 442 single-cycle datapath.1 Outline of Today s Lecture Recap and Introduction Where are we with respect to the BIG picture?

More information

CENG 3420 Lecture 06: Datapath

CENG 3420 Lecture 06: Datapath CENG 342 Lecture 6: Datapath Bei Yu byu@cse.cuhk.edu.hk CENG342 L6. Spring 27 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified to contain only: memory-reference

More information

Chapter 4 (Part II) Sequential Laundry

Chapter 4 (Part II) Sequential Laundry Chapter 4 (Part II) The Processor Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Sequential Laundry 6 P 7 8 9 10 11 12 1 2 A T a s k O r d e r A B C D 30 30 30 30 30 30 30 30 30 30

More information

CS3350B Computer Architecture Quiz 3 March 15, 2018

CS3350B Computer Architecture Quiz 3 March 15, 2018 CS3350B Computer Architecture Quiz 3 March 15, 2018 Student ID number: Student Last Name: Question 1.1 1.2 1.3 2.1 2.2 2.3 Total Marks The quiz consists of two exercises. The expected duration is 30 minutes.

More information

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23) Lecture Topics Today: Single-Cycle Processors (P&H 4.1-4.4) Next: continued 1 Announcements Milestone #3 (due 2/9) Milestone #4 (due 2/23) Exam #1 (Wednesday, 2/15) 2 1 Exam #1 Wednesday, 2/15 (3:00-4:20

More information

CS 61C: Great Ideas in Computer Architecture Control and Pipelining

CS 61C: Great Ideas in Computer Architecture Control and Pipelining CS 6C: Great Ideas in Computer Architecture Control and Pipelining Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs6c/sp6 Datapath Control Signals ExtOp: zero, sign

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic CS 61C: Great Ideas in Computer Architecture Datapath Instructors: John Wawrzynek & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/fa15 1 Components of a Computer Processor Control Enable? Read/Write

More information

Lecture 19 Introduction to Pipelining

Lecture 19 Introduction to Pipelining CSE 30321 Lecture 19 Pipelining (Part 1) 1 Lecture 19 Introduction to Pipelining CSE 30321 Lecture 19 Pipelining (Part 1) Basic pipelining basic := single, in-order issue single issue one instruction at

More information

Lecture 5: The Processor

Lecture 5: The Processor Lecture 5: The Processor CSCE 26 Computer Organization Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed from various books, websites, authors pages, and

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath 361 datapath.1 Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath Outline of Today s Lecture Introduction Where are we with respect to the BIG picture? Questions and Administrative

More information

Designing a Multicycle Processor

Designing a Multicycle Processor Designing a Multicycle Processor Arquitectura de Computadoras Arturo Díaz D PérezP Centro de Investigación n y de Estudios Avanzados del IPN adiaz@cinvestav.mx Arquitectura de Computadoras Multicycle-

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control ELEC 52/62 Computer Architecture and Design Spring 217 Lecture 4: Datapath and Control Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849

More information

Pipelined Processor Design

Pipelined Processor Design Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 20 SE-273: Processor Design Courtesy: Prof. Vishwani Agrawal

More information

Introduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Introduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. Introduction to Pipelining Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L15-1 Performance Measures Two metrics of interest when designing a system: 1. Latency: The delay

More information

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Lecture 6 Datapath and Controller

Lecture 6 Datapath and Controller Lecture 6 Datapath and Controller Peng Liu liupeng@zju.edu.cn Windows Editor and Word Processing UltraEdit, EditPlus Gvim Linux or Mac IOS Emacs vi or vim Word Processing(Windows, Linux, and Mac IOS) LaTex

More information