EECS Digital Design

Size: px
Start display at page:

Download "EECS Digital Design"

Transcription

1 EECS Digital Design Lecture 11-- Processor Pipelining John Wawrzynek Today s lecture by John Lazzaro www-inst.eecs.berkeley.edu/~cs150 1

2 Today: Pipelining How to apply the performance equation to our single-cycle CPU. Pipelining: an idea from assembly line production applied to CPU design Why pipelining is hard: data hazards, control hazards, structural hazards. Visualizing pipelines to evaluate hazard detection and resolution. tool kit for hazard resolution. 2

3 New successful instruction sets are rare software instruction set hardware Implementors suffer with original sins of ISs, to support the installed base of software. 3

4 define: The rchitect s Contract To the program, it appears that instructions execute in the correct order defined by the IS. s each instruction completes, the machine state (regs, mem) appears to the program to obey the IS. What the machine actually does is up to the hardware designers, as long as the contract is kept. Goal: Keep contract and run programs faster. 4

5 Performance Measurement (as seen by a CPU designer) Q. Why do we care about a program s performance?. We want the CPU we are designing to run it well! 5

6 Step 1: nalyze the right measurement! Guides CPU design CPU Time: Time the CPU spends running program under measurement. Measuring CPU time (Unix): % time <program name> 25.77u 0.72s 0: % Guides system design Response Time: Total time: CPU Time + time spent waiting (for disk, I/O,...). 6

7 CPU time: Proportional to Instruction Count Q. Once IS is set, who can influence instruction count?. Compiler writer, application developer. CPU time Program Q. Static count? (lines of program printout) Or dynamic count? (trace of execution). Dynamic. Machine Instructions Program Rationale: Every additional instruction you execute takes time. Q. How does an architect influence the number of machine instructions needed to run an algorithm?. Create new instructions: instruction set architect. 7

8 CPU time: Proportional to Clock Period Q. How can architects (not technologists) reduce clock period?. Shorten the machine s critical path. Time Program Q. What ultimately limits an architect s ability to reduce clock period?. Clock-to-Q, setup times. Time One Clock Period Rationale: We measure each instruction s execution time in number of cycles. By shortening the period for each cycle, we shorten execution time. 8

9 Completing the performance equation What factors make different programs have different CPIs? Cache behavior varies. Instruction mix varies. Branch prediction varies. Seconds Program Instructions Program Cycles Instruction Seconds Cycle We need all three terms, and only these terms, to compute CPU Time! CPI -- The verage Number of Clock Cycles Per Instruction For the Program When is it OK to compare clock rates? 9

10 Consider Lecture 10 single-cycle CPU... ll instructions take 1 cycle to execute every time they run. CPI of any program running on machine? 1.0 average CPI for the program is a more-useful concept for more complicated machines... 10

11 Consider machine with a data cache... program s load instructions stride through every memory address. The cache never hits, so every load goes to DRM (100x slower than loads that go to cache). Thus, the average number of cycles for load instructions is higher for this program. Thus, the average number of cycles for all instructions is higher for this program. Seconds Program Instructions Program Cycles Instruction Seconds Cycle Thus, program takes longer to run! 11

12 Final thoughts: Performance Equation Seconds Program Instructions Program Cycles Instruction Seconds Cycle Goal is to optimize execution time, not individual equation terms. Machines are optimized with respect to program workloads. The CPI of the program. Reflects the program s instruction mix. Clock period. Optimize jointly with machine CPI. 12

13 Pipelining 13

14 Clocking methodology... Processor uses synchronous logic design (a clock ). 0&7 f T 1 MHz 1 μs 10 MHz 100 ns 100 MHz 10 ns 1 GHz 1 ns ll state elements act like positive edgetriggered flip flops. D Q clk 14

15 Memory and register file semantics RegFile rs1 rs2 rd1 ws rd2 wd Reads are combinational: Put a stable address on input, a short time later data appears on output. Data Memory Dout Din Writes are clocked: If is high, memory (or register file ws) captures memory Din (or register file wd) on positive edge of clock. Note: May not be best choice for project. 15

16 + Recall: single-cycle processor Challenge: Speed up clock while keeping CPI == 1 Seconds Program Instructions Program Cycles Instruction Seconds Cycle 0x4 CPI == 1 This is good. Slow. This is bad. D PC Q Instr Mem Data RegFile rs1 rs2 rd1 ws rd2 wd op L U Data Memory Dout Din MemToReg Ext 16

17 MIPS R-format CPU design Decode fields to get : DD $8 $9 $10 opcode rs rt rd shamt funct Logic op RegFile rs1 rs2 rd1 ws rd2 wd L U 17

18 How data flows after posedge PC Instr Mem + D Q Data 0x4 Logic op RegFile rs1 rs2 rd1 ws rd2 wd L U 18

19 Next posedge: Update state and repeat PC D Q RegFile rs1 rs2 rd1 ws rd2 wd 19

20 Observation: Logic idle most of cycle For most of cycle, LU is either waiting for its inputs, or holding its output Ideal: a CPU architecture where each part is always working. 0x4 + D PC Q Instr Mem Data RegFile rs1 rs2 rd1 ws rd2 wd op L U Data Memory Dout Din MemToReg Ext 20

21 Inspiration: utomobile assembly line ssembly line moves on a steady clock. Each station does the same task on each car. The clock Car body shell Merge station Bolting station Car chassis 21

22 Lessons from car assembly lines Faster line movement yields more cars per hour off the line. Faster line movement requires more stages, each doing simpler tasks. To maximize efficiency, all stages should take same amount of time (if not, workers in fast stages are idle) Filling, flushing, and stalling assembly line are all bad news. 22

23 + Key analogy: The instruction is the car Pipeline Stage #1 Stage #2 Stage #3 Stage #4 Stage #5 Instruction Fetch PC 0x4 Instr Mem Controls hardware in stage 2 Controls hardware in stage 3 Controls hardware in stage 4 Controls hardware in stage 5 D Q Data Data-stationary control 23

24 + Example: Decode & Register Fetch stage Pipeline Stage #1 Stage #2 Stage #3 Instr Fetch Decode & Reg Fetch SUB R10, R9,R8 OR R7,R6,R5 DD R4,R3,R2 0x4 sample program D PC Q Instr Mem Data RegFile rs1 rs2 rd1 ws rd2 wd Ext M B DD R4,R3,R2 OR R7,R6,R5 SUB R10,R9,R8 R s chosen so that instructions are independent - like cars on the line. 24

25 + Performance Equation and Pipelining Seconds Program Instructions Program Cycles Instruction Seconds Cycle D PC Instr Fetch Decode & Reg Fetch Stage #3 Q 0x4 Instr Mem Data CPI == 1 Once pipe is fill, one instruction completes per cycle rs1 rs2 ws wd RegFile rd1 rd2 Ext M B Clock period is shorter Less work to do in each cycle To get shortest clock period, balance the work to do in each pipeline stage. 25

26 + Hazards: n instruction is not a car... Stage #1 Stage #2 Stage #3 Instr Fetch Decode & Reg Fetch D PC Q 0x4 Instr Mem Data OR R5,R4,R2... wrong value of R4 fetched from RegFile, contract with programmer broken! Oops! rs1 rs2 ws wd RegFile rd1 rd2 Ext M B DD R4,R3,R2 R4 not written yet... New sample program DD R4,R3,R2 OR R5,R4,R2 n example of a hazard -- we must (1) detect and (2) resolve all hazards to make a CPU that matches IS 26

27 + Performance Equation and Hazards Seconds Program Instructions Program Cycles Instruction Seconds Cycle D PC Instr Fetch Decode & Reg Fetch Stage #3 Q 0x4 Instr Mem Data Some ways to cope with hazards makes CPI > 1 stalling pipeline rs1 rs2 ws wd RegFile rd1 rd2 Ext M B dded logic to detect and resolve hazards increases clock period Software slows the machine down Seymour Cray 27

28 + (simplified) 5-stage pipelined CPU 1 2 IF Stage Instr Fetch ID/RF Stage Decode & Reg Fetch 3 EX Stage Execution 4 MEM Stage Memory 5 WB Write Back, MemToReg op D PC Q 0x4 Instr Mem Data Mux,Logic RegFile rs1 rs2 rd1 ws rd2 wd M L U Y M Data Memory Dout Din MemToReg R Ext B 28

29 Sometimes, contract is a challenge + D PC IF Stage Instr Fetch Sample Program LW R4,0(R0) OR R5,R4,R2 Q 0x4 1 2 Instr Mem Data ID/RF Stage Decode & Reg Fetch OR R5,R4,R2... but we haven t even started the load yet! Mux,Logic rs1 rs2 ws wd RegFile rd1 rd2 EX Stage Execution M 3 LW R4, 0(R0) op L U Y M, MemToReg Data Memory Din 4 MEM Stage Memory Dout MemToReg R 5 WB Write Back Ext B One approach: change the contract! 29

30 MIPS LW contract: Delayed Loads Instruction Fetch Instruction Decode Operand Fetch Execute Fetch the load inst from memory opcode rs rt offset I-Format Decode fields to get : LW $1, ($2) Retrieve register value: $2 Compute memory address: + $2 Result Store Next Instruction Load memory address contents into: $1 Prepare to fetch instr that follows the LW in the program. Depending on load semantics, new $1 is visible to that instr, or not until the following instr ( delayed loads ). 30

31 + fter we change the contract... D PC IF Stage Instr Fetch Sample Program LW R4,0(R0) OR R5,R4,R2 Q 0x4 1 2 Instr Mem Data ID/RF Stage Decode & Reg Fetch OR R5,R4,R2... delayed load contract does not guarantee new R4 is seen. Mux,Logic rs1 rs2 ws wd RegFile rd1 rd2 EX Stage Execution M 3 LW R4, 0(R0) op L U Y M MEM Stage Memory, MemToReg Data Memory Din 4 Dout MemToReg R 5 WB Write Back Ext B Only partially solves problem... soon, we finish the story. 31

32 Visualizing Pipelines

33 + Pipeline Representation #1: Timeline IF (Fetch) ID (Decode) EX (LU) MEM WB 0x4 PC Instr Mem Good for visualizing pipeline fills. D Q Data Sample Program I1: I2: I3: I4: I5: DD R4,R3,R2 ND R6,R5,R4 SUB R1,R9,R8 XOR R3,R2,R1 OR R7,R6,R5 Time: Inst I1: I2: I3: I4: I5: I6: t1 t2 t3 t4 t5 t6 t7 t8 IF ID IF EX ID IF Pipeline is full MEM EX ID IF WB MEM EX ID IF WB MEM EX ID IF WB MEM EX ID WB MEM EX 33

34 + Representation #2: Resource Usage IF (Fetch) ID (Decode) EX (LU) MEM WB 0x4 PC Instr Mem Good for visualizing pipeline stalls. D Q Data Sample Program I1: I2: I3: I4: I5: DD R4,R3,R2 ND R6,R5,R4 SUB R1,R9,R8 XOR R3,R2,R1 OR R7,R6,R5 Time: Stage IF: ID: EX: MEM: WB: t1 t2 t3 t4 t5 t6 t7 t8 I1 I2 I1 I3 I2 I1 Pipeline is full I4 I3 I2 I1 I5 I4 I3 I2 I1 I6 I5 I4 I3 I2 I7 I6 I5 I4 I3 I8 I7 I6 I5 I4 34

35 Hazard Taxonomy 35

36 Structural Hazards Several pipeline stages need to use the same hardware resource at the same time. Solution #1: dd extra copies of the resource (only works sometime). Solution #2: Change resource so that it can handle concurrent use. Solution #3: Stages take turns by stalling parts of the pipeline. 36

37 Structural Hazard Example: One Memory IF Stage ID/RF Stage EX Stage MEM Stage WB Used by IF stage and MEM stage Mux,Logic, MemToReg op PC Data Memory Dout Din RegFile rs1 rs2 rd1 ws rd2 wd M L U To branch logic Y M MemToReg R Ext B 37

38 + solution: Extra copies of memory 1 2 IF Stage Instr Fetch ID/RF Stage Decode & Reg Fetch 3 EX Stage Execution 4 MEM Stage Memory 5 WB Write Back, MemToReg Mux,Logic op D PC Q 0x4 Instr Mem Data RegFile rs1 rs2 rd1 ws rd2 wd M L U Y M Data Memory Dout Din MemToReg R Ext B I and D caches are a hybrid solution 38

39 + lternatively: Concurrent use IF Stage Instr Fetch ID/RF Stage Decode & Reg Fetch 3 EX Stage Execution 4 MEM Stage Memory 5 WB Write Back, MemToReg Mux,Logic op D PC Q 0x4 Instr Mem Data RegFile rs1 rs2 rd1 ws rd2 wd M L U Y M Data Memory Dout Din MemToReg R Ext B ID and WB stages use register file in same clock cycle 39

40 Data Hazards: 3 Types (RW, WR, WW) Several pipeline stages read or write the same data location in an incompatible way. Read fter Write (RW) hazards. Instruction I2 expects to read a data value written by an earlier instruction, but I2 executes too early and reads the wrong copy of the data. Note data value, not register. Data hazards are possible for any architected state (such as main memory). In practice, main memory hazard avoidance is the job of the memory system. 40

41 Recall: RW example Stage #1 Stage #2 Stage #3 Instr Fetch Decode & Reg Fetch Sample program DD R4,R3,R2 OR R5,R4,R2 + D PC Q 0x4 Instr Mem Data OR R5,R4,R2... wrong value of R4 fetched from RegFile, contract with programmer broken! Oops! rs1 rs2 ws wd RegFile rd1 rd2 Ext M B DD R4,R3,R2 R4 not written yet... This is what we mean when we say Read fter Write (RW) Hazard 41

42 Data Hazards: 3 Types (RW, WR, WW) Write fter Read (WR) hazards. Instruction I2 expects to write over a data value after an earlier instruction I1 reads it. But instead, I2 writes too early, and I1 sees the new value. Write fter Write (WW) hazards. Instruction I2 writes over data an earlier instruction I1 also writes. But instead, I1 writes after I2, and the final data value is incorrect. WR and WW not possible in our 5-stage pipeline. But are possible in other pipeline designs. 42

43 + Control Hazards: taken branch/jump IF (Fetch) ID (Decode) EX (LU) MEM WB 0x4 D PC Q Instr Mem Data Note: with branch delay slot, I2 MUST complete, I3 MUST NOT complete. Sample Program Time: t1 t2 t3 t4 t5 t6 t7 t8 (IS w/o branch Inst EX stage delay slot) I1: IF ID EX MEM WB computes I2: IF ID if branch I1: BEQ R4,R3,25 I3: IF is taken I2: ND R6,R5,R4 I4: I3: SUB R1,R9,R8 If branch is taken, I5: these instructions I6: MUST NOT complete! 43

44 Hazards Recap Structural Hazards Data Hazards (RW, WR, WW) Control Hazards (taken branches and jumps) On each clock cycle, we must detect the presence of all of these hazards, and resolve them before they break the contract with the programmer. 44

45 Hazard Resolution Tools 45

46 The Hazard Resolution Toolkit Stall earlier instructions in pipeline. Forward results computed in later pipeline stages to earlier stages. dd new hardware or rearrange hardware design to eliminate hazard. Change IS to eliminate hazard. Kill earlier instructions in pipeline. Make hardware handle concurrent requests to eliminate hazard. 46

47 Resolving a RW hazard by stalling Stage #1 Stage #2 Stage #3 Instr Fetch Decode & Reg Fetch Sample program DD R4,R3,R2 OR R5,R4,R2 + D PC Q 0x4 Instr Mem Data OR R5,R4,R2 Keep executing OR instruction until R4 is ready. Until then, send NOPS to 2/3. rs1 rs2 ws wd RegFile rd1 rd2 M DD R4,R3,R2 Let DD proceed to WB stage, so that R4 is written to regfile. New datapath hardware (1) Mux into 2/3 to feed in NOP. Freeze PC and until stall is over. Ext B (2) Write enable on PC and 1/2 47

48 The Hazard Resolution Toolkit Stall earlier instructions in pipeline. Forward results computed in later pipeline stages to earlier stages. dd new hardware or rearrange hardware design to eliminate hazard. Change IS to eliminate hazard. Kill earlier instructions in pipeline. Make hardware handle concurrent requests to eliminate hazard. 48

49 + Resolving a RW hazard by forwarding IF Stage Instr Fetch Sample program DD R4,R3,R2 OR R5,R4,R2 0x4 1 2 ID/RF Stage Decode & Reg Fetch OR R5,R4,R2 Just forward it back! EX Stage Execution op L U 3 DD R4,R3,R2 LU computes R4 in the EX stage, so... Y D PC Q Instr Mem Data RegFile rs1 rs2 rd1 ws rd2 wd M M Ext B Unlike stalling, does not change CPI. May hurt cycle time. 49

50 The Hazard Resolution Toolkit Stall earlier instructions in pipeline. Forward results computed in later pipeline stages to earlier stages. dd new hardware or rearrange hardware design to eliminate hazard. Change IS to eliminate hazard. Kill earlier instructions in pipeline. Make hardware handle concurrent requests to eliminate hazard. 50

51 + Control Hazards: Fix with more hardware IF (Fetch) ID (Decode) EX (LU) MEM WB 0x4 D PC Q Instr Mem Data If we add hardware, can we move it here? Sample Program Time: t1 t2 t3 t4 t5 t6 t7 t8 (IS w/o branch Inst EX stage delay slot) I1: IF ID EX MEM WB computes I2: IF ID if branch I1: BEQ R4,R3,25 I3: IF is taken I2: ND R6,R5,R4 I4: I3: SUB R1,R9,R8 If branch is taken, I5: these instructions I6: MUST NOT complete! 51

52 + Resolving control hazard with hardware Stage #1 Stage #2 Stage #3 Instr Fetch Decode & Reg Fetch To branch control logic == 0x4 RegFile D PC Q Instr Mem Data rs1 rs2 ws wd rd1 rd2 M Ext B 52

53 + Control Hazards: fter more hardware IF (Fetch) ID (Decode) EX (LU) MEM WB 0x4 D PC Q Instr Mem Data If we change IS, can we always let I2 complete ( branch delay slot ) and eliminate the control hazard. Sample Program Time: t1 t2 t3 t4 t5 t6 t7 t8 (IS w/o branch Inst ID stage delay slot) I1: IF ID EX MEM WB computes I2: IF if branch I1: BEQ R4,R3,25 I3: is taken I2: ND R6,R5,R4 I4: I3: SUB R1,R9,R8 If branch is taken, this I5: instruction MUST NOT I6: complete! 53

54 MIPS Delayed Branch: BEQ $1,$2,25 Instruction Fetch Instruction Decode Operand Fetch Execute Fetch branch inst from memory opcode rs rt offset I-Format Decode fields to get: BEQ $1, $2, 25 Retrieve register values: $1, $2 Compute if we take branch: $1 == $2? Result Store Next Instruction LWYS prepare to fetch instr that follows the BEQ in the program ( delayed branch ). IF we take branch, the instr we fetch FTER that instruction is PC PC == Program Counter 54

55 The Hazard Resolution Toolkit Stall earlier instructions in pipeline. Forward results computed in later pipeline stages to earlier stages. dd new hardware or rearrange hardware design to eliminate hazard. Change IS to eliminate hazard. Kill earlier instructions in pipeline. Make hardware handle concurrent requests to eliminate hazard. 55

56 Resolve control hazard by killing instr Stage #1 Stage #2 Stage #3 Instr Fetch Decode & Reg Fetch Sample program (no delay slot) J 200 OR R5,R4,R2 + D PC Q 0x4 Instr Mem Data J 200 Detect J instruction, mux a NOP into 1/2 rs1 rs2 ws wd RegFile rd1 rd2 M This hurts CPI. Can we do better? Compute new Ext PC using hardware not shown... B 56

57 The Hazard Resolution Toolkit Stall earlier instructions in pipeline. Forward results computed in later pipeline stages to earlier stages. dd new hardware or rearrange hardware design to eliminate hazard. Change IS to eliminate hazard. Kill earlier instructions in pipeline. Make hardware handle concurrent requests to eliminate hazard. 57

58 Structural hazard solution: concurrent use + D PC IF Stage Instr Fetch Does not come for free... Q 0x4 1 2 Instr Mem Data ID/RF Stage Decode & Reg Fetch Mux,Logic rs1 rs2 ws wd RegFile rd1 rd2 EX Stage Execution M 3 op L U Y M, MemToReg Data Memory Din 4 MEM Stage Memory Dout MemToReg R 5 WB Write Back Ext B ID and WB stages use register file in same clock cycle 58

59 Hazard Diagnosis 59

60 Data Hazards: Read fter Write Read fter Write (RW) hazards. Instruction I2 expects to read a data value written by an earlier instruction, but I2 executes too early and reads the wrong copy of the data. Classic solution: use forwarding heavily, fall back on stalling when forwarding won t work or slows down the critical path too much. 60

61 Full bypass network... ID (Decode) EX MEM WB, MemToReg Mux,Logic From WB op rs1 rs2 RegFile rd1 L U Y Data Memory Dout Din MemToReg R ws wd rd2 M M Ext B 61

62 Common bug: Multiple forwards... DD R4,R3,R2 OR R2,R3,R1 ND R2,R2,R1 Which do we forward from? ID (Decode) EX MEM WB, MemToReg Mux,Logic rs1 rs2 ws wd RegFile rd1 rd2 From WB M op L U Y M Data Memory Dout Din MemToReg R Ext B 62

63 Common bug: Multiple forwards II... DD R4,R0,R2 Which do we forward from? ID (Decode) OR R0,R3,R1 ND R0,R2,R1 EX MEM WB, MemToReg Mux,Logic rs1 rs2 ws wd RegFile rd1 rd2 From WB M op L U Y M Data Memory Dout Din MemToReg R Ext B 63

64 LW and Hazards No load delay slot 64

65 Questions about LW and forwarding DDIU R1 R1 24 Do we need to stall? ID (Decode) OR R3,R3,R2 LW R1 128(R29) EX MEM WB, MemToReg Mux,Logic rs1 rs2 ws wd RegFile rd1 rd2 From WB M op L U Y M Data Memory Dout Din MemToReg R Ext B 65

66 Questions about LW and forwarding DDIU R1 R1 24 Do we need to stall? ID (Decode) LW R1 128(R29) EX OR R1,R3,R1 MEM WB, MemToReg Mux,Logic rs1 rs2 ws wd RegFile rd1 rd2 From WB M op L U Y M Data Memory Dout Din MemToReg R Ext B 66

67 Resolving a RW hazard by stalling Stage #1 Stage #2 Stage #3 Instr Fetch Decode & Reg Fetch Sample program DD R4,R3,R2 OR R5,R4,R2 + D PC Q 0x4 Instr Mem Data OR R5,R4,R2 Keep executing OR instruction until R4 is ready. Until then, send NOPS to 2/3. rs1 rs2 ws wd RegFile rd1 rd2 M DD R4,R3,R2 Let DD proceed to WB stage, so that R4 is written to regfile. New datapath hardware (1) Mux into 2/3 to feed in NOP. Freeze PC and until stall is over. Ext B (2) Write enable on PC and 1/2 67

68 Branches and Hazards Single delay slot 68

69 + Recall: Control hazard and hardware Stage #1 Stage #2 Stage #3 Instr Fetch Decode & Reg Fetch To branch control logic == 0x4 RegFile D PC Q Instr Mem Data rs1 rs2 ws wd rd1 rd2 M Ext B 69

70 + Recall: fter more hardware, change IS IF (Fetch) ID (Decode) EX (LU) MEM WB 0x4 D PC Q Instr Mem Data If we change IS, can we always let I2 complete ( branch delay slot ) and eliminate the control hazard. Sample Program Time: t1 t2 t3 t4 t5 t6 t7 t8 (IS w/o branch Inst ID stage delay slot) I1: IF ID EX MEM WB computes I2: IF if branch I1: BEQ R4,R3,25 I3: is taken I2: ND R6,R5,R4 I4: I3: SUB R1,R9,R8 If branch is taken, this I5: instruction MUST NOT I6: complete! 70

71 Question about branch and forwards: Mux,Logic BEQ R1 R3 label Will this work as shown? To branch control logic ID (Decode) == OR R3,R3,R1 EX MEM WB, MemToReg op RegFile rs1 rs2 rd1 ws rd2 wd M L U Y M Data Memory Dout Din MemToReg R Ext B 71

72 Lessons learned Pipelining is hard Study every instruction Write test code in advance Think about interactions... 72

73 Lessons learned Pipelining is hard Study every instruction Write test code in advance Think about interactions... between forwarding, branch and jump delay slots, R0 issues LW issues... a long list! 73

74 Control Implementation 74

75 Recall: What is single cycle control? Instr Mem Data Equal Combinational Logic (Only Gates, No Flip Flops) Just specify logic functions! RegDest RegWr ExtOp LUsrc MemWr MemToReg PCSrc RegDest RegFile rs1 rs2 rd1 ws rd2 wd Ext LUctr op L U Equal Data Memory Dout Din RegWr ExtOp LUsrc MemWr MemToReg 75

76 In pipelines, all registers are used ID (Decode) EX MEM WB Equal Combinational Logic (Only Gates, No Flip Flops) (add extra state outside!) RegDest RegWr ExtOp MemToReg PCSrc conceptual design -- for shortest critical path, registers may hold decoded info, not the complete -bit instruction 76

77 Exceptions and Interrupts Exception: n unusual event happens to an instruction during its execution. Examples: divide by zero, undefined opcode. Interrupt: Hardware signal to switch the processor to a new instruction stream. Example: a sound card interrupts when it needs more audio output samples (an audio click happens if it is left waiting). 77

78 Challenge: Precise Interrupt / Exception Definition: (or exception),$'-.)$'(//+(%'()'0*'(#'0#$+%%./$'0)'$(1+#'2+$3++#' $3"'0#)$%.4$0"#)!"#$%& ' #()%& '*+, -./0%01102.%31%#44%'(".562.'3("%67%.3%#()%'(246)'(8%& ' '".3.#44$% (3%01102.%31%#($%'(".562.'3(%#1.05%& ' /#"%.#:0(%74#20 ;/0%'( %/#()405%0'./05%#<35."%./0%75385#9%35% 50".#5."%'.%#.%& '*+% = Follows from the contract between the architect and the programmer... 78

79 Precise Exceptions in Static Pipelines 120$!'()'*3/4)$5#67)*8/-)./0)9!"##$%& '"$(% P9 0?J!!?G:>? K T 0!H@H' 0?J S G/:/1*0 L98.:/-0 6C 6C0..-/440 D>1/3*7(84 KFG!!::/<9:0 I31(./ KFG K IJ/-,:(= KFG 0 F9* E7::0 D>1/3* K-7*/?91@ C9)4/ E7::0H0 G*9</ P9! E7::0F0 G*9</ P9 K E7::0D0 G*9</ P9 0 4B81;-(8()40!8*/--)3*4 D6C C D:E>'?FG?B@=:;'$EHI<'=;'B=B?E=;?';@=E'G:JJ=@'B:=;@',0'<@HI?/ Key observation: architected state only change in memory and register write stages. 79

80 Thursday: 80

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining CS 152 Computer rchitecture and Engineering Lecture 4 Pipelining 2014-1-30 John Lazzaro (not a prof - John is always OK) T: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: 1 otorola 68000 Next week

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer rchitecture and Engineering Lecture 10 Pipelining III 2005-2-17 John Lazzaro (www.cs.berkeley.edu/~lazzaro) Ts: Ted Hong and David arquardt www-inst.eecs.berkeley.edu/~cs152/ Last time:

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 7 Pipelining I 2006-9-19 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/ Last Time: ipod

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 7 Pipelining I 2005-9-20 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: David Marquardt and Udam Saini www-inst.eecs.berkeley.edu/~cs152/ Office Hours

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

The Big Picture: Where are We Now? EEM 486: Computer Architecture. Lecture 3. Designing a Single Cycle Datapath

The Big Picture: Where are We Now? EEM 486: Computer Architecture. Lecture 3. Designing a Single Cycle Datapath The Big Picture: Where are We Now? EEM 486: Computer Architecture Lecture 3 The Five Classic Components of a Computer Processor Input Control Memory Designing a Single Cycle path path Output Today s Topic:

More information

CS 152 Computer Architecture and Engineering Lecture 1 Single Cycle Design

CS 152 Computer Architecture and Engineering Lecture 1 Single Cycle Design CS 152 Computer Architecture and Engineering Lecture 1 Single Cycle Design 2014-1-21 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: 1 Today s lecture

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 18 Advanced Processors II 2006-10-31 John Lazzaro (www.cs.berkeley.edu/~lazzaro) Thanks to Krste Asanovic... TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/

More information

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin.   School of Information Science and Technology SIST CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C

More information

Computer Architecture

Computer Architecture Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building

More information

L19 Pipelined CPU I 1. Where are the registers? Study Chapter 6 of Text. Pipelined CPUs. Comp 411 Fall /07/07

L19 Pipelined CPU I 1. Where are the registers? Study Chapter 6 of Text. Pipelined CPUs. Comp 411 Fall /07/07 Pipelined CPUs Where are the registers? Study Chapter 6 of Text L19 Pipelined CPU I 1 Review of CPU Performance MIPS = Millions of Instructions/Second MIPS = Freq CPI Freq = Clock Frequency, MHz CPI =

More information

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Computer Architecture ELEC3441

Computer Architecture ELEC3441 Computer Architecture ELEC3441 RISC vs CISC Iron Law CPUTime = # of instruction program # of cycle instruction cycle Lecture 5 Pipelining Dr. Hayden Kwok-Hay So Department of Electrical and Electronic

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu CENG 342 Computer Organization and Design Lecture 6: MIPS Processor - I Bei Yu CEG342 L6. Spring 26 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified

More information

Pipelined CPUs. Study Chapter 4 of Text. Where are the registers?

Pipelined CPUs. Study Chapter 4 of Text. Where are the registers? Pipelined CPUs Where are the registers? Study Chapter 4 of Text Second Quiz on Friday. Covers lectures 8-14. Open book, open note, no computers or calculators. L17 Pipelined CPU I 1 Review of CPU Performance

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each

More information

CS61C : Machine Structures

CS61C : Machine Structures CS 61C L path (1) insteecsberkeleyedu/~cs61c/su6 CS61C : Machine Structures Lecture # path natomy: 5 components of any Computer Personal Computer -7-25 This week Computer Processor ( brain ) path ( brawn

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 17 Advanced Processors I 2005-10-27 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: David Marquardt and Udam Saini www-inst.eecs.berkeley.edu/~cs152/

More information

Working on the Pipeline

Working on the Pipeline Computer Science 6C Spring 27 Working on the Pipeline Datapath Control Signals Computer Science 6C Spring 27 MemWr: write memory MemtoReg: ALU; Mem RegDst: rt ; rd RegWr: write register 4 PC Ext Imm6 Adder

More information

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath COMP33 - Computer Architecture Lecture 8 Designing a Single Cycle Datapath The Big Picture The Five Classic Components of a Computer Processor Input Control Memory Datapath Output The Big Picture: The

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 Lecture 3 Pipelining Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 A "Typical" RISC ISA 32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take pair)

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 20 Advanced Processors I 2005-4-5 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #19 Designing a Single-Cycle CPU 27-7-26 Scott Beamer Instructor AI Focuses on Poker CS61C L19 CPU Design : Designing a Single-Cycle CPU

More information

Pipeline design. Mehran Rezaei

Pipeline design. Mehran Rezaei Pipeline design Mehran Rezaei How Can We Improve the Performance? Exec Time = IC * CPI * CCT Optimization IC CPI CCT Source Level * Compiler * * ISA * * Organization * * Technology * With Pipelining We

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Pipelined Processor Design

Pipelined Processor Design Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Computer Design and Test Lab. Indian Institute of Science (IISc) Bangalore virendra@computer.org Advance Computer Architecture http://www.serc.iisc.ernet.in/~viren/courses/aca/aca.htm

More information

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control EE 37 Unit Single-Cycle CPU path and Control CPU Organization Scope We will build a CPU to implement our subset of the MIPS ISA Memory Reference Instructions: Load Word (LW) Store Word (SW) Arithmetic

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic CS 61C: Great Ideas in Computer Architecture Datapath Instructors: John Wawrzynek & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/fa15 1 Components of a Computer Processor Control Enable? Read/Write

More information

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction CS 61C: Great Ideas in Computer Architecture MIPS CPU Datapath, Control Introduction Instructor: Alan Christopher 7/28/214 Summer 214 -- Lecture #2 1 Review of Last Lecture Critical path constrains clock

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Major CPU Design Steps

Major CPU Design Steps Datapath Major CPU Design Steps. Analyze instruction set operations using independent RTN ISA => RTN => datapath requirements. This provides the the required datapath components and how they are connected

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design

More information

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath 361 datapath.1 Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath Outline of Today s Lecture Introduction Where are we with respect to the BIG picture? Questions and Administrative

More information

CENG 3420 Lecture 06: Datapath

CENG 3420 Lecture 06: Datapath CENG 342 Lecture 6: Datapath Bei Yu byu@cse.cuhk.edu.hk CENG342 L6. Spring 27 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified to contain only: memory-reference

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath CPE 442 single-cycle datapath.1 Outline of Today s Lecture Recap and Introduction Where are we with respect to the BIG picture?

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science 6 PM 7 8 9 10 11 Midnight Time 30 40 20 30 40 20

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 4 Testing Processors 2005-1-27 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last

More information

CS3350B Computer Architecture Quiz 3 March 15, 2018

CS3350B Computer Architecture Quiz 3 March 15, 2018 CS3350B Computer Architecture Quiz 3 March 15, 2018 Student ID number: Student Last Name: Question 1.1 1.2 1.3 2.1 2.2 2.3 Total Marks The quiz consists of two exercises. The expected duration is 30 minutes.

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 6 Superpipelining + Branch Prediction 2014-2-6 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play:

More information

CS 61C: Great Ideas in Computer Architecture Control and Pipelining

CS 61C: Great Ideas in Computer Architecture Control and Pipelining CS 6C: Great Ideas in Computer Architecture Control and Pipelining Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs6c/sp6 Datapath Control Signals ExtOp: zero, sign

More information

EECS150 - Digital Design Lecture 9 Project Introduction (I), Serial I/O. Announcements

EECS150 - Digital Design Lecture 9 Project Introduction (I), Serial I/O. Announcements EECS150 - Digital Design Lecture 9 Project Introduction (I), Serial I/O September 22, 2011 Elad Alon Electrical Engineering and Computer Sciences University of California, Berkeley http://www-inst.eecs.berkeley.edu/~cs150

More information

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19 CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be

More information

Improving Performance: Pipelining

Improving Performance: Pipelining Improving Performance: Pipelining Memory General registers Memory ID EXE MEM WB Instruction Fetch (includes PC increment) ID Instruction Decode + fetching values from general purpose registers EXE EXEcute

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the

More information

The overall datapath for RT, lw,sw beq instrucution

The overall datapath for RT, lw,sw beq instrucution Designing The Main Control Unit: Remember the three instruction classes {R-type, Memory, Branch}: a) R-type : Op rs rt rd shamt funct 1.src 2.src dest. 31-26 25-21 20-16 15-11 10-6 5-0 a) Memory : Op rs

More information

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017 Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath Outline EEL-473 Computer Architecture Designing a Single Cycle path Introduction The steps of designing a processor path and timing for register-register operations path for logical operations with immediates

More information

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU CS6C L9 CPU Design : Designing a Single-Cycle CPU () insteecsberkeleyedu/~cs6c CS6C : Machine Structures Lecture #9 Designing a Single-Cycle CPU 27-7-26 Scott Beamer Instructor AI Focuses on Poker Review

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 52 Computer Architecture and Engineering Lecture 6 -- Midterm I Review Session 204-3-3 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs52/ Play: CS 52 L6: Midterm

More information

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction EECS150 - Digital Design Lecture 10- CPU Microarchitecture Feb 18, 2010 John Wawrzynek Spring 2010 EECS150 - Lec10-cpu Page 1 Processor Microarchitecture Introduction Microarchitecture: how to implement

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 52 Computer Architecture and Engineering Lecture 26 Mid-Term II Review 26--3 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs52/ CS 52 L26: Mid-Term

More information

Ch 5: Designing a Single Cycle Datapath

Ch 5: Designing a Single Cycle Datapath Ch 5: esigning a Single Cycle path Computer Systems Architecture CS 365 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Memory path Input Output Today s Topic:

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Pipelined Processor Design

Pipelined Processor Design Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 20 SE-273: Processor Design Courtesy: Prof. Vishwani Agrawal

More information

EE 457 Unit 6a. Basic Pipelining Techniques

EE 457 Unit 6a. Basic Pipelining Techniques EE 47 Unit 6a Basic Pipelining Techniques 2 Pipelining Introduction Consider a drink bottling plant Filling the bottle = 3 sec. Placing the cap = 3 sec. Labeling = 3 sec. Would you want Machine = Does

More information

1 Hazards COMP2611 Fall 2015 Pipelined Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor 1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add

More information

More CPU Pipelining Issues

More CPU Pipelining Issues More CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! Important Stuff: Study Session for Problem Set 5 tomorrow night (11/11) 5:30-9:00pm Study Session

More information

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content 3/6/8 CSCI 42: Computer Architectures The Processor (2) Fengguang Song Department of Computer & Information Science IUPUI Today s Content We have looked at how to design a Data Path. 4.4, 4.5 We will design

More information

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI CSCI 42: Computer Architectures The Processor (2) Fengguang Song Department of Computer & Information Science IUPUI Full Datapath Branch Target Instruction Fetch Immediate 4 Today s Contents We have looked

More information

CPU Design Steps. EECC550 - Shaaban

CPU Design Steps. EECC550 - Shaaban CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements. 2. Select set of datapath components & establish clock methodology. 3. Assemble datapath meeting the

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 22 Advanced Processors III 2005-4-12 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/

More information

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions. Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions Stage Instruction Fetch Instruction Decode Execution / Effective addr Memory access Write-back Abbreviation

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

COMP303 Computer Architecture Lecture 9. Single Cycle Control

COMP303 Computer Architecture Lecture 9. Single Cycle Control COMP33 Computer Architecture Lecture 9 Single Cycle Control A Single Cycle Datapath We have everything except control signals (underlined) RegDst busw Today s lecture will look at how to generate the control

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

CS3350B Computer Architecture Winter Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2)

CS3350B Computer Architecture Winter Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2) CS335B Computer Architecture Winter 25 Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2) Marc Moreno Maza www.csd.uwo.ca/courses/cs335b [Adapted from lectures on Computer Organization and Design,

More information

CS 152 Computer Architecture and Engineering Lecture 3 Metrics

CS 152 Computer Architecture and Engineering Lecture 3 Metrics CS 152 Computer Architecture and Engineering Lecture 3 Metrics 2014-1-28 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-insteecsberkeleyedu/~cs152/ Play: CS 152 L3: Metrics UC Regents

More information

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt

More information

Introduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Introduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. Introduction to Pipelining Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L15-1 Performance Measures Two metrics of interest when designing a system: 1. Latency: The delay

More information

Lecture 6 Datapath and Controller

Lecture 6 Datapath and Controller Lecture 6 Datapath and Controller Peng Liu liupeng@zju.edu.cn Windows Editor and Word Processing UltraEdit, EditPlus Gvim Linux or Mac IOS Emacs vi or vim Word Processing(Windows, Linux, and Mac IOS) LaTex

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,

More information

CSEE 3827: Fundamentals of Computer Systems

CSEE 3827: Fundamentals of Computer Systems CSEE 3827: Fundamentals of Computer Systems Lecture 21 and 22 April 22 and 27, 2009 martha@cs.columbia.edu Amdahl s Law Be aware when optimizing... T = improved Taffected improvement factor + T unaffected

More information

EECS150 - Digital Design Lecture 08 - Project Introduction Part 1

EECS150 - Digital Design Lecture 08 - Project Introduction Part 1 EECS150 - Digital Design Lecture 08 - Project Introduction Part 1 Feb 9, 2012 John Wawrzynek Spring 2012 EECS150 - Lec08-proj1 Page 1 Project Overview A. Pipelined CPU review B.MIPS150 pipeline structure

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations

More information

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer EECS150 - Digital Design Lecture 9- CPU Microarchitecture Feb 15, 2011 John Wawrzynek Spring 2011 EECS150 - Lec09-cpu Page 1 Watson: Jeopardy-playing Computer Watson is made up of a cluster of ninety IBM

More information

Designing a Multicycle Processor

Designing a Multicycle Processor Designing a Multicycle Processor Arquitectura de Computadoras Arturo Díaz D PérezP Centro de Investigación n y de Estudios Avanzados del IPN adiaz@cinvestav.mx Arquitectura de Computadoras Multicycle-

More information

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31 4.16 Exercises 419 Exercise 4.11 In this exercise we examine in detail how an instruction is executed in a single-cycle datapath. Problems in this exercise refer to a clock cycle in which the processor

More information

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23) Lecture Topics Today: Single-Cycle Processors (P&H 4.1-4.4) Next: continued 1 Announcements Milestone #3 (due 2/9) Milestone #4 (due 2/23) Exam #1 (Wednesday, 2/15) 2 1 Exam #1 Wednesday, 2/15 (3:00-4:20

More information

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one

More information

Pipeline Review. Review

Pipeline Review. Review Pipeline Review Review Covered in EECS2021 (was CSE2021) Just a reminder of pipeline and hazards If you need more details, review 2021 materials 1 The basic MIPS Processor Pipeline 2 Performance of pipelining

More information

Lecture #17: CPU Design II Control

Lecture #17: CPU Design II Control Lecture #7: CPU Design II Control 25-7-9 Anatomy: 5 components of any Computer Personal Computer Computer Processor Control ( brain ) This week ( ) path ( brawn ) (where programs, data live when running)

More information