Computer Architecture Lecture 6: Multi-cycle Microarchitectures. Prof. Onur Mutlu Carnegie Mellon University Spring 2012, 2/6/2012

1 18-447 Computer Architecture, Lecture 6: Multi-cycle Microarchitectures. Prof. Onur Mutlu, Carnegie Mellon University, Spring 2012, 2/6/2012

2 Reminder: Homeworks. Homework 1 solutions: check and study the solutions! Learning now is better than rushing later. Homework 2: already out; due February 13. Topics: ISA concepts, ISA vs. microarchitecture, microcoded machines.

3 Reminder: Lab Assignment 2. Lab Assignment 1.5: Verilog practice; not to be turned in. Lab Assignment 2: due Friday, Feb 17, at the end of the lab. Individual assignment. No collaboration; please respect the honor code.

4 Extra Credit for Lab Assignment 2. Complete your normal (single-cycle) implementation first, and get it checked off in lab. Then, implement the MIPS core using a microcoded approach similar to what we are discussing in class. We are not specifying any particular details of the microcode format or the microarchitecture; you should be creative. For the extra credit, the microcoded implementation should execute the same programs that your ordinary implementation does, and you should demo it by the normal lab deadline.

5 Feedback on Lab Assignment 1. Chris, Lavanya, and Abeer are working hard on grading. We will have very comprehensive tests for all labs: lab tests exercise every case of each instruction as well as long programs (e.g., REP MOVS). We will release test cases and register dumps. Be thorough and test all possible cases. Follow directions; they are there for a reason. No modifications to shell code! No unaligned accesses to memory. Remove all your debugging printfs before handing in code. Do the extra credit work if the lab is too easy!

6 Readings for Today. P&P, Revised Appendix C: Microarchitecture of the LC-3b; Appendix A (LC-3b ISA) will be useful in following this. P&H, Appendix D: Mapping Control to Hardware. Optional: Maurice Wilkes, "The Best Way to Design an Automatic Calculating Machine," Manchester Univ. Computer Inaugural Conf., 1951.

7 Readings for Next Lecture: Pipelining, P&H Chapter

8 Review of Last Lecture: Single-Cycle uArch. What phases of the instruction processing cycle does the MIPS JAL instruction exercise? How many cycles does it take to process an instruction in the single-cycle microarchitecture? What determines the clock cycle time? What is the difference between datapath and control logic? What about combinational vs. sequential control? What is the semantics of a delayed branch? Why this is so will become clear when we cover pipelining.

9 Review: Instruction Processing Cycle. Instructions are processed under the direction of a "control unit" step by step. Instruction cycle: sequence of steps to process an instruction. Fundamentally, there are six phases: Fetch, Decode, Evaluate Address, Fetch Operands, Execute, Store Result. Not all instructions require all six stages (see P&P Ch. 4).

10 Review: Datapath vs. Control Logic. Instructions transform Data (AS) to Data (AS'). This transformation is done by functional units: units that "operate" on data. These units need to be told what to do to the data. An instruction processing engine consists of two components. Datapath: consists of hardware elements that deal with and transform data signals: functional units that operate on data; hardware structures (e.g., wires and muxes) that enable the flow of data into the functional units and registers; storage units that store data (e.g., registers). Control logic: consists of hardware elements that determine control signals, i.e., signals that specify what the datapath elements should do to the data.

11 Today's Agenda. Finish single-cycle microarchitectures: critical path, microarchitecture design principles, performance evaluation primer. Multi-cycle microarchitectures. Microprogrammed control.

12 A Note: How to Make the Best Out of 447? Do the readings: P&P Appendices A and C; Wilkes 1951 paper. Today's lectures will be easy to understand if you read these, and you can ask more in-depth questions and learn more. Do the assignments early: you can do things for extra credit if you finish early, and we will describe what to do for extra credit. Study the material and buzzwords daily: lecture notes, videos; buzzwords: take notes during class.

13 Review: The Full Single-Cycle Datapath [Figure: complete single-cycle MIPS datapath and control unit; control outputs RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; PCSrc1 = Jump, PCSrc2 = Br Taken; JAL, JR, JALR omitted. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

14 Single-Cycle Datapath for Arithmetic and Logical Instructions

15 Review: Datapath for R and I-Type ALU Insts. [Figure: datapath for R-type and I-type ALU instructions, with RegDest and ALUSrc controlled by isItype, divided into IF, ID, EX, MEM, WB, with combinational state update logic. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] Semantics: if MEM[PC] == ADDI rt rs immediate: GPR[rt] ← GPR[rs] + sign-extend(immediate); PC ← PC + 4.
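
A minimal Python sketch of the ADDI semantics above, written at the ISA level rather than as a datapath. It assumes the standard MIPS32 encoding (ADDI opcode 0x08, rs in bits [25:21], rt in bits [20:16]); the helper names are hypothetical, and the overflow trap that real ADDI performs is deliberately not modeled.

```python
MASK32 = 0xFFFFFFFF
OPCODE_ADDI = 0x08  # MIPS32 opcode for ADDI (assumption; not given on the slide)


def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_addi(inst: int, gpr: list, pc: int) -> int:
    """Apply GPR[rt] <- GPR[rs] + sign-extend(immediate) in place; return PC + 4.

    Overflow trapping, which real MIPS ADDI performs, is not modeled.
    """
    assert (inst >> 26) & 0x3F == OPCODE_ADDI, "not an ADDI instruction"
    rs = (inst >> 21) & 0x1F
    rt = (inst >> 16) & 0x1F
    imm = sign_extend16(inst & 0xFFFF)
    if rt != 0:  # register $0 is hardwired to zero in MIPS
        gpr[rt] = (gpr[rs] + imm) & MASK32
    return (pc + 4) & MASK32


# Usage: $8 <- $9 + (-3)
gpr = [0] * 32
gpr[9] = 10
inst = (OPCODE_ADDI << 26) | (9 << 21) | (8 << 16) | (-3 & 0xFFFF)
next_pc = execute_addi(inst, gpr, 0x00400000)
print(gpr[8], hex(next_pc))  # 7 0x400004
```

In the single-cycle datapath, every line of this function corresponds to combinational logic that settles within one clock period; the function's return value plays the role of the PC update at the clock edge.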

16 Single-Cycle Datapath for Data Movement Instructions

17 Review: Datapath for Non-Control-Flow Insts. [Figure: datapath for ALU, load, and store instructions; RegDest and ALUSrc controlled by isItype, RegWrite = !isStore, MemWrite = isStore, MemRead = isLoad, MemtoReg = isLoad. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

18 Single-Cycle Datapath for Control Flow Instructions

19 Review: Unconditional Jump Instructions. Assembly: J immediate26. Machine encoding: J (6-bit) | immediate (26-bit); J-type. Semantics: if MEM[PC] == J immediate26: target = { PC[31:28], immediate26, 2'b00 }; PC ← target.
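
A short Python sketch of the jump-target concatenation, assuming the standard MIPS32 opcode 0x02 for J; the function name is hypothetical. It follows the slide's formula literally (upper bits taken from PC).

```python
OPCODE_J = 0x02  # MIPS32 opcode for J (assumption; not given on the slide)


def jump_target(pc: int, inst: int) -> int:
    """target = { PC[31:28], immediate26, 2'b00 }, as written on the slide."""
    assert (inst >> 26) & 0x3F == OPCODE_J, "not a J instruction"
    imm26 = inst & 0x03FFFFFF
    # Keep the top 4 PC bits, place the 26-bit field in bits [27:2], zero bits [1:0].
    return (pc & 0xF0000000) | (imm26 << 2)


target = jump_target(0x00400018, (OPCODE_J << 26) | 0x0100000)
print(hex(target))  # 0x400000
```

On real MIPS hardware the upper four bits come from PC + 4 (the delay-slot address) rather than PC; the two differ only when the jump occupies the last word of a 256 MB region.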

20 Review: Unconditional Jump Datapath [Figure: single-cycle datapath in which isJ drives PCSrc to select the concatenated jump target as the next PC; the ALU controls are don't-cares (X). Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] if MEM[PC] == J immediate26: PC = { PC[31:28], immediate26, 2'b00 }. What about JR, JAL, JALR?

21 Conditional Branch Instructions. Assembly (e.g., branch if equal): BEQ rs_reg rt_reg immediate16. Machine encoding: BEQ (6-bit) | rs (5-bit) | rt (5-bit) | immediate (16-bit); I-type. Semantics (assuming no branch delay slot): if MEM[PC] == BEQ rs rt immediate16: target = PC + 4 + sign-extend(immediate) × 4; if GPR[rs] == GPR[rt] then PC ← target else PC ← PC + 4.
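
A Python sketch of BEQ's next-PC computation, assuming the MIPS convention target = (PC + 4) + sign-extend(immediate) × 4 and, like the slide, no delay slot. The BEQ opcode 0x04 is the standard MIPS32 value (an assumption here), and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
OPCODE_BEQ = 0x04  # MIPS32 opcode for BEQ (assumption; not given on the slide)


def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc: int, inst: int, gpr: list) -> int:
    """Next PC for BEQ with no branch delay slot.

    Assumes target = (PC + 4) + sign-extend(immediate) * 4, the MIPS convention.
    """
    assert (inst >> 26) & 0x3F == OPCODE_BEQ, "not a BEQ instruction"
    rs = (inst >> 21) & 0x1F
    rt = (inst >> 16) & 0x1F
    # The immediate counts words, so multiply by 4 to get a byte offset.
    target = (pc + 4 + sign_extend16(inst & 0xFFFF) * 4) & MASK32
    return target if gpr[rs] == gpr[rt] else (pc + 4) & MASK32


gpr = [0] * 32
gpr[1], gpr[2] = 5, 5
beq = (OPCODE_BEQ << 26) | (1 << 21) | (2 << 16) | (-2 & 0xFFFF)  # offset of -2 words
print(hex(beq_next_pc(0x00400010, beq, gpr)))  # 0x40000c
```

In the datapath, the comparison is the ALU's bcond output and the target is computed by the separate adder, so both values exist in the same cycle and a mux picks one.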

22 Conditional Branch Datapath (For You to Fix) [Figure: branch datapath in which PC + 4 from the instruction datapath is added to the sign-extended immediate shifted left by 2 to form the branch target, while the ALU performs a subtraction to produce bcond for the branch control logic; a "watch out" annotation marks PCSrc. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] How to uphold the delayed branch semantics?

23 Putting It All Together [Figure: full single-cycle datapath and control unit, as on slide 13; JAL, JR, JALR omitted. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

24 Single-Cycle Control Logic

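25 Single-Cycle Hardwired Control. Control signals are a combinational function of Inst = MEM[PC].
R-type: opcode (6-bit, [31:26]) | rs (5-bit, [25:21]) | rt (5-bit, [20:16]) | rd (5-bit, [15:11]) | shamt (5-bit, [10:6]) | funct (6-bit, [5:0])
I-type: opcode (6-bit, [31:26]) | rs (5-bit, [25:21]) | rt (5-bit, [20:16]) | immediate (16-bit, [15:0])
J-type: opcode (6-bit, [31:26]) | immediate (26-bit, [25:0])
Consider: all R-type and I-type ALU instructions; LW and SW; BEQ, BNE, BLEZ, BGTZ; J, JR, JAL, JALR.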
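
The three formats amount to fixed bit slices. A Python sketch (class and function names hypothetical; the example word 0x012A4020 assumes the standard MIPS32 encoding of add $8, $9, $10):

```python
from typing import NamedTuple


class Fields(NamedTuple):
    opcode: int  # inst[31:26], all formats
    rs: int      # inst[25:21], R-type and I-type
    rt: int      # inst[20:16], R-type and I-type
    rd: int      # inst[15:11], R-type
    shamt: int   # inst[10:6], R-type
    funct: int   # inst[5:0], R-type
    imm16: int   # inst[15:0], I-type
    imm26: int   # inst[25:0], J-type


def split_fields(inst: int) -> Fields:
    """Slice every field at once, as the hardware wires do; the control logic
    (a function of opcode and funct) decides which fields are meaningful."""
    return Fields(
        opcode=(inst >> 26) & 0x3F,
        rs=(inst >> 21) & 0x1F,
        rt=(inst >> 16) & 0x1F,
        rd=(inst >> 11) & 0x1F,
        shamt=(inst >> 6) & 0x1F,
        funct=inst & 0x3F,
        imm16=inst & 0xFFFF,
        imm26=inst & 0x03FFFFFF,
    )


f = split_fields(0x012A4020)  # add $8, $9, $10 (standard MIPS32 encoding)
print(f.opcode, f.rs, f.rt, f.rd, f.shamt, hex(f.funct))  # 0 9 10 8 0 0x20
```

Extracting all fields unconditionally costs nothing in hardware (it is just wiring), which is why decoding in a single-cycle MIPS is mostly about generating control signals, not about parsing.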

26 Single-Bit Control Signals
Signal | When de-asserted | When asserted | Equation
RegDest | GPR write select according to rt, i.e., inst[20:16] | GPR write select according to rd, i.e., inst[15:11] | opcode==0
ALUSrc | 2nd ALU input from 2nd GPR read port | 2nd ALU input from sign-extended 16-bit immediate | (opcode!=0) && (opcode!=beq) && (opcode!=bne)
MemtoReg | Steer ALU result to GPR write port | Steer memory load to GPR write port | opcode==lw
JAL and JALR require additional RegDest and MemtoReg options.

27 Single-Bit Control Signals
Signal | When de-asserted | When asserted | Equation
MemRead | Memory read disabled | Memory read port returns load value | opcode==lw
MemWrite | Memory write disabled | Memory write enabled | opcode==sw
PCSrc1 | According to PCSrc2 | Next PC is based on 26-bit immediate jump target | (opcode==j) || (opcode==jal)
PCSrc2 | Next PC = PC + 4 | Next PC is based on 16-bit immediate branch target | (opcode==bxx) && "bcond is satisfied"
JR and JALR require additional PCSrc options.
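
The equations in the two tables can be written directly as a Python function, which is exactly what "hardwired" means: each output is a pure combinational function of the opcode (plus bcond). The numeric opcodes are the standard MIPS32 values and are an assumption here, since the slides name instructions only; the function name is hypothetical.

```python
# Standard MIPS32 opcodes (assumption: the slides name the instructions, not the encodings).
OP_RTYPE, OP_J, OP_JAL = 0x00, 0x02, 0x03
OP_BEQ, OP_BNE, OP_BLEZ, OP_BGTZ = 0x04, 0x05, 0x06, 0x07
OP_LW, OP_SW = 0x23, 0x2B
BRANCHES = {OP_BEQ, OP_BNE, OP_BLEZ, OP_BGTZ}  # the "bxx" group


def single_bit_controls(opcode: int, bcond: bool) -> dict:
    """Single-bit control signals as combinational functions of the opcode.

    bcond is the branch condition produced by the ALU for this instruction.
    JAL, JR, and JALR need extra RegDest/MemtoReg/PCSrc options not modeled here.
    """
    return {
        "RegDest": opcode == OP_RTYPE,
        "ALUSrc": opcode not in (OP_RTYPE, OP_BEQ, OP_BNE),
        "MemtoReg": opcode == OP_LW,
        "MemRead": opcode == OP_LW,
        "MemWrite": opcode == OP_SW,
        "PCSrc1": opcode in (OP_J, OP_JAL),
        "PCSrc2": opcode in BRANCHES and bcond,
    }


print(single_bit_controls(OP_LW, bcond=False))
```

In hardware each dictionary entry becomes a small gate network on the opcode bits; no state is involved, so the outputs are valid as soon as the instruction word settles.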

28 ALU Control. case opcode: '0' ⇒ select operation according to funct; 'ALUi' ⇒ select operation according to opcode; 'LW' ⇒ select addition; 'SW' ⇒ select addition; 'Bxx' ⇒ select bcond generation function; otherwise ⇒ don't care. Example ALU operations: ADD, SUB, AND, OR, XOR, NOR, etc.; bcond on equal, not equal, LE zero, GT zero, etc.
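
The case statement above as a Python sketch. The opcode and funct values are standard MIPS32 encodings (an assumption, since the slide lists operations only), the table and function names are hypothetical, and `None` stands for "don't care".

```python
from typing import Optional

# Standard MIPS32 encodings (assumption; the slide names operations, not encodings).
FUNCT_OPS = {0x20: "ADD", 0x22: "SUB", 0x24: "AND", 0x25: "OR",
             0x26: "XOR", 0x27: "NOR", 0x2A: "SLT"}
ALUI_OPS = {0x08: "ADD", 0x0A: "SLT", 0x0C: "AND", 0x0D: "OR", 0x0E: "XOR"}  # ADDI, SLTI, ANDI, ORI, XORI
BCOND_OPS = {0x04: "EQ", 0x05: "NE", 0x06: "LEZ", 0x07: "GTZ"}  # BEQ, BNE, BLEZ, BGTZ
OP_LW, OP_SW = 0x23, 0x2B


def alu_control(opcode: int, funct: int) -> Optional[str]:
    """Mirror the slide's case statement; None stands for "don't care"."""
    if opcode == 0x00:               # R-type: operation comes from funct
        return FUNCT_OPS.get(funct)  # funct codes not listed above return None (unmodeled)
    if opcode in ALUI_OPS:           # ALUi: operation comes from opcode
        return ALUI_OPS[opcode]
    if opcode in (OP_LW, OP_SW):     # address generation: base + offset
        return "ADD"
    if opcode in BCOND_OPS:          # branches: generate bcond
        return "BCOND_" + BCOND_OPS[opcode]
    return None                      # e.g., J: the ALU output is unused


print(alu_control(0x00, 0x22), alu_control(OP_LW, 0), alu_control(0x05, 0))  # SUB ADD BCOND_NE
```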

29 R-Type ALU [Figure: full single-cycle datapath with the R-type ALU path highlighted; ALU control selects the operation from funct. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

30 I-Type ALU [Figure: full single-cycle datapath with the I-type ALU path highlighted; ALU control selects the operation from opcode. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

31 LW [Figure: full single-cycle datapath with the load path highlighted; ALU control selects Add. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

32 SW [Figure: full single-cycle datapath with the store path highlighted; ALU control selects Add; * marks don't-care signals. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

33 Branch Not Taken [Figure: full single-cycle datapath for a branch whose condition fails; ALU control selects bcond; * marks don't-care signals. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

34 Branch Taken [Figure: full single-cycle datapath for a branch whose condition holds; ALU control selects bcond; * marks don't-care signals. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

35 Jump [Figure: full single-cycle datapath with the jump path highlighted; * marks don't-care signals. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

36 What is in That Control Box? Combinational logic: hardwired control. Idea: control signals generated combinationally based on instruction. Sequential logic: sequential/microprogrammed control, using a control store. Idea: a memory structure contains the control signals associated with an instruction.

37 Evaluating the Single-Cycle Microarchitecture

38 A Single-Cycle Microarchitecture. Is this a good idea/design? When is this a good design? When is this a bad design? How can we design a better microarchitecture?

39 A Single-Cycle Microarchitecture: Analysis. Every instruction takes 1 cycle to execute: CPI (cycles per instruction) is strictly 1. How long each instruction takes is determined by how long the slowest instruction takes to execute, even though many instructions do not need that long to execute. The clock cycle time of the microarchitecture is determined by how long it takes to complete the slowest instruction. The critical path of the design is determined by the processing time of the slowest instruction.

40 What is the Slowest Instruction to Process? Let's go back to the basics. All six phases of the instruction processing cycle take a single machine clock cycle to complete: Fetch, Decode, Evaluate Address, Fetch Operands, Execute, Store Result.
1. Instruction fetch (IF)
2. Instruction decode and register operand fetch (ID/RF)
3. Execute/Evaluate memory address (EX/AG)
4. Memory operand fetch (MEM)
5. Store/writeback result (WB)
Do each of the above phases take the same time (latency) for all instructions?

41 Single-Cycle Datapath Analysis. Assume: memory units (read or write): 200 ps; ALU and adders: 100 ps; register file (read or write): 50 ps; other combinational logic: 0 ps.
steps        IF    ID    EX    MEM   WB    Delay
resources    mem   RF    ALU   mem   RF
R-type       200   50    100         50    400
I-type       200   50    100         50    400
LW           200   50    100   200   50    600
SW           200   50    100   200         550
Branch       200   50    100               350
Jump         200                           200
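
The Delay column follows directly from the stated component delays. A short Python sketch that recomputes it and shows how much of each cycle every instruction class leaves idle; the stage-to-resource mapping comes from the table, and the variable names are illustrative.

```python
# Component delays assumed on the slide (picoseconds); other logic is 0 ps.
MEM, ALU, RF = 200, 100, 50

# Delay each instruction class spends in IF, ID, EX, MEM, WB (0 = phase unused).
STAGE_DELAYS = {
    "R-type": [MEM, RF, ALU, 0, RF],
    "I-type": [MEM, RF, ALU, 0, RF],
    "LW":     [MEM, RF, ALU, MEM, RF],
    "SW":     [MEM, RF, ALU, MEM, 0],
    "Branch": [MEM, RF, ALU, 0, 0],
    "Jump":   [MEM, 0, 0, 0, 0],
}

latency = {name: sum(stages) for name, stages in STAGE_DELAYS.items()}
clock_ps = max(latency.values())  # single-cycle: the clock must fit the slowest instruction

for name, t in latency.items():
    print(f"{name:7s} needs {t:3d} ps, idles {clock_ps - t:3d} ps per cycle")
print("clock period:", clock_ps, "ps")  # clock period: 600 ps
```

A jump does 200 ps of useful work yet still occupies a full 600 ps cycle, which is the inefficiency the following slides quantify.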

42 Let's Find the Critical Path [Figure: full single-cycle datapath and control unit. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

43 R-Type and I-Type ALU [Figure: datapath annotated with cumulative delays: PC+4 adder 100 ps, instruction memory 200 ps, register read 250 ps, ALU 350 ps, register write 400 ps. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

44 LW [Figure: datapath annotated with cumulative delays: PC+4 adder 100 ps, instruction memory 200 ps, register read 250 ps, ALU 350 ps, data memory 550 ps, register write 600 ps. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

45 SW [Figure: datapath annotated with cumulative delays: PC+4 adder 100 ps, instruction memory 200 ps, register read 250 ps, ALU 350 ps, data memory write 550 ps. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

46 Branch Taken [Figure: datapath annotated with cumulative delays: PC+4 adder 100 ps, instruction memory 200 ps, register read 250 ps, ALU bcond and next PC 350 ps. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

47 Jump [Figure: datapath annotated with cumulative delays: PC+4 adder 100 ps, instruction memory 200 ps, jump address 200 ps. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

48 What About Control Logic? How does that affect the critical path? Food for thought for you: can control logic be on the critical path? A note on CDC 5600: control store access too long…

49 What is the Slowest Instruction to Process? Memory is not magic. What if memory sometimes takes milliseconds to access? Does it make sense to have a simple register-to-register add or jump take {milliseconds + all else to do a memory operation}? And what if you need to access memory more than once to process an instruction? Which instructions need this? (e.g., the VAX INDEX instruction) Do you provide multiple ports to memory?

50 Single-Cycle uArch: Complexity. Contrived: all instructions run as slow as the slowest instruction. Inefficient: all instructions run as slow as the slowest instruction; must provide worst-case combinational resources in parallel as required by any instruction; need to replicate a resource if it is needed more than once by an instruction during different parts of the instruction processing cycle. Not necessarily the simplest way to implement an ISA: single-cycle implementation of REP MOVS, INDEX, POLY? Not easy to optimize/improve performance: optimizing the common case (e.g., common instructions) does not work; need to optimize the worst case all the time.

51 Microarchitecture Design Principles. Critical path design: find the maximum combinational logic delay and decrease it. Bread and butter (common case) design: spend time and resources where it matters, i.e., improve what the machine is really designed to do; common case vs. uncommon case. Balanced design: balance instruction/data flow through hardware components; balance the hardware needed to accomplish the work. How does a single-cycle microarchitecture fare in light of these principles?

52 Multi-cycle Microarchitectures

53 Multi-cycle Microarchitectures. Goal: let each instruction take (close to) only as much time as it really needs. Idea: determine clock cycle time independently of instruction processing time. Each instruction takes as many clock cycles as it needs to take, with multiple state transitions per instruction. The states followed by each instruction are different.

54 Remember: The "Process Instruction" Step. The ISA specifies abstractly what AS' should be, given an instruction and AS. It defines an abstract finite state machine where State = programmer-visible state and Next-state logic = instruction execution specification. From the ISA point of view, there are no "intermediate states" between AS and AS' during instruction execution: one state transition per instruction. The microarchitecture implements how AS is transformed to AS'. There are many choices in implementation. We can have programmer-invisible state to optimize the speed of instruction execution: multiple state transitions per instruction. Choice 1: AS → AS' (transform AS to AS' in a single clock cycle). Choice 2: AS → AS+MS1 → AS+MS2 → AS+MS3 → AS' (take multiple clock cycles to transform AS to AS').

55 Multi-cycle Microarchitecture. AS = architectural (programmer-visible) state at the beginning of an instruction. Step 1: process part of instruction in one clock cycle. Step 2: process part of instruction in the next clock cycle. … AS' = architectural (programmer-visible) state at the end of a clock cycle.

56 Benefits of Multi-cycle Design. Critical path design: can keep reducing the critical path independently of the worst-case processing time of any instruction. Bread and butter (common case) design: can optimize the number of states it takes to execute "important" instructions that make up much of the execution time. Balanced design: no need to provide more capability or resources than really needed. An instruction that needs resource X multiple times does not require multiple X's to be implemented. This leads to more efficient hardware: can reuse hardware components needed multiple times for an instruction.

57 Performance Analysis
Execution time of an instruction:
$$T_{\text{instr}} = \text{CPI} \times T_{\text{clock}}$$
Execution time of a program:
$$T_{\text{program}} = \sum_{i} \left(\text{CPI}_i \times T_{\text{clock}}\right) = N_{\text{instr}} \times \overline{\text{CPI}} \times T_{\text{clock}}$$
Single-cycle microarchitecture performance: CPI = 1; clock cycle time = long.
Multi-cycle microarchitecture performance: CPI = different for each instruction; average CPI → hopefully small; clock cycle time = short.
Now, we have two degrees of freedom to optimize independently.
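
A worked Python sketch of the program execution-time formula. The instruction mix and the multi-cycle CPIs are hypothetical, chosen only for illustration; the 600 ps single-cycle clock and the 200 ps multi-cycle clock (one memory access) follow from the component delays assumed in the single-cycle analysis.

```python
# Hypothetical instruction mix per 100 instructions (illustrative, not from the lecture).
MIX = {"ALU": 50, "LW": 20, "SW": 10, "Branch": 15, "Jump": 5}

# Hypothetical multi-cycle CPIs, paired with a 200 ps clock set by the slowest
# single phase (a 200 ps memory access under the delays assumed earlier).
MULTI_CPI = {"ALU": 4, "LW": 5, "SW": 4, "Branch": 3, "Jump": 3}


def exec_time_ps(mix: dict, cpi: dict, clock_ps: int) -> int:
    """Sum over instruction classes of count * CPI, times the clock cycle time."""
    return sum(count * cpi[kind] for kind, count in mix.items()) * clock_ps


single = exec_time_ps(MIX, {kind: 1 for kind in MIX}, 600)  # CPI = 1, long clock
multi = exec_time_ps(MIX, MULTI_CPI, 200)                   # CPI > 1, short clock
avg_cpi = sum(MIX[k] * MULTI_CPI[k] for k in MIX) / sum(MIX.values())
print(single, multi, avg_cpi)  # 60000 80000 4.0
```

Under these particular assumptions the multi-cycle design is slower (80 ns vs. 60 ns per 100 instructions), because a 50 ps register-file phase still occupies a whole 200 ps cycle. The point of the two degrees of freedom is that CPI and clock period can now be tuned separately, for example by giving short phases fewer or shared states.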

58 An Aside: CPI vs. Frequency. CPI vs. clock cycle time: at odds with each other. Reducing one increases the other for a single instruction. Why? Average CPI can be amortized/reduced via concurrent processing of multiple instructions: the same cycle is devoted to multiple instructions. Example: pipelining, superscalar execution.

59 A Multi-cycle Microarchitecture: A Closer Look

60 How Do We Implement This? Maurice Wilkes, "The Best Way to Design an Automatic Calculating Machine," Manchester Univ. Computer Inaugural Conf., 1951. The concept of microcoded/microprogrammed machines. Realization: one can implement the "process instruction" step as a finite state machine that sequences between states and eventually returns back to the "fetch instruction" state. A state is defined by the control signals asserted in it. Control signals for the next state are determined in the current state.

61 The Instruction Processing Cycle: Fetch, Decode, Evaluate Address, Fetch Operands, Execute, Store Result

62 A Basic Multi-cycle Microarchitecture. The instruction processing cycle is divided into "states"; a stage in the instruction processing cycle can take multiple states. A multi-cycle microarchitecture sequences from state to state to process an instruction. The behavior of the machine in a state is completely determined by the control signals in that state. The behavior of the entire processor is specified fully by a finite state machine. In a state (clock cycle), control signals control how the datapath should process the data and how to generate the control signals for the next clock cycle.

63 Microprogrammed Control Terminology. Control signals associated with the current state: microinstruction. Act of transitioning from one state to another, i.e., determining the next state and the microinstruction for the next state: microsequencing. Control store: stores control signals for every possible state, i.e., the store for microinstructions for the entire FSM. Microsequencer: determines which set of control signals will be used in the next clock cycle (i.e., next state).
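
To make the terminology concrete, here is a toy Python model: a dict plays the control store, each entry is a microinstruction, and a small function plays the microsequencer. Every state number, signal name, and opcode here is hypothetical; the sketch mirrors only the structure described above, not the LC-3b's actual control store.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MicroInstruction:
    signals: frozenset                # datapath control signals asserted in this state
    next_state: Optional[int] = None  # fixed successor; None means "dispatch on opcode"


# Hypothetical control store for a toy machine (not the LC-3b or MIPS encodings).
CONTROL_STORE = {
    0: MicroInstruction(frozenset({"LD.IR", "LD.PC"}), next_state=1),     # fetch
    1: MicroInstruction(frozenset(), next_state=None),                    # decode
    2: MicroInstruction(frozenset({"ALU.ADD", "LD.REG"}), next_state=0),  # execute ADD
    3: MicroInstruction(frozenset({"LD.PC"}), next_state=0),              # execute JMP
}
DISPATCH = {"ADD": 2, "JMP": 3}  # opcode -> first execute state (hypothetical)


def microsequencer(state: int, opcode: str) -> int:
    """Choose the next state (control store address) during the current cycle."""
    uinst = CONTROL_STORE[state]
    return DISPATCH[opcode] if uinst.next_state is None else uinst.next_state


def run_instruction(opcode: str) -> list:
    """Return the (state, signals) pairs visited until control returns to fetch."""
    trace, state = [], 0
    while True:
        trace.append((state, sorted(CONTROL_STORE[state].signals)))
        state = microsequencer(state, opcode)
        if state == 0:
            return trace


for op in DISPATCH:
    print(op, "CPI =", len(run_instruction(op)), [s for s, _ in run_instruction(op)])
```

Because each state is one microinstruction and takes one clock cycle, the length of the trace is the instruction's CPI; changing the machine's behavior means editing table entries rather than redesigning gate networks.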

64 What Happens In A Clock Cycle? The control signals (microinstruction) for the current state control two things: processing in the datapath, and generation of control signals (microinstruction) for the next cycle. See Supplemental Fig 1. Datapath and microsequencer operate concurrently. Question: why not generate control signals for the current cycle in the current cycle? This will lengthen the clock cycle. Why would it lengthen the clock cycle? See Supplemental Fig 2.

65 A Clock Cycle

66 A Bad Clock Cycle!

67 A Simple LC-3b Control and Datapath

68 What Determines Next-State Control Signals? What is happening in the current clock cycle: see the 9 control signals coming from the Control block. What are these for? The instruction that is being executed: IR[15:12] coming from the Data Path. Whether the condition of a branch is met, if the instruction being processed is a branch: BEN bit coming from the datapath. Whether the memory operation is completing in the current cycle, if one is in progress: R bit coming from memory.

69 A Simple LC-3b Control and Datapath

70 The State Machine for Multi-cycle Processing. The behavior of the LC-3b uarch is completely determined by the 35 control signals and an additional 7 bits that go into the control logic from the datapath. The 35 control signals completely describe the state of the control structure. We can completely describe the behavior of the LC-3b as a state machine, i.e., a directed graph of nodes (one corresponding to each state) and arcs (showing flow from each state to the next state(s)).

71 An LC-3b State Machine. Patt and Patel, App C, Figure C.2. Each state must be uniquely specified, which is done by means of state variables. There are 31 distinct states in this LC-3b state machine, encoded with 6 state variables. Examples: states 18, 19 correspond to the beginning of the instruction processing cycle. Fetch phase: state 18, 19 → state 33 → state 35. Decode phase: state 32.

[Figure: LC-3b state machine (Patt and Patel, App C, Figure C.2). Fetch: states 18, 19: MAR ← PC, PC ← PC + 2; state 33: MDR ← M, repeated until R; state 35: IR ← MDR. Decode: state 32: BEN ← IR[11]&N + IR[10]&Z + IR[9]&P, then a branch on IR[15:12] to one path per opcode (BR, ADD, AND, XOR, TRAP, SHF, LEA, LDB, LDW, STW, STB, JSR, JMP, RTI). Representative operations: ADD: DR ← SR1 + OP2*, set CC; AND: DR ← SR1 & OP2*, set CC; XOR: DR ← SR1 XOR OP2*, set CC; BR: if BEN, state 22: PC ← PC + LSHF(off9,1); JMP: PC ← BaseR; TRAP: MAR ← LSHF(ZEXT[IR[7:0]],1), MDR ← M[MAR], R7 ← PC, PC ← MDR; JSR: R7 ← PC, then PC ← PC + LSHF(off11,1) or PC ← BaseR depending on IR[11]; SHF: DR ← SHF(SR,A,D,amt4), set CC; LEA: DR ← PC + LSHF(off9,1), set CC; LDB, STB: MAR ← B + off6; LDW, STW: MAR ← B + LSHF(off6,1). Loads then read memory (waiting on R) and set DR ← SEXT[BYTE.DATA] or DR ← MDR, set CC; stores set MDR ← SR (or SR[7:0]) and then M[MAR] ← MDR (waiting on R). Completed paths return to fetch. Notes: B+off6: Base + SEXT[offset6]; PC+off9: PC + SEXT[offset9]; *OP2 may be SR2 or SEXT[imm5]; ** [15:8] or [7:0] depending on MAR[0].]

73 LC-3b State Machine: Some Questions. How many cycles does the fastest instruction take? How many cycles does the slowest instruction take? Why does the BR take as long as it takes in the FSM? What determines the clock cycle? Is this a Mealy machine or a Moore machine?

74 LC-3b Datapath. Patt and Patel, App C, Figure C.3. Single-bus datapath design: at any point only one value can be "gated" on the bus (i.e., can be driving the bus). Advantage: low hardware cost (one bus). Disadvantage: reduced concurrency; if an instruction needs the bus twice for two different things, these need to happen in different states. Control signals (26 of them) determine what happens in the datapath in one clock cycle; see Patt and Patel, App C, Table C.1.
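
A Python sketch of the single-bus constraint: in any cycle at most one Gate* signal may be asserted, and the gated value is what every LD.* register sees. The gate names are the LC-3b tri-state gate signals listed in the Patt and Patel control-store columns; the function itself is hypothetical.

```python
from typing import Mapping, Optional

# LC-3b tri-state gate signals, as named in the Patt and Patel control-store columns.
GATES = ("GatePC", "GateMDR", "GateALU", "GateMARMUX", "GateSHF")


def drive_bus(asserted: set, sources: Mapping[str, int]) -> Optional[int]:
    """Return the 16-bit value on the single bus during one clock cycle.

    asserted: gate signals the current microinstruction asserts.
    sources:  value each gated unit would drive, keyed by gate name.
    """
    drivers = [g for g in GATES if g in asserted]
    if len(drivers) > 1:
        # Two drivers on one bus is a conflict, so such transfers need separate states.
        raise ValueError(f"bus conflict: {drivers}")
    if not drivers:
        return None  # nothing gated this cycle; no register should load from the bus
    return sources[drivers[0]] & 0xFFFF


# Fetch step "MAR <- PC": GatePC drives the bus while LD.MAR would latch it.
print(hex(drive_bus({"GatePC"}, {"GatePC": 0x3000})))  # 0x3000
```

The ValueError branch is the software analogue of the disadvantage listed above: two transfers over the bus cannot share a state, so the state machine must spend an extra cycle.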

76 We did not cover the following slides in lecture. These are for your preparation for the next lecture.

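77 [Figure C.6 (Patt and Patel, App C, Section C.4, The Control Structure): additional logic required to provide control signals. (a) DRMUX selects the destination register from IR[11:9] or 111 (R7); (b) SR1MUX selects source register 1 from IR[11:9] or IR[8:6]; (c) BEN logic combines IR[11:9] with the N, Z, P condition codes.]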
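
Part (c) implements the BEN equation from the LC-3b state machine, BEN ← IR[11]&N + IR[10]&Z + IR[9]&P. A minimal Python sketch, assuming the LC-3b BR encoding with the n, z, p bits in IR[11:9]; the function name is hypothetical.

```python
def ben(ir: int, n: bool, z: bool, p: bool) -> bool:
    """BEN = IR[11]&N + IR[10]&Z + IR[9]&P (LC-3b branch enable)."""
    ir11, ir10, ir9 = (ir >> 11) & 1, (ir >> 10) & 1, (ir >> 9) & 1
    return bool((ir11 and n) or (ir10 and z) or (ir9 and p))


BRZ = 0b0000_010_000000101  # opcode 0000, n=0 z=1 p=0, PCoffset9 = 5
print(ben(BRZ, n=False, z=True, p=False), ben(BRZ, n=True, z=False, p=False))  # True False
```

BEN is computed once (in decode state 32) and latched, so the BR state only has to test a single bit instead of re-evaluating the condition codes.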

79 LC-3b Datapath: Some Questions. How does instruction fetch happen in this datapath according to the state machine? What is the difference between gating and loading? Is this the smallest hardware you can design?

80 LC-3b Microprogrammed Control Structure. Patt and Patel, App C, Figure C.4. Three components: microinstruction, control store, microsequencer. Microinstruction: control signals that control the datapath (26 of them) and determine the next state (9 of them). Each microinstruction is stored in a unique location in the control store (a special memory structure). Unique location: address of the state corresponding to the microinstruction. Remember: each state corresponds to one microinstruction. The microsequencer determines the address of the next microinstruction (i.e., next state).

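81 [Figure C.4: LC-3b microprogrammed control structure. R, IR[15:11], and BEN feed the microsequencer, which produces a 6-bit address into the control store; the selected microinstruction supplies 26 datapath control signals and 9 signals (J, COND, IRD) back to the microsequencer.]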

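82 [Figure C.5 (Appendix C, The Microarchitecture of the LC-3b, Basic Machine): the microsequencer of the LC-3b base machine. COND1 and COND0 select which condition (Branch: BEN; Ready: R; Addressing Mode: IR[11]) modifies the J field J[5:0]; IRD chooses between that result and 0,0,IR[15:12] to form the 6-bit address of the next state.]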

83 [Figure: LC-3b control store specification, one row per state from State 0 to State 63, with one column per control field: J, COND, IRD; LD.MAR, LD.MDR, LD.IR, LD.BEN, LD.REG, LD.CC, LD.PC; GatePC, GateMDR, GateALU, GateMARMUX, GateSHF; PCMUX, DRMUX, SR1MUX, ADDR1MUX, ADDR2MUX, MARMUX; ALUK, MIO.EN, R.W, DATA.SIZE, LSHF1.]

84 LC-3b Microsequencer. Patt and Patel, App C, Figure C.5. The purpose of the microsequencer is to determine the address of the next microinstruction (i.e., next state). The next address depends on 9 control signals.
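
A Python sketch of the next-address logic, based on our reading of Figure C.5: when IRD is asserted the next state is 0,0,IR[15:12]; otherwise J[5:0] is used, with BEN ORed into J[2] when COND selects Branch, R into J[1] when COND selects Ready, and IR[11] into J[0] when COND selects Addressing Mode. The COND encodings (00 unconditional, 01 ready, 10 branch, 11 addressing mode) and the bit assignments are assumptions to check against Patt and Patel. The 6 J bits, 2 COND bits, and IRD make up the 9 control signals.

```python
COND_UNCOND, COND_READY, COND_BRANCH, COND_ADDR_MODE = 0b00, 0b01, 0b10, 0b11


def next_state(j: int, cond: int, ird: bool, ir: int, ben: bool, r: bool) -> int:
    """Next control-store address from J (6 bits), COND (2 bits), IRD (1 bit)."""
    if ird:
        return (ir >> 12) & 0xF          # 0,0,IR[15:12]: 16-way opcode dispatch
    addr = j & 0x3F
    if cond == COND_BRANCH and ben:
        addr |= 0b100                    # J[2] OR BEN
    if cond == COND_READY and r:
        addr |= 0b010                    # J[1] OR R
    if cond == COND_ADDR_MODE and (ir >> 11) & 1:
        addr |= 0b001                    # J[0] OR IR[11]
    return addr


# Fetch: state 33 (J = 33, COND = Ready) repeats until R, then goes to state 35.
print(next_state(33, COND_READY, False, 0, False, r=False))  # 33
print(next_state(33, COND_READY, False, 0, False, r=True))   # 35
```

Because a condition can only set one bit of J, the state encoding has to place each conditional successor exactly one bit away from J (33 and 35 differ only in bit 1), which is the constraint behind the state-encoding question in slide 86.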

86 The Microsequencer: Some Questions. When is the IRD signal asserted? What happens if an illegal instruction is decoded? What are the condition (COND) bits for? How is variable-latency memory handled? How do you do the state encoding? Minimize the number of state variables: start with the 16-way branch, then determine constraint tables and states dependent on COND.

87 Variable-Latency Memory. The ready signal (R) enables memory read/write to execute correctly. Example: the transition from state 18 to state 33 is controlled by the R bit asserted by memory when memory is available. Could we have done this in a single-cycle microarchitecture?
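
A small Python sketch of how the R bit lets the same FSM tolerate any memory latency, following the fetch sequence in the LC-3b state machine (18 → 33 → 35, with state 33 repeating until R). The SlowMemory class and its fixed latency are hypothetical stand-ins for a real memory interface.

```python
class SlowMemory:
    """Hypothetical memory that asserts R during the last cycle of an access."""

    def __init__(self, latency_cycles: int):
        self.latency_cycles = latency_cycles
        self.elapsed = 0

    def tick(self) -> bool:
        """Advance one clock cycle; return the R (ready) bit for that cycle."""
        self.elapsed += 1
        return self.elapsed >= self.latency_cycles


def fetch_trace(latency_cycles: int) -> list:
    """States visited in fetch: 18, then 33 until R is asserted, then 35."""
    mem = SlowMemory(latency_cycles)
    trace = [18, 33]        # state 18: MAR <- PC, PC <- PC + 2; state 33 starts the read
    while not mem.tick():   # R not asserted yet: stay in state 33
        trace.append(33)
    trace.append(35)        # R seen: IR <- MDR
    return trace


print(fetch_trace(1))  # [18, 33, 35]
print(fetch_trace(3))  # [18, 33, 33, 33, 35]
```

The clock period never changes; only the number of cycles spent in state 33 adapts to the memory.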

88 The Microsequencer: Advanced Questions. What happens if the machine is interrupted? What if an instruction generates an exception? How can you implement a complex instruction using this control structure? Think REP MOVS.


More information

The final datapath. M u x. Add. 4 Add. Shift left 2. PCSrc. RegWrite. MemToR. MemWrite. Read data 1 I [25-21] Instruction. Read. register 1 Read.

The final datapath. M u x. Add. 4 Add. Shift left 2. PCSrc. RegWrite. MemToR. MemWrite. Read data 1 I [25-21] Instruction. Read. register 1 Read. The final path PC 4 Add Reg Shift left 2 Add PCSrc Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtor RegDst ALUSrc em I [5

More information

Pipelining. Chapter 4

Pipelining. Chapter 4 Pipelining Chapter 4 ake processor rns faster Pipelining is an implementation techniqe in which mltiple instrctions are overlapped in eection Key of making processor fast Pipelining Single cycle path we

More information

Review Multicycle: What is Happening. Controlling The Multicycle Design

Review Multicycle: What is Happening. Controlling The Multicycle Design Review lticycle: What is Happening Reslt Zero Op SrcA SrcB Registers Reg Address emory em Data Sign etend Shift left Sorce A B Ot [-6] [5-] [-6] [5-] [5-] Instrction emory IR RegDst emtoreg IorD em em

More information

1048: Computer Organization

1048: Computer Organization 48: Compter Organization Lectre 5 Datapath and Control Lectre5A - simple implementation (cwli@twins.ee.nct.ed.tw) 5A- Introdction In this lectre, we will try to implement simplified IPS which contain emory

More information

CMSC Computer Architecture Lecture 4: Single-Cycle uarch and Pipelining. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 4: Single-Cycle uarch and Pipelining. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 4: Single-Cycle uarch and Pipelining Prof. Yanjing Li University of Chicago Administrative Stuff! Lab1 due at 11:59pm today! Lab2 out " Pipeline ARM simulator "

More information

Computer Architecture Chapter 5. Fall 2005 Department of Computer Science Kent State University

Computer Architecture Chapter 5. Fall 2005 Department of Computer Science Kent State University Compter Architectre Chapter 5 Fall 25 Department of Compter Science Kent State University The Processor: Datapath & Control Or implementation of the MIPS is simplified memory-reference instrctions: lw,

More information

CS 251, Winter 2018, Assignment % of course mark

CS 251, Winter 2018, Assignment % of course mark CS 25, Winter 28, Assignment 4.. 3% of corse mark De Wednesday, arch 7th, 4:3P Lates accepted ntil Thrsday arch 8th, am with a 5% penalty. (6 points) In the diagram below, the mlticycle compter from the

More information

EXAMINATIONS 2010 END OF YEAR NWEN 242 COMPUTER ORGANIZATION

EXAMINATIONS 2010 END OF YEAR NWEN 242 COMPUTER ORGANIZATION EXAINATIONS 2010 END OF YEAR COPUTER ORGANIZATION Time Allowed: 3 Hors (180 mintes) Instrctions: Answer all qestions. ake sre yor answers are clear and to the point. Calclators and paper foreign langage

More information

Quiz #1 EEC 483, Spring 2019

Quiz #1 EEC 483, Spring 2019 Qiz # EEC 483, Spring 29 Date: Jan 22 Name: Eercise #: Translate the following instrction in C into IPS code. Eercise #2: Translate the following instrction in C into IPS code. Hint: operand C is stored

More information

EEC 483 Computer Organization

EEC 483 Computer Organization EEC 483 Compter Organization Chapter 4.4 A Simple Implementation Scheme Chans Y The Big Pictre The Five Classic Components of a Compter Processor Control emory Inpt path Otpt path & Control 2 path and

More information

EXAMINATIONS 2003 END-YEAR COMP 203. Computer Organisation

EXAMINATIONS 2003 END-YEAR COMP 203. Computer Organisation EXAINATIONS 2003 COP203 END-YEAR Compter Organisation Time Allowed: 3 Hors (180 mintes) Instrctions: Answer all qestions. There are 180 possible marks on the eam. Calclators and foreign langage dictionaries

More information

Review: Computer Organization

Review: Computer Organization Review: Compter Organization Pipelining Chans Y Landry Eample Landry Eample Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 3 mintes A B C D Dryer takes 3 mintes

More information

The multicycle datapath. Lecture 10 (Wed 10/15/2008) Finite-state machine for the control unit. Implementing the FSM

The multicycle datapath. Lecture 10 (Wed 10/15/2008) Finite-state machine for the control unit. Implementing the FSM Lectre (Wed /5/28) Lab # Hardware De Fri Oct 7 HW #2 IPS programming, de Wed Oct 22 idterm Fri Oct 2 IorD The mlticycle path SrcA Today s objectives: icroprogramming Etending the mlti-cycle path lti-cycle

More information

Review. A single-cycle MIPS processor

Review. A single-cycle MIPS processor Review If three instrctions have opcodes, 7 and 5 are they all of the same type? If we were to add an instrction to IPS of the form OD $t, $t2, $t3, which performs $t = $t2 OD $t3, what wold be its opcode?

More information

PART I: Adding Instructions to the Datapath. (2 nd Edition):

PART I: Adding Instructions to the Datapath. (2 nd Edition): EE57 Instrctor: G. Pvvada ===================================================================== Homework #5b De: check on the blackboard =====================================================================

More information

CS 251, Winter 2019, Assignment % of course mark

CS 251, Winter 2019, Assignment % of course mark CS 25, Winter 29, Assignment.. 3% of corse mark De Wednesday, arch 3th, 5:3P Lates accepted ntil Thrsday arch th, pm with a 5% penalty. (7 points) In the diagram below, the mlticycle compter from the corse

More information

What do we have so far? Multi-Cycle Datapath

What do we have so far? Multi-Cycle Datapath What do we have so far? lti-cycle Datapath CPI: R-Type = 4, Load = 5, Store 4, Branch = 3 Only one instrction being processed in datapath How to lower CPI frther? #1 Lec # 8 Spring2 4-11-2 Pipelining pipelining

More information

Prof. Kozyrakis. 1. (10 points) Consider the following fragment of Java code:

Prof. Kozyrakis. 1. (10 points) Consider the following fragment of Java code: EE8 Winter 25 Homework #2 Soltions De Thrsday, Feb 2, 5 P. ( points) Consider the following fragment of Java code: for (i=; i

More information

Computer Architecture Lecture 6: Multi-Cycle and Microprogrammed Microarchitectures

Computer Architecture Lecture 6: Multi-Cycle and Microprogrammed Microarchitectures 18-447 Computer Architecture Lecture 6: Multi-Cycle and Microprogrammed Microarchitectures Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 1/28/2015 Agenda for Today & Next Few Lectures Single-cycle

More information

Exceptions and interrupts

Exceptions and interrupts Eceptions and interrpts An eception or interrpt is an nepected event that reqires the CPU to pase or stop the crrent program. Eception handling is the hardware analog of error handling in software. Classes

More information

1048: Computer Organization

1048: Computer Organization 8: Compter Organization Lectre 6 Pipelining Lectre6 - pipelining (cwli@twins.ee.nct.ed.tw) 6- Otline An overview of pipelining A pipelined path Pipelined control Data hazards and forwarding Data hazards

More information

CS 251, Spring 2018, Assignment 3.0 3% of course mark

CS 251, Spring 2018, Assignment 3.0 3% of course mark CS 25, Spring 28, Assignment 3. 3% of corse mark De onday, Jne 25th, 5:3 P. (5 points) Consider the single-cycle compter shown on page 6 of this assignment. Sppose the circit elements take the following

More information

EEC 483 Computer Organization

EEC 483 Computer Organization EEC 83 Compter Organization Chapter.6 A Pipelined path Chans Y Pipelined Approach 2 - Cycle time, No. stages - Resorce conflict E E A B C D 3 E E 5 E 2 3 5 2 6 7 8 9 c.y9@csohio.ed Resorces sed in 5 Stages

More information

Enhanced Performance with Pipelining

Enhanced Performance with Pipelining Chapter 6 Enhanced Performance with Pipelining Note: The slides being presented represent a mi. Some are created by ark Franklin, Washington University in St. Lois, Dept. of CSE. any are taken from the

More information

CSE Introduction to Computer Architecture Chapter 5 The Processor: Datapath & Control

CSE Introduction to Computer Architecture Chapter 5 The Processor: Datapath & Control CSE-45432 Introdction to Compter Architectre Chapter 5 The Processor: Datapath & Control Dr. Izadi Data Processor Register # PC Address Registers ALU memory Register # Register # Address Data memory Data

More information

1048: Computer Organization

1048: Computer Organization 48: Compter Organization Lectre 5 Datapath and Control Lectre5B - mlticycle implementation (cwli@twins.ee.nct.ed.tw) 5B- Recap: A Single-Cycle Processor PCSrc 4 Add Shift left 2 Add ALU reslt PC address

More information

TDT4255 Friday the 21st of October. Real world examples of pipelining? How does pipelining influence instruction

TDT4255 Friday the 21st of October. Real world examples of pipelining? How does pipelining influence instruction Review Friday the 2st of October Real world eamples of pipelining? How does pipelining pp inflence instrction latency? How does pipelining inflence instrction throghpt? What are the three types of hazard

More information

CS 251, Winter 2018, Assignment % of course mark

CS 251, Winter 2018, Assignment % of course mark CS 25, Winter 28, Assignment 3.. 3% of corse mark De onday, Febrary 26th, 4:3 P Lates accepted ntil : A, Febrary 27th with a 5% penalty. IEEE 754 Floating Point ( points): (a) (4 points) Complete the following

More information

Lecture 7. Building A Simple Processor

Lecture 7. Building A Simple Processor Lectre 7 Bilding A Simple Processor Christos Kozyrakis Stanford University http://eeclass.stanford.ed/ee8b C. Kozyrakis EE8b Lectre 7 Annoncements Upcoming deadlines Lab is de today Demo by 5pm, report

More information

Chapter 6: Pipelining

Chapter 6: Pipelining CSE 322 COPUTER ARCHITECTURE II Chapter 6: Pipelining Chapter 6: Pipelining Febrary 10, 2000 1 Clothes Washing CSE 322 COPUTER ARCHITECTURE II The Assembly Line Accmlate dirty clothes in hamper Place in

More information

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts CS359: Compter Architectre Chapter 3 & Appendi C Pipelining Part A: Basic and Intermediate Concepts Yanyan Shen Department of Compter Science and Engineering Shanghai Jiao Tong University 1 Otline Introdction

More information

Lecture 3: Single Cycle Microarchitecture. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 3: Single Cycle Microarchitecture. James C. Hoe Department of ECE Carnegie Mellon University 8 447 Lecture 3: Single Cycle Microarchitecture James C. Hoe Department of ECE Carnegie Mellon University 8 447 S8 L03 S, James C. Hoe, CMU/ECE/CALCM, 208 Your goal today Housekeeping first try at implementing

More information

Solutions for Chapter 6 Exercises

Solutions for Chapter 6 Exercises Soltions for Chapter 6 Eercises Soltions for Chapter 6 Eercises 6. 6.2 a. Shortening the ALU operation will not affect the speedp obtained from pipelining. It wold not affect the clock cycle. b. If the

More information

PS Midterm 2. Pipelining

PS Midterm 2. Pipelining PS idterm 2 Pipelining Seqential Landry 6 P 7 8 9 idnight Time T a s k O r d e r A B C D 3 4 2 3 4 2 3 4 2 3 4 2 Seqential landry takes 6 hors for 4 loads If they learned pipelining, how long wold landry

More information

Design of Digital Circuits Lecture 13: Multi-Cycle Microarch. Prof. Onur Mutlu ETH Zurich Spring April 2017

Design of Digital Circuits Lecture 13: Multi-Cycle Microarch. Prof. Onur Mutlu ETH Zurich Spring April 2017 Design of Digital Circuits Lecture 3: Multi-Cycle Microarch. Prof. Onur Mutlu ETH Zurich Spring 27 6 April 27 Agenda for Today & Next Few Lectures! Single-cycle Microarchitectures! Multi-cycle and Microprogrammed

More information

Lab 8 (All Sections) Prelab: ALU and ALU Control

Lab 8 (All Sections) Prelab: ALU and ALU Control Lab 8 (All Sections) Prelab: and Control Name: Sign the following statement: On my honor, as an Aggie, I have neither given nor received nathorized aid on this academic work Objective In this lab yo will

More information

Comp 303 Computer Architecture A Pipelined Datapath Control. Lecture 13

Comp 303 Computer Architecture A Pipelined Datapath Control. Lecture 13 Comp 33 Compter Architectre A Pipelined path Lectre 3 Pipelined path with Signals PCSrc IF/ ID ID/ EX EX / E E / Add PC 4 Address Instrction emory RegWr ra rb rw Registers bsw [5-] [2-6] [5-] bsa bsb Sign

More information

Lecture 10: Pipelined Implementations

Lecture 10: Pipelined Implementations U 8-7 S 9 L- 8-7 Lectre : Pipelined Implementations James. Hoe ept of EE, U Febrary 23, 29 nnoncements: Project is de this week idterm graded, d reslts posted Handots: H9 Homework 3 (on lackboard) Graded

More information

Overview of Pipelining

Overview of Pipelining EEC 58 Compter Architectre Pipelining Department of Electrical Engineering and Compter Science Cleveland State University Fndamental Principles Overview of Pipelining Pipelined Design otivation: Increase

More information

Hardware Design Tips. Outline

Hardware Design Tips. Outline Hardware Design Tips EE 36 University of Hawaii EE 36 Fall 23 University of Hawaii Otline Verilog: some sbleties Simlators Test Benching Implementing the IPS Actally a simplified 6 bit version EE 36 Fall

More information

Chapter 6 Enhancing Performance with. Pipelining. Pipelining. Pipelined vs. Single-Cycle Instruction Execution: the Plan. Pipelining: Keep in Mind

Chapter 6 Enhancing Performance with. Pipelining. Pipelining. Pipelined vs. Single-Cycle Instruction Execution: the Plan. Pipelining: Keep in Mind Pipelining hink of sing machines in landry services Chapter 6 nhancing Performance with Pipelining 6 P 7 8 9 A ime ask A B C ot pipelined Assme 3 min. each task wash, dry, fold, store and that separate

More information

Lecture 13: Exceptions and Interrupts

Lecture 13: Exceptions and Interrupts 18 447 Lectre 13: Eceptions and Interrpts S 10 L13 1 James C. Hoe Dept of ECE, CU arch 1, 2010 Annoncements: Handots: Spring break is almost here Check grades on Blackboard idterm 1 graded Handot #9: Lab

More information

Animating the Datapath. Animating the Datapath: R-type Instruction. Animating the Datapath: Load Instruction. MIPS Datapath I: Single-Cycle

Animating the Datapath. Animating the Datapath: R-type Instruction. Animating the Datapath: Load Instruction. MIPS Datapath I: Single-Cycle nimating the atapath PS atapath : Single-Cycle npt is either (-type) or sign-etended lower half of instrction (load/store) op offset/immediate W egister File 6 6 + from instrction path beq,, offset if

More information

Instruction fetch. MemRead. IRWrite ALUSrcB = 01. ALUOp = 00. PCWrite. PCSource = 00. ALUSrcB = 00. R-type completion

Instruction fetch. MemRead. IRWrite ALUSrcB = 01. ALUOp = 00. PCWrite. PCSource = 00. ALUSrcB = 00. R-type completion . (Chapter 5) Fill in the vales for SrcA, SrcB, IorD, Dst and emto to complete the Finite State achine for the mlti-cycle datapath shown below. emory address comptation 2 SrcA = SrcB = Op = fetch em SrcA

More information

Chapter 6: Pipelining

Chapter 6: Pipelining Chapter 6: Pipelining Otline An overview of pipelining A pipelined path Pipelined control Data hazards and forwarding Data hazards and stalls Branch hazards Eceptions Sperscalar and dynamic pipelining

More information

EEC 483 Computer Organization. Branch (Control) Hazards

EEC 483 Computer Organization. Branch (Control) Hazards EEC 483 Compter Organization Section 4.8 Branch Hazards Section 4.9 Exceptions Chans Y Branch (Control) Hazards While execting a previos branch, next instrction address might not yet be known. s n i o

More information

Single-Cycle Examples, Multi-Cycle Introduction

Single-Cycle Examples, Multi-Cycle Introduction Single-Cycle Examples, ulti-cycle Introduction 1 Today s enu Single cycle examples Single cycle machines vs. multi-cycle machines Why multi-cycle? Comparative performance Physical and Logical Design of

More information

CSEN 601: Computer System Architecture Summer 2014

CSEN 601: Computer System Architecture Summer 2014 CSEN 601: Computer System Architecture Summer 2014 Practice Assignment 5 Solutions Exercise 5-1: (Midterm Spring 2013) a. What are the values of the control signals (except ALUOp) for each of the following

More information

CSSE232 Computer Architecture I. Mul5cycle Datapath

CSSE232 Computer Architecture I. Mul5cycle Datapath CSSE232 Compter Architectre I Ml5cycle Datapath Class Stats Next 3 days : Ml5cycle datapath ing Ml5cycle datapath is not in the book! How long do instrc5ons take? ALU 2ns Mem 2ns Reg File 1ns Everything

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

PIPELINING. Pipelining: Natural Phenomenon. Pipelining. Pipelining Lessons

PIPELINING. Pipelining: Natural Phenomenon. Pipelining. Pipelining Lessons Pipelining: Natral Phenomenon Landry Eample: nn, rian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 mintes C D Dryer takes 0 mintes PIPELINING Folder takes 20 mintes

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control ELEC 52/62 Computer Architecture and Design Spring 217 Lecture 4: Datapath and Control Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849

More information

ETH, Design of Digital Circuits, SS17 Practice Exercises III

ETH, Design of Digital Circuits, SS17 Practice Exercises III ETH, Design of Digital Circuits, SS17 Practice Exercises III Instructors: Prof. Onur Mutlu, Prof. Srdjan Capkun TAs: Jeremie Kim, Minesh Patel, Hasan Hassan, Arash Tavakkol, Der-Yeuan Yu, Francois Serre,

More information

MIPS Architecture. Fibonacci (C) Fibonacci (Assembly) Another Example: MIPS. Example: subset of MIPS processor architecture

MIPS Architecture. Fibonacci (C) Fibonacci (Assembly) Another Example: MIPS. Example: subset of MIPS processor architecture Another Eample: IPS From the Harris/Weste book Based on the IPS-like processor from the Hennessy/Patterson book IPS Architectre Eample: sbset of IPS processor architectre Drawn from Patterson & Hennessy

More information

Lecture 8: Data Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 8: Data Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 8: Data Hazard and Resolution James C. Hoe Department of ECE Carnegie ellon University 18 447 S18 L08 S1, James C. Hoe, CU/ECE/CALC, 2018 Your goal today Housekeeping detect and resolve

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

Computer and Information Sciences College / Computer Science Department The Processor: Datapath and Control

Computer and Information Sciences College / Computer Science Department The Processor: Datapath and Control Computer and Information Sciences College / Computer Science Department The Processor: Datapath and Control Chapter 5 The Processor: Datapath and Control Big Picture: Where are We Now? Performance of a

More information

Lecture 9: Microcontrolled Multi-Cycle Implementations. Who Am I?

Lecture 9: Microcontrolled Multi-Cycle Implementations. Who Am I? 18-447 Lecture 9: Microcontrolled Multi-Cycle Implementations S 10 L9-1 James C. Hoe José F. Martínez Electrical & Computer Engineering Carnegie Mellon University February 1, 2010 Who Am I? S 10 L9-2 Associate

More information

CSE 141 Computer Architecture Summer Session I, Lectures 10 Advanced Topics, Memory Hierarchy and Cache. Pramod V. Argade

CSE 141 Computer Architecture Summer Session I, Lectures 10 Advanced Topics, Memory Hierarchy and Cache. Pramod V. Argade CSE 141 Compter Architectre Smmer Session I, 2004 Lectres 10 Advanced Topics, emory Hierarchy and Cache Pramod V. Argade CSE141: Introdction to Compter Architectre Instrctor: TA: Pramod V. Argade (p2argade@cs.csd.ed)

More information

4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language 345.e1

4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language 345.e1 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage to Describe and odel a Pipeline

More information

CC 311- Computer Architecture. The Processor - Control

CC 311- Computer Architecture. The Processor - Control CC 311- Computer Architecture The Processor - Control Control Unit Functions: Instruction code Control Unit Control Signals Select operations to be performed (ALU, read/write, etc.) Control data flow (multiplexor

More information

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI CSCI 42: Computer Architectures The Processor (2) Fengguang Song Department of Computer & Information Science IUPUI Full Datapath Branch Target Instruction Fetch Immediate 4 Today s Contents We have looked

More information

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control EE 37 Unit Single-Cycle CPU path and Control CPU Organization Scope We will build a CPU to implement our subset of the MIPS ISA Memory Reference Instructions: Load Word (LW) Store Word (SW) Arithmetic

More information

Department of Electrical and Computer Engineering The University of Texas at Austin

Department of Electrical and Computer Engineering The University of Texas at Austin Department of Electrical and Computer Engineering The University of Texas at Austin EE 360N, Spring 2003 Yale Patt, Instructor Hyesoon Kim, Onur Mutlu, Moinuddin Qureshi, Santhosh Srinath, TAs Exam 1,

More information

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content 3/6/8 CSCI 42: Computer Architectures The Processor (2) Fengguang Song Department of Computer & Information Science IUPUI Today s Content We have looked at how to design a Data Path. 4.4, 4.5 We will design

More information

Design of Digital Circuits Lecture 15: Pipelining. Prof. Onur Mutlu ETH Zurich Spring April 2017

Design of Digital Circuits Lecture 15: Pipelining. Prof. Onur Mutlu ETH Zurich Spring April 2017 Design of Digital Circuits Lecture 5: Pipelining Prof. Onur Mutlu ETH Zurich Spring 27 3 April 27 Agenda for Today & Next Few Lectures! Single-cycle Microarchitectures! Multi-cycle and Microprogrammed

More information

Lecture 5: The Processor

Lecture 5: The Processor Lecture 5: The Processor CSCE 26 Computer Organization Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed from various books, websites, authors pages, and

More information

Review: Abstract Implementation View

Review: Abstract Implementation View Review: Abstract Implementation View Split memory (Harvard) model - single cycle operation Simplified to contain only the instructions: memory-reference instructions: lw, sw arithmetic-logical instructions:

More information

CPE 335 Computer Organization. Basic MIPS Architecture Part I

CPE 335 Computer Organization. Basic MIPS Architecture Part I CPE 335 Computer Organization Basic MIPS Architecture Part I Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s8/index.html CPE232 Basic MIPS Architecture

More information

Inf2C - Computer Systems Lecture Processor Design Single Cycle

Inf2C - Computer Systems Lecture Processor Design Single Cycle Inf2C - Computer Systems Lecture 10-11 Processor Design Single Cycle Boris Grot School of Informatics University of Edinburgh Previous lectures Combinational circuits Combinations of gates (INV, AND, OR,

More information

Instruction Pipelining is the use of pipelining to allow more than one instruction to be in some stage of execution at the same time.

Instruction Pipelining is the use of pipelining to allow more than one instruction to be in some stage of execution at the same time. Pipelining Pipelining is the se of pipelining to allow more than one instrction to be in some stage of eection at the same time. Ferranti ATLAS (963): Pipelining redced the average time per instrction

More information

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining CS 152 Computer rchitecture and Engineering Lecture 4 Pipelining 2014-1-30 John Lazzaro (not a prof - John is always OK) T: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: 1 otorola 68000 Next week

More information

Winter 2013 MIDTERM TEST #2 Wednesday, March 20 7:00pm to 8:15pm. Please do not write your U of C ID number on this cover page.

Winter 2013 MIDTERM TEST #2 Wednesday, March 20 7:00pm to 8:15pm. Please do not write your U of C ID number on this cover page. page of 7 University of Calgary Departent of Electrical and Copter Engineering ENCM 369: Copter Organization Lectre Instrctors: Steve Noran and Nor Bartley Winter 23 MIDTERM TEST #2 Wednesday, March 2

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

MIPS-Lite Single-Cycle Control

MIPS-Lite Single-Cycle Control MIPS-Lite Single-Cycle Control COE68: Computer Organization and Architecture Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University Overview Single cycle

More information

CENG 3420 Lecture 06: Datapath

CENG 3420 Lecture 06: Datapath CENG 342 Lecture 6: Datapath Bei Yu byu@cse.cuhk.edu.hk CENG342 L6. Spring 27 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified to contain only: memory-reference

More information

Improving Performance: Pipelining

Improving Performance: Pipelining Improving Performance: Pipelining Memory General registers Memory ID EXE MEM WB Instruction Fetch (includes PC increment) ID Instruction Decode + fetching values from general purpose registers EXE EXEcute

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information
